[jira] [Updated] (GIRAPH-184) Upgrade to junit4

2012-04-17 Thread Devaraj K (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated GIRAPH-184:
-

Attachment: GIRAPH-184-1.patch

 Upgrade to junit4
 -

 Key: GIRAPH-184
 URL: https://issues.apache.org/jira/browse/GIRAPH-184
 Project: Giraph
  Issue Type: Bug
Reporter: Devaraj K
 Attachments: GIRAPH-184-1.patch, GIRAPH-184.patch


 Presently Giraph uses JUnit 3.8.1. We can upgrade to JUnit 4

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (GIRAPH-184) Upgrade to junit4

2012-04-16 Thread Devaraj K (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated GIRAPH-184:
-

Attachment: GIRAPH-184.patch

 Upgrade to junit4
 -

 Key: GIRAPH-184
 URL: https://issues.apache.org/jira/browse/GIRAPH-184
 Project: Giraph
  Issue Type: Bug
Reporter: Devaraj K
Assignee: Devaraj K
 Attachments: GIRAPH-184.patch


 Presently Giraph uses JUnit 3.8.1. We can upgrade to JUnit 4





[jira] [Updated] (GIRAPH-183) Add Claudio's FOSDEM presentation (slides and video) to the site

2012-04-13 Thread Claudio Martella (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Claudio Martella updated GIRAPH-183:


Attachment: GIRAPH-183.diff

Site update.

 Add Claudio's FOSDEM presentation (slides and video) to the site
 

 Key: GIRAPH-183
 URL: https://issues.apache.org/jira/browse/GIRAPH-183
 Project: Giraph
  Issue Type: Improvement
  Components: site
Reporter: Claudio Martella
Assignee: Claudio Martella
Priority: Trivial
  Labels: newbie
 Attachments: GIRAPH-183.diff


 Presentation: 
 http://prezi.com/9ake_klzwrga/apache-giraph-distributed-graph-processing-in-the-cloud/
 Video: http://www.youtube.com/watch?v=3ZrqPEIPRe4, 
 http://www.youtube.com/watch?v=BmRaejKGeDM





[jira] [Updated] (GIRAPH-178) TestPredicate lock has lots of boolean expressions to be simplified

2012-04-13 Thread Devaraj K (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated GIRAPH-178:
-

Attachment: GIRAPH-178.patch

 TestPredicate lock has lots of boolean expressions to be simplified
 ---

 Key: GIRAPH-178
 URL: https://issues.apache.org/jira/browse/GIRAPH-178
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Priority: Trivial
  Labels: newbie
 Attachments: GIRAPH-178.patch


 TestPredicateLock.java has several instances of 
 {code}assertTrue(gotPredicate == false);{code} (or {{== true}}) that can be 
 simplified to more idiomatic Java.
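
A minimal sketch of the simplification described above. The two assertion helpers are local stand-ins (an assumption, so the example compiles without JUnit on the classpath); in the real test they would be the JUnit assertTrue and assertFalse methods.

```java
final class AssertStyleSketch {
  // Stand-in for JUnit's assertTrue.
  static void assertTrue(boolean condition) {
    if (!condition) {
      throw new AssertionError("expected true");
    }
  }

  // Stand-in for JUnit's assertFalse.
  static void assertFalse(boolean condition) {
    if (condition) {
      throw new AssertionError("expected false");
    }
  }

  // Style reported in the issue: compares the flag against a literal.
  static void verboseCheck(boolean gotPredicate) {
    assertTrue(gotPredicate == false);
  }

  // Equivalent, more idiomatic form.
  static void idiomaticCheck(boolean gotPredicate) {
    assertFalse(gotPredicate);
  }

  public static void main(String[] args) {
    verboseCheck(false);
    idiomaticCheck(false);
    System.out.println("both checks pass when gotPredicate is false");
  }
}
```

Both forms fail for the same inputs; the second simply states the expectation directly.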





[jira] [Updated] (GIRAPH-176) BasicRPCCommunications has unnecessary cast of Vertex

2012-04-13 Thread Devaraj K (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated GIRAPH-176:
-

Attachment: GIRAPH-176.patch

 BasicRPCCommunications has unnecessary cast of Vertex
 -

 Key: GIRAPH-176
 URL: https://issues.apache.org/jira/browse/GIRAPH-176
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Priority: Minor
 Attachments: GIRAPH-176.patch


 BasicRPCCommunications.java, 1224:
 {code}  BasicVertex<I, V, E, M> vertex =
       vertexResolver.resolve(vertexIndex,
           originalVertex,
           vertexMutations,
           messages);{code}
 and then a few lines later at 1248:
 {code}partition.putVertex((BasicVertex<I, V, E, M>) vertex);{code}
 vertex gets cast to its own type. This cast can be removed.





[jira] [Updated] (GIRAPH-175) Replace manual array copy to utility method call

2012-04-13 Thread Devaraj K (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated GIRAPH-175:
-

Attachment: GIRAPH-175.patch

 Replace manual array copy to utility method call
 

 Key: GIRAPH-175
 URL: https://issues.apache.org/jira/browse/GIRAPH-175
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Priority: Trivial
 Attachments: GIRAPH-175.patch


 {code}  String[] zkJavaOptsArray = zkJavaOptsString.split(" ");
   if (zkJavaOptsArray != null) {
     for (String javaOpt : zkJavaOptsArray) {
       commandList.add(javaOpt);
     }
   }{code}
 Rather than doing the loop ourselves, Collections.addAll would be simpler 
 (and faster, though that doesn't matter with such a small array).  Still 
 cleaner, though.
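
A minimal sketch of the suggested replacement. The class name, method name, and the leading "java" entry are assumptions for illustration only; the point is the single Collections.addAll call. Because String.split never returns null, the null check can go as well.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ZkCommandSketch {
  // Builds a command list from a space-separated option string.
  static List<String> buildCommand(String zkJavaOptsString) {
    List<String> commandList = new ArrayList<String>();
    commandList.add("java"); // hypothetical first entry
    // Appends every array element in one call; replaces the manual loop.
    Collections.addAll(commandList, zkJavaOptsString.split(" "));
    return commandList;
  }

  public static void main(String[] args) {
    System.out.println(buildCommand("-Xmx256m -Dfoo=bar"));
  }
}
```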





[jira] [Updated] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-11 Thread Brian Femiano (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Femiano updated GIRAPH-153:
-

Attachment: GIRAPH-153.patch

 HBase/Accumulo Input and Output formats
 ---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano
 Attachments: GIRAPH-153.patch


 Four abstract classes that wrap their respective delegate input/output 
 formats for
 easy hooks into vertex input format subclasses. I've included some sample 
 programs that show two very simple graph
 algorithms. I have a graph generator that builds out a very simple directed 
 structure, starting with a few 'root' nodes.
 Root nodes are defined as nodes which are not listed as a child anywhere in 
 the graph. 
 Algorithm 1) AccumuloRootMarker.java  -- Accumulo as read/write source. 
 Every vertex starts thinking it's a root. At superstep 0, send a message down 
 to each
 child as a non-root notification. After superstep 1, only root nodes will 
 have never been messaged. 
 Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by 
 bundling the notification logic followed by root node propagation. Once we've 
 marked the appropriate nodes as roots, tell every child which roots it can be 
 traced back to via one or more spanning trees. This will take N + 2 
 supersteps where N is the maximum number of hops from any root to any leaf, 
 plus 2 supersteps for the initial root flagging. 
 I've included all relevant code plus DistributedCacheHelper.java for 
 recursive cache file and archive searches. It is more hadoop centric than 
 giraph, but these jobs use it so I figured why not commit here. 
 These have been tested through local JobRunner, pseudo-distributed on the 
 aforementioned hardware, and full distributed on EC2. More details in the 
 comments.
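
The root-flagging step of Algorithm 1 can be sketched in plain Java, without Giraph or Accumulo. This is only an illustration of the two supersteps described above, not the code in the patch; the class name, method name, and the adjacency-map representation are assumptions, and every vertex is assumed to appear as a key in the map.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class RootMarkerSketch {
  /**
   * children maps every vertex id to the ids of its children
   * (an empty list for leaves).
   */
  static Set<Integer> findRoots(Map<Integer, List<Integer>> children) {
    // Before superstep 0: every vertex assumes it is a root.
    Set<Integer> roots = new TreeSet<>(children.keySet());

    // Superstep 0: each vertex sends a non-root notification to each child.
    Set<Integer> messaged = new HashSet<>();
    for (List<Integer> kids : children.values()) {
      messaged.addAll(kids);
    }

    // Superstep 1: any vertex that received a notification drops its flag.
    roots.removeAll(messaged);
    return roots;
  }

  public static void main(String[] args) {
    Map<Integer, List<Integer>> children = new HashMap<>();
    children.put(1, Arrays.asList(2, 3));
    children.put(2, Arrays.asList(4));
    children.put(3, Arrays.asList(4));
    children.put(4, Arrays.<Integer>asList());
    children.put(5, Arrays.asList(4));
    System.out.println("roots: " + findRoots(children));
  }
}
```

In the sample graph, vertices 1 and 5 are never listed as a child, so they are the only ones left flagged as roots.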





[jira] [Updated] (GIRAPH-180) Publish SNAPSHOTs and released artifacts in the Maven repository

2012-04-10 Thread Paolo Castagna (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paolo Castagna updated GIRAPH-180:
--

Description: 
Currently Giraph uses Maven to drive its build.
However, no Maven artifacts nor SNAPSHOTs are published in the Apache Maven 
repository or Maven central.
It would be useful to have Apache Giraph artifacts and SNAPSHOTs published and 
enable people to use Giraph without recompiling themselves.

Right now users can checkout Giraph, mvn install it and use this for their 
dependency:

<dependency>
  <groupId>org.apache.giraph</groupId>
  <artifactId>giraph</artifactId>
  <version>0.2-SNAPSHOT</version>
</dependency>

So, it's not that bad, but it can be better. :-)

  was:
Currently Giraph uses Maven to drive its build.
However, no Maven artifacts nor SNAPSHOTs are published in the Apache Maven 
repository or Maven central.
It would be useful to have Apache Giraph artifacts and SNAPSHOTs published and 
enable people to use Giraph without recompiling themselves.


 Publish SNAPSHOTs and released artifacts in the Maven repository
 

 Key: GIRAPH-180
 URL: https://issues.apache.org/jira/browse/GIRAPH-180
 Project: Giraph
  Issue Type: Improvement
  Components: build
Affects Versions: 0.1.0
Reporter: Paolo Castagna
Priority: Minor
   Original Estimate: 4h
  Remaining Estimate: 4h





[jira] [Updated] (GIRAPH-179) BspServiceMaster's PathFilter can be simplified

2012-04-10 Thread Devaraj K (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated GIRAPH-179:
-

Attachment: GIRAPH-179.patch

 BspServiceMaster's PathFilter can be simplified
 ---

 Key: GIRAPH-179
 URL: https://issues.apache.org/jira/browse/GIRAPH-179
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Jakob Homan
Priority: Trivial
  Labels: newbie
 Attachments: GIRAPH-179.patch


 {code}  /**
    * Only get the finalized checkpoint files
    */
   public static class FinalizedCheckpointPathFilter implements PathFilter {
     @Override
     public boolean accept(Path path) {
       if (path.getName().endsWith(
           BspService.CHECKPOINT_FINALIZED_POSTFIX)) {
         return true;
       }
       return false;
     }
   }{code}
 we can simplify this, eliminating the if statement and just returning the 
 result of {{endsWith()}}
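
A minimal sketch of the simplified form. To stay self-contained it takes the file name as a String rather than a Hadoop Path, and the suffix value is a made-up placeholder; the real code would keep PathFilter, path.getName(), and BspService.CHECKPOINT_FINALIZED_POSTFIX.

```java
final class FinalizedCheckpointFilterSketch {
  // Hypothetical value; stands in for BspService.CHECKPOINT_FINALIZED_POSTFIX.
  static final String CHECKPOINT_FINALIZED_POSTFIX = ".finalized";

  // Stand-in for accept(Path): returns the endsWith() result directly,
  // with no if statement.
  static boolean accept(String fileName) {
    return fileName.endsWith(CHECKPOINT_FINALIZED_POSTFIX);
  }

  public static void main(String[] args) {
    System.out.println(accept("3.finalized"));
    System.out.println(accept("3.vertices"));
  }
}
```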





[jira] [Updated] (GIRAPH-181) Add Hadoop 1.0 profile to pom.xml

2012-04-10 Thread Eugene Koontz (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated GIRAPH-181:
-

Attachment: GIRAPH-181.patch

 Add Hadoop 1.0 profile to pom.xml
 -

 Key: GIRAPH-181
 URL: https://issues.apache.org/jira/browse/GIRAPH-181
 Project: Giraph
  Issue Type: Improvement
  Components: build
Affects Versions: 0.2.0
Reporter: Eugene Koontz
Assignee: Eugene Koontz
 Fix For: 0.2.0

 Attachments: GIRAPH-181.patch


 Hadoop 1.0.x is now considered the current stable version of Hadoop, 
 according to http://hadoop.apache.org/common/releases.html#Download .
 This JIRA is to add support within Giraph's maven profile for the 1.0.x 
 Hadoop release. 





[jira] [Updated] (GIRAPH-181) Add Hadoop 1.0 profile to pom.xml

2012-04-10 Thread Eugene Koontz (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated GIRAPH-181:
-

Attachment: GIRAPH-181.patch

Add support for Hadoop 1.0.2 to README. Thanks for the reminder, Avery.

Also added some whitespace formatting for consistency.


 Add Hadoop 1.0 profile to pom.xml
 -

 Key: GIRAPH-181
 URL: https://issues.apache.org/jira/browse/GIRAPH-181
 Project: Giraph
  Issue Type: Improvement
  Components: build
Affects Versions: 0.2.0
Reporter: Eugene Koontz
Assignee: Eugene Koontz
 Fix For: 0.2.0

 Attachments: GIRAPH-181.patch, GIRAPH-181.patch


 Hadoop 1.0.x is now considered the current stable version of Hadoop, 
 according to http://hadoop.apache.org/common/releases.html#Download .
 This JIRA is to add support within Giraph's maven profile for the 1.0.x 
 Hadoop release. 





[jira] [Updated] (GIRAPH-182) Provide SequenceFileVertexOutputFormat as an available OutputFormat

2012-04-10 Thread Pradeep Gollakota (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Gollakota updated GIRAPH-182:
-

Attachment: GIRAPH-182-1.patch

Implemented an abstract SequenceFileVertexOutputFormat. Provided an example 
implementation.

 Provide SequenceFileVertexOutputFormat as an available OutputFormat
 ---

 Key: GIRAPH-182
 URL: https://issues.apache.org/jira/browse/GIRAPH-182
 Project: Giraph
  Issue Type: New Feature
  Components: lib
Reporter: Pradeep Gollakota
Assignee: Pradeep Gollakota
Priority: Minor
 Attachments: GIRAPH-182-1.patch


 SequenceFiles are heavily used in Hadoop. We should provide 
 SequenceFileVertexOutputFormat. Since SequenceFileVertexInputFormat is 
 already provided, it makes sense to also provide a mirroring OutputFormat.





[jira] [Updated] (GIRAPH-168) Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than HADOOP_FACEBOOK) and remove usage of HADOOP

2012-04-09 Thread Eugene Koontz (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated GIRAPH-168:
-

Attachment: GIRAPH-168.patch

-removes unneeded org.apache.hadoop.giraph.zkJar from Facebook profile
-additional README content regarding maven profile usage

{code}
mvn -Phadoop_non_secure clean verify
mvn -Phadoop_facebook -Dhadoop.jar.path=/Users/ekoontz/hadoop-20/build/hadoop-0.20.1-dev-core.jar clean verify
mvn -Phadoop_0.20.203 clean verify
mvn clean verify
mvn -Phadoop_0.23 clean verify
mvn -Phadoop_trunk clean verify
{code}
succeeds.

 Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than 
 HADOOP_FACEBOOK) and remove usage of HADOOP
 -

 Key: GIRAPH-168
 URL: https://issues.apache.org/jira/browse/GIRAPH-168
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Eugene Koontz
Assignee: Eugene Koontz
 Attachments: GIRAPH-168.patch, GIRAPH-168.patch, GIRAPH-168.patch, 
 GIRAPH-168.patch, GIRAPH-168.patch, GIRAPH-168.patch


 This JIRA relates to the mail thread here: 
 http://mail-archives.apache.org/mod_mbox/incubator-giraph-dev/201203.mbox/browser
 Currently we check for the munge flags HADOOP, HADOOP_FACEBOOK and 
 HADOOP_NON_SECURE when using munge in a few places. Hopefully we can 
 eliminate usage of munge in the future, but until then, we can mitigate the 
 complexity by consolidating the number of flags checked. This JIRA renames 
 HADOOP_FACEBOOK to HADOOP_SECURE, and removes usages of HADOOP, to handle the 
 same conditional compilation requirements. It also makes it easier to add 
 more maven profiles so that we can easily increase our hadoop version 
 coverage.
 This patch modifies the existing hadoop_facebook profile to use the new 
 HADOOP_SECURE munge flag, rather than HADOOP_FACEBOOK.
 It also adds a new hadoop maven profile, hadoop_trunk, which also sets 
 HADOOP_SECURE. 
 Finally, it adds a default profile, hadoop_0.20.203. This is needed so that 
 we can specify its dependencies separately from hadoop_trunk, because the 
 hadoop dependencies have changed between trunk and 0.205.0 - the former 
 requires hadoop-common, hadoop-mapreduce-client-core, and 
 hadoop-mapreduce-client-common, whereas the latter requires hadoop-core. 
 With this patch, the following passes:
 {code}
 mvn clean verify
 mvn -Phadoop_trunk clean verify
 mvn -Phadoop_0.20.203 clean verify
 {code}
 Current problems: 
 * I left in place the usage of HADOOP_NON_SECURE, but note that the profile 
 that uses this is hadoop_non_secure, which fails to compile on trunk: 
 https://issues.apache.org/jira/browse/GIRAPH-167 .
 * I couldn't get -Phadoop_facebook to work; does this work outside of 
 Facebook?





[jira] [Updated] (GIRAPH-171) total time in MasterThread.run() is calculated incorrectly

2012-04-05 Thread Eugene Koontz (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated GIRAPH-171:
-

Attachment: GIRAPH-171.patch

 total time in MasterThread.run() is calculated incorrectly
 --

 Key: GIRAPH-171
 URL: https://issues.apache.org/jira/browse/GIRAPH-171
 Project: Giraph
  Issue Type: Bug
Reporter: Eugene Koontz
Assignee: Eugene Koontz
 Attachments: GIRAPH-171.patch


 While running PageRankBenchmark, I was seeing in the output:
 {{graph.MasterThread(172): total: Took 1.3336739262910001E9 seconds.}}
 This was because currently, in {{MasterThread.run()}}, we have:
 {code}
 LOG.info("total: Took " +
     ((System.currentTimeMillis() / 1000.0d) -
     setupSecs) + " seconds.");
 {code}
 but it should be:
 {code}
 LOG.info("total: Took " +
     ((System.currentTimeMillis() - startMillis) /
     1000.0d) + " seconds.");
 {code}
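
A small sketch of why the first form logs a value near 1.33E9. It assumes that setupSecs holds a short duration rather than a start timestamp, which the logged number suggests but the report does not state. The class and method names, and the sample timestamps, are made up for illustration.

```java
final class ElapsedTimeSketch {
  // Form from the report: an absolute timestamp in seconds minus a duration,
  // so the result stays close to seconds-since-epoch.
  static double buggyTotalSecs(long nowMillis, double setupSecs) {
    return (nowMillis / 1000.0d) - setupSecs;
  }

  // Proposed form: difference of two timestamps, then converted to seconds.
  static double fixedTotalSecs(long nowMillis, long startMillis) {
    return (nowMillis - startMillis) / 1000.0d;
  }

  public static void main(String[] args) {
    long startMillis = 1333673920000L;      // sample start time (April 2012)
    long nowMillis = startMillis + 6291L;   // a little over six seconds later
    System.out.println("buggy: " + buggyTotalSecs(nowMillis, 0.5d));
    System.out.println("fixed: " + fixedTotalSecs(nowMillis, startMillis));
  }
}
```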





[jira] [Updated] (GIRAPH-168) Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than HADOOP_FACEBOOK) and remove usage of HADOOP

2012-04-03 Thread Eugene Koontz (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated GIRAPH-168:
-

Attachment: GIRAPH-168.patch

Latest patch flips the set of munge directives from {HADOOP_NEWRPC, 
HADOOP_SECURE} to {HADOOP_OLDRPC,HADOOP_NON_SECURE}. HADOOP_NON_SECURE is a 
flag used currently in trunk, so this is a return back to the current trunk 
state.

Making old-RPC-signature and non-secure be the exceptional cases seems to me 
better because if we remove older Hadoop versions, we'll have also removed the 
need for having any munge directives.

Please see the flag/profile matrix for this patch below:

||profile||HADOOP_OLDRPC||HADOOP_NON_SECURE||
|hadoop_non_secure|x|x|
|hadoop_0.20.203|x||
|hadoop_0.23| | |
|hadoop_trunk| | |
|hadoop_facebook| | |


 Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than 
 HADOOP_FACEBOOK) and remove usage of HADOOP
 -

 Key: GIRAPH-168
 URL: https://issues.apache.org/jira/browse/GIRAPH-168
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Eugene Koontz
Assignee: Eugene Koontz
 Attachments: GIRAPH-168.patch, GIRAPH-168.patch, GIRAPH-168.patch






[jira] [Updated] (GIRAPH-85) Simplify return expression in RPCCommunications::getRPCProxy

2012-03-27 Thread Eli Reisman (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-85:
--

Attachment: GIRAPH-85-3.patch

This adds the @SuppressWarnings("unchecked") annotation to several methods that 
seem to need it for mvn verify to run successfully. It also simplifies one more 
spot in RPCCommunications.java where variables are used to temporarily hold a 
return value, but nothing is done with that value before returning it. This 
brings the grand total to 3 places where this change was made.

I would like to throw the idea out there that assigning to the proxy and 
other variables for a moment DOES have a clarity benefit that I would hate to 
prune out of the codebase just to help me practice uploading patches, which I 
have done on GIRAPH-87 and GIRAPH-157. If someone else wants to take a crack at 
this or if you guys just want to leave it the way it already is to forego this 
extra practice, I will not be upset!

If not, I think this patch will work.


 Simplify return expression in RPCCommunications::getRPCProxy
 

 Key: GIRAPH-85
 URL: https://issues.apache.org/jira/browse/GIRAPH-85
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Jakob Homan
  Labels: newbie
 Fix For: 0.2.0

 Attachments: GIRAPH-85-3.patch, GIRAPH-85.patch, GIRAPH-85.patch


 Twice in RPCCommunications::getRPCProxy a local variable, proxy, is created 
 and immediately returned.  We can simplify this to just return the value.
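
The change being discussed looks roughly like this. createProxy is a hypothetical stand-in for whatever call produces the proxy in RPCCommunications; only the shape of the return statement matters here.

```java
final class ReturnStyleSketch {
  // Hypothetical stand-in for the real proxy lookup.
  static String createProxy(String host) {
    return "proxy://" + host;
  }

  // Current shape: assign to a local variable, then return it unchanged.
  static String getProxyVerbose(String host) {
    String proxy = createProxy(host);
    return proxy;
  }

  // Simplified shape: return the value directly.
  static String getProxyDirect(String host) {
    return createProxy(host);
  }

  public static void main(String[] args) {
    System.out.println(getProxyVerbose("worker1").equals(getProxyDirect("worker1")));
  }
}
```

The two methods behave identically; the named local variable only serves as documentation, which is the trade-off raised in the comment above.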





[jira] [Updated] (GIRAPH-157) Vertex to perform graph coloring on simple, connected, undirected graphs and related test.

2012-03-23 Thread Eli Reisman (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-157:
---

Attachment: GIRAPH-157-2.patch

This is an update to fix the initialization issue that IntIntNullIntVertex had 
(see GIRAPH-161) and that therefore my variation IntIntNullTextVertex carried 
with regard to possible null initialization of edges and messages. See 
GIRAPH-161 for details. I'm still looking for larger undirected, connected, 
simple graphs in line input format like vertex id outboundEdge 
[outboundEdge...]END_OF_LINE that we know the correct chromatic number of to 
test this thing on larger input graphs. So far, every graph I test it with is 
given a correct minimal coloring. Let's break this thing, anyone?

 Vertex to perform graph coloring on simple, connected, undirected graphs and 
 related test.
 --

 Key: GIRAPH-157
 URL: https://issues.apache.org/jira/browse/GIRAPH-157
 Project: Giraph
  Issue Type: Test
  Components: examples, test
Affects Versions: 0.2.0
Reporter: Eli Reisman
Assignee: Eli Reisman
Priority: Trivial
  Labels: newbie
 Attachments: GIRAPH-157-2.patch, GIRAPH-157.patch


 Hi. I am attempting to learn the Hadoop and Giraph codebases and wanted to 
 write a simple client application for Giraph to help me learn the ins and 
 outs of it. This is a simple unit test and vertex modeled after the 
 ConnectedComponentsVertex and related test. The vertex test runs whenever you 
 run the mvn test or mvn verify suite of tests. When finished processing, 
 each vertex will have an integer value that is its color.
 This is a pretty simple implementation, and although I have tested it on a 
 number of small graphs of varied trickiness and it seems to rapidly arrive at 
 a minimal coloring, it's hard (for me at least) to guess which possible 
 coloring it will arrive at and I have no idea how it will do on really big 
 graphs yet without finding some more pre-colored larger test graphs to try it 
 on. Ideas anyone?
 Anyway, it was fun to put this together, and I'd be happy to improve it or 
 receive some help or advice to further the cause. Thanks again, I am hoping 
 this will be the first of many (hopefully more useful) contributions!
 Eli





[jira] [Updated] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-03-22 Thread Brian Femiano (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Femiano updated GIRAPH-153:
-

Attachment: (was: AccumuloVertexOutputFormat.java)

 HBase/Accumulo Input and Output formats
 ---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano





[jira] [Updated] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-03-22 Thread Brian Femiano (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Femiano updated GIRAPH-153:
-

Attachment: (was: AccumuloRootMarkerOutputFormat.java)

 HBase/Accumulo Input and Output formats
 ---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano

 Four abstract classes that wrap their respective delegate input/output 
 formats for easy hooks into vertex input format subclasses. I've included 
 some sample programs that show two very simple graph algorithms. I have a 
 graph generator that builds out a very simple directed structure, starting 
 with a few 'root' nodes. Root nodes are defined as nodes which are not 
 listed as a child anywhere in the graph. 
 Algorithm 1) AccumuloRootMarker.java -- Accumulo as read/write source. 
 Every vertex starts out assuming it is a root. At superstep 0, each vertex 
 sends a message down to each child as a non-root notification. After 
 superstep 1, only root nodes will have never been messaged. 
 Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by 
 bundling the notification logic followed by root node propagation. Once 
 we've marked the appropriate nodes as roots, tell every child which roots it 
 can be traced back to via one or more spanning trees. This will take N + 2 
 supersteps, where N is the maximum number of hops from any root to any leaf 
 and the extra 2 supersteps cover the initial root flagging. 
 I've included all relevant code plus DistributedCacheHelper.java for 
 recursive cache file and archive searches. It is more Hadoop-centric than 
 Giraph-centric, but these jobs use it, so I figured why not commit it here. 
 These have been tested through the local JobRunner, pseudo-distributed on 
 the aforementioned hardware, and fully distributed on EC2. More details in 
 the comments.
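
 For readers without the attachments, the two algorithms described above can
 be sketched roughly as follows. This is a plain-Java simulation, not the
 attached AccumuloRootMarker or TableRootMarker code and not the Giraph
 vertex API. The class name RootMarkerSketch, both method names, and the
 adjacency-map representation are assumptions made for illustration. It
 also assumes every vertex, including leaves, appears as a key in the map.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Plain-Java simulation of the root-marking idea; hypothetical, not Giraph code. */
class RootMarkerSketch {

  /** Algorithm 1: a vertex that is never messaged by a parent is a root. */
  static Set<Long> findRoots(Map<Long, List<Long>> children) {
    Set<Long> messaged = new HashSet<>();
    for (List<Long> kids : children.values()) {
      messaged.addAll(kids);               // superstep 0: non-root notification
    }
    Set<Long> roots = new TreeSet<>(children.keySet());
    roots.removeAll(messaged);             // superstep 1: never messaged, so a root
    return roots;
  }

  /** Algorithm 2: push root ids downward until no vertex learns a new one. */
  static Map<Long, Set<Long>> propagateRoots(Map<Long, List<Long>> children) {
    Map<Long, Set<Long>> tracedBackTo = new HashMap<>();
    for (Long v : children.keySet()) {
      tracedBackTo.put(v, new TreeSet<>());
    }
    // Each root starts by announcing its own id to its children.
    Map<Long, Set<Long>> outgoing = new HashMap<>();
    for (Long root : findRoots(children)) {
      outgoing.put(root, new TreeSet<>(Collections.singleton(root)));
    }
    while (!outgoing.isEmpty()) {          // one pass stands in for one superstep
      Map<Long, Set<Long>> next = new HashMap<>();
      for (Map.Entry<Long, Set<Long>> e : outgoing.entrySet()) {
        for (Long child : children.get(e.getKey())) {
          for (Long root : e.getValue()) {
            // Forward only root ids the child has not seen before.
            if (tracedBackTo.get(child).add(root)) {
              next.computeIfAbsent(child, k -> new TreeSet<>()).add(root);
            }
          }
        }
      }
      outgoing = next;
    }
    return tracedBackTo;
  }

  public static void main(String[] args) {
    Map<Long, List<Long>> graph = new HashMap<>();
    graph.put(1L, Arrays.asList(2L, 3L));
    graph.put(4L, Arrays.asList(3L));
    graph.put(2L, Arrays.asList(5L));
    graph.put(3L, Arrays.asList(5L));
    graph.put(5L, Collections.<Long>emptyList());
    System.out.println("roots: " + findRoots(graph));
    System.out.println("traced back to: " + propagateRoots(graph));
  }
}
```

 In this sketch a root does not record itself in its own set; whether the
 attached jobs do so is not stated in the description.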





[jira] [Updated] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-03-22 Thread Brian Femiano (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Femiano updated GIRAPH-153:
-

Attachment: (was: AccumuloVertexInputFormat.java)





[jira] [Updated] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-03-22 Thread Brian Femiano (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Femiano updated GIRAPH-153:
-

Attachment: (was: ComputeIsRoot.java)





[jira] [Updated] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-03-22 Thread Brian Femiano (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Femiano updated GIRAPH-153:
-

Attachment: (was: TableRootMarkerOutputFormat.java)





[jira] [Updated] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-03-22 Thread Brian Femiano (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Femiano updated GIRAPH-153:
-

Attachment: (was: HBaseVertexOutputFormat.java)





[jira] [Updated] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-03-22 Thread Brian Femiano (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Femiano updated GIRAPH-153:
-

Attachment: (was: IdentifyAndMarkRoots.java)





[jira] [Updated] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-03-22 Thread Brian Femiano (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Femiano updated GIRAPH-153:
-

Attachment: (was: SetLongWritable.java)





[jira] [Updated] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-03-22 Thread Brian Femiano (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Femiano updated GIRAPH-153:
-

Attachment: (was: TableRootMarker.java)





[jira] [Updated] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-03-22 Thread Brian Femiano (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Femiano updated GIRAPH-153:
-

Attachment: (was: SetTextWritable.java)





[jira] [Updated] (GIRAPH-167) mvn -Phadoop_non_secure clean verify fails

2012-03-22 Thread Eugene Koontz (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated GIRAPH-167:
-

Attachment: GIRAPH-167.patch

 mvn -Phadoop_non_secure clean verify fails
 --

 Key: GIRAPH-167
 URL: https://issues.apache.org/jira/browse/GIRAPH-167
 Project: Giraph
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Eugene Koontz
Assignee: Eugene Koontz
  Labels: build, hadoop
 Attachments: GIRAPH-167.patch


 The {{hadoop_non_secure}} profile, which uses hadoop 0.20.2, is failing to 
 compile:
 {code}
 [ERROR] COMPILATION ERROR : 
 [INFO] -
 [ERROR] /Users/ekoontz/giraph/target/munged/main/org/apache/giraph/comm/RPCCommunications.java:[184,48] cannot find symbol
 symbol  : variable versionID
 location: class org.apache.giraph.comm.RPCCommunications<I,V,E,M>
 [INFO] 1 error
 {code}





[jira] [Updated] (GIRAPH-162) BspCase.setup() should catch FileNotFoundException thrown from org.apache.hadoop.fs.FileSystem.listStatus()

2012-03-21 Thread Eugene Koontz (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated GIRAPH-162:
-

Description: In hadoop trunk, org.apache.hadoop.fs.FileSystem.listStatus() 
is declared to throw both FileNotFoundException and IOException. The former 
(FileNotFoundException) is currently not caught when BspCase.setup() looks for 
the GiraphJob.ZOOKEEPER_MANAGER_DIR_DEFAULT directory in order to delete it. 
The listStatus() call throws FileNotFoundException if this directory does not exist 
and causes several tests to fail when using Hadoop trunk. This exception should 
be caught and ignored during setup(), since it's not an error for this 
directory not to exist.  (was: In hadoop trunk, 
org.apache.hadoop.fs.FileSystem.listStatus() is declared to throws both 
FileNotFoundException and IOException. The former (FileNotFoundException) is 
currently not caught when BspCase.setup() looks for the 
GiraphJob.ZOOKEEPER_MANAGER_DIR_DEFAULT directory in order to delete it. The 
listStatus() call throws FileNotFoundException if this directory does not exist and 
causes several tests to fail when using Hadoop trunk. This exception should be 
caught and ignored during setup(), since it's not an error for this directory 
not to exist.)

 BspCase.setup() should catch FileNotFoundException thrown from 
 org.apache.hadoop.fs.FileSystem.listStatus()
 ---

 Key: GIRAPH-162
 URL: https://issues.apache.org/jira/browse/GIRAPH-162
 Project: Giraph
  Issue Type: Bug
  Components: test
Affects Versions: 0.2.0
Reporter: Eugene Koontz
 Fix For: 0.2.0

 Attachments: GIRAPH-162.patch


 In hadoop trunk, org.apache.hadoop.fs.FileSystem.listStatus() is declared to 
 throw both FileNotFoundException and IOException. The former 
 (FileNotFoundException) is currently not caught when BspCase.setup() looks 
 for the GiraphJob.ZOOKEEPER_MANAGER_DIR_DEFAULT directory in order to delete 
 it. The listStatus() call throws FileNotFoundException if this directory does not 
 exist and causes several tests to fail when using Hadoop trunk. This 
 exception should be caught and ignored during setup(), since it's not an 
 error for this directory not to exist.
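
 The fix described above amounts to treating a missing directory as "nothing
 to clean up". A minimal sketch of that pattern follows. It does not use the
 Hadoop API or the attached patch: DirLister is a hypothetical stand-in for
 org.apache.hadoop.fs.FileSystem, and SetupCleanupSketch and cleanDirectory
 are illustrative names only.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

/** Sketch only: DirLister is a hypothetical stand-in for Hadoop's FileSystem. */
class SetupCleanupSketch {

  /** Hypothetical interface; listStatus may throw FileNotFoundException. */
  interface DirLister {
    String[] listStatus(String dir) throws IOException;

    void delete(String path) throws IOException;
  }

  /**
   * Deletes every entry found under dir and returns how many were deleted.
   * A directory that does not exist is not an error and yields 0.
   */
  static int cleanDirectory(DirLister fs, String dir) throws IOException {
    String[] entries;
    try {
      entries = fs.listStatus(dir);
    } catch (FileNotFoundException e) {
      return 0;  // directory absent: nothing to clean up
    }
    for (String entry : entries) {
      fs.delete(entry);
    }
    return entries.length;
  }
}
```

 Only FileNotFoundException is swallowed here. Any other IOException still
 propagates to the caller, so genuine I/O failures during setup remain
 visible.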





[jira] [Updated] (GIRAPH-163) bin/giraph script overwrites CLASSPATH if dev environment detected (this also removes USER_JAR from CLASSPATH)

2012-03-21 Thread Benjamin Heitmann (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Heitmann updated GIRAPH-163:
-

Attachment: GIRAPH-163.patch

This is a small patch to fix the described problem.

 bin/giraph script overwrites CLASSPATH if dev environment detected (this 
 also removes USER_JAR from CLASSPATH)
 

 Key: GIRAPH-163
 URL: https://issues.apache.org/jira/browse/GIRAPH-163
 Project: Giraph
  Issue Type: Improvement
  Components: conf and scripts
Affects Versions: 0.1.0, 0.2.0
 Environment: current trunk of giraph, after running mvn compile (as 
 advised in the quick start guide). 
 Also Hadoop 1.0.1 was used. 
Reporter: Benjamin Heitmann
  Labels: newbie
 Attachments: GIRAPH-163.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 If no ./lib dir is present, then the bin/giraph script assumes it is running 
 in a dev environment. 
 This chooses an execution path through the bin/giraph script, which 
 overwrites the CLASSPATH variable instead of appending to it. 
 Incidentally, this also removes the name of the jar submitted by the user, 
 which got appended to CLASSPATH earlier in the script. 





[jira] [Updated] (GIRAPH-164) fix 5 Line is longer than 80 characters style errors in GiraphRunner

2012-03-21 Thread Eugene Koontz (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated GIRAPH-164:
-

Attachment: GIRAPH-164.patch

 fix 5 Line is longer than 80 characters style errors in GiraphRunner
 --

 Key: GIRAPH-164
 URL: https://issues.apache.org/jira/browse/GIRAPH-164
 Project: Giraph
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Eugene Koontz
Priority: Trivial
 Fix For: 0.2.0

 Attachments: GIRAPH-164.patch


 {code}
 <file name="/Users/ekoontz/giraph/src/main/java/org/apache/giraph/GiraphRunner.java">
   <error line="155" severity="error" message="Line is longer than 80 characters." source="com.puppycrawl.tools.checkstyle.checks.sizes.LineLengthCheck"/>
   <error line="156" severity="error" message="Line is longer than 80 characters." source="com.puppycrawl.tools.checkstyle.checks.sizes.LineLengthCheck"/>
   <error line="158" severity="error" message="Line is longer than 80 characters." source="com.puppycrawl.tools.checkstyle.checks.sizes.LineLengthCheck"/>
   <error line="161" severity="error" message="Line is longer than 80 characters." source="com.puppycrawl.tools.checkstyle.checks.sizes.LineLengthCheck"/>
 </file>
 {code}





[jira] [Updated] (GIRAPH-165) checkstyle error: 'conf'hides a field' on line 154 of GraphRunner

2012-03-21 Thread Eugene Koontz (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated GIRAPH-165:
-

Attachment: GIRAPH-165.patch

 checkstyle error: 'conf'hides a field' on line 154 of GraphRunner
 -

 Key: GIRAPH-165
 URL: https://issues.apache.org/jira/browse/GIRAPH-165
 Project: Giraph
  Issue Type: Bug
Reporter: Eugene Koontz
Priority: Minor
 Attachments: GIRAPH-165.patch


 full checkstyle error is 
 {code}
 <file name="/Users/ekoontz/giraph/src/main/java/org/apache/giraph/GiraphRunner.java">
   <error line="154" column="21" severity="error" message="&apos;conf&apos; hides a field." source="com.puppycrawl.tools.checkstyle.checks.coding.HiddenFieldCheck"/>
 </file>
 {code}
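
 For context, Checkstyle's HiddenField check fires when a local variable or
 parameter has the same name as a field of the enclosing class. The sketch
 below shows the general pattern and one common remedy, renaming. It is not
 the GiraphRunner code or the attached patch; Settings, Before, After, and
 jobConf are hypothetical names chosen for illustration.

```java
/** Sketch only: all names here are hypothetical, not taken from GiraphRunner. */
class HiddenFieldSketch {

  /** Minimal stand-in for a configuration object. */
  static class Settings {
    final String name;

    Settings(String name) {
      this.name = name;
    }
  }

  /** The parameter 'conf' hides the field 'conf'; the check would flag this. */
  static class Before {
    private final Settings conf = new Settings("field");

    String describe(Settings conf) {
      // A bare 'conf' means the parameter; the field needs 'this.conf'.
      return conf.name + "/" + this.conf.name;
    }
  }

  /** Renaming the parameter removes the shadowing without changing behavior. */
  static class After {
    private final Settings conf = new Settings("field");

    String describe(Settings jobConf) {
      return jobConf.name + "/" + conf.name;
    }
  }
}
```

 Both variants return the same string for the same argument. The rename only
 removes the ambiguity that the check is designed to catch.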





[jira] [Updated] (GIRAPH-165) checkstyle error: 'conf' hides a field on line 154 of GraphRunner

2012-03-21 Thread Eugene Koontz (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated GIRAPH-165:
-

Summary: checkstyle error:  'conf' hides a field on line 154 of 
GraphRunner  (was: checkstyle error: 'conf'hides a field' on line 154 of 
GraphRunner)

 checkstyle error:  'conf' hides a field on line 154 of GraphRunner
 





[jira] [Updated] (GIRAPH-166) add '*.patch' to list of files that Apache Rat ignores

2012-03-21 Thread Eugene Koontz (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated GIRAPH-166:
-

Attachment: (was: pom.xml)

 add '*.patch' to list of files that Apache Rat ignores
 --

 Key: GIRAPH-166
 URL: https://issues.apache.org/jira/browse/GIRAPH-166
 Project: Giraph
  Issue Type: Improvement
Reporter: Eugene Koontz
Priority: Trivial
 Attachments: GIRAPH-166.patch


 Apache Rat will complain about too many files without licenses if it finds 
 any *.patch files in your working directory. Rat should ignore these since 
 they are temp files that aren't included in the distribution.





[jira] [Updated] (GIRAPH-166) add '*.patch' to list of files that Apache Rat ignores

2012-03-21 Thread Eugene Koontz (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated GIRAPH-166:
-

Attachment: GIRAPH-166.patch

 add '*.patch' to list of files that Apache Rat ignores
 --

 Key: GIRAPH-166
 URL: https://issues.apache.org/jira/browse/GIRAPH-166
 Project: Giraph
  Issue Type: Improvement
Reporter: Eugene Koontz
Priority: Trivial
 Attachments: GIRAPH-166.patch


 Apache Rat will complain about too many files without licenses if it finds 
 any *.patch files in your working directory. Rat should ignore these since 
 they are temp files that aren't included in the distribution.





[jira] [Updated] (GIRAPH-167) mvn -Phadoop_non_secure clean verify fails

2012-03-21 Thread Eugene Koontz (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated GIRAPH-167:
-

Description: 
The {{hadoop_non_secure}} profile, which uses hadoop 0.20.2, is failing to 
compile:

{code}
[ERROR] COMPILATION ERROR : 
[INFO] -
[ERROR] 
/Users/ekoontz/giraph/target/munged/main/org/apache/giraph/comm/RPCCommunications.java:[184,48]
 cannot find symbol
symbol  : variable versionID
location: class org.apache.giraph.comm.RPCCommunications<I,V,E,M>
[INFO] 1 error
{code}


  was:
The {{hadoop_non_secure}} profile, which uses hadoop 0.20.2, is failing to 
compile:

{code}
[ERROR] COMPILATION ERROR : 
[INFO] -
[ERROR] 
/Users/ekoontz/giraph/target/munged/main/org/apache/giraph/graph/partition/RangePartitionOwner.java:[26,27]
 package org.apache.hadoop.io does not exist
[ERROR] 
/Users/ekoontz/giraph/target/munged/main/org/apache/giraph/graph/partition/BasicPartitionOwner.java:[26,29]
 package org.apache.hadoop.conf does not exist
[ERROR] 
/Users/ekoontz/giraph/target/munged/main/org/apache/giraph/graph/partition/BasicPartitionOwner.java:[27,29]
 package org.apache.hadoop.conf does not exist
[ERROR] 
/Users/ekoontz/giraph/target/munged/main/org/apache/giraph/graph/partition/PartitionOwner.java:[22,27]
 package org.apache.hadoop.io does not exist
[ERROR] 
/Users/ekoontz/giraph/target/munged/main/org/apache/giraph/graph/partition/PartitionOwner.java:[27,40]
 cannot find symbol
symbol: class Writable
{code}

(more error messages follow)


 mvn -Phadoop_non_secure clean verify fails
 --

 Key: GIRAPH-167
 URL: https://issues.apache.org/jira/browse/GIRAPH-167
 Project: Giraph
  Issue Type: Bug
Reporter: Eugene Koontz

 The {{hadoop_non_secure}} profile, which uses hadoop 0.20.2, is failing to 
 compile:
 {code}
 [ERROR] COMPILATION ERROR : 
 [INFO] -
 [ERROR] 
 /Users/ekoontz/giraph/target/munged/main/org/apache/giraph/comm/RPCCommunications.java:[184,48]
  cannot find symbol
 symbol  : variable versionID
 location: class org.apache.giraph.comm.RPCCommunications<I,V,E,M>
 [INFO] 1 error
 {code}





[jira] [Updated] (GIRAPH-168) Simplify munge directive usage with new munge flag HADOOP_SECURE rather than HADOOP_FACEBOOK and HADOOP_NON_SECURE

2012-03-21 Thread Eugene Koontz (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated GIRAPH-168:
-

Attachment: GIRAPH-168.patch

 Simplify munge directive usage with new munge flag HADOOP_SECURE rather than 
 HADOOP_FACEBOOK and HADOOP_NON_SECURE
 --

 Key: GIRAPH-168
 URL: https://issues.apache.org/jira/browse/GIRAPH-168
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Eugene Koontz
Assignee: Eugene Koontz
 Attachments: GIRAPH-168.patch


 This JIRA relates to the mail thread here: 
 http://mail-archives.apache.org/mod_mbox/incubator-giraph-dev/201203.mbox/browser
 Currently we check for the munge flags HADOOP and HADOOP_FACEBOOK and 
 HADOOP_NON_SECURE when using munge in a few places. Hopefully we can 
 eliminate usage of munge in the future, but until then, we can mitigate the 
 complexity by consolidating the number of flags checked. This JIRA proposes a 
 single flag, HADOOP_SECURE, to handle the same conditional compilation 
 requirements. It also makes it easier to add more maven profiles so that we 
 can easily increase our hadoop version coverage.
 This patch modifies the existing hadoop_facebook profile to use the new 
 HADOOP_SECURE munge flag, rather than HADOOP_FACEBOOK.
 It also adds a new hadoop maven profile, hadoop_trunk, which also sets 
 HADOOP_SECURE. 
 Finally, it adds a default profile, hadoop_0.20.203. This is needed so that 
 we can specify its dependencies separately from hadoop_trunk, because the 
 hadoop dependencies have changed between trunk and 0.205.0 - the former 
 requires hadoop-common, hadoop-mapreduce-client-core, and 
 hadoop-mapreduce-client-common, whereas the latter requires hadoop-core. 
 With this patch, the following passes:
 {code}
 mvn clean verify
 mvn -Phadoop_trunk clean verify
 mvn -Phadoop_0.20.203 clean verify
 {code}
 Current problems: 
 * I left in place the usage of HADOOP_NON_SECURE, but note that the profile 
 that uses this is hadoop_non_secure, which fails to compile on trunk: 
 https://issues.apache.org/jira/browse/GIRAPH-167 .
 * I couldn't get -Phadoop_facebook to work; does this work outside of 
 Facebook?





[jira] [Updated] (GIRAPH-168) Simplify munge directive usage with new munge flag HADOOP_SECURE rather than HADOOP_FACEBOOK

2012-03-21 Thread Eugene Koontz (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated GIRAPH-168:
-

Summary: Simplify munge directive usage with new munge flag HADOOP_SECURE 
rather than HADOOP_FACEBOOK  (was: Simplify munge directive usage with new 
munge flag HADOOP_SECURE rather than HADOOP_FACEBOOK and HADOOP_NON_SECURE)

 Simplify munge directive usage with new munge flag HADOOP_SECURE rather than 
 HADOOP_FACEBOOK
 

 Key: GIRAPH-168
 URL: https://issues.apache.org/jira/browse/GIRAPH-168
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Eugene Koontz
Assignee: Eugene Koontz
 Attachments: GIRAPH-168.patch


 This JIRA relates to the mail thread here: 
 http://mail-archives.apache.org/mod_mbox/incubator-giraph-dev/201203.mbox/browser
 Currently we check for the munge flags HADOOP and HADOOP_FACEBOOK and 
 HADOOP_NON_SECURE when using munge in a few places. Hopefully we can 
 eliminate usage of munge in the future, but until then, we can mitigate the 
 complexity by consolidating the number of flags checked. This JIRA proposes a 
 single flag, HADOOP_SECURE, to handle the same conditional compilation 
 requirements. It also makes it easier to add more maven profiles so that we 
 can easily increase our hadoop version coverage.
 This patch modifies the existing hadoop_facebook profile to use the new 
 HADOOP_SECURE munge flag, rather than HADOOP_FACEBOOK.
 It also adds a new hadoop maven profile, hadoop_trunk, which also sets 
 HADOOP_SECURE. 
 Finally, it adds a default profile, hadoop_0.20.203. This is needed so that 
 we can specify its dependencies separately from hadoop_trunk, because the 
 hadoop dependencies have changed between trunk and 0.205.0 - the former 
 requires hadoop-common, hadoop-mapreduce-client-core, and 
 hadoop-mapreduce-client-common, whereas the latter requires hadoop-core. 
 With this patch, the following passes:
 {code}
 mvn clean verify
 mvn -Phadoop_trunk clean verify
 mvn -Phadoop_0.20.203 clean verify
 {code}
 Current problems: 
 * I left in place the usage of HADOOP_NON_SECURE, but note that the profile 
 that uses this is hadoop_non_secure, which fails to compile on trunk: 
 https://issues.apache.org/jira/browse/GIRAPH-167 .
 * I couldn't get -Phadoop_facebook to work; does this work outside of 
 Facebook?





[jira] [Updated] (GIRAPH-159) Case insensitive file/directory name matching will produce errors on M/R jar unpack.

2012-03-19 Thread Brian Femiano (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Femiano updated GIRAPH-159:
-

Attachment: GIRAPH-159.patch

 Case insensitive file/directory name matching will produce errors on M/R jar 
 unpack. 
 -

 Key: GIRAPH-159
 URL: https://issues.apache.org/jira/browse/GIRAPH-159
 Project: Giraph
  Issue Type: Bug
  Components: build
Affects Versions: 0.2.0
 Environment: OSX 10.6.8
Reporter: Brian Femiano
Priority: Minor
 Attachments: GIRAPH-159.patch


 This only seems to affect platforms where case-insensitive matching can 
 cause file/directory naming conflicts.
  
 I was able to reproduce this by running the pseudo-distributed unit tests on OSX.
 This has affected other projects: 
 https://issues.apache.org/jira/browse/MAHOUT-780
 I've been able to reproduce this on my local OSX install with the following 
 error:
 https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/a201218000e956d3/cc6eca3ef9f80ff8
 Since LICENSE.txt contains the same content as the file LICENSE, I propose we 
 exclude any LICENSE matches found in the unpacked dependency jars
 when the maven assembly phase hits 'jar-with-dependencies'. 
 I have a patch which moves the 'jar-with-dependencies' descriptor to an 
 external compile.xml file which has the proper excludes. This might also
 come in handy down the road should any additional tweaks be needed to the 
 compile phase. 





[jira] [Updated] (GIRAPH-159) Case insensitive file/directory name matching will produce errors on M/R jar unpack.

2012-03-19 Thread Brian Femiano (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Femiano updated GIRAPH-159:
-

Attachment: (was: GIRAPH-159.patch)

 Case insensitive file/directory name matching will produce errors on M/R jar 
 unpack. 
 -

 Key: GIRAPH-159
 URL: https://issues.apache.org/jira/browse/GIRAPH-159
 Project: Giraph
  Issue Type: Bug
  Components: build
Affects Versions: 0.2.0
 Environment: OSX 10.6.8
Reporter: Brian Femiano
Priority: Minor

 This only seems to affect platforms where case-insensitive matching can 
 cause file/directory naming conflicts.
  
 I was able to reproduce this by running the pseudo-distributed unit tests on OSX.
 This has affected other projects: 
 https://issues.apache.org/jira/browse/MAHOUT-780
 I've been able to reproduce this on my local OSX install with the following 
 error:
 https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/a201218000e956d3/cc6eca3ef9f80ff8
 Since LICENSE.txt contains the same content as the file LICENSE, I propose we 
 exclude any LICENSE matches found in the unpacked dependency jars
 when the maven assembly phase hits 'jar-with-dependencies'. 
 I have a patch which moves the 'jar-with-dependencies' descriptor to an 
 external compile.xml file which has the proper excludes. This might also
 come in handy down the road should any additional tweaks be needed to the 
 compile phase. 





[jira] [Updated] (GIRAPH-159) Case insensitive file/directory name matching will produce errors on M/R jar unpack.

2012-03-19 Thread Brian Femiano (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Femiano updated GIRAPH-159:
-

Attachment: compile.xml
GIRAPH-159.patch

 Case insensitive file/directory name matching will produce errors on M/R jar 
 unpack. 
 -

 Key: GIRAPH-159
 URL: https://issues.apache.org/jira/browse/GIRAPH-159
 Project: Giraph
  Issue Type: Bug
  Components: build
Affects Versions: 0.2.0
 Environment: OSX 10.6.8
Reporter: Brian Femiano
Priority: Minor
 Attachments: GIRAPH-159.patch, compile.xml


 This only seems to affect platforms where case-insensitive matching can 
 cause file/directory naming conflicts.
  
 I was able to reproduce this by running the pseudo-distributed unit tests on OSX.
 This has affected other projects: 
 https://issues.apache.org/jira/browse/MAHOUT-780
 I've been able to reproduce this on my local OSX install with the following 
 error:
 https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/a201218000e956d3/cc6eca3ef9f80ff8
 Since LICENSE.txt contains the same content as the file LICENSE, I propose we 
 exclude any LICENSE matches found in the unpacked dependency jars
 when the maven assembly phase hits 'jar-with-dependencies'. 
 I have a patch which moves the 'jar-with-dependencies' descriptor to an 
 external compile.xml file which has the proper excludes. This might also
 come in handy down the road should any additional tweaks be needed to the 
 compile phase. 





[jira] [Updated] (GIRAPH-156) Users should be able to set simple 'custom arguments' via org.apache.giraph.GiraphRunner

2012-03-18 Thread Sebastian Schelter (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-156:
--

Attachment: GIRAPH-156-2.patch

 Users should be able to set simple 'custom arguments' via 
 org.apache.giraph.GiraphRunner
 

 Key: GIRAPH-156
 URL: https://issues.apache.org/jira/browse/GIRAPH-156
 Project: Giraph
  Issue Type: Improvement
  Components: conf and scripts
Affects Versions: 0.1.0
Reporter: Sebastian Schelter
Assignee: Sebastian Schelter
 Fix For: 0.2.0

 Attachments: GIRAPH-156-1.patch, GIRAPH-156-2.patch, GIRAPH-156.patch


 Some vertices need custom arguments to run. The SimpleShortestPathsVertex for 
 example needs to know the source vertex for the computation which is saved in 
 the job's Configuration as _SimpleShortestPathsVertex.sourceId_. Users should 
 be able to apply such simple custom arguments via GiraphRunner. 
 I propose to add a new option _--customArguments_ where users can supply 
 arguments in the form _param1=value1,param2=value2_ for this.
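
 The _param1=value1,param2=value2_ form proposed above could be parsed roughly 
 as follows. This is only a sketch, not the code in the attached patches: the 
 class and method names (CustomArguments, parse) are hypothetical, and a real 
 GiraphRunner would presumably copy each pair into the job's Configuration 
 rather than into a plain map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; the names are illustrative and not taken from Giraph. */
final class CustomArguments {
  private CustomArguments() { }

  /** Parses "param1=value1,param2=value2" into an insertion-ordered map. */
  static Map<String, String> parse(String customArguments) {
    Map<String, String> result = new LinkedHashMap<>();
    if (customArguments == null || customArguments.trim().isEmpty()) {
      return result;
    }
    for (String pair : customArguments.split(",")) {
      // Split on the first '=' only, so values may themselves contain '='.
      int eq = pair.indexOf('=');
      if (eq <= 0) {
        throw new IllegalArgumentException(
            "Expected name=value but got '" + pair + "'");
      }
      result.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
    }
    return result;
  }

  public static void main(String[] args) {
    // The source id value 2 is an arbitrary example.
    System.out.println(parse("SimpleShortestPathsVertex.sourceId=2,param2=value2"));
  }
}
```

 One limitation of this format is that values cannot contain a comma, which 
 seems acceptable for the "simple" arguments the issue has in mind.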





[jira] [Updated] (GIRAPH-157) Vertex to perform graph coloring on simple, connected, undirected graphs and related test.

2012-03-17 Thread Eli Reisman (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-157:
---

Attachment: GIRAPH-157.patch

 Vertex to perform graph coloring on simple, connected, undirected graphs and 
 related test.
 --

 Key: GIRAPH-157
 URL: https://issues.apache.org/jira/browse/GIRAPH-157
 Project: Giraph
  Issue Type: Test
  Components: examples, test
Affects Versions: 0.2.0
Reporter: Eli Reisman
Assignee: Eli Reisman
Priority: Trivial
  Labels: newbie
 Attachments: GIRAPH-157.patch


 Hi. I am attempting to learn the Hadoop and Giraph codebases and wanted to 
 write a simple client application for Giraph to help me learn the ins and 
 outs of it. This is a simple unit test and vertex modeled after the 
 ConnectedComponentsVertex and related test. The vertex test runs whenever you 
 run the mvn test or mvn verify suite of tests. When finished processing, 
 each vertex will have an integer value that is its color.
 This is a pretty simple implementation, and although I have tested it on a 
 number of small graphs of varied trickiness and it seems to rapidly arrive at 
 a minimal coloring, it's hard (for me at least) to guess which possible 
 coloring it will arrive at and I have no idea how it will do on really big 
 graphs yet without finding some more pre-colored larger test graphs to try it 
 on. Ideas anyone?
 Anyway, it was fun to put this together, and I'd be happy to improve it or 
 receive some help or advice to further the cause. Thanks again, I am hoping 
 this will be the first of many (hopefully more useful) contributions!
 Eli





[jira] [Updated] (GIRAPH-158) Support YARN (next generation MapReduce)

2012-03-17 Thread Eugene Koontz (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated GIRAPH-158:
-

Attachment: GIRAPH-158.patch

This patch passes mvn verify with pom.xml modified to use 0.24.0-SNAPSHOT 
as hadoop.version.

 Support YARN (next generation MapReduce)
 

 Key: GIRAPH-158
 URL: https://issues.apache.org/jira/browse/GIRAPH-158
 Project: Giraph
  Issue Type: New Feature
Reporter: Eugene Koontz
 Attachments: GIRAPH-158.patch


 YARN is a re-architecture of the Hadoop MapReduce framework, described here:
 http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/YARN.html
 It would be good to offer support within Giraph for this framework. 





[jira] [Updated] (GIRAPH-156) Users should be able to set simple 'custom arguments' via org.apache.giraph.GiraphRunner

2012-03-16 Thread Sebastian Schelter (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-156:
--

Attachment: GIRAPH-156-1.patch

had a missing whitespace in the last patch :)

 Users should be able to set simple 'custom arguments' via 
 org.apache.giraph.GiraphRunner
 

 Key: GIRAPH-156
 URL: https://issues.apache.org/jira/browse/GIRAPH-156
 Project: Giraph
  Issue Type: Improvement
  Components: conf and scripts
Affects Versions: 0.1.0
Reporter: Sebastian Schelter
Assignee: Sebastian Schelter
 Attachments: GIRAPH-156-1.patch, GIRAPH-156.patch


 Some vertices need custom arguments to run. The SimpleShortestPathsVertex for 
 example needs to know the source vertex for the computation which is saved in 
 the job's Configuration as _SimpleShortestPathsVertex.sourceId_. Users should 
 be able to apply such simple custom arguments via GiraphRunner. 
 I propose to add a new option _--customArguments_ where users can supply 
 arguments in the form _param1=value1,param2=value2_ for this.





[jira] [Updated] (GIRAPH-154) Worker ports are not synched properly with its peers

2012-03-15 Thread Zhiwei Gu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiwei Gu updated GIRAPH-154:
-

Attachment: GIRAPH-154.patch

passed unit test and grid test.

 Worker ports are not synched properly with its peers
 

 Key: GIRAPH-154
 URL: https://issues.apache.org/jira/browse/GIRAPH-154
 Project: Giraph
  Issue Type: Bug
  Components: bsp
Affects Versions: 0.2.0
Reporter: Zhiwei Gu
Assignee: Zhiwei Gu
 Attachments: GIRAPH-154.patch


 When a worker tries multiple ports to set up the RPC server, the final port is 
 not synced with its peer workers properly, so peer workers send messages 
 to the default port.
 Here are some logs:
 
 Base port: 34900
 
 
 log for worker 161:
 
 IPC Server handler 98 on 36061: starting
 BasicRPCCommunications: Started RPC communication server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:36061 with 100 handlers and 199 
 flush threads on bind attempt 1
 IPC Server handler 99 on 36061: starting
 setup: Registering health of this worker...
 getJobState: Job state already exists 
 (/_hadoopBsp/job_201203130609_14838/_masterJobState)
 getApplicationAttempt: Node 
 /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir already exists!
 getApplicationAttempt: Node 
 /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir already exists!
 registerHealth: Created my health node for attempt=0, superstep=-1 with 
 /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/gsta32085.tan.ygrid.yahoo.com_161
  and workerInfo= Worker(hostname=gsta32085.tan.ygrid.yahoo.com, 
 MRpartition=161, port=35061)
 process: partitionAssignmentsReadyChanged (partitions are assigned)
 startSuperstep: Ready for computation on superstep -1 since worker selection 
 and vertex range assignments are done in 
 /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir/0/_superstepDir/-1/_partitionAssignments
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 0 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 1 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 2 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 3 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 4 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 5 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 6 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 7 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 8 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 9 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 10 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 11 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 12 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 13 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 14 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 15 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 16 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 17 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 18 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 19 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 20 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 21 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 22 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com
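
 The port numbers in the log above can be checked with a little arithmetic. 
 The sketch below is not Giraph code: the class and method names are 
 hypothetical, and the step of 1000 between bind attempts is an assumption 
 inferred only from the difference between the bound port (36061) and the 
 registered port (35061).

```java
/** Hypothetical sketch of the port arithmetic suggested by the log. */
final class WorkerPortSketch {
  private WorkerPortSketch() { }

  /** Port tried on the given bind attempt (attempt 0 is the default port). */
  static int portForAttempt(int basePort, int taskPartition,
                            int bindAttempt, int portIncrement) {
    return basePort + taskPartition + bindAttempt * portIncrement;
  }

  public static void main(String[] args) {
    int basePort = 34900;      // "Base port" from the log
    int taskPartition = 161;   // MRpartition from the log
    int portIncrement = 1000;  // ASSUMPTION: inferred from 36061 - 35061

    // Port written to the worker health node, which peers then connect to.
    int registered = portForAttempt(basePort, taskPartition, 0, portIncrement);
    // Port the RPC server actually bound to "on bind attempt 1".
    int bound = portForAttempt(basePort, taskPartition, 1, portIncrement);

    System.out.println("registered=" + registered + " bound=" + bound);
  }
}
```

 Under these assumptions the worker registers 35061 while listening on 36061, 
 which matches the repeated "Retrying connect" lines against port 35061.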

[jira] [Updated] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-03-07 Thread Brian Femiano (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Femiano updated GIRAPH-153:
-

 Labels:   (was: gir)
Description: 
Four abstract classes that wrap their respective delegate input/output formats 
for
easy hooks into vertex input format subclasses. I've included some sample 
programs that show two very simple graph
algorithms. I have a graph generator that builds out a very simple directed 
structure, starting with a few 'root' nodes.

Root nodes are defined as nodes which are not listed as a child anywhere in the 
graph. 

Algorithm 1) AccumuloRootMarker.java  -- Accumulo as read/write source. Every 
vertex starts thinking it's a root. At superstep 0, send a message down to each
child as a non-root notification. After superstep 1, only root nodes will have 
never been messaged. 

Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by 
bundling the notification logic followed by root node propagation. Once we've 
marked the appropriate nodes as roots, tell every child which roots it can be 
traced back to via one or more spanning trees. This will take N + 2 supersteps 
where N is the maximum number of hops from any root to any leaf, plus 2 
supersteps for the initial root flagging. 

I've included all relevant code plus DistributedCacheHelper.java for recursive 
cache file and archive searches. It is more hadoop centric than giraph, but 
these jobs use it so I figured why not commit here. 

These have been tested through local JobRunner, pseudo-distributed on the 
aforementioned hardware, and full distributed on EC2. More details in the 
comments.
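
The root-flagging idea of Algorithm 1 can be illustrated without Giraph or 
Accumulo at all. The following is a plain, single-process simulation of the 
two supersteps, not the attached AccumuloRootMarker.java; the class name, 
method name and sample graph are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical single-process simulation of the root-marking supersteps. */
final class RootMarkerSketch {
  private RootMarkerSketch() { }

  /** Takes a parent-to-children adjacency map and returns the root vertices. */
  static Set<String> findRoots(Map<String, List<String>> children) {
    // Every vertex, whether it appears as a parent or a child, starts as a root.
    Set<String> roots = new TreeSet<>(children.keySet());
    for (List<String> kids : children.values()) {
      roots.addAll(kids);
    }

    // Superstep 0: each vertex sends a non-root notification to its children.
    Set<String> messaged = new HashSet<>();
    for (List<String> kids : children.values()) {
      messaged.addAll(kids);
    }

    // Superstep 1: every vertex that received a message clears its root flag.
    roots.removeAll(messaged);
    return roots;
  }

  public static void main(String[] args) {
    Map<String, List<String>> graph = new HashMap<>();
    graph.put("a", Arrays.asList("b", "c"));
    graph.put("b", Arrays.asList("d"));
    graph.put("e", Arrays.asList("d"));
    System.out.println(findRoots(graph)); // vertices that were never messaged
  }
}
```

In the sample graph only "a" and "e" are never listed as a child, so they are 
the vertices left flagged as roots after superstep 1.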



  was:
Four abstract classes that wrap their respective delegate input/output formats 
for
easy hooks into vertex input format subclasses. I've included some sample 
programs that show two very simple graph
algorithms. I have a graph generator that builds out a very simple direct 
structure, starting with a few 'root' nodes.

Root nodes are defined as nodes that is not listed as a child anywhere in the 
graph. 

Algorithm 1) AccumuloRootMarker.java  -- Accumulo as read/write source. Every 
vertex starts thinking it's a root. At superstep 0, send a message down to each
child as a non-root notification. After superstep 1, only root nodes will have 
never been messaged. 

Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by 
bundling the notification logic followed by root node propagation. Once we've 
marked the appropriate nodes as roots, tell every child which roots it can be 
traced back to via one or more spanning trees. This will take N + 2 supersteps 
where N is the maximum number of hops from any root to any leaf, plus 2 
supersteps for the initial root flagging. 

I've included all relevant code plus DistributedCacheHelper.java for recursive 
cache file and archive searches. It is more hadoop centric than giraph in 
particular, but these jobs use it so I figured why not commit here. 

These have been tested through local JobRunner, pseudo-distributed on the 
aforementioned hardware, and full distributed on EC2. More details in the 
comments.




 HBase/Accumulo Input and Output formats
 ---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano
 Attachments: AccumuloRootMarker.java, 
 AccumuloRootMarkerInputFormat.java, AccumuloRootMarkerOutputFormat.java, 
 AccumuloVertexInputFormat.java, AccumuloVertexOutputFormat.java, 
 ComputeIsRoot.java, DistributedCacheHelper.java, HBaseVertexInputFormat.java, 
 HBaseVertexOutputFormat.java, IdentifyAndMarkRoots.java, 
 SetLongWritable.java, SetTextWritable.java, TableRootMarker.java, 
 TableRootMarkerInputFormat.java, TableRootMarkerOutputFormat.java


 Four abstract classes that wrap their respective delegate input/output 
 formats for
 easy hooks into vertex input format subclasses. I've included some sample 
 programs that show two very simple graph
 algorithms. I have a graph generator that builds out a very simple directed 
 structure, starting with a few 'root' nodes.
 Root nodes are defined as nodes which are not listed as a child anywhere in 
 the graph. 
 Algorithm 1) AccumuloRootMarker.java  -- Accumulo as read/write source. 
 Every vertex starts thinking it's a root. At superstep 0, send a message down 
 to each
 child as a non-root notification. After superstep 1, only root nodes will 
 have never been messaged. 
 Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by 
 bundling the notification logic followed by root node propagation. Once we've 
 marked the appropriate nodes as roots, tell every

[jira] [Updated] (GIRAPH-87) Simplify boolean expression in BspService::checkpointFrequencyMet

2012-02-24 Thread Eli Reisman (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-87?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-87:
--

Attachment: GIRAPH-87.patch

This is the patch for GIRAPH-87 JIRA newbie issue. Passed mvn test, not 
tested on cluster.

 Simplify boolean expression in BspService::checkpointFrequencyMet
 -

 Key: GIRAPH-87
 URL: https://issues.apache.org/jira/browse/GIRAPH-87
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Eli Reisman
  Labels: newbie
 Attachments: GIRAPH-87.patch


 {noformat}if (superstep < firstCheckpoint) {
 return false;
 } else if (((superstep - firstCheckpoint) % checkpointFrequency) == 
 0) {
 return true;
 } else {
 return false;
 }{noformat}
 can be simplified to just return the result of the else if evaluation.
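
 A sketch of what the simplification might look like, assuming the first 
 comparison in the snippet is superstep < firstCheckpoint. This is not the 
 attached patch: the class and method names are hypothetical, and the three 
 values are passed as parameters only to keep the example self-contained. 
 Note that returning the else-if test entirely on its own would not be 
 equivalent, because Java's % yields 0 for negative multiples, so the sketch 
 keeps the first comparison as a guard.

```java
/** Hypothetical stand-alone version of the check, for comparison only. */
final class CheckpointFrequencySketch {
  private CheckpointFrequencySketch() { }

  /** The if / else-if / else chain quoted in the issue. */
  static boolean original(long superstep, long firstCheckpoint,
                          long checkpointFrequency) {
    if (superstep < firstCheckpoint) {
      return false;
    } else if (((superstep - firstCheckpoint) % checkpointFrequency) == 0) {
      return true;
    } else {
      return false;
    }
  }

  /** Single return expression with the same result for every input. */
  static boolean simplified(long superstep, long firstCheckpoint,
                            long checkpointFrequency) {
    return superstep >= firstCheckpoint
        && ((superstep - firstCheckpoint) % checkpointFrequency) == 0;
  }

  /** The else-if test alone; differs from original() before firstCheckpoint. */
  static boolean elseIfOnly(long superstep, long firstCheckpoint,
                            long checkpointFrequency) {
    return ((superstep - firstCheckpoint) % checkpointFrequency) == 0;
  }

  public static void main(String[] args) {
    // Compare both versions over a small range; checkpointFrequency must be non-zero.
    for (long s = -5; s <= 20; s++) {
      if (original(s, 1, 3) != simplified(s, 1, 3)) {
        throw new AssertionError("mismatch at superstep " + s);
      }
    }
    System.out.println("original and simplified agree");
  }
}
```

 For example, with firstCheckpoint = 1 and checkpointFrequency = 3, superstep 
 -2 gives (-3 % 3) == 0, so elseIfOnly returns true where the original 
 returns false.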





[jira] [Updated] (GIRAPH-85) Simplify return expression in RPCCommunications::getRPCProxy

2012-02-24 Thread Eli Reisman (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-85:
--

Attachment: GIRAPH-85.patch

Simplifies 2 return statements without changing program logic or flow. Passes 
mvn test but not tested on cluster.

 Simplify return expression in RPCCommunications::getRPCProxy
 

 Key: GIRAPH-85
 URL: https://issues.apache.org/jira/browse/GIRAPH-85
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Jakob Homan
  Labels: newbie
 Fix For: 0.2.0

 Attachments: GIRAPH-85.patch


 Twice in RPCCommunications::getRPCProxy a local variable, proxy, is created 
 and immediately returned.  We can simplify this to just return the value.
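
 The pattern being described looks roughly like this. The sketch does not use 
 the real Hadoop RPC calls: createProxy is a hypothetical stand-in for 
 whatever call builds the proxy, and a String stands in for the proxy type.

```java
/** Hypothetical illustration of the "assign then return" pattern. */
final class ReturnDirectlySketch {
  private ReturnDirectlySketch() { }

  /** Stand-in for the call that creates the proxy; not a Hadoop API. */
  static String createProxy(String host, int port) {
    return host + ":" + port;
  }

  /** Before: a local variable is created and immediately returned. */
  static String before(String host, int port) {
    String proxy = createProxy(host, port);
    return proxy;
  }

  /** After: the value is returned directly, with identical behaviour. */
  static String after(String host, int port) {
    return createProxy(host, port);
  }

  public static void main(String[] args) {
    System.out.println(before("localhost", 30000).equals(after("localhost", 30000)));
  }
}
```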





[jira] [Updated] (GIRAPH-87) Simplify boolean expression in BspService::checkpointFrequencyMet

2012-02-24 Thread Eli Reisman (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-87?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-87:
--

Attachment: GIRAPH-87.patch

This is an improved version of GIRAPH-87.patch that passes mvn 
checkstyle:check and of course also mvn test. Not tested on a cluster.

 Simplify boolean expression in BspService::checkpointFrequencyMet
 -

 Key: GIRAPH-87
 URL: https://issues.apache.org/jira/browse/GIRAPH-87
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Eli Reisman
  Labels: newbie
 Attachments: GIRAPH-87.patch, GIRAPH-87.patch


 {noformat}if (superstep < firstCheckpoint) {
 return false;
 } else if (((superstep - firstCheckpoint) % checkpointFrequency) == 
 0) {
 return true;
 } else {
 return false;
 }{noformat}
 can be simplified to just return the result of the else if evaluation.





[jira] [Updated] (GIRAPH-85) Simplify return expression in RPCCommunications::getRPCProxy

2012-02-24 Thread Eli Reisman (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-85:
--

Attachment: GIRAPH-85.patch

This is an improved version of GIRAPH-85.patch, redone in git and 
meeting the mvn test and mvn checkstyle:check guidelines. Not tested on a 
cluster setup.

 Simplify return expression in RPCCommunications::getRPCProxy
 

 Key: GIRAPH-85
 URL: https://issues.apache.org/jira/browse/GIRAPH-85
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Jakob Homan
  Labels: newbie
 Fix For: 0.2.0

 Attachments: GIRAPH-85.patch, GIRAPH-85.patch


 Twice in RPCCommunications::getRPCProxy a local variable, proxy, is created 
 and immediately returned.  We can simplify this to just return the value.





[jira] [Updated] (GIRAPH-150) PageRankBenchmark accesses wrong conf after GiraphJob is created

2012-02-16 Thread Avery Ching (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching updated GIRAPH-150:
---

Attachment: GIRAPH-150.patch

Use the job.getConfiguration() instead of getConf() or else the vertex class 
doesn't get set properly.  Also got rid of the other getConf() usage.

Tested with 'mvn package' and 'mvn verify'.
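
A toy model of why the wrong conf matters, assuming the job snapshots the 
configuration it is given when it is created. ToyJob, the map-based conf and 
the key name are stand-ins for illustration, not the GiraphJob or Hadoop 
API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; a plain map plays the role of the configuration.
final class JobConfCopySketch {
  /** Toy job that copies its configuration at construction time. */
  static final class ToyJob {
    private final Map<String, String> conf;

    ToyJob(Map<String, String> original) {
      this.conf = new HashMap<>(original); // snapshot, not a shared reference
    }

    Map<String, String> getConfiguration() {
      return conf;
    }
  }

  public static void main(String[] args) {
    Map<String, String> toolConf = new HashMap<>();
    ToyJob job = new ToyJob(toolConf);

    // Set after the job exists: the job's own copy never sees this value.
    toolConf.put("vertexClass", "SomeVertex");
    System.out.println(job.getConfiguration().get("vertexClass"));

    // Set through the job's configuration: visible to the job.
    job.getConfiguration().put("vertexClass", "SomeVertex");
    System.out.println(job.getConfiguration().get("vertexClass"));
  }
}
```

Under that assumption, anything set on the outer conf after the job is 
created is silently ignored, which matches the symptom described above.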

 PageRankBenchmark accesses wrong conf after GiraphJob is created
 

 Key: GIRAPH-150
 URL: https://issues.apache.org/jira/browse/GIRAPH-150
 Project: Giraph
  Issue Type: Bug
Reporter: Avery Ching
Assignee: Avery Ching
 Attachments: GIRAPH-150.patch








[jira] [Updated] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions

2012-02-15 Thread Avery Ching (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching updated GIRAPH-40:
--

Attachment: GIRAPH-40.3.patch

Good suggestion, Jakob.  I have addressed Jakob's comments by changing the 
checkstyle phase from 'compile' to 'verify'.  This is the only change I made.  

{noquote}
   <execution>
-<phase>compile</phase>
+<phase>verify</phase>
 <goals>
{noquote}

We need to require our users to have tested with 'mvn verify' (or 'mvn 
install') before submitting diffs.  This matches the rat approach.

 Adding checkstyle enforcement of Giraph code conventions
 

 Key: GIRAPH-40
 URL: https://issues.apache.org/jira/browse/GIRAPH-40
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching
Priority: Minor
 Attachments: GIRAPH-40.2.patch, GIRAPH-40.3.patch, GIRAPH-40.patch, 
 GIRAPH-40.patch


 Now that we have some code conventions (see GIRAPH-21), we should enforce 
 them with a maven checkstyle plugin.





[jira] [Updated] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions

2012-02-14 Thread Avery Ching (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching updated GIRAPH-40:
--

Attachment: GIRAPH-40.patch

As promised, here is the full patch.  Due to its massive size, I am not posting 
this to reviewboard.  Here are the details of what I did:

* Created a checkstyle.xml file that follows our CODE_CONVENTIONS as best as 
possible.
* Compiles will now fail if checkstyle guidelines are not met.
* While checkstyle isn't comprehensive, it should reduce our reviewer overhead 
for formatting issues and common code style violations
* Current source code (not test code) now meets checkstyle checks

It passes both the local and MR unittests and also passes rat installation.

Take a look at a few files.  I don't recommend looking at everything since 

$ git diff HEAD^ | grep -P "^(\+|\-)" | wc -l
   32848

Let's get this in soon to help us iterate faster and get rid of this technical 
debt!

 Adding checkstyle enforcement of Giraph code conventions
 

 Key: GIRAPH-40
 URL: https://issues.apache.org/jira/browse/GIRAPH-40
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching
Priority: Minor
 Attachments: GIRAPH-40.patch, GIRAPH-40.patch


 Now that we have some code conventions (see GIRAPH-21), we should enforce 
 them with a maven checkstyle plugin.





[jira] [Updated] (GIRAPH-148) giraph-site.xml needs Apache header

2012-02-13 Thread Jakob Homan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated GIRAPH-148:
---

Attachment: GIRAPH-148-b.patch

Here's one copied and pasted from our pom.xml

 giraph-site.xml needs Apache header
 ---

 Key: GIRAPH-148
 URL: https://issues.apache.org/jira/browse/GIRAPH-148
 Project: Giraph
  Issue Type: Bug
  Components: conf and scripts
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-148-b.patch, GIRAPH-148.patch


 I forgot to add the license to the conf file and now rat is failing...





[jira] [Updated] (GIRAPH-148) giraph-site.xml needs Apache header

2012-02-10 Thread Jakob Homan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated GIRAPH-148:
---

Attachment: GIRAPH-148.patch

Quick patch...

 giraph-site.xml needs Apache header
 ---

 Key: GIRAPH-148
 URL: https://issues.apache.org/jira/browse/GIRAPH-148
 Project: Giraph
  Issue Type: Bug
  Components: conf and scripts
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-148.patch


 I forgot to add the license to the conf file and now rat is failing...





[jira] [Updated] (GIRAPH-148) giraph-site.xml needs Apache header

2012-02-10 Thread Jakob Homan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated GIRAPH-148:
---

Summary: giraph-site.xml needs Apache header  (was: giraph-site.xml needs 
Apache head)

 giraph-site.xml needs Apache header
 ---

 Key: GIRAPH-148
 URL: https://issues.apache.org/jira/browse/GIRAPH-148
 Project: Giraph
  Issue Type: Bug
  Components: conf and scripts
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-148.patch


 I forgot to add the license to the conf file and now rat is failing...





[jira] [Updated] (GIRAPH-145) Change partition request log level to debug rather than info

2012-02-09 Thread Jakob Homan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated GIRAPH-145:
---

Attachment: GIRAPH-145.patch

Quick patch to go down to debug level.  Verified with tests and cluster run.

 Change partition request log level to debug rather than info
 

 Key: GIRAPH-145
 URL: https://issues.apache.org/jira/browse/GIRAPH-145
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-145.patch


 {code:title=BasicRPCCommunications.java|borderStyle=solid}
 if (LOG.isInfoEnabled()) {
 LOG.info("sendPartitionReq: Sending to " + rpcProxy.getName() +
 " " + addr + " from " + workerInfo +
 ", with partition " + partition);
 }{code}
 is too chatty.  We're seeing thousands and thousands of these lines for larger 
 graphs.  This should be at debug level...
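
A sketch of the guarded debug-level pattern. To stay self-contained it uses 
java.util.logging, where Level.FINE plays the role of debug, rather than the 
logger used in BasicRPCCommunications; the class and method names are 
hypothetical:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch; java.util.logging stands in for the project's logger.
final class PartitionLogSketch {
  static final Logger LOG =
      Logger.getLogger(PartitionLogSketch.class.getName());

  /**
   * Logs the partition request at FINE (debug-like) level.
   * Returns true if the message was built and handed to the logger.
   */
  static boolean logSendPartitionReq(String proxyName, String addr,
      String workerInfo, int partition) {
    // The guard avoids building the string when debug output is off.
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine("sendPartitionReq: Sending to " + proxyName +
          " " + addr + " from " + workerInfo +
          ", with partition " + partition);
      return true;
    }
    return false;
  }
}
```

With the logger at INFO the call is skipped entirely, so large graphs no 
longer flood the log.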





[jira] [Updated] (GIRAPH-142) _hadoopBsp should be prefixable via configuration

2012-02-09 Thread Jakob Homan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated GIRAPH-142:
---

Attachment: GIRAPH-142.patch

Patch to add new config value, giraph.zkBaseZNode, that is the top-level for 
all giraph-created content on the ZK server.  New unit test.  Verified on 
running cluster as well.
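
Roughly, the idea is the following. This is a sketch only: the map-based 
conf, the empty default and the way the path is joined are assumptions, and 
only the key giraph.zkBaseZNode and the _hadoopBsp name come from this issue:

```java
import java.util.Map;

// Hypothetical sketch; a plain map plays the role of the configuration.
final class ZkBasePathSketch {
  /** Config key named in the comment above. */
  static final String BASE_ZNODE_KEY = "giraph.zkBaseZNode";
  /** Directory Giraph creates for its znodes. */
  static final String BASE_DIR = "/_hadoopBsp";

  /** Returns the znode root, prefixed by the configured base if one is set. */
  static String basePath(Map<String, String> conf) {
    String prefix = conf.getOrDefault(BASE_ZNODE_KEY, "");
    return prefix + BASE_DIR;
  }

  public static void main(String[] args) {
    System.out.println(basePath(
        java.util.Collections.singletonMap(BASE_ZNODE_KEY, "/teamA")));
  }
}
```

With no value set the path stays at the root, so existing setups would be 
unaffected under these assumptions.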

 _hadoopBsp should be prefixable via configuration
 -

 Key: GIRAPH-142
 URL: https://issues.apache.org/jira/browse/GIRAPH-142
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.1.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-142.patch


 In multitenant zookeeper clusters, it would be good to be able to specify 
 the base directory that's created for the _hadoopBsp znodes.  This would also 
 fix the issue we have with creating that directory in the source root during 
 tests.





[jira] [Updated] (GIRAPH-143) Add support for giraph to have a conf file

2012-02-08 Thread Jakob Homan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated GIRAPH-143:
---

Component/s: conf and scripts

 Add support for giraph to have a conf file
 --

 Key: GIRAPH-143
 URL: https://issues.apache.org/jira/browse/GIRAPH-143
 Project: Giraph
  Issue Type: New Feature
  Components: conf and scripts
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-143.patch


 Currently one must provide all the Giraph-specific config values either via 
 the command line or snuck into another project's conf file.  Any 
 self-respecting Hadoop ecosystem project should have its own conf file.





[jira] [Updated] (GIRAPH-141) mulitgraph support in giraph

2012-02-04 Thread André Kelpe (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

André Kelpe updated GIRAPH-141:
---

Description: 
The current vertex API only supports simple graphs, meaning that there can only 
ever be one edge between two vertices. Many graphs like the road network are in 
fact multigraphs, where many edges can connect two vertices at the same time.

Support for this could be added by introducing an Iterator<EdgeWritable> 
getEdgeValue() or a similar construct. Maybe introducing a slim object like a 
Connector between the edge and the vertex is also a good idea, so that you 
could do something like:

{code} 
for (final Connector<EdgeWritable, VertexWritable> conn : getEdgeValues()) {
 final EdgeWritable edge = conn.getEdge();
 final VertexWritable otherVertex = conn.getOther();
// do interesting stuff
}
{code} 


  was:
The current vertex API only supports simple graphs, meaning that there can only 
ever be one edge between two vertices. Many graphs like the road network are in 
fact multigraphs, where many edges can connect two vertices at the same time.

Support for this could be added by introducing an Iterator<EdgeWritable> 
getEdgeValue() or a similar construct. Maybe introducing a slim object like a 
Connector between the edge and the vertex is also a good idea, so that you 
could do something like:

for (final Connector<EdgeWritable, VertexWritable> conn : getEdgeValues()) {
 final EdgeWritable edge = conn.getEdge();
 final VertexWritable otherVertex = conn.getOther();
// do interesting stuff
}



 mulitgraph support in giraph
 

 Key: GIRAPH-141
 URL: https://issues.apache.org/jira/browse/GIRAPH-141
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Reporter: André Kelpe

 The current vertex API only supports simple graphs, meaning that there can 
 only ever be one edge between two vertices. Many graphs like the road network 
 are in fact multigraphs, where many edges can connect two vertices at the 
 same time.
 Support for this could be added by introducing an Iterator<EdgeWritable> 
 getEdgeValue() or a similar construct. Maybe introducing a slim object like a 
 Connector between the edge and the vertex is also a good idea, so that you 
 could do something like:
 {code} 
 for (final Connector<EdgeWritable, VertexWritable> conn : getEdgeValues()) {
  final EdgeWritable edge = conn.getEdge();
  final VertexWritable otherVertex = conn.getOther();
 // do interesting stuff
 }
 {code} 
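
A self-contained sketch of what such a Connector could look like. The 
generic type, its accessors and the counting helper are hypothetical; 
nothing like this exists in the vertex API described here:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed Connector idea.
final class MultigraphSketch {
  /** Slim pairing of one edge value with the vertex at its other end. */
  static final class Connector<E, V> {
    private final E edge;
    private final V other;

    Connector(E edge, V other) {
      this.edge = edge;
      this.other = other;
    }

    E getEdge() {
      return edge;
    }

    V getOther() {
      return other;
    }
  }

  /** Counts the parallel edges that lead to the given vertex. */
  static <E, V> int countEdgesTo(Iterable<Connector<E, V>> connectors,
      V target) {
    int count = 0;
    for (final Connector<E, V> conn : connectors) {
      if (conn.getOther().equals(target)) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    // Two roads connect the same pair of vertices, as in a road network.
    List<Connector<String, Long>> edges = new ArrayList<>();
    edges.add(new Connector<>("highway", 2L));
    edges.add(new Connector<>("side road", 2L));
    edges.add(new Connector<>("bridge", 3L));
    System.out.println(countEdgesTo(edges, 2L));
  }
}
```

Because each Connector carries its own edge value, two edges to the same 
vertex stay distinct instead of overwriting each other.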





[jira] [Updated] (GIRAPH-133) Typo in JavaDoc in BspCase::remove

2012-02-03 Thread Harsh J (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated GIRAPH-133:
---

Attachment: GIRAPH-133.patch

 Typo in JavaDoc in BspCase::remove
 --

 Key: GIRAPH-133
 URL: https://issues.apache.org/jira/browse/GIRAPH-133
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Priority: Trivial
  Labels: newbie
 Attachments: GIRAPH-133.patch


 Configuration is spelled wrong in the javadoc:
 {noformat}/**
  * Helper method to remove a path if it exists.
  * 
  * @param conf Configutation
  * @param path Path to remove
  * @throws IOException
  */
 public static void remove(Configuration conf, Path path) 
 throws IOException {
 FileSystem hdfs = FileSystem.get(conf);
 hdfs.delete(path, true);
 }{noformat}





[jira] [Updated] (GIRAPH-137) De-duplicate pagerank implementation in PageRankBenchmark

2012-02-03 Thread Harsh J (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated GIRAPH-137:
---

Attachment: GIRAPH-137.patch

Subclassing proved tricky (it looks like a multiple-inheritance situation), 
so I've tried to reuse the code via a static function instead. Is this OK or 
plain silly?

 De-duplicate pagerank implementation in PageRankBenchmark
 -

 Key: GIRAPH-137
 URL: https://issues.apache.org/jira/browse/GIRAPH-137
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Priority: Minor
  Labels: newbie
 Attachments: GIRAPH-137.patch


 Currently in PageRankBenchmark we have the code for pagerank duplicated in 
 each of the implementations of Vertex:
 {noformat}public static class PageRankHashMapVertex extends HashMapVertex<
 LongWritable, DoubleWritable, DoubleWritable, DoubleWritable> {
 @Override
 public void compute(Iterator<DoubleWritable> msgIterator) {
 if (getSuperstep() >= 1) {
 double sum = 0;
 while (msgIterator.hasNext()) {
 sum += msgIterator.next().get();
 }
 DoubleWritable vertexValue =
 new DoubleWritable((0.15f / getNumVertices()) + 0.85f *
sum);
 setVertexValue(vertexValue);
 }
 if (getSuperstep() < getConf().getInt(SUPERSTEP_COUNT, -1)) {
 long edges = getNumOutEdges();
 sendMsgToAllEdges(
 new DoubleWritable(getVertexValue().get() / edges));
 } else {
 voteToHalt();
 }
 }
 }
 public static class PageRankEdgeListVertex extends EdgeListVertex<
 LongWritable, DoubleWritable, DoubleWritable, DoubleWritable> {
 @Override
 public void compute(Iterator<DoubleWritable> msgIterator) {
 if (getSuperstep() >= 1) {
 double sum = 0;
 while (msgIterator.hasNext()) {
 sum += msgIterator.next().get();
 }
 DoubleWritable vertexValue =
 new DoubleWritable((0.15f / getNumVertices()) + 0.85f *
sum);
 setVertexValue(vertexValue);
 }
 if (getSuperstep() < getConf().getInt(SUPERSTEP_COUNT, -1)) {
 long edges = getNumOutEdges();
 sendMsgToAllEdges(
 new DoubleWritable(getVertexValue().get() / edges));
 } else {
 voteToHalt();
 }
 }
 }{noformat}
 This code can be consolidated into a private class that the two 
 implementations then extend.
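
One possible shape for the shared code, sketched as static helpers that both 
compute() methods could call. The names are hypothetical and plain doubles 
replace DoubleWritable so the example runs on its own; only the 0.15/0.85 
arithmetic is taken from the code above:

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical sketch of shared PageRank arithmetic.
final class PageRankShared {
  /** Sums the incoming messages and applies the damping used above. */
  static double newVertexValue(Iterator<Double> messages, long numVertices) {
    double sum = 0;
    while (messages.hasNext()) {
      sum += messages.next();
    }
    return (0.15f / numVertices) + 0.85f * sum;
  }

  /** Value sent along each out-edge: the vertex value split evenly. */
  static double messageValue(double vertexValue, long numOutEdges) {
    return vertexValue / numOutEdges;
  }

  public static void main(String[] args) {
    double value = newVertexValue(Arrays.asList(0.1, 0.3).iterator(), 4);
    System.out.println(value);
    System.out.println(messageValue(value, 2));
  }
}
```

Each vertex class would keep its own compute() for the superstep checks and 
delegate only the arithmetic, which sidesteps the inheritance question.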





[jira] [Updated] (GIRAPH-130) Fix Javadoc warnings

2012-02-03 Thread Harsh J (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated GIRAPH-130:
---

Attachment: GIRAPH-130.patch

Fixes all of the javadoc warnings present in trunk today.

@Avery - Perhaps once you have a QA Buildbot like other projects have, you can 
use the javadoc hooks in it to +1/-1 a patch?

 Fix Javadoc warnings
 

 Key: GIRAPH-130
 URL: https://issues.apache.org/jira/browse/GIRAPH-130
 Project: Giraph
  Issue Type: Bug
Reporter: Jakob Homan
Priority: Minor
  Labels: newbie
 Attachments: GIRAPH-130.patch


 We've accumulated a fair number of javadoc warnings recently:
 {noformat}[WARNING] Javadoc Warnings
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146:
  warning - Tag @link: reference not found: GraphPartitioner
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:129:
  warning - @param argument superstep is not a parameter name.
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146:
  warning - Tag @link: reference not found: GraphPartitioner
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java:84:
  warning - @param argument vertexIndex is not a parameter name.
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java:84:
  warning - @param argument msgList is not a parameter name.
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32:
  warning - Tag @link: reference not found: VertexIdMessage
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java:46:
  warning - Tag @link: reference not found: messages
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java:46:
  warning - Tag @link: reference not found: messages
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/AggregatorWriter.java:60:
  warning - @param argument map is not a parameter name.
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146:
  warning - Tag @link: reference not found: GraphPartitioner
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146:
  warning - Tag @link: reference not found: GraphPartitioner
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/GiraphJob.java:432:
  warning - @param argument graphPartitionerClass is not a parameter name.
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java:46:
  warning - Tag @link: reference not found: messages
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java:62:
  warning - Tag @link: reference not found: GraphPartitioner
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java:62:
  warning - Tag @link: reference not found: GraphPartitioner
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java:62:
  warning - @param argument availableWorkerInfos is not a parameter name.
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/partition/PartitionBalancer.java:176:
  warning - @param argument allPartitionStatsList is not a parameter name.
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32:
  warning - Tag @link: reference not found: VertexIdMessage
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32:
  warning - Tag @link: reference not found: VertexIdMessage
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146:
  warning - Tag @link: reference not found: GraphPartitioner
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32:
  warning - Tag @link: reference not found: VertexIdMessage
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146:
  warning - Tag @link: reference not found: GraphPartitioner
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32:
  warning - Tag @link: reference not found: VertexIdMessage
 {noformat}
 It would be good to fix these.


[jira] [Updated] (GIRAPH-136) Error message for bin/giraph could be improved

2012-02-03 Thread Jakob Homan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated GIRAPH-136:
---

Summary: Error message for bin/giraph could be improved  (was: Erorr 
message for bin/giraph could be improved)

 Error message for bin/giraph could be improved
 --

 Key: GIRAPH-136
 URL: https://issues.apache.org/jira/browse/GIRAPH-136
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.1.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-136-b.patch, GIRAPH-136.patch


 Currently when one just runs bin/giraph without the required jar, the message 
 isn't very helpful:
 {noformat}[tardis giraph-0.1]$ bin/giraph
 Can't find user jar to execute.{noformat}
 It would be better to have a more in-depth message explaining Giraph and what 
 is expected.





[jira] [Updated] (GIRAPH-131) enable creation of test-jars to simplify testing in downstream projects

2012-02-02 Thread Eric Charles (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Charles updated GIRAPH-131:


Attachment: GIRAPH-131-source-test-jar.patch

GIRAPH-131-source-test-jar.patch allows the deployment of the test-jar sources 
(see also GIRAPH-129).

Not much, but it can be useful for running tests from a 3rd party project and 
debugging in the giraph sources.

 enable creation of test-jars to simplify testing in downstream projects
 ---

 Key: GIRAPH-131
 URL: https://issues.apache.org/jira/browse/GIRAPH-131
 Project: Giraph
  Issue Type: Improvement
Reporter: André Kelpe
Assignee: André Kelpe
Priority: Minor
 Fix For: 0.1.0

 Attachments: GIRAPH-131-source-test-jar.patch, GIRAPH-131.patch


 Attached patch enables the creation of test-jars, which are the tests 
 packaged in a separate jar file. This makes it possible to use the 
 super-useful test infrastructure in MockUtils in downstream projects. If you 
 add the patch, you will get a ${giraph.version}-tests.jar, which can be used 
 for downstream testing like this:
 <dependency>
   <groupId>org.apache.giraph</groupId>
   <artifactId>giraph</artifactId>
   <version>${giraph.version}</version>
   <type>test-jar</type>
   <scope>test</scope>
 </dependency>
 P.S.: The patch also resets the version to 0.1-SNAPSHOT as discussed in 
 GIRAPH-129





[jira] [Updated] (GIRAPH-120) Add Sebastian Schelter to site

2012-02-01 Thread Sebastian Schelter (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-120:
--

Attachment: GIRAPH-120.patch

 Add Sebastian Schelter to site
 --

 Key: GIRAPH-120
 URL: https://issues.apache.org/jira/browse/GIRAPH-120
 Project: Giraph
  Issue Type: Task
Affects Versions: 0.1.0
Reporter: Sebastian Schelter
Assignee: Sebastian Schelter
 Fix For: 0.1.0

 Attachments: GIRAPH-120.patch








[jira] [Updated] (GIRAPH-136) Erorr message for bin/giraph could be improved

2012-02-01 Thread Jakob Homan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated GIRAPH-136:
---

Attachment: GIRAPH-136-b.patch

Here's a version that tries to be a bit smarter.  If there's no lib directory, 
it checks for a target directory (if target doesn't exist, it exits) and loads 
the giraph jar from there and sets the classpath via maven (as described above).

This will work for dev environments with a hadoop instance.  Invariably, this 
won't work for someone and will need to be modified more, but that's how these 
scripts end up becoming so convoluted.

 Erorr message for bin/giraph could be improved
 --

 Key: GIRAPH-136
 URL: https://issues.apache.org/jira/browse/GIRAPH-136
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.1.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-136-b.patch, GIRAPH-136.patch


 Currently when one just runs bin/giraph without the required jar, the message 
 isn't very helpful:
 {noformat}[tardis giraph-0.1]$ bin/giraph
 Can't find user jar to execute.{noformat}
 It would be better to have a more in-depth message explaining Giraph and what 
 is expected.





[jira] [Updated] (GIRAPH-134) Fix NOTICE file for release

2012-01-31 Thread Jakob Homan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated GIRAPH-134:
---

Summary: Fix NOTICE file for release  (was: Fix NOTICE and LICENSE files)

 Fix NOTICE file for release
 ---

 Key: GIRAPH-134
 URL: https://issues.apache.org/jira/browse/GIRAPH-134
 Project: Giraph
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.1.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.1.0

 Attachments: GIRAPH-134.patch


 Currently both the LICENSE and NOTICE files are out of compliance for an 
 Apache release.  





[jira] [Updated] (GIRAPH-134) Fix NOTICE and LICENSE files

2012-01-30 Thread Jakob Homan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated GIRAPH-134:
---

Attachment: GIRAPH-134.patch

LICENSE is actually ok for a source release, but NOTICE needs to be made 
minimal (see KAFKA-219 and associated incubator discussion list).  For the 
binary release, we'll add transitive dependencies via the maven external 
release plugin, so that'll be another JIRA.

 Fix NOTICE and LICENSE files
 

 Key: GIRAPH-134
 URL: https://issues.apache.org/jira/browse/GIRAPH-134
 Project: Giraph
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.1.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.1.0

 Attachments: GIRAPH-134.patch


 Currently both the LICENSE and NOTICE files are out of compliance for an 
 Apache release.





[jira] [Updated] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried

2012-01-27 Thread Avery Ching (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching updated GIRAPH-128:
---

Attachment: GIRAPH-128.4.patch

Sorry, I missed the mocking question.  Fixed it here.

 RPC port from BasicRPCCommunications should be only a starting port, and 
 retried
 

 Key: GIRAPH-128
 URL: https://issues.apache.org/jira/browse/GIRAPH-128
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.1.0
Reporter: Avery Ching
Assignee: Avery Ching
 Attachments: GIRAPH-128.2.patch, GIRAPH-128.3.patch, 
 GIRAPH-128.4.patch


 Currently Giraph uses a basic port + the task partition to get the RPC port.
 This doesn't work well when there are multiple Giraph jobs running
 simultaneously in the same Hadoop cluster (port conflict).  At the same time,
 it is nice to use this simple algorithm because it makes it very easy to
 debug problems (you can find the troublesome mapper from the RPC port name).
 I will be proposing a simple scheme to retry with another port.  I will round
 the total number of mappers up to the nearest power of 10 (let's call that
 number Z).  Then I will increment the port number by Z, retrying up to 20
 times.  If you have enough ports, this scheme would guarantee that up to 20
 mappers / node would be supported.  It should be sufficient for most
 clusters.  At the same time, we still maintain the easy debugging method
 since it's still easy to figure out the mapper partition from the port
 (port % Z = map partition).
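
 The retry scheme described above can be sketched roughly as follows. This is
 only an illustration, not the code in the attached patches: the class and
 method names and the isFree predicate (a stand-in for actually binding the
 RPC server) are hypothetical, and it assumes the base port is a multiple of
 Z so that port % Z recovers the partition.

```java
import java.util.function.IntPredicate;

final class RpcPortRetrySketch {
    /** Z in the description: the smallest power of 10 that is >= numMappers. */
    static int roundUpToPowerOfTen(int numMappers) {
        int z = 1;
        while (z < numMappers) {
            z *= 10;
        }
        return z;
    }

    /** Port to try on the given attempt (0-based): base + partition + attempt * Z. */
    static int candidatePort(int basePort, int taskPartition, int z, int attempt) {
        return basePort + taskPartition + attempt * z;
    }

    /**
     * Returns the first candidate port accepted by isFree, or -1 if all
     * maxTries candidates are taken. isFree is a hypothetical stand-in for
     * trying to bind the RPC server to the port.
     */
    static int firstFreePort(int basePort, int taskPartition, int numMappers,
                             int maxTries, IntPredicate isFree) {
        int z = roundUpToPowerOfTen(numMappers);
        for (int attempt = 0; attempt < maxTries; attempt++) {
            int port = candidatePort(basePort, taskPartition, z, attempt);
            if (isFree.test(port)) {
                return port;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // 150 mappers -> Z = 1000; pretend ports below 32000 are already taken.
        int port = firstFreePort(30000, 27, 150, 20, p -> p >= 32000);
        System.out.println(port + " -> partition " + (port % 1000));
    }
}
```

 With 150 mappers Z is 1000, so partition 27 tries 30027, then 31027, then
 32027, and port % 1000 gives back 27 for each of them.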





[jira] [Updated] (GIRAPH-131) enable creation of test-jars to simplify testing in downstream projects

2012-01-26 Thread André Kelpe (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

André Kelpe updated GIRAPH-131:
---

Attachment: GIRAPH-131.patch

 enable creation of test-jars to simplify testing in downstream projects
 ---

 Key: GIRAPH-131
 URL: https://issues.apache.org/jira/browse/GIRAPH-131
 Project: Giraph
  Issue Type: Improvement
Reporter: André Kelpe
Priority: Minor
 Attachments: GIRAPH-131.patch


 Attached patch enables the creation of test-jars, which are the tests 
 packaged in a separate jar file. This makes it possible to use the 
 super-useful test infrastructure in MockUtils in downstream projects. If you 
 add the patch, you will get a ${giraph.version}-tests.jar, which can be used 
 for downstream testing like this:
 <dependency>
   <groupId>org.apache.giraph</groupId>
   <artifactId>giraph</artifactId>
   <version>${giraph.version}</version>
   <type>test-jar</type>
   <scope>test</scope>
 </dependency>
 P.S.: The patch also resets the version to 0.1-SNAPSHOT as discussed in 
 GIRAPH-129





[jira] [Updated] (GIRAPH-129) enable creation of javadoc and sources jars

2012-01-24 Thread André Kelpe (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

André Kelpe updated GIRAPH-129:
---

Attachment: GIRAPH-129.patch

 enable creation of javadoc and sources jars
 ---

 Key: GIRAPH-129
 URL: https://issues.apache.org/jira/browse/GIRAPH-129
 Project: Giraph
  Issue Type: Improvement
  Components: build
Affects Versions: 0.1.0
Reporter: André Kelpe
Priority: Minor
 Attachments: GIRAPH-129.patch


 It is pretty useful to enable the creation of javadoc and sources jars during 
 the build, so that people using IDEs like eclipse can easily jump into the 
 code.





[jira] [Updated] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried

2012-01-24 Thread Avery Ching (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching updated GIRAPH-128:
---

Attachment: GIRAPH-128.2.patch

Updated after GIRAPH-124 was committed.

 RPC port from BasicRPCCommunications should be only a starting port, and 
 retried
 

 Key: GIRAPH-128
 URL: https://issues.apache.org/jira/browse/GIRAPH-128
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.1.0
Reporter: Avery Ching
Assignee: Avery Ching
 Attachments: GIRAPH-128.2.patch


 Currently Giraph uses a basic port + the task partition to get the RPC port.
 This doesn't work well when there are multiple Giraph jobs running
 simultaneously in the same Hadoop cluster (port conflict).  At the same time,
 it is nice to use this simple algorithm because it makes it very easy to
 debug problems (you can find the troublesome mapper from the RPC port name).
 I will be proposing a simple scheme to retry with another port.  I will round
 the total number of mappers up to the nearest power of 10 (let's call that
 number Z).  Then I will increment the port number by Z, retrying up to 20
 times.  If you have enough ports, this scheme would guarantee that up to 20
 mappers / node would be supported.  It should be sufficient for most
 clusters.  At the same time, we still maintain the easy debugging method
 since it's still easy to figure out the mapper partition from the port
 (port % Z = map partition).





[jira] [Updated] (GIRAPH-45) Improve the way to keep outgoing messages

2012-01-23 Thread Claudio Martella (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Claudio Martella updated GIRAPH-45:
---

Attachment: GIRAPH-45.diff

This is a premature patch, not meant for inclusion but as an RFC. It passes all
local unit tests and MR tests except the checkpointing and partitioner tests.
Apparently I broke something with partitioning.

In the checkpointing case it breaks in
BasicRPCCommunications#checkForMessageToNonExistentVertex(), with messages sent
to the wrong worker (see IllegalStateException), while in TestGraphPartitioner
the output partition files are smaller than the required size.

I'm requesting comments because I don't understand how I broke the partitioner
package by moving some code from prepareSuperstep() to the putMsg* methods.
There must be an assumption I'm missing which might be obvious to one of you.

I tried to go incrementally by just refactoring
BasicRPCCommunications#checkForMessageToNonExistentVertex() and leaving the
rest as is, so no out-of-core classes, just trunk with the refactored
BasicRPCCommunications#checkForMessageToNonExistentVertex() logic, and the
code doesn't break. So... any ideas?

 Improve the way to keep outgoing messages
 -

 Key: GIRAPH-45
 URL: https://issues.apache.org/jira/browse/GIRAPH-45
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Hyunsik Choi
 Attachments: GIRAPH-45.diff


 As discussed in GIRAPH-12 (http://goo.gl/CE32U), I think that there is a
 potential out-of-memory problem when the rate of message generation is
 higher than the rate of message flush (or network bandwidth).
 To overcome this problem, we need a more eager strategy for message flushing
 or some approach to spill messages to disk.
 The link below is Dmitriy's suggestion.
 https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253





[jira] [Updated] (GIRAPH-124) Combiner should return Iterable<M> instead of M or null.

2012-01-21 Thread Claudio Martella (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Claudio Martella updated GIRAPH-124:


Attachment: GIRAPH-124.diff

Fixes indentation and Exception messages according to Avery's comments.

 Combiner should return Iterable<M> instead of M or null.
 

 Key: GIRAPH-124
 URL: https://issues.apache.org/jira/browse/GIRAPH-124
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.1.0
Reporter: Claudio Martella
 Attachments: GIRAPH-124.diff, GIRAPH-124.diff


 Currently VertexCombiner is expected to return a single message combining the 
 input messages, or null in case no message should be sent. The new expected 
 interface should return an Iterable<M>, possibly empty. The number of 
 elements in the returned Iterable is supposed to be smaller than the number 
 of input messages, by the initial definition of a Combiner (defined as a 
 function to reduce I/O by combining multiple messages into 1).
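
 A combiner under the proposed contract might look roughly like the sketch
 below. The Combiner interface here is a simplified, hypothetical stand-in
 and not Giraph's actual VertexCombiner signature; MinCombiner and toList
 are likewise made up for illustration. It returns a one-element Iterable
 when there is something to send and an empty one instead of null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class IterableCombinerSketch {
    /** Hypothetical simplified combiner contract; not Giraph's VertexCombiner. */
    interface Combiner<I, M> {
        Iterable<M> combine(I vertexId, Iterable<M> messages);
    }

    /** Keeps only the smallest message, or nothing when there is no input. */
    static final class MinCombiner implements Combiner<Long, Double> {
        @Override
        public Iterable<Double> combine(Long vertexId, Iterable<Double> messages) {
            Double min = null;
            for (Double m : messages) {
                if (min == null || m < min) {
                    min = m;
                }
            }
            // Empty Iterable replaces the old "return null" convention.
            return min == null
                ? Collections.<Double>emptyList()
                : Collections.singletonList(min);
        }
    }

    /** Copies an Iterable into a List so callers can inspect the result. */
    static <M> List<M> toList(Iterable<M> items) {
        List<M> out = new ArrayList<>();
        for (M item : items) {
            out.add(item);
        }
        return out;
    }

    public static void main(String[] args) {
        Combiner<Long, Double> combiner = new MinCombiner();
        System.out.println(toList(combiner.combine(1L, Arrays.asList(3.0, 1.0, 2.0))));
        System.out.println(toList(combiner.combine(1L, Collections.<Double>emptyList())));
    }
}
```

 Callers can then iterate over the result unconditionally, with no null
 check before sending.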





[jira] [Updated] (GIRAPH-124) Combiner should return Iterable<M> instead of M or null.

2012-01-20 Thread Claudio Martella (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Claudio Martella updated GIRAPH-124:


Attachment: GIRAPH-124.diff

Implements the Iterable<M> interface on the return value. Throws 
IllegalStateException when the returned number of elements is >= the number of 
original messages.

 Combiner should return Iterable<M> instead of M or null.
 

 Key: GIRAPH-124
 URL: https://issues.apache.org/jira/browse/GIRAPH-124
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.1.0
Reporter: Claudio Martella
 Attachments: GIRAPH-124.diff


 Currently VertexCombiner is expected to return a single message combining the 
 input messages, or null in case no message should be sent. The new expected 
 interface should return an Iterable<M>, possibly empty. The number of 
 elements in the returned Iterable is supposed to be smaller than the number 
 of input messages, by the initial definition of a Combiner (defined as a 
 function to reduce I/O by combining multiple messages into 1).





[jira] [Updated] (GIRAPH-126) Use Collections.emptyList() in BasicRPCCommunications.java

2012-01-18 Thread André Kelpe (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

André Kelpe updated GIRAPH-126:
---

Attachment: GIRAPH-126.patch

removed empty lines

 Use Collections.emptyList() in BasicRPCCommunications.java
 --

 Key: GIRAPH-126
 URL: https://issues.apache.org/jira/browse/GIRAPH-126
 Project: Giraph
  Issue Type: Improvement
Reporter: André Kelpe
Assignee: André Kelpe
Priority: Minor
 Attachments: GIRAPH-126.patch, GIRAPH-126.patch


 I am doing some tests with giraph and I am having some memory problems. While 
 I was browsing through the codebase I saw that you are allocating a new 
 ArrayList (which has an underlying array of 10 elements) for each Vertex 
 that has no Messages to be delivered. That's a waste of memory and time. This 
 patch replaces it with the EMPTY_LIST of the Collections utility class.
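
 The idea can be sketched outside Giraph as follows. The class and method
 names are hypothetical and the inbox map is a made-up stand-in for the
 per-vertex message store; only java.util.Collections.emptyList() is the
 real API the issue title refers to.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class EmptyMessageListSketch {
    /**
     * Returns the messages queued for a vertex, or the shared immutable empty
     * list when there are none, instead of allocating a fresh ArrayList.
     */
    static <I, M> List<M> messagesFor(Map<I, List<M>> inbox, I vertexId) {
        List<M> messages = inbox.get(vertexId);
        return messages != null ? messages : Collections.<M>emptyList();
    }

    public static void main(String[] args) {
        Map<Long, List<String>> inbox = new HashMap<>();
        inbox.put(1L, new ArrayList<>(Collections.singletonList("hello")));
        System.out.println(messagesFor(inbox, 1L));
        // Vertices 2 and 3 have no messages and share one empty list instance.
        System.out.println(messagesFor(inbox, 2L) == messagesFor(inbox, 3L));
    }
}
```

 The shared list is immutable, so this only fits call sites that read the
 messages and never add to the returned list.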





[jira] [Updated] (GIRAPH-126) Use Collections.emptyList() in BasicRPCCommunications.java

2012-01-18 Thread André Kelpe (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

André Kelpe updated GIRAPH-126:
---

Attachment: GIRAPH-126.patch

remove empty lines and grant apache the correct lines

 Use Collections.emptyList() in BasicRPCCommunications.java
 --

 Key: GIRAPH-126
 URL: https://issues.apache.org/jira/browse/GIRAPH-126
 Project: Giraph
  Issue Type: Improvement
Reporter: André Kelpe
Assignee: André Kelpe
Priority: Minor
 Attachments: GIRAPH-126.patch, GIRAPH-126.patch, GIRAPH-126.patch


 I am doing some tests with giraph and I am having some memory problems. While 
 I was browsing through the codebase I saw that you are allocating a new 
 ArrayList (which has an underlying array of 10 elements) for each Vertex 
 that has no Messages to be delivered. That's a waste of memory and time. This 
 patch replaces it with the EMPTY_LIST of the Collections utility class.





[jira] [Updated] (GIRAPH-125) Bug in LongDoubleFloatDoubleVertex.sendMsgToAllEdges()

2012-01-18 Thread Yuanyuan Tian (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanyuan Tian updated GIRAPH-125:
-

Attachment: LongDoubleFloatDoubleVertex.java.patch

 Bug in LongDoubleFloatDoubleVertex.sendMsgToAllEdges()
 --

 Key: GIRAPH-125
 URL: https://issues.apache.org/jira/browse/GIRAPH-125
 Project: Giraph
  Issue Type: Bug
  Components: graph
Affects Versions: 0.1.0
Reporter: Yuanyuan Tian
Assignee: Yuanyuan Tian
  Labels: patch
 Fix For: 0.1.0

 Attachments: LongDoubleFloatDoubleVertex.java.patch

   Original Estimate: 5m
  Remaining Estimate: 5m

 I just found a bug in the sendMsgToAllEdges() function of the 
 LongDoubleFloatDoubleVertex class. The segment of the code that contains the 
 bug is:
 final LongWritable destVertex = new LongWritable();
 final MutableVertex<LongWritable, DoubleWritable, FloatWritable,
     DoubleWritable> vertex = this;
 verticesWithEdgeValues.forEachKey(new LongProcedure() {
     @Override
     public boolean apply(long destVertexId) {
         destVertex.set(destVertexId);
         vertex.sendMsg(destVertex, msg);
         return true;
     }
 });
 Here destVertex is a final object, but this single object is reused in the
 forEachKey callback many times. Each time its value is changed, but the
 same object is put into the underlying message list (a hashmap) through
 vertex.sendMsg. Because the single destVertex object has been put into the
 underlying hashmap again and again, destVertex.set(destVertexId) changes
 the existing keys in the hashmap. Eventually, every key added to the hash
 map has the same value as the last key.
 A simple fix is as follows:
 final MutableVertex<LongWritable, DoubleWritable, FloatWritable,
     DoubleWritable> vertex = this;
 verticesWithEdgeValues.forEachKey(new LongProcedure() {
     @Override
     public boolean apply(long destVertexId) {
         vertex.sendMsg(new LongWritable(destVertexId), msg);
         return true;
     }
 });
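
 The aliasing problem can be reproduced outside Giraph with a plain HashMap.
 The sketch below is only an illustration: MutableId is a hypothetical
 stand-in for a mutable key type such as LongWritable, and the two send
 methods are made up to contrast the reused-key pattern with the fresh-key
 fix.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ReusedKeySketch {
    /** Hypothetical stand-in for a mutable key type such as LongWritable. */
    static final class MutableId {
        private long value;

        MutableId(long value) { this.value = value; }
        void set(long value) { this.value = value; }
        long get() { return value; }

        @Override
        public boolean equals(Object o) {
            return o instanceof MutableId && ((MutableId) o).value == value;
        }

        @Override
        public int hashCode() { return Long.hashCode(value); }
    }

    /** Buggy pattern: a single key object is mutated and inserted repeatedly. */
    static Map<MutableId, List<String>> sendReusingKey(long[] destIds, String msg) {
        Map<MutableId, List<String>> out = new HashMap<>();
        MutableId dest = new MutableId(0);
        for (long id : destIds) {
            dest.set(id); // also changes every entry already keyed by dest
            out.computeIfAbsent(dest, k -> new ArrayList<>()).add(msg);
        }
        return out;
    }

    /** Fixed pattern: a fresh key object per destination. */
    static Map<MutableId, List<String>> sendWithFreshKeys(long[] destIds, String msg) {
        Map<MutableId, List<String>> out = new HashMap<>();
        for (long id : destIds) {
            out.computeIfAbsent(new MutableId(id), k -> new ArrayList<>()).add(msg);
        }
        return out;
    }

    /** Collects the distinct id values visible through the map's keys. */
    static Set<Long> distinctKeyValues(Map<MutableId, ?> map) {
        Set<Long> values = new TreeSet<>();
        for (MutableId key : map.keySet()) {
            values.add(key.get());
        }
        return values;
    }

    public static void main(String[] args) {
        long[] ids = {1, 2, 3};
        System.out.println(distinctKeyValues(sendReusingKey(ids, "m")));
        System.out.println(distinctKeyValues(sendWithFreshKeys(ids, "m")));
    }
}
```

 With the reused key every entry reports the last id (the first line printed
 is [3]), while fresh keys keep the destinations apart ([1, 2, 3]).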





[jira] [Updated] (GIRAPH-121) BasicVertexResolver should be implementation and VertexResolver should be interface

2012-01-08 Thread Claudio Martella (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Claudio Martella updated GIRAPH-121:


Description: After the change of naming in Vertex, VertexResolver and 
BasicVertexResolver naming should be synched.  (was: After change of naming in 
Vertex, VertexResolver and BasicVertexResolver naming should be synched.)
Summary: BasicVertexResolver should be implementation and 
VertexResolver should be interface  (was: BasicVertexResolver should 
implementation and VertexResolver should be interface)

 BasicVertexResolver should be implementation and VertexResolver should be 
 interface
 ---

 Key: GIRAPH-121
 URL: https://issues.apache.org/jira/browse/GIRAPH-121
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.70.0
Reporter: Claudio Martella
Assignee: Claudio Martella
Priority: Trivial
  Labels: newbie

 After the change of naming in Vertex, VertexResolver and BasicVertexResolver 
 naming should be synched.





[jira] [Updated] (GIRAPH-118) Clarify messages behavior in BasicVertex

2012-01-06 Thread Claudio Martella (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Claudio Martella updated GIRAPH-118:


  Description: 
initialize() can receive a null parameter for messages (at least that's what 
EdgeListVertex does). We should avoid that and pass an empty Iterable instead. 
That should be cheap for us inside of the InputFormat, just passing a static 
immutable empty list.

setMessages(Iterable<M>) should be changed to putMessages(Iterable<M>). The set 
prefix suggests an assignment, while setMessages is used to transfer the 
messages to the internal data structure the user is responsible for. 
putMessages() should clarify this.
Affects Version/s: 0.70.0
 Assignee: Claudio Martella

 Clarify messages behavior in BasicVertex
 

 Key: GIRAPH-118
 URL: https://issues.apache.org/jira/browse/GIRAPH-118
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.70.0
Reporter: Claudio Martella
Assignee: Claudio Martella
Priority: Minor

 initialize() can receive a null parameter for messages (at least that's what 
 EdgeListVertex does). We should avoid that and pass an empty Iterable 
 instead. That should be cheap for us inside of the InputFormat, just passing 
 a static immutable empty list.
 setMessages(Iterable<M>) should be changed to putMessages(Iterable<M>). The 
 set prefix suggests an assignment, while setMessages is used to transfer the 
 messages to the internal data structure the user is responsible for. 
 putMessages() should clarify this.





[jira] [Updated] (GIRAPH-119) VertexCombiner should work on Iterable<M> instead of List<M>

2012-01-06 Thread Claudio Martella (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Claudio Martella updated GIRAPH-119:


Attachment: GIRAPH-119.diff

Trivial refactor to solve the issue.

 VertexCombiner should work on Iterable<M> instead of List<M>
 

 Key: GIRAPH-119
 URL: https://issues.apache.org/jira/browse/GIRAPH-119
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.70.0
Reporter: Claudio Martella
Assignee: Claudio Martella
 Attachments: GIRAPH-119.diff


 Currently VertexCombiner expects a List<M>. It should be refactored to 
 Iterable<M> to sync with the Iterable-based BasicVertex message logic.





[jira] [Updated] (GIRAPH-118) Clarify messages behavior in BasicVertex

2012-01-06 Thread Claudio Martella (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Claudio Martella updated GIRAPH-118:


Attachment: GIRAPH-119.diff

Apparently the initialize() issue also applies to other parameters, such as 
the edges (and, although not documented, vertexId as well: see 
TestEdgeListVertex, for example).

With this small patch I only addressed the putMessages() issue; we can probably 
think about initialize() later.

 Clarify messages behavior in BasicVertex
 

 Key: GIRAPH-118
 URL: https://issues.apache.org/jira/browse/GIRAPH-118
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.70.0
Reporter: Claudio Martella
Assignee: Claudio Martella
Priority: Minor
 Attachments: GIRAPH-119.diff


 initialize() can receive a null parameter for messages (at least that's what 
 EdgeListVertex does). We should avoid that and pass an empty Iterable 
 instead. That should be cheap for us inside of the InputFormat, just passing 
 a static immutable empty list.
 setMessages(Iterable<M>) should be changed to putMessages(Iterable<M>). The 
 set prefix suggests an assignment, while setMessages is used to transfer the 
 messages to the internal data structure the user is responsible for. 
 putMessages() should clarify this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (GIRAPH-118) Clarify messages behavior in BasicVertex

2012-01-06 Thread Claudio Martella (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Claudio Martella updated GIRAPH-118:


Attachment: GIRAPH-118.diff

Messed up with issue number in patch filename, sorry :)

 Clarify messages behavior in BasicVertex
 

 Key: GIRAPH-118
 URL: https://issues.apache.org/jira/browse/GIRAPH-118
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.70.0
Reporter: Claudio Martella
Assignee: Claudio Martella
Priority: Minor
 Attachments: GIRAPH-118.diff, GIRAPH-119.diff


 initialize() can receive a null parameter for messages (at least that's what 
 EdgeListVertex does). We should avoid that and pass an empty Iterable 
 instead. That should be cheap for us inside of the InputFormat, just passing 
 a static immutable empty list.
 setMessages(Iterable<M>) should be changed to putMessages(Iterable<M>). The 
 set prefix suggests an assignment, while setMessages is used to transfer the 
 messages to the internal data structure the user is responsible for. 
 putMessages() should clarify this.





[jira] [Updated] (GIRAPH-117) DefaultWorkerContext should preserve the method signatures of WorkerContext

2012-01-02 Thread Sebastian Schelter (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-117:
--

Attachment: GIRAPH-117.patch

 DefaultWorkerContext should preserve the method signatures of WorkerContext
 ---

 Key: GIRAPH-117
 URL: https://issues.apache.org/jira/browse/GIRAPH-117
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.70.0
Reporter: Sebastian Schelter
Assignee: Sebastian Schelter
Priority: Trivial
 Attachments: GIRAPH-117.patch


 DefaultWorkerContext.preApplication() swallows the InstantiationException and 
 IllegalAccessException of WorkerContext.preApplication(). These should be 
 preserved for applications that want to register an aggregator in this method.





[jira] [Updated] (GIRAPH-108) Refactor code to run independently of Map/Reduce

2011-12-29 Thread Ed Kohlwey (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ed Kohlwey updated GIRAPH-108:
--

Attachment: GIRAPH-108.patch

This patch uses TaskAttemptContext instead of TaskInputOutputContext, which is 
a bit cleaner.

 Refactor code to run independently of Map/Reduce
 

 Key: GIRAPH-108
 URL: https://issues.apache.org/jira/browse/GIRAPH-108
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Reporter: Ed Kohlwey
 Attachments: GIRAPH-108, GIRAPH-108.patch


 It would be nice for Giraph to be refactored such that the code could 
 eventually be run outside of map/reduce. This will allow people to write 
 drivers that can run in the cool new resource manager frameworks like Mesos 
 and YARN, and eventually let the application's code base evolve to be 
 independent of map/reduce.





[jira] [Updated] (GIRAPH-114) Inconsistent message map handling in BasicRPCCommunications.LargeMessageFlushExecutor

2011-12-21 Thread Sebastian Schelter (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-114:
--

Attachment: GIRAPH-114.patch

 Inconsistent message map handling in 
 BasicRPCCommunications.LargeMessageFlushExecutor
 -

 Key: GIRAPH-114
 URL: https://issues.apache.org/jira/browse/GIRAPH-114
 Project: Giraph
  Issue Type: Bug
Affects Versions: 0.70.0
Reporter: Sebastian Schelter
Priority: Critical
 Attachments: GIRAPH-114.patch


 I'm currently implementing a simple algorithm to identify all the connected 
 components of a graph. The algorithm ran well in local IDE unit tests on 
 toy data and in a local single-node hadoop instance using a graph of ~100k 
 edges.
 When I tested it on a real cluster with the wikipedia pagelink graph (5.7M 
 vertices, 130M edges), I ran into strange exceptions like this:
 {noformat} 
 2011-12-21 12:03:57,015 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201112131541_0034_m_27_0: java.lang.IllegalStateException: run: Caught an unrecoverable exception flush: Got ExecutionException
   at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:641)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
   at org.apache.hadoop.mapred.Child.main(Child.java:253)
 Caused by: java.lang.IllegalStateException: flush: Got ExecutionException
   at org.apache.giraph.comm.BasicRPCCommunications.flush(BasicRPCCommunications.java:946)
   at org.apache.giraph.graph.BspServiceWorker.finishSuperstep(BspServiceWorker.java:916)
   at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:588)
   at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:632)
   ... 7 more
 Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: run: Impossible for no messages in 1603276
   at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
   at java.util.concurrent.FutureTask.get(FutureTask.java:83)
   at org.apache.giraph.comm.BasicRPCCommunications.flush(BasicRPCCommunications.java:941)
   ... 10 more
 Caused by: java.lang.IllegalStateException: run: Impossible for no messages in 1603276
   at org.apache.giraph.comm.BasicRPCCommunications$PeerFlushExecutor.run(BasicRPCCommunications.java:245)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {noformat} 
 The exception is thrown because a destination vertex with no messages to send
 is found in the data structure holding the outgoing messages.
 I tracked this behavior down:
 In *BasicRPCCommunications:541-546* the map holding the outgoing messages for 
 vertices of a particular machine is created. It's stored in two places: 
 _BasicRPCCommunications.outMessages_ and as the member variable 
 _outMessagesPerPeer_ of its _PeerConnection_ :
 {noformat} 
 outMsgMap = new HashMap<I, MsgList<M>>();
 outMessages.put(addrUnresolved, outMsgMap);
 PeerConnection peerConnection = new PeerConnection(outMsgMap, peer, isProxy);
 {noformat} 
   
 If a lot of messages are available for a particular vertex, a 
 large flush is triggered via _LargeMessageFlushExecutor_ (I guess this only 
 happened in the Wikipedia test). During this flush the list of messages for 
 the vertex is sent out and replaced with an empty list in 
 *BasicRPCCommunications:341*:
 {noformat}
 outMessageList = peerConnection.outMessagesPerPeer.get(destVertex);
 peerConnection.outMessagesPerPeer.put(destVertex, new MsgList<M>());
 {noformat}
 Now, in the last flush that is triggered at the end of the superstep, we 
 encounter an empty message list for the vertex, and therefore the exception is 
 thrown in *BasicRPCCommunications:228-247*:
 {noformat}
 for (Entry<I, MsgList<M>> entry : 
     peerConnection.outMessagesPerPeer.entrySet()) {
   ...
   if (entry.getValue().isEmpty()) {
     throw new IllegalStateException(...);
   }
 }
 {noformat}
 Simply removing the list for the vertex when executing the large flush solved 
 the issue.
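
The fix described above can be sketched in plain Java. This is only an illustration under assumptions: `OutgoingMessages`, `addMessage`, `largeFlush` and `finalFlush` are hypothetical stand-ins, messages are plain strings, and none of this is the actual BasicRPCCommunications code. The point is that the large flush removes the map entry rather than leaving an empty list behind, so the emptiness check in the final flush never fires.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the per-peer outgoing message map; not Giraph code. */
class OutgoingMessages {
    // Destination vertex id -> messages waiting to be sent to that vertex.
    private final Map<Long, List<String>> outMessagesPerPeer = new HashMap<>();

    void addMessage(long destVertex, String message) {
        outMessagesPerPeer.computeIfAbsent(destVertex, k -> new ArrayList<>()).add(message);
    }

    /** Large flush: remove the entry instead of replacing it with an empty list. */
    List<String> largeFlush(long destVertex) {
        List<String> sent = outMessagesPerPeer.remove(destVertex);
        return sent == null ? new ArrayList<>() : sent;
    }

    /** End-of-superstep flush: keeps the "no empty list" invariant check. */
    int finalFlush() {
        int sentCount = 0;
        for (Map.Entry<Long, List<String>> entry : outMessagesPerPeer.entrySet()) {
            if (entry.getValue().isEmpty()) {
                throw new IllegalStateException(
                    "run: Impossible for no messages in " + entry.getKey());
            }
            sentCount += entry.getValue().size();
        }
        outMessagesPerPeer.clear();
        return sentCount;
    }

    public static void main(String[] args) {
        OutgoingMessages out = new OutgoingMessages();
        out.addMessage(1L, "a");
        out.addMessage(1L, "b");
        out.addMessage(2L, "c");
        out.largeFlush(1L); // vertex 1 is gone from the map afterwards
        System.out.println("final flush sent " + out.finalFlush() + " message(s)");
    }
}
```

Had `largeFlush` put a fresh empty list back into the map, `finalFlush` would throw for that vertex, which matches the failure in the stack trace.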

[jira] [Updated] (GIRAPH-109) GiraphRunner should provide support for combiners

2011-12-21 Thread Sebastian Schelter (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-109:
--

Attachment: GIRAPH-109.patch

Patch that allows specifying VertexCombiner, AggregatorWriter and WorkerContext.


 GiraphRunner should provide support for combiners
 -

 Key: GIRAPH-109
 URL: https://issues.apache.org/jira/browse/GIRAPH-109
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.70.0
Reporter: Sebastian Schelter
 Attachments: GIRAPH-109.patch


 Currently there's no way to tell GiraphRunner that you want to use a 
 Combiner. A simple option should be added, similar to the way in- and 
 outputformats are specified.





[jira] [Updated] (GIRAPH-113) Change cast to Vertex used in prepareSuperstep() to BasicVertex

2011-12-20 Thread Avery Ching (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching updated GIRAPH-113:
---

Attachment: GIRAPH-113.patch

 Change cast to Vertex used in prepareSuperstep() to BasicVertex
 ---

 Key: GIRAPH-113
 URL: https://issues.apache.org/jira/browse/GIRAPH-113
 Project: Giraph
  Issue Type: Bug
Reporter: Yuanyuan Tian
Assignee: Avery Ching
Priority: Minor
 Attachments: GIRAPH-113.patch


 Hi,
 I decided to use LongDoubleFloatDoubleVertex in a graph algorithm because it 
 uses more compact and efficient Mahout collections. However, I ran into an 
 error when running the algorithm:
 java.lang.ClassCastException: 
 org.apache.giraph.graph.LongDoubleFloatDoubleVertex cannot be cast to 
 org.apache.giraph.graph.Vertex
 at 
 org.apache.giraph.comm.BasicRPCCommunications.prepareSuperstep(BasicRPCCommunications.java:1016)
 at 
 org.apache.giraph.graph.BspServiceWorker.startSuperstep(BspServiceWorker.java:843)
 at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:569)
 at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:728)
 ... 7 more
 Basically, the problem is that in BasicRPCCommunications.prepareSuperstep(), 
 LongDoubleFloatDoubleVertex instances are cast to Vertex in the following code 
 fragment. But LongDoubleFloatDoubleVertex inherits from BasicVertex instead 
 of Vertex.
 if (vertex != null) {
   ((MutableVertex<I, V, E, M>) vertex).setVertexId(vertexIndex);
   partition.putVertex((Vertex<I, V, E, M>) vertex);
 } else if (originalVertex != null) {
   partition.removeVertex(originalVertex.getVertexId());
 }
 I made a simple change: cast LongDoubleFloatDoubleVertex to BasicVertex. The 
 problem went away, and the algorithm finished without any error. But I am not 
 sure whether this change has any implications for other parts of the code. So, 
 I hope to get some comments from the Giraph developers.
 if (vertex != null) {
   ((MutableVertex<I, V, E, M>) vertex).setVertexId(vertexIndex);
   partition.putVertex((BasicVertex<I, V, E, M>) vertex);
 } else if (originalVertex != null) {
   partition.removeVertex(originalVertex.getVertexId());
 }
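
The cast failure reported above can be reproduced with a stripped-down hierarchy. The sketch below is only an illustration: `BasicVertexSketch`, `VertexSketch`, `PrimitiveVertexSketch` and `CastDemo` are hypothetical names, and the real Giraph classes are generic and far larger. It assumes only what the report states, namely that both concrete vertex types share BasicVertex as an ancestor and that neither inherits from the other.

```java
/** Hypothetical, simplified base type playing the role of BasicVertex. */
abstract class BasicVertexSketch {
    private long vertexId;

    void setVertexId(long id) {
        vertexId = id;
    }

    long getVertexId() {
        return vertexId;
    }
}

/** Plays the role of Vertex. */
class VertexSketch extends BasicVertexSketch { }

/** Plays the role of LongDoubleFloatDoubleVertex: a sibling of VertexSketch, not a subclass. */
class PrimitiveVertexSketch extends BasicVertexSketch { }

class CastDemo {
    /** Returns true if casting to the sibling type throws ClassCastException. */
    static boolean castToSiblingFails(Object vertex) {
        try {
            VertexSketch unused = (VertexSketch) vertex;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    /** Casting to the shared base type works for both concrete classes. */
    static long castToBaseWorks(Object vertex, long id) {
        BasicVertexSketch v = (BasicVertexSketch) vertex;
        v.setVertexId(id);
        return v.getVertexId();
    }

    public static void main(String[] args) {
        Object vertex = new PrimitiveVertexSketch();
        System.out.println("cast to sibling fails: " + castToSiblingFails(vertex));
        System.out.println("id after cast to base: " + castToBaseWorks(vertex, 42L));
    }
}
```

Casting to the common base type is safe as long as the callee only needs members declared on that base type, which is the open question raised in the report.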





[jira] [Updated] (GIRAPH-106) Refactor prepareSuperstep() to make setMessages(Iterable<M> messages) package-private

2011-12-19 Thread Avery Ching (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching updated GIRAPH-106:
---

Attachment: GIRAPH-106.diff

 Refactor prepareSuperstep() to make setMessages(Iterable<M> messages) 
 package-private
 -

 Key: GIRAPH-106
 URL: https://issues.apache.org/jira/browse/GIRAPH-106
 Project: Giraph
  Issue Type: Improvement
Reporter: Avery Ching
Assignee: Avery Ching
 Attachments: GIRAPH-106.diff


 GIRAPH-80 revealed that some refactoring is needed to make setMessages() 
 package-private (to prevent users from messing around with internals).





[jira] [Updated] (GIRAPH-108) Refactor code to run independently of Map/Reduce

2011-12-19 Thread Ed Kohlwey (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ed Kohlwey updated GIRAPH-108:
--

Attachment: GIRAPH-108

 Refactor code to run independently of Map/Reduce
 

 Key: GIRAPH-108
 URL: https://issues.apache.org/jira/browse/GIRAPH-108
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Reporter: Ed Kohlwey
 Attachments: GIRAPH-108


 It would be nice for Giraph to be refactored such that the code could 
 eventually be run outside of map/reduce. This will allow people to write 
 drivers that can run in the cool new resource manager frameworks like Mesos 
 and YARN, and eventually let the application's code base evolve to be 
 independent of map/reduce.





[jira] [Updated] (GIRAPH-105) BspServiceMaster.checkWorkers() should return empty lists instead of null

2011-12-18 Thread Sebastian Schelter (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-105:
--

Attachment: GIRAPH-105-2.patch

Updated patch to reflect aching's suggestions; ran local and pseudo-distributed 
unit tests.

 BspServiceMaster.checkWorkers() should return empty lists instead of null
 -

 Key: GIRAPH-105
 URL: https://issues.apache.org/jira/browse/GIRAPH-105
 Project: Giraph
  Issue Type: Bug
Affects Versions: 0.70.0
Reporter: Sebastian Schelter
Priority: Minor
 Attachments: GIRAPH-105-2.patch, GIRAPH-105.patch


 BspServiceMaster.checkWorkers() is invoked in 
 BspServiceMaster.coordinateSuperstep() and in 
 BspServiceMaster.createInputSplits(). Both check for an empty list to fail 
 the job in case something has gone wrong. However, checkWorkers() returns 
 null in case of problems, causing an NPE in the calling code.
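
A minimal sketch of the proposed contract, with assumptions stated up front: `WorkerCheckSketch`, its parameters and the string worker names are hypothetical, and the real checkWorkers() signature is not shown in this report. The idea is simply that the method signals a problem with an empty list, so callers that test for emptiness cannot hit an NPE.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class WorkerCheckSketch {
    /** Hypothetical stand-in for checkWorkers(): returns an empty list, never null. */
    static List<String> checkWorkers(List<String> healthyWorkers, int minWorkers) {
        if (healthyWorkers == null || healthyWorkers.size() < minWorkers) {
            return Collections.emptyList();
        }
        return new ArrayList<>(healthyWorkers);
    }

    /** Caller in the style described above: an emptiness check is safe. */
    static boolean shouldFailJob(List<String> healthyWorkers, int minWorkers) {
        return checkWorkers(healthyWorkers, minWorkers).isEmpty();
    }

    public static void main(String[] args) {
        System.out.println("fail job: " + shouldFailJob(null, 1));
    }
}
```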





[jira] [Updated] (GIRAPH-105) BspServiceMaster.checkWorkers() should return empty lists instead of null

2011-12-17 Thread Sebastian Schelter (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-105:
--

Attachment: GIRAPH-105.patch

 BspServiceMaster.checkWorkers() should return empty lists instead of null
 -

 Key: GIRAPH-105
 URL: https://issues.apache.org/jira/browse/GIRAPH-105
 Project: Giraph
  Issue Type: Bug
Affects Versions: 0.70.0
Reporter: Sebastian Schelter
Priority: Minor
 Attachments: GIRAPH-105.patch


 BspServiceMaster.checkWorkers() is invoked in 
 BspServiceMaster.coordinateSuperstep() and in 
 BspServiceMaster.createInputSplits(). Both check for an empty list to fail 
 the job in case something has gone wrong. However, checkWorkers() returns 
 null in case of problems, causing an NPE in the calling code.





[jira] [Updated] (GIRAPH-57) Add new RPC call (putVertexIdMessagesList) to batch putMsgList RPCs together

2011-12-15 Thread Avery Ching (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-57?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching updated GIRAPH-57:
--

Attachment: GIRAPH-57.diff.2

With the final patch (+Apache license header on VertexIdMessages.java).

 Add new RPC call (putVertexIdMessagesList) to batch putMsgList RPCs together
 

 Key: GIRAPH-57
 URL: https://issues.apache.org/jira/browse/GIRAPH-57
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Avery Ching
 Attachments: GIRAPH-57.diff, GIRAPH-57.diff.2


 Right now messages are sent to a vertex one at a time.  It would be good to 
 have a putMsgs call that could send messages to multiple vertices (all hosted 
 on the same worker).  We'd save a huge number of individual RPC calls at the 
 expense of having fewer calls with larger payloads.
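
The trade-off described above can be sketched by counting simulated calls. Everything here is an assumption made for illustration: `BatchingSketch` is a hypothetical class, the method names `putMsgList` and `putVertexIdMessagesList` are borrowed from the issue title but their signatures are invented, and no real RPC is performed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BatchingSketch {
    // Number of simulated remote calls made so far.
    private int rpcCount = 0;

    /** One call per destination vertex. */
    void sendPerVertex(Map<Long, List<String>> messagesByVertex) {
        for (Map.Entry<Long, List<String>> e : messagesByVertex.entrySet()) {
            putMsgList(e.getKey(), e.getValue());
        }
    }

    /** One call carrying the messages of every vertex hosted on the same worker. */
    void sendBatched(Map<Long, List<String>> messagesByVertex) {
        if (!messagesByVertex.isEmpty()) {
            putVertexIdMessagesList(new LinkedHashMap<>(messagesByVertex));
        }
    }

    // Simulated RPC endpoints: they only count invocations.
    private void putMsgList(long vertexId, List<String> messages) {
        rpcCount++;
    }

    private void putVertexIdMessagesList(Map<Long, List<String>> batch) {
        rpcCount++;
    }

    int getRpcCount() {
        return rpcCount;
    }

    public static void main(String[] args) {
        Map<Long, List<String>> pending = new LinkedHashMap<>();
        for (long id = 0; id < 1000; id++) {
            List<String> msgs = new ArrayList<>();
            msgs.add("hello");
            pending.put(id, msgs);
        }
        BatchingSketch perVertex = new BatchingSketch();
        perVertex.sendPerVertex(pending);
        BatchingSketch batched = new BatchingSketch();
        batched.sendBatched(pending);
        System.out.println(perVertex.getRpcCount() + " calls vs " + batched.getRpcCount());
    }
}
```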





[jira] [Updated] (GIRAPH-104) Save half of maximum memory used from messaging

2011-12-13 Thread Avery Ching (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching updated GIRAPH-104:
---

Attachment: GIRAPH-104.diff

 Save half of maximum memory used from messaging
 ---

 Key: GIRAPH-104
 URL: https://issues.apache.org/jira/browse/GIRAPH-104
 Project: Giraph
  Issue Type: Improvement
Reporter: Avery Ching
Assignee: Avery Ching
Priority: Critical
 Attachments: GIRAPH-104.diff


 Currently, the amount of memory that Giraph uses for messaging is huge.  This 
 JIRA will reduce the messaging memory by half and provide periodic updates of 
 memory for debugging.  Details are below:
 Refactored RandomMessageBenchmark to an internal vertex class.  Added 
 aggregators to RandomMessageBenchmark to track bytes, messages, and time for 
 the messaging.  Adjusted postSuperstep() to be called after the flush() 
 for more accurate timings.
 Added periodic (per-minute) updates for message flushing (which can take a 
 while, especially on the memory benchmark).  This helps to see how progress is 
 going and gives an ETA.
 Memory optimizations include:
 - Clear the message list after computation 
 - Free vertex messages on the source as the flush is going on 
 - TreeMap -> HashMap for VertexMutations
 - Sizing the ArrayList properly in transientInMessages
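
Two of the optimizations listed above (sizing the list up front and clearing it after computation) can be illustrated in a few lines. This is a sketch under assumptions, not the patch itself: `MessageMemorySketch` and its methods are hypothetical, and messages are modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MessageMemorySketch {
    /** Copies messages into a list allocated at its final size, avoiding repeated array growth. */
    static List<String> copyPresized(List<String> incoming) {
        List<String> transientInMessages = new ArrayList<>(incoming.size());
        transientInMessages.addAll(incoming);
        return transientInMessages;
    }

    /**
     * Stands in for a per-vertex computation: once the messages are consumed,
     * the list is cleared so the message objects become eligible for garbage collection.
     */
    static int computeAndClear(List<String> messages) {
        int processed = messages.size(); // placeholder for the real computation
        messages.clear();
        return processed;
    }

    public static void main(String[] args) {
        List<String> inbox = copyPresized(Arrays.asList("m1", "m2", "m3"));
        int processed = computeAndClear(inbox);
        System.out.println(processed + " processed, " + inbox.size() + " retained");
    }
}
```

Note that `ArrayList.clear()` releases the element references but keeps the backing array, so the saving comes from the messages themselves rather than from the list's capacity.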





[jira] [Updated] (GIRAPH-103) Added properties for commonly used package version to pom.xml

2011-12-09 Thread Avery Ching (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching updated GIRAPH-103:
---

Attachment: GIRAPH-103.diff

 Added properties for commonly used package version to pom.xml
 -

 Key: GIRAPH-103
 URL: https://issues.apache.org/jira/browse/GIRAPH-103
 Project: Giraph
  Issue Type: Improvement
  Components: build
Reporter: Avery Ching
Priority: Trivial
 Attachments: GIRAPH-103.diff








[jira] [Updated] (GIRAPH-10) Aggregators are not exported

2011-12-05 Thread Claudio Martella (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-10?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Claudio Martella updated GIRAPH-10:
---

Attachment: GIRAPH-10.diff

Fixed according to Avery's feedback.

About tabs and spaces, sorry about that; I probably wrote a file before setting 
up my IDE correctly.

 Aggregators are not exported
 

 Key: GIRAPH-10
 URL: https://issues.apache.org/jira/browse/GIRAPH-10
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Claudio Martella
Priority: Minor
 Attachments: GIRAPH-10.diff, GIRAPH-10.diff, GIRAPH-10.diff


 Currently, aggregator values cannot be saved after a Giraph job.  There 
 should be a way to do this.





[jira] [Updated] (GIRAPH-10) Aggregators are not exported

2011-12-05 Thread Claudio Martella (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-10?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Claudio Martella updated GIRAPH-10:
---

Attachment: GIRAPH-10.diff

Fixed according to the last feedback. Committing.

 Aggregators are not exported
 

 Key: GIRAPH-10
 URL: https://issues.apache.org/jira/browse/GIRAPH-10
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Claudio Martella
Priority: Minor
 Attachments: GIRAPH-10.diff, GIRAPH-10.diff, GIRAPH-10.diff, 
 GIRAPH-10.diff


 Currently, aggregator values cannot be saved after a Giraph job.  There 
 should be a way to do this.





[jira] [Updated] (GIRAPH-10) Aggregators are not exported

2011-12-03 Thread Claudio Martella (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-10?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Claudio Martella updated GIRAPH-10:
---

Attachment: GIRAPH-10.diff

Exports Aggregators, adds unit test for the new class. Passes unit tests.

 Aggregators are not exported
 

 Key: GIRAPH-10
 URL: https://issues.apache.org/jira/browse/GIRAPH-10
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Claudio Martella
Priority: Minor
 Attachments: GIRAPH-10.diff


 Currently, aggregator values cannot be saved after a Giraph job.  There 
 should be a way to do this.




