[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-19 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257343#comment-13257343
 ] 

Avery Ching commented on GIRAPH-153:


Okay, so it's still tonight (even though it is 12:44 AM).  =)

Brian, I've done an initial look at the code on reviewboard 
https://reviews.apache.org/r/4801/.  Please take a look.  Thanks.

 HBase/Accumulo Input and Output formats
 ---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano
 Attachments: GIRAPH-153.patch


 Four abstract classes that wrap their respective delegate input/output
 formats for easy hooks into vertex input format subclasses. I've included
 some sample programs that show two very simple graph algorithms. I have a
 graph generator that builds out a very simple directed structure, starting
 with a few 'root' nodes. Root nodes are defined as nodes which are not
 listed as a child anywhere in the graph.

 Algorithm 1) AccumuloRootMarker.java -- Accumulo as read/write source.
 Every vertex starts thinking it's a root. At superstep 0, send a message
 down to each child as a non-root notification. After superstep 1, only root
 nodes will have never been messaged.

 Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1
 by bundling the notification logic followed by root node propagation. Once
 we've marked the appropriate nodes as roots, tell every child which roots it
 can be traced back to via one or more spanning trees. This will take N + 2
 supersteps, where N is the maximum number of hops from any root to any leaf,
 plus 2 supersteps for the initial root flagging.

 I've included all relevant code plus DistributedCacheHelper.java for
 recursive cache file and archive searches. It is more Hadoop-centric than
 Giraph-centric, but these jobs use it, so I figured why not commit it here.

 These have been tested through the local JobRunner, pseudo-distributed on
 the aforementioned hardware, and fully distributed on EC2. More details in
 the comments.
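
 The two algorithms described above can be sketched roughly as follows. This
 is a plain-Java simulation, not code from the patch: it uses no Giraph,
 HBase, or Accumulo classes, and the names RootMarkerSketch, findRoots, and
 propagateRoots are hypothetical. Each pass of the while loop stands in for
 one superstep, and a root is assumed to count itself among the roots it can
 be traced back to.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Plain-Java simulation of the two root-marking passes; no Giraph classes are used. */
final class RootMarkerSketch {

    /** Pass 1: every vertex starts as a root; any vertex that is messaged is demoted. */
    static Set<String> findRoots(Map<String, List<String>> children) {
        Set<String> roots = new TreeSet<>(children.keySet());
        for (List<String> kids : children.values()) {
            roots.removeAll(kids); // a "non-root notification" reached these vertices
        }
        return roots;
    }

    /** Pass 2: roots push their id downwards; each vertex forwards only ids new to it. */
    static Map<String, Set<String>> propagateRoots(Map<String, List<String>> children) {
        Map<String, Set<String>> known = new TreeMap<>();
        Map<String, Set<String>> inbox = new HashMap<>();
        for (String root : findRoots(children)) {
            inbox.put(root, new TreeSet<>(Collections.singleton(root)));
        }
        while (!inbox.isEmpty()) { // one loop iteration stands in for one superstep
            Map<String, Set<String>> next = new HashMap<>();
            for (Map.Entry<String, Set<String>> message : inbox.entrySet()) {
                Set<String> seen = known.computeIfAbsent(message.getKey(), k -> new TreeSet<>());
                Set<String> fresh = new TreeSet<>(message.getValue());
                fresh.removeAll(seen);
                if (fresh.isEmpty()) {
                    continue; // nothing new arrived, so this vertex stays quiet
                }
                seen.addAll(fresh);
                List<String> kids =
                    children.getOrDefault(message.getKey(), Collections.<String>emptyList());
                for (String child : kids) {
                    next.computeIfAbsent(child, k -> new TreeSet<>()).addAll(fresh);
                }
            }
            inbox = next;
        }
        return known;
    }

    public static void main(String[] args) {
        // Hypothetical graph: a -> c, b -> c, c -> d.
        Map<String, List<String>> children = new HashMap<>();
        children.put("a", Arrays.asList("c"));
        children.put("b", Arrays.asList("c"));
        children.put("c", Arrays.asList("d"));
        children.put("d", Collections.<String>emptyList());
        System.out.println("roots: " + findRoots(children));
        System.out.println("traced back to: " + propagateRoots(children));
    }
}
```

 Forwarding only previously unseen root ids keeps the message volume down and
 lets the loop terminate even if the input graph happens to contain a cycle.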

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-180) Publish SNAPSHOTs and released artifacts in the Maven repository

2012-04-18 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256246#comment-13256246
 ] 

Avery Ching commented on GIRAPH-180:


This is a good idea.  The only question I have, though, is whether we would
publish different jars for every version of Hadoop.

 Publish SNAPSHOTs and released artifacts in the Maven repository
 

 Key: GIRAPH-180
 URL: https://issues.apache.org/jira/browse/GIRAPH-180
 Project: Giraph
  Issue Type: Improvement
  Components: build
Affects Versions: 0.1.0
Reporter: Paolo Castagna
Priority: Minor
   Original Estimate: 4h
  Remaining Estimate: 4h

 Currently Giraph uses Maven to drive its build.
 However, no Maven artifacts nor SNAPSHOTs are published in the Apache Maven 
 repository or Maven central.
 It would be useful to have Apache Giraph artifacts and SNAPSHOTs published 
 and enable people to use Giraph without recompiling themselves.
 Right now users can checkout Giraph, mvn install it and use this for their 
 dependency:
 <dependency>
   <groupId>org.apache.giraph</groupId>
   <artifactId>giraph</artifactId>
   <version>0.2-SNAPSHOT</version>
 </dependency>
 So, it's not that bad, but it can be better. :-)





[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-18 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256767#comment-13256767
 ] 

Avery Ching commented on GIRAPH-153:


I think hosting the submodule on github would produce one more barrier to 
entry.  I prefer to have it with Giraph directly.  Anyone else?





[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-18 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256856#comment-13256856
 ] 

Avery Ching commented on GIRAPH-153:


I'll take a look at this patch tonight Brian. =)





[jira] [Commented] (GIRAPH-181) Add Hadoop 1.0 profile to pom.xml

2012-04-16 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254890#comment-13254890
 ] 

Avery Ching commented on GIRAPH-181:


+1, committed.

 Add Hadoop 1.0 profile to pom.xml
 -

 Key: GIRAPH-181
 URL: https://issues.apache.org/jira/browse/GIRAPH-181
 Project: Giraph
  Issue Type: Improvement
  Components: build
Affects Versions: 0.2.0
Reporter: Eugene Koontz
Assignee: Eugene Koontz
 Fix For: 0.2.0

 Attachments: GIRAPH-181.patch, GIRAPH-181.patch


 Hadoop 1.0.x is now considered the current stable version of Hadoop, 
 according to http://hadoop.apache.org/common/releases.html#Download .
 This JIRA is to add support within Giraph's maven profile for the 1.0.x 
 Hadoop release. 





[jira] [Commented] (GIRAPH-184) Upgrade to junit4

2012-04-14 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254152#comment-13254152
 ] 

Avery Ching commented on GIRAPH-184:


Thanks!

 Upgrade to junit4
 -

 Key: GIRAPH-184
 URL: https://issues.apache.org/jira/browse/GIRAPH-184
 Project: Giraph
  Issue Type: Bug
Reporter: Devaraj K
Assignee: Devaraj K

 Presently Giraph uses JUnit 3.8.1. We can upgrade to JUnit 4





[jira] [Commented] (GIRAPH-183) Add Claudio's FOSDEM presentation (slides and video) to the site

2012-04-13 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253488#comment-13253488
 ] 

Avery Ching commented on GIRAPH-183:


+1.  This is great stuff Claudio.

 Add Claudio's FOSDEM presentation (slides and video) to the site
 

 Key: GIRAPH-183
 URL: https://issues.apache.org/jira/browse/GIRAPH-183
 Project: Giraph
  Issue Type: Improvement
  Components: site
Reporter: Claudio Martella
Assignee: Claudio Martella
Priority: Trivial
  Labels: newbie
 Attachments: GIRAPH-183.diff


 Presentation: 
 http://prezi.com/9ake_klzwrga/apache-giraph-distributed-graph-processing-in-the-cloud/
 Video: http://www.youtube.com/watch?v=3ZrqPEIPRe4, 
 http://www.youtube.com/watch?v=BmRaejKGeDM





[jira] [Commented] (GIRAPH-183) Add Claudio's FOSDEM presentation (slides and video) to the site

2012-04-13 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253655#comment-13253655
 ] 

Avery Ching commented on GIRAPH-183:


Are the problems related to GIRAPH-168?





[jira] [Commented] (GIRAPH-168) Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than HADOOP_FACEBOOK) and remove usage of HADOOP

2012-04-10 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250476#comment-13250476
 ] 

Avery Ching commented on GIRAPH-168:


Eugene, I committed your patch, which passed 'mvn verify'.  However, it seems
to have changed how the JUnit test report is produced.

Here's the result after your patch (build 99):

Recording test results
No test report files were found. Configuration error?
Build step 'Publish JUnit test result report' changed build result to FAILURE
Updating GIRAPH-168
Finished: FAILURE

https://builds.apache.org/job/Giraph-trunk-Commit/99/

The last commit seemed to have the JUnit test result reports just fine 
(https://builds.apache.org/job/Giraph-trunk-Commit/98/). 

Can you please take a look?

 Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than 
 HADOOP_FACEBOOK) and remove usage of HADOOP
 -

 Key: GIRAPH-168
 URL: https://issues.apache.org/jira/browse/GIRAPH-168
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Eugene Koontz
Assignee: Eugene Koontz
 Attachments: GIRAPH-168.patch, GIRAPH-168.patch, GIRAPH-168.patch, 
 GIRAPH-168.patch, GIRAPH-168.patch, GIRAPH-168.patch


 This JIRA relates to the mail thread here: 
 http://mail-archives.apache.org/mod_mbox/incubator-giraph-dev/201203.mbox/browser
 Currently we check for the munge flags HADOOP, HADOOP_FACEBOOK and 
 HADOOP_NON_SECURE when using munge in a few places. Hopefully we can 
 eliminate usage of munge in the future, but until then, we can mitigate the 
 complexity by consolidating the number of flags checked. This JIRA renames 
 HADOOP_FACEBOOK to HADOOP_SECURE, and removes usages of HADOOP, to handle the 
 same conditional compilation requirements. It also makes it easier to add 
 more maven profiles so that we can easily increase our hadoop version 
 coverage.
 This patch modifies the existing hadoop_facebook profile to use the new 
 HADOOP_SECURE munge flag, rather than HADOOP_FACEBOOK.
 It also adds a new hadoop maven profile, hadoop_trunk, which also sets 
 HADOOP_SECURE. 
 Finally, it adds a default profile, hadoop_0.20.203. This is needed so that 
 we can specify its dependencies separately from hadoop_trunk, because the 
 hadoop dependencies have changed between trunk and 0.205.0 - the former 
 requires hadoop-common, hadoop-mapreduce-client-core, and 
 hadoop-mapreduce-client-common, whereas the latter requires hadoop-core. 
 With this patch, the following passes:
 {code}
 mvn clean verify
 mvn -Phadoop_trunk clean verify
 mvn -Phadoop_0.20.203 clean verify
 {code}
 Current problems: 
 * I left in place the usage of HADOOP_NON_SECURE, but note that the profile 
 that uses this is hadoop_non_secure, which fails to compile on trunk: 
 https://issues.apache.org/jira/browse/GIRAPH-167 .
 * I couldn't get -Phadoop_facebook to work; does this work outside of 
 Facebook?
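
 For readers unfamiliar with munge, the conditional compilation discussed
 above might look roughly like the sketch below. This is an illustration under
 assumptions, not code from the patch: the class and method names are
 hypothetical, and the directive layout follows the comment-based if/else/end
 pattern as I understand munge to process it. Compiled as-is (without running
 munge), the javac compiler sees only the "secure" branch, because the
 alternative sits inside a block comment.

```java
/** Shows how munge directives keep one of two alternatives hidden inside comments. */
final class MungeSketch {

    /** Returns a label for whichever branch survived preprocessing. */
    static String hadoopFlavor() {
        /*if[HADOOP_NON_SECURE]
        return "non-secure";
        else[HADOOP_NON_SECURE]*/
        return "secure";
        /*end[HADOOP_NON_SECURE]*/
    }

    public static void main(String[] args) {
        // Without preprocessing, only the uncommented branch is compiled.
        System.out.println(hadoopFlavor());
    }
}
```

 Every extra flag multiplies the number of source variants that have to be
 built and tested, which is the complexity that consolidating the flags is
 meant to reduce.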





[jira] [Commented] (GIRAPH-168) Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than HADOOP_FACEBOOK) and remove usage of HADOOP

2012-04-10 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250874#comment-13250874
 ] 

Avery Ching commented on GIRAPH-168:


I can modify Hudson to execute the commands you used above.  Any
thoughts/comments?





[jira] [Commented] (GIRAPH-168) Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than HADOOP_FACEBOOK) and remove usage of HADOOP

2012-04-10 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250947#comment-13250947
 ] 

Avery Ching commented on GIRAPH-168:


I would ignore the facebook one for now (we can add it later), but I can try  

mvn -Phadoop_non_secure clean verify  
mvn -Phadoop_0.20.203 clean verify  
mvn clean verify  
mvn -Phadoop_0.23 clean verify  
mvn -Phadoop_trunk clean verify






[jira] [Commented] (GIRAPH-182) Provide SequenceFileVertexOutputFormat as an available OutputFormat

2012-04-10 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251180#comment-13251180
 ] 

Avery Ching commented on GIRAPH-182:


Agreed, would you like to work on it Pradeep?

 Provide SequenceFileVertexOutputFormat as an available OutputFormat
 ---

 Key: GIRAPH-182
 URL: https://issues.apache.org/jira/browse/GIRAPH-182
 Project: Giraph
  Issue Type: New Feature
  Components: lib
Reporter: Pradeep Gollakota
Priority: Minor

 SequenceFiles are heavily used in Hadoop. We should provide
 SequenceFileVertexOutputFormat. Since SequenceFileVertexInputFormat is
 already provided, it makes sense to also provide a mirroring OutputFormat.





[jira] [Commented] (GIRAPH-168) Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than HADOOP_FACEBOOK) and remove usage of HADOOP

2012-04-09 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249702#comment-13249702
 ] 

Avery Ching commented on GIRAPH-168:


Nice that you got it working with all the versions!  One question though, why 
is the line below needed in pom.xml?

<org.apache.hadoop.giraph.zkJar>giraph-0.2-SNAPSHOT-jar-with-dependencies.jar</org.apache.hadoop.giraph.zkJar>





[jira] [Commented] (GIRAPH-168) Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than HADOOP_FACEBOOK) and remove usage of HADOOP

2012-04-09 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250224#comment-13250224
 ] 

Avery Ching commented on GIRAPH-168:


+1.  Given this is a somewhat large change, I'll wait until tonight to see if 
anyone opposes it.  If not, I'll commit.





[jira] [Commented] (GIRAPH-171) total time in MasterThread.run() is calculated incorrectly

2012-04-06 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248175#comment-13248175
 ] 

Avery Ching commented on GIRAPH-171:


+1 Argh, it is inconsistent with the counter, GIRAPH_TIMERS_COUNTER_GROUP_NAME. 
 Thanks for the fix Eugene!

 total time in MasterThread.run() is calculated incorrectly
 --

 Key: GIRAPH-171
 URL: https://issues.apache.org/jira/browse/GIRAPH-171
 Project: Giraph
  Issue Type: Bug
Reporter: Eugene Koontz
Assignee: Eugene Koontz
 Attachments: GIRAPH-171.patch


 While running PageRankBenchmark, I was seeing in the output:
 {{graph.MasterThread(172): total: Took 1.3336739262910001E9 seconds.}}
 This was because currently, in {{MasterThread.run()}}, we have:
 {code}
 LOG.info("total: Took " +
          ((System.currentTimeMillis() / 1000.0d) -
           setupSecs) + " seconds.");
 {code}
 but it should be:
 {code}
 LOG.info("total: Took " +
          ((System.currentTimeMillis() - startMillis) /
           1000.0d) + " seconds.");
 {code}
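
 To see why the first form prints a value around 1.3E9, here is a small
 standalone sketch of the two calculations. It is not the MasterThread code:
 the class and method names are hypothetical, the timestamps are made-up
 values, and it assumes setupSecs holds a short duration in seconds rather
 than an absolute time.

```java
/** Contrasts the buggy and fixed elapsed-time calculations with fixed inputs. */
final class ElapsedTimeSketch {

    /** Buggy form: subtracts a duration from an absolute epoch time in seconds. */
    static double buggyTotalSecs(long nowMillis, double setupSecs) {
        return (nowMillis / 1000.0d) - setupSecs;
    }

    /** Fixed form: subtracts two absolute times, then converts to seconds. */
    static double fixedTotalSecs(long nowMillis, long startMillis) {
        return (nowMillis - startMillis) / 1000.0d;
    }

    public static void main(String[] args) {
        long startMillis = 1_333_673_900_000L;  // hypothetical job start (epoch millis)
        long nowMillis = startMillis + 26_291L; // hypothetical end, about 26 s later
        double setupSecs = 2.5d;                // hypothetical setup duration

        // The buggy value is essentially "seconds since 1970", not the job time.
        System.out.println("buggy: " + buggyTotalSecs(nowMillis, setupSecs) + " seconds");
        System.out.println("fixed: " + fixedTotalSecs(nowMillis, startMillis) + " seconds");
    }
}
```

 Passing the timestamps in as parameters, instead of calling
 System.currentTimeMillis() inside the methods, keeps the comparison
 repeatable.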





[jira] [Commented] (GIRAPH-77) Coordinator should expose a web interface with progress, vertex region assignments, etc.

2012-04-04 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246561#comment-13246561
 ] 

Avery Ching commented on GIRAPH-77:
---

Paolo, would you be interested in working on this? =)

 Coordinator should expose a web interface with progress, vertex region 
 assignments, etc.
 

 Key: GIRAPH-77
 URL: https://issues.apache.org/jira/browse/GIRAPH-77
 Project: Giraph
  Issue Type: New Feature
Reporter: Jakob Homan

 It would be nice if the coordinator worker had a web interface that showed 
 progress, splits, etc. during job execution. Right now it would duplicate 
 information currently being exposed through task status, but with the move to 
 YARN, it will be a necessity.  It would be great if we could do this in a 
 modern way to avoid the screen-scraping, etc. currently used to get 
 information from most other Hadoop projects' web interfaces.  The coordinator 
 could announce its address at the beginning or via status updates.





[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-03 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245480#comment-13245480
 ] 

Avery Ching commented on GIRAPH-153:


From what you've described, sounds good to me.  In the worst case, we can 
change it to a submodule if that makes more sense in the future.  I would like 
to use a similar approach for https://issues.apache.org/jira/browse/GIRAPH-93, 
as Jakob mentioned.

 HBase/Accumulo Input and Output formats
 ---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano

 Four abstract classes that wrap their respective delegate input/output formats 
 for easy hooks into vertex input format subclasses. I've included some sample 
 programs that show two very simple graph algorithms. I have a graph generator 
 that builds out a very simple directed structure, starting with a few 'root' 
 nodes. Root nodes are defined as nodes which are not listed as a child anywhere 
 in the graph. 
 Algorithm 1) AccumuloRootMarker.java  -- Accumulo as read/write source. Every 
 vertex starts thinking it's a root. At superstep 0, send a message down to each 
 child as a non-root notification. After superstep 1, only root nodes will have 
 never been messaged. 
 Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by 
 bundling the notification logic followed by root node propagation. Once we've 
 marked the appropriate nodes as roots, tell every child which roots it can be 
 traced back to via one or more spanning trees. This will take N + 2 supersteps, 
 where N is the maximum number of hops from any root to any leaf and the extra 
 2 supersteps cover the initial root flagging. 
 I've included all relevant code plus DistributedCacheHelper.java for recursive 
 cache file and archive searches. It is more Hadoop-centric than Giraph-specific, 
 but these jobs use it so I figured why not commit here. 
 These have been tested through the local JobRunner, pseudo-distributed on the 
 aforementioned hardware, and fully distributed on EC2. More details in the 
 comments.
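
The root-flagging idea from Algorithm 1 can be sketched outside of Giraph as a two-step simulation over an adjacency map. This is not the code from the patch: RootMarkerSketch, findRoots, and sampleGraph are hypothetical names, and plain Java collections stand in for vertices, messages, and the Accumulo table.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Sketch only: simulates the root-flagging supersteps with plain collections. */
final class RootMarkerSketch {
  /** Returns the ids of vertices that never receive a non-root notification. */
  static Set<Integer> findRoots(Map<Integer, List<Integer>> children) {
    // Every vertex, whether it appears as a parent or only as a child, starts as a root.
    Set<Integer> roots = new TreeSet<>(children.keySet());
    for (List<Integer> kids : children.values()) {
      roots.addAll(kids);
    }
    // Superstep 0: each vertex sends a non-root notification to each child.
    Set<Integer> messaged = new HashSet<>();
    for (List<Integer> kids : children.values()) {
      messaged.addAll(kids);
    }
    // Superstep 1: every vertex that received a message clears its root flag.
    roots.removeAll(messaged);
    return roots;
  }

  /** A tiny directed graph: 1 -> {2, 3}, 2 -> {4}, 5 -> {4}. */
  static Map<Integer, List<Integer>> sampleGraph() {
    Map<Integer, List<Integer>> g = new TreeMap<>();
    g.put(1, Arrays.asList(2, 3));
    g.put(2, Arrays.asList(4));
    g.put(5, Arrays.asList(4));
    return g;
  }

  public static void main(String[] args) {
    System.out.println("roots: " + findRoots(sampleGraph()));
  }
}
```

In the sample graph only vertices 1 and 5 are never listed as a child, so they are the ones left flagged as roots.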





[jira] [Commented] (GIRAPH-141) multigraph support in giraph

2012-04-03 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245484#comment-13245484
 ] 

Avery Ching commented on GIRAPH-141:


Yes, I also think this is an important feature.  Anyone want to work on it? =)

 multigraph support in giraph
 

 Key: GIRAPH-141
 URL: https://issues.apache.org/jira/browse/GIRAPH-141
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Reporter: André Kelpe

 The current vertex API only supports simple graphs, meaning that there can 
 only ever be one edge between two vertices. Many graphs like the road network 
 are in fact multigraphs, where many edges can connect two vertices at the 
 same time.
 Support for this could be added by introducing an Iterator<EdgeWritable> 
 getEdgeValue() or a similar construct. Maybe introducing a slim object like a 
 Connector between the edge and the vertex is also a good idea, so that you 
 could do something like:
 {code} 
 for (final Connector<EdgeWritable, VertexWritable> conn : getEdgeValues()) {
  final EdgeWritable edge = conn.getEdge();
  final VertexWritable otherVertex = conn.getOther();
  doInterestingStuff(otherVertex);
  doMoreInterestingStuff(edge);
 }
 {code} 
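
One way the proposed Connector could look is sketched below. Everything here is an assumption made for illustration: MultigraphSketch, Connector, Vertex, and countEdgesTo are hypothetical names, and plain Strings stand in for the Writable edge and vertex types. The point is only that storing out-edges as a list of (edge, other vertex) pairs, rather than a map keyed by target vertex, lets parallel edges coexist.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: a hypothetical Connector pairing an edge value with its far-end vertex. */
final class MultigraphSketch {
  static final class Connector<E, V> {
    private final E edge;
    private final V other;

    Connector(E edge, V other) {
      this.edge = edge;
      this.other = other;
    }

    E getEdge() { return edge; }
    V getOther() { return other; }
  }

  /** Out-edges are kept in a list, so two edges to the same target are both retained. */
  static final class Vertex<E, V> {
    private final List<Connector<E, V>> connectors = new ArrayList<>();

    void addEdge(E edge, V other) {
      connectors.add(new Connector<>(edge, other));
    }

    Iterable<Connector<E, V>> getEdgeValues() {
      return connectors;
    }
  }

  /** Road-network style example: two distinct roads lead from this vertex to "B". */
  static Vertex<String, String> sampleVertex() {
    Vertex<String, String> a = new Vertex<>();
    a.addEdge("highway", "B");
    a.addEdge("side road", "B");
    a.addEdge("bridge", "C");
    return a;
  }

  static int countEdgesTo(Vertex<String, String> vertex, String target) {
    int count = 0;
    for (final Connector<String, String> conn : vertex.getEdgeValues()) {
      if (conn.getOther().equals(target)) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    System.out.println("edges to B: " + countEdgesTo(sampleVertex(), "B"));
  }
}
```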





[jira] [Commented] (GIRAPH-169) How to close all child when a job finished?

2012-04-03 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245488#comment-13245488
 ] 

Avery Ching commented on GIRAPH-169:


This is a simple case.  I'll try and see if I can replicate it sometime this 
week.  Feel free to bug me if I forget. =)

 How to close all child when a job finished?
 ---

 Key: GIRAPH-169
 URL: https://issues.apache.org/jira/browse/GIRAPH-169
 Project: Giraph
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 0.2.0
 Environment: sles 11 x64,jdk 1.6,hadoop 0.20.205.0,1 Master and 8 
 slaves,
Reporter: Jianfeng Qian
Priority: Minor

 I ran PageRank on Hadoop 0.20.205.0. When the job finished, the child 
 processes on the slaves didn't quit immediately, and sometimes they never quit 
 and I had to kill them. 





[jira] [Commented] (GIRAPH-169) How to close all child when a job finished?

2012-03-28 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13240234#comment-13240234
 ] 

Avery Ching commented on GIRAPH-169:


Looks like the worker log got cut off?  Also, what version of Hadoop is this?

Does it work with different numbers of workers?

 How to close all child when a job finished?
 ---

 Key: GIRAPH-169
 URL: https://issues.apache.org/jira/browse/GIRAPH-169
 Project: Giraph
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 0.2.0
 Environment: sles 11 x64,jdk 1.6,hadoop 0.20.205.0,1 Master and 8 
 slaves,
Reporter: Jianfeng Qian
Priority: Minor





[jira] [Commented] (GIRAPH-169) How to close all child when a job finished?

2012-03-27 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13240112#comment-13240112
 ] 

Avery Ching commented on GIRAPH-169:


How many task trackers do you have?  

Are you seeing any errors?  Is the job completing successfully?

I'm guessing that the job isn't completing successfully, since everything 
should be cleaned up.

 How to close all child when a job finished?
 ---

 Key: GIRAPH-169
 URL: https://issues.apache.org/jira/browse/GIRAPH-169
 Project: Giraph
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 0.2.0
 Environment: sles 11 x64,jdk 1.6,hadoop 0.20.205.0,1 Master and 8 
 slaves,
Reporter: Jianfeng Qian
Priority: Minor





[jira] [Commented] (GIRAPH-169) How to close all child when a job finished?

2012-03-27 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13240161#comment-13240161
 ] 

Avery Ching commented on GIRAPH-169:


Do you have the logs of the workers?  I'd like to see why they can't exit.

 How to close all child when a job finished?
 ---

 Key: GIRAPH-169
 URL: https://issues.apache.org/jira/browse/GIRAPH-169
 Project: Giraph
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 0.2.0
 Environment: sles 11 x64,jdk 1.6,hadoop 0.20.205.0,1 Master and 8 
 slaves,
Reporter: Jianfeng Qian
Priority: Minor





[jira] [Commented] (GIRAPH-159) Case insensitive file/directory name matching will produce errors on M/R jar unpack.

2012-03-25 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237909#comment-13237909
 ] 

Avery Ching commented on GIRAPH-159:


+1. I left out your GiraphRunner changes since they were fixed by earlier 
JIRAs, but verified both the problem and the solution you proposed.  Looks 
good!  Thanks for the fix!  Committing.

 Case insensitive file/directory name matching will produce errors on M/R jar 
 unpack. 
 -

 Key: GIRAPH-159
 URL: https://issues.apache.org/jira/browse/GIRAPH-159
 Project: Giraph
  Issue Type: Bug
  Components: build
Affects Versions: 0.2.0
 Environment: OSX 10.6.8
Reporter: Brian Femiano
 Attachments: GIRAPH-159.patch, compile.xml


 This only seems to affect platforms where there can be file/directory naming 
 conflicts from case-insensitive matches. 
  
 I was able to reproduce it by running the pseudo-distributed unit tests on OSX.
 This has affected other projects: 
 https://issues.apache.org/jira/browse/MAHOUT-780
 I've been able to reproduce this on my local OSX install with the following 
 error:
 https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/a201218000e956d3/cc6eca3ef9f80ff8
 Since LICENSE.txt contains the same content as the file LICENSE, I propose we 
 exclude any LICENSE matches found in the unpacked dependency jars when the 
 maven assembly phase hits 'jar-with-dependencies'. 
 I have a patch which moves the 'jar-with-dependencies' descriptor to an 
 external compile.xml file which has the proper excludes. This might also come 
 in handy down the road should any additional tweaks be needed to the compile 
 phase. 





[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-03-25 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237951#comment-13237951
 ] 

Avery Ching commented on GIRAPH-153:


34 MB is huge.  Can we do something like make the dependency scope provided and 
then use the distributed cache for unittests?

 HBase/Accumulo Input and Output formats
 ---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano





[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-03-24 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237468#comment-13237468
 ] 

Avery Ching commented on GIRAPH-153:


Brian, could you make it a single patch for us to take a look at?  I'm excited 
to see this work.

 HBase/Accumulo Input and Output formats
 ---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano





[jira] [Commented] (GIRAPH-144) GiraphJob should not extend Job (users should not be able to call Job methods like waitForCompletion or setMapper..etc)

2012-03-24 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237597#comment-13237597
 ] 

Avery Ching commented on GIRAPH-144:


Ping, anyone?  I'd like to close this out, one way or another.

 GiraphJob should not extend Job  (users should not be able to call Job 
 methods like waitForCompletion or setMapper..etc)
 

 Key: GIRAPH-144
 URL: https://issues.apache.org/jira/browse/GIRAPH-144
 Project: Giraph
  Issue Type: Bug
Reporter: Dave
Assignee: Avery Ching
 Attachments: GIRAPH-144.patch

   Original Estimate: 24h
  Remaining Estimate: 24h







[jira] [Commented] (GIRAPH-167) mvn -Phadoop_non_secure clean verify fails

2012-03-23 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13236414#comment-13236414
 ] 

Avery Ching commented on GIRAPH-167:


+1, Committed, thanks for fixing this.

 mvn -Phadoop_non_secure clean verify fails
 --

 Key: GIRAPH-167
 URL: https://issues.apache.org/jira/browse/GIRAPH-167
 Project: Giraph
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Eugene Koontz
Assignee: Eugene Koontz
  Labels: build, hadoop
 Attachments: GIRAPH-167.patch


 The {{hadoop_non_secure}} profile, which uses hadoop 0.20.2, is failing to 
 compile:
 {code}
 [ERROR] COMPILATION ERROR : 
 [INFO] -
 [ERROR] 
 /Users/ekoontz/giraph/target/munged/main/org/apache/giraph/comm/RPCCommunications.java:[184,48]
  cannot find symbol
 symbol  : variable versionID
 location: class org.apache.giraph.comm.RPCCommunications<I,V,E,M>
 [INFO] 1 error
 {code}





[jira] [Commented] (GIRAPH-161) Handling null messages and edges when initializing IntIntNullIntVertex

2012-03-21 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13234139#comment-13234139
 ] 

Avery Ching commented on GIRAPH-161:


+1.  There are 5 checkstyle violations from GIRAPH-156, but this isn't the 
cause.  Committing, thanks Dionysios!

 Handling null messages and edges when initializing IntIntNullIntVertex
 --

 Key: GIRAPH-161
 URL: https://issues.apache.org/jira/browse/GIRAPH-161
 Project: Giraph
  Issue Type: Bug
  Components: graph
Affects Versions: 0.1.0
Reporter: Dionysios Logothetis
 Attachments: GIRAPH-161.patch


 The initialize() method in org.apache.giraph.graph.IntIntNullIntVertex should 
 handle null messages or null edges. Especially initializing with null 
 messages is a common case.





[jira] [Commented] (GIRAPH-162) BspCase.setup() should catch FileNotFoundException thrown from org.apache.hadoop.fs.FileSystem.listStatus()

2012-03-21 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13234164#comment-13234164
 ] 

Avery Ching commented on GIRAPH-162:


Looks good.  +1.  I'm committing.

 BspCase.setup() should catch FileNotFoundException thrown from 
 org.apache.hadoop.fs.FileSystem.listStatus()
 ---

 Key: GIRAPH-162
 URL: https://issues.apache.org/jira/browse/GIRAPH-162
 Project: Giraph
  Issue Type: Bug
  Components: test
Affects Versions: 0.2.0
Reporter: Eugene Koontz
 Fix For: 0.2.0

 Attachments: GIRAPH-162.patch


 In Hadoop trunk, org.apache.hadoop.fs.FileSystem.listStatus() is declared to 
 throw both FileNotFoundException and IOException. The former 
 (FileNotFoundException) is currently not caught when BspCase.setup() looks 
 for the GiraphJob.ZOOKEEPER_MANAGER_DIR_DEFAULT directory in order to delete 
 it. The listStatus() call throws FileNotFoundException if this directory does not 
 exist and causes several tests to fail when using Hadoop trunk. This 
 exception should be caught and ignored during setup(), since it's not an 
 error for this directory not to exist.
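
The catch-and-ignore behaviour described above might look roughly like the following. To keep the example runnable without Hadoop on the classpath, a hypothetical DirLister interface stands in for the listStatus() call. SetupCleanupSketch, entriesToDelete, demo, and the path are likewise illustrative names, not the contents of the attached patch.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Sketch only: DirLister is a hypothetical stand-in for a listStatus-style call. */
final class SetupCleanupSketch {
  interface DirLister {
    String[] list(String dir) throws IOException;
  }

  /** Counts entries to delete; a missing directory is treated as nothing to clean up. */
  static int entriesToDelete(DirLister lister, String dir) throws IOException {
    try {
      return lister.list(dir).length;
    } catch (FileNotFoundException e) {
      // Not an error during setup: the directory simply does not exist yet.
      return 0;
    }
  }

  /** Returns {count for a missing directory, count for a directory with two entries}. */
  static int[] demo() {
    DirLister missing = dir -> { throw new FileNotFoundException(dir); };
    DirLister present = dir -> new String[] {"entry_1", "entry_2"};
    try {
      return new int[] {
        entriesToDelete(missing, "/tmp/example"),
        entriesToDelete(present, "/tmp/example")
      };
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    int[] counts = demo();
    System.out.println("missing dir: " + counts[0] + ", existing dir: " + counts[1]);
  }
}
```

Only FileNotFoundException is swallowed here; any other IOException still propagates, so genuine filesystem failures during setup remain visible.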





[jira] [Commented] (GIRAPH-159) Case insensitive file/directory name matching will produce errors on M/R jar unpack.

2012-03-21 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13234177#comment-13234177
 ] 

Avery Ching commented on GIRAPH-159:


Brian, can you show me how to recreate this issue on OSX?

 Case insensitive file/directory name matching will produce errors on M/R jar 
 unpack. 
 -

 Key: GIRAPH-159
 URL: https://issues.apache.org/jira/browse/GIRAPH-159
 Project: Giraph
  Issue Type: Bug
  Components: build
Affects Versions: 0.2.0
 Environment: OSX 10.6.8
Reporter: Brian Femiano
Priority: Minor
 Attachments: GIRAPH-159.patch, compile.xml






[jira] [Commented] (GIRAPH-164) fix 5 Line is longer than 80 characters style errors in GiraphRunner

2012-03-21 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13234843#comment-13234843
 ] 

Avery Ching commented on GIRAPH-164:


+1, thanks guys.  Committing.

 fix 5 Line is longer than 80 characters style errors in GiraphRunner
 --

 Key: GIRAPH-164
 URL: https://issues.apache.org/jira/browse/GIRAPH-164
 Project: Giraph
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Eugene Koontz
Priority: Trivial
 Fix For: 0.2.0

 Attachments: GIRAPH-164.patch


 {code}
 <file name="/Users/ekoontz/giraph/src/main/java/org/apache/giraph/GiraphRunner.java">
   <error line="155" severity="error" message="Line is longer than 80 characters."
     source="com.puppycrawl.tools.checkstyle.checks.sizes.LineLengthCheck"/>
   <error line="156" severity="error" message="Line is longer than 80 characters."
     source="com.puppycrawl.tools.checkstyle.checks.sizes.LineLengthCheck"/>
   <error line="158" severity="error" message="Line is longer than 80 characters."
     source="com.puppycrawl.tools.checkstyle.checks.sizes.LineLengthCheck"/>
   <error line="161" severity="error" message="Line is longer than 80 characters."
     source="com.puppycrawl.tools.checkstyle.checks.sizes.LineLengthCheck"/>
 </file>
 {code}





[jira] [Commented] (GIRAPH-154) Worker ports are not synched properly with its peers

2012-03-17 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232012#comment-13232012
 ] 

Avery Ching commented on GIRAPH-154:


Nice work Zhiwei (+1), I verified it as well and committed.  Will close once 
Hudson verifies as well.

 Worker ports are not synched properly with its peers
 

 Key: GIRAPH-154
 URL: https://issues.apache.org/jira/browse/GIRAPH-154
 Project: Giraph
  Issue Type: Bug
  Components: bsp
Affects Versions: 0.2.0
Reporter: Zhiwei Gu
Assignee: Zhiwei Gu
 Attachments: GIRAPH-154.patch


 When a worker tries multiple ports to set up the RPC server, the final port is 
 not synced with its peer workers properly, which results in peer workers 
 sending messages to the default port.
 Here are some logs:
 
 Base port: 34900
 
 
 log for worker 161:
 
 IPC Server handler 98 on 36061: starting
 BasicRPCCommunications: Started RPC communication server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:36061 with 100 handlers and 199 
 flush threads on bind attempt 1
 IPC Server handler 99 on 36061: starting
 setup: Registering health of this worker...
 getJobState: Job state already exists 
 (/_hadoopBsp/job_201203130609_14838/_masterJobState)
 getApplicationAttempt: Node 
 /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir already exists!
 getApplicationAttempt: Node 
 /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir already exists!
 registerHealth: Created my health node for attempt=0, superstep=-1 with 
 /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/gsta32085.tan.ygrid.yahoo.com_161
  and workerInfo= Worker(hostname=gsta32085.tan.ygrid.yahoo.com, 
 MRpartition=161, port=35061)
 process: partitionAssignmentsReadyChanged (partitions are assigned)
 startSuperstep: Ready for computation on superstep -1 since worker selection 
 and vertex range assignments are done in 
 /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir/0/_superstepDir/-1/_partitionAssignments
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 0 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 1 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 2 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 3 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 4 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 5 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 6 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 7 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 8 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 9 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 10 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 11 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 12 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 13 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 14 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 15 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 16 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 17 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 18 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 19 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 20 time(s).
 Retrying connect to server: 
 gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 21 time(s).
 Retrying connect to server: 
 

[jira] [Commented] (GIRAPH-156) Users should be able to set simple 'custom arguments' via org.apache.giraph.GiraphRunner

2012-03-17 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232014#comment-13232014
 ] 

Avery Ching commented on GIRAPH-156:


+1, looks good.  It would be great if you could wrap LOG.info with if 
(LOG.isInfoEnabled()), before committing.  There are some other places in this 
file as well without the LOG enabled wrap.  You can either make that change 
here or someone else can do it in another patch.
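
For reference, the guard pattern requested above usually has the shape sketched here. To stay runnable without extra dependencies, the sketch uses java.util.logging, where isLoggable(Level.INFO) plays the role of LOG.isInfoEnabled(). GuardedLoggingSketch and its methods are hypothetical names, not GiraphRunner code.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Sketch only: java.util.logging stands in for the logger used in the patch. */
final class GuardedLoggingSketch {
  private static final Logger LOG =
      Logger.getLogger(GuardedLoggingSketch.class.getName());

  /** Counts how often the message was actually built. */
  static int describeCalls = 0;

  /** Stands in for a message that is costly to assemble. */
  static String describe(String key, String value) {
    describeCalls++;
    return "Setting custom argument [" + key + "] to [" + value + "]";
  }

  static void logArgument(String key, String value) {
    // The guard skips building the message entirely when INFO is disabled.
    if (LOG.isLoggable(Level.INFO)) {
      LOG.info(describe(key, value));
    }
  }

  /** Returns how many times describe() ran for one call at the given logger level. */
  static int callsAtLevel(Level level) {
    LOG.setLevel(level);
    describeCalls = 0;
    logArgument("param1", "value1");
    return describeCalls;
  }

  public static void main(String[] args) {
    System.out.println("INFO disabled: " + callsAtLevel(Level.WARNING) + " call(s)");
    System.out.println("INFO enabled: " + callsAtLevel(Level.INFO) + " call(s)");
  }
}
```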

 Users should be able to set simple 'custom arguments' via 
 org.apache.giraph.GiraphRunner
 

 Key: GIRAPH-156
 URL: https://issues.apache.org/jira/browse/GIRAPH-156
 Project: Giraph
  Issue Type: Improvement
  Components: conf and scripts
Affects Versions: 0.1.0
Reporter: Sebastian Schelter
Assignee: Sebastian Schelter
 Attachments: GIRAPH-156-1.patch, GIRAPH-156.patch


 Some vertices need custom arguments to run. The SimpleShortestPathsVertex for 
 example needs to know the source vertex for the computation which is saved in 
 the job's Configuration as _SimpleShortestPathsVertex.sourceId_. Users should 
 be able to apply such simple custom arguments via GiraphRunner. 
 I propose to add a new option _--customArguments_ where users can supply 
 arguments in the form _param1=value1,param2=value2_ for this.
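
 A minimal sketch of how such a _param1=value1,param2=value2_ string could be split, assuming plain keys and values without embedded commas. The class name is hypothetical and a Map stands in for the job's Configuration; this is not the code from the attached patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CustomArguments {
  /** Parses "param1=value1,param2=value2" into an ordered map. */
  static Map<String, String> parse(String customArguments) {
    Map<String, String> result = new LinkedHashMap<String, String>();
    for (String pair : customArguments.split(",")) {
      int eq = pair.indexOf('=');
      if (eq <= 0) {
        // Reject entries with no '=' or with an empty parameter name.
        throw new IllegalArgumentException(
            "Expected param=value but got '" + pair + "'");
      }
      result.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
    }
    return result;
  }
}
```

 In a real runner each entry would presumably be copied into the job's Configuration, e.g. under the key SimpleShortestPathsVertex.sourceId.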

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-156) Users should be able to set simple 'custom arguments' via org.apache.giraph.GiraphRunner

2012-03-16 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13230942#comment-13230942
 ] 

Avery Ching commented on GIRAPH-156:


I think this makes sense.  Go for it. =)

 Users should be able to set simple 'custom arguments' via 
 org.apache.giraph.GiraphRunner
 

 Key: GIRAPH-156
 URL: https://issues.apache.org/jira/browse/GIRAPH-156
 Project: Giraph
  Issue Type: Improvement
  Components: conf and scripts
Affects Versions: 0.1.0
Reporter: Sebastian Schelter
Assignee: Sebastian Schelter

 Some vertices need custom arguments to run. The SimpleShortestPathsVertex for 
 example needs to know the source vertex for the computation which is saved in 
 the job's Configuration as _SimpleShortestPathsVertex.sourceId_. Users should 
 be able to apply such simple custom arguments via GiraphRunner. 
 I propose to add a new option _--customArguments_ where users can supply 
 arguments in the form _param1=value1,param2=value2_ for this.





[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-03-14 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229042#comment-13229042
 ] 

Avery Ching commented on GIRAPH-153:


Brian, this is an awesome contribution and a lot of code.  I'm really sorry 
that it took me so long to look at this.  Is there any chance that you could 
add some simple unit tests for your formats?  TestJsonBase64Format.java is an 
example that might be easy to adapt for your formats.

Also, I just created a page for how to contribute.  
https://cwiki.apache.org/confluence/display/GIRAPH/How+to+Contribute

Have you run 'mvn verify'?  Thanks!

 HBase/Accumulo Input and Output formats
 ---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano
 Attachments: AccumuloRootMarker.java, 
 AccumuloRootMarkerInputFormat.java, AccumuloRootMarkerOutputFormat.java, 
 AccumuloVertexInputFormat.java, AccumuloVertexOutputFormat.java, 
 ComputeIsRoot.java, DistributedCacheHelper.java, HBaseVertexInputFormat.java, 
 HBaseVertexOutputFormat.java, IdentifyAndMarkRoots.java, 
 SetLongWritable.java, SetTextWritable.java, TableRootMarker.java, 
 TableRootMarkerInputFormat.java, TableRootMarkerOutputFormat.java


 Four abstract classes that wrap their respective delegate input/output 
 formats for
 easy hooks into vertex input format subclasses. I've included some sample 
 programs that show two very simple graph
 algorithms. I have a graph generator that builds out a very simple directed 
 structure, starting with a few 'root' nodes.
 Root nodes are defined as nodes which are not listed as a child anywhere in 
 the graph. 
 Algorithm 1) AccumuloRootMarker.java  -- Accumulo as read/write source. 
 Every vertex starts thinking it's a root. At superstep 0, send a message down 
 to each
 child as a non-root notification. After superstep 1, only root nodes will 
 have never been messaged. 
 Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by 
 bundling the notification logic followed by root node propagation. Once we've 
 marked the appropriate nodes as roots, tell every child which roots it can be 
 traced back to via one or more spanning trees. This will take N + 2 
 supersteps where N is the maximum number of hops from any root to any leaf, 
 plus 2 supersteps for the initial root flagging. 
 I've included all relevant code plus DistributedCacheHelper.java for 
 recursive cache file and archive searches. It is more hadoop centric than 
 giraph, but these jobs use it so I figured why not commit here. 
 These have been tested through local JobRunner, pseudo-distributed on the 
 aforementioned hardware, and full distributed on EC2. More details in the 
 comments.
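
 The root-flagging idea of Algorithm 1 can be sketched without any Giraph, Accumulo or HBase API as a plain in-memory simulation. The class name and the adjacency-map representation are assumptions made for illustration; the sketch also assumes every vertex appears as a key in the map.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class RootMarkerSketch {
  /**
   * Returns the ids of all root vertices.
   * children maps each vertex id to the ids of its children.
   */
  static Set<Long> findRoots(Map<Long, List<Long>> children) {
    // Superstep 0: every vertex assumes it is a root and sends a
    // non-root notification to each of its children.
    Set<Long> messaged = new HashSet<Long>();
    for (List<Long> kids : children.values()) {
      messaged.addAll(kids);
    }
    // Superstep 1: any vertex that received a message is not a root.
    Set<Long> roots = new TreeSet<Long>(children.keySet());
    roots.removeAll(messaged);
    return roots;
  }
}
```

 For a graph with edges 1->2, 1->3, 2->4 and 5->4, only vertices 1 and 5 are never messaged and so remain flagged as roots.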





[jira] [Commented] (GIRAPH-144) GiraphJob should not extend Job (users should not be able to call Job methods like waitForCompletion or setMapper..etc)

2012-03-07 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13224096#comment-13224096
 ] 

Avery Ching commented on GIRAPH-144:


@Jakob, any more thoughts?

 GiraphJob should not extend Job  (users should not be able to call Job 
 methods like waitForCompletion or setMapper..etc)
 

 Key: GIRAPH-144
 URL: https://issues.apache.org/jira/browse/GIRAPH-144
 Project: Giraph
  Issue Type: Bug
Reporter: Dave
Assignee: Avery Ching
 Attachments: GIRAPH-144.patch

   Original Estimate: 24h
  Remaining Estimate: 24h







[jira] [Commented] (GIRAPH-85) Simplify return expression in RPCCommunications::getRPCProxy

2012-02-27 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217096#comment-13217096
 ] 

Avery Ching commented on GIRAPH-85:
---

I just looked at your patch in Eclipse and see that there is now a warning:

Type safety: Unchecked cast from VersionedProtocol to 
CommunicationsInterface<I,V,E,M>.

We can either keep this the way it was before, or add 
@SuppressWarnings("unchecked") to the method.  I don't have a strong opinion 
here.  Anyone else care to comment?
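
A minimal sketch of the second option, using a List as a stand-in for the RPC proxy type. The class and method names are hypothetical; the only point is that the annotation sits on the one method that performs the cast, so other warnings in the class remain visible.

```java
import java.util.ArrayList;
import java.util.List;

class UncheckedCastSketch {
  /** Stand-in for a call that returns an untyped proxy object. */
  static Object rawProxy() {
    List<String> list = new ArrayList<String>();
    list.add("proxy");
    return list;
  }

  /** Returns the cast result directly, without a local variable. */
  @SuppressWarnings("unchecked")
  static List<String> typedProxy() {
    // The compiler cannot verify the type argument here, hence the warning.
    return (List<String>) rawProxy();
  }
}
```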


 Simplify return expression in RPCCommunications::getRPCProxy
 

 Key: GIRAPH-85
 URL: https://issues.apache.org/jira/browse/GIRAPH-85
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Jakob Homan
  Labels: newbie
 Fix For: 0.2.0

 Attachments: GIRAPH-85.patch, GIRAPH-85.patch


 Twice in RPCCommunications::getRPCProxy a local variable, proxy, is created 
 and immediately returned.  We can simplify this to just return the value.





[jira] [Commented] (GIRAPH-87) Simplify boolean expression in BspService::checkpointFrequencyMet

2012-02-25 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-87?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216365#comment-13216365
 ] 

Avery Ching commented on GIRAPH-87:
---

+1
Thanks Eli, I committed on your behalf.

 Simplify boolean expression in BspService::checkpointFrequencyMet
 -

 Key: GIRAPH-87
 URL: https://issues.apache.org/jira/browse/GIRAPH-87
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Eli Reisman
  Labels: newbie
 Attachments: GIRAPH-87.patch, GIRAPH-87.patch


 {noformat}if (superstep < firstCheckpoint) {
 return false;
 } else if (((superstep - firstCheckpoint) % checkpointFrequency) == 
 0) {
 return true;
 } else {
 return false;
 }{noformat}
 can be simplified to just return the result of the else if evaluation.
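
 A sketch of one way to write the single-return form, not the committed patch. It keeps the first guard as a conjunct, on the assumption that a superstep before firstCheckpoint must still return false: in Java, `%` on a negative multiple of checkpointFrequency also yields 0, so the modulo test alone would differ for those inputs. The class name is hypothetical.

```java
class CheckpointFrequencySketch {
  /** The three-branch form quoted in the issue. */
  static boolean original(long superstep, long firstCheckpoint,
      long checkpointFrequency) {
    if (superstep < firstCheckpoint) {
      return false;
    } else if (((superstep - firstCheckpoint) % checkpointFrequency) == 0) {
      return true;
    } else {
      return false;
    }
  }

  /** Single-return form that keeps the early-superstep guard. */
  static boolean simplified(long superstep, long firstCheckpoint,
      long checkpointFrequency) {
    return superstep >= firstCheckpoint
        && ((superstep - firstCheckpoint) % checkpointFrequency) == 0;
  }
}
```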





[jira] [Commented] (GIRAPH-85) Simplify return expression in RPCCommunications::getRPCProxy

2012-02-24 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216047#comment-13216047
 ] 

Avery Ching commented on GIRAPH-85:
---

please make sure it passes 'mvn verify' as well.  That will check rat and 
checkstyle.

 Simplify return expression in RPCCommunications::getRPCProxy
 

 Key: GIRAPH-85
 URL: https://issues.apache.org/jira/browse/GIRAPH-85
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Jakob Homan
  Labels: newbie
 Fix For: 0.2.0

 Attachments: GIRAPH-85.patch


 Twice in RPCCommunications::getRPCProxy a local variable, proxy, is created 
 and immediately returned.  We can simplify this to just return the value.





[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions

2012-02-16 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209727#comment-13209727
 ] 

Avery Ching commented on GIRAPH-40:
---

Can another committer please look at this as per Jakob's request?

 Adding checkstyle enforcement of Giraph code conventions
 

 Key: GIRAPH-40
 URL: https://issues.apache.org/jira/browse/GIRAPH-40
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching
Priority: Minor
 Attachments: GIRAPH-40.2.patch, GIRAPH-40.3.patch, GIRAPH-40.patch, 
 GIRAPH-40.patch


 Now that we have some code conventions (see GIRAPH-21), we should enforce 
 them with a maven checkstyle plugin.





[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions

2012-02-16 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209801#comment-13209801
 ] 

Avery Ching commented on GIRAPH-40:
---

Thanks so much for the reviews Jakob and Sebastian.  It's committed.

@Sebastian, 'mvn compile' and 'mvn package' will succeed with violations.  
Anything using 'verify', i.e. 'mvn verify' or 'mvn install' will hit problems 
with checkstyle and rat.

 Adding checkstyle enforcement of Giraph code conventions
 

 Key: GIRAPH-40
 URL: https://issues.apache.org/jira/browse/GIRAPH-40
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching
Priority: Minor
 Attachments: GIRAPH-40.2.patch, GIRAPH-40.3.patch, GIRAPH-40.patch, 
 GIRAPH-40.patch


 Now that we have some code conventions (see GIRAPH-21), we should enforce 
 them with a maven checkstyle plugin.





[jira] [Commented] (GIRAPH-150) PageRankBenchmark accesses wrong conf after GiraphJob is created

2012-02-16 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210025#comment-13210025
 ] 

Avery Ching commented on GIRAPH-150:


By the way, here was the full stack trace:

hadoop jar target/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar 
org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 50 -w 3 -c 1
Exception in thread "main" java.lang.NullPointerException
    at org.apache.giraph.benchmark.PageRankBenchmark.run(PageRankBenchmark.java:127)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.giraph.benchmark.PageRankBenchmark.main(PageRankBenchmark.java:162)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

After this fix, it works.

 PageRankBenchmark accesses wrong conf after GiraphJob is created
 

 Key: GIRAPH-150
 URL: https://issues.apache.org/jira/browse/GIRAPH-150
 Project: Giraph
  Issue Type: Bug
Reporter: Avery Ching
Assignee: Avery Ching
 Attachments: GIRAPH-150.patch








[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions

2012-02-13 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207188#comment-13207188
 ] 

Avery Ching commented on GIRAPH-40:
---

So for the first example, we need to follow that format, or else checkstyle 
will mark it an error.

For the second example, checkstyle doesn't seem to enforce the line-wrap 
indent.  So we still need to keep an eye out for those issues.

 Adding checkstyle enforcement of Giraph code conventions
 

 Key: GIRAPH-40
 URL: https://issues.apache.org/jira/browse/GIRAPH-40
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching
Priority: Minor
 Attachments: GIRAPH-40.patch


 Now that we have some code conventions (see GIRAPH-21), we should enforce 
 them with a maven checkstyle plugin.





[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions

2012-02-13 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207203#comment-13207203
 ] 

Avery Ching commented on GIRAPH-40:
---

I'm not a checkstyle expert, but I don't think so. I can play around with 
trying to fix that.  Or we can fix in another issue.  I should be done with 
this patch today.


 Adding checkstyle enforcement of Giraph code conventions
 

 Key: GIRAPH-40
 URL: https://issues.apache.org/jira/browse/GIRAPH-40
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching
Priority: Minor
 Attachments: GIRAPH-40.patch


 Now that we have some code conventions (see GIRAPH-21), we should enforce 
 them with a maven checkstyle plugin.





[jira] [Commented] (GIRAPH-148) giraph-site.xml needs Apache header

2012-02-13 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207211#comment-13207211
 ] 

Avery Ching commented on GIRAPH-148:


+1.

 giraph-site.xml needs Apache header
 ---

 Key: GIRAPH-148
 URL: https://issues.apache.org/jira/browse/GIRAPH-148
 Project: Giraph
  Issue Type: Bug
  Components: conf and scripts
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-148-b.patch, GIRAPH-148.patch


 I forgot to add the license to the conf file and now rat is failing...





[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions

2012-02-11 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13206291#comment-13206291
 ] 

Avery Ching commented on GIRAPH-40:
---

Thank you for the feedback Claudio.  I'll continue to transition the other 
files and submit a final patch unless anyone has any objections.

 Adding checkstyle enforcement of Giraph code conventions
 

 Key: GIRAPH-40
 URL: https://issues.apache.org/jira/browse/GIRAPH-40
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching
Priority: Minor
 Attachments: GIRAPH-40.patch


 Now that we have some code conventions (see GIRAPH-21), we should enforce 
 them with a maven checkstyle plugin.





[jira] [Commented] (GIRAPH-139) Change PageRankBenchmark to be accessible via bin/giraph

2012-02-10 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205598#comment-13205598
 ] 

Avery Ching commented on GIRAPH-139:


+1
Looks good to me.

 Change PageRankBenchmark to be accessible via bin/giraph
 

 Key: GIRAPH-139
 URL: https://issues.apache.org/jira/browse/GIRAPH-139
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-139-b.patch, GIRAPH-139.patch


 Currently the PageRankBenchmark has its own main and tool implementation and 
 is difficult to access from the bin/giraph script.  It would be better if 
 everything were accessible via bin/giraph.  The benchmark is particularly 
 problematic because it uses inner classes for its two actual Vertex 
 implementations, which have to be specified on the command line as their 
 .class name (i.e. 
 org.apache.giraph.benchmark.PageRankBenchmark$PageRankHashMapVertex) rather 
 than just with dots, as one would expect.
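
 The naming difference can be reproduced with any nested class. The sketch below uses hypothetical class names rather than the real PageRankBenchmark: `Class.getName()` returns the binary name with `$`, which is what `Class.forName` accepts, while the dotted canonical name cannot be loaded by name.

```java
class BenchmarkSketch {
  /** Stand-in for a Vertex implementation nested inside the benchmark. */
  static class HashMapVertexSketch { }

  /** Binary name, e.g. "BenchmarkSketch$HashMapVertexSketch". */
  static String binaryName() {
    return HashMapVertexSketch.class.getName();
  }

  /** Canonical name, e.g. "BenchmarkSketch.HashMapVertexSketch". */
  static String canonicalName() {
    return HashMapVertexSketch.class.getCanonicalName();
  }

  /** Reports whether a class can be loaded by the given name. */
  static boolean loadable(String className) {
    try {
      Class.forName(className);
      return true;
    } catch (ClassNotFoundException e) {
      return false;
    }
  }
}
```

 Note that on a shell command line the `$` form usually needs quoting or escaping as well.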





[jira] [Commented] (GIRAPH-148) giraph-site.xml needs Apache header

2012-02-10 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205947#comment-13205947
 ] 

Avery Ching commented on GIRAPH-148:


Jakob, this header is formatted slightly differently from the one in pom.xml 
and the .java files we have.

 giraph-site.xml needs Apache header
 ---

 Key: GIRAPH-148
 URL: https://issues.apache.org/jira/browse/GIRAPH-148
 Project: Giraph
  Issue Type: Bug
  Components: conf and scripts
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-148.patch


 I forgot to add the license to the conf file and now rat is failing...





[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions

2012-02-10 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205976#comment-13205976
 ] 

Avery Ching commented on GIRAPH-40:
---

Here are some examples of one problem:  Checkstyle doesn't seem to be able to 
distinguish a single indent from a double indent of 2 spaces where that would 
be appropriate.  The examples below show what Checkstyle wants us to do.

{noformat}
@Override
public BasicVertex<LongWritable, DoubleWritable, DoubleWritable, M>
getCurrentVertex() throws IOException, InterruptedException {

  @Override
  public VertexReader<LongWritable, DoubleWritable, DoubleWritable, M>
  createVertexReader(InputSplit split, TaskAttemptContext context)
throws IOException {
{noformat}

Also, checkstyle won't enforce indenting after a line wrap.  So both of these 
examples are passing checkstyle.

{noformat}
  aggregateVertices =
  configuration.getLong(
  PseudoRandomVertexInputFormat.AGGREGATE_VERTICES, 0);

  aggregateVertices =
configuration.getLong(
  PseudoRandomVertexInputFormat.AGGREGATE_VERTICES, 0);
{noformat}


That being said, I think this is the right thing to do and we can make some 
sacrifices to have better, more uniform code.  Please let me know your thoughts.

 Adding checkstyle enforcement of Giraph code conventions
 

 Key: GIRAPH-40
 URL: https://issues.apache.org/jira/browse/GIRAPH-40
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching
Priority: Minor
 Attachments: GIRAPH-40.patch


 Now that we have some code conventions (see GIRAPH-21), we should enforce 
 them with a maven checkstyle plugin.





[jira] [Commented] (GIRAPH-148) giraph-site.xml needs Apache header

2012-02-10 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205992#comment-13205992
 ] 

Avery Ching commented on GIRAPH-148:


It will with checkstyle (see GIRAPH-40).  We will need to pick one or the 
other.  I don't have a strong preference.

 giraph-site.xml needs Apache header
 ---

 Key: GIRAPH-148
 URL: https://issues.apache.org/jira/browse/GIRAPH-148
 Project: Giraph
  Issue Type: Bug
  Components: conf and scripts
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-148.patch


 I forgot to add the license to the conf file and now rat is failing...





[jira] [Commented] (GIRAPH-142) _hadoopBsp should be prefixable via configuration

2012-02-09 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205047#comment-13205047
 ] 

Avery Ching commented on GIRAPH-142:


Looks fine.  Could we just add a check somewhere that the path starts with 
/ and otherwise throw an exception explaining the problem to the user?
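
A minimal sketch of such a check. The class name, method name and message text are hypothetical and not taken from the patch.

```java
class ZkBasePathCheck {
  /** Returns the path unchanged if it is absolute, otherwise throws. */
  static String checkBasePath(String basePath) {
    if (basePath == null || !basePath.startsWith("/")) {
      // Tell the user what was wrong and what was actually supplied.
      throw new IllegalArgumentException(
          "Base ZooKeeper path must start with '/', but got: " + basePath);
    }
    return basePath;
  }
}
```

Failing fast when the configuration is read gives the user a readable message instead of a later ZooKeeper error.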

 _hadoopBsp should be prefixable via configuration
 -

 Key: GIRAPH-142
 URL: https://issues.apache.org/jira/browse/GIRAPH-142
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.1.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-142.patch


 In multitenant zookeeper clusters, it would be good to be able to specify 
 the base directory that's created for the _hadoopBsp znodes.  This would also 
 fix the issue we have with creating that directory in the source root during 
 tests.





[jira] [Commented] (GIRAPH-139) Change PageRankBenchmark to be accessible via bin/giraph

2012-02-08 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203900#comment-13203900
 ] 

Avery Ching commented on GIRAPH-139:


I agree the main() and run() code should be deprecated, but preferably after 
giraph-examples.jar is ready =).  

 Change PageRankBenchmark to be accessible via bin/giraph
 

 Key: GIRAPH-139
 URL: https://issues.apache.org/jira/browse/GIRAPH-139
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-139.patch


 Currently the PageRankBenchmark has its own main and tool implementation and 
 is difficult to access from the bin/giraph script.  It would be better if 
 everything were accessible via bin/giraph.  The benchmark is particularly 
 problematic because it uses inner classes for its two actual Vertex 
 implementations, which have to be specified on the command line as their 
 .class name (i.e. 
 org.apache.giraph.benchmark.PageRankBenchmark$PageRankHashMapVertex) rather 
 than just with dots, as one would expect.





[jira] [Commented] (GIRAPH-139) Change PageRankBenchmark to be accessible via bin/giraph

2012-02-08 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203967#comment-13203967
 ] 

Avery Ching commented on GIRAPH-139:


sounds good to me.

 Change PageRankBenchmark to be accessible via bin/giraph
 

 Key: GIRAPH-139
 URL: https://issues.apache.org/jira/browse/GIRAPH-139
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-139.patch


 Currently the PageRankBenchmark has its own main and tool implementation and 
 is difficult to access from the bin/giraph script.  It would be better if 
 everything were accessible via bin/giraph.  The benchmark is particularly 
 problematic because it uses inner classes for its two actual Vertex 
 implementations, which have to be specified on the command line as their 
 .class name (i.e. 
 org.apache.giraph.benchmark.PageRankBenchmark$PageRankHashMapVertex) rather 
 than just with dots, as one would expect.





[jira] [Commented] (GIRAPH-144) GiraphJob should not extend Job (users should not be able to call Job methods like waitForCompletion or setMapper..etc)

2012-02-08 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204213#comment-13204213
 ] 

Avery Ching commented on GIRAPH-144:


I'm working on this, should have a fix by tonight.

 GiraphJob should not extend Job  (users should not be able to call Job 
 methods like waitForCompletion or setMapper..etc)
 

 Key: GIRAPH-144
 URL: https://issues.apache.org/jira/browse/GIRAPH-144
 Project: Giraph
  Issue Type: Bug
Reporter: Dave
Assignee: Avery Ching
   Original Estimate: 24h
  Remaining Estimate: 24h







[jira] [Commented] (GIRAPH-136) Erorr message for bin/giraph could be improved

2012-02-03 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13199585#comment-13199585
 ] 

Avery Ching commented on GIRAPH-136:


+1, much better.

First try.

$ ./bin/giraph target/giraph-0.1-SNAPSHOT-jar-with-dependencies.jar 
No lib directory, assuming dev environment
No HADOOP_CONF_DIR set, using /conf 
./bin/giraph: line 112: /bin/hadoop: No such file or directory
./bin/giraph: line 112: exec: /bin/hadoop: cannot execute: No such file or 
directory

Second try after setting HADOOP_CONF_DIR.

$ ./bin/giraph target/giraph-0.1-SNAPSHOT-jar-with-dependencies.jar 
No lib directory, assuming dev environment
HADOOP_CONF_DIR=/Users/aching/Avery/Work/source/hadoop-0.20.203.0/conf
usage: org.apache.giraph.GiraphRunner [-aw <arg>] [-c <arg>] [-h] [-if
       <arg>] [-ip <arg>] [-of <arg>] [-op <arg>] [-q] [-w <arg>] [-wc
       <arg>]
 -aw,--aggregatorWriter <arg>   AggregatorWriter class
 -c,--combiner <arg>            VertexCombiner class
 -h,--help                      Help
 -if,--inputFormat <arg>        Graph inputformat
 -ip,--inputPath <arg>          Graph input path
 -of,--outputFormat <arg>       Graph outputformat
 -op,--outputPath <arg>         Graph output path
 -q,--quiet                     Quiet output
 -w,--workers <arg>             Number of workers
 -wc,--workerContext <arg>      WorkerContext class


 Erorr message for bin/giraph could be improved
 --

 Key: GIRAPH-136
 URL: https://issues.apache.org/jira/browse/GIRAPH-136
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.1.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-136-b.patch, GIRAPH-136.patch


 Currently when one just runs bin/giraph without the required jar, the message 
 isn't very helpful:
 {noformat}[tardis giraph-0.1]$ bin/giraph
 Can't find user jar to execute.{noformat}
 It would be better to have a more in-depth message explaining Giraph and what 
 is expected.





[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions

2012-02-03 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13199938#comment-13199938
 ] 

Avery Ching commented on GIRAPH-40:
---

By the way, Claudio, I think several IDEs have support for checkstyle (e.g. 
Eclipse and IntelliJ).

 Adding checkstyle enforcement of Giraph code conventions
 

 Key: GIRAPH-40
 URL: https://issues.apache.org/jira/browse/GIRAPH-40
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching
Priority: Minor

 Now that we have some code conventions (see GIRAPH-21), we should enforce 
 them with a maven checkstyle plugin.





[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions

2012-02-03 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13199935#comment-13199935
 ] 

Avery Ching commented on GIRAPH-40:
---

What I meant, and what I am working on, is failing the build when checkstyle 
errors occur.  For now, I am going through and fixing the checkstyle.xml I have 
and adjusting code; I will then submit a patch with the checkstyle.xml and all 
the warnings fixed.  Going forward, we should then not have to deal with many 
formatting issues for patches.  Well, that's the goal anyway. =)

 Adding checkstyle enforcement of Giraph code conventions
 

 Key: GIRAPH-40
 URL: https://issues.apache.org/jira/browse/GIRAPH-40
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Priority: Minor

 Now that we have some code conventions (see GIRAPH-21), we should enforce 
 them with a maven checkstyle plugin.





[jira] [Commented] (GIRAPH-136) Erorr message for bin/giraph could be improved

2012-02-01 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13198070#comment-13198070
 ] 

Avery Ching commented on GIRAPH-136:


Okay, hopefully that gets addressed at some point.  +1 for this patch.

It would be nice to see help in the message on what is required to get this to 
work.  Another way would be to add to 
https://cwiki.apache.org/confluence/display/GIRAPH/Index.

 Erorr message for bin/giraph could be improved
 --

 Key: GIRAPH-136
 URL: https://issues.apache.org/jira/browse/GIRAPH-136
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.1.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-136.patch


 Currently when one just runs bin/giraph without the required jar, the message 
 isn't very helpful:
 {noformat}[tardis giraph-0.1]$ bin/giraph
 Can't find user jar to execute.{noformat}
 It would be better to have a more in-depth message explaining Giraph and what 
 is expected.





[jira] [Commented] (GIRAPH-136) Erorr message for bin/giraph could be improved

2012-01-31 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13197650#comment-13197650
 ] 

Avery Ching commented on GIRAPH-136:


I can verify the error message is improved, but perhaps the message could be 
improved further?  Do you have an example of how to use this?

aching:~/git/git_svn_giraph_trunk$ ./bin/giraph 
Usage: giraph [-DHadoop property] jar containing vertex parameters to jar
At a minimum one must provide a path to the jar containing the vertex to be 
executed.
aching:~/git/git_svn_giraph_trunk$ ./bin/giraph 
target/giraph-0.1-SNAPSHOT-jar-with-dependencies.jar 
Can't find Giraph jar.


 Erorr message for bin/giraph could be improved
 --

 Key: GIRAPH-136
 URL: https://issues.apache.org/jira/browse/GIRAPH-136
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.1.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-136.patch


 Currently when one just runs bin/giraph without the required jar, the message 
 isn't very helpful:
 {noformat}[tardis giraph-0.1]$ bin/giraph
 Can't find user jar to execute.{noformat}
 It would be better to have a more in-depth message explaining Giraph and what 
 is expected.





[jira] [Commented] (GIRAPH-134) Fix NOTICE and LICENSE files

2012-01-30 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13196727#comment-13196727
 ] 

Avery Ching commented on GIRAPH-134:


+1, looks good!  Excited for the release.

 Fix NOTICE and LICENSE files
 

 Key: GIRAPH-134
 URL: https://issues.apache.org/jira/browse/GIRAPH-134
 Project: Giraph
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.1.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.1.0

 Attachments: GIRAPH-134.patch


 Currently both the LICENSE and NOTICE file are out of compliance for an 
 Apache release.  





[jira] [Commented] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried

2012-01-27 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195286#comment-13195286
 ] 

Avery Ching commented on GIRAPH-128:


Thanks for taking a look.  I forgot to upload the original (rb only for that 
one), hence part 2. 

The main motivation for the obscure case is that it would make debugging 
simpler.  We often see errors like serverX:portY, and can use portY to figure 
out which mapper to look at.  For example, currently the default starts at 
30000.  If I see an error from 30001, then I know to go to mapper 1 to see its 
problem, and so on.  If I am running a 900-mapper job and the port is 31001 or 
32001, then I still know to look at mapper partition 1.  If instead I had 100 
as the constant, then for 30101 I would have to check both mapper 1 and 
mapper 101.  With up to 20 retries per port, we can handle at least 20 
simultaneous jobs running on a single machine that have the same mapper 
partition id.  First off, that is probably unlikely.  But even if it does 
happen, 20 is probably more than one machine would handle.  By the way, port 
retries are very fast (so I wouldn't worry too much about collisions).

Let me resubmit without the whitespace changes and with MAX_BIND_ATTEMPTS made 
configurable.
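
The port arithmetic described in this comment can be sketched in a few lines of Java. This is only an illustration, not the code from the patch: the class and method names are hypothetical, and the base port of 30000 is an assumption taken from the example ports quoted above.

```java
// Hypothetical helper, not Giraph code. The base port is passed in by the caller.
final class PortScheme {
    /** Smallest power of 10 that is >= numMappers (the "Z" of the proposal). */
    static int roundUpToPowerOfTen(int numMappers) {
        int z = 1;
        while (z < numMappers) {
            z *= 10;
        }
        return z;
    }

    /** Port tried on the given bind attempt (attempt 0 is the first try). */
    static int portFor(int basePort, int partition, int z, int attempt) {
        return basePort + partition + attempt * z;
    }

    /** Recovers the mapper partition from a port seen in an error message. */
    static int partitionFromPort(int basePort, int port, int z) {
        return (port - basePort) % z;
    }

    public static void main(String[] args) {
        int z = roundUpToPowerOfTen(900);                        // 1000
        System.out.println(portFor(30000, 1, z, 1));             // 31001
        System.out.println(partitionFromPort(30000, 32001, z));  // 1
        // With a fixed step of 100, port 30101 is ambiguous in a 900-mapper job:
        System.out.println(portFor(30000, 101, 100, 0));         // 30101
        System.out.println(portFor(30000, 1, 100, 1));           // 30101
    }
}
```

Subtracting the base port before taking the remainder is a choice made for this sketch; it gives the same answer as port % Z whenever the base port is itself a multiple of Z.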

 RPC port from BasicRPCCommunications should be only a starting port, and 
 retried
 

 Key: GIRAPH-128
 URL: https://issues.apache.org/jira/browse/GIRAPH-128
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.1.0
Reporter: Avery Ching
Assignee: Avery Ching
 Attachments: GIRAPH-128.2.patch


 Currently Giraph uses a basic port + the task partition to get the RPC port.  
 This doesn't work well for when there are multiple Giraph jobs running 
 simultaneously in the same Hadoop cluster (port conflict).  At the same time, 
 it is nice to use this simple algorithm because it makes it very easy to 
 debug problems (you can find the troublesome mapper from the RPC port name).  
 I will be proposing a simple scheme to retry with another port.  I will round 
 the total number of mappers up to the nearest power of 10 (let's call that 
 number Z).  Then I will increment the port number by Z, retrying up to 20 
 times.  If you have enough ports, this scheme would guarantee that up to 20 
 mappers / node would be supported.  It should be sufficient for most 
 clusters.  At the same time, we still maintain the easy debugging method 
 since it's still easy to figure out the mapper partition from the port 
 (port % Z = map partition). 





[jira] [Commented] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried

2012-01-25 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193579#comment-13193579
 ] 

Avery Ching commented on GIRAPH-128:


Anyone want to review?  I think this will be very useful to get in before the 
release since it lets users run multiple Giraph jobs on the same cluster 
simultaneously much more easily...

 RPC port from BasicRPCCommunications should be only a starting port, and 
 retried
 

 Key: GIRAPH-128
 URL: https://issues.apache.org/jira/browse/GIRAPH-128
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.1.0
Reporter: Avery Ching
Assignee: Avery Ching
 Attachments: GIRAPH-128.2.patch


 Currently Giraph uses a basic port + the task partition to get the RPC port.  
 This doesn't work well for when there are multiple Giraph jobs running 
 simultaneously in the same Hadoop cluster (port conflict).  At the same time, 
 it is nice to use this simple algorithm because it makes it very easy to 
 debug problems (you can find the troublesome mapper from the RPC port name).  
 I will be proposing a simple scheme to retry with another port.  I will round 
 the total number of mappers up to the nearest power of 10 (let's call that 
 number Z).  Then I will increment the port number by Z, retrying up to 20 
 times.  If you have enough ports, this scheme would guarantee that up to 20 
 mappers / node would be supported.  It should be sufficient for most 
 clusters.  At the same time, we still maintain the easy debugging method 
 since it's still easy to figure out the mapper partition from the port 
 (port % Z = map partition). 
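
A retry loop along the lines of this proposal might look like the following. It is a minimal sketch using a plain java.net.ServerSocket rather than the Hadoop RPC server that BasicRPCCommunications actually starts, and the class name, method name, and parameters are all hypothetical.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical illustration of "start at basePort + partition, step by Z".
final class RetryingBinder {
    /** Tries basePort + partition, then steps by z, up to maxAttempts times. */
    static ServerSocket bind(int basePort, int partition, int z, int maxAttempts)
            throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int port = basePort + partition + attempt * z;
            try {
                return new ServerSocket(port);
            } catch (IOException e) {
                last = e; // port in use or otherwise unavailable; try the next one
            }
        }
        throw new IOException("No free port after " + maxAttempts + " attempts", last);
    }
}
```

The sketch assumes every candidate port stays within the valid range (at most 65535); a real implementation would have to check that before trying to bind.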





[jira] [Commented] (GIRAPH-130) Fix Javadoc warnings

2012-01-24 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192388#comment-13192388
 ] 

Avery Ching commented on GIRAPH-130:


It would be great to enforce this checking somehow to prevent it from happening 
at all.

 Fix Javadoc warnings
 

 Key: GIRAPH-130
 URL: https://issues.apache.org/jira/browse/GIRAPH-130
 Project: Giraph
  Issue Type: Bug
Reporter: Jakob Homan
Priority: Minor
  Labels: newbie

 We've accumulated a fair number of javadoc warnings recently:
 {noformat}[WARNING] Javadoc Warnings
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146:
  warning - Tag @link: reference not found: GraphPartitioner
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:129:
  warning - @param argument superstep is not a parameter name.
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146:
  warning - Tag @link: reference not found: GraphPartitioner
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java:84:
  warning - @param argument vertexIndex is not a parameter name.
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java:84:
  warning - @param argument msgList is not a parameter name.
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32:
  warning - Tag @link: reference not found: VertexIdMessage
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java:46:
  warning - Tag @link: reference not found: messages
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java:46:
  warning - Tag @link: reference not found: messages
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/AggregatorWriter.java:60:
  warning - @param argument map is not a parameter name.
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146:
  warning - Tag @link: reference not found: GraphPartitioner
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146:
  warning - Tag @link: reference not found: GraphPartitioner
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/GiraphJob.java:432:
  warning - @param argument graphPartitionerClass is not a parameter name.
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java:46:
  warning - Tag @link: reference not found: messages
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java:62:
  warning - Tag @link: reference not found: GraphPartitioner
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java:62:
  warning - Tag @link: reference not found: GraphPartitioner
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java:62:
  warning - @param argument availableWorkerInfos is not a parameter name.
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/partition/PartitionBalancer.java:176:
  warning - @param argument allPartitionStatsList is not a parameter name.
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32:
  warning - Tag @link: reference not found: VertexIdMessage
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32:
  warning - Tag @link: reference not found: VertexIdMessage
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146:
  warning - Tag @link: reference not found: GraphPartitioner
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32:
  warning - Tag @link: reference not found: VertexIdMessage
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146:
  warning - Tag @link: reference not found: GraphPartitioner
 [WARNING] 
 /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32:
  warning - Tag @link: reference not found: VertexIdMessage
 {noformat}
 It would be good to fix these.
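
 For reference, both kinds of warning in the list above go away when each 
 @param tag names an actual parameter of the method and each {@link} target 
 resolves to a type the file can see. A minimal sketch on a made-up class 
 (not one of the Giraph classes listed above):

```java
import java.util.List;

// Hypothetical class used only to show well-formed Javadoc tags.
final class JavadocSketch {
    /**
     * Sums the given messages.
     *
     * @param vertexIndex index of the vertex the messages are addressed to
     * @param messages messages to sum, passed as a {@link List}
     * @return the sum of all messages
     */
    static long sum(long vertexIndex, List<Long> messages) {
        long total = 0;
        for (long m : messages) {
            total += m;
        }
        return total;
    }
}
```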


[jira] [Commented] (GIRAPH-129) enable creation of javadoc and sources jars

2012-01-24 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192887#comment-13192887
 ] 

Avery Ching commented on GIRAPH-129:


As long as mvn compile doesn't build the javadoc, I am happy. =)

 enable creation of javadoc and sources jars
 ---

 Key: GIRAPH-129
 URL: https://issues.apache.org/jira/browse/GIRAPH-129
 Project: Giraph
  Issue Type: Improvement
  Components: build
Affects Versions: 0.1.0
Reporter: André Kelpe
Assignee: André Kelpe
Priority: Minor
 Attachments: GIRAPH-129.patch


 It is pretty useful to enable the creation of javadoc and sources jars during 
 the build, so that people using IDEs like eclipse can easily jump into the 
 code.





[jira] [Commented] (GIRAPH-124) Combiner should return Iterable<M> instead of M or null.

2012-01-21 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190351#comment-13190351
 ] 

Avery Ching commented on GIRAPH-124:


Nice, Claudio.  I haven't had a chance to fully test it, but wanted to give you 
some early feedback.

1)  Some changes have messed up indenting a little (here are some examples)

-public FloatWritable combine(LongWritable vertexIndex,
+public Iterable<FloatWritable> combine(LongWritable vertexIndex,
   Iterable<FloatWritable> msgList)

-   public abstract M combine(I vertexIndex,
+   public abstract Iterable<M> combine(I vertexIndex,
  Iterable<M> messages) throws IOException;

-M combinedMsg = combiner.combine(entry.getKey(),
+Iterable<M> messages = combiner.combine(entry.getKey(),
  entry.getValue());

-public IntWritable combine(LongWritable vertexIndex,
+public Iterable<IntWritable> combine(LongWritable vertexIndex,
Iterable<IntWritable> messages)

2)  Should we make it a requirement that the returned result has a size < input 
size?  I think the argument was that some classification of messages might not 
always reduce the number of messages?  Perhaps <=?
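
To make the size question in 2) concrete, here is a rough sketch of a combiner that returns an Iterable. The interface below is a simplified, hypothetical stand-in for the proposed VertexCombiner signature (plain Long and Float instead of Writable types, no IOException), not the code in the patch.

```java
import java.util.Collections;

/** Hypothetical stand-in for the proposed interface; not the actual Giraph class. */
interface SketchCombiner<I, M> {
    Iterable<M> combine(I vertexIndex, Iterable<M> messages);
}

/** Keeps only the minimum message, or returns nothing if there is no input. */
final class MinCombiner implements SketchCombiner<Long, Float> {
    @Override
    public Iterable<Float> combine(Long vertexIndex, Iterable<Float> messages) {
        Float min = null;
        for (Float m : messages) {
            if (min == null || m < min) {
                min = m;
            }
        }
        return min == null
            ? Collections.<Float>emptyList()
            : Collections.singletonList(min);
    }
}
```

With a single input message this combiner returns exactly one message, so its output size equals its input size; that is the case a strict < requirement would rule out and <= would allow.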

 Combiner should return Iterable<M> instead of M or null.
 

 Key: GIRAPH-124
 URL: https://issues.apache.org/jira/browse/GIRAPH-124
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.1.0
Reporter: Claudio Martella
 Attachments: GIRAPH-124.diff


 Currently VertexCombiner is expected to return a single message combining the 
 input messages, or null in case no message should be sent. The new expected 
 interface should return an Iterable<M>, possibly empty. The number of 
 elements in the returned Iterable is supposed to be smaller than the number 
 of input messages, by the initial definition of a Combiner (defined as a 
 function to reduce I/O by combining multiple messages into 1).





[jira] [Commented] (GIRAPH-127) Extending the API with a master.compute() function.

2012-01-19 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189574#comment-13189574
 ] 

Avery Ching commented on GIRAPH-127:


I think this functionality is very useful and would actually replace a lot of 
the WorkerContext functionality. Sequential steps do need to be done between 
computations sometimes, and "Pick k random initial cluster centers" is a good 
example.

While WorkerContext allows us to do simple things, it is not as efficient for 
certain calculations (e.g. suppose all workers needed a global value from HDFS: 
it is cheaper to read it once and broadcast the outcome than to have all 
workers hit HDFS). Still, WorkerContext can be useful (say, for dumping worker 
stats), so I wouldn't remove it, but rather give our users a broader choice for 
computation around supersteps. 

I see that the Master#compute() should have access to all aggregators to do its 
work.  Overall, I like the idea and would definitely like to see how we can add 
this in. Getting the interface right will be a little hard I think, but we can 
iterate over it.  

Basically, from what Semih has said, we gain:
1)  A clean way to do sequential computation between supersteps
2)  Removing the extra superstep if we simulate this idea with a 'picked worker'
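
The sequential part that would move into master.compute() (picking centers, then checking the stopping condition) can be sketched as a plain driver loop. This is only an illustration of the control flow in Semih's pseudocode quoted below: the class name and the two hooks are hypothetical, the hooks stand in for the vertex-parallel phases, and the comparison operators are assumed from the prose description since they did not survive in the quoted text.

```java
import java.util.function.IntSupplier;

/** Hypothetical driver; the two hooks stand in for the vertex-parallel phases. */
final class MasterLoopSketch {
    /**
     * Runs iterations until few enough edges cross clusters or the
     * iteration budget is spent; returns the number of iterations run.
     */
    static int run(Runnable pickCentersAndAssign, IntSupplier countCrossingEdges,
                   int numEdgesThreshold, int maxIterations) {
        int numEdgesCrossingClusters = Integer.MAX_VALUE;
        int iterationNo = 0;
        while (numEdgesCrossingClusters > numEdgesThreshold
                && iterationNo < maxIterations) {
            iterationNo++;
            pickCentersAndAssign.run();                                // steps 1 and 2
            numEdgesCrossingClusters = countCrossingEdges.getAsInt();  // step 3
        }
        return iterationNo;
    }
}
```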

 Extending the API with a master.compute() function.
 ---

 Key: GIRAPH-127
 URL: https://issues.apache.org/jira/browse/GIRAPH-127
 Project: Giraph
  Issue Type: New Feature
  Components: bsp, examples, graph
Reporter: Semih Salihoglu

 First of all, sorry for the long explanation to this feature.
 I want to expand the API of Giraph with a new function called 
 master.compute(), that would get called at the master before each superstep 
 and I will try to explain the purpose that it would serve with an example. 
 Let's say we want to implement the following simplified version of the 
 k-means clustering algorithm. Pseudocode below:
  * Input G(V, E), k, numEdgesThreshold, maxIterations
  * Algorithm:
  * int numEdgesCrossingClusters = Integer.MAX_VALUE;
  * int iterationNo = 0;
  * while ((numEdgesCrossingClusters > numEdgesThreshold) && iterationNo < maxIterations) {
  *   iterationNo++;
  *   int[] clusterCenters = pickKClusterCenters(k, G);
  *   findClusterCenters(G, clusterCenters);
  *   numEdgesCrossingClusters = countNumEdgesCrossingClusters();
  * }
 The algorithm goes through the following steps in iterations:
 1) Pick k random initial cluster centers
 2) Assign each vertex to the cluster center that it's closest to (in Giraph, 
 this can be implemented in message passing similar to how ShortestPaths is 
 implemented):
 3) Count the number of edges crossing clusters
 4) Go back to step 1, if there are a lot of edges crossing clusters and we 
 haven't exceeded maximum number of iterations yet.
 In an algorithm like this, step 2 and 3 are where most of the work happens 
 and both parts have very neat message-passing implementations. I'll try to 
 give an overview without going into the details. Let's say we define a Vertex 
 in Giraph to hold a custom Writable object that holds 2 integer values and 
 sends a message with up to 2 integer values.
 Step 2 is very similar to ShortestPaths algorithm and has two stages: In the 
 first stage, each vertex checks to see whether or not it's one of the cluster 
 centers. If so, it assigns itself the value (id, 0), otherwise it assigns 
 itself (Null, Null). In the 2nd stage, the vertices assign themselves to the 
 minimum distance cluster center by looking at their neighbors (cluster 
 centers, distance) values (received as 2 integer messages) and their current 
 values, and changing their values if they find a lower distance cluster 
 center. This happens in x number of supersteps until every vertex converges.
 Step 3, counting the number of edges crossing clusters, is also very easy to 
 implement in Giraph. Once each vertex has a cluster center, the number of 
 edges crossing clusters can be counted by an aggregator, let's say called 
 num-edges-crossing. It would again have two stages: First stage, every 
 vertex just sends its cluster id to all its neighbors. Second stage, every 
 vertex looks at their neighbors' cluster ids in the messages, and for each 
 cluster id that is not equal to its own cluster id, it increments 
 num-edges-crossing by 1.
 The other 2 steps, step 1 and 4, are very simple sequential computations. 
 Step 1 just picks k random vertex ids and puts them into an aggregator. Step 4 
 just compares num-edges-crossing against a threshold and also checks whether or 
 not the algorithm has exceeded maxIterations (not supersteps but iterations 
 of going through Steps 1-4). With the current API, it's not clear where to do 
 these computations. There is a per 

[jira] [Commented] (GIRAPH-126) Use Collections.emptyList() in BasicRPCCommunications.java

2012-01-18 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188578#comment-13188578
 ] 

Avery Ching commented on GIRAPH-126:


Agree with Jakob.  Thanks Andrei, every memory improvement helps!

 Use Collections.emptyList() in BasicRPCCommunications.java
 --

 Key: GIRAPH-126
 URL: https://issues.apache.org/jira/browse/GIRAPH-126
 Project: Giraph
  Issue Type: Improvement
Reporter: André Kelpe
Assignee: André Kelpe
Priority: Minor
 Attachments: GIRAPH-126.patch


 I am doing some tests with giraph and I am having some memory problems. While 
 I was browsing through the codebase I saw that you are allocating a new 
 ArrayList (which has an underlying array of 10 elements) for each Vertex, 
 that has no Messages to be delivered. That's a waste of memory and time. This 
 patch replaces it with the EMPTY_LIST of the Collections utility class.





[jira] [Commented] (GIRAPH-126) Use Collections.emptyList() in BasicRPCCommunications.java

2012-01-18 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188588#comment-13188588
 ] 

Avery Ching commented on GIRAPH-126:


Actually, looking at it some more, does this work?  What happens when 
msgs.add(msg) is called on the empty list?  We can also do this a different way 
(i.e. msgs = new ArrayList<M>(1)).
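
For what it's worth, the concern can be checked in isolation: the list returned by Collections.emptyList() is immutable, so add() throws UnsupportedOperationException, while a small ArrayList accepts the element. The helper below is a hypothetical standalone check, not code from the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical standalone check of how the two kinds of list react to add().
final class EmptyListDemo {
    /** Returns true if adding to the list is rejected. */
    static boolean rejectsAdd(List<String> msgs) {
        try {
            msgs.add("msg");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsAdd(Collections.<String>emptyList())); // true
        System.out.println(rejectsAdd(new ArrayList<String>(1)));        // false
    }
}
```

So the shared empty list is only safe on code paths that never add to it.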

 Use Collections.emptyList() in BasicRPCCommunications.java
 --

 Key: GIRAPH-126
 URL: https://issues.apache.org/jira/browse/GIRAPH-126
 Project: Giraph
  Issue Type: Improvement
Reporter: André Kelpe
Assignee: André Kelpe
Priority: Minor
 Attachments: GIRAPH-126.patch, GIRAPH-126.patch, GIRAPH-126.patch


 I am doing some tests with giraph and I am having some memory problems. While 
 I was browsing through the codebase I saw that you are allocating a new 
 ArrayList (which has an underlying array of 10 elements) for each Vertex, 
 that has no Messages to be delivered. That's a waste of memory and time. This 
 patch replaces it with the EMPTY_LIST of the Collections utility class.





[jira] [Commented] (GIRAPH-123) the wiki is not publicly accessible

2012-01-11 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13184655#comment-13184655
 ] 

Avery Ching commented on GIRAPH-123:


Works for me.  Thanks Jakob.

 the wiki is not publicly accessible
 ---

 Key: GIRAPH-123
 URL: https://issues.apache.org/jira/browse/GIRAPH-123
 Project: Giraph
  Issue Type: Bug
  Components: documentation
Reporter: André Kelpe
Assignee: Jakob Homan
Priority: Minor

 When I try to read the documentation on the wiki I end up on a login screen. 
 Can you please make the wiki open for the public.





[jira] [Commented] (GIRAPH-118) Clarify messages behavior in BasicVertex

2012-01-07 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13181894#comment-13181894
 ] 

Avery Ching commented on GIRAPH-118:


+1, looks good!

 Clarify messages behavior in BasicVertex
 

 Key: GIRAPH-118
 URL: https://issues.apache.org/jira/browse/GIRAPH-118
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.70.0
Reporter: Claudio Martella
Assignee: Claudio Martella
Priority: Minor
 Attachments: GIRAPH-118.diff, GIRAPH-119.diff


 initialize() can receive a null parameter for messages (at least that's what 
 EdgeListVertex does). We should avoid that and pass an empty Iterable 
 instead. That should be cheap for us inside of the InputFormat, just passing 
 a static immutable empty list.
 setMessages(Iterable<M>) should be changed to putMessages(Iterable<M>). The 
 set prefix suggests an assignment, while setMessages is used to transfer the 
 messages to the internal data structure the user is responsible for. 
 putMessages() should clarify this.
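
 A minimal sketch of the contract described above, on a made-up vertex-like 
 class (not BasicVertex or EdgeListVertex): callers pass an empty Iterable 
 rather than null, and putMessages() copies the messages into storage the 
 vertex owns.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical vertex-like class; only the message handling is sketched. */
final class SketchVertex<M> {
    private final List<M> inbox = new ArrayList<M>();

    /** Copies messages into vertex-owned storage; pass an empty Iterable, never null. */
    void putMessages(Iterable<M> messages) {
        for (M m : messages) { // a null argument would throw NullPointerException here
            inbox.add(m);
        }
    }

    int messageCount() {
        return inbox.size();
    }

    public static void main(String[] args) {
        SketchVertex<String> v = new SketchVertex<String>();
        v.putMessages(Collections.<String>emptyList()); // "no messages" case
        System.out.println(v.messageCount());           // 0
    }
}
```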





[jira] [Commented] (GIRAPH-119) VertexCombiner should work on Iterable<M> instead of List<M>

2012-01-06 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13181566#comment-13181566
 ] 

Avery Ching commented on GIRAPH-119:


Minor nit

 @Override
 public FloatWritable combine(LongWritable vertexIndex,
-  List<FloatWritable> msgList)
+  Iterable<FloatWritable> msgList)
 throws IOException {
 return null;
 }
@@ -97,7 +97,7 @@ public class TestVertexTypes
 
 @Override
 public DoubleWritable combine(LongWritable vertexIndex,
-  List<DoubleWritable> msgList)
+  Iterable<DoubleWritable> msgList)
 throws IOException {
 return null;
 }

probably should have changed msgList to messages or something like that.  Not a 
big deal.  =)
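
For illustration, a combiner written against Iterable might look like the sketch below. It assumes plain Java types in place of the Hadoop Writables, and SketchCombiner and SumCombiner are hypothetical names, not Giraph classes. The parameter is called messages, as suggested above.

```java
import java.util.Arrays;

/** Sums all messages bound for one vertex into a single message. */
class SumCombiner implements SketchCombiner<Long, Double> {
    @Override
    public Double combine(Long vertexIndex, Iterable<Double> messages) {
        double sum = 0.0;
        // Only iteration is needed, so any Iterable works, not just a List.
        for (double message : messages) {
            sum += message;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(new SumCombiner().combine(3L, Arrays.asList(1.0, 20.0)));
    }
}

/** Hypothetical combiner shape; the real VertexCombiner uses Writable types. */
interface SketchCombiner<I, M> {
    M combine(I vertexIndex, Iterable<M> messages);
}
```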

 VertexCombiner should work on Iterable<M> instead of List<M>
 

 Key: GIRAPH-119
 URL: https://issues.apache.org/jira/browse/GIRAPH-119
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.70.0
Reporter: Claudio Martella
Assignee: Claudio Martella
 Attachments: GIRAPH-119.diff


 Currently VertexCombiner expects a List<M>. It should be refactored to 
 Iterable<M> to stay in sync with the Iterable-based BasicVertex message logic.





[jira] [Commented] (GIRAPH-118) Clarify messages behavior in BasicVertex

2012-01-06 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13181570#comment-13181570
 ] 

Avery Ching commented on GIRAPH-118:


Seems reasonable.  Please make sure to update the javadoc and the MutableVertex 
implementations.

 Clarify messages behavior in BasicVertex
 

 Key: GIRAPH-118
 URL: https://issues.apache.org/jira/browse/GIRAPH-118
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.70.0
Reporter: Claudio Martella
Assignee: Claudio Martella
Priority: Minor

 initialize() can receive a null parameter for messages (at least that's what 
 EdgeListVertex does). We should avoid that and pass an empty Iterable 
 instead. That should be cheap for us inside of the InputFormat, just passing 
 a static immutable empty list.
 setMessages(Iterable<M>) should be changed to putMessages(Iterable<M>). The 
 set prefix suggests an assignment, while setMessages is used to transfer the 
 messages to the internal data structure the user is responsible for. 
 putMessages() should clarify this.





[jira] [Commented] (GIRAPH-111) Refactor I/O to be independent of Map/Reduce

2011-12-20 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173512#comment-13173512
 ] 

Avery Ching commented on GIRAPH-111:


I'm not clear on why this is necessary.  Couldn't we simply call the I/O 
methods as Hadoop would when we're not using Hadoop?  Am I missing something?

 Refactor I/O to be independent of Map/Reduce
 

 Key: GIRAPH-111
 URL: https://issues.apache.org/jira/browse/GIRAPH-111
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Reporter: Ed Kohlwey

 The I/O mechanisms should probably be abstracted entirely from Map/Reduce in 
 order to support making Giraph an independent framework.





[jira] [Commented] (GIRAPH-108) Refactor code to run independently of Map/Reduce

2011-12-20 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173931#comment-13173931
 ] 

Avery Ching commented on GIRAPH-108:


Actually, I'll let Jakob take a first crack at looking at this since he's got 
some expertise in the area.

 Refactor code to run independently of Map/Reduce
 

 Key: GIRAPH-108
 URL: https://issues.apache.org/jira/browse/GIRAPH-108
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Reporter: Ed Kohlwey
 Attachments: GIRAPH-108


 It would be nice for Giraph to be refactored such that the code could 
 eventually be run outside of map/reduce. This will allow people to write 
 drivers that can run in the cool new resource manager frameworks like Mesos 
 and YARN, and eventually let the application's code base evolve to be 
 independent of map/reduce.





[jira] [Commented] (GIRAPH-73) A little refactoring

2011-12-18 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-73?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13171797#comment-13171797
 ] 

Avery Ching commented on GIRAPH-73:
---

Most of these changes look good, but I'm not sure I agree with the use of 
Closeables.closeQuietly() in ZooKeeperManager.java since if we do get an 
IOException I think we'd want the program to die as soon as possible.
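
To make the concern concrete, here is a small sketch contrasting the two behaviors. It assumes a hand-written closeQuietly() stand-in (returning a boolean only so the swallowed failure is visible) rather than Guava's actual method, and FailingResource is a hypothetical test resource.

```java
import java.io.Closeable;
import java.io.IOException;

class CloseFailureSketch {
    /** A resource whose close() always fails, to make the difference visible. */
    static class FailingResource implements Closeable {
        @Override
        public void close() throws IOException {
            throw new IOException("close failed");
        }
    }

    /** Quiet variant: the failure is swallowed and the program carries on. */
    static boolean closeQuietly(Closeable resource) {
        try {
            resource.close();
            return true;
        } catch (IOException e) {
            return false; // swallowed; the caller may never notice
        }
    }

    /** Fail-fast variant: try-with-resources lets the IOException propagate. */
    static void useAndClose(Closeable resource) throws IOException {
        try (Closeable managed = resource) {
            // work with the resource here
        }
    }

    public static void main(String[] args) {
        System.out.println(closeQuietly(new FailingResource()));
        try {
            useAndClose(new FailingResource());
        } catch (IOException e) {
            System.out.println("propagated: " + e.getMessage());
        }
    }
}
```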


 A little refactoring
 

 Key: GIRAPH-73
 URL: https://issues.apache.org/jira/browse/GIRAPH-73
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.70.0
Reporter: Sebastian Schelter
Priority: Minor
 Attachments: GIRAPH-73-2.patch, GIRAPH-73.patch


 Hi, I'm currently reading Giraph's sources and starting to play with it. I 
 fixed some small things along the way (like making sure writers are closed, 
 exceptions are logged, etc.), thought that may be helpful.





[jira] [Commented] (GIRAPH-105) BspServiceMaster.checkWorkers() should return empty lists instead of null

2011-12-18 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13171898#comment-13171898
 ] 

Avery Ching commented on GIRAPH-105:


Thanks for the reworking!  +1.

 BspServiceMaster.checkWorkers() should return empty lists instead of null
 -

 Key: GIRAPH-105
 URL: https://issues.apache.org/jira/browse/GIRAPH-105
 Project: Giraph
  Issue Type: Bug
Affects Versions: 0.70.0
Reporter: Sebastian Schelter
Priority: Minor
 Attachments: GIRAPH-105-2.patch, GIRAPH-105.patch


 BspServiceMaster.checkWorkers() is invoked in 
 BspServiceMaster.coordinateSuperstep() and in 
 BspServiceMaster.createInputSplits(). Both check for an empty list to fail 
 the job in case something has gone wrong. However, checkWorkers() returns 
 null in case of problems, causing an NPE in the calling code.
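
 A minimal sketch of the proposed contract, assuming worker names as plain strings. WorkerCheckSketch, its method signatures and the minWorkers rule are hypothetical simplifications, not the real BspServiceMaster code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class WorkerCheckSketch {
    /** Returns the healthy workers, or an empty list (never null) on failure. */
    static List<String> checkWorkers(List<String> healthy, int minWorkers) {
        if (healthy.size() < minWorkers) {
            return Collections.emptyList();
        }
        return new ArrayList<>(healthy);
    }

    /** The caller can test isEmpty() without risking a NullPointerException. */
    static boolean canStartSuperstep(List<String> healthy, int minWorkers) {
        return !checkWorkers(healthy, minWorkers).isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(canStartSuperstep(Arrays.asList("w1"), 2));
        System.out.println(canStartSuperstep(Arrays.asList("w1", "w2"), 2));
    }
}
```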





[jira] [Commented] (GIRAPH-93) Hive input / output format

2011-12-16 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170831#comment-13170831
 ] 

Avery Ching commented on GIRAPH-93:
---

Argh, since HCatalog is not published to maven, this is a bit of a problem.  We 
could add a system dependency, but it's a little messy (yucky warnings).  

I can get it to build with my compiled jar, but get warnings like:

[WARNING] 'dependencies.dependency.systemPath' for 
org.apache.hcatalog:hcatalog:jar should not point at files within the project 
directory, ${basedir}/lib/hcatalog-0.3.0-dev.jar will be unresolvable by 
dependent projects @ line 527, column 19
[WARNING] 
[WARNING] It is highly recommended to fix these problems because they threaten 
the stability of your build.
[WARNING] 
[WARNING] For this reason, future Maven versions might no longer support 
building such malformed projects.
[WARNING] 
[INFO] 
[INFO] 
[INFO] Building Apache Incubator Giraph 0.70
[INFO] 
[WARNING] The POM for org.apache.hadoop:hadoop-core:jar:0.20.1 is missing, no 
dependency information available
[WARNING] The POM for org.apache.hadoop:hadoop-core:jar:0.20.3-CDH3-SNAPSHOT is 
missing, no dependency information available
[WARNING] Could not transfer metadata asm:asm/maven-metadata.xml from/to 
local.repository (file:../../local.repository/trunk): No connector available to 
access repository local.repository (file:../../local.repository/trunk) of type 
legacy using the available factories WagonRepositoryConnectorFactory
[INFO] 
[INFO] --- maven-enforcer-plugin:1.0.1:enforce (enforce-maven) @ giraph ---
[INFO] 
[INFO] --- maven-resources-plugin:2.4.3:resources (default-resources) @ giraph 
---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 
/home/aching/giraph/src/main/resources
[INFO] 
[INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ giraph ---
[INFO] Compiling 122 source files to /home/aching/giraph/target/classes
[INFO] 
[INFO] --- maven-assembly-plugin:2.2:single (build-fat-jar) @ giraph ---
[WARNING] Missing POM for org.apache.hadoop:hadoop-core:jar:0.20.1
[WARNING] Missing POM for org.apache.hadoop:hadoop-core:jar:0.20.3-CDH3-SNAPSHOT

Maybe wait on HCATALOG-132?

 Hive input / output format
 --

 Key: GIRAPH-93
 URL: https://issues.apache.org/jira/browse/GIRAPH-93
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching

 It would be great to be able to load/store data from/to Hive tables.  





[jira] [Commented] (GIRAPH-57) Add new RPC call (putVertexIdMessagesList) to batch putMsgList RPCs together

2011-12-16 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170857#comment-13170857
 ] 

Avery Ching commented on GIRAPH-57:
---

Emergency fix to allow trunk to compile on certain platforms:

[ERROR] 
/home/hudson/hudson-slave/workspace/Giraph-trunk-Commit/trunk/src/main/java/org/apache/giraph/comm/VertexIdMessages.java:[66,45]
 type parameters of <I>I cannot be determined; no unique maximal instance 
exists for type variable I with upper bounds 
I,org.apache.hadoop.io.WritableComparable

==
--- 
incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/VertexIdMessages.java
 (original)
+++ 
incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/VertexIdMessages.java
 Fri Dec 16 09:26:44 2011
@@ -63,7 +63,7 @@ public class VertexIdMessages<I extends 
 
 @Override
 public void readFields(DataInput input) throws IOException {
-vertexId = BspUtils.createVertexIndex(getConf());
+vertexId = BspUtils.<I>createVertexIndex(getConf());
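
The fix supplies an explicit type argument on a generic method call instead of relying on inference. A self-contained sketch of that syntax follows. TypeWitnessSketch and its createIndex() are hypothetical and only mimic the shape of the real call; they do not reproduce the compiler failure itself.

```java
class TypeWitnessSketch<I extends Comparable<I>> {
    I id;

    /** Hypothetical factory: T appears only in the return type. */
    @SuppressWarnings("unchecked")
    static <T extends Comparable<T>> T createIndex(Object raw) {
        return (T) raw; // unchecked cast, acceptable for a sketch
    }

    void read(Object raw) {
        // Explicit type argument: states that T is I rather than leaving
        // the choice to the compiler's inference.
        id = TypeWitnessSketch.<I>createIndex(raw);
    }

    public static void main(String[] args) {
        TypeWitnessSketch<Long> holder = new TypeWitnessSketch<>();
        holder.read(42L);
        System.out.println(holder.id);
    }
}
```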

 Add new RPC call (putVertexIdMessagesList) to batch putMsgList RPCs together
 

 Key: GIRAPH-57
 URL: https://issues.apache.org/jira/browse/GIRAPH-57
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Avery Ching
 Attachments: GIRAPH-57.diff, GIRAPH-57.diff.2


 Right now messages are sent to a vertex one at a time.  It would be good to 
 have a putMsgs call that could send messages to multiple vertices (all hosted 
 on the same worker).  We'd save a huge number of individual RPC calls at the 
 expense of having smaller calls with larger payloads.





[jira] [Commented] (GIRAPH-45) Improve the way to keep outgoing messages

2011-12-16 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13171073#comment-13171073
 ] 

Avery Ching commented on GIRAPH-45:
---

I think that reading messages one vertex at a time from disk will reduce memory 
pressure more than the partition-based storage.  I'm assuming that 
key=vertex_id and value=message_list in your explanation.  How do you keep the 
keys together in the file?  For instance, suppose that you get the following 
tuples <vertex_id, message_list>

0, 2.0, 3.0
3, 1.0
7, 34.0
4, 23.0
3, 20.0

In a bad scenario, you have to spill to disk after each tuple.  The files 
are totally out of order and your index <vertex, byte offset> looks something 
like:
0, 0
3, 24
7, 40
4, 56

But if I'm understanding this scheme, wouldn't each vertex need to scan the 
entire file if the vertices keep coming and are totally random?  

I suppose that another way to do this is to use the partition-based method and 
add a small change.  If the partition is deemed too large to load in memory and 
sort, it could be read and re-dumped into n files, where n is chosen such that 
there is a good chance that it produces small enough files so that every one of 
them can fit in memory at a time.  This can be done recursively.
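
A rough sketch of that recursive re-dump, with in-memory lists standing in for files so it stays self-contained. RecursiveSplitSketch, Tuple, the modulo-based bucketing and the divisor parameter are all assumptions for illustration; a real version would stream tuples between files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RecursiveSplitSketch {
    /** One (vertex id, message) tuple; a hypothetical stand-in for a file record. */
    static final class Tuple {
        final long vertexId;
        final double message;

        Tuple(long vertexId, double message) {
            this.vertexId = vertexId;
            this.message = message;
        }
    }

    /**
     * Splits tuples into chunks of at most maxInMemory tuples where possible,
     * keeping all tuples of one vertex in the same chunk. Requires fanOut >= 2.
     * Pass divisor = 1 on the first call.
     */
    static List<List<Tuple>> split(List<Tuple> tuples, int fanOut,
                                   int maxInMemory, long divisor) {
        List<List<Tuple>> chunks = new ArrayList<>();
        // Stop when the chunk fits, or when the id cannot be divided further.
        if (tuples.size() <= maxInMemory || divisor > Long.MAX_VALUE / fanOut) {
            chunks.add(tuples);
            return chunks;
        }
        List<List<Tuple>> buckets = new ArrayList<>();
        for (int i = 0; i < fanOut; i++) {
            buckets.add(new ArrayList<>());
        }
        for (Tuple tuple : tuples) {
            int bucket = (int) Math.floorMod(tuple.vertexId / divisor, (long) fanOut);
            buckets.get(bucket).add(tuple);
        }
        // Re-split every bucket that is still too large, one level deeper.
        for (List<Tuple> bucket : buckets) {
            if (!bucket.isEmpty()) {
                chunks.addAll(split(bucket, fanOut, maxInMemory, divisor * fanOut));
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Tuple> tuples = Arrays.asList(
            new Tuple(0, 2.0), new Tuple(0, 3.0), new Tuple(3, 1.0),
            new Tuple(7, 34.0), new Tuple(4, 23.0), new Tuple(3, 20.0));
        for (List<Tuple> chunk : split(tuples, 2, 3, 1L)) {
            System.out.println(chunk.size());
        }
    }
}
```

A single vertex with more messages than fit in memory still ends up in one oversized chunk, so this only helps when the load is spread over several vertices.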

 Improve the way to keep outgoing messages
 -

 Key: GIRAPH-45
 URL: https://issues.apache.org/jira/browse/GIRAPH-45
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Hyunsik Choi
Assignee: Hyunsik Choi

 As discussed in GIRAPH-12(http://goo.gl/CE32U), I think that there is a 
 potential problem to cause out of memory when the rate of message generation 
 is higher than the rate of message flush (or network bandwidth).
 To overcome this problem, we need more eager strategy for message flushing or 
 some approach to spill messages into disk.
 The below link is Dmitriy's suggestion.
 https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253





[jira] [Commented] (GIRAPH-45) Improve the way to keep outgoing messages

2011-12-16 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13171404#comment-13171404
 ] 

Avery Ching commented on GIRAPH-45:
---

Ah, thank you for clarifying that.  The only minor downside is that a sorted 
map uses a bit more memory than a non-sorted one typically.  But it's probably 
not too big a deal.  Sounds like an idea certainly worth trying out Claudio =).
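
For reference, a minimal sketch of a sorted map used this way, assuming it is keyed by destination vertex id with a message list as the value. SortedMessagesSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

class SortedMessagesSketch {
    // TreeMap keeps keys ordered, so iteration visits vertices in ascending id order.
    private final SortedMap<Long, List<Double>> messages = new TreeMap<>();

    void add(long vertexId, double message) {
        messages.computeIfAbsent(vertexId, id -> new ArrayList<>()).add(message);
    }

    SortedMap<Long, List<Double>> view() {
        return messages;
    }

    public static void main(String[] args) {
        SortedMessagesSketch sketch = new SortedMessagesSketch();
        sketch.add(3, 1.0);
        sketch.add(7, 34.0);
        sketch.add(0, 2.0);
        sketch.add(3, 20.0);
        System.out.println(sketch.view());
    }
}
```

Messages for the same vertex end up adjacent regardless of arrival order, at the cost of the tree's per-entry overhead mentioned above.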

 Improve the way to keep outgoing messages
 -

 Key: GIRAPH-45
 URL: https://issues.apache.org/jira/browse/GIRAPH-45
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Hyunsik Choi
Assignee: Hyunsik Choi

 As discussed in GIRAPH-12(http://goo.gl/CE32U), I think that there is a 
 potential problem to cause out of memory when the rate of message generation 
 is higher than the rate of message flush (or network bandwidth).
 To overcome this problem, we need more eager strategy for message flushing or 
 some approach to spill messages into disk.
 The below link is Dmitriy's suggestion.
 https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253





[jira] [Commented] (GIRAPH-45) Improve the way to keep outgoing messages

2011-12-15 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170613#comment-13170613
 ] 

Avery Ching commented on GIRAPH-45:
---

You might not need the BTree for indexing the destination vertices I think.  
Couldn't we use files to group the messages sent to the same partition?  If you 
simply dump all the received <vertex id, messages> tuples to a file that is 
specific for a partition, we can simply load all the tuples for a single 
partition prior to computing on the worker and assign them to their 
destinations.  I'm a little concerned that using an in-memory data structure to 
keep the message indices might be a little expensive (i.e. one BTree per vertex 
in your model if I'm understanding correctly).
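
A sketch of the per-partition file idea, assuming local files, long vertex ids, double messages and a modulo partitioner. PartitionSpillSketch, the file naming and the record layout are all hypothetical.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PartitionSpillSketch {
    private final Path directory;
    private final int partitionCount;

    PartitionSpillSketch(Path directory, int partitionCount) {
        this.directory = directory;
        this.partitionCount = partitionCount;
    }

    /** Hypothetical partitioner: vertex id modulo the partition count. */
    int partitionOf(long vertexId) {
        return (int) Math.floorMod(vertexId, (long) partitionCount);
    }

    private Path fileFor(int partition) {
        return directory.resolve("partition-" + partition + ".messages");
    }

    /** Appends one tuple to its partition's file (reopened per call for brevity). */
    void append(long vertexId, double message) throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(
                fileFor(partitionOf(vertexId)),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND))) {
            out.writeLong(vertexId);
            out.writeDouble(message);
        }
    }

    /** Loads all tuples of one partition and groups messages by destination vertex. */
    Map<Long, List<Double>> load(int partition) throws IOException {
        Map<Long, List<Double>> byVertex = new HashMap<>();
        Path file = fileFor(partition);
        if (!Files.exists(file)) {
            return byVertex;
        }
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            while (true) {
                long vertexId;
                try {
                    vertexId = in.readLong();
                } catch (EOFException endOfFile) {
                    break; // no more tuples
                }
                byVertex.computeIfAbsent(vertexId, id -> new ArrayList<>())
                        .add(in.readDouble());
            }
        }
        return byVertex;
    }

    public static void main(String[] args) throws IOException {
        PartitionSpillSketch spill =
            new PartitionSpillSketch(Files.createTempDirectory("spill"), 2);
        spill.append(3, 1.0);
        spill.append(4, 23.0);
        spill.append(3, 20.0);
        System.out.println(spill.load(1));
    }
}
```

Only the file being loaded needs to fit in memory, and no per-vertex index is kept while messages arrive.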

Regarding the streaming, I am not proposing to change the BSP model.  I'm 
talking about sending the messages as we go along during the computation.  
Currently the messages are bulk sent at the end of the superstep.  So rather 
than a bulk send, allow every worker to stream out some bunch of messages 
when under some pressure, rather than everything at the end.

As far as detecting memory pressure, it looks like Runtime seems to do an okay 
job.  If anyone knows anything better, that's cool too.  You can look at 
MemoryUtils#getRuntimeMemoryStats() for a Runtime example.  We'll need to 
define limits for memory pressure.
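
As a sketch of what a Runtime-based check could look like (this is not the MemoryUtils code, and the 0.8 threshold is an arbitrary assumption):

```java
class MemoryPressureSketch {
    /** Fraction of the maximum heap that is currently in use. */
    static double usedFraction() {
        Runtime runtime = Runtime.getRuntime();
        long used = runtime.totalMemory() - runtime.freeMemory();
        return (double) used / runtime.maxMemory();
    }

    /** True when heap usage has reached the given fraction of the maximum. */
    static boolean underPressure(double threshold) {
        return usedFraction() >= threshold;
    }

    public static void main(String[] args) {
        System.out.println("used fraction: " + usedFraction());
        System.out.println("under pressure at 0.8: " + underPressure(0.8));
    }
}
```

The reading includes garbage that has not been collected yet, so a limit chosen this way is only approximate.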

 Improve the way to keep outgoing messages
 -

 Key: GIRAPH-45
 URL: https://issues.apache.org/jira/browse/GIRAPH-45
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Hyunsik Choi
Assignee: Hyunsik Choi

 As discussed in GIRAPH-12(http://goo.gl/CE32U), I think that there is a 
 potential problem to cause out of memory when the rate of message generation 
 is higher than the rate of message flush (or network bandwidth).
 To overcome this problem, we need more eager strategy for message flushing or 
 some approach to spill messages into disk.
 The below link is Dmitriy's suggestion.
 https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253





[jira] [Commented] (GIRAPH-93) Hive input / output format

2011-12-15 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170631#comment-13170631
 ] 

Avery Ching commented on GIRAPH-93:
---

Just wanted to update that I did get this to work with HCatalog a while ago.  
And amazingly it actually works!  I'll put together a diff for getting this into 
Giraph.

 Hive input / output format
 --

 Key: GIRAPH-93
 URL: https://issues.apache.org/jira/browse/GIRAPH-93
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching

 It would be great to be able to load/store data from/to Hive tables.  





[jira] [Commented] (GIRAPH-80) Don't expose the list holding the messages in BasicVertex

2011-12-15 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170651#comment-13170651
 ] 

Avery Ching commented on GIRAPH-80:
---

By the way Sebastian, you can run the Hadoop tests against a single node Hadoop 
instance (I often do this on my laptop).  It makes it much easier to run this 
test and takes me about 17 minutes or so.  Not too bad.

 Don't expose the list holding the messages in BasicVertex
 -

 Key: GIRAPH-80
 URL: https://issues.apache.org/jira/browse/GIRAPH-80
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.70.0
Reporter: Sebastian Schelter

 I'm currently trying to implement my own memory efficient vertex (similar to 
 LongDoubleFloatDoubleVertex) and ran into problems with getMsgList()
 This method returns a list pointing to the messages of the vertex and it is 
 modified externally (BasicRPCCommunications calls clear() and addAll() e.g.). 
 This makes it very hard to use something else than a java.util.List 
 internally (LongDoubleFloatDoubleVertex hacked around this) and it is 
 generally dangerous to have the internal state of an object be modified 
 externally. It also makes the code harder to read and understand.
 I'd suggest to change the API to let a vertex handle the modifications itself 
 internally (e.g. add something like pushMessages(...))





[jira] [Commented] (GIRAPH-103) Added properties for commonly used package version to pom.xml

2011-12-15 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170803#comment-13170803
 ] 

Avery Ching commented on GIRAPH-103:


No one wants to take a quick look?  It's very short, I promise...

 Added properties for commonly used package version to pom.xml
 -

 Key: GIRAPH-103
 URL: https://issues.apache.org/jira/browse/GIRAPH-103
 Project: Giraph
  Issue Type: Improvement
  Components: build
Reporter: Avery Ching
Priority: Trivial
 Attachments: GIRAPH-103.diff








[jira] [Commented] (GIRAPH-45) Improve the way to keep outgoing messages

2011-12-14 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13169756#comment-13169756
 ] 

Avery Ching commented on GIRAPH-45:
---

I've been thinking about this a bit more.  I don't think we actually need a 
database if we use a disk-friendly approach and take advantage of the knowledge 
of our system.  Here is a rough proposal:

There are two ways we can save memory here: an out-of-core graph and out-of-core 
messages.  In this way, we can use the memory as a cache rather than a totally 
in-memory database and messaging system.

Here's how we can do the out-of-core graph:

Workers already do the computation by partition.  All partitions that are owned 
by the worker need to be processed and we want to minimize the amount of data 
loaded/stored to local disk (i.e. 
<superstep>.<worker id>.<partition #>.vertices).  Local disk should be used 
here because it will be faster and no 
remote worker needs to directly access this data.

Therefore the general algorithm would be

for (partition : all in memory partitions)
  partition.computeAndGenerateOutgoingMessages()
  if (memoryPressure)
 partition.storeToFileSystem()

for (partition : remaining in file system partitions)
  partition.loadFromFileSystem()
  partition.computeAndGenerateOutgoingMessages()
  if (memoryPressure)
 partition.storeToFileSystem()

This should keep our partition cache as full as possible and have a minimal 
amount of loading/storing for partitions that can't fit in memory.
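
The two loops above could be sketched in Java roughly as follows. The Partition class here is a hypothetical stub that only tracks state, and memory pressure is passed in as a BooleanSupplier so the policy stays open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BooleanSupplier;

class OutOfCoreLoopSketch {
    /** Hypothetical partition stub; it records state instead of doing real work. */
    static final class Partition {
        final int id;
        boolean inMemory;
        boolean computed;

        Partition(int id, boolean inMemory) {
            this.id = id;
            this.inMemory = inMemory;
        }

        void computeAndGenerateOutgoingMessages() { computed = true; }
        void storeToFileSystem() { inMemory = false; }
        void loadFromFileSystem() { inMemory = true; }
    }

    static void runSuperstep(List<Partition> partitions, BooleanSupplier memoryPressure) {
        List<Partition> onDisk = new ArrayList<>();
        // First pass: partitions already in memory, evicted only under pressure.
        for (Partition partition : partitions) {
            if (!partition.inMemory) {
                onDisk.add(partition);
                continue;
            }
            partition.computeAndGenerateOutgoingMessages();
            if (memoryPressure.getAsBoolean()) {
                partition.storeToFileSystem();
            }
        }
        // Second pass: partitions that started the superstep on disk.
        for (Partition partition : onDisk) {
            partition.loadFromFileSystem();
            partition.computeAndGenerateOutgoingMessages();
            if (memoryPressure.getAsBoolean()) {
                partition.storeToFileSystem();
            }
        }
    }

    public static void main(String[] args) {
        List<Partition> partitions = Arrays.asList(
            new Partition(0, true), new Partition(1, false), new Partition(2, true));
        runSuperstep(partitions, () -> false);
        for (Partition partition : partitions) {
            System.out.println(partition.id + " computed=" + partition.computed
                + " inMemory=" + partition.inMemory);
        }
    }
}
```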

Here's how we can do the out-of-core messaging:

As the partitions are being processed by the workers, outgoing messages are kept 
in memory currently.  They are flushed if a message list grows to a certain 
size.  Otherwise, the messages are bulk sent at the end of the computation.

What we can do is wait for a sendMessageReq and check for memory pressure.  If 
memory pressure is an issue, then dump all the outgoing messages to HDFS files 
(i.e. <superstep>.<worker id>.<partition #>.outgoingMessages).  Future 
sendMessageReq may be kept in memory or dumped to the same HDFS files if memory 
pressure is an issue.  These HDFS files are closed prior to the flush.  During 
the flush, the worker sends the in-memory messages as normal to the 
destinations as well as the filenames of the out-of-core messages to their 
respective owners.  Note that the files are stored in HDFS to allow a remote 
worker the ability to load the messages as they see fit.  Maybe reduce the 
replication factor to 2 by default for these files?

This tactic should reduce memory usage on the destination worker as well, since 
the destination workers don't need to load the HDFS files until they are 
actually doing the computation for that partition.

Checkpoints should be able to point to the out-of-core data as well to reduce 
the amount of data to store.

Still, there is one more remaining piece (loading the graph).  This can also 
run out of memory.  Currently vertex lists are batched and sent to destination 
workers by partition.  Partitions should have the ability to be incrementally 
dumped to local files on the destination if there is memory pressure.  Then 
prior to the 1st superstep, each partition can be assembled (local files + any 
vertices still in memory) and can use the out-of-core graph algorithm indicated 
above.

This proposal should take advantage of large reads/writes so that we don't need 
a database.  I will require out-of-core storage in the very near future as the 
graph I need to load will have billions of edges and I probably won't have 
enough nodes and memory to keep it all in core.  Please let me know your 
thoughts on this approach.


 Improve the way to keep outgoing messages
 -

 Key: GIRAPH-45
 URL: https://issues.apache.org/jira/browse/GIRAPH-45
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Hyunsik Choi
Assignee: Hyunsik Choi

 As discussed in GIRAPH-12(http://goo.gl/CE32U), I think that there is a 
 potential problem to cause out of memory when the rate of message generation 
 is higher than the rate of message flush (or network bandwidth).
 To overcome this problem, we need more eager strategy for message flushing or 
 some approach to spill messages into disk.
 The below link is Dmitriy's suggestion.
 https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253





[jira] [Commented] (GIRAPH-103) Added properties for commonly used package version to pom.xml

2011-12-09 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13166430#comment-13166430
 ] 

Avery Ching commented on GIRAPH-103:


Also, I updated my affiliation to Facebook.

 Added properties for commonly used package version to pom.xml
 -

 Key: GIRAPH-103
 URL: https://issues.apache.org/jira/browse/GIRAPH-103
 Project: Giraph
  Issue Type: Improvement
  Components: build
Reporter: Avery Ching
Priority: Trivial
 Attachments: GIRAPH-103.diff








[jira] [Commented] (GIRAPH-10) Aggregators are not exported

2011-12-05 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162929#comment-13162929
 ] 

Avery Ching commented on GIRAPH-10:
---

Thanks for the revised diff.  I have some suggestions, let me know what you 
think.

First, you are missing 

org.apache.giraph.examples.SimpleAggregatorWriter in the diff.

Hence I am getting errors in my build:

[ERROR] 
/Users/aching/Avery/source/giraph_trunk/src/test/java/org/apache/giraph/TestBspBasic.java:[24,33]
 cannot find symbol
symbol  : class SimpleAggregatorWriter
location: package org.apache.giraph.examples
[ERROR] 
/Users/aching/Avery/source/giraph_trunk/src/test/java/org/apache/giraph/TestBspBasic.java:[355,37]
 cannot find symbol

Also, your IDE is using tabs.  The CODE_CONVENTIONS asks for spaces instead of 
tabs.  Can you please convert all your tabs to spaces?

In AggregatorWriter.java
- Quite a few tabs in this file, please change to spaces.
- Indentation issues lines: 60 and greater in AggregatorWriter.java
- line 52: The methods is called at the => This method is called at the
- line 60: map is a bit non-descriptive here.  Can you change it to something 
else, i.e. aggregatorNameValueMap or even just aggregatorMap?
- line 64: successfull => successful

TextAggregatorWriter.java
- line 44: aggreatos -> aggregators

TestBspBasic.java
- line 371:  for (i=0; ; i++) { =>  for (i = 0; ; i++) {



 Aggregators are not exported
 

 Key: GIRAPH-10
 URL: https://issues.apache.org/jira/browse/GIRAPH-10
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Claudio Martella
Priority: Minor
 Attachments: GIRAPH-10.diff, GIRAPH-10.diff


 Currently, aggregator values cannot be saved after a Giraph job.  There 
 should be a way to do this.





[jira] [Commented] (GIRAPH-10) Aggregators are not exported

2011-12-05 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163050#comment-13163050
 ] 

Avery Ching commented on GIRAPH-10:
---

Much improved Claudio.

A couple more minor suggestions:

Please add a javadoc comment for the class SimpleAggregatorWriter on what it 
does (given that it is in examples and users will be looking for help on what 
it is doing).

TestBspBasic.java

Before line 356: assertTrue(job.run(true));

Should add something like (as in other tests in that file):

Path outputPath = new Path("/tmp/" + getCallingMethodName());
removeAndSetOutput(job, outputPath);

If you don't do this, the next time someone adds another test to this dir and 
it doesn't set the output dir, it could potentially cause issues I think if 
they are relying on specific stuff in that dir.

By the way, good bug fix in the test.

If you agree and make those changes, this is an effective +1, please upload 
your final diff and feel free to commit. =)


 Aggregators are not exported
 

 Key: GIRAPH-10
 URL: https://issues.apache.org/jira/browse/GIRAPH-10
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Claudio Martella
Priority: Minor
 Attachments: GIRAPH-10.diff, GIRAPH-10.diff, GIRAPH-10.diff


 Currently, aggregator values cannot be saved after a Giraph job.  There 
 should be a way to do this.





[jira] [Commented] (GIRAPH-10) Aggregators are not exported

2011-12-04 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162640#comment-13162640
 ] 

Avery Ching commented on GIRAPH-10:
---

Hi Claudio, great stuff!

A couple of questions.

I downloaded GIRAPH-10.diff and think you are missing AggregatorWriter.java and 
TextAggregatorWriter.java from the diff.  Also, I was thinking, shouldn't the 
default be to not write any aggregator data?
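
One way to picture the "write nothing by default" idea is a pluggable hook whose default implementation is a no-op. The sketch below assumes a hypothetical AggregatorSink interface; the names and signatures are illustrative and are not those of the classes in GIRAPH-10.diff.

```java
import java.util.Map;

/** Hypothetical hook that receives the aggregator values after a superstep. */
interface AggregatorSink {
  void write(long superstep, Map<String, Object> aggregatorValues);
}

/** Default in this sketch: discard everything, so nothing is written unless a user opts in. */
final class NoOpAggregatorSink implements AggregatorSink {
  @Override
  public void write(long superstep, Map<String, Object> aggregatorValues) {
    // Intentionally empty.
  }
}

/** Opt-in sink that appends one "superstep<TAB>name=value" line per aggregator. */
final class TextAggregatorSink implements AggregatorSink {
  private final StringBuilder out = new StringBuilder();

  @Override
  public void write(long superstep, Map<String, Object> aggregatorValues) {
    for (Map.Entry<String, Object> e : aggregatorValues.entrySet()) {
      out.append(superstep).append('\t')
          .append(e.getKey()).append('=').append(e.getValue()).append('\n');
    }
  }

  /** Everything written so far, kept in memory for the purposes of the sketch. */
  String contents() {
    return out.toString();
  }
}
```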

 Aggregators are not exported
 

 Key: GIRAPH-10
 URL: https://issues.apache.org/jira/browse/GIRAPH-10
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Claudio Martella
Priority: Minor
 Attachments: GIRAPH-10.diff


 Currently, aggregator values cannot be saved after a Giraph job.  There 
 should be a way to do this.





[jira] [Commented] (GIRAPH-100) Data input sampling and testing improvements

2011-12-01 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13161031#comment-13161031
 ] 

Avery Ching commented on GIRAPH-100:


Anyone? =)

 Data input sampling and testing improvements
 

 Key: GIRAPH-100
 URL: https://issues.apache.org/jira/browse/GIRAPH-100
 Project: Giraph
  Issue Type: New Feature
  Components: graph
Reporter: Avery Ching
Assignee: Avery Ching
 Attachments: GIRAPH-100.patch


 It would be really nice to help debug an application by limiting the input 
 data (% of input splits, max vertices per input split).  Also, it would be 
 nice for the workers to provide a little more debugging info on how far along 
 they are with processing the input data.
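
 A minimal sketch of the two limits described above, under the assumption that the percentage is an integer in (0, 100] and that a negative vertex limit means "no limit". The class and method names are hypothetical and are not taken from GIRAPH-100.patch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical helpers for sampling input while debugging. */
final class InputSamplingSketch {
  /** How many of totalSplits to read for the given percentage; never rounds down to zero. */
  static int splitsToRead(int totalSplits, int percent) {
    if (percent <= 0 || percent > 100) {
      throw new IllegalArgumentException("percent must be in (0, 100]: " + percent);
    }
    if (totalSplits <= 0) {
      return 0;
    }
    long sampled = (long) totalSplits * percent / 100;
    return (int) Math.max(1L, sampled);
  }

  /** Reads at most maxVertices items from one split; a negative limit means no limit. */
  static <T> List<T> readSplit(Iterator<T> vertices, long maxVertices) {
    List<T> read = new ArrayList<>();
    while (vertices.hasNext() && (maxVertices < 0 || read.size() < maxVertices)) {
      read.add(vertices.next());
    }
    return read;
  }
}
```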





[jira] [Commented] (GIRAPH-100) Data input sampling and testing improvements

2011-12-01 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13161219#comment-13161219
 ] 

Avery Ching commented on GIRAPH-100:


Sorry Jakob, I'll try to stop doing formatting changes.  Habit, I suppose.  In 
the future, I'll file separate issues for formatting cleanup.

> What's the point of the changes in TextVertexInputFormat method visibility? 
> Are they related to this patch?

No, I can remove it.  Just a bit safer I guess since they should be protected.

> We're throwing a lot of Stringly typed exceptions. For more robust error 
> handling and recovery, it may be good to strongly type these instead.

Which exceptions are you referring to?

> re: SuperstepHashPartitionerFactory. Moving it out of test and into the 
> example directory seems a bit counterproductive to me. It's a pathological 
> implementation; wouldn't it be better to provide a more useful example, rather 
> than one that's explicitly not meant to be used?

Until we start packaging things into separate jars, the Hadoop unit test breaks 
when the SuperstepHashPartitionerFactory is not found.  The right 
solution might be to create another jar that has the unittest classes and can 
be run as part of the Hadoop instance unittest.  Can we do that in another 
issue?  I agree that it isn't a good example, but it's still a good test since 
it guarantees partition movement between workers.
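
As an illustration of how a partitioner can force partitions to move every superstep, here is a minimal sketch. It is an assumption about the general technique, not the actual SuperstepHashPartitionerFactory code, and the class and method names are hypothetical.

```java
/** Hypothetical assignment rule: a partition's owner shifts by one worker each superstep. */
final class SuperstepShiftingAssignment {
  /**
   * Returns the worker index owning partitionId during the given superstep.
   * With two or more workers, the owner is different in consecutive supersteps.
   */
  static int workerFor(int partitionId, long superstep, int workerCount) {
    if (workerCount <= 0) {
      throw new IllegalArgumentException("workerCount must be positive: " + workerCount);
    }
    return (int) Math.floorMod(partitionId + superstep, (long) workerCount);
  }
}
```

With a single worker nothing can move, so the guarantee only holds when at least two workers are available.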







[jira] [Commented] (GIRAPH-100) Data input sampling and testing improvements

2011-12-01 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13161243#comment-13161243
 ] 

Avery Ching commented on GIRAPH-100:


Ah, I see.  We should file another JIRA to create a GiraphException and the 
various types I suppose.  Or do you want me to do it in this JIRA?
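
A rough sketch of what strongly typed exceptions could look like. Only the name GiraphException comes from this thread; the subtype, its field, and the validator are hypothetical and serve only to show the shape.

```java
/** Base type proposed in the thread; making it unchecked is an assumption of this sketch. */
class GiraphException extends RuntimeException {
  GiraphException(String message) {
    super(message);
  }
}

/** Hypothetical subtype that carries the failing value as data instead of only as text. */
final class InvalidInputSplitException extends GiraphException {
  private final int splitIndex;

  InvalidInputSplitException(int splitIndex) {
    super("Invalid input split index: " + splitIndex);
    this.splitIndex = splitIndex;
  }

  int splitIndex() {
    return splitIndex;
  }
}

/** Hypothetical caller that throws the typed exception. */
final class SplitValidator {
  static void check(int splitIndex, int splitCount) {
    if (splitIndex < 0 || splitIndex >= splitCount) {
      throw new InvalidInputSplitException(splitIndex);
    }
  }
}
```

A handler can then catch InvalidInputSplitException and read splitIndex() directly rather than parsing a message string.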

I can put the SuperstepHashPartitionerFactory into another directory 
src/main/java/org/apache/giraph/integration/SuperstepHashPartitionerFactory.java

I like the idea of mocking in general, but not sure how mocking can verify the 
behavior in this case.  Probably leave it as an integration test only.  IMO, we 
should file a separate JIRA for separating the tests into unittests (mocking, 
individual class tests) and integration tests, but integration tests can still 
be run locally and/or remote.

Let me know what you think and I'll make the requested changes.






[jira] [Commented] (GIRAPH-99) Make AdjacencyListVertexReader and its constructor public

2011-11-22 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155334#comment-13155334
 ] 

Avery Ching commented on GIRAPH-99:
---

Thanks Kohei, glad to see you working on Giraph!

 Make AdjacencyListVertexReader and its constructor public
 -

 Key: GIRAPH-99
 URL: https://issues.apache.org/jira/browse/GIRAPH-99
 Project: Giraph
  Issue Type: Wish
  Components: lib
Reporter: Kohei Ozaki
Assignee: Kohei Ozaki
Priority: Minor
  Labels: patch
 Fix For: 0.70.0

 Attachments: GIRAPH-99.diff


 Hi,
 I'd like to write a class inherited from AdjacencyListVertexReader
 to make a library using Giraph (like git.io/ALVR),
 but AdjacencyListVertexReader is a private class and its constructor is 
 private.
 I guess making it public is useful to handle a more complex input format
 specified by the data structure of algorithms.
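
 A minimal sketch of why visibility matters here: a non-final reader with an accessible constructor can be extended by library code, while a private class cannot. Both classes below are hypothetical and are not Giraph's AdjacencyListVertexReader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/**
 * Hypothetical reader. In a real library it would also be declared public;
 * it is package-private here only so the sketch fits in one file.
 */
class AdjacencyReaderSketch {
  private final String delimiter;

  /** Accessible constructor: subclasses can call it. */
  protected AdjacencyReaderSketch(String delimiter) {
    this.delimiter = delimiter;
  }

  /** Splits "id<delim>neighbor<delim>..." into tokens. */
  public List<String> parse(String line) {
    return new ArrayList<>(Arrays.asList(line.split(Pattern.quote(delimiter))));
  }
}

/** Library-side subclass that normalizes case before delegating to the base parser. */
final class LowercasingReader extends AdjacencyReaderSketch {
  LowercasingReader() {
    super("\t");
  }

  @Override
  public List<String> parse(String line) {
    return super.parse(line.toLowerCase(Locale.ROOT));
  }
}
```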





[jira] [Commented] (GIRAPH-84) Simplify boolean expressions in BspRecordReader

2011-11-20 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-84?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153927#comment-13153927
 ] 

Avery Ching commented on GIRAPH-84:
---

Claudio, once a committer +1's something, they can commit it on behalf of the 
submitter.  If it's another committer that submits, then typically, after the 
+1, the submitter will commit.

 Simplify boolean expressions in BspRecordReader
 ---

 Key: GIRAPH-84
 URL: https://issues.apache.org/jira/browse/GIRAPH-84
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Shaunak Kashyap
  Labels: newbie

 Twice in BspRecordReader boolean expressions are evaluated with == and can be 
 simplified to just one liners or variable evaluation.
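
 For illustration, the kind of simplification being asked for looks roughly like this. The method names are hypothetical and the bodies are not copied from BspRecordReader.

```java
/** Hypothetical before/after pairs for boolean expressions compared with ==. */
final class BooleanSimplification {
  /** Verbose: compares a boolean with == and branches only to return a literal. */
  static boolean hasNextVerbose(boolean seenRecord) {
    if (seenRecord == false) {
      return true;
    } else {
      return false;
    }
  }

  /** Same result as a one-liner. */
  static boolean hasNextSimple(boolean seenRecord) {
    return !seenRecord;
  }

  /** Verbose: if/else on (flag == true) just to pick one of two values. */
  static float progressVerbose(boolean seenRecord) {
    if (seenRecord == true) {
      return 1f;
    } else {
      return 0f;
    }
  }

  /** Same result with the ternary operator. */
  static float progressTernary(boolean seenRecord) {
    return seenRecord ? 1f : 0f;
  }
}
```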





[jira] [Commented] (GIRAPH-96) Support for Graphs with Huge adjacency lists

2011-11-17 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-96?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13152203#comment-13152203
 ] 

Avery Ching commented on GIRAPH-96:
---

The general issue of overloading our memory and working out of core has been 
discussed a little in GIRAPH-45 as well.  I suppose you could implement a 
BasicVertex that loaded everything on demand from HBase, though I suspect it 
would be a little slow, depending on the application.

 Support for Graphs with Huge adjacency lists
 

 Key: GIRAPH-96
 URL: https://issues.apache.org/jira/browse/GIRAPH-96
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Affects Versions: 0.70.0
Reporter: Arun Suresh

 Currently the vertex initialize() method is passed the complete adjacency 
 list as a HashMap. All the current concrete implementations of Vertex iterate 
 over the adjacency list and recreate new Data Structures within the Vertex 
 instance to hold/manipulate the adjacency list. This would cease to be 
 feasible once the size of the adjacency list becomes really huge.
 I propose storing the adjacency list and all vertex information (and incoming 
 messages ?) in a distributed data store such as HBase. The adjacency list can 
 be lazily loaded via HBase Scans. I was thinking of an HBase schema where the 
 row Id is a concatenation of VertexID+OutboundVertexId with a single column 
 containing the edge.
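
 The proposed row-key layout can be sketched without HBase by letting a sorted in-memory map stand in for the table. Everything here is an assumption made for illustration: the class name, the use of a TreeMap, and in particular the NUL separator between the two ids, which the proposal does not mention but which keeps vertex "a" from matching rows of vertex "ab".

```java
import java.util.Iterator;
import java.util.NavigableMap;
import java.util.TreeMap;

/** In-memory stand-in for a sorted key-value table; one row per edge. */
final class EdgeStoreSketch {
  private static final char SEP = '\u0000'; // assumed separator; ids must not contain it
  private final NavigableMap<String, Double> rows = new TreeMap<>();

  /** Row key is sourceId + SEP + targetId; the single stored value is the edge value. */
  void putEdge(String source, String target, double value) {
    rows.put(source + SEP + target, value);
  }

  /** Walks one vertex's out-edge targets through a range view, without copying them. */
  Iterator<String> outEdges(String source) {
    String start = source + SEP;
    String stop = source + (char) (SEP + 1);
    Iterator<String> keys = rows.subMap(start, true, stop, false).keySet().iterator();
    return new Iterator<String>() {
      @Override
      public boolean hasNext() {
        return keys.hasNext();
      }

      @Override
      public String next() {
        // Strip the "source + SEP" prefix to recover the target id.
        return keys.next().substring(start.length());
      }
    };
  }
}
```

Because all rows of a vertex are contiguous in key order, reading its adjacency list is a single range scan that can be consumed one edge at a time.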





[jira] [Commented] (GIRAPH-84) Simplify boolean expressions in BspRecordReader

2011-11-17 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-84?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13152488#comment-13152488
 ] 

Avery Ching commented on GIRAPH-84:
---

Ternary is fine with me.  I think we use it in the codebase.  We should 
probably add it to the coding conventions...unless someone objects.

 Simplify boolean expressions in BspRecordReader
 ---

 Key: GIRAPH-84
 URL: https://issues.apache.org/jira/browse/GIRAPH-84
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Attila Csordas
  Labels: newbie

 Twice in BspRecordReader boolean expressions are evaluated with == and can be 
 simplified to just one liners or variable evaluation.





[jira] [Commented] (GIRAPH-68) Implement a Graph Generator

2011-11-16 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151405#comment-13151405
 ] 

Avery Ching commented on GIRAPH-68:
---


Looks good Hyunsik, a few comments.

Probably want to add a javadoc comment for GraphGenerator
Lines 40-41: Should have 8 space indenting
Line 46: needs 4 more spaces
Line 58: Over 80 chars

So is the idea that PageRankBenchmark and RandomMessageBenchmark would use it?  
Would you like to modify them to do so?

 Implement a Graph Generator
 ---

 Key: GIRAPH-68
 URL: https://issues.apache.org/jira/browse/GIRAPH-68
 Project: Giraph
  Issue Type: New Feature
  Components: benchmark
Affects Versions: 0.70.0
Reporter: Hyunsik Choi
Assignee: Hyunsik Choi
 Attachments: GIRAPH-68_1.patch


 To provide users with benchmark environments and to deeply test the 
 input/output system of giraph, we need a graph generator. We will enable the 
 graph generator to generate various kinds of graph data sets by specifying a 
 VertexInputFormat and a VertexOutputFormat.
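
 As a rough idea of the simplest possible generator, the sketch below emits one adjacency line per vertex with uniformly random targets. It is a hypothetical stand-alone example; it does not use VertexInputFormat or VertexOutputFormat and is not the code in GIRAPH-68_1.patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical generator producing "id<TAB>target<TAB>target..." lines. */
final class RandomGraphSketch {
  /** A fixed seed makes the generated graph repeatable across runs. */
  static List<String> generate(int vertices, int edgesPerVertex, long seed) {
    Random random = new Random(seed);
    List<String> lines = new ArrayList<>();
    for (int id = 0; id < vertices; id++) {
      StringBuilder line = new StringBuilder().append(id);
      for (int e = 0; e < edgesPerVertex; e++) {
        // Targets are drawn uniformly from [0, vertices); self-loops and repeats are possible.
        line.append('\t').append(random.nextInt(vertices));
      }
      lines.add(line.toString());
    }
    return lines;
  }
}
```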





[jira] [Commented] (GIRAPH-91) Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)

2011-11-16 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151603#comment-13151603
 ] 

Avery Ching commented on GIRAPH-91:
---

By the way, rb allows you to download the diff directly (so you don't have to 
worry about them staying in sync).

https://reviews.apache.org/r/2868/diff/raw/

 Large-memory improvements (Memory reduced vertex implementation, fast 
 failure, added settings) 
 ---

 Key: GIRAPH-91
 URL: https://issues.apache.org/jira/browse/GIRAPH-91
 Project: Giraph
  Issue Type: Improvement
Reporter: Avery Ching
Assignee: Avery Ching
 Attachments: GIRAPH-91.diff


 Current vertex implementation uses a HashMap for storing the edges, which is 
 quite memory heavy for large graphs.  The default settings in Giraph need to 
 be improved for large graphs and heaps of 20G.
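
 To show where the memory goes: a HashMap<Long, Float> boxes every key and value and allocates an entry object per edge, whereas two parallel primitive arrays store only the raw numbers. The sketch below is a hypothetical illustration of that trade-off (long ids, float edge values, lookups by binary search) and is not the vertex implementation in GIRAPH-91.diff.

```java
import java.util.Arrays;

/** Out-edges kept in two parallel primitive arrays, sorted by target id. */
final class ArrayEdgeList {
  private final long[] targets;
  private final float[] values;

  ArrayEdgeList(long[] targetIds, float[] edgeValues) {
    if (targetIds.length != edgeValues.length) {
      throw new IllegalArgumentException("ids and values must have the same length");
    }
    targets = targetIds.clone();
    values = edgeValues.clone();
    // Insertion sort that moves both arrays together so pairs stay aligned.
    for (int i = 1; i < targets.length; i++) {
      long t = targets[i];
      float v = values[i];
      int j = i - 1;
      while (j >= 0 && targets[j] > t) {
        targets[j + 1] = targets[j];
        values[j + 1] = values[j];
        j--;
      }
      targets[j + 1] = t;
      values[j + 1] = v;
    }
  }

  int size() {
    return targets.length;
  }

  boolean hasEdge(long target) {
    return Arrays.binarySearch(targets, target) >= 0;
  }

  /** Returns the edge value for target, or defaultValue when there is no such edge. */
  float edgeValue(long target, float defaultValue) {
    int i = Arrays.binarySearch(targets, target);
    return i >= 0 ? values[i] : defaultValue;
  }
}
```

The cost is slower mutation: adding or removing an edge means rebuilding or shifting the arrays, which is why this layout suits graphs whose edges rarely change.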





[jira] [Commented] (GIRAPH-92) Need outputformat for just vertex ID and value

2011-11-16 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151736#comment-13151736
 ] 

Avery Ching commented on GIRAPH-92:
---

I agree this could be useful.

Couple of format errors:

if (expr)

+  if(reverseOutput) {

typo

+  public void testWithDifferentDelimieter()  throws IOException,

Interrupted needs one more space

+  public void testWithDifferentDelimieter()  throws IOException,
+ InterruptedException {
+Configuration conf = new Configuration();

Extra line break

+writer.writeVertex(vertex);
+
+
+verify(tw).write(expected, null);

 Need outputformat for just vertex ID and value
 --

 Key: GIRAPH-92
 URL: https://issues.apache.org/jira/browse/GIRAPH-92
 Project: Giraph
  Issue Type: New Feature
  Components: lib
Affects Versions: 0.70.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.70.0

 Attachments: GIRAPH-92.patch


 We should have a text outputformat that just spits out the vertex id and 
 value without its edges:
 {noformat}index.html 0.9423{noformat}
 This would be particularly helpful for further processing by, for instance, 
 Pig.
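
 A minimal sketch of the line format requested above. The delimiter and reverseOutput parameters echo names that appear in the patch review; the class and method are hypothetical and are not the code in GIRAPH-92.patch.

```java
/** Hypothetical formatter for one "id value" output line, with no edges. */
final class IdValueLineFormat {
  /** Emits id then value, or value then id when reverseOutput is set. */
  static String format(String id, String value, String delimiter, boolean reverseOutput) {
    if (reverseOutput) {
      return value + delimiter + id;
    }
    return id + delimiter + value;
  }
}
```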





[jira] [Commented] (GIRAPH-78) Be smarter about multiple instances of the same vertex

2011-11-16 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151782#comment-13151782
 ] 

Avery Ching commented on GIRAPH-78:
---

Actually the more I think about it, this might not be too useful unless you 
have large vertexId objects.  I guess the idea would be to keep a cache, maybe 
in the GraphState or the WorkerContext.

 Be smarter about multiple instances of the same vertex
 --

 Key: GIRAPH-78
 URL: https://issues.apache.org/jira/browse/GIRAPH-78
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan

 In a graph such as 
 {noformat}a - b, z
 b - c, z
 c - a, z
 ...
 z{noformat}
 where vertices a, b, and c are hosted on one worker and z is hosted on another, 
 it would be good to cache instances of z so a,b,c all point at the same 
 instance, rather than generating multiple copies of the same remote vertex 
 during vertex reading.  This is less important with primitive types and the 
 recent work done there, but very useful for more complex types.  Since the 
 vertex readers are in userland, it would be good to provide these facilities 
 as a library that implementing users can access.
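
 The caching idea can be sketched as a small interning map: the first instance seen for a given id becomes the canonical one, and later equal instances are replaced by it. The class below is a hypothetical library helper, not an existing Giraph API, and it assumes the id type has consistent equals() and hashCode().

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-worker cache that makes equal ids share a single object. */
final class InstanceCache<T> {
  private final Map<T, T> canonical = new HashMap<>();

  /** Returns the first instance seen that equals() the candidate, or the candidate itself. */
  T intern(T candidate) {
    T existing = canonical.putIfAbsent(candidate, candidate);
    return existing != null ? existing : candidate;
  }

  /** Number of distinct ids held by the cache. */
  int size() {
    return canonical.size();
  }
}
```

A vertex reader would pass each freshly parsed id for z through intern() so that a, b, and c all end up holding the same instance.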




