[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats
[ https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257343#comment-13257343 ]

Avery Ching commented on GIRAPH-153:

Okay, so it's still tonight (even though it is 12:44 AM). =) Brian, I've done an initial look at the code on reviewboard https://reviews.apache.org/r/4801/. Please take a look. Thanks.

HBase/Accumulo Input and Output formats

Key: GIRAPH-153
URL: https://issues.apache.org/jira/browse/GIRAPH-153
Project: Giraph
Issue Type: New Feature
Components: bsp
Affects Versions: 0.1.0
Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano
Attachments: GIRAPH-153.patch

Four abstract classes that wrap their respective delegate input/output formats for easy hooks into vertex input format subclasses. I've included some sample programs that show two very simple graph algorithms. I have a graph generator that builds out a very simple directed structure, starting with a few 'root' nodes. Root nodes are defined as nodes which are not listed as a child anywhere in the graph.

Algorithm 1) AccumuloRootMarker.java -- Accumulo as read/write source. Every vertex starts thinking it's a root. At superstep 0, send a message down to each child as a non-root notification. After superstep 1, only root nodes will have never been messaged.

Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by bundling the notification logic followed by root node propagation. Once we've marked the appropriate nodes as roots, tell every child which roots it can be traced back to via one or more spanning trees. This will take N + 2 supersteps where N is the maximum number of hops from any root to any leaf, plus 2 supersteps for the initial root flagging.

I've included all relevant code plus DistributedCacheHelper.java for recursive cache file and archive searches. It is more hadoop centric than giraph, but these jobs use it so I figured why not commit here.

These have been tested through local JobRunner, pseudo-distributed on the aforementioned hardware, and full distributed on EC2. More details in the comments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
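The two algorithms described in the GIRAPH-153 summary can be illustrated without Giraph, HBase, or Accumulo. The following is only a sketch under assumptions: the class `RootMarkerSketch`, its methods, and the sample graph are hypothetical and are not taken from the attached patch, and supersteps are simulated with plain in-memory maps rather than Giraph messaging.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, Giraph-free simulation of the root-marking idea; not the patch's code. */
class RootMarkerSketch {

  /** Algorithm 1: returns the vertices that never receive a non-root notification. */
  static Set<String> findRoots(Map<String, List<String>> children) {
    // Collect every vertex id, including ones that only ever appear as a child.
    Set<String> vertices = new TreeSet<>(children.keySet());
    for (List<String> kids : children.values()) {
      vertices.addAll(kids);
    }
    // Superstep 0: every vertex assumes it is a root and messages each child.
    Set<String> messaged = new HashSet<>();
    for (String v : vertices) {
      messaged.addAll(children.getOrDefault(v, Collections.emptyList()));
    }
    // Superstep 1: any vertex that received a message clears its root flag.
    Set<String> roots = new TreeSet<>(vertices);
    roots.removeAll(messaged);
    return roots;
  }

  /** Algorithm 2: tells every vertex which roots it can be traced back to, one hop per round. */
  static Map<String, Set<String>> propagateRoots(Map<String, List<String>> children) {
    Map<String, Set<String>> tracedTo = new HashMap<>();
    Map<String, Set<String>> inbox = new HashMap<>();
    for (String root : findRoots(children)) {
      inbox.computeIfAbsent(root, k -> new TreeSet<>()).add(root);
    }
    while (!inbox.isEmpty()) {
      Map<String, Set<String>> next = new HashMap<>();
      for (Map.Entry<String, Set<String>> e : inbox.entrySet()) {
        Set<String> known = tracedTo.computeIfAbsent(e.getKey(), k -> new TreeSet<>());
        Set<String> fresh = new TreeSet<>(e.getValue());
        fresh.removeAll(known);
        if (fresh.isEmpty()) {
          continue; // nothing new learned, so this vertex stays silent
        }
        known.addAll(fresh);
        // Forward only newly learned root ids to each child.
        for (String child : children.getOrDefault(e.getKey(), Collections.emptyList())) {
          next.computeIfAbsent(child, k -> new TreeSet<>()).addAll(fresh);
        }
      }
      inbox = next;
    }
    return tracedTo;
  }

  /** Assumed sample graph: a -> b, a -> c, b -> d, e -> d. */
  static Map<String, List<String>> sampleGraph() {
    Map<String, List<String>> g = new HashMap<>();
    g.put("a", Arrays.asList("b", "c"));
    g.put("b", Arrays.asList("d"));
    g.put("e", Arrays.asList("d"));
    return g;
  }

  public static void main(String[] args) {
    System.out.println(findRoots(sampleGraph())); // prints [a, e]
    System.out.println(propagateRoots(sampleGraph()).get("d"));
  }
}
```

In this sketch a vertex forwards only root ids it has not seen before, so the loop also terminates if the input happens to contain a cycle; whether the patch handles that case the same way is not stated in the issue.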
[jira] [Commented] (GIRAPH-180) Publish SNAPSHOTs and released artifacts in the Maven repository
[ https://issues.apache.org/jira/browse/GIRAPH-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256246#comment-13256246 ]

Avery Ching commented on GIRAPH-180:

This is a good idea. The only question I would have though is would we publish different jars for every version of hadoop?

Publish SNAPSHOTs and released artifacts in the Maven repository

Key: GIRAPH-180
URL: https://issues.apache.org/jira/browse/GIRAPH-180
Project: Giraph
Issue Type: Improvement
Components: build
Affects Versions: 0.1.0
Reporter: Paolo Castagna
Priority: Minor
Original Estimate: 4h
Remaining Estimate: 4h

Currently Giraph uses Maven to drive its build. However, no Maven artifacts nor SNAPSHOTs are published in the Apache Maven repository or Maven central. It would be useful to have Apache Giraph artifacts and SNAPSHOTs published and enable people to use Giraph without recompiling themselves. Right now users can checkout Giraph, mvn install it and use this for their dependency:

    <dependency>
      <groupId>org.apache.giraph</groupId>
      <artifactId>giraph</artifactId>
      <version>0.2-SNAPSHOT</version>
    </dependency>

So, it's not that bad, but it can be better. :-)
[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats
[ https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256767#comment-13256767 ]

Avery Ching commented on GIRAPH-153:

I think hosting the submodule on github would produce one more barrier to entry. I prefer to have it with Giraph directly. Anyone else?
[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats
[ https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256856#comment-13256856 ]

Avery Ching commented on GIRAPH-153:

I'll take a look at this patch tonight Brian. =)
[jira] [Commented] (GIRAPH-181) Add Hadoop 1.0 profile to pom.xml
[ https://issues.apache.org/jira/browse/GIRAPH-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254890#comment-13254890 ]

Avery Ching commented on GIRAPH-181:

+1, committed.

Add Hadoop 1.0 profile to pom.xml

Key: GIRAPH-181
URL: https://issues.apache.org/jira/browse/GIRAPH-181
Project: Giraph
Issue Type: Improvement
Components: build
Affects Versions: 0.2.0
Reporter: Eugene Koontz
Assignee: Eugene Koontz
Fix For: 0.2.0
Attachments: GIRAPH-181.patch, GIRAPH-181.patch

Hadoop 1.0.x is now considered the current stable version of Hadoop, according to http://hadoop.apache.org/common/releases.html#Download . This JIRA is to add support within Giraph's maven profile for the 1.0.x Hadoop release.
[jira] [Commented] (GIRAPH-184) Upgrade to junit4
[ https://issues.apache.org/jira/browse/GIRAPH-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254152#comment-13254152 ]

Avery Ching commented on GIRAPH-184:

Thanks!

Upgrade to junit4

Key: GIRAPH-184
URL: https://issues.apache.org/jira/browse/GIRAPH-184
Project: Giraph
Issue Type: Bug
Reporter: Devaraj K
Assignee: Devaraj K

Presently Giraph uses JUnit 3.8.1. We can upgrade to JUnit 4.
[jira] [Commented] (GIRAPH-183) Add Claudio's FOSDEM presentation (slides and video) to the site
[ https://issues.apache.org/jira/browse/GIRAPH-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253488#comment-13253488 ]

Avery Ching commented on GIRAPH-183:

+1. This is great stuff Claudio.

Add Claudio's FOSDEM presentation (slides and video) to the site

Key: GIRAPH-183
URL: https://issues.apache.org/jira/browse/GIRAPH-183
Project: Giraph
Issue Type: Improvement
Components: site
Reporter: Claudio Martella
Assignee: Claudio Martella
Priority: Trivial
Labels: newbie
Attachments: GIRAPH-183.diff

Presentation: http://prezi.com/9ake_klzwrga/apache-giraph-distributed-graph-processing-in-the-cloud/
Video: http://www.youtube.com/watch?v=3ZrqPEIPRe4, http://www.youtube.com/watch?v=BmRaejKGeDM
[jira] [Commented] (GIRAPH-183) Add Claudio's FOSDEM presentation (slides and video) to the site
[ https://issues.apache.org/jira/browse/GIRAPH-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253655#comment-13253655 ]

Avery Ching commented on GIRAPH-183:

Are the problems related to GIRAPH-168?
[jira] [Commented] (GIRAPH-168) Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than HADOOP_FACEBOOK) and remove usage of HADOOP
[ https://issues.apache.org/jira/browse/GIRAPH-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250476#comment-13250476 ]

Avery Ching commented on GIRAPH-168:

Eugene, I committed your patch, which passed 'mvn verify'; however, it seems to have changed how the JUnit test report is produced. Here's the result after your patch (build 99):

    Recording test results
    No test report files were found. Configuration error?
    Build step 'Publish JUnit test result report' changed build result to FAILURE
    Updating GIRAPH-168
    Finished: FAILURE

https://builds.apache.org/job/Giraph-trunk-Commit/99/

The last commit seemed to have the JUnit test result reports just fine (https://builds.apache.org/job/Giraph-trunk-Commit/98/). Can you please take a look?

Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than HADOOP_FACEBOOK) and remove usage of HADOOP

Key: GIRAPH-168
URL: https://issues.apache.org/jira/browse/GIRAPH-168
Project: Giraph
Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Eugene Koontz
Assignee: Eugene Koontz
Attachments: GIRAPH-168.patch, GIRAPH-168.patch, GIRAPH-168.patch, GIRAPH-168.patch, GIRAPH-168.patch, GIRAPH-168.patch

This JIRA relates to the mail thread here: http://mail-archives.apache.org/mod_mbox/incubator-giraph-dev/201203.mbox/browser

Currently we check for the munge flags HADOOP, HADOOP_FACEBOOK and HADOOP_NON_SECURE when using munge in a few places. Hopefully we can eliminate usage of munge in the future, but until then, we can mitigate the complexity by consolidating the number of flags checked. This JIRA renames HADOOP_FACEBOOK to HADOOP_SECURE, and removes usages of HADOOP, to handle the same conditional compilation requirements. It also makes it easier to add more maven profiles so that we can easily increase our hadoop version coverage.

This patch modifies the existing hadoop_facebook profile to use the new HADOOP_SECURE munge flag, rather than HADOOP_FACEBOOK. It also adds a new hadoop maven profile, hadoop_trunk, which also sets HADOOP_SECURE. Finally, it adds a default profile, hadoop_0.20.203. This is needed so that we can specify its dependencies separately from hadoop_trunk, because the hadoop dependencies have changed between trunk and 0.205.0 - the former requires hadoop-common, hadoop-mapreduce-client-core, and hadoop-mapreduce-client-common, whereas the latter requires hadoop-core.

With this patch, the following passes:

{code}
mvn clean verify
mvn -Phadoop_trunk clean verify
mvn -Phadoop_0.20.203 clean verify
{code}

Current problems:

* I left in place the usage of HADOOP_NON_SECURE, but note that the profile that uses this is hadoop_non_secure, which fails to compile on trunk: https://issues.apache.org/jira/browse/GIRAPH-167 .
* I couldn't get -Phadoop_facebook to work; does this work outside of Facebook?
[jira] [Commented] (GIRAPH-168) Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than HADOOP_FACEBOOK) and remove usage of HADOOP
[ https://issues.apache.org/jira/browse/GIRAPH-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250874#comment-13250874 ]

Avery Ching commented on GIRAPH-168:

I can modify Hudson to execute the commands you used above. Any thoughts/comments?
[jira] [Commented] (GIRAPH-168) Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than HADOOP_FACEBOOK) and remove usage of HADOOP
[ https://issues.apache.org/jira/browse/GIRAPH-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250947#comment-13250947 ]

Avery Ching commented on GIRAPH-168:

I would ignore the facebook one for now (we can add it later), but I can try

    mvn -Phadoop_non_secure clean verify
    mvn -Phadoop_0.20.203 clean verify
    mvn clean verify
    mvn -Phadoop_0.23 clean verify
    mvn -Phadoop_trunk clean verify
[jira] [Commented] (GIRAPH-182) Provide SequenceFileVertexOutputFormat as an available OutputFormat
[ https://issues.apache.org/jira/browse/GIRAPH-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251180#comment-13251180 ]

Avery Ching commented on GIRAPH-182:

Agreed, would you like to work on it Pradeep?

Provide SequenceFileVertexOutputFormat as an available OutputFormat

Key: GIRAPH-182
URL: https://issues.apache.org/jira/browse/GIRAPH-182
Project: Giraph
Issue Type: New Feature
Components: lib
Reporter: Pradeep Gollakota
Priority: Minor

SequenceFiles are heavily used in Hadoop. We should provide SequenceFileVertexOutputFormat. Since SequenceFileVertexInputFormat is already provided, it makes sense to also provide a mirroring OutputFormat.
[jira] [Commented] (GIRAPH-168) Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than HADOOP_FACEBOOK) and remove usage of HADOOP
[ https://issues.apache.org/jira/browse/GIRAPH-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249702#comment-13249702 ]

Avery Ching commented on GIRAPH-168:

Nice that you got it working with all the versions! One question though, why is the line below needed in pom.xml?

    <org.apache.hadoop.giraph.zkJar>giraph-0.2-SNAPSHOT-jar-with-dependencies.jar</org.apache.hadoop.giraph.zkJar>
[jira] [Commented] (GIRAPH-168) Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than HADOOP_FACEBOOK) and remove usage of HADOOP
[ https://issues.apache.org/jira/browse/GIRAPH-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250224#comment-13250224 ]

Avery Ching commented on GIRAPH-168:

+1. Given this is a somewhat large change, I'll wait until tonight to see if anyone opposes it. If not, I'll commit.
[jira] [Commented] (GIRAPH-171) total time in MasterThread.run() is calculated incorrectly
[ https://issues.apache.org/jira/browse/GIRAPH-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248175#comment-13248175 ]

Avery Ching commented on GIRAPH-171:

+1 Argh, it is inconsistent with the counter, GIRAPH_TIMERS_COUNTER_GROUP_NAME. Thanks for the fix Eugene!

total time in MasterThread.run() is calculated incorrectly

Key: GIRAPH-171
URL: https://issues.apache.org/jira/browse/GIRAPH-171
Project: Giraph
Issue Type: Bug
Reporter: Eugene Koontz
Assignee: Eugene Koontz
Attachments: GIRAPH-171.patch

While running PageRankBenchmark, I was seeing in the output:

{{graph.MasterThread(172): total: Took 1.3336739262910001E9 seconds.}}

This was because currently, in {{MasterThread.run()}}, we have:

{code}
LOG.info("total: Took " +
    ((System.currentTimeMillis() / 1000.0d) - setupSecs) +
    " seconds.");
{code}

but it should be:

{code}
LOG.info("total: Took " +
    ((System.currentTimeMillis() - startMillis) / 1000.0d) +
    " seconds.");
{code}
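The arithmetic behind GIRAPH-171 can be reproduced in isolation. The sketch below is not Giraph code: the class `ElapsedTimeSketch`, its method names, and the sample timestamps are hypothetical, and it assumes that `setupSecs` holds a duration in seconds while `startMillis` holds a wall-clock timestamp. Under that assumption the buggy expression subtracts a short duration from "seconds since the epoch", which matches the reported value of about 1.33E9 (a timestamp in early April 2012).

```java
/** Hypothetical stand-alone illustration of the GIRAPH-171 arithmetic; not Giraph code. */
class ElapsedTimeSketch {

  /** Buggy form: subtracts a duration from the current time expressed in epoch seconds. */
  static double buggyTotalSecs(long nowMillis, double setupSecs) {
    return (nowMillis / 1000.0d) - setupSecs;
  }

  /** Fixed form: difference of two timestamps from the same clock, then converted to seconds. */
  static double fixedTotalSecs(long nowMillis, long startMillis) {
    return (nowMillis - startMillis) / 1000.0d;
  }

  public static void main(String[] args) {
    long startMillis = 1_333_673_900_000L;   // assumed job start, ms since the epoch
    long nowMillis = startMillis + 26_291L;  // assumed "now", roughly 26 s later
    double setupSecs = 0.5d;                 // assumed setup duration in seconds

    // The buggy value is dominated by the epoch timestamp, not by the job's run time.
    System.out.println("buggy: " + buggyTotalSecs(nowMillis, setupSecs) + " seconds");
    System.out.println("fixed: " + fixedTotalSecs(nowMillis, startMillis) + " seconds");
  }
}
```

Passing the timestamps in as parameters, rather than calling `System.currentTimeMillis()` inside the helpers, is only a choice made here so the two formulas can be compared on fixed inputs.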
[jira] [Commented] (GIRAPH-77) Coordinator should expose a web interface with progress, vertex region assignments, etc.
[ https://issues.apache.org/jira/browse/GIRAPH-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246561#comment-13246561 ]

Avery Ching commented on GIRAPH-77:

Paolo, would you be interested in working on this? =)

Coordinator should expose a web interface with progress, vertex region assignments, etc.

Key: GIRAPH-77
URL: https://issues.apache.org/jira/browse/GIRAPH-77
Project: Giraph
Issue Type: New Feature
Reporter: Jakob Homan

It would be nice if the coordinator worker had a web interface that showed progress, splits, etc. during job execution. Right now it would duplicate information currently being exposed through task status, but with the move to YARN, it will be a necessity. It would be great if we could do this in a modern way to avoid the screen-scraping, etc. currently used to get information from most other Hadoop projects' web interfaces. The coordinator could announce its address at the beginning or via status updates.
[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats
[ https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245480#comment-13245480 ] Avery Ching commented on GIRAPH-153: From what you've described, sounds good to me. In the worst case, we can change it to a submodule if that makes more sense in the future. I would like to use a similar approach for https://issues.apache.org/jira/browse/GIRAPH-93, as Jakob mentioned. HBase/Accumulo Input and Output formats --- Key: GIRAPH-153 URL: https://issues.apache.org/jira/browse/GIRAPH-153 Project: Giraph Issue Type: New Feature Components: bsp Affects Versions: 0.1.0 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB Reporter: Brian Femiano Four abstract classes that wrap their respective delegate input/output formats for easy hooks into vertex input format subclasses. I've included some sample programs that show two very simple graph algorithms. I have a graph generator that builds out a very simple directed structure, starting with a few 'root' nodes. Root nodes are defined as nodes which are not listed as a child anywhere in the graph. Algorithm 1) AccumuloRootMarker.java -- Accumulo as read/write source. Every vertex starts thinking it's a root. At superstep 0, send a message down to each child as a non-root notification. After superstep 1, only root nodes will have never been messaged. Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by bundling the notification logic followed by root node propagation. Once we've marked the appropriate nodes as roots, tell every child which roots it can be traced back to via one or more spanning trees. This will take N + 2 supersteps where N is the maximum number of hops from any root to any leaf, plus 2 supersteps for the initial root flagging. I've included all relevant code plus DistributedCacheHelper.java for recursive cache file and archive searches. 
It is more hadoop centric than giraph, but these jobs use it so I figured why not commit here. These have been tested through local JobRunner, pseudo-distributed on the aforementioned hardware, and full distributed on EC2. More details in the comments. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
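The root-flagging idea in Algorithm 1 can be sketched without any Giraph or Accumulo dependency. The following is a plain-Java simulation under stated assumptions, not the code from the patch: the class name `RootMarkerSketch`, the method `findRoots`, and the adjacency-map representation are all hypothetical, and the two supersteps are modeled as two passes over the map.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-alone simulation; the real jobs run as Giraph vertices.
class RootMarkerSketch {
    /** children maps each vertex id to the ids of its children (directed edges). */
    static Set<String> findRoots(Map<String, List<String>> children) {
        // Every vertex starts out assuming it is a root.
        Set<String> roots = new TreeSet<>(children.keySet());
        for (List<String> kids : children.values()) {
            roots.addAll(kids);
        }

        // "Superstep 0": each vertex sends a non-root notification to each child.
        Set<String> messaged = new HashSet<>();
        for (List<String> kids : children.values()) {
            messaged.addAll(kids);
        }

        // "Superstep 1": any vertex that received a message clears its root flag,
        // so only vertices that were never messaged remain.
        roots.removeAll(messaged);
        return roots;
    }
}
```

For a graph with edges a->b, a->c, b->d and x->d, only a and x are never messaged, so they are the roots. The propagation step of Algorithm 2 is not covered here.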
[jira] [Commented] (GIRAPH-141) mulitgraph support in giraph
[ https://issues.apache.org/jira/browse/GIRAPH-141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245484#comment-13245484 ] Avery Ching commented on GIRAPH-141: Yes, I also think this is an important feature. Anyone want to work on it? =) mulitgraph support in giraph Key: GIRAPH-141 URL: https://issues.apache.org/jira/browse/GIRAPH-141 Project: Giraph Issue Type: Improvement Components: graph Reporter: André Kelpe The current vertex API only supports simple graphs, meaning that there can only ever be one edge between two vertices. Many graphs, like the road network, are in fact multigraphs, where many edges can connect two vertices at the same time. Support for this could be added by introducing an Iterator<EdgeWritable> getEdgeValue() or a similar construct. Maybe introducing a slim object like a Connector between the edge and the vertex is also a good idea, so that you could do something like:
{code}
for (final Connector<EdgeWritable, VertexWritable> conn : getEdgeValues()) {
  final EdgeWritable edge = conn.getEdge();
  final VertexWritable otherVertex = conn.getOther();
  doInterestingStuff(otherVertex);
  doMoreInterestingStuff(edge);
}
{code}
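To make the Connector proposal a bit more concrete, here is a minimal self-contained sketch. It is only an illustration of the idea in the issue: `Connector`, `Vertex`, `addEdge` and `countEdgesTo` are hypothetical names, none of them exist in the Giraph API, and plain Java types stand in for the Writable classes.

```java
import java.util.ArrayList;
import java.util.List;

class MultigraphSketch {
    /** Hypothetical pairing of one edge value with the vertex at its far end. */
    static final class Connector<E, V> {
        private final E edge;
        private final V other;

        Connector(E edge, V other) {
            this.edge = edge;
            this.other = other;
        }

        E getEdge() { return edge; }
        V getOther() { return other; }
    }

    /** Hypothetical vertex that stores a list of connectors, so parallel edges are allowed. */
    static final class Vertex<E, V> {
        private final List<Connector<E, V>> connectors = new ArrayList<>();

        void addEdge(E edge, V other) {
            connectors.add(new Connector<>(edge, other));
        }

        Iterable<Connector<E, V>> getEdgeValues() {
            return connectors;
        }
    }

    /** Counts how many edges lead from the vertex to the given target. */
    static <E, V> int countEdgesTo(Vertex<E, V> vertex, V target) {
        int count = 0;
        for (Connector<E, V> conn : vertex.getEdgeValues()) {
            if (conn.getOther().equals(target)) {
                count++;
            }
        }
        return count;
    }
}
```

Keeping the edges in a list rather than a map keyed by destination is what lets two roads connect the same pair of vertices.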
[jira] [Commented] (GIRAPH-169) How to close all child when a job finished?
[ https://issues.apache.org/jira/browse/GIRAPH-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245488#comment-13245488 ] Avery Ching commented on GIRAPH-169: This is a simple case. I'll try and see if I can replicate it sometime this week. Feel free to bug me if I forget. =) How to close all child when a job finished? --- Key: GIRAPH-169 URL: https://issues.apache.org/jira/browse/GIRAPH-169 Project: Giraph Issue Type: Improvement Components: mapreduce Affects Versions: 0.2.0 Environment: sles 11 x64,jdk 1.6,hadoop 0.20.205.0,1 Master and 8 slaves, Reporter: Jianfeng Qian Priority: Minor I ran PageRank on Hadoop 0.20.205.0. When the job finished, the child processes on the slaves didn't quit immediately, and sometimes they never quit and I had to kill them.
[jira] [Commented] (GIRAPH-169) How to close all child when a job finished?
[ https://issues.apache.org/jira/browse/GIRAPH-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13240234#comment-13240234 ] Avery Ching commented on GIRAPH-169: Looks like the worker log got cut off? Also, what version of Hadoop is this? Does it work with different numbers of workers? How to close all child when a job finished? --- Key: GIRAPH-169 URL: https://issues.apache.org/jira/browse/GIRAPH-169 Project: Giraph Issue Type: Improvement Components: mapreduce Affects Versions: 0.2.0 Environment: sles 11 x64,jdk 1.6,hadoop 0.20.205.0,1 Master and 8 slaves, Reporter: Jianfeng Qian Priority: Minor I ran PageRank on Hadoop 0.20.205.0. When the job finished, the child processes on the slaves didn't quit immediately, and sometimes they never quit and I had to kill them.
[jira] [Commented] (GIRAPH-169) How to close all child when a job finished?
[ https://issues.apache.org/jira/browse/GIRAPH-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13240112#comment-13240112 ] Avery Ching commented on GIRAPH-169: How many task trackers do you have? Are you seeing any errors? Is the job completing successfully? I'm guessing that the job isn't completing successfully, since everything should be cleaned up. How to close all child when a job finished? --- Key: GIRAPH-169 URL: https://issues.apache.org/jira/browse/GIRAPH-169 Project: Giraph Issue Type: Improvement Components: mapreduce Affects Versions: 0.2.0 Environment: sles 11 x64,jdk 1.6,hadoop 0.20.205.0,1 Master and 8 slaves, Reporter: Jianfeng Qian Priority: Minor I ran PageRank on Hadoop 0.20.205.0. When the job finished, the child processes on the slaves didn't quit immediately, and sometimes they never quit and I had to kill them.
[jira] [Commented] (GIRAPH-169) How to close all child when a job finished?
[ https://issues.apache.org/jira/browse/GIRAPH-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13240161#comment-13240161 ] Avery Ching commented on GIRAPH-169: Do you have the logs of the workers? I'd like to see why they can't exit. How to close all child when a job finished? --- Key: GIRAPH-169 URL: https://issues.apache.org/jira/browse/GIRAPH-169 Project: Giraph Issue Type: Improvement Components: mapreduce Affects Versions: 0.2.0 Environment: sles 11 x64,jdk 1.6,hadoop 0.20.205.0,1 Master and 8 slaves, Reporter: Jianfeng Qian Priority: Minor I ran PageRank on Hadoop 0.20.205.0. When the job finished, the child processes on the slaves didn't quit immediately, and sometimes they never quit and I had to kill them.
[jira] [Commented] (GIRAPH-159) Case insensitive file/directory name matching will produce errors on M/R jar unpack.
[ https://issues.apache.org/jira/browse/GIRAPH-159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237909#comment-13237909 ] Avery Ching commented on GIRAPH-159: +1. I left out your GiraphRunner changes since they were fixed by earlier JIRAs, but verified both the problem and the solution you proposed. Looks good! Thanks for the fix! Committing. Case insensitive file/directory name matching will produce errors on M/R jar unpack. - Key: GIRAPH-159 URL: https://issues.apache.org/jira/browse/GIRAPH-159 Project: Giraph Issue Type: Bug Components: build Affects Versions: 0.2.0 Environment: OSX 10.6.8 Reporter: Brian Femiano Attachments: GIRAPH-159.patch, compile.xml This only seems to affect platforms where file/directory naming conflicts can arise from case-insensitive matches. I was able to reproduce it by running the pseudo-distributed unit tests on OSX. This has affected other projects: https://issues.apache.org/jira/browse/MAHOUT-780 I've been able to reproduce this on my local OSX install with the following error: https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/a201218000e956d3/cc6eca3ef9f80ff8 Since LICENSE.txt contains the same content as the file LICENSE, I propose we exclude any LICENSE matches found in the unpacked dependency jars when the maven assembly phase hits 'jar-with-dependencies'. I have a patch which moves the 'jar-with-dependencies' descriptor to an external compile.xml file which has the proper excludes. This might also come in handy down the road should any additional tweaks be needed to the compile phase.
[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats
[ https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237951#comment-13237951 ] Avery Ching commented on GIRAPH-153: 34 MB is huge. Can we do something like make the dependency scope provided and then use the distributed cache for unittests? HBase/Accumulo Input and Output formats --- Key: GIRAPH-153 URL: https://issues.apache.org/jira/browse/GIRAPH-153 Project: Giraph Issue Type: New Feature Components: bsp Affects Versions: 0.1.0 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB Reporter: Brian Femiano Four abstract classes that wrap their respective delegate input/output formats for easy hooks into vertex input format subclasses. I've included some sample programs that show two very simple graph algorithms. I have a graph generator that builds out a very simple directed structure, starting with a few 'root' nodes. Root nodes are defined as nodes which are not listed as a child anywhere in the graph. Algorithm 1) AccumuloRootMarker.java -- Accumulo as read/write source. Every vertex starts thinking it's a root. At superstep 0, send a message down to each child as a non-root notification. After superstep 1, only root nodes will have never been messaged. Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by bundling the notification logic followed by root node propagation. Once we've marked the appropriate nodes as roots, tell every child which roots it can be traced back to via one or more spanning trees. This will take N + 2 supersteps where N is the maximum number of hops from any root to any leaf, plus 2 supersteps for the initial root flagging. I've included all relevant code plus DistributedCacheHelper.java for recursive cache file and archive searches. It is more hadoop centric than giraph, but these jobs use it so I figured why not commit here. 
These have been tested through local JobRunner, pseudo-distributed on the aforementioned hardware, and full distributed on EC2. More details in the comments.
[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats
[ https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237468#comment-13237468 ] Avery Ching commented on GIRAPH-153: Brian, could you make it a single patch for us to take a look at? I'm excited to see this work. HBase/Accumulo Input and Output formats --- Key: GIRAPH-153 URL: https://issues.apache.org/jira/browse/GIRAPH-153 Project: Giraph Issue Type: New Feature Components: bsp Affects Versions: 0.1.0 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB Reporter: Brian Femiano Four abstract classes that wrap their respective delegate input/output formats for easy hooks into vertex input format subclasses. I've included some sample programs that show two very simple graph algorithms. I have a graph generator that builds out a very simple directed structure, starting with a few 'root' nodes. Root nodes are defined as nodes which are not listed as a child anywhere in the graph. Algorithm 1) AccumuloRootMarker.java -- Accumulo as read/write source. Every vertex starts thinking it's a root. At superstep 0, send a message down to each child as a non-root notification. After superstep 1, only root nodes will have never been messaged. Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by bundling the notification logic followed by root node propagation. Once we've marked the appropriate nodes as roots, tell every child which roots it can be traced back to via one or more spanning trees. This will take N + 2 supersteps where N is the maximum number of hops from any root to any leaf, plus 2 supersteps for the initial root flagging. I've included all relevant code plus DistributedCacheHelper.java for recursive cache file and archive searches. It is more hadoop centric than giraph, but these jobs use it so I figured why not commit here. These have been tested through local JobRunner, pseudo-distributed on the aforementioned hardware, and full distributed on EC2. 
More details in the comments.
[jira] [Commented] (GIRAPH-144) GiraphJob should not extend Job (users should not be able to call Job methods like waitForCompletion or setMapper..etc)
[ https://issues.apache.org/jira/browse/GIRAPH-144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237597#comment-13237597 ] Avery Ching commented on GIRAPH-144: Ping, anyone? I'd like to close this out, one way or another. GiraphJob should not extend Job (users should not be able to call Job methods like waitForCompletion or setMapper..etc) Key: GIRAPH-144 URL: https://issues.apache.org/jira/browse/GIRAPH-144 Project: Giraph Issue Type: Bug Reporter: Dave Assignee: Avery Ching Attachments: GIRAPH-144.patch Original Estimate: 24h Remaining Estimate: 24h
[jira] [Commented] (GIRAPH-167) mvn -Phadoop_non_secure clean verify fails
[ https://issues.apache.org/jira/browse/GIRAPH-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13236414#comment-13236414 ] Avery Ching commented on GIRAPH-167: +1, committed, thanks for fixing this. mvn -Phadoop_non_secure clean verify fails -- Key: GIRAPH-167 URL: https://issues.apache.org/jira/browse/GIRAPH-167 Project: Giraph Issue Type: Bug Affects Versions: 0.2.0 Reporter: Eugene Koontz Assignee: Eugene Koontz Labels: build, hadoop Attachments: GIRAPH-167.patch The {{hadoop_non_secure}} profile, which uses hadoop 0.20.2, is failing to compile:
{code}
[ERROR] COMPILATION ERROR :
[INFO] -
[ERROR] /Users/ekoontz/giraph/target/munged/main/org/apache/giraph/comm/RPCCommunications.java:[184,48] cannot find symbol
symbol : variable versionID
location: class org.apache.giraph.comm.RPCCommunications<I,V,E,M>
[INFO] 1 error
{code}
[jira] [Commented] (GIRAPH-161) Handling null messages and edges when initializing IntIntNullIntVertex
[ https://issues.apache.org/jira/browse/GIRAPH-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13234139#comment-13234139 ] Avery Ching commented on GIRAPH-161: +1. There are 5 checkstyle violations from GIRAPH-156, but this isn't the cause. Committing, thanks Dionysios! Handling null messages and edges when initializing IntIntNullIntVertex -- Key: GIRAPH-161 URL: https://issues.apache.org/jira/browse/GIRAPH-161 Project: Giraph Issue Type: Bug Components: graph Affects Versions: 0.1.0 Reporter: Dionysios Logothetis Attachments: GIRAPH-161.patch The initialize() method in org.apache.giraph.graph.IntIntNullIntVertex should handle null messages or null edges. Especially initializing with null messages is a common case.
[jira] [Commented] (GIRAPH-162) BspCase.setup() should catch FileNotFoundException thrown from org.apache.hadoop.fs.FileSystem.listStatus()
[ https://issues.apache.org/jira/browse/GIRAPH-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13234164#comment-13234164 ] Avery Ching commented on GIRAPH-162: Looks good. +1. I'm committing. BspCase.setup() should catch FileNotFoundException thrown from org.apache.hadoop.fs.FileSystem.listStatus() --- Key: GIRAPH-162 URL: https://issues.apache.org/jira/browse/GIRAPH-162 Project: Giraph Issue Type: Bug Components: test Affects Versions: 0.2.0 Reporter: Eugene Koontz Fix For: 0.2.0 Attachments: GIRAPH-162.patch In hadoop trunk, org.apache.hadoop.fs.FileSystem.listStatus() is declared to throw both FileNotFoundException and IOException. The former (FileNotFoundException) is currently not caught when BspCase.setup() looks for the GiraphJob.ZOOKEEPER_MANAGER_DIR_DEFAULT directory in order to delete it. The listStatus() call throws FileNotFoundException if this directory does not exist, which causes several tests to fail when using Hadoop trunk. This exception should be caught and ignored during setup(), since it's not an error for this directory not to exist.
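The catch-and-ignore behaviour described in GIRAPH-162 might look roughly like the sketch below. To stay self-contained it does not use the Hadoop API: the `Lister` interface is a hypothetical stand-in for the single listStatus() call, and `countEntriesToDelete` is an invented helper, not the code in BspCase.setup() or in the attached patch.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

class ListStatusSketch {
    /** Hypothetical stand-in for the one file system call that matters here. */
    interface Lister {
        String[] listStatus(String dir) throws FileNotFoundException, IOException;
    }

    /** Returns how many entries would need deleting; a missing directory counts as zero. */
    static int countEntriesToDelete(Lister fs, String dir) throws IOException {
        try {
            return fs.listStatus(dir).length;
        } catch (FileNotFoundException e) {
            // A missing directory is not an error during setup: nothing to clean up.
            return 0;
        }
    }
}
```

Only FileNotFoundException is swallowed; any other IOException still propagates, so real I/O failures continue to fail the test setup.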
[jira] [Commented] (GIRAPH-159) Case insensitive file/directory name matching will produce errors on M/R jar unpack.
[ https://issues.apache.org/jira/browse/GIRAPH-159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13234177#comment-13234177 ] Avery Ching commented on GIRAPH-159: Brian, can you show me how to recreate this issue on OSX? Case insensitive file/directory name matching will produce errors on M/R jar unpack. - Key: GIRAPH-159 URL: https://issues.apache.org/jira/browse/GIRAPH-159 Project: Giraph Issue Type: Bug Components: build Affects Versions: 0.2.0 Environment: OSX 10.6.8 Reporter: Brian Femiano Priority: Minor Attachments: GIRAPH-159.patch, compile.xml This only seems to affect platforms where file/directory naming conflicts can arise from case-insensitive matches. I was able to reproduce it by running the pseudo-distributed unit tests on OSX. This has affected other projects: https://issues.apache.org/jira/browse/MAHOUT-780 I've been able to reproduce this on my local OSX install with the following error: https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/a201218000e956d3/cc6eca3ef9f80ff8 Since LICENSE.txt contains the same content as the file LICENSE, I propose we exclude any LICENSE matches found in the unpacked dependency jars when the maven assembly phase hits 'jar-with-dependencies'. I have a patch which moves the 'jar-with-dependencies' descriptor to an external compile.xml file which has the proper excludes. This might also come in handy down the road should any additional tweaks be needed to the compile phase.
[jira] [Commented] (GIRAPH-164) fix 5 Line is longer than 80 characters style errors in GiraphRunner
[ https://issues.apache.org/jira/browse/GIRAPH-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13234843#comment-13234843 ] Avery Ching commented on GIRAPH-164: +1, thanks guys. Committing. fix 5 Line is longer than 80 characters style errors in GiraphRunner -- Key: GIRAPH-164 URL: https://issues.apache.org/jira/browse/GIRAPH-164 Project: Giraph Issue Type: Bug Affects Versions: 0.2.0 Reporter: Eugene Koontz Priority: Trivial Fix For: 0.2.0 Attachments: GIRAPH-164.patch
{code}
<file name="/Users/ekoontz/giraph/src/main/java/org/apache/giraph/GiraphRunner.java">
<error line="155" severity="error" message="Line is longer than 80 characters." source="com.puppycrawl.tools.checkstyle.checks.sizes.LineLengthCheck"/>
<error line="156" severity="error" message="Line is longer than 80 characters." source="com.puppycrawl.tools.checkstyle.checks.sizes.LineLengthCheck"/>
<error line="158" severity="error" message="Line is longer than 80 characters." source="com.puppycrawl.tools.checkstyle.checks.sizes.LineLengthCheck"/>
<error line="161" severity="error" message="Line is longer than 80 characters." source="com.puppycrawl.tools.checkstyle.checks.sizes.LineLengthCheck"/>
</file>
{code}
[jira] [Commented] (GIRAPH-154) Worker ports are not synched properly with its peers
[ https://issues.apache.org/jira/browse/GIRAPH-154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232012#comment-13232012 ] Avery Ching commented on GIRAPH-154: Nice work Zhiwei (+1), I verified it as well and committed. Will close once Hudson verifies as well. Worker ports are not synched properly with its peers Key: GIRAPH-154 URL: https://issues.apache.org/jira/browse/GIRAPH-154 Project: Giraph Issue Type: Bug Components: bsp Affects Versions: 0.2.0 Reporter: Zhiwei Gu Assignee: Zhiwei Gu Attachments: GIRAPH-154.patch When a worker tries multiple ports to set up the RPC server, the final port is not synched properly with its peer workers, so the peers send messages to the default port. Here are some logs: Base port: 34900 log for worker 161: IPC Server handler 98 on 36061: starting BasicRPCCommunications: Started RPC communication server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:36061 with 100 handlers and 199 flush threads on bind attempt 1 IPC Server handler 99 on 36061: starting setup: Registering health of this worker... getJobState: Job state already exists (/_hadoopBsp/job_201203130609_14838/_masterJobState) getApplicationAttempt: Node /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir already exists! getApplicationAttempt: Node /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir already exists! 
registerHealth: Created my health node for attempt=0, superstep=-1 with /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/gsta32085.tan.ygrid.yahoo.com_161 and workerInfo= Worker(hostname=gsta32085.tan.ygrid.yahoo.com, MRpartition=161, port=35061) process: partitionAssignmentsReadyChanged (partitions are assigned) startSuperstep: Ready for computation on superstep -1 since worker selection and vertex range assignments are done in /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir/0/_superstepDir/-1/_partitionAssignments Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 0 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 1 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 2 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 3 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 4 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 5 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 6 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 7 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 8 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 9 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 10 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 11 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 12 time(s). 
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 13 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 14 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 15 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 16 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 17 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 18 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 19 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 20 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 21 time(s). Retrying connect to server:
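In the log above the server ends up listening on 36061 while the registered workerInfo still carries 35061, so peers keep retrying the wrong port. A minimal sketch of the general remedy, which is to advertise the port that was actually bound, could look like this. It uses plain java.net sockets rather than Hadoop RPC, and `bindWithRetries`, `workerInfo` and the increment parameter are hypothetical names, not taken from GIRAPH-154.patch.

```java
import java.io.IOException;
import java.net.ServerSocket;

class PortBindingSketch {
    /** Tries basePort, basePort + increment, ... and returns the first socket that binds. */
    static ServerSocket bindWithRetries(int basePort, int increment, int maxAttempts)
            throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int candidate = basePort + attempt * increment;
            if (candidate > 65535) {
                break; // ran out of valid port numbers
            }
            try {
                return new ServerSocket(candidate);
            } catch (IOException e) {
                last = e; // port in use, try the next candidate
            }
        }
        throw new IOException("No free port after " + maxAttempts + " attempts", last);
    }

    /** Builds the advertised worker description from the port that was really bound. */
    static String workerInfo(String hostname, int partition, ServerSocket server) {
        return "Worker(hostname=" + hostname + ", MRpartition=" + partition
            + ", port=" + server.getLocalPort() + ")";
    }
}
```

The point of the sketch is that `workerInfo` reads the port from the bound socket instead of recomputing it from the base port, so a retry can never leave the two values out of step.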
[jira] [Commented] (GIRAPH-156) Users should be able to set simple 'custom arguments' via org.apache.giraph.GiraphRunner
[ https://issues.apache.org/jira/browse/GIRAPH-156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232014#comment-13232014 ] Avery Ching commented on GIRAPH-156: +1, looks good. It would be great if you could wrap LOG.info with if (LOG.isInfoEnabled()), before committing. There are some other places in this file as well without the LOG enabled wrap. You can either make that change here or someone else can do it in another patch. Users should be able to set simple 'custom arguments' via org.apache.giraph.GiraphRunner Key: GIRAPH-156 URL: https://issues.apache.org/jira/browse/GIRAPH-156 Project: Giraph Issue Type: Improvement Components: conf and scripts Affects Versions: 0.1.0 Reporter: Sebastian Schelter Assignee: Sebastian Schelter Attachments: GIRAPH-156-1.patch, GIRAPH-156.patch Some vertices need custom arguments to run. The SimpleShortestPathsVertex for example needs to know the source vertex for the computation which is saved in the job's Configuration as _SimpleShortestPathsVertex.sourceId_. Users should be able to apply such simple custom arguments via GiraphRunner. I propose to add a new option _--customArguments_ where users can supply arguments in the form _param1=value1,param2=value2_ for this.
[jira] [Commented] (GIRAPH-156) Users should be able to set simple 'custom arguments' via org.apache.giraph.GiraphRunner
[ https://issues.apache.org/jira/browse/GIRAPH-156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13230942#comment-13230942 ] Avery Ching commented on GIRAPH-156: I think this makes sense. Go for it. =) Users should be able to set simple 'custom arguments' via org.apache.giraph.GiraphRunner Key: GIRAPH-156 URL: https://issues.apache.org/jira/browse/GIRAPH-156 Project: Giraph Issue Type: Improvement Components: conf and scripts Affects Versions: 0.1.0 Reporter: Sebastian Schelter Assignee: Sebastian Schelter Some vertices need custom arguments to run. The SimpleShortestPathsVertex for example needs to know the source vertex for the computation which is saved in the job's Configuration as _SimpleShortestPathsVertex.sourceId_. Users should be able to apply such simple custom arguments via GiraphRunner. I propose to add a new option _--customArguments_ where users can supply arguments in the form _param1=value1,param2=value2_ for this.
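Parsing the proposed _param1=value1,param2=value2_ form is straightforward; the sketch below shows one way it could be done. It is an assumption-laden illustration, not the GiraphRunner implementation: `CustomArgumentsSketch` and `parse` are hypothetical names, and a real runner would presumably copy the resulting pairs into the job's Configuration rather than return a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CustomArgumentsSketch {
    /** Splits "k1=v1,k2=v2" into an ordered map; rejects entries without a key or '='. */
    static Map<String, String> parse(String customArguments) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String pair : customArguments.split(",")) {
            // Limit of 2 keeps any further '=' characters inside the value.
            String[] parts = pair.split("=", 2);
            if (parts.length != 2 || parts[0].isEmpty()) {
                throw new IllegalArgumentException("Expected key=value but got: " + pair);
            }
            result.put(parts[0], parts[1]);
        }
        return result;
    }
}
```

With this format, values cannot themselves contain commas; that limitation follows from the simple comma-separated syntax in the proposal.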
[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats
[ https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229042#comment-13229042 ] Avery Ching commented on GIRAPH-153: Brian, this is an awesome contribution and a lot of code. I'm really sorry that it took me so long to look at this. Is there any chance that you could add some simple unit tests for your formats? TestJsonBase64Format.java is an example that might be easy to adapt for your formats. Also, I just created a page for how to contribute. https://cwiki.apache.org/confluence/display/GIRAPH/How+to+Contribute Have you run 'mvn verify'? Thanks! HBase/Accumulo Input and Output formats --- Key: GIRAPH-153 URL: https://issues.apache.org/jira/browse/GIRAPH-153 Project: Giraph Issue Type: New Feature Components: bsp Affects Versions: 0.1.0 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB Reporter: Brian Femiano Attachments: AccumuloRootMarker.java, AccumuloRootMarkerInputFormat.java, AccumuloRootMarkerOutputFormat.java, AccumuloVertexInputFormat.java, AccumuloVertexOutputFormat.java, ComputeIsRoot.java, DistributedCacheHelper.java, HBaseVertexInputFormat.java, HBaseVertexOutputFormat.java, IdentifyAndMarkRoots.java, SetLongWritable.java, SetTextWritable.java, TableRootMarker.java, TableRootMarkerInputFormat.java, TableRootMarkerOutputFormat.java Four abstract classes that wrap their respective delegate input/output formats for easy hooks into vertex input format subclasses. I've included some sample programs that show two very simple graph algorithms. I have a graph generator that builds out a very simple directed structure, starting with a few 'root' nodes. Root nodes are defined as nodes which are not listed as a child anywhere in the graph. Algorithm 1) AccumuloRootMarker.java -- Accumulo as read/write source. Every vertex starts thinking it's a root. At superstep 0, send a message down to each child as a non-root notification. 
After superstep 1, only root nodes will have never been messaged. Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by bundling the notification logic followed by root node propagation. Once we've marked the appropriate nodes as roots, tell every child which roots it can be traced back to via one or more spanning trees. This will take N + 2 supersteps where N is the maximum number of hops from any root to any leaf, plus 2 supersteps for the initial root flagging. I've included all relevant code plus DistributedCacheHelper.java for recursive cache file and archive searches. It is more hadoop centric than giraph, but these jobs use it so I figured why not commit here. These have been tested through local JobRunner, pseudo-distributed on the aforementioned hardware, and full distributed on EC2. More details in the comments.
[jira] [Commented] (GIRAPH-144) GiraphJob should not extend Job (users should not be able to call Job methods like waitForCompletion or setMapper..etc)
[ https://issues.apache.org/jira/browse/GIRAPH-144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224096#comment-13224096 ] Avery Ching commented on GIRAPH-144: @Jakob, any more thoughts? GiraphJob should not extend Job (users should not be able to call Job methods like waitForCompletion or setMapper..etc) Key: GIRAPH-144 URL: https://issues.apache.org/jira/browse/GIRAPH-144 Project: Giraph Issue Type: Bug Reporter: Dave Assignee: Avery Ching Attachments: GIRAPH-144.patch Original Estimate: 24h Remaining Estimate: 24h
[jira] [Commented] (GIRAPH-85) Simplify return expression in RPCCommunications::getRPCProxy
[ https://issues.apache.org/jira/browse/GIRAPH-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217096#comment-13217096 ] Avery Ching commented on GIRAPH-85: --- I just looked at your patch in Eclipse and see that there is now a warning due to "Type safety: Unchecked cast from VersionedProtocol to CommunicationsInterface<I,V,E,M>". We can either keep this the way it was before, or add @SuppressWarnings("unchecked") to the method. I don't have a strong opinion here. Anyone else care to comment? Simplify return expression in RPCCommunications::getRPCProxy Key: GIRAPH-85 URL: https://issues.apache.org/jira/browse/GIRAPH-85 Project: Giraph Issue Type: Improvement Affects Versions: 0.2.0 Reporter: Jakob Homan Labels: newbie Fix For: 0.2.0 Attachments: GIRAPH-85.patch, GIRAPH-85.patch Twice in RPCCommunications::getRPCProxy a local variable, proxy, is created and immediately returned. We can simplify this to just return the value.
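To illustrate the trade-off in the comment above, here is a minimal sketch of returning a cast to a parameterized type directly and scoping the suppression to the one method. `Protocol`, `Comms`, `lookup`, and `getProxy` are hypothetical stand-ins; they are not the actual Hadoop or Giraph types, and the real getRPCProxy may look different.

```java
/** Sketch of an unchecked cast on a direct return; all names are hypothetical. */
final class UncheckedCastSketch {
  /** Stand-in for a non-generic protocol interface. */
  interface Protocol { }

  /** Stand-in for a generic communications interface. */
  interface Comms<M> extends Protocol {
    M echo(M message);
  }

  /** Returns a non-generic handle, the way an RPC proxy factory might. */
  static Protocol lookup() {
    return new Comms<String>() {
      @Override
      public String echo(String message) {
        return message;
      }
    };
  }

  /**
   * Returning the cast directly draws an unchecked warning, because the
   * type argument M is erased and cannot be verified at runtime. The
   * annotation keeps the suppression limited to this one method.
   */
  @SuppressWarnings("unchecked")
  static <M> Comms<M> getProxy() {
    return (Comms<M>) lookup();
  }
}
```

The alternative mentioned in the comment, keeping the local variable, lets the annotation sit on the variable declaration instead of the whole method, which is a narrower scope.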
[jira] [Commented] (GIRAPH-87) Simplify boolean expression in BspService::checkpointFrequencyMet
[ https://issues.apache.org/jira/browse/GIRAPH-87?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216365#comment-13216365 ] Avery Ching commented on GIRAPH-87: --- +1 Thanks Eli, I committed on your behalf. Simplify boolean expression in BspService::checkpointFrequencyMet - Key: GIRAPH-87 URL: https://issues.apache.org/jira/browse/GIRAPH-87 Project: Giraph Issue Type: Improvement Affects Versions: 0.2.0 Reporter: Jakob Homan Assignee: Eli Reisman Labels: newbie Attachments: GIRAPH-87.patch, GIRAPH-87.patch {noformat}if (superstep < firstCheckpoint) { return false; } else if (((superstep - firstCheckpoint) % checkpointFrequency) == 0) { return true; } else { return false; }{noformat} can be simplified to just return the result of the else if evaluation.
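A minimal sketch of the simplification, assuming the comparison in the first branch of the quoted snippet is `<`. The class and method names are hypothetical, and this is not the committed patch. The sketch keeps the first-branch guard in the single expression, because Java's `%` can yield 0 for a negative left operand, so the else-if test alone is not equivalent when superstep is below firstCheckpoint.

```java
/** Sketch of the checkpoint test before and after simplification. */
final class CheckpointSketch {
  /** Three-branch form, with the first comparison assumed to be "<". */
  static boolean original(long superstep, long firstCheckpoint,
      long checkpointFrequency) {
    if (superstep < firstCheckpoint) {
      return false;
    } else if (((superstep - firstCheckpoint) % checkpointFrequency) == 0) {
      return true;
    } else {
      return false;
    }
  }

  /** Single-expression form that keeps the guard from the first branch. */
  static boolean simplified(long superstep, long firstCheckpoint,
      long checkpointFrequency) {
    return superstep >= firstCheckpoint
        && ((superstep - firstCheckpoint) % checkpointFrequency) == 0;
  }
}
```

For example, with superstep 0, firstCheckpoint 2, and checkpointFrequency 2, the modulo test alone evaluates to true, while both methods above return false.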
[jira] [Commented] (GIRAPH-85) Simplify return expression in RPCCommunications::getRPCProxy
[ https://issues.apache.org/jira/browse/GIRAPH-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216047#comment-13216047 ] Avery Ching commented on GIRAPH-85: --- please make sure it passes 'mvn verify' as well. That will check rat and checkstyle. Simplify return expression in RPCCommunications::getRPCProxy Key: GIRAPH-85 URL: https://issues.apache.org/jira/browse/GIRAPH-85 Project: Giraph Issue Type: Improvement Affects Versions: 0.2.0 Reporter: Jakob Homan Labels: newbie Fix For: 0.2.0 Attachments: GIRAPH-85.patch Twice in RPCCommunications::getRPCProxy a local variable, proxy, is created and immediately returned. We can simplify this to just return the value.
[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions
[ https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209727#comment-13209727 ] Avery Ching commented on GIRAPH-40: --- Can another committer please look at this as per Jakob's request? Adding checkstyle enforcement of Giraph code conventions Key: GIRAPH-40 URL: https://issues.apache.org/jira/browse/GIRAPH-40 Project: Giraph Issue Type: New Feature Reporter: Avery Ching Assignee: Avery Ching Priority: Minor Attachments: GIRAPH-40.2.patch, GIRAPH-40.3.patch, GIRAPH-40.patch, GIRAPH-40.patch Now that we have some code conventions (see GIRAPH-21), we should enforce them with a maven checkstyle plugin.
[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions
[ https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209801#comment-13209801 ] Avery Ching commented on GIRAPH-40: --- Thanks so much for the reviews Jakob and Sebastian. It's committed. @Sebastian, 'mvn compile' and 'mvn package' will succeed with violations. Anything using 'verify', i.e. 'mvn verify' or 'mvn install' will hit problems with checkstyle and rat. Adding checkstyle enforcement of Giraph code conventions Key: GIRAPH-40 URL: https://issues.apache.org/jira/browse/GIRAPH-40 Project: Giraph Issue Type: New Feature Reporter: Avery Ching Assignee: Avery Ching Priority: Minor Attachments: GIRAPH-40.2.patch, GIRAPH-40.3.patch, GIRAPH-40.patch, GIRAPH-40.patch Now that we have some code conventions (see GIRAPH-21), we should enforce them with a maven checkstyle plugin.
[jira] [Commented] (GIRAPH-150) PageRankBenchmark accesses wrong conf after GiraphJob is created
[ https://issues.apache.org/jira/browse/GIRAPH-150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210025#comment-13210025 ] Avery Ching commented on GIRAPH-150: By the way, here was the full stack trace:
hadoop jar target/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 50 -w 3 -c 1
Exception in thread "main" java.lang.NullPointerException
    at org.apache.giraph.benchmark.PageRankBenchmark.run(PageRankBenchmark.java:127)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.giraph.benchmark.PageRankBenchmark.main(PageRankBenchmark.java:162)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
After this fix, it works. PageRankBenchmark accesses wrong conf after GiraphJob is created Key: GIRAPH-150 URL: https://issues.apache.org/jira/browse/GIRAPH-150 Project: Giraph Issue Type: Bug Reporter: Avery Ching Assignee: Avery Ching Attachments: GIRAPH-150.patch
[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions
[ https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207188#comment-13207188 ] Avery Ching commented on GIRAPH-40: --- So for the first example, we need to follow that format, or else checkstyle will mark it an error. For the second examples, checkstyle doesn't seem to enforce the line wrap indent. So we need to still keep an eye out for those issues. Adding checkstyle enforcement of Giraph code conventions Key: GIRAPH-40 URL: https://issues.apache.org/jira/browse/GIRAPH-40 Project: Giraph Issue Type: New Feature Reporter: Avery Ching Assignee: Avery Ching Priority: Minor Attachments: GIRAPH-40.patch Now that we have some code conventions (see GIRAPH-21), we should enforce them with a maven checkstyle plugin.
[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions
[ https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207203#comment-13207203 ] Avery Ching commented on GIRAPH-40: --- I'm not a checkstyle expert, but I don't think so. I can play around with trying to fix that. Or we can fix in another issue. I should be done with this patch today. Adding checkstyle enforcement of Giraph code conventions Key: GIRAPH-40 URL: https://issues.apache.org/jira/browse/GIRAPH-40 Project: Giraph Issue Type: New Feature Reporter: Avery Ching Assignee: Avery Ching Priority: Minor Attachments: GIRAPH-40.patch Now that we have some code conventions (see GIRAPH-21), we should enforce them with a maven checkstyle plugin.
[jira] [Commented] (GIRAPH-148) giraph-site.xml needs Apache header
[ https://issues.apache.org/jira/browse/GIRAPH-148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207211#comment-13207211 ] Avery Ching commented on GIRAPH-148: +1. giraph-site.xml needs Apache header --- Key: GIRAPH-148 URL: https://issues.apache.org/jira/browse/GIRAPH-148 Project: Giraph Issue Type: Bug Components: conf and scripts Affects Versions: 0.2.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.2.0 Attachments: GIRAPH-148-b.patch, GIRAPH-148.patch I forgot to add the license to the conf file and now rat is failing...
[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions
[ https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206291#comment-13206291 ] Avery Ching commented on GIRAPH-40: --- Thank you for the feedback Claudio. I'll continue to transition the other files and submit a final patch unless anyone has any objections. Adding checkstyle enforcement of Giraph code conventions Key: GIRAPH-40 URL: https://issues.apache.org/jira/browse/GIRAPH-40 Project: Giraph Issue Type: New Feature Reporter: Avery Ching Assignee: Avery Ching Priority: Minor Attachments: GIRAPH-40.patch Now that we have some code conventions (see GIRAPH-21), we should enforce them with a maven checkstyle plugin.
[jira] [Commented] (GIRAPH-139) Change PageRankBenchmark to be accessible via bin/giraph
[ https://issues.apache.org/jira/browse/GIRAPH-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205598#comment-13205598 ] Avery Ching commented on GIRAPH-139: +1 Looks good to me. Change PageRankBenchmark to be accessible via bin/giraph Key: GIRAPH-139 URL: https://issues.apache.org/jira/browse/GIRAPH-139 Project: Giraph Issue Type: Improvement Affects Versions: 0.2.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.2.0 Attachments: GIRAPH-139-b.patch, GIRAPH-139.patch Currently the PageRankBenchmark has its own main and tool implementation and is difficult to access from the bin/giraph script. It would be better if everything were accessible via bin/giraph. The benchmark is particularly problematic because it uses inner classes for its two actual Vertex implementations, which have to be specified on the command line as their .class name (ie org.apache.giraph.benchmark.PageRankBenchmark$PageRankHashMapVertex) rather than just with dots, as one would expect.
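The `$` in the class name above comes from how the JVM names nested classes. A minimal sketch, using a hypothetical nested class rather than the real PageRankBenchmark, shows why only the `$` form can be loaded by name:

```java
/** Sketch of binary vs. canonical names for a nested class. */
final class BinaryNameSketch {
  /** Hypothetical nested class standing in for a nested Vertex class. */
  static class InnerVertex { }

  /** Binary name, e.g. Outer$Inner; this is what Class.forName expects. */
  static String binaryName() {
    return InnerVertex.class.getName();
  }

  /** Canonical name, e.g. Outer.Inner; this is how it reads in source. */
  static String canonicalName() {
    return InnerVertex.class.getCanonicalName();
  }

  /** Reports whether a class with the given name can be loaded. */
  static boolean loadable(String name) {
    try {
      Class.forName(name);
      return true;
    } catch (ClassNotFoundException e) {
      return false;
    }
  }
}
```

Any tool that resolves a class from a command-line string via Class.forName inherits this behavior, which is presumably why the benchmark's vertex classes must be given in the `$` form. In a shell the `$` usually also needs quoting or escaping.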
[jira] [Commented] (GIRAPH-148) giraph-site.xml needs Apache header
[ https://issues.apache.org/jira/browse/GIRAPH-148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205947#comment-13205947 ] Avery Ching commented on GIRAPH-148: Jakob, this header is formatted slightly differently from the one in pom.xml and the .java files we have. giraph-site.xml needs Apache header --- Key: GIRAPH-148 URL: https://issues.apache.org/jira/browse/GIRAPH-148 Project: Giraph Issue Type: Bug Components: conf and scripts Affects Versions: 0.2.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.2.0 Attachments: GIRAPH-148.patch I forgot to add the license to the conf file and now rat is failing...
[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions
[ https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205976#comment-13205976 ] Avery Ching commented on GIRAPH-40: --- Here are some examples of one problem: Checkstyle doesn't seem to be able to handle single indent versus double indent of 2 spaces when appropriate. The below examples are what Checkstyle wants to have us do. {noformat} @Override public BasicVertex<LongWritable, DoubleWritable, DoubleWritable, M> getCurrentVertex() throws IOException, InterruptedException { @Override public VertexReader<LongWritable, DoubleWritable, DoubleWritable, M> createVertexReader(InputSplit split, TaskAttemptContext context) throws IOException { {noformat} Also, checkstyle won't enforce indenting after a line wrap. So both of these examples are passing checkstyle. {noformat} aggregateVertices = configuration.getLong( PseudoRandomVertexInputFormat.AGGREGATE_VERTICES, 0); aggregateVertices = configuration.getLong( PseudoRandomVertexInputFormat.AGGREGATE_VERTICES, 0); {noformat} That being said, I think this is the right thing to do and we can make some sacrifices to have better, more uniform code. Please let me know your thoughts. Adding checkstyle enforcement of Giraph code conventions Key: GIRAPH-40 URL: https://issues.apache.org/jira/browse/GIRAPH-40 Project: Giraph Issue Type: New Feature Reporter: Avery Ching Assignee: Avery Ching Priority: Minor Attachments: GIRAPH-40.patch Now that we have some code conventions (see GIRAPH-21), we should enforce them with a maven checkstyle plugin.
[jira] [Commented] (GIRAPH-148) giraph-site.xml needs Apache header
[ https://issues.apache.org/jira/browse/GIRAPH-148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205992#comment-13205992 ] Avery Ching commented on GIRAPH-148: It will with checkstyle (see GIRAPH-40). We will need to pick one or the other. I don't have a strong preference. giraph-site.xml needs Apache header --- Key: GIRAPH-148 URL: https://issues.apache.org/jira/browse/GIRAPH-148 Project: Giraph Issue Type: Bug Components: conf and scripts Affects Versions: 0.2.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.2.0 Attachments: GIRAPH-148.patch I forgot to add the license to the conf file and now rat is failing...
[jira] [Commented] (GIRAPH-142) _hadoopBsp should be prefixable via configuration
[ https://issues.apache.org/jira/browse/GIRAPH-142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205047#comment-13205047 ] Avery Ching commented on GIRAPH-142: Looks fine, could we just add a check somewhere that the path must start with / and throw an exception explaining to the user the problem? _hadoopBsp should be prefixable via configuration - Key: GIRAPH-142 URL: https://issues.apache.org/jira/browse/GIRAPH-142 Project: Giraph Issue Type: Improvement Affects Versions: 0.1.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.2.0 Attachments: GIRAPH-142.patch In multitenant zookeeper clusters, it would be good to be able to specify the base directory that's created for the _hadoopBsp znodes. This would also fix the issue we have with creating that directory in the source root during tests.
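The check requested in the comment above could look roughly like the following. This is a sketch only: the class name, method name, and error message are hypothetical, and where the check would live in Giraph is not stated in the thread.

```java
/** Sketch of validating a configured ZooKeeper base path. */
final class ZkBasePathSketch {
  /**
   * Returns the path unchanged if it is absolute; otherwise fails fast
   * with a message that tells the user what to change.
   */
  static String checkBasePath(String basePath) {
    if (basePath == null || !basePath.startsWith("/")) {
      throw new IllegalArgumentException(
          "Base ZooKeeper path must start with '/', got: " + basePath);
    }
    return basePath;
  }
}
```

Failing at configuration time keeps the error next to its cause, instead of surfacing later as a ZooKeeper error about an invalid path.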
[jira] [Commented] (GIRAPH-139) Change PageRankBenchmark to be accessible via bin/giraph
[ https://issues.apache.org/jira/browse/GIRAPH-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203900#comment-13203900 ] Avery Ching commented on GIRAPH-139: I agree the main() and run() code should be deprecated, but preferably after giraph-examples.jar is ready =). Change PageRankBenchmark to be accessible via bin/giraph Key: GIRAPH-139 URL: https://issues.apache.org/jira/browse/GIRAPH-139 Project: Giraph Issue Type: Improvement Affects Versions: 0.2.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.2.0 Attachments: GIRAPH-139.patch Currently the PageRankBenchmark has its own main and tool implementation and is difficult to access from the bin/giraph script. It would be better if everything were accessible via bin/giraph. The benchmark is particularly problematic because it uses inner classes for its two actual Vertex implementations, which have to be specified on the command line as their .class name (ie org.apache.giraph.benchmark.PageRankBenchmark$PageRankHashMapVertex) rather than just with dots, as one would expect.
[jira] [Commented] (GIRAPH-139) Change PageRankBenchmark to be accessible via bin/giraph
[ https://issues.apache.org/jira/browse/GIRAPH-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203967#comment-13203967 ] Avery Ching commented on GIRAPH-139: sounds good to me. Change PageRankBenchmark to be accessible via bin/giraph Key: GIRAPH-139 URL: https://issues.apache.org/jira/browse/GIRAPH-139 Project: Giraph Issue Type: Improvement Affects Versions: 0.2.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.2.0 Attachments: GIRAPH-139.patch Currently the PageRankBenchmark has its own main and tool implementation and is difficult to access from the bin/giraph script. It would be better if everything were accessible via bin/giraph. The benchmark is particularly problematic because it uses inner classes for its two actual Vertex implementations, which have to be specified on the command line as their .class name (ie org.apache.giraph.benchmark.PageRankBenchmark$PageRankHashMapVertex) rather than just with dots, as one would expect.
[jira] [Commented] (GIRAPH-144) GiraphJob should not extend Job (users should not be able to call Job methods like waitForCompletion or setMapper..etc)
[ https://issues.apache.org/jira/browse/GIRAPH-144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204213#comment-13204213 ] Avery Ching commented on GIRAPH-144: I'm working on this, should have a fix by tonight. GiraphJob should not extend Job (users should not be able to call Job methods like waitForCompletion or setMapper..etc) Key: GIRAPH-144 URL: https://issues.apache.org/jira/browse/GIRAPH-144 Project: Giraph Issue Type: Bug Reporter: Dave Assignee: Avery Ching Original Estimate: 24h Remaining Estimate: 24h
[jira] [Commented] (GIRAPH-136) Erorr message for bin/giraph could be improved
[ https://issues.apache.org/jira/browse/GIRAPH-136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199585#comment-13199585 ] Avery Ching commented on GIRAPH-136: +1, much better. First try.
$ ./bin/giraph target/giraph-0.1-SNAPSHOT-jar-with-dependencies.jar
No lib directory, assuming dev environment
No HADOOP_CONF_DIR set, using /conf
./bin/giraph: line 112: /bin/hadoop: No such file or directory
./bin/giraph: line 112: exec: /bin/hadoop: cannot execute: No such file or directory
Second try after setting HADOOP_CONF_DIR.
$ ./bin/giraph target/giraph-0.1-SNAPSHOT-jar-with-dependencies.jar
No lib directory, assuming dev environment
HADOOP_CONF_DIR=/Users/aching/Avery/Work/source/hadoop-0.20.203.0/conf
usage: org.apache.giraph.GiraphRunner [-aw <arg>] [-c <arg>] [-h] [-if <arg>] [-ip <arg>] [-of <arg>] [-op <arg>] [-q] [-w <arg>] [-wc <arg>]
 -aw,--aggregatorWriter <arg>   AggregatorWriter class
 -c,--combiner <arg>            VertexCombiner class
 -h,--help                      Help
 -if,--inputFormat <arg>        Graph inputformat
 -ip,--inputPath <arg>          Graph input path
 -of,--outputFormat <arg>       Graph outputformat
 -op,--outputPath <arg>         Graph output path
 -q,--quiet                     Quiet output
 -w,--workers <arg>             Number of workers
 -wc,--workerContext <arg>      WorkerContext class
Erorr message for bin/giraph could be improved -- Key: GIRAPH-136 URL: https://issues.apache.org/jira/browse/GIRAPH-136 Project: Giraph Issue Type: Improvement Affects Versions: 0.1.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.2.0 Attachments: GIRAPH-136-b.patch, GIRAPH-136.patch Currently when one just runs bin/giraph without the required jar, the message isn't very helpful: {noformat}[tardis giraph-0.1]$ bin/giraph Can't find user jar to execute.{noformat} It would be better to have a more in-depth message explaining Giraph and what is expected.
[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions
[ https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199938#comment-13199938 ] Avery Ching commented on GIRAPH-40: --- By the way Claudio, I think several IDEs have support for checkstyle (i.e. Eclipse and Intellij). Adding checkstyle enforcement of Giraph code conventions Key: GIRAPH-40 URL: https://issues.apache.org/jira/browse/GIRAPH-40 Project: Giraph Issue Type: New Feature Reporter: Avery Ching Assignee: Avery Ching Priority: Minor Now that we have some code conventions (see GIRAPH-21), we should enforce them with a maven checkstyle plugin.
[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions
[ https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199935#comment-13199935 ] Avery Ching commented on GIRAPH-40: --- What I meant and am working on, is failing the build when checkstyle errors occur. For now, I am going through and fixing the checkstyle.xml I have and adjusting code, then will submit a patch with the checkstyle.xml and all the warnings fixed. Then, going forward, we will not have to deal with many formatting issues for patches. Well, that's the goal anyway. =) Adding checkstyle enforcement of Giraph code conventions Key: GIRAPH-40 URL: https://issues.apache.org/jira/browse/GIRAPH-40 Project: Giraph Issue Type: New Feature Reporter: Avery Ching Priority: Minor Now that we have some code conventions (see GIRAPH-21), we should enforce them with a maven checkstyle plugin.
[jira] [Commented] (GIRAPH-136) Erorr message for bin/giraph could be improved
[ https://issues.apache.org/jira/browse/GIRAPH-136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198070#comment-13198070 ] Avery Ching commented on GIRAPH-136: Okay, hopefully that gets addressed at some point. +1 for this patch. It would be nice to see help in the message on what is required to get this to work. Another way would be to add to https://cwiki.apache.org/confluence/display/GIRAPH/Index. Erorr message for bin/giraph could be improved -- Key: GIRAPH-136 URL: https://issues.apache.org/jira/browse/GIRAPH-136 Project: Giraph Issue Type: Improvement Affects Versions: 0.1.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.2.0 Attachments: GIRAPH-136.patch Currently when one just runs bin/giraph without the required jar, the message isn't very helpful: {noformat}[tardis giraph-0.1]$ bin/giraph Can't find user jar to execute.{noformat} It would be better to have a more in-depth message explaining Giraph and what is expected.
[jira] [Commented] (GIRAPH-136) Erorr message for bin/giraph could be improved
[ https://issues.apache.org/jira/browse/GIRAPH-136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197650#comment-13197650 ] Avery Ching commented on GIRAPH-136: I can verify the error message is improved, but perhaps the message could be improved further? Is there any example usage you have for using this?
aching:~/git/git_svn_giraph_trunk$ ./bin/giraph
Usage: giraph [-D<Hadoop property>] <jar containing vertex> <parameters to jar>
At a minimum one must provide a path to the jar containing the vertex to be executed.
aching:~/git/git_svn_giraph_trunk$ ./bin/giraph target/giraph-0.1-SNAPSHOT-jar-with-dependencies.jar
Can't find Giraph jar.
Erorr message for bin/giraph could be improved -- Key: GIRAPH-136 URL: https://issues.apache.org/jira/browse/GIRAPH-136 Project: Giraph Issue Type: Improvement Affects Versions: 0.1.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.2.0 Attachments: GIRAPH-136.patch Currently when one just runs bin/giraph without the required jar, the message isn't very helpful: {noformat}[tardis giraph-0.1]$ bin/giraph Can't find user jar to execute.{noformat} It would be better to have a more in-depth message explaining Giraph and what is expected.
[jira] [Commented] (GIRAPH-134) Fix NOTICE and LICENSE files
[ https://issues.apache.org/jira/browse/GIRAPH-134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196727#comment-13196727 ] Avery Ching commented on GIRAPH-134: +1, looks good! Excited for the release. Fix NOTICE and LICENSE files Key: GIRAPH-134 URL: https://issues.apache.org/jira/browse/GIRAPH-134 Project: Giraph Issue Type: Improvement Components: documentation Affects Versions: 0.1.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.1.0 Attachments: GIRAPH-134.patch Currently both the LICENSE and NOTICE file are out of compliance for an Apache release.
[jira] [Commented] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried
[ https://issues.apache.org/jira/browse/GIRAPH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195286#comment-13195286 ] Avery Ching commented on GIRAPH-128: Thanks for taking a look. I forgot to upload the original (rb only for that one), hence part 2. The main motivation for the obscure case is that it would make debugging simpler. We often see errors like serverX:portY, and can use portY to figure out which mapper to look at. For example, currently the default starts at 30000. If I see an error from 30001, then I know to go to mapper 1 to see its problem. And so on and so forth. If I am running a 900 mapper job, then if it's 31001 or 32001 I still know to look at mapper partition 1. If instead I had 100 as the constant, then if it's 30101, I have to check both mapper 1 and mapper 101. With up to 20 retries per port, we can handle at least 20 simultaneous jobs running on a single machine that have the same mapper partition id. First off, that is probably unlikely. But even if it does happen, 20 is probably more than one machine would handle. By the way, port retries are very fast (so I wouldn't worry too much about collisions). Let me resubmit without the whitespace changes and with MAX_BIND_ATTEMPTS made configurable. RPC port from BasicRPCCommunications should be only a starting port, and retried Key: GIRAPH-128 URL: https://issues.apache.org/jira/browse/GIRAPH-128 Project: Giraph Issue Type: Improvement Affects Versions: 0.1.0 Reporter: Avery Ching Assignee: Avery Ching Attachments: GIRAPH-128.2.patch Currently Giraph uses a basic port + the task partition to get the RPC port. This doesn't work well when there are multiple Giraph jobs running simultaneously in the same Hadoop cluster (port conflict). At the same time, it is nice to use this simple algorithm because it makes it very easy to debug problems (you can find the troublesome mapper from the RPC port name). 
I will be proposing a simple scheme to retry with another port. I will round the total number of mappers up to the nearest power of 10 (let's call that number Z). Then I will increment the port number by Z, retrying up to 20 times. If you have enough ports, this scheme would guarantee that up to 20 mappers / node would be supported. It should be sufficient for most clusters. At the same time, we still maintain the easy debugging method since it's still easy to figure out the mapper partition from the port (port % Z = map partition). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
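The port arithmetic described in this issue can be sketched roughly as follows. This is only an illustration of the scheme as stated in the description, not the code in the attached patch: the class and method names are hypothetical, and it assumes the base port (30000 in the example) is a multiple of Z so that port % Z recovers the partition.

```java
/** Hypothetical sketch of the "base port + partition, retry in steps of Z" idea. */
final class PortRetrySketch {
    private PortRetrySketch() {}

    /** Rounds the mapper count up to the nearest power of 10 (the Z of the description). */
    static int roundUpToPowerOfTen(int numMappers) {
        if (numMappers < 1 || numMappers > 1000000000) {
            throw new IllegalArgumentException("unsupported mapper count: " + numMappers);
        }
        int z = 1;
        while (z < numMappers) {
            z *= 10;
        }
        return z;
    }

    /** Lists the ports a mapper would try, in order, one per bind attempt. */
    static int[] candidatePorts(int basePort, int partition, int numMappers, int maxAttempts) {
        int z = roundUpToPowerOfTen(numMappers);
        int[] ports = new int[maxAttempts];
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Each retry moves up by Z, so the low digits still identify the partition.
            ports[attempt] = basePort + partition + attempt * z;
        }
        return ports;
    }

    /** Recovers the mapper partition from a port seen in an error message. */
    static int partitionFromPort(int port, int numMappers) {
        // Only valid when the base port is a multiple of Z (an assumption of this sketch).
        return port % roundUpToPowerOfTen(numMappers);
    }
}
```

With a 900-mapper job and a base port of 30000, mapper partition 1 would try 30001, then 31001, then 32001, which matches the ports mentioned in the comment. A real implementation would also need to attempt the bind and check that the candidate stays within the valid port range.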
[jira] [Commented] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried
[ https://issues.apache.org/jira/browse/GIRAPH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193579#comment-13193579 ] Avery Ching commented on GIRAPH-128: Anyone want to review? I think this will be very useful to get in before the release since it makes it a lot easier for users to run multiple Giraph jobs on the same cluster simultaneously. RPC port from BasicRPCCommunications should be only a starting port, and retried Key: GIRAPH-128 URL: https://issues.apache.org/jira/browse/GIRAPH-128 Project: Giraph Issue Type: Improvement Affects Versions: 0.1.0 Reporter: Avery Ching Assignee: Avery Ching Attachments: GIRAPH-128.2.patch Currently Giraph uses a basic port + the task partition to get the RPC port. This doesn't work well when there are multiple Giraph jobs running simultaneously in the same Hadoop cluster (port conflict). At the same time, it is nice to use this simple algorithm because it makes it very easy to debug problems (you can find the troublesome mapper from the RPC port name). I will be proposing a simple scheme to retry with another port. I will round the total number of mappers up to the nearest power of 10 (let's call that number Z). Then I will increment the port number by Z, retrying up to 20 times. If you have enough ports, this scheme would guarantee that up to 20 mappers / node would be supported. It should be sufficient for most clusters. At the same time, we still maintain the easy debugging method since it's still easy to figure out the mapper partition from the port (port % Z = map partition).
[jira] [Commented] (GIRAPH-130) Fix Javadoc warnings
[ https://issues.apache.org/jira/browse/GIRAPH-130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192388#comment-13192388 ] Avery Ching commented on GIRAPH-130: It would be great to enforce this checking somehow to prevent it from happening at all. Fix Javadoc warnings Key: GIRAPH-130 URL: https://issues.apache.org/jira/browse/GIRAPH-130 Project: Giraph Issue Type: Bug Reporter: Jakob Homan Priority: Minor Labels: newbie We've accumulated a fair number of javadoc warnings recently: {noformat}
[WARNING] Javadoc Warnings
[WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146: warning - Tag @link: reference not found: GraphPartitioner
[WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:129: warning - @param argument superstep is not a parameter name.
[WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146: warning - Tag @link: reference not found: GraphPartitioner
[WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java:84: warning - @param argument vertexIndex is not a parameter name.
[WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java:84: warning - @param argument msgList is not a parameter name.
[WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32: warning - Tag @link: reference not found: VertexIdMessage
[WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java:46: warning - Tag @link: reference not found: messages
[WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java:46: warning - Tag @link: reference not found: messages
[WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/AggregatorWriter.java:60: warning - @param argument map is not a parameter name.
[WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146: warning - Tag @link: reference not found: GraphPartitioner
[WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146: warning - Tag @link: reference not found: GraphPartitioner
[WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/GiraphJob.java:432: warning - @param argument graphPartitionerClass is not a parameter name.
[WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java:46: warning - Tag @link: reference not found: messages
[WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java:62: warning - Tag @link: reference not found: GraphPartitioner
[WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java:62: warning - Tag @link: reference not found: GraphPartitioner
[WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java:62: warning - @param argument availableWorkerInfos is not a parameter name.
[WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/partition/PartitionBalancer.java:176: warning - @param argument allPartitionStatsList is not a parameter name.
[WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32: warning - Tag @link: reference not found: VertexIdMessage
[WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32: warning - Tag @link: reference not found: VertexIdMessage
[WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146: warning - Tag @link: reference not found: GraphPartitioner
[WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32: warning - Tag @link: reference not found: VertexIdMessage
[WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146: warning - Tag @link: reference not found: GraphPartitioner
[WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32: warning - Tag @link: reference not found: VertexIdMessage
{noformat} It would be good to fix these.
[jira] [Commented] (GIRAPH-129) enable creation of javadoc and sources jars
[ https://issues.apache.org/jira/browse/GIRAPH-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192887#comment-13192887 ] Avery Ching commented on GIRAPH-129: As long as mvn compile doesn't build the javadoc, I am happy. =) enable creation of javadoc and sources jars --- Key: GIRAPH-129 URL: https://issues.apache.org/jira/browse/GIRAPH-129 Project: Giraph Issue Type: Improvement Components: build Affects Versions: 0.1.0 Reporter: André Kelpe Assignee: André Kelpe Priority: Minor Attachments: GIRAPH-129.patch It is pretty useful to enable the creation of javadoc and sources jars during the build, so that people using IDEs like Eclipse can easily jump into the code.
[jira] [Commented] (GIRAPH-124) Combiner should return Iterable<M> instead of M or null.
[ https://issues.apache.org/jira/browse/GIRAPH-124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190351#comment-13190351 ] Avery Ching commented on GIRAPH-124: Nice, Claudio. I haven't had a chance to fully test it, but wanted to give you some early feedback. 1) Some changes have messed up indenting a little (here are some examples)
-public FloatWritable combine(LongWritable vertexIndex,
+public Iterable<FloatWritable> combine(LongWritable vertexIndex,
 Iterable<FloatWritable> msgList)
- public abstract M combine(I vertexIndex,
+ public abstract Iterable<M> combine(I vertexIndex,
 Iterable<M> messages) throws IOException;
-M combinedMsg = combiner.combine(entry.getKey(),
+Iterable<M> messages = combiner.combine(entry.getKey(),
 entry.getValue());
-public IntWritable combine(LongWritable vertexIndex,
+public Iterable<IntWritable> combine(LongWritable vertexIndex,
 Iterable<IntWritable> messages)
2) Should we make the requirement that the returned result has a size < input size? I think the argument was that some classification of messages might not always reduce the number of messages? Perhaps <=? Combiner should return Iterable<M> instead of M or null. Key: GIRAPH-124 URL: https://issues.apache.org/jira/browse/GIRAPH-124 Project: Giraph Issue Type: Improvement Components: graph Affects Versions: 0.1.0 Reporter: Claudio Martella Attachments: GIRAPH-124.diff Currently VertexCombiner is expected to return a single message combining the input messages, or null in case no message should be sent. The new expected interface should return an Iterable<M>, possibly empty. The number of elements in the returned Iterable is supposed to be smaller than the number of input messages, by the initial definition of a Combiner (defined as a function to reduce I/O by combining multiple messages into 1).
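To make the proposed contract concrete, here is a minimal sketch of a combiner that returns an Iterable of messages which may be empty, instead of a single message or null. The interface and class names are hypothetical and plain Long/Double stand in for the Hadoop Writable types, so this is not Giraph's VertexCombiner API, only an illustration of the shape under discussion.

```java
import java.util.Collections;

/** Hypothetical stand-in for a combiner that returns zero or more messages. */
interface MessageCombinerSketch<I, M> {
    Iterable<M> combine(I vertexIndex, Iterable<M> messages);
}

/** Sums all messages for a vertex into at most one outgoing message. */
final class SumCombinerSketch implements MessageCombinerSketch<Long, Double> {
    @Override
    public Iterable<Double> combine(Long vertexIndex, Iterable<Double> messages) {
        double sum = 0.0;
        boolean sawMessage = false;
        for (double message : messages) {
            sum += message;
            sawMessage = true;
        }
        if (!sawMessage) {
            // "No message should be sent" becomes an empty Iterable rather than null.
            return Collections.<Double>emptyList();
        }
        return Collections.singletonList(sum);
    }
}
```

Callers can then iterate over the result unconditionally, without a null check. Whether the result must be strictly smaller than the input, or only no larger, is the open question raised in point 2 of the comment.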
[jira] [Commented] (GIRAPH-127) Extending the API with a master.compute() function.
[ https://issues.apache.org/jira/browse/GIRAPH-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189574#comment-13189574 ] Avery Ching commented on GIRAPH-127: I think this functionality is very useful and would actually replace a lot of the WorkerContext functionality. Sequential steps do need to be done between computations sometimes, and "Pick k random initial cluster centers" is a good example. While WorkerContext allows us to do simple things, it is not as efficient for certain calculations (e.g. suppose all workers needed a global value from HDFS; it is cheaper to read it once and broadcast the outcome than to have all workers hit HDFS). Still, WorkerContext can be useful (say for dumping worker stats), so I wouldn't remove it, rather just give our users a broader choice on computation around supersteps. I see that the Master#compute() should have access to all aggregators to do its work. Overall, I like the idea and would definitely like to see how we can add this in. Getting the interface right will be a little hard I think, but we can iterate over it. Basically, from what Semih has said, we gain 1) a clean way to do sequential computation between supersteps and 2) removal of the extra superstep needed if we simulate this idea with a 'picked worker'. Extending the API with a master.compute() function. --- Key: GIRAPH-127 URL: https://issues.apache.org/jira/browse/GIRAPH-127 Project: Giraph Issue Type: New Feature Components: bsp, examples, graph Reporter: Semih Salihoglu First of all, sorry for the long explanation of this feature. I want to expand the API of Giraph with a new function called master.compute(), that would get called at the master before each superstep and I will try to explain the purpose that it would serve with an example. Let's say we want to implement the following simplified version of the k-means clustering algorithm. 
Pseudocode below:
* Input G(V, E), k, numEdgesThreshold, maxIterations
* Algorithm:
* int numEdgesCrossingClusters = Integer.MAX_VALUE;
* int iterationNo = 0;
* while ((numEdgesCrossingClusters > numEdgesThreshold) && iterationNo < maxIterations) {
*   iterationNo++;
*   int[] clusterCenters = pickKClusterCenters(k, G);
*   findClusterCenters(G, clusterCenters);
*   numEdgesCrossingClusters = countNumEdgesCrossingClusters();
* }
The algorithm goes through the following steps in iterations:
1) Pick k random initial cluster centers.
2) Assign each vertex to the cluster center that it's closest to (in Giraph, this can be implemented in message passing similar to how ShortestPaths is implemented).
3) Count the number of edges crossing clusters.
4) Go back to step 1, if there are a lot of edges crossing clusters and we haven't exceeded the maximum number of iterations yet.
In an algorithm like this, steps 2 and 3 are where most of the work happens and both parts have very neat message-passing implementations. I'll try to give an overview without going into the details. Let's say we define a Vertex in Giraph to hold a custom Writable object that holds 2 integer values and sends a message with up to 2 integer values. Step 2 is very similar to the ShortestPaths algorithm and has two stages: In the first stage, each vertex checks to see whether or not it's one of the cluster centers. If so, it assigns itself the value (id, 0), otherwise it assigns itself (Null, Null). In the 2nd stage, the vertices assign themselves to the minimum distance cluster center by looking at their neighbors' (cluster center, distance) values (received as 2 integer messages) and their current values, and changing their values if they find a lower distance cluster center. This happens in x number of supersteps until every vertex converges. Step 3, counting the number of edges crossing clusters, is also very easy to implement in Giraph. 
Once each vertex has a cluster center, the number of edges crossing clusters can be counted by an aggregator, let's say called num-edges-crossing. It would again have two stages: First stage, every vertex just sends its cluster id to all its neighbors. Second stage, every vertex looks at its neighbors' cluster ids in the messages, and for each cluster id that is not equal to its own cluster id, it increments num-edges-crossing by 1. The other 2 steps, step 1 and 4, are very simple sequential computations. Step 1 just picks k random vertex ids and puts them into an aggregator. Step 4 just compares num-edges-crossing against a threshold and also checks whether or not the algorithm has exceeded maxIterations (not supersteps but iterations of going through Steps 1-4). With the current API, it's not clear where to do these computations. There is a per
[jira] [Commented] (GIRAPH-126) Use Collections.emptyList() in BasicRPCCommunications.java
[ https://issues.apache.org/jira/browse/GIRAPH-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188578#comment-13188578 ] Avery Ching commented on GIRAPH-126: Agree with Jakob. Thanks Andrei, every memory improvement helps! Use Collections.emptyList() in BasicRPCCommunications.java -- Key: GIRAPH-126 URL: https://issues.apache.org/jira/browse/GIRAPH-126 Project: Giraph Issue Type: Improvement Reporter: André Kelpe Assignee: André Kelpe Priority: Minor Attachments: GIRAPH-126.patch I am doing some tests with giraph and I am having some memory problems. While I was browsing through the codebase I saw that you are allocating a new ArrayList (which has an underlying array of 10 elements) for each Vertex that has no Messages to be delivered. That's a waste of memory and time. This patch replaces it with the EMPTY_LIST of the Collections utility class.
[jira] [Commented] (GIRAPH-126) Use Collections.emptyList() in BasicRPCCommunications.java
[ https://issues.apache.org/jira/browse/GIRAPH-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188588#comment-13188588 ] Avery Ching commented on GIRAPH-126: Actually, looking at it some more, does this work? What happens when msgs.add(msg) is called on the empty list? We can also do this a different way (i.e. msgs = new ArrayList<M>(1)). Use Collections.emptyList() in BasicRPCCommunications.java -- Key: GIRAPH-126 URL: https://issues.apache.org/jira/browse/GIRAPH-126 Project: Giraph Issue Type: Improvement Reporter: André Kelpe Assignee: André Kelpe Priority: Minor Attachments: GIRAPH-126.patch, GIRAPH-126.patch, GIRAPH-126.patch I am doing some tests with giraph and I am having some memory problems. While I was browsing through the codebase I saw that you are allocating a new ArrayList (which has an underlying array of 10 elements) for each Vertex that has no Messages to be delivered. That's a waste of memory and time. This patch replaces it with the EMPTY_LIST of the Collections utility class.
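The concern raised in this comment can be checked in isolation: the list returned by Collections.emptyList() is immutable, so calling add on it throws UnsupportedOperationException. The sketch below shows that behavior next to the small-capacity ArrayList alternative mentioned in the comment. The class and method names are hypothetical and String stands in for the message type; it does not reproduce the code in BasicRPCCommunications.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of adding to a shared empty list versus a small ArrayList. */
final class EmptyListSketch {
    private EmptyListSketch() {}

    /** Returns true if adding to the shared immutable empty list is rejected. */
    static boolean addToSharedEmptyListFails() {
        List<String> msgs = Collections.emptyList();
        try {
            msgs.add("msg");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    /** Swaps an empty list for a small mutable one before the first add. */
    static List<String> addMessage(List<String> msgs, String msg) {
        if (msgs.isEmpty()) {
            // Initial capacity 1 avoids the default backing array of 10 elements.
            msgs = new ArrayList<String>(1);
        }
        msgs.add(msg);
        return msgs;
    }
}
```

So the shared empty list is safe only on paths that never mutate it. Any path that may add messages needs to replace it with a mutable list first, as addMessage does here.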
[jira] [Commented] (GIRAPH-123) the wiki is not publicly accessible
[ https://issues.apache.org/jira/browse/GIRAPH-123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184655#comment-13184655 ] Avery Ching commented on GIRAPH-123: Works for me. Thanks Jakob. the wiki is not publicly accessible --- Key: GIRAPH-123 URL: https://issues.apache.org/jira/browse/GIRAPH-123 Project: Giraph Issue Type: Bug Components: documentation Reporter: André Kelpe Assignee: Jakob Homan Priority: Minor When I try to read the documentation on the wiki I end up on a login screen. Can you please make the wiki open to the public?
[jira] [Commented] (GIRAPH-118) Clarify messages behavior in BasicVertex
[ https://issues.apache.org/jira/browse/GIRAPH-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181894#comment-13181894 ] Avery Ching commented on GIRAPH-118: +1, looks good! Clarify messages behavior in BasicVertex Key: GIRAPH-118 URL: https://issues.apache.org/jira/browse/GIRAPH-118 Project: Giraph Issue Type: Improvement Components: graph Affects Versions: 0.70.0 Reporter: Claudio Martella Assignee: Claudio Martella Priority: Minor Attachments: GIRAPH-118.diff, GIRAPH-119.diff initialize() can receive a null parameter for messages (at least that's what EdgeListVertex does). We should avoid that and pass an empty Iterable instead. That should be cheap for us inside of the InputFormat, just passing a static immutable empty list. setMessages(Iterable<M>) should be changed to putMessages(Iterable<M>). The set prefix suggests an assignment, while setMessages is used to transfer the messages to the internal data structure the user is responsible for. putMessages() should clarify this.
[jira] [Commented] (GIRAPH-119) VertexCombiner should work on Iterable<M> instead of List<M>
[ https://issues.apache.org/jira/browse/GIRAPH-119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181566#comment-13181566 ] Avery Ching commented on GIRAPH-119: Minor nit
 @Override
 public FloatWritable combine(LongWritable vertexIndex,
- List<FloatWritable> msgList)
+ Iterable<FloatWritable> msgList)
 throws IOException {
 return null;
 }
@@ -97,7 +97,7 @@ public class TestVertexTypes
 @Override
 public DoubleWritable combine(LongWritable vertexIndex,
- List<DoubleWritable> msgList)
+ Iterable<DoubleWritable> msgList)
 throws IOException {
 return null;
 }
probably should have changed msgList to messages or something like that. Not a big deal. =) VertexCombiner should work on Iterable<M> instead of List<M> Key: GIRAPH-119 URL: https://issues.apache.org/jira/browse/GIRAPH-119 Project: Giraph Issue Type: Improvement Components: graph Affects Versions: 0.70.0 Reporter: Claudio Martella Assignee: Claudio Martella Attachments: GIRAPH-119.diff Currently VertexCombiner expects a List<M>. It should be refactored to Iterable<M> to sync with the Iterable-based BasicVertex message logic.
[jira] [Commented] (GIRAPH-118) Clarify messages behavior in BasicVertex
[ https://issues.apache.org/jira/browse/GIRAPH-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181570#comment-13181570 ] Avery Ching commented on GIRAPH-118: Seems reasonable. Please make sure to update the javadoc and the MutableVertex implementations. Clarify messages behavior in BasicVertex Key: GIRAPH-118 URL: https://issues.apache.org/jira/browse/GIRAPH-118 Project: Giraph Issue Type: Improvement Components: graph Affects Versions: 0.70.0 Reporter: Claudio Martella Assignee: Claudio Martella Priority: Minor initialize() can receive a null parameter for messages (at least that's what EdgeListVertex does). We should avoid that and pass an empty Iterable instead. That should be cheap for us inside of the InputFormat, just passing a static immutable empty list. setMessages(Iterable<M>) should be changed to putMessages(Iterable<M>). The set prefix suggests an assignment, while setMessages is used to transfer the messages to the internal data structure the user is responsible for. putMessages() should clarify this.
[jira] [Commented] (GIRAPH-111) Refactor I/O to be independent of Map/Reduce
[ https://issues.apache.org/jira/browse/GIRAPH-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173512#comment-13173512 ] Avery Ching commented on GIRAPH-111: I'm not clear on why this is necessary. Couldn't we simply call the I/O methods as Hadoop would when we're not using Hadoop? Am I missing something? Refactor I/O to be independent of Map/Reduce Key: GIRAPH-111 URL: https://issues.apache.org/jira/browse/GIRAPH-111 Project: Giraph Issue Type: Improvement Components: graph Reporter: Ed Kohlwey The I/O mechanisms should probably be abstracted entirely from Map/Reduce in order to support making Giraph an independent framework.
[jira] [Commented] (GIRAPH-108) Refactor code to run independently of Map/Reduce
[ https://issues.apache.org/jira/browse/GIRAPH-108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173931#comment-13173931 ] Avery Ching commented on GIRAPH-108: Actually, I'll let Jakob take a first crack at looking at this since he's got some expertise in the area. Refactor code to run independently of Map/Reduce Key: GIRAPH-108 URL: https://issues.apache.org/jira/browse/GIRAPH-108 Project: Giraph Issue Type: Improvement Components: graph Reporter: Ed Kohlwey Attachments: GIRAPH-108 It would be nice for Giraph to be refactored such that the code could eventually be run outside of map/reduce. This will allow people to write drivers that can run in the cool new resource manager frameworks like Mesos and YARN, and eventually let the application's code base evolve to be independent of map/reduce.
[jira] [Commented] (GIRAPH-73) A little refactoring
[ https://issues.apache.org/jira/browse/GIRAPH-73?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171797#comment-13171797 ] Avery Ching commented on GIRAPH-73: --- Most of these changes look good, but I'm not sure I agree with the use of Closeables.closeQuietly() in ZooKeeperManager.java since if we do get an IOException I think we'd want the program to die as soon as possible. A little refactoring Key: GIRAPH-73 URL: https://issues.apache.org/jira/browse/GIRAPH-73 Project: Giraph Issue Type: Improvement Affects Versions: 0.70.0 Reporter: Sebastian Schelter Priority: Minor Attachments: GIRAPH-73-2.patch, GIRAPH-73.patch Hi, I'm currently reading Giraph's sources and starting to play with it. I fixed some small things along the way (like making sure writers are closed, exceptions are logged, etc.), and thought that may be helpful.
[jira] [Commented] (GIRAPH-105) BspServiceMaster.checkWorkers() should return empty lists instead of null
[ https://issues.apache.org/jira/browse/GIRAPH-105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171898#comment-13171898 ] Avery Ching commented on GIRAPH-105: Thanks for the reworking! +1. BspServiceMaster.checkWorkers() should return empty lists instead of null - Key: GIRAPH-105 URL: https://issues.apache.org/jira/browse/GIRAPH-105 Project: Giraph Issue Type: Bug Affects Versions: 0.70.0 Reporter: Sebastian Schelter Priority: Minor Attachments: GIRAPH-105-2.patch, GIRAPH-105.patch BspServiceMaster.checkWorkers() is invoked in BspServiceMaster.coordinateSuperstep() and in BspServiceMaster.createInputSplits(). Both check for an empty list to fail the job in case something has gone wrong. However, checkWorkers() returns null in case of problems, causing an NPE in the calling code.
[jira] [Commented] (GIRAPH-93) Hive input / output format
[ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170831#comment-13170831 ] Avery Ching commented on GIRAPH-93: --- Argh, since HCatalog is not published to maven, this is a bit of a problem. We could add a system dependency, but it's a little messy (yucky warnings). I can get it to build with my compiled jar, but get warnings like:
[WARNING] 'dependencies.dependency.systemPath' for org.apache.hcatalog:hcatalog:jar should not point at files within the project directory, ${basedir}/lib/hcatalog-0.3.0-dev.jar will be unresolvable by dependent projects @ line 527, column 19
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
[INFO]
[INFO]
[INFO] Building Apache Incubator Giraph 0.70
[INFO]
[WARNING] The POM for org.apache.hadoop:hadoop-core:jar:0.20.1 is missing, no dependency information available
[WARNING] The POM for org.apache.hadoop:hadoop-core:jar:0.20.3-CDH3-SNAPSHOT is missing, no dependency information available
[WARNING] Could not transfer metadata asm:asm/maven-metadata.xml from/to local.repository (file:../../local.repository/trunk): No connector available to access repository local.repository (file:../../local.repository/trunk) of type legacy using the available factories WagonRepositoryConnectorFactory
[INFO]
[INFO] --- maven-enforcer-plugin:1.0.1:enforce (enforce-maven) @ giraph ---
[INFO]
[INFO] --- maven-resources-plugin:2.4.3:resources (default-resources) @ giraph ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/aching/giraph/src/main/resources
[INFO]
[INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ giraph ---
[INFO] Compiling 122 source files to /home/aching/giraph/target/classes
[INFO]
[INFO] --- maven-assembly-plugin:2.2:single (build-fat-jar) @ giraph ---
[WARNING] Missing POM for org.apache.hadoop:hadoop-core:jar:0.20.1
[WARNING] Missing POM for org.apache.hadoop:hadoop-core:jar:0.20.3-CDH3-SNAPSHOT
Maybe wait on HCATALOG-132? Hive input / output format -- Key: GIRAPH-93 URL: https://issues.apache.org/jira/browse/GIRAPH-93 Project: Giraph Issue Type: New Feature Reporter: Avery Ching Assignee: Avery Ching It would be great to be able to load/store data from/to Hive tables.
[jira] [Commented] (GIRAPH-57) Add new RPC call (putVertexIdMessagesList) to batch putMsgList RPCs together
[ https://issues.apache.org/jira/browse/GIRAPH-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170857#comment-13170857 ] Avery Ching commented on GIRAPH-57: --- Emergency fix to allow trunk to compile on certain platforms:
[ERROR] /home/hudson/hudson-slave/workspace/Giraph-trunk-Commit/trunk/src/main/java/org/apache/giraph/comm/VertexIdMessages.java:[66,45] type parameters of <I>I cannot be determined; no unique maximal instance exists for type variable I with upper bounds I,org.apache.hadoop.io.WritableComparable
==
--- incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/VertexIdMessages.java (original)
+++ incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/VertexIdMessages.java Fri Dec 16 09:26:44 2011
@@ -63,7 +63,7 @@ public class VertexIdMessages<I extends
 @Override
 public void readFields(DataInput input) throws IOException {
-vertexId = BspUtils.createVertexIndex(getConf());
+vertexId = BspUtils.<I>createVertexIndex(getConf());
Add new RPC call (putVertexIdMessagesList) to batch putMsgList RPCs together Key: GIRAPH-57 URL: https://issues.apache.org/jira/browse/GIRAPH-57 Project: Giraph Issue Type: Improvement Reporter: Jakob Homan Assignee: Avery Ching Attachments: GIRAPH-57.diff, GIRAPH-57.diff.2 Right now messages are sent to a vertex one at a time. It would be good to have a putMsgs call that could send messages to multiple vertices (all hosted on the same worker). We'd save a huge number of individual RPC calls at the expense of having smaller calls with larger payloads.
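The fix quoted in this comment appears to work by naming the type argument at the call site instead of leaving it to the compiler's inference, which some compilers could not resolve when the target is itself a type variable. The sketch below shows that general Java technique, an explicit type argument on a generic static method call. All names are hypothetical and Comparable stands in for WritableComparable; it does not reproduce BspUtils or VertexIdMessages.

```java
import java.util.function.Supplier;

/** Hypothetical factory whose return type is a method-level type variable. */
final class IndexFactorySketch {
    private IndexFactorySketch() {}

    @SuppressWarnings("unchecked")
    static <T extends Comparable<?>> T createIndex(Supplier<?> source) {
        // Unchecked cast: the caller is trusted to supply a compatible object.
        return (T) source.get();
    }
}

/** Hypothetical holder whose field type is a class-level type variable. */
final class IndexHolderSketch<I extends Comparable<?>> {
    private final Supplier<?> source;
    private I vertexId;

    IndexHolderSketch(Supplier<?> source) {
        this.source = source;
    }

    void read() {
        // The explicit <I> tells the compiler which type to use instead of inferring it.
        vertexId = IndexFactorySketch.<I>createIndex(source);
    }

    I getVertexId() {
        return vertexId;
    }
}
```

The explicit form requires a qualifier before the type argument (a class name, an expression, or this), which is why the call is written as IndexFactorySketch.<I>createIndex(...) rather than as a bare method name.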
[jira] [Commented] (GIRAPH-45) Improve the way to keep outgoing messages
[ https://issues.apache.org/jira/browse/GIRAPH-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171073#comment-13171073 ]

Avery Ching commented on GIRAPH-45:

I think that reading messages from disk one vertex at a time will reduce memory pressure more than the partition-based storage. I'm assuming that key=vertex_id and value=message_list in your explanation. How do you keep the keys together in the file? For instance, suppose that you get the following tuples:

vertex_id, message_list
0, 2.0, 3.0
3, 1.0
7, 34.0
4, 23.0
3, 20.0

In a bad scenario, you have to spill to disk after each tuple. The files are totally out of order and your index (vertex, byte offset) looks something like:

0, 0
3, 24
7, 40
4, 56

But if I'm understanding this scheme, wouldn't each vertex need to scan the entire file if the vertices keep coming and are totally random?

I suppose that another way to do this is to use the partition-based method and add a small change. If the partition is deemed too large to load in memory and sort, it could be read and re-dumped into n files, where n is chosen such that there is a good chance that each resulting file is small enough to fit in memory. This can be done recursively.

Improve the way to keep outgoing messages

Key: GIRAPH-45 URL: https://issues.apache.org/jira/browse/GIRAPH-45 Project: Giraph Issue Type: Improvement Components: bsp Reporter: Hyunsik Choi Assignee: Hyunsik Choi

As discussed in GIRAPH-12 (http://goo.gl/CE32U), I think there is a potential out-of-memory problem when the rate of message generation is higher than the rate of message flush (or network bandwidth). To overcome this problem, we need a more eager strategy for message flushing or some approach to spill messages to disk. The link below is Dmitriy's suggestion.
https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253
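The recursive re-dump idea in the comment above can be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed patch: in-memory lists stand in for spill files, PartitionSplitter and Tuple are hypothetical names, vertex ids are assumed to be non-negative longs, and the bucket is chosen from successive base-n digits of the id so that all tuples for a vertex end up in the same bucket.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: lists stand in for per-partition spill files. */
final class PartitionSplitter {
    /** Hypothetical (vertex id, message) tuple. */
    static final class Tuple {
        final long vertexId;
        final double message;

        Tuple(long vertexId, double message) {
            this.vertexId = vertexId;
            this.message = message;
        }
    }

    private PartitionSplitter() { }

    /** Re-dumps tuples into buckets that fit within maxTuples where possible. */
    static List<List<Tuple>> split(List<Tuple> tuples, int fanOut, int maxTuples) {
        if (fanOut < 2) {
            throw new IllegalArgumentException("fanOut must be at least 2");
        }
        List<List<Tuple>> out = new ArrayList<>();
        splitInto(tuples, fanOut, maxTuples, 1L, out);
        return out;
    }

    private static void splitInto(List<Tuple> tuples, int fanOut, int maxTuples,
                                  long divisor, List<List<Tuple>> out) {
        // Stop when the bucket fits, or when no id digits are left to split on.
        boolean exhausted = divisor > Long.MAX_VALUE / fanOut;
        if (tuples.size() <= maxTuples || exhausted) {
            if (!tuples.isEmpty()) {
                out.add(tuples);
            }
            return;
        }
        List<List<Tuple>> buckets = new ArrayList<>();
        for (int i = 0; i < fanOut; i++) {
            buckets.add(new ArrayList<>());
        }
        for (Tuple t : tuples) {
            // Use the next base-fanOut digit of the vertex id as the bucket number.
            int bucket = (int) Math.floorMod(t.vertexId / divisor, (long) fanOut);
            buckets.get(bucket).add(t);
        }
        for (List<Tuple> bucket : buckets) {
            splitInto(bucket, fanOut, maxTuples, divisor * fanOut, out);
        }
    }
}
```

A bucket holding a single vertex with more messages than maxTuples cannot be split further by this scheme and is emitted oversized; a real implementation would need a separate answer for that case.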
[jira] [Commented] (GIRAPH-45) Improve the way to keep outgoing messages
[ https://issues.apache.org/jira/browse/GIRAPH-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171404#comment-13171404 ]

Avery Ching commented on GIRAPH-45:

Ah, thank you for clarifying that. The only minor downside is that a sorted map typically uses a bit more memory than a non-sorted one. But it's probably not too big a deal. Sounds like an idea certainly worth trying out, Claudio =).

Improve the way to keep outgoing messages

Key: GIRAPH-45 URL: https://issues.apache.org/jira/browse/GIRAPH-45 Project: Giraph Issue Type: Improvement Components: bsp Reporter: Hyunsik Choi Assignee: Hyunsik Choi

As discussed in GIRAPH-12 (http://goo.gl/CE32U), I think there is a potential out-of-memory problem when the rate of message generation is higher than the rate of message flush (or network bandwidth). To overcome this problem, we need a more eager strategy for message flushing or some approach to spill messages to disk. The link below is Dmitriy's suggestion.
https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253
[jira] [Commented] (GIRAPH-45) Improve the way to keep outgoing messages
[ https://issues.apache.org/jira/browse/GIRAPH-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170613#comment-13170613 ]

Avery Ching commented on GIRAPH-45:

I don't think you need the BTree for indexing the destination vertices. Couldn't we use files to group the messages sent to the same partition? If you simply dump all the received (vertex id, messages) tuples to a file that is specific to a partition, we can load all the tuples for a single partition prior to computing on the worker and assign them to their destinations. I'm a little concerned that using an in-memory data structure to keep the message indices might be a little expensive (i.e. one BTree per vertex in your model, if I'm understanding correctly).

Regarding the streaming, I am not proposing to change the BSP model. I'm talking about sending the messages as we go along during the computation. Currently the messages are bulk sent at the end of the superstep. So rather than a bulk send, allow every worker to stream out a batch of messages when under some pressure, rather than everything at the end.

As far as detecting memory pressure, Runtime seems to do an okay job. If anyone knows anything better, that's cool too. You can look at MemoryUtils#getRuntimeMemoryStats() for a Runtime example. We'll need to define limits for memory pressure.

Improve the way to keep outgoing messages

Key: GIRAPH-45 URL: https://issues.apache.org/jira/browse/GIRAPH-45 Project: Giraph Issue Type: Improvement Components: bsp Reporter: Hyunsik Choi Assignee: Hyunsik Choi

As discussed in GIRAPH-12 (http://goo.gl/CE32U), I think there is a potential out-of-memory problem when the rate of message generation is higher than the rate of message flush (or network bandwidth). To overcome this problem, we need a more eager strategy for message flushing or some approach to spill messages to disk. The link below is Dmitriy's suggestion.
https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253
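A memory-pressure check based on java.lang.Runtime, as suggested in the comment above, might look like the sketch below. It does not reproduce MemoryUtils#getRuntimeMemoryStats(); the class name, the method names, and the idea of a single used-fraction threshold are assumptions made for illustration.

```java
/** Sketch only: a used-heap fraction compared against a caller-chosen limit. */
final class MemoryPressure {
    private MemoryPressure() { }

    /** Fraction of the maximum heap that is currently in use. */
    static double usedFraction(long totalMemory, long freeMemory, long maxMemory) {
        return (double) (totalMemory - freeMemory) / (double) maxMemory;
    }

    /** True when the JVM's used heap is at or above the given fraction of its maximum. */
    static boolean underPressure(double threshold) {
        Runtime rt = Runtime.getRuntime();
        return usedFraction(rt.totalMemory(), rt.freeMemory(), rt.maxMemory()) >= threshold;
    }
}
```

A worker could call underPressure with some limit before buffering another message and stream or spill when it returns true. The right limit is exactly the open question raised in the comment.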
[jira] [Commented] (GIRAPH-93) Hive input / output format
[ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170631#comment-13170631 ]

Avery Ching commented on GIRAPH-93:

Just wanted to update that I did get this to work with HCatalog a while ago. And amazingly it actually works! I'll put together a diff to get this into Giraph.

Hive input / output format

Key: GIRAPH-93 URL: https://issues.apache.org/jira/browse/GIRAPH-93 Project: Giraph Issue Type: New Feature Reporter: Avery Ching Assignee: Avery Ching

It would be great to be able to load/store data from/to Hive tables.
[jira] [Commented] (GIRAPH-80) Don't expose the list holding the messages in BasicVertex
[ https://issues.apache.org/jira/browse/GIRAPH-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170651#comment-13170651 ]

Avery Ching commented on GIRAPH-80:

By the way Sebastian, you can run the Hadoop tests against a single-node Hadoop instance (I often do this on my laptop). It makes it much easier to run this test and takes me about 17 minutes or so. Not too bad.

Don't expose the list holding the messages in BasicVertex

Key: GIRAPH-80 URL: https://issues.apache.org/jira/browse/GIRAPH-80 Project: Giraph Issue Type: Improvement Affects Versions: 0.70.0 Reporter: Sebastian Schelter

I'm currently trying to implement my own memory-efficient vertex (similar to LongDoubleFloatDoubleVertex) and ran into problems with getMsgList(). This method returns a list pointing to the messages of the vertex, and it is modified externally (e.g. BasicRPCCommunications calls clear() and addAll()). This makes it very hard to use something other than a java.util.List internally (LongDoubleFloatDoubleVertex hacked around this), and it is generally dangerous to have the internal state of an object modified externally. It also makes the code harder to read and understand. I'd suggest changing the API to let a vertex handle the modifications itself internally (e.g. add something like pushMessages(...)).
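To illustrate the suggestion in the issue description above, here is a minimal sketch of a vertex-like class that owns its message storage. The class and its clearMessages() and getMessages() methods are hypothetical; only the pushMessages name comes from the issue text. Storing messages in a primitive array is the kind of internal representation that an externally mutated List would rule out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Sketch only: messages live in a primitive array that callers never see. */
final class PrimitiveInboxVertex {
    private double[] inbox = new double[4];
    private int size;

    /** The vertex appends incoming messages itself instead of exposing its storage. */
    void pushMessages(Iterable<Double> incoming) {
        for (double message : incoming) {
            if (size == inbox.length) {
                inbox = Arrays.copyOf(inbox, size * 2);
            }
            inbox[size++] = message;
        }
    }

    /** Drops all buffered messages without releasing the array. */
    void clearMessages() {
        size = 0;
    }

    /** Read-only snapshot; changes to it cannot reach the internal array. */
    List<Double> getMessages() {
        List<Double> snapshot = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            snapshot.add(inbox[i]);
        }
        return Collections.unmodifiableList(snapshot);
    }
}
```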
[jira] [Commented] (GIRAPH-103) Added properties for commonly used package version to pom.xml
[ https://issues.apache.org/jira/browse/GIRAPH-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170803#comment-13170803 ]

Avery Ching commented on GIRAPH-103:

No one wants to take a quick look? It's very short, I promise...

Added properties for commonly used package version to pom.xml

Key: GIRAPH-103 URL: https://issues.apache.org/jira/browse/GIRAPH-103 Project: Giraph Issue Type: Improvement Components: build Reporter: Avery Ching Priority: Trivial Attachments: GIRAPH-103.diff
[jira] [Commented] (GIRAPH-45) Improve the way to keep outgoing messages
[ https://issues.apache.org/jira/browse/GIRAPH-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169756#comment-13169756 ]

Avery Ching commented on GIRAPH-45:

I've been thinking about this a bit more. I don't think we actually need a database if we use a disk-friendly approach and take advantage of the knowledge of our system. Here is a rough proposal:

There are two ways we can save memory here: out-of-core graph and out-of-core messages. In this way, we can use the memory as a cache rather than a totally in-memory database and messaging system.

Here's how we can do the out-of-core graph:

Workers already do the computation by partition. All partitions that are owned by the worker need to be processed, and we want to minimize the amount of data loaded/stored to local disk (i.e. superstep.worker id.partition #.vertices). Local disk should be used here because it will be faster and no remote worker needs to directly access this data. Therefore the general algorithm would be:

for (partition : all in memory partitions)
  partition.computeAndGenerateOutgoingMessages()
  if (memoryPressure)
    partition.storeToFileSystem()
for (partition : remaining in file system partitions)
  partition.loadFromFileSystem()
  partition.computeAndGenerateOutgoingMessages()
  if (memoryPressure)
    partition.storeToFileSystem()

This should keep our partition cache as full as possible and have a minimal amount of loading/storing for partitions that can't fit in memory.

Here's how we can do the out-of-core messaging:

As the partitions are being processed by the workers, outgoing messages are currently kept in memory. They are flushed if a message list grows to a certain size. Otherwise, the messages are bulk sent at the end of the computation. What we can do is wait for a sendMessageReq and check for memory pressure. If memory pressure is an issue, then dump all the outgoing messages to HDFS files (i.e. superstep.worker id.partition #.outgoingMessages). Future sendMessageReq may be kept in memory or dumped to the same HDFS files if memory pressure is an issue. These HDFS files are closed prior to the flush. During the flush, the worker sends the in-memory messages as normal to the destinations, as well as the filenames of the out-of-core messages to their respective owners. Note that the files are stored in HDFS so that a remote worker can load the messages as it sees fit. Maybe reduce the replication factor to 2 by default for these files?

This tactic should reduce memory usage on the destination worker as well, since the destination workers don't need to load the HDFS files until they are actually doing the computation for that partition.

Checkpoints should be able to point to the out-of-core data as well to reduce the amount of data to store.

Still, there is one more remaining piece (loading the graph). This can also run out of memory. Currently vertex lists are batched and sent to destination workers by partition. Partitions should have the ability to be incrementally dumped to local files on the destination if there is memory pressure. Then prior to the 1st superstep, each partition can be assembled (local files + any vertices still in memory) and can use the out-of-core graph algorithm indicated above.

This proposal should take advantage of large reads/writes so that we don't need a database. I will require out-of-core storage in the very near future, as the graph I need to load will have billions of edges and I probably won't have enough nodes and memory to keep it all in core.

Please let me know your thoughts on this approach.
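The partition loop from the proposal can be written out as a small Java sketch. Everything here is hypothetical: the Partition interface only mirrors the three method names used in the pseudocode, memory pressure is supplied by the caller as a BooleanSupplier, and RecordingPartition exists only to make the call order visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Sketch of the partition loop; every name here is hypothetical. */
final class OutOfCoreSuperstep {
    interface Partition {
        void loadFromFileSystem();
        void computeAndGenerateOutgoingMessages();
        void storeToFileSystem();
    }

    private OutOfCoreSuperstep() { }

    /** Processes resident partitions first, then spilled ones, updating both lists in place. */
    static void run(List<Partition> inMemory, List<Partition> onDisk,
                    BooleanSupplier memoryPressure) {
        List<Partition> resident = new ArrayList<>();
        List<Partition> spilled = new ArrayList<>();
        for (Partition p : inMemory) {
            p.computeAndGenerateOutgoingMessages();
            keepOrSpill(p, memoryPressure, resident, spilled);
        }
        for (Partition p : onDisk) {
            p.loadFromFileSystem();
            p.computeAndGenerateOutgoingMessages();
            keepOrSpill(p, memoryPressure, resident, spilled);
        }
        inMemory.clear();
        inMemory.addAll(resident);
        onDisk.clear();
        onDisk.addAll(spilled);
    }

    private static void keepOrSpill(Partition p, BooleanSupplier memoryPressure,
                                    List<Partition> resident, List<Partition> spilled) {
        if (memoryPressure.getAsBoolean()) {
            p.storeToFileSystem();
            spilled.add(p);
        } else {
            resident.add(p);
        }
    }
}

/** Test double that records the order of calls. */
final class RecordingPartition implements OutOfCoreSuperstep.Partition {
    private final String name;
    private final List<String> log;

    RecordingPartition(String name, List<String> log) {
        this.name = name;
        this.log = log;
    }

    @Override public void loadFromFileSystem() { log.add("load " + name); }
    @Override public void computeAndGenerateOutgoingMessages() { log.add("compute " + name); }
    @Override public void storeToFileSystem() { log.add("store " + name); }
}
```

With constant pressure every partition ends the superstep on disk; with no pressure the spilled partitions migrate back into the in-memory list, which is the cache-like behavior the proposal describes.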
Improve the way to keep outgoing messages

Key: GIRAPH-45 URL: https://issues.apache.org/jira/browse/GIRAPH-45 Project: Giraph Issue Type: Improvement Components: bsp Reporter: Hyunsik Choi Assignee: Hyunsik Choi

As discussed in GIRAPH-12 (http://goo.gl/CE32U), I think there is a potential out-of-memory problem when the rate of message generation is higher than the rate of message flush (or network bandwidth). To overcome this problem, we need a more eager strategy for message flushing or some approach to spill messages to disk. The link below is Dmitriy's suggestion.
https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253
[jira] [Commented] (GIRAPH-103) Added properties for commonly used package version to pom.xml
[ https://issues.apache.org/jira/browse/GIRAPH-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166430#comment-13166430 ]

Avery Ching commented on GIRAPH-103:

Also, I updated my affiliation to Facebook.

Added properties for commonly used package version to pom.xml

Key: GIRAPH-103 URL: https://issues.apache.org/jira/browse/GIRAPH-103 Project: Giraph Issue Type: Improvement Components: build Reporter: Avery Ching Priority: Trivial Attachments: GIRAPH-103.diff
[jira] [Commented] (GIRAPH-10) Aggregators are not exported
[ https://issues.apache.org/jira/browse/GIRAPH-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162929#comment-13162929 ]

Avery Ching commented on GIRAPH-10:

Thanks for the revised diff. I have some suggestions, let me know what you think.

First, you are missing org.apache.giraph.examples.SimpleAggregatorWriter in the diff. Hence I am getting errors in my build:

[ERROR] /Users/aching/Avery/source/giraph_trunk/src/test/java/org/apache/giraph/TestBspBasic.java:[24,33] cannot find symbol
symbol : class SimpleAggregatorWriter
location: package org.apache.giraph.examples
[ERROR] /Users/aching/Avery/source/giraph_trunk/src/test/java/org/apache/giraph/TestBspBasic.java:[355,37] cannot find symbol

Also, your IDE is using tabs. The CODE_CONVENTIONS asks for spaces instead of tabs. Can you please convert all your tabs to spaces?

AggregatorWriter.java
- Quite a few tabs in this file, please change them to spaces.
- Indentation issues on lines 60 and greater.
- line 52: The methods is called at the -> This method is called at the
- line 60: map is a bit non-descriptive here. Can you change it to something else, i.e. aggregatorNameValueMap or even just aggregatorMap?
- line 64: successfull -> successful

TextAggregatorWriter.java
- line 44: aggreatos -> aggregators

TestBspBasic.java
- line 371: for (i=0; ; i++) { -> for (i = 0; ; i++) {

Aggregators are not exported

Key: GIRAPH-10 URL: https://issues.apache.org/jira/browse/GIRAPH-10 Project: Giraph Issue Type: New Feature Reporter: Avery Ching Assignee: Claudio Martella Priority: Minor Attachments: GIRAPH-10.diff, GIRAPH-10.diff

Currently, aggregator values cannot be saved after a Giraph job. There should be a way to do this.
[jira] [Commented] (GIRAPH-10) Aggregators are not exported
[ https://issues.apache.org/jira/browse/GIRAPH-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163050#comment-13163050 ]

Avery Ching commented on GIRAPH-10:

Much improved, Claudio. A couple more minor suggestions:

Please add a javadoc comment for the class SimpleAggregatorWriter on what it does (given that it is in examples and users will be looking for help on what it is doing).

TestBspBasic.java

Before line 356:
assertTrue(job.run(true));

Should add something like (as in other tests in that file):
Path outputPath = new Path("/tmp/" + getCallingMethodName());
removeAndSetOutput(job, outputPath);

If you don't do this, the next time someone adds another test that uses this dir and doesn't set the output dir, it could potentially cause issues, I think, if they are relying on specific stuff in that dir.

By the way, good bug fix in the test.

If you agree and make those changes, this is an effective +1, please upload your final diff and feel free to commit. =)

Aggregators are not exported

Key: GIRAPH-10 URL: https://issues.apache.org/jira/browse/GIRAPH-10 Project: Giraph Issue Type: New Feature Reporter: Avery Ching Assignee: Claudio Martella Priority: Minor Attachments: GIRAPH-10.diff, GIRAPH-10.diff, GIRAPH-10.diff

Currently, aggregator values cannot be saved after a Giraph job. There should be a way to do this.
[jira] [Commented] (GIRAPH-10) Aggregators are not exported
[ https://issues.apache.org/jira/browse/GIRAPH-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162640#comment-13162640 ]

Avery Ching commented on GIRAPH-10:

Hi Claudio, great stuff! A couple of questions. I downloaded GIRAPH-10.diff and think you are missing AggregatorWriter.java and TextAggregatorWriter.java from the diff. Also, I was thinking, shouldn't the default be to not write any aggregator data?

Aggregators are not exported

Key: GIRAPH-10 URL: https://issues.apache.org/jira/browse/GIRAPH-10 Project: Giraph Issue Type: New Feature Reporter: Avery Ching Assignee: Claudio Martella Priority: Minor Attachments: GIRAPH-10.diff

Currently, aggregator values cannot be saved after a Giraph job. There should be a way to do this.
[jira] [Commented] (GIRAPH-100) Data input sampling and testing improvements
[ https://issues.apache.org/jira/browse/GIRAPH-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161031#comment-13161031 ]

Avery Ching commented on GIRAPH-100:

Anyone? =)

Data input sampling and testing improvements

Key: GIRAPH-100 URL: https://issues.apache.org/jira/browse/GIRAPH-100 Project: Giraph Issue Type: New Feature Components: graph Reporter: Avery Ching Assignee: Avery Ching Attachments: GIRAPH-100.patch

It would be really nice to help debug an application by limiting the input data (% of input splits, max vertices per input split). Also, it would be nice for the workers to provide a little more debugging info on how far along they are with processing the input data.
[jira] [Commented] (GIRAPH-100) Data input sampling and testing improvements
[ https://issues.apache.org/jira/browse/GIRAPH-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161219#comment-13161219 ]

Avery Ching commented on GIRAPH-100:

Sorry Jakob, I'll try to stop doing formatting changes. Habit, I suppose. In the future, I'll file separate issues for formatting cleanup.

> What's the point of the changes in TextVertexInputFormat method visibility? Are they related to this patch?

No, I can remove it. Just a bit safer I guess since they should be protected.

> We're throwing a lot of Stringly typed exceptions. For more robust error handling and recovery, it may be good to strongly type these instead.

Which exceptions are you referring to?

> re: SuperstepHashPartitionerFactory. Moving it out of test and into the example directory seems a bit counterproductive to me. It's a pathological implementation; wouldn't it be better to provide a more useful example, rather than one that's explicitly not meant to be used?

Until we start packaging things into separate jars, the Hadoop unit test is broken when the SuperstepHashPartitionerFactory is not found. The right solution might be to create another jar that has the unittest classes and can be run as part of the Hadoop instance unittest. Can we do that in another issue? I agree that it isn't a good example, but it's still a good test since it guarantees partition movement between workers.

Data input sampling and testing improvements

Key: GIRAPH-100 URL: https://issues.apache.org/jira/browse/GIRAPH-100 Project: Giraph Issue Type: New Feature Components: graph Reporter: Avery Ching Assignee: Avery Ching Attachments: GIRAPH-100.patch

It would be really nice to help debug an application by limiting the input data (% of input splits, max vertices per input split). Also, it would be nice for the workers to provide a little more debugging info on how far along they are with processing the input data.
[jira] [Commented] (GIRAPH-100) Data input sampling and testing improvements
[ https://issues.apache.org/jira/browse/GIRAPH-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161243#comment-13161243 ]

Avery Ching commented on GIRAPH-100:

Ah, I see. We should file another JIRA to create a GiraphException and the various types, I suppose. Or do you want me to do it in this JIRA?

I can put the SuperstepHashPartitionerFactory into another directory:
src/main/java/org/apache/giraph/integration/SuperstepHashPartitionerFactory.java

I like the idea of mocking in general, but I'm not sure how mocking can verify the behavior in this case. Probably leave it as an integration test only. IMO, we should file a separate JIRA for separating the tests into unittests (mocking, individual class tests) and integration tests, but integration tests can still be run locally and/or remotely.

Let me know what you think and I'll make the requested changes.

Data input sampling and testing improvements

Key: GIRAPH-100 URL: https://issues.apache.org/jira/browse/GIRAPH-100 Project: Giraph Issue Type: New Feature Components: graph Reporter: Avery Ching Assignee: Avery Ching Attachments: GIRAPH-100.patch

It would be really nice to help debug an application by limiting the input data (% of input splits, max vertices per input split). Also, it would be nice for the workers to provide a little more debugging info on how far along they are with processing the input data.
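As a rough illustration of what a GiraphException "and the various types" could buy, the sketch below contrasts a typed exception with matching on message strings. Only the GiraphException name comes from the thread; making it a checked exception, the InputSplitUnavailableException subtype, and the loadSplit and loadOrSkip helpers are all assumptions for the example.

```java
/** Base type named in the thread; making it a checked exception is an assumption. */
class GiraphException extends Exception {
    GiraphException(String message) {
        super(message);
    }
}

/** Hypothetical subtype carrying structured data instead of a formatted string. */
final class InputSplitUnavailableException extends GiraphException {
    private final int splitIndex;

    InputSplitUnavailableException(int splitIndex) {
        super("input split " + splitIndex + " is not available");
        this.splitIndex = splitIndex;
    }

    int getSplitIndex() {
        return splitIndex;
    }
}

final class TypedExceptionDemo {
    private TypedExceptionDemo() { }

    /** Hypothetical loader that fails with a specific exception type. */
    static String loadSplit(int splitIndex, int available) throws GiraphException {
        if (splitIndex >= available) {
            throw new InputSplitUnavailableException(splitIndex);
        }
        return "split-" + splitIndex;
    }

    /** The caller recovers from one failure type and escalates anything else. */
    static String loadOrSkip(int splitIndex, int available) {
        try {
            return loadSplit(splitIndex, available);
        } catch (InputSplitUnavailableException e) {
            return "skipped-" + e.getSplitIndex();
        } catch (GiraphException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The recovery branch reads the split index from a field rather than parsing it out of a message, which is the robustness argument made in the review.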
[jira] [Commented] (GIRAPH-99) Make AdjacencyListVertexReader and its constructor public
[ https://issues.apache.org/jira/browse/GIRAPH-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155334#comment-13155334 ]

Avery Ching commented on GIRAPH-99:

Thanks Kohei, glad to see you working on Giraph!

Make AdjacencyListVertexReader and its constructor public

Key: GIRAPH-99 URL: https://issues.apache.org/jira/browse/GIRAPH-99 Project: Giraph Issue Type: Wish Components: lib Reporter: Kohei Ozaki Assignee: Kohei Ozaki Priority: Minor Labels: patch Fix For: 0.70.0 Attachments: GIRAPH-99.diff

Hi, I'd like to write a class inherited from AdjacencyListVertexReader to make a library using Giraph (like git.io/ALVR), but AdjacencyListVertexReader is a private class and its constructor is private. I guess making it public would be useful for handling more complex input formats dictated by the data structures of algorithms.
[jira] [Commented] (GIRAPH-84) Simplify boolean expressions in BspRecordReader
[ https://issues.apache.org/jira/browse/GIRAPH-84?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153927#comment-13153927 ]

Avery Ching commented on GIRAPH-84:

Claudio, once a committer +1's something, they can commit it on behalf of the submitter. If it's another committer that submits, then typically, after the +1, the submitter will commit.

Simplify boolean expressions in BspRecordReader

Key: GIRAPH-84 URL: https://issues.apache.org/jira/browse/GIRAPH-84 Project: Giraph Issue Type: Improvement Reporter: Jakob Homan Assignee: Shaunak Kashyap Labels: newbie

Twice in BspRecordReader boolean expressions are evaluated with == and can be simplified to just one liners or variable evaluation.
[jira] [Commented] (GIRAPH-96) Support for Graphs with Huge adjacency lists
[ https://issues.apache.org/jira/browse/GIRAPH-96?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152203#comment-13152203 ]

Avery Ching commented on GIRAPH-96:

The general issue of overloading our memory and working out of core has been discussed a little in GIRAPH-45 as well. I suppose you could implement a BasicVertex that loaded everything on demand from HBase, though I suspect it would be a little slow, depending on the application.

Support for Graphs with Huge adjacency lists

Key: GIRAPH-96 URL: https://issues.apache.org/jira/browse/GIRAPH-96 Project: Giraph Issue Type: Improvement Components: bsp Affects Versions: 0.70.0 Reporter: Arun Suresh

Currently the vertex initialize() method is passed the complete adjacency list as a HashMap. All the current concrete implementations of Vertex iterate over the adjacency list and recreate new data structures within the Vertex instance to hold/manipulate the adjacency list. This would cease to be feasible once the size of the adjacency list becomes really huge. I propose storing the adjacency list and all vertex information (and incoming messages?) in a distributed data store such as HBase. The adjacency list can be lazily loaded via HBase Scans. I was thinking of an HBase schema where the row Id is a concatenation of VertexID+OutboundVertexId with a single column containing the edge.
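The row-key idea in the issue description can be illustrated without HBase at all. The sketch below is a simulation under stated assumptions: a sorted in-memory map stands in for the table, keys are fixed-width hex strings so that lexicographic order groups a vertex's edges together, and EdgeRowKeys, rowKey and targetsOf are hypothetical names. No HBase client API is used or implied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

/** Sketch only: a sorted map simulates a table scanned in row-key order. */
final class EdgeRowKeys {
    private EdgeRowKeys() { }

    /** Row key = source vertex id followed by target vertex id, both fixed width. */
    static String rowKey(long vertexId, long targetId) {
        return String.format("%016x%016x", vertexId, targetId);
    }

    /** Walks only the rows that share the source vertex prefix, like a prefix scan. */
    static List<Long> targetsOf(NavigableMap<String, Double> table, long vertexId) {
        String prefix = String.format("%016x", vertexId);
        List<Long> targets = new ArrayList<>();
        for (String key : table.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) {
                break; // left this vertex's contiguous key range
            }
            targets.add(Long.parseUnsignedLong(key.substring(16), 16));
        }
        return targets;
    }
}
```

Because all edges of a vertex are contiguous in key order, a vertex can iterate its adjacency list lazily instead of holding the whole map in memory, which is the point of the proposed schema.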
[jira] [Commented] (GIRAPH-84) Simplify boolean expressions in BspRecordReader
[ https://issues.apache.org/jira/browse/GIRAPH-84?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152488#comment-13152488 ]

Avery Ching commented on GIRAPH-84:

Ternary is fine with me. I think we use it in the codebase. We should probably add it to the coding conventions...unless someone objects.

Simplify boolean expressions in BspRecordReader

Key: GIRAPH-84 URL: https://issues.apache.org/jira/browse/GIRAPH-84 Project: Giraph Issue Type: Improvement Reporter: Jakob Homan Assignee: Attila Csordas Labels: newbie

Twice in BspRecordReader boolean expressions are evaluated with == and can be simplified to just one liners or variable evaluation.
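For readers unfamiliar with the cleanup being discussed, this is the general shape of it. The code below is a generic illustration, not the BspRecordReader source; all names are made up.

```java
/** Generic illustration of the simplification; not taken from BspRecordReader. */
final class BooleanSimplification {
    private BooleanSimplification() { }

    /** Verbose form: compares a boolean with == and branches to return a literal. */
    static boolean verbose(boolean seenRecord) {
        if (seenRecord == true) {
            return true;
        } else {
            return false;
        }
    }

    /** Simplified form: the variable already is the answer. */
    static boolean simplified(boolean seenRecord) {
        return seenRecord;
    }

    /** Ternary form, for when the two outcomes are values rather than booleans. */
    static float progress(boolean seenRecord) {
        return seenRecord ? 1f : 0f;
    }
}
```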
[jira] [Commented] (GIRAPH-68) Implement a Graph Generator
[ https://issues.apache.org/jira/browse/GIRAPH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151405#comment-13151405 ]

Avery Ching commented on GIRAPH-68:

Looks good Hyunsik, a few comments.

Probably want to add a javadoc comment for GraphGenerator
Lines 40-41: Should have 8 space indenting
Line 46: needs 4 more spaces
Line 58: Over 80 chars

So is the idea that PageRankBenchmark and RandomMessageBenchmark would use it? Would you like to modify them to do so?

Implement a Graph Generator

Key: GIRAPH-68 URL: https://issues.apache.org/jira/browse/GIRAPH-68 Project: Giraph Issue Type: New Feature Components: benchmark Affects Versions: 0.70.0 Reporter: Hyunsik Choi Assignee: Hyunsik Choi Attachments: GIRAPH-68_1.patch

To provide users with benchmark environments and to deeply test the input/output system of giraph, we need a graph generator. We will enable the graph generator to generate various kinds of graph data sets by specifying a VertexInputFormat and a VertexOutputFormat.
[jira] [Commented] (GIRAPH-91) Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)
[ https://issues.apache.org/jira/browse/GIRAPH-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151603#comment-13151603 ]

Avery Ching commented on GIRAPH-91:
---

By the way, rb allows you to download the diff directly (so you don't have to worry about them staying in sync).

https://reviews.apache.org/r/2868/diff/raw/

Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)
---

Key: GIRAPH-91
URL: https://issues.apache.org/jira/browse/GIRAPH-91
Project: Giraph
Issue Type: Improvement
Reporter: Avery Ching
Assignee: Avery Ching
Attachments: GIRAPH-91.diff

Current vertex implementation uses a HashMap for storing the edges, which is quite memory heavy for large graphs. The default settings in Giraph need to be improved for large graphs and heaps of 20G.
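To make the memory concern concrete, here is a minimal sketch of one common alternative to a HashMap of edges: parallel primitive arrays, which avoid a boxed key, a boxed value, and an entry object per edge. This is not taken from GIRAPH-91.diff; the class name ArrayEdgeList, the long/float types, and the linear lookup are all assumptions chosen for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch; the actual GIRAPH-91 patch may work differently.
final class ArrayEdgeList {
    private long[] targets = new long[4];   // destination vertex ids
    private float[] values = new float[4];  // edge values, same index as targets
    private int size;

    // Appends an edge, doubling both arrays when they are full.
    void add(long targetId, float edgeValue) {
        if (size == targets.length) {
            targets = Arrays.copyOf(targets, size * 2);
            values = Arrays.copyOf(values, size * 2);
        }
        targets[size] = targetId;
        values[size] = edgeValue;
        size++;
    }

    int size() {
        return size;
    }

    // Linear scan; returns NaN when no edge to targetId exists.
    float valueOf(long targetId) {
        for (int i = 0; i < size; i++) {
            if (targets[i] == targetId) {
                return values[i];
            }
        }
        return Float.NaN;
    }
}
```

The trade-off is that lookups by target id become linear in the number of edges, so a layout like this suits vertices that mostly iterate over their edges and rarely look one up.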
[jira] [Commented] (GIRAPH-92) Need outputformat for just vertex ID and value
[ https://issues.apache.org/jira/browse/GIRAPH-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151736#comment-13151736 ]

Avery Ching commented on GIRAPH-92:
---

I agree this could be useful. Couple of format errors:

if (expr)
+ if(reverseOutput) {

typo
+ public void testWithDifferentDelimieter() throws IOException, Interrupted

needs one more space
+ public void testWithDifferentDelimieter() throws IOException,
+ InterruptedException {
+Configuration conf = new Configuration();

Extra line break
+writer.writeVertex(vertex);
+
+
+verify(tw).write(expected, null);

Need outputformat for just vertex ID and value
--

Key: GIRAPH-92
URL: https://issues.apache.org/jira/browse/GIRAPH-92
Project: Giraph
Issue Type: New Feature
Components: lib
Affects Versions: 0.70.0
Reporter: Jakob Homan
Assignee: Jakob Homan
Fix For: 0.70.0
Attachments: GIRAPH-92.patch

We should have a text outputformat that just spits out the vertex id and value without its edges:
{noformat}index.html 0.9423{noformat}
This would be particularly helpful for further processing by, for instance, Pig.
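As a rough sketch of the line format GIRAPH-92 asks for, the formatter below writes only the vertex id and value, separated by a configurable delimiter. It deliberately avoids the Giraph and Hadoop APIs, so the class name IdValueLineFormatter is hypothetical. The meaning given to reverseOutput (value first, then id) is a guess based on the flag name in the reviewed diff, and the tab delimiter in the usage note is likewise an assumption.

```java
// Hypothetical sketch; not the class added by GIRAPH-92.patch.
final class IdValueLineFormatter {
    private final String delimiter;
    private final boolean reverseOutput;

    IdValueLineFormatter(String delimiter, boolean reverseOutput) {
        this.delimiter = delimiter;
        this.reverseOutput = reverseOutput;
    }

    // Builds one output line from the id and value; edges are never written.
    String format(Object vertexId, Object vertexValue) {
        String id = String.valueOf(vertexId);
        String value = String.valueOf(vertexValue);
        return reverseOutput
            ? value + delimiter + id
            : id + delimiter + value;
    }
}
```

With a tab delimiter and reverseOutput disabled, `format("index.html", 0.9423)` yields the id, a tab, and the value on one line, which tools such as Pig can split on the delimiter.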
[jira] [Commented] (GIRAPH-78) Be smarter about multiple instances of the same vertex
[ https://issues.apache.org/jira/browse/GIRAPH-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151782#comment-13151782 ]

Avery Ching commented on GIRAPH-78:
---

Actually the more I think about it, this might not be too useful unless you have large vertexId objects. I guess the idea would be to keep a cache, maybe in the GraphState or the WorkerContext.

Be smarter about multiple instances of the same vertex
--

Key: GIRAPH-78
URL: https://issues.apache.org/jira/browse/GIRAPH-78
Project: Giraph
Issue Type: Improvement
Reporter: Jakob Homan

In a graph such as
{noformat}
a - b, z
b - c, z
c - a, z
...
z
{noformat}
where vertices a, b, and c are hosted on one worker and z is hosted on another, it would be good to cache instances of z so a, b, and c all point at the same instance, rather than generating multiple copies of the same remote vertex during vertex reading. This is less important with primitive types and the recent work done there, but very useful for more complex types. Since the vertex readers are in userland, it would be good to provide these facilities as a library that implementing users can access.
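The caching idea can be sketched as a small interning map: the first instance seen for a given id becomes the canonical one, and every later equal id is replaced by it. The class name VertexIdCache is hypothetical and nothing here uses Giraph's GraphState or WorkerContext. The sketch assumes the id type implements equals and hashCode consistently and that cached instances are not mutated or reused afterwards, which matters for mutable Writable ids.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an id cache; not part of any Giraph patch.
final class VertexIdCache<I> {
    private final Map<I, I> canonical = new HashMap<>();

    // Returns the first instance seen that equals the given id,
    // so repeated references to one vertex share a single object.
    I intern(I id) {
        I existing = canonical.putIfAbsent(id, id);
        return existing != null ? existing : id;
    }

    int size() {
        return canonical.size();
    }
}
```

In the example graph, a vertex reader would call `intern` on each destination id while building the edges of a, b, and c, so all three end up holding the same z instance. As the comment notes, the saving is only worthwhile when id objects are large.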