[jira] [Updated] (GIRAPH-51) Provide unit testing tool for Giraph algorithms
[ https://issues.apache.org/jira/browse/GIRAPH-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Schelter updated GIRAPH-51:
-
Attachment: GIRAPH-51-2.patch

Hi Jakob, I moved InternalVertexRunner to the src as you suggested. I think we have to separate the unit tests into two categories.

The first category tests single methods, usually by invoking compute() and verifying the behavior of the vertex (I suppose that's what you were aiming at with your last comment). To accomplish that, one need not really run the system; it should be sufficient to inject mocked dependencies. I added two tests for SimpleShortestPathVertex as examples of such tests. I also created a helper class for conveniently mocking dependencies such as the Hadoop configuration and for configuring the vertex.

The second category, which InternalVertexRunner aims at, is something like a local integration test on toy data. It runs the system in a single JVM and executes the whole lifecycle of an algorithm (reading input from disk, running the supersteps, writing output, etc.). Although these tests are no substitute for real integration testing, they are often very helpful in finding subtle bugs that normal unit testing cannot discover.

Provide unit testing tool for Giraph algorithms
---
Key: GIRAPH-51
URL: https://issues.apache.org/jira/browse/GIRAPH-51
Project: Giraph
Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Sebastian Schelter
Attachments: GIRAPH-51-2.patch, GIRAPH-51.patch

It would be nice to have a little tool, similar to MRUnit, that would allow Giraph application writers to quickly unit test their algorithms. The tool could take a Vertex implementation, a set of input and expected output and verify that after the specified number of supersteps, we've gotten what we expect.

-- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
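The first test category described in the message above (call compute() once with injected fake dependencies and check the result) can be illustrated without Giraph at all. The sketch below is only an illustration of the idea, not code from the patch: MessageSink, RecordingSink and ShortestPathSketch are hypothetical names, and the vertex is a deliberately simplified stand-in for SimpleShortestPathVertex.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the messaging dependency a vertex would call. */
interface MessageSink {
    void send(long targetId, double distance);
}

/** Hand-written fake that records every message instead of sending it. */
class RecordingSink implements MessageSink {
    final Map<Long, Double> sent = new HashMap<>();

    @Override
    public void send(long targetId, double distance) {
        sent.put(targetId, distance);
    }
}

/** Simplified shortest-paths vertex (assumption: not the real Giraph class). */
class ShortestPathSketch {
    double value = Double.MAX_VALUE;
    final Map<Long, Double> outEdges = new HashMap<>();

    /** One compute step: keep the smallest incoming distance and propagate it only if it improved. */
    void compute(List<Double> messages, MessageSink sink) {
        double min = value;
        for (double m : messages) {
            min = Math.min(min, m);
        }
        if (min < value) {
            value = min;
            for (Map.Entry<Long, Double> edge : outEdges.entrySet()) {
                sink.send(edge.getKey(), min + edge.getValue());
            }
        }
    }
}
```

A test then builds one vertex, calls compute() with a list of incoming distances, and inspects RecordingSink.sent. No cluster, job, or input file is involved, which is what separates this category from the InternalVertexRunner style of test.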
[jira] [Commented] (GIRAPH-68) Implement a Graph Generator
[ https://issues.apache.org/jira/browse/GIRAPH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151405#comment-13151405 ]
Avery Ching commented on GIRAPH-68:
---
Looks good Hyunsik, a few comments.

Probably want to add a javadoc comment for GraphGenerator.
Lines 40-41: Should have 8 space indenting
Line 46: needs 4 more spaces
Line 58: Over 80 chars

So is the idea that PageRankBenchmark and RandomMessageBenchmark would use it? Would you like to modify them to do so?

Implement a Graph Generator
---
Key: GIRAPH-68
URL: https://issues.apache.org/jira/browse/GIRAPH-68
Project: Giraph
Issue Type: New Feature
Components: benchmark
Affects Versions: 0.70.0
Reporter: Hyunsik Choi
Assignee: Hyunsik Choi
Attachments: GIRAPH-68_1.patch

To provide users with benchmark environments and to deeply test the input/output system of Giraph, we need a graph generator. We will enable the graph generator to generate various kinds of graph data sets by specifying a VertexInputFormat and a VertexOutputFormat.
[jira] [Assigned] (GIRAPH-89) Remove debugging system.out from LongDoubleFloatDoubleVertex
[ https://issues.apache.org/jira/browse/GIRAPH-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shaunak Kashyap reassigned GIRAPH-89:
-
Assignee: Shaunak Kashyap

Remove debugging system.out from LongDoubleFloatDoubleVertex
Key: GIRAPH-89
URL: https://issues.apache.org/jira/browse/GIRAPH-89
Project: Giraph
Issue Type: Bug
Reporter: Jakob Homan
Assignee: Shaunak Kashyap
Priority: Minor
Labels: newbie

Line 137: {{System.out.println("in getNumVertices!");}} looks like a debugging line and should be removed.
Review Request: GIRAPH-89: Remove debugging system.out from LongDoubleFloatDoubleVertex
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2859/
---
Review request for giraph.

Summary
---
Removing System.out debugging statement.

This addresses bug GIRAPH-89.
https://issues.apache.org/jira/browse/GIRAPH-89

Diffs
-
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java 1202868

Diff: https://reviews.apache.org/r/2859/diff

Testing
---
$ mvn test

Thanks,
shaunak
Re: Review Request: GIRAPH-89: Remove debugging system.out from LongDoubleFloatDoubleVertex
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2859/#review3304
---
Ship it!

- Avery

On 2011-11-16 20:20:09, shaunak wrote:
> ---
> This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2859/
> ---
> (Updated 2011-11-16 20:20:09)
>
> Review request for giraph.
>
> Summary
> ---
> Removing System.out debugging statement.
>
> This addresses bug GIRAPH-89.
> https://issues.apache.org/jira/browse/GIRAPH-89
>
> Diffs
> -
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java 1202868
>
> Diff: https://reviews.apache.org/r/2859/diff
>
> Testing
> ---
> $ mvn test
>
> Thanks,
> shaunak
[jira] [Commented] (GIRAPH-89) Remove debugging system.out from LongDoubleFloatDoubleVertex
[ https://issues.apache.org/jira/browse/GIRAPH-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151483#comment-13151483 ]
jirapos...@reviews.apache.org commented on GIRAPH-89:
-
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2859/
---
Review request for giraph.

Summary
---
Removing System.out debugging statement.

This addresses bug GIRAPH-89.
https://issues.apache.org/jira/browse/GIRAPH-89

Diffs
-
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java 1202868

Diff: https://reviews.apache.org/r/2859/diff

Testing
---
$ mvn test

Thanks,
shaunak

Remove debugging system.out from LongDoubleFloatDoubleVertex
Key: GIRAPH-89
URL: https://issues.apache.org/jira/browse/GIRAPH-89
Project: Giraph
Issue Type: Bug
Reporter: Jakob Homan
Assignee: Shaunak Kashyap
Priority: Minor
Labels: newbie

Line 137: {{System.out.println("in getNumVertices!");}} looks like a debugging line and should be removed.
[jira] [Commented] (GIRAPH-89) Remove debugging system.out from LongDoubleFloatDoubleVertex
[ https://issues.apache.org/jira/browse/GIRAPH-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151484#comment-13151484 ]
jirapos...@reviews.apache.org commented on GIRAPH-89:
-
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2859/#review3304
---
Ship it!

- Avery

On 2011-11-16 20:20:09, shaunak wrote:
> ---
> This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2859/
> ---
> (Updated 2011-11-16 20:20:09)
>
> Review request for giraph.
>
> Summary
> ---
> Removing System.out debugging statement.
>
> This addresses bug GIRAPH-89.
> https://issues.apache.org/jira/browse/GIRAPH-89
>
> Diffs
> -
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java 1202868
>
> Diff: https://reviews.apache.org/r/2859/diff
>
> Testing
> ---
> $ mvn test
>
> Thanks,
> shaunak

Remove debugging system.out from LongDoubleFloatDoubleVertex
Key: GIRAPH-89
URL: https://issues.apache.org/jira/browse/GIRAPH-89
Project: Giraph
Issue Type: Bug
Reporter: Jakob Homan
Assignee: Shaunak Kashyap
Priority: Minor
Labels: newbie

Line 137: {{System.out.println("in getNumVertices!");}} looks like a debugging line and should be removed.
[jira] [Resolved] (GIRAPH-89) Remove debugging system.out from LongDoubleFloatDoubleVertex
[ https://issues.apache.org/jira/browse/GIRAPH-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shaunak Kashyap resolved GIRAPH-89.
---
Resolution: Fixed

Remove debugging system.out from LongDoubleFloatDoubleVertex
Key: GIRAPH-89
URL: https://issues.apache.org/jira/browse/GIRAPH-89
Project: Giraph
Issue Type: Bug
Reporter: Jakob Homan
Assignee: Shaunak Kashyap
Priority: Minor
Labels: newbie

Line 137: {{System.out.println("in getNumVertices!");}} looks like a debugging line and should be removed.
[jira] [Commented] (GIRAPH-89) Remove debugging system.out from LongDoubleFloatDoubleVertex
[ https://issues.apache.org/jira/browse/GIRAPH-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151493#comment-13151493 ]
Hudson commented on GIRAPH-89:
--
Integrated in Giraph-trunk-Commit #35 (See [https://builds.apache.org/job/Giraph-trunk-Commit/35/])
GIRAPH-89: Remove debugging system.out from LongDoubleFloatDoubleVertex. (shaunak via aching)

aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1202875
Files :
* /incubator/giraph/trunk/CHANGELOG
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java

Remove debugging system.out from LongDoubleFloatDoubleVertex
Key: GIRAPH-89
URL: https://issues.apache.org/jira/browse/GIRAPH-89
Project: Giraph
Issue Type: Bug
Reporter: Jakob Homan
Assignee: Shaunak Kashyap
Priority: Minor
Labels: newbie

Line 137: {{System.out.println("in getNumVertices!");}} looks like a debugging line and should be removed.
[jira] [Updated] (GIRAPH-86) Simplify boolean expressions in ZooKeeperExt::createExt
[ https://issues.apache.org/jira/browse/GIRAPH-86?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Attila Csordas updated GIRAPH-86:
-
Attachment: GIRAPH-86.patch

mvn test with the hadoop_non_secure profile is successful, but apache-rat:check fails both before and after the change with:
Failed to execute goal org.apache.rat:apache-rat-plugin:0.7:check (default-cli) on project giraph: Too many unapproved licenses: 1 -> [Help 1]

Simplify boolean expressions in ZooKeeperExt::createExt
---
Key: GIRAPH-86
URL: https://issues.apache.org/jira/browse/GIRAPH-86
Project: Giraph
Issue Type: Improvement
Affects Versions: 0.70.0
Reporter: Jakob Homan
Assignee: Attila Csordas
Labels: newbie
Attachments: GIRAPH-86.patch

In ZooKeeperExt::createExt there are two instances of {{recursive==false}} that can be simplified to !recursive.
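The simplification requested in GIRAPH-86 is purely stylistic. As a small illustration (the class and method names here are made up and are not taken from ZooKeeperExt), the two spellings below always return the same result:

```java
/** Illustrative only: the two forms discussed in GIRAPH-86 are equivalent. */
class BooleanSimplification {
    /** Style flagged in the issue: explicit comparison against false. */
    static boolean verbose(boolean recursive) {
        return recursive == false;
    }

    /** Simplified style: logical negation. */
    static boolean simplified(boolean recursive) {
        return !recursive;
    }
}
```

A practical argument for the shorter form is that it cannot be mistyped as the assignment recursive = false, which also compiles inside a condition in Java.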
[jira] [Created] (GIRAPH-91) Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)
Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)
---
Key: GIRAPH-91
URL: https://issues.apache.org/jira/browse/GIRAPH-91
Project: Giraph
Issue Type: Improvement
Reporter: Avery Ching

Current vertex implementation uses a HashMap for storing the edges, which is quite memory heavy for large graphs. The default settings in Giraph need to be improved for large graphs and heaps of 20G.
[jira] [Commented] (GIRAPH-91) Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)
[ https://issues.apache.org/jira/browse/GIRAPH-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151587#comment-13151587 ]
jirapos...@reviews.apache.org commented on GIRAPH-91:
-
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2868/
---
Review request for giraph.

Summary
---
These general changes should support larger heap sizes (i.e. 20G):
- Added new EdgeListVertex that stores its edges in a compact pair of lists instead of Vertex's HashMap.
- Added unit test TestEdgeListVertex to test EdgeListVertex.
- Augmented PageRankBenchmark to choose between EdgeListVertex and Vertex (to try it out).
- Added failure cleanup so that failed workers quickly alert the master that they are dead by deleting their health ephemeral znode. This allows us to set higher ZooKeeper timeouts to deal with GC pauses and the like. In a quick test of 3 nodes, I saw failure in 43 seconds instead of 1m 52 sec.
- Added a context.progress() call to flushing so as not to kill jobs with long timeouts (GC or lots of messages).

This addresses bug GIRAPH-91.
https://issues.apache.org/jira/browse/GIRAPH-91

Diffs
-
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java 1202898
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1202898
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1202898
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1202898
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/EdgeListVertex.java PRE-CREATION
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1202898
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1202898
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestJsonBase64Format.java 1202898
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/graph/TestEdgeListVertex.java PRE-CREATION

Diff: https://reviews.apache.org/r/2868/diff

Testing
---
Local unittests, PageRankBenchmark on multiple machines with 20GB heaps.

Thanks,
Avery

Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)
---
Key: GIRAPH-91
URL: https://issues.apache.org/jira/browse/GIRAPH-91
Project: Giraph
Issue Type: Improvement
Reporter: Avery Ching

Current vertex implementation uses a HashMap for storing the edges, which is quite memory heavy for large graphs. The default settings in Giraph need to be improved for large graphs and heaps of 20G.
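The review request describes storing edges in "a compact pair of lists" rather than a HashMap. A minimal sketch of that idea follows. It is not the EdgeListVertex from the patch: ParallelListEdges and its methods are hypothetical, the id and value types are fixed to Long and Float for brevity, and keeping the lists sorted by target id is an assumption made here so that lookups can use binary search.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical edge store: two parallel lists instead of one HashMap. */
class ParallelListEdges {
    /** Target vertex ids, kept in ascending order (assumption of this sketch). */
    private final List<Long> targets = new ArrayList<>();
    /** Edge values; values.get(i) belongs to targets.get(i). */
    private final List<Float> values = new ArrayList<>();

    /** Inserts an edge at its sorted position; returns false if the target already exists. */
    boolean addEdge(long target, float value) {
        int pos = Collections.binarySearch(targets, target);
        if (pos >= 0) {
            return false;
        }
        int insertAt = -(pos + 1); // binarySearch encodes the insertion point this way
        targets.add(insertAt, target);
        values.add(insertAt, value);
        return true;
    }

    boolean hasEdge(long target) {
        return Collections.binarySearch(targets, target) >= 0;
    }

    /** Returns the edge value, or null when there is no edge to the target. */
    Float getEdgeValue(long target) {
        int pos = Collections.binarySearch(targets, target);
        return pos >= 0 ? values.get(pos) : null;
    }

    int size() {
        return targets.size();
    }
}
```

The trade-off in this sketch is that lookups cost O(log n) and inserts cost O(n) because elements are shifted, in exchange for dropping the per-entry node objects and table slots that a HashMap allocates.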
[jira] [Updated] (GIRAPH-86) Simplify boolean expressions in ZooKeeperExt::createExt
[ https://issues.apache.org/jira/browse/GIRAPH-86?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Avery Ching updated GIRAPH-86:
--
Attachment: pom.diff

See if this helps you... if so, please add to your patch.

Simplify boolean expressions in ZooKeeperExt::createExt
---
Key: GIRAPH-86
URL: https://issues.apache.org/jira/browse/GIRAPH-86
Project: Giraph
Issue Type: Improvement
Affects Versions: 0.70.0
Reporter: Jakob Homan
Assignee: Attila Csordas
Labels: newbie
Attachments: GIRAPH-86.patch, pom.diff

In ZooKeeperExt::createExt there are two instances of {{recursive==false}} that can be simplified to !recursive.
[jira] [Commented] (GIRAPH-91) Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)
[ https://issues.apache.org/jira/browse/GIRAPH-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151600#comment-13151600 ]
Jakob Homan commented on GIRAPH-91:
---
Can you attach the patch to the jira, for non-rb review?

Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)
---
Key: GIRAPH-91
URL: https://issues.apache.org/jira/browse/GIRAPH-91
Project: Giraph
Issue Type: Improvement
Reporter: Avery Ching
Assignee: Avery Ching

Current vertex implementation uses a HashMap for storing the edges, which is quite memory heavy for large graphs. The default settings in Giraph need to be improved for large graphs and heaps of 20G.
[jira] [Updated] (GIRAPH-91) Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)
[ https://issues.apache.org/jira/browse/GIRAPH-91?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Avery Ching updated GIRAPH-91:
--
Attachment: GIRAPH-91.diff

Sure, sorry about that.

Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)
---
Key: GIRAPH-91
URL: https://issues.apache.org/jira/browse/GIRAPH-91
Project: Giraph
Issue Type: Improvement
Reporter: Avery Ching
Assignee: Avery Ching
Attachments: GIRAPH-91.diff

Current vertex implementation uses a HashMap for storing the edges, which is quite memory heavy for large graphs. The default settings in Giraph need to be improved for large graphs and heaps of 20G.
[jira] [Commented] (GIRAPH-91) Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)
[ https://issues.apache.org/jira/browse/GIRAPH-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151603#comment-13151603 ]
Avery Ching commented on GIRAPH-91:
---
By the way, rb allows you to download the diff directly (so you don't have to worry about them staying in sync).
https://reviews.apache.org/r/2868/diff/raw/

Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)
---
Key: GIRAPH-91
URL: https://issues.apache.org/jira/browse/GIRAPH-91
Project: Giraph
Issue Type: Improvement
Reporter: Avery Ching
Assignee: Avery Ching
Attachments: GIRAPH-91.diff

Current vertex implementation uses a HashMap for storing the edges, which is quite memory heavy for large graphs. The default settings in Giraph need to be improved for large graphs and heaps of 20G.
[jira] [Commented] (GIRAPH-91) Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)
[ https://issues.apache.org/jira/browse/GIRAPH-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151643#comment-13151643 ]
Claudio Martella commented on GIRAPH-91:

The List-based adjacency list looks quite good to me. A couple of weeks ago I did a microbenchmark on the iteration performance of ArrayList/array, TreeMap, HashMap and SkipList, and I was quite impressed by the performance hit. I believe we not only save memory here (I would be curious to calculate the overhead precisely) but also gain speed in algorithms such as PR, where compute() has an iterator-based sendMsg pattern. Good!

Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)
---
Key: GIRAPH-91
URL: https://issues.apache.org/jira/browse/GIRAPH-91
Project: Giraph
Issue Type: Improvement
Reporter: Avery Ching
Assignee: Avery Ching
Attachments: GIRAPH-91.diff

Current vertex implementation uses a HashMap for storing the edges, which is quite memory heavy for large graphs. The default settings in Giraph need to be improved for large graphs and heaps of 20G.
[jira] [Updated] (GIRAPH-92) Need outputformat for just vertex ID and value
[ https://issues.apache.org/jira/browse/GIRAPH-92?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jakob Homan updated GIRAPH-92:
--
Attachment: GIRAPH-92.patch

Patch with new format and unit test. One can switch from Id<delim>Value to Value<delim>Id via a configuration parameter.

Need outputformat for just vertex ID and value
--
Key: GIRAPH-92
URL: https://issues.apache.org/jira/browse/GIRAPH-92
Project: Giraph
Issue Type: New Feature
Components: lib
Affects Versions: 0.70.0
Reporter: Jakob Homan
Assignee: Jakob Homan
Fix For: 0.70.0
Attachments: GIRAPH-92.patch

We should have a text output format that just spits out the vertex id and value without its edges:
{noformat}index.html 0.9423{noformat}
This would be particularly helpful for further processing by, for instance, Pig.
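The line layout described for GIRAPH-92 (vertex id, a delimiter, the vertex value, with an option to swap the order) can be illustrated in a few lines. This is a sketch of the formatting rule only: IdValueLineFormatter is a hypothetical name, the tab delimiter in the usage note is an assumption, and none of this is the output format class from the patch or its configuration keys.

```java
/** Hypothetical helper that formats one output line for a vertex. */
class IdValueLineFormatter {
    /**
     * Joins id and value with the delimiter; when reverse is true the value
     * comes first, mirroring the configurable ordering mentioned in the issue.
     */
    static String format(String id, String value, String delimiter, boolean reverse) {
        return reverse ? value + delimiter + id : id + delimiter + value;
    }
}
```

With a tab as the delimiter, format("index.html", "0.9423", "\t", false) yields a line in the shape shown in the issue description, and passing true swaps the two fields.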
[jira] [Created] (GIRAPH-92) Need outputformat for just vertex ID and value
Need outputformat for just vertex ID and value
--
Key: GIRAPH-92
URL: https://issues.apache.org/jira/browse/GIRAPH-92
Project: Giraph
Issue Type: New Feature
Components: lib
Affects Versions: 0.70.0
Reporter: Jakob Homan
Assignee: Jakob Homan
Fix For: 0.70.0
Attachments: GIRAPH-92.patch

We should have a text output format that just spits out the vertex id and value without its edges:
{noformat}index.html 0.9423{noformat}
This would be particularly helpful for further processing by, for instance, Pig.
[jira] [Commented] (GIRAPH-92) Need outputformat for just vertex ID and value
[ https://issues.apache.org/jira/browse/GIRAPH-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151736#comment-13151736 ]
Avery Ching commented on GIRAPH-92:
---
I agree this could be useful. Couple of format errors:

if (expr)
+ if(reverseOutput) {

typo
+ public void testWithDifferentDelimieter() throws IOException, Interrupted

needs one more space
+ public void testWithDifferentDelimieter() throws IOException,
+ InterruptedException {
+Configuration conf = new Configuration();

Extra line break
+writer.writeVertex(vertex);
+
+
+verify(tw).write(expected, null);

Need outputformat for just vertex ID and value
--
Key: GIRAPH-92
URL: https://issues.apache.org/jira/browse/GIRAPH-92
Project: Giraph
Issue Type: New Feature
Components: lib
Affects Versions: 0.70.0
Reporter: Jakob Homan
Assignee: Jakob Homan
Fix For: 0.70.0
Attachments: GIRAPH-92.patch

We should have a text output format that just spits out the vertex id and value without its edges:
{noformat}index.html 0.9423{noformat}
This would be particularly helpful for further processing by, for instance, Pig.
[jira] [Commented] (GIRAPH-78) Be smarter about multiple instances of the same vertex
[ https://issues.apache.org/jira/browse/GIRAPH-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151753#comment-13151753 ]
Claudio Martella commented on GIRAPH-78:

Yes, very nice, but how would you implement this? A caching Factory, or do you really want 100% re-use? That would require a per-worker index of Is.

Be smarter about multiple instances of the same vertex
--
Key: GIRAPH-78
URL: https://issues.apache.org/jira/browse/GIRAPH-78
Project: Giraph
Issue Type: Improvement
Reporter: Jakob Homan

In a graph such as
{noformat}
a - b, z
b - c, z
c - a, z
...
z
{noformat}
where vertices a, b, and c are hosted on one worker and z is hosted on another, it would be good to cache instances of z so a, b, c all point at the same instance, rather than generating multiple copies of the same remote vertex during vertex reading. This is less important with primitive types and the recent work done there, but very useful for more complex types. Since the vertex readers are in userland, it would be good to provide these facilities as a library that implementing users can access.
[jira] [Commented] (GIRAPH-78) Be smarter about multiple instances of the same vertex
[ https://issues.apache.org/jira/browse/GIRAPH-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151782#comment-13151782 ]
Avery Ching commented on GIRAPH-78:
---
Actually the more I think about it, this might not be too useful unless you have large vertexId objects. I guess the idea would be to keep a cache, maybe in the GraphState or the WorkerContext.

Be smarter about multiple instances of the same vertex
--
Key: GIRAPH-78
URL: https://issues.apache.org/jira/browse/GIRAPH-78
Project: Giraph
Issue Type: Improvement
Reporter: Jakob Homan

In a graph such as
{noformat}
a - b, z
b - c, z
c - a, z
...
z
{noformat}
where vertices a, b, and c are hosted on one worker and z is hosted on another, it would be good to cache instances of z so a, b, c all point at the same instance, rather than generating multiple copies of the same remote vertex during vertex reading. This is less important with primitive types and the recent work done there, but very useful for more complex types. Since the vertex readers are in userland, it would be good to provide these facilities as a library that implementing users can access.
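The cache discussed in this thread amounts to interning: the first instance seen for a given id becomes the canonical one, and every later equal instance is replaced by it. A minimal sketch, assuming the id type implements equals() and hashCode() consistently; InstanceCache is a hypothetical name, and where such a cache would live (GraphState, WorkerContext, or a reader-side library) is left open, as in the discussion.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-worker cache that hands back one shared instance per distinct id. */
class InstanceCache<T> {
    private final Map<T, T> canonical = new HashMap<>();

    /** Returns the first instance seen that equals the candidate, storing the candidate if it is new. */
    T intern(T candidate) {
        T existing = canonical.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }

    int size() {
        return canonical.size();
    }
}
```

In the example graph, a vertex reader would call intern() on each destination id it parses, so a, b and c would all end up holding the same z object. The map itself costs memory per distinct id, which fits the remark above that the benefit depends on how large the id objects are.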
[jira] [Created] (GIRAPH-93) Hive input / output format
Hive input / output format
--
Key: GIRAPH-93
URL: https://issues.apache.org/jira/browse/GIRAPH-93
Project: Giraph
Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching

It would be great to be able to load/store data from/to Hive tables.
[jira] [Commented] (GIRAPH-93) Hive input / output format
[ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151805#comment-13151805 ]
Jakob Homan commented on GIRAPH-93:
---
Do you mean RCFile specifically? Hive can handle data in any format there's a serde for. I've been meaning to open a jira for handling Avro-encoded data as well (and possibly specifying a graph schema for it). For directly loading tables in/out of Hive, it may be better to target HCatalog, as that will also give access to Pig (and whatever else HCatalog eventually supports).

Hive input / output format
--
Key: GIRAPH-93
URL: https://issues.apache.org/jira/browse/GIRAPH-93
Project: Giraph
Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching

It would be great to be able to load/store data from/to Hive tables.
[jira] [Commented] (GIRAPH-93) Hive input / output format
[ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151809#comment-13151809 ]
Dmitriy V. Ryaboy commented on GIRAPH-93:
-
FWIW I already have a Thrift one lying around, which both Hive and Pig read via Elephant-Bird. It might not compile against the current trunk, as I've been working on other stuff and you guys have been coding like mad, but I can post something over the Thanksgiving weekend.

Hive input / output format
--
Key: GIRAPH-93
URL: https://issues.apache.org/jira/browse/GIRAPH-93
Project: Giraph
Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching

It would be great to be able to load/store data from/to Hive tables.
[jira] [Commented] (GIRAPH-91) Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)
[ https://issues.apache.org/jira/browse/GIRAPH-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151823#comment-13151823 ]
Arun Suresh commented on GIRAPH-91:
---
Avery, I see that you have used 2 sorted ArrayLists. Couldn't a LinkedHashMap have been an alternative? I understand that getEdgeValue and hasEdgeValue would be faster with a sorted ArrayList. ArrayLists are also more compact.

But I was just wondering: in the event that the graph is truly large (millions of edges for a vertex), would it make sense to have the entire edge list in memory in the first place? We might need a scheme where only a part of the list is in memory and chunks of the list are fetched on demand when the provided iterator calls next(). In that case we could have a hybrid array + linked list (a linked list of chunks of the edge list).

Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)
---
Key: GIRAPH-91
URL: https://issues.apache.org/jira/browse/GIRAPH-91
Project: Giraph
Issue Type: Improvement
Reporter: Avery Ching
Assignee: Avery Ching
Attachments: GIRAPH-91.diff

Current vertex implementation uses a HashMap for storing the edges, which is quite memory heavy for large graphs. The default settings in Giraph need to be improved for large graphs and heaps of 20G.
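The chunk-on-demand idea raised in the comment above can be sketched as an iterator that holds only one chunk of target ids at a time and asks for the next chunk when the current one is exhausted. Everything here is hypothetical: ChunkedEdgeIterator is not a Giraph class, and the chunkLoader function (which returns chunk i, or null once there are no more chunks) stands in for whatever storage would actually back the edge list.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

/** Hypothetical iterator over edge targets that keeps a single chunk in memory. */
class ChunkedEdgeIterator implements Iterator<Long> {
    /** Assumed loader: returns chunk number i, or null when no chunks remain. */
    private final IntFunction<long[]> chunkLoader;
    private long[] current = new long[0];
    private int nextChunk = 0;
    private int pos = 0;

    ChunkedEdgeIterator(IntFunction<long[]> chunkLoader) {
        this.chunkLoader = chunkLoader;
    }

    @Override
    public boolean hasNext() {
        // Fetch chunks lazily, skipping empty ones, until data or the end is reached.
        while (current != null && pos >= current.length) {
            current = chunkLoader.apply(nextChunk++);
            pos = 0;
        }
        return current != null;
    }

    @Override
    public Long next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current[pos++];
    }
}
```

Because the previous chunk is dropped as soon as the next one is loaded, peak memory in this sketch is bounded by the chunk size rather than by the vertex degree. The cost is that random lookups such as hasEdge would need a different access path.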
[jira] [Commented] (GIRAPH-76) Refactor worker logic from GraphMapper
[ https://issues.apache.org/jira/browse/GIRAPH-76?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151825#comment-13151825 ] Arun Suresh commented on GIRAPH-76: --- Yes, this does sound like a good idea. I could take a crack at it if you haven't already started. Refactor worker logic from GraphMapper -- Key: GIRAPH-76 URL: https://issues.apache.org/jira/browse/GIRAPH-76 Project: Giraph Issue Type: Improvement Components: graph Reporter: Jakob Homan The plumbing around executing vertices is hosted within the mapper, but could be extracted to its own class and executed from the Mapper directly. This would ease testing and make it easier to host in the new YARN infrastructure. There's nothing mapper specific about this code.
[jira] [Commented] (GIRAPH-91) Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)
[ https://issues.apache.org/jira/browse/GIRAPH-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151844#comment-13151844 ] Avery Ching commented on GIRAPH-91: --- Arun, we can certainly try other data structures for other BasicVertex implementations. This one is meant for a pretty decent memory reduction. I expect we will have a bunch of different implementations based on the requirements of the application. Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings) --- Key: GIRAPH-91 URL: https://issues.apache.org/jira/browse/GIRAPH-91 Project: Giraph Issue Type: Improvement Reporter: Avery Ching Assignee: Avery Ching Attachments: GIRAPH-91.diff Current vertex implementation uses a HashMap for storing the edges, which is quite memory heavy for large graphs. The default settings in Giraph need to be improved for large graphs and heaps of 20G.
[jira] [Commented] (GIRAPH-93) Hive input / output format
[ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151848#comment-13151848 ] Avery Ching commented on GIRAPH-93: --- I think RCFile initially, then other formats in Hive. HCatalog is certainly a good idea for the long term, though I'm not sure how ready it is now. Dmitriy, could you send me your code (don't worry about getting it to compile)? I'd like to take a look at any examples. I've been trying to find examples of loading from and storing to Hive tables in MapReduce jobs and can't find much, unfortunately. I'd appreciate any pointers. Hive input / output format -- Key: GIRAPH-93 URL: https://issues.apache.org/jira/browse/GIRAPH-93 Project: Giraph Issue Type: New Feature Reporter: Avery Ching Assignee: Avery Ching It would be great to be able to load/store data from/to Hive tables.
[jira] [Commented] (GIRAPH-93) Hive input / output format
[ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151851#comment-13151851 ] Arun Suresh commented on GIRAPH-93: --- Avery, this might not be an optimal solution, but I'm just putting it out there. I understand Hive exposes a JDBC interface. One can use the JDBC interface and DBInputFormat (http://www.cloudera.com/blog/2009/03/database-access-with-hadoop/) to load data from a Hive table for a MapReduce job. Hive input / output format -- Key: GIRAPH-93 URL: https://issues.apache.org/jira/browse/GIRAPH-93 Project: Giraph Issue Type: New Feature Reporter: Avery Ching Assignee: Avery Ching It would be great to be able to load/store data from/to Hive tables.
[jira] [Commented] (GIRAPH-76) Refactor worker logic from GraphMapper
[ https://issues.apache.org/jira/browse/GIRAPH-76?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151860#comment-13151860 ] Jakob Homan commented on GIRAPH-76: --- I've not. Please go for it. Refactor worker logic from GraphMapper -- Key: GIRAPH-76 URL: https://issues.apache.org/jira/browse/GIRAPH-76 Project: Giraph Issue Type: Improvement Components: graph Reporter: Jakob Homan The plumbing around executing vertices is hosted within the mapper, but could be extracted to its own class and executed from the Mapper directly. This would ease testing and make it easier to host in the new YARN infrastructure. There's nothing mapper specific about this code.
[jira] [Commented] (GIRAPH-93) Hive input / output format
[ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151866#comment-13151866 ] Avery Ching commented on GIRAPH-93: --- Specifically, we have a lot of data stored in Hive tables and I'd like to be able to do graph computation on them with Giraph and then store the results back in Hive tables so Hive queries can operate against them as well. Hive input / output format -- Key: GIRAPH-93 URL: https://issues.apache.org/jira/browse/GIRAPH-93 Project: Giraph Issue Type: New Feature Reporter: Avery Ching Assignee: Avery Ching It would be great to be able to load/store data from/to Hive tables.
[jira] [Commented] (GIRAPH-93) Hive input / output format
[ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151871#comment-13151871 ] Dmitriy V. Ryaboy commented on GIRAPH-93: - Going through HCat will be a bit gnarly (though I agree with Jakob that it's the only sensible way when you are dealing with Hive-managed tables). Writing to a directory and having Hive treat it as an external table will be far easier. Oh, and Hive (and Pig) can read tab-delimited files, if it's just a matter of getting basic pipelining to happen initially. Jakob, maybe we can discuss on the list, but starting a Giraph job from Pig should be as simple as a mapreduce invocation (http://pig.apache.org/docs/r0.9.1/basic.html#mapreduce). Hive input / output format -- Key: GIRAPH-93 URL: https://issues.apache.org/jira/browse/GIRAPH-93 Project: Giraph Issue Type: New Feature Reporter: Avery Ching Assignee: Avery Ching It would be great to be able to load/store data from/to Hive tables.
[jira] [Commented] (GIRAPH-78) Be smarter about multiple instances of the same vertex
[ https://issues.apache.org/jira/browse/GIRAPH-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151873#comment-13151873 ] Jake Mannix commented on GIRAPH-78: --- Yeah, that's what I've been thinking too: each vertex has independent edge values to its destination, and doesn't keep a reference to the target vertex *value*, just its id. So yeah, unless the I-typed objects are big, I'm not sure what you can do here. Be smarter about multiple instances of the same vertex -- Key: GIRAPH-78 URL: https://issues.apache.org/jira/browse/GIRAPH-78 Project: Giraph Issue Type: Improvement Reporter: Jakob Homan In a graph such as {noformat}a - b, z b - c, z c - a, z ... z{noformat} where vertices a, b, and c are hosted on one worker and z is hosted on another, it would be good to cache instances of z so a, b, and c all point at the same instance, rather than generating multiple copies of the same remote vertex during vertex reading. This is less important with primitive types and the recent work done there, but very useful for more complex types. Since the vertex readers are in userland, it would be good to provide these facilities as a library that implementing users can access.
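As a rough illustration of the caching idea in the GIRAPH-78 description above, a vertex reader could route every target id it parses through a small canonicalizing cache so that equal ids share one instance. This is a sketch under assumptions, not an existing Giraph facility: the class name InstanceCache and its intern method are hypothetical, and it assumes the cached type has value-based equals() and hashCode() and is not mutated or reused by the reader after being cached.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper, NOT part of Giraph: returns one canonical instance
 * per distinct value so that many edges can share the same target object.
 * Assumes T implements value-based equals()/hashCode() and that cached
 * objects are not modified afterwards.
 */
final class InstanceCache<T> {
    private final Map<T, T> canonical = new HashMap<>();

    /** Returns the first instance seen that equals the candidate. */
    T intern(T candidate) {
        T existing = canonical.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }

    /** Number of distinct values cached so far. */
    int size() {
        return canonical.size();
    }
}
```

With this, reading the edges a - z, b - z, and c - z would store three references to a single z object rather than three copies. As the comment above points out, the saving only matters when the id objects are large, because each vertex still keeps its own edge values.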