[jira] [Commented] (GIRAPH-185) Improve concurrency of putMsg / putMsgList
    [ https://issues.apache.org/jira/browse/GIRAPH-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261906#comment-13261906 ]

Avery Ching commented on GIRAPH-185:
------------------------------------

I agree that a benchmark should be done, although I expect the impact to be very small. We should at least show it's not slower. =)

> Improve concurrency of putMsg / putMsgList
> ------------------------------------------
>
>                 Key: GIRAPH-185
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-185
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.2.0
>            Reporter: Bo Wang
>            Assignee: Bo Wang
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-185.patch, GIRAPH-185.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Currently in putMsg / putMsgList, a synchronized closure is used to protect the whole transientInMessages when adding the new message. This lock prevents other concurrent calls to putMsg/putMsgList and increases the response time. We should use fine-grain locks to allow high concurrency in message communication.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
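As a rough illustration of the fine-grained locking that GIRAPH-185 asks for, the Java sketch below swaps one lock around the whole map for a concurrent map plus one lock per vertex's message list. This is not the attached patch: the class name `FineGrainedInbox`, the `Long` vertex ids, the `String` messages, and the `stress` helper are assumptions made for the example. Only `transientInMessages`, `putMsg`, and `putMsgList` are names taken from the issue text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class FineGrainedInbox {
    // Hypothetical stand-in for transientInMessages: vertex id -> pending messages.
    private final ConcurrentMap<Long, List<String>> transientInMessages =
        new ConcurrentHashMap<>();

    /** Adds one message, locking only the destination vertex's list. */
    void putMsg(long vertexId, String msg) {
        List<String> msgs =
            transientInMessages.computeIfAbsent(vertexId, id -> new ArrayList<>());
        synchronized (msgs) {
            msgs.add(msg);
        }
    }

    /** Adds a batch of messages under the same per-vertex lock. */
    void putMsgList(long vertexId, List<String> batch) {
        List<String> msgs =
            transientInMessages.computeIfAbsent(vertexId, id -> new ArrayList<>());
        synchronized (msgs) {
            msgs.addAll(batch);
        }
    }

    /** Number of messages currently queued for a vertex. */
    int messageCount(long vertexId) {
        List<String> msgs = transientInMessages.get(vertexId);
        if (msgs == null) {
            return 0;
        }
        synchronized (msgs) {
            return msgs.size();
        }
    }

    /** Test helper: several threads write to the same vertex concurrently. */
    static int stress(int threads, int perThread) {
        FineGrainedInbox inbox = new FineGrainedInbox();
        Thread[] pool = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            pool[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    inbox.putMsg(42L, "m");
                }
            });
            pool[t].start();
        }
        for (Thread th : pool) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return inbox.messageCount(42L);
    }
}
```

Writers to different vertices never contend on the same monitor here, which is the property the issue is after. Whether it is measurably faster is exactly what the benchmark requested in the comment would have to show.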
[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats
    [ https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265155#comment-13265155 ]

Avery Ching commented on GIRAPH-153:
------------------------------------

I'll take a look, sorry for the delay.

> HBase/Accumulo Input and Output formats
> ---------------------------------------
>
>                 Key: GIRAPH-153
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-153
>             Project: Giraph
>          Issue Type: New Feature
>          Components: bsp
>    Affects Versions: 0.1.0
>         Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
>            Reporter: Brian Femiano
>         Attachments: GIRAPH-153.1.patch, GIRAPH-153.patch
>
>
> Four abstract classes that wrap their respective delegate input/output formats for easy hooks into vertex input format subclasses. I've included some sample programs that show two very simple graph algorithms. I have a graph generator that builds out a very simple directed structure, starting with a few 'root' nodes. Root nodes are defined as nodes which are not listed as a child anywhere in the graph.
>
> Algorithm 1) AccumuloRootMarker.java --> Accumulo as read/write source. Every vertex starts thinking it's a root. At superstep 0, send a message down to each child as a non-root notification. After superstep 1, only root nodes will have never been messaged.
>
> Algorithm 2) TableRootMarker --> HBase as read/write source. Expands on A1 by bundling the notification logic followed by root node propagation. Once we've marked the appropriate nodes as roots, tell every child which roots it can be traced back to via one or more spanning trees. This will take N + 2 supersteps where N is the maximum number of hops from any root to any leaf, plus 2 supersteps for the initial root flagging.
>
> I've included all relevant code plus DistributedCacheHelper.java for recursive cache file and archive searches. It is more hadoop centric than giraph, but these jobs use it so I figured why not commit here.
>
> These have been tested through local JobRunner, pseudo-distributed on the aforementioned hardware, and full distributed on EC2. More details in the comments.
[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats
    [ https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265176#comment-13265176 ]

Avery Ching commented on GIRAPH-153:
------------------------------------

Brian, I'm having some trouble with your patch. I used a freshly checked out version of giraph to confirm:

aching@sdwilshmbp13:~/Avery/source/giraph_trunk$ patch -p0 < ~/Desktop/GIRAPH-153.1.patch
patching file giraph-formats-contrib/LICENSE.txt
patching file giraph-formats-contrib/license-header.txt
patching file giraph-formats-contrib/src/test/java/org/apache/giraph/BspCase.java
patching file giraph-formats-contrib/src/test/java/org/apache/giraph/format/hbase/TestHBaseRootMarkerVertextFormat.java
patching file giraph-formats-contrib/src/test/java/org/apache/giraph/format/hbase/edgemarker/TableEdgeInputFormat.java
patching file giraph-formats-contrib/src/test/java/org/apache/giraph/format/hbase/edgemarker/TableEdgeOutputFormat.java
patching file giraph-formats-contrib/src/test/java/org/apache/giraph/format/accumulo/TestAccumuloVertexFormat.java
patching file giraph-formats-contrib/src/test/java/org/apache/giraph/format/accumulo/edgemarker/AccumuloEdgeInputFormat.java
patching file giraph-formats-contrib/src/test/java/org/apache/giraph/format/accumulo/edgemarker/AccumuloEdgeOutputFormat.java
patching file giraph-formats-contrib/src/main/assembly/compile.xml
can't find file to patch at input line 1301
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|Index: giraph-formats-contrib/src/main/java/org/apache/giraph/format/hbase/package-info.java
|===================================================================
|--- giraph-formats-contrib/src/main/java/org/apache/giraph/format/hbase/package-info.java (revision 0)
|+++ giraph-formats-contrib/src/main/java/org/apache/giraph/format/hbase/package-info.java (working copy)
--------------------------
File to patch:
[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats
    [ https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265205#comment-13265205 ]

Avery Ching commented on GIRAPH-153:
------------------------------------

Is this a fresh checkout? We shouldn't have to answer any questions like "Reversed (or previously applied) patch detected! Assume -R".
[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats
    [ https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265542#comment-13265542 ]

Avery Ching commented on GIRAPH-153:
------------------------------------

No problem. The red flag for me was that this patch (244K) was so much bigger than the previous one (85k).

> HBase/Accumulo Input and Output formats
> ---------------------------------------
>
>                 Key: GIRAPH-153
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-153
>         Attachments: GIRAPH-153.1.patch, GIRAPH-153.2.patch, GIRAPH-153.patch
[jira] [Commented] (GIRAPH-169) How to close all child when a job finished?
    [ https://issues.apache.org/jira/browse/GIRAPH-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267294#comment-13267294 ]

Avery Ching commented on GIRAPH-169:
------------------------------------

Thanks for the simple case Roman. I wonder what versions are affected. 20.203 seems fine with your test case.

> How to close all child when a job finished?
> -------------------------------------------
>
>                 Key: GIRAPH-169
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-169
>             Project: Giraph
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 0.2.0
>         Environment: sles 11 x64, jdk 1.6, hadoop 0.20.205.0, 1 Master and 8 slaves
>            Reporter: Jianfeng Qian
>            Priority: Minor
>
> I ran pagerank at hadoop 0.20.205.0. When the job finished, the child in slaves didn't quit immediately and sometimes they never quit and I have to kill them.
[jira] [Commented] (GIRAPH-169) How to close all child when a job finished?
    [ https://issues.apache.org/jira/browse/GIRAPH-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267301#comment-13267301 ]

Avery Ching commented on GIRAPH-169:
------------------------------------

Roman, I just tried with hadoop-1.0.2 with your test case:

hadoop jar target/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 10 -v -V 1000 -w 1
hadoop jar target/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 10 -v -V 1000 -w 2

Both of them ran 2x. One thing I did do was compile against the hadoop-1.0.2 version:

mvn -Phadoop_1.0 clean package -DskipTests

Can you verify that you compiled against the correct Hadoop profile?
[jira] [Assigned] (GIRAPH-127) Extending the API with a master.compute() function.
    [ https://issues.apache.org/jira/browse/GIRAPH-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Avery Ching reassigned GIRAPH-127:
----------------------------------

    Assignee: Semih Salihoglu

Looking forward to this.

> Extending the API with a master.compute() function.
> ---------------------------------------------------
>
>                 Key: GIRAPH-127
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-127
>             Project: Giraph
>          Issue Type: New Feature
>          Components: bsp, examples, graph
>            Reporter: Semih Salihoglu
>            Assignee: Semih Salihoglu
>
> First of all, sorry for the long explanation of this feature.
>
> I want to expand the API of Giraph with a new function called master.compute() that would get called at the master before each superstep, and I will try to explain the purpose that it would serve with an example. Let's say we want to implement the following simplified version of the k-means clustering algorithm. Pseudocode below:
>
> * Input G(V, E), k, numEdgesThreshold, maxIterations
> * Algorithm:
> * int numEdgesCrossingClusters = Integer.MAX_VALUE;
> * int iterationNo = 0;
> * while ((numEdgesCrossingClusters > numEdgesThreshold) && iterationNo < maxIterations) {
> *    iterationNo++;
> *    int[] clusterCenters = pickKClusterCenters(k, G);
> *    findClusterCenters(G, clusterCenters);
> *    numEdgesCrossingClusters = countNumEdgesCrossingClusters();
> * }
>
> The algorithm goes through the following steps in iterations:
> 1) Pick k random initial cluster centers
> 2) Assign each vertex to the cluster center that it's closest to (in Giraph, this can be implemented in message passing similar to how ShortestPaths is implemented)
> 3) Count the number of edges crossing clusters
> 4) Go back to step 1 if there are a lot of edges crossing clusters and we haven't exceeded the maximum number of iterations yet.
>
> In an algorithm like this, steps 2 and 3 are where most of the work happens, and both parts have very neat message-passing implementations. I'll try to give an overview without going into the details. Let's say we define a Vertex in Giraph to hold a custom Writable object that holds 2 integer values and sends a message with up to 2 integer values.
>
> Step 2 is very similar to the ShortestPaths algorithm and has two stages. In the first stage, each vertex checks whether or not it's one of the cluster centers. If so, it assigns itself the value (id, 0), otherwise it assigns itself (Null, Null). In the second stage, the vertices assign themselves to the minimum distance cluster center by looking at their neighbors' (cluster center, distance) values (received as 2 integer messages) and their current values, and changing their values if they find a lower distance cluster center. This happens in x number of supersteps until every vertex converges.
>
> Step 3, counting the number of edges crossing clusters, is also very easy to implement in Giraph. Once each vertex has a cluster center, the number of edges crossing clusters can be counted by an aggregator, let's say called "num-edges-crossing". It would again have two stages. First stage: every vertex just sends its cluster id to all its neighbors. Second stage: every vertex looks at its neighbors' cluster ids in the messages, and for each cluster id that is not equal to its own cluster id, it increments "num-edges-crossing" by 1.
>
> The other 2 steps, steps 1 and 4, are very simple sequential computations. Step 1 just picks k random vertex ids and puts them into an aggregator. Step 4 just compares "num-edges-crossing" against a threshold and also checks whether or not the algorithm has exceeded maxIterations (not supersteps but iterations of going through Steps 1-4). With the current API, it's not clear where to do these computations. There is a per worker function preSuperstep() that can be implemented, but if we decide to pick a special worker, let's say worker 1, to pick the k vertices, then we'd waste an entire superstep where only worker 1 would do work (by picking k vertices in preSuperstep() and putting them into an aggregator) and all other workers would be idle. Trying to do this in worker 1 in postSuperstep() would not work either, because worker 1 needs to know that all the vertices have converged to understand that it's time to pick k vertices or it's time to do the check in step 4, which would only be available to it at the beginning of the next superstep.
>
> A master.compute() extension would run at the master before the superstep and would modify the aggregator that keeps the k vertices before the aggregators are broadcast to the workers. These are all very short sequential computations, so they would not waste resources the way a preSuperstep() or postSuperstep() approach would. It would also
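To make the proposal above more concrete, here is a minimal Java sketch of what a master-side hook for the k-means example could look like. It is an assumption-laden illustration, not the API that GIRAPH-127 ended up with: the `MasterCompute` interface, its `compute` signature, the `KMeansMaster` class, and the `Map<String, Long>` stand-in for aggregators are all hypothetical. Only the aggregator name "num-edges-crossing" and the threshold and iteration parameters come from the description. As a further simplification, every call is treated as an iteration boundary, so convergence of steps 2 and 3 is assumed rather than detected.

```java
import java.util.HashMap;
import java.util.Map;

final class MasterComputeSketch {
    /** Hypothetical hook: the framework would call it on the master before every superstep. */
    interface MasterCompute {
        /** May read and modify aggregator values; returns false to halt the job. */
        boolean compute(long superstep, Map<String, Long> aggregators);
    }

    /** Steps 1 and 4 of the k-means example, run sequentially on the master. */
    static final class KMeansMaster implements MasterCompute {
        private final long numEdgesThreshold;
        private final int maxIterations;
        private int iterationNo = 0;

        KMeansMaster(long numEdgesThreshold, int maxIterations) {
            this.numEdgesThreshold = numEdgesThreshold;
            this.maxIterations = maxIterations;
        }

        @Override
        public boolean compute(long superstep, Map<String, Long> aggregators) {
            // Before the first iteration nothing has been counted yet.
            long crossing = aggregators.getOrDefault("num-edges-crossing", Long.MAX_VALUE);
            // Step 4: stop when few enough edges cross clusters or iterations run out.
            if (crossing <= numEdgesThreshold || iterationNo >= maxIterations) {
                return false;
            }
            iterationNo++;
            // Step 1 would pick k random vertex ids here and publish them through
            // an aggregator; an iteration counter stands in for that.
            aggregators.put("iteration", (long) iterationNo);
            return true;
        }
    }

    public static void main(String[] args) {
        KMeansMaster master = new KMeansMaster(5L, 3);
        Map<String, Long> aggregators = new HashMap<>();
        long superstep = 0;
        while (master.compute(superstep, aggregators)) {
            // Workers would run steps 2 and 3 here; pretend 10 edges still cross.
            aggregators.put("num-edges-crossing", 10L);
            superstep++;
        }
        System.out.println("halted before superstep " + superstep);
    }
}
```

The point of the sketch is only where the sequential logic lives: it runs once on the master before the aggregators are broadcast, so no worker sits idle for a whole superstep while another one picks cluster centers.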
[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats
    [ https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268917#comment-13268917 ]

Avery Ching commented on GIRAPH-153:
------------------------------------

I'll take a look this weekend Brian. Thanks for the reminder.

> HBase/Accumulo Input and Output formats
> ---------------------------------------
>
>                 Key: GIRAPH-153
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-153
>         Attachments: GIRAPH-153.1.patch, GIRAPH-153.2.patch, GIRAPH-153.3.patch, GIRAPH-153.patch
[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269303#comment-13269303 ]

Avery Ching commented on GIRAPH-37:
-----------------------------------

Since Jakob had to switch gears, I wanted to let you guys know that I've spent a few days of the past week working on a netty-only replacement for communication. I should have a patch and some performance numbers up in a few days. Users will be able to choose between the old RPC way and this netty approach. Netty is so customizable that it will likely take a lot of tuning to get the dials right for most cases.

> Implement Netty-backed rpc solution
> -----------------------------------
>
>                 Key: GIRAPH-37
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-37
>             Project: Giraph
>          Issue Type: New Feature
>            Reporter: Jakob Homan
>            Assignee: Jakob Homan
>         Attachments: GIRAPH-37-wip.patch
>
>
> GIRAPH-12 considered replacing the current Hadoop based rpc method with Netty, but went in another direction. I think there is still value in this approach, and will also look at Finagle.
[jira] [Updated] (GIRAPH-37) Implement Netty-backed rpc solution
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Avery Ching updated GIRAPH-37:
------------------------------

    Attachment: GIRAPH-37.patch

Same as reviewboard file, but ensuring the license is granted here.
[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271245#comment-13271245 ]

Avery Ching commented on GIRAPH-37:
-----------------------------------

Thanks Claudio. Here are more results with a scaled up 10 worker setup:

Hadoop RPC:

hadoop jar ~/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -Dgiraph.useNetty=false -w 10 -V 1000 -s 5 -e 2 -v

12/05/09 02:32:05 INFO mapred.JobClient:   Giraph Timers
12/05/09 02:32:05 INFO mapred.JobClient:     Total (milliseconds)=149880
12/05/09 02:32:05 INFO mapred.JobClient:     Superstep 3 (milliseconds)=21575
12/05/09 02:32:05 INFO mapred.JobClient:     Setup (milliseconds)=7428
12/05/09 02:32:05 INFO mapred.JobClient:     Shutdown (milliseconds)=174
12/05/09 02:32:05 INFO mapred.JobClient:     Vertex input superstep (milliseconds)=39558
12/05/09 02:32:05 INFO mapred.JobClient:     Superstep 0 (milliseconds)=16887
12/05/09 02:32:05 INFO mapred.JobClient:     Superstep 4 (milliseconds)=18613
12/05/09 02:32:05 INFO mapred.JobClient:     Superstep 5 (milliseconds)=3292
12/05/09 02:32:05 INFO mapred.JobClient:     Superstep 2 (milliseconds)=21313
12/05/09 02:32:05 INFO mapred.JobClient:     Superstep 1 (milliseconds)=21035

Netty:

hadoop jar ~/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -Dgiraph.useNetty=true -w 10 -V 1000 -s 5 -e 2 -v

12/05/09 02:35:06 INFO mapred.JobClient:   Giraph Timers
12/05/09 02:35:06 INFO mapred.JobClient:     Total (milliseconds)=59270
12/05/09 02:35:06 INFO mapred.JobClient:     Superstep 3 (milliseconds)=11827
12/05/09 02:35:06 INFO mapred.JobClient:     Setup (milliseconds)=3196
12/05/09 02:35:06 INFO mapred.JobClient:     Shutdown (milliseconds)=124
12/05/09 02:35:06 INFO mapred.JobClient:     Vertex input superstep (milliseconds)=13130
12/05/09 02:35:06 INFO mapred.JobClient:     Superstep 0 (milliseconds)=8564
12/05/09 02:35:06 INFO mapred.JobClient:     Superstep 4 (milliseconds)=5540
12/05/09 02:35:06 INFO mapred.JobClient:     Superstep 5 (milliseconds)=2012
12/05/09 02:35:06 INFO mapred.JobClient:     Superstep 2 (milliseconds)=8601
12/05/09 02:35:06 INFO mapred.JobClient:     Superstep 1 (milliseconds)=6271

These results are fairly similar to the first set (even though there are more workers). I'm pretty sure we can squeeze more performance from Netty in future patches (e.g. the local send optimization is missing, TCP parameters need tuning, more knobs could be exposed to the user, etc.).
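For readers who want the ratios rather than the raw timers, the short Java sketch below divides the Hadoop RPC time by the Netty time for a few of the counters reported above. The class and method names are made up for the example; the millisecond values are copied from the two runs. From these numbers the total comes out at roughly 2.5x in favor of Netty and the vertex input superstep at roughly 3x, for this single benchmark configuration only.

```java
final class TimerSpeedup {
    /** Ratio of the Hadoop RPC time to the Netty time; values above 1 favor Netty. */
    static double speedup(long rpcMillis, long nettyMillis) {
        return (double) rpcMillis / nettyMillis;
    }

    public static void main(String[] args) {
        // Timer values copied from the two PageRankBenchmark runs (10 workers).
        System.out.printf("Total: %.2fx%n", speedup(149880, 59270));
        System.out.printf("Vertex input superstep: %.2fx%n", speedup(39558, 13130));
        System.out.printf("Superstep 1: %.2fx%n", speedup(21035, 6271));
    }
}
```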
[jira] [Updated] (GIRAPH-37) Implement Netty-backed IPC
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Avery Ching updated GIRAPH-37:
------------------------------

    Assignee: Avery Ching  (was: Jakob Homan)
     Summary: Implement Netty-backed IPC  (was: Implement Netty-backed rpc solution)
[jira] [Commented] (GIRAPH-37) Implement Netty-backed IPC
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271696#comment-13271696 ]

Avery Ching commented on GIRAPH-37:
-----------------------------------

@Claudio, with Hadoop RPC the vertex input superstep is a blocking operation when sending the vertices to the destination partition owners. With Netty it's non-blocking, overlapping communication and computation. Setup should be ignored; that is the time it takes to get all the map tasks and pick a master.
[jira] [Resolved] (GIRAPH-37) Implement Netty-backed IPC
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Avery Ching resolved GIRAPH-37.
-------------------------------

    Resolution: Fixed

Hudson is successful, closing.
[jira] [Created] (GIRAPH-189) Synchronization on Map values should be in a thread safe object
Avery Ching created GIRAPH-189: -- Summary: Synchronization on Map values should be in a thread safe object Key: GIRAPH-189 URL: https://issues.apache.org/jira/browse/GIRAPH-189 Project: Giraph Issue Type: Improvement Reporter: Avery Ching See https://reviews.apache.org/r/5074/ for reasoning -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (GIRAPH-190) Create GiraphConf extends Configuration
Avery Ching created GIRAPH-190: -- Summary: Create GiraphConf extends Configuration Key: GIRAPH-190 URL: https://issues.apache.org/jira/browse/GIRAPH-190 Project: Giraph Issue Type: Improvement Reporter: Avery Ching Priority: Minor Currently all the options in Giraph are in the GiraphJob. It would be cleaner to do configuration as part of a special GiraphConf (analogous to HiveConf), and it would simplify code elsewhere as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-1) Initial code import
[ https://issues.apache.org/jira/browse/GIRAPH-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082811#comment-13082811 ] Avery Ching commented on GIRAPH-1: -- Hey Owen, I really appreciate you checking in the code. Is it possible to do it with the history though? It currently appears that all the history was lost. > Initial code import > --- > > Key: GIRAPH-1 > URL: https://issues.apache.org/jira/browse/GIRAPH-1 > Project: Giraph > Issue Type: Task >Affects Versions: 0.1.0 >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 0.1.0 > > > I did the initial code import from github. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-1) Initial code import
[ https://issues.apache.org/jira/browse/GIRAPH-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082901#comment-13082901 ] Avery Ching commented on GIRAPH-1: -- I can do the svn dump. The history would be useful, I think, for seeing all the changes and associated rationale. I'll package something up and send it to you tonight. > Initial code import > --- > > Key: GIRAPH-1 > URL: https://issues.apache.org/jira/browse/GIRAPH-1 > Project: Giraph > Issue Type: Task >Affects Versions: 0.1.0 >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 0.1.0 > > > I did the initial code import from github. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-1) Initial code import
[ https://issues.apache.org/jira/browse/GIRAPH-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083440#comment-13083440 ] Avery Ching commented on GIRAPH-1: -- Owen, sorry about the delay, but I dumped and loaded the dump to verify it preserved history. Steps I tried: - Get the dump file: Available from http://www.ece.northwestern.edu/~aching/giraph.dump.tar.gz (i.e. wget http://www.ece.northwestern.edu/~aching/giraph.dump.tar.gz) - Untar the dump file (i.e. 'tar zxvf giraph.dump.tar.gz') - Load the dump into the svn repository (i.e. 'svnadmin load < giraph.dump') You might want to try additional options to specify where it goes in the Apache incubator svn repository. - Move the directory to the right location This might not be necessary if you use 'svnadmin load' correctly. Otherwise the directory will be in /projects/hadoop_bsp/trunk and should probably be moved to /Giraph/trunk or something like that. Please let me know how it goes. Thanks! > Initial code import > --- > > Key: GIRAPH-1 > URL: https://issues.apache.org/jira/browse/GIRAPH-1 > Project: Giraph > Issue Type: Task >Affects Versions: 0.1.0 >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 0.1.0 > > > I did the initial code import from github. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-1) Initial code import
[ https://issues.apache.org/jira/browse/GIRAPH-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083450#comment-13083450 ] Avery Ching commented on GIRAPH-1: -- Let me know if I can help out by the way. I'm not sure who has svn admin privileges on the apache svn server. > Initial code import > --- > > Key: GIRAPH-1 > URL: https://issues.apache.org/jira/browse/GIRAPH-1 > Project: Giraph > Issue Type: Task >Affects Versions: 0.1.0 >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 0.1.0 > > > I did the initial code import from github. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-1) Initial code import
[ https://issues.apache.org/jira/browse/GIRAPH-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083917#comment-13083917 ] Avery Ching commented on GIRAPH-1: -- Hyunsik, No, you are right, I had to do the following procedures to get the dump (my first time dumping an svn repo). svnsync with our main yahoo repository and then svnadmin dump only giraph hence there are around 30 revisions, but most of them are empty. It took me about 3-4 hours to complete. I lost access to my work machine with the sync, but I might be able to produce a more concise dump if needed. > Initial code import > --- > > Key: GIRAPH-1 > URL: https://issues.apache.org/jira/browse/GIRAPH-1 > Project: Giraph > Issue Type: Task >Affects Versions: 0.1.0 >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 0.1.0 > > > I did the initial code import from github. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-1) Initial code import
[ https://issues.apache.org/jira/browse/GIRAPH-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083948#comment-13083948 ] Avery Ching commented on GIRAPH-1: -- No problem. I think I can get it down to around 30k (1/10 of the original dump revisions). That should be good enough, I hope. Unfortunately, I have to redo the svnsync (started about 1/2 hour ago and on revision 6...). > Initial code import > --- > > Key: GIRAPH-1 > URL: https://issues.apache.org/jira/browse/GIRAPH-1 > Project: Giraph > Issue Type: Task >Affects Versions: 0.1.0 >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 0.1.0 > > > I did the initial code import from github. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-1) Initial code import
[ https://issues.apache.org/jira/browse/GIRAPH-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084016#comment-13084016 ] Avery Ching commented on GIRAPH-1: -- I have synced and redumped the svn repo, but with a very limited number of revisions (around 10k). I was able to load all the revisions into my local repo in about 10 minutes, much improved over the 3-4 hours before. Note again that the path it will produce from loading is something like /projects/hadoop_bsp/trunk and should be moved with 'svn mv' to something like https://svn.apache.org/repos/asf/incubator/giraph/trunk. Please let me know if there are issues. Thanks! New dump file location: http://www.ece.northwestern.edu/~aching/giraph_27_280777.dump.tar.gz > Initial code import > --- > > Key: GIRAPH-1 > URL: https://issues.apache.org/jira/browse/GIRAPH-1 > Project: Giraph > Issue Type: Task >Affects Versions: 0.1.0 >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 0.1.0 > > > I did the initial code import from github. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-1) Initial code import
[ https://issues.apache.org/jira/browse/GIRAPH-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085267#comment-13085267 ] Avery Ching commented on GIRAPH-1: -- While svndumpfilter failed to work, I was able to fix the issue with svndumpfilter2. This has less than 150 revisions. Here is the final dump location: http://www.ece.northwestern.edu/~aching/2011.08.15.giraph.dump.tar.gz 1. After downloading the file, execute 'tar zxvf 2011.08.15.giraph.dump.tar.gz' to get the actual dump file. 2. Remove the current trunk directory (i.e. svn rm /incubator/giraph/trunk) 3. Load the data into the repository location 'incubator/giraph/trunk' (will also create incubator/giraph/trunk). svnadmin load < 2011.08.15.giraph.dump That should be it! I've deleted the old dump files to avoid confusion. > Initial code import > --- > > Key: GIRAPH-1 > URL: https://issues.apache.org/jira/browse/GIRAPH-1 > Project: Giraph > Issue Type: Task >Affects Versions: 0.1.0 >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 0.1.0 > > > I did the initial code import from github. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-2) make the project homepage
[ https://issues.apache.org/jira/browse/GIRAPH-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089436#comment-13089436 ] Avery Ching commented on GIRAPH-2: -- Agreed, any thoughts on how the homepage differs from the confluence wiki? > make the project homepage > - > > Key: GIRAPH-2 > URL: https://issues.apache.org/jira/browse/GIRAPH-2 > Project: Giraph > Issue Type: Task >Reporter: Hyunsik Choi > > We need to make the project homepage at http://incubator.apache.org/giraph/. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-3) Vertex:sentMsgToAllEdges should be sendMsg
[ https://issues.apache.org/jira/browse/GIRAPH-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089828#comment-13089828 ] Avery Ching commented on GIRAPH-3: -- Duh. I guess we should wait until the svn import is finished before doing this... > Vertex:sentMsgToAllEdges should be sendMsg > -- > > Key: GIRAPH-3 > URL: https://issues.apache.org/jira/browse/GIRAPH-3 > Project: Giraph > Issue Type: Bug >Reporter: Jakob Homan >Assignee: Jakob Homan > > The method Vertex.java:sentMsgToAllEdges() should be sendMsgToAllEdges() -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-2) make the project homepage
[ https://issues.apache.org/jira/browse/GIRAPH-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091482#comment-13091482 ] Avery Ching commented on GIRAPH-2: -- Jakob, great start! Changes look good to me. > make the project homepage > - > > Key: GIRAPH-2 > URL: https://issues.apache.org/jira/browse/GIRAPH-2 > Project: Giraph > Issue Type: Task >Reporter: Hyunsik Choi >Assignee: Jakob Homan > Attachments: GIRAPH-2.patch > > > We need to make the project homepage at http://incubator.apache.org/giraph/. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-3) Vertex:sentMsgToAllEdges should be sendMsg
[ https://issues.apache.org/jira/browse/GIRAPH-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091486#comment-13091486 ] Avery Ching commented on GIRAPH-3: -- I've +1'd it too. We can address the naming conventions in another issue. > Vertex:sentMsgToAllEdges should be sendMsg > -- > > Key: GIRAPH-3 > URL: https://issues.apache.org/jira/browse/GIRAPH-3 > Project: Giraph > Issue Type: Bug >Reporter: Jakob Homan >Assignee: Jakob Homan > Attachments: GIRAPH-3.patch > > > The method Vertex.java:sentMsgToAllEdges() should be sendMsgToAllEdges() -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (GIRAPH-5) Remove Yahoo directories
Remove Yahoo directories Key: GIRAPH-5 URL: https://issues.apache.org/jira/browse/GIRAPH-5 Project: Giraph Issue Type: Task Reporter: Avery Ching Assignee: Avery Ching Priority: Minor As an artifact of pulling from the Yahoo! svn repository, we need to re-remove the Yahoo! specific build stuff. This was done already in GitHub, but of course, they are different places. I would like to remove the following directories: src/ci/ src/main/pkg Also, as Jakob has seen, our pom.xml needs cleanup. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (GIRAPH-5) Remove Yahoo directories
[ https://issues.apache.org/jira/browse/GIRAPH-5?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avery Ching updated GIRAPH-5: - Attachment: diff.txt Diff after 'svn rm' of those two directories. > Remove Yahoo directories > > > Key: GIRAPH-5 > URL: https://issues.apache.org/jira/browse/GIRAPH-5 > Project: Giraph > Issue Type: Task >Reporter: Avery Ching >Assignee: Avery Ching >Priority: Minor > Attachments: diff.txt > > > As an artifact of pulling from the Yahoo! svn repository, we need to > re-remove the Yahoo! specific build stuff. This was done already in GitHub, > but of course, they are different places. > I would like to remove the following directories: > src/ci/ > src/main/pkg > Also, as Jakob has seen, our pom.xml needs cleanup. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (GIRAPH-5) Remove Yahoo directories
[ https://issues.apache.org/jira/browse/GIRAPH-5?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avery Ching resolved GIRAPH-5. -- Resolution: Fixed Committed after Jakob's +1. > Remove Yahoo directories > > > Key: GIRAPH-5 > URL: https://issues.apache.org/jira/browse/GIRAPH-5 > Project: Giraph > Issue Type: Task >Reporter: Avery Ching >Assignee: Avery Ching >Priority: Minor > Attachments: diff.txt > > > As an artifact of pulling from the Yahoo! svn repository, we need to > re-remove the Yahoo! specific build stuff. This was done already in GitHub, > but of course, they are different places. > I would like to remove the following directories: > src/ci/ > src/main/pkg > Also, as Jakob has seen, our pom.xml needs cleanup. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-6) Remove Yahoo-specific code from pom.xml
[ https://issues.apache.org/jira/browse/GIRAPH-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092117#comment-13092117 ] Avery Ching commented on GIRAPH-6: -- Thanks for doing this. > Remove Yahoo-specific code from pom.xml > --- > > Key: GIRAPH-6 > URL: https://issues.apache.org/jira/browse/GIRAPH-6 > Project: Giraph > Issue Type: Bug >Reporter: Jakob Homan >Assignee: Jakob Homan >Priority: Blocker > Attachments: GIRAPH-6.patch > > > There are remaining references to Y! infrastructure in the pom.xml, which > prevents the build from succeeding. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-8) Update references to Yahoo bug that needs to be fixed
[ https://issues.apache.org/jira/browse/GIRAPH-8?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092124#comment-13092124 ] Avery Ching commented on GIRAPH-8: -- This has to do with internal Yahoo! bugs that need to be ported to JIRA. I will do this. The bug has not been fixed and the issue is that basically we can only store so many VertexRange objects in a single ZooKeeper znode. Currently there should be a workaround to prevent too many vertex ranges from being created. > Update references to Yahoo bug that needs to be fixed > - > > Key: GIRAPH-8 > URL: https://issues.apache.org/jira/browse/GIRAPH-8 > Project: Giraph > Issue Type: Bug >Reporter: Jakob Homan > > In BspServiceMaster.java there are three TODOS (lines 1342, 1348, 1377) > referring to those sections of code being deleted after Bug#4340282 is fixed. > We should either verify that this has been fixed, change the comments to a > more descriptive explanation, or fix whatever bug is being referenced. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-2) make the project homepage
[ https://issues.apache.org/jira/browse/GIRAPH-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092127#comment-13092127 ] Avery Ching commented on GIRAPH-2: -- Any preference between mvn2 or mvn3? Since I know there is an issue with the hadoop=non_secure with mvn2, maybe it's better to go to mvn3? I love the page. A related question is with respect to the version number (0.70). Should we move it to 0.1 to reflect the Apache version? > make the project homepage > - > > Key: GIRAPH-2 > URL: https://issues.apache.org/jira/browse/GIRAPH-2 > Project: Giraph > Issue Type: Task >Reporter: Hyunsik Choi >Assignee: Jakob Homan > Attachments: GIRAPH-2.patch, GIRAPH-2b.patch > > > We need to make the project homepage at http://incubator.apache.org/giraph/. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-2) make the project homepage
[ https://issues.apache.org/jira/browse/GIRAPH-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092132#comment-13092132 ] Avery Ching commented on GIRAPH-2: -- Well, unless there are any objections, let's go to mvn3. With respect to the product version, can we advance the Apache version to 0.70? If not, I don't mind going back to 0.1. It's just a number =). > make the project homepage > - > > Key: GIRAPH-2 > URL: https://issues.apache.org/jira/browse/GIRAPH-2 > Project: Giraph > Issue Type: Task >Reporter: Hyunsik Choi >Assignee: Jakob Homan > Attachments: GIRAPH-2.patch, GIRAPH-2b.patch > > > We need to make the project homepage at http://incubator.apache.org/giraph/. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-9) Change Yahoo License Header to Apache License Header
[ https://issues.apache.org/jira/browse/GIRAPH-9?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092604#comment-13092604 ] Avery Ching commented on GIRAPH-9: -- Hyunsik, I've +1ed it. It's nice to be in Apache now, thanks for making the license changes. Out of curiosity, did you use Wdev91 copyright wizard (what I previously used to create the original copyrights) or some other tool? > Change Yahoo License Header to Apache License Header > > > Key: GIRAPH-9 > URL: https://issues.apache.org/jira/browse/GIRAPH-9 > Project: Giraph > Issue Type: Task >Reporter: Hyunsik Choi >Assignee: Hyunsik Choi > Fix For: 0.1.0 > > Attachments: GIRAPH-9.patch > > > All source files contain the Yahoo license header, as follows > {noformat} > Licensed to Yahoo! under one or more contributor license agreements. > ... > {noformat} > These license headers should read as follows > {noformat} > Licensed to the Apache Software Foundation (ASF) under one > or more contributor license agreements. > ... > {noformat} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (GIRAPH-10) Aggregators are not exported
[ https://issues.apache.org/jira/browse/GIRAPH-10?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avery Ching updated GIRAPH-10: -- Priority: Minor (was: Major) > Aggregators are not exported > > > Key: GIRAPH-10 > URL: https://issues.apache.org/jira/browse/GIRAPH-10 > Project: Giraph > Issue Type: New Feature >Reporter: Avery Ching >Priority: Minor > > Currently, aggregator values cannot be saved after a Giraph job. There > should be a way to do this. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (GIRAPH-10) Aggregators are not exported
Aggregators are not exported Key: GIRAPH-10 URL: https://issues.apache.org/jira/browse/GIRAPH-10 Project: Giraph Issue Type: New Feature Reporter: Avery Ching Currently, aggregator values cannot be saved after a Giraph job. There should be a way to do this. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (GIRAPH-11) Improve the graph distribution of Giraph
Improve the graph distribution of Giraph Key: GIRAPH-11 URL: https://issues.apache.org/jira/browse/GIRAPH-11 Project: Giraph Issue Type: Improvement Reporter: Avery Ching Assignee: Avery Ching Currently, Giraph assumes that the data from the VertexInputFormat is sorted. If the user data is not sorted by the vertex id, the user must first run a MapReduce or Pig job to generate a sorted dataset. This is often a bit inconvenient. Giraph graph partitioning is currently range based and there are some advantages and disadvantages of this approach. The proposal of this JIRA would be to allow for both range and hash based partitioning and provide more flexibility to the user. Design goals for the graph distribution: * Allow vertices to be ordered or unordered * Ability to repartition * Select the partitioning scheme based on user needs (i.e. hash or range based) * Ability to provide user-specific hints about partitions Hash-based partitioning * Good vertex balancing across ranges for random data * Bad at vertex id locality Range-based partitioning * Good at vertex id locality * Ability to split ranges easily * Can cause hotspots for hot ranges -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
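To make the hash versus range trade-off in GIRAPH-11 concrete, here is a minimal sketch of the two lookups. It is not Giraph's partitioning code: the class name PartitionSketch, both method names, and the use of long vertex ids with an array of inclusive upper bounds are all assumptions made for illustration.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not taken from the Giraph code base. */
final class PartitionSketch {

    /**
     * Hash-based: ids spread evenly for random data, but neighbouring ids
     * usually land in different partitions (poor id locality).
     */
    static int hashPartition(long vertexId, int numPartitions) {
        // floorMod keeps the result in [0, numPartitions) even for negative hashes.
        return Math.floorMod(Long.hashCode(vertexId), numPartitions);
    }

    /**
     * Range-based: upperBounds[i] is the largest id owned by partition i,
     * sorted ascending. Neighbouring ids share a partition (good locality),
     * and a range can be split by inserting a new bound.
     */
    static int rangePartition(long vertexId, long[] upperBounds) {
        int idx = Arrays.binarySearch(upperBounds, vertexId);
        if (idx < 0) {
            idx = -idx - 1; // insertion point: first bound greater than the id
        }
        if (idx >= upperBounds.length) {
            throw new IllegalArgumentException("No range covers vertex id " + vertexId);
        }
        return idx;
    }

    public static void main(String[] args) {
        long[] bounds = {99L, 199L, 299L};
        System.out.println("range(100) = " + rangePartition(100L, bounds));
        System.out.println("hash(100)  = " + hashPartition(100L, 3));
    }
}
```

With range partitioning, a burst of activity on ids 100 to 199 hits a single partition, which is the hotspot risk listed above; the hash variant spreads the same ids across partitions at the cost of locality.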
[jira] [Created] (GIRAPH-12) Investigate communication improvements
Investigate communication improvements -- Key: GIRAPH-12 URL: https://issues.apache.org/jira/browse/GIRAPH-12 Project: Giraph Issue Type: Improvement Reporter: Avery Ching Priority: Minor Currently every worker will start up a thread to communicate with every other worker. Hadoop RPC is used for communication. For instance if there are 400 workers, each worker will create 400 threads. This ends up using a lot of memory, even with the option -Dmapred.child.java.opts="-Xss64k". It would be good to investigate using frameworks like Netty, or rolling our own, to improve this situation. By moving away from Hadoop RPC, we would also make compatibility across different Hadoop versions easier. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
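As a rough illustration of the thread-count concern in GIRAPH-12: with one thread per peer, 400 peers at the quoted 64k stack size reserve about 25 MB of stack per worker before any message buffers are counted. The sketch below shows the general alternative of a small fixed pool servicing all peers. It is not Giraph or Netty code; SharedPoolSketch, sendToAllPeers, and the pool size of 8 are hypothetical, and the "send" is a counter increment standing in for real network I/O.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration only; not taken from the Giraph code base. */
final class SharedPoolSketch {

    /**
     * Sends one (simulated) message to each peer using a bounded pool.
     * Returns {messagesSent, distinctThreadsUsed}.
     */
    static int[] sendToAllPeers(int numPeers, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger sent = new AtomicInteger();
        Set<String> threadsUsed = ConcurrentHashMap.newKeySet();
        for (int peer = 0; peer < numPeers; peer++) {
            pool.execute(() -> {
                threadsUsed.add(Thread.currentThread().getName());
                sent.incrementAndGet(); // stand-in for an actual network send
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("Timed out waiting for sends");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for sends", e);
        }
        return new int[] {sent.get(), threadsUsed.size()};
    }

    public static void main(String[] args) {
        int[] result = sendToAllPeers(400, 8);
        System.out.println(result[0] + " sends handled by " + result[1] + " threads");
    }
}
```

The number of threads is now bounded by the pool size rather than by the number of workers, which is the property the issue is after; an event-driven framework such as Netty reaches the same goal with non-blocking I/O rather than a blocking task queue.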
[jira] [Commented] (GIRAPH-2) make the project homepage
[ https://issues.apache.org/jira/browse/GIRAPH-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092977#comment-13092977 ] Avery Ching commented on GIRAPH-2: -- Done =) +1. > make the project homepage > - > > Key: GIRAPH-2 > URL: https://issues.apache.org/jira/browse/GIRAPH-2 > Project: Giraph > Issue Type: Task >Reporter: Hyunsik Choi >Assignee: Jakob Homan > Attachments: GIRAPH-2.patch, GIRAPH-2b.patch > > > We need to make the project homepage at http://incubator.apache.org/giraph/. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-13) Port Giraph to YARN
[ https://issues.apache.org/jira/browse/GIRAPH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093213#comment-13093213 ] Avery Ching commented on GIRAPH-13: --- Agreed. > Port Giraph to YARN > --- > > Key: GIRAPH-13 > URL: https://issues.apache.org/jira/browse/GIRAPH-13 > Project: Giraph > Issue Type: New Feature >Reporter: Jakob Homan > > Now that YARN (aka MR2 aka MAPREDUCE-279) has been merged into the Hadoop > trunk, we should think about what it would take to separate out the graph > processing bits of Giraph from the MR1-specific code so as to take advantage > of the less-MR centric aspects of YARN, while still supporting both over the > medium term. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (GIRAPH-14) Support for the Facebook Hadoop branch
Support for the Facebook Hadoop branch -- Key: GIRAPH-14 URL: https://issues.apache.org/jira/browse/GIRAPH-14 Project: Giraph Issue Type: New Feature Reporter: Avery Ching I've been working with Joe Xie on support to get Giraph running on the Facebook Hadoop branch. He verified today that the examples worked on their cluster. I need to clean up my changes a little, but otherwise, will submit a cleaned up diff. As a side note, does anyone know how we can get Hudson support for Giraph? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (GIRAPH-14) Support for the Facebook Hadoop branch
[ https://issues.apache.org/jira/browse/GIRAPH-14?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avery Ching reassigned GIRAPH-14: - Assignee: Avery Ching > Support for the Facebook Hadoop branch > -- > > Key: GIRAPH-14 > URL: https://issues.apache.org/jira/browse/GIRAPH-14 > Project: Giraph > Issue Type: New Feature >Reporter: Avery Ching >Assignee: Avery Ching > > I've been working with Joe Xie on support to get Giraph running on the > Facebook Hadoop branch. He verified today that the examples worked on their > cluster. I need to clean up my changes a little, but otherwise, will submit > a cleaned up diff. As a side note, does anyone know how we can get Hudson > support for Giraph? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-14) Support for the Facebook Hadoop branch
[ https://issues.apache.org/jira/browse/GIRAPH-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093348#comment-13093348 ] Avery Ching commented on GIRAPH-14: --- Thanks Hyunsik. > Support for the Facebook Hadoop branch > -- > > Key: GIRAPH-14 > URL: https://issues.apache.org/jira/browse/GIRAPH-14 > Project: Giraph > Issue Type: New Feature >Reporter: Avery Ching >Assignee: Avery Ching > > I've been working with Joe Xie on support to get Giraph running on the > Facebook Hadoop branch. He verified today that the examples worked on their > cluster. I need to clean up my changes a little, but otherwise, will submit > a cleaned up diff. As a side note, does anyone know how we can get Hudson > support for Giraph? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (GIRAPH-17) Giraph doesn't give up properly after the maximum connect attempts to ZooKeeper
Giraph doesn't give up properly after the maximum connect attempts to ZooKeeper --- Key: GIRAPH-17 URL: https://issues.apache.org/jira/browse/GIRAPH-17 Project: Giraph Issue Type: Bug Reporter: Avery Ching Assignee: Avery Ching Priority: Minor This produces incorrect and strange behavior. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (GIRAPH-17) Giraph doesn't give up properly after the maximum connect attempts to ZooKeeper
[ https://issues.apache.org/jira/browse/GIRAPH-17?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avery Ching updated GIRAPH-17: -- Attachment: ZooKeeperManager.java.diff > Giraph doesn't give up properly after the maximum connect attempts to > ZooKeeper > --- > > Key: GIRAPH-17 > URL: https://issues.apache.org/jira/browse/GIRAPH-17 > Project: Giraph > Issue Type: Bug >Reporter: Avery Ching >Assignee: Avery Ching >Priority: Minor > Attachments: ZooKeeperManager.java.diff > > > This produces incorrect and strange behavior. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-15) Use of Jenkins for tests and builds
[ https://issues.apache.org/jira/browse/GIRAPH-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093439#comment-13093439 ] Avery Ching commented on GIRAPH-15: --- Looks like one of our mentors will have to do this as you suggest. Hopefully Owen, Alan, or Chris can do it for us. > Use of Jenkins for tests and builds > --- > > Key: GIRAPH-15 > URL: https://issues.apache.org/jira/browse/GIRAPH-15 > Project: Giraph > Issue Type: Task >Reporter: Hyunsik Choi >Assignee: Hyunsik Choi > > We can use the Jenkins server (https://builds.apache.org/) for regular builds and > tests. Some setup steps are required before we can use Jenkins. > Here is the FAQ about using Jenkins: > http://wiki.apache.org/general/Hudson -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (GIRAPH-14) Support for the Facebook Hadoop branch
[ https://issues.apache.org/jira/browse/GIRAPH-14?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avery Ching updated GIRAPH-14: -- Attachment: facebook.txt Supports the Facebook version of Hadoop with mvn -Dhadoop=facebook -Dhadoop.jar.path= > Support for the Facebook Hadoop branch > -- > > Key: GIRAPH-14 > URL: https://issues.apache.org/jira/browse/GIRAPH-14 > Project: Giraph > Issue Type: New Feature >Reporter: Avery Ching >Assignee: Avery Ching > Attachments: facebook.txt > > > I've been working with Joe Xie on support to get Giraph running on the > Facebook Hadoop branch. He verified today that the examples worked on their > cluster. I need to clean up my changes a little, but otherwise, will submit > a cleaned up diff. As a side note, does anyone know how we can get Hudson > support for Giraph? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-17) Giraph doesn't give up properly after the maximum connect attempts to ZooKeeper
[ https://issues.apache.org/jira/browse/GIRAPH-17?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093808#comment-13093808 ] Avery Ching commented on GIRAPH-17: --- Sure. The main part of this fix is -if (connectAttempts == 5) { +if (connectAttempts == maxConnectAttempts) { Basically, this condition should be hit once the maximum number of connect attempts has been tried, but it never was, because maxConnectAttempts is now 10 and the hard-coded value became out of sync with it at some point (maxConnectAttempts probably used to be 5). The limit is still not configurable; we can address that in a later issue. > Giraph doesn't give up properly after the maximum connect attempts to > ZooKeeper > --- > > Key: GIRAPH-17 > URL: https://issues.apache.org/jira/browse/GIRAPH-17 > Project: Giraph > Issue Type: Bug >Reporter: Avery Ching >Assignee: Avery Ching >Priority: Minor > Attachments: ZooKeeperManager.java.diff > > > This produces incorrect and strange behavior. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
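The class of bug described in the GIRAPH-17 comment (a give-up check that compares against a literal instead of the configured limit) can be sketched as follows. This is not the ZooKeeperManager code: ConnectRetrySketch and connectWithRetries are hypothetical names, the value 10 is taken only from the comment above, and the connection attempt is modelled as a BooleanSupplier.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical illustration only; not the actual ZooKeeperManager code. */
final class ConnectRetrySketch {

    /** Assumed limit, per the comment that maxConnectAttempts "is now 10". */
    static final int MAX_CONNECT_ATTEMPTS = 10;

    /**
     * Tries to connect until it succeeds or the limit is reached.
     * Returns the number of attempts used; throws if every attempt fails.
     */
    static int connectWithRetries(BooleanSupplier tryConnect, int maxConnectAttempts) {
        int connectAttempts = 0;
        while (connectAttempts < maxConnectAttempts) {
            ++connectAttempts;
            if (tryConnect.getAsBoolean()) {
                return connectAttempts;
            }
        }
        // The give-up path uses the same variable that bounds the loop, so the
        // two cannot drift apart the way a literal and a field can.
        throw new IllegalStateException(
            "Failed to connect after " + maxConnectAttempts + " attempts");
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated server that only accepts the third attempt.
        int used = connectWithRetries(() -> ++calls[0] == 3, MAX_CONNECT_ATTEMPTS);
        System.out.println("connected after " + used + " attempts");
    }
}
```

Passing the limit in as a parameter also leaves room for making it configurable later, which the comment defers to a separate issue.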
[jira] [Commented] (GIRAPH-17) Giraph doesn't give up properly after the maximum connect attempts to ZooKeeper
[ https://issues.apache.org/jira/browse/GIRAPH-17?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093861#comment-13093861 ] Avery Ching commented on GIRAPH-17: --- Thanks for taking a look. Committed. > Giraph doesn't give up properly after the maximum connect attempts to > ZooKeeper > --- > > Key: GIRAPH-17 > URL: https://issues.apache.org/jira/browse/GIRAPH-17 > Project: Giraph > Issue Type: Bug >Reporter: Avery Ching >Assignee: Avery Ching >Priority: Minor > Attachments: ZooKeeperManager.java.diff > > > This produces incorrect and strange behavior. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-14) Support for the Facebook Hadoop branch
[ https://issues.apache.org/jira/browse/GIRAPH-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093879#comment-13093879 ] Avery Ching commented on GIRAPH-14: --- It's good to hear that you can run it on your cluster. As far as the unittests, that is strange. I was able to repeat the same issues and will look into a fix. > Support for the Facebook Hadoop branch > -- > > Key: GIRAPH-14 > URL: https://issues.apache.org/jira/browse/GIRAPH-14 > Project: Giraph > Issue Type: New Feature >Reporter: Avery Ching >Assignee: Avery Ching > Attachments: facebook.txt > > > I've been working with Joe Xie on support to get Giraph running on the > Facebook Hadoop branch. He verified today that the examples worked on their > cluster. I need to clean up my changes a little, but otherwise, will submit > a cleaned up diff. As a side note, does anyone know how we can get Hudson > support for Giraph? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-4) New project logo
[ https://issues.apache.org/jira/browse/GIRAPH-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094134#comment-13094134 ] Avery Ching commented on GIRAPH-4: -- Yes, it would certainly be nice to have a real logo. Do you want to give it a shot? > New project logo > > > Key: GIRAPH-4 > URL: https://issues.apache.org/jira/browse/GIRAPH-4 > Project: Giraph > Issue Type: New Feature >Reporter: Jakob Homan > > Now for the hard part: the project logo. We should create one and add it to > the website once done. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (GIRAPH-14) Support for the Facebook Hadoop branch
[ https://issues.apache.org/jira/browse/GIRAPH-14?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avery Ching updated GIRAPH-14: -- Attachment: facebook2.txt Looks like I needed to change the groupId so that the right dependencies are pulled in for hadoop. Please try this one out. The unittests all passed for me. (i.e. mvn -Dhadoop=facebook -Dhadoop.jar.path=/Users/aching/Desktop/hadoop-0.20.1-core.jar package) > Support for the Facebook Hadoop branch > -- > > Key: GIRAPH-14 > URL: https://issues.apache.org/jira/browse/GIRAPH-14 > Project: Giraph > Issue Type: New Feature >Reporter: Avery Ching >Assignee: Avery Ching > Attachments: facebook.txt, facebook2.txt > > > I've been working with Joe Xie on support to get Giraph running on the > Facebook Hadoop branch. He verified today that the examples worked on their > cluster. I need to clean up my changes a little, but otherwise, will submit > a cleaned up diff. As a side note, does anyone know how we can get Hudson > support for Giraph? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-14) Support for the Facebook Hadoop branch
[ https://issues.apache.org/jira/browse/GIRAPH-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094216#comment-13094216 ] Avery Ching commented on GIRAPH-14: --- Great to hear it! When one of the committers gets a chance to review, I can commit. > Support for the Facebook Hadoop branch > -- > > Key: GIRAPH-14 > URL: https://issues.apache.org/jira/browse/GIRAPH-14 > Project: Giraph > Issue Type: New Feature >Reporter: Avery Ching >Assignee: Avery Ching > Attachments: facebook.txt, facebook2.txt > > > I've been working with Joe Xie on support to get Giraph running on the > Facebook Hadoop branch. He verified today that the examples worked on their > cluster. I need to clean up my changes a little, but otherwise, will submit > a cleaned up diff. As a side note, does anyone know how we can get Hudson > support for Giraph? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-14) Support for the Facebook Hadoop branch
[ https://issues.apache.org/jira/browse/GIRAPH-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094305#comment-13094305 ] Avery Ching commented on GIRAPH-14: --- In theory, I believe that Facebook's distro is online (https://github.com/facebook/hadoop-20-warehouse). The long term story is to factor out the parts into modules and then compile them based on the user profile. Then we don't have to "munge" anything anymore. At least that's what I've thought of for now. I'm open to better solutions. Pre-processing will get unmaintainable if we have to support every version of Hadoop. That being said, we should support the big customers of Giraph and that likely includes Facebook as well. I'll add instructions to the README and submit a new patch. > Support for the Facebook Hadoop branch > -- > > Key: GIRAPH-14 > URL: https://issues.apache.org/jira/browse/GIRAPH-14 > Project: Giraph > Issue Type: New Feature >Reporter: Avery Ching >Assignee: Avery Ching > Attachments: facebook.txt, facebook2.txt > > > I've been working with Joe Xie on support to get Giraph running on the > Facebook Hadoop branch. He verified today that the examples worked on their > cluster. I need to clean up my changes a little, but otherwise, will submit > a cleaned up diff. As a side note, does anyone know how we can get Hudson > support for Giraph? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (GIRAPH-14) Support for the Facebook Hadoop branch
[ https://issues.apache.org/jira/browse/GIRAPH-14?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avery Ching updated GIRAPH-14: -- Attachment: facebook3.patch Updated with README instructions for building with the Facebook Hadoop release. > Support for the Facebook Hadoop branch > -- > > Key: GIRAPH-14 > URL: https://issues.apache.org/jira/browse/GIRAPH-14 > Project: Giraph > Issue Type: New Feature >Reporter: Avery Ching >Assignee: Avery Ching > Attachments: facebook.txt, facebook2.txt, facebook3.patch > > > I've been working with Joe Xie on support to get Giraph running on the > Facebook Hadoop branch. He verified today that the examples worked on their > cluster. I need to clean up my changes a little, but otherwise, will submit > a cleaned up diff. As a side note, does anyone know how we can get Hudson > support for Giraph? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-18) Refactor BspServiceWorker::loadVertices()
[ https://issues.apache.org/jira/browse/GIRAPH-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094330#comment-13094330 ] Avery Ching commented on GIRAPH-18: --- This isn't the only area that needs refactoring =). Let me take a deeper look tomorrow, but initially looks better. > Refactor BspServiceWorker::loadVertices() > - > > Key: GIRAPH-18 > URL: https://issues.apache.org/jira/browse/GIRAPH-18 > Project: Giraph > Issue Type: Improvement >Reporter: Jakob Homan >Assignee: Jakob Homan > Attachments: GIRAPH-18.patch > > > Currently BspServiceWorker::loadVertices() is more than 200 lines and > convoluted. I found it difficult to grok while debugging today. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-18) Refactor BspServiceWorker::loadVertices()
[ https://issues.apache.org/jira/browse/GIRAPH-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094686#comment-13094686 ] Avery Ching commented on GIRAPH-18: --- +1 Nice work refactoring, makes the code more readable. Sorry it took me so long to review, but it's tougher for me without my trusty reviewboard =). I was able to pass the unittests with your changes. I also like your change to save some memory (every bit helps). Couple of style notes: We have a CODE_CONVENTIONS file in the base path, probably this should be updated? I'll file a separate JIRA for this. 1. Map> -> Map> 2. Current style is limited to 80 chars per line (or should be). Maybe this is unrealistic? 3. Some 2 style indentation, i.e. while ((inputSplitPath = reserveInputSplit()) != null) { Map> maxIndexStatMap = > Refactor BspServiceWorker::loadVertices() > - > > Key: GIRAPH-18 > URL: https://issues.apache.org/jira/browse/GIRAPH-18 > Project: Giraph > Issue Type: Improvement >Reporter: Jakob Homan >Assignee: Jakob Homan > Attachments: GIRAPH-18.patch > > > Currently BspServiceWorker::loadVertices() is more than 200 lines and > convoluted. I found it difficult to grok while debugging today. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (GIRAPH-14) Support for the Facebook Hadoop branch
[ https://issues.apache.org/jira/browse/GIRAPH-14?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avery Ching resolved GIRAPH-14. --- Resolution: Fixed Committed, with changelog addition. > Support for the Facebook Hadoop branch > -- > > Key: GIRAPH-14 > URL: https://issues.apache.org/jira/browse/GIRAPH-14 > Project: Giraph > Issue Type: New Feature >Reporter: Avery Ching >Assignee: Avery Ching > Attachments: facebook.txt, facebook2.txt, facebook3.patch > > > I've been working with Joe Xie on support to get Giraph running on the > Facebook Hadoop branch. He verified today that the examples worked on their > cluster. I need to clean up my changes a little, but otherwise, will submit > a cleaned up diff. As a side note, does anyone know how we can get Hudson > support for Giraph? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (GIRAPH-21) Revise CODE_CONVENTIONS
Revise CODE_CONVENTIONS --- Key: GIRAPH-21 URL: https://issues.apache.org/jira/browse/GIRAPH-21 Project: Giraph Issue Type: Improvement Reporter: Avery Ching Priority: Minor Currently there is a CODE_CONVENTIONS file in the base path of Giraph. It's fairly sparse and we have been assuming an 80 char limit per line. It's good to have common conventions so that the code doesn't get too messy. Does anyone have any opinions on this now? Probably best to tackle early and then have something to follow. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-21) Revise CODE_CONVENTIONS
[ https://issues.apache.org/jira/browse/GIRAPH-21?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094768#comment-13094768 ] Avery Ching commented on GIRAPH-21: --- I'll definitely let this continue to flesh out, but 80 chars and 2 spaces is fine with me. I will modify/augment the CODE_CONVENTIONS file and then when we have consensus, I will commit as well as try to get Eclipse to help me change the source to match. Btw, it seems like Owen doesn't like abbreviations, so we can add that here too. =) > Revise CODE_CONVENTIONS > --- > > Key: GIRAPH-21 > URL: https://issues.apache.org/jira/browse/GIRAPH-21 > Project: Giraph > Issue Type: Improvement >Reporter: Avery Ching >Priority: Minor > > Currently there is a CODE_CONVENTIONS file in the base path of Giraph. It's > fairly sparse and we have been assuming an 80 char limit per line. It's good > to have common conventions so that the code doesn't get too messy. Does > anyone have any opinions on this now? Probably best to tackle early and then > have something to follow. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (GIRAPH-21) Revise CODE_CONVENTIONS
[ https://issues.apache.org/jira/browse/GIRAPH-21?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avery Ching reassigned GIRAPH-21: - Assignee: Avery Ching > Revise CODE_CONVENTIONS > --- > > Key: GIRAPH-21 > URL: https://issues.apache.org/jira/browse/GIRAPH-21 > Project: Giraph > Issue Type: Improvement >Reporter: Avery Ching >Assignee: Avery Ching >Priority: Minor > > Currently there is a CODE_CONVENTIONS file in the base path of Giraph. It's > fairly sparse and we have been assuming an 80 char limit per line. It's good > to have common conventions so that the code doesn't get too messy. Does > anyone have any opinions on this now? Probably best to tackle early and then > have something to follow. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-18) Refactor BspServiceWorker::loadVertices()
[ https://issues.apache.org/jira/browse/GIRAPH-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094793#comment-13094793 ] Avery Ching commented on GIRAPH-18: --- +1 Looks better, thanks. Hey, could you add a javadoc comment for readVerticesFromInputSplit? No need to attach the patch again, though, you can just check it in. > Refactor BspServiceWorker::loadVertices() > - > > Key: GIRAPH-18 > URL: https://issues.apache.org/jira/browse/GIRAPH-18 > Project: Giraph > Issue Type: Improvement >Reporter: Jakob Homan >Assignee: Jakob Homan > Attachments: GIRAPH-18.patch, GIRAPH-18b.patch > > > Currently BspServiceWorker::loadVertices() is more than 200 lines and > convoluted. I found it difficult to grok while debugging today. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-19) Create a CHANGES.txt file
[ https://issues.apache.org/jira/browse/GIRAPH-19?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094840#comment-13094840 ] Avery Ching commented on GIRAPH-19: --- Should we add dates too? I don't have a preference on the ordering either, but agree that consistency is important. > Create a CHANGES.txt file > - > > Key: GIRAPH-19 > URL: https://issues.apache.org/jira/browse/GIRAPH-19 > Project: Giraph > Issue Type: Improvement > Components: documentation >Reporter: Owen O'Malley >Assignee: Owen O'Malley > > It is helpful to have a file that is updated with each change along with who > contributed and committed the patch. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-22) Sort out examples from unit test helpers in examples package
[ https://issues.apache.org/jira/browse/GIRAPH-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094901#comment-13094901 ] Avery Ching commented on GIRAPH-22: --- Good idea. I think a few of them are full programs though. Not sure about the best way to do this. > Sort out examples from unit test helpers in examples package > > > Key: GIRAPH-22 > URL: https://issues.apache.org/jira/browse/GIRAPH-22 > Project: Giraph > Issue Type: Improvement >Reporter: Jakob Homan > > Within src/examples there are quite a few files defined that are mainly used > in unit or other tests: > * GeneratedVertexInputFormat > * GeneratedVertexInputFormat > * LongSumAggregator > * MaxAggregator > * MinAggregator > * SimpleCombinerVertex > * SimpleFailVertex > * SimpleMsgVertex > * SimpleMutateGraphVertex > * SimpleSumCombiner > * SumAggregator > * SuperstepBalancer > Several of these explicitly say they're designed to aid in unit testing. If > these are indeed meant for testing, they should be moved to the test > directory. If they're examples, it would be better to sort out the overly > complicated ones and ones that include lots of tests and asserts, so that what remains > shows only the essence of the example. Hopefully the examples directory will have a > few, very heavily documented programs of the helloworld/word count/shortest > path variety (with sample inputs) that can be quickly launched. Once new > developers grok these, they can turn to the unit tests, which can of course > be great sources to learn the code from. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-23) Giraph causes capacity scheduler to report crazy statistics
[ https://issues.apache.org/jira/browse/GIRAPH-23?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094969#comment-13094969 ] Avery Ching commented on GIRAPH-23: --- That is weird, haven't seen this issue on 0.20.203 and 0.20.204, nor on the 0.20.1 release we have at Yahoo!. > Giraph causes capacity scheduler to report crazy statistics > --- > > Key: GIRAPH-23 > URL: https://issues.apache.org/jira/browse/GIRAPH-23 > Project: Giraph > Issue Type: Bug > Environment: Hadoop 20.2, non-secure with capacity scheduler >Reporter: Jakob Homan > > Not sure why, but all our Giraph jobs create crazy values for the scheduler > in terms of number of mappers: > {noformat}51 running map tasks using -52224 map slots, 0 running reduce tasks > using 0 reduce slots. {noformat} > and this trickles out to the whole cluster: > {noformat}Used capacity: -58229 (-12468.7% of Capacity){noformat} > These numbers don't appear to affect the job and they correct themselves a > short time after the Giraph job finishes running. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-24) Job-level statistics reports one superstep greater than workers
[ https://issues.apache.org/jira/browse/GIRAPH-24?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095042#comment-13095042 ] Avery Ching commented on GIRAPH-24: --- I thought I noticed this at some point, but forgot about it. Looks good. +1 > Job-level statistics reports one superstep greater than workers > --- > > Key: GIRAPH-24 > URL: https://issues.apache.org/jira/browse/GIRAPH-24 > Project: Giraph > Issue Type: Bug >Reporter: Jakob Homan >Assignee: Jakob Homan > Attachments: GIRAPH-24.patch > > > In {{BspServiceMaster::coordinateSuperstep()}} the {{superStepCounter}} is > incremented when the coordination begins, but since the counter starts at > zero, this has the job level statistic being at superstep {{n+1}} when the > workers are reporting that they are working on {{n}}. This discrepancy > persists throughout the job. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
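The off-by-one described in the issue above can be reproduced with a small standalone sketch. This is not the BspServiceMaster code and not necessarily how GIRAPH-24.patch fixes it; the class SuperstepCounterSketch and both helper methods are hypothetical and only model a counter that starts at zero and is bumped either before or after the superstep is reported.

```java
import java.util.Arrays;

// Hypothetical model of the counter behaviour, not Giraph code.
class SuperstepCounterSketch {
    /** Counter is incremented when coordination begins: reports n + 1 during superstep n. */
    static long[] reportedWhenIncrementedFirst(int supersteps) {
        long counter = 0;
        long[] reported = new long[supersteps];
        for (int superstep = 0; superstep < supersteps; superstep++) {
            counter++;                      // bumped at the start of coordination
            reported[superstep] = counter;  // value visible while workers run this superstep
        }
        return reported;
    }

    /** Counter is incremented after the superstep: reports n during superstep n. */
    static long[] reportedWhenIncrementedLast(int supersteps) {
        long counter = 0;
        long[] reported = new long[supersteps];
        for (int superstep = 0; superstep < supersteps; superstep++) {
            reported[superstep] = counter;  // value visible while workers run this superstep
            counter++;                      // bumped once the superstep is done
        }
        return reported;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(reportedWhenIncrementedFirst(3)));
        System.out.println(Arrays.toString(reportedWhenIncrementedLast(3)));
    }
}
```

Either moving the increment or starting the counter one lower would remove the discrepancy in this model; which of these the patch uses is not stated in the thread.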
[jira] [Commented] (GIRAPH-13) Port Giraph to YARN
[ https://issues.apache.org/jira/browse/GIRAPH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095043#comment-13095043 ] Avery Ching commented on GIRAPH-13: --- This is going to be a fun one. =) Thanks for taking it on. > Port Giraph to YARN > --- > > Key: GIRAPH-13 > URL: https://issues.apache.org/jira/browse/GIRAPH-13 > Project: Giraph > Issue Type: New Feature >Reporter: Jakob Homan >Assignee: Jakob Homan > > Now that YARN (aka MR2 aka MAPREDUCE-279) has been merged into the Hadoop > trunk, we should think about what it would take to separate out the graph > processing bits of Giraph from the MR1-specific code so as to take advantage > of the less-MR centric aspects of YARN, while still supporting both over the > medium term. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-15) Use of Jenkins for tests and builds
[ https://issues.apache.org/jira/browse/GIRAPH-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095778#comment-13095778 ] Avery Ching commented on GIRAPH-15: --- Thanks for the status update. I'm excited to get this working. > Use of Jenkins for tests and builds > --- > > Key: GIRAPH-15 > URL: https://issues.apache.org/jira/browse/GIRAPH-15 > Project: Giraph > Issue Type: Task >Reporter: Hyunsik Choi >Assignee: Hyunsik Choi > > We can use Jenkins server (https://builds.apache.org/) for regular builds and > tests. To use jenkins, there are some processes. > Here is FAQ about use of Jenkins. > http://wiki.apache.org/general/Hudson -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-15) Use of Jenkins for tests and builds
[ https://issues.apache.org/jira/browse/GIRAPH-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096058#comment-13096058 ] Avery Ching commented on GIRAPH-15: --- Looks good, nice job. > Use of Jenkins for tests and builds > --- > > Key: GIRAPH-15 > URL: https://issues.apache.org/jira/browse/GIRAPH-15 > Project: Giraph > Issue Type: Task >Reporter: Hyunsik Choi >Assignee: Hyunsik Choi > > We can use Jenkins server (https://builds.apache.org/) for regular builds and > tests. To use jenkins, there are some processes. > Here is FAQ about use of Jenkins. > http://wiki.apache.org/general/Hudson -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (GIRAPH-21) Revise CODE_CONVENTIONS
[ https://issues.apache.org/jira/browse/GIRAPH-21?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avery Ching updated GIRAPH-21: -- Attachment: GIRAPH-21.diff First proposal of the developer suggested code conventions. > Revise CODE_CONVENTIONS > --- > > Key: GIRAPH-21 > URL: https://issues.apache.org/jira/browse/GIRAPH-21 > Project: Giraph > Issue Type: Improvement >Reporter: Avery Ching >Assignee: Avery Ching >Priority: Minor > Attachments: GIRAPH-21.diff > > > Currently there is a CODE_CONVENTIONS file in the base path of Giraph. It's > fairly sparse and we have been assuming an 80 char limit per line. It's good > to have common conventions so that the code doesn't get too messy. Does > anyone have any opinions on this now? Probably best to tackle early and then > have something to follow. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-25) NPE in BspServiceMaster when failing a job
[ https://issues.apache.org/jira/browse/GIRAPH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096832#comment-13096832 ] Avery Ching commented on GIRAPH-25: --- Definitely should be handled more gracefully. Thanks for filing the issue. > NPE in BspServiceMaster when failing a job > -- > > Key: GIRAPH-25 > URL: https://issues.apache.org/jira/browse/GIRAPH-25 > Project: Giraph > Issue Type: Bug >Reporter: Dmitriy V. Ryaboy >Priority: Minor > > When BspServiceMaster times out waiting for all workers to check in, it dies > with a NullPointerException. > This can perhaps be handled a bit more gracefully. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-11) Improve the graph distribution of Giraph
[ https://issues.apache.org/jira/browse/GIRAPH-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098098#comment-13098098 ] Avery Ching commented on GIRAPH-11: --- I'm going to assume you're asking about the current partitioning. If I'm wrong, I'll address what we plan to do in the future. The current partitioning is implemented by assuming that the input splits are sorted globally (i.e. two input splits of {A, B, C} {D, E}). It will break the input splits into vertex ranges where the boundaries will not change. These vertex ranges can be passed around the workers via several different balancers. The balancer can be set via setVertexRangeBalancerClass() from GiraphJob or with the right configuration parameter (giraph.vertexRangeBalancerClass). We have some implementations for a static balancer (no vertex movement, default), and an auto balancer (configurable to balance based on vertices or edges). You're free to implement your own as well. Hope that answers some of the questions, let me know if you have more. > Improve the graph distribution of Giraph > > > Key: GIRAPH-11 > URL: https://issues.apache.org/jira/browse/GIRAPH-11 > Project: Giraph > Issue Type: Improvement >Reporter: Avery Ching >Assignee: Avery Ching > > Currently, Giraph assumes that the data from the VertexInputFormat is sorted. > If the user data is not sorted by the vertex id, they must first run a > MapReduce or Pig job to generate a sorted dataset. This is often a bit > inconvenient. > Giraph graph partitioning is currently range based and there are some > advantages and disadvantages of this approach. The proposal of this JIRA > would be to allow for both range and hash based partitioning and provide more > flexibility to the user. > Design goals for the graph distribution: > * Allow vertices to be ordered or unordered > * Ability to repartition > * Select the partitioning scheme based on user needs (i.e. hash or range > based) > * Ability to provide user-specific hints about partitions > Hash-based partitioning > * Good vertex balancing across ranges for random data > * Bad at vertex id locality > Range-based partitioning > * Good at vertex id locality > * Ability to split ranges easily > * Can cause hotspots for hot ranges -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
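The trade-off between the two schemes listed in the issue above can be sketched in a few lines. This is a plain-Java illustration, not Giraph's partitioning code; the class PartitionSketch, both method names, and the use of long vertex ids with inclusive upper bounds per range are assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical illustration of hash-based vs range-based partitioning.
class PartitionSketch {
    /** Hash-based: spreads ids evenly, but neighbouring ids land on unrelated partitions. */
    static int hashPartition(long vertexId, int partitions) {
        return Math.floorMod(Long.hashCode(vertexId), partitions);
    }

    /**
     * Range-based: upperBounds is sorted ascending and upperBounds[i] is the
     * largest vertex id owned by partition i. Ids above the last bound are
     * assigned to the last partition.
     */
    static int rangePartition(long vertexId, long[] upperBounds) {
        int pos = Arrays.binarySearch(upperBounds, vertexId);
        int index = pos >= 0 ? pos : -(pos + 1);  // insertion point when not found
        return Math.min(index, upperBounds.length - 1);
    }

    public static void main(String[] args) {
        long[] upperBounds = {9, 19, 29};
        for (long id : new long[] {5, 9, 10, 25}) {
            System.out.println("id " + id
                + " hash=" + hashPartition(id, upperBounds.length)
                + " range=" + rangePartition(id, upperBounds));
        }
    }
}
```

In this model, splitting a hot range only means inserting a new bound into upperBounds, whereas changing the partition count under hashing reassigns most ids, which is one way to read the "ability to split ranges easily" point.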
[jira] [Commented] (GIRAPH-12) Investigate communication improvements
[ https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098512#comment-13098512 ] Avery Ching commented on GIRAPH-12: --- Jake from Twitter also recommended thinking about using Finagle. His description: "A fault tolerant, protocol-agnostic RPC system" based on Netty [which I see is already under consideration] (written in Scala, but with very mature Java bindings too). We use it internally at Twitter for clusters of mid-tier servers which have many dozens of machines talking to hundreds of other machines, without blowing up on thread-stack or using a gazillion threads. It's mavenized, so it's easy to try out. > Investigate communication improvements > -- > > Key: GIRAPH-12 > URL: https://issues.apache.org/jira/browse/GIRAPH-12 > Project: Giraph > Issue Type: Improvement >Reporter: Avery Ching >Assignee: Hyunsik Choi >Priority: Minor > > Currently every worker will start up a thread to communicate with every other > worker. Hadoop RPC is used for communication. For instance if there are > 400 workers, each worker will create 400 threads. This ends up using a lot > of memory, even with the option > -Dmapred.child.java.opts="-Xss64k". > It would be good to investigate using frameworks like Netty or rolling > our own to improve this situation. By moving away from Hadoop RPC, we would > also make compatibility of different Hadoop versions easier. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-26) Improve PseudoRandomVertexInputFormat to create a more realistic synthetic graph (e.g. power-law distributed vertex-cardinality).
[ https://issues.apache.org/jira/browse/GIRAPH-26?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098594#comment-13098594 ] Avery Ching commented on GIRAPH-26: --- Totally agree, any chance you might have some time to work on this? =) > Improve PseudoRandomVertexInputFormat to create a more realistic synthetic > graph (e.g. power-law distributed vertex-cardinality). > - > > Key: GIRAPH-26 > URL: https://issues.apache.org/jira/browse/GIRAPH-26 > Project: Giraph > Issue Type: Test > Components: benchmark >Reporter: Jake Mannix >Priority: Minor > > The PageRankBenchmark class, to be a proper benchmark, should run over graphs > which look more like data seen in the wild, and web link graphs, social > network graphs, and text corpora (represented as a bipartite graph) all have > power-law distributions, so benchmarking a synthetic graph which looks more > like this would be a nice test which would stress cases of uneven > split-distribution and bottlenecks of subclusters of the graph of heavily > connected vertices. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-25) NPE in BspServiceMaster when failing a job
[ https://issues.apache.org/jira/browse/GIRAPH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099055#comment-13099055 ] Avery Ching commented on GIRAPH-25: --- Thanks for the patch Dmitriy! I'll review it, add a unittest and then commit if it works as expected. > NPE in BspServiceMaster when failing a job > -- > > Key: GIRAPH-25 > URL: https://issues.apache.org/jira/browse/GIRAPH-25 > Project: Giraph > Issue Type: Bug >Reporter: Dmitriy V. Ryaboy >Priority: Minor > Attachments: GIRAPH-25.patch > > > When BspServiceMaster times out waiting for all workers to check in, it dies > with a NullPointerException. > This can perhaps be handled a bit more gracefully. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-26) Improve PseudoRandomVertexInputFormat to create a more realistic synthetic graph (e.g. power-law distributed vertex-cardinality).
[ https://issues.apache.org/jira/browse/GIRAPH-26?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099693#comment-13099693 ] Avery Ching commented on GIRAPH-26: --- From a skim of the paper, it appears that you will need to use Giraph to create your scale free graph with a number of iterations (supersteps). > Improve PseudoRandomVertexInputFormat to create a more realistic synthetic > graph (e.g. power-law distributed vertex-cardinality). > - > > Key: GIRAPH-26 > URL: https://issues.apache.org/jira/browse/GIRAPH-26 > Project: Giraph > Issue Type: Test > Components: benchmark >Reporter: Jake Mannix >Priority: Minor > > The PageRankBenchmark class, to be a proper benchmark, should run over graphs > which look more like data seen in the wild, and web link graphs, social > network graphs, and text corpora (represented as a bipartite graph) all have > power-law distributions, so benchmarking a synthetic graph which looks more > like this would be a nice test which would stress cases of uneven > split-distribution and bottlenecks of subclusters of the graph of heavily > connected vertices. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (GIRAPH-25) NPE in BspServiceMaster when failing a job
[ https://issues.apache.org/jira/browse/GIRAPH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avery Ching updated GIRAPH-25: -- Attachment: GIRAPH-25.2.patch Minor changes to the original (unittest, error message). > NPE in BspServiceMaster when failing a job > -- > > Key: GIRAPH-25 > URL: https://issues.apache.org/jira/browse/GIRAPH-25 > Project: Giraph > Issue Type: Bug >Reporter: Dmitriy V. Ryaboy >Priority: Minor > Attachments: GIRAPH-25.2.patch, GIRAPH-25.patch > > > When BspServiceMaster times out waiting for all workers to check in, it dies > with a NullPointerException. > This can perhaps be handled a bit more gracefully. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-25) NPE in BspServiceMaster when failing a job
[ https://issues.apache.org/jira/browse/GIRAPH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100564#comment-13100564 ] Avery Ching commented on GIRAPH-25: --- Patch worked nicely. I added a unittest and tweaked an error message. Here's some example output I got (looks much better). ... 2011-09-08 11:20:35,203 INFO org.apache.giraph.graph.BspServiceMaster: checkWorkers: Only found 0 responses of 32767 needed to start superstep -1. Sleeping for 1 msecs and used 0 of 1 attempts. 2011-09-08 11:20:35,203 ERROR org.apache.giraph.graph.BspServiceMaster: checkWorkers: Did not receive enough processes in time (only 0 of 32767 required). This occurs if you do not have enough map tasks available simultaneously on your Hadoop instance to fulfill the number of requested workers. 2011-09-08 11:20:35,276 INFO org.apache.giraph.graph.BspServiceMaster: setJobState: {"_stateKey":"FAILED","_applicationAttemptKey":-1,"_superstepKey":-1} on superstep -1 2011-09-08 11:20:35,333 FATAL org.apache.giraph.graph.BspServiceMaster: failJob: Killing job job_201109080935_0009 2011-09-08 11:20:35,619 INFO org.apache.giraph.graph.BspServiceMaster: cleanup: Notifying master its okay to cleanup with /_hadoopBsp/job_201109080935_0009/_cleanedUpDir/0_master 2011-09-08 11:20:35,620 INFO org.apache.giraph.graph.BspServiceMaster: cleanUpZooKeeper: Node /_hadoopBsp/job_201109080935_0009/_cleanedUpDir already exists, no need to create. 2011-09-08 11:20:35,621 INFO org.apache.giraph.graph.BspServiceMaster: cleanUpZooKeeper: Got 1 of 32768 desired children from /_hadoopBsp/job_201109080935_0009/_cleanedUpDir 2011-09-08 11:20:35,621 INFO org.apache.giraph.graph.BspServiceMaster: cleanedUpZooKeeper: Waiting for the children of /_hadoopBsp/job_201109080935_0009/_cleanedUpDir to change since only got 1 nodes. 2011-09-08 11:20:38,182 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper process. 
I'll upload the minor changes and then commit it on your behalf. I ran unittests in local mode and also on a small Hadoop instance. Thanks! > NPE in BspServiceMaster when failing a job > -- > > Key: GIRAPH-25 > URL: https://issues.apache.org/jira/browse/GIRAPH-25 > Project: Giraph > Issue Type: Bug >Reporter: Dmitriy V. Ryaboy >Priority: Minor > Attachments: GIRAPH-25.2.patch, GIRAPH-25.patch > > > When BspServiceMaster times out waiting for all workers to check in, it dies > with a NullPointerException. > This can perhaps be handled a bit more gracefully. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (GIRAPH-25) NPE in BspServiceMaster when failing a job
[ https://issues.apache.org/jira/browse/GIRAPH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avery Ching reassigned GIRAPH-25: - Assignee: Dmitriy V. Ryaboy > NPE in BspServiceMaster when failing a job > -- > > Key: GIRAPH-25 > URL: https://issues.apache.org/jira/browse/GIRAPH-25 > Project: Giraph > Issue Type: Bug >Reporter: Dmitriy V. Ryaboy >Assignee: Dmitriy V. Ryaboy >Priority: Minor > Attachments: GIRAPH-25.2.patch, GIRAPH-25.patch > > > When BspServiceMaster times out waiting for all workers to check in, it dies > with a NullPointerException. > This can perhaps be handled a bit more gracefully. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-25) NPE in BspServiceMaster when failing a job
[ https://issues.apache.org/jira/browse/GIRAPH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100667#comment-13100667 ] Avery Ching commented on GIRAPH-25: --- Yup, I added you and Jakob to the contributors list and assigned to you. I agree with your commit message description to not fill up the svn logs. > NPE in BspServiceMaster when failing a job > -- > > Key: GIRAPH-25 > URL: https://issues.apache.org/jira/browse/GIRAPH-25 > Project: Giraph > Issue Type: Bug >Reporter: Dmitriy V. Ryaboy >Assignee: Dmitriy V. Ryaboy >Priority: Minor > Attachments: GIRAPH-25.2.patch, GIRAPH-25.patch > > > When BspServiceMaster times out waiting for all workers to check in, it dies > with a NullPointerException. > This can perhaps be handled a bit more gracefully. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (GIRAPH-25) NPE in BspServiceMaster when failing a job
[ https://issues.apache.org/jira/browse/GIRAPH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100667#comment-13100667 ] Avery Ching edited comment on GIRAPH-25 at 9/8/11 8:31 PM: --- Yup, I added you and Jake to the contributors list and assigned to you. I agree with your commit message description to not fill up the svn logs. was (Author: aching): Yup, I added you and Jakob to the contributors list and assigned to you. I agree with your commit message description to not fill up the svn logs. > NPE in BspServiceMaster when failing a job > -- > > Key: GIRAPH-25 > URL: https://issues.apache.org/jira/browse/GIRAPH-25 > Project: Giraph > Issue Type: Bug >Reporter: Dmitriy V. Ryaboy >Assignee: Dmitriy V. Ryaboy >Priority: Minor > Attachments: GIRAPH-25.2.patch, GIRAPH-25.patch > > > When BspServiceMaster times out waiting for all workers to check in, it dies > with a NullPointerException. > This can perhaps be handled a bit more gracefully. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored
[ https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100829#comment-13100829 ] Avery Ching commented on GIRAPH-27: --- I'm revising it a little with some formatting changes, but overall it looks good. I'd like to submit a slight revision before commit. > Mutable static global state in Vertex.java should be refactored > --- > > Key: GIRAPH-27 > URL: https://issues.apache.org/jira/browse/GIRAPH-27 > Project: Giraph > Issue Type: Improvement > Components: graph >Affects Versions: 0.70.0 >Reporter: Jake Mannix >Assignee: Jake Mannix > Attachments: GIRAPH-27.patch, GIRAPH-27.patch > > > Vertex.java has a bunch of static methods for getting/setting global graph > state (total number of vertices, edges, a reference to the GraphMapper, etc). > Refactoring this into a GraphState object, which every Vertex can hold onto > a reference to (yes, a tiny bit more memory per Vertex, but in comparison to > what's already in there...) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored
[ https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100830#comment-13100830 ] Avery Ching commented on GIRAPH-27: --- btw, I'll also run the page rank benchmark on a real cluster as well. > Mutable static global state in Vertex.java should be refactored > --- > > Key: GIRAPH-27 > URL: https://issues.apache.org/jira/browse/GIRAPH-27 > Project: Giraph > Issue Type: Improvement > Components: graph >Affects Versions: 0.70.0 >Reporter: Jake Mannix >Assignee: Jake Mannix > Attachments: GIRAPH-27.patch, GIRAPH-27.patch > > > Vertex.java has a bunch of static methods for getting/setting global graph > state (total number of vertices, edges, a reference to the GraphMapper, etc). > Refactoring this into a GraphState object, which every Vertex can hold onto > a reference to (yes, a tiny bit more memory per Vertex, but in comparison to > what's already in there...) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored
[ https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100849#comment-13100849 ] Avery Ching commented on GIRAPH-27: --- I made some revisions to Jake's fix to - Do not expose GraphState to application developers - Fixing a few formatting issues https://reviews.apache.org/r/1771/ I also passed unittests and ran the PageRankBenchmark on a Yahoo! cluster with 100 workers and 500k vertices. If Jake is okay with the changes then I'll commit it on his behalf. > Mutable static global state in Vertex.java should be refactored > --- > > Key: GIRAPH-27 > URL: https://issues.apache.org/jira/browse/GIRAPH-27 > Project: Giraph > Issue Type: Improvement > Components: graph >Affects Versions: 0.70.0 >Reporter: Jake Mannix >Assignee: Jake Mannix > Attachments: GIRAPH-27.patch, GIRAPH-27.patch > > > Vertex.java has a bunch of static methods for getting/setting global graph > state (total number of vertices, edges, a reference to the GraphMapper, etc). > Refactoring this into a GraphState object, which every Vertex can hold onto > a reference to (yes, a tiny bit more memory per Vertex, but in comparison to > what's already in there...) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored
[ https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100856#comment-13100856 ] Avery Ching commented on GIRAPH-27: --- Thanks. I just updated it. > Mutable static global state in Vertex.java should be refactored > --- > > Key: GIRAPH-27 > URL: https://issues.apache.org/jira/browse/GIRAPH-27 > Project: Giraph > Issue Type: Improvement > Components: graph >Affects Versions: 0.70.0 >Reporter: Jake Mannix >Assignee: Jake Mannix > Attachments: GIRAPH-27.patch, GIRAPH-27.patch > > > Vertex.java has a bunch of static methods for getting/setting global graph > state (total number of vertices, edges, a reference to the GraphMapper, etc). > Refactoring this into a GraphState object, which every Vertex can hold onto > a reference to (yes, a tiny bit more memory per Vertex, but in comparison to > what's already in there...) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored
[ https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100928#comment-13100928 ] Avery Ching commented on GIRAPH-27: --- That is actually intentional, since I need to have access to the get/setGraphState() internally and I removed the get/setGraphState() from BasicVertex. So rather than expose get/setGraphState() to the user (BasicVertex), I opted to do this. I suppose we could have another interface internally that extended BasicVertex to allow getting and setting the graph state if you're concerned about exposing the vertex to the internals. Let me know what you think. > Mutable static global state in Vertex.java should be refactored > --- > > Key: GIRAPH-27 > URL: https://issues.apache.org/jira/browse/GIRAPH-27 > Project: Giraph > Issue Type: Improvement > Components: graph >Affects Versions: 0.70.0 >Reporter: Jake Mannix >Assignee: Jake Mannix > Attachments: GIRAPH-27.patch, GIRAPH-27.patch > > > Vertex.java has a bunch of static methods for getting/setting global graph > state (total number of vertices, edges, a reference to the GraphMapper, etc). > Refactoring this into a GraphState object, which every Vertex can hold onto > a reference to (yes, a tiny bit more memory per Vertex, but in comparison to > what's already in there...) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored
[ https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100948#comment-13100948 ] Avery Ching commented on GIRAPH-27: --- One alternative is to change BasicVertex to an abstract class that implements get/setGraphState as package private methods. Users won't have access to get/setGraphState, while your primitive implementation would (since it's part of the same package). Thoughts? If you like it, I can submit a revised reviewboard request. > Mutable static global state in Vertex.java should be refactored > --- > > Key: GIRAPH-27 > URL: https://issues.apache.org/jira/browse/GIRAPH-27 > Project: Giraph > Issue Type: Improvement > Components: graph >Affects Versions: 0.70.0 >Reporter: Jake Mannix >Assignee: Jake Mannix > Attachments: GIRAPH-27.patch, GIRAPH-27.patch > > > Vertex.java has a bunch of static methods for getting/setting global graph > state (total number of vertices, edges, a reference to the GraphMapper, etc). > Refactoring this into a GraphState object, which every Vertex can hold onto > a reference to (yes, a tiny bit more memory per Vertex, but in comparison to > what's already in there...) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
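The abstract-class alternative described in this comment can be sketched roughly as follows. This is only an illustration: `GraphStateSketch` and `BasicVertexSketch` are hypothetical, simplified stand-ins, not the actual Giraph classes, and the real patch may look quite different. In a real code base the two classes would live in the framework's package, so user code in other packages could not call the package-private accessors.

```java
// Hypothetical stand-in for the proposed GraphState object (not the real Giraph class).
final class GraphStateSketch {
    private final long numVertices;
    private final long numEdges;

    GraphStateSketch(long numVertices, long numEdges) {
        this.numVertices = numVertices;
        this.numEdges = numEdges;
    }

    long getNumVertices() { return numVertices; }
    long getNumEdges() { return numEdges; }
}

// Hypothetical stand-in for BasicVertex as an abstract class.
abstract class BasicVertexSketch {
    private GraphStateSketch graphState;

    // Package-private (no modifier): only visible to code in the same package,
    // i.e. the framework internals and any implementation that lives alongside them.
    final void setGraphState(GraphStateSketch graphState) {
        this.graphState = graphState;
    }

    final GraphStateSketch getGraphState() {
        return graphState;
    }

    // Public, read-only views that application developers may use.
    public final long getNumVertices() {
        return graphState.getNumVertices();
    }

    public final long getNumEdges() {
        return graphState.getNumEdges();
    }

    // User-defined computation; subclasses outside the package can implement this
    // without ever seeing the GraphState type.
    public abstract void compute();
}
```

With this shape, each vertex holds one extra reference (the small per-vertex memory cost mentioned in the issue description), while the mutable static state disappears.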
[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored
[ https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101010#comment-13101010 ] Avery Ching commented on GIRAPH-27: --- Let's do this Jake, I'll do some of the cleanup here and repost. If you agree, I'll commit. Then you can make any additional changes in another JIRA that is specific for your primitives implementation. What do you think? I can do some of it in the next 10-15 min. > Mutable static global state in Vertex.java should be refactored > --- > > Key: GIRAPH-27 > URL: https://issues.apache.org/jira/browse/GIRAPH-27 > Project: Giraph > Issue Type: Improvement > Components: graph >Affects Versions: 0.70.0 >Reporter: Jake Mannix >Assignee: Jake Mannix > Attachments: GIRAPH-27.patch, GIRAPH-27.patch > > > Vertex.java has a bunch of static methods for getting/setting global graph > state (total number of vertices, edges, a reference to the GraphMapper, etc). > Refactoring this into a GraphState object, which every Vertex can hold onto > a reference to (yes, a tiny bit more memory per Vertex, but in comparison to > what's already in there...) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored
[ https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101030#comment-13101030 ] Avery Ching commented on GIRAPH-27: --- Btw, with respect to import ordering, can you please voice your preferences on https://issues.apache.org/jira/browse/GIRAPH-21 ? Thanks! > Mutable static global state in Vertex.java should be refactored > --- > > Key: GIRAPH-27 > URL: https://issues.apache.org/jira/browse/GIRAPH-27 > Project: Giraph > Issue Type: Improvement > Components: graph >Affects Versions: 0.70.0 >Reporter: Jake Mannix >Assignee: Jake Mannix > Attachments: GIRAPH-27.patch, GIRAPH-27.patch > > > Vertex.java has a bunch of static methods for getting/setting global graph > state (total number of vertices, edges, a reference to the GraphMapper, etc). > Refactoring this into a GraphState object, which every Vertex can hold onto > a reference to (yes, a tiny bit more memory per Vertex, but in comparison to > what's already in there...) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored
[ https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101031#comment-13101031 ] Avery Ching commented on GIRAPH-27: --- Waiting on another committer to +1 before committing. > Mutable static global state in Vertex.java should be refactored > --- > > Key: GIRAPH-27 > URL: https://issues.apache.org/jira/browse/GIRAPH-27 > Project: Giraph > Issue Type: Improvement > Components: graph >Affects Versions: 0.70.0 >Reporter: Jake Mannix >Assignee: Jake Mannix > Attachments: GIRAPH-27.patch, GIRAPH-27.patch > > > Vertex.java has a bunch of static methods for getting/setting global graph > state (total number of vertices, edges, a reference to the GraphMapper, etc). > Refactoring this into a GraphState object, which every Vertex can hold onto > a reference to (yes, a tiny bit more memory per Vertex, but in comparison to > what's already in there...) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-11) Improve the graph distribution of Giraph
[ https://issues.apache.org/jira/browse/GIRAPH-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101593#comment-13101593 ] Avery Ching commented on GIRAPH-11: --- The hash partitioning will be based on hashCode() by default, but the user can implement something they like as well based on the vertex id. I am designing it to support both hash based and hash-range based partitioning. In a pure hash-based distribution, you should get great load balancing. In a hash-range based distribution, the user could possibly get some locality benefits without changing anything from the hash based partitioning. Then finally, there should be a way for the user to do a pure range based split of the id space, but this requires the most work by the user to specify their division of the id space (depends on the type). The hash based and hash-range based schemes will be implemented by default and will be selectable by users. The range based scheme will be a partial implementation since we require users to do the id range partitioning. Additionally, we will provide the API for users to implement their own graph partitioning scheme. Let me know what you think. > Improve the graph distribution of Giraph > > > Key: GIRAPH-11 > URL: https://issues.apache.org/jira/browse/GIRAPH-11 > Project: Giraph > Issue Type: Improvement >Reporter: Avery Ching >Assignee: Avery Ching > > Currently, Giraph assumes that the data from the VertexInputFormat is sorted. > If the user data is not sorted by the vertex id, they must first run a > MapReduce or Pig job to generate a sorted dataset. This is often a bit > inconvenient. > Giraph graph partitioning is currently range based and there are some > advantages and disadvantages of this approach. The proposal of this JIRA > would be to allow for both range and hash based partitioning and provide more > flexibility to the user. 
> Design goals for the graph distribution: > * Allow vertices to be ordered or unordered > * Ability to repartition > * Select the partitioning scheme based on user needs (i.e. hash or range > based) > * Ability to provide user-specific hints about partitions > Hash-based partitioning > * Good vertex balancing across ranges for random data > * Bad at vertex id locality > Range-based partitioning > * Good at vertex id locality > * Ability to split ranges easily > * Can cause hotspots for hot ranges -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (GIRAPH-25) NPE in BspServiceMaster when failing a job
[ https://issues.apache.org/jira/browse/GIRAPH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avery Ching resolved GIRAPH-25. --- Resolution: Fixed Not sure if I am supposed to close this issue, or the reporter should, but I'll close it since it's been committed. Please reopen if there is an issue. > NPE in BspServiceMaster when failing a job > -- > > Key: GIRAPH-25 > URL: https://issues.apache.org/jira/browse/GIRAPH-25 > Project: Giraph > Issue Type: Bug >Reporter: Dmitriy V. Ryaboy >Assignee: Dmitriy V. Ryaboy >Priority: Minor > Attachments: GIRAPH-25.2.patch, GIRAPH-25.patch > > > When BspServiceMaster times out waiting for all workers to check in, it dies > with a NullPointerException. > This can perhaps be handled a bit more gracefully. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-11) Improve the graph distribution of Giraph
[ https://issues.apache.org/jira/browse/GIRAPH-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101710#comment-13101710 ] Avery Ching commented on GIRAPH-11: --- Regarding the difference between hash based and hash-range based, it refers to how the hash code is assigned to a partition. The application dev will implement hashCode() for their vertex id and then the assignment of the hashCode() to a partition can be hashed (i.e. hashCode() % # partitions) or range based ([0-a),[a-b)...etc). Hope that's more clear. Code will help. It's coming soon, by mid next week I hope. > Improve the graph distribution of Giraph > > > Key: GIRAPH-11 > URL: https://issues.apache.org/jira/browse/GIRAPH-11 > Project: Giraph > Issue Type: Improvement >Reporter: Avery Ching >Assignee: Avery Ching > > Currently, Giraph assumes that the data from the VertexInputFormat is sorted. > If the user data is not sorted by the vertex id, they must first run a > MapReduce or Pig job to generate a sorted dataset. This is often a bit > inconvenient. > Giraph graph partitioning is currently range based and there are some > advantages and disadvantages of this approach. The proposal of this JIRA > would be to allow for both range and hash based partitioning and provide more > flexibility to the user. > Design goals for the graph distribution: > * Allow vertices to be ordered or unordered > * Ability to repartition > * Select the partitioning scheme based on user needs (i.e. hash or range > based) > * Ability to provide user-specific hints about partitions > Hash-based partitioning > * Good vertex balancing across ranges for random data > * Bad at vertex id locality > Range-based partitioning > * Good at vertex id locality > * Ability to split ranges easily > * Can cause hotspots for hot ranges -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
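Until the promised code arrives, the two ways of assigning a hashCode() to a partition can be illustrated with a small sketch. The class and method names below are made up for illustration and are not Giraph's partitioning API; the even split of the hash space in the range variant is also an assumption, since the comment only says the ranges look like [0-a), [a-b), and so on.

```java
// Hypothetical sketch of the two assignment schemes; not Giraph's actual API.
final class PartitionAssignmentSketch {
    private PartitionAssignmentSketch() { }

    /** Pure hash based: hashCode() % #partitions, kept non-negative for negative hashes. */
    static int hashPartition(Object vertexId, int numPartitions) {
        return Math.floorMod(vertexId.hashCode(), numPartitions);
    }

    /**
     * Hash-range based: the 32-bit hash space is cut into numPartitions contiguous,
     * equally sized ranges (an assumption), and the partition is the range the hash falls in.
     */
    static int hashRangePartition(Object vertexId, int numPartitions) {
        // Shift the signed hash into [0, 2^32) so the ranges start at zero.
        long shifted = (long) vertexId.hashCode() - Integer.MIN_VALUE;
        // Ceiling division so the last range still covers the top of the hash space.
        long rangeSize = ((1L << 32) + numPartitions - 1) / numPartitions;
        return (int) (shifted / rangeSize);
    }

    public static void main(String[] args) {
        Integer vertexId = 42; // Integer.hashCode() is the value itself
        System.out.println("hash based:       " + hashPartition(vertexId, 4));
        System.out.println("hash-range based: " + hashRangePartition(vertexId, 4));
    }
}
```

In the modulo variant, neighbouring hash codes land on different partitions, which balances load but scatters nearby ids. In the range variant, neighbouring hash codes stay together, which is where the possible locality benefit would come from if the user's hashCode() preserves some ordering of the ids.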
[jira] [Commented] (GIRAPH-25) NPE in BspServiceMaster when failing a job
[ https://issues.apache.org/jira/browse/GIRAPH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101962#comment-13101962 ] Avery Ching commented on GIRAPH-25: --- Thanks for the advice. I'll be doing the same this weekend =). > NPE in BspServiceMaster when failing a job > -- > > Key: GIRAPH-25 > URL: https://issues.apache.org/jira/browse/GIRAPH-25 > Project: Giraph > Issue Type: Bug >Reporter: Dmitriy V. Ryaboy >Assignee: Dmitriy V. Ryaboy >Priority: Minor > Attachments: GIRAPH-25.2.patch, GIRAPH-25.patch > > > When BspServiceMaster times out waiting for all workers to check in, it dies > with a NullPointerException. > This can perhaps be handled a bit more gracefully. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-29) Implement TextVertexInputFormat for text-format graph data
[ https://issues.apache.org/jira/browse/GIRAPH-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102104#comment-13102104 ] Avery Ching commented on GIRAPH-29: --- I agree that text format graph data is nice. We have helper abstract classes based on TextInputFormat and TextOutputFormat to do this: org.apache.giraph.lib.TextVertexInputFormat org.apache.giraph.lib.TextVertexOutputFormat Example implementations that use those helper classes are org.apache.giraph.lib.JsonBase64VertexInputFormat org.apache.giraph.lib.JsonBase64VertexOutputFormat Does this satisfy your needs? Any suggestions for improvement? > Implement TextVertexInputFormat for text-format graph data > -- > > Key: GIRAPH-29 > URL: https://issues.apache.org/jira/browse/GIRAPH-29 > Project: Giraph > Issue Type: New Feature > Components: bsp >Reporter: Hyunsik Choi >Assignee: Hyunsik Choi >Priority: Minor > Fix For: 0.70.0 > > > Supporting text-format graph data would be nice. It is helpful for developing > graph algorithms and debugging because text-format graph data are > human-readable and enable users to easily write sample data sets. > Furthermore, text-format data are exchangeable regardless of operating > systems or programming languages. > So, we need a basic InputFormat to help users develop user-defined > InputFormat classes to deal with text-represented graph data sets. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-29) Implement TextVertexInputFormat for text-format graph data
[ https://issues.apache.org/jira/browse/GIRAPH-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102142#comment-13102142 ] Avery Ching commented on GIRAPH-29: --- No problem, the real issue is that there is little documentation (my fault). Contrary to MapReduce (map tasks = input splits), the number of workers need not equal the number of input splits from the VertexInputFormat. Workers in Giraph process InputSplits as fast as possible and may process 0 or more InputSplits. > Implement TextVertexInputFormat for text-format graph data > -- > > Key: GIRAPH-29 > URL: https://issues.apache.org/jira/browse/GIRAPH-29 > Project: Giraph > Issue Type: New Feature > Components: bsp >Reporter: Hyunsik Choi >Assignee: Hyunsik Choi >Priority: Minor > Fix For: 0.70.0 > > > Supporting text-format graph data would be nice. It is helpful for developing > graph algorithms and debugging because text-format graph data are > human-readable and enable users to easily write sample data sets. > Furthermore, text-format data are exchangeable regardless of operating > systems or programming languages. > So, we need a basic InputFormat to help users develop user-defined > InputFormat classes to deal with text-represented graph data sets. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
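The point that workers and input splits are decoupled can be pictured with a toy model: a shared pool of unclaimed splits that each worker drains as fast as it can. This is only an in-process analogy using threads and a queue. How Giraph actually coordinates split claiming between workers is not described in this thread, and `InputSplitClaimSketch` and its method are hypothetical names.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Toy model only: threads stand in for workers, integers stand in for InputSplits.
final class InputSplitClaimSketch {
    private InputSplitClaimSketch() { }

    /** Returns how many splits each worker ended up processing. */
    static int[] processSplits(int numWorkers, int numSplits) throws InterruptedException {
        Queue<Integer> unclaimed = new ConcurrentLinkedQueue<>();
        for (int split = 0; split < numSplits; split++) {
            unclaimed.add(split);
        }

        AtomicIntegerArray processed = new AtomicIntegerArray(numWorkers);
        ExecutorService pool = Executors.newFixedThreadPool(numWorkers);
        for (int w = 0; w < numWorkers; w++) {
            final int worker = w;
            pool.execute(() -> {
                // Each worker keeps claiming splits until none are left, so a fast
                // worker may take many and a slow or late one may take none.
                while (unclaimed.poll() != null) {
                    processed.incrementAndGet(worker);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);

        int[] result = new int[numWorkers];
        for (int w = 0; w < numWorkers; w++) {
            result[w] = processed.get(w);
        }
        return result;
    }
}
```

The per-worker counts vary from run to run, but they always sum to the number of splits, which is the property the comment describes: every split is processed exactly once, by whichever worker gets to it first.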
[jira] [Updated] (GIRAPH-30) NPE in ZooKeeperManager if base directory cannot be created
[ https://issues.apache.org/jira/browse/GIRAPH-30?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avery Ching updated GIRAPH-30: -- Attachment: GIRAPH-30.2.patch > NPE in ZooKeeperManager if base directory cannot be created > --- > > Key: GIRAPH-30 > URL: https://issues.apache.org/jira/browse/GIRAPH-30 > Project: Giraph > Issue Type: Bug >Reporter: Andrew Purtell >Priority: Minor > Attachments: GIRAPH-30.2.patch, GIRAPH-30.patch > > > If the base directory cannot be created, for example if running on secure > Hadoop and the user home directory does not exist, ZooKeeperManager will > throw an NPE when trying to list it. It would be better to throw an > IOException with an informative message. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
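The behaviour requested in this issue, failing with an informative IOException rather than a NullPointerException, might look something like the sketch below. It uses `java.io.File` as a stand-in for whatever file system API ZooKeeperManager really uses, and the class name, method name, and messages are invented for illustration; the attached patches are not reproduced here.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch; java.io.File stands in for the real file system API.
final class BaseDirectorySketch {
    private BaseDirectorySketch() { }

    /** Creates the base directory if needed and lists it, never returning null. */
    static String[] listBaseDirectory(File baseDir) throws IOException {
        if (!baseDir.isDirectory() && !baseDir.mkdirs()) {
            throw new IOException("listBaseDirectory: Failed to create base directory "
                + baseDir.getAbsolutePath()
                + " (does the parent directory exist and is it writable?)");
        }
        // File.list() returns null on I/O errors; dereferencing that null is the
        // kind of mistake that surfaces as an NPE instead of a useful error.
        String[] children = baseDir.list();
        if (children == null) {
            throw new IOException("listBaseDirectory: Failed to list base directory "
                + baseDir.getAbsolutePath());
        }
        return children;
    }
}
```

The explicit check costs one extra branch, and the caller gets a message naming the directory that could not be created, which is what the reporter asks for in the secure-Hadoop case where the user's home directory is missing.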