[jira] [Commented] (GIRAPH-185) Improve concurrency of putMsg / putMsgList

2012-04-25 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261906#comment-13261906
 ] 

Avery Ching commented on GIRAPH-185:


I agree that a benchmark should be done, although I expect the impact to be 
very small.  We should at least show it's not slower. =)

> Improve concurrency of putMsg / putMsgList
> --
>
> Key: GIRAPH-185
> URL: https://issues.apache.org/jira/browse/GIRAPH-185
> Project: Giraph
>  Issue Type: Improvement
>  Components: graph
>Affects Versions: 0.2.0
>Reporter: Bo Wang
>Assignee: Bo Wang
> Fix For: 0.2.0
>
> Attachments: GIRAPH-185.patch, GIRAPH-185.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Currently in putMsg / putMsgList, a synchronized block is used to protect 
> the whole transientInMessages when adding the new message. This lock prevents 
> other concurrent calls to putMsg/putMsgList and increases the response time. 
> We should use fine-grained locks to allow high concurrency in message 
> communication.
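
A minimal sketch of the fine-grained locking idea described above, assuming Java 8 or later for computeIfAbsent. The class FineGrainedInbox and the method messageCount are hypothetical names used for illustration only; this is not the attached patch and not Giraph's actual message store. The map is concurrent, and each call locks only the destination vertex's list instead of the whole transientInMessages structure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical stand-in for the message store; not Giraph's actual class. */
final class FineGrainedInbox<I, M> {
    // One list per destination vertex; the map itself tolerates concurrent access.
    private final ConcurrentMap<I, List<M>> transientInMessages =
        new ConcurrentHashMap<>();

    /** Adds one message, locking only the destination vertex's list. */
    void putMsg(I vertexId, M msg) {
        List<M> msgs =
            transientInMessages.computeIfAbsent(vertexId, id -> new ArrayList<>());
        synchronized (msgs) {
            msgs.add(msg);
        }
    }

    /** Adds a batch of messages under the same per-vertex lock. */
    void putMsgList(I vertexId, List<M> newMsgs) {
        List<M> msgs =
            transientInMessages.computeIfAbsent(vertexId, id -> new ArrayList<>());
        synchronized (msgs) {
            msgs.addAll(newMsgs);
        }
    }

    /** Illustrative helper: number of messages currently queued for a vertex. */
    int messageCount(I vertexId) {
        List<M> msgs = transientInMessages.get(vertexId);
        if (msgs == null) {
            return 0;
        }
        synchronized (msgs) {
            return msgs.size();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        FineGrainedInbox<Integer, String> inbox = new FineGrainedInbox<>();
        Thread[] writers = new Thread[4];
        for (int t = 0; t < writers.length; t++) {
            writers[t] = new Thread(() -> {
                // Each writer alternates between two destination vertices.
                for (int i = 0; i < 1000; i++) {
                    inbox.putMsg(i % 2, "m");
                }
            });
            writers[t].start();
        }
        for (Thread w : writers) {
            w.join();
        }
        System.out.println(inbox.messageCount(0) + " " + inbox.messageCount(1));
    }
}
```

Writers that target different vertices no longer contend with each other; writers that target the same vertex still serialize on that vertex's list. Whether this is measurably faster is exactly what the benchmark discussed above would need to show.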

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-30 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265155#comment-13265155
 ] 

Avery Ching commented on GIRAPH-153:


I'll take a look, sorry for the delay.

> HBase/Accumulo Input and Output formats
> ---
>
> Key: GIRAPH-153
> URL: https://issues.apache.org/jira/browse/GIRAPH-153
> Project: Giraph
>  Issue Type: New Feature
>  Components: bsp
>Affects Versions: 0.1.0
> Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
>Reporter: Brian Femiano
> Attachments: GIRAPH-153.1.patch, GIRAPH-153.patch
>
>
> Four abstract classes that wrap their respective delegate input/output 
> formats for easy hooks into vertex input format subclasses. I've included 
> some sample programs that show two very simple graph algorithms. I have a 
> graph generator that builds out a very simple directed structure, starting 
> with a few 'root' nodes. Root nodes are defined as nodes which are not 
> listed as a child anywhere in the graph.
> Algorithm 1) AccumuloRootMarker.java --> Accumulo as read/write source. 
> Every vertex starts thinking it's a root. At superstep 0, send a message 
> down to each child as a non-root notification. After superstep 1, only 
> root nodes will have never been messaged.
> Algorithm 2) TableRootMarker --> HBase as read/write source. Expands on A1 
> by bundling the notification logic followed by root node propagation. Once 
> we've marked the appropriate nodes as roots, tell every child which roots 
> it can be traced back to via one or more spanning trees. This will take 
> N + 2 supersteps: N is the maximum number of hops from any root to any 
> leaf, and the extra 2 supersteps are for the initial root flagging.
> I've included all relevant code plus DistributedCacheHelper.java for 
> recursive cache file and archive searches. It is more Hadoop-centric than 
> Giraph-centric, but these jobs use it so I figured why not commit it here.
> These have been tested through local JobRunner, pseudo-distributed on the 
> aforementioned hardware, and fully distributed on EC2. More details in the 
> comments.
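
The root-marking idea in Algorithm 1 can be sketched outside Giraph as a plain Java simulation of the two supersteps. This is only an illustration under stated assumptions: RootMarkerSketch and findRoots are hypothetical names, the graph is an in-memory adjacency map rather than an Accumulo table, and every vertex is assumed to appear as a key in the map. It is not the code from the attached patch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical single-process simulation of the root-marking supersteps. */
final class RootMarkerSketch {
    /**
     * children maps each vertex id to the ids of its direct children.
     * Assumption: every vertex appears as a key, even if it has no children.
     */
    static Set<String> findRoots(Map<String, List<String>> children) {
        // Superstep 0: every vertex assumes it is a root and messages its children.
        Set<String> messaged = new HashSet<>();
        for (List<String> kids : children.values()) {
            messaged.addAll(kids);
        }
        // Superstep 1: any vertex that received a message is not a root.
        Set<String> roots = new TreeSet<>(children.keySet());
        roots.removeAll(messaged);
        return roots;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new TreeMap<>();
        graph.put("a", Arrays.asList("c"));
        graph.put("b", Arrays.asList("c"));
        graph.put("c", Arrays.asList("d"));
        graph.put("d", Collections.<String>emptyList());
        System.out.println(findRoots(graph)); // prints [a, b]
    }
}
```

In the example graph, c and d each receive at least one non-root notification, so only a and b remain marked as roots.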





[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-30 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265176#comment-13265176
 ] 

Avery Ching commented on GIRAPH-153:


Brian, I'm having some trouble with your patch.  I used a freshly checked out 
version of giraph to confirm:

aching@sdwilshmbp13:~/Avery/source/giraph_trunk$ patch -p0 < ~/Desktop/GIRAPH-153.1.patch
patching file giraph-formats-contrib/LICENSE.txt
patching file giraph-formats-contrib/license-header.txt
patching file giraph-formats-contrib/src/test/java/org/apache/giraph/BspCase.java
patching file giraph-formats-contrib/src/test/java/org/apache/giraph/format/hbase/TestHBaseRootMarkerVertextFormat.java
patching file giraph-formats-contrib/src/test/java/org/apache/giraph/format/hbase/edgemarker/TableEdgeInputFormat.java
patching file giraph-formats-contrib/src/test/java/org/apache/giraph/format/hbase/edgemarker/TableEdgeOutputFormat.java
patching file giraph-formats-contrib/src/test/java/org/apache/giraph/format/accumulo/TestAccumuloVertexFormat.java
patching file giraph-formats-contrib/src/test/java/org/apache/giraph/format/accumulo/edgemarker/AccumuloEdgeInputFormat.java
patching file giraph-formats-contrib/src/test/java/org/apache/giraph/format/accumulo/edgemarker/AccumuloEdgeOutputFormat.java
patching file giraph-formats-contrib/src/main/assembly/compile.xml
can't find file to patch at input line 1301
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--
|Index: giraph-formats-contrib/src/main/java/org/apache/giraph/format/hbase/package-info.java
|===
|--- giraph-formats-contrib/src/main/java/org/apache/giraph/format/hbase/package-info.java  (revision 0)
|+++ giraph-formats-contrib/src/main/java/org/apache/giraph/format/hbase/package-info.java  (working copy)
--
File to patch: 






[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-30 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265205#comment-13265205
 ] 

Avery Ching commented on GIRAPH-153:


Is this a fresh checkout?  We shouldn't have to answer any questions like 
"Reversed (or previously applied) patch detected! Assume -R".





[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-30 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265542#comment-13265542
 ] 

Avery Ching commented on GIRAPH-153:


No problem.  The red flag for me was that this patch (244K) was so much bigger 
than the previous one (85K).

> HBase/Accumulo Input and Output formats
> ---
>
> Key: GIRAPH-153
> URL: https://issues.apache.org/jira/browse/GIRAPH-153
> Project: Giraph
>  Issue Type: New Feature
>  Components: bsp
>Affects Versions: 0.1.0
> Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
>Reporter: Brian Femiano
> Attachments: GIRAPH-153.1.patch, GIRAPH-153.2.patch, GIRAPH-153.patch
>
>




[jira] [Commented] (GIRAPH-169) How to close all child when a job finished?

2012-05-03 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267294#comment-13267294
 ] 

Avery Ching commented on GIRAPH-169:


Thanks for the simple case, Roman.  I wonder what versions are affected.  Hadoop 
0.20.203 seems fine with your test case.

> How to close all child when a job finished?
> ---
>
> Key: GIRAPH-169
> URL: https://issues.apache.org/jira/browse/GIRAPH-169
> Project: Giraph
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 0.2.0
> Environment: sles 11 x64,jdk 1.6,hadoop 0.20.205.0,1 Master and 8 
> slaves,
>Reporter: Jianfeng Qian
>Priority: Minor
>
> I ran PageRank on Hadoop 0.20.205.0. When the job finished, the child 
> processes on the slaves didn't quit immediately; sometimes they never quit 
> and I had to kill them. 





[jira] [Commented] (GIRAPH-169) How to close all child when a job finished?

2012-05-03 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267301#comment-13267301
 ] 

Avery Ching commented on GIRAPH-169:


Roman, I just tried with hadoop-1.0.2 with your test case:

hadoop jar target/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 10 -v -V 1000 -w 1

hadoop jar target/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 10 -v -V 1000 -w 2

Both of them ran 2x.  One thing I did do was compile against the hadoop-1.0.2 
version.

mvn -Phadoop_1.0 clean package -DskipTests

Can you verify that you compiled against the correct Hadoop profile?





[jira] [Assigned] (GIRAPH-127) Extending the API with a master.compute() function.

2012-05-03 Thread Avery Ching (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching reassigned GIRAPH-127:
--

Assignee: Semih Salihoglu

Looking forward to this.

> Extending the API with a master.compute() function.
> ---
>
> Key: GIRAPH-127
> URL: https://issues.apache.org/jira/browse/GIRAPH-127
> Project: Giraph
>  Issue Type: New Feature
>  Components: bsp, examples, graph
>Reporter: Semih Salihoglu
>Assignee: Semih Salihoglu
>
> First of all, sorry for the long explanation to this feature.
> I want to expand the API of Giraph with a new function called 
> master.compute(), that would get called at the master before each superstep 
> and I will try to explain the purpose that it would serve with an example. 
> Let's say we want to implement the following simplified version of the 
> k-means clustering algorithm. Pseudocode below:
>  * Input G(V, E), k, numEdgesThreshold, maxIterations
>  * Algorithm:
>  * int numEdgesCrossingClusters = Integer.MAX_VALUE;
>  * int iterationNo = 0;
>  * while ((numEdgesCrossingClusters > numEdgesThreshold) && (iterationNo < maxIterations)) {
>  *   iterationNo++;
>  *   int[] clusterCenters = pickKClusterCenters(k, G);
>  *   findClusterCenters(G, clusterCenters);
>  *   numEdgesCrossingClusters = countNumEdgesCrossingClusters();
>  * }
> The algorithm goes through the following steps in iterations:
> 1) Pick k random initial cluster centers
> 2) Assign each vertex to the cluster center that it's closest to (in Giraph, 
> this can be implemented in message passing similar to how ShortestPaths is 
> implemented)
> 3) Count the number of edges crossing clusters
> 4) Go back to step 1, if there are a lot of edges crossing clusters and we 
> haven't exceeded the maximum number of iterations yet.
> In an algorithm like this, steps 2 and 3 are where most of the work happens 
> and both parts have very neat message-passing implementations. I'll try to 
> give an overview without going into the details. Let's say we define a Vertex 
> in Giraph to hold a custom Writable object that holds 2 integer values and 
> sends a message with up to 2 integer values.
> Step 2 is very similar to ShortestPaths algorithm and has two stages: In the 
> first stage, each vertex checks to see whether or not it's one of the cluster 
> centers. If so, it assigns itself the value (id, 0), otherwise it assigns 
> itself (Null, Null). In the 2nd stage, the vertices assign themselves to the 
> minimum distance cluster center by looking at their neighbors (cluster 
> centers, distance) values (received as 2 integer messages) and their current 
> values, and changing their values if they find a lower distance cluster 
> center. This happens in x number of supersteps until every vertex converges.
> Step 3, counting the number of edges crossing clusters, is also very easy to 
> implement in Giraph. Once each vertex has a cluster center, the number of 
> edges crossing clusters can be counted by an aggregator, let's say called 
> "num-edges-crossing". It would again have two stages: First stage, every 
> vertex just sends its cluster id to all its neighbors. Second stage, every 
> vertex looks at their neighbors' cluster ids in the messages, and for each 
> cluster id that is not equal to its own cluster id, it increments 
> "num-edges-crossing" by 1.
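
Step 3 above can be illustrated with a small single-process Java sketch. This is an assumption-laden stand-in, not Giraph code: CrossingEdgesSketch and countCrossingEdges are hypothetical names, the graph is an in-memory map of directed out-edges, the running total plays the role of the "num-edges-crossing" aggregator, and every vertex is assumed to already have a cluster id.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical simulation of the two-stage crossing-edge count. */
final class CrossingEdgesSketch {
    /**
     * outEdges maps a vertex id to the targets of its directed edges;
     * clusterOf maps every vertex id to its assigned cluster id.
     */
    static long countCrossingEdges(Map<Integer, List<Integer>> outEdges,
                                   Map<Integer, Integer> clusterOf) {
        long crossing = 0; // stands in for the "num-edges-crossing" aggregator
        for (Map.Entry<Integer, List<Integer>> entry : outEdges.entrySet()) {
            // Stage 1: the sender ships its cluster id along each out-edge.
            int senderCluster = clusterOf.get(entry.getKey());
            for (int target : entry.getValue()) {
                // Stage 2: the receiver compares the id in the message with its own.
                if (clusterOf.get(target) != senderCluster) {
                    crossing++;
                }
            }
        }
        return crossing;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> outEdges = new HashMap<>();
        outEdges.put(1, Arrays.asList(2));
        outEdges.put(2, Arrays.asList(3));
        outEdges.put(3, Arrays.asList(1));
        Map<Integer, Integer> clusterOf = new HashMap<>();
        clusterOf.put(1, 0);
        clusterOf.put(2, 0);
        clusterOf.put(3, 1);
        System.out.println(countCrossingEdges(outEdges, clusterOf)); // prints 2
    }
}
```

Each directed edge is counted once, by its receiving vertex. If a graph stores an undirected edge as two directed edges, this scheme counts it twice, which matters when choosing numEdgesThreshold.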
> The other 2 steps, step 1 and 4, are very simple sequential computations. 
> Step 1 just picks k random vertex ids and puts them into an aggregator. Step 4 
> just compares "num-edges-crossing" against a threshold and also checks whether 
> or not the algorithm has exceeded maxIterations (not supersteps but iterations 
> of going through Steps 1-4). With the current API, it's not clear where to do 
> these computations. There is a per-worker function preSuperstep() that can be 
> implemented, but if we decide to pick a special worker, let's say worker 1, 
> to pick the k vertices, then we'd waste an entire superstep where only worker 
> 1 would do work (by picking k vertices in preSuperstep() and putting them into 
> an aggregator), and all other workers would be idle. Trying to do this in 
> worker 1 in postSuperstep() would not work either, because worker 1 needs to 
> know that all the vertices have converged to understand that it's time to 
> pick k vertices or to do the check in step 4, and that information would only 
> be available to it at the beginning of the next superstep.
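
The two sequential steps discussed above (step 1 and step 4) are small enough to sketch directly. The class and method names below are hypothetical and there is no real Giraph API involved; the sketch only shows the kind of short, single-threaded logic the proposed master.compute() hook would host. Vertex ids are assumed to be integers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical master-side logic for steps 1 and 4 of the k-means example. */
final class MasterStepSketch {
    /** Step 4: keep iterating while many edges cross clusters and iterations remain. */
    static boolean shouldContinue(long numEdgesCrossing, long numEdgesThreshold,
                                  int iterationNo, int maxIterations) {
        return numEdgesCrossing > numEdgesThreshold && iterationNo < maxIterations;
    }

    /** Step 1: pick k distinct vertex ids at random (fewer if the graph is smaller). */
    static List<Integer> pickKClusterCenters(List<Integer> vertexIds, int k, Random rng) {
        List<Integer> shuffled = new ArrayList<>(vertexIds);
        Collections.shuffle(shuffled, rng);
        return new ArrayList<>(shuffled.subList(0, Math.min(k, shuffled.size())));
    }

    public static void main(String[] args) {
        List<Integer> vertexIds = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        List<Integer> centers = pickKClusterCenters(vertexIds, 3, new Random());
        System.out.println("centers=" + centers);
        System.out.println("continue=" + shouldContinue(100, 10, 0, 5));
    }
}
```

In the proposal, the result of pickKClusterCenters would be written into an aggregator before it is broadcast to the workers, so no worker superstep is spent on it.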
> A master.compute() extension would run at the master and before the superstep 
> and would modify the aggregator that would keep the k vertices before the 
> aggregators are broadcast to the workers, which are all very short sequential 
> computations, so they would not waste resources the way a preSuperstep() or 
> postSuperstep() approach would do. It would also

[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-05-05 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268917#comment-13268917
 ] 

Avery Ching commented on GIRAPH-153:


I'll take a look this weekend Brian.  Thanks for the reminder.

> HBase/Accumulo Input and Output formats
> ---
>
> Key: GIRAPH-153
> URL: https://issues.apache.org/jira/browse/GIRAPH-153
> Project: Giraph
>  Issue Type: New Feature
>  Components: bsp
>Affects Versions: 0.1.0
> Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
>Reporter: Brian Femiano
> Attachments: GIRAPH-153.1.patch, GIRAPH-153.2.patch, 
> GIRAPH-153.3.patch, GIRAPH-153.patch
>
>




[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

2012-05-06 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269303#comment-13269303
 ] 

Avery Ching commented on GIRAPH-37:
---

Since Jakob had to switch gears, I wanted to let you guys know that I've spent 
a few days of the past week working on a Netty-only replacement for 
communication.  I should have a patch and some performance numbers up in a few 
days.  Users will be able to choose between the old RPC way and this Netty 
approach.  Netty is so customizable that it will likely take a lot of tuning to 
get the dials right for most cases.

> Implement Netty-backed rpc solution
> ---
>
> Key: GIRAPH-37
> URL: https://issues.apache.org/jira/browse/GIRAPH-37
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Attachments: GIRAPH-37-wip.patch
>
>
> GIRAPH-12 considered replacing the current Hadoop-based rpc method with 
> Netty, but went in another direction. I think there is still value in 
> this approach, and will also look at Finagle.





[jira] [Updated] (GIRAPH-37) Implement Netty-backed rpc solution

2012-05-09 Thread Avery Ching (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching updated GIRAPH-37:
--

Attachment: GIRAPH-37.patch

Same as reviewboard file, but ensuring the license is granted here.

> Implement Netty-backed rpc solution
> ---
>
> Key: GIRAPH-37
> URL: https://issues.apache.org/jira/browse/GIRAPH-37
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Attachments: GIRAPH-37-wip.patch, GIRAPH-37.patch
>
>




[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

2012-05-09 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271245#comment-13271245
 ] 

Avery Ching commented on GIRAPH-37:
---

Thanks Claudio.

Here are more results with a scaled up 10 worker setup:

Hadoop RPC:
hadoop jar ~/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -Dgiraph.useNetty=false -w 10 -V 1000 -s 5 -e 2 -v
12/05/09 02:32:05 INFO mapred.JobClient:   Giraph Timers
12/05/09 02:32:05 INFO mapred.JobClient: Total (milliseconds)=149880
12/05/09 02:32:05 INFO mapred.JobClient: Superstep 3 (milliseconds)=21575
12/05/09 02:32:05 INFO mapred.JobClient: Setup (milliseconds)=7428
12/05/09 02:32:05 INFO mapred.JobClient: Shutdown (milliseconds)=174
12/05/09 02:32:05 INFO mapred.JobClient: Vertex input superstep (milliseconds)=39558
12/05/09 02:32:05 INFO mapred.JobClient: Superstep 0 (milliseconds)=16887
12/05/09 02:32:05 INFO mapred.JobClient: Superstep 4 (milliseconds)=18613
12/05/09 02:32:05 INFO mapred.JobClient: Superstep 5 (milliseconds)=3292
12/05/09 02:32:05 INFO mapred.JobClient: Superstep 2 (milliseconds)=21313
12/05/09 02:32:05 INFO mapred.JobClient: Superstep 1 (milliseconds)=21035

Netty:
hadoop jar ~/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -Dgiraph.useNetty=true -w 10 -V 1000 -s 5 -e 2 -v
12/05/09 02:35:06 INFO mapred.JobClient:   Giraph Timers
12/05/09 02:35:06 INFO mapred.JobClient: Total (milliseconds)=59270
12/05/09 02:35:06 INFO mapred.JobClient: Superstep 3 (milliseconds)=11827
12/05/09 02:35:06 INFO mapred.JobClient: Setup (milliseconds)=3196
12/05/09 02:35:06 INFO mapred.JobClient: Shutdown (milliseconds)=124
12/05/09 02:35:06 INFO mapred.JobClient: Vertex input superstep (milliseconds)=13130
12/05/09 02:35:06 INFO mapred.JobClient: Superstep 0 (milliseconds)=8564
12/05/09 02:35:06 INFO mapred.JobClient: Superstep 4 (milliseconds)=5540
12/05/09 02:35:06 INFO mapred.JobClient: Superstep 5 (milliseconds)=2012
12/05/09 02:35:06 INFO mapred.JobClient: Superstep 2 (milliseconds)=8601
12/05/09 02:35:06 INFO mapred.JobClient: Superstep 1 (milliseconds)=6271

These results are fairly similar to the first set (even though there are more 
workers).  I'm pretty sure we can squeeze more performance from Netty in 
future patches (e.g. adding the missing local send optimization, tuning TCP 
parameters, exposing more knobs to the user).
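
For a quick side-by-side reading of the two logs above, the ratios can be computed with a few lines of Java. The timer values are copied from the JobClient output; the class name TimerSpeedup and its speedup method are just illustrative helpers. Setup and Shutdown are left out here.

```java
/** Illustrative helper for comparing the Hadoop RPC and Netty timers above. */
final class TimerSpeedup {
    /** Ratio of Hadoop RPC time to Netty time; values above 1 mean Netty was faster. */
    static double speedup(long rpcMillis, long nettyMillis) {
        return (double) rpcMillis / nettyMillis;
    }

    public static void main(String[] args) {
        // Timer values copied from the two JobClient logs (milliseconds).
        String[] timers = {"Total", "Vertex input superstep", "Superstep 0",
            "Superstep 1", "Superstep 2", "Superstep 3", "Superstep 4", "Superstep 5"};
        long[] rpc = {149880, 39558, 16887, 21035, 21313, 21575, 18613, 3292};
        long[] netty = {59270, 13130, 8564, 6271, 8601, 11827, 5540, 2012};
        for (int i = 0; i < timers.length; i++) {
            System.out.printf("%-24s %.2fx%n", timers[i], speedup(rpc[i], netty[i]));
        }
    }
}
```

On these numbers the total time with Netty is roughly 2.5 times shorter than with Hadoop RPC, and the vertex input superstep about 3 times shorter. This is a single run on one cluster, so the ratios are indicative only.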





[jira] [Updated] (GIRAPH-37) Implement Netty-backed IPC

2012-05-09 Thread Avery Ching (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching updated GIRAPH-37:
--

Assignee: Avery Ching  (was: Jakob Homan)
 Summary: Implement Netty-backed IPC  (was: Implement Netty-backed rpc 
solution)

> Implement Netty-backed IPC
> --
>
> Key: GIRAPH-37
> URL: https://issues.apache.org/jira/browse/GIRAPH-37
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Jakob Homan
>Assignee: Avery Ching
> Attachments: GIRAPH-37-wip.patch, GIRAPH-37.patch
>
>




[jira] [Commented] (GIRAPH-37) Implement Netty-backed IPC

2012-05-09 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271696#comment-13271696
 ] 

Avery Ching commented on GIRAPH-37:
---

@Claudio,

Vertex input superstep used to be a blocking operation when sending the 
vertices to the destination partition owners.  Now it's non-blocking, 
overlapping communication and computation.

Setup should be ignored.  That is the time to get all the map tasks and pick a 
master.





[jira] [Resolved] (GIRAPH-37) Implement Netty-backed IPC

2012-05-09 Thread Avery Ching (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching resolved GIRAPH-37.
---

Resolution: Fixed

Hudson is successful, closing.

> Implement Netty-backed IPC
> --
>
> Key: GIRAPH-37
> URL: https://issues.apache.org/jira/browse/GIRAPH-37
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Jakob Homan
>Assignee: Avery Ching
> Attachments: GIRAPH-37-wip.patch, GIRAPH-37.patch
>
>
> GIRAPH-12 considered replacing the current Hadoop based rpc method with 
> Netty, but went in another direction. I think there is still value in 
> this approach, and will also look at Finagle.





[jira] [Created] (GIRAPH-189) Synchronization on Map values should be in a thread safe object

2012-05-09 Thread Avery Ching (JIRA)
Avery Ching created GIRAPH-189:
--

 Summary: Synchronization on Map values should be in a thread safe 
object
 Key: GIRAPH-189
 URL: https://issues.apache.org/jira/browse/GIRAPH-189
 Project: Giraph
  Issue Type: Improvement
Reporter: Avery Ching


See https://reviews.apache.org/r/5074/ for reasoning





[jira] [Created] (GIRAPH-190) Create GiraphConf extends Configuration

2012-05-09 Thread Avery Ching (JIRA)
Avery Ching created GIRAPH-190:
--

 Summary: Create GiraphConf extends Configuration
 Key: GIRAPH-190
 URL: https://issues.apache.org/jira/browse/GIRAPH-190
 Project: Giraph
  Issue Type: Improvement
Reporter: Avery Ching
Priority: Minor


Currently all the options in Giraph are in the GiraphJob.  It would be cleaner 
to do configuration as part of a special GiraphConf (analogous to HiveConf) and 
would simplify code elsewhere as well.
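
A rough sketch of what this could look like is below.  Everything in it is 
an assumption made for illustration: the option key, the default value and 
the accessor names are hypothetical, and a tiny stand-in replaces Hadoop's 
Configuration so the example compiles on its own.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only; not an actual Giraph class. */
public class GiraphConfSketch {
    /** Minimal stand-in for Hadoop's Configuration (only setInt/getInt). */
    static class Configuration {
        private final Map<String, String> props = new HashMap<>();

        public void setInt(String name, int value) {
            props.put(name, Integer.toString(value));
        }

        public int getInt(String name, int defaultValue) {
            String v = props.get(name);
            return v == null ? defaultValue : Integer.parseInt(v);
        }
    }

    /** Hypothetical GiraphConf: option keys and typed accessors in one place. */
    static class GiraphConf extends Configuration {
        // Key name and default are assumed for this sketch.
        static final String MAX_WORKERS = "giraph.maxWorkers";
        static final int MAX_WORKERS_DEFAULT = 1;

        public void setMaxWorkers(int maxWorkers) {
            setInt(MAX_WORKERS, maxWorkers);
        }

        public int getMaxWorkers() {
            return getInt(MAX_WORKERS, MAX_WORKERS_DEFAULT);
        }
    }

    public static void main(String[] args) {
        GiraphConf conf = new GiraphConf();
        conf.setMaxWorkers(400);
        System.out.println("max workers: " + conf.getMaxWorkers());
    }
}
```

Callers would then use the typed accessors instead of repeating string keys 
and defaults wherever an option is read.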





[jira] [Commented] (GIRAPH-1) Initial code import

2011-08-10 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082811#comment-13082811
 ] 

Avery Ching commented on GIRAPH-1:
--

Hey Owen, I really appreciate you checking in the code.  Is it possible to do 
it with the history though?  It currently appears that all the history was lost.

> Initial code import
> ---
>
> Key: GIRAPH-1
> URL: https://issues.apache.org/jira/browse/GIRAPH-1
> Project: Giraph
>  Issue Type: Task
>Affects Versions: 0.1.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.1.0
>
>
> I did the initial code import from github.





[jira] [Commented] (GIRAPH-1) Initial code import

2011-08-10 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082901#comment-13082901
 ] 

Avery Ching commented on GIRAPH-1:
--

I can do the svn dump.  The history would be useful, I think, for seeing all the 
changes and associated rationale.  I'll package something up and send it to you 
tonight.

> Initial code import
> ---
>
> Key: GIRAPH-1
> URL: https://issues.apache.org/jira/browse/GIRAPH-1
> Project: Giraph
>  Issue Type: Task
>Affects Versions: 0.1.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.1.0
>
>
> I did the initial code import from github.





[jira] [Commented] (GIRAPH-1) Initial code import

2011-08-11 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083440#comment-13083440
 ] 

Avery Ching commented on GIRAPH-1:
--

Owen, sorry about the delay, but I dumped and loaded the dump to verify it 
preserved history.

Steps I tried.

- Get the dump file:
Available from http://www.ece.northwestern.edu/~aching/giraph.dump.tar.gz
(i.e. wget http://www.ece.northwestern.edu/~aching/giraph.dump.tar.gz)

- Untar the dump file
(i.e. 'tar zxvf giraph.dump.tar.gz')

- Load the dump into the svn repository
(i.e. 'svnadmin load  < giraph.dump')
You might want to try additional options to specify where it goes in the Apache 
incubator svn repository.

- Move the directory to the right location
This might not be necessary if you use 'svnadmin load' correctly.
Otherwise the directory will be in /projects/hadoop_bsp/trunk and should
probably be moved to /Giraph/trunk or something like that.

Please let me know how it goes.  Thanks!

> Initial code import
> ---
>
> Key: GIRAPH-1
> URL: https://issues.apache.org/jira/browse/GIRAPH-1
> Project: Giraph
>  Issue Type: Task
>Affects Versions: 0.1.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.1.0
>
>
> I did the initial code import from github.





[jira] [Commented] (GIRAPH-1) Initial code import

2011-08-11 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083450#comment-13083450
 ] 

Avery Ching commented on GIRAPH-1:
--

Let me know if I can help out by the way.  I'm not sure who has svn admin 
privileges on the apache svn server.

> Initial code import
> ---
>
> Key: GIRAPH-1
> URL: https://issues.apache.org/jira/browse/GIRAPH-1
> Project: Giraph
>  Issue Type: Task
>Affects Versions: 0.1.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.1.0
>
>
> I did the initial code import from github.





[jira] [Commented] (GIRAPH-1) Initial code import

2011-08-11 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083917#comment-13083917
 ] 

Avery Ching commented on GIRAPH-1:
--

Hyunsik,

No, you are right, I had to do the following procedures to get the dump (my 
first time dumping an svn repo).

svnsync with our main yahoo repository
and then svnadmin dump only giraph
hence there are around 30 revisions, but most of them are empty.  It took 
me about 3-4 hours to complete.  I lost access to my work machine with the 
sync, but I might be able to produce a more concise dump if needed.

> Initial code import
> ---
>
> Key: GIRAPH-1
> URL: https://issues.apache.org/jira/browse/GIRAPH-1
> Project: Giraph
>  Issue Type: Task
>Affects Versions: 0.1.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.1.0
>
>
> I did the initial code import from github.





[jira] [Commented] (GIRAPH-1) Initial code import

2011-08-11 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083948#comment-13083948
 ] 

Avery Ching commented on GIRAPH-1:
--

No problem.  I think I can get it down to around 30k (1/10 of the original dump 
revisions).  That should be good enough, I hope.  Unfortunately, I have to redo 
the svnsync (started about 1/2 hour ago and on revision 6...).

> Initial code import
> ---
>
> Key: GIRAPH-1
> URL: https://issues.apache.org/jira/browse/GIRAPH-1
> Project: Giraph
>  Issue Type: Task
>Affects Versions: 0.1.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.1.0
>
>
> I did the initial code import from github.





[jira] [Commented] (GIRAPH-1) Initial code import

2011-08-12 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084016#comment-13084016
 ] 

Avery Ching commented on GIRAPH-1:
--

I have synced and redumped the svn repo, but with a very limited number of 
revisions (around 10k).  I was able to load all the revisions into my local 
repo in about 10 minutes, much improved over the 3-4 hours before.  Note again 
that the path it will produce from loading is something like 
/projects/hadoop_bsp/trunk and should be moved with 'svn mv' to something 
like https://svn.apache.org/repos/asf/incubator/giraph/trunk. Please let me 
know if there are issues.  Thanks!

New dump file location:

http://www.ece.northwestern.edu/~aching/giraph_27_280777.dump.tar.gz

> Initial code import
> ---
>
> Key: GIRAPH-1
> URL: https://issues.apache.org/jira/browse/GIRAPH-1
> Project: Giraph
>  Issue Type: Task
>Affects Versions: 0.1.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.1.0
>
>
> I did the initial code import from github.





[jira] [Commented] (GIRAPH-1) Initial code import

2011-08-15 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085267#comment-13085267
 ] 

Avery Ching commented on GIRAPH-1:
--

While svndumpfilter failed to work, I was able to fix the issue with 
svndumpfilter2.  This has less than 150 revisions.  Here is the final dump 
location:

http://www.ece.northwestern.edu/~aching/2011.08.15.giraph.dump.tar.gz

1. After downloading the file, execute 'tar zxvf 2011.08.15.giraph.dump.tar.gz' to 
get the actual dump file.

2. Remove the current trunk directory (i.e. svn rm 
/incubator/giraph/trunk)

3. Load the data into the repository location 'incubator/giraph/trunk' (will 
also create incubator/giraph/trunk).
svnadmin load  < 2011.08.15.giraph.dump

That should be it!  I've deleted the old dump files to avoid confusion.

> Initial code import
> ---
>
> Key: GIRAPH-1
> URL: https://issues.apache.org/jira/browse/GIRAPH-1
> Project: Giraph
>  Issue Type: Task
>Affects Versions: 0.1.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.1.0
>
>
> I did the initial code import from github.





[jira] [Commented] (GIRAPH-2) make the project homepage

2011-08-23 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089436#comment-13089436
 ] 

Avery Ching commented on GIRAPH-2:
--

Agreed, any thoughts on how the homepage differs from the confluence wiki?

> make the project homepage
> -
>
> Key: GIRAPH-2
> URL: https://issues.apache.org/jira/browse/GIRAPH-2
> Project: Giraph
>  Issue Type: Task
>Reporter: Hyunsik Choi
>
> We need to make the project homepage at http://incubator.apache.org/giraph/.





[jira] [Commented] (GIRAPH-3) Vertex:sentMsgToAllEdges should be sendMsg

2011-08-23 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089828#comment-13089828
 ] 

Avery Ching commented on GIRAPH-3:
--

Duh.  I guess we should wait until the svn import is finished before doing 
this...

> Vertex:sentMsgToAllEdges should be sendMsg
> --
>
> Key: GIRAPH-3
> URL: https://issues.apache.org/jira/browse/GIRAPH-3
> Project: Giraph
>  Issue Type: Bug
>Reporter: Jakob Homan
>Assignee: Jakob Homan
>
> The method Vertex.java:sentMsgToAllEdges() should be sendMsgToAllEdges()





[jira] [Commented] (GIRAPH-2) make the project homepage

2011-08-25 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091482#comment-13091482
 ] 

Avery Ching commented on GIRAPH-2:
--

Jakob, great start!  Changes look good to me.

> make the project homepage
> -
>
> Key: GIRAPH-2
> URL: https://issues.apache.org/jira/browse/GIRAPH-2
> Project: Giraph
>  Issue Type: Task
>Reporter: Hyunsik Choi
>Assignee: Jakob Homan
> Attachments: GIRAPH-2.patch
>
>
> We need to make the project homepage at http://incubator.apache.org/giraph/.





[jira] [Commented] (GIRAPH-3) Vertex:sentMsgToAllEdges should be sendMsg

2011-08-25 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091486#comment-13091486
 ] 

Avery Ching commented on GIRAPH-3:
--

I've +1'd it too.  We can address the naming conventions in another issue.

> Vertex:sentMsgToAllEdges should be sendMsg
> --
>
> Key: GIRAPH-3
> URL: https://issues.apache.org/jira/browse/GIRAPH-3
> Project: Giraph
>  Issue Type: Bug
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Attachments: GIRAPH-3.patch
>
>
> The method Vertex.java:sentMsgToAllEdges() should be sendMsgToAllEdges()





[jira] [Created] (GIRAPH-5) Remove Yahoo directories

2011-08-25 Thread Avery Ching (JIRA)
Remove Yahoo directories


 Key: GIRAPH-5
 URL: https://issues.apache.org/jira/browse/GIRAPH-5
 Project: Giraph
  Issue Type: Task
Reporter: Avery Ching
Assignee: Avery Ching
Priority: Minor


As an artifact of pulling from the Yahoo! svn repository, we need to re-remove 
the Yahoo! specific build stuff.  This was done already in GitHub, but of 
course, they are different places.

I would like to remove the following directories:

src/ci/
src/main/pkg

Also, as Jakob has seen, our pom.xml needs cleanup.





[jira] [Updated] (GIRAPH-5) Remove Yahoo directories

2011-08-25 Thread Avery Ching (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-5?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching updated GIRAPH-5:
-

Attachment: diff.txt

Diff after 'svn rm' of those two directories.

> Remove Yahoo directories
> 
>
> Key: GIRAPH-5
> URL: https://issues.apache.org/jira/browse/GIRAPH-5
> Project: Giraph
>  Issue Type: Task
>Reporter: Avery Ching
>Assignee: Avery Ching
>Priority: Minor
> Attachments: diff.txt
>
>
> As an artifact of pulling from the Yahoo! svn repository, we need to 
> re-remove the Yahoo! specific build stuff.  This was done already in GitHub, 
> but of course, they are different places.
> I would like to remove the following directories:
> src/ci/
> src/main/pkg
> Also, as Jakob has seen, our pom.xml needs cleanup.





[jira] [Resolved] (GIRAPH-5) Remove Yahoo directories

2011-08-25 Thread Avery Ching (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-5?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching resolved GIRAPH-5.
--

Resolution: Fixed

Committed after Jakob's +1.

> Remove Yahoo directories
> 
>
> Key: GIRAPH-5
> URL: https://issues.apache.org/jira/browse/GIRAPH-5
> Project: Giraph
>  Issue Type: Task
>Reporter: Avery Ching
>Assignee: Avery Ching
>Priority: Minor
> Attachments: diff.txt
>
>
> As an artifact of pulling from the Yahoo! svn repository, we need to 
> re-remove the Yahoo! specific build stuff.  This was done already in GitHub, 
> but of course, they are different places.
> I would like to remove the following directories:
> src/ci/
> src/main/pkg
> Also, as Jakob has seen, our pom.xml needs cleanup.





[jira] [Commented] (GIRAPH-6) Remove Yahoo-specific code from pom.xml

2011-08-26 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092117#comment-13092117
 ] 

Avery Ching commented on GIRAPH-6:
--

Thanks for doing this.

> Remove Yahoo-specific code from pom.xml
> ---
>
> Key: GIRAPH-6
> URL: https://issues.apache.org/jira/browse/GIRAPH-6
> Project: Giraph
>  Issue Type: Bug
>Reporter: Jakob Homan
>Assignee: Jakob Homan
>Priority: Blocker
> Attachments: GIRAPH-6.patch
>
>
> There are remaining references to Y! infrastructure in the pom.xml, which 
> prevents the build from succeeding.





[jira] [Commented] (GIRAPH-8) Update references to Yahoo bug that needs to be fixed

2011-08-26 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-8?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092124#comment-13092124
 ] 

Avery Ching commented on GIRAPH-8:
--

This has to do with internal Yahoo! bugs that need to be ported to JIRA.  I 
will do this.  The bug has not been fixed and the issue is that basically we 
can only store so many VertexRange objects in a single ZooKeeper znode.  
Currently there should be a workaround to prevent too many vertex ranges from 
being created.

> Update references to Yahoo bug that needs to be fixed
> -
>
> Key: GIRAPH-8
> URL: https://issues.apache.org/jira/browse/GIRAPH-8
> Project: Giraph
>  Issue Type: Bug
>Reporter: Jakob Homan
>
> In BspServiceMaster.java there are three TODOS (lines 1342, 1348, 1377) 
> referring to those sections of code being deleted after Bug#4340282 is fixed. 
>  We should either verify that this has been fixed, change the comments to a 
> more descriptive explanation, or fix whatever bug is being referenced.





[jira] [Commented] (GIRAPH-2) make the project homepage

2011-08-26 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092127#comment-13092127
 ] 

Avery Ching commented on GIRAPH-2:
--

Any preference between mvn2 or mvn3?  Since I know there is an issue with the 
hadoop=non_secure with mvn2, maybe it's better to go to mvn3?

I love the page.  A related question is with respect to the version number 
(0.70).  Should we move it to 0.1 to reflect the Apache version?

> make the project homepage
> -
>
> Key: GIRAPH-2
> URL: https://issues.apache.org/jira/browse/GIRAPH-2
> Project: Giraph
>  Issue Type: Task
>Reporter: Hyunsik Choi
>Assignee: Jakob Homan
> Attachments: GIRAPH-2.patch, GIRAPH-2b.patch
>
>
> We need to make the project homepage at http://incubator.apache.org/giraph/.





[jira] [Commented] (GIRAPH-2) make the project homepage

2011-08-26 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092132#comment-13092132
 ] 

Avery Ching commented on GIRAPH-2:
--

Well, unless there are any objections, let's go to mvn3.  

With respect to the product version, can we advance the Apache version to 0.70? 
 If not, I don't mind going back to 0.1.  It's just a number =).

> make the project homepage
> -
>
> Key: GIRAPH-2
> URL: https://issues.apache.org/jira/browse/GIRAPH-2
> Project: Giraph
>  Issue Type: Task
>Reporter: Hyunsik Choi
>Assignee: Jakob Homan
> Attachments: GIRAPH-2.patch, GIRAPH-2b.patch
>
>
> We need to make the project homepage at http://incubator.apache.org/giraph/.





[jira] [Commented] (GIRAPH-9) Change Yahoo License Header to Apache License Header

2011-08-28 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-9?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092604#comment-13092604
 ] 

Avery Ching commented on GIRAPH-9:
--

Hyunsik, I've +1ed it.  It's nice to be in Apache now, thanks for making the 
license changes.  Out of curiosity, did you use Wdev91 copyright wizard (what I 
previously used to create the original copyrights) or some other tool?

> Change Yahoo License Header to Apache License Header
> 
>
> Key: GIRAPH-9
> URL: https://issues.apache.org/jira/browse/GIRAPH-9
> Project: Giraph
>  Issue Type: Task
>Reporter: Hyunsik Choi
>Assignee: Hyunsik Choi
> Fix For: 0.1.0
>
> Attachments: GIRAPH-9.patch
>
>
> All source files contain the Yahoo license header, as follows:
> {noformat}
> Licensed to Yahoo! under one or more contributor license agreements. 
> ...
> {noformat}
> The license headers should instead read as follows:
> {noformat}
> Licensed to the Apache Software Foundation (ASF) under one 
> or more contributor license agreements.
> ...
> {noformat}





[jira] [Updated] (GIRAPH-10) Aggregators are not exported

2011-08-28 Thread Avery Ching (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-10?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching updated GIRAPH-10:
--

Priority: Minor  (was: Major)

> Aggregators are not exported
> 
>
> Key: GIRAPH-10
> URL: https://issues.apache.org/jira/browse/GIRAPH-10
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Avery Ching
>Priority: Minor
>
> Currently, aggregator values cannot be saved after a Giraph job.  There 
> should be a way to do this.





[jira] [Created] (GIRAPH-10) Aggregators are not exported

2011-08-28 Thread Avery Ching (JIRA)
Aggregators are not exported


 Key: GIRAPH-10
 URL: https://issues.apache.org/jira/browse/GIRAPH-10
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching


Currently, aggregator values cannot be saved after a Giraph job.  There should 
be a way to do this.





[jira] [Created] (GIRAPH-11) Improve the graph distribution of Giraph

2011-08-28 Thread Avery Ching (JIRA)
Improve the graph distribution of Giraph


 Key: GIRAPH-11
 URL: https://issues.apache.org/jira/browse/GIRAPH-11
 Project: Giraph
  Issue Type: Improvement
Reporter: Avery Ching
Assignee: Avery Ching


Currently, Giraph assumes that the data from the VertexInputFormat is sorted.  
If the user data is not sorted by the vertex id, they must first run a 
MapReduce or Pig job to generate a sorted dataset.  This is often a bit 
inconvenient.

Giraph graph partitioning is currently range based and there are some 
advantages and disadvantages of this approach.  The proposal of this JIRA would 
be to allow for both range and hash based partitioning and provide more 
flexibility to the user.

Design goals for the graph distribution:

* Allow vertices to be ordered or unordered
* Ability to repartition
* Select the partitioning scheme based on user needs (i.e. hash or range based)
* Ability to provide user-specific hints about partitions

Hash-based partitioning

* Good vertex balancing across ranges for random data
* Bad at vertex id locality

Range-based partitioning

* Good at vertex id locality
* Ability to split ranges easily
* Can cause hotspots for hot ranges
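
To make the trade-off above concrete, here is a rough sketch of the two 
schemes.  It is not Giraph code: the class name, both method names, the use 
of long vertex ids and the split-point representation are assumptions made 
purely for illustration.

```java
import java.util.Arrays;

/** Illustrative only; not part of Giraph. */
public class PartitionSketch {
    /**
     * Hash-based: spreads ids evenly for random data, but neighboring ids
     * usually land on different partitions (poor vertex id locality).
     */
    public static int hashPartition(long vertexId, int numPartitions) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(Long.hashCode(vertexId), numPartitions);
    }

    /**
     * Range-based: splitPoints must be sorted ascending; splitPoints[i] is the
     * smallest id owned by partition i + 1, and ids below splitPoints[0] go to
     * partition 0. Splitting a hot range means adding one more split point.
     */
    public static int rangePartition(long vertexId, long[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, vertexId);
        // A hit at index i belongs to partition i + 1;
        // a miss returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        long[] splitPoints = {100L, 200L, 300L}; // four partitions
        for (long id : new long[] {5L, 6L, 150L, 300L}) {
            System.out.println(id + ": hash=" + hashPartition(id, 4)
                + " range=" + rangePartition(id, splitPoints));
        }
    }
}
```

With these split points, ids 5 and 6 share range partition 0 but fall into 
different hash partitions, which is the locality difference listed above.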






[jira] [Created] (GIRAPH-12) Investigate communication improvements

2011-08-28 Thread Avery Ching (JIRA)
Investigate communication improvements
--

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
Reporter: Avery Ching
Priority: Minor


Currently every worker will start up a thread to communicate with every other 
worker.  Hadoop RPC is used for communication.  For instance if there are 400 
workers, each worker will create 400 threads.  This ends up using a lot of 
memory, even with the option  

-Dmapred.child.java.opts="-Xss64k".  

It would be good to investigate using frameworks like Netty, or rolling our 
own, to improve this situation.  By moving away from Hadoop RPC, we would also 
make compatibility with different Hadoop versions easier.
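
As a back-of-the-envelope check of how this scales, the sketch below counts 
threads and thread stack space only.  The class and method names are made 
up for illustration, the 64 KB figure comes from the -Xss64k option above, 
and other per-thread or per-connection overhead is not modeled.

```java
/** Illustrative arithmetic only; not part of Giraph. */
public class ThreadCostSketch {
    /** One thread per peer on every worker gives workers * workers threads in total. */
    public static long threadsPerJob(int workers) {
        return (long) workers * workers;
    }

    /** Stack space reserved on a single worker, in KB, for its communication threads. */
    public static long stackKbPerWorker(int workers, int stackKbPerThread) {
        return (long) workers * stackKbPerThread;
    }

    public static void main(String[] args) {
        int workers = 400;
        System.out.println("threads in the job: " + threadsPerJob(workers));
        System.out.println("stack KB per worker: " + stackKbPerWorker(workers, 64));
    }
}
```

For 400 workers that is 160,000 threads across the job and 25,600 KB (25 MB) 
of stack per worker, and the job-wide thread count grows quadratically with 
the number of workers.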





[jira] [Commented] (GIRAPH-2) make the project homepage

2011-08-29 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092977#comment-13092977
 ] 

Avery Ching commented on GIRAPH-2:
--

Done =) +1.

> make the project homepage
> -
>
> Key: GIRAPH-2
> URL: https://issues.apache.org/jira/browse/GIRAPH-2
> Project: Giraph
>  Issue Type: Task
>Reporter: Hyunsik Choi
>Assignee: Jakob Homan
> Attachments: GIRAPH-2.patch, GIRAPH-2b.patch
>
>
> We need to make the project homepage at http://incubator.apache.org/giraph/.





[jira] [Commented] (GIRAPH-13) Port Giraph to YARN

2011-08-29 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093213#comment-13093213
 ] 

Avery Ching commented on GIRAPH-13:
---

Agreed.

> Port Giraph to YARN
> ---
>
> Key: GIRAPH-13
> URL: https://issues.apache.org/jira/browse/GIRAPH-13
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Jakob Homan
>
> Now that YARN (aka MR2 aka MAPREDUCE-279) has been merged into the Hadoop 
> trunk, we should think about what it would take to separate out the graph 
> processing bits of Giraph from the MR1-specific code so as to take advantage 
> of the less-MR centric aspects of YARN, while still supporting both over the 
> medium term.





[jira] [Created] (GIRAPH-14) Support for the Facebook Hadoop branch

2011-08-29 Thread Avery Ching (JIRA)
Support for the Facebook Hadoop branch
--

 Key: GIRAPH-14
 URL: https://issues.apache.org/jira/browse/GIRAPH-14
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching


I've been working with Joe Xie on support to get Giraph running on the Facebook 
Hadoop branch.  He verified today that the examples worked on their cluster.  I 
need to clean up my changes a little, but otherwise, will submit a cleaned up 
diff.  As a side note, does anyone know how we can get Hudson support for 
Giraph?





[jira] [Assigned] (GIRAPH-14) Support for the Facebook Hadoop branch

2011-08-29 Thread Avery Ching (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-14?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching reassigned GIRAPH-14:
-

Assignee: Avery Ching

> Support for the Facebook Hadoop branch
> --
>
> Key: GIRAPH-14
> URL: https://issues.apache.org/jira/browse/GIRAPH-14
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Avery Ching
>Assignee: Avery Ching
>
> I've been working with Joe Xie on support to get Giraph running on the 
> Facebook Hadoop branch.  He verified today that the examples worked on their 
> cluster.  I need to clean up my changes a little, but otherwise, will submit 
> a cleaned up diff.  As a side note, does anyone know how we can get Hudson 
> support for Giraph?





[jira] [Commented] (GIRAPH-14) Support for the Facebook Hadoop branch

2011-08-29 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093348#comment-13093348
 ] 

Avery Ching commented on GIRAPH-14:
---

Thanks Hyunsik.

> Support for the Facebook Hadoop branch
> --
>
> Key: GIRAPH-14
> URL: https://issues.apache.org/jira/browse/GIRAPH-14
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Avery Ching
>Assignee: Avery Ching
>
> I've been working with Joe Xie on support to get Giraph running on the 
> Facebook Hadoop branch.  He verified today that the examples worked on their 
> cluster.  I need to clean up my changes a little, but otherwise, will submit 
> a cleaned up diff.  As a side note, does anyone know how we can get Hudson 
> support for Giraph?





[jira] [Created] (GIRAPH-17) Giraph doesn't give up properly after the maximum connect attempts to ZooKeeper

2011-08-29 Thread Avery Ching (JIRA)
Giraph doesn't give up properly after the maximum connect attempts to ZooKeeper
---

 Key: GIRAPH-17
 URL: https://issues.apache.org/jira/browse/GIRAPH-17
 Project: Giraph
  Issue Type: Bug
Reporter: Avery Ching
Assignee: Avery Ching
Priority: Minor


This produces incorrect and strange behavior.





[jira] [Updated] (GIRAPH-17) Giraph doesn't give up properly after the maximum connect attempts to ZooKeeper

2011-08-29 Thread Avery Ching (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-17?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching updated GIRAPH-17:
--

Attachment: ZooKeeperManager.java.diff

> Giraph doesn't give up properly after the maximum connect attempts to 
> ZooKeeper
> ---
>
> Key: GIRAPH-17
> URL: https://issues.apache.org/jira/browse/GIRAPH-17
> Project: Giraph
>  Issue Type: Bug
>Reporter: Avery Ching
>Assignee: Avery Ching
>Priority: Minor
> Attachments: ZooKeeperManager.java.diff
>
>
> This produces incorrect and strange behavior.





[jira] [Commented] (GIRAPH-15) Use of Jenkins for tests and builds

2011-08-29 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093439#comment-13093439
 ] 

Avery Ching commented on GIRAPH-15:
---

Looks like one of our mentors will have to do this as you suggest.  Hopefully 
Owen, Alan, or Chris can do it for us.

> Use of Jenkins for tests and builds
> ---
>
> Key: GIRAPH-15
> URL: https://issues.apache.org/jira/browse/GIRAPH-15
> Project: Giraph
>  Issue Type: Task
>Reporter: Hyunsik Choi
>Assignee: Hyunsik Choi
>
> We can use Jenkins server (https://builds.apache.org/) for regular builds and 
> tests. To use jenkins, there are some processes.
> Here is FAQ about use of Jenkins.
> http://wiki.apache.org/general/Hudson





[jira] [Updated] (GIRAPH-14) Support for the Facebook Hadoop branch

2011-08-29 Thread Avery Ching (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-14?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching updated GIRAPH-14:
--

Attachment: facebook.txt

Supports the Facebook version of Hadoop with mvn -Dhadoop=facebook 
-Dhadoop.jar.path= 

> Support for the Facebook Hadoop branch
> --
>
> Key: GIRAPH-14
> URL: https://issues.apache.org/jira/browse/GIRAPH-14
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Avery Ching
>Assignee: Avery Ching
> Attachments: facebook.txt
>
>
> I've been working with Joe Xie on support to get Giraph running on the 
> Facebook Hadoop branch.  He verified today that the examples worked on their 
> cluster.  I need to clean up my changes a little, but otherwise, will submit 
> a cleaned up diff.  As a side note, does anyone know how we can get Hudson 
> support for Giraph?





[jira] [Commented] (GIRAPH-17) Giraph doesn't give up properly after the maximum connect attempts to ZooKeeper

2011-08-30 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-17?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093808#comment-13093808
 ] 

Avery Ching commented on GIRAPH-17:
---

Sure.  The main part of this fix is

-if (connectAttempts == 5) {
+if (connectAttempts == maxConnectAttempts) {

Basically, this condition should be hit once the maximum number of connect 
attempts has been tried, but it never was, because maxConnectAttempts is now 10 
and the hard-coded 5 fell out of sync with it at some point (maxConnectAttempts 
probably used to be 5).  

The limit is still not configurable; we can address that in a later issue.
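
To illustrate the idea, here is a minimal Java sketch of a retry loop that 
checks one named limit instead of a hard-coded literal.  It is not the actual 
ZooKeeperManager code: the class name, the Connector interface, and the 
exception type are made up for the example; only connectAttempts and 
maxConnectAttempts come from the diff above.

```java
/** Hypothetical sketch, not the real ZooKeeperManager. */
final class ConnectRetrySketch {
    /** Stand-in for whatever actually tries to reach ZooKeeper. */
    interface Connector {
        boolean tryConnect();
    }

    /**
     * Tries to connect until it succeeds or the limit is reached.
     *
     * @param connector the connection attempt to repeat
     * @param maxConnectAttempts the single source of truth for the limit
     * @return the number of attempts used on success
     * @throws IllegalStateException once maxConnectAttempts have failed
     */
    static int connectWithRetries(Connector connector, int maxConnectAttempts) {
        int connectAttempts = 0;
        while (connectAttempts < maxConnectAttempts) {
            ++connectAttempts;
            if (connector.tryConnect()) {
                return connectAttempts;
            }
        }
        // Reached only after the limit is exhausted; comparing against the
        // named limit (not a literal such as 5) keeps the two in sync.
        throw new IllegalStateException(
            "Gave up after " + connectAttempts + " connect attempts");
    }
}
```

Passing the limit in as a parameter would also make it easy to read it from 
the job configuration later.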

> Giraph doesn't give up properly after the maximum connect attempts to 
> ZooKeeper
> ---
>
> Key: GIRAPH-17
> URL: https://issues.apache.org/jira/browse/GIRAPH-17
> Project: Giraph
>  Issue Type: Bug
>Reporter: Avery Ching
>Assignee: Avery Ching
>Priority: Minor
> Attachments: ZooKeeperManager.java.diff
>
>
> This produces incorrect and strange behavior.





[jira] [Commented] (GIRAPH-17) Giraph doesn't give up properly after the maximum connect attempts to ZooKeeper

2011-08-30 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-17?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093861#comment-13093861
 ] 

Avery Ching commented on GIRAPH-17:
---

Thanks for taking a look.  Committed.

> Giraph doesn't give up properly after the maximum connect attempts to 
> ZooKeeper
> ---
>
> Key: GIRAPH-17
> URL: https://issues.apache.org/jira/browse/GIRAPH-17
> Project: Giraph
>  Issue Type: Bug
>Reporter: Avery Ching
>Assignee: Avery Ching
>Priority: Minor
> Attachments: ZooKeeperManager.java.diff
>
>
> This produces incorrect and strange behavior.





[jira] [Commented] (GIRAPH-14) Support for the Facebook Hadoop branch

2011-08-30 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093879#comment-13093879
 ] 

Avery Ching commented on GIRAPH-14:
---

It's good to hear that you can run it on your cluster.  As far as the 
unittests, that is strange.  I was able to repeat the same issues and will look 
into a fix.

> Support for the Facebook Hadoop branch
> --
>
> Key: GIRAPH-14
> URL: https://issues.apache.org/jira/browse/GIRAPH-14
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Avery Ching
>Assignee: Avery Ching
> Attachments: facebook.txt
>
>
> I've been working with Joe Xie on support to get Giraph running on the 
> Facebook Hadoop branch.  He verified today that the examples worked on their 
> cluster.  I need to clean up my changes a little, but otherwise, will submit 
> a cleaned up diff.  As a side note, does anyone know how we can get Hudson 
> support for Giraph?





[jira] [Commented] (GIRAPH-4) New project logo

2011-08-30 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094134#comment-13094134
 ] 

Avery Ching commented on GIRAPH-4:
--

Yes, it would certainly be nice to have a real logo.  Do you want to give it a 
shot?

> New project logo
> 
>
> Key: GIRAPH-4
> URL: https://issues.apache.org/jira/browse/GIRAPH-4
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Jakob Homan
>
> Now for the hard part: the project logo.  We should create one and add it to 
> the website once done.





[jira] [Updated] (GIRAPH-14) Support for the Facebook Hadoop branch

2011-08-30 Thread Avery Ching (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-14?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching updated GIRAPH-14:
--

Attachment: facebook2.txt

Looks like I needed to change the groupId so that the right dependencies are 
pulled in for hadoop.  Please try this one out.  The unittests all passed for 
me.

(i.e. mvn -Dhadoop=facebook 
-Dhadoop.jar.path=/Users/aching/Desktop/hadoop-0.20.1-core.jar package)

> Support for the Facebook Hadoop branch
> --
>
> Key: GIRAPH-14
> URL: https://issues.apache.org/jira/browse/GIRAPH-14
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Avery Ching
>Assignee: Avery Ching
> Attachments: facebook.txt, facebook2.txt
>
>
> I've been working with Joe Xie on support to get Giraph running on the 
> Facebook Hadoop branch.  He verified today that the examples worked on their 
> cluster.  I need to clean up my changes a little, but otherwise, will submit 
> a cleaned up diff.  As a side note, does anyone know how we can get Hudson 
> support for Giraph?





[jira] [Commented] (GIRAPH-14) Support for the Facebook Hadoop branch

2011-08-30 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094216#comment-13094216
 ] 

Avery Ching commented on GIRAPH-14:
---

Great to hear it!  When one of the committers gets a chance to review, I can 
commit.

> Support for the Facebook Hadoop branch
> --
>
> Key: GIRAPH-14
> URL: https://issues.apache.org/jira/browse/GIRAPH-14
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Avery Ching
>Assignee: Avery Ching
> Attachments: facebook.txt, facebook2.txt
>
>
> I've been working with Joe Xie on support to get Giraph running on the 
> Facebook Hadoop branch.  He verified today that the examples worked on their 
> cluster.  I need to clean up my changes a little, but otherwise, will submit 
> a cleaned up diff.  As a side note, does anyone know how we can get Hudson 
> support for Giraph?





[jira] [Commented] (GIRAPH-14) Support for the Facebook Hadoop branch

2011-08-30 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094305#comment-13094305
 ] 

Avery Ching commented on GIRAPH-14:
---

In theory, I believe that Facebook's distro is online 
(https://github.com/facebook/hadoop-20-warehouse).  The long term story is to 
factor out the parts into modules and then compile them based on the user 
profile.  Then we don't have to "munge" anything anymore.  At least that's what 
I've thought of for now.  I'm open to better solutions.  Pre-processing will 
get unmaintainable if we have to support every version of Hadoop.  That being 
said, we should support the big customers of Giraph and that likely includes 
Facebook as well.

I'll add instructions to the README and submit a new patch.

> Support for the Facebook Hadoop branch
> --
>
> Key: GIRAPH-14
> URL: https://issues.apache.org/jira/browse/GIRAPH-14
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Avery Ching
>Assignee: Avery Ching
> Attachments: facebook.txt, facebook2.txt
>
>
> I've been working with Joe Xie on support to get Giraph running on the 
> Facebook Hadoop branch.  He verified today that the examples worked on their 
> cluster.  I need to clean up my changes a little, but otherwise, will submit 
> a cleaned up diff.  As a side note, does anyone know how we can get Hudson 
> support for Giraph?





[jira] [Updated] (GIRAPH-14) Support for the Facebook Hadoop branch

2011-08-30 Thread Avery Ching (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-14?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching updated GIRAPH-14:
--

Attachment: facebook3.patch

Updated with README instructions for building with the Facebook Hadoop release.

> Support for the Facebook Hadoop branch
> --
>
> Key: GIRAPH-14
> URL: https://issues.apache.org/jira/browse/GIRAPH-14
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Avery Ching
>Assignee: Avery Ching
> Attachments: facebook.txt, facebook2.txt, facebook3.patch
>
>
> I've been working with Joe Xie on support to get Giraph running on the 
> Facebook Hadoop branch.  He verified today that the examples worked on their 
> cluster.  I need to clean up my changes a little, but otherwise, will submit 
> a cleaned up diff.  As a side note, does anyone know how we can get Hudson 
> support for Giraph?





[jira] [Commented] (GIRAPH-18) Refactor BspServiceWorker::loadVertices()

2011-08-30 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094330#comment-13094330
 ] 

Avery Ching commented on GIRAPH-18:
---

This isn't the only area that needs refactoring =).  Let me take a deeper look 
tomorrow, but initially looks better.

> Refactor BspServiceWorker::loadVertices()
> -
>
> Key: GIRAPH-18
> URL: https://issues.apache.org/jira/browse/GIRAPH-18
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Attachments: GIRAPH-18.patch
>
>
> Currently BspServiceWorker::loadVertices() is more than 200 lines and 
> convoluted. I found it difficult to grok while debugging today.





[jira] [Commented] (GIRAPH-18) Refactor BspServiceWorker::loadVertices()

2011-08-31 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094686#comment-13094686
 ] 

Avery Ching commented on GIRAPH-18:
---

+1

Nice work refactoring, makes the code more readable.  Sorry it took me so long 
to review, but it's tougher for me without my trusty reviewboard =).  I was 
able to pass the unittests with your changes.

I also like your change to save some memory (every bit helps).  

Couple of style notes:

We have a CODE_CONVENTIONS file in the base path, probably this should be 
updated?  I'll file a separate JIRA for this.

1.  Map> ->  Map>
2.  Current style is limited to 80 chars per line (or should be).  Maybe this 
is unrealistic?
3.  Some 2-space style indentation, i.e.

while ((inputSplitPath = reserveInputSplit()) != null) {
  Map> maxIndexStatMap =


> Refactor BspServiceWorker::loadVertices()
> -
>
> Key: GIRAPH-18
> URL: https://issues.apache.org/jira/browse/GIRAPH-18
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Attachments: GIRAPH-18.patch
>
>
> Currently BspServiceWorker::loadVertices() is more than 200 lines and 
> convoluted. I found it difficult to grok while debugging today.





[jira] [Resolved] (GIRAPH-14) Support for the Facebook Hadoop branch

2011-08-31 Thread Avery Ching (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-14?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching resolved GIRAPH-14.
---

Resolution: Fixed

Committed, with changelog addition.

> Support for the Facebook Hadoop branch
> --
>
> Key: GIRAPH-14
> URL: https://issues.apache.org/jira/browse/GIRAPH-14
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Avery Ching
>Assignee: Avery Ching
> Attachments: facebook.txt, facebook2.txt, facebook3.patch
>
>
> I've been working with Joe Xie on support to get Giraph running on the 
> Facebook Hadoop branch.  He verified today that the examples worked on their 
> cluster.  I need to clean up my changes a little, but otherwise, will submit 
> a cleaned up diff.  As a side note, does anyone know how we can get Hudson 
> support for Giraph?





[jira] [Created] (GIRAPH-21) Revise CODE_CONVENTIONS

2011-08-31 Thread Avery Ching (JIRA)
Revise CODE_CONVENTIONS
---

 Key: GIRAPH-21
 URL: https://issues.apache.org/jira/browse/GIRAPH-21
 Project: Giraph
  Issue Type: Improvement
Reporter: Avery Ching
Priority: Minor


Currently there is a CODE_CONVENTIONS file in the base path of Giraph.  It's 
fairly sparse and we have been assuming an 80 char limit per line.  It's good 
to have common conventions so that the code doesn't get too messy.  Does anyone 
have any opinions on this now?  Probably best to tackle early and then have 
something to follow.





[jira] [Commented] (GIRAPH-21) Revise CODE_CONVENTIONS

2011-08-31 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-21?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094768#comment-13094768
 ] 

Avery Ching commented on GIRAPH-21:
---

I'll definitely let this continue to flesh out, but 80 chars and 2 spaces is 
fine with me.  I will modify/augment the CODE_CONVENTIONS file and then when we 
have consensus, I will commit as well as try to get Eclipse to help me change 
the source to match. Btw, it seems like Owen doesn't like abbreviations, so we 
can add that here too. =)

> Revise CODE_CONVENTIONS
> ---
>
> Key: GIRAPH-21
> URL: https://issues.apache.org/jira/browse/GIRAPH-21
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Avery Ching
>Priority: Minor
>
> Currently there is a CODE_CONVENTIONS file in the base path of Giraph.  It's 
> fairly sparse and we have been assuming an 80 char limit per line.  It's good 
> to have common conventions so that the code doesn't get too messy.  Does 
> anyone have any opinions on this now?  Probably best to tackle early and then 
> have something to follow.





[jira] [Assigned] (GIRAPH-21) Revise CODE_CONVENTIONS

2011-08-31 Thread Avery Ching (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-21?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching reassigned GIRAPH-21:
-

Assignee: Avery Ching

> Revise CODE_CONVENTIONS
> ---
>
> Key: GIRAPH-21
> URL: https://issues.apache.org/jira/browse/GIRAPH-21
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Avery Ching
>Assignee: Avery Ching
>Priority: Minor
>
> Currently there is a CODE_CONVENTIONS file in the base path of Giraph.  It's 
> fairly sparse and we have been assuming an 80 char limit per line.  It's good 
> to have common conventions so that the code doesn't get too messy.  Does 
> anyone have any opinions on this now?  Probably best to tackle early and then 
> have something to follow.





[jira] [Commented] (GIRAPH-18) Refactor BspServiceWorker::loadVertices()

2011-08-31 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094793#comment-13094793
 ] 

Avery Ching commented on GIRAPH-18:
---

+1
Looks better, thanks.  Hey, could you add a javadoc comment for 
readVerticesFromInputSplit?  No need to attach the patch again, though, you can 
just check it in.

> Refactor BspServiceWorker::loadVertices()
> -
>
> Key: GIRAPH-18
> URL: https://issues.apache.org/jira/browse/GIRAPH-18
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Attachments: GIRAPH-18.patch, GIRAPH-18b.patch
>
>
> Currently BspServiceWorker::loadVertices() is more than 200 lines and 
> convoluted. I found it difficult to grok while debugging today.





[jira] [Commented] (GIRAPH-19) Create a CHANGES.txt file

2011-08-31 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-19?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094840#comment-13094840
 ] 

Avery Ching commented on GIRAPH-19:
---

Should we add dates too?  I don't have a preference on the ordering either, but 
agree that consistency is important.

> Create a CHANGES.txt file
> -
>
> Key: GIRAPH-19
> URL: https://issues.apache.org/jira/browse/GIRAPH-19
> Project: Giraph
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> It is helpful to have a file that is updated with each change along with who 
> contributed and committed the patch.





[jira] [Commented] (GIRAPH-22) Sort out examples from unit test helpers in examples package

2011-08-31 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094901#comment-13094901
 ] 

Avery Ching commented on GIRAPH-22:
---

Good idea.  I think a few of them are full programs though.  Not sure about the 
best way to do this.

> Sort out examples from unit test helpers in examples package
> 
>
> Key: GIRAPH-22
> URL: https://issues.apache.org/jira/browse/GIRAPH-22
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Jakob Homan
>
> Within src/examples there are quite a few files defined that are mainly used 
> in unit or other tests:
> * GeneratedVertexInputFormat
> * GeneratedVertexInputFormat
> * LongSumAggregator
> * MaxAggregator
> * MinAggregator
> * SimpleCombinerVertex
> * SimpleFailVertex
> * SimpleMsgVertex
> * SimpleMutateGraphVertex
> * SimpleSumCombiner
> * SumAggregator
> * SuperstepBalancer
> Several of these explicitly say they're designed to aid in unit testing.  If 
> these are indeed meant for testing, they should be moved to the test 
> directory.  If they're examples, it would be better to sort out the overly 
> complicated ones and ones that include lots of tests and asserts, so only to 
> show the essence of the example.  Hopefully the examples directory will have a 
> few, very heavily documented programs of the helloworld/word count/shortest 
> path variety (with sample inputs) that can be quickly launched.  Once new 
> developers grok these, they can turn to the unit tests, which can of course 
> be great sources to learn the code from.





[jira] [Commented] (GIRAPH-23) Giraph causes capacity scheduler to report crazy statistics

2011-08-31 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-23?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094969#comment-13094969
 ] 

Avery Ching commented on GIRAPH-23:
---

That is weird, haven't seen this issue on 0.20.203 and 0.20.204, nor on the 
0.20.1 release we have at Yahoo!.

> Giraph causes capacity scheduler to report crazy statistics
> ---
>
> Key: GIRAPH-23
> URL: https://issues.apache.org/jira/browse/GIRAPH-23
> Project: Giraph
>  Issue Type: Bug
> Environment: Hadoop 20.2, non-secure with capacity scheduler
>Reporter: Jakob Homan
>
> Not sure why, but all our Giraph jobs create crazy values for the scheduler 
> in terms of number of mappers:
> {noformat}51 running map tasks using -52224 map slots, 0 running reduce tasks 
> using 0 reduce slots. {noformat}
> and this trickles out to the whole cluster:
> {noformat}Used capacity: -58229 (-12468.7% of Capacity){noformat}
> These numbers don't appear to affect the job and they correct themselves a 
> short time after the Giraph job finishes running.





[jira] [Commented] (GIRAPH-24) Job-level statistics reports one superstep greater than workers

2011-08-31 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-24?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095042#comment-13095042
 ] 

Avery Ching commented on GIRAPH-24:
---

I thought I noticed this at some point, but forgot about it.  Looks good. +1

> Job-level statistics reports one superstep greater than workers
> ---
>
> Key: GIRAPH-24
> URL: https://issues.apache.org/jira/browse/GIRAPH-24
> Project: Giraph
>  Issue Type: Bug
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Attachments: GIRAPH-24.patch
>
>
> In {{BspServiceMaster::coordinateSuperstep()}} the {{superStepCounter}} is 
> incremented when the coordination begins, but since the counter starts at 
> zero, this has the job level statistic being at superstep {{n+1}} when the 
> workers are reporting that they are working on {{n}}.  This discrepancy 
> persists throughout the job.
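
A small Java sketch of the discrepancy described above, assuming a counter 
that starts at zero and is incremented when coordination begins.  The class 
and method names are hypothetical and this is not the BspServiceMaster code.  
Incrementing only after a superstep completes is one way to line the values 
up; it is not necessarily what the attached patch does.

```java
/** Hypothetical sketch of the off-by-one, not the real BspServiceMaster. */
final class SuperstepCounterSketch {
    /**
     * Counter value seen while workers run superstep n, when the counter is
     * incremented as coordination of each superstep begins.
     */
    static long counterWhenIncrementedFirst(long n) {
        long counter = 0;
        for (long superstep = 0; superstep <= n; superstep++) {
            counter++;  // bumped before superstep 'superstep' is worked on
        }
        return counter;  // one ahead of the workers
    }

    /**
     * Counter value seen while workers run superstep n, when the counter is
     * incremented only after each superstep has completed.
     */
    static long counterWhenIncrementedLast(long n) {
        long counter = 0;
        for (long superstep = 0; superstep < n; superstep++) {
            counter++;  // bumped after superstep 'superstep' finished
        }
        return counter;  // matches the workers
    }
}
```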

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-13) Port Giraph to YARN

2011-08-31 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095043#comment-13095043
 ] 

Avery Ching commented on GIRAPH-13:
---

This is going to be a fun one. =)  Thanks for taking it on.

> Port Giraph to YARN
> ---
>
> Key: GIRAPH-13
> URL: https://issues.apache.org/jira/browse/GIRAPH-13
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Jakob Homan
>Assignee: Jakob Homan
>
> Now that YARN (aka MR2 aka MAPREDUCE-279) has been merged into the Hadoop 
> trunk, we should think about what it would take to separate out the graph 
> processing bits of Giraph from the MR1-specific code so as to take advantage 
> of the less-MR centric aspects of YARN, while still supporting both over the 
> medium term.





[jira] [Commented] (GIRAPH-15) Use of Jenkins for tests and builds

2011-09-01 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095778#comment-13095778
 ] 

Avery Ching commented on GIRAPH-15:
---

Thanks for the status update.  I'm excited to get this working.

> Use of Jenkins for tests and builds
> ---
>
> Key: GIRAPH-15
> URL: https://issues.apache.org/jira/browse/GIRAPH-15
> Project: Giraph
>  Issue Type: Task
>Reporter: Hyunsik Choi
>Assignee: Hyunsik Choi
>
> We can use Jenkins server (https://builds.apache.org/) for regular builds and 
> tests. To use jenkins, there are some processes.
> Here is FAQ about use of Jenkins.
> http://wiki.apache.org/general/Hudson





[jira] [Commented] (GIRAPH-15) Use of Jenkins for tests and builds

2011-09-02 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096058#comment-13096058
 ] 

Avery Ching commented on GIRAPH-15:
---

Looks good, nice job.

> Use of Jenkins for tests and builds
> ---
>
> Key: GIRAPH-15
> URL: https://issues.apache.org/jira/browse/GIRAPH-15
> Project: Giraph
>  Issue Type: Task
>Reporter: Hyunsik Choi
>Assignee: Hyunsik Choi
>
> We can use Jenkins server (https://builds.apache.org/) for regular builds and 
> tests. To use jenkins, there are some processes.
> Here is FAQ about use of Jenkins.
> http://wiki.apache.org/general/Hudson





[jira] [Updated] (GIRAPH-21) Revise CODE_CONVENTIONS

2011-09-02 Thread Avery Ching (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-21?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching updated GIRAPH-21:
--

Attachment: GIRAPH-21.diff

First proposal of the developer suggested code conventions.

> Revise CODE_CONVENTIONS
> ---
>
> Key: GIRAPH-21
> URL: https://issues.apache.org/jira/browse/GIRAPH-21
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Avery Ching
>Assignee: Avery Ching
>Priority: Minor
> Attachments: GIRAPH-21.diff
>
>
> Currently there is a CODE_CONVENTIONS file in the base path of Giraph.  It's 
> fairly sparse and we have been assuming an 80 char limit per line.  It's good 
> to have common conventions so that the code doesn't get too messy.  Does 
> anyone have any opinions on this now?  Probably best to tackle early and then 
> have something to follow.





[jira] [Commented] (GIRAPH-25) NPE in BspServiceMaster when failing a job

2011-09-04 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096832#comment-13096832
 ] 

Avery Ching commented on GIRAPH-25:
---

Definitely should be handled more gracefully.  Thanks for filing the issue.

> NPE in BspServiceMaster when failing a job
> --
>
> Key: GIRAPH-25
> URL: https://issues.apache.org/jira/browse/GIRAPH-25
> Project: Giraph
>  Issue Type: Bug
>Reporter: Dmitriy V. Ryaboy
>Priority: Minor
>
> When BspServiceMaster times out waiting for all workers to check in, it dies 
> with a NullPointerException.
> This can perhaps be handled a bit more gracefully.





[jira] [Commented] (GIRAPH-11) Improve the graph distribution of Giraph

2011-09-06 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098098#comment-13098098
 ] 

Avery Ching commented on GIRAPH-11:
---

I'm going to assume you're asking about the current partitioning.  If I'm 
wrong, I'll address what we plan to do in the future.  The current partitioning 
is implemented by assuming that the input splits are sorted globally (i.e. two 
input splits of {A, B, C} {D, E}).  It will break the input splits into vertex 
ranges where the boundaries will not change.  These vertex ranges can be passed 
around the workers via several different balancers.  The balancer can be set 
via setVertexRangeBalancerClass() from GiraphJob or with the right 
configuration parameter (giraph.vertexRangeBalancerClass).  We have some 
implementations for a static balancer (no vertex movement, default), and an 
auto balancer (configurable to balance based on vertices or edges).  You're 
free to implement your own as well.  Hope that answers some of the questions, 
let me know if you have more.
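
To make the range idea concrete, here is a rough Java sketch of a range 
lookup, assuming each vertex range is identified by its largest vertex id and 
that ids are plain strings.  The class and method names are invented for the 
example and are not Giraph's API; a balancer in this picture only changes 
which worker owns a range, never the range boundaries.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of range-based vertex ownership, not Giraph code. */
final class RangePartitionSketch {
    /** Maps the largest vertex id of each range to the owning worker. */
    private final TreeMap<String, Integer> maxIdToWorker = new TreeMap<>();

    /** Registers a range that ends at maxVertexId and is owned by worker. */
    void addRange(String maxVertexId, int worker) {
        maxIdToWorker.put(maxVertexId, worker);
    }

    /** Finds the owner: the first range whose max id is >= vertexId. */
    int ownerOf(String vertexId) {
        Map.Entry<String, Integer> range = maxIdToWorker.ceilingEntry(vertexId);
        if (range == null) {
            throw new IllegalArgumentException("No range covers " + vertexId);
        }
        return range.getValue();
    }

    /** Hands a whole range to another worker; the boundary stays fixed. */
    void moveRange(String maxVertexId, int newWorker) {
        if (maxIdToWorker.replace(maxVertexId, newWorker) == null) {
            throw new IllegalArgumentException("Unknown range " + maxVertexId);
        }
    }
}
```

With the two splits from the example, one would register ranges ending at "C" 
and "E"; moving the "C" range to another worker changes the owner of A, B, and 
C together.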

> Improve the graph distribution of Giraph
> 
>
> Key: GIRAPH-11
> URL: https://issues.apache.org/jira/browse/GIRAPH-11
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Avery Ching
>Assignee: Avery Ching
>
> Currently, Giraph assumes that the data from the VertexInputFormat is sorted. 
>  If the user data is not sorted by the vertex id, they must first run a 
> MapReduce or Pig job to generate a sorted dataset.  This is often a bit 
> inconvenient.
> Giraph graph partitioning is currently range based and there are some 
> advantages and disadvantages of this approach.  The proposal of this JIRA 
> would be to allow for both range and hash based partitioning and provide more 
> flexibility to the user.
> Design goals for the graph distribution:
> * Allow vertices to be ordered or unordered
> * Ability to repartition
> * Select the partitioning scheme based on user needs (i.e. hash or range 
> based)
> * Ability to provide user-specific hints about partitions
> Hash-based partitioning
> * Good vertex balancing across ranges for random data
> * Bad at vertex id locality
> Range-based partitioning
> * Good at vertex id locality
> * Ability to split ranges easily
> * Can cause hotspots for hot ranges
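
The trade-off listed above can be seen in a tiny Java sketch.  It assumes 
non-negative long vertex ids and fixed-size ranges purely for illustration; 
the class and method names are hypothetical and do not come from Giraph.

```java
/** Hypothetical sketch contrasting hash and range placement of vertex ids. */
final class PartitionSchemeSketch {
    /** Hash-based: spreads ids evenly, but neighbouring ids get separated. */
    static int ownerByHash(long vertexId, int partitions) {
        return Math.floorMod(Long.hashCode(vertexId), partitions);
    }

    /** Range-based: neighbouring ids stay together until a boundary. */
    static long ownerByRange(long vertexId, long idsPerRange) {
        return Math.floorDiv(vertexId, idsPerRange);
    }
}
```

For ids 0 through 3 with four partitions, the hash scheme puts each id on a 
different partition, while a range size of four keeps them all in one range.  
That is the locality versus balance trade-off in miniature: the range that 
holds popular ids becomes the hotspot.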





[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-06 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098512#comment-13098512
 ] 

Avery Ching commented on GIRAPH-12:
---

Jake from Twitter also recommended thinking about using Finagle.  His 
description:

"A fault tolerant, protocol-agnostic RPC system" (based on Netty [which I see is 
already under consideration], written in Scala, but with very mature Java 
bindings too).  We use it internally at Twitter for clusters of mid-tier 
servers which have many dozens of machines talking to hundreds of other 
machines, without blowing up on thread-stack or using a gazillion threads.  
It's mavenized, so it's easy to try out.

> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Avery Ching
>Assignee: Hyunsik Choi
>Priority: Minor
>
> Currently every worker will start up a thread to communicate with every other 
> worker.  Hadoop RPC is used for communication.  For instance if there are 
> 400 workers, each worker will create 400 threads.  This ends up using a lot 
> of memory, even with the option  
> -Dmapred.child.java.opts="-Xss64k".  
> It would be good to investigate using frameworks like Netty or rolling our 
> own to improve this situation.  By moving away from Hadoop RPC, we would 
> also make compatibility with different Hadoop versions easier.





[jira] [Commented] (GIRAPH-26) Improve PseudoRandomVertexInputFormat to create a more realistic synthetic graph (e.g. power-law distributed vertex-cardinality).

2011-09-06 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-26?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098594#comment-13098594
 ] 

Avery Ching commented on GIRAPH-26:
---

Totally agree, any chance you might have some time to work on this? =)

> Improve PseudoRandomVertexInputFormat to create a more realistic synthetic 
> graph (e.g. power-law distributed vertex-cardinality).
> -
>
> Key: GIRAPH-26
> URL: https://issues.apache.org/jira/browse/GIRAPH-26
> Project: Giraph
>  Issue Type: Test
>  Components: benchmark
>Reporter: Jake Mannix
>Priority: Minor
>
> The PageRankBenchmark class, to be a proper benchmark, should run over graphs 
> which look more like data seen in the wild, and web link graphs, social 
> network graphs, and text corpora (represented as a bipartite graph) all have 
> power-law distributions, so benchmarking a synthetic graph which looks more 
> like this would be a nice test which would stress cases of uneven 
> split-distribution and bottlenecks of subclusters of the graph of heavily 
> connected vertices.





[jira] [Commented] (GIRAPH-25) NPE in BspServiceMaster when failing a job

2011-09-07 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099055#comment-13099055
 ] 

Avery Ching commented on GIRAPH-25:
---

Thanks for the patch Dmitriy!  I'll review it, add a unit test, and commit it if 
it works as expected.

> NPE in BspServiceMaster when failing a job
> --
>
> Key: GIRAPH-25
> URL: https://issues.apache.org/jira/browse/GIRAPH-25
> Project: Giraph
>  Issue Type: Bug
>Reporter: Dmitriy V. Ryaboy
>Priority: Minor
> Attachments: GIRAPH-25.patch
>
>
> When BspServiceMaster times out waiting for all workers to check in, it dies 
> with a NullPointerException.
> This can perhaps be handled a bit more gracefully.





[jira] [Commented] (GIRAPH-26) Improve PseudoRandomVertexInputFormat to create a more realistic synthetic graph (e.g. power-law distributed vertex-cardinality).

2011-09-07 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-26?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099693#comment-13099693
 ] 

Avery Ching commented on GIRAPH-26:
---

From a skim of the paper, it appears that you will need to use Giraph to 
create your scale-free graph over a number of iterations (supersteps).

> Improve PseudoRandomVertexInputFormat to create a more realistic synthetic 
> graph (e.g. power-law distributed vertex-cardinality).
> -
>
> Key: GIRAPH-26
> URL: https://issues.apache.org/jira/browse/GIRAPH-26
> Project: Giraph
>  Issue Type: Test
>  Components: benchmark
>Reporter: Jake Mannix
>Priority: Minor
>
> The PageRankBenchmark class, to be a proper benchmark, should run over graphs 
> which look more like data seen in the wild, and web link graphs, social 
> network graphs, and text corpora (represented as a bipartite graph) all have 
> power-law distributions, so benchmarking a synthetic graph which looks more 
> like this would be a nice test which would stress cases of uneven 
> split-distribution and bottlenecks of subclusters of the graph of heavily 
> connected vertices.





[jira] [Updated] (GIRAPH-25) NPE in BspServiceMaster when failing a job

2011-09-08 Thread Avery Ching (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching updated GIRAPH-25:
--

Attachment: GIRAPH-25.2.patch

Minor changes to the original (unittest, error message).

> NPE in BspServiceMaster when failing a job
> --
>
> Key: GIRAPH-25
> URL: https://issues.apache.org/jira/browse/GIRAPH-25
> Project: Giraph
>  Issue Type: Bug
>Reporter: Dmitriy V. Ryaboy
>Priority: Minor
> Attachments: GIRAPH-25.2.patch, GIRAPH-25.patch
>
>
> When BspServiceMaster times out waiting for all workers to check in, it dies 
> with a NullPointerException.
> This can perhaps be handled a bit more gracefully.





[jira] [Commented] (GIRAPH-25) NPE in BspServiceMaster when failing a job

2011-09-08 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100564#comment-13100564
 ] 

Avery Ching commented on GIRAPH-25:
---

Patch worked nicely.  I added a unittest and tweaked an error message.  Here's 
some example output I got (looks much better).

...
2011-09-08 11:20:35,203 INFO org.apache.giraph.graph.BspServiceMaster: 
checkWorkers: Only found 0 responses of 32767 needed to start superstep -1.  
Sleeping for 1 msecs and used 0 of 1 attempts.
2011-09-08 11:20:35,203 ERROR org.apache.giraph.graph.BspServiceMaster: 
checkWorkers: Did not receive enough processes in time (only 0 of 32767 
required).  This occurs if you do not have enough map tasks available 
simultaneously on your Hadoop instance to fulfill the number of requested 
workers.
2011-09-08 11:20:35,276 INFO org.apache.giraph.graph.BspServiceMaster: 
setJobState: 
{"_stateKey":"FAILED","_applicationAttemptKey":-1,"_superstepKey":-1} on 
superstep -1
2011-09-08 11:20:35,333 FATAL org.apache.giraph.graph.BspServiceMaster: 
failJob: Killing job job_201109080935_0009
2011-09-08 11:20:35,619 INFO org.apache.giraph.graph.BspServiceMaster: cleanup: 
Notifying master its okay to cleanup with 
/_hadoopBsp/job_201109080935_0009/_cleanedUpDir/0_master
2011-09-08 11:20:35,620 INFO org.apache.giraph.graph.BspServiceMaster: 
cleanUpZooKeeper: Node /_hadoopBsp/job_201109080935_0009/_cleanedUpDir already 
exists, no need to create.
2011-09-08 11:20:35,621 INFO org.apache.giraph.graph.BspServiceMaster: 
cleanUpZooKeeper: Got 1 of 32768 desired children from 
/_hadoopBsp/job_201109080935_0009/_cleanedUpDir
2011-09-08 11:20:35,621 INFO org.apache.giraph.graph.BspServiceMaster: 
cleanedUpZooKeeper: Waiting for the children of 
/_hadoopBsp/job_201109080935_0009/_cleanedUpDir to change since only got 1 
nodes.
2011-09-08 11:20:38,182 WARN org.apache.giraph.zk.ZooKeeperManager: 
onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper process.

I'll upload the minor changes and then commit it on your behalf.  I ran 
unittests in local mode and also on a small Hadoop instance.  Thanks!


> NPE in BspServiceMaster when failing a job
> --
>
> Key: GIRAPH-25
> URL: https://issues.apache.org/jira/browse/GIRAPH-25
> Project: Giraph
>  Issue Type: Bug
>Reporter: Dmitriy V. Ryaboy
>Priority: Minor
> Attachments: GIRAPH-25.2.patch, GIRAPH-25.patch
>
>
> When BspServiceMaster times out waiting for all workers to check in, it dies 
> with a NullPointerException.
> This can perhaps be handled a bit more gracefully.





[jira] [Assigned] (GIRAPH-25) NPE in BspServiceMaster when failing a job

2011-09-08 Thread Avery Ching (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching reassigned GIRAPH-25:
-

Assignee: Dmitriy V. Ryaboy

> NPE in BspServiceMaster when failing a job
> --
>
> Key: GIRAPH-25
> URL: https://issues.apache.org/jira/browse/GIRAPH-25
> Project: Giraph
>  Issue Type: Bug
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
>Priority: Minor
> Attachments: GIRAPH-25.2.patch, GIRAPH-25.patch
>
>
> When BspServiceMaster times out waiting for all workers to check in, it dies 
> with a NullPointerException.
> This can perhaps be handled a bit more gracefully.





[jira] [Commented] (GIRAPH-25) NPE in BspServiceMaster when failing a job

2011-09-08 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100667#comment-13100667
 ] 

Avery Ching commented on GIRAPH-25:
---

Yup, I added you and Jakob to the contributors list and assigned to you.  I 
agree with your commit message description to not fill up the svn logs.

> NPE in BspServiceMaster when failing a job
> --
>
> Key: GIRAPH-25
> URL: https://issues.apache.org/jira/browse/GIRAPH-25
> Project: Giraph
>  Issue Type: Bug
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
>Priority: Minor
> Attachments: GIRAPH-25.2.patch, GIRAPH-25.patch
>
>
> When BspServiceMaster times out waiting for all workers to check in, it dies 
> with a NullPointerException.
> This can perhaps be handled a bit more gracefully.





[jira] [Issue Comment Edited] (GIRAPH-25) NPE in BspServiceMaster when failing a job

2011-09-08 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100667#comment-13100667
 ] 

Avery Ching edited comment on GIRAPH-25 at 9/8/11 8:31 PM:
---

Yup, I added you and Jake to the contributors list and assigned to you.  I 
agree with your commit message description to not fill up the svn logs.

  was (Author: aching):
Yup, I added you and Jakob to the contributors list and assigned to you.  I 
agree with your commit message description to not fill up the svn logs.
  
> NPE in BspServiceMaster when failing a job
> --
>
> Key: GIRAPH-25
> URL: https://issues.apache.org/jira/browse/GIRAPH-25
> Project: Giraph
>  Issue Type: Bug
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
>Priority: Minor
> Attachments: GIRAPH-25.2.patch, GIRAPH-25.patch
>
>
> When BspServiceMaster times out waiting for all workers to check in, it dies 
> with a NullPointerException.
> This can perhaps be handled a bit more gracefully.





[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored

2011-09-08 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100829#comment-13100829
 ] 

Avery Ching commented on GIRAPH-27:
---

I'm revising it a little with some formatting changes, but overall it looks 
good.  I'd like to submit a slight revision before commit.

> Mutable static global state in Vertex.java should be refactored
> ---
>
> Key: GIRAPH-27
> URL: https://issues.apache.org/jira/browse/GIRAPH-27
> Project: Giraph
>  Issue Type: Improvement
>  Components: graph
>Affects Versions: 0.70.0
>Reporter: Jake Mannix
>Assignee: Jake Mannix
> Attachments: GIRAPH-27.patch, GIRAPH-27.patch
>
>
> Vertex.java has a bunch of static methods for getting/setting global graph 
> state (total number of vertices, edges, a reference to the GraphMapper, etc). 
>  Refactoring this into a GraphState object, which every Vertex can hold onto 
> a reference to (yes, a tiny bit more memory per Vertex, but in comparison to 
> what's already in there...)





[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored

2011-09-08 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100830#comment-13100830
 ] 

Avery Ching commented on GIRAPH-27:
---

Btw, I'll also run the PageRank benchmark on a real cluster.

> Mutable static global state in Vertex.java should be refactored
> ---
>
> Key: GIRAPH-27
> URL: https://issues.apache.org/jira/browse/GIRAPH-27
> Project: Giraph
>  Issue Type: Improvement
>  Components: graph
>Affects Versions: 0.70.0
>Reporter: Jake Mannix
>Assignee: Jake Mannix
> Attachments: GIRAPH-27.patch, GIRAPH-27.patch
>
>
> Vertex.java has a bunch of static methods for getting/setting global graph 
> state (total number of vertices, edges, a reference to the GraphMapper, etc). 
>  Refactoring this into a GraphState object, which every Vertex can hold onto 
> a reference to (yes, a tiny bit more memory per Vertex, but in comparison to 
> what's already in there...)





[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored

2011-09-08 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100849#comment-13100849
 ] 

Avery Ching commented on GIRAPH-27:
---

I made some revisions to Jake's fix:
- Do not expose GraphState to application developers
- Fix a few formatting issues
https://reviews.apache.org/r/1771/

I also passed unittests and ran the PageRankBenchmark on a Yahoo! cluster with 
100 workers and 500k vertices.  If Jake is okay with the changes then I'll 
commit it on his behalf.

> Mutable static global state in Vertex.java should be refactored
> ---
>
> Key: GIRAPH-27
> URL: https://issues.apache.org/jira/browse/GIRAPH-27
> Project: Giraph
>  Issue Type: Improvement
>  Components: graph
>Affects Versions: 0.70.0
>Reporter: Jake Mannix
>Assignee: Jake Mannix
> Attachments: GIRAPH-27.patch, GIRAPH-27.patch
>
>
> Vertex.java has a bunch of static methods for getting/setting global graph 
> state (total number of vertices, edges, a reference to the GraphMapper, etc). 
>  Refactoring this into a GraphState object, which every Vertex can hold onto 
> a reference to (yes, a tiny bit more memory per Vertex, but in comparison to 
> what's already in there...)





[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored

2011-09-08 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100856#comment-13100856
 ] 

Avery Ching commented on GIRAPH-27:
---

Thanks.  I just updated it.

> Mutable static global state in Vertex.java should be refactored
> ---
>
> Key: GIRAPH-27
> URL: https://issues.apache.org/jira/browse/GIRAPH-27
> Project: Giraph
>  Issue Type: Improvement
>  Components: graph
>Affects Versions: 0.70.0
>Reporter: Jake Mannix
>Assignee: Jake Mannix
> Attachments: GIRAPH-27.patch, GIRAPH-27.patch
>
>
> Vertex.java has a bunch of static methods for getting/setting global graph 
> state (total number of vertices, edges, a reference to the GraphMapper, etc). 
>  Refactoring this into a GraphState object, which every Vertex can hold onto 
> a reference to (yes, a tiny bit more memory per Vertex, but in comparison to 
> what's already in there...)





[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored

2011-09-08 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100928#comment-13100928
 ] 

Avery Ching commented on GIRAPH-27:
---

That is actually intentional, since I need to have access to the 
get/setGraphState() internally and I removed the get/setGraphState() from 
BasicVertex.  So rather than expose get/setGraphState() to the user 
(BasicVertex), I opted to do this.  I suppose we could have another interface 
internally that extended BasicVertex to allow getting and setting the graph 
state if you're concerned about exposing the vertex to the internals.  Let me 
know what you think.

> Mutable static global state in Vertex.java should be refactored
> ---
>
> Key: GIRAPH-27
> URL: https://issues.apache.org/jira/browse/GIRAPH-27
> Project: Giraph
>  Issue Type: Improvement
>  Components: graph
>Affects Versions: 0.70.0
>Reporter: Jake Mannix
>Assignee: Jake Mannix
> Attachments: GIRAPH-27.patch, GIRAPH-27.patch
>
>
> Vertex.java has a bunch of static methods for getting/setting global graph 
> state (total number of vertices, edges, a reference to the GraphMapper, etc). 
>  Refactoring this into a GraphState object, which every Vertex can hold onto 
> a reference to (yes, a tiny bit more memory per Vertex, but in comparison to 
> what's already in there...)





[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored

2011-09-08 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100948#comment-13100948
 ] 

Avery Ching commented on GIRAPH-27:
---

One alternative is to change BasicVertex to an abstract class that implements 
get/setGraphState as package private methods.  Users won't have access to 
get/setGraphState, while your primitive implementation would (since it's part 
of the same package).  Thoughts?  If you like it, I can submit a revised 
reviewboard request.
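
A rough sketch of the abstract-class alternative described above, to make the 
visibility argument concrete.  The class bodies here are simplified, 
hypothetical stand-ins (ExampleGraphState and ExampleBasicVertex are invented 
names, not the real Giraph classes), and the code assumes framework and vertex 
base class live in the same package:

```java
// Hypothetical, simplified stand-ins for illustration only; these are not
// the actual Giraph GraphState / BasicVertex classes.
final class ExampleGraphState {
    private final long numVertices;
    private final long numEdges;

    ExampleGraphState(long numVertices, long numEdges) {
        this.numVertices = numVertices;
        this.numEdges = numEdges;
    }

    long getNumVertices() { return numVertices; }
    long getNumEdges() { return numEdges; }
}

abstract class ExampleBasicVertex {
    private ExampleGraphState graphState;

    // No access modifier: package-private.  Framework code in the same
    // package can call these, application code in other packages cannot.
    void setGraphState(ExampleGraphState graphState) {
        this.graphState = graphState;
    }

    ExampleGraphState getGraphState() {
        return graphState;
    }

    // Public, read-only view that application developers may use.
    public long getNumVertices() {
        return graphState.getNumVertices();
    }

    public long getNumEdges() {
        return graphState.getNumEdges();
    }

    // User-defined computation, run once per superstep.
    public abstract void compute();
}
```

With this layout a user subclass in its own package can still call 
getNumVertices(), but a call to setGraphState() from that package would not 
compile.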

> Mutable static global state in Vertex.java should be refactored
> ---
>
> Key: GIRAPH-27
> URL: https://issues.apache.org/jira/browse/GIRAPH-27
> Project: Giraph
>  Issue Type: Improvement
>  Components: graph
>Affects Versions: 0.70.0
>Reporter: Jake Mannix
>Assignee: Jake Mannix
> Attachments: GIRAPH-27.patch, GIRAPH-27.patch
>
>
> Vertex.java has a bunch of static methods for getting/setting global graph 
> state (total number of vertices, edges, a reference to the GraphMapper, etc). 
>  Refactoring this into a GraphState object, which every Vertex can hold onto 
> a reference to (yes, a tiny bit more memory per Vertex, but in comparison to 
> what's already in there...)





[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored

2011-09-08 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101010#comment-13101010
 ] 

Avery Ching commented on GIRAPH-27:
---

Let's do this Jake, I'll do some of the cleanup here and repost.  If you agree, 
I'll commit.  Then you can make any additional changes in another JIRA that is 
specific for your primitives implementation.  What do you think?  I can do some 
of it in the next 10-15 min.

> Mutable static global state in Vertex.java should be refactored
> ---
>
> Key: GIRAPH-27
> URL: https://issues.apache.org/jira/browse/GIRAPH-27
> Project: Giraph
>  Issue Type: Improvement
>  Components: graph
>Affects Versions: 0.70.0
>Reporter: Jake Mannix
>Assignee: Jake Mannix
> Attachments: GIRAPH-27.patch, GIRAPH-27.patch
>
>
> Vertex.java has a bunch of static methods for getting/setting global graph 
> state (total number of vertices, edges, a reference to the GraphMapper, etc). 
>  Refactoring this into a GraphState object, which every Vertex can hold onto 
> a reference to (yes, a tiny bit more memory per Vertex, but in comparison to 
> what's already in there...)





[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored

2011-09-08 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101030#comment-13101030
 ] 

Avery Ching commented on GIRAPH-27:
---

Btw, with respect to import ordering, can you please voice your preferences on 
https://issues.apache.org/jira/browse/GIRAPH-21 ?  Thanks!

> Mutable static global state in Vertex.java should be refactored
> ---
>
> Key: GIRAPH-27
> URL: https://issues.apache.org/jira/browse/GIRAPH-27
> Project: Giraph
>  Issue Type: Improvement
>  Components: graph
>Affects Versions: 0.70.0
>Reporter: Jake Mannix
>Assignee: Jake Mannix
> Attachments: GIRAPH-27.patch, GIRAPH-27.patch
>
>
> Vertex.java has a bunch of static methods for getting/setting global graph 
> state (total number of vertices, edges, a reference to the GraphMapper, etc). 
>  Refactoring this into a GraphState object, which every Vertex can hold onto 
> a reference to (yes, a tiny bit more memory per Vertex, but in comparison to 
> what's already in there...)





[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored

2011-09-08 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101031#comment-13101031
 ] 

Avery Ching commented on GIRAPH-27:
---

Waiting on another committer to +1 before committing.

> Mutable static global state in Vertex.java should be refactored
> ---
>
> Key: GIRAPH-27
> URL: https://issues.apache.org/jira/browse/GIRAPH-27
> Project: Giraph
>  Issue Type: Improvement
>  Components: graph
>Affects Versions: 0.70.0
>Reporter: Jake Mannix
>Assignee: Jake Mannix
> Attachments: GIRAPH-27.patch, GIRAPH-27.patch
>
>
> Vertex.java has a bunch of static methods for getting/setting global graph 
> state (total number of vertices, edges, a reference to the GraphMapper, etc). 
>  Refactoring this into a GraphState object, which every Vertex can hold onto 
> a reference to (yes, a tiny bit more memory per Vertex, but in comparison to 
> what's already in there...)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-11) Improve the graph distribution of Giraph

2011-09-09 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101593#comment-13101593
 ] 

Avery Ching commented on GIRAPH-11:
---

The hash partitioning will be based on hashCode() by default, but the user can 
implement something they like as well based on the vertex id.  I am designing 
it to support both hash-based and hash-range-based partitioning.  In a pure 
hash-based distribution, you should get great load balancing.  In a 
hash-range-based distribution, the user could possibly get some locality 
benefits without changing anything from the hash-based partitioning.  Finally, 
there should be a way for the user to do a pure range-based split of the id 
space, but this requires the most work by the user to specify their division 
of the id space (depends on the type).

The hash based and hash-range based schemes will be implemented by default and 
will be selectable by users.  The range based scheme will be a partial 
implementation since we require users to do the id range partitioning.  
Additionally, we will provide the API for users to implement their own graph 
partitioning scheme.

Let me know what you think.

> Improve the graph distribution of Giraph
> 
>
> Key: GIRAPH-11
> URL: https://issues.apache.org/jira/browse/GIRAPH-11
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Avery Ching
>Assignee: Avery Ching
>
> Currently, Giraph assumes that the data from the VertexInputFormat is sorted. 
>  If the user data is not sorted by the vertex id, they must first run a 
> MapReduce or Pig job to generate a sorted dataset.  This is often a bit 
> inconvenient.
> Giraph graph partitioning is currently range based and there are some 
> advantages and disadvantages of this approach.  The proposal of this JIRA 
> would be to allow for both range and hash based partitioning and provide more 
> flexibility to the user.
> Design goals for the graph distribution:
> * Allow vertices to be ordered or unordered
> * Ability to repartition
> * Select the partitioning scheme based on user needs (i.e. hash or range 
> based)
> * Ability to provide user-specific hints about partitions
> Hash-based partitioning
> * Good vertex balancing across ranges for random data
> * Bad at vertex id locality
> Range-based partitioning
> * Good at vertex id locality
> * Ability to split ranges easily
> * Can cause hotspots for hot ranges





[jira] [Resolved] (GIRAPH-25) NPE in BspServiceMaster when failing a job

2011-09-09 Thread Avery Ching (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching resolved GIRAPH-25.
---

Resolution: Fixed

Not sure if I am supposed to close this issue, or the reporter should, but I'll 
close it since it's been committed.  Please reopen if there is an issue.

> NPE in BspServiceMaster when failing a job
> --
>
> Key: GIRAPH-25
> URL: https://issues.apache.org/jira/browse/GIRAPH-25
> Project: Giraph
>  Issue Type: Bug
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
>Priority: Minor
> Attachments: GIRAPH-25.2.patch, GIRAPH-25.patch
>
>
> When BspServiceMaster times out waiting for all workers to check in, it dies 
> with a NullPointerException.
> This can perhaps be handled a bit more gracefully.





[jira] [Commented] (GIRAPH-11) Improve the graph distribution of Giraph

2011-09-09 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101710#comment-13101710
 ] 

Avery Ching commented on GIRAPH-11:
---

Regarding the difference between hash based and hash-range based, it refers to 
how the hash code is assigned to a partition.  The application dev will 
implement hashCode() for their vertex id and then the assignment of the 
hashCode() to a partition can be hashed (i.e. hashCode() % # partitions) or 
range based ([0-a),[a-b)...etc).  Hope that's more clear.  Code will help.  
It's coming soon, by mid next week I hope.
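
Until the real code is posted, here is a minimal sketch of the two assignment 
rules, just to illustrate the difference.  The class and method names are 
hypothetical (not the Giraph partitioning API), and the range variant assumes 
the int hash space is simply cut into equal-sized contiguous ranges:

```java
// Illustrative sketch only; names are hypothetical and this is not the
// Giraph partitioning API.
final class PartitionSketch {
    private PartitionSketch() { }

    /** Hash based: hashCode() % number of partitions, kept non-negative. */
    static int hashPartition(int hashCode, int numPartitions) {
        // floorMod avoids the negative results that % gives for negative hashes.
        return Math.floorMod(hashCode, numPartitions);
    }

    /** Hash-range based: equal contiguous ranges [0-a), [a-b), ... of the hash space. */
    static int hashRangePartition(int hashCode, int numPartitions) {
        // Shift the hash into 0 .. 2^32 - 1 so that ranges start at zero.
        long offset = (long) hashCode - Integer.MIN_VALUE;
        return (int) (offset * numPartitions / (1L << 32));
    }
}
```

For example, with 4 partitions the neighbouring hash codes 8, 9, 10 and 11 are 
spread over partitions 0, 1, 2 and 3 by hashPartition, while 
hashRangePartition keeps all four in partition 2, which is where the locality 
benefit would come from.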

> Improve the graph distribution of Giraph
> 
>
> Key: GIRAPH-11
> URL: https://issues.apache.org/jira/browse/GIRAPH-11
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Avery Ching
>Assignee: Avery Ching
>
> Currently, Giraph assumes that the data from the VertexInputFormat is sorted. 
>  If the user data is not sorted by the vertex id, they must first run a 
> MapReduce or Pig job to generate a sorted dataset.  This is often a bit 
> inconvenient.
> Giraph graph partitioning is currently range based and there are some 
> advantages and disadvantages of this approach.  The proposal of this JIRA 
> would be to allow for both range and hash based partitioning and provide more 
> flexibility to the user.
> Design goals for the graph distribution:
> * Allow vertices to be ordered or unordered
> * Ability to repartition
> * Select the partitioning scheme based on user needs (i.e. hash or range 
> based)
> * Ability to provide user-specific hints about partitions
> Hash-based partitioning
> * Good vertex balancing across ranges for random data
> * Bad at vertex id locality
> Range-based partitioning
> * Good at vertex id locality
> * Ability to split ranges easily
> * Can cause hotspots for hot ranges





[jira] [Commented] (GIRAPH-25) NPE in BspServiceMaster when failing a job

2011-09-09 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101962#comment-13101962
 ] 

Avery Ching commented on GIRAPH-25:
---

Thanks for the advice.  I'll be doing the same this weekend =).

> NPE in BspServiceMaster when failing a job
> --
>
> Key: GIRAPH-25
> URL: https://issues.apache.org/jira/browse/GIRAPH-25
> Project: Giraph
>  Issue Type: Bug
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
>Priority: Minor
> Attachments: GIRAPH-25.2.patch, GIRAPH-25.patch
>
>
> When BspServiceMaster times out waiting for all workers to check in, it dies 
> with a NullPointerException.
> This can perhaps be handled a bit more gracefully.





[jira] [Commented] (GIRAPH-29) Implement TextVertexInputFormat for text-format graph data

2011-09-10 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102104#comment-13102104
 ] 

Avery Ching commented on GIRAPH-29:
---

I agree that text format graph data is nice.  

We have helper abstract classes based on TextInputFormat and TextOutputFormat 
to do this:
org.apache.giraph.lib.TextVertexInputFormat
org.apache.giraph.lib.TextVertexOutputFormat

Example implementations that use those helper classes are 
org.apache.giraph.lib.JsonBase64VertexInputFormat
org.apache.giraph.lib.JsonBase64VertexOutputFormat

Does this satisfy your needs?  Any suggestions for improvement?

> Implement TextVertexInputFormat for text-format graph data
> --
>
> Key: GIRAPH-29
> URL: https://issues.apache.org/jira/browse/GIRAPH-29
> Project: Giraph
> Issue Type: New Feature
> Components: bsp
> Reporter: Hyunsik Choi
> Assignee: Hyunsik Choi
> Priority: Minor
> Fix For: 0.70.0
>
>
> Supporting text-format graph data would be nice. It is helpful for developing 
> graph algorithms and debugging because text-format graph data are 
> human-readable and enable users to easily write sample data sets. 
> Furthermore, text-format data are exchangeable regardless of operating 
> systems or programming languages.
> So, we need a basic InputFormat to help users develop user-defined 
> InputFormat classes to deal with text-represented graph data sets.





[jira] [Commented] (GIRAPH-29) Implement TextVertexInputFormat for text-format graph data

2011-09-10 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102142#comment-13102142
 ] 

Avery Ching commented on GIRAPH-29:
---

No problem, the real issue is that there is little documentation (my fault).

Unlike MapReduce (where map tasks = input splits), the number of workers need 
not equal the number of input splits from VertexInputFormat.  Workers in Giraph 
process InputSplits as fast as possible, and each worker may process 0 or more 
InputSplits.
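
To make the worker/InputSplit relationship concrete, here is a rough sketch 
(not Giraph code).  The class and method names are hypothetical, and an 
in-memory queue stands in for whatever coordination the workers really use to 
claim splits.  Each worker just keeps taking the next unclaimed split until 
none are left, so a fast worker may end up with several and a slow one with 
none.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration only; none of these names exist in Giraph. */
final class SplitClaimSketch {
    /** Returns, for each worker id, the splits that worker ended up claiming. */
    static Map<Integer, List<String>> run(int workers, List<String> splits) {
        // Shared pool of unclaimed splits (assumption: stands in for real coordination).
        Queue<String> unclaimed = new ConcurrentLinkedQueue<>(splits);
        Map<Integer, List<String>> claimed = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            // Each list is written by exactly one worker thread.
            final List<String> mine = new ArrayList<>();
            claimed.put(w, mine);
            pool.execute(() -> {
                String split;
                // Claim splits as fast as possible until the pool is empty.
                while ((split = unclaimed.poll()) != null) {
                    mine.add(split);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return claimed;
    }
}
```

With 3 workers and 5 splits, every split is claimed exactly once, but how many 
each worker gets (possibly 0) depends on timing.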

> Implement TextVertexInputFormat for text-format graph data
> --
>
> Key: GIRAPH-29
> URL: https://issues.apache.org/jira/browse/GIRAPH-29
> Project: Giraph
> Issue Type: New Feature
> Components: bsp
> Reporter: Hyunsik Choi
> Assignee: Hyunsik Choi
> Priority: Minor
> Fix For: 0.70.0
>
>
> Supporting text-format graph data would be nice. It is helpful for developing 
> graph algorithms and debugging because text-format graph data are 
> human-readable and enable users to easily write sample data sets. 
> Furthermore, text-format data are exchangeable regardless of operating 
> systems or programming languages.
> So, we need a basic InputFormat to help users develop user-defined 
> InputFormat classes to deal with text-represented graph data sets.





[jira] [Updated] (GIRAPH-30) NPE in ZooKeeperManager if base directory cannot be created

2011-09-10 Thread Avery Ching (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-30?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching updated GIRAPH-30:
--

Attachment: GIRAPH-30.2.patch

> NPE in ZooKeeperManager if base directory cannot be created
> ---
>
> Key: GIRAPH-30
> URL: https://issues.apache.org/jira/browse/GIRAPH-30
> Project: Giraph
> Issue Type: Bug
> Reporter: Andrew Purtell
> Priority: Minor
> Attachments: GIRAPH-30.2.patch, GIRAPH-30.patch
>
>
> If the base directory cannot be created, for example if running on secure 
> Hadoop and the user home directory does not exist, ZooKeeperManager will 
> throw an NPE when trying to list it. It would be better to throw an 
> IOException with an informative message.
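
For illustration, the behavior requested in the description could look 
roughly like the sketch below.  This is not the attached patch: it uses 
java.io.File as a stand-in for the Hadoop file system, and the class and 
method names are hypothetical.  The point is only that a failed create, or a 
null listing, is turned into an IOException that names the directory instead 
of surfacing later as an NPE.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical illustration only; not the ZooKeeperManager code. */
final class BaseDirCheckSketch {
    /** Creates baseDir if needed and lists it, failing with an informative IOException. */
    static String[] listBaseDir(File baseDir) throws IOException {
        if (!baseDir.isDirectory() && !baseDir.mkdirs()) {
            throw new IOException("listBaseDir: Failed to create base directory "
                + baseDir.getAbsolutePath());
        }
        // File.list() returns null if the path is not a directory or an I/O error occurs.
        String[] children = baseDir.list();
        if (children == null) {
            throw new IOException("listBaseDir: Could not list base directory "
                + baseDir.getAbsolutePath());
        }
        return children;
    }
}
```

A caller then sees the offending path in the exception message rather than a 
bare NullPointerException.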




