Re: Review Request: Improve concurrency of putMsg / putMsgList

2012-04-24 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4852/#review7185
---



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java


Bo, I'm a little leery about converting the List and ArrayList to 
LinkedList and ConcurrentLinkedQueue.  I believe that linked lists will use 
more memory than the array list due to the double links (forward and backward). 
 Also, is ConcurrentLinkedQueue supposed to outperform a synchronized 
ArrayList?  I haven't seen much on that.

The ConcurrentHashMap changes look good.


- Avery


On 2012-04-24 06:11:38, Bo Wang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4852/
> ---
> 
> (Updated 2012-04-24 06:11:38)
> 
> 
> Review request for giraph.
> 
> 
> Summary
> ---
> 
> Use ConcurrentHashMap and ConcurrentLinkedQueue to allow concurrent access to 
> the message map. The concurrencyLevel of ConcurrentHashMap uses the default 
> value. There may be some performance gain by tuning this value.
> 
> 
> This addresses bug GIRAPH-185.
> https://issues.apache.org/jira/browse/GIRAPH-185
> 
> 
> Diffs
> -
> 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
>  1328747 
> 
> Diff: https://reviews.apache.org/r/4852/diff
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bo
> 
>
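A minimal sketch of the approach in the request summary, assuming messages are stored per destination vertex in a ConcurrentHashMap whose values are ConcurrentLinkedQueues. MessageStore, putMsg, putMsgList and getMessages are illustrative names, not the actual BasicRPCCommunications code. computeIfAbsent assumes Java 8 or later; the 2012 code base would have needed the putIfAbsent idiom instead.

```java
import java.util.Collection;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical per-vertex inbox; not the actual BasicRPCCommunications code. */
class MessageStore<I, M> {
  // Default concurrencyLevel; the request notes that tuning it may help.
  private final ConcurrentMap<I, Queue<M>> inMessages = new ConcurrentHashMap<>();

  /** Atomically returns the vertex's queue, creating it on first use. */
  private Queue<M> queueFor(I vertexId) {
    return inMessages.computeIfAbsent(vertexId, id -> new ConcurrentLinkedQueue<>());
  }

  /** Safe to call from many threads without external locking. */
  void putMsg(I vertexId, M msg) {
    queueFor(vertexId).add(msg);
  }

  /** Adds a batch; adds from other threads may interleave with it. */
  void putMsgList(I vertexId, Collection<? extends M> msgs) {
    queueFor(vertexId).addAll(msgs);
  }

  /** Returns the vertex's queue, or null if it received no messages. */
  Queue<M> getMessages(I vertexId) {
    return inMessages.get(vertexId);
  }
}
```

Because ConcurrentHashMap.computeIfAbsent is atomic, two threads delivering the first message for the same vertex end up sharing one queue, so no synchronized block around the map is needed.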



Re: Review Request: Improve concurrency of putMsg / putMsgList

2012-04-24 Thread Avery Ching


> On 2012-04-24 20:53:33, Avery Ching wrote:
> > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java,
> >  lines 776-777
> > <https://reviews.apache.org/r/4852/diff/1/?file=104059#file104059line776>
> >
> > Bo, I'm a little leery about converting the List and ArrayList to 
> > LinkedList and ConcurrentLinkedQueue.  I believe that linked lists will use 
> > more memory than the array list due to the double links (forward and 
> > backward).  Also, is ConcurrentLinkedQueue supposed to outperform a 
> > synchronized ArrayList?  I haven't seen much on that.
> > 
> > The ConcurrentHashMap changes look good.
> 
> Bo Wang wrote:
> Avery, thanks for the comments. I just measured the sizes of these 
> classes and below is an estimate. 
> 
> java.util.ArrayList: 149 bytes
> java.util.LinkedList: 101 bytes
> java.util.concurrent.ConcurrentLinkedQueue: 118 bytes
> 
> The tool I was using is a program from the link below.
> http://www.javapractices.com/topic/TopicAction.do?Id=83
> 
> In terms of performance, here is a benchmark.
> 
> http://www.javacodegeeks.com/2010/09/java-best-practices-queue-battle-and.html
> 
> In its test #1 (adding elements), ConcurrentLinkedQueue performed slightly 
> better than LinkedList. In test #3 (iteration), LinkedList outperformed 
> ConcurrentLinkedQueue. I think the most time-consuming operation is add; 
> iteration is also heavily used, but without concurrent access. 
> 
>

Thanks for the response Bo.

Those numbers are for the empty data structures, I assume.  I was referring 
to the incremental cost of adding elements (messages) to the data structures.  
The performance isn't a concern to me (unless we call size() somewhere).


- Avery
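The size() caveat matters because, per its Javadoc, ConcurrentLinkedQueue.size() is not a constant-time operation: it has to traverse the queue. If a per-vertex message count is ever needed, one option is to track it alongside the queue. A sketch under that assumption; CountingMessageQueue is a hypothetical name, not Giraph code.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical queue wrapper with an O(1) message count. */
class CountingMessageQueue<M> implements Iterable<M> {
  private final ConcurrentLinkedQueue<M> queue = new ConcurrentLinkedQueue<>();
  // Maintained on every add, so count() never walks the queue.
  private final AtomicInteger count = new AtomicInteger();

  void add(M msg) {
    queue.add(msg);
    count.incrementAndGet();
  }

  /** May briefly lag concurrent adds; exact once all writers are done. */
  int count() {
    return count.get();
  }

  /** Read-only iteration; remove() is unsupported so the count stays valid. */
  @Override
  public Iterator<M> iterator() {
    return Collections.unmodifiableCollection(queue).iterator();
  }
}
```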


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4852/#review7185
---





Re: Review Request: Improve concurrency of putMsg / putMsgList

2012-04-24 Thread Avery Ching


> On 2012-04-24 20:53:33, Avery Ching wrote:
> > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java,
> >  lines 776-777
> > <https://reviews.apache.org/r/4852/diff/1/?file=104059#file104059line776>
> >
> > Bo, I'm a little leery about converting the List and ArrayList to 
> > LinkedList and ConcurrentLinkedQueue.  I believe that linked lists will use 
> > more memory than the array list due to the double links (forward and 
> > backward).  Also, is ConcurrentLinkedQueue supposed to outperform a 
> > synchronized ArrayList?  I haven't seen much on that.
> > 
> > The ConcurrentHashMap changes look good.
> 
> Bo Wang wrote:
> Avery, thanks for the comments. I just measured the sizes of these 
> classes and below is an estimate. 
> 
> java.util.ArrayList: 149 bytes
> java.util.LinkedList: 101 bytes
> java.util.concurrent.ConcurrentLinkedQueue: 118 bytes
> 
> The tool I was using is a program from the link below.
> http://www.javapractices.com/topic/TopicAction.do?Id=83
> 
> In terms of performance, here is a benchmark.
> 
> http://www.javacodegeeks.com/2010/09/java-best-practices-queue-battle-and.html
> 
> In its test #1 (adding elements), ConcurrentLinkedQueue performed slightly 
> better than LinkedList. In test #3 (iteration), LinkedList outperformed 
> ConcurrentLinkedQueue. I think the most time-consuming operation is add; 
> iteration is also heavily used, but without concurrent access. 
> 
>
> 
> Avery Ching wrote:
> Thanks for the response Bo.
> 
> Those numbers are for the empty data structures, I assume.  I was 
> referring to the incremental cost of adding elements (messages) to the data 
> structures.  The performance isn't a concern to me (unless we call size() 
> somewhere).

By the incremental cost, I mean the memory cost, sorry.


- Avery
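A back-of-envelope version of the per-message (incremental) memory cost discussed above. It assumes a 64-bit HotSpot JVM with compressed oops (12-byte object headers, 4-byte references, 8-byte alignment), one node object per element holding the item and next references (plus prev for LinkedList), and it ignores the message objects themselves, which cost the same in every structure. These are assumptions about JVM object layout, not measurements.

```java
/**
 * Back-of-envelope per-element overhead of list/queue nodes.
 * All layout numbers are assumptions about a 64-bit HotSpot JVM.
 */
class PerElementOverhead {
  /** Rounds an object size up to the assumed 8-byte alignment. */
  static long align8(long bytes) {
    return (bytes + 7) / 8 * 8;
  }

  /** Size of one node object with the given number of reference fields. */
  static long nodeBytes(long headerBytes, long refBytes, int refFields) {
    return align8(headerBytes + refFields * refBytes);
  }

  public static void main(String[] args) {
    // Assumed: compressed oops, 12-byte header, 4-byte references.
    long header = 12;
    long ref = 4;
    // ArrayList stores one reference per element in its backing array.
    System.out.println("ArrayList slot: " + ref + " bytes + spare capacity");
    // LinkedList node: item, next, prev.
    System.out.println("LinkedList node: " + nodeBytes(header, ref, 3) + " bytes");
    // ConcurrentLinkedQueue node: item, next.
    System.out.println("ConcurrentLinkedQueue node: " + nodeBytes(header, ref, 2) + " bytes");
  }
}
```

Under those assumptions a LinkedList node and a ConcurrentLinkedQueue node each round up to 24 bytes per message, while an ArrayList costs one 4-byte slot per message plus spare capacity (up to about 50% extra slots right after the backing array grows). Without compressed oops (16-byte headers, 8-byte references) the nodes come to 40 and 32 bytes respectively.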


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4852/#review7185
---





Re: Review Request: HBase/Accumulo Input and Output formats (on behalf of Brian)

2012-04-30 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4801/
---

(Updated 2012-04-30 23:54:55.758151)


Review request for giraph.


Changes
---

Update of Brian's 153.2.


Summary
---

Brian's patch for GIRAPH-153.


This addresses bug GIRAPH-153.
https://issues.apache.org/jira/browse/GIRAPH-153


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/LICENSE.txt
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/license-header.txt
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/pom.xml
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/main/assembly/compile.xml
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/accumulo/AccumuloVertexInputFormat.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/accumulo/AccumuloVertexOutputFormat.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/accumulo/package-info.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/hbase/HBaseVertexInputFormat.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/hbase/HBaseVertexOutputFormat.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/hbase/package-info.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/test/java/org/apache/giraph/BspCase.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/test/java/org/apache/giraph/format/accumulo/TestAccumuloVertexFormat.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/test/java/org/apache/giraph/format/accumulo/edgemarker/AccumuloEdgeInputFormat.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/test/java/org/apache/giraph/format/accumulo/edgemarker/AccumuloEdgeOutputFormat.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/test/java/org/apache/giraph/format/hbase/TestHBaseRootMarkerVertextFormat.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/test/java/org/apache/giraph/format/hbase/edgemarker/TableEdgeInputFormat.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/test/java/org/apache/giraph/format/hbase/edgemarker/TableEdgeOutputFormat.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/4801/diff


Testing
---


Thanks,

Avery



Re: Review Request: HBase/Accumulo Input and Output formats (on behalf of Brian)

2012-04-30 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4801/#review7404
---


Hi Brian, the patch applies nicely, but it is filled with duplicates.  There 
are also some javadoc indentation fixes to make; I've just given a couple of 
examples.  Can you please fix these and resubmit?  Thanks!


http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/LICENSE.txt
<https://reviews.apache.org/r/4801/#comment16328>

This license is duplicated several times.



http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/license-header.txt
<https://reviews.apache.org/r/4801/#comment16329>

This license is duplicated several times.



http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/main/assembly/compile.xml
<https://reviews.apache.org/r/4801/#comment16330>

More duplication.



http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/accumulo/AccumuloVertexInputFormat.java
<https://reviews.apache.org/r/4801/#comment16331>

Please fix indentation.



http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/accumulo/AccumuloVertexInputFormat.java
<https://reviews.apache.org/r/4801/#comment16332>

Please fix indentation.



http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/accumulo/AccumuloVertexInputFormat.java
<https://reviews.apache.org/r/4801/#comment16333>

Please fix indentation.



http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/accumulo/AccumuloVertexInputFormat.java
<https://reviews.apache.org/r/4801/#comment16334>

Please fix indentation.



http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/accumulo/AccumuloVertexInputFormat.java
<https://reviews.apache.org/r/4801/#comment16335>

Code duplication.



http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/accumulo/AccumuloVertexInputFormat.java
<https://reviews.apache.org/r/4801/#comment16336>

Extra *


- Avery



Re: [DISCUSS] Giraph graduation resolution

2012-05-04 Thread Avery Ching

+1, sounds good to me.  I also would like to have a rotating PMC chair.

Avery

On 5/4/12 10:26 AM, Jakob Homan wrote:

Both suggestions sound reasonable to me.  +1 on the current resolution.

On Fri, May 4, 2012 at 10:23 AM, Owen O'Malley  wrote:

On Fri, May 4, 2012 at 10:09 AM, Jakob Homan  wrote:

Looks good.  I might suggest adding language to rotate the PMC chair
annually to spread the responsibility around a bit and increase our
Bus Number.  Also, I was hoping to have seen Christian a bit more
during incubation...

Since this resolution is a one-off, I would suggest putting the annual
rotation in the bylaws that will be part of the project's permanent
website. That will make it more visible and easier for the project to
change it itself.

I agree that Christian hasn't been involved while it is in the
incubator. On the other hand, he was heavily involved before it came
to Apache. Looking at the svn logs, the number of commits (not
contributions) per user is:

  148 aching
  47 jghoman
  37 ckunz
  11 claudio
   7 exg
   5 ssc
   4 hyunsik
   3 omalley
   1 kunzchr
   1 jmannix
   1 ekoontz
   1 asuresh

So Christian only has 1 commit at Apache, but he has 37 prior. Given
that level of involvement, I'd rather put him on the Giraph PMC and
let him go emeritus in a few months. What do others think?

-- Owen




Review Request: Implemented a netty client/server protocol as a faster alternative to Hadoop RPC (3x improvement)

2012-05-09 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5074/
---

Review request for giraph.


Summary
---

* Implemented a request/response protocol with netty as a NettyClient and 
NettyServer.  There are a NettyClientWorker and a NettyClientServer that 
implement WorkerClient and WorkerServer, respectively.  Netty is a lot faster 
since it's non-blocking, so we can interleave computation and communication, 
as opposed to Hadoop RPC (blocking).
* The netty server implementation uses concurrent hash maps to improve 
concurrency instead of synchronized blocks around maps.
* By default netty is used, but Hadoop RPC can be used with 
-Dgiraph.useNetty=false
* Changed the class hierarchy of ServerInterface to WorkerClientServer 
(WorkerClient and WorkerServer) to support a request/response protocol instead 
of just RPC
* In netty, the messages/mutations are gathered by partition and sent out as a 
partition's worth of messages/mutations
* Added two new test classes (RequestTest.java and ConnectionTest.java) to test 
all requests and check netty connections.
* PageRankBenchmark uses EdgeListVertex as a default
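The per-partition batching in the summary could look roughly like the following sketch. PartitionMessageCache, the size threshold and the BiConsumer sender are hypothetical stand-ins (the actual SendMessageCache and NettyClient code is not shown in this thread); the point is only that messages for one partition accumulate and are handed off as a single request.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/**
 * Hypothetical per-partition send buffer: messages are grouped by
 * destination partition and handed to a sender one partition's worth
 * at a time. Not thread-safe; intended for a single compute thread.
 */
class PartitionMessageCache<M> {
  private final Map<Integer, List<M>> buffers = new HashMap<>();
  private final int maxMessagesPerPartition;
  // Stand-in for "send one request containing this partition's messages".
  private final BiConsumer<Integer, List<M>> sender;

  PartitionMessageCache(int maxMessagesPerPartition,
                        BiConsumer<Integer, List<M>> sender) {
    this.maxMessagesPerPartition = maxMessagesPerPartition;
    this.sender = sender;
  }

  /** Buffers a message; sends the partition's batch once it is full. */
  void add(int partitionId, M msg) {
    List<M> buffer = buffers.computeIfAbsent(partitionId, id -> new ArrayList<>());
    buffer.add(msg);
    if (buffer.size() >= maxMessagesPerPartition) {
      buffers.remove(partitionId);
      sender.accept(partitionId, buffer);
    }
  }

  /** Sends every remaining batch, e.g. at the end of a superstep. */
  void flushAll() {
    for (Map.Entry<Integer, List<M>> entry : buffers.entrySet()) {
      sender.accept(entry.getKey(), entry.getValue());
    }
    buffers.clear();
  }
}
```

Batching this way trades a little latency for far fewer, larger requests, which fits a non-blocking transport where computation continues while batches are in flight.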


This addresses bug GIRAPH-37.
https://issues.apache.org/jira/browse/GIRAPH-37


Diffs
-

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/utils/MockUtils.java
 1332888 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/examples/SimpleShortestPathVertexTest.java
 1332888 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/WorkerInfo.java
 1332888 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestJsonBase64Format.java
 1332888 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestManualCheckpoint.java
 1332888 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
 1332888 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphState.java
 1332888 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexMutations.java
 1332888 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WritableRequest.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java
 1332888 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
 1332888 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerClientServer.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerCommunications.java
 1332888 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerClient.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerInterface.java
 1332888 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/SendPartitionMutationsRequest.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/SendPartitionMessagesRequest.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/SendMutationsCache.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/SendMessageCache.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestServerHandler.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ResponseClientHandler.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestRegistry.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestEncoder.ja

Re: Review Request: Implemented a netty client/server protocol as a faster alternative to Hadoop RPC (3x improvement)

2012-05-09 Thread Avery Ching


> On 2012-05-09 10:10:46, Sebastian Schelter wrote:
> > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java,
> >  line 1465
> > <https://reviews.apache.org/r/5074/diff/2/?file=108120#file108120line1465>
> >
> > I don't like it that a collection is changed outside of the class that 
> > owns it. 
> > 
> > This makes code hard to read and debug. We should rather introduce a 
> > method for this in the class that owns this map to have all mutations in 
> > one place.

Good point, it's a little hard to understand.  Since this is a Map, we can do 
as you suggested: keep it in a class and then add a method to do the clear().  
We can even add methods there that iterate over the map, so that no 
synchronization is needed outside of it.  I'll do this for all our synchronized 
objects in the next patch if that's okay with you (the current code does this 
as well).  It will be a medium-sized change.


- Avery
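A sketch of the encapsulation described above: the class that owns the map exposes clear() and an iteration method, so callers never receive the map itself. OwnedMap is a hypothetical name; which map in BspServiceWorker it would wrap is not shown in the thread, and the BiConsumer visitor assumes Java 8.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

/**
 * Hypothetical owner of a shared map: every mutation and iteration goes
 * through these methods, so no other class holds a reference to the map.
 */
class OwnedMap<K, V> {
  // Concurrent map, so callers need no synchronized blocks of their own.
  private final Map<K, V> map = new ConcurrentHashMap<>();

  void put(K key, V value) {
    map.put(key, value);
  }

  /** The single place where the map is cleared. */
  void clear() {
    map.clear();
  }

  /** Visits entries without exposing the map; iteration is weakly consistent. */
  void forEachEntry(BiConsumer<? super K, ? super V> visitor) {
    for (Map.Entry<K, V> entry : map.entrySet()) {
      visitor.accept(entry.getKey(), entry.getValue());
    }
  }

  int size() {
    return map.size();
  }
}
```

With all mutations in one class, a reader debugging a cleared or modified map only has to look at these few methods.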


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5074/#review7728
-------


On 2012-05-09 09:22:36, Avery Ching wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5074/
> ---
> 
> (Updated 2012-05-09 09:22:36)
> 
> 
> Review request for giraph.
> 
> 

Re: Review Request: GIRAPH-20 Move temporary test files from the project directory

2012-05-09 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5077/#review7745
---


Hey, Sebastian, overall looks good.  If no one else gets to it, I'll finish 
this review tonight.

- Avery


On 2012-05-09 11:37:47, Sebastian Schelter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5077/
> ---
> 
> (Updated 2012-05-09 11:37:47)
> 
> 
> Review request for giraph.
> 
> 
> Summary
> ---
> 
> All temporary files that the tests generate are now written to 
> /tmp/_giraphTests, including ZooKeeper files, checkpoints, etc. 
> 
> This behavior will be automatically configured whenever 
> InternalVertexRunner.run() or BspCase.prepareJob() is used.
> 
> I usually can't stop myself once I have my refactoring hat on, so I 
> also tidied up a lot of minor stuff, removed code duplication, etc.
> 
> 
> This addresses bug GIRAPH-20.
> https://issues.apache.org/jira/browse/GIRAPH-20
> 
> 
> Diffs
> -
> 
>   trunk/src/test/java/org/apache/giraph/TestZooKeeperExt.java 1332106 
>   trunk/src/test/java/org/apache/giraph/graph/TestEdgeListVertex.java 1332106 
>   trunk/src/test/java/org/apache/giraph/TestJsonBase64Format.java 1332106 
>   trunk/src/test/java/org/apache/giraph/TestManualCheckpoint.java 1332106 
>   trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1332106 
>   trunk/src/test/java/org/apache/giraph/TestNotEnoughMapTasks.java 1332106 
>   trunk/src/test/java/org/apache/giraph/TestGraphPartitioner.java 1332106 
>   trunk/src/test/java/org/apache/giraph/TestAutoCheckpoint.java 1332106 
>   trunk/src/test/java/org/apache/giraph/TestBspBasic.java 1332106 
>   trunk/src/test/java/org/apache/giraph/BspCase.java 1332106 
>   trunk/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 
> 1332106 
>   trunk/src/main/java/org/apache/giraph/examples/SimplePageRankVertex.java 
> 1332106 
>   trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1332106 
>   trunk/src/main/java/org/apache/giraph/graph/TextAggregatorWriter.java 
> 1332106 
>   trunk/src/main/java/org/apache/giraph/utils/FileUtils.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/5077/diff
> 
> 
> Testing
> ---
> 
> successfully passed local and pseudo-distributed tests with Hadoop 0.20.203
> 
> 
> Thanks,
> 
> Sebastian
> 
>
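A minimal sketch of per-test scratch directories in the /tmp/_giraphTests layout the summary describes, assuming java.io.tmpdir resolves to /tmp and a testName/kind subdirectory scheme. TestDirs, prepare and deleteRecursively are hypothetical names, not the contents of the new FileUtils class, and hooking this into InternalVertexRunner.run() or BspCase.prepareJob() is not shown.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical helper: per-test scratch directories under the system temp dir. */
class TestDirs {
  /** Root for all test output, e.g. /tmp/_giraphTests on typical Linux setups. */
  static Path root() {
    return Paths.get(System.getProperty("java.io.tmpdir"), "_giraphTests");
  }

  /** Creates a fresh, empty directory for one test (ZooKeeper, checkpoints, ...). */
  static Path prepare(String testName, String kind) throws IOException {
    Path dir = root().resolve(testName).resolve(kind);
    deleteRecursively(dir);
    return Files.createDirectories(dir);
  }

  /** Deletes a directory tree, children before parents; no-op if missing. */
  static void deleteRecursively(Path path) throws IOException {
    if (!Files.exists(path)) {
      return;
    }
    try (Stream<Path> walk = Files.walk(path)) {
      // Reverse lexicographic order puts every child before its parent.
      Path[] paths = walk.sorted(Comparator.reverseOrder()).toArray(Path[]::new);
      for (Path p : paths) {
        Files.delete(p);
      }
    }
  }
}
```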



Re: Review Request: GIRAPH-20 Move temporary test files from the project directory

2012-05-09 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5077/#review7756
---


Overall, looks great.  Can you address the questions/comments and then I'll 
re-review?


trunk/src/main/java/org/apache/giraph/graph/TextAggregatorWriter.java


Just out of curiosity, why this change?



trunk/src/main/java/org/apache/giraph/utils/FileUtils.java


Why delete it?



trunk/src/test/java/org/apache/giraph/BspCase.java


Empty params and return.



trunk/src/test/java/org/apache/giraph/BspCase.java


Empty params and return.



trunk/src/test/java/org/apache/giraph/BspCase.java


Empty params and return.



trunk/src/test/java/org/apache/giraph/BspCase.java


Empty params and return.



trunk/src/test/java/org/apache/giraph/BspCase.java


@return



trunk/src/test/java/org/apache/giraph/BspCase.java


@return



trunk/src/test/java/org/apache/giraph/TestBspBasic.java


Shouldn't it be 49, not 491?


- Avery





Re: Review Request: GIRAPH-20 Move temporary test files from the project directory

2012-05-10 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5077/#review7772
---

Ship it!


+1, thanks Sebastian!  I'm assuming it also passes 'mvn verify', but please 
double check before you commit.

- Avery


On 2012-05-10 09:32:10, Sebastian Schelter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5077/
> ---
> 
> (Updated 2012-05-10 09:32:10)
> 
> 
> Review request for giraph.
> 
> 
> Summary
> ---
> 
> All temporary files that the tests generate are now written to 
> /tmp/_giraphTests including zooKeeper files, checkpoints etc. 
> 
> This behavior will be automatically configured whenever 
> InternalVertexRunner.run() or BspCase.prepareJob() is used.
> 
> Usually I can't stop myself once I have my refactoring hat on, therefore I 
> also tidied up a lot of minor stuff, removed code duplications etc.
> 
> 
> This addresses bug GIRAPH-20.
> https://issues.apache.org/jira/browse/GIRAPH-20
> 
> 
> Diffs
> -
> 
>   trunk/src/main/java/org/apache/giraph/examples/SimplePageRankVertex.java 
> 1336504 
>   trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1336504 
>   trunk/src/main/java/org/apache/giraph/graph/TextAggregatorWriter.java 
> 1336504 
>   trunk/src/main/java/org/apache/giraph/utils/FileUtils.java PRE-CREATION 
>   trunk/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 
> 1336504 
>   trunk/src/test/java/org/apache/giraph/BspCase.java 1336504 
>   trunk/src/test/java/org/apache/giraph/TestAutoCheckpoint.java 1336506 
>   trunk/src/test/java/org/apache/giraph/TestBspBasic.java 1336504 
>   trunk/src/test/java/org/apache/giraph/TestGraphPartitioner.java 1336504 
>   trunk/src/test/java/org/apache/giraph/TestJsonBase64Format.java 1336504 
>   trunk/src/test/java/org/apache/giraph/TestManualCheckpoint.java 1336506 
>   trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1336504 
>   trunk/src/test/java/org/apache/giraph/TestNotEnoughMapTasks.java 1336504 
>   trunk/src/test/java/org/apache/giraph/TestZooKeeperExt.java 1336504 
>   trunk/src/test/java/org/apache/giraph/graph/TestEdgeListVertex.java 1336504 
> 
> Diff: https://reviews.apache.org/r/5077/diff
> 
> 
> Testing
> ---
> 
> successfully passed local and pseudo-distributed tests with Hadoop 0.20.203
> 
> 
> Thanks,
> 
> Sebastian
> 
>
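The GIRAPH-20 change above keeps every temporary test artifact under one base directory (/tmp/_giraphTests). A rough sketch of that idea follows. It assumes a hypothetical `TestDirs` helper; the real `FileUtils` added by the patch is not shown in this thread, so its API is not reproduced here. Per-purpose subdirectories such as ZooKeeper data and checkpoints are derived from a single base, and the whole tree is deleted afterwards.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: keeps all temporary test output under one base directory. */
final class TestDirs {
  private final Path base;

  TestDirs(Path base) {
    this.base = base;
  }

  /** Creates (if needed) and returns a named subdirectory, e.g. "zooKeeper" or "checkpoints". */
  Path subDir(String name) throws IOException {
    return Files.createDirectories(base.resolve(name));
  }

  /** Recursively deletes the base directory and everything under it. */
  void deleteAll() throws IOException {
    if (!Files.exists(base)) {
      return;
    }
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(base)) {
      // Reverse lexicographic order visits children before their parents.
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
  }
}
```

A test harness would construct it once with the shared base path and hand out subdirectories to ZooKeeper, checkpointing and output, so a single cleanup call removes everything the run created.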



Re: Giraph is now an Apache top level project

2012-05-16 Thread Avery Ching

Thanks Owen for helping us through this process!

Avery

On 5/16/12 1:33 PM, Owen O'Malley wrote:

Today the Apache board voted to graduate Giraph to a top level
project. Congratulations, all!

-- Owen




Re: Intention to request a new roller account

2012-05-16 Thread Avery Ching

A blog would be nice.

Avery

On 5/16/12 12:48 PM, Eugene Koontz wrote:

Hi fellow Giraphers,

I've been asked on https://issues.apache.org/jira/browse/INFRA-4377 to
notify you that I'd like to create a new Apache Roller account - (Apache
Roller is the blogging software used by https://blogs.apache.org/).

I think it would be nice to set up a blog for Giraph at some point - I
assume its url would be https://blogs.apache.org/giraph/ .

However, my initial request has nothing to do with Giraph: I'm planning
on using my account to post an entry
regarding HBase Access Controls to the HBase blog
(http://blogs.apache.org/hbase/). I created this blog post over a year
ago, but the site is now gone. Fortunately it's still archived here:

  
http://web.archive.org/web/20101031022526/http://hbaseblog.com/2010/10/11/secure-hbase-access-controls/

(So, please forgive the somewhat off-topic nature of this mail).

-Eugene





Re: the project wiki works well

2011-08-23 Thread Avery Ching
Thanks for testing it Hyunsik.

Avery

On Aug 23, 2011, at 12:41 AM, Hyunsik Choi wrote:

> The project wiki works well. As a test, I copied the "Quick Start Guide" from
> the Giraph GitHub repository into the Apache cwiki.
> 
> https://cwiki.apache.org/confluence/display/GIRAPH/Quick+Start+Guide



Re: Will there be JIRA import for Apache Giraph?

2011-08-23 Thread Avery Ching
Hi Henry,

I added INFRA-3855 as a blocker on the issue.  We definitely don't want to 
have to commit 2x.

Avery

On Aug 23, 2011, at 3:50 PM, Henry Saputra wrote:

> Hi All,
> 
> From Avery's comment on one of the JIRA issues, it looks like there will be
> an import of existing JIRA issues for Giraph?
> 
> If that's the case, we should probably wait until the import happens
> before filing new issues, because they will be overwritten by the
> import.
> 
> - Henry



Re: Will there be JIRA import for Apache Giraph?

2011-08-23 Thread Avery Ching
Henry,

Please disregard my answer; I misunderstood your question.  Jakob's answer 
is right: there is not much to import for issues from GitHub.

Avery

On Aug 23, 2011, at 4:19 PM, Avery Ching wrote:

> Hi Henry,
> 
> I added INFRA-3855 as a blocker on the issue.  We definitely don't want 
> to have to commit 2x.
> 
> Avery
> 
> On Aug 23, 2011, at 3:50 PM, Henry Saputra wrote:
> 
>> Hi All,
>> 
>> From Avery's comment on one of the JIRA issues, it looks like there will be
>> an import of existing JIRA issues for Giraph?
>> 
>> If that's the case, we should probably wait until the import happens
>> before filing new issues, because they will be overwritten by the
>> import.
>> 
>> - Henry
> 



Reviewboard for code reviews

2011-08-29 Thread Avery Ching
Anyone know if we have reviewboard access?

Thanks,

Avery


Re: Reviewboard for code reviews

2011-08-29 Thread Avery Ching
https://blogs.apache.org/infra/entry/reviewboard_instance_running_at_the

I'll file an INFRA ticket.

Thanks,

Avery

On Aug 29, 2011, at 10:07 PM, Hyunsik Choi wrote:

Looks possible. Some incubator projects (e.g., Kafka) already have a
reviewboard group.

Best regards,
--
Hyunsik Choi



On Tue, Aug 30, 2011 at 1:48 PM, Avery Ching <ach...@yahoo-inc.com> wrote:
Anyone know if we have reviewboard access?

Thanks,

Avery




Re: Reviewboard for code reviews

2011-08-30 Thread Avery Ching
Thanks Henry.  I have filed issue

https://issues.apache.org/jira/browse/INFRA-3892

to get reviewboard access.

Avery

On Aug 30, 2011, at 11:35 AM, Henry Saputra wrote:

Hi Avery, yes you should file INFRA ticket to add Giraph as Groups in
reviews board.

I filed tickets to create one for Kafka and Gora.

- Henry

On Mon, Aug 29, 2011 at 10:13 PM, Avery Ching <ach...@yahoo-inc.com> wrote:
https://blogs.apache.org/infra/entry/reviewboard_instance_running_at_the

I'll file an INFRA ticket.

Thanks,

Avery

On Aug 29, 2011, at 10:07 PM, Hyunsik Choi wrote:

Looks possible. Some incubator projects (e.g., Kafka) already have a
reviewboard group.

Best regards,
--
Hyunsik Choi



On Tue, Aug 30, 2011 at 1:48 PM, Avery Ching <ach...@yahoo-inc.com> wrote:
Anyone know if we have reviewboard access?

Thanks,

Avery






Re: Reviewboard for code reviews

2011-08-30 Thread Avery Ching
Okay, let's make it optional for now.  For me, it definitely helps to visualize 
the changes better.  Also, I think the feedback tool is pretty good.

Avery

On Aug 30, 2011, at 11:52 AM, Henry Saputra wrote:

> Argh I meant "It should just be an option to help review and should not
> be required for patches."
> 
> - Henry
> 
> On Tue, Aug 30, 2011 at 11:51 AM, Henry Saputra  
> wrote:
>> +1
>> 
>> It should just optional to help review not required.
>> 
>> - Henry
>> 
>> On Tue, Aug 30, 2011 at 11:48 AM, Jakob Homan  wrote:
>>> We've just gone around on this one for Kafka and, if reviewboard is
>>> provided, it would be good to keep it as an optional part of the
>>> process.  I've had very negative experiences with it, both in Hadoop
>>> and Hive.  If one would like to do a reviewboard review, that's great
>>> - but for those who don't, standard bullet points should suffice.
>>> -jakob
>>> 
>>> 
>>> 
>>> On Tue, Aug 30, 2011 at 11:38 AM, Avery Ching  wrote:
>>>> Thanks Henry.  I have filed issue
>>>> 
>>>> https://issues.apache.org/jira/browse/INFRA-3892
>>>> 
>>>> to get reviewboard access.
>>>> 
>>>> Avery
>>>> 
>>>> On Aug 30, 2011, at 11:35 AM, Henry Saputra wrote:
>>>> 
>>>> Hi Avery, yes you should file INFRA ticket to add Giraph as Groups in
>>>> reviews board.
>>>> 
>>>> I filed tickets to create one for Kafka and Gora.
>>>> 
>>>> - Henry
>>>> 
>>>> On Mon, Aug 29, 2011 at 10:13 PM, Avery Ching <ach...@yahoo-inc.com> wrote:
>>>> https://blogs.apache.org/infra/entry/reviewboard_instance_running_at_the
>>>> 
>>>> I'll file an INFRA ticket.
>>>> 
>>>> Thanks,
>>>> 
>>>> Avery
>>>> 
>>>> On Aug 29, 2011, at 10:07 PM, Hyunsik Choi wrote:
>>>> 
>>>> Looks possible. Some incubator projects (e.g., Kafka) already have a
>>>> reviewboard group.
>>>> 
>>>> Best regards,
>>>> --
>>>> Hyunsik Choi
>>>> 
>>>> 
>>>> 
>>>> On Tue, Aug 30, 2011 at 1:48 PM, Avery Ching <ach...@yahoo-inc.com> wrote:
>>>> Anyone know if we have reviewboard access?
>>>> 
>>>> Thanks,
>>>> 
>>>> Avery
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 



Re: dev process

2011-08-31 Thread Avery Ching
I agree that RTC is better for us, although in case of build breakage I'm okay 
with CTR.

Avery

On Aug 31, 2011, at 9:09 AM, Jakob Homan wrote:

> RTC is definitely the way to go (he said with a weary sigh).
> 
> On Wed, Aug 31, 2011 at 6:32 AM, Owen O'Malley  wrote:
>> All,
>>   It seems that we've implicitly picked review then commit (RTC) instead of 
>> commit then review (CTR). Apache projects allow either approach and I'm fine 
>> with either. We should just state what we are doing.
>>   I'd also like to propose that we keep a CHANGES.txt file that includes who 
>> contributed and committed each patch. I've created GIRAPH-19 to do that.
>> 
>> Thoughts?
>> 
>> -- Owen



Review Request: GIRAPH-27 Mutable static global state in Vertex.java should be refactored

2011-09-08 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1771/
---

Review request for giraph.


Summary
---

Based on Jake's submission 
https://issues.apache.org/jira/secure/attachment/12493654/GIRAPH-27.patch
Couple of small changes:
- Do not expose GraphState to application developers
- Fixing a few formatting issues


This addresses bug GIRAPH-27.
https://issues.apache.org/jira/browse/GIRAPH-27


Diffs
-

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedService.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspUtils.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/Vertex.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexRange.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexResolver.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestBspBasic.java
 1166925 

Diff: https://reviews.apache.org/r/1771/diff


Testing
---

Unittest and page rank benchmark on Yahoo! grid with 10 workers.


Thanks,

Avery



Re: Review Request: GIRAPH-27 Mutable static global state in Vertex.java should be refactored

2011-09-08 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1771/
---

(Updated 2011-09-09 01:52:18.842526)


Review request for giraph.


Changes
---

Dmitriy correctly pointed out I forgot to add GraphState.java to the diff.


Summary
---

Based on Jake's submission 
https://issues.apache.org/jira/secure/attachment/12493654/GIRAPH-27.patch
Couple of small changes:
- Do not expose GraphState to application developers
- Fixing a few formatting issues


This addresses bug GIRAPH-27.
https://issues.apache.org/jira/browse/GIRAPH-27


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedService.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspUtils.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphState.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/Vertex.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexRange.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexResolver.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestBspBasic.java
 1166925 

Diff: https://reviews.apache.org/r/1771/diff


Testing
---

Unittest and page rank benchmark on Yahoo! grid with 10 workers.


Thanks,

Avery



Re: Review Request: GIRAPH-27 Mutable static global state in Vertex.java should be refactored

2011-09-08 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1771/
---

(Updated 2011-09-09 04:44:32.531014)


Review request for giraph.


Changes
---

Make BasicVertex and MutableVertex abstract classes.  BasicVertex has package 
private methods for get/setGraphState().
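The structure described in this update can be sketched roughly as follows. The classes are simplified, hypothetical stand-ins named after the ones in the review, not the actual Giraph code or signatures. Per-job values that previously lived in static fields of Vertex move into a `GraphState` instance. The abstract base vertex exposes that instance only through package-private accessors, and public methods such as `getSuperstep()` delegate to it. `CountingVertex` is a made-up application vertex.

```java
/** Simplified stand-in: per-job state that previously lived in static fields. */
final class GraphState {
  private long superstep;
  private long numVertices;

  long getSuperstep() { return superstep; }
  void setSuperstep(long superstep) { this.superstep = superstep; }
  long getNumVertices() { return numVertices; }
  void setNumVertices(long numVertices) { this.numVertices = numVertices; }
}

/** Simplified stand-in for the abstract base vertex. */
abstract class BasicVertex {
  private GraphState graphState;

  // Package-private: framework code in the same package wires the state in,
  // but application vertices in other packages cannot see or replace it.
  GraphState getGraphState() { return graphState; }
  void setGraphState(GraphState graphState) { this.graphState = graphState; }

  /** Public accessors delegate to the injected state instead of static fields. */
  public long getSuperstep() { return getGraphState().getSuperstep(); }
  public long getNumVertices() { return getGraphState().getNumVertices(); }

  public abstract void compute();
}

/** Hypothetical application vertex that records the superstep it last ran in. */
class CountingVertex extends BasicVertex {
  long lastSeenSuperstep = -1;

  @Override
  public void compute() {
    lastSeenSuperstep = getSuperstep();
  }
}
```

Because each vertex holds a reference instead of reading statics, two vertices wired to different `GraphState` objects (for example, two jobs in the same JVM during tests) see independent values.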


Summary
---

Based on Jake's submission 
https://issues.apache.org/jira/secure/attachment/12493654/GIRAPH-27.patch
Couple of small changes:
- Do not expose GraphState to application developers
- Fixing a few formatting issues


This addresses bug GIRAPH-27.
https://issues.apache.org/jira/browse/GIRAPH-27


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedService.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspUtils.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphState.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/MutableVertex.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/Vertex.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexRange.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexResolver.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestBspBasic.java
 1166925 

Diff: https://reviews.apache.org/r/1771/diff


Testing
---

Unittest and page rank benchmark on Yahoo! grid with 10 workers.


Thanks,

Avery



Re: Review Request: GIRAPH-27 Mutable static global state in Vertex.java should be refactored

2011-09-08 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1771/
---

(Updated 2011-09-09 06:26:15.766329)


Review request for giraph.


Changes
---

Moved as many of the getGraphState()-related method implementations as possible 
from Vertex to BasicVertex and MutableVertex.  Other changes for primitive 
implementations can be done in another JIRA.


Summary
---

Based on Jake's submission 
https://issues.apache.org/jira/secure/attachment/12493654/GIRAPH-27.patch
Couple of small changes:
- Do not expose GraphState to application developers
- Fixing a few formatting issues


This addresses bug GIRAPH-27.
https://issues.apache.org/jira/browse/GIRAPH-27


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedService.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspUtils.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphState.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/MutableVertex.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/Vertex.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexRange.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexResolver.java
 1166925 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestBspBasic.java
 1166925 

Diff: https://reviews.apache.org/r/1771/diff


Testing
---

Unittest and page rank benchmark on Yahoo! grid with 10 workers.


Thanks,

Avery



Re: Incubator report is due

2011-09-12 Thread Avery Ching
Sounds good to me.  I'll reach out to Arun and see if he can fill out the 
ICLA.

Avery

On Sep 12, 2011, at 8:44 AM, Owen O'Malley wrote:

> I'd propose:
> 
> Giraph
> 
> Giraph is a large-scale, fault-tolerant, Bulk Synchronous Parallel
> (BSP)-based graph
> processing framework that runs on Hadoop. Giraph entered the incubator
> in August 2011.
> 
> Project developments:
> 
> * Project website created.
> * Confluence wiki created.
> * Accounts were created for two of the committers.
> * Project is entirely on Apache infrastructure.
> 
> Next steps:
> * Adding new committers.
> * Making a release.
> * One of the initial committers still hasn't filed an ICLA. We either
> need him to move forward or remove him.



Re: Port to YARN: GIRAPH and HAMA

2011-09-13 Thread Avery Ching

Hi Vinod,

Edward and I have chatted about this at times.  It sounds better in 
theory (both BSP based and adding support for MRv2) than in practice I 
think (underlying implementations are quite different).  Actually, I 
also believe that in the future, Giraph is not going to solely be 
BSP-based graph computing.  We are also thinking about other underlying 
computing models (i.e. streaming (asynchronous) graph processing - see


http://mail-archives.apache.org/mod_mbox/incubator-giraph-user/201109.mbox/%3CCAEVHzWC8b-7RiBjkDiQKjiu-rVBz9=ogeoajxhbclcr5n3+...@mail.gmail.com%3E

But I think today, the issues are the following:

1)  Giraph runs completely as a MapReduce job on Hadoop today.  This 
needs to be maintained to support our current users, who will not likely 
move to MRv2 for at least a year.
2)  The internals of Giraph are implemented differently from Hama's, and 
porting Giraph onto Hama would take some time.
3)  If we have various graph processing computing models (BSP based, 
streams or asynchronous, or a combination), then being on Hama brings 
little value for Giraph.


Perhaps more practically, I wonder if it would be possible for someone 
from the Hama team to refactor our code a bit to support Hama-style BSP 
in Giraph?  Certainly would be a pretty cool project...


Avery

On 9/13/11 4:49 AM, Edward J. Yoon wrote:

Quite a while ago, I implemented a clone of Google Pregel simply using
BSPLib[1] and decided to focus on the BSP computing engine.

Hama and Giraph differ in slogan but not in kind.

If we were to collaborate, Giraph should be implemented on top of the
Hama BSP computing engine.

Otherwise, we will be back to square one.

1. http://markmail.org/thread/4czcgtjupjvpqcqi

On Sun, Sep 11, 2011 at 11:22 PM, Vinod Kumar Vavilapalli
  wrote:

Crosspost to hama-dev and giraph-dev.

Only this morning I was looking at HAMA-431, the port of Hama to YARN,
and one of the tweets reminded me of JIRA issue GIRAPH-13, which is about
porting Giraph to YARN.

I was also looking at the Giraph proposal for entry into the Apache Incubator.
There is an interesting section there:
{quote}
Relationships with Other Apache Products

Giraph has some overlapping functionality with Apache Hama. However, there
are some significant differences. Giraph focuses on graph-based bulk
synchronous parallel (BSP) computing, while Apache Hama is more for general
purposed BSP computing. Giraph runs on the Hadoop infrastructure, while
Apache Hama uses its own computing framework.
{quote}

I agree with the point about Hama being general-purpose BSP and Giraph
being completely graph oriented. But the latter point about the infrastructure
is going to be moot with both Giraph and Hama being ported over to
YARN.

So here's my billion dollar question: Is it possible to implement Giraph's
graph-based APIs over Hama's BSP APIs, with both running over a single
Apache BSP implementation on YARN?

I also do see the email thread regarding Hama and Giraph's future
collaboration when Hadoop NextGen aka YARN comes in:
http://s.apache.org/HamaVsGiraph. So are we ready for this yet?

Disclaimer: I come from the Hadoop world and have no idea of Giraph's APIs or
internals, except that I see a bsp package in Giraph's source tree. I do know
a tiny bit about Hama's APIs and internals, but my expertise is only two days old.

Thanks,
+Vinod
(An elephant maintainer trying to see if a Giraffe can be made to ride over
a hippopotamus riding over an elephant)








Re: Port to YARN: GIRAPH and HAMA

2011-09-13 Thread Avery Ching
Maybe it's possible; it's hard to say what will happen in a year.  However, 
porting an application from any one of these projects to another 
shouldn't be too difficult, since the Pregel API is 
relatively simple.  Still, as I mentioned in my original post, I 
imagine that Giraph will support non-BSP graph computing models as well 
in the future (which would be less portable).


Avery

On 9/13/11 12:51 PM, Dan Brickley wrote:

On 13 September 2011 21:43, Dmitriy Ryaboy  wrote:

Dan,
Given how fast we are currently iterating on the API in Giraph, I think
agreeing on a common API across 3 projects is a bit premature at this stage,
unfortunately.

Current velocity aside, ... could such an interface be plausible? e.g.
this time next year?

Dan




Re: Port to YARN: GIRAPH and HAMA

2011-09-14 Thread Avery Ching

Vinod, thanks for your comments.  I've replied inline.

Avery

On 9/14/11 11:09 AM, Vinod Kumar Vavilapalli wrote:

Avery,

Some replies inline to the issues you outlined.


1)  Giraph runs completely as a MapReduce job on Hadoop today.  This needs
to be maintained to support our current users, who will not likely move to
MRv2 for at least a year.

I think what you need is to support Giraph's graph API for your users, not
the underlying implementation. (Or are you leaking MapReduce APIs to your
users?) Sure, you are restricted to the underlying implementation (Hadoop
MRv1, or MRv2 whenever it gets used) at any point in time, but what we are
discussing is _that_ future when the underlying implementation itself also
moves to MRv2.

I think the takeaway should be that our clients (at Yahoo! and 
elsewhere) are currently using Giraph on MRv1.  While the Giraph API does 
not expose the underlying infrastructure APIs (i.e. MRv1 and MRv2), we 
still need to support the MRv1 implementation even while we 
begin/complete the port to MRv2.  I imagine that we will need to support 
both MRv1 and MRv2 for a fairly long period of time, as the transition to 
MRv2 for a company (e.g. Yahoo!) could take a very long time (anywhere 
from 8 months to multiple years).  Some of our internal clusters at 
Yahoo! are still running 0.20.1 today, for example.

2)  The internals of Giraph are implemented differently than Hama..

Sure, but only at present. My original question is: given a BSP
implementation on a YARN cluster, can GiraphV2 (BSP based) simply be
implemented over it or not? If today GiraphV1 uses its own BSP
implementation over the MapReduce APIs on a Hadoop MRv1 cluster, I can
clearly see how GiraphV2 could use Hama's BSP implementation over the YARN APIs.

In theory this is true.  However, as mentioned previously, we still have 
users on MRv1 and will need to support it for a long time (i.e. at least 
a year, probably more).   Also I'm fairly certain that during the next 
year, we will have non-BSP based graph processing computing models in 
place as well.  For these reasons, it may not make sense to try to put 
Giraph on top of HAMA even when we are both on MRv2.  It's hard to say 
now as it is early.  Let's visit this at a later time.



3)  If we have various graph processing computing models (BSP based,
streams or asynchronous, or a combination), then being on Hama brings little
value for Giraph.

That future isn't there yet. In any case, I'd bet that when you get there, a lot
of what you have now wouldn't be an out-of-the-box fit either.

From my perspective (a third-person POV), this is what I conclude:
Giraph's velocity on Hadoop MapReduce may be the real impediment to thinking
about sharing a BSP-based implementation with HamaV2. Sure,
Giraph has other ideas regarding the computation model itself, but that is a
future that isn't here yet.

I just hope the same velocity isn't an impediment to thinking about the
next-gen version on top of YARN :) The way I see it, porting Giraph to YARN
is also a revolution in itself; most, if not all, of the implementation will
change while API-level compatibility is kept. I am still eagerly looking
forward to the port of Giraph to YARN. Maybe more digging into Giraph's
internals will help my cause too.

Giraph does appear to be moving with a fast velocity currently, but we 
have a clear intention to run on top of MRv2.  Please see 
https://issues.apache.org/jira/browse/GIRAPH-13.  Obviously, the MRv2 
changes are much better suited for Giraph and we look forward to the day 
when nearly all Hadoop instances are running MRv2.

If nothing else, this discussion at least helped share some ideas
between the two communities.

Thanks, all, for putting down your thoughts.
+Vinod


On Wed, Sep 14, 2011 at 11:46 AM, Thomas Jungblut <thomas.jungb...@googlemail.com> wrote:


  We are also thinking about other underlying computing models (i.e.
streaming (asynchronous) graph processing - see


That is a really cool idea. But I don't think we are going to focus solely
on graph computing. We want to enable an interface that can be used for it
(straightforward, as described in the Pregel paper), but you really are the
graph experts, so we don't want to compete with each other :D
Our asynchronous processing (in my opinion) will just enable the sending of
messages within the computation phase. So the BarrierSync is just a short
transition to make sure every task is ready and every message has been sent.
Your vertex locking is a graph-only feature, so it won't affect us
anyway.
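Thomas's description (messages may be sent at any point during the computation phase, and a barrier at the end of each step guarantees every message has arrived before the next step reads it) can be illustrated with a toy, single-JVM sketch. This is not Hama's or Giraph's implementation: the class `ToyBsp` and method `runMaxOnRing` are hypothetical, plain threads stand in for tasks, and `java.util.concurrent.CyclicBarrier` stands in for a distributed barrier. Tasks sit on a ring and propagate the maximum value, the classic Pregel example.

```java
import java.util.Queue;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CyclicBarrier;

/** Toy BSP engine (illustrative only): tasks on a ring propagate the maximum value. */
final class ToyBsp {
  private ToyBsp() {}

  /** Runs numSupersteps supersteps with numTasks threads; returns each task's final value. */
  static long[] runMaxOnRing(int numTasks, int numSupersteps) throws InterruptedException {
    // Two inbox generations: messages sent in superstep s are only read in superstep s + 1.
    @SuppressWarnings("unchecked")
    Queue<Long>[][] inboxes = new Queue[2][numTasks];
    for (int g = 0; g < 2; g++) {
      for (int t = 0; t < numTasks; t++) {
        inboxes[g][t] = new ConcurrentLinkedQueue<>();
      }
    }
    long[] values = new long[numTasks];
    CyclicBarrier barrier = new CyclicBarrier(numTasks);
    Thread[] threads = new Thread[numTasks];
    for (int t = 0; t < numTasks; t++) {
      final int id = t;
      values[id] = id; // each task starts with its own id as its value
      threads[t] = new Thread(() -> {
        try {
          for (int s = 0; s < numSupersteps; s++) {
            // Compute phase: consume everything delivered during the previous superstep.
            Queue<Long> inbox = inboxes[s % 2][id];
            Long msg;
            while ((msg = inbox.poll()) != null) {
              values[id] = Math.max(values[id], msg);
            }
            // Sending may happen anywhere in the compute phase; messages go to the
            // other generation, which no task reads during this superstep.
            inboxes[(s + 1) % 2][(id + 1) % numTasks].add(values[id]);
            // Barrier sync: no task starts superstep s + 1 until every task has
            // finished sending its superstep-s messages.
            barrier.await();
          }
        } catch (InterruptedException | BrokenBarrierException e) {
          Thread.currentThread().interrupt();
        }
      });
      threads[t].start();
    }
    for (Thread thread : threads) {
      thread.join();
    }
    return values;
  }
}
```

With 4 tasks, the maximum (3) needs up to three hops to reach every task, so `runMaxOnRing(4, 4)` ends with all tasks holding 3, while a single superstep leaves the initial values unchanged because nothing has been delivered yet.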


Giraph runs completely as a MapReduce job on Hadoop today.
All right.

I think our result is the following:
We (Apache Hama) are focussing on the YARN implementation of the BSP
paradigm.
If you want to run Giraph on a real BSP engine later, feel free to put your
stuff on top of that.
As far as I have seen, there is a 100% backward compatibility of YARN, so
your current solution will run on YA

Re: Vertex serialization

2011-09-15 Thread Avery Ching
It should probably be ints instead of longs for now.  I was thinking of a 
day when it might be possible to have more than Integer.MAX_VALUE edges 
or messages, but that would break the current implementation anyway.


Avery

On 9/15/11 4:49 AM, Claudio Martella wrote:

Hello,

I've noticed that Vertex's Writable implementation (readFields and
write) writes longs for the map sizes (edges and messages), which are
ints. Is this for some compatibility reason?
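To make the int-versus-long question concrete, here is a minimal sketch of length-prefixed map serialization over `java.io.DataOutput`/`DataInput`, the interfaces that Hadoop's `Writable.write`/`readFields` receive. The class `EdgeMapCodec` and the `Map<Long, Double>` edge representation are hypothetical, chosen for illustration; this is not Giraph's actual `Vertex` code. Since `Map.size()` returns an `int`, an `int` prefix loses nothing, and writing a `long` instead costs an extra 4 bytes per map.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical edge-map codec: an int size prefix followed by (target, value) pairs. */
final class EdgeMapCodec {
  private EdgeMapCodec() {}

  static void write(DataOutput out, Map<Long, Double> edges) throws IOException {
    out.writeInt(edges.size()); // Map.size() is an int, so an int prefix suffices
    for (Map.Entry<Long, Double> e : edges.entrySet()) {
      out.writeLong(e.getKey());
      out.writeDouble(e.getValue());
    }
  }

  static Map<Long, Double> read(DataInput in) throws IOException {
    int size = in.readInt(); // must mirror the type used in write()
    Map<Long, Double> edges = new HashMap<>();
    for (int i = 0; i < size; i++) {
      long target = in.readLong();
      edges.put(target, in.readDouble());
    }
    return edges;
  }

  /** Serializes the map into a byte array. */
  static byte[] toBytes(Map<Long, Double> edges) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      write(out, edges);
    }
    return bytes.toByteArray();
  }

  /** Deserializes a map previously produced by toBytes(). */
  static Map<Long, Double> fromBytes(byte[] data) throws IOException {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      return read(in);
    }
  }
}
```

For a map with two edges this produces 4 + 2 * (8 + 8) = 36 bytes. Whichever width is chosen, `write` and `read` must agree, so changing it also changes the on-disk format (for example, of existing checkpoints).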





Review Request: GIRAPH-34 Failure of Vertex reflection for putVertexList from GIRAPH-27

2011-09-16 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1932/
---

Review request for giraph.


Summary
---

The problem shows up when messages are sent locally, and therefore not
reflected.  Hence, after the message has been delivered to the other
local vertex, in the next superstep, if the vertex that sent the
message modifies the object, it will modify the message before the
receiving vertex has a chance to access it.  Also, combiners may
operate on the (shared) message simultaneously with the compute()
method, so potential issues could also occur with messages
eventually sent to remote workers.  The solution I am proposing is
that all messages should be copied to prevent this error from occurring,
as it is pretty tough to debug.  An alternative would be to simply tell
users to create copies themselves, but that is less convenient and less
intuitive in my opinion.
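The hazard described above is ordinary Java object aliasing: a locally delivered message is the same object the sender still holds. The sketch below reproduces it with a hypothetical mutable message type (`MutableMsg`) and a hypothetical in-memory inbox (`LocalInbox`); neither is Giraph's API. It also shows the defensive-copy alternative proposed here, which costs one allocation per message.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mutable message type, standing in for a reusable Writable value. */
final class MutableMsg {
  long value;

  MutableMsg(long value) {
    this.value = value;
  }

  /** Returns an independent copy with the same payload. */
  MutableMsg copy() {
    return new MutableMsg(value);
  }
}

/** Hypothetical inbox for a vertex on the same worker (no serialization happens). */
final class LocalInbox {
  private final boolean copyOnSend;
  private final List<MutableMsg> messages = new ArrayList<>();

  LocalInbox(boolean copyOnSend) {
    this.copyOnSend = copyOnSend;
  }

  /** Without a copy, the receiver stores the very object the sender still holds. */
  void deliver(MutableMsg msg) {
    messages.add(copyOnSend ? msg.copy() : msg);
  }

  long valueAt(int index) {
    return messages.get(index).value;
  }
}
```

If the sender reuses one message object and mutates it after sending (a common way to avoid allocations), the aliasing inbox observes the new value, while the copying inbox keeps the value that was actually sent.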


This addresses bug GIRAPH-34.
https://issues.apache.org/jira/browse/GIRAPH-34


Diffs
-

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1171389 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java
 1171389 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/Vertex.java
 1171389 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestVertexRangeBalancer.java
 1171389 

Diff: https://reviews.apache.org/r/1932/diff


Testing
---

I passed local unittests (mvn package) and unittests on my local
machine's Hadoop instance (mvn package
-Dprop.mapred.job.tracker=localhost:50300).


Thanks,

Avery



Re: Review Request: GIRAPH-34 Failure of Vertex reflection for putVertexList from GIRAPH-27

2011-09-16 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1932/
---

(Updated 2011-09-16 20:09:21.003725)


Review request for giraph.


Changes
---

Instead of copying the user messages, simply rely on Javadoc to inform users of 
the contract with sent message values.

Passed the previously failing unittest on a local Hadoop instance, as well as the local unittests.
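With the copy removed, correctness depends on callers honoring the documented contract. A hypothetical sketch of such a contract and of a caller that follows it by allocating a fresh message per send (the class, method and Javadoc wording are illustrative, not Giraph's actual API):

```java
import java.util.ArrayList;
import java.util.List;

final class SendContractSketch {
  /** Hypothetical message type; a final field makes it impossible to mutate after sending. */
  static final class Msg {
    final long value;

    Msg(long value) {
      this.value = value;
    }
  }

  private final List<Msg> inbox = new ArrayList<>();

  /**
   * Sends a message to a local vertex.
   *
   * <p>The message is not copied: the caller must not modify {@code msg}
   * after this call returns. Allocate a new object for each send instead.
   */
  void sendMsg(Msg msg) {
    inbox.add(msg);
  }

  /** A caller honoring the contract: one new object per send, never reused. */
  void sendAll(long[] values) {
    for (long v : values) {
      sendMsg(new Msg(v));
    }
  }

  /** Values as seen by the receiver. */
  List<Long> receivedValues() {
    List<Long> result = new ArrayList<>();
    for (Msg m : inbox) {
      result.add(m.value);
    }
    return result;
  }

  public static void main(String[] args) {
    SendContractSketch s = new SendContractSketch();
    s.sendAll(new long[] {3, 5, 8});
    System.out.println(s.receivedValues()); // prints [3, 5, 8]
  }
}
```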


Summary
---

The problem shows up when messages are sent locally, and therefore not
reflected.  Hence, after the message has been delivered to the other
local vertex, in the next superstep, if the vertex that sent the
message modifies the object, it will modify the message before the
receiving vertex has a chance to access it.  Also, combiners may
operate on the (shared) message simultaneously with the compute()
method, so similar issues could also occur with messages
eventually sent to remote workers.  The solution I am proposing is
that all messages should be copied to prevent this error from occurring,
as it is pretty tough to debug.  An alternative would be to just tell
users to create copies themselves, but it's less convenient and not
intuitive in my opinion.


This addresses bug GIRAPH-34.
https://issues.apache.org/jira/browse/GIRAPH-34


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1171389 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleCheckpointVertex.java
 1171389 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java
 1171389 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/Vertex.java
 1171389 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestVertexRangeBalancer.java
 1171389 

Diff: https://reviews.apache.org/r/1932/diff


Testing
---

I passed the local unittests (mvn package) and the unittests on my local
machine's Hadoop instance (mvn package
-Dprop.mapred.job.tracker=localhost:50300).


Thanks,

Avery



Re: Giraph-13: Porting Giraph to YARN

2011-09-18 Thread Avery Ching

Hi Vinod,

Thank you for your thoughts.  It would be great if your comments were 
put on GIRAPH-13 so they aren't lost.  You and Jakob should sync up to 
see how to proceed on this.


Avery

On 9/18/11 7:37 AM, Vinod Kumar Vavilapalli wrote:

Hi all,

I finished an excursion into Giraph's code and now I kinda know what it
takes to port Giraph over to run on top of YARN.

When the base Hadoop clusters are replaced by YARN clusters, Giraph will
have two options:
  - *Giraph still works over mapreduce APIs*: Even after moving to YARN
clusters, Giraph can still run over MapreduceV2+YARN. Without any code
changes at all.
  - *Giraph works natively on YARN*: This can be done in such a way that in
the medium term, Giraph can continue to work on both a Hadoop MapReduce
cluster and a YARN cluster. Two visible effects of this effort that I can
think of:
 -- There will be some moving around of classes/interfaces to separate
APIs from implementation details and a bit of reorganisation of code to help
support both GiraphV1 and GiraphV2.
 -- The other thing the port will probably affect is a fork in the
community's attention (depending on how much of the community's eyeballs the
new world grabs as opposed to the stabilization/feature work on GiraphV1).

Now here's the thing. Avery indicated on the other thread (about Giraph over
HAMA) that most Giraph users will need to run on top of a Hadoop
MapReduce cluster for quite some time. I completely agree with that, being
a long-time maintainer/supporting-dev of Hadoop clusters myself.

Given that concern, before embarking on the port, I thought I'd get opinions
from the community.

Thanks,
+Vinod





Re: Unit tests on real hadoop cluster

2011-09-29 Thread Avery Ching
Actually, to be fair, I've only executed the distributed unittests on my 
own local Hadoop instance.


I just ran the Hadoop unittests against trunk on my local machine to check:

mvn test -Dprop.mapred.job.tracker=localhost:50300


Results :

Tests run: 27, Failures: 0, Errors: 0, Skipped: 0

[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 12:19.143s
[INFO] Finished at: Thu Sep 29 21:55:55 PDT 2011
[INFO] Final Memory: 6M/81M
[INFO]

Everything should be fine.

Avery

On 9/29/11 5:18 PM, Hyunsik Choi wrote:

I would like to execute the unit tests on a real Hadoop cluster.

I tried to execute the following command against the Giraph trunk version.

mvn test -Dprop.mapred.job.tracker=xxx.korea.ac.kr:8021
-Dprop.zookeeper.list=xxx.korea.ac.kr:2181

However, the unit tests failed as follows:
https://gist.github.com/1252309

I think it may be my fault because the source code is the trunk version.

Any suggestions would be helpful.
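The -D flags in these commands only take effect if the test JVM actually sees them. A minimal sketch, assuming the Maven build forwards them to the forked test JVM as Java system properties; the class and method names are hypothetical, and only the two property names come from the commands in this thread:

```java
/**
 * Hypothetical helper showing how a test harness could choose between the
 * local job runner and a real cluster from -D system properties.
 */
final class ClusterTestConfigSketch {
  static final String JOB_TRACKER_PROP = "prop.mapred.job.tracker";
  static final String ZOOKEEPER_LIST_PROP = "prop.zookeeper.list";

  /** Job tracker address (host:port), or null if the property is not set. */
  static String jobTracker() {
    return System.getProperty(JOB_TRACKER_PROP);
  }

  /** ZooKeeper quorum (host:port list), or null if the property is not set. */
  static String zookeeperList() {
    return System.getProperty(ZOOKEEPER_LIST_PROP);
  }

  /** True when a non-empty job tracker address was supplied. */
  static boolean runsOnCluster() {
    String jt = jobTracker();
    return jt != null && !jt.isEmpty();
  }

  public static void main(String[] args) {
    System.out.println(runsOnCluster()
        ? "cluster mode, job tracker " + jobTracker() + ", zookeeper " + zookeeperList()
        : "local job runner");
  }
}
```

If tests pass locally but fail against a cluster, confirming that these values really arrive in the test JVM is a cheap first check.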

--
Hyunsik Choi
Database Lab, Korea University





Re: [jira] [Commented] (GIRAPH-10) Aggregators are not exported

2011-09-30 Thread Avery Ching

Claudio,

Shouldn't the AggregatorWriter address your case?  Since only
a single thread will use the AggregatorWriter, it can dump the
aggregators to HDFS at every superstep.  Am I missing something?


Avery

On 9/30/11 2:20 AM, Claudio Martella wrote:

The first option sounds like a good idea to me. It fits quite directly
with the current interface.

This design though doesn't fit with my current need for dumping data
to disk. I'll write an email about it with a couple of new questions.

I think that with some help I can produce some code on this JIRA.

On Thu, Sep 29, 2011 at 9:49 PM, Avery Ching (Commented) (JIRA)
  wrote:

[ https://issues.apache.org/jira/browse/GIRAPH-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117569#comment-13117569 ]

Avery Ching commented on GIRAPH-10:
---

This is definitely an open question here.  Here's a quick thought off the top 
of my head.

How about something like:

public interface AggregatorsWriter {
    void initialize(TaskAttemptContext context, int superstep)
        throws IOException;

    void writeAggregators(Collection> aggregators)
        throws IOException, InterruptedException;

    void close(TaskAttemptContext context)
        throws IOException, InterruptedException;
}

Then we could define a method that lets the user select how frequently
to write the aggregators, maybe with an enum (ALL_SUPERSTEPS,
LAST_SUPERSTEP, ...).

We could easily implement a default AggregatorsWriter that simply dumps each
registered aggregator name (maybe the class name too) followed by its
value.toString().  Users can implement something better if they like.
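A self-contained sketch of this proposal, assuming simplified types: it drops the Hadoop TaskAttemptContext, takes aggregator values as a plain Map, and writes to a PrintWriter. `WriteFrequency`, `SimpleAggregatorsWriter` and `TextAggregatorsWriter` are hypothetical names, not an existing Giraph API.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical frequency setting, following the enum idea above. */
enum WriteFrequency {
  ALL_SUPERSTEPS,
  LAST_SUPERSTEP
}

/** Simplified stand-in for the proposed AggregatorsWriter. */
interface SimpleAggregatorsWriter {
  void writeAggregators(Map<String, Object> aggregatorValues, long superstep, boolean lastSuperstep);
}

/** Default writer: one "superstep, name, value.toString()" line per aggregator. */
final class TextAggregatorsWriter implements SimpleAggregatorsWriter {
  private final PrintWriter out;
  private final WriteFrequency frequency;

  TextAggregatorsWriter(PrintWriter out, WriteFrequency frequency) {
    this.out = out;
    this.frequency = frequency;
  }

  @Override
  public void writeAggregators(Map<String, Object> aggregatorValues, long superstep, boolean lastSuperstep) {
    if (frequency == WriteFrequency.LAST_SUPERSTEP && !lastSuperstep) {
      return; // skip intermediate supersteps
    }
    for (Map.Entry<String, Object> entry : aggregatorValues.entrySet()) {
      // String concatenation calls toString() on the aggregator value.
      out.print(superstep + "\t" + entry.getKey() + "\t" + entry.getValue() + "\n");
    }
    out.flush();
  }

  public static void main(String[] args) {
    StringWriter buffer = new StringWriter();
    TextAggregatorsWriter writer =
        new TextAggregatorsWriter(new PrintWriter(buffer), WriteFrequency.ALL_SUPERSTEPS);
    Map<String, Object> values = new LinkedHashMap<>();
    values.put("sum", 42L);
    values.put("max", 7.5);
    writer.writeAggregators(values, 3, false);
    System.out.print(buffer);
  }
}
```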

Another alternative would be to support writing aggregators separately by
type, but that might be a little excessive and could be done later.
Hopefully someone else also chimes in with some thoughts.


Aggregators are not exported


 Key: GIRAPH-10
 URL: https://issues.apache.org/jira/browse/GIRAPH-10
 Project: Giraph
  Issue Type: New Feature
    Reporter: Avery Ching
Priority: Minor

Currently, aggregator values cannot be saved after a Giraph job.  There should 
be a way to do this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira










Review Request: numFlushThreads is 0 when doing a single worker unittest. Changing the minimum to 1.

2011-10-09 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2322/
---

Review request for giraph.


Summary
---

9 unittests failed due to:

java.lang.IllegalArgumentException
	at java.util.concurrent.ThreadPoolExecutor.<init>(ThreadPoolExecutor.java:589)
	at java.util.concurrent.ThreadPoolExecutor.<init>(ThreadPoolExecutor.java:480)
	at java.util.concurrent.Executors.newFixedThreadPool(Executors.java:59)
	at org.apache.giraph.comm.BasicRPCCommunications.<init>(BasicRPCCommunications.java:375)
	at org.apache.giraph.comm.RPCCommunications.<init>(RPCCommunications.java:68)
	at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:571)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)

The issue is that with only a single worker, the ThreadPoolExecutor constructor
fails because it is given 0 threads as an argument.
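The failure and the fix can be reproduced in isolation: Executors.newFixedThreadPool rejects a thread count of 0 with IllegalArgumentException. This sketch assumes, hypothetically, that the flush thread count is derived from the number of remote workers (0 in a single-worker job); the actual formula in BasicRPCCommunications may differ, but clamping to a minimum of 1 is the change described here.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class FlushThreadsSketch {
  /**
   * Hypothetical formula: one flush thread per remote worker, capped at
   * maxFlushThreads, but never fewer than 1.
   */
  static int numFlushThreads(int numWorkers, int maxFlushThreads) {
    int remoteWorkers = numWorkers - 1; // 0 when there is a single worker
    return Math.max(1, Math.min(maxFlushThreads, remoteWorkers));
  }

  /** Returns true if creating a fixed pool of the given size throws. */
  static boolean rejectsPoolSize(int nThreads) {
    try {
      ExecutorService pool = Executors.newFixedThreadPool(nThreads);
      pool.shutdown();
      return false;
    } catch (IllegalArgumentException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("size 0 rejected: " + rejectsPoolSize(0)); // prints true
    System.out.println("single-worker size: " + numFlushThreads(1, 8)); // prints 1
  }
}
```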


This addresses bug GIRAPH-48.
https://issues.apache.org/jira/browse/GIRAPH-48


Diffs
-

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1180785 

Diff: https://reviews.apache.org/r/2322/diff


Testing
---

Passed unittests.


Thanks,

Avery



Re: October 2011 Incubator board report

2011-10-17 Thread Avery Ching
Thanks Chris, looks good to me.

Avery

On Sun, Oct 16, 2011 at 11:45 AM, Mattmann, Chris A (388J) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> Hey Guys,
>
> I added a stub Giraph report at:
>
> http://wiki.apache.org/incubator/October2011
>
> Please improve it as you see fit.
>
> Cheers,
> Chris
>
> ++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattm...@nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
>
>


Re: Relax RTC on web site commits?

2011-10-26 Thread Avery Ching

+1.

On 10/26/11 5:42 PM, Dmitriy Ryaboy wrote:

+1

On Wed, Oct 26, 2011 at 5:13 PM, Jakob Homan  wrote:


Currently we're doing individual JIRAs for each change to the website,
which is a bunch of ceremony for a routine matter.  In GIRAPH-66, we
discussed relaxing this requirement for website changes.  This is an
approach we've used in other projects and it has worked.  I'm +1.
Thoughts?








Re: [jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

2011-10-31 Thread Avery Ching
I'd also like to hear the use case.  Currently we don't dump messages in 
the vertex output format, but maybe there is a similar case to do so?


Avery

On 10/31/11 11:46 AM, Jake Mannix wrote:

Well, I guess that gives us one reason to keep it in the API.  What's the
reasoning?  Are there static data sets for which it makes the most sense to
have "initial messages" serialized with the graph, instead of generating
them at the start?

I guess if what you're modeling is in some sense a "2nd order"
difference/differential equation, then knowing the state of the graph is not
enough information to uniquely describe the evolution; you also need the
"first derivative" of its state (i.e. the set of messages it has at any
given time).

   -jake

On Mon, Oct 31, 2011 at 11:38 AM, Claudio Martella<
claudio.marte...@gmail.com>  wrote:


I actually like the idea of having the messages inserted at
vertex load. Currently I'm actually fighting with this missing
functionality and was going to open an issue sooner or later.


On Mon, Oct 31, 2011 at 6:19 PM, Jake Mannix (Commented) (JIRA)
  wrote:

[ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140330#comment-13140330 ]

Jake Mannix commented on GIRAPH-36:
---

1) BspUtils.createVertex(Configuration conf, GraphState graphState) currently
requires access to the GraphState for instantiation.  We could avoid it by
taking setGraphState() out of that method and leaving it wherever it is
first used (GraphMapper?), but why not be safe and always set it right after
instantiation, so you know that there's no other place where someone calls
BspUtils.createVertex() but forgets to then call setGraphState() on it.

2) I really don't know whether it makes sense to be able to instantiate
"in-flight" messages with vertices.  I just wanted to future-proof the API
a little bit by allowing for the possibility.  I'm fine either way.
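Point 1 amounts to a factory that owns both steps, so no caller can create a vertex without its graph state. A minimal sketch with hypothetical stand-ins (`GraphStateStub`, `VertexBaseSketch`, `VertexFactorySketch`); it mirrors the idea, not the actual BspUtils.createVertex signature, which also takes a Configuration.

```java
/** Hypothetical stand-in for Giraph's GraphState. */
final class GraphStateStub {
  final long superstep;

  GraphStateStub(long superstep) {
    this.superstep = superstep;
  }
}

/** Hypothetical vertex base class with a graph state slot. */
abstract class VertexBaseSketch {
  private GraphStateStub graphState;

  void setGraphState(GraphStateStub graphState) {
    this.graphState = graphState;
  }

  GraphStateStub getGraphState() {
    return graphState;
  }
}

/** Example user vertex; needs an accessible no-argument constructor. */
final class ExampleVertex extends VertexBaseSketch {}

final class VertexFactorySketch {
  /** Instantiates reflectively and sets the graph state in the same place. */
  static <V extends VertexBaseSketch> V createVertex(Class<V> vertexClass, GraphStateStub state) {
    try {
      V vertex = vertexClass.getDeclaredConstructor().newInstance();
      vertex.setGraphState(state); // never left unset for a caller to forget
      return vertex;
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot instantiate " + vertexClass.getName(), e);
    }
  }

  public static void main(String[] args) {
    ExampleVertex v = createVertex(ExampleVertex.class, new GraphStateStub(0));
    System.out.println("superstep " + v.getGraphState().superstep); // prints superstep 0
  }
}
```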

Ensure that subclassing BasicVertex is possible by user apps


 Key: GIRAPH-36
 URL: https://issues.apache.org/jira/browse/GIRAPH-36
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.70.0
Reporter: Jake Mannix
Assignee: Jake Mannix
Priority: Blocker
 Fix For: 0.70.0

 Attachments: GIRAPH-36.diff


Original assumptions in Giraph were that all users would subclass
Vertex (which extends MutableVertex, which extends BasicVertex).  Classes which
wish to have application-specific data structures (i.e. not a TreeMap>) may need to extend either MutableVertex or BasicVertex.
Unfortunately VertexRange extends ArrayList, and there are other
places where the assumption is that vertex classes are either Vertex, or at
least MutableVertex.

Let's make sure the internal APIs allow for BasicVertex to be the base
class.
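A minimal sketch of the goal, assuming a hypothetical base class that exposes only what framework code needs, so that a user class can keep its edges in primitive arrays rather than a TreeMap. `BasicVertexSketch`, `ArrayEdgesVertex` and `FrameworkSketch` are illustrative names, not Giraph's actual BasicVertex API.

```java
import java.util.Arrays;

/** Hypothetical minimal base class: framework code depends only on this. */
abstract class BasicVertexSketch {
  abstract long getVertexId();

  abstract int getNumOutEdges();

  abstract long getTargetId(int edgeIndex);

  abstract double getEdgeValue(int edgeIndex);
}

/** Application-specific storage: parallel primitive arrays, no boxed map entries. */
final class ArrayEdgesVertex extends BasicVertexSketch {
  private final long id;
  private final long[] targets;
  private final double[] edgeValues;

  ArrayEdgesVertex(long id, long[] targets, double[] edgeValues) {
    if (targets.length != edgeValues.length) {
      throw new IllegalArgumentException("targets and edgeValues differ in length");
    }
    this.id = id;
    this.targets = Arrays.copyOf(targets, targets.length);
    this.edgeValues = Arrays.copyOf(edgeValues, edgeValues.length);
  }

  @Override long getVertexId() { return id; }
  @Override int getNumOutEdges() { return targets.length; }
  @Override long getTargetId(int edgeIndex) { return targets[edgeIndex]; }
  @Override double getEdgeValue(int edgeIndex) { return edgeValues[edgeIndex]; }
}

/** Framework-side code written against the base class works for any subclass. */
final class FrameworkSketch {
  static double totalEdgeValue(BasicVertexSketch vertex) {
    double total = 0;
    for (int i = 0; i < vertex.getNumOutEdges(); i++) {
      total += vertex.getEdgeValue(i);
    }
    return total;
  }

  public static void main(String[] args) {
    BasicVertexSketch v = new ArrayEdgesVertex(1, new long[] {2, 3}, new double[] {0.5, 1.5});
    System.out.println(totalEdgeValue(v)); // prints 2.0
  }
}
```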







--
 Claudio Martella
 claudio.marte...@gmail.com





Review Request: GIRAPH-64 Create VertexRunner to make it easier to run users' computations

2011-11-02 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2691/
---

Review request for giraph and Jake Mannix.


Summary
---

This is filed on behalf of Jakob Homan.


This addresses bug GIRAPH-64.
https://issues.apache.org/jira/browse/GIRAPH-64


Diffs
-

  http://svn.apache.org/repos/asf/incubator/giraph/trunk/pom.xml 1196921 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
 1196921 

Diff: https://reviews.apache.org/r/2691/diff


Testing
---

Don't know.


Thanks,

Avery



Re: Review Request: GIRAPH-64 Create VertexRunner to make it easier to run users' computations

2011-11-02 Thread Avery Ching
Crap.  I may have forgotten to add some files.  Let me check when I get home
in an hour.  Sorry.

Sent from my iPhone

On Nov 2, 2011, at 7:43 PM, Jake Mannix  wrote:

  This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2691/

Does this have the whole diff?  Seems like it's missing stuff.  Only
has GraphMapper.java changes and pom.xml...


- Jake

On November 3rd, 2011, 2:29 a.m., Avery Ching wrote:
  Review request for giraph and Jake Mannix.
By Avery Ching.

*Updated 2011-11-03 02:29:54*
Description

This is filed on behalf of Jakob Homan.

  Testing

Don't know.

  *Bugs:* GIRAPH-64 <https://issues.apache.org/jira/browse/GIRAPH-64>
Diffs

   - http://svn.apache.org/repos/asf/incubator/giraph/trunk/pom.xml
   (1196921)
   - http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
   (1196921)

View Diff <https://reviews.apache.org/r/2691/diff/>


Review Request: GIRAPH-47 Export Worker's Context/State to vertices through pre/post/Application/Superstep

2011-11-07 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2746/
---

Review request for giraph.


Summary
---

Claudio's patch for GIRAPH-47.


This addresses bug GIRAPH-47.
https://issues.apache.org/jira/browse/GIRAPH-47
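The review title refers to per-worker hooks that run around the whole application and around each superstep. A minimal sketch of that lifecycle, assuming hook names that follow the title (preApplication, postApplication, preSuperstep, postSuperstep); the classes are hypothetical, and the driver loop only illustrates call order, not how Giraph actually schedules workers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-worker context with lifecycle hooks (no-ops by default). */
abstract class WorkerContextSketch {
  void preApplication() {}

  void postApplication() {}

  void preSuperstep() {}

  void postSuperstep() {}
}

/** Example context: counts vertices computed on this worker per superstep. */
final class CountingWorkerContext extends WorkerContextSketch {
  final List<Long> computedPerSuperstep = new ArrayList<>();
  private long computedThisSuperstep;

  @Override
  void preSuperstep() {
    computedThisSuperstep = 0;
  }

  /** Would be called from each vertex's compute() on this worker. */
  void vertexComputed() {
    computedThisSuperstep++;
  }

  @Override
  void postSuperstep() {
    computedPerSuperstep.add(computedThisSuperstep);
  }
}

/** Hypothetical driver showing only the order in which the hooks run. */
final class WorkerLoopSketch {
  static void run(CountingWorkerContext context, int supersteps, int localVertices) {
    context.preApplication();
    for (int step = 0; step < supersteps; step++) {
      context.preSuperstep();
      for (int v = 0; v < localVertices; v++) {
        context.vertexComputed(); // stands in for vertex.compute()
      }
      context.postSuperstep();
    }
    context.postApplication();
  }

  public static void main(String[] args) {
    CountingWorkerContext context = new CountingWorkerContext();
    run(context, 3, 5);
    System.out.println(context.computedPerSuperstep); // prints [5, 5, 5]
  }
}
```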


Diffs
-

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java
 1198865 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/RandomMessageBenchmark.java
 1198865 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedService.java
 1198865 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
 1198865 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleCheckpointVertex.java
 1198865 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleMutateGraphVertex.java
 1198865 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimplePageRankVertex.java
 1198865 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleVertexWithWorkerContext.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java
 1198865 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java
 1198865 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
 1198865 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspUtils.java
 1198865 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/DefaultWorkerContext.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
 1198865 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
 1198865 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java
 1198865 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/Vertex.java
 1198865 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/WorkerContext.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestBspBasic.java
 1198865 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestManualCheckpoint.java
 1198865 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java
 1198865 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestVertexRangeBalancer.java
 1198865 

Diff: https://reviews.apache.org/r/2746/diff


Testing
---

mvn install


Thanks,

Avery



Re: Review Request: GIRAPH-47 Export Worker's Context/State to vertices through pre/post/Application/Superstep

2011-11-07 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2746/#review3082
---


Claudio, really nice stuff here.  Most of my comments are related to indenting.
But otherwise, this is a lot better IMO.  Please take a look at
CODE_CONVENTIONS and fix accordingly.  While the official policy is 2-space
indentation, for now please keep the 4-space indented files at 4 spaces for
consistency.  We will transition everything over at some point.  New files can
use 2 spaces (the new convention) if desired.


http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleCheckpointVertex.java
<https://reviews.apache.org/r/2746/#comment6885>

This doesn't need to be static anymore.



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleCheckpointVertex.java
<https://reviews.apache.org/r/2746/#comment6870>

Indenting should be 8 spaces.



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleCheckpointVertex.java
<https://reviews.apache.org/r/2746/#comment6873>

extra line.



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleCheckpointVertex.java
<https://reviews.apache.org/r/2746/#comment6874>

extra line.



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleCheckpointVertex.java
<https://reviews.apache.org/r/2746/#comment6875>

4 spaces.



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleCheckpointVertex.java
<https://reviews.apache.org/r/2746/#comment6871>

4 spaces indenting.



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleCheckpointVertex.java
<https://reviews.apache.org/r/2746/#comment6872>

4 spaces indenting.



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleCheckpointVertex.java
<https://reviews.apache.org/r/2746/#comment6876>

Align to GiraphJob.WORKER_CONTEXT_CLASS



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleMutateGraphVertex.java
<https://reviews.apache.org/r/2746/#comment6877>

VERTEX_COUNT shouldn't be capitalized.  All caps should be reserved for 
only static values.



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleMutateGraphVertex.java
<https://reviews.apache.org/r/2746/#comment6878>

EDGE_COUNT shouldn't be capitalized.  All caps should be reserved for only 
static values.



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimplePageRankVertex.java
<https://reviews.apache.org/r/2746/#comment6887>

These no longer need to be static; they could be private variables with
public accessor methods.
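A small hypothetical example of the convention asked for in these comments: all caps only for static constants, and private camelCase instance state behind accessors (loosely modeled on the VERTEX_COUNT/EDGE_COUNT fields mentioned above), written in the new 2-space style. The class and field names are illustrative only.

```java
/** Hypothetical class illustrating the naming conventions from this review. */
final class GraphStatsSketch {
  /** All caps: reserved for static constants. */
  static final int DEFAULT_EDGES_PER_VERTEX = 0;

  /** Per-instance state: camelCase, private, exposed through accessors. */
  private long vertexCount;
  private long edgeCount;

  void addVertex(int numEdges) {
    vertexCount++;
    edgeCount += numEdges;
  }

  public long getVertexCount() {
    return vertexCount;
  }

  public long getEdgeCount() {
    return edgeCount;
  }

  public static void main(String[] args) {
    GraphStatsSketch stats = new GraphStatsSketch();
    stats.addVertex(3);
    stats.addVertex(DEFAULT_EDGES_PER_VERTEX);
    System.out.println(stats.getVertexCount() + " vertices, " + stats.getEdgeCount() + " edges");
  }
}
```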



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleVertexWithWorkerContext.java
<https://reviews.apache.org/r/2746/#comment6879>

Might want to add a comment about this example, e.g.

/**
 * Fully runnable example of how to 
 * emit worker data to HDFS during a graph
 * computation.
 */



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java
<https://reviews.apache.org/r/2746/#comment6880>

extra line.



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java
<https://reviews.apache.org/r/2746/#comment6881>

Awesome, I hated this.



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
<https://reviews.apache.org/r/2746/#comment6882>

indenting.



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/DefaultWorkerContext.java
<https://reviews.apache.org/r/2746/#comment6883>

extra line.



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/WorkerContext.java
<https://reviews.apache.org/r/2746/#comment6884>

Other javadoc has a blank line between the comment and the params, i.e.

* superstep starts.
*
* @throws IllegalAccessException


- Avery


On 2011-11-07 19:09:08, Avery Ching wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2746/
> ---
> 
> (Updated 2011-11-07 19:09:08)
> 

Re: Review Request: GIRAPH-47 Export Worker's Context/State to vertices through pre/post/Application/Superstep

2011-11-07 Thread Avery Ching


> On 2011-11-07 19:12:55, Avery Ching wrote:
> > Claudio, really nice stuff here.  Most of my comments are related to 
> > indenting.  But otherwise, this is a lot better IMO.  Please take a look at 
> > CODE_CONVENTIONS and fix accordingly.  While the official policy is 2 
> > space, at this time, for the 4 space indented files, please keep to 4 
> > spaces for consistency.  We will transition everything over at some point.  
> > New files can be 2 space (new convention) if desired.
> 
> Claudio Martella wrote:
> OK, I still have to understand the code conventions a bit; I'm trying to
> stick to them. Maybe an Eclipse formatter config file would help? Could you
> share yours, if you have one?

Mine is all messed up too.  I have to manually fix some things.


> On 2011-11-07 19:12:55, Avery Ching wrote:
> > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleCheckpointVertex.java,
> >  line 150
> > <https://reviews.apache.org/r/2746/diff/1/?file=56632#file56632line150>
> >
> > This doesn't need to be static anymore.
> 
> Claudio Martella wrote:
> Can't make it non-static, or the tests won't be able to read it.

Sorry I wasn't clearer; I was suggesting that we fix this.  But it's not
really related to this issue, so don't worry about it.  Please ignore my
comment.


> On 2011-11-07 19:12:55, Avery Ching wrote:
> > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimplePageRankVertex.java,
> >  lines 91-92
> > <https://reviews.apache.org/r/2746/diff/1/?file=56634#file56634line91>
> >
> > These no longer need to be static anymore, could be private variables 
> > that have public accessor method.
> 
> Claudio Martella wrote:
> Not sure we can do this. How will the tests get their values? We can't
> access those members if they're not static.

I was suggesting that we fix this, maybe by giving the user the worker context
at the end.  Actually, I'm not sure it's the right solution, and it's not really
related to this issue, so don't worry about it.  Please ignore my comment.


- Avery


-------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2746/#review3082
---



Re: FOSDEM

2011-11-07 Thread Avery Ching
I think it would be great if someone could talk about Giraph at FOSDEM.  
Thanks for volunteering, Claudio.  I'm not planning to be there 
unfortunately.


Avery

On 11/7/11 12:33 PM, Claudio Martella wrote:

Hello list,

I was thinking about submitting a talk at FOSDEM on Pregel
& Giraph. Am I overlapping with somebody else?


Best,
Claudio





Review Request: GIRAPH-11 : Improve the graph distribution of Giraph

2011-11-09 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2788/
---

Review request for giraph.


Summary
---

Warning: This is a very large change!

Vertex ranges no longer exist.  A generic partitioner handles the
division of vertex ids into partitions.  By default, there is a
HashPartitioner and a HashRangePartitioner that use the hashCode
of a Java object to decide in which partition to place the vertex.
Developers can write their own algorithm to determine how to change
the partitioning as well as implement the assignment of partitions to
workers.  All vertices loaded from the input split are sent to the
owner of the partition rather than loaded locally.  This eliminates the
constraint that the vertices must be ordered in the input split.
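A minimal sketch of the two default schemes and of routing a loaded vertex to its owner, assuming that hash partitioning means a non-negative remainder of hashCode() and that hash-range partitioning means splitting the 32-bit hash space into equal contiguous ranges. The round-robin partition-to-worker assignment is a hypothetical placeholder for the pluggable assignment described above; none of this is the actual HashPartitioner or HashRangePartitioner code.

```java
/** Hypothetical partitioning helpers; not Giraph's actual partitioner classes. */
final class PartitionSketch {
  /** Hash partitioning: non-negative remainder of the id's hashCode. */
  static int hashPartition(Object vertexId, int numPartitions) {
    return Math.floorMod(vertexId.hashCode(), numPartitions);
  }

  /** Hash-range partitioning: equal contiguous slices of the 32-bit hash space. */
  static int hashRangePartition(Object vertexId, int numPartitions) {
    long offset = (long) vertexId.hashCode() - Integer.MIN_VALUE; // 0 .. 2^32 - 1
    long rangeSize = ((1L << 32) + numPartitions - 1) / numPartitions; // ceiling division
    return (int) (offset / rangeSize);
  }

  /** Hypothetical assignment of partitions to workers: round robin. */
  static int ownerWorker(int partition, int numWorkers) {
    return partition % numWorkers;
  }

  public static void main(String[] args) {
    int numPartitions = 8;
    int numWorkers = 3;
    for (long id = 0; id < 4; id++) {
      int p = hashPartition(id, numPartitions);
      // A vertex read from any input split is sent to this worker, not kept locally.
      System.out.println("vertex " + id + " -> partition " + p + " -> worker " + ownerWorker(p, numWorkers));
    }
  }
}
```

One design difference: with hash-range partitioning, ids whose hash codes are close (for small Integer or Long ids, ids that are numerically close) land in the same partition, whereas plain hash partitioning spreads them across partitions.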

The checkpoint format has been changed to suit the new partition
style.  Checkpoints are now a lot simpler.  The master will assign
partitions and the workers will only load their own partitions from
the checkpoint.

Unfortunately, the vertex range implementation was baked into almost
every aspect of the code (hence the ridiculous size of this diff).
But now it should be flexible enough to support several different graph
partitioning schemes (e.g. hash-based, hash-range-based, and, for
special cases, fully range-based).

Sorry for the long delay, but this was pretty involved.


This addresses bug GIRAPH-11.
https://issues.apache.org/jira/browse/GIRAPH-11


Diffs
-

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
 1199643 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1196639 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java
 1199643 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RPCCommunications.java
 1199643 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerInterface.java
 1199643 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerCommunications.java
 1196639 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/GeneratedVertexInputFormat.java
 1199643 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/GeneratedVertexReader.java
 1196639 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/MaxAggregator.java
 1199643 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/MinAggregator.java
 1199643 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleMutateGraphVertex.java
 1199643 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleSuperstepVertex.java
 1196639 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SuperstepBalancer.java
 1199643 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SuperstepHashPartitioner.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/VerifyMessage.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/AutoBalancer.java
 1199643 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java
 1199643 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertexRangeBalancer.java
 1199643 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java
 1199643 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceMaster.java
 1196639 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
 1198972 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspUtils.java
 1199643 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
 1199643 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GlobalStats.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
 1198972 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphState.java
 1199643 
  
http://svn.apache.org/repos/asf/in

Re: Newbie wanting to get involved with Giraph development

2011-11-10 Thread Avery Ching
Claudio gave great advice.  The only thing I'd like to add is that you 
also might want to consider using reviewboard for throwing up patches 
(https://reviews.apache.org/) as it helps to get line-by-line comments 
and suggestions.  Definitely email the mailing lists if you have 
questions and have fun!


Avery

On 11/10/11 9:40 AM, Claudio Martella wrote:

Hi,

I don't know if there's anything such as an official Development
Process, I can share how I usually do when I contribute to ASF
projects.

(1) Considering there's already an open issue (a ticket in the JIRA),
I'd download from SVN the version to which the issue applies, I'd
write the fix, test it through the unit tests that come with the project
(and fix them if necessary), and create the patch (svn diff >
ISSUE.diff is the way I do it), which I then attach to the issue.

(2) If there's no open issue yet about a bug or a missing feature and
it's trivial (such as the one you've chosen), I'd write the patch as in
(1) and open the issue describing the problem, attaching the fix. If
it's not trivial, I'd open the issue and wait for some contributions to
the discussion. Learn from the other issues you find on the JIRA.

Make sure to follow CODE_CONVENTIONS file ( ;-) ).

I'm expecting the experts to give more insights about the process.

Hope this helps,
Claudio

On Thu, Nov 10, 2011 at 5:56 PM, Shaunak Kashyap  wrote:

Hi,

I'm a newbie to Giraph and ASF projects in general. I would like to
help with Giraph development and think I've found the perfect JIRA to
start: https://issues.apache.org/jira/browse/GIRAPH-63.

Before I make any code changes, however, I'd like to know more about
the development process of this project. What is a good place to start
learning about this?

Thank you,

Shaunak

--
"Now the hardness of this world slowly grinds your dreams away /
Makin' a fool's joke out of the promises we make" --- Bruce
Springsteen, "Blood Brothers"








Re: better way to update site?

2011-11-11 Thread Avery Ching

+1.  Having the pre-generated files in svn is good enough for me.

Avery

On 11/11/11 1:36 PM, Jakob Homan wrote:

As Avery documented in GIRAPH-36
(https://issues.apache.org/jira/browse/GIRAPH-35?focusedCommentId=13107195&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13107195)
it's pretty painful to update the site currently.  This is due to the
fact that we're keeping the site within the src tree (as dictated by
mvn), so we can't just check it out, as other, ant/forrest projects
do.  The other project I found that does this is hbase, which avoids
the whole mess by not keeping their site (ie the contents of
people.apache.org/www/incubator/giraph) in svn, but rather generating
the site locally and then copying straight to that directory.

I think this may be a better approach since it avoids the huge churn
of rm-ing and re-creating the whole site structure each time.

In this scheme, once the site is updated, run mvn site:site to
generate its contents, verify its correctness, then scp it to
people.apache.org and replace the current directory (or rsync it and
be done).  We'll still have all the history of the site, etc., just
none of the hassle.

What do people think?




Re: Review Request: GIRAPH-11 : Improve the graph distribution of Giraph

2011-11-13 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2788/
---

(Updated 2011-11-14 06:56:19.251685)


Review request for giraph.


Changes
---

Updated the diff as per Hyunsik's request to build against recent trunk 
changes.  While I was waiting I added some fixes and additions as well.

Upgrade ZooKeeper to 3.3.3 from 3.3.1.

Fixed bug in PseudoRandomVertexInputFormat.java where the edges are not fully 
added (hasEdge is not the right place to look for the edge).

Fixed bug in BasicRPCCommunications when putting to a local inPartitionMap

Added counter for last checkpointed superstep

Master should refresh the progress every 60 seconds while waiting for workers 
to ensure that the job isn't killed

Fixed bugs in vertexCounter, finishedVertexCounter, edgeCounter, and 
sentMessages counter not resetting every update (they were just being 
added to cumulatively).

Added additional helpful status messages for debugging.

Turned off speculative execution for Giraph (running speculative duplicate tasks is not a good idea for Giraph).

Added analysis of the partition balancing for debugging
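The 60-second progress refresh mentioned above amounts to a timed wait loop. A minimal sketch follows, assuming a CountDownLatch stands in for whatever event signals that all workers are done and a Runnable stands in for Hadoop's context.progress(); the class and method names are hypothetical, not Giraph's.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch: wait for workers, but wake up periodically to report
 * progress so the framework does not consider the task hung.
 */
final class ProgressingWait {
  private ProgressingWait() {}

  /**
   * Blocks until workersDone reaches zero, calling progress.run() each time
   * intervalMillis elapses without completion. Returns the number of
   * progress calls made.
   */
  static int awaitWithProgress(CountDownLatch workersDone, long intervalMillis,
      Runnable progress) throws InterruptedException {
    int progressCalls = 0;
    // await(timeout, unit) returns false if the timeout elapsed first
    while (!workersDone.await(intervalMillis, TimeUnit.MILLISECONDS)) {
      progress.run();  // stand-in for context.progress()
      progressCalls++;
    }
    return progressCalls;
  }
}
```

Passing 60_000 as intervalMillis would correspond to the 60-second refresh described in the change list.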


Summary
---

Warning: This is a very large change!

Vertex ranges no longer exist.  A generic partitioner handles the
division of vertex ids to partitions.  As a default, there is a
HashPartitioner and a HashRangePartitioner that will use the hashCode
of a Java object to decide which partition to place the vertex.
Developers can write their own algorithm to determine how to change
the partitioning as well as implement the assignment of partitions to
workers.  All vertices loaded from the input split are sent to the
owner of the partition rather than loaded locally.  This eliminates the
constraint that the vertices must be ordered in the input split.
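The difference between the HashPartitioner and HashRangePartitioner styles described above can be illustrated with a small sketch. This is not Giraph's code: the class and method names are hypothetical, and the exact formula each Giraph partitioner uses is an assumption. Plain hashing scatters ids by hashCode() modulo the partition count, while hash-range partitioning cuts the 32-bit hash space into contiguous slices.

```java
/**
 * Sketch of two ways to map a vertex id to a partition using only the id's
 * hashCode(). Illustrative only; not Giraph's partitioner source.
 */
final class HashPartitioning {
  private HashPartitioning() {}

  /** Hash partitioning: spread ids by hashCode modulo the partition count. */
  static int hashPartition(Object vertexId, int numPartitions) {
    // floorMod keeps the result in [0, numPartitions) even for negative hashes
    return Math.floorMod(vertexId.hashCode(), numPartitions);
  }

  /**
   * Hash-range partitioning: split the 32-bit hash space into numPartitions
   * contiguous ranges, so ids with nearby hash values share a partition.
   */
  static int hashRangePartition(Object vertexId, int numPartitions) {
    // Shift the signed hash into [0, 2^32) while preserving its ordering
    long shifted = (long) vertexId.hashCode() - Integer.MIN_VALUE;
    // Scale into [0, numPartitions); the product fits in a long
    return (int) ((shifted * numPartitions) >>> 32);
  }
}
```

Either way, the owner of a vertex depends only on its id, which is why vertices can be read from any input split and forwarded to the owning worker.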

The checkpoint format has been changed to suit the new partition
style.  Checkpoints are now a lot simpler.  The master will assign
partitions and the workers will only load their own partitions from
the checkpoint.

Unfortunately, the vertex range implementation was baked into almost
every aspect of the code (hence the ridiculous size of this diff).
But now it should be flexible to support several different graph
partitioning schemes (i.e. hash-based, hash-ranged-based, and for
special cases, fully ranged-based).

Sorry for the long delay, but this was pretty involved.


This addresses bug GIRAPH-11.
https://issues.apache.org/jira/browse/GIRAPH-11


Diffs (updated)
-

  http://svn.apache.org/repos/asf/incubator/giraph/trunk/pom.xml 1201607 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PseudoRandomVertexInputFormat.java
 1201607 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
 1201607 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1201607 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java
 1201607 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RPCCommunications.java
 1201607 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerInterface.java
 1201607 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerCommunications.java
 1201607 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/GeneratedVertexInputFormat.java
 1201607 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/GeneratedVertexReader.java
 1201607 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/MaxAggregator.java
 1201607 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/MinAggregator.java
 1201607 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleMutateGraphVertex.java
 1201607 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleSuperstepVertex.java
 1201607 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SuperstepBalancer.java
 1201607 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SuperstepHashPartitioner.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/VerifyMessage.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/AutoBalancer.java
 1201607 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVerte

Re: Review Request: GIRAPH-11 : Improve the graph distribution of Giraph

2011-11-14 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2788/
---

(Updated 2011-11-14 22:24:27.293676)


Review request for giraph.


Changes
---

Made changes based on Jakob's review.


Summary
---

Warning: This is a very large change!

Vertex ranges no longer exist.  A generic partitioner handles the
division of vertex ids to partitions.  As a default, there is a
HashPartitioner and a HashRangePartitioner that will use the hashCode
of a Java object to decide which partition to place the vertex.
Developers can write their own algorithm to determine how to change
the partitioning as well as implement the assignment of partitions to
workers.  All vertices loaded from the input split are sent to the
owner of the partition rather than loaded locally.  This eliminates the
constraint that the vertices must be ordered in the input split.

The checkpoint format has been changed to suit the new partition
style.  Checkpoints are now a lot simpler.  The master will assign
partitions and the workers will only load their own partitions from
the checkpoint.

Unfortunately, the vertex range implementation was baked into almost
every aspect of the code (hence the ridiculous size of this diff).
But now it should be flexible to support several different graph
partitioning schemes (i.e. hash-based, hash-ranged-based, and for
special cases, fully ranged-based).

Sorry for the long delay, but this was pretty involved.


This addresses bug GIRAPH-11.
https://issues.apache.org/jira/browse/GIRAPH-11


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStats.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionUtils.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionBalancer.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionExchange.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionOwner.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashRangeWorkerPartitioner.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashPartitionerFactory.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashRangePartitionerFactory.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/GraphPartitionerFactory.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashMasterPartitioner.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexRangeBalancer.java
 1201630 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/WorkerInfo.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/BasicPartitionOwner.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/StaticBalancer.java
 1201630 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/Vertex.java
 1201630 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexEdgeCount.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexRange.java
 1201630 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspUtils.java
 1201630 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
 1201630 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GlobalStats.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
 1201630 
  
http://svn.a

Re: Review Request: GIRAPH-11 : Improve the graph distribution of Giraph

2011-11-14 Thread Avery Ching
 line 76
> > <https://reviews.apache.org/r/2788/diff/2/?file=57811#file57811line76>
> >
> > typo: dependant -> dependent

Changed.


> On 2011-11-14 20:54:28, Jakob Homan wrote:
> > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionBalancer.java,
> >  line 123
> > <https://reviews.apache.org/r/2788/diff/2/?file=57814#file57814line123>
> >
> > rename: value -> totalValue, to be consistent with usage.

Changed.


> On 2011-11-14 20:54:28, Jakob Homan wrote:
> > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangeWorkerPartitioner.java,
> >  line 117
> > <https://reviews.apache.org/r/2788/diff/2/?file=57824#file57824line117>
> >
> > I'm unclear on this.

RangePartitionerFactory is unfortunately abstract; it needs implementations 
for various index types.  A developer can use RangeWorkerPartitioner as a 
starting point for their particular implementation.  This is somewhat 
experimental work, but the idea is that it will allow very advanced users 
to customize partitioning based on a range for their particular index type.  
I am making this class abstract with a prominent notice on what needs to be 
done if you want to use it.
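Range partitioning needs an ordering on the vertex id type, which is why it cannot be made fully generic. The sketch below assumes ids implement Comparable and stores each partition's lower bound in a TreeMap, using floorEntry() to find the owning range. RangeLookup and its methods are hypothetical illustrations, not the RangeWorkerPartitioner API.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of range partitioning over an ordered id type. Each
 * partition owns ids from its lower bound (inclusive) up to the next
 * partition's lower bound (exclusive).
 */
final class RangeLookup<I extends Comparable<I>> {
  // Lower bound of each range -> partition id
  private final TreeMap<I, Integer> lowerBounds = new TreeMap<>();

  void addRange(I lowerBound, int partitionId) {
    lowerBounds.put(lowerBound, partitionId);
  }

  /** Returns the partition containing id, or -1 if id is below every range. */
  int partitionFor(I id) {
    // floorEntry finds the greatest lower bound <= id
    Map.Entry<I, Integer> entry = lowerBounds.floorEntry(id);
    return entry == null ? -1 : entry.getValue();
  }
}
```

Choosing good boundaries (for example from sampled ids) is the index-type-specific part an implementer would still have to supply.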


- Avery


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2788/#review3211
---


On 2011-11-14 22:24:27, Avery Ching wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2788/
> ---
> 
> (Updated 2011-11-14 22:24:27)
> 
> 
> Review request for giraph.
> 
> 
> Summary
> ---
> 
> Warning: This is a very large change!
> 
> Vertex ranges no longer exist.  A generic partitioner handles the
> division of vertex ids to partitions.  As a default, there is a
> HashPartitioner and a HashRangePartitioner that will use the hashCode
> of a Java object to decide which partition to place the vertex.
> Developers can write their own algorithm to determine how to change
> the partitioning as well as implement the assignment of partitions to
> workers.  All vertices loaded from the input split are sent to the
> owner of the partition rather than loaded locally.  This eliminates the
> constraint that the vertices must be ordered in the input split.
> 
> The checkpoint format has been changed to suit the new partition
> style.  Checkpoints are now a lot simpler.  The master will assign
> partitions and the workers will only load their own partitions from
> the checkpoint.
> 
> Unfortunately, the vertex range implementation was baked into almost
> every aspect of the code (hence the ridiculous size of this diff).
> But now it should be flexible to support several different graph
> partitioning schemes (i.e. hash-based, hash-ranged-based, and for
> special cases, fully ranged-based).
> 
> Sorry for the long delay, but this was pretty involved.
> 
> 
> This addresses bug GIRAPH-11.
> https://issues.apache.org/jira/browse/GIRAPH-11
> 
> 
> Diffs
> -
> 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStats.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionUtils.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionBalancer.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionExchange.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionOwner.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashRangeWorkerPartitioner.java
>  PRE-CREATION 
>   
> http://svn.apache.org/r

Re: Did GIRAPH-11 break vertex reactivation?

2011-11-15 Thread Avery Ching

Yes, I think I broke it.  Sorry.  Let me get you a diff to test quickly.

Avery

On 11/15/11 12:42 PM, Sebastian Schelter wrote:

Hi,

I updated to the latest trunk (after the GIRAPH-11 commit) and wanted to
continue to work on GIRAPH-51 where I use a small toy graph to test
SimpleShortestPathVertex.

Unfortunately my code did not work anymore and I guess I tracked it down
to the fact that vertices that voted to halt are not reactivated anymore when
new messages arrive.

In SimpleShortestPathVertex every vertex always votes to halt and only
gets reactivated when a shorter path to it has been found. However my
test run always finished after superstep 0.

I don't know too much about Giraph's internals yet, but my guess is that
the number of sent messages is not tracked correctly anymore. Therefore
Giraph finishes the algorithm (as all vertices voted to halt) although
there should still be messages in the pipeline.

I think I tracked it down to this behavior:

GraphMapper declares a variable workerSentMessages = 0 and never
increases it. This variable is given to
BspServiceWorker.finishSuperstep() which writes it to zookeeper and uses
it to compute the GlobalStats afterwards, which are used to decide
whether a new superstep has to be scheduled. As it has never been
increased, the algorithm will always stop when all vertices voted to halt.
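The stopping rule being described can be summarized in a few lines. This is a hypothetical sketch (HaltCheck is not Giraph's GlobalStats API): the job may end only when every vertex has voted to halt and no worker sent a message, so if every worker reports zero sent messages the rule degenerates to "all vertices halted".

```java
import java.util.Arrays;

/**
 * Sketch of the BSP stopping rule: stop only when every vertex has voted to
 * halt AND no worker sent any message in the superstep. Names are
 * hypothetical.
 */
final class HaltCheck {
  private HaltCheck() {}

  static boolean shouldHalt(long[] finishedPerWorker, long[] verticesPerWorker,
      long[] sentMessagesPerWorker) {
    long finished = Arrays.stream(finishedPerWorker).sum();
    long total = Arrays.stream(verticesPerWorker).sum();
    // If workers always report 0 here, pending messages are ignored and
    // halted vertices are never reactivated.
    long sent = Arrays.stream(sentMessagesPerWorker).sum();
    return finished == total && sent == 0;
  }
}
```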

It would be great if someone could confirm/disprove this speculation and
help me to continue work on GIRAPH-51

--sebastian




Re: Did GIRAPH-11 break vertex reactivation?

2011-11-15 Thread Avery Ching

This should fix it.  It passed local unittests.  Let me know.

Avery

On 11/15/11 1:03 PM, Avery Ching wrote:

Yes, I think I broke it.  Sorry.  Let me get you a diff to test quickly.

Avery

On 11/15/11 12:42 PM, Sebastian Schelter wrote:

Hi,

I updated to the latest trunk (after the GIRAPH-11 commit) and wanted to
continue to work on GIRAPH-51 where I use a small toy graph to test
SimpleShortestPathVertex.

Unfortunately my code did not work anymore and I guess I tracked it down
to the fact that vertices that voted to halt are not reactivated anymore when
new messages arrive.

In SimpleShortestPathVertex every vertex always votes to halt and only
gets reactivated when a shorter path to it has been found. However my
test run always finished after superstep 0.

I don't know too much about Giraph's internals yet, but my guess is that
the number of sent messages is not tracked correctly anymore. Therefore
Giraph finishes the algorithm (as all vertices voted to halt) although
there should still be messages in the pipeline.

I think I tracked it down to this behavior:

GraphMapper declares a variable workerSentMessages = 0 and never
increases it. This variable is given to
BspServiceWorker.finishSuperstep() which writes it to zookeeper and uses
it to compute the GlobalStats afterwards, which are used to decide
whether a new superstep has to be scheduled. As it has never been
increased, the algorithm will always stop when all vertices voted to 
halt.


It would be great if someone could confirm/disprove this speculation and
help me to continue work on GIRAPH-51

--sebastian




Index: src/main/java/org/apache/giraph/graph/BspServiceWorker.java
===
--- src/main/java/org/apache/giraph/graph/BspServiceWorker.java (revision 1202424)
+++ src/main/java/org/apache/giraph/graph/BspServiceWorker.java (working copy)
@@ -548,7 +548,7 @@
 workerGraphPartitioner.finalizePartitionStats(
 partitionStatsList, workerPartitionMap);
 
-finishSuperstep(partitionStatsList, 0);
+finishSuperstep(partitionStatsList);
 }
 
 /**
@@ -773,8 +773,7 @@
 }
 
 @Override
-public boolean finishSuperstep(List<PartitionStats> partitionStatsList,
-   long workersSentMessages) {
+public boolean finishSuperstep(List<PartitionStats> partitionStatsList) {
 // This barrier blocks until success (or the master signals it to
 // restart).
 //
@@ -785,8 +784,9 @@
 // of this worker
 // 3. Let the master know it is finished.
 // 4. Then it waits for the master to say whether to stop or not.
+long workerSentMessages = 0;
 try {
-commService.flush(getContext());
+workerSentMessages = commService.flush(getContext());
 } catch (IOException e) {
 throw new IllegalStateException(
 "finishSuperstep: flush failed", e);
@@ -807,7 +807,7 @@
 workerFinishedInfoObj.put(JSONOBJ_PARTITION_STATS_KEY,
   Base64.encodeBytes(partitionStatsBytes));
 workerFinishedInfoObj.put(JSONOBJ_NUM_MESSAGES_KEY,
-  workersSentMessages);
+  workerSentMessages);
 } catch (JSONException e) {
 throw new RuntimeException(e);
 }
Index: src/main/java/org/apache/giraph/graph/GraphMapper.java
===
--- src/main/java/org/apache/giraph/graph/GraphMapper.java  (revision 1202424)
+++ src/main/java/org/apache/giraph/graph/GraphMapper.java  (working copy)
@@ -512,7 +512,6 @@
 
 List<PartitionStats> partitionStatsList =
 new ArrayList<PartitionStats>();
-long workerSentMessages = 0;
 do {
 long superstep = serviceWorker.getSuperstep();
 
@@ -556,7 +555,6 @@
 context.progress();
 
 partitionStatsList.clear();
-workerSentMessages = 0;
 for (Partition partition :
 serviceWorker.getPartitionMap().values()) {
 PartitionStats partitionStats =
@@ -593,8 +591,7 @@
  " maxMem=" + Runtime.getRuntime().maxMemory() +
  " freeMem=" + Runtime.getRuntime().freeMemory());
 }
-} while (!serviceWorker.finishSuperstep(partitionStatsList,
-workerSentMessages));
+} while (!serviceWorker.finishSuperstep(partitionStatsList));
 if (LOG.isInfoEnabled()) {
 LOG.info("map: BSP application done " +
  "(global vertices marked done)");
Index: src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
===
--- src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java  

Re: Review Request: GIRAPH-89: Remove debugging system.out from LongDoubleFloatDoubleVertex

2011-11-16 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2859/#review3304
---

Ship it!


- Avery


On 2011-11-16 20:20:09, shaunak wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2859/
> ---
> 
> (Updated 2011-11-16 20:20:09)
> 
> 
> Review request for giraph.
> 
> 
> Summary
> ---
> 
> Removing System.out debugging statement.
> 
> 
> This addresses bug GIRAPH-89.
> https://issues.apache.org/jira/browse/GIRAPH-89
> 
> 
> Diffs
> -
> 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java
>  1202868 
> 
> Diff: https://reviews.apache.org/r/2859/diff
> 
> 
> Testing
> ---
> 
> $ mvn test
> 
> 
> Thanks,
> 
> shaunak
> 
>



Review Request: GIRAPH-91 - Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)

2011-11-16 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2868/
---

Review request for giraph.


Summary
---

These general changes should support larger heap sizes (i.e. >20 GB).

- Added new EdgeListVertex that stores its edges in a compact pair of lists 
instead of Vertex's HashMap.

- Added unit test TestEdgeListVertex to test EdgeListVertex.

- Augmented PageRankBenchmark to choose between EdgeListVertex and Vertex 
(to try it out).

- Added failure cleanup for failed workers to quickly alert the master that 
they are dead by deleting their health ephemeral znode.  This allows us to set 
higher ZooKeeper timeouts to deal with GC pauses and the like.  In a quick test 
of 3 nodes, I saw the failure detected in 43 seconds instead of 1m 52s.

- Added a context.progress() call during flushing so that jobs are not killed 
by the task timeout during long flushes (GC or lots of messages).
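The memory saving behind EdgeListVertex comes from replacing a HashMap (one entry object per edge plus the bucket table) with two parallel array-backed lists. A minimal sketch of that layout follows; EdgeLists and its methods are hypothetical and this is not the actual EdgeListVertex code.

```java
import java.util.ArrayList;

/**
 * Sketch of storing a vertex's out-edges as two parallel lists (target ids
 * and edge values) instead of a HashMap. Illustrative only.
 */
final class EdgeLists<I, E> {
  private final ArrayList<I> targets = new ArrayList<>();
  private final ArrayList<E> values = new ArrayList<>();

  /** Appends an edge; index i of both lists describes the same edge. */
  void addEdge(I target, E value) {
    targets.add(target);
    values.add(value);
  }

  /** Linear scan: trades O(1) hash lookup for no per-edge map entries. */
  E getEdgeValue(I target) {
    int i = targets.indexOf(target);
    return i < 0 ? null : values.get(i);
  }

  int numEdges() {
    return targets.size();
  }

  /** Drops spare backing-array capacity once all edges are loaded. */
  void compact() {
    targets.trimToSize();
    values.trimToSize();
  }
}
```

The trade-off is that looking up a single edge becomes a linear scan, which matters little for computations such as PageRank that iterate over all edges anyway.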


This addresses bug GIRAPH-91.
https://issues.apache.org/jira/browse/GIRAPH-91


Diffs
-

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java
 1202898 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
 1202898 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1202898 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
 1202898 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/EdgeListVertex.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
 1202898 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
 1202898 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestJsonBase64Format.java
 1202898 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/graph/TestEdgeListVertex.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/2868/diff


Testing
---

Local unittests, PageRankBenchmark on multiple machines with >20GB heaps.


Thanks,

Avery



Re: Apache Giraph talk @ FOSDEM

2011-11-21 Thread Avery Ching
Thanks for volunteering, Claudio!  It's very nice that Apache Giraph was 
mentioned in the invite, even though it's a relatively new open-source 
project.  I'll read your draft and send you feedback privately sometime 
today.  Also, if you need slides, please feel free to use anything I've 
ever posted (http://www.slideshare.net/averyching).  I think you can 
download them natively.  Also, if you need feedback for your slides, let 
me know.


Avery

On 11/21/11 12:43 PM, Claudio Martella wrote:

Hi devs,

FOSDEM has announced a devroom completely dedicated to Graph Processing:

https://lists.fosdem.org/pipermail/fosdem/2011-November/001344.html

I'm going to submit for a talk there. Here's the draft, feedback is welcome :)

Title: "Apache Giraph: distributed graph processing in the cloud."

Abstract: Web and online social graphs have been rapidly growing in
size and scale during the past decade. In 2008, Google estimated that
the number of web pages reached over a trillion. Online social
networking and email sites, including Yahoo!, Google, Microsoft,
Facebook, LinkedIn, and Twitter, have hundreds of millions of users
and are expected to grow much more in the future. Processing these
graphs plays a big role in delivering relevant and personalized information to
social networking site.

The Apache Giraph (http://incubator.apache.org/giraph) project is a
fault-tolerant in-memory distributed graph processing system which runs
on top of a standard Hadoop cluster and is capable of running any
standard Bulk Synchronous Parallel (BSP) operation over any large
generic data set which can be represented as a graph. Apache Giraph is
a loose implementation of Google Pregel.
Giraph entered the ASF Incubator in July 2011, where it has enlisted
the aid of committers from Yahoo!, Facebook, LinkedIn, and Twitter.

The talk will present why running MapReduce jobs for graph processing
can be a problem, introducing the reasons why Google designed Pregel
in the first place. Later, the BSP model will be presented focusing on how
it can be used to implement a distributed graph processing engine.
The last part of the talk will be dedicated to Apache Giraph, with a
description of the programming model (i.e. the API, some typical
examples such as PageRank and Single Source Shortest Path) along with
a technical overview of how the architecture of Giraph works and how
it leverages the Hadoop infrastructure.


Best,
Claudio





Review Request: GIRAPH-100 - Data input sampling and testing improvements

2011-11-29 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2959/
---

Review request for giraph.


Summary
---

Got rid of ZooKeeper message for node created on the input split reservation.

Adding some features for debugging:
- Taking only a % of the input splits
- Taking a maximum number of vertices in an input split
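The two debugging knobs above can be sketched as simple filters. The names are hypothetical, and the policies are assumptions rather than Giraph's behavior: the sketch keeps the first ceil(percent% of n) splits, and treats a negative vertex limit as "unlimited".

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of input sampling for debugging runs. */
final class InputSampling {
  private InputSampling() {}

  /** Keeps the first ceil(percent / 100 * n) splits; percent is clamped to [0, 100]. */
  static <T> List<T> sampleSplits(List<T> splits, float percent) {
    float clamped = Math.min(100f, Math.max(0f, percent));
    int keep = (int) Math.ceil(splits.size() * clamped / 100.0);
    return new ArrayList<>(splits.subList(0, Math.min(keep, splits.size())));
  }

  /** Reads at most maxVertices items from one split; a negative limit means no limit. */
  static <T> List<T> readAtMost(Iterator<T> reader, long maxVertices) {
    List<T> out = new ArrayList<>();
    while (reader.hasNext() && (maxVertices < 0 || out.size() < maxVertices)) {
      out.add(reader.next());
    }
    return out;
  }
}
```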

Added a master status update for the number of workers that have responded.

Workers will output the percentage of input splits that have been completed.

Fixed a bug where a forced flush of cached vertices in the input split was 
happening per input split rather than at the end of processing all input 
splits.  This requires an additional barrier after processing all the input 
splits to allow for the final flush of the cached vertices.
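The flushing fix can be pictured as a per-destination cache that sends a batch whenever it fills up and is force-flushed exactly once, after every input split has been processed, rather than at the end of each split. This is a hypothetical sketch (VertexSendCache and the BiConsumer sender are illustrative, not Giraph classes).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical sketch of batching loaded vertices by destination worker. */
final class VertexSendCache<V> {
  private final int maxCachedPerWorker;
  private final BiConsumer<Integer, List<V>> sender;
  private final Map<Integer, List<V>> cache = new HashMap<>();

  VertexSendCache(int maxCachedPerWorker, BiConsumer<Integer, List<V>> sender) {
    this.maxCachedPerWorker = maxCachedPerWorker;
    this.sender = sender;
  }

  /** Caches a vertex; sends that worker's batch once it is full. */
  void add(int workerId, V vertex) {
    List<V> batch = cache.computeIfAbsent(workerId, k -> new ArrayList<>());
    batch.add(vertex);
    if (batch.size() >= maxCachedPerWorker) {
      sender.accept(workerId, new ArrayList<>(batch));
      batch.clear();
    }
  }

  /** Forced flush: call once after all input splits, before the barrier. */
  void flushAll() {
    for (Map.Entry<Integer, List<V>> e : cache.entrySet()) {
      if (!e.getValue().isEmpty()) {
        sender.accept(e.getKey(), new ArrayList<>(e.getValue()));
        e.getValue().clear();
      }
    }
  }
}
```

Flushing per split would send many small, partially filled batches; deferring the forced flush to the end keeps batches full, at the cost of the extra barrier mentioned above.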

Factored out barrierOnWorkerList to reuse the barrier coordination code in the 
master.

Factored out markInputSplitPathFinished to make the code a bit cleaner.

Clearing out the transientInMessages and inMessages maps to reduce processing 
time.

Changed the default partition count multiplier to produce n^2 partitions rather 
than 0.5xn^2 for better balancing when the maximum limit is not exceeded.
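The arithmetic behind that default can be sketched as follows, assuming n is the number of workers and that the count is capped by a configured maximum; both the names and the exact rounding are assumptions, not Giraph's code. With 10 workers, a multiplier of 1.0 gives 100 partitions where 0.5 gave 50, and with 50 workers the 2500 requested would be capped at a maximum of 1000.

```java
/** Hypothetical sketch of a default partition count of multiplier * n^2, capped. */
final class PartitionCount {
  private PartitionCount() {}

  static int defaultPartitionCount(int workers, float multiplier, int maxPartitions) {
    // multiplier * n^2, rounded up; computed in double to avoid int overflow
    long wanted = (long) Math.ceil(multiplier * (double) workers * workers);
    // Never exceed the maximum, and always produce at least one partition
    return (int) Math.max(1L, Math.min(wanted, (long) maxPartitions));
  }
}
```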

Changed SimpleCheckpointVertex to throw an Exception instead of System.exit(-1) 
for a faster failure (seconds instead of minutes).

Moved SuperstepHashPartitionerFactory to the examples directory.  If it is not 
there, the test against a real Hadoop instance will fail with a 
ClassNotFoundException.


This addresses bug GIRAPH-100.
https://issues.apache.org/jira/browse/GIRAPH-100


Diffs
-

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/lib/TextVertexInputFormat.java
 1207804 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashMasterPartitioner.java
 1207804 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/lib/IdWithValueTextOutputFormat.java
 1207804 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
 1207804 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
 1207804 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java
 1207804 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceMaster.java
 1207804 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SuperstepHashPartitionerFactory.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceMaster.java
 1207804 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1207804 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleCheckpointVertex.java
 1207804 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestGraphPartitioner.java
 1207804 

Diff: https://reviews.apache.org/r/2959/diff


Testing
---

Passed local and Hadoop instance unittests.  Ran PageRankBenchmark on a real 
Hadoop cluster.


Thanks,

Avery



Re: Review Request: GIRAPH-100 - Data input sampling and testing improvements

2011-12-01 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2959/
---

(Updated 2011-12-02 02:55:14.025295)


Review request for giraph.


Changes
---

Moved examples/SuperstepHashPartitionerFactory.java to 
integration/SuperstepHashPartitionerFactory.java.

Added a few context.progress() to the communication cycle to avoid task 
timeouts.


Summary
---

Got rid of ZooKeeper message for node created on the input split reservation.

Added some features for debugging:
- Take only a percentage of the input splits
- Take at most a maximum number of vertices from each input split

Added a master status update for the number of workers that have responded.

Workers will output the percentage of input splits that have been completed.

Fixed a bug where a forced flush of cached vertices in the input split was 
happening per input split rather than at the end of processing all input 
splits.  This requires an additional barrier after processing all the input 
splits to allow for the final flush of the cached vertices.

Factored out barrierOnWorkerList to reuse the barrier coordination code in the 
master.

Factored out markInputSplitPathFinished to make the code a bit cleaner.

Cleared out the transientInMessages and inMessages maps to reduce processing 
time.

Changed the default partition count multiplier to produce n^2 partitions rather 
than 0.5 x n^2 for better balancing when the maximum limit is not exceeded.

Changed SimpleCheckpointVertex to throw an Exception instead of System.exit(-1) 
for a faster failure (seconds instead of minutes).

Moved SuperstepHashPartitionerFactory to the examples directory.  If it is not 
there, the test against a real Hadoop instance will fail from 
ClassNotFoundException.


This addresses bug GIRAPH-100.
https://issues.apache.org/jira/browse/GIRAPH-100


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceMaster.java
 1209336 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1209336 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java
 1209336 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceMaster.java
 1209336 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
 1209336 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
 1209336 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashMasterPartitioner.java
 1209336 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/integration/SuperstepHashPartitionerFactory.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/lib/IdWithValueTextOutputFormat.java
 1209336 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/lib/TextVertexInputFormat.java
 1209336 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestGraphPartitioner.java
 1209336 

Diff: https://reviews.apache.org/r/2959/diff


Testing
---

Passed local and Hadoop instance unittests.  Ran PageRankBenchmark on a real 
Hadoop cluster.


Thanks,

Avery



Review Request: Save half of maximum memory used from messaging

2011-12-13 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3175/
---

Review request for giraph.


Summary
---

Currently, the amount of memory that Giraph uses for messaging is huge. This 
JIRA will reduce the messaging memory by half and provide periodic updates of 
memory for debugging. Details are below:

Refactored RandomMessageBenchmark to use an internal vertex class. Added 
aggregators to RandomMessageBenchmark to track bytes, messages, and time for 
the messaging. Moved the postSuperstep() call to after the flush() for more 
accurate timings.

Added per-minute progress updates during message flushing (which can take a 
while, especially in the memory benchmark). These help show how progress is 
going and give an ETA.
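One simple way to produce such an ETA is linear extrapolation from the rate observed so far. The helper below is a hypothetical sketch of that idea (class and method names are illustrative), not the code in this patch.

```java
/** Hypothetical helper: linear ETA estimate for a long-running message flush. */
final class FlushProgress {
  private FlushProgress() { }

  /**
   * Estimates the remaining milliseconds assuming the rate observed so far
   * stays constant. Returns -1 if nothing has been flushed yet.
   */
  static long etaMillis(long done, long total, long elapsedMillis) {
    if (done <= 0) {
      return -1;
    }
    if (done >= total) {
      return 0;
    }
    double millisPerItem = (double) elapsedMillis / done;
    return Math.round(millisPerItem * (total - done));
  }

  /** One-line status of the kind that could be logged once a minute. */
  static String status(long done, long total, long elapsedMillis) {
    long eta = etaMillis(done, total, elapsedMillis);
    return String.format("flushed %d/%d (%.1f%%), ETA %s", done, total,
        100.0 * done / total, eta < 0 ? "unknown" : (eta / 1000) + "s");
  }
}
```

For example, 25 of 100 messages flushed after 60 seconds gives 2.4 s per message and an ETA of 180 s for the remaining 75.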

Memory optimizations include:

- Clear the message list after computation
- Free vertex messages on the source as the flush is going on
- TreeMap -> HashMap for VertexMutations
- Sizing the ArrayList properly in transientInMessages


This addresses bug GIRAPH-104.
https://issues.apache.org/jira/browse/GIRAPH-104


Diffs
-

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/RandomMessageBenchmark.java
 1213849 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1213849 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/LongSumAggregator.java
 1213849 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
 1213849 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
 1213849 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/WorkerContext.java
 1213849 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/utils/MemoryUtils.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/3175/diff


Testing
---

Passed local and Hadoop unittests.  RandomMessageBenchmark was run at scale on 
a real cluster.


Thanks,

Avery



Review Request: GIRAPH-57 Add new RPC call (putVertexIdMessagesList) to batch putMsgList RPCs together

2011-12-14 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3201/
---

Review request for giraph.


Summary
---

Added a new putVertexIdMessagesList RPC and supporting classes (VertexIdMessages 
and VertexIdMessagesList) to reduce the total number of RPCs during a flush.  
This improves the RPC rate (RPCs / sec) and overall I/O bandwidth.  Batches are 
sized by the total number of messages, with a limit that is configurable at 
runtime (default of 5000, weighted toward helping smaller messages).  I have 
noted some performance results in https://issues.apache.org/jira/browse/GIRAPH-57 
(between 25 and 1075 percent improvements).

Also, while tinkering with BasicRPCCommunications, I noticed inconsistent spaces 
between 'synchronized' and '('.  Removed the spaces and standardized this in the 
CODE_CONVENTIONS.
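To make the batching rule concrete, here is a hedged sketch that groups per-destination-vertex message lists into batches bounded by total message count (the request's default limit is 5000). The class is hypothetical and is not how VertexIdMessagesList is actually built; sending an oversized single vertex alone, rather than splitting it, is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: group per-vertex message lists into RPC-sized batches. */
final class MessageBatcher {
  private MessageBatcher() { }

  /**
   * Each batch holds whole (vertexId, messages) entries whose combined message
   * count stays within maxMessagesPerBatch; a single entry larger than the
   * limit is sent alone rather than split.
   */
  static <I, M> List<List<Map.Entry<I, List<M>>>> batch(
      Map<I, List<M>> outMessages, int maxMessagesPerBatch) {
    List<List<Map.Entry<I, List<M>>>> batches = new ArrayList<>();
    List<Map.Entry<I, List<M>>> current = new ArrayList<>();
    int currentCount = 0;
    for (Map.Entry<I, List<M>> entry : outMessages.entrySet()) {
      int size = entry.getValue().size();
      if (!current.isEmpty() && currentCount + size > maxMessagesPerBatch) {
        batches.add(current);  // this batch would become one RPC
        current = new ArrayList<>();
        currentCount = 0;
      }
      current.add(entry);
      currentCount += size;
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }
}
```

Counting messages rather than bytes means many small messages share one RPC, which matches the note that the default is weighted toward smaller messages.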


This addresses bug GIRAPH-57.
https://issues.apache.org/jira/browse/GIRAPH-57


Diffs
-

  http://svn.apache.org/repos/asf/incubator/giraph/trunk/CODE_CONVENTIONS 
1213849 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ArrayListWritable.java
 1213849 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1214406 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java
 1213849 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
 1213849 

Diff: https://reviews.apache.org/r/3201/diff


Testing
---

Passed local and Hadoop unittests.  Used the RandomMessageBenchmark on a small 
cluster.


Thanks,

Avery



Re: Review Request: GIRAPH-57 Add new RPC call (putVertexIdMessagesList) to batch putMsgList RPCs together

2011-12-14 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3201/
---

(Updated 2011-12-14 19:50:29.358317)


Review request for giraph.


Changes
---

Forgot a few important files (VertexIdMessages.java and 
VertexIdMessagesList.java)


Summary
---

Added a new putVertexIdMessagesList RPC and supporting classes (VertexIdMessages 
and VertexIdMessagesList) to reduce the total number of RPCs during a flush.  
This improves the RPC rate (RPCs / sec) and overall I/O bandwidth.  Batches are 
sized by the total number of messages, with a limit that is configurable at 
runtime (default of 5000, weighted toward helping smaller messages).  I have 
noted some performance results in https://issues.apache.org/jira/browse/GIRAPH-57 
(between 25 and 1075 percent improvements).

Also, while tinkering with BasicRPCCommunications, I noticed inconsistent spaces 
between 'synchronized' and '('.  Removed the spaces and standardized this in the 
CODE_CONVENTIONS.


This addresses bug GIRAPH-57.
https://issues.apache.org/jira/browse/GIRAPH-57


Diffs (updated)
-

  http://svn.apache.org/repos/asf/incubator/giraph/trunk/CODE_CONVENTIONS 
1213849 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ArrayListWritable.java
 1213849 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1214406 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java
 1213849 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/VertexIdMessages.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
 1213849 

Diff: https://reviews.apache.org/r/3201/diff


Testing
---

Passed local and Hadoop unittests.  Used the RandomMessageBenchmark on a small 
cluster.


Thanks,

Avery



Re: Review Request: Refactor vertices to not expose the internal datastructure for holding messages

2011-12-15 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3203/#review3936
---


I think that overall this looks pretty nice.  I do have a couple of 
suggestions.  Also, per the CODE_CONVENTIONS, comments should start with a 
capital letter, e.g. // This convention is silly.


/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java


Should be package-private to keep the user from mucking around with the 
message data structure.



/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java


Should be package-private so that it is only called by the infrastructure.  

Can we capitalize the comments?  E.g. /** Release...

Also the comment is not quite right.  releaseResources() will be called 
after the vertex's computation, not only after the vertex has halted.



/trunk/src/main/java/org/apache/giraph/utils/MemoryUtils.java


Thanks for fixing this (my bad)!  Argh.


- Avery


On 2011-12-15 10:42:39, Sebastian Schelter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/3203/
> ---
> 
> (Updated 2011-12-15 10:42:39)
> 
> 
> Review request for giraph.
> 
> 
> Summary
> ---
> 
> Refactoring that gives BasicVertex these 3 new methods:
> 
> public abstract Iterable getMessages()
> 
> Returns an unmodifiable iterable allowing access to the current messages.
> 
> public abstract void setMessages(Iterable messages);
> 
> Replacement for getMsgList().clear() followed by getMsgList().addAll(...).
> 
> public abstract void releaseResources();
> 
> After a vertex has voted to halt, all references to messages it could still 
> hold should be removed to enable earlier GC. Instead of externally calling 
> getMsgList().clear(), this method should be used.
> 
> Local unit tests pass; unfortunately I haven't yet been able to run the tests 
> on my Hadoop cluster (I still have problems because I can only access it via a 
> SOCKS proxy).
> 
> 
> This addresses bug GIRAPH-80.
> https://issues.apache.org/jira/browse/GIRAPH-80
> 
> 
> Diffs
> -
> 
>   /trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 
> 1214675 
>   /trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java 1214675 
>   /trunk/src/main/java/org/apache/giraph/graph/BasicVertexResolver.java 
> 1214675 
>   /trunk/src/main/java/org/apache/giraph/graph/EdgeListVertex.java 1214675 
>   /trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1214675 
>   
> /trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java 
> 1214675 
>   /trunk/src/main/java/org/apache/giraph/graph/Vertex.java 1214675 
>   /trunk/src/main/java/org/apache/giraph/graph/VertexResolver.java 1214675 
>   /trunk/src/main/java/org/apache/giraph/utils/ComparisonUtils.java 
> PRE-CREATION 
>   /trunk/src/main/java/org/apache/giraph/utils/MemoryUtils.java 1214675 
>   /trunk/src/test/java/org/apache/giraph/utils/ComparisonUtilsTest.java 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/3203/diff
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sebastian
> 
>
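The three methods in the summary above can be illustrated with a simplified, hypothetical class (it is not Giraph's BasicVertex and omits the I, V, E type parameters). It shows the encapsulation idea: callers only get a read-only view, and the two mutators are package-private, as suggested in the review.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a vertex that hides its message storage. */
class MessageHoldingVertex<M> {
  /** Internal message storage; never handed out directly. */
  private List<M> messages = new ArrayList<>();

  /** Read-only view of the current messages. */
  public Iterable<M> getMessages() {
    return Collections.unmodifiableList(messages);
  }

  /** Replaces clear() followed by addAll(...) on a previously exposed list. */
  void setMessages(Iterable<M> newMessages) {
    List<M> copy = new ArrayList<>();
    for (M message : newMessages) {
      copy.add(message);
    }
    messages = copy;
  }

  /** Drops message references after computation so they can be garbage collected. */
  void releaseResources() {
    messages = new ArrayList<>();
  }
}
```

Copying in setMessages() is a design choice for the sketch: it keeps the vertex from aliasing a list the caller may later mutate, at the cost of one extra allocation.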



Re: Running tests in pseudo-distributed mode

2011-12-17 Thread Avery Ching

We should document this somewhere.  It is not intuitive as you mention.

Avery

On 12/17/11 1:41 AM, Sebastian Schelter wrote:

A small hint for everyone who wants to run Giraph's unit tests on a
pseudo-distributed single-node Hadoop cluster:

You have to adjust the configuration to allow 4 concurrent map tasks per
node (the default in hadoop-0.20.203 is 2); otherwise the tests will fail!

You have to adjust mapred.tasktracker.map.tasks.maximum and
mapred.map.tasks in mapred-site.xml. Took me a while to figure out :)

--sebastian




Re: Review Request: GIRAPH-80 Refactor vertices to not expose the internal datastructure for holding messages

2011-12-17 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3203/#review3967
---

Ship it!


+1.  Thanks for the changes.  I will commit and then open up a separate JIRA to 
make setMessages() package-private.

- Avery


On 2011-12-17 09:36:06, Sebastian Schelter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/3203/
> ---
> 
> (Updated 2011-12-17 09:36:06)
> 
> 
> Review request for giraph.
> 
> 
> Summary
> ---
> 
> Refactoring that gives BasicVertex these 3 new methods:
> 
> public abstract Iterable getMessages()
> 
> Returns an unmodifiable iterable allowing access to the current messages.
> 
> public abstract void setMessages(Iterable messages);
> 
> Replacement for getMsgList().clear() followed by getMsgList().addAll(...).
> 
> public abstract void releaseResources();
> 
> After a vertex has voted to halt, all references to messages it could still 
> hold should be removed to enable earlier GC. Instead of externally calling 
> getMsgList().clear(), this method should be used.
> 
> Local unit tests pass; unfortunately I haven't yet been able to run the tests 
> on my Hadoop cluster (I still have problems because I can only access it via a 
> SOCKS proxy).
> 
> 
> This addresses bug GIRAPH-80.
> https://issues.apache.org/jira/browse/GIRAPH-80
> 
> 
> Diffs
> -
> 
>   /trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 
> 1215442 
>   /trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java 1215442 
>   /trunk/src/main/java/org/apache/giraph/graph/BasicVertexResolver.java 
> 1215442 
>   /trunk/src/main/java/org/apache/giraph/graph/EdgeListVertex.java 1215442 
>   /trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1215442 
>   
> /trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java 
> 1215442 
>   /trunk/src/main/java/org/apache/giraph/graph/Vertex.java 1215442 
>   /trunk/src/main/java/org/apache/giraph/graph/VertexResolver.java 1215442 
>   /trunk/src/main/java/org/apache/giraph/utils/ComparisonUtils.java 
> PRE-CREATION 
>   /trunk/src/main/java/org/apache/giraph/utils/MemoryUtils.java 1215442 
>   /trunk/src/test/java/org/apache/giraph/utils/ComparisonUtilsTest.java 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/3203/diff
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sebastian
> 
>



Review Request: GIRAPH-106: Change prepareSuperstep() to make setMessages(Iterable messages) package-private

2011-12-19 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3255/
---

Review request for giraph.


Summary
---

Added the method assignMessagesToVertex() so that code outside the package can 
still assign messages even though setMessages() is package-private.  Cleaned up 
some missed formatting for GIRAPH-80 as well.


This addresses bug GIRAPH-106.
https://issues.apache.org/jira/browse/GIRAPH-106


Diffs
-

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
 1220642 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1220642 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java
 1220642 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java
 1220642 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertexResolver.java
 1220642 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
 1220642 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/EdgeListVertex.java
 1220642 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java
 1220642 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/Vertex.java
 1220642 

Diff: https://reviews.apache.org/r/3255/diff


Testing
---

Passed local unittests.


Thanks,

Avery



Review Request: GIRAPH-112: Use elements() properly in LongDoubleFloatDoubleVertex

2011-12-20 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3287/
---

Review request for giraph.


Summary
---

As pointed out by YuanYua, the length of the array returned by elements() 
cannot be used as the element count: the array holds all the elements currently 
stored in the Mahout collection, but also invalid entries between size and 
capacity.

Whenever possible I converted elements() into forEach(), forEachKey(), 
forEachPair().  Used size() in other cases.

Fixed some formatting violations as well in LongDoubleFloatDoubleVertex.java.
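The pitfall can be reproduced without Mahout. The class below is a hypothetical stand-in that mimics the behavior described in this thread (elements() returns the backing array, size() is the real count); it is not Mahout's API, and its forEach() takes a standard LongConsumer rather than a Mahout procedure.

```java
import java.util.Arrays;
import java.util.function.LongConsumer;

/** Hypothetical growable list that hands out its backing array from elements(). */
final class GrowableLongList {
  /** Backing array; only indices [0, size) hold valid elements. */
  private long[] elements = new long[4];
  /** Number of valid elements. */
  private int size = 0;

  void add(long value) {
    if (size == elements.length) {
      elements = Arrays.copyOf(elements, elements.length * 2);  // grow capacity
    }
    elements[size++] = value;
  }

  int size() {
    return size;
  }

  /** Returns the backing array itself: its length is the capacity, not size(). */
  long[] elements() {
    return elements;
  }

  /** Visits only the valid elements, so callers never see unused capacity. */
  void forEach(LongConsumer consumer) {
    for (int i = 0; i < size; i++) {
      consumer.accept(elements[i]);
    }
  }

  /** Correct use of elements(): bound the loop by size(), not by length. */
  static long sumValid(GrowableLongList list) {
    long[] array = list.elements();
    long sum = 0;
    for (int i = 0; i < list.size(); i++) {
      sum += array[i];
    }
    return sum;
  }
}
```

After five add() calls the backing array has length 8 while size() is 5, so a loop over elements().length would visit three slots that are not elements.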


This addresses bug GIRAPH-112.
https://issues.apache.org/jira/browse/GIRAPH-112


Diffs
-

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java
 1221634 

Diff: https://reviews.apache.org/r/3287/diff


Testing
---

Local unittests and MR unittests.


Thanks,

Avery



Re: Review Request: GIRAPH-112: Use elements() properly in LongDoubleFloatDoubleVertex

2011-12-21 Thread Avery Ching
Thanks for the quick review, Sebastian!  I think I still need a +1 from a 
Giraph committer before I can commit.


Avery

On 12/20/11 11:58 PM, Sebastian Schelter wrote:
This is an automatically generated e-mail. To reply, visit: 
https://reviews.apache.org/r/3287/



Ship it!

I ran into the same issue yesterday and the solution presented here is correct. 
For reasons of efficiency, list.elements() returns the internal underlying 
array for the list, which might be bigger than the number of elements stored in 
the list. Therefore you should only iterate up to list.size() or use the 
forEachKey() callback.

- Sebastian


On December 21st, 2011, 7:50 a.m., Avery Ching wrote:

Review request for giraph.
By Avery Ching.

(Updated 2011-12-21 07:50:20)



Bugs: GIRAPH-112 https://issues.apache.org/jira/browse/GIRAPH-112


  Diffs

  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java
 1221634

Diff: https://reviews.apache.org/r/3287/diff/





Re: Unable to load vertices

2011-12-23 Thread Avery Ching
What MutableVertex implementation are you using?  Sounds like the issue 
only happens during the RPC to send the vertex to another worker.  Maybe 
a bug in the Writable implementation?


Avery

On 12/23/11 3:14 AM, Sebastian Schelter wrote:

Hmm, the job works if I use a single worker only locally, strange...

On 23.12.2011 11:07, Claudio Martella wrote:

At a super quick glance (so I might be completely wrong), this looks
like you're running a different Hadoop locally than in your test. Is
there any chance you're using the hadoop non_secure build in your
distributed mode but not locally?

On Fri, Dec 23, 2011 at 10:49 AM, Sebastian Schelter  wrote:

Hi,

I'm currently implementing an algorithm for diameter and radius
estimation. It already works when I run it on toy data via
InternalVertexRunner in a unit test.

Unfortunately, in my tests with a single node hadoop instance and real
cluster, I always run into the attached exception during startup. Does
anybody have an idea what might cause this?

--sebastian


2011-12-23 10:43:09,769 INFO org.apache.hadoop.mapred.TaskInProgress:
Error from attempt_201112230924_0006_m_01_0:
java.lang.IllegalStateException: run: Caught an unrecoverable exception
setup: Offlining servers due to exception...
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:641)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.lang.RuntimeException: setup: Offlining servers due to
exception...
at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:466)
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
... 7 more
Caused by: java.lang.IllegalStateException: setup: loadVertices failed
at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:582)
at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
... 8 more
Caused by: java.lang.RuntimeException: java.io.IOException: Call to
poodle-6/127.0.1.1:30002 failed on local exception: java.io.EOFException
at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:768)
at org.apache.giraph.graph.BspServiceWorker.loadVertices(BspServiceWorker.java:304)
at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:575)
... 9 more
Caused by: java.io.IOException: Call to poodle-6/127.0.1.1:30002 failed
on local exception: java.io.EOFException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1065)
at org.apache.hadoop.ipc.Client.call(Client.java:1033)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
at $Proxy3.putVertexList(Unknown Source)
at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:765)
... 11 more
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:767)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:712)







Re: Unable to load vertices

2011-12-23 Thread Avery Ching
Without looking at your code, maybe your I, V, E, or M types have 
Writable issues?  In the single-worker case, does checkpointing work?  
That would verify that the writing half of Writable is okay, but not the 
reading half (though I guess you could do a manual checkpoint restart to 
verify that).


Avery

On 12/23/11 9:23 AM, Sebastian Schelter wrote:

I'm extending org.apache.giraph.graph.Vertex directly. I also created
unit tests for the serialization of the Writables (writing them to a
byte array and reading them back) without finding anything. Thanks
for the advice though; I'll continue searching :)

--sebastian
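A byte-array round-trip test of the kind Sebastian describes can be sketched as follows. To stay self-contained, it defines a local interface with the same two methods as org.apache.hadoop.io.Writable (write(DataOutput) and readFields(DataInput)); in real code the value class would implement Writable itself. CounterArrayValue is a hypothetical example type. The extra check that readFields() consumed every byte catches an asymmetric write/readFields pair; whether such a bug explains the EOFException above is not established in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

/** Local stand-in with the same two methods as org.apache.hadoop.io.Writable. */
interface SimpleWritable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

/** Hypothetical vertex value holding a variable-length array of counters. */
final class CounterArrayValue implements SimpleWritable {
  long[] counters = new long[0];

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(counters.length);  // length prefix first
    for (long counter : counters) {
      out.writeLong(counter);
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    counters = new long[in.readInt()];  // must mirror write() exactly
    for (int i = 0; i < counters.length; i++) {
      counters[i] = in.readLong();
    }
  }
}

final class WritableRoundTrip {
  private WritableRoundTrip() { }

  /** Serializes original into copy; fails if readFields() leaves bytes unread. */
  static void roundTrip(SimpleWritable original, SimpleWritable copy) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    original.write(out);
    out.flush();
    ByteArrayInputStream input = new ByteArrayInputStream(bytes.toByteArray());
    copy.readFields(new DataInputStream(input));
    if (input.available() != 0) {
      throw new IllegalStateException(input.available() + " unread bytes after readFields()");
    }
  }
}
```

Reading into a fresh instance (rather than the original) matters: it exposes fields that readFields() forgets to reset or reinitialize.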








Re: Review Request: Port of the HCC algorithm for identifying all connected components of a graph

2011-12-24 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3313/#review4114
---


Sebastian, this is really awesome work, thanks for sharing it!  While I didn't 
read the paper, your code looks good and the compute() code is pretty 
straightforward.  IntIntNullIntVertex.java is a good example of how to make a 
very compact vertex.

I only have a few minor formatting requests.  

In the CODE_CONVENTIONS, comments should be:

- All classes, members, and member methods should have Javadoc in the following
  style.  C-style comments for javadoc and // comments for non-javadoc.  Also,
  the comment block should have a line break that separates the comment
  section and the @ section.

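While it is not in the CODE_CONVENTIONS (but it should be), Giraph puts a space 
after each comma inside generic type parameters (<...>).  Can you please add the 
spaces (e.g. NullWritable, IntWritable rather than NullWritable,IntWritable)?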

I marked as many as I could find, please correct any others I have missed.


/trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java


CODE_CONVENTIONS comment suggestion.



/trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java


You could shorten this a tad with the foreach pattern instead.



/trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java


CODE_CONVENTIONS comment suggestion.



/trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java


Could be foreach (again).



/trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java


CODE_CONVENTIONS



/trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java


CODE_CONVENTIONS



/trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java


CODE_CONVENTIONS



/trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java


CODE_CONVENTIONS



/trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java


CODE_CONVENTIONS



/trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java


CODE_CONVENTIONS



/trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java


CODE_CONVENTIONS



/trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java


CODE_CONVENTIONS



/trunk/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java


CODE_CONVENTIONS



/trunk/src/main/java/org/apache/giraph/examples/VertexWithComponentTextOutputFormat.java


Capital 'text'



/trunk/src/main/java/org/apache/giraph/graph/IntIntNullIntVertex.java


NullWritable,IntWritable

should be

NullWritable, IntWritable



/trunk/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java


String,String -> String, String


- Avery


On 2011-12-24 09:32:15, Sebastian Schelter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/3313/
> ---
> 
> (Updated 2011-12-24 09:32:15)
> 
> 
> Review request for giraph.
> 
> 
> Summary
> ---
> 
> Port of the HCC algorithm to Giraph. Each vertex needs to find the smallest 
> vertex id in its component.
> 
> I created a very memory-efficient abstract vertex in 
> org.apache.giraph.graph.IntIntNullIntVertex and had 
> org.apache.giraph.examples.ConnectedComponentsVertex extend that. 
> org.apache.giraph.examples.ConnectedComponentsVertexTest contains an 
> "integration" test on toy data.
> 
> I had to patch org.apache.giraph.utils.InternalVertexRunner to allow the use 
> of combiners and to shutdown() the local zookeeper instance in the tests.
> 
> Local and pseudo-distributed unit tests passed. I also tested the 
> algorithm on a 6-machine Hadoop cluster using the Wikipedia pagelink graph 
> (5.7M vertices, 130M edges).
> 
> 
> This addresses bug GIRAPH-115.
> https://issues.apache.org/jira/browse/GIRAPH-115
> 
> 
> Diffs
> -
> 
>   
> /trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java
>  PRE

Re: Review Request: Port of the HCC algorithm for identifying all connected components of a graph

2011-12-25 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3313/#review4117
---

Ship it!


Looks great.  I'll commit on your behalf.


/trunk/src/test/java/org/apache/giraph/examples/ConnectedComponentsVertexTest.java


I'll fix this one for you. =) Remove */.


- Avery


On 2011-12-25 09:36:39, Sebastian Schelter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/3313/
> ---
> 
> (Updated 2011-12-25 09:36:39)
> 
> 
> Review request for giraph.
> 
> 
> Summary
> ---
> 
> Port of the HCC algorithm to Giraph. Each vertex needs to find the smallest 
> vertex id in its component.
> 
> I created a very memory-efficient abstract vertex in 
> org.apache.giraph.graph.IntIntNullIntVertex and had 
> org.apache.giraph.examples.ConnectedComponentsVertex extend that. 
> org.apache.giraph.examples.ConnectedComponentsVertexTest contains an 
> "integration" test on toy data.
> 
> I had to patch org.apache.giraph.utils.InternalVertexRunner to allow the use 
> of combiners and to shutdown() the local zookeeper instance in the tests.
> 
> Local and pseudo-distributed unit tests passed. I also tested the 
> algorithm on a 6-machine Hadoop cluster using the Wikipedia pagelink graph 
> (5.7M vertices, 130M edges).
> 
> 
> This addresses bug GIRAPH-115.
> https://issues.apache.org/jira/browse/GIRAPH-115
> 
> 
> Diffs
> -
> 
>   
> /trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java
>  PRE-CREATION 
>   
> /trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java
>  PRE-CREATION 
>   /trunk/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java 
> PRE-CREATION 
>   
> /trunk/src/main/java/org/apache/giraph/examples/VertexWithComponentTextOutputFormat.java
>  PRE-CREATION 
>   /trunk/src/main/java/org/apache/giraph/graph/IntIntNullIntVertex.java 
> PRE-CREATION 
>   /trunk/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 
> 1222837 
>   
> /trunk/src/main/java/org/apache/giraph/utils/UnmodifiableIntArrayIterator.java
>  PRE-CREATION 
>   
> /trunk/src/test/java/org/apache/giraph/examples/ConnectedComponentsVertexTest.java
>  PRE-CREATION 
>   /trunk/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java 
> PRE-CREATION 
>   
> /trunk/src/test/java/org/apache/giraph/examples/SimpleShortestPathVertexTest.java
>  1222837 
> 
> Diff: https://reviews.apache.org/r/3313/diff
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sebastian
> 
>



Re: Unable to load vertices

2011-12-27 Thread Avery Ching

Glad you found the issue Sebastian! =)

Avery

On 12/27/11 12:47 PM, Sebastian Schelter wrote:

You were right, it was an issue with writing/reading the vertex value.
Only took me three days of searching to find out that I simply forgot to
call setVertexValue() ... :)

--sebastian



On 23.12.2011 18:28, Avery Ching wrote:

Without looking at your code, maybe your I, V, E, or M types might have
Writable issues?  In the single worker case, does checkpointing work?
That would verify the writing part of Writable is okay, but not the
reading part...(well you can do a manual checkpoint restart I guess to
verify that).

Avery

On 12/23/11 9:23 AM, Sebastian Schelter wrote:

I'm extending org.apache.giraph.graph.Vertex directly. I also created
unit tests for the serialization of the Writables (writing them to a
byte array and reading them back) without finding something. Thank you
for the advice however, I'll continue searching :)

--sebastian
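A serialization round-trip test of the kind described above might look like
the sketch below. To stay self-contained it uses a local SimpleWritable
interface with the same two methods as Hadoop's Writable (write(DataOutput)
and readFields(DataInput)); ExampleValue and WritableRoundTrip are hypothetical
names. Such a test only proves that write() and readFields() are symmetric; it
would not catch a vertex value that was never set in the first place.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Local stand-in with the same two methods as Hadoop's Writable. */
interface SimpleWritable {
  void write(DataOutput out) throws IOException;

  void readFields(DataInput in) throws IOException;
}

/** Hypothetical vertex value: a component id plus an estimate. */
final class ExampleValue implements SimpleWritable {
  int componentId;
  long estimate;

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(componentId);
    out.writeLong(estimate);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    // Must read fields in exactly the order write() emitted them
    componentId = in.readInt();
    estimate = in.readLong();
  }
}

final class WritableRoundTrip {
  /** Writes source to a byte array, then reads those bytes into target. */
  static <T extends SimpleWritable> T roundTrip(SimpleWritable source, T target) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (DataOutputStream out = new DataOutputStream(bytes)) {
        source.write(out);
      }
      DataInputStream in =
          new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
      // Reading past the end throws EOFException (wrapped below)
      target.readFields(in);
      // Leftover bytes mean readFields() consumed less than write() produced
      if (in.available() != 0) {
        throw new IllegalStateException(
            "readFields() left " + in.available() + " bytes unread");
      }
      return target;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```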


On 23.12.2011 18:14, Avery Ching wrote:

What MutableVertex implementation are you using?  Sounds like the issue
only happens during the RPC to send the vertex to another worker.  Maybe
a bug in the Writable implementation?

Avery

On 12/23/11 3:14 AM, Sebastian Schelter wrote:

Hmm, the job works if I use a single worker only locally, strange...

On 23.12.2011 11:07, Claudio Martella wrote:

With a super quick look (so I might be completely wrong), this looks
like you're running a different Hadoop locally and in your test. Is
there any chance you're not using the non_secure Hadoop build locally,
but you are in your distributed mode?

On Fri, Dec 23, 2011 at 10:49 AM, Sebastian Schelter
wrote:

Hi,

I'm currently implementing an algorithm for diameter and radius
estimation. It already works when I run it on toy data via
InternalVertexRunner in a unit test.

Unfortunately, in my tests with a single-node Hadoop instance and a real
cluster, I always run into the attached exception during startup. Does
anybody have an idea what might cause this?

--sebastian


2011-12-23 10:43:09,769 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201112230924_0006_m_01_0:
java.lang.IllegalStateException: run: Caught an unrecoverable exception
setup: Offlining servers due to exception...
  at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:641)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
  at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:396)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
  at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.lang.RuntimeException: setup: Offlining servers due to exception...
  at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:466)
  at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
  ... 7 more
Caused by: java.lang.IllegalStateException: setup: loadVertices failed
  at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:582)
  at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
  ... 8 more
Caused by: java.lang.RuntimeException: java.io.IOException: Call to poodle-6/127.0.1.1:30002 failed on local exception: java.io.EOFException
  at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:768)
  at org.apache.giraph.graph.BspServiceWorker.loadVertices(BspServiceWorker.java:304)
  at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:575)
  ... 9 more
Caused by: java.io.IOException: Call to poodle-6/127.0.1.1:30002 failed on local exception: java.io.EOFException
  at org.apache.hadoop.ipc.Client.wrapException(Client.java:1065)
  at org.apache.hadoop.ipc.Client.call(Client.java:1033)
  at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
  at $Proxy3.putVertexList(Unknown Source)
  at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:765)
  ... 11 more
Caused by: java.io.EOFException
  at java.io.DataInputStream.readInt(DataInputStream.java:375)
  at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:767)
  at org.apache.hadoop.ipc.Client$Connection.run(Client.java:712)




Review Request: Make EdgeListVertex the default vertex implementation, fix bugs related to EdgeListVertex.

2012-01-01 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3349/
---

Review request for giraph.


Summary
---

* Changed Vertex.java to HashMapVertex.java.  This makes it less likely folks 
will use it as a default.  I have included comments that suggest EdgeListVertex 
for static graphs (most cases).

* Found and fixed bugs in EdgeListVertex in the way that binarySearch was 
being used.  Added unit tests to check adding/getting/removing edges.

* Changed classes that extend Vertex to extend EdgeListVertex instead.

* Changed addVertex() and addVertexReq() to take a BasicVertex instead of a 
MutableVertex, to be a little safer.

* Tried to make sure that when a class that extends MutableVertex is 
instantiated, it also calls readFields() or initialize().  This fixed 
several bugs.

* Changed the interface of BasicVertex#initialize from  public abstract void 
initialize(I vertexId, V vertexValue, Map edges, List messages) to 
initialize(I vertexId, V vertexValue, Map edges, Iterable messages) to 
better fit the recent changes to BasicVertex getting/setting messages with an 
Iterable.

* Found and removed duplicated code from several MutableVertex extended classes 
for addVertexRequest, removeVertexRequest, addEdgeRequest and removeEdgeRequest.

* Changed Vertex cast to BasicVertex cast in Partition and MockUtils.

* Some tabs --> spaces conversions were done automatically by my Eclipse 
settings in the files I touched.
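The summary does not spell out the binarySearch bugs, but a common pitfall with
java.util.Collections.binarySearch is its return value: a non-negative index
means the key was found, while a negative result encodes the insertion point as
-(insertionPoint) - 1. The sketch below is a hypothetical sorted edge list (not
the actual EdgeListVertex code) that keeps edge targets and values in parallel
ArrayLists and decodes that result for add/get/remove.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical edge list kept sorted by target id; not Giraph's class. */
final class SortedEdgeList {
  private final List<Integer> targets = new ArrayList<>();
  private final List<Double> values = new ArrayList<>();

  /** Adds an edge; returns false if an edge to target already exists. */
  boolean addEdge(int target, double value) {
    int index = Collections.binarySearch(targets, target);
    if (index >= 0) {
      return false; // already present
    }
    // A negative result encodes the insertion point as -(insertionPoint) - 1
    int insertionPoint = -index - 1;
    targets.add(insertionPoint, target);
    values.add(insertionPoint, value);
    return true;
  }

  /** Returns the edge value, or null when there is no such edge. */
  Double getEdgeValue(int target) {
    int index = Collections.binarySearch(targets, target);
    // A negative index means "not found" and must never index the list
    return index >= 0 ? values.get(index) : null;
  }

  /** Removes an edge and returns its value, or null when it was absent. */
  Double removeEdge(int target) {
    int index = Collections.binarySearch(targets, target);
    if (index < 0) {
      return null;
    }
    targets.remove(index); // remove(int) removes by position, not by value
    return values.remove(index);
  }

  /** Number of edges currently stored. */
  int size() {
    return targets.size();
  }
}
```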


This addresses bug GIRAPH-116.
https://issues.apache.org/jira/browse/GIRAPH-116


Diffs
-

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java
 1226330 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PseudoRandomVertexInputFormat.java
 1226330 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1226330 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java
 1226330 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerCommunications.java
 1226330 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleCheckpointVertex.java
 1226330 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleCombinerVertex.java
 1226330 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleFailVertex.java
 1226330 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleMsgVertex.java
 1226330 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleMutateGraphVertex.java
 1226330 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleShortestPathsVertex.java
 1226330 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleSuperstepVertex.java
 1226330 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleVertexWithWorkerContext.java
 1226330 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/VerifyMessage.java
 1226330 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java
 1226330 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertexResolver.java
 1226330 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspUtils.java
 1226330 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/EdgeListVertex.java
 1226330 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/HashMapVertex.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/IntIntNullIntVertex.java
 1226330 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java
 1226330 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/MutableVertex.java
 1226330 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexResolver.java
 1226330 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java
 1226330 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestBspBasic.java
 1226330 
  
http://s

Re: Review Request: Make EdgeListVertex the default vertex implementation, fix bugs related to EdgeListVertex.

2012-01-02 Thread Avery Ching


> On 2012-01-02 11:04:25, Sebastian Schelter wrote:
> > Had a quick look over your changes and everything looked good. I think it's 
> > right to assume that most implementations will use static graphs and to 
> > offer EdgeListVertex as the default extension point for this. The only 
> > thing I don't like is the name change from MutableVertex to BasicVertex, I 
> > liked the former better because it sounds much more expressive to me.

Sebastian, I didn't change the name from MutableVertex to BasicVertex, sorry 
for the mixup.  I think MutableVertex is useful as well.  The only thing I did 
change was to make the addVertex() and addVertexReq() methods take a BasicVertex 
rather than a MutableVertex, to be a tad safer.  We still have the BasicVertex 
-> MutableVertex -> User vertex class hierarchy.
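To illustrate the hierarchy described above, here is a heavily simplified,
hypothetical sketch (the *Sketch classes are invented, not Giraph's classes).
Declaring the parameter as the base type means any user vertex is accepted, and
the method body can only use the base-type API; reading that as the reason the
change is "a tad safer" is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Top of the hierarchy: read-only access. */
abstract class BasicVertexSketch {
  abstract int getVertexId();
}

/** Middle level: adds mutation. */
abstract class MutableVertexSketch extends BasicVertexSketch {
  abstract void setVertexId(int id);
}

/** What a user would write. */
final class UserVertexSketch extends MutableVertexSketch {
  private int id;

  @Override
  int getVertexId() {
    return id;
  }

  @Override
  void setVertexId(int id) {
    this.id = id;
  }
}

final class GraphMutationsSketch {
  private final Map<Integer, BasicVertexSketch> added = new HashMap<>();

  /** Accepts the most general type; only read access is needed here. */
  void addVertex(BasicVertexSketch vertex) {
    added.put(vertex.getVertexId(), vertex);
  }

  int count() {
    return added.size();
  }
}
```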


> On 2012-01-02 11:04:25, Sebastian Schelter wrote:
> > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java,
> >  line 61
> > <https://reviews.apache.org/r/3349/diff/2/?file=66112#file66112line61>
> >
> > good thing to have the Iterable<> abstraction here

=)


- Avery


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3349/#review4168
---


On 2012-01-02 02:35:50, Avery Ching wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/3349/
> ---
> 
> (Updated 2012-01-02 02:35:50)
> 
> 
> Review request for giraph.
> 
> 
> Summary
> ---
> 
> * Changed Vertex.java to HashMapVertex.java.  This makes it less likely folks 
> will use it as a default.  I have included comments that suggest 
> EdgeListVertex for static graphs (most cases).
> 
> * Found and fixed bugs in EdgeListVertex in the way that binarySearch was 
> being used.  Added unit tests to check adding/getting/removing edges.
> 
> * Changed classes that extend Vertex to extend EdgeListVertex instead.
> 
> * Changed addVertex() and addVertexReq() to take a BasicVertex instead of a 
> MutableVertex, to be a little safer.
> 
> * Tried to make sure that when a class that extends MutableVertex is 
> instantiated, it also calls readFields() or initialize().  This fixed 
> several bugs.
> 
> * Changed the interface of BasicVertex#initialize from  public abstract void 
> initialize(I vertexId, V vertexValue, Map edges, List messages) to 
> initialize(I vertexId, V vertexValue, Map edges, Iterable messages) 
> to better fit the recent changes to BasicVertex getting/setting messages with 
> an Iterable.
> 
> * Found and removed duplicated code from several MutableVertex extended 
> classes for addVertexRequest, removeVertexRequest, addEdgeRequest and 
> removeEdgeRequest.
> 
> * Changed Vertex cast to BasicVertex cast in Partition and MockUtils.
> 
> * Some tabs --> spaces conversions were done automatically by my 
> Eclipse settings in the files I touched.
> 
> 
> This addresses bug GIRAPH-116.
> https://issues.apache.org/jira/browse/GIRAPH-116
> 
> 
> Diffs
> -
> 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PseudoRandomVertexInputFormat.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerCommunications.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleCheckpointVertex.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleCombinerVertex.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleFailVertex.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleMsgVertex.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleMutateGraphVertex.ja

Re: Review Request: Make EdgeListVertex the default vertex implementation, fix bugs related to EdgeListVertex.

2012-01-02 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3349/
---

(Updated 2012-01-02 18:51:05.007864)


Review request for giraph.


Changes
---

Sorry, forgot to include the new tests for 
giraph/trunk/src/test/java/org/apache/giraph/graph/TestEdgeListVertex.java.  
I'll commit and if there is a problem, I can fix it later.  The tests pass.


Summary
---

* Changed Vertex.java to HashMapVertex.java.  This makes it less likely folks 
will use it as a default.  I have included comments that suggest EdgeListVertex 
for static graphs (most cases).

* Found and fixed bugs in EdgeListVertex in the way that binarySearch was 
being used.  Added unit tests to check adding/getting/removing edges.

* Changed classes that extend Vertex to extend EdgeListVertex instead.

* Changed addVertex() and addVertexReq() to take a BasicVertex instead of a 
MutableVertex, to be a little safer.

* Tried to make sure that when a class that extends MutableVertex is 
instantiated, it also calls readFields() or initialize().  This fixed 
several bugs.

* Changed the interface of BasicVertex#initialize from  public abstract void 
initialize(I vertexId, V vertexValue, Map edges, List messages) to 
initialize(I vertexId, V vertexValue, Map edges, Iterable messages) to 
better fit the recent changes to BasicVertex getting/setting messages with an 
Iterable.

* Found and removed duplicated code from several MutableVertex extended classes 
for addVertexRequest, removeVertexRequest, addEdgeRequest and removeEdgeRequest.

* Changed Vertex cast to BasicVertex cast in Partition and MockUtils.

* Some tabs --> spaces conversions were done automatically by my Eclipse 
settings in the files I touched.


This addresses bug GIRAPH-116.
https://issues.apache.org/jira/browse/GIRAPH-116


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java
 1226507 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PseudoRandomVertexInputFormat.java
 1226507 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1226507 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java
 1226507 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerCommunications.java
 1226507 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleCheckpointVertex.java
 1226507 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleCombinerVertex.java
 1226507 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleFailVertex.java
 1226507 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleMsgVertex.java
 1226507 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleMutateGraphVertex.java
 1226507 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleShortestPathsVertex.java
 1226507 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleSuperstepVertex.java
 1226507 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleVertexWithWorkerContext.java
 1226507 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/VerifyMessage.java
 1226507 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java
 1226507 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertexResolver.java
 1226507 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspUtils.java
 1226507 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/EdgeListVertex.java
 1226507 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/HashMapVertex.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/IntIntNullIntVertex.java
 1226507 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java
 1226507 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/MutableVertex.java
 1226507 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexResolver.java
 122

Re: Persisting global values

2012-01-02 Thread Avery Ching
There is support that Claudio added for exporting aggregator values 
(https://issues.apache.org/jira/browse/GIRAPH-10).  You could probably 
also do your final computation there if that is what you want (if you 
want to do end of application manipulation).  WorkerContext would be a 
good way to do aggregator manipulation in between supersteps (or at the 
end), as you suggested as well.  Is the manipulation you want to do in 
between supersteps or just at the end of the application?


Avery

On 1/2/12 6:05 AM, Sebastian Schelter wrote:

Hi,

I'm working on an algorithm that computes a global value of the graph
(its so-called effective diameter), and I have an Aggregator with which
this value can be computed. What would be the correct place to implement
this computation?

I thought about WorkerContext first, but it seems that this is run on
each worker, which doesn't really fit my problem. Another question would
be how to persist that global value after the algorithm is done.

--sebastian




Re: Added stub Incubator report for January 2012

2012-01-04 Thread Avery Ching

Thanks Chris.  Looks good to me.

Avery

On 1/4/12 6:13 AM, Mattmann, Chris A (388J) wrote:

Hey Guys,

Here's a stub report for Giraph for January 2012:

http://wiki.apache.org/incubator/January2012

Please update and add more detail.

Thanks!

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++





Re: Time to roll a release?

2012-01-04 Thread Avery Ching
+1.  I appreciate you volunteering.  Hopefully those Kafka lessons can 
be put to good use!


Avery

On 1/4/12 12:15 PM, Jakob Homan wrote:

I think there's been enough work done since Giraph entered incubation
that we're ready to do a release.  We've had significant performance
and usability improvements, to the point where anyone interested in
Giraph/Pregel/BSP should definitely take a look at the code and try it
out.  Rolling a release would signal anyone left on the fence that
it's worth their time.  This is also a required criterion for
advancing through the incubator, as we're doing well on the others
currently.

Having been peripherally involved in Kafka's recent first release, I
can tell you it's quite a lot of paperwork, but I'm happy to volunteer
to roll the first one.  Any objections? Ideas?  Hysterical laughter?

-Jakob




Re: some weird code

2012-01-06 Thread Avery Ching

Hi Claudio, answers inline:

On 1/6/12 8:25 AM, Claudio Martella wrote:

Hello,

I hope somebody can shed some light on a piece of code I'm looking at
while working on GIRAPH-45 (this code is also the subject of
GIRAPH-95, so we'd probably kill two birds with one stone here).

The code is taking care of vertex resolving in
BasicRPCCommunication::prepareSuperstep():
[line 1091]:
if (vertex != null) {
  ((MutableVertex) vertex).setVertexId(vertexIndex);
  partition.putVertex((BasicVertex) vertex);
} else if (originalVertex != null) {
  partition.removeVertex(originalVertex.getVertexId());
}

First, vertex cannot be null as it's resolved by vertexResolver, but I
guess it's a sanity check. But the real question is: why would you
call setVertexId() considering it's already been initialized correctly in
vertexResolver?
Actually it can be null.  Since users can implement their own vertex 
resolver, they are allowed to return null, per the javadoc:


/**
 * A vertex may have been removed, created zero or more times and had
 * zero or more messages sent to it.  This method will handle all situations
 * excluding the normal case (a vertex already exists and has zero or more
 * messages sent it to).
 *
 * @param vertexId Vertex id (can be used for {@link BasicVertex}'s
 *        initialize())
 * @param vertex Original vertex or null if none
 * @param vertexChanges Changes that happened to this vertex or null if none
 * @param messages messages received in the last superstep or null if none
 * @return Vertex to be returned, if null, and a vertex currently exists
 *         it will be removed
 */


Am I missing something or did I just realize that GIRAPH-95 is solved
by just removing that line? :)

Thanks

Well, not sure about that.  The set is done there I think to ensure 
safety.  Here's the issue:  Suppose that the resolve() doesn't set the 
vertex id correctly (i.e. in this partition).  That would be a bug and 
probably cause issues.  Probably this should be changed to be a check 
though.  Something like...


if (vertex != null) {
  if (!vertex.getVertexId().equals(vertexIndex)) {
    throw new IllegalStateException("BasicRPCCommunications: Illegal to " +
        "set the vertex index differently from " + vertexIndex);
  }
  partition.putVertex((BasicVertex) vertex);
} else if (originalVertex != null) {
  partition.removeVertex(originalVertex.getVertexId());
}

What do you think?

Avery


Re: some weird code

2012-01-08 Thread Avery Ching

Given our changes to Vertex, I think so. +1

Avery

On 1/8/12 8:28 AM, Claudio Martella wrote:

One thing about the VertexResolver. Doesn't it make more sense if the
interface is called VertexResolver and the default basic
implementation is called BasicVertexResolver?
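A minimal sketch of the naming proposed here, combined with the resolver
contract quoted in this thread (a null result means the vertex is removed) and
an id check in place of setVertexId(). Everything below is hypothetical:
VertexResolverSketch, BasicVertexResolverSketch, SimpleVertexSketch, and
PartitionSketch are invented names, and the default policy (drop on a removal
request with no messages, create on the first message) is an assumed example,
not Giraph's actual behavior.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical vertex reduced to its id. */
final class SimpleVertexSketch {
  final int id;

  SimpleVertexSketch(int id) {
    this.id = id;
  }
}

/** The interface: returning null means "no vertex" (remove it if present). */
interface VertexResolverSketch {
  SimpleVertexSketch resolve(int vertexId, SimpleVertexSketch original,
      boolean removalRequested, boolean hasMessages);
}

/** The default implementation, with an assumed example policy. */
final class BasicVertexResolverSketch implements VertexResolverSketch {
  @Override
  public SimpleVertexSketch resolve(int vertexId, SimpleVertexSketch original,
      boolean removalRequested, boolean hasMessages) {
    if (removalRequested && !hasMessages) {
      return null; // drop the vertex
    }
    if (original == null && hasMessages) {
      return new SimpleVertexSketch(vertexId); // create it on its first message
    }
    return original;
  }
}

/** Applies resolver results, checking the id instead of overwriting it. */
final class PartitionSketch {
  final Map<Integer, SimpleVertexSketch> vertices = new HashMap<>();

  void apply(int vertexIndex, SimpleVertexSketch original,
      SimpleVertexSketch resolved) {
    if (resolved != null) {
      if (resolved.id != vertexIndex) {
        throw new IllegalStateException(
            "Illegal to set the vertex index differently from " + vertexIndex);
      }
      vertices.put(vertexIndex, resolved);
    } else if (original != null) {
      vertices.remove(original.id);
    }
  }
}
```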

On Fri, Jan 6, 2012 at 10:19 PM, Claudio Martella
  wrote:

Hi Avery,
sorry, I forgot the resolver was exported to user space. I'll consider this. About
your idea, it makes sense, although I somehow believe that if user space
messes up it's not our fault. Your solution, though, makes everybody happy.
I will implement this and send the separate patch. Thanks


On Friday, January 6, 2012, Avery Ching  wrote:

Hi Claudio, answers inline:

On 1/6/12 8:25 AM, Claudio Martella wrote:

Hello,

I hope somebody can shed some light on a piece of code I'm looking at
while working on GIRAPH-45 (this code is also the subject of
GIRAPH-95, so we'd probably kill two birds with one stone here).

The code is taking care of vertex resolving in
BasicRPCCommunication::prepareSuperstep():
[line 1091]:
if (vertex != null) {
  ((MutableVertex) vertex).setVertexId(vertexIndex);
  partition.putVertex((BasicVertex) vertex);
} else if (originalVertex != null) {
  partition.removeVertex(originalVertex.getVertexId());
}

First, vertex cannot be null as it's resolved by vertexResolver, but I
guess it's a sanity check. But the real question is: why would you
call setVertexId() considering it's already been initialized correctly in
vertexResolver?

Actually it can be null.  Since users can implement their own vertex
resolver, they are allowed to return null, per the javadoc:

/**
 * A vertex may have been removed, created zero or more times and had
 * zero or more messages sent to it.  This method will handle all situations
 * excluding the normal case (a vertex already exists and has zero or more
 * messages sent it to).
 *
 * @param vertexId Vertex id (can be used for {@link BasicVertex}'s
 *        initialize())
 * @param vertex Original vertex or null if none
 * @param vertexChanges Changes that happened to this vertex or null if none
 * @param messages messages received in the last superstep or null if none
 * @return Vertex to be returned, if null, and a vertex currently exists
 *         it will be removed
 */


Am I missing something or did I just realize that GIRAPH-95 is solved
by just removing that line? :)

Thanks


Well, not sure about that.  The set is done there I think to ensure
safety.  Here's the issue:  Suppose that the resolve() doesn't set the
vertex id correctly (i.e. in this partition).  That would be a bug and
probably cause issues.  Probably this should be changed to be a check
though.  Something like...

if (vertex != null) {
  if (!vertex.getVertexId().equals(vertexIndex)) {
    throw new IllegalStateException("BasicRPCCommunications: Illegal to " +
        "set the vertex index differently from " + vertexIndex);
  }
  partition.putVertex((BasicVertex) vertex);
} else if (originalVertex != null) {
  partition.removeVertex(originalVertex.getVertexId());
}

What do you think?

Avery


--
Claudio Martella
claudio.marte...@gmail.com







Re: on the semantics of the combiner

2012-01-09 Thread Avery Ching

The javadoc for VertexCombiner#combine() is

  /**
   * Combines message values for a particular vertex index.
   *
   * @param vertexIndex Index of the vertex getting these messages
   * @param msgList List of the messages to be combined
   * @return Message that is combined from {@link MsgList} or null if no
   * message is to be sent
   * @throws IOException
   */

I think we are somewhat vague on what a combiner can return to support 
various use cases.  A combiner should be specific to a particular 
compute() algorithm.  I think it should be legal to return null from a 
combiner; in that case, no message should be sent to that vertex.


It seems like it would be an overhead to call a combiner when there are 
0 messages.  I can't see a case where that would be useful.  Perhaps we 
should change the javadoc to ensure that msgList contains at least one 
message whenever combine() is called.
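As a sketch of the contract proposed here (combine() is only invoked with at least one message, and a null return means "send nothing"), the framework side and a sum combiner might look like the following. Combiner, SumCombiner and deliver() are hypothetical names for illustration, not Giraph's VertexCombiner API, and a plain List stands in for MsgList.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Sketch of the proposed combiner contract. All names are hypothetical;
 * this is not Giraph's VertexCombiner interface.
 */
final class CombinerContractSketch {

  /** Hypothetical single-message combiner. */
  interface Combiner<M> {
    /** Returns the combined message, or null if nothing should be sent. */
    M combine(List<M> messages);
  }

  /** Sums doubles; the caller guarantees messages is non-empty. */
  static final class SumCombiner implements Combiner<Double> {
    @Override
    public Double combine(List<Double> messages) {
      double sum = 0.0;
      for (double m : messages) {
        sum += m;
      }
      return sum;
    }
  }

  /** Framework side: never calls the combiner with zero messages. */
  static <M> List<M> deliver(List<M> messages, Combiner<M> combiner) {
    if (combiner == null || messages.isEmpty()) {
      return messages; // nothing to combine; an empty list stays empty
    }
    M combined = combiner.combine(messages);
    if (combined == null) {
      return Collections.emptyList(); // combiner asked for no message
    }
    List<M> result = new ArrayList<>(1);
    result.add(combined);
    return result;
  }
}
```

With this split, SumCombiner never has to decide what "the sum of no values" should mean, because the empty case is handled before it is called.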


Avery

On 1/9/12 5:37 AM, Claudio Martella wrote:

Hi Sebastian,

yes, that was my point, I agree completely with you.
Fixing my test was not the issue; my question was whether we want to
define the semantics of this scenario explicitly.
Personally, I believe the combiner should be ready to receive 0
messages, as is the case for BasicVertex::initialize(), putMessages()
and compute(), and act accordingly.

In this particular example, I believe SimpleSumCombiner is buggy.
It's true that the sum of no values is 0, but it's also true that the
null-return semantics of combine() is more suitable for this exact
situation.


On Mon, Jan 9, 2012 at 2:21 PM, Sebastian Schelter  wrote:

I think we currently implicitly assume that there is at least one
element in the Iterable passed to the combiner. The messaging code
invokes the combiner only if at least one message for the target vertex
has been sent.

However, we should not rely on implicit implementation details but
explicitly specify the semantics of combiners.

--sebastian

On 09.01.2012 13:29, Claudio Martella wrote:

Hello list,

for GIRAPH-45 I'm touching the incoming messages and hit an
interesting problem with the combiner semantics.
Currently, my code fails testBspCombiner for the following reason:

SimpleSumCombiner::combine() returns a value even if there are no
messages in the iterator (in this case it returns 0), and for this
reason the vertices get activated at every superstep.

At each superstep, under the hood, I pass the combiner an Iterable
with each vertex's messages, which can be empty:

 public Iterable<M> getMessages(I vertexId) {
   Iterable<M> messages = inMessages.getMessages(vertexId);
   if (combiner != null) {
     M combinedMsg;
     try {
       // Note: combine() is called even when messages is empty.
       combinedMsg = combiner.combine(vertexId, messages);
     } catch (IOException e) {
       throw new RuntimeException("could not combine", e);
     }
     if (combinedMsg != null) {
       List<M> tmp = new ArrayList<M>(1);
       tmp.add(combinedMsg);
       messages = tmp;
     } else {
       // A null result means no message is delivered to this vertex.
       messages = new ArrayList<M>(0);
     }
   }
   return messages;
 }

The Iterable returned by this method is passed to
basicVertex.putMessages() right before compute().
Now, the question is: who's wrong? The combiner code that returns a
sum of 0 over no values, or the framework that calls the combiner with
0 messages?










Re: on the semantics of the combiner

2012-01-09 Thread Avery Ching
Combiners should be commutative and associative.  In my opinion that 
means reducing to a single message or none at all.  Can you think of a 
case when more than 1 message should be returned from a combiner?  I 
know that returning null isn't preferable in general, but I think that 
functionality (returning no messages), is nice to have and isn't a huge 
amount of work on our side.


Avery

On 1/9/12 12:13 PM, Claudio Martella wrote:

To clarify, I was not discussing the possibility for combine() to return
null. I see why it would be useful: given that combine() returns M,
there's no other way for a combiner to ask not to send any message.
That said, I agree with Jakob: I also believe returning null should be
avoided except, roughly, as an initial value for a reference/pointer.
Perhaps we could (I'm just thinking out loud here) let combine()
return an Iterable, basically letting it define what to combine to
({0, 1, k} messages). It would be a powerful extension to the model,
but maybe it's too much.

As for the size of the messages parameter, I agree with you that 0
messages gives nothing to combine and would be somewhat awkward; it
was more a matter of keeping it in sync with the other methods that
take a messages parameter.
Probably a clearer javadoc will do the job here.

What do you think?

On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan  wrote:

I'm not a big fan of returning null, as it adds extra complexity to the
calling code (null checks, or not, since people usually forget
them).  Avery is correct that combiners are application specific.  Is
it conceivable that one would want to write a combiner that returned
something for an input of no parameters, i.e. combining the empty list
doesn't return the empty list?  I imagine for most combiners,
combining a single message would result in that message.


Re: on the semantics of the combiner

2012-01-09 Thread Avery Ching
I agree that C&A doesn't require it, however, I can't think of why I 
would want to use a combiner to expand the number of messages.  Can you?


Avery

On 1/9/12 3:57 PM, Jakob Homan wrote:

In my opinion that means reducing to a single message or none at all.

C&A doesn't require this, however.  Hadoop's combiner interface, for
instance, doesn't require a single  or no value to be returned; it has
the same interface as a reducer, zero or more values.  Would adapting
the semantics of Giraph's combiner to return a list of messages
(possibly empty) make it more useful?

On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
  wrote:

Yes, what you say is completely reasonable, you convinced me :)


Re: on the semantics of the combiner

2012-01-10 Thread Avery Ching
The general idea of combiners is to reduce the number of messages sent.  
Combiners are purely an optimization and the application should work 
correctly without it (since it's never guaranteed to actually be 
called).  Combiners can only modify the messages sent to a single 
vertex, so they can't send messages to other vertices.  Any other work 
(i.e. sending messages) should be done by the vertex in the compute() 
method.
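One way to see why an application must not depend on the combiner running: if some sending workers combine their outgoing batch for a vertex and others do not, a vertex that sums its messages still gets the same total, as long as the combiner is commutative and associative. The simulation below is a hypothetical plain-Java sketch; none of the names are Giraph classes.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: a sum combiner is purely an optimization. Whether it runs on
 * none, some, or all sending workers, the receiving vertex computes the
 * same total. All names are hypothetical.
 */
final class OptionalCombinerSketch {

  private static double sum(List<Double> values) {
    double total = 0.0;
    for (double v : values) {
      total += v;
    }
    return total;
  }

  /** Hypothetical sum combiner: one or more messages in, exactly one out. */
  static double combine(List<Double> messages) {
    return sum(messages);
  }

  /** Vertex-side logic: sums whatever arrives, combined or not. */
  static double compute(List<Double> incoming) {
    return sum(incoming);
  }

  /**
   * Simulates delivery to one vertex, where worker w runs the combiner on
   * its outgoing batch only if combinerRuns[w] is true.
   */
  static List<Double> deliver(List<List<Double>> batchesPerWorker,
                              boolean[] combinerRuns) {
    List<Double> received = new ArrayList<>();
    for (int w = 0; w < batchesPerWorker.size(); w++) {
      List<Double> batch = batchesPerWorker.get(w);
      if (combinerRuns[w] && !batch.isEmpty()) {
        received.add(combine(batch));
      } else {
        received.addAll(batch);
      }
    }
    return received;
  }
}
```

Only the number of delivered messages changes, not the result of compute(), which is exactly the property that lets the framework skip the combiner whenever it likes.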


While I think that grouping behavior could actually be implemented 
within a message object (still reducing the number of messages to 1 or 
0), I suppose that in some simple cases (i.e. grouping) it might be 
easier to do it in the combiner, as you both have mentioned?  The only 
thing I'm concerned about is letting users do something that is not 
optimal.  Generally, expanding messages is not what you want your 
combiner to do.  Also, since grouping behavior can be implemented in the 
message object, restricting the combiner to 0 or 1 messages keeps users 
from shooting themselves in the foot.


Good discussion (it's making me really think about this)!

Avery

On 1/10/12 10:32 AM, Claudio Martella wrote:

Ok, now I see where you're going. I guess the thing here is that
the combiner would act on behalf of D, and to do so concretely it
would probably need some local data related to D (edge values?
vertex value?).
I also think that k > n is possible in principle, and we could let
the user decide whether to use this power or not, once/if we agree
that letting the user send k messages from the combiner is useful (and
the grouping behavior shown by the label propagation example should
show that).

On Tue, Jan 10, 2012 at 7:04 PM, Jakob Homan  wrote:

Those two messages would have gone to D, been expanded to, say, 4,
which would then have been sent to, say, M.  This would save
sending the two to D and send the 4 directly to M.  I'm not saying
it's a great example, but it is legal.  This of course assumes
that combiners can generate messages bound for vertices other than the
original destination, which I don't know has even been
discussed.

On Tue, Jan 10, 2012 at 9:49 AM, Claudio Martella
  wrote:

I'm not sure I understand what you'd save here. If the two messages
were going to be expanded to k messages on the destination worker D,
but you expand them on W, you end up sending k messages instead of 2.
Right?

On Tue, Jan 10, 2012 at 6:26 PM, Jakob Homan  wrote:

it doesn't have to expand: k, the number of elements returned by
the combiner, can still be smaller than n,

Right.  Grouping would be the most common case.  It would be possible
for k to be greater than n, as well.  For instance, consider two messages,
both generated on the same worker (W) by two different vertices,
both bound for another vertex, Z.  A combiner on W could get both of
these messages, do some work on them, as it would have knowledge of
both, and generate some arbitrary number of messages bound for other
vertices (thus saving the shuffle/transfer of the original messages).


On Tue, Jan 10, 2012 at 12:08 AM, Claudio Martella
  wrote:

It doesn't have to expand: k, the number of elements returned by
the combiner, can still be smaller than n, the size of the messages
parameter. As a first example, imagine your vertex receiving
semantically different classes/types of messages, and wanting to
summarize them into different messages, e.g. if your messages come
with labels, or are simply grouped by source vertex, if the algorithm
requires it. Think of label propagation, just as an example, or some
sort of labeled PageRank.
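The grouping case (k < n) can be illustrated with a combiner that returns one message per distinct label, as in label propagation. This assumes the combiner is allowed to return a list rather than a single message, which is only a proposal in this thread; LabelMsg and combineByLabel are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of a grouping combiner: n labeled messages in, one message per
 * distinct label out (k <= n). Assumes a list-returning combiner, which
 * Giraph's combiner does not currently support; names are hypothetical.
 */
final class LabelGroupingSketch {

  /** Hypothetical message carrying a label and a weight. */
  static final class LabelMsg {
    final String label;
    final double weight;

    LabelMsg(String label, double weight) {
      this.label = label;
      this.weight = weight;
    }
  }

  static List<LabelMsg> combineByLabel(List<LabelMsg> messages) {
    // LinkedHashMap keeps labels in first-seen order, so output is deterministic.
    Map<String, Double> totals = new LinkedHashMap<>();
    for (LabelMsg m : messages) {
      totals.merge(m.label, m.weight, Double::sum);
    }
    List<LabelMsg> combined = new ArrayList<>(totals.size());
    for (Map.Entry<String, Double> e : totals.entrySet()) {
      combined.add(new LabelMsg(e.getKey(), e.getValue()));
    }
    return combined;
  }
}
```

The output can never be larger than the input, since each distinct label yields exactly one message.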


Re: Fwd: Call for Submission Berlin Buzzwords 2012 - http://berlinbuzzwords.de

2012-01-11 Thread Avery Ching
I can't make it, but would be willing to help with slides if necessary.  
It would be great if someone could talk about Giraph.


Avery

On 1/11/12 12:26 PM, Sebastian Schelter wrote:

Forwarding Simon's call for Berlin Buzzwords.

Does anybody plan to give a talk about Giraph at Buzzwords? I'll
definitely be at the conference as I'm living in Berlin. We should
also try to organize a Giraph meeting in the evening maybe together
with the Mahout people.

Best,
Sebastian


-- Forwarded message --
From: Simon Willnauer
Date: 2012/1/11
Subject: Call for Submission Berlin Buzzwords 2012 -
http://berlinbuzzwords.de
To: java-user, d...@lucene.apache.org,
solr-u...@lucene.apache.org, mahout-...@lucene.apache.org,
lucy-...@incubator.apache.org, lucy-u...@incubator.apache.org,
mapreduce-u...@hadoop.apache.org, hdfs-u...@hadoop.apache.org,
hdfs-...@hadoop.apache.org, mapreduce-...@hadoop.apache.org,
gene...@lucene.apache.org


Call for Submission Berlin Buzzwords 2012 - Search, Store, Scale --
June 4/5, 2012

The event will comprise presentations on scalable data processing. We
invite you to submit talks on the topics:
  * IR / Search - Lucene, Solr, katta, ElasticSearch or comparable solutions
  * NoSQL - like CouchDB, MongoDB, Jackrabbit, HBase and others
  * Hadoop - Hadoop itself, MapReduce, Cascading or Pig and relatives

Related topics not explicitly listed above are more than welcome. We are
looking for presentations on the implementation of the systems
themselves, technical talks, real world applications and case studies.

Important Dates (all dates in GMT +2)
  * Submission deadline: March 11th, 2012, 23:59 MEZ
  * Notification of accepted speakers: April 6th, 2012, MEZ
  * Publication of final schedule: April 13th, 2012
  * Conference: June 4/5, 2012

High quality, technical submissions are called for, ranging from
principles to practice. We are looking for real world use cases,
background on the architecture of specific projects and a deep dive
into architectures built on top of e.g. Hadoop clusters.

To submit your proposal, please register on our website [1] and log in
[2] once you have received the confirmation email. Once this is done you
can submit your proposal here [3]; please do so no later than March
11th, 2012. Acceptance notifications will be sent out soon after the
submission deadline. Please include your name, bio and email, the
title of the talk, and a brief abstract in English. Please
indicate whether you want to give a lightning (10min), short (20min)
or long (40min) presentation and indicate the level of experience with
the topic your audience should have (e.g. whether your talk will be
suitable for newbies or is targeted at experienced users). If you'd
like to pitch your brand new product in your talk, please let us know
as well; there will be extra space for presenting new ideas, awesome
products and great new projects.

The presentation format is short. We will be enforcing the schedule rigorously.

If you are interested in sponsoring the event (e.g. we would be happy
to provide videos after the event, free drinks for attendees as well
as an after-show party), please contact us.

Follow @berlinbuzzwords on Twitter for updates. Tickets, news on the
conference, and the final schedule will be published at
http://berlinbuzzwords.de.

Program Committee Chairs:

  *  Isabel Drost (Nokia & Apache Mahout)
  *  Jan Lehnardt (CouchBase & Apache CouchDB)
  *  Simon Willnauer (SearchWorkings & Apache Lucene)
  *  Grant Ingersoll (Lucid Imagination & Apache Lucene)
  *  Owen O’Malley (Yahoo Inc. & Apache Hadoop)
  *  Jim Webber (Neo Technology & Neo4j)
  *  Sean Treadway (Soundcloud)


Please re-distribute this CfP to people who might be interested.

Contact us at:

newthinking communications GmbH
Schönhauser Allee 6/7
10119 Berlin,
Germany
Julia Gemählich
Isabel Drost
Simon Willnauer
  +49(0)30-9210 596

[1] http://berlinbuzzwords.de/user/register
[2] http://berlinbuzzwords.de/user
[3] http://berlinbuzzwords.de/node/add/session




Re: why we should remove implicit vertex creation

2012-01-12 Thread Avery Ching

Claudio,

You are right that vertices are created automatically when messages are 
sent to non-existent vertices.  But that behavior can be made 
application specific.  The default resolution of mutations/messages is 
VertexResolver.  But you are always welcome to implement your own 
application specific behavior.  For instance, you might just want to 
drop the message.  If there is a simultaneous create/delete, you may 
want to always create.  You have the power to implement any behavior you 
want by setting the vertex resolver (see 
GiraphJob#setVertexResolverClass()).
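To illustrate the kind of application-specific policy described above (drop messages sent to missing vertices; let a create win over a simultaneous delete), here is a self-contained sketch of just the decision logic. It deliberately does not use Giraph's VertexResolver signature; the boolean inputs and the Outcome enum are hypothetical simplifications of the vertex, mutation and message arguments.

```java
/**
 * Sketch of an application-specific resolution policy. Hypothetical
 * types only; this is not Giraph's VertexResolver interface.
 */
final class DropMessagesResolverSketch {

  enum Outcome { KEEP, CREATE, REMOVE, DROP_MESSAGES, NOTHING }

  /**
   * @param exists whether the vertex currently exists
   * @param addRequested whether an explicit add-vertex request arrived
   * @param removeRequested whether an explicit remove-vertex request arrived
   * @param messageCount number of messages sent to this vertex id
   */
  static Outcome resolve(boolean exists, boolean addRequested,
                         boolean removeRequested, int messageCount) {
    if (addRequested) {
      // Simultaneous create/delete: creation always wins under this policy.
      return exists ? Outcome.KEEP : Outcome.CREATE;
    }
    if (exists) {
      return removeRequested ? Outcome.REMOVE : Outcome.KEEP;
    }
    // No vertex and no explicit create: drop the messages instead of
    // creating the vertex implicitly.
    return messageCount > 0 ? Outcome.DROP_MESSAGES : Outcome.NOTHING;
  }
}
```

A different application could flip either rule (e.g. let deletes win) without touching the rest of the framework, which is the point of making resolution pluggable.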


Hope this helps,

Avery

On 1/12/12 3:42 PM, Claudio Martella wrote:

Hello Giraphers,

I have a few comments about the current design of Giraph regarding the
implicit creation of vertices.
As it's currently designed, if you send a message to a non-existent
vertex, Giraph creates it for you.
Although I can understand it can come in handy, as it allows for lazy
dataset creation, I think it comes at some cost and I believe this
cost is bigger than the advantage:

1) it overlaps the mutation API, where a vertex can be created
explicitly when the semantics of the algorithm require it, with
knowledge about what's going on and with explicit state. This is an
ambiguous and unclear part of the API which is difficult for me to
justify and probably confusing for the user too. Which brings me to
the second point.

2) it requires a different, and partially duplicate, code path for
mutations and implicit vertex creation in our code, as is clear from
looking at BasicRPCCommunications and as I experienced myself (see
the email I recently sent to the list). Which brings me to the third
point.

3) in order to manage this, for every message we sooner or later have
to look up the worker's vertex set to see whether the vertex exists and
whether it should be implicitly created. This is computationally
expensive whether you have a HashMap or a TreeMap for range
partitioning. Also, if we're going to create more exotic partitioning
(topology-partitioning?), we're going to hit the problem even more.

In general, I don't know any graph API that doesn't require either
listing the vertex set explicitly at load time or creating vertices
explicitly through the API. As I said, I understand it allows for lazy
creation of the input file, with some vertices not explicitly listed
(missing as a source vertex but existing as an endpoint of an edge),
but this could really be fixed robustly by a single MapReduce job.
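The preprocessing step suggested here can be sketched in memory: every edge target that is not listed as a source vertex gets an explicit entry with no out-edges. This only illustrates the logic, assuming a Long-keyed adjacency map; an actual job would implement it as a MapReduce pass (e.g. the map emits each source id and each target id, the reduce keeps one record per id), which is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * In-memory sketch of a one-off pass that makes the vertex set explicit.
 * A stand-in for the MapReduce job, not an implementation of it.
 */
final class CompleteVertexSetSketch {

  static Map<Long, List<Long>> addMissingVertices(Map<Long, List<Long>> adjacency) {
    // Copy the input so the caller's map is left untouched.
    Map<Long, List<Long>> complete = new TreeMap<>();
    for (Map.Entry<Long, List<Long>> e : adjacency.entrySet()) {
      complete.put(e.getKey(), new ArrayList<>(e.getValue()));
    }
    for (List<Long> targets : adjacency.values()) {
      for (Long target : targets) {
        // Vertices that only appear as edge endpoints get an empty edge list.
        complete.putIfAbsent(target, new ArrayList<>());
      }
    }
    return complete;
  }
}
```

After such a pass, every message target already exists at load time, so the per-message existence check described in point 3 would no longer be needed for implicit creation.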

What do you guys think?





Re: why we should remove implicit vertex creation

2012-01-13 Thread Avery Ching

Claudio,

What are you advocating in particular?

Graph mutation should be allowed (i.e. adding vertices).  We allow this 
to happen through the addVertexReq() interface and through the 
VertexResolver implementation (say for messages to non-existent 
vertices).  I can see why this would be useful.  Imagine you are 
computing page rank on the web graph, but you only have a subset of the 
sites, but all the outlinks for each site.  It is nice to be able to 
allow new vertices (sites) while running the application.


I agree that the way that vertices are created and initialized is a bit 
vague.  We can work on improving the interfaces if anyone has suggestions.


Avery

On 1/13/12 12:10 AM, Claudio Martella wrote:

Hi Avery,

thanks for your feedback. I know that users can decide to drop this
behavior, but to me this doesn't mean that those three points no
longer hold.









Re: on the semantics of the combiner

2012-01-13 Thread Avery Ching

+1

I'm fine with this.  If we agree to return an Iterable, then we should 
make sure to either throw if the size of the Iterable > messages.size() 
or at the very least LOG.warn("This combiner is likely to be implemented 
wrong").  I prefer an exception, since we have no use case for expanding 
the set of messages.


Also, I'd like to have something in the javadoc saying something like 
"While the number of messages returned can be equal to the number of 
input messages, the purpose of the combiner is to reduce the number of 
messages from the input."
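A minimal sketch of the list-returning variant being voted on, with the size check preferred above (throwing when a combiner returns more messages than it received). ListCombiner and applyCombiner are hypothetical names; a List is used instead of an Iterable so the size can be checked directly, and rejecting a null result is an added assumption, since under this proposal an empty list already means "no messages".

```java
import java.util.List;

/**
 * Sketch of a list-returning combiner with an output-size check.
 * Hypothetical names; not Giraph's API.
 */
final class ListCombinerSketch {

  interface ListCombiner<M> {
    /** Returns the combined messages; may be empty, must not be null. */
    List<M> combine(List<M> messages);
  }

  static <M> List<M> applyCombiner(ListCombiner<M> combiner, List<M> messages) {
    List<M> combined = combiner.combine(messages);
    if (combined == null) {
      throw new IllegalStateException(
          "combine() must return a list, possibly empty");
    }
    if (combined.size() > messages.size()) {
      // Expanding the message set defeats the purpose of a combiner.
      throw new IllegalStateException("Combiner returned " + combined.size()
          + " messages for " + messages.size() + " inputs");
    }
    return combined;
  }
}
```

Swapping the exception for a LOG.warn would be the softer alternative mentioned above; the exception makes a misbehaving combiner visible immediately.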


Avery

On 1/13/12 9:34 AM, Claudio Martella wrote:

Ok,

I guess we can vote then about this, what do you think?
Shall we take 72h?

I'm +1 for returning an iterable that can be empty.
I'm +1 for the size of the returned iterable to be <= messages.size()


On Tue, Jan 10, 2012 at 9:48 PM, Sebastian Schelter  wrote:

I think we should make the combiner return a list/iterable that can
potentially be empty. However, we should assume that the number of
elements returned is smaller than or equal to the number of input
elements (what's the use of a combiner if this is not given?). I also
concur that the code should not depend on the combiner being applied
(similar to the way combiners work in Hadoop).

--sebastian

2012/1/10 Jakob Homan:

A composite object would essentially be a wrapper around a list and
introduce the need for all vertices to be ready to extract that list
at all times.  For instance, a combiner passed 10 messages may be able
to combine 7 of them but do nothing with the other three, leaving four
messages.  If we allow zero or one return elements, the combiner would
have to create a composite object with a list of those four messages,
whereas if we return a list, it just skips that step and returns the
four messages.  Additionally, the receiving vertex would have to
handle the possibility of a composite object every time even though
the combiner may or may not have been run during the superstep, or
even included in that job (since combiners are optional to the job
itself).  It would be better if one could write a Giraph application
that was completely agnostic of whether or not a combiner was
included.

On Tue, Jan 10, 2012 at 12:00 PM, Claudio Martella
  wrote:

I believe the argument of not letting users shoot themselves in the foot
doesn't stand :) Once you give them any API they have the power to do
anything wrong, as they already can with Giraph (or anything else, for
that matter), by designing an algorithm wrongly (which is what a wrong
combiner would turn out to be). It's definitely true that a composite
object (e.g. wrapping a List) would make the grouping possible, but I
thought we were talking about simplifying life for users :). I think it
would be more flexible (for the present and for the future) and also
more elegant, but not necessarily a must (although it'd come
practically for free).

Very cool discussion.

On Tue, Jan 10, 2012 at 8:30 PM, Jakob Homan  wrote:

Combiners can only modify the messages sent to a single vertex, so they can't 
send messages to other vertices.

Yeah, the more I've thought about this, the more problematic it would
be.  These new messages may be generated upon arrival at the
destination vertex (since combiners can be run on the receiving vertex
before processing as well).  When would they be forwarded to their new
destinations at that point?  It would be possible to get into a
feedback loop of messages jumping around before a superstep could ever
actually be done.

That being said, our inability to think of a good application doesn't
mean there won't be one in the future, and it's probably better to be
more flexible than to try to impose what appears optimal now.  The
benefit of forcing 0 or 1 message from a combiner seems less than the
flexibility of allowing another list of messages (which may have the
same number of elements as the original, fewer, or even more).


Good discussion (it's making me really think about this)!

Agreed.


On Tue, Jan 10, 2012 at 11:23 AM, Avery Ching  wrote:

The general idea of combiners is to reduce the number of messages sent.
Combiners are purely an optimization and the application should work
correctly without them (since they are never guaranteed to actually be called).
Combiners can only modify the messages sent to a single vertex, so they
can't send messages to other vertices.  Any other work (i.e. sending
messages) should be done by the vertex in the compute() method.
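The "purely an optimization" property can be made concrete with a short sketch. The method names (`compute`, `combineMin`) are hypothetical and do not come from Giraph's API. A min-style vertex computation must produce the same value whether or not the optional combiner ran on its messages.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: a vertex-side computation plus an optional min combiner.
// Because the combiner may or may not run, compute() must give the same
// answer for the raw messages and for the combined ones.
final class OptionalCombinerDemo {
  private OptionalCombinerDemo() {}

  // Vertex logic: new value is the minimum of the current value and all
  // incoming messages (as in a shortest-paths style computation).
  static int compute(int currentValue, Iterable<Integer> messages) {
    int best = currentValue;
    for (int m : messages) {
      best = Math.min(best, m);
    }
    return best;
  }

  // Optional combiner: collapses the messages to at most one.
  static List<Integer> combineMin(List<Integer> messages) {
    if (messages.isEmpty()) {
      return new ArrayList<>();
    }
    return Collections.singletonList(Collections.min(messages));
  }
}
```

A combiner that cannot satisfy this equality (for example, one whose output changes what compute() decides) is not a safe optimization.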

While I think that grouping behavior could actually be implemented within a
message object (still reducing the number of messages to 1 or 0), I suppose
that in some simple cases (e.g. grouping), it might be easier to do it in
the combiner, as you both have mentioned?  The only thing I'm
concerned about is letting users do something that is not optimal.
Generally, expanding messages is not w

Re: why we should remove implicit vertex creation

2012-01-13 Thread Avery Ching

Inline responses.

Happy Friday,

Avery

On 1/13/12 10:51 AM, Claudio Martella wrote:

Hi Avery,

thanks for your feedback.

I'm advocating for allowing mutations only through the Mutable interface
methods. I agree that implicit vertex creation can come in handy, for the
reason you mentioned (which I called lazy input set creation), but you can
obtain the same through a simple single M/R
job run in advance.
I think this is pretty expensive (extra MR job).  Users do have this 
option, but I doubt many would take it when they don't have to.



What we win back is that we avoid the
computational cost and code complexity of checking whether the vertex
already exists for each message we get.
Checking if the vertex exists is pretty cheap in a hashmap (constant 
time).  We should verify that this is a real computational overhead (maybe 
with some profiling) before optimizing it.  I suppose we could add a switch 
to bypass any graph mutation in general.

You know what I mean?
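The per-message existence check and the switch to bypass graph mutation, both discussed above, can be sketched roughly as below. `MessageRouter` and its `allowImplicitCreation` flag are hypothetical. The check is a single `HashMap` lookup with expected constant time. When the switch is off, messages to unknown ids are dropped instead of creating vertices.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of routing incoming messages on one worker.
final class MessageRouter {
  private final Map<Long, List<String>> inboxes = new HashMap<>();
  private final boolean allowImplicitCreation;  // hypothetical switch
  private long dropped;

  MessageRouter(boolean allowImplicitCreation) {
    this.allowImplicitCreation = allowImplicitCreation;
  }

  void addVertex(long id) {
    inboxes.putIfAbsent(id, new ArrayList<>());
  }

  // Returns true if the message was delivered.
  boolean deliver(long targetId, String message) {
    List<String> inbox = inboxes.get(targetId);  // expected O(1) lookup
    if (inbox == null) {
      if (!allowImplicitCreation) {
        dropped++;  // mutation bypassed: message to unknown vertex is dropped
        return false;
      }
      inbox = new ArrayList<>();
      inboxes.put(targetId, inbox);  // implicit vertex creation
    }
    inbox.add(message);
    return true;
  }

  boolean hasVertex(long id) {
    return inboxes.containsKey(id);
  }

  long droppedCount() {
    return dropped;
  }
}
```

With a `TreeMap` (for range partitioning) the same lookup would cost O(log n) instead. Profiling, as suggested, would show whether either cost matters in practice.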

On Fri, Jan 13, 2012 at 7:44 PM, Avery Ching  wrote:

Claudio,

What are you advocating in particular?

Graph mutation should be allowed (i.e. adding vertices).  We allow this to
happen through the addVertexReq() interface and through the VertexResolver
implementation (say for messages to non-existent vertices).  I can see why
this would be useful.  Imagine you are computing PageRank on the web graph
and you only have a subset of the sites, but all the outlinks for each site.
It is nice to be able to add new vertices (sites) while running the
application.

I agree that the way that vertices are created and initialized is a bit
vague.  We can work on improving the interfaces if anyone has suggestions.

Avery


On 1/13/12 12:10 AM, Claudio Martella wrote:

Hi Avery,

thanks for your feedback. I know that users can decide to drop this
behavior, but this doesn't mean that those three points don't hold, to
me.

On Fri, Jan 13, 2012 at 8:35 AM, Avery Ching wrote:

Claudio,

You are right that vertices are created automatically when messages are
sent to non-existent vertices.  But that behavior can be made application
specific.  The default resolution of mutations/messages is handled by
VertexResolver, but you are always welcome to implement your own
application-specific behavior.  For instance, you might just want to drop
the message.  If there is a simultaneous create/delete, you may want to
always create.  You have the power to implement any behavior you want by
setting the vertex resolver (see GiraphJob#setVertexResolverClass()).
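The resolution policies mentioned above are: create the vertex on a message, drop the message, or always create on a simultaneous create/delete. They can be illustrated with a small decision function. This is a hypothetical sketch, not Giraph's `VertexResolver` API. `Outcome`, `ResolutionPolicy` and `ConfigurableResolver` are invented names.

```java
// Hypothetical outcomes for one vertex id at the end of a superstep.
enum Outcome { KEEP, CREATE, DELETE, DROP_MESSAGES, NOTHING }

// Hypothetical pluggable policy, in the spirit of plugging in a custom
// resolver class; not Giraph's interface.
interface ResolutionPolicy {
  Outcome resolve(boolean exists, boolean hasMessages,
                  boolean createRequested, boolean deleteRequested);
}

final class ConfigurableResolver implements ResolutionPolicy {
  private final boolean createOnMessage;  // implicit creation on/off
  private final boolean preferCreate;     // winner of a create/delete conflict

  ConfigurableResolver(boolean createOnMessage, boolean preferCreate) {
    this.createOnMessage = createOnMessage;
    this.preferCreate = preferCreate;
  }

  @Override
  public Outcome resolve(boolean exists, boolean hasMessages,
                         boolean createRequested, boolean deleteRequested) {
    if (createRequested && deleteRequested) {
      // Conflicting requests in the same superstep.
      if (preferCreate) {
        return exists ? Outcome.KEEP : Outcome.CREATE;
      }
      return exists ? Outcome.DELETE : Outcome.NOTHING;
    }
    if (deleteRequested) {
      return exists ? Outcome.DELETE : Outcome.NOTHING;
    }
    if (exists) {
      return Outcome.KEEP;
    }
    if (createRequested) {
      return Outcome.CREATE;
    }
    if (hasMessages) {
      // Message sent to a vertex that does not exist.
      return createOnMessage ? Outcome.CREATE : Outcome.DROP_MESSAGES;
    }
    return Outcome.NOTHING;
  }
}
```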

Hope this helps,

Avery


On 1/12/12 3:42 PM, Claudio Martella wrote:

Hello Giraphers,

I have a few comments about the current design of Giraph regarding the
implicit creation of vertices.
As it's currently designed, if you send a message to a non-existent
vertex, Giraph creates it for you.
Although I can understand it can come in handy as it allows for lazy
dataset creation, I think it comes at some cost and I believe this
cost is bigger than the advantage:

1) it overlaps the mutation API, where a vertex can be created
explicitly when the semantics of the algorithm require it, with
knowledge about what's going on and with explicit state. This is an
ambiguous and unclear part of the API which is difficult for me to
justify and probably confusing for the user too. Which brings me to
the second point.

2) it requires a different, and partially duplicate, code path for
mutations and implicit vertex creation in our code, as is clear from
looking at BasicRPCCommunications and as I experienced myself in the
email I recently sent to the list. Which brings me to the third point.

3) in order to manage this, for every message we have to hit, sooner
or later, the Worker vertices set to see if the vertex exists and
whether it should be implicitly created. This is computationally
expensive whether you have a HashMap or a TreeMap
for range partitioning. Also, if we're going to create more exotic
partitioning (topology-partitioning?), we're going to hit the problem
more.

In general, I don't know of any graph API that doesn't require you to
either list the vertex set explicitly at load time or create each vertex
explicitly through the API. As I said, I understand it allows for lazy
creation of the input file, with some vertices possibly not explicitly
listed (missing as a source vertex but existing as an endpoint for
an edge), but this could be fixed robustly by a single
MapReduce job.

What do you guys think?
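The preprocessing suggested above boils down to a set difference: find the ids that appear as edge endpoints but never as source vertices. In a single MapReduce job, the mappers could emit each source id and each edge target, and the reducers could keep ids that were only ever seen as targets. The in-memory Java sketch below uses a hypothetical `InputCompletion` class and shows only that logic, not the MapReduce plumbing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class InputCompletion {
  private InputCompletion() {}

  // Ids that appear only as edge endpoints, never as source vertices.
  static Set<Long> missingVertices(Map<Long, List<Long>> adjacency) {
    Set<Long> missing = new HashSet<>();
    for (List<Long> targets : adjacency.values()) {
      for (Long target : targets) {
        if (!adjacency.containsKey(target)) {
          missing.add(target);
        }
      }
    }
    return missing;
  }

  // Copy of the graph in which every endpoint is an explicit vertex;
  // previously missing vertices get an empty out-edge list.
  static Map<Long, List<Long>> complete(Map<Long, List<Long>> adjacency) {
    Map<Long, List<Long>> result = new HashMap<>();
    for (Map.Entry<Long, List<Long>> e : adjacency.entrySet()) {
      result.put(e.getKey(), new ArrayList<>(e.getValue()));
    }
    for (Long id : missingVertices(adjacency)) {
      result.put(id, new ArrayList<>());
    }
    return result;
  }
}
```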










Re: Blueprint Support

2012-01-17 Thread Avery Ching

Hi Jeff,

Thanks for pinging us.  While we don't have a JIRA for Blueprints, we 
definitely thought it would be a great idea in the past.  Arun Suresh 
(arun.sur...@gmail.com) looked into this briefly I believe (cc'ed 
here).  Hopefully he can give more details.


Avery


On 1/17/12 8:44 AM, Jeff G wrote:

Any plans to add TinkerPop Blueprints support to Giraph?  This would be a
huge gain for the graph development community when wanting to test and
upgrade to something like Hadoop and Giraph.
https://github.com/tinkerpop/blueprints/wiki/

Blueprints is a collection of interfaces, implementations, ouplementations,
and test suites for the property graph data model. Blueprints is analogous
to the JDBC, but for graph databases. As such, it provides a common set of
interfaces to allow developers to plug-and-play their graph database
backend. Moreover, software written atop Blueprints works over all
Blueprints-enabled graph databases.

- Jeff G





Re: [jira] [Created] (GIRAPH-127) Extending the API with a master.compute() function.

2012-01-19 Thread Avery Ching
Not sure if Semih is on the giraph-dev list.  Forwarding the question to 
him.


Avery

P.S.  Interesting idea if I understand correctly, attaching the compute 
functionality to an aggregator that the master will run between supersteps?


On 1/19/12 1:20 PM, Claudio Martella wrote:

Hi Semih,

Interesting email. I'm probably not getting your technique right, but
why wouldn't it be possible to run the master.compute() logic inside
an aggregator?

Not only *should* it be possible, but as aggregators are computed both
on workers AND on the master, you should have a faster computation.
For instance, you could aggregate the number of cut edges on each
worker and aggregate the total number on the master. The same could happen
for choosing the centroids.

On Thu, Jan 19, 2012 at 9:52 PM, Semih Salihoglu (Created) (JIRA)
  wrote:

Extending the API with a master.compute() function.
---

 Key: GIRAPH-127
 URL: https://issues.apache.org/jira/browse/GIRAPH-127
 Project: Giraph
  Issue Type: New Feature
  Components: bsp, examples, graph
Reporter: Semih Salihoglu


First of all, sorry for the long explanation to this feature.

I want to expand the API of Giraph with a new function called master.compute(), 
that would get called at the master before each superstep and I will try to 
explain the purpose that it would serve with an example. Let's say we want to 
implement the following simplified version of the k-means clustering algorithm. 
Pseudocode below:
  * Input: G(V, E), k, numEdgesThreshold, maxIterations
  * Algorithm:
  *   int numEdgesCrossingClusters = Integer.MAX_VALUE;
  *   int iterationNo = 0;
  *   while ((numEdgesCrossingClusters > numEdgesThreshold) && (iterationNo < maxIterations)) {
  *     iterationNo++;
  *     int[] clusterCenters = pickKClusterCenters(k, G);
  *     findClusterCenters(G, clusterCenters);
  *     numEdgesCrossingClusters = countNumEdgesCrossingClusters();
  *   }
The algorithm goes through the following steps in iterations:
1) Pick k random initial cluster centers
2) Assign each vertex to the cluster center that it's closest to (in Giraph, 
this can be implemented with message passing, similar to how ShortestPaths is 
implemented)
3) Count the number of edges crossing clusters
4) Go back to step 1 if there are a lot of edges crossing clusters and we 
haven't exceeded the maximum number of iterations yet.
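Steps 1 and 4 are the sequential parts that the proposed master.compute() would run between supersteps. Below is a minimal sketch of just that logic. The names (`KMeansMasterLogic`, `pickKClusterCenters`, `shouldContinue`) are hypothetical, and no Giraph API is involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch of the master-side sequential steps of the k-means example.
final class KMeansMasterLogic {
  private KMeansMasterLogic() {}

  // Step 1: pick k distinct random vertex ids as initial cluster centers.
  static List<Long> pickKClusterCenters(List<Long> vertexIds, int k, Random rng) {
    if (k < 0 || k > vertexIds.size()) {
      throw new IllegalArgumentException("pickKClusterCenters: invalid k " + k);
    }
    List<Long> shuffled = new ArrayList<>(vertexIds);
    Collections.shuffle(shuffled, rng);
    return new ArrayList<>(shuffled.subList(0, k));
  }

  // Step 4: run another iteration only if too many edges still cross
  // clusters and the iteration budget has not been used up.
  static boolean shouldContinue(long numEdgesCrossingClusters, long numEdgesThreshold,
                                int iterationNo, int maxIterations) {
    return numEdgesCrossingClusters > numEdgesThreshold && iterationNo < maxIterations;
  }
}
```

Starting numEdgesCrossingClusters at Integer.MAX_VALUE, as in the pseudocode, guarantees that the first call to shouldContinue() returns true whenever maxIterations is positive.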

In an algorithm like this, step 2 and 3 are where most of the work happens and 
both parts have very neat message-passing implementations. I'll try to give an 
overview without going into the details. Let's say we define a Vertex in Giraph 
to hold a custom Writable object that holds 2 integer values and sends 
messages with up to 2 integer values.
Step 2 is very similar to the ShortestPaths algorithm and has two stages: In the 
first stage, each vertex checks whether or not it's one of the cluster 
centers. If so, it assigns itself the value (id, 0); otherwise it assigns 
itself (Null, Null). In the 2nd stage, the vertices assign themselves to the 
minimum-distance cluster center by looking at their neighbors' (cluster center, 
distance) values (received as 2-integer messages) and their current values, and 
changing their values if they find a closer cluster center. This 
happens over x supersteps until every vertex converges.
Step 3, counting the number of edges crossing clusters, is also very easy to implement in Giraph. 
Once each vertex has a cluster center, the number of edges crossing clusters can be counted by an 
aggregator, let's say called "num-edges-crossing". It would again have two stages: First 
stage, every vertex just sends its cluster id to all its neighbors. Second stage, every vertex 
looks at its neighbors' cluster ids in the messages, and for each cluster id that is not equal to 
its own cluster id, it increments "num-edges-crossing" by 1.
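The two-stage crossing-edge count can be simulated in memory to check the logic. In this sketch (a hypothetical `CrossingEdgeCount` class), stage 1 delivers each vertex's cluster id along its out-edges. Stage 2 counts the received ids that differ from the receiver's own. A plain `long` stands in for the "num-edges-crossing" aggregator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CrossingEdgeCount {
  private CrossingEdgeCount() {}

  // Assumes every vertex id in outEdges (and every edge target) has an
  // entry in clusterOf.
  static long countCrossing(Map<Long, List<Long>> outEdges, Map<Long, Long> clusterOf) {
    // Stage 1: every vertex sends its cluster id along each out-edge.
    Map<Long, List<Long>> inbox = new HashMap<>();
    for (Map.Entry<Long, List<Long>> e : outEdges.entrySet()) {
      long myCluster = clusterOf.get(e.getKey());
      for (Long target : e.getValue()) {
        inbox.computeIfAbsent(target, t -> new ArrayList<>()).add(myCluster);
      }
    }
    // Stage 2: each receiver compares incoming cluster ids with its own and
    // adds the mismatches to the running total (the "aggregator").
    long numEdgesCrossing = 0;
    for (Map.Entry<Long, List<Long>> e : inbox.entrySet()) {
      long myCluster = clusterOf.get(e.getKey());
      for (long senderCluster : e.getValue()) {
        if (senderCluster != myCluster) {
          numEdgesCrossing++;
        }
      }
    }
    return numEdgesCrossing;
  }
}
```

If the graph is undirected and stored with an out-edge in each direction, every crossing edge is counted twice. The threshold in step 4 would need to account for that.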

The other 2 steps, step 1 and 4, are very simple sequential computations. Step 1 just 
picks k random vertex ids and puts them into an aggregator. Step 4 just compares 
"num-edges-crossing" against a threshold and also checks whether or not the 
algorithm has exceeded maxIterations (not supersteps but iterations of going through 
Steps 1-4). With the current API, it's not clear where to do these computations. There is 
a per-worker function preSuperstep() that can be implemented, but if we decide to pick a 
special worker, let's say worker 1, to pick the k vertices, then we'd waste an entire 
superstep where only worker 1 would do work (by picking k vertices in preSuperstep() 
and putting them into an aggregator), and all other workers would be idle. Trying to do this 
in worker 1 in postSuperstep() would not work either, because worker 1 needs to know that 
all the vertices have converged to understand that it's time to pick k vertices or it's 
time to check in step 4, which would only be available to it in the begi

Review Request: GIRAPH-128: RPC port from BasicRPCCommunications should be only a starting port, and retried

2012-01-23 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3596/
---

Review request for giraph.


Summary
---

Simple handling of port collisions on the same machine while preserving 
debuggability from the port number alone.  Round up the max number of workers to 
the next power of 10 and use it as the constant by which the port number is increased.
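The port arithmetic described in this summary can be sketched as follows. The method names are hypothetical. One assumption here is that the power of 10 is strictly greater than the worker count. For example, with a base port of 30000 and 25 workers, the stride is 100, so task 7 tries 30007, then 30107, then 30207. The last two digits still identify the task.

```java
final class RpcPortMath {
  private RpcPortMath() {}

  // Smallest power of 10 strictly greater than n (assumption: strictly
  // greater, so task ids 0..n never spill into the retry digits).
  static int nextPowerOfTen(int n) {
    if (n < 0) {
      throw new IllegalArgumentException("nextPowerOfTen: negative " + n);
    }
    int p = 1;
    while (p <= n) {
      p = Math.multiplyExact(p, 10);  // throws ArithmeticException on overflow
    }
    return p;
  }

  // Port for a task on a given retry attempt. With a round base port such
  // as 30000, the low digits show the task id and each retry adds one
  // more multiple of the power of 10.
  static int portFor(int basePort, int taskId, int attempt, int maxWorkers) {
    return basePort + taskId + attempt * nextPowerOfTen(maxWorkers);
  }
}
```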

Added a unit test to ensure it is working correctly.

Fixed 2 minor warnings in
src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java
src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java

by removing an unused 'import java.util.List'.


This addresses bug GIRAPH-128.
https://issues.apache.org/jira/browse/GIRAPH-128


Diffs
-

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1234970 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java
 1234970 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java
 1234970 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/comm/RPCCommunicationsTest.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/3596/diff


Testing
---

Passed local and MR unittests.


Thanks,

Avery



Re: Review Request: GIRAPH-124: Combiner should return Iterable instead of M or null.

2012-01-23 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3592/#review4538
---

Ship it!


Looks great!  One very minor thing.  Before you commit, can you please prefix 
the exception messages with "run: "? The other exception messages always start 
with the method name and then a colon.  You don't need to resubmit a review, 
just make those changes and please commit.  Thanks!


trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java


please prefix with "run: "



trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java


please prefix with "run: "



trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java


please prefix with "run: "



trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java


please prefix with "run: "


- Avery


On 2012-01-23 20:38:48, Claudio Martella wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/3592/
> ---
> 
> (Updated 2012-01-23 20:38:48)
> 
> 
> Review request for giraph.
> 
> 
> Summary
> ---
> 
> Fixed the null check on the returned value and the javadoc.
> 
> 
> Diffs
> -
> 
>   trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 
> 1234376 
>   trunk/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java 
> 1234376 
>   trunk/src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java 
> 1234376 
>   trunk/src/main/java/org/apache/giraph/graph/VertexCombiner.java 1234376 
>   trunk/src/test/java/org/apache/giraph/TestVertexTypes.java 1234376 
>   trunk/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java 
> 1234376 
> 
> Diff: https://reviews.apache.org/r/3592/diff
> 
> 
> Testing
> ---
> 
> local and MR unit tests
> 
> 
> Thanks,
> 
> Claudio
> 
>



Re: Review Request: GIRAPH-128: RPC port from BasicRPCCommunications should be only a starting port, and retried

2012-01-24 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3596/
---

(Updated 2012-01-24 21:53:06.906563)


Review request for giraph.


Changes
---

Updated after GIRAPH-124 was committed.


Summary
---

Simple handling of port collisions on the same machine while preserving 
debuggability from the port number alone.  Round up the max number of workers to 
the next power of 10 and use it as the constant by which the port number is increased.

Added a unit test to ensure it is working correctly.

Fixed 2 minor warnings in
src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java
src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java

by removing an unused 'import java.util.List'.


This addresses bug GIRAPH-128.
https://issues.apache.org/jira/browse/GIRAPH-128


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1235026 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java
 1235026 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java
 1235026 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/comm/RPCCommunicationsTest.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/3596/diff


Testing
---

Passed local and MR unittests.


Thanks,

Avery



Re: Review Request: GIRAPH-128: RPC port from BasicRPCCommunications should be only a starting port, and retried

2012-01-27 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3596/
---

(Updated 2012-01-28 01:15:26.114994)


Review request for giraph.


Changes
---

Removed whitespace changes for MinimumIntCombiner.java and 
SimpleSumCombiner.java and made GiraphJob.MAX_RPC_PORT_BIND_ATTEMPTS 
configurable, with a default of 20.
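A bounded bind-and-retry loop that uses the default of 20 attempts could look like the sketch below. `PortBinder`, `bindWithRetry` and `DEFAULT_MAX_BIND_ATTEMPTS` are hypothetical names. This is not the GiraphJob.MAX_RPC_PORT_BIND_ATTEMPTS code itself. It uses only standard `java.net.ServerSocket` calls.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

final class PortBinder {
  // Mirrors the default of 20 attempts; the constant name is hypothetical.
  static final int DEFAULT_MAX_BIND_ATTEMPTS = 20;
  private static final int MAX_PORT = 65535;

  private PortBinder() {}

  // Tries basePort + taskId, then adds portStride on each failed attempt.
  static ServerSocket bindWithRetry(int basePort, int taskId, int portStride,
                                    int maxAttempts) throws IOException {
    IOException lastFailure = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      int port = basePort + taskId + attempt * portStride;
      if (port > MAX_PORT) {
        break;  // no valid ports left to try
      }
      ServerSocket socket = new ServerSocket();  // created unbound
      try {
        socket.bind(new InetSocketAddress(port));  // wildcard address
        return socket;
      } catch (IOException e) {
        socket.close();
        lastFailure = e;  // typically a BindException: port already in use
      }
    }
    throw new IOException("bindWithRetry: no free port after " + maxAttempts
        + " attempts starting at " + (basePort + taskId), lastFailure);
  }
}
```

Making the attempt count configurable keeps a crowded machine from failing on the first collision. It also still bounds how far the port can drift from the value that identifies the task.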


Summary
---

Simple handling of port collisions on the same machine while preserving 
debuggability from the port number alone.  Round up the max number of workers to 
the next power of 10 and use it as the constant by which the port number is increased.

Added a unit test to ensure it is working correctly.

Fixed 2 minor warnings in
src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java
src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java

by removing an unused 'import java.util.List'.


This addresses bug GIRAPH-128.
https://issues.apache.org/jira/browse/GIRAPH-128


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1236935 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
 1236935 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/comm/RPCCommunicationsTest.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/3596/diff


Testing
---

Passed local and MR unittests.


Thanks,

Avery



Re: Review Request: GIRAPH-128: RPC port from BasicRPCCommunications should be only a starting port, and retried

2012-01-27 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3596/
---

(Updated 2012-01-28 03:01:01.809862)


Review request for giraph.


Changes
---

Addressed the mockito suggestion.


Summary
---

Simple handling of port collisions on the same machine while preserving 
debuggability from the port number alone.  Round up the max number of workers to 
the next power of 10 and use it as the constant by which the port number is increased.

Added a unit test to ensure it is working correctly.

Fixed 2 minor warnings in
src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java
src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java

by removing an unused 'import java.util.List'.


This addresses bug GIRAPH-128.
https://issues.apache.org/jira/browse/GIRAPH-128


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1236935 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
 1236935 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/comm/RPCCommunicationsTest.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/3596/diff


Testing
---

Passed local and MR unittests.


Thanks,

Avery



Re: svn commit: r1238775 - /incubator/giraph/branches/branch-0.1/

2012-01-31 Thread Avery Ching
Thanks again for doing the release Jakob.  It would be awesome if you 
could keep documentation on the steps you are taking so future releases 
will be easy.


Avery

On 1/31/12 11:54 AM, jgho...@apache.org wrote:

Author: jghoman
Date: Tue Jan 31 19:54:50 2012
New Revision: 1238775

URL: http://svn.apache.org/viewvc?rev=1238775&view=rev
Log:
Branching from trunk at r1238773 for 0.1 release.

Added:
 incubator/giraph/branches/branch-0.1/
   - copied from r1238774, incubator/giraph/trunk/





Re: [VOTE] Release Giraph 0.1-incubating (rc0)

2012-01-31 Thread Avery Ching
To address the issues of binaries, could we release multiple binaries of 
Giraph that coincide with the different versions of Hadoop?


On 1/31/12 7:44 PM, David Garcia wrote:

I think these concerns preclude the entire idea of a release.  A release
should be something that users can use as a dependency. . .like a maven
coordinate.  I think you guys should wait until you have made these
decisions. . .and then cut a binary.

On 1/31/12 5:36 PM, "Jakob Homan"  wrote:


Giraphers-
I've created a candidate for our first release. It's a source release
without a binary for two reasons: first, there's still discussion
going on about what needs to be done for the NOTICE and LICENSE files
for projects that bring in transitive dependencies to the binary
release
(http://www.mail-archive.com/general@incubator.apache.org/msg32693.html)
and second because we're still munging our binary against three types
of Hadoop, which would mean we'd need to release three different
binary artifacts, which seems suboptimal.  Hopefully both of these
issues will be addressed by 0.2.

I've tested the release against an unsecure 20.2 cluster.  It'd be
great to test it against other configurations.  Note that we're voting
on the tag; the files are provided as a convenience.

Release notes:
http://people.apache.org/~jghoman/giraph-0.1.0-incubating-rc0/RELEASE_NOTES.html

Release artifacts:
http://people.apache.org/~jghoman/giraph-0.1.0-incubating-rc0/

Corresponding svn tag:
http://svn.apache.org/repos/asf/incubator/giraph/tags/release-0.1-rc0/

Our signing keys (my key doesn't seem to be being picked up by
http://people.apache.org/keys/group/giraph.asc):
http://svn.apache.org/repos/asf/incubator/giraph/KEYS

The vote runs for 72 hours, until Friday 4pm PST.  After a successful
vote here, Incubator will vote on the release as well.

Thanks,
Jakob




Re: [VOTE] Release Giraph 0.1-incubating (rc0)

2012-02-02 Thread Avery Ching

+1.
I'm fine with this.

Avery

On 1/31/12 8:45 PM, Jakob Homan wrote:

I think these concerns preclude the entire idea of a release.

As mentioned above, we're releasing a tag (a specific svn revision).
That is what the release is.  Both src .tar.gz and binary files are
courtesies.


A release should be something that users can use as a dependency. . .like a 
maven coordinate.

A source release in no way prevents us from creating jars of the
release and adding them to Apache's maven repo.  In fact, we can't add
a jar until we have a release.


I think you guys should wait until you have made these decisions

If you would like to assist with moving away from the munging, there
is an open JIRA to do so.  Any effort would be appreciated.


To address the issues of binaries, could we release multiple binaries of Giraph 
that coincide with the different versions of Hadoop?

Adding in external dependencies for a binary release (and even just
for a source release with jars that couldn't be brought in via
maven/sbt) caused significant delay recently for Kafka.  I'd like to
avoid that here.  Also, since we intend to release early and often,
there's no reason we can't follow up with a 0.2 in short order - there
are going to be a lot of patches in the next few weeks.








Re: [VOTE] Release Giraph 0.1-incubating (rc0)

2012-02-02 Thread Avery Ching
I've run the tests for branch-0.1 and tested PageRankBenchmark 
against a Facebook Hadoop instance.  I'm +1'ing both the release and the 
source release idea.


Avery

On 2/2/12 2:26 PM, Jakob Homan wrote:

Are you +1ing the release, or just the idea of having a source release
in general?

The vote ends tomorrow, so it would be great if the committers and
mentors could take a look...







