[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-13 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103422#comment-13103422
 ] 

Avery Ching commented on GIRAPH-12:
---

Hyunsik, just to update: I grabbed your patch and it passed the unit tests on my 
machine.  Then I ran it on a cluster at Yahoo!.  

I didn't have time to make a messaging benchmark, so I ran PageRankBenchmark.  
I ran with 100 workers, 1 M vertices, 3 supersteps, and 10 edges per vertex.

Here are 2 runs with the original code:

11/09/13 07:02:08 INFO mapred.JobClient:   Giraph Timers
11/09/13 07:02:08 INFO mapred.JobClient: Total (milliseconds)=46709
11/09/13 07:02:08 INFO mapred.JobClient: Superstep 3 (milliseconds)=1682
11/09/13 07:02:08 INFO mapred.JobClient: Setup (milliseconds)=3228
11/09/13 07:02:08 INFO mapred.JobClient: Shutdown (milliseconds)=1223
11/09/13 07:02:08 INFO mapred.JobClient: Vertex input superstep (milliseconds)=3578
11/09/13 07:02:08 INFO mapred.JobClient: Superstep 0 (milliseconds)=16222
11/09/13 07:02:08 INFO mapred.JobClient: Superstep 2 (milliseconds)=12302
11/09/13 07:02:08 INFO mapred.JobClient: Superstep 1 (milliseconds)=8467

11/09/13 07:14:51 INFO mapred.JobClient:   Giraph Timers
11/09/13 07:14:51 INFO mapred.JobClient: Total (milliseconds)=51475
11/09/13 07:14:51 INFO mapred.JobClient: Superstep 3 (milliseconds)=1348
11/09/13 07:14:51 INFO mapred.JobClient: Setup (milliseconds)=7233
11/09/13 07:14:51 INFO mapred.JobClient: Shutdown (milliseconds)=884
11/09/13 07:14:51 INFO mapred.JobClient: Vertex input superstep (milliseconds)=3284
11/09/13 07:14:51 INFO mapred.JobClient: Superstep 0 (milliseconds)=22213
11/09/13 07:14:51 INFO mapred.JobClient: Superstep 2 (milliseconds)=8553
11/09/13 07:14:51 INFO mapred.JobClient: Superstep 1 (milliseconds)=7955


Here are 2 runs with your code:

11/09/13 07:06:56 INFO mapred.JobClient:   Giraph Timers
11/09/13 07:06:56 INFO mapred.JobClient: Total (milliseconds)=51935
11/09/13 07:06:56 INFO mapred.JobClient: Superstep 3 (milliseconds)=1150
11/09/13 07:06:56 INFO mapred.JobClient: Setup (milliseconds)=3338
11/09/13 07:06:56 INFO mapred.JobClient: Shutdown (milliseconds)=833
11/09/13 07:06:56 INFO mapred.JobClient: Vertex input superstep (milliseconds)=3401
11/09/13 07:06:56 INFO mapred.JobClient: Superstep 0 (milliseconds)=17297
11/09/13 07:06:56 INFO mapred.JobClient: Superstep 2 (milliseconds)=14384
11/09/13 07:06:56 INFO mapred.JobClient: Superstep 1 (milliseconds)=11528

11/09/13 07:12:09 INFO mapred.JobClient:   Giraph Timers
11/09/13 07:12:09 INFO mapred.JobClient: Total (milliseconds)=51985
11/09/13 07:12:09 INFO mapred.JobClient: Superstep 3 (milliseconds)=1362
11/09/13 07:12:09 INFO mapred.JobClient: Setup (milliseconds)=3776
11/09/13 07:12:09 INFO mapred.JobClient: Shutdown (milliseconds)=710
11/09/13 07:12:09 INFO mapred.JobClient: Vertex input superstep (milliseconds)=3771
11/09/13 07:12:09 INFO mapred.JobClient: Superstep 0 (milliseconds)=17741
11/09/13 07:12:09 INFO mapred.JobClient: Superstep 2 (milliseconds)=13068
11/09/13 07:12:09 INFO mapred.JobClient: Superstep 1 (milliseconds)=11551
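
For a rough side-by-side of the four runs above, the "Total (milliseconds)" timers can be averaged per variant. The sketch below is only arithmetic on the logged totals; the class and method names are made up for illustration, and two runs per variant are too few to draw firm conclusions.

```java
final class TimerComparison {
    /** Arithmetic mean of the given timer values in milliseconds. */
    static double mean(long... values) {
        long sum = 0;
        for (long value : values) {
            sum += value;
        }
        return (double) sum / values.length;
    }

    /** Relative change of candidate against baseline, in percent. */
    static double percentChange(double baseline, double candidate) {
        return 100.0 * (candidate - baseline) / baseline;
    }

    public static void main(String[] args) {
        // "Total (milliseconds)" values copied from the four runs above.
        double original = mean(46709, 51475);
        double patched = mean(51935, 51985);
        System.out.printf("original mean: %.0f ms%n", original);
        System.out.printf("patched mean:  %.0f ms%n", patched);
        System.out.printf("change:        %.1f%%%n", percentChange(original, patched));
    }
}
```

With these totals the means are 49092 ms and 51960 ms, a difference of about 5.8%, which is smaller than the spread between the two original runs.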

In my limited testing, the numbers aren't too different.  I also see that the 
connections are maintained throughout the application run, as you mentioned.  So 
the only tradeoff is possibly the reduced parallelization of message sending 
(user-chosen thread count vs. all threads).  I like the approach and think it's an 
improvement (controllable threads).  My only comment concerns the following 
code block.

for (PeerConnection pc : peerConnections.values()) {
    futures.add(executor.submit(new PeerFlushExecutor(pc)));
}

It would probably be good to randomize the order of the PeerConnection objects 
to avoid hotspots on the receiving side?
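
A minimal sketch of that randomization, assuming the connections live in a map as in the loop above. PeerConnection and PeerFlushExecutor here are simplified stand-ins written for this example, not the classes from the patch, and the pool size of 2 is an arbitrary choice. The idea is to copy the values into a list and shuffle it, so that workers do not all flush to the same peers in the same order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ShuffledFlush {
    /** Stand-in for the patch's PeerConnection; only an id is modeled. */
    static final class PeerConnection {
        final int peerId;
        PeerConnection(int peerId) { this.peerId = peerId; }
        void flush() { /* a real connection would send its buffered messages */ }
    }

    /** Stand-in for the patch's PeerFlushExecutor. */
    static final class PeerFlushExecutor implements Runnable {
        private final PeerConnection connection;
        PeerFlushExecutor(PeerConnection connection) { this.connection = connection; }
        @Override public void run() { connection.flush(); }
    }

    /** Returns the connections in random order, leaving the map untouched. */
    static List<PeerConnection> shuffled(Map<Integer, PeerConnection> peerConnections) {
        List<PeerConnection> order = new ArrayList<>(peerConnections.values());
        Collections.shuffle(order);
        return order;
    }

    /** Submits one flush task per peer in random order and waits for all of them. */
    static int flushAll(Map<Integer, PeerConnection> peerConnections,
                        ExecutorService executor) {
        List<Future<?>> futures = new ArrayList<>();
        for (PeerConnection pc : shuffled(peerConnections)) {
            futures.add(executor.submit(new PeerFlushExecutor(pc)));
        }
        for (Future<?> future : futures) {
            try {
                future.get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while flushing", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("flush failed", e.getCause());
            }
        }
        return futures.size();
    }

    public static void main(String[] args) {
        Map<Integer, PeerConnection> peers = new TreeMap<>();
        for (int i = 0; i < 8; i++) {
            peers.put(i, new PeerConnection(i));
        }
        ExecutorService executor = Executors.newFixedThreadPool(2); // user-chosen size
        try {
            System.out.println("flushed " + flushAll(peers, executor) + " peers");
        } finally {
            executor.shutdown();
        }
    }
}
```

Shuffling a copy keeps the map itself unchanged, so lookups by peer are unaffected; only the submission order varies from flush to flush.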


 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
 Attachments: GIRAPH-12_1.patch


 Currently every worker will start up a thread to communicate with every other 
 worker.  Hadoop RPC is used for communication.  For instance, if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty, or rolling 
 our own, to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility with different Hadoop versions easier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: 

[jira] [Commented] (GIRAPH-31) Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods

2011-09-13 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103745#comment-13103745
 ] 

Jake Mannix commented on GIRAPH-31:
---

And implementations which can provide both a sorted iterator that isn't 
prohibitively expensive and a much faster unsorted iterator can choose 
whether to return true or false from the isSorted() method, and provide 
another method of the type you're suggesting. 
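
To make that option concrete, here is a hypothetical sketch. The interface and class names are invented for illustration and are not Giraph's API. The default iterator is the fast, unsorted one, so isSorted() returns false, and a separate method pays for sorting only when a caller needs ordered ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

final class SortedFlagSketch {
    /** Hypothetical edge-access contract; not Giraph's actual interface. */
    interface EdgeTargets<I> extends Iterable<I> {
        /** True only if iterator() yields target ids in ascending order. */
        boolean isSorted();
    }

    /** Keeps targets in insertion order, so plain iteration is cheap but unsorted. */
    static final class InsertionOrderTargets implements EdgeTargets<Long> {
        private final List<Long> targets;

        InsertionOrderTargets(Long... targets) {
            this.targets = new ArrayList<>(Arrays.asList(targets));
        }

        @Override
        public boolean isSorted() {
            return false; // the default iterator is the fast, unsorted one
        }

        @Override
        public Iterator<Long> iterator() {
            return targets.iterator();
        }

        /** The extra method: sorts a copy only when a caller asks for order. */
        Iterator<Long> getSortedIterator() {
            List<Long> copy = new ArrayList<>(targets);
            Collections.sort(copy);
            return copy.iterator();
        }
    }

    public static void main(String[] args) {
        InsertionOrderTargets targets = new InsertionOrderTargets(3L, 1L, 2L);
        System.out.println("isSorted: " + targets.isSorted());
        for (Iterator<Long> it = targets.getSortedIterator(); it.hasNext();) {
            System.out.println(it.next());
        }
    }
}
```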


 Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. 
 detail), replace with appropriate accessor methods
 ---

 Key: GIRAPH-31
 URL: https://issues.apache.org/jira/browse/GIRAPH-31
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.70.0
Reporter: Jake Mannix
Assignee: Jake Mannix
 Attachments: GIRAPH-31.diff


 As discussed on the list, and on GIRAPH-28, the SortedMap<I, Edge<I,E>> is an 
 implementation detail which need not be exposed to application developers - 
 they need to iterate over the edges, and possibly access them one-by-one, and 
 remove them (in the Mutable case), but they don't need the SortedMap, and 
 creating primitive-optimized BasicVertex implementations is hampered by the 
 fact that clients expect this Map to exist.
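
A rough sketch of the idea, under simplifying assumptions: the class and method names below are invented for illustration and are not taken from the attached patch, and the map stores edge values directly rather than Edge objects. Callers can iterate, look up, and remove edges, but never see the SortedMap itself, so a primitive-optimized implementation could replace it without breaking them.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.SortedMap;
import java.util.TreeMap;

final class HiddenEdgeMapSketch {
    /** Hypothetical vertex; I is the target vertex id type, E the edge value type. */
    static final class SketchVertex<I extends Comparable<I>, E> implements Iterable<I> {
        // Private on purpose: callers only ever go through the accessors below.
        private final SortedMap<I, E> edges = new TreeMap<>();

        void addEdge(I target, E value) {
            edges.put(target, value);
        }

        boolean hasEdge(I target) {
            return edges.containsKey(target);
        }

        /** Returns the edge value, or null if there is no edge to target. */
        E getEdgeValue(I target) {
            return edges.get(target);
        }

        int getNumOutEdges() {
            return edges.size();
        }

        /** Mutable case: removes the edge and returns its value, or null. */
        E removeEdge(I target) {
            return edges.remove(target);
        }

        /** Read-only iteration over target ids; order is a detail of this sketch. */
        @Override
        public Iterator<I> iterator() {
            return Collections.unmodifiableSet(edges.keySet()).iterator();
        }
    }

    public static void main(String[] args) {
        SketchVertex<Long, Float> vertex = new SketchVertex<>();
        vertex.addEdge(7L, 0.5f);
        vertex.addEdge(3L, 1.5f);
        for (Long target : vertex) {
            System.out.println(target + " -> " + vertex.getEdgeValue(target));
        }
    }
}
```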

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-31) Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods

2011-09-13 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103798#comment-13103798
 ] 

Jake Mannix commented on GIRAPH-31:
---

+1 to that, given your argument on the current use of the class.  There may come a 
time when we have generic things going on in GraphMapper or BspServiceWorker 
which need to do special optimized things to sorted vertices, and at that time 
we can add an isSorted() or getSortedIterator() method.


[jira] [Updated] (GIRAPH-31) Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods

2011-09-13 Thread Jake Mannix (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jake Mannix updated GIRAPH-31:
--

Attachment: GIRAPH-31.diff

Updated patch - remove isSorted(), document the fact that the iterator may or 
may not be sorted (and in fact is, in Vertex), and that users may subclass 
either Vertex *or* MutableVertex.  

I have not tested subclassing BasicVertex, which I suspect would fail in 
various ways, as VertexReader, GraphMapper, and some other classes may expect 
to get a MutableVertex for some methods.


Re: Port to YARN: GIRAPH and HAMA

2011-09-13 Thread Dmitriy Ryaboy
Dan,
Given how fast we are currently iterating on the API in Giraph, I think
agreeing on a common API across 3 projects is a bit premature at this stage,
unfortunately..

D

On Tue, Sep 13, 2011 at 11:20 AM, Dan Brickley dan...@danbri.org wrote:

 On 13 September 2011 19:47, Avery Ching ach...@apache.org wrote:

  Perhaps more practically, I wonder if it would be possible for someone from
  the Hama team to refactor our code a bit to support Hama-style BSP in
  Giraph?  Certainly would be a pretty cool project...

 Maybe this is crazy, but: I was wondering...  Pregel's basic API
 approach is pretty straightforward, gloriously simple even. Could we
 have platform-neutral APIs that allowed portability of applications
 between Pregel-based platforms? At least for Java...

 Right now, those of us who are more 'application people' than platform
 developers are left searching around on 'pregel opensource' and have
 to try to guess which of the various Pregel-esque platforms is
 looking most healthy. For example, my summer vacation project was
 checking out GoldenOrbOS. Yet by the time I got back, the Mahout list
 was buzzing with discussion of Giraph, so I took a look at that (and
 was pleasantly surprised).

 There is clearly a lot of energy and creativity right now going into
 this kind of distributed graph processing platform. That suggests to
 me that *finalising* cross-platform APIs would be premature. But it is
 also a time when platforms have a certain amount of flexibility that
 they will lose as they get adopted and embedded within products and
 processes. Could a Pregel-like Java API be agreed between platforms
 (e.g. let's consider Giraph, Hama, GoldenOrbOS), so that those of us
 investigating applications could proceed with some hope of later
 portability? This might be cheaper than trying to persuade Giraph to
 rebuild on top of Hama, or suchlike. Anyone care to make a first pass
 at suggesting some common interfaces?

 cheers,

 Dan




-- 
Dmitriy V Ryaboy
Twitter Analytics
http://twitter.com/squarecog


Re: Port to YARN: GIRAPH and HAMA

2011-09-13 Thread Dan Brickley
On 13 September 2011 21:43, Dmitriy Ryaboy dmit...@twitter.com wrote:
 Dan,
 Given how fast we are currently iterating on the API in Giraph, I think
 agreeing on a common API across 3 projects is a bit premature at this stage,
 unfortunately..

Current velocity aside, ... could such an interface be plausible? e.g.
this time next year?

Dan


Re: Port to YARN: GIRAPH and HAMA

2011-09-13 Thread Avery Ching
Maybe it's possible; it's hard to say what will happen in a year.  However, 
at the same time, porting an application from any of the projects to 
another shouldn't be too difficult, since the Pregel API is 
relatively simple.  As I mentioned in my original post, though, I 
imagine that Giraph will support non-BSP graph computing models as well 
in the future (less portable).


Avery

On 9/13/11 12:51 PM, Dan Brickley wrote:

On 13 September 2011 21:43, Dmitriy Ryaboy dmit...@twitter.com wrote:

Dan,
Given how fast we are currently iterating on the API in Giraph, I think
agreeing on a common API across 3 projects is a bit premature at this stage,
unfortunately..

Current velocity aside, ... could such an interface be plausible? e.g.
this time next year?

Dan




[jira] [Commented] (GIRAPH-31) Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods

2011-09-13 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103944#comment-13103944
 ] 

Avery Ching commented on GIRAPH-31:
---

How about I wait until tonight (say after 7 pm) sometime to commit this?  In 
case anyone has any last thoughts...


[jira] [Commented] (GIRAPH-31) Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods

2011-09-13 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103948#comment-13103948
 ] 

Jake Mannix commented on GIRAPH-31:
---

Sounds good to me!  Lazy consensus is pretty common to The Apache Way ( 
http://www.apache.org/foundation/voting.html#LazyConsensus ).
