[jira] [Commented] (GIRAPH-12) Investigate communication improvements
[ https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103422#comment-13103422 ]

Avery Ching commented on GIRAPH-12:
---

Hyunsik, just to update: I grabbed your patch and it passed the unit tests on my machine. Then I ran it on a cluster at Yahoo!. I didn't have time to make a messaging benchmark, so I ran PageRankBenchmark with 100 workers, 1 M vertices, 3 supersteps, and 10 edges per vertex.

Here are 2 runs with the original code:

11/09/13 07:02:08 INFO mapred.JobClient: Giraph Timers
11/09/13 07:02:08 INFO mapred.JobClient: Total (milliseconds)=46709
11/09/13 07:02:08 INFO mapred.JobClient: Superstep 3 (milliseconds)=1682
11/09/13 07:02:08 INFO mapred.JobClient: Setup (milliseconds)=3228
11/09/13 07:02:08 INFO mapred.JobClient: Shutdown (milliseconds)=1223
11/09/13 07:02:08 INFO mapred.JobClient: Vertex input superstep (milliseconds)=3578
11/09/13 07:02:08 INFO mapred.JobClient: Superstep 0 (milliseconds)=16222
11/09/13 07:02:08 INFO mapred.JobClient: Superstep 2 (milliseconds)=12302
11/09/13 07:02:08 INFO mapred.JobClient: Superstep 1 (milliseconds)=8467

11/09/13 07:14:51 INFO mapred.JobClient: Giraph Timers
11/09/13 07:14:51 INFO mapred.JobClient: Total (milliseconds)=51475
11/09/13 07:14:51 INFO mapred.JobClient: Superstep 3 (milliseconds)=1348
11/09/13 07:14:51 INFO mapred.JobClient: Setup (milliseconds)=7233
11/09/13 07:14:51 INFO mapred.JobClient: Shutdown (milliseconds)=884
11/09/13 07:14:51 INFO mapred.JobClient: Vertex input superstep (milliseconds)=3284
11/09/13 07:14:51 INFO mapred.JobClient: Superstep 0 (milliseconds)=22213
11/09/13 07:14:51 INFO mapred.JobClient: Superstep 2 (milliseconds)=8553
11/09/13 07:14:51 INFO mapred.JobClient: Superstep 1 (milliseconds)=7955

Here are 2 runs with your code:

11/09/13 07:06:56 INFO mapred.JobClient: Giraph Timers
11/09/13 07:06:56 INFO mapred.JobClient: Total (milliseconds)=51935
11/09/13 07:06:56 INFO mapred.JobClient: Superstep 3 (milliseconds)=1150
11/09/13 07:06:56 INFO mapred.JobClient: Setup (milliseconds)=3338
11/09/13 07:06:56 INFO mapred.JobClient: Shutdown (milliseconds)=833
11/09/13 07:06:56 INFO mapred.JobClient: Vertex input superstep (milliseconds)=3401
11/09/13 07:06:56 INFO mapred.JobClient: Superstep 0 (milliseconds)=17297
11/09/13 07:06:56 INFO mapred.JobClient: Superstep 2 (milliseconds)=14384
11/09/13 07:06:56 INFO mapred.JobClient: Superstep 1 (milliseconds)=11528

11/09/13 07:12:09 INFO mapred.JobClient: Giraph Timers
11/09/13 07:12:09 INFO mapred.JobClient: Total (milliseconds)=51985
11/09/13 07:12:09 INFO mapred.JobClient: Superstep 3 (milliseconds)=1362
11/09/13 07:12:09 INFO mapred.JobClient: Setup (milliseconds)=3776
11/09/13 07:12:09 INFO mapred.JobClient: Shutdown (milliseconds)=710
11/09/13 07:12:09 INFO mapred.JobClient: Vertex input superstep (milliseconds)=3771
11/09/13 07:12:09 INFO mapred.JobClient: Superstep 0 (milliseconds)=17741
11/09/13 07:12:09 INFO mapred.JobClient: Superstep 2 (milliseconds)=13068
11/09/13 07:12:09 INFO mapred.JobClient: Superstep 1 (milliseconds)=11551

In my limited testing, the numbers aren't too different. I also see that the connections are maintained throughout the application run, as you mentioned. So the only tradeoff is possibly the reduced parallelization of message sending (user chosen vs all threads). I like the approach and think it's an improvement (controllable threads). Perhaps my only comment is regarding the following code block:

    for (PeerConnection pc : peerConnections.values()) {
        futures.add(executor.submit(new PeerFlushExecutor(pc)));
    }

It would probably be good to randomize the order of the PeerConnection objects to avoid hotspots on the receiving side?
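The randomization suggested above could look roughly like the following. This is a minimal sketch, not the patch's code: `PeerConnection` here is a stand-in class, `flushOrder` and `flushAll` are hypothetical helper names, and the per-worker seed is an assumption made so that each worker walks its peers in a different but repeatable order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch; PeerConnection is a stand-in, not Giraph's class. */
final class ShuffledFlushSketch {

    /** Stand-in for a connection to one remote worker. */
    static final class PeerConnection {
        final String peer;
        PeerConnection(String peer) { this.peer = peer; }
        void flush() { /* a real implementation would send buffered messages here */ }
    }

    /** Copies the connections and shuffles the copy; the seed would differ per worker. */
    static List<PeerConnection> flushOrder(Map<String, PeerConnection> peerConnections, long seed) {
        List<PeerConnection> order = new ArrayList<>(peerConnections.values());
        Collections.shuffle(order, new Random(seed));
        return order;
    }

    /** Submits one flush task per peer, in shuffled order, to a bounded pool. */
    static int flushAll(Map<String, PeerConnection> peerConnections, long seed, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (PeerConnection pc : flushOrder(peerConnections, seed)) {
                futures.add(executor.submit(pc::flush));
            }
            for (Future<?> future : futures) {
                future.get(); // surface any failure from a flush task
            }
            return futures.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, PeerConnection> peers = new TreeMap<>();
        for (int i = 0; i < 8; i++) {
            peers.put("worker-" + i, new PeerConnection("worker-" + i));
        }
        System.out.println("flushed " + flushAll(peers, 42L, 4) + " peers");
    }
}
```

Without the shuffle, every worker iterating the same map would tend to hit the same receiver first; shuffling a copy leaves the map itself untouched and only changes the submission order.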
Investigate communication improvements
--

Key: GIRAPH-12
URL: https://issues.apache.org/jira/browse/GIRAPH-12
Project: Giraph
Issue Type: Improvement
Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
Attachments: GIRAPH-12_1.patch

Currently every worker will start up a thread to communicate with every other worker. Hadoop RPC is used for communication. For instance, if there are 400 workers, each worker will create 400 threads. This ends up using a lot of memory, even with the option -Dmapred.child.java.opts=-Xss64k. It would be good to investigate using frameworks like Netty, or rolling our own, to improve this situation. By moving away from Hadoop RPC, we would also make compatibility across different Hadoop versions easier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see:
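To put rough numbers on the thread-per-peer cost described in the issue, here is a small sketch. It assumes each thread reserves exactly the configured stack size (64 KiB under -Xss64k) and ignores every other per-thread cost; `BoundedSenderSketch`, `stackMiB`, and `distinctThreadsUsed` are hypothetical names, and the fixed pool is just one possible way to bound the thread count, not what the attached patch necessarily does.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of bounding sender threads; not Giraph code. */
final class BoundedSenderSketch {

    /** Stack space reserved by `threads` threads at `stackKiB` KiB each, in MiB. */
    static double stackMiB(int threads, int stackKiB) {
        return threads * (double) stackKiB / 1024.0;
    }

    /** Runs one send task per peer on a fixed pool; returns how many distinct threads ran them. */
    static int distinctThreadsUsed(int peers, int poolSize) {
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int peer = 0; peer < peers; peer++) {
            // A real task would flush the messages buffered for this peer.
            pool.execute(() -> threadNames.add(Thread.currentThread().getName()));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("send tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return threadNames.size();
    }

    public static void main(String[] args) {
        System.out.println("stack MiB for 400 threads at 64 KiB: " + stackMiB(400, 64));
        System.out.println("threads used for 400 peers, pool of 8: " + distinctThreadsUsed(400, 8));
    }
}
```

Under these assumptions, 400 threads at 64 KiB come to 25 MiB of stack per worker, and that figure grows linearly with the worker count, whereas a fixed pool keeps the thread count at the pool size regardless of how many peers there are.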
[jira] [Commented] (GIRAPH-31) Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103745#comment-13103745 ]

Jake Mannix commented on GIRAPH-31:
---

And implementations that can provide a sorted iterator which isn't prohibitively expensive, but can also provide a much faster unsorted iterator, can choose whether to return true or false from the isSorted() method, and provide another method of the type you're suggesting.

Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
---

Key: GIRAPH-31
URL: https://issues.apache.org/jira/browse/GIRAPH-31
Project: Giraph
Issue Type: Improvement
Components: graph
Affects Versions: 0.70.0
Reporter: Jake Mannix
Assignee: Jake Mannix
Attachments: GIRAPH-31.diff

As discussed on the list, and on GIRAPH-28, the SortedMap<I, Edge<I,E>> is an implementation detail which need not be exposed to application developers: they need to iterate over the edges, and possibly access them one-by-one, and remove them (in the Mutable case), but they don't need the SortedMap, and creating primitive-optimized BasicVertex implementations is hampered by the fact that clients expect this Map to exist.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-31) Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103798#comment-13103798 ]

Jake Mannix commented on GIRAPH-31:
---

+1 to that, given your argument on the current use of the class. There may come a time when we have generic things going on in GraphMapper or BspServiceWorker which need to do special optimized things to sorted vertices, and at that time we can add an isSorted() or getSortedIterator() method.
[jira] [Updated] (GIRAPH-31) Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jake Mannix updated GIRAPH-31:
--

Attachment: GIRAPH-31.diff

Updated patch: it removes isSorted() and documents the fact that the iterator may or may not be sorted (and in fact is, in Vertex), and that users may subclass either Vertex *or* MutableVertex. I have not tested subclassing BasicVertex, which I suspect would fail in various ways, as VertexReader, GraphMapper, and some other classes may expect to get a MutableVertex for some methods.
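As a rough illustration of why accessor methods help here, the sketch below hides the backing store behind an iterator whose order is deliberately unspecified. Everything in it is hypothetical: `EdgeView`, `MapEdges`, `LongDoubleArrayEdges`, and the method names are invented for the example and are not taken from the attached patch or from Giraph's Vertex classes.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.TreeMap;

/** Hypothetical sketch only; these are not Giraph's actual types or method names. */
final class EdgeIterationSketch {

    /** Read-only edge access: iterate target ids and look up values, no SortedMap exposed. */
    interface EdgeView<I, E> extends Iterable<I> {
        int getNumOutEdges();
        E getEdgeValue(I targetVertexId); // null when there is no such edge
    }

    /** Map-backed edges: iteration happens to be sorted by target id. */
    static final class MapEdges<I extends Comparable<I>, E> implements EdgeView<I, E> {
        private final TreeMap<I, E> edges = new TreeMap<>();
        void addEdge(I target, E value) { edges.put(target, value); }
        @Override public int getNumOutEdges() { return edges.size(); }
        @Override public E getEdgeValue(I target) { return edges.get(target); }
        @Override public Iterator<I> iterator() {
            return Collections.unmodifiableSet(edges.keySet()).iterator();
        }
    }

    /** Primitive-array-backed edges: no map at all, iteration follows array order. */
    static final class LongDoubleArrayEdges implements EdgeView<Long, Double> {
        private final long[] targets;
        private final double[] values;
        LongDoubleArrayEdges(long[] targets, double[] values) {
            if (targets.length != values.length) {
                throw new IllegalArgumentException("length mismatch");
            }
            this.targets = targets.clone();
            this.values = values.clone();
        }
        @Override public int getNumOutEdges() { return targets.length; }
        @Override public Double getEdgeValue(Long target) {
            for (int i = 0; i < targets.length; i++) {
                if (targets[i] == target) return values[i];
            }
            return null;
        }
        @Override public Iterator<Long> iterator() {
            return new Iterator<Long>() {
                private int index = 0;
                @Override public boolean hasNext() { return index < targets.length; }
                @Override public Long next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    return targets[index++];
                }
            };
        }
    }

    /** Client code written against the accessors works with either backing store. */
    static <I> double sumEdgeValues(EdgeView<I, Double> edges) {
        double sum = 0.0;
        for (I target : edges) {
            sum += edges.getEdgeValue(target);
        }
        return sum;
    }

    public static void main(String[] args) {
        MapEdges<Long, Double> sorted = new MapEdges<>();
        sorted.addEdge(3L, 1.5);
        sorted.addEdge(1L, 2.5);
        EdgeView<Long, Double> packed =
            new LongDoubleArrayEdges(new long[] {9L, 2L}, new double[] {0.5, 0.25});
        System.out.println(sumEdgeValues(sorted) + " " + sumEdgeValues(packed));
    }
}
```

The client method never learns whether a map exists, which is the property the issue is after: an array-backed vertex can be swapped in without breaking callers, as long as they do not rely on iteration order.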
Re: Port to YARN: GIRAPH and HAMA
Dan,

Given how fast we are currently iterating on the API in Giraph, I think agreeing on a common API across 3 projects is a bit premature at this stage, unfortunately.

D

On Tue, Sep 13, 2011 at 11:20 AM, Dan Brickley dan...@danbri.org wrote:

On 13 September 2011 19:47, Avery Ching ach...@apache.org wrote:

Perhaps more practically, I wonder if it would be possible for someone from the Hama team to refactor our code a bit to support Hama-style BSP in Giraph? Certainly would be a pretty cool project...

Maybe this is crazy, but: I was wondering... Pregel's basic API approach is pretty straightforward, gloriously simple even. Could we have platform-neutral APIs that allowed portability of applications between Pregel-based platforms? At least for Java...

Right now, those of us who are more 'application people' than platform developers are left searching around on 'pregel opensource' and have to try to guess which of the various Pregel-esque platforms is looking most healthy. For example, my summer vacation project was checking out GoldenOrbOS. Yet by the time I got back, the Mahout list was buzzing with discussion of Giraph, so I took a look at that (and was pleasantly surprised).

There is clearly a lot of energy and creativity right now going into this kind of distributed graph processing platform. That suggests to me that *finalising* cross-platform APIs would be premature. But it is also a time when platforms have a certain amount of flexibility that they will lose as they get adopted and embedded within products and processes.

Could a Pregel-like Java API be agreed between platforms (e.g. let's consider Giraph, Hama, GoldenOrbOS), so that those of us investigating applications could proceed with some hope of later portability? This might be cheaper than trying to persuade Giraph to rebuild on top of Hama, or suchlike.

Anyone care to make a first pass at suggesting some common interfaces?

cheers,

Dan

--
Dmitriy V Ryaboy
Twitter Analytics
http://twitter.com/squarecog
Re: Port to YARN: GIRAPH and HAMA
On 13 September 2011 21:43, Dmitriy Ryaboy dmit...@twitter.com wrote:

Dan, Given how fast we are currently iterating on the API in Giraph, I think agreeing on a common API across 3 projects is a bit premature at this stage, unfortunately.

Current velocity aside, ... could such an interface be plausible? e.g. this time next year?

Dan
Re: Port to YARN: GIRAPH and HAMA
Maybe it's possible; it's hard to say what will happen in a year. At the same time, porting an application from any of these projects to another shouldn't be too difficult, since the Pregel API is relatively simple. However, as I mentioned in my original post, I imagine that Giraph will support non-BSP graph computing models as well in the future (less portable).

Avery

On 9/13/11 12:51 PM, Dan Brickley wrote:

On 13 September 2011 21:43, Dmitriy Ryaboy dmit...@twitter.com wrote:

Dan, Given how fast we are currently iterating on the API in Giraph, I think agreeing on a common API across 3 projects is a bit premature at this stage, unfortunately.

Current velocity aside, ... could such an interface be plausible? e.g. this time next year?

Dan
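To give a sense of how small a Pregel-style API surface can be, here is a toy sketch. It is purely illustrative and is not the API of Giraph, Hama, or GoldenOrb: `Context`, `VertexProgram`, `run`, and `MAX_VALUE` are names invented for this example, values and messages are fixed to `double`, and the "cluster" is a single-process loop over supersteps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical Pregel-style sketch; not the API of Giraph, Hama, or GoldenOrb. */
final class PregelApiSketch {

    /** What a vertex may do during one superstep. */
    interface Context {
        long superstep();
        void sendMessage(long targetVertexId, double message);
        void voteToHalt();
    }

    /** Called once per active vertex per superstep; returns the vertex's new value. */
    interface VertexProgram {
        double compute(long vertexId, double value, List<Long> outNeighbors,
                       List<Double> messages, Context ctx);
    }

    /** Toy single-process BSP loop. Assumes initialValues has an entry for every vertex. */
    static Map<Long, Double> run(Map<Long, List<Long>> graph, Map<Long, Double> initialValues,
                                 VertexProgram program, int maxSupersteps) {
        Map<Long, Double> values = new HashMap<>(initialValues);
        Map<Long, List<Double>> inbox = new HashMap<>();
        final Set<Long> halted = new HashSet<>();
        for (long step = 0; step < maxSupersteps; step++) {
            final long currentStep = step;
            final Map<Long, List<Double>> outbox = new HashMap<>();
            boolean anyActive = false;
            for (final Long id : graph.keySet()) {
                List<Double> messages = inbox.getOrDefault(id, Collections.<Double>emptyList());
                if (halted.contains(id) && messages.isEmpty()) {
                    continue; // halted vertices stay idle until a message arrives
                }
                halted.remove(id);
                anyActive = true;
                Context ctx = new Context() {
                    @Override public long superstep() { return currentStep; }
                    @Override public void sendMessage(long target, double message) {
                        outbox.computeIfAbsent(target, k -> new ArrayList<>()).add(message);
                    }
                    @Override public void voteToHalt() { halted.add(id); }
                };
                values.put(id, program.compute(id, values.get(id), graph.get(id), messages, ctx));
            }
            if (!anyActive) {
                break; // every vertex halted and no messages are in flight
            }
            inbox = outbox; // messages become visible in the next superstep
        }
        return values;
    }

    /** Example program: propagate the maximum value along out-edges. */
    static final VertexProgram MAX_VALUE = (id, value, outNeighbors, messages, ctx) -> {
        double max = value;
        for (double m : messages) {
            max = Math.max(max, m);
        }
        if (ctx.superstep() == 0 || max > value) {
            for (long n : outNeighbors) {
                ctx.sendMessage(n, max);
            }
        } else {
            ctx.voteToHalt();
        }
        return max;
    };

    public static void main(String[] args) {
        Map<Long, List<Long>> ring = new HashMap<>();
        ring.put(1L, Arrays.asList(2L));
        ring.put(2L, Arrays.asList(3L));
        ring.put(3L, Arrays.asList(1L));
        Map<Long, Double> start = new HashMap<>();
        start.put(1L, 3.0);
        start.put(2L, 6.0);
        start.put(3L, 2.0);
        System.out.println(run(ring, start, MAX_VALUE, 20));
    }
}
```

A vertex program written against two small interfaces like these is what would need porting between platforms; the hard differences (input formats, combiners, aggregators, mutation, non-BSP models) are exactly the parts this sketch leaves out.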
[jira] [Commented] (GIRAPH-31) Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103944#comment-13103944 ]

Avery Ching commented on GIRAPH-31:
---

How about I wait until tonight (say after 7 pm) sometime to commit this? In case anyone has any last thoughts...
[jira] [Commented] (GIRAPH-31) Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103948#comment-13103948 ]

Jake Mannix commented on GIRAPH-31:
---

Sounds good to me! Lazy consensus is pretty common to The Apache Way ( http://www.apache.org/foundation/voting.html#LazyConsensus ).