[jira] [Commented] (GIRAPH-12) Investigate communication improvements
[ https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104270#comment-13104270 ] Avery Ching commented on GIRAPH-12: --- Sounds great, hope you had a nice vacation. Perhaps if you have some extra time, could you draft a message-passing benchmark that we could use to compare your final implementation against the original? > Investigate communication improvements > -- > > Key: GIRAPH-12 > URL: https://issues.apache.org/jira/browse/GIRAPH-12 > Project: Giraph > Issue Type: Improvement > Components: bsp >Reporter: Avery Ching >Assignee: Hyunsik Choi >Priority: Minor > Attachments: GIRAPH-12_1.patch > > > Currently every worker will start up a thread to communicate with every other > workers. Hadoop RPC is used for communication. For instance if there are > 400 workers, each worker will create 400 threads. This ends up using a lot > of memory, even with the option > -Dmapred.child.java.opts="-Xss64k". > It would be good to investigate using frameworks like Netty or custom roll > our own to improve this situation. By moving away from Hadoop RPC, we would > also make compatibility of different Hadoop versions easier. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
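The memory cost described in GIRAPH-12 can be estimated with simple arithmetic: 400 peer threads at the 64 KiB stack size from -Xss64k reserve 400 x 64 KiB = 25 MiB of stack per worker. The sketch below only does that multiplication. The class and method names are hypothetical (not Giraph code), the 1 MiB comparison value is an assumed default that varies by JVM and platform, and heap buffers held by each connection are not counted.

```java
/**
 * Hypothetical helper, not part of Giraph: estimates the thread-stack memory
 * reserved by the thread-per-peer design described in GIRAPH-12.
 */
final class ThreadStackEstimate {
    private ThreadStackEstimate() { }

    /** Bytes of stack space one worker reserves for its peer threads. */
    static long stackBytesPerWorker(int peerThreads, long stackBytesPerThread) {
        return (long) peerThreads * stackBytesPerThread;
    }

    public static void main(String[] args) {
        int workers = 400;                   // figure from the issue description
        long xss64k = 64L * 1024;            // -Xss64k
        long assumedDefault = 1024L * 1024;  // assumption: 1 MiB, varies by JVM and platform
        long mib = 1024L * 1024;

        System.out.println("Threads across the whole job: " + (long) workers * workers);
        System.out.println("Stack per worker at 64 KiB: "
                + stackBytesPerWorker(workers, xss64k) / mib + " MiB");
        System.out.println("Stack per worker at 1 MiB: "
                + stackBytesPerWorker(workers, assumedDefault) / mib + " MiB");
    }
}
```

Even the reduced stack size leaves the thread count growing with the square of the number of workers across the job, which is the scaling problem the issue wants to remove.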
[jira] [Resolved] (GIRAPH-31) Hide the SortedMap> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avery Ching resolved GIRAPH-31. --- Resolution: Fixed Thanks Jake! > Hide the SortedMap> in Vertex from client visibility (impl. > detail), replace with appropriate accessor methods > --- > > Key: GIRAPH-31 > URL: https://issues.apache.org/jira/browse/GIRAPH-31 > Project: Giraph > Issue Type: Improvement > Components: graph >Affects Versions: 0.70.0 >Reporter: Jake Mannix >Assignee: Jake Mannix > Attachments: GIRAPH-31.diff, GIRAPH-31.diff > > > As discussed on the list, and on GIRAPH-28, the SortedMap> is an > implementation detail which needs not be exposed to application developers - > they need to iterate over the edges, and possibly access them one-by-one, and > remove them (in the Mutable case), but they don't need the SortedMap, and > creating primitive-optimized BasicVertex implementations is hampered by the > fact that clients expect this Map to exist. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
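A minimal sketch of the idea in GIRAPH-31, assuming Long vertex ids and Double edge values: the map stays private and callers get iteration, per-edge lookup, and removal instead. This is not the code in GIRAPH-31.diff. The class name and method names are hypothetical, and only the field name destEdgeMap is taken from the discussion.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Hypothetical vertex, not the class from GIRAPH-31.diff. Edges are keyed by
 * destination vertex id and carry a double value.
 */
final class EncapsulatedVertex implements Iterable<Long> {
    // Implementation detail: the map is never handed to callers.
    private final SortedMap<Long, Double> destEdgeMap = new TreeMap<>();

    void addEdge(long destId, double edgeValue) {
        destEdgeMap.put(destId, edgeValue);
    }

    /** Read-only iteration over destination vertex ids. */
    @Override
    public Iterator<Long> iterator() {
        return Collections.unmodifiableSet(destEdgeMap.keySet()).iterator();
    }

    /** Returns the value of the edge to destId, or null if there is no such edge. */
    Double getEdgeValue(long destId) {
        return destEdgeMap.get(destId);
    }

    /** Removes the edge to destId and returns its value, or null if it was absent. */
    Double removeEdge(long destId) {
        return destEdgeMap.remove(destId);
    }

    int getNumOutEdges() {
        return destEdgeMap.size();
    }

    public static void main(String[] args) {
        EncapsulatedVertex vertex = new EncapsulatedVertex();
        vertex.addEdge(3L, 0.5);
        vertex.addEdge(1L, 2.0);
        for (long destId : vertex) {
            System.out.println(destId + " -> " + vertex.getEdgeValue(destId));
        }
    }
}
```

Because no caller sees the SortedMap, a primitive-optimized implementation could replace the TreeMap with arrays without breaking client code, which is the motivation given in the issue.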
[jira] [Commented] (GIRAPH-31) Hide the SortedMap> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104234#comment-13104234 ] Avery Ching commented on GIRAPH-31: --- Time's up (it is 9:10 PM) and there were no comments. If there are any additional interface changes, we can always address them later. I made some minor changes to fit the code conventions and verified that the unit tests passed.
[jira] [Commented] (GIRAPH-12) Investigate communication improvements
[ https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104230#comment-13104230 ] Hyunsik Choi commented on GIRAPH-12: Sorry for the late response. Actually, I was on vacation on September 12-13. Thank you for your testing. As you pointed out, the current patch incurs hotspots on the receiving side. I will add code to randomize flushes to mitigate the skew problem, along with some tweaks to improve performance.
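One possible reading of "randomize flushes", sketched under assumptions: if every sender flushes its outgoing buffers to destinations in the same order (0, 1, 2, ...), all senders hit the same receiver at the same moment. Shuffling the order per sender spreads that load. This is not the code in GIRAPH-12_1.patch. The class, method, and seed parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: a per-sender shuffled order in which to flush to destination workers. */
final class RandomizedFlushOrder {
    private RandomizedFlushOrder() { }

    /** Returns destination worker ids 0..workerCount-1, shuffled with a sender-specific seed. */
    static List<Integer> flushOrder(int workerCount, long senderSeed) {
        List<Integer> order = new ArrayList<>(workerCount);
        for (int id = 0; id < workerCount; id++) {
            order.add(id);
        }
        Collections.shuffle(order, new Random(senderSeed));
        return order;
    }

    public static void main(String[] args) {
        // Each sender uses its own seed, so senders are unlikely to share an order.
        for (long sender = 0; sender < 3; sender++) {
            System.out.println("sender " + sender + " flushes to " + flushOrder(8, sender));
        }
    }
}
```

A fixed seed per sender keeps runs reproducible for benchmarking. A real implementation might reshuffle every superstep instead.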
[jira] [Commented] (GIRAPH-30) NPE in ZooKeeperManager if base directory cannot be created
[ https://issues.apache.org/jira/browse/GIRAPH-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104220#comment-13104220 ] Avery Ching commented on GIRAPH-30: --- Taking too long for another committer, and Andrew did review it. I have committed it. If this is an issue, please reopen. > NPE in ZooKeeperManager if base directory cannot be created > --- > > Key: GIRAPH-30 > URL: https://issues.apache.org/jira/browse/GIRAPH-30 > Project: Giraph > Issue Type: Bug >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Minor > Attachments: GIRAPH-30.2.patch, GIRAPH-30.patch > > > If the base directory cannot be created, for example if running on secure > Hadoop and the user home directory does not exist, ZooKeeperManager will > throw an NPE when trying to list it. It would be better to throw an > IOException with an informative message. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
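The fix requested in GIRAPH-30 follows a common pattern: check the result of directory creation and listing, and raise an IOException that names the path instead of letting a null result surface later as an NPE. The sketch below illustrates the pattern with java.io.File as a stand-in. That is an assumption made to keep the example self-contained. The actual ZooKeeperManager code and the attached patches are not shown in this thread, and the class and method names here are hypothetical.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical sketch of the GIRAPH-30 pattern, using java.io.File as a stand-in. */
final class BaseDirectoryCheck {
    private BaseDirectoryCheck() { }

    /** Creates the base directory if needed and lists it, failing with IOException rather than an NPE. */
    static String[] listBaseDirectory(File baseDir) throws IOException {
        // mkdirs() returns false both on failure and when the directory already exists.
        if (!baseDir.mkdirs() && !baseDir.isDirectory()) {
            throw new IOException("Could not create base directory " + baseDir.getAbsolutePath());
        }
        String[] children = baseDir.list(); // null if the path cannot be listed
        if (children == null) {
            throw new IOException("Could not list base directory " + baseDir.getAbsolutePath());
        }
        return children;
    }

    public static void main(String[] args) throws IOException {
        File baseDir = new File(args.length > 0 ? args[0] : "example-base-dir");
        System.out.println(listBaseDirectory(baseDir).length + " entries in " + baseDir);
    }
}
```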
Re: Port to YARN: GIRAPH and HAMA
Interesting. In our community, someone is thinking about asynchronous message processing for more efficient iteration[1], too. As I mentioned to you before, the projects differ in slogan but not in kind. The technical issues are nothing, Avery. Anyway, ... It would be nice if we could keep talking, for collaborative competition. http://s.apache.org/HamaVsGiraph 1. http://markmail.org/thread/nrrevdrb5qc7ic5c
On Wed, Sep 14, 2011 at 2:47 AM, Avery Ching wrote: > Hi Vinod, > > Edward and I have chatted about this at times. It sounds better in theory > (both BSP based and adding support for MRv2) than in practice I think > (underlying implementations are quite different). Actually, I also believe > that in the future, Giraph is not going to solely be BSP-based graph > computing. We are also thinking about other underlying computing models > (i.e. streaming (asynchronous) graph processing - see > > http://mail-archives.apache.org/mod_mbox/incubator-giraph-user/201109.mbox/%3CCAEVHzWC8b-7RiBjkDiQKjiu-rVBz9=ogeoajxhbclcr5n3+...@mail.gmail.com%3E > > But I think today, the issues are the following: > > 1) Giraph runs completely as a MapReduce job on Hadoop today. This needs > to be maintained to support our current users, who will not likely move to > MRv2 for at least a year. > 2) The internals of Giraph are implemented differently than Hama and would > take some time to port to. > 3) If we have various graph processing computing models (BSP based, streams > or asynchronous, or a combination), then being on Hama brings little value > for Giraph. > > Perhaps more practically, I wonder if it would be possible for someone from > the Hama team to refactor our code a bit to support Hama-style BSP in > Giraph? Certainly would be a pretty cool project... > > Avery > > On 9/13/11 4:49 AM, Edward J. Yoon wrote: >> >> Quite a while ago, I implemented a clone of Google Pregel simply using >> BSPLib[1] and decided to focus on BSP computing engine.
>> >> Hama and Giraph projects are differ in slogan but not in kind. >> >> If we made some collaboration, Giraph should be implemented on top of >> Hama BSP computing engine. >> >> Otherwise, we will back to square one again. >> >> 1. http://markmail.org/thread/4czcgtjupjvpqcqi >> >> On Sun, Sep 11, 2011 at 11:22 PM, Vinod Kumar Vavilapalli >> wrote: >>> >>> Crosspost to hama-dev and giraph-dev. >>> >>> It was only in my morning time that I was looking at HAMA-431, the port >>> of >>> Hama to YARN. And one of the tweets reminded me of JIRA issue GIRAPH-13 >>> which is about porting Giraph to YARN. >>> >>> I was also looking at the Girpah proposal for entry into Apache >>> Incubator. >>> There is an interesting section there: >>> {quote} >>> Relationships with Other Apache Products >>> >>> Giraph has some overlapping functionality with Apache Hama. However, >>> there >>> are some significant differences. Giraph focuses on graph-based bulk >>> synchronous parallel (BSP) computing, while Apache Hama is more for >>> general >>> purposed BSP computing. Giraph runs on the Hadoop infrastructure, while >>> Apache Hama uses its own computing framework. >>> {quote} >>> >>> I agree with the point about Hama being a general purposed BSP and Giraph >>> being completely graph oriented. But the later one about the >>> infrastructure >>> is going to be moot with both Giraph and Hama trying to be ported over to >>> YARN. >>> >>> So here's my billion dollar question: Is it possible to implement >>> Girpah's >>> graph based APIs over the Hama's bsp APIs which both run over a single >>> Apache BSP implementation over YARN? >>> >>> I also do see the email thread regarding Hama and Giraph's future >>> collaboration when Hadoop NextGen aka YARN comes in: >>> http://s.apache.org/HamaVsGiraph. So are we ready for this yet? >>> >>> Disclaimer: I come from the Hadoop world, have no idea of Giraph's APIs >>> or >>> internals except that I see a bsp package in Giraph's source tree. 
I do >>> know >>> a tiny bit about Hama's APIs and internal but my expertise is only two >>> days. >>> >>> Thanks, >>> +Vinod >>> (An elephant maintainer trying to see if a Giraffe can be made to ride >>> over >>> a hippopotamus riding over an elephant) >>> >> >> > > -- Best Regards, Edward J. Yoon @eddieyoon
[jira] [Commented] (GIRAPH-31) Hide the SortedMap> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103948#comment-13103948 ] Jake Mannix commented on GIRAPH-31: --- Sounds good to me! "Lazy consensus" is pretty common to The Apache Way ( http://www.apache.org/foundation/voting.html#LazyConsensus ).
[jira] [Commented] (GIRAPH-31) Hide the SortedMap> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103944#comment-13103944 ] Avery Ching commented on GIRAPH-31: --- How about I wait until tonight (say after 7 pm) sometime to commit this? In case anyone has any last thoughts...
Re: Port to YARN: GIRAPH and HAMA
Maybe it's possible, hard to say what will happen in a year. At the same time, porting an application from any of the projects to another shouldn't be too difficult, since the Pregel API is relatively simple. That said, as I mentioned in my original post, I imagine that Giraph will support non-BSP graph computing models as well in the future (less portable). Avery On 9/13/11 12:51 PM, Dan Brickley wrote: > On 13 September 2011 21:43, Dmitriy Ryaboy wrote: >> Dan, >> Given how fast we are currently iterating on the API in Giraph, I think >> agreeing on a common API across 3 projects is a bit premature at this stage, >> unfortunately.. > Current velocity aside, ... could such an interface be plausible? e.g. this time next year? > Dan
Re: Port to YARN: GIRAPH and HAMA
On 13 September 2011 21:43, Dmitriy Ryaboy wrote: > Dan, > Given how fast we are currently iterating on the API in Giraph, I think > agreeing on a common API across 3 projects is a bit premature at this stage, > unfortunately.. Current velocity aside, ... could such an interface be plausible? e.g. this time next year? Dan
Re: Port to YARN: GIRAPH and HAMA
Dan, Given how fast we are currently iterating on the API in Giraph, I think agreeing on a common API across 3 projects is a bit premature at this stage, unfortunately.. D On Tue, Sep 13, 2011 at 11:20 AM, Dan Brickley wrote: > On 13 September 2011 19:47, Avery Ching wrote: > > > Perhaps more practically, I wonder if it would be possible for someone > from > > the Hama team to refactor our code a bit to support Hama-style BSP in > > Giraph? Certainly would be a pretty cool project... > > Maybe this is crazy, but: I was wondering... Pregel's basic API > approach is pretty straightforward, gloriously simple even. Could we > have platform-neutral APIs that allowed portability of applications > between Pregel-based platforms? At least for Java... > > Right now, those of us who are more 'application people' than platform > developers, are left searching around on 'pregel opensource' and have > to try to guess which of the various Pregel-eseque platforms is > looking most healthy. For example, my summer vacation project was > checking out GoldenOrbOS. Yet by the time I get back, the Mahout list > was buzzing with discussion of Giraph, so I took a look at that (and > was pleasantly suprised). > > There is clearly a lot of energy and creativity right now going into > this kind of distributed graph processing platform. That suggests to > me that *finalising* cross-platform APIs would be premature. But it is > also a time when platforms have a certain amount of flexibility that > they will loose as they get adopted and embedded within products and > processes. Could a Pregel-like Java API be agreed between platforms > (e.g. let's consider Giraph, Hama, GoldenOrbOS), so that those of us > investigating applications could proceed with some hope of later > portability. This might be cheaper than trying to persuade Giraph to > rebuild on top of Hama, or suchlike. Anyone care to make a first pass > at suggesting some common interfaces? 
> > cheers, > > Dan > -- Dmitriy V Ryaboy Twitter Analytics http://twitter.com/squarecog
[jira] [Updated] (GIRAPH-31) Hide the SortedMap> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix updated GIRAPH-31: -- Attachment: GIRAPH-31.diff Updated patch - remove isSorted(), document the fact that the iterator may or may not be sorted (and in fact is, in Vertex), and that users may subclass either Vertex *or* MutableVertex. I have not tested subclassing BasicVertex, which I suspect would fail in various ways, as VertexReader, GraphMapper, and some other classes may expect to get a MutableVertex for some methods.
Re: Port to YARN: GIRAPH and HAMA
On 13 September 2011 19:47, Avery Ching wrote: > Perhaps more practically, I wonder if it would be possible for someone from > the Hama team to refactor our code a bit to support Hama-style BSP in > Giraph? Certainly would be a pretty cool project... Maybe this is crazy, but: I was wondering... Pregel's basic API approach is pretty straightforward, gloriously simple even. Could we have platform-neutral APIs that allowed portability of applications between Pregel-based platforms? At least for Java... Right now, those of us who are more 'application people' than platform developers are left searching around on 'pregel opensource' and have to try to guess which of the various Pregel-esque platforms is looking most healthy. For example, my summer vacation project was checking out GoldenOrbOS. Yet by the time I got back, the Mahout list was buzzing with discussion of Giraph, so I took a look at that (and was pleasantly surprised). There is clearly a lot of energy and creativity right now going into this kind of distributed graph processing platform. That suggests to me that *finalising* cross-platform APIs would be premature. But it is also a time when platforms have a certain amount of flexibility that they will lose as they get adopted and embedded within products and processes. Could a Pregel-like Java API be agreed between platforms (e.g. let's consider Giraph, Hama, GoldenOrbOS), so that those of us investigating applications could proceed with some hope of later portability? This might be cheaper than trying to persuade Giraph to rebuild on top of Hama, or suchlike. Anyone care to make a first pass at suggesting some common interfaces? cheers, Dan
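To make the question about common interfaces concrete, here is a purely illustrative sketch of what a platform-neutral, Pregel-style Java API could look like. Every name in it is hypothetical. It is not the API of Giraph, Hama, or GoldenOrbOS, and it assumes Long vertex ids. The example program propagates the maximum value through the graph, in the spirit of the Pregel model's compute-per-superstep style.

```java
/** Hypothetical platform-neutral vertex program: called once per active vertex per superstep. */
interface VertexProgram<V, M> {
    void compute(VertexContext<V, M> vertex, Iterable<M> messages);
}

/** Hypothetical view of one vertex, supplied by whichever platform runs the program. */
interface VertexContext<V, M> {
    long getId();
    long getSuperstep();
    V getValue();
    void setValue(V value);
    Iterable<Long> getNeighbors();
    void sendMessage(long targetId, M message);
    void voteToHalt();
}

/** Example program: every vertex converges to the largest value reachable through its in-edges. */
final class MaxValueProgram implements VertexProgram<Long, Long> {
    @Override
    public void compute(VertexContext<Long, Long> vertex, Iterable<Long> messages) {
        long max = vertex.getValue();
        for (long message : messages) {
            max = Math.max(max, message);
        }
        boolean changed = max != vertex.getValue();
        if (changed) {
            vertex.setValue(max);
        }
        if (vertex.getSuperstep() == 0 || changed) {
            // Tell the neighbors about the (possibly new) maximum.
            for (long neighborId : vertex.getNeighbors()) {
                vertex.sendMessage(neighborId, max);
            }
        } else {
            // Nothing new learned in this superstep.
            vertex.voteToHalt();
        }
    }
}
```

An application written against interfaces like these would depend only on the two small types, leaving each platform free to supply its own VertexContext implementation. Whether the real platforms could agree on such a contract is exactly the open question in this thread.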
Re: Port to YARN: GIRAPH and HAMA
Hi Vinod, Edward and I have chatted about this at times. It sounds better in theory (both BSP based and adding support for MRv2) than in practice I think (underlying implementations are quite different). Actually, I also believe that in the future, Giraph is not going to solely be BSP-based graph computing. We are also thinking about other underlying computing models (i.e. streaming (asynchronous) graph processing; see http://mail-archives.apache.org/mod_mbox/incubator-giraph-user/201109.mbox/%3CCAEVHzWC8b-7RiBjkDiQKjiu-rVBz9=ogeoajxhbclcr5n3+...@mail.gmail.com%3E ). But I think today, the issues are the following: 1) Giraph runs completely as a MapReduce job on Hadoop today. This needs to be maintained to support our current users, who will not likely move to MRv2 for at least a year. 2) The internals of Giraph are implemented differently from Hama's and would take some time to port. 3) If we have various graph processing computing models (BSP based, streams or asynchronous, or a combination), then being on Hama brings little value for Giraph. Perhaps more practically, I wonder if it would be possible for someone from the Hama team to refactor our code a bit to support Hama-style BSP in Giraph? Certainly would be a pretty cool project... Avery
On 9/13/11 4:49 AM, Edward J. Yoon wrote: > Quite a while ago, I implemented a clone of Google Pregel simply using > BSPLib[1] and decided to focus on BSP computing engine. > > Hama and Giraph projects are differ in slogan but not in kind. > > If we made some collaboration, Giraph should be implemented on top of > Hama BSP computing engine. > > Otherwise, we will back to square one again. > > 1. http://markmail.org/thread/4czcgtjupjvpqcqi > > On Sun, Sep 11, 2011 at 11:22 PM, Vinod Kumar Vavilapalli wrote: >> Crosspost to hama-dev and giraph-dev. >> >> It was only in my morning time that I was looking at HAMA-431, the port of >> Hama to YARN. And one of the tweets reminded me of JIRA issue GIRAPH-13 >> which is about porting Giraph to YARN. >> >> I was also looking at the Girpah proposal for entry into Apache Incubator. >> There is an interesting section there: >> {quote} >> Relationships with Other Apache Products >> >> Giraph has some overlapping functionality with Apache Hama. However, there >> are some significant differences. Giraph focuses on graph-based bulk >> synchronous parallel (BSP) computing, while Apache Hama is more for general >> purposed BSP computing. Giraph runs on the Hadoop infrastructure, while >> Apache Hama uses its own computing framework. >> {quote} >> >> I agree with the point about Hama being a general purposed BSP and Giraph >> being completely graph oriented. But the later one about the infrastructure >> is going to be moot with both Giraph and Hama trying to be ported over to >> YARN. >> >> So here's my billion dollar question: Is it possible to implement Girpah's >> graph based APIs over the Hama's bsp APIs which both run over a single >> Apache BSP implementation over YARN? >> >> I also do see the email thread regarding Hama and Giraph's future >> collaboration when Hadoop NextGen aka YARN comes in: >> http://s.apache.org/HamaVsGiraph. So are we ready for this yet? >> >> Disclaimer: I come from the Hadoop world, have no idea of Giraph's APIs or >> internals except that I see a bsp package in Giraph's source tree. I do know >> a tiny bit about Hama's APIs and internal but my expertise is only two days. >> >> Thanks, >> +Vinod >> (An elephant maintainer trying to see if a Giraffe can be made to ride over >> a hippopotamus riding over an elephant)
[jira] [Commented] (GIRAPH-31) Hide the SortedMap> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103815#comment-13103815 ] Jake Mannix commented on GIRAPH-31: --- Noticed one more thing: if people do subclass Vertex, we need to change destEdgeMap to be protected, as we don't provide a getter anymore, so subclasses which want to do range-queries or whatnot, can do so. > Hide the SortedMap> in Vertex from client visibility (impl. > detail), replace with appropriate accessor methods > --- > > Key: GIRAPH-31 > URL: https://issues.apache.org/jira/browse/GIRAPH-31 > Project: Giraph > Issue Type: Improvement > Components: graph >Affects Versions: 0.70.0 >Reporter: Jake Mannix >Assignee: Jake Mannix > Attachments: GIRAPH-31.diff > > > As discussed on the list, and on GIRAPH-28, the SortedMap> is an > implementation detail which needs not be exposed to application developers - > they need to iterate over the edges, and possibly access them one-by-one, and > remove them (in the Mutable case), but they don't need the SortedMap, and > creating primitive-optimized BasicVertex implementations is hampered by the > fact that clients expect this Map to exist. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-31) Hide the SortedMap> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103798#comment-13103798 ] Jake Mannix commented on GIRAPH-31: --- +1 to that, given your argument on the current use of the class. There may come a time when we have generic things going on in GraphMapper or BspServiceWorker which need to do special optimized things to sorted vertices, and at that time we can add an "isSorted()" or "getSortedIterator()" method.
Re: Port to YARN: GIRAPH and HAMA
Quite a while ago, I implemented a clone of Google Pregel simply using BSPLib[1] and decided to focus on the BSP computing engine. The Hama and Giraph projects differ in slogan but not in kind. If we are to collaborate, Giraph should be implemented on top of the Hama BSP computing engine. Otherwise, we will be back at square one again. 1. http://markmail.org/thread/4czcgtjupjvpqcqi On Sun, Sep 11, 2011 at 11:22 PM, Vinod Kumar Vavilapalli wrote: > Crosspost to hama-dev and giraph-dev. > > It was only in my morning time that I was looking at HAMA-431, the port of > Hama to YARN. And one of the tweets reminded me of JIRA issue GIRAPH-13 > which is about porting Giraph to YARN. > > I was also looking at the Girpah proposal for entry into Apache Incubator. > There is an interesting section there: > {quote} > Relationships with Other Apache Products > > Giraph has some overlapping functionality with Apache Hama. However, there > are some significant differences. Giraph focuses on graph-based bulk > synchronous parallel (BSP) computing, while Apache Hama is more for general > purposed BSP computing. Giraph runs on the Hadoop infrastructure, while > Apache Hama uses its own computing framework. > {quote} > > I agree with the point about Hama being a general purposed BSP and Giraph > being completely graph oriented. But the later one about the infrastructure > is going to be moot with both Giraph and Hama trying to be ported over to > YARN. > > So here's my billion dollar question: Is it possible to implement Girpah's > graph based APIs over the Hama's bsp APIs which both run over a single > Apache BSP implementation over YARN? > > I also do see the email thread regarding Hama and Giraph's future > collaboration when Hadoop NextGen aka YARN comes in: > http://s.apache.org/HamaVsGiraph. So are we ready for this yet? > > Disclaimer: I come from the Hadoop world, have no idea of Giraph's APIs or > internals except that I see a bsp package in Giraph's source tree.
I do know > a tiny bit about Hama's APIs and internal but my expertise is only two days. > > Thanks, > +Vinod > (An elephant maintainer trying to see if a Giraffe can be made to ride over > a hippopotamus riding over an elephant) > -- Best Regards, Edward J. Yoon @eddieyoon
[jira] [Commented] (GIRAPH-31) Hide the SortedMap> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103793#comment-13103793 ] Avery Ching commented on GIRAPH-31: --- After "iterating" on it, given that we don't have a well-defined use case for a sorted iterator and those APIs I suggested are a little nasty, I think I prefer the following: Each Vertex implementation should implement Iterable as you both suggest, but I think following the Java utils style of sorted or not feels the most natural. We can describe the iteration order via Javadoc, and we can have multiple Vertex implementations, e.g. SortedPrimitiveVertex, HashPrimitiveVertex, etc. Somehow isSorted() feels a little yucky. Examples from the java.util Set implementations: TreeSet: Iterator<E> iterator() Returns an iterator over the elements in this set in ascending order. HashSet: Iterator<E> iterator() Returns an iterator over the elements in this set.
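The java.util convention referred to here, where the interface promises only iteration and each implementation documents its own order, can be sketched as follows. The interface and class names are hypothetical and are not the SortedPrimitiveVertex or HashPrimitiveVertex mentioned in the comment. The sketch also uses boxed Long ids for brevity, so it is not primitive-optimized.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical common contract: iteration order is left to each implementation's Javadoc. */
interface EdgeIterableVertex extends Iterable<Long> {
    void addEdge(long destId, double edgeValue);
}

/** Iterates over destination ids in ascending order, as TreeSet does. */
final class SortedEdgeVertex implements EdgeIterableVertex {
    private final Map<Long, Double> edges = new TreeMap<>();

    @Override
    public void addEdge(long destId, double edgeValue) {
        edges.put(destId, edgeValue);
    }

    @Override
    public Iterator<Long> iterator() {
        return edges.keySet().iterator();
    }
}

/** Iterates over destination ids in no particular order, as HashSet does. */
final class HashEdgeVertex implements EdgeIterableVertex {
    private final Map<Long, Double> edges = new HashMap<>();

    @Override
    public void addEdge(long destId, double edgeValue) {
        edges.put(destId, edgeValue);
    }

    @Override
    public Iterator<Long> iterator() {
        return edges.keySet().iterator();
    }
}

final class IterationOrderDemo {
    public static void main(String[] args) {
        EdgeIterableVertex[] vertices = { new SortedEdgeVertex(), new HashEdgeVertex() };
        for (EdgeIterableVertex vertex : vertices) {
            vertex.addEdge(30L, 1.0);
            vertex.addEdge(10L, 1.0);
            vertex.addEdge(20L, 1.0);
            StringBuilder ids = new StringBuilder();
            for (long destId : vertex) {
                ids.append(destId).append(' ');
            }
            System.out.println(vertex.getClass().getSimpleName() + ": " + ids);
        }
    }
}
```

Callers that need ascending order pick the sorted implementation by type. Callers that do not care write against the interface and need no isSorted() check.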
[jira] [Commented] (GIRAPH-31) Hide the SortedMap> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103745#comment-13103745 ] Jake Mannix commented on GIRAPH-31: --- And implementations which can provide both a sorted iterator that isn't prohibitively expensive and a much faster unsorted iterator can choose whether to return true or false from the "isSorted()" method, and provide another method of the type you're suggesting.
[jira] [Commented] (GIRAPH-31) Hide the SortedMap> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103741#comment-13103741 ] Jake Mannix commented on GIRAPH-31: --- Right, as many implementations will just 'throw new UnsupportedOperationException("We don't sort!");' > Hide the SortedMap> in Vertex from client visibility (impl. > detail), replace with appropriate accessor methods > --- > > Key: GIRAPH-31 > URL: https://issues.apache.org/jira/browse/GIRAPH-31 > Project: Giraph > Issue Type: Improvement > Components: graph >Affects Versions: 0.70.0 >Reporter: Jake Mannix >Assignee: Jake Mannix > Attachments: GIRAPH-31.diff > > > As discussed on the list, and on GIRAPH-28, the SortedMap> is an > implementation detail which needs not be exposed to application developers - > they need to iterate over the edges, and possibly access them one-by-one, and > remove them (in the Mutable case), but they don't need the SortedMap, and > creating primitive-optimized BasicVertex implementations is hampered by the > fact that clients expect this Map to exist. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-31) Hide the SortedMap> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103740#comment-13103740 ] Dmitriy V. Ryaboy commented on GIRAPH-31: - Avery, It seems like requiring all BasicVertex implementations to implement a sorted iterator even when they don't need it is a bit heavy-handed. > Hide the SortedMap> in Vertex from client visibility (impl. > detail), replace with appropriate accessor methods > --- > > Key: GIRAPH-31 > URL: https://issues.apache.org/jira/browse/GIRAPH-31 > Project: Giraph > Issue Type: Improvement > Components: graph >Affects Versions: 0.70.0 >Reporter: Jake Mannix >Assignee: Jake Mannix > Attachments: GIRAPH-31.diff > > > As discussed on the list, and on GIRAPH-28, the SortedMap> is an > implementation detail which needs not be exposed to application developers - > they need to iterate over the edges, and possibly access them one-by-one, and > remove them (in the Mutable case), but they don't need the SortedMap, and > creating primitive-optimized BasicVertex implementations is hampered by the > fact that clients expect this Map to exist. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-31) Hide the SortedMap> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103735#comment-13103735 ] Avery Ching commented on GIRAPH-31: --- You sure you don't want to just provide the interfaces Iterator> getOutEdgeIterator(); Iterator> getSortedOutEdgeIterator(); or Iterator getOutEdgeIterator(); Iterator getSortedOutEdgeIterator(); It would do away with this issue of sorted...and still keep iterable, but sorted or not, it's up to the implementation. > Hide the SortedMap> in Vertex from client visibility (impl. > detail), replace with appropriate accessor methods > --- > > Key: GIRAPH-31 > URL: https://issues.apache.org/jira/browse/GIRAPH-31 > Project: Giraph > Issue Type: Improvement > Components: graph >Affects Versions: 0.70.0 >Reporter: Jake Mannix >Assignee: Jake Mannix > Attachments: GIRAPH-31.diff > > > As discussed on the list, and on GIRAPH-28, the SortedMap> is an > implementation detail which needs not be exposed to application developers - > they need to iterate over the edges, and possibly access them one-by-one, and > remove them (in the Mutable case), but they don't need the SortedMap, and > creating primitive-optimized BasicVertex implementations is hampered by the > fact that clients expect this Map to exist. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
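As a rough illustration of the two-method proposal above, the sketch below declares both accessors on one interface and shows one implementation that sorts on demand and one that declines to, in the spirit of the UnsupportedOperationException mentioned in the replies. The generic parameters were lost from the method signatures in the archive, so the use of plain Long target ids here is an assumption, as are the names OutEdgeSource, ListBackedEdges, and UnsortedOnlyEdges.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical two-method contract; edges are reduced to target ids. */
interface OutEdgeSource {
    Iterator<Long> getOutEdgeIterator();

    Iterator<Long> getSortedOutEdgeIterator();
}

/** Stores targets in insertion order and sorts a copy when asked. */
class ListBackedEdges implements OutEdgeSource {
    private final List<Long> targets = new ArrayList<>();

    void addEdge(long targetId) {
        targets.add(targetId);
    }

    @Override
    public Iterator<Long> getOutEdgeIterator() {
        return targets.iterator();
    }

    @Override
    public Iterator<Long> getSortedOutEdgeIterator() {
        List<Long> copy = new ArrayList<>(targets);
        Collections.sort(copy);
        return copy.iterator();
    }
}

/** An implementation that opts out of sorting altogether. */
class UnsortedOnlyEdges extends ListBackedEdges {
    @Override
    public Iterator<Long> getSortedOutEdgeIterator() {
        throw new UnsupportedOperationException("We don't sort!");
    }
}

class TwoIteratorDemo {
    /** Drains an iterator into a list so it can be printed or compared. */
    static List<Long> drain(Iterator<Long> it) {
        List<Long> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        ListBackedEdges edges = new ListBackedEdges();
        edges.addEdge(3L);
        edges.addEdge(1L);
        edges.addEdge(2L);
        System.out.println("insertion order: " + drain(edges.getOutEdgeIterator()));
        System.out.println("sorted order:    " + drain(edges.getSortedOutEdgeIterator()));
    }
}
```

The second class shows the cost raised later in the thread: every implementation must carry the sorted accessor even when all it can do is throw.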
[jira] [Commented] (GIRAPH-31) Hide the SortedMap> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103734#comment-13103734 ] Jake Mannix commented on GIRAPH-31: --- Avery, Dmitriy - after thinking about it, I think both true and false are wrong! BasicVertex shouldn't implement this method at all, leave it abstract, and subclasses which implement iterator() are forced to also tell users whether they chose to implement it sorted or not. > Hide the SortedMap> in Vertex from client visibility (impl. > detail), replace with appropriate accessor methods > --- > > Key: GIRAPH-31 > URL: https://issues.apache.org/jira/browse/GIRAPH-31 > Project: Giraph > Issue Type: Improvement > Components: graph >Affects Versions: 0.70.0 >Reporter: Jake Mannix >Assignee: Jake Mannix > Attachments: GIRAPH-31.diff > > > As discussed on the list, and on GIRAPH-28, the SortedMap> is an > implementation detail which needs not be exposed to application developers - > they need to iterate over the edges, and possibly access them one-by-one, and > remove them (in the Mutable case), but they don't need the SortedMap, and > creating primitive-optimized BasicVertex implementations is hampered by the > fact that clients expect this Map to exist. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
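The "leave it abstract" idea above might be sketched as follows. The base class declares the sortedness query without a default, so each subclass that supplies iterator() must also state its ordering, and a client can skip a sort when the vertex already guarantees one. SketchVertex, its subclasses, and the use of long target ids are illustrative assumptions, not the BasicVertex API from the patch; the method is called isSorted() here only because that name appears elsewhere in the thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical base class: no default answer for sortedness. */
abstract class SketchVertex implements Iterable<Long> {
    abstract void addEdge(long targetId);

    /** Returns true only if iterator() visits target ids in ascending order. */
    abstract boolean isSorted();
}

/** Keeps targets in a TreeSet, so iteration is ascending. */
class TreeSketchVertex extends SketchVertex {
    private final TreeSet<Long> targets = new TreeSet<>();

    @Override
    void addEdge(long targetId) {
        targets.add(targetId);
    }

    @Override
    boolean isSorted() {
        return true;
    }

    @Override
    public Iterator<Long> iterator() {
        return targets.iterator();
    }
}

/** Keeps targets in insertion order and says so. */
class ListSketchVertex extends SketchVertex {
    private final List<Long> targets = new ArrayList<>();

    @Override
    void addEdge(long targetId) {
        targets.add(targetId);
    }

    @Override
    boolean isSorted() {
        return false;
    }

    @Override
    public Iterator<Long> iterator() {
        return targets.iterator();
    }
}

class SortednessDemo {
    /** Returns targets in ascending order, sorting only when needed. */
    static List<Long> ascendingTargets(SketchVertex vertex) {
        List<Long> out = new ArrayList<>();
        for (Long id : vertex) {
            out.add(id);
        }
        if (!vertex.isSorted()) {
            Collections.sort(out);
        }
        return out;
    }

    public static void main(String[] args) {
        SketchVertex[] vertices = {new TreeSketchVertex(), new ListSketchVertex()};
        for (SketchVertex v : vertices) {
            v.addEdge(3L);
            v.addEdge(1L);
            v.addEdge(2L);
            System.out.println(v.getClass().getSimpleName()
                + " sorted=" + v.isSorted() + " -> " + ascendingTargets(v));
        }
    }
}
```

Because the method has no inherited body, a subclass cannot accidentally claim the wrong ordering by forgetting to override it, which is the concern about the default value raised in the earlier comments.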
[jira] [Commented] (GIRAPH-31) Hide the SortedMap> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103689#comment-13103689 ] Claudio Martella commented on GIRAPH-31: One question: how can I provide my own implementation of the edge-containing datastructure if addEdge is final? Maybe we should drop the final? > Hide the SortedMap> in Vertex from client visibility (impl. > detail), replace with appropriate accessor methods > --- > > Key: GIRAPH-31 > URL: https://issues.apache.org/jira/browse/GIRAPH-31 > Project: Giraph > Issue Type: Improvement > Components: graph >Affects Versions: 0.70.0 >Reporter: Jake Mannix >Assignee: Jake Mannix > Attachments: GIRAPH-31.diff > > > As discussed on the list, and on GIRAPH-28, the SortedMap> is an > implementation detail which needs not be exposed to application developers - > they need to iterate over the edges, and possibly access them one-by-one, and > remove them (in the Mutable case), but they don't need the SortedMap, and > creating primitive-optimized BasicVertex implementations is hampered by the > fact that clients expect this Map to exist. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-31) Hide the SortedMap> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103442#comment-13103442 ] Dmitriy V. Ryaboy commented on GIRAPH-31: - I was just commenting on the javadoc, not the implementation. Though now that you say that, I think you are right, false is a safer thing to do. > Hide the SortedMap> in Vertex from client visibility (impl. > detail), replace with appropriate accessor methods > --- > > Key: GIRAPH-31 > URL: https://issues.apache.org/jira/browse/GIRAPH-31 > Project: Giraph > Issue Type: Improvement > Components: graph >Affects Versions: 0.70.0 >Reporter: Jake Mannix >Assignee: Jake Mannix > Attachments: GIRAPH-31.diff > > > As discussed on the list, and on GIRAPH-28, the SortedMap> is an > implementation detail which needs not be exposed to application developers - > they need to iterate over the edges, and possibly access them one-by-one, and > remove them (in the Mutable case), but they don't need the SortedMap, and > creating primitive-optimized BasicVertex implementations is hampered by the > fact that clients expect this Map to exist. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-31) Hide the SortedMap> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103436#comment-13103436 ] Avery Ching commented on GIRAPH-31: --- committer +1. A few minor formatting issues (missing javadoc and over 80 char lines - I can fix before committing), but otherwise great! I agree with Dmitriy's comment that the default should be false. We should probably wait (maybe a day) for other folks to chime in for this one since it's a user facing interface. > Hide the SortedMap> in Vertex from client visibility (impl. > detail), replace with appropriate accessor methods > --- > > Key: GIRAPH-31 > URL: https://issues.apache.org/jira/browse/GIRAPH-31 > Project: Giraph > Issue Type: Improvement > Components: graph >Affects Versions: 0.70.0 >Reporter: Jake Mannix >Assignee: Jake Mannix > Attachments: GIRAPH-31.diff > > > As discussed on the list, and on GIRAPH-28, the SortedMap> is an > implementation detail which needs not be exposed to application developers - > they need to iterate over the edges, and possibly access them one-by-one, and > remove them (in the Mutable case), but they don't need the SortedMap, and > creating primitive-optimized BasicVertex implementations is hampered by the > fact that clients expect this Map to exist. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-12) Investigate communication improvements
[ https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103422#comment-13103422 ] Avery Ching commented on GIRAPH-12: ---

Hyunsik, just to update, I grabbed your patch and it passed unittest on my machine. Then I ran it on a cluster at Yahoo!. I didn't have time to make a messaging benchmark, so I ran PageRankBenchmark. I ran with 100 workers, 1 M vertices, 3 supersteps, and 10 edges per vertex.

Here are 2 runs with the original code:

11/09/13 07:02:08 INFO mapred.JobClient: Giraph Timers
11/09/13 07:02:08 INFO mapred.JobClient: Total (milliseconds)=46709
11/09/13 07:02:08 INFO mapred.JobClient: Superstep 3 (milliseconds)=1682
11/09/13 07:02:08 INFO mapred.JobClient: Setup (milliseconds)=3228
11/09/13 07:02:08 INFO mapred.JobClient: Shutdown (milliseconds)=1223
11/09/13 07:02:08 INFO mapred.JobClient: Vertex input superstep (milliseconds)=3578
11/09/13 07:02:08 INFO mapred.JobClient: Superstep 0 (milliseconds)=16222
11/09/13 07:02:08 INFO mapred.JobClient: Superstep 2 (milliseconds)=12302
11/09/13 07:02:08 INFO mapred.JobClient: Superstep 1 (milliseconds)=8467

11/09/13 07:14:51 INFO mapred.JobClient: Giraph Timers
11/09/13 07:14:51 INFO mapred.JobClient: Total (milliseconds)=51475
11/09/13 07:14:51 INFO mapred.JobClient: Superstep 3 (milliseconds)=1348
11/09/13 07:14:51 INFO mapred.JobClient: Setup (milliseconds)=7233
11/09/13 07:14:51 INFO mapred.JobClient: Shutdown (milliseconds)=884
11/09/13 07:14:51 INFO mapred.JobClient: Vertex input superstep (milliseconds)=3284
11/09/13 07:14:51 INFO mapred.JobClient: Superstep 0 (milliseconds)=22213
11/09/13 07:14:51 INFO mapred.JobClient: Superstep 2 (milliseconds)=8553
11/09/13 07:14:51 INFO mapred.JobClient: Superstep 1 (milliseconds)=7955

Here are 2 runs with your code:

11/09/13 07:06:56 INFO mapred.JobClient: Giraph Timers
11/09/13 07:06:56 INFO mapred.JobClient: Total (milliseconds)=51935
11/09/13 07:06:56 INFO mapred.JobClient: Superstep 3 (milliseconds)=1150
11/09/13 07:06:56 INFO mapred.JobClient: Setup (milliseconds)=3338
11/09/13 07:06:56 INFO mapred.JobClient: Shutdown (milliseconds)=833
11/09/13 07:06:56 INFO mapred.JobClient: Vertex input superstep (milliseconds)=3401
11/09/13 07:06:56 INFO mapred.JobClient: Superstep 0 (milliseconds)=17297
11/09/13 07:06:56 INFO mapred.JobClient: Superstep 2 (milliseconds)=14384
11/09/13 07:06:56 INFO mapred.JobClient: Superstep 1 (milliseconds)=11528

11/09/13 07:12:09 INFO mapred.JobClient: Giraph Timers
11/09/13 07:12:09 INFO mapred.JobClient: Total (milliseconds)=51985
11/09/13 07:12:09 INFO mapred.JobClient: Superstep 3 (milliseconds)=1362
11/09/13 07:12:09 INFO mapred.JobClient: Setup (milliseconds)=3776
11/09/13 07:12:09 INFO mapred.JobClient: Shutdown (milliseconds)=710
11/09/13 07:12:09 INFO mapred.JobClient: Vertex input superstep (milliseconds)=3771
11/09/13 07:12:09 INFO mapred.JobClient: Superstep 0 (milliseconds)=17741
11/09/13 07:12:09 INFO mapred.JobClient: Superstep 2 (milliseconds)=13068
11/09/13 07:12:09 INFO mapred.JobClient: Superstep 1 (milliseconds)=11551

In my limited testing, numbers aren't too different. I also see that the connections are maintained throughout the application run as you mentioned. So the only tradeoff is possibly the reduced parallelization of message sending (user chosen vs all threads). I like the approach and think it's an improvement (controllable threads). Perhaps the only comment is that regarding the following code block.

    for(PeerConnection pc : peerConnections.values()) {
        futures.add(executor.submit(new PeerFlushExecutor(pc)));
    }

Probably would be good to randomize the PeerConnection objects to avoid hotspots on the receiving side?
> Investigate communication improvements > -- > > Key: GIRAPH-12 > URL: https://issues.apache.org/jira/browse/GIRAPH-12 > Project: Giraph > Issue Type: Improvement > Components: bsp >Reporter: Avery Ching >Assignee: Hyunsik Choi >Priority: Minor > Attachments: GIRAPH-12_1.patch > > > Currently every worker will start up a thread to communicate with every other > workers. Hadoop RPC is used for communication. For instance if there are > 400 workers, each worker will create 400 threads. This ends up using a lot > of memory, even with the option > -Dmapred.child.java.opts="-Xss64k". > It would be good to investigate using frameworks like Netty or custom roll > our own to improve this situation. By moving away from Hadoop RPC, we would > also make compatibility of different Hadoop versions easier. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://ww
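One way the randomization suggested in the comment above could be done is to copy the connections into a list, shuffle it, and then submit the flush tasks in that order, so that workers do not all contact the same peer first. The sketch below is not the code from GIRAPH-12_1.patch: Peer is a stand-in for the patch's PeerConnection, the Callable is a placeholder for its PeerFlushExecutor, and the class and method names are assumptions chosen for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ShuffledFlushSketch {
    /** Stand-in for the patch's PeerConnection; only a name is modeled. */
    static final class Peer {
        final String name;

        Peer(String name) {
            this.name = name;
        }
    }

    /** Returns the peers in random order, leaving the source map untouched. */
    static List<Peer> shuffledPeers(Map<String, Peer> peers, Random random) {
        List<Peer> order = new ArrayList<>(peers.values());
        Collections.shuffle(order, random);
        return order;
    }

    /** Submits one placeholder flush task per peer, in shuffled order. */
    static List<String> flushAll(Map<String, Peer> peers, int threads, Random random)
            throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (Peer peer : shuffledPeers(peers, random)) {
                // Placeholder for the real flush work done per connection.
                Callable<String> flush = () -> "flushed " + peer.name;
                futures.add(executor.submit(flush));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, Peer> peers = new LinkedHashMap<>();
        for (String name : new String[] {"worker-0", "worker-1", "worker-2", "worker-3"}) {
            peers.put(name, new Peer(name));
        }
        // Each worker would use its own Random, giving each a different order.
        System.out.println(flushAll(peers, 2, new Random()));
    }
}
```

The thread pool size stays under user control, as in the approach described in the comment; only the submission order changes.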
[jira] [Commented] (GIRAPH-31) Hide the SortedMap> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103412#comment-13103412 ] Dmitriy V. Ryaboy commented on GIRAPH-31: - non-committer +1. Please change javadoc for providesSortedIterator to not just say "@return true" -- implementations that override this to return false might forget to provide their own javadoc, inherit this, and thus claim behavior opposite to what they actually do. > Hide the SortedMap> in Vertex from client visibility (impl. > detail), replace with appropriate accessor methods > --- > > Key: GIRAPH-31 > URL: https://issues.apache.org/jira/browse/GIRAPH-31 > Project: Giraph > Issue Type: Improvement > Components: graph >Affects Versions: 0.70.0 >Reporter: Jake Mannix >Assignee: Jake Mannix > Attachments: GIRAPH-31.diff > > > As discussed on the list, and on GIRAPH-28, the SortedMap> is an > implementation detail which needs not be exposed to application developers - > they need to iterate over the edges, and possibly access them one-by-one, and > remove them (in the Mutable case), but they don't need the SortedMap, and > creating primitive-optimized BasicVertex implementations is hampered by the > fact that clients expect this Map to exist. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira