Re: [Announcement] Giraph talk in Berlin on May 29th
Hi,

> It would be good to present users a couple of non-trivial examples and one or
> two 'real' use cases where Apache Giraph is used for processing large graphs.
> Apache Giraph comes with two examples: all shortest paths from a single source
> and PageRank. Google's Pregel paper describes 'bipartite matching' and
> 'semi-clustering'. Is anyone working on implementing these in Giraph?
> Or, what if in the shortest paths example you actually want to know the path?

I have some toy code (not really well tested) that implements b-matching (that is, matching with integer capacities on the nodes). It's a simple greedy method, along the lines of the one described here: www.vldb.org/pvldb/vol4/p460-morales.pdf
I can share it if you are interested.

Cheers,
--
Gianmarco

> It would be great to have examples of more advanced features: custom
> partitioning functions, aggregators, ...
>
> Personally, I'd like to see a side-by-side comparison of Google's Pregel as
> described in their paper and the Giraph implementation (I am particularly
> interested in where they diverge and why).
>
> Another question (or thing I am not so sure about) is about 'capacity planning'
> (sort of...). Given a dataset and an algorithm implemented in Giraph, how do you
> determine how many workers would be needed (in order to fit all your graph and
> messages for each superstep in RAM)?
>
> Last but not least, it seems to me that PageRank is what you use to 'benchmark'
> Giraph; is that the case? If so, sharing a common dataset for others to use
> would be a first step toward allowing people to compare the performance of
> different software running the very same algorithm over the same data and the
> same hardware infrastructure.
>
> Paolo
>
> Sebastian Schelter wrote:
> > Hi,
> >
> > I will give a talk titled "Large Scale Graph Processing with Apache
> > Giraph" in Berlin on May 29th. Details are available at:
> >
> > https://www.xing.com/events/gameduell-tech-talk-on-the-topic-large-scale-graph-processing-with-apache-giraph-1092275
> >
> > Best,
> > Sebastian
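For readers unfamiliar with b-matching, the classic sequential greedy heuristic, which the linked paper adapts to MapReduce, can be sketched as below. This is not Gianmarco's toy code and not the distributed algorithm from the paper. It is a plain Python illustration, and the function name `greedy_b_matching`, the `(u, v, weight)` edge tuples and the `capacity` dictionary are hypothetical choices made for this example. It scans edges from heaviest to lightest and keeps an edge only while both endpoints still have spare capacity.

```python
from typing import Dict, Hashable, List, Tuple

# Hypothetical edge format: (endpoint u, endpoint v, weight). Self-loops are assumed absent.
Edge = Tuple[Hashable, Hashable, float]


def greedy_b_matching(edges: List[Edge], capacity: Dict[Hashable, int]) -> List[Edge]:
    """Greedy heuristic for weighted b-matching (not guaranteed to be optimal).

    capacity[v] is the maximum number of matched edges vertex v may take part in.
    """
    remaining = dict(capacity)  # copy so the caller's capacities are left untouched
    matching: List[Edge] = []
    # Heaviest edges first; sorted() is stable, so ties keep their input order.
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        # Keep the edge only if both endpoints still have spare capacity.
        if remaining.get(u, 0) > 0 and remaining.get(v, 0) > 0:
            matching.append((u, v, w))
            remaining[u] -= 1
            remaining[v] -= 1
    return matching
```

With capacities a=1, b=2, x=1, y=2 and edges a-x (5), a-y (4), b-x (3), b-y (1), the sketch keeps a-x and b-y. Once a-x is taken, both a and x are saturated, so a-y and b-x are skipped.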
Re: [Announcement] Giraph talk in Berlin on May 29th
u...@mahout.apache.org

Hi,

by the way, about talks/presentations, here are the Apache Giraph talks/presentations I found:

“Giraph: Large-scale graph processing on Hadoop”, Avery Ching
Hadoop Summit 2011 - Santa Clara, California - June 2011
http://www.slideshare.net/averyching/20110628giraph-hadoop-summit
http://www.youtube.com/watch?v=l4nQjAG6fac

“Apache Giraph: Distributed Graph Processing in the Cloud”, Claudio Martella
FOSDEM 2012 - Brussels, Belgium - February 2012
http://prezi.com/9ake_klzwrga/apache-giraph-distributed-graph-processing-in-the-cloud/
http://blog.acaro.org/entry/giraph-talk-for-graphdevroom-fosdem-2012
http://www.youtube.com/watch?v=3ZrqPEIPRe4
http://www.youtube.com/watch?v=BmRaejKGeDM

“Introducing Apache Giraph for Large Scale Graph Processing”, Sebastian Schelter
Apache Hadoop Get Together - Berlin, Germany - April 2012
http://ssc.io/introducing-apache-giraph-for-large-scale-graph-processing/
http://www.slideshare.net/sscdotopen/introducing-apache-giraph-for-large-scale-graph-processing
http://vimeo.com/40737998

You could put the links on the Apache Giraph wiki.

First of all, thank you for sharing them. May I add a few comments or suggestions for future presentations? (Please don't take this as criticism.)

It would be good to present users a couple of non-trivial examples and one or two 'real' use cases where Apache Giraph is used for processing large graphs.

Apache Giraph comes with two examples: all shortest paths from a single source and PageRank. Google's Pregel paper describes 'bipartite matching' and 'semi-clustering'. Is anyone working on implementing these in Giraph? Or, what if in the shortest paths example you actually want to know the path?

It would be great to have examples of more advanced features: custom partitioning functions, aggregators, ...

Personally, I'd like to see a side-by-side comparison of Google's Pregel as described in their paper and the Giraph implementation (I am particularly interested in where they diverge and why).

Another question (or thing I am not so sure about) is about 'capacity planning' (sort of...). Given a dataset and an algorithm implemented in Giraph, how do you determine how many workers would be needed (in order to fit all your graph and messages for each superstep in RAM)?

Last but not least, it seems to me that PageRank is what you use to 'benchmark' Giraph; is that the case? If so, sharing a common dataset for others to use would be a first step toward allowing people to compare the performance of different software running the very same algorithm over the same data and the same hardware infrastructure.

Paolo

Sebastian Schelter wrote:
> Hi,
>
> I will give a talk titled "Large Scale Graph Processing with Apache
> Giraph" in Berlin on May 29th. Details are available at:
>
> https://www.xing.com/events/gameduell-tech-talk-on-the-topic-large-scale-graph-processing-with-apache-giraph-1092275
>
> Best,
> Sebastian
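On the question of recovering the actual path: one common approach is to have each message carry the sender's id along with the candidate distance, so every vertex can remember its predecessor on the best path found so far. The sketch below simulates Pregel-style supersteps in plain Python rather than using Giraph's API. `pregel_sssp_with_paths`, `path_to` and the adjacency-dict graph format are hypothetical names chosen for illustration. It assumes non-negative edge weights and that every vertex, including sinks, appears as a key in the graph.

```python
import math
from typing import Dict, Hashable, List, Optional, Tuple

# Hypothetical graph format: vertex -> list of (target vertex, non-negative edge weight).
# Every vertex, including sinks, is assumed to appear as a key.
Graph = Dict[Hashable, List[Tuple[Hashable, float]]]
# A message is (candidate distance, id of the sending vertex, or None for the source).
Message = Tuple[float, Optional[Hashable]]


def pregel_sssp_with_paths(graph: Graph, source: Hashable):
    """Simulate single-source shortest paths superstep by superstep, tracking predecessors."""
    dist: Dict[Hashable, float] = {v: math.inf for v in graph}
    parent: Dict[Hashable, Optional[Hashable]] = {v: None for v in graph}
    # Superstep 0: only the source receives a message, distance 0 from no sender.
    inbox: Dict[Hashable, List[Message]] = {source: [(0.0, None)]}
    while inbox:  # the computation halts once no messages are in flight
        outbox: Dict[Hashable, List[Message]] = {}
        for vertex, messages in inbox.items():
            best_dist, best_sender = min(messages, key=lambda m: m[0])
            if best_dist < dist[vertex]:
                # Improvement: remember who offered it, then notify the neighbours.
                dist[vertex], parent[vertex] = best_dist, best_sender
                for target, weight in graph[vertex]:
                    outbox.setdefault(target, []).append((best_dist + weight, vertex))
        inbox = outbox  # messages sent now are delivered in the next superstep
    return dist, parent


def path_to(dist, parent, target) -> List[Hashable]:
    """Follow predecessor pointers back to the source; [] if target is unreachable."""
    if math.isinf(dist[target]):
        return []
    path = [target]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]
```

With `g = {"s": [("a", 1.0), ("b", 4.0)], "a": [("b", 2.0), ("c", 6.0)], "b": [("c", 3.0)], "c": [], "d": []}` and source `"s"`, vertex c ends at distance 6.0 with path s, a, b, c, while d stays unreachable. Carrying the sender id costs one extra field per message; the path itself is only materialized afterwards by walking the predecessor pointers.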
Re: [Announcement] Giraph talk in Berlin on May 29th
Warming up your audience :)

On 12.05.2012 22:01, Jakob Homan wrote:
> Stealing my thunder? :)
>
> On Sat, May 12, 2012 at 7:36 AM, Avery Ching wrote:
>> Nice!
>>
>> Avery
>>
>> On 5/12/12 2:58 AM, Sebastian Schelter wrote:
>>> Hi,
>>>
>>> I will give a talk titled "Large Scale Graph Processing with Apache
>>> Giraph" in Berlin on May 29th. Details are available at:
>>>
>>> https://www.xing.com/events/gameduell-tech-talk-on-the-topic-large-scale-graph-processing-with-apache-giraph-1092275
>>>
>>> Best,
>>> Sebastian
Re: [Announcement] Giraph talk in Berlin on May 29th
Stealing my thunder? :)

On Sat, May 12, 2012 at 7:36 AM, Avery Ching wrote:
> Nice!
>
> Avery
>
> On 5/12/12 2:58 AM, Sebastian Schelter wrote:
>> Hi,
>>
>> I will give a talk titled "Large Scale Graph Processing with Apache
>> Giraph" in Berlin on May 29th. Details are available at:
>>
>> https://www.xing.com/events/gameduell-tech-talk-on-the-topic-large-scale-graph-processing-with-apache-giraph-1092275
>>
>> Best,
>> Sebastian
Re: [Announcement] Giraph talk in Berlin on May 29th
Nice!

Avery

On 5/12/12 2:58 AM, Sebastian Schelter wrote:
> Hi,
>
> I will give a talk titled "Large Scale Graph Processing with Apache
> Giraph" in Berlin on May 29th. Details are available at:
>
> https://www.xing.com/events/gameduell-tech-talk-on-the-topic-large-scale-graph-processing-with-apache-giraph-1092275
>
> Best,
> Sebastian
[Announcement] Giraph talk in Berlin on May 29th
Hi,

I will give a talk titled "Large Scale Graph Processing with Apache Giraph" in Berlin on May 29th. Details are available at:

https://www.xing.com/events/gameduell-tech-talk-on-the-topic-large-scale-graph-processing-with-apache-giraph-1092275

Best,
Sebastian