Re: [Announcement] Giraph talk in Berlin on May 29th

2012-05-14 Thread Gianmarco De Francisci Morales
Hi,


> It would be good to present users with a couple of non-trivial examples and
> one or two 'real' use cases where Apache Giraph is used for processing large
> graphs. Apache Giraph comes with two examples: all shortest paths from a
> single source and PageRank. Google's Pregel paper describes 'bipartite
> matching' and 'semi-clustering'. Is anyone working on implementing these in
> Giraph? Or, what if in the shortest paths example you actually want to know
> the path?
>
>
I have some toy code (not really well tested) that implements b-matching
(that is, matching with integer capacities on the nodes).
It's a simple greedy method, along the lines of the one described here:
www.vldb.org/pvldb/vol4/p460-morales.pdf
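
For readers new to the idea, here is a minimal sequential sketch of one common
greedy strategy for b-matching, in plain Python. It is an illustration only: it
is not the toy Giraph code mentioned above, and it may differ from the method in
the linked paper. The function name, the edge-list format and the sample graph
are all hypothetical.

```python
def greedy_b_matching(edges, capacity):
    """Scan edges by decreasing weight and keep an edge whenever both
    endpoints still have spare capacity (assumed greedy rule)."""
    remaining = dict(capacity)  # node -> how many more edges it may accept
    matching = []
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        if u != v and remaining.get(u, 0) > 0 and remaining.get(v, 0) > 0:
            matching.append((u, v, w))
            remaining[u] -= 1
            remaining[v] -= 1
    return matching


# Hypothetical sample: edges are (node, node, weight) triples.
edges = [("a", "x", 5.0), ("a", "y", 4.0), ("b", "x", 3.0), ("b", "y", 1.0)]
capacity = {"a": 1, "b": 2, "x": 2, "y": 1}
result = greedy_b_matching(edges, capacity)
```

With these capacities the edge (a, y) is skipped because node a is already
saturated by the heavier edge (a, x).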

I can share it if you are interested.

Cheers,
--
Gianmarco



> It would be great to have examples of more advanced features: custom
> partitioning functions, aggregators, ...
>
> Personally, I'd like to see a side-by-side comparison of Google's Pregel as
> described in their paper and the Giraph implementation (I am particularly
> interested in where they diverge and why).
>
> Another question (or thing I am not so sure about) is about 'capacity
> planning' (sort of...). Given a dataset and an algorithm implemented in
> Giraph, how do you determine how many workers would be needed (in order to
> fit all your graph and messages for each superstep in RAM)?
>
> Last but not least, it seems to me that PageRank is what you use to
> 'benchmark' Giraph, is that the case? If so, sharing a common dataset
> for others to use would be a first step to allow people to compare the
> performance of different software running the very same algorithm, over the
> same data and the same hardware infrastructure.
>
> Paolo
>
> Sebastian Schelter wrote:
> > Hi,
> >
> > I will give a talk titled "Large Scale Graph Processing with Apache
> > Giraph" in Berlin on May 29th. Details are available at:
> >
> >
> > https://www.xing.com/events/gameduell-tech-talk-on-the-topic-large-scale-graph-processing-with-apache-giraph-1092275
> >
> > Best,
> > Sebastian
>
>


Re: [Announcement] Giraph talk in Berlin on May 29th

2012-05-14 Thread Paolo Castagna
u...@mahout.apache.org

Hi,
by the way, about talks/presentations, here are the Apache Giraph
talks/presentations I found:

“Giraph: Large-scale graph processing on Hadoop”, Avery Ching
Hadoop Summit 2011 - Santa Clara, California - June 2011
http://www.slideshare.net/averyching/20110628giraph-hadoop-summit
http://www.youtube.com/watch?v=l4nQjAG6fac

“Apache Giraph: Distributed Graph Processing in the Cloud”, Claudio Martella
FOSDEM 2012 - Brussels, Belgium - February 2012
http://prezi.com/9ake_klzwrga/apache-giraph-distributed-graph-processing-in-the-cloud/
http://blog.acaro.org/entry/giraph-talk-for-graphdevroom-fosdem-2012
http://www.youtube.com/watch?v=3ZrqPEIPRe4
http://www.youtube.com/watch?v=BmRaejKGeDM

“Introducing Apache Giraph for Large Scale Graph Processing”, Sebastian Schelter
Apache Hadoop Get Together - Berlin, Germany - April 2012
http://ssc.io/introducing-apache-giraph-for-large-scale-graph-processing/
http://www.slideshare.net/sscdotopen/introducing-apache-giraph-for-large-scale-graph-processing
http://vimeo.com/40737998

You could put the links on the Apache Giraph wiki.

First of all, thank you for sharing them. May I add a few comments and
suggestions for future presentations? (Please don't take this as criticism.)

It would be good to present users with a couple of non-trivial examples and one
or two 'real' use cases where Apache Giraph is used for processing large graphs.
Apache Giraph comes with two examples: all shortest paths from a single source
and PageRank. Google's Pregel paper describes 'bipartite matching' and
'semi-clustering'. Is anyone working on implementing these in Giraph?
Or, what if in the shortest paths example you actually want to know the path?
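
To make the last question concrete: one possible approach is for each vertex to
remember which neighbour sent the message that produced its current best
distance, and then to walk those predecessor pointers backwards. The sketch
below simulates supersteps in plain Python; it does not use the Giraph API, and
the function names, the adjacency-list format and the sample graph are
assumptions made for illustration. It assumes non-negative edge weights.

```python
import math


def sssp_with_paths(graph, source):
    """Superstep-style single-source shortest paths that also records, for
    every vertex, the neighbour its best distance came from."""
    dist = {v: math.inf for v in graph}
    pred = {v: None for v in graph}
    # Messages are (candidate_distance, sender) pairs addressed to a vertex.
    inbox = {source: [(0.0, None)]}
    while inbox:  # each loop iteration plays the role of one superstep
        outbox = {}
        for v, messages in inbox.items():
            best, sender = min(messages, key=lambda m: m[0])
            if best < dist[v]:
                dist[v], pred[v] = best, sender
                for target, weight in graph[v]:
                    outbox.setdefault(target, []).append((best + weight, v))
        inbox = outbox
    return dist, pred


def path_to(pred, source, target):
    """Walk predecessor pointers back from target to rebuild the path."""
    path = [target]
    while path[-1] != source:
        parent = pred[path[-1]]
        if parent is None:
            return None  # target is unreachable from source
        path.append(parent)
    return path[::-1]


# Hypothetical graph: vertex -> list of (neighbour, edge weight).
graph = {
    "s": [("a", 1.0), ("b", 4.0)],
    "a": [("b", 2.0), ("c", 6.0)],
    "b": [("c", 1.0)],
    "c": [],
}
dist, pred = sssp_with_paths(graph, "s")
route = path_to(pred, "s", "c")
```

The extra cost is one vertex id per message and per vertex value, which is why
the predecessor travels alongside the candidate distance.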

It would be great to have examples of more advanced features: custom
partitioning functions, aggregators, ...

Personally, I'd like to see a side-by-side comparison of Google's Pregel as
described in their paper and the Giraph implementation (I am particularly
interested in where they diverge and why).

Another question (or thing I am not so sure about) is about 'capacity planning'
(sort of...). Given a dataset and an algorithm implemented in Giraph, how do you
determine how many workers would be needed (in order to fit all your graph and
messages for each superstep in RAM)?
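
As a starting point for discussion, the kind of back-of-the-envelope estimate I
have in mind could look like the sketch below. Every number in it is a
placeholder: the per-vertex, per-edge and per-message byte sizes and the usable
heap fraction are assumptions, not measured Giraph figures, and the function
name is hypothetical.

```python
import math


def estimate_workers(num_vertices, num_edges, messages_per_superstep,
                     bytes_per_vertex, bytes_per_edge, bytes_per_message,
                     heap_bytes_per_worker, usable_fraction=0.5):
    """Rough lower bound on workers needed to keep the graph plus one
    superstep's worth of messages in memory (all sizes are assumptions)."""
    graph_bytes = num_vertices * bytes_per_vertex + num_edges * bytes_per_edge
    message_bytes = messages_per_superstep * bytes_per_message
    usable_bytes = heap_bytes_per_worker * usable_fraction
    return math.ceil((graph_bytes + message_bytes) / usable_bytes)


# Made-up scenario: a PageRank-like job sending one message per edge.
workers = estimate_workers(
    num_vertices=100_000_000,
    num_edges=2_000_000_000,
    messages_per_superstep=2_000_000_000,
    bytes_per_vertex=100,   # assumed in-memory size, not measured
    bytes_per_edge=20,      # assumed
    bytes_per_message=16,   # assumed
    heap_bytes_per_worker=8 * 2**30,
)
```

With these made-up inputs the estimate comes out at 20 workers. It ignores
partition skew and JVM object overhead, so real measurements would be needed to
calibrate the byte sizes.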

Last but not least, it seems to me that PageRank is what you use to 'benchmark'
Giraph, is that the case? If so, sharing a common dataset for others to use
would be a first step to allow people to compare the performance of different
software running the very same algorithm, over the same data and the same
hardware infrastructure.

Paolo

Sebastian Schelter wrote:
> Hi,
> 
> I will give a talk titled "Large Scale Graph Processing with Apache
> Giraph" in Berlin on May 29th. Details are available at:
> 
> https://www.xing.com/events/gameduell-tech-talk-on-the-topic-large-scale-graph-processing-with-apache-giraph-1092275
> 
> Best,
> Sebastian



Re: [Announcement] Giraph talk in Berlin on May 29th

2012-05-13 Thread Sebastian Schelter
Warming up your audience :)

On 12.05.2012 22:01, Jakob Homan wrote:
> Stealing my thunder? :)
> 
> On Sat, May 12, 2012 at 7:36 AM, Avery Ching wrote:
>> Nice!
>>
>> Avery
>>
>>
>> On 5/12/12 2:58 AM, Sebastian Schelter wrote:
>>>
>>> Hi,
>>>
>>> I will give a talk titled "Large Scale Graph Processing with Apache
>>> Giraph" in Berlin on May 29th. Details are available at:
>>>
>>>
>>> https://www.xing.com/events/gameduell-tech-talk-on-the-topic-large-scale-graph-processing-with-apache-giraph-1092275
>>>
>>> Best,
>>> Sebastian
>>
>>



Re: [Announcement] Giraph talk in Berlin on May 29th

2012-05-12 Thread Jakob Homan
Stealing my thunder? :)

On Sat, May 12, 2012 at 7:36 AM, Avery Ching wrote:
> Nice!
>
> Avery
>
>
> On 5/12/12 2:58 AM, Sebastian Schelter wrote:
>>
>> Hi,
>>
>> I will give a talk titled "Large Scale Graph Processing with Apache
>> Giraph" in Berlin on May 29th. Details are available at:
>>
>>
>> https://www.xing.com/events/gameduell-tech-talk-on-the-topic-large-scale-graph-processing-with-apache-giraph-1092275
>>
>> Best,
>> Sebastian
>
>


Re: [Announcement] Giraph talk in Berlin on May 29th

2012-05-12 Thread Avery Ching

Nice!

Avery

On 5/12/12 2:58 AM, Sebastian Schelter wrote:

> Hi,
>
> I will give a talk titled "Large Scale Graph Processing with Apache
> Giraph" in Berlin on May 29th. Details are available at:
>
> https://www.xing.com/events/gameduell-tech-talk-on-the-topic-large-scale-graph-processing-with-apache-giraph-1092275
>
> Best,
> Sebastian




[Announcement] Giraph talk in Berlin on May 29th

2012-05-12 Thread Sebastian Schelter
Hi,

I will give a talk titled "Large Scale Graph Processing with Apache
Giraph" in Berlin on May 29th. Details are available at:

https://www.xing.com/events/gameduell-tech-talk-on-the-topic-large-scale-graph-processing-with-apache-giraph-1092275

Best,
Sebastian