Re: Getting Started Code

2012-04-21 Thread Benjamin Heitmann

On 20 Apr 2012, at 17:27, Etienne Dumoulin wrote:

 Since it is a bit difficult to start from scratch the first time,
 I copied and pasted the getting started code.
 
 I do not understand why there are 3 template classes when
 TextVertexInputFormat has 4
 (according to the javadoc and Eclipse).
 
 For example:
 public static class SimpleShortestPathsVertexInputFormat extends
 TextVertexInputFormat<LongWritable, DoubleWritable, FloatWritable>
 
 Is there something I am missing?

Personally, I don't think that the code the getting started guide refers to
is the best place to start *understanding* Giraph.
But it's the best code to execute in order to check whether your Hadoop setup
actually works for Giraph.

A better example for understanding Giraph, IMHO, is the
ConnectedComponentsVertexTest class (in the test directory of the code).
It shows that you need to implement a Vertex class, a TextInputFormat class and a
TextOutputFormat class in order to define your own job.
Then you can use InternalVertexRunner.run() to test your code inside
of Giraph, and figure out Hadoop later.
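
To make the three pieces (input format, vertex logic, output format) more
concrete, here is a minimal plain-Java sketch of the same test-first idea. It
does NOT use the Giraph API: ToyComponentsRunner, parse(), run() and format()
are hypothetical names made up for illustration, and the input line layout
("vertexId neighbor neighbor ...") is an assumption. It only mimics what a
connected-components job does: every vertex repeatedly adopts the smallest
label seen among its neighbours until nothing changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy, not part of Giraph: runs everything in memory.
class ToyComponentsRunner {

    // Plays the role of the "input format": one line per vertex,
    // "vertexId neighbor neighbor ...". Edges are stored in both directions.
    static Map<Integer, List<Integer>> parse(String... lines) {
        Map<Integer, List<Integer>> graph = new TreeMap<>();
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            int id = Integer.parseInt(tokens[0]);
            graph.computeIfAbsent(id, k -> new ArrayList<>());
            for (int i = 1; i < tokens.length; i++) {
                int neighbor = Integer.parseInt(tokens[i]);
                graph.get(id).add(neighbor);
                graph.computeIfAbsent(neighbor, k -> new ArrayList<>()).add(id);
            }
        }
        return graph;
    }

    // Plays the role of the "vertex" logic: each loop iteration is one
    // superstep in which every vertex takes the minimum label of its
    // neighbours, computed from the labels of the previous superstep.
    static Map<Integer, Integer> run(Map<Integer, List<Integer>> graph) {
        Map<Integer, Integer> label = new TreeMap<>();
        for (Integer id : graph.keySet()) {
            label.put(id, id);
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            Map<Integer, Integer> next = new TreeMap<>(label);
            for (Map.Entry<Integer, List<Integer>> entry : graph.entrySet()) {
                int current = label.get(entry.getKey());
                int best = current;
                for (int neighbor : entry.getValue()) {
                    best = Math.min(best, label.get(neighbor));
                }
                if (best < current) {
                    next.put(entry.getKey(), best);
                    changed = true;
                }
            }
            label = next;
        }
        return label;
    }

    // Plays the role of the "output format": "vertexId<TAB>componentId".
    static List<String> format(Map<Integer, Integer> labels) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, Integer> entry : labels.entrySet()) {
            out.add(entry.getKey() + "\t" + entry.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : format(run(parse("1 2", "2 3", "4 5", "6")))) {
            System.out.println(line);
        }
    }
}
```

In a real Giraph test the same shape applies: a few text lines go in, result
lines come out, and you assert on them. See ConnectedComponentsVertexTest for
the actual classes and the actual InternalVertexRunner.run() call.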

I was puzzled by your "3 templates versus 4" problem, but then I tried it out
in Eclipse. If you are also using Eclipse, then I know what you mean ;)
If you mouse over e.g. SimpleShortestPathsVertex, Eclipse will display a
tool-tip. However, that shows just the top part of the javadoc, with 3 template
parameters; you need to scroll down
in order to see all 4 elements of the template signature. To do that, you can
either press F2 to see the scroll bar, or resize the window.
Or you can click on the class and then read the javadoc in its source.

However, keep in mind that some specific classes which are used in the
TextInputFormat and the TextOutputFormat have template signatures
which really only have 2 or 3 elements.
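
If you do not want to rely on the tool-tip, you can also count the template
arguments yourself. The sketch below uses hypothetical stand-in classes
(ToyVertexInputFormat, ToyVertexOutputFormat and their subclasses are NOT
Giraph classes, and Long/Double/Float stand in for the Writable types); it
only assumes the usual convention of I = vertex id, V = vertex value,
E = edge value, M = message value. The typeArgumentCount() helper is plain
Java reflection and should work the same way on a real subclass.

```java
import java.lang.reflect.ParameterizedType;

// Hypothetical stand-ins, not the real Giraph classes.
class TypeParameterSketch {

    // A format that needs all four types: id, vertex value, edge value, message.
    abstract static class ToyVertexInputFormat<I, V, E, M> { }

    // A format that only needs three: id, vertex value, edge value.
    abstract static class ToyVertexOutputFormat<I, V, E> { }

    static class ToyShortestPathsInputFormat
            extends ToyVertexInputFormat<Long, Double, Float, Double> { }

    static class ToyShortestPathsOutputFormat
            extends ToyVertexOutputFormat<Long, Double, Float> { }

    // Counts the type arguments a class passes to its generic superclass.
    // Assumes the direct superclass is generic; otherwise the cast fails.
    static int typeArgumentCount(Class<?> subclass) {
        ParameterizedType parent =
                (ParameterizedType) subclass.getGenericSuperclass();
        return parent.getActualTypeArguments().length;
    }

    public static void main(String[] args) {
        System.out.println("input format arguments:  "
                + typeArgumentCount(ToyShortestPathsInputFormat.class));
        System.out.println("output format arguments: "
                + typeArgumentCount(ToyShortestPathsOutputFormat.class));
    }
}
```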

As a second step, I would look at the SimpleShortestPathVertexTest class
(again in the test directory).

Then you can look at the PageRankBenchmark class in giraph.benchmark.


Let me/us know if you have more questions. 

Re: Slides for my talk at the Berlin Hadoop Get Together

2012-04-21 Thread Sebastian Schelter
I'm not sure what we'll do at the workshop, maybe Claudio could give a
presentation of his excellent prezi slides?

Best,
Sebastian


On 19.04.2012 22:07, Avery Ching wrote:
 Very nice!  Will these be similar to the 'Parallel Processing beyond
 MapReduce' workshop after Berlin Buzzwords?  It would be good to add at
 least one of them to the page.
 
 Avery
 
 On 4/19/12 12:31 PM, Sebastian Schelter wrote:
 Here are the slides of my talk "Introducing Apache Giraph for Large
 Scale Graph Processing" at the Berlin Hadoop Get Together yesterday:

 http://www.slideshare.net/sscdotopen/introducing-apache-giraph-for-large-scale-graph-processing


 I reused a lot of stuff from Claudio's excellent prezi presentation.

 Best,
 Sebastian
 



Re: Slides for my talk at the Berlin Hadoop Get Together

2012-04-21 Thread Sebastian Schelter
The benchmark is part of a paper which my colleagues have recently
submitted. Once it's published, I'll ask them to describe their
benchmark results on this list.




On 19.04.2012 22:20, Claudio Martella wrote:
 I like it, very clear.
 I think it would be nice to add the one with some comparison with
 Stratosphere, it would be a nice add-on. Do you have any update on the
 benchmark between Giraph and stratosphere you mentioned on twitter a
 while ago?
 
 Looking forward to meeting you guys in Berlin.
 