Re: Comparing BSP and MR

2011-12-09 Thread Praveen Sripati
Jake,

 Let's not crosspost, please, it make the thread of conversation totally
opaque as to who is talking about what.

Agree. I got it after the OP.

 There is only one set of map tasks for the Giraph job - those
long-running map tasks run possibly many supersteps.

OK. But, map tasks don't communicate with each other. How are messages sent
across in the communication phase of a super step that happens within a map?

That's what led me to think that Map tasks correspond to computing phase
and not the complete super step. Once all the Maps are complete, another
set of Maps are launched. If I am not wrong, Hadoop supports MMR type of
jobs.

 In Giraph, vertices can move around workers between supersteps.  A vertex
will run on the worker that it is assigned to.

Is there any advantage of moving the processing of vertices from one worker
to another. Can't there be affinity between a worker and the vertices it
processes?

Regards,
Praveen

On Fri, Dec 9, 2011 at 11:33 PM, Jake Mannix jake.man...@gmail.com wrote:

 [hama-user to bcc:]

 Let's not crosspost, please, it make the thread of conversation totally
 opaque as to who is talking about what.

 On Fri, Dec 9, 2011 at 1:42 AM, Praveen Sripati 
 praveensrip...@gmail.comwrote:

 Thanks to Thomas and Avery for the response.

  For Giraph you are quite correct, all the stuff is submitted as a MR
 job. But a full map stage is not a superstep, the whole computation is a
 done in one mapping phase.

 So a map task in MR corresponds to a computation phase in a superstep.
 Once the computation phase for a superstep is complete, the vertex output
 is stored using the defined OutputFormat, the message sent (may be) to
 another vertex and the map task is stopped. Once the barrier
 synchronization phase is complete, another set of map tasks are invoked for
 the vertices which have received a message.


 In Giraph, each superstep does not lead to storage into an OutputFormat.
  The data lives all in memory from the time the first superstep starts to
 the time the final superstep stops (except that for tolerance of failures,
 checkpoints are stored to disk at user-specified intervals).  There is only
 one set of map tasks for the Giraph job - those long-running map tasks run
 possibly many supersteps.


 In a regular MR Job (not Giraph) the number of Map tasks equals to the
 number of InputSplits. But, in case of Giraph the total number of maps to
 be launched is usually more than the number of input vertices.


 Number of maps  number of input vertices?  Not at all.  That would be
 insane.  We want to be able to run over multi-billion vertex graphs.  We're
 going to launch multiple billions of mappers?   The splitting of the data
 in Giraph is very similar to in a regular MR job, divide up your input data
 among the number of mappers you have, and you're off and running.



  Where are the incoming, outgoing messages and state stored
  Memory

 What happens if a particular node is lost in case of Hama and Giraph? Are
 the messages not persisted somewhere to be fetched later.


 If nodes are lost, the system has to back up to the most recent
 checkpoint, where graph state has been persisted to HDFS.  Messages are not
 currently persisted, but the state at which the graph was in to produce any
 messages was.


  In Giraph, vertices can move around workers between supersteps.  A
 vertex will run on the worker that it is assigned to.

 Is data locality considered while moving vertices around workers in
 Giraph?


 Data is all in memory, and typical graph algorithms are basically sending
 roughly the size of the entire graph (number of total edges) out over
 distributed RPC in any given superstep, so shuffling the graph around by
 RPC is not much more to do.



  As you can see, you could write a MapReduce Engine with BSP on top of
 Apache Hama.

 It's being the done other way, BSP is implemented in Giraph using Hadoop.


 I'll let the Hama people explain to you about how one would implement MR
 on top of Hama.  You are correct that in Giraph, the Hadoop
 JobTracker/TaskTracker and HDFS are used as substrate to help implement BSP
 (although I would not say that MR is being used to implement BSP, as
 there is no MR going on in Giraph).

   -jake




 Praveen

 On Fri, Dec 9, 2011 at 12:51 PM, Avery Ching ach...@apache.org wrote:

  Hi Praveen,

 Answers inline.  Hope that helps!

 Avery

 On 12/8/11 10:16 PM, Praveen Sripati wrote:

 Hi,

 I know about MapReduce/Hadoop and trying to get myself around
 BSP/Hama-Giraph by comparing MR and BSP.

 - Map Phase in MR is similar to Computation Phase in BSP. BSP allows for
 process to exchange data in the communication phase, but there is no
 communication between the mappers in the Map Phase. Though the data flows
 from Map tasks to Reducer tasks. Please correct me if I am wrong. Any other
 significant differences?

 I suppose you can think of it that way.  I like to compare a BSP
 superstep to a MapReduce job since it's 

Re: Comparing BSP and MR

2011-12-09 Thread Jake Mannix
On Fri, Dec 9, 2011 at 8:16 PM, Praveen Sripati praveensrip...@gmail.comwrote:

 Jake,


  Let's not crosspost, please, it make the thread of conversation totally
 opaque as to who is talking about what.

 Agree. I got it after the OP.


  There is only one set of map tasks for the Giraph job - those
 long-running map tasks run possibly many supersteps.

 OK. But, map tasks don't communicate with each other. How are messages
 sent across in the communication phase of a super step that happens within
 a map?


Giraph maps do communicate: via RPC.  This is done repeatedly in every
mapper, during the compute phase.  This is something that is not normal to
MapReduce, it is special to Giraph.


  In Giraph, vertices can move around workers between supersteps.  A
 vertex will run on the worker that it is assigned to.

 Is there any advantage of moving the processing of vertices from one
 worker to another. Can't there be affinity between a worker and the
 vertices it processes?


Often there will be affinity, but if the graph itself evolves during
computation (some sort of iterative pruning or clustering), then moving
around may make sense.  Also: if nodes die.

  -jake


 Regards,
 Praveen

 On Fri, Dec 9, 2011 at 11:33 PM, Jake Mannix jake.man...@gmail.comwrote:

 [hama-user to bcc:]

 Let's not crosspost, please, it make the thread of conversation totally
 opaque as to who is talking about what.

 On Fri, Dec 9, 2011 at 1:42 AM, Praveen Sripati praveensrip...@gmail.com
  wrote:

 Thanks to Thomas and Avery for the response.

  For Giraph you are quite correct, all the stuff is submitted as a MR
 job. But a full map stage is not a superstep, the whole computation is a
 done in one mapping phase.

 So a map task in MR corresponds to a computation phase in a superstep.
 Once the computation phase for a superstep is complete, the vertex output
 is stored using the defined OutputFormat, the message sent (may be) to
 another vertex and the map task is stopped. Once the barrier
 synchronization phase is complete, another set of map tasks are invoked for
 the vertices which have received a message.


 In Giraph, each superstep does not lead to storage into an OutputFormat.
  The data lives all in memory from the time the first superstep starts to
 the time the final superstep stops (except that for tolerance of failures,
 checkpoints are stored to disk at user-specified intervals).  There is only
 one set of map tasks for the Giraph job - those long-running map tasks run
 possibly many supersteps.


 In a regular MR Job (not Giraph) the number of Map tasks equals to the
 number of InputSplits. But, in case of Giraph the total number of maps to
 be launched is usually more than the number of input vertices.


 Number of maps  number of input vertices?  Not at all.  That would be
 insane.  We want to be able to run over multi-billion vertex graphs.  We're
 going to launch multiple billions of mappers?   The splitting of the data
 in Giraph is very similar to in a regular MR job, divide up your input data
 among the number of mappers you have, and you're off and running.



  Where are the incoming, outgoing messages and state stored
  Memory

 What happens if a particular node is lost in case of Hama and Giraph?
 Are the messages not persisted somewhere to be fetched later.


 If nodes are lost, the system has to back up to the most recent
 checkpoint, where graph state has been persisted to HDFS.  Messages are not
 currently persisted, but the state at which the graph was in to produce any
 messages was.


  In Giraph, vertices can move around workers between supersteps.  A
 vertex will run on the worker that it is assigned to.

 Is data locality considered while moving vertices around workers in
 Giraph?


 Data is all in memory, and typical graph algorithms are basically sending
 roughly the size of the entire graph (number of total edges) out over
 distributed RPC in any given superstep, so shuffling the graph around by
 RPC is not much more to do.



  As you can see, you could write a MapReduce Engine with BSP on top of
 Apache Hama.

 It's being the done other way, BSP is implemented in Giraph using Hadoop.


 I'll let the Hama people explain to you about how one would implement MR
 on top of Hama.  You are correct that in Giraph, the Hadoop
 JobTracker/TaskTracker and HDFS are used as substrate to help implement BSP
 (although I would not say that MR is being used to implement BSP, as
 there is no MR going on in Giraph).

   -jake




 Praveen

 On Fri, Dec 9, 2011 at 12:51 PM, Avery Ching ach...@apache.org wrote:

  Hi Praveen,

 Answers inline.  Hope that helps!

 Avery

 On 12/8/11 10:16 PM, Praveen Sripati wrote:

 Hi,

 I know about MapReduce/Hadoop and trying to get myself around
 BSP/Hama-Giraph by comparing MR and BSP.

 - Map Phase in MR is similar to Computation Phase in BSP. BSP allows
 for process to exchange data in the communication phase, but there is no
 communication between the mappers in the