Re: Message processing

2011-09-09 Thread Avery Ching

Jake,

The model you're describing sounds a bit like a hybrid between BSP and 
asynchronous, stream-based computing.  Definitely would be great to 
experiment with either of these a bit.  I too would prefer to eliminate 
any complicated locking models (i.e. a distributed lock manager for all 
vertices).  But it might come to that if we decide that asynchronous 
remote vertex mutation is important.  I think an asynchronous model 
could provide performance benefits over BSP in some cases.  But 
debugging would be more difficult (less deterministic).  So I believe 
both models will be useful for Giraph.


Avery

On 9/9/11 8:26 AM, Jake Mannix wrote:


On Fri, Sep 9, 2011 at 8:03 AM, Avery Ching wrote:


The GraphLab model is more asynchronous than BSP.  They allow you
to update your neighbors rather than the BSP model of messaging
per superstep.  Rather than one massive barrier in BSP, they
implement this with vertex locking.  They also allow a vertex to
modify the state of its neighbors.  We could certainly add
something similar as an alternative computing model, perhaps
without locking.  Here's one idea:

1) No explicit supersteps (asynchronous)


This sounds interesting, esp. for streaming algorithms, although I was 
thinking something slightly less ambitious to start out: still have 
supersteps (effectively) which describe when each vertex has had a 
chance to send all messages it wants for this iteration, and has 
processed all inbound messages.


2) All vertices execute compute() (and may or may not send
messages) initially
3) Vertices can examine their neighbors or any vertex in the graph
(issue RPCs to get their state)


"or any vertex in the graph" sounds pretty scary, but yes, powerful. 
 I like it when my seemingly radical ideas get made to look not so scary 
by comparison! :)


4) When messages are received by a vertex, compute() is executed
on it (and state is locally locked to compute only)
5) Vertices still vote to halt when done, indicating the end of
the application.
6) Combiners can still be used to reduce the number of messages
sent (and the number of times compute is executed).

This could be fun.  And provide an interesting comparison platform:
barrier-based vs. vertex-based synchronization.
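The six points above can be sketched in Java; this is purely illustrative (every class, method, update rule, and halt condition here is invented, not Giraph's or GraphLab's actual API):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of the proposed model: compute() runs whenever
// messages arrive (item 4), with state locked only for the duration of
// the call, and the vertex votes to halt when done (item 5). No global
// superstep barrier exists in this model.
public class AsyncVertexSketch {
    private final Queue<Double> inbox = new ConcurrentLinkedQueue<>();
    private double value = 0.0;
    private boolean halted = false;

    // Called by the RPC layer when a remote vertex sends a message.
    public void deliver(double msg) {
        inbox.add(msg);
        compute();                 // message arrival triggers compute (item 4)
    }

    // State is locked locally only while computing (item 4).
    public synchronized void compute() {
        Double msg;
        while ((msg = inbox.poll()) != null) {
            value += msg;          // placeholder update rule
        }
        if (value > 10.0) {
            halted = true;         // placeholder vote-to-halt condition (item 5)
        }
    }

    public double getValue() { return value; }
    public boolean isHalted() { return halted; }

    public static void main(String[] args) {
        AsyncVertexSketch v = new AsyncVertexSketch();
        v.deliver(4.0);
        v.deliver(7.0);
        if (v.getValue() != 11.0 || !v.isHalted()) {
            throw new AssertionError("unexpected vertex state");
        }
    }
}
```

A combiner (item 6) would slot in on the `deliver` path, folding queued messages together before `compute()` sees them.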


Yeah, I think locking is an implementation detail which might even be 
avoidable: if Vertices are effectively given a messageQueue which they 
can process from, we could interpolate between buffering and 
processing messages synchronously.  The per-mapper threading model 
could get... interesting!
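A toy sketch of that interpolation idea (all names hypothetical): a per-vertex queue with a flush threshold, where a threshold of 1 means process-on-arrival and an effectively infinite threshold recovers BSP-style buffering until the barrier:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: a per-vertex message queue whose flush threshold
// interpolates between pure buffering (flush only at the barrier) and
// fully synchronous processing (flush on every arrival).
public class MessageQueueSketch {
    private final Deque<String> queue = new ArrayDeque<>();
    private final int flushThreshold;
    private int processed = 0;

    public MessageQueueSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    public void enqueue(String msg) {
        queue.add(msg);
        if (queue.size() >= flushThreshold) {
            drain();               // process eagerly once the threshold is hit
        }
    }

    // In pure BSP this would only be called at the superstep barrier.
    public void drain() {
        while (!queue.isEmpty()) {
            queue.poll();          // a real vertex would run its update here
            processed++;
        }
    }

    public int processedCount() { return processed; }

    public static void main(String[] args) {
        MessageQueueSketch eager = new MessageQueueSketch(1);
        eager.enqueue("a");                      // processed on arrival
        MessageQueueSketch buffered = new MessageQueueSketch(Integer.MAX_VALUE);
        buffered.enqueue("b");
        buffered.enqueue("c");                   // still sitting in the queue
        int before = buffered.processedCount();  // 0: pure-BSP behavior
        buffered.drain();                        // the "barrier"
        if (eager.processedCount() != 1 || before != 0
                || buffered.processedCount() != 2) {
            throw new AssertionError("unexpected processing counts");
        }
    }
}
```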

  -jake




Re: Message processing

2011-09-09 Thread Avery Ching
Here's the reference to GraphLab. http://graphlab.org/

On Fri, Sep 9, 2011 at 8:20 AM, Claudio Martella  wrote:

> sounds like a DataFlow paradigm. Could you please provide a reference to
> this "they"? :)
>
>
> On Fri, Sep 9, 2011 at 5:03 PM, Avery Ching  wrote:
>
>> The GraphLab model is more asynchronous than BSP.  They allow you to update
>> your neighbors rather than the BSP model of messaging per superstep.  Rather
>> than one massive barrier in BSP, they implement this with vertex locking.
>>  They also allow a vertex to modify the state of its neighbors.  We could
>> certainly add something similar as an alternative computing model, perhaps
>> without locking.  Here's one idea:
>>
>> 1) No explicit supersteps (asynchronous)
>> 2) All vertices execute compute() (and may or may not send messages)
>> initially
>> 3) Vertices can examine their neighbors or any vertex in the graph (issue
>> RPCs to get their state)
>> 4) When messages are received by a vertex, compute() is executed on it
>> (and state is locally locked to compute only)
>> 5) Vertices still vote to halt when done, indicating the end of the
>> application.
>> 6) Combiners can still be used to reduce the number of messages sent (and
>> the number of times compute is executed).
>>
>> This could be fun.  And provide an interesting comparison platform:
>> barrier-based vs. vertex-based synchronization.
>>
>> On Fri, Sep 9, 2011 at 6:36 AM, Jake Mannix wrote:
>>
>>>
>>>
>>> On Fri, Sep 9, 2011 at 3:22 AM, Claudio Martella <
>>> claudio.marte...@gmail.com> wrote:
>>>
 One misunderstanding on my side. Isn't it true that the messages have to be
 buffered as they all have to be collected before they can be processed (by
 definition of superstep)? So you cannot really process them as they come?
>>>
>>>
>>> This is the current implementation, yes, but I'm trying to see if an
>>> alternative is also possible in this framework, for Vertex implementations
>>> which are able to handle asynchronous updates.  In this model, a vertex
>>> would be required to be able to handle multiple calls to compute() in a
>>> single "superstep", and would instead signal that its superstep
>>> computations are done at some (application specific) point.
>>>
>>> I know this goes outside of the concept of a "BSP" model, but I didn't
>>> want to get into too many details before I figure out how possible it was to
>>> implement some of this.
>>>
>>>-jake
>>>
>>
>>
>
>
> --
> Claudio Martella
> claudio.marte...@gmail.com
>


Re: Message processing

2011-09-09 Thread Jake Mannix
On Fri, Sep 9, 2011 at 8:03 AM, Avery Ching  wrote:

> The GraphLab model is more asynchronous than BSP.  They allow you to update
> your neighbors rather than the BSP model of messaging per superstep.  Rather
> than one massive barrier in BSP, they implement this with vertex locking.
>  They also allow a vertex to modify the state of its neighbors.  We could
> certainly add something similar as an alternative computing model, perhaps
> without locking.  Here's one idea:
>
> 1) No explicit supersteps (asynchronous)
>

This sounds interesting, esp. for streaming algorithms, although I was
thinking something slightly less ambitious to start out: still have
supersteps (effectively) which describe when each vertex has had a chance to
send all messages it wants for this iteration, and has processed all inbound
messages.


> 2) All vertices execute compute() (and may or may not send messages)
> initially
> 3) Vertices can examine their neighbors or any vertex in the graph (issue
> RPCs to get their state)
>

"or any vertex in the graph" sounds pretty scary, but yes, powerful.  I like
it when my seemingly radical ideas get made to look not so scary by comparison!
:)


> 4) When messages are received by a vertex, compute() is executed on it (and
> state is locally locked to compute only)
> 5) Vertices still vote to halt when done, indicating the end of the
> application.
> 6) Combiners can still be used to reduce the number of messages sent (and
> the number of times compute is executed).
>
> This could be fun.  And provide an interesting comparison platform:
> barrier-based vs. vertex-based synchronization.
>

Yeah, I think locking is an implementation detail which might even be
avoidable: if Vertices are effectively given a messageQueue which they can
process from, we could interpolate between buffering and processing messages
synchronously.  The per-mapper threading model could get... interesting!

  -jake


Re: Message processing

2011-09-09 Thread Claudio Martella
sounds like a DataFlow paradigm. Could you please provide a reference to
this "they"? :)

On Fri, Sep 9, 2011 at 5:03 PM, Avery Ching  wrote:

> The GraphLab model is more asynchronous than BSP.  They allow you to update
> your neighbors rather than the BSP model of messaging per superstep.  Rather
> than one massive barrier in BSP, they implement this with vertex locking.
>  They also allow a vertex to modify the state of its neighbors.  We could
> certainly add something similar as an alternative computing model, perhaps
> without locking.  Here's one idea:
>
> 1) No explicit supersteps (asynchronous)
> 2) All vertices execute compute() (and may or may not send messages)
> initially
> 3) Vertices can examine their neighbors or any vertex in the graph (issue
> RPCs to get their state)
> 4) When messages are received by a vertex, compute() is executed on it (and
> state is locally locked to compute only)
> 5) Vertices still vote to halt when done, indicating the end of the
> application.
> 6) Combiners can still be used to reduce the number of messages sent (and
> the number of times compute is executed).
>
> This could be fun.  And provide an interesting comparison platform:
> barrier-based vs. vertex-based synchronization.
>
> On Fri, Sep 9, 2011 at 6:36 AM, Jake Mannix  wrote:
>
>>
>>
>> On Fri, Sep 9, 2011 at 3:22 AM, Claudio Martella <
>> claudio.marte...@gmail.com> wrote:
>>
>>> One misunderstanding on my side. Isn't it true that the messages have to be
>>> buffered as they all have to be collected before they can be processed (by
>>> definition of superstep)? So you cannot really process them as they come?
>>
>>
>> This is the current implementation, yes, but I'm trying to see if an
>> alternative is also possible in this framework, for Vertex implementations
>> which are able to handle asynchronous updates.  In this model, a vertex
>> would be required to be able to handle multiple calls to compute() in a
>> single "superstep", and would instead signal that its superstep
>> computations are done at some (application specific) point.
>>
>> I know this goes outside of the concept of a "BSP" model, but I didn't
>> want to get into too many details before I figure out how possible it was to
>> implement some of this.
>>
>>-jake
>>
>
>


-- 
Claudio Martella
claudio.marte...@gmail.com


Re: Message processing

2011-09-09 Thread Avery Ching
The GraphLab model is more asynchronous than BSP.  They allow you to update
your neighbors rather than the BSP model of messaging per superstep.  Rather
than one massive barrier in BSP, they implement this with vertex locking.
 They also allow a vertex to modify the state of its neighbors.  We could
certainly add something similar as an alternative computing model, perhaps
without locking.  Here's one idea:

1) No explicit supersteps (asynchronous)
2) All vertices execute compute() (and may or may not send messages)
initially
3) Vertices can examine their neighbors or any vertex in the graph (issue
RPCs to get their state)
4) When messages are received by a vertex, compute() is executed on it (and
state is locally locked to compute only)
5) Vertices still vote to halt when done, indicating the end of the
application.
6) Combiners can still be used to reduce the number of messages sent (and
the number of times compute is executed).

This could be fun.  And provide an interesting comparison platform:
barrier-based vs. vertex-based synchronization.

On Fri, Sep 9, 2011 at 6:36 AM, Jake Mannix  wrote:

>
>
> On Fri, Sep 9, 2011 at 3:22 AM, Claudio Martella <
> claudio.marte...@gmail.com> wrote:
>
>> One misunderstanding on my side. Isn't it true that the messages have to be
>> buffered as they all have to be collected before they can be processed (by
>> definition of superstep)? So you cannot really process them as they come?
>
>
> This is the current implementation, yes, but I'm trying to see if an
> alternative is also possible in this framework, for Vertex implementations
> which are able to handle asynchronous updates.  In this model, a vertex
> would be required to be able to handle multiple calls to compute() in a
> single "superstep", and would instead signal that its superstep
> computations are done at some (application specific) point.
>
> I know this goes outside of the concept of a "BSP" model, but I didn't want
> to get into too many details before I figure out how possible it was to
> implement some of this.
>
>-jake
>


Re: Message processing

2011-09-09 Thread Jake Mannix
On Fri, Sep 9, 2011 at 3:22 AM, Claudio Martella  wrote:

> One misunderstanding on my side. Isn't it true that the messages have to be
> buffered as they all have to be collected before they can be processed (by
> definition of superstep)? So you cannot really process them as they come?


This is the current implementation, yes, but I'm trying to see if an
alternative is also possible in this framework, for Vertex implementations
which are able to handle asynchronous updates.  In this model, a vertex
would be required to be able to handle multiple calls to compute() in a
single "superstep", and would instead signal that its superstep
computations are done at some (application specific) point.

I know this goes outside of the concept of a "BSP" model, but I didn't want
to get into too many details before I figure out how possible it was to
implement some of this.

  -jake
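A minimal sketch of such a vertex (hypothetical names; the completion condition is an arbitrary placeholder), tolerating repeated compute() calls within one superstep and signalling application-specifically when its step is done:

```java
// Hypothetical sketch: a vertex that handles multiple compute() calls per
// "superstep", once per batch of newly arrived messages, and signals via
// an application-specific condition when its work for the step is done.
public class MultiComputeVertexSketch {
    private long sum = 0;
    private int calls = 0;
    private boolean stepDone = false;

    // May be invoked several times within one superstep.
    public void compute(long[] batch) {
        calls++;
        for (long m : batch) {
            sum += m;              // placeholder update rule
        }
        // Placeholder completion signal: "done" once the running sum
        // crosses a threshold. A real application would use its own test.
        if (sum >= 100) {
            stepDone = true;
        }
    }

    public boolean isStepDone() { return stepDone; }
    public int computeCalls() { return calls; }

    public static void main(String[] args) {
        MultiComputeVertexSketch v = new MultiComputeVertexSketch();
        v.compute(new long[]{40, 30});  // first batch: sum = 70, not done yet
        v.compute(new long[]{50});      // second batch, same superstep: sum = 120
        if (!v.isStepDone() || v.computeCalls() != 2) {
            throw new AssertionError("unexpected superstep state");
        }
    }
}
```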


Re: Message processing

2011-09-09 Thread Claudio Martella
One misunderstanding on my side. Isn't it true that the messages have to be
buffered as they all have to be collected before they can be processed (by
definition of superstep)? So you cannot really process them as they come?

On Thu, Sep 8, 2011 at 11:33 PM, Avery Ching  wrote:

>  On 9/8/11 2:19 PM, Jake Mannix wrote:
>
>
>
> On Thu, Sep 8, 2011 at 2:01 PM, Avery Ching  wrote:
>
>>  Yes, the messages are delivered to inMessages (map of id -> list of
>> messages).  Then they are transferred to the vertices just before the
>> computation on a worker.
>>
>
>  So just to make sure I understand correctly (I think I do, but reasoning
> about distributed async operations makes my head hurt a bit), I'm going to
> sound dumb and repeat this back to you: they show up in an asynchronous
> manner into BasicRPCCommunications, where they get buffered up into the
> inMessages map (one list for each vertex on the local node), where the
> GraphMapper then loops through and gives them to each Vertex before they do
> their compute() calls?
>
>
> You have got it =).  Although to clarify, all the messages are attached to
> the Vertex in the commService.prepareSuperstep() in GraphMapper.  I'm
> starting to wonder if we should have this conversation on
> giraph-...@incubator.apache.org =).  Never tried that list yet though...
>
>
>
>On 9/8/11 1:55 PM, Jake Mannix wrote:
>>
>> So in particular, each GraphMapper has a BasicRPCCommunications object
>> (and a CommunicationsInterface proxy object for each of the other nodes),
>> and when other nodes call putMsg(vertexId, message) on their respective
>> proxies, it shows up in the target vertex's BasicRPCCommunications putMsg()
>> call, is that right?
>>
>> On Thu, Sep 8, 2011 at 1:38 PM, Avery Ching  wrote:
>>
>>> If I'm understanding your question correctly, the iterator is created in
>>> GraphMapper, just before compute() is called.  Messages are transferred to
>>> the vertices prior to the compute portion of the superstep.
>>>
>>> Avery
>>>
>>>
>>> On 9/8/11 12:59 PM, Jake Mannix wrote:
>>>
 Question about the message passing API:

  BaseVertex#sendMsg(I id, M msg)

 sends messages.  And

 BaseVertex#compute(Iterator<M> msgIterator)

 deals with them in a big iterator provided by the framework.  My
 question is: where is the Iterator of messages created?  If I wanted to, say,
 subvert the processing to avoid buffering them up into one big List, and
 instead handle them as they come in, where would I look in the code to see
 where it's buffered up?

  -jake



>>>
>>
>>
>
>


-- 
Claudio Martella
claudio.marte...@gmail.com


Re: Message processing

2011-09-08 Thread Dmitriy Ryaboy
At this point, -dev and -user is probably pretty much the same set of people :)

On Thu, Sep 8, 2011 at 2:33 PM, Avery Ching  wrote:
> On 9/8/11 2:19 PM, Jake Mannix wrote:
>
> On Thu, Sep 8, 2011 at 2:01 PM, Avery Ching  wrote:
>>
>> Yes, the messages are delivered to inMessages (map of id -> list of
>> messages).  Then they are transferred to the vertices just before the
>> computation on a worker.
>
> So just to make sure I understand correctly (I think I do, but reasoning
> about distributed async operations makes my head hurt a bit), I'm going to
> sound dumb and repeat this back to you: they show up in an asynchronous
> manner into BasicRPCCommunications, where they get buffered up into the
> inMessages map (one list for each vertex on the local node), where the
> GraphMapper then loops through and gives them to each Vertex before they do
> their compute() calls?
>
>
> You have got it =).  Although to clarify, all the messages are attached to
> the Vertex in the commService.prepareSuperstep() in GraphMapper.  I'm
> starting to wonder if we should have this conversation on
> giraph-...@incubator.apache.org =).  Never tried that list yet though...
>
>
>> On 9/8/11 1:55 PM, Jake Mannix wrote:
>>
>> So in particular, each GraphMapper has a BasicRPCCommunications object
>> (and a CommunicationsInterface proxy object for each of the other nodes),
>> and when other nodes call putMsg(vertexId, message) on their respective
>> proxies, it shows up in the target vertex's BasicRPCCommunications putMsg()
>> call, is that right?
>>
>> On Thu, Sep 8, 2011 at 1:38 PM, Avery Ching  wrote:
>>>
>>> If I'm understanding your question correctly, the iterator is created in
>>> GraphMapper, just before compute() is called.  Messages are transferred to
>>> the vertices prior to the compute portion of the superstep.
>>>
>>> Avery
>>>
>>> On 9/8/11 12:59 PM, Jake Mannix wrote:

 Question about the message passing API:

  BaseVertex#sendMsg(I id, M msg)

 sends messages.  And

  BaseVertex#compute(Iterator<M> msgIterator)

 deals with them in a big iterator provided by the framework.  My
 question is: where is the Iterator of messages created?  If I wanted to, say,
 subvert the processing to avoid buffering them up into one big List, and
 instead handle them as they come in, where would I look in the code to see
 where it's buffered up?

  -jake


>>>
>>
>>
>
>
>



-- 
Dmitriy V Ryaboy
Twitter Analytics
http://twitter.com/squarecog


Re: Message processing

2011-09-08 Thread Avery Ching

On 9/8/11 2:19 PM, Jake Mannix wrote:



On Thu, Sep 8, 2011 at 2:01 PM, Avery Ching wrote:


Yes, the messages are delivered to inMessages (map of id -> list
of messages).  Then they are transferred to the vertices just
before the computation on a worker.


So just to make sure I understand correctly (I think I do, but 
reasoning about distributed async operations makes my head hurt a 
bit), I'm going to sound dumb and repeat this back to you: they show 
up in an asynchronous manner into BasicRPCCommunications, where they 
get buffered up into the inMessages map (one list for each vertex on 
the local node), where the GraphMapper then loops through and gives 
them to each Vertex before they do their compute() calls?
You have got it =).  Although to clarify, all the messages are attached 
to the Vertex in the commService.prepareSuperstep() in GraphMapper.  I'm 
starting to wonder if we should have this conversation on 
giraph-...@incubator.apache.org =).  Never tried that list yet though...




On 9/8/11 1:55 PM, Jake Mannix wrote:

So in particular, each GraphMapper has a BasicRPCCommunications
object (and a CommunicationsInterface proxy object for each of
the other nodes), and when other nodes call putMsg(vertexId,
message) on their respective proxies, it shows up in the target
vertex's BasicRPCCommunications putMsg() call, is that right?

On Thu, Sep 8, 2011 at 1:38 PM, Avery Ching wrote:

If I'm understanding your question correctly, the iterator is
created in GraphMapper, just before compute() is called.
 Messages are transferred to the vertices prior to the compute
portion of the superstep.

Avery


On 9/8/11 12:59 PM, Jake Mannix wrote:

Question about the message passing API:

 BaseVertex#sendMsg(I id, M msg)

sends messages.  And

 BaseVertex#compute(Iterator<M> msgIterator)

deals with them in a big iterator provided by the
framework.  My question is: where is the Iterator of messages
created?  If I wanted to, say, subvert the processing
to avoid buffering them up into one big List, and instead
handle them as they come in, where would I look in the
code to see where it's buffered up?

 -jake











Re: Message processing

2011-09-08 Thread Jake Mannix
On Thu, Sep 8, 2011 at 2:01 PM, Avery Ching  wrote:

>  Yes, the messages are delivered to inMessages (map of id -> list of
> messages).  Then they are transferred to the vertices just before the
> computation on a worker.
>

So just to make sure I understand correctly (I think I do, but reasoning
about distributed async operations makes my head hurt a bit), I'm going to
sound dumb and repeat this back to you: they show up in an asynchronous
manner into BasicRPCCommunications, where they get buffered up into the
inMessages map (one list for each vertex on the local node), where the
GraphMapper then loops through and gives them to each Vertex before they do
their compute() calls?
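The flow described above can be sketched roughly like this (a simplification with invented names; it only mirrors the putMsg / prepareSuperstep roles discussed in the thread, not the real BasicRPCCommunications code):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the message flow: the RPC layer buffers incoming
// messages into an inMessages map (vertex id -> list of messages), and the
// mapper hands each buffered list to its vertex just before compute().
public class InMessagesSketch {
    private final Map<Long, List<String>> inMessages = new HashMap<>();

    // Asynchronous arrival path (the putMsg role on the RPC server side).
    public void putMsg(long vertexId, String msg) {
        inMessages.computeIfAbsent(vertexId, id -> new ArrayList<>()).add(msg);
    }

    // The prepareSuperstep role: hand over the buffered list and clear it.
    public List<String> takeMessagesFor(long vertexId) {
        List<String> msgs = inMessages.remove(vertexId);
        return msgs == null ? new ArrayList<>() : msgs;
    }

    public static void main(String[] args) {
        InMessagesSketch comm = new InMessagesSketch();
        comm.putMsg(1L, "m1");
        comm.putMsg(1L, "m2");
        comm.putMsg(2L, "m3");
        if (comm.takeMessagesFor(1L).size() != 2) {
            throw new AssertionError("vertex 1 should have 2 buffered messages");
        }
        if (!comm.takeMessagesFor(1L).isEmpty()) {
            throw new AssertionError("buffer should be empty after handoff");
        }
    }
}
```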


On 9/8/11 1:55 PM, Jake Mannix wrote:
>
> So in particular, each GraphMapper has a BasicRPCCommunications object (and
> a CommunicationsInterface proxy object for each of the other nodes), and
> when other nodes call putMsg(vertexId, message) on their respective proxies,
> it shows up in the target vertex's BasicRPCCommunications putMsg() call, is that
> right?
>
> On Thu, Sep 8, 2011 at 1:38 PM, Avery Ching  wrote:
>
>> If I'm understanding your question correctly, the iterator is created in
>> GraphMapper, just before compute() is called.  Messages are transferred to
>> the vertices prior to the compute portion of the superstep.
>>
>> Avery
>>
>>
>> On 9/8/11 12:59 PM, Jake Mannix wrote:
>>
>>> Question about the message passing API:
>>>
>>>  BaseVertex#sendMsg(I id, M msg)
>>>
>>> sends messages.  And
>>>
>>>  BaseVertex#compute(Iterator<M> msgIterator)
>>>
>>> deals with them in a big iterator provided by the framework.  My question
>>> is: where is the Iterator of messages created?  If I wanted to, say, subvert
>>> the processing to avoid buffering them up into one big List, and instead
>>> handle them as they come in, where would I look in the code to see where
>>> it's buffered up?
>>>
>>>  -jake
>>>
>>>
>>>
>>
>
>


Re: Message processing

2011-09-08 Thread Avery Ching
Yes, the messages are delivered to inMessages (map of id -> list of 
messages).  Then they are transferred to the vertices just before the 
computation on a worker.


On 9/8/11 1:55 PM, Jake Mannix wrote:
So in particular, each GraphMapper has a BasicRPCCommunications object 
(and a CommunicationsInterface proxy object for each of the other 
nodes), and when other nodes call putMsg(vertexId, message) on their 
respective proxies, it shows up in the target vertex's 
BasicRPCCommunications putMsg() call, is that right?


On Thu, Sep 8, 2011 at 1:38 PM, Avery Ching wrote:


If I'm understanding your question correctly, the iterator is
created in GraphMapper, just before compute() is called.  Messages
are transferred to the vertices prior to the compute portion of the superstep.

Avery


On 9/8/11 12:59 PM, Jake Mannix wrote:

Question about the message passing API:

 BaseVertex#sendMsg(I id, M msg)

sends messages.  And

 BaseVertex#compute(Iterator<M> msgIterator)

deals with them in a big iterator provided by the framework.
 My question is: where is the Iterator of messages created?  If
I wanted to, say, subvert the processing to avoid buffering
them up into one big List, and instead handle them as they
come in, where would I look in the code to see where it's
buffered up?

 -jake








Re: Message processing

2011-09-08 Thread Jake Mannix
So in particular, each GraphMapper has a BasicRPCCommunications object (and
a CommunicationsInterface proxy object for each of the other nodes), and
when other nodes call putMsg(vertexId, message) on their respective proxies,
it shows up in the target vertex's BasicRPCCommunications putMsg() call, is that
right?

On Thu, Sep 8, 2011 at 1:38 PM, Avery Ching  wrote:

> If I'm understanding your question correctly, the iterator is created in
> GraphMapper, just before compute() is called.  Messages are transferred to
> the vertices prior to the compute portion of the superstep.
>
> Avery
>
>
> On 9/8/11 12:59 PM, Jake Mannix wrote:
>
>> Question about the message passing API:
>>
>>  BaseVertex#sendMsg(I id, M msg)
>>
>> sends messages.  And
>>
>>  BaseVertex#compute(Iterator<M> msgIterator)
>>
>> deals with them in a big iterator provided by the framework.  My question
>> is: where is the Iterator of messages created?  If I wanted to, say, subvert
>> the processing to avoid buffering them up into one big List, and instead
>> handle them as they come in, where would I look in the code to see where
>> it's buffered up?
>>
>>  -jake
>>
>>
>>
>


Re: Message processing

2011-09-08 Thread Avery Ching
If I'm understanding your question correctly, the iterator is created in 
GraphMapper, just before compute() is called.  Messages are transferred 
to the vertices prior to the compute portion of the superstep.


Avery

On 9/8/11 12:59 PM, Jake Mannix wrote:

Question about the message passing API:

  BaseVertex#sendMsg(I id, M msg)

sends messages.  And

  BaseVertex#compute(Iterator<M> msgIterator)

deals with them in a big iterator provided by the framework.  My 
question is: where is the Iterator of messages created?  If I wanted to, 
say, subvert the processing to avoid buffering them up into one big 
List, and instead handle them as they come in, where would I look in 
the code to see where it's buffered up?


  -jake
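As a rough sketch of the answer (hypothetical code, not Giraph's actual GraphMapper): the worker's superstep loop is where each vertex's buffered message list becomes the Iterator handed to compute():

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the worker's superstep loop: the Iterator passed
// to compute() is created here, over the list that was buffered while
// messages were arriving, just before each vertex runs.
public class SuperstepLoopSketch {
    public interface Vertex {
        void compute(Iterator<String> msgIterator);
    }

    public static void runSuperstep(Map<Long, Vertex> vertices,
                                    Map<Long, List<String>> inMessages) {
        for (Map.Entry<Long, Vertex> e : vertices.entrySet()) {
            List<String> msgs =
                inMessages.getOrDefault(e.getKey(), Collections.emptyList());
            // The spot in question: the Iterator is created over the
            // already-buffered list, immediately before compute() runs.
            e.getValue().compute(msgs.iterator());
        }
    }

    public static void main(String[] args) {
        int[] seen = new int[1];
        Map<Long, Vertex> vertices = new HashMap<>();
        vertices.put(1L, it -> {
            while (it.hasNext()) { it.next(); seen[0]++; }
        });
        Map<Long, List<String>> inMessages = new HashMap<>();
        inMessages.put(1L, Arrays.asList("m1", "m2"));
        runSuperstep(vertices, inMessages);
        if (seen[0] != 2) throw new AssertionError("expected 2 messages");
    }
}
```

Handling messages as they come in, as asked above, would mean replacing this buffered list with something that feeds compute() incrementally.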