RE: Two joins in GraphX Pregel implementation

2015-07-29 Thread Ulanov, Alexander
: Tuesday, July 28, 2015 12:05 PM
To: Ulanov, Alexander
Cc: Robin East; dev@spark.apache.org
Subject: Re: Two joins in GraphX Pregel implementation

On 27 Jul 2015, at 16:42, Ulanov, Alexander wrote:
> It seems that the mentioned two joins can be rewritten as one oute

Re: Two joins in GraphX Pregel implementation

2015-07-28 Thread Ankur Dave
On 27 Jul 2015, at 16:42, Ulanov, Alexander wrote:
> It seems that the mentioned two joins can be rewritten as one outer join

You're right. In fact, the outer join can be streamlined further using a method from GraphOps:

    g = g.joinVertices(messages)(vprog).cache()

Then, instead of passing new
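
To make the equivalence discussed above concrete, here is a minimal sketch that uses plain Scala Maps in place of GraphX vertex collections. It is not GraphX code: the object PregelJoinSketch and the helpers twoJoins and oneJoin are hypothetical names chosen for illustration, and the sketch only assumes that vertices without a message keep their old attribute, which is the behaviour of joinVertices.

```scala
object PregelJoinSketch {
  type VertexId = Long

  // Two-step update (illustrative): first an inner join of vertices with
  // messages, then a second pass that merges the updated values back.
  def twoJoins[VD, M](vertices: Map[VertexId, VD], messages: Map[VertexId, M])(
      vprog: (VertexId, VD, M) => VD): Map[VertexId, VD] = {
    // Inner join: keep only vertices that received a message.
    val newVerts = vertices.collect {
      case (id, attr) if messages.contains(id) => id -> vprog(id, attr, messages(id))
    }
    // Merge back: vertices without an update keep their old attribute.
    vertices.map { case (id, old) => id -> newVerts.getOrElse(id, old) }
  }

  // One-step update (illustrative): a single left outer join, which is
  // what a joinVertices-style operation amounts to.
  def oneJoin[VD, M](vertices: Map[VertexId, VD], messages: Map[VertexId, M])(
      vprog: (VertexId, VD, M) => VD): Map[VertexId, VD] =
    vertices.map { case (id, old) =>
      id -> messages.get(id).map(m => vprog(id, old, m)).getOrElse(old)
    }

  def main(args: Array[String]): Unit = {
    val vertices = Map(1L -> 10, 2L -> 20, 3L -> 30)
    val messages = Map(1L -> 5, 3L -> 7) // vertex 2 receives no message
    val a = twoJoins(vertices, messages)((_, attr, msg) => attr + msg)
    val b = oneJoin(vertices, messages)((_, attr, msg) => attr + msg)
    assert(a == b) // both strategies produce the same vertex attributes
    println(b)
  }
}
```

The sketch says nothing about performance or caching on RDDs; it only shows that the two update strategies yield the same vertex attributes.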

RE: Two joins in GraphX Pregel implementation

2015-07-28 Thread Ulanov, Alexander
. Do you know the reason why this improvement is not pushed? CC’ing Dave

From: Robin East [mailto:robin.e...@xense.co.uk]
Sent: Monday, July 27, 2015 9:11 AM
To: Ulanov, Alexander
Cc: dev@spark.apache.org
Subject: Re: Two joins in GraphX Pregel implementation

Quite possibly - there is a JIRA open

RE: Two joins in GraphX Pregel implementation

2015-07-27 Thread Ulanov, Alexander
27, 2015 8:56 AM
To: Ulanov, Alexander
Cc: dev@spark.apache.org
Subject: Re: Two joins in GraphX Pregel implementation

What happens to this line of code:

    messages = g.mapReduceTriplets(sendMsg, mergeMsg, Some((newVerts, activeDir))).cache()

Part of the Pregel ‘contract’ is that vertices that

Two joins in GraphX Pregel implementation

2015-07-27 Thread Ulanov, Alexander
Dear Spark developers,

Below is the GraphX Pregel code snippet from https://spark.apache.org/docs/latest/graphx-programming-guide.html#pregel-api (it does not contain the caching step):

    while (activeMessages > 0 && i < maxIterations) {
      // Receive the messages: