Re: Processing graphs

2015-02-18 Thread Vijayasarathy Kannan
Hi,

Thanks for your reply.

I basically want to check whether my understanding of what parallelize()
does is correct. In my case, I create a vertex RDD and an edge RDD and
distribute them by calling parallelize(). Does Spark then perform
operations on these RDDs in parallel?

For example, if I apply groupBy on the edge RDD (grouping by source vertex)
and call a function F on the grouped RDD, will F be applied to each group
in parallel, and will Spark work out how to parallelize this regardless of
the number of groups?
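
To make the question concrete, here is a minimal plain-Python model of the
pattern I have in mind. It is not Spark code: the edge list, the function F,
the helper names and the worker count are all made up for illustration, and
a thread pool stands in for whatever Spark uses to run the work.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical edge list: (source vertex, destination vertex)
edges = [(1, 2), (1, 3), (2, 3), (3, 1), (3, 4), (4, 1)]


def group_by_source(edge_list):
    """Group edges by source vertex: vertex -> list of its outgoing edges."""
    groups = defaultdict(list)
    for src, dst in edge_list:
        groups[src].append((src, dst))
    return dict(groups)


def F(item):
    """Hypothetical per-group function: out-degree of one vertex."""
    vertex, out_edges = item
    return vertex, len(out_edges)


def apply_in_parallel(groups, func, workers=2):
    """Apply func to every group using a fixed pool of workers.

    There may be more groups than workers; the pool hands the next
    group to whichever worker becomes free.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(func, groups.items()))


groups = group_by_source(edges)
out_degrees = apply_in_parallel(groups, F)
print(out_degrees)
```

Here four groups are processed by two workers, so the number of groups and
the degree of parallelism are independent. Whether Spark does the equivalent
automatically after groupBy is what I would like to confirm.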

Thanks.

On Tue, Feb 17, 2015 at 5:03 PM, Yifan LI iamyifa...@gmail.com wrote:

 Hi Kannan,

 I am not sure I have understood your question exactly, but maybe
 the reduceByKey or reduceByKeyLocally functionality is better suited to
 your needs.

 Best,
 Yifan LI

 On 17 Feb 2015, at 17:37, Vijayasarathy Kannan kvi...@vt.edu wrote:

 Hi,

 I am working on a Spark application that processes graphs and I am trying
 to do the following.

 - group the vertices (key: vertex, value: set of its outgoing edges)
 - distribute each key to a separate process and process it (like a mapper)
 - reduce the results back at the main process

 Does the groupBy functionality do the distribution by default?
 Do we have to explicitly use RDDs to enable automatic distribution?

 It'd be great if you could help me understand these points and how to go
 about solving the problem.

 Thanks.





Processing graphs

2015-02-17 Thread Vijayasarathy Kannan
Hi,

I am working on a Spark application that processes graphs and I am trying
to do the following.

- group the vertices (key: vertex, value: set of its outgoing edges)
- distribute each key to a separate process and process it (like a mapper)
- reduce the results back at the main process

Does the groupBy functionality do the distribution by default?
Do we have to explicitly use RDDs to enable automatic distribution?

It'd be great if you could help me understand these points and how to go
about solving the problem.

Thanks.


Re: Processing graphs

2015-02-17 Thread Yifan LI
Hi Kannan,

I am not sure I have understood your question exactly, but maybe the
reduceByKey or reduceByKeyLocally functionality is better suited to your needs.
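
As a rough illustration of the idea, here is a plain-Python sketch of
reducing by key instead of grouping first. It is only a model, not Spark
code: the two "partitions", the edge data and the helper names are
assumptions made up for the example. Each partition combines its own values
per key, and the partial results are then merged with the same function, so
the full set of edges for a vertex never has to be collected in one place.

```python
# Hypothetical edge list, already split into two "partitions"
partitions = [
    [(1, 2), (1, 3), (2, 3)],
    [(3, 1), (3, 4), (4, 1), (1, 4)],
]


def reduce_by_key(pairs, combine):
    """Fold all values that share a key using an associative function."""
    acc = {}
    for key, value in pairs:
        acc[key] = combine(acc[key], value) if key in acc else value
    return acc


def add(a, b):
    return a + b


# Step 1: each partition reduces its own (source vertex, 1) pairs.
partials = [
    reduce_by_key(((src, 1) for src, _dst in part), add)
    for part in partitions
]

# Step 2: merge the per-partition results with the same function.
merged = reduce_by_key(
    (item for partial in partials for item in partial.items()), add
)
print(merged)  # out-degree per source vertex
```

This only works when the per-vertex computation can be written as an
associative, commutative combine (a count or a sum, for example). If F
needs to see all outgoing edges of a vertex at once, grouping is still
required.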



Best,
Yifan LI




