The former: groupByKey returns a single new RDD. Its partitions are spread 
across the nodes, but it is still one RDD.

Check the PairRDDFunctions docs 
(http://spark.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.PairRDDFunctions):

def groupByKey(): RDD[(K, Seq[V])]
Group the values for each key in the RDD into a single sequence.
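
A minimal sketch of what this looks like in practice, assuming a local 
master and made-up sample data (the object name, app name, and partition 
counts are illustrative only). Note that the signature above is from the 
docs current at the time of writing; later Spark releases return 
RDD[(K, Iterable[V])] instead, and the code below should work with either.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Brings in the implicit conversion to PairRDDFunctions on Spark
// versions before 1.3; harmless on later versions.
import org.apache.spark.SparkContext._

object GroupByKeyExample {
  def main(args: Array[String]): Unit = {
    // Assumption: run locally with two worker threads.
    val conf = new SparkConf()
      .setAppName("GroupByKeyExample")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // One pair RDD, split across 2 partitions.
      val pairs = sc.parallelize(
        Seq(("a", 1), ("b", 2), ("a", 3), ("c", 4)), 2)

      // The shuffle moves tuples between partitions by key, but the
      // result is still a single RDD (here with 3 partitions).
      val grouped = pairs.groupByKey(3)

      println(grouped.partitions.length) // prints 3

      // Each key appears once, with all of its values collected together.
      grouped.collect().foreach { case (key, values) =>
        println(key + " -> " + values.mkString(","))
      }
    } finally {
      sc.stop()
    }
  }
}
```

The value `grouped` is one RDD handle; the partitions it reports are the 
pieces that live on different nodes after the shuffle.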


On Wednesday, March 19, 2014 at 9:32 AM, Adrian Mocanu wrote:

> When you partition via groupByKey, tuples (parts of the RDD) are moved from 
> one node to another based on key (hash partitioning).
> Do the tuples remain part of one RDD as before, just moved to different 
> nodes, or does this shuffling create, say, several RDDs which will each 
> have parts of the original RDD?
>  
> Thanks
> -Adrian

