Yeah, reduce() will leave you with one big collection of sets on the
driver. Maybe the set of all identifiers isn't so big -- even a hundred
million Longs isn't so much. I'm glad to hear cartesian works, but can
that scale? You're making an RDD of N^2 elements initially, which is
just vast.
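
To make the distinction concrete, here's a toy local-mode sketch
(made-up data and names, not your actual job) of the two operations:

import org.apache.spark.{SparkConf, SparkContext}

object ReduceVsCartesian {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("reduce-vs-cartesian").setMaster("local[*]"))

    // Hypothetical input: each record is a set of identifiers.
    val sets = sc.parallelize(Seq(Set(1L, 2L), Set(2L, 3L), Set(4L)))

    // reduce() is an action: the merged set is returned to the driver,
    // so it has to fit in driver memory.
    val merged: Set[Long] = sets.reduce(_ ++ _)

    // cartesian() is a transformation: pairing an N-record RDD with
    // itself yields N^2 pairs before you filter anything.
    val pairs = sets.cartesian(sets)  // 3 records -> 9 pairs; N^2 in general

    println(s"merged: $merged, pairs: ${pairs.count()}")
    sc.stop()
  }
}

The merged set is an ordinary Scala value on the driver, not an RDD,
which is exactly why its size matters.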

On Thu, Jul 24, 2014 at 2:09 AM, Roch Denis <rde...@exostatic.com> wrote:
> Ah yes, you're quite right: with partitions I could probably process a good
> chunk of the data. But I didn't think a reduce would work? Sorry, I'm still
> new to Spark and map-reduce in general, but I thought that the reduce result
> wasn't an RDD and had to fit into memory. If the result of a reduce can be
> any size, then yes, I can see how to make it work.
>
> Sorry for not being certain; the docs are not quite clear on that point, at
> least to me.
