Recommended way to join 2 RDDs - one large, the other small

Shay Seng Thu, 14 Nov 2013 11:15:57 -0800

Hi,

Just wondering what people suggest for joining 2 RDDs of very different
sizes.


I have a sequence of map/reduce steps that in the end yields an RDD of
~500MB - 800MB, which typically has a couple hundred partitions.

After that I want to join that RDD with 2 smaller RDDs: one will be <50MB,
the other probably in the KB range. Call them RDDSmall and RDDTiny.

What is the most efficient way to RDD.join(RDDSmall).join(RDDTiny)?
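
For concreteness, the chained join above can be sketched in plain Python,
without Spark. This is only an illustration of the result shape of a keyed
inner join applied twice: the inner_join helper and the toy data are made up
here, and nothing about partitioning or shuffling is modelled.

```python
from collections import defaultdict


def inner_join(left, right):
    """Inner-join two lists of (key, value) pairs on their keys.

    Hypothetical helper, not a Spark API: it only mimics the result shape
    (key, (left_value, right_value)) of a keyed inner join.
    """
    # Index the right-hand side by key so each left record is matched once.
    right_by_key = defaultdict(list)
    for key, value in right:
        right_by_key[key].append(value)
    return [
        (key, (left_value, right_value))
        for key, left_value in left
        for right_value in right_by_key.get(key, [])
    ]


# Toy stand-ins for RDD, RDDSmall and RDDTiny (made-up data).
rdd = [("a", 1), ("b", 2), ("c", 3)]
rdd_small = [("a", "x"), ("b", "y")]
rdd_tiny = [("a", True)]

# Equivalent in shape to RDD.join(RDDSmall).join(RDDTiny).
joined = inner_join(inner_join(rdd, rdd_small), rdd_tiny)
print(joined)
```

Only keys present in all three inputs survive, and the values nest as
((large_value, small_value), tiny_value).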

If RDDSmall has fewer partitions than RDD, won't the join cause RDD to
coalesce into the same number of partitions as RDDSmall, or even worse,
as RDDTiny?

tks,
shay
