Hello Imran,
Thanks for your response. I noticed the "intersection" and "subtract"
methods on an RDD. Do they work based on a hash of all the fields in an RDD
record?
- Himanish
On Thu, Feb 19, 2015 at 6:11 PM, Imran Rashid wrote:
The more scalable alternative is to do a join (or a variant like cogroup,
leftOuterJoin, subtractByKey, etc., found in PairRDDFunctions).
The downside is that this requires a shuffle of both your RDDs.
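
For illustration, here is a minimal sketch of the subtractByKey variant mentioned above, written against the PySpark RDD API. It assumes the first CSV field is the key to compare on, which the thread does not actually state; the helper names key_by_first_field, subtract_by_first_field and demo are hypothetical, and the demo data is made up.

```python
def key_by_first_field(line):
    """Split a CSV line into (key, rest), taking the first field as the key.

    Assumption: the first comma-separated field identifies the record.
    """
    key, _, rest = line.partition(",")
    return (key, rest)


def subtract_by_first_field(rdd1, rdd2):
    """Keep the records of rdd1 whose key does not appear in rdd2.

    Both inputs are RDDs of CSV strings. subtractByKey shuffles both
    RDDs, which is the downside noted above.
    """
    pairs1 = rdd1.map(key_by_first_field)
    pairs2 = rdd2.map(key_by_first_field)
    return pairs1.subtractByKey(pairs2)


def demo():
    """Run the sketch on a local SparkContext (requires pyspark)."""
    from pyspark import SparkContext

    sc = SparkContext("local[1]", "subtract-by-key-sketch")
    try:
        rdd1 = sc.parallelize(["a,x,1", "b,y,2"])  # made-up sample rows
        rdd2 = sc.parallelize(["b,z,9"])
        return subtract_by_first_field(rdd1, rdd2).collect()
    finally:
        sc.stop()
```

Calling demo() needs a working pyspark installation. Only the key takes part in the comparison here, whereas, as far as I know, the plain subtract and intersection methods compare whole elements.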
On Thu, Feb 19, 2015 at 3:36 PM, Himanish Kushary wrote:
Hi,
I have two RDDs with CSV data as below:
RDD-1
101970_5854301840,fbcf5485-e696-4100-9468-a17ec7c5bb43,19229261643
101970_5854301839,fbaf5485-e696-4100-9468-a17ec7c5bb39,9229261645
101970_5854301839,fbbf5485-e696-4100-9468-a17ec7c5bb39,9229261647
101970_17038953,546853f9-cf07-4700-b202-00f21