Hi,

I have some code that creates ~80 RDDs and then applies sc.union to
combine all 80 into one for the next step (running topByKey, for example).
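
Roughly, the pattern looks like the sketch below. This is only an
illustration under stated assumptions, not the actual job: buildRdd, the
(Int, Double) key/value types, the sizes, and the local master are
hypothetical placeholders. Only sc.union and topByKey (from MLlib's
MLPairRDDFunctions) come from the description above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.rdd.MLPairRDDFunctions._
import org.apache.spark.rdd.RDD

object UnionSketch {
  // Hypothetical stand-in for the real, expensive per-RDD computation.
  def buildRdd(sc: SparkContext, i: Int): RDD[(Int, Double)] =
    sc.parallelize(0 until 1000).map(k => (k % 10, (k * (i + 1)).toDouble))

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("union-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Build the ~80 input RDDs.
      val parts: Seq[RDD[(Int, Double)]] = (0 until 80).map(i => buildRdd(sc, i))

      // Combine all inputs with a single sc.union call over the whole Seq.
      val combined: RDD[(Int, Double)] = sc.union(parts)

      // Next step: keep the 5 largest values per key.
      val top: RDD[(Int, Array[Double])] = combined.topByKey(5)
      top.collect().foreach { case (k, vs) => println(s"$k -> ${vs.mkString(", ")}") }
    } finally {
      sc.stop()
    }
  }
}
```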

Creating the 80 RDDs takes about 3 mins per RDD, but doing a union over
them takes about 3 hrs (I am still validating these numbers)...

Is there any checkpoint-based option to speed up the union?

Thanks.
Deb
