Hi.

I am running an application in Spark which first loads data from
Cassandra and then performs some map/reduce jobs.

val srdd = sqlContext.sql("select * from mydb.mytable")
I noticed that "srdd" only has one partition, no matter how big the
data loaded from Cassandra is.
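
For context, this is how I check the partition count (assuming srdd is
a SchemaRDD, i.e. an RDD[Row]; with a newer DataFrame you would go
through .rdd first):

println(srdd.partitions.length)  // prints 1 regardless of table size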

So I call "repartition" on the RDD, and then run the map/reduce
functions, as sketched below.
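
Roughly what I do is the following; the target of 64 partitions and the
word-count-style map/reduce body are just illustrative placeholders for
my real job:

import org.apache.spark.SparkContext._  // pair-RDD functions like reduceByKey (needed on Spark 1.2 and earlier)

val repartitioned = srdd.repartition(64)  // full shuffle; this is the slow step
val counts = repartitioned
  .map(row => (row.getString(0), 1L))     // key by the first column
  .reduceByKey(_ + _)
counts.collect()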

But the main problem is that "repartition" takes so much time (almost 2
minutes), which is not acceptable in my use case. Is there any better
way to do the repartitioning?

best,
/Shahab
