Hi Shahab,

- How many Spark/Cassandra nodes are in your cluster?
- What is your deploy topology for the Spark and Cassandra clusters? Are they co-located?
- Helena
@helenaedelson

On Oct 30, 2014, at 12:16 PM, shahab <[email protected]> wrote:

> Hi,
>
> I am running an application in Spark which first loads data from
> Cassandra and then performs some map/reduce jobs.
>
> val srdd = sqlContext.sql("select * from mydb.mytable")
>
> I noticed that "srdd" has only one partition, no matter how big the
> data loaded from Cassandra is.
>
> So I performed "repartition" on the RDD, and then ran the map/reduce
> functions.
>
> But the main problem is that "repartition" takes so much time (almost
> 2 minutes), which is not acceptable in my use case. Is there a better
> way to do the repartitioning?
>
> best,
> /Shahab
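One avenue worth trying (a sketch only, assuming the DataStax spark-cassandra-connector is on the classpath and a Cassandra node is reachable): read the table with the connector's sc.cassandraTable instead of going through sqlContext.sql. The connector splits the scan across Cassandra token ranges, so the RDD starts out with many partitions and the expensive repartition shuffle is avoided. The host address is a placeholder, and the split-size property name is from connector 1.x and may differ in your version:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._ // enables sc.cassandraTable

// Sketch: "mydb"/"mytable" are the keyspace/table from the thread;
// the connection host below is an assumed placeholder.
val conf = new SparkConf()
  .setAppName("partitioned-cassandra-read")
  .set("spark.cassandra.connection.host", "127.0.0.1")
  // Approximate number of Cassandra rows per Spark partition
  // (property name as of connector 1.x; check your version's docs).
  .set("spark.cassandra.input.split.size", "100000")
val sc = new SparkContext(conf)

// One Spark partition is created per token-range split, so parallelism
// exists from the start and no full shuffle is required.
val rdd = sc.cassandraTable("mydb", "mytable")
println(rdd.partitions.length) // expect more than the single SQL partition
```

If the downstream map/reduce still needs a different partition count, coalesce (which avoids a full shuffle when reducing partitions) is usually cheaper than repartition.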
