Hi Shahab,

- How many Spark/Cassandra nodes are in your cluster?
- What is your deploy topology for the Spark and Cassandra clusters? Are they co-located?
- Helena
@helenaedelson

On Oct 30, 2014, at 12:16 PM, shahab <[email protected]> wrote:

> Hi,
>
> I am running an application in Spark which first loads data from
> Cassandra and then performs some map/reduce jobs.
>
> val srdd = sqlContext.sql("select * from mydb.mytable")
>
> I noticed that "srdd" has only one partition, no matter how big the
> data loaded from Cassandra is.
>
> So I performed "repartition" on the RDD, and then ran the map/reduce
> functions.
>
> But the main problem is that "repartition" takes so much time (almost
> 2 minutes), which is not acceptable in my use case. Is there a better
> way to do the repartitioning?
>
> best,
> /Shahab
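One avenue worth trying (a sketch only, assuming the DataStax spark-cassandra-connector is on the classpath and a Cassandra node is reachable): read the table with the connector's sc.cassandraTable instead of going through sqlContext.sql. The connector splits the scan across Cassandra token ranges, so the RDD starts out with many partitions and the expensive repartition shuffle is avoided. The host address is a placeholder, and the split-size property name is from connector 1.x and may differ in your version:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._ // enables sc.cassandraTable

// Sketch: "mydb"/"mytable" are the keyspace/table from the thread;
// the connection host below is an assumed placeholder.
val conf = new SparkConf()
  .setAppName("partitioned-cassandra-read")
  .set("spark.cassandra.connection.host", "127.0.0.1")
  // Approximate number of Cassandra rows per Spark partition
  // (property name as of connector 1.x; check your version's docs).
  .set("spark.cassandra.input.split.size", "100000")
val sc = new SparkContext(conf)

// One Spark partition is created per token-range split, so parallelism
// exists from the start and no full shuffle is required.
val rdd = sc.cassandraTable("mydb", "mytable")
println(rdd.partitions.length) // expect more than the single SQL partition
```

If the downstream map/reduce still needs a different partition count, coalesce (which avoids a full shuffle when reducing partitions) is usually cheaper than repartition.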
