Prateek, I believe that one task is created per Cassandra partition. How is your data partitioned?
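One way to answer the partitioning question is to count the distinct partition-key values in the table. The following is only a minimal sketch: the `Row` class, the `sensorId` partition key, and the sample rows are hypothetical stand-ins, not taken from the actual `iotdata.coordinate` schema.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PartitionKeyCount {
    // Hypothetical row shape: one partition key plus one clustering column.
    static final class Row {
        final String sensorId;  // assumed partition key (not from the original schema)
        final long timestamp;   // assumed clustering column

        Row(String sensorId, long timestamp) {
            this.sensorId = sensorId;
            this.timestamp = timestamp;
        }
    }

    // Counts distinct partition-key values, i.e. how many Cassandra
    // partitions the given rows are spread across.
    static int countPartitions(List<Row> rows) {
        Set<String> keys = new HashSet<>();
        for (Row row : rows) {
            keys.add(row.sensorId);
        }
        return keys.size();
    }

    public static void main(String[] args) {
        // Hypothetical sample data: three rows spread over two partitions.
        List<Row> rows = Arrays.asList(
                new Row("sensor-a", 1L),
                new Row("sensor-a", 2L),
                new Row("sensor-b", 1L));
        System.out.println(countPartitions(rows));
    }
}
```

If that count turns out to be far smaller than the task total shown in the Spark UI, then partitioning alone probably does not account for the number of tasks.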
Regards,
Bryan Jeffrey

On Thu, Mar 10, 2016 at 10:36 AM, Prateek . <prat...@aricent.com> wrote:

> Hi,
>
> I have a Spark batch job that reads time-series data from Cassandra; the
> table has 50,000 rows.
>
>     JavaRDD<String> cassandraRowsRDD = javaFunctions.cassandraTable("iotdata", "coordinate")
>         .map(new Function<CassandraRow, String>() {
>             @Override
>             public String call(CassandraRow cassandraRow) throws Exception {
>                 return cassandraRow.toString();
>             }
>         });
>
>     List<String> lm = cassandraRowsRDD.collect();
>
> I am testing in local mode, where I observe Spark creating 770870 tasks
> (one job, one stage), which takes many hours to complete. Can anyone
> please suggest what the possible issues could be?
>
> Stage Id:                0
> Description:             collect at CassandraSpark.java:94
>                          <http://localhost:4040/stages/stage?id=0&attempt=0>
> Submitted:               2016/03/10 21:01:15
> Duration:                9 s
> Tasks: Succeeded/Total:  137/770870
> Input:
> Output:
> Shuffle Read:
> Shuffle Write:
>
> Thank You
>
> Prateek