Hi,
I have a Spark batch job that reads time-series data from a Cassandra table
containing 50,000 rows.

    JavaRDD<String> cassandraRowsRDD = javaFunctions.cassandraTable("iotdata", "coordinate")
            .map(new Function<CassandraRow, String>() {
                @Override
                public String call(CassandraRow cassandraRow) throws Exception {
                    return cassandraRow.toString();
                }
            });
    List<String> lm = cassandraRowsRDD.collect();

I am testing in local mode, where I observe that Spark creates 770870 tasks
(one job, one stage), and the job takes many hours to complete. Can anyone
please suggest what the possible issues could be?

Stage details from the Spark UI:

    Stage Id:                 0
    Description:              collect at CassandraSpark.java:94
                              <http://localhost:4040/stages/stage?id=0&attempt=0>
    Submitted:                2016/03/10 21:01:15
    Duration:                 9 s
    Tasks (Succeeded/Total):  137/770870
    Input, Output, Shuffle Read, Shuffle Write: (no values shown)

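
As a rough sanity check on the figures above, here is a small standalone sketch (plain Java, no Spark needed). The class and method names are made up for illustration, and the time estimate assumes tasks keep completing at the rate seen so far, which may not hold. It works out the average number of rows per task and extrapolates the total stage time from 137 tasks in 9 s.

```java
public class TaskEstimate {

    // Average number of rows each task would handle if rows were spread evenly.
    static double rowsPerTask(long rows, long tasks) {
        return (double) rows / tasks;
    }

    // Linear extrapolation of the total stage time from the progress so far.
    // Assumes a constant task completion rate (an assumption, not a measurement).
    static double estimatedTotalSeconds(long doneTasks, long totalTasks, double elapsedSeconds) {
        return elapsedSeconds * totalTasks / doneTasks;
    }

    public static void main(String[] args) {
        long rows = 50_000L;        // rows in the Cassandra table
        long totalTasks = 770_870L; // tasks reported for stage 0
        long doneTasks = 137L;      // tasks succeeded so far
        double elapsed = 9.0;       // seconds elapsed so far

        System.out.printf("rows per task: %.3f%n", rowsPerTask(rows, totalTasks));
        System.out.printf("estimated hours: %.1f%n",
                estimatedTotalSeconds(doneTasks, totalTasks, elapsed) / 3600.0);
    }
}
```

With these inputs the average comes out well below one row per task, so most of the tasks would have nothing to read, and the extrapolated run time is consistent with the "many hours" I am seeing.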
Thank You
Prateek