Hi,

I am building a Spark app which aggregates sensor data stored in Cassandra.
After I submit my app to Spark, the driver and application show up quickly.
Then, before any Spark job shows up in the application UI, there is a huge
lag, on the order of minutes to sometimes hours. Once the Spark job itself
gets added, processing is very fast (milliseconds to a few seconds).

While this hang is occurring there is no significant load on my driver (CPU
etc., as seen through top) or on my Cassandra cluster (disk, network etc.,
as seen through top and DataStax OpsCenter). Once the job does start, you
can see the load spike.

I am accessing Cassandra using the DataStax Spark Cassandra Connector. I do
have a lot of Cassandra partitions accessed in each job (7 days * 24 hours
* 50+ sensors) and I can see these Cassandra partitions in the DAG
visualization. I have tried coalesce, which seems to help somewhat, but the
lag is still orders of magnitude larger than any processing time.
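
For a sense of scale, the partition count per job can be sketched roughly
as below. The (sensor, day, hour) key layout and all names here are
assumptions for illustration only, not the actual schema, and 50 is just
the lower bound of the "50+" sensors mentioned above.

```python
from itertools import product

DAYS = 7
HOURS_PER_DAY = 24
SENSORS = 50  # assumed lower bound; the real count is "50+"


def partition_keys(days=DAYS, hours=HOURS_PER_DAY, sensors=SENSORS):
    """Enumerate hypothetical (sensor, day, hour) partition keys for one job."""
    return list(product(range(sensors), range(days), range(hours)))


keys = partition_keys()
print(len(keys))  # 8400
```

So each job touches at least 8,400 Cassandra partitions under these
assumptions, and more once the sensor count goes above 50.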

Thanks,

Patrick
