Hi, I am building a Spark app that aggregates sensor data stored in Cassandra. After I submit the app to Spark, the driver and application show up quickly, but there is then a huge lag, on the order of minutes to sometimes hours, before any Spark job appears in the application UI. Once the Spark job itself is added, processing is very fast (milliseconds to possibly a few seconds).
While this hang is occurring, there is no high load on my driver (CPU etc., checked with top) or on my Cassandra cluster (disk, network etc., checked with top and DataStax OpsCenter). Once the job does start, you can see the load spike. I am accessing Cassandra through the DataStax Spark Cassandra Connector. Each job accesses a lot of Cassandra partitions (7 days * 24 hours * 50+ sensors), and I can see these partitions in the DAG. I have tried coalesce, which seems to help somewhat, but the lag is still orders of magnitude larger than any processing time.

Thanks,
Patrick
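
P.S. To give a sense of scale, here is a rough sketch of how many Cassandra partitions one job touches. It is plain Python, not my actual Spark code, and it assumes one partition per sensor per hour, which is only an illustration of the layout. The names (`partition_keys`, `SENSOR_COUNT`) are made up for this example, and 50 is a lower bound since the real count is 50+.

```python
from itertools import product

DAYS = 7             # one week of data per job
HOURS_PER_DAY = 24
SENSOR_COUNT = 50    # lower bound; the real number is "50+"


def partition_keys(days=DAYS, hours=HOURS_PER_DAY, sensors=SENSOR_COUNT):
    """Enumerate hypothetical (sensor, day, hour) partition keys.

    Assumes one Cassandra partition per sensor per hour.
    """
    return list(product(range(sensors), range(days), range(hours)))


keys = partition_keys()
print(len(keys))  # 7 * 24 * 50 = 8400
```

So each job touches at least 8,400 partitions before coalesce is applied.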