I'm sitting here watching my application crunch gigabytes of data on a
cluster and I have no idea whether it's an hour away from completion or a
minute. The web UI shows progress through each stage, but not how many
stages remain. How can I automatically work out how many stages my program
will take?

My application has a slightly interesting DAG (reuse of functions that
contain Spark transformations, persisted RDDs). Not that complex, but not
'step 1, step 2, step 3' either.

I'm guessing that since the driver program runs sequentially, sending
commands to Spark as it goes, Spark has no knowledge of the driver
program's overall structure. Is it therefore necessary to run it against a
small test dataset and see how many stages result?
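For the test-run approach, something like the following is what I had in mind: a sketch using Spark's SparkListener API to count stages as they complete (assuming `sc` is the SparkContext; I haven't verified this against my Spark version):

```scala
import java.util.concurrent.atomic.AtomicInteger
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

// Count stages as the scheduler finishes them during a small test run.
val stageCount = new AtomicInteger(0)

sc.addSparkListener(new SparkListener {
  override def onStageCompleted(event: SparkListenerStageCompleted): Unit = {
    stageCount.incrementAndGet()
  }
})

// ... run the job on the small test dataset here ...

println(s"Stages completed: ${stageCount.get}")
```

But even if that works, it only tells me the count after the fact, which is why I was hoping the event log would give me something I could inspect.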

When I set `spark.eventLog.enabled = true` and run on (very small) test
data, I don't get any stage messages on stdout or in the log file. This is
on a `local` instance.
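In case it matters, this is roughly how I'm enabling it (a sketch of my settings; I may well be missing something like `spark.eventLog.dir`, which I believe defaults to `/tmp/spark-events` and must already exist):

```
# spark-defaults.conf (or the equivalent set on SparkConf in the driver)
spark.eventLog.enabled  true
# My understanding is the event log is written here, not to stdout:
spark.eventLog.dir      /tmp/spark-events
```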

Did I miss something obvious?

Thanks!

Joe
