Can you shed more light on what kind of processing you are doing? One common pattern I have seen, where active core/executor utilization drops to zero, is while reading ORC data: the driver seems (I think) to be doing schema validation. In my case I have hundreds of thousands of ORC data files, and there is dead silence for about 1-2 hours. I have tried providing an explicit schema and disabling schema validation while reading the ORC data, but that does not seem to help (Spark 2.2.1).
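As an illustration of "providing a schema" to avoid driver-side inference, here is a minimal sketch; the path and column names are hypothetical, and whether this avoids the stall may depend on the Spark version:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("orc-read").getOrCreate()

// Supplying an explicit schema lets the reader skip inferring one
// from the files, which can otherwise stall the driver when there
// are hundreds of thousands of ORC files.
val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("payload", StringType)
))

val df = spark.read
  .schema(schema)
  .orc("/data/events")  // hypothetical path
```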
And as you know, in most cases there is a linear relationship between the number of partitions in your data and the number of concurrently active executors.

Another thing I would suggest is to use the following two methods; they will annotate the Spark stages and jobs with what is being executed in the Spark UI:

SparkContext.setJobGroup(….)
SparkContext.setJobDescription(….)

From: Vitaliy Pisarev <vitaliy.pisa...@biocatch.com>
Date: Thursday, November 15, 2018 at 8:51 AM
To: user <user@spark.apache.org>
Cc: David Markovitz <dudu.markov...@microsoft.com>
Subject: How to address seemingly low core utilization on a spark workload?

I have a workload that runs on a cluster of 300 cores. Below is a plot of the number of active tasks over time during the execution of this workload:

[image.png]

What I deduce is that there are substantial intervals where the cores are heavily under-utilised. What actions can I take to:

* Increase the efficiency (== core utilisation) of the cluster?
* Understand the root causes behind the drops in core utilisation?
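A minimal sketch of both suggestions together: annotating jobs in the Spark UI and sizing the partition count against the available cores. The group id, description, path, and partition count are hypothetical choices, not recommendations from the thread:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("annotated-jobs").getOrCreate()
val sc = spark.sparkContext

// These labels appear on the Jobs/Stages pages of the Spark UI,
// making it easier to attribute idle gaps in the task timeline to
// a specific phase of the workload.
sc.setJobGroup("ingest", "Read and repartition input data")
sc.setJobDescription("Repartition to match cluster parallelism")

// With 300 cores, fewer than 300 partitions guarantees idle cores;
// a small multiple of the core count is a common starting point.
val df = spark.read.orc("/data/events")  // hypothetical path
val repartitioned = df.repartition(600)
```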