I recently came across this tool (I haven't tried it out yet), but maybe it can help you identify the root cause:
https://github.com/groupon/sparklint

From: Vitaliy Pisarev <vitaliy.pisa...@biocatch.com>
Date: Thursday, November 15, 2018 at 10:08 AM
To: user <user@spark.apache.org>
Cc: David Markovitz <dudu.markov...@microsoft.com>
Subject: How to address seemingly low core utilization on a spark workload?

I have a workload that runs on a cluster of 300 cores. Below is a plot of the number of active tasks over time during the execution of this workload:

[plot not included in this message]

What I deduce is that there are substantial intervals where the cores are heavily under-utilised. What actions can I take to:

1. Increase the efficiency (i.e. core utilisation) of the cluster?
2. Understand the root causes behind the drops in core utilisation?
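A common cause of the pattern described above is stages whose partition count is much lower than the available core count, so most executors sit idle while a few long tasks finish. As a first step, it is often worth checking that the default parallelism settings roughly match (or slightly exceed) the total core count. The sketch below shows the relevant standard Spark configuration properties; the exact values (2-3x the core count is a common rule of thumb) and the application details are assumptions to adapt to your job, not a definitive tuning recommendation:

```shell
# Sketch: submit with parallelism sized for a 300-core cluster.
# Values are illustrative; tune against what the Spark UI shows per stage.
spark-submit \
  --conf spark.default.parallelism=600 \       # RDD operations: ~2x total cores
  --conf spark.sql.shuffle.partitions=600 \    # DataFrame/SQL shuffles (default is 200)
  --conf spark.dynamicAllocation.enabled=true \# release executors during idle gaps
  --conf spark.speculation=true \              # re-launch straggler tasks
  your-application.jar                         # placeholder for your job
```

Beyond configuration, the stage timeline in the Spark UI (or a tool like sparklint) can show whether the utilisation drops line up with shuffles, skewed partitions, or driver-side work between jobs, which are the usual root causes.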
--------------------------------------------------------------------- To unsubscribe e-mail: user-unsubscr...@spark.apache.org