You can use Spark speculation to work around the problem: with speculation enabled, Spark identifies tasks that are running much slower than their peers and re-schedules copies of them on other executors. Here is a useful link:
http://asyncified.io/2016/08/13/leveraging-spark-speculation-to-identify-and-re-schedule-slow-running-tasks/
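For reference, here is a minimal sketch of enabling speculation when building the session. The property names are standard Spark configs; the values shown are just the defaults and will likely need tuning for your workload:

  import org.apache.spark.sql.SparkSession

  // Minimal sketch: turn on speculative execution so Spark re-launches
  // straggling tasks on other executors instead of waiting on them.
  val spark = SparkSession.builder()
    .appName("SpeculationExample")                 // hypothetical app name
    .config("spark.speculation", "true")           // off by default
    .config("spark.speculation.interval", "100ms") // how often to check for stragglers
    .config("spark.speculation.multiplier", "1.5") // "slow" = 1.5x the median task time
    .config("spark.speculation.quantile", "0.75")  // fraction of tasks that must finish first
    .getOrCreate()

The same properties can also be passed at submit time (--conf spark.speculation=true on spark-submit) if you'd rather not change code.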
> On Feb 20, 2018, at 5:52 PM, Nikhil Goyal <nownik...@gmail.com> wrote:
>
> Hi guys,
>
> I have a job that gets stuck if a couple of tasks get killed due to an OOM
> exception. Spark doesn't kill the job, and it keeps on running for hours.
> Ideally I would expect Spark to kill the job or restart the killed executors,
> but nothing seems to be happening. Does anybody have an idea about this?
>
> Thanks
> Nikhil