You will usually find more information about the cause further down in your logs. I generally see this happen when an out-of-memory error has occurred on an executor for one reason or another. Your memory settings per executor may be too small, or the number of tasks you are running concurrently may be too large for some of the executors. Other times, RDD functions like groupBy(), which can collect an unbounded number of items into memory, could be causing it.
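To illustrate the groupBy() point, here is a rough sketch in plain Python (not Spark code). It contrasts buffering every value for a key before aggregating with folding each value into a running total as it arrives. The function names and the skewed sample data are made up for illustration. In Spark the comparable choice is typically groupByKey() versus reduceByKey(), though whether that is the problem in your job is only a guess until you check the executor logs.

```python
from collections import defaultdict


def sum_by_group_then_reduce(pairs):
    """Buffer every value per key, then aggregate (the groupBy-style pattern)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)  # all values for a key are held in memory
    # Largest number of values buffered for a single key.
    peak_buffered = max((len(values) for values in groups.values()), default=0)
    totals = {key: sum(values) for key, values in groups.items()}
    return totals, peak_buffered


def sum_by_incremental_reduce(pairs):
    """Fold each value into a running total (the reduce-style pattern)."""
    totals = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value  # one accumulator per key
    peak_buffered = 1 if totals else 0
    return totals, peak_buffered


# Hypothetical skewed input: one "hot" key dominates the data.
pairs = [("hot", 1)] * 1000 + [("cold", 1)] * 3

grouped_totals, grouped_peak = sum_by_group_then_reduce(pairs)
reduced_totals, reduced_peak = sum_by_incremental_reduce(pairs)
```

Both functions return the same totals, but the grouped version holds all 1000 values for the hot key at once, while the reduce version only ever keeps one number per key. With a heavily skewed key, that difference is the sort of thing that can push a single executor over its memory limit.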
Either way, the logs for the executors should be able to give you some insight. Have you looked at those yet?

On Tue, Aug 18, 2015 at 6:26 PM, VIJAYAKUMAR JAWAHARLAL <[email protected]> wrote:

> Hi All
>
> Why am I getting ExecutorLostFailure and executors are completely lost
> for rest of the processing? Eventually it makes job to fail. One thing for
> sure that lot of shuffling happens across executors in my program.
>
> Is there a way to understand and debug ExecutorLostFailure? Any pointers
> regarding “ExecutorLostFailure” would help me a lot.
>
> Thanks
> Vijay
