Hi guys, I keep running into a strange problem where my jobs start to fail with the dreaded "Resubmitted (resubmitted due to lost executor)" because too many temp files from previous runs have piled up.
Both /var/run and /spill have enough disk space left, but after a certain number of jobs have run, subsequent jobs struggle to complete. There are a lot of failures without any exception message, only the lost executor message mentioned above. As soon as I clear out /var/run/spark/work/ and the spill disk, everything goes back to normal.

Thanks for any hint,
- Marius