I'm running a 10-node standalone cluster and I'm having trouble getting a stage to complete - it hangs somewhere between 196 and 199 of 200 tasks completed, but never errors out and never moves forward.
<http://apache-spark-user-list.1001560.n3.nabble.com/file/t9731/stages.png>

If I look at the task(s) still running, stdout and stderr always show the same message:

Error: invalid log directory /usr/local/spark/spark-2.4.0-bin-hadoop2.7/work/app-20181129113214-0002/0/

<http://apache-spark-user-list.1001560.n3.nabble.com/file/t9731/error.png>

This always happens on the same node. If I SSH into that node and look in the app folder, I see that there is a /1/ directory, but no /0/. Why is it looking for the wrong folder? This is stage 16 of 19, so it isn't like the job bombs from the get-go - that executor has completed many tasks in earlier stages. I can't figure out how to troubleshoot any further: the Spark job never fails, that one task just keeps running...

<http://apache-spark-user-list.1001560.n3.nabble.com/file/t9731/workers.png>

Thanks!
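For anyone else hitting this, here is a minimal sketch of the check I'm doing on the worker node: listing the executor subdirectories that actually exist under the application's work directory. The WORK_DIR and APP_ID values come from the error message above; adjust them for your own cluster layout.

```shell
#!/usr/bin/env bash
# Sketch: list the executor work dirs that exist for the hung application.
# WORK_DIR and APP_ID are taken from the error message in this post;
# override them for your own install.
WORK_DIR="${WORK_DIR:-/usr/local/spark/spark-2.4.0-bin-hadoop2.7/work}"
APP_ID="${APP_ID:-app-20181129113214-0002}"

if [ -d "$WORK_DIR/$APP_ID" ]; then
  # Each numbered subdirectory (0/, 1/, ...) is one executor's log dir.
  ls -l "$WORK_DIR/$APP_ID/"
else
  echo "no such app dir: $WORK_DIR/$APP_ID"
fi
```

In my case this shows a /1/ but no /0/, even though the UI's stdout/stderr links for the stuck task point at /0/.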