I'm running a 10-node standalone cluster and I'm having trouble getting a
stage to complete - it keeps hanging somewhere between 196 and 199 of 200
tasks completed, but it never errors out and never moves forward.

<http://apache-spark-user-list.1001560.n3.nabble.com/file/t9731/stages.png> 

If I look at the task(s) still running, stdout and stderr always show the
same message:

    Error: invalid log directory
    /usr/local/spark/spark-2.4.0-bin-hadoop2.7/work/app-20181129113214-0002/0/

<http://apache-spark-user-list.1001560.n3.nabble.com/file/t9731/error.png> 
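
For reference, in standalone mode each executor is supposed to log under
the worker's work directory, keyed by app ID and executor ID - so I'd
expect a layout like this (a sketch, assuming the default work dir from
the path in the error above):

    # Expected standalone worker log layout for executor 0 of this app:
    /usr/local/spark/spark-2.4.0-bin-hadoop2.7/work/app-20181129113214-0002/0/stdout
    /usr/local/spark/spark-2.4.0-bin-hadoop2.7/work/app-20181129113214-0002/0/stderr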

This always happens on the same node. If I SSH into that node and look in
the app's work folder, I see that there is a 1/ directory, but no 0/.
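
Concretely, the check looks something like this (the worker hostname is a
placeholder):

    # SSH to the affected worker and list the app's work directory:
    $ ssh worker-node
    $ ls /usr/local/spark/spark-2.4.0-bin-hadoop2.7/work/app-20181129113214-0002/
    1    # only executor 1's directory exists - there is no 0/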

Why is it looking for the wrong folder? This is stage 16 of 19, so it
isn't like the job bombs from the get-go - that executor has completed
plenty of tasks in earlier stages. I can't figure out how to troubleshoot
any further - the Spark job never fails; that one task just keeps
running...

<http://apache-spark-user-list.1001560.n3.nabble.com/file/t9731/workers.png> 

Thanks!


