I've built a Spark job in which an external program is called through pipe().
The job runs correctly on the cluster when the input is a small sample dataset,
but when the input is the real, large dataset it stays in the RUNNING state
forever.
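
For reference, the setup is roughly like the sketch below (the paths, class
name and external command are placeholders, not my actual values):

  import org.apache.spark.{SparkConf, SparkContext}

  object PipeJob {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("pipe-job"))

      // Each partition's records are streamed to the external program's
      // stdin, one per line; the program's stdout lines become the output RDD.
      val input = sc.textFile("hdfs:///path/to/input")       // placeholder path
      val piped = input.pipe("/path/to/external_program")    // placeholder command
      piped.saveAsTextFile("hdfs:///path/to/output")         // placeholder path

      sc.stop()
    }
  }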

I've tried different ways of tuning executor memory, executor cores, and
memory overhead, but I haven't found a solution so far.
I've also tried forcing the external program to use only one thread, in case
the problem came from it being a multithreaded application, but that didn't
help either.
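
The kind of tuning I've been trying looks roughly like this (the values are
just examples, not the exact ones I used; depending on the Spark version the
overhead setting may be spark.yarn.executor.memoryOverhead instead):

  spark-submit \
    --master yarn \
    --class PipeJob \
    --executor-memory 8g \
    --executor-cores 1 \
    --conf spark.executor.memoryOverhead=2g \
    pipe-job.jar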

Any suggestions would be welcome.


