Hi, I was looking in the wrong place for logs; yes, I do see some errors in the logs:
"Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages." logger="org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend",message="Disconnected from Spark cluster! Waiting for reconnection..." So what is best way to deal with this situation? I would rather have driver killed along with it, is there a way to achieve that? On Mon, May 15, 2017 at 3:05 PM, Shixiong(Ryan) Zhu <shixi...@databricks.com > wrote: > So you are using `client` mode. Right? If so, Spark cluster doesn't manage > the driver for you. Did you see any error logs in driver? > > On Mon, May 15, 2017 at 3:01 PM, map reduced <k3t.gi...@gmail.com> wrote: > >> Hi, >> >> Setup: Standalone cluster with 32 workers, 1 master >> I am running a long running streaming spark job (read from Kafka -> >> process -> send to Http endpoint) which should ideally never stop. >> >> I have 2 questions: >> 1) I have seen some times Driver is still running but application marked >> as *Finished*. *Any idea why this happens or any way to debug this?* >> Sometimes after running for say 2-3 days (or 4-5 days - random timeframe) >> this issue arises, not sure what is causing it. Nothing in logs suggests >> failures or exceptions >> >> 2) Is there a way for Driver to kill itself instead of keeping on running >> without any application to drive? >> >> Thanks, >> KP >> > >