I am not a PySpark person, but from the errors I could figure out that your
Spark application is having memory issues.
Are you collecting the results to the driver at any point, or have you
configured too little memory for the nodes?
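
For example, here is a minimal sketch (the app name, memory sizes and DStream name are hypothetical, tune them for your cluster) of giving the executors more memory from PySpark and keeping large results off the driver:

from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

# hypothetical sizes, tune for your nodes
conf = (SparkConf()
        .setAppName("flight_delay_classifier")
        .set("spark.executor.memory", "4g"))
# note: spark.driver.memory is best passed to spark-submit (--driver-memory 2g)
# or put in spark-defaults.conf; setting it here is too late in client mode
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 10)  # 10-second batches

# avoid DStream.foreachRDD(lambda rdd: rdd.collect()), which pulls every batch
# back to the driver; sample or write out from the executors instead, e.g.:
# predictions.foreachRDD(lambda rdd: rdd.take(5))   # 'predictions' is hypothetical
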
Also, if you are using DataFrames, there is an issue raised in JIRA for
<java.net.SocketTimeoutException: Accept timed out>.
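
On the deployment question below: the usual pattern is to checkpoint the
StreamingContext and build it with getOrCreate(), so a restarted driver (for
example run via spark-submit with --deploy-mode cluster --supervise on a
standalone cluster) can recover from the checkpoint instead of you restarting
the PID by hand. A rough sketch, with a hypothetical checkpoint directory:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

CHECKPOINT_DIR = "hdfs:///tmp/flight_delay_checkpoint"   # hypothetical path

def create_context():
    # build the context, input DStream and classification logic here,
    # then enable checkpointing; at least one output operation is required
    sc = SparkContext(appName="flight_delay_classifier")
    ssc = StreamingContext(sc, 10)
    ssc.checkpoint(CHECKPOINT_DIR)
    # ... define your input DStream and Spark ML scoring here ...
    return ssc

# recover from the checkpoint if one exists, otherwise create a fresh context
ssc = StreamingContext.getOrCreate(CHECKPOINT_DIR, create_context)
ssc.start()
ssc.awaitTermination()
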

Hope this helps

Thanks,
Divya

On 16 December 2016 at 16:53, Russell Jurney <russell.jur...@gmail.com>
wrote:

> I have created a PySpark Streaming application that uses Spark ML to
> classify flight delays into three categories: on-time, slightly late, very
> late. After an hour or so something times out and the whole thing crashes.
>
> The code and error are on a gist here:
> https://gist.github.com/rjurney/17d471bc98fd1ec925c37d141017640d
>
> While I am interested in why I am getting an exception, I am more
> interested in understanding what the correct deployment model is... because
> long running processes will have new and varied errors and exceptions.
> Right now with what I've built, Spark is a highly dependable distributed
> system, but in streaming mode the entire thing depends on one Python
> PID staying up. This can't be how apps are deployed in the wild because it
> will never be very reliable, right? But I don't see anything about this in
> the docs, so I am confused.
>
> Note that I use this to run the app, maybe that is the problem?
>
> ssc.start()
> ssc.awaitTermination()
>
>
> What is the actual deployment model for Spark Streaming? All I know to do
> right now is to restart the PID. I'm new to Spark, and the docs don't
> really explain this (that I can see).
>
> Thanks!
> --
> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com relato.io
>
