Re: PySpark SocketConnect Issue in Cluster

2014-04-09 Thread Surendranauth Hiraman

This appears to be an issue with using pandas. Even if we just instantiate a DataFrame and do nothing with it, the Python worker process exits. If we remove all pandas references, the same job runs to completion.

Has anyone run into this before?

-Suren

On Mon, Apr 7, 2014 at 1:10 PM
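One way to narrow down a symptom like this is to run the suspect import in a child interpreter and look at its exit status, so that a hard crash in native code does not take the calling process down with it. The sketch below is only an illustration, not something from the thread: the helper name `probe_import` and its return format are hypothetical, and it assumes Python 3.5 or later for `subprocess.run`.

```python
import socket
import subprocess
import sys


def probe_import(module_name, statement=""):
    """Import a module in a child interpreter and report the outcome.

    `statement` is an optional extra line to run after the import,
    for example "pandas.DataFrame()". Hypothetical helper for diagnosis.
    """
    code = "import {0}\n{1}".format(module_name, statement)
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return {
        "host": socket.gethostname(),      # which machine ran the probe
        "python": sys.executable,          # which interpreter was used
        "module": module_name,
        "returncode": proc.returncode,     # non-zero means failure or crash
        "stderr_tail": proc.stderr.strip().splitlines()[-1:],
    }


if __name__ == "__main__":
    print(probe_import("pandas", "pandas.DataFrame()"))
```

Run directly on each node, this shows whether the interpreter there can import pandas and build an empty DataFrame. Assuming an existing SparkContext named `sc`, the same function could also be shipped to the executors with something like `sc.parallelize(range(8), 8).mapPartitions(lambda _: [probe_import("pandas", "pandas.DataFrame()")]).collect()`, which would report the interpreter path and exit status seen by each worker. Whether that matches the environment the failing job uses depends on how the cluster launches its Python workers.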

PySpark SocketConnect Issue in Cluster

2014-04-07 Thread Surendranauth Hiraman

Hi,

We have a situation where a PySpark script works fine as a local process ("local" URL) on the Master and the Worker nodes, which would indicate that all Python dependencies are set up properly on each machine. But when we try to run the script at the cluster level (using the master's URL), if