I was getting this error after upgrading my nodes to Python 2.7. I suspected the problem was due to conflicting Python versions, but the 2.7 install itself seemed correct on my nodes.
I set the PYSPARK_PYTHON variable to point at my 2.7 install (I still had 2.6 installed and linked to the 'python' executable, with 'python2.7' as the name of the new install). I'm still figuring out why this was happening, but even though I was defining the PYSPARK_PYTHON environment variable in my ../conf/spark-shell.sh script, it was being overwritten. I eventually realized to look at where the Python executable is actually being set, in pyspark/context.py: sc.pythonExec (where sc is my SparkContext) was returning 'python' instead of 'python2.7', even though I had 'python2.7' in my config script. Setting os.environ['PYSPARK_PYTHON'] = 'python2.7' directly in my driver script, before creating the SparkContext object, solved the problem.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-net-SocketException-on-reduceByKey-in-pyspark-tp2184p4612.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
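To illustrate the workaround above: the interpreter resolution in pyspark/context.py amounts to an environment lookup with 'python' as the fallback (paraphrased here as a hypothetical `resolve_python_exec` helper, not the exact Spark source), which is why the variable has to be in `os.environ` before the SparkContext is constructed:

```python
import os

# Roughly how pyspark/context.py picks the worker interpreter:
# fall back to the bare 'python' executable when PYSPARK_PYTHON is unset.
# (Paraphrased sketch, not the exact Spark source.)
def resolve_python_exec():
    return os.environ.get("PYSPARK_PYTHON", "python")

# Without the variable, the default 'python' (here, the 2.6 symlink) wins,
# regardless of what a config script set in a different environment.
os.environ.pop("PYSPARK_PYTHON", None)
assert resolve_python_exec() == "python"

# Setting it in the driver script itself, before creating the SparkContext,
# guarantees sc.pythonExec sees the intended interpreter.
os.environ["PYSPARK_PYTHON"] = "python2.7"
assert resolve_python_exec() == "python2.7"
```

The key point is ordering: the lookup happens at SparkContext construction time, so the assignment must run first in the same process.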