Github user bersprockets commented on a diff in the pull request:

https://github.com/apache/spark/pull/20424#discussion_r168080320

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -191,7 +192,26 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
         daemon = pb.start()
         val in = new DataInputStream(daemon.getInputStream)
-        daemonPort = in.readInt()
+        try {
+          daemonPort = in.readInt()
+        } catch {
+          case _: EOFException =>
+            throw new IOException(s"No port number in $daemonModule's stdout")
+        }
+
+        // test that the returned port number is within a valid range.
+        // note: this does not cover the case where the port number
+        // is arbitrary data but is also coincidentally within range
+        if (daemonPort < 1 || daemonPort > 0xffff) {
+          val exceptionMessage = f"""
+            |Bad data in $daemonModule's standard output.
+            |Expected valid port number, got 0x$daemonPort%08x.
+            |PYTHONPATH set to '$pythonPath'
+            |Python command is '${command.asScala.mkString(" ")}'
+            |One possibility is a sitecustomize.py module in your python installation
+            |that is printing to stdout"""
--- End diff --

Nice, except for this one thing:

> This module 'sitecustomize.py' can be located in your Python path: /.../spark/python/lib/pyspark.zip:/.../spark/python/lib/py4j-0.10.6-src.zip:/.../spark/assembly/target/scala-2.11/jars/spark-core_2.11-2.4.0-SNAPSHOT.jar:/.../spark/python/lib/py4j-0.10.6-src.zip:/.../spark/python/:

I display the path because you may have accidentally configured it to include an old .zip file or other incompatible versions of the Python modules Spark expects.

sitecustomize.py might not be on that path, although it could be; it may instead live in the Python installation's own library directory. On one machine I found it at /usr/lib/python2.7/sitecustomize.py, and on another at /usr/lib64/python2.7/sitecustomize.py.
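A minimal standalone sketch of the check in the diff, assuming the daemon writes its port as a 4-byte big-endian int (the format `DataInputStream.readInt` reads). `DaemonPortCheck`, `readDaemonPort`, `portBytes` and `asAscii` are hypothetical names for illustration, not Spark APIs. It also shows why printing the bad value with `%08x` is useful: stray text on stdout decodes straight back to ASCII.

```scala
import java.io.{DataInputStream, EOFException, IOException, InputStream}
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

// Hypothetical helpers for illustration only; these are not Spark APIs.
object DaemonPortCheck {
  // Read a 4-byte big-endian int and reject values that cannot be TCP ports,
  // mirroring the validation in the diff.
  def readDaemonPort(stdout: InputStream, moduleName: String): Int = {
    val in = new DataInputStream(stdout)
    val port =
      try in.readInt()
      catch {
        // Fewer than 4 bytes before end of stream: no port was written at all.
        case _: EOFException =>
          throw new IOException(s"No port number in $moduleName's stdout")
      }
    if (port < 1 || port > 0xffff) {
      // %08x prints all 4 raw bytes, so stray text shows up as ASCII codes.
      throw new IOException(
        f"Bad data in $moduleName's standard output. Expected valid port number, got 0x$port%08x.")
    }
    port
  }

  // Encode a port the way a well-behaved daemon would write it (big-endian).
  def portBytes(port: Int): Array[Byte] =
    ByteBuffer.allocate(4).putInt(port).array()

  // Decode the 4 bytes of a rejected "port" back into text.
  def asAscii(value: Int): String =
    new String(ByteBuffer.allocate(4).putInt(value).array(), StandardCharsets.US_ASCII)
}
```

For example, if a sitecustomize.py printed `Hello` before the daemon wrote its port, `readInt` would consume the bytes `H`, `e`, `l`, `l`, the message would report `0x48656c6c`, and `asAscii(0x48656c6c)` would give back `"Hell"`. As the diff's own comment notes, garbage that happens to fall in 1..65535 would still slip through.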