GitHub user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20424#discussion_r167424753
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -191,7 +192,25 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
daemon = pb.start()
val in = new DataInputStream(daemon.getInputStream)
- daemonPort = in.readInt()
+ try {
+ daemonPort = in.readInt()
+ } catch {
+ case _: EOFException =>
+ throw new IOException(s"No port number in $daemonModule's stdout")
+ }
+
+ // test that the returned port number is within a valid range.
+ // note: this does not cover the case where the port number
+ // is arbitrary data but is also coincidentally within range
+ if (daemonPort < 1 || daemonPort > 0xffff) {
+ val exceptionMessage = f"""
+ |Bad data in $daemonModule's standard output.
+ |Expected valid port number, got 0x$daemonPort%08x.
+ |PYTHONPATH set to '$pythonPath'
+ |Python command is '${command.asScala.mkString(" ")}'
+ |Check if you have a sitecustomize.py module in your python installation."""
--- End diff --
This sounds like `sitecustomize` is the only possible cause. I believe there
are many other possibilities, for example a custom Python executable,
`usercustomize`, etc. Can we add a few words implying that `sitecustomize` is
just one potential cause?
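For example, the message could list `sitecustomize.py` alongside other causes. Below is a minimal standalone sketch of that idea, assuming the same handshake as the diff (the daemon writes its port as a 4-byte big-endian int). `DaemonPortCheck` and `readDaemonPort` are hypothetical names; the real code reads `daemon.getInputStream` and calls `command.asScala`, which are replaced here by plain parameters so the sketch runs on its own. The exact wording is only a suggestion.

```scala
import java.io.{DataInputStream, EOFException, IOException}

// Hypothetical standalone version of the port handshake in the diff above.
// In PythonWorkerFactory the stream comes from daemon.getInputStream and the
// command is a java.util.List (hence command.asScala); plain params are used here.
object DaemonPortCheck {
  def readDaemonPort(
      in: DataInputStream,
      daemonModule: String,
      pythonPath: String,
      command: Seq[String]): Int = {
    // readInt() consumes 4 bytes and throws EOFException if the stream ends first.
    val port =
      try in.readInt()
      catch {
        case _: EOFException =>
          throw new IOException(s"No port number in $daemonModule's stdout")
      }

    // Valid TCP ports are 1..65535. Stray output on stdout usually decodes to a
    // value outside that range, but not always (see the comment in the diff).
    if (port < 1 || port > 0xffff) {
      val exceptionMessage = f"""
        |Bad data in $daemonModule's standard output.
        |Expected valid port number, got 0x$port%08x.
        |PYTHONPATH set to '$pythonPath'
        |Python command is '${command.mkString(" ")}'
        |Something may be writing to standard output before the daemon reports
        |its port, for example a sitecustomize.py or usercustomize.py module, or
        |a custom Python executable."""
      throw new IOException(exceptionMessage.stripMargin)
    }
    port
  }
}
```

As an illustration, if a stray module prints `Hi!\n` first, those four bytes decode to `0x4869210a` and the range check rejects them. Output whose first two bytes happen to be zero would still pass, which matches the limitation noted in the diff's own comment.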