GitHub user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20424#discussion_r168075909
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -191,7 +192,26 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
daemon = pb.start()
val in = new DataInputStream(daemon.getInputStream)
- daemonPort = in.readInt()
+ try {
+ daemonPort = in.readInt()
+ } catch {
+ case _: EOFException =>
+ throw new IOException(s"No port number in $daemonModule's stdout")
+ }
+
+ // test that the returned port number is within a valid range.
+ // note: this does not cover the case where the port number
+ // is arbitrary data but is also coincidentally within range
+ if (daemonPort < 1 || daemonPort > 0xffff) {
+ val exceptionMessage = f"""
+ |Bad data in $daemonModule's standard output.
+ |Expected valid port number, got 0x$daemonPort%08x.
+ |PYTHONPATH set to '$pythonPath'
+ |Python command is '${command.asScala.mkString(" ")}'
+ |One possibility is a sitecustomize.py module in your python installation
+ |that is printing to stdout"""
--- End diff --
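For illustration, here is a self-contained sketch of the port handshake checked in the hunk above. It assumes a plain `InputStream` can stand in for the daemon's stdout. `PortCheckSketch`, `readDaemonPort`, and `portBytes` are hypothetical names, not Spark APIs. The sketch only mirrors two things: the EOF handling and the 1 to 0xffff range check.

```scala
import java.io.{ByteArrayInputStream, DataInputStream, EOFException, IOException, InputStream}

object PortCheckSketch {
  // Hypothetical helper mirroring the hunk: read a 4-byte big-endian int and
  // reject values outside the valid TCP port range 1..65535 (0xffff).
  def readDaemonPort(stream: InputStream, daemonModule: String): Int = {
    val in = new DataInputStream(stream)
    val port =
      try {
        in.readInt()
      } catch {
        // Fewer than four bytes on stdout: no port number at all.
        case _: EOFException =>
          throw new IOException(s"No port number in $daemonModule's stdout")
      }
    // Like the original check, this cannot detect garbage that happens to be in range.
    if (port < 1 || port > 0xffff) {
      throw new IOException(f"Bad data in $daemonModule's standard output: 0x$port%08x")
    }
    port
  }

  // Hypothetical helper: encode an Int as the 4 big-endian bytes a well-behaved daemon would write.
  def portBytes(port: Int): Array[Byte] =
    Array((port >>> 24).toByte, (port >>> 16).toByte, (port >>> 8).toByte, port.toByte)
}
```

As the comment in the diff notes, four bytes of stray output that happen to decode to a value between 1 and 65535 would still pass this check.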
Shall we keep the format the same as here?
https://github.com/apache/spark/blob/b63abee881f2b4379f375500d51fdef706d6d512/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala#L232-L235
The current message looks like this:
```
...
Caused by: java.io.IOException:
Bad data in pyspark.daemon's standard output.
Expected valid port number, got 0x4920616d.
PYTHONPATH set to '/.../spark/python/lib/pyspark.zip:/.../spark/python/lib/py4j-0.10.6-src.zip:/.../spark/assembly/target/scala-2.11/jars/spark-core_2.11-2.4.0-SNAPSHOT.jar:/.../spark/python/lib/py4j-0.10.6-src.zip:/.../spark/python/:'
Python command is 'python -m pyspark.daemon'
Check if you have a sitecustomize.py module in your python installation.
...
```
While verifying this PR, I put together a suggestion. With it, the message looks like this:
```
...
Error from bad data in pyspark.daemon's standard output. Invalid port number:
1633771786 (0x6161610a)
Python command to execute the daemon was:
python -m pyspark.daemon
One possibility is a sitecustomize module printing some data to the standard output.
This module 'sitecustomize.py' can be located in your Python path:
/.../spark/python/lib/pyspark.zip:/.../spark/python/lib/py4j-0.10.6-src.zip:/.../spark/assembly/target/scala-2.11/jars/spark-core_2.11-2.4.0-SNAPSHOT.jar:/.../spark/python/lib/py4j-0.10.6-src.zip:/.../spark/python/:
...
```
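Both hex values in the messages above are simply the first four bytes of stray stdout text, read big-endian by `DataInputStream.readInt`. For example, 0x49 0x20 0x61 0x6d is the ASCII text `I am`. Below is a small sketch, assuming ASCII output. `StdoutAsPort` and `firstIntOf` are hypothetical names used only for illustration.

```scala
import java.io.{ByteArrayInputStream, DataInputStream}
import java.nio.charset.StandardCharsets

object StdoutAsPort {
  // Interpret the first four bytes of some text the way the daemon handshake does:
  // as a big-endian 32-bit integer read by DataInputStream.readInt.
  def firstIntOf(text: String): Int = {
    val bytes = text.getBytes(StandardCharsets.US_ASCII)
    new DataInputStream(new ByteArrayInputStream(bytes)).readInt()
  }
}

val fromIAm = StdoutAsPort.firstIntOf("I am")   // 0x4920616d
val fromAaa = StdoutAsPort.firstIntOf("aaa\n")  // 0x6161610a = 1633771786
```

So anything a sitecustomize module prints before the daemon writes its port gets decoded as a number. Printable ASCII in the first byte already pushes the value far above 0xffff.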
Here is the diff I used. It also includes some (admittedly excessive) nitpicks.
```diff
diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala b/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala
index 5790c050a7f..b44aa6064bb 100644
--- a/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala
+++ b/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala
@@ -196,20 +196,24 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
daemonPort = in.readInt()
} catch {
case _: EOFException =>
- throw new IOException(s"No port number in $daemonModule's stdout")
+ throw new SparkException(s"No port number in $daemonModule's standard output.")
}
- // test that the returned port number is within a valid range.
- // note: this does not cover the case where the port number
- // is arbitrary data but is also coincidentally within range
+ // Check if the returned port number is within a valid range.
+ // Note: this does not cover the case where the port number is arbitrary data but is
+ // also coincidentally within range.
if (daemonPort < 1 || daemonPort > 0xffff) {
val exceptionMessage = f"""
- |Bad data in $daemonModule's standard output.
- |Expected valid port number, got 0x$daemonPort%08x.
- |PYTHONPATH set to '$pythonPath'
- |Python command is '${command.asScala.mkString(" ")}'
- |Check if you have a sitecustomize.py module in your python installation."""
- throw new IOException(exceptionMessage.stripMargin)
+ |Error from bad data in $daemonModule's standard output. Invalid port number:
+ | $daemonPort (0x$daemonPort%08x)
+ |Python command to execute the daemon was:
+ | ${command.asScala.mkString(" ")}
+ |
+ |One possibility is a sitecustomize module printing some data to the standard output.
+ |This module 'sitecustomize.py' can be located in your Python path:
+ | $pythonPath
+ |"""
+ throw new SparkException(exceptionMessage.stripMargin)
}
// Redirect daemon stdout and stderr
```
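To sanity-check the proposed layout outside Spark, the `f"""..."""` plus `stripMargin` pattern from the diff can be exercised on its own. `DaemonErrorMessage.badPortMessage` is a hypothetical helper. Its parameters stand in for `daemonModule`, `daemonPort`, `command`, and `pythonPath` in `PythonWorkerFactory`. Here `command` is taken as a Scala `Seq` instead of the Java list that the diff converts with `asScala`.

```scala
object DaemonErrorMessage {
  // Hypothetical helper reproducing the message layout proposed in the diff.
  def badPortMessage(
      daemonModule: String,
      daemonPort: Int,
      command: Seq[String],
      pythonPath: String): String = {
    // The f-interpolator renders the port both in decimal and as zero-padded hex;
    // stripMargin then removes the leading whitespace and '|' on every line.
    val msg = f"""
      |Error from bad data in $daemonModule's standard output. Invalid port number:
      |  $daemonPort (0x$daemonPort%08x)
      |Python command to execute the daemon was:
      |  ${command.mkString(" ")}
      |
      |One possibility is a sitecustomize module printing some data to the standard output.
      |This module 'sitecustomize.py' can be located in your Python path:
      |  $pythonPath
      |"""
    msg.stripMargin
  }
}
```

Keeping the decimal and hex values side by side makes it easy to decode the hex bytes back into the text that was printed.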