Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20424#discussion_r168075909
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
    @@ -191,7 +192,26 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
             daemon = pb.start()
     
             val in = new DataInputStream(daemon.getInputStream)
    -        daemonPort = in.readInt()
    +        try {
    +          daemonPort = in.readInt()
    +        } catch {
    +          case _: EOFException =>
    +            throw new IOException(s"No port number in $daemonModule's stdout")
    +        }
    +
    +        // test that the returned port number is within a valid range.
    +        // note: this does not cover the case where the port number
    +        // is arbitrary data but is also coincidentally within range
    +        if (daemonPort < 1 || daemonPort > 0xffff) {
    +          val exceptionMessage = f"""
    +               |Bad data in  $daemonModule's standard output.
    +               |Expected valid port number, got 0x$daemonPort%08x.
    +               |PYTHONPATH set to '$pythonPath'
    +               |Python command is '${command.asScala.mkString(" ")}'
    +               |One possibility is a sitecustomize.py module in your python installation
    +               |that is printing to stdout"""
    --- End diff --
    
    Shall we keep the format the same as the message here?
    
    https://github.com/apache/spark/blob/b63abee881f2b4379f375500d51fdef706d6d512/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala#L232-L235
    
    The current message looks like this:
    
    ```
    ...
    Caused by: java.io.IOException:
    Bad data in  pyspark.daemon's standard output.
    Expected valid port number, got 0x4920616d.
    PYTHONPATH set to '/.../spark/python/lib/pyspark.zip:/.../spark/python/lib/py4j-0.10.6-src.zip:/.../spark/assembly/target/scala-2.11/jars/spark-core_2.11-2.4.0-SNAPSHOT.jar:/.../spark/python/lib/py4j-0.10.6-src.zip:/.../spark/python/:'
    Python command is 'python -m pyspark.daemon'
    Check if you have a sitecustomize.py module in your python installation.
    ...
    ```
    
    While verifying this PR, I put together a suggested format:
    
    
    ```
    ...
    Error from bad data in pyspark.daemon's standard output. Invalid port number:
      1633771786 (0x6161610a)
    Python command to execute the daemon was:
      python -m pyspark.daemon
    
    One possibility is a sitecustomize module printing some data to the standard output.
    This module 'sitecustomize.py' can be located in your Python path:
      /.../spark/python/lib/pyspark.zip:/.../spark/python/lib/py4j-0.10.6-src.zip:/.../spark/assembly/target/scala-2.11/jars/spark-core_2.11-2.4.0-SNAPSHOT.jar:/.../spark/python/lib/py4j-0.10.6-src.zip:/.../spark/python/:
    ...
    ```
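    
    For context on why the hex form is worth printing: `in.readInt()` packs the first four bytes of the daemon's stdout into a big-endian Int, so the "port" can be decoded back into the text that was printed. A minimal sketch, assuming plain ASCII output; the object and helper names are made up for illustration and are not part of the PR:
    
    ```scala
    import java.io.{ByteArrayInputStream, DataInputStream}
    import java.nio.ByteBuffer
    import java.nio.charset.StandardCharsets
    
    // Hypothetical helper object, for illustration only.
    object PortBytes {
      // Read the first four bytes as a big-endian Int, as DataInputStream.readInt() does.
      def readPort(stdout: Array[Byte]): Int =
        new DataInputStream(new ByteArrayInputStream(stdout)).readInt()
    
      // Turn the Int back into the four ASCII characters it was built from.
      def asText(port: Int): String =
        new String(ByteBuffer.allocate(4).putInt(port).array(), StandardCharsets.US_ASCII)
    
      def main(args: Array[String]): Unit = {
        val port = readPort("aaa\n".getBytes(StandardCharsets.US_ASCII))
        println(f"$port (0x$port%08x)") // 1633771786 (0x6161610a)
        println(asText(0x4920616d))     // I am
      }
    }
    ```
    
    So 0x6161610a is "aaa" followed by a newline, and 0x4920616d is the start of a line beginning with "I am".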
    
    Here is the diff I used. It also includes a few minor nitpicks.
    
    ```diff
    diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala b/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala
    index 5790c050a7f..b44aa6064bb 100644
    --- a/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala
    +++ b/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala
    @@ -196,20 +196,24 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
               daemonPort = in.readInt()
             } catch {
               case _: EOFException =>
    -            throw new IOException(s"No port number in $daemonModule's stdout")
    +            throw new SparkException(s"No port number in $daemonModule's standard output.")
             }
    
    -        // test that the returned port number is within a valid range.
    -        // note: this does not cover the case where the port number
    -        // is arbitrary data but is also coincidentally within range
    +        // Check if the returned port number is within a valid range.
    +        // Note: this does not cover the case where the port number is arbitrary data but is
    +        // also coincidentally within range.
             if (daemonPort < 1 || daemonPort > 0xffff) {
               val exceptionMessage = f"""
    -               |Bad data in  $daemonModule's standard output.
    -               |Expected valid port number, got 0x$daemonPort%08x.
    -               |PYTHONPATH set to '$pythonPath'
    -               |Python command is '${command.asScala.mkString(" ")}'
    -               |Check if you have a sitecustomize.py module in your python installation."""
    -          throw new IOException(exceptionMessage.stripMargin)
    +             |Error from bad data in $daemonModule's standard output. Invalid port number:
    +             |  $daemonPort (0x$daemonPort%08x)
    +             |Python command to execute the daemon was:
    +             |  ${command.asScala.mkString(" ")}
    +             |
    +             |One possibility is a sitecustomize module printing some data to the standard output.
    +             |This module 'sitecustomize.py' can be located in your Python path:
    +             |  $pythonPath
    +             |"""
    +          throw new SparkException(exceptionMessage.stripMargin)
             }
    
             // Redirect daemon stdout and stderr
    ```
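    
    To illustrate the caveat in the code comment (the range check cannot catch stray data that happens to fall within range), here is a small standalone sketch. `PortCheck`, `inRange` and `pack` are hypothetical names for illustration and do not exist in Spark:
    
    ```scala
    import java.nio.ByteBuffer
    
    // Hypothetical standalone version of the range test, for illustration only.
    object PortCheck {
      // Same range test as in the diff: a valid port is in 1..65535.
      def inRange(port: Int): Boolean = port >= 1 && port <= 0xffff
    
      // Pack four bytes big-endian, as DataInputStream.readInt() would.
      def pack(bytes: Array[Byte]): Int = ByteBuffer.wrap(bytes).getInt
    
      def main(args: Array[String]): Unit = {
        // Printable text has non-zero high bytes, so it lands far outside the range.
        val text = pack("aaa\n".getBytes("US-ASCII"))
        // Made-up stray bytes whose two high bytes happen to be zero.
        val stray = pack(Array[Byte](0, 0, 0x61, 0x0a))
        println(inRange(text))  // false
        println(inRange(stray)) // true, although 24842 did not come from the daemon
      }
    }
    ```
    
    In practice, text printed by something like a sitecustomize module will almost always fail the check, since ASCII characters make the two high bytes non-zero.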

