Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20424#discussion_r167424753
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
    @@ -191,7 +192,25 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
             daemon = pb.start()
     
             val in = new DataInputStream(daemon.getInputStream)
    -        daemonPort = in.readInt()
    +        try {
    +          daemonPort = in.readInt()
    +        } catch {
    +          case _: EOFException =>
    +            throw new IOException(s"No port number in $daemonModule's stdout")
    +        }
    +
    +        // test that the returned port number is within a valid range.
    +        // note: this does not cover the case where the port number
    +        // is arbitrary data but is also coincidentally within range
    +        if (daemonPort < 1 || daemonPort > 0xffff) {
    +          val exceptionMessage = f"""
    +               |Bad data in  $daemonModule's standard output.
    +               |Expected valid port number, got 0x$daemonPort%08x.
    +               |PYTHONPATH set to '$pythonPath'
    +               |Python command is '${command.asScala.mkString(" ")}'
    +               |Check if you have a sitecustomize.py module in your python installation."""
    --- End diff --
    
    This reads as if there is only one possible cause, `sitecustomize`. I believe there are many possibilities, for example a custom Python executable, `usercustomize`, etc. Can we reword the message to imply that `sitecustomize` is just one potential cause?
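
As a rough illustration of the suggestion, the check and a more general message could look something like the sketch below. `DaemonPortCheck` and `readDaemonPort` are hypothetical names invented for this example and are not part of `PythonWorkerFactory`; the exact wording of the message is only an assumption about what the PR might settle on.

```scala
import java.io.{ByteArrayInputStream, DataInputStream, EOFException, IOException}

object DaemonPortCheck {
  // Hypothetical helper (not in Spark): reads the 4-byte port written by the
  // daemon and fails with a message that names several possible causes.
  def readDaemonPort(in: DataInputStream, daemonModule: String): Int = {
    val port = try {
      in.readInt()
    } catch {
      case _: EOFException =>
        throw new IOException(s"No port number in $daemonModule's stdout")
    }
    // Same range test as in the diff: arbitrary data that happens to fall
    // within 1..0xffff is still not detected.
    if (port < 1 || port > 0xffff) {
      val message =
        f"""Bad data in $daemonModule's standard output.
           |Expected valid port number, got 0x$port%08x.
           |Something may have written to stdout before the daemon did, for example
           |a sitecustomize.py or usercustomize.py module, or a custom Python executable.""".stripMargin
      throw new IOException(message)
    }
    port
  }
}
```

With this sketch, a stream that starts with the ASCII bytes `hell` is rejected and reported as `0x68656c6c`, while the bytes `00 00 30 39` are accepted as port 12345.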

