Github user bersprockets commented on the issue:
https://github.com/apache/spark/pull/20519
> yea but we can't simply flush and ignore the stdout specifically from
> sitecustomize unless we define a kind of an additional protocol like this
> because we can't simply distinguish if the output
We might be able to distinguish between sitecustomize.py output and
daemon.py output. Assuming the code in sitecustomize.py is not
multi-threaded, we can assume all output from sitecustomize.py comes *before*
any output from daemon.py. Therefore, if daemon.py first prints a "magic
number" or some other string that is unlikely to show up in sitecustomize.py
output, PythonWorkerFactory.startDaemon() will know where daemon.py output
starts. daemon.py would print the port number only after printing this magic
value. For example:
<pre>
<junk from sitecustomize.py>daemon port: ^@^@\325
</pre>
Once the Scala code sees "daemon port: " in the launched process's stdout,
it knows the next 4 bytes are the port number.
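To make the idea concrete, here is a rough sketch of both ends of that handshake, written in Python only for illustration (the real reader would live on the Scala side). The names `MARKER`, `write_port`, and `read_port` are hypothetical and are not taken from Spark. The sketch also assumes the port is sent as a big-endian 32-bit integer, which is the layout Java's `DataInputStream.readInt` expects.

```python
import io
import struct

# Hypothetical marker; the comment above uses "daemon port: " as its example.
MARKER = b"daemon port: "


def write_port(stream, port):
    """Daemon side: emit the marker, then the port as a 4-byte big-endian int."""
    stream.write(MARKER + struct.pack("!i", port))
    stream.flush()


def read_port(stream, marker=MARKER):
    """Launcher side: discard bytes until the marker is seen, then read the port."""
    window = b""
    while not window.endswith(marker):
        byte = stream.read(1)
        if not byte:
            raise EOFError("stream ended before the marker was found")
        # Keep only the last len(marker) bytes seen so far.
        window = (window + byte)[-len(marker):]
    raw = stream.read(4)
    if len(raw) != 4:
        raise EOFError("stream ended before the 4-byte port was read")
    return struct.unpack("!i", raw)[0]


# Simulate a process whose sitecustomize.py wrote to stdout before the daemon did.
out = io.BytesIO()
out.write(b"hello from sitecustomize.py\n")
write_port(out, 54321)
out.seek(0)
print(read_port(out))  # prints 54321
```

This only works under the assumptions already stated: the marker must not appear in the sitecustomize.py output, and that output must not be interleaved with the daemon's. A reader on a real pipe would also want a buffered stream so that the 4-byte read is not cut short.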
However, if sitecustomize.py starts multi-threaded code (and if that's even
possible, that's a corner-corner-corner case), its output could potentially be
interleaved with the daemon's output. Also, I am not sure sitecustomize.py
output is guaranteed to show up first in stdout, but it seems reasonable that
it would.