GitHub user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/2129#discussion_r16758263
--- Diff: python/pyspark/java_gateway.py ---
@@ -69,6 +70,22 @@ def preexec_func():
error_msg += "--------------------------------------------------------------\n"
raise Exception(error_msg)
+ # In Windows, ensure the Java child processes do not linger after Python has exited.
+ # In UNIX-based systems, the child process can kill itself on broken pipe (i.e. when
+ # the parent process' stdin sends an EOF). In Windows, however, this is not possible
+ # because java.lang.Process reads directly from the parent process' stdin, contending
+ # with any opportunity to read an EOF from the parent. Note that this is only best
+ # effort and will not take effect if the python process is violently terminated.
+ if on_windows:
+ # In Windows, the child process here is "spark-submit.cmd", not the JVM itself
+ # (because the UNIX "exec" command is not available). This means we cannot simply
+ # call proc.kill(), which kills only the "spark-submit.cmd" process but not the
+ # JVMs. Instead, we use "taskkill" with the tree-kill option "/t" to terminate all
+ # child processes in the tree.
+ def killChild():
+ Popen(["cmd", "/c", "taskkill", "/f", "/t", "/pid", str(proc.pid)])
--- End diff --
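For context on the UNIX path the diff comment describes: there the child can notice that its parent is gone because reading its stdin returns EOF once the parent's end of the pipe is closed (which the OS does automatically when the parent exits). A minimal sketch of that pattern, assuming a tiny Python child stands in for the JVM; `CHILD_SOURCE`, `spawn_eof_watching_child` and `stop_via_eof` are hypothetical names, not Spark code:

```python
import subprocess
import sys

# Hypothetical stand-in for the JVM child: block reading stdin and exit as
# soon as read() returns, i.e. when the parent's end of the pipe is closed.
CHILD_SOURCE = "import sys\nsys.stdin.read()\n"


def spawn_eof_watching_child():
    """Start a child whose stdin is a pipe owned by this (parent) process."""
    return subprocess.Popen([sys.executable, "-c", CHILD_SOURCE], stdin=subprocess.PIPE)


def stop_via_eof(proc, timeout=10.0):
    """Close the parent's end of the pipe and wait for the child to exit on its own."""
    proc.stdin.close()  # the child's stdin.read() now sees EOF and returns
    return proc.wait(timeout=timeout)
```

Closing `proc.stdin` by hand mimics what happens when the parent exits. On Windows, per the comment above, the JVM launched via `spark-submit.cmd` cannot rely on this, hence the explicit kill.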
Sure. If we are going to link it, I'd rather provide a more official source, such as
http://technet.microsoft.com/en-us/library/bb491009.aspx
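A rough sketch of the Windows cleanup the diff sets up, with the command construction pulled out so it can be checked on any OS. Registering the handler through Python's `atexit` module is an assumption here (the quoted diff stops at the `Popen` call), and `taskkill_tree_command` / `register_tree_kill` are hypothetical helper names. Like the original, it is best effort: `atexit` handlers do not run if Python is killed outright.

```python
import atexit
import subprocess
import sys


def taskkill_tree_command(pid):
    """Windows command that force-kills `pid` and every process it spawned."""
    # /f forces termination; /t ("tree kill") also terminates the child
    # processes of /pid, e.g. the JVM started by spark-submit.cmd.
    return ["cmd", "/c", "taskkill", "/f", "/t", "/pid", str(pid)]


def register_tree_kill(proc, on_windows=None):
    """Best effort: on Windows, kill proc's whole process tree at interpreter exit.

    Returns the registered handler, or None when nothing was registered.
    """
    if on_windows is None:
        on_windows = sys.platform == "win32"
    if not on_windows:
        # On UNIX the child is expected to exit by itself on EOF, so do nothing.
        return None

    def kill_child():
        subprocess.Popen(taskkill_tree_command(proc.pid))

    # Assumption: the handler is hooked into interpreter shutdown via atexit.
    atexit.register(kill_child)
    return kill_child
```

Calling `proc.kill()` instead would only terminate `spark-submit.cmd`, which is exactly why the tree-kill flag is used.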