Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1643#issuecomment-51005638
Thanks for updating this! In my earlier review, I had overlooked that the
kill involved forking the JVM; your new approach of having the daemon kill the
workers is much better.
The test case looks good, too (clever use of Python's `for ... else`
construct; I hadn't seen that before).
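For anyone who hasn't seen it: `for ... else` runs the `else` block only when
the loop finishes without hitting `break`. A minimal, self-contained sketch
(not the actual test code; `pid_is_alive` and `wait_for_worker_death` are
made-up names here) of how it can express "did every worker die?":

```python
import os
import time

def pid_is_alive(pid):
    """Signal 0 delivers nothing; it only checks whether the pid still exists."""
    try:
        os.kill(pid, 0)
        return True
    except OSError:
        return False

def wait_for_worker_death(pids, timeout=10.0):
    """Poll until every pid has exited, or give up after `timeout` seconds."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        for pid in pids:
            if pid_is_alive(pid):
                break      # at least one worker is still running; keep polling
        else:
            return True    # the for-loop ran to completion: no live workers left
        time.sleep(0.1)
    return False
```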
In https://github.com/apache/spark/pull/1680, there was some discussion
about whether to use SIGKILL or SIGHUP to kill the Python workers. Now that
I've had more time to think about it, I think SIGKILL is a fine approach:
- Spark doesn't provide any documented, user-facing mechanisms for allowing
tasks to perform cleanup work when they're cancelled.
- The only case where a cleanup mechanism might make sense is when tasks
perform side effects, such as writing to an external database. But a machine
could lose power or otherwise fail without ever running that cleanup code, so
users already have to guard against the effects of abrupt failures (e.g. by
using transactions).
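To make the SIGKILL argument concrete, here's a minimal sketch (hypothetical
names, not the actual daemon code) of what force-killing workers from the
daemon side might look like. Because SIGKILL cannot be caught, no worker-side
cleanup runs, which is consistent with the points above:

```python
import os
import signal

def kill_workers(worker_pids):
    """Forcefully terminate worker processes and reap them so no zombies linger."""
    for pid in worker_pids:
        try:
            os.kill(pid, signal.SIGKILL)  # cannot be caught or ignored by the worker
        except OSError:
            pass                          # worker already exited
    for pid in worker_pids:
        try:
            os.waitpid(pid, 0)            # reap the child (only works for our own children)
        except OSError:
            pass
```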
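And on the transactions point, a toy sketch (sqlite3 purely for illustration;
the table and column names are invented) of the kind of atomic write that
already protects against a worker dying mid-output, whether from SIGKILL or a
power failure:

```python
import sqlite3

def write_partition(rows, db_path="output.db"):
    """Write a partition's rows atomically: either all rows land or none do."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS results (k TEXT, v TEXT)")
        with conn:  # opens a transaction; commits on clean exit, rolls back otherwise
            conn.executemany("INSERT INTO results VALUES (?, ?)", rows)
    finally:
        conn.close()
```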