GitHub user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2259#issuecomment-54731896
Do you think worker re-use should be enabled by default?
The only problem that I anticipate is for applications that share a single
SparkContext between Python and Scala workloads; in these cases, the re-used
Python workers may continue to hold on to resources (memory that isn't used for
caching RDDs) even after their tasks complete. This seems like a rare use case,
though, so we could document this change and advise those users to disable this setting.
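As an illustration of what that documentation could point users to, here is a
minimal sketch of opting out in application code. It assumes the configuration
key introduced by this PR is "spark.python.worker.reuse" and uses a local
master and placeholder app name for the example; adjust if the final key name
or deployment differs.

    from pyspark import SparkConf, SparkContext

    # Sketch: an application that mixes Python and JVM work on one
    # SparkContext could turn Python worker re-use off explicitly.
    # "spark.python.worker.reuse" is the key assumed from this PR.
    conf = (SparkConf()
            .setMaster("local[2]")
            .setAppName("mixed-python-scala-app")
            .set("spark.python.worker.reuse", "false"))

    sc = SparkContext(conf=conf)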
I'm inclined to have it on by default, since it will be a huge performance
win for the vast majority of PySpark users.