GitHub user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2259#issuecomment-54731896
Do you think worker re-use should be enabled by default?
The only problem that I anticipate is for applications that share a single
SparkContext between Python and Scala workloads; in these cases, the re-used
Python workers may continue to hold on to resources (memory that isn't used for
caching RDDs) even after their tasks complete. This seems like a rare use case,
though, so we could document this change and advise those users to disable this setting.
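As an illustration of what that documentation could point users to, here is a
minimal sketch of opting out in application code. It assumes the configuration
key introduced by this PR is "spark.python.worker.reuse" and uses a local
master and placeholder app name for the example; adjust if the final key name
or deployment differs.

    from pyspark import SparkConf, SparkContext

    # Sketch: an application that mixes Python and JVM work on one
    # SparkContext could turn Python worker re-use off explicitly.
    # "spark.python.worker.reuse" is the key assumed from this PR.
    conf = (SparkConf()
            .setMaster("local[2]")
            .setAppName("mixed-python-scala-app")
            .set("spark.python.worker.reuse", "false"))

    sc = SparkContext(conf=conf)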
I'm inclined to have it on by default, since it will be a huge performance
win for the vast majority of PySpark users.