Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1977#issuecomment-91011683
I spent most of the morning looking this over again, and the patch looks
pretty good to me. I think I understand the lifecycle of values well.
I left a couple of questions and comments upthread, but overall I think this
is close to being ready to merge.
One quick question: when do groupBy's spill files get cleaned up? It seems
like we want them to be cleaned up at the end of the task, since at that point
we know for sure that they won't be re-used. To handle this case, what do you
think about adding a cleanup hook mechanism that lets us register cleanup
code to run at the end of a task? In the past, we could have relied on
shutdown hooks for this, but that's no longer possible due to worker re-use:
a re-used worker process does not exit between tasks, so its shutdown hooks
would not fire when a task finishes.
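
To make the suggestion concrete, here is a rough sketch of what such a hook mechanism could look like. Everything in it is hypothetical: `TaskCleanupHooks`, `run_task`, and `spilling_task` are illustrative names, not existing Spark or PySpark APIs, and the real worker loop would look different.

```python
import os
import tempfile


class TaskCleanupHooks:
    """Hypothetical per-task registry of cleanup callbacks (not a Spark API)."""

    def __init__(self):
        self._hooks = []

    def register(self, func, *args):
        # Remember a callable and its arguments to invoke when the task ends.
        self._hooks.append((func, args))

    def run_all(self):
        # Run hooks in reverse registration order. A failing hook must not
        # prevent the remaining hooks from running, so errors are collected.
        errors = []
        while self._hooks:
            func, args = self._hooks.pop()
            try:
                func(*args)
            except Exception as exc:
                errors.append(exc)
        return errors


def run_task(task, hooks):
    # Hypothetical stand-in for the worker's per-task loop: the hooks run in
    # a finally block, so they fire even if the task raises and even though
    # the worker process itself stays alive for the next task.
    try:
        return task(hooks)
    finally:
        hooks.run_all()


def spilling_task(hooks):
    # Illustrative task that creates a spill file and registers its removal.
    fd, path = tempfile.mkstemp(suffix=".spill")
    os.close(fd)
    hooks.register(os.remove, path)
    return path
```

With something like this, the code that creates a spill file would register its removal right away, and the worker would call `run_all()` once per task rather than once per process.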