Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/1977#issuecomment-91011683
  
    I spent most of the morning looking this over again, and the patch looks 
good to me.  I think I understand the lifecycle of values fairly well.  
I left a couple of questions / comments upthread, but overall I think this is 
close to being ready to merge.
    
    One quick question: when do groupBy's spill files get cleaned up?  It seems 
like we want them to be cleaned at the end of the task, since at that point we 
know for sure that they won't be re-used.  To handle this case, what do you 
think about adding a cleanup hook mechanism that allows us to register cleanup 
code to run at the end of a task?  In the past, we could have relied on 
shutdown hooks for this, but that's no longer possible due to worker re-use.
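
    To make the idea concrete, here is a rough sketch of what such a hook 
mechanism might look like.  Every name in it (`TaskCleanupHooks`, `register`, 
`run_all`, `remove_if_exists`, `run_task`) is hypothetical and not part of 
the patch or of PySpark's API; it only illustrates running registered 
callbacks at task end rather than at process exit.

```python
import os
import tempfile


class TaskCleanupHooks(object):
    """Hypothetical registry of callbacks to run when a task finishes."""

    def __init__(self):
        self._hooks = []

    def register(self, func, *args):
        # Remember a callback and its arguments for the end of the task.
        self._hooks.append((func, args))

    def run_all(self):
        # Swap the list out first so a re-used worker starts the next
        # task with no stale hooks, even if a callback fails.
        hooks, self._hooks = self._hooks, []
        errors = []
        # Run in reverse registration order, like nested finally blocks.
        for func, args in reversed(hooks):
            try:
                func(*args)
            except Exception as e:
                # Collect the error and keep going so that one failing
                # hook does not prevent the remaining cleanup.
                errors.append(e)
        return errors


def remove_if_exists(path):
    # Example cleanup action: delete a spill file if it is still there.
    if os.path.exists(path):
        os.remove(path)


def run_task(hooks, task):
    # Assumed task entry point: the worker loop would wrap each task
    # like this so cleanup runs whether the task succeeds or raises.
    try:
        return task(hooks)
    finally:
        hooks.run_all()
```

    With something like this, the spilling code would call 
`hooks.register(remove_if_exists, path)` whenever it creates a spill file, 
and the worker would run the hooks once per task, so cleanup would not 
depend on the process exiting.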

