Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/607#issuecomment-41912162
  
    I talked to Daryn a bit about the FileSystem.closeAllForUGI and it is fine 
to do here if we want to take that route.  He also agreed that moving it up 
makes more sense. So unless there is a reason not to move it up (like tasks are 
running as different users) I would prefer to do that.  Note we also need to 
move the one in ApplicationMaster up if that hasn't been done yet since I 
believe Patrick found that issue with localization.
    
    I do understand your concerns that if leave it as it is now and do the 
FileSystem.closeAllForUGI after the tasks run, if it does get shared between 
tasks and we close it then we close it for both tasks.  The only thing I can 
say is don't do it... Its really no different then if we move the doas up, we 
have to make sure the task code doesn't close the filesystem and cause issues 
for the entire executor.  Either way you have to be careful.  if we move it up 
we can actually share it and seems a bit less prone to error.
    
    Also in response to @aarondav first concern, I don't really follow. You 
aren't getting any benefits from the cache now. Its specific to that task so no 
other tasks can use that cached version of the filesystem so it is causing 
extra overhead.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to