Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/607#issuecomment-41912162
I talked to Daryn a bit about the FileSystem.closeAllForUGI and it is fine
to do here if we want to take that route. He also agreed that moving it up
makes more sense. So unless there is a reason not to move it up (like tasks are
running as different users) I would prefer to do that. Note we also need to
move the one in ApplicationMaster up if that hasn't been done yet since I
believe Patrick found that issue with localization.
I do understand your concerns that if leave it as it is now and do the
FileSystem.closeAllForUGI after the tasks run, if it does get shared between
tasks and we close it then we close it for both tasks. The only thing I can
say is don't do it... Its really no different then if we move the doas up, we
have to make sure the task code doesn't close the filesystem and cause issues
for the entire executor. Either way you have to be careful. if we move it up
we can actually share it and seems a bit less prone to error.
Also in response to @aarondav first concern, I don't really follow. You
aren't getting any benefits from the cache now. Its specific to that task so no
other tasks can use that cached version of the filesystem so it is causing
extra overhead.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---