GitHub user tgravescs opened a pull request:
https://github.com/apache/spark/pull/621
[WIP] SPARK-1676: Cache Hadoop UGIs by default to prevent FileSystem leak
Move the doAs in Executor higher up so that we create only one UGI and
no longer leak FileSystem instances.
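
As a rough illustration of why a single UGI matters, here is a toy model. It is not code from this patch and does not use Hadoop. It assumes filesystem clients are cached per user handle and that handles are compared by object identity, so a fresh handle for every task adds a new cache entry each time. `UserHandle`, `FsClient`, and both method names are hypothetical stand-ins.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class UgiCacheSketch {
    /** Hypothetical stand-in for a UGI; looked up by object identity below. */
    static final class UserHandle {
        final String name;
        UserHandle(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for a cached filesystem client. */
    static final class FsClient { }

    /** Toy cache keyed by handle identity, not by user name (an assumption of this sketch). */
    static final Map<UserHandle, FsClient> cache = new IdentityHashMap<>();

    static FsClient getFileSystem(UserHandle user) {
        return cache.computeIfAbsent(user, u -> new FsClient());
    }

    /** A new handle per task: every task adds a cache entry that is never reused. */
    static int perTaskHandles(String userName, int tasks) {
        cache.clear();
        for (int i = 0; i < tasks; i++) {
            getFileSystem(new UserHandle(userName));
        }
        return cache.size();
    }

    /** One handle created higher up and shared by all tasks: a single cache entry. */
    static int sharedHandle(String userName, int tasks) {
        cache.clear();
        UserHandle shared = new UserHandle(userName);
        for (int i = 0; i < tasks; i++) {
            getFileSystem(shared);
        }
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println(perTaskHandles("alice", 100)); // prints 100
        System.out.println(sharedHandle("alice", 100));   // prints 1
    }
}
```

In this model the per-task variant grows the cache with every task, while the shared-handle variant stays at one entry, which is the effect that moving the doAs higher up is meant to achieve.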
Fix Spark on YARN to work when the cluster is running as user "yarn" but
the clients are launched as the submitting user and want to read from and
write to HDFS as that user.
Note that this hasn't been fully tested yet. I still need to test in
standalone mode, verify that it doesn't leak filesystems, and look at the
local mode backend. One specific thing to check in standalone mode: I don't
think SPARK_USER is set when CoarseGrainedExecutorBackend runs the doAs, so
either that will have to change slightly or we need to make sure it's set
when this is called.
I'm putting this up for people to look at and possibly test. I don't have
access to a Mesos cluster.
This is an alternative to https://github.com/apache/spark/pull/607
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tgravescs/spark SPARK-1676
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/621.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #621
----
commit 93988531d7ab4a4e902949744f80121593ba1f52
Author: Thomas Graves
Date: 2014-05-02T20:43:34Z
change to have doAs in executor higher up.
----