GitHub user tgravescs opened a pull request:

    https://github.com/apache/spark/pull/621

    [WIP] SPARK-1676: Cache Hadoop UGIs by default to prevent FileSystem leak

    Move the doAs in Executor higher up so that we create only one UGI and
    don't leak FileSystem instances.
    Fix Spark on YARN so that it works when the cluster runs as user "yarn" but
    the clients are launched as the submitting user and want to read from and
    write to HDFS as that user.
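
To illustrate why a single UGI matters, here is a minimal sketch in plain Java. It does not use Hadoop or Spark; FakeUgi, CacheKey and FakeFsCache are hypothetical stand-ins invented for this example. It assumes, as the title of this change suggests, that cached file systems are keyed by the UGI instance rather than by the user name, so a fresh UGI per task adds a cache entry per task.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class UgiCacheSketch {
    // Hypothetical stand-in for a UGI. It does not override equals/hashCode,
    // so two instances for the same user name are different objects.
    static final class FakeUgi {
        final String user;
        FakeUgi(String user) { this.user = user; }
    }

    // Hypothetical cache key: the file system URI plus the UGI that asked for it.
    // Assumption: the UGI is compared by identity, not by user name.
    static final class CacheKey {
        final String uri;
        final FakeUgi ugi;
        CacheKey(String uri, FakeUgi ugi) { this.uri = uri; this.ugi = ugi; }

        @Override public boolean equals(Object o) {
            if (!(o instanceof CacheKey)) return false;
            CacheKey other = (CacheKey) o;
            return uri.equals(other.uri) && ugi == other.ugi;
        }

        @Override public int hashCode() {
            return Objects.hash(uri, System.identityHashCode(ugi));
        }
    }

    // Hypothetical file system cache that never evicts its entries.
    static final class FakeFsCache {
        private final Map<CacheKey, Object> entries = new HashMap<>();

        void get(String uri, FakeUgi ugi) {
            entries.computeIfAbsent(new CacheKey(uri, ugi), k -> new Object());
        }

        int size() { return entries.size(); }
    }

    // A new UGI is created for every task: one cache entry per task.
    static int entriesWithFreshUgiPerTask(int tasks) {
        FakeFsCache cache = new FakeFsCache();
        for (int i = 0; i < tasks; i++) {
            cache.get("hdfs://namenode", new FakeUgi("alice"));
        }
        return cache.size();
    }

    // One UGI is created up front and shared by all tasks: a single cache entry.
    static int entriesWithSharedUgi(int tasks) {
        FakeFsCache cache = new FakeFsCache();
        FakeUgi shared = new FakeUgi("alice");
        for (int i = 0; i < tasks; i++) {
            cache.get("hdfs://namenode", shared);
        }
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println("fresh UGI per task: " + entriesWithFreshUgiPerTask(5));
        System.out.println("one shared UGI:     " + entriesWithSharedUgi(5));
    }
}
```

In this model the per-task variant grows with the number of tasks while the shared variant stays at one entry, which is the effect that moving the doAs higher up is meant to achieve.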
    
    Note that this hasn't been fully tested yet. I still need to test it in
    standalone mode, verify that it doesn't leak filesystems, and look at the
    local mode backend. One specific thing to check in standalone mode: I don't
    think SPARK_USER is set when CoarseGrainedExecutorBackend runs the doAs, so
    either that will have to change slightly or we need to make sure it's set
    when this is called.
    
    Putting this up for people to look at and possibly test. I don't have
    access to a Mesos cluster.
    
    This is an alternative to https://github.com/apache/spark/pull/607

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tgravescs/spark SPARK-1676

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/621.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #621
    
----
commit 93988531d7ab4a4e902949744f80121593ba1f52
Author: Thomas Graves <[email protected]>
Date:   2014-05-02T20:43:34Z

    change to have doAs in executor higher up.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
