This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new eb6fd7e  [SPARK-26877][YARN] Support user-level app staging directory in yarn mode when spark.yarn…
eb6fd7e is described below

commit eb6fd7eab77d3d5b2e7e827a21b127b146e5c089
Author: Liupengcheng <liupengch...@xiaomi.com>
AuthorDate: Wed Feb 20 11:45:12 2019 -0800

    [SPARK-26877][YARN] Support user-level app staging directory in yarn mode when spark.yarn…
    
    Currently, when running applications in yarn mode, the app staging
    directory is controlled by the `spark.yarn.stagingDir` config if specified.
    This directory is not separated per user, which makes file and quota
    management inconvenient.
    
    Sometimes the number of staging files grows unexpectedly. Two possible
    reasons are:
    1. The `spark.yarn.preserve.staging.files` config can be misused by users.
    2. A cron task constantly starts new applications on a non-existent yarn
       queue (wrong configuration).
    
    Right now, it is not easy to find out which user holds the most HDFS files
    or space. Moreover, it is impossible to set an HDFS name quota or space
    quota per user to limit the growth.
    
    So I propose adding per-user subdirectories under this app staging
    directory, which is clearer.
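
The proposed layout can be sketched as follows. This is a minimal illustration, not the code in Client.scala: plain strings stand in for Hadoop's `Path`, and the names `StagingDirSketch`, `join`, `stagingBaseDir`, and `stagingDirPath`, as well as the example directories, are hypothetical. It assumes the user sub directory is added only when a staging directory is configured, with the home directory fallback left unchanged.

```scala
// Sketch only: plain strings stand in for org.apache.hadoop.fs.Path, and
// every name below is hypothetical rather than taken from Client.scala.
object StagingDirSketch {
  /** Joins a parent and a child path segment with exactly one '/'. */
  def join(parent: String, child: String): String =
    parent.stripSuffix("/") + "/" + child.stripPrefix("/")

  /**
   * Base directory for app staging files.
   * If a staging dir is configured, the short user name is appended to it;
   * otherwise the user's home directory is used unchanged.
   */
  def stagingBaseDir(configured: Option[String], user: String, home: String): String =
    configured.map(join(_, user)).getOrElse(home)

  /** Full staging path: the base dir plus the per-application directory. */
  def stagingDirPath(
      configured: Option[String],
      user: String,
      home: String,
      appDir: String): String =
    join(stagingBaseDir(configured, user, home), appDir)

  def main(args: Array[String]): Unit = {
    // Illustrative values only; the per-application directory name is made up.
    val appDir = "staging/app_0001"
    println(stagingDirPath(Some("/tmp/spark-staging"), "alice", "/user/alice", appDir))
    println(stagingDirPath(None, "alice", "/user/alice", appDir))
  }
}
```

With a configured staging directory, each user's files land under a separate sub directory, so a per-user HDFS quota could be set on that sub directory; without one, the home directory already separates users.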
    
    Tested with existing unit tests.
    
    Closes #23786 from liupc/Support-user-level-app-staging-dir.
    
    Authored-by: Liupengcheng <liupengch...@xiaomi.com>
    Signed-off-by: Marcelo Vanzin <van...@cloudera.com>
---
 .../yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala      | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
index 6ca81fb..e0dba8c 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@@ -177,7 +177,8 @@ private[spark] class Client(
 
       // The app staging dir based on the STAGING_DIR configuration if configured
       // otherwise based on the users home directory.
-      val appStagingBaseDir = sparkConf.get(STAGING_DIR).map { new Path(_) }
+      val appStagingBaseDir = sparkConf.get(STAGING_DIR)
+        .map { new Path(_, UserGroupInformation.getCurrentUser.getShortUserName) }
         .getOrElse(FileSystem.get(hadoopConf).getHomeDirectory())
       stagingDirPath = new Path(appStagingBaseDir, getAppStagingDir(appId))
 


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
