Dear all,

We're running into some permission issues with a simple setup of Spark on YARN.

User A starts the YARN resourcemanager on machine 1 and YARN nodemanager on machine 2

User B starts a Spark application (with spark.master = "yarn") on machine 1

We have already changed some parameters in HADOOP/YARN/SPARK, namely

* in yarn-site.xml:

<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/tmp</value>
</property>

<property>
  <name> </name>
  <value>read,write,execute,delete</value>
</property>

<property>
  <name>yarn.nodemanager.default-container-executor.log-dirs.permissions</name>
  <value>777</value>
</property>

* in core-default.xml:

<property>
  <name>fs.permissions.umask-mode</name>
  <value>000</value>
  <description>
    The umask used when creating files and directories.
    Can be in octal or in symbolic. Examples are:
    "022" (octal for u=rwx,g=r-x,o=r-x in symbolic),
    or "u=rwx,g=rwx,o=" (symbolic for 007 in octal).
  </description>
</property>

* in spark-defaults.conf:

spark.yarn.stagingDir /tmp/spark-yarn-staging-dir
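
As a side note on the umask setting above, here is a minimal Python sketch of how a umask combines with the mode a program requests. It only illustrates standard umask arithmetic and is not taken from Hadoop or Spark code; the helper names effective_mode and to_symbolic are made up for this example. The point is that a umask can only remove bits, so a umask of 000 cannot widen a folder that is created with 700 explicitly (which is what we seem to observe for the per-application folders).

```python
def effective_mode(requested: int, umask: int) -> int:
    """Return the permission bits that remain after applying a umask.

    A umask only clears bits from the requested mode; it never adds any.
    """
    return requested & ~umask & 0o777


def to_symbolic(mode: int) -> str:
    """Render permission bits in the rwxrwxrwx form printed by `ls -l`."""
    out = []
    for shift in (6, 3, 0):  # user, group, others
        bits = (mode >> shift) & 0o7
        out.append(
            ("r" if bits & 4 else "-")
            + ("w" if bits & 2 else "-")
            + ("x" if bits & 1 else "-")
        )
    return "".join(out)


if __name__ == "__main__":
    # Requested 777 with the common umask 022, then with our umask 000.
    print(to_symbolic(effective_mode(0o777, 0o022)))
    print(to_symbolic(effective_mode(0o777, 0o000)))
    # A folder requested with 700 stays 700 even with umask 000.
    print(to_symbolic(effective_mode(0o700, 0o000)))
```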


This in turn makes folders under the YARN log directory (/tmp/userlogs) have permissions 777, while the YARN local directory for the specific user (/tmp/usercache/userA/) has just 750 permissions.

Even more weirdly, when user A starts a Spark application, the current application directory under the YARN local directory folder, for example /tmp/usercache/userA/appcache/application_1613995549456_0001, has the following permissions:

drwx--x---. 34 userA userA 4096 Feb 22 13:31 application_1613995549456_0001


At the same time, the Spark staging directory appears to have 777 permissions on every folder down to /tmp/spark-yarn-staging-dir/userA/.sparkStaging/. But the subfolders below that, the ones that get created for each application, have only 700 permissions!

This stops user B from sending Spark applications to the YARN cluster whatsoever, with errors like

File file:/tmp/spark-yarn-staging-dir/userB/.sparkStaging/application_1613997737582_0001/scala-library-2.12.10.jar does not exist

And we are certain that those jars exist. In addition, we tried changing the permissions of the application folder to 777 on the fly just after the application starts, and that makes the application run fine without any errors. We have tried many parameters and we're stuck right now. Our guess is that it has to do with the fact that yarn.nodemanager.default-container-executor.log-dirs.permissions accepts the classic user/group/all values, whereas yarn.nodemanager.runtime.linux.sandbox-mode.local-dirs.permissions only accepts a comma-separated list of values, a syntax that doesn't allow extending the permissions to group/all.
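
To make the permission check reproducible, here is a rough Python sketch of the kind of script one could use to find the first folder under a base directory that other users cannot enter. The function name first_blocked_dir is hypothetical, and the check rests on an assumption: that the user reading the files is neither the owner nor in the owning group, so only the "others" execute bit matters. It only looks at directories, not at the read bit of the jar itself.

```python
import os
import stat
from pathlib import Path
from typing import Optional


def first_blocked_dir(base: str, path: str) -> Optional[str]:
    """Return the first directory between `base` and `path` that lacks the
    'others' execute bit, or None if every directory can be traversed.

    `path` must be located under `base`; `base` itself is not checked.
    A component that cannot be stat'ed at all is also reported as blocked.
    """
    base_path = Path(base)
    current = base_path
    for part in Path(path).relative_to(base_path).parts:
        current = current / part
        try:
            st = os.stat(current)
        except OSError:
            # Missing component, or no permission to look at it.
            return str(current)
        if stat.S_ISDIR(st.st_mode) and not st.st_mode & stat.S_IXOTH:
            return str(current)
    return None


if __name__ == "__main__":
    # Paths taken from the error message above.
    staging = "/tmp/spark-yarn-staging-dir"
    jar = (
        staging
        + "/userB/.sparkStaging/application_1613997737582_0001"
        + "/scala-library-2.12.10.jar"
    )
    print(first_blocked_dir(staging, jar))
```

Run as the user that fails to read the jar, a script like this should show whether the application folder with 700 permissions is the component that makes the file look as if it does not exist.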

We hope somebody will be able to help us out, thanks in advance.

Cheers,

Vincenzo

