Dear all,
We're running into some permission issues with a simple setup of Spark
on YARN.
User A starts the YARN ResourceManager on machine 1 and the YARN
NodeManager on machine 2.
User B starts a Spark application (with spark.master = "yarn") on machine 1.
We have already changed some parameters in HADOOP/YARN/SPARK, namely
* in yarn-site.xml:
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/tmp</value>
</property>
<property>
<name>yarn.nodemanager.runtime.linux.sandbox-mode.local-dirs.permissions</name>
<value>read,write,execute,delete</value>
</property>
<property>
<name>yarn.nodemanager.default-container-executor.log-dirs.permissions</name>
<value>777</value>
</property>
* in core-default.xml:
<property>
<name>fs.permissions.umask-mode</name>
<value>000</value>
<description>
The umask used when creating files and directories.
Can be in octal or in symbolic. Examples are:
"022" (octal for u=rwx,g=r-x,o=r-x in symbolic),
or "u=rwx,g=rwx,o=" (symbolic for 007 in octal).
</description>
</property>
* in spark-defaults.conf:
spark.yarn.stagingDir /tmp/spark-yarn-staging-dir
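As a sanity check for the umask part: the OS-level semantics that
fs.permissions.umask-mode mirrors can be demonstrated in a plain shell
(this only illustrates umask arithmetic, not Hadoop's own handling of
the setting):

```shell
# With umask 000, a newly created directory gets 0777 & ~000 = 777,
# instead of the 755 you get under the common default umask 022.
tmpdir=$(mktemp -d)
(
  umask 000
  mkdir "$tmpdir/open-dir"
)
stat -c '%a' "$tmpdir/open-dir"   # prints 777 (GNU stat)
rm -rf "$tmpdir"
```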
This in turn makes the folders under the YARN log directory
(/tmp/userlogs) have 777 permissions, while the YARN local directory
for the specific user (/tmp/usercache/userA/) has just 750 permissions.
Even more weirdly, when user A starts a Spark application, the current
application directory under the YARN local directory folder, for example
/tmp/usercache/userA/appcache/application_1613995549456_0001, has the
following permissions:
drwx--x---. 34 userA userA 4096 Feb 22 13:31 application_1613995549456_0001
At the same time, the Spark staging directory appears to have 777
permissions on all subfolders up to
/tmp/spark-yarn-staging-dir/userA/.sparkStaging/ . But the subfolders
below that, those that get created during an application run, have
only 700 permissions!
This prevents user B from submitting Spark applications to the YARN
cluster at all, with errors like
File
file:/tmp/spark-yarn-staging-dir/userB/.sparkStaging/application_1613997737582_0001/scala-library-2.12.10.jar
does not exist
And we are certain that those jars do exist. In addition, if we
quickly change the permissions of the application folder to 777 on the
fly just after it starts, the application runs fine without any
errors. We have tried many parameters and are stuck right now. We
suspect it has to do somehow with the fact that
yarn.nodemanager.default-container-executor.log-dirs.permissions
accepts the classic user/group/all octal values, whereas
yarn.nodemanager.runtime.linux.sandbox-mode.local-dirs.permissions
only accepts a comma-separated list of values, a syntax that doesn't
allow extending the permissions to group/others.
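For reference, the "chmod on the fly" workaround we use looks roughly
like this (the helper name and paths are just our own sketch for our
setup, not a proper fix):

```shell
# open_up_app_dirs: chmod 777 every application_* directory under the
# given appcache path (e.g. /tmp/usercache/userA/appcache). We run it
# right after the application starts so the staged files become readable.
open_up_app_dirs() {
  appcache=$1
  for d in "$appcache"/application_*; do
    [ -d "$d" ] && chmod 777 "$d"
  done
}
```

Called as, e.g., `open_up_app_dirs /tmp/usercache/userA/appcache`,
either once just after submission or in a small polling loop.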
We hope somebody will be able to help us out, thanks in advance.
Cheers,
Vincenzo
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org