> On 16 Sep 2016, at 04:43, gsvigruha <gergely.svigr...@lynxanalytics.com> wrote:
>
> Hi,
>
> is there a way to impersonate multiple users using the same SparkContext
> (e.g. like this
> https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/Superusers.html)
> when going through the Spark API?
>
> What I'd like to do is:
> 1) submit a long-running Spark yarn-client application using a Hadoop
> superuser (e.g. "super")
> 2) impersonate different users with "super" when reading/writing restricted
> HDFS files using the Spark API
>
> I know about the --proxy-user flag but its effect is fixed within a
> spark-submit.
>
> I looked at the code and it seems the username is determined by the
> SPARK_USER env var first (which seems to be always set) and then by
> UserGroupInformation.
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L2247
> What I'd like, I guess, is for UserGroupInformation to take priority.
If you can get the Kerberos tickets or Hadoop tokens all the way to your code,
then you can execute that code in a doAs call; this adopts the Kerberos
identity of that context when accessing HDFS, Hive, HBase, etc.:

    otherUserUGI.doAs { .... }

If you just want to run something as a different user:

- short-lived: have Oozie set things up
- long-lived: you need the Kerberos keytab of whoever the app needs to run as.

On an insecure cluster, the identity used to talk to HDFS can actually be set
in the env var HADOOP_USER_NAME; you can also use some of the UGI methods like
createProxyUser() to create the identity to spoof in:

    val hbase = UserGroupInformation.createRemoteUser("hbase")
    hbase.doAs() { ... }

Some possibly useful information:
https://www.youtube.com/watch?v=Xz2tPmK2cKg
https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
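
A minimal sketch of the proxy-user pattern discussed above, assuming a
long-running application already logged in as the superuser "super" (which
must be allowed to impersonate via hadoop.proxyuser.* settings in
core-site.xml), with Hadoop client libraries on the classpath.
UserGroupInformation.createProxyUser and doAs are real UGI methods; the
username "alice" and the HDFS path are illustrative only:

    import java.security.PrivilegedExceptionAction

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.hadoop.security.UserGroupInformation

    object ProxyUserSketch {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()

        // The identity the app itself runs as ("super" in the scenario above).
        val superUgi = UserGroupInformation.getLoginUser

        // Build a proxy identity for "alice", backed by super's credentials.
        val aliceUgi = UserGroupInformation.createProxyUser("alice", superUgi)

        // Everything inside run() executes with alice's identity, so the
        // NameNode enforces alice's permissions on the restricted path.
        val visible = aliceUgi.doAs(new PrivilegedExceptionAction[Boolean] {
          override def run(): Boolean = {
            val fs = FileSystem.get(conf)
            fs.exists(new Path("/user/alice/restricted.txt"))
          }
        })
        println(s"visible to alice: $visible")
      }
    }

Note that from Scala, doAs wants a PrivilegedExceptionAction (or
PrivilegedAction), so the bare `doAs { ... }` shorthand in the fragments above
needs that wrapping; each doAs call can impersonate a different user, which is
what makes this usable from a single long-running application.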