> On 16 Sep 2016, at 04:43, gsvigruha <gergely.svigr...@lynxanalytics.com> 
> wrote:
> Hi,
> is there a way to impersonate multiple users using the same SparkContext
> (e.g. like this
> https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/Superusers.html)
> when going through the Spark API?
> What I'd like to do is that
> 1) submit a long running Spark yarn-client application using a Hadoop
> superuser (e.g. "super")
> 2) impersonate different users with "super" when reading/writing restricted
> HDFS files using the Spark API
> I know about the --proxy-user flag but its effect is fixed within a
> spark-submit.
> I looked at the code and it seems the username is determined by the
> SPARK_USER env var first (which seems to be always set) and then the
> UserGroupInformation.
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L2247
> What I'd like I guess is the UserGroupInformation to take priority.

If you can get the Kerberos tickets or Hadoop tokens all the way to your code, 
then you can execute that code inside a doAs call; the call adopts the Kerberos 
credentials of that context when accessing HDFS, Hive, HBase, etc.

otherUserUGI.doAs(new PrivilegedExceptionAction[Unit] { ... })
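To make the calling shape concrete, here is a runnable sketch using the JDK's javax.security.auth.Subject, which UGI.doAs delegates to; the empty Subject here is just a stand-in for a real Kerberos-backed context, and the string result is illustrative only:

```scala
import java.security.PrivilegedExceptionAction
import javax.security.auth.Subject

// An empty Subject stands in for a Kerberos-backed UGI context.
val subject = new Subject()

// doAs runs the action with the subject bound to the access-control context
// and returns whatever the action's run() returns.
val who: String = Subject.doAs(subject, new PrivilegedExceptionAction[String] {
  override def run(): String = "ran inside doAs"
})
println(who)
```

With a real UserGroupInformation instead of a bare Subject, the code inside run() is what picks up the impersonated identity for HDFS/Hive/HBase calls.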

If you just want to run something as a different user:

- short-lived: have Oozie set things up
- long-lived: you need the Kerberos keytab of whoever the app needs to run as.
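For the long-lived case, a minimal sketch of logging in from a keytab and then building a proxy identity with the Hadoop UserGroupInformation API; the principal, keytab path, and user name are hypothetical, and this only runs against a real Kerberized cluster:

```scala
import org.apache.hadoop.security.UserGroupInformation

// Hypothetical service principal and keytab path: log in as the superuser.
UserGroupInformation.loginUserFromKeytab(
  "super@EXAMPLE.COM", "/etc/security/keytabs/super.keytab")

// Create a proxy UGI for an end user ("alice" is illustrative),
// backed by the logged-in superuser's credentials.
val proxy = UserGroupInformation.createProxyUser(
  "alice", UserGroupInformation.getLoginUser)
```

The proxy UGI can then be handed to a doAs call; note the cluster must also authorize "super" to impersonate the target users (the hadoop.proxyuser.* settings in the superusers doc linked above).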

On an insecure cluster, the identity used to talk to HDFS can actually be set 
in the env var HADOOP_USER_NAME. You can also use some of the UGI methods, such 
as createProxyUser(), to create the identity to spoof.

val hbase = UserGroupInformation.createRemoteUser("hbase")
hbase.doAs(new PrivilegedExceptionAction[Unit] {
  override def run(): Unit = { /* talk to HDFS/HBase as "hbase" */ }
})

Some possibly useful information:

