I just implemented this in our application. The impersonation is done before the job is submitted. With Spark on YARN (we are using yarn-cluster mode), Spark simply takes the current user from UserGroupInformation and submits the application to the YARN ResourceManager as that user.
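For reference, this is the identity in question; a minimal sketch, assuming only the standard Hadoop UserGroupInformation API (the class Spark reads at submit time):

    import org.apache.hadoop.security.UserGroupInformation

    // Whatever the Hadoop login user is in the submitting JVM becomes the
    // identity the application is submitted to the ResourceManager as.
    val currentUser = UserGroupInformation.getCurrentUser
    println(s"Submitting as: ${currentUser.getShortUserName}")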
If you use kinit from the command line, the whole JVM needs to run under that same principal, and you have to handle ticket expiration with a cron job. For an individual ad-hoc CLI job that might be OK. But if you intend to have an application run Spark jobs on behalf of end users, then you need to set up a service super user, use that user to log in to the Kerberos KDC programmatically (the kinit equivalent), and then create a proxy user to impersonate the end user. You can handle ticket expiration in code as well, so there is no need for a cron job (see the sketch at the end of this message).

Certainly one could move all of this logic into Spark. One would create a Spark service user principal and keytab; as part of the spark-submit, one could pass the principal and keytab location to Spark, and Spark could create a proxy user when the authentication mode is Kerberos, as well as add the job's delegation tokens.

I would love to contribute this if we need it in Spark, as I just completed the Hadoop Kerberos authentication feature on our side; it covers Pig, MapReduce, Spark, Sqoop, as well as standard HDFS access. I will take a look at Sandy's JIRA.

Chester

> On Feb 2, 2015, at 2:37 PM, Jim Green <openkbi...@gmail.com> wrote:
>
> Hi Team,
>
> Does spark support impersonation?
> For example, when spark on yarn/hive/hbase/etc..., which user is used by
> default?
> The user which starts the spark job?
> Any suggestions related to impersonation?
>
> --
> Thanks,
> www.openkb.info
> (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)
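Sketch referenced above: a minimal example of the programmatic login and impersonation, assuming only the standard Hadoop UserGroupInformation API. The service principal, keytab path, and end-user name are placeholders, not values from our setup.

    import java.security.PrivilegedExceptionAction
    import org.apache.hadoop.security.UserGroupInformation

    // Log the service super user in to the KDC from its keytab
    // (the programmatic equivalent of kinit).
    UserGroupInformation.loginUserFromKeytab(
      "spark-service/host@EXAMPLE.COM",                // placeholder principal
      "/etc/security/keytabs/spark-service.keytab")    // placeholder keytab path

    // Impersonate the end user on top of the service user's credentials.
    // The service user must be allowed to proxy in core-site.xml
    // (hadoop.proxyuser.<service>.hosts / .groups).
    val proxyUgi = UserGroupInformation.createProxyUser(
      "endUser", UserGroupInformation.getLoginUser)

    // Do the job submission (or HDFS access, etc.) as the impersonated user.
    proxyUgi.doAs(new PrivilegedExceptionAction[Unit] {
      override def run(): Unit = {
        // submit the Spark job on behalf of endUser here
      }
    })

    // Ticket expiration handled in code instead of a cron job:
    // re-login from the keytab when the TGT is close to expiring.
    UserGroupInformation.getLoginUser.checkTGTAndReloginFromKeytab()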