I just implemented this in our application. The impersonation is done before
the job is submitted. In Spark on YARN (we are using yarn-cluster mode), the
submission client just takes the current user from UserGroupInformation and
submits the application to the YARN ResourceManager as that user.
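To illustrate (just a sketch using the Hadoop UserGroupInformation API; nothing
Spark-specific is assumed here):

    import org.apache.hadoop.security.UserGroupInformation

    // Whatever UGI reports as the current user at submit time is the identity
    // the YARN client uses, so impersonation has to be in effect before this point.
    val submittingUser = UserGroupInformation.getCurrentUser
    println(s"Submitting to YARN as: ${submittingUser.getShortUserName}")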

If you use kinit from the command line, the whole JVM needs to run as that same
principal, and you have to handle ticket expiration with a cron job.

If these are individual ad hoc CLI jobs, that might be OK. But if you intend to
run Spark jobs from an application on behalf of end users, then you need to set
up a service super user, use that user to log in to the Kerberos KDC
programmatically (the kinit equivalent), and then create a proxy user to
impersonate the end user. You can handle ticket expiration in code as well, so
there is no need for a cron job.
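Roughly along these lines with the Hadoop UserGroupInformation API (the
principal, keytab path and end-user name below are only placeholders, and the
cluster has to allow proxying via the hadoop.proxyuser.* settings):

    import java.security.PrivilegedExceptionAction
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.hadoop.security.UserGroupInformation

    // Assumes core-site.xml on the classpath has Kerberos authentication enabled.
    val conf = new Configuration()
    UserGroupInformation.setConfiguration(conf)

    // "kinit equivalent": log the service super user in from its keytab.
    UserGroupInformation.loginUserFromKeytab(
      "svc-spark@EXAMPLE.COM", "/etc/security/keytabs/svc-spark.keytab")

    // Impersonate the end user with a proxy UGI.
    val proxyUgi = UserGroupInformation.createProxyUser(
      "enduser", UserGroupInformation.getLoginUser)

    // Renew the TGT from the keytab if it is close to expiring -- this is the
    // in-code replacement for the cron job.
    UserGroupInformation.getLoginUser.checkTGTAndReloginFromKeytab()

    // Anything run inside doAs (HDFS access, the YARN/Spark submission, etc.)
    // executes as the impersonated end user.
    proxyUgi.doAs(new PrivilegedExceptionAction[Unit] {
      override def run(): Unit = {
        val fs = FileSystem.get(conf)
        fs.listStatus(new Path("/user/enduser")).foreach(s => println(s.getPath))
      }
    })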

Certainly one could move all this logic into Spark itself; one would need to
create a Spark service user principal and keytab. As part of the Spark job
submit, one could pass the principal and keytab location to Spark, and Spark
could create a proxy user if the authentication is Kerberos, as well as add the
job's delegation tokens.
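As a rough sketch of the delegation-token part (again using the standard Hadoop
APIs; the "yarn" renewer name below is just a placeholder), one can fetch HDFS
delegation tokens as the proxy user and ship them with the job:

    import java.security.PrivilegedExceptionAction
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.FileSystem
    import org.apache.hadoop.security.{Credentials, UserGroupInformation}

    // Collect HDFS delegation tokens on behalf of the impersonated user; the
    // resulting Credentials can then be handed to the YARN application so the
    // executors can reach HDFS without their own TGT.
    def collectHdfsTokens(proxyUgi: UserGroupInformation, conf: Configuration): Credentials = {
      val creds = new Credentials()
      proxyUgi.doAs(new PrivilegedExceptionAction[Unit] {
        override def run(): Unit = {
          val fs = FileSystem.get(conf)
          // "yarn" stands in for whatever principal will renew the tokens.
          fs.addDelegationTokens("yarn", creds)
        }
      })
      creds
    }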

I would love to contribute this if we need it in Spark, as I just completed the
Hadoop Kerberos authentication feature; it covers Pig, MapReduce, Spark, Sqoop,
as well as standard HDFS access.

I will take a look at Sandy's JIRA.

Chester 

> On Feb 2, 2015, at 2:37 PM, Jim Green <openkbi...@gmail.com> wrote:
> 
> Hi Team,
> 
> Does spark support impersonation?
> For example, when spark on yarn/hive/hbase/etc..., which user is used by 
> default?
> The user which starts the spark job?
> Any suggestions related to impersonation?
> 
> -- 
> Thanks,
> www.openkb.info 
> (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)
