> On 31 Aug 2015, at 11:02, Daniel Schulz <[email protected]> wrote:
> 
> Hi guys,
> 
> In a nutshell: does Spark check and respect user privileges when 
> reading/writing data?

Yes, in a locked-down YARN cluster, until your tokens expire.

> 
> I am curious about the data security when Spark runs on top of HDFS, maybe 
> through YARN. Is Spark running its long-running JVM processes as a Spark 
> user that makes no distinction when accessing data? So is there a 
> shortcoming when using Spark because the JVM processes are already running 
> and therefore the launching user is omitted by Spark when accessing data 
> residing on HDFS? Or is Spark only reading/writing data that the user who 
> launched this thread had access to?


In a Kerberized YARN cluster, the processes run as the specific user submitting 
the job (or whoever the Kerberos ID -> OS ID mapping files say they are), with 
the delegation tokens passed up from the client to talk to HDFS. In Spark 1.5 
you get the Hive credentials pushed up too.
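
As a rough illustration of that Kerberos ID -> OS ID mapping, here is a 
simplified Python model of what Hadoop's DEFAULT rule does: it keeps the first 
component of a principal that belongs to the cluster's default realm. This is 
only a sketch, not Hadoop's code; the function name and the example principals 
are made up, and real clusters can override the mapping with custom rules in 
the hadoop.security.auth_to_local property.

```python
def default_auth_to_local(principal, default_realm):
    """Simplified model of the DEFAULT mapping rule (hypothetical helper).

    Accepts "user@REALM" or "user/host@REALM" and returns the short name
    that would be used as the OS/HDFS user.
    """
    name, _, realm = principal.partition("@")
    if realm and realm != default_realm:
        # The DEFAULT rule only covers principals in the default realm;
        # anything else needs an explicit rule on a real cluster.
        raise ValueError("no mapping for realm: " + realm)
    # Drop any "/host" instance component, keeping the first component.
    return name.split("/", 1)[0]


print(default_auth_to_local("alice@EXAMPLE.COM", "EXAMPLE.COM"))
print(default_auth_to_local("hdfs/nn1.example.com@EXAMPLE.COM", "EXAMPLE.COM"))
```

The short name that comes out of this step is the identity HDFS uses when it 
checks permissions.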

This means that access is granted with the rights of the user deploying the 
application, with HDFS checking them on every request.

It also means that when the HDFS delegation tokens expire, your HDFS access 
goes away. Spark 1.5 addresses this by allowing you to optionally provide a 
keytab for the app master, which is used to re-authenticate with the KDC, and 
then HDFS. This changes the problem to "getting your cluster ops team to give 
you a keytab".
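
For illustration, the keytab is handed over at submission time through 
spark-submit's --principal and --keytab options. The Python sketch below just 
assembles such a command line; the principal, keytab path, class name and jar 
are placeholders I made up, so substitute whatever your ops team gives you.

```python
import shlex


def spark_submit_args(principal, keytab, main_class, app_jar):
    """Build a spark-submit argv for a YARN cluster-mode job with a keytab."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--principal", principal,  # Kerberos principal to log in as
        "--keytab", keytab,        # keytab used to re-authenticate with the KDC
        "--class", main_class,
        app_jar,
    ]


# All four values below are hypothetical placeholders.
args = spark_submit_args(
    "alice@EXAMPLE.COM",
    "/path/to/alice.keytab",
    "com.example.MyApp",
    "my-app.jar",
)
print(" ".join(shlex.quote(a) for a in args))
```

Treat the keytab file like a password: anyone who can read it can 
authenticate as that principal.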

The new O'Reilly book, Hadoop Security, is the best starting point for Hadoop 
cluster security; spending some money on the eBook is a worthwhile investment.


I'm doing a low-level document on the internals at 
https://github.com/steveloughran/kerberos_and_hadoop/, though that's targeted 
at developers and people debugging their code more than users of apps.



> 
> What about local store when running in Standalone mode? What about access 
> calls to HBase or Hive then?
> 

Someone else will have to cover that