Github user mccheah commented on the pull request:

    https://github.com/apache/spark/pull/4106#issuecomment-75871551
  
    One way around this could be to require the driver to also have read access to the keytab. In this model, any user who wishes to run a Spark job must also be able to authenticate to HDFS from their machine using that keytab. That prevents arbitrary users from accessing Spark remotely without the keytab and abusing the HDFS superuser's privileges: they would only be allowed to do so if they already had the permissions to obtain those privileges independently of Spark.
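    
    As a rough sketch of that check (the `DriverKeytabLogin` wrapper and its method name are mine, not from the patch; `UserGroupInformation.loginUserFromKeytab` is the standard Hadoop API):
    
    ```scala
    import java.io.File
    
    import org.apache.hadoop.security.UserGroupInformation
    
    object DriverKeytabLogin {
      // The driver process itself must be able to read the keytab
      // before it may log in as the principal.
      def loginFromKeytab(principal: String, keytabPath: String): Unit = {
        val keytab = new File(keytabPath)
        require(keytab.canRead,
          s"Driver has no read access to keytab at $keytabPath")
        // Hadoop's UserGroupInformation performs the Kerberos login; the
        // resulting credentials back subsequent HDFS calls from the driver.
        UserGroupInformation.loginUserFromKeytab(principal, keytabPath)
      }
    }
    ```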
    
    Indeed, logging in driver-side is already required in HadoopRDD.getPartitions() and PairRDDFunctions.saveAsHadoopDataset (my patch adds this, but I'm still fixing the merge right now, so it might be hard to find): these query HDFS for file metadata and therefore require logging in at the driver. We can just move the login into the SparkContext initialization.
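    
    For illustration, moving the login into SparkContext initialization could look like the following (the `maybeLoginDriver` helper and the `spark.kerberos.*` keys are hypothetical placeholders, not existing Spark settings):
    
    ```scala
    import org.apache.hadoop.security.UserGroupInformation
    import org.apache.spark.SparkConf
    
    object DriverLogin {
      // Hypothetical hook run once while SparkContext is constructed,
      // before any HDFS metadata queries (HadoopRDD.getPartitions,
      // PairRDDFunctions.saveAsHadoopDataset) can fire.
      def maybeLoginDriver(conf: SparkConf): Unit = {
        if (conf.contains("spark.kerberos.principal") &&
            conf.contains("spark.kerberos.keytab")) {
          UserGroupInformation.loginUserFromKeytab(
            conf.get("spark.kerberos.principal"),
            conf.get("spark.kerberos.keytab"))
        }
      }
    }
    ```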

