Github user mccheah commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75871551
One way around this could be to require the driver to also have read access
to the keytab. Under this model, any user who wants to run a Spark job must also
be able to authenticate to HDFS from their machine using that keytab. That
prevents arbitrary users from accessing Spark remotely without the keytab and
abusing the HDFS superuser's privileges; they would only be allowed to do so if
they themselves had the permissions to obtain those privileges independently of
Spark.
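
A minimal sketch of that gate, assuming the driver simply logs in from the
keytab up front (the config key names below are illustrative, not from this
patch):

```scala
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.SparkConf

val conf = new SparkConf()
// Hypothetical keys; the patch may name these differently.
val principal = conf.get("spark.kerberos.principal")
val keytab    = conf.get("spark.kerberos.keytab")

// loginUserFromKeytab throws an IOException if the keytab file cannot be
// read or the credentials are rejected, so a user without read access to
// the keytab cannot get past this point.
UserGroupInformation.loginUserFromKeytab(principal, keytab)
```

The login itself enforces the read-access requirement, so no separate
permission check is needed.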
Indeed, logging in driver-side is already required in
HadoopRDD.getPartitions() and PairRDDFunctions.saveAsHadoopDataset() (my patch
adds this, but I'm still fixing the merge right now, so it might be hard to
find): these methods query HDFS for file metadata and therefore require logging
in at the driver. We can just move the login into the SparkContext
initialization.
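
Concretely, moving it could look something like this hypothetical hook (the
helper name and config keys are mine, not the patch's), called once while the
SparkContext is being constructed:

```scala
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.SparkConf

object DriverKeytabLogin {
  // Called once during SparkContext initialization so that later HDFS
  // metadata queries in HadoopRDD.getPartitions() and
  // PairRDDFunctions.saveAsHadoopDataset() find valid credentials.
  def maybeLoginFromKeytab(conf: SparkConf): Unit = {
    if (conf.contains("spark.kerberos.keytab")) {
      UserGroupInformation.loginUserFromKeytab(
        conf.get("spark.kerberos.principal"),
        conf.get("spark.kerberos.keytab"))
    }
  }
}
```

That keeps the per-call-site logins out of the RDD code paths and makes the
driver-side requirement explicit in one place.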