GitHub user vanzin commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-76028480
Hi @mccheah, just to clarify my comments: I'm thinking about someone
looking at this feature and concluding "hey, Spark Standalone now supports
Kerberos!", when that's not entirely true. There are many caveats to this
approach. It might cover the needs of your use case, and I'm willing to give
you the benefit of the doubt there, even though, to be frank, I don't really
understand it.
That being said, checking access to the keytab on the driver may not be
enough for security purposes. If you do it explicitly when creating the
SparkContext, it might work in client mode, although that also means you're
trusting the client machine. But it doesn't work at all in cluster mode, where
the driver runs as a child process of a Worker. So you'd need extra
configuration, for example to prevent the Master from servicing requests from
outside the trusted domain.
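For illustration, here's a minimal sketch of what such a driver-side check could look like in client mode. The `spark.kerberos.keytab` key and the default path are hypothetical placeholders, not something this PR defines:

```scala
import java.io.File

import org.apache.spark.{SparkConf, SparkContext}

object KeytabCheckApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("keytab-check-demo")

    // Hypothetical config key; the real key depends on how the PR wires
    // the configuration up.
    val keytabPath = conf.get("spark.kerberos.keytab", "/etc/security/keytabs/app.keytab")

    // In client mode this runs on the client machine, so it proves only that
    // the *client* can read the keytab. In cluster mode the driver is a child
    // process of a Worker, so this check never runs on the submitting machine.
    val keytab = new File(keytabPath)
    require(keytab.isFile && keytab.canRead,
      s"Cannot read keytab at $keytabPath; refusing to create SparkContext.")

    val sc = new SparkContext(conf)
    try {
      // ... application logic ...
    } finally {
      sc.stop()
    }
  }
}
```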
As for the KDC denying logins, it might be possible to work around that by
using `user/host@REALM`-style principals instead of the same principal for
every worker. You don't need to log in for every executor: just have the
Worker manage the Kerberos login, and all processes launched by it will
automatically inherit the credentials. Having a different principal for each
Worker means the KDC won't freak out. I believe that HDFS ignores the "host"
part when determining the actual user, but I might be wrong there; you'd have
to check that.
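A rough sketch of that Worker-side login, using Hadoop's `UserGroupInformation` API; the `loginAsWorker` helper and the principal/keytab values are illustrative, not taken from this PR:

```scala
import java.net.InetAddress

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

object WorkerKerberosLogin {
  // Log the Worker in once with a host-specific principal; processes it
  // launches would then rely on those credentials instead of doing their
  // own per-executor logins.
  def loginAsWorker(user: String, realm: String, keytabPath: String): Unit = {
    val host = InetAddress.getLocalHost.getCanonicalHostName
    val principal = s"$user/$host@$realm" // e.g. spark/worker1.example.com@EXAMPLE.COM

    val hadoopConf = new Configuration()
    hadoopConf.set("hadoop.security.authentication", "kerberos")
    UserGroupInformation.setConfiguration(hadoopConf)

    // One login per Worker; since each host uses a distinct principal, the
    // KDC doesn't see repeated logins for the same identity.
    UserGroupInformation.loginUserFromKeytab(principal, keytabPath)
  }
}
```

As for the "host" part, whether HDFS maps `user/host@REALM` back to a plain `user` is governed by the `hadoop.security.auth_to_local` rules in core-site.xml, so that would be the place to verify the mapping.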