Github user mgummelt commented on the issue:
https://github.com/apache/spark/pull/16788
@vanzin When you say "distributing the principal's credentials", I take it
you mean that the driver logs in via Kerberos and submits the resulting token
(TGT?) via `amContainer.setTokens`. That's my understanding from reading the
code, whereas the Hadoop delegation tokens are distributed via HDFS itself.
Is this correct?
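For reference, here's roughly the `amContainer.setTokens` pattern I'm referring to, sketched with the standard Hadoop/YARN APIs. The method name and surrounding structure are illustrative, not the actual Spark code:

```scala
import java.nio.ByteBuffer

import org.apache.hadoop.io.DataOutputBuffer
import org.apache.hadoop.security.UserGroupInformation
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext

// Serialize whatever credentials the submitting user currently holds
// (delegation tokens and/or Kerberos-derived tokens) and attach them to the
// AM's container launch context. `amContainer` is assumed to already exist.
def setupSecurityTokens(amContainer: ContainerLaunchContext): Unit = {
  val credentials = UserGroupInformation.getCurrentUser.getCredentials
  val dob = new DataOutputBuffer()
  credentials.writeTokenStorageToStream(dob)
  amContainer.setTokens(ByteBuffer.wrap(dob.getData, 0, dob.getLength))
}
```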
I think this is necessary for YARN because the `ApplicationMaster` runs
remotely in both client and cluster mode, correct? In Mesos client mode, the
scheduler runs in the same process as `spark-submit` (the driver), so there's
no need to distribute Kerberos credentials: the scheduler can simply use the
`UserGroupInformation` the user initially logged in with, as sketched below.
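A minimal sketch of what I mean by reusing the existing login in client mode; the `doAs` wrapper and comments are illustrative, not code from this PR:

```scala
import java.security.PrivilegedExceptionAction

import org.apache.hadoop.security.UserGroupInformation

// In client mode the Mesos scheduler lives in the driver JVM, so it can reuse
// the credentials the user already has locally (e.g. from `kinit` or an
// explicit keytab login). No token shipping is needed for the driver itself.
val ugi = UserGroupInformation.getCurrentUser // the login the user started with

ugi.doAs(new PrivilegedExceptionAction[Unit] {
  override def run(): Unit = {
    // Talk to secure HDFS here using the local Kerberos credentials.
  }
})
```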
We would need some method of Kerberos token distribution in cluster mode,
but we can punt on that. Users have many ways of running Spark jobs
asynchronously, and we'll have to take those one by one. I think we can just
focus on solving this in client mode for now.