GitHub user pwendell commented on the pull request:
https://github.com/apache/spark/pull/2320#issuecomment-55515763
@vanzin there is currently a path where the `addFile` HTTP server is
authenticated via a shared secret, which under the hood uses Diffie-Hellman.
This is used in YARN mode.
@tgravescs if a user manually sets the shared secret on the driver
and worker nodes, am I correct in understanding that `addFile` will be properly
authenticated, at least for the transfer? This is my understanding based on the
original design, but please correct me if that's wrong.
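To make that concrete, here's a minimal sketch of what "manually setting the shared secret" could look like; the config keys are the standard `spark.authenticate` / `spark.authenticate.secret` settings, the secret value and file path are purely illustrative, and in practice the same secret would go into `spark-defaults.conf` on every node rather than inline:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: the same secret must be configured on the driver and on
// every worker (e.g. via spark-defaults.conf); the value here is made up.
val conf = new SparkConf()
  .setAppName("secured-addFile-sketch")
  .set("spark.authenticate", "true")
  .set("spark.authenticate.secret", "some-shared-secret")

val sc = new SparkContext(conf)

// With authentication enabled, the question above is whether fetches
// triggered by addFile are covered by the shared-secret handshake.
sc.addFile("/local/path/to/some-file")
```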
I think the user requirements are the following:
1. A company is running a standalone cluster.
2. They are fine if all Spark jobs in the cluster run as the same HDFS
user; they just want a mechanism for accessing a secured HDFS environment.
3. They are fine with logging in based on a keytab (see the sketch below).
4. They also don't want to trust the network in the cluster, i.e. they
don't want to allow someone to easily fetch HDFS tokens over a known protocol
without authentication.
AFAIK there is a fairly wide range of use cases like this.
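For (3), a rough sketch of the keytab login using Hadoop's `UserGroupInformation` / `Credentials` APIs; the principal, keytab path, and renewer name are illustrative:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

// Sketch only: log in to the secured HDFS from a keytab and collect
// delegation tokens that could later be shipped to the workers.
val hadoopConf = new Configuration()
UserGroupInformation.setConfiguration(hadoopConf)
UserGroupInformation.loginUserFromKeytab(
  "spark-user@EXAMPLE.COM",                   // illustrative principal
  "/etc/security/keytabs/spark-user.keytab")  // illustrative keytab path

val creds = new Credentials()
FileSystem.get(hadoopConf).addDelegationTokens("spark-user", creds)
```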
I think this is achievable if the user just manually sets
`spark.authenticate.secret` on the driver and workers, and we then use
`sc.addFile` to disseminate tokens. @tgravescs - does that seem correct? We
also need to test whether this works well... for instance, I think at this point
we'll actually ship the value of `spark.authenticate.secret` across the wire if
it's set. But architecturally, I do think this would work.
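To sketch what the dissemination step could look like (assuming the secret is already configured consistently on all nodes as above, and that the driver has written its tokens to a local file, e.g. with `Credentials.writeTokenStorageFile` after the keytab login; paths and names are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

// Sketch only: assumes spark.authenticate / spark.authenticate.secret are
// already set on the driver and workers, and that /tmp/hdfs-tokens was
// written on the driver after the keytab login.
val sc = new SparkContext(new SparkConf().setAppName("token-dissemination-sketch"))
sc.addFile("/tmp/hdfs-tokens")

// Tasks can then pick up the shipped file and load the tokens into their
// UserGroupInformation before talking to the secured HDFS.
sc.parallelize(1 to 4).foreach { _ =>
  val localTokenFile = SparkFiles.get("hdfs-tokens")
  // e.g. Credentials.readTokenStorageFile(...) followed by
  // UserGroupInformation.getCurrentUser().addCredentials(...)
  println(s"token file fetched to $localTokenFile")
}
```

The open question above is whether that `addFile` transfer is actually protected by the shared-secret authentication, and whether the secret itself ends up on the wire.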
I'll update the requirements on the JIRA to match those proposed in this
comment.