Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20945#discussion_r178986112
--- Diff:
resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala
---
@@ -506,6 +506,10 @@ private[spark] class MesosClusterScheduler(
options ++= Seq("--class", desc.command.mainClass)
}
+ desc.conf.getOption("spark.mesos.proxyUser").foreach { v =>
+ options ++= Seq("--proxy-user", v)
--- End diff ---
I'm still confused about how submission works on Mesos in cluster mode. You
mention a DC/OS CLI. Does that mean you're not using spark-submit?
The point I'm trying to make is that using `--proxy-user` in client mode in
this context is a security issue. And I'm really uncomfortable with adding code
in Spark that is basically a big security hole. You're basically giving up the
idea of multiple users here, since by doing that any user can impersonate
anyone else.
To comment on a few things:
> yarn has as it assumes hdfs to manage secrets.
No. Secrets (in this context, delegation tokens) are sent directly to YARN
in the application request. No HDFS involved. That's done in
`Client.setupSecurityToken` in the Spark code base.
The launcher has the TGT; the launcher creates DTs for services and
attaches them to the YARN container.
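Roughly, that hand-off looks like the following sketch (simplified and with illustrative names, not copied from `Client.setupSecurityToken`; it uses the standard Hadoop/YARN APIs):

```scala
import java.nio.ByteBuffer
import org.apache.hadoop.io.DataOutputBuffer
import org.apache.hadoop.security.{Credentials, UserGroupInformation}
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext

// Sketch of the YARN token hand-off: the launcher (which holds the TGT)
// serializes the delegation tokens it created and attaches them to the AM
// container request. The application only ever sees the DTs, never the
// TGT, keytab, or password.
def attachTokens(amContainer: ContainerLaunchContext): Unit = {
  val credentials: Credentials =
    UserGroupInformation.getCurrentUser.getCredentials
  val dob = new DataOutputBuffer()
  credentials.writeTokenStorageToStream(dob)
  amContainer.setTokens(ByteBuffer.wrap(dob.getData, 0, dob.getLength))
}
```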
So regardless of whether the user is running the app as himself, or
impersonating another user, the application itself *does not* have access to
the user's Kerberos credentials (not the password, not the keytab, not the
TGT). It only has access to the DTs created by the launcher.
And that's why Spark-on-YARN is secure in this context.
> first I would have to log in my node as user X who can impersonate other
users
I don't understand why you have to impersonate at all here. If someone
is launching an app, most of the time they want to launch the app as
themselves. That's the "not impersonating" case in the above YARN scenario.
> I would upload user's X ticket cache (which I can point to with
KRB5CCNAME) on the cluster
Why? That's not even guaranteed to work (what if the TGT is bound to the
machine that requested it?). That's why Spark-on-YARN creates DTs. That's what
you should be giving the application.
Mesos in client mode does that AFAIK - because the user starting the
application needs a proper kerberos login for things to work. But in cluster
mode you need to figure out how to get the DTs to the application. And
impersonation is not the way to do it for the reasons already explained.
So, going back to a previous suggestion of mine, you have two ways of fixing
this:
- require the launcher to have a kerberos login and send DTs to the
application, a.k.a. what Spark-on-YARN does.
- in the code that launches the driver on the Mesos side, create the DTs in
a safe context (e.g. *not* as part of the spark-submit invocation) and provide
them to the Spark driver using the `HADOOP_TOKEN_FILE_LOCATION` env var.
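The second option could be sketched like this (a simplified illustration using standard Hadoop APIs; the renewer value and file path are placeholders made up for the example):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.Credentials

// Sketch: in a trusted context on the Mesos side (*not* inside the
// spark-submit invocation), obtain delegation tokens under a proper
// kerberos login and write them to a file only the driver can read.
val hadoopConf = new Configuration()
val creds = new Credentials()
// "some-renewer" is a placeholder renewer principal.
FileSystem.get(hadoopConf).addDelegationTokens("some-renewer", creds)
val tokenFile = new Path("/run/driver-tokens/tokens.bin") // hypothetical path
creds.writeTokenStorageFile(tokenFile, hadoopConf)

// Then launch the driver with the env var:
//   HADOOP_TOKEN_FILE_LOCATION=/run/driver-tokens/tokens.bin
// UserGroupInformation loads the tokens from that file at startup.
```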
Sorry, but what you have here won't work. If you do that, you might as well
run all applications on your cluster as the super user - things will be just as
secure.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]