Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20945#discussion_r178986112
  
    --- Diff: 
resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala
 ---
    @@ -506,6 +506,10 @@ private[spark] class MesosClusterScheduler(
           options ++= Seq("--class", desc.command.mainClass)
         }
     
    +    desc.conf.getOption("spark.mesos.proxyUser").foreach { v =>
    +      options ++= Seq("--proxy-user", v)
    --- End diff --
    
    I'm still confused about how submission works on Mesos in cluster mode. You 
mention a DC/OS CLI. Does that mean you're not using spark-submit?
    
    The point I'm trying to make is that using `--proxy-user` in client mode in 
this context is a security issue. And I'm really uncomfortable with adding code 
to Spark that is basically a big security hole. You're giving up the idea of 
multiple users here, since by doing that any user can impersonate anyone else.
    
    To comment on a few things:
    
    > yarn has as it assumes hdfs to manage secrets.
    
    No. Secrets (in this context, delegation tokens) are sent directly to YARN 
in the application request. No HDFS involved. That's done in 
`Client.setupSecurityToken` in the Spark code base.
    
    The launcher has the TGT; the launcher creates DTs for services and 
attaches them to the YARN container.
    So regardless of whether the user is running the app as themselves or 
impersonating another user, the application itself *does not* have access to 
the user's Kerberos credentials (not the password, not the keytab, not the 
TGT). It only has access to the DTs created by the launcher.
    
    And that's why Spark-on-YARN is secure in this context.
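    To make the YARN flow above concrete, here is a rough sketch of what 
happens on the launcher side (the commands are real, but the principal, class 
name, and jar are placeholders):

```shell
# Launcher machine: the only place Kerberos credentials ever exist.
kinit alice@EXAMPLE.COM   # launcher obtains a TGT (or reuses an existing login)

# spark-submit runs Client.setupSecurityToken under the hood: it uses the
# TGT to request delegation tokens from HDFS/Hive/etc. and attaches only
# those DTs to the YARN application request.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  my-app.jar

# The driver and executors receive the DTs from YARN; they never see the
# user's password, keytab, or TGT.
```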
    
    > first I would have to log in my node as user X who can impersonate other 
users
    
    I don't understand why you have to impersonate at all here. If someone 
is launching an app, most of the time they want to launch the app as 
themselves. That's the "not impersonating" case in the above YARN scenario.
    
    > I would upload user's X ticket cache (which I can point to with 
KRB5CCNAME) on the cluster 
    
    Why? That's not even guaranteed to work (what if the TGT is bound to the 
machine that requested it?). That's why Spark-on-YARN creates DTs. That's what 
you should be giving the application.
    
    Mesos in client mode does that, AFAIK, because the user starting the 
application needs a proper Kerberos login for things to work. But in cluster 
mode you need to figure out how to get the DTs to the application. And 
impersonation is not the way to do it, for the reasons already explained.
    
    So, going back to a previous suggestion of mine, you have two ways of 
fixing this:
    
    - require the launcher to have a Kerberos login, and send DTs to the 
application, a.k.a. what Spark-on-YARN does.
    - in the code that launches the driver on the Mesos side, create the DTs in 
a safe context (e.g. *not* as part of the spark-submit invocation) and provide 
them to the Spark driver using the `HADOOP_TOKEN_FILE_LOCATION` env var.
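    As a rough illustration of the second option: the launch step on the Mesos 
side could create the tokens in its own trusted context and hand the driver a 
token file. `hdfs fetchdt` and `HADOOP_TOKEN_FILE_LOCATION` are real Hadoop 
mechanisms, but the keytab path, principal, renewer, and file location below 
are placeholders:

```shell
# Runs in the trusted dispatcher/launcher context, *not* as part of the
# spark-submit invocation.
kinit -kt /etc/security/keytabs/dispatcher.keytab dispatcher@EXAMPLE.COM

# Create delegation tokens and write them to a file (hdfs fetchdt is a
# standard Hadoop CLI for fetching DTs).
hdfs fetchdt --renewer yarn /var/run/spark/driver.tokens

# Point the driver at the token file; Hadoop's UserGroupInformation loads
# tokens from this env var, so the driver needs no Kerberos credentials.
export HADOOP_TOKEN_FILE_LOCATION=/var/run/spark/driver.tokens
```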
    
    Sorry, but what you have here won't work. If you do that, you might as 
well run all applications on your cluster as the super user - things will be 
just as secure.


---
