Github user mridulm commented on the issue:

    https://github.com/apache/spark/pull/17723
  
    
    
    Hi @mgummelt, hopefully I have clarified some of my thinking above. 
Responding to specific points below.
    
    > It seems the first point of contention is the distinction between Hadoop 
and YARN. This PR relies on Hadoop libraries in core, but it shouldn't rely on 
yarn. If it is, that's a mistake and I should fix that.
    
    My query was not regarding the actual credential provider implementations 
themselves (which will require their dependencies), but whether spark core 
needs to depend on the api.
    Put another way, suppose we moved credential provider implementations into 
a separate module - will spark core still need to depend on this or not ?
    
    @vanzin made the point that since we depend on hadoop-client, which 
depends on hadoop-security, this does not matter anymore :-)
    
    
    > Then the discussion becomes whether we should rely on Hadoop in core. It 
looks like @mridulm acknowledges we're already using Hadoop in core, so I hope 
we agree that this PR doesn't create a new problem, but that it does increase 
the coupling.
    
    Hopefully I covered this in my earlier comments.
    I was not revisiting the use of hadoop in core (that is pervasive in spark), 
but whether the hadoop-security model is sufficient for what we are attempting.
    
    
    > And also as @vanzin points out, ultimately, there's no way to get around 
the requirement of using Hadoop security libraries such as UGI if our goal is 
to access Hadoop services. Hadoop services require Hadoop delegation tokens, 
rather than some more broadly applicable security standard. And I hope we agree 
we don't want to duplicate the UGI client code in both the yarn and mesos 
module (sharing that client code was the whole motivation of this PR).
    
    Definitely agree on not duplicating code !
    
    Paraphrasing what you mentioned earlier and elaborating, I am trying to 
understand if:
    * We can assume hadoop-security is sufficient for our use cases.
      * In this case, we leverage existing implementations as-is (more or less) 
- and can expose hadoop-security in our interface definitions (`traits`, 
execution environment, how we use the credentials).
    * hadoop-security becomes one supported model (and currently the only one). 
For example, the model definition could be:
      * Defining pre-requisites (principal/keytab, external credential update, 
etc).
      * Environment setup (`UGI.loginUserFromKeytabAndReturnUGI.doAs`, etc., 
as currently used).
      * Application of acquired credentials 
(`UGI.getCurrentUser.addCredentials`) at executors and driver.
      * Credential providers declaring which model they are for.
    * Perhaps some other solution (synthesis of the above ? new ?)
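
    To make the second option above a bit more concrete, here is a rough sketch of what such a model definition might look like. It is only an illustration: `SecurityModel`, `CredentialProvider`, `Credential`, `CredentialManager` and the config keys are hypothetical names, not existing Spark or Hadoop APIs, and the hadoop-security model below is a stand-in that only marks where the UGI calls would go.

```scala
// Hypothetical sketch: none of these names exist in Spark or Hadoop.
import scala.collection.mutable.ArrayBuffer

/** An opaque credential for one service (stand-in for a delegation token). */
final case class Credential(service: String, token: String)

/** One supported security model: prerequisites, environment setup, application. */
trait SecurityModel {
  def name: String

  /** Config keys the model needs (e.g. principal and keytab). */
  def prerequisites: Set[String]

  def missingPrerequisites(conf: Map[String, String]): Set[String] =
    prerequisites.filterNot(conf.contains)

  /** Environment setup: run `body` inside the authenticated context. */
  def runAs[T](conf: Map[String, String])(body: => T): T

  /** Apply acquired credentials in the current process (driver or executor). */
  def applyCredentials(creds: Seq[Credential]): Unit
}

/** A provider declares which model it is for. */
trait CredentialProvider {
  def serviceName: String
  def modelName: String
  def obtainCredentials(conf: Map[String, String]): Seq[Credential]
}

/** Stand-in for the hadoop-security model; no Hadoop classes are used here. */
object HadoopSecurityModel extends SecurityModel {
  private val applied = ArrayBuffer.empty[Credential]

  val name: String = "hadoop-security"
  val prerequisites: Set[String] = Set("principal", "keytab")

  def runAs[T](conf: Map[String, String])(body: => T): T = {
    val missing = missingPrerequisites(conf)
    require(missing.isEmpty, s"missing prerequisites: ${missing.mkString(", ")}")
    body // a real implementation would log in from the keytab and wrap this in doAs
  }

  // A real implementation would add these to the current user's credentials.
  def applyCredentials(creds: Seq[Credential]): Unit = applied ++= creds

  def appliedCredentials: Seq[Credential] = applied.toList
}

/** Fake provider used only to exercise the sketch. */
object FakeHdfsProvider extends CredentialProvider {
  val serviceName: String = "hdfs"
  val modelName: String = HadoopSecurityModel.name

  def obtainCredentials(conf: Map[String, String]): Seq[Credential] =
    Seq(Credential(serviceName, s"token-for-${conf("principal")}"))
}

object CredentialManager {
  /** Obtain credentials from the providers written for `model`, then apply them. */
  def obtainAndApply(
      model: SecurityModel,
      providers: Seq[CredentialProvider],
      conf: Map[String, String]): Seq[Credential] = {
    val creds = model.runAs(conf) {
      providers.filter(_.modelName == model.name).flatMap(_.obtainCredentials(conf))
    }
    model.applyCredentials(creds)
    creds
  }
}
```

    In this sketch, calling `CredentialManager.obtainAndApply(HadoopSecurityModel, Seq(FakeHdfsProvider), conf)` with a `conf` that contains `principal` and `keytab` returns a single `hdfs` credential, and a provider declaring a different `modelName` is simply skipped. Whether an abstraction like this is worth its cost with only one model is exactly the open question.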
    
    > So the only alternative I see would be to create a separate hadoop 
module, place all this code there, and create new interfaces that 
Hadoop-specific code would implement. One obstacle to that is the massive 
amount of work. The other is that I'm not a huge fan of creating interfaces 
when we only have one implementation, since you often end up with the wrong 
interface, so you have to rewrite it anyway.
    
    I share your concerns here! We could create an incorrect interface that we 
get stuck with, let a single implementation make our interfaces very 
specialized, or over-design.
    I am trying to understand whether what we have is sufficient, or whether 
we need to look more closely at our dependency on hadoop-security.
    
    
    I am not very familiar with mesos or kubernetes, but a cursory search 
indicated they have other forms of authentication. If so, can you comment on 
whether mesos will be able to evolve to support them under the current model?
    Since I do not have sufficiently compelling examples to give, I am unable 
to convince @vanzin :-)
    
    > So my proposal is that we acknowledge that decoupling all Hadoop code 
into a hadoop module and placing UGI behind a common interface should be done 
at some point, but we wait to do it until we at least have some other security 
provider that would implement that interface.
    
    
    Since we are exposing credential providers as an api from core, we will 
need to keep supporting it.
    If it is possible to support other models in the future without breaking 
our exposed interfaces, that would be an excellent way forward too.


