Github user mgummelt commented on the issue:
https://github.com/apache/spark/pull/17723
Hey @vanzin @mridulm. Sorry for joining the party a bit late. I just read
through the discussion.
It seems the first point of contention is the distinction between Hadoop
and YARN. This PR relies on Hadoop libraries in `core`, but it shouldn't rely
on `yarn`. If it does, that's a mistake and I'll fix it.
Then the discussion becomes whether we should rely on Hadoop in `core`. It
looks like @mridulm acknowledges we're already using Hadoop in `core`, so I
hope we agree that this PR doesn't create a *new* problem, but that it does
increase the coupling.
I agree that, ideally, all Hadoop-specific code would be factored out into
a separate `hadoop` module. But, as @vanzin points out, doing so would be a
massive undertaking. Spark identity and access control are based on Hadoop
security (`UserGroupInformation`), and Hadoop filesystems are exposed through
`HadoopRDD`. We'd have to, at minimum, create an entirely new Spark access
control interface for which `UGI` is just one provider.
Also, as @vanzin points out, there's ultimately no way around using
Hadoop security libraries such as `UGI` if our goal is
to access Hadoop services. Hadoop services require Hadoop delegation tokens,
rather than some more broadly applicable security standard. And I hope we
agree we don't want to duplicate the `UGI` client code in both the `yarn` and
`mesos` module (sharing that client code was the whole motivation of this PR).
So the only alternative I see would be to create a separate `hadoop`
module, place all this code there, and create new interfaces that
Hadoop-specific code would implement. One obstacle to that is the sheer
amount of work. The other is that I'm not a huge fan of creating interfaces
when we have only one implementation, since you often end up with the wrong
interface and have to rewrite it anyway.
So my proposal is that we acknowledge that splitting out all Hadoop code
from core and placing it behind a common interface should be done at some
point, but we wait to do it until we at least have some other security provider
that would implement that interface.
BTW, I'll definitely go back and ensure that no Hadoop interfaces are
publicly exposed in `core`.