Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/17723
  
    > Support for long running applications (which require token renewal, etc) 
was added much later in spark
    
    That's different and not what this change is about. Support for Hadoop 
security (i.e. delegation tokens) has existed at least since Spark 1.0 (I'm 
hazy before that since that's when I started playing with Spark). And it 
doesn't change the fact that it's the only custom security framework that 
people have ever tried to use with Spark, as far as I'm aware.
    
    Hadoop security is different and puts a lot of burden on clients to do 
things; there are good reasons for that, but it means that it's not as simple 
as just providing a password. I wish there was a library that made all this 
easier (wouldn't it be great if there was a single service to contact and ask 
for "delegation token for service X", like the Kerberos TGS?), but that's not 
the case.
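
    To make the "burden on clients" concrete: each Hadoop-style service has its own token-fetching call, and the client has to find, invoke, and ship the tokens itself. A rough sketch of what that looks like for HDFS (assumes a kerberized cluster and the `hadoop-client` dependency; the `renewer` value is illustrative, e.g. the YARN RM principal):

    ```scala
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.FileSystem
    import org.apache.hadoop.security.Credentials

    // Sketch only: fetch HDFS delegation tokens into a Credentials bag
    // that the client must then serialize and distribute to its executors.
    def fetchHdfsTokens(conf: Configuration, renewer: String): Credentials = {
      val creds = new Credentials()
      val fs = FileSystem.get(conf)
      // Service-specific call; there is no generic "token for service X" endpoint.
      fs.addDelegationTokens(renewer, creds)
      creds
    }
    ```

    Every other tokened service (Hive, HBase, etc.) needs its own equivalent of this, which is exactly the per-service plumbing a Kerberos-TGS-like token service would remove.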
    
    I can think of 3 other types of systems that Spark supports (directly or 
through extensions):
    
    - Those with no security, e.g. Kudu (as far as I know, security is still on 
its roadmap), Kafka 0.8, etc.
    - Those that are happy with just a simple secret stashed somewhere. e.g. 
S3, JDBC drivers, etc. Even though I've never seen it, I also count cert-based 
authentication here, since I'm pretty sure you can achieve that with existing 
features in Spark.
    - Systems that implemented Kerberos-based auth but not Hadoop delegation 
tokens. (Looking at you, Kafka 0.10.) That means it's really hard to use those 
services in a distributed environment where security is enabled.
    
    None of those require code in Spark to handle things specially (the third 
would, but then you'd run into the good reasons why delegation tokens exist in 
the first place, so those services really should start using them instead).
    
    > If we are not exposing an api for spark core, while maintaining backward 
compatibility
    
    I guess I'm a little less queasy than you are about exposing an unstable 
API. That's what the "Unstable" annotation means to me: an API that is still 
being designed, where exposing it serves both to let people write extensions 
that fit the model and to collect feedback about things that don't fit well.
    
    I have issues with the things you're suggesting for a few different reasons.
    
    - keeping the API private means people won't extend it, so we won't get any 
feedback.
    
    - moving the API to a separate module is a distinction without a 
difference. It will still be a public Spark API, and still should follow the 
rules of backwards compatibility. It would just increase coupling since it's 
very unlikely that core wouldn't call into that module (since many people have 
asked for Hadoop auth support in standalone too - I have some issues with the 
security model there but it's a separate discussion).
    
    - trying to work on an abstract interface to rule them all is a wild goose 
chase. Unless you can point me to the contrary, we don't have an example of 
what a different system would look like, so whatever abstract interface we end 
up with will still be heavily modeled after the Hadoop system. Not exposing 
Hadoop types is not a great gain if the whole mechanism still works like Hadoop 
security. (It makes Spark's handling of backwards compatibility easier, 
probably, but here the model is more important than the types exposed in the 
API.)
    
    Yes, exposing an unstable interface risks more work in the future. We may 
have to change it, and then we have to decide whether to write code to keep 
compatibility, or break all users. It's a risk, but at the same time, we have a 
few years of only needing to support this model, so it doesn't seem like it's 
that big of a risk to me.
    
    If this elusive other system does show up in the future, we'll probably 
have bigger issues to deal with; we could try to merge the two APIs, or just 
handle the new system with a completely separate one. In either case, there will be work. So I'm 
really not seeing the benefit of going out of our way to mitigate future work. 
It's better to do that work when we have a better idea of what it even looks 
like.


