Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/17723
  
    > As I detailed above, the exposed spi contract is not just the interface, 
but also how it is to be used by spark.
    
    There's very little leeway on that front. Those semantics are mostly 
already defined by Hadoop security; as I said, you can try to create some 
abstraction on top of that, but it would be abstraction for the sake of 
abstraction: the Hadoop implementation would basically be exactly this code, 
and we don't have an example of any other types of services to see if the 
abstraction actually makes sense for them.
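
    For reference, the Hadoop-flavored SPI being discussed here looks roughly 
    like the following (a sketch from memory of the `ServiceCredentialProvider` 
    developer API; exact names and signatures may differ from this PR):

    ```scala
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.security.Credentials
    import org.apache.spark.SparkConf

    // A provider fills Hadoop Credentials with delegation tokens for its
    // service and optionally reports when those tokens must be renewed.
    trait ServiceCredentialProvider {
      // Name used to enable or disable this provider via configuration.
      def serviceName: String

      // Whether tokens are needed under the current Hadoop configuration.
      def credentialsRequired(hadoopConf: Configuration): Boolean

      // Obtain tokens into `creds`; return the next renewal time, if any.
      def obtainCredentials(
          hadoopConf: Configuration,
          sparkConf: SparkConf,
          creds: Credentials): Option[Long]
    }
    ```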
    
    > It would ensure that spark is not tied to hadoop security
    
    There's nothing tying Spark to Hadoop security here. There's code that 
allows Spark to use Hadoop services with security enabled, but that does not 
preclude other types of services from being used. I've given many examples of 
that already.
    
    > Once interfaces are exposed, and have been exposed for a few releases, it 
is exceedingly difficult for users if we break/modify them
    
    Then we make an effort to not break them. Adding an abstraction that does 
not abstract anything useful does not help with that. If anything, it makes 
things worse.
    
    > // Modify ConfigurableCredentialManager to segregate by serviceType of 
ServiceTokenProvider
    
    See, you're already making assumptions that these other systems use 
credentials in a certain way.
    
    >   override def acquireTokens(manager: ConfigurableCredentialManager): 
Array[Byte] = {
    
    Now you're exposing `ConfigurableCredentialManager` as a public API, which 
it was never meant to be.
    
    > // Spark core itself becomes decoupled from hadoop-security with the 
above.
    
    It doesn't. You just created an interface that mimics the Hadoop security 
interfaces, and if you change the semantics of those, you'll end up breaking 
Hadoop security.
    
    > This ensures that future requirements are reasonably captured 
    
    No, you just created an interface that is Hadoop security without Hadoop 
types. That does not take into account any other service type. You're just 
assuming that they'll behave like Hadoop services.
    
    So, again, abstraction for the sake of abstraction. It is much simpler to 
    leave them as separate concerns instead of trying to do this. The 
Hadoop-based security interface exposes Hadoop semantics. If later there's a 
different type of service, it will probably need to expose a different public 
interface for services to provide their credentials. You're just guessing that 
they behave similarly enough to Hadoop that your proposed interface will 
suffice.
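
    To make that concrete, here's a sketch of what the only real 
    implementation of the proposed interface would look like (reconstructed 
    from the snippets quoted above; `ConfigurableCredentialManager` is stubbed 
    out here since it's internal to Spark). The `Array[Byte]` payload is still 
    just Hadoop's token serialization format:

    ```scala
    import java.io.{ByteArrayOutputStream, DataOutputStream}
    import org.apache.hadoop.security.Credentials

    // Stand-in for Spark's internal class referenced in the proposal above.
    class ConfigurableCredentialManager

    // The proposed "generic" provider, reconstructed from the quoted sketch.
    trait ServiceTokenProvider {
      def acquireTokens(manager: ConfigurableCredentialManager): Array[Byte]
    }

    // The only implementation we can write today wraps Hadoop types, so the
    // opaque byte array is really Hadoop's token wire format in disguise.
    class HadoopTokenProvider extends ServiceTokenProvider {
      override def acquireTokens(manager: ConfigurableCredentialManager): Array[Byte] = {
        val creds = new Credentials()
        // ... obtain delegation tokens from HDFS / Hive / HBase into creds ...
        val buffer = new ByteArrayOutputStream()
        creds.writeTokenStorageToStream(new DataOutputStream(buffer))
        buffer.toByteArray
      }
    }
    ```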
    
    Then, whether those two can be handled similarly inside Spark (so that 
    code can be shared) is a different question that can only be answered once 
    those services are known.
    
    >   // Invoked in executor
    >  def applyTokens(data: Array[Byte])
    
    This is maybe the only good idea I see in your proposal, but at the same 
    time, it's already handled internally by Spark with the current code and 
    doesn't need to be exposed to the credential provider implementations.
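
    For the Hadoop case, "applying tokens" on an executor amounts to something 
    like the following, which Spark already does on the executor side (a 
    minimal sketch using the standard Hadoop security APIs):

    ```scala
    import java.io.{ByteArrayInputStream, DataInputStream}
    import org.apache.hadoop.security.{Credentials, UserGroupInformation}

    object TokenHelper {
      // Deserialize tokens shipped from the driver and attach them to the
      // executor's current user, so Hadoop client libraries can see them.
      def applyTokens(data: Array[Byte]): Unit = {
        val creds = new Credentials()
        creds.readTokenStorageStream(
          new DataInputStream(new ByteArrayInputStream(data)))
        UserGroupInformation.getCurrentUser.addCredentials(creds)
      }
    }
    ```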
    
    I really don't understand what the abstraction is buying anybody at this 
point. If you don't know what other types of services you want to support, by 
definition you can't create an abstraction. So, as I pointed out, your 
    abstraction looks exactly like the only type of service it does support: 
    Hadoop. Which makes it kinda pointless as an abstraction.

