Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/17723
> The only public interface being added in this change is
ServiceCredentialProvider. It's an interface that service-specific libraries
(e.g. a Solr connector, or a Kudu connector) would extend, and user
applications would never touch. It's also an existing interface, which was in
the YARN module but, really, has functionality that is independent of YARN.
As I detailed above, the exposed spi contract is not just the interface,
but also how it is to be used by spark.
It includes the environment `obtainCredentials` is invoked within, use of
`Credentials` as container for tokens and how `Credentials` are leveraged in
driver/executor.
This, in entirety, defines the public contract - not just the methods in
the interface : and this is dependent on how hadoop-security
authenticates/authorizes services.
> And, on top of that, I don't see a point in abstracting it further. It
would just be abstraction for the sake of abstraction.
It would ensure that spark is not tied to hadoop security, but can continue
to leverage it for managing long running applications/services.
Once interfaces are exposed, and have been exposed for a few releases, it
is exceedingly difficult for users if we break/modify them - irrespective of
the stability annotation's we add.
A straw man pseudocode proposal which elaborates is:
```
@Unstable
// rename
trait BaseServiceTokenProvider[C, T] {
// Must match a supported ServiceTokenManager.serviceType
def serviceType: String
def serviceName: String
def credentialsRequired(conf: C): Boolean
def obtainCredentials(conf: C, tokenContainer: T): Option[Long]
}
// Existing ServiceCredentialProvider becomes
@Unstable
trait ServiceCredentialProvider extends
BaseServiceTokenProvider[Configuration, Credentials] {
override def serviceType: String =
ServiceTokenManager.HADOOP_SECURITY_SERVICE_TYPE
}
// Modify ConfigurableCredentialManager to segregate by serviceType of
ServiceTokenProvider
// which will be queried by ServiceTokenManager. If unknown serviceType,
then log error.
@Unstable
trait ServiceTokenManager {
val HADOOP_SECURITY_SERVICE_TYPE = "hadoop-security"
def serviceType: String
// Return false if validation fails. Invoked in driver.
def configure(conf: org.apache.spark.SparkConf): Boolean
// Invoked in driver. If required, application of tokens to driver to be
done as part of
// acquireTokens
def acquireTokens(manager: ConfigurableCredentialManager): Array[Byte]
// Invoked in executor
def applyTokens(data: Array[Byte])
}
// Existing implementation will be encapsulated within this class.
private[spark] class YarnServiceTokenManager extends ServiceTokenManager {
override def serviceType: String = HADOOP_SECURITY_SERVICE_TYPE
// In driver
override def configure(conf: SparkConf): Boolean = {
// save principal, keytab; create freshHadoopConf
// configure
true
}
override def acquireTokens(manager: ConfigurableCredentialManager):
Array[Byte] = {
val ugi = UGI.loginUserFromKeytabAndReturnUGI(principal, keytab)
val tempCreds = ugi.getCredentials
ugi.doAs {
manager.getCredentialProviders(serviceType).flatMap { provider =>
// existing code in ConfigurableCredentialManager.obtainCredentials
}.foldLeft(Long.MaxValue)(math.min)
}
// apply it to driver
UGI.currentUser.addCredentials(tempCreds)
// serialize tempCreds to Array[Byte]
serialize(tempCreds)
}
override def applyTokens(data: Array[Byte]): Unit = {
// deserialize data into Credentials; apply it to UGI.currentUser
}
}
// Spark core itself becomes decoupled from hadoop-security with the above.
// It also allows us to add other models if applicable.
// Changes to driver to leverage above would essentially be:
// a) load ServiceTokenManager's via ServiceLoader.
// b) create CredentialManager - which loads BaseServiceTokenProvider via
service loader.
// b.1) Reject any token provider which has a service type not loaded in (a)
// c) configure (a)
managers.foreach(_.configure(sparkConf))
// token acquisition
val acquiredTokenMap = managers.map(manager => (manager.serviceType,
manager.acquireTokens(credentialManager)))
// serialize and send acquiredTokenMap to executors.
// At executors, application is:
acquiredTokenMap.map {
case (managerType, tokenData) =>
serviceTokenManagers(managerType).applyTokens(tokenData)
}
```
This ensures that future requirements are reasonably captured without any
implementation specific changes required to the interface or behavior.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]