sunchao commented on a change in pull request #31761:
URL: https://github.com/apache/spark/pull/31761#discussion_r589881790



##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -691,6 +691,15 @@ package object config {
     .toSequence
     .createWithDefault(Nil)
 
+  private[spark] val KERBEROS_FILESYSTEM_RENEWAL_EXCLUDE =
+    ConfigBuilder("spark.kerberos.renewal.exclude.hadoopFileSystems")
+      .doc("The list of Hadoop filesystem URLs whose hosts will be excluded from " +

Review comment:
       It's hard to imagine this applying to defaultFS/stagingDir, since that
would mean the YARN cluster (or another type of cluster) is not configured to
be in the same Kerberos realm as those filesystems, which could cause other,
more serious issues. But, just as with
`spark.kerberos.access.hadoopFileSystems`, I guess nothing stops users from
putting the host for defaultFS/stagingDir in this config as well, and Spark
will just do what it's told to do.
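
To make the point concrete, here is a minimal sketch of host-based matching.
It is not the PR's code: `RenewalExcludeSketch`, `excludedHosts` and
`isExcluded` are hypothetical names, the host names are made up, and it parses
URLs with plain `java.net.URI` rather than resolving them through
`Path.getFileSystem(hadoopConf)` as the diff does. It only illustrates that a
match is decided by host alone, so a defaultFS or staging directory on a listed
host would be excluded like any other filesystem.

```scala
import java.net.URI

// Hypothetical sketch, not the PR's implementation: plain java.net.URI
// stands in for Hadoop's Path/FileSystem URI resolution.
object RenewalExcludeSketch {

  // Reduce each configured URL to its host; URLs without a host are dropped.
  def excludedHosts(urls: Seq[String]): Set[String] =
    urls.flatMap(u => Option(new URI(u).getHost)).toSet

  // A filesystem is excluded when its host is in the set; scheme, port and
  // path play no part in the decision.
  def isExcluded(fsUri: String, hosts: Set[String]): Boolean =
    Option(new URI(fsUri).getHost).exists(hosts.contains)

  def main(args: Array[String]): Unit = {
    // Made-up host standing in for a value of the proposed config.
    val hosts = excludedHosts(Seq("hdfs://nn1.example.com:8020"))

    // A staging directory on the same host matches as well.
    println(isExcluded("hdfs://nn1.example.com:8020/user/spark/.sparkStaging", hosts))
    // A filesystem on a different host does not.
    println(isExcluded("hdfs://nn2.example.com:8020/", hosts))
  }
}
```

Under these assumptions, nothing in the matching distinguishes defaultFS or
stagingDir from any other filesystem on the same host.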

##########
File path: core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala
##########
@@ -99,11 +100,24 @@ private[deploy] class HadoopFSDelegationTokenProvider
   private def fetchDelegationTokens(
       renewer: String,
       filesystems: Set[FileSystem],
-      creds: Credentials): Credentials = {
+      creds: Credentials,
+      hadoopConf: Configuration,
+      sparkConf: SparkConf): Credentials = {
+
+    // The hosts on which the file systems to be excluded from token renewal
+    val fsToExclude = sparkConf.get(KERBEROS_FILESYSTEM_RENEWAL_EXCLUDE)
+      .map(new Path(_).getFileSystem(hadoopConf).getUri.getHost)
+      .toSet
 
     filesystems.foreach { fs =>
-      logInfo(s"getting token for: $fs with renewer $renewer")
-      fs.addDelegationTokens(renewer, creds)
+      if (fsToExclude.contains(fs.getUri.getHost)) {
+        // RM skips renewing token with empty renewer

Review comment:
       Hmm, does this only apply to YARN, though? It seems Spark has its own
`HadoopDelegationTokenManager`, which is separate from YARN and has its own
renewal logic as well, so I'm not sure we even need to use "" for the renewer.



