PrabhuJoseph commented on pull request #33674:
URL: https://github.com/apache/spark/pull/33674#issuecomment-989729420


   @Shockang I have tested the patch, but it does not address the reported issue. The Spark job runs in yarn-client mode, where the Driver runs inside the Client process. The Driver creates a new JobConf in HadoopRDD for every partition, and each JobConf internally fetches a FileSystem delegation token. So with 1000 partitions, the delegation token is fetched 1000 times.
   
   The Spark Client obtains the FileSystem delegation token at startup (Client.scala, setupSecurityToken), writes it into the token file, and passes it to the Spark ApplicationMaster and Executors to use. The Client itself, however, keeps using different credentials, which do not contain the FileSystem delegation token because the Client authenticates with its TGT (see SPARK-15754).
   
   As a result, every call the Driver (in client mode) makes to list a path creates a separate JobConf and adds the Client's credentials to it. Those credentials lack a FileSystem token, so each call obtains a new one.
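   A simplified model of the behavior described above, in Scala. This is a sketch, not Spark or Hadoop code: `FakeNameNode`, `listPartition`, and the alias `hdfs-delegation-token` are hypothetical stand-ins, and credentials are modeled as a plain `Map`. Each listing starts from the client user's credentials; because those come from a TGT login and carry no FileSystem token, every listing asks the stub for a new one.

```scala
object PerPartitionTokenFetch {
  // Hypothetical alias; real Hadoop credentials key tokens by a Text alias.
  val FsTokenAlias = "hdfs-delegation-token"

  // Hypothetical NameNode stub that only counts how many tokens it issues.
  final class FakeNameNode {
    var issued = 0
    def getDelegationToken(): String = { issued += 1; s"token-$issued" }
  }

  // One driver-side listing: a fresh JobConf-like credential set is seeded
  // from the client user's credentials, and a token is fetched if none is present.
  // The returned map is discarded by callers, mirroring a per-partition JobConf.
  def listPartition(clientCreds: Map[String, String], nn: FakeNameNode): Map[String, String] =
    if (clientCreds.contains(FsTokenAlias)) clientCreds
    else clientCreds + (FsTokenAlias -> nn.getDelegationToken())

  def main(args: Array[String]): Unit = {
    val nn = new FakeNameNode
    val clientCreds = Map.empty[String, String] // TGT login: no FS delegation token
    (1 to 1000).foreach(_ => listPartition(clientCreds, nn))
    println(s"tokens issued: ${nn.issued}") // tokens issued: 1000
  }
}
```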
   
   One simple fix is to expose a config that, when enabled, adds the obtained Hadoop FileSystem delegation tokens to the client user's credentials. This improves performance by fetching the delegation token only once when running a query on a partitioned table.
   
   Client.scala

       private val hadoopConf = new YarnConfiguration(SparkHadoopUtil.newConfiguration(sparkConf))
       private val isClusterMode = sparkConf.get("spark.submit.deployMode", "client") == "cluster"
     + private val useDelegationToken = sparkConf.getBoolean("spark.client.useDelegationToken", false)
       // AM related configurations
       private val amMemory = if (isClusterMode) {
       ...
         // and adding delegation tokens could lead to expired or cancelled tokens being used
         // later, as reported in SPARK-15754.
         val currentUser = UserGroupInformation.getCurrentUser()
     -   if (SparkHadoopUtil.get.isProxyUser(currentUser)) {
     +   if (SparkHadoopUtil.get.isProxyUser(currentUser) || useDelegationToken) {
     +     logInfo("Adding obtained Hadoop Delegation Tokens into User Credentials")
           currentUser.addCredentials(credentials)
         }
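   To illustrate the effect of the proposed flag, here is a hedged Scala model (again not Spark code). `FakeNameNode`, `run`, and `shouldAddToUser` are hypothetical; only the `isProxyUser || useDelegationToken` condition mirrors the patch. The model counts tokens issued in one client-mode run: one at Client startup, plus one per partition listing unless the obtained token was added to the client user's credentials.

```scala
object ClientTokenReuse {
  val FsTokenAlias = "hdfs-delegation-token" // hypothetical alias

  // Hypothetical NameNode stub that only counts issued tokens.
  final class FakeNameNode {
    var issued = 0
    def getDelegationToken(): String = { issued += 1; s"token-$issued" }
  }

  // Same condition as the patch: proxy users, or the proposed opt-in flag.
  def shouldAddToUser(isProxyUser: Boolean, useDelegationToken: Boolean): Boolean =
    isProxyUser || useDelegationToken

  // Returns how many tokens the stub issued for one client-mode run.
  def run(numPartitions: Int, isProxyUser: Boolean, useDelegationToken: Boolean): Int = {
    val nn = new FakeNameNode
    // Client startup (setupSecurityToken analogue): one token for the AM and executors.
    val obtained = Map(FsTokenAlias -> nn.getDelegationToken())
    // The client user's own credentials: empty unless the obtained tokens are added.
    val userCreds =
      if (shouldAddToUser(isProxyUser, useDelegationToken)) obtained
      else Map.empty[String, String]
    // Driver-side listing: each partition's JobConf-like credentials are seeded
    // from userCreds, so a token is fetched only when userCreds lacks one.
    (1 to numPartitions).foreach { _ =>
      if (!userCreds.contains(FsTokenAlias)) nn.getDelegationToken()
    }
    nn.issued
  }

  def main(args: Array[String]): Unit = {
    println(run(1000, isProxyUser = false, useDelegationToken = false)) // 1001
    println(run(1000, isProxyUser = false, useDelegationToken = true))  // 1
  }
}
```

   In this model, 1000 partitions cost 1001 token requests without the flag; with it, the startup token is reused and only one is requested.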
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


