GitHub user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/4688#issuecomment-97183253
  
    > So I noticed that if dfs.namenode.delegation.token.renew-interval is
different from the max lifetime of the token, a lot of exceptions get thrown
around about the token being expired, and the executors may not be able to
read the new tokens. It looks like the tokens don't get renewed if HDFS is not
accessed before the renew interval, so an executor that accesses HDFS
rarely enough may not be able to read from HDFS.
    
    > So instead of waiting till 80% of max lifetime, I wait till 0.75 *
dfs.namenode.delegation.token.renew-interval to renew. This means that the
hdfs-site.xml file must be in sync with the one on the namenode (my
understanding is this param's value is rarely changed, so this is unlikely to
be an issue at all).
    
    Thanks for the updates and the details on testing.
    
    My guess is that after the initial expiration period the YARN RM
isn't renewing the tokens anymore, since it doesn't get the updated ones (it
only has the ones you initially submitted the application with). Thus, in order
for a token to stay good for longer than 1 day, you either have to renew it or
do the loginFromKeytab like you mention.
    
    So you could change this to renew until the max lifetime and then do the
loginFromKeytab. I don't think doing the loginFromKeytab is going to add much
more overhead than doing the renewal, so I'm OK with leaving this as is, doing
the loginFromKeytab before the renewal period ends. We could always change it
later if we decide to.
    
    I'd rather not use the dfs.namenode.delegation.token.renew-interval config.
As you say, the value on the gateway might not match what the namenode is
using. You can get the renewal interval by doing a renew on the token once.
Then we can store that and do the loginFromKeytab at X% of it. Note that
addDelegationTokens in obtainTokensForNamenodes will return a list of tokens
that you could renew to get the period.
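
    To make the arithmetic concrete, here is a rough sketch in plain Java with no Hadoop dependency. It assumes you already have the expiration time returned by one renew call (Hadoop's Token.renew(Configuration) returns the new expiration as a long) and the time at which you made that call. The class and method names (RenewalSchedule, renewalIntervalMs, nextLoginTimeMs) are made up for illustration and are not part of Spark or Hadoop. Using the local call time as the reference point is also an assumption; the token's issue date may be a better reference.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not part of Spark or Hadoop.
final class RenewalSchedule {
    private RenewalSchedule() {}

    /**
     * Derives the renewal interval the NameNode actually applies.
     * renewCallTimeMs: when the renew call was made (assumed reference point).
     * renewedExpiryMs: the expiration time that the renew call returned.
     */
    static long renewalIntervalMs(long renewCallTimeMs, long renewedExpiryMs) {
        long interval = renewedExpiryMs - renewCallTimeMs;
        if (interval <= 0) {
            throw new IllegalArgumentException(
                "renewed expiry must be later than the renew call time");
        }
        return interval;
    }

    /**
     * Time at which to do the next keytab login: the reference time plus
     * the given fraction (the "X%") of the derived renewal interval.
     */
    static long nextLoginTimeMs(long renewCallTimeMs, long renewedExpiryMs, double fraction) {
        if (!(fraction > 0.0 && fraction <= 1.0)) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        long interval = renewalIntervalMs(renewCallTimeMs, renewedExpiryMs);
        return renewCallTimeMs + (long) (interval * fraction);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Stand-in for the value a real renew call would return;
        // one day is only an example interval.
        long renewedExpiry = now + TimeUnit.DAYS.toMillis(1);
        long next = nextLoginTimeMs(now, renewedExpiry, 0.75);
        System.out.println("Next keytab login in "
            + TimeUnit.MILLISECONDS.toHours(next - now) + " hours");
    }
}
```

    Because the interval comes from the namenode's own answer, nothing here depends on the gateway's hdfs-site.xml matching the namenode's.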
    
    I'll look through the rest of the code and leave any comments.


