[ 
https://issues.apache.org/jira/browse/YARN-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10754:
--------------------------
    Description: 
As described in YARN-9768:

Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews HDFS 
tokens received to check for validity and expiration time.

This call is made to an underlying HDFS NN or Router Node (which exposes the 
same APIs as the HDFS NN). If one of these nodes is unhealthy and the renew 
call hangs, the thread remains stuck indefinitely. Ideally, the thread should 
time out the renewToken call and retry from the client's perspective.

However, YARN-9768 only considers app recovery, not newly submitted apps:

!image-2021-04-27-11-38-29-162.png|width=516,height=428!

As a result, a newly submitted app is not retried when the token renewal 
(against the HDFS NameNode or Router) times out.
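
To illustrate the intended behaviour, here is a minimal Java sketch of a renew 
call guarded by a per-attempt timeout and a bounded number of retries. It is not 
the code from YARN-10754.001.patch or DelegationTokenRenewer.java: the class 
RenewWithTimeout, the method renewWithRetry and its parameters are hypothetical 
names chosen for this example. The assumption is that both the recovery path 
and the submission path would route their renew call through one such helper.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; not the actual DelegationTokenRenewer code. */
final class RenewWithTimeout {

  /**
   * Runs renewCall with a per-attempt timeout and retries when an attempt
   * times out. Returns the value produced by the first attempt that finishes.
   */
  static long renewWithRetry(Callable<Long> renewCall, long timeoutMs,
      int maxAttempts) throws Exception {
    // Daemon threads so a permanently stuck call cannot block JVM exit.
    ExecutorService pool = Executors.newCachedThreadPool(r -> {
      Thread t = new Thread(r, "renew-sketch");
      t.setDaemon(true);
      return t;
    });
    try {
      for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        Future<Long> future = pool.submit(renewCall);
        try {
          return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
          // The NN/Router did not answer in time: interrupt and retry.
          future.cancel(true);
        } catch (ExecutionException e) {
          // The call failed for a reason other than a timeout: rethrow it.
          Throwable cause = e.getCause();
          throw (cause instanceof Exception) ? (Exception) cause : e;
        }
      }
      throw new TimeoutException(
          "renew timed out after " + maxAttempts + " attempts");
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    AtomicInteger calls = new AtomicInteger();
    // Simulated renew: the first call hangs, the second one answers.
    long expiry = renewWithRetry(() -> {
      if (calls.incrementAndGet() == 1) {
        Thread.sleep(5000);
      }
      return 42L;
    }, 200, 3);
    System.out.println("expiry=" + expiry + " attempts=" + calls.get());
  }
}
```

In this sketch a hung call costs at most timeoutMs per attempt, and the caller 
sees a TimeoutException only after all attempts are exhausted, regardless of 
whether the app is being recovered or has just been submitted.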

  was:
As 

Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews HDFS 
tokens received to check for validity and expiration time.

This call is made to an underlying HDFS NN or Router Node (which has exact APIs 
as HDFS NN). If one of the nodes is bad and the renew call is stuck the thread 
remains stuck indefinitely. The thread should ideally timeout the renewToken 
and retry from the client's perspective.


> RM Renew Delegation token thread should timeout and retry should also 
> consider app new submitted.
> -------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10754
>                 URL: https://issues.apache.org/jira/browse/YARN-10754
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Qi Zhu
>            Assignee: Qi Zhu
>            Priority: Major
>         Attachments: YARN-10754.001.patch, image-2021-04-27-11-38-29-162.png
>
>
> As described in YARN-9768:
> Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews 
> HDFS tokens received to check for validity and expiration time.
> This call is made to an underlying HDFS NN or Router Node (which exposes the 
> same APIs as the HDFS NN). If one of these nodes is unhealthy and the renew 
> call hangs, the thread remains stuck indefinitely. Ideally, the thread should 
> time out the renewToken call and retry from the client's perspective.
> However, YARN-9768 only considers app recovery, not newly submitted apps:
> !image-2021-04-27-11-38-29-162.png|width=516,height=428!
> As a result, a newly submitted app is not retried when the token renewal 
> (against the HDFS NameNode or Router) times out.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
