Bibin A Chundatt commented on YARN-4254:

Thanks for looking into the issue.
I had cancelled the patch. Sorry, I forgot to mention that in JIRA.
Looking further, I learned that the retry is meant for DNS-related cases. But the
attempt should still give up after a fixed period of time.
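
To illustrate what "give up after a fixed period" could look like, here is a minimal sketch. It is not the actual YARN retry code; the class name BoundedRetry, the method retryWithDeadline, and both timing parameters are hypothetical names chosen for this example.

```java
import java.net.UnknownHostException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Hadoop: retries an operation that may
// fail with UnknownHostException, but only until a fixed deadline.
public class BoundedRetry {

    public static <T> T retryWithDeadline(Callable<T> op, long maxWaitMs, long sleepMs)
            throws Exception {
        // Use the monotonic clock so wall-clock adjustments do not affect the deadline.
        long deadline = System.nanoTime() + maxWaitMs * 1_000_000L;
        while (true) {
            try {
                return op.call();
            } catch (UnknownHostException e) {
                // Deadline reached: stop retrying and surface the failure so the
                // caller can fail the attempt instead of looping forever.
                if (System.nanoTime() - deadline >= 0) {
                    throw e;
                }
                Thread.sleep(sleepMs);
            }
        }
    }
}
```

With something like this, a transient DNS problem is still retried, but a persistent one ends in a failed attempt that could then be rescheduled on another NM.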

For this JIRA:
Would it make more sense if the RM simply refused to accept nodemanagers into 
the cluster that are unresolvable?

This solution sounds good.
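
A rough sketch of such a check at registration time, assuming the RM has the NM's host and port at hand. The class NodeAddressCheck and the method isResolvable are hypothetical names for this example, not existing YARN code; only java.net.InetSocketAddress is a real API here.

```java
import java.net.InetSocketAddress;

// Hypothetical helper, not part of Hadoop: reports whether a node's
// hostname can be resolved from the machine running this code.
public class NodeAddressCheck {

    public static boolean isResolvable(String host, int port) {
        // The InetSocketAddress constructor attempts resolution; if it
        // fails, the address is flagged as unresolved instead of throwing.
        return !new InetSocketAddress(host, port).isUnresolved();
    }
}
```

The RM could reject a registration when isResolvable returns false, so containers are never allocated on a node it cannot reach by name.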


Also the fact that we try forever seems broken to me. We should be giving up at 
some point and failing the attempt, whether that be due to unknown host 
exceptions or other persistent errors.
I will try to find out further why the timeout is not happening or the same is not

> ApplicationAttempt stuck for ever due to UnknowHostexception
> ------------------------------------------------------------
>                 Key: YARN-4254
>                 URL: https://issues.apache.org/jira/browse/YARN-4254
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>         Attachments: 0001-YARN-4254.patch
> Scenario
> =======
> 1. RM HA and 5 NMs available in cluster and are working fine 
> 2. Add one more NM to the same cluster, but the RM's /etc/hosts is not updated.
> 3. Submit application to the same cluster
> If the AM gets allocated to the newly added NM, the *application attempt will get
> stuck forever*. The user will not know why this happened.
> Impact
> 1. RM logs get overloaded with exceptions.
> 2. The application gets stuck forever.
> Handling suggestion: YARN-261 allows failing an application attempt.
> If we fail the attempt, the next attempt could get assigned to another NM.

This message was sent by Atlassian JIRA
