Jason Lowe commented on YARN-4254:

True, registering could take significantly longer if DNS is slow.  However, IIRC 
the NameNode also resolves datanodes when they register and rejects datanodes 
that cannot be resolved, so I believe there is precedent for it.  Out of 
curiosity, was this new node added as a datanode as well, and if so, what did 
the NameNode do?
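
To make the registration-time check concrete, here is a minimal sketch in Java. 
It is not the NameNode or ResourceManager code; the class RegistrationCheck, the 
Resolver hook, and acceptRegistration are hypothetical names used only for 
illustration. The only real API assumed is InetAddress.getByName, which throws 
UnknownHostException when a name cannot be resolved.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class RegistrationCheck {
    /** Hypothetical resolver hook so the check can be exercised without real DNS. */
    interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    /**
     * Returns true if the registering node's hostname resolves, false otherwise.
     * A real server would reject the registration in the false case.
     */
    static boolean acceptRegistration(String host, Resolver resolver) {
        try {
            resolver.resolve(host);
            return true;
        } catch (UnknownHostException e) {
            // The name is not resolvable from the server's point of view.
            return false;
        }
    }

    public static void main(String[] args) {
        // Use the JDK resolver; a slow DNS server makes this call slow too.
        Resolver dns = InetAddress::getByName;
        System.out.println(acceptRegistration("localhost", dns));
    }
}
```

Note that the check only reflects resolvability at registration time, so a node 
that later becomes unresolvable would still pass it.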

Anyway, we don't have to do registration rejection as part of this JIRA, and 
even with that fix it wouldn't solve the problem if the node was resolvable 
when it joined but not when the AM launched.  The real issue for this JIRA is 
why it retried forever on a bad nodename resolution.  Did it really retry 
forever, or was it a case of something like YARN-3208, where it would eventually 
complete but only after a very long time due to retries at multiple levels?
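
To illustrate how retries at multiple levels compound, here is a small Java 
sketch. It does not reproduce the actual YARN or IPC retry policies; the class 
NestedRetryDemo, the helper withRetries, and the attempt counts (5 and 10) are 
assumptions chosen for illustration. It shows that when an outer layer retries 
an operation that itself retries internally, the number of underlying attempts 
is the product of the two limits, which can look like "forever" once per-attempt 
timeouts and sleeps are included.

```java
import java.net.UnknownHostException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class NestedRetryDemo {
    /** Runs op up to maxAttempts times, rethrowing the last failure. */
    static <T> T withRetries(int maxAttempts, Callable<T> op) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    /** Counts how often the innermost call runs when it always fails. */
    static int countAttempts(int outerAttempts, int innerAttempts) {
        AtomicInteger calls = new AtomicInteger();
        // Stand-in for a connection attempt to an unresolvable node (hypothetical name).
        Callable<Void> alwaysUnresolvable = () -> {
            calls.incrementAndGet();
            throw new UnknownHostException("nm6.example.invalid");
        };
        try {
            withRetries(outerAttempts, () -> withRetries(innerAttempts, alwaysUnresolvable));
        } catch (Exception expected) {
            // Every attempt failed, which is the case being measured.
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(countAttempts(5, 10)); // prints 50
    }
}
```

With a sleep of a few seconds per inner attempt, even modest limits at each 
layer add up to minutes per outer operation.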

> ApplicationAttempt stuck for ever due to UnknownHostException
> -------------------------------------------------------------
>                 Key: YARN-4254
>                 URL: https://issues.apache.org/jira/browse/YARN-4254
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>         Attachments: 0001-YARN-4254.patch
> Scenario
> =======
> 1. RM HA and 5 NMs are available in the cluster and working fine.
> 2. Add one more NM to the same cluster without updating /etc/hosts on the RM.
> 3. Submit an application to the same cluster.
> If the AM gets allocated to the newly added NM, the *application attempt will 
> get stuck forever*. The user will not get to know why this happened.
> Impact
> 1. RM logs get flooded with exceptions.
> 2. The application gets stuck forever.
> Handling suggestion: YARN-261 allows failing an application attempt.
> If we fail the attempt, the next attempt could get assigned to another NM.

