[ 
https://issues.apache.org/jira/browse/YARN-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937809#comment-13937809
 ] 

Jason Lowe commented on YARN-1842:
----------------------------------

I wonder whether this is a case where the NM or AM somehow failed to heartbeat 
and was expired from the RM's point of view.  At that point the RM will ask the 
NM to kill all containers when it resyncs, and it will already have cleaned up 
its bookkeeping for the AM (hence the unknown app attempt).  The RM log should 
shed some light on what happened there.

Normally when an AM is told to "go away" by the RM there will be a subsequent 
AM attempt following it up (assuming there are app attempt retries left).  In 
those cases the AM attempt should leave without causing any damage to 
subsequent attempts (e.g. it should not clean up staging areas, since that 
would prevent subsequent attempts from launching).  However, if the attempt is 
the last one, it should go ahead and perform its normal shutdown cleanup, as 
there will be no subsequent attempts to clean up the mess.
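
As a rough illustration of that rule, here is a minimal sketch in plain Java. 
It is not YARN code: the class AmShutdownPolicy and its method shouldCleanUp 
are hypothetical names, and it assumes the AM already knows its own 1-based 
attempt number and the maximum number of attempts allowed for the app.  How 
those two values are obtained is left out.

```java
/**
 * Hypothetical helper, not part of the YARN API.
 * Encodes the rule: only the last AM attempt performs destructive cleanup.
 */
final class AmShutdownPolicy {

    private AmShutdownPolicy() {
        // static helper, no instances
    }

    /**
     * @param attemptNumber 1-based number of the attempt that is exiting
     *                      (assumed to be known by the caller)
     * @param maxAttempts   maximum attempts allowed for the application
     *                      (assumed to be known by the caller)
     * @return true if this attempt should remove staging areas and other
     *         shared state; false if a later attempt may still need them
     */
    static boolean shouldCleanUp(int attemptNumber, int maxAttempts) {
        if (attemptNumber < 1 || maxAttempts < 1) {
            throw new IllegalArgumentException(
                "attemptNumber and maxAttempts must both be >= 1");
        }
        // No retries left: nobody else will clean up, so do it now.
        return attemptNumber >= maxAttempts;
    }
}
```

With this sketch, an AM on attempt 1 of 2 that is told to go away would skip 
the staging cleanup and simply exit, while attempt 2 of 2 would clean up 
before exiting.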

> InvalidApplicationMasterRequestException raised during AM-requested shutdown
> ----------------------------------------------------------------------------
>
>                 Key: YARN-1842
>                 URL: https://issues.apache.org/jira/browse/YARN-1842
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.2.0
>            Reporter: Steve Loughran
>
> Report of the RM raising a stack trace 
> [https://gist.github.com/matyix/9596735] during AM-initiated shutdown. The AM 
> could just swallow this and exit, but it could be a sign of a race condition 
> YARN-side, or maybe just in the RM client code/AM dual signalling the 
> shutdown. 
> I haven't replicated this myself; maybe the stack will help track down the 
> problem. Otherwise: what policy should YARN apps adopt for AMs handling 
> errors on shutdown? Go straight to an exit(-1)?



