[ 
https://issues.apache.org/jira/browse/YARN-8372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496007#comment-16496007
 ] 

Vinod Kumar Vavilapalli commented on YARN-8372:
-----------------------------------------------

bq. DS app master should handle shutdown request properly whether to clean up 
or not based on the attempt number check.
This is not possible to do. The AM actually doesn't know what the last 
attempt-number is. See MAPREDUCE-5956 and YARN-2261 for background.

Maybe DS should (a) never clean up containers if the CLI flag 
{{keep_containers_across_application_attempts}} is true, and (b) clean up in 
the shut-down hook, as it does today, if 
{{keep_containers_across_application_attempts}} is false.
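
A minimal sketch of the (a)/(b) proposal above, not the actual DS code. It does not use any Hadoop classes: {{ShutdownCleanupSketch}}, {{cleanupOnShutdown}} and the container-id strings are hypothetical names made up for illustration, and the boolean simply mirrors the {{keep_containers_across_application_attempts}} flag.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the DS AM's shutdown handling (not Hadoop code). */
final class ShutdownCleanupSketch {
    // Assumption: mirrors the keep_containers_across_application_attempts CLI flag.
    private final boolean keepContainersAcrossAttempts;
    // Assumption: container ids tracked by this attempt, as plain strings.
    private final List<String> runningContainers = new ArrayList<>();
    private volatile boolean done = false;

    ShutdownCleanupSketch(boolean keepContainersAcrossAttempts,
                          List<String> containers) {
        this.keepContainersAcrossAttempts = keepContainersAcrossAttempts;
        this.runningContainers.addAll(containers);
    }

    /** Same as the callback quoted in the issue: only marks the AM as done. */
    void onShutdownRequest() {
        done = true;
    }

    /**
     * Invoked from the shutdown path. Returns the containers that were stopped.
     * (a) flag true: stop nothing, so a later attempt can recover them.
     * (b) flag false: stop everything, as the shutdown hook does today.
     */
    List<String> cleanupOnShutdown() {
        if (keepContainersAcrossAttempts) {
            return new ArrayList<>();
        }
        List<String> stopped = new ArrayList<>(runningContainers);
        runningContainers.clear();
        return stopped;
    }

    boolean isDone() {
        return done;
    }

    List<String> getRunningContainers() {
        return new ArrayList<>(runningContainers);
    }
}
```

The decision depends only on the flag, so the AM never needs to know whether the current attempt is the last one.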

> ApplicationAttemptNotFoundException should be handled correctly by 
> Distributed Shell App Master
> -----------------------------------------------------------------------------------------------
>
>                 Key: YARN-8372
>                 URL: https://issues.apache.org/jira/browse/YARN-8372
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: distributed-shell
>            Reporter: Charan Hebri
>            Assignee: Suma Shivaprasad
>            Priority: Major
>
> {noformat}
> try {
>   response = client.allocate(progress);
> } catch (ApplicationAttemptNotFoundException e) {
>   handler.onShutdownRequest();
>   LOG.info("Shutdown requested. Stopping callback.");
>   return;
> }{noformat}
> is a code snippet from AMRMClientAsyncImpl. The corresponding 
> onShutdownRequest implementation in the Distributed Shell App Master is:
> {noformat}
> @Override
> public void onShutdownRequest() {
>   done = true;
> }{noformat}
> Due to the above change, whenever an application attempt fails because of 
> an NM restart (the NM where the DS AM is running), an 
> ApplicationAttemptNotFoundException is thrown, and the AM kills all 
> containers for that attempt, including those running on other NMs, and 
> they are marked as COMPLETE. The subsequent attempt then spawns new 
> containers, just as a fresh application would. This differs from a 
> MapReduce application, where the containers are not killed.
> cc [~rohithsharma]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
