[ https://issues.apache.org/jira/browse/YARN-8710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16640248#comment-16640248 ]

Suma Shivaprasad commented on YARN-8710:
----------------------------------------

Thanks [~shuzirra] for the review. The total retry count can be configured by
setting both "yarn.resourcemanager.am.max-attempts" and
"yarn.service.am-restart.max-attempts", so that AM retries can be scheduled on
other nodes. This avoids delays caused by repeatedly trying to schedule on a
node that may be unreachable or unhealthy. If a service user still needs
infinite NM retries, they can override this behaviour by explicitly setting
yarn.service.container-failure.retry.max in the YARN service spec.
Please let me know if you still have any concerns about the patch.
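A minimal sketch of that override, assuming the Hadoop 3.x YARN service spec JSON layout (component-level "restart_policy" and "configuration" -> "properties"). Only the property key yarn.service.container-failure.retry.max comes from this discussion; the service name, component name, launch command, resources and the helper build_service_spec are hypothetical placeholders. The script just builds the spec as a Python dict and prints it as JSON.

```python
import json


def build_service_spec(retry_max: int) -> dict:
    """Return a minimal YARN service spec dict that sets the NM-local
    container retry limit for a single component.

    Hypothetical example: names, command and resources are placeholders;
    the layout assumes the Hadoop 3.x service spec format.
    """
    return {
        "name": "example-service",
        "version": "1.0.0",
        "components": [
            {
                "name": "worker",
                "number_of_containers": 1,
                "launch_command": "sleep 3600",
                "resource": {"cpus": 1, "memory": "256"},
                "restart_policy": "ON_FAILURE",
                "configuration": {
                    "properties": {
                        # -1 keeps unlimited NM-local retries;
                        # a finite value (e.g. 3) bounds them.
                        "yarn.service.container-failure.retry.max": str(retry_max)
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Explicitly opt back into infinite NM retries for this component.
    print(json.dumps(build_service_spec(-1), indent=2))
```

The value is written as a string because service spec configuration properties are string-valued.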

> Service AM should set a finite limit on NM container max retries 
> -----------------------------------------------------------------
>
>                 Key: YARN-8710
>                 URL: https://issues.apache.org/jira/browse/YARN-8710
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn-native-services
>            Reporter: Suma Shivaprasad
>            Assignee: Suma Shivaprasad
>            Priority: Major
>         Attachments: YARN-8710.1.patch
>
>
> Container retries are currently set to a default of -1 in
> AbstractProviderService.buildContainerRetry. If this is not overridden via
> the service spec with a finite value for
> yarn.service.container-failure.retry.max, this causes infinite NM retries for
> the container under the ALWAYS/ON_FAILURE restart policies. Ideally, it
> should retry a finite number of times on the same NM, after which the Service
> AM can retry on another node.
> We can set this to a default value of 3.
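To make the difference between -1 and a finite limit concrete, here is a toy Python model (not YARN code) of a node on which every container launch fails. The names UNLIMITED, should_relaunch_on_same_nm and launches_on_failing_node are hypothetical, and the sketch assumes the limit counts relaunches after the initial launch.

```python
UNLIMITED = -1  # the current default discussed in this issue


def should_relaunch_on_same_nm(retries_so_far: int, max_retries: int) -> bool:
    """Toy rule: relaunch locally while under the limit, or forever if unlimited."""
    return max_retries == UNLIMITED or retries_so_far < max_retries


def launches_on_failing_node(max_retries: int, sim_cap: int = 1000) -> int:
    """Count launches on a node where every attempt fails.

    With a finite limit the NM stops after 1 initial launch plus
    max_retries relaunches, and the Service AM could then try another
    node. With UNLIMITED the loop never ends on its own, so sim_cap
    bounds the simulation.
    """
    launches = 1  # initial launch
    retries = 0
    while should_relaunch_on_same_nm(retries, max_retries) and launches < sim_cap:
        retries += 1
        launches += 1
    return launches


if __name__ == "__main__":
    print(launches_on_failing_node(3))          # → 4
    print(launches_on_failing_node(UNLIMITED))  # → 1000 (stopped only by sim_cap)
```

With the proposed default of 3, a container on an unhealthy node is abandoned after a bounded number of local attempts instead of being relaunched there indefinitely.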



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
