[ https://issues.apache.org/jira/browse/YARN-8710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16648298#comment-16648298 ]
Suma Shivaprasad commented on YARN-8710:
----------------------------------------

Thanks [~billie.rinaldi]

> Service AM should set a finite limit on NM container max retries
> -----------------------------------------------------------------
>
>                 Key: YARN-8710
>                 URL: https://issues.apache.org/jira/browse/YARN-8710
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn-native-services
>            Reporter: Suma Shivaprasad
>            Assignee: Suma Shivaprasad
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: YARN-8710.1.patch, YARN-8710.2.patch
>
>
> Container retries currently default to -1 in
> AbstractProviderService.buildContainerRetry. Unless the service spec
> overrides this with a finite value for
> yarn.service.container-failure.retry.max, the NM retries the container
> indefinitely under the ALWAYS/ON_FAILURE restart policy. Ideally the
> container should be retried a finite number of times on the same NM, after
> which the Service AM can retry on another node.
> We can set the default value to 3.
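
The proposed default can be illustrated with a minimal, self-contained sketch. It is not the code from the attached patches: the class RetryDefaults and the method resolveMaxRetries are hypothetical names, and only the property name and the default of 3 come from the issue description. The sketch assumes that an explicitly configured value, including -1, is passed through unchanged.

```java
import java.util.Map;

public class RetryDefaults {
    // Property name taken from the issue description.
    public static final String RETRY_MAX_KEY = "yarn.service.container-failure.retry.max";
    // Finite default proposed in the issue description.
    public static final int DEFAULT_RETRY_MAX = 3;

    // Hypothetical helper: picks the NM retry limit from service-spec properties.
    // Falls back to the finite default when the property is absent or blank,
    // instead of the previous default of -1 (retry indefinitely).
    public static int resolveMaxRetries(Map<String, String> props) {
        String raw = props.get(RETRY_MAX_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_RETRY_MAX;
        }
        // Assumption: an explicit value is honored as-is, even -1.
        return Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        // No override in the spec: the finite default applies.
        System.out.println(resolveMaxRetries(Map.of()));
        // Explicit override in the spec.
        System.out.println(resolveMaxRetries(Map.of(RETRY_MAX_KEY, "10")));
    }
}
```

With a finite limit, the NM gives up after that many local attempts, and the Service AM can then request the container on another node.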