[jira] [Updated] (YARN-3480) Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable
[ https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Gong updated YARN-3480:
---
Description:
When RM HA is enabled and running containers are kept across attempts, apps are more likely to finish successfully with more retries (attempts), so it is better to set 'yarn.resourcemanager.am.max-attempts' to a larger value. However, this makes RMStateStore (FileSystem/HDFS/ZK) store more attempts and makes the RM recovery process much slower. It might be better to make the maximum number of attempts stored in RMStateStore configurable.

BTW: When 'attemptFailuresValidityInterval' (introduced in YARN-611) is set to a small value, the number of retried attempts might be very large. So we need to delete some of the attempts stored in RMAppImpl and RMStateStore.

was:
When RM HA is enabled and running containers are kept across attempts, apps are more likely to finish successfully with more retries (attempts), so it is better to set 'yarn.resourcemanager.am.max-attempts' to a larger value. However, this makes RMStateStore (FileSystem/HDFS/ZK) store more attempts and makes the RM recovery process much slower. It might be better to make the maximum number of attempts stored in RMStateStore configurable.

Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable

Key: YARN-3480
URL: https://issues.apache.org/jira/browse/YARN-3480
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Jun Gong
Assignee: Jun Gong
Attachments: YARN-3480.01.patch, YARN-3480.02.patch

When RM HA is enabled and running containers are kept across attempts, apps are more likely to finish successfully with more retries (attempts), so it is better to set 'yarn.resourcemanager.am.max-attempts' to a larger value. However, this makes RMStateStore (FileSystem/HDFS/ZK) store more attempts and makes the RM recovery process much slower. It might be better to make the maximum number of attempts stored in RMStateStore configurable.

BTW: When 'attemptFailuresValidityInterval' (introduced in YARN-611) is set to a small value, the number of retried attempts might be very large. So we need to delete some of the attempts stored in RMAppImpl and RMStateStore.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3480) Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable
[ https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Gong updated YARN-3480:
---
Attachment: YARN-3480.03.patch

Updated the patch. Fixed the javac warning, checkstyle, and test case errors.

Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable

Key: YARN-3480
URL: https://issues.apache.org/jira/browse/YARN-3480
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Jun Gong
Assignee: Jun Gong
Attachments: YARN-3480.01.patch, YARN-3480.02.patch, YARN-3480.03.patch

When RM HA is enabled and running containers are kept across attempts, apps are more likely to finish successfully with more retries (attempts), so it is better to set 'yarn.resourcemanager.am.max-attempts' to a larger value. However, this makes RMStateStore (FileSystem/HDFS/ZK) store more attempts and makes the RM recovery process much slower. It might be better to make the maximum number of attempts stored in RMStateStore configurable.

BTW: When 'attemptFailuresValidityInterval' (introduced in YARN-611) is set to a small value, the number of retried attempts might be very large. So we need to delete some of the attempts stored in RMAppImpl and RMStateStore.
[jira] [Updated] (YARN-3480) Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable
[ https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Gong updated YARN-3480:
---
Attachment: YARN-3480.02.patch

Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable

Key: YARN-3480
URL: https://issues.apache.org/jira/browse/YARN-3480
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Jun Gong
Assignee: Jun Gong
Attachments: YARN-3480.01.patch, YARN-3480.02.patch

When RM HA is enabled and running containers are kept across attempts, apps are more likely to finish successfully with more retries (attempts), so it is better to set 'yarn.resourcemanager.am.max-attempts' to a larger value. However, this makes RMStateStore (FileSystem/HDFS/ZK) store more attempts and makes the RM recovery process much slower. It might be better to make the maximum number of attempts stored in RMStateStore configurable.
[jira] [Updated] (YARN-3480) Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable
[ https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Gong updated YARN-3480:
---
Summary: Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable (was: Make AM max attempts stored in RMStateStore to be configurable)

Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable

Key: YARN-3480
URL: https://issues.apache.org/jira/browse/YARN-3480
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Jun Gong
Assignee: Jun Gong
Attachments: YARN-3480.01.patch

When RM HA is enabled and running containers are kept across attempts, apps are more likely to finish successfully with more retries (attempts), so it is better to set 'yarn.resourcemanager.am.max-attempts' to a larger value. However, this makes RMStateStore (FileSystem/HDFS/ZK) store more attempts and makes the RM recovery process much slower. It might be better to make the maximum number of attempts stored in RMStateStore configurable.
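
The idea in the description, capping how many attempts are kept so that recovery does not have to replay an unbounded history, could look roughly like the sketch below. It is only an illustration under stated assumptions: the class `BoundedAttemptStore`, its methods, and the `maxStoredAttempts` parameter are hypothetical names and are not taken from the attached patches or from the actual RMStateStore or RMAppImpl code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical sketch: keeps at most maxStoredAttempts attempt ids and
 * evicts the oldest ones first. Not the real RMStateStore API.
 */
final class BoundedAttemptStore {
    // Assumed configurable cap on the number of stored attempts.
    private final int maxStoredAttempts;
    // Attempt ids in insertion order; the head is the oldest attempt.
    private final Deque<String> attempts = new ArrayDeque<>();

    BoundedAttemptStore(int maxStoredAttempts) {
        if (maxStoredAttempts < 1) {
            throw new IllegalArgumentException("maxStoredAttempts must be >= 1");
        }
        this.maxStoredAttempts = maxStoredAttempts;
    }

    /**
     * Records a new attempt and returns the ids that were evicted to stay
     * within the cap, so a caller could also delete them from a backing store.
     */
    List<String> storeAttempt(String attemptId) {
        attempts.addLast(attemptId);
        List<String> evicted = new ArrayList<>();
        while (attempts.size() > maxStoredAttempts) {
            evicted.add(attempts.removeFirst());
        }
        return evicted;
    }

    /** Returns the attempt ids currently kept, oldest first. */
    List<String> storedAttempts() {
        return new ArrayList<>(attempts);
    }
}
```

With a cap of 2, storing three attempts in a row would leave only the two most recent ones, and the first would be reported as evicted. Returning the evicted ids is a design choice of this sketch: it lets the in-memory view and a persistent store be trimmed together.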