[jira] [Updated] (YARN-542) Change the default global AM max-attempts value to be not one

2013-04-11 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-542:
-

Description: 
Today, the global AM max-attempts is set to 1, which is a bad choice. AM 
max-attempts accounts for both AM-level failures and container crashes 
due to localization issues, lost nodes, etc. To account for AM crashes due to 
problems that are not caused by user code, mainly lost nodes, we want to give 
AMs some retries.

I propose we change it to at least two. We could change it to 4 to match other 
retry configs.

  was:
Today, the AM max-retries is set to 1, which is a bad choice. AM max-retries 
accounts for both AM-level failures and container crashes due to 
localization issues, lost nodes, etc. To account for AM crashes due to problems 
that are not caused by user code, mainly lost nodes, we want to give AMs some 
retries.

I propose we change it to at least two. We could change it to 4 to match other 
retry configs.


 Change the default global AM max-attempts value to be not one
 -

 Key: YARN-542
 URL: https://issues.apache.org/jira/browse/YARN-542
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen

 Today, the global AM max-attempts is set to 1, which is a bad choice. AM 
 max-attempts accounts for both AM-level failures and container crashes 
 due to localization issues, lost nodes, etc. To account for AM crashes due to 
 problems that are not caused by user code, mainly lost nodes, we want to give 
 AMs some retries.
 I propose we change it to at least two. We could change it to 4 to match other 
 retry configs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-542) Change the default global AM max-attempts value to be not one

2013-04-11 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-542:
-

Summary: Change the default global AM max-attempts value to be not one  
(was: Change the default AM retry value to be not one)




[jira] [Updated] (YARN-542) Change the default global AM max-attempts value to be not one

2013-04-11 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-542:
-

Attachment: YARN-542.1.patch

I've drafted a patch, which includes the following modifications:

1. Change the default value of yarn.resourcemanager.am.max-attempts from 1 to 2.

2. In the test cases where more than one attempt is set, 
YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS is used instead of hard-coded 
values.

3. Assert that the configured maxAttempts > 1 in the tests where the 
difference between one attempt and more than one matters.
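
As a rough illustration of the idea behind items 1 and 3 (this is not the 
patch itself), the sketch below resolves the max-attempts value from a plain 
map and rejects values that would disable retries. The class 
AmMaxAttemptsSketch, its two methods, and the map-based configuration are 
hypothetical stand-ins so the example runs without Hadoop on the classpath. 
Only the property name and the new default of 2 come from the description 
above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the relevant bits of YarnConfiguration.
final class AmMaxAttemptsSketch {
    // Property name as given in the patch description.
    static final String RM_AM_MAX_ATTEMPTS = "yarn.resourcemanager.am.max-attempts";
    // Assumed post-patch default (changed from 1 to 2).
    static final int DEFAULT_RM_AM_MAX_ATTEMPTS = 2;

    // Returns the configured value, or the default when the key is absent.
    static int resolveMaxAttempts(Map<String, String> conf) {
        String raw = conf.get(RM_AM_MAX_ATTEMPTS);
        return raw == null ? DEFAULT_RM_AM_MAX_ATTEMPTS : Integer.parseInt(raw.trim());
    }

    // Mirrors the test-side check: fail fast when no retry is possible.
    static int requireRetriesEnabled(int maxAttempts) {
        if (maxAttempts <= 1) {
            throw new IllegalStateException(
                "maxAttempts must be > 1 for retry tests, got " + maxAttempts);
        }
        return maxAttempts;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();  // nothing set, so the default applies
        int maxAttempts = requireRetriesEnabled(resolveMaxAttempts(conf));
        System.out.println("maxAttempts = " + maxAttempts);
    }
}
```

With an empty map the default applies, so an AM lost to a node failure would 
still get a second attempt. Setting the key to "1" makes the check throw, 
which is the case the assertions in the tests are meant to catch.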

