[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default
[ https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729422#comment-14729422 ]

Anubhav Dhoot commented on YARN-4087:
-------------------------------------

In general, if we do not fail the daemon when the fail-fast flag is false, we still need to ensure that we do not leave inconsistent state in the RM. For example, see YARN-4032. YARN-2019 is the other case, where we did not need to do anything. This would mean that every patch from now on that uses fail-fast to avoid crashing the daemon should consider taking corrective action to ensure correctness. Does that make sense?

> Set YARN_FAIL_FAST to be false by default
> -----------------------------------------
>
>                 Key: YARN-4087
>                 URL: https://issues.apache.org/jira/browse/YARN-4087
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-4087.1.patch, YARN-4087.2.patch
>
>
> Increasingly, I feel setting this property to be false makes more sense
> especially in production environment,

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default
[ https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726097#comment-14726097 ]

Jian He commented on YARN-4087:
-------------------------------

bq. as there are no retries or explicit app-failures

Retries already happen internally before the final exception is thrown. Right, the app will be stuck in a certain state, since no notification is sent back. But explicitly failing the app may be too harsh, since the app itself can actually proceed without any impact. I think we can still send the notification that the store operation is done and let the app continue. We could also print a warning message on the application page, something like "Application is not persisted in state-store due to state-store error. Application will be lost if RM restarted."
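The proposal in the comment above can be illustrated with a minimal sketch. It is not the ResourceManager code: `StateStore`, `App`, `storeApp`, and the `IllegalStateException` that stands in for bringing the daemon down are all hypothetical names chosen for illustration. Only the warning text is taken from the comment. The idea shown is that, with fail-fast off, a store failure is recorded as a warning and the "store done" notification is still sent so the app does not get stuck.

```java
import java.util.ArrayList;
import java.util.List;

class StoreFailureHandling {
    // Hypothetical stand-in for the RM state store.
    interface StateStore {
        void store(String appId) throws Exception;
    }

    // Hypothetical, heavily simplified application record.
    static class App {
        final String id;
        boolean storeDoneNotified = false;
        final List<String> diagnostics = new ArrayList<>();

        App(String id) {
            this.id = id;
        }
    }

    // Warning text as suggested in the comment.
    static final String WARNING =
        "Application is not persisted in state-store due to state-store error. "
            + "Application will be lost if RM restarted.";

    static void storeApp(StateStore store, App app, boolean failFast) {
        try {
            store.store(app.id);
        } catch (Exception e) {
            if (failFast) {
                // Assumption: the exception stands in for failing the daemon.
                throw new IllegalStateException("State store failed for " + app.id, e);
            }
            // Fail-fast is off: keep running, but surface the problem.
            app.diagnostics.add(WARNING);
        }
        // Notify in both the success and the tolerated-failure case,
        // so the app can leave its "waiting for store" state.
        app.storeDoneNotified = true;
    }

    public static void main(String[] args) {
        App app = new App("app_1");
        storeApp(id -> { throw new java.io.IOException("store unavailable"); }, app, false);
        System.out.println("notified=" + app.storeDoneNotified
            + ", diagnostics=" + app.diagnostics);
    }
}
```

Whether a tolerated failure needs further corrective action beyond the warning depends on the call site; the sketch covers only the notification path discussed here.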
[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default
[ https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725761#comment-14725761 ]

Vinod Kumar Vavilapalli commented on YARN-4087:
-----------------------------------------------

Yes, I just checked that YARN-2019 added the config only in 2.8, which is unreleased now. So, we can safely change the default.

bq. Also, may be we should mark this JIRA as incompatible (for behavior)?

The previous behavior was undesired, and nobody in practice should depend on it.

I think there was a bigger thing that got missed at YARN-2019. If we ignore the failure when the config is off, the higher-order operations are stuck in a weird state, as there are no retries or explicit app-failures, [~jianhe]?
[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default
[ https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14724003#comment-14724003 ]

Jian He commented on YARN-4087:
-------------------------------

bq. In yarn-default.xml the default value for RM_FAIL_FAST is true.

Didn't get you. Isn't the default value set to YARN_FAIL_FAST too?
[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default
[ https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723242#comment-14723242 ]

Bibin A Chundatt commented on YARN-4087:
----------------------------------------

In yarn-default.xml the default value for RM_FAIL_FAST is true. In code, the default value for RM_FAIL_FAST is taken from YARN_FAIL_FAST, whose value is false.
[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default
[ https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718641#comment-14718641 ]

Junping Du commented on YARN-4087:
----------------------------------

Patch LGTM.

bq. +1, if fail-fast hasn't been in any prior release and we are not drastically altering the behavior.

I believe fail-fast was only introduced recently. However, the default behavior when the RM/NM state store fails could differ from previous releases: previously the failure brought down the NM/RM daemons, whereas now we tolerate it and keep running, logging some error messages. We should definitely note this in our release notes. Also, maybe we should mark this JIRA as incompatible (for behavior)?
[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default
[ https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717666#comment-14717666 ]

Hadoop QA commented on YARN-4087:
---------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 18m 23s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 7m 51s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 3s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 2m 0s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 3m 12s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 1m 59s | Tests passed in hadoop-yarn-common. |
| | | 46m 22s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12752862/YARN-4087.2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a9c8ea7 |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8932/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8932/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8932/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8932/console |

This message was automatically generated.
[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default
[ https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717568#comment-14717568 ]

Jian He commented on YARN-4087:
-------------------------------

YARN_FAIL_FAST is a global knob that controls all components, e.g. RM and NM; the config description clarifies this. I just can't think of a concise and meaningful name. Any naming suggestions are welcome. I updated the patch to clarify the config description further.
[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default
[ https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717526#comment-14717526 ]

Hitesh Shah commented on YARN-4087:
-----------------------------------

It would be good to rename the config property to something that provides a bit more clarity on what the config knob is meant to control.
[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default
[ https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717131#comment-14717131 ]

Jian He commented on YARN-4087:
-------------------------------

[~bibinchundatt], the logic is that the default value for RM_FAIL_FAST is YARN_FAIL_FAST.
[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default
[ https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715892#comment-14715892 ]

Hadoop QA commented on YARN-4087:
---------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 18m 27s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 8m 2s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 9s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 1m 58s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 3m 10s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 1m 59s | Tests passed in hadoop-yarn-common. |
| | | 46m 40s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12752615/YARN-4087.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f44b599 |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8922/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8922/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8922/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8922/console |

This message was automatically generated.
[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default
[ https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715872#comment-14715872 ]

Bibin A Chundatt commented on YARN-4087:
----------------------------------------

So by default in yarn-default.xml:

yarn.resourcemanager.fail-fast=true
yarn.fail-fast=false

In YarnConfiguration:

{code}
public static boolean shouldRMFailFast(Configuration conf) {
  return conf.getBoolean(YarnConfiguration.RM_FAIL_FAST,
      conf.getBoolean(YarnConfiguration.YARN_FAIL_FAST,
          YarnConfiguration.DEFAULT_YARN_FAIL_FAST));
}
{code}

Some mismatch, right? No plans to change YarnConfiguration.RM_FAIL_FAST.
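To make the lookup order in the comment above concrete, here is a small self-contained sketch that mimics the nested `getBoolean` call with a plain `Map` in place of Hadoop's `Configuration`. The property names are the ones quoted in the comment; the `getBoolean` helper and the class name are hypothetical simplifications, and `DEFAULT_YARN_FAIL_FAST = false` is assumed from the proposed change. It shows that an explicit RM-level value (such as the `true` shipped in yarn-default.xml) wins over the global knob, and that the global value only applies when the RM-level key is absent.

```java
import java.util.HashMap;
import java.util.Map;

class FailFastLookup {
    // Property names as quoted in the comment.
    static final String RM_FAIL_FAST = "yarn.resourcemanager.fail-fast";
    static final String YARN_FAIL_FAST = "yarn.fail-fast";
    // Assumption: the code-level default after this change.
    static final boolean DEFAULT_YARN_FAIL_FAST = false;

    // Hypothetical, simplified stand-in for Configuration.getBoolean(name, default).
    static boolean getBoolean(Map<String, String> conf, String name, boolean defaultValue) {
        String value = conf.get(name);
        return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
    }

    // Same nesting as the quoted shouldRMFailFast: RM key first, then the
    // global key, then the compiled-in default.
    static boolean shouldRMFailFast(Map<String, String> conf) {
        return getBoolean(conf, RM_FAIL_FAST,
            getBoolean(conf, YARN_FAIL_FAST, DEFAULT_YARN_FAIL_FAST));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println("nothing set:        " + shouldRMFailFast(conf));

        conf.put(YARN_FAIL_FAST, "true");
        System.out.println("global only:        " + shouldRMFailFast(conf));

        conf.put(YARN_FAIL_FAST, "false");
        conf.put(RM_FAIL_FAST, "true");
        System.out.println("RM key set to true: " + shouldRMFailFast(conf));
    }
}
```

Under these assumptions, a literal `true` for the RM key in the defaults file would override the global `false`, which is the mismatch raised here.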
[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default
[ https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715829#comment-14715829 ]

Karthik Kambatla commented on YARN-4087:
----------------------------------------

+1, if fail-fast hasn't been in any prior release and we are not drastically altering the behavior.

In any case, it would be nice to release note this new behavior for 2.8.0.