[jira] [Commented] (MAPREDUCE-5718) MR job will fail after commit fail
[ https://issues.apache.org/jira/browse/MAPREDUCE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092352#comment-15092352 ]

Junping Du commented on MAPREDUCE-5718:
---

Given that MAPREDUCE-5485 has already been committed, shall we mark this JIRA as a duplicate?

> MR job will fail after commit fail
> --
>
> Key: MAPREDUCE-5718
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5718
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am
> Affects Versions: 2.6.0
> Reporter: Karthik Kambatla
> Assignee: Yang Hao
> Attachments: MAPREDUCE-5718.v2.patch, mr-5718-0.patch
>
>
> When any of the following happens:
> * While testing RM HA, if the RM fails over while an MR AM is in the middle of a commit,
> * When testing preemption, if the MR AM fails over in the middle of a commit
> the subsequent AM gets spawned but dies with a diagnostic message - "We crashed durring a commit".

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5718) MR job will fail after commit fail
[ https://issues.apache.org/jira/browse/MAPREDUCE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305180#comment-14305180 ]

Jason Lowe commented on MAPREDUCE-5718:
---

The issue is that the OutputCommitter is user-specified code. It may not be doing filesystem operations at all during the commit (e.g., committing to a database, a REST API, etc.), and that procedure may not be restartable without the chance of corrupting or losing data. Yes, FileOutputCommitter's commit procedure is something that can be restarted, but jobs are not required to use FileOutputCommitter, nor to write their output to a file at all. That's why MAPREDUCE-5485 was filed: to do this safely, the framework needs an indication from the output committer of whether or not commit is a restartable procedure.

> MR job will fail after commit fail
> --
>
> Key: MAPREDUCE-5718
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5718
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am
> Affects Versions: 2.3.0, 2.6.0
> Reporter: Karthik Kambatla
> Assignee: Yang Hao
> Fix For: 2.6.0
>
> Attachments: MAPREDUCE-5718.v2.patch, mr-5718-0.patch
>
>
> When any of the following happens:
> * While testing RM HA, if the RM fails over while an MR AM is in the middle of a commit,
> * When testing preemption, if the MR AM fails over in the middle of a commit
> the subsequent AM gets spawned but dies with a diagnostic message - "We crashed durring a commit".
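The point in the comment above, that only the committer itself can say whether its commit is safe to re-run, can be sketched roughly as follows. This is an illustrative, self-contained sketch and not Hadoop code: the `Committer` interface, the `isCommitRepeatable` method, both committer classes, and the `Decision` enum are hypothetical names, and the sketch does not claim to reproduce the API that MAPREDUCE-5485 introduced.

```java
import java.util.ArrayList;
import java.util.List;

public class RestartableCommitSketch {

    /** Hypothetical stand-in for a user-supplied output committer. */
    interface Committer {
        void commitJob() throws Exception;

        /** Conservative default: assume the commit cannot safely be re-run. */
        default boolean isCommitRepeatable() {
            return false;
        }
    }

    /** File-style committer whose commit is treated as idempotent in this sketch. */
    static class FileLikeCommitter implements Committer {
        final List<String> actions = new ArrayList<>();

        @Override
        public void commitJob() {
            actions.add("promote task output to final output dir");
        }

        @Override
        public boolean isCommitRepeatable() {
            return true;
        }
    }

    /** Committer that writes to an external system; it keeps the unsafe default. */
    static class DatabaseLikeCommitter implements Committer {
        @Override
        public void commitJob() {
            // e.g. non-idempotent inserts into a database or calls to a REST API
        }
    }

    enum Decision { RERUN_COMMIT, FAIL_JOB }

    /** What a restarted application master could do after an interrupted commit. */
    static Decision onRestartAfterInterruptedCommit(Committer committer) {
        return committer.isCommitRepeatable() ? Decision.RERUN_COMMIT : Decision.FAIL_JOB;
    }

    public static void main(String[] args) {
        System.out.println(onRestartAfterInterruptedCommit(new FileLikeCommitter()));
        System.out.println(onRestartAfterInterruptedCommit(new DatabaseLikeCommitter()));
    }
}
```

Defaulting to "not repeatable" keeps existing user committers on the current fail-fast behaviour; only committers that explicitly opt in would have their commit re-run.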
[jira] [Commented] (MAPREDUCE-5718) MR job will fail after commit fail
[ https://issues.apache.org/jira/browse/MAPREDUCE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305035#comment-14305035 ]

Yang Hao commented on MAPREDUCE-5718:
---

Hi, I have read the comment twice. It may not be safe to recover in this situation. I'm wondering whether you are worried about missing data. But during recovery, task output data will be moved to the new output dir, just as in a normal AM failover, so it is safe to recover. Our cluster will add this feature. Can you give more specific information on the unsafe situation? Thanks a lot.
[jira] [Commented] (MAPREDUCE-5718) MR job will fail after commit fail
[ https://issues.apache.org/jira/browse/MAPREDUCE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304041#comment-14304041 ]

Karthik Kambatla commented on MAPREDUCE-5718:
---

[~yanghaogn] - initially, I was also trying to delete the startCommitFile if there is no corresponding endFile. However, we can't do that, for the reasons Jason described here - https://issues.apache.org/jira/browse/MAPREDUCE-5718?focusedCommentId=13872189&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13872189
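The start/end marker check mentioned in the comment above can be sketched as below, as a rough illustration of why a start marker without an end marker is treated as a crash during commit. This is a self-contained sketch on the local filesystem, not the MR AM's actual code: the marker file names `COMMIT_STARTED` and `COMMIT_ENDED`, the `inspect` method, and the `Recovery` enum are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CommitMarkerSketch {

    enum Recovery { NO_COMMIT_STARTED, COMMIT_FINISHED, CRASHED_DURING_COMMIT }

    /** Classifies the state left behind by a previous attempt (hypothetical marker names). */
    static Recovery inspect(Path stagingDir) {
        boolean started = Files.exists(stagingDir.resolve("COMMIT_STARTED"));
        boolean ended = Files.exists(stagingDir.resolve("COMMIT_ENDED"));
        if (!started) {
            return Recovery.NO_COMMIT_STARTED;
        }
        // A start marker with no end marker means the commit may be half done.
        // Deleting the start marker here would hide that fact from later attempts.
        return ended ? Recovery.COMMIT_FINISHED : Recovery.CRASHED_DURING_COMMIT;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("staging");
        System.out.println(inspect(dir));
        Files.createFile(dir.resolve("COMMIT_STARTED"));
        System.out.println(inspect(dir));
        Files.createFile(dir.resolve("COMMIT_ENDED"));
        System.out.println(inspect(dir));
    }
}
```

In the `CRASHED_DURING_COMMIT` case, a new attempt cannot tell how far the previous commit got, so simply deleting the start marker and retrying is only safe when the commit procedure is known to be restartable.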
[jira] [Commented] (MAPREDUCE-5718) MR job will fail after commit fail
[ https://issues.apache.org/jira/browse/MAPREDUCE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303369#comment-14303369 ]

Hadoop QA commented on MAPREDUCE-5718:
---

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12694551/MAPREDUCE-5718.v2.patch
against trunk revision 8cb4731.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app:

org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5147//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5147//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-5718) MR job will fail after commit fail
[ https://issues.apache.org/jira/browse/MAPREDUCE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291561#comment-14291561 ]

Yang Hao commented on MAPREDUCE-5718:
---

If the AM crashed during a commit, and an API becomes available to check whether the job can fail over, then the problem will be fixed.

> MR job will fail after commit fail
> --
>
> Key: MAPREDUCE-5718
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5718
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am
> Affects Versions: 2.3.0
> Reporter: Karthik Kambatla
> Assignee: Karthik Kambatla
> Labels: ha
> Attachments: mr-5718-0.patch
>
>
> When any of the following happens:
> * While testing RM HA, if the RM fails over while an MR AM is in the middle of a commit,
> * When testing preemption, if the MR AM fails over in the middle of a commit
> the subsequent AM gets spawned but dies with a diagnostic message - "We crashed durring a commit".