[jira] [Commented] (MAPREDUCE-5718) MR job will fail after commit fail

2016-01-11 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092352#comment-15092352
 ] 

Junping Du commented on MAPREDUCE-5718:
---

Given that MAPREDUCE-5485 has already been committed, shall we mark this JIRA as 
a duplicate?

> MR job will fail after commit fail
> --
>
> Key: MAPREDUCE-5718
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5718
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Yang Hao
> Attachments: MAPREDUCE-5718.v2.patch, mr-5718-0.patch
>
>
> When either of the following happens:
> * While testing RM HA, the RM fails over while an MR AM is in the middle 
> of a commit
> * While testing preemption, the MR AM fails over in the middle of a 
> commit
> the subsequent AM gets spawned but dies with the diagnostic message "We 
> crashed durring a commit". 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-5718) MR job will fail after commit fail

2015-02-04 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305180#comment-14305180
 ] 

Jason Lowe commented on MAPREDUCE-5718:
---

The issue is that the OutputCommitter is user-specified code.  It may not be 
doing filesystem operations at all during the commit (e.g. committing to a 
database, a REST API, etc.), and that procedure may not be restartable without 
a risk of corrupting or losing data.  Yes, FileOutputCommitter's commit 
procedure can be restarted, but jobs are not required to use 
FileOutputCommitter, or even to write their output to files at all.  That's why 
MAPREDUCE-5485 was filed: to do this safely, the framework needs an 
indication from the output committer of whether or not commit is a restartable 
procedure.
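To make the restartability point concrete, here is a minimal, self-contained Java sketch. It is not Hadoop code: `CommitReplayDemo`, `SimpleCommitter` and both committer classes are hypothetical. It assumes the first AM attempt dies after committing only part of the output and the next attempt simply re-runs the whole commit. A rename-style commit keyed by file name ends up in the same final state, while a commit that appends rows to an external store ends up with a duplicate row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CommitReplayDemo {

    /** Simplified, hypothetical stand-in for a job-level commit step; not Hadoop's OutputCommitter. */
    interface SimpleCommitter {
        void commitJob(List<String> taskOutputs);
    }

    /** Rename-style commit: the final state is keyed by file name, so re-running it is harmless. */
    static class RenameStyleCommitter implements SimpleCommitter {
        final Map<String, String> finalDir = new HashMap<>();

        @Override
        public void commitJob(List<String> taskOutputs) {
            for (String part : taskOutputs) {
                finalDir.put(part, "data-of-" + part); // overwrite by key: idempotent
            }
        }
    }

    /** Commit that appends rows to an external store, e.g. a database table. */
    static class AppendingCommitter implements SimpleCommitter {
        final List<String> table = new ArrayList<>();

        @Override
        public void commitJob(List<String> taskOutputs) {
            for (String part : taskOutputs) {
                table.add("row-from-" + part); // append: NOT idempotent
            }
        }
    }

    /** Attempt 1 dies after handling only the first output; attempt 2 re-runs the whole commit. */
    static void commitThenReplay(SimpleCommitter committer, List<String> outputs) {
        committer.commitJob(outputs.subList(0, 1)); // AM attempt 1: crashes mid-commit
        committer.commitJob(outputs);               // AM attempt 2: blindly restarts the commit
    }

    public static void main(String[] args) {
        List<String> outputs = List.of("part-r-00000", "part-r-00001");

        RenameStyleCommitter rename = new RenameStyleCommitter();
        commitThenReplay(rename, outputs);
        System.out.println("rename-style entries: " + rename.finalDir.size()); // 2

        AppendingCommitter append = new AppendingCommitter();
        commitThenReplay(append, outputs);
        System.out.println("appended rows: " + append.table.size()); // 3 (one duplicate)
    }
}
```

The framework cannot tell these two cases apart from the outside, which is why the committer itself has to say whether a restart is safe.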

> MR job will fail after commit fail
> --
>
> Key: MAPREDUCE-5718
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5718
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 2.3.0, 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Yang Hao
> Fix For: 2.6.0
>
> Attachments: MAPREDUCE-5718.v2.patch, mr-5718-0.patch
>
>
> When either of the following happens:
> * While testing RM HA, the RM fails over while an MR AM is in the middle 
> of a commit
> * While testing preemption, the MR AM fails over in the middle of a 
> commit
> the subsequent AM gets spawned but dies with the diagnostic message "We 
> crashed durring a commit". 





[jira] [Commented] (MAPREDUCE-5718) MR job will fail after commit fail

2015-02-04 Thread Yang Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305035#comment-14305035
 ] 

Yang Hao commented on MAPREDUCE-5718:
-

Hi, I have read the comment twice. It may not be safe to recover in this 
situation.

I'm wondering whether you are worried about missing data. But during 
recovery, task output data will be moved to the new output dir, just as in a 
normal AM failover, so it should be safe to recover.

We are going to add this feature to our cluster. Could you give more specific 
information about the unsafe situation? Thanks a lot.



[jira] [Commented] (MAPREDUCE-5718) MR job will fail after commit fail

2015-02-03 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304041#comment-14304041
 ] 

Karthik Kambatla commented on MAPREDUCE-5718:
-

[~yanghaogn] - initially, I was also trying to delete the startCommitFile if 
there is no corresponding endFile. However, we can't do that, for the reasons 
Jason described here: 
https://issues.apache.org/jira/browse/MAPREDUCE-5718?focusedCommentId=13872189&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13872189
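The startCommitFile/endFile check can be sketched as follows, assuming three marker files in the job staging directory. The file names, `CommitMarkerState` and `classify` are hypothetical illustrations, not MRAppMaster's actual code. A start marker with no end marker means the previous attempt died mid-commit; deleting that start marker would make the state look like NOT_STARTED and cause the commit to be re-run, which is only safe when the commit is repeatable.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CommitMarkerState {

    enum PriorCommitState { NOT_STARTED, CRASHED_DURING_COMMIT, SUCCEEDED, FAILED }

    // Hypothetical marker file names; the real AM uses its own paths in the staging dir.
    static final String START = "COMMIT_STARTED";
    static final String SUCCESS = "COMMIT_SUCCESS";
    static final String FAILURE = "COMMIT_FAIL";

    /** Infers what the previous AM attempt did from which marker files exist. */
    static PriorCommitState classify(Path stagingDir) {
        boolean started = Files.exists(stagingDir.resolve(START));
        boolean succeeded = Files.exists(stagingDir.resolve(SUCCESS));
        boolean failed = Files.exists(stagingDir.resolve(FAILURE));
        if (succeeded) {
            return PriorCommitState.SUCCEEDED;
        }
        if (failed) {
            return PriorCommitState.FAILED;
        }
        // Start marker without any end marker: the previous attempt died mid-commit.
        // Deleting START here would hide that fact and lead to a blind re-commit.
        if (started) {
            return PriorCommitState.CRASHED_DURING_COMMIT;
        }
        return PriorCommitState.NOT_STARTED;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("staging");
        System.out.println(classify(dir)); // NOT_STARTED
        Files.createFile(dir.resolve(START));
        System.out.println(classify(dir)); // CRASHED_DURING_COMMIT
        Files.createFile(dir.resolve(SUCCESS));
        System.out.println(classify(dir)); // SUCCEEDED
    }
}
```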



[jira] [Commented] (MAPREDUCE-5718) MR job will fail after commit fail

2015-02-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303369#comment-14303369
 ] 

Hadoop QA commented on MAPREDUCE-5718:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12694551/MAPREDUCE-5718.v2.patch
  against trunk revision 8cb4731.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app:

  org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5147//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5147//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-5718) MR job will fail after commit fail

2015-01-26 Thread Yang Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291561#comment-14291561
 ] 

Yang Hao commented on MAPREDUCE-5718:
-

If the AM crashed during a commit, and an API is added to check whether the job 
can fail over, then the problem will be fixed.
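A minimal sketch of such a check, assuming the committer itself declares whether its commit can be repeated and defaults to "no", since the committer is arbitrary user code. MAPREDUCE-5485 is the issue that added a hook of this kind to the real OutputCommitter; `JobCommitter`, `isCommitRepeatable` and `onAmRestart` below are hypothetical stand-ins, not its actual API.

```java
public class CommitRetryPolicy {

    /** Hypothetical committer contract: commit is treated as NOT repeatable unless declared. */
    interface JobCommitter {
        void commitJob() throws Exception;

        default boolean isCommitRepeatable() {
            return false; // conservative default for arbitrary user-supplied committers
        }
    }

    enum Action { RUN_COMMIT, RETRY_COMMIT, FAIL_JOB }

    /** Decides what a restarted AM does, given whether the previous attempt crashed mid-commit. */
    static Action onAmRestart(boolean crashedDuringCommit, JobCommitter committer) {
        if (!crashedDuringCommit) {
            return Action.RUN_COMMIT;
        }
        // Only re-run a half-finished commit when the committer says that is safe;
        // otherwise keep the existing behavior and fail the job.
        return committer.isCommitRepeatable() ? Action.RETRY_COMMIT : Action.FAIL_JOB;
    }

    public static void main(String[] args) {
        JobCommitter databaseCommitter = () -> { /* e.g. insert rows into a table */ };
        JobCommitter renameCommitter = new JobCommitter() {
            @Override
            public void commitJob() { /* e.g. move task outputs into the final dir */ }

            @Override
            public boolean isCommitRepeatable() {
                return true;
            }
        };
        System.out.println(onAmRestart(true, databaseCommitter));  // FAIL_JOB
        System.out.println(onAmRestart(true, renameCommitter));    // RETRY_COMMIT
        System.out.println(onAmRestart(false, databaseCommitter)); // RUN_COMMIT
    }
}
```

Defaulting to false keeps the current fail-the-job behavior for any committer that has not opted in, so only committers whose commit really is restartable get the new recovery path.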

> MR job will fail after commit fail
> --
>
> Key: MAPREDUCE-5718
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5718
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>  Labels: ha
> Attachments: mr-5718-0.patch
>
>
> When either of the following happens:
> * While testing RM HA, the RM fails over while an MR AM is in the middle 
> of a commit
> * While testing preemption, the MR AM fails over in the middle of a 
> commit
> the subsequent AM gets spawned but dies with the diagnostic message "We 
> crashed durring a commit". 


