[jira] [Commented] (MAPREDUCE-7282) MR v2 commit algorithm should be deprecated and not the default

2020-10-07 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17209768#comment-17209768
 ] 

Steve Loughran commented on MAPREDUCE-7282:
---

Created MAPREDUCE-7300, which adds the ability to ask a committer whether 
failures during task attempt commit are recoverable.

Combined with a configurable policy on what to do in that case (warn and 
continue, or fail), people can stick with the v2 committer and still be 
confident that if something went wrong during task commit, they would know.
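
A minimal Java sketch of how such a probe and policy could fit together. Every name here (`RecoveryAwareCommitter`, `isCommitFailureRecoverable`, `CommitFailurePolicy`, `TaskCommitDriver`) is hypothetical and chosen for illustration; this is not the API added by MAPREDUCE-7300.

```java
import java.io.IOException;

/** Hypothetical committer interface; not the actual Hadoop API. */
interface RecoveryAwareCommitter {
    void commitTask() throws IOException;

    /** True if a failed task commit leaves the destination safe for another attempt. */
    boolean isCommitFailureRecoverable();
}

/** Hypothetical policy applied when a commit failure is not recoverable. */
enum CommitFailurePolicy { WARN_AND_CONTINUE, FAIL_JOB }

final class TaskCommitDriver {
    enum Outcome { COMMITTED, RETRY_TASK, CONTINUED_WITH_WARNING, JOB_FAILED }

    static Outcome commit(RecoveryAwareCommitter committer, CommitFailurePolicy policy) {
        try {
            committer.commitTask();
            return Outcome.COMMITTED;
        } catch (IOException e) {
            if (committer.isCommitFailureRecoverable()) {
                // Destination is untouched or repairable: another attempt may commit.
                return Outcome.RETRY_TASK;
            }
            if (policy == CommitFailurePolicy.FAIL_JOB) {
                return Outcome.JOB_FAILED;
            }
            // Assumed behavior for the "warn + continue" option: report and carry on.
            System.err.println("WARN: task commit failed and may have left partial output: "
                    + e.getMessage());
            return Outcome.CONTINUED_WITH_WARNING;
        }
    }
}
```

In this sketch a committer with atomic task commit would always report failures as recoverable, so the policy only matters for committers that cannot make that promise.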

> MR v2 commit algorithm should be deprecated and not the default
> ---
>
> Key: MAPREDUCE-7282
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7282
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 3.3.0, 3.2.1, 3.1.3, 3.3.1
> Reporter: Steve Loughran
> Priority: Major
>
> The v2 MR commit algorithm moves files from the task attempt dir into the 
> dest dir on task commit, one by one.
> It is therefore not atomic:
> # If a task commit fails partway through and another task attempt commits, 
> then unless exactly the same filenames are used, output of the first attempt 
> may be included in the final result.
> # If a worker is partitioned partway through task commit and then continues 
> after another attempt has committed, it may partially overwrite the output, 
> even when the filenames are the same.
> Both MR and Spark assume that task commits are atomic. Either they need to 
> consider that this is not the case: we add a way to probe for a committer 
> supporting atomic task commit, and both engines add handling for task 
> commit failures (probably fail the job).
> Better: we remove this as the default, and maybe also warn when it is being used.
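
The first failure mode in the description above can be reproduced on a local filesystem. The sketch below only imitates the per-file move; it is not Hadoop's `FileOutputCommitter`. The class name `V2CommitSketch`, the file names, and the `failAfter` parameter are assumptions made for the illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class V2CommitSketch {

    /** Moves an attempt's files into dest one by one; fails once failAfter files have moved (negative: never). */
    static void commitTask(Path attemptDir, Path dest, int failAfter) throws IOException {
        int moved = 0;
        for (Path file : listSorted(attemptDir)) {
            if (moved == failAfter) {
                throw new IOException("simulated failure after " + moved + " file(s)");
            }
            Files.move(file, dest.resolve(file.getFileName()), StandardCopyOption.REPLACE_EXISTING);
            moved++;
        }
    }

    /** Attempt 0 fails partway through its commit; attempt 1 then commits fully. */
    static List<String> demo() {
        try {
            Path root = Files.createTempDirectory("v2-commit-sketch");
            Path dest = Files.createDirectory(root.resolve("dest"));
            Path attempt0 = writeAttempt(root, "attempt_0", "a");
            Path attempt1 = writeAttempt(root, "attempt_1", "b");
            try {
                commitTask(attempt0, dest, 1);
            } catch (IOException expected) {
                // The file that already moved stays in dest: nothing rolls it back.
            }
            commitTask(attempt1, dest, -1);
            return listSorted(dest).stream()
                    .map(p -> p.getFileName().toString())
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes two output files whose names differ per attempt (the non-deterministic-filename case). */
    private static Path writeAttempt(Path root, String attempt, String suffix) throws IOException {
        Path dir = Files.createDirectory(root.resolve(attempt));
        for (int i = 0; i < 2; i++) {
            Files.write(dir.resolve("part-000" + i + "-" + suffix),
                    attempt.getBytes(StandardCharsets.UTF_8));
        }
        return dir;
    }

    private static List<Path> listSorted(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.sorted().collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

After `demo()` the destination holds `part-0000-a` from the failed attempt next to `part-0000-b` and `part-0001-b` from the successful one, which is the mixed output the description warns about.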



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7282) MR v2 commit algorithm should be deprecated and not the default

2020-10-05 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208145#comment-17208145
 ] 

Steve Loughran commented on MAPREDUCE-7282:
---

bq.  Tasks request permission from the AM to commit.

Yes, and then we assume that they continue to completion rather than pausing 
for an extended period of time, so by the time the AM/Spark driver gets a 
timeout, the cause can be assumed to be either a network failure or a failed 
worker (VM or k8s container terminated). The "suspended for a long time and 
then continues" risk does exist; it is unlikely on a physical cluster, but in 
a world of VMs it is not entirely inconceivable.

I note that the MR AM does track the time since its last heartbeat to the YARN 
RM to detect partitions; workers don't.
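
As a rough illustration of the heartbeat-age check, here is a Java sketch that a worker could in principle consult before each rename during task commit. `HeartbeatWatchdog` and its methods are hypothetical and do not mirror the MR AM's implementation; the clock is injected so the behavior can be exercised without sleeping. Note that a process suspended between the check and the rename would still slip through, so this narrows the window rather than closing it.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper: flags a possible partition when the last heartbeat is too old. */
final class HeartbeatWatchdog {
    private final LongSupplier clockMillis;
    private final long timeoutMillis;
    private long lastHeartbeatMillis;

    HeartbeatWatchdog(LongSupplier clockMillis, long timeoutMillis) {
        this.clockMillis = clockMillis;
        this.timeoutMillis = timeoutMillis;
        this.lastHeartbeatMillis = clockMillis.getAsLong();
    }

    /** Call after every heartbeat that the remote side acknowledged. */
    void heartbeatSucceeded() {
        lastHeartbeatMillis = clockMillis.getAsLong();
    }

    /** True when more than timeoutMillis have passed since the last acknowledged heartbeat. */
    boolean mayBePartitioned() {
        return clockMillis.getAsLong() - lastHeartbeatMillis > timeoutMillis;
    }
}
```

A worker using this sketch would abort its commit when `mayBePartitioned()` returns true instead of continuing to move files.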




[jira] [Commented] (MAPREDUCE-7282) MR v2 commit algorithm should be deprecated and not the default

2020-09-29 Thread Daryn Sharp (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17204264#comment-17204264
 ] 

Daryn Sharp commented on MAPREDUCE-7282:


I'm also -1 on changing the default. It exposes users to new (old, but new to 
them) behavior that may have quirks. This was a 2.7 change from 5 years ago, so 
if it were a high-risk issue our customers would have squawked by now. Has this 
been frequently observed, or only theorized?

Notably, our users won't tolerate the performance regression and SLA misses. I 
seem to recall jobs that ran for single-digit minutes followed by a 
double-digit-minute commit. The v2 commit amortized the commit to under a minute.

I'm not an MR expert. Here's my understanding:
{quote}if a task commit fails partway through and another task attempt commits 
-unless exactly the same filenames are used, output of the first attempt may be 
included in the final result
{quote}
Isn't that indicative of a non-deterministic job? Should the risk to a few 
"bad" jobs outweigh the benefit to the vast majority of jobs? Why not change 
the committer for at-risk jobs?
{quote}if a worker partitions partway through task commit, and then continues 
after another attempt has committed, it may partially overwrite the output 
-even when the filenames are the same
{quote}
I don't think this can happen. Tasks request permission from the AM to commit.
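
For reference, the algorithm is selected per job through the `mapreduce.fileoutputcommitter.algorithm.version` property, so an at-risk job can opt into v1 without touching the cluster default. The helper below is a hypothetical sketch (`CommitterOptions` is not a Hadoop class) that builds the `-D` generic option; it assumes the job's main class parses generic options, for example via `ToolRunner`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CommitterOptions {
    /** Hadoop property that selects the FileOutputCommitter algorithm (1 or 2). */
    static final String ALGORITHM_VERSION_KEY = "mapreduce.fileoutputcommitter.algorithm.version";

    /** Prepends the -D generic option, which has to come before the job's own arguments. */
    static String[] withAlgorithmVersion(int version, String... jobArgs) {
        if (version != 1 && version != 2) {
            throw new IllegalArgumentException("algorithm version must be 1 or 2, got " + version);
        }
        List<String> args = new ArrayList<>();
        args.add("-D" + ALGORITHM_VERSION_KEY + "=" + version);
        args.addAll(Arrays.asList(jobArgs));
        return args.toArray(new String[0]);
    }
}
```

For example, `CommitterOptions.withAlgorithmVersion(1, "in", "out")` yields the argument list for a single job that should use the v1 algorithm.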




[jira] [Commented] (MAPREDUCE-7282) MR v2 commit algorithm should be deprecated and not the default

2020-09-29 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17204156#comment-17204156
 ] 

Steve Loughran commented on MAPREDUCE-7282:
---

[~Jim_Brennan] -have a look at the latest PR.

This retains it; it simply changes the default and logs at WARN when you use 
V1, and it uses a special log for that warning so you can turn it off without 
running the risk of losing important messages.
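
The dedicated-log idea can be sketched in Java as follows. This uses `java.util.logging` as a stand-in for Hadoop's logging stack, and the class and logger names are invented for the example; they are not the names used in the PR.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class CommitterLogging {
    /** Ordinary logger for the committer's important messages. */
    static final Logger LOG = Logger.getLogger("example.committer");

    /** Hypothetical dedicated logger; deliberately not a child of LOG. */
    static final Logger ALGORITHM_WARNING_LOG =
            Logger.getLogger("example.warnings.committer.algorithm");

    /** Emits the algorithm warning on the dedicated logger only. */
    static void warnAboutAlgorithm(int version) {
        ALGORITHM_WARNING_LOG.warning(
                "commit algorithm version " + version + " in use (see MAPREDUCE-7282)");
    }

    /** Turns off only the algorithm warning; LOG keeps its own level. */
    static void silenceAlgorithmWarning() {
        ALGORITHM_WARNING_LOG.setLevel(Level.OFF);
    }
}
```

Because the warning has its own logger name, an operator can disable it in the logging configuration while everything written through the main logger is still reported.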




[jira] [Commented] (MAPREDUCE-7282) MR v2 commit algorithm should be deprecated and not the default

2020-09-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201605#comment-17201605
 ] 

Hadoop QA commented on MAPREDUCE-7282:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 1m 15s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 31m 21s | | trunk passed |
| +1 | compile | 0m 39s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | compile | 0m 34s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 | checkstyle | 0m 29s | | trunk passed |
| +1 | mvnsite | 0m 45s | | trunk passed |
| +1 | shadedclient | 15m 41s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 25s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | javadoc | 0m 23s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| 0 | spotbugs | 1m 15s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 1m 13s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 30s | | the patch passed |
| +1 | compile | 0m 32s | | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | javac | 0m 32s | | the patch passed |
| +1 | compile | 0m 28s | | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 | javac | 0m 28s | | the patch passed |
| -0 | checkstyle | 0m 20s | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2320/3/artifact/out/diff-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt | hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core: The patch generated 1 new + 46 unchanged - 5 fixed = 47 total (was 51) |
| +1 | mvnsite | 0m 30s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | xml | 0m 2s | | The patch has no ill-formed XML file. |
| +1 | 

[jira] [Commented] (MAPREDUCE-7282) MR v2 commit algorithm should be deprecated and not the default

2020-09-23 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201046#comment-17201046
 ] 

Jim Brennan commented on MAPREDUCE-7282:


I am -1 on the proposal to remove the v2 algorithm. Please see [~jlowe]'s 
[comment|https://issues.apache.org/jira/browse/MAPREDUCE-4815?focusedCommentId=14271115=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14271115] 
from the original discussion on [MAPREDUCE-4815].

We have been running with the v2 algorithm in production on large clusters for 
years at Verizon Media (Yahoo). I don't think it is appropriate to remove it.
