[jira] [Commented] (MAPREDUCE-7028) Concurrent task progress updates causing NPE in Application Master

2018-01-02 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308905#comment-16308905
 ] 

Miklos Szegedi commented on MAPREDUCE-7028:
---

Thank you for the patch [~grepas]. I do not see any more issues with patch 004. 
[~jlowe], what do you think?

> Concurrent task progress updates causing NPE in Application Master
> --
>
> Key: MAPREDUCE-7028
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7028
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4, 2.7.6
>Reporter: Gergo Repas
>Assignee: Gergo Repas
> Attachments: MAPREDUCE-7028.000.patch, MAPREDUCE-7028.001.patch, 
> MAPREDUCE-7028.002.patch, MAPREDUCE-7028.003.patch, MAPREDUCE-7028.004.patch
>
>
> Concurrent task progress updates can cause a NullPointerException in the 
> Application Master (stack trace is with code at current trunk):
> {quote}
> 2017-12-20 06:49:42,369 INFO [IPC Server handler 9 on 39501] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1513780867907_0001_m_02_0 is : 0.02677883
> 2017-12-20 06:49:42,369 INFO [IPC Server handler 13 on 39501] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1513780867907_0001_m_02_0 is : 0.02677883
> 2017-12-20 06:49:42,383 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
> java.lang.NullPointerException
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$StatusUpdater.transition(TaskAttemptImpl.java:2450)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$StatusUpdater.transition(TaskAttemptImpl.java:2433)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1362)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:154)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1543)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1535)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
>     at java.lang.Thread.run(Thread.java:748)
> 2017-12-20 06:49:42,385 INFO [IPC Server handler 13 on 39501] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1513780867907_0001_m_02_0 is : 0.02677883
> 2017-12-20 06:49:42,386 INFO [AsyncDispatcher ShutDown handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
> {quote}
> This happened naturally in several big wordcount runs, and I could reproduce 
> this reliably by artificially making task updates more frequent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription

2018-01-02 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308813#comment-16308813
 ] 

Haibo Chen commented on MAPREDUCE-6926:
---

As pointed out by Miklos offline, HADOOP-15122 should fix the build issue here. 
I am cherry-picking HADOOP-15122 and triggering another Jenkins job.

> Allow MR jobs to opt out of oversubscription
> 
>
> Key: MAPREDUCE-6926
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: mrv2
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: MAPREDUCE-6926-YARN-1011.00.patch, 
> MAPREDUCE-6926-YARN-1011.01.patch, MAPREDUCE-6926-YARN-1011.02.patch
>
>







[jira] [Commented] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription

2018-01-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308736#comment-16308736
 ] 

Hadoop QA commented on MAPREDUCE-6926:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} YARN-1011 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m 32s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 15s{color} | {color:red} root in YARN-1011 failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 32s{color} | {color:red} hadoop-mapreduce-client in YARN-1011 failed. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 37s{color} | {color:green} YARN-1011 passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 10s{color} | {color:red} hadoop-mapreduce-client-core in YARN-1011 failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 17s{color} | {color:red} hadoop-mapreduce-client-app in YARN-1011 failed. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  1m 16s{color} | {color:red} branch has errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 58s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 45s{color} | {color:green} YARN-1011 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 32s{color} | {color:red} hadoop-mapreduce-client-core in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 21s{color} | {color:red} hadoop-mapreduce-client-app in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 30s{color} | {color:red} hadoop-mapreduce-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 30s{color} | {color:red} hadoop-mapreduce-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 37s{color} | {color:green} hadoop-mapreduce-project/hadoop-mapreduce-client: The patch generated 0 new + 714 unchanged - 2 fixed = 714 total (was 716) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 12s{color} | {color:red} hadoop-mapreduce-client-core in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m  9s{color} | {color:red} hadoop-mapreduce-client-app in the patch failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  0m  9s{color} | {color:red} patch has errors when building and testing our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 18s{color} | {color:red} hadoop-mapreduce-client-app in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 44s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 59s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 18s{color} | {color:red} hadoop-mapreduce-client-app in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 17m  

[jira] [Updated] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription

2018-01-02 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6926:
--
Attachment: MAPREDUCE-6926-YARN-1011.02.patch

> Allow MR jobs to opt out of oversubscription
> 
>
> Key: MAPREDUCE-6926
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: mrv2
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: MAPREDUCE-6926-YARN-1011.00.patch, 
> MAPREDUCE-6926-YARN-1011.01.patch, MAPREDUCE-6926-YARN-1011.02.patch
>
>







[jira] [Commented] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription

2018-01-02 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308705#comment-16308705
 ] 

Haibo Chen commented on MAPREDUCE-6926:
---

Good catch. I updated the message in the new patch.

> Allow MR jobs to opt out of oversubscription
> 
>
> Key: MAPREDUCE-6926
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: mrv2
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: MAPREDUCE-6926-YARN-1011.00.patch, 
> MAPREDUCE-6926-YARN-1011.01.patch
>
>







[jira] [Commented] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription

2018-01-02 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308701#comment-16308701
 ] 

Miklos Szegedi commented on MAPREDUCE-6926:
---

Thank you for the reply, [~haibochen]. One more thing I found: the message in 
the following line is inaccurate. It should say "not enforced", or use "and" 
instead of "or".
{code}
Assert.fail("The execution type of ResourceRequest " + resourceRequest +
    " is not guaranteed or enforced.");
{code}
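
To illustrate the wording point, here is a minimal sketch. It assumes the test accepts a request only when its execution type is guaranteed and that type is enforced; the actual condition in the patch is not shown in this thread. `ExecutionTypeCheck`, `ExecType`, and `check` are hypothetical names, not YARN APIs. Under that assumption the failure case is the negation of "guaranteed and enforced", which reads as "not guaranteed or not enforced".

```java
/** Hypothetical sketch; the names here are illustrative and not part of YARN. */
public class ExecutionTypeCheck {
  /** Stand-in for the execution type carried by a resource request. */
  public enum ExecType { GUARANTEED, OPPORTUNISTIC }

  /**
   * Returns null when the request is acceptable, otherwise the failure message.
   * Assumption: a request is acceptable only if it is GUARANTEED and enforced.
   */
  public static String check(ExecType type, boolean enforced) {
    if (type == ExecType.GUARANTEED && enforced) {
      return null;
    }
    // De Morgan: !(guaranteed && enforced) == !guaranteed || !enforced,
    // so the message negates both halves.
    return "The execution type of the request is not guaranteed or not enforced.";
  }
}
```

With this phrasing the message stays accurate for a request that is guaranteed but not enforced, which "is not guaranteed or enforced" describes ambiguously.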


> Allow MR jobs to opt out of oversubscription
> 
>
> Key: MAPREDUCE-6926
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: mrv2
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: MAPREDUCE-6926-YARN-1011.00.patch, 
> MAPREDUCE-6926-YARN-1011.01.patch
>
>







[jira] [Commented] (MAPREDUCE-7028) Concurrent task progress updates causing NPE in Application Master

2018-01-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308634#comment-16308634
 ] 

Hadoop QA commented on MAPREDUCE-7028:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  9m 18s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 18s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m 28s{color} | {color:red} hadoop-mapreduce-client-app in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 48m 24s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | MAPREDUCE-7028 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12904265/MAPREDUCE-7028.004.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 1e29f88510a4 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / dfe0cd8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7270/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7270/testReport/ |
| Max. process+thread count | 530 (vs. ulimit of 5000) |
| modules | C: 

[jira] [Commented] (MAPREDUCE-7028) Concurrent task progress updates causing NPE in Application Master

2018-01-02 Thread Gergo Repas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308585#comment-16308585
 ] 

Gergo Repas commented on MAPREDUCE-7028:


I re-ran the test I mentioned in my previous comment. There were a couple of 
retries (TaskAttempt attempt_1514921548444_0002_m_00_0: lastStatusRef 
changed by another thread, retrying...) in the MRAM log, progress looked good, 
and the job completed successfully.
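
The retry message quoted above suggests a compare-and-set loop on an atomic reference. What follows is a minimal sketch of that general pattern, not the code in MAPREDUCE-7028.004.patch. Only the field name `lastStatusRef` comes from the log message; the `StatusCoalescer` class, the `Status` type, and the `update`/`consume` methods are hypothetical names chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch of coalescing status updates with a CAS retry loop. */
public class StatusCoalescer {
  /** Immutable stand-in for a task attempt status (hypothetical type). */
  public static final class Status {
    public final float progress;
    public Status(float progress) { this.progress = progress; }
  }

  // Holds the latest status not yet consumed by the dispatcher, or null.
  private final AtomicReference<Status> lastStatusRef = new AtomicReference<>();

  /**
   * Called concurrently by RPC handler threads.
   * Returns true if no status was pending, meaning the caller should enqueue
   * one update event; returns false if a pending status was simply replaced.
   */
  public boolean update(Status newStatus) {
    while (true) {
      Status current = lastStatusRef.get();
      if (lastStatusRef.compareAndSet(current, newStatus)) {
        return current == null;
      }
      // lastStatusRef changed by another thread, retrying...
    }
  }

  /** Called by the single dispatcher thread, once per enqueued event. */
  public Status consume() {
    return lastStatusRef.getAndSet(null);
  }
}
```

In this sketch an event is enqueued only on a null to non-null transition, so every event the dispatcher handles finds a non-null status to consume. Whether the actual patch enforces the same invariant is an assumption.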







[jira] [Commented] (MAPREDUCE-7028) Concurrent task progress updates causing NPE in Application Master

2018-01-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308577#comment-16308577
 ] 

Hadoop QA commented on MAPREDUCE-7028:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  3m 14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  9m 59s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m  5s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m  1s{color} | {color:red} hadoop-mapreduce-client-app in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 19s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 51m 16s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.mapred.TestTaskAttemptListenerImpl |
|   | hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | MAPREDUCE-7028 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12904260/MAPREDUCE-7028.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 3cda9c52d8b5 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7fe6f83 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7269/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7269/testReport/ |
| Max. process+thread count | 522 (vs. 

[jira] [Updated] (MAPREDUCE-6989) [Umbrella] Uploader tool for Distributed Cache Deploy of the mapreduce framework and dependencies

2018-01-02 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated MAPREDUCE-6989:
--
Summary: [Umbrella] Uploader tool for Distributed Cache Deploy of the 
mapreduce framework and dependencies  (was: [Umbrella] Uploader tool for 
Distributed Cache Deploy)

> [Umbrella] Uploader tool for Distributed Cache Deploy of the mapreduce 
> framework and dependencies
> -
>
> Key: MAPREDUCE-6989
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6989
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: MAPREDUCE-6989 Mapreduce framework uploader tool.pdf
>
>
> The proposal is to create a tool that collects all available jars in the 
> Hadoop classpath and adds them to a single tarball file. It then uploads the 
> resulting archive to an HDFS directory. This saves the cluster administrator 
> from having to set this up manually for Distributed Cache Deploy.
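
The collection step described in the proposal could look roughly like the sketch below. It is not the tool's implementation: `ClasspathJarCollector` and `collectJars` are hypothetical names, and the handling of wildcard entries such as `/opt/lib/*` is an assumption about how the classpath is expressed. Building the tarball and uploading it to HDFS are left out, since both need libraries beyond the JDK.

```java
import java.io.File;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch: gather the jar files named by a classpath string. */
public class ClasspathJarCollector {

  /** Expands a classpath string into the set of existing jar files it refers to. */
  public static Set<File> collectJars(String classpath) {
    // LinkedHashSet keeps classpath order and drops repeated entries.
    Set<File> jars = new LinkedHashSet<>();
    for (String entry : classpath.split(File.pathSeparator)) {
      if (entry.isEmpty()) {
        continue;
      }
      if (entry.endsWith("*")) {
        // Wildcard entry (assumed form "dir/*"): take every jar in the directory.
        File dir = new File(entry.substring(0, entry.length() - 1));
        File[] children = dir.listFiles((d, name) -> name.endsWith(".jar"));
        if (children != null) {
          Arrays.sort(children); // deterministic order within a directory
          jars.addAll(Arrays.asList(children));
        }
      } else if (entry.endsWith(".jar") && new File(entry).isFile()) {
        jars.add(new File(entry));
      }
    }
    return jars;
  }
}
```

A caller might pass `System.getProperty("java.class.path")` here and then write the returned files into a single archive before uploading it.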






[jira] [Commented] (MAPREDUCE-6989) [Umbrella] Uploader tool for Distributed Cache Deploy

2018-01-02 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308570#comment-16308570
 ] 

Miklos Szegedi commented on MAPREDUCE-6989:
---

One more thing: this tool should run only once per upgrade, and only to update 
the framework. Individual tasks can still specify their own jars and use the 
shared cache for those.

> [Umbrella] Uploader tool for Distributed Cache Deploy
> -
>
> Key: MAPREDUCE-6989
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6989
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: MAPREDUCE-6989 Mapreduce framework uploader tool.pdf
>
>
> The proposal is to create a tool that collects all available jars in the 
> Hadoop classpath and adds them to a single tarball file. It then uploads the 
> resulting archive to an HDFS directory. This saves the cluster administrator 
> from having to set this up manually for Distributed Cache Deploy.






[jira] [Commented] (MAPREDUCE-6989) [Umbrella] Uploader tool for Distributed Cache Deploy

2018-01-02 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308551#comment-16308551
 ] 

Miklos Szegedi commented on MAPREDUCE-6989:
---

[~ctrezzo], thanks for the comment. I believe they can extend each other, but 
they have slightly distinct functionality. This tool is primarily a collector 
that packs multiple jars from the class path into a single tarball. It also 
uploads the tarball, but that is just an auxiliary task. It could leverage the 
shared cache, though, to avoid uploading a duplicate instance of the same jar. 
Please let me know if I am missing something. Now that you mention it, there 
is a need to delete the jar once no applications are using it. It would be 
very useful if we could solve that with the shared cache.

> [Umbrella] Uploader tool for Distributed Cache Deploy
> -
>
> Key: MAPREDUCE-6989
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6989
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: MAPREDUCE-6989 Mapreduce framework uploader tool.pdf
>
>
> The proposal is to create a tool that collects all available jars in the 
> Hadoop classpath and adds them to a single tarball file. It then uploads the 
> resulting archive to an HDFS directory. This saves the cluster administrator 
> from having to set this up manually for Distributed Cache Deploy.






[jira] [Updated] (MAPREDUCE-7028) Concurrent task progress updates causing NPE in Application Master

2018-01-02 Thread Gergo Repas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergo Repas updated MAPREDUCE-7028:
---
Attachment: MAPREDUCE-7028.004.patch

MAPREDUCE-7028.004.patch: fixed the bug that caused the unit test failures. I'm 
re-running the test with the tasks that artificially send frequent updates 
(with this test I could reliably reproduce the issue), and will let you know 
about the results.

> Concurrent task progress updates causing NPE in Application Master
> --
>
> Key: MAPREDUCE-7028
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7028
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4, 2.7.6
>Reporter: Gergo Repas
>Assignee: Gergo Repas
> Attachments: MAPREDUCE-7028.000.patch, MAPREDUCE-7028.001.patch, 
> MAPREDUCE-7028.002.patch, MAPREDUCE-7028.003.patch, MAPREDUCE-7028.004.patch
>
>
> Concurrent task progress updates can cause a NullPointerException in the 
> Application Master (stack trace is with code at current trunk):
> {quote}
> 2017-12-20 06:49:42,369 INFO [IPC Server handler 9 on 39501] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1513780867907_0001_m_02_0 is : 0.02677883
> 2017-12-20 06:49:42,369 INFO [IPC Server handler 13 on 39501] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1513780867907_0001_m_02_0 is : 0.02677883
> 2017-12-20 06:49:42,383 FATAL [AsyncDispatcher event handler] 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
> java.lang.NullPointerException
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$StatusUpdater.transition(TaskAttemptImpl.java:2450)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$StatusUpdater.transition(TaskAttemptImpl.java:2433)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1362)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:154)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1543)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1535)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:748)
> 2017-12-20 06:49:42,385 INFO [IPC Server handler 13 on 39501] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1513780867907_0001_m_02_0 is : 0.02677883
> 2017-12-20 06:49:42,386 INFO [AsyncDispatcher ShutDown handler] 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
> {quote}
> This happened naturally in several big wordcount runs, and I could reproduce 
> this reliably by artificially making task updates more frequent.






[jira] [Commented] (MAPREDUCE-6989) [Umbrella] Uploader tool for Distributed Cache Deploy

2018-01-02 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308542#comment-16308542
 ] 

Chris Trezzo commented on MAPREDUCE-6989:
-

Hey [~miklos.szeg...@cloudera.com]! Thanks for the work so far! I have a 
question around the high-level approach: Is there a reason why we can't 
leverage the shared cache for this? There is already an upload mechanism that 
has been built, along with a cleaning mechanism and a way to cache similar jars.

> [Umbrella] Uploader tool for Distributed Cache Deploy
> -
>
> Key: MAPREDUCE-6989
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6989
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: MAPREDUCE-6989 Mapreduce framework uploader tool.pdf
>
>






[jira] [Commented] (MAPREDUCE-7028) Concurrent task progress updates causing NPE in Application Master

2018-01-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308512#comment-16308512
 ] 

Hadoop QA commented on MAPREDUCE-7028:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m  8s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 18s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m  4s{color} | {color:red} hadoop-mapreduce-client-app in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 18s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 59m 19s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.mapred.TestTaskAttemptListenerImpl |
|   | hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | MAPREDUCE-7028 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12904257/MAPREDUCE-7028.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 8d0a6a205360 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7fe6f83 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7268/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7268/testReport/ |
| Max. process+thread count | 511 (vs. 

[jira] [Commented] (MAPREDUCE-7015) Possible race condition in JHS if the job is not loaded

2018-01-02 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308511#comment-16308511
 ] 

Haibo Chen commented on MAPREDUCE-7015:
---

[~pbacsko] I think you meant to say that step #6 is slow enough most of the 
time, right? Otherwise, the CLI client would almost always try to find the 
config file in the intermediate directory after the file has already been moved 
into the done directory, and fail.

> Possible race condition in JHS if the job is not loaded
> ---
>
> Key: MAPREDUCE-7015
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7015
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
> Attachments: MAPREDUCE-7015-POC01.patch
>
>
> There could be a race condition inside JHS. In our build environment, 
> {{TestMRJobClient.testJobClient()}} failed with this exception:
> {noformat}
> java.io.FileNotFoundException: File does not exist: 
> hdfs://localhost:32836/tmp/hadoop-yarn/staging/history/done_intermediate/jenkins/job_1509975084722_0001_conf.xml
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1266)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1258)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1258)
>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:340)
>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:292)
>   at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2123)
>   at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2092)
>   at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2068)
>   at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:460)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at 
> org.apache.hadoop.mapreduce.TestMRJobClient.runTool(TestMRJobClient.java:94)
>   at 
> org.apache.hadoop.mapreduce.TestMRJobClient.testConfig(TestMRJobClient.java:551)
>   at 
> org.apache.hadoop.mapreduce.TestMRJobClient.testJobClient(TestMRJobClient.java:167)
> {noformat}
> Root cause:
> 1. MapReduce job completes
> 2. CLI calls {{cluster.getJob(jobid)}}
> 3. The job is finished and the client side gets redirected to JHS
> 4. The job data is missing from {{CachedHistoryStorage}} so JHS tries to find 
> the job
> 5. First it scans the intermediate directory and finds the job
> 6. The call {{moveToDone()}} is scheduled for execution on a separate thread 
> inside {{moveToDoneExecutor}} but does not get the chance to run immediately
> 7. RPC invocation returns with the path pointing to 
> {{/tmp/hadoop-yarn/staging/history/done_intermediate}}
> 8. The call to {{moveToDone()}} completes which moves the contents of 
> {{done_intermediate}} to {{done}}
> 9. Hadoop CLI tries to download the config file from done_intermediate but 
> it's no longer there
> Usually step #6 is fast enough to complete before step #7, but sometimes it 
> can get behind, causing this race condition.
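
The ordering problem in steps 5 to 9 can be replayed on a local filesystem. The sketch below is an analogy under stated assumptions, not JHS code: the class MoveToDoneRaceSketch, the method clientFindsFile and the file name job_conf.xml are hypothetical, java.nio.file stands in for HDFS, and the interleaving is forced sequentially rather than produced by real threads.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Local-filesystem analogy of the reported race; all names are hypothetical. */
final class MoveToDoneRaceSketch {
    private MoveToDoneRaceSketch() { }

    /**
     * Replays the sequence with a fixed ordering and reports whether the
     * client still finds the file at the path it was given.
     *
     * @param moveBeforeReply true: move first, then reply with the final path;
     *                        false: reply with the intermediate path, then move
     */
    static boolean clientFindsFile(boolean moveBeforeReply) {
        try {
            Path root = Files.createTempDirectory("jhs-race-sketch");
            Path intermediate = Files.createDirectory(root.resolve("done_intermediate"));
            Path done = Files.createDirectory(root.resolve("done"));
            Path conf = Files.createFile(intermediate.resolve("job_conf.xml"));
            Path moved = done.resolve(conf.getFileName());

            Path replied;
            if (moveBeforeReply) {
                Files.move(conf, moved);   // the move completes first
                replied = moved;           // so the reply carries the final location
            } else {
                replied = conf;            // reply points at done_intermediate (step 7)
                Files.move(conf, moved);   // deferred move completes afterwards (step 8)
            }
            // Step 9: the client reads from the path it received.
            return Files.exists(replied);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With moveBeforeReply set to false the client is left holding a stale path, which matches the FileNotFoundException in the trace above. Whether moving before replying is acceptable for JHS is a separate question that this sketch does not settle.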






[jira] [Updated] (MAPREDUCE-7028) Concurrent task progress updates causing NPE in Application Master

2018-01-02 Thread Gergo Repas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergo Repas updated MAPREDUCE-7028:
---
Attachment: MAPREDUCE-7028.003.patch

MAPREDUCE-7028.003.patch: Two minor changes: 1. swapping the order of adding 
fetchFailedMaps (first the ones from the previous update, then from the current 
update), 2. evaluating asyncUpdatedNeeded outside the while loop. 

> Concurrent task progress updates causing NPE in Application Master
> --
>
> Key: MAPREDUCE-7028
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7028
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4, 2.7.6
>Reporter: Gergo Repas
>Assignee: Gergo Repas
> Attachments: MAPREDUCE-7028.000.patch, MAPREDUCE-7028.001.patch, 
> MAPREDUCE-7028.002.patch, MAPREDUCE-7028.003.patch
>
>






[jira] [Commented] (MAPREDUCE-7028) Concurrent task progress updates causing NPE in Application Master

2018-01-02 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308466#comment-16308466
 ] 

Miklos Szegedi commented on MAPREDUCE-7028:
---

Thank you for the comment, [~jlowe]. Indeed the lock does not help in all cases.
Thank you for the patch [~grepas].
{code}
taskAttemptStatus.fetchFailedMaps.addAll(
    taskAttemptStatus.fetchFailedMaps);
taskAttemptStatus.fetchFailedMaps.addAll(
    lastStatus.fetchFailedMaps);
{code}
The time order of the two lists is the opposite, so I would reverse them. Also 
the asyncUpdateNeeded update can go outside the loop.
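
For readers following the review, here is a minimal sketch of a compare-and-set retry loop with both suggestions applied. It is not the code from any of the attached patches: SketchStatus, StatusCoalescer, coalesce and consume are hypothetical names, the failed-map IDs are plain strings, and the lists are assumed to be non-null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical, simplified stand-in for a task attempt status. */
final class SketchStatus {
    final float progress;
    final List<String> fetchFailedMaps; // assumed non-null in this sketch

    SketchStatus(float progress, List<String> fetchFailedMaps) {
        this.progress = progress;
        this.fetchFailedMaps = fetchFailedMaps;
    }
}

final class StatusCoalescer {
    private final AtomicReference<SketchStatus> lastStatusRef = new AtomicReference<>();

    /**
     * Publishes newStatus, folding in the failures of an update that has not
     * been consumed yet. Returns true when no update was pending, meaning the
     * caller should enqueue an asynchronous event for the consumer.
     */
    boolean coalesce(SketchStatus newStatus) {
        SketchStatus lastStatus;
        SketchStatus merged;
        do {
            lastStatus = lastStatusRef.get();
            if (lastStatus == null) {
                merged = newStatus;
            } else {
                // Time order: failures from the previous update first,
                // then the ones from the current update.
                List<String> failures = new ArrayList<>(lastStatus.fetchFailedMaps);
                failures.addAll(newStatus.fetchFailedMaps);
                merged = new SketchStatus(newStatus.progress, failures);
            }
            // Retry if another thread changed the reference in the meantime.
        } while (!lastStatusRef.compareAndSet(lastStatus, merged));

        // Evaluated once, after the loop, from the value the successful
        // compareAndSet replaced.
        return lastStatus == null;
    }

    /** Consumer side: takes the pending status and clears the slot. */
    SketchStatus consume() {
        return lastStatusRef.getAndSet(null);
    }
}
```

Building a fresh merged object on every attempt means a failed compareAndSet leaves newStatus untouched, so a retry starts from clean input.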

> Concurrent task progress updates causing NPE in Application Master
> --
>
> Key: MAPREDUCE-7028
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7028
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4, 2.7.6
>Reporter: Gergo Repas
>Assignee: Gergo Repas
> Attachments: MAPREDUCE-7028.000.patch, MAPREDUCE-7028.001.patch, 
> MAPREDUCE-7028.002.patch
>
>






[jira] [Comment Edited] (MAPREDUCE-7028) Concurrent task progress updates causing NPE in Application Master

2018-01-02 Thread Gergo Repas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308437#comment-16308437
 ] 

Gergo Repas edited comment on MAPREDUCE-7028 at 1/2/18 6:02 PM:


[~miklos.szeg...@cloudera.com] Thanks for the review, I addressed your 
comments. Regarding using locks vs retry-logic: I agree with [~jlowe]'s 
comment, status updates are sent in a synchronous and sequential manner from 
the task, and it must be the RPC retry logic causing the concurrent updates. 
And also, even the locking would not ensure that the update that was first sent 
will be processed first. So I have not changed the retry logic to locking for 
now.


was (Author: grepas):
[~miklos.szeg...@cloudera.com] Thanks for the review, I addressed your 
comments. Regarding using locks vs retry-logic: I agree with [~jlowe]'s 
comment, status updates are sent in a synchronous manner from the task, and it 
must be the RPC retry logic causing the concurrent updates. And also, even the 
locking would not ensure that the update that was first sent will be processed 
first. So I have not changed the retry logic to locking for now.

> Concurrent task progress updates causing NPE in Application Master
> --
>
> Key: MAPREDUCE-7028
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7028
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4, 2.7.6
>Reporter: Gergo Repas
>Assignee: Gergo Repas
> Attachments: MAPREDUCE-7028.000.patch, MAPREDUCE-7028.001.patch, 
> MAPREDUCE-7028.002.patch
>
>






[jira] [Updated] (MAPREDUCE-7028) Concurrent task progress updates causing NPE in Application Master

2018-01-02 Thread Gergo Repas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergo Repas updated MAPREDUCE-7028:
---
Attachment: MAPREDUCE-7028.002.patch

[~miklos.szeg...@cloudera.com] Thanks for the review, I addressed your 
comments. Regarding using locks vs retry-logic: I agree with [~jlowe]'s 
comment, status updates are sent in a synchronous manner from the task, and it 
must be the RPC retry logic causing the concurrent updates. And also, even the 
locking would not ensure that the update that was first sent will be processed 
first. So I have not changed the retry logic to locking for now.

> Concurrent task progress updates causing NPE in Application Master
> --
>
> Key: MAPREDUCE-7028
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7028
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4, 2.7.6
>Reporter: Gergo Repas
>Assignee: Gergo Repas
> Attachments: MAPREDUCE-7028.000.patch, MAPREDUCE-7028.001.patch, 
> MAPREDUCE-7028.002.patch
>
>






[jira] [Commented] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription

2018-01-02 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308413#comment-16308413
 ] 

Haibo Chen commented on MAPREDUCE-6926:
---

Thanks [~miklos.szeg...@cloudera.com] for the review! I agree with you that 
using Clock.getTime() would make unit testing easier in general. But in this 
case, there are already other ContainerRequest constructors that allow easy 
unit testing by specifying the request time directly. Plus, I believe my change 
preserves the current behavior of the code.
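
The two testing styles being weighed here can be shown side by side. This is a sketch under assumptions, not the ContainerRequest code: SketchClock is a locally defined stand-in for a clock abstraction with a getTime() method, and SketchRequest with its field requestTimeMs is a hypothetical type.

```java
/** Stand-in clock abstraction, defined locally so the sketch is self-contained. */
interface SketchClock {
    long getTime();
}

/** Hypothetical request type; not the real ContainerRequest. */
final class SketchRequest {
    final long requestTimeMs;

    /** Explicit timestamp: a test can pass any value directly. */
    SketchRequest(long requestTimeMs) {
        this.requestTimeMs = requestTimeMs;
    }

    /** Clock-based variant: a test injects a fake clock instead. */
    SketchRequest(SketchClock clock) {
        this(clock.getTime());
    }

    /** Production-style default that reads the wall clock. */
    static SketchRequest now() {
        SketchClock wallClock = System::currentTimeMillis;
        return new SketchRequest(wallClock);
    }
}
```

Either constructor gives a test full control over the timestamp; the clock-based one mainly pays off when many objects need to share the same notion of time.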

> Allow MR jobs to opt out of oversubscription
> 
>
> Key: MAPREDUCE-6926
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: mrv2
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: MAPREDUCE-6926-YARN-1011.00.patch, 
> MAPREDUCE-6926-YARN-1011.01.patch
>
>







[jira] [Commented] (MAPREDUCE-7030) Uploader tool should ignore symlinks to the same directory

2018-01-02 Thread Gergo Repas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308338#comment-16308338
 ] 

Gergo Repas commented on MAPREDUCE-7030:


[~miklos.szeg...@cloudera.com] - thanks, just a couple of comments:
# boolean ignoreSymlink - its visibility can be restricted to private
# The exclusion will not apply to symlinks with "./symlink-target" content, 
even though it's pointing to the same file as if its content was 
"symlink-target" - is that acceptable for this feature?

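One way to treat "./symlink-target" and "symlink-target" alike is to normalize the link content before comparing. The following is a sketch of that idea only; how the patch actually performs the comparison is not shown in this thread, and the class SymlinkTargetCheck and method pointsIntoSameDirectory are hypothetical names.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative helper; the class and method names are hypothetical. */
final class SymlinkTargetCheck {
    private SymlinkTargetCheck() { }

    /**
     * Returns true when the raw symlink content, resolved against the
     * directory that holds the link, names an entry directly inside that
     * same directory. Purely lexical: the filesystem is not consulted.
     */
    static boolean pointsIntoSameDirectory(String linkDirectory, String linkContent) {
        Path dir = Paths.get(linkDirectory).normalize();
        // normalize() drops "." components and collapses "name/.." pairs.
        Path target = dir.resolve(linkContent).normalize();
        return dir.equals(target.getParent());
    }
}
```

Because the check is lexical, a target that is itself a symlink to another directory would still count as being in the same directory; Path.toRealPath() would be needed to follow such links on disk.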
> Uploader tool should ignore symlinks to the same directory
> --
>
> Key: MAPREDUCE-7030
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7030
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Minor
> Attachments: MAPREDUCE-7030.000.patch
>
>







[jira] [Commented] (MAPREDUCE-7015) Possible race condition in JHS if the job is not loaded

2018-01-02 Thread Peter Bacsko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16307925#comment-16307925
 ] 

Peter Bacsko commented on MAPREDUCE-7015:
-

[~jlowe], [~haibochen] what do you think about the solution? There's an even 
simpler one: just remove {{moveToDoneExecutor}} completely. Is that acceptable?

> Possible race condition in JHS if the job is not loaded
> ---
>
> Key: MAPREDUCE-7015
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7015
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Attachments: MAPREDUCE-7015-POC01.patch
>
>
> There could be a race condition inside JHS. In our build environment, 
> {{TestMRJobClient.testJobClient()}} failed with this exception:
> {noformat}
> java.io.FileNotFoundException: File does not exist: hdfs://localhost:32836/tmp/hadoop-yarn/staging/history/done_intermediate/jenkins/job_1509975084722_0001_conf.xml
>   at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1266)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1258)
>   at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1258)
>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:340)
>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:292)
>   at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2123)
>   at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2092)
>   at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2068)
>   at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:460)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.mapreduce.TestMRJobClient.runTool(TestMRJobClient.java:94)
>   at org.apache.hadoop.mapreduce.TestMRJobClient.testConfig(TestMRJobClient.java:551)
>   at org.apache.hadoop.mapreduce.TestMRJobClient.testJobClient(TestMRJobClient.java:167)
> {noformat}
> Root cause:
> 1. MapReduce job completes
> 2. CLI calls {{cluster.getJob(jobid)}}
> 3. The job is finished and the client side gets redirected to JHS
> 4. The job data is missing from {{CachedHistoryStorage}} so JHS tries to find 
> the job
> 5. First it scans the intermediate directory and finds the job
> 6. The call {{moveToDone()}} is scheduled for execution on a separate thread 
> inside {{moveToDoneExecutor}} but does not get the chance to run immediately
> 7. RPC invocation returns with the path pointing to 
> {{/tmp/hadoop-yarn/staging/history/done_intermediate}}
> 8. The call to {{moveToDone()}} completes which moves the contents of 
> {{done_intermediate}} to {{done}}
> 9. Hadoop CLI tries to download the config file from done_intermediate but 
> it's no longer there
> Usually the {{moveToDone()}} call scheduled in step #6 completes before 
> step #7 returns, but sometimes it lags behind, causing this race condition.
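
The interleaving in steps 6 to 9 can be reproduced outside Hadoop with a small 
sketch. This is not JHS code: it uses local temporary directories instead of 
HDFS, and the class name, the file name and the latch that delays the move are 
assumptions made to force the unlucky ordering deterministically.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MoveToDoneRaceSketch {

  /** Returns {stale path still exists, moved file exists}. */
  static boolean[] simulate() {
    ExecutorService moveToDoneExecutor = Executors.newSingleThreadExecutor();
    try {
      Path root = Files.createTempDirectory("jhs-race");
      Path intermediate =
          Files.createDirectory(root.resolve("done_intermediate"));
      Path done = Files.createDirectory(root.resolve("done"));
      Path conf = Files.createFile(intermediate.resolve("job_0001_conf.xml"));

      // Step 6: the move is scheduled on another thread. The latch stands in
      // for that thread not getting the chance to run immediately.
      CountDownLatch rpcReturned = new CountDownLatch(1);
      Future<Path> move = moveToDoneExecutor.submit(() -> {
        rpcReturned.await();
        return Files.move(conf, done.resolve(conf.getFileName()));
      });

      // Step 7: the caller is handed a path inside done_intermediate.
      Path pathSeenByClient = conf;
      rpcReturned.countDown();

      // Step 8: the move completes.
      Path moved = move.get();

      // Step 9: the client looks at the path it was given earlier.
      boolean[] result =
          {Files.exists(pathSeenByClient), Files.exists(moved)};

      // Clean up the temporary files and directories.
      Files.delete(moved);
      Files.delete(done);
      Files.delete(intermediate);
      Files.delete(root);
      return result;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      moveToDoneExecutor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    boolean[] result = simulate();
    System.out.println("stale path exists: " + result[0]);
    System.out.println("moved file exists: " + result[1]);
  }
}
```

In this sketch the path handed out in step 7 is stale by the time it is used, 
which corresponds to the FileNotFoundException in the stack trace above.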


