[jira] [Commented] (MAPREDUCE-6852) Job#updateStatus() failed with NPE due to race condition

2017-02-28 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889376#comment-15889376
 ] 

Jian He commented on MAPREDUCE-6852:


lgtm, committing tomorrow  

> Job#updateStatus() failed with NPE due to race condition
> 
>
> Key: MAPREDUCE-6852
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6852
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: MAPREDUCE-6852.patch, MAPREDUCE-6852-v2.patch
>
>
> Like MAPREDUCE-6762, we found this issue in a cluster where Pig query 
> occasionally failed on NPE - "Pig uses JobControl API to track MR job status, 
> but sometimes Job History Server failed to flush job meta files to HDFS which 
> caused the status update failed." Beside NPE in 
> o.a.h.mapreduce.Job.getJobName, we also get NPE in Job.updateStatus() and the 
> exception is as following:
> {noformat}
> Caused by: java.lang.NullPointerException
>   at org.apache.hadoop.mapreduce.Job$1.run(Job.java:323)
>   at org.apache.hadoop.mapreduce.Job$1.run(Job.java:320)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1833)
>   at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:320)
>   at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:604)
> {noformat}
> We found state here is null. However, we already check the job state to be 
> RUNNING as code below:
> {noformat}
>   public boolean isComplete() throws IOException {
> ensureState(JobState.RUNNING);
> updateStatus();
> return status.isJobComplete();
>   }
> {noformat}
> The only possible reason here is two threads are calling here for the same 
> time: ensure state first, then one thread update the state to null while the 
> other thread hit NPE issue here.
> We should fix this NPE exception.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6852) Job#updateStatus() failed with NPE due to race condition

2017-02-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889282#comment-15889282
 ] 

Hadoop QA commented on MAPREDUCE-6852:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 35s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
58s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 6s 
{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m 22s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12855269/MAPREDUCE-6852-v2.patch
 |
| JIRA Issue | MAPREDUCE-6852 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux d0268124cdaf 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 
09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 989bd56 |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6909/testReport/ |
| modules | C: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
U: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core |
| Console output | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6909/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Job#updateStatus() failed with NPE due to race condition
> 
>
> Key: MAPREDUCE-6852
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6852
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: 

[jira] [Updated] (MAPREDUCE-6852) Job#updateStatus() failed with NPE due to race condition

2017-02-28 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-6852:
--
Status: Patch Available  (was: Open)

> Job#updateStatus() failed with NPE due to race condition
> 
>
> Key: MAPREDUCE-6852
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6852
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: MAPREDUCE-6852.patch, MAPREDUCE-6852-v2.patch
>
>
> Like MAPREDUCE-6762, we found this issue in a cluster where Pig query 
> occasionally failed on NPE - "Pig uses JobControl API to track MR job status, 
> but sometimes Job History Server failed to flush job meta files to HDFS which 
> caused the status update failed." Beside NPE in 
> o.a.h.mapreduce.Job.getJobName, we also get NPE in Job.updateStatus() and the 
> exception is as following:
> {noformat}
> Caused by: java.lang.NullPointerException
>   at org.apache.hadoop.mapreduce.Job$1.run(Job.java:323)
>   at org.apache.hadoop.mapreduce.Job$1.run(Job.java:320)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1833)
>   at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:320)
>   at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:604)
> {noformat}
> We found state here is null. However, we already check the job state to be 
> RUNNING as code below:
> {noformat}
>   public boolean isComplete() throws IOException {
> ensureState(JobState.RUNNING);
> updateStatus();
> return status.isJobComplete();
>   }
> {noformat}
> The only possible reason here is two threads are calling here for the same 
> time: ensure state first, then one thread update the state to null while the 
> other thread hit NPE issue here.
> We should fix this NPE exception.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6852) Job#updateStatus() failed with NPE due to race condition

2017-02-28 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-6852:
--
Attachment: MAPREDUCE-6852-v2.patch

Sounds reasonable. v2 patch incorporate comments here.

> Job#updateStatus() failed with NPE due to race condition
> 
>
> Key: MAPREDUCE-6852
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6852
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: MAPREDUCE-6852.patch, MAPREDUCE-6852-v2.patch
>
>
> Like MAPREDUCE-6762, we found this issue in a cluster where Pig query 
> occasionally failed on NPE - "Pig uses JobControl API to track MR job status, 
> but sometimes Job History Server failed to flush job meta files to HDFS which 
> caused the status update failed." Beside NPE in 
> o.a.h.mapreduce.Job.getJobName, we also get NPE in Job.updateStatus() and the 
> exception is as following:
> {noformat}
> Caused by: java.lang.NullPointerException
>   at org.apache.hadoop.mapreduce.Job$1.run(Job.java:323)
>   at org.apache.hadoop.mapreduce.Job$1.run(Job.java:320)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1833)
>   at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:320)
>   at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:604)
> {noformat}
> We found state here is null. However, we already check the job state to be 
> RUNNING as code below:
> {noformat}
>   public boolean isComplete() throws IOException {
> ensureState(JobState.RUNNING);
> updateStatus();
> return status.isJobComplete();
>   }
> {noformat}
> The only possible reason here is two threads are calling here for the same 
> time: ensure state first, then one thread update the state to null while the 
> other thread hit NPE issue here.
> We should fix this NPE exception.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6852) Job#updateStatus() failed with NPE due to race condition

2017-02-28 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-6852:
--
Status: Open  (was: Patch Available)

> Job#updateStatus() failed with NPE due to race condition
> 
>
> Key: MAPREDUCE-6852
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6852
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: MAPREDUCE-6852.patch
>
>
> Like MAPREDUCE-6762, we found this issue in a cluster where Pig query 
> occasionally failed on NPE - "Pig uses JobControl API to track MR job status, 
> but sometimes Job History Server failed to flush job meta files to HDFS which 
> caused the status update failed." Beside NPE in 
> o.a.h.mapreduce.Job.getJobName, we also get NPE in Job.updateStatus() and the 
> exception is as following:
> {noformat}
> Caused by: java.lang.NullPointerException
>   at org.apache.hadoop.mapreduce.Job$1.run(Job.java:323)
>   at org.apache.hadoop.mapreduce.Job$1.run(Job.java:320)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1833)
>   at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:320)
>   at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:604)
> {noformat}
> We found state here is null. However, we already check the job state to be 
> RUNNING as code below:
> {noformat}
>   public boolean isComplete() throws IOException {
> ensureState(JobState.RUNNING);
> updateStatus();
> return status.isJobComplete();
>   }
> {noformat}
> The only possible reason here is two threads are calling here for the same 
> time: ensure state first, then one thread update the state to null while the 
> other thread hit NPE issue here.
> We should fix this NPE exception.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6852) Job#updateStatus() failed with NPE due to race condition

2017-02-28 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889206#comment-15889206
 ] 

Jian He commented on MAPREDUCE-6852:


looks like getJobID is used in the same class in several other places, we may 
just use this method.

> Job#updateStatus() failed with NPE due to race condition
> 
>
> Key: MAPREDUCE-6852
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6852
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: MAPREDUCE-6852.patch
>
>
> Like MAPREDUCE-6762, we found this issue in a cluster where Pig query 
> occasionally failed on NPE - "Pig uses JobControl API to track MR job status, 
> but sometimes Job History Server failed to flush job meta files to HDFS which 
> caused the status update failed." Beside NPE in 
> o.a.h.mapreduce.Job.getJobName, we also get NPE in Job.updateStatus() and the 
> exception is as following:
> {noformat}
> Caused by: java.lang.NullPointerException
>   at org.apache.hadoop.mapreduce.Job$1.run(Job.java:323)
>   at org.apache.hadoop.mapreduce.Job$1.run(Job.java:320)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1833)
>   at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:320)
>   at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:604)
> {noformat}
> We found state here is null. However, we already check the job state to be 
> RUNNING as code below:
> {noformat}
>   public boolean isComplete() throws IOException {
> ensureState(JobState.RUNNING);
> updateStatus();
> return status.isJobComplete();
>   }
> {noformat}
> The only possible reason here is two threads are calling here for the same 
> time: ensure state first, then one thread update the state to null while the 
> other thread hit NPE issue here.
> We should fix this NPE exception.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6834) MR application fails with "No NMToken sent" exception after MRAppMaster recovery

2017-02-28 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888952#comment-15888952
 ] 

Haibo Chen commented on MAPREDUCE-6834:
---

Looks like very much like the same problem at 
https://issues.apache.org/jira/browse/YARN-3112

> MR application fails with "No NMToken sent" exception after MRAppMaster 
> recovery
> 
>
> Key: MAPREDUCE-6834
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6834
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: resourcemanager, yarn
>Affects Versions: 2.7.0
> Environment: Centos 7
>Reporter: Aleksandr Balitsky
>Assignee: Aleksandr Balitsky
>Priority: Critical
> Attachments: YARN-6019.001.patch
>
>
> *Steps to reproduce:*
> 1) Submit MR application (for example PI app with 50 containers)
> 2) Find MRAppMaster process id for the application 
> 3) Kill MRAppMaster by kill -9 command
> *Expected:* ResourceManager launch new MRAppMaster container and MRAppAttempt 
> and application finish correctly
> *Actually:* After launching new MRAppMaster and MRAppAttempt the application 
> fails with the following exception:
> {noformat}
> 2016-12-22 23:17:53,929 ERROR [ContainerLauncher #9] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container 
> launch failed for container_1482408247195_0002_02_11 : 
> org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent 
> for node1:43037
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:254)
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.(ContainerManagementProtocolProxy.java:244)
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:129)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:395)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:361)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> *Problem*:
> When RMCommunicator sends "registerApplicationMaster" request to RM, RM 
> generates NMTokens for new RMAppAttempt. Those new NMTokens are transmitted 
> to RMCommunicator in RegisterApplicationMasterResponse  
> (getNMTokensFromPreviousAttempts method). But we don't handle these tokens in 
> RMCommunicator.register method. RM don't transmit tese tokens again for other 
> allocated requests, but we don't have these tokens in NMTokenCache. 
> Accordingly we get "No NMToken sent for node" exception.
> I have found that this issue appears after changes from the 
> https://github.com/apache/hadoop/commit/9b272ccae78918e7d756d84920a9322187d61eed
>  
> I tried to do the same scenario without the commit and application completed 
> successfully after RMAppMaster recovery



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6852) Job#updateStatus() failed with NPE due to race condition

2017-02-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888912#comment-15888912
 ] 

Hadoop QA commented on MAPREDUCE-6852:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
1s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
48s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 46s 
{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 22m 24s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12855219/MAPREDUCE-6852.patch |
| JIRA Issue | MAPREDUCE-6852 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux fd4f13fcd44d 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 
15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / e0bb867 |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6908/testReport/ |
| modules | C: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
U: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core |
| Console output | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6908/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Job#updateStatus() failed with NPE due to race condition
> 
>
> Key: MAPREDUCE-6852
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6852
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: 

[jira] [Updated] (MAPREDUCE-6852) Job#updateStatus() failed with NPE due to race condition

2017-02-28 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-6852:
--
Attachment: MAPREDUCE-6852.patch

Upload a quick patch to fix this issue. Not include any unit test given this is 
a race condition case which is hard to do in unit test.

> Job#updateStatus() failed with NPE due to race condition
> 
>
> Key: MAPREDUCE-6852
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6852
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: MAPREDUCE-6852.patch
>
>
> Like MAPREDUCE-6762, we found this issue in a cluster where Pig query 
> occasionally failed on NPE - "Pig uses JobControl API to track MR job status, 
> but sometimes Job History Server failed to flush job meta files to HDFS which 
> caused the status update failed." Beside NPE in 
> o.a.h.mapreduce.Job.getJobName, we also get NPE in Job.updateStatus() and the 
> exception is as following:
> {noformat}
> Caused by: java.lang.NullPointerException
>   at org.apache.hadoop.mapreduce.Job$1.run(Job.java:323)
>   at org.apache.hadoop.mapreduce.Job$1.run(Job.java:320)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1833)
>   at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:320)
>   at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:604)
> {noformat}
> We found state here is null. However, we already check the job state to be 
> RUNNING as code below:
> {noformat}
>   public boolean isComplete() throws IOException {
> ensureState(JobState.RUNNING);
> updateStatus();
> return status.isJobComplete();
>   }
> {noformat}
> The only possible reason here is two threads are calling here for the same 
> time: ensure state first, then one thread update the state to null while the 
> other thread hit NPE issue here.
> We should fix this NPE exception.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6852) Job#updateStatus() failed with NPE due to race condition

2017-02-28 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-6852:
--
Status: Patch Available  (was: Open)

> Job#updateStatus() failed with NPE due to race condition
> 
>
> Key: MAPREDUCE-6852
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6852
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: MAPREDUCE-6852.patch
>
>
> Like MAPREDUCE-6762, we found this issue in a cluster where Pig query 
> occasionally failed on NPE - "Pig uses JobControl API to track MR job status, 
> but sometimes Job History Server failed to flush job meta files to HDFS which 
> caused the status update failed." Beside NPE in 
> o.a.h.mapreduce.Job.getJobName, we also get NPE in Job.updateStatus() and the 
> exception is as following:
> {noformat}
> Caused by: java.lang.NullPointerException
>   at org.apache.hadoop.mapreduce.Job$1.run(Job.java:323)
>   at org.apache.hadoop.mapreduce.Job$1.run(Job.java:320)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1833)
>   at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:320)
>   at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:604)
> {noformat}
> We found state here is null. However, we already check the job state to be 
> RUNNING as code below:
> {noformat}
>   public boolean isComplete() throws IOException {
> ensureState(JobState.RUNNING);
> updateStatus();
> return status.isJobComplete();
>   }
> {noformat}
> The only possible reason here is two threads are calling here for the same 
> time: ensure state first, then one thread update the state to null while the 
> other thread hit NPE issue here.
> We should fix this NPE exception.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-6852) Job#updateStatus() failed with NPE due to race condition

2017-02-28 Thread Junping Du (JIRA)
Junping Du created MAPREDUCE-6852:
-

 Summary: Job#updateStatus() failed with NPE due to race condition
 Key: MAPREDUCE-6852
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6852
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Junping Du
Assignee: Junping Du


Like MAPREDUCE-6762, we found this issue in a cluster where Pig query 
occasionally failed on NPE - "Pig uses JobControl API to track MR job status, 
but sometimes Job History Server failed to flush job meta files to HDFS which 
caused the status update failed." Beside NPE in o.a.h.mapreduce.Job.getJobName, 
we also get NPE in Job.updateStatus() and the exception is as following:
{noformat}
Caused by: java.lang.NullPointerException
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:323)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:320)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1833)
at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:320)
at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:604)
{noformat}
We found state here is null. However, we already check the job state to be 
RUNNING as code below:
{noformat}
  public boolean isComplete() throws IOException {
ensureState(JobState.RUNNING);
updateStatus();
return status.isJobComplete();
  }
{noformat}
The only possible reason here is two threads are calling here for the same 
time: ensure state first, then one thread update the state to null while the 
other thread hit NPE issue here.
We should fix this NPE exception.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6850) Shuffle Handler keep-alive connections are closed from the server side

2017-02-28 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888762#comment-15888762
 ] 

Jonathan Eagles commented on MAPREDUCE-6850:


[~jlowe], can you have a look at this patch to see if is ready to commit? 
[~rajesh.balamohan] and I are happy so far with the current patch but would 
like a second set of eyes.

> Shuffle Handler keep-alive connections are closed from the server side
> --
>
> Key: MAPREDUCE-6850
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6850
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: MAPREDUCE-6850.1.patch, MAPREDUCE-6850.2.patch, 
> MAPREDUCE-6850.3.patch, MAPREDUCE-6850.4.patch, With_Issue.png, 
> With_Patch.png, With_Patch_withData.png
>
>
> When performance testing tez shuffle handler (TEZ-3334), it was noticed the 
> keep-alive connections are closed from the server-side. The client silently 
> recovers and logs the connection as keep-alive, despite reestablishing a 
> connection. This jira aims to remove the close from the server side, fixing 
> the bug preventing keep-alive connections.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6839) TestRecovery.testCrashed failed

2017-02-28 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888649#comment-15888649
 ] 

Haibo Chen commented on MAPREDUCE-6839:
---

Thanks [~pbacsko] for the addiontial information. Can you be more specific 
about what will be affected if the directory is not cleaned properly or how? 
Off the top of my head, I cannot think of why staging directory problem could 
delay task transition from Scheduled to Running state. If the staging directory 
is not directly causing testCrashed to fail, I'd like we include the app.stop 
fix in a separate patch.

> TestRecovery.testCrashed failed
> ---
>
> Key: MAPREDUCE-6839
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6839
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Gergő Pásztor
>Assignee: Gergő Pásztor
> Attachments: MAPREDUCE-6839_v1.patch, MAPREDUCE-6839_v2.patch, 
> MAPREDUCE-6839_v3.patch
>
>
> TestRecovery#testCrashed is a flaky test.
> Error Message: 
> Reduce Task state not correct expected: but was:
> Stack Trace:
> java.lang.AssertionError: Reduce Task state not correct expected: 
> but was: 
> at org.junit.Assert.fail(Assert.java:88) 
> at org.junit.Assert.failNotEquals(Assert.java:743) 
> at org.junit.Assert.assertEquals(Assert.java:118) 
> at 
> org.apache.hadoop.mapreduce.v2.app.TestRecovery.testCrashed(TestRecovery.java:164)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org