[jira] [Commented] (MAPREDUCE-6654) Possible NPE in JobHistoryEventHandler#handleEvent

2018-08-28 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595933#comment-16595933
 ] 

Sunil Govindan commented on MAPREDUCE-6654:
---

Hi [~djp]

As this JIRA is marked Critical for 3.2, could you please help move it 
forward, or move it out if it's not feasible to finish in the coming weeks? 
The 3.2 code freeze date is approaching. Kindly take a look.

> Possible NPE in JobHistoryEventHandler#handleEvent
> --
>
> Key: MAPREDUCE-6654
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6654
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Xiao Chen
>Assignee: Junping Du
>Priority: Critical
> Attachments: MAPREDUCE-6654-v2.1.patch, MAPREDUCE-6654-v2.patch, 
> MAPREDUCE-6654.patch
>
>
> I have seen NPE thrown from {{JobHistoryEventHandler#handleEvent}}:
> {noformat}
> 2016-03-14 16:42:15,231 INFO [Thread-69] 
> org.apache.hadoop.service.AbstractService: Service JobHistoryEventHandler 
> failed in state STOPPED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:570)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:382)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1651)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1147)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:573)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:620)
> {noformat}
> In the version where this exception is thrown, the 
> [line|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L586]
>  is:
> {code:java}mi.writeEvent(historyEvent);{code}
> IMHO, this may be caused by an exception in a previous step. Specifically, in 
> a Kerberized environment, the connection to KMS failed while creating the 
> event writer, which requires KMS to decrypt the EEK. Exception below:
> {noformat} 
> 2016-03-14 16:41:57,559 ERROR [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Error 
> JobHistoryEventHandler in handleEvent: EventType: AM_STARTED
> java.net.SocketTimeoutException: Read timed out
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.read(SocketInputStream.java:152)
>   at java.net.SocketInputStream.read(SocketInputStream.java:122)
>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
>   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
>   at 
> java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
>   at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:520)
>   at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:505)
>   at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:779)
>   at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$3.call(LoadBalancingKMSClientProvider.java:185)
>   at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$3.call(LoadBalancingKMSClientProvider.java:181)
>   at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:94)
>   at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:181)
>   at 
> org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.decryptEncryptedKey(KeyProviderCryptoExtension.java:388)
>   at 
> 
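The quoted analysis suggests that event writer setup for the job failed earlier (the KMS read timeout while decrypting the EEK), and that the NPE surfaces when `serviceStop()` later calls `handleEvent()`. Because the top frame of the NPE is `JobHistoryEventHandler.handleEvent` itself rather than a frame inside `writeEvent`, the null reference is most likely `mi`. A minimal sketch of a defensive guard follows, under a simplified model: `MetaInfo`, `fileMap`, `setupEventWriter` and the `kmsReachable` flag are hypothetical stand-ins, not the actual Hadoop classes or signatures, and dropping the event with a log message is only one possible policy (the attached patches may handle it differently).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HistoryEventNullGuardSketch {

    /** Hypothetical, simplified stand-in for the per-job MetaInfo that owns an event writer. */
    static final class MetaInfo {
        final List<String> written = new ArrayList<>();

        void writeEvent(String event) {
            written.add(event);
        }
    }

    // Hypothetical per-job map; an entry is only added once writer setup succeeds.
    final Map<String, MetaInfo> fileMap = new HashMap<>();

    /** Simulates writer setup; when it fails (e.g. a KMS read timeout), no MetaInfo is registered. */
    void setupEventWriter(String jobId, boolean kmsReachable) {
        if (kmsReachable) {
            fileMap.put(jobId, new MetaInfo());
        } else {
            System.err.println("Error setting up event writer for " + jobId + ": read timed out");
        }
    }

    /** Writes the event if a writer exists; returns false instead of throwing an NPE when it does not. */
    boolean handleEvent(String jobId, String event) {
        MetaInfo mi = fileMap.get(jobId);
        if (mi == null) {
            // Setup failed earlier, so there is nothing to write to; log and drop the event.
            System.err.println("No event writer for " + jobId + "; dropping " + event);
            return false;
        }
        mi.writeEvent(event);
        return true;
    }

    public static void main(String[] args) {
        HistoryEventNullGuardSketch handler = new HistoryEventNullGuardSketch();
        handler.setupEventWriter("job_ok", true);
        handler.setupEventWriter("job_kms_down", false);
        System.out.println(handler.handleEvent("job_ok", "JOB_FINISHED"));       // true
        System.out.println(handler.handleEvent("job_kms_down", "JOB_FINISHED")); // false
    }
}
```

Guarding at the lookup keeps `serviceStop()` from failing outright; whether silently dropping late events is acceptable for job history is exactly the kind of trade-off a real fix would need to decide.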

[jira] [Commented] (MAPREDUCE-6315) Implement retrieval of logs for crashed MR-AM via jhist in the staging directory

2018-08-28 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595918#comment-16595918
 ] 

Sunil Govindan commented on MAPREDUCE-6315:
---

Hi [~jira.shegalov]

As this JIRA is marked Critical for 3.2, could you please help move it 
forward, or move it out if it's not feasible to finish in the coming weeks? 
The 3.2 code freeze date is approaching. Kindly take a look.

> Implement retrieval of logs for crashed MR-AM via jhist in the staging 
> directory
> 
>
> Key: MAPREDUCE-6315
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6315
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client, mr-am
>Affects Versions: 2.7.0
>Reporter: Gera Shegalov
>Priority: Critical
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6315.001.patch, MAPREDUCE-6315.002.patch, 
> MAPREDUCE-6315.003.patch
>
>
> When all AM attempts crash, there is no record of them in JHS, so there is no 
> easy way to get the logs. This JIRA automates the procedure by utilizing the 
> jhist file in the staging directory. 
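One way to picture the lookup this JIRA automates is a small sketch that picks the newest per-attempt history file from a job's staging directory. It assumes the files are named `<jobId>_<attemptNumber>.jhist` and works against a local directory with `java.nio.file` instead of the Hadoop `FileSystem` API; both are simplifying assumptions, and `StagingJhistLocator` is a hypothetical name, not part of the attached patches.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

public class StagingJhistLocator {

    /**
     * Parses the trailing attempt number from a name like "job_123_0001_2.jhist".
     * Returns -1 for anything that is not a .jhist file with a numeric suffix.
     * Assumes (hypothetically) that every .jhist name carries an "_<attempt>" suffix.
     */
    static int attemptNumber(String fileName) {
        if (!fileName.endsWith(".jhist")) {
            return -1;
        }
        String base = fileName.substring(0, fileName.length() - ".jhist".length());
        int underscore = base.lastIndexOf('_');
        if (underscore < 0) {
            return -1;
        }
        try {
            return Integer.parseInt(base.substring(underscore + 1));
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** Returns the .jhist file with the highest attempt number in the given job staging directory. */
    static Optional<Path> latestJhist(Path jobStagingDir) throws IOException {
        try (Stream<Path> files = Files.list(jobStagingDir)) {
            return files
                .filter(p -> attemptNumber(p.getFileName().toString()) >= 0)
                .max(Comparator.comparingInt(p -> attemptNumber(p.getFileName().toString())));
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("staging");
        Files.createFile(dir.resolve("job_1529348381246_0001_1.jhist"));
        Files.createFile(dir.resolve("job_1529348381246_0001_2.jhist"));
        Files.createFile(dir.resolve("job.xml"));
        // Prints job_1529348381246_0001_2.jhist
        System.out.println(latestJhist(dir).map(p -> p.getFileName().toString()).orElse("none"));
    }
}
```

Choosing the highest attempt number reflects the idea that the last AM attempt's history is the most complete record of what happened before the crash; a real implementation would read the file through Hadoop's filesystem APIs and parse it into job history data.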



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7131) Job History Server has race condition where it moves files from intermediate to finished but thinks file is in intermediate

2018-08-28 Thread Anthony Hsu (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595805#comment-16595805
 ] 

Anthony Hsu commented on MAPREDUCE-7131:


Hmm, will investigate why the unit test I added fails during the Jenkins build 
but not locally.

> Job History Server has race condition where it moves files from intermediate 
> to finished but thinks file is in intermediate
> ---
>
> Key: MAPREDUCE-7131
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7131
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
>Priority: Major
> Attachments: MAPREDUCE-7131.1.patch, MAPREDUCE-7131.2.patch
>
>
> This is the race condition that can occur:
> # during the first *scanIntermediateDirectory()*, 
> *HistoryFileInfo.moveToDone()* is scheduled for job j1
> # during the second *scanIntermediateDirectory()*, j1 is found again and put 
> in the *fileStatusList* to process
> # *HistoryFileInfo.moveToDone()* is processed in another thread and history 
> files are moved to the finished directory
> # the *HistoryFileInfo* for j1 is removed from *jobListCache*
> # the j1 in *fileStatusList* is processed and a new *HistoryFileInfo* for j1 
> is created (history, conf, and summary files will point to the intermediate 
> user directory, and state will be IN_INTERMEDIATE) and added to the 
> *jobListCache*
> # *moveToDone()* is scheduled for this new j1
> # *moveToDone()* fails during *moveToDoneNow()* for the history file because 
> the source path in the intermediate directory does not exist
> From this point on, while the new j1 *HistoryFileInfo* is in the 
> *jobListCache*, the JobHistoryServer will think the history file is in the 
> intermediate directory. If a user queries this job in the JobHistoryServer 
> UI, they will get
> {code}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Could not load 
> history file 
> ://:/mr-history/intermediate//job_1529348381246_27275711-1535123223269---1535127026668-1-0-SUCCEEDED--1535126980787.jhist
> {code}
> Noticed this issue while running 2.7.4, but the race condition seems to still 
> exist in trunk.
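The seven steps above can be replayed deterministically on a single thread. The sketch below models the filesystem as a set of paths and `jobListCache` as a plain map; `FileInfo`, `replay()` and the `intermediate/` and `done/` path prefixes are hypothetical simplifications of `HistoryFileInfo` and the real directory layout, not Hadoop code. It shows how the second cache entry ends up pointing at an intermediate path that no longer exists.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class HistoryMoveRaceReplay {

    /** Hypothetical, simplified stand-in for HistoryFileInfo: only the path it believes is current. */
    static final class FileInfo {
        String historyFile;

        FileInfo(String historyFile) {
            this.historyFile = historyFile;
        }
    }

    final Set<String> fs = new HashSet<>();                     // simulated filesystem
    final Map<String, FileInfo> jobListCache = new HashMap<>(); // simulated jobListCache

    static String intermediate(String job) { return "intermediate/" + job + ".jhist"; }

    static String done(String job) { return "done/" + job + ".jhist"; }

    /** Simplified moveToDone(): fails (returns false) if the source path is already gone. */
    boolean moveToDone(FileInfo info, String job) {
        if (!fs.remove(info.historyFile)) {
            return false; // source missing: info keeps pointing at the intermediate dir
        }
        fs.add(done(job));
        info.historyFile = done(job);
        return true;
    }

    /** Replays steps 1-7 in order and returns whether the cached entry can still be loaded. */
    static boolean replay() {
        HistoryMoveRaceReplay jhs = new HistoryMoveRaceReplay();
        String j1 = "j1";
        jhs.fs.add(intermediate(j1));

        // 1. First scan: j1 is cached and moveToDone() is scheduled (not yet run).
        FileInfo first = new FileInfo(intermediate(j1));
        jhs.jobListCache.put(j1, first);
        // 2. Second scan: j1 is still in the intermediate dir, so it is queued again.
        List<String> fileStatusList = new ArrayList<>();
        fileStatusList.add(j1);
        // 3. The scheduled move runs "on another thread" and succeeds.
        jhs.moveToDone(first, j1);
        // 4. The entry for j1 is removed from jobListCache.
        jhs.jobListCache.remove(j1);
        // 5. The queued j1 is processed: a fresh entry pointing at the intermediate dir.
        for (String job : fileStatusList) {
            FileInfo second = new FileInfo(intermediate(job));
            jhs.jobListCache.put(job, second);
            // 6-7. moveToDone() for the new entry fails because the source is gone.
            jhs.moveToDone(second, job);
        }
        // A later UI query resolves the cached path, which no longer exists.
        return jhs.fs.contains(jhs.jobListCache.get(j1).historyFile);
    }

    public static void main(String[] args) {
        System.out.println("loadable=" + replay()); // loadable=false
    }
}
```

Running `main` prints `loadable=false`, matching the "Could not load history file" symptom: the file is safely under `done/`, but the cached entry still points at `intermediate/`.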






[jira] [Commented] (MAPREDUCE-7131) Job History Server has race condition where it moves files from intermediate to finished but thinks file is in intermediate

2018-08-28 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595708#comment-16595708
 ] 

Hadoop QA commented on MAPREDUCE-7131:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m  7s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 32s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  3m 44s{color} | {color:red} hadoop-mapreduce-client-hs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 39s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 59m 12s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.mapreduce.v2.hs.TestHistoryFileManager |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | MAPREDUCE-7131 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12937521/MAPREDUCE-7131.2.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 47831ab24fa9 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c5629d5 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7464/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-hs.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7464/testReport/ |
| Max. process+thread count | 334 (vs. ulimit of 1) |
| modules | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs U: 

[jira] [Updated] (MAPREDUCE-7131) Job History Server has race condition where it moves files from intermediate to finished but thinks file is in intermediate

2018-08-28 Thread Anthony Hsu (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated MAPREDUCE-7131:
---
Status: Patch Available  (was: Open)

Uploaded a [new patch|^MAPREDUCE-7131.2.patch] that addresses Checkstyle issues 
(lines exceeding 80 characters). Could not reproduce unit test failure locally.




[jira] [Updated] (MAPREDUCE-7131) Job History Server has race condition where it moves files from intermediate to finished but thinks file is in intermediate

2018-08-28 Thread Anthony Hsu (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated MAPREDUCE-7131:
---
Attachment: MAPREDUCE-7131.2.patch




[jira] [Updated] (MAPREDUCE-7131) Job History Server has race condition where it moves files from intermediate to finished but thinks file is in intermediate

2018-08-28 Thread Anthony Hsu (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated MAPREDUCE-7131:
---
Status: Open  (was: Patch Available)




[jira] [Commented] (MAPREDUCE-7131) Job History Server has race condition where it moves files from intermediate to finished but thinks file is in intermediate

2018-08-28 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595557#comment-16595557
 ] 

Hadoop QA commented on MAPREDUCE-7131:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m  9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 25s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 21s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 11s{color} | {color:orange} hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs: The patch generated 9 new + 21 unchanged - 0 fixed = 30 total (was 21) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 56s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  3m 34s{color} | {color:red} hadoop-mapreduce-client-hs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 51m 33s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.mapreduce.v2.hs.TestHistoryFileManager |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | MAPREDUCE-7131 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12937495/MAPREDUCE-7131.1.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 4cd0403a280a 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / cb9d371 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7463/artifact/out/diff-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-hs.txt |
| unit | 

[jira] [Comment Edited] (MAPREDUCE-7131) Job History Server has race condition where it moves files from intermediate to finished but thinks file is in intermediate

2018-08-28 Thread Anthony Hsu (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595451#comment-16595451
 ] 

Anthony Hsu edited comment on MAPREDUCE-7131 at 8/28/18 7:19 PM:
-

Attached a patch that fixes this issue: [^MAPREDUCE-7131.1.patch]. [~jlowe], 
[~pbacsko], [~varun_saxena], could you help review?

My approach is basically to have *moveToDoneNow()* handle the case where the 
move has already happened by ignoring the FileNotFoundException.


was (Author: erwaman):
Attached a patch that fixes this issue: [^MAPREDUCE-7131.1.patch]. [~pbacsko], 
[~varun_saxena], could you help review?

My approach is basically to have *moveToDoneNow()* handle the case where the 
move has already happened by ignoring the FileNotFoundException.
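The approach described in this comment, treating a missing source as "already moved", can be sketched with `java.nio.file`. This is an illustration under assumptions, not the attached patch: on a local filesystem `Files.move` reports a missing source as `NoSuchFileException` rather than the `FileNotFoundException` mentioned above, and the extra check that the target exists is a defensive choice added here, so that a source that vanished for some other reason still surfaces as an error. `IdempotentMoveSketch` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class IdempotentMoveSketch {

    /**
     * Moves src to target. If src is already gone but target exists, the move is assumed
     * to have been completed earlier (e.g. by a concurrent moveToDone()) and is treated as success.
     */
    static void moveToDoneNow(Path src, Path target) throws IOException {
        try {
            Files.move(src, target);
        } catch (NoSuchFileException e) {
            if (!Files.exists(target)) {
                throw e; // source and target both missing: a genuine failure
            }
            // Otherwise another caller already moved the file; nothing left to do.
        }
    }

    public static void main(String[] args) throws IOException {
        Path intermediate = Files.createTempDirectory("intermediate");
        Path done = Files.createTempDirectory("done");
        Path src = intermediate.resolve("job_1.jhist");
        Path target = done.resolve("job_1.jhist");
        Files.createFile(src);

        moveToDoneNow(src, target); // first move does the work
        moveToDoneNow(src, target); // repeated move is a no-op instead of an error
        System.out.println(Files.exists(target) && !Files.exists(src)); // true
    }
}
```

Calling `moveToDoneNow()` twice on the same pair of paths succeeds both times, which is the property the race needs: whichever move runs second no longer fails.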




[jira] [Updated] (MAPREDUCE-7131) Job History Server has race condition where it moves files from intermediate to finished but thinks file is in intermediate

2018-08-28 Thread Anthony Hsu (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated MAPREDUCE-7131:
---
Status: Patch Available  (was: Open)

Attached a patch that fixes this issue: [^MAPREDUCE-7131.1.patch]. [~pbacsko], 
[~varun_saxena], could you help review?




[jira] [Comment Edited] (MAPREDUCE-7131) Job History Server has race condition where it moves files from intermediate to finished but thinks file is in intermediate

2018-08-28 Thread Anthony Hsu (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595451#comment-16595451
 ] 

Anthony Hsu edited comment on MAPREDUCE-7131 at 8/28/18 7:02 PM:
-

Attached a patch that fixes this issue: [^MAPREDUCE-7131.1.patch]. [~pbacsko], 
[~varun_saxena], could you help review?

My approach is to have *moveToDoneNow()* handle the case where the move has 
already happened by ignoring the resulting FileNotFoundException.
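The approach described above can be illustrated with a small sketch of an idempotent move. It is hypothetical and is not the code in MAPREDUCE-7131.1.patch. It uses `java.nio.file` on a local filesystem instead of Hadoop's `FileSystem` API, and `java.nio` reports a missing source as `NoSuchFileException` rather than `FileNotFoundException`. The extra `Files.exists(dst)` check is an assumption added here so that a file that is genuinely lost still surfaces as an error instead of being silently ignored.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

/**
 * Hypothetical helper illustrating an idempotent move. This is NOT the code
 * in MAPREDUCE-7131.1.patch and does not use Hadoop's FileSystem API.
 */
final class IdempotentMove {
  private IdempotentMove() {}

  /**
   * Moves src to dst. Returns true if this call performed the move, or false
   * if src is already gone but dst exists (another mover got there first).
   * Any other failure, including src and dst both missing, is rethrown.
   */
  static boolean moveIfPresent(Path src, Path dst) throws IOException {
    try {
      Files.move(src, dst);
      return true;
    } catch (NoSuchFileException e) {
      if (Files.exists(dst)) {
        return false; // already moved; treat as success instead of failing
      }
      throw e; // the file is genuinely missing, so surface the error
    }
  }
}
```

Calling `moveIfPresent` twice for the same pair of paths performs the move once and then returns `false`, which is the behavior a second, stale `moveToDone()` would need in order to finish cleanly.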


was (Author: erwaman):
Attached a patch that fixes this issue: [^MAPREDUCE-7131.1.patch]. [~pbacsko], 
[~varun_saxena], could you help review?




[jira] [Updated] (MAPREDUCE-7131) Job History Server has race condition where it moves files from intermediate to finished but thinks file is in intermediate

2018-08-28 Thread Anthony Hsu (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated MAPREDUCE-7131:
---
Attachment: MAPREDUCE-7131.1.patch




[jira] [Updated] (MAPREDUCE-7131) Job History Server has race condition where it moves files from intermediate to finished but thinks file is in intermediate

2018-08-28 Thread Anthony Hsu (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated MAPREDUCE-7131:
---
Description: 
This is the race condition that can occur:

# during the first *scanIntermediateDirectory()*, 
*HistoryFileInfo.moveToDone()* is scheduled for job j1
# during the second *scanIntermediateDirectory()*, j1 is found again and put in 
the *fileStatusList* to process
# *HistoryFileInfo.moveToDone()* is processed in another thread and history 
files are moved to the finished directory
# the *HistoryFileInfo* for j1 is removed from *jobListCache*
# the j1 in *fileStatusList* is processed and a new *HistoryFileInfo* for j1 is 
created (history, conf, and summary files will point to the intermediate user 
directory, and state will be IN_INTERMEDIATE) and added to the *jobListCache*
# *moveToDone()* is scheduled for this new j1
# *moveToDone()* fails during *moveToDoneNow()* for the history file because 
the source path in the intermediate directory does not exist

From this point on, while the new j1 *HistoryFileInfo* is in the 
*jobListCache*, the JobHistoryServer will think the history file is in the 
intermediate directory. If a user queries this job in the JobHistoryServer UI, 
they will get

{code}
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Could not load history 
file 
://:/mr-history/intermediate//job_1529348381246_27275711-1535123223269---1535127026668-1-0-SUCCEEDED--1535126980787.jhist
{code}

Noticed this issue while running 2.7.4, but the race condition seems to still 
exist in trunk.

  was:
This is the race condition that can occur:

# during the first *scanIntermediateDirectory()*, 
*HistoryFileInfo.moveToDone()* is scheduled for job j1
# during the second *scanIntermediateDirectory()*, j1 is found again and put in 
the *fileStatusList* to process
# *HistoryFileInfo.moveToDone()* is processed in another thread and history 
files are moved to the finished directory
# the *HistoryFileInfo* for j1 is removed from *jobListCache*
# the j1 in *fileStatusList* is processed and a new *HistoryFileInfo* for j1 is 
created (history, conf, and summary files will point to the intermediate user 
directory, and state will be IN_INTERMEDIATE)
# *moveToDone()* is scheduled for this new j1
# *moveToDone()* fails during *moveToDoneNow()* for the history file because 
the source path in the intermediate directory does not exist

From this point on, while the new j1 *HistoryFileInfo* is in the 
*jobListCache*, the JobHistoryServer will think the history file is in the 
intermediate directory. If a user queries this job in the JobHistoryServer UI, 
they will get

{code}
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Could not load history 
file 
://:/mr-history/intermediate//job_1529348381246_27275711-1535123223269---1535127026668-1-0-SUCCEEDED--1535126980787.jhist
{code}

Noticed this issue while running 2.7.4, but the race condition seems to still 
exist in trunk.



[jira] [Commented] (MAPREDUCE-7131) Job History Server has race condition where it moves files from intermediate to finished but thinks file is in intermediate

2018-08-28 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595181#comment-16595181
 ] 

Erik Krogen commented on MAPREDUCE-7131:


[~pbacsko], we are seeing the issue in 2.7.4, and MAPREDUCE-7015 only goes back 
as far as 2.10, so it should not be the cause.




[jira] [Commented] (MAPREDUCE-7131) Job History Server has race condition where it moves files from intermediate to finished but thinks file is in intermediate

2018-08-28 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594793#comment-16594793
 ] 

Peter Bacsko commented on MAPREDUCE-7131:
-

Hi guys, I made some threading-related changes (ironically, to mitigate a race) 
in JHS back in January: MAPREDUCE-7015. Could that be related?




[jira] [Commented] (MAPREDUCE-7131) Job History Server has race condition where it moves files from intermediate to finished but thinks file is in intermediate

2018-08-28 Thread Varun Saxena (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594688#comment-16594688
 ] 

Varun Saxena commented on MAPREDUCE-7131:
-

[~erwaman], I added you to the list of contributors and assigned the JIRA to you.




[jira] [Assigned] (MAPREDUCE-7131) Job History Server has race condition where it moves files from intermediate to finished but thinks file is in intermediate

2018-08-28 Thread Varun Saxena (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned MAPREDUCE-7131:
---

Assignee: Anthony Hsu




[jira] [Created] (MAPREDUCE-7131) Job History Server has race condition where it moves files from intermediate to finished but thinks file is in intermediate

2018-08-28 Thread Anthony Hsu (JIRA)
Anthony Hsu created MAPREDUCE-7131:
--

 Summary: Job History Server has race condition where it moves 
files from intermediate to finished but thinks file is in intermediate
 Key: MAPREDUCE-7131
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7131
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.7.4
Reporter: Anthony Hsu


This is the race condition that can occur:

# during the first *scanIntermediateDirectory()*, 
*HistoryFileInfo.moveToDone()* is scheduled for job j1
# during the second *scanIntermediateDirectory()*, j1 is found again and put in 
the *fileStatusList* to process
# *HistoryFileInfo.moveToDone()* is processed in another thread and history 
files are moved to the finished directory
# the *HistoryFileInfo* for j1 is removed from *jobListCache*
# the j1 in *fileStatusList* is processed and a new *HistoryFileInfo* for j1 is 
created (history, conf, and summary files will point to the intermediate user 
directory, and state will be IN_INTERMEDIATE)
# *moveToDone()* is scheduled for this new j1
# *moveToDone()* fails during *moveToDoneNow()* for the history file because 
the source path in the intermediate directory does not exist

From this point on, while the new j1 *HistoryFileInfo* is in the 
*jobListCache*, the JobHistoryServer will think the history file is in the 
intermediate directory. If a user queries this job in the JobHistoryServer UI, 
they will get

{code}
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Could not load history 
file 
://:/mr-history/intermediate//job_1529348381246_27275711-1535123223269---1535127026668-1-0-SUCCEEDED--1535126980787.jhist
{code}

Noticed this issue while running 2.7.4, but the race condition seems to still 
exist in trunk.
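The interleaving above can be replayed deterministically in a single thread to show how the cache ends up disagreeing with the filesystem. This is a hypothetical model: `RaceReplay`, `Info`, and the two `Set`s standing in for the intermediate and done directories are simplifications invented for illustration, not JobHistoryServer classes, and the eviction in step 4 is modeled as a plain `remove`.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Single-threaded replay of the interleaving described in MAPREDUCE-7131.
 * All names are hypothetical simplifications, not JHS classes.
 */
final class RaceReplay {
  enum State { IN_INTERMEDIATE, IN_DONE }

  /** Stand-in for HistoryFileInfo; new entries start in the intermediate dir. */
  static final class Info {
    State state = State.IN_INTERMEDIATE;
  }

  final Set<String> intermediateDir = new HashSet<>();
  final Set<String> doneDir = new HashSet<>();
  final Map<String, Info> jobListCache = new HashMap<>();

  /** Stand-in for moveToDone(): moves the file and marks the entry as done. */
  boolean moveToDone(String job, Info info) {
    if (!intermediateDir.remove(job)) {
      return false; // source path missing: the failure in the last step
    }
    doneDir.add(job);
    info.state = State.IN_DONE;
    return true;
  }

  Info replay(String job) {
    intermediateDir.add(job);
    // First scan: create an entry and schedule moveToDone() for it.
    Info first = new Info();
    jobListCache.put(job, first);
    // Second scan: the same file is still present, so it is queued again.
    boolean queuedAgain = intermediateDir.contains(job);
    // The scheduled move runs "on another thread" and succeeds.
    moveToDone(job, first);
    // The original entry is removed from the cache.
    jobListCache.remove(job);
    // The queued file status is processed: a new entry claims the files are
    // still in the intermediate directory, and its move fails because the
    // source no longer exists, leaving the entry stuck in IN_INTERMEDIATE.
    if (queuedAgain) {
      Info second = new Info();
      jobListCache.put(job, second);
      moveToDone(job, second); // returns false; state is never updated
    }
    return jobListCache.get(job);
  }
}
```

After `replay("j1")`, the cached entry reports `IN_INTERMEDIATE` while the file exists only in the done set, which is the inconsistent state that leads to the "Could not load history file" error above.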


