[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059858#comment-15059858
 ] 

Hudson commented on MAPREDUCE-6436:
---

ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #698 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/698/])
Update CHANGES.txt to move MAPREDUCE-6436 from YARN to MAPREDUCE (zxu: rev 
7092d47fc0b3b792dd31f967c01d460dc089f60b)
* hadoop-yarn-project/CHANGES.txt
* hadoop-mapreduce-project/CHANGES.txt


> JobHistory cache issue
> --
>
> Key: MAPREDUCE-6436
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Ryu Kobayashi
> Assignee: Kai Sasaki
> Priority: Blocker
> Fix For: 2.8.0, 2.7.3, 2.6.4
>
> Attachments: MAPREDUCE-6436.1.patch, MAPREDUCE-6436.2.patch, 
> MAPREDUCE-6436.3.patch, MAPREDUCE-6436.4.patch, stacktrace1.txt, 
> stacktrace2.txt, stacktrace3.txt
>
>
> Problem:
> HistoryFileManager.addIfAbsent produces a large amount of log output when the
> number of cached entries younger than mapreduce.jobhistory.max-age-ms far
> exceeds mapreduce.jobhistory.joblist.cache.size.
> Example:
> If the cache contains 5 entries in total and 10,000 entries newer than
> mapreduce.jobhistory.max-age-ms, and mapreduce.jobhistory.joblist.cache.size
> is 2, the HistoryFileManager.addIfAbsent method produces 5 - 2 = 3 lines of
> the message "Waiting to remove <key> from JobListCache because it is not in
> done yet".
> It will attach a stacktrace.
> Impact:
> In addition to consuming a large amount of disk space, this issue blocks
> JobHistory.getJob for a long time and slows job execution down significantly,
> because getJob is called by RPCs such as
> HistoryClientService.HSClientProtocolHandler.getJobReport.
> This happens because HistoryFileManager.UserLogDir.scanIfNeeded eventually
> calls HistoryFileManager.addIfAbsent in a synchronized block. When multiple
> threads call scanIfNeeded simultaneously, one of them acquires the lock and
> the others are blocked until the first thread completes its long-running
> HistoryFileManager.addIfAbsent call.
> Solution:
> * Reduce the amount of logging so that HistoryFileManager.addIfAbsent doesn't
>   take too long.
> * Good to have if possible: HistoryFileManager.UserLogDir.scanIfNeeded skips
>   scanning if another thread is already scanning. This changes the semantics
>   of some HistoryFileManager methods (such as getAllFileInfo and getFileInfo)
>   because scanIfNeeded may keep outdated state.
> * Good to have if possible: Make scanIfNeeded asynchronous so that RPC calls
>   are not blocked by a loop at the scale of tens of thousands.
>
> This patch implemented the first item.
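
The first solution item can be illustrated with a minimal Java sketch. It assumes a simplified stand-in for the cache and is not the committed patch: `EvictionLogSketch`, `Entry`, `evict`, and the wording of the summarized message are hypothetical names chosen for illustration. The loop inspects only the entries in excess of the size limit, matching the example in the description, and either logs one warning per entry that cannot be evicted yet or counts them and logs once.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical illustration only; not taken from HistoryFileManager. */
final class EvictionLogSketch {

  /** Stand-in for a cached history entry. */
  static final class Entry {
    final String key;
    final boolean movedToDone; // false: still waiting to be moved to the done directory

    Entry(String key, boolean movedToDone) {
      this.key = key;
      this.movedToDone = movedToDone;
    }
  }

  /**
   * Walks the oldest entries in excess of maxSize. Entries already in done are
   * evicted; the rest are either logged one by one (summarize == false) or
   * counted and reported in a single line (summarize == true).
   * Returns the log lines that would have been written.
   */
  static List<String> evict(List<Entry> cache, int maxSize, boolean summarize) {
    List<String> log = new ArrayList<>();
    int excess = cache.size() - maxSize;
    int waiting = 0;
    Iterator<Entry> it = cache.iterator();
    for (int i = 0; i < excess && it.hasNext(); i++) {
      Entry e = it.next();
      if (e.movedToDone) {
        it.remove();
      } else if (summarize) {
        waiting++;
      } else {
        log.add("Waiting to remove " + e.key
            + " from JobListCache because it is not in done yet");
      }
    }
    if (waiting > 0) {
      log.add("Waiting to remove " + waiting
          + " entries from JobListCache because they are not in done yet");
    }
    return log;
  }

  public static void main(String[] args) {
    List<Entry> cache = new ArrayList<>();
    for (int i = 0; i < 5; i++) {
      cache.add(new Entry("job_" + i, false));
    }
    System.out.println("per-entry lines: " + evict(cache, 2, false).size());
    System.out.println("summarized lines: " + evict(cache, 2, true).size());
  }
}
```

With per-entry logging the output grows with the backlog of entries that cannot be evicted, while the summarized variant stays at one line per call, which keeps the time spent inside the synchronized block short.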



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-15 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057625#comment-15057625
 ] 

zhihai xu commented on MAPREDUCE-6436:
--

Thanks [~djp] and [~lewuathe]! I changed the priority to Blocker so that more 
people notice this potential performance issue.
+1 for the latest patch. I will commit it shortly.

> JobHistory cache issue
> --
>
> Key: MAPREDUCE-6436
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Ryu Kobayashi
> Assignee: Kai Sasaki
> Attachments: MAPREDUCE-6436.1.patch, MAPREDUCE-6436.2.patch, 
> MAPREDUCE-6436.3.patch, MAPREDUCE-6436.4.patch, stacktrace1.txt, 
> stacktrace2.txt, stacktrace3.txt


[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-15 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057703#comment-15057703
 ] 

zhihai xu commented on MAPREDUCE-6436:
--

Committed it to trunk, branch-2, branch-2.6 and branch-2.7! Thanks [~lewuathe] 
for the contributions! Thanks [~djp] for the additional review!



[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057727#comment-15057727
 ] 

Hudson commented on MAPREDUCE-6436:
---

FAILURE: Integrated in Hadoop-trunk-Commit #8968 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8968/])
MAPREDUCE-6436. JobHistory cache issue. Contributed by Kai Sasaki (zxu: rev 
5b7078d06921893200163a3d29c8901c3c0107cb)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java
* hadoop-yarn-project/CHANGES.txt




[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-15 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058917#comment-15058917
 ] 

zhihai xu commented on MAPREDUCE-6436:
--

Thanks for catching that, [~aw]. I just learned that branch-2.8 has been cut. 
I will commit it to branch-2.8 shortly.



[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-15 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058744#comment-15058744
 ] 

Allen Wittenauer commented on MAPREDUCE-6436:
-

bq. Committed it to trunk, branch-2, branch-2.6 and branch-2.7! 

You missed branch-2.8...



[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-15 Thread Kai Sasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059254#comment-15059254
 ] 

Kai Sasaki commented on MAPREDUCE-6436:
---

[~zxu] Thank you so much!

Do we need to create another JIRA as a follow-up on optimizing the 
JobHistoryServer RPC, to implement the item below?
{quote}
Good to have if possible: Make scanIfNeeded asynchronous so that RPC calls are
not blocked by a loop at scale of tens of thousands.
{quote}
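
A minimal Java sketch of what an asynchronous scan could look like, under stated assumptions: `AsyncScanSketch`, `requestScan`, `lookup`, and the `directoryScan` supplier are hypothetical names and do not exist in HistoryFileManager. A single background thread refreshes a concurrent map, and lookups read the map without waiting for the scan. The trade-off is that a lookup may miss a job that a blocking scan would have found.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/** Hypothetical illustration only; not part of the Hadoop code base. */
final class AsyncScanSketch implements AutoCloseable {
  private final ExecutorService scanner = Executors.newSingleThreadExecutor();
  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private final AtomicBoolean scanning = new AtomicBoolean(false);
  private final Supplier<Map<String, String>> directoryScan;

  /** directoryScan stands in for the expensive directory listing. */
  AsyncScanSketch(Supplier<Map<String, String>> directoryScan) {
    this.directoryScan = directoryScan;
  }

  /** Starts a background scan unless one is already running; never blocks the caller. */
  Future<?> requestScan() {
    if (!scanning.compareAndSet(false, true)) {
      // A scan is already in flight, so do not queue another one.
      return CompletableFuture.completedFuture(null);
    }
    return scanner.submit(() -> {
      try {
        cache.putAll(directoryScan.get());
      } finally {
        scanning.set(false);
      }
    });
  }

  /** Returns whatever the last completed scan found; may be stale or null. */
  String lookup(String jobId) {
    return cache.get(jobId);
  }

  @Override
  public void close() {
    scanner.shutdown();
  }
}
```

An RPC handler built this way would call `requestScan()` and then `lookup(jobId)` immediately, accepting that the answer reflects the previous scan.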



[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059548#comment-15059548
 ] 

Hudson commented on MAPREDUCE-6436:
---

FAILURE: Integrated in Hadoop-trunk-Commit #8973 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8973/])
Update CHANGES.txt to move MAPREDUCE-6436 from YARN to MAPREDUCE (zxu: rev 
7092d47fc0b3b792dd31f967c01d460dc089f60b)
* hadoop-yarn-project/CHANGES.txt
* hadoop-mapreduce-project/CHANGES.txt




[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-15 Thread Kai Sasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059386#comment-15059386
 ] 

Kai Sasaki commented on MAPREDUCE-6436:
---

[~zxu] Thanks for clarifying. I created another JIRA, MAPREDUCE-6573, for 
reducing unnecessary calls to scanIntermediateDirectory.
Please discuss it on that JIRA. If it is not necessary, it can be closed. Thanks 
anyway.



[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058061#comment-15058061
 ] 

Hudson commented on MAPREDUCE-6436:
---

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #694 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/694/])
MAPREDUCE-6436. JobHistory cache issue. Contributed by Kai Sasaki (zxu: rev 
5b7078d06921893200163a3d29c8901c3c0107cb)
* hadoop-yarn-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java




[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-15 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059349#comment-15059349
 ] 

zhihai xu commented on MAPREDUCE-6436:
--

Thanks [~lewuathe] for the suggestion! There is a task, 
{{MoveIntermediateToDoneRunnable}}, which calls scanIntermediateDirectory 
periodically, so most of the time the job will be found in the cache 
{{jobListCache}}. 
Also, making scanIfNeeded asynchronous may change the behavior of RPC calls: 
they may fail to find job information that could be found before. I am 
thinking of another way to improve performance, which reduces the number of 
calls to scanIntermediateDirectory:
in getFileInfo, also call scanOldDirsForJob before scanIntermediateDirectory, 
which means calling scanOldDirsForJob twice: once before 
scanIntermediateDirectory and once after it.
{code}
  public HistoryFileInfo getFileInfo(JobId jobId) throws IOException {
    // FileInfo available in cache.
    HistoryFileInfo fileInfo = jobListCache.get(jobId);
    if (fileInfo != null) {
      return fileInfo;
    }
    // call scanOldDirsForJob before scanIntermediateDirectory
    fileInfo = scanOldDirsForJob(jobId);
    if (fileInfo != null) {
      return fileInfo;
    }

    // OK so scan the intermediate to be sure we did not lose it that way
    scanIntermediateDirectory();
    fileInfo = jobListCache.get(jobId);
    if (fileInfo != null) {
      return fileInfo;
    }

    // Intermediate directory does not contain job. Search through older ones.
    fileInfo = scanOldDirsForJob(jobId);
    if (fileInfo != null) {
      return fileInfo;
    }
    return null;
  }
{code}
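
The effect of the proposed lookup order can be checked with a small self-contained Java sketch. It uses in-memory maps as stand-ins for the cache, the done directories, and the intermediate directory; `LookupOrderSketch` and its fields are hypothetical and only mirror the control flow above. The counter shows that a job already in the done directories is returned without triggering an intermediate-directory scan.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; the maps stand in for directories on the file system. */
final class LookupOrderSketch {
  final Map<String, String> jobListCache = new HashMap<>();
  final Map<String, String> doneDirs = new HashMap<>();
  final Map<String, String> intermediateDir = new HashMap<>();
  int intermediateScans = 0;

  /** Stand-in for scanOldDirsForJob: a direct lookup in the done directories. */
  String scanOldDirsForJob(String jobId) {
    return doneDirs.get(jobId);
  }

  /** Stand-in for scanIntermediateDirectory: loads everything into the cache. */
  void scanIntermediateDirectory() {
    intermediateScans++;
    jobListCache.putAll(intermediateDir);
  }

  /** Same order as the proposal: cache, done dirs, intermediate scan, done dirs again. */
  String getFileInfo(String jobId) {
    String info = jobListCache.get(jobId);
    if (info != null) {
      return info;
    }
    info = scanOldDirsForJob(jobId);
    if (info != null) {
      return info;
    }
    scanIntermediateDirectory();
    info = jobListCache.get(jobId);
    if (info != null) {
      return info;
    }
    return scanOldDirsForJob(jobId);
  }

  public static void main(String[] args) {
    LookupOrderSketch hfm = new LookupOrderSketch();
    hfm.doneDirs.put("job_1", "done/job_1.jhist");
    hfm.intermediateDir.put("job_2", "intermediate/job_2.jhist");
    System.out.println(hfm.getFileInfo("job_1") + " scans=" + hfm.intermediateScans);
    System.out.println(hfm.getFileInfo("job_2") + " scans=" + hfm.intermediateScans);
  }
}
```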




[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-15 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059594#comment-15059594
 ] 

zhihai xu commented on MAPREDUCE-6436:
--

Just committed it to branch-2.8!

> JobHistory cache issue
> --
>
> Key: MAPREDUCE-6436
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Ryu Kobayashi
>Assignee: Kai Sasaki
>Priority: Blocker
> Fix For: 2.8.0, 2.7.3, 2.6.4
>
> Attachments: MAPREDUCE-6436.1.patch, MAPREDUCE-6436.2.patch, 
> MAPREDUCE-6436.3.patch, MAPREDUCE-6436.4.patch, stacktrace1.txt, 
> stacktrace2.txt, stacktrace3.txt


[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057236#comment-15057236
 ] 

Hadoop QA commented on MAPREDUCE-6436:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 7m 28s | trunk passed |
| +1 | compile | 0m 15s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 0m 18s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 9s | trunk passed |
| +1 | mvnsite | 0m 24s | trunk passed |
| +1 | mvneclipse | 0m 14s | trunk passed |
| +1 | findbugs | 0m 34s | trunk passed |
| +1 | javadoc | 0m 13s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 0m 16s | trunk passed with JDK v1.7.0_91 |
| +1 | mvninstall | 0m 21s | the patch passed |
| +1 | compile | 0m 15s | the patch passed with JDK v1.8.0_66 |
| +1 | javac | 0m 15s | the patch passed |
| +1 | compile | 0m 18s | the patch passed with JDK v1.7.0_91 |
| +1 | javac | 0m 18s | the patch passed |
| +1 | checkstyle | 0m 8s | the patch passed |
| +1 | mvnsite | 0m 24s | the patch passed |
| +1 | mvneclipse | 0m 13s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 0m 42s | the patch passed |
| +1 | javadoc | 0m 12s | the patch passed with JDK v1.8.0_66 |
| +1 | javadoc | 0m 16s | the patch passed with JDK v1.7.0_91 |
| -1 | unit | 5m 32s | hadoop-mapreduce-client-hs in the patch failed with JDK v1.8.0_66. |
| +1 | unit | 5m 58s | hadoop-mapreduce-client-hs in the patch passed with JDK v1.7.0_91. |
| -1 | asflicense | 0m 20s | Patch generated 14 ASF License warnings. |
| | | 25m 38s | |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.mapreduce.v2.hs.TestHistoryFileManager |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12777649/MAPREDUCE-6436.4.patch |
| JIRA Issue | MAPREDUCE-6436 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux ca50a68ad968 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 

[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-14 Thread Kai Sasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057241#comment-15057241
 ] 

Kai Sasaki commented on MAPREDUCE-6436:
---

[~djp] Yes, as described above, a slow scanIfNeeded slows down 
HistoryClientService.HSClientProtocolHandler.getJobReport, which is called 
from the job client. In some cases this causes a performance issue for the job. 
Usually, however, the job is returned from the JobListCache retained by 
HistoryFileManager, and in that case scanIntermediateDirectory is not required. 
So we cannot say that the performance issue occurs immediately whenever there 
are a lot of failed and pending job logs in the intermediate directory.
I'm not sure whether we should set the JIRA as a blocker, though...
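
The cache-first behavior described in this comment can be sketched roughly as below. This is only an illustration under assumptions, not the actual HistoryFileManager code: the class CacheFirstLookupSketch, its two maps, and the scan counter are hypothetical stand-ins, and the real lookup path may involve further steps. The point is that the expensive scan runs only on a cache miss.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the lookup path; not the real HistoryFileManager. */
final class CacheFirstLookupSketch {
  // Assumed in-memory cache of job id -> file info (plays the role of JobListCache).
  private final Map<String, String> jobListCache = new ConcurrentHashMap<>();
  // Assumed stand-in for the contents of the intermediate directory.
  private final Map<String, String> intermediateDir = new ConcurrentHashMap<>();
  private final AtomicInteger scanCount = new AtomicInteger();

  /** Simulates a job history file appearing in the intermediate directory. */
  void addToIntermediateDir(String jobId, String fileInfo) {
    intermediateDir.put(jobId, fileInfo);
  }

  /** Returns the cached entry if present; scans only on a cache miss. */
  String getFileInfo(String jobId) {
    String cached = jobListCache.get(jobId);
    if (cached != null) {
      return cached; // cache hit: the expensive scan is skipped
    }
    scanIntermediateDirectory();
    return jobListCache.get(jobId); // still null if the job is unknown
  }

  /** Stand-in for the expensive scan: copies unseen entries into the cache. */
  private void scanIntermediateDirectory() {
    scanCount.incrementAndGet();
    for (Map.Entry<String, String> e : intermediateDir.entrySet()) {
      jobListCache.putIfAbsent(e.getKey(), e.getValue());
    }
  }

  /** Number of scans performed so far. */
  int scanCount() {
    return scanCount.get();
  }
}
```

In this sketch a second lookup of the same job does not trigger another scan, which matches the observation that many pending files in the intermediate directory do not hurt every request.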

> JobHistory cache issue
> --
>
> Key: MAPREDUCE-6436
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ryu Kobayashi
>Assignee: Kai Sasaki
> Attachments: MAPREDUCE-6436.1.patch, MAPREDUCE-6436.2.patch, 
> MAPREDUCE-6436.3.patch, MAPREDUCE-6436.4.patch, stacktrace1.txt, 
> stacktrace2.txt, stacktrace3.txt


[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-14 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056154#comment-15056154
 ] 

Junping Du commented on MAPREDUCE-6436:
---

I think the impact of this issue could be more severe than our description 
above: "In addition to large disk consumption, this issue blocks 
JobHistory.getJob() long time and slows job execution down significantly 
because getJob is called by RPC such as 
HistoryClientService.HSClientProtocolHandler.getJobReport. This impact happens 
because HistoryFileManager.UserLogDir.scanIfNeeded eventually calls 
HistoryFileManager.addIfAbsent in a synchronized block. When multiple threads 
call scanIfNeeded simultaneously, one of them acquires lock and the other 
threads are blocked until the first thread completes long-running 
HistoryFileManager.addIfAbsent call. "
It could cause a serious OOM in the JHS, because REST calls to getJobs() could 
get blocked behind some getJob() call while unexpectedly caching other 
completedJob() instances from previous calls. Isn't it? [~zxu] and [~lewuathe], 
maybe we should set this JIRA as a blocker for 2.6.4 and 2.7.3?
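
The blocking quoted above, and the "skip scanning if another thread is already scanning" idea from the issue description, can be sketched as follows. This is a minimal illustration under assumptions, not the JobHistoryServer code: the class SkipIfScanningSketch, its scanLock field, and the Runnable parameter are hypothetical. Only java.util.concurrent.locks.ReentrantLock and its tryLock() method are real APIs.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: skip a scan instead of waiting behind another thread. */
final class SkipIfScanningSketch {
  private final ReentrantLock scanLock = new ReentrantLock();
  private final AtomicInteger scansRun = new AtomicInteger();

  /**
   * Runs the scan unless another thread currently holds the lock.
   * Returns true if this call performed the scan, false if it was skipped.
   */
  boolean scanIfNeeded(Runnable scan) {
    if (!scanLock.tryLock()) {
      // Another thread is scanning: return at once rather than block.
      // The caller may then observe slightly outdated state.
      return false;
    }
    try {
      scan.run();
      scansRun.incrementAndGet();
      return true;
    } finally {
      scanLock.unlock();
    }
  }

  /** Number of scans that actually ran to completion. */
  int scansRun() {
    return scansRun.get();
  }
}
```

With a plain synchronized block, the second caller would wait for the whole long-running scan. Here it returns immediately, at the cost the issue description already mentions: methods that rely on the scan may see outdated state.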

> JobHistory cache issue
> --
>
> Key: MAPREDUCE-6436
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ryu Kobayashi
>Assignee: Kai Sasaki
> Attachments: MAPREDUCE-6436.1.patch, MAPREDUCE-6436.2.patch, 
> MAPREDUCE-6436.3.patch, stacktrace1.txt, stacktrace2.txt, stacktrace3.txt


[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-13 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055461#comment-15055461
 ] 

zhihai xu commented on MAPREDUCE-6436:
--

Thanks for updating the patch, [~lewuathe]! The new patch looks good except for 
the checkstyle issues.
{code}
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:265: Line is longer than 80 characters (found 97).
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:267: Line is longer than 80 characters (found 102).
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:268: Line is longer than 80 characters (found 118).
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:271: Line is longer than 80 characters (found 94).
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:272: Line is longer than 80 characters (found 114).
{code}
Could you fix the above checkstyle issues?

> JobHistory cache issue
> --
>
> Key: MAPREDUCE-6436
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ryu Kobayashi
>Assignee: Kai Sasaki
> Attachments: MAPREDUCE-6436.1.patch, MAPREDUCE-6436.2.patch, 
> MAPREDUCE-6436.3.patch, stacktrace1.txt, stacktrace2.txt, stacktrace3.txt


[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-13 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055199#comment-15055199
 ] 

zhihai xu commented on MAPREDUCE-6436:
--

[~lewuathe], thanks for working on this issue. About the patch: we don't need 
to calculate the count for the entries being removed.
Can we do all the calculations in the {{else}} section?
{code}
if (firstValue.didMoveFail() &&
    firstValue.jobIndexInfo.getFinishTime() <= cutoff) {
  ...
} else {
  // The entry stays in the cache: count it and remember the first key.
  if (firstValue.didMoveFail()) {
    if (moveFailedCount == 0) {
      firstMoveFailedKey = key;
    }
    moveFailedCount += 1;
  } else {
    if (inIntermediateCount == 0) {
      firstInIntermediateKey = key;
    }
    inIntermediateCount += 1;
  }
}
{code}
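
The counting suggested above can be turned into a small, self-contained sketch that produces one summary line per category instead of one log line per entry. Everything here is hypothetical and only illustrates the idea: the class AggregatedWaitLogSketch, the CachedJob type, and the message wording are made up for this example, and unlike the real loop (which presumably only visits entries beyond the cache size limit) the sketch walks every entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of aggregated logging; not the committed patch. */
final class AggregatedWaitLogSketch {
  /** Assumed minimal cache entry: only the two fields the loop needs. */
  static final class CachedJob {
    final boolean moveFailed;
    final long finishTime;

    CachedJob(boolean moveFailed, long finishTime) {
      this.moveFailed = moveFailed;
      this.finishTime = finishTime;
    }
  }

  /** Returns at most two summary lines instead of one line per entry. */
  static List<String> summarize(Map<String, CachedJob> cache, long cutoff) {
    String firstMoveFailedKey = null;
    String firstInIntermediateKey = null;
    int moveFailedCount = 0;
    int inIntermediateCount = 0;

    for (Map.Entry<String, CachedJob> e : cache.entrySet()) {
      CachedJob job = e.getValue();
      if (job.moveFailed && job.finishTime <= cutoff) {
        continue; // old entry whose move failed: it would be removed, so not counted
      }
      if (job.moveFailed) {
        if (moveFailedCount == 0) {
          firstMoveFailedKey = e.getKey();
        }
        moveFailedCount += 1;
      } else {
        if (inIntermediateCount == 0) {
          firstInIntermediateKey = e.getKey();
        }
        inIntermediateCount += 1;
      }
    }

    List<String> lines = new ArrayList<>();
    if (moveFailedCount > 0) {
      lines.add("Waiting to remove move-failed jobs from JobListCache: count="
          + moveFailedCount + ", first=" + firstMoveFailedKey);
    }
    if (inIntermediateCount > 0) {
      lines.add("Waiting to remove jobs still in intermediate from JobListCache: count="
          + inIntermediateCount + ", first=" + firstInIntermediateKey);
    }
    return lines;
  }
}
```

However many entries are waiting, the caller logs at most two lines, which is the effect the first solution item ("reduce amount of logs") is after.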

> JobHistory cache issue
> --
>
> Key: MAPREDUCE-6436
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ryu Kobayashi
>Assignee: Kai Sasaki
> Attachments: MAPREDUCE-6436.1.patch, MAPREDUCE-6436.2.patch, 
> stacktrace1.txt, stacktrace2.txt, stacktrace3.txt


[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055308#comment-15055308
 ] 

Hadoop QA commented on MAPREDUCE-6436:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 8m 20s | trunk passed |
| +1 | compile | 0m 18s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 0m 19s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 11s | trunk passed |
| +1 | mvnsite | 0m 25s | trunk passed |
| +1 | mvneclipse | 0m 14s | trunk passed |
| +1 | findbugs | 0m 39s | trunk passed |
| +1 | javadoc | 0m 15s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 0m 17s | trunk passed with JDK v1.7.0_91 |
| +1 | mvninstall | 0m 22s | the patch passed |
| +1 | compile | 0m 17s | the patch passed with JDK v1.8.0_66 |
| +1 | javac | 0m 17s | the patch passed |
| +1 | compile | 0m 19s | the patch passed with JDK v1.7.0_91 |
| +1 | javac | 0m 19s | the patch passed |
| -1 | checkstyle | 0m 11s | Patch generated 5 new checkstyle issues in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs (total was 16, now 21). |
| +1 | mvnsite | 0m 25s | the patch passed |
| +1 | mvneclipse | 0m 14s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 0m 47s | the patch passed |
| +1 | javadoc | 0m 15s | the patch passed with JDK v1.8.0_66 |
| +1 | javadoc | 0m 17s | the patch passed with JDK v1.7.0_91 |
| +1 | unit | 5m 55s | hadoop-mapreduce-client-hs in the patch passed with JDK v1.8.0_66. |
| +1 | unit | 6m 12s | hadoop-mapreduce-client-hs in the patch passed with JDK v1.7.0_91. |
| -1 | asflicense | 0m 23s | Patch generated 14 ASF License warnings. |
| | | 27m 42s | |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12777379/MAPREDUCE-6436.3.patch |
| JIRA Issue | MAPREDUCE-6436 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 0098dd90cfb6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 

[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-10 Thread Kai Sasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050335#comment-15050335
 ] 

Kai Sasaki commented on MAPREDUCE-6436:
---

[~zxu] Sorry for bothering you again. Could you review this?

> JobHistory cache issue
> --
>
> Key: MAPREDUCE-6436
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
> Attachments: MAPREDUCE-6436.1.patch, MAPREDUCE-6436.2.patch, 
> stacktrace1.txt, stacktrace2.txt, stacktrace3.txt


[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-01 Thread Kai Sasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035354#comment-15035354
 ] 

Kai Sasaki commented on MAPREDUCE-6436:
---

[~zxu] Hello, Zhihai. I'm sorry for the late response. [~ryu_kobayashi] asked 
me to take over this JIRA, so I'll update the current patch soon. Thank you.

> JobHistory cache issue
> --
>
> Key: MAPREDUCE-6436
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
> Attachments: MAPREDUCE-6436.1.patch, stacktrace1.txt, 
> stacktrace2.txt, stacktrace3.txt


[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035392#comment-15035392
 ] 

Hadoop QA commented on MAPREDUCE-6436:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 8m 4s | trunk passed |
| +1 | compile | 0m 18s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 0m 18s | trunk passed with JDK v1.7.0_85 |
| +1 | checkstyle | 0m 10s | trunk passed |
| +1 | mvnsite | 0m 24s | trunk passed |
| +1 | mvneclipse | 0m 14s | trunk passed |
| +1 | findbugs | 0m 37s | trunk passed |
| +1 | javadoc | 0m 13s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 0m 17s | trunk passed with JDK v1.7.0_85 |
| +1 | mvninstall | 0m 22s | the patch passed |
| +1 | compile | 0m 17s | the patch passed with JDK v1.8.0_66 |
| +1 | javac | 0m 17s | the patch passed |
| +1 | compile | 0m 19s | the patch passed with JDK v1.7.0_85 |
| +1 | javac | 0m 19s | the patch passed |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 10s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs 
(total was 16, now 17). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 5m 48s 
{color} | {color:green} hadoop-mapreduce-client-hs in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 12s 
{color} | {color:green} hadoop-mapreduce-client-hs in the patch passed with JDK 
v1.7.0_85. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 22s 
{color} | {color:red} Patch generated 14 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m 12s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12745774/MAPREDUCE-6436.1.patch
 |
| JIRA Issue | MAPREDUCE-6436 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux d5be684a9740 3.13.0-36-lowlatency 

[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035395#comment-15035395
 ] 

Hadoop QA commented on MAPREDUCE-6436:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 8m 44s | trunk passed |
| +1 | compile | 0m 19s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 0m 20s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 10s | trunk passed |
| +1 | mvnsite | 0m 28s | trunk passed |
| +1 | mvneclipse | 0m 14s | trunk passed |
| +1 | findbugs | 0m 39s | trunk passed |
| +1 | javadoc | 0m 15s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 0m 18s | trunk passed with JDK v1.7.0_91 |
| +1 | mvninstall | 0m 23s | the patch passed |
| +1 | compile | 0m 20s | the patch passed with JDK v1.8.0_66 |
| +1 | javac | 0m 20s | the patch passed |
| +1 | compile | 0m 20s | the patch passed with JDK v1.7.0_91 |
| +1 | javac | 0m 20s | the patch passed |
| -1 | checkstyle | 0m 10s | Patch generated 5 new checkstyle issues in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs (total was 16, now 21). |
| +1 | mvnsite | 0m 26s | the patch passed |
| +1 | mvneclipse | 0m 14s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 0m 48s | the patch passed |
| +1 | javadoc | 0m 16s | the patch passed with JDK v1.8.0_66 |
| +1 | javadoc | 0m 17s | the patch passed with JDK v1.7.0_91 |
| +1 | unit | 6m 17s | hadoop-mapreduce-client-hs in the patch passed with JDK v1.8.0_66. |
| +1 | unit | 6m 19s | hadoop-mapreduce-client-hs in the patch passed with JDK v1.7.0_91. |
| -1 | asflicense | 0m 24s | Patch generated 14 ASF License warnings. |
| | | 28m 53s | |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12775224/MAPREDUCE-6436.2.patch |
| JIRA Issue | MAPREDUCE-6436 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 421014cad788 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 

[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-07-18 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632630#comment-14632630
 ] 

zhihai xu commented on MAPREDUCE-6436:
--

Thanks for working on this issue, [~ryu_kobayashi]! It looks like the log is 
only printed for a HistoryFileInfo in state {{IN_INTERMEDIATE}} or 
{{MOVE_FAILED}}. I think most HistoryFileInfo entries should be in state 
{{IN_DONE}}.
bq. HistoryFileManager.addIfAbsent method produces 5 - 2 = 3 lines 
of "Waiting to remove key from JobListCache because it is not in done yet" 
message
The case described above should not normally occur unless HDFS has a 
performance problem that causes {{HistoryFileInfo#moveToDone}} to take a very 
long time. So the direct cause of your issue may be an HDFS performance issue. 
Still, we can improve the logging to print fewer messages.
About your patch: changing {{scanIfNeeded}} to be non-blocking may not be a 
good idea, because the following code in {{HistoryFileManager#getFileInfo}} 
expects {{jobListCache}} to contain the entry for the given job after 
{{scanIntermediateDirectory}} returns, which requires {{scanIfNeeded}} to block.
{code}
// OK so scan the intermediate to be sure we did not lose it that way
scanIntermediateDirectory();
fileInfo = jobListCache.get(jobId);
if (fileInfo != null) {
  return fileInfo;
}
{code}
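To make the concern concrete, here is a minimal sketch, not Hadoop code, of a scan that is skipped when another scan is already running. The class {{NonBlockingScanSketch}}, its maps, and the {{beginScan}}/{{endScan}} helpers are hypothetical names chosen for illustration; an {{AtomicBoolean}} stands in for whatever mechanism a real patch would use. It shows that a lookup following the scan-then-read pattern above can miss an entry that already exists in the intermediate directory.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical model of a skip-if-busy scan; none of these names exist in Hadoop. */
final class NonBlockingScanSketch {
  // Stand-ins for the intermediate directory and the job list cache.
  private final Map<String, String> intermediateDir = new ConcurrentHashMap<>();
  private final Map<String, String> jobListCache = new ConcurrentHashMap<>();
  private final AtomicBoolean scanInProgress = new AtomicBoolean(false);

  void publishToIntermediate(String jobId, String fileInfo) {
    intermediateDir.put(jobId, fileInfo);
  }

  /** Claims the scan; returns false if another caller is already scanning. */
  boolean beginScan() {
    return scanInProgress.compareAndSet(false, true);
  }

  void endScan() {
    scanInProgress.set(false);
  }

  /** Skips the scan instead of waiting when one is already running. */
  boolean scanIfIdle() {
    if (!beginScan()) {
      return false; // caller continues with whatever the cache holds right now
    }
    try {
      jobListCache.putAll(intermediateDir);
      return true;
    } finally {
      endScan();
    }
  }

  /** Same shape as the quoted lookup: scan first, then read the cache. */
  String getFileInfo(String jobId) {
    scanIfIdle();
    return jobListCache.get(jobId);
  }
}
```

If {{beginScan()}} has been called elsewhere and the scan has not finished, {{getFileInfo}} returns {{null}} for a job that is present in the intermediate map, which is the stale-state behavior the comment warns about. A blocking scan avoids this at the cost of making callers wait.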
Also, the implementation of {{scanIfNeeded}} makes sure that 
{{scanIntermediateDirectory(p)}} is called only once for each change of the 
directory's modification time.
{code}
if (modTime != newModTime) {
  Path p = fs.getPath();
  try {
    scanIntermediateDirectory(p);
    //If scanning fails, we will scan again.  We assume the failure is
    // temporary.
    modTime = newModTime;
  } catch (IOException e) {
    LOG.error("Error while trying to scan the directory " + p, e);
  }
} else {
  if (LOG.isDebugEnabled()) {
    LOG.debug("Scan not needed of " + fs.getPath());
  }
}
{code}
So the performance overhead of {{scanIfNeeded}} should not be significant.
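
As an illustration of why waiting threads add little work, the following is a simplified, self-contained model of the guard above. The class {{ModTimeGuardedScanner}}, the {{DirectoryScan}} interface, and the {{scansRun}} counter are hypothetical and only mimic the control flow; they are not the Hadoop signatures.

```java
import java.io.IOException;

/** Hypothetical, simplified model of the modification-time guard. */
final class ModTimeGuardedScanner {
  /** Stand-in for scanIntermediateDirectory(p); not the Hadoop signature. */
  interface DirectoryScan {
    void scan() throws IOException;
  }

  private final DirectoryScan directoryScan;
  private long modTime = 0;
  private int scansRun = 0;

  ModTimeGuardedScanner(DirectoryScan directoryScan) {
    this.directoryScan = directoryScan;
  }

  /** Returns true only if a scan ran and completed for this modification time. */
  synchronized boolean scanIfNeeded(long newModTime) {
    if (modTime == newModTime) {
      return false; // a previous caller already scanned this directory state
    }
    try {
      scansRun++;
      directoryScan.scan();
      // Remembered only on success, so a failed scan is retried by the next caller.
      modTime = newModTime;
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  /** Number of scan attempts, successful or not. */
  synchronized int scansRun() {
    return scansRun;
  }
}
```

A thread that was blocked on the monitor enters after the first scan has stored the new modification time, so it returns without scanning again. The remaining cost is the wait itself, which is why shortening the first caller's scan (for example by logging less) helps every waiting thread.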

We can make a patch that prints fewer log messages. The following log is 
printed for HistoryFileInfo entries in both the {{IN_INTERMEDIATE}} state and 
the {{MOVE_FAILED}} state. Can we add two counters, one for 
{{IN_INTERMEDIATE}} and the other for {{MOVE_FAILED}}?
We can also save the first key for a HistoryFileInfo in state 
{{IN_INTERMEDIATE}} and the first key for one in state {{MOVE_FAILED}}, and 
print these two keys in the log.
{code}
} else {
  LOG.warn("Waiting to remove " + key
      + " from JobListCache because it is not in done yet.");
}
{code}
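
The counter idea could look roughly like the sketch below. It is an assumption-laden illustration rather than the patch itself: {{EvictionLogSummary}}, the local {{HistoryState}} enum, and the message wording are all hypothetical, and the caller would pass the resulting text to a single {{LOG.warn}} call instead of logging once per entry.

```java
/** Hypothetical stand-in for the states discussed in the comment. */
enum HistoryState { IN_INTERMEDIATE, IN_DONE, MOVE_FAILED }

/** Counts entries that cannot be evicted yet and keeps the first key per state. */
final class EvictionLogSummary {
  private int inIntermediateCount;
  private int moveFailedCount;
  private String firstInIntermediateKey;
  private String firstMoveFailedKey;

  /** Called once per cache entry visited by the eviction loop. */
  void record(String key, HistoryState state) {
    switch (state) {
      case IN_INTERMEDIATE:
        if (inIntermediateCount++ == 0) {
          firstInIntermediateKey = key;
        }
        break;
      case MOVE_FAILED:
        if (moveFailedCount++ == 0) {
          firstMoveFailedKey = key;
        }
        break;
      default:
        break; // IN_DONE entries can be removed, so there is nothing to report
    }
  }

  int inIntermediateCount() {
    return inIntermediateCount;
  }

  int moveFailedCount() {
    return moveFailedCount;
  }

  /** At most one line per state; empty when every visited entry was removable. */
  String summary() {
    StringBuilder sb = new StringBuilder();
    appendLine(sb, inIntermediateCount, HistoryState.IN_INTERMEDIATE, firstInIntermediateKey);
    appendLine(sb, moveFailedCount, HistoryState.MOVE_FAILED, firstMoveFailedKey);
    return sb.toString();
  }

  private static void appendLine(StringBuilder sb, int count, HistoryState state,
      String firstKey) {
    if (count == 0) {
      return;
    }
    if (sb.length() > 0) {
      sb.append('\n');
    }
    sb.append("Waiting to remove ").append(count).append(' ').append(state)
        .append(" entries (first key: ").append(firstKey)
        .append(") from JobListCache because they are not in done yet.");
  }
}
```

With this shape the number of log lines no longer grows with the number of entries that are not yet in done: the loop calls {{record}} for each entry and the summary is logged once after the loop, only if it is non-empty.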

 JobHistory cache issue
 --

 Key: MAPREDUCE-6436
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Ryu Kobayashi
Assignee: Ryu Kobayashi
 Attachments: MAPREDUCE-6436.1.patch, stacktrace1.txt, 
 stacktrace2.txt, stacktrace3.txt







[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-07-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630890#comment-14630890
 ] 

Hadoop QA commented on MAPREDUCE-6436:
--

\\
\\
| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | pre-patch | 16m 18s | Pre-patch trunk compilation is healthy. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | javac | 8m 4s | There were no new javac warning messages. |
| +1 | javadoc | 10m 12s | There were no new javadoc warning messages. |
| +1 | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| -1 | checkstyle | 0m 30s | The applied patch generated 1 new checkstyle issues (total was 16, now 17). |
| -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| +1 | install | 1m 25s | mvn install still works. |
| +1 | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| +1 | findbugs | 0m 53s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| +1 | mapreduce tests | 5m 53s | Tests passed in hadoop-mapreduce-client-hs. |
| | | 44m 15s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12745774/MAPREDUCE-6436.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / ee36f4f |
| checkstyle | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5894/artifact/patchprocess/diffcheckstylehadoop-mapreduce-client-hs.txt |
| whitespace | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5894/artifact/patchprocess/whitespace.txt |
| hadoop-mapreduce-client-hs test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5894/artifact/patchprocess/testrun_hadoop-mapreduce-client-hs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5894/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5894/console |


This message was automatically generated.

 JobHistory cache issue
 --

 Key: MAPREDUCE-6436
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Ryu Kobayashi
Assignee: Ryu Kobayashi
 Attachments: MAPREDUCE-6436.1.patch, stacktrace1.txt, 
 stacktrace2.txt, stacktrace3.txt


 Problem: 
 HistoryFileManager.addIfAbsent produces large amount of logs if number of
 cached entries whose age is less than mapreduce.jobhistory.max-age-ms becomes
 larger than mapreduce.jobhistory.joblist.cache.size by far.
 Example:
 For example, if the cache contains 5 entries in total and 10,000 entries
 newer than mapreduce.jobhistory.max-age-ms where
 mapreduce.jobhistory.joblist.cache.size is 2, 
 HistoryFileManager.addIfAbsent
 method produces 5 - 2 = 3 lines of Waiting to remove key from
 JobListCache because it is not in done yet message.
 It will attach a stacktrace.
 Impact:
 In addition to large disk consumption, this issue blocks JobHistory.getJob
 long time and slows job execution down significantly because getJob is called
 by RPC such as HistoryClientService.HSClientProtocolHandler.getJobReport.
 This impact happens because HistoryFileManager.UserLogDir.scanIfNeeded
 eventually calls HistoryFileManager.addIfAbsent in a synchronized block. When
 multiple threads call scanIfNeeded simultaneously, one of them acquires lock
 and the other threads are blocked until the first thread completes 
 long-running
 HistoryFileManager.addIfAbsent call.
 Solution: 
 * Reduce amount of logs so that HistoryFileManager.addIfAbsent doesn't take 
 too long time.
 * Good to have if possible: HistoryFileManager.UserLogDir.scanIfNeeded skips
   scanning if another thread is already scanning. This changes semantics of
   some HistoryFileManager methods (such as getAllFileInfo and getFileInfo)
   because scanIfNeeded keep outdated state.
 * Good to have if possible: Make scanIfNeeded asynchronous so that RPC calls 
 are
   not blocked by a loop at scale of tens of thousands.
  
 This patch implemented