[jira] [Commented] (MAPREDUCE-6674) configure parallel tests for mapreduce-client-jobclient

2017-08-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122837#comment-16122837
 ] 

Hadoop QA commented on MAPREDUCE-6674:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 19m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 140 new or modified 
test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
29s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 
34s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 11m 34s{color} 
| {color:red} root generated 2 new + 1320 unchanged - 54 fixed = 1322 total 
(was 1374) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  2m 
35s{color} | {color:red} root: The patch generated 276 new + 2623 unchanged - 
200 fixed = 2899 total (was 2823) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 8 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 30m 32s{color} 
| {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
14s{color} | {color:green} hadoop-extras in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}104m 57s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.mapred.TestMRCJCFileInputFormat |
| Timed out junit tests | org.apache.hadoop.mapreduce.v2.TestMRJobs |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | MAPREDUCE-6674 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12881380/MR-4980.013.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  xml  findbugs  checkstyle  |
| uname | Linux 1b2ada8be358 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / a32e013 |
| Default Java | 

[jira] [Updated] (MAPREDUCE-6674) configure parallel tests for mapreduce-client-jobclient

2017-08-10 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6674:

Attachment: MR-4980.013.patch

> configure parallel tests for mapreduce-client-jobclient
> ---
>
> Key: MAPREDUCE-6674
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6674
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.0.0-alpha1
>Reporter: Allen Wittenauer
>Priority: Critical
> Attachments: MAPREDUCE-6674.00.patch, MAPREDUCE-6674.01.patch, 
> MR-4980.011.patch, MR-4980.012.patch, MR-4980.013.patch, MR-4980.patch, 
> sftest.000.patch
>
>
> mapreduce-client-jobclient takes almost an hour and a half.  Configuring 
> parallel-tests would greatly reduce this run time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-4980) Parallel test execution of hadoop-mapreduce-client-core

2017-08-10 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122733#comment-16122733
 ] 

Allen Wittenauer commented on MAPREDUCE-4980:
-

A few notes:

* TestMRJobs doesn't actually appear to be a unit test.  It looks more like an 
integration test.
* It currently takes about 18 minutes on my laptop to run.  It's significantly 
longer on Jenkins.
* Most of that time is spent starting up and bringing down clusters.
* Is there any reason for it to do that? i.e., why can't it just bring up one 
cluster, run the tests, then shut it down? (A rough sketch of that approach 
follows below.)
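
A minimal sketch of that approach, assuming a MiniMRYarnCluster shared across the 
whole test class via JUnit lifecycle hooks; the class name and configuration below 
are illustrative, not the actual TestMRJobs code:

{code:java}
// Illustrative only: start one mini cluster for the entire test class instead
// of per test method, since cluster startup/teardown dominates the run time.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.v2.MiniMRYarnCluster;
import org.junit.AfterClass;
import org.junit.BeforeClass;

public class SharedClusterJobTest {
  private static MiniMRYarnCluster cluster;

  @BeforeClass
  public static void startCluster() {
    cluster = new MiniMRYarnCluster(SharedClusterJobTest.class.getName());
    cluster.init(new Configuration());
    cluster.start();
  }

  @AfterClass
  public static void stopCluster() {
    if (cluster != null) {
      cluster.stop();
    }
  }

  // Individual @Test methods would submit jobs against the shared cluster here.
}
{code}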



> Parallel test execution of hadoop-mapreduce-client-core
> ---
>
> Key: MAPREDUCE-4980
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4980
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.0.0-alpha1
>Reporter: Tsuyoshi Ozawa
>Assignee: Andrey Klochkov
> Attachments: MAPREDUCE-4980.010.patch, MAPREDUCE-4980.1.patch, 
> MAPREDUCE-4980--n3.patch, MAPREDUCE-4980--n4.patch, MAPREDUCE-4980--n5.patch, 
> MAPREDUCE-4980--n6.patch, MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n7.patch, 
> MAPREDUCE-4980--n8.patch, MAPREDUCE-4980.patch
>
>
> The Maven Surefire plugin supports a parallel testing feature. By using it, the 
> tests can be run much faster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6674) configure parallel tests for mapreduce-client-jobclient

2017-08-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122597#comment-16122597
 ] 

Hadoop QA commented on MAPREDUCE-6674:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 21m  
6s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 140 new or modified 
test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
16s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m  
3s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 15m  3s{color} 
| {color:red} root generated 2 new + 1320 unchanged - 54 fixed = 1322 total 
(was 1374) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  3m  
9s{color} | {color:red} root: The patch generated 276 new + 2623 unchanged - 
200 fixed = 2899 total (was 2823) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 8 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 35m  8s{color} 
| {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
27s{color} | {color:green} hadoop-extras in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
47s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}122m 54s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestNNBench |
|   | hadoop.mapreduce.TestMapReduceLazyOutput |
| Timed out junit tests | org.apache.hadoop.mapreduce.v2.TestUberAM |
|   | org.apache.hadoop.mapreduce.v2.TestMRJobs |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | MAPREDUCE-6674 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12881314/MR-4980.012.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  xml  findbugs  checkstyle  |
| uname | Linux e236bce585d8 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 
18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)

2017-08-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122507#comment-16122507
 ] 

Hudson commented on MAPREDUCE-6870:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12162 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/12162/])
MAPREDUCE-6870. Add configuration for MR job to finish when all reducers 
(haibochen: rev a32e0138fb63c92902e6613001f38a87c8a41321)
* (edit) 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java
* (edit) 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* (edit) 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* (edit) 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java


> Add configuration for MR job to finish when all reducers are complete (even 
> with unfinished mappers)
> 
>
> Key: MAPREDUCE-6870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.6.1
>Reporter: Zhe Zhang
>Assignee: Peter Bacsko
> Fix For: 3.0.0-beta1
>
> Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, 
> MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, 
> MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get 
> scheduled before all reducers are complete, but those mappers run for a long 
> time, even after all reducers are complete. This could hurt the performance 
> of large MR jobs.
> In some cases, mappers don't have any materialize-able outcome other than 
> providing intermediate data to reducers. In that case, the job owner should 
> have the config option to finish the job once all reducers are complete.
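
For illustration, here is how a job owner might enable the new behavior; the 
property name below is an assumption inferred from this JIRA's summary, so verify 
it against MRJobConfig and mapred-default.xml in the committed patch:

{code:java}
// Hypothetical usage sketch; the property name is an assumption, not quoted from the patch.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class FinishWhenReducersDoneExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Let the job finish as soon as all reducers complete, even if some
    // re-scheduled mappers are still running.
    conf.setBoolean("mapreduce.job.finish-when-all-reducers-done", true);
    Job job = Job.getInstance(conf, "finish-when-reducers-done-example");
    // ... configure mapper/reducer/input/output as usual, then job.waitForCompletion(true)
  }
}
{code}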



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6931) Fix TestDFSIO "Total Throughput" calculation

2017-08-10 Thread Dennis Huo (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122505#comment-16122505
 ] 

Dennis Huo commented on MAPREDUCE-6931:
---

Fair enough, makes sense. I went ahead and removed that line, keeping the 
refactorings otherwise; I also updated my commit message and pull request title 
to reflect the "removal" rather than the "fix" of the line, but it sounds like 
the guidelines are to avoid editing JIRAs in place, so I'll leave that untouched.
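
For illustration, a minimal sketch of the unit-consistent style that refactoring 
aims for; the helper and variable names here are assumptions, not the committed code:

{code:java}
// Illustrative only: convert milliseconds to seconds in one shared place so
// every reported figure really is in MB/s.
public class ThroughputReportSketch {

  static float msToSecs(long millis) {
    return millis / 1000.0f;
  }

  /** sizeMb: total MBytes processed; taskTimeMs: summed task time; execTimeMs: wall time. */
  static String[] resultLines(float sizeMb, long taskTimeMs, long execTimeMs) {
    return new String[] {
        " Total MBytes processed: " + sizeMb,
        "      Throughput mb/sec: " + (sizeMb / msToSecs(taskTimeMs)),
        "     Test exec time sec: " + msToSecs(execTimeMs),
        // The removed "Total Throughput" line divided MB by raw milliseconds,
        // which is why it under-reported the true MB/s by a factor of 1000.
    };
  }
}
{code}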

> Fix TestDFSIO "Total Throughput" calculation
> 
>
> Key: MAPREDUCE-6931
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6931
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: benchmarks, test
>Affects Versions: 2.8.0
>Reporter: Dennis Huo
>Priority: Trivial
>
> The new "Total Throughput" line added in 
> https://issues.apache.org/jira/browse/HDFS-9153 is currently calculated as 
> {{toMB(size) / ((float)execTime)}} and claims to be in units of "MB/s", but 
> {{execTime}} is in milliseconds; thus, the reported number is 1/1000x the 
> actual value:
> {code:java}
> String resultLines[] = {
> "- TestDFSIO - : " + testType,
> "Date & time: " + new Date(System.currentTimeMillis()),
> "Number of files: " + tasks,
> " Total MBytes processed: " + df.format(toMB(size)),
> "  Throughput mb/sec: " + df.format(size * 1000.0 / (time * 
> MEGA)),
> "Total Throughput mb/sec: " + df.format(toMB(size) / 
> ((float)execTime)),
> " Average IO rate mb/sec: " + df.format(med),
> "  IO rate std deviation: " + df.format(stdDev),
> " Test exec time sec: " + df.format((float)execTime / 1000),
> "" };
> {code}
> The different calculated fields can also use toMB and a shared 
> milliseconds-to-seconds conversion to make it easier to keep units consistent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)

2017-08-10 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6870:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-beta1
   Status: Resolved  (was: Patch Available)

Thanks [~pbacsko] for the patch! I have committed it to trunk.

> Add configuration for MR job to finish when all reducers are complete (even 
> with unfinished mappers)
> 
>
> Key: MAPREDUCE-6870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.6.1
>Reporter: Zhe Zhang
>Assignee: Peter Bacsko
> Fix For: 3.0.0-beta1
>
> Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, 
> MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, 
> MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get 
> scheduled before all reducers are complete, but those mappers run for a long 
> time, even after all reducers are complete. This could hurt the performance 
> of large MR jobs.
> In some cases, mappers don't have any materialize-able outcome other than 
> providing intermediate data to reducers. In that case, the job owner should 
> have the config option to finish the job once all reducers are complete.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6674) configure parallel tests for mapreduce-client-jobclient

2017-08-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122417#comment-16122417
 ] 

Hadoop QA commented on MAPREDUCE-6674:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 140 new or modified 
test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
53s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 14m 53s{color} 
| {color:red} root generated 2 new + 1320 unchanged - 54 fixed = 1322 total 
(was 1374) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  3m  
8s{color} | {color:red} root: The patch generated 276 new + 2623 unchanged - 
200 fixed = 2899 total (was 2823) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 8 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
1s{color} | {color:red} The patch has 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 41m 27s{color} 
| {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
49s{color} | {color:green} hadoop-extras in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
43s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}105m 55s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestNNBench |
|   | hadoop.mapred.TestMRTimelineEventHandling |
| Timed out junit tests | org.apache.hadoop.mapreduce.v2.TestUberAM |
|   | org.apache.hadoop.mapreduce.v2.TestMRJobs |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | MAPREDUCE-6674 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12881314/MR-4980.012.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  xml  findbugs  checkstyle  |
| uname | Linux a67d71dc286b 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 
11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (MAPREDUCE-6674) configure parallel tests for mapreduce-client-jobclient

2017-08-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122374#comment-16122374
 ] 

Hadoop QA commented on MAPREDUCE-6674:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  9m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m  
3s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 10m  3s{color} 
| {color:red} root generated 110 new + 1374 unchanged - 0 fixed = 1484 total 
(was 1374) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 10m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m 
19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 13m 12s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 94m 33s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.conf.TestCommonConfigurationFields |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | MAPREDUCE-6674 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12881312/sftest.000.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  xml  |
| uname | Linux bc65a92f0a93 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 312e57b |
| Default Java | 1.8.0_131 |
| javac | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7056/artifact/patchprocess/diff-compile-javac-root.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7056/artifact/patchprocess/patch-unit-root.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7056/testReport/ |
| modules | C: hadoop-project . U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7056/console |
| Powered by | Apache Yetus 0.5.0   http://yetus.apache.org |


This message was automatically generated.



> configure parallel tests for mapreduce-client-jobclient
> ---
>
> Key: MAPREDUCE-6674
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6674
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.0.0-alpha1
>  

[jira] [Commented] (MAPREDUCE-6931) Fix TestDFSIO "Total Throughput" calculation

2017-08-10 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122346#comment-16122346
 ] 

Konstantin Shvachko commented on MAPREDUCE-6931:


[~dennishuo] I agree that the "Total Throughput" metric highly depends on how you 
run the job. That is exactly the point: it makes it a MapReduce metric, not an 
HDFS one. One can go to the Yarn UI and divide HDFS bytes written by the job time 
for any job, but that does not measure the HDFS write operation.
I think we should just remove it.

> Fix TestDFSIO "Total Throughput" calculation
> 
>
> Key: MAPREDUCE-6931
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6931
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: benchmarks, test
>Affects Versions: 2.8.0
>Reporter: Dennis Huo
>Priority: Trivial
>
> The new "Total Throughput" line added in 
> https://issues.apache.org/jira/browse/HDFS-9153 is currently calculated as 
> {{toMB(size) / ((float)execTime)}} and claims to be in units of "MB/s", but 
> {{execTime}} is in milliseconds; thus, the reported number is 1/1000x the 
> actual value:
> {code:java}
> String resultLines[] = {
> "- TestDFSIO - : " + testType,
> "Date & time: " + new Date(System.currentTimeMillis()),
> "Number of files: " + tasks,
> " Total MBytes processed: " + df.format(toMB(size)),
> "  Throughput mb/sec: " + df.format(size * 1000.0 / (time * 
> MEGA)),
> "Total Throughput mb/sec: " + df.format(toMB(size) / 
> ((float)execTime)),
> " Average IO rate mb/sec: " + df.format(med),
> "  IO rate std deviation: " + df.format(stdDev),
> " Test exec time sec: " + df.format((float)execTime / 1000),
> "" };
> {code}
> The different calculated fields can also use toMB and a shared 
> milliseconds-to-seconds conversion to make it easier to keep units consistent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6931) Fix TestDFSIO "Total Throughput" calculation

2017-08-10 Thread Dennis Huo (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122308#comment-16122308
 ] 

Dennis Huo commented on MAPREDUCE-6931:
---

Thanks for the explanation! I have no strong preference about removing the 
particular "Total Throughput" metric, but from my own experience using 
TestDFSIO in the past, I do find that the "average single-stream throughput" 
calculation historically provided by TestDFSIO can itself be somewhat 
misleading in characterizing a cluster since it makes it difficult to infer the 
level of concurrency corresponding to that per-stream performance without 
backing out the numbers manually.

I see the new metric as a useful measure of "Effective Aggregate 
Throughput", all-in, including overhead.

For example, if I use memory settings that only fit 1 container per physical 
machine at a time, my TestDFSIO will trickle through 1 task per machine at a 
time, and those single tasks will have very high single-stream throughput. If I 
instead do memory packing so that every machine runs, say, 64 tasks 
concurrently, then single-stream throughput will suffer significantly, while 
total walltime will decrease significantly. With a walltime-based calculation, 
I can see at a glance the approximate total throughput rating of my cluster 
when everything is running at full throttle; I'd expect increasing concurrency 
to increase aggregate throughput until IO limits are reached, at which point 
aggregate throughput becomes flat w.r.t. increasing concurrency, or declines 
slightly due to thrashing.

This could also be my cloud bias, where it becomes more important to 
characterize a full-blast cluster against a remote filesystem than to care so 
much about per-stream throughputs.

It seems like an "effective aggregate throughput" calculation would help 
encompass the cluster-wide effects of things like optimal CPU oversubscription 
ratios, scheduler settings, speculative execution vs failure rates, etc.

I agree the wording and computation as-is might not be the right fit for this, 
though. I see a few options that might be worthwhile, possibly in some 
combination (a rough sketch of the calculations follows this list):

* Change wording to say "Effective Aggregate Throughput" to more accurately 
describe what the number means
* Add a metric displaying the "time" as "Slot Seconds" or something like that, 
so that the user doesn't have to compute it by dividing "Total MBytes processed" 
by "Throughput mb/sec" explicitly. This also helps clarify that the throughput is 
computed in terms of slot time, not walltime.
* Additionally, maybe provide a measure of "average concurrency" taking total 
slot time divided by walltime. This would legitimately consider scheduler 
overheads; if my whole test only ran 1 task in an hour, and it only had 30 
minutes of slot time, then a concurrency of 0.5 correctly characterizes the 
fact that I'm only squeezing out 0.5 utilization after factoring in delays.
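
A rough sketch of how those three quantities relate; all names and inputs below 
are illustrative assumptions, not an existing TestDFSIO API:

{code:java}
// Illustrative only: the metrics proposed above, computed from totals a
// benchmark run could already collect.
public class AggregateThroughputSketch {

  /** totalMb: MBytes written/read; slotSecs: summed task runtime; wallSecs: job wall time. */
  static void report(double totalMb, double slotSecs, double wallSecs) {
    double perStreamMbSec = totalMb / slotSecs;   // existing slot-time style throughput
    double aggregateMbSec = totalMb / wallSecs;   // "Effective Aggregate Throughput", all-in
    double avgConcurrency = slotSecs / wallSecs;  // e.g. 30 min slot time in a 1 h run -> 0.5

    System.out.printf("   Slot-time throughput mb/sec: %.2f%n", perStreamMbSec);
    System.out.printf("Effective aggregate thpt mb/sec: %.2f%n", aggregateMbSec);
    System.out.printf("            Average concurrency: %.2f%n", avgConcurrency);
  }
}
{code}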


In any case, I'm happy to just delete the one line in place so the refactorings 
can be committed; if you feel it's better not to change/add metrics, or if these 
are better discussed in a follow-up JIRA, let me know.

Re: MAPREDUCE and HDFS, I'll be sure to remember that TestDFSIO goes under HDFS 
in the future. For this one I searched for "TestDFSIO" in JIRA and eyeballed 
that a plurality seemed to be under MAPREDUCE, a smaller fraction in HDFS, and 
the remaining ones in HADOOP. Combined with this code living under the 
hadoop-mapreduce directory, it looked like MAPREDUCE was more correct.

> Fix TestDFSIO "Total Throughput" calculation
> 
>
> Key: MAPREDUCE-6931
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6931
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: benchmarks, test
>Affects Versions: 2.8.0
>Reporter: Dennis Huo
>Priority: Trivial
>
> The new "Total Throughput" line added in 
> https://issues.apache.org/jira/browse/HDFS-9153 is currently calculated as 
> {{toMB(size) / ((float)execTime)}} and claims to be in units of "MB/s", but 
> {{execTime}} is in milliseconds; thus, the reported number is 1/1000x the 
> actual value:
> {code:java}
> String resultLines[] = {
> "- TestDFSIO - : " + testType,
> "Date & time: " + new Date(System.currentTimeMillis()),
> "Number of files: " + tasks,
> " Total MBytes processed: " + df.format(toMB(size)),
> "  Throughput mb/sec: " + df.format(size * 1000.0 / (time * 
> MEGA)),
> "Total Throughput mb/sec: " + df.format(toMB(size) / 
> ((float)execTime)),
> " Average IO rate mb/sec: " + df.format(med),
> "  IO rate std deviation: " + df.format(stdDev),
> " Test exec time sec: " + df.format((float)execTime / 1000),
> "" };
> {code}
> The different calculated 

[jira] [Updated] (MAPREDUCE-6674) configure parallel tests for mapreduce-client-jobclient

2017-08-10 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6674:

Attachment: MR-4980.012.patch

> configure parallel tests for mapreduce-client-jobclient
> ---
>
> Key: MAPREDUCE-6674
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6674
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.0.0-alpha1
>Reporter: Allen Wittenauer
>Priority: Critical
> Attachments: MAPREDUCE-6674.00.patch, MAPREDUCE-6674.01.patch, 
> MR-4980.011.patch, MR-4980.012.patch, MR-4980.patch, sftest.000.patch
>
>
> mapreduce-client-jobclient takes almost an hour and a half.  Configuring 
> parallel-tests would greatly reduce this run time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6674) configure parallel tests for mapreduce-client-jobclient

2017-08-10 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6674:

Attachment: (was: MR-4980.011.patch)

> configure parallel tests for mapreduce-client-jobclient
> ---
>
> Key: MAPREDUCE-6674
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6674
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.0.0-alpha1
>Reporter: Allen Wittenauer
>Priority: Critical
> Attachments: MAPREDUCE-6674.00.patch, MAPREDUCE-6674.01.patch, 
> MR-4980.011.patch, MR-4980.012.patch, MR-4980.patch, sftest.000.patch
>
>
> mapreduce-client-jobclient takes almost an hour and a half.  Configuring 
> parallel-tests would greatly reduce this run time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6674) configure parallel tests for mapreduce-client-jobclient

2017-08-10 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6674:

Attachment: MR-4980.011.patch

> configure parallel tests for mapreduce-client-jobclient
> ---
>
> Key: MAPREDUCE-6674
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6674
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.0.0-alpha1
>Reporter: Allen Wittenauer
>Priority: Critical
> Attachments: MAPREDUCE-6674.00.patch, MAPREDUCE-6674.01.patch, 
> MR-4980.011.patch, MR-4980.011.patch, MR-4980.patch, sftest.000.patch
>
>
> mapreduce-client-jobclient takes almost an hour and a half.  Configuring 
> parallel-tests would greatly reduce this run time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6674) configure parallel tests for mapreduce-client-jobclient

2017-08-10 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6674:

Attachment: sftest.000.patch

> configure parallel tests for mapreduce-client-jobclient
> ---
>
> Key: MAPREDUCE-6674
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6674
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.0.0-alpha1
>Reporter: Allen Wittenauer
>Priority: Critical
> Attachments: MAPREDUCE-6674.00.patch, MAPREDUCE-6674.01.patch, 
> MR-4980.011.patch, MR-4980.patch, sftest.000.patch
>
>
> mapreduce-client-jobclient takes almost an hour and a half.  Configuring 
> parallel-tests would greatly reduce this run time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6936) Remove unnecessary dependency of hadoop-yarn-server-common from hadoop-mapreduce-client-common

2017-08-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122189#comment-16122189
 ] 

Hadoop QA commented on MAPREDUCE-6936:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
50s{color} | {color:green} hadoop-mapreduce-client-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 20m 18s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | MAPREDUCE-6936 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12881306/MAPREDUCE-6936.00.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  xml  |
| uname | Linux 0f96aa13d342 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 
12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 312e57b |
| Default Java | 1.8.0_131 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7055/testReport/ |
| modules | C: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
U: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
|
| Console output | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7055/console |
| Powered by | Apache Yetus 0.5.0   http://yetus.apache.org |


This message was automatically generated.



> Remove unnecessary dependency of hadoop-yarn-server-common from 
> hadoop-mapreduce-client-common 
> ---
>
> Key: MAPREDUCE-6936
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6936
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: MAPREDUCE-6936.00.patch
>
>
> The dependency of hadoop-yarn-server-common in hadoop-mapreduce-client-common 
> seems unnecessary, as 
> it is not using any of the classes from hadoop-yarn-server-common. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, 

[jira] [Updated] (MAPREDUCE-6936) Remove unnecessary dependency of hadoop-yarn-server-common from hadoop-mapreduce-client-common

2017-08-10 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6936:
--
Attachment: MAPREDUCE-6936.00.patch

> Remove unnecessary dependency of hadoop-yarn-server-common from 
> hadoop-mapreduce-client-common 
> ---
>
> Key: MAPREDUCE-6936
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6936
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: MAPREDUCE-6936.00.patch
>
>
> The dependency of hadoop-yarn-server-common in hadoop-mapreduce-client-common 
> seems unnecessary, as 
> it is not using any of the classes from hadoop-yarn-server-common. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6936) Remove unnecessary dependency of hadoop-yarn-server-common from hadoop-mapreduce-client-common

2017-08-10 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6936:
--
Status: Patch Available  (was: Open)

> Remove unnecessary dependency of hadoop-yarn-server-common from 
> hadoop-mapreduce-client-common 
> ---
>
> Key: MAPREDUCE-6936
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6936
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: MAPREDUCE-6936.00.patch
>
>
> The dependency of hadoop-yarn-server-common in hadoop-mapreduce-client-common 
> seems unnecessary, as 
> it is not using any of the classes from hadoop-yarn-server-common. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6936) Remove unnecessary dependency of hadoop-yarn-server-common from hadoop-mapreduce-client-common

2017-08-10 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6936:
--
Description: 
The dependency of hadoop-yarn-server-common in hadoop-mapreduce-client-common 
seems unnecessary, as 
it is not using any of the classes from hadoop-yarn-server-common. 

  was:The dependency of 


> Remove unnecessary dependency of hadoop-yarn-server-common from 
> hadoop-mapreduce-client-common 
> ---
>
> Key: MAPREDUCE-6936
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6936
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> The dependency of hadoop-yarn-server-common in hadoop-mapreduce-client-common 
> seems unnecessary, as 
> it is not using any of the classes from hadoop-yarn-server-common. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-6936) Remove unnecessary dependency of hadoop-yarn-server-common from hadoop-mapreduce-client-common

2017-08-10 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6936:
-

 Summary: Remove unnecessary dependency of 
hadoop-yarn-server-common from hadoop-mapreduce-client-common 
 Key: MAPREDUCE-6936
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6936
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: Haibo Chen
Assignee: Haibo Chen






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6936) Remove unnecessary dependency of hadoop-yarn-server-common from hadoop-mapreduce-client-common

2017-08-10 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6936:
--
Description: The dependency of 

> Remove unnecessary dependency of hadoop-yarn-server-common from 
> hadoop-mapreduce-client-common 
> ---
>
> Key: MAPREDUCE-6936
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6936
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> The dependency of 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (MAPREDUCE-6931) Fix TestDFSIO "Total Throughput" calculation

2017-08-10 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122067#comment-16122067
 ] 

Konstantin Shvachko edited comment on MAPREDUCE-6931 at 8/10/17 6:40 PM:
-

Hey [~dennishuo], thanks for reporting this.
As I mentioned in HDFS-9153 ([in this 
comment|https://issues.apache.org/jira/browse/HDFS-9153?focusedCommentId=16122049=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16122049]), 
the "Total Throughput" line should be removed as a deceiving metric.
Could you please fix this by removing the line?

Also, DFSIO issues should be filed on HDFS jira.


was (Author: shv):
Hey [~dennishuo], thanks for reporting this.
As I mentioned in HDFS-9153 ([in this 
comment|https://issues.apache.org/jira/browse/HDFS-9153?focusedCommentId=16122049=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16122049]), 
the "Total Throughput" line should be removed as a deceiving metric.
Could you please fix this by removing the line?

> Fix TestDFSIO "Total Throughput" calculation
> 
>
> Key: MAPREDUCE-6931
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6931
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: benchmarks, test
>Affects Versions: 2.8.0
>Reporter: Dennis Huo
>Priority: Trivial
>
> The new "Total Throughput" line added in 
> https://issues.apache.org/jira/browse/HDFS-9153 is currently calculated as 
> {{toMB(size) / ((float)execTime)}} and claims to be in units of "MB/s", but 
> {{execTime}} is in milliseconds; thus, the reported number is 1/1000x the 
> actual value:
> {code:java}
> String resultLines[] = {
> "- TestDFSIO - : " + testType,
> "Date & time: " + new Date(System.currentTimeMillis()),
> "Number of files: " + tasks,
> " Total MBytes processed: " + df.format(toMB(size)),
> "  Throughput mb/sec: " + df.format(size * 1000.0 / (time * 
> MEGA)),
> "Total Throughput mb/sec: " + df.format(toMB(size) / 
> ((float)execTime)),
> " Average IO rate mb/sec: " + df.format(med),
> "  IO rate std deviation: " + df.format(stdDev),
> " Test exec time sec: " + df.format((float)execTime / 1000),
> "" };
> {code}
> The different calculated fields can also use toMB and a shared 
> milliseconds-to-seconds conversion to make it easier to keep units consistent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6931) Fix TestDFSIO "Total Throughput" calculation

2017-08-10 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122067#comment-16122067
 ] 

Konstantin Shvachko commented on MAPREDUCE-6931:


Hey [~dennishuo], thanks for reporting this.
As I mentioned in HDFS-9153 ([in this 
comment|https://issues.apache.org/jira/browse/HDFS-9153?focusedCommentId=16122049=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16122049]), 
the "Total Throughput" line should be removed as a deceiving metric.
Could you please fix this by removing the line?

> Fix TestDFSIO "Total Throughput" calculation
> 
>
> Key: MAPREDUCE-6931
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6931
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: benchmarks, test
>Affects Versions: 2.8.0
>Reporter: Dennis Huo
>Priority: Trivial
>
> The new "Total Throughput" line added in 
> https://issues.apache.org/jira/browse/HDFS-9153 is currently calculated as 
> {{toMB(size) / ((float)execTime)}} and claims to be in units of "MB/s", but 
> {{execTime}} is in milliseconds; thus, the reported number is 1/1000x the 
> actual value:
> {code:java}
> String resultLines[] = {
> "- TestDFSIO - : " + testType,
> "Date & time: " + new Date(System.currentTimeMillis()),
> "Number of files: " + tasks,
> " Total MBytes processed: " + df.format(toMB(size)),
> "  Throughput mb/sec: " + df.format(size * 1000.0 / (time * 
> MEGA)),
> "Total Throughput mb/sec: " + df.format(toMB(size) / 
> ((float)execTime)),
> " Average IO rate mb/sec: " + df.format(med),
> "  IO rate std deviation: " + df.format(stdDev),
> " Test exec time sec: " + df.format((float)execTime / 1000),
> "" };
> {code}
> The different calculated fields can also use toMB and a shared 
> milliseconds-to-seconds conversion to make it easier to keep units consistent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6923) Optimize MapReduce Shuffle I/O for small partitions

2017-08-10 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121932#comment-16121932
 ] 

Ravi Prakash commented on MAPREDUCE-6923:
-

bq. I'd say that for readSize == trans, we're in the else block, 
Thanks for pointing that out Robert! :-) Yupp. I agree 

bq. I'll be linking to the results once they're properly published.
Looking forward to it :-)

> Optimize MapReduce Shuffle I/O for small partitions
> ---
>
> Key: MAPREDUCE-6923
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6923
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
> Environment: Observed in Hadoop 2.7.3 and above (judging from the 
> source code of future versions), and Ubuntu 16.04.
>Reporter: Robert Schmidtke
>Assignee: Robert Schmidtke
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: MAPREDUCE-6923.00.patch, MAPREDUCE-6923.01.patch
>
>
> When a job configuration results in small partitions read by each reducer 
> from each mapper (e.g. 65 kilobytes as in my setup: a 
> [TeraSort|https://github.com/apache/hadoop/blob/branch-2.7.3/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraSort.java]
>  of 256 gigabytes using 2048 mappers and reducers each), and setting
> {code:xml}
> <property>
>   <name>mapreduce.shuffle.transferTo.allowed</name>
>   <value>false</value>
> </property>
> {code}
> then the default setting of
> {code:xml}
> <property>
>   <name>mapreduce.shuffle.transfer.buffer.size</name>
>   <value>131072</value>
> </property>
> {code}
> results in almost 100% overhead in reads during shuffle in YARN, because for 
> each 65K needed, 128K are read.
> I propose a fix in 
> [FadvisedFileRegion.java|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedFileRegion.java#L114]
>  as follows:
> {code:java}
> ByteBuffer byteBuffer = ByteBuffer.allocate(Math.min(this.shuffleBufferSize, 
> trans > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) trans));
> {code}
> e.g. 
> [here|https://github.com/apache/hadoop/compare/branch-2.7.3...robert-schmidtke:adaptive-shuffle-buffer].
>  This sets the shuffle buffer size to the minimum value of the shuffle buffer 
> size specified in the configuration (128K by default), and the actual 
> partition size (65K on average in my setup). In my benchmarks this reduced 
> the read overhead in YARN from about 100% (255 additional gigabytes as 
> described above) down to about 18% (an additional 45 gigabytes). The runtime 
> of the job remained the same in my setup.
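
As a rough illustration only (the numbers below simply restate the figures from the description; they are not new measurements), the proposed buffer sizing and the overhead it avoids can be sketched as:

{code:java}
// Sketch of the proposed sizing under the setup described above.
int shuffleBufferSize = 128 * 1024;   // mapreduce.shuffle.transfer.buffer.size default
long trans = 65 * 1024;               // ~average partition size per reducer per mapper

// Proposed allocation: never allocate more than the bytes left to transfer,
// so a read cannot overshoot the partition.
int bufSize = Math.min(shuffleBufferSize,
    trans > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) trans);   // -> 65 KB here

// The old behavior fills a full 128 KB buffer for every ~65 KB actually needed,
// i.e. roughly (128 - 65) / 65 ~= 97% extra reads; over a 256 GB TeraSort shuffle
// that is on the order of the ~255 extra gigabytes quoted above.
{code}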



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)

2017-08-10 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121826#comment-16121826
 ] 

Haibo Chen commented on MAPREDUCE-6870:
---

+1. Will check it in later today.

> Add configuration for MR job to finish when all reducers are complete (even 
> with unfinished mappers)
> 
>
> Key: MAPREDUCE-6870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.6.1
>Reporter: Zhe Zhang
>Assignee: Peter Bacsko
> Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, 
> MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, 
> MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get 
> scheduled before all reducers are complete, but those mappers run for a long 
> time, even after all reducers are complete. This could hurt the performance 
> of large MR jobs.
> In some cases, mappers don't have any materialize-able outcome other than 
> providing intermediate data to reducers. In that case, the job owner should 
> have the config option to finish the job once all reducers are complete.
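
For illustration only, such a job-level switch might look roughly like the following in the job configuration; the property name here is an assumption made for this sketch, not necessarily the one introduced by the attached patches:

{code:xml}
<!-- Hypothetical property name: let the job succeed as soon as every reducer
     has completed, even if some re-launched mappers are still running. -->
<property>
  <name>mapreduce.job.finish-when-all-reducers-done</name>
  <value>true</value>
</property>
{code}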



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-6935) Allow multiple active timeline clients

2017-08-10 Thread Aaron Gresch (JIRA)
Aaron Gresch created MAPREDUCE-6935:
---

 Summary: Allow multiple active timeline clients 
 Key: MAPREDUCE-6935
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6935
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Aaron Gresch


In order to migrate smoothly from timeline service v1 to v2, it would be useful 
to be able to run both services concurrently during a transition period.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)

2017-08-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121585#comment-16121585
 ] 

Hadoop QA commented on MAPREDUCE-6870:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
7s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
7s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} hadoop-mapreduce-project/hadoop-mapreduce-client: 
The patch generated 0 new + 654 unchanged - 5 fixed = 654 total (was 659) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
39s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 
52s{color} | {color:green} hadoop-mapreduce-client-app in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 38m 24s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | MAPREDUCE-6870 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12881202/MAPREDUCE-6870-007.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  xml  |
| uname | Linux 73c1dc9f7ec6 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 
11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 8d953c2 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7054/testReport/ |
| modules | C: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app U: 
hadoop-mapreduce-project/hadoop-mapreduce-client |
| 

[jira] [Commented] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)

2017-08-10 Thread Peter Bacsko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121541#comment-16121541
 ] 

Peter Bacsko commented on MAPREDUCE-6870:
-

Yep, good catch. Uploading v7.

> Add configuration for MR job to finish when all reducers are complete (even 
> with unfinished mappers)
> 
>
> Key: MAPREDUCE-6870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.6.1
>Reporter: Zhe Zhang
>Assignee: Peter Bacsko
> Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, 
> MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, 
> MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get 
> scheduled before all reducers are complete, but those mappers run for a long 
> time, even after all reducers are complete. This could hurt the performance 
> of large MR jobs.
> In some cases, mappers don't have any materialize-able outcome other than 
> providing intermediate data to reducers. In that case, the job owner should 
> have the config option to finish the job once all reducers are complete.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)

2017-08-10 Thread Peter Bacsko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated MAPREDUCE-6870:

Attachment: MAPREDUCE-6870-007.patch

> Add configuration for MR job to finish when all reducers are complete (even 
> with unfinished mappers)
> 
>
> Key: MAPREDUCE-6870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.6.1
>Reporter: Zhe Zhang
>Assignee: Peter Bacsko
> Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, 
> MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, 
> MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get 
> scheduled before all reducers are complete, but those mappers run for a long 
> time, even after all reducers are complete. This could hurt the performance 
> of large MR jobs.
> In some cases, mappers don't have any materialize-able outcome other than 
> providing intermediate data to reducers. In that case, the job owner should 
> have the config option to finish the job once all reducers are complete.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6923) Optimize MapReduce Shuffle I/O for small partitions

2017-08-10 Thread Robert Schmidtke (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121117#comment-16121117
 ] 

Robert Schmidtke commented on MAPREDUCE-6923:
-

Thanks for coming back to my comments.

When I said YARN I indeed meant the NodeManager, sorry for the confusion. 
You're right about the shuffle service; however, it is something I only 
discovered recently, having built my configuration a long time ago without 
exactly knowing what I was doing. I have set these keys as you described.
I'm seeing jar files being loaded in the MapTask and ReduceTask JVMs, 
but there does not seem to be any disk I/O overhead.

In any case, I greatly appreciate all of your effort, and now that things are 
working as expected for me, I can focus on analyzing the numbers and making 
some sense of them. I'll be linking to the results once they're properly 
published.

Cheers
Robert

> Optimize MapReduce Shuffle I/O for small partitions
> ---
>
> Key: MAPREDUCE-6923
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6923
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
> Environment: Observed in Hadoop 2.7.3 and above (judging from the 
> source code of future versions), and Ubuntu 16.04.
>Reporter: Robert Schmidtke
>Assignee: Robert Schmidtke
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: MAPREDUCE-6923.00.patch, MAPREDUCE-6923.01.patch
>
>
> When a job configuration results in small partitions read by each reducer 
> from each mapper (e.g. 65 kilobytes as in my setup: a 
> [TeraSort|https://github.com/apache/hadoop/blob/branch-2.7.3/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraSort.java]
>  of 256 gigabytes using 2048 mappers and reducers each), and setting
> {code:xml}
> <property>
>   <name>mapreduce.shuffle.transferTo.allowed</name>
>   <value>false</value>
> </property>
> {code}
> then the default setting of
> {code:xml}
> <property>
>   <name>mapreduce.shuffle.transfer.buffer.size</name>
>   <value>131072</value>
> </property>
> {code}
> results in almost 100% overhead in reads during shuffle in YARN, because for 
> each 65K needed, 128K are read.
> I propose a fix in 
> [FadvisedFileRegion.java|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedFileRegion.java#L114]
>  as follows:
> {code:java}
> ByteBuffer byteBuffer = ByteBuffer.allocate(Math.min(this.shuffleBufferSize, 
> trans > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) trans));
> {code}
> e.g. 
> [here|https://github.com/apache/hadoop/compare/branch-2.7.3...robert-schmidtke:adaptive-shuffle-buffer].
>  This sets the shuffle buffer size to the minimum value of the shuffle buffer 
> size specified in the configuration (128K by default), and the actual 
> partition size (65K on average in my setup). In my benchmarks this reduced 
> the read overhead in YARN from about 100% (255 additional gigabytes as 
> described above) down to about 18% (an additional 45 gigabytes). The runtime 
> of the job remained the same in my setup.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (MAPREDUCE-6923) Optimize MapReduce Shuffle I/O for small partitions

2017-08-10 Thread Robert Schmidtke (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121109#comment-16121109
 ] 

Robert Schmidtke edited comment on MAPREDUCE-6923 at 8/10/17 6:07 AM:
--

Hi Ravi,

{quote}
When {{shuffleBufferSize <= trans}}, then behavior is exactly the same as old 
code.
{quote}
Yes.

{quote}
if {{readSize == trans}} (i.e. the {{fileChannel.read()}} returned as many 
bytes as I wanted to transfer, {{trans}} is decremented correctly, {{position}} 
is increased correctly and the {{byteBuffer}} is flipped as usual. 
{{byteBuffer}}'s contents are written to {{target}} as usual, {{byteBuffer}} is 
cleared and then hopefully GCed never to be seen again.
{quote}
I'd say that for {{readSize == trans}}, we're in the [else 
block|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedFileRegion.java#L127],
 and thus {{byteBuffer}}'s {{limit()}} is set to {{trans}} (which is the size 
it already has, because we're in the case where {{trans < shuffleBufferSize}}). 
It's correctly positioned to {{0}} as we're done reading, and {{trans}} is 
correctly set to {{0}}. Afterwards, the loop breaks (it can only be one 
iteration here because otherwise {{trans}} would have been larger than 
{{shuffleBufferSize}}), {{byteBuffer}} is written to {{target}} and then 
cleared.

{quote}
if {{readSize < trans}} (almost the same thing as above happens, but in a while 
loop). The only change this patch makes is that the {{byteBuffer}} may be 
smaller than before this patch, but it doesn't matter because its big enough 
for the number of bytes we need to transfer.
{quote}
Now we have the situation you described for the previous case, and I agree with 
your reasoning here.


was (Author: rosch):
Hi Ravi,

{quote}
When {{shuffleBufferSize <= trans}}, then behavior is exactly the same as old 
code.
{quote}
Yes.

{quote}
if {{readSize == trans}} (i.e. the {{fileChannel.read()}} returned as many 
bytes as I wanted to transfer, {{trans}} is decremented correctly, {{position}} 
is increased correctly and the {{byteBuffer}} is flipped as usual. 
{{byteBuffer}}'s contents are written to {{target}} as usual, {{byteBuffer}} is 
cleared and then hopefully GCed never to be seen again.
{quote}
I'd say that for {{readSize == trans}}, we're in the [else 
block|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedFileRegion.java#L127],
 and thus {{byteBuffer}} is {{limit()}}ed to {{trans}} (which is the size it 
already has because we're in the case where {{trans < shuffleBufferSize}}. It's 
correctly positioned to {{0}} as we're done reading, and {{trans}} is correctly 
set to {{0}}. Afterwards, the loop breaks (it can only be one iteration here 
because otherwise {{trans}} would have been larger than {{shuffleBufferSize}}), 
{{byteBuffer}} is written to {{target}} and then cleared.

{quote}
if {{readSize < trans}} (almost the same thing as above happens, but in a while 
loop). The only change this patch makes is that the {{byteBuffer}} may be 
smaller than before this patch, but it doesn't matter because its big enough 
for the number of bytes we need to transfer.
{quote}
Now we have the situation you described for the previous case, and I agree with 
your reasoning here.

> Optimize MapReduce Shuffle I/O for small partitions
> ---
>
> Key: MAPREDUCE-6923
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6923
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
> Environment: Observed in Hadoop 2.7.3 and above (judging from the 
> source code of future versions), and Ubuntu 16.04.
>Reporter: Robert Schmidtke
>Assignee: Robert Schmidtke
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: MAPREDUCE-6923.00.patch, MAPREDUCE-6923.01.patch
>
>
> When a job configuration results in small partitions read by each reducer 
> from each mapper (e.g. 65 kilobytes as in my setup: a 
> [TeraSort|https://github.com/apache/hadoop/blob/branch-2.7.3/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraSort.java]
>  of 256 gigabytes using 2048 mappers and reducers each), and setting
> {code:xml}
> <property>
>   <name>mapreduce.shuffle.transferTo.allowed</name>
>   <value>false</value>
> </property>
> {code}
> then the default setting of
> {code:xml}
> <property>
>   <name>mapreduce.shuffle.transfer.buffer.size</name>
>   <value>131072</value>
> </property>
> {code}
> results in almost 100% overhead in reads during shuffle in YARN, because for 
> each 65K needed, 128K are read.
> I propose a fix in 
> 

[jira] [Comment Edited] (MAPREDUCE-6923) Optimize MapReduce Shuffle I/O for small partitions

2017-08-10 Thread Robert Schmidtke (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121109#comment-16121109
 ] 

Robert Schmidtke edited comment on MAPREDUCE-6923 at 8/10/17 6:06 AM:
--

Hi Ravi,

{quote}
When {{shuffleBufferSize <= trans}}, then behavior is exactly the same as old 
code.
{quote}
Yes.

{quote}
if {{readSize == trans}} (i.e. the {{fileChannel.read()}} returned as many 
bytes as I wanted to transfer, {{trans}} is decremented correctly, {{position}} 
is increased correctly and the {{byteBuffer}} is flipped as usual. 
{{byteBuffer}}'s contents are written to {{target}} as usual, {{byteBuffer}} is 
cleared and then hopefully GCed never to be seen again.
{quote}
I'd say that for {{readSize == trans}}, we're in the [else 
block|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedFileRegion.java#L127],
 and thus {{byteBuffer}} is {{limit()}}ed to {{trans}} (which is the size it 
already has, because we're in the case where {{trans < shuffleBufferSize}}). It's 
correctly positioned to {{0}} as we're done reading, and {{trans}} is correctly 
set to {{0}}. Afterwards, the loop breaks (it can only be one iteration here 
because otherwise {{trans}} would have been larger than {{shuffleBufferSize}}), 
{{byteBuffer}} is written to {{target}} and then cleared.

{quote}
if {{readSize < trans}} (almost the same thing as above happens, but in a while 
loop). The only change this patch makes is that the {{byteBuffer}} may be 
smaller than before this patch, but it doesn't matter because its big enough 
for the number of bytes we need to transfer.
{quote}
Now we have the situation you described for the previous case, and I agree with 
your reasoning here.


was (Author: rosch):
Hi Ravi,

{quote}
When {{shuffleBufferSize <= trans}}, then behavior is exactly the same as old 
code.
{quote}
Yes.

{quote}
if {{readSize == trans}} (i.e. the {{fileChannel.read()}} returned as many 
bytes as I wanted to transfer, {{trans}} is decremented correctly, {{position}} 
is increased correctly and the {{byteBuffer}} is flipped as usual. 
{{byteBuffer}} 's contents are written to {{target}} as usual, {{byteBuffer}} 
is cleared and then hopefully GCed never to be seen again.
{quote}
I'd say that for {{readSize == trans}}, we're in the [else 
block|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedFileRegion.java#L127],
 and thus {{byteBuffer}} is {{limit()}} ed to {{trans}} (which is the size it 
already has because we're in the case where {{trans < shuffleBufferSize}}. It's 
correctly positioned to {{0}} as we're done reading, and {{trans}} is correctly 
set to {{0}}. Afterwards, the loop breaks (it can only be one iteration here 
because otherwise {{trans}} would have been larger than {{shuffleBufferSize}}), 
{{byteBuffer}} is written to {{target}} and then cleared.

{quote}
if {{readSize < trans}} (almost the same thing as above happens, but in a while 
loop). The only change this patch makes is that the {{byteBuffer}} may be 
smaller than before this patch, but it doesn't matter because its big enough 
for the number of bytes we need to transfer.
{quote}
Now we have the situation you described for the previous case, and I agree with 
your reasoning here.

> Optimize MapReduce Shuffle I/O for small partitions
> ---
>
> Key: MAPREDUCE-6923
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6923
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
> Environment: Observed in Hadoop 2.7.3 and above (judging from the 
> source code of future versions), and Ubuntu 16.04.
>Reporter: Robert Schmidtke
>Assignee: Robert Schmidtke
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: MAPREDUCE-6923.00.patch, MAPREDUCE-6923.01.patch
>
>
> When a job configuration results in small partitions read by each reducer 
> from each mapper (e.g. 65 kilobytes as in my setup: a 
> [TeraSort|https://github.com/apache/hadoop/blob/branch-2.7.3/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraSort.java]
>  of 256 gigabytes using 2048 mappers and reducers each), and setting
> {code:xml}
> <property>
>   <name>mapreduce.shuffle.transferTo.allowed</name>
>   <value>false</value>
> </property>
> {code}
> then the default setting of
> {code:xml}
> <property>
>   <name>mapreduce.shuffle.transfer.buffer.size</name>
>   <value>131072</value>
> </property>
> {code}
> results in almost 100% overhead in reads during shuffle in YARN, because for 
> each 65K needed, 128K are read.
> I propose a fix in 
> 

[jira] [Comment Edited] (MAPREDUCE-6923) Optimize MapReduce Shuffle I/O for small partitions

2017-08-10 Thread Robert Schmidtke (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121109#comment-16121109
 ] 

Robert Schmidtke edited comment on MAPREDUCE-6923 at 8/10/17 6:06 AM:
--

Hi Ravi,

{quote}
When {{shuffleBufferSize <= trans}}, then behavior is exactly the same as old 
code.
{quote}
Yes.

{quote}
if {{readSize == trans}} (i.e. the {{fileChannel.read()}} returned as many 
bytes as I wanted to transfer, {{trans}} is decremented correctly, {{position}} 
is increased correctly and the {{byteBuffer}} is flipped as usual. 
{{byteBuffer}}'s contents are written to {{target}} as usual, {{byteBuffer}} is 
cleared and then hopefully GCed never to be seen again.
{quote}
I'd say that for {{readSize == trans}}, we're in the [else 
block|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedFileRegion.java#L127],
 and thus {{byteBuffer}} is {{limit()}}ed to {{trans}} (which is the size it 
already has, because we're in the case where {{trans < shuffleBufferSize}}). It's 
correctly positioned to {{0}} as we're done reading, and {{trans}} is correctly 
set to {{0}}. Afterwards, the loop breaks (it can only be one iteration here 
because otherwise {{trans}} would have been larger than {{shuffleBufferSize}}), 
{{byteBuffer}} is written to {{target}} and then cleared.

{quote}
if {{readSize < trans}} (almost the same thing as above happens, but in a while 
loop). The only change this patch makes is that the {{byteBuffer}} may be 
smaller than before this patch, but it doesn't matter because its big enough 
for the number of bytes we need to transfer.
{quote}
Now we have the situation you described for the previous case, and I agree with 
your reasoning here.


was (Author: rosch):
Hi Ravi,

{quote}
When {{shuffleBufferSize <= trans}}, then behavior is exactly the same as old 
code.
{quote}
Yes.

{quote}
if {{readSize == trans}} (i.e. the {{fileChannel.read()}} returned as many 
bytes as I wanted to transfer, {{trans}} is decremented correctly, {{position}} 
is increased correctly and the {{byteBuffer}} is flipped as usual. 
{{byteBuffer}}'s contents are written to {{target}} as usual, {{byteBuffer}} is 
cleared and then hopefully GCed never to be seen again.
{quote}
I'd say that for {{readSize == trans}}, we're in the [else 
block|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedFileRegion.java#L127],
 and thus {{byteBuffer}} is {{limit()}} ed to {{trans}} (which is the size it 
already has because we're in the case where {{trans < shuffleBufferSize}}. It's 
correctly positioned to {{0}} as we're done reading, and {{trans}} is correctly 
set to {{0}}. Afterwards, the loop breaks (it can only be one iteration here 
because otherwise {{trans}} would have been larger than {{shuffleBufferSize}}), 
{{byteBuffer}} is written to {{target}} and then cleared.

{quote}
if {{readSize < trans}} (almost the same thing as above happens, but in a while 
loop). The only change this patch makes is that the {{byteBuffer}} may be 
smaller than before this patch, but it doesn't matter because its big enough 
for the number of bytes we need to transfer.
{quote}
Now we have the situation you described for the previous case, and I agree with 
your reasoning here.

> Optimize MapReduce Shuffle I/O for small partitions
> ---
>
> Key: MAPREDUCE-6923
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6923
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
> Environment: Observed in Hadoop 2.7.3 and above (judging from the 
> source code of future versions), and Ubuntu 16.04.
>Reporter: Robert Schmidtke
>Assignee: Robert Schmidtke
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: MAPREDUCE-6923.00.patch, MAPREDUCE-6923.01.patch
>
>
> When a job configuration results in small partitions read by each reducer 
> from each mapper (e.g. 65 kilobytes as in my setup: a 
> [TeraSort|https://github.com/apache/hadoop/blob/branch-2.7.3/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraSort.java]
>  of 256 gigabytes using 2048 mappers and reducers each), and setting
> {code:xml}
> <property>
>   <name>mapreduce.shuffle.transferTo.allowed</name>
>   <value>false</value>
> </property>
> {code}
> then the default setting of
> {code:xml}
> <property>
>   <name>mapreduce.shuffle.transfer.buffer.size</name>
>   <value>131072</value>
> </property>
> {code}
> results in almost 100% overhead in reads during shuffle in YARN, because for 
> each 65K needed, 128K are read.
> I propose a fix in 
> 

[jira] [Commented] (MAPREDUCE-6923) Optimize MapReduce Shuffle I/O for small partitions

2017-08-10 Thread Robert Schmidtke (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121109#comment-16121109
 ] 

Robert Schmidtke commented on MAPREDUCE-6923:
-

Hi Ravi,

{quote}
When {{shuffleBufferSize <= trans}}, then behavior is exactly the same as old 
code.
{quote}
Yes.

{quote}
if {{readSize == trans}} (i.e. the {{fileChannel.read()}} returned as many 
bytes as I wanted to transfer, {{trans}} is decremented correctly, {{position}} 
is increased correctly and the {{byteBuffer}} is flipped as usual. 
{{byteBuffer}} 's contents are written to {{target}} as usual, {{byteBuffer}} 
is cleared and then hopefully GCed never to be seen again.
{quote}
I'd say that for {{readSize == trans}}, we're in the [else 
block|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedFileRegion.java#L127],
 and thus {{byteBuffer}} is {{limit()}}ed to {{trans}} (which is the size it 
already has, because we're in the case where {{trans < shuffleBufferSize}}). It's 
correctly positioned to {{0}} as we're done reading, and {{trans}} is correctly 
set to {{0}}. Afterwards, the loop breaks (it can only be one iteration here 
because otherwise {{trans}} would have been larger than {{shuffleBufferSize}}), 
{{byteBuffer}} is written to {{target}} and then cleared.

{quote}
if {{readSize < trans}} (almost the same thing as above happens, but in a while 
loop). The only change this patch makes is that the {{byteBuffer}} may be 
smaller than before this patch, but it doesn't matter because its big enough 
for the number of bytes we need to transfer.
{quote}
Now we have the situation you described for the previous case, and I agree with 
your reasoning here.
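
For readers following the exchange, a simplified paraphrase (abbreviated, not the exact source) of the loop in {{FadvisedFileRegion#customShuffleTransfer}} being discussed; variable names mirror the comment above:

{code:java}
// Sketch of the transfer loop discussed above (not the exact source).
// With the patch, byteBuffer is allocated as min(shuffleBufferSize, trans)
// before this loop starts.
while (trans > 0 &&
       (readSize = fileChannel.read(byteBuffer, this.position + position)) > 0) {
  if (readSize < trans) {            // partial read: more iterations follow
    trans -= readSize;
    position += readSize;
    byteBuffer.flip();               // limit = bytes read, position = 0
  } else {                           // readSize >= trans: this is the last iteration
    byteBuffer.limit((int) trans);   // cap at the bytes still owed to the target
    byteBuffer.position(0);
    position += trans;
    trans = 0;
  }
  while (byteBuffer.hasRemaining()) {
    target.write(byteBuffer);        // drain the buffer to the target channel
  }
  byteBuffer.clear();                // reuse the same buffer on the next iteration
}
{code}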

> Optimize MapReduce Shuffle I/O for small partitions
> ---
>
> Key: MAPREDUCE-6923
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6923
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
> Environment: Observed in Hadoop 2.7.3 and above (judging from the 
> source code of future versions), and Ubuntu 16.04.
>Reporter: Robert Schmidtke
>Assignee: Robert Schmidtke
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: MAPREDUCE-6923.00.patch, MAPREDUCE-6923.01.patch
>
>
> When a job configuration results in small partitions read by each reducer 
> from each mapper (e.g. 65 kilobytes as in my setup: a 
> [TeraSort|https://github.com/apache/hadoop/blob/branch-2.7.3/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraSort.java]
>  of 256 gigabytes using 2048 mappers and reducers each), and setting
> {code:xml}
> <property>
>   <name>mapreduce.shuffle.transferTo.allowed</name>
>   <value>false</value>
> </property>
> {code}
> then the default setting of
> {code:xml}
> <property>
>   <name>mapreduce.shuffle.transfer.buffer.size</name>
>   <value>131072</value>
> </property>
> {code}
> results in almost 100% overhead in reads during shuffle in YARN, because for 
> each 65K needed, 128K are read.
> I propose a fix in 
> [FadvisedFileRegion.java|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedFileRegion.java#L114]
>  as follows:
> {code:java}
> ByteBuffer byteBuffer = ByteBuffer.allocate(Math.min(this.shuffleBufferSize, 
> trans > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) trans));
> {code}
> e.g. 
> [here|https://github.com/apache/hadoop/compare/branch-2.7.3...robert-schmidtke:adaptive-shuffle-buffer].
>  This sets the shuffle buffer size to the minimum value of the shuffle buffer 
> size specified in the configuration (128K by default), and the actual 
> partition size (65K on average in my setup). In my benchmarks this reduced 
> the read overhead in YARN from about 100% (255 additional gigabytes as 
> described above) down to about 18% (an additional 45 gigabytes). The runtime 
> of the job remained the same in my setup.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org