[jira] [Commented] (MAPREDUCE-6674) configure parallel tests for mapreduce-client-jobclient
[ https://issues.apache.org/jira/browse/MAPREDUCE-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122837#comment-16122837 ] Hadoop QA commented on MAPREDUCE-6674:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 19m 22s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 140 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 29s | Maven dependency ordering for branch |
| +1 | mvninstall | 13m 40s | trunk passed |
| +1 | compile | 14m 11s | trunk passed |
| +1 | checkstyle | 2m 26s | trunk passed |
| +1 | mvnsite | 0m 54s | trunk passed |
| +1 | findbugs | 0m 52s | trunk passed |
| +1 | javadoc | 0m 33s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 14s | Maven dependency ordering for patch |
| +1 | mvninstall | 0m 36s | the patch passed |
| +1 | compile | 11m 34s | the patch passed |
| -1 | javac | 11m 34s | root generated 2 new + 1320 unchanged - 54 fixed = 1322 total (was 1374) |
| -1 | checkstyle | 2m 35s | root: The patch generated 276 new + 2623 unchanged - 200 fixed = 2899 total (was 2823) |
| +1 | mvnsite | 0m 57s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 8 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| -1 | whitespace | 0m 0s | The patch 1 line(s) with tabs. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | findbugs | 1m 17s | the patch passed |
| +1 | javadoc | 0m 31s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 30m 32s | hadoop-mapreduce-client-jobclient in the patch failed. |
| +1 | unit | 1m 14s | hadoop-extras in the patch passed. |
| +1 | asflicense | 0m 35s | The patch does not generate ASF License warnings. |
| | | 104m 57s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.mapred.TestMRCJCFileInputFormat |
| Timed out junit tests | org.apache.hadoop.mapreduce.v2.TestMRJobs |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | MAPREDUCE-6674 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12881380/MR-4980.013.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml findbugs checkstyle |
| uname | Linux 1b2ada8be358 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a32e013 |
| Default Java |
[jira] [Updated] (MAPREDUCE-6674) configure parallel tests for mapreduce-client-jobclient
[ https://issues.apache.org/jira/browse/MAPREDUCE-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6674:
Attachment: MR-4980.013.patch
> configure parallel tests for mapreduce-client-jobclient
> ---
>
> Key: MAPREDUCE-6674
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6674
> Project: Hadoop Map/Reduce
> Issue Type: Test
> Components: test
> Affects Versions: 3.0.0-alpha1
> Reporter: Allen Wittenauer
> Priority: Critical
>
> Attachments: MAPREDUCE-6674.00.patch, MAPREDUCE-6674.01.patch, MR-4980.011.patch, MR-4980.012.patch, MR-4980.013.patch, MR-4980.patch, sftest.000.patch
>
> mapreduce-client-jobclient takes almost an hour and a half. Configuring parallel-tests would greatly reduce this run time.
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
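For readers following along: configuring parallel tests in a Maven module is typically done through maven-surefire-plugin. The fragment below is an illustrative sketch only — the fork count and property names are hypothetical examples, not taken from the attached MR-4980 patches (though `forkCount`, `reuseForks`, and `${surefire.forkNumber}` are standard surefire features):

```xml
<!-- Hypothetical sketch of parallel test forking with maven-surefire-plugin.
     Values are illustrative, not copied from the MR-4980 patches. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Run test classes across several forked JVMs instead of one sequential fork -->
    <forkCount>4</forkCount>
    <reuseForks>false</reuseForks>
    <!-- Give each fork its own data directory so MiniDFS/MiniMR clusters in
         different forks do not collide on working paths -->
    <systemPropertyVariables>
      <test.build.data>${project.build.directory}/test-dir/fork-${surefire.forkNumber}</test.build.data>
    </systemPropertyVariables>
  </configuration>
</plugin>
```

The per-fork directory trick matters for jobclient specifically: its tests spin up mini-clusters, so naive parallelism causes port and path collisions rather than speedups.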
[jira] [Commented] (MAPREDUCE-4980) Parallel test execution of hadoop-mapreduce-client-core
[ https://issues.apache.org/jira/browse/MAPREDUCE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122733#comment-16122733 ] Allen Wittenauer commented on MAPREDUCE-4980:
-
A few notes:
* TestMRJobs doesn't actually appear to be a unit test. It looks more like an integration test.
* It currently takes about 18 minutes on my laptop to run. It's significantly longer on Jenkins.
* Most of that time is spent starting up and bringing down clusters.
* Is there any reason for it to do that? I.e., why can't it just bring up one cluster, run the tests, then shut it down?
> Parallel test execution of hadoop-mapreduce-client-core
> ---
>
> Key: MAPREDUCE-4980
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4980
> Project: Hadoop Map/Reduce
> Issue Type: Test
> Components: test
> Affects Versions: 3.0.0-alpha1
> Reporter: Tsuyoshi Ozawa
> Assignee: Andrey Klochkov
> Attachments: MAPREDUCE-4980.010.patch, MAPREDUCE-4980.1.patch, MAPREDUCE-4980--n3.patch, MAPREDUCE-4980--n4.patch, MAPREDUCE-4980--n5.patch, MAPREDUCE-4980--n6.patch, MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n8.patch, MAPREDUCE-4980.patch
>
> The maven surefire plugin supports a parallel testing feature. By using it, the tests can run faster.
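The "bring up one cluster, run all the tests, then shut it down" idea from the comment above is the classic shared-fixture pattern (JUnit's @BeforeClass/@AfterClass). A minimal self-contained sketch, with a stand-in `FakeCluster` instead of Hadoop's real mini-cluster classes (whose API differs):

```java
// Sketch of a shared test fixture: the expensive cluster is started once for
// the whole suite, not once per test. FakeCluster is a hypothetical stand-in
// for something like a MiniMRYarnCluster.
public class SharedClusterDemo {
    static class FakeCluster {
        static int startups = 0;            // counts expensive startups
        FakeCluster() { startups++; }
        boolean runJob() { return true; }
    }

    static FakeCluster cluster;             // shared, @BeforeClass-style

    static boolean test1() { return cluster.runJob(); }
    static boolean test2() { return cluster.runJob(); }
    static boolean test3() { return cluster.runJob(); }

    public static void main(String[] args) {
        cluster = new FakeCluster();        // started once for all tests
        boolean ok = test1() && test2() && test3();
        cluster = null;                     // @AfterClass-style teardown
        System.out.println(ok + " " + FakeCluster.startups); // prints "true 1"
    }
}
```

With per-test fixtures the startup count would equal the number of tests; sharing drops the dominant cluster start/stop cost to a constant, which is exactly the complaint about TestMRJobs's runtime.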
[jira] [Commented] (MAPREDUCE-6674) configure parallel tests for mapreduce-client-jobclient
[ https://issues.apache.org/jira/browse/MAPREDUCE-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122597#comment-16122597 ] Hadoop QA commented on MAPREDUCE-6674:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 21m 6s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 140 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 16s | Maven dependency ordering for branch |
| +1 | mvninstall | 15m 57s | trunk passed |
| +1 | compile | 16m 30s | trunk passed |
| +1 | checkstyle | 2m 37s | trunk passed |
| +1 | mvnsite | 1m 2s | trunk passed |
| +1 | findbugs | 1m 5s | trunk passed |
| +1 | javadoc | 0m 37s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 17s | Maven dependency ordering for patch |
| +1 | mvninstall | 0m 47s | the patch passed |
| +1 | compile | 15m 3s | the patch passed |
| -1 | javac | 15m 3s | root generated 2 new + 1320 unchanged - 54 fixed = 1322 total (was 1374) |
| -1 | checkstyle | 3m 9s | root: The patch generated 276 new + 2623 unchanged - 200 fixed = 2899 total (was 2823) |
| +1 | mvnsite | 1m 11s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 8 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| -1 | whitespace | 0m 0s | The patch 1 line(s) with tabs. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | findbugs | 1m 27s | the patch passed |
| +1 | javadoc | 0m 45s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 35m 8s | hadoop-mapreduce-client-jobclient in the patch failed. |
| +1 | unit | 3m 27s | hadoop-extras in the patch passed. |
| +1 | asflicense | 0m 47s | The patch does not generate ASF License warnings. |
| | | 122m 54s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestNNBench |
| | hadoop.mapreduce.TestMapReduceLazyOutput |
| Timed out junit tests | org.apache.hadoop.mapreduce.v2.TestUberAM |
| | org.apache.hadoop.mapreduce.v2.TestMRJobs |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | MAPREDUCE-6674 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12881314/MR-4980.012.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml findbugs checkstyle |
| uname | Linux e236bce585d8 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality |
[jira] [Commented] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)
[ https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122507#comment-16122507 ] Hudson commented on MAPREDUCE-6870:
---
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12162 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/12162/]) MAPREDUCE-6870. Add configuration for MR job to finish when all reducers (haibochen: rev a32e0138fb63c92902e6613001f38a87c8a41321)
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
> Add configuration for MR job to finish when all reducers are complete (even
> with unfinished mappers)
>
> Key: MAPREDUCE-6870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 2.6.1
> Reporter: Zhe Zhang
> Assignee: Peter Bacsko
> Fix For: 3.0.0-beta1
>
> Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get scheduled before all reducers are complete, but those mappers run for a long time, even after all reducers are complete. This could hurt the performance of large MR jobs.
> In some cases, mappers don't have any materialize-able outcome other than providing intermediate data to reducers.
In that case, the job owner should have the config option to finish the job once all reducers are complete.
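Since the commit above edits mapred-default.xml, the new behavior is an opt-in job configuration. A sketch of enabling it in mapred-site.xml — assuming the property is named `mapreduce.job.finish-when-all-reducers-done` as in the MAPREDUCE-6870 patch; verify the exact name and default against your Hadoop release:

```xml
<!-- Sketch: opt in to declaring the job successful once all reducers finish,
     even if re-launched mappers are still running. Property name assumed from
     the MAPREDUCE-6870 change to mapred-default.xml. -->
<property>
  <name>mapreduce.job.finish-when-all-reducers-done</name>
  <value>true</value>
</property>
```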
[jira] [Commented] (MAPREDUCE-6931) Fix TestDFSIO "Total Throughput" calculation
[ https://issues.apache.org/jira/browse/MAPREDUCE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122505#comment-16122505 ] Dennis Huo commented on MAPREDUCE-6931:
---
Fair enough, makes sense. I went ahead and removed that line, keeping the refactorings otherwise; I also updated my commit message and pull-request title to reflect the "removal" rather than the "fix" of the line, but it sounds like the guidelines are to avoid editing JIRAs in place, so I'll leave that untouched.
> Fix TestDFSIO "Total Throughput" calculation
>
> Key: MAPREDUCE-6931
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6931
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: benchmarks, test
> Affects Versions: 2.8.0
> Reporter: Dennis Huo
> Priority: Trivial
>
> The new "Total Throughput" line added in https://issues.apache.org/jira/browse/HDFS-9153 is currently calculated as {{toMB(size) / ((float)execTime)}} and claims to be in units of "MB/s", but {{execTime}} is in milliseconds; thus, the reported number is 1/1000x the actual value:
> {code:java}
> String resultLines[] = {
>   "- TestDFSIO - : " + testType,
>   "Date & time: " + new Date(System.currentTimeMillis()),
>   "Number of files: " + tasks,
>   " Total MBytes processed: " + df.format(toMB(size)),
>   " Throughput mb/sec: " + df.format(size * 1000.0 / (time * MEGA)),
>   "Total Throughput mb/sec: " + df.format(toMB(size) / ((float)execTime)),
>   " Average IO rate mb/sec: " + df.format(med),
>   " IO rate std deviation: " + df.format(stdDev),
>   " Test exec time sec: " + df.format((float)execTime / 1000),
>   "" };
> {code}
> The different calculated fields can also use toMB and a shared milliseconds-to-seconds conversion to make it easier to keep units consistent.
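To make the unit bug in the quoted snippet concrete, here is a minimal standalone sketch (not the actual TestDFSIO source; class and method names are illustrative) showing why dividing MB by a millisecond count understates MB/s by 1000x, and the milliseconds-to-seconds fix the description suggests:

```java
// Illustrative sketch of the TestDFSIO unit bug: execTime is in milliseconds,
// so dividing MB by it directly yields 1/1000th of the true MB/s.
public class ThroughputCalc {
    static final double MEGA = 0x100000;      // bytes per MB, as in TestDFSIO

    static double toMB(long bytes) { return bytes / MEGA; }

    // Buggy form: MB divided by *milliseconds*.
    static double totalThroughputBuggy(long sizeBytes, long execTimeMs) {
        return toMB(sizeBytes) / (float) execTimeMs;
    }

    // Fixed form: shared ms-to-seconds conversion keeps units consistent.
    static double msToSecs(long ms) { return ms / 1000.0; }

    static double totalThroughputFixed(long sizeBytes, long execTimeMs) {
        return toMB(sizeBytes) / msToSecs(execTimeMs);
    }

    public static void main(String[] args) {
        long sizeBytes = 500L * 0x100000;     // 500 MB processed
        long execTimeMs = 10_000;             // in 10 seconds
        System.out.println(totalThroughputBuggy(sizeBytes, execTimeMs)); // 0.05 "MB/s"
        System.out.println(totalThroughputFixed(sizeBytes, execTimeMs)); // 50.0 MB/s
    }
}
```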
[jira] [Updated] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)
[ https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6870:
--
Resolution: Fixed
Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-beta1
Status: Resolved (was: Patch Available)
Thanks [~pbacsko] for the patch! I have committed it to trunk.
> Add configuration for MR job to finish when all reducers are complete (even
> with unfinished mappers)
>
> Key: MAPREDUCE-6870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 2.6.1
> Reporter: Zhe Zhang
> Assignee: Peter Bacsko
> Fix For: 3.0.0-beta1
>
> Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get scheduled before all reducers are complete, but those mappers run for a long time, even after all reducers are complete. This could hurt the performance of large MR jobs.
> In some cases, mappers don't have any materialize-able outcome other than providing intermediate data to reducers. In that case, the job owner should have the config option to finish the job once all reducers are complete.
[jira] [Commented] (MAPREDUCE-6674) configure parallel tests for mapreduce-client-jobclient
[ https://issues.apache.org/jira/browse/MAPREDUCE-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122417#comment-16122417 ] Hadoop QA commented on MAPREDUCE-6674:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 18s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 140 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 17s | Maven dependency ordering for branch |
| +1 | mvninstall | 14m 53s | trunk passed |
| +1 | compile | 16m 45s | trunk passed |
| +1 | checkstyle | 2m 33s | trunk passed |
| +1 | mvnsite | 0m 58s | trunk passed |
| +1 | findbugs | 1m 0s | trunk passed |
| +1 | javadoc | 0m 38s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 14s | Maven dependency ordering for patch |
| +1 | mvninstall | 0m 37s | the patch passed |
| +1 | compile | 14m 53s | the patch passed |
| -1 | javac | 14m 53s | root generated 2 new + 1320 unchanged - 54 fixed = 1322 total (was 1374) |
| -1 | checkstyle | 3m 8s | root: The patch generated 276 new + 2623 unchanged - 200 fixed = 2899 total (was 2823) |
| +1 | mvnsite | 1m 13s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 8 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| -1 | whitespace | 0m 1s | The patch 1 line(s) with tabs. |
| +1 | xml | 0m 2s | The patch has no ill-formed XML file. |
| +1 | findbugs | 1m 51s | the patch passed |
| +1 | javadoc | 0m 46s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 41m 27s | hadoop-mapreduce-client-jobclient in the patch failed. |
| +1 | unit | 1m 49s | hadoop-extras in the patch passed. |
| +1 | asflicense | 0m 43s | The patch does not generate ASF License warnings. |
| | | 105m 55s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestNNBench |
| | hadoop.mapred.TestMRTimelineEventHandling |
| Timed out junit tests | org.apache.hadoop.mapreduce.v2.TestUberAM |
| | org.apache.hadoop.mapreduce.v2.TestMRJobs |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | MAPREDUCE-6674 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12881314/MR-4980.012.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml findbugs checkstyle |
| uname | Linux a67d71dc286b 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality |
[jira] [Commented] (MAPREDUCE-6674) configure parallel tests for mapreduce-client-jobclient
[ https://issues.apache.org/jira/browse/MAPREDUCE-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122374#comment-16122374 ] Hadoop QA commented on MAPREDUCE-6674:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 20s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 17s | Maven dependency ordering for branch |
| +1 | mvninstall | 13m 29s | trunk passed |
| +1 | compile | 13m 51s | trunk passed |
| +1 | mvnsite | 9m 49s | trunk passed |
| +1 | javadoc | 4m 29s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 19s | Maven dependency ordering for patch |
| +1 | mvninstall | 13m 17s | the patch passed |
| +1 | compile | 10m 3s | the patch passed |
| -1 | javac | 10m 3s | root generated 110 new + 1374 unchanged - 0 fixed = 1484 total (was 1374) |
| +1 | mvnsite | 10m 4s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 2s | The patch has no ill-formed XML file. |
| +1 | javadoc | 4m 19s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 13m 12s | root in the patch failed. |
| +1 | asflicense | 0m 29s | The patch does not generate ASF License warnings. |
| | | 94m 33s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.conf.TestCommonConfigurationFields |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | MAPREDUCE-6674 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12881312/sftest.000.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml |
| uname | Linux bc65a92f0a93 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 312e57b |
| Default Java | 1.8.0_131 |
| javac | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7056/artifact/patchprocess/diff-compile-javac-root.txt |
| unit | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7056/artifact/patchprocess/patch-unit-root.txt |
| Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7056/testReport/ |
| modules | C: hadoop-project . U: . |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7056/console |
| Powered by | Apache Yetus 0.5.0 http://yetus.apache.org |

This message was automatically generated.

> configure parallel tests for mapreduce-client-jobclient
> ---
>
> Key: MAPREDUCE-6674
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6674
> Project: Hadoop Map/Reduce
> Issue Type: Test
> Components: test
> Affects Versions: 3.0.0-alpha1
>
[jira] [Commented] (MAPREDUCE-6931) Fix TestDFSIO "Total Throughput" calculation
[ https://issues.apache.org/jira/browse/MAPREDUCE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122346#comment-16122346 ] Konstantin Shvachko commented on MAPREDUCE-6931:
[~dennishuo] I agree that the "Total Throughput" metric depends heavily on how you run the job. That is exactly the point: it makes it a MapReduce metric, not an HDFS one. One can go to the Yarn UI and divide HDFS bytes written by the job time for any job, but that does not measure the HDFS write operation. I think we should just remove it.
> Fix TestDFSIO "Total Throughput" calculation
>
> Key: MAPREDUCE-6931
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6931
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: benchmarks, test
> Affects Versions: 2.8.0
> Reporter: Dennis Huo
> Priority: Trivial
>
> The new "Total Throughput" line added in https://issues.apache.org/jira/browse/HDFS-9153 is currently calculated as {{toMB(size) / ((float)execTime)}} and claims to be in units of "MB/s", but {{execTime}} is in milliseconds; thus, the reported number is 1/1000x the actual value:
> {code:java}
> String resultLines[] = {
>   "- TestDFSIO - : " + testType,
>   "Date & time: " + new Date(System.currentTimeMillis()),
>   "Number of files: " + tasks,
>   " Total MBytes processed: " + df.format(toMB(size)),
>   " Throughput mb/sec: " + df.format(size * 1000.0 / (time * MEGA)),
>   "Total Throughput mb/sec: " + df.format(toMB(size) / ((float)execTime)),
>   " Average IO rate mb/sec: " + df.format(med),
>   " IO rate std deviation: " + df.format(stdDev),
>   " Test exec time sec: " + df.format((float)execTime / 1000),
>   "" };
> {code}
> The different calculated fields can also use toMB and a shared milliseconds-to-seconds conversion to make it easier to keep units consistent.
[jira] [Commented] (MAPREDUCE-6931) Fix TestDFSIO "Total Throughput" calculation
[ https://issues.apache.org/jira/browse/MAPREDUCE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122308#comment-16122308 ] Dennis Huo commented on MAPREDUCE-6931:
---
Thanks for the explanation! I have no strong preference about removing the particular "Total Throughput" metric, but from my own experience using TestDFSIO in the past, I do find that the "average single-stream throughput" calculation historically provided by TestDFSIO can itself be somewhat misleading in characterizing a cluster, since it makes it difficult to infer the level of concurrency corresponding to that per-stream performance without backing out the numbers manually. I see the new metric as being a useful measure of "Effective Aggregate Throughput", all-in including overhead.

For example, if I use memory settings that only fit 1 container per physical machine at a time, my TestDFSIO will trickle through 1 task per machine at a time, and those single tasks will have very high single-stream throughput. If I instead do memory packing so that every machine runs, say, 64 tasks concurrently, then single-stream throughput will suffer significantly, while total walltime will decrease significantly. With a walltime-based calculation, I can see at a glance the approximate total throughput rating of my cluster when everything is running at full throttle; I'd expect increasing concurrency to increase aggregate throughput until IO limits are reached, where aggregate throughput will become flat w.r.t. increasing concurrency or slightly declining due to thrashing.

This could also be my cloud bias, where it becomes more important to characterize a full-blast cluster against a remote filesystem vs. caring so much about per-stream throughputs. It seems like an "effective aggregate throughput" calculation would help encompass the cluster-wide effects of things like optimal CPU oversubscription ratios, scheduler settings, speculative execution vs. failure rates, etc.
I agree the wording and computation as-is might not be the right fit for this, though. I see a few options that might be worthwhile, possibly in some combination:
* Change the wording to "Effective Aggregate Throughput" to more accurately describe what the number means.
* Add a metric displaying the "time" as "Slot Seconds" or something like that, so that the user doesn't have to compute it by dividing "Total MBytes processed" by "Throughput mb/sec" explicitly. This also helps clarify that the throughput is computed in terms of slot time, not walltime.
* Additionally, maybe provide a measure of "average concurrency": total slot time divided by walltime. This would legitimately account for scheduler overheads; if my whole test only ran 1 task in an hour, and it only had 30 minutes of slot time, then a concurrency of 0.5 correctly characterizes the fact that I'm only squeezing out 0.5 utilization after factoring in delays.
In any case, I'm happy to just delete the one line in-place to have the refactorings committed; if you feel it's better not to change/add metrics, or if these are better discussed in a followup JIRA, let me know. Re: MAPREDUCE and HDFS, I'll be sure to remember that TestDFSIO goes under HDFS in the future. For this one I looked at a search for "TestDFSIO" in JIRA and eyeballed that a plurality seemed to be under MAPREDUCE, a smaller fraction in HDFS, and the remaining ones in HADOOP. Combined with this code living under the hadoop-mapreduce directory, it looked like MAPREDUCE was more correct. 
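The two proposed metrics above reduce to simple arithmetic. A minimal sketch (the class and method names here are hypothetical illustrations, not part of TestDFSIO):

```java
// Hypothetical helpers sketching the proposed metrics.
public class ProposedMetrics {

    // "Effective Aggregate Throughput": total MB moved divided by
    // wall-clock seconds, all-in including scheduling overhead.
    static double effectiveAggregateThroughputMBs(double totalMBytes, double walltimeSec) {
        return totalMBytes / walltimeSec;
    }

    // "Average concurrency": total slot seconds divided by wall-clock
    // seconds; below 1.0 it exposes scheduler delays, above 1.0 it
    // reflects how many tasks effectively ran in parallel.
    static double averageConcurrency(double slotSeconds, double walltimeSec) {
        return slotSeconds / walltimeSec;
    }

    public static void main(String[] args) {
        // The example from the comment: 30 minutes of slot time inside a
        // 1-hour test window yields a concurrency of 0.5.
        System.out.println(averageConcurrency(1800.0, 3600.0)); // prints 0.5
    }
}
```

With 64 tasks per machine saturating IO, the same arithmetic would report a concurrency well above 1 and a correspondingly higher aggregate throughput.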
> Fix TestDFSIO "Total Throughput" calculation > > > Key: MAPREDUCE-6931 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6931 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: benchmarks, test >Affects Versions: 2.8.0 >Reporter: Dennis Huo >Priority: Trivial > > The new "Total Throughput" line added in > https://issues.apache.org/jira/browse/HDFS-9153 is currently calculated as > {{toMB(size) / ((float)execTime)}} and claims to be in units of "MB/s", but > {{execTime}} is in milliseconds; thus, the reported number is 1/1000x the > actual value: > {code:java} > String resultLines[] = { > "- TestDFSIO - : " + testType, > "Date & time: " + new Date(System.currentTimeMillis()), > "Number of files: " + tasks, > " Total MBytes processed: " + df.format(toMB(size)), > " Throughput mb/sec: " + df.format(size * 1000.0 / (time * > MEGA)), > "Total Throughput mb/sec: " + df.format(toMB(size) / > ((float)execTime)), > " Average IO rate mb/sec: " + df.format(med), > " IO rate std deviation: " + df.format(stdDev), > " Test exec time sec: " + df.format((float)execTime / 1000), > "" }; > {code} > The different calculated fields can also use toMB and a shared > milliseconds-to-seconds conversion to make it easier to keep units consistent. 
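As the description notes, the value is off by 1000x because {{execTime}} is in milliseconds. A minimal sketch of the corrected arithmetic (hedged: the class and helper names below are illustrative, not the actual TestDFSIO code, which would reuse its existing {{toMB}} helper):

```java
public class TotalThroughputFix {

    // execTime arrives in milliseconds; convert to seconds first, matching
    // the existing "Test exec time sec" line's (float)execTime / 1000.
    static float msToSecs(long execTimeMs) {
        return (float) execTimeMs / 1000;
    }

    // Corrected "Total Throughput mb/sec": total MB over wall-clock seconds.
    static float totalThroughputMBs(float totalMB, long execTimeMs) {
        return totalMB / msToSecs(execTimeMs);
    }

    public static void main(String[] args) {
        // 1000 MB moved in 10,000 ms should report 100 MB/s; the buggy
        // toMB(size) / ((float)execTime) form would report 0.1 instead.
        System.out.println(totalThroughputMBs(1000.0f, 10_000L)); // prints 100.0
    }
}
```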
[jira] [Updated] (MAPREDUCE-6674) configure parallel tests for mapreduce-client-jobclient
[ https://issues.apache.org/jira/browse/MAPREDUCE-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6674: Attachment: MR-4980.012.patch > configure parallel tests for mapreduce-client-jobclient > --- > > Key: MAPREDUCE-6674 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6674 > Project: Hadoop Map/Reduce > Issue Type: Test > Components: test >Affects Versions: 3.0.0-alpha1 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: MAPREDUCE-6674.00.patch, MAPREDUCE-6674.01.patch, > MR-4980.011.patch, MR-4980.012.patch, MR-4980.patch, sftest.000.patch > > > mapreduce-client-jobclient takes almost an hour and a half. Configuring > parallel-tests would greatly reduce this run time. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6674) configure parallel tests for mapreduce-client-jobclient
[ https://issues.apache.org/jira/browse/MAPREDUCE-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6674: Attachment: (was: MR-4980.011.patch) > configure parallel tests for mapreduce-client-jobclient > --- > > Key: MAPREDUCE-6674 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6674 > Project: Hadoop Map/Reduce > Issue Type: Test > Components: test >Affects Versions: 3.0.0-alpha1 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: MAPREDUCE-6674.00.patch, MAPREDUCE-6674.01.patch, > MR-4980.011.patch, MR-4980.012.patch, MR-4980.patch, sftest.000.patch > > > mapreduce-client-jobclient takes almost an hour and a half. Configuring > parallel-tests would greatly reduce this run time. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6674) configure parallel tests for mapreduce-client-jobclient
[ https://issues.apache.org/jira/browse/MAPREDUCE-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6674: Attachment: MR-4980.011.patch > configure parallel tests for mapreduce-client-jobclient > --- > > Key: MAPREDUCE-6674 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6674 > Project: Hadoop Map/Reduce > Issue Type: Test > Components: test >Affects Versions: 3.0.0-alpha1 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: MAPREDUCE-6674.00.patch, MAPREDUCE-6674.01.patch, > MR-4980.011.patch, MR-4980.011.patch, MR-4980.patch, sftest.000.patch > > > mapreduce-client-jobclient takes almost an hour and a half. Configuring > parallel-tests would greatly reduce this run time. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6674) configure parallel tests for mapreduce-client-jobclient
[ https://issues.apache.org/jira/browse/MAPREDUCE-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6674: Attachment: sftest.000.patch > configure parallel tests for mapreduce-client-jobclient > --- > > Key: MAPREDUCE-6674 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6674 > Project: Hadoop Map/Reduce > Issue Type: Test > Components: test >Affects Versions: 3.0.0-alpha1 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: MAPREDUCE-6674.00.patch, MAPREDUCE-6674.01.patch, > MR-4980.011.patch, MR-4980.patch, sftest.000.patch > > > mapreduce-client-jobclient takes almost an hour and a half. Configuring > parallel-tests would greatly reduce this run time. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6936) Remove unnecessary dependency of hadoop-yarn-server-common from hadoop-mapreduce-client-common
[ https://issues.apache.org/jira/browse/MAPREDUCE-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122189#comment-16122189 ] Hadoop QA commented on MAPREDUCE-6936: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 24s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | 
{color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 50s{color} | {color:green} hadoop-mapreduce-client-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 20m 18s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | MAPREDUCE-6936 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12881306/MAPREDUCE-6936.00.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml | | uname | Linux 0f96aa13d342 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 312e57b | | Default Java | 1.8.0_131 | | Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7055/testReport/ | | modules | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7055/console | | Powered by | Apache Yetus 0.5.0 http://yetus.apache.org | This 
message was automatically generated. > Remove unnecessary dependency of hadoop-yarn-server-common from > hadoop-mapreduce-client-common > --- > > Key: MAPREDUCE-6936 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6936 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Reporter: Haibo Chen >Assignee: Haibo Chen > Attachments: MAPREDUCE-6936.00.patch > > > The dependency of hadoop-yarn-server-common in hadoop-mapreduce-client-common > seems unnecessary, as > it is not using any of the classes from hadoop-yarn-server-common.
[jira] [Updated] (MAPREDUCE-6936) Remove unnecessary dependency of hadoop-yarn-server-common from hadoop-mapreduce-client-common
[ https://issues.apache.org/jira/browse/MAPREDUCE-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6936: -- Attachment: MAPREDUCE-6936.00.patch > Remove unnecessary dependency of hadoop-yarn-server-common from > hadoop-mapreduce-client-common > --- > > Key: MAPREDUCE-6936 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6936 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Reporter: Haibo Chen >Assignee: Haibo Chen > Attachments: MAPREDUCE-6936.00.patch > > > The dependency of hadoop-yarn-server-common in hadoop-mapreduce-client-common > seems unnecessary, as > it is not using any of the classes from hadoop-yarn-server-common.
[jira] [Updated] (MAPREDUCE-6936) Remove unnecessary dependency of hadoop-yarn-server-common from hadoop-mapreduce-client-common
[ https://issues.apache.org/jira/browse/MAPREDUCE-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6936: -- Status: Patch Available (was: Open) > Remove unnecessary dependency of hadoop-yarn-server-common from > hadoop-mapreduce-client-common > --- > > Key: MAPREDUCE-6936 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6936 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Reporter: Haibo Chen >Assignee: Haibo Chen > Attachments: MAPREDUCE-6936.00.patch > > > The dependency of hadoop-yarn-server-common in hadoop-mapreduce-client-common > seems unnecessary, as > it is not using any of the classes from hadoop-yarn-server-common.
[jira] [Updated] (MAPREDUCE-6936) Remove unnecessary dependency of hadoop-yarn-server-common from hadoop-mapreduce-client-common
[ https://issues.apache.org/jira/browse/MAPREDUCE-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6936: -- Description: The dependency of hadoop-yarn-server-common in hadoop-mapreduce-client-common seems unnecessary, as it is not using any of the classes from hadoop-yarn-server-common. was:The dependency of > Remove unnecessary dependency of hadoop-yarn-server-common from > hadoop-mapreduce-client-common > --- > > Key: MAPREDUCE-6936 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6936 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Reporter: Haibo Chen >Assignee: Haibo Chen > > The dependency of hadoop-yarn-server-common in hadoop-mapreduce-client-common > seems unnecessary, as > it is not using any of the classes from hadoop-yarn-server-common.
[jira] [Created] (MAPREDUCE-6936) Remove unnecessary dependency of hadoop-yarn-server-common from hadoop-mapreduce-client-common
Haibo Chen created MAPREDUCE-6936: - Summary: Remove unnecessary dependency of hadoop-yarn-server-common from hadoop-mapreduce-client-common Key: MAPREDUCE-6936 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6936 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Reporter: Haibo Chen Assignee: Haibo Chen -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6936) Remove unnecessary dependency of hadoop-yarn-server-common from hadoop-mapreduce-client-common
[ https://issues.apache.org/jira/browse/MAPREDUCE-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6936: -- Description: The dependency of > Remove unnecessary dependency of hadoop-yarn-server-common from > hadoop-mapreduce-client-common > --- > > Key: MAPREDUCE-6936 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6936 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Reporter: Haibo Chen >Assignee: Haibo Chen > > The dependency of -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (MAPREDUCE-6931) Fix TestDFSIO "Total Throughput" calculation
[ https://issues.apache.org/jira/browse/MAPREDUCE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122067#comment-16122067 ] Konstantin Shvachko edited comment on MAPREDUCE-6931 at 8/10/17 6:40 PM: - Hey [~dennishuo], thanks for reporting this. As I mentioned in HDFS-9153 ([in this comment|https://issues.apache.org/jira/browse/HDFS-9153?focusedCommentId=16122049=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16122049]) the "Total Throughput" should be removed as a deceiving metric. Could you please fix this by removing the line? Also, DFSIO issues should be filed on the HDFS JIRA. was (Author: shv): Hey [~dennishuo], thanks for reporting this. As I mentioned in HDFS-9153 ([in this comment|https://issues.apache.org/jira/browse/HDFS-9153?focusedCommentId=16122049=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16122049]) the "Total Throughput" should be removed as a deceiving metric. Could you please fix this by removing the line? 
> Fix TestDFSIO "Total Throughput" calculation > > > Key: MAPREDUCE-6931 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6931 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: benchmarks, test >Affects Versions: 2.8.0 >Reporter: Dennis Huo >Priority: Trivial > > The new "Total Throughput" line added in > https://issues.apache.org/jira/browse/HDFS-9153 is currently calculated as > {{toMB(size) / ((float)execTime)}} and claims to be in units of "MB/s", but > {{execTime}} is in milliseconds; thus, the reported number is 1/1000x the > actual value: > {code:java} > String resultLines[] = { > "- TestDFSIO - : " + testType, > "Date & time: " + new Date(System.currentTimeMillis()), > "Number of files: " + tasks, > " Total MBytes processed: " + df.format(toMB(size)), > " Throughput mb/sec: " + df.format(size * 1000.0 / (time * > MEGA)), > "Total Throughput mb/sec: " + df.format(toMB(size) / > ((float)execTime)), > " Average IO rate mb/sec: " + df.format(med), > " IO rate std deviation: " + df.format(stdDev), > " Test exec time sec: " + df.format((float)execTime / 1000), > "" }; > {code} > The different calculated fields can also use toMB and a shared > milliseconds-to-seconds conversion to make it easier to keep units consistent. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6931) Fix TestDFSIO "Total Throughput" calculation
[ https://issues.apache.org/jira/browse/MAPREDUCE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122067#comment-16122067 ] Konstantin Shvachko commented on MAPREDUCE-6931: Hey [~dennishuo], thanks for reporting this. As I mentioned in HDFS-9153 ([in this comment|https://issues.apache.org/jira/browse/HDFS-9153?focusedCommentId=16122049=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16122049]) the "Total Throughput" should be removed as a deceiving metric. Could you please fix this by removing the line? > Fix TestDFSIO "Total Throughput" calculation > > > Key: MAPREDUCE-6931 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6931 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: benchmarks, test >Affects Versions: 2.8.0 >Reporter: Dennis Huo >Priority: Trivial > > The new "Total Throughput" line added in > https://issues.apache.org/jira/browse/HDFS-9153 is currently calculated as > {{toMB(size) / ((float)execTime)}} and claims to be in units of "MB/s", but > {{execTime}} is in milliseconds; thus, the reported number is 1/1000x the > actual value: > {code:java} > String resultLines[] = { > "- TestDFSIO - : " + testType, > "Date & time: " + new Date(System.currentTimeMillis()), > "Number of files: " + tasks, > " Total MBytes processed: " + df.format(toMB(size)), > " Throughput mb/sec: " + df.format(size * 1000.0 / (time * > MEGA)), > "Total Throughput mb/sec: " + df.format(toMB(size) / > ((float)execTime)), > " Average IO rate mb/sec: " + df.format(med), > " IO rate std deviation: " + df.format(stdDev), > " Test exec time sec: " + df.format((float)execTime / 1000), > "" }; > {code} > The different calculated fields can also use toMB and a shared > milliseconds-to-seconds conversion to make it easier to keep units consistent. 
[jira] [Commented] (MAPREDUCE-6923) Optimize MapReduce Shuffle I/O for small partitions
[ https://issues.apache.org/jira/browse/MAPREDUCE-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121932#comment-16121932 ] Ravi Prakash commented on MAPREDUCE-6923: - bq. I'd say that for readSize == trans, we're in the else block, Thanks for pointing that out Robert! :-) Yupp. I agree bq. I'll be linking to the results once they're properly published. Looking forward to it :-) > Optimize MapReduce Shuffle I/O for small partitions > --- > > Key: MAPREDUCE-6923 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6923 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Environment: Observed in Hadoop 2.7.3 and above (judging from the > source code of future versions), and Ubuntu 16.04. >Reporter: Robert Schmidtke >Assignee: Robert Schmidtke > Fix For: 2.9.0, 3.0.0-beta1 > > Attachments: MAPREDUCE-6923.00.patch, MAPREDUCE-6923.01.patch > > > When a job configuration results in small partitions read by each reducer > from each mapper (e.g. 65 kilobytes as in my setup: a > [TeraSort|https://github.com/apache/hadoop/blob/branch-2.7.3/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraSort.java] > of 256 gigabytes using 2048 mappers and reducers each), and setting > {code:xml} > > mapreduce.shuffle.transferTo.allowed > false > > {code} > then the default setting of > {code:xml} > > mapreduce.shuffle.transfer.buffer.size > 131072 > > {code} > results in almost 100% overhead in reads during shuffle in YARN, because for > each 65K needed, 128K are read. > I propose a fix in > [FadvisedFileRegion.java|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedFileRegion.java#L114] > as follows: > {code:java} > ByteBuffer byteBuffer = ByteBuffer.allocate(Math.min(this.shuffleBufferSize, > trans > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) trans)); > {code} > e.g. 
> [here|https://github.com/apache/hadoop/compare/branch-2.7.3...robert-schmidtke:adaptive-shuffle-buffer]. > This sets the shuffle buffer size to the minimum value of the shuffle buffer > size specified in the configuration (128K by default), and the actual > partition size (65K on average in my setup). In my benchmarks this reduced > the read overhead in YARN from about 100% (255 additional gigabytes as > described above) down to about 18% (an additional 45 gigabytes). The runtime > of the job remained the same in my setup. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
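The buffer-sizing expression quoted above can be exercised in isolation. A hedged sketch (the wrapper class is illustrative; only the min/clamp expression comes from the proposed patch) showing that the buffer shrinks to the remaining transfer size while large transfers still get the configured cap:

```java
import java.nio.ByteBuffer;

public class ShuffleBufferSizing {

    // Size the shuffle buffer as min(configured size, bytes left to send).
    // trans is a long and can exceed Integer.MAX_VALUE, so clamp it before
    // the min(), mirroring the patch's ternary.
    static int bufferSize(int shuffleBufferSize, long trans) {
        return Math.min(shuffleBufferSize,
            trans > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) trans);
    }

    public static void main(String[] args) {
        // A 65 KB partition with the default 128 KB (131072-byte) buffer
        // now allocates only 65 KB, avoiding the ~100% read overhead.
        ByteBuffer buf = ByteBuffer.allocate(bufferSize(131072, 66560L));
        System.out.println(buf.capacity()); // prints 66560
    }
}
```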
[jira] [Commented] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)
[ https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121826#comment-16121826 ] Haibo Chen commented on MAPREDUCE-6870: --- +1 Will check it in later today. > Add configuration for MR job to finish when all reducers are complete (even > with unfinished mappers) > > > Key: MAPREDUCE-6870 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.6.1 >Reporter: Zhe Zhang >Assignee: Peter Bacsko > Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, > MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, > MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch > > > Even with MAPREDUCE-5817, there could still be cases where mappers get > scheduled before all reducers are complete, but those mappers run for long > time, even after all reducers are complete. This could hurt the performance > of large MR jobs. > In some cases, mappers don't have any materialize-able outcome other than > providing intermediate data to reducers. In that case, the job owner should > have the config option to finish the job once all reducers are complete. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Created] (MAPREDUCE-6935) Allow multiple active timeline clients
Aaron Gresch created MAPREDUCE-6935: --- Summary: Allow multiple active timeline clients Key: MAPREDUCE-6935 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6935 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Aaron Gresch In order to migrate smoothly from timeline service v1 to v2, it would be useful to be able to run both services at the same time for a period of time. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)
[ https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121585#comment-16121585 ] Hadoop QA commented on MAPREDUCE-6870: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 7s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 7s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} 
mvninstall {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} hadoop-mapreduce-project/hadoop-mapreduce-client: The patch generated 0 new + 654 unchanged - 5 fixed = 654 total (was 659) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 39s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 52s{color} | {color:green} hadoop-mapreduce-client-app in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 38m 24s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | MAPREDUCE-6870 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12881202/MAPREDUCE-6870-007.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml | | uname | Linux 73c1dc9f7ec6 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 8d953c2 | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7054/testReport/ | | modules | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app U: hadoop-mapreduce-project/hadoop-mapreduce-client | |
[jira] [Commented] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)
[ https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121541#comment-16121541 ] Peter Bacsko commented on MAPREDUCE-6870: - Yep, good catch. Uploading v7. > Add configuration for MR job to finish when all reducers are complete (even > with unfinished mappers) > > > Key: MAPREDUCE-6870 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.6.1 >Reporter: Zhe Zhang >Assignee: Peter Bacsko > Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, > MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, > MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch > > > Even with MAPREDUCE-5817, there could still be cases where mappers get > scheduled before all reducers are complete, but those mappers run for long > time, even after all reducers are complete. This could hurt the performance > of large MR jobs. > In some cases, mappers don't have any materialize-able outcome other than > providing intermediate data to reducers. In that case, the job owner should > have the config option to finish the job once all reducers are complete. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)
[ https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated MAPREDUCE-6870: Attachment: MAPREDUCE-6870-007.patch
[jira] [Commented] (MAPREDUCE-6923) Optimize MapReduce Shuffle I/O for small partitions
[ https://issues.apache.org/jira/browse/MAPREDUCE-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121117#comment-16121117 ] Robert Schmidtke commented on MAPREDUCE-6923: - Thanks for coming back to my comments. When I said Yarn I indeed meant the NodeManager, sorry for the confusion. You're right about the shuffle service, it was however something that I only discovered recently, having built my configuration a long time ago, not exactly knowing what I was doing. I set these keys as you described. I'm seeing jar files being loaded in the MapTask and ReduceTask JVMs alright, but there does not seem to be disk I/O overhead. In any case, I greatly appreciate all of your effort, and now that things are working as expected for me, I can focus on analyzing the numbers and making some sense of them. I'll be linking to the results once they're properly published. Cheers Robert > Optimize MapReduce Shuffle I/O for small partitions > --- > > Key: MAPREDUCE-6923 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6923 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Environment: Observed in Hadoop 2.7.3 and above (judging from the > source code of future versions), and Ubuntu 16.04. >Reporter: Robert Schmidtke >Assignee: Robert Schmidtke > Fix For: 2.9.0, 3.0.0-beta1 > > Attachments: MAPREDUCE-6923.00.patch, MAPREDUCE-6923.01.patch > > > When a job configuration results in small partitions read by each reducer > from each mapper (e.g. 
65 kilobytes as in my setup: a > [TeraSort|https://github.com/apache/hadoop/blob/branch-2.7.3/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraSort.java] > of 256 gigabytes using 2048 mappers and reducers each), and setting > {code:xml} > <property> > <name>mapreduce.shuffle.transferTo.allowed</name> > <value>false</value> > </property> > {code} > then the default setting of > {code:xml} > <property> > <name>mapreduce.shuffle.transfer.buffer.size</name> > <value>131072</value> > </property> > {code} > results in almost 100% overhead in reads during shuffle in YARN, because for > each 65K needed, 128K are read. > I propose a fix in > [FadvisedFileRegion.java|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedFileRegion.java#L114] > as follows: > {code:java} > ByteBuffer byteBuffer = ByteBuffer.allocate(Math.min(this.shuffleBufferSize, > trans > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) trans)); > {code} > e.g. > [here|https://github.com/apache/hadoop/compare/branch-2.7.3...robert-schmidtke:adaptive-shuffle-buffer]. > This sets the shuffle buffer size to the minimum of the shuffle buffer > size specified in the configuration (128K by default) and the actual > partition size (65K on average in my setup). In my benchmarks this reduced > the read overhead in YARN from about 100% (255 additional gigabytes as > described above) down to about 18% (an additional 45 gigabytes). The runtime > of the job remained the same in my setup.
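[Editor's sketch] The one-line fix above can be shown as a standalone method. The class and method names below are illustrative, not Hadoop's actual code; only the Math.min/clamp expression mirrors the proposed patch:

```java
import java.nio.ByteBuffer;

public class AdaptiveBufferSketch {
    // Cap the shuffle buffer at the number of bytes still to transfer,
    // so a 65K partition no longer forces a full 128K allocation/read.
    static ByteBuffer allocateShuffleBuffer(int shuffleBufferSize, long trans) {
        // trans is a long and may exceed Integer.MAX_VALUE; clamp before min()
        int remaining = trans > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) trans;
        return ByteBuffer.allocate(Math.min(shuffleBufferSize, remaining));
    }

    public static void main(String[] args) {
        // 128K configured buffer, 65K partition -> 65K buffer
        System.out.println(allocateShuffleBuffer(131072, 66560).capacity());
        // 128K configured buffer, 1 GiB partition -> buffer stays at 128K
        System.out.println(allocateShuffleBuffer(131072, 1L << 30).capacity());
    }
}
```

With the reported average partition of roughly 65K, the buffer shrinks to the partition size; for large transfers the configured 128K cap still applies.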
[jira] [Commented] (MAPREDUCE-6923) Optimize MapReduce Shuffle I/O for small partitions
[ https://issues.apache.org/jira/browse/MAPREDUCE-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121109#comment-16121109 ] Robert Schmidtke commented on MAPREDUCE-6923: - Hi Ravi, {quote} When {{shuffleBufferSize <= trans}}, then behavior is exactly the same as old code. {quote} Yes. {quote} if {{readSize == trans}} (i.e. the {{fileChannel.read()}} returned as many bytes as I wanted to transfer), {{trans}} is decremented correctly, {{position}} is increased correctly and the {{byteBuffer}} is flipped as usual. {{byteBuffer}}'s contents are written to {{target}} as usual, {{byteBuffer}} is cleared and then hopefully GCed never to be seen again. {quote} I'd say that for {{readSize == trans}}, we're in the [else block|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedFileRegion.java#L127], and thus {{byteBuffer}} is {{limit()}}ed to {{trans}} (which is the size it already has, because we're in the case where {{trans < shuffleBufferSize}}). It's correctly positioned to {{0}} as we're done reading, and {{trans}} is correctly set to {{0}}. Afterwards, the loop breaks (it can only be one iteration here because otherwise {{trans}} would have been larger than {{shuffleBufferSize}}), {{byteBuffer}} is written to {{target}} and then cleared. {quote} if {{readSize < trans}}, almost the same thing as above happens, but in a while loop. The only change this patch makes is that the {{byteBuffer}} may be smaller than before this patch, but it doesn't matter because it's big enough for the number of bytes we need to transfer. {quote} Now we have the situation you described for the previous case, and I agree with your reasoning here. 
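[Editor's sketch] The buffered copy loop being discussed (read into a bounded {{ByteBuffer}}, flip, drain to the target, clear) can be illustrated with a self-contained example. All names here are ours, not Hadoop's {{FadvisedFileRegion}} internals; the point is the {{flip()}}/{{limit()}}/{{clear()}} interplay when the last read returns more bytes than still need to be transferred:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

public class BufferedCopySketch {
    // Copy `trans` bytes from src to dst through a buffer that is sized
    // as the minimum of the configured buffer size and the transfer size.
    static long copy(ReadableByteChannel src, WritableByteChannel dst,
                     long trans, int bufferSize) throws Exception {
        ByteBuffer buf = ByteBuffer.allocate(
                (int) Math.min(bufferSize, Math.min(trans, Integer.MAX_VALUE)));
        long written = 0;
        while (trans > 0) {
            int readSize = src.read(buf);
            if (readSize <= 0) break;     // EOF or nothing read
            buf.flip();                   // limit = bytes read, position = 0
            if (readSize > trans) {
                buf.limit((int) trans);   // drain only what is still needed
                trans = 0;
            } else {
                trans -= readSize;        // readSize <= trans: drain it all
            }
            while (buf.hasRemaining()) {
                written += dst.write(buf);
            }
            buf.clear();                  // reuse the buffer for the next read
        }
        return written;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = new byte[200];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long n = copy(Channels.newChannel(new ByteArrayInputStream(data)),
                      Channels.newChannel(out), 150, 64);
        // prints: 150 bytes copied, output size 150
        System.out.println(n + " bytes copied, output size " + out.size());
    }
}
```

When {{trans < bufferSize}}, the buffer is allocated at exactly {{trans}} bytes, so the loop completes in one iteration, matching the single-iteration argument in the comment above.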