[jira] [Commented] (MAPREDUCE-7150) Optimize collections used by MR JHS to reduce its memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652467#comment-16652467 ] Hudson commented on MAPREDUCE-7150: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15231 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/15231/]) MAPREDUCE-7150. Optimize collections used by MR JHS to reduce its (haibochen: rev babd1449bf8898f44c434c852e67240721c0eb00) * (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryParser.java * (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/counters/FileSystemCounterGroup.java * (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/CompletedTask.java * (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/CompletedTaskAttempt.java > Optimize collections used by MR JHS to reduce its memory > > > Key: MAPREDUCE-7150 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7150 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobhistoryserver, mrv2 >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-8872.01.patch, YARN-8872.02.patch, > YARN-8872.03.patch, YARN-8872.04.patch, jhs-bad-collections.png > > > We analyzed, using jxray (www.jxray.com) a heap dump of JHS running with big > heap in a large clusters, handling large MapReduce jobs. The heap is large > (over 32GB) and 21.4% of it is wasted due to various suboptimal Java > collections, mostly maps and lists that are either empty or contain only one > element. In such under-populated collections considerable amount of memory is > still used by just the internal implementation objects. See the attached > excerpt from the jxray report for the details. If certain collections are > almost always empty, they should be initialized lazily. If others almost > always have just 1 or 2 elements, they should be initialized with the > appropriate initial capacity of 1 or 2 (the default capacity is 16 for > HashMap and 10 for ArrayList). > Based on the attached report, we should do the following: > # {{FileSystemCounterGroup.map}} - initialize lazily > # {{CompletedTask.attempts}} - initialize with capacity 2, given most tasks > only have one or two attempts > # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity > # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1 since it > contains one diagnostic message most of the time > # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to > use the more wasteful LinkedList here) and initialize with capacity 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-7150) Optimize collections used by MR JHS to reduce its memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652407#comment-16652407 ] Haibo Chen commented on MAPREDUCE-7150: --- +1 on the latest patch. Will check it in shortly. > Optimize collections used by MR JHS to reduce its memory > > > Key: MAPREDUCE-7150 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7150 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobhistoryserver, mrv2 >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev >Priority: Major > Attachments: YARN-8872.01.patch, YARN-8872.02.patch, > YARN-8872.03.patch, YARN-8872.04.patch, jhs-bad-collections.png > > > We analyzed, using jxray (www.jxray.com) a heap dump of JHS running with big > heap in a large clusters, handling large MapReduce jobs. The heap is large > (over 32GB) and 21.4% of it is wasted due to various suboptimal Java > collections, mostly maps and lists that are either empty or contain only one > element. In such under-populated collections considerable amount of memory is > still used by just the internal implementation objects. See the attached > excerpt from the jxray report for the details. If certain collections are > almost always empty, they should be initialized lazily. If others almost > always have just 1 or 2 elements, they should be initialized with the > appropriate initial capacity of 1 or 2 (the default capacity is 16 for > HashMap and 10 for ArrayList). > Based on the attached report, we should do the following: > # {{FileSystemCounterGroup.map}} - initialize lazily > # {{CompletedTask.attempts}} - initialize with capacity 2, given most tasks > only have one or two attempts > # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity > # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1 since it > contains one diagnostic message most of the time > # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to > use the more wasteful LinkedList here) and initialize with capacity 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-7150) Optimize collections used by MR JHS to reduce its memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652086#comment-16652086 ] Peter Bacsko commented on MAPREDUCE-7150: - +1 non-binding from me. I suppose the checkstyle stuff already existed, so IMO it can be ignored. > Optimize collections used by MR JHS to reduce its memory > > > Key: MAPREDUCE-7150 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7150 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobhistoryserver, mrv2 >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev >Priority: Major > Attachments: YARN-8872.01.patch, YARN-8872.02.patch, > YARN-8872.03.patch, YARN-8872.04.patch, jhs-bad-collections.png > > > We analyzed, using jxray (www.jxray.com) a heap dump of JHS running with big > heap in a large clusters, handling large MapReduce jobs. The heap is large > (over 32GB) and 21.4% of it is wasted due to various suboptimal Java > collections, mostly maps and lists that are either empty or contain only one > element. In such under-populated collections considerable amount of memory is > still used by just the internal implementation objects. See the attached > excerpt from the jxray report for the details. If certain collections are > almost always empty, they should be initialized lazily. If others almost > always have just 1 or 2 elements, they should be initialized with the > appropriate initial capacity of 1 or 2 (the default capacity is 16 for > HashMap and 10 for ArrayList). > Based on the attached report, we should do the following: > # {{FileSystemCounterGroup.map}} - initialize lazily > # {{CompletedTask.attempts}} - initialize with capacity 2, given most tasks > only have one or two attempts > # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity > # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1 since it > contains one diagnostic message most of the time > # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to > use the more wasteful LinkedList here) and initialize with capacity 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-7150) Optimize collections used by MR JHS to reduce its memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650690#comment-16650690 ] Hadoop QA commented on MAPREDUCE-7150: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 54s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 41s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 51s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 35s{color} | {color:orange} hadoop-mapreduce-project/hadoop-mapreduce-client: The patch generated 4 new + 179 unchanged - 6 fixed = 183 total (was 185) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 45s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 46s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 20s{color} | {color:green} hadoop-mapreduce-client-hs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 62m 42s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | MAPREDUCE-7150 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12943986/YARN-8872.04.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 17797c72d8a4 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision |
[jira] [Commented] (MAPREDUCE-7150) Optimize collections used by MR JHS to reduce its memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648673#comment-16648673 ] Hadoop QA commented on MAPREDUCE-7150: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 37s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 27s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 39s{color} | {color:orange} hadoop-mapreduce-project/hadoop-mapreduce-client: The patch generated 6 new + 179 unchanged - 6 fixed = 185 total (was 185) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 57s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 58s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 49s{color} | {color:green} hadoop-mapreduce-client-hs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 67m 0s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | MAPREDUCE-7150 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12943730/YARN-8872.03.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 6ed51bf27e95 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision |
[jira] [Commented] (MAPREDUCE-7150) Optimize collections used by MR JHS to reduce its memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648656#comment-16648656 ] Hadoop QA commented on MAPREDUCE-7150: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 8s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 30s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 35s{color} | {color:orange} hadoop-mapreduce-project/hadoop-mapreduce-client: The patch generated 6 new + 179 unchanged - 6 fixed = 185 total (was 185) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 46s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 42s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 22s{color} | {color:green} hadoop-mapreduce-client-hs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 62m 32s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | MAPREDUCE-7150 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12943730/YARN-8872.03.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 76671bd89e26 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision |
[jira] [Commented] (MAPREDUCE-7150) Optimize collections used by MR JHS to reduce its memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648621#comment-16648621 ] Haibo Chen commented on MAPREDUCE-7150: --- Thanks for the detailed explaination, [~mi...@cloudera.com]. That makes a lot of sense, and I agree with you that it is correct without the synchronized fix. It's more of a convention to address the warning as much as possible, so that the findbugs warning won't show up in every Jenkins build. The jenkins' job might be confused because the Jira is now moved to MapReduce. I have manually kicked it off. > Optimize collections used by MR JHS to reduce its memory > > > Key: MAPREDUCE-7150 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7150 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobhistoryserver, mrv2 >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev >Priority: Major > Attachments: YARN-8872.01.patch, YARN-8872.02.patch, > YARN-8872.03.patch, jhs-bad-collections.png > > > We analyzed, using jxray (www.jxray.com) a heap dump of JHS running with big > heap in a large clusters, handling large MapReduce jobs. The heap is large > (over 32GB) and 21.4% of it is wasted due to various suboptimal Java > collections, mostly maps and lists that are either empty or contain only one > element. In such under-populated collections considerable amount of memory is > still used by just the internal implementation objects. See the attached > excerpt from the jxray report for the details. If certain collections are > almost always empty, they should be initialized lazily. If others almost > always have just 1 or 2 elements, they should be initialized with the > appropriate initial capacity of 1 or 2 (the default capacity is 16 for > HashMap and 10 for ArrayList). > Based on the attached report, we should do the following: > # {{FileSystemCounterGroup.map}} - initialize lazily > # {{CompletedTask.attempts}} - initialize with capacity 2, given most tasks > only have one or two attempts > # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity > # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1 since it > contains one diagnostic message most of the time > # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to > use the more wasteful LinkedList here) and initialize with capacity 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-7150) Optimize collections used by MR JHS to reduce its memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648603#comment-16648603 ] Misha Dmitriev commented on MAPREDUCE-7150: --- In the above message, I see that for some reason Jenkins re-ran the old patch: |JIRA Patch URL|[^YARN-8872.02.patch]| [~haibochen] do you know how to force it to run the latest patch? > Optimize collections used by MR JHS to reduce its memory > > > Key: MAPREDUCE-7150 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7150 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobhistoryserver, mrv2 >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev >Priority: Major > Attachments: YARN-8872.01.patch, YARN-8872.02.patch, > YARN-8872.03.patch, jhs-bad-collections.png > > > We analyzed, using jxray (www.jxray.com) a heap dump of JHS running with big > heap in a large clusters, handling large MapReduce jobs. The heap is large > (over 32GB) and 21.4% of it is wasted due to various suboptimal Java > collections, mostly maps and lists that are either empty or contain only one > element. In such under-populated collections considerable amount of memory is > still used by just the internal implementation objects. See the attached > excerpt from the jxray report for the details. If certain collections are > almost always empty, they should be initialized lazily. If others almost > always have just 1 or 2 elements, they should be initialized with the > appropriate initial capacity of 1 or 2 (the default capacity is 16 for > HashMap and 10 for ArrayList). > Based on the attached report, we should do the following: > # {{FileSystemCounterGroup.map}} - initialize lazily > # {{CompletedTask.attempts}} - initialize with capacity 2, given most tasks > only have one or two attempts > # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity > # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1 since it > contains one diagnostic message most of the time > # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to > use the more wasteful LinkedList here) and initialize with capacity 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-7150) Optimize collections used by MR JHS to reduce its memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648597#comment-16648597 ] Hadoop QA commented on MAPREDUCE-7150: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 8s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 29s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 51s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 41s{color} | {color:orange} hadoop-mapreduce-project/hadoop-mapreduce-client: The patch generated 6 new + 179 unchanged - 6 fixed = 185 total (was 185) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 37s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 9s{color} | {color:red} hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 50s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 29s{color} | {color:green} hadoop-mapreduce-client-hs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 67m 33s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core | | | Inconsistent synchronization of org.apache.hadoop.mapreduce.counters.FileSystemCounterGroup.map; locked 66% of time Unsynchronized access at FileSystemCounterGroup.java:66% of time Unsynchronized access at FileSystemCounterGroup.java:[line 281] | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | |
[jira] [Commented] (MAPREDUCE-7150) Optimize collections used by MR JHS to reduce its memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648589#comment-16648589 ] Misha Dmitriev commented on MAPREDUCE-7150: --- In Java, if some variable X is always updated in a {{synchronized}} block, the code that reads X doesn't need a read barrier to read the value that X is set to once the above {{synchronized}} block completes. What happens under the covers is that in the end of the synchronized block all the memory caches are flushed, and thus the latest value of X becomes visible to all CPUs/threads other than the one that updated X. In our case, where {{map}} (I mean the {{.map}} field, not the contents of the {{ConcurrentSkipListMap}} that it points to) is updated only once, from null to its final value, it means that all other code will see it as either null or fully constructed (i.e. not in some inconsistent half-constructed state). Subsequently, the code that wants to read/update {{map}} contents, doesn't need to be synchronized. But both before and after my change, the unsynchronized code in {{write()}} that iterates {{map}} contents can do it in parallel with the code that updates {{map}} in {{findCounter()}}. Anyway, it's unclear whether the way it worked before is considered correct/by design. So I'll make {{write()}} synchronized. > Optimize collections used by MR JHS to reduce its memory > > > Key: MAPREDUCE-7150 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7150 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobhistoryserver, mrv2 >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev >Priority: Major > Attachments: YARN-8872.01.patch, YARN-8872.02.patch, > jhs-bad-collections.png > > > We analyzed, using jxray (www.jxray.com) a heap dump of JHS running with big > heap in a large clusters, handling large MapReduce jobs. The heap is large > (over 32GB) and 21.4% of it is wasted due to various suboptimal Java > collections, mostly maps and lists that are either empty or contain only one > element. In such under-populated collections considerable amount of memory is > still used by just the internal implementation objects. See the attached > excerpt from the jxray report for the details. If certain collections are > almost always empty, they should be initialized lazily. If others almost > always have just 1 or 2 elements, they should be initialized with the > appropriate initial capacity of 1 or 2 (the default capacity is 16 for > HashMap and 10 for ArrayList). > Based on the attached report, we should do the following: > # {{FileSystemCounterGroup.map}} - initialize lazily > # {{CompletedTask.attempts}} - initialize with capacity 2, given most tasks > only have one or two attempts > # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity > # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1 since it > contains one diagnostic message most of the time > # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to > use the more wasteful LinkedList here) and initialize with capacity 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-7150) Optimize collections used by MR JHS to reduce its memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648559#comment-16648559 ] Haibo Chen commented on MAPREDUCE-7150: --- Thanks [~mi...@cloudera.com] for the patch! {quote}The {{map}} object is created lazily in the synchronized method {{findCounter()}}, so according to the Java Memory Model, once it's created, it's visible to all the code, both synchronized and unsynchronized. {quote} Not a JMM expert. Doesn't the reader always need to have a read barrier to get the latest result of a variable? Is there something that synchronized block does special? Regardless, let's add synchronized to the write(DataOutput) method too to fix the findbugs warning. > Optimize collections used by MR JHS to reduce its memory > > > Key: MAPREDUCE-7150 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7150 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobhistoryserver, mrv2 >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev >Priority: Major > Attachments: YARN-8872.01.patch, YARN-8872.02.patch, > jhs-bad-collections.png > > > We analyzed, using jxray (www.jxray.com) a heap dump of JHS running with big > heap in a large clusters, handling large MapReduce jobs. The heap is large > (over 32GB) and 21.4% of it is wasted due to various suboptimal Java > collections, mostly maps and lists that are either empty or contain only one > element. In such under-populated collections considerable amount of memory is > still used by just the internal implementation objects. See the attached > excerpt from the jxray report for the details. If certain collections are > almost always empty, they should be initialized lazily. If others almost > always have just 1 or 2 elements, they should be initialized with the > appropriate initial capacity of 1 or 2 (the default capacity is 16 for > HashMap and 10 for ArrayList). > Based on the attached report, we should do the following: > # {{FileSystemCounterGroup.map}} - initialize lazily > # {{CompletedTask.attempts}} - initialize with capacity 2, given most tasks > only have one or two attempts > # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity > # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1 since it > contains one diagnostic message most of the time > # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to > use the more wasteful LinkedList here) and initialize with capacity 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org