Misha Dmitriev created YARN-8872:
------------------------------------
Summary: Optimize collections used by Yarn JHS to reduce its memory
Key: YARN-8872
URL: https://issues.apache.org/jira/browse/YARN-8872
Project: Hadoop YARN
Issue Type: Improvement
Components: yarn
Reporter: Misha Dmitriev
Assignee: Misha Dmitriev
Attachments: jhs-bad-collections.png
We used jxray (www.jxray.com) to analyze a heap dump of a JHS instance that was
running with a big heap in a large cluster and handling large MapReduce jobs.
The heap is over 32GB, and 21.4% of it is wasted by suboptimal Java
collections, mostly maps and lists that are either empty or contain only one
element. In such under-populated collections, a considerable amount of memory
is used by the internal implementation objects alone. See the attached excerpt
from the jxray report (jhs-bad-collections.png) for details. Collections that
are almost always empty should be initialized lazily. Collections that almost
always hold just 1 or 2 elements should be created with an appropriate initial
capacity, which is much smaller than the defaults (e.g. 16 for HashMap and 10
for ArrayList).
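As an illustration of the lazy-initialization idea, here is a minimal sketch. It is not the Hadoop code: the class `LazyCounterGroup`, its methods, and the `String`-to-`Long` map type are hypothetical, chosen only to show the pattern that the {{FileSystemCounterGroup.map}} change would follow. The backing map stays null until the first write, so an empty group costs a single null reference, and readers get a shared immutable empty map instead. The sketch assumes single-threaded access; a class that is read or written concurrently would need synchronization around the lazy allocation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical illustration (not Hadoop code): a counter group whose
 * backing map is allocated only when the first counter is added.
 * Not thread-safe; assumes single-threaded access.
 */
public class LazyCounterGroup {
    // Stays null while the group is empty, so an empty group
    // holds no HashMap object and no bucket array.
    private Map<String, Long> map;

    /** Adds delta to the named counter, allocating the map on first use. */
    public void increment(String name, long delta) {
        if (map == null) {
            map = new HashMap<>();
        }
        map.merge(name, delta, Long::sum);
    }

    /** Returns the counter value, or 0 if it was never incremented. */
    public long get(String name) {
        return map == null ? 0L : map.getOrDefault(name, 0L);
    }

    public int size() {
        return map == null ? 0 : map.size();
    }

    /** Read-only view; a shared immutable empty map while nothing is stored. */
    public Map<String, Long> view() {
        return map == null ? Collections.emptyMap() : Collections.unmodifiableMap(map);
    }

    /** Exposed only to show when allocation actually happens. */
    public boolean isAllocated() {
        return map != null;
    }
}
```

Every accessor has to handle the null case, which is the usual price of lazy initialization; it pays off only when, as the report suggests here, most instances never receive an element.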
Based on the attached report, we should do the following:
# {{FileSystemCounterGroup.map}} - initialize lazily
# {{CompletedTask.attempts}} - initialize with capacity 2, given that most tasks
have only one or two attempts
# {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity 2
# {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1 since it
contains one diagnostic message most of the time.
# {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to use
the more wasteful LinkedList here) and initialize with capacity 1.
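The right-sizing changes in the list could look roughly like the following sketch. The class `RightSizedTask` is hypothetical: the field names echo {{CompletedTask.attempts}} and {{CompletedTask.reportDiagnostics}}, but the `String` key and value types and the insertion-ordered `LinkedHashMap` for attempts are assumptions for illustration, not a statement of the actual Hadoop field types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical stand-in for a completed-task record. Field names mirror
 * those in the list above; types and methods are assumptions.
 */
public class RightSizedTask {
    // Most tasks have one or two attempts, so start far below the
    // default capacity of 16. Insertion order is kept (assumption).
    private final Map<String, String> attempts = new LinkedHashMap<>(2);

    // Usually a single message. ArrayList stores elements in one backing
    // array, whereas LinkedList allocates a node object per element.
    private final List<String> reportDiagnostics = new ArrayList<>(1);

    public void addAttempt(String attemptId, String state) {
        attempts.put(attemptId, state);
    }

    public void addDiagnostic(String message) {
        reportDiagnostics.add(message);
    }

    public Map<String, String> getAttempts() {
        return attempts;
    }

    public List<String> getReportDiagnostics() {
        return reportDiagnostics;
    }
}
```

One detail worth checking when picking these numbers: with the default load factor of 0.75, a HashMap or LinkedHashMap created with capacity 2 holds one entry and then doubles its table to 4 when the second entry is added, while a requested capacity of 3 is rounded up to 4 and holds two entries without resizing. Both are still well below the default of 16.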
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)