[
https://issues.apache.org/jira/browse/YARN-8872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16648270#comment-16648270
]
Misha Dmitriev commented on YARN-8872:
--------------------------------------
[~pbacsko] I think the situation here is the same as before. Both before and
after this change, the {{size()}} method can never see {{map}} in a really
inconsistent (half-constructed) state, because this object (a
{{ConcurrentSkipListMap)}} is first fully constructed, and then the {{map}}
reference is set to point to it. You are right that if {{findCounter()}} and
{{size()}} run concurrently after that point, then the first method can keep
adding objects to {{map}} and the second one may iterate a smaller number of
objects (or none at all) and return a smaller size. But the same thing could
happen before this change.
Note also that since this is a concurrent map implementation, iterating and
adding/removing elements concurrently is safe (will not cause exceptions).
According to the javadoc of {{ConcurrentSkipListMap.values()}}, "The view's
{{iterator}} is a "weakly consistent" iterator that will never throw
[{{ConcurrentModificationException}}|https://docs.oracle.com/javase/7/docs/api/java/util/ConcurrentModificationException.html],
and guarantees to traverse elements as they existed upon construction of the
iterator, and may (but is not guaranteed to) reflect any modifications
subsequent to construction."
However, making {{size()}} synchronized will still make the code a little more
predictable, at least in tests if nothing else. So I can make this change if
you would like.
> Optimize collections used by Yarn JHS to reduce its memory
> ----------------------------------------------------------
>
> Key: YARN-8872
> URL: https://issues.apache.org/jira/browse/YARN-8872
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Reporter: Misha Dmitriev
> Assignee: Misha Dmitriev
> Priority: Major
> Attachments: YARN-8872.01.patch, jhs-bad-collections.png
>
>
> We analyzed, using jxray (www.jxray.com) a heap dump of JHS running with big
> heap in a large clusters, handling large MapReduce jobs. The heap is large
> (over 32GB) and 21.4% of it is wasted due to various suboptimal Java
> collections, mostly maps and lists that are either empty or contain only one
> element. In such under-populated collections considerable amount of memory is
> still used by just the internal implementation objects. See the attached
> excerpt from the jxray report for the details. If certain collections are
> almost always empty, they should be initialized lazily. If others almost
> always have just 1 or 2 elements, they should be initialized with the
> appropriate initial capacity of 1 or 2 (the default capacity is 16 for
> HashMap and 10 for ArrayList).
> Based on the attached report, we should do the following:
> # {{FileSystemCounterGroup.map}} - initialize lazily
> # {{CompletedTask.attempts}} - initialize with capacity 2, given most tasks
> only have one or two attempts
> # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity
> # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1 since it
> contains one diagnostic message most of the time
> # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to
> use the more wasteful LinkedList here) and initialize with capacity 1.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]