Misha Dmitriev created YARN-7386:
------------------------------------

             Summary: Duplicate Strings in various places in Yarn memory
                 Key: YARN-7386
                 URL: https://issues.apache.org/jira/browse/YARN-7386
             Project: Hadoop YARN
          Issue Type: Improvement
            Reporter: Misha Dmitriev
            Assignee: Misha Dmitriev


Using jxray (www.jxray.com) I've analyzed a Yarn RM heap dump obtained in a big 
cluster. The tool uncovered several sources of memory waste. One problem is 
duplicate strings:

{code}
Total strings     Unique strings          Duplicate values       Overhead 
 361,506         86,672  5,928  22,886K (7.6%)
{code}

They are spread across a number of locations. The biggest source of waste is 
the following reference chain:

{code}

7,416K (2.5%), 31292 / 62% dup strings (499 unique), 31292 dup backing arrays:
↖{j.u.HashMap}.values
↖org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.environment
↖org.apache.hadoop.yarn.api.records.impl.pb.ApplicationSubmissionContextPBImpl.amContainer
↖org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.submissionContext
↖{java.util.concurrent.ConcurrentHashMap}.values
↖org.apache.hadoop.yarn.server.resourcemanager.RMActiveServiceContext.applications
↖org.apache.hadoop.yarn.server.resourcemanager.RMContextImpl.activeServiceContext
↖org.apache.hadoop.yarn.server.resourcemanager.rmapp.monitor.RMAppLifetimeMonitor.rmContext
↖Java Local@3ed9ef820 
(org.apache.hadoop.yarn.server.resourcemanager.rmapp.monitor.RMAppLifetimeMonitor)
{code}

However, there are also many others. Mostly they are strings in proto buffer or 
proto buffer builder objects. I plan to get rid of at least the worst offenders 
by inserting String.intern() calls. String.intern() used to consume memory in 
PermGen and was not very scalable up until about the early JDK 7 versions, but 
has greatly improved since then, and I've used it many times without any issues.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to