[jira] [Resolved] (YARN-7147) ATS1.5 crash due to OOM

Rohith Sharma K S (JIRA) Fri, 01 Sep 2017 08:20:23 -0700

     [ https://issues.apache.org/jira/browse/YARN-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Rohith Sharma K S resolved YARN-7147.
-------------------------------------
    Resolution: Duplicate

Closing as duplicate!

> ATS1.5 crash due to OOM
> -----------------------
>
>                 Key: YARN-7147
>                 URL: https://issues.apache.org/jira/browse/YARN-7147
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: timelineserver
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>         Attachments: Screen Shot - suspect-1.png, Screen Shot - suspect-2.png
>
>
> It is observed that in a production cluster, even though _app-cache-size_ is
> set to a minimal value (i.e. less than 5), the ATS server goes down with OOM.
> The _entity-group-fs-store.cache-store-class_ is configured with
> MemoryTimelineStore, which is the default. The heap size configured for the
> ATS daemon is 8GB.
> This is because ATS parses the entity log file per domain and caches it. If
> the domain has a lot of entity information, the in-memory cache store loads
> all of it, which causes the OOM. After a restart, ATS caches the same domain
> again and goes OOM again.
> There are two possible ways to handle it:
> # Threshold the number of entities loaded into the in-memory cache. This can
> still lead to OOM if the data size is huge.
> # Threshold based on the data size in the store.
> We faced the first case, where the number of entities is very large.
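The two thresholding options above can be illustrated with a per-domain cache that stops loading entities once either a count limit (option 1) or an estimated-size limit (option 2) would be exceeded. This is a minimal, hypothetical sketch: `BoundedEntityCache`, `tryAdd`, and the caller-supplied byte estimate are illustrative assumptions, not the MemoryTimelineStore API and not the change that eventually resolved this issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not YARN/ATS code): a per-domain in-memory entity cache
 * that refuses further entities once an entity-count limit (option 1) or an
 * estimated-byte limit (option 2) would be exceeded.
 */
public class BoundedEntityCache {
    private final int maxEntities; // option 1: cap on the number of cached entities
    private final long maxBytes;   // option 2: cap on the estimated data size
    private final Map<String, Long> entities = new LinkedHashMap<>(); // entityId -> estimated bytes
    private long usedBytes = 0;
    private boolean truncated = false;

    public BoundedEntityCache(int maxEntities, long maxBytes) {
        if (maxEntities <= 0 || maxBytes <= 0) {
            throw new IllegalArgumentException("limits must be positive");
        }
        this.maxEntities = maxEntities;
        this.maxBytes = maxBytes;
    }

    /**
     * Tries to cache one parsed entity. The size is an estimate supplied by the
     * caller (an assumption; measuring real heap usage is harder). Re-adding an
     * existing id replaces its size. Returns false and marks the cache as
     * truncated if either limit would be exceeded.
     */
    public boolean tryAdd(String entityId, long estimatedBytes) {
        Long previous = entities.get(entityId);
        long delta = estimatedBytes - (previous == null ? 0 : previous);
        int newCount = entities.size() + (previous == null ? 1 : 0);
        if (newCount > maxEntities || usedBytes + delta > maxBytes) {
            truncated = true;
            return false;
        }
        entities.put(entityId, estimatedBytes);
        usedBytes += delta;
        return true;
    }

    public int size() { return entities.size(); }
    public long usedBytes() { return usedBytes; }
    public boolean isTruncated() { return truncated; }

    public static void main(String[] args) {
        // Count limit of 3 entities, size limit of 1000 estimated bytes.
        BoundedEntityCache cache = new BoundedEntityCache(3, 1_000);
        for (int i = 0; i < 5; i++) {
            cache.tryAdd("entity_" + i, 100);
        }
        // Only the first three entities fit under the count limit.
        System.out.println(cache.size() + " " + cache.usedBytes() + " " + cache.isTruncated());
    }
}
```

With a count limit of 3 and five 100-byte entities, `main` prints `3 300 true`. Refusing further loads (rather than evicting) keeps memory bounded, but queries against a truncated domain would then return partial data, so a real store would need some way to report that.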



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
