[
https://issues.apache.org/jira/browse/YARN-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rohith Sharma K S resolved YARN-7147.
-------------------------------------
Resolution: Duplicate
Closing as duplicate!
> ATS1.5 crash due to OOM
> -----------------------
>
> Key: YARN-7147
> URL: https://issues.apache.org/jira/browse/YARN-7147
> Project: Hadoop YARN
> Issue Type: Bug
> Components: timelineserver
> Reporter: Rohith Sharma K S
> Assignee: Rohith Sharma K S
> Attachments: Screen Shot - suspect-1.png, Screen Shot - suspect-2.png
>
>
> In a production cluster, the ATS server goes down with OOM even though
> _app-cache-size_ is set to a minimal value, i.e. less than 5. The
> _entity-group-fs-store.cache-store-class_ is configured with
> MemoryTimelineStore, which is the default. The heap size configured for the
> ATS daemon is 8GB.
> This happens because ATS parses the entity log file per domain and caches it.
> If a domain has a lot of entity information, the in-memory cache store loads
> all of it, which causes the OOM. After a restart, ATS caches the same domain
> again and goes OOM again.
> There are two possible ways to handle this:
> # Threshold the number of entities loaded into the in-memory cache. This can
> still lead to OOM if the data size is huge.
> # Threshold based on the data size in the store.
> We faced the first case, where the number of entities is very large.
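
As a rough illustration of the first option (capping the number of entities held in memory), here is a minimal sketch. The class `BoundedEntityCache`, its `tryAdd` method, and the use of plain strings for entity IDs and payloads are all hypothetical and are not part of the Hadoop code base or of any patch for this issue. As noted above, a count-based cap does not bound memory when individual entities are large.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: an in-memory cache that refuses new entities once a
 * fixed entity count is reached. Not Hadoop's MemoryTimelineStore.
 */
class BoundedEntityCache {
    private final int maxEntities;
    // Entity ID -> payload; strings stand in for real timeline entities.
    private final Map<String, String> entities = new LinkedHashMap<>();

    BoundedEntityCache(int maxEntities) {
        if (maxEntities <= 0) {
            throw new IllegalArgumentException("maxEntities must be positive");
        }
        this.maxEntities = maxEntities;
    }

    /**
     * Stores the entity unless it is new and the cap is already reached.
     * Updates to an already cached entity are always accepted.
     *
     * @return true if the entity was stored, false if it was rejected
     */
    boolean tryAdd(String entityId, String payload) {
        boolean isNew = !entities.containsKey(entityId);
        if (isNew && entities.size() >= maxEntities) {
            return false; // caller can stop parsing or fall back to another store
        }
        entities.put(entityId, payload);
        return true;
    }

    int size() {
        return entities.size();
    }
}
```

A loader that parses the entity log could stop reading once `tryAdd` returns false, so a single domain with many entities would no longer fill the heap on every restart.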