[ https://issues.apache.org/jira/browse/YARN-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohith Sharma K S updated YARN-7163:
------------------------------------
Attachment: YARN-7163.01.patch
Updated the patch to keep a reference to the RM in RMDelegationTokenSecretManager
rather than to the RMContext. The RM reference always points to the new RMContext
created during the transition to standby, so the old RMContext can be GC'ed.
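
To illustrate the idea behind the patch, here is a minimal sketch. It is not the
patch itself: Context, Manager, PinningSecretManager and IndirectSecretManager are
hypothetical stand-ins, not the real YARN classes, and the standby transition is
reduced to swapping one field. It only shows why holding the owner (the RM) instead
of the context avoids pinning a stale context.

```java
public class StaleContextSketch {

    // Hypothetical stand-in for a per-transition context holding recovered state.
    static final class Context {
        final byte[] recoveredState;
        Context(int bytes) { recoveredState = new byte[bytes]; }
    }

    // Hypothetical stand-in for the RM: it owns the current context.
    static final class Manager {
        private Context activeContext = new Context(1024);
        Context getContext() { return activeContext; }
        // Assumption: a standby transition replaces the context with a new one.
        void transitionToStandby() { activeContext = new Context(1024); }
    }

    // Holds the context directly, so it pins whichever context it was built with.
    static final class PinningSecretManager {
        private final Context context;
        PinningSecretManager(Context context) { this.context = context; }
        Context context() { return context; }
    }

    // Holds the manager instead and looks up the context on every call.
    static final class IndirectSecretManager {
        private final Manager manager;
        IndirectSecretManager(Manager manager) { this.manager = manager; }
        Context context() { return manager.getContext(); }
    }

    public static void main(String[] args) {
        Manager rm = new Manager();
        PinningSecretManager pinning = new PinningSecretManager(rm.getContext());
        IndirectSecretManager indirect = new IndirectSecretManager(rm);

        rm.transitionToStandby();

        // The pinning variant still references the old context, keeping it reachable.
        System.out.println("pinning holds stale context: "
                + (pinning.context() != rm.getContext()));
        // The indirect variant follows the manager to the current context.
        System.out.println("indirect sees current context: "
                + (indirect.context() == rm.getContext()));
    }
}
```

In the sketch, once nothing else references the old Context, only the pinning
variant keeps it reachable; the indirect variant leaves it eligible for collection.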
> RM crashes with OOM in secured cluster when HA is enabled
> ---------------------------------------------------------
>
> Key: YARN-7163
> URL: https://issues.apache.org/jira/browse/YARN-7163
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Rohith Sharma K S
> Assignee: Rohith Sharma K S
> Attachments: YARN-7163.01.patch
>
>
> It is observed that the RM crashes with a heap space OOM in a secure cluster (HTTP
> authentication is Kerberos) when RM HA is enabled.
> The scenario is:
> 1. Start the RM in HA secure mode. Let's say RM1 is in active mode.
> 2. Run many applications so that they use more than 50% of the configured heap
> space. For example, if the heap space is 2GB, run applications that occupy
> 1.5GB of heap space.
> 3. Switch the RM to standby and bring it back to active. While recovering
> applications from the state store, the RM crashes with OOM.
> *Note*: This issue happens only when the RM is started directly as ACTIVE
> (not switched from standby to active during JVM startup).
> The heap dump shows that RMAuthenticationFilter holds 60% of the heap space, and the
> other 40% is held by RMAppState during recovery from the state store. Together these
> exceed the heap space, and the RM crashes with OOM.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)