[ https://issues.apache.org/jira/browse/YARN-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152283#comment-15152283 ]
Dustin Cote commented on YARN-3934:
-----------------------------------
[~mingma] can you have a look at the proposed patch and let me know if you feel
it addresses your issue appropriately?
> Application with large ApplicationSubmissionContext can cause RM to exit when ZK store is used
> ------------------------------------------------------------------------------------------------
>
> Key: YARN-3934
> URL: https://issues.apache.org/jira/browse/YARN-3934
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Ming Ma
> Assignee: Dustin Cote
> Attachments: YARN-3934-1.patch
>
>
> Use the following steps to test.
> 1. Set up ZK as the RM HA store.
> 2. Submit a job that refers to lots of distributed cache files with long HDFS
> paths, which will cause the app state size to exceed ZK's max object size
> limit.
> 3. RM can't write to ZK and exits with the following exception.
> {noformat}
> 2015-07-10 22:21:13,002 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause:
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
>     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:944)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:941)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1083)
> {noformat}
> In this case, the RM could have rejected the app during the submitApplication
> RPC if the size of the ApplicationSubmissionContext is too large.
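
The rejection proposed in the description could be sketched roughly as below. This is an illustration only, not the content of YARN-3934-1.patch: the class `SubmissionSizeCheck`, its methods, and the exception type are hypothetical, and the size is approximated by summing the UTF-8 bytes of the cache file paths rather than by serializing a real ApplicationSubmissionContext. The limit is assumed to be ZooKeeper's default `jute.maxbuffer` value (0xfffff bytes, just under 1 MB); a real deployment may configure a different value.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SubmissionSizeCheck {
    // Assumed limit: ZooKeeper's default jute.maxbuffer (just under 1 MB).
    static final int MAX_STATE_BYTES = 0xfffff;

    // Hypothetical exception used to reject an oversized submission.
    static class SubmissionTooLargeException extends RuntimeException {
        SubmissionTooLargeException(String message) {
            super(message);
        }
    }

    // Rough stand-in for the serialized context size: total UTF-8 bytes
    // of all distributed cache paths referenced by the job.
    static int estimateSize(List<String> cachePaths) {
        int total = 0;
        for (String path : cachePaths) {
            total += path.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    // Fail fast at submission time instead of letting the store write fail.
    static void checkSize(int serializedBytes, int maxBytes) {
        if (serializedBytes > maxBytes) {
            throw new SubmissionTooLargeException(
                "Submission context is " + serializedBytes
                + " bytes, which exceeds the limit of " + maxBytes + " bytes");
        }
    }

    public static void main(String[] args) {
        // Simulate a job that refers to many cache files with long HDFS paths.
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 20000; i++) {
            paths.add("hdfs://namenode:8020/user/example/very/long/directory/"
                + "name/for/distributed/cache/file-" + i + ".jar");
        }
        try {
            checkSize(estimateSize(paths), MAX_STATE_BYTES);
            System.out.println("Submission accepted");
        } catch (SubmissionTooLargeException e) {
            System.out.println("Submission rejected: " + e.getMessage());
        }
    }
}
```

Checking at submission time means only the oversized application fails, while the RM itself keeps running. Whether the limit should be read from the ZK configuration or from a separate YARN setting is left open here.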