Ming Ma created YARN-3934:
-----------------------------
Summary: Application with large ApplicationSubmissionContext can
cause RM to exit when ZK store is used
Key: YARN-3934
URL: https://issues.apache.org/jira/browse/YARN-3934
Project: Hadoop YARN
Issue Type: Bug
Reporter: Ming Ma
Use the following steps to test.
1. Set up ZK as the RM HA store.
2. Submit a job that refers to many distributed cache files with long HDFS
paths, so that the app state size exceeds ZK's max object size limit.
3. The RM cannot write to ZK and exits with the following exception.
{noformat}
2015-07-10 22:21:13,002 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:944)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:941)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1083)
{noformat}
Instead of exiting, the RM could reject the app during the submitApplication
RPC when the ApplicationSubmissionContext is too large.
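
A minimal sketch of such a check, to illustrate the idea only. SubmissionSizeGuard, estimatePathBytes and checkSize are hypothetical names, not existing YARN code. The sketch assumes the RM can obtain the serialized size of the context (for example from its protobuf form) and compares it with ZooKeeper's default jute.maxbuffer of 0xfffff bytes. The app state actually written to ZK holds more than the submission context, so a real limit would probably need some headroom.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical helper for illustration; not part of YARN. */
final class SubmissionSizeGuard {
    /** ZooKeeper's default jute.maxbuffer: 0xfffff bytes, just under 1 MB. */
    static final int DEFAULT_ZK_MAX_BYTES = 0xfffff;

    private SubmissionSizeGuard() {}

    /**
     * Rough lower bound on the serialized size contributed by the
     * distributed cache entries: the UTF-8 byte length of every path.
     */
    static long estimatePathBytes(List<String> cachePaths) {
        long total = 0;
        for (String path : cachePaths) {
            total += path.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    /**
     * Rejects a submission whose serialized size exceeds maxBytes, so the
     * failure surfaces to the client instead of in the state store.
     */
    static void checkSize(long serializedBytes, int maxBytes) {
        if (serializedBytes > maxBytes) {
            throw new IllegalArgumentException(
                "ApplicationSubmissionContext is " + serializedBytes
                    + " bytes, which exceeds the limit of " + maxBytes + " bytes");
        }
    }
}
```

With a check like this in the submitApplication path, the client would get an error for the oversized app and the RM would keep running.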