[
https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320666#comment-14320666
]
Jian He commented on YARN-1514:
-------------------------------
- I tried to run it from the command line with appsize=5 attemptsize=10, and it
fails with:
{code}
15/02/13 11:44:42 INFO server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x14b8478c39d0000 type:multi cxid:0x25 zxid:0x15 txntype:-1 reqpath:n/a aborting remaining multi ops. Error Path:/Test/ZKRMStateRoot/RMAppRoot/application_1352994193343_0001/appattempt_1352994193343_0000_000000 Error:KeeperErrorCode = NoNode for /Test/ZKRMStateRoot/RMAppRoot/application_1352994193343_0001/appattempt_1352994193343_0000_000000
15/02/13 11:44:42 INFO recovery.ZKRMStateStore: Exception while executing a ZK operation.
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:907)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:904)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1049)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1070)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:904)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.removeApplicationStateInternal(ZKRMStateStore.java:698)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStorePerf.run(TestZKRMStateStorePerf.java:232)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStorePerf.main(TestZKRMStateStorePerf.java:266)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
15/02/13 11:44:42 INFO recovery.ZKRMStateStore: Maxed out ZK retries. Giving up!
failed to cleanup. : KeeperErrorCode = NoNode
{code}
Do you see the same issue? I'm not sure whether it is a problem with my local setup.
- ZK_TIMEOUT_MS is not used; we can remove it.
- {{ContainerId.newInstance(attemptId, 0)}} still appears to use the
deprecated method.
- This is not important at all, but I feel that a larger number of applications
with fewer attempts per application may be closer to a real scenario.
{code}
private int ZK_PERF_NUM_APP_DEFAULT = 100;
private int ZK_PERF_NUM_APPATTEMPT_PER_APP = 100;
{code}
- Regarding the excessive logging, is it possible to suppress the LOG.info()
output when running from the command line?
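
A side observation on the trace above, offered as a sketch and not as a diagnosis: the failing path nests appattempt_1352994193343_0000_000000 under application_1352994193343_0001, so the attempt id carries application sequence 0000 while its parent znode is application 0001. Whether that mismatch is what causes the NoNode error has not been verified. The Java below only mirrors the id strings visible in the log; the class and helper names (AttemptPathCheck, appId, attemptId, belongsTo) are hypothetical and are not part of the patch or of the YARN API.

```java
import java.util.Locale;

class AttemptPathCheck {
    // Hypothetical helper: formats an application id the way it appears in the log.
    static String appId(long clusterTimestamp, int appSeq) {
        return String.format(Locale.ROOT, "application_%d_%04d", clusterTimestamp, appSeq);
    }

    // Hypothetical helper: formats an attempt id the way it appears in the log.
    static String attemptId(long clusterTimestamp, int appSeq, int attemptSeq) {
        return String.format(Locale.ROOT, "appattempt_%d_%04d_%06d",
            clusterTimestamp, appSeq, attemptSeq);
    }

    // Builds the znode path of an attempt under its parent application node.
    static String attemptPath(String root, String appId, String attemptId) {
        return root + "/" + appId + "/" + attemptId;
    }

    // True when the attempt id embeds the same timestamp and sequence as the app id.
    static boolean belongsTo(String attemptId, String appId) {
        String appSuffix = appId.substring("application_".length());
        return attemptId.startsWith("appattempt_" + appSuffix + "_");
    }

    public static void main(String[] args) {
        String root = "/Test/ZKRMStateRoot/RMAppRoot";
        String app = appId(1352994193343L, 1);
        // The attempt id exactly as it appears in the failing path.
        String loggedAttempt = attemptId(1352994193343L, 0, 0);
        // An attempt id derived from the parent application's own sequence number.
        String derivedAttempt = attemptId(1352994193343L, 1, 0);

        System.out.println(attemptPath(root, app, loggedAttempt)
            + " consistent=" + belongsTo(loggedAttempt, app));
        System.out.println(attemptPath(root, app, derivedAttempt)
            + " consistent=" + belongsTo(derivedAttempt, app));
    }
}
```

If the benchmark builds its attempt ids from a fixed application id instead of the one for the current application, a check like belongsTo before the store and remove calls would surface it early.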
> Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
> --------------------------------------------------------------------
>
> Key: YARN-1514
> URL: https://issues.apache.org/jira/browse/YARN-1514
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Tsuyoshi OZAWA
> Assignee: Tsuyoshi OZAWA
> Fix For: 2.7.0
>
> Attachments: YARN-1514.1.patch, YARN-1514.2.patch, YARN-1514.3.patch,
> YARN-1514.4.patch, YARN-1514.4.patch, YARN-1514.5.patch, YARN-1514.5.patch,
> YARN-1514.6.patch, YARN-1514.wip-2.patch, YARN-1514.wip.patch
>
>
> ZKRMStateStore is very sensitive to ZNode-related operations, as discussed in
> YARN-1307, YARN-1378, and so on. In particular, ZKRMStateStore#loadState is
> called when an RM-HA cluster fails over, so its execution time directly
> affects the failover time of RM-HA.
> We need a utility to benchmark the execution time of ZKRMStateStore#loadState
> as a development tool.