[ https://issues.apache.org/jira/browse/STORM-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
P. Taylor Goetz reassigned STORM-1114:
--------------------------------------

    Assignee: P. Taylor Goetz

> Racing condition in trident zookeeper zk-node create/delete
> -----------------------------------------------------------
>
>                 Key: STORM-1114
>                 URL: https://issues.apache.org/jira/browse/STORM-1114
>             Project: Apache Storm
>          Issue Type: Documentation
>          Components: storm-core
>            Reporter: Zhuo Liu
>            Assignee: P. Taylor Goetz
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In production, for some Trident topologies, we hit a bug where some workers
> try to create a zk-node that already exists or delete a zk-node that has
> already been deleted. This causes the worker process to die.
>
> We dissected the problem and found a race condition in the zk-node create
> and delete code of Trident's TransactionalState.
>
> Failure stack traces in worker.log:
> {noformat}
> Caused by: org.apache.storm.shade.org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /ignoreStoredMetadata
>         at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:119) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>         at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>         at org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>         at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:676) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>         at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:660) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>         at org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>         at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:656) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>         at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:441) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>         at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:431) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>         at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:239) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>         at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:193) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>         at storm.trident.topology.state.TransactionalState.forPath(TransactionalState.java:83) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>         at storm.trident.topology.state.TransactionalState.createNode(TransactionalState.java:100) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>         at storm.trident.topology.state.TransactionalState.setData(TransactionalState.java:115) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>         ... 9 more
> 2015-10-14 18:10:43.786 b.s.util [ERROR] Halting process: ("Worker died")
> {noformat}
> {noformat}
> Caused by: org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /rainbowHdfsPath
>         at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:111) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>         at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>         at org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>         at org.apache.storm.shade.org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:239) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>         at org.apache.storm.shade.org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:234) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>         at org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>         at org.apache.storm.shade.org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:230) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>         at org.apache.storm.shade.org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:215) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>         at org.apache.storm.shade.org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:42) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>         at storm.trident.topology.state.TransactionalState.delete(TransactionalState.java:126) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>         ... 12 more
> 2015-10-14 18:10:28.799 b.s.util [ERROR] Halting process: ("Worker died")
> java.lang.RuntimeException: ("Worker died")
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
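
For readers following the issue, the failure mode described above (a create that fails because another worker created the node first, and a delete that fails because another worker already removed it) can be illustrated with a minimal sketch. This is not the patch attached to STORM-1114 and it does not use the real Curator or ZooKeeper API: `FakeZk`, `setDataTolerant`, `deleteTolerant`, `MAX_ATTEMPTS`, and the two exception classes are hypothetical stand-ins that only mimic strict create/delete semantics in memory. The idea it shows is treating "node already exists" and "node already gone" as expected outcomes of the race rather than as fatal errors.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ZkRaceSketch {
    // Hypothetical stand-ins for ZooKeeper's NodeExists / NoNode errors.
    static class NodeExistsException extends Exception {
        NodeExistsException(String path) { super("NodeExists for " + path); }
    }
    static class NoNodeException extends Exception {
        NoNodeException(String path) { super("NoNode for " + path); }
    }

    // Hypothetical in-memory store with strict semantics:
    // create fails if the node exists, setData/delete fail if it does not.
    static class FakeZk {
        private final ConcurrentMap<String, byte[]> nodes = new ConcurrentHashMap<>();

        void create(String path, byte[] data) throws NodeExistsException {
            if (nodes.putIfAbsent(path, data) != null) throw new NodeExistsException(path);
        }
        void setData(String path, byte[] data) throws NoNodeException {
            if (nodes.replace(path, data) == null) throw new NoNodeException(path);
        }
        void delete(String path) throws NoNodeException {
            if (nodes.remove(path) == null) throw new NoNodeException(path);
        }
        boolean exists(String path) { return nodes.containsKey(path); }
        byte[] getData(String path) { return nodes.get(path); }
    }

    static final int MAX_ATTEMPTS = 10; // assumed bound, chosen for illustration

    // Create-or-update that tolerates losing the race in either direction.
    static void setDataTolerant(FakeZk zk, String path, byte[] data) {
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            try {
                zk.create(path, data);
                return;
            } catch (NodeExistsException e) {
                // Another worker created the node first; fall through and update it.
            }
            try {
                zk.setData(path, data);
                return;
            } catch (NoNodeException e) {
                // The node was deleted between create and setData; try again.
            }
        }
        throw new IllegalStateException("gave up writing " + path);
    }

    // Delete that treats "already deleted" as success.
    static void deleteTolerant(FakeZk zk, String path) {
        try {
            zk.delete(path);
        } catch (NoNodeException e) {
            // Another worker already removed the node; nothing left to do.
        }
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        String path = "/ignoreStoredMetadata";
        try {
            zk.create(path, new byte[] {1}); // worker B wins the race
            zk.create(path, new byte[] {2}); // worker A's strict create now fails
        } catch (NodeExistsException e) {
            System.out.println("strict create failed: " + e.getMessage());
        }
        setDataTolerant(zk, path, new byte[] {2}); // tolerant write succeeds
        deleteTolerant(zk, path);
        deleteTolerant(zk, path);                  // second delete is a no-op
        System.out.println("node exists after deletes: " + zk.exists(path));
    }
}
```

In the sketch the strict `create` call plays the role of the worker that lost the race in the first stack trace, and the repeated `deleteTolerant` call corresponds to the second one. Whether swallowing these errors is safe in TransactionalState depends on what the callers expect, which the issue text does not say.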