codeon opened a new issue #683: Inconsistency in the nodes tracked
URL: https://github.com/apache/helix/issues/683

Hi,

We are seeing an issue where our nodes cannot start the participant process, and the de-registration process fails with the following exception:

```
2019-10-03 05:31:28 [core-thread-12] ERROR c.uber.streamgate.helix.Participant - Exception while unregistering helix participant
org.apache.helix.HelixException: Node dca1-prod05_streamgate_shadow_0 does not exist in config for cluster StreamgateClusterV1-DCA1-Shadow
	at org.apache.helix.manager.zk.ZKHelixAdmin.dropInstance(ZKHelixAdmin.java:129)
	at c.u.s.helix.Participant.unregister(Participant.java:134)
	at c.u.s.helix.Participant.registerInstance(Participant.java:109)
	at c.u.s.helix.Participant.run(Participant.java:61)
	at c.u.s.http.endpoints.HelixRegister.lambda$doHandle$0(HelixRegister.java:57)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:309)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
	at java.lang.Thread.run(Thread.java:748)
```

However, when the node then tries to register itself, it gets an exception saying that the node is already registered.

As far as we understand, this happens under some extreme circumstances (such as a flapping node restarting quickly): the instance information disappears from the `/<HELIX_CLUSTER_NAME>/CONFIGS/PARTICIPANT` path but remains stuck under the `/<HELIX_CLUSTER_NAME>/INSTANCES/` path. After that, all Helix commands (register/unregister/delete/disable) fail for that node.

To fix it, we manually remove the node from the `/<HELIX_CLUSTER_NAME>/INSTANCES/` path and restart the controller processes and the participant nodes so that they can register cleanly again.

We would like to understand when such a situation can arise, where the instance is cleaned up from one path but remains in the other, leading to this inconsistency.

Helix Version: 0.8.2
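
For anyone trying to spot the same state, the mismatch described above amounts to a set difference between the children of the two paths. The following is only a minimal sketch in Python, not part of Helix: it assumes the child names under `/<HELIX_CLUSTER_NAME>/CONFIGS/PARTICIPANT` and `/<HELIX_CLUSTER_NAME>/INSTANCES/` have already been listed with some ZooKeeper client (not shown). The function name `find_inconsistent_instances` and the second instance name in the example are hypothetical.

```python
def find_inconsistent_instances(config_children, instance_children):
    """Compare instance names found under the two ZooKeeper paths.

    config_children:   names listed under .../CONFIGS/PARTICIPANT (assumed input)
    instance_children: names listed under .../INSTANCES (assumed input)
    """
    configs = set(config_children)
    instances = set(instance_children)
    return {
        # Present under INSTANCES but missing from CONFIGS/PARTICIPANT:
        # the stuck state described in this issue.
        "missing_config": sorted(instances - configs),
        # Present under CONFIGS/PARTICIPANT but missing from INSTANCES:
        # the reverse mismatch, reported for completeness.
        "missing_instance_node": sorted(configs - instances),
    }


# Illustrative data only: the first name is made up, the second is the
# node from the stack trace above.
configs = ["dca1-prod04_streamgate_shadow_0"]
instances = [
    "dca1-prod04_streamgate_shadow_0",
    "dca1-prod05_streamgate_shadow_0",
]

report = find_inconsistent_instances(configs, instances)
print(report["missing_config"])
```

Any name reported under `missing_config` would be a candidate for the manual cleanup described above. The sketch only reports mismatches and does not modify anything in ZooKeeper.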
