First, make sure the node you want to remove from the cluster is not running, i.e. it no longer appears under /<CLUSTER_NAME>/LIVEINSTANCES/. Then you can simply remove the orphaned znodes under /<CLUSTER_NAME>/INSTANCES as well as under /<CLUSTER_NAME>/CONFIGS/PARTICIPANT. In general, ":" is not recommended in an instance id, and we internally replace it with "_". We will look into how to get rid of an instance that has ":" in its id.
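A minimal Java sketch of the cleanup rule above, under stated assumptions. `OrphanedInstancePaths` and its methods are hypothetical helpers, not part of Helix: they only compute which znodes to remove and refuse if the instance is still live. The actual deletion still has to be done with a ZooKeeper client. The orphaned znodes keep the raw id, ":" included, so the paths are built from that raw id and not from the "_" form, which fits the helix-admin failure Varun describes. Note that /<CLUSTER_NAME>/INSTANCES/<id> has children (MESSAGES, CURRENTSTATES, ...), so it needs a recursive delete, e.g. `rmr` in the ZooKeeper 3.4 CLI (`deleteall` in 3.5 and later).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not a Helix API: computes paths only, never touches ZooKeeper.
public final class OrphanedInstancePaths {

    private OrphanedInstancePaths() {}

    /** Mirrors the ":" -> "_" replacement described above; use it for NEW registrations. */
    public static String sanitizeInstanceId(String instanceId) {
        return instanceId.replace(':', '_');
    }

    /**
     * Returns the znodes to remove for an orphaned instance, using the raw id
     * exactly as it appears in ZooKeeper (":" included).
     * Throws if the instance is still listed under LIVEINSTANCES.
     */
    public static List<String> pathsToRemove(String cluster, String rawInstanceId,
                                             Set<String> liveInstances) {
        if (liveInstances.contains(rawInstanceId)) {
            throw new IllegalStateException(rawInstanceId
                    + " is still listed under /" + cluster + "/LIVEINSTANCES; stop it first");
        }
        return Arrays.asList(
                // Has child znodes, so it must be deleted recursively.
                "/" + cluster + "/INSTANCES/" + rawInstanceId,
                "/" + cluster + "/CONFIGS/PARTICIPANT/" + rawInstanceId);
    }
}
```

For example, `pathsToRemove("terrapin", "hdfsterrapin-a-datanode-531b2679:9090", live)` returns `/terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090` and `/terrapin/CONFIGS/PARTICIPANT/hdfsterrapin-a-datanode-531b2679:9090` when that id is not in `live`.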
Thanks,
Jason

From: Varun Sharma
Date: Tuesday, August 19, 2014 1:58 PM
Subject: Re: Error on participant while joining cluster

Can I simply remove the orphaned znodes under /<CLUSTER_NAME>/INSTANCES?

Varun

On Tue, Aug 19, 2014 at 1:54 PM, Varun Sharma wrote:

Another issue I have now is that I ended up registering the participants as <host>:<port>. This causes MBean-related exceptions, because MBean names do not allow colons. I don't know whether this is interfering with normal controller operation. I restarted the instances, replacing the ":" with a ",", but the old names are still stuck in the INSTANCES znode. How can I get rid of them? helix-admin seems to replace the ":" in the node name with an underscore ("_"), so it can't delete the node. This is still causing MBean-related exceptions in the log trace.

Varun

On Tue, Aug 19, 2014 at 12:18 PM, Zhen Zhang wrote:

Sure. Will add it.

From: kishore g
Date: Tuesday, August 19, 2014 12:14 PM
Subject: Re: Error on participant while joining cluster

Thanks, Jason. We need to add this to the documentation; I could not find how to enable auto-join in the docs. Should we add this to the admin interface documentation?
On Tue, Aug 19, 2014 at 12:06 PM, Zhen Zhang wrote:

Hi Varun,

You need to either add the participant to the cluster before starting it, or enable the participant auto-join config.

To add the participant to the cluster:

./helix-admin.sh --zkSvr <ZookeeperServerAddress, e.g. localhost:2181> --addNode <clusterName, e.g. terrapin> <instanceId, e.g. hdfsterrapin-a-datanode-531b2679_9090>

Or, to enable the auto-join config:

./helix-admin.sh --zkSvr <ZookeeperServerAddress> --setConfig CLUSTER <clusterName> allowParticipantAutoJoin=true

Thanks,
Jason

From: Varun Sharma
Date: Tuesday, August 19, 2014 11:47 AM
Subject: Error on participant while joining cluster

I am getting the following error while trying to join a cluster as a participant. The cluster is set up and a controller has already connected to it. Can someone help explain why this is happening?
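The two options in Jason's reply can be scripted from a Java launcher so the participant is registered before it calls `connect()`. `HelixAdminCommands` is a hypothetical helper: it only assembles the exact `helix-admin.sh` arguments shown in the reply and runs them with `ProcessBuilder`. It assumes `helix-admin.sh` sits in the working directory and does not use Helix's Java admin API. Rejecting ids that contain ":" is a design choice following the advice elsewhere in this thread, not something helix-admin itself enforces.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds the helix-admin.sh invocations quoted in the reply.
public final class HelixAdminCommands {

    private HelixAdminCommands() {}

    /** Option 1: register the participant before starting it. */
    public static List<String> addNode(String zkSvr, String cluster, String instanceId) {
        // ":" in instance ids causes MBean problems; prefer e.g. host_9090.
        if (instanceId.indexOf(':') >= 0) {
            throw new IllegalArgumentException("avoid ':' in instance id: " + instanceId);
        }
        return Arrays.asList("./helix-admin.sh", "--zkSvr", zkSvr,
                "--addNode", cluster, instanceId);
    }

    /** Option 2: allow participants to register themselves when they connect. */
    public static List<String> enableAutoJoin(String zkSvr, String cluster) {
        return Arrays.asList("./helix-admin.sh", "--zkSvr", zkSvr,
                "--setConfig", "CLUSTER", cluster, "allowParticipantAutoJoin=true");
    }

    /** Runs a command, forwarding its output to this process, and returns the exit code. */
    public static int run(List<String> command) throws IOException, InterruptedException {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```

A launcher would call, for instance, `run(addNode("localhost:2181", "terrapin", "hdfsterrapin-a-datanode-531b2679_9090"))` once, check for a zero exit code, and only then start the participant. Only one of the two options is needed.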
2014-08-19 18:41:36,843 [main] (ZKHelixManager.java:727) INFO Handling new session, session id: 147a7beb2dd63f4, instance: hdfsterrapin-a-datanode-531b2679:9090, instanceTye: PARTICIPANT, cluster: terrapin, zkconnection: State:CONNECTED Timeout:30000 sessionid:0x147a7beb2dd63f4 local:/10.65.145.80:43854 remoteserver:terrapinzk001a/10.115.59.31:2181 lastZxid:0 xid:1 sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:0
2014-08-19 18:41:36,843 [main] (ParticipantHealthReportTask.java:67) WARN ParticipantHealthReportTimerTask already stopped
2014-08-19 18:41:36,914 [main] (ParticipantManagerHelper.java:101) INFO instance: hdfsterrapin-a-datanode-531b2679:9090 auto-joining terrapin is false
2014-08-19 18:41:36,917 [main] (ZKUtil.java:95) INFO Invalid instance setup, missing znode path: /terrapin/CONFIGS/PARTICIPANT/hdfsterrapin-a-datanode-531b2679:9090
2014-08-19 18:41:36,918 [main] (ZKUtil.java:95) INFO Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/MESSAGES
2014-08-19 18:41:36,918 [main] (ZKUtil.java:95) INFO Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/CURRENTSTATES
2014-08-19 18:41:36,919 [main] (ZKUtil.java:95) INFO Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/STATUSUPDATES
2014-08-19 18:41:36,920 [main] (ZKUtil.java:95) INFO Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/ERRORS
2014-08-19 18:41:36,920 [main] (ZKHelixManager.java:496) ERROR fail to createClient.
org.apache.helix.HelixException: Initial cluster structure is not set up for instance: hdfsterrapin-a-datanode-531b2679:9090, instanceType: PARTICIPANT
    at org.apache.helix.manager.zk.ParticipantManagerHelper.joinCluster(ParticipantManagerHelper.java:108)
    at org.apache.helix.manager.zk.ZKHelixManager.handleNewSessionAsParticipant(ZKHelixManager.java:869)
    at org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:838)
    at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:493)
    at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:519)
    at com.pinterest.terrapin.server.TerrapinServerHandler.start(TerrapinServerHandler.java:84)
    at com.pinterest.terrapin.server.TerrapinServerMain.main(TerrapinServerMain.java:31)
2014-08-19 18:41:36,921 [main] (ZKHelixManager.java:522) ERROR fail to connect hdfsterrapin-a-datanode-531b2679:9090
org.apache.helix.HelixException: Initial cluster structure is not set up for instance: hdfsterrapin-a-datanode-531b2679:9090, instanceType: PARTICIPANT
    at org.apache.helix.manager.zk.ParticipantManagerHelper.joinCluster(ParticipantManagerHelper.java:108)
    at org.apache.helix.manager.zk.ZKHelixManager.handleNewSessionAsParticipant(ZKHelixManager.java:869)
    at org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:838)
    at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:493)
    at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:519)
    at com.pinterest.terrapin.server.TerrapinServerHandler.start(TerrapinServerHandler.java:84)
    at com.pinterest.terrapin.server.TerrapinServerMain.main(TerrapinServerMain.java:31)
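The log shows why the join fails: with auto-join disabled ("auto-joining terrapin is false"), the participant looks for its per-instance znodes, finds none of them, and throws "Initial cluster structure is not set up". A small diagnostic sketch that reproduces that check is below. `ParticipantZnodeCheck` is hypothetical; its path list is copied from the "missing znode path" lines in this particular log and may not be complete for other Helix versions. The existence test is passed in as a predicate so the sketch needs no ZooKeeper dependency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical diagnostic helper, not a Helix API. Paths are taken from the
// ZKUtil "Invalid instance setup, missing znode path" log lines above.
public final class ParticipantZnodeCheck {

    private ParticipantZnodeCheck() {}

    /** The znodes this log reported as missing for a participant. */
    public static List<String> requiredPaths(String cluster, String instanceId) {
        String instanceRoot = "/" + cluster + "/INSTANCES/" + instanceId;
        return Arrays.asList(
                "/" + cluster + "/CONFIGS/PARTICIPANT/" + instanceId,
                instanceRoot + "/MESSAGES",
                instanceRoot + "/CURRENTSTATES",
                instanceRoot + "/STATUSUPDATES",
                instanceRoot + "/ERRORS");
    }

    /** Returns the required paths for which exists.test(path) is false, in order. */
    public static List<String> missingPaths(String cluster, String instanceId,
                                            Predicate<String> exists) {
        List<String> missing = new ArrayList<>();
        for (String path : requiredPaths(cluster, instanceId)) {
            if (!exists.test(path)) {
                missing.add(path);
            }
        }
        return missing;
    }
}
```

Against a live ensemble, the predicate could wrap ZooKeeper's `exists(path, false) != null`, converting its checked `KeeperException` and `InterruptedException`. An empty result suggests the instance structure this log was looking for is in place.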
