I don't believe you will get into an inconsistent state if you interleave addResource and dropResource calls, so you should be fine.
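If you ever do need to rule out concurrent calls on the same resource, one option is to serialize them on the caller side, keyed by resource name. The following is only a minimal sketch: `PerResourceLock` and `withLock` are hypothetical names and not part of Helix, and it only protects callers within a single JVM.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical caller-side helper; not part of the Helix API. */
final class PerResourceLock {
    // One lock object per resource name. Entries are never removed in this
    // sketch, so it suits a bounded set of resource names.
    private final ConcurrentHashMap<String, Object> locks = new ConcurrentHashMap<>();

    /** Runs the action while holding the lock for the given resource name. */
    <T> T withLock(String resourceName, Supplier<T> action) {
        Object lock = locks.computeIfAbsent(resourceName, k -> new Object());
        synchronized (lock) {
            return action.get();
        }
    }
}
```

The addResource and dropResource calls for a given resource would then go inside the action passed to `withLock`, so that calls on different resources still run in parallel.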
Date: Tue, 26 Aug 2014 16:48:06 -0700
Subject: Re: Error on participant while joining cluster
From: [email protected]
To: [email protected]

I am doing an "addResource" and a "dropResource" in separate threads. It's highly unlikely for me to call these operations on the same resource concurrently.

Varun

On Tue, Aug 26, 2014 at 4:45 PM, Kanak Biscuitwala <[email protected]> wrote:

I would have to say, "it depends." There are operations that are idempotent (e.g. dropResource), atomic (e.g. setResourceIdealState), both, or neither (e.g. resetResource). Generally speaking, you should be OK for most operations, but there isn't any synchronization, so depending on which ZNodes are affected and how, there may be some thread safety issues. Are there specific operations you need to be thread-safe?

Date: Tue, 26 Aug 2014 16:37:50 -0700
Subject: Re: Error on participant while joining cluster
From: [email protected]
To: [email protected]

Thanks Kanak. Another question: is HelixAdmin thread safe?

Varun

On Tue, Aug 26, 2014 at 3:36 PM, Kanak Biscuitwala <[email protected]> wrote:

Hi Varun,

To answer your question on IRC, the resource's znode is deleted immediately on dropResource(), but Helix will still be able to send dropped messages after this happens because there is enough persisted information in the current state on each node.

Kanak

Date: Thu, 21 Aug 2014 12:56:21 -0700
Subject: Re: Error on participant while joining cluster
From: [email protected]
To: [email protected]

I don't see any issue at runtime. However, Helix has support for backing up the ZooKeeper nodes to a file system. I think "|" might cause problems while storing or restoring data onto ZooKeeper. I would use something that's compatible with the file system, such as "_" or "-".

On Thu, Aug 21, 2014 at 12:03 PM, Varun Sharma <[email protected]> wrote:

Is there any restriction on choosing resource names?
I was initially putting "/" in the name, but that does not work well since it ends up creating a znode with a slash. I found that if I replace the "/" with a "|", a znode can be created. Could there be any other issues inside Helix with using a "|" in the resource name?

Varun

On Tue, Aug 19, 2014 at 2:20 PM, Kanak Biscuitwala <[email protected]> wrote:

But of course since HelixAdmin seems to be bugging out, what Jason said is right :)

From: [email protected]
To: [email protected]
Subject: RE: Error on participant while joining cluster
Date: Tue, 19 Aug 2014 14:18:23 -0700

As Jason said, typically the naming convention is host_port, which Helix tools automatically parse as host and port. It is possible to use arbitrary instance IDs in theory though, so it might be worth filing as a bug.

As for removing instances, the typical flow is to shut the instance down (so that the live instance is gone), disable it, and then drop it using HelixAdmin.

From: [email protected]
To: [email protected]
Subject: Re: Error on participant while joining cluster
Date: Tue, 19 Aug 2014 21:05:46 +0000

First make sure that the node you want to remove from the cluster is not running (it should not appear under /<CLUSTER_NAME>/LIVEINSTANCES/). Then you can simply remove the orphaned znodes under /<CLUSTER_NAME>/INSTANCES as well as under /<CLUSTER_NAME>/CONFIGS/PARTICIPANT. Normally ":" is not recommended in the instance id, and we internally replace it with "_". We will check how to get rid of an instance with ":" in its id.

Thanks,
Jason

From: Varun Sharma <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, August 19, 2014 1:58 PM
To: "[email protected]" <[email protected]>
Subject: Re: Error on participant while joining cluster

Can I simply remove the orphaned znodes under the /<CLUSTER_NAME>/INSTANCES tag?
Varun

On Tue, Aug 19, 2014 at 1:54 PM, Varun Sharma <[email protected]> wrote:

Another issue I have now is that I ended up registering the participants as <host>:<port>. This causes exceptions related to MBean (because it does not like colon separators). I don't know if that is interfering with normal controller operation. I restarted the instances replacing the : with a , but those old names are still stuck in the INSTANCES znode. How can I get rid of these? helix-admin seems to be replacing the ":" in the node name with an underscore "_" and can't delete the node. This is still causing MBean-related exceptions in the log trace.

Varun

On Tue, Aug 19, 2014 at 12:18 PM, Zhen Zhang <[email protected]> wrote:

Sure. Will add it.

From: kishore g <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, August 19, 2014 12:14 PM
To: "[email protected]" <[email protected]>
Subject: Re: Error on participant while joining cluster

Thanks Jason. We need to add this to the documentation. I could not find the way to enable auto-join from the docs. Should we add this to the admin interface documentation?

On Tue, Aug 19, 2014 at 12:06 PM, Zhen Zhang <[email protected]> wrote:

Hi Varun,

You need to either add the participant to the cluster before starting it, or enable the participant auto-join config.

Add the participant to the cluster:

    ./helix-admin.sh --zkSvr <ZookeeperServerAddress, e.g. localhost:2181> --addNode <clusterName, e.g. terrapin> <instanceId, e.g. hdfsterrapin-a-datanode-531b2679_9090>

Or enable the auto-join config:

    ./helix-admin.sh --zkSvr <ZookeeperServerAddress> --setConfig CLUSTER <clusterName> allowParticipantAutoJoin=true

Thanks,
Jason

From: Varun Sharma <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, August 19, 2014 11:47 AM
To: "[email protected]" <[email protected]>
Subject: Error on participant while joining cluster

I am getting the following error while trying to join a cluster as a participant.
The cluster is set up and a controller has already connected to it. Can someone help out as to why this is happening?

2014-08-19 18:41:36,843 [main] (ZKHelixManager.java:727) INFO Handling new session, session id: 147a7beb2dd63f4, instance: hdfsterrapin-a-datanode-531b2679:9090, instanceTye: PARTICIPANT, cluster: terrapin, zkconnection: State:CONNECTED Timeout:30000 sessionid:0x147a7beb2dd63f4 local:/10.65.145.80:43854 remoteserver:terrapinzk001a/10.115.59.31:2181 lastZxid:0 xid:1 sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:0
2014-08-19 18:41:36,843 [main] (ParticipantHealthReportTask.java:67) WARN ParticipantHealthReportTimerTask already stopped
2014-08-19 18:41:36,914 [main] (ParticipantManagerHelper.java:101) INFO instance: hdfsterrapin-a-datanode-531b2679:9090 auto-joining terrapin is false
2014-08-19 18:41:36,917 [main] (ZKUtil.java:95) INFO Invalid instance setup, missing znode path: /terrapin/CONFIGS/PARTICIPANT/hdfsterrapin-a-datanode-531b2679:9090
2014-08-19 18:41:36,918 [main] (ZKUtil.java:95) INFO Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/MESSAGES
2014-08-19 18:41:36,918 [main] (ZKUtil.java:95) INFO Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/CURRENTSTATES
2014-08-19 18:41:36,919 [main] (ZKUtil.java:95) INFO Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/STATUSUPDATES
2014-08-19 18:41:36,920 [main] (ZKUtil.java:95) INFO Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/ERRORS
2014-08-19 18:41:36,920 [main] (ZKHelixManager.java:496) ERROR fail to createClient.
org.apache.helix.HelixException: Initial cluster structure is not set up for instance: hdfsterrapin-a-datanode-531b2679:9090, instanceType: PARTICIPANT
    at org.apache.helix.manager.zk.ParticipantManagerHelper.joinCluster(ParticipantManagerHelper.java:108)
    at org.apache.helix.manager.zk.ZKHelixManager.handleNewSessionAsParticipant(ZKHelixManager.java:869)
    at org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:838)
    at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:493)
    at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:519)
    at com.pinterest.terrapin.server.TerrapinServerHandler.start(TerrapinServerHandler.java:84)
    at com.pinterest.terrapin.server.TerrapinServerMain.main(TerrapinServerMain.java:31)
2014-08-19 18:41:36,921 [main] (ZKHelixManager.java:522) ERROR fail to connect hdfsterrapin-a-datanode-531b2679:9090
org.apache.helix.HelixException: Initial cluster structure is not set up for instance: hdfsterrapin-a-datanode-531b2679:9090, instanceType: PARTICIPANT
    at org.apache.helix.manager.zk.ParticipantManagerHelper.joinCluster(ParticipantManagerHelper.java:108)
    at org.apache.helix.manager.zk.ZKHelixManager.handleNewSessionAsParticipant(ZKHelixManager.java:869)
    at org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:838)
    at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:493)
    at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:519)
    at com.pinterest.terrapin.server.TerrapinServerHandler.start(TerrapinServerHandler.java:84)
    at com.pinterest.terrapin.server.TerrapinServerMain.main(TerrapinServerMain.java:31)
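
The instance name in the trace above uses the host:port form, while the replies in this thread describe host_port as the usual convention and suggest "_" or "-" instead of "/" or "|" in resource names. A minimal sketch of building names that way before registering them is shown here. `HelixNames`, `instanceId`, and `safeName` are hypothetical helpers, not part of Helix, and the set of replaced characters is an assumption drawn only from the characters discussed in this thread.

```java
/** Hypothetical naming helpers; not part of the Helix API. */
final class HelixNames {
    private HelixNames() {}

    /** Builds an instance ID in the host_port form described in the thread. */
    static String instanceId(String host, int port) {
        return host + "_" + port;
    }

    /**
     * Replaces '/', '|' and ':' with '_'. Note that distinct inputs such as
     * "a/b" and "a_b" map to the same output, so callers that need unique
     * names must handle collisions themselves.
     */
    static String safeName(String raw) {
        return raw.replaceAll("[/|:]", "_");
    }
}
```

For the participant in this thread, `HelixNames.instanceId("hdfsterrapin-a-datanode-531b2679", 9090)` gives the same ID used in the --addNode example.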
