I am calling addResource and dropResource from separate threads. It's highly unlikely that I would call these operations on the same resource concurrently.
Varun

On Tue, Aug 26, 2014 at 4:45 PM, Kanak Biscuitwala <[email protected]> wrote:

> I would have to say, "it depends." There are operations that are
> idempotent (e.g. dropResource), atomic (e.g. setResourceIdealState), both,
> or neither (e.g. resetResource). Generally speaking, you should be OK for
> most operations, but there isn't any synchronization, so depending on which
> ZNodes are affected and how, there may be some thread safety issues.
>
> Are there specific operations you need to be thread-safe?
>
> ------------------------------
> Date: Tue, 26 Aug 2014 16:37:50 -0700
> Subject: Re: Error on participant while joining cluster
> From: [email protected]
> To: [email protected]
>
> Thanks Kanak. Another question: is HelixAdmin thread safe?
>
> Varun
>
> On Tue, Aug 26, 2014 at 3:36 PM, Kanak Biscuitwala <[email protected]>
> wrote:
>
> Hi Varun,
>
> To answer your question on IRC, the resource's znode is deleted
> immediately on dropResource(), but Helix will still be able to send dropped
> messages after this happens because there is enough persisted information
> in the current state on each node.
>
> Kanak
>
> ------------------------------
> Date: Thu, 21 Aug 2014 12:56:21 -0700
> Subject: Re: Error on participant while joining cluster
> From: [email protected]
> To: [email protected]
>
> I don't see any issue at runtime. However, Helix has support for backing
> up the zookeeper nodes onto a file system. I think | might cause problems
> while storing or restoring data onto zookeeper. I would use something
> that's compatible with the file system, something like _ or probably -.
>
> On Thu, Aug 21, 2014 at 12:03 PM, Varun Sharma <[email protected]>
> wrote:
>
> Is there any restriction on choosing resource names? I was initially
> putting "/" in the name but that does not seem to work well since it ends
> up creating a znode with a slash. I found that if I replace a "/" with a
> "|", a znode can be created. Could there be any other issues inside helix
> with using a "|" in the resource name?
>
> Varun
>
> On Tue, Aug 19, 2014 at 2:20 PM, Kanak Biscuitwala <[email protected]>
> wrote:
>
> But of course since HelixAdmin seems to be bugging out, what Jason said is
> right :)
>
> ------------------------------
> From: [email protected]
> To: [email protected]
> Subject: RE: Error on participant while joining cluster
> Date: Tue, 19 Aug 2014 14:18:23 -0700
>
> As Jason said, typically the naming convention is host_port, which helix
> tools automatically parse as host and port. It is possible to use arbitrary
> instance IDs in theory though, so it might be worth filing as a bug.
>
> As for removing instances, the typical flow is to shut it down (so that
> the live instance is gone), disable it, and then drop it using HelixAdmin.
>
> ------------------------------
> From: [email protected]
> To: [email protected]
> Subject: Re: Error on participant while joining cluster
> Date: Tue, 19 Aug 2014 21:05:46 +0000
>
> First make sure under /<CLUSTER_NAME>/LIVEINSTANCES/, the node you want to
> remove from the cluster is not running. Then you can simply remove the
> orphaned znodes under /<CLUSTER_NAME>/INSTANCES as well as under
> /<CLUSTER_NAME>/CONFIGS/PARTICIPANT. Normally ":" is not recommended in the
> instance id, and we internally replace it with "_". We will check how to
> get rid of an instance with ":" in its id.
>
> Thanks,
> Jason
>
> From: Varun Sharma <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Tuesday, August 19, 2014 1:58 PM
> To: "[email protected]" <[email protected]>
> Subject: Re: Error on participant while joining cluster
>
> Can I simply remove the orphaned znodes under the
> /<CLUSTER_NAME>/INSTANCES tag?
>
> Varun
>
> On Tue, Aug 19, 2014 at 1:54 PM, Varun Sharma <[email protected]> wrote:
>
> Another issue I have now is that I ended up registering the participants
> as <host>:<port> - this causes exceptions related to MBean (because it
> does not like colon separators). I don't know if that is interfering with
> normal controller operation. I restarted the instances replacing the : with
> a , but those old names are still stuck in INSTANCES znode. How can I get
> rid of these - helix-admin seems to be replacing the ":" in the node name
> with an underscore "_" and can't delete the node.
>
> This is still causing MBean related exceptions in the log trace.
>
> Varun
>
> On Tue, Aug 19, 2014 at 12:18 PM, Zhen Zhang <[email protected]> wrote:
>
> Sure. Will add it.
>
> From: kishore g <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Tuesday, August 19, 2014 12:14 PM
> To: "[email protected]" <[email protected]>
> Subject: Re: Error on participant while joining cluster
>
> Thanks Jason. We need to add this to the documentation. I could not
> find the way to enable auto-join from the docs. Should we add this to admin
> interface documentation?
>
> On Tue, Aug 19, 2014 at 12:06 PM, Zhen Zhang <[email protected]> wrote:
>
> Hi Varun, you need to either add the participant to the cluster before
> starting it, or enable the participant auto-join config:
>
> add participant to cluster:
> ./helix-admin.sh --zkSvr <ZookeeperServerAddress, e.g. localhost:2181> --addNode <clusterName, e.g. terrapin> <instanceId, e.g. hdfsterrapin-a-datanode-531b2679_9090>
>
> or, enable auto-join config:
> ./helix-admin.sh --zkSvr <ZookeeperServerAddress> --setConfig CLUSTER <clusterName> allowParticipantAutoJoin=true
>
> Thanks,
> Jason
>
> From: Varun Sharma <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Tuesday, August 19, 2014 11:47 AM
> To: "[email protected]" <[email protected]>
> Subject: Error on participant while joining cluster
>
> I am getting the following error while trying to join a cluster as a
> participant. The cluster is set up and a controller has already connected
> to it. Can someone help out as to why this is happening?
>
> 2014-08-19 18:41:36,843 [main] (ZKHelixManager.java:727) INFO Handling new session, session id: 147a7beb2dd63f4, instance: hdfsterrapin-a-datanode-531b2679:9090, instanceTye: PARTICIPANT, cluster: terrapin, zkconnection: State:CONNECTED Timeout:30000 sessionid:0x147a7beb2dd63f4 local:/10.65.145.80:43854 remoteserver:terrapinzk001a/10.115.59.31:2181 lastZxid:0 xid:1 sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:0
> 2014-08-19 18:41:36,843 [main] (ParticipantHealthReportTask.java:67) WARN ParticipantHealthReportTimerTask already stopped
> 2014-08-19 18:41:36,914 [main] (ParticipantManagerHelper.java:101) INFO instance: hdfsterrapin-a-datanode-531b2679:9090 auto-joining terrapin is false
> *2014-08-19 18:41:36,917 [main] (ZKUtil.java:95) INFO Invalid instance setup, missing znode path: /terrapin/CONFIGS/PARTICIPANT/hdfsterrapin-a-datanode-531b2679:9090*
> *2014-08-19 18:41:36,918 [main] (ZKUtil.java:95) INFO Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/MESSAGES*
> *2014-08-19 18:41:36,918 [main] (ZKUtil.java:95) INFO Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/CURRENTSTATES*
> *2014-08-19 18:41:36,919 [main] (ZKUtil.java:95) INFO Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/STATUSUPDATES*
> *2014-08-19 18:41:36,920 [main] (ZKUtil.java:95) INFO Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/ERRORS*
> *2014-08-19 18:41:36,920 [main] (ZKHelixManager.java:496) ERROR fail to createClient.*
> *org.apache.helix.HelixException: Initial cluster structure is not set up for instance: hdfsterrapin-a-datanode-531b2679:9090, instanceType: PARTICIPANT*
>     at org.apache.helix.manager.zk.ParticipantManagerHelper.joinCluster(ParticipantManagerHelper.java:108)
>     at org.apache.helix.manager.zk.ZKHelixManager.handleNewSessionAsParticipant(ZKHelixManager.java:869)
>     at org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:838)
>     at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:493)
>     at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:519)
>     at com.pinterest.terrapin.server.TerrapinServerHandler.start(TerrapinServerHandler.java:84)
>     at com.pinterest.terrapin.server.TerrapinServerMain.main(TerrapinServerMain.java:31)
> 2014-08-19 18:41:36,921 [main] (ZKHelixManager.java:522) ERROR fail to connect hdfsterrapin-a-datanode-531b2679:9090
> org.apache.helix.HelixException: Initial cluster structure is not set up for instance: hdfsterrapin-a-datanode-531b2679:9090, instanceType: PARTICIPANT
>     at org.apache.helix.manager.zk.ParticipantManagerHelper.joinCluster(ParticipantManagerHelper.java:108)
>     at org.apache.helix.manager.zk.ZKHelixManager.handleNewSessionAsParticipant(ZKHelixManager.java:869)
>     at org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:838)
>     at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:493)
>     at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:519)
>     at com.pinterest.terrapin.server.TerrapinServerHandler.start(TerrapinServerHandler.java:84)
>     at com.pinterest.terrapin.server.TerrapinServerMain.main(TerrapinServerMain.java:31)
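
The naming advice in this thread (avoid "/", "|" and ":" in resource names and instance IDs, and prefer "_" or "-") could be applied before a name ever reaches Helix. Below is a minimal Java sketch under that assumption. HelixNames and sanitize are hypothetical names, not part of the Helix API, and the set of replaced characters covers only the ones mentioned in this thread, not any official Helix rule.

```java
/**
 * Hypothetical helper (not part of Helix): rewrites characters that this
 * thread reports as problematic in resource names and instance IDs.
 */
final class HelixNames {
    private HelixNames() {
        // Static utility class, not meant to be instantiated.
    }

    /**
     * Returns a copy of name with '/', '|' and ':' each replaced by '_'.
     * The character set is an assumption taken from the thread only.
     */
    static String sanitize(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("name must be non-empty");
        }
        StringBuilder out = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            // '/' is the znode path separator, '|' may not survive a
            // file-system backup, ':' upsets MBean registration.
            boolean problematic = (c == '/' || c == '|' || c == ':');
            out.append(problematic ? '_' : c);
        }
        return out.toString();
    }
}
```

For example, HelixNames.sanitize("hdfsterrapin-a-datanode-531b2679:9090") gives the host_port form used in the --addNode example. Distinct inputs such as "a/b" and "a|b" map to the same output, so callers that need unique names would have to check for collisions themselves.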
