Another quick question: if I open the external view from inside a controller using HelixAdmin.getResourceExternalView, is that a ZK call, or is the external view cached in local memory? If the former, is it better to establish a spectator connection so we get notified of changes instead of having to poll every time? (I am polling the external view for all resources every few minutes, which is why I am asking.)
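For reference, a minimal sketch of the push-based alternative mentioned here: a SPECTATOR connection that registers an ExternalViewChangeListener, so external view changes arrive as callbacks instead of being fetched on a timer. This assumes the Helix 0.6.x Java API; the ZooKeeper address and the spectator instance name are placeholders.

    import java.util.List;
    import org.apache.helix.ExternalViewChangeListener;
    import org.apache.helix.HelixManager;
    import org.apache.helix.HelixManagerFactory;
    import org.apache.helix.InstanceType;
    import org.apache.helix.NotificationContext;
    import org.apache.helix.model.ExternalView;

    public class ExternalViewWatcher {
      public static void main(String[] args) throws Exception {
        String zkAddr = "localhost:2181";   // placeholder ZK address
        String clusterName = "terrapin";    // cluster name from this thread

        // Connect as a SPECTATOR: read-only, receives cluster change callbacks.
        HelixManager manager = HelixManagerFactory.getZKHelixManager(
            clusterName, "externalViewWatcher", InstanceType.SPECTATOR, zkAddr);
        manager.connect();

        // Invoked once on registration and then on every external view change,
        // removing the need to poll getResourceExternalView periodically.
        manager.addExternalViewChangeListener(new ExternalViewChangeListener() {
          @Override
          public void onExternalViewChange(List<ExternalView> externalViewList,
                                           NotificationContext changeContext) {
            for (ExternalView ev : externalViewList) {
              System.out.println("External view changed for resource: " + ev.getResourceName());
            }
          }
        });
        // ... keep the process alive; call manager.disconnect() on shutdown.
      }
    }

The same manager can also register ideal state or live instance listeners if those are being polled elsewhere.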
On Tue, Aug 26, 2014 at 5:02 PM, kishore g <[email protected]> wrote:

I think they are thread safe because ZKHelixAdmin is stateless. I think the right question is "are the operations atomic?" Most HelixAdmin operations change znodes in ZooKeeper. By default, none of the operations are atomic. However, HelixAdmin under the hood uses HelixDataAccessor, which supports atomic operations.

If you really want these operations to be atomic, you can use HelixDataAccessor and BaseDataAccessor. These are low-level APIs, and if you really need atomicity, we should probably file a JIRA and provide the high-level APIs in HelixAdmin.

On Tue, Aug 26, 2014 at 4:48 PM, Varun Sharma <[email protected]> wrote:

I am doing an "addResource" and a "dropResource" in separate threads. It is highly unlikely for me to call these operations on the same resource concurrently.

Varun

On Tue, Aug 26, 2014 at 4:45 PM, Kanak Biscuitwala <[email protected]> wrote:

I would have to say, "it depends." There are operations that are idempotent (e.g. dropResource), atomic (e.g. setResourceIdealState), both, or neither (e.g. resetResource). Generally speaking, you should be OK for most operations, but there isn't any synchronization, so depending on which znodes are affected and how, there may be some thread-safety issues.

Are there specific operations you need to be thread-safe?

------------------------------
Date: Tue, 26 Aug 2014 16:37:50 -0700
Subject: Re: Error on participant while joining cluster
From: [email protected]
To: [email protected]

Thanks Kanak. Another question: is HelixAdmin thread safe?

Varun

On Tue, Aug 26, 2014 at 3:36 PM, Kanak Biscuitwala <[email protected]> wrote:

Hi Varun,

To answer your question on IRC, the resource's znode is deleted immediately on dropResource(), but Helix will still be able to send dropped messages after this happens because there is enough persisted information in the current state on each node.

Kanak

------------------------------
Date: Thu, 21 Aug 2014 12:56:21 -0700
Subject: Re: Error on participant while joining cluster
From: [email protected]
To: [email protected]

I don't see any issue at runtime. However, Helix has support for backing up the ZooKeeper nodes onto a file system, and I think "|" might cause problems while storing or restoring that data. I would use something that is file-system compatible, like "_" or perhaps "-".

On Thu, Aug 21, 2014 at 12:03 PM, Varun Sharma <[email protected]> wrote:

Are there any restrictions on choosing resource names? I was initially putting "/" in the name, but that does not work well since it ends up creating a znode with a slash. I found that if I replace "/" with "|", the znode can be created. Could there be any other issues inside Helix with using "|" in a resource name?

Varun
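Tying back to kishore's note above about HelixDataAccessor and BaseDataAccessor: a rough sketch of what an atomic (versioned read-modify-write) update could look like with those lower-level APIs. This assumes Helix 0.6.x; the resource name and the simple field being written are purely illustrative.

    import org.I0Itec.zkclient.DataUpdater;
    import org.apache.helix.AccessOption;
    import org.apache.helix.BaseDataAccessor;
    import org.apache.helix.HelixDataAccessor;
    import org.apache.helix.HelixManager;
    import org.apache.helix.PropertyKey;
    import org.apache.helix.ZNRecord;

    public class AtomicIdealStateTouch {
      // manager is an already-connected HelixManager of any type.
      static void touchIdealState(HelixManager manager, String resource) {
        HelixDataAccessor accessor = manager.getHelixDataAccessor();
        PropertyKey key = accessor.keyBuilder().idealStates(resource);
        BaseDataAccessor<ZNRecord> base = accessor.getBaseDataAccessor();

        // update() performs a versioned read-modify-write on the znode and retries
        // on version conflict, so concurrent writers do not silently overwrite each other.
        base.update(key.getPath(), new DataUpdater<ZNRecord>() {
          @Override
          public ZNRecord update(ZNRecord current) {
            // "current" is the latest copy read from ZooKeeper; this sketch assumes
            // the resource's ideal state znode already exists.
            current.setSimpleField("LAST_TOUCHED", String.valueOf(System.currentTimeMillis()));
            return current;
          }
        }, AccessOption.PERSISTENT);
      }
    }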
On Tue, Aug 19, 2014 at 2:20 PM, Kanak Biscuitwala <[email protected]> wrote:

But of course, since HelixAdmin seems to be bugging out, what Jason said is right :)

------------------------------
From: [email protected]
To: [email protected]
Subject: RE: Error on participant while joining cluster
Date: Tue, 19 Aug 2014 14:18:23 -0700

As Jason said, typically the naming convention is host_port, which Helix tools automatically parse as host and port. It is possible to use arbitrary instance IDs in theory though, so it might be worth filing as a bug.

As for removing instances, the typical flow is to shut the instance down (so that the live instance is gone), disable it, and then drop it using HelixAdmin.
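A rough sketch of that removal flow with the Java HelixAdmin API rather than the CLI, assuming Helix 0.6.x and that the participant process has already been stopped (so its ephemeral LIVEINSTANCES znode is gone); the ZooKeeper address and instance name are placeholders.

    import org.apache.helix.HelixAdmin;
    import org.apache.helix.manager.zk.ZKHelixAdmin;
    import org.apache.helix.model.InstanceConfig;

    public class DropDeadInstance {
      public static void main(String[] args) {
        String zkAddr = "localhost:2181";   // placeholder
        String clusterName = "terrapin";
        String instanceName = "hdfsterrapin-a-datanode-531b2679_9090";

        HelixAdmin admin = new ZKHelixAdmin(zkAddr);

        // 1. The participant process must already be shut down, so its ephemeral
        //    node under /<cluster>/LIVEINSTANCES is gone.
        // 2. Disable the instance so the controller stops assigning replicas to it.
        admin.enableInstance(clusterName, instanceName, false);

        // 3. Drop it, which removes its INSTANCES and CONFIGS/PARTICIPANT znodes.
        InstanceConfig config = admin.getInstanceConfig(clusterName, instanceName);
        admin.dropInstance(clusterName, config);
      }
    }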
------------------------------
From: [email protected]
To: [email protected]
Subject: Re: Error on participant while joining cluster
Date: Tue, 19 Aug 2014 21:05:46 +0000

First make sure that the node you want to remove from the cluster is not running under /<CLUSTER_NAME>/LIVEINSTANCES/. Then you can simply remove the orphaned znodes under /<CLUSTER_NAME>/INSTANCES as well as under /<CLUSTER_NAME>/CONFIGS/PARTICIPANT. Normally ":" is not recommended in the instance id, and we internally replace it with "_". We will check how to get rid of an instance with ":" in its id.

Thanks,
Jason

From: Varun Sharma <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, August 19, 2014 1:58 PM
To: "[email protected]" <[email protected]>
Subject: Re: Error on participant while joining cluster

Can I simply remove the orphaned znodes under /<CLUSTER_NAME>/INSTANCES?

Varun

On Tue, Aug 19, 2014 at 1:54 PM, Varun Sharma <[email protected]> wrote:

Another issue I have now is that I ended up registering the participants as <host>:<port>. This causes MBean-related exceptions (because it does not like colon separators), and I don't know if that is interfering with normal controller operation. I restarted the instances with the ":" replaced, but those old names are still stuck in the INSTANCES znode. How can I get rid of these? helix-admin seems to be replacing the ":" in the node name with an underscore "_" and can't delete the node.

This is still causing MBean-related exceptions in the log trace.

Varun

On Tue, Aug 19, 2014 at 12:18 PM, Zhen Zhang <[email protected]> wrote:

Sure, will add it.

From: kishore g <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, August 19, 2014 12:14 PM
To: "[email protected]" <[email protected]>
Subject: Re: Error on participant while joining cluster

Thanks Jason. We need to add this to the documentation; I could not find how to enable auto-join in the docs. Should we add this to the admin interface documentation?

On Tue, Aug 19, 2014 at 12:06 PM, Zhen Zhang <[email protected]> wrote:

Hi Varun, you need to either add the participant to the cluster before starting it, or enable the participant auto-join config.

Add the participant to the cluster:

    ./helix-admin.sh --zkSvr <ZookeeperServerAddress, e.g. localhost:2181> --addNode <clusterName, e.g. terrapin> <instanceId, e.g. hdfsterrapin-a-datanode-531b2679_9090>

Or, enable the auto-join config:

    ./helix-admin.sh --zkSvr <ZookeeperServerAddress> --setConfig CLUSTER <clusterName> allowParticipantAutoJoin=true

Thanks,
Jason
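The same two options via the Java admin API, as a rough sketch (Helix 0.6.x assumed; the host, port, and instance id below mirror the examples in the helix-admin.sh commands above and are placeholders).

    import java.util.Collections;
    import org.apache.helix.HelixAdmin;
    import org.apache.helix.manager.zk.ZKHelixAdmin;
    import org.apache.helix.model.HelixConfigScope;
    import org.apache.helix.model.HelixConfigScope.ConfigScopeProperty;
    import org.apache.helix.model.InstanceConfig;
    import org.apache.helix.model.builder.HelixConfigScopeBuilder;

    public class RegisterParticipant {
      public static void main(String[] args) {
        String zkAddr = "localhost:2181";   // placeholder
        String clusterName = "terrapin";

        HelixAdmin admin = new ZKHelixAdmin(zkAddr);

        // Option 1: pre-register the participant before starting it.
        InstanceConfig config = new InstanceConfig("hdfsterrapin-a-datanode-531b2679_9090");
        config.setHostName("hdfsterrapin-a-datanode-531b2679");
        config.setPort("9090");
        config.setInstanceEnabled(true);
        admin.addInstance(clusterName, config);

        // Option 2: allow auto-join by setting the cluster-level config key
        // shown in the helix-admin.sh command above.
        HelixConfigScope scope = new HelixConfigScopeBuilder(ConfigScopeProperty.CLUSTER)
            .forCluster(clusterName).build();
        admin.setConfig(scope, Collections.singletonMap("allowParticipantAutoJoin", "true"));
      }
    }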
From: Varun Sharma <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, August 19, 2014 11:47 AM
To: "[email protected]" <[email protected]>
Subject: Error on participant while joining cluster

I am getting the following error while trying to join a cluster as a participant. The cluster is set up and a controller has already connected to it. Can someone help out as to why this is happening?

2014-08-19 18:41:36,843 [main] (ZKHelixManager.java:727) INFO Handling new session, session id: 147a7beb2dd63f4, instance: hdfsterrapin-a-datanode-531b2679:9090, instanceTye: PARTICIPANT, cluster: terrapin, zkconnection: State:CONNECTED Timeout:30000 sessionid:0x147a7beb2dd63f4 local:/10.65.145.80:43854 remoteserver:terrapinzk001a/10.115.59.31:2181 lastZxid:0 xid:1 sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:0
2014-08-19 18:41:36,843 [main] (ParticipantHealthReportTask.java:67) WARN ParticipantHealthReportTimerTask already stopped
2014-08-19 18:41:36,914 [main] (ParticipantManagerHelper.java:101) INFO instance: hdfsterrapin-a-datanode-531b2679:9090 auto-joining terrapin is false
2014-08-19 18:41:36,917 [main] (ZKUtil.java:95) INFO Invalid instance setup, missing znode path: /terrapin/CONFIGS/PARTICIPANT/hdfsterrapin-a-datanode-531b2679:9090
2014-08-19 18:41:36,918 [main] (ZKUtil.java:95) INFO Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/MESSAGES
2014-08-19 18:41:36,918 [main] (ZKUtil.java:95) INFO Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/CURRENTSTATES
2014-08-19 18:41:36,919 [main] (ZKUtil.java:95) INFO Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/STATUSUPDATES
2014-08-19 18:41:36,920 [main] (ZKUtil.java:95) INFO Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/ERRORS
2014-08-19 18:41:36,920 [main] (ZKHelixManager.java:496) ERROR fail to createClient.
org.apache.helix.HelixException: Initial cluster structure is not set up for instance: hdfsterrapin-a-datanode-531b2679:9090, instanceType: PARTICIPANT
        at org.apache.helix.manager.zk.ParticipantManagerHelper.joinCluster(ParticipantManagerHelper.java:108)
        at org.apache.helix.manager.zk.ZKHelixManager.handleNewSessionAsParticipant(ZKHelixManager.java:869)
        at org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:838)
        at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:493)
        at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:519)
        at com.pinterest.terrapin.server.TerrapinServerHandler.start(TerrapinServerHandler.java:84)
        at com.pinterest.terrapin.server.TerrapinServerMain.main(TerrapinServerMain.java:31)
2014-08-19 18:41:36,921 [main] (ZKHelixManager.java:522) ERROR fail to connect hdfsterrapin-a-datanode-531b2679:9090
org.apache.helix.HelixException: Initial cluster structure is not set up for instance: hdfsterrapin-a-datanode-531b2679:9090, instanceType: PARTICIPANT
        at org.apache.helix.manager.zk.ParticipantManagerHelper.joinCluster(ParticipantManagerHelper.java:108)
        at org.apache.helix.manager.zk.ZKHelixManager.handleNewSessionAsParticipant(ZKHelixManager.java:869)
        at org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:838)
        at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:493)
        at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:519)
        at com.pinterest.terrapin.server.TerrapinServerHandler.start(TerrapinServerHandler.java:84)
        at com.pinterest.terrapin.server.TerrapinServerMain.main(TerrapinServerMain.java:31)
