RE: Error on participant while joining cluster

Kanak Biscuitwala Tue, 26 Aug 2014 16:53:28 -0700

I don't believe you will get into an inconsistent state if you interleave 
addResource and dropResource calls, so you should be fine.

Date: Tue, 26 Aug 2014 16:48:06 -0700
Subject: Re: Error on participant while joining cluster
From: [email protected]
To: [email protected]

I am calling "addResource" and "dropResource" in separate threads. It's highly
unlikely that I would call these operations on the same resource concurrently.
Varun

On Tue, Aug 26, 2014 at 4:45 PM, Kanak Biscuitwala <[email protected]> wrote:

I would have to say, "it depends." There are operations that are idempotent 
(e.g. dropResource), atomic (e.g. setResourceIdealState), both, or neither 
(e.g. resetResource). Generally speaking, you should be OK for most operations, 
but there isn't any synchronization, so depending on which ZNodes are affected 
and how, there may be some thread safety issues.

Are there specific operations you need to be thread-safe?
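
For callers that cannot rule out concurrent calls on the same resource, one
option is to serialize those calls on the client side. The following is a
minimal sketch, not part of Helix: PerResourceSerializer and withResource are
hypothetical names, and the lock only protects callers inside a single JVM.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

/** Hypothetical helper: serializes operations that target the same resource name. */
final class PerResourceSerializer {
    // One lock object per resource name; entries are never removed in this sketch.
    private final ConcurrentMap<String, Object> locks = new ConcurrentHashMap<>();

    /** Runs op while holding the lock for resourceName and returns its result. */
    <T> T withResource(String resourceName, Supplier<T> op) {
        Object lock = locks.computeIfAbsent(resourceName, k -> new Object());
        synchronized (lock) {
            return op.get();
        }
    }
}
```

A caller would wrap each admin call, for example
serializer.withResource(name, () -> { admin.dropResource(cluster, name); return null; }),
where admin is the caller's own HelixAdmin instance. Operations on different
resource names still run in parallel.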

Date: Tue, 26 Aug 2014 16:37:50 -0700
Subject: Re: Error on participant while joining cluster
From: [email protected]

To: [email protected]

Thanks Kanak. Another question: is HelixAdmin thread-safe?
Varun

On Tue, Aug 26, 2014 at 3:36 PM, Kanak Biscuitwala <[email protected]> wrote:

Hi Varun,

To answer your question on IRC, the resource's znode is deleted immediately on 
dropResource(), but Helix will still be able to send dropped messages after 
this happens because there is enough persisted information in the current state 
on each node.

Kanak

Date: Thu, 21 Aug 2014 12:56:21 -0700
Subject: Re: Error on participant while joining cluster
From: [email protected]

To: [email protected]

I don't see any issue at runtime. However, Helix has support for backing up the
ZooKeeper nodes to a file system. I think "|" might cause problems while
storing or restoring data to ZooKeeper. I would use something that's
compatible with the file system, such as "_" or "-".
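
The advice above can be sketched as a small name sanitizer. This is only an
illustration: ResourceNames and sanitize are hypothetical names, and the
allow-list of letters, digits, "_" and "-" is an assumption made for this
sketch, not a rule enforced by Helix.

```java
/** Hypothetical helper for deriving file-system-friendly resource names. */
final class ResourceNames {
    private ResourceNames() {}

    /**
     * Replaces every character outside [A-Za-z0-9_-] with '_'.
     * The allow-list is an assumption for this sketch, not a Helix requirement.
     */
    static String sanitize(String rawName) {
        return rawName.replaceAll("[^A-Za-z0-9_-]", "_");
    }
}
```

Note that distinct raw names can collide after sanitizing (for example "a/b"
and "a|b" both become "a_b"), so the caller needs to make sure the original
names stay unique under this mapping.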

On Thu, Aug 21, 2014 at 12:03 PM, Varun Sharma <[email protected]> wrote:

Are there any restrictions on choosing resource names? I was initially putting
"/" in the name, but that does not work well since it ends up creating a znode
with a slash. I found that if I replace the "/" with a "|", the znode can be
created. Could there be any other issues inside Helix with using a "|" in the
resource name?

Varun

On Tue, Aug 19, 2014 at 2:20 PM, Kanak Biscuitwala <[email protected]> wrote:

But of course since HelixAdmin seems to be bugging out, what Jason said is 
right :)

From: [email protected]

To: [email protected]
Subject: RE: Error on participant while joining cluster
Date: Tue, 19 Aug 2014 14:18:23 -0700

As Jason said, the typical naming convention is host_port, which Helix tools
automatically parse as host and port. In theory it is possible to use arbitrary
instance IDs, though, so it might be worth filing this as a bug.
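
As an illustration of the host_port convention, here is a minimal sketch for
building such an ID on the participant side. InstanceIds and its methods are
hypothetical helper names, not Helix classes.

```java
/** Hypothetical helper for building instance IDs in the host_port convention. */
final class InstanceIds {
    private InstanceIds() {}

    /** Joins host and port with an underscore, e.g. "myhost_9090". */
    static String of(String host, int port) {
        return host + "_" + port;
    }

    /** Rewrites a "host:port" style ID to "host_port". */
    static String fromHostColonPort(String id) {
        return id.replace(':', '_');
    }
}
```

With the host from this thread, InstanceIds.of("hdfsterrapin-a-datanode-531b2679", 9090)
yields hdfsterrapin-a-datanode-531b2679_9090.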

As for removing instances, the typical flow is to shut it down (so that the 
live instance is gone), disable it, and then drop it using HelixAdmin.
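
The ordering of that flow (stop, disable, drop) can be sketched as follows.
InstanceOps is a hypothetical stand-in interface written for this example, not
the HelixAdmin API; a real implementation would delegate to whatever admin
calls the cluster uses.

```java
/** Hypothetical stand-in for the admin operations; NOT the Helix API. */
interface InstanceOps {
    boolean isLive(String instanceId);
    void disable(String instanceId);
    void drop(String instanceId);
}

/** Hypothetical helper that enforces the removal order described above. */
final class InstanceRemoval {
    private InstanceRemoval() {}

    /** Refuses to proceed while the instance is still live, then disables and drops it. */
    static void remove(InstanceOps ops, String instanceId) {
        if (ops.isLive(instanceId)) {
            throw new IllegalStateException(
                "shut down " + instanceId + " first; it is still live");
        }
        ops.disable(instanceId);
        ops.drop(instanceId);
    }
}
```

Failing fast on a live instance keeps the drop step from running against a
participant that is still connected.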

From: [email protected]

To: [email protected]
Subject: Re: Error on participant while joining cluster
Date: Tue, 19 Aug 2014 21:05:46 +0000

First, check under /<CLUSTER_NAME>/LIVEINSTANCES/ that the node you want to
remove from the cluster is not running. Then you can simply remove the orphaned
znodes under /<CLUSTER_NAME>/INSTANCES as well as under
/<CLUSTER_NAME>/CONFIGS/PARTICIPANT. Normally ":" is not recommended in the
instance id, and we internally replace it with "_". We will check how to get
rid of an instance with ":" in its id.

Thanks,
Jason
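
The znode locations named in the message above can be written out for a given
cluster and instance ID as in the sketch below. InstanceZnodes is a
hypothetical helper; it only builds path strings and does not talk to
ZooKeeper.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that builds the znode paths mentioned above. */
final class InstanceZnodes {
    private InstanceZnodes() {}

    /** Path to check before cleanup; the instance should have no znode here. */
    static String liveInstance(String cluster, String instanceId) {
        return "/" + cluster + "/LIVEINSTANCES/" + instanceId;
    }

    /** Paths of the orphaned znodes to remove once the instance is not live. */
    static List<String> persistentRoots(String cluster, String instanceId) {
        return Arrays.asList(
            "/" + cluster + "/INSTANCES/" + instanceId,
            "/" + cluster + "/CONFIGS/PARTICIPANT/" + instanceId);
    }
}
```

For the cluster in this thread, persistentRoots("terrapin", id) gives the
INSTANCES and CONFIGS/PARTICIPANT entries for that ID.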

From: Varun Sharma <[email protected]>

Reply-To: "[email protected]" <[email protected]>

Date: Tuesday, August 19, 2014 1:58 PM

To: "[email protected]" <[email protected]>

Subject: Re: Error on participant while joining cluster

Can I simply remove the orphaned znodes under /<CLUSTER_NAME>/INSTANCES?

Varun

On Tue, Aug 19, 2014 at 1:54 PM, Varun Sharma 
<[email protected]> wrote:

Another issue I have now is that I ended up registering the participants as
<host>:<port>. This causes exceptions related to MBean (because it does not
like colon separators). I don't know if that is interfering with normal
controller operation.
I restarted the instances, replacing the ":" with a ",", but the old names are
still stuck in the INSTANCES znode. How can I get rid of these? helix-admin
seems to be replacing the ":" in the node name with an underscore "_" and so
can't delete the node.

This is still causing MBean related exceptions in the log trace.

Varun

On Tue, Aug 19, 2014 at 12:18 PM, Zhen Zhang 
<[email protected]> wrote:

sure. Will add it.

From: kishore g <[email protected]>

Reply-To: "[email protected]" <[email protected]>

Date: Tuesday, August 19, 2014 12:14 PM

To: "[email protected]" <[email protected]>

Subject: Re: Error on participant while joining cluster

Thanks Jason. We need to add this to the documentation. I could not find the 
way to enable auto-join from the docs. Should we add this to admin interface 
documentation?

On Tue, Aug 19, 2014 at 12:06 PM, Zhen Zhang 
<[email protected]> wrote:

Hi Varun, you need to either add the participant to the cluster before starting
it, or enable the participant auto-join config:

add participant to cluster:

./helix-admin.sh --zkSvr <ZookeeperServerAddress, e.g. localhost:2181> 
--addNode <clusterName, e.g. terrapin> <instanceId, e.g. 
hdfsterrapin-a-datanode-531b2679_9090>

or, enable auto-join config:
./helix-admin.sh --zkSvr <ZookeeperServerAddress> --setConfig CLUSTER 
<clusterName> allowParticipantAutoJoin=true

Thanks,
Jason

From: Varun Sharma <[email protected]>

Reply-To: "[email protected]" <[email protected]>

Date: Tuesday, August 19, 2014 11:47 AM

To: "[email protected]" <[email protected]>

Subject: Error on participant while joining cluster

I am getting the following error while trying to join a cluster as a
participant. The cluster is set up and a controller has already connected to it.
Can someone explain why this is happening?

2014-08-19 18:41:36,843 [main] (ZKHelixManager.java:727) INFO  Handling new 
session, session id: 147a7beb2dd63f4, instance: 
hdfsterrapin-a-datanode-531b2679:9090, instanceTye: PARTICIPANT, cluster: 
terrapin, zkconnection: State:CONNECTED Timeout:30000 
sessionid:0x147a7beb2dd63f4
 local:/10.65.145.80:43854 remoteserver:terrapinzk001a/10.115.59.31:2181 
lastZxid:0 xid:1 sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:0

2014-08-19 18:41:36,843 [main] (ParticipantHealthReportTask.java:67) WARN  
ParticipantHealthReportTimerTask already stopped

2014-08-19 18:41:36,914 [main] (ParticipantManagerHelper.java:101) INFO  
instance: hdfsterrapin-a-datanode-531b2679:9090 auto-joining terrapin is false

2014-08-19 18:41:36,917 [main] (ZKUtil.java:95) INFO  Invalid instance setup, 
missing znode path: 
/terrapin/CONFIGS/PARTICIPANT/hdfsterrapin-a-datanode-531b2679:9090

2014-08-19 18:41:36,918 [main] (ZKUtil.java:95) INFO  Invalid instance setup, 
missing znode path: 
/terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/MESSAGES

2014-08-19 18:41:36,918 [main] (ZKUtil.java:95) INFO  Invalid instance setup, 
missing znode path: 
/terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/CURRENTSTATES

2014-08-19 18:41:36,919 [main] (ZKUtil.java:95) INFO  Invalid instance setup, 
missing znode path: 
/terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/STATUSUPDATES

2014-08-19 18:41:36,920 [main] (ZKUtil.java:95) INFO  Invalid instance setup, 
missing znode path: 
/terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/ERRORS

2014-08-19 18:41:36,920 [main] (ZKHelixManager.java:496) ERROR fail to 
createClient.

org.apache.helix.HelixException: Initial cluster structure is not set up for 
instance: hdfsterrapin-a-datanode-531b2679:9090, instanceType: PARTICIPANT

at 
org.apache.helix.manager.zk.ParticipantManagerHelper.joinCluster(ParticipantManagerHelper.java:108)

at 
org.apache.helix.manager.zk.ZKHelixManager.handleNewSessionAsParticipant(ZKHelixManager.java:869)

at 
org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:838)

at 
org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:493)

at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:519)

at 
com.pinterest.terrapin.server.TerrapinServerHandler.start(TerrapinServerHandler.java:84)

at 
com.pinterest.terrapin.server.TerrapinServerMain.main(TerrapinServerMain.java:31)

2014-08-19 18:41:36,921 [main] (ZKHelixManager.java:522) ERROR fail to connect 
hdfsterrapin-a-datanode-531b2679:9090

org.apache.helix.HelixException: Initial cluster structure is not set up for 
instance: hdfsterrapin-a-datanode-531b2679:9090, instanceType: PARTICIPANT

at 
org.apache.helix.manager.zk.ParticipantManagerHelper.joinCluster(ParticipantManagerHelper.java:108)

at 
org.apache.helix.manager.zk.ZKHelixManager.handleNewSessionAsParticipant(ZKHelixManager.java:869)

at 
org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:838)

at 
org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:493)

at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:519)

at 
com.pinterest.terrapin.server.TerrapinServerHandler.start(TerrapinServerHandler.java:84)

at 
com.pinterest.terrapin.server.TerrapinServerMain.main(TerrapinServerMain.java:31)
