Thanks Kishore. Option 1 and Option 3 are plausible. Option 2 is not feasible: even though the cluster name is the same, the instance name is different (it is usually a random value).
With Option 1, what should I be looking for in the External View? Should I be checking that all the resources have been transitioned off the instance? With Option 3, when a cluster is redeployed the controller moves around (because of leader election) from old nodes to old nodes, so I wonder if the controller will miss any messages for dead nodes. Or I can simply have a reaper that comes up and deletes all messages destined for instances that are not present in /LIVEINSTANCES. How should I deal with <cluster_id>INSTANCES/INSTANCES/CURRENTSTATES? It has stale current states (session ids that are no longer valid).

On Mon, Nov 28, 2016 at 12:52 PM, kishore g <[email protected]> wrote:

> Looks like nodes add and remove themselves quite often. After you disable
> the instance, Helix will send messages to go from ONLINE to OFFLINE. It
> looks like the nodes shut down before they get those messages, and when
> they come back up they use a different instance id.
>
> There are two solutions:
> - During shutdown: after disabling, wait for the state to be reflected in
>   the External View.
> - During startup: if possible, re-join the cluster with the same name.
>   If you do that, Helix will remove old messages.
>
> A third option is to support autoCleanUp in Helix. The Helix controller
> can monitor the cluster for dead nodes and remove them automatically
> after some time.
>
>
> On Mon, Nov 28, 2016 at 12:39 PM, Sesh Jalagam <[email protected]> wrote:
>
>> <clustername>/INSTANCES/INSTANCES/MESSAGES has already-read messages.
>>
>> Here is an example:
>>
>>   ,"FROM_STATE":"ONLINE"
>>   ,"MSG_STATE":"read"
>>   ,"MSG_TYPE":"STATE_TRANSITION"
>>   ,"STATE_MODEL_DEF":"OnlineOffline"
>>   ,"STATE_MODEL_FACTORY_NAME":"DEFAULT"
>>   ,"TO_STATE":"OFFLINE"
>>
>> I see these messages after the participant is disabled and dropped, i.e.
>> <clustername>/INSTANCES/<PARTICIPANT_ID> is removed.
>>
>> Thanks
>>
>>
>> On Mon, Nov 28, 2016 at 12:18 PM, kishore g <[email protected]> wrote:
>>
>>> <clustername>/INSTANCES/INSTANCES/MESSAGES -- by this do you mean
>>> <clustername>/INSTANCES/<PARTICIPANT_ID>/MESSAGES?
>>>
>>> What kind of messages do you see under these nodes?
>>>
>>>
>>> On Mon, Nov 28, 2016 at 12:04 PM, Sesh Jalagam <[email protected]> wrote:
>>>
>>>> Our setup is the following:
>>>>
>>>> - Controller (leader elected from one of the cluster nodes)
>>>>
>>>> - Cluster of nodes as participants in the OnlineOffline state model
>>>>
>>>> - Set of resources with partitions
>>>>
>>>>
>>>> Each node, on its startup, creates a controller, adds a participant if
>>>> it does not already exist, and waits for the callbacks to handle
>>>> partition rebalancing.
>>>>
>>>> Please note this cluster is created on the fly multiple times a day
>>>> (the actual cluster is not deleted, but participants are removed and
>>>> re-added).
>>>>
>>>>
>>>> Everything works fine in production, but I see that the znodes
>>>> in <clustername>/INSTANCES/INSTANCES/MESSAGES are growing.
>>>>
>>>> What is <cluster_id>/INSTANCES/INSTANCES used for? Is there a way for
>>>> the messages to be deleted automatically?
>>>>
>>>> I see a similar buildup in <cluster_id>INSTANCES/INSTANCES/CURRENTSTATES.
>>>>
>>>>
>>>> Thanks
>>>> --
>>>> - Sesh .J
>>>
>>>
>>
>> --
>> - Sesh .J
>
>
--
- Sesh .J
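
The reaper and the Option 1 drain check discussed above can be sketched as plain decision logic, independent of any ZooKeeper client. This is a minimal illustration, not Helix code: the function names are made up, the znode layouts are assumed from the paths quoted in the thread, and a real reaper would read the children of /LIVEINSTANCES and each instance's MESSAGES and CURRENTSTATES znodes through a ZK client before deleting anything.

```python
# Assumed znode layouts (taken from the paths mentioned in this thread):
#   /<cluster>/INSTANCES/<instance>/MESSAGES/<msgId>
#   /<cluster>/INSTANCES/<instance>/CURRENTSTATES/<sessionId>/<resource>

def stale_message_paths(message_paths, live_instances):
    """Reaper rule: messages destined for instances that are not
    present in /LIVEINSTANCES are candidates for deletion."""
    stale = []
    for path in message_paths:
        parts = path.strip("/").split("/")
        # parts: [cluster, "INSTANCES", instance, "MESSAGES", msgId]
        instance = parts[2]
        if instance not in live_instances:
            stale.append(path)
    return stale

def stale_currentstate_paths(cs_paths, live_sessions):
    """Current states written under a session id that no longer matches
    the instance's live ZK session are stale.
    live_sessions maps instance name -> its current session id."""
    stale = []
    for path in cs_paths:
        parts = path.strip("/").split("/")
        # parts: [cluster, "INSTANCES", instance, "CURRENTSTATES", sessionId, ...]
        instance, session = parts[2], parts[4]
        if live_sessions.get(instance) != session:
            stale.append(path)
    return stale

def instance_drained(external_views, instance):
    """Option 1 check during shutdown: after disabling, the instance
    should no longer appear in any partition's state map in any
    resource's External View. external_views is assumed to be
    {resource: {partition: {instance: state}}}."""
    for partition_map in external_views.values():
        for state_map in partition_map.values():
            if instance in state_map:
                return False
    return True
```

The point of keeping these as pure functions is that the deletion policy can be unit-tested without a live cluster; the ZK reads and deletes stay in a thin wrapper around them.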
