It looks like nodes add and remove themselves quite often. After you disable an instance, Helix sends it messages to transition from ONLINE to OFFLINE. It looks like the nodes shut down before they process those messages, and when they come back up they use a different instance id, so the old messages are never consumed and just accumulate.
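
Those ONLINE to OFFLINE messages are only consumed by the participant's state transition callbacks. For reference, a minimal OnlineOffline participant looks roughly like the sketch below (the cluster name, instance name, and ZK address are placeholders, and annotation details plus the createNewStateModel signature vary a bit between Helix releases). If the process exits before onBecomeOfflineFromOnline runs, the message znode is left behind; re-joining with the same instance name on restart is what lets Helix reconcile and clean it up.

import org.apache.helix.HelixManager;
import org.apache.helix.HelixManagerFactory;
import org.apache.helix.InstanceType;
import org.apache.helix.NotificationContext;
import org.apache.helix.model.Message;
import org.apache.helix.participant.statemachine.StateModel;
import org.apache.helix.participant.statemachine.StateModelFactory;
import org.apache.helix.participant.statemachine.StateModelInfo;
import org.apache.helix.participant.statemachine.Transition;

// Illustrative participant; cluster/instance/ZK values are placeholders.
public class OnlineOfflineParticipant {

  @StateModelInfo(initialState = "OFFLINE", states = {"ONLINE", "OFFLINE"})
  public static class MyStateModel extends StateModel {

    @Transition(from = "OFFLINE", to = "ONLINE")
    public void onBecomeOnlineFromOffline(Message msg, NotificationContext ctx) {
      // take ownership of the partition in msg.getPartitionName()
    }

    @Transition(from = "ONLINE", to = "OFFLINE")
    public void onBecomeOfflineFromOnline(Message msg, NotificationContext ctx) {
      // this callback is what consumes the ONLINE->OFFLINE STATE_TRANSITION
      // message; if the node exits before it runs, the message znode remains
    }
  }

  public static class MyStateModelFactory extends StateModelFactory<MyStateModel> {
    @Override
    public MyStateModel createNewStateModel(String resourceName, String partitionName) {
      // note: older 0.6.x releases use createNewStateModel(String partitionName)
      return new MyStateModel();
    }
  }

  public static void main(String[] args) throws Exception {
    // re-using the same instance name across restarts lets Helix reconcile
    // and remove stale messages for this participant
    HelixManager manager = HelixManagerFactory.getZKHelixManager(
        "myCluster", "myHost_12000", InstanceType.PARTICIPANT, "zkhost:2181");
    manager.getStateMachineEngine()
        .registerStateModelFactory("OnlineOffline", new MyStateModelFactory());
    manager.connect();
    Thread.currentThread().join();
  }
}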
There are two solutions:

- During shutdown: after disabling the instance, wait for the state to be reflected in the External View (a rough sketch of this shutdown path follows the quoted thread below).
- During startup: if possible, re-join the cluster with the same instance name. If you do that, Helix will remove the old messages.

A third option is to support autoCleanUp in Helix: the Helix controller can monitor the cluster for dead nodes and remove them automatically after some time.

On Mon, Nov 28, 2016 at 12:39 PM, Sesh Jalagam <[email protected]> wrote:

> <clustername>/INSTANCES/INSTANCES/MESSAGES has already-read messages.
>
> Here is an example.
> ,"FROM_STATE":"ONLINE"
> ,"MSG_STATE":"read"
> ,"MSG_TYPE":"STATE_TRANSITION"
> ,"STATE_MODEL_DEF":"OnlineOffline"
> ,"STATE_MODEL_FACTORY_NAME":"DEFAULT"
> ,"TO_STATE":"OFFLINE"
>
> I see these messages after the participant is disabled and dropped, i.e.
> <clustername>/INSTANCES/<PARTICIPANT_ID> is removed.
>
> Thanks
>
> On Mon, Nov 28, 2016 at 12:18 PM, kishore g <[email protected]> wrote:
>
>> <clustername>/INSTANCES/INSTANCES/MESSAGES: by this do you mean
>> <clustername>/INSTANCES/<PARTICIPANT_ID>/MESSAGES?
>>
>> What kind of messages do you see under these nodes?
>>
>> On Mon, Nov 28, 2016 at 12:04 PM, Sesh Jalagam <[email protected]> wrote:
>>
>>> Our setup is the following.
>>>
>>> - Controller (leader elected from one of the cluster nodes)
>>> - Cluster of nodes as participants in the OnlineOffline state model
>>> - Set of resources with partitions
>>>
>>> Each node, on its startup, creates a controller, adds a participant if it
>>> does not already exist, and waits for the callbacks to handle partition
>>> rebalancing.
>>>
>>> Please note this cluster is created on the fly multiple times a day
>>> (the actual cluster is not deleted, but participants are removed and
>>> re-added).
>>>
>>> Everything works fine in production, but I see that the znodes
>>> in <clustername>/INSTANCES/INSTANCES/MESSAGES are growing.
>>>
>>> What is <cluster_id>/INSTANCES/INSTANCES used for? Is there a way for
>>> the messages to be deleted automatically?
>>>
>>> I see a similar buildup in <cluster_id>/INSTANCES/INSTANCES/CURRENTSTATES.
>>>
>>> Thanks
>>> --
>>> - Sesh .J
>
> --
> - Sesh .J
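
For the first option above, a graceful shutdown could look roughly like this sketch (cluster name, instance name, and ZK address are placeholders; the polling loop is deliberately simplistic and should be bounded with a timeout in practice). It disables the instance, waits until the ExternalView no longer shows it as ONLINE for any partition, and only then disconnects:

import java.util.Map;
import org.apache.helix.HelixManager;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.ExternalView;

public class GracefulShutdown {

  // Disable the instance, then block until the ExternalView stops listing it
  // as ONLINE anywhere, so the ONLINE->OFFLINE messages are processed before exit.
  public static void shutdown(HelixManager manager, String zkAddr,
                              String cluster, String instance) throws InterruptedException {
    ZKHelixAdmin admin = new ZKHelixAdmin(zkAddr);
    try {
      admin.enableInstance(cluster, instance, false);
      while (stillOnline(admin, cluster, instance)) {
        Thread.sleep(1000); // crude polling; add a timeout in a real implementation
      }
      manager.disconnect();
    } finally {
      admin.close();
    }
  }

  private static boolean stillOnline(ZKHelixAdmin admin, String cluster, String instance) {
    for (String resource : admin.getResourcesInCluster(cluster)) {
      ExternalView view = admin.getResourceExternalView(cluster, resource);
      if (view == null) {
        continue;
      }
      for (String partition : view.getPartitionSet()) {
        Map<String, String> stateMap = view.getStateMap(partition);
        if (stateMap != null && "ONLINE".equals(stateMap.get(instance))) {
          return true;
        }
      }
    }
    return false;
  }
}

The point is that the participant only disconnects after the controller has propagated the OFFLINE transitions, so no STATE_TRANSITION messages are left unread under its MESSAGES path.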
