Thanks Kishore. Option 1 and Option 3 are plausible. Option 2 is not feasible: even though the cluster name is the same, the instance name is different (it is usually a random value).
With Option 1, what should I be looking for in the External View? Should I be checking that all the resources have been transitioned off the instance? With Option 3, when a cluster is redeployed the controller moves around (because of leader election) from old nodes to old nodes, so I wonder if the controller will miss any messages for dead nodes. Or I can simply have a reaper that comes up and deletes all messages destined for instances that are not present in /LIVEINSTANCES. How should I deal with <cluster_id>INSTANCES/INSTANCES/CURRENTSTATES? It has stale current states (session ids that are no longer valid).

On Mon, Nov 28, 2016 at 12:52 PM, kishore g <[email protected]> wrote:

> Looks like nodes add and remove themselves quite often. After you disable
> the instance, Helix will send messages to go from ONLINE to OFFLINE. It
> looks like the nodes shut down before they get those messages, and when
> they come back up they use a different instance id.
>
> There are two solutions:
> - During shutdown: after disabling, wait for the state to be reflected in
>   the External View.
> - During startup: if possible, re-join the cluster with the same name.
>   If you do that, Helix will remove old messages.
>
> A third option is to support autoCleanUp in Helix. The Helix controller
> can monitor the cluster for dead nodes and remove them automatically
> after some time.
>
>
> On Mon, Nov 28, 2016 at 12:39 PM, Sesh Jalagam <[email protected]> wrote:
>
>> <clustername>/INSTANCES/INSTANCES/MESSAGES has already-read messages.
>>
>> Here is an example:
>>
>>   ,"FROM_STATE":"ONLINE"
>>   ,"MSG_STATE":"read"
>>   ,"MSG_TYPE":"STATE_TRANSITION"
>>   ,"STATE_MODEL_DEF":"OnlineOffline"
>>   ,"STATE_MODEL_FACTORY_NAME":"DEFAULT"
>>   ,"TO_STATE":"OFFLINE"
>>
>> I see these messages after the participant is disabled and dropped, i.e.
>> <clustername>/INSTANCES/<PARTICIPANT_ID> is removed.
>>
>> Thanks
>>
>>
>> On Mon, Nov 28, 2016 at 12:18 PM, kishore g <[email protected]> wrote:
>>
>>> <clustername>/INSTANCES/INSTANCES/MESSAGES -- by this do you mean
>>> <clustername>/INSTANCES/<PARTICIPANT_ID>/MESSAGES?
>>>
>>> What kind of messages do you see under these nodes?
>>>
>>>
>>> On Mon, Nov 28, 2016 at 12:04 PM, Sesh Jalagam <[email protected]> wrote:
>>>
>>>> Our setup is the following:
>>>>
>>>> - Controller (leader elected from one of the cluster nodes)
>>>>
>>>> - Cluster of nodes as participants in the OnlineOffline state model
>>>>
>>>> - Set of resources with partitions
>>>>
>>>>
>>>> Each node, on its startup, creates a controller, adds a participant if
>>>> it does not already exist, and waits for the callbacks to handle
>>>> partition rebalancing.
>>>>
>>>> Please note this cluster is created on the fly multiple times a day
>>>> (the actual cluster is not deleted, but participants are removed and
>>>> re-added).
>>>>
>>>>
>>>> Everything works fine in production, but I see that the znodes
>>>> in <clustername>/INSTANCES/INSTANCES/MESSAGES are growing.
>>>>
>>>> What is <cluster_id>/INSTANCES/INSTANCES used for? Is there a way for
>>>> the messages to be deleted automatically?
>>>>
>>>> I see a similar buildup in <cluster_id>INSTANCES/INSTANCES/CURRENTSTATES.
>>>>
>>>>
>>>> Thanks
>>>> --
>>>> - Sesh .J
>>>
>>>
>>
>> --
>> - Sesh .J
>
>
--
- Sesh .J
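
The reaper and the Option 1 drain check discussed above can be sketched as plain decision logic, independent of any ZooKeeper client. This is a minimal illustration, not Helix code: the function names are made up, the znode layouts are assumed from the paths quoted in the thread, and a real reaper would read the children of /LIVEINSTANCES and each instance's MESSAGES and CURRENTSTATES znodes through a ZK client before deleting anything.

```python
# Assumed znode layouts (taken from the paths mentioned in this thread):
#   /<cluster>/INSTANCES/<instance>/MESSAGES/<msgId>
#   /<cluster>/INSTANCES/<instance>/CURRENTSTATES/<sessionId>/<resource>

def stale_message_paths(message_paths, live_instances):
    """Reaper rule: messages destined for instances that are not
    present in /LIVEINSTANCES are candidates for deletion."""
    stale = []
    for path in message_paths:
        parts = path.strip("/").split("/")
        # parts: [cluster, "INSTANCES", instance, "MESSAGES", msgId]
        instance = parts[2]
        if instance not in live_instances:
            stale.append(path)
    return stale

def stale_currentstate_paths(cs_paths, live_sessions):
    """Current states written under a session id that no longer matches
    the instance's live ZK session are stale.
    live_sessions maps instance name -> its current session id."""
    stale = []
    for path in cs_paths:
        parts = path.strip("/").split("/")
        # parts: [cluster, "INSTANCES", instance, "CURRENTSTATES", sessionId, ...]
        instance, session = parts[2], parts[4]
        if live_sessions.get(instance) != session:
            stale.append(path)
    return stale

def instance_drained(external_views, instance):
    """Option 1 check during shutdown: after disabling, the instance
    should no longer appear in any partition's state map in any
    resource's External View. external_views is assumed to be
    {resource: {partition: {instance: state}}}."""
    for partition_map in external_views.values():
        for state_map in partition_map.values():
            if instance in state_map:
                return False
    return True
```

The point of keeping these as pure functions is that the deletion policy can be unit-tested without a live cluster; the ZK reads and deletes stay in a thin wrapper around them.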
