Thanks for the detailed explanation, Kishore!

On Jan 4, 2018 16:29, "kishore g" <[email protected]> wrote:

> That's right, SEMI_AUTO will only change the roles of the replicas. It will
> never move the replicas.
>
> Instead of answering each question, I will try to explain what happens
> under the hood.
>
> - Each participant maintains a persistent connection with ZooKeeper and
> sends a heartbeat every X seconds. I think this is called the tick time.
> - When the participant fails to send the heartbeat, there is a disconnect
> callback from the local ZK client code. Note that this callback does not
> come from the ZK server; it occurs as soon as the participant fails to send
> the heartbeat.
> - Let's say the participant connects back to ZK after period T. Now there
> are two cases.
>
>    - T < session timeout: In this case, the participant gets a "connected"
>    callback, its session is still valid, and nothing has changed from the ZK
>    server/Helix controller/spectator point of view.
>    - T > session timeout: This is when the participant gets a "session
>    expiry" callback from ZK. Note that this happens only after the
>    participant reconnects to ZK, so it might be minutes or even hours
>    (depending on the cause of the disconnection) before the participant
>    gets this callback. But the outside world (ZK server/controller/spectator)
>    will know about the session expiry immediately after the session timeout.
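>
> A rough sketch of how a participant might see these callbacks with a plain
> ZooKeeper Watcher (the onDisconnected/onReconnected/onSessionExpired hooks
> below are made-up application methods, not Helix or ZooKeeper APIs):
>
>     import org.apache.zookeeper.WatchedEvent;
>     import org.apache.zookeeper.Watcher;
>
>     public class ConnectionWatcher implements Watcher {
>       @Override
>       public void process(WatchedEvent event) {
>         switch (event.getState()) {
>           case Disconnected:
>             // Local client lost the connection; the session may still be alive.
>             onDisconnected();
>             break;
>           case SyncConnected:
>             // Reconnected; if within the timeout, the session is still valid.
>             onReconnected();
>             break;
>           case Expired:
>             // Delivered only after reconnecting past the session timeout.
>             onSessionExpired();
>             break;
>           default:
>             break;
>         }
>       }
>
>       private void onDisconnected()   { /* start a local fencing timer */ }
>       private void onReconnected()    { /* cancel the timer, keep serving */ }
>       private void onSessionExpired() { /* stop serving as MASTER */ }
>     }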
>
>
> Helix learns about the session expiry and will initiate a mastership
> transfer from the old master to a new master. It cannot send a Master-to-Slave
> transition message to the old master because the old master is disconnected
> from ZK and unreachable. Helix will automatically change the external
> view to show that the old master is offline for all the replicas it
> owns. The clients (spectators) will know about this immediately and they
> can stop sending requests to the old master.
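>
> A minimal sketch of that spectator side, assuming placeholder cluster,
> resource, and partition names (the RoutingTableProvider wiring can differ a
> bit between Helix versions):
>
>     import java.util.List;
>     import org.apache.helix.HelixManager;
>     import org.apache.helix.HelixManagerFactory;
>     import org.apache.helix.InstanceType;
>     import org.apache.helix.model.InstanceConfig;
>     import org.apache.helix.spectator.RoutingTableProvider;
>
>     public class MasterLookup {
>       public static void main(String[] args) throws Exception {
>         HelixManager manager = HelixManagerFactory.getZKHelixManager(
>             "MyCluster", "spectator-1", InstanceType.SPECTATOR, "zk-host:2181");
>         manager.connect();
>
>         // Keeps an in-memory copy of the external view, updated on every change.
>         RoutingTableProvider routingTable = new RoutingTableProvider();
>         manager.addExternalViewChangeListener(routingTable);
>
>         // Instances currently serving partition MyDB_0 in the MASTER state;
>         // an empty list means the old master is offline and no new one is up yet.
>         List<InstanceConfig> masters =
>             routingTable.getInstances("MyDB", "MyDB_0", "MASTER");
>       }
>     }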
>
> Similarly, once the new master has successfully processed the slave-to-master
> transition, the external view will be updated and the clients (spectators)
> can start routing requests to the new master.
>
> As you pointed out in your email, you can start a timer in the participant
> when you get a disconnected event and, once the session timeout has elapsed,
> stop processing requests. We could have done this automatically in Helix, but
> it really depends on the application. This is typically needed only in the
> master-slave state model and we could not come up with an automatic way,
> though we could potentially have done it behind a config variable. It would
> be awesome if you could contribute this feature.
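>
> A minimal sketch of such a timer, driven from the disconnect/reconnect
> callbacks above (how "stop serving" is enforced and where the session timeout
> value comes from are application concerns, not Helix APIs):
>
>     import java.util.concurrent.Executors;
>     import java.util.concurrent.ScheduledExecutorService;
>     import java.util.concurrent.ScheduledFuture;
>     import java.util.concurrent.TimeUnit;
>
>     public class DisconnectFence {
>       private final ScheduledExecutorService scheduler =
>           Executors.newSingleThreadScheduledExecutor();
>       private final long sessionTimeoutMs; // same value the ZK session uses
>       private volatile ScheduledFuture<?> fence;
>       private volatile boolean serving = true;
>
>       public DisconnectFence(long sessionTimeoutMs) {
>         this.sessionTimeoutMs = sessionTimeoutMs;
>       }
>
>       // On a disconnect callback: if we stay disconnected past the session
>       // timeout, mark this node as not serving, even though the Expired
>       // callback has not arrived yet (it only arrives after we reconnect).
>       public void onDisconnected() {
>         fence = scheduler.schedule(() -> { serving = false; },
>             sessionTimeoutMs, TimeUnit.MILLISECONDS);
>       }
>
>       // On a reconnect within the timeout: the session is still valid.
>       public void onReconnected() {
>         if (fence != null) {
>           fence.cancel(false);
>         }
>         serving = true;
>       }
>
>       // The request path checks this before accepting writes as MASTER.
>       public boolean isServing() {
>         return serving;
>       }
>     }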
>
> The controller will change all the relevant data structures in ZK when the
> node goes down (session expires). There is no need for any extra work here.
>
> Thanks,
> Kishore G
>
> On Tue, Jan 2, 2018 at 7:03 PM, Bo Liu <[email protected]> wrote:
>
>> Hi Kishore,
>>
>> Thanks for the answers.
>>
>> My understanding is that Helix in SEMI_AUTO mode won't change the shard
>> mapping automatically, but may change the roles of each replica. Please
>> correct me if this is wrong.
>> I am wondering how SEMI_AUTO Helix will change the roles of replicas
>> mastered on a participant whose ZK session has just expired. Ideally, we
>> want to first 1) change the role of the master replicas on the expired
>> participant to Slave, and then 2) promote some other live participants to be
>> the new Masters for those partitions.
>> For 1), we can add some timer logic on the participant side so that it
>> automatically changes its roles to Slave when its ZK session is expired
>> (without receiving requests from the Controller, because it can't talk to ZK
>> to receive them). For 2), the Controller needs to change all relevant data
>> stored in ZK to indicate that all replicas on the expired participant are
>> Slaves, and then request some live participants to become the new Masters
>> and update the ZK data to reflect the new Masters. My understanding is that
>> the Helix Controller always sends messages to participants to change their
>> states and then updates the ZK data when responses are received from the
>> participants. This doesn't apply to an expired/dead participant, because a
>> dead participant can't act on a state change request.
>> Please let me know if I missed anything and Helix has a straightforward
>> way to solve this.
>>
>> Thanks,
>> Bo
>>
>>
>> On Tue, Jan 2, 2018 at 12:49 PM, kishore g <[email protected]> wrote:
>>
>>> Hi Bo,
>>>
>>> Sorry for the delay in responding.
>>>
>>>    1. That's right, you can pretty much use the existing code in Helix
>>>    to generate the initial mapping. In fact, just set the mode to SEMI_AUTO
>>>    and call the rebalance API once - this will set up the initial ideal
>>>    state and ensure that the MASTERs/SLAVEs are evenly distributed (a small
>>>    sketch of the calls is below this list). You can also invoke the
>>>    rebalance API any time the number of nodes changes (add/remove nodes
>>>    from the cluster).
>>>    2. This won't be a problem with SEMI_AUTO mode since the
>>>    ideal state is fixed and is changed only by explicitly invoking the
>>>    rebalance API. DROPPED messages will be sent only when the mapping in
>>>    the ideal state changes.
>>>    3. Yes, if you have thousands of participants, it is recommended to
>>>    run the rebalancer in the controller.
>>>    4. With SEMI_AUTO mode, the data will never be deleted from the
>>>    participants. In the case of a ZK network partition, the participants
>>>    will be unreachable for the duration of the outage. Once the connection
>>>    is re-established, everything should return to normal. Typically, this
>>>    can be avoided by ensuring that the ZK nodes are on different racks.
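>>>
>>> For point 1, a minimal sketch of the admin calls (cluster/resource names
>>> and the partition/replica counts are placeholders, and the exact method
>>> signatures may differ slightly between Helix versions):
>>>
>>>     import org.apache.helix.HelixAdmin;
>>>     import org.apache.helix.manager.zk.ZKHelixAdmin;
>>>     import org.apache.helix.model.IdealState;
>>>
>>>     public class InitialRebalance {
>>>       public static void main(String[] args) {
>>>         HelixAdmin admin = new ZKHelixAdmin("zk-host:2181");
>>>
>>>         // Create the resource in SEMI_AUTO mode with the built-in
>>>         // MasterSlave state model (8 partitions here, as an example).
>>>         admin.addResource("MyCluster", "MyDB", 8, "MasterSlave",
>>>             IdealState.RebalanceMode.SEMI_AUTO.name());
>>>
>>>         // One-time rebalance: writes the initial ideal state (preference
>>>         // lists) so MASTERs and SLAVEs are spread evenly; call it again
>>>         // whenever nodes are added or removed.
>>>         admin.rebalance("MyCluster", "MyDB", 3); // 3 replicas per partition
>>>       }
>>>     }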
>>>
>>> thanks,
>>> Kishore G
>>>
>>>
>>>
>>> On Thu, Dec 28, 2017 at 1:53 PM, Bo Liu <[email protected]> wrote:
>>>
>>>> Hi Kishore,
>>>>
>>>> The fullmatix example is very helpful. For my original questions, I
>>>> think we can still let Helix decide the role assignment. We just need to
>>>> make the selected slave catch up before promoting it to the new Master in
>>>> the state transition handler function. We can also request the other Slaves
>>>> to pull updates from this new Master in the same handler function. We will
>>>> add a constraint to allow at most one transition per partition to avoid a
>>>> potential race. Please let us know if this solution has any other
>>>> implications.
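>>>>
>>>> A rough sketch of what such a transition handler could look like on the
>>>> participant (the catch-up and redirect hooks are stand-ins for our
>>>> application logic, and the state model factory/registration boilerplate is
>>>> omitted):
>>>>
>>>>     import org.apache.helix.NotificationContext;
>>>>     import org.apache.helix.model.Message;
>>>>     import org.apache.helix.participant.statemachine.StateModel;
>>>>     import org.apache.helix.participant.statemachine.Transition;
>>>>
>>>>     public class MyPartitionStateModel extends StateModel {
>>>>
>>>>       @Transition(from = "SLAVE", to = "MASTER")
>>>>       public void onBecomeMasterFromSlave(Message message,
>>>>                                           NotificationContext context) {
>>>>         String partition = message.getPartitionName();
>>>>         // Block until the local replica has applied all the updates it can
>>>>         // obtain, then start taking writes and tell the remaining slaves
>>>>         // to pull from us.
>>>>         catchUpBeforePromotion(partition);
>>>>         redirectSlavesToNewMaster(partition);
>>>>       }
>>>>
>>>>       @Transition(from = "MASTER", to = "SLAVE")
>>>>       public void onBecomeSlaveFromMaster(Message message,
>>>>                                           NotificationContext context) {
>>>>         // Stop accepting writes for this partition.
>>>>       }
>>>>
>>>>       private void catchUpBeforePromotion(String partition) {
>>>>         /* application-specific */
>>>>       }
>>>>
>>>>       private void redirectSlavesToNewMaster(String partition) {
>>>>         /* application-specific */
>>>>       }
>>>>     }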
>>>>
>>>> After reading some code in both fullmatix and Helix, I still have a few
>>>> questions.
>>>>
>>>> 1. I plan to use SEMI_AUTO mode to manage our Master-Slave replicated
>>>> storage system running on AWS EC2. A customized rebalancer will be used to
>>>> generate the shard mapping, and we rely on Helix to determine the
>>>> master-slave role assignment (to automatically restore write availability
>>>> when a host is down). From the code, it seems to me that Helix will make a
>>>> host serve a Master replica only if that host is at the top of the
>>>> preference list for the partition. If this is the case, does the customized
>>>> rebalancer need to carefully decide the host order in each preference list
>>>> to evenly distribute the Master replicas? I just wanted to know how much
>>>> work we can save by reusing the role assignment logic from SEMI_AUTO mode
>>>> compared to CUSTOMIZED mode.
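>>>>
>>>> Schematically, my mental model of the SEMI_AUTO preference lists (with
>>>> made-up node names) is that the head of each list is the preferred MASTER,
>>>> so rotating which node comes first spreads the masters out:
>>>>
>>>>     MyDB_0 : [node1, node2, node3]   -> node1 preferred MASTER
>>>>     MyDB_1 : [node2, node3, node1]   -> node2 preferred MASTER
>>>>     MyDB_2 : [node3, node1, node2]   -> node3 preferred MASTER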
>>>>
>>>> 2. I noticed that all non-alive hosts will be excluded from the
>>>> ResourceAssignment returned by computeBestPossiblePartitionState().
>>>> Does that mean Helix will mark all replicas on non-alive hosts as DROPPED,
>>>> or just that it won't try to send any state transition messages to
>>>> non-alive hosts? Partition replicas in our system are expensive to rebuild,
>>>> so we'd like to not drop all the data on a host if the host's ZK session is
>>>> expired. What's the recommended way to achieve this? If a participant
>>>> reconnects to ZK with a new session ID, will it have to restart from
>>>> scratch?
>>>>
>>>> 3. I found that fullmatix runs the rebalancer in the participants. If we
>>>> have thousands of participants, is it better to run it in the controller,
>>>> since ZK will have less load synchronizing a few controllers than thousands
>>>> of participants?
>>>>
>>>> 4. How do we protect the system during events like a network partition or
>>>> ZK being unavailable? For example, say 1/3 of the participants can't
>>>> connect to ZK and their ZK sessions expire. If possible, we want to avoid
>>>> having those participants shut themselves down, and instead keep their data
>>>> in a reusable state.
>>>>
>>>> I am still new to Helix. Sorry for the overwhelming questions.
>>>>
>>>> Thanks,
>>>> Bo
>>>>
>>>>
>>>> On Sun, Dec 24, 2017 at 8:54 PM, Bo Liu <[email protected]> wrote:
>>>>
>>>>> Thank you, will take a look later.
>>>>>
>>>>> On Dec 24, 2017 19:26, "kishore g" <[email protected]> wrote:
>>>>>
>>>>>> https://github.com/kishoreg/fullmatix/tree/master/mysql-cluster
>>>>>>
>>>>>> Take a look at this recipe.
>>>>>>
>>>>>>
>>>>>> On Sun, Dec 24, 2017 at 5:40 PM Bo Liu <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Helix team,
>>>>>>>
>>>>>>> We have an application which runs with 1 Master and multiple Slaves
>>>>>>> per shard. If a host is dead, we want to move the master role from the
>>>>>>> dead host to one of the slave hosts. In the meantime, we need to inform
>>>>>>> all the other Slaves to start pulling updates from the new Master
>>>>>>> instead of the old one. How do you suggest we implement this with Helix?
>>>>>>>
>>>>>>> Another related question: can we add some logic to make Helix
>>>>>>> choose the new Master based on 1) which Slave has the most recent
>>>>>>> updates, and 2) trying to evenly distribute the Master shards (only if
>>>>>>> more than one Slave has the most recent updates)?
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best regards,
>>>>>>> Bo
>>>>>>>
>>>>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Bo
>>>>
>>>>
>>>
>>
>>
>> --
>> Best regards,
>> Bo
>>
>>
>
