Hi Mahadev,

First of all, I'd like to thank you for being patient with me - my questions
have seemed unclear to many of you who have tried to help me.

I guess clients have to be smart enough to trigger a new leader election by
trying to delete the znode. But in this case, ZK should not allow any single
client, or any group of clients smaller than a quorum, to delete the znode
corresponding to the master, right? A new consensus among the clients (NOT
among the nodes in the ZK cluster) has to be reached for the znode to be
deleted, right? Does ZK have this capability, or do the clients have to reach
this consensus outside of ZK before trying to delete the znode in ZK?
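To make the question concrete, here is a rough sketch of the kind of
client-side consensus I mean. This is plain Python simulating the znodes in
memory (ComplaintBoard, QUORUM, and the slave ids are all made-up names for
illustration); a real implementation would of course go through the ZK client
API instead:

```python
# Hypothetical sketch: no single slave may depose the leader; only a quorum
# of distinct slaves agreeing does. A real version would keep the complaints
# and the leader node as znodes in ZooKeeper rather than in memory.

QUORUM = 3  # minimum number of distinct slaves that must agree

class ComplaintBoard:
    """Stands in for a shared 'status' znode that slaves update when
    they cannot reach the leader."""

    def __init__(self):
        self.complaints = set()                 # slave ids reporting failures
        self.leader_znode = "/election/leader"  # simulated leader znode

    def report_unreachable(self, slave_id):
        """A slave files a complaint; the leader znode is 'deleted'
        only once a quorum of distinct slaves have complained."""
        self.complaints.add(slave_id)
        if self.leader_znode is not None and len(self.complaints) >= QUORUM:
            self.leader_znode = None  # depose leader -> triggers re-election
            return True               # this call deposed the leader
        return False

board = ComplaintBoard()
assert board.report_unreachable("slave-1") is False  # 1 complaint < quorum
assert board.report_unreachable("slave-2") is False  # 2 complaints < quorum
assert board.report_unreachable("slave-2") is False  # duplicate doesn't count
assert board.report_unreachable("slave-3") is True   # quorum reached
```

The point of the sketch is the guard: deleting the leader znode is gated on a
quorum of distinct complainers, so one confused client can't depose a healthy
leader.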

Thanks,

Lei

> Hi Lei,
>  Sorry, I misinterpreted your question! The scenario you describe could be
> handled as follows -
> 
> You could have a status znode in ZooKeeper which every slave will subscribe
> to and update. If one of the slave nodes sees that the slaves have reported
> too many refused connections to the Leader, that slave could go ahead and
> delete the Leader znode, forcing the Leader to give up its leadership. I am
> not describing a detailed way to do it, but it's not very hard to come up
> with a design for this.
> 
> 
> 
> Do you intend to have the Leader and Slaves in different network-protected
> zones (different ACLs, I mean)? In that case it is a legitimate concern;
> otherwise I do think an asymmetric network partition would be very unlikely
> to happen.
> 
> Do you usually see network partitions in such scenarios?
> 
> Thanks
> mahadev
> 
> 
> On 4/30/10 4:05 PM, "Lei Gao" <l...@linkedin.com> wrote:
> 
>> Hi Mahadev,
>> 
>> Why would the leader be disconnected from ZK? ZK is fine communicating with
>> the leader in this case. We are talking about an asymmetric network failure.
>> Yes, the Leader could consider all the slaves to be down if it tracks the
>> status of all slaves itself. But I guess if ZK is used for membership
>> management, neither the leader nor the slaves will be considered
>> disconnected, because they can all connect to ZK.
>> 
>> Thanks,
>> 
>> Lei  
>> 
>> 
>> On 4/30/10 3:47 PM, "Mahadev Konar" <maha...@yahoo-inc.com> wrote:
>> 
>>> Hi Lei,
>>> 
>>> In this case, the Leader will be disconnected from the ZK cluster and will
>>> give up its leadership. Since it is disconnected, the ZK cluster will
>>> realize that the Leader is dead!
>>> 
>>> When the ZK cluster realizes that the Leader is dead (because the ZK
>>> cluster hasn't heard from the Leader for a certain time, configurable via
>>> the session timeout parameter), the slaves will be notified of this via
>>> watchers in the ZooKeeper cluster. The slaves will realize that the Leader
>>> is gone, will re-elect a new Leader, and will start working with the new
>>> Leader.
>>> 
>>> Does that answer your question?
>>> 
>>> You might want to look through the documentation of ZK to understand its
>>> use cases and how it solves these kinds of issues.
>>> 
>>> Thanks
>>> mahadev
>>> 
>>> 
>>> On 4/30/10 2:08 PM, "Lei Gao" <l...@linkedin.com> wrote:
>>> 
>>>> Thank you all for your answers. It clears up a lot of my confusion about
>>>> the service guarantees of ZK. I am still struggling with one failure case
>>>> (I am not trying to be a pain in the neck, but I need to have a full
>>>> understanding of what ZK can offer before I make a decision on whether to
>>>> use it in my cluster.)
>>>> 
>>>> Assume the following topology:
>>>> 
>>>>          Leader ==== ZK cluster
>>>>             \\            //
>>>>              \\          //
>>>>               \\        //
>>>>                 Slave(s)
>>>> 
>>>> If there is an asymmetric network failure such that the connections
>>>> between the Leader and the Slave(s) are broken while all other connections
>>>> are still alive, would my system hang at some point? Because no new leader
>>>> election will be initiated by the slaves, and the leader can't get the
>>>> work to the slave(s).
>>>> 
>>>> Thanks,
>>>> 
>>>> Lei
>>>> 
>>>> On 4/30/10 1:54 PM, "Ted Dunning" <ted.dunn...@gmail.com> wrote:
>>>> 
>>>>> If one of your user clients can no longer reach one member of the ZK
>>>>> cluster, then it will try to reach another.  If it succeeds, then it will
>>>>> continue without any problems as long as the ZK cluster itself is OK.
>>>>> 
>>>>> This applies to all the ZK recipes. You will have to be a little bit
>>>>> careful to handle connection loss, but that should get easier soon (and
>>>>> isn't all that difficult anyway).
>>>>> 
>>>>> On Fri, Apr 30, 2010 at 1:26 PM, Lei Gao <l...@linkedin.com> wrote:
>>>>> 
>>>>>> I am not talking about the leader election within the zookeeper cluster.
>>>>>> I guess I didn't make the discussion context clear. In my case, I run a
>>>>>> cluster that uses zookeeper for doing the leader election. Yes, nodes in
>>>>>> my cluster are the clients of zookeeper. Those nodes depend on zookeeper
>>>>>> to elect a new leader and figure out what the current leader is. So if
>>>>>> the zookeeper (think of it as a stand-alone entity) becomes unavailable
>>>>>> in the way I've described earlier, how can I handle such a situation so
>>>>>> my cluster can still function while a majority of nodes still connect to
>>>>>> each other (but not to the zookeeper)?
>>>>>> 
>>>> 
>>> 
>> 
> 
