Re: Re: Re: Re: How to ensure transaction create-and-update

2010-03-31 Thread Ted Dunning
That is one of the great virtues in working with ZK... in the event of a
server failure, you get behavior as good as can be expected.

There are several failure scenarios:

a) a (small) fraction of the ZK servers fail or are cut off, but a quorum
persists

b) a (large) fraction of the ZK servers fail or are cut off and a quorum no
longer exists

c) the network connection to ZK from the machine changing disk status is
interrupted for a short time

d) the machine changing disk status goes down or is disconnected from ZK for
a long period of time.

Failure (a) is not a problem and is, indeed, what happens during normal
maintenance when you are upgrading ZK.

Failure (b) is serious and will cause all updates to ZK to stop.  The state
will be preserved if at all possible and when enough ZK machines reappear to
have a quorum, operations will proceed normally.
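
To make the difference between (a) and (b) concrete, here is a minimal
sketch of the majority rule. The function names are made up for
illustration and are not part of any ZK API; the only fact relied on is
that a quorum is a strict majority of the configured ensemble.

```python
def has_quorum(ensemble_size: int, reachable: int) -> bool:
    """Return True when a strict majority of the ensemble is reachable."""
    if not 0 <= reachable <= ensemble_size:
        raise ValueError("reachable must be between 0 and ensemble_size")
    return reachable > ensemble_size // 2


def tolerated_failures(ensemble_size: int) -> int:
    """Largest number of servers that can be lost while a quorum persists."""
    return (ensemble_size - 1) // 2


# Failure (a): 5 servers, 2 lost, quorum persists.
print(has_quorum(5, 3))
# Failure (b): 5 servers, 3 lost, updates stop until servers return.
print(has_quorum(5, 2))
```

Note that an even-sized ensemble tolerates no more failures than the next
smaller odd size, which is why odd ensemble sizes are the usual choice.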

Failure (c) is generally non-critical, but you should consider how short "a
short time" should be and set your ZK timeouts accordingly.  You have to
deal with this issue in any case to have a reliable system.
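
As a rough way to think about where (c) ends and (d) begins, the sketch
below compares an outage against the session timeout. It is a
simplification under my own assumptions: the function name is
hypothetical, and a real client also has to reconnect to some server
before the timeout elapses for the session to survive.

```python
def classify_outage(outage_seconds: float, session_timeout_seconds: float) -> str:
    """Return "c" if the session should survive the outage, else "d"."""
    if outage_seconds < 0 or session_timeout_seconds <= 0:
        raise ValueError("outage must be >= 0 and timeout must be > 0")
    # Shorter than the timeout: the session (and its ephemeral state) survives.
    # Otherwise the session expires and the machine is treated as gone.
    return "c" if outage_seconds < session_timeout_seconds else "d"


print(classify_outage(2, 10))
print(classify_outage(30, 10))
```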

Failure (d) is normally handled by using some kind of ephemeral file (an
ephemeral znode, in ZK terms).  For instance, you can have one ephemeral
file for each machine with a disk.
Then you can have a master process that is notified when such a machine's
ephemeral file disappears.  This master process can do any cleanup
operations necessary.  It is normal to have several master processes of
which only one is active (use ZK for leader election to make this work).
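
The pattern can be sketched without a ZK cluster at all. What follows is
an in-memory stand-in, not the ZK client API: FakeEphemeralRegistry,
Master and elect_leader are hypothetical names, and the persistent
callback is a simplification, since classic ZK watches fire once and
must be re-registered. The election helper assumes the usual recipe
where candidates carry a sequence suffix and the lowest one is active.

```python
from typing import Callable, Dict, List


class FakeEphemeralRegistry:
    """In-memory stand-in for a parent node holding ephemeral children."""

    def __init__(self) -> None:
        self._owner_by_node: Dict[str, str] = {}  # node name -> session id
        self._on_delete: List[Callable[[str], None]] = []

    def create_ephemeral(self, node: str, session_id: str) -> None:
        if node in self._owner_by_node:
            raise KeyError(f"{node} already exists")
        self._owner_by_node[node] = session_id

    def watch_deletions(self, callback: Callable[[str], None]) -> None:
        self._on_delete.append(callback)

    def expire_session(self, session_id: str) -> None:
        """Drop every node owned by the session and notify the watchers."""
        gone = [n for n, s in self._owner_by_node.items() if s == session_id]
        for node in gone:
            del self._owner_by_node[node]
            for callback in self._on_delete:
                callback(node)

    def children(self) -> List[str]:
        return sorted(self._owner_by_node)


class Master:
    """Records which machines disappeared; real cleanup would go here."""

    def __init__(self, registry: FakeEphemeralRegistry) -> None:
        self.cleaned_up: List[str] = []
        registry.watch_deletions(self.on_machine_gone)

    def on_machine_gone(self, node: str) -> None:
        self.cleaned_up.append(node)


def elect_leader(candidate_nodes: List[str]) -> str:
    """Pick the candidate with the lowest numeric sequence suffix."""
    return min(candidate_nodes, key=lambda n: int(n.rsplit("-", 1)[1]))


registry = FakeEphemeralRegistry()
master = Master(registry)
registry.create_ephemeral("machine-a", "session-1")
registry.create_ephemeral("machine-b", "session-2")
registry.expire_session("session-1")  # machine-a goes down, failure (d)
```

After the last line, the master has been told about machine-a and
machine-b is still registered.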

On Wed, Mar 31, 2010 at 1:28 AM, zd.wbh  wrote:

> That assumes the ZooKeeper requester is stable enough. What if a server
> restarts in the middle of the update sequence, so that neither an abort nor
> a proceed action can be carried out? I'm just curious how to handle this
> kind of dirty data.
>


Re: Re: Re: Re: How to ensure transaction create-and-update

2010-03-31 Thread zd.wbh
Ted, your suggested flow guarantees that the update sequence either succeeds
or fails completely. That assumes the ZooKeeper requester is stable enough.
What if a server restarts in the middle of the update sequence, so that
neither an abort nor a proceed action can be carried out? I'm just curious
how to handle this kind of dirty data.

2010-03-31



Will


From: Ted Dunning
Sent: 2010-03-30 15:39
Subject: Re: Re: Re: How to ensure transaction create-and-update
To: zookeeper-user@hadoop.apache.org



As I mentioned, you can keep state in the disk as an echo of the diskPair. 
If you don't mind a small delay after constructing the pair, then you can 
just rely on the copy in the disk structure and never refer to the version 
in the diskPair. 

The other transitions that you describe can easily be implemented on that 
same basis. 
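
For what it is worth, the transitions quoted below (HOTSPARE to DUMPING
to ONLINE) can be written down as a small table of allowed moves. This
is only a sketch under assumptions: DiskState, ALLOWED and advance are
illustrative names, only the three states named in the thread are
modeled, and in practice the state would live in the disk's ZK node
rather than in a local variable.

```python
from enum import Enum


class DiskState(Enum):
    HOTSPARE = "HOTSPARE"
    DUMPING = "DUMPING"
    ONLINE = "ONLINE"


# Allowed moves, taken from the replacement flow described in the thread.
ALLOWED = {
    DiskState.HOTSPARE: {DiskState.DUMPING},
    DiskState.DUMPING: {DiskState.ONLINE},
    DiskState.ONLINE: set(),
}


def advance(current: DiskState, target: DiskState) -> DiskState:
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


state = DiskState.HOTSPARE
state = advance(state, DiskState.DUMPING)  # start copying from the healthy disk
state = advance(state, DiskState.ONLINE)   # data ready, serve clients
```

Rejecting moves that are not in the table is what keeps a spare from
being marked ONLINE before its copy has finished.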

On Tue, Mar 30, 2010 at 12:22 AM, zd.wbh  wrote: 

> This is situation 2: it's important to keep a disk in only one diskPair.
> Your suggested flow would fit, but it's best to keep the state with the
> disk rather than with the diskPair. When one disk goes bad, pick a disk in
> the HOTSPARE state, change its state to DUMPING, and begin copying data
> from the other disk of the pair; when the data is ready, update the state
> to ONLINE to serve clients.
>