Re: Re: Re: Re: How to ensure transaction create-and-update
That is one of the great virtues of working with ZK: in the event of a server failure, you get behavior as good as can be expected.

There are several failure scenarios:

a) a (small) fraction of the ZK servers fail or are cut off, but a quorum persists

b) a (large) fraction of the ZK servers fail or are cut off and a quorum no longer exists

c) the network connection to ZK from the machine changing disk status is interrupted for a short time

d) the machine changing disk status goes down or is disconnected from ZK for a long period of time

Failure (a) is not a problem and is, indeed, a normal maintenance operation when you are upgrading ZK.

Failure (b) is serious and will cause all updates to ZK to stop. The state will be preserved if at all possible, and when enough ZK machines reappear to form a quorum, operations will proceed normally.

Failure (c) is generally non-critical, but you should consider how short "a short time" should be and set your ZK timeouts accordingly. You have to deal with this issue in any case to have a reliable system.

Failure (d) is normally handled by using some kind of ephemeral file (an ephemeral znode, in ZK terms). For instance, you can have one ephemeral file for each machine with a disk. Then you can have a master process that is notified when such a machine's ephemeral file disappears. This master process can do any cleanup operations necessary. It is normal to have several master processes of which only one is active (use ZK for leader election to make this work).

On Wed, Mar 31, 2010 at 1:28 AM, zd.wbh wrote:

> It is under the assumption that the zookeeper requester is stable enough. What
> if a server restart occurs during the update sequence, so that no abort or
> proceed action can be taken? I'm just curious how to handle this kind of
> dirty data.
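
The handling described for failure (d) can be sketched roughly as follows. This is only an in-memory toy model in Python, not the ZooKeeper client API: `ToyCoordinator`, `Master`, the `/master` and `/disks/...` paths, and every method name are hypothetical and exist purely for illustration. It assumes each machine holds one ephemeral entry that vanishes when its session expires, and that the masters compete for a single ephemeral leader entry.

```python
from typing import Callable, Dict, List

LEADER_PATH = "/master"  # hypothetical path used for leader election


class ToyCoordinator:
    """In-memory stand-in for ephemeral entries. NOT the ZooKeeper API."""

    def __init__(self) -> None:
        self._owner_of: Dict[str, str] = {}  # path -> owning session id
        self._watchers: List[Callable[[str], None]] = []

    def create_ephemeral(self, path: str, session: str) -> bool:
        """Create an entry tied to a session; fail if the path exists."""
        if path in self._owner_of:
            return False
        self._owner_of[path] = session
        return True

    def watch_deletions(self, callback: Callable[[str], None]) -> None:
        """Register a callback invoked with each deleted path."""
        self._watchers.append(callback)

    def expire_session(self, session: str) -> None:
        """Simulate a session timeout: drop every entry the session owned."""
        gone = sorted(p for p, s in self._owner_of.items() if s == session)
        for path in gone:
            del self._owner_of[path]
        for path in gone:
            for callback in list(self._watchers):
                callback(path)


class Master:
    """One of several master processes; only the leader performs cleanup."""

    def __init__(self, name: str, coord: ToyCoordinator) -> None:
        self.name = name
        self.coord = coord
        self.alive = True
        self.is_leader = False
        self.cleaned: List[str] = []  # disk entries this master cleaned up
        coord.watch_deletions(self._on_delete)
        self._try_lead()

    def _try_lead(self) -> None:
        # Whoever creates the leader entry first becomes the active master.
        if self.alive and not self.is_leader:
            self.is_leader = self.coord.create_ephemeral(LEADER_PATH, self.name)

    def crash(self) -> None:
        """Simulate this master dying: its leader entry disappears too."""
        self.alive = False
        self.is_leader = False
        self.coord.expire_session(self.name)

    def _on_delete(self, path: str) -> None:
        if not self.alive:
            return
        if path == LEADER_PATH:
            self._try_lead()  # a standby takes over
        elif path.startswith("/disks/") and self.is_leader:
            self.cleaned.append(path)  # placeholder for real cleanup work


def demo():
    coord = ToyCoordinator()
    a = Master("master-a", coord)
    b = Master("master-b", coord)
    coord.create_ephemeral("/disks/host1", "host1")
    coord.create_ephemeral("/disks/host2", "host2")
    coord.expire_session("host1")  # host1 goes away; the leader cleans up
    a.crash()                      # the leader dies; master-b takes over
    coord.expire_session("host2")  # host2 goes away; the new leader cleans up
    return a, b
```

In this model the standby does nothing until the leader entry disappears, which mirrors the "several masters, only one active" arrangement. A real deployment would also have to cope with missed notifications and reconnects, which the sketch ignores.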
Re: Re: Re: Re: How to ensure transaction create-and-update
Ted, your suggested flow guarantees that the update sequence either succeeds or fails completely. But that is under the assumption that the zookeeper requester is stable enough. What if a server restart occurs during the update sequence, so that no abort or proceed action can be taken? I'm just curious how to handle this kind of dirty data.

2010-03-31
Will

From: Ted Dunning
Sent: 2010-03-30 15:39
Subject: Re: Re: Re: How to ensure transaction create-and-update
To: zookeeper-user@hadoop.apache.org

As I mentioned, you can keep state in the disk as an echo of the diskPair. If you don't mind a small delay after constructing the pair, then you can just rely on the copy in the disk structure and never refer to the version in the diskPair.

The other transitions that you describe can easily be implemented on that same basis.

On Tue, Mar 30, 2010 at 12:22 AM, zd.wbh wrote:

> This is situation 2: it's important to keep a disk in only one diskPair.
> Your suggested flow would fit. But it's best to keep the state with the
> disk rather than with the diskPair. When one disk goes bad, pick a disk in
> the HOTSPARE state, change its state to DUMPING, and begin copying data from
> the other disk of the pair; when the data is ready, update the state to
> ONLINE to serve clients.
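
The disk transitions quoted above (HOTSPARE to DUMPING to ONLINE) might look something like this when the state is kept with the disk. This is a minimal Python sketch, not ZooKeeper code: `DiskRecord`, `transition`, `StaleVersion` and `replace_failed_disk` are hypothetical names. The version check is an assumption modelled on the way ZooKeeper's `setData` accepts an expected version and rejects the write when it does not match.

```python
from dataclasses import dataclass
from enum import Enum


class DiskState(Enum):
    HOTSPARE = "HOTSPARE"
    DUMPING = "DUMPING"
    ONLINE = "ONLINE"


# Only the transitions named in the thread; anything else is rejected.
ALLOWED = {
    (DiskState.HOTSPARE, DiskState.DUMPING),
    (DiskState.DUMPING, DiskState.ONLINE),
}


class StaleVersion(Exception):
    """Raised when the caller's expected version no longer matches."""


@dataclass
class DiskRecord:
    """State stored with the disk itself (hypothetical layout)."""
    state: DiskState
    version: int = 0


def transition(record: DiskRecord, expected_version: int,
               new_state: DiskState) -> int:
    """Versioned compare-and-set: update only if nobody else wrote first."""
    if record.version != expected_version:
        raise StaleVersion(
            f"expected v{expected_version}, found v{record.version}")
    if (record.state, new_state) not in ALLOWED:
        raise ValueError(
            f"illegal transition {record.state.name} -> {new_state.name}")
    record.state = new_state
    record.version += 1
    return record.version


def replace_failed_disk(spare: DiskRecord) -> DiskRecord:
    """Promote a hot spare: HOTSPARE -> DUMPING -> ONLINE."""
    version = transition(spare, spare.version, DiskState.DUMPING)
    # ... copying data from the surviving disk of the pair would go here ...
    transition(spare, version, DiskState.ONLINE)
    return spare
```

Because each step is recorded with the disk, a process that restarts in the middle of the sequence can read the stored state (for example DUMPING) to see how far the replacement got. What it should do next is a policy decision that the sketch leaves open.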