add api support for "subscribe" method

                 Key: ZOOKEEPER-153
             Project: Zookeeper
          Issue Type: New Feature
          Components: c client, documentation, java client, server, tests
            Reporter: Patrick Hunt
            Priority: Minor

Subscribe Method
(note, this was moved from

Outline of the semantics and the requirements of a yet-to-be-implemented 
subscribe() method.


ZooKeeper uses a very lightweight one-time notification mechanism to notify 
interested clients of changes to ZooKeeper data nodes (znodes). Clients can set 
a watch on a node when they request information about a znode. The watch is 
atomically set and the data returned, so that any subsequent changes to the 
znode that affect the data returned will trigger a watch event. The watch stays 
in place until triggered or the client is disconnected from a ZooKeeper server. 
A disconnect watch event implicitly triggers all watches.
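
The one-time behavior described above can be modeled in a few lines. The 
following is a minimal, self-contained Java sketch and not ZooKeeper code: 
OneShotZnode, its methods and the "NodeDataChanged" event string are 
hypothetical names chosen for illustration, and disconnect handling is left 
out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for one znode; not part of ZooKeeper.
class OneShotZnode {
    private String data;
    private List<Consumer<String>> watchers = new ArrayList<>();

    OneShotZnode(String initial) {
        this.data = initial;
    }

    // Return the data and register the watch in one atomic step.
    // Pass null to read without setting a watch.
    synchronized String getData(Consumer<String> watcher) {
        if (watcher != null) {
            watchers.add(watcher);
        }
        return data;
    }

    // Change the data and fire each pending watch exactly once.
    // The event names the kind of change but carries no data.
    void setData(String newData) {
        List<Consumer<String>> toFire;
        synchronized (this) {
            data = newData;
            toFire = watchers;
            watchers = new ArrayList<>(); // watches are consumed when they fire
        }
        for (Consumer<String> watcher : toFire) {
            watcher.accept("NodeDataChanged");
        }
    }
}

class OneShotWatchDemo {
    public static void main(String[] args) {
        OneShotZnode node = new OneShotZnode("v0");
        List<String> events = new ArrayList<>();
        System.out.println("read: " + node.getData(events::add));
        node.setData("v1"); // triggers the watch
        node.setData("v2"); // no watch is set any more, so no event
        System.out.println("events: " + events);
    }
}
```

In this model the second change produces no event. A client that wants to 
hear about it has to read the znode again and set a new watch.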

ZooKeeper users have wondered if they can set permanent watches rather than 
one-time watches. In reality such permanent watches do not provide any extra 
benefit over one-time watches. Specifically, no data is included in a watch 
event, so the client still needs to do a query operation to get the data 
corresponding to a change; even then, the znode can change yet again after the 
event is received and before the client sends the query operation. Even the 
number of changes to a znode can be found using one-time watches and 
checking the mzxid in the stat structure of the znode. And the client will 
still miss events that happen when the client switches ZooKeeper servers.

There are use cases that require clients to see every change to a ZooKeeper 
node. The most general case is when a client behaves like a state machine and 
each change to the znode changes the state of the client. In these cases 
ZooKeeper is much more like a publish/subscribe system than a distributed 
register. To support this case we need not only reliable permanent watches 
(ones that also deliver the events that happen while switching servers) but 
also the data for each change, so that the client doesn't miss intermediate 
values between rapid-fire changes.


The subscribe(String path) method causes ZooKeeper to register a subscription 
for a znode. The initial value of the znode and each subsequent change to that 
znode will cause a watch event carrying the data to be sent to the client. The 
client will see all changes in order. If a client switches servers, any missed 
events, with their corresponding data, will be sent to the client when it 
reconnects to a server.
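
As a rough illustration of these semantics, the sketch below models a 
subscription in memory. Nothing here exists in ZooKeeper: SubscribedZnode, 
SubscriberModel and catchUp() are hypothetical names, and a list index stands 
in for whatever position marker (for example a zxid) a real implementation 
would track.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the proposed semantics; not part of ZooKeeper.
class SubscribedZnode {
    // One entry per change, in commit order. Index 0 is the initial value.
    private final List<String> history = new ArrayList<>();

    SubscribedZnode(String initial) {
        history.add(initial);
    }

    void setData(String data) {
        history.add(data);
    }

    // Return every value after the given position, oldest first.
    List<String> eventsAfter(int lastSeenIndex) {
        return new ArrayList<>(history.subList(lastSeenIndex + 1, history.size()));
    }
}

class SubscriberModel {
    private int lastSeen = -1; // nothing delivered yet
    final List<String> received = new ArrayList<>();

    // Called on subscribe and again after every reconnect.
    void catchUp(SubscribedZnode node) {
        for (String value : node.eventsAfter(lastSeen)) {
            received.add(value);
            lastSeen++;
        }
    }
}

class SubscribeDemo {
    public static void main(String[] args) {
        SubscribedZnode node = new SubscribedZnode("a");
        SubscriberModel client = new SubscriberModel();
        client.catchUp(node);  // subscribe: delivers the initial value
        node.setData("b");     // both changes happen while the client
        node.setData("c");     // is switching servers
        client.catchUp(node);  // reconnect: missed events are replayed in order
        System.out.println(client.received);
    }
}
```

The point of the model is that the client never has to re-read the znode: 
every value, including the ones written while it was away, arrives as an 
event.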

There are three ways to cancel a subscription:

   1. Calling unsubscribe(String path)
   2. Closing the ZooKeeper session or letting it expire
   3. Falling too far behind. If the server decides that a client is not 
processing the watch events fast enough, it will cancel the subscription and 
send a SUBSCRIPTION_CANCELLED watch event.
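
The third case can be sketched as a bounded per-subscriber queue. This is only 
one possible policy, written under assumptions: the class BoundedSubscription, 
the maxPending limit, and the choice to deliver the existing backlog before the 
SUBSCRIPTION_CANCELLED event are all hypothetical and not specified by this 
proposal.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical server-side bookkeeping for one subscriber; not part of ZooKeeper.
class BoundedSubscription {
    static final String CANCELLED = "SUBSCRIPTION_CANCELLED";

    private final int maxPending; // assumed limit on undelivered events
    private final Deque<String> pending = new ArrayDeque<>();
    private boolean active = true;

    BoundedSubscription(int maxPending) {
        this.maxPending = maxPending;
    }

    // Server side: queue a change for delivery.
    // Returns false if the subscription is (or has just been) cancelled.
    boolean offer(String data) {
        if (!active) {
            return false;
        }
        if (pending.size() >= maxPending) {
            // The client is too far behind: stop queueing and tell it why.
            pending.addLast(CANCELLED);
            active = false;
            return false;
        }
        pending.addLast(data);
        return true;
    }

    // Client side: take the next event, or null if none is waiting.
    String poll() {
        return pending.pollFirst();
    }

    boolean isActive() {
        return active;
    }
}

class BoundedSubscriptionDemo {
    public static void main(String[] args) {
        BoundedSubscription sub = new BoundedSubscription(2);
        sub.offer("v1");
        sub.offer("v2");
        sub.offer("v3"); // queue is full, so the subscription is cancelled
        for (String e = sub.poll(); e != null; e = sub.poll()) {
            System.out.println(e);
        }
    }
}
```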


There are a couple of things that make it hard to implement subscribe():

   1. Servers must have complete transaction logs - Currently ZooKeeper servers 
just need to have their data trees and in flight transaction logs in sync. When 
a follower syncs to a leader, the leader can just blast down a new snapshot of 
its data tree; it does not need to send past transactions that the follower 
might have missed. However in order to send changes that might have been missed 
by a client, the ZooKeeper server must be able to look into the past to send 
missed changes.
   2. Servers must be able to send clients information about past changes - 
Currently ZooKeeper servers just send clients information about the current 
state of the system. However, to implement subscribe(), servers must be able to 
go back into the log and send watch events for past changes.

Implementation Hints

There are things that work in our favor. ZooKeeper does have a bound on the 
amount of time it needs to look into the past. A ZooKeeper server bounds the 
session expiration time. The server does not need to keep a record of 
transactions older than this bound.

ZooKeeper also keeps a log of transactions. As long as the log is complete 
enough (that is, it holds all the transactions back to the longest session 
expiration time), the server has the information it needs, and it isn't hard 
to process.
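
The bound described above could look roughly like the following sketch, which 
keeps recent transactions in memory and trims anything older than the longest 
session expiration time. The names Txn, BoundedTxnHistory and since() are 
hypothetical, and measuring age against the newest transaction's timestamp is 
an assumption made to keep the example deterministic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical record of one committed change; not a ZooKeeper type.
class Txn {
    final long zxid;
    final long timeMs;
    final String path;
    final String data;

    Txn(long zxid, long timeMs, String path, String data) {
        this.zxid = zxid;
        this.timeMs = timeMs;
        this.path = path;
        this.data = data;
    }
}

// Keeps only the transactions a still-live session could have missed.
class BoundedTxnHistory {
    private final long maxSessionTimeoutMs;
    private final Deque<Txn> txns = new ArrayDeque<>();

    BoundedTxnHistory(long maxSessionTimeoutMs) {
        this.maxSessionTimeoutMs = maxSessionTimeoutMs;
    }

    // Append in zxid order, then drop entries older than the bound.
    void append(Txn txn) {
        txns.addLast(txn);
        long cutoff = txn.timeMs - maxSessionTimeoutMs;
        while (txns.peekFirst().timeMs < cutoff) {
            txns.removeFirst();
        }
    }

    // Changes to one path that a client has not seen yet, oldest first.
    List<Txn> since(long lastSeenZxid, String path) {
        List<Txn> missed = new ArrayList<>();
        for (Txn txn : txns) {
            if (txn.zxid > lastSeenZxid && txn.path.equals(path)) {
                missed.add(txn);
            }
        }
        return missed;
    }

    int size() {
        return txns.size();
    }
}
```

A reconnecting client would report the last zxid it processed, and the server 
would replay the result of since() before resuming live delivery.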

We do not want to cause the log disk to seek while looking at past 
transactions. There are two complementary approaches to handling this problem: 
keep a few of the transactions from the recent past in memory, and log to two 
disks. The first log disk will be synced before letting requests proceed and 
the second disk will not be synced. Recovery uses the first log disk and 
ensures that the second log disk has the same log at recovery time. The second 
log disk is used to look into the past. Using the two disks in this way allows 
synchronous logging to be fast because seeks are avoided on the disk with the 
synchronous log.
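
A minimal sketch of the two-log idea, using two files in place of two disks, 
is shown below. DualLog and its methods are hypothetical, the line-per-record 
format is invented for the example, and recovery is simplified to copying the 
synced log over the history log.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical two-log writer; not part of ZooKeeper.
class DualLog implements AutoCloseable {
    private final FileChannel syncLog;    // forced to disk on every append
    private final FileChannel historyLog; // never forced, read for past changes
    private final Path historyPath;

    DualLog(Path syncPath, Path historyPath) throws IOException {
        this.syncLog = open(syncPath);
        this.historyLog = open(historyPath);
        this.historyPath = historyPath;
    }

    private static FileChannel open(Path path) throws IOException {
        return FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    // Write to both logs; only the first write is synced before returning.
    void append(String record) throws IOException {
        byte[] line = (record + "\n").getBytes(StandardCharsets.UTF_8);
        writeFully(syncLog, line);
        syncLog.force(true);
        writeFully(historyLog, line);
    }

    private static void writeFully(FileChannel channel, byte[] bytes) throws IOException {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        while (buffer.hasRemaining()) {
            channel.write(buffer);
        }
    }

    // Looking into the past only touches the history log.
    List<String> readHistory() throws IOException {
        return Files.readAllLines(historyPath, StandardCharsets.UTF_8);
    }

    // At recovery time, make the history log match the synced log.
    // Call this before opening a DualLog on the same paths.
    static void recover(Path syncPath, Path historyPath) throws IOException {
        Files.copy(syncPath, historyPath, StandardCopyOption.REPLACE_EXISTING);
    }

    @Override
    public void close() throws IOException {
        syncLog.close();
        historyLog.close();
    }
}
```

Because readers never open the synced log, its file position only ever moves 
forward, which is the property the paragraph above relies on to avoid seeks.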
