> This is tricky: what happens if the server your client is connected to is
> decommissioned by a view change, and you are unable to locate another server
> to connect to because other view changes committed while you were
> reconnecting have removed all the servers you knew about? We'd need to make
> sure that watches on this znode were fired before a view change, but it's
> hard to know how to avoid having to wait for a session timeout before a
> client that might just be migrating servers reappears in order to make sure
> it sees the view change.
>
> Even then, the problem of "locating" the cluster still exists in the case
> that there are no clients connected to tell anyone about it.
Yes, this doesn't completely solve two issues:

1. Bootstrapping the cluster itself and its clients.
2. Major cluster reconfiguration (e.g. swapping out every node before clients can pick up the changes).

That said, I think it gets close and could still be useful.

For #1, you could simply require that the initial servers in the cluster be manually configured; servers could then be added and removed as needed. A new server would just need the address of one other server to "join" and get the full server list. Clients are in a similar situation: you still need a way to pass an initial server list (or at least one valid server) to the client, whether via HTTP, DNS, or a manual list, but from there the clients themselves could stay in sync with changes.

For #2, you could simply document that there are limits to how fast you can change the cluster, and that if you make too many changes too quickly, clients or servers may not pick up the change fast enough and will need to be restarted. In practice I don't think this will be much of an issue: as long as at least one server from the "starting" state stays up until everyone else has reconnected, everyone should eventually find that node and get the full server list.

-Dave
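The "seed with one valid server, then fetch the full list" idea above can be sketched roughly as follows. This is a minimal, self-contained simulation, not ZooKeeper's actual API: `discover_servers` and the `cluster` mapping (address → membership list that server would report) are hypothetical names invented for illustration.

```python
# Hypothetical sketch of the seed-list bootstrap described above.
# `cluster` stands in for the live ensemble: each reachable server
# reports the current full membership list. A client seeded with a
# partially stale list still discovers the current membership, as
# long as at least one seed is still a member.

def discover_servers(seed_servers, cluster):
    """Try each seed address until one is reachable, then return the
    full server list it reports. Raises if every seed is gone -- the
    'too many changes too fast' failure mode, where a restart with a
    fresh list is needed."""
    for addr in seed_servers:
        membership = cluster.get(addr)
        if membership is not None:  # this server is still up
            return list(membership)
    raise RuntimeError("no seed server reachable; restart with a fresh list")

# zk1 has been removed by reconfiguration, but zk3 is still up, so a
# client seeded with ["zk1", "zk3"] finds the new three-node membership.
cluster = {
    "zk2:2181": ["zk2:2181", "zk3:2181", "zk4:2181"],
    "zk3:2181": ["zk2:2181", "zk3:2181", "zk4:2181"],
    "zk4:2181": ["zk2:2181", "zk3:2181", "zk4:2181"],
}
print(discover_servers(["zk1:2181", "zk3:2181"], cluster))
```

In a real client the loop would attempt connections and then install a watch on the membership znode to stay in sync; the sketch only shows why one surviving "starting state" server is enough for everyone to converge.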