Michael, I am interested in this procedure, but I have never attempted it myself. It seems that the concept advanced in [1] is to manually replicate the data, then start a new ensemble with different ports using the replicated data, and finally instruct your clients to talk to the new ensemble. This procedure would definitely cause any ephemeral znodes to be lost during the migration since the client connections could not be transferred to the new ensemble without being dropped. Essentially, each client becomes disconnected from the standalone server instance and then is reconnected to the new ensemble of highly available server instances.
Given that the clients must become disconnected, at least temporarily, it seems that you cannot obtain 100% uptime from a client perspective during this migration. I.e., each client would have to be either restarted or (if you architect for it in your client) instructed through an API to disconnect from one zk server configuration and connect to another. Either way, the client would be disconnected during its transition. However, the service provided by your clients could remain up as long as that service was able to move transparently from clients connected to the standalone node to clients connected to the ensemble. But at the service level, there would have to be some point at which you stopped relying on the data in the standalone instance and began to rely on the data in the new ensemble. If there are writes on the standalone instance after you manually replicate its data, then those writes would not be present in the ensemble. From a service level, those writes would have been lost.

I would be interested in a procedure to make this migration seamless, but I can't see how it would be accomplished without:

- Halt writes on zookeeper.
- Replicate the standalone zookeeper server state to a zookeeper ensemble with at least two instances (a quorum can meet with two servers, although both must be up for it to meet). The servers will need their myid files. If you start one of these servers on the same machine, then you need to use a different client port for the new ensemble. (A config sketch is below.)
- Start the servers in the new ensemble. Quorum should meet, a leader should be elected, etc.
- Change the client configuration to point to the servers in the new ensemble.
- Restart the clients. This moves them from the old standalone zookeeper instance (which nobody should be writing to) to the new ensemble (which is read/write). (A sketch of the client-side handoff is below.)
- Terminate the old standalone zookeeper server instance.

I think that a procedure to increase the replication count of a zookeeper ensemble would be similar (also sketched below):

- Start a new server in the zookeeper ensemble. This server should know about the original servers plus itself.
- For each existing zookeeper server, change the server configuration to include the new server and restart the service (rolling restart). This makes the servers mutually aware of the new server.
- For each client, change the client configuration to include the new zookeeper ensemble list and restart that client.

Given all of this, I suggest that the right way to move from a single node deployment to a highly available deployment is to begin with a zookeeper ensemble running on the initial node:

- Begin with a single node with 3 zookeeper server instances configured as an ensemble (there are instructions somewhere for running multiple zk instances on the same node - the ports need to be specified such that they do not conflict; see the sketch below).

To move from a single node to multiple nodes:

- Configure and start a new zookeeper server instance on another node. It should know about 2 of the original instances.
- Rolling reconfigure and restart of the zookeeper servers. The server instance that is being migrated is terminated rather than being restarted.
- Rolling reconfigure and restart of the zookeeper clients. On restart, the clients will know about the new zookeeper ensemble locations.

This would leave you with two zookeeper server instances on the original node and one somewhere else. You would then repeat that procedure to migrate one of the two remaining zookeeper server instances to another node. That would give you one zookeeper server instance per node.
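For concreteness, here is roughly what I would expect the config of each member of the new two-server ensemble to look like. This is only a sketch (the hostnames, ports, and paths are made up, and I have not tested it):

    # zoo.cfg on each new ensemble member
    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/var/lib/zk-ensemble   # holds the copied snapshot/log data plus a myid file
    clientPort=2182                # must differ from the standalone's 2181 if colocated
    server.1=host-a:2888:3888
    server.2=host-b:2888:3888

    # myid on each server matches its server.N line:
    #   on host-a: echo 1 > /var/lib/zk-ensemble/myid
    #   on host-b: echo 2 > /var/lib/zk-ensemble/myid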
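The client-side handoff via an API would amount to something like the following with the plain Java client (connect strings made up, error handling omitted, and "watcher" is whatever Watcher your client already uses). The point is that the old session is closed, dropping any ephemeral znodes, and a brand new session is opened against the ensemble:

    import org.apache.zookeeper.ZooKeeper;

    // old handle, connected to the standalone server
    ZooKeeper standalone = new ZooKeeper("standalone-host:2181", 30000, watcher);
    // ... existing work against the standalone instance ...

    // handoff: close the old session, then connect to the new ensemble
    standalone.close();
    ZooKeeper ensemble = new ZooKeeper("host-a:2182,host-b:2181", 30000, watcher);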
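Similarly, for the three-instances-on-one-node starting point, each instance would get its own config and data directory, with every port distinct since they share a machine. Again, a sketch only:

    # zk1/zoo.cfg (zk2 and zk3 are identical except as noted)
    dataDir=/var/lib/zk1           # zk2 -> /var/lib/zk2, zk3 -> /var/lib/zk3
    clientPort=2181                # zk2 -> 2182, zk3 -> 2183
    server.1=localhost:2888:3888   # the quorum and election ports must also
    server.2=localhost:2889:3889   # differ, since all three instances are
    server.3=localhost:2890:3890   # on the same host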
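And the replication-count increase would be a one-line config change plus the rolling restarts, e.g. adding a fourth server (host name made up):

    # appended to every server's zoo.cfg before its rolling restart; the new
    # server starts with the full four-entry list and a myid file containing 4
    server.4=host-d:2888:3888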
You could then follow the procedure to increase the replication count if you wanted to increase the availability of zookeeper beyond those three nodes.

I have not tested any of this. This is just the way I could see it working based on my understanding of zookeeper. I am interested in a procedure for managing this because we have a service that uses zookeeper to coordinate failover. We can manage the increase of replication in our own services and their durable state easily enough, but I am not sure how to manage this for zookeeper.

All of the above is complicated enough that it seems it would be easier to begin with three VMs running zookeeper and then migrate the VMs if necessary, ideally without changing their IPs.

Thanks,
Bryan

[1] http://zookeeper-user.578899.n2.nabble.com/safely-upgrade-single-instance-to-ensemble-td7578716.html

On 11/21/13 5:20 AM, "michael.boom" <[email protected]> wrote:

>http://zookeeper-user.578899.n2.nabble.com/From-standalone-ZK-instance-to-
>3-instances-tp7579325p7579349.html
