G'day!

Just to add to the list of people asking questions about migrating to 1.2.1 . . .

We're about to migrate our 4-node production Riak database from 1.1.1 to 1.2.1. At the same time we're also migrating from virtual machines to physical machines. The new machines will have new names and IP addresses.

The process of doing rolling upgrades is well documented but I'm unsure of the correct procedure for moving to an entirely new cluster.

We have the luxury of a maintenance window so we don't need to keep everything running during the migration. Therefore the current plan is to stop the current cluster, copy the Riak data directories to the new machines and start up the new cluster. The hazy part of the process is how we "reip" the database so it will work in the new cluster.

We've tried using the "riak-admin reip" command but were left with one of our nodes in "(legacy)" mode according to "riak-admin member-status". From an earlier E-Mail thread[1] it seems like "reip" is deprecated and we should be doing a "cluster force replace" instead.

So, would the new procedure be the following?

1. Shut down the old cluster.
2. Copy the data directories to the new machines.
3. Start the new cluster.
   (QUESTION: The new nodes don't own any of the partitions in the data directory. What does it do?)
   (QUESTION: The new nodes won't be part of a cluster yet. Do I need to "join" them before I can do any of the following commands? Or do I just put all the joins and force-replace commands into the same plan and commit it all together?)
4. Issue "riak-admin cluster force-replace old-node1 new-node1".
   (QUESTION: Do I run this command just on "new-node1" or on all nodes?)
5. Issue "force-replace" commands for the remaining three nodes.
6. Issue a "cluster plan" and "cluster commit" to commit the changes.
7. Cross fingers.
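
To make the steps above concrete, here is a rough dry-run sketch in Python of the command sequence we have in mind. It only builds and prints the commands; it runs nothing. The node names are hypothetical placeholders, staging everything from the first new node is an assumption, and the include_joins flag stands for the open question of whether the joins belong in the same plan as the force-replace commands. Please treat it as an illustration of the question, not a confirmed procedure.

```python
# Dry run only: builds and prints the command sequence, runs nothing.
# All node names below are hypothetical placeholders.
OLD_TO_NEW = {
    "riak@old-node1": "riak@new-node1",
    "riak@old-node2": "riak@new-node2",
    "riak@old-node3": "riak@new-node3",
    "riak@old-node4": "riak@new-node4",
}


def build_plan(old_to_new, include_joins=False):
    """Return (run_on, command) pairs in the order they would be issued."""
    new_nodes = list(old_to_new.values())
    # Assumption: every change is staged from a single node.
    # Whether that is correct is one of the open questions.
    staging_node = new_nodes[0]
    steps = []
    if include_joins:
        # Open question: must the other new nodes join before force-replace?
        for node in new_nodes[1:]:
            steps.append((node, f"riak-admin cluster join {staging_node}"))
    for old, new in old_to_new.items():
        steps.append(
            (staging_node, f"riak-admin cluster force-replace {old} {new}")
        )
    # Review the staged changes, then commit them in one go.
    steps.append((staging_node, "riak-admin cluster plan"))
    steps.append((staging_node, "riak-admin cluster commit"))
    return steps


if __name__ == "__main__":
    for run_on, command in build_plan(OLD_TO_NEW, include_joins=True):
        print(f"[{run_on}] {command}")
```

Without joins this yields six commands (four force-replaces, then plan and commit); with joins it adds one join per additional new node in front.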

In my mind the "replace" and/or "force-replace" commands are something we would use if we had a failed node and needed to bring a spare online to take over. It doesn't feel like something you would do if you don't already have a cluster in place and need to "replace" ALL nodes.

Of course, we want to test this procedure before doing it for real. What are the risks of running through the above procedure while the old cluster is still running? The new nodes are on a segregated network and shouldn't be able to contact the old nodes, but what would happen if we did the above and found the network wasn't as segregated as we originally thought? Would the new nodes start trying to communicate with the old nodes before the "force-replace" can take effect? Or, because all the cluster changes are atomic, is there no risk of that?
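
As a sanity check before the test run, we could probe from one of the new machines whether any old host accepts TCP connections. Below is a minimal Python sketch of that idea. It assumes the old nodes run epmd on its default port 4369; the host names passed in would be our own. A refused connection on that one port does not prove the networks are isolated, since Riak also uses other ports, so this would only catch the obvious case.

```python
import socket

EPMD_PORT = 4369  # default port of the Erlang port mapper daemon (epmd)


def is_reachable(host, port=EPMD_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unresolvable name, and so on.
        return False


def reachable_old_nodes(old_hosts, port=EPMD_PORT, timeout=2.0):
    """Return the subset of old_hosts that accept a connection on port."""
    return [h for h in old_hosts if is_reachable(h, port, timeout)]
```

An empty list from reachable_old_nodes() is what we would hope to see on the new machines before starting Riak there.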

Sorry for all the questions. I'm just trying to get a clear procedure for moving an entire cluster to new hardware and hopefully this thread will help other people in the future.

Thanks in advance!

Shane.

[1] http://comments.gmane.org/gmane.comp.db.riak.user/8418

