Anyone? Bueller? :-)

We eventually got where we wanted to be by installing Riak 1.1.1 on the new nodes, copying the data directories from the old nodes, issuing a "reip" on all the new nodes, starting up, waiting for partition handoffs to complete, shutting down, upgrading to 1.2.1 and starting up again. But this is not very convenient.
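For anyone else in the same boat, the sequence on each new node looked roughly like this (node names are placeholders rather than our real hosts, and this is from memory, not a copy-paste of our runbook):

   # with Riak 1.1.1 installed and the old node's data directory copied
   # into place, run reip while the node is stopped, once per old/new pair:
   riak-admin reip riak@old-node1 riak@new-node1

   riak start
   riak-admin transfers    # wait until no partition handoffs remain
   riak stop

   # then upgrade the package to 1.2.1 and start again
   riak start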

What do I do when it comes to creating our test environment, where I'll want to copy production data onto the test nodes on a regular basis? At that point I won't have the "luxury" of downgrading to 1.1.1 to get a working "reip" command.

Surely there's gotta be an easier way to spin up a new cluster with new names and IPs but with old data?

Shane.

On 08/11/12 21:10, Shane McEwan wrote:
G'day!

Just to add to the list of people asking questions about migrating to
1.2.1 . . .

We're about to migrate our 4-node production Riak database from 1.1.1 to
1.2.1. At the same time we're also migrating from virtual machines to
physical machines. These machines will have new names and IP addresses.

The process of doing rolling upgrades is well documented but I'm unsure
of the correct procedure for moving to an entirely new cluster.

We have the luxury of a maintenance window so we don't need to keep
everything running during the migration. Therefore the current plan is
to stop the current cluster, copy the Riak data directories to the new
machines and start up the new cluster. The hazy part of the process is
how we "reip" the database so it will work in the new cluster.

We've tried using the "riak-admin reip" command but were left with one
of our nodes in "(legacy)" mode according to "riak-admin member-status".
From an earlier e-mail thread[1] it seems that "reip" is deprecated and
that we should be doing a "cluster force-replace" instead.
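(In case it helps anyone reading this later: after the reip attempt we
checked the new cluster with something like

   riak-admin member-status
   riak-admin ring-status

and member-status was what flagged one member as "(legacy)".)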

So, would the new procedure be the following? (I've put a rough
command-level sketch of steps 3 to 6 after the list.)

1. Shut down the old cluster.
2. Copy the data directories.
3. Start the new cluster. (QUESTION: The new nodes don't own any of the
partitions in the data directory. What do they do?) (QUESTION: The new
nodes won't be part of a cluster yet. Do I need to "join" them before I
can run any of the following commands? Or do I just put all the joins and
force-replace commands into the same plan and commit them all together?)
4. Issue "riak-admin cluster force-replace old-node1 new-node1".
(QUESTION: Do I run this command just on "new-node1" or on all nodes?)
5. Issue "force-replace" commands for the remaining three nodes.
6. Issue a "cluster plan" and "cluster commit" to commit the changes.
7. Cross fingers.
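Concretely, here's my rough sketch of what I think steps 3 to 6 would
look like (node names are placeholders, and I'm assuming the joins and
force-replaces can all be staged in a single plan; please correct me if
that's wrong):

   # on new-node2, new-node3 and new-node4, if joins are needed at all:
   riak-admin cluster join riak@new-node1

   # then, on any one of the new nodes, stage the replacements:
   riak-admin cluster force-replace riak@old-node1 riak@new-node1
   riak-admin cluster force-replace riak@old-node2 riak@new-node2
   riak-admin cluster force-replace riak@old-node3 riak@new-node3
   riak-admin cluster force-replace riak@old-node4 riak@new-node4

   # review and apply the staged changes:
   riak-admin cluster plan
   riak-admin cluster commit

   # afterwards, check that no member shows up as "(legacy)":
   riak-admin member-status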

In my mind the "replace" and/or "force-replace" commands are something
we would use if we had a failed node and needed to bring a spare online
to take over. It doesn't feel like something you would do when you don't
already have a cluster in place and need to "replace" ALL nodes.

Of course, we want to test this procedure before doing it for real. What
are the risks of running the above procedure while the old cluster is
still running? The new nodes are on a segregated network and shouldn't
be able to contact the old nodes, but what would happen if we did the
above and found the network wasn't as segregated as we originally
thought? Would the new nodes start trying to communicate with the old
nodes before the "force-replace" could take effect? Or, because all the
cluster changes are applied atomically, is there no risk of that?

Sorry for all the questions. I'm just trying to get a clear procedure
for moving an entire cluster to new hardware and hopefully this thread
will help other people in the future.

Thanks in advance!

Shane.

[1] http://comments.gmane.org/gmane.comp.db.riak.user/8418


_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
