These are really good directions. Thanks a lot everyone!

@Kenneth - The cluster consists of 44 nodes, version 2.0.14, ~2.5TB of
data per node. It's going to be a major version upgrade (or upgrades, to be
exact; version 3.x is the target).

@Jeff, I have a passive DC. What if I upgrade the passive DC first and, if
all goes well, point the applications at the passive DC, then upgrade the
active DC? Is this doable?
Also, would you suggest upgrading one node (binaries), upgrading its
SSTables, and then moving to the second node, the third, and so on? Or
should I first upgrade the binaries on all nodes and only then start
upgrading the SSTables?

Thanks!



On Tue, Feb 27, 2018 at 7:47 PM, Jeff Jirsa <jji...@gmail.com> wrote:

> MOST minor versions support rollback - the exceptions are those where the
> internode protocol changes (3.0.14 being the only one in recent memory) or
> where the sstable format changes (again, rare). No major versions support
> rollback - the only way to do it is to upgrade in a way that lets you
> effectively reinstall the old version without data loss.
>
> The steps usually look like:
>
> Test in a lab
> Test in a lab again
> Test in a lab a few more times
> Snapshot everything
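>
> For illustration, here's a minimal sketch of the snapshot step that calls
> StorageService.takeSnapshot on each node over JMX (the host list, the
> default JMX port 7199, and the snapshot tag are assumptions; running
> nodetool snapshot on each node does the same thing):
>
>     import javax.management.MBeanServerConnection;
>     import javax.management.ObjectName;
>     import javax.management.remote.JMXConnector;
>     import javax.management.remote.JMXConnectorFactory;
>     import javax.management.remote.JMXServiceURL;
>
>     public class SnapshotAll {
>         public static void main(String[] args) throws Exception {
>             // Hypothetical host list -- replace with your own nodes.
>             String[] hosts = {"10.0.0.1", "10.0.0.2"};
>             String tag = "pre-upgrade-3x"; // hypothetical snapshot tag
>             for (String host : hosts) {
>                 JMXServiceURL url = new JMXServiceURL(
>                     "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
>                 JMXConnector jmx = JMXConnectorFactory.connect(url);
>                 try {
>                     MBeanServerConnection mbs = jmx.getMBeanServerConnection();
>                     ObjectName ss = new ObjectName(
>                         "org.apache.cassandra.db:type=StorageService");
>                     // An empty keyspace list snapshots all keyspaces,
>                     // as nodetool snapshot does with no arguments.
>                     mbs.invoke(ss, "takeSnapshot",
>                         new Object[]{tag, new String[0]},
>                         new String[]{"java.lang.String", "[Ljava.lang.String;"});
>                     System.out.println("snapshot taken on " + host);
>                 } finally {
>                     jmx.close();
>                 }
>             }
>         }
>     }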
>
> If you have a passive data center:
> - upgrade one instance
> - check to see if it’s happy
> - upgrade another
> - check to see if it’s happy
> - continue until the passive dc is done
> - if at any point they’re unhappy, rebuild the dc from the active dc (wipe
> and restream the old version)
>
> On the active DCs, you’ll want to canary it one replica at a time so you
> can treat a failed upgrade like a bad disk:
> - upgrade one instance
> - check if it’s happy; if it’s not, treat it like a failed disk and
> replace it with the old version
> - if you’re using single token, do another instance in a different replica
> set, repeat until you’re out of different replicas.
> - if you’re using vnodes but a rack-aware snitch and have more racks than
> your RF, do another instance in the same rack as the canary, repeat until
> you’re out of instances in that rack
>
> This is typically your point of no return - as soon as you have two
> replicas on the new version, rollback is no longer practical.
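>
> As a rough illustration of the "check to see if it’s happy" steps above,
> here's a minimal java-driver sketch that flags mixed-version state by
> querying release_version from system.local and system.peers (the contact
> point is an assumption, and in practice you'd also watch logs, nodetool
> status, and your metrics):
>
>     import com.datastax.driver.core.Cluster;
>     import com.datastax.driver.core.Row;
>     import com.datastax.driver.core.Session;
>
>     public class VersionCheck {
>         public static void main(String[] args) {
>             // Hypothetical contact point -- any reachable node will do.
>             try (Cluster cluster = Cluster.builder()
>                     .addContactPoint("10.0.0.1").build();
>                  Session session = cluster.connect()) {
>                 String local = session.execute(
>                     "SELECT release_version FROM system.local")
>                     .one().getString(0);
>                 System.out.println("contact point runs " + local);
>                 // Compare every peer's version against the contact point.
>                 for (Row peer : session.execute(
>                         "SELECT peer, release_version FROM system.peers")) {
>                     String v = peer.getString("release_version");
>                     System.out.println(peer.getInet("peer") + " runs " + v
>                         + (local.equals(v) ? "" : "  <-- mixed version"));
>                 }
>             }
>         }
>     }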
>
>
>
> --
> Jeff Jirsa
>
>
> On Feb 27, 2018, at 9:22 AM, Carl Mueller <carl.muel...@smartthings.com>
> wrote:
>
> My speculation is that IF (big if) the sstable formats are compatible
> between the versions, which probably isn't the case for major versions,
> then you could drop back.
>
> If the sstables changed format, then you'll probably need to figure out
> how to rewrite the sstables in the older format and then sstableloader them
> into the older-version cluster if need be. Alas, while there is an sstable
> upgrader, there isn't a downgrader AFAIK.
>
> And I don't have an intimate view of version-by-version sstable format
> changes and compatibilities. You'd probably need to check the upgrade
> instructions (which you presumably did if you're upgrading versions) to
> tell.
>
> Basically, a version rollback is rarely practical.
>
> The OTHER option:
>
> 1) build a new cluster with the new version, no new data.
>
> 2) code your driver layer to talk to both clusters. Write to
> both, but read preferentially from the new, then fall through to the old.
> Yes, that gets hairy on multi-row queries. Gradually port your data from
> the old cluster to the new one with sstable loading.
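>
> To make step 2 concrete, here's a minimal java-driver sketch of the
> dual-write, read-new-first pattern (the ks.tbl schema is hypothetical;
> retries, async writes, and error handling are left out):
>
>     import com.datastax.driver.core.Row;
>     import com.datastax.driver.core.Session;
>
>     public class DualClusterDao {
>         private final Session oldSession;
>         private final Session newSession;
>
>         DualClusterDao(Session oldSession, Session newSession) {
>             this.oldSession = oldSession;
>             this.newSession = newSession;
>         }
>
>         // Write to both clusters so neither falls behind during migration.
>         void upsert(String id, String value) {
>             String cql = "INSERT INTO ks.tbl (id, value) VALUES (?, ?)";
>             newSession.execute(cql, id, value);
>             oldSession.execute(cql, id, value);
>         }
>
>         // Read preferentially from the new cluster, fall through to the old.
>         Row get(String id) {
>             String cql = "SELECT id, value FROM ks.tbl WHERE id = ?";
>             Row row = newSession.execute(cql, id).one();
>             return row != null ? row : oldSession.execute(cql, id).one();
>         }
>     }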
>
> When you've done a full load of all the data from old to new, and you're
> satisfied with the new cluster stability, retire the old cluster.
>
> For merging two multirow sets you'll probably need your multirow queries
> to return the partition hash value (or extract the code that generates the
> hash), have two simultaneous java-driver ResultSets going, and merge
> their results, providing the illusion of a single database query. You'll
> need to pay attention to both the row key ordering and the column key
> ordering to ensure the combined results are properly ordered.
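>
> A sketch of that merge, assuming a Murmur3 partitioner (so token() is a
> bigint), a hypothetical table ks.tbl(id, ck, val), and token-ordered scans
> on both clusters such as SELECT token(id) AS t, id, ck, val FROM ks.tbl;
> it merge-sorts the two ResultSet iterators by (token, clustering key) and
> prefers the new cluster on ties:
>
>     import com.datastax.driver.core.ResultSet;
>     import com.datastax.driver.core.Row;
>     import java.util.Iterator;
>
>     public class MergedScan {
>         static void merge(ResultSet newRs, ResultSet oldRs) {
>             Iterator<Row> a = newRs.iterator(), b = oldRs.iterator();
>             Row ra = a.hasNext() ? a.next() : null;
>             Row rb = b.hasNext() ? b.next() : null;
>             while (ra != null || rb != null) {
>                 int cmp;
>                 if (ra == null) cmp = 1;        // only old rows left
>                 else if (rb == null) cmp = -1;  // only new rows left
>                 else {
>                     cmp = Long.compare(ra.getLong("t"), rb.getLong("t"));
>                     if (cmp == 0) // same partition: order by clustering key
>                         cmp = ra.getString("ck").compareTo(rb.getString("ck"));
>                 }
>                 if (cmp <= 0) {
>                     emit(ra); // the new cluster wins ties
>                     if (cmp == 0) rb = b.hasNext() ? b.next() : null;
>                     ra = a.hasNext() ? a.next() : null;
>                 } else {
>                     emit(rb);
>                     rb = b.hasNext() ? b.next() : null;
>                 }
>             }
>         }
>
>         static void emit(Row row) {
>             System.out.println(row.getString("id") + "/" + row.getString("ck")
>                 + " = " + row.getString("val"));
>         }
>     }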
>
> Writes will be slowed by the double-writes, and reads will be bound by the
> worse-performing cluster.
>
> On Tue, Feb 27, 2018 at 8:23 AM, Kenneth Brotman <
> kenbrot...@yahoo.com.invalid> wrote:
>
>> Could you tell us the size and configuration of your Cassandra cluster?
>>
>>
>>
>> Kenneth Brotman
>>
>>
>>
>> From: shalom sagges [mailto:shalomsag...@gmail.com]
>> Sent: Tuesday, February 27, 2018 6:19 AM
>> To: user@cassandra.apache.org
>> Subject: Version Rollback
>>
>>
>>
>> Hi All,
>>
>> I'm planning to upgrade my C* cluster to version 3.x and was wondering
>> what's the best way to perform a rollback if need be.
>>
>> If I used snapshot restoration, I could face data loss, depending on when
>> I took the snapshot (e.g. a rollback might be required after upgrading
>> half the cluster).
>>
>> If I add another DC to the cluster with the old version, then I could
>> point the apps to talk to that DC if anything bad happens, but building it
>> is really time-consuming and requires a lot of resources.
>>
>> Can anyone provide recommendations on this matter? Any ideas on how to
>> make the upgrade foolproof, or at least "really really safe"?
>>
>>
>>
>> Thanks!
>>
>>
>>
>
>
