MOST minor versions support rollback - the exceptions are those where the 
internode protocol changes (3.0.14 being the only one in recent memory), or 
where the sstable format changes (again, rare). No major versions support 
rollback - the only way to do it is to upgrade in a way that lets you 
effectively reinstall the old version without data loss.
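
If you’re unsure what a node is actually running, check before you plan, and 
read NEWS.txt for the target release - its upgrade notes usually call out 
sstable and messaging format changes. For example:

    nodetool version    # prints the ReleaseVersion of the local node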

The steps usually look like:

- Test in a lab
- Test in a lab again
- Test in a lab a few more times
- Snapshot everything (example below)
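
A snapshot is just hardlinks to the current sstables, so it’s cheap to take; 
the tag name here is only an illustration. On every node:

    nodetool snapshot -t pre-upgrade
    # once the upgrade has settled, reclaim the space:
    nodetool clearsnapshot -t pre-upgrade

Keep the old binaries and configs around too - a snapshot alone doesn’t roll 
back the software.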

If you have a passive data center:
- upgrade one instance
- check to see if it’s happy
- upgrade another
- check to see if it’s happy
- continue until the passive dc is done
- if at any point a node is unhappy, rebuild the DC from the active DC - wipe 
the node, reinstall the old version, and restream (example below)
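
The “happy” checks and the rebuild path might look like this (the DC name is 
a placeholder):

    nodetool status            # every node UN?
    nodetool describecluster   # a single schema version?
    nodetool netstats          # any stuck streams?

    # after wiping a node and reinstalling the old version,
    # restream its data from the active DC:
    nodetool rebuild -- ACTIVE_DC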

On the active DCs, you’ll want to canary it one replica at a time so you can 
treat a failed upgrade like a bad disk:
- upgrade one instance
- check if it’s happy; if it’s not, treat it like a failed disk and replace it 
with a node running the old version (replacement sketch below)
- if you’re using single tokens, do another instance in a different replica 
set; repeat until you’re out of distinct replica sets
- if you’re using vnodes with a rack-aware snitch and have more racks than 
your RF, do another instance in the same rack as the canary; repeat until 
you’re out of instances in that rack
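
Replacing a failed canary is the same as replacing a dead node; a sketch, with 
the IP as a placeholder:

    # on a fresh machine running the OLD version, add to cassandra-env.sh:
    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.12"
    # then start cassandra; it takes over the canary's tokens and restreams its data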

This is typically your point of no return - as soon as you have two replicas 
on the new version, rollback is no longer practical.
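
Gossip carries every endpoint’s version, so you can watch how far along you 
are with, e.g.:

    nodetool gossipinfo | grep -E '^/|RELEASE_VERSION'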



-- 
Jeff Jirsa


> On Feb 27, 2018, at 9:22 AM, Carl Mueller <carl.muel...@smartthings.com> 
> wrote:
> 
> My speculation is that IF (big if) the sstable formats are compatible between 
> the versions, which probably isn't the case for major versions, then you 
> could drop back. 
> 
> If the sstables changed format, then you'll probably need to figure out how 
> to rewrite the sstables in the older format and then load them with 
> sstableloader into the older-version cluster if need be. Alas, while there is 
> an sstable upgrader, there isn't a downgrader AFAIK. 
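> 
> For reference, loading a table's sstables into another cluster looks 
> something like this (contact point and path are placeholders):
> 
>     sstableloader -d old-cluster-contact-point /path/to/my_keyspace/my_table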
> 
> And I don't have an intimate view of version-by-version sstable format 
> changes and compatibilities. You'd probably need to check the upgrade 
> instructions (which you presumably did if you're upgrading versions) to tell.
> 
> Basically, version rollback is rarely a realistic option.
> 
> The OTHER option:
> 
> 1) build a new cluster with the new version, no new data. 
> 
> 2) code your data layer to talk to both clusters. Write to 
> both, but read preferentially from the new, then fall through to the old. 
> Yes, that gets hairy on multi-row queries. Gradually port your data from the 
> old cluster to the new one with sstable loading. 
> 
> When you've done a full load of all the data from old to new, and you're 
> satisfied with the new cluster stability, retire the old cluster.
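> 
> A rough sketch of that wiring with the 3.x java-driver (hosts, keyspace, and 
> class name below are made up for illustration):
> 
>     import com.datastax.driver.core.*;
> 
>     public class DualClusterDao {
>         private final Session oldSession;
>         private final Session newSession;
> 
>         DualClusterDao(String oldHost, String newHost) {
>             oldSession = Cluster.builder().addContactPoint(oldHost).build().connect("my_ks");
>             newSession = Cluster.builder().addContactPoint(newHost).build().connect("my_ks");
>         }
> 
>         // write to both clusters so the new one catches up over time
>         void write(Statement stmt) {
>             newSession.execute(stmt);
>             oldSession.execute(stmt);
>         }
> 
>         // read the new cluster first; fall through to the old on a miss
>         Row readOne(Statement stmt) {
>             Row r = newSession.execute(stmt).one();
>             return (r != null) ? r : oldSession.execute(stmt).one();
>         }
>     }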
> 
> For merging two multi-row sets you'll probably need your multi-row queries to 
> return the partition hash value (or extract the code that generates the 
> hash), have two simultaneous java-driver ResultSets going, and merge their 
> results, providing the illusion of a single database query. You'll need to 
> pay attention to both the row key ordering and the column key ordering to 
> ensure the combined results are properly ordered.
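> 
> A minimal merge along those lines, assuming each query selects token(pk) as 
> its first column (a bigint under Murmur3Partitioner); it ignores 
> clustering-column ordering within a partition and partitions that exist in 
> both clusters, which you'd handle per your schema:
> 
>     import com.datastax.driver.core.ResultSet;
>     import com.datastax.driver.core.Row;
>     import java.util.ArrayList;
>     import java.util.Iterator;
>     import java.util.List;
> 
>     final class TokenMerge {
>         // emit rows in token order; on ties the new cluster's row comes first
>         static List<Row> mergeByToken(ResultSet newRs, ResultSet oldRs) {
>             List<Row> merged = new ArrayList<>();
>             Iterator<Row> a = newRs.iterator(), b = oldRs.iterator();
>             Row ra = a.hasNext() ? a.next() : null;
>             Row rb = b.hasNext() ? b.next() : null;
>             while (ra != null || rb != null) {
>                 if (rb == null || (ra != null && ra.getLong(0) <= rb.getLong(0))) {
>                     merged.add(ra);
>                     ra = a.hasNext() ? a.next() : null;
>                 } else {
>                     merged.add(rb);
>                     rb = b.hasNext() ? b.next() : null;
>                 }
>             }
>             return merged;
>         }
>     }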
> 
> Writes will be slowed by the double-writes; on reads you'll be bound by the 
> worse-performing cluster.
> 
>> On Tue, Feb 27, 2018 at 8:23 AM, Kenneth Brotman 
>> <kenbrot...@yahoo.com.invalid> wrote:
>> Could you tell us the size and configuration of your Cassandra cluster?
>> 
>> 
>> Kenneth Brotman
>> 
>> 
>> From: shalom sagges [mailto:shalomsag...@gmail.com] 
>> Sent: Tuesday, February 27, 2018 6:19 AM
>> To: user@cassandra.apache.org
>> Subject: Version Rollback
>> 
>> 
>> Hi All,
>> 
>> I'm planning to upgrade my C* cluster to version 3.x and was wondering 
>> what's the best way to perform a rollback if need be.
>> 
>> If I used snapshot restoration, I could face data loss, depending on when I 
>> took the snapshot (e.g. a rollback might be required after upgrading half 
>> the cluster).
>> 
>> If I add another DC to the cluster with the old version, then I could point 
>> the apps to that DC if anything bad happens, but building it is really 
>> time-consuming and requires a lot of resources.
>> 
>> Can anyone provide recommendations on this matter? Any ideas on how to make 
>> the upgrade foolproof, or at least "really really safe"?
>> 
>> 
>> Thanks!
>> 
>> 
> 
