RE: [EXTERNAL] Re: Version Rollback

2018-02-28 Thread Durity, Sean R
My short answer is always – there are no rollbacks, we only go forward.  Jeff’s 
answer is much more complete and technically precise. You *could* roll back a 
few nodes (depending on topology) by just replacing them as if they had died.

I always upgrade all nodes (the binaries) as quickly as possible (but one node 
at a time). The application stays up, stays happy, and my customers love 
“always up” Cassandra. I have clusters where we have done 3 or more major 
upgrades with 0 downtime for the application. One of the best things about 
supporting Cassandra! One-node-at-a-time upgrades can also be automated (which 
we have done).

After upgrading binaries on all nodes, I execute upgradesstables on groups of 
nodes (depending on load, hardware, cluster size, etc.). Reasoning: You cannot 
do any streaming operations (bootstrap, repairs) in a mixed-version cluster 
(except for maybe very minor version upgrades).
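
A minimal sketch of that ordering (all binaries first, one node at a time, then upgradesstables in groups), written in Python purely as a plan generator. The node names, the `group_size` parameter and the angle-bracket placeholder steps are assumptions for illustration; only `nodetool drain`, `nodetool status` and `nodetool upgradesstables` are real commands, and nothing here is executed against a cluster.

```python
from typing import List


def binary_upgrade_steps(node: str) -> List[str]:
    """Steps for one node. Angle-bracket entries are placeholders that
    depend on your packaging and init system (hypothetical)."""
    return [
        f"{node}: nodetool drain",             # flush memtables, stop accepting traffic
        f"{node}: <stop cassandra service>",   # placeholder
        f"{node}: <install new binaries>",     # placeholder
        f"{node}: <start cassandra service>",  # placeholder
        f"{node}: nodetool status",            # check the node is back before moving on
    ]


def rolling_plan(nodes: List[str], group_size: int) -> List[str]:
    """Phase 1: binaries on every node, strictly one node at a time.
    Phase 2: upgradesstables on groups of nodes."""
    plan: List[str] = []
    for node in nodes:
        plan.extend(binary_upgrade_steps(node))
    for i in range(0, len(nodes), group_size):
        group = nodes[i:i + group_size]
        plan.append(f"{', '.join(group)}: nodetool upgradesstables")
    return plan


if __name__ == "__main__":
    for step in rolling_plan(["n1", "n2", "n3"], group_size=2):
        print(step)
```

The group size is the knob to tune for load, hardware and cluster size; the sketch only fixes the order of the two phases.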


Sean Durity

Re: Version Rollback

2018-02-28 Thread shalom sagges
These are really good directions. Thanks a lot everyone!

@Kenneth - The cluster is comprised of 44 nodes, version 2.0.14, ~2.5TB of
data per node. It's gonna be a major version upgrade (or upgrades to be
exact... version 3.x is the target).

@Jeff, I have a passive DC. What if I upgrade the passive DC and, if all
goes well, move the applications to work with the passive DC and then
upgrade the active DC? Is this doable?
Also, would you suggest upgrading one node (binaries), upgrading its
SSTables, and then moving on to the second node, then the third, etc., or
first upgrading the binaries on all nodes and only then starting the
SSTables upgrade?

Thanks!




Re: Version Rollback

2018-02-27 Thread Jeff Jirsa
MOST minor versions support rollback - the exceptions are those where the
internode protocol changes (3.0.14 being the only one in recent memory), or
where the sstable format changes (again rare). No major versions support
rollback - the only way to do it is to upgrade in a way that lets you
effectively reinstall the old version without data loss.

The steps usually look like:

Test in a lab
Test in a lab again
Test in a lab a few more times
Snapshot everything 

If you have a passive data center:
- upgrade one instance
- check to see if it’s happy
- upgrade another
- check to see if it’s happy
- continue until the passive dc is done
- if at any point they’re unhappy, rebuild the dc from the active dc (wipe, go
back to the old version, and restream)

On the active DCs, you’ll want to canary it one replica at a time so you can 
treat a failed upgrade like a bad disk:
- upgrade one instance
- check if it’s happy; if it’s not, treat it like a failed disk and replace it
with a node on the old version
- if you’re using single token, do another instance in a different replica set, 
repeat until you’re out of different replicas. 
- if you’re using vnodes but a rack aware snitch and have more racks than your 
RF, do another instance in the same rack as the canary, repeat until you’re out 
of instances in that rack

This is typically your point of no return - as soon as you have two replicas on
the new version, there’s no practical rollback anymore.
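
The single-token canary rule and the "two replicas" point of no return can be sketched as below. This is an illustration under loud assumptions: one DC, one token per node, and SimpleStrategy-style placement where each range is replicated on its owner plus the next RF-1 nodes clockwise. The function names and the node names are hypothetical.

```python
from typing import List, Set


def replica_sets(ring: List[str], rf: int) -> List[Set[str]]:
    """One replica set per token range: the owner plus the next rf-1 nodes
    clockwise (assumed SimpleStrategy-style placement)."""
    n = len(ring)
    return [{ring[(i + k) % n] for k in range(rf)} for i in range(n)]


def rollback_still_practical(ring: List[str], rf: int, upgraded: Set[str]) -> bool:
    """True while no replica set contains two or more upgraded nodes."""
    return all(len(rs & upgraded) <= 1 for rs in replica_sets(ring, rf))


def pick_canaries(ring: List[str], rf: int) -> List[str]:
    """Greedily pick nodes to upgrade so that every replica set still has at
    most one upgraded member ("another instance in a different replica set")."""
    chosen: List[str] = []
    for node in ring:
        if rollback_still_practical(ring, rf, set(chosen) | {node}):
            chosen.append(node)
    return chosen


if __name__ == "__main__":
    ring = [f"n{i}" for i in range(6)]
    print(pick_canaries(ring, rf=3))
    print(rollback_still_practical(ring, 3, {"n0", "n1"}))
```

Once `pick_canaries` runs out of candidates, the next upgrade necessarily puts a second new-version replica into some replica set, which is the point of no return described above.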



-- 
Jeff Jirsa




RE: Version Rollback

2018-02-27 Thread Kenneth Brotman
I wonder if you could use Apache Spark to do it?

 

Kenneth Brotman

 




Re: Version Rollback

2018-02-27 Thread Carl Mueller
My speculation is that IF (bigif) the sstable formats are compatible
between the versions, which probably isn't the case for major versions,
then you could drop back.

If the sstables changed format, then you'll probably need to figure out how
to rewrite the sstables in the older format and then sstableloader them in
the older-version cluster if need be. Alas, while there is an sstable
upgrader, there isn't a downgrader AFAIK.

And I don't have an intimate view of version-by-version sstable format
changes and compatibilities. You'd probably need to check the upgrade
instructions (which you presumably did if you're upgrading versions) to
tell.

Basically, version rollback is pretty unlikely to be done.

The OTHER option:

1) build a new cluster with the new version, no new data.

2) code your driver interfaces to interface with both clusters. Write to
both, but read preferentially from the new, then fall through to the old.
Yes, that gets hairy on multiple row queries. Port your data with sstable
loading from the old to the new gradually.

When you've done a full load of all the data from old to new, and you're
satisfied with the new cluster stability, retire the old cluster.
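
A rough sketch of the write-to-both, read-new-then-old idea for single-key access. The two dicts are stand-ins for driver sessions against the old and new clusters (an assumption for illustration, not any driver's API), and the class name is hypothetical. A real version would also have to decide what to do when one of the two writes fails.

```python
from typing import Dict, Optional


class DualClusterStore:
    """Writes go to both clusters; reads prefer the new cluster and fall
    through to the old one for data that has not been ported yet."""

    def __init__(self, old: Dict[str, str], new: Dict[str, str]) -> None:
        self.old = old  # stand-in for a session on the old-version cluster
        self.new = new  # stand-in for a session on the new-version cluster

    def write(self, key: str, value: str) -> None:
        # Double write: this is where the extra write latency comes from.
        self.new[key] = value
        self.old[key] = value

    def read(self, key: str) -> Optional[str]:
        # Prefer the new cluster; only consult the old one on a miss.
        if key in self.new:
            return self.new[key]
        return self.old.get(key)


if __name__ == "__main__":
    store = DualClusterStore(old={"a": "1"}, new={})
    store.write("b", "2")
    print(store.read("a"), store.read("b"), store.read("missing"))
```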

For merging two multirow sets you'll probably need your multirow queries to
return the partition hash value (or extract the code that generates the
hash), and have two simultaneous java-driver ResultSets going, and merge
their results, providing the illusion of a single database query. You'll
need to pay attention to both the row key ordering and column key ordering
to ensure the combined results are properly ordered.

Writes will be slowed by the double-writes; for reads you'll be bound by the
worse-performing cluster.
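
The merge itself might look something like the sketch below, shown in Python rather than with java-driver ResultSets. It assumes each cluster returns rows already ordered by (partition token, clustering key), that each row carries its token, and that the new cluster wins when both have the same row; the row layout and the function name are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, Tuple

# (partition token, clustering key, value): an assumed row shape
Row = Tuple[int, str, str]


def merge_results(new_rows: Iterable[Row], old_rows: Iterable[Row]) -> Iterator[Row]:
    """Merge two result streams that are each sorted by (token, clustering key).
    On duplicates the row from the new cluster is kept."""
    # Tag each row with a source priority so the new cluster sorts first on ties.
    tagged_new = ((r[0], r[1], 0, r) for r in new_rows)
    tagged_old = ((r[0], r[1], 1, r) for r in old_rows)
    last_key = None
    for token, clustering, _, row in heapq.merge(
        tagged_new, tagged_old, key=lambda t: t[:3]
    ):
        if (token, clustering) == last_key:
            continue  # same row already emitted from the new cluster
        last_key = (token, clustering)
        yield row


if __name__ == "__main__":
    new = [(1, "a", "new-1a"), (5, "a", "new-5a")]
    old = [(1, "a", "old-1a"), (1, "b", "old-1b"), (7, "a", "old-7a")]
    for row in merge_results(new, old):
        print(row)
```

Because both inputs are consumed lazily, this keeps only one pending row per cluster in memory, which matters for large multirow reads. It does not handle two different partition keys that hash to the same token.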



RE: Version Rollback

2018-02-27 Thread Kenneth Brotman
Could you tell us the size and configuration of your Cassandra cluster?

 

Kenneth Brotman

 

From: shalom sagges [mailto:shalomsag...@gmail.com] 
Sent: Tuesday, February 27, 2018 6:19 AM
To: user@cassandra.apache.org
Subject: Version Rollback

 

Hi All, 

I'm planning to upgrade my C* cluster to version 3.x and was wondering what's 
the best way to perform a rollback if need be. 

If I used snapshot restoration, I would be facing data loss, depending on when I 
took the snapshot (i.e. a rollback might be required after upgrading half the 
cluster, for example). 

If I add another DC to the cluster with the old version, then I could point the 
apps to talk to that DC if anything bad happens, but building it is really time 
consuming and requires a lot of resources. 

Can anyone provide recommendations on this matter? Any ideas on how to make the 
upgrade foolproof, or at least "really really safe"? 

 

Thanks!