Re: About Geode rolling downgrade

2020-06-12 Thread Alberto Gomez
Hi Naba!

Did you manage to discuss this topic with some engineers?

Cheers,

/Alberto G.


Re: About Geode rolling downgrade

2020-06-05 Thread Nabarun Nag

Hi Mario and Alberto,

I will sync up with a couple of engineers and get you feedback within a couple
of days.

@Barry, Jason and I were discussing this once: could your idea of a WAN GII
achieve the downgrade? That is, create a DS on the old version, let it do a GII
from the newer-version cluster, and then shut down the new-version DS. We would
then have a DS on the lower version.


Regards
Naba



Re: About Geode rolling downgrade

2020-06-05 Thread Mario Ivanac
Hi all,

just a reminder that Alberto is still waiting for feedback regarding his
question.

BR,
Mario


Re: About Geode rolling downgrade

2020-05-14 Thread Alberto Gomez
Hi,

A friendly reminder to the community about this request for feedback.

Thanks,

-Alberto G.



Re: About Geode rolling downgrade

2020-05-07 Thread Alberto Gomez
Hi again,


Considering that Geode does not support online rollback for the time being, and
since we need to be able to roll back even a standalone system, we have been
thinking of a procedure to downgrade a Geode cluster that tolerates downtime but
does not require us to:

  *   spin up another cluster to sync from,
  *   do a restore, or
  *   import a data snapshot.



The procedure we came up with is (a rough gfsh sketch follows the list):

  1.  First step - downgrade locators:

 *   While still on the newer version, export the cluster configuration.
 *   Shut down all locators. Existing clients will continue using their
server connections. New clients/connections are not possible.
 *   Start new locators using the old SW version and import the cluster
configuration. They will form a new cluster. Existing client connections should
still work, but new client connections are not yet possible (no servers
connected to the locators).

  2.  Second step - downgrade servers:

 *   First shut down all servers in parallel. This marks the beginning of
total downtime.
 *   Now start all servers in parallel, but still on the new software
version. The servers connect to the cluster formed by the downgraded locators.
When the servers are up, downtime ends. New client connections are possible. The
rest of the rollback should be fully online.
 *   Now, per server:

   i.  Shut it down, revoke its disk-stores and delete its file system.

  ii.  Start the server using the old SW version. When up, the server will take
over the cluster configuration and pick up replicated data and partitioned
region buckets satisfying region redundancy (essentially it will hold exactly
the same data the previous server had).
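
A rough gfsh sketch of the two steps above (host names, directories and member
names are placeholders, option names may vary slightly between Geode versions,
and the sketch does not try to resolve the open questions in the prerequisites
below):

    ## Step 1 - downgrade locators
    # While still on the new version, save the cluster configuration:
    gfsh> connect --locator=locator1[10334]
    gfsh> export cluster-configuration --zip-file-name=/backup/cluster-config.zip
    # Shut down every locator, then start old-version locators and import the configuration:
    gfsh> stop locator --dir=/data/locator1                (repeat for each locator)
    gfsh> start locator --name=locator1 --dir=/data/locator1-old   (old-version gfsh, fresh directory)
    gfsh> import cluster-configuration --zip-file-name=/backup/cluster-config.zip

    ## Step 2 - downgrade servers
    # Stop all servers in parallel, then restart them (still new version) against the new locators:
    gfsh> stop server --dir=/data/server1                  (all servers, in parallel)
    gfsh> start server --name=server1 --dir=/data/server1 --locators=locator1[10334]
    # Then, one server at a time:
    gfsh> stop server --dir=/data/server1
    gfsh> show missing-disk-stores
    gfsh> revoke missing-disk-store --id=<disk-store id shown by the previous command>
    # Delete the server's working directory, then restart it on the old software version:
    gfsh> start server --name=server1 --dir=/data/server1 --locators=locator1[10334]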



The above has some important prerequisites:

  1.  Partitioned regions have redundancy, and the region configuration allows
recovery as described above (a minimal gfsh example follows this list).
  2.  The client versions allow connection to both the new and the old cluster -
i.e. clients must not be using a newer version at the moment the procedure
starts.
  3.  Geode guarantees that cluster configuration exported from the newer system
can be imported into the older system. In case of incompatibility I expect we
could even manually edit the configuration to adapt it to the older system, but
it is a question how the new servers will react when they connect (in step 2b).
  4.  Geode guarantees that communication between peers with different SW
versions works, and that recovery of region data works.
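
For prerequisite 1, the regions would need redundant (and persistent) copies so
that a restarted server can recover its buckets from the surviving members. A
minimal gfsh example, assuming a placeholder region name and a single redundant
copy:

    gfsh> create region --name=exampleRegion --type=PARTITION_PERSISTENT --redundant-copies=1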



Could we have opinions on this offline procedure? It seems to work well but
probably has caveats we do not see at the moment.

What about prerequisites 3 and 4? They hold in the upgrade case, but we are not
sure whether they also hold in this rollback case.


Best regards,


-Alberto G.


From: Anilkumar Gingade 
Sent: Thursday, April 23, 2020 12:59 AM
To: geode 
Subject: Re: About Geode rolling downgrade

That's right, most/always no down-time requirement is managed by having
replicated cluster setups (Disaster-recovery/backup site). The data is
either pushed to both systems through the data ingesters or by using WAN
setup.
The clusters are upgraded one at a time. If there is a failure during
upgrade or needs to be rolled back; one system will be always up
and running.

-Anil.





On Wed, Apr 22, 2020 at 1:51 PM Anthony Baker  wrote:

> Anil, let me see if I understand your perspective by stating it this way:
>
> If cases where 100% uptime is a requirement, users are almost always
> running a disaster recovery site.  It could be active/active or
> active/standby but there are already at least 2 clusters with current
> copies of the data.  If an upgrade goes badly, the clusters can be
> downgraded one at a time without loss of availability.  This is because we
> ensure compatibility across the wan protocol.
>
> Is that correct?
>
>
> Anthony
>
>
>
> > On Apr 22, 2020, at 10:43 AM, Anilkumar Gingade 
> wrote:
> >
> >>> Rolling downgrade is a pretty important requirement for our customers
> >>> I'd love to hear what others think about whether this feature is worth
> > the overhead of making sure downgrades can always work.
> >
> > I/We haven't seen users/customers requesting rolling downgrade as a
> > critical requirement for them; most of the time they had both an old and
> > new setup to upgrade or switch back to an older setup.
> > Considering the amount of work involved, and code complexity it brings
> in;
> > while there are ways to downgrade, it is hard to justify supporting this
> > feature.
> >
> > -Anil.
>
>


Re: About Geode rolling downgrade

2020-04-22 Thread Anilkumar Gingade
That's right; in most (if not all) cases the no-downtime requirement is managed
by having replicated cluster setups (a disaster-recovery/backup site). The data
is either pushed to both systems through the data ingesters or by using a WAN
setup.
The clusters are upgraded one at a time. If there is a failure during the
upgrade, or it needs to be rolled back, one system will always be up
and running.

-Anil.







Re: About Geode rolling downgrade

2020-04-22 Thread Anthony Baker
Anil, let me see if I understand your perspective by stating it this way:

In cases where 100% uptime is a requirement, users are almost always running a
disaster recovery site.  It could be active/active or active/standby, but there
are already at least 2 clusters with current copies of the data.  If an upgrade
goes badly, the clusters can be downgraded one at a time without loss of
availability.  This is because we ensure compatibility across the WAN protocol.

Is that correct?


Anthony






Re: About Geode rolling downgrade

2020-04-22 Thread Anilkumar Gingade
>> Rolling downgrade is a pretty important requirement for our customers
>> I'd love to hear what others think about whether this feature is worth
>> the overhead of making sure downgrades can always work.

I/We haven't seen users/customers requesting rolling downgrade as a
critical requirement; most of the time they have had both an old and a new
setup to upgrade, or to switch back to the older setup.
Considering the amount of work involved and the code complexity it brings in,
and given that there are other ways to downgrade, it is hard to justify
supporting this feature.

-Anil.






Re: About Geode rolling downgrade

2020-04-21 Thread Dan Smith
> Anyhow, we wonder what would be as of today the recommended or official
> way to downgrade a Geode system without downtime and data loss?

I think the no-downtime option is difficult right now. The most bulletproof
way to downgrade without data loss is probably just to export/import the data,
but that involves downtime (a rough gfsh sketch of that option is below). In
many cases, you could restart the system with an old version if you have
persistent data, because the on-disk format doesn't change that often, but that
won't work in all cases. Or, if you have multiple redundant WAN sites, you
could potentially shift traffic from one to the other and recreate a WAN site,
but that also requires some work.

> Rolling downgrade is a pretty important requirement for our customers so
> we would not like to close the discussion here and instead try to see if it
> is still reasonable to propose it for Geode maybe relaxing a bit the
> expectations and clarifying some things.

I agree that rolling downgrade is a useful feature for some cases. I also
agree we would need to add a lot of tests to make sure we really can
support it. I'd love to hear what others think about whether this feature
is worth the overhead of making sure downgrades can always work. As Bruce
pointed out, we have made changes in the past and we will make changes in
the future that may need additional logic to support downgrades.

Regarding your downgrade steps, they look reasonable. You might consider
downgrading the servers first. Rolling *upgrade* upgrades the locators
first, so up to this point we have only tested a newer locator with an
older server.

-Dan
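
For reference, a rough gfsh sketch of the export/import option mentioned above
(region, file and member names are placeholders, and the snapshot is taken per
region):

    gfsh> export data --region=/exampleRegion --file=exampleRegion.gfd --member=server1
    ... shut down and recreate the cluster on the old version ...
    gfsh> import data --region=/exampleRegion --file=exampleRegion.gfd --member=server1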


Re: About Geode rolling downgrade

2020-04-20 Thread alberto.gomez
Hi,

I agree that if we wanted to support limited rolling downgrade some other 
version interchange needs to be done and extra tests will be required.

Nevertheless, this could be done using gfsh or with a startup parameter. For
example, in the case you mentioned about UDP messaging, some command like
"enable UDP messaging" could put the system back into a state equivalent to
"upgrade in progress but not yet completed", which would allow old members to
join again.
I guess for each case there would be particularities but they should not 
involve a lot of effort because most of the mechanisms needed (the ones that 
allow old and new members to coexist) will have been developed for the rolling 
upgrade.

Anyhow, we wonder what would be as of today the recommended or official way to 
downgrade a Geode system without downtime and data loss?




Re: About Geode rolling downgrade

2020-04-17 Thread Bruce Schuchardt
Hi Alberto, 

I think that if we want to support limited rolling downgrade some other version 
interchange needs to be done and there need to be tests that prove that the 
downgrade works.  That would let us document which versions are compatible for 
a downgrade and enforce that no-one attempts it between incompatible versions.

For instance, there is work going on right now that introduces communications 
changes to remove UDP messaging.  Once rolling upgrade completes it will shut 
down unsecure UDP communications.  At that point there is no way to go back.  
If you tried it the old servers would try to communicate with UDP but the new 
servers would not have UDP sockets open for security reasons.

As a side note, clients would all have to be rolled back before starting in on 
the servers.  Clients aren't equipped to talk to an older version server, and 
servers will reject the client's attempts to create connections.


Re: About Geode rolling downgrade

2020-04-17 Thread Alberto Gomez
Hi Bruce,

Thanks a lot for your answer. We had not thought about the changes in 
distributed algorithms when analyzing rolling downgrades.

Rolling downgrade is a pretty important requirement for our customers, so we
would not like to close the discussion here; instead we would like to see
whether it is still reasonable to propose it for Geode, maybe relaxing the
expectations a bit and clarifying some things.

First, I think supporting rolling downgrade does not mean making it impossible 
to upgrade distributed algorithms. It means that you need to support the new 
and the old algorithms (just as it is done today with rolling upgrades) in the 
upgraded version and also support the possibility of switching to the old 
algorithm in a fully upgraded system.

Second of all, I would say it is not very common to upgrade distributed
algorithms; at least, it does not seem to have been the case so far in Geode.
Therefore, the burden of adding the logic to support rolling downgrade would not
be something to be carried in every release. In my opinion, it would be some
extra percentage of work on top of the work to support the rolling upgrade of
the algorithm, as the rolling downgrade will probably use the mechanisms
implemented for the rolling upgrade.

Third, we do not need to support rolling downgrade from any release to any other
older release. We could support rolling downgrade (at least when distributed
algorithms are changed) only between consecutive versions. These could be
considered special cases, like those where a tool must be provided to convert
files in order to ensure compatibility.

-Alberto




Re: About Geode rolling downgrade

2020-04-16 Thread Bruce Schuchardt
-1

Another reason that we should not support rolling downgrade is that it makes it 
impossible to upgrade distributed algorithms.

When we added rolling upgrade support we pretty much immediately ran into a 
distributed hang when a test started a Locator using an older version.  In that 
release we also introduced the cluster configuration service and along with 
that we needed to upgrade the distributed lock service's notion of the "elder" 
member of the cluster.  Prior to that change a Locator could not fill this 
role, but the CCS needed to be able to use locking and needed a Locator to be 
able to fill this role.  During upgrade we used the old "elder" algorithm but 
once the upgrade was finished we switched to the new algorithm.  If you 
introduced an older Locator into this upgraded cluster it wouldn't think that 
it should be the "elder" but the rest of the cluster would expect it to be the 
elder.

You could support rolling downgrade in this scenario with extra logic and extra 
testing, but I don't think that will always be the case.  Rolling downgrade 
support would place an immense burden on developers in extra development and 
testing in order to ensure that older algorithms could always be brought back 
on-line.






About Geode rolling downgrade

2020-04-16 Thread Alberto Gomez
Hi,

Some months ago I posted a question on this list (see [1]) about the 
possibility of supporting "rolling downgrade" in Geode in order to downgrade a 
Geode system to an older version, similar to the "rolling upgrade" currently 
supported.
With your answers and my investigations, my conclusion was that the main
stumbling block to supporting "rolling downgrades" was the compatibility of
persistent files, which is very hard to achieve because old members would need
to be able to read newer versions of the persistent files.

We have come up with a new approach to support rolling downgrades in Geode 
which consists of the following procedure:

- For each locator:
  - Stop locator
  - Remove locator files
  - Start locator in older version

- For each server:
  - Stop server
  - Remove server files
  - Revoke missing-disk-stores for server
  - Start server in older version

Some extra details about this procedure (a rough shell sketch follows below):
- The starting and stopping of processes may not be possible using gfsh, as
gfsh does not allow managing members of a different version than its own.
- Redundancy in servers is required.
- More than one locator is required.
- The allow_old_members_to_join_for_testing parameter needs to be passed to the
members.
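
A rough shell sketch of this per-member procedure (directories, member names and
the way the parameter is passed are assumptions; in particular, the exact system
property used for allow_old_members_to_join_for_testing may differ, and each
version's own gfsh binary is used for its own members):

    # Per locator, one at a time:
    new-geode/bin/gfsh -e "connect --locator=loc1[10334]" -e "stop locator --dir=/data/loc1"
    rm -rf /data/loc1/*                                   # remove the locator's files
    old-geode/bin/gfsh start locator --name=loc1 --dir=/data/loc1 \
        --J=-Dgemfire.allow_old_members_to_join_for_testing=true    # assumed property name

    # Per server, one at a time (relies on region redundancy):
    new-geode/bin/gfsh -e "connect --locator=loc1[10334]" -e "stop server --dir=/data/server1"
    rm -rf /data/server1/*                                # remove the server's files
    old-geode/bin/gfsh -e "connect --locator=loc1[10334]" \
        -e "show missing-disk-stores" \
        -e "revoke missing-disk-store --id=<id from the previous command>"
    old-geode/bin/gfsh start server --name=server1 --dir=/data/server1 \
        --locators=loc1[10334] \
        --J=-Dgemfire.allow_old_members_to_join_for_testing=true    # assumed property name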

I would like to ask two questions regarding this procedure:
- Do you see any issue not considered by this procedure, or any alternative to
it?
- Would it be reasonable to make the "allow_old_members_to_join_for_testing"
parameter public (with a new name) so that it might be a valid option for
production systems to support, for example, the procedure proposed here?

Thanks in advance for your answers.

Best regards,

-Alberto G.


[1]
 
http://mail-archives.apache.org/mod_mbox/geode-dev/201910.mbox/%3cb080e98c-5df4-e494-dcbd-383f6d979...@est.tech%3E