On 02-11-2017 16:43, Jeremy Evans wrote:
On Thursday, November 2, 2017 at 6:57:57 PM UTC+1, Rodrigo Rosenfeld
Rosas wrote:
Currently we have quite a few clients depending on the
application. Coordinating a scheduled maintenance window is a
complicated and lengthy process, as I have to check with both my
employer (which would certainly prefer to avoid any upgrades for
years, or for as long as possible) and with its clients, some of
which have very rigid rules for maintenance involving downtime.
I'd also usually have to work on weekends or at night when they
accept a maintenance window. As you can probably guess, I much
prefer spending more time figuring out how to reduce the downtime
as much as possible rather than trying to orchestrate a planned
migration downtime window.
For regular tasks, I can deploy many times a day without any
downtime by performing blue-green (canary) deployments. However,
upgrading the database is one of those tasks that I can't do
without any downtime yet. This time I got so close that I decided
to perform the upgrade without requesting a maintenance window,
because it would take just a few seconds of downtime and I could
afford dealing with whatever errors might happen in that small
window, if any. It wouldn't cause anything critical; the user
would just have to retry their operation, the same as if their
Internet had a temporary connection issue.
But I'd prefer to keep the window as short as possible. By using
pglogical, I was able to get PG10 in sync with PG96 in such a way
that when I stop PG96, PG10 has the latest data and I can then
restart PG10 on the production port. All of that takes only about
1s in our production environment. So far so great; however, due to
this problem with OIDs I also have to restart the applications
dealing with the database (the main app, the admin app and the
sidekiq cluster). Restarting those apps adds a few extra seconds
to the downtime window, and if some user tried to load the
application while it was being restarted they would get an error
page from nginx, which I'd really prefer to avoid if at all
possible so I don't have to explain the situation to someone
e-mailing me a screenshot of that nginx error page.
You should be able to figure out the PG10 OIDs well before the
switchover. OIDs are not changed after the schema is created. What
takes time is copying the data from PG96 to PG10, and then after the
data is copied, you can switch.
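For example, a minimal Sequel sketch (with a hypothetical connection URL)
that reads the custom type OIDs from the new instance at any point after
the schema exists there:

require 'sequel'

# Hypothetical PG 10 connection; since the schema already exists there,
# the OIDs returned here are the ones the application will see after the
# switchover.
PG10 = Sequel.connect('postgres://app:secret@pg10.example.com/myapp')

# OIDs below 16384 are reserved for built-in objects, so this lists only
# the custom types, together with their array type OIDs.
oid_map = PG10[:pg_type].where { oid >= 16384 }.
          select_hash(:typname, [:oid, :typarray])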
Note that even if you want to wait till all the data is copied, that
should be fine. It should be possible to use pglogical to replicate from
PG96 to PG10 for an arbitrarily long timeframe, certainly far longer
than it would take to determine the PG96 to PG10 OID mapping. You can
restart your ruby processes running Sequel on a rolling basis so that
no connections are ever dropped. Once all processes have been replaced
by new processes that are aware of the PG96 to PG10 OID mapping, then
you can switch the database so that PG10 runs on the production port.
This should result in zero downtime (at least no downtime caused by Sequel).
For an example of how Sequel gets the type OIDs, here's the specific
related code:
https://github.com/jeremyevans/sequel/blob/master/lib/sequel/adapters/shared/postgres.rb#L205
https://github.com/jeremyevans/sequel/blob/master/lib/sequel/extensions/pg_array.rb#L130
https://github.com/jeremyevans/sequel/blob/master/lib/sequel/extensions/pg_row.rb#L424
One solution may be to use Sequel's sharding support: add a shard for
PG10 (with the default shard using PG96), and at startup register
all used types twice, once inside a with_server block for the PG10
database.
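A minimal sketch of that registration, assuming the server_block
extension and hypothetical hosts and type names (an :address row type
and a 'tag' array type):

require 'sequel'

# Default shard points at PG 9.6, the :pg10 shard at the new PG 10
# instance (both hosts are hypothetical).
DB = Sequel.connect('postgres://app:secret@pg96.example.com/myapp',
                    servers: { pg10: { host: 'pg10.example.com' } })

DB.extension :server_block       # provides DB.with_server
DB.extension :pg_array, :pg_row

# Register the custom types once against the default (PG 9.6) shard...
DB.register_row_type(:address)
DB.register_array_type('tag')

# ...and once more inside a with_server block, so the lookup queries run
# against PG 10 and Sequel also learns the PG 10 OIDs for the same types.
DB.with_server(:pg10) do
  DB.register_row_type(:address)
  DB.register_array_type('tag')
end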
Sometimes I need to restart PostgreSQL, and that's why we set up
Sequel to always verify that a connection is valid before using it,
to avoid errors in production when the database is restarted due
to some change to postgresql.conf, for example. If I were able to
set up the application so that it would fix itself when the
database is upgraded, that would be awesome, even if it added
some overhead to the application. Server-side performance is
barely a concern in our application after all. Keeping the app
working all the time is much more of a concern, as is keeping
the dependencies as up to date as possible.
That's unexpected downtime, and the connection_validator can handle
that well. This PG96 to PG10 database switchover is not unexpected
and can be adequately planned for using the process I explained above.
I know there are some PG load balancer solutions that allow one to
start enqueueing new connection requests so that the database can
be restarted, after which the load balancer resumes its operation
and returns the connections from the queue without the user
noticing that the database was restarted or upgraded; except for
this issue with OIDs changing across upgrades. Another solution
would be to keep the same OIDs when upgrading, if possible. Is it
possible to keep the same OIDs when upgrading?
I'm not sure if it is possible to keep the same type OIDs. You would
have to ask on a PostgreSQL specific list.
I don't think you need a different load balancer if you follow the
process I laid out above.
I read the rest of your message. I realize you have it in your mind
that this is something that should be done on the fly, but I don't
think that will work well for reasons I have already explained. I
advise you to consider the process I described above.
I think I'm pretty lost with this OID mapping thing. I performed the
following query both on PG 9.6 and PG 10 and got the same result:
select typname, oid, typarray from pg_type where typname = 'json';
 typname | oid | typarray
---------+-----+----------
 json    | 114 |      199
It seems like the OID for built-in types such as JSON (assuming I'm
looking at the right columns) should remain the same when I upgrade
the database. However, if I remember correctly, the application was
interpreting a JSON column as if it were TEXT, which means I'm missing
something here, because it doesn't seem to be simply a matter of
ensuring those type OIDs are mapped...
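One thing I can check (just a quick sketch, against the app's already
configured DB object) is whether a conversion proc is actually
registered for the json OID once the app boots, since a value is only
parsed when a proc exists for its type OID:

# With Sequel's pg_json extension loaded, a parser is registered for the
# built-in json OID (114); without it, json values come back as strings.
p DB.conversion_procs[114]   # => a Proc if json values will be parsed,
                             #    nil if they come back as plain strings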
I mean, I have just a handful of custom types and it would probably be
easy to add those mappings to the initialization code and deploy a new
version before switching to the new DB version. However, I'm not sure
whether this would be enough.
But I was also wondering, couldn't this approach lead to conflicts? I
mean, maybe the same OID is present in both databases but requires
different conversion procs.
For example, if I understood correctly, in one of your proposed
solutions, I'd add PG 10 as a read-only slave and keep 9.6 as the master
and deploy the new version. I'd register the custom types in both
databases using a with_server block. Then I'd do something like this:
Inside the with_server blocks I'd assign variables such as:

pg96_conversion_procs = DB.conversion_procs # in the PG 9.6 with_server block
pg10_conversion_procs = DB.conversion_procs # in the PG 10 with_server block

(pg10_conversion_procs.keys - pg96_conversion_procs.keys).each do |k|
  pg96_conversion_procs[k] = pg10_conversion_procs[k]
end
Assuming there are no conflicting OIDs between the two databases, it
should be safe to switch databases without restarting the application.
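To validate that "no conflicting OIDs" assumption, a quick sketch
(assuming the same multi-shard setup with the server_block extension)
could compare the custom type OIDs on both servers:

# Map OID => typname for the non-built-in types on the given shard.
def type_map(server)
  DB.with_server(server) do
    DB[:pg_type].where { oid >= 16384 }.select_hash(:oid, :typname)
  end
end

pg96 = type_map(:default)
pg10 = type_map(:pg10)

# An OID only conflicts if both servers use it for different types.
conflicts = pg96.select { |oid, name| pg10[oid] && pg10[oid] != name }
puts conflicts.empty? ? "no conflicting OIDs" : "conflicts: #{conflicts.inspect}"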
Did I understand your proposal correctly this time? I just have a
feeling this wouldn't be enough, because of the errors related to JSON
columns being interpreted as TEXT when I switched the database
versions... After all, the JSON OID didn't change between those servers.
I'll try to find some time to try these methods and will let you know
if I run into any issues with that approach.
Thanks,
Rodrigo.