On Thursday, November 2, 2017 at 6:57:57 PM UTC+1, Rodrigo Rosenfeld Rosas wrote:

> Currently we have quite a few clients depending on the application. Coordinating a scheduled maintenance window is a complicated and lengthy process, as I have to check with both my employer (which would certainly prefer to avoid any upgrades for years, or for as long as possible) and with its clients, some of whom have very rigid rules for maintenance involving downtime. Also, I'd usually have to work on weekends or at night, when they accept a maintenance window. As you can probably guess, I much prefer spending much more time figuring out how to reduce the downtime as much as possible than trying to orchestrate a planned migration downtime window.
>
> For regular tasks, I can deploy many times a day without any downtime by performing blue-green (canary) deployments. However, upgrading the database is one of those tasks that I can't do without downtime yet. This time I got so close that I decided to perform the upgrade without requesting a maintenance window, because it would take just a few seconds of downtime, and I could afford to deal with any errors that happened in that small window. It wouldn't cause anything critical; the user would just have to retry their operation, as would be the case if their Internet connection had a temporary issue.
>
> But I'd prefer to keep the window as short as possible. By using pglogical, I was able to get PG10 in sync with PG96 in such a way that when I stop PG96, PG10 would have the latest data, and I could then restart PG10 on the production port. All of that would take only about 1s in our production environment. So far so great; however, due to this problem with OIDs, I also have to restart the apps dealing with the database (the main app, the admin app, and the sidekiq cluster). Restarting those apps will add a few extra seconds to the downtime window, and if a user tried to load the application while it was being restarted, they would get an error page from nginx, which I'd really prefer to avoid entirely if possible, so I don't have to explain to someone sending that nginx error page screenshot by e-mail.
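To make the OID issue above concrete: a user-defined type generally gets a different `pg_type` OID on the upgraded cluster, so any process that cached the PG96 OIDs will misread values coming from PG10. Below is a minimal sketch of deriving the old-to-new OID mapping by joining on type name; the `oid_mapping` helper and the row data are hypothetical, standing in for what `SELECT typname, oid FROM pg_type` would return on each cluster:

```ruby
# Build a PG96 -> PG10 type-OID mapping by joining the two clusters'
# pg_type rows on type name. The rows below are made-up examples,
# not real catalog output.
def oid_mapping(pg96_rows, pg10_rows)
  pg10_by_name = pg10_rows.to_h { |row| [row[:typname], row[:oid]] }
  pg96_rows.each_with_object({}) do |row, map|
    new_oid = pg10_by_name[row[:typname]]
    map[row[:oid]] = new_oid if new_oid
  end
end

pg96 = [{typname: "invoice_status", oid: 16405}, {typname: "money_cents", oid: 16410}]
pg10 = [{typname: "invoice_status", oid: 16501}, {typname: "money_cents", oid: 16502}]
p oid_mapping(pg96, pg10)  # => {16405=>16501, 16410=>16502}
```

With such a mapping computed ahead of time, a process can register its extension types for both the old and new OIDs before the switchover happens, instead of being restarted after it.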
You should be able to figure out the PG10 OIDs well before the switchover. OIDs do not change after the schema is created. What takes time is copying the data from PG96 to PG10, and once the data is copied, you can switch. Even if you want to wait until all the data is copied, that should be fine: it should be possible to use pglogical to replicate from PG96 to PG10 for an arbitrarily long time, certainly far longer than it would take to determine the PG96-to-PG10 OID mapping.

You can restart your Ruby processes running Sequel on a rolling basis so that no connections are ever dropped. Once all processes have been replaced by new processes that are aware of the PG96-to-PG10 OID mapping, you can switch the database so that PG10 runs on the production port. This should result in zero downtime (at least, zero downtime caused by Sequel).

For an example of how Sequel gets the type OIDs, here's the specific related code:

https://github.com/jeremyevans/sequel/blob/master/lib/sequel/adapters/shared/postgres.rb#L205
https://github.com/jeremyevans/sequel/blob/master/lib/sequel/extensions/pg_array.rb#L130
https://github.com/jeremyevans/sequel/blob/master/lib/sequel/extensions/pg_row.rb#L424

One solution may be to use Sequel's sharding support: add a shard for PG10 (with the default shard using PG96), and at startup register all used types twice, once inside a with_server block for the PG10 database.

> Sometimes I'd need to restart PostgreSQL, and that's why we set up Sequel to always verify that the connection is valid before using it, to avoid errors in production when the database was restarted due to some change to postgresql.conf, for example. If I were able to set up the application so that it would fix itself when the database is upgraded, that would be awesome, even if it added some overhead to the application. Server-side performance is barely a concern in our application after all. Keeping the app working all the time is much more of a concern, as is keeping the dependencies as up to date as possible.

That's unexpected downtime, and the connection_validator extension can handle it well. This PG96-to-PG10 database switchover is not unexpected, and can be adequately planned for using the process I explained above.

> I know there are some PG load balancer solutions that allow one to start enqueueing new requests to connections, so that the database can be restarted and the load balancer can then resume its operation and return the connections from the queue without the user noticing that the database has restarted or been upgraded. Except for this issue with OIDs changing across upgrades. Another solution would be to keep the same OIDs when upgrading, if possible. Is it possible to keep the same OIDs when upgrading?

I'm not sure whether it is possible to keep the same type OIDs. You would have to ask on a PostgreSQL-specific list. I don't think you need a different load balancer if you follow the process I laid out above.

I read the rest of your message. I realize you have it in your mind that this is something that should be done on the fly, but I don't think that will work well, for reasons I have already explained. I advise you to consider the process I described above.

Thanks,
Jeremy

-- 
You received this message because you are subscribed to the Google Groups "sequel-talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sequel-talk.
For more options, visit https://groups.google.com/d/optout.
