On 08/01/2018 09:20 AM, Alvaro Herrera wrote:

my problem is that I think the "restart" approach is just using the
entirely wrong hammer to solve the problem at hand.  At the very least
it's very problematic in respect to replicas, which need to know about
the setting too, and can have similar problems the restart on the
primary is supposed to prevent.
If we define "restart" to mean taking all the servers down
simultaneously, that can be planned.

People in mission-critical environments do not "restart all servers". They fail over to a secondary to do maintenance on a primary. When you have a system where you literally lose thousands of dollars every minute the database is down, you can't do what you are proposing. When you have a system where, if the database is down for longer than X minutes, you actually lose a whole day because all of the fabricators have to revalidate before they begin work, you can't do that either. Granted, that is not the majority (which you mention), but let's not forget them.

The one place where a restart does happen, and will continue to happen for around 5 more years (3 if you incorporate pglogical and 9.6), is upgrades. Although we have logical replication for upgrades now, we are 5 years away from the majority of users being on a version of PostgreSQL that supports logical replication for upgrades. So I can see an argument for an incremental approach, because people could enable checksums as part of their upgrade restart.

For users that cannot do that,
that's too bad, they'll have to wait to the next release in order to
enable checksums (assuming they fund the necessary development).  But

I have to say, as a proponent of funded development for longer than most, I like to see this refreshing take on the fact that this all does take money.

there are many systems where it *is* possible to take everything down
for five seconds, then back up.  They can definitely take advantage of
checksummed data.

This is a good point.

Currently, the only way to enable checksums is to initdb and create a
new copy of the data from a logical backup, which could take hours or
even days if data is large, or use logical replication.

Originally, I was going to -1 how this is being implemented. I too wish we had the "ALTER DATABASE ENABLE CHECKSUM" or equivalent without a restart. However, being able to just restart is a huge step forward from what we have now.

Lastly, I think Alvaro has a point about incremental development, and I also think some others on this thread need to "show me the patch" instead of being armchair directors of development.

JD



--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
***  A fault and talent of mine is to tell it exactly how it is.  ***
PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
*****     Unless otherwise stated, opinions are my own.   *****

