Hi, I did a quick review today. First a couple minor comments:
1) monitoring.sgml typos: number of database -> number of databases calcuated -> calculated 2) unnecessary newline in heapam.c (no other changes) 3) unnecessary ListCell in DataChecksumsWorkerMain() on line 1345, shadowing earlier variable 4) unnecessary comment change in bufmgr.c (no other changes) 5) unnecessary include and newline in bulk_write.c (no other changes) 6) unnecessary newline in pgstat.c (no other changes) 7) controldata.c - maybe this if (oldctrl->data_checksum_version == 2) should use PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION instead of the magic constant? For "off" we use "0" which seems somewhat acceptable, but for other values it's less obvious what the meaning is. 8) xlog_internal.h - xl_checksum_state should be added to typedefs 9) system_functions.sql - Isn't it weird that this only creates the new pg_enable_data_checksums function, but not pg_disable_data_checksums? It also means it doesn't revoke EXECUTE from public on it, which I guess it probably should? Or why should this be different for the two functions? Also the error message seems to differ: test=> select pg_enable_data_checksums(); ERROR: permission denied for function pg_enable_data_checksums test=> select pg_disable_data_checksums(); ERROR: must be superuser Probably harmless, but seems a bit strange. But there also seems to be a more serious problem with recovery. I did a simple script that does a loop of * start a cluster * initialize a small pgbench database (scale 1 - 100) * run "long" pgbench * call pg_enable_data_checksums(), wait for it to complete * stop the cluster with "-m immediate" * start the cluster And this unfortunately hits this assert: bool AbsorbChecksumsOnBarrier(void) { Assert(LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION); LocalDataChecksumVersion = PG_DATA_CHECKSUM_VERSION; return true; } Based on our short discussion about this, the controlfile gets updated right after pg_enable_data_checksums() completes. The immediate stop however forces a recovery since the last checkpoint, which means we see the XLOG_CHECKSUMS WAL message again, and set the barrier. And then we exit recovery, try to start checkpointer and it trips over this, because the control file already has the "on" value :-( I'm not sure what's the best way to fix this. Would it be possible to remember we saw the XLOG_CHECKSUMS during recovery, and make the assert noop in that case? Or not set the barrier when exiting recovery. I'm not sure the relaxed assert would remain meaningful, though. What would it check on standbys, for example? Maybe a better way would be to wait for a checkpoint before updating the controlfile, similar to what we do at the beginning? Possibly even with the same "fast=true/false" logic. That would prevent us from seeing the XLOG_CHECKSUMS wal record with the updated flag. It would extend the "window" where a crash would mean we have to redo the checksums, but I don't think that matters much. For small databases who cares, and for large databases it should not be a meaningful difference (setting the checksums already ran over multiple checkpoints, so one checkpoint is not a big difference). regards -- Tomas Vondra