On Tue, 2021-06-08 at 12:52 -0400, Tom Lane wrote: > Yeah. If there are actually use-cases for knowing both things, then > we ought to record both. However, it's not real clear to me why > LSN would be interesting.
Let me expand on my use case: in a sharded environment, how do you figure out if you need to repopulate an UNLOGGED table? For a single node, there's not much risk, because you either have the data or you don't. But in a sharded environment, if one node crashes, you might end up with some shards empty and others populated, and that's inconsistent. If Postgres provides a way to figure out when the last crash happened, then that would give the sharding solution the basic information it needs to figure out if it needs to clear and repopulate the entire unlogged table (i.e. all its shards on all nodes). Clearly, the sharding solution would need to do some tracking of its own, like recording when the last TRUNCATE happened, to figure out what to do. For that tracking, I think using the LSN makes more sense than a timestamp. > (If there is a reason to log LSN, maybe the argument is different > for that? Although I'd think that looking at the last checkpoint > REDO location is sufficient for figuring out where the current > crash recovery attempt started.) I came to a similar conclusion for my use case: tracking the LSN at the end of the recovery makes more sense. I attached a patch to track last recovery LSN, time, and total count. But there are a few issues: 1. Do we want a way to reset the counter? If so, should it be done with pg_resetwal or a superuser SQL function? 2. It would be helpful to also know the last time a promotion happened, for the same reason (e.g. a failover of a single node leading to an unlogged table with some empty shards and some populated ones). Should also store the last promotion LSN and time as well? Does "promotion count" make sense, and should we track that, too? 3. Should we try to track crash information across promotions, or just start them at the initial values when promoted? Regards, Jeff Davis
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 3a21129021a..c010e3435d1 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -24733,6 +24733,21 @@ SELECT collation for ('foo' COLLATE "de_DE"); <tbody> + <row> + <entry><structfield>last_recovery_lsn</structfield></entry> + <entry><type>pg_lsn</type></entry> + </row> + + <row> + <entry><structfield>last_recovery_time</structfield></entry> + <entry><type>timestamp with time zone</type></entry> + </row> + + <row> + <entry><structfield>recovery_count</structfield></entry> + <entry><type>bigint</type></entry> + </row> + <row> <entry><structfield>min_recovery_end_lsn</structfield></entry> <entry><type>pg_lsn</type></entry> diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml index a8c5e4028af..defeaeb74d4 100644 --- a/doc/src/sgml/ref/create_table.sgml +++ b/doc/src/sgml/ref/create_table.sgml @@ -211,6 +211,14 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM Any indexes created on an unlogged table are automatically unlogged as well. </para> + + <para> + The function <link linkend="functions-pg-control-recovery"> + <function>pg_control_recovery()</function></link> can be used to + determine when a crash or standby promotion last happened, which is + useful for determining whether an <literal>UNLOGGED</literal> table + should be repopulated. + </para> </listitem> </varlistentry> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 441a9124cd5..f2ce7b02a3a 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -5384,6 +5384,11 @@ BootStrapXLOG(void) ControlFile->checkPoint = checkPoint.redo; ControlFile->checkPointCopy = checkPoint; + /* initialize last recovery as the initial checkpoint */ + ControlFile->lastRecoveryLSN = checkPoint.redo; + ControlFile->lastRecoveryTime = checkPoint.time; + ControlFile->recoveryCount = 0; + /* some additional ControlFile fields are set in WriteControlFile() */ WriteControlFile(); @@ -9223,6 +9228,14 @@ CreateCheckPoint(int flags) ControlFile->minRecoveryPoint = InvalidXLogRecPtr; ControlFile->minRecoveryPointTLI = 0; + /* update recovery information */ + if (flags & CHECKPOINT_END_OF_RECOVERY) + { + ControlFile->lastRecoveryLSN = checkPoint.redo; + ControlFile->lastRecoveryTime = checkPoint.time; + ControlFile->recoveryCount++; + } + /* * Persist unloggedLSN value. It's reset on crash recovery, so this goes * unused on non-shutdown checkpoints, but seems useful to store it always diff --git a/src/backend/utils/misc/pg_controldata.c b/src/backend/utils/misc/pg_controldata.c index 209a20a8827..a7e709ec15a 100644 --- a/src/backend/utils/misc/pg_controldata.c +++ b/src/backend/utils/misc/pg_controldata.c @@ -210,8 +210,8 @@ pg_control_checkpoint(PG_FUNCTION_ARGS) Datum pg_control_recovery(PG_FUNCTION_ARGS) { - Datum values[5]; - bool nulls[5]; + Datum values[8]; + bool nulls[8]; TupleDesc tupdesc; HeapTuple htup; ControlFileData *ControlFile; @@ -221,16 +221,22 @@ pg_control_recovery(PG_FUNCTION_ARGS) * Construct a tuple descriptor for the result row. This must match this * function's pg_proc entry! */ - tupdesc = CreateTemplateTupleDesc(5); - TupleDescInitEntry(tupdesc, (AttrNumber) 1, "min_recovery_end_lsn", + tupdesc = CreateTemplateTupleDesc(8); + TupleDescInitEntry(tupdesc, (AttrNumber) 1, "last_recovery_lsn", PG_LSNOID, -1, 0); - TupleDescInitEntry(tupdesc, (AttrNumber) 2, "min_recovery_end_timeline", + TupleDescInitEntry(tupdesc, (AttrNumber) 2, "last_recovery_time", + TIMESTAMPTZOID, -1, 0); + TupleDescInitEntry(tupdesc, (AttrNumber) 3, "recovery_count", + INT8OID, -1, 0); + TupleDescInitEntry(tupdesc, (AttrNumber) 4, "min_recovery_end_lsn", + PG_LSNOID, -1, 0); + TupleDescInitEntry(tupdesc, (AttrNumber) 5, "min_recovery_end_timeline", INT4OID, -1, 0); - TupleDescInitEntry(tupdesc, (AttrNumber) 3, "backup_start_lsn", + TupleDescInitEntry(tupdesc, (AttrNumber) 6, "backup_start_lsn", PG_LSNOID, -1, 0); - TupleDescInitEntry(tupdesc, (AttrNumber) 4, "backup_end_lsn", + TupleDescInitEntry(tupdesc, (AttrNumber) 7, "backup_end_lsn", PG_LSNOID, -1, 0); - TupleDescInitEntry(tupdesc, (AttrNumber) 5, "end_of_backup_record_required", + TupleDescInitEntry(tupdesc, (AttrNumber) 8, "end_of_backup_record_required", BOOLOID, -1, 0); tupdesc = BlessTupleDesc(tupdesc); @@ -240,21 +246,30 @@ pg_control_recovery(PG_FUNCTION_ARGS) ereport(ERROR, (errmsg("calculated CRC checksum does not match value stored in file"))); - values[0] = LSNGetDatum(ControlFile->minRecoveryPoint); + values[0] = LSNGetDatum(ControlFile->lastRecoveryLSN); nulls[0] = false; - values[1] = Int32GetDatum(ControlFile->minRecoveryPointTLI); + values[1] = TimestampTzGetDatum(time_t_to_timestamptz(ControlFile->lastRecoveryTime)); nulls[1] = false; - values[2] = LSNGetDatum(ControlFile->backupStartPoint); + values[2] = Int64GetDatum(ControlFile->recoveryCount); nulls[2] = false; - values[3] = LSNGetDatum(ControlFile->backupEndPoint); + values[3] = LSNGetDatum(ControlFile->minRecoveryPoint); nulls[3] = false; - values[4] = BoolGetDatum(ControlFile->backupEndRequired); + values[4] = Int32GetDatum(ControlFile->minRecoveryPointTLI); nulls[4] = false; + values[5] = LSNGetDatum(ControlFile->backupStartPoint); + nulls[5] = false; + + values[6] = LSNGetDatum(ControlFile->backupEndPoint); + nulls[6] = false; + + values[7] = BoolGetDatum(ControlFile->backupEndRequired); + nulls[7] = false; + htup = heap_form_tuple(tupdesc, values, nulls); PG_RETURN_DATUM(HeapTupleGetDatum(htup)); diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c index f911f98d946..4ed693696ce 100644 --- a/src/bin/pg_controldata/pg_controldata.c +++ b/src/bin/pg_controldata/pg_controldata.c @@ -99,6 +99,7 @@ main(int argc, char *argv[]) time_t time_tmp; char pgctime_str[128]; char ckpttime_str[128]; + char last_recovery_time_str[128]; char mock_auth_nonce_str[MOCK_AUTH_NONCE_LEN * 2 + 1]; const char *strftime_fmt = "%c"; const char *progname; @@ -202,6 +203,9 @@ main(int argc, char *argv[]) time_tmp = (time_t) ControlFile->checkPointCopy.time; strftime(ckpttime_str, sizeof(ckpttime_str), strftime_fmt, localtime(&time_tmp)); + time_tmp = (time_t) ControlFile->lastRecoveryTime; + strftime(last_recovery_time_str, sizeof(last_recovery_time_str), strftime_fmt, + localtime(&time_tmp)); /* * Calculate name of the WAL file containing the latest checkpoint's REDO @@ -272,6 +276,12 @@ main(int argc, char *argv[]) ControlFile->checkPointCopy.newestCommitTsXid); printf(_("Time of latest checkpoint: %s\n"), ckpttime_str); + printf(_("Last recovery LSN: %X/%X\n"), + LSN_FORMAT_ARGS(ControlFile->lastRecoveryLSN)); + printf(_("Last recovery time: %s\n"), + last_recovery_time_str); + printf(_("Recovery count: %llu\n"), + (unsigned long long) ControlFile->recoveryCount); printf(_("Fake LSN counter for unlogged rels: %X/%X\n"), LSN_FORMAT_ARGS(ControlFile->unloggedLSN)); printf(_("Minimum recovery ending location: %X/%X\n"), diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h index e3f48158ce7..c0e69b94f31 100644 --- a/src/include/catalog/pg_control.h +++ b/src/include/catalog/pg_control.h @@ -130,6 +130,10 @@ typedef struct ControlFileData CheckPoint checkPointCopy; /* copy of last check point record */ + XLogRecPtr lastRecoveryLSN; /* redo LSN of last recovery checkpoint */ + XLogRecPtr lastRecoveryTime; /* time of last recovery checkpoint */ + uint64 recoveryCount; /* number of recoveries since last reset */ + XLogRecPtr unloggedLSN; /* current fake LSN value, for unlogged rels */ /* diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index acbcae46070..655b7cf7984 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -11486,9 +11486,9 @@ { oid => '3443', descr => 'pg_controldata recovery state information as a function', proname => 'pg_control_recovery', provolatile => 'v', prorettype => 'record', - proargtypes => '', proallargtypes => '{pg_lsn,int4,pg_lsn,pg_lsn,bool}', - proargmodes => '{o,o,o,o,o}', - proargnames => '{min_recovery_end_lsn,min_recovery_end_timeline,backup_start_lsn,backup_end_lsn,end_of_backup_record_required}', + proargtypes => '', proallargtypes => '{pg_lsn,timestamptz,int8,pg_lsn,int4,pg_lsn,pg_lsn,bool}', + proargmodes => '{o,o,o,o,o,o,o,o}', + proargnames => '{last_recovery_lsn,last_recovery_time,recovery_count,min_recovery_end_lsn,min_recovery_end_timeline,backup_start_lsn,backup_end_lsn,end_of_backup_record_required}', prosrc => 'pg_control_recovery' }, { oid => '3444',