On Tue, 2021-06-08 at 12:52 -0400, Tom Lane wrote:
> Yeah.  If there are actually use-cases for knowing both things, then
> we ought to record both.  However, it's not real clear to me why
> LSN would be interesting.

Let me expand on my use case: in a sharded environment, how do you
figure out if you need to repopulate an UNLOGGED table? For a single
node, there's not much risk, because you either have the data or you
don't. But in a sharded environment, if one node crashes, you might end
up with some shards empty and others populated, and that's
inconsistent.

If Postgres provides a way to figure out when the last crash happened,
then that would give the sharding solution the basic information it
needs to figure out if it needs to clear and repopulate the entire
unlogged table (i.e. all its shards on all nodes).

Clearly, the sharding solution would need to do some tracking of its
own, like recording when the last TRUNCATE happened, to figure out what
to do. For that tracking, I think using the LSN makes more sense than a
timestamp.

> (If there is a reason to log LSN, maybe the argument is different
> for that?  Although I'd think that looking at the last checkpoint
> REDO location is sufficient for figuring out where the current
> crash recovery attempt started.)

I came to a similar conclusion for my use case: tracking the LSN at the
end of the recovery makes more sense.

I attached a patch to track last recovery LSN, time, and total count.
But there are a few issues:

1. Do we want a way to reset the counter? If so, should it be done with
pg_resetwal or a superuser SQL function?

2. It would be helpful to also know the last time a promotion happened,
for the same reason (e.g. a failover of a single node leading to an
unlogged table with some empty shards and some populated ones). Should
also store the last promotion LSN and time as well? Does "promotion
count" make sense, and should we track that, too?

3. Should we try to track crash information across promotions, or just
start them at the initial values when promoted?

Regards,
        Jeff Davis

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 3a21129021a..c010e3435d1 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24733,6 +24733,21 @@ SELECT collation for ('foo' COLLATE "de_DE");
 
      <tbody>
 
+      <row>
+       <entry><structfield>last_recovery_lsn</structfield></entry>
+       <entry><type>pg_lsn</type></entry>
+      </row>
+
+      <row>
+       <entry><structfield>last_recovery_time</structfield></entry>
+       <entry><type>timestamp with time zone</type></entry>
+      </row>
+
+      <row>
+       <entry><structfield>recovery_count</structfield></entry>
+       <entry><type>bigint</type></entry>
+      </row>
+
       <row>
        <entry><structfield>min_recovery_end_lsn</structfield></entry>
        <entry><type>pg_lsn</type></entry>
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index a8c5e4028af..defeaeb74d4 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -211,6 +211,14 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
       Any indexes created on an unlogged table are automatically unlogged as
       well.
      </para>
+
+     <para>
+      The function <link linkend="functions-pg-control-recovery">
+      <function>pg_control_recovery()</function></link> can be used to
+      determine when a crash or standby promotion last happened, which is
+      useful for determining whether an <literal>UNLOGGED</literal> table
+      should be repopulated.
+     </para>
     </listitem>
    </varlistentry>
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 441a9124cd5..f2ce7b02a3a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5384,6 +5384,11 @@ BootStrapXLOG(void)
 	ControlFile->checkPoint = checkPoint.redo;
 	ControlFile->checkPointCopy = checkPoint;
 
+	/* initialize last recovery as the initial checkpoint */
+	ControlFile->lastRecoveryLSN = checkPoint.redo;
+	ControlFile->lastRecoveryTime = checkPoint.time;
+	ControlFile->recoveryCount = 0;
+
 	/* some additional ControlFile fields are set in WriteControlFile() */
 	WriteControlFile();
 
@@ -9223,6 +9228,14 @@ CreateCheckPoint(int flags)
 	ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
 	ControlFile->minRecoveryPointTLI = 0;
 
+	/* update recovery information */
+	if (flags & CHECKPOINT_END_OF_RECOVERY)
+	{
+		ControlFile->lastRecoveryLSN = checkPoint.redo;
+		ControlFile->lastRecoveryTime = checkPoint.time;
+		ControlFile->recoveryCount++;
+	}
+
 	/*
 	 * Persist unloggedLSN value. It's reset on crash recovery, so this goes
 	 * unused on non-shutdown checkpoints, but seems useful to store it always
diff --git a/src/backend/utils/misc/pg_controldata.c b/src/backend/utils/misc/pg_controldata.c
index 209a20a8827..a7e709ec15a 100644
--- a/src/backend/utils/misc/pg_controldata.c
+++ b/src/backend/utils/misc/pg_controldata.c
@@ -210,8 +210,8 @@ pg_control_checkpoint(PG_FUNCTION_ARGS)
 Datum
 pg_control_recovery(PG_FUNCTION_ARGS)
 {
-	Datum		values[5];
-	bool		nulls[5];
+	Datum		values[8];
+	bool		nulls[8];
 	TupleDesc	tupdesc;
 	HeapTuple	htup;
 	ControlFileData *ControlFile;
@@ -221,16 +221,22 @@ pg_control_recovery(PG_FUNCTION_ARGS)
 	 * Construct a tuple descriptor for the result row.  This must match this
 	 * function's pg_proc entry!
 	 */
-	tupdesc = CreateTemplateTupleDesc(5);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "min_recovery_end_lsn",
+	tupdesc = CreateTemplateTupleDesc(8);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "last_recovery_lsn",
 					   PG_LSNOID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "min_recovery_end_timeline",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "last_recovery_time",
+					   TIMESTAMPTZOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "recovery_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "min_recovery_end_lsn",
+					   PG_LSNOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "min_recovery_end_timeline",
 					   INT4OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "backup_start_lsn",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "backup_start_lsn",
 					   PG_LSNOID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "backup_end_lsn",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "backup_end_lsn",
 					   PG_LSNOID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "end_of_backup_record_required",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 8, "end_of_backup_record_required",
 					   BOOLOID, -1, 0);
 	tupdesc = BlessTupleDesc(tupdesc);
 
@@ -240,21 +246,30 @@ pg_control_recovery(PG_FUNCTION_ARGS)
 		ereport(ERROR,
 				(errmsg("calculated CRC checksum does not match value stored in file")));
 
-	values[0] = LSNGetDatum(ControlFile->minRecoveryPoint);
+	values[0] = LSNGetDatum(ControlFile->lastRecoveryLSN);
 	nulls[0] = false;
 
-	values[1] = Int32GetDatum(ControlFile->minRecoveryPointTLI);
+	values[1] = TimestampTzGetDatum(time_t_to_timestamptz(ControlFile->lastRecoveryTime));
 	nulls[1] = false;
 
-	values[2] = LSNGetDatum(ControlFile->backupStartPoint);
+	values[2] = Int64GetDatum(ControlFile->recoveryCount);
 	nulls[2] = false;
 
-	values[3] = LSNGetDatum(ControlFile->backupEndPoint);
+	values[3] = LSNGetDatum(ControlFile->minRecoveryPoint);
 	nulls[3] = false;
 
-	values[4] = BoolGetDatum(ControlFile->backupEndRequired);
+	values[4] = Int32GetDatum(ControlFile->minRecoveryPointTLI);
 	nulls[4] = false;
 
+	values[5] = LSNGetDatum(ControlFile->backupStartPoint);
+	nulls[5] = false;
+
+	values[6] = LSNGetDatum(ControlFile->backupEndPoint);
+	nulls[6] = false;
+
+	values[7] = BoolGetDatum(ControlFile->backupEndRequired);
+	nulls[7] = false;
+
 	htup = heap_form_tuple(tupdesc, values, nulls);
 
 	PG_RETURN_DATUM(HeapTupleGetDatum(htup));
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index f911f98d946..4ed693696ce 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -99,6 +99,7 @@ main(int argc, char *argv[])
 	time_t		time_tmp;
 	char		pgctime_str[128];
 	char		ckpttime_str[128];
+	char		last_recovery_time_str[128];
 	char		mock_auth_nonce_str[MOCK_AUTH_NONCE_LEN * 2 + 1];
 	const char *strftime_fmt = "%c";
 	const char *progname;
@@ -202,6 +203,9 @@ main(int argc, char *argv[])
 	time_tmp = (time_t) ControlFile->checkPointCopy.time;
 	strftime(ckpttime_str, sizeof(ckpttime_str), strftime_fmt,
 			 localtime(&time_tmp));
+	time_tmp = (time_t) ControlFile->lastRecoveryTime;
+	strftime(last_recovery_time_str, sizeof(last_recovery_time_str), strftime_fmt,
+			 localtime(&time_tmp));
 
 	/*
 	 * Calculate name of the WAL file containing the latest checkpoint's REDO
@@ -272,6 +276,12 @@ main(int argc, char *argv[])
 		   ControlFile->checkPointCopy.newestCommitTsXid);
 	printf(_("Time of latest checkpoint:            %s\n"),
 		   ckpttime_str);
+	printf(_("Last recovery LSN:                    %X/%X\n"),
+		   LSN_FORMAT_ARGS(ControlFile->lastRecoveryLSN));
+	printf(_("Last recovery time:                   %s\n"),
+		   last_recovery_time_str);
+	printf(_("Recovery count:                       %llu\n"),
+		   (unsigned long long) ControlFile->recoveryCount);
 	printf(_("Fake LSN counter for unlogged rels:   %X/%X\n"),
 		   LSN_FORMAT_ARGS(ControlFile->unloggedLSN));
 	printf(_("Minimum recovery ending location:     %X/%X\n"),
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index e3f48158ce7..c0e69b94f31 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -130,6 +130,10 @@ typedef struct ControlFileData
 
 	CheckPoint	checkPointCopy; /* copy of last check point record */
 
+	XLogRecPtr	lastRecoveryLSN;	/* redo LSN of last recovery checkpoint */
+	XLogRecPtr	lastRecoveryTime;	/* time of last recovery checkpoint */
+	uint64		recoveryCount;		/* number of recoveries since last reset */
+
 	XLogRecPtr	unloggedLSN;	/* current fake LSN value, for unlogged rels */
 
 	/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index acbcae46070..655b7cf7984 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11486,9 +11486,9 @@
 { oid => '3443',
   descr => 'pg_controldata recovery state information as a function',
   proname => 'pg_control_recovery', provolatile => 'v', prorettype => 'record',
-  proargtypes => '', proallargtypes => '{pg_lsn,int4,pg_lsn,pg_lsn,bool}',
-  proargmodes => '{o,o,o,o,o}',
-  proargnames => '{min_recovery_end_lsn,min_recovery_end_timeline,backup_start_lsn,backup_end_lsn,end_of_backup_record_required}',
+  proargtypes => '', proallargtypes => '{pg_lsn,timestamptz,int8,pg_lsn,int4,pg_lsn,pg_lsn,bool}',
+  proargmodes => '{o,o,o,o,o,o,o,o}',
+  proargnames => '{last_recovery_lsn,last_recovery_time,recovery_count,min_recovery_end_lsn,min_recovery_end_timeline,backup_start_lsn,backup_end_lsn,end_of_backup_record_required}',
   prosrc => 'pg_control_recovery' },
 
 { oid => '3444',

Reply via email to