On Wed, Feb 16, 2022 at 9:04 AM Ashutosh Sharma <[email protected]> wrote:
> I don't think that's the use case of this patch. Unless there is some
> other valid reason, I would suggest you remove it.
Removed the function pg_verify_raw_wal_record. Robert and Greg also
voted for removal upthread.
> > > Should we add a function that returns the pointer to the first and
> > > probably the last WAL record in the WAL segment? This would help users
> > > to inspect the wal records in the entire wal segment if they wish to
> > > do so.
> >
> > Good point. One can do this already with pg_get_wal_records_info and
> > pg_walfile_name_offset. Usually, the LSN format itself can give an
> > idea about the WAL file it is in.
> >
> > postgres=# select lsn, pg_walfile_name_offset(lsn) from
> > pg_get_wal_records_info('0/5000000', '0/5FFFFFF') order by lsn asc
> > limit 1;
> > lsn | pg_walfile_name_offset
> > -----------+-------------------------------
> > 0/5000038 | (000000010000000000000005,56)
> > (1 row)
> >
> > postgres=# select lsn, pg_walfile_name_offset(lsn) from
> > pg_get_wal_records_info('0/5000000', '0/5FFFFFF') order by lsn desc
> > limit 1;
> > lsn | pg_walfile_name_offset
> > -----------+-------------------------------------
> > 0/5FFFFC0 | (000000010000000000000005,16777152)
> > (1 row)
> >
>
> The workaround you are suggesting is not very user friendly and FYKI
> pg_wal_records_info simply hangs at times when we specify the higher
> and lower limit of lsn in a wal file.
>
> To make things easier for the end users I would suggest we add a
> function that can return a valid first and last lsn in a walfile. The
> output of this function can be used to inspect the wal records in the
> entire wal file if they wish to do so and I am sure they will. So, it
> should be something like this:
>
> select first_valid_lsn, last_valid_lsn from
> pg_get_first_last_valid_wal_record('wal-segment-name');
>
> And above function can directly be used with pg_get_wal_records_info() like
>
> select
> pg_get_wal_records_info(pg_get_first_last_valid_wal_record('wal-segment'));
>
> I think this is a pretty basic ASK that we expect to be present in the
> module like this.
Added a new function that returns the first and last valid WAL record
LSN of a given WAL file.
> > > +PG_FUNCTION_INFO_V1(pg_get_raw_wal_record);
> > > +PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn);
> > > +PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record);
> > > +PG_FUNCTION_INFO_V1(pg_get_wal_record_info);
> > > +PG_FUNCTION_INFO_V1(pg_get_wal_records_info);
> > >
> > > I think we should allow all these functions to be executed in wait and
> > > *nowait* mode. If a user specifies nowait mode, the function should
> > > return if no WAL data is present, rather than waiting for new WAL data
> > > to become available, default behaviour could be anything you like.
> >
> > Currently, pg_walinspect uses read_local_xlog_page which waits in the
> > while(1) loop if a future LSN is specified. As read_local_xlog_page is
> > an implementation of XLogPageReadCB, which doesn't have a wait/nowait
> > parameter, if we really need a wait/nowait mode behaviour, we need to
> > do extra things(either add a backend-level global wait variable, set
> > before XLogReadRecord, if set, read_local_xlog_page can just exit
> > without waiting and reset after the XLogReadRecord or add an extra
> > bool wait variable to XLogReaderState and use it in
> > read_local_xlog_page).
> >
>
> I am not asking to do any changes in the backend code. Please check -
> how pg_waldump does this when a user requests to stop once the endptr
> has reached. If not for all functions at least for a few functions we
> can do this if it is doable.
I've added a new function read_local_xlog_page_2 (similar to
read_local_xlog_page but works in wait and no wait mode) and the
callers can specify whether to wait or not wait using private_data.
Actually, I wanted to use the private_data structure of
read_local_xlog_page but the logical decoding already has context as
private_data, that is why I had to have a new function. I know it
creates a bit of duplicate code, but its cleaner than using
backend-local variables or additional flags in XLogReaderState or
adding wait/no-wait boolean to page_read callback. Any other
suggestions are welcome here.
With this, I'm able to have wait/no wait versions for any functions.
But for now, I'm having wait/no wait for two functions
(pg_get_wal_records_info and pg_get_wal_stats) for which it makes more
sense.
> > > +Datum
> > > +pg_get_wal_records_info(PG_FUNCTION_ARGS)
> > > +{
> > > +#define PG_GET_WAL_RECORDS_INFO_COLS 10
> > >
> > >
> > > We could probably have another variant of this function that would
> > > work even if the end pointer is not specified, in which case the
> > > default end pointer would be the last WAL record in the WAL segment.
> > > Currently it mandates the use of an end pointer which slightly reduces
> > > flexibility.
> >
> > Last WAL record in the WAL segment may not be of much use(one can
> > figure out the last valid WAL record in a wal file as mentioned
> > above), but the WAL records info till the latest current flush LSN of
> > the server would be a useful functionality. But that too, can be found
> > using something like "select lsn, prev_lsn, resource_manager from
> > pg_get_wal_records_info('0/8099568', pg_current_wal_lsn());"
> >
>
> What if a user wants to inspect all the valid wal records from a
> startptr (startlsn) and he doesn't know the endptr? Why should he/she
> be mandated to get the endptr and supply it to this function? I don't
> think we should force users to do that. I think this is again a very
> basic ASK that can be done in this version itself. It is not at all
> any advanced thing that we can think of doing in the future.
Agreed. Added new functions that emits wal records info/stats till the
end of the WAL at the moment.
> > > +
> > > +/*
> > > + * Get the first valid raw WAL record lsn.
> > > + */
> > > +Datum
> > > +pg_get_first_valid_wal_record_lsn(PG_FUNCTION_ARGS)
> > >
> > >
> > > I think this function should return a pointer to the nearest valid WAL
> > > record which can be the previous WAL record to the LSN entered by the
> > > user or the next WAL record. If a user unknowingly enters an lsn that
> > > does not exist then in such cases we should probably return the lsn of
> > > the previous WAL record instead of hanging or waiting for the new WAL
> > > record to arrive.
> >
> > Is it useful?
>
> It is useful in the same way as returning the next valid wal pointer
> is. Why should a user wait for the next valid wal pointer to be
> available instead the function should identify the previous valid wal
> record and return it and put an appropriate message to the user.
>
> If there's a strong reason, how about naming
> > pg_get_next_valid_wal_record_lsn returning the next valid wal record
> > LSN and pg_get_previous_valid_wal_record_lsn returning the previous
> > valid wal record LSN ? If you think having two functions is too much,
> > then, how about pg_get_first_valid_wal_record_lsn returning both the
> > next valid wal record LSN and its previous wal record LSN?
> >
>
> The latter one looks better.
Modified.
Attaching v5 patch set, please review it further.
Regards,
Bharath Rupireddy.
From 42e9f32a8579505adc38366a12c8c6d63f18fb83 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Fri, 25 Feb 2022 10:16:43 +0000
Subject: [PATCH v5] pg_walinspect
---
contrib/Makefile | 1 +
contrib/pg_walinspect/.gitignore | 4 +
contrib/pg_walinspect/Makefile | 26 +
contrib/pg_walinspect/pg_walinspect--1.0.sql | 146 +++
contrib/pg_walinspect/pg_walinspect.c | 1227 ++++++++++++++++++
contrib/pg_walinspect/pg_walinspect.control | 5 +
src/backend/access/transam/xlogreader.c | 14 +-
src/backend/access/transam/xlogutils.c | 142 ++
src/bin/pg_waldump/pg_waldump.c | 5 +
src/common/relpath.c | 18 +
src/include/access/xlog.h | 2 +-
src/include/access/xlog_internal.h | 2 +-
src/include/access/xlogreader.h | 2 -
src/include/access/xlogutils.h | 13 +
src/include/common/relpath.h | 1 +
15 files changed, 1597 insertions(+), 11 deletions(-)
create mode 100644 contrib/pg_walinspect/.gitignore
create mode 100644 contrib/pg_walinspect/Makefile
create mode 100644 contrib/pg_walinspect/pg_walinspect--1.0.sql
create mode 100644 contrib/pg_walinspect/pg_walinspect.c
create mode 100644 contrib/pg_walinspect/pg_walinspect.control
diff --git a/contrib/Makefile b/contrib/Makefile
index e3e221308b..705c6fc36b 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -40,6 +40,7 @@ SUBDIRS = \
pgrowlocks \
pgstattuple \
pg_visibility \
+ pg_walinspect \
postgres_fdw \
seg \
spi \
diff --git a/contrib/pg_walinspect/.gitignore b/contrib/pg_walinspect/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/contrib/pg_walinspect/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/contrib/pg_walinspect/Makefile b/contrib/pg_walinspect/Makefile
new file mode 100644
index 0000000000..c92a97447f
--- /dev/null
+++ b/contrib/pg_walinspect/Makefile
@@ -0,0 +1,26 @@
+# contrib/pg_walinspect/Makefile
+
+MODULE_big = pg_walinspect
+OBJS = \
+ $(WIN32RES) \
+ pg_walinspect.o
+PGFILEDESC = "pg_walinspect - functions to inspect contents of PostgreSQL Write-Ahead Log"
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+SHLIB_LINK_INTERNAL = $(libpq)
+
+EXTENSION = pg_walinspect
+DATA = pg_walinspect--1.0.sql
+
+REGRESS = pg_walinspect
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_walinspect
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_walinspect/pg_walinspect--1.0.sql b/contrib/pg_walinspect/pg_walinspect--1.0.sql
new file mode 100644
index 0000000000..d1828841f0
--- /dev/null
+++ b/contrib/pg_walinspect/pg_walinspect--1.0.sql
@@ -0,0 +1,146 @@
+/* contrib/pg_walinspect/pg_walinspect--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION pg_walinspect" to load this file. \quit
+
+--
+-- pg_get_raw_wal_record()
+--
+CREATE FUNCTION pg_get_raw_wal_record(IN in_lsn pg_lsn,
+ OUT lsn pg_lsn,
+ OUT record bytea
+)
+AS 'MODULE_PATHNAME', 'pg_get_raw_wal_record'
+LANGUAGE C STRICT PARALLEL SAFE;
+
+GRANT EXECUTE ON FUNCTION pg_get_raw_wal_record(pg_lsn) TO pg_monitor;
+
+--
+-- pg_get_first_valid_wal_record_lsn()
+--
+CREATE FUNCTION pg_get_first_valid_wal_record_lsn(IN in_lsn pg_lsn,
+ OUT lsn pg_lsn,
+ OUT prev_lsn pg_lsn
+)
+AS 'MODULE_PATHNAME', 'pg_get_first_valid_wal_record_lsn'
+LANGUAGE C STRICT PARALLEL SAFE;
+
+GRANT EXECUTE ON FUNCTION pg_get_first_valid_wal_record_lsn(pg_lsn) TO pg_monitor;
+
+--
+-- pg_get_first_and_last_valid_wal_record_lsn()
+--
+CREATE FUNCTION pg_get_first_and_last_valid_wal_record_lsn(IN wal_file_name text,
+ OUT first_valid_lsn pg_lsn,
+ OUT last_valid_lsn pg_lsn
+)
+AS 'MODULE_PATHNAME', 'pg_get_first_and_last_valid_wal_record_lsn'
+LANGUAGE C STRICT PARALLEL SAFE;
+
+GRANT EXECUTE ON FUNCTION pg_get_first_and_last_valid_wal_record_lsn(text) TO pg_monitor;
+
+--
+-- pg_get_wal_record_info()
+--
+CREATE FUNCTION pg_get_wal_record_info(IN in_lsn pg_lsn,
+ OUT lsn pg_lsn,
+ OUT prev_lsn pg_lsn,
+ OUT xid xid,
+ OUT resource_manager text,
+ OUT length int4,
+ OUT total_length int4,
+ OUT description text,
+ OUT block_ref text,
+ OUT data bytea,
+ OUT data_len int4
+)
+AS 'MODULE_PATHNAME', 'pg_get_wal_record_info'
+LANGUAGE C STRICT PARALLEL SAFE;
+
+GRANT EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn) TO pg_monitor;
+
+--
+-- pg_get_wal_records_info()
+--
+CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn,
+ IN end_lsn pg_lsn,
+ IN wait_for_wal boolean DEFAULT false,
+ OUT lsn pg_lsn,
+ OUT prev_lsn pg_lsn,
+ OUT xid xid,
+ OUT resource_manager text,
+ OUT length int4,
+ OUT total_length int4,
+ OUT description text,
+ OUT block_ref text,
+ OUT data bytea,
+ OUT data_len int4
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_get_wal_records_info'
+LANGUAGE C STRICT PARALLEL SAFE;
+
+GRANT EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn, boolean) TO pg_monitor;
+
+--
+-- pg_get_wal_records_info_till_end_of_wal()
+--
+CREATE FUNCTION pg_get_wal_records_info_till_end_of_wal(IN start_lsn pg_lsn,
+ OUT lsn pg_lsn,
+ OUT prev_lsn pg_lsn,
+ OUT xid xid,
+ OUT resource_manager text,
+ OUT length int4,
+ OUT total_length int4,
+ OUT description text,
+ OUT block_ref text,
+ OUT data bytea,
+ OUT data_len int4
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_get_wal_records_info_till_end_of_wal'
+LANGUAGE C STRICT PARALLEL SAFE;
+
+GRANT EXECUTE ON FUNCTION pg_get_wal_records_info_till_end_of_wal(pg_lsn) TO pg_monitor;
+
+--
+-- pg_get_wal_stats()
+--
+CREATE FUNCTION pg_get_wal_stats(IN start_lsn pg_lsn,
+ IN end_lsn pg_lsn,
+ IN wait_for_wal boolean DEFAULT false,
+ OUT resource_manager text,
+ OUT count int8,
+ OUT count_percentage float4,
+ OUT record_size int8,
+ OUT record_size_percentage float4,
+ OUT fpi_size int8,
+ OUT fpi_size_percentage float4,
+ OUT combined_size int8,
+ OUT combined_size_percentage float4
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_get_wal_stats'
+LANGUAGE C STRICT PARALLEL SAFE;
+
+GRANT EXECUTE ON FUNCTION pg_get_wal_stats(pg_lsn, pg_lsn, boolean) TO pg_monitor;
+
+--
+-- pg_get_wal_stats_till_end_of_wal()
+--
+CREATE FUNCTION pg_get_wal_stats_till_end_of_wal(IN start_lsn pg_lsn,
+ OUT resource_manager text,
+ OUT count int8,
+ OUT count_percentage float4,
+ OUT record_size int8,
+ OUT record_size_percentage float4,
+ OUT fpi_size int8,
+ OUT fpi_size_percentage float4,
+ OUT combined_size int8,
+ OUT combined_size_percentage float4
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_get_wal_stats_till_end_of_wal'
+LANGUAGE C STRICT PARALLEL SAFE;
+
+GRANT EXECUTE ON FUNCTION pg_get_wal_stats_till_end_of_wal(pg_lsn) TO pg_monitor;
diff --git a/contrib/pg_walinspect/pg_walinspect.c b/contrib/pg_walinspect/pg_walinspect.c
new file mode 100644
index 0000000000..02ea21c860
--- /dev/null
+++ b/contrib/pg_walinspect/pg_walinspect.c
@@ -0,0 +1,1227 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_walinspect.c
+ * Functions to inspect contents of PostgreSQL Write-Ahead Log
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_walinspect/pg_walinspect.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/xlog.h"
+#include "access/xlog_internal.h"
+#include "access/xlogreader.h"
+#include "access/xlogutils.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+
+/*
+ * NOTE: For any code change or issue fix here, it is highly recommended to
+ * give a thought about doing the same in pg_waldump tool as well.
+ */
+
+PG_MODULE_MAGIC;
+
+#define MAX_XLINFO_TYPES 16
+
+#define IsEndOfWALReached(state) \
+ (state->private_data != NULL && \
+ (((ReadLocalXLOGPage2Private *) xlogreader->private_data)->no_wait == true) && \
+ (((ReadLocalXLOGPage2Private *) xlogreader->private_data)->reached_end_of_wal == true))
+
+typedef struct Stats
+{
+ uint64 count;
+ uint64 rec_len;
+ uint64 fpi_len;
+} Stats;
+
+typedef struct XLogRecStats
+{
+ uint64 count;
+ Stats rmgr_stats[RM_NEXT_ID];
+ Stats record_stats[RM_NEXT_ID][MAX_XLINFO_TYPES];
+} XLogRecStats;
+
+extern void _PG_init(void);
+extern void _PG_fini(void);
+
+PG_FUNCTION_INFO_V1(pg_get_raw_wal_record);
+PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn);
+PG_FUNCTION_INFO_V1(pg_get_wal_record_info);
+PG_FUNCTION_INFO_V1(pg_get_wal_records_info);
+PG_FUNCTION_INFO_V1(pg_get_wal_records_info_till_end_of_wal);
+PG_FUNCTION_INFO_V1(pg_get_wal_stats);
+PG_FUNCTION_INFO_V1(pg_get_wal_stats_till_end_of_wal);
+PG_FUNCTION_INFO_V1(pg_get_first_and_last_valid_wal_record_lsn);
+
+static XLogReaderState *InitXLogReaderState(XLogRecPtr lsn,
+ XLogRecPtr *first_record,
+ bool warning,
+ bool wait_for_wal);
+static XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader,
+ XLogRecPtr first_record);
+static void GetXLogRecordInfo(XLogReaderState *record, XLogRecPtr lsn,
+ Datum *values, bool *nulls,
+ uint32 ncols);
+static void StoreXLogRecordStats(XLogRecStats * stats,
+ XLogReaderState *record);
+static void GetXLogSummaryStats(XLogRecStats * stats,
+ Tuplestorestate *tupstore, TupleDesc tupdesc,
+ Datum *values, bool *nulls,
+ uint32 ncols);
+static void FillXLogStatsRow(const char *name, uint64 n, uint64 total_count,
+ uint64 rec_len, uint64 total_rec_len,
+ uint64 fpi_len, uint64 total_fpi_len,
+ uint64 tot_len, uint64 total_len,
+ Datum *values, bool *nulls, uint32 ncols);
+static void GetWalStatsInternal(FunctionCallInfo fcinfo, XLogRecPtr start_lsn,
+ XLogRecPtr end_lsn, bool wait_for_wal);
+static void GetWALRecordsInfoInternal(FunctionCallInfo fcinfo,
+ XLogRecPtr start_lsn, XLogRecPtr end_lsn,
+ bool wait_for_wal);
+
+/*
+ * Module load callback.
+ */
+void
+_PG_init(void)
+{
+ /* Define custom GUCs and install hooks here, if any. */
+
+ /*
+ * Have EmitWarningsOnPlaceholders("pg_walinspect"); if custom GUCs are
+ * defined.
+ */
+
+}
+
+/*
+ * Module unload callback.
+ */
+void
+_PG_fini(void)
+{
+ /* Uninstall hooks, if any. */
+}
+
+/*
+ * Intialize WAL reader and identify first valid LSN.
+ */
+static XLogReaderState *
+InitXLogReaderState(XLogRecPtr lsn, XLogRecPtr *first_record,
+ bool warning, bool wait_for_wal)
+{
+ XLogReaderState *xlogreader;
+ ReadLocalXLOGPage2Private *private_data = NULL;
+
+ /*
+ * Reading WAL below the first page of the first sgements isn't allowed.
+ * This is a bootstrap WAL page and the page_read callback fails to read
+ * it.
+ */
+ if (lsn < XLOG_BLCKSZ)
+ ereport(ERROR,
+ (errmsg("could not read WAL at %X/%X",
+ LSN_FORMAT_ARGS(lsn))));
+
+ /*
+ * When told let's not wait for WAL to be available. Fill in the
+ * private_data so that the page_read callback can return whenever
+ * end-of-WAL is reached.
+ */
+ if (wait_for_wal == false)
+ {
+ private_data = (ReadLocalXLOGPage2Private *)
+ palloc0(sizeof(ReadLocalXLOGPage2Private));
+
+ private_data->no_wait = true;
+
+ /*
+ * page_read callback will set this to true when the end-of-WAL is
+ * reached.
+ */
+ private_data->reached_end_of_wal = false;
+ }
+
+ xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+ XL_ROUTINE(.page_read = &read_local_xlog_page_2,
+ .segment_open = &wal_segment_open,
+ .segment_close = &wal_segment_close),
+ (void *) private_data);
+
+ if (xlogreader == NULL)
+ {
+ if (wait_for_wal == false)
+ pfree(private_data);
+
+ ereport(ERROR,
+ (errcode(ERRCODE_OUT_OF_MEMORY),
+ errmsg("out of memory"),
+ errdetail("Failed while allocating a WAL reading processor.")));
+ }
+
+ /* First find a valid recptr to start from. */
+ *first_record = XLogFindNextRecord(xlogreader, lsn);
+
+ if (XLogRecPtrIsInvalid(*first_record) &&
+ IsEndOfWALReached(xlogreader))
+ {
+ ereport(WARNING,
+ (errmsg("reached end of WAL")));
+
+ return xlogreader;
+ }
+ else if (XLogRecPtrIsInvalid(*first_record))
+ {
+ ereport(ERROR,
+ (errmsg("could not find a valid record after %X/%X",
+ LSN_FORMAT_ARGS(lsn))));
+ }
+
+ /*
+ * Display a message that we're skipping data if the given LSN wasn't a
+ * pointer to the start of a record and also wasn't a pointer to the
+ * beginning of a segment (e.g. we were used in file mode).
+ */
+ if (warning && *first_record != lsn &&
+ XLogSegmentOffset(lsn, wal_segment_size) != 0)
+ {
+ ereport(WARNING,
+ (errmsg_plural("first record is after %X/%X, at %X/%X, skipping over %u byte",
+ "first record is after %X/%X, at %X/%X, skipping over %u bytes",
+ (*first_record - lsn),
+ LSN_FORMAT_ARGS(lsn),
+ LSN_FORMAT_ARGS(*first_record),
+ (uint32) (*first_record - lsn))));
+ }
+
+ return xlogreader;
+}
+
+/*
+ * Read next WAL record.
+ */
+static XLogRecord *
+ReadNextXLogRecord(XLogReaderState *xlogreader, XLogRecPtr first_record)
+{
+ char *errormsg;
+ XLogRecord *record;
+
+ record = XLogReadRecord(xlogreader, &errormsg);
+
+ if (record == NULL &&
+ IsEndOfWALReached(xlogreader))
+ {
+ ereport(WARNING,
+ (errmsg("reached end of WAL")));
+
+ return record;
+ }
+ else if (record == NULL)
+ {
+ if (errormsg)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read WAL at %X/%X: %s",
+ LSN_FORMAT_ARGS(first_record), errormsg)));
+ else
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read WAL at %X/%X",
+ LSN_FORMAT_ARGS(first_record))));
+ }
+
+ return record;
+}
+
+/*
+ * Get raw WAL record.
+ *
+ * XXX: Currently, this function will not wait if the future WAL LSN is
+ * provided. We can have a wait version of it by using
+ * ReadLocalXLOGPage2Private.
+ */
+Datum
+pg_get_raw_wal_record(PG_FUNCTION_ARGS)
+{
+#define PG_GET_RAW_WAL_RECORD_COLS 2
+ XLogRecPtr lsn;
+ XLogRecord *record;
+ XLogRecPtr first_record;
+ XLogReaderState *xlogreader;
+ bytea *raw_record;
+ uint32 rec_len;
+ char *raw_record_data;
+ TupleDesc tupdesc;
+ Datum result;
+ HeapTuple tuple;
+ Datum values[PG_GET_RAW_WAL_RECORD_COLS];
+ bool nulls[PG_GET_RAW_WAL_RECORD_COLS];
+ int i = 0;
+
+ lsn = PG_GETARG_LSN(0);
+
+ /* Validate input. */
+ if (XLogRecPtrIsInvalid(lsn))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid WAL record LSN")));
+
+ /* Build a tuple descriptor for our result type. */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ elog(ERROR, "return type must be a row type");
+
+ xlogreader = InitXLogReaderState(lsn, &first_record, true, false);
+
+ Assert(xlogreader);
+
+ if (XLogRecPtrIsInvalid(first_record) &&
+ IsEndOfWALReached(xlogreader))
+ {
+ pfree(xlogreader->private_data);
+ XLogReaderFree(xlogreader);
+
+ PG_RETURN_NULL();
+ }
+
+ record = ReadNextXLogRecord(xlogreader, first_record);
+
+ if (record == NULL &&
+ IsEndOfWALReached(xlogreader))
+ {
+ pfree(xlogreader->private_data);
+ XLogReaderFree(xlogreader);
+
+ PG_RETURN_NULL();
+ }
+
+ rec_len = XLogRecGetTotalLen(xlogreader);
+
+ Assert(rec_len > 0);
+
+ raw_record = (bytea *) palloc(rec_len + VARHDRSZ);
+ SET_VARSIZE(raw_record, rec_len + VARHDRSZ);
+ raw_record_data = VARDATA(raw_record);
+
+ memcpy(raw_record_data, record, rec_len);
+
+ if (xlogreader->private_data)
+ pfree(xlogreader->private_data);
+
+ XLogReaderFree(xlogreader);
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+
+ values[i++] = LSNGetDatum(first_record);
+ values[i++] = PointerGetDatum(raw_record);
+
+ Assert(i == PG_GET_RAW_WAL_RECORD_COLS);
+
+ tuple = heap_form_tuple(tupdesc, values, nulls);
+ result = HeapTupleGetDatum(tuple);
+
+ PG_RETURN_DATUM(result);
+#undef PG_GET_RAW_WAL_RECORD_COLS
+}
+
+/*
+ * Get first valid raw WAL record LSN following the given LSN.
+ *
+ * XXX: Currently, this function will not wait if the future WAL LSN is
+ * provided. We can have a wait version of it by using
+ * ReadLocalXLOGPage2Private.
+ */
+Datum
+pg_get_first_valid_wal_record_lsn(PG_FUNCTION_ARGS)
+{
+#define PG_GET_FIRST_VALID_WAL_RECORD_LSN_COLS 2
+ XLogRecPtr lsn;
+ XLogRecord *record;
+ XLogRecPtr first_record;
+ XLogRecPtr prev_record;
+ XLogReaderState *xlogreader;
+ TupleDesc tupdesc;
+ Datum result;
+ HeapTuple tuple;
+ Datum values[PG_GET_FIRST_VALID_WAL_RECORD_LSN_COLS];
+ bool nulls[PG_GET_FIRST_VALID_WAL_RECORD_LSN_COLS];
+ int i = 0;
+
+ lsn = PG_GETARG_LSN(0);
+
+ /* Validate input. */
+ if (XLogRecPtrIsInvalid(lsn))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid WAL record LSN")));
+
+ /* Build a tuple descriptor for our result type. */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ elog(ERROR, "return type must be a row type");
+
+ xlogreader = InitXLogReaderState(lsn, &first_record, true, false);
+
+ Assert(xlogreader);
+
+ if (XLogRecPtrIsInvalid(first_record) &&
+ IsEndOfWALReached(xlogreader))
+ {
+ pfree(xlogreader->private_data);
+ XLogReaderFree(xlogreader);
+
+ PG_RETURN_NULL();
+ }
+
+ record = ReadNextXLogRecord(xlogreader, first_record);
+
+ if (record == NULL &&
+ IsEndOfWALReached(xlogreader))
+ {
+ pfree(xlogreader->private_data);
+ XLogReaderFree(xlogreader);
+
+ PG_RETURN_NULL();
+ }
+
+ prev_record = XLogRecGetPrev(xlogreader);
+
+ if (xlogreader->private_data)
+ pfree(xlogreader->private_data);
+
+ XLogReaderFree(xlogreader);
+
+ /*
+ * Previous valid WAL record must be at an LSN lower than next valid WAL
+ * record LSN. Otherwise, it is an indication of something wrong, so error
+ * out.
+ */
+ if (prev_record >= first_record)
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("record with incorrect prev-link %X/%X at %X/%X",
+ LSN_FORMAT_ARGS(prev_record),
+ LSN_FORMAT_ARGS(first_record))));
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+
+ values[i++] = LSNGetDatum(first_record);
+ values[i++] = LSNGetDatum(prev_record);
+
+ Assert(i == PG_GET_FIRST_VALID_WAL_RECORD_LSN_COLS);
+
+ tuple = heap_form_tuple(tupdesc, values, nulls);
+ result = HeapTupleGetDatum(tuple);
+
+ PG_RETURN_DATUM(result);
+#undef PG_GET_FIRST_VALID_WAL_RECORD_LSN_COLS
+}
+
+/*
+ * Get first and last valid record LSN in a WAL file.
+ */
+Datum
+pg_get_first_and_last_valid_wal_record_lsn(PG_FUNCTION_ARGS)
+{
+#define PG_GET_FIRST_AND_LAST_VALID_WAL_RECORD_LSN_COLS 2
+ char *fname;
+ uint32 tli;
+ XLogSegNo segno;
+ XLogRecPtr lsn;
+ XLogRecPtr first_record;
+ XLogRecPtr last_record = InvalidXLogRecPtr;
+ XLogReaderState *xlogreader;
+ TupleDesc tupdesc;
+ Datum result;
+ HeapTuple tuple;
+ Datum values[PG_GET_FIRST_AND_LAST_VALID_WAL_RECORD_LSN_COLS];
+ bool nulls[PG_GET_FIRST_AND_LAST_VALID_WAL_RECORD_LSN_COLS];
+ int i = 0;
+
+ fname = text_to_cstring(PG_GETARG_TEXT_PP(0));
+
+ /* Validate input. */
+ if (!IsXLogFileName(fname))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid WAL file name")));
+
+ /* Looks like an xlog file. Parse its position. */
+ XLogFromFileName(fname, &tli, &segno, wal_segment_size);
+
+ XLogSegNoOffsetToRecPtr(segno, 0, wal_segment_size, lsn);
+
+ /* Build a tuple descriptor for our result type. */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ elog(ERROR, "return type must be a row type");
+
+ xlogreader = InitXLogReaderState(lsn, &first_record, true, false);
+
+ Assert(xlogreader);
+
+ if (XLogRecPtrIsInvalid(first_record) &&
+ IsEndOfWALReached(xlogreader))
+ {
+ pfree(xlogreader->private_data);
+ XLogReaderFree(xlogreader);
+
+ PG_RETURN_NULL();
+ }
+
+ for (;;)
+ {
+ XLogRecord *record;
+ char cfname[MAXFNAMELEN];
+
+ record = ReadNextXLogRecord(xlogreader, first_record);
+
+ /* Exit loop if end-of-WAL is reached. */
+ if (record == NULL &&
+ IsEndOfWALReached(xlogreader))
+ break;
+
+ XLByteToSeg(xlogreader->currRecPtr, segno, wal_segment_size);
+ XLogFileName(cfname, tli, segno, wal_segment_size);
+
+ if (!IsXLogFileName(cfname))
+ continue;
+
+ if (strncmp(cfname, fname, XLOG_FNAME_LEN) == 0)
+ {
+ /*
+ * Given WAL file name and current record's WAL file name matches.
+ * The current record might be the last valid record.
+ */
+ last_record = xlogreader->currRecPtr;
+ }
+ else
+ {
+ /*
+ * Given WAL file name and current record's WAL file name doesn't
+ * match, so exit the loop. last_record contains the last valid
+ * record in the given WAL file.
+ */
+ break;
+ }
+ }
+
+ if (xlogreader->private_data)
+ pfree(xlogreader->private_data);
+
+ XLogReaderFree(xlogreader);
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+
+ values[i++] = LSNGetDatum(first_record);
+ values[i++] = LSNGetDatum(last_record);
+
+ Assert(i == PG_GET_FIRST_AND_LAST_VALID_WAL_RECORD_LSN_COLS);
+
+ tuple = heap_form_tuple(tupdesc, values, nulls);
+ result = HeapTupleGetDatum(tuple);
+
+ PG_RETURN_DATUM(result);
+#undef PG_GET_FIRST_AND_LAST_VALID_WAL_RECORD_LSN_COLS
+}
+
+/*
+ * Calculate size of a record, split into !FPI and FPI parts.
+ */
+static void
+GetXLogRecordLen(XLogReaderState *record, uint32 *rec_len, uint32 *fpi_len)
+{
+ int block_id;
+
+ /*
+ * Calculate the amount of FPI data in the record.
+ *
+ * XXX: We peek into xlogreader's private decoded backup blocks for the
+ * bimg_len indicating the length of FPI data. It doesn't seem worth it to
+ * add an accessor macro for this.
+ */
+ *fpi_len = 0;
+ for (block_id = 0; block_id <= record->max_block_id; block_id++)
+ {
+ if (XLogRecHasBlockImage(record, block_id))
+ *fpi_len += record->blocks[block_id].bimg_len;
+ }
+
+ /*
+ * Calculate the length of the record as the total length - the length of
+ * all the block images.
+ */
+ *rec_len = XLogRecGetTotalLen(record) - *fpi_len;
+}
+
+/*
+ * Get WAL record info.
+ */
+static void
+GetXLogRecordInfo(XLogReaderState *record, XLogRecPtr lsn,
+ Datum *values, bool *nulls, uint32 ncols)
+{
+ const char *id;
+ const RmgrData *desc;
+ uint32 rec_len;
+ uint32 fpi_len;
+ RelFileNode rnode;
+ ForkNumber forknum;
+ BlockNumber blk;
+ int block_id;
+ StringInfoData rec_desc;
+ StringInfoData rec_blk_ref;
+ StringInfoData temp;
+ bytea *data;
+ char *main_data;
+ uint32 main_data_len;
+ int i = 0;
+
+ desc = &RmgrTable[XLogRecGetRmid(record)];
+
+ GetXLogRecordLen(record, &rec_len, &fpi_len);
+
+ values[i++] = LSNGetDatum(lsn);
+ values[i++] = LSNGetDatum(XLogRecGetPrev(record));
+ values[i++] = TransactionIdGetDatum(XLogRecGetXid(record));
+ values[i++] = CStringGetTextDatum(desc->rm_name);
+ values[i++] = UInt32GetDatum(rec_len);
+ values[i++] = UInt32GetDatum(XLogRecGetTotalLen(record));
+
+ initStringInfo(&rec_desc);
+
+ id = desc->rm_identify(XLogRecGetInfo(record));
+
+ if (id == NULL)
+ appendStringInfo(&rec_desc, "UNKNOWN (%x) ", XLogRecGetInfo(record) & ~XLR_INFO_MASK);
+ else
+ appendStringInfo(&rec_desc, "%s ", id);
+
+ initStringInfo(&temp);
+
+ desc->rm_desc(&temp, record);
+ appendStringInfo(&rec_desc, "%s", temp.data);
+
+ values[i++] = CStringGetTextDatum(rec_desc.data);
+
+ pfree(temp.data);
+
+ initStringInfo(&rec_blk_ref);
+
+ /* Block references (detailed format). */
+ for (block_id = 0; block_id <= record->max_block_id; block_id++)
+ {
+ if (!XLogRecHasBlockRef(record, block_id))
+ continue;
+
+ XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blk);
+
+ if (forknum != MAIN_FORKNUM)
+ {
+ appendStringInfo(&rec_blk_ref,
+ "blkref #%u: rel %u/%u/%u fork %s blk %u",
+ block_id, rnode.spcNode, rnode.dbNode,
+ rnode.relNode, get_forkname(forknum), blk);
+ }
+ else
+ {
+ appendStringInfo(&rec_blk_ref,
+ "blkref #%u: rel %u/%u/%u blk %u",
+ block_id, rnode.spcNode, rnode.dbNode,
+ rnode.relNode, blk);
+ }
+
+ if (XLogRecHasBlockImage(record, block_id))
+ {
+ uint8 bimg_info = record->blocks[block_id].bimg_info;
+
+ if (BKPIMAGE_COMPRESSED(bimg_info))
+ {
+ const char *method;
+
+ if ((bimg_info & BKPIMAGE_COMPRESS_PGLZ) != 0)
+ method = "pglz";
+ else if ((bimg_info & BKPIMAGE_COMPRESS_LZ4) != 0)
+ method = "lz4";
+ else
+ method = "unknown";
+
+ appendStringInfo(&rec_blk_ref, " (FPW%s); hole: offset: %u, length: %u, "
+ "compression saved: %u, method: %s",
+ XLogRecBlockImageApply(record, block_id) ?
+ "" : " for WAL verification",
+ record->blocks[block_id].hole_offset,
+ record->blocks[block_id].hole_length,
+ BLCKSZ -
+ record->blocks[block_id].hole_length -
+ record->blocks[block_id].bimg_len,
+ method);
+ }
+ else
+ {
+ appendStringInfo(&rec_blk_ref, " (FPW%s); hole: offset: %u, length: %u",
+ XLogRecBlockImageApply(record, block_id) ?
+ "" : " for WAL verification",
+ record->blocks[block_id].hole_offset,
+ record->blocks[block_id].hole_length);
+ }
+ }
+ }
+
+ values[i++] = CStringGetTextDatum(rec_blk_ref.data);
+
+ main_data_len = XLogRecGetDataLen(record);
+
+ data = (bytea *) palloc(main_data_len + VARHDRSZ);
+ SET_VARSIZE(data, main_data_len + VARHDRSZ);
+ main_data = VARDATA(data);
+
+ memcpy(main_data, XLogRecGetData(record), main_data_len);
+
+ values[i++] = PointerGetDatum(data);
+ values[i++] = UInt32GetDatum(main_data_len);
+
+ Assert(i == ncols);
+}
+
+/*
+ * Get the WAL record info and data.
+ *
+ * XXX: Currently, this function will not wait if the future WAL LSN is
+ * provided. We can have a wait version of it by using
+ * ReadLocalXLOGPage2Private.
+ */
+Datum
+pg_get_wal_record_info(PG_FUNCTION_ARGS)
+{
+#define PG_GET_WAL_RECORD_INFO_COLS 10
+ XLogRecPtr lsn;
+ XLogRecord *record;
+ XLogRecPtr first_record;
+ XLogReaderState *xlogreader;
+ TupleDesc tupdesc;
+ Datum result;
+ HeapTuple tuple;
+ Datum values[PG_GET_WAL_RECORD_INFO_COLS];
+ bool nulls[PG_GET_WAL_RECORD_INFO_COLS];
+
+ lsn = PG_GETARG_LSN(0);
+
+ /* Validate input. */
+ if (XLogRecPtrIsInvalid(lsn))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid WAL record LSN")));
+
+ /* Build a tuple descriptor for our result type. */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ elog(ERROR, "return type must be a row type");
+
+ xlogreader = InitXLogReaderState(lsn, &first_record, true, false);
+
+ Assert(xlogreader);
+
+ if (XLogRecPtrIsInvalid(first_record) &&
+ IsEndOfWALReached(xlogreader))
+ {
+ pfree(xlogreader->private_data);
+ XLogReaderFree(xlogreader);
+
+ PG_RETURN_NULL();
+ }
+
+ record = ReadNextXLogRecord(xlogreader, first_record);
+
+ if (record == NULL &&
+ IsEndOfWALReached(xlogreader))
+ {
+ pfree(xlogreader->private_data);
+ XLogReaderFree(xlogreader);
+
+ PG_RETURN_NULL();
+ }
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+
+ GetXLogRecordInfo(xlogreader, first_record, values, nulls,
+ PG_GET_WAL_RECORD_INFO_COLS);
+
+ if (xlogreader->private_data)
+ pfree(xlogreader->private_data);
+
+ XLogReaderFree(xlogreader);
+
+ tuple = heap_form_tuple(tupdesc, values, nulls);
+ result = HeapTupleGetDatum(tuple);
+
+ PG_RETURN_DATUM(result);
+#undef PG_GET_WAL_RECORD_INFO_COLS
+}
+
+/*
+ * Get info and data of all WAL records between start LSN and end LSN.
+ */
+static void
+GetWALRecordsInfoInternal(FunctionCallInfo fcinfo, XLogRecPtr start_lsn,
+ XLogRecPtr end_lsn, bool wait_for_wal)
+{
+#define PG_GET_WAL_RECORDS_INFO_COLS 10
+ XLogRecPtr first_record;
+ XLogReaderState *xlogreader;
+ ReturnSetInfo *rsinfo;
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+ MemoryContext per_query_ctx;
+ MemoryContext oldcontext;
+ Datum values[PG_GET_WAL_RECORDS_INFO_COLS];
+ bool nulls[PG_GET_WAL_RECORDS_INFO_COLS];
+
+ rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+ /* Check to see if caller supports us returning a tuplestore. */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("materialize mode required, but it is not allowed in this context")));
+
+ /* Build a tuple descriptor for our result type. */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ elog(ERROR, "return type must be a row type");
+
+ /* Build tuplestore to hold the result rows. */
+ per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+ oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+ tupstore = tuplestore_begin_heap(true, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = tupstore;
+ rsinfo->setDesc = tupdesc;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ xlogreader = InitXLogReaderState(start_lsn, &first_record, true,
+ wait_for_wal);
+
+ Assert(xlogreader);
+
+ if (wait_for_wal == false &&
+ XLogRecPtrIsInvalid(first_record) &&
+ IsEndOfWALReached(xlogreader))
+ {
+ pfree(xlogreader->private_data);
+ XLogReaderFree(xlogreader);
+ tuplestore_end(tupstore);
+
+ return;
+ }
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+
+ for (;;)
+ {
+ XLogRecord *record;
+
+ record = ReadNextXLogRecord(xlogreader, first_record);
+
+ /* Exit loop if end-of-WAL is reached. */
+ if (record == NULL &&
+ IsEndOfWALReached(xlogreader))
+ break;
+
+ /* Exit loop if read up to end_lsn. */
+ if (!XLogRecPtrIsInvalid(end_lsn) &&
+ xlogreader->currRecPtr >= end_lsn)
+ break;
+
+ GetXLogRecordInfo(xlogreader, xlogreader->currRecPtr, values, nulls,
+ PG_GET_WAL_RECORDS_INFO_COLS);
+
+ tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+ }
+
+ if (xlogreader->private_data)
+ pfree(xlogreader->private_data);
+
+ XLogReaderFree(xlogreader);
+
+ /* Clean up and return the tuplestore. */
+ tuplestore_donestoring(tupstore);
+#undef PG_GET_WAL_RECORDS_INFO_COLS
+}
+
+/*
+ * Get info and data of all WAL records between start LSN and end LSN.
+ *
+ * This function will wait if the future WAL LSN is provided and wait_for_wal
+ * is true. By default, it doesn't wait for the future WAL LSN, but emits a
+ * warning and returns the available info.
+ */
+Datum
+pg_get_wal_records_info(PG_FUNCTION_ARGS)
+{
+ XLogRecPtr start_lsn;
+ XLogRecPtr end_lsn;
+ bool wait_for_wal;
+
+ start_lsn = PG_GETARG_LSN(0);
+ end_lsn = PG_GETARG_LSN(1);
+ wait_for_wal = PG_GETARG_BOOL(2);
+
+ /* Validate input. */
+ if (XLogRecPtrIsInvalid(start_lsn))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid WAL record start LSN")));
+
+ if (XLogRecPtrIsInvalid(end_lsn))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid WAL record end LSN")));
+
+ if (start_lsn >= end_lsn)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("WAL record start LSN must be less than end LSN")));
+
+ GetWALRecordsInfoInternal(fcinfo, start_lsn, end_lsn, wait_for_wal);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get info and data of all WAL records from start LSN till end-of-WAL.
+ *
+ * This function is similar to pg_get_wal_records_info except that it doesn't
+ * have end LSN as input and it doesn't wait for future WAL.
+ */
+Datum
+pg_get_wal_records_info_till_end_of_wal(PG_FUNCTION_ARGS)
+{
+ XLogRecPtr start_lsn;
+
+ start_lsn = PG_GETARG_LSN(0);
+
+ /* Validate input. */
+ if (XLogRecPtrIsInvalid(start_lsn))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid WAL record start LSN")));
+
+ GetWALRecordsInfoInternal(fcinfo, start_lsn, InvalidXLogRecPtr,
+ false);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Store per-rmgr and per-record statistics for a given record.
+ */
+static void
+StoreXLogRecordStats(XLogRecStats * stats, XLogReaderState *record)
+{
+ RmgrId rmid;
+ uint8 recid;
+ uint32 rec_len;
+ uint32 fpi_len;
+
+ stats->count++;
+
+ rmid = XLogRecGetRmid(record);
+
+ GetXLogRecordLen(record, &rec_len, &fpi_len);
+
+ /* Update per-rmgr statistics. */
+ stats->rmgr_stats[rmid].count++;
+ stats->rmgr_stats[rmid].rec_len += rec_len;
+ stats->rmgr_stats[rmid].fpi_len += fpi_len;
+
+ /*
+ * Update per-record statistics, where the record is identified by a
+ * combination of the RmgrId and the four bits of the xl_info field that
+ * are the rmgr's domain (resulting in sixteen possible entries per
+ * RmgrId).
+ */
+ recid = XLogRecGetInfo(record) >> 4;
+
+ /*
+ * XACT records need to be handled differently. Those records use the
+ * first bit of those four bits for an optional flag variable and the
+ * following three bits for the opcode. We filter opcode out of xl_info
+ * and use it as the identifier of the record.
+ */
+ if (rmid == RM_XACT_ID)
+ recid &= 0x07;
+
+ stats->record_stats[rmid][recid].count++;
+ stats->record_stats[rmid][recid].rec_len += rec_len;
+ stats->record_stats[rmid][recid].fpi_len += fpi_len;
+}
+
+/*
+ * Fill single row of record counts and sizes for an rmgr or record.
+ */
+static void
+FillXLogStatsRow(const char *name,
+ uint64 n, uint64 total_count,
+ uint64 rec_len, uint64 total_rec_len,
+ uint64 fpi_len, uint64 total_fpi_len,
+ uint64 tot_len, uint64 total_len,
+ Datum *values, bool *nulls, uint32 ncols)
+{
+ double n_pct;
+ double rec_len_pct;
+ double fpi_len_pct;
+ double tot_len_pct;
+ int i = 0;
+
+ n_pct = 0;
+ if (total_count != 0)
+ n_pct = 100 * (double) n / total_count;
+
+ rec_len_pct = 0;
+ if (total_rec_len != 0)
+ rec_len_pct = 100 * (double) rec_len / total_rec_len;
+
+ fpi_len_pct = 0;
+ if (total_fpi_len != 0)
+ fpi_len_pct = 100 * (double) fpi_len / total_fpi_len;
+
+ tot_len_pct = 0;
+ if (total_len != 0)
+ tot_len_pct = 100 * (double) tot_len / total_len;
+
+ values[i++] = CStringGetTextDatum(name);
+ values[i++] = Int64GetDatum(n);
+ values[i++] = Float4GetDatum(n_pct);
+ values[i++] = Int64GetDatum(rec_len);
+ values[i++] = Float4GetDatum(rec_len_pct);
+ values[i++] = Int64GetDatum(fpi_len);
+ values[i++] = Float4GetDatum(fpi_len_pct);
+ values[i++] = Int64GetDatum(tot_len);
+ values[i++] = Float4GetDatum(tot_len_pct);
+
+ Assert(i == ncols);
+}
+
+/*
+ * Get summary statistics about the records seen so far.
+ */
+static void
+GetXLogSummaryStats(XLogRecStats * stats, Tuplestorestate *tupstore,
+ TupleDesc tupdesc, Datum *values, bool *nulls,
+ uint32 ncols)
+{
+ int ri;
+ uint64 total_count = 0;
+ uint64 total_rec_len = 0;
+ uint64 total_fpi_len = 0;
+ uint64 total_len = 0;
+
+ /*
+ * Each row shows its percentages of the total, so make a first pass to
+ * calculate column totals.
+ */
+ for (ri = 0; ri < RM_NEXT_ID; ri++)
+ {
+ total_count += stats->rmgr_stats[ri].count;
+ total_rec_len += stats->rmgr_stats[ri].rec_len;
+ total_fpi_len += stats->rmgr_stats[ri].fpi_len;
+ }
+ total_len = total_rec_len + total_fpi_len;
+
+ for (ri = 0; ri < RM_NEXT_ID; ri++)
+ {
+ uint64 count;
+ uint64 rec_len;
+ uint64 fpi_len;
+ uint64 tot_len;
+ const RmgrData *desc = &RmgrTable[ri];
+
+ count = stats->rmgr_stats[ri].count;
+ rec_len = stats->rmgr_stats[ri].rec_len;
+ fpi_len = stats->rmgr_stats[ri].fpi_len;
+ tot_len = rec_len + fpi_len;
+
+ FillXLogStatsRow(desc->rm_name, count, total_count, rec_len,
+ total_rec_len, fpi_len, total_fpi_len, tot_len,
+ total_len, values, nulls, ncols);
+
+ tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+ }
+}
+
+/*
+ * Get WAL stats between start LSN and end LSN.
+ */
+static void
+GetWalStatsInternal(FunctionCallInfo fcinfo, XLogRecPtr start_lsn,
+ XLogRecPtr end_lsn, bool wait_for_wal)
+{
+#define PG_GET_WAL_STATS_COLS 9
+ XLogRecPtr first_record;
+ XLogReaderState *xlogreader;
+ XLogRecStats stats;
+ ReturnSetInfo *rsinfo;
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+ MemoryContext per_query_ctx;
+ MemoryContext oldcontext;
+ Datum values[PG_GET_WAL_STATS_COLS];
+ bool nulls[PG_GET_WAL_STATS_COLS];
+
+ rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+ /* Check to see if caller supports us returning a tuplestore. */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("materialize mode required, but it is not allowed in this context")));
+
+ /* Build a tuple descriptor for our result type. */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ elog(ERROR, "return type must be a row type");
+
+ /* Build tuplestore to hold the result rows. */
+ per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+ oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+ tupstore = tuplestore_begin_heap(true, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = tupstore;
+ rsinfo->setDesc = tupdesc;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ xlogreader = InitXLogReaderState(start_lsn, &first_record, true,
+ wait_for_wal);
+
+ Assert(xlogreader);
+
+ if (wait_for_wal == false &&
+ XLogRecPtrIsInvalid(first_record) &&
+ IsEndOfWALReached(xlogreader))
+ {
+ pfree(xlogreader->private_data);
+ XLogReaderFree(xlogreader);
+ tuplestore_end(tupstore);
+
+ return;
+ }
+
+ MemSet(&stats, 0, sizeof(stats));
+
+ for (;;)
+ {
+ XLogRecord *record;
+
+ record = ReadNextXLogRecord(xlogreader, first_record);
+
+ /* Exit loop if end-of-WAL is reached. */
+ if (record == NULL &&
+ IsEndOfWALReached(xlogreader))
+ break;
+
+ /* Exit loop if read up to end_lsn. */
+ if (!XLogRecPtrIsInvalid(end_lsn) &&
+ xlogreader->currRecPtr >= end_lsn)
+ break;
+
+ StoreXLogRecordStats(&stats, xlogreader);
+ }
+
+ if (xlogreader->private_data)
+ pfree(xlogreader->private_data);
+
+ XLogReaderFree(xlogreader);
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+
+ GetXLogSummaryStats(&stats, tupstore, tupdesc, values, nulls,
+ PG_GET_WAL_STATS_COLS);
+
+ /* Clean up and return the tuplestore. */
+ tuplestore_donestoring(tupstore);
+#undef PG_GET_WAL_STATS_COLS
+}
+
+/*
+ * Get WAL stats between start LSN and end LSN.
+ *
+ * This function will wait if the future WAL LSN is provided and wait_for_wal
+ * is true. By default, it doesn't wait for the future WAL LSN, but emits a
+ * warning and returns the available info.
+ */
+Datum
+pg_get_wal_stats(PG_FUNCTION_ARGS)
+{
+ XLogRecPtr start_lsn;
+ XLogRecPtr end_lsn;
+ bool wait_for_wal;
+
+ start_lsn = PG_GETARG_LSN(0);
+ end_lsn = PG_GETARG_LSN(1);
+ wait_for_wal = PG_GETARG_BOOL(2);
+
+ /* Validate input. */
+ if (XLogRecPtrIsInvalid(start_lsn))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid WAL record start LSN")));
+
+ if (XLogRecPtrIsInvalid(end_lsn))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid WAL record end LSN")));
+
+ if (start_lsn >= end_lsn)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("WAL record start LSN must be less than end LSN")));
+
+ GetWalStatsInternal(fcinfo, start_lsn, end_lsn, wait_for_wal);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get WAL stats from start LSN till end-of-WAL.
+ *
+ * This function is similar to pg_get_wal_stats except that it doesn't have end
+ * LSN as input and it doesn't wait for future WAL.
+ */
+Datum
+pg_get_wal_stats_till_end_of_wal(PG_FUNCTION_ARGS)
+{
+ XLogRecPtr start_lsn;
+
+ start_lsn = PG_GETARG_LSN(0);
+
+ /* Validate input. */
+ if (XLogRecPtrIsInvalid(start_lsn))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid WAL record start LSN")));
+
+ GetWalStatsInternal(fcinfo, start_lsn, InvalidXLogRecPtr, false);
+
+ PG_RETURN_VOID();
+}
diff --git a/contrib/pg_walinspect/pg_walinspect.control b/contrib/pg_walinspect/pg_walinspect.control
new file mode 100644
index 0000000000..017e56a2bb
--- /dev/null
+++ b/contrib/pg_walinspect/pg_walinspect.control
@@ -0,0 +1,5 @@
+# pg_walinspect extension
+comment = 'functions to inspect contents of PostgreSQL Write-Ahead Log'
+default_version = '1.0'
+module_pathname = '$libdir/pg_walinspect'
+relocatable = true
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 35029cf97d..4d217cabbe 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -956,13 +956,6 @@ XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr,
return true;
}
-#ifdef FRONTEND
-/*
- * Functions that are currently not needed in the backend, but are better
- * implemented inside xlogreader.c because of the internal facilities available
- * here.
- */
-
/*
* Find the first record with an lsn >= RecPtr.
*
@@ -1080,6 +1073,13 @@ err:
return InvalidXLogRecPtr;
}
+#ifdef FRONTEND
+/*
+ * Functions that are currently not needed in the backend, but are better
+ * implemented inside xlogreader.c because of the internal facilities available
+ * here.
+ */
+
#endif /* FRONTEND */
/*
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 54d5f20734..4c33933249 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -968,6 +968,148 @@ read_local_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr,
return count;
}
+/*
+ * XLogReaderRoutine->page_read callback for reading local xlog files
+ *
+ * This function is same as read_local_xlog_page except that it works in both
+ * wait and no wait mode. The callers can specify about waiting in private_data
+ * of XLogReaderState.
+ */
+int
+read_local_xlog_page_2(XLogReaderState *state, XLogRecPtr targetPagePtr,
+ int reqLen, XLogRecPtr targetRecPtr, char *cur_page)
+{
+ XLogRecPtr read_upto,
+ loc;
+ TimeLineID tli;
+ int count;
+ WALReadError errinfo;
+ TimeLineID currTLI;
+
+ loc = targetPagePtr + reqLen;
+
+ /* Loop waiting for xlog to be available if necessary */
+ while (1)
+ {
+ /*
+ * Determine the limit of xlog we can currently read to, and what the
+ * most recent timeline is.
+ */
+ if (!RecoveryInProgress())
+ read_upto = GetFlushRecPtr(&currTLI);
+ else
+ read_upto = GetXLogReplayRecPtr(&currTLI);
+ tli = currTLI;
+
+ /*
+ * Check which timeline to get the record from.
+ *
+ * We have to do it each time through the loop because if we're in
+ * recovery as a cascading standby, the current timeline might've
+ * become historical. We can't rely on RecoveryInProgress() because in
+ * a standby configuration like
+ *
+ * A => B => C
+ *
+ * if we're a logical decoding session on C, and B gets promoted, our
+ * timeline will change while we remain in recovery.
+ *
+ * We can't just keep reading from the old timeline as the last WAL
+ * archive in the timeline will get renamed to .partial by
+ * StartupXLOG().
+ *
+ * If that happens after our caller determined the TLI but before
+ * we actually read the xlog page, we might still try to read from the
+ * old (now renamed) segment and fail. There's not much we can do
+ * about this, but it can only happen when we're a leaf of a cascading
+ * standby whose primary gets promoted while we're decoding, so a
+ * one-off ERROR isn't too bad.
+ */
+ XLogReadDetermineTimeline(state, targetPagePtr, reqLen, tli);
+
+ if (state->currTLI == currTLI)
+ {
+
+ if (loc <= read_upto)
+ break;
+
+ /* Let's not wait for WAL to be available if indicated */
+ if (loc > read_upto &&
+ state->private_data != NULL)
+ {
+ ReadLocalXLOGPage2Private *private;
+
+ private = (ReadLocalXLOGPage2Private *) state->private_data;
+
+ if (private->no_wait == true)
+ {
+ private->reached_end_of_wal = true;
+ break;
+ }
+ }
+
+ CHECK_FOR_INTERRUPTS();
+ pg_usleep(1000L);
+ }
+ else
+ {
+ /*
+ * We're on a historical timeline, so limit reading to the switch
+ * point where we moved to the next timeline.
+ *
+ * We don't need to GetFlushRecPtr or GetXLogReplayRecPtr. We know
+ * about the new timeline, so we must've received past the end of
+ * it.
+ */
+ read_upto = state->currTLIValidUntil;
+
+ /*
+ * Setting tli to our wanted record's TLI is slightly wrong; the
+ * page might begin on an older timeline if it contains a timeline
+ * switch, since its xlog segment will have been copied from the
+ * prior timeline. This is pretty harmless though, as nothing
+ * cares so long as the timeline doesn't go backwards. We should
+ * read the page header instead; FIXME someday.
+ */
+ tli = state->currTLI;
+
+ /* No need to wait on a historical timeline */
+ break;
+ }
+ }
+
+ if (targetPagePtr + XLOG_BLCKSZ <= read_upto)
+ {
+ /*
+ * more than one block available; read only that block, have caller
+ * come back if they need more.
+ */
+ count = XLOG_BLCKSZ;
+ }
+ else if (targetPagePtr + reqLen > read_upto)
+ {
+ /* not enough data there */
+ return -1;
+ }
+ else
+ {
+ /* enough bytes available to satisfy the request */
+ count = read_upto - targetPagePtr;
+ }
+
+ /*
+ * Even though we just determined how much of the page can be validly read
+ * as 'count', read the whole page anyway. It's guaranteed to be
+ * zero-padded up to the page boundary if it's incomplete.
+ */
+ if (!WALRead(state, cur_page, targetPagePtr, XLOG_BLCKSZ, tli,
+ &errinfo))
+ WALReadRaiseError(&errinfo);
+
+ /* number of valid bytes in the buffer */
+ return count;
+}
+
/*
* Backend-specific convenience code to handle read errors encountered by
* WALRead().
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index a6251e1a96..f8d61b56ab 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -26,6 +26,11 @@
#include "getopt_long.h"
#include "rmgrdesc.h"
+/*
+ * NOTE: For any code change or issue fix here, it is highly recommended to
+ * give a thought about doing the same in pg_walinspect contrib module as well.
+ */
+
static const char *progname;
static int WalSegSz;
diff --git a/src/common/relpath.c b/src/common/relpath.c
index 636c96efd3..e8e3c44eae 100644
--- a/src/common/relpath.c
+++ b/src/common/relpath.c
@@ -40,6 +40,24 @@ const char *const forkNames[] = {
StaticAssertDecl(lengthof(forkNames) == (MAX_FORKNUM + 1),
"array length mismatch");
+/*
+ * get_forkname - return fork name given fork number
+ *
+ * This function is defined with "extern PGDLLIMPORT ..." in the core here so
+ * that the loadable modules can access it.
+ */
+const char *const
+get_forkname(ForkNumber num)
+{
+ /*
+ * As this function gets called by external modules, let's ensure that the
+ * fork number passed in is valid.
+ */
+ Assert(num > InvalidForkNumber && num <= MAX_FORKNUM);
+
+ return forkNames[num];
+}
+
/*
* forkname_to_number - look up fork number by name
*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 4b45ac64db..f34f228563 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -31,7 +31,7 @@ extern XLogRecPtr XactLastRecEnd;
extern PGDLLIMPORT XLogRecPtr XactLastCommitEnd;
/* these variables are GUC parameters related to XLOG */
-extern int wal_segment_size;
+extern PGDLLIMPORT int wal_segment_size;
extern int min_wal_size_mb;
extern int max_wal_size_mb;
extern int wal_keep_size_mb;
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 849954a8e5..38fd51bafa 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -319,7 +319,7 @@ typedef struct RmgrData
struct XLogRecordBuffer *buf);
} RmgrData;
-extern const RmgrData RmgrTable[];
+extern PGDLLIMPORT const RmgrData RmgrTable[];
/*
* Exported to support xlog switching from checkpointer
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index 477f0efe26..d62d6ce7f8 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -276,9 +276,7 @@ extern void XLogReaderFree(XLogReaderState *state);
/* Position the XLogReader to given record */
extern void XLogBeginRead(XLogReaderState *state, XLogRecPtr RecPtr);
-#ifdef FRONTEND
extern XLogRecPtr XLogFindNextRecord(XLogReaderState *state, XLogRecPtr RecPtr);
-#endif /* FRONTEND */
/* Read the next XLog record. Returns NULL on end-of-WAL or failure */
extern struct XLogRecord *XLogReadRecord(XLogReaderState *state,
diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h
index 64708949db..dd28640f07 100644
--- a/src/include/access/xlogutils.h
+++ b/src/include/access/xlogutils.h
@@ -75,6 +75,16 @@ typedef enum
* need to be replayed) */
} XLogRedoAction;
+/*
+ * private_data of read_local_xlog_page, that is, XLogReaderRoutine->page_read
+ * callback for reading local xlog files.
+ */
+typedef struct ReadLocalXLOGPage2Private
+{
+ bool no_wait;
+ bool reached_end_of_wal;
+} ReadLocalXLOGPage2Private;
+
extern XLogRedoAction XLogReadBufferForRedo(XLogReaderState *record,
uint8 buffer_id, Buffer *buf);
extern Buffer XLogInitBufferForRedo(XLogReaderState *record, uint8 block_id);
@@ -92,6 +102,9 @@ extern void FreeFakeRelcacheEntry(Relation fakerel);
extern int read_local_xlog_page(XLogReaderState *state,
XLogRecPtr targetPagePtr, int reqLen,
XLogRecPtr targetRecPtr, char *cur_page);
+extern int read_local_xlog_page_2(XLogReaderState *state,
+ XLogRecPtr targetPagePtr, int reqLen,
+ XLogRecPtr targetRecPtr, char *cur_page);
extern void wal_segment_open(XLogReaderState *state,
XLogSegNo nextSegNo,
TimeLineID *tli_p);
diff --git a/src/include/common/relpath.h b/src/include/common/relpath.h
index a4b5dc853b..3743f2e505 100644
--- a/src/include/common/relpath.h
+++ b/src/include/common/relpath.h
@@ -57,6 +57,7 @@ typedef enum ForkNumber
#define FORKNAMECHARS 4 /* max chars for a fork name */
extern const char *const forkNames[];
+extern PGDLLIMPORT const char *const get_forkname(ForkNumber num);
extern ForkNumber forkname_to_number(const char *forkName);
extern int forkname_chars(const char *str, ForkNumber *fork);
--
2.25.1
From 1d2c2fd51cf0ad8704782aab2f215dd8e37fcbc2 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Fri, 25 Feb 2022 06:49:42 +0000
Subject: [PATCH v5] pg_walinspect tests
---
.../pg_walinspect/expected/pg_walinspect.out | 122 ++++++++++++++++++
contrib/pg_walinspect/sql/pg_walinspect.sql | 86 ++++++++++++
2 files changed, 208 insertions(+)
create mode 100644 contrib/pg_walinspect/expected/pg_walinspect.out
create mode 100644 contrib/pg_walinspect/sql/pg_walinspect.sql
diff --git a/contrib/pg_walinspect/expected/pg_walinspect.out b/contrib/pg_walinspect/expected/pg_walinspect.out
new file mode 100644
index 0000000000..ba09c1caab
--- /dev/null
+++ b/contrib/pg_walinspect/expected/pg_walinspect.out
@@ -0,0 +1,122 @@
+CREATE EXTENSION pg_walinspect;
+SELECT pg_current_wal_lsn() AS wal_lsn1 \gset
+CREATE TABLE sample_tbl(col1 int, col2 int);
+INSERT INTO sample_tbl VALUES(1, 2);
+SELECT pg_current_wal_lsn() AS wal_lsn2 \gset
+INSERT INTO sample_tbl VALUES(2, 3);
+-- ===================================================================
+-- tests for input validation
+-- ===================================================================
+SELECT pg_get_raw_wal_record('0/0'); -- ERROR
+ERROR: invalid WAL record LSN
+SELECT pg_get_first_valid_wal_record_lsn('0/0'); -- ERROR
+ERROR: invalid WAL record LSN
+SELECT pg_get_wal_record_info('0/0'); -- ERROR
+ERROR: invalid WAL record LSN
+SELECT pg_get_wal_records_info('0/0', '0/0'); -- ERROR
+ERROR: invalid WAL record start LSN
+SELECT pg_get_wal_records_info(:'wal_lsn1', '0/0'); -- ERROR
+ERROR: invalid WAL record end LSN
+SELECT pg_get_wal_records_info(:'wal_lsn2', :'wal_lsn1'); -- start LSN >= end LSN, ERROR
+ERROR: WAL record start LSN must be less than end LSN
+SELECT pg_get_wal_stats('0/0', '0/0'); -- ERROR
+ERROR: invalid WAL record start LSN
+SELECT pg_get_wal_stats(:'wal_lsn1', '0/0'); -- ERROR
+ERROR: invalid WAL record end LSN
+SELECT pg_get_wal_stats(:'wal_lsn2', :'wal_lsn1'); -- start LSN >= end LSN, ERROR
+ERROR: WAL record start LSN must be less than end LSN
+-- ===================================================================
+-- tests for all function executions
+-- ===================================================================
+SELECT COUNT(*) >= 0 AS ok FROM pg_get_first_valid_wal_record_lsn(:'wal_lsn1');
+ ok
+----
+ t
+(1 row)
+
+SELECT lsn AS valid_wal_lsn1 FROM pg_get_first_valid_wal_record_lsn(:'wal_lsn1') \gset
+SELECT lsn AS valid_wal_lsn2 FROM pg_get_first_valid_wal_record_lsn(:'wal_lsn2') \gset
+SELECT COUNT(*) >= 0 AS ok FROM pg_get_raw_wal_record(:'valid_wal_lsn1');
+ ok
+----
+ t
+(1 row)
+
+SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_record_info(:'valid_wal_lsn1');
+ ok
+----
+ t
+(1 row)
+
+SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info(:'valid_wal_lsn1', :'valid_wal_lsn2');
+ ok
+----
+ t
+(1 row)
+
+SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info_till_end_of_wal(:'valid_wal_lsn2');
+WARNING: reached end of WAL
+ ok
+----
+ t
+(1 row)
+
+SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats(:'valid_wal_lsn1', :'valid_wal_lsn2');
+ ok
+----
+ t
+(1 row)
+
+SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats_till_end_of_wal(:'valid_wal_lsn2');
+WARNING: reached end of WAL
+ ok
+----
+ t
+(1 row)
+
+SELECT file_name AS wal_file_name FROM pg_walfile_name_offset(:'valid_wal_lsn1') \gset
+SELECT COUNT(*) = 1 AS ok FROM pg_switch_wal();
+ ok
+----
+ t
+(1 row)
+
+INSERT INTO sample_tbl VALUES(3, 4);
+SELECT COUNT(*) >= 0 AS ok FROM pg_get_first_and_last_valid_wal_record_lsn(:'wal_file_name');
+ ok
+----
+ t
+(1 row)
+
+-- ===================================================================
+-- tests for functions with future WAL
+-- ===================================================================
+SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info(:'valid_wal_lsn1', 'FFFFFFFF/FFFFFFFF'); -- WARNING
+WARNING: reached end of WAL
+ ok
+----
+ t
+(1 row)
+
+SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats(:'valid_wal_lsn1', 'FFFFFFFF/FFFFFFFF'); -- WARNING
+WARNING: reached end of WAL
+ ok
+----
+ t
+(1 row)
+
+-- ===================================================================
+-- tests for filtering out WAL records of a particular table
+-- ===================================================================
+SELECT oid AS sample_tbl_oid FROM pg_class WHERE relname = 'sample_tbl' \gset
+SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info(:'valid_wal_lsn1', :'valid_wal_lsn2')
+ WHERE block_ref LIKE concat('%', :'sample_tbl_oid', '%') AND resource_manager = 'Heap';
+ ok
+----
+ t
+(1 row)
+
+-- ===================================================================
+-- clean up
+-- ===================================================================
+DROP TABLE sample_tbl;
diff --git a/contrib/pg_walinspect/sql/pg_walinspect.sql b/contrib/pg_walinspect/sql/pg_walinspect.sql
new file mode 100644
index 0000000000..5c4253787d
--- /dev/null
+++ b/contrib/pg_walinspect/sql/pg_walinspect.sql
@@ -0,0 +1,86 @@
+CREATE EXTENSION pg_walinspect;
+
+SELECT pg_current_wal_lsn() AS wal_lsn1 \gset
+
+CREATE TABLE sample_tbl(col1 int, col2 int);
+
+INSERT INTO sample_tbl VALUES(1, 2);
+
+SELECT pg_current_wal_lsn() AS wal_lsn2 \gset
+
+INSERT INTO sample_tbl VALUES(2, 3);
+
+-- ===================================================================
+-- tests for input validation
+-- ===================================================================
+
+SELECT pg_get_raw_wal_record('0/0'); -- ERROR
+
+SELECT pg_get_first_valid_wal_record_lsn('0/0'); -- ERROR
+
+SELECT pg_get_wal_record_info('0/0'); -- ERROR
+
+SELECT pg_get_wal_records_info('0/0', '0/0'); -- ERROR
+
+SELECT pg_get_wal_records_info(:'wal_lsn1', '0/0'); -- ERROR
+
+SELECT pg_get_wal_records_info(:'wal_lsn2', :'wal_lsn1'); -- start LSN >= end LSN, ERROR
+
+SELECT pg_get_wal_stats('0/0', '0/0'); -- ERROR
+
+SELECT pg_get_wal_stats(:'wal_lsn1', '0/0'); -- ERROR
+
+SELECT pg_get_wal_stats(:'wal_lsn2', :'wal_lsn1'); -- start LSN >= end LSN, ERROR
+
+-- ===================================================================
+-- tests for all function executions
+-- ===================================================================
+
+SELECT COUNT(*) >= 0 AS ok FROM pg_get_first_valid_wal_record_lsn(:'wal_lsn1');
+
+SELECT lsn AS valid_wal_lsn1 FROM pg_get_first_valid_wal_record_lsn(:'wal_lsn1') \gset
+
+SELECT lsn AS valid_wal_lsn2 FROM pg_get_first_valid_wal_record_lsn(:'wal_lsn2') \gset
+
+SELECT COUNT(*) >= 0 AS ok FROM pg_get_raw_wal_record(:'valid_wal_lsn1');
+
+SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_record_info(:'valid_wal_lsn1');
+
+SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info(:'valid_wal_lsn1', :'valid_wal_lsn2');
+
+SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info_till_end_of_wal(:'valid_wal_lsn2');
+
+SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats(:'valid_wal_lsn1', :'valid_wal_lsn2');
+
+SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats_till_end_of_wal(:'valid_wal_lsn2');
+
+SELECT file_name AS wal_file_name FROM pg_walfile_name_offset(:'valid_wal_lsn1') \gset
+
+SELECT COUNT(*) = 1 AS ok FROM pg_switch_wal();
+
+INSERT INTO sample_tbl VALUES(3, 4);
+
+SELECT COUNT(*) >= 0 AS ok FROM pg_get_first_and_last_valid_wal_record_lsn(:'wal_file_name');
+
+-- ===================================================================
+-- tests for functions with future WAL
+-- ===================================================================
+
+SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info(:'valid_wal_lsn1', 'FFFFFFFF/FFFFFFFF'); -- WARNING
+
+SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats(:'valid_wal_lsn1', 'FFFFFFFF/FFFFFFFF'); -- WARNING
+
+-- ===================================================================
+-- tests for filtering out WAL records of a particular table
+-- ===================================================================
+
+SELECT oid AS sample_tbl_oid FROM pg_class WHERE relname = 'sample_tbl' \gset
+
+SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info(:'valid_wal_lsn1', :'valid_wal_lsn2')
+ WHERE block_ref LIKE concat('%', :'sample_tbl_oid', '%') AND resource_manager = 'Heap';
+
+-- ===================================================================
+-- clean up
+-- ===================================================================
+
+DROP TABLE sample_tbl;
--
2.25.1
From d1f3c3a295b842fbf6a3a9a040cfbd33e52d7333 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Fri, 25 Feb 2022 07:30:49 +0000
Subject: [PATCH v5] pg_walinspect docs
---
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pgwalinspect.sgml | 165 +++++++++++++++++++++++++++++++++
3 files changed, 167 insertions(+)
create mode 100644 doc/src/sgml/pgwalinspect.sgml
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index be9711c6f2..19614a42e1 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -130,6 +130,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
&pgsurgery;
&pgtrgm;
&pgvisibility;
+ &pgwalinspect;
&postgres-fdw;
&seg;
&sepgsql;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 328cd1f378..a2e8fd4a08 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -146,6 +146,7 @@
<!ENTITY pgsurgery SYSTEM "pgsurgery.sgml">
<!ENTITY pgtrgm SYSTEM "pgtrgm.sgml">
<!ENTITY pgvisibility SYSTEM "pgvisibility.sgml">
+<!ENTITY pgwalinspect SYSTEM "pgwalinspect.sgml">
<!ENTITY postgres-fdw SYSTEM "postgres-fdw.sgml">
<!ENTITY seg SYSTEM "seg.sgml">
<!ENTITY contrib-spi SYSTEM "contrib-spi.sgml">
diff --git a/doc/src/sgml/pgwalinspect.sgml b/doc/src/sgml/pgwalinspect.sgml
new file mode 100644
index 0000000000..fa9f1d9b6c
--- /dev/null
+++ b/doc/src/sgml/pgwalinspect.sgml
@@ -0,0 +1,165 @@
+<!-- doc/src/sgml/pgwalinspect.sgml -->
+
+<sect1 id="pgwalinspect" xreflabel="pg_walinspect">
+ <title>pg_walinspect</title>
+
+ <indexterm zone="pgwalinspect">
+ <primary>pg_walinspect</primary>
+ </indexterm>
+
+ <para>
+ The <filename>pg_walinspect</filename> module provides functions that allow
+ you to inspect the contents of write-ahead log of <productname>PostgreSQL</productname>
+ database cluster at a low level, which is useful for debugging and analytical
+ purposes.
+ </para>
+
+ <para>
+ By default, use of these functions is restricted to superusers and members of
+ the <literal>pg_monitor</literal> role. Access may be granted to others using
+ <command>GRANT</command>.
+ </para>
+
+ <sect2>
+ <title>General Functions</title>
+
+ <variablelist>
+ <varlistentry>
+ <term>
+ <function>pg_get_raw_wal_record(in_lsn pg_lsn, lsn OUT pg_lsn, record OUT bytea)</function>
+ </term>
+
+ <listitem>
+ <para>
+ Gets raw WAL record data of a given LSN. Issues a warning if the given
+ LSN wasn't a pointer to the start of a record and also wasn't a pointer
+ to the beginning of a WAL segment file. This function will not wait if
+ the future WAL LSN is provided, instead emits a warning and returns an
+ empty row.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>pg_get_first_valid_wal_record_lsn(in_lsn pg_lsn, lsn OUT pg_lsn, prev_lsn OUT pg_lsn)</function>
+ </term>
+
+ <listitem>
+ <para>
+ Gets first and previous valid WAL record LSNs of the given LSN. Issues
+ a warning if the given LSN wasn't a pointer to the start of a record and
+ also wasn't a pointer to the beginning of a WAL segment file. This
+ function will not wait if the future WAL LSN is provided, instead emits
+ a warning and returns an empty row.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>pg_get_first_and_last_valid_wal_record_lsn(wal_file_name text, first_valid_lsn OUT pg_lsn, last_valid_lsn OUT pg_lsn)</function>
+ </term>
+
+ <listitem>
+ <para>
+ Gets first and last valid WAL record LSNs of the given WAL file.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>pg_get_wal_record_info(in_lsn pg_lsn, lsn OUT pg_lsn, prev_lsn OUT pg_lsn, xid OUT xid, resource_manager OUT text, length OUT int4, total_length OUT int4, description OUT text, block_ref OUT text, data OUT bytea, data_len OUT int4)</function>
+ </term>
+
+ <listitem>
+ <para>
+ Gets WAL record information of the given LSN. Issues a warning if the
+ given LSN wasn't a pointer to the start of a record and also wasn't a
+ pointer to the beginning of a WAL segment file. This function will not
+ wait if the future WAL LSN is provided, instead emits a warning and
+ returns an empty row.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>pg_get_wal_records_info(start_lsn pg_lsn, end_lsn pg_lsn, wait_for_wal boolean DEFAULT false, lsn OUT pg_lsn, prev_lsn OUT pg_lsn, xid OUT xid, resource_manager OUT text, length OUT int4, total_length OUT int4, description OUT text, block_ref OUT text, data OUT bytea, data_len OUT int4)</function>
+ </term>
+
+ <listitem>
+ <para>
+ Gets information of all the valid WAL records between
+ <replaceable>start_lsn</replaceable> and <replaceable>end_lsn</replaceable>.
+ Returns one row per each valid WAL record. Issues a warning if the given
+ <replaceable>start_lsn</replaceable> wasn't a pointer to the start of a
+ record and also wasn't a pointer to the beginning of a WAL segment file.
+ This function will wait if the future WAL LSN is provided when
+ <replaceable>wait_for_wal</replaceable> is passed as <literal>true</literal>.
+ If <replaceable>wait_for_wal</replaceable> is <literal>false</literal>
+ (default value), it will not wait for the future WAL, instead emits a
+ warning and returns rows for whatever available WAL records.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>pg_get_wal_records_info_till_end_of_wal(start_lsn pg_lsn, lsn OUT pg_lsn, prev_lsn OUT pg_lsn, xid OUT xid, resource_manager OUT text, length OUT int4, total_length OUT int4, description OUT text, block_ref OUT text, data OUT bytea, data_len OUT int4)</function>
+ </term>
+
+ <listitem>
+ <para>
+ Gets information of all the valid WAL records from
+ <replaceable>start_lsn</replaceable> till end of WAL. This function is
+ similar to <literal>pg_get_wal_records_info</literal> except that it
+ doesn't have <replaceable>end_lsn</replaceable> as input and it doesn't
+ wait for the future WAL.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>pg_get_wal_stats(start_lsn pg_lsn, end_lsn pg_lsn, wait_for_wal boolean DEFAULT false, resource_manager OUT text, count OUT int8, count_percentage OUT float4, record_size OUT int8, record_size_percentage OUT float4, fpi_size OUT int8, fpi_size_percentage OUT float4, combined_size OUT int8, combined_size_percentage OUT float4)</function>
+ </term>
+
+ <listitem>
+ <para>
+ Gets statistics of all the valid WAL records between
+ <replaceable>start_lsn</replaceable> and <replaceable>end_lsn</replaceable>.
+ Returns one row per each <replaceable>resource_manager</replaceable>
+ type. Issues a warning if the given <replaceable>start_lsn</replaceable>
+ wasn't a pointer to the start of a record and also wasn't a pointer to
+ the beginning of a WAL segment file. This function will wait if the
+ future WAL LSN is provided when <replaceable>wait_for_wal</replaceable>
+ is passed as <literal>true</literal>. If <replaceable>wait_for_wal</replaceable>
+ is <literal>false</literal> (default value), it will not wait for the
+ future WAL, instead emits a warning and returns rows for whatever
+ available WAL records.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>pg_get_wal_stats_till_end_of_wal(start_lsn pg_lsn, end_lsn pg_lsn, resource_manager OUT text, count OUT int8, count_percentage OUT float4, record_size OUT int8, record_size_percentage OUT float4, fpi_size OUT int8, fpi_size_percentage OUT float4, combined_size OUT int8, combined_size_percentage OUT float4)</function>
+ </term>
+
+ <listitem>
+ <para>
+ Gets statistics of all the valid WAL records from
+ <replaceable>start_lsn</replaceable> till end of WAL. This function is
+ similar to <literal>pg_get_wal_stats</literal> except that it doesn't
+ have <replaceable>end_lsn</replaceable> as input and it doesn't wait for
+ the future WAL.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </sect2>
+
+</sect1>
--
2.25.1