On Tue, Mar 7, 2023 at 11:14 PM Nathan Bossart <nathandboss...@gmail.com> wrote: > > On Tue, Mar 07, 2023 at 12:39:13PM +0530, Bharath Rupireddy wrote: > > On Tue, Mar 7, 2023 at 3:30 AM Nathan Bossart <nathandboss...@gmail.com> > > wrote: > >> Is it possible to memcpy more than a page at a time? > > > > It would complicate things a lot there; the logic to figure out the > > last page bytes that may or may not fit in the whole page gets > > complicated. Also, the logic to verify each page's header gets > > complicated. We might lose out if we memcpy all the pages at once and > > start verifying each page's header in another loop. > > Doesn't the complicated logic you describe already exist to some extent in > the patch? You are copying a page at a time, which involves calculating > various addresses and byte counts.
Okay here I am with the v10 patch set attached that avoids multiple memcpy calls which must benefit the callers who want to read more than 1 WAL buffer page (streaming replication WAL sender for instance). > >> + elog(DEBUG1, "read %zu bytes out of %zu bytes from WAL buffers for > >> given LSN %X/%X, Timeline ID %u", > >> + *read_bytes, count, LSN_FORMAT_ARGS(startptr), tli); > >> > >> I definitely don't think we should put an elog() in this code path. > >> Perhaps this should be guarded behind WAL_DEBUG. > > > > Placing it behind WAL_DEBUG doesn't help users/developers. My > > intention was to let users know that the WAL read hit the buffers, > > it'll help them report if any issue occurs and also help developers to > > debug that issue. > > I still think an elog() is mighty expensive for this code path, even when > it doesn't actually produce any messages. And when it does, I think it has > the potential to be incredibly noisy. Well, my motive was to have a way for the user to know WAL buffer hits and misses to report any found issues. However, I have a plan later to add WAL buffer stats (hits/misses). I understand that even if someone enables DEBUG1, this message can bloat server log files and make recovery slower, especially on a standby. Hence, I agree to keep these logs behind the WAL_DEBUG macro like others and did so in the attached v10 patch set. Please review the attached v10 patch set further. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
From aa6454d9abb9a70b728dbba7f40279108486a3e4 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Date: Tue, 14 Mar 2023 07:30:09 +0000 Subject: [PATCH v10] Improve WALRead() to suck data directly from WAL buffers --- src/backend/access/transam/xlog.c | 171 ++++++++++++++++++++++++ src/backend/access/transam/xlogreader.c | 42 +++++- src/include/access/xlog.h | 6 + 3 files changed, 217 insertions(+), 2 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 543d4d897a..d40b9562e1 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -1639,6 +1639,177 @@ GetXLogBuffer(XLogRecPtr ptr, TimeLineID tli) return cachedPos + ptr % XLOG_BLCKSZ; } +/* + * Read WAL from WAL buffers. + * + * Read 'count' bytes of WAL from WAL buffers into 'buf', starting at location + * 'startptr', on timeline 'tli' and return total read bytes. + * + * Note that this function reads as much as it can from WAL buffers, meaning, + * it may not read all the requested 'count' bytes. Caller must be aware of + * this and deal with it. + */ +Size +XLogReadFromBuffers(XLogReaderState *state PG_USED_FOR_ASSERTS_ONLY, + XLogRecPtr startptr, + TimeLineID tli, + Size count, + char *buf) +{ + XLogRecPtr ptr = startptr; + Size nbytes = count; /* total bytes requested to be read by caller */ + Size ntotal = 0; /* total bytes read */ + Size nbatch = 0; /* bytes to be read in single batch */ + char *batchstart = NULL; /* location to read from for single batch */ + + Assert(!XLogRecPtrIsInvalid(startptr)); + Assert(count > 0); + Assert(startptr <= GetFlushRecPtr(NULL)); + Assert(!RecoveryInProgress()); + Assert(tli == GetWALInsertionTimeLine()); + + /* + * Holding WALBufMappingLock ensures inserters don't overwrite this value + * while we are reading it. We try to acquire it in shared mode so that the + * concurrent WAL readers are also allowed. We try to do as less work as + * possible while holding the lock to not impact concurrent WAL writers + * much. We quickly exit to not cause any contention, if the lock isn't + * immediately available. + */ + if (!LWLockConditionalAcquire(WALBufMappingLock, LW_SHARED)) + return ntotal; + + while (nbytes > 0) + { + XLogRecPtr expectedEndPtr; + XLogRecPtr endptr; + int idx; + char *page; + char *data; + XLogPageHeader phdr; + + idx = XLogRecPtrToBufIdx(ptr); + expectedEndPtr = ptr; + expectedEndPtr += XLOG_BLCKSZ - ptr % XLOG_BLCKSZ; + endptr = XLogCtl->xlblocks[idx]; + + /* Requested WAL isn't available in WAL buffers. */ + if (expectedEndPtr != endptr) + break; + + /* + * We found WAL buffer page containing given XLogRecPtr. Get starting + * address of the page and a pointer to the right location of given + * XLogRecPtr in that page. + */ + page = XLogCtl->pages + idx * (Size) XLOG_BLCKSZ; + data = page + ptr % XLOG_BLCKSZ; + + /* + * The fact that we acquire WALBufMappingLock while reading the WAL + * buffer page itself guarantees that no one else initializes it or + * makes it ready for next use in AdvanceXLInsertBuffer(). However, we + * need to ensure that we are not reading a page that just got + * initialized. For this, we looka at the needed page header. + */ + phdr = (XLogPageHeader) page; + + /* Return, if WAL buffer page doesn't look valid. */ + if (!(phdr->xlp_magic == XLOG_PAGE_MAGIC && + phdr->xlp_pageaddr == (ptr - (ptr % XLOG_BLCKSZ)) && + phdr->xlp_tli == tli)) + break; + + /* + * Note that we don't perform all page header checks here to avoid + * extra work in production builds, callers will anyway do those checks + * extensively. However, in an assert-enabled build, we perform all the + * checks here and raise an error if failed. + */ +#ifdef USE_ASSERT_CHECKING + if (state != NULL && + !XLogReaderValidatePageHeader(state, (endptr - XLOG_BLCKSZ), + (char *) phdr)) + ereport(ERROR, + (errcode(ERRCODE_INTERNAL_ERROR), + errmsg_internal("error while reading WAL from WAL buffers: %s", state->errormsg_buf))); +#endif + + /* Count what is wanted, not the whole page. */ + if ((data + nbytes) <= (page + XLOG_BLCKSZ)) + { + /* All the bytes are in one page. */ + nbatch += nbytes; + ntotal += nbytes; + nbytes = 0; + } + else + { + Size navailable; + + /* + * All the bytes are not in one page. Deduce available bytes on the + * current page, count them and continue to look for remaining + * bytes. + */ + navailable = XLOG_BLCKSZ - (data - page); + Assert(navailable > 0 && navailable <= nbytes); + ptr += navailable; + nbytes -= navailable; + nbatch += navailable; + ntotal += navailable; + } + + /* + * We avoid multiple memcpy calls while reading WAL. Note that we + * memcpy what we have counted so far whenever we are wrapping around + * WAL buffers (because WAL buffers are organized as cirucular array of + * pages) and continue to look for remaining WAL. + */ + if (batchstart == NULL) + { + /* Mark where the data in WAL buffers starts from. */ + batchstart = data; + } + + /* + * We are wrapping around WAL buffers, so read what we have counted so + * far. + */ + if (idx == XLogCtl->XLogCacheBlck) + { + Assert(batchstart != NULL); + Assert(nbatch > 0); + + memcpy(buf, batchstart, nbatch); + buf += nbatch; + + /* Reset for next batch. */ + batchstart = NULL; + nbatch = 0; + } + } + + /* Read what we have counted so far. */ + Assert(nbatch <= ntotal); + if (batchstart != NULL && nbatch > 0) + memcpy(buf, batchstart, nbatch); + + LWLockRelease(WALBufMappingLock); + + /* We never read more than what the caller has asked for. */ + Assert(ntotal <= count); + +#ifdef WAL_DEBUG + if (XLOG_DEBUG) + ereport(DEBUG1, + (errmsg_internal("read %zu bytes out of %zu bytes from WAL buffers for given LSN %X/%X, Timeline ID %u", + ntotal, count, LSN_FORMAT_ARGS(startptr), tli))); +#endif + + return ntotal; +} + /* * Converts a "usable byte position" to XLogRecPtr. A usable byte position * is the position starting from the beginning of WAL, excluding all WAL diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c index cadea21b37..03f0cca1e6 100644 --- a/src/backend/access/transam/xlogreader.c +++ b/src/backend/access/transam/xlogreader.c @@ -1486,8 +1486,7 @@ err: * Returns true if succeeded, false if an error occurs, in which case * 'errinfo' receives error details. * - * XXX probably this should be improved to suck data directly from the - * WAL buffers when possible. + * When possible, this function reads data directly from WAL buffers. */ bool WALRead(XLogReaderState *state, @@ -1498,6 +1497,45 @@ WALRead(XLogReaderState *state, XLogRecPtr recptr; Size nbytes; +#ifndef FRONTEND + /* Frontend tools have no idea of WAL buffers. */ + Size nread; + + /* + * Try reading WAL from WAL buffers. We skip this step and continue the + * usual way, that is to read from WAL file, either when server is in + * recovery (standby mode, archive or crash recovery), in which case the + * WAL buffers are not used or when the server is inserting in a different + * timeline from that of the timeline that we're trying to read WAL from. + */ + if (!RecoveryInProgress() && + tli == GetWALInsertionTimeLine()) + { + nread = XLogReadFromBuffers(state, startptr, tli, count, buf); + + Assert(nread >= 0); + + /* + * Check if we have read fully (hit), partially (partial hit) or + * nothing (miss) from WAL buffers. If we have read either partially or + * nothing, then continue to read the remaining bytes the usual way, + * that is, read from WAL file. + */ + if (count == nread) + return true; /* Buffer hit, so return. */ + else if (count > nread) + { + /* + * Buffer partial hit, so reset the state to count the read bytes + * and continue. + */ + buf += nread; + startptr += nread; + count -= nread; + } + } +#endif /* FRONTEND */ + p = buf; recptr = startptr; nbytes = count; diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h index cfe5409738..4fdd8c8b17 100644 --- a/src/include/access/xlog.h +++ b/src/include/access/xlog.h @@ -247,6 +247,12 @@ extern XLogRecPtr GetLastImportantRecPtr(void); extern void SetWalWriterSleeping(bool sleeping); +extern Size XLogReadFromBuffers(struct XLogReaderState *state, + XLogRecPtr startptr, + TimeLineID tli, + Size count, + char *buf); + /* * Routines used by xlogrecovery.c to call back into xlog.c during recovery. */ -- 2.34.1
From 9a7abe263123266cfc7fccd259570164ecdab047 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Date: Tue, 14 Mar 2023 07:14:01 +0000 Subject: [PATCH v10] Add test module for verifying WAL read from WAL buffers --- src/test/modules/Makefile | 1 + src/test/modules/meson.build | 1 + .../test_wal_read_from_buffers/.gitignore | 4 ++ .../test_wal_read_from_buffers/Makefile | 23 +++++++++ .../test_wal_read_from_buffers/meson.build | 36 +++++++++++++ .../test_wal_read_from_buffers/t/001_basic.pl | 44 ++++++++++++++++ .../test_wal_read_from_buffers--1.0.sql | 16 ++++++ .../test_wal_read_from_buffers.c | 51 +++++++++++++++++++ .../test_wal_read_from_buffers.control | 4 ++ 9 files changed, 180 insertions(+) create mode 100644 src/test/modules/test_wal_read_from_buffers/.gitignore create mode 100644 src/test/modules/test_wal_read_from_buffers/Makefile create mode 100644 src/test/modules/test_wal_read_from_buffers/meson.build create mode 100644 src/test/modules/test_wal_read_from_buffers/t/001_basic.pl create mode 100644 src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers--1.0.sql create mode 100644 src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers.c create mode 100644 src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers.control diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile index c629cbe383..ea33361f69 100644 --- a/src/test/modules/Makefile +++ b/src/test/modules/Makefile @@ -33,6 +33,7 @@ SUBDIRS = \ test_rls_hooks \ test_shm_mq \ test_slru \ + test_wal_read_from_buffers \ unsafe_tests \ worker_spi diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build index 1baa6b558d..e3ffd3538d 100644 --- a/src/test/modules/meson.build +++ b/src/test/modules/meson.build @@ -29,5 +29,6 @@ subdir('test_regex') subdir('test_rls_hooks') subdir('test_shm_mq') subdir('test_slru') +subdir('test_wal_read_from_buffers') subdir('unsafe_tests') subdir('worker_spi') diff --git a/src/test/modules/test_wal_read_from_buffers/.gitignore b/src/test/modules/test_wal_read_from_buffers/.gitignore new file mode 100644 index 0000000000..5dcb3ff972 --- /dev/null +++ b/src/test/modules/test_wal_read_from_buffers/.gitignore @@ -0,0 +1,4 @@ +# Generated subdirectories +/log/ +/results/ +/tmp_check/ diff --git a/src/test/modules/test_wal_read_from_buffers/Makefile b/src/test/modules/test_wal_read_from_buffers/Makefile new file mode 100644 index 0000000000..7a09533ec7 --- /dev/null +++ b/src/test/modules/test_wal_read_from_buffers/Makefile @@ -0,0 +1,23 @@ +# src/test/modules/test_wal_read_from_buffers/Makefile + +MODULE_big = test_wal_read_from_buffers +OBJS = \ + $(WIN32RES) \ + test_wal_read_from_buffers.o +PGFILEDESC = "test_wal_read_from_buffers - test module to verify that WAL can be read from WAL buffers" + +EXTENSION = test_wal_read_from_buffers +DATA = test_wal_read_from_buffers--1.0.sql + +TAP_TESTS = 1 + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = src/test/modules/test_wal_read_from_buffers +top_builddir = ../../../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/src/test/modules/test_wal_read_from_buffers/meson.build b/src/test/modules/test_wal_read_from_buffers/meson.build new file mode 100644 index 0000000000..40a36edc07 --- /dev/null +++ b/src/test/modules/test_wal_read_from_buffers/meson.build @@ -0,0 +1,36 @@ +# Copyright (c) 2022-2023, PostgreSQL Global Development Group + +# FIXME: prevent install during main install, but not during test :/ + +test_wal_read_from_buffers_sources = files( + 'test_wal_read_from_buffers.c', +) + +if host_system == 'windows' + test_wal_read_from_buffers_sources += rc_lib_gen.process(win32ver_rc, extra_args: [ + '--NAME', 'test_wal_read_from_buffers', + '--FILEDESC', 'test_wal_read_from_buffers - test module to verify that WAL can be read from WAL buffers',]) +endif + +test_wal_read_from_buffers = shared_module('test_wal_read_from_buffers', + test_wal_read_from_buffers_sources, + kwargs: pg_mod_args, +) +testprep_targets += test_wal_read_from_buffers + +install_data( + 'test_wal_read_from_buffers.control', + 'test_wal_read_from_buffers--1.0.sql', + kwargs: contrib_data_args, +) + +tests += { + 'name': 'test_wal_read_from_buffers', + 'sd': meson.current_source_dir(), + 'bd': meson.current_build_dir(), + 'tap': { + 'tests': [ + 't/001_basic.pl', + ], + }, +} diff --git a/src/test/modules/test_wal_read_from_buffers/t/001_basic.pl b/src/test/modules/test_wal_read_from_buffers/t/001_basic.pl new file mode 100644 index 0000000000..3448e0bed6 --- /dev/null +++ b/src/test/modules/test_wal_read_from_buffers/t/001_basic.pl @@ -0,0 +1,44 @@ +# Copyright (c) 2021-2023, PostgreSQL Global Development Group + +use strict; +use warnings; + +use PostgreSQL::Test::Cluster; +use PostgreSQL::Test::Utils; +use Test::More; + +my $node = PostgreSQL::Test::Cluster->new('main'); + +$node->init; + +# Ensure nobody interferes with us so that the WAL in WAL buffers don't get +# overwritten while running tests. +$node->append_conf( + 'postgresql.conf', qq( +autovacuum = off +checkpoint_timeout = 1h +wal_writer_delay = 10000ms +wal_writer_flush_after = 1GB +)); +$node->start; + +# Setup. +$node->safe_psql('postgres', 'CREATE EXTENSION test_wal_read_from_buffers'); + +# Get current insert LSN. After this, we generate some WAL which is guranteed +# to be in WAL buffers as there is no other WAL generating activity is +# happening on the server. We then verify if we can read the WAL from WAL +# buffers using this LSN. +my $lsn = + $node->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();'); + +# Generate minimal WAL so that WAL buffers don't get overwritten. +$node->safe_psql('postgres', + "CREATE TABLE t (c int); INSERT INTO t VALUES (1);"); + +# Check if WAL is successfully read from WAL buffers. +my $result = $node->safe_psql('postgres', + qq{SELECT test_wal_read_from_buffers('$lsn')}); +is($result, 't', "WAL is successfully read from WAL buffers"); + +done_testing(); diff --git a/src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers--1.0.sql b/src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers--1.0.sql new file mode 100644 index 0000000000..8e89910133 --- /dev/null +++ b/src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers--1.0.sql @@ -0,0 +1,16 @@ +/* src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers--1.0.sql */ + +-- complain if script is sourced in psql, rather than via CREATE EXTENSION +\echo Use "CREATE EXTENSION test_wal_read_from_buffers" to load this file. \quit + +-- +-- test_wal_read_from_buffers() +-- +-- Returns true if WAL data at a given LSN can be read from WAL buffers. +-- Otherwise returns false. +-- +CREATE FUNCTION test_wal_read_from_buffers(IN lsn pg_lsn, + OUT read_from_buffers bool +) +AS 'MODULE_PATHNAME', 'test_wal_read_from_buffers' +LANGUAGE C STRICT PARALLEL UNSAFE; diff --git a/src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers.c b/src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers.c new file mode 100644 index 0000000000..d0942658da --- /dev/null +++ b/src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers.c @@ -0,0 +1,51 @@ +/*-------------------------------------------------------------------------- + * + * test_wal_read_from_buffers.c + * Test code for veryfing WAL read from WAL buffers. + * + * Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * IDENTIFICATION + * src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers.c + * ------------------------------------------------------------------------- + */ + +#include "postgres.h" + +#include "access/xlog.h" +#include "fmgr.h" +#include "utils/pg_lsn.h" + +PG_MODULE_MAGIC; + +/* + * SQL function for verifying that WAL data at a given LSN can be read from WAL + * buffers. Returns true if read from WAL buffers, otherwise false. + */ +PG_FUNCTION_INFO_V1(test_wal_read_from_buffers); +Datum +test_wal_read_from_buffers(PG_FUNCTION_ARGS) +{ + XLogRecPtr lsn; + Size nread; + TimeLineID tli; + char data[XLOG_BLCKSZ] = {0}; + + lsn = PG_GETARG_LSN(0); + + if (XLogRecPtrIsInvalid(lsn)) + PG_RETURN_BOOL(false); + + if (RecoveryInProgress()) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("recovery is in progress"), + errhint("WAL control functions cannot be executed during recovery."))); + + tli = GetWALInsertionTimeLine(); + + nread = XLogReadFromBuffers(NULL, lsn, tli, XLOG_BLCKSZ, data); + + PG_RETURN_LSN(nread > 0); +} diff --git a/src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers.control b/src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers.control new file mode 100644 index 0000000000..7852b3e331 --- /dev/null +++ b/src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers.control @@ -0,0 +1,4 @@ +comment = 'Test code for veryfing WAL read from WAL buffers' +default_version = '1.0' +module_pathname = '$libdir/test_wal_read_from_buffers' +relocatable = true -- 2.34.1