On Mon, Oct 19, 2020 at 04:52:48PM +0900, Michael Paquier wrote: > No issues with reporting the block number and the fork type in the SRF > at least of course as this is helpful to detect the position of the > broken blocks. For the checksum found in the header and the one > calculated (named expected in the patch), I am less sure how to put a > clear definition on top of that but we could always consider that > later and extend the SRF as needed. Once the user knows that both do > not match, he/she knows that there is a problem.
So, I have reviewed your patch set, and heavily reworked the logic to be more consistent on many thinks, resulting in a largely simplified patch without sacrificing its usefulness: - The logic of the core routine of bufmgr.c is unchanged. I have simplified it a bit though by merging the subroutines that were part of the patch. SMgrRelation is used as argument instead of a Relation. That's more consistent with the surroundings. The initial read of a page without locks is still on the table as an extra optimization, though I am not completely sure if this should be part of CheckBuffer() or not. I also thought about the previous benchmarks and I think that not using the most-optimized improved performance, because it reduced the successive runes of the SQL functions, reducing the back-pressure on the partition locks (we held on of them at the same time for a longer period, meaning that the other 127 ran free for a longer time). Please note that this part still referred to a "module", which was incorrect. - Removal of the expected and found checksums from the SRF of the function. Based on my recent business with the page checks for base backups, I have arrived at the conclusion that the SRF should return data that we can absolutely trust, and the minimum I think we have to trust here is if a given page is thought as safe or not, considering all the sanity checks done by PageIsVerified() as the main entry point for everything. This has led to a bit of confusion with the addition of NoComputedChecksum for a page that was empty as of the initial of the patch, so it happens that we don't need it anymore. - Removal of the dependency with checksums for this feature. While simplifying the code, I have noticed that this feature can also be beneficial for clusters that do not have have data checksums, as PageIsVerified() is perfectly able to run some page header checks and the zero-page case. That's of course less useful than having the checksums, but there is no need to add a dependency here. The file for the SQL functions is renamed from checksumfuncs.c to pagefuncs.c. - The function is changed to return no tuples if the relkind is not supported, and the same applies for temporary relations. That's more consistent with other system functions like the ones in charge of partition information, and this makes full scans of pg_class much easier to work with. Temporary tables were not handled correctly anyway as these are in local buffers, but the use case of this function in this case is not really obvious to me. - Having the forknum in the SRF is kind of confusing, as the user would need to map a number with the physical on-disk name. Instead, I have made the choice to return the *path* of the corrupted file with a block number. This way, an operator can know immediately where a problem comes from. The patch does not append the segment number, and I am not sure if we should actually do that, but adding it is straight-forward as we have the block number. There is a dependency with table AMs here as well, as this goes down to fd.c, explaining why I have not added it and just. - I really don't know what kind of default ACL should apply for such functions, but I am sure that SCAN_TABLES is not what we are looking for here, and there is nothing preventing us from having a safe default from the start of times, so I moved the function to be superuser-only by default, and GRANT can be used to allow its execution to other roles. We could relax that in the future, of course, this can be discussed separately. - The WARNING report for each block found as corrupted gains an error context, as a result of a switch to PageIsVerified(), giving a user all the information needed in the logs on top of the result in the SRF. That's useful as well if combined with CHECK_FOR_INTERRUPTS(), and I got to wonder if we should have some progress report for this stuff, though that's a separate discussion. - The function is renamed to something less generic, pg_relation_check_pages(), and I have reduced the number of functions from two to one, where the user can specify the fork name with a new option. The default of NULL means that all the forks of a relation are checked. - The TAP tests are rather bulky. I have moved all the non-corruption test cases into a new SQL test file. That's useful for people willing to do some basic sanity checks with a non-default table AM. At least it provides a minimum coverage. I have not completely finished its review, but I have done some work. Doing some debugging of corrupt_and_test_block() was proving to be rather difficult as the same test names are assigned multiple times. I am tempted to move this test suite to src/test/recovery/ instead. - Reworked the docs and some comments. That's quite a lot of changes, and I think that most of the C code, the main tests in src/test/regress/ and the docs are getting in a rather committable state. The TAP tests emulating corruptions still need a closer lookup (say, check_pg_stat_database_nb_error() should have an error prefix at least). The portions in bufmgr.c and the rest should of course be split into two separate commits, that can easily be done. And the code needs an indentation run and a catalog bump. -- Michael
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index bbcac69d48..459b2ad4ae 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -10958,6 +10958,14 @@ proallargtypes => '{oid,text,int8,timestamptz}', proargmodes => '{i,o,o,o}', proargnames => '{tablespace,name,size,modification}', prosrc => 'pg_ls_tmpdir_1arg' }, +{ oid => '9147', descr => 'check pages of a relation', + proname => 'pg_relation_check_pages', procost => '10000', + prorows => '20', proisstrict => 'f', proretset => 't', proparallel => 'r', + provolatile => 'v', prorettype => 'record', proargtypes => 'regclass text', + proallargtypes => '{regclass,text,text,int8}', + proargmodes => '{i,i,o,o}', + proargnames => '{relation,fork,path,failed_block_num}', + prosrc => 'pg_relation_check_pages' }, # hash partitioning constraint function { oid => '5028', descr => 'hash partition CHECK constraint', diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h index ee91b8fa26..a21cab2eaf 100644 --- a/src/include/storage/bufmgr.h +++ b/src/include/storage/bufmgr.h @@ -240,6 +240,9 @@ extern void AtProcExit_LocalBuffers(void); extern void TestForOldSnapshot_impl(Snapshot snapshot, Relation relation); +extern bool CheckBuffer(struct SMgrRelationData *smgr, ForkNumber forknum, + BlockNumber blkno); + /* in freelist.c */ extern BufferAccessStrategy GetAccessStrategy(BufferAccessStrategyType btype); extern void FreeAccessStrategy(BufferAccessStrategy strategy); diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 85cd147e21..c6dd084fbc 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -1300,6 +1300,14 @@ LANGUAGE INTERNAL STRICT VOLATILE AS 'pg_create_logical_replication_slot'; +CREATE OR REPLACE FUNCTION pg_relation_check_pages( + IN relation regclass, IN fork text DEFAULT NULL, + OUT path text, OUT failed_block_num bigint) +RETURNS SETOF record +LANGUAGE internal +VOLATILE PARALLEL RESTRICTED +AS 'pg_relation_check_pages'; + CREATE OR REPLACE FUNCTION make_interval(years int4 DEFAULT 0, months int4 DEFAULT 0, weeks int4 DEFAULT 0, days int4 DEFAULT 0, hours int4 DEFAULT 0, mins int4 DEFAULT 0, @@ -1444,6 +1452,7 @@ AS 'unicode_is_normalized'; -- can later change who can access these functions, or leave them as only -- available to superuser / cluster owner, if they choose. -- +REVOKE EXECUTE ON FUNCTION pg_relation_check_pages(regclass, text) FROM public; REVOKE EXECUTE ON FUNCTION pg_start_backup(text, boolean, boolean) FROM public; REVOKE EXECUTE ON FUNCTION pg_stop_backup() FROM public; REVOKE EXECUTE ON FUNCTION pg_stop_backup(boolean, boolean) FROM public; diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index e549fa1d30..427177e9ce 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -4583,3 +4583,100 @@ TestForOldSnapshot_impl(Snapshot snapshot, Relation relation) (errcode(ERRCODE_SNAPSHOT_TOO_OLD), errmsg("snapshot too old"))); } + + +/* + * CheckBuffer + * + * Check the state of a buffer without loading it into the shared buffers. To + * avoid torn pages and possible false positives when reading data, a shared + * LWLock is taken on the target buffer pool partition mapping, and we check + * if the page is in shared_buffers or not. + * + * If the page is found as dirty in the shared buffers, it is ignored as + * it will be flushed to disk either before the end of the next checkpoint + * or during recovery in case of unsafe shutdown + * + * If the page is found in the shared buffers but is not dirty, we still + * check the state of its data on disk, as it could be possible that the + * page stayed in shared buffers for a rather long time but its on-disk data + * got corrupted. An I/O lock is taken on the block and the block is then + * read from storage to prevent any concurrent activity from happening. + * + * If the page is not found in shared_buffers, the block is read from disk + * while holding the buffer pool partition mapping LWLock. + * + * The page data is stored a private memory area local to this function. + * + * The caller of this function should hold an AccessShareLock on the relation + * check whose page is checked. + */ +bool +CheckBuffer(SMgrRelation smgr, ForkNumber forknum, BlockNumber blkno) +{ + char buffer[BLCKSZ]; + BufferTag buf_tag; /* identity of requested block */ + uint32 buf_hash; /* hash value for buf_tag */ + LWLock *partLock; /* buffer partition lock for the buffer */ + BufferDesc *bufdesc; + int buf_id; + + Assert(smgrexists(smgr, forknum)); + + /* create a tag so we can look after the buffer */ + INIT_BUFFERTAG(buf_tag, smgr->smgr_rnode.node, forknum, blkno); + + /* determine its hash code and partition lock ID */ + buf_hash = BufTableHashCode(&buf_tag); + partLock = BufMappingPartitionLock(buf_hash); + + /* see if the block is in the buffer pool or not */ + LWLockAcquire(partLock, LW_SHARED); + buf_id = BufTableLookup(&buf_tag, buf_hash); + if (buf_id >= 0) + { + uint32 buf_state; + + /* + * Found it. Now, retrieve its state to know what to do with it, and + * release the pin immediately. We do so to limit overhead as much + * as possible. We keep the shared lightweight lock on the target + * buffer mapping partition for noe, so this buffer cannot be evicted, + * and we will acquire an I/O Lock on the buffer if we need to read + * the content on disk. + */ + bufdesc = GetBufferDescriptor(buf_id); + + buf_state = LockBufHdr(bufdesc); + UnlockBufHdr(bufdesc, buf_state); + + /* If the page is dirty or invalid, skip it */ + if ((buf_state & BM_DIRTY) || !(buf_state & BM_TAG_VALID)) + { + LWLockRelease(partLock); + return true; + } + + /* + * Read the buffer from disk, taking an I/O lock to prevent torn-page + * reads, in the unlikely event that it was concurrently dirtied and + * flushed. + */ + LWLockAcquire(BufferDescriptorGetIOLock(bufdesc), LW_SHARED); + smgrread(smgr, forknum, blkno, buffer); + LWLockRelease(BufferDescriptorGetIOLock(bufdesc)); + } + else + { + /* + * Simply read the buffer. There's no risk of modification on it as + * we are holding the buffer pool partition mapping lock. + */ + smgrread(smgr, forknum, blkno, buffer); + } + + /* buffer lookup done, so now do its check */ + LWLockRelease(partLock); + + return PageIsVerified(buffer, blkno); +} diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile index b4d55e849b..e2279af1e5 100644 --- a/src/backend/utils/adt/Makefile +++ b/src/backend/utils/adt/Makefile @@ -69,6 +69,7 @@ OBJS = \ oid.o \ oracle_compat.o \ orderedsetaggs.o \ + pagefuncs.o \ partitionfuncs.o \ pg_locale.o \ pg_lsn.o \ diff --git a/src/backend/utils/adt/pagefuncs.c b/src/backend/utils/adt/pagefuncs.c new file mode 100644 index 0000000000..7de415fc59 --- /dev/null +++ b/src/backend/utils/adt/pagefuncs.c @@ -0,0 +1,219 @@ +/*------------------------------------------------------------------------- + * + * pagefuncs.c + * Functions for page related features. + * + * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * + * IDENTIFICATION + * src/backend/utils/adt/pagefuncs.c + * + *------------------------------------------------------------------------- + */ + +#include "postgres.h" + +#include "access/relation.h" +#include "funcapi.h" +#include "miscadmin.h" +#include "storage/bufmgr.h" +#include "storage/lmgr.h" +#include "storage/smgr.h" +#include "utils/builtins.h" + +static void check_one_relation(TupleDesc tupdesc, Tuplestorestate *tupstore, + Oid relid, ForkNumber single_forknum); +static void check_relation_fork(TupleDesc tupdesc, Tuplestorestate *tupstore, + Relation relation, ForkNumber forknum); + +/* + * callback arguments for check_pages_error_callback() + */ +typedef struct CheckPagesErrorInfo +{ + char *path; + BlockNumber blkno; +} CheckPagesErrorInfo; + +/* + * Error callback specific to check_relation_fork(). + */ +static void +check_pages_error_callback(void *arg) +{ + CheckPagesErrorInfo *errinfo = (CheckPagesErrorInfo *) arg; + + errcontext("while checking page %u of path %s", + errinfo->blkno, errinfo->path); +} + +/* + * pg_relation_check_pages + * + * Check the state of all the pages for one or more fork types in the given + * relation. + */ +Datum +pg_relation_check_pages(PG_FUNCTION_ARGS) +{ + Oid relid; + ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo; + TupleDesc tupdesc; + Tuplestorestate *tupstore; + MemoryContext per_query_ctx; + MemoryContext oldcontext; + ForkNumber forknum; + + /* Switch into long-lived context to construct returned data structures */ + per_query_ctx = rsinfo->econtext->ecxt_per_query_memory; + oldcontext = MemoryContextSwitchTo(per_query_ctx); + + /* Build a tuple descriptor for our result type */ + if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE) + elog(ERROR, "return type must be a row type"); + + tupstore = tuplestore_begin_heap(true, false, work_mem); + rsinfo->returnMode = SFRM_Materialize; + rsinfo->setResult = tupstore; + rsinfo->setDesc = tupdesc; + + MemoryContextSwitchTo(oldcontext); + + /* handle arguments */ + if (PG_ARGISNULL(0)) + { + /* Just leave if nothing is defined */ + PG_RETURN_VOID(); + } + + /* By default all the forks of a relation are checked */ + if (PG_ARGISNULL(1)) + forknum = InvalidForkNumber; + else + { + const char *forkname = TextDatumGetCString(PG_GETARG_TEXT_PP(1)); + forknum = forkname_to_number(forkname); + } + + relid = PG_GETARG_OID(0); + + check_one_relation(tupdesc, tupstore, relid, forknum); + tuplestore_donestoring(tupstore); + + return (Datum) 0; +} + +/* + * Perform the check on a single relation, possibly filtered with a single + * fork. This function will check if the given relation exists or not, as + * a relation could be dropped after checking for the list of relations and + * before getting here, and we don't want to error out in this case. + */ +static void +check_one_relation(TupleDesc tupdesc, Tuplestorestate *tupstore, + Oid relid, ForkNumber single_forknum) +{ + Relation relation; + ForkNumber forknum; + + relation = relation_open(relid, AccessShareLock); + + /* sanity checks, returning no results if no support */ + if (!RELKIND_HAS_STORAGE(relation->rd_rel->relkind) || + relation->rd_rel->relpersistence == RELPERSISTENCE_TEMP) + { + relation_close(relation, AccessShareLock); + return; + } + + RelationOpenSmgr(relation); + + for (forknum = 0; forknum <= MAX_FORKNUM; forknum++) + { + if (single_forknum != InvalidForkNumber && single_forknum != forknum) + continue; + + if (smgrexists(relation->rd_smgr, forknum)) + check_relation_fork(tupdesc, tupstore, relation, forknum); + } + + relation_close(relation, AccessShareLock); +} + +/* + * For a given relation and fork, Do the real work of iterating over all pages + * and doing the check. Caller must hold an AccessShareLock lock on the given + * relation. + */ +static void +check_relation_fork(TupleDesc tupdesc, Tuplestorestate *tupstore, + Relation relation, ForkNumber forknum) +{ + BlockNumber blkno, + nblocks; + SMgrRelation smgr = relation->rd_smgr; + char *path; + CheckPagesErrorInfo errinfo; + ErrorContextCallback errcallback; + + /* Number of output arguments in the SRF */ +#define PG_CHECK_RELATION_COLS 2 + + Assert(CheckRelationLockedByMe(relation, AccessShareLock, true)); + + /* + * We remember the number of blocks here. Since caller must hold a lock on + * the relation, we know that it won't be truncated while we're iterating + * over the blocks. Any block added after this function started won't be + * checked, but this is out of scope as such pages will be flushed before + * the next checkpoint's completion. + */ + nblocks = RelationGetNumberOfBlocksInFork(relation, forknum); + + path = relpathbackend(smgr->smgr_rnode.node, + smgr->smgr_rnode.backend, + forknum); + + /* + * Error context to print some information about blocks and relations + * impacted by corruptions. + */ + errinfo.path = pstrdup(path); + errinfo.blkno = 0; + errcallback.callback = check_pages_error_callback; + errcallback.arg = (void *) &errinfo; + errcallback.previous = error_context_stack; + error_context_stack = &errcallback; + + for (blkno = 0; blkno < nblocks; blkno++) + { + Datum values[PG_CHECK_RELATION_COLS]; + bool nulls[PG_CHECK_RELATION_COLS]; + int i = 0; + + /* Update block number for the error context */ + errinfo.blkno = blkno; + + /* Check the given buffer */ + if (CheckBuffer(smgr, forknum, blkno)) + continue; + + memset(values, 0, sizeof(values)); + memset(nulls, 0, sizeof(nulls)); + + values[i++] = CStringGetTextDatum(path); + values[i++] = UInt32GetDatum(blkno); + + Assert(i == PG_CHECK_RELATION_COLS); + + /* Save the corrupted blocks in the tuplestore. */ + tuplestore_putvalues(tupstore, tupdesc, values, nulls); + + pfree(path); + } + + /* Pop the error context stack */ + error_context_stack = errcallback.previous; +} diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile index a6d2ffbf9e..a845af71fd 100644 --- a/src/test/modules/Makefile +++ b/src/test/modules/Makefile @@ -7,6 +7,7 @@ include $(top_builddir)/src/Makefile.global SUBDIRS = \ brin \ commit_ts \ + check_relation \ delay_execution \ dummy_index_am \ dummy_seclabel \ diff --git a/src/test/modules/check_relation/.gitignore b/src/test/modules/check_relation/.gitignore new file mode 100644 index 0000000000..871e943d50 --- /dev/null +++ b/src/test/modules/check_relation/.gitignore @@ -0,0 +1,2 @@ +# Generated by test suite +/tmp_check/ diff --git a/src/test/modules/check_relation/Makefile b/src/test/modules/check_relation/Makefile new file mode 100644 index 0000000000..a540cdece2 --- /dev/null +++ b/src/test/modules/check_relation/Makefile @@ -0,0 +1,14 @@ +# src/test/modules/check_relation/Makefile + +TAP_TESTS = 1 + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = src/test/modules/check_relation +top_builddir = ../../../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/src/test/modules/check_relation/README b/src/test/modules/check_relation/README new file mode 100644 index 0000000000..415c4b21ad --- /dev/null +++ b/src/test/modules/check_relation/README @@ -0,0 +1,23 @@ +src/test/check_relation/README + +Regression tests for online checksums verification +================================================== + +This directory contains a test suite for online checksums verification. + +Running the tests +================= + +NOTE: You must have given the --enable-tap-tests argument to configure. + +Run + make check +or + make installcheck +You can use "make installcheck" if you previously did "make install". +In that case, the code in the installation tree is tested. With +"make check", a temporary installation tree is built from the current +sources and then tested. + +Either way, this test initializes, starts, and stops a test Postgres +cluster. diff --git a/src/test/modules/check_relation/t/001_checksums_check.pl b/src/test/modules/check_relation/t/001_checksums_check.pl new file mode 100644 index 0000000000..73a7f2d849 --- /dev/null +++ b/src/test/modules/check_relation/t/001_checksums_check.pl @@ -0,0 +1,246 @@ +use strict; +use warnings; + +use PostgresNode; +use TestLib; +use Test::More tests => 52; + +our $CHECKSUM_UINT16_OFFSET = 4; +our $PD_UPPER_UINT16_OFFSET = 7; +our $BLOCKSIZE; +our $TOTAL_NB_ERR = 0; + +sub get_block +{ + my ($filename, $blkno) = @_; + my $block; + + open(my $infile, '<', $filename) or die; + binmode($infile); + + my $success = read($infile, $block, $BLOCKSIZE, ($blkno * $BLOCKSIZE)); + die($!) if not defined $success; + + close($infile); + + return($block); +} + +sub overwrite_block +{ + my ($filename, $block, $blkno) = @_; + + open(my $outfile, '>', $filename) or die; + binmode ($outfile); + + my $nb = syswrite($outfile, $block, $BLOCKSIZE, ($blkno * $BLOCKSIZE)); + + die($!) if not defined $nb; + die("Write error") if ($nb != $BLOCKSIZE); + + $outfile->flush(); + + close($outfile); +} + +sub get_uint16_from_page +{ + my ($block, $offset) = @_; + + return (unpack("S*", $block))[$offset]; +} + +sub set_uint16_to_page +{ + my ($block, $data, $offset) = @_; + + my $pack = pack("S", $data); + + # vec with 16B or more won't preserve endianness + vec($block, 2*$offset, 8) = (unpack('C*', $pack))[0]; + vec($block, (2*$offset) + 1, 8) = (unpack('C*', $pack))[1]; + + return $block; +} + +sub check_checksums_call +{ + my ($node, $relname) = @_; + + my ($cmdret, $stdout, $stderr) = $node->psql('postgres', "SELECT COUNT(*)" + . " FROM pg_catalog.pg_relation_check_pages('$relname')" + ); + + return ($stderr eq ''); +} + +sub check_checksums_nb_error +{ + my ($node, $nb, $pattern) = @_; + + my ($cmdret, $stdout, $stderr) = $node->psql('postgres', "SELECT COUNT(*)" + . " FROM (SELECT pg_catalog.pg_relation_check_pages(oid, 'main')" + . " FROM pg_class WHERE relkind in ('r', 'i', 'm')) AS s" + ); + + is($cmdret, 0, 'Function should run successfully'); + like($stderr, $pattern, 'Error output should match expectations'); + is($stdout, $nb, "Should have $nb error"); + + $TOTAL_NB_ERR += $nb; +} + +sub check_pg_stat_database_nb_error +{ + my ($node) = @_; + + my ($cmdret, $stdout, $stderr) = $node->psql('postgres', "SELECT " + . " sum(checksum_failures)" + . " FROM pg_catalog.pg_stat_database" + ); + + is($cmdret, 0, 'Function should run successfully'); + is($stderr, '', 'Function should run successfully'); + is($stdout, $TOTAL_NB_ERR, "Should have $TOTAL_NB_ERR error"); +} + +sub get_checksums_errors +{ + my ($node, $nb, $pattern) = @_; + + my ($cmdret, $stdout, $stderr) = $node->psql('postgres', "SELECT " + . "relname, failed_block_num" + . " FROM (SELECT relname, (pg_catalog.pg_relation_check_pages(oid)).*" + . " FROM pg_class WHERE relkind in ('r','i', 'm')) AS s" + ); + + is($cmdret, '0', 'Function should run successfully'); + like($stderr, $pattern, 'Error output should match expectations'); + + $TOTAL_NB_ERR += $nb; + + return $stdout; +} + +# This function will perform various test by modifying the specified block at +# the specified uint16 offset, checking that the corruption is correctly +# detected, and finally restore the specified block to its original content. +sub corrupt_and_test_block +{ + my ($node, $filename, $blkno, $offset, $fake_data, $error_pattern, $test_prefix) = @_; + + check_checksums_nb_error($node, 0, qr/^$/); + + check_pg_stat_database_nb_error($node); + + $node->stop(); + + my $original_block = get_block($filename, 0); + my $original_data = get_uint16_from_page($original_block, $offset); + + isnt($original_data, $fake_data, + "$test_prefix: The fake data at offset $offset should be different" + . " from the existing one"); + + my $new_block = set_uint16_to_page($original_block, $fake_data, $offset); + isnt($original_data, get_uint16_from_page($new_block, $offset), + "$test_prefix: The fake data at offset $offset should have been changed in memory"); + + overwrite_block($filename, $new_block, 0); + + my $written_data = get_uint16_from_page(get_block($filename, 0), $offset); + isnt($original_data, $written_data, + "$test_prefix: The data written at offset $offset should be different" + . " from the original one"); + is(get_uint16_from_page($new_block, $offset), $written_data, + "$test_prefix: The data written at offset $offset should be the same" + . " as the one in memory"); + is($written_data, $fake_data, + "$test_prefix: The data written at offset $offset should be the one" + . " we wanted to write"); + + $node->start(); + + check_checksums_nb_error($node, 1, $error_pattern); + + my $det = get_checksums_errors($node, 1, $error_pattern); + is($det, "t1|0", + "$test_prefix: The checksums error for modification at offset $offset" + . " should be detected"); + + $node->stop(); + + $new_block = set_uint16_to_page($original_block, $original_data, $offset); + is($original_data, get_uint16_from_page($new_block, $offset), + "$test_prefix: The data at offset $offset should have been restored in memory"); + + overwrite_block($filename, $new_block, 0); + is($original_data, get_uint16_from_page(get_block($filename, $blkno), + $offset), + "$test_prefix: The data at offset $offset should have been restored on disk"); + + $node->start(); + + check_checksums_nb_error($node, 0, qr/^$/); +} + +if (exists $ENV{MY_PG_REGRESS}) +{ + $ENV{PG_REGRESS} = $ENV{MY_PG_REGRESS}; +} + +my $node = get_new_node('main'); + +my %params; +$params{'extra'} = ['--data-checksums']; +$node->init(%params); + +$node->start(); + +$ENV{PGOPTIONS} = '--client-min-messages=WARNING'; + +my ($cmdret, $stdout, $stderr) = $node->psql('postgres', "SELECT" + . " current_setting('data_checksums')"); + +is($stdout, 'on', 'Data checksums should be enabled'); + +($cmdret, $stdout, $stderr) = $node->psql('postgres', "SELECT" + . " current_setting('block_size')"); + +$BLOCKSIZE = $stdout; + +# Basic schema to corrupt and check +$node->safe_psql( + 'postgres', q| + CREATE TABLE public.t1(id integer); + INSERT INTO public.t1 SELECT generate_series(1, 100); + CHECKPOINT; +|); + +# get the underlying heap absolute path +($cmdret, $stdout, $stderr) = $node->psql('postgres', "SELECT" + . " current_setting('data_directory') || '/' || pg_relation_filepath('t1')" +); + +isnt($stdout, '', 'A relfilenode should be returned'); + +my $filename = $stdout; + +check_checksums_nb_error($node, 0, qr/^$/); + +check_pg_stat_database_nb_error($node); + +my $fake_uint16 = hex '0x0000'; + +# Test with a modified checksum. We use a zero checksum here as it's the only +# one that cannot exist on a checksummed page. We also don't have an easy way +# to compute what the checksum would be after a modification in a random place +# in the block. +corrupt_and_test_block($node, $filename, 0, $CHECKSUM_UINT16_OFFSET, + $fake_uint16, qr/WARNING.*page verification failed/s, + 'corrupted page: '); + +# Test corruption making the block looks like it's PageIsNew(). This should +# pass. +corrupt_and_test_block($node, $filename, 0, $PD_UPPER_UINT16_OFFSET, + $fake_uint16, qr/^$/, 'new page: '); diff --git a/src/test/regress/expected/pagefuncs.out b/src/test/regress/expected/pagefuncs.out new file mode 100644 index 0000000000..38a72b01b3 --- /dev/null +++ b/src/test/regress/expected/pagefuncs.out @@ -0,0 +1,72 @@ +-- +-- Tests for functions related to relation pages +-- +-- Restricted to superusers by default +CREATE ROLE regress_pgfunc_user; +SET ROLE regress_pgfunc_user; +SELECT pg_relation_check_pages('pg_class'); -- error +ERROR: permission denied for function pg_relation_check_pages +SELECT pg_relation_check_pages('pg_class', 'main'); -- error +ERROR: permission denied for function pg_relation_check_pages +RESET ROLE; +DROP ROLE regress_pgfunc_user; +-- NULL and simple sanity checks +SELECT pg_relation_check_pages(NULL); -- empty result + pg_relation_check_pages +------------------------- +(0 rows) + +SELECT pg_relation_check_pages(NULL, NULL); -- empty result + pg_relation_check_pages +------------------------- +(0 rows) + +SELECT pg_relation_check_pages('pg_class', 'invalid_fork'); -- error +ERROR: invalid fork name +HINT: Valid fork names are "main", "fsm", "vm", and "init". +-- Relation types that are supported +CREATE TABLE pgfunc_test_tab (id int); +CREATE INDEX pgfunc_test_ind ON pgfunc_test_tab(id); +INSERT INTO pgfunc_test_tab VALUES (generate_series(1,1000)); +SELECT pg_relation_check_pages('pgfunc_test_tab'); + pg_relation_check_pages +------------------------- +(0 rows) + +SELECT pg_relation_check_pages('pgfunc_test_ind'); + pg_relation_check_pages +------------------------- +(0 rows) + +DROP TABLE pgfunc_test_tab; +CREATE MATERIALIZED VIEW pgfunc_test_matview AS SELECT 1; +SELECT pg_relation_check_pages('pgfunc_test_matview'); + pg_relation_check_pages +------------------------- +(0 rows) + +DROP MATERIALIZED VIEW pgfunc_test_matview; +CREATE SEQUENCE pgfunc_test_seq; +SELECT pg_relation_check_pages('pgfunc_test_seq'); + pg_relation_check_pages +------------------------- +(0 rows) + +DROP SEQUENCE pgfunc_test_seq; +-- pg_relation_check_pages() returns no results if passed relations that +-- do not support the operation, like relations without storage or temporary +-- relations. +CREATE TEMPORARY TABLE pgfunc_test_temp AS SELECT generate_series(1,10) AS a; +SELECT pg_relation_check_pages('pgfunc_test_temp'); + pg_relation_check_pages +------------------------- +(0 rows) + +DROP TABLE pgfunc_test_temp; +CREATE VIEW pgfunc_test_view AS SELECT 1; +SELECT pg_relation_check_pages('pgfunc_test_view'); + pg_relation_check_pages +------------------------- +(0 rows) + +DROP VIEW pgfunc_test_view; diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule index ae89ed7f0b..7a46a13252 100644 --- a/src/test/regress/parallel_schedule +++ b/src/test/regress/parallel_schedule @@ -112,7 +112,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr # ---------- # Another group of parallel tests # ---------- -test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain +test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain pagefuncs # event triggers cannot run concurrently with any test that runs DDL test: event_trigger diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule index 525bdc804f..9a80b80f73 100644 --- a/src/test/regress/serial_schedule +++ b/src/test/regress/serial_schedule @@ -197,6 +197,7 @@ test: hash_part test: indexing test: partition_aggregate test: partition_info +test: pagefuncs test: tuplesort test: explain test: event_trigger diff --git a/src/test/regress/sql/pagefuncs.sql b/src/test/regress/sql/pagefuncs.sql new file mode 100644 index 0000000000..12d32eeae4 --- /dev/null +++ b/src/test/regress/sql/pagefuncs.sql @@ -0,0 +1,41 @@ +-- +-- Tests for functions related to relation pages +-- + +-- Restricted to superusers by default +CREATE ROLE regress_pgfunc_user; +SET ROLE regress_pgfunc_user; +SELECT pg_relation_check_pages('pg_class'); -- error +SELECT pg_relation_check_pages('pg_class', 'main'); -- error +RESET ROLE; +DROP ROLE regress_pgfunc_user; + +-- NULL and simple sanity checks +SELECT pg_relation_check_pages(NULL); -- empty result +SELECT pg_relation_check_pages(NULL, NULL); -- empty result +SELECT pg_relation_check_pages('pg_class', 'invalid_fork'); -- error + +-- Relation types that are supported +CREATE TABLE pgfunc_test_tab (id int); +CREATE INDEX pgfunc_test_ind ON pgfunc_test_tab(id); +INSERT INTO pgfunc_test_tab VALUES (generate_series(1,1000)); +SELECT pg_relation_check_pages('pgfunc_test_tab'); +SELECT pg_relation_check_pages('pgfunc_test_ind'); +DROP TABLE pgfunc_test_tab; + +CREATE MATERIALIZED VIEW pgfunc_test_matview AS SELECT 1; +SELECT pg_relation_check_pages('pgfunc_test_matview'); +DROP MATERIALIZED VIEW pgfunc_test_matview; +CREATE SEQUENCE pgfunc_test_seq; +SELECT pg_relation_check_pages('pgfunc_test_seq'); +DROP SEQUENCE pgfunc_test_seq; + +-- pg_relation_check_pages() returns no results if passed relations that +-- do not support the operation, like relations without storage or temporary +-- relations. +CREATE TEMPORARY TABLE pgfunc_test_temp AS SELECT generate_series(1,10) AS a; +SELECT pg_relation_check_pages('pgfunc_test_temp'); +DROP TABLE pgfunc_test_temp; +CREATE VIEW pgfunc_test_view AS SELECT 1; +SELECT pg_relation_check_pages('pgfunc_test_view'); +DROP VIEW pgfunc_test_view; diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index c99499e52b..57a45cf0d3 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -26219,6 +26219,56 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8'); </sect2> + <sect2 id="functions-data-sanity"> + <title>Data Sanity Functions</title> + + <para> + The functions shown in <xref linkend="functions-data-sanity-table"/> + provide ways to check the sanity of data files in the cluster. + </para> + + <table id="functions-data-sanity-table"> + <title>Data Sanity Functions</title> + <tgroup cols="3"> + <thead> + <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry> + </row> + </thead> + + <tbody> + <row> + <entry> + <literal><function>pg_relation_check_pages(<parameter>relation</parameter> <type>regclass</type> [, <parameter>fork</parameter> <type>text</type> <literal>DEFAULT</literal> <literal>NULL</literal> ])</function></literal> + </entry> + <entry><type>setof record</type></entry> + <entry>Check the pages of a relation. + </entry> + </row> + </tbody> + </tgroup> + </table> + + <indexterm> + <primary>pg_relation_check_pages</primary> + </indexterm> + <para id="functions-check-relation-note" xreflabel="pg_relation_check_pages"> + <function>pg_relation_check_pages</function> iterates over all blocks of a + given relation and verifies if they are in a state where they can safely + be loaded into the shared buffers. If defined, + <replaceable>fork</replaceable> specifies that only the pages of the given + fork are to be verified. Fork can be <literal>'main'</literal> for the + main data fork, <literal>'fsm'</literal> for the free space map, + <literal>'vm'</literal> for the visibility map, or + <literal>'init'</literal> for the initialization fork. The default of + <literal>NULL</literal> means that all the forks of the relation are + checked. The function returns a list of blocks that are considered as + corrupted with the path of the related file. Use of this function is + restricted to superusers by default but access may be granted to others + using <command>GRANT</command>. + </para> + + </sect2> + </sect1> <sect1 id="functions-trigger"> diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm index 90594bd41b..adbabba3c4 100644 --- a/src/tools/msvc/Mkvcbuild.pm +++ b/src/tools/msvc/Mkvcbuild.pm @@ -50,7 +50,8 @@ my @contrib_excludes = ( 'pgcrypto', 'sepgsql', 'brin', 'test_extensions', 'test_misc', 'test_pg_dump', - 'snapshot_too_old', 'unsafe_tests'); + 'snapshot_too_old', 'unsafe_tests', + 'check_relation'); # Set of variables for frontend modules my $frontend_defines = { 'initdb' => 'FRONTEND' };
signature.asc
Description: PGP signature