Re: Cache relation sizes?
On Wed, Jun 16, 2021 at 9:24 AM Thomas Munro wrote:
> No change yet, just posting a rebase to keep cfbot happy.

Hi, Thomas,

I think that the proposed feature is pretty cool, not only because it fixes some old issues with lseek() performance and reliability, but also because it opens the door to some new features, such as disk size quota management [1]. Cheap up-to-date size tracking is one of the major problems for it.

I've only read the 1st patch so far. Here are some thoughts about it:

- SMgrSharedRelation - what about calling it SMgrShmemRelation? It looks weird too, but "...SharedRelation" makes me think of shared catalog tables.

- We can exclude temp relations from consideration, as they are never shared, and use RelFileNode instead of RelFileNodeBackend in SMgrSharedRelationMapping. Not only will it save us some space, it will also help to avoid a situation where temp relations occupy the whole cache (which may easily happen in some workloads). I assume you were trying to make a generic solution, but in practice temp rels differ from regular ones and may require different optimizations. Local caching will be enough for them if ever needed; for example, we can keep smgr_cached_nblocks[] for temp relations.

> int smgr_shared_relations = 1000;

- How bad would it be to keep cache entries for all relations? Let's test memory overhead on some realistic datasets. I suppose this default of 1000 was chosen just for the WIP patch. I wonder if we can use DSM instead of regular shared memory?

- I'd expect that CREATE DATABASE and ALTER DATABASE SET TABLESPACE require special treatment, because they bypass smgr, just like dropdb. I don't see it in the patch.

[1] https://github.com/greenplum-db/diskquota

-- 
Best regards,
Lubennikova Anastasia
Re: [CLOBBER_CACHE] Server crashed with segfault 11 while executing clusterdb
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature:      tested, passed
Spec compliant:          not tested
Documentation:           not tested

I looked through the patch. It looks good to me. CFbot tests are passing and, as I understood from the thread, nobody opposes this refactoring, so I'm moving it to RFC status.

The new status of this patch is: Ready for Committer
Extensible storage manager API - smgr hooks
Hi, hackers!

Many recently discussed features can make use of an extensible storage manager API: storage-level compression and encryption [1], [2], [3], the disk quota feature [4], SLRU storage changes [5], and any other feature that may want to substitute the PostgreSQL storage layer with its own implementation (e.g. lazy_restore [6]).

Attached is a proposal to change the smgr API to make it extensible. The idea is to add a hook for plugins to get control in smgr and define custom storage managers. The patch replaces the smgrsw[] array and smgr_sw selector with an smgr() function that loads the f_smgr implementation. As before, there is only one implementation - smgr_md - which is wrapped into smgr_standard().

To create a custom implementation, a developer needs to implement the smgr API functions:

static const struct f_smgr smgr_custom = {
	.smgr_init = custominit,
	...
}

create a hook function:

const f_smgr *
smgr_custom(BackendId backend, RelFileNode rnode)
{
	// Here we can also add some logic and choose which smgr
	// to use based on rnode and backend
	return &smgr_custom;
}

and finally set the hook:

smgr_hook = smgr_custom;

[1] https://www.postgresql.org/message-id/flat/11996861554042...@iva4-dd95b404a60b.qloud-c.yandex.net
[2] https://www.postgresql.org/message-id/flat/272dd2d9.e52a.17235f2c050.Coremail.chjischj%40163.com
[3] https://postgrespro.com/docs/enterprise/9.6/cfs
[4] https://www.postgresql.org/message-id/flat/CAB0yre%3DRP_ho6Bq4cV23ELKxRcfhV2Yqrb1zHp0RfUPEWCnBRw%40mail.gmail.com
[5] https://www.postgresql.org/message-id/flat/20180814213500.GA74618%4060f81dc409fc.ant.amazon.com
[6] https://wiki.postgresql.org/wiki/PGCon_2021_Fun_With_WAL#Lazy_Restore

-- 
Best regards,
Lubennikova Anastasia

From 90085398f5ecc90d6b7caa318bd3d5f2867ef95c Mon Sep 17 00:00:00 2001
From: anastasia
Date: Tue, 29 Jun 2021 22:16:26 +0300
Subject: [PATCH] smgr_api.patch

Make smgr API pluggable. Add smgr_hook that can be used to define custom storage managers. Remove smgrsw[] array and smgr_sw selector.
Instead, smgropen() uses the smgr() function to load the f_smgr implementation using smgr_hook. Also add smgr_init_hook and smgr_shutdown_hook. And a lot of mechanical changes in smgr.c functions.
---
 src/backend/storage/smgr/smgr.c | 136 ++--
 src/include/storage/smgr.h      |  56 -
 2 files changed, 116 insertions(+), 76 deletions(-)

diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index 4dc24649df..5f1981a353 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -26,47 +26,8 @@
 #include "utils/hsearch.h"
 #include "utils/inval.h"
 
-
-/*
- * This struct of function pointers defines the API between smgr.c and
- * any individual storage manager module. Note that smgr subfunctions are
- * generally expected to report problems via elog(ERROR). An exception is
- * that smgr_unlink should use elog(WARNING), rather than erroring out,
- * because we normally unlink relations during post-commit/abort cleanup,
- * and so it's too late to raise an error. Also, various conditions that
- * would normally be errors should be allowed during bootstrap and/or WAL
- * recovery --- see comments in md.c for details.
- */
-typedef struct f_smgr
-{
-	void		(*smgr_init) (void);	/* may be NULL */
-	void		(*smgr_shutdown) (void);	/* may be NULL */
-	void		(*smgr_open) (SMgrRelation reln);
-	void		(*smgr_close) (SMgrRelation reln, ForkNumber forknum);
-	void		(*smgr_create) (SMgrRelation reln, ForkNumber forknum,
-								bool isRedo);
-	bool		(*smgr_exists) (SMgrRelation reln, ForkNumber forknum);
-	void		(*smgr_unlink) (RelFileNodeBackend rnode, ForkNumber forknum,
-								bool isRedo);
-	void		(*smgr_extend) (SMgrRelation reln, ForkNumber forknum,
-								BlockNumber blocknum, char *buffer, bool skipFsync);
-	bool		(*smgr_prefetch) (SMgrRelation reln, ForkNumber forknum,
-								  BlockNumber blocknum);
-	void		(*smgr_read) (SMgrRelation reln, ForkNumber forknum,
-							  BlockNumber blocknum, char *buffer);
-	void		(*smgr_write) (SMgrRelation reln, ForkNumber forknum,
-							   BlockNumber blocknum, char *buffer, bool skipFsync);
-	void		(*smgr_writeback) (SMgrRelation reln, ForkNumber forknum,
-								   BlockNumber blocknum, BlockNumber nblocks);
-	BlockNumber (*smgr_nblocks) (SMgrRelation reln, ForkNumber forknum);
-	void		(*smgr_truncate) (SMgrRelation reln, ForkNumber forknum,
-								  BlockNumber nblocks);
-	void		(*smgr_immedsync) (SMgrRelation reln, ForkNumber forknum);
-} f_smgr;
-
-static const f_smgr smgrsw[] = {
+static const f_smgr smgr_md = {
 	/* magnetic disk */
-	{
 	.smgr_init = mdinit,
 	.smgr_shutdown = NULL,
 	.smgr_open = mdopen,
@@ -82,11 +43,8 @@ static const f_smgr smgrsw[] = {
 	.smgr_nblocks = mdnblocks,
 	.smgr_truncate = mdtruncate,
 	.smgr_immedsync = mdimmedsync,
-	}
 };
 
-static const int NSmgr = lengthof(smgrsw);
-
 /*
  * Each backend has a
Re: Testing autovacuum wraparound (including failsafe)
On Thu, Jun 10, 2021 at 10:52 AM Andres Freund wrote:
>
> I started to write a test for $Subject, which I think we sorely need.
>
> Currently my approach is to:
> - start a cluster, create a few tables with test data
> - acquire SHARE UPDATE EXCLUSIVE in a prepared transaction, to prevent
>   autovacuum from doing anything
> - cause dead tuples to exist
> - restart
> - run pg_resetwal -x 227648
> - do things like acquiring pins on pages that block vacuum from progressing
> - commit prepared transaction
> - wait for template0, template1 datfrozenxid to increase
> - wait for relfrozenxid for most relations in postgres to increase
> - release buffer pin
> - wait for postgres datfrozenxid to increase

Cool. Thank you for working on that! Could you please share a WIP patch for the $subj? I'd be happy to help with it.

> So far so good. But I've encountered a few things that stand in the way of
> enabling such a test by default:
>
> 1) During startup StartupSUBTRANS() zeroes out all pages between
>    oldestActiveXID and nextXid. That takes 8s on my workstation, but only
>    because I have plenty of memory - pg_subtrans ends up 14GB as I
>    currently do the test. Clearly not something we could do on the BF.
>
> 3) pg_resetwal -x requires to carefully choose an xid: It needs to be the
>    first xid on a clog page. It's not hard to determine which xids are,
>    but it depends on BLCKSZ and a few constants in clog.c. I've for now
>    hardcoded a value appropriate for 8KB, but ...

Maybe we can add a new pg_resetwal option? Something like pg_resetwal --xid-near-wraparound, which would ask pg_resetwal to calculate the exact xid value using values from pg_control and the clog macros. I think it might come in handy for manual testing too.

> I have 2 1/2 ideas about addressing 1):
>
> - We could expose functionality to advance nextXid to a future value at
>   runtime, without filling in clog/subtrans pages. Would probably have to
>   live in varsup.c and be exposed via regress.so or such?

This option looks scary to me. Several functions rely on the fact that StartupSUBTRANS() has zeroed the pages. And if we make it conditional just for tests, it means that we won't test the real code path.

> - The only reason StartupSUBTRANS() does that work is because of the
>   prepared transaction holding back oldestActiveXID. That transaction in
>   turn exists to prevent autovacuum from doing anything before we do test
>   setup steps.
>
>   Perhaps it'd be sufficient to set autovacuum_naptime really high
>   initially, perform the test setup, set naptime to something lower,
>   reload config. But I'm worried that might not be reliable: If something
>   ends up allocating an xid we'd potentially reach the path in
>   GetNewTransaction() that wakes up the launcher? But probably there
>   wouldn't be anything doing so?
>
> Another aspect that might not make this a good choice is that it actually
> seems relevant to be able to test cases where there are very old still
> running transactions...

Maybe this exact scenario can be covered with a separate long-running test, not included in the buildfarm test suite?

-- 
Best regards,
Lubennikova Anastasia
Re: A test for replay of regression tests
On Tue, 8 Jun 2021 at 02:44, Anastasia Lubennikova wrote:
>
> On Tue, 8 Jun 2021 at 02:25, Thomas Munro wrote:
>
>> Ok, here's a new version incorporating feedback so far.
>>
>> 1. Invoke pg_regress directly (no make).
>>
>> 2. Use PG_TEST_EXTRA="wal_consistency_checking" as a way to opt in to
>> the more expensive test.
>>
>> 3. Use parallel schedule rather than serial. It's faster but also
>> the non-determinism might discover more things. This required
>> changing the TAP test max_connections setting from 10 to 25.
>>
>> 4. Remove some extraneous print statements and
>> check-if-data-is-replicated-using-SELECT tests that are technically
>> not needed (I had copied those from 001_stream_rep.pl).
>
> Thank you for working on this test set!
> I was especially glad to see the skip-tests option for pg_regress. I think
> it will become a very handy tool for hackers.
>
> To try the patch I had to resolve a few merge conflicts; see a rebased
> version in the attachments.
>
> > auth_extra => [ '--create-role', 'repl_role' ]);
>
> This line and the comment above it look like some copy-paste artifacts.
> Did I get it right? If so, I suggest removing them.
> Other than that, the patch looks good to me.

For some reason, it failed on cfbot, so I'll switch it back to 'Waiting on author'. BTW, do I get it right that cfbot CI will need some adjustments to print regression.out for this test?

See one more revision of the patch attached. It contains the following changes:
- rebase onto recent main
- removed the 'auth_extra' piece that I mentioned above
- added the missing make clean and gitignore changes
-- 
Best regards,
Lubennikova Anastasia

commit 9630f6367d2b5079976bef0a27bb925d00cb798d
Author: anastasia
Date:   Tue Jun 8 02:01:35 2021 +0300

    v4-0001-Test-replay-of-regression-tests.patch

diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index cb401a45b3..7a10c83d8a 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -289,6 +289,17 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl'
+
+
+   wal_consistency_checking
+
+
+    Uses wal_consistency_checking=all while running
+    some of the tests under src/test/recovery.  Not
+    enabled by default because it is resource intensive.
+
+
+
 Tests for features that are not supported by the current build

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 45d1636128..7b0af9fd78 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -484,7 +484,7 @@ sub init
 	print $conf "hot_standby = on\n";
 	# conservative settings to ensure we can run multiple postmasters:
 	print $conf "shared_buffers = 1MB\n";
-	print $conf "max_connections = 10\n";
+	print $conf "max_connections = 25\n";
 	# limit disk space consumption, too:
 	print $conf "max_wal_size = 128MB\n";
 }

diff --git a/src/test/recovery/.gitignore b/src/test/recovery/.gitignore
index 871e943d50..03410b70a3 100644
--- a/src/test/recovery/.gitignore
+++ b/src/test/recovery/.gitignore
@@ -1,2 +1,10 @@
 # Generated by test suite
 /tmp_check/
+/results/
+/expected/
+/sql/
+/log/
+
+# Note: regression.* are only left behind on a failure; that's why they're not ignored
+#/regression.diffs
+#/regression.out

diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index 96442ceb4e..0b0d6094cc 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -15,6 +15,10 @@ subdir = src/test/recovery
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
+# These are created by pg_regress.
+# So we need rules to clean them.
+pg_regress_clean_files += sql/ expected/
+
 check:
 	$(prove_check)
 
@@ -22,4 +26,4 @@ installcheck:
 	$(prove_installcheck)
 
 clean distclean maintainer-clean:
-	rm -rf tmp_check
+	rm -rf $(pg_regress_clean_files)

diff --git a/src/test/recovery/t/026_stream_rep_regress.pl b/src/test/recovery/t/026_stream_rep_regress.pl
new file mode 100644
index 00..0043405739
--- /dev/null
+++ b/src/test/recovery/t/026_stream_rep_regress.pl
@@ -0,0 +1,59 @@
+# Run the standard regression tests with streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+$node_primary->init(
+	allows_streaming => 1);
+
+# WAL consistency checking is resource intensive so require opt-in with the
+# PG_TEST_EXTRA environment variable.
+if ($ENV{PG_TEST_EXTRA} &&
+	$ENV{PG_TEST_EXTRA} =~ m/\bwal_consistency_checking\b/) {
+	$node_primary->append_conf('postgresql.conf',
+		'wal_consistency_checking = all');
+}
+
+$node_primary->start;
+is( $node_primary->psq
Re: A test for replay of regression tests
On Tue, 8 Jun 2021 at 02:25, Thomas Munro wrote:
> Ok, here's a new version incorporating feedback so far.
>
> 1. Invoke pg_regress directly (no make).
>
> 2. Use PG_TEST_EXTRA="wal_consistency_checking" as a way to opt in to
> the more expensive test.
>
> 3. Use parallel schedule rather than serial. It's faster but also
> the non-determinism might discover more things. This required
> changing the TAP test max_connections setting from 10 to 25.
>
> 4. Remove some extraneous print statements and
> check-if-data-is-replicated-using-SELECT tests that are technically
> not needed (I had copied those from 001_stream_rep.pl).

Thank you for working on this test set! I was especially glad to see the skip-tests option for pg_regress. I think it will become a very handy tool for hackers.

To try the patch I had to resolve a few merge conflicts; see a rebased version in the attachments.

> auth_extra => [ '--create-role', 'repl_role' ]);

This line and the comment above it look like some copy-paste artifacts. Did I get it right? If so, I suggest removing them. Other than that, the patch looks good to me.

-- 
Best regards,
Lubennikova Anastasia

commit 2eeaf5802c060612831b3fdeb33401a07c230f83
Author: anastasia
Date:   Tue Jun 8 02:01:35 2021 +0300

    v3-0001-Test-replay-of-regression-tests.patch

diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index cb401a45b3..7a10c83d8a 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -289,6 +289,17 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl'
+
+
+   wal_consistency_checking
+
+
+    Uses wal_consistency_checking=all while running
+    some of the tests under src/test/recovery.  Not
+    enabled by default because it is resource intensive.
+
+
+
 Tests for features that are not supported by the current build

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 45d1636128..7b0af9fd78 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -484,7 +484,7 @@ sub init
 	print $conf "hot_standby = on\n";
 	# conservative settings to ensure we can run multiple postmasters:
 	print $conf "shared_buffers = 1MB\n";
-	print $conf "max_connections = 10\n";
+	print $conf "max_connections = 25\n";
 	# limit disk space consumption, too:
 	print $conf "max_wal_size = 128MB\n";
 }

diff --git a/src/test/recovery/t/025_stream_rep_regress.pl b/src/test/recovery/t/025_stream_rep_regress.pl
new file mode 100644
index 00..8b125a1d67
--- /dev/null
+++ b/src/test/recovery/t/025_stream_rep_regress.pl
@@ -0,0 +1,62 @@
+# Run the standard regression tests with streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+# A specific role is created to perform some tests related to replication,
+# and it needs proper authentication configuration.
+$node_primary->init(
+	allows_streaming => 1,
+	auth_extra => [ '--create-role', 'repl_role' ]);
+
+# WAL consistency checking is resource intensive so require opt-in with the
+# PG_TEST_EXTRA environment variable.
+if ($ENV{PG_TEST_EXTRA} &&
+	$ENV{PG_TEST_EXTRA} =~ m/\bwal_consistency_checking\b/) {
+	$node_primary->append_conf('postgresql.conf',
+		'wal_consistency_checking = all');
+}
+
+$node_primary->start;
+is( $node_primary->psql(
+		'postgres',
+		qq[SELECT pg_create_physical_replication_slot('standby_1');]),
+	0,
+	'physical slot created on primary');
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_primary->backup($backup_name);
+
+# Create streaming standby linking to primary
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby_1->append_conf('postgresql.conf',
+	"primary_slot_name = standby_1");
+$node_standby_1->start;
+
+# XXX The tablespace tests don't currently work when the standby shares a
+# filesystem with the primary, due to colliding absolute paths.  We'll skip
+# that for now.
+
+# Run the regression tests against the primary.
+system_or_bail("../regress/pg_regress",
+	"--port=" . $node_primary->port,
+	"--schedule=../regress/parallel_schedule",
+	"--dlpath=../regress",
+	"--inputdir=../regress",
+	"--skip-tests=tablespace");
+
+# Wait for standby to catch up
+$node_primary->wait_for_catchup($node_standby_1, 'replay',
+	$node_primary->lsn('insert'));
+
+ok(1, "caught up");
+
+$node_standby_1->stop;
+$node_primary->stop;

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index e04d365258..510afaadbf 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -83,6 +83,7 @@ static int	max_connections = 0;
 static int	max_concurrent_tests = 0;
 static char *encoding = NULL;
 static _stringlist *schedulelist = NULL;
Re: Performing partition pruning using row value
On 21.07.2020 11:24, kato-...@fujitsu.com wrote:
> So, after looking at these functions and modifying this patch, I would
> like to add this patch to the next CF.
>
> I updated this patch and registered it for the next CF.
> https://commitfest.postgresql.org/29/2654/
>
> regards,
> sho kato

Thank you for working on this improvement. I took a look at the code.

1) This piece of code is unneeded:

switch (get_op_opfamily_strategy(opno, partopfamily))
{
	case BTLessStrategyNumber:
	case BTLessEqualStrategyNumber:
	case BTGreaterEqualStrategyNumber:
	case BTGreaterStrategyNumber:

See the comment for RowCompareExpr, which states that "A RowCompareExpr node is only generated for the < <= > >= cases".

2) It's worth adding a regression test for this feature.

Other than that, the patch looks good to me.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: CREATE INDEX CONCURRENTLY on partitioned index
On 28.01.2021 17:30, Justin Pryzby wrote:

On Thu, Jan 28, 2021 at 09:51:51PM +0900, Masahiko Sawada wrote:

On Mon, Nov 30, 2020 at 5:22 AM Justin Pryzby wrote:

On Sat, Oct 31, 2020 at 01:31:17AM -0500, Justin Pryzby wrote:

Forking this thread, since the existing CFs have been closed.
https://www.postgresql.org/message-id/flat/20200914143102.GX18552%40telsasoft.com#58b1056488451f8594b0f0ba40996afd

The strategy is to create catalog entries for all tables with indisvalid=false, and then process them like REINDEX CONCURRENTLY. If it's interrupted, it leaves INVALID indexes, which can be cleaned up with DROP or REINDEX, same as CIC on a plain table.

On Sat, Aug 08, 2020 at 01:37:44AM -0500, Justin Pryzby wrote:

On Mon, Jun 15, 2020 at 09:37:42PM +0900, Michael Paquier wrote:

Note that the mentioned problem wasn't serious: there was a missing index on the child table, therefore the parent index was invalid, as intended. However, I agree that it's not nice that the command can fail so easily and leave behind some indexes created successfully, some failed, and some not created at all. But I took your advice, initially creating invalid inds.

... That gave me the idea to layer CIC on top of Reindex, since I think it does exactly what's needed.

On Sat, Sep 26, 2020 at 02:56:55PM -0500, Justin Pryzby wrote:

On Thu, Sep 24, 2020 at 05:11:03PM +0900, Michael Paquier wrote:

It would be good also to check if we have a partition index tree that maps partially with a partition table tree (aka not all table partitions have a partition index), where these don't get clustered because there is no index to work on.

This should not happen, since an incomplete partitioned index is "invalid".

I had been waiting to rebase since there haven't been any review comments and I expected additional, future conflicts.

I attempted to review this feature, but the last patch conflicts with the recent refactoring, so I wasn't able to test it properly. Could you please send a new version?

Meanwhile, here are my questions about the patch:

1) I don't see a reason to change the logic here. We don't skip counting existing indexes when creating the parent index. Why should we skip them in CONCURRENTLY mode?

// If concurrent, maybe this should be done after excluding indexes which already exist ?
pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL, nparts);

2) Here we access a relation field after closing the relation. Is it safe?

/* save lockrelid and locktag for below */
heaprelid = rel->rd_lockInfo.lockRelId;

3) The leaf_partitions() function only handles indexes, so I suggest naming it more specifically and adding a comment about the meaning of the 'options' parameter.

4) I don't quite understand the idea of the regression test. Why do we expect to see invalid indexes there?

+ "idxpart_a_idx1" UNIQUE, btree (a) INVALID

5) Speaking of documentation, I think we need to add a paragraph about CIC on partitioned indexes explaining that invalid indexes may appear and what the user should do to fix them.

6) ReindexIndexesConcurrently() needs some code cleanup.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: WIP: document the hook system
On 17.01.2021 16:53, Magnus Hagander wrote:
> On Fri, Jan 15, 2021 at 8:28 AM Peter Eisentraut wrote:
>> On 2020-12-31 04:28, David Fetter wrote:
>>> This could probably use a lot of filling in, but having it in the
>>> actual documentation beats needing to know folklore even to know
>>> that the capability is there.
>>
>> This patch seems quite short of a state where one could begin to
>> evaluate it. Documenting the hooks better seems a worthwhile goal. I
>> think the question is whether we can develop documentation that is
>> genuinely useful by itself without studying the relevant source code.
>> This submission does not address that question.
>
> Even just having a list of available hooks would be a nice improvement
> though :)
>
> But maybe it's something that belongs better in a README file instead,
> since as you say it's unlikely to be properly useful without looking at
> the source anyway. But just a list of hooks and a *very* high overview
> of where each of them hooks in would definitely be useful to have
> somewhere, I think. Having to find with "grep" whether there may or may
> not exist a hook for approximately what it is you're looking for is
> definitely a process to improve on.

+1 for README. Hooks are intended for developers and can be quite dangerous without a proper understanding of the internal code.

I also want to remind about the README gathered by mentees [1]. It was done under the PostgreSQL license, so we can use it.

By the way, is there any agreement on the plain-text format of PostgreSQL README files, or can we use md?

[1] https://github.com/AmatanHead/psql-hooks/blob/master/Detailed.md

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: some pointless HeapTupleHeaderIndicatesMovedPartitions calls
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature:      tested, passed
Spec compliant:          not tested
Documentation:           not tested

I wonder why this patch has been hanging on the commitfest for so long. The idea of the fix is clear, the code is correct, and all tests pass, so I'm moving it to Ready for Committer status.

The new status of this patch is: Ready for Committer
Re: pg_upgrade fails with non-standard ACL
On 24.01.2021 11:39, Noah Misch wrote:
> On Thu, Jan 21, 2021 at 01:03:58AM +0300, Anastasia Lubennikova wrote:
>> On 03.01.2021 14:29, Noah Misch wrote:
>>> Overall, this patch predicts a subset of cases where pg_dump will emit
>>> a failing GRANT or REVOKE that targets a pg_catalog object. Can you
>>> write a code comment stating what it does and does not detect? I think
>>> it's okay to not predict every failure, but let's record the intended
>>> coverage. Here are a few examples I know; there may be more. The above
>>> query only finds GRANTs to non-pinned roles. pg_dump dumps the
>>> following, but the above query doesn't find them:
>>>
>>> REVOKE ALL ON FUNCTION pg_last_wal_replay_lsn FROM public;
>>> GRANT EXECUTE ON FUNCTION pg_reload_conf() TO pg_signal_backend;
>
> I see a new comment listing object types. Please also explain the lack
> of preventing REVOKE failures (first example query above) and the
> limitation around non-pinned roles (second example).

1) Could you please clarify what you mean by REVOKE failures?

I tried the following example:

Start a 9.6 cluster.
REVOKE ALL ON function pg_switch_xlog() FROM public;
REVOKE ALL ON function pg_switch_xlog() FROM backup;

The upgrade to the current master passes with and without the patch. It seems that the current implementation of pg_upgrade doesn't take revoke ACLs into account.

2) As for pinned roles, I think we should fix this behavior rather than adding a comment, because such roles can have grants on system objects.

In earlier versions of the patch, I gathered ACLs directly from the system catalogs: pg_proc.proacl, pg_class.relacl, pg_attribute.attacl and pg_type.typacl. The only downside of this approach is that it cannot be easily extended to other object types, as we need to handle every object type separately. I don't think it is a big deal, as we already do it in check_for_changed_signatures(). And also the query to gather non-standard ACLs won't look as nice as it does now, because of the need to parse arrays of aclitems.

What do you think?
-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: pg_upgrade fails with non-standard ACL
On 03.01.2021 14:29, Noah Misch wrote:
> On Thu, Jun 11, 2020 at 07:58:43PM +0300, Anastasia Lubennikova wrote:
>> On 08.06.2020 19:31, Alvaro Herrera wrote:
>>> I'm thinking what's a good way to have a test that's committable.
>>> Maybe it would work to add a TAP test to pg_upgrade that runs initdb,
>>> does a few GRANTs as per your attachment, then runs pg_upgrade?
>>> Currently the pg_upgrade tests are not TAP ... we'd have to revive
>>> https://postgr.es/m/20180126080026.gi17...@paquier.xyz
>>> (Some progress was already made on the buildfarm front to permit this)
>>
>> I would be glad to add some test, but it seems to me that the
>> infrastructure changes for a cross-version pg_upgrade test are a much
>> more complicated task than this modest bugfix.
>
> Agreed.

Thank you for the review. A new version of the patch is attached, though I haven't tested it properly yet. I am planning to do that in a couple of days.

> --- a/src/bin/pg_upgrade/check.c
> +++ b/src/bin/pg_upgrade/check.c
> +static void
> +check_for_changed_signatures(void)
> +{
> +	snprintf(output_path, sizeof(output_path), "revoke_objects.sql");
>
> If an upgrade deleted function pg_last_wal_replay_lsn, upgrading a
> database in which "REVOKE ALL ON FUNCTION pg_last_wal_replay_lsn FROM
> public" happened requires a GRANT. Can you use a file name that will
> still make sense if someone enhances pg_upgrade to generate such GRANT
> statements?

I changed the name to 'fix_system_objects_ACL.sql'. Does it look good to you?

> +	if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
> +		pg_fatal("could not open file \"%s\": %s\n",
> +				 output_path, strerror(errno));
>
> Use %m instead of passing strerror(errno) to %s.

Done.

> +	}
> +
> +	/* Handle columns separately */
> +	if (strstr(aclinfo->obj_type, "column") != NULL)
> +	{
> +		char	   *pdot = last_dot_location(aclinfo->obj_ident);
>
> I'd write strrchr(aclinfo->obj_ident, '.') instead of introducing
> last_dot_location(). This assumes column names don't contain '.'; how
> difficult would it be to remove that assumption?
>
> If it's difficult, a cheap approach would be to ignore any entry
> containing too many dots. We're not likely to create such column names
> at initdb time, but v13 does allow:
>
> ALTER VIEW pg_locks RENAME COLUMN objid TO "a.b";
> GRANT SELECT ("a.b") ON pg_locks TO backup;
>
> After which revoke_objects.sql has:
>
> psql:./revoke_objects.sql:9: ERROR: syntax error at or near "") ON pg_catalog.pg_locks.""
> LINE 1: REVOKE ALL (b") ON pg_catalog.pg_locks."a FROM "backup";
>
> While that ALTER VIEW probably should have required
> allow_system_table_mods, we need to assume such databases exist.

I've fixed it by using pg_identify_object_as_address(). Now we don't have to parse obj_identity.

> Overall, this patch predicts a subset of cases where pg_dump will emit
> a failing GRANT or REVOKE that targets a pg_catalog object. Can you
> write a code comment stating what it does and does not detect? I think
> it's okay to not predict every failure, but let's record the intended
> coverage. Here are a few examples I know; there may be more. The above
> query only finds GRANTs to non-pinned roles. pg_dump dumps the
> following, but the above query doesn't find them:
>
> REVOKE ALL ON FUNCTION pg_last_wal_replay_lsn FROM public;
> GRANT EXECUTE ON FUNCTION pg_reload_conf() TO pg_signal_backend;
>
> The above query should test refclassid.
>
> This function should not run queries against servers older than 9.6,
> because pg_dump emits GRANT/REVOKE for pg_catalog objects only when
> targeting 9.6 or later. An upgrade from 8.4 to master is failing on the
> above query:

Fixed.

> The patch adds many lines wider than 78 columns, this being one
> example. In general, break such lines. (Don't break string literal
> arguments of ereport(), though.)

Fixed.
nm -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c index 43fc297eb6..429e4468f2 100644 --- a/src/bin/pg_upgrade/check.c +++ b/src/bin/pg_upgrade/check.c @@ -16,6 +16,7 @@ static void check_new_cluster_is_empty(void); static void check_databases_are_compatible(void); +static void check_for_changed_signatures(void); static void check_locale_and_encoding(DbInfo *olddb, DbInfo *newdb); static bool equivalent_locale(int category, const char *loca, const char *locb); static void check_is_install_user(ClusterInfo *cluster); @@ -151,6 +152,8 @@ check_and_dump_old_cluster(bool live_check) if (GET_MAJOR_VERSION(old_cluster.major_version) <= 804) new_9_0_populate_pg
Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
On 12.01.2021 22:30, Tomas Vondra wrote:

Thanks. These patches seem to resolve the TOAST table issue, freezing it as expected. I think the code duplication is not an issue, but I wonder why heap_insert uses this condition:

/*
 * ...
 *
 * No need to update the visibilitymap if it had all_frozen bit set
 * before this insertion.
 */
if (all_frozen_set && ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0))

while heap_multi_insert only does this:

if (all_frozen_set) { ... }

I haven't looked at the details, but shouldn't both do the same thing?

I decided to add this check for heap_insert() to avoid unneeded calls of visibilitymap_set(). If we insert tuples one by one, we only need to call it once per page. In my understanding, heap_multi_insert() inserts tuples in batches, so it doesn't need this optimization.

However, I've also repeated the test counting all-frozen pages in both the main table and TOAST table, and I get this:

patched
===
select count(*) from pg_visibility((select reltoastrelid from pg_class where relname = 't'));
 count
-------
    12
(1 row)

select count(*) from pg_visibility((select reltoastrelid from pg_class where relname = 't')) where not all_visible;
 count
-------
     0
(1 row)

That is - all TOAST pages are frozen (as expected, which is good). But now there are 12 pages, not just 10 pages. That is, we're now creating 2 extra pages, for some reason. I recall Pavan reported a similar issue with every 32768-th page not being properly filled, but I'm not sure if that's the same issue.

regards

As Pavan correctly figured out before, the problem is that RelationGetBufferForTuple() moves to the next page, losing free space in the block:

> ... I see that a relcache invalidation arrives
> after 1st and then after every 32672th block is filled. That clears the
> rel->rd_smgr field and we lose the information about the saved target
> block. The code then moves to extend the relation again and thus skips the
> previously less-than-half filled block, losing the free space in that block.
The reason for this cache invalidation is the vm_extend() call, which happens every 32672 blocks. RelationGetBufferForTuple() tries to use the last page, but for some reason this code is under the 'use_fsm' check. And COPY FROM doesn't use the FSM (see TABLE_INSERT_SKIP_FSM).

/*
 * If the FSM knows nothing of the rel, try the last page before we
 * give up and extend. This avoids one-tuple-per-page syndrome during
 * bootstrapping or in a recently-started system.
 */
if (targetBlock == InvalidBlockNumber)
{
    BlockNumber nblocks = RelationGetNumberOfBlocks(relation);

    if (nblocks > 0)
        targetBlock = nblocks - 1;
}

I think we can use this code without regard to 'use_fsm'. With this change, the number of toast rel pages is correct. The patch is attached.

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out index ca4b6e186b..0017e3415c 100644 --- a/contrib/pg_visibility/expected/pg_visibility.out +++ b/contrib/pg_visibility/expected/pg_visibility.out @@ -179,6 +179,69 @@ select pg_truncate_visibility_map('test_partition'); (1 row) +-- test copy freeze +create table copyfreeze (a int, b char(1500)); +-- load all rows via COPY FREEZE and ensure that all pages are set all-visible +-- and all-frozen. +begin; +truncate copyfreeze; +copy copyfreeze from stdin freeze; +commit; +select * from pg_visibility_map('copyfreeze'); + blkno | all_visible | all_frozen +---+-+ + 0 | t | t + 1 | t | t + 2 | t | t +(3 rows) + +select * from pg_check_frozen('copyfreeze'); + t_ctid + +(0 rows) + +-- load half the rows via regular COPY and rest via COPY FREEZE. The pages +-- which are touched by regular COPY must not be set all-visible/all-frozen. On +-- the other hand, pages allocated by COPY FREEZE should be marked +-- all-frozen/all-visible.
+begin; +truncate copyfreeze; +copy copyfreeze from stdin; +copy copyfreeze from stdin freeze; +commit; +select * from pg_visibility_map('copyfreeze'); + blkno | all_visible | all_frozen +---+-+ + 0 | f | f + 1 | f | f + 2 | t | t +(3 rows) + +select * from pg_check_frozen('copyfreeze'); + t_ctid + +(0 rows) + +-- Try a mix of regular COPY and COPY FREEZE. +begin; +truncate copyfreeze; +copy copyfreeze from stdin freeze; +copy copyfreeze from stdin; +copy copyfreeze from stdin freeze; +commit; +select * from pg_visibility_map('copyfreeze'); + blkno | all_visible | all_frozen +---+-+ + 0 | t | t +
Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
On 12.01.2021 00:51, Tomas Vondra wrote: On 1/11/21 10:00 PM, Anastasia Lubennikova wrote: On 11.01.2021 01:35, Tomas Vondra wrote:

Hi, I started looking at this patch again, hoping to get it committed in this CF, but I think there's a regression in handling TOAST tables (compared to the v3 patch submitted by Pavan in February 2019). The test: I'm running a very simple test (see test.sql):

1) start a transaction
2) create a table with a text column
3) copy freeze data into it
4) use pg_visibility to see how many blocks are all_visible both in the main table and its TOAST table

For the v3 patch (applied on top of 278584b526 and s/hi_options/ti_options) I get this:

         pages   NOT all_visible
 main      637         0
 toast   50001         3

There was some discussion about relcache invalidations causing a couple of TOAST pages not being marked as all_visible, etc. However, for this patch on master I get this:

         pages   NOT all_visible
 main      637         0
 toast   50001     50001

So no pages in TOAST are marked as all_visible. I haven't investigated what's causing this, but IMO that needs fixing to make this patch RFC. Attached is the test script I'm using, and a v5 of the patch - rebased on current master, with some minor tweaks to comments etc.

Thank you for attaching the test script. I reproduced the problem. This regression occurs because TOAST internally uses heap_insert(). You have asked upthread about adding this optimization to heap_insert(). I wrote a quick fix, see the attached patch 0002. The TOAST test passes now, but I haven't tested performance or any other use-cases yet. I'm going to test it properly in a couple of days and share results.

Thanks. I think it's important to make this work for TOAST tables - it often stores most of the data, and it was working in v3 of the patch. I haven't looked into the details, but if it's really just due to TOAST using heap_insert, I'd say it just confirms the importance of tweaking heap_insert too.

I've tested performance.
All tests were run on my laptop, latest master with and without patches, all default settings, except disabled autovacuum and the installed pg_stat_statements extension. The VACUUM is significantly faster with the patch, as it only checks the visibility map. COPY speed fluctuates a lot between tests, but I didn't notice any trend. I would expect a minor slowdown with the patch, as we need to handle visibility map pages during COPY FREEZE. But in some runs the patched version was faster than current master, so the impact of the patch is insignificant.

I ran 3 different tests:

1) Regular table (final size 5972 MB)

                            patched       |  master
 COPY FREEZE data 3 times:  33384,544 ms  |  31135,037 ms
                            31666,226 ms  |  31158,294 ms
                            32802,783 ms  |  33599,234 ms
 VACUUM                        54,185 ms  |  48445,584 ms

2) Table with TOAST (final size 1207 MB, where 1172 MB is in the toast table)

                            patched        |  master
 COPY FREEZE data 3 times:  368166,743 ms  |  383231,077 ms
                            398695,018 ms  |  454572,630 ms
                            410174,682 ms  |  567847,288 ms
 VACUUM                         43,159 ms  |    6648,302 ms

3) Table with a trivial BEFORE INSERT trigger (final size 5972 MB)

                            patched       |  master
 COPY FREEZE data 3 times:  73544,225 ms  |  64967,802 ms
                            90960,584 ms  |  71826,362 ms
                            81356,025 ms  |  80023,041 ms
 VACUUM                        49,626 ms  |  40100,097 ms

I took another look at yesterday's patch and it looks fine to me. So now I am waiting for your review.

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
On 11.01.2021 01:35, Tomas Vondra wrote:

Hi, I started looking at this patch again, hoping to get it committed in this CF, but I think there's a regression in handling TOAST tables (compared to the v3 patch submitted by Pavan in February 2019). The test: I'm running a very simple test (see test.sql):

1) start a transaction
2) create a table with a text column
3) copy freeze data into it
4) use pg_visibility to see how many blocks are all_visible both in the main table and its TOAST table

For the v3 patch (applied on top of 278584b526 and s/hi_options/ti_options) I get this:

         pages   NOT all_visible
 main      637         0
 toast   50001         3

There was some discussion about relcache invalidations causing a couple of TOAST pages not being marked as all_visible, etc. However, for this patch on master I get this:

         pages   NOT all_visible
 main      637         0
 toast   50001     50001

So no pages in TOAST are marked as all_visible. I haven't investigated what's causing this, but IMO that needs fixing to make this patch RFC. Attached is the test script I'm using, and a v5 of the patch - rebased on current master, with some minor tweaks to comments etc.

Thank you for attaching the test script. I reproduced the problem. This regression occurs because TOAST internally uses heap_insert(). You have asked upthread about adding this optimization to heap_insert(). I wrote a quick fix, see the attached patch 0002. The TOAST test passes now, but I haven't tested performance or any other use-cases yet. I'm going to test it properly in a couple of days and share results. With this change a lot of new code is repeated in heap_insert() and heap_multi_insert(). I think it's fine, because these functions already have a lot in common.
-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company >From 6ccaa4f526b38fbdd2f3a38ac3bd51e96fb140b6 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Sun, 10 Jan 2021 20:30:29 +0100 Subject: [PATCH] Set PD_ALL_VISIBLE and visibility map bits in COPY FREEZE Make sure COPY FREEZE marks the pages as PD_ALL_VISIBLE and updates the visibility map. Until now it only marked individual tuples as frozen, but page-level flags were not updated. This is a fairly old patch, and multiple people worked on it. The first version was written by Jeff Janes, and then reworked by Pavan Deolasee and Anastasia Lubennikova. Author: Pavan Deolasee, Anastasia Lubennikova, Jeff Janes Reviewed-by: Kuntal Ghosh, Jeff Janes, Tomas Vondra, Masahiko Sawada, Andres Freund, Ibrar Ahmed, Robert Haas, Tatsuro Ishii Discussion: https://postgr.es/m/caboikdn-ptgv0mzntrk2q8otfuuajqaymgmkdu1dckftuxv...@mail.gmail.com Discussion: https://postgr.es/m/CAMkU%3D1w3osJJ2FneELhhNRLxfZitDgp9FPHee08NT2FQFmz_pQ%40mail.gmail.com --- .../pg_visibility/expected/pg_visibility.out | 64 +++ contrib/pg_visibility/sql/pg_visibility.sql | 77 +++ src/backend/access/heap/heapam.c | 76 -- src/backend/access/heap/hio.c | 17 src/include/access/heapam_xlog.h | 3 + 5 files changed, 229 insertions(+), 8 deletions(-) diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out index ca4b6e186b..0017e3415c 100644 --- a/contrib/pg_visibility/expected/pg_visibility.out +++ b/contrib/pg_visibility/expected/pg_visibility.out @@ -179,6 +179,69 @@ select pg_truncate_visibility_map('test_partition'); (1 row) +-- test copy freeze +create table copyfreeze (a int, b char(1500)); +-- load all rows via COPY FREEZE and ensure that all pages are set all-visible +-- and all-frozen. 
+begin; +truncate copyfreeze; +copy copyfreeze from stdin freeze; +commit; +select * from pg_visibility_map('copyfreeze'); + blkno | all_visible | all_frozen +---+-+ + 0 | t | t + 1 | t | t + 2 | t | t +(3 rows) + +select * from pg_check_frozen('copyfreeze'); + t_ctid + +(0 rows) + +-- load half the rows via regular COPY and rest via COPY FREEZE. The pages +-- which are touched by regular COPY must not be set all-visible/all-frozen. On +-- the other hand, pages allocated by COPY FREEZE should be marked +-- all-frozen/all-visible. +begin; +truncate copyfreeze; +copy copyfreeze from stdin; +copy copyfreeze from stdin freeze; +commit; +select * from pg_visibility_map('copyfreeze'); + blkno | all_visible | all_frozen +---+-+ + 0 | f | f + 1 | f | f + 2 | t | t +(3 rows) + +select * from pg_check_frozen('copyfreeze'); + t_ctid + +(0 rows) + +-- Try a mix of regular COPY and COPY FREEZE. +begin; +trunc
Re: Commitfest 2020-11 is closed
On 02.12.2020 23:59, Tom Lane wrote: Anastasia Lubennikova writes: Commitfest 2020-11 is officially closed now. Many thanks to everyone who participated by posting patches, reviewing them, committing and sharing ideas in discussions! Thanks for all the hard work! Today, me and Georgios will move the remaining items to the next CF or return them with feedback. We're planning to leave Ready For Committer till the end of the week, to make them more visible and let them get the attention they deserve. This is actually a bit problematic, because now the cfbot is ignoring those patches (or if it's not, I don't know where it's displaying the results). Please go ahead and move the remaining open patches, or else re-open the CF if that's possible. regards, tom lane Oh, I wasn't aware of that. Thank you for the reminder. Now all patches are moved to the next CF. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: BUG #15383: Join Filter cost estimation problem in 10.5
On 30.10.2020 19:33, David G. Johnston wrote: On Fri, Oct 30, 2020 at 9:16 AM Anastasia Lubennikova <lubennikov...@gmail.com> wrote:

Status update for a commitfest entry. It looks like there was no real progress on this issue since April. I see only an experimental patch. What kind of review does it need right now? Do we need more testing or maybe production statistics for such queries? David, are you going to continue working on it? Can you please provide a short summary of the problem and list open questions for reviewers. The new status of this patch is: Waiting on Author

I agree that updating a wiki seems like an unappealing conclusion to this entry. I'm on the fence whether we just leave it in the cf and get a periodic nag when it gets bumped. As we are describing real problems or potential improvements to live portions of the codebase, ISTM that adding code comments near those locations would be warranted. The commit message can reference this thread. There seems to already be a convention for annotating code comments in such a way that they can be searched for - which is basically what sticking it in the wiki would accomplish, but the searcher would have to look in the codebase and not on a website. For the target audience of this specific patch that seems quite reasonable.

David J.

Here we are again. The commitfest is closed now and the discussion hasn't moved from where it was a month ago. I don't see any use in moving it to the next CF as is, and I don't want to return it either, as this is clearly a bug. I think we have a few options:

1. We can add it into TODO until better times.
2. We can write a patch to address this problem in code comments.
3. We can maybe CC this thread to someone and ask for help. Do you know people who have expertise in this area?

None of these options is perfect, but the second option will perhaps be a good compromise. David, can you please submit a patch?
-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Reduce the time required for a database recovery from archive.
On 09.11.2020 19:31, Stephen Frost wrote: Greetings, * Dmitry Shulga (d.shu...@postgrespro.ru) wrote: On 19 Oct 2020, at 23:25, Stephen Frost wrote:

process finishes a WAL file but then just sit around doing nothing while waiting for the applying process to finish another segment.

I believe that for a typical set-up the parameter max_restore_command_workers would have a value of 2 or 3, in order to supply a delivered WAL file just in time, before its processing starts. This use case is for environments where the time required for delivering a WAL file from the archive is greater than the time required for applying the records contained in the WAL file. If the time required for delivering a WAL file is less than the time required for handling the records contained in it, then max_restore_command_workers shouldn't be specified at all.

That's certainly not correct at all - the two aren't really all that related, because any time spent waiting for a WAL file to be delivered is time that the applying process *could* be working to apply WAL instead of waiting. At a minimum, I'd expect us to want to have, by default, at least one worker process running out in front of the applying process to hopefully eliminate most, if not all, time where the applying process is waiting for a WAL to show up. In cases where the applying process is faster than a single fetching process, a user might want to have two or more restore workers, though ultimately I still contend that what they really want is as many workers as needed to make sure that the applying process doesn't ever need to wait - up to some limit based on the amount of space that's available.

And back to the configuration side of this - have you considered the challenge that a user who is using very large WAL files might run into with the proposed approach that doesn't allow them to control the amount of space used? If I'm using 1G WAL files, then I need to have 16G available to have *any* pre-fetching done with this proposed approach, right?
That doesn't seem great. Thanks, Stephen Status update for a commitfest entry. The commitfest is closed now. As this entry has been Waiting on Author for a while, I've marked it as returned with feedback. Dmitry, feel free to resubmit an updated version to a future commitfest. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Corruption during WAL replay
On 06.11.2020 14:40, Masahiko Sawada wrote:

So I agree to proceed with the patch that adds a critical section, independent of fixing the other related things discussed in this thread. If Teja doesn't seem to be working on this, I'll write the patch.

Regards, -- Masahiko Sawada EnterpriseDB: https://www.enterprisedb.com/

Status update for a commitfest entry. The commitfest is closed now. As this entry is a bug fix, I am moving it to the next CF. Are you planning to continue working on it?

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Corner-case bug in pg_rewind
On 16.11.2020 05:49, Ian Lawrence Barwick wrote: Note that the patch may require reworking for HEAD due to changes in commit 9c4f5192f6. I'll try to take another look this week. Regards Ian Barwick -- EnterpriseDB: https://www.enterprisedb.com Status update for a commitfest entry. The patch is Waiting on Author for some time. As this is a bug fix, I am moving it to the next CF. Ian, are you planning to continue working on it? -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Unnecessary delay in streaming replication due to replay lag
On 20.11.2020 11:21, Michael Paquier wrote: On Tue, Sep 15, 2020 at 05:30:22PM +0800, lchch1...@sina.cn wrote: I read the code and test the patch, it run well on my side, and I have several issues on the patch. + RequestXLogStreaming(ThisTimeLineID, +startpoint, +PrimaryConnInfo, +PrimarySlotName, +wal_receiver_create_temp_slot); This patch thinks that it is fine to request streaming even if PrimaryConnInfo is not set, but that's not fine. Anyway, I don't quite understand what you are trying to achieve here. "startpoint" is used to request the beginning of streaming. It is roughly the consistency LSN + some alpha with some checks on WAL pages (those WAL page checks are not acceptable as they make maintenance harder). What about the case where consistency is reached but there are many segments still ahead that need to be replayed? Your patch would cause streaming to begin too early, and a manual copy of segments is not a rare thing as in some environments a bulk copy of segments can make the catchup of a standby faster than streaming. It seems to me that what you are looking for here is some kind of pre-processing before entering the redo loop to determine the LSN that could be reused for the fast streaming start, which should match the end of the WAL present locally. In short, you would need a XLogReaderState that begins a scan of WAL from the redo point until it cannot find anything more, and use the last LSN found as a base to begin requesting streaming. The question of timeline jumps can also be very tricky, but it could also be possible to not allow this option if a timeline jump happens while attempting to guess the end of WAL ahead of time. Another thing: could it be useful to have an extra mode to begin streaming without waiting for consistency to finish? -- Michael Status update for a commitfest entry. This entry was "Waiting On Author" during this CF, so I've marked it as returned with feedback. 
Feel free to resubmit an updated version to a future commitfest. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Strange behavior with polygon and NaN
On 25.11.2020 11:14, Kyotaro Horiguchi wrote: At Wed, 25 Nov 2020 11:39:39 +0900 (JST), Kyotaro Horiguchi wrote in

So that line of thought prompts me to tread *very* carefully when trying to dodge NaN results. We need to be certain that we introduce only logically-defensible special cases. Something like float8_coef_mul() seems much more likely to lead us into errors than away from them.

Agreed on that point. I'm going to rewrite the patch in that direction. Removed the function float8_coef_mul(). I noticed that the check you proposed to add to line_closept_point doesn't work for the following case:

select line('{1,-1,0}') <-> point(1e300, 'Infinity');

Ax + By + C = 1 * 1e300 + -1 * Inf + 0 = -Inf is not NaN, so we go on to the following steps:

derive the perpendicular line: => line {-1, -1, Inf}
derive the cross point:        => point (Inf, Inf)
calculate the distance:        => NaN (which should be Infinity)

So I left the check whether the distance is NaN in this version. In the previous version the check was done before directly calculating the distance, but since we already have the result of Ax+By+C, I decided not to use point_dt() in this version. Although I wrote that applying FPzero() to coefficients should be wrong, there are some places already doing that, so I followed those predecessors. Reverted the change of pg_hypot(). While checking the regression results, I noticed the following calculation, which seems wrong:

select line('{3,NaN,5}') = line('{3,NaN,5}');
 ?column?
----------
 t

But after looking at point_eq(), I decided to leave the behavior alone, since I'm not sure about the reason for the behavior of those functions. At least the comment for point_eq() says that this is the deliberate behavior. box_same and poly_same are based on point_eq_point, so they behave the same way. By the way, '=' doesn't compare the shape but compares the area. However, what is the area of a line? That should always be 0, even if we considered it.
And it is also strange that we don't have corresponding comparison operators ('<' and so on). It seems to me as if it is a mistake for '~='. If that is correct, I should revert the change of line_eq() along with fixing the operator assignment.

regards.

Status update for a commitfest entry. The commitfest is closed now and this entry is "Waiting on author". As far as I see, part of the fixes is already committed. Is there anything left to work on, or does this patch need review / is it ready for committer now?

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Terminate the idle sessions
On 25.11.2020 05:18, Li Japin wrote: On Nov 24, 2020, at 11:20 PM, David G. Johnston <david.g.johns...@gmail.com> wrote: On Mon, Nov 23, 2020 at 11:22 PM Li Japin <japi...@hotmail.com> wrote:

How about using "foreign-data wrapper" to replace "postgres_fdw"?

I don't see much value in avoiding mentioning that specific term - my proposal turned it into an example instead of being exclusive.

- This parameter should be set to zero if you use some connection-pooling software,
- or pg servers used by postgres_fdw, because connections might be closed unexpectedly.
+ This parameter should be set to zero if you use connection-pooling software,
+ or PostgreSQL servers connected to using foreign-data
+ wrapper, because connections might be closed unexpectedly.

Maybe:

+ or your PostgreSQL server receives connection from postgres_fdw or similar middleware.
+ Such software is expected to self-manage its connections.

Thank you for your suggestion and patience! Fixed.

```
+
+ This parameter should be set to zero if you use connection-pooling software,
+ or PostgreSQL servers connected to using postgres_fdw
+ or similar middleware (such software is expected to self-manage its connections),
+ because connections might be closed unexpectedly.
+
```

-- Best regards Japin Li

Status update for a commitfest entry. As far as I see, all recommendations from reviewers were addressed in the last version of the patch. It passes CFbot successfully, so I move it to Ready For Committer.

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
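As a usage sketch of the documented advice (assuming the GUC introduced by this patch is named idle_session_timeout, as in the committed feature; adjust if the final name differs):

```sql
-- Terminate sessions that stay idle for more than 10 minutes...
ALTER SYSTEM SET idle_session_timeout = '10min';

-- ...but disable the timeout entirely on servers that sit behind a
-- connection pooler or serve postgres_fdw, per the doc note above,
-- since such middleware self-manages its connections.
ALTER SYSTEM SET idle_session_timeout = 0;

SELECT pg_reload_conf();
```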
Commitfest 2020-11 is closed
Hi, hackers! Commitfest 2020-11 is officially closed now. Many thanks to everyone who participated by posting patches, reviewing them, committing and sharing ideas in discussions! Today, me and Georgios will move the remaining items to the next CF or return them with feedback. We're planning to leave Ready For Committer till the end of the week, to make them more visible and let them get the attention they deserve. And finally in the weekend I'll gather and share some statistics. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: enable_incremental_sort changes query behavior
On 01.12.2020 03:08, James Coleman wrote: On Tue, Nov 3, 2020 at 4:39 PM Tomas Vondra wrote: I've pushed the 0001 part, i.e. the main fix. Not sure about the other parts (comments and moving the code back to postgres_fdw) yet. I noticed the CF entry [1] got moved to the next CF; I'm thinking this entry should be marked as committed since the fix for the initial bug reported on this thread has been pushed. We have the parallel safety issue outstanding, but there's a separate thread and patch for that, so I'll make a new CF entry for that. I can mark it as committed, but I'm not sure how to "undo" (or if that's desirable) the move to the next CF. Thoughts? James 1: https://commitfest.postgresql.org/30/2754/ Oops... I must have rushed with this one, thank you for noticing. I don't see how to move it back either. I think it's fine to mark it as Committed where it is now. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [patch] [doc] Introduce view updating options more succinctly
The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature:      not tested
Spec compliant:          not tested
Documentation:           tested, passed

I wonder why this patch didn't get a review during the CF. This minor improvement looks good to me, so I mark it Ready for Committer.

The new status of this patch is: Ready for Committer
Re: [DOC] Document concurrent index builds waiting on each other
Status update for a commitfest entry. The commitfest is nearing the end and I wonder what this discussion is waiting for. It looks like the proposed patch received its fair share of review, so I mark it as Ready for Committer and lay responsibility for the final decision on them. The new status of this patch is: Ready for Committer
Re: Change JOIN tutorial to focus more on explicit joins
Status update for a commitfest entry. The commitfest is nearing the end and this thread was inactive for a while. As far as I see, something got committed, and now the discussion is stuck arguing about parentheses. FWIW, I think it is a matter of personal taste. Maybe we can compromise on simply leaving this part unchanged. If you are planning to continue working on it, please move it to the next CF.
Re: [PATCH] remove deprecated v8.2 containment operators
On 16.11.2020 23:55, Justin Pryzby wrote: On Fri, Nov 13, 2020 at 10:03:43AM -0500, Stephen Frost wrote: * Magnus Hagander (mag...@hagander.net) wrote: On Thu, Nov 12, 2020 at 11:28 PM Tom Lane wrote: The changes to the contrib modules appear to be incomplete in some ways. In cube, hstore, and seg, there are no changes to the extension scripts to remove the operators. All you're doing is changing the C code to no longer recognize the strategy, but that doesn't explain what will happen if the operator is still used. In intarray, by contrast, you're editing an existing extension script, but that should be done by an upgrade script instead. In the contrib modules, I'm afraid what you gotta do is remove the SQL operator definitions but leave the opclass code support in place. That's because there's no guarantee that users will update the extension's SQL version immediately, so a v14 build of the .so might still be used with the old SQL definitions. It's not clear how much window we need give for people to do that update, but I don't think "zero" is an acceptable answer. Based on my experience from the field, the answer is "never". As in, most people have no idea they are even *supposed* to do such an upgrade, so they don't do it. Until we solve that problem, I think we're basically stuck with keeping them "forever". (and even if/when we do, "zero" is probably not going to cut it, no) Yeah, this is a serious problem and one that we should figure out a way to fix or at least improve on- maybe by having pg_upgrade say something about extensions that could/should be upgraded..? I think what's needed are: 1) a way to *warn* users about deprecation. CREATE EXTENSION might give an elog(WARNING), but it's probably not enough. It only happens once, and if it's in pg_restore/pg_upgrade, it be wrapped by vendor upgrade scripts. It needs to be more like first function call in every session. Or more likely, put it in documentation for 10 years. 
2) a way to *enforce* it, like making CREATE EXTENSION fail when run against an excessively old server, including by pg_restore/pg_upgrade (which ought to also handle it in --check).

Are there any contrib modules for which (1) is done, and are we anywhere near doing (2)?

Status update for a commitfest entry.

The commitfest is nearing the end and this thread is "Waiting on Author". As far as I can see, we don't have a patch here and the discussion is a bit stuck. So, I am planning to return it with feedback. Any objections?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Dumping/restoring fails on inherited generated column
On 09.11.2020 13:43, Peter Eisentraut wrote:
On 2020-11-06 04:55, Masahiko Sawada wrote:

Both of these result in the same change to the dump output. Both of them have essentially the same idea. The first one adds the conditionals during the information gathering phase of pg_dump, the second one adds the conditionals during the output phase. Any further thoughts?

I think the first one is better than the second (mine) because it can reduce the number of intermediate objects.

I was hoping to wrap this issue up this week, but I found more problems with how these proposed changes interact with --binary-upgrade mode. I think I need to formalize my findings into pg_dump test cases as a next step. Then we can figure out what combination of tweaks will make them all work.

I am moving this patch to the next CF, but it looks like the discussion is a bit stuck. Peter, can you please share your concerns about the interaction of the patch with --binary-upgrade mode? If you don't have time to write tests, you can just describe the problems.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY
On 04.11.2020 02:56, Alvaro Herrera wrote:

Here's an updated version of this patch. Apart from rebasing to current master, I made the following changes:

* On the first transaction (the one that marks the partition as detached), the partition is locked with ShareLock rather than ShareUpdateExclusiveLock. This means that DML is not allowed anymore. This seems a reasonable restriction to me; surely by the time you're detaching the partition you're not inserting data into it anymore.

* In ExecInitPartitionDispatchInfo, the previous version always excluded detached partitions. Now it does include them in isolation level repeatable read and higher. Considering the point above, this sounds a bit contradictory: you shouldn't be inserting new tuples in partitions being detached. But if you do, it makes more sense: in RR two queries that insert tuples in the same partition would not fail mid-transaction. (In a read-committed transaction, the second query does fail, but to me that does not sound surprising.)

* ALTER TABLE .. DETACH PARTITION FINALIZE now waits for concurrent old snapshots, as previously discussed. This should ensure that the user doesn't just cancel the wait during the second transaction by Ctrl-C and run FINALIZE immediately afterwards, which I claimed would bring inconsistency.

* Avoid creating the CHECK constraint if an identical one already exists. (Note I do not remove the constraint on ATTACH. That seems pointless.)

Still to do: test this using the new hook added by 6f0b632f083b.

Status update for a commitfest entry.

The commitfest is nearing the end and this thread is "Waiting on Author". As far as I can see, the last message contains a patch. Is there anything left to work on, or does it need review now? Alvaro, are you planning to continue working on it?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Asymmetric partition-wise JOIN
On 09.11.2020 13:53, Anastasia Lubennikova wrote:
On 21.08.2020 09:02, Andrey V. Lepikhov wrote:
On 7/1/20 2:10 PM, Daniel Gustafsson wrote:
On 27 Dec 2019, at 08:34, Kohei KaiGai wrote:

The attached v2 fixed the problem, and the regression test finished correctly.

This patch no longer applies to HEAD, please submit a rebased version. Marking the entry Waiting on Author in the meantime.

Here is a rebased version of the patch on current master (d259afa736). I rebased it because it is the base of my experimental feature: we don't break the partition-wise join of a relation with a foreign partition and a local relation if we have info that the remote server has a foreign table linked to the local relation (by analogy with shippable extensions). Maybe mark it as 'Needs review'?

Status update for a commitfest entry.

According to cfbot, the patch fails to apply. Could you please send a rebased version? This thread was inactive for quite some time. Is anyone going to continue working on it? I see some interest in the idea of a sharable hash, but I don't see even a prototype in this thread. So, probably, it is a matter for a separate discussion.

Also, I took a look at the code. It looks like it needs some extra work. I am not a big expert in this area, so I'm sorry if the questions are obvious.

1. What would happen if this assumption is not met?

+ * MEMO: We assume this pathlist keeps at least one AppendPath that
+ * represents partitioned table-scan, symmetric or asymmetric
+ * partition-wise join. It is not correct right now, however, a hook
+ * on add_path() to give additional decision for path removal allows
+ * to retain this kind of AppendPath, regardless of its cost.

2. Why do we wrap the extract_asymmetric_partitionwise_subjoin() call in PG_TRY/PG_CATCH? What errors do we expect?

3. It looks like a crutch. If it isn't, I'd like to see a better comment about why "dynamic programming" is not applicable here. And shouldn't we also handle root->join_cur_level?
+ /* temporarily disables the "dynamic programming" algorithm */
+ root->join_rel_level = NULL;

4. This change looks like it can lead to a memory leak for old code. Maybe it is never the case, but again I think it is worth a comment.

- /* If there's nothing to adjust, don't call this function. */
- Assert(nappinfos >= 1 && appinfos != NULL);
+ /* If there's nothing to adjust, just return a duplication */
+ if (nappinfos == 0)
+     return copyObject(node);

5. extract_asymmetric_partitionwise_subjoin() lacks a comment.

The new status of this patch is: Waiting on Author

Status update for a commitfest entry.

This entry was inactive during this CF, so I've marked it as returned with feedback. Feel free to resubmit an updated version to a future commitfest.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [PATCH] Runtime control of CLOBBER_CACHE_ALWAYS
On 27.10.2020 11:34, Peter Eisentraut wrote:
On 2020-09-25 09:40, Craig Ringer wrote:

While working on extensions I've often wanted to enable cache clobbering for a targeted piece of code, without paying the price of running it for all backends during postgres startup and any initial setup tasks that are required. So here's a patch that, when CLOBBER_CACHE_ALWAYS or CLOBBER_CACHE_RECURSIVE are defined, adds a GUC named clobber_cache_depth. It defaults to 1 for CLOBBER_CACHE_ALWAYS or 3 for CLOBBER_CACHE_RECURSIVE to match the existing compiled-in behaviour. But with this change it's now possible to run Pg with clobber_cache_depth=0, then set it to 1 only for targeted tests. clobber_cache_depth is treated as an unknown GUC if Pg was not built with CLOBBER_CACHE_ALWAYS or CLOBBER_CACHE_RECURSIVE defined.

I think it would be great if the cache clobber code were always compiled in (but turned off) when building with assertions. That would reduce the number of hoops to jump through to actually use this locally, and reduce the number of surprises from the build farm. For backward compatibility, you can arrange it so that the built-in default of clobber_cache_depth is 1 or 3 if CLOBBER_CACHE_ALWAYS or CLOBBER_CACHE_RECURSIVE are defined.

Status update for a commitfest entry.

This thread was inactive during the commitfest, so I am going to mark it as "Returned with Feedback". Craig, are you planning to continue working on it? The proposed idea is great, and it looks like the patch needs only a minor improvement.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: pgbench - test whether a variable exists
On 30.11.2020 14:53, Fabien COELHO wrote:

The new status of this patch is: Waiting on Author

This patch was inactive during the commitfest, so I am going to mark it as "Returned with Feedback". Fabien, are you planning to continue working on it?

Not in the short term, but probably for the next CF. Can you park it there?

Sure, I'll move it to the next CF then. I also noticed that the first message mentions the idea of refactoring to reuse some code in both pgbench and psql. Can you please share a link to that thread, if it exists?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Improving psql slash usage help message
On 02.11.2020 18:02, Anastasia Lubennikova wrote:

Status update for a commitfest entry.

This thread was inactive for a while. Is anyone going to continue working on it?

My two cents on the topic: I don't see it as a big problem in the first place. In the source code, \dE refers to foreign tables and \de refers to foreign servers. So it seems more reasonable to me to rename the latter.

\dE[S+] [PATTERN]      list foreign tables
\det[+] [PATTERN]      list foreign servers

I also do not support merging the entries for different commands. I think that the current help output is easier to read.

The new status of this patch is: Waiting on Author

Status update for a commitfest entry.

This entry was inactive during this CF, so I've marked it as returned with feedback. Feel free to resubmit an updated version to a future commitfest.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: proposal: function pg_setting_value_split() to parse shared_preload_libraries etc.
On 23.10.2020 05:06, Ian Lawrence Barwick wrote:

Having just submitted this, I realised I'm focussing on the GUCs which call "SplitDirectoriesString()" (as my specific use case is for "shared_preload_libraries"), but the patch does not consider the other GUC_LIST_INPUT settings, which call "SplitIdentifierString()", so, as is, it might produce unexpected results for those. I'll rethink and submit an updated version.

Status update for a commitfest entry.

This entry was "Waiting on author" during this CF. As I see it, the patch needs more work before review, so I changed it to "Withdrawn". Feel free to resubmit an updated version to a future commitfest.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: archive status ".ready" files may be created too early
On 14.10.2020 03:06, Kyotaro Horiguchi wrote:

The patch includes a fix for the primary->standby case. But I'm not sure we can do that in the cascaded case. A standby is not aware of the structure of a WAL blob and has no idea of up to where to send the received blobs. However, if we can rely on the behavior of CopyData that we always receive a blob as a whole, sent from the sender at once, the cascaded standbys are free from the trouble (as long as the cascaded standby doesn't crash just before writing the last half of a record into pg_wal and after archiving the last full segment, which seems unlikely).

regards.

Status update for a commitfest entry.

This entry was "Waiting on author" during this CF. As I see, the latest message contains a new version of the patch. Does it need more work? Are you going to continue working on it?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: pgbench - test whether a variable exists
On 20.10.2020 15:22, Ibrar Ahmed wrote: The following review has been posted through the commitfest application: make installcheck-world: not tested Implements feature: not tested Spec compliant: tested, failed Documentation:not tested I am not very convinced to have that, but I have performed some testing on master (). I found a crash [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `pgbench postgres -T 60 -f a.sql'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x55bfb572a4b2 in expr_yylex (yylval_param=0x7fff29a60870, yyscanner=0x55bfb6867a90) at exprscan.c:1107 1107*yy_cp = yyg->yy_hold_char; (gdb) bt #0 0x55bfb572a4b2 in expr_yylex (yylval_param=0x7fff29a60870, yyscanner=0x55bfb6867a90) at exprscan.c:1107 #1 0x55bfb572cb63 in expr_lex_one_word (state=0x55bfb6867a10, word_buf=0x7fff29a608d0, offset=0x7fff29a608a8) at exprscan.l:340 #2 0x55bfb57358fd in process_backslash_command (sstate=0x55bfb6867a10, source=0x55bfb68653a0 "a.sql") at pgbench.c:4540 #3 0x55bfb5736528 in ParseScript ( script=0x55bfb68655f0 "\\set nbranches :scale\n\\set ntellers 10 * :scale\n\\set naccounts 10 * :scale\n\\setrandom aid 1 :naccounts\n\\setrandom bid 1 :nbranches\n\\setrandom tid 1 :ntellers\n\\setrandom delta -5000 5000\nBEGIN;\nUPD"..., desc=0x55bfb68653a0 "a.sql", weight=1) at pgbench.c:4826 #4 0x55bfb5736967 in process_file (filename=0x55bfb68653a0 "a.sql", weight=1) at pgbench.c:4962 #5 0x55bfb5738628 in main (argc=6, argv=0x7fff29a610d8) at pgbench.c:5641 The new status of this patch is: Waiting on Author This patch was inactive during the commitfest, so I am going to mark it as "Returned with Feedback". Fabien, are you planning to continue working on it? -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Extending range type operators to cope with elements
On 31.10.2020 01:08, Tomas Vondra wrote:

Hi,

On Fri, Oct 30, 2020 at 04:01:27PM +, Georgios Kokolatos wrote:

Hi, thank you for your contribution. I did notice that the cfbot [1] is failing for this patch. Please try to address the issues if you can for the upcoming commitfest.

I took a look at the patch today - the regression failure was trivial: the expected output for one query was added to the wrong place, a couple of lines off the proper place. Attached is an updated version of the patch, fixing that. I also reviewed the code - it seems pretty clean and in line with the surrounding code in rangetypes.c. Good job, Esteban! I'll do a bit more review next week, and I'll see if I can get it committed.

regards

CFM reminder. Just in case you forgot about this thread. The commitfest is nearing the end. Tomas, will you have time to push this patch? The patch still applies and passes all cfbot checks. I also took a quick look at the code and everything looks good to me.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Removing unneeded self joins
On 31.10.2020 12:26, Andrey V. Lepikhov wrote:

Thank you for this partial review, I included your changes:

On 9/23/20 9:23 AM, David Rowley wrote:
On Fri, 3 Apr 2020 at 17:43, Andrey Lepikhov wrote:

Doing things the way I describe will allow you to get rid of all the UniqueRelInfo stuff.

Thanks for the review and sorry for the late reply. I fixed the small mistakes mentioned in your letter. Also, I rewrote this patch at your suggestion [1]. Because of the many changes, this patch can be viewed as a sketch.

To change the self-join detection algorithm I used your delta patch from [2]. I added complex expression handling to the split_selfjoin_quals routine for demonstration. But it is not very useful with the current infrastructure, I think.

I also implemented one additional way for the self-join detection algorithm: if the join target list doesn't contain vars from the inner relation, then we can detect a self-join with only quals like a1.x=a2.y if the innerrel_is_unique check is true. Analysis of the target list is contained in the new routine tlist_contains_rel_exprs - a rewritten version of the build_joinrel_tlist routine.

Also, the changes to the join_is_removable() routine are removed from the patch; I couldn't understand why they were needed here.

Note, this patch causes a change in one join.sql regression test output. It is not a bug, but it may be fixed. Applied over commit 4a071afbd0.

[1] https://www.postgresql.org/message-id/CAKJS1f8p-KiEujr12k-oa52JNWWaQUjEjNg%2Bo1MGZk4mHBn_Rg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAKJS1f8cJOCGyoxi7a_LG7eu%2BWKF9%2BHTff3wp1KKS5gcUg2Qfg%40mail.gmail.com

Status update for a commitfest entry.

This entry was "Waiting on author" during this CF. As I see, the latest message contains a new version of the patch. Does it need more work? Are you going to continue working on it?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [proposal] de-TOAST'ing using a iterator
On 02.11.2020 22:08, John Naylor wrote: On Mon, Nov 2, 2020 at 1:30 PM Alvaro Herrera <mailto:alvhe...@alvh.no-ip.org>> wrote: On 2020-Nov-02, Anastasia Lubennikova wrote: > Status update for a commitfest entry. > > This entry was inactive for a very long time. > John, are you going to continue working on this? Not in the near future. For background, this was a 2019 GSoC project where I was reviewer of record, and the patch is mostly good, but there is some architectural awkwardness. I have tried to address that, but have not had success. The commitfest is nearing the end and as this thread has stalled, I've marked it Returned with Feedback. Feel free to open a new entry if you return to this patch. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: fixing old_snapshot_threshold's time->xid mapping
On 06.10.2020 08:32, Thomas Munro wrote:
On Fri, Sep 25, 2020 at 2:00 PM Michael Paquier wrote:
On Thu, Sep 24, 2020 at 03:46:14PM -0400, Robert Haas wrote:

Thomas, with respect to your part of this patch set, I wonder if we can make the functions that you're using to write tests safe enough that we could add them to contrib/old_snapshot and let users run them if they want. As you have them, they are hedged around with vague and scary warnings, but is that really justified? And if so, can it be fixed? It would be nicer not to end up with two loadable modules here, and maybe the right sorts of functions could even have some practical use.

Yeah, you may be right. I am thinking about that. In the meantime, here is a rebase. A quick recap of these remaining patches: 0001 replaces the current "magic test mode" that didn't really test anything with a new test mode that verifies pruning and STO behaviour. 0002 fixes a separate bug that Andres reported: the STO XID map suffers from wraparound-itis. 0003 adds a simple smoke test for Robert's commit 55b7e2f4. Before that fix landed, it failed.

I have switched this item to waiting on author in the CF app then, as we are not completely done yet.

Thanks. For the record, I think there is still one more complaint from Andres that remains unaddressed even once these are in the tree: there are thought to be some more places that lack TestForOldSnapshot() calls.

Status update for a commitfest entry.

This entry is "Waiting on author" and the thread was inactive for a while. As far as I can see, some of the fixes are already committed. Is there anything left to work on, or does this patch set need review now?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: cleanup temporary files after crash
On 01.11.2020 04:25, Michael Paquier wrote:
On Sat, Oct 31, 2020 at 09:01:15PM -0300, Euler Taveira wrote:

I thought about not providing a GUC at all, or providing it in the developer section. I've never heard someone say that they use those temporary files to investigate an issue. Regarding a crash, all information is already available and temporary files don't provide extra details. This new GUC is just to keep the previous behavior. I'm fine without the GUC, though.

The original behavior is as old as 4a5f38c4, and the last time we talked about that there were arguments for still keeping the existing behavior of not cleaning up files during a restart-after-crash scenario, for the sake of being useful just "in case". I have never used that property myself, TBH, and I have seen many more cases of users caring about the data folder not facing an ENOSPC, particularly if they don't use different partitions for pg_wal/ and the main data folder.
--
Michael

Thank you, Euler, for submitting this. +1 for the feature. One of the platforms we support uses temp files a lot and we faced this problem, though we never actually used these orphan files for debugging purposes.

I also think that the GUC is not needed here. This 'feature' was internal from the very beginning, so users shouldn't care about preserving the old behavior. Without the GUC the patch is very simple; please see the attached version. I also omit the test, because I am not sure it will be stable given that RemovePgTempFiles() allows the possibility of failure.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

commit 417581b287f4060178595cd96465adc639639290
Author: anastasia
Date: Thu Nov 26 11:45:05 2020 +0300

Remove temporary files after a backend crash

Successive crashes could result in useless storage usage until the service is restarted. This could be the case on a host with inadequate resources. For some environments this manual intervention is not desirable.
The only reason we kept temp files was that someone might want to examine them for debugging purposes. With this patch we favor service continuity over debugging.

Author: Euler Taveira
Reviewer: Anastasia Lubennikova

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b7799ed1d24..a151250734f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -3999,6 +3999,9 @@ PostmasterStateMachine(void)
 			ereport(LOG,
 					(errmsg("all server processes terminated; reinitializing")));
 
+			/* remove leftover temporary files after a crash */
+			RemovePgTempFiles();
+
 			/* allow background workers to immediately restart */
 			ResetBackgroundWorkerCrashTimes();
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 05abcf72d68..d30bfc28541 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -2992,15 +2992,15 @@ CleanupTempFiles(bool isCommit, bool isProcExit)
  * remove any leftover files created by OpenTemporaryFile and any leftover
  * temporary relation files created by mdcreate.
  *
- * NOTE: we could, but don't, call this during a post-backend-crash restart
- * cycle.  The argument for not doing it is that someone might want to examine
- * the temp files for debugging purposes.  This does however mean that
- * OpenTemporaryFile had better allow for collision with an existing temp
- * file name.
+ * This is also called during a post-backend-crash restart cycle.  The
+ * argument for doing it is that crashes of queries using temp files can
+ * result in useless storage usage that can only be reclaimed by a service
+ * restart.
  *
  * NOTE: this function and its subroutines generally report syscall failures
  * with ereport(LOG) and keep going.  Removing temp files is not so critical
  * that we should fail to start the database when we can't do it.
+ * Because this routine is not strict, OpenTemporaryFile allows for collision
+ * with an existing temp file name.
  */
 void
 RemovePgTempFiles(void)
Re: PoC: custom signal handler for extensions
On 03.09.2020 14:25, Pavel Borisov wrote:

Is there a chance you will restart working on this patch? I noticed that despite interest in the matter, the discussion in this CF thread stopped quite a while ago. So I'd like to push here a patch revised according to the previous discussion. Actually, the patch has been used as part of the pg_query_state extension across several major PostgreSQL releases. So I'd like to raise the matter of reviewing this updated patch and including it in version 14, as I see much use for this change. I welcome all your considerations.

I took a look at the patch. It looks fine, and I see that it contains fixes for the latest questions in the thread.

I think we should provide a test module for this feature that will serve as both a test and an example of its use. This is a feature for extension developers, so I don't know where we should document it. At the very least we can improve the comments. For example, describe the fact that custom signals are handled after receiving SIGUSR1.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Improper use about DatumGetInt32
On 02.11.2020 18:59, Peter Eisentraut wrote:

I have committed 0003. For 0001, normal_rand(), I think you should reject negative arguments with an error.

I've updated 0001. The change is trivial, see attached.

For 0002, I think you should change the block number arguments to int8, the same as other contrib modules do.

Agreed. It will need a bit more work, though - probably a new version of the pageinspect contrib module, as the public API will change. Ashutosh, are you going to continue working on it?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

commit b42df63c757bf5ff715052a401aba37691a7e2b8
Author: anastasia
Date: Wed Nov 25 17:19:33 2020 +0300

Handle negative number of tuples passed to normal_rand()

The function converts the first argument, i.e. the number of tuples to return, into an unsigned integer, which turns out to be a huge number when a negative value is passed. This causes the function to take a much longer time to execute. Instead, throw an error about the invalid parameter value. While at it, improve the SQL test to check the number of tuples returned by this function.
Authors: Ashutosh Bapat, Anastasia Lubennikova

diff --git a/contrib/tablefunc/expected/tablefunc.out b/contrib/tablefunc/expected/tablefunc.out
index fffadc6e1b4..97bfd690841 100644
--- a/contrib/tablefunc/expected/tablefunc.out
+++ b/contrib/tablefunc/expected/tablefunc.out
@@ -3,10 +3,20 @@ CREATE EXTENSION tablefunc;
 -- normal_rand()
 -- no easy way to do this for regression testing
 --
-SELECT avg(normal_rand)::int FROM normal_rand(100, 250, 0.2);
- avg 
------
- 250
+SELECT avg(normal_rand)::int, count(*) FROM normal_rand(100, 250, 0.2);
+ avg | count 
+-----+-------
+ 250 |   100
+(1 row)
+
+-- negative number of tuples
+SELECT avg(normal_rand)::int, count(*) FROM normal_rand(-1, 250, 0.2);
+ERROR:  invalid number of tuples: -1
+-- zero number of tuples
+SELECT avg(normal_rand)::int, count(*) FROM normal_rand(0, 250, 0.2);
+ avg | count 
+-----+-------
+     |     0
 (1 row)
 
 --
diff --git a/contrib/tablefunc/sql/tablefunc.sql b/contrib/tablefunc/sql/tablefunc.sql
index ec375b05c63..5341c005f91 100644
--- a/contrib/tablefunc/sql/tablefunc.sql
+++ b/contrib/tablefunc/sql/tablefunc.sql
@@ -4,7 +4,11 @@ CREATE EXTENSION tablefunc;
 -- normal_rand()
 -- no easy way to do this for regression testing
 --
-SELECT avg(normal_rand)::int FROM normal_rand(100, 250, 0.2);
+SELECT avg(normal_rand)::int, count(*) FROM normal_rand(100, 250, 0.2);
+-- negative number of tuples
+SELECT avg(normal_rand)::int, count(*) FROM normal_rand(-1, 250, 0.2);
+-- zero number of tuples
+SELECT avg(normal_rand)::int, count(*) FROM normal_rand(0, 250, 0.2);
 
 --
 -- crosstab()
diff --git a/contrib/tablefunc/tablefunc.c b/contrib/tablefunc/tablefunc.c
index 02f02eab574..c6d32d46f01 100644
--- a/contrib/tablefunc/tablefunc.c
+++ b/contrib/tablefunc/tablefunc.c
@@ -174,6 +174,7 @@ normal_rand(PG_FUNCTION_ARGS)
 	FuncCallContext *funcctx;
 	uint64		call_cntr;
 	uint64		max_calls;
+	int32		num_tuples;
 	normal_rand_fctx *fctx;
 	float8		mean;
 	float8		stddev;
@@ -193,7 +194,14 @@ normal_rand(PG_FUNCTION_ARGS)
 		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
 
 		/* total number of tuples to be returned */
-		funcctx->max_calls = PG_GETARG_UINT32(0);
+		num_tuples = PG_GETARG_INT32(0);
+
+		if (num_tuples < 0)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("invalid number of tuples: %d", num_tuples)));
+
+		funcctx->max_calls = num_tuples;
 
 		/* allocate memory for user context */
 		fctx = (normal_rand_fctx *) palloc(sizeof(normal_rand_fctx));
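As a side note, the conversion bug fixed above can be reproduced outside the server. The sketch below is only a standalone illustration of the signed-to-unsigned pitfall; the function names are made up, and the sentinel return stands in for the patch's ereport(ERROR) call:

```c
#include <stdint.h>

/* Sketch of the bug: reading a negative argument through an unsigned
 * getter (as PG_GETARG_UINT32 did) silently turns -1 into a huge count. */
static uint64_t
buggy_tuple_count(int32_t arg)
{
    return (uint32_t) arg;      /* -1 becomes 4294967295 */
}

/* Sketch of the fix: read the value as signed and validate it first.
 * The real patch raises an error instead of returning a sentinel. */
static int64_t
checked_tuple_count(int32_t arg)
{
    if (arg < 0)
        return -1;              /* stand-in for ereport(ERROR, ...) */
    return arg;
}
```

With the check in place, the huge bogus iteration count can no longer reach the set-returning function's loop.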
Ready For Committer patches (CF 2020-11)
Hi, with this message, I want to draw attention to the RFC patches on the current commitfest. It would be good if committers could take a look at them. While doing a sweep through the CF, I have kicked a couple of entries back to Waiting on author, so now the list is correct. Now we have 17 entries. https://commitfest.postgresql.org/30/?status=3 The oldest RFC patches are listed below. These entries are quite large and complex. Still they got a good amount of reviews and it looks like after a long discussion they are in a good shape. Generic type subscripting <https://commitfest.postgresql.org/30/1062/> schema variables, LET command <https://commitfest.postgresql.org/30/1608/> pgbench - add pseudo-random permutation function <https://commitfest.postgresql.org/30/2138/> Allow REINDEX, CLUSTER and VACUUM FULL to rebuild on new TABLESPACE/INDEX_TABLESPACE <https://commitfest.postgresql.org/30/2269/> range_agg / multiranges <https://commitfest.postgresql.org/30/2112/> -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Libpq support to connect to standby server as priority
On 30.09.2020 10:57, Greg Nancarrow wrote: Thanks for your thoughts, patches and all the pointers. I'll be looking at all of them. (And yes, the comma instead of bitwise OR is of course an error, somehow made and gone unnoticed; the next field in the struct is an enum, so accepts any int value). Regards, Greg Nancarrow Fujitsu Australia CFM reminder. Hi, this entry is "Waiting on Author" and the thread was inactive for a while. As far as I see, the patch needs some further work. Are you going to continue working on it, or should I mark it as "returned with feedback" until a better time? -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: pgbench and timestamps (bounced)
On 11.09.2020 16:59, Fabien COELHO wrote:

Hello Tom,

It requires a mutex around the commands; I tried to do a Windows implementation, which may or may not work.

Ugh, I'd really rather not do that. Even disregarding the effects of a mutex, though, my initial idea for fixing this has a big problem: if we postpone PREPARE of the query until first execution, then it's happening during timed execution of the benchmark scenario and thus distorting the timing figures. (Maybe if we'd always done it like that, it'd be okay, but I'm quite against changing the behavior now that it's stood for a long time.)

Hmmm. Prepare is done *once* per client; ISTM that the impact on any statistically significant benchmark is nil in practice, or it would mean that the benchmark settings are too low. Second, the mutex is only used when absolutely necessary, only for the substitution part of the query (replacing :stuff by ?), because scripts are shared between threads. This happens just once, in an unlikely case occurring at the beginning.

However, perhaps there's more than one way to fix this. Once we've scanned all of the script and seen all the \set commands, we know (in principle) the set of all variable names that are in use. So maybe we could fix this by:

(1) During the initial scan of the script, make variable-table entries for every \set argument, with the values shown as undefined for the moment. Do not try to parse SQL commands in this scan, just collect them.

The issue with this approach is SELECT 1 AS one \gset pref_ which will generate a "pref_one" variable, and these names cannot be guessed without SQL parsing and possibly execution. That is why the preparation is delayed to when the variables are actually known.

(2) Make another scan in which we identify variable references in the SQL commands and issue PREPAREs (if enabled).

(3) Perform the timed run.

This avoids any impact of this bug fix on the semantics or timing of the benchmark proper.
I'm not sure offhand whether this approach makes any difference for the concerns you had about identifying/suppressing variable references inside quotes. I do not think this plan is workable, because of the \gset issue. I do not see that the conditional mutex and delayed PREPARE would have any significant (measurable) impact on an actual (reasonable) benchmark run. A workable solution would be that each client actually executes each script once before starting the actual benchmark. It would still need a mutex and also a sync barrier (which I'm proposing in some other thread). However this may raise some other issues because then some operations would be triggered outside of the benchmarking run, which may or may not be desirable. So I'm not too keen to go that way, and I think the proposed solution is reasonable from a benchmarking point of view as the impact is minimal, although not zero. CFM reminder. Hi, this entry is "Waiting on Author" and the thread was inactive for a while. I see this discussion still has some open questions. Are you going to continue working on it, or should I mark it as "returned with feedback" until a better time? -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: LogwrtResult contended spinlock
On 04.09.2020 20:13, Andres Freund wrote: Hi, On 2020-09-04 10:05:45 -0700, Andres Freund wrote: On 2020-09-03 14:34:52 -0400, Alvaro Herrera wrote: Looking at patterns like this if (XLogCtl->LogwrtRqst.Write < EndPos) XLogCtl->LogwrtRqst.Write = EndPos; It seems possible to implement with do { XLogRecPtr currwrite; currwrite = pg_atomic_read_u64(LogwrtRqst.Write); if (currwrite > EndPos) break; // already done by somebody else if (pg_atomic_compare_exchange_u64(LogwrtRqst.Write, currwrite, EndPos)) break; // successfully updated } while (true); This assumes that LogwrtRqst.Write never goes backwards, so it doesn't seem good material for a general routine. This *seems* correct to me, though this is muddy territory to me. Also, are there better ways to go about this? Hm, I was thinking that we'd first go for reading it without a spinlock, but continuing to write it as we currently do. But yea, I can't see an issue with what you propose here. I personally find do {} while () weird and avoid it when not explicitly useful, but that's extremely minor, obviously. Re general routine: On second thought, it might actually be worth having it. Even just for LSNs - there's plenty of places where it's useful to ensure a variable is at least a certain size. I think I would be in favor of a general helper function. Do you mean by general helper function something like this? void swap_lsn(pg_atomic_uint64 *value, XLogRecPtr new_value, bool to_largest) { while (true) { XLogRecPtr currwrite = pg_atomic_read_u64(value); if (to_largest) { if (currwrite >= new_value) break; /* already done by somebody else */ } else { if (currwrite <= new_value) break; /* already done by somebody else */ } if (pg_atomic_compare_exchange_u64(value, &currwrite, new_value)) break; /* successfully updated */ } } which would be called like swap_lsn(&XLogCtl->LogwrtRqst.Write, EndPos, true); Greetings, Andres Freund This CF entry was inactive for a while.
Alvaro, are you going to continue working on it? -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: psql: add \si, \sm, \st and \sr functions to show CREATE commands for indexes, matviews, triggers and tables
On 18.08.2020 17:25, Tom Lane wrote: a.pervush...@postgrespro.ru writes: [ si_st_sm_sr_v2.patch ] I hadn't particularly noticed this thread before, but I happened to look through this patch, and I've got to say that this proposed feature seems like an absolute disaster from a maintenance standpoint. There will be no value in an \st command that is only 90% accurate; the produced DDL has to be 100% correct. This means that, if we accept this feature, psql will have to know everything pg_dump knows about how to construct the DDL describing tables, indexes, views, etc. That is a lot of code, and it's messy, and it changes nontrivially on a very regular basis. I can't accept that we want another copy in psql --- especially one that looks nothing like what pg_dump has. There've been repeated discussions about somehow extracting pg_dump's knowledge into a library that would also be available to other client programs (see e.g. the concurrent thread at [1]). That's quite a tall order, which is why it's not happened yet. But I think we really need to have something like that before we can accept this feature for psql. BTW, as an example of why this is far more difficult than it might seem at first glance, this patch doesn't even begin to meet the expectation stated at the top of describe.c: * Support for the various \d ("describe") commands. Note that the current * expectation is that all functions in this file will succeed when working * with servers of versions 7.4 and up. It's okay to omit irrelevant * information for an old server, but not to fail outright. It might be okay for this to cut off at 8.0 or so, as I think pg_dump does, but not to just fail on older servers. Another angle, which I'm not even sure how we want to think about it, is security. It will not do for "\et" to allow some attacker to replace function calls appearing in the table's CHECK constraints, for instance. So this means you've got to be very aware of CVE-2018-1058-style attacks. 
Our answer to that for pg_dump has partially depended on restricting the search_path used at both dump and restore time ... but I don't think \et gets to override the search path that the psql user is using. I'm not sure what that means in practice but it certainly requires some thought before we add the feature, not after. Anyway, I can see the attraction of having psql commands like these, but "write a bunch of new code that we'll have to maintain" does not seem like a desirable way to get them. regards, tom lane [1] https://www.postgresql.org/message-id/flat/9df8a3d3-13d2-116d-26ab-6a273c1ed38c%402ndquadrant.com Since there has been no activity on this thread since before the CF and no response from the author I have marked this "returned with feedback". Alexandra, feel free to resubmit it to the next commitfest, when you have time to address the issues raised in the review. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: New default role- 'pg_read_all_data'
On 29.10.2020 17:19, Stephen Frost wrote: Greetings, * Georgios Kokolatos (gkokola...@protonmail.com) wrote: this patch is in "Ready for committer" state and the Cfbot is happy. Glad that's still the case. :) Is there any committer that is available for taking a look at it? If there aren't any objections or further comments, I'll take another look through it and will commit it during the upcoming CF. Thanks! Stephen CFM reminder. Just in case you forgot about this thread. :) The commitfest is heading toward the end, and there was plenty of time for anyone to object. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: deferred primary key and logical replication
On 27.10.2020 13:46, Amit Kapila wrote: On Sun, Oct 25, 2020 at 9:39 PM Euler Taveira wrote: On Mon, 5 Oct 2020 at 08:34, Amit Kapila wrote: On Mon, May 11, 2020 at 2:41 AM Euler Taveira wrote: Hi, While looking at an old wal2json issue, I stumbled on a scenario that a table with a deferred primary key is not updatable in logical replication. AFAICS it has been like that since the beginning of logical decoding and seems to be an oversight while designing logical decoding. I am not sure if it is an oversight because we document that the index must be non-deferrable, see "USING INDEX records the old values of the columns covered by the named index, which must be unique, not partial, not deferrable, and include only columns marked NOT NULL." in docs [1]. Inspecting this patch again, I forgot to consider that RelationGetIndexList() is called by other backend modules. Since logical decoding deals with finished transactions, it is ok to use a deferrable primary key. But starting PG-14, we do support logical decoding of in-progress transactions as well. Commitfest entry status update. As far as I see, this patch needs some further work, so I move it to "Waiting on author". Euler, are you going to continue working on it? -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Online verification of checksums
On 23.11.2020 18:35, Stephen Frost wrote: Greetings, * Anastasia Lubennikova (a.lubennik...@postgrespro.ru) wrote: On 21.11.2020 04:30, Michael Paquier wrote: The only method I can think of as being really reliable is based on two facts: - Do a check only on pd_checksums, as that validates the full contents of the page. - When doing a retry, make sure that there is no concurrent I/O activity in the shared buffers. This requires an API we don't have yet. It seems reasonable to me to rely on checksums only. As for retry, I think that an API for concurrent I/O will be complicated. Instead, we can introduce a function to read the page directly from shared buffers after PAGE_RETRY_THRESHOLD attempts. It looks like a bullet-proof solution to me. Do you see any possible problems with it? We might end up reading pages back in that have been evicted, for one thing, which doesn't seem great, TBH, I think it is highly unlikely that the page that was just updated will be evicted. and this also seems likely to be awkward for cases which aren't using the replication protocol, unless every process maintains a connection to PG the entire time, which also doesn't seem great. Have I missed something? Now pg_basebackup has only one process + one child process for streaming. Anyway, I totally agree with your argument. The need to maintain connection(s) to PG is the most unpleasant part of the proposed approach. Also- what is the point of reading the page from shared buffers anyway..? Well... Reading a page from shared buffers is a reliable way to get a correct page from postgres under any concurrent load. So it just seems natural to me. All we need to do is prove that the page will be rewritten during WAL replay. Yes, and this is the tricky part. Until you explained it in your latest message, I wasn't sure how we could distinguish a concurrent update from page header corruption.
Now I agree that if the page LSN has been updated and increased between rereads, it is safe enough to conclude that we have some concurrent load. If we can prove that, we don't actually care what the contents of the page are. We certainly can't calculate the checksum on a page we plucked out of shared buffers since we only calculate the checksum when we go to write the page out. Good point. I was thinking that we could recalculate the checksum. Or even save the page without it, as we have checked the LSN and know for sure that it will be rewritten by WAL replay. To sum up, I agree with your proposal to reread the page and rely on ascending LSNs. Can you submit a patch? You can write it on top of the latest attachment in this thread: v8-master-0001-Fix-page-verifications-in-base-backups.patch <https://www.postgresql.org/message-id/attachment/115403/v8-master-0001-Fix-page-verifications-in-base-backups.patch> from this message https://www.postgresql.org/message-id/20201030023028.gc1...@paquier.xyz -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: DROP relation IF EXISTS Docs and Tests - Bug Fix
On 30.09.2020 05:00, David G. Johnston wrote: On Tue, Sep 15, 2020 at 3:48 PM Alexander Korotkov <aekorot...@gmail.com> wrote: Hi! I've skimmed through the thread and checked the patchset. Everything looks good, except one paragraph, which doesn't look completely clear. + + This emulates the functionality provided by + but sets the created object's + type definition + to domain. + As I get it, it states that CREATE DOMAIN somehow "emulates" CREATE TYPE. Could you please rephrase it? It still looks confusing to me. v5 attached, looking at this fresh and with some comments to consider. I ended up just combining both patches into one. I did away with the glossary changes altogether, and the invention of the new term. I ended up limiting "type's type" to just domain usage but did a couple of additional tweaks that tried to treat domains as not being actual types even though, at least in PostgreSQL, they are (at least as far as DROP TYPE is concerned - and since I don't have any understanding of the SQL Standard's decision to separate out create domain and create type I'll just stick to the implementation in front of me.) David J. Reminder from a CF manager, as this thread was inactive for a while. Alexander, I see you signed up as a committer for this entry. Are you going to continue this work? -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Online verification of checksums
On 21.11.2020 04:30, Michael Paquier wrote: The only method I can think as being really reliable is based on two facts: - Do a check only on pd_checksums, as that validates the full contents of the page. - When doing a retry, make sure that there is no concurrent I/O activity in the shared buffers. This requires an API we don't have yet. It seems reasonable to me to rely on checksums only. As for retry, I think that API for concurrent I/O will be complicated. Instead, we can introduce a function to read the page directly from shared buffers after PAGE_RETRY_THRESHOLD attempts. It looks like a bullet-proof solution to me. Do you see any possible problems with it? -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Use of "long" in incremental sort code
On 05.11.2020 02:53, James Coleman wrote: On Tue, Nov 3, 2020 at 4:42 PM Tomas Vondra wrote: On Tue, Nov 03, 2020 at 03:53:53AM +0100, Tomas Vondra wrote: Hi, I took another look at this, and 99% of the patch (the fixes to sort debug messages) seems fine to me. Attached is the part I plan to get committed, including commit message etc. I've pushed this part. Thanks for the patch, Haiying Tang. The one change I decided to remove is this change in tuplesort_free: - long spaceUsed; + int64 spaceUsed; The reason why I think this variable should be 'long' is that we're using it for this: spaceUsed = LogicalTapeSetBlocks(state->tapeset); and LogicalTapeSetBlocks is defined like this: extern long LogicalTapeSetBlocks(LogicalTapeSet *lts); FWIW the "long" is not introduced by incremental sort - it used to be in tuplesort_end, the incremental sort patch just moved it to a different function. It's a bit confusing that tuplesort_updatemax has this: int64 spaceUsed; But I'd argue this is actually wrong, and should be "long" instead. (And this actually comes from the incremental sort patch, by me.) FWIW, while looking at what the other places calling LogicalTapeSetBlocks do, I noticed this: uint64 disk_used = LogicalTapeSetBlocks(...); in the disk-based hashagg patch. So that's a third data type ... IMHO this should simply switch the current int64 variable to long, as it was before. Not sure about the hashagg uint64 variable. Is there anything that actually limits tape code to using at most 4GB on 32-bit systems? At first glance, I haven't found anything that could limit tape code. It uses BufFile, which is not limited by the OS file size limit. Still, if we want to change 'long' in LogicalTapeSetBlocks, we should probably also update nBlocksWritten and other variables. As far as I see, the major part of the patch was committed, so I update the status of the CF entry to "Committed".
Feel free to create a new entry, if you're going to continue working on the remaining issue. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Commitfest 2020-11
Now that we are halfway through the commitfest, the status breakdown looks like this: Needs review: 116 Waiting on Author: 45 Ready for Committer: 22 Committed: 51 Returned with Feedback: 1 Withdrawn: 8 Rejected: 1 Total: 244 which means we have reached closure on a quarter of the patches. And many discussions have significantly moved forward. Keep it up, and don't wait for the commitfest deadline. :) Most inactive discussions and patches which didn't apply have been notified on their respective threads. If there is no response by the last days of the CF, they will be considered stalled and returned with feedback. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Skip ExecCheckRTPerms in CTAS with no data
On 29.09.2020 14:39, Bharath Rupireddy wrote: On Mon, Sep 28, 2020 at 7:48 PM Tom Lane wrote: Bharath Rupireddy writes: In case of CTAS with no data, we actually do not insert the tuples into the created table, so we can skip checking for the insert permissions. Anyways, the insert permissions will be checked when the tuples are inserted into the table. I'd argue this is wrong. You don't get to skip permissions checks in ordinary queries just because, say, there's a LIMIT 0 on the query. Right, when there's a select with limit 0 clause, we do check for the select permissions. But my point is, we don't check insert permissions(or select or update etc.) when we create a plain table using CREATE TABLE test_tbl(a1 INT). Of course, we do check create permissions. Going by the similar point, shouldn't we also check only create permission(which is already being done as part of DefineRelation) and skip the insert permission(the change this patch does) for the new table being created as part of CREATE TABLE test_tbl AS SELECT * FROM test_tbl2? However select permission will be checked for test_tbl2. The insert permissions will be checked anyways before inserting rows into the table created in CTAS. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com I see Tom's objection above. Still, I tend to agree that if 'WITH NO DATA' was specified explicitly, CREATE AS should behave more like a utility statement rather than a regular query. So I think that this patch can be useful in some use-cases and I definitely don't see any harm it could cause. Even the comment in the current code suggests that it is an option. I took a look at the patch. It is quite simple, so no comments about the code. It would be good to add a test to select_into.sql to show that it only applies to 'WITH NO DATA' and that subsequent insertions will fail if permissions are not set. 
Maybe we should also mention it in the documentation, but I haven't found any specific paragraph about permissions on CTAS. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Prevent printing "next step instructions" in initdb and pg_upgrade
On 02.11.2020 16:23, Magnus Hagander wrote: On Tue, Oct 27, 2020 at 11:35:25AM +0100, Peter Eisentraut wrote: On 2020-10-06 12:26, Magnus Hagander wrote: I went with the name --no-instructions to have the same name for both initdb and pg_upgrade. The downside is that "no-instructions" also causes the scripts not to be written in pg_upgrade, which arguably is a different thing. We could go with "--no-instructions" and "--no-scripts", but that would leave the parameters different. I also considered "--no-next-step", but that one didn't quite have the right ring to me. I'm happy for other suggestions on the parameter names. What scripts are left after we remove the analyze script, as discussed in a different thread? There is still delete_old_cluster.sh. I wonder if we should just not do it and just say "you should delete the old cluster". Then we can leave it up to platform integrations to enhance that, based on their platform knowledge (such as for example knowing if the platform runs systemd or not). That would leave us with both pg_upgrade and initdb printing instructions, and not scripts, and solve that part of the issue. Do we only care about .sh scripts? There are also reindex_hash.sql and pg_largeobject.sql in src/bin/pg_upgrade/version.c with instructions. How should we handle them? -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Asymmetric partition-wise JOIN
On 21.08.2020 09:02, Andrey V. Lepikhov wrote: On 7/1/20 2:10 PM, Daniel Gustafsson wrote: On 27 Dec 2019, at 08:34, Kohei KaiGai wrote: The attached v2 fixed the problem, and the regression test finished correctly. This patch no longer applies to HEAD, please submit a rebased version. Marking the entry Waiting on Author in the meantime. Rebased version of the patch on current master (d259afa736). I rebased it because it is the base of my experimental feature, which ensures that we don't break a partitionwise join of a relation with a foreign partition and a local relation if we have info that the remote server has a foreign table linked to the local relation (by analogy with shippable extensions). Maybe mark as 'Needs review'? Status update for a commitfest entry. According to cfbot, the patch fails to apply. Could you please send a rebased version? This thread was inactive for quite some time. Is anyone going to continue working on it? I see some interest in the idea of a sharable hash, but I don't see even a prototype in this thread. So, probably, it is a matter for a separate discussion. Also, I took a look at the code. It looks like it needs some extra work. I am not a big expert in this area, so I'm sorry if the questions are obvious. 1. What would happen if this assumption is not met? + * MEMO: We assume this pathlist keeps at least one AppendPath that + * represents partitioned table-scan, symmetric or asymmetric + * partition-wise join. It is not correct right now, however, a hook + * on add_path() to give additional decision for path removel allows + * to retain this kind of AppendPath, regardless of its cost. 2. Why do we wrap the extract_asymmetric_partitionwise_subjoin() call in PG_TRY/PG_CATCH? What errors do we expect? 3. It looks like a crutch. If it isn't, I'd like to see a better comment about why "dynamic programming" is not applicable here. And shouldn't we also handle a root->join_cur_level? + /* temporary disables "dynamic programming" algorithm */ + root->join_rel_level = NULL; 4.
This change looks like it can lead to a memory leak in old code. Maybe it is never the case, but again I think it is worth a comment. - /* If there's nothing to adjust, don't call this function. */ - Assert(nappinfos >= 1 && appinfos != NULL); + /* If there's nothing to adjust, just return a duplication */ + if (nappinfos == 0) + return copyObject(node); 5. extract_asymmetric_partitionwise_subjoin() lacks a comment The new status of this patch is: Waiting on Author -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Status update for a commitfest entry. This thread was inactive for a while and from the latest messages, I see that the patch needs some further work. So I move it to "Waiting on Author". The new status of this patch is: Waiting on Author
Re: A problem about partitionwise join
Status update for a commitfest entry. According to CFbot this patch fails to apply. Richard, can you send an update, please? Also, I see that the thread was inactive for a while. Are you going to continue this work? I think it would be helpful if you could write a short recap of the current state of the patch and list the open questions for reviewers. The new status of this patch is: Waiting on Author
Re: bitmaps and correlation
Status update for a commitfest entry. According to cfbot, the patch fails to apply. Could you please send a rebased version? I wonder why this patch has been hanging for so long without a review. Maybe it would help to move the discussion forward if you provided more examples of queries that can benefit from this improvement? The first patch is simply a refactoring, and I don't see any possible objections against it. The second patch also looks fine to me. The logic is understandable and the code is neat. It wouldn't hurt to add a comment for this computation, though. + pages_fetched = pages_fetchedMAX + indexCorrelation*indexCorrelation*(pages_fetchedMIN - pages_fetchedMAX); The new status of this patch is: Waiting on Author
Re: WIP: BRIN multi-range indexes
Status update for a commitfest entry. According to cfbot the patch no longer compiles. Tomas, can you send an update, please? I also see that a few last messages mention a data corruption bug. Sounds pretty serious. Alvaro, have you had a chance to look at it? I don't see anything committed yet, nor any active discussion in other threads.
Re: RE: Delaying/avoiding BTreeTupleGetNAtts() call within _bt_compare()
Status update for a commitfest entry. This thread was inactive for a while. The latest review suggests that it is Ready For Committer. I also took a quick look at the patch and agree that it looks sensible. Maybe add a comment before the _bt_compare_inl() to explain the need for this code change. The new status of this patch is: Ready for Committer
Re: [proposal] de-TOAST'ing using a iterator
Status update for a commitfest entry. This entry was inactive for a very long time. John, are you going to continue working on this? The last message mentions some open issues, namely backend crashes, so I move it to "Waiting on author". The new status of this patch is: Waiting on Author
Re: Feedback on table expansion hook (including patch)
Status update for a commitfest entry. This patch implements useful improvement and the reviewer approved the code. It lacks a test, but looking at previously committed hooks, I think it is not mandatory. So, I move it to RFC. The new status of this patch is: Ready for Committer
Re: Improving psql slash usage help message
Status update for a commitfest entry. This thread was inactive for a while. Is anyone going to continue working on it? My two cents on the topic: I don't see it as a big problem in the first place. In the source code, \dE refers to foreign tables and \de refers to foreign servers. So it seems more reasonable to me to rename the latter. \dE[S+] [PATTERN] list foreign tables \det[+] [PATTERN] list foreign servers I also do not support merging the entries for different commands. I think that the current help output is easier to read. The new status of this patch is: Waiting on Author
Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
On 30.10.2020 03:42, Tomas Vondra wrote: Hi, I might be somewhat late to the party, but I've done a bit of benchmarking too ;-) I used TPC-H data from a 100GB test, and tried different combinations of COPY [FREEZE] and VACUUM [FREEZE], both on current master and with the patch. So I don't really observe any measurable slowdowns in the COPY part (in fact there seems to be a tiny speedup, but it might be just noise). In the VACUUM part, there's clear speedup when the data was already frozen by COPY (Yes, those are zeroes, because it took less than 1 second.) So that looks pretty awesome, I guess. For the record, these tests were run on a server with NVMe SSD, so hopefully reliable and representative numbers. Thank you for the review. A couple minor comments about the code: 2) I find the "if (all_frozen_set)" block a bit strange. It's a matter of personal preference, but I'd just use a single level of nesting, i.e. something like this: /* if everything frozen, the whole page has to be visible */ Assert(!(all_frozen_set && !PageIsAllVisible(page))); /* * If we've frozen everything on the page, and if we're already * holding pin on the vmbuffer, record that in the visibilitymap. * If we're not holding the pin, it's OK to skip this - it's just * an optimization. * * It's fine to use InvalidTransactionId here - this is only used * when HEAP_INSERT_FROZEN is specified, which intentionally * violates visibility rules. */ if (all_frozen_set && visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer)) visibilitymap_set(...); IMHO it's easier to read, but YMMV. I've also reworded the comment a bit to say what we're doing etc. And I've moved the comment from inside the if block into the main comment - that was making it harder to read too. I agree that it's a matter of taste. I've updated comments and left nesting unchanged to keep assertions simple. 3) I see RelationGetBufferForTuple does this: /* * This is for COPY FREEZE needs. 
If page is empty, * pin vmbuffer to set all_frozen bit */ if ((options & HEAP_INSERT_FROZEN) && (PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0)) visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer); so is it actually possible to get into the (all_frozen_set) without holding a pin on the visibilitymap? I haven't investigated this so maybe there are other ways to get into that code block. But the new additions to hio.c get the pin too. I was thinking that GetVisibilityMapPins() can somehow unset the pin. I gave it a second look. And now I don't think it's possible to get into this code block without a pin. So, I converted this check into an assertion. 4) In heap_xlog_multi_insert we now have this: if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED) PageClearAllVisible(page); if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) PageSetAllVisible(page); IIUC it makes no sense to have both flags at the same time, right? So what about adding Assert(!(XLH_INSERT_ALL_FROZEN_SET && XLH_INSERT_ALL_VISIBLE_CLEARED)); to check that? Agree. I placed this assertion to the very beginning of the function. It also helped to simplify the code a bit. I also noticed, that we were not updating visibility map for all_frozen from heap_xlog_multi_insert. Fixed. I wonder what to do about the heap_insert - I know there are concerns it would negatively impact regular insert, but is it really true? I suppose this optimization would be valuable even for cases where multi-insert is not possible. Do we have something like INSERT .. FREEZE? I only see TABLE_INSERT_FROZEN set for COPY FREEZE and for matview operations. Can you explain, what use-case are we trying to optimize by extending this patch to heap_insert()? The new version is attached. I've also fixed a typo in the comment by Tatsuo Ishii suggestion. Also, I tested this patch with replication and found no issues. 
-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company commit 395716e277ac4be22b9b61311c301a69db0f9101 Author: anastasia Date: Mon Nov 2 16:27:48 2020 +0300 Teach COPY FREEZE to set PD_ALL_VISIBLE and visibility map bits. diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out index ca4b6e186b..0017e3415c 100644 --- a/contrib/pg_visibility/expected/pg_visibility.out +++ b/contrib/pg_visibility/expected/pg_visibility.out @@ -179,6 +179,69 @@ select pg_truncate_visibility_map('test_partition'); (1 row) +-- test copy freeze +create table copyfreeze (a int, b char(1500)); +-- load all rows via COPY FREEZE and ensure that all pages are set all-visible +-- and all-frozen. +begin; +truncate copyfreeze; +copy copyfreeze from stdin freeze; +commit; +select * from pg_visi
Re: Commitfest 2020-11
Hi everyone, the November commitfest is now in progress! Georgios Kokolatos and I are happy to volunteer to manage it. During this CF, I want to pay more attention to long-standing issues. The current state of the Commitfest is: Needs review: 154 Waiting on Author: 32 Ready for Committer: 20 Committed: 32 Withdrawn: 5 Rejected: 1 Total: 244 We have quite a few ReadyForCommitter patches of different size and complexity. Please, if you have submitted patches in this CF, make sure that you are also reviewing patches of a similar number and complexity. The CF cannot move forward without patch review. Also, check the state of your patch at http://cfbot.cputube.org/ Happy hacking! -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Corruption during WAL replay
Status update for a commitfest entry. I see quite a few unanswered questions in the thread since the last patch version was sent. So, I move it to "Waiting on Author". The new status of this patch is: Waiting on Author
Re: BUG #15383: Join Filter cost estimation problem in 10.5
Status update for a commitfest entry. It looks like there has been no real progress on this issue since April; I see only an experimental patch. What kind of review does it need right now? Do we need more testing, or maybe production statistics for such queries? David, are you going to continue working on it? Could you please provide a short summary of the problem and list the open questions for reviewers? The new status of this patch is: Waiting on Author
Re: Implementing Incremental View Maintenance
Wed, 28 Oct 2020 at 08:02, Yugo NAGATA:
> Hi Anastasia Lubennikova,
>
> I am writing this to you because I would like to ask the commitfest
> manager something.
>
> The status of the patch was changed to "Waiting on Author" from
> "Ready for Committer" at the beginning of this month, because a rebase
> was necessary. Now I have updated the patch, so can I change
> the status back to "Ready for Committer"?
>
> Regards,
> Yugo Nagata

Yes, go ahead. As far as I can see, the patch is in good shape and there are no unanswered questions from reviewers. Feel free to change the status of CF entries when it seems reasonable to you.

P.S. Please avoid top-posting. It makes it harder to follow the discussion; in-line replies are customary on pgsql mailing lists. See https://en.wikipedia.org/wiki/Posting_style#Top-posting for details. -- Best regards, Lubennikova Anastasia
Re: [patch] Fix checksum verification in base backups for zero page headers
On 26.10.2020 04:13, Michael Paquier wrote: On Fri, Oct 23, 2020 at 08:00:08AM +0900, Michael Paquier wrote: Yeah, we could try to make the logic a bit more complicated like that.

However, for any code path relying on a page read without any locking insurance, we cannot really have a lot of trust in any of the fields assigned to the page, as this could just be random corruption garbage, and the only thing I am ready to trust here is a checksum mismatch check, because that's the only field on the page that's linked to its full contents on the 8k page. This also keeps the code simpler.

A small update here. I have extracted the refactored part for PageIsVerified() and committed it, as that's independently useful. This makes the patch proposed here simpler on HEAD, leading to the attached. -- Michael

Thank you for committing the first part. In case you need a second opinion on the remaining patch, it still looks good to me. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: SEARCH and CYCLE clauses
On 10.10.2020 08:25, Pavel Stehule wrote: Hi

On Fri, 9 Oct 2020 at 12:17, Pavel Stehule <pavel.steh...@gmail.com> wrote:

On Fri, 9 Oct 2020 at 11:40, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:

On 2020-09-22 20:29, Pavel Stehule wrote:
> The result is correct. When I tried to use UNION instead of UNION ALL,
> PostgreSQL crashed

I fixed the crash, but UNION [DISTINCT] won't actually work here because row/record types are not hashable. I'm leaving the partial support in, but I'm documenting it as currently not supported.

I think UNION is a common solution against cycles, so missing support for this specific case is not a nice thing. How much work is needed for hashing rows? It should not be too much code.

> looks like the USING clause in cycle detection is unsupported by DB2 and
> Oracle - the examples from these databases don't work on PG without
> modifications

Yeah, the path clause is actually not necessary from a user's perspective, but it's required for internal bookkeeping. We could perhaps come up with a mechanism to make it invisible coming out of the CTE (maybe give the CTE a target list internally), but that seems like a separate project. The attached patch fixes the issues you have reported (also the view issue from the other email). I have also moved the whole rewrite support to a new file to not blow up rewriteHandler.c so much. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

This patch is based on the transformation of the CYCLE and SEARCH clauses into specific expressions - it is in agreement with ANSI SQL.
There is no problem with compilation.
Nobody had objections in the discussion.
There are enough regression tests and documentation.
check-world passed.
The doc build passed.

I'll mark this patch as ready for committer.

Possible enhancements for this feature (can be done in next steps):
1. support UNION DISTINCT
2.
better compatibility with Oracle and DB2 (the USING clause could be optional)

Regards
Pavel

Status update for a commitfest entry. According to cfbot, the patch no longer applies, so I moved it to Waiting on Author. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
Status update for a commitfest entry. This patch is Ready for Committer. It applies and passes the CI. There are no unanswered questions in the discussion. The discussion started in 2015 with a patch by Jeff Janes. Later it was revived by Pavan Deolasee, then picked up by Ibrar Ahmed, and finally it was rewritten by me, so I moved myself from reviewers to authors as well. The latest version was reviewed and tested by Ibrar Ahmed. The patch doesn't affect COPY FREEZE performance and significantly decreases the time of the following VACUUM.
Commitfest 2020-11
Hello, hackers! The November commitfest will start in just a few days. I'm happy to volunteer to be the CFM for this one, with the help of Georgios Kokolatos [1]. It's time to register your patch in the commitfest, if you haven't yet. If you already have a patch in the commitfest, update its status and make sure it still applies and that the tests pass. Check the state at http://cfbot.cputube.org/ If there is a long-running stale discussion, please send a short summary update about its current state, open questions, and TODOs. I hope it will encourage reviewers to pay more attention to the thread. [1] https://www.postgresql.org/message-id/AxH0n_zLwwJ0MBN3uJpHfYDkV364diOGhtpLAv0OC0qHLN8ClyPsbRi1fSUAJLJZzObZE_y1qc-jqGravjIMoxVrdtLm74HmTUeIPWWkmSg%3D%40pm.me -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Commitfest manager 2020-11
On 20.10.2020 10:30, gkokola...@pm.me wrote: ‐‐‐ Original Message ‐‐‐ [snip] Wow, that was well in advance) I am willing to assist if you need any help.

Indeed it was a bit early. I left for vacation after that. For the record, I am newly active in the community. In our PUG, in Stockholm, we held a meetup during which a contributor presented ways to contribute to the community, one of which is becoming a CFM. So, I thought of picking up the recommendation. I have taken little part in CFs as reviewer/author and I have no idea how a CF is actually run. A contributor from Stockholm has been willing to mentor me for the part. Since you have both the knowledge and specific ideas on improving the CF, how about me assisting you? I could shadow you and you can groom me for the part, so that I can take the lead in a future CF more effectively. This is just a suggestion of course. I am happy with anything that can help the community as a whole.

Even though I've worked a lot with the community, I have never been a CFM before either. So, I think we can just follow these articles:

https://www.2ndquadrant.com/en/blog/managing-a-postgresql-commitfest/
https://wiki.postgresql.org/wiki/CommitFest_Checklist

Some parts are a bit outdated, but in general the checklist is clear. I've already requested CFM privileges in pgsql-www, and I'm going to spend next week sending pings and updates to the patches in the commitfest. There are already 219 patches, so I would appreciate it if you join me in this task. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [patch] Fix checksum verification in base backups for zero page headers
On 22.10.2020 04:25, Michael Paquier wrote: On Thu, Oct 22, 2020 at 12:47:03AM +0300, Anastasia Lubennikova wrote: We can also read such pages via shared buffers to be 100% sure.

Yeah, but this has its limits as well. One can use ignore_checksum_failure, but this can actually be very dangerous, as you can finish by loading into shared buffers a page that has a header thought to be sane but with a large portion of the page randomly corrupted, spreading corruption around and leading to more fancy logic failures in Postgres, with more panic from customers. Not using ignore_checksum_failure is safer, but it makes an analysis of the damage for a given file harder, as things would stop at the first failure of a file with a seqscan. pg_prewarm can help here, but that's not the goal of the tool either.

I was thinking about applying this only to pages with LSN > startLSN. Most such pages are valid and already in memory, because they were changed just recently, so there is no need for pg_prewarm here. If such an LSN appeared because of data corruption, page verification inside ReadBuffer() will report an error first. In the proposed function, we can handle this error in any fashion we want. Something like:

    if (PageGetLSN(page) > startptr)
    {
        if (!read_page_via_buffercache())
        {
            /* throw a warning about the corrupted page */
            /* handle the checksum error as needed */
        }
        else
        {
            /* page is valid, no worries */
        }
    }

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Allow some recovery parameters to be changed with reload
On 10.08.2020 23:20, Robert Haas wrote: On Sun, Aug 9, 2020 at 1:21 AM Michael Paquier wrote: Sorry for the late reply. I have been looking at that stuff again, and restore_command can be called in the context of a WAL sender process within the page_read callback of logical decoding via XLogReadDetermineTimeline(), as readTimeLineHistory() could look for a timeline history file. So restore_command is not used only in the startup process.

Hmm, interesting. But does that make this change wrong, apart from the comments? Like, in the case of primary_conninfo, maybe some confusion could result if the startup process decided whether to ask for a WAL receiver based on thinking primary_conninfo being set, while that process thought that it wasn't actually set after all, as previously discussed in http://postgr.es/m/ca+tgmozvmjx1+qtww2tsnphrnkwkzxc3zsrynfb-fpzm1ox...@mail.gmail.com ... but what's the corresponding hazard here, exactly? It doesn't seem that there's any way in which the decision one process makes affects the decision the other process makes. There's still a race condition: it's possible for a walsender (did you mean walreceiver here?) to use the old restore_command after the startup process had already used the new one, or the other way around. However, it doesn't seem like that should confuse anything inside the server, and therefore I'm not sure we need to code around it.

I came up with the following scenario. Let's say we have xlog files 1, 2, 3 in dir1 and files 4, 5 in dir2. If the startup process had only handled files 1 and 2 before we switched restore_command from reading dir1 to reading dir2, it will fail to find the next file. IIUC, it will then assume that recovery is done and start the server and the walreceiver. The walreceiver will fail as well. I don't know how realistic this case is, though.

In general, this feature looks useful and consistent with previous changes, so I am interested in pushing it forward.
Sergey, could you please attach this thread to the upcoming CF, if you're going to continue working on it? A few more questions:

- RestoreArchivedFile() is also used by pg_rewind. I don't see any particular problem with it; I just want to note that we should test it too.

- How will it interact with possible future optimizations of archive restore? For example, WAL prefetch [1].

[1] https://www.postgresql.org/message-id/flat/601ee1f5-0b78-47e1-9aae-c15f74a1c...@postgrespro.ru -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [patch] Fix checksum verification in base backups for zero page headers
On 20.10.2020 09:24, Michael Paquier wrote: On Fri, Oct 16, 2020 at 06:02:52PM +0300, Anastasia Lubennikova wrote: In the current patch, PageIsVerified called from basebackup.c simply doesn't... Should we change this too? I don't see any difference.

I considered that, but now that does not seem worth bothering here. Done. Thanks for the updated patch. I have played with dd and some fake files to check the validity of the zero-page case (as long as pd_lsn does not go wild). Here are some comments, with an updated patch attached:

- Added a PageIsVerifiedExtended to make this patch back-patchable, with the original routine keeping its original shape.

Thank you. I always forget about this. Do we have any checklist for such changes that patch authors and reviewers can use?

- I did not like much to show the WARNING from PageIsVerified() for the report, and we'd lose some context related to the base backups. The important part is to know which blocks from which files are found as problematic.

- Switched the checks to have PageIsVerified() placed first in the hierarchy, so as we do all the basic validity checks before a look at the LSN. This does not really change things logically.

- The meaning of block_retry is also the opposite of what it should be in the original code? IMO, the flag should be set to true if we are still fine to retry a block, and set to false once the one-time retry has failed.

Agreed.

- The error strings are not really consistent with the project style in this area. These are usually not spawned across multiple lines to ease searches with grep or such.

Same question as above: I don't see this info about formatting in the error message style guide in the documentation. Is it mentioned somewhere else?

Anastasia, Michael B, does that look OK to you?

The final patch looks good to me. NB: This patch fixes only one problem, the zero-page case, as it was the intention of Michael B to split this part into its own thread.
We still have, of course, a second problem when it comes to a random LSN put into the page header which could mask an actual checksum failure so this only takes care of half the issues. Having a correct LSN approximation is a tricky problem IMO, but we could improve things by having some delta with an extra WAL page on top of GetInsertRecPtr(). And this function cannot be used for base backups taken from standbys. We can also read such pages via shared buffers to be 100% sure. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [PATCH] Add extra statistics to explain for Nested Loop
On 16.10.2020 12:07, Julien Rouhaud wrote: On Fri, 16 Oct 2020 at 16:12, Pavel Stehule <pavel.steh...@gmail.com> wrote: On Fri, 16 Oct 2020 at 9:43, <e.sokol...@postgrespro.ru> wrote:

Hi, hackers. For some distributions of data in tables, different loops in nested loop joins can take different amounts of time and process different numbers of entries. This makes the average statistics returned by EXPLAIN ANALYZE not very useful for a DBA. To fix it, here is a patch that adds printing of min and max statistics for time and rows across all loops of a Nested Loop to EXPLAIN ANALYZE. Please don't hesitate to share any thoughts on this topic!

+1 This is a great feature - the current limited format can sometimes be pretty messy.

+1, this can be very handy!

Cool. I have added your patch to the commitfest, so it won't get lost. https://commitfest.postgresql.org/30/2765/ I will review the code next week. Unfortunately, I cannot give any feedback about the usability of this feature. The user-visible change is:

- -> Nested Loop (actual rows=N loops=N)
+ -> Nested Loop (actual min_rows=0 rows=0 max_rows=0 loops=2)

Pavel, Julien, could you please say whether this looks good? -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Commitfest manager 2020-11
On 16.10.2020 21:57, Tom Lane wrote: Anastasia Lubennikova writes: On the other hand, I noticed a lot of stale threads that weren't updated in months. Some of them seem to pass several CFs without any activity at all. I believe that this is wrong for many reasons, the major of which IMHO is the frustration of the authors. Can we come up with something to improve this situation?

Yeah, that's a perennial problem. Part of the issue is just a shortage of people --- there are always more patches than we can review and commit in one month. IMO, another cause is that we have a hard time saying "no". If a particular patch isn't too well liked, we tend to just let it slide to the next CF rather than making the uncomfortable decision to reject it. If you've got thoughts about that, or any other ways to improve the process, for sure speak up.

From a CFM perspective, we can try the following things:

- Write recaps for long-running threads, listing open questions and TODOs. This one is my personal pain point. Some threads do look scary, and it is less likely that someone will even start a review if they have to catch up with a year-long discussion between 10 people.

- Mark patches from first-time contributors with some tag. These entries are probably simple enough to be handled faster, and it would also be a good reminder to be a bit less demanding with beginners. See Dmitry's statistics about how many people have sent a patch only once [1].

- Proactively ask committers if they are going to work on the upcoming CF and whether they will need any specific help. Maybe we can also ask about their preferred code areas and check what is left uncovered. It's really bad if there is no one working with, let's say, WAL internals during the CF. TBH, I have no idea what we are going to do with this knowledge, but I think it's better to know.

- From time to time, call a council of several committers and make tough decisions about patches that have been in discussion for too long (let's say 4 commitfests).
Hopefully, it will be easier to reach a consensus in a "real-time" discussion, or we can toss a coin. This problem was raised in previous discussions too [2]. [1] https://www.postgresql.org/message-id/CA+q6zcXtg7cFwX-c+BoOwk65+jtR-sQGZ=1mqg-vgmvzuh8...@mail.gmail.com [2] https://www.postgresql.org/message-id/flat/CAA8%3DA7-owFLugBVZ0JjehTZJue7brEs2qTjVyZFRDq-B%3D%2BNwNg%40mail.gmail.com -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Commitfest manager 2020-11
On 24.08.2020 16:08, gkokola...@pm.me wrote: Hi all, Admittedly quite ahead of time, I would like to volunteer as Commitfest manager for 2020-11. If the role is not filled and there are no objections, I can reach out again in October for confirmation. //Georgios

Wow, that was well in advance) I am willing to assist if you need any help.

I was looking for this message to find out who the current CFM is. Apparently, the November commitfest is not in progress yet. Still, I have a question. Should we also maintain the statuses of the patches in the "Open" commitfest? 21 patches have already been committed during this CF, which shows that even the "Open" CF is quite active. I've updated a few patches that were sent by my colleagues. If there are no objections, I can do that for other entries too.

On the other hand, I noticed a lot of stale threads that weren't updated in months. Some of them seem to pass several CFs without any activity at all. I believe that this is wrong for many reasons, the major of which IMHO is the frustration of the authors. Can we come up with something to improve this situation?

P.S. I have a few more ideas about CF management. I suppose they are usually discussed at pgcon meetings, but those won't happen anytime soon. Is there a special place for such discussions, or may I continue in this thread? -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [patch] Fix checksum verification in base backups for zero page headers
On 07.10.2020 11:18, Michael Paquier wrote: On Fri, Sep 25, 2020 at 08:53:27AM +0200, Michael Banck wrote: Oh right, I've fixed up the commit message in the attached V4.

Not much a fan of what's proposed here, for a couple of reasons:
- If the page is not new, we should check if the header is sane or not.
- It may be better to actually count checksum failures if these are repeatable in a given block, and report them.
- The code would be more simple with a "continue" added for a block detected as new or with an LSN newer than the backup start.
- The new error messages are confusing, and I think that these are hard to translate in a meaningful way.

I am interested in moving this patch forward, so here is the updated v5. This version uses PageIsVerified. All error messages are left unchanged.

So I think that we should try to use PageIsVerified() directly in the code path of basebackup.c, and this requires a couple of changes to make the routine more extensible:

- Addition of a dboid argument for pgstat_report_checksum_failure(), whose call needs to be changed to use the *_in_db() flavor.

In the current patch, PageIsVerified called from basebackup.c simply doesn't report failures to pgstat, because in basebackup.c we already report checksum failures separately, once per file:

    pgstat_report_checksum_failures_in_db(dboid, checksum_failures);

Should we change this too? I don't see any difference.

- Addition of an option to decide if a log should be generated or not.
- Addition of an option to control if a checksum failure should be reported to pgstat or not; even if this is actually linked to the previous point, I have seen cases where being able to control both separately would be helpful, particularly the log part. For the last two ones, I would use an extensible argument based on bits32 with a set of flags that the caller can set at will.

Done.
-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company commit 553d62dbcdec1a00139f3c5ab6a325ed857b6c9d Author: anastasia Date: Fri Oct 16 17:36:02 2020 +0300 Fix checksum verification in base backups for zero page headers We currently do not checksum a page if it is considered new by PageIsNew(). However, this means that if the whole page header is zero, PageIsNew() will consider that page new due to the pd_upper field being zero and no subsequent checksum verification is done. To fix, use PageIsVerified() in pg_basebackup code. Reported-By: Andres Freund Author: Michael Banck Reviewed-By: Asif Rehman, Anastasia Lubennikova Discussion: https://postgr.es/m/20190319200050.ncuxejradurja...@alap3.anarazel.de diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c index dbbd3aa31f..b4e050c9b8 100644 --- a/src/backend/catalog/storage.c +++ b/src/backend/catalog/storage.c @@ -443,7 +443,7 @@ RelationCopyStorage(SMgrRelation src, SMgrRelation dst, smgrread(src, forkNum, blkno, buf.data); - if (!PageIsVerified(page, blkno)) + if (!PageIsVerified(page, blkno, PIV_LOG_WARNING | PIV_REPORT_STAT)) ereport(ERROR, (errcode(ERRCODE_DATA_CORRUPTED), errmsg("invalid page in block %u of relation %s", diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index b89df01fa7..aa8d52c1aa 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -1682,11 +1682,14 @@ sendFile(const char *readfilename, const char *tarfilename, * this case. We also skip completely new pages, since they * don't have a checksum yet. */ -if (!PageIsNew(page) && PageGetLSN(page) < startptr) +if (PageGetLSN(page) < startptr) { - checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE); - phdr = (PageHeader) page; - if (phdr->pd_checksum != checksum) + int piv_flags = (checksum_failures < 5) ? 
(PIV_LOG_WARNING) : 0; + /* + * Do not report checksum failures to pgstats from + * PageIsVerified, since we will do it later. + */ + if (!PageIsVerified(page, blkno, piv_flags)) { /* * Retry the block on the first failure. It's @@ -1732,13 +1735,6 @@ sendFile(const char *readfilename, const char *tarfilename, checksum_failures++; - if (checksum_failures <= 5) - ereport(WARNING, - (errmsg("checksum verification failed in " - "file \"%s\", block %d: calculated " - "%X but expected %X", - readfilename, blkno, checksum, - phdr->pd_checksum))); if (checksum_failures == 5) ereport(WARNING, (errmsg("further checksum verification " diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index e549fa1d30..971c96c7de 100644 --- a/src/backend/storage/buffer/bufmg
Re: Deleting older versions in unique indexes to avoid page splits
On 08.10.2020 02:48, Peter Geoghegan wrote: On Tue, Jun 30, 2020 at 5:03 PM Peter Geoghegan wrote: Attached is a POC patch that teaches nbtree to delete old duplicate versions from unique indexes. The optimization targets non-HOT duplicate version bloat. Although the patch is rather rough, it nevertheless manages to more or less eliminate a whole class of index bloat: Unique index bloat from non-HOT updates in workloads where no transaction lasts for more than a few seconds. I'm slightly surprised that this thread didn't generate more interest back in June. After all, maintaining the pristine initial state of (say) a primary key index even after many high throughput non-HOT updates (i.e. avoiding "logically unnecessary" page splits entirely) is quite appealing. It arguably goes some way towards addressing long held criticisms of our approach to MVCC. Especially if it can be generalized to all b-tree indexes -- the Uber blog post mentioned tables that have several indexes, which presumably means that there can be no HOT updates (the author of the blog post didn't seem to be aware of HOT at all). The idea seems very promising, especially when extended to handle non-unique indexes too. I've been trying to generalize my approach to work with all indexes. I think that I can find a strategy that is largely effective at preventing version churn page splits that take place with workloads that have many non-HOT updates, without any serious downsides for workloads that do not benefit. I want to get feedback on that now, since I expect that it will be controversial. Teaching indexes about how tuples are versioned or chaining tuples seems like a non-starter, so the trick will be to come up with something that behaves in approximately the same way as that in cases where it helps. The approach I have in mind is to pass down a hint from the executor to btinsert(), which lets nbtree know that the index tuple insert is in fact associated with a non-HOT update. 
This hint is only given when the update did not logically modify the columns that the index covers.

That's exactly what I wanted to discuss after the first email. If we could make (non-)HOT updates index specific, I think it could improve performance a lot.

Here is the maybe-controversial part: The algorithm initially assumes that all indexes more or less have the same properties as unique indexes from a versioning point of view, even though that's clearly not true. That is, it initially assumes that the only reason why there can be duplicates on any leaf page it looks at is because some previous transaction also did a non-HOT update that added a new, unchanged index tuple. The new algorithm only runs when the hint is passed down from the executor and when the only alternative is to split the page (or have a deduplication pass), so clearly there is some justification for this assumption -- it really is highly unlikely that this update (which is on the verge of splitting the page) just so happened to be the first such update that affected the page. To be blunt: It may be controversial that we're accessing multiple heap pages while holding an exclusive lock on a leaf page, in the hopes that we can avoid a page split, but without any certainty that it'll work out. Sometimes (maybe even most of the time), this assumption turns out to be mostly correct, and we benefit in the obvious way -- no "unnecessary" page splits for affected non-unique indexes. Other times it won't work out, usually because the duplicates are in fact logical duplicates from distinct logical rows.

I think that this optimization can affect low cardinality indexes negatively, but it is hard to estimate the impact without tests. Maybe it won't be a big deal, given that we attempt to eliminate old copies not very often and that low cardinality b-trees are already not very useful. Besides, we can always make this thing optional, so that users could tune it to their workload.
I wonder how this new feature will interact with physical replication. A replica may have quite a different performance profile. For example, there can be long-running queries that now prevent vacuum from removing recently-dead rows. How will we handle the same situation with this optimized deletion? -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: MultiXact\SLRU buffers configuration
On 28.09.2020 17:41, Andrey M. Borodin wrote: Hi, Anastasia! On 28 Aug 2020, at 23:08, Anastasia Lubennikova wrote:

1) The first patch is sensible and harmless, so I think it is ready for committer. I haven't tested the performance impact, though.

2) I like the initial proposal to make various SLRU buffers configurable; however, I doubt whether it really solves the problem, or just moves it to another place? The previous patch you sent was based on some version that contained changes for other SLRU buffer numbers: 'multixact_offsets_slru_buffers' and 'multixact_members_slru_buffers'. Have you just forgotten to attach them? The patch message "[PATCH v2 2/4]" hints that you had 4 patches) Meanwhile, I attach the rebased patch to calm down the CFbot. The changes are trivial.

2.1) I think that both the min and max values for this parameter are too extreme. Have you tested them?

    + _local_cache_entries,
    + 256, 2, INT_MAX / 2,

2.2) MAX_CACHE_ENTRIES is not used anymore, so it can be deleted.

3) No changes for the third patch. I just renamed it for consistency.

Thank you for your review. Indeed, I had a 4th patch with tests, but those tests didn't work well: I still have not managed to stress SLRUs enough to reproduce the problem from production... You are absolutely correct on point 2: I only ran tests with sane values, and observed extreme performance degradation with values around 64 megabytes. I do not know which maximum value we should pick. 1 GB? Or the highest value that still functions?

I would go with the values that we consider adequate for this setting. As far as I can see, there is no strict rule about it in guc.c, and many variables have large boundary values. Anyway, we need to explain it at least in the documentation and code comments. It seems that the default was conservative enough, so it could also serve as the minimum. As for the maximum, can you provide any benchmark results? If there is a peak followed by a noticeable performance degradation, we can use it to calculate the preferable max value.
-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [PATCH] Automatic HASH and LIST partition creation
On 06.10.2020 00:21, Pavel Borisov wrote: Hi, hackers! I added some extra tests for different cases of use of automatic partition creation. v3-0002 can be applied on top of the original v2 patch for correct work with some corner cases with constraints included in this test.

Thank you for the tests. I've added them, together with the fix, into the patch. I also noticed that some table parameters, such as persistence, were not propagated to the auto-generated partitions. This is fixed now. The test cases for temp and unlogged auto-partitioned tables are updated respectively. Besides, I slightly refactored the code and fixed the documentation typos that were reported by Rahila.

With my recent changes, one test statement that you've added as failing now works:

    CREATE TABLE list_parted_fail (a int) PARTITION BY LIST (a)
        CONFIGURATION (VALUES IN ('1' collate "POSIX"));

It simply ignores the collate "POSIX" part and creates a table with the following structure:

                        Partitioned table "public.list_parted_fail"
     Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
    --------+---------+-----------+----------+---------+---------+--------------+-------------
     a      | integer |           |          |         | plain   |              |
    Partition key: LIST (a)
    Partitions: list_parted_fail_0 FOR VALUES IN (1)

Do you think that this is a bug? For now, I removed this statement from the tests just to calm down the CI. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

commit 4a387cfbd43e93b2c2e363307fb9c9ca53c3f56e Author: anastasia Date: Tue Oct 6 20:23:22 2020 +0300 Auto generated HASH and LIST partitions.
New syntax: CREATE TABLE tbl_hash (i int) PARTITION BY HASH (i) CONFIGURATION (modulus 3); CREATE TABLE tbl_list (i int) PARTITION BY LIST (i) CONFIGURATION (values in (1, 2), (3, 4) DEFAULT PARTITION tbl_default); With tests and documentation draft diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml index 28f844071b..4501e81bb5 100644 --- a/doc/src/sgml/ref/create_table.sgml +++ b/doc/src/sgml/ref/create_table.sgml @@ -29,6 +29,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI ] ) [ INHERITS ( parent_table [, ... ] ) ] [ PARTITION BY { RANGE | LIST | HASH } ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [, ... ] ) ] +[ CONFIGURATION ( partition_bound_auto_spec ) ] [ USING method ] [ WITH ( storage_parameter [= value] [, ... ] ) | WITHOUT OIDS ] [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ] @@ -41,6 +42,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI [, ... ] ) ] [ PARTITION BY { RANGE | LIST | HASH } ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [, ... ] ) ] +[ CONFIGURATION ( partition_bound_auto_spec ) ] [ USING method ] [ WITH ( storage_parameter [= value] [, ... ] ) | WITHOUT OIDS ] [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ] @@ -53,6 +55,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI [, ... ] ) ] { FOR VALUES partition_bound_spec | DEFAULT } [ PARTITION BY { RANGE | LIST | HASH } ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [, ... ] ) ] +[ CONFIGURATION ( partition_bound_auto_spec ) ] [ USING method ] [ WITH ( storage_parameter [= value] [, ... ] ) | WITHOUT OIDS ] [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ] @@ -96,6 +99,11 @@ FROM ( { partition_bound_expr | MIN TO ( { partition_bound_expr | MINVALUE | MAXVALUE } [, ...] 
) | WITH ( MODULUS numeric_literal, REMAINDER numeric_literal ) +and partition_bound_auto_spec is: + +VALUES IN ( partition_bound_expr [, ...] ), [( partition_bound_expr [, ...] )] [, ...] [DEFAULT PARTITION default_part_name] +MODULUS numeric_literal + index_parameters in UNIQUE, PRIMARY KEY, and EXCLUDE constraints are: [ INCLUDE ( column_name [, ... ] ) ] @@ -383,6 +391,11 @@ WITH ( MODULUS numeric_literal, REM however, you can define these constraints on individual partitions. + + Hash and list partitioning also support automatic creation of partitions + with an optional CONFIGURATION clause. + + See for more discussion on table partitioning. @@ -391,6 +404,42 @@ WITH ( MODULUS numeric_literal, REM + +CONFIGURATION ( partition_bound_auto_spec ) ] + + + The optional CONFIGURATION clause used together + with PARTITION BY specifies a rule of generating bounds + for partitions of the partitioned table. All partitions are created automatically + along with the parent table. + + Any indexes, constraints and user-defined row-level triggers that exist + in the parent table are cloned on the new pa
Re: [PATCH] Automatic HASH and LIST partition creation
On 30.09.2020 22:58, Rahila Syed wrote:

Hi Anastasia, I tested the syntax with some basic commands and it works fine; regression tests also pass.

Thank you for your review.

A couple of comments:

1. The syntax used omits the { IMMEDIATE | DEFERRED } keywords suggested in the earlier discussions. I think it is intuitive to include IMMEDIATE with the current implementation so that the syntax can be extended with a DEFERRED clause in future for dynamic partitions. CREATE TABLE tbl_lst (i int) PARTITION BY LIST (i) CONFIGURATION (values in (1, 2), (3, 4) DEFAULT PARTITION tbl_default);

After some consideration, I decided that we don't actually need to introduce an IMMEDIATE | DEFERRED keyword. For hash and list partitions it will always be immediate, as the number of partitions cannot change after we initially set it. For range partitions, on the contrary, it doesn't make much sense to create partitions immediately, because in many use cases one bound will be open.

2. One suggestion for the generation of partition names is to append a unique id to avoid conflicts.

Can you please give an example of such a conflict? I agree that the current naming scheme is far from perfect, but I think that 'tablename'_partnum provides a unique name for each partition.

3. Probably, here you mean to write list and hash instead of range and list, as per the current state. Range and list partitioning also support automatic creation of partitions with an optional CONFIGURATION clause.

4. Typo in default_part_name
+VALUES IN ( partition_bound_expr [, ...] ), [( partition_bound_expr [, ...] )] [, ...] [DEFAULT PARTITION defailt_part_name]
+MODULUS numeric_literal

Yes, you're right. I will fix these typos in the next version of the patch.

Thank you, Rahila Syed

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [PATCH] Automatic HASH and LIST partition creation
On 24.09.2020 06:27, Michael Paquier wrote: On Mon, Sep 14, 2020 at 02:38:56PM +0300, Anastasia Lubennikova wrote: Fixed. This was also caught by cfbot. This version should pass it clean. Please note that regression tests are failing, because of 6b2c4e59. -- Michael Thank you. Updated patch is attached. Open issues for review: - new syntax; - generation of partition names; - overall patch review and testing, especially with complex partitioning clauses. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company commit faa5805b839effd9d8220eff787fb0995276c370 Author: anastasia Date: Mon Sep 14 11:34:42 2020 +0300 Auto generated HASH and LIST partitions. New syntax: CREATE TABLE tbl_hash (i int) PARTITION BY HASH (i) CONFIGURATION (modulus 3); CREATE TABLE tbl_list (i int) PARTITION BY LIST (i) CONFIGURATION (values in (1, 2), (3, 4) DEFAULT PARTITION tbl_default); With documentation draft. diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml index 087cad184c..ff9a7eda09 100644 --- a/doc/src/sgml/ref/create_table.sgml +++ b/doc/src/sgml/ref/create_table.sgml @@ -29,6 +29,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI ] ) [ INHERITS ( parent_table [, ... ] ) ] [ PARTITION BY { RANGE | LIST | HASH } ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [, ... ] ) ] +[ CONFIGURATION ( partition_bound_auto_spec ) ] [ USING method ] [ WITH ( storage_parameter [= value] [, ... ] ) | WITHOUT OIDS ] [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ] @@ -41,6 +42,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI [, ... ] ) ] [ PARTITION BY { RANGE | LIST | HASH } ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [, ... ] ) ] +[ CONFIGURATION ( partition_bound_auto_spec ) ] [ USING method ] [ WITH ( storage_parameter [= value] [, ... 
] ) | WITHOUT OIDS ] [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ] @@ -53,6 +55,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI [, ... ] ) ] { FOR VALUES partition_bound_spec | DEFAULT } [ PARTITION BY { RANGE | LIST | HASH } ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [, ... ] ) ] +[ CONFIGURATION ( partition_bound_auto_spec ) ] [ USING method ] [ WITH ( storage_parameter [= value] [, ... ] ) | WITHOUT OIDS ] [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ] @@ -96,6 +99,11 @@ FROM ( { partition_bound_expr | MIN TO ( { partition_bound_expr | MINVALUE | MAXVALUE } [, ...] ) | WITH ( MODULUS numeric_literal, REMAINDER numeric_literal ) +and partition_bound_auto_spec is: + +VALUES IN ( partition_bound_expr [, ...] ), [( partition_bound_expr [, ...] )] [, ...] [DEFAULT PARTITION defailt_part_name] +MODULUS numeric_literal + index_parameters in UNIQUE, PRIMARY KEY, and EXCLUDE constraints are: [ INCLUDE ( column_name [, ... ] ) ] @@ -383,6 +391,11 @@ WITH ( MODULUS numeric_literal, REM however, you can define these constraints on individual partitions. + + Range and list partitioning also support automatic creation of partitions + with an optional CONFIGURATION clause. + + See for more discussion on table partitioning. @@ -391,6 +404,38 @@ WITH ( MODULUS numeric_literal, REM + +CONFIGURATION ( partition_bound_auto_spec ) ] + + + The optional CONFIGURATION clause used together + with PARTITION BY specifies a rule of generating bounds + for partitions of the partitioned table. All partitions are created automatically + along with the parent table. + + Any indexes, constraints and user-defined row-level triggers that exist + in the parent table are cloned on the new partitions. + + + + The partition_bound_auto_spec + must correspond to the partitioning method and partition key of the + parent table, and must not overlap with any existing partition of that + parent. 
The form with VALUES IN is used for list partitioning + and the form with MODULUS is used for hash partitioning. + List partitioning can also provide a default partition using + DEFAULT PARTITION. + + + + Automatic range partitioning is not supported yet. + + + + + + + PARTITION OF parent_table { FOR VALUES partition_bound_spec | DEFAULT } diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index 0409a40b82..6893fa5495 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -4628,6 +4628,7 @@ _copyPartitionSpec(const PartitionSpec *from) COPY_STRING_FIELD(strategy); COPY_NODE_FIELD(partParams); + COPY_NODE_FIELD(autopart); COPY_LOCATION_FIELD(location); return newnode; @@ -4650,6
Re: [patch] Fix checksum verification in base backups for zero page headers
On 22.09.2020 17:30, Michael Banck wrote:

Hi,

On Tuesday, 22.09.2020 at 16:26 +0200, Michael Banck wrote:

On Wednesday, 02.09.2020 at 16:50 +0300, Anastasia Lubennikova wrote:

I've looked through the previous discussion. As far as I got it, most of the controversy was about online checksums improvements. The warning about pd_upper inconsistency that you've added is a good addition. The patch is a bit messy, though, because a huge code block was shifted. Will it be different, if you just leave the "if (!PageIsNew(page) && PageGetLSN(page) < startptr)" block as it was, and add "else if (PageIsNew(page) && !PageIsZero(page))"?

Thanks, that does indeed look better as a patch, and I think it's fine as-is for the code as well; I've attached a v2.

Sorry, I forgot to add you as reviewer in the proposed commit message; I've fixed that up now in v3.

Michael

Great. This version looks good to me. Thank you for answering my questions; I agree that we can work on them in separate threads. So I mark this one as Ready for Committer.

The only minor problem is a typo (?) in the proposed commit message: "If a page is all zero, consider that a checksum failure." It should be "If a page is NOT all zero...".

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [PATCH] Automatic HASH and LIST partition creation
and I'd like to meet this functionality in v14. I also hope that this patch will make it to v14, but for now, I don't see a consensus on the syntax and some details, so I wouldn't rush. Besides, it definitely needs more testing. I haven't thoroughly tested following cases yet: - how triggers and constraints are propagated to partitions; - how does it handle some tricky clauses in list partitioning expr_list; and so on. Also, there is an open question about partition naming. Currently, the patch implements dummy %tbl_%partnum name generation, which is far from user-friendly. I think we must provide some hook or trigger function to rename partitions after they were created. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company commit 08c5f8b35c4ffcaf09b8189cd8d0dc27ce76d715 Author: anastasia Date: Mon Sep 14 11:34:42 2020 +0300 Auto generated HASH and LIST partitions. New syntax: CREATE TABLE tbl_hash (i int) PARTITION BY HASH (i) CONFIGURATION (modulus 3); CREATE TABLE tbl_list (i int) PARTITION BY LIST (i) CONFIGURATION (values in (1, 2), (3, 4) DEFAULT PARTITION tbl_default); With documentation draft. diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml index 087cad184c..ff9a7eda09 100644 --- a/doc/src/sgml/ref/create_table.sgml +++ b/doc/src/sgml/ref/create_table.sgml @@ -29,6 +29,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI ] ) [ INHERITS ( parent_table [, ... ] ) ] [ PARTITION BY { RANGE | LIST | HASH } ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [, ... ] ) ] +[ CONFIGURATION ( partition_bound_auto_spec ) ] [ USING method ] [ WITH ( storage_parameter [= value] [, ... ] ) | WITHOUT OIDS ] [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ] @@ -41,6 +42,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI [, ... 
] ) ] [ PARTITION BY { RANGE | LIST | HASH } ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [, ... ] ) ] +[ CONFIGURATION ( partition_bound_auto_spec ) ] [ USING method ] [ WITH ( storage_parameter [= value] [, ... ] ) | WITHOUT OIDS ] [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ] @@ -53,6 +55,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI [, ... ] ) ] { FOR VALUES partition_bound_spec | DEFAULT } [ PARTITION BY { RANGE | LIST | HASH } ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [, ... ] ) ] +[ CONFIGURATION ( partition_bound_auto_spec ) ] [ USING method ] [ WITH ( storage_parameter [= value] [, ... ] ) | WITHOUT OIDS ] [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ] @@ -96,6 +99,11 @@ FROM ( { partition_bound_expr | MIN TO ( { partition_bound_expr | MINVALUE | MAXVALUE } [, ...] ) | WITH ( MODULUS numeric_literal, REMAINDER numeric_literal ) +and partition_bound_auto_spec is: + +VALUES IN ( partition_bound_expr [, ...] ), [( partition_bound_expr [, ...] )] [, ...] [DEFAULT PARTITION defailt_part_name] +MODULUS numeric_literal + index_parameters in UNIQUE, PRIMARY KEY, and EXCLUDE constraints are: [ INCLUDE ( column_name [, ... ] ) ] @@ -383,6 +391,11 @@ WITH ( MODULUS numeric_literal, REM however, you can define these constraints on individual partitions. + + Range and list partitioning also support automatic creation of partitions + with an optional CONFIGURATION clause. + + See for more discussion on table partitioning. @@ -391,6 +404,38 @@ WITH ( MODULUS numeric_literal, REM + +CONFIGURATION ( partition_bound_auto_spec ) ] + + + The optional CONFIGURATION clause used together + with PARTITION BY specifies a rule of generating bounds + for partitions of the partitioned table. All partitions are created automatically + along with the parent table. 
+ + Any indexes, constraints and user-defined row-level triggers that exist + in the parent table are cloned on the new partitions. + + + + The partition_bound_auto_spec + must correspond to the partitioning method and partition key of the + parent table, and must not overlap with any existing partition of that + parent. The form with VALUES IN is used for list partitioning + and the form with MODULUS is used for hash partitioning. + List partitioning can also provide a default partition using + DEFAULT PARTITION. + + + + Automatic range partitioning is not supported yet. + + + + + + + PARTITION OF parent_table { FOR VALUES partition_bound_spec | DEFAULT } diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index 0409a40b82..6893fa5495 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.
Re: Making index_set_state_flags() transactional
On 03.09.2020 11:04, Michael Paquier wrote:

Hi all,

For a couple of things I have looked at lately, it would be really useful to make index_set_state_flags() transactional and replace its use of heap_inplace_update() with CatalogTupleUpdate():

- When dropping an index used in a replica identity, we could make the reset of relreplident for the parent table consistent with the moment the index is marked as invalid: https://www.postgresql.org/message-id/20200831062304.ga6...@paquier.xyz

- In order to make CREATE INDEX CONCURRENTLY work for partitioned tables, we need to be able to switch indisvalid for all the index partitions in one final transaction, so that we have a consistent partition tree at any state of the operation. Obviously, heap_inplace_update() is a problem here.

The restriction we have in place now can be lifted thanks to 813fb03, which removed SnapshotNow, and as far as I have looked at the code paths doing scans of pg_index, I don't see any reason why we could not make this stuff transactional. Perhaps that's just because nobody had use cases where it would be useful, and I have two of them lately. Note that 3c84046 was the original commit that introduced the use of heap_inplace_update() in index_set_state_flags(), but we don't have SnapshotNow anymore. Note as well that we already make the index swapping transactional in REINDEX CONCURRENTLY for the pg_index entries.

I got the feeling that this is an issue different from the other threads where this was discussed, which is why I am beginning a new thread. Any thoughts? Attached is a proposal of patch.

Thanks,
-- Michael

As this patch is needed to continue the work started in the "reindex/cic/cluster of partitioned tables" thread, I took a look at it. I agree that a transactional update in index_set_state_flags() is good for both cases that you mentioned, and I don't see any restrictions.
It seems that not doing this before was just a loose end, as the comment from the 813fb03 patch clearly stated that it is safe to make this update transactional. The patch itself looks clear and committable.

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: 回复:how to create index concurrently on partitioned table
On 07.09.2020 04:58, Michael Paquier wrote: On Fri, Sep 04, 2020 at 09:51:13AM +0900, Michael Paquier wrote: Glad to hear that, please take the time you need. Attached is a rebased version to address the various conflicts after 844c05a. -- Michael Thank you. With the fix for the cycle in reindex_partitions() and new comments added, I think this patch is ready for commit. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: history file on replica and double switchover
On 27.08.2020 16:02, Grigory Smolkin wrote:

Hello! I've noticed that when switching a replica over to master and back to replica, the new history file is streamed to the replica but not archived, which is not great, because it breaks PITR if archiving is running on the replica. The fix looks trivial. A bash script to reproduce the problem and the patch are attached.

Thanks for the report. I agree that it looks like a bug. For some reason, the patch failed to apply on current master, even though I don't see any difference in the code. I'll attach this thread to the next commitfest, so it doesn't get lost.

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: 回复:how to create index concurrently on partitioned table
On 02.09.2020 04:39, Michael Paquier wrote:

The problem with dropped relations in REINDEX has been addressed by 1d65416, so I have gone through this patch again and simplified the use of session locks, these being taken only when doing a REINDEX CONCURRENTLY for a given partition. This part is in a rather committable shape IMO, so I would like to get it done first, before looking more at the other cases with CIC and CLUSTER. I am still planning to go through it once again.
-- Michael

Thank you for advancing this work. I was reviewing the previous version, but the review became outdated before I sent it. The overall design is fine, but I see a bunch of things that need to be fixed before commit.

First of all, this patch fails at cfbot:

indexcmds.c:2848:7: error: variable ‘parentoid’ set but not used [-Werror=unused-but-set-variable] Oid parentoid;^

It seems to be just a typo. With this minor fix the patch compiles and passes tests.

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 75008eebde..f5b3c53a83 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -2864,7 +2864,7 @@ reindex_partitions(Oid relid, int options, bool concurrent,
 /* Save partition OID */
 old_context = MemoryContextSwitchTo(reindex_context);
- partitions = lappend_oid(partitions, partoid);
+ partitions = lappend_oid(partitions, parentoid);
 MemoryContextSwitchTo(old_context);
 }

If this guessed fix is correct, I see a problem in the patch logic. In reindex_partitions() we collect parent relations to pass them to reindex_multiple_internal(). This implicitly changes the logic from REINDEX INDEX to REINDEX RELATION, which is not the same if the table has more than one index. For example, if I add one more index to a partitioned table from the create_index.sql test:

create index idx on concur_reindex_part_0_2 using btree (c2);

and call REINDEX INDEX CONCURRENTLY concur_reindex_part_index; idx will be reindexed as well.
I doubt that this is the desired behavior.

A few more random issues that I noticed at first glance:

1) In create_index.sql, are these two lines intentional checks that must fail? If so, I propose to add a comment.

REINDEX TABLE concur_reindex_part_index;
REINDEX TABLE CONCURRENTLY concur_reindex_part_index;

A few lines around them also look like they were copy-pasted and need a second look.

2) This part of ReindexRelationConcurrently() is misleading.

case RELKIND_PARTITIONED_TABLE:
case RELKIND_PARTITIONED_INDEX:
default:
/* Return error if type of relation is not supported */
ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("cannot reindex this type of relation concurrently")));

Maybe an assertion is enough. It seems that we should never reach this code, because both ReindexTable and ReindexIndex handle those relkinds separately. Which leads me to the next question.

3) Is there a reason why reindex_partitions() is not inside ReindexRelationConcurrently()? I think that logically it belongs there.

4) I haven't tested multi-level partitioning yet. In any case, it would be good to document this behavior explicitly.

I need a bit more time to review this patch more thoroughly. Please wait for it before committing.

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [patch] Fix checksum verification in base backups for zero page headers
On 01.09.2020 13:22, Michael Banck wrote:

Hi, as a continuation of [1], I've split off the zero page header case from the last patch, as this one seems less contentious.

Michael

[1] https://commitfest.postgresql.org/28/2308/

I've looked through the previous discussion. As far as I understood it, most of the controversy was about online checksum improvements. The warning about pd_upper inconsistency that you've added is a good addition.

The patch is a bit messy, though, because a huge code block was shifted. Would it be different if you just left the "if (!PageIsNew(page) && PageGetLSN(page) < startptr)" block as it was and added "else if (PageIsNew(page) && !PageIsZero(page))"?

While on it, I also have a few questions about the surrounding code:

1) Maybe we can use other header sanity checks from PageIsVerified() as well? Or even the function itself.

2) > /* Reset loop to validate the block again */
How many times do we try to reread the block? Is one reread enough? Speaking of which, the 'reread_cnt' name looks confusing to me. I would expect that this variable contains the number of attempts, not the number of bytes read. If everyone agrees that for basebackup purposes it's enough to rely on a single reread, I'm ok with it. Another approach is to read the page directly from shared buffers to ensure that the page is fine. This way is more complicated, but should be almost bullet-proof. Using it, we could also check pages with lsn >= startptr.

3) Judging by the warning messages, we count checksum failures per file, not per page, and stop reporting after the fifth failure. Why so? Is it a common case that so many pages of one file are corrupted?

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: REINDEX SCHEMA/DATABASE/SYSTEM weak with dropped relations
On 01.09.2020 04:38, Michael Paquier wrote:

I have added some extra comments. There is one in ReindexRelationConcurrently() to mention that there should be no extra use of MISSING_OK once the list of indexes is built, as session locks are taken where needed.

Great, it took me a moment to understand the logic around the index list check on first pass.

Does the version attached look fine to you? I have done one round of indentation while on it.

Yes, this version is good.

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: REINDEX SCHEMA/DATABASE/SYSTEM weak with dropped relations
On 13.08.2020 07:38, Michael Paquier wrote:

Hi all,

While working on support for REINDEX for partitioned relations, I have noticed an old bug in the logic of ReindexMultipleTables(): the list of relations to process is built in a first transaction, and then each table is done in an independent transaction, but we don't actually check that the relation still exists when doing the real work.

I think that a complete fix involves two things:

1) Addition of one SearchSysCacheExists1() at the beginning of the loop processing each item in the list in ReindexMultipleTables(). This would protect from a relation dropped, but that would not be enough if ReindexMultipleTables() is looking at a relation dropped before we lock it in reindex_table() or ReindexRelationConcurrently(). Still that's simple, cheaper, and would protect from most problems.

2) Be completely water-proof and adopt a logic close to what we do for VACUUM where we try to open a relation, but leave if it does not exist. This can be achieved with just some try_relation_open() calls with the correct lock used, and we also need to have a new REINDEXOPT_* flag to control this behavior conditionally.

2) and 1) are complementary, but 2) is invasive, so based on the lack of complaints we have seen that does not seem really worth backpatching to me, and I think that we could also just have 1) on stable branches as a minimal fix, to take care of most of the problems that could show up to users.

Attached is a patch to fix all that, with a cheap isolation test that fails on HEAD with a cache lookup failure. I am adding that to the next CF for now, and I would rather fix this issue before moving on with REINDEX for partitioned relations as fixing this issue reduces the use of session locks for partition trees. Any thoughts?
-- Michael

Hi, I reviewed the patch. It does work and the code is clean and sane. It implements a logic that we already had in CLUSTER, so I think it was simply an oversight. Thank you for fixing this.
I noticed that REINDEXOPT_MISSING_OK can be passed to the TOAST table reindex. I think it would be better to reset the flag in this reindex_relation() call, as we don't expect a concurrent DROP here.

/*
 * If the relation has a secondary toast rel, reindex that too while we
 * still hold the lock on the main table.
 */
if ((flags & REINDEX_REL_PROCESS_TOAST) && OidIsValid(toast_relid))
    result |= reindex_relation(toast_relid, flags, options);

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company