Re: Use generation context to speed up tuplesorts
On Thursday, 20 January 2022 at 02:31:15 CET, David Rowley wrote:
> On Tue, 18 Jan 2022 at 19:45, Julien Rouhaud wrote:
> > I'm also a bit confused about which patch(es) should actually be
> > reviewed in that thread. My understanding is that the original patch
> > might not be relevant anymore but I might be wrong. The CF entry
> > should probably be updated to clarify that, with an annotation / title
> > change / author update or something.
> >
> > In the meantime I will switch the entry to Waiting on Author.
>
> I think what's going on here is a bit confusing. There's quite a few
> patches on here that aim to do very different things.
>
> The patch I originally proposed was just to make tuplesort use
> generation memory contexts. This helps speed up tuplesort because
> generation contexts allow palloc() to be faster than the standard
> allocator. Additionally, there's no power-of-2 wastage with generation
> contexts like there is with the standard allocator. This helps fit
> more tuples on fewer CPU cache lines.
>
> I believe Andres then mentioned that the fixed block size for
> generation contexts might become a problem. With the standard
> allocator the actual size of the underlying malloc() grows up until a
> maximum size. With generation contexts this is always the same, so we
> could end up with a very large number of blocks which means lots of
> small mallocs which could end up slow. Tomas then posted a series of
> patches to address this.
>
> I then posted another patch that has the planner make a choice on the
> size of the blocks to use for the generation context based on the
> tuple width and number of tuple estimates. My hope there was to
> improve performance further by making a better choice for how large to
> make the blocks in the first place. This reduces the need for Tomas'
> patch, but does not eliminate the need for it.
>
> As of now, I still believe we'll need Tomas' patches to allow the
> block size to grow up to a maximum size.
> I think those patches are likely needed before we think about making
> tuplesort use generation contexts. The reason I believe this is that
> we don't always have good estimates for the number of tuples we're
> going to sort. nodeSort.c is fairly easy as it just fetches tuples
> once and then sorts them. The use of tuplesort for
> nodeIncrementalSort.c is much more complex as we'll likely be
> performing a series of smaller sorts which are much harder to get size
> estimates for. This means there's likely no magic block size that we
> can use for the generation context that's used to store the tuples in
> tuplesort. This is also the case for order by aggregates and probably
> most other usages of tuplesort.

You left out the discussion I started about glibc's malloc-specific
behaviour. I tried to demonstrate that a big part of the performance gain
you were seeing was in fact due not to our allocator by itself, but to the
way different block-size allocations interact with that malloc
implementation and how it handles releasing memory back to the system. I
also tried to show that we can give DBAs more control over how much memory
to "waste" as the price for faster allocations.

I agree that the generation context memory manager is useful in the
tuplesort context, if only for the fact that we fall back to disk less
quickly due to the reduced wastage of memory, but I would be wary of
drawing conclusions too quickly based on a specific malloc implementation
which shows threshold effects, compounded by our growing-block-size code
in general. I have on my TODO list to run the same kind of benchmarks
using jemalloc, to better isolate the performance gains expected from the
generation allocator itself from the side effect of avoiding glibc's
pattern of releasing memory back to the kernel too quickly.

Julien, sorry for the confusion of adding that discussion and those
patches to the same thread, but I don't really see a way of dissociating
those two aspects.
-- Ronan Dunklau
RE: Support tab completion for upper character inputs in psql
On Sunday, January 16, 2022 3:51 AM, Tom Lane said:
> Peter Eisentraut writes:
> > The rest of the patch seems ok in principle, since AFAICT enums are the
> > only query result in tab-complete.c that are not identifiers and thus
> > subject to case issues.
>
> I spent some time looking at this patch. I'm not very happy with it,
> for two reasons:
>
> 1. The downcasing logic in the patch bears very little resemblance
> to the backend's actual downcasing logic, which can be found in
> src/backend/parser/scansup.c's downcase_identifier(). Notably,
> the patch's restriction to only convert all-ASCII strings seems
> indefensible, because that's not how things really work. I fear
> we can't always exactly duplicate the backend's behavior, because
> it's dependent on the server's locale and encoding; but I think
> we should at least get it right in the common case where psql is
> using the same locale and encoding as the server.

Thanks for your suggestion. I removed the ASCII-string check function and
added a single-byte-encoding check just like downcase_identifier(). I also
added a PGCLIENTENCODING setting in the test script to make the test cases
pass. Now the patch supports tab completion with non-quoted upper-case
characters when the client encoding is single-byte.

> 2. I don't think there's been much thought about the larger picture
> of what is to be accomplished. Right now, we successfully
> tab-complete inputs that are prefixes of the canonical spelling (per
> quote_identifier) of the object's name, and don't try at all for
> non-canonical spellings. I'm on board with trying to allow some of
> the latter but I'm not sure that this patch represents much forward
> progress. To be definite about it, suppose we have a DB containing
> just two tables whose names start with "m", say mytab and mixedTab.
> Then:
>
> (a) m immediately completes mytab, ignoring mixedTab
>
> (b) "m immediately completes "mixedTab", ignoring mytab
>
> (c) "my fails to find anything
>
> (d) mi fails to find anything
>
> (e) M fails to find anything
>
> This patch proposes to improve case (e), but to my taste cases (a)
> through (c) are much bigger problems. It'd be nice if (d) worked too
> --- that'd require injecting a double-quote where the user had not
> typed one, but we already do the equivalent thing with single-quotes
> for file names, so why not? (Although after fighting with readline
> yesterday to try to get it to handle single-quoted enum labels sanely,
> I'm not 100% sure if (d) is possible.)
>
> Also, even for case (e), what we have with this patch is that it
> immediately completes mytab, ignoring mixedTab. Is that what we want?
> Another example is that miX fails to find anything, which seems
> like a POLA violation given that mY completes to mytab.
>
> I'm not certain how many of these alternatives can be supported
> without introducing ambiguity that wasn't there before (which'd
> manifest as failing to complete in cases where the existing code
> chooses an alternative just fine). But I really don't like the
> existing behavior for (b) and (c) --- I should be able to spell
> a name with double quotes if I want, without losing completion
> support.

You are right, it's more convenient that way; I hadn't thought about it
before. For now, the patch assumes: if a user needs to type a table whose
name contains upper-case characters, they should input the double quotes
themselves; and once a double quote has been typed, only table names with
upper-case characters are searched. I may try to implement it as you
expect, but it seems not so easy (as you said, without introducing
ambiguity that wasn't there before). I'd appreciate it if someone could
give me a hint/hand on this.
> BTW, another thing that maybe we should think about is how this
> interacts with the pattern matching capability in \d and friends.
> If people can tab-complete non-canonical spellings, they might
> expect the same spellings to work in \d. I don't say that this
> patch has to fix that, but we might want to look and be sure we're
> not painting ourselves into a corner (especially since I see
> that we already perform tab-completion in that context).

Yes, agreed. If we solve the previous problem, meta-command tab completion
should also be considered.

Regards,
Tang

Attachment: v12-0001-Support-tab-completion-with-a-query-result-for-u.patch
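The single-byte-encoding restriction described above can be illustrated
with a small standalone sketch. This is hypothetical code, not the patch
itself; it only mirrors the rule the backend's downcase_identifier()
applies: ASCII letters are always downcased, while bytes with the high bit
set are only run through tolower() when the client encoding is single-byte,
because applying tolower() to one byte of a multi-byte character would
corrupt it.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch of client-side identifier downcasing. The name and
 * signature are illustrative only.
 */
static char *
downcase_identifier_sketch(const char *ident, bool enc_is_single_byte)
{
	size_t		len = strlen(ident);
	char	   *result = malloc(len + 1);

	for (size_t i = 0; i < len; i++)
	{
		unsigned char ch = (unsigned char) ident[i];

		if (ch >= 'A' && ch <= 'Z')
			ch += 'a' - 'A';		/* ASCII: always safe to downcase */
		else if (enc_is_single_byte && ch >= 0x80 && isupper(ch))
			ch = tolower(ch);		/* non-ASCII: only in single-byte encodings */
		result[i] = (char) ch;
	}
	result[len] = '\0';
	return result;
}
```

For example, downcase_identifier_sketch("MixedTab", false) yields
"mixedtab", while a high-bit byte is left untouched when the encoding is
multi-byte, matching the behaviour the revised patch aims for.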
Re: Refactoring of compression options in pg_basebackup
On Wed, Jan 19, 2022 at 08:35:23AM -0500, Robert Haas wrote:
> I think that this will reject something like --compress=nonetheless by
> telling you that 't' is not a valid separator. I think it would be
> better to code this so that we first identify the portion preceding
> the first colon, or the whole string if there is no colon. Then we
> check whether that part is a compression method that we recognize. If
> not, we complain.

Well, if no colon is specified, we still need to check if optarg is a pure
integer when it does not match any of the supported methods, as
--compress=0 should remain backward-compatible with no compression and
--compress=1~9 should imply gzip, no?

> If so, we then check whatever is after the separator
> for validity - and this might differ by type. For example, we could
> then immediately reject none:4, and if in the future we want to allow
> lz4:fast3, we could.

Okay.

> I think the code that handles the bare integer case should be at the
> top of the function and should return, because that code is short.
> Then the rest of the function doesn't need to be indented as deeply.

Done this way, I hope.
-- 
Michael

From b5d84d8daac4c537657bfeef2ed0b5550106ffc2 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Thu, 20 Jan 2022 15:59:54 +0900
Subject: [PATCH v6] Rework the option set of pg_basebackup

---
 src/bin/pg_basebackup/pg_basebackup.c        | 150 ---
 src/bin/pg_basebackup/t/010_pg_basebackup.pl |  46 +-
 doc/src/sgml/ref/pg_basebackup.sgml          |  21 ++-
 3 files changed, 194 insertions(+), 23 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 2a58be638a..ae003766cc 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -123,6 +123,7 @@ static bool showprogress = false;
 static bool estimatesize = true;
 static int	verbose = 0;
 static int	compresslevel = 0;
+static WalCompressionMethod compressmethod = COMPRESSION_NONE;
 static IncludeWal includewal = STREAM_WAL;
 static bool fastcheckpoint = false;
 static bool writerecoveryconf = false;
@@ -376,7 +377,8 @@ usage(void)
 	printf(_("  -X, --wal-method=none|fetch|stream\n"
 			 "                         include required WAL files with specified method\n"));
 	printf(_("  -z, --gzip             compress tar output\n"));
-	printf(_("  -Z, --compress=0-9     compress tar output with given compression level\n"));
+	printf(_("  -Z, --compress={gzip,none}[:LEVEL] or [LEVEL]\n"
+			 "                         compress tar output with given compression method or level\n"));
 	printf(_("\nGeneral options:\n"));
 	printf(_("  -c, --checkpoint=fast|spread\n"
 			 "                         set fast or spread checkpointing\n"));
@@ -541,8 +543,7 @@ LogStreamerMain(logstreamer_param *param)
 								  stream.do_sync);
 	else
 		stream.walmethod = CreateWalTarMethod(param->xlog,
-											  (compresslevel != 0) ?
-											  COMPRESSION_GZIP : COMPRESSION_NONE,
+											  compressmethod,
 											  compresslevel,
 											  stream.do_sync);
@@ -933,6 +934,84 @@ parse_max_rate(char *src)
 	return (int32) result;
 }
+
+/*
+ * Utility wrapper to parse the values specified for -Z/--compress.
+ * *methodres and *levelres will be optionally filled with values coming
+ * from the parsed results.
+ */
+static void
+parse_compress_options(char *src, WalCompressionMethod *methodres,
+					   int *levelres)
+{
+	char	   *sep;
+	int			firstlen;
+	char	   *firstpart = NULL;
+
+	/* check if the option is split in two */
+	sep = strchr(src, ':');
+
+	/*
+	 * The first part of the option value could be a method name, or just
+	 * a level value.
+	 */
+	firstlen = (sep != NULL) ? (sep - src) : strlen(src);
+	firstpart = pg_malloc(firstlen + 1);
+	strncpy(firstpart, src, firstlen);
+	firstpart[firstlen] = '\0';
+
+	/*
+	 * Check if the first part of the string matches with a supported
+	 * compression method.
+	 */
+	if (pg_strcasecmp(firstpart, "gzip") != 0 &&
+		pg_strcasecmp(firstpart, "none") != 0)
+	{
+		/*
+		 * It does not match anything known, so check for the
+		 * backward-compatible case of only an integer, where the implied
+		 * compression method changes depending on the level value.
+		 */
+		if (!option_parse_int(firstpart, "-Z/--compress", 0,
+							  INT_MAX, levelres))
+			exit(1);
+
+		*methodres = (*levelres > 0) ?
+			COMPRESSION_GZIP : COMPRESSION_NONE;
+		return;
+	}
+
+	/* Supported method found. */
+	if (pg_strcasecmp(firstpart, "gzip") == 0)
+		*methodres = COMPRESSION_GZIP;
+	else if (pg_strcasecmp(firstpart, "none") == 0)
+		*methodres = COMPRESSION_NONE;
+
+	if (sep == NULL)
+	{
+		/*
+		 * The caller specified a method without a colon separator, so let
+		 * any subsequent checks assign a default.
+		 */
+		return;
+	}
+
+	/* Check the contents after the colon separator. */
+	sep++;
+	if (*sep == '\0')
+	{
+		pg_log_error("no compression level defined for method %s", firstpart);
+		exit(1);
+
RE: [Proposal] Add foreign-server health checks infrastructure
Dear Zhihong,

Thank you for reviewing! And I have to apologize for the delay. I'll post
a new patchset later.

> + UnregisterAllCheckingRemoteServersCallback();
>
> UnregisterAllCheckingRemoteServersCallback
> -> UnregisterAllCheckingRemoteServersCallbacks

Will fix.

> + CheckingRemoteServersCallbackItem *item;
> + item = fdw_callbacks;
>
> The above two lines can be combined.

Will fix.

> +UnregisterCheckingRemoteServersCallback(CheckingRemoteServersCallback callback,
> +                                        void *arg)
>
> Is the above func called anywhere ?

Currently not; I just kept the API for the benefit of other FDW
extensions. But I cannot find any use cases, so I will remove it.

> + if (item->callback == callback && item->arg == arg)
>
> Is comparing argument pointers stable ? What if the same argument is cloned ?

This part is no longer needed. Will remove.

> + CallCheckingRemoteServersCallbacks();
> +
> + if (remote_servers_connection_check_interval > 0)
> +     enable_timeout_after(CHECKING_REMOTE_SERVERS_TIMEOUT,
> +                          remote_servers_connection_check_interval);
>
> Should the call to CallCheckingRemoteServersCallbacks() be placed under the
> if block checking remote_servers_connection_check_interval ?
> If remote_servers_connection_check_interval is 0, it seems the callbacks
> shouldn't be called.

Agreed. We force stopping the timeout when the GUC is set to 0 in the
assign hook, so your suggestion is consistent with that. Will move it.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
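The register/call/unregister pattern under review above can be sketched in
a few lines. This is a simplified, hypothetical illustration of the
callback-list idea, not the patch's actual code; the type and function
names are stand-ins, and it uses plain malloc()/free() rather than
PostgreSQL's memory contexts.

```c
#include <stdlib.h>

/* Simplified sketch of a callback list; real names/types may differ. */
typedef void (*CheckServersCallback) (void *arg);

typedef struct CallbackItem
{
	CheckServersCallback callback;
	void	   *arg;
	struct CallbackItem *next;
} CallbackItem;

static CallbackItem *fdw_callbacks = NULL;
static int	check_count = 0;

/* Prepend a callback to the list. */
static void
register_callback(CheckServersCallback callback, void *arg)
{
	CallbackItem *item = malloc(sizeof(CallbackItem));

	item->callback = callback;
	item->arg = arg;
	item->next = fdw_callbacks;
	fdw_callbacks = item;
}

/* Invoke every registered callback in turn. */
static void
call_callbacks(void)
{
	for (CallbackItem *item = fdw_callbacks; item != NULL; item = item->next)
		item->callback(item->arg);
}

/* Unregister all callbacks, freeing the whole list. */
static void
unregister_all_callbacks(void)
{
	CallbackItem *item = fdw_callbacks;

	while (item != NULL)
	{
		CallbackItem *next = item->next;

		free(item);
		item = next;
	}
	fdw_callbacks = NULL;
}

/* Demo callback: accumulate the int pointed to by arg. */
static void
count_check(void *arg)
{
	check_count += *(int *) arg;
}
```

Note that unregistering a *single* entry would have to match on the
(callback, arg) pointer pair, which is exactly the stability concern raised
in the review; dropping that per-entry API in favor of
unregister_all_callbacks() sidesteps it.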
Re: TAP test to cover "EndOfLogTLI != replayTLI" case
On Tue, Jan 18, 2022 at 10:31 AM Kyotaro Horiguchi wrote:
>
> At Tue, 18 Jan 2022 12:25:15 +0800, Julien Rouhaud wrote in
> > Hi,
> >
> > On Mon, Jan 17, 2022 at 09:33:57PM +0800, Julien Rouhaud wrote:
> > >
> > > Thanks for the updated patch! Note that thanks to Andres and Thomas'
> > > work, you can now easily rely on the exact same CI as the cfbot on
> > > your own github repository, if you need to debug something on a
> > > platform you don't have access to. You can check the documentation
> > > at [1] for more detail on how to set up the CI.
> >
> > The cfbot reports that the patch still fails on Windows but also failed on
> > macos with the same error: https://cirrus-ci.com/task/5655777858813952:
> >
> > [14:20:43.950] # Failed test 'check standby content before timeline switch 0/500FAF8'
> > [14:20:43.950] # at t/003_recovery_targets.pl line 239.
> > [14:20:43.950] # got: '6000'
> > [14:20:43.950] # expected: '7000'
> > [14:20:43.950] # Looks like you failed 1 test of 10.
> >
> > I'm switching the CF entry to Waiting on Author.
>
> The most significant advantages of the local CI setup are
>
> - CI immediately responds to push.
>
> - You can dirty the code with additional logging aid as much as you
>   like to see closely what is going on. It makes us hesitant to do
>   the same on this ML :p

Indeed :)

I found the cause of the test failing on Windows -- it is due to the
custom archive command setting, which wasn't setting the correct Windows
archive directory path. There is no option to choose a custom WAL archive
and restore path in the TAP tests. The attached 0001 patch proposes the
same: it enables you to choose a custom WAL archive and restore directory.
The only concern I have is that $node->info() will print the wrong archive
directory path in that case; do we need to fix that? We might need to
store archive_dir like base_dir, I wasn't sure about that. Thoughts?
Comments?
Regards,
Amul

From 4bc968d3508e9f608fd572f40f39e2cb374d53f4 Mon Sep 17 00:00:00 2001
From: Amul Sul
Date: Wed, 19 Jan 2022 23:22:02 -0500
Subject: [PATCH v4 1/3] Allow TAP tests to choose custom WAL archive and
 restore path.

---
 src/test/perl/PostgreSQL/Test/Cluster.pm | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index b7d4c24553..6a562db2ce 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -1033,10 +1033,13 @@ primary_conninfo='$root_connstr'
 # Internal routine to enable archive recovery command on a standby node
 sub enable_restoring
 {
-	my ($self, $root_node, $standby) = @_;
-	my $path = PostgreSQL::Test::Utils::perl2host($root_node->archive_dir);
+	my ($self, $root_node, $standby, $path) = @_;
 	my $name = $self->name;
 
+	# Default archive directory will be used if not specified.
+	$path = $root_node->archive_dir if (!defined($path));
+	$path = PostgreSQL::Test::Utils::perl2host($path);
+
 	print "### Enabling WAL restore for node \"$name\"\n";
 
 	# On Windows, the path specified in the restore command needs to use
@@ -1101,10 +1104,13 @@ sub set_standby_mode
 # Internal routine to enable archiving
 sub enable_archiving
 {
-	my ($self) = @_;
-	my $path = PostgreSQL::Test::Utils::perl2host($self->archive_dir);
+	my ($self, $path) = @_;
 	my $name = $self->name;
 
+	# Default archive directory will be used if not specified.
+	$path = $self->archive_dir if (!defined($path));
+	$path = PostgreSQL::Test::Utils::perl2host($path);
+
 	print "### Enabling WAL archiving for node \"$name\"\n";
 
 	# On Windows, the path specified in the restore command needs to use
-- 
2.18.0

From 6c730ee10b9e64d90982aa555283c9094a5fdb87 Mon Sep 17 00:00:00 2001
From: Amul Sul
Date: Wed, 19 Jan 2022 23:25:07 -0500
Subject: [PATCH v4 2/3] TAP test for EndOfLogTLI

---
 src/test/recovery/t/003_recovery_targets.pl | 47 -
 1 file changed, 46 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index 24da78c0bc..b3f7076540 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -6,7 +6,7 @@ use strict;
 use warnings;
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
-use Test::More tests => 9;
+use Test::More tests => 10;
 use Time::HiRes qw(usleep);
 
 # Create and test a standby from given backup, with a certain recovery target.
@@ -182,3 +182,48 @@ $logfile = slurp_file($node_standby->logfile());
 ok( $logfile =~
 	  qr/FATAL: .* recovery ended before configured recovery target was reached/,
 	'recovery end before target reached is a fatal error');
+
+# Test to cover a case where that we are looking for WAL record that ought to be
+# in for e.g 00010005 we don't find it; instead we find
+# 00020005 because of
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Sun, 16 Jan 2022 12:43:03 +0800, Julien Rouhaud wrote in
> Other than that, I see that the TAP tests are failing on all the
> environments, due to Perl errors. For instance:

Perl seems to have changed its behavior for undef hashes. It is said that
"if (%undef_hash)" is false, but actually it is true and "keys %undef_hash"
is 1. Finally I had to make backup_tablespaces() return a hash reference.

The pg_basebackup test takes a backup in tar mode, which broke the test
infrastructure. Cluster::backup now skips symlink adjustment when the
backup contains "/base.tar".

I gave up testing on Windows in my own environment and used Cirrus CI.

# However, it works for confirmation of established code. The TAT of CI
# is still too long for trial and error on unestablished code..

This version works for Unixen but still doesn't for Windows. I'm
searching for a fix for Windows.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

From 5f88a80b9a585ca611ab6424f035330a47b2449f Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Thu, 11 Nov 2021 20:42:00 +0900
Subject: [PATCH v15 1/3] Add tablespace support to TAP framework

The TAP framework doesn't support nodes that have tablespaces. In
particular, backup and initialization from backups failed if the source
node has tablespaces. This commit provides a simple way to create
tablespace directories and allows backup routines to handle tablespaces.
---
 src/bin/pg_basebackup/t/010_pg_basebackup.pl |   2 +-
 src/test/perl/PostgreSQL/Test/Cluster.pm     | 264 ++-
 src/test/perl/PostgreSQL/Test/Utils.pm       |  43 +++
 3 files changed, 306 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index f0243f28d4..c139b5e000 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -257,7 +257,7 @@ $node->safe_psql('postgres',
 	"CREATE TABLESPACE tblspc1 LOCATION '$realTsDir';");
 $node->safe_psql('postgres',
 	"CREATE TABLE test1 (a int) TABLESPACE tblspc1;"
-	. "INSERT INTO test1 VALUES (1234);");
+	  . "INSERT INTO test1 VALUES (1234);");
 $node->backup('tarbackup2', backup_options => ['-Ft']);
 # empty test1, just so that it's different from the to-be-restored data
 $node->safe_psql('postgres', "TRUNCATE TABLE test1;");
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index b7d4c24553..d433ccf610 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -298,6 +298,64 @@ sub archive_dir
 
 =pod
 
+=item $node->tablespace_storage([, nocreate])
+
+Directory to store tablespace directories.
+If nocreate is true, returns undef if not yet created.
+
+=cut
+
+sub tablespace_storage
+{
+	my ($self, $nocreate) = @_;
+
+	if (!defined $self->{_tsproot})
+	{
+		# tablespace is not used, return undef if nocreate is specified.
+		return undef if ($nocreate);
+
+		# create and remember the tablespace root directory.
+		$self->{_tsproot} = PostgreSQL::Test::Utils::tempdir_short();
+	}
+
+	return $self->{_tsproot};
+}
+
+=pod
+
+=item $node->tablespaces()
+
+Returns a hash from tablespace OID to tablespace directory name. For
+example, an oid 16384 pointing to /tmp/jWAhkT_fs0/ts1 is stored as
+$hash{16384} = "ts1".
+
+=cut
+
+sub tablespaces
+{
+	my ($self) = @_;
+	my $pg_tblspc = $self->data_dir . '/pg_tblspc';
+	my %ret;
+
+	# return undef if no tablespace is used
+	return undef if (!defined $self->tablespace_storage(1));
+
+	# collect tablespace entries in pg_tblspc directory
+	opendir(my $dir, $pg_tblspc);
+	while (my $oid = readdir($dir))
+	{
+		next if ($oid !~ /^([0-9]+)$/);
+		my $linkpath = "$pg_tblspc/$oid";
+		my $tsppath = PostgreSQL::Test::Utils::dir_readlink($linkpath);
+		$ret{$oid} = File::Basename::basename($tsppath);
+	}
+	closedir($dir);
+
+	return %ret;
+}
+
+=pod
+
 =item $node->backup_dir()
 
 The output path for backups taken with $node->backup()
@@ -313,6 +371,77 @@ sub backup_dir
 
 =pod
 
+=item $node->backup_tablespace_storage_path(backup_name)
+
+Returns the tablespace location path for backup_name.
+Returns the parent directory if backup_name is not given.
+
+=cut
+
+sub backup_tablespace_storage_path
+{
+	my ($self, $backup_name) = @_;
+	my $dir = $self->backup_dir . '/__tsps';
+
+	$dir .= "/$backup_name" if (defined $backup_name);
+
+	return $dir;
+}
+
+=pod
+
+=item $node->backup_create_tablespace_storage(backup_name)
+
+Create the tablespace location directory for backup_name if not done yet.
+Create the parent tablespace storage that holds all location
+directories if backup_name is not supplied.
+
+=cut
+
+sub backup_create_tablespace_storage
+{
+	my ($self, $backup_name) = @_;
+	my $dir = $self->backup_tablespace_storage_path($backup_name);
+
+	File::Path::make_path $dir if (! -d $dir);
+}
+
+=pod
+
+=item $node->backup_tablespaces(backup_name)
+
+Returns a reference to hash from tablespace OID to
Re: Map WAL segment files on PMEM as WAL buffers
Hi Justin,

Here is patchset v8. It should pass "make check-world" and Cirrus. Would
you try this one?

The v8 squashes some patches in v7 into related ones, and adds the
following patches:

- v8-0003: Add wal_pmem_map to postgresql.conf.sample. It also helps
  v8-0011.
- v8-0009: Fix wrong handling of missingContrecPtr so that
  test/recovery/t/026 passes. It is the cause of the error. Thanks for
  your report.
- v8-0010 and v8-0011: Each of the two is for CI only. v8-0010 adds
  --with-libpmem and v8-0011 enables "wal_pmem_map = on".

Please note that, unlike your suggestion, in my patchset
PMEM_IS_PMEM_FORCE=1 will be given as an environment variable in
.cirrus.yml and "wal_pmem_map = on" will be given by initdb.

Regards,
Takashi

-- 
Takashi Menjo

Attachments:
- v8-0001-Add-with-libpmem-option-for-PMEM-support.patch
- v8-0002-Add-wal_pmem_map-to-GUC.patch
- v8-0003-Add-wal_pmem_map-to-postgresql.conf.sample.patch
- v8-0004-Export-InstallXLogFileSegment.patch
- v8-0005-Map-WAL-segment-files-on-PMEM-as-WAL-buffers.patch
- v8-0006-WAL-statistics-in-cases-of-wal_pmem_map-true.patch
- v8-0007-Update-document.patch
- v8-0008-Preallocate-and-initialize-more-WAL-if-wal_pmem_m.patch
- v8-0009-Fix-wrong-handling-of-missingContrecPtr.patch
- v8-0010-For-CI-only-Setup-Cirrus-CI-for-with-libpmem.patch
- v8-0011-For-CI-only-Modify-initdb-for-wal_pmem_map-on.patch
Re: Refactoring of compression options in pg_basebackup
On Wed, Jan 19, 2022 at 12:50:44PM -0300, Alvaro Herrera wrote:
> Note there is an extra [ before the {gzip bit.

Thanks!

-- 
Michael
Re: POC: GROUP BY optimization
I keep working on this patch. Here are intermediate results.

On 7/22/21 3:58 AM, Tomas Vondra wrote:
> in the first loop. Which seems pretty bogus - why would there be just
> two groups? When processing the first expression, it's as if there was
> one big "prev group" with all the tuples, so why not to just use
> nGroups as it is?

I think the heapsort code seems very strange. Look at the fallback case:
it is based on an output_tuples value. Maybe we should use the nGroups
value here, but based on the number of output_tuples?

> 1) I looked at the resources mentioned as sources the formulas came
> from, but I've been unable to really match the algorithm to them. The
> quicksort paper is particularly "dense", the notation seems to be very
> different, and none of the theorems seem like an obvious fit. Would be
> good to make the relationship clearer in comments etc.

Fixed (see attachment).

> 3) I'm getting a bit skeptical about the various magic coefficients
> that are meant to model higher costs with non-uniform distribution.
> But consider that we do this, for example:
>
>     tuplesPerPrevGroup = ceil(1.5 * tuplesPerPrevGroup / nGroups);
>
> but then in the next loop we call estimate_num_groups_incremental and
> pass this "tweaked" tuplesPerPrevGroup value to it. I'm pretty sure
> this may have various strange consequences - we'll calculate the
> nGroups based on the inflated value, and we'll calculate
> tuplesPerPrevGroup from that again - that seems susceptible to
> "amplification". We inflate tuplesPerPrevGroup by 50%, which means
> we'll get a higher nGroups estimate in the next loop - but not
> linearly. And then we'll calculate the inflated tuplesPerPrevGroup and
> estimated nGroups ...

The weighting coefficient '1.5' reflects our desire to minimize the
number of comparison operations on each subsequent attribute of the
pathkeys list. By increasing this coefficient we increase the chance that
the planner will order pathkeys by decreasing uniqueness. I think it's ok.
-- 
regards,
Andrey Lepikhov
Postgres Professional

From fbc5e6709550f485a2153dda97ef805700717f23 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov
Date: Fri, 24 Dec 2021 13:01:48 +0500
Subject: [PATCH] My fixes.

---
 src/backend/optimizer/path/costsize.c | 33 ---
 src/backend/optimizer/path/pathkeys.c |  9 
 src/backend/optimizer/plan/planner.c  |  8 ---
 3 files changed, 20 insertions(+), 30 deletions(-)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index afb8ba54ea..70af9c91d5 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1845,19 +1845,17 @@ get_width_cost_multiplier(PlannerInfo *root, Expr *expr)
  * groups in the current pathkey prefix and the new pathkey), and the comparison
  * costs (which is data type specific).
  *
- * Estimation of the number of comparisons is based on ideas from two sources:
+ * Estimation of the number of comparisons is based on ideas from:
  *
- * 1) "Algorithms" (course), Robert Sedgewick, Kevin Wayne [https://algs4.cs.princeton.edu/home/]
+ * "Quicksort Is Optimal", Robert Sedgewick, Jon Bentley, 2002
+ * [https://www.cs.princeton.edu/~rs/talks/QuicksortIsOptimal.pdf]
  *
- * 2) "Quicksort Is Optimal For Many Equal Keys" (paper), Sebastian Wild,
- * arXiv:1608.04906v4 [cs.DS] 1 Nov 2017. [https://arxiv.org/abs/1608.04906]
- *
- * In term of that paper, let N - number of tuples, Xi - number of tuples with
- * key Ki, then the estimate of number of comparisons is:
+ * In term of that paper, let N - number of tuples, Xi - number of identical
+ * tuples with value Ki, then the estimate of number of comparisons is:
  *
  *	log(N! / (X1! * X2! * ..)) ~ sum(Xi * log(N/Xi))
  *
- * In our case all Xi are the same because now we don't have any estimation of
+ * We assume all Xi the same because now we don't have any estimation of
  * group sizes, we have only know the estimate of number of groups (distinct
  * values). In that case, formula becomes:
  *
@@ -1865,7 +1863,7 @@ get_width_cost_multiplier(PlannerInfo *root, Expr *expr)
  *
  * For multi-column sorts we need to estimate the number of comparisons for
  * each individual column - for example with columns (c1, c2, ..., ck) we
- * can estimate that number of comparions on ck is roughly
+ * can estimate that number of comparisons on ck is roughly
  *
  *	ncomparisons(c1, c2, ..., ck) / ncomparisons(c1, c2, ..., c(k-1))
  *
@@ -1874,10 +1872,10 @@ get_width_cost_multiplier(PlannerInfo *root, Expr *expr)
  *
  *	N * sum( Fk * log(Gk) )
  *
- * Note: We also consider column witdth, not just the comparator cost.
+ * Note: We also consider column width, not just the comparator cost.
  *
  * NOTE: some callers currently pass NIL for pathkeys because they
- * can't conveniently supply the sort keys. In this case, it will fallback to
+ * simple comparison cost estimate.
  */
 static Cost
@@ -1925,13
Re: pg_upgrade should truncate/remove its logs before running
On Wed, Jan 19, 2022 at 09:59:14PM -0600, Justin Pryzby wrote:
> We require that the dir not exist, by testing if (mkdir()).
> So it's okay if someone specifies ../whatever or $CWD.

What I am scared of here is the use of rmtree() if we allow something
like that. So we should either keep the removal code in its original
shape and allow such cases, or restrict the output path. At the end,
something has to change. My points are in favor of the latter because I
don't really see anybody doing the former; you favor the former. Now, we
are not talking about a lot of code for either of these, anyway. Perhaps
we'd better wait for more opinions.

-- 
Michael
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Mon, 17 Jan 2022 17:24:43 +0900 (JST), Kyotaro Horiguchi wrote in
> At Sun, 16 Jan 2022 12:43:03 +0800, Julien Rouhaud wrote in
> > I'm not very familiar with windows, but maybe using strawberry perl
> > instead ([1]) would fix your problem? I think it's also quite popular
> > and is commonly used to run pgBadger on Windows.
>
> Thanks! I'll try it later.

The build is stopped by some unresolvable symbols. Strawberry perl is
5.28, which doesn't expose new_ctype, new_collate and new_numeric
according to the past discussion. (Active perl is 5.32.)

https://www.postgresql.org/message-id/20200501134711.08750c5f%40antares.wagner.home

However, the patch provided revealed around 70 other unresolved symbol
errors...

# Hmm. perl on CentOS 8 is 5.26..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: row filtering for logical replication
On Thu, Jan 20, 2022 at 2:29 PM Amit Kapila wrote: > > On Thu, Jan 20, 2022 at 7:51 AM Peter Smith wrote: > > > > Here are some review comments for v68-0001. > > > > > > 3. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter_init > > > > + /* If no filter found, clean up the memory and return */ > > + if (!has_filter) > > + { > > + if (entry->cache_expr_cxt != NULL) > > + MemoryContextDelete(entry->cache_expr_cxt); > > + > > + entry->exprstate_valid = true; > > + return; > > + } > > > > IMO this should be refactored to have if/else, so the function has > > just a single point of return and a single point where the > > exprstate_valid is set. e.g. > > > > if (!has_filter) > > { > > /* If no filter found, clean up the memory and return */ > > ... > > } > > else > > { > > /* Create or reset the memory context for row filters */ > > ... > > /* > > * Now all the filters for all pubactions are known. Combine them when > > * their pubactions are same. > > ... > > } > > > > entry->exprstate_valid = true; > > > > This part of the code is changed in v68 which makes the current code > more suitable as we don't need to deal with memory context in if part. > I am not sure if it is good to add the else block here but I think > that is just a matter of personal preference. > Sorry, my mistake - I quoted the v67 source instead of the v68 source. There is no else needed now at all. My suggestion becomes. e.g. if (has_filter) { // deal with mem ctx ... // combine filters ... } entry->exprstate_valid = true; return; -- Kind Regards, Peter Smith Fujitsu Australia
Re: pg_upgrade should truncate/remove its logs before running
On Thu, Jan 20, 2022 at 12:01:29PM +0900, Michael Paquier wrote: > On Wed, Jan 19, 2022 at 06:05:40PM -0600, Justin Pryzby wrote: > > > I'm not sure these restrictions are needed ? > > This could lead to issues with rmtree() if we are not careful enough, > no? We'd had our deal of argument injections with pg_upgrade commands > in the past (fcd15f1). We require that the dir not exist, by testing if (mkdir()). So it's okay if someone specifies ../whatever or $CWD. -- Justin
Re: row filtering for logical replication
On Thu, Jan 20, 2022 at 7:51 AM Peter Smith wrote: > > 5. DEBUG level 3 > > I found there are 3 debug logs in this patch and they all have DEBUG3 level. > > IMO it is probably OK as-is, > +1. > but just a comparison I noticed that the > most detailed logging for logical replication worker.c was DEBUG2. > Perhaps row-filter patch should be using DEBUG2 also? > OTOH, the other related files like reorderbuffer.c and snapbuild.c are using DEBUG3 for the detailed messages. So, I think it is probably okay to retain logs as is. -- With Regards, Amit Kapila.
Re: row filtering for logical replication
On Thu, Jan 20, 2022 at 7:51 AM Peter Smith wrote: > > Here are some review comments for v68-0001. > > > 3. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter_init > > + /* If no filter found, clean up the memory and return */ > + if (!has_filter) > + { > + if (entry->cache_expr_cxt != NULL) > + MemoryContextDelete(entry->cache_expr_cxt); > + > + entry->exprstate_valid = true; > + return; > + } > > IMO this should be refactored to have if/else, so the function has > just a single point of return and a single point where the > exprstate_valid is set. e.g. > > if (!has_filter) > { > /* If no filter found, clean up the memory and return */ > ... > } > else > { > /* Create or reset the memory context for row filters */ > ... > /* > * Now all the filters for all pubactions are known. Combine them when > * their pubactions are same. > ... > } > > entry->exprstate_valid = true; > This part of the code is changed in v68 which makes the current code more suitable as we don't need to deal with memory context in if part. I am not sure if it is good to add the else block here but I think that is just a matter of personal preference. -- With Regards, Amit Kapila.
Re: Add last commit LSN to pg_last_committed_xact()
Hi,

On 2022-01-19 21:23:12 -0500, James Coleman wrote:
> { oid => '3537', descr => 'get identification of SQL object',
> diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
> index a5f9e9..2a026b0844 100644
> --- a/src/include/storage/proc.h
> +++ b/src/include/storage/proc.h
> @@ -258,6 +258,11 @@ struct PGPROC
> 	PGPROC	   *lockGroupLeader;	/* lock group leader, if I'm a member */
> 	dlist_head	lockGroupMembers;	/* list of members, if I'm a leader */
> 	dlist_node	lockGroupLink;	/* my member link, if I'm a member */
> +
> +	/*
> +	 * Last transaction metadata.
> +	 */
> +	XLogRecPtr	lastCommitLSN;	/* cache of last committed LSN */
> };

Might be worth forcing this to be on a separate cacheline than stuff more
hotly accessed by other backends, like the lock group stuff.

Greetings,

Andres Freund
Re: pg_upgrade should truncate/remove its logs before running
On Wed, Jan 19, 2022 at 06:05:40PM -0600, Justin Pryzby wrote: > I still don't know if it even needs to be configurable. I want this to be configurable to ease the switch of the pg_upgrade to TAP, moving the logs into a deterministic temporary location proper to each run. This makes reporting much easier on failure, with repeatable tests, and that's why I began poking at this patch first. > I'm not sure these restrictions are needed ? This could lead to issues with rmtree() if we are not careful enough, no? We'd had our deal of argument injections with pg_upgrade commands in the past (fcd15f1). > + outputpath = make_absolute_path(log_opts.basedir); > > + if (path_contains_parent_reference(outputpath)) > > + pg_fatal("reference to parent directory not allowed\n"); > > > Besides, you're passing the wrong path here. What would you suggest? I was just looking at that again this morning, and splitted the logic into two parts for the absolute and relative path cases, preventing all cases like that, which would be weird, anyway: ../ ../popo .././ ././ /direct/path/to/cwd/ /direct/path/../path/to/cwd/ >> + pg_upgrade, and is be removed after a successful > > remove "be" Fixed. >> + if (mkdir(log_opts.basedir, S_IRWXU | S_IRWXG | S_IRWXO)) > > S_IRWXG | S_IRWXO are useless due to the umask, right ? > Maybe use PG_DIR_MODE_OWNER ? Hmm. We could just use pg_dir_create_mode, then. See pg_rewind, as one example. This opens the door for something pluggable to SetDataDirectoryCreatePerm(), though the original use is kind of different with data folders. > You're printing the wrong var. filename_path is not initialized. Ugh. -- Michael From 2efd3252e189578f865eb54d076596387494585c Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Thu, 20 Jan 2022 11:58:59 +0900 Subject: [PATCH v4] pg_upgrade: write log files and dumps in subdir.. The subdir must not exist when pg_upgrade is started. This avoids appending to logfiles which have pre-existing errors in them. 
And allows more easily cleaning up after pg_upgrade. --- src/bin/pg_upgrade/.gitignore | 2 - src/bin/pg_upgrade/Makefile | 4 +- src/bin/pg_upgrade/check.c | 12 ++-- src/bin/pg_upgrade/dump.c | 6 +- src/bin/pg_upgrade/exec.c | 5 +- src/bin/pg_upgrade/function.c | 3 +- src/bin/pg_upgrade/option.c | 29 ++-- src/bin/pg_upgrade/pg_upgrade.c | 120 +--- src/bin/pg_upgrade/pg_upgrade.h | 7 ++ src/bin/pg_upgrade/server.c | 6 +- doc/src/sgml/ref/pgupgrade.sgml | 17 - 11 files changed, 145 insertions(+), 66 deletions(-) diff --git a/src/bin/pg_upgrade/.gitignore b/src/bin/pg_upgrade/.gitignore index 2d3bfeaa50..939e50db6c 100644 --- a/src/bin/pg_upgrade/.gitignore +++ b/src/bin/pg_upgrade/.gitignore @@ -1,9 +1,7 @@ /pg_upgrade # Generated by test suite -/pg_upgrade_internal.log /delete_old_cluster.sh /delete_old_cluster.bat /reindex_hash.sql -/loadable_libraries.txt /log/ /tmp_check/ diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile index 44d06be5a6..e833a1c5f0 100644 --- a/src/bin/pg_upgrade/Makefile +++ b/src/bin/pg_upgrade/Makefile @@ -45,9 +45,7 @@ uninstall: clean distclean maintainer-clean: rm -f pg_upgrade$(X) $(OBJS) rm -rf delete_old_cluster.sh log/ tmp_check/ \ - loadable_libraries.txt reindex_hash.sql \ - pg_upgrade_dump_globals.sql \ - pg_upgrade_dump_*.custom pg_upgrade_*.log + reindex_hash.sql pg_upgrade_output.d/ # When $(MAKE) is present, make automatically infers that this is a # recursive make. 
which is not actually what we want here, as that diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c index 3d218c2ad2..019bcb6c7b 100644 --- a/src/bin/pg_upgrade/check.c +++ b/src/bin/pg_upgrade/check.c @@ -783,7 +783,8 @@ check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster) return; } - snprintf(output_path, sizeof(output_path), + snprintf(output_path, sizeof(output_path), "%s/%s", + log_opts.basedir, "contrib_isn_and_int8_pass_by_value.txt"); for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++) @@ -860,7 +861,8 @@ check_for_user_defined_postfix_ops(ClusterInfo *cluster) prep_status("Checking for user-defined postfix operators"); - snprintf(output_path, sizeof(output_path), + snprintf(output_path, sizeof(output_path), "%s/%s", + log_opts.basedir, "postfix_ops.txt"); /* Find any user defined postfix operators */ @@ -959,7 +961,8 @@ check_for_tables_with_oids(ClusterInfo *cluster) prep_status("Checking for tables WITH OIDS"); - snprintf(output_path, sizeof(output_path), + snprintf(output_path, sizeof(output_path), "%s/%s", +
Re: Null commitTS bug
At Tue, 18 Jan 2022 13:48:11 +0900 (JST), Kyotaro Horiguchi wrote in
> At Tue, 18 Jan 2022 10:43:55 +0900, Michael Paquier wrote in
> > On Sun, Jan 16, 2022 at 11:01:25PM -0500, Tom Lane wrote:
> > > Isn't that a very bad way to write "i = j + 1"?
> > >
> > > I agree with Horiguchi-san that
> > > for (i = 0, headxid = xid;;)
> >
> > Okay. Horiguchi-san, would you like to write a patch?
>
> Yes, I will.

This is that. I think this is a separate issue from the actual bug. This
is applicable at least back to 9.6, and I think it should be applied back
to all supported versions to avoid future backpatch conflicts.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 9211fa61513dcdbca273656454395a3dcf3ee4e7 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Thu, 20 Jan 2022 10:16:48 +0900
Subject: [PATCH] Improve confusing code in TransactionTreeSetCommitTsData

TransactionTreeSetCommitTsData has a bit confusing use of the +=
operator. Simplifying it makes the code easier to follow. In the same
function the for-loop initializes only non-controlling variables, which
is not great style. They ought to be initialized outside the loop.
---
 src/backend/access/transam/commit_ts.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 659109f8d4..88eac10456 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -168,7 +168,10 @@ TransactionTreeSetCommitTsData(TransactionId xid, int nsubxids,
 	 * subxid not on the previous page as head. This way, we only have to
 	 * lock/modify each SLRU page once.
 	 */
-	for (i = 0, headxid = xid;;)
+	headxid = xid;
+	i = 0;
+
+	for (;;)
 	{
 		int			pageno = TransactionIdToCTsPage(headxid);
 		int			j;
@@ -192,7 +195,7 @@ TransactionTreeSetCommitTsData(TransactionId xid, int nsubxids,
 		 * just wrote.
 		 */
 		headxid = subxids[j];
-		i += j - i + 1;
+		i = j + 1;
 	}

 	/* update the cached value in shared memory */
--
2.27.0
Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
At Thu, 20 Jan 2022 00:36:32 +, "Bossart, Nathan" wrote in > On 1/3/22, 5:52 PM, "Kyotaro Horiguchi" wrote: > > It seems to me "LSN" or just "location" is more confusing or > > mysterious than "REDO LSN" for the average user. If we want to avoid > > being technically too detailed, we would use just "start LSN=%X/%X, > > end LSN=%X/%X". And it is equivalent to "WAL range=[%X/%X, %X/%X]".. > > My first instinct was that this should stay aligned with > pg_controldata, but that would mean using "location=%X/%X, REDO > location=%X/%X," which doesn't seem terribly descriptive. IIUC the > "checkpoint location" is the LSN of the WAL record for the checkpoint, > and the "checkpoint's REDO location" is the LSN where checkpoint > creation began (i.e., what you must retain for crash recovery). My > vote is for "start=%X/%X, end=%X/%X." +1. Works for me. %X/%X itself expresses it is an LSN. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Document atthasmissing default optimization avoids verification table scan
On Wed, Jan 19, 2022 at 6:14 PM James Coleman wrote: > I'm open to the idea of wordsmithing here, of course, but I strongly > disagree that this is irrelevant data. Ok, but wording aside, only changing a tip in the DDL - Add Table section doesn't seem like a complete fix. The notes in alter table, where I'd look for such an official directive first, need to be touched as well. For the alter table docs maybe change/add to the existing sentence below (I'm in favor of not pointing out that scanning the table doesn't mean we are rewriting it, but maybe I'm making another unwarranted assumption regarding obviousness...). "Adding a CHECK or NOT NULL constraint requires scanning the table [but not rewriting it] to verify that existing rows meet the constraint. It is skipped when done as part of ADD COLUMN unless a table rewrite is required anyway." On that note, does the check constraint interplay with the default rewrite avoidance in the same way? For the Tip I'd almost rather redo it to say: "Before PostgreSQL 11, adding a new column to a table required rewriting that table, making it a very slow operation. More recent versions can sometimes optimize away this rewrite and related validation scans. See the notes in ALTER TABLE for details." Though I suppose I'd accept something like (leave existing text, alternative patch text): "[...]large tables.\nIf the added column also has a not null constraint the usual verification scan is also skipped." "constant" is used in the Tip, "non-volatile" is used in alter table - hence a desire to have just one source of truth, with alter table being the correct place. We should sync them up otherwise. David J.
Re: Add last commit LSN to pg_last_committed_xact()
On Tue, Jan 18, 2022 at 9:19 PM Andres Freund wrote: > > > + LWLockAcquire(ProcArrayLock, LW_SHARED); > > + lsn = ShmemVariableCache->finishedProcsLastCommitLSN; > > + for (index = 0; index < ProcGlobal->allProcCount; index++) > > + { > > + XLogRecPtr procLSN = > > ProcGlobal->allProcs[index].lastCommitLSN; > > + if (procLSN > lsn) > > + lsn = procLSN; > > + } > > + LWLockRelease(ProcArrayLock); > > I think it'd be better to go through the pgprocnos infrastructure, so that > only connected procs need to be checked. > > LWLockAcquire(ProcArrayLock, LW_SHARED); > for (i = 0; i < arrayP->numProcs; i++) > { > int pgprocno = arrayP->pgprocnos[i]; > PGPROC *proc = [pgprocno]; > > if (proc->lastCommitLSN > lsn) >lsn =proc->lastCommitLSN; >} > > > > diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h > > index a5f9e9..2a026b0844 100644 > > --- a/src/include/storage/proc.h > > +++ b/src/include/storage/proc.h > > @@ -258,6 +258,11 @@ struct PGPROC > > PGPROC *lockGroupLeader;/* lock group leader, if I'm a member > > */ > > dlist_head lockGroupMembers; /* list of members, if I'm a > > leader */ > > dlist_node lockGroupLink; /* my member link, if I'm a member */ > > + > > + /* > > + * Last transaction metadata. > > + */ > > + XLogRecPtr lastCommitLSN; /* cache of last committed > > LSN */ > > }; > > We do not rely on 64bit integers to be read/written atomically, just 32bit > ones. To make this work for older platforms you'd have to use a > pg_atomic_uint64. On new-ish platforms pg_atomic_read_u64/pg_atomic_write_u64 > end up as plain read/writes, but on older ones they'd do the necessarily > locking to make that safe... All right, here's an updated patch. The final interface (new function or refactor the existing not to rely on commit_ts) is still TBD (and I'd appreciate input on that from Alvaro and others). 
Thanks, James Coleman From 4b1d2db87be0090bc8840e1c6d990443dccb5237 Mon Sep 17 00:00:00 2001 From: jcoleman Date: Fri, 14 Jan 2022 22:26:10 + Subject: [PATCH v3] Expose LSN of last commit via pg_last_committed_xact --- src/backend/access/transam/commit_ts.c | 25 +-- src/backend/access/transam/twophase.c | 2 +- src/backend/access/transam/xact.c | 10 +--- src/backend/storage/ipc/procarray.c| 34 ++ src/include/access/commit_ts.h | 5 ++-- src/include/access/transam.h | 2 ++ src/include/catalog/pg_proc.dat| 6 ++--- src/include/storage/proc.h | 5 src/include/storage/procarray.h| 2 ++ 9 files changed, 73 insertions(+), 18 deletions(-) diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c index 659109f8d4..f0c3185021 100644 --- a/src/backend/access/transam/commit_ts.c +++ b/src/backend/access/transam/commit_ts.c @@ -33,10 +33,11 @@ #include "funcapi.h" #include "miscadmin.h" #include "pg_trace.h" -#include "storage/shmem.h" +#include "storage/procarray.h" #include "utils/builtins.h" #include "utils/snapmgr.h" #include "utils/timestamp.h" +#include "utils/pg_lsn.h" /* * Defines for CommitTs page sizes. 
A page is the same BLCKSZ as is used @@ -93,6 +94,7 @@ static SlruCtlData CommitTsCtlData; typedef struct CommitTimestampShared { TransactionId xidLastCommit; + XLogRecPtr lsnLastCommit; CommitTimestampEntry dataLastCommit; bool commitTsActive; } CommitTimestampShared; @@ -135,7 +137,7 @@ static void WriteTruncateXlogRec(int pageno, TransactionId oldestXid); void TransactionTreeSetCommitTsData(TransactionId xid, int nsubxids, TransactionId *subxids, TimestampTz timestamp, - RepOriginId nodeid) + XLogRecPtr commitLsn, RepOriginId nodeid) { int i; TransactionId headxid; @@ -414,24 +416,28 @@ pg_last_committed_xact(PG_FUNCTION_ARGS) TransactionId xid; RepOriginId nodeid; TimestampTz ts; - Datum values[3]; - bool nulls[3]; + XLogRecPtr lsn; + Datum values[4]; + bool nulls[4]; TupleDesc tupdesc; HeapTuple htup; /* and construct a tuple with our data */ xid = GetLatestCommitTsData(, ); + lsn = GetLastCommitLSN(); /* * Construct a tuple descriptor for the result row. This must match this * function's pg_proc entry! */ - tupdesc = CreateTemplateTupleDesc(3); + tupdesc = CreateTemplateTupleDesc(4); TupleDescInitEntry(tupdesc, (AttrNumber) 1, "xid", XIDOID, -1, 0); TupleDescInitEntry(tupdesc, (AttrNumber) 2, "timestamp", TIMESTAMPTZOID, -1, 0); - TupleDescInitEntry(tupdesc, (AttrNumber) 3, "roident", + TupleDescInitEntry(tupdesc, (AttrNumber) 3, "lsn", + PG_LSNOID, -1, 0); + TupleDescInitEntry(tupdesc, (AttrNumber) 4, "roident", OIDOID, -1, 0); tupdesc = BlessTupleDesc(tupdesc); @@ -447,8 +453,11 @@
Re: row filtering for logical replication
Here are some review comments for v68-0001. ~~~ 1. Commit message "When a publication is defined or modified, rows that don't satisfy an optional WHERE clause will be filtered out." That wording seems strange to me - it sounds like the filtering takes place at the point of creating/altering. Suggest reword something like: "When a publication is defined or modified, an optional WHERE clause can be specified. Rows that don't satisfy this WHERE clause will be filtered out." ~~~ 2. Commit message "The WHERE clause allows simple expressions that don't have user-defined functions, operators..." Suggest adding the word ONLY: "The WHERE clause only allows simple expressions that don't have user-defined functions, operators..." ~~~ 3. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter_init + /* If no filter found, clean up the memory and return */ + if (!has_filter) + { + if (entry->cache_expr_cxt != NULL) + MemoryContextDelete(entry->cache_expr_cxt); + + entry->exprstate_valid = true; + return; + } IMO this should be refactored to have if/else, so the function has just a single point of return and a single point where the exprstate_valid is set. e.g. if (!has_filter) { /* If no filter found, clean up the memory and return */ ... } else { /* Create or reset the memory context for row filters */ ... /* * Now all the filters for all pubactions are known. Combine them when * their pubactions are same. ... } entry->exprstate_valid = true; ~~~ 4. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter comment + /* + * We need this map to avoid relying on changes in ReorderBufferChangeType + * enum. + */ + static int map_changetype_pubaction[] = { + [REORDER_BUFFER_CHANGE_INSERT] = PUBACTION_INSERT, + [REORDER_BUFFER_CHANGE_UPDATE] = PUBACTION_UPDATE, + [REORDER_BUFFER_CHANGE_DELETE] = PUBACTION_DELETE + }; Suggest rewording comment and remove the double-spacing: BEFORE: "We need this map to avoid relying on changes in ReorderBufferChangeType enum." 
AFTER: "We need this map to avoid relying on ReorderBufferChangeType enums having specific values." ~~~ 5. DEBUG level 3 I found there are 3 debug logs in this patch and they all have DEBUG3 level. IMO it is probably OK as-is, but just a comparison I noticed that the most detailed logging for logical replication worker.c was DEBUG2. Perhaps row-filter patch should be using DEBUG2 also? -- Kind Regards, Peter Smith. Fujitsu Australia
Re: slowest tap tests - split or accelerate?
Hi,

On 2022-01-19 09:42:31 -0800, Andres Freund wrote:
> Both ours have this. Unfortunately on windows cp doesn't natively
> exist. Although git does provide it. I tried a few things that appear
> to be natively available (time is best of three executions):
>
> gnu cp from git, cp -a tmp_install\initdb_template t\
> 670ms
>
> xcopy.exe /E /Q tmp_install\initdb_template t\
> 638ms

This errors out if there's any forward slashes in paths, thinking it's a
flag. Seems out.

> robocopy /e /NFL /NDL tmp_install\initdb_template t\
> 575ms
>
> So I guess we could use robocopy? That's shipped as part of windows
> starting in windows 10... xcopy has been there for longer, so I might
> just default to that.

It's part of the OS back to at least windows 2016. I've found some random
links on the web [1] saying that it's included: "This command is available
in Vista and Windows 7 by default. For Windows XP and Server 2003 this
tool can be downloaded as part of Server 2003 Windows Resource Kit
tools."

Given that our oldest supported msvc version only runs on Windows 7
upwards [2], I think we should be good?

Alternatively we could lift copydir() to src/common? But that seems like
a bit more work than I want to put in.

For a second I was thinking that using something like copy --reflink=auto
could make a lot of sense for machines like florican, removing most of
the IO from a "templated initdb". But it looks like freebsd doesn't have
that, and it'd be a pain to figure out whether cp has --reflink.

Greetings,

Andres Freund

[1] https://www.windows-commandline.com/download-robocopy/
[2] https://docs.microsoft.com/en-us/visualstudio/releases/2013/vs2013-sysrequirements-vs
Re: Use generation context to speed up tuplesorts
On Tue, 18 Jan 2022 at 19:45, Julien Rouhaud wrote:
> I'm also a bit confused about which patch(es) should actually be
> reviewed in that thread. My understanding is that the original patch
> might not be relevant anymore but I might be wrong. The CF entry should
> probably be updated to clarify that, with an annotation / title change /
> author update or something.
>
> In the meantime I will switch the entry to Waiting on Author.

I think what's going on here is a bit confusing. There's quite a few
patches on here that aim to do very different things.

The patch I originally proposed was just to make tuplesort use generation
memory contexts. This helps speed up tuplesort because generation contexts
allow palloc() to be faster than the standard allocator. Additionally,
there's no power-of-2 wastage with generation contexts like there is with
the standard allocator. This helps fit more tuples on fewer CPU cache
lines.

I believe Andres then mentioned that the fixed block size for generation
contexts might become a problem. With the standard allocator the actual
size of the underlying malloc() grows up until a maximum size. With
generation contexts this is always the same, so we could end up with a
very large number of blocks which means lots of small mallocs which could
end up slow. Tomas then posted a series of patches to address this.

I then posted another patch that has the planner make a choice on the size
of the blocks to use for the generation context based on the tuple width
and number of tuple estimates. My hope there was to improve performance
further by making a better choice for how large to make the blocks in the
first place. This reduces the need for Tomas' patch, but does not
eliminate the need for it.

As of now, I still believe we'll need Tomas' patches to allow the block
size to grow up to a maximum size. I think those patches are likely needed
before we think about making tuplesort use generation contexts.

The reason I believe this is that we don't always have good estimates for
the number of tuples we're going to sort. nodeSort.c is fairly easy as it
just fetches tuples once and then sorts them. The use of tuplesort for
nodeIncrementalSort.c is much more complex as we'll likely be performing a
series of smaller sorts which are much harder to get size estimates for.
This means there's likely no magic block size that we can use for the
generation context that's used to store the tuples in tuplesort. This is
also the case for order by aggregates and probably most other usages of
tuplesort.

David
Re: Replace uses of deprecated Python module distutils.sysconfig
=?UTF-8?Q?Juan_Jos=C3=A9_Santamar=C3=ADa_Flecha?= writes: > On Wed, Jan 19, 2022 at 9:40 PM Tom Lane wrote: >> Done. (I couldn't find any equivalent logic in the MSVC build scripts >> though; is there something I missed?) > MSVC will use the path configured in src\tools\msvc\config.pl > $config->{"python"}, > there is no ambiguity. Ah, right. We have only three active Windows animals that are building with python, and all three are using 3.something, so that side of the house seems to be ready to go. regards, tom lane
Re: Document atthasmissing default optimization avoids verification table scan
On Wed, Jan 19, 2022 at 7:51 PM David G. Johnston wrote: > > On Wed, Jan 19, 2022 at 5:08 PM Bossart, Nathan wrote: >> >> On 9/24/21, 7:30 AM, "James Coleman" wrote: >> > When PG11 added the ability for ALTER TABLE ADD COLUMN to set a constant >> > default value without rewriting the table the doc changes did not note >> > how the new feature interplayed with ADD COLUMN DEFAULT NOT NULL. >> > Previously such a new column required a verification table scan to >> > ensure no values were null. That scan happens under an exclusive lock on >> > the table, so it can have a meaningful impact on database "accessible >> > uptime". >> >> I'm likely misunderstanding, but are you saying that adding a new >> column with a default value and a NOT NULL constraint used to require >> a verification scan? > > > As a side-effect of rewriting every live record in the table and indexes to > brand new files, yes. I doubt an actual independent scan was performed since > the only way for the newly written tuples to not have the default value > inserted would be a severe server bug. I've confirmed it wasn't a separate scan, but it does evaluate all constraints (it doesn't have any optimizations for skipping ones probably true by virtue of the new default). >> >> + Additionally adding a column with a constant default value avoids a >> + a table scan to verify no NULL values are present. >> >> Should this clarify that it's referring to NOT NULL constraints? >> > > This doesn't seem like relevant material to comment on. It's an > implementation detail that is sufficiently covered by "making the ALTER TABLE > very fast even on large tables". > > Also, the idea of performing that scan seems ludicrous. I just added the > column and told it to populate with default values - why do you need to check > that your server didn't miss any? I'm open to the idea of wordsmithing here, of course, but I strongly disagree that this is irrelevant data. 
There are plenty of optimizations Postgres could theoretically implement but doesn't, so measuring what should happen by what you think is obvious ("told it to populate with default values - why do you need to check") is clearly not valid. This patch actually came out of our specifically needing to verify that this is true before an op precisely because docs aren't actually clear and because we can't risk a large table scan under an exclusive lock. We're clearly not the only ones with that question; it came up in a comment on this blog post announcing the newly committed feature [1]. I realize that most users aren't as worried about this kind of specific detail about DDL as we are (requiring absolutely zero slow DDL while under an exclusive lock), but it is relevant to high uptime systems. Thanks, James Coleman 1: https://www.depesz.com/2018/04/04/waiting-for-postgresql-11-fast-alter-table-add-column-with-a-non-null-default/
Re: Remove redundant MemoryContextSwith in BeginCopyFrom
On Thu, 20 Jan 2022 at 08:50, Tom Lane wrote: > Japin Li writes: >> I see you have already push this patch on master (89f059bdf52), why not >> remove MemoryContextSwitchTo in the middle of BeginCopyTo in this commit? > > That was a different suggestion from a different person, so I didn't > want to muddle the credit. Also, it requires a bit of testing, > while 89f059bdf52 was visibly perfectly safe. > Thanks for your explanation! -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: Document atthasmissing default optimization avoids verification table scan
On Wed, Jan 19, 2022 at 5:08 PM Bossart, Nathan wrote: > On 9/24/21, 7:30 AM, "James Coleman" wrote: > > When PG11 added the ability for ALTER TABLE ADD COLUMN to set a constant > > default value without rewriting the table the doc changes did not note > > how the new feature interplayed with ADD COLUMN DEFAULT NOT NULL. > > Previously such a new column required a verification table scan to > > ensure no values were null. That scan happens under an exclusive lock on > > the table, so it can have a meaningful impact on database "accessible > > uptime". > > I'm likely misunderstanding, but are you saying that adding a new > column with a default value and a NOT NULL constraint used to require > a verification scan? > As a side-effect of rewriting every live record in the table and indexes to brand new files, yes. I doubt an actual independent scan was performed since the only way for the newly written tuples to not have the default value inserted would be a severe server bug. > + Additionally adding a column with a constant default value avoids a > + a table scan to verify no NULL values are present. > > Should this clarify that it's referring to NOT NULL constraints? > > This doesn't seem like relevant material to comment on. It's an implementation detail that is sufficiently covered by "making the ALTER TABLE very fast even on large tables". Also, the idea of performing that scan seems ludicrous. I just added the column and told it to populate with default values - why do you need to check that your server didn't miss any? David J.
Re: Remove redundant MemoryContextSwith in BeginCopyFrom
Japin Li writes: > I see you have already push this patch on master (89f059bdf52), why not > remove MemoryContextSwitchTo in the middle of BeginCopyTo in this commit? That was a different suggestion from a different person, so I didn't want to muddle the credit. Also, it requires a bit of testing, while 89f059bdf52 was visibly perfectly safe. regards, tom lane
Re: Remove redundant MemoryContextSwith in BeginCopyFrom
On Thu, 20 Jan 2022 at 00:38, Tom Lane wrote: > Bharath Rupireddy writes: >> +1. It looks like a thinko from c532d15d. There's no code in between, >> so switching to oldcontext doesn't make sense. > > Agreed. > >> I think we also need to remove MemoryContextSwitchTo(oldcontext); at >> the end of BeginCopyTo in copyto.c, because we are not changing memory >> contexts in between. > > Hmm, I think it'd be a better idea to remove the one in the middle of > BeginCopyTo. The code after that is still doing setup of the cstate, > so the early switch back looks to me like trouble waiting to happen. Agreed. I see you have already pushed this patch on master (89f059bdf52); why not remove the MemoryContextSwitchTo in the middle of BeginCopyTo in this commit? -- Regards, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
On 1/3/22, 5:52 PM, "Kyotaro Horiguchi" wrote: > It seems to me "LSN" or just "location" is more confusing or > mysterious than "REDO LSN" for the average user. If we want to avoid > being technically too detailed, we would use just "start LSN=%X/%X, > end LSN=%X/%X". And it is equivalent to "WAL range=[%X/%X, %X/%X]".. My first instinct was that this should stay aligned with pg_controldata, but that would mean using "location=%X/%X, REDO location=%X/%X," which doesn't seem terribly descriptive. IIUC the "checkpoint location" is the LSN of the WAL record for the checkpoint, and the "checkpoint's REDO location" is the LSN where checkpoint creation began (i.e., what you must retain for crash recovery). My vote is for "start=%X/%X, end=%X/%X." Nathan
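The `%X/%X` notation both candidate wordings use splits a 64-bit WAL position into its high and low 32-bit halves, each printed in hex. A quick illustration of that formatting (a Python sketch for clarity; the function name is mine, not server code):

```python
def format_lsn(lsn: int) -> str:
    # PostgreSQL prints a 64-bit LSN as high32/low32, both in uppercase
    # hex with no zero-padding -- mirroring the server's %X/%X convention.
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)

print(format_lsn(0x16B37A0))         # a low address: "0/16B37A0"
print(format_lsn((2 << 32) | 0x10))  # "2/10"
```

So "start=%X/%X, end=%X/%X" would render as something like start=0/16B37A0, end=0/16B3850.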
Re: Document atthasmissing default optimization avoids verification table scan
On 9/24/21, 7:30 AM, "James Coleman" wrote: > When PG11 added the ability for ALTER TABLE ADD COLUMN to set a constant > default value without rewriting the table the doc changes did not note > how the new feature interplayed with ADD COLUMN DEFAULT NOT NULL. > Previously such a new column required a verification table scan to > ensure no values were null. That scan happens under an exclusive lock on > the table, so it can have a meaningful impact on database "accessible > uptime". I'm likely misunderstanding, but are you saying that adding a new column with a default value and a NOT NULL constraint used to require a verification scan? + Additionally adding a column with a constant default value avoids a + a table scan to verify no NULL values are present. Should this clarify that it's referring to NOT NULL constraints? Nathan
Re: pg_upgrade should truncate/remove its logs before running
On Wed, Jan 19, 2022 at 05:13:18PM +0900, Michael Paquier wrote: > On Tue, Jan 11, 2022 at 10:08:13PM -0600, Justin Pryzby wrote: > > I asked about that before. Right now, it'll exit(1) when mkdir fails. > > > > I had written a patch to allow "." by skipping mkdir (or allowing it to > > fail if > > errno == EEXIST), but it seems like an awfully bad idea to try to make that > > work with rmtree(). I still don't know if it even needs to be configurable. > - Add some sanity check about the path used, aka no parent reference > allowed and the output path should not be a direct parent of the > current working directory. I'm not sure these restrictions are needed ? + outputpath = make_absolute_path(log_opts.basedir); + if (path_contains_parent_reference(outputpath)) + pg_fatal("reference to parent directory not allowed\n"); Besides, you're passing the wrong path here. > I have noticed a couple of incorrect things in the docs, and some > other things. It is a bit late here, so I may have missed a couple of > things but I'll look at this stuff once again in a couple of days. > + pg_upgrade, and is be removed after a successful remove "be" > + if (mkdir(log_opts.basedir, S_IRWXU | S_IRWXG | S_IRWXO)) S_IRWXG | S_IRWXO are useless due to the umask, right ? Maybe use PG_DIR_MODE_OWNER ? > + if (mkdir(log_opts.basedir, S_IRWXU | S_IRWXG | S_IRWXO)) > + pg_fatal("could not create directory \"%s\": %m\n", > filename_path); > + if (mkdir(log_opts.dumpdir, S_IRWXU | S_IRWXG | S_IRWXO)) > + pg_fatal("could not create directory \"%s\": %m\n", > filename_path); > + if (mkdir(log_opts.logdir, S_IRWXU | S_IRWXG | S_IRWXO)) > + pg_fatal("could not create directory \"%s\": %m\n", > filename_path); You're printing the wrong var. filename_path is not initialized. -- Justin
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
"Bossart, Nathan" writes: > I think the other side of this is that we don't want checkpointing to > continually fail because of a noncritical failure. That could also > lead to problems down the road. Yeah, a persistent failure to complete checkpoints is very nasty. Your disk will soon fill with unrecyclable WAL. I don't see how that's better than a somewhat hypothetical performance issue. regards, tom lane
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
On 1/19/22, 11:08 AM, "Andres Freund" wrote: > On 2022-01-19 13:34:21 -0500, Tom Lane wrote: >> As far as the patch itself goes, I agree that failure to unlink >> is noncritical, because such a file would have no further effect >> and we can just ignore it. > > I don't agree. We iterate through the directory regularly on systems with > catalog changes + logical decoding. An ever increasing list of gunk will make > that more and more expensive. And I haven't heard a meaningful reason why we > would have map-* files that we can't remove. I think the other side of this is that we don't want checkpointing to continually fail because of a noncritical failure. That could also lead to problems down the road. > Ignoring failures like this just makes problems much harder to debug and they > tend to bite harder for it. If such noncritical failures happened regularly, the server logs would likely become filled with messages about it. Perhaps users might not notice for a while, but I don't think the proposed patch would make debugging excessively difficult. Nathan
Re: Replace uses of deprecated Python module distutils.sysconfig
I wrote: > Peter Eisentraut writes: >> On 19.01.22 01:18, Tom Lane wrote: >>> ... I'm tempted to resurrect the >>> idea of changing configure's probe order to "python3 python python2" >>> in the meantime, just so we can see how much of the buildfarm is ready >>> for that. >> This seems sensible in any case, given that we have quasi-committed to >> enforcing Python 3 soon. > Done. The early returns are not great: we have about half a dozen machines so far that are finding python3, and reporting sane-looking Python include paths, but not finding Python.h. They're all Linux-oid machines, so I suppose what is going on is that they have the base python3 package installed but not python3-dev or local equivalent. I want to leave that patch in place long enough so we can get a fairly full survey of which machines are OK and which are not, but I suppose I'll have to revert it tomorrow or so. We did promise the owners a month to adjust their configurations. regards, tom lane
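For reference, the replacement this thread's subject refers to: since Python 3.10, `distutils.sysconfig` is deprecated in favor of the stdlib `sysconfig` module, which also exposes the include path configure probes when looking for Python.h. A minimal illustration (module and function names are from CPython's documentation; the variable names are mine):

```python
import sysconfig

# Old, deprecated style:
#   from distutils.sysconfig import get_python_inc
#   include_dir = get_python_inc()
# Modern equivalent:
include_dir = sysconfig.get_path("include")    # where Python.h should live
libdir = sysconfig.get_config_var("LIBDIR")    # where libpython is installed
version = sysconfig.get_config_var("VERSION")  # e.g. "3.10"

# Note: a machine with only the base python3 package installed will
# happily report a valid-looking include_dir here even though the
# directory lacks Python.h -- the buildfarm failure mode described above.
print(include_dir, libdir, version)
```

That is why the failing animals report sane include paths yet cannot find Python.h: the path computation works without the dev package, the header does not.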
Re: missing indexes in indexlist with partitioned tables
On 2022-Jan-19, Arne Roland wrote: > > a) make sure that the submitted patch updates these test results so > > that the test pass [...] > > Just for the record: I did run the tests, but I did miss that the > commit of Tomas fix for fractional optimization is already on the > master. Please note that this is a very new test case from a patch > committed less than one week ago. Ah, apologies, I didn't realize that that test was so new. > If anything I obviously need practical help, on how > I can catch on recent changes quicker and without manual intervention. > I don't have a modified buildfarm animal running, that tries to apply > my patch and run regression tests for my patch on a daily basis. See src/tools/ci/README (for multi-platform testing of patches on several platforms) and http://commitfest.cputube.org/ -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
Re: do only critical work during single-user vacuum?
On Wed, Jan 19, 2022 at 09:11:48PM +, Bossart, Nathan wrote: > I personally think VACUUM FOR WRAPAROUND is the best of the options > provided thus far. Could you avoid introducing a new grammar pattern in VACUUM? Any new option had better be within the parenthesized part as it is extensible at will with its set of DefElems. -- Michael
Re: missing indexes in indexlist with partitioned tables
On Wed, Jan 19, 2022 at 1:50 PM Arne Roland wrote: > Hi, > > > I came up with a slightly more intuitive way to force the different > outcome for the fractional test, that does hardly change anything. > > > I'm not sure, whether the differentiation between fract_x and fract_t is > worth it, since there shouldn't be any difference, but as mentioned before > I added it for potential future clarity. > > > Thanks for your feedback again! > > > Regards > > Arne > > > -- > *From:* Arne Roland > *Sent:* Wednesday, January 19, 2022 10:13:55 PM > *To:* Alvaro Herrera; Julien Rouhaud > *Cc:* pgsql-hackers > *Subject:* Re: missing indexes in indexlist with partitioned tables > > > Hi! > > > From: Alvaro Herrera > > [...] > > Hmm, these plan changes look valid to me. A left self-join using the > > primary key column? That looks optimizable all right. > > [...] > > What I still don't know is whether this patch is actually desirable or > > not. If the only cases it affects is self-joins, is there much actual > > value? > > This is not really about self joins. That was just the most simple > example, because otherwise we need a second table. > It's unique, it's not relevant whether it's the same table. In most cases > it won't. I was talking about join pruning. > > > I suspect that the author of partition-wise joins would want to change > > these queries so that whatever was being tested by these self-joins is > > tested by some other means (maybe just create an identical partitioned > > table via CREATE TABLE fract_t2 ... ; INSERT INTO fract_t2 SELECT FROM > > fract_t). But at the same time, the author of this patch should > > Your suggestion doesn't work, because with my patch we solve that case as > well. We solve the general join pruning case. If we make the index > non-unique however, we should be able to test the fractional case the > same way. 
> > > b) add some test cases to verify that his desired > > behavior is tested somewhere, not just in a test case that's > > incidentally broken by his patch. > > My patch already includes such a test, look at @@ -90,6 +90,13 @@ > src/test/regress/sql/partition_join.sql > Since the selfjoin part was confusing to you, it might be worthwhile to do > test that with two different tables. While I see no need to test in that > way, I will adjust the patch so. Just to make it more clear for people > looking at those tests in the future. > > a) make > > sure that the submitted patch updates these test results so that the > > test pass [...] > > Just for the record: I did run the tests, but I did miss that the commit > of Tomas fix for fractional optimization is already on the master. Please > note that this is a very new test case from a patch committed less than one > week ago. > > I'm glad Julien Rouhaud pointed out, that Tomas patch is committed it by > now. That was very helpful to me, as I can now integrate the two tests. > > @Álvaro Herrera: > If you want to help me, please don't put forward an abstract list of > responsibilities. If anything I obviously need practical help, on how I can > catch on recent changes quicker and without manual intervention. I don't > have a modified buildfarm animal running, that tries to apply my patch and > run regression tests for my patch on a daily basis. > Is there a simple way for me to check for that? > > I will probably integrate those two tests, since they can work of similar > structures without need to recreate the tables again and again. I have > clear understanding how that new test works. I have to attend a few calls > now, but I should be able to update the tests later. 
> > Regards
> Arne

Hi,

-    if (indexRelation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX)
+    if (inhparent && (!index->indisunique || indexRelation->rd_rel->relkind != RELKIND_PARTITIONED_INDEX))

The check on RELKIND_PARTITIONED_INDEX seems to negate what the comment above says:

+     * Don't add partitioned indexes to the indexlist

Cheers
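To make the reviewer's point concrete, the patched condition can be transcribed into a small truth table (a Python stand-in with hypothetical boolean parameters; `skipped` returning True means the index is left out of the indexlist):

```python
def skipped(inhparent: bool, indisunique: bool, is_partitioned_index: bool) -> bool:
    # Transcription of the quoted C condition:
    #   inhparent && (!index->indisunique ||
    #                 relkind != RELKIND_PARTITIONED_INDEX)
    return inhparent and (not indisunique or not is_partitioned_index)

# For a partitioned parent (inhparent=True), the only indexes *kept*
# are unique partitioned indexes -- i.e. partitioned indexes are the
# ones added, which is what appears to contradict the quoted comment.
print(skipped(True, True, True))    # unique partitioned index: kept
print(skipped(True, False, True))   # non-unique partitioned index: skipped
print(skipped(True, True, False))   # unique non-partitioned index: skipped
```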
Re: Schema variables - new implementation for Postgres 15
Hi, On Wed, Jan 19, 2022 at 09:09:41PM +0100, Pavel Stehule wrote: > On Wed, Jan 19, 2022 at 9:01 AM, Julien Rouhaud wrote: > > RAISE NOTICE should use local variables, and in this case it is a question whether we > should raise a warning. There are two possible analogies - we can see session > variables as global variables, and then the warning should not be raised, > or we can see the relation between session variables and plpgsql variables > as similar, with plpgsql variables having higher priority, and then the > warning should be raised. If we want to ensure that the new session variable > doesn't break code, then session variables should have lower priority than > plpgsql variables too. And because the plpgsql protection against collisions > cannot be used here, I prefer raising the warning. Ah that's indeed a good point. I agree, they're from a different part of the system so they should be treated as different things, and thus raise a warning. It's consistent with the chosen conservative approach anyway. > PLpgSQL assignment should never be in collision with session variables Agreed. > > > > > =# DO $$ > BEGIN > v := 'test'; > RAISE NOTICE 'v: %', v; > END; > $$ LANGUAGE plpgsql; > ERROR: 42601: "v" is not a known variable > LINE 3: v := 'test'; > > But the RAISE NOTICE does see the session variable (which should be the > correct > behavior I think), so the warning should have been raised for this > instruction > (and in that case the message is incorrect, as it's not shadowing a > column). > > In this case I can detect the node type, and I can identify an external param node, but I cannot detect whether this code was executed from PLpgSQL or from somewhere else > > So I can modify the warning text to something like Yes, that's what I had in mind too. > DETAIL: The identifier can be a column reference, query parameter, or > session variable reference. > HINT: The column reference and query parameter are preferred over a > session variable reference.
> > I cannot use the term "plpgsql variable" because I cannot ensure the validity > of this message > > Maybe it is better not to talk about the source of this issue, and just talk > about the result - so the warning text should be just > > MESSAGE: "session variable \"\" is shadowed" > DETAIL: "session variables can be shadowed by columns, routine's variables > and routine's arguments with the same name" > > Is it better? I clearly prefer the 2nd version.
Re: missing indexes in indexlist with partitioned tables
Hi, I came up with a slightly more intuitive way to force the different outcome for the fractional test, that does hardly change anything. I'm not sure, whether the differentiation between fract_x and fract_t is worth it, since there shouldn't be any difference, but as mentioned before I added it for potential future clarity. Thanks for your feedback again! Regards Arne From: Arne Roland Sent: Wednesday, January 19, 2022 10:13:55 PM To: Alvaro Herrera; Julien Rouhaud Cc: pgsql-hackers Subject: Re: missing indexes in indexlist with partitioned tables Hi! > From: Alvaro Herrera > [...] > Hmm, these plan changes look valid to me. A left self-join using the > primary key column? That looks optimizable all right. > [...] > What I still don't know is whether this patch is actually desirable or > not. If the only cases it affects is self-joins, is there much actual > value? This is not really about self joins. That was just the most simple example, because otherwise we need a second table. It's unique, it's not relevant whether it's the same table. In most cases it won't. I was talking about join pruning. > I suspect that the author of partition-wise joins would want to change > these queries so that whatever was being tested by these self-joins is > tested by some other means (maybe just create an identical partitioned > table via CREATE TABLE fract_t2 ... ; INSERT INTO fract_t2 SELECT FROM > fract_t). But at the same time, the author of this patch should Your suggestion doesn't work, because with my patch we solve that case as well. We solve the general join pruning case. If we make the index non-unique however, we should be able to test the fractional case the same way. > b) add some test cases to verify that his desired > behavior is tested somewhere, not just in a test case that's > incidentally broken by his patch. 
My patch already includes such a test, look at @@ -90,6 +90,13 @@ src/test/regress/sql/partition_join.sql. Since the selfjoin part was confusing to you, it might be worthwhile to test that with two different tables. While I see no need to test in that way, I will adjust the patch accordingly, just to make it clearer for people looking at those tests in the future. > a) make sure that the submitted patch updates these test results so that the tests pass [...] Just for the record: I did run the tests, but I did miss that the commit of Tomas' fix for fractional optimization is already on master. Please note that this is a very new test case from a patch committed less than one week ago. I'm glad Julien Rouhaud pointed out that Tomas' patch is committed by now. That was very helpful to me, as I can now integrate the two tests. @Álvaro Herrera: If you want to help me, please don't put forward an abstract list of responsibilities. If anything, I obviously need practical help on how I can catch up on recent changes quicker and without manual intervention. I don't have a modified buildfarm animal running that tries to apply my patch and run regression tests for my patch on a daily basis. Is there a simple way for me to check for that? I will probably integrate those two tests, since they can work off similar structures without the need to recreate the tables again and again. I have a clear understanding of how that new test works. I have to attend a few calls now, but I should be able to update the tests later. Regards Arne

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 0ef70ad7f1..fe50919637 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3504,7 +3504,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 	Assert(list_length(exprlist) == list_length(oprlist));
 
 	/* Short-circuit if no indexes... */
-	if (rel->indexlist == NIL)
+	if (rel->indexlist == NIL && rel->partIndexlist == NIL)
 		return false;
 
 	/*
@@ -3549,7 +3549,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 		return false;
 
 	/* Examine each index of the relation ... */
-	foreach(ic, rel->indexlist)
+	foreach(ic, list_concat(rel->indexlist, rel->partIndexlist))
 	{
 		IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
 		int			c;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 337f470d58..b04b3f88ad 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -23,6 +23,7 @@
 #include "postgres.h"
 
 #include "nodes/nodeFuncs.h"
+#include "nodes/nodes.h"
 #include "optimizer/clauses.h"
 #include "optimizer/joininfo.h"
 #include "optimizer/optimizer.h"
@@ -598,7 +599,7 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
 		 */
 		ListCell   *lc;
 
-		foreach(lc, rel->indexlist)
+		foreach(lc, list_concat(rel->indexlist, rel->partIndexlist))
 		{
 			IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
diff --git
Re: refactoring basebackup.c
On Wed, Jan 19, 2022 at 7:16 AM Dipesh Pandit wrote: > I have done initial testing and > working on updating the test coverage. I spent some time thinking about test coverage for the server-side backup code today and came up with the attached (v12-0003). It does an end-to-end test that exercises server-side backup and server-side compression and then untars the backup and validity-checks it using pg_verifybackup. In addition to being good test coverage for these patches, it also plugs a gap in the test coverage of pg_verifybackup, which currently has no test case that untars a tar-format backup and then verifies the result. I couldn't figure out a way to do that back at the time I was working on pg_verifybackup, because I didn't think we had any existing precedent for using 'tar' from a TAP test. But it was pointed out to me that we do, so I used that as the model for this test. It should be easy to generalize this test case to test lz4 and zstd as well, I think. But I guess we'll still need something different to test what your patch is doing. -- Robert Haas EDB: http://www.enterprisedb.com

Attachments: v12-0001-Support-base-backup-targets.patch, v12-0002-Server-side-gzip-compression.patch, v12-0003-Test-server-side-backup-backup-compression-and-p.patch
Re: Replace uses of deprecated Python module distutils.sysconfig
On Wed, Jan 19, 2022 at 9:40 PM Tom Lane wrote: > > Done. (I couldn't find any equivalent logic in the MSVC build scripts > though; is there something I missed?) > > MSVC will use the path configured in src\tools\msvc\config.pl > $config->{"python"}, there is no ambiguity. Regards, Juan José Santamaría Flecha
Re: missing indexes in indexlist with partitioned tables
Hi! > From: Alvaro Herrera > [...] > Hmm, these plan changes look valid to me. A left self-join using the > primary key column? That looks optimizable all right. > [...] > What I still don't know is whether this patch is actually desirable or > not. If the only cases it affects is self-joins, is there much actual > value? This is not really about self joins. That was just the most simple example, because otherwise we need a second table. It's unique, it's not relevant whether it's the same table. In most cases it won't. I was talking about join pruning. > I suspect that the author of partition-wise joins would want to change > these queries so that whatever was being tested by these self-joins is > tested by some other means (maybe just create an identical partitioned > table via CREATE TABLE fract_t2 ... ; INSERT INTO fract_t2 SELECT FROM > fract_t). But at the same time, the author of this patch should Your suggestion doesn't work, because with my patch we solve that case as well. We solve the general join pruning case. If we make the index non-unique however, we should be able to test the fractional case the same way. > b) add some test cases to verify that his desired > behavior is tested somewhere, not just in a test case that's > incidentally broken by his patch. My patch already includes such a test, look at @@ -90,6 +90,13 @@ src/test/regress/sql/partition_join.sql Since the selfjoin part was confusing to you, it might be worthwhile to do test that with two different tables. While I see no need to test in that way, I will adjust the patch so. Just to make it more clear for people looking at those tests in the future. a) make > sure that the submitted patch updates these test results so that the > test pass [...] Just for the record: I did run the tests, but I did miss that the commit of Tomas fix for fractional optimization is already on the master. Please note that this is a very new test case from a patch committed less than one week ago. 
I'm glad Julien Rouhaud pointed out that Tomas' patch is committed by now. That was very helpful to me, as I can now integrate the two tests. @Álvaro Herrera: If you want to help me, please don't put forward an abstract list of responsibilities. If anything, I obviously need practical help on how I can catch up on recent changes quicker and without manual intervention. I don't have a modified buildfarm animal running that tries to apply my patch and run regression tests for my patch on a daily basis. Is there a simple way for me to check for that? I will probably integrate those two tests, since they can work off similar structures without the need to recreate the tables again and again. I have a clear understanding of how that new test works. I have to attend a few calls now, but I should be able to update the tests later. Regards Arne
Re: do only critical work during single-user vacuum?
On 1/19/22, 11:15 AM, "John Naylor" wrote: > This seems to be the motivating reason for wanting new configurability > on the server side. In any case, new knobs are out of scope for this > thread. If the use case is compelling enough, may I suggest starting a > new thread? Sure. Perhaps the new top-level command will use these new options someday. > Regarding the thread subject, I've been playing with the grammar, and > found it's quite easy to have > > VACUUM FOR WRAPAROUND > or > VACUUM FOR EMERGENCY > > since FOR is a reserved word (and following that can be an IDENT plus > a strcmp check) and cannot conflict with table names. This sounds a > bit more natural than VACUUM LIMIT. Opinions? I personally think VACUUM FOR WRAPAROUND is the best of the options provided thus far. Nathan
Re: Adding CI to our tree
Andres Freund writes: > I'm doubtful that it's good that we use BAIL_OUT so liberally, because it > prevents further tests from running (i.e. if 001 bails, 002+ doesn't run), > which doesn't generally seem like the right thing to do after a single test > fails. But that's really independent of the fix you make here. Agreed, that's a separate question. It does seem like "stop this script and move to the next one" would be better behavior, though. > I'd maybe do s/later/in the END block/ or such, so that one knows where to > look. Took me a minute to find it again. OK. regards, tom lane
Re: Replace uses of deprecated Python module distutils.sysconfig
Peter Eisentraut writes: > On 19.01.22 01:18, Tom Lane wrote: >> ... I'm tempted to resurrect the >> idea of changing configure's probe order to "python3 python python2" >> in the meantime, just so we can see how much of the buildfarm is ready >> for that. > This seems sensible in any case, given that we have quasi-committed to > enforcing Python 3 soon. Done. (I couldn't find any equivalent logic in the MSVC build scripts though; is there something I missed?) regards, tom lane
Re: Fix BUG #17335: Duplicate result rows in Gather node
On Fri, 31 Dec 2021 at 00:14, Yura Sokolov wrote: > Suggested quick (and valid) fix in the patch attached: > - If Append has single child, then copy its parallel awareness. I've been looking at this and I've gone through changing my mind about what's the right fix quite a number of times. My current thoughts are that I don't really like the fact that we can have plans in the following shape:

Finalize Aggregate
  ->  Gather
        Workers Planned: 1
        ->  Partial Aggregate
              ->  Parallel Hash Left Join
                    Hash Cond: (gather_append_1.fk = gather_append_2.fk)
                    ->  Index Scan using gather_append_1_ix on gather_append_1
                          Index Cond: (f = true)
                    ->  Parallel Hash
                          ->  Parallel Seq Scan on gather_append_2

It's only made safe by the fact that Gather will only use 1 worker. To me, it just seems too fragile to assume that's always going to be the case. I feel like this fix just relies on the fact that create_gather_path() and create_gather_merge_path() do "pathnode->num_workers = subpath->parallel_workers;". If someone decided that was to work a different way, then we risk this breaking again. Additionally, today we have Gather and GatherMerge, but we may one day end up with more node types that gather results from parallel workers, or even a completely different way of executing plans. I think a safer way to fix this is to just not remove the Append/MergeAppend node if the parallel_aware flags of the only child and the Append/MergeAppend don't match. I've done that in the attached. I believe the code at the end of add_paths_to_append_rel() can remain as is. David

Attachment: dont_remove_singlechild_appends_with_mismatching_parallel_awareness.patch
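David's fix amounts to one extra condition in the single-child elision logic: only drop an Append/MergeAppend when its parallel_aware flag matches its only child's. A sketch of that rule, in Python as a stand-in for the planner's C code (the structure and function names here are mine, not PostgreSQL's):

```python
from dataclasses import dataclass, field

@dataclass
class PlanNode:
    kind: str                       # e.g. "Append", "SeqScan"
    parallel_aware: bool = False
    children: list = field(default_factory=list)

def elide_single_child_append(node: PlanNode) -> PlanNode:
    # Keep the Append/MergeAppend unless it has exactly one child whose
    # parallel_aware flag matches. A mismatch means the Append is doing
    # real work (e.g. giving run-once semantics above a parallel-aware
    # child), so removing it could duplicate rows across workers.
    if (node.kind in ("Append", "MergeAppend")
            and len(node.children) == 1
            and node.children[0].parallel_aware == node.parallel_aware):
        return node.children[0]
    return node
```

With a non-parallel-aware Append over a parallel-aware scan (the bug's shape), the node is retained; with matching flags the child is promoted as before.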
Re: Adding CI to our tree
Hi, On 2022-01-19 15:05:44 -0500, Tom Lane wrote: > And the cause of that is obvious: Cluster::start thinks that if "pg_ctl > start" failed, there couldn't possibly be a postmaster running. So it > doesn't bother to update self->_pid, and then the END hook thinks there is > nothing to do. Ah, right. I'm doubtful that it's good that we use BAIL_OUT so liberally, because it prevents further tests from running (i.e. if 001 bails, 002+ doesn't run), which doesn't generally seem like the right thing to do after a single test fails. But that's really independent of the fix you make here. > The attached simple fix gets rid of this problem. Any objections? Nope, sounds like a plan.

> diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
> index 7af0f8db13..fd0738d12d 100644
> --- a/src/test/perl/PostgreSQL/Test/Cluster.pm
> +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
> @@ -845,6 +845,11 @@ sub start
>  	{
>  		print "# pg_ctl start failed; logfile:\n";
>  		print PostgreSQL::Test::Utils::slurp_file($self->logfile);
> +
> +		# pg_ctl could have timed out, so check to see if there's a pid file;
> +		# without this, we fail to shut down the new postmaster later.
> +		$self->_update_pid(-1);

I'd maybe do s/later/in the END block/ or such, so that one knows where to look. Took me a minute to find it again. Greetings, Andres Freund
Re: Schema variables - new implementation for Postgres 15
On Wed, Jan 19, 2022 at 9:01 AM, Julien Rouhaud wrote: > Hi, > > On Tue, Jan 18, 2022 at 10:01:01PM +0100, Pavel Stehule wrote: > > On Fri, Jan 14, 2022 at 3:44 AM, Julien Rouhaud > wrote: > > > > > > > > =# let myvariable = 'AA'; > > > LET > > > > > > =# select 'AA' collate "en-x-icu" < myvariable; > > > ?column? > > > -- > > > f > > > (1 row) > > > > > > =# select 'AA' collate "en-x-icu" < myvariable collate mycollation; > > > ERROR: 42P21: collation mismatch between explicit collations "en-x-icu" > > > and "mycollation" > > > LINE 1: select 'AA' collate "en-x-icu" < myvariable collate mycollat... > > > > > > > What do you expect? I don't understand collating well, but it looks > > correct. Minimally the tables have the same behavior. > > Indeed, I actually didn't know that such objects' collations were implicit > and > could be overloaded without a problem as long as there's no conflict > between > all the explicit collations. So I agree that the current behavior is ok, > including correct handling of wanted conflicts: > > =# create variable var1 text collate "fr-x-icu"; > CREATE VARIABLE > > =# create variable var2 text collate "en-x-icu"; > CREATE VARIABLE > > =# let var1 = 'hoho'; > LET > > =# let var2 = 'hoho'; > LET > > =# select var1 < var2; > ERROR: 42P22: could not determine which collation to use for string > comparison > HINT: Use the COLLATE clause to set the collation explicitly. > > > Please, can you check the attached patches? > > All the issues I mentioned are fixed, thanks! > > Thank you for checking. > > I see a few problems with the other new features added though. The new > session_variables_ambiguity_warning GUC is called even in contexts where it > shouldn't apply.
For instance: > > =# set session_variables_ambiguity_warning = 1; > SET > > =# create variable v text; > CREATE VARIABLE > > =# DO $$ > DECLARE v text; > BEGIN > v := 'test'; > RAISE NOTICE 'v: %', v; > END; > $$ LANGUAGE plpgsql; > WARNING: 42702: session variable "v" is shadowed by column > LINE 1: v := 'test' > ^ > DETAIL: The identifier can be column reference or session variable > reference. > HINT: The column reference is preferred against session variable > reference. > QUERY: v := 'test' > > But this "v := 'test'" shouldn't be a substitute for a LET, and it indeed > doesn't work: > Yes, there are some mistakes (bugs). The PLpgSQL assignment as target should not see session variables, so the warning is nonsense there. RAISE NOTICE should use local variables, and in this case it is a question whether we should raise a warning. There are two possible analogies - we can see session variables as global variables, and then the warning should not be raised, or we can see the relation between session variables and plpgsql variables as similar, with plpgsql variables having higher priority, and then the warning should be raised. If we want to ensure that the new session variable doesn't break code, then session variables should have lower priority than plpgsql variables too. And because the plpgsql protection against collisions cannot be used here, I prefer raising the warning. PLpgSQL assignment should never be in collision with session variables > > =# DO $$ > BEGIN > v := 'test'; > RAISE NOTICE 'v: %', v; > END; > $$ LANGUAGE plpgsql; > ERROR: 42601: "v" is not a known variable > LINE 3: v := 'test'; > > But the RAISE NOTICE does see the session variable (which should be the > correct > behavior I think), so the warning should have been raised for this > instruction > (and in that case the message is incorrect, as it's not shadowing a > column).
In this case I can detect the node type, and I can identify an external param node, but I cannot detect whether this code was executed from PL/pgSQL or from somewhere else. So I can modify the warning text to something like DETAIL: The identifier can be a column reference, a query parameter, or a session variable reference. HINT: The column reference and query parameter are preferred over the session variable reference. I cannot use the term "plpgsql variable" because I cannot ensure the validity of this message. Maybe it is better not to talk about the source of this issue, and just talk about the result - so the warning text should be just MESSAGE: "session variable \"\" is shadowed" DETAIL: "session variables can be shadowed by columns, routine's variables and routine's arguments with the same name" Is that better?
Re: Adding CI to our tree
I wrote: > This test attempt revealed another problem too: the standby never > shut down, and thus the calling "make" never quit, until I intervened > manually. I'm not sure why. I see that Cluster::promote uses > system_or_bail() to run "pg_ctl promote" ... could it be that > BAIL_OUT causes the normal script END hooks to not get run? > But it seems like we'd have noticed that long ago. I failed to reproduce any failure in the promote step, and I now think I was mistaken and it happened during the standby's initial start. I can reproduce that very easily by setting PGCTLTIMEOUT to a second or two; with fsync enabled, it takes the standby more than that to reach a consistent state. And the cause of that is obvious: Cluster::start thinks that if "pg_ctl start" failed, there couldn't possibly be a postmaster running. So it doesn't bother to update self->_pid, and then the END hook thinks there is nothing to do. Now, leaving an idle postmaster hanging around isn't a mortal sin, since it'll go away by itself shortly after the next cycle of testing does an "rm -rf" on its data directory. But it's ugly, and conceivably it could cause problems for later testing on machines with limited shmem or semaphore space. The attached simple fix gets rid of this problem. Any objections? regards, tom lane diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm index 7af0f8db13..fd0738d12d 100644 --- a/src/test/perl/PostgreSQL/Test/Cluster.pm +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm @@ -845,6 +845,11 @@ sub start { print "# pg_ctl start failed; logfile:\n"; print PostgreSQL::Test::Utils::slurp_file($self->logfile); + + # pg_ctl could have timed out, so check to see if there's a pid file; + # without this, we fail to shut down the new postmaster later. 
+ $self->_update_pid(-1); + BAIL_OUT("pg_ctl start failed") unless $params{fail_ok}; return 0; } @@ -1123,7 +1128,10 @@ archive_command = '$copy_command' return; } -# Internal method +# Internal method to update $self->{_pid} +# $is_running = 1: pid file should be there +# $is_running = 0: pid file should NOT be there +# $is_running = -1: we aren't sure sub _update_pid { my ($self, $is_running) = @_; @@ -1138,7 +1146,7 @@ sub _update_pid close $pidfile; # If we found a pidfile when there shouldn't be one, complain. - BAIL_OUT("postmaster.pid unexpectedly present") unless $is_running; + BAIL_OUT("postmaster.pid unexpectedly present") if $is_running == 0; return; } @@ -1146,7 +1154,7 @@ sub _update_pid print "# No postmaster PID for node \"$name\"\n"; # Complain if we expected to find a pidfile. - BAIL_OUT("postmaster.pid unexpectedly not present") if $is_running; + BAIL_OUT("postmaster.pid unexpectedly not present") if $is_running == 1; return; }
Re: do only critical work during single-user vacuum?
On 1/18/22, 9:47 PM, "Masahiko Sawada" wrote: > IIUC what we want to do here are two things: (1) select only old > tables and (2) set INDEX_CLEANUP = off, TRUNCATE = off, and FREEZE = > on. VACUUM LIMIT statement does both things at the same time. Although > I’m concerned a bit about its flexibility, it’s a reasonable solution. > > On the other hand, it’s probably also useful to do either one thing in > some cases. For instance, having a selector for (1) would be useful, > and having a new option like FAST_FREEZE for (2) would also be useful. > Given there is already a way for (2) (it does not default though), I > think it might also be a good start inventing something for (1). For > instance, a selector for VACUUM statement I came up with is: > > VACUUM (verbose on) TABLES WITH (min_xid_age = 16); > or > VACUUM (verbose on) TABLES WITH (min_age = failsafe_limit); > > We can expand it in the future to select tables by, for example, dead > tuple ratio, size, etc. > > It's a random thought but maybe worth considering. That's an interesting idea. A separate selector clause could also allow users to choose how they interacted (e.g., should the options be OR'd or AND'd). Nathan
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
On Wed, Jan 19, 2022 at 6:56 AM Robert Haas wrote: > I don't think I've said anything on this thread that is an attack on > you. I am getting pretty frustrated with the tenor of the discussion, > though. I feel like you're the one attacking me, and I don't like it. "Attack" is a strong word (much stronger than "defend"), and I don't think I'd use it to describe anything that has happened on this thread. All I said was that you misrepresented my views when you pounced on my use of the word "continuous". Which, honestly, I was very surprised by. > For all of that, I'm not even convinced that you're wrong. I just > think you might be wrong. I don't really know. I agree that I might be wrong, though of course I think that I'm probably correct. I value your input as a critical voice -- that's generally how we get really good designs. > However, there are cases where waiting, and only > waiting, gets the job done. If you're not willing to admit that those > cases exist, or you think they don't matter, then we disagree. They exist, of course. That's why I don't want to completely eliminate the idea of waiting for a cleanup lock. Rather, I want to change the design to recognize that that's an extreme measure, that should be delayed for as long as possible. There are many ways that the problem could naturally resolve itself. Waiting for a cleanup lock after only 50 million XIDs (the vacuum_freeze_min_age default) is like performing brain surgery to treat somebody with a headache (at least with the infrastructure from the earlier patches in place). It's not impossible that "surgery" could help, in theory (could be a tumor, better to catch these things early!), but that fact alone can hardly justify such a drastic measure. That doesn't mean that brain surgery isn't ever appropriate, of course. It should be delayed until it starts to become obvious that it's really necessary (but before it really is too late). 
> If you > admit that they exist and think they matter but believe that there's > some reason why increasing FreezeLimit can't cause any damage, then > either (a) you have a good reason for that belief which I have thus > far been unable to understand or (b) you're more optimistic about the > proposed change than can be entirely justified. I don't deny that it's just about possible that the changes that I'm thinking of could make the situation worse in some cases, but I think that the overwhelming likelihood is that things will be improved across the board. Consider the age of the tables from BenchmarkSQL, with the patch series:

     relname      │    age     │ mxid_age
──────────────────┼────────────┼──────────
 bmsql_district   │        657 │        0
 bmsql_warehouse  │        696 │        0
 bmsql_item       │  1,371,978 │        0
 bmsql_config     │  1,372,061 │        0
 bmsql_new_order  │  3,754,163 │        0
 bmsql_history    │ 11,545,940 │        0
 bmsql_order_line │ 23,095,678 │        0
 bmsql_oorder     │ 40,653,743 │        0
 bmsql_customer   │ 51,371,610 │        0
 bmsql_stock      │ 51,371,610 │        0
(10 rows)

We see significant "natural variation" here, unlike HEAD, where the age of all tables is exactly the same at all times, or close to it (incidentally, this leads to the largest tables all being anti-wraparound VACUUMed at the same time). There is a kind of natural ebb and flow for each table over time, as relfrozenxid is advanced, due in part to workload characteristics. Less than half of all XIDs will ever modify the two largest tables, for example, and so autovacuum should probably never be launched because of the age of either table (barring some change in workload conditions, perhaps). As I've said a few times now, XIDs are generally "the wrong unit", except when needed as a backstop against wraparound failure. The natural variation that I see contributes to my optimism. A situation where we cannot get a cleanup lock may well resolve itself, for many reasons, that are hard to precisely nail down but are nevertheless very real.
The vacuum_freeze_min_age design (particularly within an aggressive VACUUM) is needlessly rigid, probably just because the assumption before now has always been that we can only advance relfrozenxid in an aggressive VACUUM (it might happen in a non-aggressive VACUUM if we get very lucky, which cannot be accounted for). Because it is rigid, it is brittle. Because it is brittle, it will (on a long enough timeline, for a susceptible workload) actually break. > On the other hand if that user is going to close that > cursor after 10 minutes and open a new one in the same place 10 > seconds later, the best thing to do is to keep FreezeLimit as low as > possible, because the first time we wait for the pin to be released > we're guaranteed to advance relfrozenxid within 10 minutes, whereas if > we don't do that we may keep missing the brief windows in which no > cursor is held for a very long time. But we have absolutely no way of >
Re: Time to increase hash_mem_multiplier default?
On Sun, Jan 16, 2022 at 7:28 PM Peter Geoghegan wrote: > > The current hash_mem_multiplier default is 1.0, which is a fairly > conservative default: it preserves the historic behavior, which is > that hash-based executor nodes receive the same work_mem budget as > sort-based nodes. I propose that the default be increased to 2.0 for > Postgres 15. I don't have anything really profound to say here, but in the last year I did on a couple occasions recommend clients to raise hash_mem_multiplier to 2.0 to fix performance problems. During this cycle, we also got a small speedup in the external sorting code. Also, if the "generation context" idea gets traction, that might be another reason to consider differentiating the mem settings. -- John Naylor EDB: http://www.enterprisedb.com
Re: speed up text_position() for utf-8
I wrote: > 0001 puts the main implementation of pg_utf_mblen() into an inline > function and uses this in pg_mblen(). This is somewhat faster in the > strpos tests, so that gives some measure of the speedup expected for > other callers. Text search seems to call this a lot, so this might > have noticeable benefit. > > 0002 refactors text_position_get_match_pos() to use > pg_mbstrlen_with_len(). This itself is significantly faster when > combined with 0001, likely because the latter can inline the call to > pg_mblen(). The intention is to speed up more than just text_position. > > 0003 explicitly specializes for the inline version of pg_utf_mblen() > into pg_mbstrlen_with_len(), but turns out to be almost as slow as > master for ascii. It doesn't help if I undo the previous change in > pg_mblen(), and I haven't investigated why yet. > > 0002 looks good now, but the experience with 0003 makes me hesitant to > propose this seriously until I can figure out what's going on there. > > The test is as earlier, a worst-case substring search, times in milliseconds.
>
>  patch  | no match | ascii | multibyte
> --------+----------+-------+-----------
>  PG11   |     1220 |  1220 |      1150
>  master |      385 |  2420 |      1980
>  0001   |      390 |  2180 |      1670
>  0002   |      389 |  1330 |      1100
>  0003   |      391 |  2100 |      1360

I tried this test on a newer CPU, and 0003 had no regression. Both systems used gcc 11.2. Rather than try to figure out why an experiment had unexpected behavior, I plan to test 0001 and 0002 on a couple different compilers/architectures and call it a day. It's also worth noting that 0002 by itself seemed to be decently faster on the newer machine, but not as fast as 0001 and 0002 together. Looking at the assembly, pg_mblen is inlined into pg_mbstrlen_[with_len] and pg_mbcliplen, so the specialization for utf-8 in 0001 would be inlined in the other 3 as well. That's only a few bytes, so I think it's fine. -- John Naylor EDB: http://www.enterprisedb.com
Re: do only critical work during single-user vacuum?
On Wed, Jan 19, 2022 at 12:46 AM Masahiko Sawada wrote: > > On Fri, Jan 14, 2022 at 7:04 AM Bossart, Nathan wrote: > > > > I guess I'm ultimately imagining the new options as replacing the > > vacuumdb implementation. IOW vacuumdb would just use MIN_(M)XID_AGE > > behind the scenes (as would a new top-level command). > > I had the same idea. This seems to be the motivating reason for wanting new configurability on the server side. In any case, new knobs are out of scope for this thread. If the use case is compelling enough, may I suggest starting a new thread? Regarding the thread subject, I've been playing with the grammar, and found it's quite easy to have VACUUM FOR WRAPAROUND or VACUUM FOR EMERGENCY since FOR is a reserved word (and following that can be an IDENT plus a strcmp check) and cannot conflict with table names. This sounds a bit more natural than VACUUM LIMIT. Opinions? -- John Naylor EDB: http://www.enterprisedb.com
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
Hi, On 2022-01-19 13:34:21 -0500, Tom Lane wrote: > Bharath Rupireddy writes: > > On Sat, Jan 15, 2022 at 2:59 PM Julien Rouhaud wrote: > >> Maybe I'm missing something but wouldn't > >> https://commitfest.postgresql.org/36/3448/ better solve the problem? > > > The error can cause the new background process proposed there in that > > thread to restart, which is again costly. Since we have LOG-only and > > continue behavior in CheckPointSnapBuild already, having the same > > behavior for CheckPointLogicalRewriteHeap helps a lot. > > [ stares at CheckPointLogicalRewriteHeap for awhile ... ] > > This code has got more problems than that. It took me awhile to > absorb it, but we don't actually care about the contents of any of > those files; all of the information is encoded in the file *names*. I'm not following - we *do* need the contents of the files? They're applied as-needed in ApplyLogicalMappingFile(). > (This strikes me as probably not a very efficient design, compared > to putting the same data into a single text file; but for now I'll > assume we're not up for a complete rewrite.) That being the case, > I wonder what it is we expect fsync'ing the surviving files to do > exactly. We should be fsync'ing the directory not the files > themselves, no? Fsyncing the directory doesn't guarantee anything about the contents of files. But, you're right, we need an fsync of the directory too. > Other things that seem poorly thought out: > > * Why is the check for "map-" prefix after, rather than before, > the lstat? It doesn't seem to matter much - there shouldn't be a meaningful amount of other files in there. > * Why is it okay to ignore lstat failure? Seems like we might > as well not even have the lstat. Yea, that seems odd, not sure why that ended up this way. I guess the aim might have been to tolerate random files we don't have permissions for or such? > * The sscanf on the file name would not notice trailing junk, > such as an editor backup marker. Is that okay? 
I don't really see a problem with it - there shouldn't be other files matching the pattern - but it couldn't hurt to check the pattern matches exhaustively. > As far as the patch itself goes, I agree that failure to unlink > is noncritical, because such a file would have no further effect > and we can just ignore it. I don't agree. We iterate through the directory regularly on systems with catalog changes + logical decoding. An ever increasing list of gunk will make that more and more expensive. And I haven't heard a meaningful reason why we would have map-* files that we can't remove. Ignoring failures like this just makes problems much harder to debug and they tend to bite harder for it. > I think I also agree that failure of the sscanf is noncritical, because the > implication of that is that the file name doesn't conform to our > expectations, which means it's basically just like the check that causes us > to ignore names not starting with "map-". (Actually, isn't the separate > check for "map-" useless, given that sscanf will make the equivalent check?) Well, this way only files starting with "map-" are expected to conform to a strict format, the rest is ignored? > Anyway, I think possibly we can drop the bottom half of the loop > (the part trying to fsync non-removed files) in favor of fsync'ing > the directory afterwards. Thoughts? I don't think that'd be correct. In short: We should add a directory fsync, I'm fine with improving the error checking, but the rest seems like a net-negative with no convincing reasoning. Greetings, Andres Freund
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
I wrote: > Anyway, I think possibly we can drop the bottom half of the loop > (the part trying to fsync non-removed files) in favor of fsync'ing > the directory afterwards. Thoughts? Oh, scratch that --- *this* loop doesn't care about the file contents, but other code does. However, don't we need a directory fsync too? Or is that handled someplace else? regards, tom lane
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
Bharath Rupireddy writes: > On Sat, Jan 15, 2022 at 2:59 PM Julien Rouhaud wrote: >> Maybe I'm missing something but wouldn't >> https://commitfest.postgresql.org/36/3448/ better solve the problem? > The error can cause the new background process proposed there in that > thread to restart, which is again costly. Since we have LOG-only and > continue behavior in CheckPointSnapBuild already, having the same > behavior for CheckPointLogicalRewriteHeap helps a lot. [ stares at CheckPointLogicalRewriteHeap for awhile ... ] This code has got more problems than that. It took me awhile to absorb it, but we don't actually care about the contents of any of those files; all of the information is encoded in the file *names*. (This strikes me as probably not a very efficient design, compared to putting the same data into a single text file; but for now I'll assume we're not up for a complete rewrite.) That being the case, I wonder what it is we expect fsync'ing the surviving files to do exactly. We should be fsync'ing the directory not the files themselves, no? Other things that seem poorly thought out: * Why is the check for "map-" prefix after, rather than before, the lstat? * Why is it okay to ignore lstat failure? Seems like we might as well not even have the lstat. * The sscanf on the file name would not notice trailing junk, such as an editor backup marker. Is that okay? As far as the patch itself goes, I agree that failure to unlink is noncritical, because such a file would have no further effect and we can just ignore it. I think I also agree that failure of the sscanf is noncritical, because the implication of that is that the file name doesn't conform to our expectations, which means it's basically just like the check that causes us to ignore names not starting with "map-". (Actually, isn't the separate check for "map-" useless, given that sscanf will make the equivalent check?) 
I started out wondering why the patch didn't also change the loop's other ERROR conditions to LOG. But we do want to ERROR if we're unable to sync transient state down to disk, and that is what the other steps (think they) are doing. It might be worth a comment to point that out though, before someone decides that if these errors are just LOG level then the others can be too. Anyway, I think possibly we can drop the bottom half of the loop (the part trying to fsync non-removed files) in favor of fsync'ing the directory afterwards. Thoughts? regards, tom lane
Re: Compiling PostgreSQL for WIndows with 16kb blocksize
Hi, On January 19, 2022 9:07:50 AM PST, Yannick Collette wrote: >Hello, > >For test purposes I need to compile PostgreSQL 14.1 using a 16kb blocksize. > >CFLAGS="-D WINVER=0x0600 -D _WIN32_WINNT=0x0600" LIBS="-ladvapi32" > ./configure --host=x86_64-w64-mingw32 --with-blocksize=16 >--with-wal-blocksize=16 --with-openssl --with-libxml >--prefix=/c/postgresql/pg14/ 2>&1 | tee configure.log > >Below is the beginning of my configure.log, with no errors: make and make >install ok also. > >configure: loading site script /etc/config.site >checking build system type... x86_64-pc-msys >checking host system type... x86_64-w64-mingw32 >checking which template to use... win32 >checking whether NLS is wanted... no >checking for default port number... 5432 >checking for block size... 16kB >checking for segment size... 1GB >checking for WAL block size... 16kB >checking for x86_64-w64-mingw32-gcc... x86_64-w64-mingw32-gcc >checking whether the C compiler works... yes >... > >DB created successfully using initdb. > >Unfortunately my blocksize is still 8kb when checking in DB. > >postgres=# show block_size; > block_size > > 8192 >(1 row) > > >postgres=# select version(); > version >- > PostgreSQL 14.1 on x86_64-w64-mingw32, compiled by >x86_64-w64-mingw32-gcc.exe (Rev9, Built by MSYS2 project) 10.2.0, 64-bit >(1 row) > >Is there anything additional step I'm missing? Any chance this is from an older build? You might need to make clean, if you previously ./configured differently. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: slowest tap tests - split or accelerate?
Hi, On 2022-01-19 12:14:21 -0500, Tom Lane wrote: > No, it was largely the same as what you have here, I think. I dug > up my WIP patch and attach it below, just in case there's any ideas > worth borrowing. Heh, it does look quite similar. > + "cp -a \"%s\" \"%s/data\" > > \"%s/log/initdb.log\" 2>&1", > + pg_proto_datadir, > + temp_instance, > + outputdir); > + if (system(buf)) > + { > + fprintf(stderr, _("\n%s: cp failed\nExamine > %s/log/initdb.log for the reason.\nCommand was: %s\n"), progname, outputdir, > buf); > + exit(2); > + } Both ours have this. Unfortunately on windows cp doesn't natively exist. Although git does provide it. I tried a few things that appear to be natively available (time is best of three executions):

cp -a tmp_install\initdb_template t\                  (gnu cp from git)  670ms
xcopy.exe /E /Q tmp_install\initdb_template t\                           638ms
robocopy /e /NFL /NDL tmp_install\initdb_template t\                     575ms

So I guess we could use robocopy? That's shipped as part of windows starting in windows 10... xcopy has been there for longer, so I might just default to that. Greetings, Andres Freund
Re: Compiling PostgreSQL for WIndows with 16kb blocksize
Yannick Collette writes: > For test purposes I need to compile PostgreSQL 14.1 using a 16kb blocksize. > CFLAGS="-D WINVER=0x0600 -D _WIN32_WINNT=0x0600" LIBS="-ladvapi32" > ./configure --host=x86_64-w64-mingw32 --with-blocksize=16 > --with-wal-blocksize=16 --with-openssl --with-libxml > --prefix=/c/postgresql/pg14/ 2>&1 | tee configure.log I don't know anything about the Windows-specific details here, but the --with-blocksize option looks fine, and it works for me: regression=# show block_size; block_size 16384 (1 row) (Worth noting here is that a lot of our regression tests fail at non-default blocksizes, because plans change, or rows no longer need toasting, or the like.) > Unfortunately my blocksize is still 8kb when checking in DB. > postgres=# show block_size; > block_size > > 8192 > (1 row) I think you must be connecting to the wrong server. regards, tom lane
Re: slowest tap tests - split or accelerate?
Andres Freund writes: > On 2022-01-19 11:54:01 -0500, Tom Lane wrote: >> Me too ;-). As I remarked earlier, I'd tried this once before and >> gave up because it didn't seem to be winning much. But that was >> before we had so many initdb's triggered by TAP tests, I think. > What approach did you use? Do you have a better idea than generating > tmp_install/initdb_template? No, it was largely the same as what you have here, I think. I dug up my WIP patch and attach it below, just in case there's any ideas worth borrowing. >> (Note that all four runs have the "fsync = on" removed from >> 008_fsm_truncation.pl.) > I assume you're planning on comitting that? Yeah, will do that shortly. regards, tom lane diff --git a/src/Makefile.global.in b/src/Makefile.global.in index dc8a89af8e..a28a03ac72 100644 --- a/src/Makefile.global.in +++ b/src/Makefile.global.in @@ -332,6 +332,7 @@ ifeq ($(MAKELEVEL),0) rm -rf '$(abs_top_builddir)'/tmp_install $(MKDIR_P) '$(abs_top_builddir)'/tmp_install/log $(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1 + '$(abs_top_builddir)/tmp_install$(bindir)/initdb' -D '$(abs_top_builddir)/tmp_install/proto_pgdata' --no-clean --no-sync -A trust >'$(abs_top_builddir)'/tmp_install/log/initdb.log 2>&1 endif $(if $(EXTRA_INSTALL),for extra in $(EXTRA_INSTALL); do $(MAKE) -C '$(top_builddir)'/$$extra DESTDIR='$(abs_top_builddir)'/tmp_install install >>'$(abs_top_builddir)'/tmp_install/log/install.log || exit; done) endif @@ -355,7 +356,7 @@ $(if $(filter $(PORTNAME),darwin),DYLD_LIBRARY_PATH,$(if $(filter $(PORTNAME),ai endef define with_temp_install -PATH="$(abs_top_builddir)/tmp_install$(bindir):$$PATH" $(call add_to_path,$(ld_library_path_var),$(abs_top_builddir)/tmp_install$(libdir)) +PATH="$(abs_top_builddir)/tmp_install$(bindir):$$PATH" $(call add_to_path,$(ld_library_path_var),$(abs_top_builddir)/tmp_install$(libdir)) 
PG_PROTO_DATADIR='$(abs_top_builddir)/tmp_install/proto_pgdata' endef ifeq ($(enable_tap_tests),yes) diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm index f4fa600951..22c1a726c7 100644 --- a/src/test/perl/PostgresNode.pm +++ b/src/test/perl/PostgresNode.pm @@ -404,8 +404,23 @@ sub init mkdir $self->backup_dir; mkdir $self->archive_dir; - TestLib::system_or_bail('initdb', '-D', $pgdata, '-A', 'trust', '-N', - @{ $params{extra} }); + # If we're using default initdb parameters, and the top-level "make check" + # created a prototype data directory for us, just copy that. + if (!defined($params{extra}) && + defined($ENV{PG_PROTO_DATADIR}) && + -d $ENV{PG_PROTO_DATADIR}) + { + rmdir($pgdata); + RecursiveCopy::copypath($ENV{PG_PROTO_DATADIR}, $pgdata); + chmod(0700, $pgdata); + } + else + { + TestLib::system_or_bail('initdb', '-D', $pgdata, +'--no-clean', '--no-sync', '-A', 'trust', +@{ $params{extra} }); + } + TestLib::system_or_bail($ENV{PG_REGRESS}, '--config-auth', $pgdata); open my $conf, '>>', "$pgdata/postgresql.conf"; diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c index abb742b1ed..04d7fb910b 100644 --- a/src/test/regress/pg_regress.c +++ b/src/test/regress/pg_regress.c @@ -2214,6 +2214,8 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc if (temp_instance) { + char *pg_proto_datadir; + struct stat ppst; FILE *pg_conf; const char *env_wait; int wait_seconds; @@ -2243,20 +2245,43 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc if (!directory_exists(buf)) make_directory(buf); - /* initdb */ - header(_("initializing database system")); - snprintf(buf, sizeof(buf), - "\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync%s%s > \"%s/log/initdb.log\" 2>&1", - bindir ? bindir : "", - bindir ? "/" : "", - temp_instance, - debug ? " --debug" : "", - nolocale ? 
" --no-locale" : "", - outputdir); - if (system(buf)) + /* see if we should run initdb or just copy a prototype datadir */ + pg_proto_datadir = getenv("PG_PROTO_DATADIR"); + if (!nolocale && + pg_proto_datadir && + stat(pg_proto_datadir, ) == 0 && + S_ISDIR(ppst.st_mode)) { - fprintf(stderr, _("\n%s: initdb failed\nExamine %s/log/initdb.log for the reason.\nCommand was: %s\n"), progname, outputdir, buf); - exit(2); + /* copy prototype */ + header(_("copying prototype data directory")); + snprintf(buf, sizeof(buf), + "cp -a \"%s\" \"%s/data\" > \"%s/log/initdb.log\" 2>&1", + pg_proto_datadir, + temp_instance, + outputdir); + if (system(buf)) + { +fprintf(stderr, _("\n%s: cp failed\nExamine %s/log/initdb.log for the reason.\nCommand was: %s\n"), progname, outputdir, buf); +exit(2); + } + } + else + { + /* run initdb */ + header(_("initializing database system")); + snprintf(buf, sizeof(buf), +
Compiling PostgreSQL for WIndows with 16kb blocksize
Hello, For test purposes I need to compile PostgreSQL 14.1 using a 16kb blocksize. CFLAGS="-D WINVER=0x0600 -D _WIN32_WINNT=0x0600" LIBS="-ladvapi32" ./configure --host=x86_64-w64-mingw32 --with-blocksize=16 --with-wal-blocksize=16 --with-openssl --with-libxml --prefix=/c/postgresql/pg14/ 2>&1 | tee configure.log Below is the beginning of my configure.log, with no errors: make and make install ok also. configure: loading site script /etc/config.site checking build system type... x86_64-pc-msys checking host system type... x86_64-w64-mingw32 checking which template to use... win32 checking whether NLS is wanted... no checking for default port number... 5432 checking for block size... 16kB checking for segment size... 1GB checking for WAL block size... 16kB checking for x86_64-w64-mingw32-gcc... x86_64-w64-mingw32-gcc checking whether the C compiler works... yes ... DB created successfully using initdb. Unfortunately my blocksize is still 8kb when checking in DB. postgres=# show block_size; block_size 8192 (1 row) postgres=# select version(); version - PostgreSQL 14.1 on x86_64-w64-mingw32, compiled by x86_64-w64-mingw32-gcc.exe (Rev9, Built by MSYS2 project) 10.2.0, 64-bit (1 row) Is there any additional step I'm missing? Thanks in advance for your help! Yannick
Re: slowest tap tests - split or accelerate?
Hi, On 2022-01-19 11:54:01 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2022-01-18 13:40:40 -0800, Andres Freund wrote: > >> Maybe we really should do at least the most simplistic caching for > >> initdbs, by > >> doing one initdb as part of the creation of temp_install. Then > >> Cluster::init > >> would need logic to only use that if $params{extra} is empty. > > > I hacked this together. And the wins are bigger than I thought. > > Me too ;-). As I remarked earlier, I'd tried this once before and > gave up because it didn't seem to be winning much. But that was > before we had so many initdb's triggered by TAP tests, I think. What approach did you use? Do you have a better idea than generating tmp_install/initdb_template? I for a bit wondered whether initdb should do this internally instead. But it seemed more work than I wanted to tackle. The bit in the patch about generating initdb_template in Install.pm definitely needs to be made conditional, but I don't precisely know on what. The buildfarm just calls it as perl install.pl "$installdir > So this looks like it'll be a nice win for low-end hardware, too. Nice! > (Note that all four runs have the "fsync = on" removed from > 008_fsm_truncation.pl.) I assume you're planning on comitting that? Greetings, Andres Freund
Re: slowest tap tests - split or accelerate?
Andres Freund writes: > On 2022-01-18 13:40:40 -0800, Andres Freund wrote: >> Maybe we really should do at least the most simplistic caching for initdbs, >> by >> doing one initdb as part of the creation of temp_install. Then Cluster::init >> would need logic to only use that if $params{extra} is empty. > I hacked this together. And the wins are bigger than I thought. Me too ;-). As I remarked earlier, I'd tried this once before and gave up because it didn't seem to be winning much. But that was before we had so many initdb's triggered by TAP tests, I think. I tried this patch on florican's host, which seems mostly disk-bound when doing check-world. It barely gets any win from parallelism: $ time make -s check-world -j1 >/dev/null 3809.60 real 584.44 user 282.23 sys $ time make -s check-world -j2 >/dev/null 3789.90 real 610.60 user 289.60 sys Adding v2-0001-hack-use-template-initdb-in-tap-tests.patch: $ time make -s check-world -j1 >/dev/null 3193.46 real 221.32 user 226.11 sys $ time make -s check-world -j2 >/dev/null 3211.19 real 224.31 user 230.07 sys (Note that all four runs have the "fsync = on" removed from 008_fsm_truncation.pl.) So this looks like it'll be a nice win for low-end hardware, too. regards, tom lane
Re: Replace uses of deprecated Python module distutils.sysconfig
On 19.01.22 01:18, Tom Lane wrote: Anyway, based on these results, we might have better luck switching to sysconfig after we start forcing python3. I'm tempted to resurrect the idea of changing configure's probe order to "python3 python python2" in the meantime, just so we can see how much of the buildfarm is ready for that. This seems sensible in any case, given that we have quasi-committed to enforcing Python 3 soon.
Re: Remove redundant MemoryContextSwitchTo in BeginCopyFrom
Bharath Rupireddy writes: > +1. It looks like a thinko from c532d15d. There's no code in between, > so switching to oldcontext doesn't make sense. Agreed. > I think we also need to remove MemoryContextSwitchTo(oldcontext); at > the end of BeginCopyTo in copyto.c, because we are not changing memory > contexts in between. Hmm, I think it'd be a better idea to remove the one in the middle of BeginCopyTo. The code after that is still doing setup of the cstate, so the early switch back looks to me like trouble waiting to happen. regards, tom lane
Re: refactoring basebackup.c
On 1/18/22 8:12 PM, Jeevan Ladhe wrote: Similar to LZ4 server-side compression, I have also tried to add a ZSTD server-side compression in the attached patch. Thanks Jeevan. while testing found one scenario where the server is getting crash while performing pg_basebackup against server-compression=zstd for a huge data second time Steps to reproduce --PG sources ( apply v11-0001,v11-0001,v9-0001,v9-0002 , configure --with-lz4,--with-zstd, make/install, initdb, start server) --insert huge data (./pgbench -i -s 2000 postgres) --restart the server (./pg_ctl -D data restart) --pg_basebackup ( ./pg_basebackup -t server:/tmp/yc1 --server-compression=zstd -R -Xnone -n -N -l 'ccc' --no-estimate-size -v) --insert huge data (./pgbench -i -s 1000 postgres) --restart the server (./pg_ctl -D data restart) --run pg_basebackup again (./pg_basebackup -t server:/tmp/yc11 --server-compression=zstd -v -Xnone ) [edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/yc11 --server-compression=zstd -v -Xnone pg_basebackup: initiating base backup, waiting for checkpoint to complete 2022-01-19 21:23:26.508 IST [30219] LOG: checkpoint starting: force wait 2022-01-19 21:23:26.608 IST [30219] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 1 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.101 s; sync files=0, longest=0.000 s, average=0.000 s; distance=16369 kB, estimate=16369 kB pg_basebackup: checkpoint completed TRAP: FailedAssertion("len > 0 && len <= sink->bbs_buffer_length", File: "../../../src/include/replication/basebackup_sink.h", Line: 208, PID: 30226) postgres: walsender edb [local] sending backup "pg_basebackup base backup"(ExceptionalCondition+0x7a)[0x94ceca] postgres: walsender edb [local] sending backup "pg_basebackup base backup"[0x7b9a08] postgres: walsender edb [local] sending backup "pg_basebackup base backup"[0x7b9be2] postgres: walsender edb [local] sending backup "pg_basebackup base backup"[0x7b5b30] postgres: walsender edb [local] sending 
backup "pg_basebackup base backup"(SendBaseBackup+0x563)[0x7b7053] postgres: walsender edb [local] sending backup "pg_basebackup base backup"(exec_replication_command+0x961)[0x7c9a41] postgres: walsender edb [local] sending backup "pg_basebackup base backup"(PostgresMain+0x92f)[0x81ca3f] postgres: walsender edb [local] sending backup "pg_basebackup base backup"[0x48e430] postgres: walsender edb [local] sending backup "pg_basebackup base backup"(PostmasterMain+0xfd2)[0x785702] postgres: walsender edb [local] sending backup "pg_basebackup base backup"(main+0x1c6)[0x48fb96] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f63642c8555] postgres: walsender edb [local] sending backup "pg_basebackup base backup"[0x48feb5] pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. 2022-01-19 21:25:34.485 IST [30205] LOG: server process (PID 30226) was terminated by signal 6: Aborted 2022-01-19 21:25:34.485 IST [30205] DETAIL: Failed process was running: BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS, MANIFEST 'yes', TABLESPACE_MAP, TARGET 'server', TARGET_DETAIL '/tmp/yc11', COMPRESSION 'zstd') 2022-01-19 21:25:34.485 IST [30205] LOG: terminating any other active server processes [edb@centos7tushar bin]$ 2022-01-19 21:25:34.489 IST [30205] LOG: all server processes terminated; reinitializing 2022-01-19 21:25:34.536 IST [30228] LOG: database system was interrupted; last known up at 2022-01-19 21:23:26 IST 2022-01-19 21:25:34.669 IST [30228] LOG: database system was not properly shut down; automatic recovery in progress 2022-01-19 21:25:34.671 IST [30228] LOG: redo starts at 9/728 2022-01-19 21:25:34.671 IST [30228] LOG: invalid record length at 9/7000148: wanted 24, got 0 2022-01-19 21:25:34.671 IST [30228] LOG: redo done at 9/7000110 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s 2022-01-19 21:25:34.673 IST [30229] LOG: 
checkpoint starting: end-of-recovery immediate wait 2022-01-19 21:25:34.713 IST [30229] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.003 s, sync=0.001 s, total=0.041 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB 2022-01-19 21:25:34.718 IST [30205] LOG: database system is ready to accept connections Observation - if we change server-compression method to lz4 from zstd then it is NOT happening. [edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/ycc1 --server-compression=lz4 -v -Xnone pg_basebackup: initiating base backup, waiting for checkpoint to complete 2022-01-19 21:27:51.642 IST [30229] LOG: checkpoint starting: force wait 2022-01-19 21:27:51.687 IST [30229] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 1 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.046 s; sync
Re: Replace uses of deprecated Python module distutils.sysconfig
I wrote: > Anyway, based on these results, we might have better luck switching to > sysconfig after we start forcing python3. On the other hand, that answer is not back-patchable, and we surely need a back-patchable fix, because people will try to build the back branches against newer pythons. Based on the buildfarm results so far, the problem can be described as "some installations say /usr/local when they should have said /usr". I experimented with the attached delta patch and it fixes the problem on my Debian 9 image. (I don't know Python, so there may be a better way to do this.) We'd have to also bump the minimum 3.x version to 3.2, but that seems very unlikely to bother anyone. regards, tom lane diff --git a/config/python.m4 b/config/python.m4 index 8ca1eaa64b..c65356c6ac 100644 --- a/config/python.m4 +++ b/config/python.m4 @@ -56,13 +56,20 @@ AC_MSG_RESULT([$python_configdir]) AC_MSG_CHECKING([Python include directories]) python_includespec=`${PYTHON} -c " -import sysconfig -a = '-I' + sysconfig.get_path('include') -b = '-I' + sysconfig.get_path('platinclude') +import sysconfig, os.path +a = sysconfig.get_path('include') +b = sysconfig.get_path('platinclude') +# Some versions of sysconfig report '/usr/local/include' +# when they should have said '/usr/include' +if not os.path.exists(a + '/Python.h'): +aalt = a.replace('/usr/local/', '/usr/', 1) +if os.path.exists(aalt + '/Python.h'): +a = aalt +b = b.replace('/usr/local/', '/usr/', 1) if a == b: -print(a) +print('-I' + a) else: -print(a + ' ' + b)"` +print('-I' + a + ' -I' + b)"` if test "$PORTNAME" = win32 ; then python_includespec=`echo $python_includespec | sed 's,[[\]],/,g'` fi diff --git a/configure b/configure index 9c856cb1d5..f88db9467d 100755 --- a/configure +++ b/configure @@ -10370,13 +10370,20 @@ $as_echo "$python_configdir" >&6; } { $as_echo "$as_me:${as_lineno-$LINENO}: checking Python include directories" >&5 $as_echo_n "checking Python include directories... 
" >&6; } python_includespec=`${PYTHON} -c " -import sysconfig -a = '-I' + sysconfig.get_path('include') -b = '-I' + sysconfig.get_path('platinclude') +import sysconfig, os.path +a = sysconfig.get_path('include') +b = sysconfig.get_path('platinclude') +# Some versions of sysconfig report '/usr/local/include' +# when they should have said '/usr/include' +if not os.path.exists(a + '/Python.h'): +aalt = a.replace('/usr/local/', '/usr/', 1) +if os.path.exists(aalt + '/Python.h'): +a = aalt +b = b.replace('/usr/local/', '/usr/', 1) if a == b: -print(a) +print('-I' + a) else: -print(a + ' ' + b)"` +print('-I' + a + ' -I' + b)"` if test "$PORTNAME" = win32 ; then python_includespec=`echo $python_includespec | sed 's,[\],/,g'` fi
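The core of that delta patch is a path-existence fallback. A standalone sketch of the same logic, with the filesystem check injected as a callable so it can be exercised without a real Python installation (the helper name and the simulated paths are made up for illustration):

```python
def fix_include_dir(path, exists):
    """If Python.h is missing at 'path', retry with /usr/local/ -> /usr/.

    Mirrors the workaround for sysconfig installations that report
    '/usr/local/include' when they should have said '/usr/include'.
    """
    if not exists(path + '/Python.h'):
        alt = path.replace('/usr/local/', '/usr/', 1)
        if exists(alt + '/Python.h'):
            return alt
    return path

# Simulated system where the headers actually live under /usr/include:
present = {'/usr/include/python3.9/Python.h'}
fixed = fix_include_dir('/usr/local/include/python3.9', present.__contains__)
print(fixed)  # /usr/include/python3.9
```

Note that the fallback only fires when Python.h is genuinely absent at the reported location, so correct installations are left untouched.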
Re: A qsort template
On Tue, Jan 18, 2022 at 9:58 PM Peter Geoghegan wrote: > > On Tue, Jan 18, 2022 at 6:39 PM John Naylor > wrote: > > Editorializing the null position in queries is not very common in my > > experience. Not null is interesting since it'd be trivial to pass > > constant false to the same Apply[XYZ]SortComparator() and let the > > compiler remove all those branches for us. On the other hand, those > > branches would be otherwise predicted well, so it might make little or > > no difference. > > If you were going to do this, maybe you could encode NULL directly in > an abbreviated key. I think that that could be made to work if it was > limited to opclasses with abbreviated keys encoded as unsigned > integers. Just a thought. Now that you mention that, I do remember reading about this technique in the context of b-tree access, so it does make sense. If we had that capability, it would be trivial to order the nulls how we want while building the sort tuple datums, and the not-null case would be handled automatically. And have a smaller code footprint, I think. -- John Naylor EDB: http://www.enterprisedb.com
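Encoding NULL directly into an unsigned-integer abbreviated key could look roughly like the following toy model (not the datum format sortsupport actually uses): reserve the smallest key value for NULL and shift real values up by one, saturating at the top so the mapping stays order-preserving. A plain unsigned comparison then yields NULLS FIRST with no separate null branch.

```python
U64_MAX = 2**64 - 1
NULL_KEY = 0  # smallest key, so unsigned comparison gives NULLS FIRST

def abbrev_key(value):
    """Map a non-negative int (or None) to an unsigned 64-bit sort key."""
    if value is None:
        return NULL_KEY
    # Shift real values up by one so none collides with NULL_KEY;
    # saturate at the top so ordering among large values is preserved
    # only approximately (ties fall back to a full comparison).
    return min(value + 1, U64_MAX)

keys = sorted(abbrev_key(v) for v in [42, None, 7, None, 0])
print(keys)  # [0, 0, 1, 8, 43]
```

NULLS LAST would work symmetrically by reserving the largest key value instead.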
Re: refactoring basebackup.c
On Wed, Jan 19, 2022 at 7:16 AM Dipesh Pandit wrote: > I have added support for decompressing a gzip compressed tar file > at client. pg_basebackup can enable server side compression for > plain format backup with this change. > > Added a gzip extractor which decompresses the compressed archive > and forwards it to the next streamer. I have done initial testing and > working on updating the test coverage. Cool. It's going to need some documentation changes, too. I don't like the way you coded this in CreateBackupStreamer(). I would like the decision about whether to use bbstreamer_gzip_extractor_new(), and/or throw an error about not being able to parse an archive, to be based on the file type i.e. "did we get a .tar.gz file?" rather than on whether we asked for server-side compression. Notice that the existing logic checks whether we actually got a .tar file from the server rather than assuming that's what must have happened. As a matter of style, I don't think it's good for the only thing inside of an "if" statement to be another "if" statement. The two could be merged, but we also don't want to have the "if" conditional be too complex. I am imagining that this should end up saying something like if (must_parse_archive && !is_tar && !is_tar_gz) { pg_log_error(... + * "windowBits" must be greater than or equal to "windowBits" value + * provided to deflateInit2 while compressing. It would be nice to clarify why we know the value we're using is safe. Maybe we're using the maximum possible value, in which case you could just add that to the end of the comment: "...so we use the maximum possible value for safety." +/* + * End of the stream, if there is some pending data in output buffers then + * we must forward it to next streamer. + */ +if (res == Z_STREAM_END) { +bbstreamer_content(mystreamer->base.bbs_next, member, mystreamer->base.bbs_buffer.data, +mystreamer->bytes_written, context); +} Uncuddle the brace. 
It probably doesn't make much difference, but I would be inclined to do the final flush in bbstreamer_gzip_extractor_finalize() rather than here. That way we rely on our own notion of when there's no more input data rather than zlib's notion. Probably terrible things are going to happen if those two ideas don't match up but there might be some other compression algorithm that doesn't return a distinguishing code at end-of-stream. Such an algorithm would have to take care of any leftover data in the finalize function, so I think we should do that here too, so the code can be similar in all cases. Perhaps we should move all the gzip stuff to a new file bbstreamer_gzip.c. -- Robert Haas EDB: http://www.enterprisedb.com
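The "windowBits" point above can be illustrated with zlib's Python binding (an analogy for the C zlib API the patch uses, not pg_basebackup's actual code): requesting the maximum window size, plus the flag that selects the gzip container, lets the inflater handle any stream the deflater could have produced, and a streaming loop must flush whatever the decompressor has buffered at end of stream before forwarding it on.

```python
import gzip
import zlib

payload = b"pg_basebackup test payload " * 1000
compressed = gzip.compress(payload)

# 15 is the maximum windowBits; adding 16 tells zlib to expect a gzip
# header.  Using the maximum is safe no matter what the compressor used.
inflater = zlib.decompressobj(wbits=15 + 16)

out = bytearray()
for i in range(0, len(compressed), 1024):   # feed the stream in small chunks
    out += inflater.decompress(compressed[i:i + 1024])
out += inflater.flush()                     # forward any pending buffered data
```

The final flush mirrors the finalize-time cleanup suggested above: it relies on our own notion of end-of-input rather than on the library reporting end-of-stream.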
Re: Refactoring of compression options in pg_basebackup
On 2022-Jan-19, Michael Paquier wrote: > + printf(_(" -Z, --compress=[{gzip,none}[:LEVEL] or [LEVEL]\n" > + " compress tar output with > given compression method or level\n")); Note there is an extra [ before the {gzip bit. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "I suspect most samba developers are already technically insane... Of course, since many of them are Australians, you can't tell." (L. Torvalds)
Re: Remove redundant MemoryContextSwitchTo in BeginCopyFrom
On Wed, Jan 19, 2022 at 7:51 PM Japin Li wrote: > > > Hi, hackers > > When I read the code of COPY ... FROM ..., I find there is a redundant > MemoryContextSwitchTo() in BeginCopyFrom(). In BeginCopyFrom, it creates > a COPY memory context and then switches to it, in the middle of this > function, it switches to the oldcontext and immediately switches back to > COPY memory context, IMO, this is redundant, and can be removed safely. +1. It looks like a thinko from c532d15d. There's no code in between, so switching to oldcontext doesn't make sense. MemoryContextSwitchTo(oldcontext); << no code here >> oldcontext = MemoryContextSwitchTo(cstate->copycontext); I think we also need to remove MemoryContextSwitchTo(oldcontext); at the end of BeginCopyTo in copyto.c, because we are not changing memory contexts in between. diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 34c8b80593..5182048e4f 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -742,8 +742,6 @@ BeginCopyTo(ParseState *pstate, cstate->bytes_processed = 0; - MemoryContextSwitchTo(oldcontext); - return cstate; } Regards, Bharath Rupireddy.
Re: Remove redundant MemoryContextSwitchTo in BeginCopyFrom
On Wed, Jan 19, 2022 at 11:21 AM Japin Li wrote: > > > Hi, hackers > > When I read the code of COPY ... FROM ..., I find there is a redundant > MemoryContextSwitchTo() in BeginCopyFrom(). In BeginCopyFrom, it creates > a COPY memory context and then switches to it, in the middle of this > function, it switches to the oldcontext and immediately switches back to > COPY memory context, IMO, this is redundant, and can be removed safely. > LGTM (it passed all regression tests without any issue) -- Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Re: New developer papercut - Makefile references INSTALL
Daniel Gustafsson writes: > I ended up pushing this with the URL in place as there IMO was consensus in > the > thread for including it. We could if we want to update it to point to v15 > docs > once we branch off, but anything more than that is probably in the diminishing > returns territory in terms of effort involved. I think pointing at devel is fine, since this text is mostly aimed at developers or would-be developers anyway. regards, tom lane
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
On Tue, Jan 18, 2022 at 1:48 PM Peter Geoghegan wrote: > That's what I was reacting to -- it had nothing to do with any > concerns you may have had. I wasn't thinking about long-idle cursors > at all. I was defending myself, because I was put in a position where > I had to defend myself. I don't think I've said anything on this thread that is an attack on you. I am getting pretty frustrated with the tenor of the discussion, though. I feel like you're the one attacking me, and I don't like it. > I still don't understand why you think that my idea (not yet > implemented) of making FreezeLimit into a backstop (making it > autovacuum_freeze_max_age/2 or something) and relying on the new > "early freezing" criteria for almost everything is going to make the > situation worse in this scenario with long idle cursors. It's intended > to make it better. I just don't understand how I haven't been able to convey my concern here by now. I've already written multiple emails about it. If none of them were clear enough for you to understand, I'm not sure how saying the same thing over again can help. When I say I've already written about this, I'm referring specifically to the following: - https://postgr.es/m/ca+tgmobkjm9bszr3eteb6mjdlkwxkk5zxx0xhlf-w9kugvo...@mail.gmail.com in the second-to-last paragraph, beginning with "I don't really see" - https://www.postgresql.org/message-id/CA%2BTgmoaGoZ2wX6T4sj0eL5YAOQKW3tS8ViMuN%2BtcqWJqFPKFaA%40mail.gmail.com in the second paragraph beginning with "Because waiting on a lock" - https://www.postgresql.org/message-id/CA%2BTgmoZYri_LUp4od_aea%3DA8RtjC%2B-Z1YmTc7ABzTf%2BtRD2Opw%40mail.gmail.com in the paragraph beginning with "That's the part I'm not sure I believe." For all of that, I'm not even convinced that you're wrong. I just think you might be wrong. I don't really know. It seems to me however that you're understating the value of waiting, which I've tried to explain in the above places. 
Waiting does have the very real disadvantage of starving the rest of the system of the work that autovacuum worker would have been doing, and that's why I think you might be right. However, there are cases where waiting, and only waiting, gets the job done. If you're not willing to admit that those cases exist, or you think they don't matter, then we disagree. If you admit that they exist and think they matter but believe that there's some reason why increasing FreezeLimit can't cause any damage, then either (a) you have a good reason for that belief which I have thus far been unable to understand or (b) you're more optimistic about the proposed change than can be entirely justified. > My sense is that there are very few apps that are hopelessly incapable > of advancing relfrozenxid from day one. I find it much easier to > believe that users that had this experience got away with it for a > very long time, until their luck ran out, somehow. I would like to > minimize the chance of that ever happening, to the extent that that's > possible within the confines of the basic heapam/vacuumlazy.c > invariants. I agree with the idea that most people are OK at the beginning and then at some point their luck runs out and catastrophe strikes. I think there are a couple of different kinds of catastrophe that can happen. For instance, somebody could park a cursor in the middle of a table someplace and leave it there until the snow melts. Or, somebody could take a table lock and sit on it forever. Or, there could be a corrupted page in the table that causes VACUUM to error out every time it's reached. In the second and third situations, it doesn't matter a bit what we do with FreezeLimit, but in the first one it might. If the user is going to leave that cursor sitting there literally forever, the best solution is to raise FreezeLimit as high as we possibly can. 
The system is bound to shut down due to wraparound at some point, but we at least might as well vacuum other stuff while we're waiting for that to happen. On the other hand if that user is going to close that cursor after 10 minutes and open a new one in the same place 10 seconds later, the best thing to do is to keep FreezeLimit as low as possible, because the first time we wait for the pin to be released we're guaranteed to advance relfrozenxid within 10 minutes, whereas if we don't do that we may keep missing the brief windows in which no cursor is held for a very long time. But we have absolutely no way of knowing which of those things is going to happen on any particular system, or of estimating which one is more common in general. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [ERROR] Copy from CSV fails due to memory error.
On 1/19/22 14:01, Kostas Chasialis wrote: Hey. I am facing an issue when I try to run the following command COPY FROM WITH DELIMITER E','; This file is rather large, it's around 178GBs. When I try to run this COPY command I get the following error: ERROR: out of memory DETAIL: Failed on request of size 2048 in memory context "AfterTriggerEvents". CONTEXT: COPY ssbm300_lineorder, line 50796791 Clearly a memory allocation function is failing but I have no clue how to fix it. I have tried experimenting with shared_buffers value in postgresql.conf file but after searching a bit I quickly realized that I do not know what I am doing there so I left it with default value. Same with work_mem value. Did you face this issue before? Can you help me resolve it? Well, it's clearly related to "after" triggers - do you have any such triggers on the table? AFAIK it might be related to deferred constraints (like unique / foreign keys). Do you have anything like that? If yes, I guess the only solution is to make the constraints not deferred or split the copy into smaller chunks. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
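One way to act on the "smaller chunks" advice is to split the input file before loading, so each COPY statement queues a bounded number of after-trigger events. A stdlib-only sketch of the splitting step (the row contents here are placeholders; each chunk would then be fed to its own COPY invocation):

```python
import itertools

def chunked(lines, chunk_size):
    """Yield successive lists of at most chunk_size lines."""
    it = iter(lines)
    while True:
        chunk = list(itertools.islice(it, chunk_size))
        if not chunk:
            return
        yield chunk

# Each chunk would be loaded by a separate COPY, keeping the
# "AfterTriggerEvents" memory context bounded per statement.
chunks = list(chunked((f"row{i}" for i in range(10)), 4))
print([len(c) for c in chunks])  # [4, 4, 2]
```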
Re: Pluggable toaster
Hi, >I'm not convinced that's universally true. Yes, I'm sure certain TOAST >implementations would benefit from tighter control over compression, but >does that imply compression and toast are redundant? I doubt that, >because we compress non-toasted types too, for example. And layering has >a value too, as makes it easier to replace the pieces. Not exactly. It is a means to control TOAST itself without changing the core each time you want to change Toast strategy or method. Compression is just an example. And no other toasters are available without the proposed patch; there is only the one built-in. >Perhaps. My main point is that we should not be making too many radical >changes at once - it makes it much harder to actually get anything done. >So yeah, doing TOAST through IOT might be interesting, but I'd leave >that for a separate patch. That's why 4 distinct patches with incremental changes were proposed - 1) just new Toaster API with some necessary core changes required by the API; 2) default toaster routed via new API (but all its functions are unaffected, and a dummy toaster extension as an example); 3) 1+2+some refactoring and versioning; 4) extension module for bytea columns. Toast through IOT is a topic for discussion but does not seem to give a major advantage over the existing storage method, according to tests. >It seems better to prevent such incompatible combinations and restrict >each toaster to just compatible data types, and the mapping table >(linking toaster and data types) seems a way to do that. To handle this case a validate function (toastervalidate_function) is proposed in the TsrRoutine structure. >If you have to implement custom toaster to implement custom compression >method, doesn't that make things more complex? You'd have to solve all >the issues for custom compression methods and also all issues for custom >toaster. Also, what if you want to just compress the column, not toast? 
Default compression is restricted to 2 compression methods, all other means require extensions. Also, the name Toaster is a little bit misleading because it intends that data is being sliced, but it is not always the case, to be toasted a piece of bread must not necessarily be sliced. Regards, -- Nikita Malakhov Postgres Professional https://postgrespro.ru/ On Tue, Jan 18, 2022 at 7:06 PM Tomas Vondra wrote: > > > On 1/18/22 15:56, Teodor Sigaev wrote: > > Hi! > > > >> Maybe doing that kind of compression in TOAST is somehow simpler, but > >> I don't see it. > > Seems, in ideal world, compression should be inside toaster. > > > > I'm not convinced that's universally true. Yes, I'm sure certain TOAST > implementations would benefit from tighter control over compression, but > does that imply compression and toast are redundant? I doubt that, > because we compress non-toasted types too, for example. And layering has > a value too, as makes it easier to replace the pieces. > > >> > >>> 2 Current toast storage stores chunks in heap accesses method and to > >>> provide fast access by toast id it makes an index. Ideas: > >>>- store chunks directly in btree tree, pgsql's btree already has an > >>> INCLUDE columns, so, chunks and visibility data will be stored > only > >>> in leaf pages. Obviously it reduces number of disk's access for > >>> "untoasting". > >>>- use another access method for chunk storage > >>> > >> > >> Maybe, but that probably requires more thought - e.g. btree requires > >> the values to be less than 1/3 page, so I wonder how would that play > >> with toasting of values. > > That's ok, because chunk size is 2000 bytes right now and its could be > > saved. > >> > > Perhaps. My main point is that we should not be making too many radical > changes at once - it makes it much harder to actually get anything done. > So yeah, doing TOAST through IOT might be interesting, but I'd leave > that for a separate patch. 
> > > > >> Seems you'd need a mapping table, to allow M:N mapping between types > >> and toasters, linking it to all "compatible" types. It's not clear to > >> me how would this work with custom data types, domains etc. > > If toaster will look into internal structure then it should know type's > > binary format. So, new custom types have a little chance to work with > > old custom toaster. Default toaster works with any types. > > The question is what happens when you combine data type with a toaster > that is not designed for that type. I mean, imagine you have a JSONB > toaster and you set it for a bytea column. Naive implementation will > just crash, because it'll try to process bytea as if it was JSONB. > > It seems better to prevent such incompatible combinations and restrict > each toaster to just compatible data types, and the mapping table > (linking toaster and data types) seems a way to do that. > > However, it seems toasters are either generic (agnostic to data types, > treating everything as bytea) or specialized. I doubt any specialized > toaster can
Remove redundant MemoryContextSwitchTo in BeginCopyFrom
Hi, hackers When I read the code of COPY ... FROM ..., I find there is a redundant MemoryContextSwitchTo() in BeginCopyFrom(). In BeginCopyFrom, it creates a COPY memory context and then switches to it, in the middle of this function, it switches to the oldcontext and immediately switches back to COPY memory context, IMO, this is redundant, and can be removed safely. -- Regards, Japin Li. ChengDu WenWu Information Technology Co.,Ltd. diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 0d6b34206a..7b3f5a84b8 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -1340,10 +1340,6 @@ BeginCopyFrom(ParseState *pstate, cstate->whereClause = whereClause; - MemoryContextSwitchTo(oldcontext); - - oldcontext = MemoryContextSwitchTo(cstate->copycontext); - /* Initialize state variables */ cstate->eol_type = EOL_UNKNOWN; cstate->cur_relname = RelationGetRelationName(cstate->rel);
Re: New developer papercut - Makefile references INSTALL
> On 18 Jan 2022, at 17:05, Peter Eisentraut > wrote: > On 18.01.22 16:51, Daniel Gustafsson wrote: >> I plan on applying the attached which address the feedback given. > > The indentation of the two INSTRUCTIONS= lines uses a different mix of tabs > and spaces, so it looks a bit weird depending on how you view it. > > It's also a bit strange that the single quotes are part of the value of > $INSTRUCTIONS rather than part of the fixed text. Fixed both of these, thanks! > The URL links to the "devel" version of the installation instructions, which > will not remain appropriate after release. I don't know how to fix that > without creating an additional maintenance point. Since README.git already > contains that link, I would leave off the web site business and just make the > change of the dynamically chosen file name. I ended up pushing this with the URL in place as there IMO was consensus in the thread for including it. We could if we want to update it to point to v15 docs once we branch off, but anything more than that is probably in the diminishing returns territory in terms of effort involved. -- Daniel Gustafsson https://vmware.com/
Re: Logical replication timeout problem
Hello Amit, If it takes little work for you, can you please send me a piece of code with the change needed to do the test? Thanks. Regards, Fabrice On Fri, Jan 14, 2022 at 1:03 PM Amit Kapila wrote: > On Fri, Jan 14, 2022 at 3:47 PM Fabrice Chapuis > wrote: > > > > If I can follow you, I have to make the following changes: > > > > No, not like that but we can try that way as well to see if that helps > to avoid your problem. Am I understanding correctly that even after the > modification, you are seeing the problem? Can you try by calling > WalSndKeepaliveIfNecessary() instead of WalSndKeepalive()? > > -- > With Regards, > Amit Kapila. >
Re: Refactoring of compression options in pg_basebackup
On Tue, Jan 18, 2022 at 11:27 PM Michael Paquier wrote: > WFM. Attached is a patch that extends --compress to handle a method > with an optional compression level. Some extra tests are added to > cover all that. I think that this will reject something like --compress=nonetheless by telling you that 't' is not a valid separator. I think it would be better to code this so that we first identify the portion preceding the first colon, or the whole string if there is no colon. Then we check whether that part is a compression method that we recognize. If not, we complain. If so, we then check whatever is after the separator for validity - and this might differ by type. For example, we could then immediately reject none:4, and if in the future we want to allow lz4:fast3, we could. I think the code that handles the bare integer case should be at the top of the function and should return, because that code is short. Then the rest of the function doesn't need to be indented as deeply. "First check after the compression method" seems like it would be better written "First check for the compression method" or "First check the compression method". -- Robert Haas EDB: http://www.enterprisedb.com
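The suggested parsing order — take everything before the first colon as the method, validate it, then validate the remainder per method — might be sketched like this (not the committed pg_basebackup code; error texts and the bare-integer fallback rules are illustrative):

```python
def parse_compress_option(arg):
    """Parse a --compress value into a (method, level) pair."""
    # Bare integer: backward-compatible compression level.
    if arg.isdigit():
        return ('gzip', int(arg)) if int(arg) > 0 else ('none', 0)

    # Split on the FIRST colon, so "nonetheless" fails on the method
    # check rather than on a bogus "separator" complaint.
    method, sep, detail = arg.partition(':')
    if method not in ('none', 'gzip'):
        raise ValueError(f'invalid compression method "{method}"')
    if method == 'none':
        if sep:
            raise ValueError('compression level not allowed for "none"')
        return ('none', 0)
    level = int(detail) if sep else -1   # -1: default gzip level
    return ('gzip', level)
```

With this shape, per-method suffix rules (e.g. a hypothetical lz4:fast3) slot into the method-specific branches without touching the split logic.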
Re: pgsql: Modify pg_basebackup to use a new COPY subprotocol for base back
On Tue, Jan 18, 2022 at 9:29 PM Tom Lane wrote: > Robert Haas writes: > > Here's a patch, based in part on some off-list discussion with Andres. > > I believe Andres has already confirmed that this fix works, but it > > wouldn't hurt if Tom wants to verify it also. > > WFM too --- at least, pg_basebackup's "make check" passes now. Thanks for checking. Committed. -- Robert Haas EDB: http://www.enterprisedb.com
Re: row filtering for logical replication
On Tue, Jan 18, 2022 at 10:23 AM Greg Nancarrow wrote: > > On Tue, Jan 18, 2022 at 2:31 PM Amit Kapila wrote: > > > > On Tue, Jan 18, 2022 at 8:41 AM Greg Nancarrow wrote: > > > > > > On Tue, Jan 18, 2022 at 2:31 AM houzj.f...@fujitsu.com > > > wrote: > > > > > > > > > (2) GetTopMostAncestorInPublication > > > > > Is there a reason why there is no "break" after finding a > > > > > topmost_relid? Why keep searching and potentially overwrite a > > > > > previously-found topmost_relid? If it's intentional, I think that a > > > > > comment should be added to explain it. > > > > > > > > The code was moved from get_rel_sync_entry, and was trying to get the > > > > last oid in the ancestor list which is published by the publication. Do > > > > you > > > > have some suggestions for the comment ? > > > > > > > > > > Maybe the existing comment should be updated to just spell it out like > > > that: > > > > > > /* > > > * Find the "topmost" ancestor that is in this publication, by getting the > > > * last Oid in the ancestors list which is published by the publication. > > > */ > > > > > > > I am not sure that is helpful w.r.t what Peter is looking for as that > > is saying what code is doing and he wants to know why it is so? I > > think one can understand this by looking at get_partition_ancestors > > which will return the top-most ancestor as the last element. I feel > > either we can say see get_partition_ancestors or maybe explain how the > > ancestors are stored in this list. > > > > (note: I asked the original question about why there is no "break", not Peter) > Okay. > Maybe instead, an additional comment could be added to the > GetTopMostAncestorInPublication function to say "Note that the > ancestors list is ordered such that the topmost ancestor is at the end > of the list". > I am fine with this and I see that Hou-San already used this in the latest version of patch. 
> Unfortunately the get_partition_ancestors function > currently doesn't explicitly say that the topmost ancestors are > returned at the end of the list (I guess you could conclude it by then > looking at the get_partition_ancestors_worker code which it calls). > Also, this leads me to wonder if searching the ancestors list > backwards might be better here, and break at the first match? > I am not sure of the gains by doing that, and anyway, that is a separate topic of discussion as it is existing code. -- With Regards, Amit Kapila.
[ERROR] Copy from CSV fails due to memory error.
Hey. I am facing an issue when I try to run the following command: COPY FROM WITH DELIMITER E','; The file is rather large, around 178 GB. When I try to run this COPY command I get the following error: ERROR: out of memory DETAIL: Failed on request of size 2048 in memory context "AfterTriggerEvents". CONTEXT: COPY ssbm300_lineorder, line 50796791 Clearly a memory allocation is failing, but I have no clue how to fix it. I have tried experimenting with the shared_buffers value in postgresql.conf, but after searching a bit I quickly realized that I do not know what I am doing there, so I left it at the default value. Same with the work_mem value. Have you faced this issue before? Can you help me resolve it? Thanks in advance!
Re: WIN32 pg_import_system_collations
On Wed, Jan 19, 2022 at 10:53 AM Julien Rouhaud wrote: > > This version doesn't apply anymore: > > Thanks for the heads up. Please find attached a rebased patch. I have also rewritten some comments to address previous reviews, code and test remain the same. Regards, Juan José Santamaría Flecha v3-0001-WIN32-pg_import_system_collations.patch Description: Binary data
Re: refactoring basebackup.c
Hi, I have added support for decompressing a gzip compressed tar file at client. pg_basebackup can enable server side compression for plain format backup with this change. Added a gzip extractor which decompresses the compressed archive and forwards it to the next streamer. I have done initial testing and working on updating the test coverage. Note: Before applying the patch, please apply Robert's v11 version of the patches 0001 and 0002. Thanks, Dipesh From 737badce26ed05b5cdb64d9ffd1735fef9acbbf8 Mon Sep 17 00:00:00 2001 From: Dipesh Pandit Date: Wed, 19 Jan 2022 17:11:45 +0530 Subject: [PATCH] Support for extracting gzip compressed archive pg_basebackup can support server side compression using gzip. In order to support plain format backup with option '-Fp' we need to add support for decompressing the compressed blocks at client. This patch addresses the extraction of gzip compressed blocks at client. --- src/bin/pg_basebackup/bbstreamer.h | 1 + src/bin/pg_basebackup/bbstreamer_file.c | 175 src/bin/pg_basebackup/pg_basebackup.c | 58 +-- 3 files changed, 225 insertions(+), 9 deletions(-) diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h index fc88b50..270b0df 100644 --- a/src/bin/pg_basebackup/bbstreamer.h +++ b/src/bin/pg_basebackup/bbstreamer.h @@ -205,6 +205,7 @@ extern bbstreamer *bbstreamer_extractor_new(const char *basepath, const char *(*link_map) (const char *), void (*report_output_file) (const char *)); +extern bbstreamer *bbstreamer_gzip_extractor_new(bbstreamer *next); extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next); extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next); extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next); diff --git a/src/bin/pg_basebackup/bbstreamer_file.c b/src/bin/pg_basebackup/bbstreamer_file.c index 77ca222..350af1d 100644 --- a/src/bin/pg_basebackup/bbstreamer_file.c +++ b/src/bin/pg_basebackup/bbstreamer_file.c @@ -37,6 +37,13 @@ typedef struct 
bbstreamer_gzip_writer char *pathname; gzFile gzfile; } bbstreamer_gzip_writer; + +typedef struct bbstreamer_gzip_extractor +{ + bbstreamer base; + z_stream zstream; + size_t bytes_written; +} bbstreamer_gzip_extractor; #endif typedef struct bbstreamer_extractor @@ -76,6 +83,21 @@ const bbstreamer_ops bbstreamer_gzip_writer_ops = { .finalize = bbstreamer_gzip_writer_finalize, .free = bbstreamer_gzip_writer_free }; + +static void bbstreamer_gzip_extractor_content(bbstreamer *streamer, + bbstreamer_member *member, + const char *data, int len, + bbstreamer_archive_context context); +static void bbstreamer_gzip_extractor_finalize(bbstreamer *streamer); +static void bbstreamer_gzip_extractor_free(bbstreamer *streamer); +static void *gzip_palloc(void *opaque, unsigned items, unsigned size); +static void gzip_pfree(void *opaque, void *address); + +const bbstreamer_ops bbstreamer_gzip_extractor_ops = { + .content = bbstreamer_gzip_extractor_content, + .finalize = bbstreamer_gzip_extractor_finalize, + .free = bbstreamer_gzip_extractor_free +}; #endif static void bbstreamer_extractor_content(bbstreamer *streamer, @@ -349,6 +371,159 @@ get_gz_error(gzFile gzf) #endif /* + * Create a new base backup streamer that performs decompression of gzip + * compressed blocks. + */ +bbstreamer * +bbstreamer_gzip_extractor_new(bbstreamer *next) +{ +#ifdef HAVE_LIBZ + bbstreamer_gzip_extractor *streamer; + z_stream *zs; + + Assert(next != NULL); + + streamer = palloc0(sizeof(bbstreamer_gzip_extractor)); + *((const bbstreamer_ops **) &streamer->base.bbs_ops) = + &bbstreamer_gzip_extractor_ops; + + streamer->base.bbs_next = next; + initStringInfo(&streamer->base.bbs_buffer); + + /* Initialize internal stream state for decompression */ + zs = &streamer->zstream; + zs->zalloc = gzip_palloc; + zs->zfree = gzip_pfree; + zs->next_out = (uint8 *) streamer->base.bbs_buffer.data; + zs->avail_out = streamer->base.bbs_buffer.maxlen; + + /* + * Data compression was initialized using deflateInit2 to request a gzip + * header.
Similarly, we are using inflateInit2 to initialize data + * decompression. + * "windowBits" must be greater than or equal to "windowBits" value + * provided to deflateInit2 while compressing. + */ + if (inflateInit2(zs, 15 + 16) != Z_OK) + { + pg_log_error("could not initialize compression library"); + exit(1); + + } + + return &streamer->base; +#else + pg_log_error("this build does not support compression"); + exit(1); +#endif +} + +#ifdef HAVE_LIBZ +/* + * Decompress the input data to output buffer until we ran out of the input + * data. Each time the output buffer is full invoke bbstreamer_content to pass + * on the decompressed data to next streamer. + */ +static void +bbstreamer_gzip_extractor_content(bbstreamer *streamer, + bbstreamer_member *member, + const char *data, int len, + bbstreamer_archive_context context) +{ +
Re: [PATCH] Add reloption for views to enable RLS
Hi, On 1/19/22 09:30, Julien Rouhaud wrote: Hi, On Tue, Jan 18, 2022 at 04:16:53PM +0100, Christoph Heiss wrote: I've attached a v2 where I addressed the things you mentioned. This version unfortunately doesn't apply anymore: http://cfbot.cputube.org/patch_36_3466.log === Applying patches on top of PostgreSQL commit ID e0e567a106726f6709601ee7cffe73eb6da8084e === === applying patch ./0001-PATCH-v2-1-3-Add-new-boolean-reloption-security_invo.patch === applying patch ./0002-PATCH-v2-2-3-Add-regression-tests-for-new-security_i.patch patching file src/test/regress/expected/create_view.out Hunk #5 FAILED at 2019. Hunk #6 succeeded at 2056 (offset 16 lines). 1 out of 6 hunks FAILED -- saving rejects to file src/test/regress/expected/create_view.out.rej Could you send a rebased version? My bad - I attached a new version rebased on latest master. Thanks, Christoph HeissFrom b1b5a6c6bd63c509b3bcdef6d1e7a548413c2a9d Mon Sep 17 00:00:00 2001 From: Christoph Heiss Date: Tue, 18 Jan 2022 15:42:58 +0100 Subject: [PATCH 1/3] [PATCH v3 1/3] Add new boolean reloption security_invoker to views When this reloption is set to true, all references to the underlying tables will be checked against the invoking user rather than the view owner, as is currently implemented. Row level security must be enabled on the tables for this to take effect. 
Signed-off-by: Christoph Heiss --- src/backend/access/common/reloptions.c| 11 src/backend/nodes/copyfuncs.c | 1 + src/backend/nodes/equalfuncs.c| 1 + src/backend/nodes/outfuncs.c | 1 + src/backend/nodes/readfuncs.c | 1 + src/backend/optimizer/plan/subselect.c| 1 + src/backend/optimizer/prep/prepjointree.c | 1 + src/backend/rewrite/rewriteHandler.c | 1 + src/backend/utils/cache/relcache.c| 63 +-- src/include/nodes/parsenodes.h| 1 + src/include/utils/rel.h | 11 11 files changed, 65 insertions(+), 28 deletions(-) diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c index d592655258..beb170b5e6 100644 --- a/src/backend/access/common/reloptions.c +++ b/src/backend/access/common/reloptions.c @@ -140,6 +140,15 @@ static relopt_bool boolRelOpts[] = }, false }, + { + { + "security_invoker", + "View subquery in invoked within the current security context.", + RELOPT_KIND_VIEW, + AccessExclusiveLock + }, + false + }, { { "vacuum_truncate", @@ -1996,6 +2005,8 @@ view_reloptions(Datum reloptions, bool validate) static const relopt_parse_elt tab[] = { {"security_barrier", RELOPT_TYPE_BOOL, offsetof(ViewOptions, security_barrier)}, + {"security_invoker", RELOPT_TYPE_BOOL, + offsetof(ViewOptions, security_invoker)}, {"check_option", RELOPT_TYPE_ENUM, offsetof(ViewOptions, check_option)} }; diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index 90b5da51c9..b171992ba8 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -2465,6 +2465,7 @@ _copyRangeTblEntry(const RangeTblEntry *from) COPY_NODE_FIELD(tablesample); COPY_NODE_FIELD(subquery); COPY_SCALAR_FIELD(security_barrier); + COPY_SCALAR_FIELD(security_invoker); COPY_SCALAR_FIELD(jointype); COPY_SCALAR_FIELD(joinmergedcols); COPY_NODE_FIELD(joinaliasvars); diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c index 06345da3ba..a832c5fefe 100644 --- a/src/backend/nodes/equalfuncs.c +++ 
b/src/backend/nodes/equalfuncs.c @@ -2766,6 +2766,7 @@ _equalRangeTblEntry(const RangeTblEntry *a, const RangeTblEntry *b) COMPARE_NODE_FIELD(tablesample); COMPARE_NODE_FIELD(subquery); COMPARE_SCALAR_FIELD(security_barrier); + COMPARE_SCALAR_FIELD(security_invoker); COMPARE_SCALAR_FIELD(jointype); COMPARE_SCALAR_FIELD(joinmergedcols); COMPARE_NODE_FIELD(joinaliasvars); diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index 2b0236937a..883284ad0d 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -3261,6 +3261,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node) case RTE_SUBQUERY: WRITE_NODE_FIELD(subquery); WRITE_BOOL_FIELD(security_barrier); + WRITE_BOOL_FIELD(security_invoker); break; case RTE_JOIN: WRITE_ENUM_FIELD(jointype, JoinType); diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c index 3f68f7c18d..ad825bb27f 100644 --- a/src/backend/nodes/readfuncs.c +++ b/src/backend/nodes/readfuncs.c @@ -1444,6 +1444,7 @@ _readRangeTblEntry(void) case RTE_SUBQUERY: READ_NODE_FIELD(subquery); READ_BOOL_FIELD(security_barrier); + READ_BOOL_FIELD(security_invoker); break; case RTE_JOIN: READ_ENUM_FIELD(jointype, JoinType); diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index 41bd1ae7d4..30c66b9c6d 100644 --- a/src/backend/optimizer/plan/subselect.c
Re: generic plans and "initial" pruning
On Wed, 19 Jan 2022 at 08:31, Amit Langote wrote: > > Maybe force all DDL, or just DDL that would cause safety issues, to > > update a hierarchy version number, so queries can tell whether they > > need to replan. Don't know, just looking for an O(1) solution. > > Yeah, it would be great if it would suffice to take a single lock on > the partitioned table mentioned in the query, rather than on all > elements of the partition tree added to the plan. AFAICS, ways to get > that are 1) Prevent modifying non-root partition tree elements, Can we reuse the concept of Strong/Weak locking here? When a DDL request is in progress (for that partitioned table), take all required locks for safety. When a DDL request is not in progress, take minimal locks knowing it is safe. We can take a single PartitionTreeModificationLock, nowait, to prove that we do not need all locks. DDL would request the lock in exclusive mode. (Other mechanisms possible). -- Simon Riggs http://www.EnterpriseDB.com/
Re: a misbehavior of partition row movement (?)
On Wed, Jan 19, 2022 at 6:26 PM Alvaro Herrera wrote: > On 2022-Jan-19, Amit Langote wrote: > > BTW, your tweaks patch added this: > > > > + * *inserted_tuple is the tuple that's effectively inserted; > > + * *inserted_destrel is the relation where it was inserted. > > + * These are only set on success. FIXME -- see what happens on > > the "do nothing" cases. > > > > If by "do nothing cases" you mean INSERT ON CONFLICT ... DO NOTHING, > > then I don't think it matters, because the caller in that case would > > be ExecModifyTable() which doesn't care about inserted_tuple and > > inserted_destrel. > > No, I meant a FOR EACH ROW trigger that does RETURN NULL to "abort" the > insertion. Ah, gotcha. > IIRC in non-partitioned cases it is possible to break > referential integrity by using those. What I was wondering is whether > you can make this new code crash. insert_destrel would be left set to NULL, which means ExecCrossPartitionUpdateForeignKey() won't get called, because: /* * NULL insert_destrel means that the move failed to occur, that * is, the update failed, so no need to do anything in that case. */ if (insert_destrel && resultRelInfo->ri_TrigDesc && resultRelInfo->ri_TrigDesc->trig_update_after_row) ExecCrossPartitionUpdateForeignKey(mtstate, estate, Moreover, the trigger documentation warns of a "possibility of surprising outcomes" if BR triggers are present in partitions that are chosen as destinations of cross-partition updates: "Then all row-level BEFORE INSERT triggers are fired on the destination partition. The possibility of surprising outcomes should be considered when all these triggers affect the row being moved." I suppose the new code shouldn't need to take special care in such cases either. -- Amit Langote EDB: http://www.enterprisedb.com
Re: pg_upgrade should truncate/remove its logs before running
On Wed, Jan 19, 2022 at 05:13:18PM +0900, Michael Paquier wrote: > I have noticed a couple of incorrect things in the docs, and some > other things. It is a bit late here, so I may have missed a couple of > things but I'll look at this stuff once again in a couple of days. And the docs failed to build.. -- Michael From b5a22e5e641930a46c74622bbf42a00b0abdeb8b Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Wed, 19 Jan 2022 19:38:24 +0900 Subject: [PATCH v3] pg_upgrade: write log files and dumps in subdir.. The subdir must not exist when pg_upgrade is started. This avoids appending to logfiles which have pre-existing errors in them. And allows more easily cleaning up after pg_upgrade. --- src/bin/pg_upgrade/.gitignore | 2 - src/bin/pg_upgrade/Makefile | 4 +- src/bin/pg_upgrade/check.c | 12 ++-- src/bin/pg_upgrade/dump.c | 6 +- src/bin/pg_upgrade/exec.c | 5 +- src/bin/pg_upgrade/function.c | 3 +- src/bin/pg_upgrade/option.c | 29 +++-- src/bin/pg_upgrade/pg_upgrade.c | 105 src/bin/pg_upgrade/pg_upgrade.h | 8 ++- src/bin/pg_upgrade/server.c | 6 +- doc/src/sgml/ref/pgupgrade.sgml | 17 +- 11 files changed, 130 insertions(+), 67 deletions(-) diff --git a/src/bin/pg_upgrade/.gitignore b/src/bin/pg_upgrade/.gitignore index 2d3bfeaa50..939e50db6c 100644 --- a/src/bin/pg_upgrade/.gitignore +++ b/src/bin/pg_upgrade/.gitignore @@ -1,9 +1,7 @@ /pg_upgrade # Generated by test suite -/pg_upgrade_internal.log /delete_old_cluster.sh /delete_old_cluster.bat /reindex_hash.sql -/loadable_libraries.txt /log/ /tmp_check/ diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile index 44d06be5a6..e833a1c5f0 100644 --- a/src/bin/pg_upgrade/Makefile +++ b/src/bin/pg_upgrade/Makefile @@ -45,9 +45,7 @@ uninstall: clean distclean maintainer-clean: rm -f pg_upgrade$(X) $(OBJS) rm -rf delete_old_cluster.sh log/ tmp_check/ \ - loadable_libraries.txt reindex_hash.sql \ - pg_upgrade_dump_globals.sql \ - pg_upgrade_dump_*.custom pg_upgrade_*.log + reindex_hash.sql 
pg_upgrade_output.d/ # When $(MAKE) is present, make automatically infers that this is a # recursive make. which is not actually what we want here, as that diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c index 3d218c2ad2..019bcb6c7b 100644 --- a/src/bin/pg_upgrade/check.c +++ b/src/bin/pg_upgrade/check.c @@ -783,7 +783,8 @@ check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster) return; } - snprintf(output_path, sizeof(output_path), + snprintf(output_path, sizeof(output_path), "%s/%s", + log_opts.basedir, "contrib_isn_and_int8_pass_by_value.txt"); for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++) @@ -860,7 +861,8 @@ check_for_user_defined_postfix_ops(ClusterInfo *cluster) prep_status("Checking for user-defined postfix operators"); - snprintf(output_path, sizeof(output_path), + snprintf(output_path, sizeof(output_path), "%s/%s", + log_opts.basedir, "postfix_ops.txt"); /* Find any user defined postfix operators */ @@ -959,7 +961,8 @@ check_for_tables_with_oids(ClusterInfo *cluster) prep_status("Checking for tables WITH OIDS"); - snprintf(output_path, sizeof(output_path), + snprintf(output_path, sizeof(output_path), "%s/%s", + log_opts.basedir, "tables_with_oids.txt"); /* Find any tables declared WITH OIDS */ @@ -1214,7 +1217,8 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster) prep_status("Checking for user-defined encoding conversions"); - snprintf(output_path, sizeof(output_path), + snprintf(output_path, sizeof(output_path), "%s/%s", + log_opts.basedir, "encoding_conversions.txt"); /* Find any user defined encoding conversions */ diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c index 953873fa01..b69b4f9569 100644 --- a/src/bin/pg_upgrade/dump.c +++ b/src/bin/pg_upgrade/dump.c @@ -22,9 +22,10 @@ generate_old_dump(void) /* run new pg_dumpall binary for globals */ exec_prog(UTILITY_LOG_FILE, NULL, true, true, "\"%s/pg_dumpall\" %s --globals-only --quote-all-identifiers " - "--binary-upgrade %s -f %s", 
+ "--binary-upgrade %s -f \"%s/%s\"", new_cluster.bindir, cluster_conn_opts(&old_cluster), log_opts.verbose ? "--verbose" : "", + log_opts.dumpdir, GLOBALS_DUMP_FILE); check_ok(); @@ -52,9 +53,10 @@ generate_old_dump(void) parallel_exec_prog(log_file_name, NULL, "\"%s/pg_dump\" %s --schema-only --quote-all-identifiers " - "--binary-upgrade --format=custom %s --file=\"%s\" %s", + "--binary-upgrade --format=custom %s --file=\"%s/%s\" %s", new_cluster.bindir, cluster_conn_opts(&old_cluster), log_opts.verbose ? "--verbose" : "", + log_opts.dumpdir, sql_file_name, escaped_connstr.data); termPQExpBuffer(&escaped_connstr); diff --git a/src/bin/pg_upgrade/exec.c b/src/bin/pg_upgrade/exec.c
Re: using extended statistics to improve join estimates
Hi, On Tue, Jan 04, 2022 at 03:55:50PM -0800, Andres Freund wrote: > On 2022-01-01 18:21:06 +0100, Tomas Vondra wrote: > > Here's an updated patch, rebased and fixing a couple typos reported by > > Justin Pryzby directly. > > FWIW, cfbot reports a few compiler warnings: Also the patch doesn't apply anymore: http://cfbot.cputube.org/patch_36_3055.log === Applying patches on top of PostgreSQL commit ID 74527c3e022d3ace648340b79a6ddec3419f6732 === === applying patch ./0001-Estimate-joins-using-extended-statistics-20220101.patch patching file src/backend/optimizer/path/clausesel.c patching file src/backend/statistics/extended_stats.c Hunk #1 FAILED at 30. Hunk #2 succeeded at 102 (offset 1 line). Hunk #3 succeeded at 2619 (offset 9 lines). 1 out of 3 hunks FAILED -- saving rejects to file src/backend/statistics/extended_stats.c.rej
Re: row filtering for logical replication
On Wed, Jan 19, 2022 at 7:45 AM houzj.f...@fujitsu.com wrote: > > On Tues, Jan 18, 2022 8:35 PM Amit Kapila wrote: > > Attach the V67 patch set which address the above comments. > Some more comments and suggestions: = 1. Can we do slot initialization in maybe_send_schema() instead of introducing a new flag for it? 2. + * For updates if no old tuple, it means none of the replica identity + * columns changed and this would reduce to a simple update. We only need + * to evaluate the row filter for the new tuple. Is it possible with the current version of the patch? I am asking because, for updates, we now allow only RI columns in row filter, so do we need to evaluate the row filter in this case? I think ideally, we don't need to evaluate the row filter in this case as for updates only replica identity columns are allowed but users can use constant expressions in the row filter. So, we need to evaluate the row filter in this case as well. Is my understanding correct? 3. + /* If no filter found, clean up the memory and return */ + if (!has_filter) + { + if (entry->cache_expr_cxt != NULL) + MemoryContextDelete(entry->cache_expr_cxt); I think this clean-up needs to be performed when we set exprstate_valid to false. I have changed accordingly in the attached patch. Apart from the above, I have made quite a few changes in the code comments in the attached top-up patch, kindly review those and merge them into the main patch, if you are okay with it. -- With Regards, Amit Kapila. v67_changes_amit_1.patch Description: Binary data
Re: WIN32 pg_import_system_collations
Hi, On Mon, Dec 13, 2021 at 05:28:47PM +0100, Juan José Santamaría Flecha wrote: > On Mon, Dec 13, 2021 at 9:41 AM Juan José Santamaría Flecha < > juanjo.santama...@gmail.com> wrote: > > Per path tester. This version doesn't apply anymore: http://cfbot.cputube.org/patch_36_3450.log === Applying patches on top of PostgreSQL commit ID e0e567a106726f6709601ee7cffe73eb6da8084e === === applying patch ./v2-0001-WIN32-pg_import_system_collations.patch [...] patching file src/tools/msvc/vcregress.pl Hunk #1 succeeded at 153 (offset -1 lines). Hunk #2 FAILED at 170. 1 out of 2 hunks FAILED -- saving rejects to file src/tools/msvc/vcregress.pl.rej Could you send a rebased version? In the meantime I will switch the CF entry to Waiting on Author.
Re: a misbehavior of partition row movement (?)
On 2022-Jan-19, Amit Langote wrote: > BTW, your tweaks patch added this: > > + * *inserted_tuple is the tuple that's effectively inserted; > + * *inserted_destrel is the relation where it was inserted. > + * These are only set on success. FIXME -- see what happens on > the "do nothing" cases. > > If by "do nothing cases" you mean INSERT ON CONFLICT ... DO NOTHING, > then I don't think it matters, because the caller in that case would > be ExecModifyTable() which doesn't care about inserted_tuple and > inserted_destrel. No, I meant a FOR EACH ROW trigger that does RETURN NULL to "abort" the insertion. IIRC in non-partitioned cases it is possible to break referential integrity by using those. What I was wondering is whether you can make this new code crash. -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ "I can see support will not be a problem. 10 out of 10." (Simon Wittber) (http://archives.postgresql.org/pgsql-general/2004-12/msg00159.php)
Re: Support for NSS as a libpq TLS backend
> On 18 Jan 2022, at 17:37, Jacob Champion wrote: > > On Wed, 2021-12-15 at 23:10 +0100, Daniel Gustafsson wrote: >> I've attached a v50 which fixes the issues found by Joshua upthread, as well >> as >> rebases on top of all the recent SSL and pgcrypto changes. > > I'm currently tracking down a slot leak. When opening and closing large > numbers of NSS databases, at some point we appear to run out of slots > and then NSS starts misbehaving, even though we've closed all of our > context handles. Interesting, are you able to share a reproducer for this so I can assist in debugging it? -- Daniel Gustafsson https://vmware.com/
Re: Support tab completion for upper character inputs in psql
Hi, On Sat, Jan 15, 2022 at 01:51:26PM -0500, Tom Lane wrote: > > I spent some time looking at this patch. I'm not very happy with it, > for two reasons: > [...] On top of that the patch doesn't apply anymore: http://cfbot.cputube.org/patch_36_2979.log === Applying patches on top of PostgreSQL commit ID 5987feb70b5bbb1fc4e64d433f490df08d91dd45 === === applying patch ./v11-0001-Support-tab-completion-with-a-query-result-for-u.patch patching file src/bin/psql/t/010_tab_completion.pl Hunk #1 FAILED at 41. Hunk #2 succeeded at 150 (offset 1 line). 1 out of 2 hunks FAILED -- saving rejects to file src/bin/psql/t/010_tab_completion.pl.rej I'm switching the CF entry to Waiting on Author.
Re: Skipping logical replication transactions on subscriber side
On Wed, Jan 19, 2022 at 12:46 PM Masahiko Sawada wrote: > > On Wed, Jan 19, 2022 at 12:22 PM osumi.takami...@fujitsu.com > wrote: > > > > (6) apply_handle_stream_abort > > > > @@ -1209,6 +1300,13 @@ apply_handle_stream_abort(StringInfo s) > > > > logicalrep_read_stream_abort(s, , ); > > > > + /* > > +* We don't expect the user to set the XID of the transaction that > > is > > +* rolled back but if the skip XID is set, clear it. > > +*/ > > + if (MySubscription->skipxid == xid || MySubscription->skipxid == > > subxid) > > + clear_subscription_skip_xid(MySubscription->skipxid, > > InvalidXLogRecPtr, 0); > > + > > > > In my humble opinion, this still cares about subtransaction xid still. > > If we want to be consistent with top level transactions only, > > I felt checking MySubscription->skipxid == xid should be sufficient. > > I thought if we can clear subskipxid whose value has already been > processed on the subscriber with a reasonable cost it makes sense to > do that because it can reduce the possibility of the issue that XID is > wraparound while leaving the wrong in subskipxid. > I guess that could happen if the user sets some unrelated XID value. So, I think it should be okay to not clear this but we can add a comment in the code at that place that we don't clear subtransaction's XID as we don't support skipping individual subtransactions or something like that. -- With Regards, Amit Kapila.
Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN
Hi, On Tue, Nov 16, 2021 at 04:37:44PM +0900, Masahiko Sawada wrote: > > I've attached an updated patch. Please review it. It seems that the regression tests aren't entirely stable, per cfbot: https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/36/3298. The failures look like: diff -U3 /tmp/cirrus-ci-build/src/test/recovery/../regress/expected/explain.out /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/explain.out --- /tmp/cirrus-ci-build/src/test/recovery/../regress/expected/explain.out 2022-01-19 03:50:37.087931908 + +++ /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/explain.out 2022-01-19 03:57:41.013616137 + @@ -512,9 +512,10 @@ I/O Timings: temp read=N.N write=N.N -> Function Scan on generate_series (cost=N.N..N.N rows=N width=N) (actual time=N.N..N.N rows=N loops=N) I/O Timings: temp read=N.N write=N.N + I/O Timings: shared/local read=N.N Planning Time: N.N ms Execution Time: N.N ms -(6 rows) +(7 rows) select explain_filter('explain (analyze, buffers, format json) select count(*) from generate_series(1, 10)'); explain_filter I don't see any obvious error in the code, so I'm wondering if it's simply the result of populating the cache with generate_series() info.