Re: Use generation context to speed up tuplesorts
On Thursday, 20 January 2022 at 02:31:15 CET, David Rowley wrote:
> On Tue, 18 Jan 2022 at 19:45, Julien Rouhaud wrote:
> > I'm also a bit confused about which patch(es) should actually be
> > reviewed in that thread. My understanding is that the original patch
> > might not be relevant anymore but I might be wrong. The CF entry
> > should probably be updated to clarify that, with an annotation / title
> > change / author update or something.
> >
> > In the meantime I will switch the entry to Waiting on Author.
>
> I think what's going on here is a bit confusing. There's quite a few
> patches on here that aim to do very different things.
>
> The patch I originally proposed was just to make tuplesort use
> generation memory contexts. This helps speed up tuplesort because
> generation contexts allow palloc() to be faster than the standard
> allocator. Additionally, there's no power-of-2 wastage with generation
> contexts like there is with the standard allocator. This helps fit
> more tuples on fewer CPU cache lines.
>
> I believe Andres then mentioned that the fixed block size for
> generation contexts might become a problem. With the standard
> allocator the actual size of the underlying malloc() grows up until a
> maximum size. With generation contexts this is always the same, so we
> could end up with a very large number of blocks which means lots of
> small mallocs which could end up slow. Tomas then posted a series of
> patches to address this.
>
> I then posted another patch that has the planner make a choice on the
> size of the blocks to use for the generation context based on the
> tuple width and number of tuple estimates. My hope there was to
> improve performance further by making a better choice for how large to
> make the blocks in the first place. This reduces the need for Tomas'
> patch, but does not eliminate the need for it.
>
> As of now, I still believe we'll need Tomas' patches to allow the
> block size to grow up to a maximum size.
> I think those patches are likely needed before we think about making
> tuplesort use generation contexts. The reason I believe this is that
> we don't always have good estimates for the number of tuples we're
> going to sort. nodeSort.c is fairly easy as it just fetches tuples
> once and then sorts them. The use of tuplesort for
> nodeIncrementalSort.c is much more complex as we'll likely be
> performing a series of smaller sorts which are much harder to get size
> estimates for. This means there's likely no magic block size that we
> can use for the generation context that's used to store the tuples in
> tuplesort. This is also the case for order by aggregates and probably
> most other usages of tuplesort.

You left out the discussion I started about glibc's malloc-specific
behaviour. I tried to demonstrate that a big part of the performance gain
you were seeing was in fact due not to our allocator by itself, but to the
way different block-size allocations interact with that malloc
implementation and how it handles releasing memory back to the system. I
also tried to show that we can give DBAs more control over how much memory
to "waste" as the price for faster allocations.

I agree that the generation context memory manager is useful in the
tuplesort context, if only for the fact that we fall back to disk less
quickly due to the reduced wastage of memory, but I would be wary of
drawing conclusions too quickly based on a specific malloc implementation
which shows threshold effects, compounded by our growing-block-size code
in general. I have on my TODO list to run the same kind of benchmarks
using jemalloc, to better isolate the performance gains expected from the
generation allocator itself from the side effect of avoiding glibc's
pattern of releasing memory back to the kernel too quickly.

Julien, sorry for the confusion of adding that discussion and those
patches to the same thread, but I don't really see a way of dissociating
those two aspects.
-- Ronan Dunklau
RE: Support tab completion for upper character inputs in psql
On Sunday, January 16, 2022 3:51 AM, Tom Lane said:
> Peter Eisentraut writes:
> > The rest of the patch seems ok in principle, since AFAICT enums are the
> > only query result in tab-complete.c that are not identifiers and thus
> > subject to case issues.
>
> I spent some time looking at this patch. I'm not very happy with it,
> for two reasons:
>
> 1. The downcasing logic in the patch bears very little resemblance
> to the backend's actual downcasing logic, which can be found in
> src/backend/parser/scansup.c's downcase_identifier(). Notably,
> the patch's restriction to only convert all-ASCII strings seems
> indefensible, because that's not how things really work. I fear
> we can't always exactly duplicate the backend's behavior, because
> it's dependent on the server's locale and encoding; but I think
> we should at least get it right in the common case where psql is
> using the same locale and encoding as the server.

Thanks for your suggestion. I removed the ASCII-string check function and
added a single-byte-encoding check just like downcase_identifier(). I also
added a PGCLIENTENCODING setting in the test script to make the test cases
pass. Now the patch supports tab completion with non-quoted upper-case
characters when the client encoding is single-byte.

> 2. I don't think there's been much thought about the larger picture
> of what is to be accomplished. Right now, we successfully
> tab-complete inputs that are prefixes of the canonical spelling (per
> quote_identifier) of the object's name, and don't try at all for
> non-canonical spellings. I'm on board with trying to allow some of
> the latter but I'm not sure that this patch represents much forward
> progress. To be definite about it, suppose we have a DB containing
> just two tables whose names start with "m", say mytab and mixedTab.
> Then:
>
> (a) m immediately completes mytab, ignoring mixedTab
>
> (b) "m immediately completes "mixedTab", ignoring mytab
>
> (c) "my fails to find anything
>
> (d) mi fails to find anything
>
> (e) M fails to find anything
>
> This patch proposes to improve case (e), but to my taste cases (a)
> through (c) are much bigger problems. It'd be nice if (d) worked too
> --- that'd require injecting a double-quote where the user had not
> typed one, but we already do the equivalent thing with single-quotes
> for file names, so why not? (Although after fighting with readline
> yesterday to try to get it to handle single-quoted enum labels sanely,
> I'm not 100% sure if (d) is possible.)
>
> Also, even for case (e), what we have with this patch is that it
> immediately completes mytab, ignoring mixedTab. Is that what we want?
> Another example is that miX fails to find anything, which seems
> like a POLA violation given that mY completes to mytab.
>
> I'm not certain how many of these alternatives can be supported
> without introducing ambiguity that wasn't there before (which'd
> manifest as failing to complete in cases where the existing code
> chooses an alternative just fine). But I really don't like the
> existing behavior for (b) and (c) --- I should be able to spell
> a name with double quotes if I want, without losing completion
> support.

You are right, it's more convenient that way; I hadn't thought about it
before. For now, the patch assumes: if a user needs to type a table whose
name contains upper-case characters, they should input the double quotes
themselves; and once a double quote has been typed, only table names with
upper-case characters are searched. I may try to implement it as you
expect, but it seems not so easy (as you said, without introducing
ambiguity that wasn't there before). I'd appreciate it if someone could
give me a hint/hand on this.
> BTW, another thing that maybe we should think about is how this
> interacts with the pattern matching capability in \d and friends.
> If people can tab-complete non-canonical spellings, they might
> expect the same spellings to work in \d. I don't say that this
> patch has to fix that, but we might want to look and be sure we're
> not painting ourselves into a corner (especially since I see
> that we already perform tab-completion in that context).

Yes, agreed. If we solve the previous problem, meta-command tab completion
should also be considered.

Regards,
Tang

Attachment: v12-0001-Support-tab-completion-with-a-query-result-for-u.patch
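The single-byte-encoding restriction described above can be illustrated
with a small standalone sketch. This is hypothetical code, not the patch
itself; it only mirrors the rule the backend's downcase_identifier()
applies: ASCII letters are always downcased, while bytes with the high bit
set are only run through tolower() when the client encoding is single-byte,
because applying tolower() to one byte of a multi-byte character would
corrupt it.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch of client-side identifier downcasing. The name and
 * signature are illustrative only.
 */
static char *
downcase_identifier_sketch(const char *ident, bool enc_is_single_byte)
{
	size_t		len = strlen(ident);
	char	   *result = malloc(len + 1);

	for (size_t i = 0; i < len; i++)
	{
		unsigned char ch = (unsigned char) ident[i];

		if (ch >= 'A' && ch <= 'Z')
			ch += 'a' - 'A';		/* ASCII: always safe to downcase */
		else if (enc_is_single_byte && ch >= 0x80 && isupper(ch))
			ch = tolower(ch);		/* non-ASCII: only in single-byte encodings */
		result[i] = (char) ch;
	}
	result[len] = '\0';
	return result;
}
```

For example, downcase_identifier_sketch("MixedTab", false) yields
"mixedtab", while a high-bit byte is left untouched when the encoding is
multi-byte, matching the behaviour the revised patch aims for.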
Re: Refactoring of compression options in pg_basebackup
On Wed, Jan 19, 2022 at 08:35:23AM -0500, Robert Haas wrote:
> I think that this will reject something like --compress=nonetheless by
> telling you that 't' is not a valid separator. I think it would be
> better to code this so that we first identify the portion preceding
> the first colon, or the whole string if there is no colon. Then we
> check whether that part is a compression method that we recognize. If
> not, we complain.

Well, if no colon is specified, we still need to check if optarg is a pure
integer when it does not match any of the supported methods, as
--compress=0 should remain backward-compatible with no compression and
--compress=1~9 should imply gzip, no?

> If so, we then check whatever is after the separator
> for validity - and this might differ by type. For example, we could
> then immediately reject none:4, and if in the future we want to allow
> lz4:fast3, we could.

Okay.

> I think the code that handles the bare integer case should be at the
> top of the function and should return, because that code is short.
> Then the rest of the function doesn't need to be indented as deeply.

Done this way, I hope.
-- 
Michael

From b5d84d8daac4c537657bfeef2ed0b5550106ffc2 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Thu, 20 Jan 2022 15:59:54 +0900
Subject: [PATCH v6] Rework the option set of pg_basebackup

---
 src/bin/pg_basebackup/pg_basebackup.c        | 150 ---
 src/bin/pg_basebackup/t/010_pg_basebackup.pl |  46 +-
 doc/src/sgml/ref/pg_basebackup.sgml          |  21 ++-
 3 files changed, 194 insertions(+), 23 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 2a58be638a..ae003766cc 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -123,6 +123,7 @@ static bool showprogress = false;
 static bool estimatesize = true;
 static int	verbose = 0;
 static int	compresslevel = 0;
+static WalCompressionMethod compressmethod = COMPRESSION_NONE;
 static IncludeWal includewal = STREAM_WAL;
 static bool fastcheckpoint = false;
 static bool writerecoveryconf = false;
@@ -376,7 +377,8 @@ usage(void)
 	printf(_("  -X, --wal-method=none|fetch|stream\n"
 			 "                         include required WAL files with specified method\n"));
 	printf(_("  -z, --gzip             compress tar output\n"));
-	printf(_("  -Z, --compress=0-9     compress tar output with given compression level\n"));
+	printf(_("  -Z, --compress={gzip,none}[:LEVEL] or [LEVEL]\n"
+			 "                         compress tar output with given compression method or level\n"));
 	printf(_("\nGeneral options:\n"));
 	printf(_("  -c, --checkpoint=fast|spread\n"
 			 "                         set fast or spread checkpointing\n"));
@@ -541,8 +543,7 @@ LogStreamerMain(logstreamer_param *param)
 								  stream.do_sync);
 	else
 		stream.walmethod = CreateWalTarMethod(param->xlog,
-											  (compresslevel != 0) ?
-											  COMPRESSION_GZIP : COMPRESSION_NONE,
+											  compressmethod,
 											  compresslevel,
 											  stream.do_sync);
@@ -933,6 +934,84 @@ parse_max_rate(char *src)
 	return (int32) result;
 }
+
+/*
+ * Utility wrapper to parse the values specified for -Z/--compress.
+ * *methodres and *levelres will be optionally filled with values coming
+ * from the parsed results.
+ */
+static void
+parse_compress_options(char *src, WalCompressionMethod *methodres,
+					   int *levelres)
+{
+	char	   *sep;
+	int			firstlen;
+	char	   *firstpart = NULL;
+
+	/* check if the option is split in two */
+	sep = strchr(src, ':');
+
+	/*
+	 * The first part of the option value could be a method name, or just
+	 * a level value.
+	 */
+	firstlen = (sep != NULL) ? (sep - src) : strlen(src);
+	firstpart = pg_malloc(firstlen + 1);
+	strncpy(firstpart, src, firstlen);
+	firstpart[firstlen] = '\0';
+
+	/*
+	 * Check if the first part of the string matches with a supported
+	 * compression method.
+	 */
+	if (pg_strcasecmp(firstpart, "gzip") != 0 &&
+		pg_strcasecmp(firstpart, "none") != 0)
+	{
+		/*
+		 * It does not match anything known, so check for the
+		 * backward-compatible case of only an integer, where the implied
+		 * compression method changes depending on the level value.
+		 */
+		if (!option_parse_int(firstpart, "-Z/--compress", 0,
+							  INT_MAX, levelres))
+			exit(1);
+
+		*methodres = (*levelres > 0) ?
+			COMPRESSION_GZIP : COMPRESSION_NONE;
+		return;
+	}
+
+	/* Supported method found. */
+	if (pg_strcasecmp(firstpart, "gzip") == 0)
+		*methodres = COMPRESSION_GZIP;
+	else if (pg_strcasecmp(firstpart, "none") == 0)
+		*methodres = COMPRESSION_NONE;
+
+	if (sep == NULL)
+	{
+		/*
+		 * The caller specified a method without a colon separator, so let
+		 * any subsequent checks assign a default.
+		 */
+		return;
+	}
+
+	/* Check the contents after the colon separator. */
+	sep++;
+	if (*sep == '\0')
+	{
+		pg_log_error("no compression level defined for method %s", firstpart);
+		exit(1);
+
RE: [Proposal] Add foreign-server health checks infrastructure
Dear Zhihong,

Thank you for reviewing! And I have to apologize for the delay. I'll post
a new patchset later.

> + UnregisterAllCheckingRemoteServersCallback();
>
> UnregisterAllCheckingRemoteServersCallback
> -> UnregisterAllCheckingRemoteServersCallbacks

Will fix.

> + CheckingRemoteServersCallbackItem *item;
> + item = fdw_callbacks;
>
> The above two lines can be combined.

Will fix.

> +UnregisterCheckingRemoteServersCallback(CheckingRemoteServersCallback callback,
> +                                        void *arg)
>
> Is the above func called anywhere ?

Currently not; I just kept the API for the benefit of other FDW
extensions. But I cannot find any use cases, so I will remove it.

> + if (item->callback == callback && item->arg == arg)
>
> Is comparing argument pointers stable ? What if the same argument is cloned ?

This part is no longer needed. Will remove.

> + CallCheckingRemoteServersCallbacks();
> +
> + if (remote_servers_connection_check_interval > 0)
> +     enable_timeout_after(CHECKING_REMOTE_SERVERS_TIMEOUT,
> +                          remote_servers_connection_check_interval);
>
> Should the call to CallCheckingRemoteServersCallbacks() be placed under the
> if block checking remote_servers_connection_check_interval ?
> If remote_servers_connection_check_interval is 0, it seems the callbacks
> shouldn't be called.

Agreed. We force stopping the timeout when the GUC is set to 0 in the
assign hook, so your suggestion is consistent with that. Will move it.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
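The register/call/unregister pattern under review above can be sketched in
a few lines. This is a simplified, hypothetical illustration of the
callback-list idea, not the patch's actual code; the type and function
names are stand-ins, and it uses plain malloc()/free() rather than
PostgreSQL's memory contexts.

```c
#include <stdlib.h>

/* Simplified sketch of a callback list; real names/types may differ. */
typedef void (*CheckServersCallback) (void *arg);

typedef struct CallbackItem
{
	CheckServersCallback callback;
	void	   *arg;
	struct CallbackItem *next;
} CallbackItem;

static CallbackItem *fdw_callbacks = NULL;
static int	check_count = 0;

/* Prepend a callback to the list. */
static void
register_callback(CheckServersCallback callback, void *arg)
{
	CallbackItem *item = malloc(sizeof(CallbackItem));

	item->callback = callback;
	item->arg = arg;
	item->next = fdw_callbacks;
	fdw_callbacks = item;
}

/* Invoke every registered callback in turn. */
static void
call_callbacks(void)
{
	for (CallbackItem *item = fdw_callbacks; item != NULL; item = item->next)
		item->callback(item->arg);
}

/* Unregister all callbacks, freeing the whole list. */
static void
unregister_all_callbacks(void)
{
	CallbackItem *item = fdw_callbacks;

	while (item != NULL)
	{
		CallbackItem *next = item->next;

		free(item);
		item = next;
	}
	fdw_callbacks = NULL;
}

/* Demo callback: accumulate the int pointed to by arg. */
static void
count_check(void *arg)
{
	check_count += *(int *) arg;
}
```

Note that unregistering a *single* entry would have to match on the
(callback, arg) pointer pair, which is exactly the stability concern raised
in the review; dropping that per-entry API in favor of
unregister_all_callbacks() sidesteps it.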
Re: TAP test to cover "EndOfLogTLI != replayTLI" case
On Tue, Jan 18, 2022 at 10:31 AM Kyotaro Horiguchi wrote:
>
> At Tue, 18 Jan 2022 12:25:15 +0800, Julien Rouhaud wrote in
> > Hi,
> >
> > On Mon, Jan 17, 2022 at 09:33:57PM +0800, Julien Rouhaud wrote:
> > >
> > > Thanks for the updated patch! Note that thanks to Andres and Thomas'
> > > work, you can now easily rely on the exact same CI as the cfbot on
> > > your own github repository, if you need to debug something on a
> > > platform you don't have access to. You can check the documentation
> > > at [1] for more detail on how to set up the CI.
> >
> > The cfbot reports that the patch still fails on Windows but also failed on
> > macos with the same error: https://cirrus-ci.com/task/5655777858813952:
> >
> > [14:20:43.950] # Failed test 'check standby content before timeline switch 0/500FAF8'
> > [14:20:43.950] # at t/003_recovery_targets.pl line 239.
> > [14:20:43.950] # got: '6000'
> > [14:20:43.950] # expected: '7000'
> > [14:20:43.950] # Looks like you failed 1 test of 10.
> >
> > I'm switching the CF entry to Waiting on Author.
>
> The most significant advantages of the local CI setup are
>
> - CI immediately responds to push.
>
> - You can dirty the code with additional logging aid as much as you
>   like to see closely what is going on. It makes us hesitant to do
>   the same on this ML :p

Indeed :)

I found the cause of the test failing on Windows -- it is due to the
custom archive command setting, which wasn't setting the correct Windows
archive directory path. There is no option to choose a custom WAL archive
and restore path in the TAP tests. The attached 0001 patch proposes the
same: it enables you to choose a custom WAL archive and restore directory.
The only concern I have is that $node->info() will print the wrong archive
directory path in that case; do we need to fix that? We might need to
store archive_dir like base_dir, I wasn't sure about that. Thoughts?
Comments?
Regards,
Amul

From 4bc968d3508e9f608fd572f40f39e2cb374d53f4 Mon Sep 17 00:00:00 2001
From: Amul Sul
Date: Wed, 19 Jan 2022 23:22:02 -0500
Subject: [PATCH v4 1/3] Allow TAP tests to choose custom WAL archive and
 restore path.

---
 src/test/perl/PostgreSQL/Test/Cluster.pm | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index b7d4c24553..6a562db2ce 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -1033,10 +1033,13 @@ primary_conninfo='$root_connstr'
 # Internal routine to enable archive recovery command on a standby node
 sub enable_restoring
 {
-	my ($self, $root_node, $standby) = @_;
-	my $path = PostgreSQL::Test::Utils::perl2host($root_node->archive_dir);
+	my ($self, $root_node, $standby, $path) = @_;
 	my $name = $self->name;
 
+	# Default archive directory will be used if not specified.
+	$path = $root_node->archive_dir if (!defined($path));
+	$path = PostgreSQL::Test::Utils::perl2host($path);
+
 	print "### Enabling WAL restore for node \"$name\"\n";
 
 	# On Windows, the path specified in the restore command needs to use
@@ -1101,10 +1104,13 @@ sub set_standby_mode
 # Internal routine to enable archiving
 sub enable_archiving
 {
-	my ($self) = @_;
-	my $path = PostgreSQL::Test::Utils::perl2host($self->archive_dir);
+	my ($self, $path) = @_;
 	my $name = $self->name;
 
+	# Default archive directory will be used if not specified.
+	$path = $self->archive_dir if (!defined($path));
+	$path = PostgreSQL::Test::Utils::perl2host($path);
+
 	print "### Enabling WAL archiving for node \"$name\"\n";
 
 	# On Windows, the path specified in the restore command needs to use
-- 
2.18.0

From 6c730ee10b9e64d90982aa555283c9094a5fdb87 Mon Sep 17 00:00:00 2001
From: Amul Sul
Date: Wed, 19 Jan 2022 23:25:07 -0500
Subject: [PATCH v4 2/3] TAP test for EndOfLogTLI

---
 src/test/recovery/t/003_recovery_targets.pl | 47 -
 1 file changed, 46 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index 24da78c0bc..b3f7076540 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -6,7 +6,7 @@ use strict;
 use warnings;
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
-use Test::More tests => 9;
+use Test::More tests => 10;
 use Time::HiRes qw(usleep);
 
 # Create and test a standby from given backup, with a certain recovery target.
@@ -182,3 +182,48 @@ $logfile = slurp_file($node_standby->logfile());
 ok( $logfile =~
 	  qr/FATAL: .* recovery ended before configured recovery target was reached/,
 	'recovery end before target reached is a fatal error');
+
+# Test to cover a case where that we are looking for WAL record that ought to be
+# in for e.g 00010005 we don't find it; instead we find
+# 00020005 because of
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Sun, 16 Jan 2022 12:43:03 +0800, Julien Rouhaud wrote in
> Other than that, I see that the TAP tests are failing on all the
> environments, due to Perl errors. For instance:

Perl seems to have changed its behavior for undef hashes. It is said that
"if (%undef_hash)" is false, but actually it is true and "keys %undef_hash"
is 1. Finally I had to make backup_tablespaces() return a hash reference.

The pg_basebackup test takes a backup in tar mode, which broke the test
infrastructure. Cluster::backup now skips symlink adjustment when the
backup contains "/base.tar".

I gave up testing on Windows in my own environment and used Cirrus CI.

# However, it works for confirmation of established code. The TAT of CI
# is still too long for trial and error on unestablished code..

This version works for Unixen but still doesn't for Windows. I'm
searching for a fix for Windows.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

From 5f88a80b9a585ca611ab6424f035330a47b2449f Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Thu, 11 Nov 2021 20:42:00 +0900
Subject: [PATCH v15 1/3] Add tablespace support to TAP framework

The TAP framework doesn't support nodes that have tablespaces. In
particular, backup and initialization from backups failed if the source
node has tablespaces. This commit provides a simple way to create
tablespace directories and allows backup routines to handle tablespaces.
---
 src/bin/pg_basebackup/t/010_pg_basebackup.pl |   2 +-
 src/test/perl/PostgreSQL/Test/Cluster.pm     | 264 ++-
 src/test/perl/PostgreSQL/Test/Utils.pm       |  43 +++
 3 files changed, 306 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index f0243f28d4..c139b5e000 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -257,7 +257,7 @@ $node->safe_psql('postgres',
 	"CREATE TABLESPACE tblspc1 LOCATION '$realTsDir';");
 $node->safe_psql('postgres',
 	"CREATE TABLE test1 (a int) TABLESPACE tblspc1;"
-	. "INSERT INTO test1 VALUES (1234);");
+	  . "INSERT INTO test1 VALUES (1234);");
 $node->backup('tarbackup2', backup_options => ['-Ft']);
 # empty test1, just so that it's different from the to-be-restored data
 $node->safe_psql('postgres', "TRUNCATE TABLE test1;");
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index b7d4c24553..d433ccf610 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -298,6 +298,64 @@ sub archive_dir
 
 =pod
 
+=item $node->tablespace_storage([, nocreate])
+
+Directory to store tablespace directories.
+If nocreate is true, returns undef if not yet created.
+
+=cut
+
+sub tablespace_storage
+{
+	my ($self, $nocreate) = @_;
+
+	if (!defined $self->{_tsproot})
+	{
+		# tablespace is not used, return undef if nocreate is specified.
+		return undef if ($nocreate);
+
+		# create and remember the tablespace root directory.
+		$self->{_tsproot} = PostgreSQL::Test::Utils::tempdir_short();
+	}
+
+	return $self->{_tsproot};
+}
+
+=pod
+
+=item $node->tablespaces()
+
+Returns a hash from tablespace OID to tablespace directory name. For
+example, an oid 16384 pointing to /tmp/jWAhkT_fs0/ts1 is stored as
+$hash{16384} = "ts1".
+
+=cut
+
+sub tablespaces
+{
+	my ($self) = @_;
+	my $pg_tblspc = $self->data_dir . '/pg_tblspc';
+	my %ret;
+
+	# return undef if no tablespace is used
+	return undef if (!defined $self->tablespace_storage(1));
+
+	# collect tablespace entries in pg_tblspc directory
+	opendir(my $dir, $pg_tblspc);
+	while (my $oid = readdir($dir))
+	{
+		next if ($oid !~ /^([0-9]+)$/);
+		my $linkpath = "$pg_tblspc/$oid";
+		my $tsppath = PostgreSQL::Test::Utils::dir_readlink($linkpath);
+		$ret{$oid} = File::Basename::basename($tsppath);
+	}
+	closedir($dir);
+
+	return %ret;
+}
+
+=pod
+
 =item $node->backup_dir()
 
 The output path for backups taken with $node->backup()
@@ -313,6 +371,77 @@ sub backup_dir
 
 =pod
 
+=item $node->backup_tablespace_storage_path(backup_name)
+
+Returns the tablespace location path for backup_name.
+Returns the parent directory if backup_name is not given.
+
+=cut
+
+sub backup_tablespace_storage_path
+{
+	my ($self, $backup_name) = @_;
+	my $dir = $self->backup_dir . '/__tsps';
+
+	$dir .= "/$backup_name" if (defined $backup_name);
+
+	return $dir;
+}
+
+=pod
+
+=item $node->backup_create_tablespace_storage(backup_name)
+
+Create the tablespace location directory for backup_name if not done yet.
+Create the parent tablespace storage that holds all location
+directories if backup_name is not supplied.
+
+=cut
+
+sub backup_create_tablespace_storage
+{
+	my ($self, $backup_name) = @_;
+	my $dir = $self->backup_tablespace_storage_path($backup_name);
+
+	File::Path::make_path $dir if (! -d $dir);
+}
+
+=pod
+
+=item $node->backup_tablespaces(backup_name)
+
+Returns a reference to hash from tablespace OID to
Re: Map WAL segment files on PMEM as WAL buffers
Hi Justin,

Here is patchset v8. It should pass "make check-world" and Cirrus. Would
you try this one?

The v8 squashes some patches in v7 into related ones, and adds the
following patches:

- v8-0003: Add wal_pmem_map to postgresql.conf.sample. It also helps
  v8-0011.
- v8-0009: Fix wrong handling of missingContrecPtr so that
  test/recovery/t/026 passes. It is the cause of the error. Thanks for
  your report.
- v8-0010 and v8-0011: Each of the two is for CI only. v8-0010 adds
  --with-libpmem and v8-0011 enables "wal_pmem_map = on".

Please note that, unlike your suggestion, in my patchset
PMEM_IS_PMEM_FORCE=1 will be given as an environment variable in
.cirrus.yml and "wal_pmem_map = on" will be given by initdb.

Regards,
Takashi

-- 
Takashi Menjo

Attachments:
- v8-0001-Add-with-libpmem-option-for-PMEM-support.patch
- v8-0002-Add-wal_pmem_map-to-GUC.patch
- v8-0003-Add-wal_pmem_map-to-postgresql.conf.sample.patch
- v8-0004-Export-InstallXLogFileSegment.patch
- v8-0005-Map-WAL-segment-files-on-PMEM-as-WAL-buffers.patch
- v8-0006-WAL-statistics-in-cases-of-wal_pmem_map-true.patch
- v8-0007-Update-document.patch
- v8-0008-Preallocate-and-initialize-more-WAL-if-wal_pmem_m.patch
- v8-0009-Fix-wrong-handling-of-missingContrecPtr.patch
- v8-0010-For-CI-only-Setup-Cirrus-CI-for-with-libpmem.patch
- v8-0011-For-CI-only-Modify-initdb-for-wal_pmem_map-on.patch
Re: Refactoring of compression options in pg_basebackup
On Wed, Jan 19, 2022 at 12:50:44PM -0300, Alvaro Herrera wrote:
> Note there is an extra [ before the {gzip bit.

Thanks!

-- 
Michael
Re: POC: GROUP BY optimization
I keep working on this patch. Here are intermediate results.

On 7/22/21 3:58 AM, Tomas Vondra wrote:
> in the first loop. Which seems pretty bogus - why would there be just
> two groups? When processing the first expression, it's as if there was
> one big "prev group" with all the tuples, so why not to just use
> nGroups as it is?

I think the heapsort code seems very strange. Look at the fallback case:
it is based on an output_tuples value. Maybe we should use the nGroups
value here, but based on the number of output_tuples?

> 1) I looked at the resources mentioned as sources the formulas came
> from, but I've been unable to really match the algorithm to them. The
> quicksort paper is particularly "dense", the notation seems to be very
> different, and none of the theorems seem like an obvious fit. Would be
> good to make the relationship clearer in comments etc.

Fixed (see attachment).

> 3) I'm getting a bit skeptical about the various magic coefficients
> that are meant to model higher costs with non-uniform distribution.
> But consider that we do this, for example:
>
>     tuplesPerPrevGroup = ceil(1.5 * tuplesPerPrevGroup / nGroups);
>
> but then in the next loop we call estimate_num_groups_incremental and
> pass this "tweaked" tuplesPerPrevGroup value to it. I'm pretty sure
> this may have various strange consequences - we'll calculate the
> nGroups based on the inflated value, and we'll calculate
> tuplesPerPrevGroup from that again - that seems susceptible to
> "amplification". We inflate tuplesPerPrevGroup by 50%, which means
> we'll get a higher nGroups estimate in the next loop - but not
> linearly. And then we'll calculate the inflated tuplesPerPrevGroup and
> estimated nGroups ...

The weighting coefficient '1.5' reflects our desire to minimize the
number of comparison operations on each subsequent attribute of the
pathkeys list. By increasing this coefficient we increase the chance that
the planner will order pathkeys by decreasing uniqueness. I think it's ok.
-- 
regards,
Andrey Lepikhov
Postgres Professional

From fbc5e6709550f485a2153dda97ef805700717f23 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov
Date: Fri, 24 Dec 2021 13:01:48 +0500
Subject: [PATCH] My fixes.

---
 src/backend/optimizer/path/costsize.c | 33 ---
 src/backend/optimizer/path/pathkeys.c |  9 
 src/backend/optimizer/plan/planner.c  |  8 ---
 3 files changed, 20 insertions(+), 30 deletions(-)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index afb8ba54ea..70af9c91d5 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1845,19 +1845,17 @@ get_width_cost_multiplier(PlannerInfo *root, Expr *expr)
  * groups in the current pathkey prefix and the new pathkey), and the comparison
  * costs (which is data type specific).
  *
- * Estimation of the number of comparisons is based on ideas from two sources:
+ * Estimation of the number of comparisons is based on ideas from:
  *
- * 1) "Algorithms" (course), Robert Sedgewick, Kevin Wayne [https://algs4.cs.princeton.edu/home/]
+ * "Quicksort Is Optimal", Robert Sedgewick, Jon Bentley, 2002
+ * [https://www.cs.princeton.edu/~rs/talks/QuicksortIsOptimal.pdf]
  *
- * 2) "Quicksort Is Optimal For Many Equal Keys" (paper), Sebastian Wild,
- * arXiv:1608.04906v4 [cs.DS] 1 Nov 2017. [https://arxiv.org/abs/1608.04906]
- *
- * In term of that paper, let N - number of tuples, Xi - number of tuples with
- * key Ki, then the estimate of number of comparisons is:
+ * In term of that paper, let N - number of tuples, Xi - number of identical
+ * tuples with value Ki, then the estimate of number of comparisons is:
  *
  *	log(N! / (X1! * X2! * ..)) ~ sum(Xi * log(N/Xi))
  *
- * In our case all Xi are the same because now we don't have any estimation of
+ * We assume all Xi the same because now we don't have any estimation of
  * group sizes, we have only know the estimate of number of groups (distinct
  * values). In that case, formula becomes:
  *
@@ -1865,7 +1863,7 @@ get_width_cost_multiplier(PlannerInfo *root, Expr *expr)
  *
  * For multi-column sorts we need to estimate the number of comparisons for
  * each individual column - for example with columns (c1, c2, ..., ck) we
- * can estimate that number of comparions on ck is roughly
+ * can estimate that number of comparisons on ck is roughly
  *
  *	ncomparisons(c1, c2, ..., ck) / ncomparisons(c1, c2, ..., c(k-1))
  *
@@ -1874,10 +1872,10 @@ get_width_cost_multiplier(PlannerInfo *root, Expr *expr)
  *
  *	N * sum( Fk * log(Gk) )
  *
- * Note: We also consider column witdth, not just the comparator cost.
+ * Note: We also consider column width, not just the comparator cost.
  *
  * NOTE: some callers currently pass NIL for pathkeys because they
- * can't conveniently supply the sort keys. In this case, it will fallback to
+ * simple comparison cost estimate.
  */
 static Cost
@@ -1925,13
Re: pg_upgrade should truncate/remove its logs before running
On Wed, Jan 19, 2022 at 09:59:14PM -0600, Justin Pryzby wrote:
> We require that the dir not exist, by testing if (mkdir()).
> So it's okay if someone specifies ../whatever or $CWD.

What I am scared of here is the use of rmtree() if we allow something
like that. So we should either keep the removal code in its original
shape and allow such cases, or restrict the output path. At the end,
something has to change. My points are in favor of the latter because I
don't really see anybody doing the former; you favor the former. Now, we
are not talking about a lot of code for either of these, anyway. Perhaps
we'd better wait for more opinions.

-- 
Michael
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
At Mon, 17 Jan 2022 17:24:43 +0900 (JST), Kyotaro Horiguchi wrote in
> At Sun, 16 Jan 2022 12:43:03 +0800, Julien Rouhaud wrote in
> > I'm not very familiar with windows, but maybe using strawberry perl
> > instead ([1]) would fix your problem? I think it's also quite popular
> > and is commonly used to run pgBadger on Windows.
>
> Thanks! I'll try it later.

The build is stopped by some unresolvable symbols. Strawberry perl is
5.28, which doesn't expose new_ctype, new_collate and new_numeric
according to the past discussion. (Active perl is 5.32.)

https://www.postgresql.org/message-id/20200501134711.08750c5f%40antares.wagner.home

However, the patch provided revealed around 70 other unresolved symbol
errors...

# Hmm. perl on CentOS 8 is 5.26..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: row filtering for logical replication
On Thu, Jan 20, 2022 at 2:29 PM Amit Kapila wrote: > > On Thu, Jan 20, 2022 at 7:51 AM Peter Smith wrote: > > > > Here are some review comments for v68-0001. > > > > > > 3. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter_init > > > > + /* If no filter found, clean up the memory and return */ > > + if (!has_filter) > > + { > > + if (entry->cache_expr_cxt != NULL) > > + MemoryContextDelete(entry->cache_expr_cxt); > > + > > + entry->exprstate_valid = true; > > + return; > > + } > > > > IMO this should be refactored to have if/else, so the function has > > just a single point of return and a single point where the > > exprstate_valid is set. e.g. > > > > if (!has_filter) > > { > > /* If no filter found, clean up the memory and return */ > > ... > > } > > else > > { > > /* Create or reset the memory context for row filters */ > > ... > > /* > > * Now all the filters for all pubactions are known. Combine them when > > * their pubactions are same. > > ... > > } > > > > entry->exprstate_valid = true; > > > > This part of the code is changed in v68 which makes the current code > more suitable as we don't need to deal with memory context in if part. > I am not sure if it is good to add the else block here but I think > that is just a matter of personal preference. > Sorry, my mistake - I quoted the v67 source instead of the v68 source. There is no else needed now at all. My suggestion becomes. e.g. if (has_filter) { // deal with mem ctx ... // combine filters ... } entry->exprstate_valid = true; return; -- Kind Regards, Peter Smith Fujitsu Australia
Re: pg_upgrade should truncate/remove its logs before running
On Thu, Jan 20, 2022 at 12:01:29PM +0900, Michael Paquier wrote: > On Wed, Jan 19, 2022 at 06:05:40PM -0600, Justin Pryzby wrote: > > > I'm not sure these restrictions are needed ? > > This could lead to issues with rmtree() if we are not careful enough, > no? We'd had our deal of argument injections with pg_upgrade commands > in the past (fcd15f1). We require that the dir not exist, by testing if (mkdir()). So it's okay if someone specifies ../whatever or $CWD. -- Justin
Re: row filtering for logical replication
On Thu, Jan 20, 2022 at 7:51 AM Peter Smith wrote: > > 5. DEBUG level 3 > > I found there are 3 debug logs in this patch and they all have DEBUG3 level. > > IMO it is probably OK as-is, > +1. > but just a comparison I noticed that the > most detailed logging for logical replication worker.c was DEBUG2. > Perhaps row-filter patch should be using DEBUG2 also? > OTOH, the other related files like reorderbuffer.c and snapbuild.c are using DEBUG3 for the detailed messages. So, I think it is probably okay to retain logs as is. -- With Regards, Amit Kapila.
Re: row filtering for logical replication
On Thu, Jan 20, 2022 at 7:51 AM Peter Smith wrote: > > Here are some review comments for v68-0001. > > > 3. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter_init > > + /* If no filter found, clean up the memory and return */ > + if (!has_filter) > + { > + if (entry->cache_expr_cxt != NULL) > + MemoryContextDelete(entry->cache_expr_cxt); > + > + entry->exprstate_valid = true; > + return; > + } > > IMO this should be refactored to have if/else, so the function has > just a single point of return and a single point where the > exprstate_valid is set. e.g. > > if (!has_filter) > { > /* If no filter found, clean up the memory and return */ > ... > } > else > { > /* Create or reset the memory context for row filters */ > ... > /* > * Now all the filters for all pubactions are known. Combine them when > * their pubactions are same. > ... > } > > entry->exprstate_valid = true; > This part of the code is changed in v68 which makes the current code more suitable as we don't need to deal with memory context in if part. I am not sure if it is good to add the else block here but I think that is just a matter of personal preference. -- With Regards, Amit Kapila.
Re: Add last commit LSN to pg_last_committed_xact()
Hi,

On 2022-01-19 21:23:12 -0500, James Coleman wrote:
> { oid => '3537', descr => 'get identification of SQL object',
> diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
> index a5f9e9..2a026b0844 100644
> --- a/src/include/storage/proc.h
> +++ b/src/include/storage/proc.h
> @@ -258,6 +258,11 @@ struct PGPROC
> 	PGPROC	   *lockGroupLeader;	/* lock group leader, if I'm a member */
> 	dlist_head	lockGroupMembers;	/* list of members, if I'm a leader */
> 	dlist_node	lockGroupLink;	/* my member link, if I'm a member */
> +
> +	/*
> +	 * Last transaction metadata.
> +	 */
> +	XLogRecPtr	lastCommitLSN;	/* cache of last committed LSN */
> };

Might be worth forcing this to be on a separate cacheline than stuff more
hotly accessed by other backends, like the lock group stuff.

Greetings,

Andres Freund
Re: pg_upgrade should truncate/remove its logs before running
On Wed, Jan 19, 2022 at 06:05:40PM -0600, Justin Pryzby wrote: > I still don't know if it even needs to be configurable. I want this to be configurable to ease the switch of the pg_upgrade to TAP, moving the logs into a deterministic temporary location proper to each run. This makes reporting much easier on failure, with repeatable tests, and that's why I began poking at this patch first. > I'm not sure these restrictions are needed ? This could lead to issues with rmtree() if we are not careful enough, no? We'd had our deal of argument injections with pg_upgrade commands in the past (fcd15f1). > + outputpath = make_absolute_path(log_opts.basedir); > > + if (path_contains_parent_reference(outputpath)) > > + pg_fatal("reference to parent directory not allowed\n"); > > > Besides, you're passing the wrong path here. What would you suggest? I was just looking at that again this morning, and splitted the logic into two parts for the absolute and relative path cases, preventing all cases like that, which would be weird, anyway: ../ ../popo .././ ././ /direct/path/to/cwd/ /direct/path/../path/to/cwd/ >> + pg_upgrade, and is be removed after a successful > > remove "be" Fixed. >> + if (mkdir(log_opts.basedir, S_IRWXU | S_IRWXG | S_IRWXO)) > > S_IRWXG | S_IRWXO are useless due to the umask, right ? > Maybe use PG_DIR_MODE_OWNER ? Hmm. We could just use pg_dir_create_mode, then. See pg_rewind, as one example. This opens the door for something pluggable to SetDataDirectoryCreatePerm(), though the original use is kind of different with data folders. > You're printing the wrong var. filename_path is not initialized. Ugh. -- Michael From 2efd3252e189578f865eb54d076596387494585c Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Thu, 20 Jan 2022 11:58:59 +0900 Subject: [PATCH v4] pg_upgrade: write log files and dumps in subdir.. The subdir must not exist when pg_upgrade is started. This avoids appending to logfiles which have pre-existing errors in them. 
And allows more easily cleaning up after pg_upgrade. --- src/bin/pg_upgrade/.gitignore | 2 - src/bin/pg_upgrade/Makefile | 4 +- src/bin/pg_upgrade/check.c | 12 ++-- src/bin/pg_upgrade/dump.c | 6 +- src/bin/pg_upgrade/exec.c | 5 +- src/bin/pg_upgrade/function.c | 3 +- src/bin/pg_upgrade/option.c | 29 ++-- src/bin/pg_upgrade/pg_upgrade.c | 120 +--- src/bin/pg_upgrade/pg_upgrade.h | 7 ++ src/bin/pg_upgrade/server.c | 6 +- doc/src/sgml/ref/pgupgrade.sgml | 17 - 11 files changed, 145 insertions(+), 66 deletions(-) diff --git a/src/bin/pg_upgrade/.gitignore b/src/bin/pg_upgrade/.gitignore index 2d3bfeaa50..939e50db6c 100644 --- a/src/bin/pg_upgrade/.gitignore +++ b/src/bin/pg_upgrade/.gitignore @@ -1,9 +1,7 @@ /pg_upgrade # Generated by test suite -/pg_upgrade_internal.log /delete_old_cluster.sh /delete_old_cluster.bat /reindex_hash.sql -/loadable_libraries.txt /log/ /tmp_check/ diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile index 44d06be5a6..e833a1c5f0 100644 --- a/src/bin/pg_upgrade/Makefile +++ b/src/bin/pg_upgrade/Makefile @@ -45,9 +45,7 @@ uninstall: clean distclean maintainer-clean: rm -f pg_upgrade$(X) $(OBJS) rm -rf delete_old_cluster.sh log/ tmp_check/ \ - loadable_libraries.txt reindex_hash.sql \ - pg_upgrade_dump_globals.sql \ - pg_upgrade_dump_*.custom pg_upgrade_*.log + reindex_hash.sql pg_upgrade_output.d/ # When $(MAKE) is present, make automatically infers that this is a # recursive make. 
which is not actually what we want here, as that diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c index 3d218c2ad2..019bcb6c7b 100644 --- a/src/bin/pg_upgrade/check.c +++ b/src/bin/pg_upgrade/check.c @@ -783,7 +783,8 @@ check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster) return; } - snprintf(output_path, sizeof(output_path), + snprintf(output_path, sizeof(output_path), "%s/%s", + log_opts.basedir, "contrib_isn_and_int8_pass_by_value.txt"); for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++) @@ -860,7 +861,8 @@ check_for_user_defined_postfix_ops(ClusterInfo *cluster) prep_status("Checking for user-defined postfix operators"); - snprintf(output_path, sizeof(output_path), + snprintf(output_path, sizeof(output_path), "%s/%s", + log_opts.basedir, "postfix_ops.txt"); /* Find any user defined postfix operators */ @@ -959,7 +961,8 @@ check_for_tables_with_oids(ClusterInfo *cluster) prep_status("Checking for tables WITH OIDS"); - snprintf(output_path, sizeof(output_path), + snprintf(output_path, sizeof(output_path), "%s/%s", +
Re: Null commitTS bug
At Tue, 18 Jan 2022 13:48:11 +0900 (JST), Kyotaro Horiguchi wrote in
> At Tue, 18 Jan 2022 10:43:55 +0900, Michael Paquier wrote in
> > On Sun, Jan 16, 2022 at 11:01:25PM -0500, Tom Lane wrote:
> > > Isn't that a very bad way to write "i = j + 1"?
> > >
> > > I agree with Horiguchi-san that
> > > for (i = 0, headxid = xid;;)
> >
> > Okay. Horiguchi-san, would you like to write a patch?
>
> Yes, I will.

This is that. I think this is a separate issue from the actual bug. This
is applicable at least back to 9.6, and I think it should be applied back
to all supported versions to avoid future backpatch conflicts.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 9211fa61513dcdbca273656454395a3dcf3ee4e7 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Thu, 20 Jan 2022 10:16:48 +0900
Subject: [PATCH] Improve confusing code in TransactionTreeSetCommitTsData

TransactionTreeSetCommitTsData has a bit confusing use of the +=
operator. Simplifying it makes the code easier to follow. In the same
function the for-loop initializes only non-controlling variables, which
is not great style. They ought to be initialized outside the loop.
---
 src/backend/access/transam/commit_ts.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 659109f8d4..88eac10456 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -168,7 +168,10 @@ TransactionTreeSetCommitTsData(TransactionId xid, int nsubxids,
 	 * subxid not on the previous page as head. This way, we only have to
 	 * lock/modify each SLRU page once.
 	 */
-	for (i = 0, headxid = xid;;)
+	headxid = xid;
+	i = 0;
+
+	for (;;)
 	{
 		int			pageno = TransactionIdToCTsPage(headxid);
 		int			j;
@@ -192,7 +195,7 @@ TransactionTreeSetCommitTsData(TransactionId xid, int nsubxids,
 		 * just wrote.
 		 */
 		headxid = subxids[j];
-		i += j - i + 1;
+		i = j + 1;
 	}

 	/* update the cached value in shared memory */
--
2.27.0
Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
At Thu, 20 Jan 2022 00:36:32 +, "Bossart, Nathan" wrote in > On 1/3/22, 5:52 PM, "Kyotaro Horiguchi" wrote: > > It seems to me "LSN" or just "location" is more confusing or > > mysterious than "REDO LSN" for the average user. If we want to avoid > > being technically too detailed, we would use just "start LSN=%X/%X, > > end LSN=%X/%X". And it is equivalent to "WAL range=[%X/%X, %X/%X]".. > > My first instinct was that this should stay aligned with > pg_controldata, but that would mean using "location=%X/%X, REDO > location=%X/%X," which doesn't seem terribly descriptive. IIUC the > "checkpoint location" is the LSN of the WAL record for the checkpoint, > and the "checkpoint's REDO location" is the LSN where checkpoint > creation began (i.e., what you must retain for crash recovery). My > vote is for "start=%X/%X, end=%X/%X." +1. Works for me. %X/%X itself expresses it is an LSN. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Document atthasmissing default optimization avoids verification table scan
On Wed, Jan 19, 2022 at 6:14 PM James Coleman wrote: > I'm open to the idea of wordsmithing here, of course, but I strongly > disagree that this is irrelevant data. Ok, but wording aside, only changing a tip in the DDL - Add Table section doesn't seem like a complete fix. The notes in alter table, where I'd look for such an official directive first, need to be touched as well. For the alter table docs maybe change/add to the existing sentence below (I'm in favor of not pointing out that scanning the table doesn't mean we are rewriting it, but maybe I'm making another unwarranted assumption regarding obviousness...). "Adding a CHECK or NOT NULL constraint requires scanning the table [but not rewriting it] to verify that existing rows meet the constraint. It is skipped when done as part of ADD COLUMN unless a table rewrite is required anyway." On that note, does the check constraint interplay with the default rewrite avoidance in the same way? For the Tip I'd almost rather redo it to say: "Before PostgreSQL 11, adding a new column to a table required rewriting that table, making it a very slow operation. More recent versions can sometimes optimize away this rewrite and related validation scans. See the notes in ALTER TABLE for details." Though I suppose I'd accept something like (leave existing text, alternative patch text): "[...]large tables.\nIf the added column also has a not null constraint the usual verification scan is also skipped." "constant" is used in the Tip, "non-volatile" is used in alter table - hence a desire to have just one source of truth, with alter table being the correct place. We should sync them up otherwise. David J.
Re: Add last commit LSN to pg_last_committed_xact()
On Tue, Jan 18, 2022 at 9:19 PM Andres Freund wrote: > > > + LWLockAcquire(ProcArrayLock, LW_SHARED); > > + lsn = ShmemVariableCache->finishedProcsLastCommitLSN; > > + for (index = 0; index < ProcGlobal->allProcCount; index++) > > + { > > + XLogRecPtr procLSN = > > ProcGlobal->allProcs[index].lastCommitLSN; > > + if (procLSN > lsn) > > + lsn = procLSN; > > + } > > + LWLockRelease(ProcArrayLock); > > I think it'd be better to go through the pgprocnos infrastructure, so that > only connected procs need to be checked. > > LWLockAcquire(ProcArrayLock, LW_SHARED); > for (i = 0; i < arrayP->numProcs; i++) > { > int pgprocno = arrayP->pgprocnos[i]; > PGPROC *proc = [pgprocno]; > > if (proc->lastCommitLSN > lsn) >lsn =proc->lastCommitLSN; >} > > > > diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h > > index a5f9e9..2a026b0844 100644 > > --- a/src/include/storage/proc.h > > +++ b/src/include/storage/proc.h > > @@ -258,6 +258,11 @@ struct PGPROC > > PGPROC *lockGroupLeader;/* lock group leader, if I'm a member > > */ > > dlist_head lockGroupMembers; /* list of members, if I'm a > > leader */ > > dlist_node lockGroupLink; /* my member link, if I'm a member */ > > + > > + /* > > + * Last transaction metadata. > > + */ > > + XLogRecPtr lastCommitLSN; /* cache of last committed > > LSN */ > > }; > > We do not rely on 64bit integers to be read/written atomically, just 32bit > ones. To make this work for older platforms you'd have to use a > pg_atomic_uint64. On new-ish platforms pg_atomic_read_u64/pg_atomic_write_u64 > end up as plain read/writes, but on older ones they'd do the necessarily > locking to make that safe... All right, here's an updated patch. The final interface (new function or refactor the existing not to rely on commit_ts) is still TBD (and I'd appreciate input on that from Alvaro and others). 
Thanks, James Coleman From 4b1d2db87be0090bc8840e1c6d990443dccb5237 Mon Sep 17 00:00:00 2001 From: jcoleman Date: Fri, 14 Jan 2022 22:26:10 + Subject: [PATCH v3] Expose LSN of last commit via pg_last_committed_xact --- src/backend/access/transam/commit_ts.c | 25 +-- src/backend/access/transam/twophase.c | 2 +- src/backend/access/transam/xact.c | 10 +--- src/backend/storage/ipc/procarray.c| 34 ++ src/include/access/commit_ts.h | 5 ++-- src/include/access/transam.h | 2 ++ src/include/catalog/pg_proc.dat| 6 ++--- src/include/storage/proc.h | 5 src/include/storage/procarray.h| 2 ++ 9 files changed, 73 insertions(+), 18 deletions(-) diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c index 659109f8d4..f0c3185021 100644 --- a/src/backend/access/transam/commit_ts.c +++ b/src/backend/access/transam/commit_ts.c @@ -33,10 +33,11 @@ #include "funcapi.h" #include "miscadmin.h" #include "pg_trace.h" -#include "storage/shmem.h" +#include "storage/procarray.h" #include "utils/builtins.h" #include "utils/snapmgr.h" #include "utils/timestamp.h" +#include "utils/pg_lsn.h" /* * Defines for CommitTs page sizes. 
A page is the same BLCKSZ as is used @@ -93,6 +94,7 @@ static SlruCtlData CommitTsCtlData; typedef struct CommitTimestampShared { TransactionId xidLastCommit; + XLogRecPtr lsnLastCommit; CommitTimestampEntry dataLastCommit; bool commitTsActive; } CommitTimestampShared; @@ -135,7 +137,7 @@ static void WriteTruncateXlogRec(int pageno, TransactionId oldestXid); void TransactionTreeSetCommitTsData(TransactionId xid, int nsubxids, TransactionId *subxids, TimestampTz timestamp, - RepOriginId nodeid) + XLogRecPtr commitLsn, RepOriginId nodeid) { int i; TransactionId headxid; @@ -414,24 +416,28 @@ pg_last_committed_xact(PG_FUNCTION_ARGS) TransactionId xid; RepOriginId nodeid; TimestampTz ts; - Datum values[3]; - bool nulls[3]; + XLogRecPtr lsn; + Datum values[4]; + bool nulls[4]; TupleDesc tupdesc; HeapTuple htup; /* and construct a tuple with our data */ xid = GetLatestCommitTsData(, ); + lsn = GetLastCommitLSN(); /* * Construct a tuple descriptor for the result row. This must match this * function's pg_proc entry! */ - tupdesc = CreateTemplateTupleDesc(3); + tupdesc = CreateTemplateTupleDesc(4); TupleDescInitEntry(tupdesc, (AttrNumber) 1, "xid", XIDOID, -1, 0); TupleDescInitEntry(tupdesc, (AttrNumber) 2, "timestamp", TIMESTAMPTZOID, -1, 0); - TupleDescInitEntry(tupdesc, (AttrNumber) 3, "roident", + TupleDescInitEntry(tupdesc, (AttrNumber) 3, "lsn", + PG_LSNOID, -1, 0); + TupleDescInitEntry(tupdesc, (AttrNumber) 4, "roident", OIDOID, -1, 0); tupdesc = BlessTupleDesc(tupdesc); @@ -447,8 +453,11 @@
Re: row filtering for logical replication
Here are some review comments for v68-0001. ~~~ 1. Commit message "When a publication is defined or modified, rows that don't satisfy an optional WHERE clause will be filtered out." That wording seems strange to me - it sounds like the filtering takes place at the point of creating/altering. Suggest reword something like: "When a publication is defined or modified, an optional WHERE clause can be specified. Rows that don't satisfy this WHERE clause will be filtered out." ~~~ 2. Commit message "The WHERE clause allows simple expressions that don't have user-defined functions, operators..." Suggest adding the word ONLY: "The WHERE clause only allows simple expressions that don't have user-defined functions, operators..." ~~~ 3. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter_init + /* If no filter found, clean up the memory and return */ + if (!has_filter) + { + if (entry->cache_expr_cxt != NULL) + MemoryContextDelete(entry->cache_expr_cxt); + + entry->exprstate_valid = true; + return; + } IMO this should be refactored to have if/else, so the function has just a single point of return and a single point where the exprstate_valid is set. e.g. if (!has_filter) { /* If no filter found, clean up the memory and return */ ... } else { /* Create or reset the memory context for row filters */ ... /* * Now all the filters for all pubactions are known. Combine them when * their pubactions are same. ... } entry->exprstate_valid = true; ~~~ 4. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter comment + /* + * We need this map to avoid relying on changes in ReorderBufferChangeType + * enum. + */ + static int map_changetype_pubaction[] = { + [REORDER_BUFFER_CHANGE_INSERT] = PUBACTION_INSERT, + [REORDER_BUFFER_CHANGE_UPDATE] = PUBACTION_UPDATE, + [REORDER_BUFFER_CHANGE_DELETE] = PUBACTION_DELETE + }; Suggest rewording comment and remove the double-spacing: BEFORE: "We need this map to avoid relying on changes in ReorderBufferChangeType enum." 
AFTER: "We need this map to avoid relying on ReorderBufferChangeType enums having specific values." ~~~ 5. DEBUG level 3 I found there are 3 debug logs in this patch and they all have DEBUG3 level. IMO it is probably OK as-is, but just a comparison I noticed that the most detailed logging for logical replication worker.c was DEBUG2. Perhaps row-filter patch should be using DEBUG2 also? -- Kind Regards, Peter Smith. Fujitsu Australia
Re: slowest tap tests - split or accelerate?
Hi,

On 2022-01-19 09:42:31 -0800, Andres Freund wrote:
> Both ours have this. Unfortunately on windows cp doesn't natively
> exist. Although git does provide it. I tried a few things that appear
> to be natively available (time is best of three executions):
>
> gnu cp from git, cp -a tmp_install\initdb_template t\
> 670ms
>
> xcopy.exe /E /Q tmp_install\initdb_template t\
> 638ms

This errors out if there's any forward slashes in paths, thinking it's a
flag. Seems out.

> robocopy /e /NFL /NDL tmp_install\initdb_template t\
> 575ms
>
> So I guess we could use robocopy? That's shipped as part of windows
> starting in windows 10... xcopy has been there for longer, so I might
> just default to that.

It's part of the OS back to at least windows 2016. I've found some random
links on the web [1] saying that it's included: "This command is available
in Vista and Windows 7 by default. For Windows XP and Server 2003 this
tool can be downloaded as part of Server 2003 Windows Resource Kit
tools."

Given that our oldest supported msvc version only runs on Windows 7
upwards [2], I think we should be good?

Alternatively we could lift copydir() to src/common? But that seems like
a bit more work than I want to put in.

For a second I was thinking that using something like copy --reflink=auto
could make a lot of sense for machines like florican, removing most of
the IO from a "templated initdb". But it looks like freebsd doesn't have
that, and it'd be a pain to figure out whether cp has --reflink.

Greetings,

Andres Freund

[1] https://www.windows-commandline.com/download-robocopy/
[2] https://docs.microsoft.com/en-us/visualstudio/releases/2013/vs2013-sysrequirements-vs
Re: Use generation context to speed up tuplesorts
On Tue, 18 Jan 2022 at 19:45, Julien Rouhaud wrote:
> I'm also a bit confused about which patch(es) should actually be
> reviewed in that thread. My understanding is that the original patch
> might not be relevant anymore but I might be wrong. The CF entry should
> probably be updated to clarify that, with an annotation / title change /
> author update or something.
>
> In the meantime I will switch the entry to Waiting on Author.

I think what's going on here is a bit confusing. There's quite a few
patches on here that aim to do very different things.

The patch I originally proposed was just to make tuplesort use generation
memory contexts. This helps speed up tuplesort because generation contexts
allow palloc() to be faster than the standard allocator. Additionally,
there's no power-of-2 wastage with generation contexts like there is with
the standard allocator. This helps fit more tuples on fewer CPU cache
lines.

I believe Andres then mentioned that the fixed block size for generation
contexts might become a problem. With the standard allocator the actual
size of the underlying malloc() grows up until a maximum size. With
generation contexts this is always the same, so we could end up with a
very large number of blocks which means lots of small mallocs which could
end up slow. Tomas then posted a series of patches to address this.

I then posted another patch that has the planner make a choice on the size
of the blocks to use for the generation context based on the tuple width
and number of tuple estimates. My hope there was to improve performance
further by making a better choice for how large to make the blocks in the
first place. This reduces the need for Tomas' patch, but does not
eliminate the need for it.

As of now, I still believe we'll need Tomas' patches to allow the block
size to grow up to a maximum size. I think those patches are likely needed
before we think about making tuplesort use generation contexts.

The reason I believe this is that we don't always have good estimates for
the number of tuples we're going to sort. nodeSort.c is fairly easy as it
just fetches tuples once and then sorts them. The use of tuplesort for
nodeIncrementalSort.c is much more complex as we'll likely be performing a
series of smaller sorts which are much harder to get size estimates for.
This means there's likely no magic block size that we can use for the
generation context that's used to store the tuples in tuplesort. This is
also the case for order by aggregates and probably most other usages of
tuplesort.

David
Re: Replace uses of deprecated Python module distutils.sysconfig
=?UTF-8?Q?Juan_Jos=C3=A9_Santamar=C3=ADa_Flecha?= writes: > On Wed, Jan 19, 2022 at 9:40 PM Tom Lane wrote: >> Done. (I couldn't find any equivalent logic in the MSVC build scripts >> though; is there something I missed?) > MSVC will use the path configured in src\tools\msvc\config.pl > $config->{"python"}, > there is no ambiguity. Ah, right. We have only three active Windows animals that are building with python, and all three are using 3.something, so that side of the house seems to be ready to go. regards, tom lane
Re: Document atthasmissing default optimization avoids verification table scan
On Wed, Jan 19, 2022 at 7:51 PM David G. Johnston wrote: > > On Wed, Jan 19, 2022 at 5:08 PM Bossart, Nathan wrote: >> >> On 9/24/21, 7:30 AM, "James Coleman" wrote: >> > When PG11 added the ability for ALTER TABLE ADD COLUMN to set a constant >> > default value without rewriting the table the doc changes did not note >> > how the new feature interplayed with ADD COLUMN DEFAULT NOT NULL. >> > Previously such a new column required a verification table scan to >> > ensure no values were null. That scan happens under an exclusive lock on >> > the table, so it can have a meaningful impact on database "accessible >> > uptime". >> >> I'm likely misunderstanding, but are you saying that adding a new >> column with a default value and a NOT NULL constraint used to require >> a verification scan? > > > As a side-effect of rewriting every live record in the table and indexes to > brand new files, yes. I doubt an actual independent scan was performed since > the only way for the newly written tuples to not have the default value > inserted would be a severe server bug. I've confirmed it wasn't a separate scan, but it does evaluate all constraints (it doesn't have any optimizations for skipping ones probably true by virtue of the new default). >> >> + Additionally adding a column with a constant default value avoids a >> + a table scan to verify no NULL values are present. >> >> Should this clarify that it's referring to NOT NULL constraints? >> > > This doesn't seem like relevant material to comment on. It's an > implementation detail that is sufficiently covered by "making the ALTER TABLE > very fast even on large tables". > > Also, the idea of performing that scan seems ludicrous. I just added the > column and told it to populate with default values - why do you need to check > that your server didn't miss any? I'm open to the idea of wordsmithing here, of course, but I strongly disagree that this is irrelevant data. 
There are plenty of optimizations Postgres could theoretically implement but doesn't, so measuring what should happen by what you think is obvious ("told it to populate with default values - why do you need to check") is clearly not valid. This patch actually came out of our specifically needing to verify that this is true before an op precisely because docs aren't actually clear and because we can't risk a large table scan under an exclusive lock. We're clearly not the only ones with that question; it came up in a comment on this blog post announcing the newly committed feature [1]. I realize that most users aren't as worried about this kind of specific detail about DDL as we are (requiring absolutely zero slow DDL while under an exclusive lock), but it is relevant to high uptime systems. Thanks, James Coleman 1: https://www.depesz.com/2018/04/04/waiting-for-postgresql-11-fast-alter-table-add-column-with-a-non-null-default/
Re: Remove redundant MemoryContextSwith in BeginCopyFrom
On Thu, 20 Jan 2022 at 08:50, Tom Lane wrote: > Japin Li writes: >> I see you have already push this patch on master (89f059bdf52), why not >> remove MemoryContextSwitchTo in the middle of BeginCopyTo in this commit? > > That was a different suggestion from a different person, so I didn't > want to muddle the credit. Also, it requires a bit of testing, > while 89f059bdf52 was visibly perfectly safe. > Thanks for your explanation! -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: Document atthasmissing default optimization avoids verification table scan
On Wed, Jan 19, 2022 at 5:08 PM Bossart, Nathan wrote: > On 9/24/21, 7:30 AM, "James Coleman" wrote: > > When PG11 added the ability for ALTER TABLE ADD COLUMN to set a constant > > default value without rewriting the table the doc changes did not note > > how the new feature interplayed with ADD COLUMN DEFAULT NOT NULL. > > Previously such a new column required a verification table scan to > > ensure no values were null. That scan happens under an exclusive lock on > > the table, so it can have a meaningful impact on database "accessible > > uptime". > > I'm likely misunderstanding, but are you saying that adding a new > column with a default value and a NOT NULL constraint used to require > a verification scan? > As a side-effect of rewriting every live record in the table and indexes to brand new files, yes. I doubt an actual independent scan was performed since the only way for the newly written tuples to not have the default value inserted would be a severe server bug. > + Additionally adding a column with a constant default value avoids a > + a table scan to verify no NULL values are present. > > Should this clarify that it's referring to NOT NULL constraints? > > This doesn't seem like relevant material to comment on. It's an implementation detail that is sufficiently covered by "making the ALTER TABLE very fast even on large tables". Also, the idea of performing that scan seems ludicrous. I just added the column and told it to populate with default values - why do you need to check that your server didn't miss any? David J.
Re: Remove redundant MemoryContextSwith in BeginCopyFrom
Japin Li writes: > I see you have already push this patch on master (89f059bdf52), why not > remove MemoryContextSwitchTo in the middle of BeginCopyTo in this commit? That was a different suggestion from a different person, so I didn't want to muddle the credit. Also, it requires a bit of testing, while 89f059bdf52 was visibly perfectly safe. regards, tom lane
Re: Remove redundant MemoryContextSwith in BeginCopyFrom
On Thu, 20 Jan 2022 at 00:38, Tom Lane wrote: > Bharath Rupireddy writes: >> +1. It looks like a thinko from c532d15d. There's no code in between, >> so switching to oldcontext doesn't make sense. > > Agreed. > >> I think we also need to remove MemoryContextSwitchTo(oldcontext); at >> the end of BeginCopyTo in copyto.c, because we are not changing memory >> contexts in between. > > Hmm, I think it'd be a better idea to remove the one in the middle of > BeginCopyTo. The code after that is still doing setup of the cstate, > so the early switch back looks to me like trouble waiting to happen. Agreed. I see you have already pushed this patch on master (89f059bdf52); why not remove the MemoryContextSwitchTo in the middle of BeginCopyTo in this commit? -- Regards, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
On 1/3/22, 5:52 PM, "Kyotaro Horiguchi" wrote: > It seems to me "LSN" or just "location" is more confusing or > mysterious than "REDO LSN" for the average user. If we want to avoid > being technically too detailed, we would use just "start LSN=%X/%X, > end LSN=%X/%X". And it is equivalent to "WAL range=[%X/%X, %X/%X]".. My first instinct was that this should stay aligned with pg_controldata, but that would mean using "location=%X/%X, REDO location=%X/%X," which doesn't seem terribly descriptive. IIUC the "checkpoint location" is the LSN of the WAL record for the checkpoint, and the "checkpoint's REDO location" is the LSN where checkpoint creation began (i.e., what you must retain for crash recovery). My vote is for "start=%X/%X, end=%X/%X." Nathan
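The `%X/%X` notation both candidate wordings use splits a 64-bit WAL position into its high and low 32-bit halves, each printed in hex. A quick illustration of that formatting (a Python sketch for clarity; the function name is mine, not server code):

```python
def format_lsn(lsn: int) -> str:
    # PostgreSQL prints a 64-bit LSN as high32/low32, both in uppercase
    # hex with no zero-padding -- mirroring the server's %X/%X convention.
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)

print(format_lsn(0x16B37A0))         # a low address: "0/16B37A0"
print(format_lsn((2 << 32) | 0x10))  # "2/10"
```

So "start=%X/%X, end=%X/%X" would render as something like start=0/16B37A0, end=0/16B3850.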
Re: Document atthasmissing default optimization avoids verification table scan
On 9/24/21, 7:30 AM, "James Coleman" wrote: > When PG11 added the ability for ALTER TABLE ADD COLUMN to set a constant > default value without rewriting the table the doc changes did not note > how the new feature interplayed with ADD COLUMN DEFAULT NOT NULL. > Previously such a new column required a verification table scan to > ensure no values were null. That scan happens under an exclusive lock on > the table, so it can have a meaningful impact on database "accessible > uptime". I'm likely misunderstanding, but are you saying that adding a new column with a default value and a NOT NULL constraint used to require a verification scan? + Additionally adding a column with a constant default value avoids a + a table scan to verify no NULL values are present. Should this clarify that it's referring to NOT NULL constraints? Nathan
Re: pg_upgrade should truncate/remove its logs before running
On Wed, Jan 19, 2022 at 05:13:18PM +0900, Michael Paquier wrote: > On Tue, Jan 11, 2022 at 10:08:13PM -0600, Justin Pryzby wrote: > > I asked about that before. Right now, it'll exit(1) when mkdir fails. > > > > I had written a patch to allow "." by skipping mkdir (or allowing it to > > fail if > > errno == EEXIST), but it seems like an awfully bad idea to try to make that > > work with rmtree(). I still don't know if it even needs to be configurable. > - Add some sanity check about the path used, aka no parent reference > allowed and the output path should not be a direct parent of the > current working directory. I'm not sure these restrictions are needed ? + outputpath = make_absolute_path(log_opts.basedir); + if (path_contains_parent_reference(outputpath)) + pg_fatal("reference to parent directory not allowed\n"); Besides, you're passing the wrong path here. > I have noticed a couple of incorrect things in the docs, and some > other things. It is a bit late here, so I may have missed a couple of > things but I'll look at this stuff once again in a couple of days. > + pg_upgrade, and is be removed after a successful remove "be" > + if (mkdir(log_opts.basedir, S_IRWXU | S_IRWXG | S_IRWXO)) S_IRWXG | S_IRWXO are useless due to the umask, right ? Maybe use PG_DIR_MODE_OWNER ? > + if (mkdir(log_opts.basedir, S_IRWXU | S_IRWXG | S_IRWXO)) > + pg_fatal("could not create directory \"%s\": %m\n", > filename_path); > + if (mkdir(log_opts.dumpdir, S_IRWXU | S_IRWXG | S_IRWXO)) > + pg_fatal("could not create directory \"%s\": %m\n", > filename_path); > + if (mkdir(log_opts.logdir, S_IRWXU | S_IRWXG | S_IRWXO)) > + pg_fatal("could not create directory \"%s\": %m\n", > filename_path); You're printing the wrong var. filename_path is not initialized. -- Justin
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
"Bossart, Nathan" writes: > I think the other side of this is that we don't want checkpointing to > continually fail because of a noncritical failure. That could also > lead to problems down the road. Yeah, a persistent failure to complete checkpoints is very nasty. Your disk will soon fill with unrecyclable WAL. I don't see how that's better than a somewhat hypothetical performance issue. regards, tom lane
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
On 1/19/22, 11:08 AM, "Andres Freund" wrote: > On 2022-01-19 13:34:21 -0500, Tom Lane wrote: >> As far as the patch itself goes, I agree that failure to unlink >> is noncritical, because such a file would have no further effect >> and we can just ignore it. > > I don't agree. We iterate through the directory regularly on systems with > catalog changes + logical decoding. An ever increasing list of gunk will make > that more and more expensive. And I haven't heard a meaningful reason why we > would have map-* files that we can't remove. I think the other side of this is that we don't want checkpointing to continually fail because of a noncritical failure. That could also lead to problems down the road. > Ignoring failures like this just makes problems much harder to debug and they > tend to bite harder for it. If such noncritical failures happened regularly, the server logs would likely become filled with messages about it. Perhaps users might not notice for a while, but I don't think the proposed patch would make debugging excessively difficult. Nathan
Re: Replace uses of deprecated Python module distutils.sysconfig
I wrote: > Peter Eisentraut writes: >> On 19.01.22 01:18, Tom Lane wrote: >>> ... I'm tempted to resurrect the >>> idea of changing configure's probe order to "python3 python python2" >>> in the meantime, just so we can see how much of the buildfarm is ready >>> for that. >> This seems sensible in any case, given that we have quasi-committed to >> enforcing Python 3 soon. > Done. The early returns are not great: we have about half a dozen machines so far that are finding python3, and reporting sane-looking Python include paths, but not finding Python.h. They're all Linux-oid machines, so I suppose what is going on is that they have the base python3 package installed but not python3-dev or local equivalent. I want to leave that patch in place long enough so we can get a fairly full survey of which machines are OK and which are not, but I suppose I'll have to revert it tomorrow or so. We did promise the owners a month to adjust their configurations. regards, tom lane
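For reference, the replacement this thread's subject refers to: since Python 3.10, `distutils.sysconfig` is deprecated in favor of the stdlib `sysconfig` module, which also exposes the include path configure probes when looking for Python.h. A minimal illustration (module and function names are from CPython's documentation; the variable names are mine):

```python
import sysconfig

# Old, deprecated style:
#   from distutils.sysconfig import get_python_inc
#   include_dir = get_python_inc()
# Modern equivalent:
include_dir = sysconfig.get_path("include")    # where Python.h should live
libdir = sysconfig.get_config_var("LIBDIR")    # where libpython is installed
version = sysconfig.get_config_var("VERSION")  # e.g. "3.10"

# Note: a machine with only the base python3 package installed will
# happily report a valid-looking include_dir here even though the
# directory lacks Python.h -- the buildfarm failure mode described above.
print(include_dir, libdir, version)
```

That is why the failing animals report sane include paths yet cannot find Python.h: the path computation works without the dev package, the header does not.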
Re: missing indexes in indexlist with partitioned tables
On 2022-Jan-19, Arne Roland wrote: > > a) make sure that the submitted patch updates these test results so > > that the test pass [...] > > Just for the record: I did run the tests, but I did miss that the > commit of Tomas fix for fractional optimization is already on the > master. Please note that this is a very new test case from a patch > committed less than one week ago. Ah, apologies, I didn't realize that that test was so new. > If anything I obviously need practical help, on how > I can catch on recent changes quicker and without manual intervention. > I don't have a modified buildfarm animal running, that tries to apply > my patch and run regression tests for my patch on a daily basis. See src/tools/ci/README (for multi-platform testing of patches on several platforms) and http://commitfest.cputube.org/ -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
Re: do only critical work during single-user vacuum?
On Wed, Jan 19, 2022 at 09:11:48PM +, Bossart, Nathan wrote: > I personally think VACUUM FOR WRAPAROUND is the best of the options > provided thus far. Could you avoid introducing a new grammar pattern in VACUUM? Any new option had better be within the parenthesized part as it is extensible at will with its set of DefElems. -- Michael
Re: missing indexes in indexlist with partitioned tables
On Wed, Jan 19, 2022 at 1:50 PM Arne Roland wrote: > Hi, > > > I came up with a slightly more intuitive way to force the different > outcome for the fractional test, that does hardly change anything. > > > I'm not sure, whether the differentiation between fract_x and fract_t is > worth it, since there shouldn't be any difference, but as mentioned before > I added it for potential future clarity. > > > Thanks for your feedback again! > > > Regards > > Arne > > > -- > *From:* Arne Roland > *Sent:* Wednesday, January 19, 2022 10:13:55 PM > *To:* Alvaro Herrera; Julien Rouhaud > *Cc:* pgsql-hackers > *Subject:* Re: missing indexes in indexlist with partitioned tables > > > Hi! > > > From: Alvaro Herrera > > [...] > > Hmm, these plan changes look valid to me. A left self-join using the > > primary key column? That looks optimizable all right. > > [...] > > What I still don't know is whether this patch is actually desirable or > > not. If the only cases it affects is self-joins, is there much actual > > value? > > This is not really about self joins. That was just the most simple > example, because otherwise we need a second table. > It's unique, it's not relevant whether it's the same table. In most cases > it won't. I was talking about join pruning. > > > I suspect that the author of partition-wise joins would want to change > > these queries so that whatever was being tested by these self-joins is > > tested by some other means (maybe just create an identical partitioned > > table via CREATE TABLE fract_t2 ... ; INSERT INTO fract_t2 SELECT FROM > > fract_t). But at the same time, the author of this patch should > > Your suggestion doesn't work, because with my patch we solve that case as > well. We solve the general join pruning case. If we make the index > non-unique however, we should be able to test the fractional case the > same way. 
> > > b) add some test cases to verify that his desired > > behavior is tested somewhere, not just in a test case that's > > incidentally broken by his patch. > > My patch already includes such a test, look at @@ -90,6 +90,13 @@ > src/test/regress/sql/partition_join.sql > Since the selfjoin part was confusing to you, it might be worthwhile to do > test that with two different tables. While I see no need to test in that > way, I will adjust the patch so. Just to make it more clear for people > looking at those tests in the future. > > a) make > > sure that the submitted patch updates these test results so that the > > test pass [...] > > Just for the record: I did run the tests, but I did miss that the commit > of Tomas fix for fractional optimization is already on the master. Please > note that this is a very new test case from a patch committed less than one > week ago. > > I'm glad Julien Rouhaud pointed out, that Tomas patch is committed it by > now. That was very helpful to me, as I can now integrate the two tests. > > @Álvaro Herrera: > If you want to help me, please don't put forward an abstract list of > responsibilities. If anything I obviously need practical help, on how I can > catch on recent changes quicker and without manual intervention. I don't > have a modified buildfarm animal running, that tries to apply my patch and > run regression tests for my patch on a daily basis. > Is there a simple way for me to check for that? > > I will probably integrate those two tests, since they can work of similar > structures without need to recreate the tables again and again. I have > clear understanding how that new test works. I have to attend a few calls > now, but I should be able to update the tests later. 
> > Regards
> Arne

Hi,

-    if (indexRelation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX)
+    if (inhparent && (!index->indisunique || indexRelation->rd_rel->relkind != RELKIND_PARTITIONED_INDEX))

The check on RELKIND_PARTITIONED_INDEX seems to negate what the comment above says:

+     * Don't add partitioned indexes to the indexlist

Cheers
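To make the reviewer's point concrete, the patched condition can be transcribed into a small truth table (a Python stand-in with hypothetical boolean parameters; `skipped` returning True means the index is left out of the indexlist):

```python
def skipped(inhparent: bool, indisunique: bool, is_partitioned_index: bool) -> bool:
    # Transcription of the quoted C condition:
    #   inhparent && (!index->indisunique ||
    #                 relkind != RELKIND_PARTITIONED_INDEX)
    return inhparent and (not indisunique or not is_partitioned_index)

# For a partitioned parent (inhparent=True), the only indexes *kept*
# are unique partitioned indexes -- i.e. partitioned indexes are the
# ones added, which is what appears to contradict the quoted comment.
print(skipped(True, True, True))    # unique partitioned index: kept
print(skipped(True, False, True))   # non-unique partitioned index: skipped
print(skipped(True, True, False))   # unique non-partitioned index: skipped
```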
Re: Schema variables - new implementation for Postgres 15
Hi, On Wed, Jan 19, 2022 at 09:09:41PM +0100, Pavel Stehule wrote: > On Wed, Jan 19, 2022 at 9:01 AM, Julien Rouhaud wrote: > > RAISE NOTICE should use local variables, and in this case it is a question whether we > should raise a warning. There are two possible analogies - we can see session > variables as global variables, and then the warning should not be raised, > or we can see the relation between session variables and plpgsql variables > as similar, with plpgsql variables having higher priority, and then the > warning should be raised. If we want to ensure that the new session variable > doesn't break code, then session variables should have lower priority than > plpgsql variables too. And because the plpgsql protection against collisions > cannot be used here, I prefer raising the warning. Ah that's indeed a good point. I agree, they're from a different part of the system so they should be treated as different things, and thus raise a warning. It's consistent with the chosen conservative approach anyway. > PLpgSQL assignment should never be in collision with session variables Agreed. > > > > > =# DO $$ > BEGIN > v := 'test'; > RAISE NOTICE 'v: %', v; > END; > $$ LANGUAGE plpgsql; > ERROR: 42601: "v" is not a known variable > LINE 3: v := 'test'; > > But the RAISE NOTICE does see the session variable (which should be the > correct > behavior I think), so the warning should have been raised for this > instruction > (and in that case the message is incorrect, as it's not shadowing a > column). > > In this case I can detect the node type, and I can identify an external param node, but I cannot detect whether this code was executed from PLpgSQL or from somewhere else > > So I can modify the warning text to something like Yes, that's what I had in mind too. > DETAIL: The identifier can be a column reference, query parameter, or > session variable reference. > HINT: The column reference and query parameter are preferred over a > session variable reference.
> > I cannot use the term "plpgsql variable" because I cannot ensure the validity > of this message > > Maybe it is better not to talk about the source of this issue, and just talk > about the result - so the warning text should be just > > MESSAGE: "session variable \"\" is shadowed" > DETAIL: "session variables can be shadowed by columns, routine's variables > and routine's arguments with the same name" > > Is it better? I clearly prefer the 2nd version.
Re: missing indexes in indexlist with partitioned tables
Hi, I came up with a slightly more intuitive way to force the different outcome for the fractional test, that does hardly change anything. I'm not sure, whether the differentiation between fract_x and fract_t is worth it, since there shouldn't be any difference, but as mentioned before I added it for potential future clarity. Thanks for your feedback again! Regards Arne From: Arne Roland Sent: Wednesday, January 19, 2022 10:13:55 PM To: Alvaro Herrera; Julien Rouhaud Cc: pgsql-hackers Subject: Re: missing indexes in indexlist with partitioned tables Hi! > From: Alvaro Herrera > [...] > Hmm, these plan changes look valid to me. A left self-join using the > primary key column? That looks optimizable all right. > [...] > What I still don't know is whether this patch is actually desirable or > not. If the only cases it affects is self-joins, is there much actual > value? This is not really about self joins. That was just the most simple example, because otherwise we need a second table. It's unique, it's not relevant whether it's the same table. In most cases it won't. I was talking about join pruning. > I suspect that the author of partition-wise joins would want to change > these queries so that whatever was being tested by these self-joins is > tested by some other means (maybe just create an identical partitioned > table via CREATE TABLE fract_t2 ... ; INSERT INTO fract_t2 SELECT FROM > fract_t). But at the same time, the author of this patch should Your suggestion doesn't work, because with my patch we solve that case as well. We solve the general join pruning case. If we make the index non-unique however, we should be able to test the fractional case the same way. > b) add some test cases to verify that his desired > behavior is tested somewhere, not just in a test case that's > incidentally broken by his patch. 
My patch already includes such a test, look at @@ -90,6 +90,13 @@ src/test/regress/sql/partition_join.sql. Since the selfjoin part was confusing to you, it might be worthwhile to test that with two different tables. While I see no need to test in that way, I will adjust the patch accordingly, just to make it clearer for people looking at those tests in the future. > a) make sure that the submitted patch updates these test results so that the tests pass [...] Just for the record: I did run the tests, but I did miss that the commit of Tomas' fix for fractional optimization is already on master. Please note that this is a very new test case from a patch committed less than one week ago. I'm glad Julien Rouhaud pointed out that Tomas' patch is committed by now. That was very helpful to me, as I can now integrate the two tests. @Álvaro Herrera: If you want to help me, please don't put forward an abstract list of responsibilities. If anything, I obviously need practical help on how I can catch up on recent changes quicker and without manual intervention. I don't have a modified buildfarm animal running that tries to apply my patch and run regression tests for my patch on a daily basis. Is there a simple way for me to check for that? I will probably integrate those two tests, since they can work off similar structures without the need to recreate the tables again and again. I have a clear understanding of how that new test works. I have to attend a few calls now, but I should be able to update the tests later. Regards Arne

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 0ef70ad7f1..fe50919637 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3504,7 +3504,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 	Assert(list_length(exprlist) == list_length(oprlist));
 
 	/* Short-circuit if no indexes... */
-	if (rel->indexlist == NIL)
+	if (rel->indexlist == NIL && rel->partIndexlist == NIL)
 		return false;
 
 	/*
@@ -3549,7 +3549,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 		return false;
 
 	/* Examine each index of the relation ... */
-	foreach(ic, rel->indexlist)
+	foreach(ic, list_concat(rel->indexlist, rel->partIndexlist))
 	{
 		IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
 		int			c;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 337f470d58..b04b3f88ad 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -23,6 +23,7 @@
 #include "postgres.h"
 
 #include "nodes/nodeFuncs.h"
+#include "nodes/nodes.h"
 #include "optimizer/clauses.h"
 #include "optimizer/joininfo.h"
 #include "optimizer/optimizer.h"
@@ -598,7 +599,7 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
 		 */
 		ListCell   *lc;
 
-		foreach(lc, rel->indexlist)
+		foreach(lc, list_concat(rel->indexlist, rel->partIndexlist))
 		{
 			IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
diff --git
Re: refactoring basebackup.c
On Wed, Jan 19, 2022 at 7:16 AM Dipesh Pandit wrote: > I have done initial testing and > working on updating the test coverage. I spent some time thinking about test coverage for the server-side backup code today and came up with the attached (v12-0003). It does an end-to-end test that exercises server-side backup and server-side compression and then untars the backup and validity-checks it using pg_verifybackup. In addition to being good test coverage for these patches, it also plugs a gap in the test coverage of pg_verifybackup, which currently has no test case that untars a tar-format backup and then verifies the result. I couldn't figure out a way to do that back at the time I was working on pg_verifybackup, because I didn't think we had any existing precedent for using 'tar' from a TAP test. But it was pointed out to me that we do, so I used that as the model for this test. It should be easy to generalize this test case to test lz4 and zstd as well, I think. But I guess we'll still need something different to test what your patch is doing. -- Robert Haas EDB: http://www.enterprisedb.com

Attachments: v12-0001-Support-base-backup-targets.patch, v12-0002-Server-side-gzip-compression.patch, v12-0003-Test-server-side-backup-backup-compression-and-p.patch
Re: Replace uses of deprecated Python module distutils.sysconfig
On Wed, Jan 19, 2022 at 9:40 PM Tom Lane wrote: > > Done. (I couldn't find any equivalent logic in the MSVC build scripts > though; is there something I missed?) > > MSVC will use the path configured in src\tools\msvc\config.pl > $config->{"python"}, there is no ambiguity. Regards, Juan José Santamaría Flecha
Re: missing indexes in indexlist with partitioned tables
Hi! > From: Alvaro Herrera > [...] > Hmm, these plan changes look valid to me. A left self-join using the > primary key column? That looks optimizable all right. > [...] > What I still don't know is whether this patch is actually desirable or > not. If the only cases it affects is self-joins, is there much actual > value? This is not really about self joins. That was just the most simple example, because otherwise we need a second table. It's unique, it's not relevant whether it's the same table. In most cases it won't. I was talking about join pruning. > I suspect that the author of partition-wise joins would want to change > these queries so that whatever was being tested by these self-joins is > tested by some other means (maybe just create an identical partitioned > table via CREATE TABLE fract_t2 ... ; INSERT INTO fract_t2 SELECT FROM > fract_t). But at the same time, the author of this patch should Your suggestion doesn't work, because with my patch we solve that case as well. We solve the general join pruning case. If we make the index non-unique however, we should be able to test the fractional case the same way. > b) add some test cases to verify that his desired > behavior is tested somewhere, not just in a test case that's > incidentally broken by his patch. My patch already includes such a test, look at @@ -90,6 +90,13 @@ src/test/regress/sql/partition_join.sql Since the selfjoin part was confusing to you, it might be worthwhile to do test that with two different tables. While I see no need to test in that way, I will adjust the patch so. Just to make it more clear for people looking at those tests in the future. a) make > sure that the submitted patch updates these test results so that the > test pass [...] Just for the record: I did run the tests, but I did miss that the commit of Tomas fix for fractional optimization is already on the master. Please note that this is a very new test case from a patch committed less than one week ago. 
I'm glad Julien Rouhaud pointed out that Tomas' patch is committed by now. That was very helpful to me, as I can now integrate the two tests. @Álvaro Herrera: If you want to help me, please don't put forward an abstract list of responsibilities. If anything, I obviously need practical help on how I can catch up on recent changes quicker and without manual intervention. I don't have a modified buildfarm animal running that tries to apply my patch and run regression tests for my patch on a daily basis. Is there a simple way for me to check for that? I will probably integrate those two tests, since they can work off similar structures without the need to recreate the tables again and again. I have a clear understanding of how that new test works. I have to attend a few calls now, but I should be able to update the tests later. Regards Arne
Re: do only critical work during single-user vacuum?
On 1/19/22, 11:15 AM, "John Naylor" wrote: > This seems to be the motivating reason for wanting new configurability > on the server side. In any case, new knobs are out of scope for this > thread. If the use case is compelling enough, may I suggest starting a > new thread? Sure. Perhaps the new top-level command will use these new options someday. > Regarding the thread subject, I've been playing with the grammar, and > found it's quite easy to have > > VACUUM FOR WRAPAROUND > or > VACUUM FOR EMERGENCY > > since FOR is a reserved word (and following that can be an IDENT plus > a strcmp check) and cannot conflict with table names. This sounds a > bit more natural than VACUUM LIMIT. Opinions? I personally think VACUUM FOR WRAPAROUND is the best of the options provided thus far. Nathan
Re: Adding CI to our tree
Andres Freund writes: > I'm doubtful that it's good that we use BAIL_OUT so liberally, because it > prevents further tests from running (i.e. if 001 bails, 002+ doesn't run), > which doesn't generally seem like the right thing to do after a single test > fails. But that's really independent of the fix you make here. Agreed, that's a separate question. It does seem like "stop this script and move to the next one" would be better behavior, though. > I'd maybe do s/later/in the END block/ or such, so that one knows where to > look. Took me a minute to find it again. OK. regards, tom lane
Re: Replace uses of deprecated Python module distutils.sysconfig
Peter Eisentraut writes: > On 19.01.22 01:18, Tom Lane wrote: >> ... I'm tempted to resurrect the >> idea of changing configure's probe order to "python3 python python2" >> in the meantime, just so we can see how much of the buildfarm is ready >> for that. > This seems sensible in any case, given that we have quasi-committed to > enforcing Python 3 soon. Done. (I couldn't find any equivalent logic in the MSVC build scripts though; is there something I missed?) regards, tom lane
Re: Fix BUG #17335: Duplicate result rows in Gather node
On Fri, 31 Dec 2021 at 00:14, Yura Sokolov wrote: > Suggested quick (and valid) fix in the patch attached: > - If Append has single child, then copy its parallel awareness. I've been looking at this and I've gone through changing my mind about what's the right fix quite a number of times. My current thoughts are that I don't really like the fact that we can have plans in the following shape:

Finalize Aggregate
  ->  Gather
        Workers Planned: 1
        ->  Partial Aggregate
              ->  Parallel Hash Left Join
                    Hash Cond: (gather_append_1.fk = gather_append_2.fk)
                    ->  Index Scan using gather_append_1_ix on gather_append_1
                          Index Cond: (f = true)
                    ->  Parallel Hash
                          ->  Parallel Seq Scan on gather_append_2

It's only made safe by the fact that Gather will only use 1 worker. To me, it just seems too fragile to assume that's always going to be the case. I feel like this fix just relies on the fact that create_gather_path() and create_gather_merge_path() do "pathnode->num_workers = subpath->parallel_workers;". If someone decided that was to work a different way, then we risk this breaking again. Additionally, today we have Gather and GatherMerge, but we may one day end up with more node types that gather results from parallel workers, or even a completely different way of executing plans. I think a safer way to fix this is to just not remove the Append/MergeAppend node if the parallel_aware flags of the only child and the Append/MergeAppend don't match. I've done that in the attached. I believe the code at the end of add_paths_to_append_rel() can remain as is. David

Attachment: dont_remove_singlechild_appends_with_mismatching_parallel_awareness.patch
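David's fix amounts to one extra condition in the single-child elision logic: only drop an Append/MergeAppend when its parallel_aware flag matches its only child's. A sketch of that rule, in Python as a stand-in for the planner's C code (the structure and function names here are mine, not PostgreSQL's):

```python
from dataclasses import dataclass, field

@dataclass
class PlanNode:
    kind: str                       # e.g. "Append", "SeqScan"
    parallel_aware: bool = False
    children: list = field(default_factory=list)

def elide_single_child_append(node: PlanNode) -> PlanNode:
    # Keep the Append/MergeAppend unless it has exactly one child whose
    # parallel_aware flag matches. A mismatch means the Append is doing
    # real work (e.g. giving run-once semantics above a parallel-aware
    # child), so removing it could duplicate rows across workers.
    if (node.kind in ("Append", "MergeAppend")
            and len(node.children) == 1
            and node.children[0].parallel_aware == node.parallel_aware):
        return node.children[0]
    return node
```

With a non-parallel-aware Append over a parallel-aware scan (the bug's shape), the node is retained; with matching flags the child is promoted as before.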
Re: Adding CI to our tree
Hi, On 2022-01-19 15:05:44 -0500, Tom Lane wrote: > And the cause of that is obvious: Cluster::start thinks that if "pg_ctl > start" failed, there couldn't possibly be a postmaster running. So it > doesn't bother to update self->_pid, and then the END hook thinks there is > nothing to do. Ah, right. I'm doubtful that it's good that we use BAIL_OUT so liberally, because it prevents further tests from running (i.e. if 001 bails, 002+ doesn't run), which doesn't generally seem like the right thing to do after a single test fails. But that's really independent of the fix you make here. > The attached simple fix gets rid of this problem. Any objections? Nope, sounds like a plan.

> diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
> index 7af0f8db13..fd0738d12d 100644
> --- a/src/test/perl/PostgreSQL/Test/Cluster.pm
> +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
> @@ -845,6 +845,11 @@ sub start
>  	{
>  		print "# pg_ctl start failed; logfile:\n";
>  		print PostgreSQL::Test::Utils::slurp_file($self->logfile);
> +
> +		# pg_ctl could have timed out, so check to see if there's a pid file;
> +		# without this, we fail to shut down the new postmaster later.
> +		$self->_update_pid(-1);

I'd maybe do s/later/in the END block/ or such, so that one knows where to look. Took me a minute to find it again. Greetings, Andres Freund
Re: Schema variables - new implementation for Postgres 15
On Wed, Jan 19, 2022 at 9:01 AM, Julien Rouhaud wrote: > Hi, > > On Tue, Jan 18, 2022 at 10:01:01PM +0100, Pavel Stehule wrote: > > On Fri, Jan 14, 2022 at 3:44 AM, Julien Rouhaud > wrote: > > > > > > > > =# let myvariable = 'AA'; > > > LET > > > > > > =# select 'AA' collate "en-x-icu" < myvariable; > > > ?column? > > > -- > > > f > > > (1 row) > > > > > > =# select 'AA' collate "en-x-icu" < myvariable collate mycollation; > > > ERROR: 42P21: collation mismatch between explicit collations "en-x-icu" > > > and "mycollation" > > > LINE 1: select 'AA' collate "en-x-icu" < myvariable collate mycollat... > > > > > > > What do you expect? I don't understand collating well, but it looks > > correct. Minimally the tables have the same behavior. > > Indeed, I actually didn't know that such objects' collations were implicit > and > could be overloaded without a problem as long as there's no conflict > between > all the explicit collations. So I agree that the current behavior is ok, > including correct handling of wanted conflicts: > > =# create variable var1 text collate "fr-x-icu"; > CREATE VARIABLE > > =# create variable var2 text collate "en-x-icu"; > CREATE VARIABLE > > =# let var1 = 'hoho'; > LET > > =# let var2 = 'hoho'; > LET > > =# select var1 < var2; > ERROR: 42P22: could not determine which collation to use for string > comparison > HINT: Use the COLLATE clause to set the collation explicitly. > > > Please, can you check the attached patches? > > All the issues I mentioned are fixed, thanks! > > Thank you for checking. > > I see a few problems with the other new features added though. The new > session_variables_ambiguity_warning GUC is called even in contexts where it > shouldn't apply.
For instance: > > =# set session_variables_ambiguity_warning = 1; > SET > > =# create variable v text; > CREATE VARIABLE > > =# DO $$ > DECLARE v text; > BEGIN > v := 'test'; > RAISE NOTICE 'v: %', v; > END; > $$ LANGUAGE plpgsql; > WARNING: 42702: session variable "v" is shadowed by column > LINE 1: v := 'test' > ^ > DETAIL: The identifier can be column reference or session variable > reference. > HINT: The column reference is preferred against session variable > reference. > QUERY: v := 'test' > > But this "v := 'test'" shouldn't be a substitute for a LET, and it indeed > doesn't work: > Yes, there are some mistakes (bugs). The PLpgSQL assignment as target should not see session variables, so the warning is nonsense there. RAISE NOTICE should use local variables, and in this case it is a question whether we should raise a warning. There are two possible analogies - we can see session variables as global variables, and then the warning should not be raised, or we can see the relation between session variables and plpgsql variables as similar, with plpgsql variables having higher priority, and then the warning should be raised. If we want to ensure that the new session variable doesn't break code, then session variables should have lower priority than plpgsql variables too. And because the plpgsql protection against collisions cannot be used here, I prefer raising the warning. PLpgSQL assignment should never be in collision with session variables > > =# DO $$ > BEGIN > v := 'test'; > RAISE NOTICE 'v: %', v; > END; > $$ LANGUAGE plpgsql; > ERROR: 42601: "v" is not a known variable > LINE 3: v := 'test'; > > But the RAISE NOTICE does see the session variable (which should be the > correct > behavior I think), so the warning should have been raised for this > instruction > (and in that case the message is incorrect, as it's not shadowing a > column).
In this case I can detect the node type, and I can identify an external param node, but I cannot detect whether this code was executed from PL/pgSQL or from somewhere else. So I can modify the warning text to something like DETAIL: The identifier can be a column reference, a query parameter, or a session variable reference. HINT: The column reference and query parameter are preferred over the session variable reference. I cannot use the term "plpgsql variable" because I cannot ensure the validity of this message. Maybe it is better not to talk about the source of this issue, and just talk about the result - so the warning text should be just MESSAGE: "session variable \"\" is shadowed" DETAIL: "session variables can be shadowed by columns, routine's variables and routine's arguments with the same name" Is that better?
Re: Adding CI to our tree
I wrote: > This test attempt revealed another problem too: the standby never > shut down, and thus the calling "make" never quit, until I intervened > manually. I'm not sure why. I see that Cluster::promote uses > system_or_bail() to run "pg_ctl promote" ... could it be that > BAIL_OUT causes the normal script END hooks to not get run? > But it seems like we'd have noticed that long ago. I failed to reproduce any failure in the promote step, and I now think I was mistaken and it happened during the standby's initial start. I can reproduce that very easily by setting PGCTLTIMEOUT to a second or two; with fsync enabled, it takes the standby more than that to reach a consistent state. And the cause of that is obvious: Cluster::start thinks that if "pg_ctl start" failed, there couldn't possibly be a postmaster running. So it doesn't bother to update self->_pid, and then the END hook thinks there is nothing to do. Now, leaving an idle postmaster hanging around isn't a mortal sin, since it'll go away by itself shortly after the next cycle of testing does an "rm -rf" on its data directory. But it's ugly, and conceivably it could cause problems for later testing on machines with limited shmem or semaphore space. The attached simple fix gets rid of this problem. Any objections? regards, tom lane diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm index 7af0f8db13..fd0738d12d 100644 --- a/src/test/perl/PostgreSQL/Test/Cluster.pm +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm @@ -845,6 +845,11 @@ sub start { print "# pg_ctl start failed; logfile:\n"; print PostgreSQL::Test::Utils::slurp_file($self->logfile); + + # pg_ctl could have timed out, so check to see if there's a pid file; + # without this, we fail to shut down the new postmaster later. 
+ $self->_update_pid(-1); + BAIL_OUT("pg_ctl start failed") unless $params{fail_ok}; return 0; } @@ -1123,7 +1128,10 @@ archive_command = '$copy_command' return; } -# Internal method +# Internal method to update $self->{_pid} +# $is_running = 1: pid file should be there +# $is_running = 0: pid file should NOT be there +# $is_running = -1: we aren't sure sub _update_pid { my ($self, $is_running) = @_; @@ -1138,7 +1146,7 @@ sub _update_pid close $pidfile; # If we found a pidfile when there shouldn't be one, complain. - BAIL_OUT("postmaster.pid unexpectedly present") unless $is_running; + BAIL_OUT("postmaster.pid unexpectedly present") if $is_running == 0; return; } @@ -1146,7 +1154,7 @@ sub _update_pid print "# No postmaster PID for node \"$name\"\n"; # Complain if we expected to find a pidfile. - BAIL_OUT("postmaster.pid unexpectedly not present") if $is_running; + BAIL_OUT("postmaster.pid unexpectedly not present") if $is_running == 1; return; }
Re: do only critical work during single-user vacuum?
On 1/18/22, 9:47 PM, "Masahiko Sawada" wrote: > IIUC what we want to do here are two things: (1) select only old > tables and (2) set INDEX_CLEANUP = off, TRUNCATE = off, and FREEZE = > on. VACUUM LIMIT statement does both things at the same time. Although > I’m concerned a bit about its flexibility, it’s a reasonable solution. > > On the other hand, it’s probably also useful to do either one thing in > some cases. For instance, having a selector for (1) would be useful, > and having a new option like FAST_FREEZE for (2) would also be useful. > Given there is already a way for (2) (it does not default though), I > think it might also be a good start inventing something for (1). For > instance, a selector for VACUUM statement I came up with is: > > VACUUM (verbose on) TABLES WITH (min_xid_age = 16); > or > VACUUM (verbose on) TABLES WITH (min_age = failsafe_limit); > > We can expand it in the future to select tables by, for example, dead > tuple ratio, size, etc. > > It's a random thought but maybe worth considering. That's an interesting idea. A separate selector clause could also allow users to choose how they interacted (e.g., should the options be OR'd or AND'd). Nathan
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
On Wed, Jan 19, 2022 at 6:56 AM Robert Haas wrote: > I don't think I've said anything on this thread that is an attack on > you. I am getting pretty frustrated with the tenor of the discussion, > though. I feel like you're the one attacking me, and I don't like it. "Attack" is a strong word (much stronger than "defend"), and I don't think I'd use it to describe anything that has happened on this thread. All I said was that you misrepresented my views when you pounced on my use of the word "continuous". Which, honestly, I was very surprised by. > For all of that, I'm not even convinced that you're wrong. I just > think you might be wrong. I don't really know. I agree that I might be wrong, though of course I think that I'm probably correct. I value your input as a critical voice -- that's generally how we get really good designs. > However, there are cases where waiting, and only > waiting, gets the job done. If you're not willing to admit that those > cases exist, or you think they don't matter, then we disagree. They exist, of course. That's why I don't want to completely eliminate the idea of waiting for a cleanup lock. Rather, I want to change the design to recognize that that's an extreme measure, that should be delayed for as long as possible. There are many ways that the problem could naturally resolve itself. Waiting for a cleanup lock after only 50 million XIDs (the vacuum_freeze_min_age default) is like performing brain surgery to treat somebody with a headache (at least with the infrastructure from the earlier patches in place). It's not impossible that "surgery" could help, in theory (could be a tumor, better to catch these things early!), but that fact alone can hardly justify such a drastic measure. That doesn't mean that brain surgery isn't ever appropriate, of course. It should be delayed until it starts to become obvious that it's really necessary (but before it really is too late). 
> If you > admit that they exist and think they matter but believe that there's > some reason why increasing FreezeLimit can't cause any damage, then > either (a) you have a good reason for that belief which I have thus > far been unable to understand or (b) you're more optimistic about the > proposed change than can be entirely justified. I don't deny that it's just about possible that the changes that I'm thinking of could make the situation worse in some cases, but I think that the overwhelming likelihood is that things will be improved across the board. Consider the age of the tables from BenchmarkSQL, with the patch series:

     relname      │    age     │ mxid_age
──────────────────┼────────────┼──────────
 bmsql_district   │        657 │        0
 bmsql_warehouse  │        696 │        0
 bmsql_item       │  1,371,978 │        0
 bmsql_config     │  1,372,061 │        0
 bmsql_new_order  │  3,754,163 │        0
 bmsql_history    │ 11,545,940 │        0
 bmsql_order_line │ 23,095,678 │        0
 bmsql_oorder     │ 40,653,743 │        0
 bmsql_customer   │ 51,371,610 │        0
 bmsql_stock      │ 51,371,610 │        0
(10 rows)

We see significant "natural variation" here, unlike HEAD, where the age of all tables is exactly the same at all times, or close to it (incidentally, this leads to the largest tables all being anti-wraparound VACUUMed at the same time). There is a kind of natural ebb and flow for each table over time, as relfrozenxid is advanced, due in part to workload characteristics. Less than half of all XIDs will ever modify the two largest tables, for example, and so autovacuum should probably never be launched because of the age of either table (barring some change in workload conditions, perhaps). As I've said a few times now, XIDs are generally "the wrong unit", except when needed as a backstop against wraparound failure. The natural variation that I see contributes to my optimism. A situation where we cannot get a cleanup lock may well resolve itself, for many reasons, that are hard to precisely nail down but are nevertheless very real.
The vacuum_freeze_min_age design (particularly within an aggressive VACUUM) is needlessly rigid, probably just because the assumption before now has always been that we can only advance relfrozenxid in an aggressive VACUUM (it might happen in a non-aggressive VACUUM if we get very lucky, which cannot be accounted for). Because it is rigid, it is brittle. Because it is brittle, it will (on a long enough timeline, for a susceptible workload) actually break. > On the other hand if that user is going to close that > cursor after 10 minutes and open a new one in the same place 10 > seconds later, the best thing to do is to keep FreezeLimit as low as > possible, because the first time we wait for the pin to be released > we're guaranteed to advance relfrozenxid within 10 minutes, whereas if > we don't do that we may keep missing the brief windows in which no > cursor is held for a very long time. But we have absolutely no way of >
Re: Time to increase hash_mem_multiplier default?
On Sun, Jan 16, 2022 at 7:28 PM Peter Geoghegan wrote: > > The current hash_mem_multiplier default is 1.0, which is a fairly > conservative default: it preserves the historic behavior, which is > that hash-based executor nodes receive the same work_mem budget as > sort-based nodes. I propose that the default be increased to 2.0 for > Postgres 15. I don't have anything really profound to say here, but in the last year I did on a couple occasions recommend clients to raise hash_mem_multiplier to 2.0 to fix performance problems. During this cycle, we also got a small speedup in the external sorting code. Also, if the "generation context" idea gets traction, that might be another reason to consider differentiating the mem settings. -- John Naylor EDB: http://www.enterprisedb.com
Re: speed up text_position() for utf-8
I wrote: > 0001 puts the main implementation of pg_utf_mblen() into an inline > function and uses this in pg_mblen(). This is somewhat faster in the > strpos tests, so that gives some measure of the speedup expected for > other callers. Text search seems to call this a lot, so this might > have noticeable benefit. > > 0002 refactors text_position_get_match_pos() to use > pg_mbstrlen_with_len(). This itself is significantly faster when > combined with 0001, likely because the latter can inline the call to > pg_mblen(). The intention is to speed up more than just text_position. > > 0003 explicitly specializes for the inline version of pg_utf_mblen() > into pg_mbstrlen_with_len(), but turns out to be almost as slow as > master for ascii. It doesn't help if I undo the previous change in > pg_mblen(), and I haven't investigated why yet. > > 0002 looks good now, but the experience with 0003 makes me hesitant to > propose this seriously until I can figure out what's going on there. > > The test is as earlier, a worst-case substring search, times in milliseconds.
>
>  patch  | no match | ascii | multibyte
> --------+----------+-------+-----------
>  PG11   |     1220 |  1220 |      1150
>  master |      385 |  2420 |      1980
>  0001   |      390 |  2180 |      1670
>  0002   |      389 |  1330 |      1100
>  0003   |      391 |  2100 |      1360

I tried this test on a newer CPU, and 0003 had no regression. Both systems used gcc 11.2. Rather than try to figure out why an experiment had unexpected behavior, I plan to test 0001 and 0002 on a couple different compilers/architectures and call it a day. It's also worth noting that 0002 by itself seemed to be decently faster on the newer machine, but not as fast as 0001 and 0002 together. Looking at the assembly, pg_mblen is inlined into pg_mbstrlen_[with_len] and pg_mbcliplen, so the specialization for utf-8 in 0001 would be inlined in the other 3 as well. That's only a few bytes, so I think it's fine. -- John Naylor EDB: http://www.enterprisedb.com
Re: do only critical work during single-user vacuum?
On Wed, Jan 19, 2022 at 12:46 AM Masahiko Sawada wrote: > > On Fri, Jan 14, 2022 at 7:04 AM Bossart, Nathan wrote: > > > > I guess I'm ultimately imagining the new options as replacing the > > vacuumdb implementation. IOW vacuumdb would just use MIN_(M)XID_AGE > > behind the scenes (as would a new top-level command). > > I had the same idea. This seems to be the motivating reason for wanting new configurability on the server side. In any case, new knobs are out of scope for this thread. If the use case is compelling enough, may I suggest starting a new thread? Regarding the thread subject, I've been playing with the grammar, and found it's quite easy to have VACUUM FOR WRAPAROUND or VACUUM FOR EMERGENCY since FOR is a reserved word (and following that can be an IDENT plus a strcmp check) and cannot conflict with table names. This sounds a bit more natural than VACUUM LIMIT. Opinions? -- John Naylor EDB: http://www.enterprisedb.com
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
Hi, On 2022-01-19 13:34:21 -0500, Tom Lane wrote: > Bharath Rupireddy writes: > > On Sat, Jan 15, 2022 at 2:59 PM Julien Rouhaud wrote: > >> Maybe I'm missing something but wouldn't > >> https://commitfest.postgresql.org/36/3448/ better solve the problem? > > > The error can cause the new background process proposed there in that > > thread to restart, which is again costly. Since we have LOG-only and > > continue behavior in CheckPointSnapBuild already, having the same > > behavior for CheckPointLogicalRewriteHeap helps a lot. > > [ stares at CheckPointLogicalRewriteHeap for awhile ... ] > > This code has got more problems than that. It took me awhile to > absorb it, but we don't actually care about the contents of any of > those files; all of the information is encoded in the file *names*. I'm not following - we *do* need the contents of the files? They're applied as-needed in ApplyLogicalMappingFile(). > (This strikes me as probably not a very efficient design, compared > to putting the same data into a single text file; but for now I'll > assume we're not up for a complete rewrite.) That being the case, > I wonder what it is we expect fsync'ing the surviving files to do > exactly. We should be fsync'ing the directory not the files > themselves, no? Fsyncing the directory doesn't guarantee anything about the contents of files. But, you're right, we need an fsync of the directory too. > Other things that seem poorly thought out: > > * Why is the check for "map-" prefix after, rather than before, > the lstat? It doesn't seem to matter much - there shouldn't be a meaningful amount of other files in there. > * Why is it okay to ignore lstat failure? Seems like we might > as well not even have the lstat. Yea, that seems odd, not sure why that ended up this way. I guess the aim might have been to tolerate random files we don't have permissions for or such? > * The sscanf on the file name would not notice trailing junk, > such as an editor backup marker. Is that okay? 
I don't really see a problem with it - there shouldn't be other files matching the pattern - but it couldn't hurt to check the pattern matches exhaustively. > As far as the patch itself goes, I agree that failure to unlink > is noncritical, because such a file would have no further effect > and we can just ignore it. I don't agree. We iterate through the directory regularly on systems with catalog changes + logical decoding. An ever increasing list of gunk will make that more and more expensive. And I haven't heard a meaningful reason why we would have map-* files that we can't remove. Ignoring failures like this just makes problems much harder to debug and they tend to bite harder for it. > I think I also agree that failure of the sscanf is noncritical, because the > implication of that is that the file name doesn't conform to our > expectations, which means it's basically just like the check that causes us > to ignore names not starting with "map-". (Actually, isn't the separate > check for "map-" useless, given that sscanf will make the equivalent check?) Well, this way only files starting with "map-" are expected to conform to a strict format, the rest is ignored? > Anyway, I think possibly we can drop the bottom half of the loop > (the part trying to fsync non-removed files) in favor of fsync'ing > the directory afterwards. Thoughts? I don't think that'd be correct. In short: We should add a directory fsync, I'm fine with improving the error checking, but the rest seems like a net-negative with no convincing reasoning. Greetings, Andres Freund
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
I wrote: > Anyway, I think possibly we can drop the bottom half of the loop > (the part trying to fsync non-removed files) in favor of fsync'ing > the directory afterwards. Thoughts? Oh, scratch that --- *this* loop doesn't care about the file contents, but other code does. However, don't we need a directory fsync too? Or is that handled someplace else? regards, tom lane
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
Bharath Rupireddy writes: > On Sat, Jan 15, 2022 at 2:59 PM Julien Rouhaud wrote: >> Maybe I'm missing something but wouldn't >> https://commitfest.postgresql.org/36/3448/ better solve the problem? > The error can cause the new background process proposed there in that > thread to restart, which is again costly. Since we have LOG-only and > continue behavior in CheckPointSnapBuild already, having the same > behavior for CheckPointLogicalRewriteHeap helps a lot. [ stares at CheckPointLogicalRewriteHeap for awhile ... ] This code has got more problems than that. It took me awhile to absorb it, but we don't actually care about the contents of any of those files; all of the information is encoded in the file *names*. (This strikes me as probably not a very efficient design, compared to putting the same data into a single text file; but for now I'll assume we're not up for a complete rewrite.) That being the case, I wonder what it is we expect fsync'ing the surviving files to do exactly. We should be fsync'ing the directory not the files themselves, no? Other things that seem poorly thought out: * Why is the check for "map-" prefix after, rather than before, the lstat? * Why is it okay to ignore lstat failure? Seems like we might as well not even have the lstat. * The sscanf on the file name would not notice trailing junk, such as an editor backup marker. Is that okay? As far as the patch itself goes, I agree that failure to unlink is noncritical, because such a file would have no further effect and we can just ignore it. I think I also agree that failure of the sscanf is noncritical, because the implication of that is that the file name doesn't conform to our expectations, which means it's basically just like the check that causes us to ignore names not starting with "map-". (Actually, isn't the separate check for "map-" useless, given that sscanf will make the equivalent check?) 
I started out wondering why the patch didn't also change the loop's other ERROR conditions to LOG. But we do want to ERROR if we're unable to sync transient state down to disk, and that is what the other steps (think they) are doing. It might be worth a comment to point that out though, before someone decides that if these errors are just LOG level then the others can be too. Anyway, I think possibly we can drop the bottom half of the loop (the part trying to fsync non-removed files) in favor of fsync'ing the directory afterwards. Thoughts? regards, tom lane
Re: Compiling PostgreSQL for WIndows with 16kb blocksize
Hi, On January 19, 2022 9:07:50 AM PST, Yannick Collette wrote: >Hello, > >For test purposes I need to compile PostgreSQL 14.1 using a 16kb blocksize. > >CFLAGS="-D WINVER=0x0600 -D _WIN32_WINNT=0x0600" LIBS="-ladvapi32" > ./configure --host=x86_64-w64-mingw32 --with-blocksize=16 >--with-wal-blocksize=16 --with-openssl --with-libxml >--prefix=/c/postgresql/pg14/ 2>&1 | tee configure.log > >Below is the beginning of my configure.log, with no errors: make and make >install ok also. > >configure: loading site script /etc/config.site >checking build system type... x86_64-pc-msys >checking host system type... x86_64-w64-mingw32 >checking which template to use... win32 >checking whether NLS is wanted... no >checking for default port number... 5432 >checking for block size... 16kB >checking for segment size... 1GB >checking for WAL block size... 16kB >checking for x86_64-w64-mingw32-gcc... x86_64-w64-mingw32-gcc >checking whether the C compiler works... yes >... > >DB created successfully using initdb. > >Unfortunately my blocksize is still 8kb when checking in DB. > >postgres=# show block_size; > block_size > > 8192 >(1 row) > > >postgres=# select version(); > version >- > PostgreSQL 14.1 on x86_64-w64-mingw32, compiled by >x86_64-w64-mingw32-gcc.exe (Rev9, Built by MSYS2 project) 10.2.0, 64-bit >(1 row) > >Is there anything additional step I'm missing? Any chance this is from an older build? You might need to make clean, if you previously ./configured differently. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: slowest tap tests - split or accelerate?
Hi, On 2022-01-19 12:14:21 -0500, Tom Lane wrote: > No, it was largely the same as what you have here, I think. I dug > up my WIP patch and attach it below, just in case there's any ideas > worth borrowing. Heh, it does look quite similar. > + "cp -a \"%s\" \"%s/data\" > > \"%s/log/initdb.log\" 2>&1", > + pg_proto_datadir, > + temp_instance, > + outputdir); > + if (system(buf)) > + { > + fprintf(stderr, _("\n%s: cp failed\nExamine > %s/log/initdb.log for the reason.\nCommand was: %s\n"), progname, outputdir, > buf); > + exit(2); > + } Both ours have this. Unfortunately on windows cp doesn't natively exist. Although git does provide it. I tried a few things that appear to be natively available (time is best of three executions):

cp -a tmp_install\initdb_template t\                  (gnu cp from git)  670ms
xcopy.exe /E /Q tmp_install\initdb_template t\                           638ms
robocopy /e /NFL /NDL tmp_install\initdb_template t\                     575ms

So I guess we could use robocopy? That's shipped as part of windows starting in windows 10... xcopy has been there for longer, so I might just default to that. Greetings, Andres Freund
Re: Compiling PostgreSQL for WIndows with 16kb blocksize
Yannick Collette writes: > For test purposes I need to compile PostgreSQL 14.1 using a 16kb blocksize. > CFLAGS="-D WINVER=0x0600 -D _WIN32_WINNT=0x0600" LIBS="-ladvapi32" > ./configure --host=x86_64-w64-mingw32 --with-blocksize=16 > --with-wal-blocksize=16 --with-openssl --with-libxml > --prefix=/c/postgresql/pg14/ 2>&1 | tee configure.log I don't know anything about the Windows-specific details here, but the --with-blocksize option looks fine, and it works for me: regression=# show block_size; block_size 16384 (1 row) (Worth noting here is that a lot of our regression tests fail at non-default blocksizes, because plans change, or rows no longer need toasting, or the like.) > Unfortunately my blocksize is still 8kb when checking in DB. > postgres=# show block_size; > block_size > > 8192 > (1 row) I think you must be connecting to the wrong server. regards, tom lane
Re: slowest tap tests - split or accelerate?
Andres Freund writes: > On 2022-01-19 11:54:01 -0500, Tom Lane wrote: >> Me too ;-). As I remarked earlier, I'd tried this once before and >> gave up because it didn't seem to be winning much. But that was >> before we had so many initdb's triggered by TAP tests, I think. > What approach did you use? Do you have a better idea than generating > tmp_install/initdb_template? No, it was largely the same as what you have here, I think. I dug up my WIP patch and attach it below, just in case there's any ideas worth borrowing. >> (Note that all four runs have the "fsync = on" removed from >> 008_fsm_truncation.pl.) > I assume you're planning on comitting that? Yeah, will do that shortly. regards, tom lane diff --git a/src/Makefile.global.in b/src/Makefile.global.in index dc8a89af8e..a28a03ac72 100644 --- a/src/Makefile.global.in +++ b/src/Makefile.global.in @@ -332,6 +332,7 @@ ifeq ($(MAKELEVEL),0) rm -rf '$(abs_top_builddir)'/tmp_install $(MKDIR_P) '$(abs_top_builddir)'/tmp_install/log $(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1 + '$(abs_top_builddir)/tmp_install$(bindir)/initdb' -D '$(abs_top_builddir)/tmp_install/proto_pgdata' --no-clean --no-sync -A trust >'$(abs_top_builddir)'/tmp_install/log/initdb.log 2>&1 endif $(if $(EXTRA_INSTALL),for extra in $(EXTRA_INSTALL); do $(MAKE) -C '$(top_builddir)'/$$extra DESTDIR='$(abs_top_builddir)'/tmp_install install >>'$(abs_top_builddir)'/tmp_install/log/install.log || exit; done) endif @@ -355,7 +356,7 @@ $(if $(filter $(PORTNAME),darwin),DYLD_LIBRARY_PATH,$(if $(filter $(PORTNAME),ai endef define with_temp_install -PATH="$(abs_top_builddir)/tmp_install$(bindir):$$PATH" $(call add_to_path,$(ld_library_path_var),$(abs_top_builddir)/tmp_install$(libdir)) +PATH="$(abs_top_builddir)/tmp_install$(bindir):$$PATH" $(call add_to_path,$(ld_library_path_var),$(abs_top_builddir)/tmp_install$(libdir)) 
PG_PROTO_DATADIR='$(abs_top_builddir)/tmp_install/proto_pgdata' endef ifeq ($(enable_tap_tests),yes) diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm index f4fa600951..22c1a726c7 100644 --- a/src/test/perl/PostgresNode.pm +++ b/src/test/perl/PostgresNode.pm @@ -404,8 +404,23 @@ sub init mkdir $self->backup_dir; mkdir $self->archive_dir; - TestLib::system_or_bail('initdb', '-D', $pgdata, '-A', 'trust', '-N', - @{ $params{extra} }); + # If we're using default initdb parameters, and the top-level "make check" + # created a prototype data directory for us, just copy that. + if (!defined($params{extra}) && + defined($ENV{PG_PROTO_DATADIR}) && + -d $ENV{PG_PROTO_DATADIR}) + { + rmdir($pgdata); + RecursiveCopy::copypath($ENV{PG_PROTO_DATADIR}, $pgdata); + chmod(0700, $pgdata); + } + else + { + TestLib::system_or_bail('initdb', '-D', $pgdata, +'--no-clean', '--no-sync', '-A', 'trust', +@{ $params{extra} }); + } + TestLib::system_or_bail($ENV{PG_REGRESS}, '--config-auth', $pgdata); open my $conf, '>>', "$pgdata/postgresql.conf"; diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c index abb742b1ed..04d7fb910b 100644 --- a/src/test/regress/pg_regress.c +++ b/src/test/regress/pg_regress.c @@ -2214,6 +2214,8 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc if (temp_instance) { + char *pg_proto_datadir; + struct stat ppst; FILE *pg_conf; const char *env_wait; int wait_seconds; @@ -2243,20 +2245,43 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc if (!directory_exists(buf)) make_directory(buf); - /* initdb */ - header(_("initializing database system")); - snprintf(buf, sizeof(buf), - "\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync%s%s > \"%s/log/initdb.log\" 2>&1", - bindir ? bindir : "", - bindir ? "/" : "", - temp_instance, - debug ? " --debug" : "", - nolocale ? 
" --no-locale" : "", - outputdir); - if (system(buf)) + /* see if we should run initdb or just copy a prototype datadir */ + pg_proto_datadir = getenv("PG_PROTO_DATADIR"); + if (!nolocale && + pg_proto_datadir && + stat(pg_proto_datadir, ) == 0 && + S_ISDIR(ppst.st_mode)) { - fprintf(stderr, _("\n%s: initdb failed\nExamine %s/log/initdb.log for the reason.\nCommand was: %s\n"), progname, outputdir, buf); - exit(2); + /* copy prototype */ + header(_("copying prototype data directory")); + snprintf(buf, sizeof(buf), + "cp -a \"%s\" \"%s/data\" > \"%s/log/initdb.log\" 2>&1", + pg_proto_datadir, + temp_instance, + outputdir); + if (system(buf)) + { +fprintf(stderr, _("\n%s: cp failed\nExamine %s/log/initdb.log for the reason.\nCommand was: %s\n"), progname, outputdir, buf); +exit(2); + } + } + else + { + /* run initdb */ + header(_("initializing database system")); + snprintf(buf, sizeof(buf), +
Compiling PostgreSQL for WIndows with 16kb blocksize
Hello, For test purposes I need to compile PostgreSQL 14.1 using a 16kb blocksize. CFLAGS="-D WINVER=0x0600 -D _WIN32_WINNT=0x0600" LIBS="-ladvapi32" ./configure --host=x86_64-w64-mingw32 --with-blocksize=16 --with-wal-blocksize=16 --with-openssl --with-libxml --prefix=/c/postgresql/pg14/ 2>&1 | tee configure.log Below is the beginning of my configure.log, with no errors: make and make install ok also. configure: loading site script /etc/config.site checking build system type... x86_64-pc-msys checking host system type... x86_64-w64-mingw32 checking which template to use... win32 checking whether NLS is wanted... no checking for default port number... 5432 checking for block size... 16kB checking for segment size... 1GB checking for WAL block size... 16kB checking for x86_64-w64-mingw32-gcc... x86_64-w64-mingw32-gcc checking whether the C compiler works... yes ... DB created successfully using initdb. Unfortunately my blocksize is still 8kb when checking in DB. postgres=# show block_size; block_size 8192 (1 row) postgres=# select version(); version - PostgreSQL 14.1 on x86_64-w64-mingw32, compiled by x86_64-w64-mingw32-gcc.exe (Rev9, Built by MSYS2 project) 10.2.0, 64-bit (1 row) Is there any additional step I'm missing? Thanks in advance for your help! Yannick
Re: slowest tap tests - split or accelerate?
Hi, On 2022-01-19 11:54:01 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2022-01-18 13:40:40 -0800, Andres Freund wrote: > >> Maybe we really should do at least the most simplistic caching for > >> initdbs, by > >> doing one initdb as part of the creation of temp_install. Then > >> Cluster::init > >> would need logic to only use that if $params{extra} is empty. > > > I hacked this together. And the wins are bigger than I thought. > > Me too ;-). As I remarked earlier, I'd tried this once before and > gave up because it didn't seem to be winning much. But that was > before we had so many initdb's triggered by TAP tests, I think. What approach did you use? Do you have a better idea than generating tmp_install/initdb_template? I for a bit wondered whether initdb should do this internally instead. But it seemed more work than I wanted to tackle. The bit in the patch about generating initdb_template in Install.pm definitely needs to be made conditional, but I don't precisely know on what. The buildfarm just calls it as perl install.pl "$installdir > So this looks like it'll be a nice win for low-end hardware, too. Nice! > (Note that all four runs have the "fsync = on" removed from > 008_fsm_truncation.pl.) I assume you're planning on comitting that? Greetings, Andres Freund
Re: slowest tap tests - split or accelerate?
Andres Freund writes: > On 2022-01-18 13:40:40 -0800, Andres Freund wrote: >> Maybe we really should do at least the most simplistic caching for initdbs, >> by >> doing one initdb as part of the creation of temp_install. Then Cluster::init >> would need logic to only use that if $params{extra} is empty. > I hacked this together. And the wins are bigger than I thought. Me too ;-). As I remarked earlier, I'd tried this once before and gave up because it didn't seem to be winning much. But that was before we had so many initdb's triggered by TAP tests, I think. I tried this patch on florican's host, which seems mostly disk-bound when doing check-world. It barely gets any win from parallelism: $ time make -s check-world -j1 >/dev/null 3809.60 real 584.44 user 282.23 sys $ time make -s check-world -j2 >/dev/null 3789.90 real 610.60 user 289.60 sys Adding v2-0001-hack-use-template-initdb-in-tap-tests.patch: $ time make -s check-world -j1 >/dev/null 3193.46 real 221.32 user 226.11 sys $ time make -s check-world -j2 >/dev/null 3211.19 real 224.31 user 230.07 sys (Note that all four runs have the "fsync = on" removed from 008_fsm_truncation.pl.) So this looks like it'll be a nice win for low-end hardware, too. regards, tom lane
Re: Replace uses of deprecated Python module distutils.sysconfig
On 19.01.22 01:18, Tom Lane wrote: Anyway, based on these results, we might have better luck switching to sysconfig after we start forcing python3. I'm tempted to resurrect the idea of changing configure's probe order to "python3 python python2" in the meantime, just so we can see how much of the buildfarm is ready for that. This seems sensible in any case, given that we have quasi-committed to enforcing Python 3 soon.
Re: Remove redundant MemoryContextSwitchTo in BeginCopyFrom
Bharath Rupireddy writes: > +1. It looks like a thinko from c532d15d. There's no code in between, > so switching to oldcontext doesn't make sense. Agreed. > I think we also need to remove MemoryContextSwitchTo(oldcontext); at > the end of BeginCopyTo in copyto.c, because we are not changing memory > contexts in between. Hmm, I think it'd be a better idea to remove the one in the middle of BeginCopyTo. The code after that is still doing setup of the cstate, so the early switch back looks to me like trouble waiting to happen. regards, tom lane
Re: refactoring basebackup.c
On 1/18/22 8:12 PM, Jeevan Ladhe wrote: Similar to LZ4 server-side compression, I have also tried to add a ZSTD server-side compression in the attached patch. Thanks Jeevan. while testing found one scenario where the server is getting crash while performing pg_basebackup against server-compression=zstd for a huge data second time Steps to reproduce --PG sources ( apply v11-0001,v11-0001,v9-0001,v9-0002 , configure --with-lz4,--with-zstd, make/install, initdb, start server) --insert huge data (./pgbench -i -s 2000 postgres) --restart the server (./pg_ctl -D data restart) --pg_basebackup ( ./pg_basebackup -t server:/tmp/yc1 --server-compression=zstd -R -Xnone -n -N -l 'ccc' --no-estimate-size -v) --insert huge data (./pgbench -i -s 1000 postgres) --restart the server (./pg_ctl -D data restart) --run pg_basebackup again (./pg_basebackup -t server:/tmp/yc11 --server-compression=zstd -v -Xnone ) [edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/yc11 --server-compression=zstd -v -Xnone pg_basebackup: initiating base backup, waiting for checkpoint to complete 2022-01-19 21:23:26.508 IST [30219] LOG: checkpoint starting: force wait 2022-01-19 21:23:26.608 IST [30219] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 1 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.101 s; sync files=0, longest=0.000 s, average=0.000 s; distance=16369 kB, estimate=16369 kB pg_basebackup: checkpoint completed TRAP: FailedAssertion("len > 0 && len <= sink->bbs_buffer_length", File: "../../../src/include/replication/basebackup_sink.h", Line: 208, PID: 30226) postgres: walsender edb [local] sending backup "pg_basebackup base backup"(ExceptionalCondition+0x7a)[0x94ceca] postgres: walsender edb [local] sending backup "pg_basebackup base backup"[0x7b9a08] postgres: walsender edb [local] sending backup "pg_basebackup base backup"[0x7b9be2] postgres: walsender edb [local] sending backup "pg_basebackup base backup"[0x7b5b30] postgres: walsender edb [local] sending 
backup "pg_basebackup base backup"(SendBaseBackup+0x563)[0x7b7053] postgres: walsender edb [local] sending backup "pg_basebackup base backup"(exec_replication_command+0x961)[0x7c9a41] postgres: walsender edb [local] sending backup "pg_basebackup base backup"(PostgresMain+0x92f)[0x81ca3f] postgres: walsender edb [local] sending backup "pg_basebackup base backup"[0x48e430] postgres: walsender edb [local] sending backup "pg_basebackup base backup"(PostmasterMain+0xfd2)[0x785702] postgres: walsender edb [local] sending backup "pg_basebackup base backup"(main+0x1c6)[0x48fb96] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f63642c8555] postgres: walsender edb [local] sending backup "pg_basebackup base backup"[0x48feb5] pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. 2022-01-19 21:25:34.485 IST [30205] LOG: server process (PID 30226) was terminated by signal 6: Aborted 2022-01-19 21:25:34.485 IST [30205] DETAIL: Failed process was running: BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS, MANIFEST 'yes', TABLESPACE_MAP, TARGET 'server', TARGET_DETAIL '/tmp/yc11', COMPRESSION 'zstd') 2022-01-19 21:25:34.485 IST [30205] LOG: terminating any other active server processes [edb@centos7tushar bin]$ 2022-01-19 21:25:34.489 IST [30205] LOG: all server processes terminated; reinitializing 2022-01-19 21:25:34.536 IST [30228] LOG: database system was interrupted; last known up at 2022-01-19 21:23:26 IST 2022-01-19 21:25:34.669 IST [30228] LOG: database system was not properly shut down; automatic recovery in progress 2022-01-19 21:25:34.671 IST [30228] LOG: redo starts at 9/728 2022-01-19 21:25:34.671 IST [30228] LOG: invalid record length at 9/7000148: wanted 24, got 0 2022-01-19 21:25:34.671 IST [30228] LOG: redo done at 9/7000110 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s 2022-01-19 21:25:34.673 IST [30229] LOG: 
checkpoint starting: end-of-recovery immediate wait 2022-01-19 21:25:34.713 IST [30229] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.003 s, sync=0.001 s, total=0.041 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB 2022-01-19 21:25:34.718 IST [30205] LOG: database system is ready to accept connections Observation - if we change server-compression method to lz4 from zstd then it is NOT happening. [edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/ycc1 --server-compression=lz4 -v -Xnone pg_basebackup: initiating base backup, waiting for checkpoint to complete 2022-01-19 21:27:51.642 IST [30229] LOG: checkpoint starting: force wait 2022-01-19 21:27:51.687 IST [30229] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 1 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.046 s; sync
Re: Replace uses of deprecated Python module distutils.sysconfig
I wrote: > Anyway, based on these results, we might have better luck switching to > sysconfig after we start forcing python3. On the other hand, that answer is not back-patchable, and we surely need a back-patchable fix, because people will try to build the back branches against newer pythons. Based on the buildfarm results so far, the problem can be described as "some installations say /usr/local when they should have said /usr". I experimented with the attached delta patch and it fixes the problem on my Debian 9 image. (I don't know Python, so there may be a better way to do this.) We'd have to also bump the minimum 3.x version to 3.2, but that seems very unlikely to bother anyone. regards, tom lane diff --git a/config/python.m4 b/config/python.m4 index 8ca1eaa64b..c65356c6ac 100644 --- a/config/python.m4 +++ b/config/python.m4 @@ -56,13 +56,20 @@ AC_MSG_RESULT([$python_configdir]) AC_MSG_CHECKING([Python include directories]) python_includespec=`${PYTHON} -c " -import sysconfig -a = '-I' + sysconfig.get_path('include') -b = '-I' + sysconfig.get_path('platinclude') +import sysconfig, os.path +a = sysconfig.get_path('include') +b = sysconfig.get_path('platinclude') +# Some versions of sysconfig report '/usr/local/include' +# when they should have said '/usr/include' +if not os.path.exists(a + '/Python.h'): +aalt = a.replace('/usr/local/', '/usr/', 1) +if os.path.exists(aalt + '/Python.h'): +a = aalt +b = b.replace('/usr/local/', '/usr/', 1) if a == b: -print(a) +print('-I' + a) else: -print(a + ' ' + b)"` +print('-I' + a + ' -I' + b)"` if test "$PORTNAME" = win32 ; then python_includespec=`echo $python_includespec | sed 's,[[\]],/,g'` fi diff --git a/configure b/configure index 9c856cb1d5..f88db9467d 100755 --- a/configure +++ b/configure @@ -10370,13 +10370,20 @@ $as_echo "$python_configdir" >&6; } { $as_echo "$as_me:${as_lineno-$LINENO}: checking Python include directories" >&5 $as_echo_n "checking Python include directories... 
" >&6; } python_includespec=`${PYTHON} -c " -import sysconfig -a = '-I' + sysconfig.get_path('include') -b = '-I' + sysconfig.get_path('platinclude') +import sysconfig, os.path +a = sysconfig.get_path('include') +b = sysconfig.get_path('platinclude') +# Some versions of sysconfig report '/usr/local/include' +# when they should have said '/usr/include' +if not os.path.exists(a + '/Python.h'): +aalt = a.replace('/usr/local/', '/usr/', 1) +if os.path.exists(aalt + '/Python.h'): +a = aalt +b = b.replace('/usr/local/', '/usr/', 1) if a == b: -print(a) +print('-I' + a) else: -print(a + ' ' + b)"` +print('-I' + a + ' -I' + b)"` if test "$PORTNAME" = win32 ; then python_includespec=`echo $python_includespec | sed 's,[\],/,g'` fi
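The core of that delta patch is a path-existence fallback. A standalone sketch of the same logic, with the filesystem check injected as a callable so it can be exercised without a real Python installation (the helper name and the simulated paths are made up for illustration):

```python
def fix_include_dir(path, exists):
    """If Python.h is missing at 'path', retry with /usr/local/ -> /usr/.

    Mirrors the workaround for sysconfig installations that report
    '/usr/local/include' when they should have said '/usr/include'.
    """
    if not exists(path + '/Python.h'):
        alt = path.replace('/usr/local/', '/usr/', 1)
        if exists(alt + '/Python.h'):
            return alt
    return path

# Simulated system where the headers actually live under /usr/include:
present = {'/usr/include/python3.9/Python.h'}
fixed = fix_include_dir('/usr/local/include/python3.9', present.__contains__)
print(fixed)  # /usr/include/python3.9
```

Note that the fallback only fires when Python.h is genuinely absent at the reported location, so correct installations are left untouched.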
Re: A qsort template
On Tue, Jan 18, 2022 at 9:58 PM Peter Geoghegan wrote: > > On Tue, Jan 18, 2022 at 6:39 PM John Naylor > wrote: > > Editorializing the null position in queries is not very common in my > > experience. Not null is interesting since it'd be trivial to pass > > constant false to the same Apply[XYZ]SortComparator() and let the > > compiler remove all those branches for us. On the other hand, those > > branches would be otherwise predicted well, so it might make little or > > no difference. > > If you were going to do this, maybe you could encode NULL directly in > an abbreviated key. I think that that could be made to work if it was > limited to opclasses with abbreviated keys encoded as unsigned > integers. Just a thought. Now that you mention that, I do remember reading about this technique in the context of b-tree access, so it does make sense. If we had that capability, it would be trivial to order the nulls how we want while building the sort tuple datums, and the not-null case would be handled automatically. And have a smaller code footprint, I think. -- John Naylor EDB: http://www.enterprisedb.com
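Encoding NULL directly into an unsigned-integer abbreviated key could look roughly like the following toy model (not the datum format sortsupport actually uses): reserve the smallest key value for NULL and shift real values up by one, saturating at the top so the mapping stays order-preserving. A plain unsigned comparison then yields NULLS FIRST with no separate null branch.

```python
U64_MAX = 2**64 - 1
NULL_KEY = 0  # smallest key, so unsigned comparison gives NULLS FIRST

def abbrev_key(value):
    """Map a non-negative int (or None) to an unsigned 64-bit sort key."""
    if value is None:
        return NULL_KEY
    # Shift real values up by one so none collides with NULL_KEY;
    # saturate at the top so ordering among large values is preserved
    # only approximately (ties fall back to a full comparison).
    return min(value + 1, U64_MAX)

keys = sorted(abbrev_key(v) for v in [42, None, 7, None, 0])
print(keys)  # [0, 0, 1, 8, 43]
```

NULLS LAST would work symmetrically by reserving the largest key value instead.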
Re: refactoring basebackup.c
On Wed, Jan 19, 2022 at 7:16 AM Dipesh Pandit wrote: > I have added support for decompressing a gzip compressed tar file > at client. pg_basebackup can enable server side compression for > plain format backup with this change. > > Added a gzip extractor which decompresses the compressed archive > and forwards it to the next streamer. I have done initial testing and > working on updating the test coverage. Cool. It's going to need some documentation changes, too. I don't like the way you coded this in CreateBackupStreamer(). I would like the decision about whether to use bbstreamer_gzip_extractor_new(), and/or throw an error about not being able to parse an archive, to be based on the file type i.e. "did we get a .tar.gz file?" rather than on whether we asked for server-side compression. Notice that the existing logic checks whether we actually got a .tar file from the server rather than assuming that's what must have happened. As a matter of style, I don't think it's good for the only thing inside of an "if" statement to be another "if" statement. The two could be merged, but we also don't want to have the "if" conditional be too complex. I am imagining that this should end up saying something like if (must_parse_archive && !is_tar && !is_tar_gz) { pg_log_error(... + * "windowBits" must be greater than or equal to "windowBits" value + * provided to deflateInit2 while compressing. It would be nice to clarify why we know the value we're using is safe. Maybe we're using the maximum possible value, in which case you could just add that to the end of the comment: "...so we use the maximum possible value for safety." +/* + * End of the stream, if there is some pending data in output buffers then + * we must forward it to next streamer. + */ +if (res == Z_STREAM_END) { +bbstreamer_content(mystreamer->base.bbs_next, member, mystreamer->base.bbs_buffer.data, +mystreamer->bytes_written, context); +} Uncuddle the brace. 
It probably doesn't make much difference, but I would be inclined to do the final flush in bbstreamer_gzip_extractor_finalize() rather than here. That way we rely on our own notion of when there's no more input data rather than zlib's notion. Probably terrible things are going to happen if those two ideas don't match up but there might be some other compression algorithm that doesn't return a distinguishing code at end-of-stream. Such an algorithm would have to take care of any leftover data in the finalize function, so I think we should do that here too, so the code can be similar in all cases. Perhaps we should move all the gzip stuff to a new file bbstreamer_gzip.c. -- Robert Haas EDB: http://www.enterprisedb.com
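The "windowBits" point above can be illustrated with zlib's Python binding (an analogy for the C zlib API the patch uses, not pg_basebackup's actual code): requesting the maximum window size, plus the flag that selects the gzip container, lets the inflater handle any stream the deflater could have produced, and a streaming loop must flush whatever the decompressor has buffered at end of stream before forwarding it on.

```python
import gzip
import zlib

payload = b"pg_basebackup test payload " * 1000
compressed = gzip.compress(payload)

# 15 is the maximum windowBits; adding 16 tells zlib to expect a gzip
# header.  Using the maximum is safe no matter what the compressor used.
inflater = zlib.decompressobj(wbits=15 + 16)

out = bytearray()
for i in range(0, len(compressed), 1024):   # feed the stream in small chunks
    out += inflater.decompress(compressed[i:i + 1024])
out += inflater.flush()                     # forward any pending buffered data
```

The final flush mirrors the finalize-time cleanup suggested above: it relies on our own notion of end-of-input rather than on the library reporting end-of-stream.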
Re: Refactoring of compression options in pg_basebackup
On 2022-Jan-19, Michael Paquier wrote: > + printf(_(" -Z, --compress=[{gzip,none}[:LEVEL] or [LEVEL]\n" > + " compress tar output with > given compression method or level\n")); Note there is an extra [ before the {gzip bit. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "I suspect most samba developers are already technically insane... Of course, since many of them are Australians, you can't tell." (L. Torvalds)
Re: Remove redundant MemoryContextSwitchTo in BeginCopyFrom
On Wed, Jan 19, 2022 at 7:51 PM Japin Li wrote: > > > Hi, hackers > > When I read the code of COPY ... FROM ..., I find there is a redundant > MemoryContextSwitchTo() in BeginCopyFrom(). In BeginCopyFrom, it creates > a COPY memory context and then switches to it, in the middle of this > function, it switches to the oldcontext and immediately switches back to > COPY memory context, IMO, this is redundant, and can be removed safely. +1. It looks like a thinko from c532d15d. There's no code in between, so switching to oldcontext doesn't make sense. MemoryContextSwitchTo(oldcontext); << no code here >> oldcontext = MemoryContextSwitchTo(cstate->copycontext); I think we also need to remove MemoryContextSwitchTo(oldcontext); at the end of BeginCopyTo in copyto.c, because we are not changing memory contexts in between. diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 34c8b80593..5182048e4f 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -742,8 +742,6 @@ BeginCopyTo(ParseState *pstate, cstate->bytes_processed = 0; - MemoryContextSwitchTo(oldcontext); - return cstate; } Regards, Bharath Rupireddy.
Re: Remove redundant MemoryContextSwitchTo in BeginCopyFrom
On Wed, Jan 19, 2022 at 11:21 AM Japin Li wrote: > > > Hi, hackers > > When I read the code of COPY ... FROM ..., I find there is a redundant > MemoryContextSwitchTo() in BeginCopyFrom(). In BeginCopyFrom, it creates > a COPY memory context and then switches to it, in the middle of this > function, it switches to the oldcontext and immediately switches back to > COPY memory context, IMO, this is redundant, and can be removed safely. > LGTM (it passed all regression tests without any issue) -- Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Re: New developer papercut - Makefile references INSTALL
Daniel Gustafsson writes: > I ended up pushing this with the URL in place as there IMO was consensus in > the > thread for including it. We could if we want to update it to point to v15 > docs > once we branch off, but anything more than that is probably in the diminishing > returns territory in terms of effort involved. I think pointing at devel is fine, since this text is mostly aimed at developers or would-be developers anyway. regards, tom lane
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
On Tue, Jan 18, 2022 at 1:48 PM Peter Geoghegan wrote: > That's what I was reacting to -- it had nothing to do with any > concerns you may have had. I wasn't thinking about long-idle cursors > at all. I was defending myself, because I was put in a position where > I had to defend myself. I don't think I've said anything on this thread that is an attack on you. I am getting pretty frustrated with the tenor of the discussion, though. I feel like you're the one attacking me, and I don't like it. > I still don't understand why you think that my idea (not yet > implemented) of making FreezeLimit into a backstop (making it > autovacuum_freeze_max_age/2 or something) and relying on the new > "early freezing" criteria for almost everything is going to make the > situation worse in this scenario with long idle cursors. It's intended > to make it better. I just don't understand how I haven't been able to convey my concern here by now. I've already written multiple emails about it. If none of them were clear enough for you to understand, I'm not sure how saying the same thing over again can help. When I say I've already written about this, I'm referring specifically to the following: - https://postgr.es/m/ca+tgmobkjm9bszr3eteb6mjdlkwxkk5zxx0xhlf-w9kugvo...@mail.gmail.com in the second-to-last paragraph, beginning with "I don't really see" - https://www.postgresql.org/message-id/CA%2BTgmoaGoZ2wX6T4sj0eL5YAOQKW3tS8ViMuN%2BtcqWJqFPKFaA%40mail.gmail.com in the second paragraph beginning with "Because waiting on a lock" - https://www.postgresql.org/message-id/CA%2BTgmoZYri_LUp4od_aea%3DA8RtjC%2B-Z1YmTc7ABzTf%2BtRD2Opw%40mail.gmail.com in the paragraph beginning with "That's the part I'm not sure I believe." For all of that, I'm not even convinced that you're wrong. I just think you might be wrong. I don't really know. It seems to me however that you're understating the value of waiting, which I've tried to explain in the above places. 
Waiting does have the very real disadvantage of starving the rest of the system of the work that autovacuum worker would have been doing, and that's why I think you might be right. However, there are cases where waiting, and only waiting, gets the job done. If you're not willing to admit that those cases exist, or you think they don't matter, then we disagree. If you admit that they exist and think they matter but believe that there's some reason why increasing FreezeLimit can't cause any damage, then either (a) you have a good reason for that belief which I have thus far been unable to understand or (b) you're more optimistic about the proposed change than can be entirely justified. > My sense is that there are very few apps that are hopelessly incapable > of advancing relfrozenxid from day one. I find it much easier to > believe that users that had this experience got away with it for a > very long time, until their luck ran out, somehow. I would like to > minimize the chance of that ever happening, to the extent that that's > possible within the confines of the basic heapam/vacuumlazy.c > invariants. I agree with the idea that most people are OK at the beginning and then at some point their luck runs out and catastrophe strikes. I think there are a couple of different kinds of catastrophe that can happen. For instance, somebody could park a cursor in the middle of a table someplace and leave it there until the snow melts. Or, somebody could take a table lock and sit on it forever. Or, there could be a corrupted page in the table that causes VACUUM to error out every time it's reached. In the second and third situations, it doesn't matter a bit what we do with FreezeLimit, but in the first one it might. If the user is going to leave that cursor sitting there literally forever, the best solution is to raise FreezeLimit as high as we possibly can. 
The system is bound to shut down due to wraparound at some point, but we at least might as well vacuum other stuff while we're waiting for that to happen. On the other hand if that user is going to close that cursor after 10 minutes and open a new one in the same place 10 seconds later, the best thing to do is to keep FreezeLimit as low as possible, because the first time we wait for the pin to be released we're guaranteed to advance relfrozenxid within 10 minutes, whereas if we don't do that we may keep missing the brief windows in which no cursor is held for a very long time. But we have absolutely no way of knowing which of those things is going to happen on any particular system, or of estimating which one is more common in general. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [ERROR] Copy from CSV fails due to memory error.
On 1/19/22 14:01, Kostas Chasialis wrote: Hey. I am facing an issue when I try to run the following command COPY FROM WITH DELIMITER E','; This file is rather large, it's around 178GBs. When I try to run this COPY command I get the following error: ERROR: out of memory DETAIL: Failed on request of size 2048 in memory context "AfterTriggerEvents". CONTEXT: COPY ssbm300_lineorder, line 50796791 Clearly a memory allocation function is failing but I have no clue how to fix it. I have tried experimenting with shared_buffers value in postgresql.conf file but after searching a bit I quickly realized that I do not know what I am doing there so I left it with default value. Same with work_mem value. Did you face this issue before? Can you help me resolve it? Well, it's clearly related to "after" triggers - do you have any such triggers on the table? AFAIK it might be related to deferred constraints (like unique / foreign keys). Do you have anything like that? If yes, I guess the only solution is to make the constraints not deferred or split the copy into smaller chunks. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
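One way to act on the "smaller chunks" advice is to split the input file before loading, so each COPY statement queues a bounded number of after-trigger events. A stdlib-only sketch of the splitting step (the row contents here are placeholders; each chunk would then be fed to its own COPY invocation):

```python
import itertools

def chunked(lines, chunk_size):
    """Yield successive lists of at most chunk_size lines."""
    it = iter(lines)
    while True:
        chunk = list(itertools.islice(it, chunk_size))
        if not chunk:
            return
        yield chunk

# Each chunk would be loaded by a separate COPY, keeping the
# "AfterTriggerEvents" memory context bounded per statement.
chunks = list(chunked((f"row{i}" for i in range(10)), 4))
print([len(c) for c in chunks])  # [4, 4, 2]
```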
Re: Pluggable toaster
Hi, >I'm not convinced that's universally true. Yes, I'm sure certain TOAST >implementations would benefit from tighter control over compression, but >does that imply compression and toast are redundant? I doubt that, >because we compress non-toasted types too, for example. And layering has >a value too, as makes it easier to replace the pieces. Not exactly. It is a means to control TOAST itself without changing the core each time you want to change Toast strategy or method. Compression is just an example. And no other toasters are available without the proposed patch; there is only the one built-in. >Perhaps. My main point is that we should not be making too many radical >changes at once - it makes it much harder to actually get anything done. >So yeah, doing TOAST through IOT might be interesting, but I'd leave >that for a separate patch. That's why 4 distinct patches with incremental changes were proposed - 1) just new Toaster API with some necessary core changes required by the API; 2) default toaster routed via new API (but all its functions are unaffected, and a dummy toaster extension as an example); 3) 1+2+some refactoring and versioning; 4) extension module for bytea columns. Toast through IOT is a topic for discussion but does not seem to give a major advantage over the existing storage method, according to tests. >It seems better to prevent such incompatible combinations and restrict >each toaster to just compatible data types, and the mapping table >(linking toaster and data types) seems a way to do that. To handle this case a validate function (toastervalidate_function) is proposed in the TsrRoutine structure. >If you have to implement custom toaster to implement custom compression >method, doesn't that make things more complex? You'd have to solve all >the issues for custom compression methods and also all issues for custom >toaster. Also, what if you want to just compress the column, not toast? 
Default compression is restricted to 2 compression methods, all other means require extensions. Also, the name Toaster is a little bit misleading because it intends that data is being sliced, but it is not always the case, to be toasted a piece of bread must not necessarily be sliced. Regards, -- Nikita Malakhov Postgres Professional https://postgrespro.ru/ On Tue, Jan 18, 2022 at 7:06 PM Tomas Vondra wrote: > > > On 1/18/22 15:56, Teodor Sigaev wrote: > > Hi! > > > >> Maybe doing that kind of compression in TOAST is somehow simpler, but > >> I don't see it. > > Seems, in ideal world, compression should be inside toaster. > > > > I'm not convinced that's universally true. Yes, I'm sure certain TOAST > implementations would benefit from tighter control over compression, but > does that imply compression and toast are redundant? I doubt that, > because we compress non-toasted types too, for example. And layering has > a value too, as makes it easier to replace the pieces. > > >> > >>> 2 Current toast storage stores chunks in heap accesses method and to > >>> provide fast access by toast id it makes an index. Ideas: > >>>- store chunks directly in btree tree, pgsql's btree already has an > >>> INCLUDE columns, so, chunks and visibility data will be stored > only > >>> in leaf pages. Obviously it reduces number of disk's access for > >>> "untoasting". > >>>- use another access method for chunk storage > >>> > >> > >> Maybe, but that probably requires more thought - e.g. btree requires > >> the values to be less than 1/3 page, so I wonder how would that play > >> with toasting of values. > > That's ok, because chunk size is 2000 bytes right now and its could be > > saved. > >> > > Perhaps. My main point is that we should not be making too many radical > changes at once - it makes it much harder to actually get anything done. > So yeah, doing TOAST through IOT might be interesting, but I'd leave > that for a separate patch. 
> > > > >> Seems you'd need a mapping table, to allow M:N mapping between types > >> and toasters, linking it to all "compatible" types. It's not clear to > >> me how would this work with custom data types, domains etc. > > If toaster will look into internal structure then it should know type's > > binary format. So, new custom types have a little chance to work with > > old custom toaster. Default toaster works with any types. > > The question is what happens when you combine data type with a toaster > that is not designed for that type. I mean, imagine you have a JSONB > toaster and you set it for a bytea column. Naive implementation will > just crash, because it'll try to process bytea as if it was JSONB. > > It seems better to prevent such incompatible combinations and restrict > each toaster to just compatible data types, and the mapping table > (linking toaster and data types) seems a way to do that. > > However, it seems toasters are either generic (agnostic to data types, > treating everything as bytea) or specialized. I doubt any specialized > toaster can
Remove redundant MemoryContextSwitchTo in BeginCopyFrom
Hi, hackers When I read the code of COPY ... FROM ..., I find there is a redundant MemoryContextSwitchTo() in BeginCopyFrom(). In BeginCopyFrom, it creates a COPY memory context and then switches to it, in the middle of this function, it switches to the oldcontext and immediately switches back to COPY memory context, IMO, this is redundant, and can be removed safely. -- Regards, Japin Li. ChengDu WenWu Information Technology Co.,Ltd. diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 0d6b34206a..7b3f5a84b8 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -1340,10 +1340,6 @@ BeginCopyFrom(ParseState *pstate, cstate->whereClause = whereClause; - MemoryContextSwitchTo(oldcontext); - - oldcontext = MemoryContextSwitchTo(cstate->copycontext); - /* Initialize state variables */ cstate->eol_type = EOL_UNKNOWN; cstate->cur_relname = RelationGetRelationName(cstate->rel);
Re: New developer papercut - Makefile references INSTALL
> On 18 Jan 2022, at 17:05, Peter Eisentraut > wrote: > On 18.01.22 16:51, Daniel Gustafsson wrote: >> I plan on applying the attached which address the feedback given. > > The indentation of the two INSTRUCTIONS= lines uses a different mix of tabs > and spaces, so it looks a bit weird depending on how you view it. > > It's also a bit strange that the single quotes are part of the value of > $INSTRUCTIONS rather than part of the fixed text. Fixed both of these, thanks! > The URL links to the "devel" version of the installation instructions, which > will not remain appropriate after release. I don't know how to fix that > without creating an additional maintenance point. Since README.git already > contains that link, I would leave off the web site business and just make the > change of the dynamically chosen file name. I ended up pushing this with the URL in place as there IMO was consensus in the thread for including it. We could if we want to update it to point to v15 docs once we branch off, but anything more than that is probably in the diminishing returns territory in terms of effort involved. -- Daniel Gustafsson https://vmware.com/
Re: Logical replication timeout problem
Hello Amit, If it takes little work for you, can you please send me a piece of code with the change needed to do the test? Thanks. Regards, Fabrice On Fri, Jan 14, 2022 at 1:03 PM Amit Kapila wrote: > On Fri, Jan 14, 2022 at 3:47 PM Fabrice Chapuis > wrote: > > > > If I can follow you, I have to make the following changes: > > > > No, not like that but we can try that way as well to see if that helps > to avoid your problem. Am I understanding correctly that even after the > modification, you are seeing the problem? Can you try by calling > WalSndKeepaliveIfNecessary() instead of WalSndKeepalive()? > > -- > With Regards, > Amit Kapila. >
Re: Refactoring of compression options in pg_basebackup
On Tue, Jan 18, 2022 at 11:27 PM Michael Paquier wrote: > WFM. Attached is a patch that extends --compress to handle a method > with an optional compression level. Some extra tests are added to > cover all that. I think that this will reject something like --compress=nonetheless by telling you that 't' is not a valid separator. I think it would be better to code this so that we first identify the portion preceding the first colon, or the whole string if there is no colon. Then we check whether that part is a compression method that we recognize. If not, we complain. If so, we then check whatever is after the separator for validity - and this might differ by type. For example, we could then immediately reject none:4, and if in the future we want to allow lz4:fast3, we could. I think the code that handles the bare integer case should be at the top of the function and should return, because that code is short. Then the rest of the function doesn't need to be indented as deeply. "First check after the compression method" seems like it would be better written "First check for the compression method" or "First check the compression method". -- Robert Haas EDB: http://www.enterprisedb.com
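The suggested parsing order — take everything before the first colon as the method, validate it, then validate the remainder per method — might be sketched like this (not the committed pg_basebackup code; error texts and the bare-integer fallback rules are illustrative):

```python
def parse_compress_option(arg):
    """Parse a --compress value into a (method, level) pair."""
    # Bare integer: backward-compatible compression level.
    if arg.isdigit():
        return ('gzip', int(arg)) if int(arg) > 0 else ('none', 0)

    # Split on the FIRST colon, so "nonetheless" fails on the method
    # check rather than on a bogus "separator" complaint.
    method, sep, detail = arg.partition(':')
    if method not in ('none', 'gzip'):
        raise ValueError(f'invalid compression method "{method}"')
    if method == 'none':
        if sep:
            raise ValueError('compression level not allowed for "none"')
        return ('none', 0)
    level = int(detail) if sep else -1   # -1: default gzip level
    return ('gzip', level)
```

With this shape, per-method suffix rules (e.g. a hypothetical lz4:fast3) slot into the method-specific branches without touching the split logic.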
Re: pgsql: Modify pg_basebackup to use a new COPY subprotocol for base back
On Tue, Jan 18, 2022 at 9:29 PM Tom Lane wrote: > Robert Haas writes: > > Here's a patch, based in part on some off-list discussion with Andres. > > I believe Andres has already confirmed that this fix works, but it > > wouldn't hurt if Tom wants to verify it also. > > WFM too --- at least, pg_basebackup's "make check" passes now. Thanks for checking. Committed. -- Robert Haas EDB: http://www.enterprisedb.com
Re: row filtering for logical replication
On Tue, Jan 18, 2022 at 10:23 AM Greg Nancarrow wrote: > > On Tue, Jan 18, 2022 at 2:31 PM Amit Kapila wrote: > > > > On Tue, Jan 18, 2022 at 8:41 AM Greg Nancarrow wrote: > > > > > > On Tue, Jan 18, 2022 at 2:31 AM houzj.f...@fujitsu.com > > > wrote: > > > > > > > > > (2) GetTopMostAncestorInPublication > > > > > Is there a reason why there is no "break" after finding a > > > > > topmost_relid? Why keep searching and potentially overwrite a > > > > > previously-found topmost_relid? If it's intentional, I think that a > > > > > comment should be added to explain it. > > > > > > > > The code was moved from get_rel_sync_entry, and was trying to get the > > > > last oid in the ancestor list which is published by the publication. Do > > > > you > > > > have some suggestions for the comment ? > > > > > > > > > > Maybe the existing comment should be updated to just spell it out like > > > that: > > > > > > /* > > > * Find the "topmost" ancestor that is in this publication, by getting the > > > * last Oid in the ancestors list which is published by the publication. > > > */ > > > > > > > I am not sure that is helpful w.r.t what Peter is looking for as that > > is saying what code is doing and he wants to know why it is so? I > > think one can understand this by looking at get_partition_ancestors > > which will return the top-most ancestor as the last element. I feel > > either we can say see get_partition_ancestors or maybe explain how the > > ancestors are stored in this list. > > > > (note: I asked the original question about why there is no "break", not Peter) > Okay. > Maybe instead, an additional comment could be added to the > GetTopMostAncestorInPublication function to say "Note that the > ancestors list is ordered such that the topmost ancestor is at the end > of the list". > I am fine with this and I see that Hou-San already used this in the latest version of patch. 
> Unfortunately the get_partition_ancestors function > currently doesn't explicitly say that the topmost ancestors are > returned at the end of the list (I guess you could conclude it by then > looking at the get_partition_ancestors_worker code which it calls). > Also, this leads me to wonder if searching the ancestors list > backwards might be better here, and break at the first match? > I am not sure of the gains by doing that, and anyway, that is a separate topic of discussion as it is existing code. -- With Regards, Amit Kapila.
[ERROR] Copy from CSV fails due to memory error.
Hey. I am facing an issue when I try to run the following command: COPY FROM WITH DELIMITER E','; The file is rather large, around 178 GB. When I try to run this COPY command I get the following error: ERROR: out of memory DETAIL: Failed on request of size 2048 in memory context "AfterTriggerEvents". CONTEXT: COPY ssbm300_lineorder, line 50796791 Clearly a memory allocation is failing, but I have no clue how to fix it. I have tried experimenting with the shared_buffers value in postgresql.conf, but after searching a bit I quickly realized that I do not know what I am doing there, so I left it at the default value. Same with the work_mem value. Have you faced this issue before? Can you help me resolve it? Thanks in advance!
Re: WIN32 pg_import_system_collations
On Wed, Jan 19, 2022 at 10:53 AM Julien Rouhaud wrote: > > This version doesn't apply anymore: > > Thanks for the heads up. Please find attached a rebased patch. I have also rewritten some comments to address previous reviews, code and test remain the same. Regards, Juan José Santamaría Flecha v3-0001-WIN32-pg_import_system_collations.patch Description: Binary data
Re: refactoring basebackup.c
Hi, I have added support for decompressing a gzip compressed tar file at client. pg_basebackup can enable server side compression for plain format backup with this change. Added a gzip extractor which decompresses the compressed archive and forwards it to the next streamer. I have done initial testing and working on updating the test coverage. Note: Before applying the patch, please apply Robert's v11 version of the patches 0001 and 0002. Thanks, Dipesh From 737badce26ed05b5cdb64d9ffd1735fef9acbbf8 Mon Sep 17 00:00:00 2001 From: Dipesh Pandit Date: Wed, 19 Jan 2022 17:11:45 +0530 Subject: [PATCH] Support for extracting gzip compressed archive pg_basebackup can support server side compression using gzip. In order to support plain format backup with option '-Fp' we need to add support for decompressing the compressed blocks at client. This patch addresses the extraction of gzip compressed blocks at client. --- src/bin/pg_basebackup/bbstreamer.h | 1 + src/bin/pg_basebackup/bbstreamer_file.c | 175 src/bin/pg_basebackup/pg_basebackup.c | 58 +-- 3 files changed, 225 insertions(+), 9 deletions(-) diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h index fc88b50..270b0df 100644 --- a/src/bin/pg_basebackup/bbstreamer.h +++ b/src/bin/pg_basebackup/bbstreamer.h @@ -205,6 +205,7 @@ extern bbstreamer *bbstreamer_extractor_new(const char *basepath, const char *(*link_map) (const char *), void (*report_output_file) (const char *)); +extern bbstreamer *bbstreamer_gzip_extractor_new(bbstreamer *next); extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next); extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next); extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next); diff --git a/src/bin/pg_basebackup/bbstreamer_file.c b/src/bin/pg_basebackup/bbstreamer_file.c index 77ca222..350af1d 100644 --- a/src/bin/pg_basebackup/bbstreamer_file.c +++ b/src/bin/pg_basebackup/bbstreamer_file.c @@ -37,6 +37,13 @@ typedef struct 
bbstreamer_gzip_writer char *pathname; gzFile gzfile; } bbstreamer_gzip_writer; + +typedef struct bbstreamer_gzip_extractor +{ + bbstreamer base; + z_stream zstream; + size_t bytes_written; +} bbstreamer_gzip_extractor; #endif typedef struct bbstreamer_extractor @@ -76,6 +83,21 @@ const bbstreamer_ops bbstreamer_gzip_writer_ops = { .finalize = bbstreamer_gzip_writer_finalize, .free = bbstreamer_gzip_writer_free }; + +static void bbstreamer_gzip_extractor_content(bbstreamer *streamer, + bbstreamer_member *member, + const char *data, int len, + bbstreamer_archive_context context); +static void bbstreamer_gzip_extractor_finalize(bbstreamer *streamer); +static void bbstreamer_gzip_extractor_free(bbstreamer *streamer); +static void *gzip_palloc(void *opaque, unsigned items, unsigned size); +static void gzip_pfree(void *opaque, void *address); + +const bbstreamer_ops bbstreamer_gzip_extractor_ops = { + .content = bbstreamer_gzip_extractor_content, + .finalize = bbstreamer_gzip_extractor_finalize, + .free = bbstreamer_gzip_extractor_free +}; #endif static void bbstreamer_extractor_content(bbstreamer *streamer, @@ -349,6 +371,159 @@ get_gz_error(gzFile gzf) #endif /* + * Create a new base backup streamer that performs decompression of gzip + * compressed blocks. + */ +bbstreamer * +bbstreamer_gzip_extractor_new(bbstreamer *next) +{ +#ifdef HAVE_LIBZ + bbstreamer_gzip_extractor *streamer; + z_stream *zs; + + Assert(next != NULL); + + streamer = palloc0(sizeof(bbstreamer_gzip_extractor)); + *((const bbstreamer_ops **) &streamer->base.bbs_ops) = + &bbstreamer_gzip_extractor_ops; + + streamer->base.bbs_next = next; + initStringInfo(&streamer->base.bbs_buffer); + + /* Initialize internal stream state for decompression */ + zs = &streamer->zstream; + zs->zalloc = gzip_palloc; + zs->zfree = gzip_pfree; + zs->next_out = (uint8 *) streamer->base.bbs_buffer.data; + zs->avail_out = streamer->base.bbs_buffer.maxlen; + + /* + * Data compression was initialized using deflateInit2 to request a gzip + * header.
Similarly, we are using inflateInit2 to initialize data + * decompression. + * "windowBits" must be greater than or equal to "windowBits" value + * provided to deflateInit2 while compressing. + */ + if (inflateInit2(zs, 15 + 16) != Z_OK) + { + pg_log_error("could not initialize compression library"); + exit(1); + + } + + return &streamer->base; +#else + pg_log_error("this build does not support compression"); + exit(1); +#endif +} + +#ifdef HAVE_LIBZ +/* + * Decompress the input data to output buffer until we ran out of the input + * data. Each time the output buffer is full invoke bbstreamer_content to pass + * on the decompressed data to next streamer. + */ +static void +bbstreamer_gzip_extractor_content(bbstreamer *streamer, + bbstreamer_member *member, + const char *data, int len, + bbstreamer_archive_context context) +{ +
Re: [PATCH] Add reloption for views to enable RLS
Hi, On 1/19/22 09:30, Julien Rouhaud wrote: Hi, On Tue, Jan 18, 2022 at 04:16:53PM +0100, Christoph Heiss wrote: I've attached a v2 where I addressed the things you mentioned. This version unfortunately doesn't apply anymore: http://cfbot.cputube.org/patch_36_3466.log === Applying patches on top of PostgreSQL commit ID e0e567a106726f6709601ee7cffe73eb6da8084e === === applying patch ./0001-PATCH-v2-1-3-Add-new-boolean-reloption-security_invo.patch === applying patch ./0002-PATCH-v2-2-3-Add-regression-tests-for-new-security_i.patch patching file src/test/regress/expected/create_view.out Hunk #5 FAILED at 2019. Hunk #6 succeeded at 2056 (offset 16 lines). 1 out of 6 hunks FAILED -- saving rejects to file src/test/regress/expected/create_view.out.rej Could you send a rebased version? My bad - I attached a new version rebased on latest master. Thanks, Christoph HeissFrom b1b5a6c6bd63c509b3bcdef6d1e7a548413c2a9d Mon Sep 17 00:00:00 2001 From: Christoph Heiss Date: Tue, 18 Jan 2022 15:42:58 +0100 Subject: [PATCH 1/3] [PATCH v3 1/3] Add new boolean reloption security_invoker to views When this reloption is set to true, all references to the underlying tables will be checked against the invoking user rather than the view owner, as is currently implemented. Row level security must be enabled on the tables for this to take effect. 
Signed-off-by: Christoph Heiss --- src/backend/access/common/reloptions.c| 11 src/backend/nodes/copyfuncs.c | 1 + src/backend/nodes/equalfuncs.c| 1 + src/backend/nodes/outfuncs.c | 1 + src/backend/nodes/readfuncs.c | 1 + src/backend/optimizer/plan/subselect.c| 1 + src/backend/optimizer/prep/prepjointree.c | 1 + src/backend/rewrite/rewriteHandler.c | 1 + src/backend/utils/cache/relcache.c| 63 +-- src/include/nodes/parsenodes.h| 1 + src/include/utils/rel.h | 11 11 files changed, 65 insertions(+), 28 deletions(-) diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c index d592655258..beb170b5e6 100644 --- a/src/backend/access/common/reloptions.c +++ b/src/backend/access/common/reloptions.c @@ -140,6 +140,15 @@ static relopt_bool boolRelOpts[] = }, false }, + { + { + "security_invoker", + "View subquery in invoked within the current security context.", + RELOPT_KIND_VIEW, + AccessExclusiveLock + }, + false + }, { { "vacuum_truncate", @@ -1996,6 +2005,8 @@ view_reloptions(Datum reloptions, bool validate) static const relopt_parse_elt tab[] = { {"security_barrier", RELOPT_TYPE_BOOL, offsetof(ViewOptions, security_barrier)}, + {"security_invoker", RELOPT_TYPE_BOOL, + offsetof(ViewOptions, security_invoker)}, {"check_option", RELOPT_TYPE_ENUM, offsetof(ViewOptions, check_option)} }; diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index 90b5da51c9..b171992ba8 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -2465,6 +2465,7 @@ _copyRangeTblEntry(const RangeTblEntry *from) COPY_NODE_FIELD(tablesample); COPY_NODE_FIELD(subquery); COPY_SCALAR_FIELD(security_barrier); + COPY_SCALAR_FIELD(security_invoker); COPY_SCALAR_FIELD(jointype); COPY_SCALAR_FIELD(joinmergedcols); COPY_NODE_FIELD(joinaliasvars); diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c index 06345da3ba..a832c5fefe 100644 --- a/src/backend/nodes/equalfuncs.c +++ 
b/src/backend/nodes/equalfuncs.c @@ -2766,6 +2766,7 @@ _equalRangeTblEntry(const RangeTblEntry *a, const RangeTblEntry *b) COMPARE_NODE_FIELD(tablesample); COMPARE_NODE_FIELD(subquery); COMPARE_SCALAR_FIELD(security_barrier); + COMPARE_SCALAR_FIELD(security_invoker); COMPARE_SCALAR_FIELD(jointype); COMPARE_SCALAR_FIELD(joinmergedcols); COMPARE_NODE_FIELD(joinaliasvars); diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index 2b0236937a..883284ad0d 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -3261,6 +3261,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node) case RTE_SUBQUERY: WRITE_NODE_FIELD(subquery); WRITE_BOOL_FIELD(security_barrier); + WRITE_BOOL_FIELD(security_invoker); break; case RTE_JOIN: WRITE_ENUM_FIELD(jointype, JoinType); diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c index 3f68f7c18d..ad825bb27f 100644 --- a/src/backend/nodes/readfuncs.c +++ b/src/backend/nodes/readfuncs.c @@ -1444,6 +1444,7 @@ _readRangeTblEntry(void) case RTE_SUBQUERY: READ_NODE_FIELD(subquery); READ_BOOL_FIELD(security_barrier); + READ_BOOL_FIELD(security_invoker); break; case RTE_JOIN: READ_ENUM_FIELD(jointype, JoinType); diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index 41bd1ae7d4..30c66b9c6d 100644 --- a/src/backend/optimizer/plan/subselect.c
Re: generic plans and "initial" pruning
On Wed, 19 Jan 2022 at 08:31, Amit Langote wrote: > > Maybe force all DDL, or just DDL that would cause safety issues, to > > update a hierarchy version number, so queries can tell whether they > > need to replan. Don't know, just looking for an O(1) solution. > > Yeah, it would be great if it would suffice to take a single lock on > the partitioned table mentioned in the query, rather than on all > elements of the partition tree added to the plan. AFAICS, ways to get > that are 1) Prevent modifying non-root partition tree elements, Can we reuse the concept of Strong/Weak locking here? When a DDL request is in progress (for that partitioned table), take all required locks for safety. When a DDL request is not in progress, take minimal locks knowing it is safe. We can take a single PartitionTreeModificationLock, nowait, to prove that we do not need all locks. DDL would request the lock in exclusive mode. (Other mechanisms possible). -- Simon Riggs http://www.EnterpriseDB.com/
Re: a misbehavior of partition row movement (?)
On Wed, Jan 19, 2022 at 6:26 PM Alvaro Herrera wrote: > On 2022-Jan-19, Amit Langote wrote: > > BTW, your tweaks patch added this: > > > > + * *inserted_tuple is the tuple that's effectively inserted; > > + * *inserted_destrel is the relation where it was inserted. > > + * These are only set on success. FIXME -- see what happens on > > the "do nothing" cases. > > > > If by "do nothing cases" you mean INSERT ON CONFLICT ... DO NOTHING, > > then I don't think it matters, because the caller in that case would > > be ExecModifyTable() which doesn't care about inserted_tuple and > > inserted_destrel. > > No, I meant a FOR EACH ROW trigger that does RETURN NULL to "abort" the > insertion. Ah, gotcha. > IIRC in non-partitioned cases it is possible to break > referential integrity by using those. What I was wondering is whether > you can make this new code crash. insert_destrel would be left set to NULL, which means ExecCrossPartitionUpdateForeignKey() won't get called, because: /* * NULL insert_destrel means that the move failed to occur, that * is, the update failed, so no need to do anything in that case. */ if (insert_destrel && resultRelInfo->ri_TrigDesc && resultRelInfo->ri_TrigDesc->trig_update_after_row) ExecCrossPartitionUpdateForeignKey(mtstate, estate, Moreover, the trigger documentation warns of a "possibility of surprising outcomes" if BR triggers are present in partitions that are chosen as destinations of cross-partition updates: "Then all row-level BEFORE INSERT triggers are fired on the destination partition. The possibility of surprising outcomes should be considered when all these triggers affect the row being moved." I suppose the new code shouldn't need to take special care in such cases either. -- Amit Langote EDB: http://www.enterprisedb.com
Re: pg_upgrade should truncate/remove its logs before running
On Wed, Jan 19, 2022 at 05:13:18PM +0900, Michael Paquier wrote: > I have noticed a couple of incorrect things in the docs, and some > other things. It is a bit late here, so I may have missed a couple of > things but I'll look at this stuff once again in a couple of days. And the docs failed to build.. -- Michael From b5a22e5e641930a46c74622bbf42a00b0abdeb8b Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Wed, 19 Jan 2022 19:38:24 +0900 Subject: [PATCH v3] pg_upgrade: write log files and dumps in subdir.. The subdir must not exist when pg_upgrade is started. This avoids appending to logfiles which have pre-existing errors in them. And allows more easily cleaning up after pg_upgrade. --- src/bin/pg_upgrade/.gitignore | 2 - src/bin/pg_upgrade/Makefile | 4 +- src/bin/pg_upgrade/check.c | 12 ++-- src/bin/pg_upgrade/dump.c | 6 +- src/bin/pg_upgrade/exec.c | 5 +- src/bin/pg_upgrade/function.c | 3 +- src/bin/pg_upgrade/option.c | 29 +++-- src/bin/pg_upgrade/pg_upgrade.c | 105 src/bin/pg_upgrade/pg_upgrade.h | 8 ++- src/bin/pg_upgrade/server.c | 6 +- doc/src/sgml/ref/pgupgrade.sgml | 17 +- 11 files changed, 130 insertions(+), 67 deletions(-) diff --git a/src/bin/pg_upgrade/.gitignore b/src/bin/pg_upgrade/.gitignore index 2d3bfeaa50..939e50db6c 100644 --- a/src/bin/pg_upgrade/.gitignore +++ b/src/bin/pg_upgrade/.gitignore @@ -1,9 +1,7 @@ /pg_upgrade # Generated by test suite -/pg_upgrade_internal.log /delete_old_cluster.sh /delete_old_cluster.bat /reindex_hash.sql -/loadable_libraries.txt /log/ /tmp_check/ diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile index 44d06be5a6..e833a1c5f0 100644 --- a/src/bin/pg_upgrade/Makefile +++ b/src/bin/pg_upgrade/Makefile @@ -45,9 +45,7 @@ uninstall: clean distclean maintainer-clean: rm -f pg_upgrade$(X) $(OBJS) rm -rf delete_old_cluster.sh log/ tmp_check/ \ - loadable_libraries.txt reindex_hash.sql \ - pg_upgrade_dump_globals.sql \ - pg_upgrade_dump_*.custom pg_upgrade_*.log + reindex_hash.sql 
pg_upgrade_output.d/ # When $(MAKE) is present, make automatically infers that this is a # recursive make. which is not actually what we want here, as that diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c index 3d218c2ad2..019bcb6c7b 100644 --- a/src/bin/pg_upgrade/check.c +++ b/src/bin/pg_upgrade/check.c @@ -783,7 +783,8 @@ check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster) return; } - snprintf(output_path, sizeof(output_path), + snprintf(output_path, sizeof(output_path), "%s/%s", + log_opts.basedir, "contrib_isn_and_int8_pass_by_value.txt"); for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++) @@ -860,7 +861,8 @@ check_for_user_defined_postfix_ops(ClusterInfo *cluster) prep_status("Checking for user-defined postfix operators"); - snprintf(output_path, sizeof(output_path), + snprintf(output_path, sizeof(output_path), "%s/%s", + log_opts.basedir, "postfix_ops.txt"); /* Find any user defined postfix operators */ @@ -959,7 +961,8 @@ check_for_tables_with_oids(ClusterInfo *cluster) prep_status("Checking for tables WITH OIDS"); - snprintf(output_path, sizeof(output_path), + snprintf(output_path, sizeof(output_path), "%s/%s", + log_opts.basedir, "tables_with_oids.txt"); /* Find any tables declared WITH OIDS */ @@ -1214,7 +1217,8 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster) prep_status("Checking for user-defined encoding conversions"); - snprintf(output_path, sizeof(output_path), + snprintf(output_path, sizeof(output_path), "%s/%s", + log_opts.basedir, "encoding_conversions.txt"); /* Find any user defined encoding conversions */ diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c index 953873fa01..b69b4f9569 100644 --- a/src/bin/pg_upgrade/dump.c +++ b/src/bin/pg_upgrade/dump.c @@ -22,9 +22,10 @@ generate_old_dump(void) /* run new pg_dumpall binary for globals */ exec_prog(UTILITY_LOG_FILE, NULL, true, true, "\"%s/pg_dumpall\" %s --globals-only --quote-all-identifiers " - "--binary-upgrade %s -f %s", 
+ "--binary-upgrade %s -f \"%s/%s\"", new_cluster.bindir, cluster_conn_opts(&old_cluster), log_opts.verbose ? "--verbose" : "", + log_opts.dumpdir, GLOBALS_DUMP_FILE); check_ok(); @@ -52,9 +53,10 @@ generate_old_dump(void) parallel_exec_prog(log_file_name, NULL, "\"%s/pg_dump\" %s --schema-only --quote-all-identifiers " - "--binary-upgrade --format=custom %s --file=\"%s\" %s", + "--binary-upgrade --format=custom %s --file=\"%s/%s\" %s", new_cluster.bindir, cluster_conn_opts(&old_cluster), log_opts.verbose ? "--verbose" : "", + log_opts.dumpdir, sql_file_name, escaped_connstr.data); termPQExpBuffer(&escaped_connstr); diff --git a/src/bin/pg_upgrade/exec.c b/src/bin/pg_upgrade/exec.c
Re: using extended statistics to improve join estimates
Hi, On Tue, Jan 04, 2022 at 03:55:50PM -0800, Andres Freund wrote: > On 2022-01-01 18:21:06 +0100, Tomas Vondra wrote: > > Here's an updated patch, rebased and fixing a couple typos reported by > > Justin Pryzby directly. > > FWIW, cfbot reports a few compiler warnings: Also the patch doesn't apply anymore: http://cfbot.cputube.org/patch_36_3055.log === Applying patches on top of PostgreSQL commit ID 74527c3e022d3ace648340b79a6ddec3419f6732 === === applying patch ./0001-Estimate-joins-using-extended-statistics-20220101.patch patching file src/backend/optimizer/path/clausesel.c patching file src/backend/statistics/extended_stats.c Hunk #1 FAILED at 30. Hunk #2 succeeded at 102 (offset 1 line). Hunk #3 succeeded at 2619 (offset 9 lines). 1 out of 3 hunks FAILED -- saving rejects to file src/backend/statistics/extended_stats.c.rej
Re: row filtering for logical replication
On Wed, Jan 19, 2022 at 7:45 AM houzj.f...@fujitsu.com wrote: > > On Tues, Jan 18, 2022 8:35 PM Amit Kapila wrote: > > Attach the V67 patch set which address the above comments. > Some more comments and suggestions: = 1. Can we do slot initialization in maybe_send_schema() instead of introducing a new flag for it? 2. + * For updates if no old tuple, it means none of the replica identity + * columns changed and this would reduce to a simple update. We only need + * to evaluate the row filter for the new tuple. Is it possible with the current version of the patch? I am asking because, for updates, we now allow only RI columns in row filter, so do we need to evaluate the row filter in this case? I think ideally, we don't need to evaluate the row filter in this case as for updates only replica identity columns are allowed but users can use constant expressions in the row filter. So, we need to evaluate the row filter in this case as well. Is my understanding correct? 3. + /* If no filter found, clean up the memory and return */ + if (!has_filter) + { + if (entry->cache_expr_cxt != NULL) + MemoryContextDelete(entry->cache_expr_cxt); I think this clean-up needs to be performed when we set exprstate_valid to false. I have changed accordingly in the attached patch. Apart from the above, I have made quite a few changes in the code comments in the attached top-up patch, kindly review those and merge them into the main patch, if you are okay with it. -- With Regards, Amit Kapila. v67_changes_amit_1.patch Description: Binary data
Re: WIN32 pg_import_system_collations
Hi, On Mon, Dec 13, 2021 at 05:28:47PM +0100, Juan José Santamaría Flecha wrote: > On Mon, Dec 13, 2021 at 9:41 AM Juan José Santamaría Flecha < > juanjo.santama...@gmail.com> wrote: > > Per path tester. This version doesn't apply anymore: http://cfbot.cputube.org/patch_36_3450.log === Applying patches on top of PostgreSQL commit ID e0e567a106726f6709601ee7cffe73eb6da8084e === === applying patch ./v2-0001-WIN32-pg_import_system_collations.patch [...] patching file src/tools/msvc/vcregress.pl Hunk #1 succeeded at 153 (offset -1 lines). Hunk #2 FAILED at 170. 1 out of 2 hunks FAILED -- saving rejects to file src/tools/msvc/vcregress.pl.rej Could you send a rebased version? In the meantime I will switch the CF entry to Waiting on Author.
Re: a misbehavior of partition row movement (?)
On 2022-Jan-19, Amit Langote wrote: > BTW, your tweaks patch added this: > > + * *inserted_tuple is the tuple that's effectively inserted; > + * *inserted_destrel is the relation where it was inserted. > + * These are only set on success. FIXME -- see what happens on > the "do nothing" cases. > > If by "do nothing cases" you mean INSERT ON CONFLICT ... DO NOTHING, > then I don't think it matters, because the caller in that case would > be ExecModifyTable() which doesn't care about inserted_tuple and > inserted_destrel. No, I meant a FOR EACH ROW trigger that does RETURN NULL to "abort" the insertion. IIRC in non-partitioned cases it is possible to break referential integrity by using those. What I was wondering is whether you can make this new code crash. -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ "I can see support will not be a problem. 10 out of 10." (Simon Wittber) (http://archives.postgresql.org/pgsql-general/2004-12/msg00159.php)
Re: Support for NSS as a libpq TLS backend
> On 18 Jan 2022, at 17:37, Jacob Champion wrote: > > On Wed, 2021-12-15 at 23:10 +0100, Daniel Gustafsson wrote: >> I've attached a v50 which fixes the issues found by Joshua upthread, as well >> as >> rebases on top of all the recent SSL and pgcrypto changes. > > I'm currently tracking down a slot leak. When opening and closing large > numbers of NSS databases, at some point we appear to run out of slots > and then NSS starts misbehaving, even though we've closed all of our > context handles. Interesting, are you able to share a reproducer for this so I can assist in debugging it? -- Daniel Gustafsson https://vmware.com/
Re: Support tab completion for upper character inputs in psql
Hi, On Sat, Jan 15, 2022 at 01:51:26PM -0500, Tom Lane wrote: > > I spent some time looking at this patch. I'm not very happy with it, > for two reasons: > [...] On top of that the patch doesn't apply anymore: http://cfbot.cputube.org/patch_36_2979.log === Applying patches on top of PostgreSQL commit ID 5987feb70b5bbb1fc4e64d433f490df08d91dd45 === === applying patch ./v11-0001-Support-tab-completion-with-a-query-result-for-u.patch patching file src/bin/psql/t/010_tab_completion.pl Hunk #1 FAILED at 41. Hunk #2 succeeded at 150 (offset 1 line). 1 out of 2 hunks FAILED -- saving rejects to file src/bin/psql/t/010_tab_completion.pl.rej I'm switching the CF entry to Waiting on Author.
Re: Skipping logical replication transactions on subscriber side
On Wed, Jan 19, 2022 at 12:46 PM Masahiko Sawada wrote: > > On Wed, Jan 19, 2022 at 12:22 PM osumi.takami...@fujitsu.com > wrote: > > > > (6) apply_handle_stream_abort > > > > @@ -1209,6 +1300,13 @@ apply_handle_stream_abort(StringInfo s) > > > > logicalrep_read_stream_abort(s, , ); > > > > + /* > > +* We don't expect the user to set the XID of the transaction that > > is > > +* rolled back but if the skip XID is set, clear it. > > +*/ > > + if (MySubscription->skipxid == xid || MySubscription->skipxid == > > subxid) > > + clear_subscription_skip_xid(MySubscription->skipxid, > > InvalidXLogRecPtr, 0); > > + > > > > In my humble opinion, this still cares about subtransaction xid still. > > If we want to be consistent with top level transactions only, > > I felt checking MySubscription->skipxid == xid should be sufficient. > > I thought if we can clear subskipxid whose value has already been > processed on the subscriber with a reasonable cost it makes sense to > do that because it can reduce the possibility of the issue that XID is > wraparound while leaving the wrong in subskipxid. > I guess that could happen if the user sets some unrelated XID value. So, I think it should be okay to not clear this but we can add a comment in the code at that place that we don't clear subtransaction's XID as we don't support skipping individual subtransactions or something like that. -- With Regards, Amit Kapila.
Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN
Hi, On Tue, Nov 16, 2021 at 04:37:44PM +0900, Masahiko Sawada wrote: > > I've attached an updated patch. Please review it. It seems that the regression tests aren't entirely stable, per cfbot: https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/36/3298. The failures look like: diff -U3 /tmp/cirrus-ci-build/src/test/recovery/../regress/expected/explain.out /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/explain.out --- /tmp/cirrus-ci-build/src/test/recovery/../regress/expected/explain.out 2022-01-19 03:50:37.087931908 + +++ /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/explain.out 2022-01-19 03:57:41.013616137 + @@ -512,9 +512,10 @@ I/O Timings: temp read=N.N write=N.N -> Function Scan on generate_series (cost=N.N..N.N rows=N width=N) (actual time=N.N..N.N rows=N loops=N) I/O Timings: temp read=N.N write=N.N + I/O Timings: shared/local read=N.N Planning Time: N.N ms Execution Time: N.N ms -(6 rows) +(7 rows) select explain_filter('explain (analyze, buffers, format json) select count(*) from generate_series(1, 10)'); explain_filter I don't see any obvious error in the code, so I'm wondering if it's simply the result of populating the cache with generate_series() info.