Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

2023-07-04 Thread Amit Kapila
On Wed, Jul 5, 2023 at 9:01 AM Peter Smith  wrote:
>
> Hi. Here are some review comments for this patch.
>
> +1 for the patch idea.
>
> --
>
> I wasn't sure about the code comment adjustments suggested by Amit [1]:
> "Accordingly, the comments atop build_replindex_scan_key(),
> FindUsableIndexForReplicaIdentityFull(), IsIndexOnlyOnExpression()
> should also be adjusted."
>
> Actually, I thought the FindUsableIndexForReplicaIdentityFull()
> function comment is *already* describing the limitation about the
> leftmost column (see fragment below), so IIUC the Sawada-san patch is
> only trying to expose that same information in the PG docs.
>
> [FindUsableIndexForReplicaIdentityFull comment fragment]
>  * We also skip indexes if the remote relation does not contain the leftmost
>  * column of the index. This is because in most such cases sequential scan is
>  * favorable over index scan.
>

This implies that the leftmost column of the index must be a
non-expression, but I feel what the patch intends to say in the docs is
more straightforward, and it doesn't match what the proposed docs say.

> ~
>
> OTOH, it may be better if these limitation rule details were not
> scattered in the code. e.g. build_replindex_scan_key() function
> comment can be simplified:
>
> CURRENT:
>  * This is not generic routine, it expects the idxrel to be a btree, 
> non-partial
>  * and have at least one column reference (i.e. cannot consist of only
>  * expressions).
>
> SUGGESTION:
> This is not a generic routine. It expects the 'idxrel' to be an index
> deemed "usable" by the function
> FindUsableIndexForReplicaIdentityFull().
>

Note that for PK/ReplicaIdentity, we don't even call
FindUsableIndexForReplicaIdentityFull(), but build_replindex_scan_key()
would still be called for such an index. So, I am not sure your proposed
wording is an improvement.

> --
> doc/src/sgml/logical-replication.sgml
>
> 1.
> the key.  When replica identity FULL is specified,
> indexes can be used on the subscriber side for searching the rows.
> Candidate
> indexes must be btree, non-partial, and have at least one column reference
> -   (i.e. cannot consist of only expressions).  These restrictions
> -   on the non-unique index properties adhere to some of the restrictions that
> -   are enforced for primary keys.  If there are no such suitable indexes,
> +   at the leftmost column indexes (i.e. cannot consist of only
> expressions).  These
> +   restrictions on the non-unique index properties adhere to some of
> the restrictions
> +   that are enforced for primary keys.  If there are no such suitable 
> indexes,
> the search on the subscriber side can be very inefficient, therefore
> replica identity FULL should only be used as a
> fallback if no other solution is possible.  If a replica identity other
>
> Isn't this using the word "indexes" with different meanings in the
> same sentence? e.g. IIUC "leftmost column indexes" is referring to the
> ordinal number of the index fields. TBH, I am not sure the patch
> wording is even describing the limitation in quite the same way as
> what the code is actually doing.
>
> HEAD (code comment):
>  * We also skip indexes if the remote relation does not contain the leftmost
>  * column of the index. This is because in most such cases sequential scan is
>  * favorable over index scan.
>
> HEAD (rendered docs)
> Candidate indexes must be btree, non-partial, and have at least one
> column reference (i.e. cannot consist of only expressions). These
> restrictions on the non-unique index properties adhere to some of the
> restrictions that are enforced for primary keys.
>
> PATCHED (rendered docs)
> Candidate indexes must be btree, non-partial, and have at least one
> column reference at the leftmost column indexes (i.e. cannot consist
> of only expressions). These restrictions on the non-unique index
> properties adhere to some of the restrictions that are enforced for
> primary keys.
>
> MY SUGGESTION:
> Candidate indexes must be btree, non-partial, and have at least one
> column reference (i.e. cannot consist of only expressions).
> Furthermore, the leftmost field of the candidate index must be a
> column of the published table. These restrictions on the non-unique
> index properties adhere to some of the restrictions that are enforced
> for primary keys.
>

I don't know if this suggestion is what the code is actually doing. In
function RemoteRelContainsLeftMostColumnOnIdx(), we have the following
checks:
==
keycol = indexInfo->ii_IndexAttrNumbers[0];
if (!AttributeNumberIsValid(keycol))
return false;

if (attrmap->maplen <= AttrNumberGetAttrOffset(keycol))
return false;

return attrmap->attnums[AttrNumberGetAttrOffset(keycol)] >= 0;
==

The first of these checks indicates that the leftmost column of the
index should be a non-expression, and the second and third indicate what
you suggest in your wording. We can also think that what you wrote in a
way is a superset of "leftmost index 

Re: Clean up command argument assembly

2023-07-04 Thread Peter Eisentraut

On 04.07.23 14:14, Heikki Linnakangas wrote:

On 26/06/2023 12:33, Peter Eisentraut wrote:

This is a small code cleanup patch.

Several commands internally assemble command lines to call other
commands.  This includes initdb, pg_dumpall, and pg_regress.  (Also
pg_ctl, but that is different enough that I didn't consider it here.)
This has all evolved a bit organically, with fixed-size buffers, and
various optional command-line arguments being injected with
confusing-looking code, and the spacing between options handled in
inconsistent ways.  This patch cleans all this up a bit to look clearer
and be more easily extensible with new arguments and options.


+1


committed


We start each command with printfPQExpBuffer(), and then append
arguments as necessary with appendPQExpBuffer().  Also standardize on
using initPQExpBuffer() over createPQExpBuffer() where possible.
pg_regress uses StringInfo instead of PQExpBuffer, but many of the
same ideas apply.


It's a bit bogus to use PQExpBuffer for these. If you run out of memory, 
you silently get an empty string instead. StringInfo, which exits the 
process on OOM, would be more appropriate. We have tons of such 
inappropriate uses of PQExpBuffer in all our client programs, though, so 
I don't insist on fixing this particular case right now.


Interesting point.  But as you say better dealt with as a separate problem.
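
For reference, the pattern described above looks roughly like this; an
illustrative sketch with made-up variable names, not the committed code:

#include "postgres_fe.h"
#include "pqexpbuffer.h"

/*
 * Start the command line with printfPQExpBuffer(), then append optional
 * arguments with appendPQExpBuffer()/appendPQExpBufferStr().
 */
static void
build_command(const char *prog_path, const char *datadir,
              bool verbose, const char *logfile)
{
    PQExpBufferData cmd;

    initPQExpBuffer(&cmd);
    printfPQExpBuffer(&cmd, "\"%s\" -D \"%s\"", prog_path, datadir);
    if (verbose)
        appendPQExpBufferStr(&cmd, " --verbose");
    appendPQExpBuffer(&cmd, " > \"%s\" 2>&1", logfile);

    /* ... pass cmd.data to system() or similar ... */

    termPQExpBuffer(&cmd);
}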





Re: Add more sanity checks around callers of changeDependencyFor()

2023-07-04 Thread Michael Paquier
On Tue, Jul 04, 2023 at 02:40:04PM -0400, Tom Lane wrote:
> Alvaro Herrera  writes:
>> Hmm, shouldn't we disallow moving the function to another schema, if the
>> function's schema was originally determined at extension creation time?
>> I'm not sure we really want to allow moving objects of an extension to a
>> different schema.
> 
> Why not?  I do not believe that an extension's objects are required
> to all be in the same schema.

Yes, I don't see what we would gain by putting restrictions regarding
which schema an object is located in, depending on which schema an
extension uses.
--
Michael




Re: logicalrep_message_type throws an error

2023-07-04 Thread Amit Kapila
On Mon, Jul 3, 2023 at 6:32 PM Euler Taveira  wrote:
>
> On Mon, Jul 3, 2023, at 7:30 AM, Ashutosh Bapat wrote:
>
> logicalrep_message_type() is used to convert a logical message type code
> into a name while preparing error context or details. Thus, when this
> function is called, probably an ERROR has already been raised. If
> logicalrep_message_type() gets an unknown message type, it will throw
> an error, which will suppress the error for which we are building
> context or details. That's not useful. I think instead
> logicalrep_message_type() should return "unknown" when it encounters
> an unknown message type and let the original error message be thrown
> as is.
>
>
> Hmm. Good catch. The current behavior is:
>
> ERROR:  invalid logical replication message type "X"
> LOG:  background worker "logical replication worker" (PID 71800) exited with 
> exit code 1
>
> ... that hides the details. After providing a default message type:
>
> ERROR:  invalid logical replication message type "X"
> CONTEXT:  processing remote data for replication origin "pg_16638" during 
> message type "???" in transaction 796, finished at 0/16266F8
>

I think after returning "???" from logicalrep_message_type(), the
above is possible when we get the error: "invalid logical replication
message type "X"" from apply_dispatch(), right? If so, then what about
the case when we forgot to handle some message in
logicalrep_message_type() but handled it in apply_dispatch()? Isn't it
better to return the 'action' from the function
logicalrep_message_type() for unknown type? That way the information
could be a bit better and we can easily catch the code bug as well.
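
Just to illustrate the idea (a rough sketch only, not a proposed patch; the
real function has many more cases):

const char *
logicalrep_message_type(LogicalRepMsgType action)
{
    /* static buffer so we can return a string for an unknown type too */
    static char err_unknown[20];

    switch (action)
    {
        case LOGICAL_REP_MSG_BEGIN:
            return "BEGIN";
        case LOGICAL_REP_MSG_COMMIT:
            return "COMMIT";
            /* ... all the other known message types ... */
        default:
            /* don't ERROR here; report the raw action byte instead */
            snprintf(err_unknown, sizeof(err_unknown), "??? (%d)", action);
            return err_unknown;
    }
}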

-- 
With Regards,
Amit Kapila.




pg_upgrade and cross-library upgrades

2023-07-04 Thread Michael Paquier
Hi all,

After removing --with-openssl from its build of HEAD, snapper has
begun failing in the pg_upgrade path 11->HEAD, because it attempts
pg_upgrade from binaries that have OpenSSL to builds without it:
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=snapper=HEAD

Using the TAP tests of pg_upgrade, I can get the same failure with the
following steps:
1) Setup instance based on Postgres 11, compiled with OpenSSL.
2) Run a few tests and tap a dump:
# From 11 source tree:
make installcheck
cd contrib/pgcrypto/
USE_MODULE_DB=1 make installcheck
~/path/to/11/bin/pg_dumpall -f /tmp/olddump.sql
3) From 16~ source tree, compiled without OpenSSL:
cd src/bin/pg_upgrade
olddump=/tmp/olddump.sql oldinstall=~/path/to/11/ make check

And then you would get:
could not load library "$libdir/pgcrypto": ERROR:  could not access
file "$libdir/pgcrypto": No such file or directory
In database: contrib_regression_pgcrypto

The same thing as on HEAD could be done on snapper's back-branches by
removing --with-openssl, bringing more consistency, but pg_upgrade has never
been good at handling upgrades with different library requirements.
Something I am wondering is if AdjustUpgrade.pm could gain more
knowledge in this area, though I am unsure how much could be achieved
as this module has only object-level knowledge.

Anyway, that's not really limited to pgcrypto, any extension with
different cross-library requirements would see that.  One example,
xml2 could be compiled with libxml and without libxslt.  It is less
popular than pgcrypto, but the failure should be the same.

I'd rather choose the shortcut of removing --with-openssl from snapper
in the short term, but that does nothing for the root issue in the
long-term.

Thoughts?
--
Michael




Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

2023-07-04 Thread Peter Smith
Hi. Here are some review comments for this patch.

+1 for the patch idea.

--

I wasn't sure about the code comment adjustments suggested by Amit [1]:
"Accordingly, the comments atop build_replindex_scan_key(),
FindUsableIndexForReplicaIdentityFull(), IsIndexOnlyOnExpression()
should also be adjusted."

Actually, I thought the FindUsableIndexForReplicaIdentityFull()
function comment is *already* describing the limitation about the
leftmost column (see fragment below), so IIUC the Sawada-san patch is
only trying to expose that same information in the PG docs.

[FindUsableIndexForReplicaIdentityFull comment fragment]
 * We also skip indexes if the remote relation does not contain the leftmost
 * column of the index. This is because in most such cases sequential scan is
 * favorable over index scan.

~

OTOH, it may be better if these limitation rule details were not
scattered in the code. e.g. build_replindex_scan_key() function
comment can be simplified:

CURRENT:
 * This is not generic routine, it expects the idxrel to be a btree, non-partial
 * and have at least one column reference (i.e. cannot consist of only
 * expressions).

SUGGESTION:
This is not a generic routine. It expects the 'idxrel' to be an index
deemed "usable" by the function
FindUsableIndexForReplicaIdentityFull().

--
doc/src/sgml/logical-replication.sgml

1.
the key.  When replica identity FULL is specified,
indexes can be used on the subscriber side for searching the rows.
Candidate
indexes must be btree, non-partial, and have at least one column reference
-   (i.e. cannot consist of only expressions).  These restrictions
-   on the non-unique index properties adhere to some of the restrictions that
-   are enforced for primary keys.  If there are no such suitable indexes,
+   at the leftmost column indexes (i.e. cannot consist of only
expressions).  These
+   restrictions on the non-unique index properties adhere to some of
the restrictions
+   that are enforced for primary keys.  If there are no such suitable indexes,
the search on the subscriber side can be very inefficient, therefore
replica identity FULL should only be used as a
fallback if no other solution is possible.  If a replica identity other

Isn't this using the word "indexes" with different meanings in the
same sentence? e.g. IIUC "leftmost column indexes" is referring to the
ordinal number of the index fields. TBH, I am not sure the patch
wording is even describing the limitation in quite the same way as
what the code is actually doing.

HEAD (code comment):
 * We also skip indexes if the remote relation does not contain the leftmost
 * column of the index. This is because in most such cases sequential scan is
 * favorable over index scan.

HEAD (rendered docs)
Candidate indexes must be btree, non-partial, and have at least one
column reference (i.e. cannot consist of only expressions). These
restrictions on the non-unique index properties adhere to some of the
restrictions that are enforced for primary keys.

PATCHED (rendered docs)
Candidate indexes must be btree, non-partial, and have at least one
column reference at the leftmost column indexes (i.e. cannot consist
of only expressions). These restrictions on the non-unique index
properties adhere to some of the restrictions that are enforced for
primary keys.

MY SUGGESTION:
Candidate indexes must be btree, non-partial, and have at least one
column reference (i.e. cannot consist of only expressions).
Furthermore, the leftmost field of the candidate index must be a
column of the published table. These restrictions on the non-unique
index properties adhere to some of the restrictions that are enforced
for primary keys.

--
[1] Amit suggestions -
https://www.postgresql.org/message-id/CAA4eK1LZ_A-UmC_P%2B_hLi%2BPbwyqak%2BvRKemZ7imzk2puVTpHOA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [PATCH] Add loongarch native checksum implementation.

2023-07-04 Thread YANG Xudong

Is there any other comment?

If the patch looks OK, I would like to update its status to ready for 
committer in the commitfest.


Thanks!

On 2023/6/16 09:28, YANG Xudong wrote:

Updated the patch based on the comments.

On 2023/6/15 18:30, John Naylor wrote:


On Wed, Jun 14, 2023 at 9:20 AM YANG Xudong wrote:

 >
 > Attached a new patch with fixes based on the comment below.

Note: It's helpful to pass "-v" to git format-patch, to have different 
versions.




Added v2

 > > For x86 and Arm, if it fails to link without an -march flag, we allow
 > > for a runtime check. The flags "-march=armv8-a+crc" and "-msse4.2" are
 > > for instructions not found on all platforms. The patch also checks both
 > > ways, and each one results in "Use LoongArch CRC instruction
 > > unconditionally". The -march flag here is general, not specific. In
 > > other words, if this only runs inside "+elif host_cpu == 'loongarch64'",
 > > why do we need both with -march and without?
 > >
 >
 > Removed the elif branch.

Okay, since we've confirmed that no arch flag is necessary, some other 
places can be simplified:


--- a/src/port/Makefile
+++ b/src/port/Makefile
@@ -98,6 +98,11 @@ pg_crc32c_armv8.o: CFLAGS+=$(CFLAGS_CRC)
  pg_crc32c_armv8_shlib.o: CFLAGS+=$(CFLAGS_CRC)
  pg_crc32c_armv8_srv.o: CFLAGS+=$(CFLAGS_CRC)

+# all versions of pg_crc32c_loongarch.o need CFLAGS_CRC
+pg_crc32c_loongarch.o: CFLAGS+=$(CFLAGS_CRC)
+pg_crc32c_loongarch_shlib.o: CFLAGS+=$(CFLAGS_CRC)
+pg_crc32c_loongarch_srv.o: CFLAGS+=$(CFLAGS_CRC)

This was copy-and-pasted from platforms that use a runtime check, so 
should be unnecessary.




Removed these lines.

+# If the intrinsics are supported, sets pgac_loongarch_crc32c_intrinsics,
+# and CFLAGS_CRC.

+# Check if __builtin_loongarch_crcc_* intrinsics can be used
+# with the default compiler flags.
+# CFLAGS_CRC is set if the extra flag is required.

Same here -- it seems we don't need to set CFLAGS_CRC at all. Can you 
confirm?




We don't need to set CFLAGS_CRC as commented. I have updated the 
configure script to make it align with the logic in meson build script.


 > > Also, I don't have a Loongarch machine for testing. Could you show that
 > > the instructions are found in the binary, maybe using objdump and grep?
 > > Or a performance test?
 > >
 >
 > The output of the objdump command `objdump -dS
 > ../postgres-build/tmp_install/usr/local/pgsql/bin/postgres | grep -B 30 -A 10 crcc`
 > is attached.

Thanks for confirming.

--
John Naylor
EDB: http://www.enterprisedb.com 





Re: Initial Schema Sync for Logical Replication

2023-07-04 Thread Masahiko Sawada
On Mon, Jun 19, 2023 at 5:29 PM Peter Smith  wrote:
>
> Hi,
>
> Below are my review comments for the PoC patch 0001.
>
> In addition,  the patch needed rebasing, and, after I rebased it
> locally in my private environment there were still test failures:
> a) The 'make check' tests fail but only in a minor way due to a changed colname
> b) the subscription TAP test did not work at all for me -- many errors.

Thank you for reviewing the patch.

While updating the patch, I realized that the current approach won't
work well, or at least has a problem, with partitioned tables. If a
publication has a partitioned table with publish_via_root = false, the
subscriber launches tablesync workers for its partitions so that each
tablesync worker copies the data of one partition. Similarly, if it has a
partitioned table with publish_via_root = true, the subscriber launches
a tablesync worker for the parent table. With the current design,
since the tablesync worker is responsible for both schema and data
synchronization for the target table, it won't be possible to
synchronize both the parent table's schema and partitions' schema. For
example, there is no pg_subscription_rel entry for the parent table if
the publication has publish_via_root = false. In addition to that, we
need to be careful about the order of synchronization of the parent
table and its partitions. We cannot start schema synchronization for
partitions before its parent table. So it seems to me that we need to
consider another approach.
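
For reference, the two cases above correspond to roughly the following
(assuming publish_via_partition_root is the option abbreviated as
publish_via_root above; the table name is made up):

-- Partitions are published individually; the subscriber creates a
-- pg_subscription_rel entry and a tablesync worker per leaf partition.
CREATE PUBLICATION pub_leaves FOR TABLE parted
    WITH (publish_via_partition_root = false);

-- Changes are published as if they came from the parent; the subscriber
-- uses a single tablesync worker for the parent table only.
CREATE PUBLICATION pub_root FOR TABLE parted
    WITH (publish_via_partition_root = true);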

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Autogenerate some wait events code and documentation

2023-07-04 Thread Michael Paquier
On Tue, Jul 04, 2023 at 09:34:33AM +0200, Drouvot, Bertrand wrote:
> Yeah, with "capture" set to "false" then ninja alldocs does not error out
> and wait_event_types.sgml gets generated.
> 
> I'll look at the extra options --code and --docs.

+wait_event_types.sgml: $(top_srcdir)/src/backend/utils/activity/wait_event_names.txt $(top_srcdir)/src/backend/utils/activity/generate-wait_event_types.pl
+   $(PERL) $(top_srcdir)/src/backend/utils/activity/generate-wait_event_types.pl --docs $< > $@

This is doing the same error as meson in v10, there is no need for
the last part doing the redirection because the script outputs
nothing.  Here is the command generated:
make -C doc/src/sgml/ wait_event_types.sgml
'/usr/bin/perl'
../../../src/backend/utils/activity/generate-wait_event_types.pl
--docs ../../../src/backend/utils/activity/wait_event_names.txt >
wait_event_types.sgml
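
In other words, dropping the redirection should be enough, i.e. something
like this (a sketch; the script writes the file itself):

wait_event_types.sgml: $(top_srcdir)/src/backend/utils/activity/wait_event_names.txt $(top_srcdir)/src/backend/utils/activity/generate-wait_event_types.pl
	$(PERL) $(top_srcdir)/src/backend/utils/activity/generate-wait_event_types.pl --docs $<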

+wait_event_names = custom_target('wait_event_names',
+  input: files('../../backend/utils/activity/wait_event_names.txt'),
+  output: ['wait_event_types.h'],
This one was not completely correct (look at fmgrtab, for example), as
it is missing pgstat_wait_event.c in the output generated.  We could
perhaps be more selective with all that, including fmgrtab, but I have
left that out for now.  Note also the tweak with install_dir to not
install the C file.

+wait_event_names = custom_target('wait_event_names',
+  input: files('./wait_event_names.txt'),
+  output: ['pgstat_wait_event.c'],
+  command: [
+perl, files('./generate-wait_event_types.pl'),
+'-o', '@OUTDIR@', '--code',
+'@INPUT@'
+  ],
+  install: true,
+  install_dir: [false],
+)
[...]
+# these include .c files generated in ../../../include/activity, seems nicer to not
+# add that as an include path for the whole backend
+waitevent_sources = files(
   'wait_event.c',
 )
+
+backend_link_with += static_library('wait_event_names',
+  waitevent_sources,
+  dependencies: [backend_code],
+  include_directories: include_directories('../../../include/utils'),
+  kwargs: internal_lib_args,
+)

"wait_event_names" with the extra command should not be necessary
here, because we feed from the C file generated in src/include/utils/,
included in wait_event.c.  See src/backend/nodes/meson.build for a
similar example.

Two of the error messages after rename() in the script were
inconsistent.  So reworded these on the way.

I have added a usage() to the script, while on it.

The VPATH build was broken, because the following line was missing
from src/backend/utils/activity/Makefile to be able to detect
pgstat_wait_event.c from wait_event.c:
override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)

With all that in place, VPATH builds, the CI, meson, configure/make
and the various cleanup targets were working fine, so I have applied
it.  Now let's see what the buildfarm tells.

The final --stat number is like that:
 22 files changed, 757 insertions(+), 2111 deletions(-)
--
Michael




Make --help output fit within 80 columns per line

2023-07-04 Thread torikoshia

Hi,

As discussed in [1], the --help output of some commands fits into 80 columns
per line, while that of others does not.

Since it seems preferable to have a consistent line break policy, and some
people use 80-column terminals, wouldn't it be better to make all commands
fit within 80 columns per line?

Attached is a patch which does this for the src/bin commands.

If this is the way to go, I'll do the same for the contrib commands.

[1] 
https://www.postgresql.org/message-id/3fe4af5a0a81fc6a2ec01cb484c0a487%40oss.nttdata.com



--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

From 4505bbcf0efe199e34159696d64bd9f8cc60fb37 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Wed, 5 Jul 2023 10:14:58 +0900
Subject: [PATCH v1] Make --help output fit within 80 columns per line

Outputs of --help for some commands fits into 80 columns per line,
while others do not.
For the consistency and for 80-column terminal, this patch makes them
fit within 80 columns per line.

---
 src/bin/initdb/initdb.c   |  6 --
 src/bin/pg_amcheck/pg_amcheck.c   | 21 ---
 src/bin/pg_archivecleanup/pg_archivecleanup.c |  2 +-
 src/bin/pg_basebackup/pg_receivewal.c | 12 +++
 src/bin/pg_basebackup/pg_recvlogical.c| 18 ++--
 src/bin/pg_checksums/pg_checksums.c   |  3 ++-
 src/bin/pg_dump/pg_dump.c |  3 ++-
 src/bin/pg_dump/pg_dumpall.c  |  3 ++-
 src/bin/pg_dump/pg_restore.c  |  6 --
 src/bin/pg_upgrade/option.c   |  9 +---
 src/bin/pgbench/pgbench.c |  6 --
 src/bin/psql/help.c   |  9 +---
 src/bin/scripts/createdb.c|  3 ++-
 src/bin/scripts/pg_isready.c  |  3 ++-
 src/bin/scripts/vacuumdb.c| 21 ---
 15 files changed, 83 insertions(+), 42 deletions(-)

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index fc1fb363e7..ccdd093e24 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2429,8 +2429,10 @@ usage(const char *progname)
 	printf(_("  %s [OPTION]... [DATADIR]\n"), progname);
 	printf(_("\nOptions:\n"));
 	printf(_("  -A, --auth=METHOD default authentication method for local connections\n"));
-	printf(_("  --auth-host=METHODdefault authentication method for local TCP/IP connections\n"));
-	printf(_("  --auth-local=METHOD   default authentication method for local-socket connections\n"));
+	printf(_("  --auth-host=METHODdefault authentication method for local TCP/IP\n"
+			 "connections\n"));
+	printf(_("  --auth-local=METHOD   default authentication method for local-socket\n"
+			 "connections\n"));
 	printf(_(" [-D, --pgdata=]DATADIR location for this database cluster\n"));
 	printf(_("  -E, --encoding=ENCODING   set default encoding for new databases\n"));
 	printf(_("  -g, --allow-group-access  allow group read/execute on data directory\n"));
diff --git a/src/bin/pg_amcheck/pg_amcheck.c b/src/bin/pg_amcheck/pg_amcheck.c
index 68f8180c19..754a2723ce 100644
--- a/src/bin/pg_amcheck/pg_amcheck.c
+++ b/src/bin/pg_amcheck/pg_amcheck.c
@@ -1150,17 +1150,23 @@ help(const char *progname)
 	printf(_("  -S, --exclude-schema=PATTERNdo NOT check matching schema(s)\n"));
 	printf(_("  -t, --table=PATTERN check matching table(s)\n"));
 	printf(_("  -T, --exclude-table=PATTERN do NOT check matching table(s)\n"));
-	printf(_("  --no-dependent-indexes  do NOT expand list of relations to include indexes\n"));
-	printf(_("  --no-dependent-toastdo NOT expand list of relations to include TOAST tables\n"));
+	printf(_("  --no-dependent-indexes  do NOT expand list of relations to include\n"
+			 "  indexes\n"));
+	printf(_("  --no-dependent-toastdo NOT expand list of relations to include\n"
+			 "  TOAST tables\n"));
 	printf(_("  --no-strict-names   do NOT require patterns to match objects\n"));
 	printf(_("\nTable checking options:\n"));
 	printf(_("  --exclude-toast-pointersdo NOT follow relation TOAST pointers\n"));
 	printf(_("  --on-error-stop stop checking at end of first corrupt page\n"));
-	printf(_("  --skip=OPTION   do NOT check \"all-frozen\" or \"all-visible\" blocks\n"));
-	printf(_("  --startblock=BLOCK  begin checking table(s) at the given block number\n"));
-	printf(_("  --endblock=BLOCKcheck table(s) only up to the given block number\n"));
+	printf(_("  --skip=OPTION   do NOT check \"all-frozen\" or \"all-visible\"\n"
+			 "  blocks\n"));
+	printf(_("  --startblock=BLOCK  begin checking table(s) at the given block\n"
+			 "  number\n"));
+	

Re: Extension Enhancement: Buffer Invalidation in pg_buffercache

2023-07-04 Thread jian he
On Sat, Jul 1, 2023 at 6:09 AM Thomas Munro  wrote:
>
>
> It should be restricted by role, but I wonder which role it should be.
> Testing for superuser is now out of fashion.
>

as pg_buffercache/pg_buffercache--1.2--1.3.sql. You need pg_maintain
privilege to use pg_buffercache.
The following query works on a single user. Obviously you need a role who
can gain pg_monitor privilege.

begin;
create role test login nosuperuser;
grant select, insert on onek to test;
grant pg_monitor to test;
set role test;
select count(*) from onek;
insert into onek values(default);
(SELECT count(*) FROM pg_buffercache WHERE relfilenode =
pg_relation_filenode('onek'::regclass))
except
(
select count(pg_buffercache_invalidate(bufferid))
from pg_buffercache where relfilenode =
pg_relation_filenode('onek'::regclass)
);

rollback;


Re: brininsert optimization opportunity

2023-07-04 Thread Soumyadeep Chakraborty
On Tue, Jul 4, 2023 at 2:54 PM Tomas Vondra
 wrote:

> I'm not sure if memory context callbacks are the right way to rely on
> for this purpose. The primary purpose of memory contexts is to track
> memory, so using them for this seems a bit weird.

Yeah, this just kept getting dirtier and dirtier.

> There are cases that do something similar, like expandendrecord.c where
> we track refcounted tuple slot, but IMHO there's a big difference
> between tracking one slot allocated right there, and unknown number of
> buffers allocated much later.

Yeah, the following code in ER_mc_callback is there, I think, to prevent
double-freeing the tupdesc (since it might be freed in
ResourceOwnerReleaseInternal()) (the part about: /* Ditto for tupdesc
references */).

if (tupdesc->tdrefcount > 0)
{
if (--tupdesc->tdrefcount == 0)
FreeTupleDesc(tupdesc);
}
Plus the above code doesn't try anything with Resource owner stuff, whereas
releasing a buffer means:
ReleaseBuffer() -> UnpinBuffer() ->
ResourceOwnerForgetBuffer(CurrentResourceOwner, b);

> The fact that even with the extra context is still doesn't handle query
> cancellations is another argument against that approach (I wonder how
> expandedrecord.c handles that, but I haven't checked).
>
> >
> > Maybe there is a better way of doing our cleanup? I'm not sure. Would
> > love your input!
> >
> > The other alternative for all this is to introduce new AM callbacks for
> > insert_begin and insert_end. That might be a tougher sell?
> >
>
> That's the approach I wanted to suggest, more or less - to do the
> cleanup from ExecCloseIndices() before index_close(). I wonder if it's
> even correct to do that later, once we release the locks etc.

I'll try this out and introduce a couple of new index AM callbacks. I
think it's best to do it before releasing the locks - otherwise it might
be weird to manipulate buffers of an index relation without having some
sort of lock on it. I'll think about it some more.
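
Roughly, what I have in mind is something along these lines (callback name,
signature and placement are all hypothetical here, not an agreed API):

/* Hypothetical new AM callback; the name is made up for illustration. */
typedef void (*aminsertcleanup_function) (IndexInfo *indexInfo);

/*
 * Sketch of ExecCloseIndices() giving the AM a chance to release buffers
 * (or whatever else it cached in ii_AmCache) while we still hold the lock,
 * before index_close().
 */
void
ExecCloseIndices(ResultRelInfo *resultRelInfo)
{
    int         i;

    for (i = 0; i < resultRelInfo->ri_NumIndices; i++)
    {
        Relation    indexRelation = resultRelInfo->ri_IndexRelationDescs[i];
        IndexInfo  *indexInfo = resultRelInfo->ri_IndexRelationInfo[i];

        if (indexRelation == NULL)
            continue;

        /* hypothetical per-AM cleanup hook */
        if (indexRelation->rd_indam->aminsertcleanup)
            indexRelation->rd_indam->aminsertcleanup(indexInfo);

        index_close(indexRelation, RowExclusiveLock);
    }
}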

> I don't think ii_AmCache was intended for stuff like this - GIN and GiST
> only use it to cache stuff that can be just pfree-d, but for buffers
> that's not enough. It's not surprising we need to improve this.

Hmmm, yes, although the docs state:
"If the index AM wishes to cache data across successive index insertions within
an SQL statement, it can allocate space in indexInfo->ii_Context and
store a pointer
to the data in indexInfo->ii_AmCache (which will be NULL initially)."
they don't mention anything about buffer usage. Well we will fix it!

PS: It should be possible to make GIN and GiST use the new index AM APIs
as well.

> FWIW while debugging this (breakpoint on MemoryContextDelete) I was
> rather annoyed the COPY keeps dropping and recreating the two BRIN
> contexts - brininsert cxt / brin dtuple. I wonder if we could keep and
> reuse those too, but I don't know how much it'd help.
>

Interesting, I will investigate that.

> > Now, to finally answer your question about the speedup without
> > generate_series(). We do see an even higher speedup!
> >
> > seq 1 2 > /tmp/data.csv
> > \timing
> > DROP TABLE heap;
> > CREATE TABLE heap(i int);
> > CREATE INDEX ON heap USING brin(i) WITH (pages_per_range=1);
> > COPY heap FROM '/tmp/data.csv';
> >
> > -- 3 runs (master 29cf61ade3f245aa40f427a1d6345287ef77e622)
> > COPY 2
> > Time: 205072.444 ms (03:25.072)
> > Time: 215380.369 ms (03:35.380)
> > Time: 203492.347 ms (03:23.492)
> >
> > -- 3 runs (branch v2)
> >
> > COPY 2
> > Time: 135052.752 ms (02:15.053)
> > Time: 135093.131 ms (02:15.093)
> > Time: 138737.048 ms (02:18.737)
> >
>
> That's nice, but it still doesn't say how much of that is reading the
> data. If you do just copy into a table without any indexes, how long
> does it take?

So, I loaded the same heap table without any indexes and at the same
scale. I got:

postgres=# COPY heap FROM '/tmp/data.csv';
COPY 2
Time: 116161.545 ms (01:56.162)
Time: 114182.745 ms (01:54.183)
Time: 114975.368 ms (01:54.975)

perf diff also attached between the three: w/ no indexes (baseline),
master and v2.

Regards,
Soumyadeep (VMware)


perf_diff_3way.out
Description: Binary data


[PATCH] Add GitLab CI to PostgreSQL

2023-07-04 Thread Newhouse, Robin
Hello!

I propose the attached patch to be applied on the 'master' branch
of PostgreSQL to add GitLab CI automation alongside Cirrus CI in the PostgreSQL 
repository.

It is not intended to be a replacement for Cirrus CI, but simply a suggestion
for the PostgreSQL project to centrally host a GitLab CI definition for those
who prefer to use it while developing/testing PostgreSQL.

The intent is to facilitate collaboration among GitLab users, promote
standardization and consistency, and ultimately, improve testing and code quality.

Robin Newhouse
Amazon Web Services: https://aws.amazon.com


v1-0001-Add-GitLab-CI-to-PostgreSQL.patch
Description: v1-0001-Add-GitLab-CI-to-PostgreSQL.patch


Re: Deleting prepared statements from libpq.

2023-07-04 Thread Michael Paquier
On Tue, Jul 04, 2023 at 04:09:43PM +0900, Michael Paquier wrote:
> On Tue, Jul 04, 2023 at 08:28:40AM +0900, Michael Paquier wrote:
>> Sure, feel free.  I was planning to look at and play more with it.
>
> Well, done.

For the sake of completeness, as I forgot to send my notes.

+   if (PQsendClosePrepared(conn, "select_one") != 1)
+   pg_fatal("PQsendClosePortal failed: %s", PQerrorMessage(conn));
There was a small copy-pasto here.
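
Presumably the message was meant to name the function actually being called,
i.e. something like:

+   if (PQsendClosePrepared(conn, "select_one") != 1)
+       pg_fatal("PQsendClosePrepared failed: %s", PQerrorMessage(conn));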
--
Michael




Re: Experiments with Postgres and SSL

2023-07-04 Thread Michael Paquier
On Tue, Jul 04, 2023 at 05:15:49PM +0300, Heikki Linnakangas wrote:
> I don't see the point of the libpq 'sslalpn' option either. Let's send ALPN
> always.
> 
> Admittedly having the options make testing different of combinations of old
> and new clients and servers a little easier. But I don't think we should add
> options for the sake of backwards compatibility tests.

Hmm.  I would actually argue in favor of having these with tests in
core to stress the previous SSL handshake protocol, as not having these
parameters would mean that we rely only on major version upgrades in
the buildfarm to test the backward-compatible code path, making issues
much harder to catch.  And we still need to maintain the
backward-compatible path for 10 years based on what pg_dump and
pg_upgrade need to support.
--
Michael




Re: check_strxfrm_bug()

2023-07-04 Thread Thomas Munro
On Wed, Jul 5, 2023 at 10:15 AM Thomas Munro  wrote:
> [1] https://cirrus-ci.com/build/5298278007308288

That'll teach me to be impatient.  I only waited for compiling to
finish after triggering the optional extra MinGW task before sending
the above email, figuring that the only risk was there, but then the
pg_upgrade task failed due to mismatched locales.  Apparently there is
something I don't understand yet about locale_t support under MinGW.




Re: check_strxfrm_bug()

2023-07-04 Thread Thomas Munro
On Tue, Jul 4, 2023 at 2:52 AM Tristan Partin  wrote:
> The patch looks good to me as well. Happy to rebase my other patch on
> this one.

Thanks.  Here is a slightly tidier version.  It passes on CI[1]
including the optional extra MinGW64/Meson task, and the
MinGW64/autoconf configure+build that is in the SanityCheck task.
There are two questions I'm hoping to get feedback on:  (1) I believe
that defining HAVE_MBSTOWCS_L etc in win32_port.h is the best idea
because that is also where we define mbstowcs_l() etc.  Does that make
sense?  (2) IIRC, ye olde Solution.pm system might break if I were to
completely remove HAVE_MBSTOWCS_L and HAVE_WCSTOMBS_L from Solution.pm
(there must be a check somewhere that compares it with pg_config.h.in
or something like that), but it would also break if I defined them as
1 there (macro redefinition).  Will undef in Solution.pm be
acceptable (ie define nothing to avoid redefinition, but side-step the
sanity check)?  It's a bit of a kludge, but IIRC we're dropping that
3rd build system in 17 so maybe that's OK?  (Not tested as I don't
have Windows and CI doesn't test Solution.pm, so I'd be grateful if
someone who has Windows/Solution.pm setup could try this.)
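
For example, something like this in the Solution.pm define table (a sketch
only):

    HAVE_MBSTOWCS_L => undef,
    HAVE_WCSTOMBS_L => undef,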

[1] https://cirrus-ci.com/build/5298278007308288
From ab0705a9e169627a77bf4e3cc36ea53603162921 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 4 Jul 2023 16:32:35 +1200
Subject: [PATCH v2] All supported systems have locale_t.

locale_t is defined by POSIX.1-2008 and SUSv4, and available on all
targeted systems.  For Windows, win32_port.h redirects to a partial
implementation called _locale_t.  It's time to remove a lot of
compile-time tests for HAVE_LOCALE_T, and a few associated comments and
dead code branches.

Definitions for HAVE_WCSTOMBS_L and HAVE_MBSTOWCS_L had to be moved into
win32_port.h, because this change revealed that MinGW builds on Windows
were failing to detect these functions, but without them we couldn't
build due to the assumption that uselocale() exists.

Reviewed-by: Noah Misch 
Reviewed-by: Tristan Partin 
Discussion: https://postgr.es/m/CA%2BhUKGLg7_T2GKwZFAkEf0V7vbnur-NfCjZPKZb%3DZfAXSV1ORw%40mail.gmail.com
---
 config/c-library.m4  | 10 ++
 configure|  5 ---
 meson.build  | 22 +++-
 src/backend/commands/collationcmds.c |  2 +-
 src/backend/regex/regc_pg_locale.c   | 52 +---
 src/backend/utils/adt/formatting.c   | 18 --
 src/backend/utils/adt/like.c |  2 --
 src/backend/utils/adt/like_support.c |  2 --
 src/backend/utils/adt/pg_locale.c| 36 ---
 src/include/pg_config.h.in   |  3 --
 src/include/port/win32_port.h|  3 ++
 src/include/utils/pg_locale.h|  7 +---
 src/tools/msvc/Solution.pm   |  5 ++-
 13 files changed, 16 insertions(+), 151 deletions(-)

diff --git a/config/c-library.m4 b/config/c-library.m4
index c1dd804679..aa8223d2ef 100644
--- a/config/c-library.m4
+++ b/config/c-library.m4
@@ -86,9 +86,9 @@ AC_DEFUN([PGAC_STRUCT_SOCKADDR_SA_LEN],
 # PGAC_TYPE_LOCALE_T
 # --
 # Check for the locale_t type and find the right header file.  macOS
-# needs xlocale.h; standard is locale.h, but glibc also has an
-# xlocale.h file that we should not use.
-#
+# needs xlocale.h; standard is locale.h, but glibc <= 2.25 also had an
+# xlocale.h file that we should not use, so we check the standard
+# header first.
 AC_DEFUN([PGAC_TYPE_LOCALE_T],
 [AC_CACHE_CHECK([for locale_t], pgac_cv_type_locale_t,
 [AC_COMPILE_IFELSE([AC_LANG_PROGRAM(
@@ -102,10 +102,6 @@ locale_t x;],
 [])],
 [pgac_cv_type_locale_t='yes (in xlocale.h)'],
 [pgac_cv_type_locale_t=no])])])
-if test "$pgac_cv_type_locale_t" != no; then
-  AC_DEFINE(HAVE_LOCALE_T, 1,
-[Define to 1 if the system has the type `locale_t'.])
-fi
 if test "$pgac_cv_type_locale_t" = 'yes (in xlocale.h)'; then
   AC_DEFINE(LOCALE_T_IN_XLOCALE, 1,
 [Define to 1 if `locale_t' requires .])
diff --git a/configure b/configure
index 997d42d8f7..f1d4630007 100755
--- a/configure
+++ b/configure
@@ -15120,11 +15120,6 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
 fi
 { $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_type_locale_t" >&5
 $as_echo "$pgac_cv_type_locale_t" >&6; }
-if test "$pgac_cv_type_locale_t" != no; then
-
-$as_echo "#define HAVE_LOCALE_T 1" >>confdefs.h
-
-fi
 if test "$pgac_cv_type_locale_t" = 'yes (in xlocale.h)'; then
 
 $as_echo "#define LOCALE_T_IN_XLOCALE 1" >>confdefs.h
diff --git a/meson.build b/meson.build
index 3ea4b0d72a..82a966fcc2 100644
--- a/meson.build
+++ b/meson.build
@@ -2283,17 +2283,12 @@ else
   cdata.set('STRERROR_R_INT', false)
 endif
 
-# Check for the locale_t type and find the right header file.  macOS
-# needs xlocale.h; standard is locale.h, but glibc also has an
-# xlocale.h file that we should not use.  MSVC has a replacement
-# defined in src/include/port/win32_port.h.
-if 

Re: Parallel CREATE INDEX for BRIN indexes

2023-07-04 Thread Tomas Vondra



On 7/4/23 23:53, Matthias van de Meent wrote:
> On Thu, 8 Jun 2023 at 14:55, Tomas Vondra  
> wrote:
>>
>> Hi,
>>
>> Here's a WIP patch allowing parallel CREATE INDEX for BRIN indexes. The
>> infrastructure (starting workers etc.) is "inspired" by the BTREE code
>> (i.e. copied from that and massaged a bit to call brin stuff).
> 
> Nice work.
> 
>> In both cases _brin_end_parallel then reads the summaries from worker
>> files, and adds them into the index. In 0001 this is fairly simple,
>> although we could do one more improvement and sort the ranges by range
>> start to make the index nicer (and possibly a bit more efficient). This
>> should be simple, because the per-worker results are already sorted like
>> that (so a merge sort in _brin_end_parallel would be enough).
> 
> I see that you manually built the passing and sorting of tuples
> between workers, but can't we use the parallel tuplesort
> infrastructure for that? It already has similar features in place and
> improves code commonality.
> 

Maybe. I wasn't that familiar with what parallel tuplesort can and can't
do, and the little I knew I managed to forget since I wrote this patch.
Which similar features do you have in mind?

The workers are producing the results in "start_block" order, so if they
pass that to the leader, it probably can do the usual merge sort.

>> For 0002 it's a bit more complicated, because with a single parallel
>> scan brinbuildCallbackParallel can't decide if a range is assigned to a
>> different worker or empty. And we want to generate summaries for empty
>> ranges in the index. We could either skip such range during index build,
>> and then add empty summaries in _brin_end_parallel (if needed), or add
>> them and then merge them using "union".
>>
>>
>> I just realized there's a third option to do this - we could just do
>> regular parallel scan (with no particular regard to pagesPerRange), and
>> then do "union" when merging results from workers. It doesn't require
>> the sequence of TID scans, and the union would also handle the empty
>> ranges. The per-worker results might be much larger, though, because
>> each worker might produce up to the "full" BRIN index.
> 
> Would it be too much effort to add a 'min_chunk_size' argument to
> table_beginscan_parallel (or ParallelTableScanDesc) that defines the
> minimum granularity of block ranges to be assigned to each process? I
> think that would be the most elegant solution that would require
> relatively little effort: table_block_parallelscan_nextpage already
> does parallel management of multiple chunk sizes, and I think this
> modification would fit quite well in that code.
> 

I'm confused. Isn't that pretty much exactly what 0002 does? I mean,
that passes pagesPerRange to table_parallelscan_initialize(), so that
each pagesPerRange is assigned to a single worker.

The trouble I described above is that the scan returns tuples, and the
consumer has no idea what was the chunk size or how many other workers
are there. Imagine you get a tuple from block 1, and then a tuple from
block 1000. Does that mean that the blocks in between are empty or that
they were processed by some other worker?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: brininsert optimization opportunity

2023-07-04 Thread Tomas Vondra
On 7/4/23 21:25, Soumyadeep Chakraborty wrote:
> Thank you both for reviewing!
> 
> On Tue, Jul 4, 2023 at 4:24AM Alvaro Herrera  wrote:
> 
>> Hmm, yeah, I remember being bit bothered by this repeated
>> initialization. Your patch looks reasonable to me. I would set
>> bistate->bs_rmAccess to NULL in the cleanup callback, just to be sure.
>> Also, please add comments atop these two new functions, to explain what
>> they are.
> 
> Done. Set bistate->bs_desc = NULL; as well. Added comments.
> 
> 
> On Tue, Jul 4, 2023 at 4:59AM Tomas Vondra
>  wrote:
> 
>> Yeah. I wonder how much of that runtime is the generate_series(),
>> though. What's the speedup if that part is subtracted. It's guaranteed
>> to be even more significant, but by how much?
> 
> When trying COPY, I got tripped by the following:
> 
> We get a buffer leak WARNING for the meta page and a revmap page.
> 
> WARNING:  buffer refcount leak: [094] (rel=base/156912/206068,
> blockNum=1, flags=0x8300, refcount=1 1)
> WARNING:  buffer refcount leak: [093] (rel=base/156912/206068,
> blockNum=0, flags=0x8300, refcount=1 1)
> 
> PrintBufferLeakWarning bufmgr.c:3240
> ResourceOwnerReleaseInternal resowner.c:554
> ResourceOwnerRelease resowner.c:494
> PortalDrop portalmem.c:563
> exec_simple_query postgres.c:1284
> 
> We release the buffer during this resowner release and then we crash
> with:
> 
> TRAP: failed Assert("bufnum <= NBuffers"), File:
> "../../../../src/include/storage/bufmgr.h", Line: 305, PID: 86833
> postgres: pivotal test4 [local] 
> COPY(ExceptionalCondition+0xbb)[0x5572b55bcc79]
> postgres: pivotal test4 [local] COPY(+0x61ccfc)[0x5572b537dcfc]
> postgres: pivotal test4 [local] COPY(ReleaseBuffer+0x19)[0x5572b5384db2]
> postgres: pivotal test4 [local] COPY(brinRevmapTerminate+0x1e)[0x5572b4e3fd39]
> postgres: pivotal test4 [local] COPY(+0xcfc44)[0x5572b4e30c44]
> postgres: pivotal test4 [local] COPY(+0x89e7f2)[0x5572b55ff7f2]
> postgres: pivotal test4 [local] COPY(MemoryContextDelete+0xd7)[0x5572b55ff683]
> postgres: pivotal test4 [local] COPY(PortalDrop+0x374)[0x5572b5602dc7]
> 
> Unfortunately, when we do COPY, the MemoryContext where makeIndexInfo
> gets called is PortalContext and that is what is set in ii_Context.
> Furthermore, we clean up the resource owner stuff before we can clean
> up the MemoryContexts in PortalDrop().
> 
> The CurrentMemoryContext when initialize_brin_insertstate() is called
> depends. For CopyMultiInsertBufferFlush() -> ExecInsertIndexTuples()
> it is PortalContext, and for CopyFrom() -> ExecInsertIndexTuples() it is
> ExecutorState/ExprContext. We can't rely on it to register the callback
> either.
>
> What we can do is create a new MemoryContext for holding the
> BrinInsertState, and we tie the callback to that so that cleanup is not
> affected by all of these variables. See v2 patch attached. Passes make
> installcheck-world and make installcheck -C src/test/modules/brin.
>
> However, we do still have 1 issue with the v2 patch:
> When we try to cancel (Ctrl-c) a running COPY command:
> ERROR:  buffer 151 is not owned by resource owner TopTransaction
> 

I'm not sure if memory context callbacks are the right way to rely on
for this purpose. The primary purpose of memory contexts is to track
memory, so using them for this seems a bit weird.

There are cases that do something similar, like expandendrecord.c where
we track refcounted tuple slot, but IMHO there's a big difference
between tracking one slot allocated right there, and unknown number of
buffers allocated much later.

The fact that even with the extra context is still doesn't handle query
cancellations is another argument against that approach (I wonder how
expandedrecord.c handles that, but I haven't checked).

> 
> Maybe there is a better way of doing our cleanup? I'm not sure. Would
> love your input!
> 
> The other alternative for all this is to introduce new AM callbacks for
> insert_begin and insert_end. That might be a tougher sell?
> 

That's the approach I wanted to suggest, more or less - to do the
cleanup from ExecCloseIndices() before index_close(). I wonder if it's
even correct to do that later, once we release the locks etc.

I don't think ii_AmCache was intended for stuff like this - GIN and GiST
only use it to cache stuff that can be just pfree-d, but for buffers
that's not enough. It's not surprising we need to improve this.

FWIW while debugging this (breakpoint on MemoryContextDelete) I was
rather annoyed the COPY keeps dropping and recreating the two BRIN
contexts - brininsert cxt / brin dtuple. I wonder if we could keep and
reuse those too, but I don't know how much it'd help.

> Now, to finally answer your question about the speedup without
> generate_series(). We do see an even higher speedup!
> 
> seq 1 2 > /tmp/data.csv
> \timing
> DROP TABLE heap;
> CREATE TABLE heap(i int);
> CREATE INDEX ON heap USING brin(i) WITH (pages_per_range=1);
> COPY heap FROM '/tmp/data.csv';
> 
> -- 3 runs (master 

Re: Parallel CREATE INDEX for BRIN indexes

2023-07-04 Thread Matthias van de Meent
On Thu, 8 Jun 2023 at 14:55, Tomas Vondra  wrote:
>
> Hi,
>
> Here's a WIP patch allowing parallel CREATE INDEX for BRIN indexes. The
> infrastructure (starting workers etc.) is "inspired" by the BTREE code
> (i.e. copied from that and massaged a bit to call brin stuff).

Nice work.

> In both cases _brin_end_parallel then reads the summaries from worker
> files, and adds them into the index. In 0001 this is fairly simple,
> although we could do one more improvement and sort the ranges by range
> start to make the index nicer (and possibly a bit more efficient). This
> should be simple, because the per-worker results are already sorted like
> that (so a merge sort in _brin_end_parallel would be enough).

I see that you manually built the passing and sorting of tuples
between workers, but can't we use the parallel tuplesort
infrastructure for that? It already has similar features in place and
improves code commonality.

> For 0002 it's a bit more complicated, because with a single parallel
> scan brinbuildCallbackParallel can't decide if a range is assigned to a
> different worker or empty. And we want to generate summaries for empty
> ranges in the index. We could either skip such range during index build,
> and then add empty summaries in _brin_end_parallel (if needed), or add
> them and then merge them using "union".
>
>
> I just realized there's a third option to do this - we could just do
> regular parallel scan (with no particular regard to pagesPerRange), and
> then do "union" when merging results from workers. It doesn't require
> the sequence of TID scans, and the union would also handle the empty
> ranges. The per-worker results might be much larger, though, because
> each worker might produce up to the "full" BRIN index.

Would it be too much effort to add a 'min_chunk_size' argument to
table_beginscan_parallel (or ParallelTableScanDesc) that defines the
minimum granularity of block ranges to be assigned to each process? I
think that would be the most elegant solution that would require
relatively little effort: table_block_parallelscan_nextpage already
does parallel management of multiple chunk sizes, and I think this
modification would fit quite well in that code.

Kind regards,

Matthias van de Meent




Re: PG 16 draft release notes ready

2023-07-04 Thread Bruce Momjian
On Tue, Jul  4, 2023 at 03:31:05PM +0900, Michael Paquier wrote:
> On Thu, May 18, 2023 at 04:49:47PM -0400, Bruce Momjian wrote:
> > I have completed the first draft of the PG 16 release notes.  You can
> > see the output here:
> > 
> > https://momjian.us/pgsql_docs/release-16.html
> > 
> > I will adjust it to the feedback I receive;  that URL will quickly show
> > all updates.
> 
> Sawada-san has mentioned on twitter that fdd8937 is not mentioned in
> the release notes, and it seems to me that he is right.  This is
> described as a bug in the commit log, but it did not get backpatched
> because of the lack of complaints.  Also, because we've removed
> support for anything older than Windows 10 in PG16, this change very
> easy to do.

I did review this and wasn't sure exactly what I would describe.  It is
saying huge pages will now work on some versions of Windows 10 but
didn't before?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: pg_basebackup check vs Windows file path limits

2023-07-04 Thread Daniel Gustafsson
> On 4 Jul 2023, at 20:19, Andrew Dunstan  wrote:

> But sadly we're kinda back where we started. fairywren is failing on 
> REL_16_STABLE. Before the changes the failure occurred because the test 
> script was unable to create the file with a path > 255. Now that we have a 
> way to create the file the test for pg_basebackup to reject files with names 
> > 100 fails, I presume because the server can't actually see the file. At 
> this stage I'm thinking the best thing would be to skip the test altogether 
> on windows if the path is longer than 255.

That does sound like a fairly large hammer for a nail small enough that we
should be able to fix it, but I don't have any other good ideas off the cuff.

--
Daniel Gustafsson





Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-07-04 Thread Melih Mutlu
On Tue, Jul 4, 2023 at 08:42, Hayato Kuroda (Fujitsu) wrote:
> > > But in the later patch the tablesync worker tries to reuse the slot 
> > > during the
> > > synchronization, so in this case the application_name should be same as
> > slotname.
> > >
> >
> > Fair enough. I am slightly afraid that if we can't show the benefits
> > with later patches then we may need to drop them but at this stage I
> > feel we need to investigate why those are not helping?
>
> Agreed. Now I'm planning to do performance testing independently. We can 
> discuss
> based on that or Melih's one.

Here I attached  what I use for performance testing of this patch.

I only benchmarked the patch set with reusing connections very roughly
so far, but it seems to improve things quite significantly. For example,
it took 611 ms to sync 100 empty tables, versus 1782 ms without
reusing connections.
First 3 patches from the set actually bring a good amount of
improvement, but not sure about the later patches yet.

On Mon, Jul 3, 2023 at 08:59, Amit Kapila wrote:
> On thinking about this, I think the primary benefit we were expecting
> by saving network round trips for slot drop/create but now that we
> anyway need an extra round trip to establish a snapshot, so such a
> benefit was not visible. This is just a theory so we should validate
> it. The another idea as discussed before [1] could be to try copying
> multiple tables in a single transaction. Now, keeping a transaction
> open for a longer time could have side-effects on the publisher node.
> So, we probably need to ensure that we don't perform multiple large
> syncs and even for smaller tables (and later sequences) perform it
> only for some threshold number of tables which we can figure out by
> some tests. Also, the other safety-check could be that anytime we need
> to perform streaming (sync with apply worker), we won't copy more
> tables in same transaction.
>
> Thoughts?

Yeah, maybe going to the publisher for creating a slot or only a
snapshot does not really make enough difference. I was hoping that
creating only a snapshot using an existing replication slot would help the
performance. I guess I was either wrong or am missing something in the
implementation.

The tricky bit with keeping a long transaction to copy multiple tables
is deciding how many tables one transaction can copy.

Thanks,
-- 
Melih Mutlu
Microsoft
--- on publisher
SELECT 'CREATE TABLE manytables_'||i||'(i int);' FROM generate_series(1, 100) 
g(i) \gexec
SELECT pg_create_logical_replication_slot('mysub_slot', 'pgoutput');

--- on subscriber
SELECT 'CREATE TABLE manytables_'||i||'(i int);' FROM generate_series(1, 100) 
g(i) \gexec

CREATE OR REPLACE PROCEDURE log_rep_test(max INTEGER) AS $$
DECLARE
counter INTEGER := 1;
total_duration INTERVAL := '0';
avg_duration FLOAT := 0.0;
start_time TIMESTAMP;
end_time TIMESTAMP;
BEGIN
WHILE counter <= max LOOP

EXECUTE 'DROP SUBSCRIPTION IF EXISTS mysub;';

start_time := clock_timestamp();
EXECUTE 'CREATE SUBSCRIPTION mysub CONNECTION ''dbname=postgres 
port=5432'' PUBLICATION mypub WITH (create_slot=false, 
slot_name=''mysub_slot'');';
COMMIT;

WHILE EXISTS (SELECT 1 FROM pg_subscription_rel WHERE srsubstate != 
'r') LOOP
COMMIT;
END LOOP;

end_time := clock_timestamp();


EXECUTE 'ALTER SUBSCRIPTION mysub DISABLE;';
EXECUTE 'ALTER SUBSCRIPTION mysub SET (slot_name = none);';


total_duration := total_duration + (end_time - start_time);

counter := counter + 1;
END LOOP;

IF max > 0 THEN
avg_duration := EXTRACT(EPOCH FROM total_duration) / max * 1000;
END IF;

RAISE NOTICE '%', avg_duration;
END;
$$ LANGUAGE plpgsql;


call log_rep_test(5);

Re: brininsert optimization opportunity

2023-07-04 Thread Soumyadeep Chakraborty
Thank you both for reviewing!

On Tue, Jul 4, 2023 at 4:24AM Alvaro Herrera  wrote:

> Hmm, yeah, I remember being bit bothered by this repeated
> initialization. Your patch looks reasonable to me. I would set
> bistate->bs_rmAccess to NULL in the cleanup callback, just to be sure.
> Also, please add comments atop these two new functions, to explain what
> they are.

Done. Set bistate->bs_desc = NULL; as well. Added comments.


On Tue, Jul 4, 2023 at 4:59AM Tomas Vondra
 wrote:

> Yeah. I wonder how much of that runtime is the generate_series(),
> though. What's the speedup if that part is subtracted. It's guaranteed
> to be even more significant, but by how much?

When trying COPY, I got tripped by the following:

We get a buffer leak WARNING for the meta page and a revmap page.

WARNING:  buffer refcount leak: [094] (rel=base/156912/206068,
blockNum=1, flags=0x8300, refcount=1 1)
WARNING:  buffer refcount leak: [093] (rel=base/156912/206068,
blockNum=0, flags=0x8300, refcount=1 1)

PrintBufferLeakWarning bufmgr.c:3240
ResourceOwnerReleaseInternal resowner.c:554
ResourceOwnerRelease resowner.c:494
PortalDrop portalmem.c:563
exec_simple_query postgres.c:1284

We release the buffer during this resowner release and then we crash
with:

TRAP: failed Assert("bufnum <= NBuffers"), File:
"../../../../src/include/storage/bufmgr.h", Line: 305, PID: 86833
postgres: pivotal test4 [local] COPY(ExceptionalCondition+0xbb)[0x5572b55bcc79]
postgres: pivotal test4 [local] COPY(+0x61ccfc)[0x5572b537dcfc]
postgres: pivotal test4 [local] COPY(ReleaseBuffer+0x19)[0x5572b5384db2]
postgres: pivotal test4 [local] COPY(brinRevmapTerminate+0x1e)[0x5572b4e3fd39]
postgres: pivotal test4 [local] COPY(+0xcfc44)[0x5572b4e30c44]
postgres: pivotal test4 [local] COPY(+0x89e7f2)[0x5572b55ff7f2]
postgres: pivotal test4 [local] COPY(MemoryContextDelete+0xd7)[0x5572b55ff683]
postgres: pivotal test4 [local] COPY(PortalDrop+0x374)[0x5572b5602dc7]

Unfortunately, when we do COPY, the MemoryContext where makeIndexInfo
gets called is PortalContext and that is what is set in ii_Context.
Furthermore, we clean up the resource owner stuff before we can clean
up the MemoryContexts in PortalDrop().

The CurrentMemoryContext when initialize_brin_insertstate() is called
varies. For CopyMultiInsertBufferFlush() -> ExecInsertIndexTuples() it is
PortalContext, and for CopyFrom() -> ExecInsertIndexTuples() it is
ExecutorState/ExprContext. So we can't rely on it to register the callback
either.

What we can do is create a new MemoryContext for holding the
BrinInsertState, and tie the callback to that, so that cleanup is not
affected by any of these variables. See the v2 patch attached. It passes make
installcheck-world and make installcheck -C src/test/modules/brin.

However, we do still have 1 issue with the v2 patch:
When we try to cancel (Ctrl-c) a running COPY command:
ERROR:  buffer 151 is not owned by resource owner TopTransaction

#4  0x559cbc54a934 in ResourceOwnerForgetBuffer
(owner=0x559cbd6fcf28, buffer=143) at resowner.c:997
#5  0x559cbc2c45e7 in UnpinBuffer (buf=0x7f8d4a8f3f80) at bufmgr.c:2390
#6  0x559cbc2c7e49 in ReleaseBuffer (buffer=143) at bufmgr.c:4488
#7  0x559cbbd82d53 in brinRevmapTerminate (revmap=0x559cbd7a03b8)
at brin_revmap.c:105
#8  0x559cbbd73c44 in brininsertCleanupCallback
(arg=0x559cbd7a5b68) at brin.c:168
#9  0x559cbc54280c in MemoryContextCallResetCallbacks
(context=0x559cbd7a5a50) at mcxt.c:506
#10 0x559cbc54269d in MemoryContextDelete (context=0x559cbd7a5a50)
at mcxt.c:421
#11 0x559cbc54273e in MemoryContextDeleteChildren
(context=0x559cbd69ae90) at mcxt.c:457
#12 0x559cbc54625c in AtAbort_Portals () at portalmem.c:850

Haven't found a way to fix this ^ yet.

Maybe there is a better way of doing our cleanup? I'm not sure. Would
love your input!

The other alternative for all this is to introduce new AM callbacks for
insert_begin and insert_end. That might be a tougher sell?

Now, to finally answer your question about the speedup without
generate_series(): we do see an even higher speedup!

seq 1 2 > /tmp/data.csv
\timing
DROP TABLE heap;
CREATE TABLE heap(i int);
CREATE INDEX ON heap USING brin(i) WITH (pages_per_range=1);
COPY heap FROM '/tmp/data.csv';

-- 3 runs (master 29cf61ade3f245aa40f427a1d6345287ef77e622)
COPY 2
Time: 205072.444 ms (03:25.072)
Time: 215380.369 ms (03:35.380)
Time: 203492.347 ms (03:23.492)

-- 3 runs (branch v2)

COPY 2
Time: 135052.752 ms (02:15.053)
Time: 135093.131 ms (02:15.093)
Time: 138737.048 ms (02:18.737)

Regards,
Soumyadeep (VMware)
From 54ba134f4afe9c4bac19e7d8fde31b9768dc23cd Mon Sep 17 00:00:00 2001
From: Soumyadeep Chakraborty 
Date: Tue, 4 Jul 2023 11:50:35 -0700
Subject: [PATCH v2 1/1] Reuse revmap and brin desc in brininsert

brininsert() used to have code that performed per-tuple initialization
of the revmap. That had some overhead.
---
 src/backend/access/brin/brin.c | 89 +-
 1 file 

Re: Reducing connection overhead in pg_upgrade compat check phase

2023-07-04 Thread Nathan Bossart
I put together a rebased version of the patch for cfbot.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From ee5805dc450f081b77ae3a7df315ceafb6ccc5e1 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson 
Date: Mon, 13 Mar 2023 14:46:24 +0100
Subject: [PATCH v4 1/1] pg_upgrade: run all data type checks per connection

The checks for data type usage were each connecting to all databases
in the cluster and running their query. On clusters which have a lot
of databases this can become unnecessarily expensive. This moves the
checks to run in a single connection instead to minimize connection
setup/teardown overhead.

Reviewed-by: Nathan Bossart 
Reviewed-by: Justin Pryzby 
Discussion: https://postgr.es/m/bb4c76f-d416-4f9f-949e-dbe950d37...@yesql.se
---
 src/bin/pg_upgrade/check.c  | 575 
 src/bin/pg_upgrade/pg_upgrade.h |  29 +-
 src/bin/pg_upgrade/version.c| 289 +++-
 3 files changed, 433 insertions(+), 460 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 64024e3b9e..c829aed26e 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -10,6 +10,7 @@
 #include "postgres_fe.h"
 
 #include "catalog/pg_authid_d.h"
+#include "catalog/pg_class_d.h"
 #include "catalog/pg_collation.h"
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
@@ -23,14 +24,375 @@ static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
 static void check_for_user_defined_postfix_ops(ClusterInfo *cluster);
 static void check_for_incompatible_polymorphics(ClusterInfo *cluster);
 static void check_for_tables_with_oids(ClusterInfo *cluster);
-static void check_for_composite_data_type_usage(ClusterInfo *cluster);
-static void check_for_reg_data_type_usage(ClusterInfo *cluster);
-static void check_for_aclitem_data_type_usage(ClusterInfo *cluster);
-static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 
+/*
+ * Data type usage checks. Each check for problematic data type usage is
+ * defined in this array with metadata, SQL query for finding the data type
+ * and a function pointer for determining if the check should be executed
+ * for the current version.
+ */
+static int n_data_types_usage_checks = 7;
+static DataTypesUsageChecks data_types_usage_checks[] = {
+	/*
+	 * Look for composite types that were made during initdb *or* belong to
+	 * information_schema; that's important in case information_schema was
+	 * dropped and reloaded.
+	 *
+	 * The cutoff OID here should match the source cluster's value of
+	 * FirstNormalObjectId.  We hardcode it rather than using that C #define
+	 * because, if that #define is ever changed, our own version's value is
+	 * NOT what to use.  Eventually we may need a test on the source cluster's
+	 * version to select the correct value.
+	 */
+	{.status = "Checking for system-defined composite types in user tables",
+	 .report_filename = "tables_using_composite.txt",
+	 .base_query =
+	 "SELECT t.oid FROM pg_catalog.pg_type t "
+	 "LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
+	 " WHERE typtype = 'c' AND (t.oid < 16384 OR nspname = 'information_schema')",
+	 .report_text =
+	 "Your installation contains system-defined composite type(s) in user tables.\n"
+	 "These type OIDs are not stable across PostgreSQL versions,\n"
+	 "so this cluster cannot currently be upgraded.  You can\n"
+	 "drop the problem columns and restart the upgrade.\n"
+	 "A list of the problem columns is in the file:",
+	 .version_hook = NULL},
+
+	/*
+	 * 9.3 -> 9.4
+	 *	Fully implement the 'line' data type in 9.4, which previously returned
+	 *	"not enabled" by default and was only functionally enabled with a
+	 *	compile-time switch; as of 9.4 "line" has a different on-disk
+	 *	representation format.
+	 */
+	{.status = "Checking for incompatible \"line\" data type",
+	 .report_filename = "tables_using_line.txt",
+	 .base_query =
+	 "SELECT 'pg_catalog.line'::pg_catalog.regtype AS oid",
+	 .report_text =
+	 "your installation contains the \"line\" data type in user tables.\n"
+	 "this data type changed its internal and input/output format\n"
+	 "between your old and new versions so this\n"
+	 "cluster cannot currently be upgraded.  you can\n"
+	 "drop the problem columns and restart the upgrade.\n"
+	 "a list of the problem columns is in the file:",
+	 .version_hook = old_9_3_check_for_line_data_type_usage},
+
+	/*
+	 *	pg_upgrade only preserves these system values:
+	 *		pg_class.oid
+	 *		pg_type.oid
+	 *		pg_enum.oid
+	 *
+	 *	Many of the reg* data types reference system catalog info that is
+	 *	not preserved, and hence these data types cannot be used in user
+	 *	tables upgraded by pg_upgrade.
+	 */
+	{.status = "Checking for reg* data types in user 

Re: pg_stat_statements and "IN" conditions

2023-07-04 Thread Dmitry Dolgov
> On Mon, Jul 03, 2023 at 09:46:11PM -0700, Nathan Bossart wrote:

Thanks for reviewing.

> On Sun, Mar 19, 2023 at 01:27:34PM +0100, Dmitry Dolgov wrote:
> > +If this parameter is on, two queries with an array will get the 
> > same
> > +query identifier if the only difference between them is the number 
> > of
> > +constants, both numbers is of the same order of magnitude and 
> > greater or
> > +equal 10 (so the order of magnitude is greather than 1, it is not 
> > worth
> > +the efforts otherwise).
>
> IMHO this adds way too much complexity to something that most users would
> expect to be an on/off switch.

This documentation exists exclusively to be precise about how it works.
Users don't have to worry about all this, and can pretty much just turn it
on/off, as you've described. I agree though, I could probably write this
text a bit differently.

> If I understand Álvaro's suggestion [0] correctly, he's saying that in
> addition to allowing "on" and "off", it might be worth allowing
> something like "powers" to yield roughly the behavior described above.
> I don't think he's suggesting that this "powers" behavior should be
> the only available option.

Independently of what Álvaro was suggesting, I find the "powers"
approach more suitable, because it answers my own concerns about the
previous implementation. Having "on"/"off" values means we would have to
scratch our heads coming up with a one-size-fits-all default value, or
introduce another option for the actual cut-off threshold. I would like
to avoid both of those, which is why I went with "powers" only.
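
To make that grouping concrete, here is a rough sketch of the behavior the
quoted documentation describes, assuming the proposed setting is enabled
(the table and the values are made up for illustration, and the exact
cut-offs are whatever the patch ends up implementing):

CREATE TABLE t (id int);

-- These two queries differ only in the number of IN-list constants; both
-- counts are >= 10 and fall in the same order of magnitude (10..99), so
-- they would collapse into a single pg_stat_statements entry.
SELECT * FROM t WHERE id IN (1,2,3,4,5,6,7,8,9,10,11,12);
SELECT * FROM t WHERE id IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20);

-- Fewer than 10 constants: kept as a separate entry, not merged.
SELECT * FROM t WHERE id IN (1,2,3);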

> Also, it seems counterintuitive that queries with fewer than 10
> constants are not merged.

Why? What would your intuition be when using this feature?

> In the interest of moving this patch forward, I would suggest making it a
> simple on/off switch in 0002 and moving the "powers" functionality to a new
> 0003 patch.  I think separating out the core part of this feature might
> help reviewers.  As you can see, I got distracted by the complicated
> threshold logic and ended up focusing my first round of review there.

I would disagree. As I've described above, to me "powers" seems to be a
better fit, and the complicated logic is in fact reusing an already
existing function. Do those arguments sound convincing to you?




Re: Add more sanity checks around callers of changeDependencyFor()

2023-07-04 Thread Tom Lane
Alvaro Herrera  writes:
> Hmm, shouldn't we disallow moving the function to another schema, if the
> function's schema was originally determined at extension creation time?
> I'm not sure we really want to allow moving objects of an extension to a
> different schema.

Why not?  I do not believe that an extension's objects are required
to all be in the same schema.

regards, tom lane




Re: suppressing useless wakeups in logical/worker.c

2023-07-04 Thread Nathan Bossart
On Tue, Jul 04, 2023 at 09:48:23AM +0200, Daniel Gustafsson wrote:
>> On 17 Mar 2023, at 10:16, Amit Kapila  wrote:
> 
>> Few minor comments:
> 
> Have you had a chance to address the comments raised by Amit in this thread?

Not yet, sorry.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: O(n) tasks cause lengthy startups and checkpoints

2023-07-04 Thread Nathan Bossart
On Tue, Jul 04, 2023 at 09:30:43AM +0200, Daniel Gustafsson wrote:
>> On 4 Apr 2023, at 05:36, Nathan Bossart  wrote:
>> 
>> I sent this one to the next commitfest and marked it as waiting-on-author
>> and targeted for v17.  I'm aiming to have something that addresses the
>> latest feedback ready for the July commitfest.
> 
> Have you had a chance to look at this such that there is something ready?

Not yet, sorry.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: On /*----- comments

2023-07-04 Thread Tom Lane
Heikki Linnakangas  writes:
> On 03/07/2023 11:48, Daniel Gustafsson wrote:
> On 30 Jun 2023, at 17:22, Tom Lane  wrote:
>>> Seems reasonable; the trailing dashes eat a line without adding much.

>> +1

> Pushed a patch to remove the end-guard from the example in the pgindent 
> README. And fixed the bogus end-guard in walsender.c.

I don't see any actual push?

> I'm not sure there is a universal best length. It depends on the comment 
> what looks best. The very long ones in particular would not look good on 
> comments in a deeply indented block. So I think the status quo is fine.

OK, no strong feeling about that here.

regards, tom lane




Re: Cleaning up array_in()

2023-07-04 Thread Tom Lane
Nikhil Benesch  writes:
> I spotted a few opportunities for further reducing state tracked by
> `ArrayCount`. You may not find all of these suggestions to be
> worthwhile.

I found some time today to look at these points.

> 1) `in_quotes` appears to be wholly redundant with `parse_state ==
> ARRAY_QUOTED_ELEM_STARTED`.

I agree that it is redundant, but I'm disinclined to remove it because
the in_quotes logic matches that in ReadArrayStr.  I think it's better
to keep those two functions in sync.  The parse_state represents an
independent set of checks that need not be repeated by ReadArrayStr,
but both functions have to track quoting.  The same for eoArray.

> 2) The `empty_array` special case does not seem to be important to
> ArrayCount's callers, which don't even special case `ndims == 0` but
> rather `ArrayGetNItemsSafe(..) == 0`. Perhaps this is a philosophical
> question as to whether `ArrayCount('{{}, {}}')` should return
> (ndims=2, dims=[2, 0]) or (ndims=0). Obviously someone needs to do
> that normalization, but `ArrayCount` could leave that normalization to
> `ReadArrayStr`.

This idea I do like.  While looking at the callers, I also noticed
that it's impossible currently to write an empty array with explicit
specification of bounds.  It seems to me that you ought to be able
to write, say,

SELECT '[1:0]={}'::int[];

but up to now you got "upper bound cannot be less than lower bound";
and if you somehow got past that, you'd get "Specified array
dimensions do not match array contents." because of ArrayCount's
premature optimization of "one-dimensional array with length zero"
to "zero-dimensional array".  We can fix that by doing what you said
and adjusting the initial bounds restriction to be "upper bound cannot
be less than lower bound minus one".
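
In other words (a sketch of the intended post-patch behavior, not of what
HEAD does today):

SELECT '[1:0]={}'::int[];    -- empty array with explicit bounds: accepted
SELECT '[1:-1]={}'::int[];   -- upper bound < lower bound - 1: still an error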

> I also have a sense that `ndims_frozen` made the distinction between
> `ARRAY_ELEM_DELIMITED` and `ARRAY_LEVEL_DELIMITED` unnecessary, and
> the two states could be merged into a single `ARRAY_DELIMITED` state,
> but I've not pulled on this thread hard enough to say so confidently.

I looked at jian he's implementation of that and was not impressed:
I do not think the logic gets any clearer, and it seems to me that
this makes a substantial dent in ArrayCount's ability to detect syntax
errors.  The fact that one of the test case error messages got better
seems pretty accidental to me.  We can get the same result in a more
purposeful way by giving a different error message for
ARRAY_ELEM_DELIMITED.

So I end up with the attached.  I went ahead and dropped
ArrayGetOffset0() as part of 0001, and I split 0002 into two patches
where the new 0002 avoids re-indenting any existing code in order
to ease review, and then 0003 is just a mechanical application
of pgindent.

I still didn't do anything about "number of array dimensions (7)
exceeds the maximum allowed (6)".  There are quite a few instances
of that wording, not only array_in's, and I'm not sure whether to
change the rest.  In any case that looks like something that
could be addressed separately.  The other error message wording
changes here seem to me to be okay.

regards, tom lane

From 82cac24a618db69e149c140e7064eebda9f1ddfc Mon Sep 17 00:00:00 2001
From: Tom Lane 
Date: Tue, 4 Jul 2023 12:39:41 -0400
Subject: [PATCH v2 1/3] Simplify and speed up ReadArrayStr().

ReadArrayStr() seems to have been written on the assumption that
non-rectangular input is fine and it should pad with NULLs anywhere
that elements are missing.  We disallowed non-rectangular input
ages ago (commit 0e13d627b), but never simplified this function
as a follow-up.  In particular, the existing code recomputes each
element's linear location from scratch, which is quite unnecessary
for rectangular input: we can just assign the elements sequentially,
saving lots of arithmetic.  Add some more commentary while at it.

ArrayGetOffset0() is no longer used anywhere, so remove it.
---
 src/backend/utils/adt/arrayfuncs.c | 69 ++
 src/backend/utils/adt/arrayutils.c | 15 ---
 src/include/utils/array.h  |  1 -
 3 files changed, 33 insertions(+), 52 deletions(-)

diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 9000f83a83..050568808a 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -93,7 +93,7 @@ static bool array_isspace(char ch);
 static int	ArrayCount(const char *str, int *dim, char typdelim,
 	   Node *escontext);
 static bool ReadArrayStr(char *arrayStr, const char *origStr,
-		 int nitems, int ndim, int *dim,
+		 int nitems,
 		 FmgrInfo *inputproc, Oid typioparam, int32 typmod,
 		 char typdelim,
 		 int typlen, bool typbyval, char typalign,
@@ -391,7 +391,7 @@ array_in(PG_FUNCTION_ARGS)
 	dataPtr = (Datum *) palloc(nitems * sizeof(Datum));
 	nullsPtr = (bool *) palloc(nitems * sizeof(bool));
 	if (!ReadArrayStr(p, string,
-	  nitems, ndim, dim,
+	  

Re: pg_basebackup check vs Windows file path limits

2023-07-04 Thread Andrew Dunstan


On 2023-07-03 Mo 11:18, Andrew Dunstan wrote:



On 2023-07-03 Mo 10:16, Daniel Gustafsson wrote:

On 3 Jul 2023, at 16:12, Andrew Dunstan  wrote:
I've pushed a better solution, which creates the file via a short symlink. 
Experimentation on fairywren showed this working.

The buildfarm seems a tad upset after this?



Yeah :-(

I think it should be fixing itself now.





But sadly we're kinda back where we started. fairywren is failing on
REL_16_STABLE. Before the changes, the failure occurred because the test
script was unable to create a file with a path > 255. Now that we have
a way to create the file, the test that pg_basebackup rejects files with
names > 100 fails, I presume because the server can't actually see the
file. At this stage I'm thinking the best thing would be to skip the
test altogether on Windows if the path is longer than 255.



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: Is a pg_stat_force_next_flush() call sufficient for regression tests?

2023-07-04 Thread Tomas Vondra



On 7/4/23 04:29, Kyotaro Horiguchi wrote:
> At Mon, 3 Jul 2023 15:45:52 +0200, Tomas Vondra 
>  wrote in 
>> So I'm wondering if pg_stat_force_next_flush() is enough - AFAICS this
>> only sets some flag for the *next* pgstat_report_stat() call, but how do
>> we know that happens before the query execution?
>>
>> Shouldn't there be something like pg_stat_flush() that actually does the
>> flushing, instead of just setting the flag?
> 
> The reason for the function is that pg_stat_flush() is supposed not to
> be called within a transaction.  AFAICS pg_stat_force_next_flush()
> takes effect after a successfull transaction end and before the next
> command execution.
> 

Sure, if we're supposed to report the stats only at the end of a
transaction, that makes sense. But then why didn't that happen here?

> To verify this, I put in an assertion to check that the flag gets
> consumed before reading of pg_stat_io (a.diff), then ran pgbench with
> the attached custom script. As expected, it didn't fire at all during
> several trials. When I wrapped all lines in t.sql within a
> begin-commit block, the assertion fired off immediately as a matter of
> course.
> 

If I understand correctly, this just verifies that

1) if everything goes well, we report the stats at the end of the
transaction (otherwise the case without BEGIN/COMMIT would fail)

2) we don't report stats when in a transaction (with the BEGIN/COMMIT)
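
In SQL terms, the pattern described by those two points is roughly the
following (just a sketch; the test table and the pg_stat_io filter values
are made up):

CREATE TABLE io_test AS SELECT g FROM generate_series(1, 1000) g; -- some I/O
SELECT pg_stat_force_next_flush();
-- the implicit transaction ends here, so the stats should be flushed
-- before the next command runs
SELECT reads, writes, extends
  FROM pg_stat_io
 WHERE backend_type = 'client backend'
   AND object = 'relation'
   AND context = 'normal';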

But the eelpout failure clearly suggests this may misbehave.

> Is there any chance concurrent backends or some other things can
> actually hinder the backend from reusing buffers?
> 

No idea. I'm not very familiar with the reworked pgstat system, but
either pgstat_report_stat() was not called for some reason, or it
decided there's nothing to report (i.e. have_iostats == false). Not sure
why that would happen.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: On /*----- comments

2023-07-04 Thread Heikki Linnakangas

On 03/07/2023 11:48, Daniel Gustafsson wrote:

On 30 Jun 2023, at 17:22, Tom Lane  wrote:



Seems reasonable; the trailing dashes eat a line without adding much.


+1


Pushed a patch to remove the end-guard from the example in the pgindent 
README. And fixed the bogus end-guard in walsender.c.



Should we also provide specific guidance about how many leading dashes
to use for this?  I vaguely recall that pgindent might only need one,
but I think using somewhere around 5 to 10 looks better.


There are ~50 different lengths used when looking at block comments from line 2
(to avoid the file header comment) onwards in files; the ones with 10 or
more occurrences are:

  145 /*--
   78 /*--
   76 
/*-
   37 /*--
   29 /*
   23 /*
   22 /*
   21 /*
   15 /*-
   14 /*--
   13 /*---
   13 /*---
   12 /*--

10 leading dashes is the clear winner, so recommending that for new/edited
comments seems like a good way to reduce churn.


The example in the pgindent README also uses 10 dashes.

I'm not sure there is a universal best length. It depends on the comment 
what looks best. The very long ones in particular would not look good on 
comments in a deeply indented block. So I think the status quo is fine.



Looking at line 1 comments for fun shows pretty strong consistency:

1611 /*-
   22 
/*--
   18 /*
   13 /*
7 
/*---
4 /*---
4 /*--
1 /*--

plpy_util.h being the only one that sticks out.


I don't see any reason for the variance in these. Seems accidental..

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Does a cancelled REINDEX CONCURRENTLY need to be messy?

2023-07-04 Thread Álvaro Herrera
On 2023-Jul-04, Michael Paquier wrote:

> On Mon, Jul 03, 2023 at 07:46:27PM +0200, Alvaro Herrera wrote:

> > Perhaps we could have autovacuum check for it, and do it
> > separately of vacuum proper.)
> 
> Being able to reuse some of the worker/launcher parts from autovacuum
> could make things easier for a bgworker implementation, perhaps?

TBH I don't understand what you are thinking about.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"I can see support will not be a problem.  10 out of 10."(Simon Wittber)
  (http://archives.postgresql.org/pgsql-general/2004-12/msg00159.php)




Re: Add more sanity checks around callers of changeDependencyFor()

2023-07-04 Thread Alvaro Herrera
On 2023-Jun-29, Heikki Linnakangas wrote:

> I can hit the above error with the attached test case. That seems wrong,
> although I don't know if it means that the check is wrong or it exposed a
> long-standing bug.

> +CREATE SCHEMA test_func_dep1;
> +CREATE SCHEMA test_func_dep2;
> +CREATE EXTENSION test_ext_req_schema1 SCHEMA test_func_dep1;
> +ALTER FUNCTION test_func_dep1.dep_req1() SET SCHEMA test_func_dep2;
> +
> +ALTER EXTENSION test_ext_req_schema1 SET SCHEMA test_func_dep2;
> +
> +DROP EXTENSION test_ext_req_schema1 CASCADE;

Hmm, shouldn't we disallow moving the function to another schema, if the
function's schema was originally determined at extension creation time?
I'm not sure we really want to allow moving objects of an extension to a
different schema.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
“Cuando no hay humildad las personas se degradan” (A. Christie)




Re: Missing llvm_leave_fatal_on_oom() call

2023-07-04 Thread Daniel Gustafsson
> On 21 Feb 2023, at 15:50, Heikki Linnakangas  wrote:
> 
> llvm_release_context() calls llvm_enter_fatal_on_oom(), but it never calls 
> llvm_leave_fatal_on_oom(). Isn't that a clear leak?

Not sure how much of a leak it is since IIUC LLVM just stores a function
pointer to our error handler, but I can't see a reason not to clean it up here.
The attached fix LGTM and passes make check with jit_above_cost set to zero.

--
Daniel Gustafsson





Re: Creation of an empty table is not fsync'd at checkpoint

2023-07-04 Thread Heikki Linnakangas

On 03/07/2023 15:59, Daniel Gustafsson wrote:

This patch required a trivial rebase after conflicting with bfcf1b3480 so I've
attached a v2 to get the CFBot to run this.


Thank you! Pushed to all supported branches. (Without forgetting the new 
REL_16_STABLE :-D ).


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Initdb-time block size specification

2023-07-04 Thread Peter Eisentraut

On 01.07.23 00:21, Tomas Vondra wrote:

Right, that's the dance we do to protect against torn pages. But Andres
suggested that if you have modern storage and configure it correctly,
writing with 4kB pages would be atomic. So we wouldn't need to do this
FPI stuff, eliminating pretty significant source of write amplification.


This work in progress for the Linux kernel was also mentioned at PGCon:
.  Subject to various conditions, the
kernel would then guarantee atomic writes for blocks larger than the
hardware's native size.






Re: Why is DATESTYLE, ordering ignored for output but used for input ?

2023-07-04 Thread Dave Cramer
On Mon, 3 Jul 2023 at 17:13, Matthias van de Meent <
boekewurm+postg...@gmail.com> wrote:

> On Mon, 3 Jul 2023 at 20:06, Dave Cramer  wrote:
> >
> > Greetings,
> >
> > For ISO and German dates the order DMY is completely ignored on output
> but used for input.
> >
> > test=# set datestyle to 'ISO,DMY';
> > SET
> > select '7-8-2023'::date
> > test-# ;
> > date
> > 
> >  2023-08-07
> > (1 row)
> >
> > test=# set datestyle to 'ISO,MDY';
> > SET
> > test=# select '7-8-2023'::date
> > ;
> > date
> > 
> >  2023-07-08
> > (1 row)
> >
> > Note regardless of  how the ordering is specified it is always output as
> > YMD
>
> Wouldn't that be because ISO only has one correct ordering of the day
> and month fields? I fail to see why we'd output non-ISO-formatted date
> strings when ISO format is requested. I believe the reason is the same
> for German dates - Germany's official (or most common?) date
> formatting has a single ordering of these fields, which is also the
> ordering that we supply.
>

It seems rather unintuitive that it works for some datestyles and not for
others.
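
For comparison, the SQL output style does honor the field ordering; it is
only the styles whose output format is fixed (ISO, German) that ignore it.
A quick sketch (output shown in the comments is from memory and may differ
slightly in punctuation):

SET datestyle TO 'SQL,DMY';
SELECT date '2023-08-07';   -- 07/08/2023, day first

SET datestyle TO 'SQL,MDY';
SELECT date '2023-08-07';   -- 08/07/2023, month first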


>
> The code comments also seem to hint to this:
>
> > case USE_ISO_DATES:
> > case USE_XSD_DATES:
> >  /* compatible with ISO date formats */
>
> > case USE_GERMAN_DATES:
> > /* German-style date format */
>
> This has been this way since the code for ISO was originally committed
> in July of '97 with 8507ddb9 and the GERMAN formatting which was added
> in December of '97 as D.M/Y with 352b3687 (and later that month was
> updated to D.M.Y with ca23837a).
> Sadly, the -hackers archives don't seem to have any mails from that
> time period, so I couldn't find much info on the precise rationale
> around this behavior.
>

Yeah, I couldn't find much either.


>
> Kind regards,
>
> Matthias van de Meent
> Neon (https://neon.tech/)
>
> PS. That was some interesting digging into the history of the date
> formatting module.
>

Always interesting digging into the history of the project.

Dave


Re: [PATCH] Extend the length of BackgroundWorker.bgw_library_name

2023-07-04 Thread Yurii Rashkovskii
Nathan,

On Mon, Jul 3, 2023 at 8:12 PM Nathan Bossart 
wrote:

> On Mon, Jul 03, 2023 at 06:00:12PM -0700, Yurii Rashkovskii wrote:
> > Great, thank you! The reason I was leaving the other constant in place to
> > make upgrading extensions trivial (so that they don't need to adjust for
> > this), but if you think this is a better way, I am fine with it.
>
> Sorry, I'm not following.  Which constant are you referring to?
>

Apologies, I misread the final patch. All good!

-- 
Y.


Re: ResourceOwner refactoring

2023-07-04 Thread Aleksander Alekseev
Hi,

> Thanks, here's a fixed version. (ResourceOwner resource release
> callbacks mustn't call ResourceOwnerForget anymore, but AbortBufferIO
> was still doing that)

I believe v15 is ready to be merged. I suggest we merge it early in
the PG17 release cycle.

-- 
Best regards,
Aleksander Alekseev




Re: Experiments with Postgres and SSL

2023-07-04 Thread Heikki Linnakangas

On 31/03/2023 10:59, Greg Stark wrote:

IIRC I put a variable labeled a "GUC" but forgot to actually make it a
GUC. But I'm thinking of maybe removing that variable since I don't
see much of a use case for controlling this manually. I *think* ALPN
is supported by all the versions of OpenSSL we support.


+1 on removing the variable. Let's make ALPN mandatory for direct SSL 
connections, like Jacob suggested. And for old-style handshakes, accept 
and check ALPN if it's given.


I don't see the point of the libpq 'sslalpn' option either. Let's send 
ALPN always.


Admittedly, having the options makes testing different combinations of
old and new clients and servers a little easier. But I don't think we
should add options just for the sake of backwards-compatibility tests.



--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -1126,13 +1126,16 @@ pq_discardbytes(size_t len)
 /* 
  * pq_buffer_has_data  - is any buffered data 
available to read?
  *
- * This will *not* attempt to read more data.
+ * Actually returns the number of bytes in the buffer...
+ *
+ * This will *not* attempt to read more data. And reading up to that number of
+ * bytes should not cause reading any more data either.
  * 
  */
-bool
+size_t
 pq_buffer_has_data(void)
 {
-   return (PqRecvPointer < PqRecvLength);
+   return (PqRecvLength - PqRecvPointer);
 }


Let's rename the function.


/* push unencrypted buffered data back through SSL setup */
len = pq_buffer_has_data();
if (len > 0)
{
buf = palloc(len);
if (pq_getbytes(buf, len) == EOF)
return STATUS_ERROR; /* shouldn't be possible */
port->raw_buf = buf;
port->raw_buf_remaining = len;
port->raw_buf_consumed = 0;
}

Assert(pq_buffer_has_data() == 0);
if (secure_open_server(port) == -1)
{
ereport(COMMERROR,
(errcode(ERRCODE_PROTOCOL_VIOLATION),
 errmsg("SSL Protocol Error during direct 
SSL connection initiation")));
return STATUS_ERROR;
}

if (port->raw_buf_remaining > 0)
{
ereport(COMMERROR,
(errcode(ERRCODE_PROTOCOL_VIOLATION),
 errmsg("received unencrypted data after SSL 
request"),
 errdetail("This could be either a 
client-software bug or evidence of an attempted man-in-the-middle attack.")));
return STATUS_ERROR;
}
if (port->raw_buf)
pfree(port->raw_buf);


This pattern is repeated in both callers of secure_open_server(). Could 
we move this into secure_open_server() itself? That would feel pretty 
natural, be-secure.c already contains the secure_raw_read() function 
that reads the 'raw_buf' field.



const char *
PQsslAttribute(PGconn *conn, const char *attribute_name)
{
...

if (strcmp(attribute_name, "alpn") == 0)
{
const unsigned char *data;
unsigned int len;
static char alpn_str[256]; /* alpn doesn't support longer than 
255 bytes */
SSL_get0_alpn_selected(conn->ssl, &data, &len);
if (data == NULL || len==0 || len > sizeof(alpn_str)-1)
return NULL;
memcpy(alpn_str, data, len);
alpn_str[len] = 0;
return alpn_str;
}


Using a static buffer doesn't look right. If you call PQsslAttribute on 
two different connections from two different threads concurrently, they 
will write to the same buffer. I see that you copied it from the 
"key_bits" handling, but it has the same issue.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: logical decoding and replication of sequences, take 2

2023-07-04 Thread Ashutosh Bapat
On Mon, Jun 26, 2023 at 8:35 PM Tomas Vondra
 wrote:
> On 6/26/23 15:18, Ashutosh Bapat wrote:

> > I will look at 0004 next.
> >
>
> OK


0004 is quite large. I think if we split it into two or even three parts
(1. publication and subscription catalog handling, 2. built-in replication
protocol changes), it might be easier to review. But anyway, I have given it
one read. I have reviewed the parts which deal with the replication proper in
detail. I have *not* thoroughly reviewed the parts which deal with the
catalogs, pg_dump, describe and tab completion. Similarly the tests. If those
parts need a thorough review, please let me know.

But before jumping into the comments, a weird scenario I tried. On the
publisher I created a table t1(a int, b int) and a sequence s and added both
to a publication. On the subscriber I swapped their names, i.e. created a
table s(a int, b int) and a sequence t1, and subscribed to the publication.
The subscription was created, and during replication it threw the errors
"logical replication target relation "public.t1" is missing replicated
columns: "a", "b"" and "logical replication target relation "public.s" is
missing replicated columns: "last_value", "log_cnt", "is_called"". I think
it's good that it at least threw an error. But it would be better if it
detected that the reltypes themselves are different and mentioned that in the
error. Something like "logical replication target "public.s" is not a
sequence like source "public.s".

Comments on the patch itself.

I didn't find any mention of 'sequence' in the documentation of publish option
in CREATE or ALTER PUBLICATION. Something missing in the documentation? But do
we really need to record "sequence" as an operation? Just adding the sequences
to the publication should be fine right? There's only one operation on
sequences, updating the sequence row.

+CREATE VIEW pg_publication_sequences AS
+SELECT
+P.pubname AS pubname,
+N.nspname AS schemaname,
+C.relname AS sequencename

If we report the oid or regclass of the sequence, it might be easier to join
the view further. We don't have a reg* type for publications, so we could
report both the oid and the name of the publication.

+/*
+ * Update the sequence state by modifying the existing sequence data row.
+ *
+ * This keeps the same relfilenode, so the behavior is non-transactional.
+ */
+static void
+SetSequence_non_transactional(Oid seqrelid, int64 last_value, int64
log_cnt, bool is_called)

This function has some code similar to nextval but with the sequence
of operations (viz. changes to buffer, WAL insert and cache update) changed.
Given the comments in nextval_internal() the difference in sequence of
operations should not make a difference in the end result. But I think it will
be good to deduplicate the code to avoid confusion and also for ease of
maintenance.

+
+/*
+ * Update the sequence state by creating a new relfilenode.
+ *
+ * This creates a new relfilenode, to allow transactional behavior.
+ */
+static void
+SetSequence_transactional(Oid seq_relid, int64 last_value, int64
log_cnt, bool is_called)

Need some deduplication here as well. But the similarities with AlterSequence,
ResetSequence or DefineSequence are less.

@@ -730,9 +731,9 @@ CreateSubscription(ParseState *pstate,
CreateSubscriptionStmt *stmt,
 {
 /*
- * Get the table list from publisher and build local table status
- * info.
+ * Get the table and sequence list from publisher and build
+ * local relation sync status info.
  */
-tables = fetch_table_list(wrconn, publications);
-foreach(lc, tables)
+relations = fetch_table_list(wrconn, publications);

Is it allowed to connect a newer subscriber to an old publisher? If
yes the query
to fetch sequences will throw an error since it won't find the catalog.

@@ -882,8 +886,10 @@ AlterSubscription_refresh(Subscription *sub, bool
copy_data,
-/* Get the table list from publisher. */
+/* Get the list of relations from publisher. */
 pubrel_names = fetch_table_list(wrconn, sub->publications);
+pubrel_names = list_concat(pubrel_names,
+   fetch_sequence_list(wrconn,
sub->publications));

Similarly here.

+void
+logicalrep_write_sequence(StringInfo out, Relation rel, TransactionId xid,
+
... snip ...
+pq_sendint8(out, flags);
+pq_sendint64(out, lsn);
... snip ...
+LogicalRepRelId
+logicalrep_read_sequence(StringInfo in, LogicalRepSequence *seqdata)
+{
... snip ...
+/* XXX skipping flags and lsn */
+pq_getmsgint(in, 1);
+pq_getmsgint64(in);

We are ignoring these two fields on the WAL receiver side. I don't see such
fields being part of INSERT, UPDATE or DELETE messages. Should we just drop
those or do they have some future use? Two lsns are written by
OutputPrepareWrite() as prologue to the logical message. If this LSN
is one of them, it could be dropped anyway.


+static void

Re: Named Operators

2023-07-04 Thread Daniel Gustafsson
> On 8 Feb 2023, at 16:57, Tom Lane  wrote:

> I do not think this proposal is going anywhere as-written.

Reading this thread, it seems there is consensus against this proposal in its
current form, and no updated patch has been presented, so I will mark this as
Returned with Feedback.  Please feel free to resubmit to a future CF when there
is renewed interest in working on this.

--
Daniel Gustafsson





Re: Remove incidental md5() function uses from several tests

2023-07-04 Thread Peter Eisentraut

On 07.06.23 10:19, Daniel Gustafsson wrote:

Unlike for the main regression tests, I didn't write a fipshash() wrapper here, 
because that would have been too repetitive and wouldn't really save much here. 
 In some cases it was easier to remove one layer of indirection by changing 
column types from text to bytea.


Agreed.  Since the commit message mentions 208bf364a9 it would probably be a
good idea to add some version of the above fipshash clarification to the commit
message.


Committed with that addition, thanks.





Re: pipe_read_line for reading arbitrary strings

2023-07-04 Thread Daniel Gustafsson
> On 4 Jul 2023, at 13:59, Heikki Linnakangas  wrote:
> On 08/03/2023 00:05, Daniel Gustafsson wrote:

>> If we are going to continue using this for reading $stuff from pipes, maybe 
>> we
>> should think about presenting a nicer API which removes that risk?  Returning
>> an allocated buffer which contains all the output along the lines of the 
>> recent
>> pg_get_line work seems a lot nicer and safer IMO.
> 
> +1

Thanks for review!

>> /*
>> * Execute a command in a pipe and read the first line from it. The returned
>> * string is allocated, the caller is responsible for freeing.
>> */
>> char *
>> pipe_read_line(char *cmd)
> 
> I think it's worth being explicit here that it's palloc'd, or malloc'd in 
> frontend programs, rather than just "allocated". Like in pg_get_line.

Good point, I'll make that happen before committing this.

--
Daniel Gustafsson





Re: SQL:2011 application time

2023-07-04 Thread Daniel Gustafsson
> On 8 May 2023, at 09:10, Peter Eisentraut  
> wrote:
> 
> On 03.05.23 23:02, Paul Jungwirth wrote:
>> Thank you again for the review. Here is a patch with most of your feedback 
>> addressed. Sorry it has taken so long! These patches are rebased up to 
>> 1ab763fc22adc88e5d779817e7b42b25a9dd7c9e
>> (May 3).
> 
> Here are a few small fixup patches to get your patch set compiling cleanly.
> 
> Also, it looks like the patches 0002, 0003, and 0004 are not split up 
> correctly.  0002 contains tests using the FOR PORTION OF syntax introduced in 
> 0003, and 0003 uses the function build_period_range() from 0004.

These patches no longer apply without a new rebase.  Should this patch be
closed while waiting for the prerequisite of adding btree_gist to core
mentioned upthread?  I see no patch registered in the CF for this unless I'm
missing something.

--
Daniel Gustafsson





Re: [PATCH]Feature improvement for MERGE tab completion

2023-07-04 Thread Daniel Gustafsson
> On 28 Mar 2023, at 20:55, Gregory Stark (as CFM)  wrote:
> 
> It looks like this remaining work isn't going to happen this CF and
> therefore this release. There hasn't been an update since January when
> Dean Rasheed posted a review.
> 
> Unless there's any updates soon I'll move this on to the next
> commitfest or mark it returned with feedback.

There are still no updates to this patch or thread, so I'm closing this as
Returned with Feedback.  Please feel free to resubmit to a future CF when there
is renewed interest in working on this.

--
Daniel Gustafsson





Re: Clean up command argument assembly

2023-07-04 Thread Heikki Linnakangas

On 26/06/2023 12:33, Peter Eisentraut wrote:

This is a small code cleanup patch.

Several commands internally assemble command lines to call other
commands.  This includes initdb, pg_dumpall, and pg_regress.  (Also
pg_ctl, but that is different enough that I didn't consider it here.)
This has all evolved a bit organically, with fixed-size buffers, and
various optional command-line arguments being injected with
confusing-looking code, and the spacing between options handled in
inconsistent ways.  This patch cleans all this up a bit to look clearer
and be more easily extensible with new arguments and options.


+1


We start each command with printfPQExpBuffer(), and then append
arguments as necessary with appendPQExpBuffer().  Also standardize on
using initPQExpBuffer() over createPQExpBuffer() where possible.
pg_regress uses StringInfo instead of PQExpBuffer, but many of the
same ideas apply.


It's a bit bogus to use PQExpBuffer for these. If you run out of memory, 
you silently get an empty string instead. StringInfo, which exits the 
process on OOM, would be more appropriate. We have tons of such 
inappropriate uses of PQExpBuffer in all our client programs, though, so 
I don't insist on fixing this particular case right now.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Improve the performance of nested loop join in the case of partitioned inner table

2023-07-04 Thread David Rowley
On Thu, 13 Apr 2023 at 03:00, Alexandr Nikulin
 wrote:
> explain analyze select * from ids join test_part on ids.id=test_part.id where 
> ascii(ids.name)=ascii('best case');
> explain analyze select * from ids join test_part on ids.id=test_part.id where 
> ascii(ids.name)=ascii('worst case');
>
> The average results on my machine are as follows:
>
>            | vanilla postgres | patched postgres
> best case  | 2286 ms          | 1924 ms
> worst case | 2278 ms          | 2360 ms
>
> So far I haven't refactored the patch as per David's advice. I just want to 
> understand if we need such an optimization?

My thoughts are that the worst-case numbers are not exactly great.  I
very much imagine that in the average case, the parameter values from the
nested loop's next outer row are much more likely to be different from the
previous row's than to be the same.

Let's say, roughly, your numbers show a 20% speedup for the best case and
a 4% slowdown for the worst case; for us to break even with this patch as
it is, the parameter value would have to be the same around 1 out of 5
times. That does not seem like good odds to bet on, given we're likely
working with data types that allow billions of distinct values.

I think if you really wanted to make this work, then you'd need to get
the planner on board with making the decision on if this should be
done or not based on the n_distinct estimates from the outer side of
the join.  Either that or some heuristic in the executor that tries
for a while and gives up if the parameter value changes too often.
Some code was added in 3592e0ff9 that uses a heuristics approach to
solving this problem by only enabling the optimisation if we hit the
same partition at least 16 times and switches it off again as soon as
the datum no longer matches the cached partition.  I'm not quite sure
how the same could be made to work here as with 3592e0ff9. A tuple
only belongs to a single partition and we can very cheaply check if
this partition is the same as the last one by checking if the
partition index matches.  With this case, since we're running a query,
many partitions can remain after partition pruning runs, and checking
that some large number of partitions match some other large number of
partitions is not going to be as cheap as checking just two partitions
match. Bitmapsets can help here, but they'll just never be as fast as
checking two ints match.

In short, I think you're going to have to come up with something very
crafty here to reduce the overhead worst case. Whatever it is will
need to be neat and self-contained, perhaps in execPartition.c.  It
just does not seem good to have logic related to partition pruning
inside nodeNestloop.c.

I'm going to mark this as waiting on author in the CF app. It might be
better if you withdraw it and resubmit when you have a patch that
addresses the worst-case regression issue.

David




Re: pipe_read_line for reading arbitrary strings

2023-07-04 Thread Heikki Linnakangas

On 08/03/2023 00:05, Daniel Gustafsson wrote:

When skimming through pg_rewind during a small review I noticed the use of
pipe_read_line for reading arbitrary data from a pipe, the mechanics of which
seemed odd.

Commit 5b2f4afffe6 refactored find_other_exec() and broke out pipe_read_line()
as a static convenience routine for reading a single line of output to catch a
version number.  Many years later, commit a7e8ece41 exposed it externally in
order to read a GUC from postgresql.conf using "postgres -C ..".  f06b1c598
also make use of it for reading a version string much like find_other_exec().
Funnily enough, while now used for arbitrary string reading the variable is
still "pgver".

Since the function requires passing a buffer/size, and at most size - 1 bytes
will be read via fgets(), there is a truncation risk when using this for
reading GUCs (like how pg_rewind does, though the risk there is slim to none).


Good point.


If we are going to continue using this for reading $stuff from pipes, maybe we
should think about presenting a nicer API which removes that risk?  Returning
an allocated buffer which contains all the output along the lines of the recent
pg_get_line work seems a lot nicer and safer IMO.


+1


/*
 * Execute a command in a pipe and read the first line from it. The returned
 * string is allocated, the caller is responsible for freeing.
 */
char *
pipe_read_line(char *cmd)


I think it's worth being explicit here that it's palloc'd, or malloc'd 
in frontend programs, rather than just "allocated". Like in pg_get_line.


Other than that, LGTM.

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: brininsert optimization opportunity

2023-07-04 Thread Tomas Vondra



On 7/4/23 13:23, Alvaro Herrera wrote:
> On 2023-Jul-03, Soumyadeep Chakraborty wrote:
> 
>> My colleague, Ashwin, pointed out to me that brininsert's per-tuple init
>> of the revmap access struct can have non-trivial overhead.
>>
>> Turns out he is right. We are saving 24 bytes of memory per-call for
>> the access struct, and a bit on buffer/locking overhead, with the
>> attached patch.
> 
> Hmm, yeah, I remember being bit bothered by this repeated
> initialization.  Your patch looks reasonable to me.  I would set
> bistate->bs_rmAccess to NULL in the cleanup callback, just to be sure.
> Also, please add comments atop these two new functions, to explain what
> they are.
> 
> Nice results.
> 

Yeah. I wonder how much of that runtime is the generate_series(),
though. What's the speedup if that part is subtracted. It's guaranteed
to be even more significant, but by how much?

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Extension Enhancement: Buffer Invalidation in pg_buffercache

2023-07-04 Thread jian he
On Tue, Jul 4, 2023 at 5:45 PM Japin Li  wrote:

>
> On Tue, 04 Jul 2023 at 17:00, jian he  wrote:
> > the following will also crash. no idea why.
> > begin;
> > select count(*) from onek;
> > select relpages from pg_class where relname = 'onek'; --queryA
> >
> > SELECT count(*) FROM pg_buffercache WHERE relfilenode =
> > pg_relation_filenode('onek'::regclass); --queryB
> >
> > insert into onek values(default);
> >
> > select count(pg_buffercache_invalidate(bufferid)) from
> > pg_buffercache where relfilenode =
> > pg_relation_filenode('onek'::regclass);
> >
> > -
> > queryA returns 35, queryB returns 37.
> > --
> > crash info:
> > test_dev=*# insert into onek values(default);
> > INSERT 0 1
> > test_dev=*# select count(pg_buffercache_invalidate(bufferid)) from
> > pg_buffercache where relfilenode =
> > pg_relation_filenode('onek'::regclass);
> > TRAP: failed Assert("resarr->nitems < resarr->maxitems"), File:
> >
> "../../Desktop/pg_sources/main/postgres/src/backend/utils/resowner/resowner.c",
> > Line: 275, PID: 1533312
>
> According to the comments of ResourceArrayAdd(), the caller must have
> previously
> done ResourceArrayEnlarge(). I tried to call ResourceOwnerEnlargeBuffers()
> before
> PinBuffer_Locked(), so it can avoid this crash.
>
> if ((buf_state & BM_DIRTY) == BM_DIRTY)
> {
> +   /* make sure we can handle the pin */
> +   ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
> +
> /*
>  * Try once to flush the dirty buffer.
>  */
> PinBuffer_Locked(bufHdr);
>
> --
> Regrads,
> Japin Li.
>


Thanks. I tested flushing the pg_catalog and public schemas; both now work as pitched.
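
For reference, flushing everything that belongs to one schema can be done
with a query along these lines (a sketch using the
pg_buffercache_invalidate(bufferid) function from this patch; it ignores
database/tablespace disambiguation of relfilenode for brevity):

SELECT count(pg_buffercache_invalidate(b.bufferid))
  FROM pg_buffercache b
  JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
  JOIN pg_namespace n ON c.relnamespace = n.oid
 WHERE n.nspname = 'public';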


Re: Making empty Bitmapsets always be NULL

2023-07-04 Thread Yuya Watari
Hello,

On Tue, Jul 4, 2023 at 9:36 AM David Rowley  wrote:
> I've now pushed the patch.

Thanks for the commit!

-- 
Best regards,
Yuya Watari




Re: brininsert optimization opportunity

2023-07-04 Thread Alvaro Herrera
On 2023-Jul-03, Soumyadeep Chakraborty wrote:

> My colleague, Ashwin, pointed out to me that brininsert's per-tuple init
> of the revmap access struct can have non-trivial overhead.
> 
> Turns out he is right. We are saving 24 bytes of memory per-call for
> the access struct, and a bit on buffer/locking overhead, with the
> attached patch.

Hmm, yeah, I remember being a bit bothered by this repeated
initialization.  Your patch looks reasonable to me.  I would set
bistate->bs_rmAccess to NULL in the cleanup callback, just to be sure.
Also, please add comments atop these two new functions, to explain what
they are.

Nice results.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: Incremental sort for access method with ordered scan support (amcanorderbyop)

2023-07-04 Thread David Rowley
On Tue, 4 Jul 2023 at 20:12, Richard Guo  wrote:
> The v4 patch looks good to me (maybe some cosmetic tweaks are still
> needed for the comments).  I think it's now 'Ready for Committer'.

I agree. I went and hit the comments with a large hammer and while
there also adjusted the regression tests. I didn't think having "t" as
a table name was a good idea as it seems like a name with a high risk
of conflicting with a concurrently running test. Also, there didn't
seem to be much need to insert data into that table as the tests
didn't query any of it.

The only other small tweak I made was to not call list_copy_head()
when the list does not need to be shortened. It's likely not that
important, but if the majority of cases are not partial matches, then
we'd otherwise be needlessly making copies of the list.

I pushed the adjusted patch.

David




Re: Add 64-bit XIDs into PostgreSQL 15

2023-07-04 Thread Aleksander Alekseev
Hi,

> This patch hasn't applied in quite some time, and the thread has moved to
> discussing higher level items rather than the suggested patch, so I'm closing
> this as Returned with Feedback.  Please feel free to resubmit when there is
> renewed interest and a concensus on how/what to proceed with.

Yes, this thread awaits several other patches to be merged [1] in
order to continue, so it makes sense to mark it as RwF for the time
being. Thanks!

[1]: https://commitfest.postgresql.org/43/3489/

-- 
Best regards,
Aleksander Alekseev




Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

2023-07-04 Thread Amit Kapila
On Mon, Jul 3, 2023 at 7:45 AM Masahiko Sawada  wrote:
>
> Commit 89e46da5e5 allowed us to use indexes for searching on REPLICA
> IDENTITY FULL tables. The documentation explains:
>
> When replica identity full is specified,
> indexes can be used on the subscriber side for searching the rows.  Candidate
> indexes must be btree, non-partial, and have at least one column reference
> (i.e. cannot consist of only expressions).
>
> To be exact, IIUC the column reference must be on the leftmost column
> of indexes. Does it make sense to mention that?
>

Yeah, I think it is good to mention that. Accordingly, the comments
atop build_replindex_scan_key(),
FindUsableIndexForReplicaIdentityFull(), IsIndexOnlyOnExpression()
should also be adjusted.

-- 
With Regards,
Amit Kapila.




Re: Extension Enhancement: Buffer Invalidation in pg_buffercache

2023-07-04 Thread Japin Li


On Tue, 04 Jul 2023 at 17:00, jian he  wrote:
> the following will also crash. no idea why.
> begin;
> select count(*) from onek;
> select relpages from pg_class where relname = 'onek'; --queryA
>
> SELECT count(*) FROM pg_buffercache WHERE relfilenode =
> pg_relation_filenode('onek'::regclass); --queryB
>
> insert into onek values(default);
>
> select count(pg_buffercache_invalidate(bufferid)) from
> pg_buffercache where relfilenode =
> pg_relation_filenode('onek'::regclass);
>
> -
> queryA returns 35, queryB returns 37.
> --
> crash info:
> test_dev=*# insert into onek values(default);
> INSERT 0 1
> test_dev=*# select count(pg_buffercache_invalidate(bufferid)) from
> pg_buffercache where relfilenode =
> pg_relation_filenode('onek'::regclass);
> TRAP: failed Assert("resarr->nitems < resarr->maxitems"), File:
> "../../Desktop/pg_sources/main/postgres/src/backend/utils/resowner/resowner.c",
> Line: 275, PID: 1533312

According to the comments of ResourceArrayAdd(), the caller must have
previously done ResourceArrayEnlarge(). I tried calling
ResourceOwnerEnlargeBuffers() before PinBuffer_Locked(), which avoids this
crash.

if ((buf_state & BM_DIRTY) == BM_DIRTY)
{
+   /* make sure we can handle the pin */
+   ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
+
/*
 * Try once to flush the dirty buffer.
 */
PinBuffer_Locked(bufHdr);

-- 
Regrads,
Japin Li.




Re: Extension Enhancement: Buffer Invalidation in pg_buffercache

2023-07-04 Thread jian he
the following will also crash. no idea why.
begin;
select count(*) from onek;
select relpages from pg_class where relname = 'onek'; --queryA

SELECT count(*) FROM pg_buffercache WHERE relfilenode =
pg_relation_filenode('onek'::regclass); --queryB

insert into onek values(default);

select count(pg_buffercache_invalidate(bufferid)) from
pg_buffercache where relfilenode =
pg_relation_filenode('onek'::regclass);

-
queryA returns 35, queryB returns 37.
--
crash info:
test_dev=*# insert into onek values(default);
INSERT 0 1
test_dev=*# select count(pg_buffercache_invalidate(bufferid)) from
pg_buffercache where relfilenode =
pg_relation_filenode('onek'::regclass);
TRAP: failed Assert("resarr->nitems < resarr->maxitems"), File:
"../../Desktop/pg_sources/main/postgres/src/backend/utils/resowner/resowner.c",
Line: 275, PID: 1533312
postgres: jian test_dev [local]
SELECT(ExceptionalCondition+0xa1)[0x55fc8f8d14e1]
postgres: jian test_dev [local] SELECT(+0x9e7ab3)[0x55fc8f915ab3]
postgres: jian test_dev [local]
SELECT(ResourceOwnerRememberBuffer+0x1d)[0x55fc8f91696d]
postgres: jian test_dev [local] SELECT(+0x78ab17)[0x55fc8f6b8b17]
postgres: jian test_dev [local]
SELECT(TryInvalidateBuffer+0x6d)[0x55fc8f6c507d]
/home/jian/postgres/pg16_test/lib/pg_buffercache.so(pg_buffercache_invalidate+0x3d)[0x7f2361837abd]
postgres: jian test_dev [local] SELECT(+0x57eebc)[0x55fc8f4acebc]
postgres: jian test_dev [local]
SELECT(ExecInterpExprStillValid+0x3c)[0x55fc8f4a6e2c]
postgres: jian test_dev [local] SELECT(+0x5a0f16)[0x55fc8f4cef16]
postgres: jian test_dev [local] SELECT(+0x5a3588)[0x55fc8f4d1588]
postgres: jian test_dev [local] SELECT(+0x58f747)[0x55fc8f4bd747]
postgres: jian test_dev [local]
SELECT(standard_ExecutorRun+0x1f0)[0x55fc8f4b29f0]
postgres: jian test_dev [local] SELECT(ExecutorRun+0x46)[0x55fc8f4b2d16]
postgres: jian test_dev [local] SELECT(+0x7eb3b0)[0x55fc8f7193b0]
postgres: jian test_dev [local] SELECT(PortalRun+0x1eb)[0x55fc8f71b7ab]
postgres: jian test_dev [local] SELECT(+0x7e8cf4)[0x55fc8f716cf4]
postgres: jian test_dev [local] SELECT(PostgresMain+0x134f)[0x55fc8f71869f]
postgres: jian test_dev [local] SELECT(+0x70f80c)[0x55fc8f63d80c]
postgres: jian test_dev [local]
SELECT(PostmasterMain+0x1758)[0x55fc8f63f278]
postgres: jian test_dev [local] SELECT(main+0x27e)[0x55fc8f27067e]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f2361629d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f2361629e40]
postgres: jian test_dev [local] SELECT(_start+0x25)[0x55fc8f272bb5]
2023-07-04 16:56:13.088 CST [1532822] LOG:  server process (PID 1533312)
was terminated by signal 6: Aborted
2023-07-04 16:56:13.088 CST [1532822] DETAIL:  Failed process was running:
select count(pg_buffercache_invalidate(bufferid)) from
pg_buffercache where relfilenode =
pg_relation_filenode('onek'::regclass);
2023-07-04 16:56:13.088 CST [1532822] LOG:  terminating any other active
server processes
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: 2023-07-04
16:56:13.091 CST [1533381] FATAL:  the database system is in recovery mode
Failed.
The connection to the server was lost. Attempting reset: Failed.


Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2023-07-04 Thread Daniel Gustafsson
Have you had a chance to address the comments raised by Michael in his last
review such that a new patch revision can be submitted?

--
Daniel Gustafsson





Re: Extension Enhancement: Buffer Invalidation in pg_buffercache

2023-07-04 Thread Japin Li


On Mon, 03 Jul 2023 at 16:26, Palak Chaturvedi  
wrote:
> Hi Thomas,
> Thank you for your suggestions. I have added the sql in the meson
> build as well.
>
> On Sat, 1 Jul 2023 at 03:39, Thomas Munro  wrote:
>>
>> On Fri, Jun 30, 2023 at 10:47 PM Palak Chaturvedi
>>  wrote:
>> > pgbench=# select count(pg_buffercache_invalidate(bufferid)) from
>> > pg_buffercache where relfilenode =
>> > pg_relation_filenode('pgbench_accounts'::regclass);
>>
>> Hi Palak,
>>
>> Thanks for working on this!  I think this will be very useful for
>> testing existing workloads but also for testing future work on
>> prefetching with AIO (and DIO), work on putting SLRUs (or anything
>> else) into the buffer pool, nearby proposals for caching buffer
>> mapping information, etc etc.
>>
>> Palak and I talked about this idea a bit last week (stimulated by a
>> recent thread[1], but the topic has certainly come up before), and we
>> discussed some different ways one could specify which pages are
>> dropped.  For example, perhaps the pg_prewarm extension could have an
>> 'unwarm' option instead.  I personally thought the buffer ID-based
>> approach was quite good because it's extremely simple, while giving
>> the user the full power of SQL to say which buffers.   Half a table?
>> Visibility map?  Everything?  Root page of an index?  I think that's
>> probably better than something that requires more code and
>> complication but is less flexible in the end.  It feels like the right
>> level of rawness for something primarily of interest to hackers and
>> advanced users.  I don't think it matters that there is a window
>> between selecting a buffer ID and invalidating it, for the intended
>> use cases.  That's my vote, anyway, let's see if others have other
>> ideas...
>>
>> We also talked a bit about how one might control the kernel page cache
>> in more fine-grained ways for testing purposes, but it seems like the
>> pgfincore project has that covered with its pgfadvise_willneed() and
>> pgfadvise_dontneed().  IMHO that project could use more page-oriented
>> operations (instead of just counts and coarse grains operations) but
>> that's something that could be material for patches to send to the
>> extension maintainers.  This work, in contrast, is more tangled up
>> with bufmgr.c internals, so it feels like this feature belongs in a
>> core contrib module.
>>
>> Some initial thoughts on the patch:
>>
>> I wonder if we should include a simple exercise in
>> contrib/pg_buffercache/sql/pg_buffercache.sql.  One problem is that
>> it's not guaranteed to succeed in general.  It doesn't wait for pins
>> to go away, and it doesn't retry cleaning dirty buffers after one
>> attempt, it just returns false, which I think is probably the right
>> approach, but it makes the behaviour too non-deterministic for simple
>> tests.  Perhaps it's enough to include an exercise where we call it a
>> few times to hit a couple of cases, but not verify what effect it has.
>>
>> It should be restricted by role, but I wonder which role it should be.
>> Testing for superuser is now out of fashion.
>>
>> Where the Makefile mentions 1.4--1.5.sql, the meson.build file needs
>> to do the same.  That's because PostgreSQL is currently in transition
>> from autoconf/gmake to meson/ninja[2], so for now we have to maintain
>> both build systems.  That's why it fails to build in some CI tasks[3].
>> You can enable CI in your own GitHub account if you want to run test
>> builds on several operating systems, see [4] for info.
>>
>> [1] 
>> https://www.postgresql.org/message-id/flat/CAFSGpE3y_oMK1uHhcHxGxBxs%2BKrjMMdGrE%2B6HHOu0vttVET0UQ%40mail.gmail.com
>> [2] https://wiki.postgresql.org/wiki/Meson
>> [3] http://cfbot.cputube.org/palak-chaturvedi.html
>> [4] 
>> https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob_plain;f=src/tools/ci/README;hb=HEAD

I think zero is not a valid buffer identifier; see src/include/storage/buf.h.

+   bufnum = PG_GETARG_INT32(0);
+   if (bufnum < 0 || bufnum > NBuffers)
+   {
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("buffernum is not valid")));
+
+   }

If we use SELECT pg_buffercache_invalidate(0), it will crash.
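
For illustration, a minimal check that also rejects zero could look like this
(keeping the patch's ereport style; the message wording here is just an
example):

bufnum = PG_GETARG_INT32(0);
if (bufnum <= 0 || bufnum > NBuffers)
    ereport(ERROR,
            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
             errmsg("buffer identifier %d is out of range (1..%d)",
                    bufnum, NBuffers)));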

-- 
Regards,
Japin Li.




Re: Incremental sort for access method with ordered scan support (amcanorderbyop)

2023-07-04 Thread Richard Guo
On Sun, Jul 2, 2023 at 12:02 PM Miroslav Bendik 
wrote:

> Thanks, for suggestions.
>
> On Sun 02. 07. 2023 at 10:18 Richard Guo  wrote:
> > 1. For comment "On success, the result list is ordered by pathkeys.", I
> > think it'd be more accurate if we say something like "On success, the
> > result list is ordered by pathkeys or a prefix list of pathkeys."
> > considering the possibility of incremental sort.
> >
> > 2. The comment below is not true anymore.
> >
> >/*
> > * Note: for any failure to match, we just return NIL immediately.
> > * There is no value in matching just some of the pathkeys.
> > */
> > We should either remove it or change it to emphasize that we may return
> > a prefix of the pathkeys for incremental sort.
>
> Comments are updated now.
>
> > BTW, would you please add the patch to the CF to not lose track of it?
>
> Submitted 


The v4 patch looks good to me (maybe some cosmetic tweaks are still
needed for the comments).  I think it's now 'Ready for Committer'.

Thanks
Richard


Re: Exposing the lock manager's WaitForLockers() to SQL

2023-07-04 Thread Will Mortensen
Updated patch with more tests and a first attempt at doc updates.

As the commit message and doc now point out, using
WaitForLockersMultiple() behaves differently from actually locking
multiple tables: the combined set of conflicting locks is obtained only
once for all tables, rather than obtaining the conflicts for the first
table and locking / waiting on it, then obtaining the conflicts for the
second table, and so on (see the sketch below). This is definitely
desirable for my use case, but maybe these kinds of differences
illustrate the potential awkwardness of extending LOCK?
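
For illustration, a rough sketch of that difference in terms of the lmgr
calls (rel1_oid and rel2_oid are placeholder OIDs; the lock mode is
arbitrary):

LOCKTAG     tag1, tag2;
List       *locktags = NIL;

SET_LOCKTAG_RELATION(tag1, MyDatabaseId, rel1_oid);
SET_LOCKTAG_RELATION(tag2, MyDatabaseId, rel2_oid);
locktags = lappend(locktags, &tag1);
locktags = lappend(locktags, &tag2);

/* combined: collect the conflicting lockers of both tables once, wait once */
WaitForLockersMultiple(locktags, ShareLock, true);

/*
 * versus per-table: the second wait can pick up lockers that only appeared
 * while we were waiting on the first table
 */
WaitForLockers(tag1, ShareLock, true);
WaitForLockers(tag2, ShareLock, true);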

Thanks again for any and all feedback!


v2-0001-Add-WAIT-ONLY-option-to-LOCK-statement.patch
Description: Binary data


Re: suppressing useless wakeups in logical/worker.c

2023-07-04 Thread Daniel Gustafsson
> On 17 Mar 2023, at 10:16, Amit Kapila  wrote:

> Few minor comments:

Have you had a chance to address the comments raised by Amit in this thread?

--
Daniel Gustafsson





Re: Allow parallel plan for referential integrity checks?

2023-07-04 Thread Daniel Gustafsson
> On 20 Mar 2023, at 16:48, Frédéric Yhuel  wrote:
> On 3/20/23 15:58, Gregory Stark (as CFM) wrote:

>> Should we move it to next release at this
>> point? Even if you get time to work on it this commitfest do you think
>> it's likely to be committable in the next few weeks?
> 
> It is very unlikely. Maybe it's better to remove it from CF and put it back 
> later if the test case I will provide does a better job at convincing the 
> Postgres folks that RI checks should be parallelized.

As there is no new patch submitted I will go ahead and do that, please feel
free to resubmit when there is renewed interest in working on this.

--
Daniel Gustafsson





Re: Add 64-bit XIDs into PostgreSQL 15

2023-07-04 Thread Daniel Gustafsson
This patch hasn't applied in quite some time, and the thread has moved to
discussing higher level items rather than the suggested patch, so I'm closing
this as Returned with Feedback.  Please feel free to resubmit when there is
renewed interest and a consensus on how/what to proceed with.

--
Daniel Gustafsson





Re: [PATCH] Add native windows on arm64 support

2023-07-04 Thread Michael Paquier
On Thu, May 11, 2023 at 01:19:54PM +0900, Michael Paquier wrote:
> On Thu, May 11, 2023 at 12:17:25PM +1200, Thomas Munro wrote:
>> Azure does have an image "Microsoft Windows 11 Preview arm64" to run
>> on Ampere CPUs, which says it's a pre-release version intended for
>> validation, which sounds approximately like what we want.  I will try
>> to find out more.
> 
> Interesting.  Thanks for mentioning it.

Now that v17 is open, I was looking at v8 posted at [1] and I don't
have much more to add about it.  The lack of a buildfarm machine is
still annoying, but I don't see a reason not to
move forward and let people play with this stuff on HEAD.  At least
that would be progress.  Any thoughts?

Thomas, what's the state of ARM support for Windows on Azure?  Is that
still in preview?

[1]: 
https://www.postgresql.org/message-id/dbee741f-b9b7-a0d5-1b1b-f9b532bb6f56%40linaro.org
--
Michael


signature.asc
Description: PGP signature


Re: Autogenerate some wait events code and documentation

2023-07-04 Thread Drouvot, Bertrand

Hi,

On 7/3/23 9:11 AM, Michael Paquier wrote:

On Mon, Jul 03, 2023 at 03:57:42PM +0900, Michael Paquier wrote:


Thanks for looking at it and for fixing the issues that were present in
v10.


I think that we should add some options to the perl script to be more
selective with the files generated.  How about having two options
called --docs and --code to select one or the other, then limit what
gets generated in each path?  I guess that it would be cleaner if we
error in the case where both options are defined, and just use some
gotos to redirect to each foreach loop to limit extra indentations in
the script.  This would avoid the need to remove the C and H files
from the docs, additionally, which is what the Makefile in doc/ does.

I have fixed all the issues I've found in v11 attached, except for the
last one (I have added the OUTDIR trick for reference, but that's
incorrect and incomplete).  Could you look at this part?


Ah.  It took me a few extra minutes, but I think that we should set
"capture" to "false", no?  It looks like meson is getting confused,
expecting something on stdout, but the new script generates a few
files and does not output anything.  That's different from the other
doc-related perl scripts.
--


Yeah, with "capture" set to "false" then ninja alldocs does not error out
and wait_event_types.sgml gets generated.

I'll look at the extra options --code and --docs.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Check lateral references within PHVs for memoize cache keys

2023-07-04 Thread Richard Guo
On Fri, Dec 30, 2022 at 11:00 AM Richard Guo  wrote:

> On Fri, Dec 9, 2022 at 5:16 PM Richard Guo  wrote:
>
>> Actually we do check PHVs for lateral references, earlier in
>> create_lateral_join_info.  But at that time we only marked lateral_relids
>> and direct_lateral_relids, without remembering the lateral expressions.
>> So I'm wondering whether we can fix that by fetching Vars (or PHVs) of
>> lateral references within PlaceHolderVars and remembering them in the
>> baserel's lateral_vars.
>>
>> Attach a draft patch to show my thoughts.
>>
>
> Update the patch to fix test failures.
>

Rebase the patch on HEAD as cfbot reminds.

Thanks
Richard


v3-0001-Check-lateral-refs-within-PHVs-for-memoize-cache-keys.patch
Description: Binary data


Re: O(n) tasks cause lengthy startups and checkpoints

2023-07-04 Thread Daniel Gustafsson
> On 4 Apr 2023, at 05:36, Nathan Bossart  wrote:
> 
> I sent this one to the next commitfest and marked it as waiting-on-author
> and targeted for v17.  I'm aiming to have something that addresses the
> latest feedback ready for the July commitfest.

Have you had a chance to look at this such that there is something ready?

--
Daniel Gustafsson





Re: real/float example for testlibpq3

2023-07-04 Thread Daniel Gustafsson
> On 21 Mar 2023, at 14:44, Mark Wong  wrote:
> 
> On Mon, Mar 20, 2023 at 01:37:57PM -0400, Gregory Stark (as CFM) wrote:
>> On Mon, 23 Jan 2023 at 11:54, Mark Wong  wrote:
>> fficient way to communicate useful information.
>>> 
>>> Yeah, I will try to cover all the data types we ship. :)  I'll keep at
>>> it and drop the code examples.
>> 
>> I assume this isn't going to happen for this commitfest? If you think
>> it is then shout otherwise I'll mark it Returned with Feedback and
>> move it on to the next release.
> 
> Sounds good.  I actually thought I did that already, thanks for catching
> that.

This has been waiting on author since January, unless there is a new patch
ready I will close this as Returned with Feedback and you can resubmit for
another CF when there is a new patch.

--
Daniel Gustafsson





Re: collect_corrupt_items_vacuum.patch

2023-07-04 Thread Daniel Gustafsson
This patch has been waiting on the author for about a year now, so I will close
it as Returned with Feedback.  Please feel free to resubmit to a future CF when
there is renewed interest in working on this.

--
Daniel Gustafsson





Re: Deleting prepared statements from libpq.

2023-07-04 Thread Michael Paquier
On Tue, Jul 04, 2023 at 08:28:40AM +0900, Michael Paquier wrote:
> Sure, feel free.  I was planning to look at and play more with it.

Well, done.
--
Michael


signature.asc
Description: PGP signature


Re: PG 16 draft release notes ready

2023-07-04 Thread Michael Paquier
On Thu, May 18, 2023 at 04:49:47PM -0400, Bruce Momjian wrote:
> I have completed the first draft of the PG 16 release notes.  You can
> see the output here:
> 
>   https://momjian.us/pgsql_docs/release-16.html
> 
> I will adjust it to the feedback I receive;  that URL will quickly show
> all updates.

Sawada-san has mentioned on twitter that fdd8937 is not mentioned in
the release notes, and it seems to me that he is right.  This is
described as a bug in the commit log, but it did not get backpatched
because of the lack of complaints.  Also, because we've removed
support for anything older than Windows 10 in PG16, this change was very
easy to do.
--
Michael


signature.asc
Description: PGP signature


Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2023-07-04 Thread Michael Paquier
On Tue, Jul 04, 2023 at 02:15:18PM +0800, Julien Rouhaud wrote:
> Thanks, I actually saw that and already took care of removing openssl support a
> couple of hours ago, and also added a new note on the animal to remember when
> it was removed.  It should come back to green at the next scheduled run.

Thanks!
--
Michael


signature.asc
Description: PGP signature


Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2023-07-04 Thread Julien Rouhaud
Hi,

On Tue, Jul 04, 2023 at 03:03:01PM +0900, Michael Paquier wrote:
> On Tue, Jul 04, 2023 at 07:16:47AM +0900, Michael Paquier wrote:
> > The second and third animals to fail are skate and snapper, both using
> > Debian 7 Wheezy.  As far as I know, it was an LTS supported until
> > 2018.  The owner of both machines is added in CC.  I guess that for
> > this stuff we could just remove --with-openssl from the configure
> > switches.
>
> lapwing has reported a failure and runs a Debian 7, so adding Julien
> in CC about the removal of --with-openssl or similar in this animal.

Thanks, I actually saw that and already took care of removing openssl support a
couple of hours ago, and also added a new note on the animal to remember when
it was removed.  It should come back to green at the next scheduled run.




Re: Extension Enhancement: Buffer Invalidation in pg_buffercache

2023-07-04 Thread Palak Chaturvedi
hi,
I don't think we need to check the usage count, because we are
clearing all the buffers that are not pinned.
The usage count only matters for buffer replacement; since we are not
replacing anything here, it can be ignored (rough sketch below).
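
For illustration only, a rough sketch of that check inside a bufmgr.c
helper like the TryInvalidateBuffer() seen in the stack trace earlier in
the thread (details of the actual patch may differ, and the dirty-buffer
handling is omitted):

uint32      buf_state = LockBufHdr(bufHdr);

if (BUF_STATE_GET_REFCOUNT(buf_state) > 0)
{
    /* pinned by someone: give up instead of waiting for the pin */
    UnlockBufHdr(bufHdr, buf_state);
    return false;
}

/*
 * usage_count is not checked: it only guides victim selection during
 * buffer replacement, and here we are explicitly evicting this buffer.
 */
InvalidateBuffer(bufHdr);       /* expects and releases the header spinlock */
return true;
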
On Mon, 3 Jul 2023 at 21:16, jian he  wrote:
>
> On Mon, Jul 3, 2023 at 4:26 PM Palak Chaturvedi
>  wrote:
> >
> > Hi Thomas,
> > Thank you for your suggestions. I have added the sql in the meson
> > build as well.
> >
> > On Sat, 1 Jul 2023 at 03:39, Thomas Munro  wrote:
> > >
> > > On Fri, Jun 30, 2023 at 10:47 PM Palak Chaturvedi
> > >  wrote:
> > > > pgbench=# select count(pg_buffercache_invalidate(bufferid)) from
> > > > pg_buffercache where relfilenode =
> > > > pg_relation_filenode('pgbench_accounts'::regclass);
> > >
> > > Hi Palak,
> > >
> > > Thanks for working on this!  I think this will be very useful for
> > > testing existing workloads but also for testing future work on
> > > prefetching with AIO (and DIO), work on putting SLRUs (or anything
> > > else) into the buffer pool, nearby proposals for caching buffer
> > > mapping information, etc etc.
> > >
> > > Palak and I talked about this idea a bit last week (stimulated by a
> > > recent thread[1], but the topic has certainly come up before), and we
> > > discussed some different ways one could specify which pages are
> > > dropped.  For example, perhaps the pg_prewarm extension could have an
> > > 'unwarm' option instead.  I personally thought the buffer ID-based
> > > approach was quite good because it's extremely simple, while giving
> > > the user the full power of SQL to say which buffers.   Half a table?
> > > Visibility map?  Everything?  Root page of an index?  I think that's
> > > probably better than something that requires more code and
> > > complication but is less flexible in the end.  It feels like the right
> > > level of rawness for something primarily of interest to hackers and
> > > advanced users.  I don't think it matters that there is a window
> > > between selecting a buffer ID and invalidating it, for the intended
> > > use cases.  That's my vote, anyway, let's see if others have other
> > > ideas...
> > >
> > > We also talked a bit about how one might control the kernel page cache
> > > in more fine-grained ways for testing purposes, but it seems like the
> > > pgfincore project has that covered with its pgfadvise_willneed() and
> > > pgfadvise_dontneed().  IMHO that project could use more page-oriented
> > > operations (instead of just counts and coarse grains operations) but
> > > that's something that could be material for patches to send to the
> > > extension maintainers.  This work, in contrast, is more tangled up
> > > with bufmgr.c internals, so it feels like this feature belongs in a
> > > core contrib module.
> > >
> > > Some initial thoughts on the patch:
> > >
> > > I wonder if we should include a simple exercise in
> > > contrib/pg_buffercache/sql/pg_buffercache.sql.  One problem is that
> > > it's not guaranteed to succeed in general.  It doesn't wait for pins
> > > to go away, and it doesn't retry cleaning dirty buffers after one
> > > attempt, it just returns false, which I think is probably the right
> > > approach, but it makes the behaviour too non-deterministic for simple
> > > tests.  Perhaps it's enough to include an exercise where we call it a
> > > few times to hit a couple of cases, but not verify what effect it has.
> > >
> > > It should be restricted by role, but I wonder which role it should be.
> > > Testing for superuser is now out of fashion.
> > >
> > > Where the Makefile mentions 1.4--1.5.sql, the meson.build file needs
> > > to do the same.  That's because PostgreSQL is currently in transition
> > > from autoconf/gmake to meson/ninja[2], so for now we have to maintain
> > > both build systems.  That's why it fails to build in some CI tasks[3].
> > > You can enable CI in your own GitHub account if you want to run test
> > > builds on several operating systems, see [4] for info.
> > >
> > > [1] 
> > > https://www.postgresql.org/message-id/flat/CAFSGpE3y_oMK1uHhcHxGxBxs%2BKrjMMdGrE%2B6HHOu0vttVET0UQ%40mail.gmail.com
> > > [2] https://wiki.postgresql.org/wiki/Meson
> > > [3] http://cfbot.cputube.org/palak-chaturvedi.html
> > > [4] 
> > > https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob_plain;f=src/tools/ci/README;hb=HEAD
>
> newbie question:
> quote from: https://www.interdb.jp/pg/pgsql08.html
> >
> > Pinned: When the corresponding buffer pool slot stores a page and any 
> > PostgreSQL processes are accessing the page (i.e. refcount and usage_count 
> > are greater than or equal to 1), the state of this buffer descriptor is 
> > pinned.
> > Unpinned: When the corresponding buffer pool slot stores a page but no 
> > PostgreSQL processes are accessing the page (i.e. usage_count is greater 
> > than or equal to 1, but refcount is 0), the state of this buffer descriptor 
> > is unpinned.
>
>
> So do you need to check BUF_STATE_GET_REFCOUNT(buf_state) and
> 

Re: ProcessStartupPacket(): database_name and user_name truncation

2023-07-04 Thread Drouvot, Bertrand

Hi,

On 7/3/23 10:34 PM, Nathan Bossart wrote:

On Sat, Jul 01, 2023 at 04:02:06PM +0200, Drouvot, Bertrand wrote:

Please find V2 attached where it's failing as soon as the database name or
user name are detected as overlength.


Thanks, Bertrand.  I chickened out and ended up committing v1 for now
(i.e., simply removing the truncation code).  I didn't like the idea of
trying to keep the new error messages consistent with code in faraway
files, and the startup packet length limit is already pretty aggressive, so
I'm a little less concerned about lugging around long names.  Plus, I think
v2 had some subtle interactions with db_user_namespace (maybe for the
better), but I didn't spend too much time looking at that since
db_user_namespace will likely be removed soon.


Thanks Nathan for the feedback and explanations, I think that fully makes sense.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2023-07-04 Thread Michael Paquier
On Tue, Jul 04, 2023 at 07:16:47AM +0900, Michael Paquier wrote:
> The second and third animals to fail are skate and snapper, both using
> Debian 7 Wheezy.  As far as I know, it was an LTS supported until
> 2018.  The owner of both machines is added in CC.  I guess that for
> this stuff we could just remove --with-openssl from the configure
> switches.

lapwing has reported a failure and runs a Debian 7, so adding Julien
in CC about the removal of --with-openssl or similar in this animal.
--
Michael


signature.asc
Description: PGP signature


Re: Including a sample Table Access Method with core code

2023-07-04 Thread Michael Paquier
On Mon, Jul 03, 2023 at 08:33:32PM +0200, Hannu Krosing wrote:
> One thing that was briefly mentioned (but is missing from the notes)
> is the need to have a sample API client in contrib/, both to have a
> 2nd user of the API, making it more likely that non-heap AMs are doable,
> and also to serve as an easy starting point for someone interested in
> developing a new AM.

That sounds like a fair thing to have, though templates may live
better under src/test/modules.

> There are a few candidates which could be lightweight enough for this
> 
> * in-memory temp tables, especially if you specify max table size at
> creation and/or limit data types which can be used.
>
> * "overlay tables" - tables which "overlay" another - possibly
> read-only - table and store only changed rows and tombstones for
> deletions. (this likely would make more sense as a FDW itself as Table
> AM currently knows nothing about Primary Keys and these are likely
> needed for overlays)
> 
> * Table AM as a (pl/)Python Class - this is inspired by the amazing
> Multicorn [2] FDW-in-Python tool which made it ridiculously easy to
> expose anything (mailbox, twitter feed, git commit history,
> you-name-it) as a Foreign Table

I cannot say how simple that is without seeing the code, but limiting
the use of an AM to a single session sounds like a simple enough
concept, restricting its relpersistence along the way.  One
thing that may be also interesting is something that does not go
through the Postgres buffer pool.

> Included Mark Dilger directly to this mail as he mentioned he has a
> Perl script that makes a functional copy of heap AM that can be
> compiled as installed as custom AM.

A similar discussion happened around 640c198 and the creation of
dummy_index_am, where the argument was that such a module needs to
provide value in testing some of the core internals.  dummy_index_am
did so for reloptions on indexes because there was not much coverage
for that part of the system.

> @mark - maybe you can create 3 boilerplate Table AMs for the above
> named `mem_am`, `overlay_am` and `py3_am` and we could put them
> somewhere for interested parties to play with ?

Not sure if that's worth counting, but I also have a table AM template
stored in my plugin repo:
https://github.com/michaelpq/pg_plugins/tree/main/blackhole_am

It does as much as its name states, being able to eat all the data fed
to it.
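
For readers new to table AMs, the general shape of such a handler looks
roughly like this (a minimal sketch; the actual blackhole_am in that repo
fills in every required callback):

#include "postgres.h"

#include "access/tableam.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(blackhole_am_handler);

Datum
blackhole_am_handler(PG_FUNCTION_ARGS)
{
    TableAmRoutine *amroutine = makeNode(TableAmRoutine);

    /*
     * Fill in every required callback with no-op implementations that
     * accept and discard whatever they are given (omitted here).
     */

    PG_RETURN_POINTER(amroutine);
}

/*
 * Then, at the SQL level:
 *   CREATE ACCESS METHOD blackhole_am TYPE TABLE HANDLER blackhole_am_handler;
 *   CREATE TABLE t (...) USING blackhole_am;
 */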
--
Michael


signature.asc
Description: PGP signature