Re: Add regular expression testing for user name mapping in the peer authentication TAP test

2022-10-14 Thread Drouvot, Bertrand

Hi,

On 10/15/22 5:11 AM, Michael Paquier wrote:

On Fri, Oct 14, 2022 at 06:31:15PM +0200, Drouvot, Bertrand wrote:

while working on [1], I thought it could also be useful to add regular
expression testing for user name mapping in the peer authentication TAP
test.


Good idea now that we have a bit more coverage in the authentication
tests.


Thanks for looking at it!


+# Test with regular expression in user name map.
+my $last_system_user_char = substr($system_user, -1);


This would attach to the regex the last character of the system user.


Right.


I would perhaps have used more characters than that (-3?), as substr()
with a negative offset whose absolute value is larger than the length of
the input string would give the entire string.  That's a nit, though.


I don't have a strong opinion on this, so let's extract the last 3 
characters. This is what v2 attached does.





+# The regular expression does not match.
+reset_pg_ident($node, 'mypeermap', '/^$', 'testmapuser');


This matches only an empty string, my brain gets that right?


Right. Giving it a second thought for the non-matching case, I think I'd
prefer to concatenate the system_user with itself instead. This is what
v2 does.
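
For reference, the regex entries in pg_ident.conf that the test now
generates look like the following (user names are illustrative only); a
system user name starting with a slash is treated as a regular
expression:

    # MAPNAME      SYSTEM-USERNAME    PG-USERNAME
    # Map any system user whose name ends in "vot" to testmapuser.
    mypeermap      /^.*vot$           testmapuser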


Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

diff --git a/src/test/authentication/t/003_peer.pl b/src/test/authentication/t/003_peer.pl
index fc951dea06..e719b05d0f 100644
--- a/src/test/authentication/t/003_peer.pl
+++ b/src/test/authentication/t/003_peer.pl
@@ -23,18 +23,34 @@ sub reset_pg_hba
return;
 }
 
+# Delete pg_ident.conf from the given node, add a new entry to it
+# and then execute a reload to refresh it.
+sub reset_pg_ident
+{
+   my $node= shift;
+   my $map_name= shift;
+   my $system_user = shift;
+   my $pg_user = shift;
+
+   unlink($node->data_dir . '/pg_ident.conf');
+   $node->append_conf('pg_ident.conf', "$map_name $system_user $pg_user");
+   $node->reload;
+   return;
+}
+
 # Test access for a single role, useful to wrap all tests into one.
 sub test_role
 {
local $Test::Builder::Level = $Test::Builder::Level + 1;
 
-   my ($node, $role, $method, $expected_res, %params) = @_;
+   my ($node, $role, $method, $expected_res, $test_details, %params) = @_;
my $status_string = 'failed';
$status_string = 'success' if ($expected_res eq 0);
 
my $connstr = "user=$role";
my $testname =
- "authentication $status_string for method $method, role $role";
+ "authentication $status_string for method $method, role $role "
+ . $test_details;
 
if ($expected_res eq 0)
{
@@ -87,16 +103,50 @@ my $system_user =
 # Tests without the user name map.
 # Failure as connection is attempted with a database role not mapping
 # to an authorized system user.
-test_role($node, qq{testmapuser}, 'peer', 2,
+test_role(
+   $node, qq{testmapuser}, 'peer', 2,
+   'without user name map',
log_like => [qr/Peer authentication failed for user "testmapuser"/]);
 
 # Tests with a user name map.
-$node->append_conf('pg_ident.conf', qq{mypeermap $system_user testmapuser});
+reset_pg_ident($node, 'mypeermap', $system_user, 'testmapuser');
 reset_pg_hba($node, 'peer map=mypeermap');
 
 # Success as the database role matches with the system user in the map.
-test_role($node, qq{testmapuser}, 'peer', 0,
+test_role($node, qq{testmapuser}, 'peer', 0, 'with user name map',
log_like =>
  [qr/connection authenticated: identity="$system_user" method=peer/]);
 
+# Test with regular expression in user name map.
+# Extract the last 3 characters from the system_user,
+# or the entire system_user if it is 3 characters or shorter.
+my $regex_test_string = substr($system_user, -3);
+
+# The regular expression matches.
+reset_pg_ident($node, 'mypeermap', qq{/^.*$regex_test_string\$},
+   'testmapuser');
+test_role(
+   $node,
+   qq{testmapuser},
+   'peer',
+   0,
+   'with regular expression in user name map',
+   log_like =>
+ [qr/connection authenticated: identity="$system_user" method=peer/]);
+
+
+# Concatenate system_user to system_user.
+$regex_test_string = $system_user . $system_user;
+
+# The regular expression does not match.
+reset_pg_ident($node, 'mypeermap', qq{/^.*$regex_test_string\$},
+   'testmapuser');
+test_role(
+   $node,
+   qq{testmapuser},
+   'peer',
+   2,
+   'with regular expression in user name map',
+   log_like =>
+ [qr/no match in usermap "mypeermap" for user "testmapuser"/]);
+
 done_testing();


Re: Eliminating SPI from RI triggers - take 2

2022-10-14 Thread Amit Langote
On Wed, Oct 12, 2022 at 2:27 AM Robert Haas  wrote:
>
> On Thu, Sep 29, 2022 at 12:47 AM Amit Langote  wrote:
> > [ patches ]
>
> While looking over this thread I came across this code:

Thanks for looking.

> /* For data reading, executor always omits detached partitions */
> if (estate->es_partition_directory == NULL)
> estate->es_partition_directory =
> CreatePartitionDirectory(estate->es_query_cxt, false);
>
> But CreatePartitionDirectory is declared like this:
>
> extern PartitionDirectory CreatePartitionDirectory(MemoryContext mcxt,
> bool omit_detached);
>
> So the comment seems to say the opposite of what the code does. The
> code seems to match the explanation in the commit message for
> 71f4c8c6f74ba021e55d35b1128d22fb8c6e1629, so I am guessing that
> perhaps s/always/never/ is needed here.

I think you are right.  In commit 8aba9322511 that fixed a bug in this
area, we have this hunk:

-   /* Executor must always include detached partitions */
+   /* For data reading, executor always omits detached partitions */
if (estate->es_partition_directory == NULL)
estate->es_partition_directory =
-   CreatePartitionDirectory(estate->es_query_cxt, true);
+   CreatePartitionDirectory(estate->es_query_cxt, false);

The same commit also renamed the include_detached parameter of
CreatePartitionDirectory() to omit_detached but the comment change
didn't quite match with that.

I will fix this and other related comments to be consistent about
using the word "omit".  Will include them in the updated 0003.

> I also noticed that ExecCreatePartitionPruneState no longer exists in
> the code but is still referenced in
> src/test/modules/delay_execution/specs/partition-addition.spec

It looks like we missed that reference in commit 297daa9d435 wherein
we renamed it to just CreatePartitionPruneState().

I have posted a patch to fix this.

> Regarding 0003, it seems unfortunate that
> find_inheritance_children_extended() will now have 6 arguments 4 of
> which have to do with detached partition handling. That is a lot of
> detached partition handling, and it's hard to reason about. I don't
> see an obvious way of simplifying things very much, but I wonder if we
> could at least have the new omit_detached_snapshot snapshot replace
> the existing bool omit_detached flag. Something like the attached
> incremental patch.

Yeah, I was wondering the same too and don't see a reason why we
couldn't do it that way.

I have merged your incremental patch into 0003.

> Probably we need to go further than the attached, though. I don't
> think that PartitionDirectoryLookup() should be getting any new
> arguments. The whole point of that function is that it's supposed to
> ensure that the returned value is stable, and the comments say so. But
> with these changes it isn't any more, because it depends on the
> snapshot you pass. It seems fine to specify when you create the
> partition directory that you want it to show a different, still-stable
> view of the world, but as written, it seems to me to undermine the
> idea that the return value is expected to be stable at all. Is there a
> way we can avoid that?

Ok, I think it makes sense to have CreatePartitionDirectory take in
the snapshot and store it in PartitionDirectoryData for use during
each subsequent PartitionDirectoryLookup().  So we'll be replacing the
current omit_detached flag in PartitionDirectoryData, just as we are
doing for the interface functions.  Done that way in 0003.

Regarding 0002, which introduces ri_LookupKeyInPkRel(), I realized
that it may have been initializing the ScanKeys wrongly.  It was using
ScanKeyInit(), which uses InvalidOid for sk_subtype, causing the index
AM / btree code to use the wrong comparison functions when PK and FK
column types don't match.  That may have been a reason for 32-bit
machine failures pointed out by Andres upthread.  I've fixed it by
using ScanKeyEntryInitialize() to pass the opfamily-specified right
argument (FK column) type OID.
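
For illustration, a rough sketch of the difference (attno, cmp_proc,
collation, fk_type_oid and key_datum are hypothetical placeholders):

    ScanKeyData skey;

    /* ScanKeyInit() leaves sk_subtype as InvalidOid, so the btree code
     * may resolve the wrong support function for cross-type cases. */
    ScanKeyInit(&skey, attno, BTEqualStrategyNumber, cmp_proc, key_datum);

    /* ScanKeyEntryInitialize() lets the caller pass the right-hand
     * argument (FK column) type OID explicitly as the subtype. */
    ScanKeyEntryInitialize(&skey,
                           0,                      /* flags */
                           attno,                  /* PK index column */
                           BTEqualStrategyNumber,
                           fk_type_oid,            /* sk_subtype: RHS type */
                           collation,
                           cmp_proc,               /* comparison proc */
                           key_datum);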

Attached updated patches.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
From 2abdbf2d3cecf193f9f6dbb154a1bf0c36afae04 Mon Sep 17 00:00:00 2001
From: amitlan 
Date: Wed, 28 Sep 2022 16:37:55 +0900
Subject: [PATCH v7 4/4] Teach ri_LookupKeyInPkRel() to pass
 omit_detached_snapshot

Now that the RI triggers that need to look up PK rows in a
partitioned table can manipulate partitions directly through
ExecGetLeafPartitionForKey(), the snapshot being passed to omit or
include detach-pending partitions can also now be passed explicitly,
rather than using ActiveSnapshot for that purpose.

For the detach-pending partitions to be correctly omitted or included
from the consideration of PK row lookup, the PartitionDesc machinery
needs to see the latest snapshot.  Pushing the latest snapshot to be
the ActiveSnapshot as is done presently meant that even the scans that
should NOT be using the latest snapshot also end up using one to
time-qualify the rows being scanned.

Re: CREATE COLLATION must be specified

2022-10-14 Thread Shay Rojansky
> > CREATE COLLATION some_collation (LC_COLLATE = 'en-u-ks-primary',
> >  LC_CTYPE = 'en-u-ks-primary',
> >  PROVIDER = icu,
> >  DETERMINISTIC = False
> > );
> >
> > This works on PG14, but on PG15 it errors with 'parameter "locale" must
> > be specified'.
> >
> > I wanted to make sure this breaking change is intentional (it doesn't
> > seem documented in the release notes or in the docs for CREATE
> > COLLATION).
>
> This change is intentional, but the documentation could be improved.

I think this is still missing in the PG15 release notes
(https://www.postgresql.org/docs/15/release-15.html).


Re: thinko in basic_archive.c

2022-10-14 Thread Bharath Rupireddy
On Sat, Oct 15, 2022 at 12:03 AM Nathan Bossart wrote:
>
> On Fri, Oct 14, 2022 at 02:15:19PM +0530, Bharath Rupireddy wrote:
> > Given that temp file name includes WAL file name, epoch to
> > milliseconds scale and MyProcPid, can there be name collisions after a
> > server crash or even when multiple servers with different pids are
> > archiving/copying the same WAL file to the same directory?
>
> While unlikely, I think it's theoretically possible.

Can you please help me understand how name collisions can happen with
temp file names that include the WAL file name, a timestamp at
millisecond scale, and the PID? Having the timestamp is enough to keep
the temp file name unique even when PID wraparound occurs, right? Am I
missing something here?
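
For context, the temporary file name under discussion is built roughly
like this (a sketch of basic_archive.c's scheme, possibly not the exact
code):

    char        temp[MAXPGPATH];

    /* WAL file name + archiver PID + epoch in milliseconds: a collision
     * requires a process with the same PID archiving the same segment
     * within the same millisecond. */
    snprintf(temp, sizeof(temp), "%s/archtemp.%s.%d.%lld",
             archive_directory, file, MyProcPid, (long long) epoch);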

> > What happens to the left-over temp files after a server crash? Will
> > they be lying around in the archive directory? I understand that we
> > can't remove such files because we can't distinguish left-over files
> > from a crash and the temp files that another server is in the process
> > of copying.
>
> The temporary files are not automatically removed after a crash.  The
> documentation for basic archive has a note about this [0].

Hm, we cannot remove the temp file for every sort of crash, but having
on_shmem_exit() or before_shmem_exit() or atexit() or some such callback
remove it would at least cover the scenarios that exit via proc_exit()
or exit(). I think the basic_archive module currently leaves temp files
around even when the server is restarted legitimately while it is
copying to or renaming the temp file, no?

I can quickly find these exit callbacks deleting the files:
atexit(cleanup_directories_atexit);
atexit(remove_temp);
before_shmem_exit(ReplicationSlotShmemExit, 0);
before_shmem_exit(logicalrep_worker_onexit, (Datum) 0);
before_shmem_exit(BeforeShmemExit_Files, 0);
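
A minimal sketch of what such a hook could look like for basic_archive
(the variable and function names here are hypothetical, not what the
module currently does):

    #include "storage/ipc.h"

    /* Path of the temp file currently being written; empty if none. */
    static char cur_temp_path[MAXPGPATH];

    /* Best effort: covers proc_exit()/exit() paths, not hard crashes. */
    static void
    basic_archive_cleanup(int code, Datum arg)
    {
        if (cur_temp_path[0] != '\0')
            (void) unlink(cur_temp_path);
    }

    /* Registered once from the module's initialization code: */
    before_shmem_exit(basic_archive_cleanup, (Datum) 0);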

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Add regular expression testing for user name mapping in the peer authentication TAP test

2022-10-14 Thread Michael Paquier
On Fri, Oct 14, 2022 at 06:31:15PM +0200, Drouvot, Bertrand wrote:
> while working on [1], I thought it could also be useful to add regular
> expression testing for user name mapping in the peer authentication TAP
> test.

Good idea now that we have a bit more coverage in the authentication
tests.

> +# Test with regular expression in user name map.
> +my $last_system_user_char = substr($system_user, -1);

This would attach to the regex the last character of the system user.
I would perhaps have used more characters than that (-3?), as substr()
with a negative offset whose absolute value is larger than the length of
the input string would give the entire string.  That's a nit, though.

> +# The regular expression does not match.
> +reset_pg_ident($node, 'mypeermap', '/^$', 'testmapuser');

This matches only an empty string, my brain gets that right?
--
Michael




Re: New "single-call SRF" APIs are very confusingly named

2022-10-14 Thread Michael Paquier
On Fri, Oct 14, 2022 at 05:09:46PM -0400, Melanie Plageman wrote:
> To summarize, I am in support of renaming SetSingleFuncCall() ->
> InitMaterializedSRF() and SRF_SINGLE_USE_EXPECTED ->
> MAT_SRF_USE_EXPECTED_TUPLE_DESC (or just DESC) as suggested elsewhere in
> this thread. And I think we should eventually consider renaming the
> multi* function names and consider if ExprSingleResult is a good name.

As already mentioned, these have been around for years, so the impact
would be bigger.  Attached is a patch for HEAD and REL_15_STABLE to
switch this stuff with new names, with what's needed for ABI
compatibility.  My plan would be to keep the compatibility parts only
in 15, and drop them from HEAD.
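
For reference, a minimal caller after the rename would look like this (a
sketch only; the function itself is hypothetical):

    Datum
    my_set_returning_func(PG_FUNCTION_ARGS)
    {
        ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
        Datum       values[1];
        bool        nulls[1] = {0};

        /* Sets up rsinfo->setResult / rsinfo->setDesc for materialize mode. */
        InitMaterializedSRF(fcinfo, 0);

        values[0] = Int32GetDatum(42);
        tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);

        return (Datum) 0;
    }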
--
Michael
From 8e01bd6f7753fc21ca8567dff9a1c0ba33cb1a81 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Sat, 15 Oct 2022 11:39:36 +0900
Subject: [PATCH] Rename SetSingleFuncCall() & its flags

---
 src/include/funcapi.h | 14 +++
 src/backend/access/transam/rmgr.c |  2 +-
 src/backend/access/transam/xlogprefetcher.c   |  2 +-
 src/backend/commands/event_trigger.c  |  4 ++--
 src/backend/commands/extension.c  |  6 ++---
 src/backend/commands/prepare.c|  2 +-
 src/backend/foreign/foreign.c |  2 +-
 src/backend/replication/logical/launcher.c|  2 +-
 .../replication/logical/logicalfuncs.c|  2 +-
 src/backend/replication/logical/origin.c  |  2 +-
 src/backend/replication/slotfuncs.c   |  2 +-
 src/backend/replication/walsender.c   |  2 +-
 src/backend/storage/ipc/shmem.c   |  2 +-
 src/backend/utils/adt/datetime.c  |  2 +-
 src/backend/utils/adt/genfile.c   |  4 ++--
 src/backend/utils/adt/hbafuncs.c  |  4 ++--
 src/backend/utils/adt/jsonfuncs.c | 10 
 src/backend/utils/adt/mcxtfuncs.c |  2 +-
 src/backend/utils/adt/misc.c  |  2 +-
 src/backend/utils/adt/pgstatfuncs.c   |  6 ++---
 src/backend/utils/adt/varlena.c   |  2 +-
 src/backend/utils/fmgr/README |  2 +-
 src/backend/utils/fmgr/funcapi.c  | 23 +--
 src/backend/utils/misc/guc_funcs.c|  2 +-
 src/backend/utils/misc/pg_config.c|  2 +-
 src/backend/utils/mmgr/portalmem.c|  2 +-
 .../test_ddl_deparse/test_ddl_deparse.c   |  2 +-
 contrib/amcheck/verify_heapam.c   |  2 +-
 contrib/dblink/dblink.c   |  2 +-
 contrib/pageinspect/brinfuncs.c   |  2 +-
 contrib/pageinspect/gistfuncs.c   |  4 ++--
 .../pg_stat_statements/pg_stat_statements.c   |  2 +-
 contrib/pg_walinspect/pg_walinspect.c |  4 ++--
 contrib/pgrowlocks/pgrowlocks.c   |  2 +-
 contrib/postgres_fdw/connection.c |  2 +-
 contrib/xml2/xpath.c  |  2 +-
 36 files changed, 73 insertions(+), 58 deletions(-)

diff --git a/src/include/funcapi.h b/src/include/funcapi.h
index c9709f25b2..a2dec55338 100644
--- a/src/include/funcapi.h
+++ b/src/include/funcapi.h
@@ -282,7 +282,7 @@ HeapTupleGetDatum(const HeapTupleData *tuple)
  * memory allocated in multi_call_memory_ctx, but holding file descriptors or
  * other non-memory resources open across calls is a bug.  SRFs that need
  * such resources should not use these macros, but instead populate a
- * tuplestore during a single call, as set up by SetSingleFuncCall() (see
+ * tuplestore during a single call, as set up by InitMaterializedSRF() (see
  * fmgr/README).  Alternatively, set up a callback to release resources
  * at query shutdown, using RegisterExprContextCallback().
  *
@@ -291,9 +291,15 @@ HeapTupleGetDatum(const HeapTupleData *tuple)
 
 /* from funcapi.c */
 
-/* flag bits for SetSingleFuncCall() */
-#define SRF_SINGLE_USE_EXPECTED	0x01	/* use expectedDesc as tupdesc */
-#define SRF_SINGLE_BLESS		0x02	/* validate tuple for SRF */
+/* flag bits for InitMaterializedSRF() */
+#define MAT_SRF_USE_EXPECTED_DESC	0x01	/* use expectedDesc as tupdesc */
+#define MAT_SRF_BLESS			0x02	/* complete tuple descriptor, for
+			 * a transient RECORD datatype */
+extern void InitMaterializedSRF(FunctionCallInfo fcinfo, bits32 flags);
+
+/* Compatibility declarations, for v15 */
+#define SRF_SINGLE_USE_EXPECTED MAT_SRF_USE_EXPECTED_DESC
+#define SRF_SINGLE_BLESS		MAT_SRF_BLESS
 extern void SetSingleFuncCall(FunctionCallInfo fcinfo, bits32 flags);
 
 extern FuncCallContext *init_MultiFuncCall(PG_FUNCTION_ARGS);
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 3b6de3aa04..6bb4de387f 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -145,7 +145,7 @@ pg_get_wal_resource_managers(PG_FUNCTION_ARGS)
 	Datum		values[PG_GET_RESOURCE_MANAGERS_COLS];
 	bool		nulls[PG_GET_RESOURCE_MANAGERS_COLS] = {0};
 
-	SetSingleFuncCall(fcinfo, 0);
+	InitMaterializedSRF(fcinfo, 0);

Re: make_ctags: use -I option to ignore pg_node_attr macro

2022-10-14 Thread Tatsuo Ishii
> On Thu, 13 Oct 2022 15:35:09 +0900 (JST)
> Tatsuo Ishii  wrote:
> 
>> > OK, that sounds good then.  I would make a feature request to have a
>> > switch that supresses creation of these links, then.
>> 
>> Ok, I have added "-n" option to make_ctags so that it skips to create
>> the links.
>> 
>> Also I have changed make_etags so that it exec make_ctags, which seems
>> to be the consensus.
> 
> Thank you for following up my patch.
> I fixed the patch to allow use both -e and -n options together.

Thanks. I have made mostly cosmetic changes so that it is more
consistent with existing scripts.

I would like to push v6 patch if there's no objection.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
diff --git a/src/tools/make_ctags b/src/tools/make_ctags
index 912b6fafac..102881667b 100755
--- a/src/tools/make_ctags
+++ b/src/tools/make_ctags
@@ -1,12 +1,37 @@
 #!/bin/sh
 
-# src/tools/make_ctags
+# src/tools/make_ctags [-e] [-n]
+# If -e is specified, generate tags files for emacs.
+# If -n is specified, don't create symbolic links of tags file.
+usage="Usage:  $0 [-e][-n]"
+if [ $# -gt 2 ]
+then	echo $usage
+	exit 1
+fi
+
+MODE=
+NO_SYMLINK=
+TAGS_FILE="tags"
+
+while [ $# -gt 0 ]
+do
+	if [ $1 = "-e" ]
+	then	MODE="-e"
+		TAGS_FILE="TAGS"
+	elif [ $1 = "-n" ]
+	then	NO_SYMLINK="Y"
+	else
+		echo $usage
+		exit 1
+	fi
+	shift
+done
 
 command -v ctags >/dev/null || \
 	{ echo "'ctags' program not found" 1>&2; exit 1; }
 
 trap "ret=$?; rm -rf /tmp/$$; exit $ret" 0 1 2 3 15
-rm -f ./tags
+rm -f ./$TAGS_FILE
 
 IS_EXUBERANT=""
 ctags --version 2>&1 | grep Exuberant && IS_EXUBERANT="Y"
@@ -34,9 +59,17 @@ then	FLAGS="--c-kinds=+dfmstuv"
 else	FLAGS="-dt"
 fi
 
+# Use -I option to ignore a macro
+if [ "$IS_EXUBERANT" ]
+then	IGNORE_IDENTIFIES="-I pg_node_attr+"
+else	IGNORE_IDENTIFIES=
+fi
+
 # this is outputting the tags into the file 'tags', and appending
-find `pwd`/ -type f -name '*.[chyl]' -print |
-	xargs ctags -a -f tags "$FLAGS"
+find `pwd`/ \( -name tmp_install -prune -o -name tmp_check -prune \) \
+	-o \( -name "*.[chly]" -o -iname "*makefile*" -o -name "*.mk" -o -name "*.in" \
+	-o -name "*.sql" -o -name "*.p[lm]" \) -type f -print |
+	xargs ctags $MODE -a -f $TAGS_FILE "$FLAGS" "$IGNORE_IDENTIFIES"
 
 # Exuberant tags has a header that we cannot sort in with the other entries
 # so we skip the sort step
@@ -45,10 +78,13 @@ find `pwd`/ -type f -name '*.[chyl]' -print |
 if [ ! "$IS_EXUBERANT" ]
 then	LC_ALL=C
 	export LC_ALL
-	sort tags >/tmp/$$ && mv /tmp/$$ tags
+	sort $TAGS_FILE >/tmp/$$ && mv /tmp/$$ $TAGS_FILE
 fi
 
-find . \( -name 'CVS' -prune \) -o \( -name .git -prune \) -o -type d -print |
-while read DIR
-do	[ "$DIR" != "." ] && ln -f -s `echo "$DIR" | sed 's;/[^/]*;/..;g'`/tags "$DIR"/tags
-done
+# create symbolic links
+if [ ! "$NO_SYMLINK" ]
+then	find . \( -name 'CVS' -prune \) -o \( -name .git -prune \) -o -type d -print |
+	while read DIR
+	do	[ "$DIR" != "." ] && ln -f -s `echo "$DIR" | sed 's;/[^/]*;/..;g'`/$TAGS_FILE "$DIR"/$TAGS_FILE
+	done
+fi
diff --git a/src/tools/make_etags b/src/tools/make_etags
index 9288ef7b14..afc57e3e89 100755
--- a/src/tools/make_etags
+++ b/src/tools/make_etags
@@ -1,16 +1,6 @@
 #!/bin/sh
-
 # src/tools/make_etags
 
-command -v etags >/dev/null || \
-	{ echo "'etags' program not found" 1>&2; exit 1; }
-
-rm -f ./TAGS
-
-find `pwd`/ -type f -name '*.[chyl]' -print |
-	xargs etags --append -o TAGS
-
-find . \( -name CVS -prune \) -o \( -name .git -prune \) -o -type d -print |
-while read DIR
-do	[ "$DIR" != "." ] && ln -f -s `pwd`/TAGS "$DIR"
-done
+cdir=`dirname $0`
+dir=`(cd $cdir && pwd)`
+exec $dir/make_ctags -e $*


Re: predefined role(s) for VACUUM and ANALYZE

2022-10-14 Thread Corey Huinker
>
> Sounds good.  Here's a new patch set with aclitem's typalign fixed.
>

Patch applies.
Passes make check and make check-world.
Test coverage seems adequate.

Coding is very clear and very much in the style of the existing code. Any
quibbles I have with the coding style are ones I have with the overall
pg-style, and this isn't the forum for that.

I haven't done any benchmarking yet, but it seems that the main question
will be the impact on ordinary DML statements.

I have no opinion about the design debate earlier in this thread, but I do
think that this patch is ready and adds something concrete to the ongoing
discussion.


Re: Refactor to introduce pg_strcoll().

2022-10-14 Thread Jeff Davis
On Thu, 2022-10-13 at 10:57 +0200, Peter Eisentraut wrote:
> It's a bit confusing that arguments must be NUL-terminated, but the 
> length is still specified.  Maybe another sentence to explain that
> would be helpful.

Added a comment. It was a little frustrating to get a perfectly clean
API, because the callers do some buffer manipulation and optimizations
of their own. I think this is an improvement, but suggestions welcome.

If win32 is used with UTF-8 and wcscoll, it ends up allocating some
extra stack space for the temporary buffers, whereas previously it used
the buffers on the stack of varstr_cmp(). I'm not sure if that's a
problem or not.

> The length arguments ought to be of type size_t, I think.

Changed.

Thank you.


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS


From 4d5664552a8a86418a94c37fd4ab8ca3a665c1cd Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Thu, 6 Oct 2022 10:46:36 -0700
Subject: [PATCH] Refactor: introduce pg_strcoll().

Isolate collation routines into pg_locale.c and simplify varlena.c.
---
 src/backend/utils/adt/pg_locale.c | 180 ++
 src/backend/utils/adt/varlena.c   | 207 +-
 src/include/utils/pg_locale.h |   3 +-
 3 files changed, 184 insertions(+), 206 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 2b42d9ccd8..3eb6a67bdc 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1639,6 +1639,186 @@ pg_newlocale_from_collation(Oid collid)
 	return cache_entry->locale;
 }
 
+/*
+ * win32_utf8_wcscoll
+ *
+ * Convert UTF8 arguments to wide characters and invoke wcscoll() or
+ * wcscoll_l(). Will allocate on large input.
+ */
+#ifdef WIN32
+#define TEXTBUFLEN		1024
+static int
+win32_utf8_wcscoll(const char *arg1, size_t len1, const char *arg2,
+   size_t len2, pg_locale_t locale)
+{
+	char		a1buf[TEXTBUFLEN];
+	char		a2buf[TEXTBUFLEN];
+	char	   *a1p,
+			   *a2p;
+	int			a1len;
+	int			a2len;
+	int			r;
+	int			result;
+
+	if (len1 >= TEXTBUFLEN / 2)
+	{
+		a1len = len1 * 2 + 2;
+		a1p = palloc(a1len);
+	}
+	else
+	{
+		a1len = TEXTBUFLEN;
+		a1p = a1buf;
+	}
+	if (len2 >= TEXTBUFLEN / 2)
+	{
+		a2len = len2 * 2 + 2;
+		a2p = palloc(a2len);
+	}
+	else
+	{
+		a2len = TEXTBUFLEN;
+		a2p = a2buf;
+	}
+
+	/* API does not work for zero-length input */
+	if (len1 == 0)
+		r = 0;
+	else
+	{
+		r = MultiByteToWideChar(CP_UTF8, 0, arg1, len1,
+								(LPWSTR) a1p, a1len / 2);
+		if (!r)
+			ereport(ERROR,
+					(errmsg("could not convert string to UTF-16: error code %lu",
+							GetLastError())));
+	}
+	((LPWSTR) a1p)[r] = 0;
+
+	if (len2 == 0)
+		r = 0;
+	else
+	{
+		r = MultiByteToWideChar(CP_UTF8, 0, arg2, len2,
+								(LPWSTR) a2p, a2len / 2);
+		if (!r)
+			ereport(ERROR,
+					(errmsg("could not convert string to UTF-16: error code %lu",
+							GetLastError())));
+	}
+	((LPWSTR) a2p)[r] = 0;
+
+	errno = 0;
+#ifdef HAVE_LOCALE_T
+	if (locale)
+		result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, locale->info.lt);
+	else
+#endif
+		result = wcscoll((LPWSTR) a1p, (LPWSTR) a2p);
+	if (result == 2147483647)	/* _NLSCMPERROR; missing from mingw
+								 * headers */
+		ereport(ERROR,
+				(errmsg("could not compare Unicode strings: %m")));
+
+	if (a1p != a1buf)
+		pfree(a1p);
+	if (a2p != a2buf)
+		pfree(a2p);
+
+	return result;
+}
+#endif
+
+/*
+ * pg_strcoll
+ *
+ * Call ucol_strcollUTF8(), ucol_strcoll(), strcoll(), strcoll_l(), wcscoll(),
+ * or wcscoll_l() as appropriate for the given locale, platform, and database
+ * encoding. If the locale is not specified, use the database collation.
+ *
+ * Arguments must be NUL-terminated so they can be passed directly to
+ * strcoll(); but we also need the lengths to pass to ucol_strcoll().
+ *
+ * If the collation is deterministic, break ties with memcmp(), and then with
+ * the string length.
+ */
+int
+pg_strcoll(const char *arg1, size_t len1, const char *arg2, size_t len2,
+		   pg_locale_t locale)
+{
+	int result;
+
+#ifdef WIN32
+	/* Win32 does not have UTF-8, so we need to map to UTF-16 */
+	if (GetDatabaseEncoding() == PG_UTF8
+		&& (!locale || locale->provider == COLLPROVIDER_LIBC))
+	{
+		result = win32_utf8_wcscoll(arg1, len1, arg2, len2, locale);
+	}
+	else
+#endif			/* WIN32 */
+	if (locale)
+	{
+		if (locale->provider == COLLPROVIDER_ICU)
+		{
+#ifdef USE_ICU
+#ifdef HAVE_UCOL_STRCOLLUTF8
+			if (GetDatabaseEncoding() == PG_UTF8)
+			{
+				UErrorCode	status;
+
+				status = U_ZERO_ERROR;
+				result = ucol_strcollUTF8(locale->info.icu.ucol,
+										  arg1, len1,
+										  arg2, len2,
+										  &status);
+				if (U_FAILURE(status))
+					ereport(ERROR,
+							(errmsg("collation failed: %s", u_errorName(status))));
+			}
+			else
+#endif
+			{
+				int32_t		ulen1,
+							ulen2;
+				UChar	   *uchar1,
+						   *uchar2;
+
+				ulen1 = icu_to_uchar(&uchar1, arg1, len1);
+				ulen2 = icu_to_uchar(&uchar2, arg2, len2);
+
+				result = ucol_strcoll(locale->info.icu.ucol,
+									  uchar1, ulen1,
+									  uchar2, ulen2);

Re: Avoid memory leaks during base backups

2022-10-14 Thread Cary Huang
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

Hello

I applied your v5 patch on the current master and ran valgrind on it
while doing a basebackup with a simulated error. No memory leak related
to the backup was observed. The regression tests are also passing.

Thank you

Cary Huang
HighGo Software Canada

Re: New docs chapter on Transaction Management and related changes

2022-10-14 Thread Robert Treat
On Fri, Oct 14, 2022 at 3:51 PM Bruce Momjian  wrote:
> Attached is the merged patch from all the great comments I received.  I
> have also rebuilt the docs with the updated patch:
>
> https://momjian.us/tmp/pgsql/
>

+   RELEASE SAVEPOINT also subcommits and destroys
+   all savepoints that were established after the named savepoint was
+   established. This means that any subtransactions of the named savepoint
+   will also be subcommitted and destroyed.

Wonder if we should be more explicit that data changes are preserved,
not destroyed... something like:
"This means that any changes within subtransactions of the named
savepoint will be subcommitted and those subtransactions will be
destroyed."


Robert Treat
https://xzilla.net




Re: archive modules

2022-10-14 Thread Nathan Bossart
On Fri, Oct 14, 2022 at 11:51:30AM -0700, Nathan Bossart wrote:
> On Fri, Oct 14, 2022 at 12:10:18PM +0530, Bharath Rupireddy wrote:
>> 2) I think we have a problem - set archive_mode and archive_library
>> and start the server, then set archive_command, reload the conf, see
>> [3] - the archiver needs to error out right? The archiver gets
>> restarted whenever archive_library changes but not when
>> archive_command changes. I think the right place for the error is
>> after or at the end of HandlePgArchInterrupts().
> 
> Good catch.  You are right, this is broken.  I believe that we need to
> check for the misconfiguration in HandlePgArchInterrupts() in addition to
> LoadArchiveLibrary().  I will work on fixing this.

As promised...

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 66312b53b8..9d0f3608c4 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3597,9 +3597,11 @@ include_dir 'conf.d'


 This parameter can only be set in the postgresql.conf
-file or on the server command line.  It is ignored unless
+file or on the server command line.  It is only used if
 archive_mode was enabled at server start and
-archive_library is set to an empty string.
+archive_library is set to an empty string.  If both
+archive_command and archive_library
+are set, archiving will fail.
 If archive_command is an empty string (the default) while
 archive_mode is enabled (and archive_library
 is set to an empty string), WAL archiving is temporarily
@@ -3624,7 +3626,9 @@ include_dir 'conf.d'

 The library to use for archiving completed WAL file segments.  If set to
 an empty string (the default), archiving via shell is enabled, and
- is used.  Otherwise, the specified
+ is used.  If both
+archive_command and archive_library
+are set, archiving will fail.  Otherwise, the specified
 shared library is used for archiving.  For more information, see
  and
 .
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 3868cd7bd3..39c2115943 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -801,7 +801,8 @@ HandlePgArchInterrupts(void)
 		archiveLibChanged = strcmp(XLogArchiveLibrary, archiveLib) != 0;
 		pfree(archiveLib);
 
-		if (archiveLibChanged)
+		if (archiveLibChanged ||
+			(XLogArchiveLibrary[0] != '\0' && XLogArchiveCommand[0] != '\0'))
 		{
 			/*
 			 * Call the currently loaded archive module's shutdown callback,
@@ -809,17 +810,25 @@ HandlePgArchInterrupts(void)
 			 */
 			call_archive_module_shutdown_callback(0, 0);
 
-			/*
-			 * Ideally, we would simply unload the previous archive module and
-			 * load the new one, but there is presently no mechanism for
-			 * unloading a library (see the comment above
-			 * internal_load_library()).  To deal with this, we simply restart
-			 * the archiver.  The new archive module will be loaded when the
-			 * new archiver process starts up.
-			 */
-			ereport(LOG,
-					(errmsg("restarting archiver process because value of "
-							"\"archive_library\" was changed")));
+			if (archiveLibChanged)
+			{
+				/*
+				 * Ideally, we would simply unload the previous archive module
+				 * and load the new one, but there is presently no mechanism
+				 * for unloading a library (see the comment above
+				 * internal_load_library()).  To deal with this, we simply
+				 * restart the archiver.  The new archive module will be loaded
+				 * when the new archiver process starts up.
+				 */
+				ereport(LOG,
+						(errmsg("restarting archiver process because value of "
+								"\"archive_library\" was changed")));
+			}
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("both archive_command and archive_library specified"),
+						 errdetail("Only one of archive_command, archive_library may be set.")));
 
 			proc_exit(0);
 		}
@@ -836,6 +845,12 @@ LoadArchiveLibrary(void)
 {
 	ArchiveModuleInit archive_init;
 
+	if (XLogArchiveLibrary[0] != '\0' && XLogArchiveCommand[0] != '\0')
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("both archive_command and archive_library specified"),
+				 errdetail("Only one of archive_command, archive_library may be set.")));
+
 	memset(, 0, sizeof(ArchiveModuleCallbacks));
 
 	/*


[PATCH] comment fixes for delayChkptFlags

2022-10-14 Thread David Christensen
Enclosed is a trivial fix for a typo and misnamed field I noted when doing
some code review.

Best,

David


0001-fix-comment-typos-for-delayChkptFlags.patch
Description: Binary data


Re: New "single-call SRF" APIs are very confusingly named

2022-10-14 Thread Tom Lane
Melanie Plageman  writes:
> So, while I agree that the "Single" in SetSingleFuncCall() could be
> confusing given the name of ExprSingleResult, I feel like actually all
> of the names are somewhat wrong.

Maybe, but ExprSingleResult et al. have been there for decades and
are certainly embedded in a ton of third-party code.  It's a bit
late to rename them, whether you think they're confusing or not.
Maybe we can get away with changing names introduced in v15, but
even that I'm afraid will get some pushback.

Having said that, I'd probably have used names based on "materialize"
not "single" for what this code is doing.

regards, tom lane




Re: New "single-call SRF" APIs are very confusingly named

2022-10-14 Thread Melanie Plageman
On Thu, Oct 13, 2022 at 3:48 PM Andres Freund  wrote:
> I unfortunately just noticed this now, just after we released...
>
> In
>
> commit 9e98583898c347e007958c8a09911be2ea4acfb9
> Author: Michael Paquier 
> Date:   2022-03-07 10:26:29 +0900
>
> Create routine able to set single-call SRFs for Materialize mode
>
>
> a new helper was added:
>
> #define SRF_SINGLE_USE_EXPECTED 0x01    /* use expectedDesc as tupdesc */
> #define SRF_SINGLE_BLESS        0x02    /* validate tuple for SRF */
> extern void SetSingleFuncCall(FunctionCallInfo fcinfo, bits32 flags);
>
>
> I think the naming here is very poor. For one, "Single" here conflicts
> with ExprSingleResult which indicates "expression does not return a set",
> i.e. precisely the opposite of what SetSingleFuncCall() is used for. For
> another, the "Set" in SetSingleFuncCall makes it sound like it's a
> function setting a property.
>
> Even leaving the confusion with ExprSingleResult aside, calling it
> "Single" still seems very non-descriptive. I assume it's named to
> contrast with init_MultiFuncCall() etc. While those are also not named
> greatly, they're not typically used directly but wrapped in
> SRF_FIRSTCALL_INIT etc.

So, while I agree that the "Single" in SetSingleFuncCall() could be
confusing given the name of ExprSingleResult, I feel like actually all
of the names are somewhat wrong.

ExprSingleResult, ExprMultipleResult, and ExprEndResult are used as
values of ReturnSetInfo->isDone, used in value-per-call mode to indicate
whether or not a given value is the last or not. The comment on
ExprSingleResult says it means "expression does not return a set",
however, in Materialize mode (which is for functions returning a set),
isDone is supposed to be set to ExprSingleResult.

Take this code in ExecMakeFunctionResultSet()

  else if (rsinfo.returnMode == SFRM_Materialize)
  {
  /* check we're on the same page as the function author */
  if (rsinfo.isDone != ExprSingleResult)

So, isDone is used for a different purpose in value-per-call and
materialize modes (and with pretty contradictory definitions) which is
pretty confusing.

Besides that, it is not clear to me that ExprMultipleResult conveys that
the result is a member of or an element of a set. Perhaps it should be
ExprSetMemberResult and instead of using ExprSingleResult for
Materialize mode there should be another enum value that indicates "not
used" or "materialize mode". It could even be ExprSetResult -- since the
whole result is a set. Though that may be confusing since isDone is not
used for Materialize mode except to ensure "we're on the same page as
the function author".

Expr[Single|Multiple]Result aside, I do see how SINGLE/Single when used
for a helper function that does set up for SFRM_Materialize mode
functions is confusing.

The routines for SFRM_ValuePerCall all use multi, so I don't think it
was unreasonable to use single. However, I agree it would be better to
use something else (probably materialize).

The different dimensions requiring distinction are:
- returns a set (Y/N)
- called multiple times to produce a single result (Y/N)
- builds a tuplestore for result set (Y/N)

SFRM_Materialize comment says "result set instantiated in Tuplestore" --
So, I feel like the question is, does a function which returns its
entire result set in a single invocation have to do so using a
tuplestore and does one that returns part of its result set on each
invocation have to do so without a tuplestore (does each invocation have
to return a scalar or tuple)?

The current implementation may not support it, but it doesn't seem like
using a tuplestore and returning all elements of the result set vs some
of them in one invocation are alternatives.

It might be better if the SetFunctionReturnMode stuck to distinguishing
between functions returning their entire result in one invocation or
part of their result in one invocation.

That being said, the current SetSingleFuncCall() makes the tuplestore
and ensures the TupleDesc required by Materialize mode is set or
created. Since it seems only to apply to Materialize mode, I am in favor
of using "materialize" instead of "single".

> Maybe something like InitMaterializedSRF() w/
> MAT_SRF_(USE_EXPECTED_DESC|BLESS)

I also agree that starting the function name with Set isn't the best. I
like InitMaterializedSRF() and MAT_SRF_USE_EXPECTED_TUPLE_DESC. Are there
other kinds of descs?

Also, "single call" and "multi call" are confusing because they kind of
seem like they are describing a behavior of the function limiting the
number of times it can be called. Perhaps the multi* function names
could eventually be renamed something to convey how much of a function's
result can be expected to be produced on an invocation.

To summarize, I am in support of renaming SetSingleFuncCall() ->
InitMaterializedSRF() and SRF_SINGLE_USE_EXPECTED ->
MAT_SRF_USE_EXPECTED_TUPLE_DESC (or just DESC) as suggested elsewhere in
this thread. And I think we should eventually consider renaming the
multi* function names and consider if ExprSingleResult is a good name.

Re: Add index scan progress to pg_stat_progress_vacuum

2022-10-14 Thread Imseih (AWS), Sami
Thank you for the feedback!

>While it seems to be a good idea to have the atomic counter for the
>number of indexes completed, I think we should not use the global
>variable referencing the counter as follow:

>+static pg_atomic_uint32 *index_vacuum_completed = NULL;
>:
>+void
>+parallel_vacuum_progress_report(void)
>+{
>+   if (IsParallelWorker() || !report_parallel_vacuum_progress)
>+   return;
>+
>+   pgstat_progress_update_param(PROGRESS_VACUUM_INDEX_COMPLETED,
>+
> pg_atomic_read_u32(index_vacuum_completed));
>+}

>I think we can pass ParallelVacuumState (or PVIndStats) to the
>reporting function so that it can check the counter and report the
>progress.

Yes, that makes sense.
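
Something along these lines, perhaps (a rough sketch; the shared counter
field name is hypothetical):

    void
    parallel_vacuum_progress_report(ParallelVacuumState *pvs)
    {
        /* Only the leader reports; workers just bump the shared counter. */
        if (IsParallelWorker())
            return;

        pgstat_progress_update_param(PROGRESS_VACUUM_INDEX_COMPLETED,
                                     pg_atomic_read_u32(&pvs->shared->nindexes_completed));
    }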

>---
>I am not too sure that the idea of using the vacuum delay points is the best
>plan. I think we should try to avoid piggybacking on such general
>infrastructure if we can, and instead look for a way to tie this to
>something that is specific to parallel vacuum.
>---

Adding the call to vacuum_delay_point made sense since it's
called in all major vacuum scans. While it is also called
by analyze, it will only do anything if the caller sets the
report_parallel_vacuum_progress flag.


 >   Instead, I think we can add a boolean and the pointer for
 >   ParallelVacuumState to IndexVacuumInfo. If the boolean is true, an
 >   index AM can call the reporting function with the pointer to
 >   ParallelVacuumState while scanning index blocks, for example, for
 >   every 1GB. The boolean can be true only for the leader process.

Will doing this every 1GB be necessary, considering the function
will not do much more than update the progress from the value
stored in index_vacuum_completed?


>   Agreed, but with the following change, the leader process waits in a
>busy loop, which should not be acceptable:

Good point.

>Also, I think it's better to check whether idx_completed_progress
>equals the number of indexes instead.

Good point

> 5. Went back to the idea of adding a new view called 
pg_stat_progress_vacuum_index
> which is accomplished by adding a new type called VACUUM_PARALLEL in 
progress.h

>Probably, we can divide the patch into two patches. One for adding the

Makes sense.

Thanks

Sami Imseih
Amazon Web Services (AWS)





Re: New docs chapter on Transaction Management and related changes

2022-10-14 Thread Bruce Momjian
On Fri, Oct 14, 2022 at 12:22:35PM +0100, Simon Riggs wrote:
> > > +   
> > > +The parent xid of each subxid is recorded in the
> > > +pg_subtrans directory. No entry is made for
> > > +top-level xids since they do not have a parent, nor is an entry made
> > > +for read-only subtransactions.
> > > +   
> >
> > Maybe say "the immediate parent xid of each ...", or is it too obvious?
> 
> +1 to all of those suggestions

Attached is the merged patch from all the great comments I received.  I
have also rebuilt the docs with the updated patch:

https://momjian.us/tmp/pgsql/

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 66312b53b8..024a3c5101 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7224,12 +7224,14 @@ local0.*/var/log/postgresql
 
 
  %v
- Virtual transaction ID (backendID/localXID)
+ Virtual transaction ID (backendID/localXID);  see
+ 
  no
 
 
  %x
- Transaction ID (0 if none is assigned)
+ Transaction ID (0 if none is assigned);  see
+ 
  no
 
 
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index b030b36002..fdffba4442 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4992,7 +4992,8 @@ WHERE ...
 xmin and xmax.  Transaction identifiers are 32-bit quantities.
 In some contexts, a 64-bit variant xid8 is used.  Unlike
 xid values, xid8 values increase strictly
-monotonically and cannot be reused in the lifetime of a database cluster.
+monotonically and cannot be reused in the lifetime of a database
+cluster.  See  for more details.

 

diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index de450cd661..0d6be9a2fa 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -104,6 +104,7 @@
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index b5a2f94c4e..1e0d587932 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24676,7 +24676,10 @@ SELECT collation for ('foo' COLLATE "de_DE");

 Returns the current transaction's ID.  It will assign a new one if the
 current transaction does not have one already (because it has not
-performed any database updates).
+performed any database updates);  see  for details.  If executed in a
+subtransaction this will return the top-level xid;  see  for details.

   
 
@@ -24693,6 +24696,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
 ID is assigned yet.  (It's best to use this variant if the transaction
 might otherwise be read-only, to avoid unnecessary consumption of an
 XID.)
+If executed in a subtransaction this will return the top-level xid.

   
 
@@ -24736,6 +24740,8 @@ SELECT collation for ('foo' COLLATE "de_DE");

 Returns a current snapshot, a data structure
 showing which transaction IDs are now in-progress.
+Only top-level xids are included in the snapshot; subxids are not
+shown;  see  for details.

   
 
@@ -24790,7 +24796,8 @@ SELECT collation for ('foo' COLLATE "de_DE");
 Is the given transaction ID visible according
 to this snapshot (that is, was it completed before the snapshot was
 taken)?  Note that this function will not give the correct answer for
-a subtransaction ID.
+a subtransaction ID (subxid);  see  for
+details.

   
  
@@ -24802,8 +24809,9 @@ SELECT collation for ('foo' COLLATE "de_DE");
 wraps around every 4 billion transactions.  However,
 the functions shown in  use a
 64-bit type xid8 that does not wrap around during the life
-of an installation, and can be converted to xid by casting if
-required.  The data type pg_snapshot stores information about
+of an installation and can be converted to xid by casting if
+required;  see  for details. 
+The data type pg_snapshot stores information about
 transaction ID visibility at a particular moment in time.  Its components
 are described in .
 pg_snapshot's textual representation is
@@ -24849,7 +24857,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
 xmax and not in this list was already completed at the time
 of the snapshot, and thus is either visible or dead according to its
 commit status.  This list does not include the transaction IDs of
-subtransactions.
+subtransactions (subxids).

   
  
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index 

Re: New docs chapter on Transaction Management and related changes

2022-10-14 Thread Bruce Momjian
On Fri, Oct 14, 2022 at 10:46:15AM +0200, Álvaro Herrera wrote:
> +1 for this new chapter.  This latest patch looks pretty good.  I think
> that introducing the concept of "sub-commit" as in Simon's follow-up
> patch clarifies things, though the word itself looks very odd.  Maybe
> it's okay.  The addition of the savepoint example looks good also.

Yes, I like that term since it isn't a permanent commit.

> On 2022-Oct-13, Bruce Momjian wrote:
> 
> > +  
> > +   PostgreSQL supports a two-phase commit (2PC)
> > +   protocol that allows multiple distributed systems to work together
> > +   in a transactional manner.  The commands are PREPARE
> > +   TRANSACTION, COMMIT PREPARED and
> 
> I suggest/request that we try to avoid breaking tagged constants in
> DocBook; doing so makes it much easier to miss them later when grepping
> for them (don't laugh, it has happened to me).  Also, it breaks
> formatting in some weird cases.  I know this makes editing a bit harder
> because you can't just reflow with your editor like you would normal
> text.  So this'd be:
> 
> +   in a transactional manner.  The commands are PREPARE 
> TRANSACTION,
> +   COMMIT PREPARED and
> 
> with whatever word wrapping you like, except breaking between PREPARE
> and TRANSACTION.

Uh, I do a lot of word wrapping and I don't think I can reasonably avoid
these splits.

> 
> > +   
> > +In addition to vxid and xid,
> > +when a transaction is prepared it is also identified by a Global
> > +Transaction Identifier (GID). GIDs
> > +are string literals up to 200 bytes long, which must be
> > +unique amongst other currently prepared transactions.
> > +The mapping of GID to xid is shown in  > +
> > linkend="view-pg-prepared-xacts">pg_prepared_xacts.
> > +   
> 
> Maybe say "is prepared for two-phase commit", to make the topic of this
> paragraph more obvious?

Agreed.

> > +   
> > +The parent xid of each subxid is recorded in the
> > +pg_subtrans directory. No entry is made for
> > +top-level xids since they do not have a parent, nor is an entry made
> > +for read-only subtransactions.
> > +   
> 
> Maybe say "the immediate parent xid of each ...", or is it too obvious?

Agreed with your wording.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: New docs chapter on Transaction Management and related changes

2022-10-14 Thread Bruce Momjian
On Fri, Oct 14, 2022 at 01:05:14PM +0100, Simon Riggs wrote:
> On Fri, 14 Oct 2022 at 08:55, Simon Riggs  
> wrote:
> 
> > In related changes, I've also done some major rewording of the RELEASE
> > SAVEPOINT command, since it was incorrectly described as having "no
> > other user visible behavior". A complex example is given to explain,
> > using the terminology established in the main patch.
> 
> ROLLBACK TO SAVEPOINT also needs some clarification, patch attached.
> 
> (Commentary: It's confusing to me that ROLLBACK TO SAVEPOINT starts a
> new subtransaction, whereas RELEASE SAVEPOINT does not. You might
> imagine they would both start a new subtransaction, but that is not
> the case.)

Agreed, added.


-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: New docs chapter on Transaction Management and related changes

2022-10-14 Thread Bruce Momjian
On Fri, Oct 14, 2022 at 08:55:05AM +0100, Simon Riggs wrote:
> On Thu, 13 Oct 2022 at 23:06, Bruce Momjian  wrote:
> >
> > Thanks, updated patch attached.  You can see the output at:
> 
> Thank you for your work to tighten and cleanup this patch, much appreciated.
> 
> I had two minor typos, plus a slight rewording to avoid using the word
> "global" in the last section, since that is associated with
> distributed or 2PC transactions. For those changes, I provide a
> patch-on-patch so you can see clearly.

Yes, I didn't like global either --- I like your wording.  I added your
other changes too, with slight rewording.  Merged patch to be posted in
a later email.

> In related changes, I've also done some major rewording of the RELEASE
> SAVEPOINT command, since it was incorrectly described as having "no
> other user visible behavior". A complex example is given to explain,
> using the terminology established in the main patch.

Okay, added.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: Question about pull_up_sublinks_qual_recurse

2022-10-14 Thread Tom Lane
Andy Fan  writes:
> After some more self review,  I find my proposal has the following side
> effects.

Yeah, I do not think this works at all.  The mechanism as it stands
right now is that we can insert pulled-up semijoins above j->larg
if they only need variables from relations in j->larg, and we can
insert them above j->rarg if they only need variables from relations
in j->rarg.  You can't just ignore that distinction and insert them
somewhere further up the tree.  Per the comment in
pull_up_sublinks_jointree_recurse:

 * Now process qual, showing appropriate child relids as available,
 * and attach any pulled-up jointree items at the right place. In the
 * inner-join case we put new JoinExprs above the existing one (much
 * as for a FromExpr-style join).  In outer-join cases the new
 * JoinExprs must go into the nullable side of the outer join. The
 * point of the available_rels machinations is to ensure that we only
 * pull up quals for which that's okay.

If the pulled-up join doesn't go into the nullable side of the upper
join then you've changed semantics.  In this case, it'd amount to
reassociating a semijoin that was within the righthand side of another
semijoin to above that other semijoin.  The discussion of outer join
reordering in optimizer/README says that that doesn't work, and while
I'm too lazy to construct an example right now, I believe it's true.

regards, tom lane




Re: Tracking last scan time

2022-10-14 Thread Dave Page
On Fri, 14 Oct 2022 at 19:16, Andres Freund  wrote:

> Hi,
>
> On 2022-10-13 14:38:06 +0100, Dave Page wrote:
> > Thanks for that. It looks good to me, bar one comment (repeated 3 times
> in
> > the sql and expected files):
> >
> > fetch timestamps from before the next test
> >
> > "from " should be removed.
>
> I was trying to say something with that from, but clearly it wasn't
> understandable :). Removed.
>
> With that I pushed the changes and marked the CF entry as committed.


Thanks!


-- 
Dave Page
https://pgsnake.blogspot.com

EDB Postgres
https://www.enterprisedb.com


Re: archive modules

2022-10-14 Thread Nathan Bossart
On Fri, Oct 14, 2022 at 12:10:18PM +0530, Bharath Rupireddy wrote:
> +ereport(ERROR,
> +(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("both archive_command and archive_library specified"),
> + errdetail("Only one of archive_command, archive_library may be set.")));
> 
> The above errmsg looks informational. Can we just say something like
> below?  It doesn't require errdetail as the errmsg says it all. See
> the other instances elsewhere [2].
> 
> ereport(ERROR,
> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>  errmsg("cannot specify both \"archive_command\" and \"archive_library\"")));

I modeled this after the ERROR that error_multiple_recovery_targets()
emits.  I don't think there's really any material difference between your
proposal and mine, but I don't have a strong opinion.

> 2) I think we have a problem - set archive_mode and archive_library
> and start the server, then set archive_command, reload the conf, see
> [3] - the archiver needs to error out right? The archiver gets
> restarted whenever archive_library changes but not when
> archive_command changes. I think the right place for the error is
> after or at the end of HandlePgArchInterrupts().

Good catch.  You are right, this is broken.  I believe that we need to
check for the misconfiguration in HandlePgArchInterrupts() in addition to
LoadArchiveLibrary().  I will work on fixing this.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: thinko in basic_archive.c

2022-10-14 Thread Nathan Bossart
On Fri, Oct 14, 2022 at 02:15:19PM +0530, Bharath Rupireddy wrote:
> Given that temp file name includes WAL file name, epoch to
> milliseconds scale and MyProcPid, can there be name collisions after a
> server crash or even when multiple servers with different pids are
> archiving/copying the same WAL file to the same directory?

While unlikely, I think it's theoretically possible.  If there is a
collision, archiving should fail and retry later with a different temporary
file name.

> What happens to the left-over temp files after a server crash? Will
> they be lying around in the archive directory? I understand that we
> can't remove such files because we can't distinguish left-over files
> from a crash and the temp files that another server is in the process
> of copying.

The temporary files are not automatically removed after a crash.  The
documentation for basic archive has a note about this [0].

> If the goal is to copy files atomically, why can't we name the temp
> file 'wal_file_name.pid.temp', assuming no PID wraparound and get rid
> of appending time? Since basic_archive is a test module illustrating
> archive_library implementation, do we really need to worry about name
> collisions?

Yeah, it's debatable how much we care about this for basic_archive.  We
previously decided that we at least care a little [1], so that's why we
have such elaborate temporary file names.  If anything, I hope that the
presence of this logic causes archive module authors to think about these
problems.

> The patch LGTM.

Thanks!

[0] https://www.postgresql.org/docs/devel/basic-archive.html#id-1.11.7.15.6
[1] 
https://postgr.es/m/CA%2BTgmoaSkSmo22SwJaV%2BycNPoGpxe0JV%3DTcTbh4ip8Cwjr0ULQ%40mail.gmail.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-10-14 Thread Andres Freund
Hi,

On 2022-10-14 10:40:11 +0530, Amit Kapila wrote:
> On Fri, Oct 14, 2022 at 2:25 AM Andres Freund  wrote:
> >
> > >
> > > Attached are two patches. The first patch is what Robert has proposed
> > > with some changes in comments to emphasize the fact that cleanup lock
> > > on the new bucket is just to be consistent with the old bucket page
> > > locking as we are initializing it just before checking for cleanup
> > > lock. In the second patch, I removed the acquisition of cleanup lock
> > > on the new bucket page and changed the comments/README accordingly.
> > >
> > > I think we can backpatch the first patch and the second patch can be
> > > just a HEAD-only patch. Does that sound reasonable to you?
> >
> > Not particularly, no. I don't understand how "overwrite a page and then
> > get a cleanup lock" can sensibly be described by this comment:
> >
> > > +++ b/src/backend/access/hash/hashpage.c
> > > @@ -807,7 +807,8 @@ restart_expand:
> > >    * before changing the metapage's mapping info, in case we can't get the
> > >    * disk space.  Ideally, we don't need to check for cleanup lock on new
> > >    * bucket as no other backend could find this bucket unless meta page is
> > > -  * updated.  However, it is good to be consistent with old bucket locking.
> > > +  * updated and we initialize the page just before it.  However, it is just
> > > +  * to be consistent with old bucket locking.
> > >    */
> > >   buf_nblkno = _hash_getnewbuf(rel, start_nblkno, MAIN_FORKNUM);
> > >   if (!IsBufferCleanupOK(buf_nblkno))
> >
> > This is basically saying "I am breaking basic rules of locking just to be
> > consistent", no?
> >
> 
> Fair point. How about something like: "XXX Do we really need to check
> for cleanup lock on the new bucket? Here, we initialize the page, so
> ideally we don't need to perform any operation that requires such a
> check."?

This still seems to omit that the code is quite broken.

> Feel free to suggest something better.

How about something like:

  XXX: This code is wrong, we're overwriting the buffer before "acquiring" the
  cleanup lock. Currently this is not known to have bad consequences because
  XYZ and the fix seems a bit too risky for the backbranches.

Greetings,

Andres Freund




Re: Tracking last scan time

2022-10-14 Thread Andres Freund
Hi,

On 2022-10-13 14:38:06 +0100, Dave Page wrote:
> Thanks for that. It looks good to me, bar one comment (repeated 3 times in
> the sql and expected files):
> 
> fetch timestamps from before the next test
> 
> "from " should be removed.

I was trying to say something with that "from", but clearly it wasn't
understandable :). Removed.

With that I pushed the changes and marked the CF entry as committed.

Thanks for the feature Dave and the reviews everyone.

Greetings,

Andres Freund




Re: Bug: pg_regress makefile does not always copy refint.so

2022-10-14 Thread Alvaro Herrera
On 2022-Oct-14, Donghang Lin wrote:

> Hi hackers,
> 
> When I was building pg_regress, it didn’t always copy a rebuilt version of 
> refint.so to the folder.
> 
> Steps to reproduce:
> OS: centos7
> PostgreSQL version: 14.5
> 
> 1. configure and build postgres
> 2. edit file src/include/utils/rel.h so that contrib/spi will rebuild
> 3. cd ${builddir}/src/test/regress
> 4. make
> We’ll find refint.so is rebuilt in contrib/spi, but not copied over to 
> regress folder.
> While autoinc.so is rebuilt and copied over.

I have a somewhat-related-but-not-really complaint.  I recently had need to
have refint.so, autoinc.so and regress.so in the install directory; but it
turns out that there's no provision at all to get them installed.

Packagers have long had a need for this; for example the postgresql-test
RPM file is built using this icky recipe:

%if %test
# tests. There are many files included here that are unnecessary,
# but include them anyway for completeness.  We replace the original
# Makefiles, however.
%{__mkdir} -p %{buildroot}%{pgbaseinstdir}/lib/test
%{__cp} -a src/test/regress %{buildroot}%{pgbaseinstdir}/lib/test
%{__install} -m 0755 contrib/spi/refint.so %{buildroot}%{pgbaseinstdir}/lib/test/regress
%{__install} -m 0755 contrib/spi/autoinc.so %{buildroot}%{pgbaseinstdir}/lib/test/regress

I assume that the DEB does something similar, but I didn't look.

I think it would be better to provide a Make rule to allow these files to be
installed.  I'll see about a proposed patch.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Saca el libro que tu religión considere como el indicado para encontrar la
oración que traiga paz a tu alma. Luego rebootea el computador
y ve si funciona" (Carlos Duclós)




Re: [PATCH] Reset single-row processing mode at end of pipeline commands queue

2022-10-14 Thread Alvaro Herrera
Hello Denis,

On 2022-Oct-07, Denis Laxalde wrote:

> I'm trying to make single-row mode and pipeline mode work together in
> Psycopg using libpq. I think there is something wrong with respect to the
> single-row mode flag, not being correctly reset, in some situations.
> 
> The minimal case I'm considering is (in a pipeline):
> * send query 1,
> * get its results in single-row mode,
> * send query 2,
> * get its results *not* in single-row mode.
> 
> It seems that, as the command queue in the pipeline is empty after getting
> the results of query 1, the single-row mode flag is not reset and is still
> active for query 2, thus leading to an unexpected PGRES_SINGLE_TUPLE status.
> 
> The attached patch demonstrates this in the test suite. It also suggests to
> move the statement resetting single-row mode up in pqPipelineProcessQueue(),
> before exiting the function when the command queue is empty in particular.

Your suggestion to move the code up seems correct to me.  Therefore, I
have pushed this including the added test code.  Thanks for an excellent
report and patch.
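
For the archives' sake, the change boils down to performing the reset before
the early exit in pqPipelineProcessQueue(), roughly like this sketch (the
surrounding code is abridged):

	/*
	 * Reset single-row processing mode.  (Client has to set it up for each
	 * query, if desired.)  Doing this before the empty-queue exit is what
	 * keeps the flag from leaking into the next query.
	 */
	conn->singleRowMode = false;

	if (conn->cmd_queue_head == NULL ||
		conn->asyncStatus != PGASYNC_IDLE)
		return;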

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: dynamic result sets support in extended query protocol

2022-10-14 Thread Pavel Stehule
Hi


On Fri, 14 Oct 2022 at 09:12, Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> On 01.02.22 15:40, Peter Eisentraut wrote:
> > On 12.01.22 11:20, Julien Rouhaud wrote:
> >> Since you mentioned that this patch depends on the SHOW_ALL_RESULTS
> >> psql patch
> >> which is still being worked on, I'm not expecting much activity here
> >> until the
> >> prerequirements are done.  It also seems better to mark this patch as
> >> Waiting
> >> on Author as further reviews are probably not really needed for now.
> >
> > Well, a review on the general architecture and approach would have been
> > useful.  But I understand that without the psql work, it's difficult for
> > a reviewer to even get started on this patch.  It's also similarly
> > difficult for me to keep updating it.  So I'll set it to Returned with
> > feedback for now and take it off the table.  I want to get back to it
> > when the prerequisites are more settled.
>
> Now that the psql support for multiple result sets exists, I want to
> revive this patch.  It's the same as the last posted version, except now
> it doesn't require any psql changes or any weird test modifications
> anymore.
>
> (Old news: This patch allows declaring a cursor WITH RETURN in a
> procedure to make the cursor's data be returned as a result of the CALL
> invocation.  The procedure needs to be declared with the DYNAMIC RESULT
> SETS attribute.)
>

I did a quick test of this patch, and it is working pretty well.

I have two ideas.

1. There could be a possibility to set "dynamic result sets" to unknown. The
behaviour of the "dynamic result sets" option is a little bit confusing. I
expected the number of result sets to be exactly this number, but the
warning is raised only when this number is exceeded. For this
implementation the correct name would be something like "max dynamic result
sets". At this moment, I find the "dynamic result sets" feature rather
confusing, and because the effect is just a warning, I don't see a strong
benefit. I could see a benefit if I could declare that a CALL will be
without dynamic result sets, with an exact number of dynamic result sets,
or with an unknown number of dynamic result sets. And if the result is not
as declared, then an exception should be raised (not a warning).

2. Unfortunately, it doesn't work nicely with pagers. It starts a pager for
one result and waits for the end, then starts a pager for the second result
and waits for the end. There is no way to see all results at one
time. The current behavior is correct, but I don't think it is user
friendly. I think I can teach pspg to support multiple documents, but I
need a more robust protocol and some separator - minimally an empty line
(though some ASCII control char would be safer). As a second step we can
introduce a new psql option like PSQL_MULTI_PAGER, to be used when the
number of possible result sets is higher than 1.

Regards

Pavel


Add regular expression testing for user name mapping in the peer authentication TAP test

2022-10-14 Thread Drouvot, Bertrand

Hi hackers,

while working on [1], I thought it could also be useful to add regular 
expression testing for user name mapping in the peer authentication TAP 
test.


This kind of test already exists in kerberos/t/001_auth.pl, but the 
proposed one in the peer authentication test would be exercised more 
widely.


Please find attached a patch proposal to do so.

[1]: 
https://www.postgresql.org/message-id/4f55303e-62c1-1072-61db-fbfb30bd66c8%40gmail.com


Looking forward to your feedback,

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

diff --git a/src/test/authentication/t/003_peer.pl b/src/test/authentication/t/003_peer.pl
index fc951dea06..4e2efbe5e3 100644
--- a/src/test/authentication/t/003_peer.pl
+++ b/src/test/authentication/t/003_peer.pl
@@ -23,18 +23,34 @@ sub reset_pg_hba
return;
 }
 
+# Delete pg_ident.conf from the given node, add a new entry to it
+# and then execute a reload to refresh it.
+sub reset_pg_ident
+{
+   my $node= shift;
+   my $map_name= shift;
+   my $system_user = shift;
+   my $pg_user = shift;
+
+   unlink($node->data_dir . '/pg_ident.conf');
+   $node->append_conf('pg_ident.conf', "$map_name $system_user $pg_user");
+   $node->reload;
+   return;
+}
+
 # Test access for a single role, useful to wrap all tests into one.
 sub test_role
 {
local $Test::Builder::Level = $Test::Builder::Level + 1;
 
-   my ($node, $role, $method, $expected_res, %params) = @_;
+   my ($node, $role, $method, $expected_res, $test_details, %params) = @_;
my $status_string = 'failed';
$status_string = 'success' if ($expected_res eq 0);
 
my $connstr = "user=$role";
my $testname =
- "authentication $status_string for method $method, role $role";
+ "authentication $status_string for method $method, role $role "
+ . $test_details;
 
if ($expected_res eq 0)
{
@@ -87,16 +103,43 @@ my $system_user =
 # Tests without the user name map.
 # Failure as connection is attempted with a database role not mapping
 # to an authorized system user.
-test_role($node, qq{testmapuser}, 'peer', 2,
+test_role(
+   $node, qq{testmapuser}, 'peer', 2,
+   'without user name map',
log_like => [qr/Peer authentication failed for user "testmapuser"/]);
 
 # Tests with a user name map.
-$node->append_conf('pg_ident.conf', qq{mypeermap $system_user testmapuser});
+reset_pg_ident($node, 'mypeermap', $system_user, 'testmapuser');
 reset_pg_hba($node, 'peer map=mypeermap');
 
 # Success as the database role matches with the system user in the map.
-test_role($node, qq{testmapuser}, 'peer', 0,
+test_role($node, qq{testmapuser}, 'peer', 0, 'with user name map',
log_like =>
  [qr/connection authenticated: identity="$system_user" method=peer/]);
 
+# Test with regular expression in user name map.
+my $last_system_user_char = substr($system_user, -1);
+
+# The regular expression matches.
+reset_pg_ident($node, 'mypeermap', qq{/^.*$last_system_user_char\$},
+   'testmapuser');
+test_role(
+   $node,
+   qq{testmapuser},
+   'peer',
+   0,
+   'with regular expression in user name map',
+   log_like =>
+ [qr/connection authenticated: identity="$system_user" method=peer/]);
+
+# The regular expression does not match.
+reset_pg_ident($node, 'mypeermap', '/^$', 'testmapuser');
+test_role(
+   $node,
+   qq{testmapuser},
+   'peer',
+   2,
+   'with regular expression in user name map',
+   log_like => [qr/no match in usermap "mypeermap" for user "testmapuser"/]);
+
 done_testing();


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-10-14 Thread Önder Kalacı
Hi,

Thanks for the review!


Here are some comments on v17 patch.
>
> 1.
> -LogicalRepRelMapEntry *
> +LogicalRepPartMapEntry *
>  logicalrep_partition_open(LogicalRepRelMapEntry *root,
>   Relation partrel,
> AttrMap *map)
>  {
>
> Is there any reason to change the return type of
> logicalrep_partition_open()? It
> seems ok without this change.
>

I think you are right, I probably needed that in some of my
earlier iterations of the patch, but now it seems redundant. Reverted back
to the original version.


>
> 2.
>
> +* of the relation cache entry (e.g., such as ANALYZE or
> +* CREATE/DROP index on the relation).
>
> "e.g." and "such as" mean the same. I think we remove one of them.
>

fixed


>
> 3.
> +$node_subscriber->poll_query_until(
> +   'postgres', q{select (idx_scan = 2) from pg_stat_all_indexes where
> indexrelname = 'test_replica_id_full_idx';}
> +) or die "Timed out while waiting for'check subscriber tap_sub_rep_full
> deletes one row via index";
> +
>
> +$node_subscriber->poll_query_until(
> +   'postgres', q{select (idx_scan = 1) from pg_stat_all_indexes where
> indexrelname = 'test_replica_id_full_idy';}
> +) or die "Timed out while waiting for'check subscriber tap_sub_rep_full
> deletes one row via index";
>
>
> "for'check" -> "for check"
>

fixed


>
> 3.
> +$node_subscriber->safe_psql('postgres',
> +   "SELECT pg_reload_conf();");
> +
> +# Testcase start: SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN
> +# 
> +
> +$node_subscriber->stop('fast');
> +$node_publisher->stop('fast');
> +
>
> "Testcase start" in the comment should be "Testcase end".
>
>
fixed


> 4.
> There seems to be a problem in the following scenario, which results in
> inconsistent data between publisher and subscriber.
>
> -- publisher
> CREATE TABLE test_replica_id_full (x int, y int);
> ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
> CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;
>
> -- subscriber
> CREATE TABLE test_replica_id_full (x int, y int);
> CREATE UNIQUE INDEX test_replica_id_full_idx ON test_replica_id_full(x);
> CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres
> port=5432' PUBLICATION tap_pub_rep_full;
>
> -- publisher
> INSERT INTO test_replica_id_full VALUES (NULL,1);
> INSERT INTO test_replica_id_full VALUES (NULL,2);
> INSERT INTO test_replica_id_full VALUES (NULL,3);
> update test_replica_id_full SET x=1 where y=2;
>
> The data in publisher:
> postgres=# select * from test_replica_id_full order by y;
>  x | y
> ---+---
>| 1
>  1 | 2
>| 3
> (3 rows)
>
> The data in subscriber:
> postgres=# select * from test_replica_id_full order by y;
>  x | y
> ---+---
>| 2
>  1 | 2
>| 3
> (3 rows)
>
> There is no such problem on master branch.
>
>
Uff, the second problem reported regarding NULL values for this patch (both
reported by you). First, v18 contains the fix for the problem. It turns out
that my idea of treating all unique indexes (pkey, replica identity and
regular unique indexes) the same proved to be wrong.  The former two require
all the involved columns to be NOT NULL; the latter does not.

This caused RelationFindReplTupleByIndex() to skip tuples_equal() for
regular unique indexes (i.e., non pkey/replident). Hence, the first NULL
value is considered the matching tuple. Instead, we should be doing a full
tuple equality check (i.e., tuples_equal). This is what v18 does. I also
added the above scenario as a test.
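
The shape of the fix is roughly the following sketch inside the index-scan
loop of RelationFindReplTupleByIndex() (idxIsRelationIdentityOrPK stands
for the "is this the PK or replica identity index" test; names may differ
from v18):

	while (index_getnext_slot(scan, ForwardScanDirection, outslot))
	{
		/*
		 * For the PK or the replica identity index the scan key alone
		 * identifies the tuple, because all involved columns are NOT NULL.
		 * For other unique indexes a NULL never equals anything, so fall
		 * back to a full tuple comparison.
		 */
		if (!idxIsRelationIdentityOrPK &&
			!tuples_equal(outslot, searchslot, eq))
			continue;

		/* Found the tuple to update/delete. */
		found = true;
		break;
	}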

I think we can probably skip tuples_equal() for unique indexes that consist
of only NOT NULL columns. However, that seems like an over-optimization. If
you have such a unique index, why not create a primary key anyway?  That's
why I don't see much value in complicating the code for that use case.

Thanks for the review & testing. I'll focus more on the NULL values on my
own testing as well. Still, I wanted to push my changes so that you can
also have a look if possible.

Attach v18.

Onder KALACI


v18_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


Re: Error for WITH options on partitioned tables

2022-10-14 Thread Karina Litskevich
Hi, Simon!

The new error message looks better. But I believe it is better to use
"parameters" instead of "options" as it is called "storage parameters"
in the documentation. I also believe it is better to report error in
partitioned_table_reloptions() (thanks to Japin Li for mentioning this
function) as it also fixes the error message in the following situation:

test=# CREATE TABLE parted_col_comment (a int, b text) PARTITION BY LIST
(a);
CREATE TABLE
test=# ALTER TABLE parted_col_comment SET (fillfactor=100);
ERROR:  unrecognized parameter "fillfactor"

I attached the new versions of patches.

I'm not sure about the errcode. Maybe it is better to report the error with
ERRCODE_INVALID_OBJECT_DEFINITION for CREATE TABLE and with
ERRCODE_WRONG_OBJECT_TYPE for ALTER TABLE (as when you try "ALTER TABLE
partitioned INHERIT nonpartitioned;" an ERROR with ERRCODE_WRONG_OBJECT_TYPE
is reported). Then either the desired code should be passed to
partitioned_table_reloptions() or similar checks and ereport calls should be
placed in two different places. I'm not sure whether it is a good idea to
change the code in one of these ways just to change the error code.

Best regards,
Karina Litskevich
Postgres Professional: http://postgrespro.com/
From 6407538f0bfd3eb56f5a4574d8893f9494f2a810 Mon Sep 17 00:00:00 2001
From: Karina Litskevich 
Date: Fri, 14 Oct 2022 17:48:27 +0300
Subject: [PATCH v2 2/2] better error message for setting parameters for
 partitioned table

---
 src/backend/access/common/reloptions.c | 13 ++---
 src/test/regress/expected/alter_table.out  |  3 ++-
 src/test/regress/expected/create_table.out |  3 ++-
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 6458a9c276..75d6ff5040 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -1984,13 +1984,12 @@ build_local_reloptions(local_relopts *relopts, Datum options, bool validate)
 bytea *
 partitioned_table_reloptions(Datum reloptions, bool validate)
 {
-	/*
-	 * There are no options for partitioned tables yet, but this is able to do
-	 * some validation.
-	 */
-	return (bytea *) build_reloptions(reloptions, validate,
-	  RELOPT_KIND_PARTITIONED,
-	  0, NULL, 0);
+	if (validate && reloptions)
+		ereport(ERROR,
+				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("cannot specify storage parameters for a partitioned table"),
+				errhint("specify storage parameters on leaf partitions instead"));
+	return NULL;
 }
 
 /*
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index cec7bfa1f1..a719ef22c7 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -3803,7 +3803,8 @@ ALTER TABLE partitioned ALTER COLUMN b TYPE char(5);
 ERROR:  cannot alter column "b" because it is part of the partition key of relation "partitioned"
 -- specifying storage parameters for partitioned tables is not supported
 ALTER TABLE partitioned SET (fillfactor=100);
-ERROR:  unrecognized parameter "fillfactor"
+ERROR:  cannot specify storage parameters for a partitioned table
+HINT:  specify storage parameters on leaf partitions instead
 -- partitioned table cannot participate in regular inheritance
 CREATE TABLE nonpartitioned (
 	a int,
diff --git a/src/test/regress/expected/create_table.out b/src/test/regress/expected/create_table.out
index 45c1a0a23e..f16be87729 100644
--- a/src/test/regress/expected/create_table.out
+++ b/src/test/regress/expected/create_table.out
@@ -989,7 +989,8 @@ Number of partitions: 0
 DROP TABLE parted_col_comment;
 -- specifying storage parameters for partitioned tables is not supported
 CREATE TABLE parted_col_comment (a int, b text) PARTITION BY LIST (a) WITH (fillfactor=100);
-ERROR:  unrecognized parameter "fillfactor"
+ERROR:  cannot specify storage parameters for a partitioned table
+HINT:  specify storage parameters on leaf partitions instead
 -- list partitioning on array type column
 CREATE TABLE arrlp (a int[]) PARTITION BY LIST (a);
 CREATE TABLE arrlp12 PARTITION OF arrlp FOR VALUES IN ('{1}', '{2}');
-- 
2.25.1

From 580566815bc5abd6193f86ddbd55fac1ac941873 Mon Sep 17 00:00:00 2001
From: Karina Litskevich 
Date: Fri, 14 Oct 2022 16:22:57 +0300
Subject: [PATCH v2 1/2] test parameters for partitioned table

---
 src/test/regress/expected/alter_table.out  | 3 +++
 src/test/regress/expected/create_table.out | 3 +++
 src/test/regress/sql/alter_table.sql   | 3 +++
 src/test/regress/sql/create_table.sql  | 3 +++
 4 files changed, 12 insertions(+)

diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index 346f594ad0..cec7bfa1f1 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -3801,6 +3801,9 @@ ALTER TABLE partitioned DROP COLUMN b;
 ERROR:  cannot drop column "b" 

Re: Fix error message for MERGE foreign tables

2022-10-14 Thread Tom Lane
Richard Guo  writes:
> Or maybe we can make it even earlier, when we expand an RTE for a
> partitioned table and add result tables to leaf_result_relids.

I'm not really on board with injecting command-type-specific logic into
completely unrelated places just so that we can throw an error a bit
earlier.  Alvaro's suggestion of make_modifytable seemed plausible,
not least because it avoids spending any effort when the command
couldn't be MERGE at all.

regards, tom lane




Re: [PATCH] Improve tab completion for ALTER TABLE on identity columns

2022-10-14 Thread Dagfinn Ilmari Mannsåker
Dagfinn Ilmari Mannsåker  writes:

> Hi Hackers,
>
> I noticed that psql has no tab completion around identity columns in
> ALTER TABLE, so here's some patches for that.

Added to the next commit fest:

https://commitfest.postgresql.org/40/3947/

- ilmari




Re: Move backup-related code to xlogbackup.c/.h

2022-10-14 Thread Bharath Rupireddy
On Fri, Oct 14, 2022 at 1:54 PM Alvaro Herrera  wrote:
>
> On 2022-Oct-13, Bharath Rupireddy wrote:
>
> > The pg_backup_start_callback() can just go ahead and reset
> > sessionBackupState. However, it leads us to the complete removal of
> > pg_backup_start_callback() itself and use do_pg_abort_backup()
> > consistently across, saving 20 LOC attached as v5-0001.
>
> OK, that's not bad -- but there is a fatal flaw here: do_pg_backup_start
> only sets sessionBackupState *after* it has finished setting things up,
> so if you only change it like this, do_pg_abort_backup will indeed run,
> but it'll do nothing because it hits the "quick exit" test.  Therefore,
> if a backup aborts while setting up, you'll keep running with forced
> page writes until next postmaster crash or restart.  Not good.

Ugh.

> ISTM we need to give another flag to the callback function besides
> emit_warning: one that says whether to test sessionBackupState or not.

I think this needs a new structure, something like below, which makes
things complex.
typedef struct pg_abort_backup_params
{
	/* Tells whether or not the do_pg_abort_backup callback can quickly exit. */
	bool		can_quick_exit;

	/* Tells whether or not the do_pg_abort_backup callback can emit a warning. */
	bool		emit_warning;
} pg_abort_backup_params;

> I suppose the easiest way to do it with no other changes is to turn
> 'arg' into a bitmask.

This one too isn't good IMO.

> But alternatively, we could just remove emit_warning as a flag and have
> the warning be emitted always; then we can use the boolean for the other
> purpose.  I don't think the extra WARNING thrown during backup set-up is
> going to be a problem, since it will mostly never be seen anyway (and if
> you do see it, it's not a lie.)

+1 for this.

Please review the v6 patch-set further.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v6-0001-Use-do_pg_abort_backup-consistently-across-the-ba.patch
Description: Binary data


v6-0002-Add-functions-for-xlogbackup.c-to-call-back-into-.patch
Description: Binary data


v6-0003-Move-backup-related-code-from-xlog.c-to-xlogbacku.patch
Description: Binary data


v6-0004-Move-backup-related-code-from-xlogfuncs.c-to-xlog.patch
Description: Binary data


Re: Fix error message for MERGE foreign tables

2022-10-14 Thread Richard Guo
On Fri, Oct 14, 2022 at 7:19 PM Richard Guo  wrote:

>
> On Fri, Oct 14, 2022 at 5:24 PM Alvaro Herrera 
> wrote:
>
>> Actually, I hadn't realized that the originally submitted patch had the
>> test in postgres_fdw only, but we really want it to catch any FDW, so it
>> needs to be somewhere more general.  The best place I found to put this
>> test is in make_modifytable ... I searched for some earlier place in the
>> planner to do it, but couldn't find anything.
>>
>> So what do people think about this?
>
>
> Good point. I agree that the test should be in a more general place.
>
> I wonder if we can make it earlier in grouping_planner() just before we
> add ModifyTablePath.
>

Or maybe we can make it even earlier, when we expand an RTE for a
partitioned table and add result tables to leaf_result_relids.

--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -627,6 +627,16 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
 	root->leaf_result_relids = bms_add_member(root->leaf_result_relids,
 											  childRTindex);

+	if (parse->commandType == CMD_MERGE &&
+		childrte->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot execute MERGE on relation \"%s\"",
+						RelationGetRelationName(childrel)),
+				 errdetail_relkind_not_supported(childrte->relkind)));
+	}

Thanks
Richard


Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-14 Thread Drouvot, Bertrand

Hi,

On 10/14/22 7:30 AM, Michael Paquier wrote:

On Wed, Oct 12, 2022 at 08:17:14AM +0200, Drouvot, Bertrand wrote:

Indeed, ;-)


So, I have spent the last two days looking at all that, studying the
structure of the patch and the existing HEAD code,


Thanks!


The code could be split to tackle things step-by-step:
- One refactoring patch to introduce token_regcomp() and
token_regexec(), with the introduction of a new structure that
includes the compiled regexes.  (Feel free to counterargue about the
use of AuthToken for this purpose, of course!)
- Plug in the refactored logic for the lists of role names and
database names in pg_hba.conf.
- Handle the case of single host entries in pg_hba.conf.
--


I agree to work step-by-step.

While looking at it again now, I discovered that the new TAP test for 
the regexp on the hostname in ssl/002_scram.pl is failing on some of my 
test environments (and not all...).


So, I agree with the dedicated steps you are proposing and that the 
"host case" needs a dedicated attention.


I'm not ignoring all the remarks you've just done up-thread, I'll 
address them and/or provide my feedback on them when I'll come back with 
the step-by-step sub patches.


Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-14 Thread Drouvot, Bertrand

Hi,

On 10/14/22 8:18 AM, Michael Paquier wrote:

On Fri, Oct 14, 2022 at 02:30:25PM +0900, Michael Paquier wrote:

First, as of HEAD, AuthToken is only used for elements in a list of
role and database names in hba.conf before filling in each HbaLine,
hence we limit its usage to the initial parsing.  The patch assigns an
optional regex_t to it, then extends the use of AuthToken for single
hostname entries in pg_hba.conf.  Things going first: shouldn't we
combine ident_user and "re" together in the same structure?  Even if
we finish by not using AuthToken to store the computed regex, it seems
to me that we'd better use the same base structure for pg_ident.conf
and pg_hba.conf.  While looking closely at the patch, we would expand
the use of AuthToken outside its original context.  I have also looked
at make_auth_token(), and wondered if it could be possible to have this
routine compile the regexes.  This approach would not stick with
pg_ident.conf though, as we validate the fields in each line when we
put our hands on ident_user and after the base validation of a line
(number of fields, etc.).  So with all that in mind, it feels right to
not use AuthToken at all when building each HbaLine and each
IdentLine, but a new, separate, structure.  We could call that an
AuthItem (string, its compiled regex) perhaps?  It could have its own
make() routine, taking in input an AuthToken and process
pg_regcomp().  Better ideas for this new structure would be welcome,
and the idea is that we'd store the post-parsing state of an
AuthToken to something that has a compiled regex.  We could finish by
using AuthToken at the end and expand its use, but it does not feel
completely right either to have a make() routine but not be able to
compile its regular expression when creating the AuthToken.


I have sent this part too quickly.  As AuthTokens are used in
check_db() and check_role() when matching entries, it is more
intuitive to store the regex_t directly in it. 


Yeah, I also think this is the right place for it.


Changing IdentLine to
use an AuthToken makes the "quoted" part useless in this case; still, it
could be used in Assert()s to make sure that the data is shaped as
expected at check-time, enforced at false when creating it in
parse_ident_line()?


I agree, that makes sense. I'll work on that.
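
To make that concrete, the structure under discussion would end up looking
something like this sketch (field names assumed, not final):

	typedef struct AuthToken
	{
		char	   *string;		/* token value */
		bool		quoted;		/* true if double-quoted in the file */
		regex_t    *regex;		/* compiled regex, if string is of the
								 * form "/regex" */
	} AuthToken;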

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: New docs chapter on Transaction Management and related changes

2022-10-14 Thread Simon Riggs
On Fri, 14 Oct 2022 at 08:55, Simon Riggs  wrote:

> In related changes, I've also done some major rewording of the RELEASE
> SAVEPOINT command, since it was incorrectly described as having "no
> other user visible behavior". A complex example is given to explain,
> using the terminology established in the main patch.

ROLLBACK TO SAVEPOINT also needs some clarification, patch attached.

(Commentary: It's confusing to me that ROLLBACK TO SAVEPOINT starts a
new subtransaction, whereas RELEASE SAVEPOINT does not. You might
imagine they would both start a new subtransaction, but that is not
the case.)

-- 
Simon Riggshttp://www.EnterpriseDB.com/


doc-rollback-to.v1.patch
Description: Binary data


Re: New docs chapter on Transaction Management and related changes

2022-10-14 Thread Simon Riggs
On Fri, 14 Oct 2022 at 09:46, Alvaro Herrera  wrote:
>
> +1 for this new chapter.  This latest patch looks pretty good.  I think
> that introducing the concept of "sub-commit" as in Simon's follow-up
> patch clarifies things, though the word itself looks very odd.  Maybe
> it's okay.  The addition of the savepoint example looks good also.
>
> On 2022-Oct-13, Bruce Momjian wrote:
>
> > +  
> > +   PostgreSQL supports a two-phase commit (2PC)
> > +   protocol that allows multiple distributed systems to work together
> > +   in a transactional manner.  The commands are PREPARE
> > +   TRANSACTION, COMMIT PREPARED and
>
> I suggest/request that we try to avoid breaking tagged constants in
> DocBook; doing so makes it much easier to miss them later when grepping
> for them (don't laugh, it has happened to me).  Also, it breaks
> formatting in some weird cases.  I know this makes editing a bit harder
> because you can't just reflow with your editor like you would normal
> text.  So this'd be:
>
> +   in a transactional manner.  The commands are PREPARE 
> TRANSACTION,
> +   COMMIT PREPARED and
>
> with whatever word wrapping you like, except breaking between PREPARE
> and TRANSACTION.
>
> > +   
> > +In addition to vxid and xid,
> > +when a transaction is prepared it is also identified by a Global
> > +Transaction Identifier (GID). GIDs
> > +are string literals up to 200 bytes long, which must be
> > +unique amongst other currently prepared transactions.
> > +The mapping of GID to xid is shown in <link
> > +linkend="view-pg-prepared-xacts">pg_prepared_xacts</link>.
> > +   
>
> Maybe say "is prepared for two-phase commit", to make the topic of this
> paragraph more obvious?
>
> > +   
> > +Lock waits on table-level locks are shown waiting for
> > +virtualxid, while lock waits on row-level
> > +locks are shown waiting for transactionid.
> > +Row-level read and write locks are recorded directly in locked
> > +rows and can be inspected using the <xref linkend="pgrowlocks"/>
> > +extension.  Row-level read locks might also require the assignment
> > +of multixact IDs (mxid). Mxids are recorded in
> > +the pg_multixact directory.
> > +   
>
> Hmm, ok.
>
> > +   
> > +The parent xid of each subxid is recorded in the
> > +pg_subtrans directory. No entry is made for
> > +top-level xids since they do not have a parent, nor is an entry made
> > +for read-only subtransactions.
> > +   
>
> Maybe say "the immediate parent xid of each ...", or is it too obvious?

+1 to all of those suggestions

-- 
Simon Riggshttp://www.EnterpriseDB.com/




Re: Fix error message for MERGE foreign tables

2022-10-14 Thread Richard Guo
On Fri, Oct 14, 2022 at 5:24 PM Alvaro Herrera 
wrote:

> Actually, I hadn't realized that the originally submitted patch had the
> test in postgres_fdw only, but we really want it to catch any FDW, so it
> needs to be somewhere more general.  The best place I found to put this
> test is in make_modifytable ... I searched for some earlier place in the
> planner to do it, but couldn't find anything.
>
> So what do people think about this?


Good point. I agree that the test should be in a more general place.

I wonder if we can make it earlier in grouping_planner() just before we
add ModifyTablePath.

--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1772,6 +1772,17 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 	/* Build per-target-rel lists needed by ModifyTable */
 	resultRelations = lappend_int(resultRelations,
 								  resultRelation);
+	if (parse->commandType == CMD_MERGE &&
+		this_result_rel->fdwroutine != NULL)
+	{
+		RangeTblEntry *rte = root->simple_rte_array[resultRelation];
+
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("cannot execute MERGE on relation \"%s\"",
+					   get_rel_name(rte->relid)),
+				errdetail_relkind_not_supported(rte->relkind));
+	}

Thanks
Richard


Re: Improve description of XLOG_RUNNING_XACTS

2022-10-14 Thread Bharath Rupireddy
On Mon, Oct 3, 2022 at 1:46 PM Michael Paquier  wrote:
>
> On Fri, Sep 16, 2022 at 10:55:53AM +0900, Kyotaro Horiguchi wrote:
> > Putting an arbitrary upper-bound on the number of subxids to print
> > might work? I'm not sure how we can determine the upper-bound, though.
>
> You could hardcode it so as it does not blow up the whole view, say
> 20~30.  Anyway, I agree with the concern raised upthread about the
> amount of extra data this would add to the output, so having at least
> the number of subxids would be better than the existing state of
> things telling only if the list of overflowed.  So let's stick to
> that.

I spent some time today reading this. As others said upthread, the
output can be more verbose if all the backends are running the maximum
number of subtransactions, or if subtransaction overflow occurred in all
the backends. This can blow up the output.

Hard-limiting the number of subxids isn't a good idea because the
whole purpose of it is gone.
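
(For concreteness, the cap suggested upthread would amount to something like
this sketch in standby_desc_running_xacts() - and it is exactly this loss of
information that seems problematic:)

	#define MAX_XIDS_TO_SHOW 20	/* arbitrary cap, per discussion */

	if (xlrec->xcnt > 0)
	{
		appendStringInfo(buf, "; %d xacts:", xlrec->xcnt);
		for (i = 0; i < Min(xlrec->xcnt, MAX_XIDS_TO_SHOW); i++)
			appendStringInfo(buf, " %u", xlrec->xids[i]);
		if (xlrec->xcnt > MAX_XIDS_TO_SHOW)
			appendStringInfoString(buf, " ...");
	}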

As Amit said upthread, we can't really link subtxns with the
corresponding txns by looking at the output of pg_waldump. And I
understand that having the subxid info helped Masahiko-san debug an
issue. Wouldn't it be better to have a SQL-callable function that can
return a txn and all its subxid information? I'm not sure if it's useful
or worth it at all because the contents are so dynamic. I'm also not sure
if we have one already or if it's possible to have one such function.

> Here is another idea for the list of subxids: show the full list of
> subxids only when using --xid.  We could always introduce an extra
> switch, but that does not seem worth the extra workload here.

This seems interesting, but I agree that the extra code isn't worth it.

FWIW, I quickly looked at a few other resource managers' _desc
functions to see whether they output all of the record's info:

xlog_desc - doesn't show the restart point timestamp for the
xl_restore_point record type
logicalmsg_desc - doesn't show the database id that generated the record
clog_desc - doesn't show the oldest xact db of the xl_clog_truncate record
and there may be more.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




RE: Perform streaming logical transactions by background workers and parallel apply

2022-10-14 Thread houzj.f...@fujitsu.com
On Wed, Oct 12, 2022 at 18:11 PM Peter Smith  wrote:
> Here are some review comments for v36-0001.

Thanks for your comments.

> ==
> 
> 1. GENERAL
> 
> Houzj wrote ([1] #3a):
> The word 'streaming' is the same as the actual option name, so 
> personally I think it's fine. But if others also agreed that the name 
> can be improved, I can change it.
> 
> ~
> 
> Sure, I was not really complaining that the name is "wrong". Only I 
> did not think it was a good idea to have multiple struct members 
> called 'streaming' when they don't have the same meaning. e.g. one is 
> the internal character mode equivalent of the parameter, and one is 
> the parameter value as a string. That's why I thought they should be 
> different names. e.g. Make the 2nd one 'streaming_valstr' or 
> something.

Changed.

> ==
> 
> 2. doc/src/sgml/config.sgml
> 
> Previously I suggested there should be xrefsto the "Configuration 
> Settings" page but Houzj wrote ([1] #4):
> Not sure about this as we don't have similar thing in the document of 
> max_logical_replication_workers and max_sync_workers_per_subscription.
> 
> ~
> 
> Fair enough, but IMO perhaps all those others should also xref to the 
> "Configuration Settings" chapter. So if such a change does not belong 
> in this patch, then how about if I make another independent thread to 
> post this suggestion?

Sure, I feel it would be better to do it in a separate thread.

> ==
> 
> .../replication/logical/applyparallelworker.c
> 
> 
> 3. parallel_apply_find_worker
> 
> +parallel_apply_find_worker(TransactionId xid)
> +{
> +	bool		found;
> +	ParallelApplyWorkerEntry *entry = NULL;
> +
> +	if (!TransactionIdIsValid(xid))
> +		return NULL;
> +
> +	if (ParallelApplyWorkersHash == NULL)
> +		return NULL;
> +
> +	/* Return the cached parallel apply worker if valid. */
> +	if (stream_apply_worker != NULL)
> +		return stream_apply_worker;
> +
> +	/*
> +	 * Find entry for requested transaction.
> +	 */
> +	entry = hash_search(ParallelApplyWorkersHash, &xid, HASH_FIND, &found);
> 
> In function parallel_apply_start_worker() you removed the entry 
> assignment to NULL because it is never needed. Can do the same here 
> too.

Changed.

> 4. parallel_apply_free_worker
> 
> +/*
> + * Remove the parallel apply worker entry from the hash table. And 
> +stop the
> + * worker if there are enough workers in the pool. For more 
> +information about
> + * the worker pool, see comments atop worker.c.
> + */
> +void
> +parallel_apply_free_worker(ParallelApplyWorkerInfo *winfo, 
> +TransactionId
> xid)
> 
> "And stop" -> "Stop"

Changed.

> 5. parallel_apply_free_worker
> 
> + * Although some error messages may be lost in rare scenarios, but
> + * since the parallel apply worker has finished processing the
> + * transaction, and error messages may be lost even if we detach the
> + * error queue after terminating the process. So it should be ok.
> + */
> 
> SUGGESTION (minor rewording)
> Some error messages may be lost in rare scenarios, but it should be OK 
> because the parallel apply worker has finished processing the 
> transaction, and error messages may be lost even if we detached the 
> error queue after terminating the process.

Changed.


> ~~~
> 
> 7. LogicalParallelApplyLoop
> 
> Previous I suggested maybe the name (e.g. the 2nd param) should be 
> changed to "ParallelApplyMessageContext"? Houzj wrote ([1] #13): Not 
> sure about this, because ApplyMessageContext is used in both worker.c 
> and applyparallelworker.c.
> 
> ~
> 
> But I thought those are completely independent ApplyMessageContext's 
> in different processes that happen to have the same name. Shouldn't 
> they have a name appropriate to who owns them?

ApplyMessageContext is used by the begin_replication_step() function, which will
be invoked in both the leader and the parallel apply worker. So, we need to name
the memory context the same as ApplyMessageContext; otherwise we would need to
modify the logic of begin_replication_step() to use another memory context when
run in a parallel apply worker.


> ~~~
> 
> 8. ParallelApplyWorkerMain
> 
> + /*
> + * Allocate the origin name in a long-lived context for error context
> + * message.
> + */
> + snprintf(originname, sizeof(originname), "pg_%u", 
> + MySubscription->oid);
> 
> Now that ReplicationOriginNameForLogicalRep patch is pushed [2] please 
> make use of this common function.

Changed.

> ~~~
> 
> 9. HandleParallelApplyMessage
> 
> +	case 'X':	/* Terminate, indicating clean exit */
> +		{
> +			shm_mq_detach(winfo->error_mq_handle);
> +			winfo->error_mq_handle = NULL;
> +			break;
> +		}
> +
> +		/*
> +		 * Don't need to do anything about NoticeResponse and
> +		 * NotifyResponse as the logical replication worker doesn't need
> +		 * to send messages to the client.
> +		 */
> +	case 'N':
> +	case 'A':
> +		break;
> +	default:
> +		{
> +			elog(ERROR, "unrecognized message type received from parallel apply worker: %c (message length %d bytes)",
> +				 msgtype, msg->len);
> +		}
> 
> 9a. case 'X':
> There are no variable 

Re: Fix error message for MERGE foreign tables

2022-10-14 Thread Alvaro Herrera
Actually, I hadn't realized that the originally submitted patch had the
test in postgres_fdw only, but we really want it to catch any FDW, so it
needs to be somewhere more general.  The best place I found to put this
test is in make_modifytable ... I searched for some earlier place in the
planner to do it, but couldn't find anything.

So what do people think about this?

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"La grandeza es una experiencia transitoria.  Nunca es consistente.
Depende en gran parte de la imaginación humana creadora de mitos"
(Irulan)
From 09fba6a6f4caf2a987aa7ee5d77dbf74372ad1d0 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 14 Oct 2022 11:19:10 +0200
Subject: [PATCH v2] Disallow MERGE cleanly for foreign partitions

---
 .../postgres_fdw/expected/postgres_fdw.out |  4 ++++
 contrib/postgres_fdw/sql/postgres_fdw.sql  |  3 +++
 src/backend/optimizer/plan/createplan.c    | 20 ++++++++++++++++++++
 3 files changed, 27 insertions(+)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 9746998751..bdeba8c291 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8284,6 +8284,10 @@ select tableoid::regclass, * FROM remp2;
 (3 rows)
 
 delete from itrtest;
+-- MERGE ought to fail cleanly
+merge into itrtest using (select 1, 'foo') on (true) when matched then do nothing;
+ERROR:  cannot execute MERGE on relation "remp1"
+DETAIL:  This operation is not supported for foreign tables.
 create unique index loct1_idx on loct1 (a);
 -- DO NOTHING without an inference specification is supported
 insert into itrtest values (1, 'foo') on conflict do nothing returning *;
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 1962051e54..514df43b06 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2207,6 +2207,9 @@ select tableoid::regclass, * FROM remp2;
 
 delete from itrtest;
 
+-- MERGE ought to fail cleanly
+merge into itrtest using (select 1, 'foo') on (true) when matched then do nothing;
+
 create unique index loct1_idx on loct1 (a);
 
 -- DO NOTHING without an inference specification is supported
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index ab4d8e201d..ac86ce9003 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -7078,12 +7078,32 @@ make_modifytable(PlannerInfo *root, Plan *subplan,
 			RelOptInfo *resultRel = root->simple_rel_array[rti];
 
 			fdwroutine = resultRel->fdwroutine;
+
+			/*
+			 * MERGE is not currently supported for foreign tables and we
+			 * already checked when the table mentioned in the query is
+			 * foreign; but we can still get here if a partitioned table has a
+			 * foreign table as partition.  Disallow that now, to avoid an
+			 * uglier error message later.
+			 */
+			if (operation == CMD_MERGE && fdwroutine != NULL)
+			{
+				RangeTblEntry *rte = root->simple_rte_array[rti];
+
+				ereport(ERROR,
+						errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						errmsg("cannot execute MERGE on relation \"%s\"",
+							   get_rel_name(rte->relid)),
+						errdetail_relkind_not_supported(rte->relkind));
+			}
+
 		}
 		else
 		{
 			RangeTblEntry *rte = planner_rt_fetch(rti, root);
 
 			Assert(rte->rtekind == RTE_RELATION);
+			Assert(operation != CMD_MERGE);
 			if (rte->relkind == RELKIND_FOREIGN_TABLE)
 fdwroutine = GetFdwRoutineByRelId(rte->relid);
 			else
-- 
2.30.2



pub/sub - specifying optional parameters without values.

2022-10-14 Thread Peter Smith
Hi hackers.

This post is about parameter default values. Specifically, it's about
the CREATE PUBLICATION and CREATE SUBSCRIPTION syntax, although the
same issue might apply to other commands I am unaware of...

~~~

Background:

CREATE PUBLICATION syntax has a WITH clause:
[ WITH ( publication_parameter [= value] [, ... ] ) ]

CREATE SUBSCRIPTION syntax has a similar clause:
 [ WITH ( subscription_parameter [= value] [, ... ] ) ]

~~~

The docs describe all the parameters that can be specified. Parameters
are optional, so the docs describe the defaults if the parameter name
is not specified. However, notice that the parameter *value* part is
also optional.

So, what is the defined behaviour if a parameter name is specified but
no *value* is given?

In practice, it seems to just be a shorthand for assigning a boolean
value to true... BUT -

a) I can't find anywhere in the docs where it actually says this

b) Without documentation some might consider it to be strange that now
there are 2 kinds of defaults - a default when there is no name, and
another default when there is no value - and those are not always the
same. e.g. if publish_via_partition_root is not specified at all, it
is equivalent to WITH (publish_via_partition_root=false), but OTOH,
WITH (publish_via_partition_root) is equivalent to WITH
(publish_via_partition_root=true).

c) What about non-boolean parameters? In practice, it seems they all
give errors:

test_pub=# CREATE PUBLICATION pub99 FOR ALL TABLES WITH (publish);
ERROR:  publish requires a parameter

test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost
dbname=test_pub' PUBLICATION pub1 WITH (slot_name);
ERROR:  slot_name requires a parameter

test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost
dbname=test_pub' PUBLICATION pub1 WITH (synchronous_commit);
ERROR:  synchronous_commit requires a parameter

~~~

It almost feels like this is an undocumented feature, except it isn't
quite undocumented because it is right there in black-and-white in the
syntax "[= value]". Or perhaps this implied boolean-true behaviour is
already described elsewhere? But if it is, I have not found it yet.
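
(FWIW, the implied boolean-true behaviour comes from the generic DefElem
parsing; defGetBoolean() in the backend starts out roughly like this:)

	bool
	defGetBoolean(DefElem *def)
	{
		/*
		 * If no parameter value is given, assume "true" is meant.
		 */
		if (def->arg == NULL)
			return true;

		/* ... otherwise interpret the supplied value as a boolean ... */
	}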

IMO a simple patch (PSA) is needed to clarify the behaviour.

Thoughts?

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v1-0001-clarify-behavior-of-specifying-a-parameter-with-n.patch
Description: Binary data


Re: Fix error message for MERGE foreign tables

2022-10-14 Thread Alvaro Herrera
On 2022-Oct-14, Michael Paquier wrote:

> On Fri, Oct 14, 2022 at 12:26:19PM +0800, Richard Guo wrote:
> > Maybe something like below, so that we keep it consistent with the case
> > of a foreign table being specified as a target.
> > 
> > --- a/contrib/postgres_fdw/postgres_fdw.c
> > +++ b/contrib/postgres_fdw/postgres_fdw.c
> > @@ -1872,6 +1872,13 @@ postgresPlanForeignModify(PlannerInfo *root,
> >  returningList,
> >  _attrs);
> > break;
> > +   case CMD_MERGE:
> > +   ereport(ERROR,
> > +   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > +errmsg("cannot execute MERGE on relation \"%s\"",
> > +   RelationGetRelationName(rel)),
> > +
> >  errdetail_relkind_not_supported(rel->rd_rel->relkind)));
> > +   break;
> 
> Yeah, you should not use an elog(ERROR) for cases that would be faced
> directly by users.

Yeah, I think this just flies undetected until it hits code that doesn't
support the case.  I'll add a test and push as Richard suggests, thanks.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"La libertad es como el dinero; el que no la sabe emplear la pierde" (Alvarez)




Re: New docs chapter on Transaction Management and related changes

2022-10-14 Thread Alvaro Herrera
+1 for this new chapter.  This latest patch looks pretty good.  I think
that introducing the concept of "sub-commit" as in Simon's follow-up
patch clarifies things, though the word itself looks very odd.  Maybe
it's okay.  The addition of the savepoint example looks good also.

On 2022-Oct-13, Bruce Momjian wrote:

> +  
> +   PostgreSQL supports a two-phase commit (2PC)
> +   protocol that allows multiple distributed systems to work together
> +   in a transactional manner.  The commands are PREPARE
> +   TRANSACTION, COMMIT PREPARED and

I suggest/request that we try to avoid breaking tagged constants in
DocBook; doing so makes it much easier to miss them later when grepping
for them (don't laugh, it has happened to me).  Also, it breaks
formatting in some weird cases.  I know this makes editing a bit harder
because you can't just reflow with your editor like you would normal
text.  So this'd be:

+   in a transactional manner.  The commands are PREPARE 
TRANSACTION,
+   COMMIT PREPARED and

with whatever word wrapping you like, except breaking between PREPARE
and TRANSACTION.

> +   
> +In addition to vxid and xid,
> +when a transaction is prepared it is also identified by a Global
> +Transaction Identifier (GID). GIDs
> +are string literals up to 200 bytes long, which must be
> +unique amongst other currently prepared transactions.
> +The mapping of GID to xid is shown in <link
> +linkend="view-pg-prepared-xacts">pg_prepared_xacts</link>.
> +   

Maybe say "is prepared for two-phase commit", to make the topic of this
paragraph more obvious?

> +   
> +Lock waits on table-level locks are shown waiting for
> +virtualxid, while lock waits on row-level
> +locks are shown waiting for transactionid.
> +Row-level read and write locks are recorded directly in locked
> +rows and can be inspected using the <xref linkend="pgrowlocks"/>
> +extension.  Row-level read locks might also require the assignment
> +of multixact IDs (mxid). Mxids are recorded in
> +the pg_multixact directory.
> +   

Hmm, ok.

> +   
> +The parent xid of each subxid is recorded in the
> +pg_subtrans directory. No entry is made for
> +top-level xids since they do not have a parent, nor is an entry made
> +for read-only subtransactions.
> +   

Maybe say "the immediate parent xid of each ...", or is it too obvious?

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Las cosas son buenas o malas segun las hace nuestra opinión" (Lisias)




Re: thinko in basic_archive.c

2022-10-14 Thread Bharath Rupireddy
On Fri, Oct 14, 2022 at 10:11 AM Nathan Bossart
 wrote:
>
> I intended for the temporary file name generated by basic_archive.c to

I'm trying to understand this a bit:

/*
 * Pick a sufficiently unique name for the temporary file so that a
 * collision is unlikely.  This helps avoid problems in case a temporary
 * file was left around after a crash or another server happens to be
 * archiving to the same directory.
 */

Given that temp file name includes WAL file name, epoch to
milliseconds scale and MyProcPid, can there be name collisions after a
server crash or even when multiple servers with different pids are
archiving/copying the same WAL file to the same directory?

What happens to the left-over temp files after a server crash? Will
they be lying around in the archive directory? I understand that we
can't remove such files because we can't distinguish left-over files
from a crash and the temp files that another server is in the process
of copying.

If the goal is to copy files atomically, why can't we name the temp
file 'wal_file_name.pid.temp', assuming no PID wraparound and get rid
of appending time? Since basic_archive is a test module illustrating
archive_library implementation, do we really need to worry about name
collisions?

> I've attached a small patch that fixes this so that the temporary file name
> includes the timestamp in milliseconds for when it was created.

The patch LGTM.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Fix error message for MERGE foreign tables

2022-10-14 Thread Michael Paquier
On Fri, Oct 14, 2022 at 12:26:19PM +0800, Richard Guo wrote:
> Maybe something like below, so that we keep it consistent with the case
> of a foreign table being specified as a target.
> 
> --- a/contrib/postgres_fdw/postgres_fdw.c
> +++ b/contrib/postgres_fdw/postgres_fdw.c
> @@ -1872,6 +1872,13 @@ postgresPlanForeignModify(PlannerInfo *root,
>  returningList,
>  _attrs);
> break;
> +   case CMD_MERGE:
> +   ereport(ERROR,
> +   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +errmsg("cannot execute MERGE on relation \"%s\"",
> +   RelationGetRelationName(rel)),
> +
>  errdetail_relkind_not_supported(rel->rd_rel->relkind)));
> +   break;

Yeah, you should not use an elog(ERROR) for cases that would be faced
directly by users.
--
Michael


signature.asc
Description: PGP signature


Re: Move backup-related code to xlogbackup.c/.h

2022-10-14 Thread Michael Paquier
On Fri, Oct 14, 2022 at 10:24:41AM +0200, Alvaro Herrera wrote:
> However, what's most problematic about this patch is that it introduces
> a pretty serious bug, yet that bug goes unnoticed if you just run the
> builtin test suites.  I only noticed because I added an elog(ERROR,
> "oops") in the area protected by ENSURE_ERROR_CLEANUP and a debug
> elog(WARNING) in the cleanup area, then examined the server log after
the pg_basebackup test failed; but this is not very workable.  I wonder
> what would be a good way to keep this in check.  The naive way seems to
> be to run a pg_basebackup, have it abort partway through (how?), then
> test the server and see if forced page writes are enabled or not.

See around the bottom of 010_pg_basebackup.pl, where a combination of
IPC::Run::start('pg_basebackup') with --max-rate and
pg_terminate_backend() is able to achieve that.
--
Michael


signature.asc
Description: PGP signature


Re: Move backup-related code to xlogbackup.c/.h

2022-10-14 Thread Alvaro Herrera
On 2022-Oct-13, Bharath Rupireddy wrote:

> The pg_backup_start_callback() can just go ahead and reset
> sessionBackupState. However, it leads us to the complete removal of
> pg_backup_start_callback() itself and use do_pg_abort_backup()
> consistently across, saving 20 LOC attached as v5-0001.

OK, that's not bad -- but there is a fatal flaw here: do_pg_backup_start
only sets sessionBackupState *after* it has finished setting things up,
so if you only change it like this, do_pg_abort_backup will indeed run,
but it'll do nothing because it hits the "quick exit" test.  Therefore,
if a backup aborts while setting up, you'll keep running with forced
page writes until next postmaster crash or restart.  Not good.
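
(The "quick exit" test in question is the guard at the top of
do_pg_abort_backup(), roughly:)

	void
	do_pg_abort_backup(int code, Datum arg)
	{
		bool		emit_warning = DatumGetBool(arg);

		/*
		 * Quick exit if session does not have a running backup.  During
		 * backup setup this is still not SESSION_BACKUP_RUNNING, which is
		 * why the callback would do nothing if an error is thrown there.
		 */
		if (sessionBackupState != SESSION_BACKUP_RUNNING)
			return;

		/* ... otherwise abort the backup ... */
	}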

ISTM we need to give another flag to the callback function besides
emit_warning: one that says whether to test sessionBackupState or not.
I suppose the easiest way to do it with no other changes is to turn
'arg' into a bitmask.
But alternatively, we could just remove emit_warning as a flag and have
the warning be emitted always; then we can use the boolean for the other
purpose.  I don't think the extra WARNING thrown during backup set-up is
going to be a problem, since it will mostly never be seen anyway (and if
you do see it, it's not a lie.)

However, what's most problematic about this patch is that it introduces
a pretty serious bug, yet that bug goes unnoticed if you just run the
builtin test suites.  I only noticed because I added an elog(ERROR,
"oops") in the area protected by ENSURE_ERROR_CLEANUP and a debug
elog(WARNING) in the cleanup area, then examined the server log after
the pg_basebackup test failed; but this is not very workable.  I wonder
what would be a good way to keep this in check.  The naive way seems to
be to run a pg_basebackup, have it abort partway through (how?), then
test the server and see if forced page writes are enabled or not.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"The problem with the facetime model is not just that it's demoralizing, but
that the people pretending to work interrupt the ones actually working."
   (Paul Graham)




Re: Question about pull_up_sublinks_qual_recurse

2022-10-14 Thread Andy Fan
>
> Later we tried to pull up the EXISTS sublink to t1 OR t2 *separately*,
> since this subselect referenced both t1 *AND* t2, so we CAN'T pull up
> the sublink. I am wondering why we have to pull it up to t1 OR t2
> rather than to JoinExpr(t1, t2); I think the latter is better.
>

After some more self-review, I find my proposal has the following side
effect.

select * from t1
where exists (select 1 from t2
              where exists (select 1 from t3
                            where t3.c = t1.c)
                and t2.a = t1.a);

In the above example, the innermost sublink will be joined with
SemiJoin(t1 t2) in the patched version, but joined with t1 in the
current master.  However, even if we set the JoinTree to
SemiJoin(SemiJoin(t1 t2), t3), the join-reorder functions can still
generate a path that joins t1 with t3 first and then t2.  So any hints
about this patch's self-review are welcome.

-- 
Best Regards
Andy Fan


Re: New docs chapter on Transaction Management and related changes

2022-10-14 Thread Simon Riggs
On Thu, 13 Oct 2022 at 23:06, Bruce Momjian  wrote:
>
> Thanks, updated patch attached.  You can see the output at:

Thank you for your work to tighten and clean up this patch; much appreciated.

I had two minor typo fixes, plus a slight rewording to avoid using the
word "global" in the last section, since that word is associated with
distributed or 2PC transactions. For those changes, I provide a
patch-on-patch so you can see them clearly.

In related changes, I've also done some major rewording of the RELEASE
SAVEPOINT documentation, since the command was incorrectly described as
having "no other user visible behavior". A complex example is given to
explain this, using the terminology established in the main patch.

-- 
Simon Riggs        http://www.EnterpriseDB.com/


xact.patch-on-patch
Description: Binary data


doc-release-savepoint.v1.patch
Description: Binary data


refactor ownercheck and aclcheck functions

2022-10-14 Thread Peter Eisentraut
These patches take the dozens of mostly-duplicate pg_foo_ownercheck()
and pg_foo_aclcheck() functions and replace (most of) them with common
functions that are driven by the ObjectProperty table.  All the required
information is already in that table.


This is similar to the consolidation of the drop-by-OID functions that
we did a while ago (b1d32d3e3230f00b5baba08f75b4f665c7d6dac6).

From fc98a7bdb9f2d1a47f67305c78aecc385cc10f21 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 14 Oct 2022 09:16:51 +0200
Subject: [PATCH] Refactor ownercheck functions

Instead of dozens of mostly-duplicate pg_foo_ownercheck() functions,
write one common function object_ownercheck() that can handle almost
all of them.  We already have all the information we need, such as
which system catalog corresponds to which catalog table, and which
column is the owner column.

I kept a few pg_foo_ownercheck() around as thin wrappers because they
are widely used throughout the code.  And there are also a couple that
don't work via the generic function, so those stay as is.
---
 src/backend/catalog/aclchk.c| 517 ++--
 src/backend/catalog/objectaddress.c |  92 +
 src/backend/commands/collationcmds.c|   2 +-
 src/backend/commands/event_trigger.c|   4 +-
 src/backend/commands/foreigncmds.c  |   6 +-
 src/backend/commands/proclang.c |   2 +-
 src/backend/commands/publicationcmds.c  |   4 +-
 src/backend/commands/statscmds.c|   2 +-
 src/backend/commands/subscriptioncmds.c |   6 +-
 src/backend/commands/tablespace.c   |   6 +-
 src/backend/commands/tsearchcmds.c  |   4 +-
 src/include/utils/acl.h |  22 +-
 12 files changed, 71 insertions(+), 596 deletions(-)

diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index aa5a2ed9483e..d67d3b522cef 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -32,6 +32,7 @@
 #include "catalog/pg_am.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_cast.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_conversion.h"
 #include "catalog/pg_database.h"
@@ -5111,25 +5112,30 @@ pg_type_aclcheck(Oid type_oid, Oid roleid, AclMode mode)
 }
 
 /*
- * Ownership check for a relation (specified by OID).
+ * Generic ownership check for an object
  */
 bool
-pg_class_ownercheck(Oid class_oid, Oid roleid)
+object_ownercheck(Oid classid, Oid objectid, Oid roleid)
 {
HeapTuple   tuple;
Oid ownerId;
+   bool        isnull;
 
/* Superusers bypass all permission checking. */
if (superuser_arg(roleid))
return true;
 
-   tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(class_oid));
+   tuple = SearchSysCache1(get_object_catcache_oid(classid),
+                           ObjectIdGetDatum(objectid));
    if (!HeapTupleIsValid(tuple))
        ereport(ERROR,
-               (errcode(ERRCODE_UNDEFINED_TABLE),
-                errmsg("relation with OID %u does not exist", class_oid)));
+               (errcode(ERRCODE_UNDEFINED_OBJECT),
+                errmsg("%s with OID %u does not exist",
+                       get_object_class_descr(classid), objectid)));
 
-   ownerId = ((Form_pg_class) GETSTRUCT(tuple))->relowner;
+   ownerId = DatumGetObjectId(SysCacheGetAttr(get_object_catcache_oid(classid),
+                                              tuple,
+                                              get_object_attnum_owner(classid),
+                                              &isnull));
+   Assert(!isnull);
 
ReleaseSysCache(tuple);
 
@@ -5137,107 +5143,43 @@ pg_class_ownercheck(Oid class_oid, Oid roleid)
 }
 
 /*
- * Ownership check for a type (specified by OID).
+ * Wrapper functions for commonly used cases
  */
+
 bool
-pg_type_ownercheck(Oid type_oid, Oid roleid)
+pg_class_ownercheck(Oid class_oid, Oid roleid)
 {
-   HeapTuple   tuple;
-   Oid ownerId;
-
-   /* Superusers bypass all permission checking. */
-   if (superuser_arg(roleid))
-   return true;
-
-   tuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(type_oid));
-   if (!HeapTupleIsValid(tuple))
-   ereport(ERROR,
-   (errcode(ERRCODE_UNDEFINED_OBJECT),
-errmsg("type with OID %u does not exist", 
type_oid)));
-
-   ownerId = ((Form_pg_type) GETSTRUCT(tuple))->typowner;
-
-   ReleaseSysCache(tuple);
+   return object_ownercheck(RelationRelationId, class_oid, roleid);
+}
 
-   return has_privs_of_role(roleid, ownerId);
+bool
+pg_type_ownercheck(Oid type_oid, Oid roleid)
+{
+   return object_ownercheck(TypeRelationId, type_oid, roleid);
 }
 
-/*
- * Ownership check for 

Re: PG upgrade 14->15 fails - database contains our own extension

2022-10-14 Thread David Turoň



Hi,

I really appreciate your help and the very quick response. And WOW,
writing a patch for this in a few hours... that's amazing!

> Looking closer, I don't see how b55f2b692 could have changed pg_dump's
> opinion of the order to sort these three casts in; that sort ordering
> logic is old enough to vote.  So I'm guessing that in fact this *never*
> worked.  Perhaps this extension has never been through pg_upgrade before,
> or at least not with these casts?

Yes, it's new, and I tested just now with an upgrade from 9.6 to 15.0
rc2, with the same result. So this behavior has probably been there a
long time, but the extension is new and has not been upgraded yet. And
probably nobody has had this "strange" idea before.


> (I'm pretty skeptical about it being a good idea to have a set of
> casts like this, but I don't suppose pg_dump is chartered to
> editorialize on that.)

Yes, I'm not proud of creating this workaround extension; I did what
the frontend developers asked me to do, if it was possible. I don't
expect a medal of honor :)

The problem was that when a bigint was taken from the DB as JSON and
stored as a number, the JS library automatically cast the number to an
integer, which caused problems.

lbstat=# SELECT json_agg(test) FROM test;
   json_agg
-----------------------
 [{"id":"4294967296"}]
(1 row)

-- The ID is now represented as text, so the JS library can use it and
-- send it back without error. For the DB it is still a bigint.

This was an automatic way to solve the problem without casting to text
in all the places involved. I tested it, and most things worked well
until the upgrade test didn't pass.

Thank you all.

David T.

--
-------------------------------
Ing. David TUROŇ
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:+420 591 166 224
fax:+420 596 621 273
mobil:  +420 732 589 152
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-------------------------------



From:    "Tom Lane" 
To:      "David Turoň" 
Cc:      "Robert Haas" ,
         pgsql-hack...@postgresql.org, "Marian Krucina"

Date:    13.10.2022 18:06
Subject: Re: PG upgrade 14->15 fails - database contains our own
         extension



I wrote:
> Hmm ... I think it's a very ancient bug that somehow David has avoided
> tripping over up to now.

Looking closer, I don't see how b55f2b692 could have changed pg_dump's
opinion of the order to sort these three casts in; that sort ordering
logic is old enough to vote.  So I'm guessing that in fact this *never*
worked.  Perhaps this extension has never been through pg_upgrade before,
or at least not with these casts?

> We might be able to put in some kluge in pg_dump to make it less
> likely to fail with existing DBs, but I think the true fix lies
> in adding that dependency.

I don't see any painless way to fix this in pg_dump, and I'm inclined
not to bother trying if it's not a regression.  Better to spend the
effort on the backend-side fix.

On the backend side, really anyplace that we consult IsBinaryCoercible
during DDL is at hazard.  While there aren't a huge number of such
places, there's certainly more than just CreateCast.  I'm trying to
decide how much trouble it's worth going to there.  I could be wrong,
but I think that only the cast-vs-cast case is really likely to be
problematic for pg_dump, given that it dumps casts pretty early now.
So it might be sufficient to fix that one case.

 regards, tom lane



Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-10-14 Thread Masahiko Sawada
Hi,

On Mon, Oct 10, 2022 at 2:16 PM John Naylor
 wrote:
>
> The following is not quite a full review, but has plenty to think about. 
> There is too much to cover at once, and I have to start somewhere...
>
> My main concerns are that internal APIs:
>
> 1. are difficult to follow
> 2. lead to poor branch prediction and too many function calls
>
> Some of the measurements are picking on the SIMD search code, but I go into 
> details in order to demonstrate how a regression there can go completely 
> unnoticed. Hopefully the broader themes are informative.
>
> On Fri, Oct 7, 2022 at 3:09 PM Masahiko Sawada  wrote:
> > [fixed benchmarks]
>
> Thanks for that! Now I can show clear results on some aspects in a simple 
> way. The attached patches (apply on top of v6) are not intended to be 
> incorporated as-is quite yet, but do point the way to some reorganization 
> that I think is necessary. I've done some testing on loading, but will leave 
> it out for now in the interest of length.
>
>
> 0001-0003 are your performance test fix and some small conveniences for 
> testing. Binary search is turned off, for example, because we know it 
> already. And the sleep call is so I can run perf in a different shell 
> session, on only the search portion.
>
> Note the v6 test loads all block numbers in the range. Since the test item 
> ids are all below 64 (reasonable), there are always 32 leaf chunks, so all 
> the leaves are node32 and completely full. This had the effect of never 
> taking the byte-wise loop in the proposed pg_lsearch function. These two 
> aspects make this an easy case for the branch predictor:
>
> john=# select * from bench_seq_search(0, 1*1000*1000);
> NOTICE:  num_keys = 100, height = 2, n4 = 0, n16 = 0, n32 = 31251, n128 = 1, n256 = 122
> NOTICE:  sleeping for 2 seconds...
>   nkeys  | rt_mem_allocated | array_mem_allocated | rt_load_ms | array_load_ms | rt_search_ms | array_serach_ms
> -+--+-++---+--+-
>  100 | 10199040 |   18000 |167 | 0 |  822 |   0
>
>  1,470,141,841  branches:u
> 63,693  branch-misses:u   #0.00% of all branches
>
> john=# select * from bench_shuffle_search(0, 1*1000*1000);
> NOTICE:  num_keys = 100, height = 2, n4 = 0, n16 = 0, n32 = 31251, n128 = 1, n256 = 122
> NOTICE:  sleeping for 2 seconds...
>   nkeys  | rt_mem_allocated | array_mem_allocated | rt_load_ms | array_load_ms | rt_search_ms | array_serach_ms
> -+--+-++---+--+-
>  100 | 10199040 |   18000 |168 | 0 | 2174 |   0
>
>  1,470,142,569  branches:u
> 15,023,983  branch-misses:u   #1.02% of all branches
>
>
> 0004 randomizes block selection in the load part of the search test so that 
> each block has a 50% chance of being loaded.  Note that now we have many 
> node16s where we had none before. Although node 16 and node32 appear to share 
> the same path in the switch statement of rt_node_search(), the chunk 
> comparison and node_get_values() calls each must go through different 
> branches. The shuffle case is most affected, but even the sequential case 
> slows down. (The leaves are less full -> there are more of them, so memory 
> use is larger, but it shouldn't matter much, in the sequential case at least)
>
> john=# select * from bench_seq_search(0, 2*1000*1000);
> NOTICE:  num_keys = 999654, height = 2, n4 = 1, n16 = 35610, n32 = 26889, n128 = 1, n256 = 245
> NOTICE:  sleeping for 2 seconds...
>  nkeys  | rt_mem_allocated | array_mem_allocated | rt_load_ms | array_load_ms | rt_search_ms | array_serach_ms
> +--+-++---+--+-
>  999654 | 14893056 |   179937720 |173 | 0 |  907 |   0
>
>  1,684,114,926  branches:u
>  1,989,901  branch-misses:u   #0.12% of all branches
>
> john=# select * from bench_shuffle_search(0, 2*1000*1000);
> NOTICE:  num_keys = 999654, height = 2, n4 = 1, n16 = 35610, n32 = 26889, n128 = 1, n256 = 245
> NOTICE:  sleeping for 2 seconds...
>  nkeys  | rt_mem_allocated | array_mem_allocated | rt_load_ms | array_load_ms | rt_search_ms | array_serach_ms
> +--+-++---+--+-
>  999654 | 14893056 |   179937720 |173 | 0 | 2890 |   0
>
>  1,684,115,844  branches:u
> 34,215,740  branch-misses:u   #2.03% of all branches
>
>
> 0005 replaces pg_lsearch with a branch-free SIMD search. Note that it retains 
> full 

Re: dynamic result sets support in extended query protocol

2022-10-14 Thread Peter Eisentraut

On 01.02.22 15:40, Peter Eisentraut wrote:
> On 12.01.22 11:20, Julien Rouhaud wrote:
>> Since you mentioned that this patch depends on the SHOW_ALL_RESULTS
>> psql patch, which is still being worked on, I'm not expecting much
>> activity here until the prerequisites are done.  It also seems better
>> to mark this patch as Waiting on Author, as further reviews are
>> probably not really needed for now.
>
> Well, a review on the general architecture and approach would have been
> useful.  But I understand that without the psql work, it's difficult for
> a reviewer to even get started on this patch.  It's also similarly
> difficult for me to keep updating it.  So I'll set it to Returned with
> feedback for now and take it off the table.  I want to get back to it
> when the prerequisites are more settled.

Now that the psql support for multiple result sets exists, I want to
revive this patch.  It's the same as the last posted version, except now
it doesn't require any psql changes or any weird test modifications anymore.


(Old news: This patch allows declaring a cursor WITH RETURN in a 
procedure to make the cursor's data be returned as a result of the CALL 
invocation.  The procedure needs to be declared with the DYNAMIC RESULT 
SETS attribute.)
From 80311214144fba40006dea54817956c3e92110ce Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 14 Oct 2022 09:01:17 +0200
Subject: [PATCH v5] Dynamic result sets from procedures

Declaring a cursor WITH RETURN in a procedure makes the cursor's data be
returned as a result of the CALL invocation.  The procedure needs to
be declared with the DYNAMIC RESULT SETS attribute.

Discussion: https://www.postgresql.org/message-id/flat/6e747f98-835f-2e05-cde5-86ee444a7...@2ndquadrant.com
---
 doc/src/sgml/catalogs.sgml| 10 +++
 doc/src/sgml/information_schema.sgml  |  3 +-
 doc/src/sgml/plpgsql.sgml | 27 +-
 doc/src/sgml/protocol.sgml| 19 +
 doc/src/sgml/ref/alter_procedure.sgml | 12 +++
 doc/src/sgml/ref/create_procedure.sgml| 14 +++
 doc/src/sgml/ref/declare.sgml | 34 +++-
 src/backend/catalog/information_schema.sql|  2 +-
 src/backend/catalog/pg_aggregate.c|  3 +-
 src/backend/catalog/pg_proc.c |  4 +-
 src/backend/catalog/sql_features.txt  |  2 +-
 src/backend/commands/functioncmds.c   | 79 +++--
 src/backend/commands/portalcmds.c | 23 +
 src/backend/commands/typecmds.c   | 12 ++-
 src/backend/parser/gram.y | 18 +++-
 src/backend/tcop/postgres.c   | 61 -
 src/backend/tcop/pquery.c |  6 ++
 src/backend/utils/errcodes.txt|  1 +
 src/backend/utils/mmgr/portalmem.c| 48 +++
 src/bin/pg_dump/pg_dump.c | 16 +++-
 src/include/catalog/pg_proc.h |  6 +-
 src/include/commands/defrem.h |  1 +
 src/include/nodes/parsenodes.h|  1 +
 src/include/parser/kwlist.h   |  2 +
 src/include/utils/portal.h| 14 +++
 src/interfaces/libpq/fe-protocol3.c   |  6 +-
 src/pl/plpgsql/src/expected/plpgsql_call.out  | 78 +
 src/pl/plpgsql/src/pl_exec.c  |  6 ++
 src/pl/plpgsql/src/pl_gram.y  | 58 +++--
 src/pl/plpgsql/src/pl_unreserved_kwlist.h |  2 +
 src/pl/plpgsql/src/sql/plpgsql_call.sql   | 46 ++
 .../regress/expected/create_procedure.out | 85 ++-
 src/test/regress/sql/create_procedure.sql | 61 -
 33 files changed, 719 insertions(+), 41 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 00f833d210e7..16dbe93e2246 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -6020,6 +6020,16 @@ pg_proc Columns
   
  
 
+ 
+  
+   prodynres int4
+  
+  
+   For procedures, this records the maximum number of dynamic result sets
+   the procedure may create.  Otherwise zero.
+  
+ 
+
  
   
pronargs int2
diff --git a/doc/src/sgml/information_schema.sgml b/doc/src/sgml/information_schema.sgml
index 350c75bc31ef..5fc9dc22aeff 100644
--- a/doc/src/sgml/information_schema.sgml
+++ b/doc/src/sgml/information_schema.sgml
@@ -5885,7 +5885,8 @@ routines Columns
max_dynamic_result_sets cardinal_number
   
   
-   Applies to a feature not available in PostgreSQL
+   For a procedure, the maximum number of dynamic result sets.  Otherwise
+   zero.
   
  
 
diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index d85f89bf3033..58a997e15eef 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -3128,7 +3128,7 @@ Declaring Cursor Variables
  Another way is to use the cursor declaration syntax,
  which in general is:
 

Re: archive modules

2022-10-14 Thread Bharath Rupireddy
On Fri, Oct 14, 2022 at 6:00 AM Michael Paquier  wrote:
>
> On Thu, Oct 13, 2022 at 02:53:38PM -0400, Tom Lane wrote:
> > Yeah, it really does not work to use GUC hooks to enforce multi-variable
> > constraints.  We've learned that the hard way (more than once, if memory
> > serves).
>
> 414c2fd is one of the most recent ones.  Its thread is about the same
> thing.

Got it. Thanks. Just thinking: should we move the comment below
somewhere into the GUC-related files?

/*
 * XXX this code is broken by design.  Throwing an error from a GUC assign
 * hook breaks fundamental assumptions of guc.c.  So long as all the variables
 * for which this can happen are PGC_POSTMASTER, the consequences are limited,
 * since we'd just abort postmaster startup anyway.  Nonetheless it's likely
 * that we have odd behaviors such as unexpected GUC ordering dependencies.
 */

FWIW, I see that check_stage_log_stats() and check_log_stats() set
errdetail and return false, causing a similar error:

postgres=# alter system set log_statement_stats = true;
postgres=# select pg_reload_conf();
postgres=# alter system set log_statement_stats = false;
postgres=# alter system set log_parser_stats = true;
ERROR:  invalid value for parameter "log_parser_stats": 1
DETAIL:  Cannot enable parameter when "log_statement_stats" is true.

On Thu, Oct 13, 2022 at 11:54 PM Nathan Bossart
 wrote:
>
> On Thu, Oct 13, 2022 at 03:25:27PM +0530, Bharath Rupireddy wrote:
> > The intent here looks reasonable to me. However, why should the user
> > be able to set both archive_command and archive_library in the first
> > place only to later fail in LoadArchiveLibrary() per the patch? IMO,
> > the check_hook() is the right way to disallow any sorts of GUC
> > misconfigurations, no?
>
> There was some discussion upthread about using the GUC hooks to enforce
> this [0].  In general, it doesn't seem to be a recommended practice.  One
> basic example of the problems with this approach is the following:
>
>   1. Set archive_command and leave archive_library unset and restart
>  the server.
>   2. Unset archive_command and set archive_library and call 'pg_ctl
>  reload'.

Thanks. And yes: if GUC 'foo' is reset but the config has not been
reloaded, the check_hook() of GUC 'bar', run while setting 'bar', can
use the old value of 'foo' and fail.
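
To make the failure mode concrete, consider a hypothetical check_hook
for archive_library (a sketch; neither GUC has such a hook today):

    static bool
    check_archive_library(char **newval, void **extra, GucSource source)
    {
        /* WRONG: XLogArchiveCommand may still hold the pre-reload value */
        if ((*newval)[0] != '\0' && XLogArchiveCommand[0] != '\0')
            return false;
        return true;
    }

Whether this accepts or rejects the new value depends on the order in
which the two GUCs happen to be processed during the reload.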

I'm re-attaching Nathan's patch as-is from [1] here again, just to
make the CF bot test the correct patch. A few comments on that patch:

1)
+    if (XLogArchiveLibrary[0] != '\0' && XLogArchiveCommand[0] != '\0')
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("both archive_command and archive_library specified"),
+                 errdetail("Only one of archive_command, archive_library may be set.")));

The above errmsg looks informational. Can we just say something like
below?  It doesn't require errdetail as the errmsg says it all. See
the other instances elsewhere [2].

ereport(ERROR,
        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
         errmsg("cannot specify both \"archive_command\" and \"archive_library\"")));

2) I think we have a problem: set archive_mode and archive_library and
start the server, then set archive_command and reload the conf (see
[3]); the archiver needs to error out, right? The archiver gets
restarted whenever archive_library changes, but not when
archive_command changes. I think the right place for the error is
after or at the end of HandlePgArchInterrupts().
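
A rough sketch of (2), reusing the patch's own check so that the
archiver re-validates after a reload (the placement is an assumption,
not part of the posted patch):

    /* at the end of HandlePgArchInterrupts(), after the config reload */
    if (XLogArchiveLibrary[0] != '\0' && XLogArchiveCommand[0] != '\0')
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("cannot specify both \"archive_command\" and \"archive_library\"")));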

[1] https://www.postgresql.org/message-id/20221005185716.GB201192%40nathanxps13
[2] errmsg("cannot specify both PARSER and COPY options")));
errmsg("cannot specify both %s and %s",
errmsg("cannot specify both %s and %s",
[3]
./psql -c "alter system set archive_mode='on'" postgres
./psql -c "alter system set
archive_library='/home/ubuntu/postgres/contrib/basic_archive/basic_archive.so'"
postgres
./pg_ctl -D data -l logfile restart
./psql -c "alter system set archive_command='cp %p
/home/ubuntu/archived_wal/%f'" postgres
./psql -c "select pg_reload_conf();" postgres
postgres=# show archive_mode;
 archive_mode
--------------
 on
(1 row)
postgres=# show archive_command ;
          archive_command
------------------------------------
 cp %p /home/ubuntu/archived_wal/%f
(1 row)
postgres=# show archive_library ;
                        archive_library
--------------------------------------------------------------
 /home/ubuntu/postgres/contrib/basic_archive/basic_archive.so
(1 row)
postgres=# select pid, wait_event_type, backend_type from
pg_stat_activity where backend_type = 'archiver';
   pid   | wait_event_type | backend_type
---------+-----------------+--------------
 2116760 | Activity        | archiver
(1 row)

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


fail_arch.patch
Description: Binary data


Bug: pg_regress makefile does not always copy refint.so

2022-10-14 Thread Donghang Lin
Hi hackers,

When I was building pg_regress, it didn’t always copy a rebuilt version
of refint.so to the regress folder.

Steps to reproduce:
OS: centos7
PostgreSQL version: 14.5

1. configure and build postgres
2. edit file src/include/utils/rel.h so that contrib/spi will rebuild
3. cd ${builddir}/src/test/regress
4. make
We’ll find that refint.so is rebuilt in contrib/spi but not copied over
to the regress folder, while autoinc.so is rebuilt and copied over.

Attached is a potential patch to fix the issue.

Regards,
Donghang Lin
(ServiceNow)




fix-pgregress-makefile.patch
Description: fix-pgregress-makefile.patch


Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-14 Thread Michael Paquier
On Fri, Oct 14, 2022 at 02:30:25PM +0900, Michael Paquier wrote:
> First, as of HEAD, AuthToken is only used for elements in a list of
> role and database names in hba.conf before filling in each HbaLine,
> hence we limit its usage to the initial parsing.  The patch assigns an
> optional regex_t to it, then extends the use of AuthToken for single
> hostname entries in pg_hba.conf.  Things going first: shouldn't we
> combine ident_user and "re" together in the same structure?  Even if
> we finish by not using AuthToken to store the computed regex, it seems
> to me that we'd better use the same base structure for pg_ident.conf
> and pg_hba.conf.  While looking closely at the patch, we would expand
> the use of AuthToken outside its original context.  I have also looked
> at make_auth_token(), and wondered if it could be possible to have this
> routine compile the regexes.  This approach would not stick with
> pg_ident.conf though, as we validate the fields in each line when we
> put our hands on ident_user and after the base validation of a line
> (number of fields, etc.).  So with all that in mind, it feels right to
> not use AuthToken at all when building each HbaLine and each
> IdentLine, but a new, separate, structure.  We could call that an
> AuthItem (string, its compiled regex) perhaps?  It could have its own
> make() routine, taking in input an AuthToken and process
> pg_regcomp().  Better ideas for this new structure would be welcome,
> and the idea is that we'd store the post-parsing state of an
> AuthToken to something that has a compiled regex.  We could finish by
> using AuthToken at the end and expand its use, but it does not feel
> completely right either to have a make() routine but not be able to
> compile its regular expression when creating the AuthToken.

I have sent this part too quickly.  As AuthTokens are used in
check_db() and check_role() when matching entries, it is more
intuitive to store the regex_t directly in it.  Changing IdentLine to
use an AuthToken makes the "quoted" part useless in this case; still,
it could be used in Assert()s to make sure that the data is shaped as
expected at check time, enforced as false when creating it in
parse_ident_line()?
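
For concreteness, a sketch of AuthToken with the compiled regex stored
directly in it (the field layout is illustrative only):

    /* sketch: regex is compiled at parse time, NULL for plain tokens */
    typedef struct AuthToken
    {
        char       *string;
        bool        quoted;
        regex_t    *regex;
    } AuthToken;

parse_ident_line() would then build its tokens through
make_auth_token() with "quoted" set to false, and the check-time code
could Assert() on that.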
--
Michael

