Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)

2022-10-20 Thread Amit Kapila
On Fri, Oct 21, 2022 at 8:01 AM Masahiko Sawada  wrote:
>
> Thank you for the comments! I agree with all of them and I've updated
> the patches accordingly.
>

Pushed, after removing the test case from the v11-v13 branches, as it is
not relevant to those branches and test 1 in
catalog_change_snapshot.spec already covers the same case for those
branches.

-- 
With Regards,
Amit Kapila.




Re: Getting rid of SQLValueFunction

2022-10-20 Thread Michael Paquier
On Fri, Oct 21, 2022 at 12:34:23PM +0900, Michael Paquier wrote:
> A sticky point is that this would need the creation of a pg_proc entry
> for "user", which is a generic word, or a shortcut around
> FigureColnameInternal().  The overall code gain in the executor still
> looks appealing even if we do all that, and the resulting backend code
> gets kind of nicer and easier to maintain long-term, IMO.

I have looked at that, and the attribute mapping remains compatible
with past versions once the appropriate pg_proc entries are added.
The updated patch set attached does that (with a user() function as
well, to keep the code as simple as possible), with more tests to cover
the attribute case mentioned upthread.
--
Michael
From 8d1a424e44f46f3107a6f0f066da22c19e4e4f3b Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 21 Oct 2022 14:06:03 +0900
Subject: [PATCH v2 1/2] Remove from SQLValueFunction all the entries using
 "name" as return type

This commit changes six SQL keywords to use COERCE_SQL_SYNTAX rather
than SQLValueFunction:
- CURRENT_ROLE
- CURRENT_USER
- USER
- SESSION_USER
- CURRENT_CATALOG
- CURRENT_SCHEMA
Among the six, "user", "current_role" and "current_catalog" require
specific SQL functions to allow ruleutils.c to map them to the SQL
keywords they require when using COERCE_SQL_SYNTAX.  Having
pg_proc.proname match the keyword ensures that compatibility
remains the same when projecting any of these keywords in a FROM clause
to an attribute name when an alias is not specified.  Tests are added to
make sure that the mapping from each SQL keyword is correct, using a
view definition that feeds on these keywords in one of the tests
introduced by 40c24bf (both in a SELECT clause and when using each
keyword in a FROM clause).

SQLValueFunction is reduced to half its contents after this change,
simplifying its logic a bit as there is no need to enforce a C collation
anymore.  I have run a few performance tests, with a million or so calls
to these keywords, without seeing a difference in run time or in perf
profiles.  The remaining SQLValueFunctions relate to timestamps.

Bump catalog version.

Reviewed-by: Corey Huinker
Discussion: https://postgr.es/m/yzag3morycguu...@paquier.xyz
---
 src/include/catalog/catversion.h  |  2 +-
 src/include/catalog/pg_proc.dat   |  9 ++
 src/include/nodes/primnodes.h |  8 +
 src/backend/executor/execExprInterp.c | 27 -
 src/backend/nodes/nodeFuncs.c | 11 ++-
 src/backend/parser/gram.y | 30 ++
 src/backend/parser/parse_expr.c   |  8 -
 src/backend/parser/parse_target.c | 18 ---
 src/backend/utils/adt/ruleutils.c | 36 +++---
 src/test/regress/expected/create_view.out | 37 +--
 src/test/regress/sql/create_view.sql  | 15 -
 11 files changed, 105 insertions(+), 96 deletions(-)

diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 4c930c189b..032a429345 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -57,6 +57,6 @@
  */
 
 /*			mmddN */
-#define CATALOG_VERSION_NO	202210141
+#define CATALOG_VERSION_NO	202210211
 
 #endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 62a5b8e655..241366fc8e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1505,6 +1505,15 @@
 { oid => '745', descr => 'current user name',
   proname => 'current_user', provolatile => 's', prorettype => 'name',
   proargtypes => '', prosrc => 'current_user' },
+{ oid => '9695', descr => 'current role name',
+  proname => 'current_role', provolatile => 's', prorettype => 'name',
+  proargtypes => '', prosrc => 'current_user' },
+{ oid => '9696', descr => 'user name',
+  proname => 'user', provolatile => 's', prorettype => 'name',
+  proargtypes => '', prosrc => 'current_user' },
+{ oid => '9697', descr => 'name of the current database',
+  proname => 'current_catalog', provolatile => 's', prorettype => 'name',
+  proargtypes => '', prosrc => 'current_database' },
 { oid => '746', descr => 'session user name',
   proname => 'session_user', provolatile => 's', prorettype => 'name',
   proargtypes => '', prosrc => 'session_user' },
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 40661334bb..e818231e15 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1313,13 +1313,7 @@ typedef enum SQLValueFunctionOp
 	SVFOP_LOCALTIME,
 	SVFOP_LOCALTIME_N,
 	SVFOP_LOCALTIMESTAMP,
-	SVFOP_LOCALTIMESTAMP_N,
-	SVFOP_CURRENT_ROLE,
-	SVFOP_CURRENT_USER,
-	SVFOP_USER,
-	SVFOP_SESSION_USER,
-	SVFOP_CURRENT_CATALOG,
-	SVFOP_CURRENT_SCHEMA
+	SVFOP_LOCALTIMESTAMP_N
 } SQLValueFunctionOp;
 
 typedef struct SQLValueFunction
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 

Re: thinko in basic_archive.c

2022-10-20 Thread Kyotaro Horiguchi
Sorry, the previous mail was sent inadvertently.

At Fri, 21 Oct 2022 14:13:46 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> +  expectation that a value will soon be provided. Care must be taken when
> +  multiple servers are archiving to the same
> +  basic_archive.archive_library directory as they all
> +  might try to archive the same WAL file.
> 
> Reading this, I don't understand what kind of care should be taken.
> 
> Anyway, PID + millisecond-resolution timestamps work in almost all
> cases, but they are not perfect. So I can't decide what to think about
> this.
> 
> Traditionally we have told people that "archiving should not overwrite
> a file unconditionally; generally it is safe only when the contents
> are identical, and it should error out otherwise".. Ah, this is..

basic_archive follows that suggestion when the same file already exists
before it starts writing a file.  So please forget this.

>* Sync the temporary file to disk and move it to its final destination.
>* Note that this will overwrite any existing file, but this is only
>* possible if someone else created the file since the stat() above.

I'm not sure why we allow this behavior.. but that would be a separate
issue, if anyone cares. So I feel that we should not touch this
description in this patch.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: thinko in basic_archive.c

2022-10-20 Thread Kyotaro Horiguchi
Hi.

Anyway, on second thought, anything larger than just adding the
post-processing-end callback would be out of the scope of this patch. So
I'll write some comments on the patch first, then discuss the rest.


Thu, 20 Oct 2022 13:29:12 +0530, Bharath Rupireddy 
 wrote in 
> > No. I didn't mean that. If the server stops after a successful
> > durable_rename but before the next call to
> > basic_archive_file_internal, that callback makes a false complaint
> > since the temporary file is actually gone.
> 
> Right. Fixed it.

Thanks, but we don't need to wipe out all the bytes; just putting a \0
at the beginning of the buffer is sufficient. And the MemSet() at the
beginning of basic_archive_file_internal is not needed, since static
variables are zero-initialized.

This is not necessarily needed, but it might be better to empty
tempfilepath after unlinking the file.


> > +static char tempfilepath[MAXPGPATH + 256];
> >
> > MAXPGPATH is the maximum length of a file name that PG assumes to be
> > able to handle. Thus extending that length seems wrong.
> 
> I think it was to accommodate the temp file name - "archtemp", file,
> MyProcPid, epoch, but I agree that it can just be MAXPGPATH. However,
> most of the places the core defines the path name to be MAXPGPATH +
> some bytes.

Mmm. I found that basic_archive already does the same thing. So let's
follow that in this patch.


+  expectation that a value will soon be provided. Care must be taken when
+  multiple servers are archiving to the same
+  basic_archive.archive_library directory as they all
+  might try to archive the same WAL file.

Reading this, I don't understand what kind of care should be taken.

Anyway, PID + millisecond-resolution timestamps work in almost all
cases, but they are not perfect. So I can't decide what to think about
this.

Traditionally we have told people that "archiving should not overwrite
a file unconditionally; generally it is safe only when the contents
are identical, and it should error out otherwise".. Ah, this is:

https://www.postgresql.org/docs/devel/continuous-archiving.html

> Archive commands and libraries should generally be designed to
> refuse to overwrite any pre-existing archive file. This is an
> important safety feature to preserve the integrity of your archive
> in case of administrator error (such as sending the output of two
> different servers to the same archive directory). It is advisable to
> test your proposed archive library to ensure that it does not
> overwrite an existing file.
...
> file again after restarting (provided archiving is still
> enabled). When an archive command or library encounters a
> pre-existing file, it should return a zero status or true,
> respectively, if the WAL file has identical contents to the
> pre-existing archive and the pre-existing archive is fully persisted
> to storage. If a pre-existing file contains different contents than
> the WAL file being archived, the archive command or library must
> return a nonzero status or false, respectively.

On the other hand, basic_archive seems to overwrite existing files
unconditionally.  I think this is not great, in that we offer a tool
that betrays our own written suggestion...



The following is out-of-the-scope discussions.

==
At Thu, 20 Oct 2022 13:29:12 +0530, Bharath Rupireddy 
 wrote in 
> On Thu, Oct 20, 2022 at 6:57 AM Kyotaro Horiguchi
>  wrote:
> > > The archive module must be responsible for cleaning up the temp file
> > > that it creates. One way to do it is in the archive module's shutdown
> > > callback; this covers most of the cases, but not all.
> >
> > True. But I agree with Robert that such temporary files should be
> > cleanup-able without needing temporarily-valid knowledge (the current
> > file name, in this case). A common strategy for this is to give those
> > files names that can be identified as garbage.
> 
> I'm not sure how we can distinguish temp files as garbage based on
> name. As Robert pointed out upthread, using system identifier may not
> help as the standbys share the same system identifier and it's
> possible that they might archive to the same directory. Is there any
> other way?

A correct naming scheme would lead to a resolution.

> > But since power cut is a typical crash source, we need to identify
> > ruined temporary files and the current naming convention is incomplete
> > in this regard.
> 
> Please note that basic_archive module creates one temp file at a time
> to make file copying/moving atomic and it can keep track of the temp
> file name and delete it using shutdown callback which helps in most of
> the scenarios. As said upthread, repeated crashes while basic_archive
> module is atomically copying files around is a big problem in itself
> and basic_archive module need not worry about it much.

I'm not sure. It's a "basic_archiver", not an "example_archiver".  I
read the name as meaning "it is not highly configurable, but practically
usable".  In 

Re: Cygwin cleanup

2022-10-20 Thread Justin Pryzby
On Thu, Aug 04, 2022 at 04:16:06PM +1200, Thomas Munro wrote:
> On Thu, Aug 4, 2022 at 3:38 PM Justin Pryzby  wrote:
> > [train wreck]
> 
> Oh my, so I'm getting the impression we might actually be totally
> unstable on Cygwin.  Which surprises me because ... wait a minute ...
> lorikeet isn't even running most of the tests.  So... we don't really
> know the degree to which any of this works at all?

Right.

Maybe it's of limited interest, but ..

This updates the patch to build and test with meson, which first
requires patching some meson.build files.  I guess that's needed for
some current BF members, too.  Unfortunately, ccache+PCH causes gcc to
crash :(

-- 
Justin
>From 0e0d5f33c8f5f3174b0576597971c80834bf76b8 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 30 Sep 2022 08:56:07 -0500
Subject: [PATCH 1/4] meson: PROVE is not required

It ought to be possible to build the application without running tests,
or without running TAP tests.  And it's essential for supporting
buildfarm/cygwin, where TAP tests consistently fail.
---
 meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index 2d225f706d2..fd8619d6997 100644
--- a/meson.build
+++ b/meson.build
@@ -323,7 +323,7 @@ python = find_program(get_option('PYTHON'), required: true, native: true)
 flex = find_program(get_option('FLEX'), native: true, version: '>= 2.5.35')
 bison = find_program(get_option('BISON'), native: true, version: '>= 2.3')
 sed = find_program(get_option('SED'), 'sed', native: true)
-prove = find_program(get_option('PROVE'), native: true)
+prove = find_program(get_option('PROVE'), native: true, required: false)
 tar = find_program(get_option('TAR'), native: true)
 gzip = find_program(get_option('GZIP'), native: true)
 program_lz4 = find_program(get_option('LZ4'), native: true, required: false)
-- 
2.25.1

>From 330f91de111f8bee7969818c9001ee6bbc74a048 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 30 Sep 2022 13:39:43 -0500
Subject: [PATCH 2/4] meson: other fixes for cygwin

XXX: what about HAVE_BUGGY_STRTOF ?
---
 meson.build  | 8 ++--
 src/port/meson.build | 4 
 src/test/regress/meson.build | 2 +-
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/meson.build b/meson.build
index fd8619d6997..1c3e156f378 100644
--- a/meson.build
+++ b/meson.build
@@ -211,6 +211,10 @@ if host_system == 'aix'
 
 elif host_system == 'cygwin'
   cppflags += '-D_GNU_SOURCE'
+  dlsuffix = '.dll'
+  mod_link_args_fmt = ['@0@']
+  mod_link_with_name = 'lib@0@.exe.a'
+  mod_link_with_dir = 'libdir'
 
 elif host_system == 'darwin'
   dlsuffix = '.dylib'
@@ -2300,8 +2304,8 @@ gnugetopt_dep = cc.find_library('gnugetopt', required: false)
 #   (i.e., allow '-' as a flag character), so use our version on those platforms
 # - We want to use system's getopt_long() only if the system provides struct
 #   option
-always_replace_getopt = host_system in ['windows', 'openbsd', 'solaris']
-always_replace_getopt_long = host_system == 'windows' or not cdata.has('HAVE_STRUCT_OPTION')
+always_replace_getopt = host_system in ['windows', 'cygwin', 'openbsd', 'solaris']
+always_replace_getopt_long = host_system in ['windows', 'cygwin'] or not cdata.has('HAVE_STRUCT_OPTION')
 
 # Required on BSDs
 execinfo_dep = cc.find_library('execinfo', required: false)
diff --git a/src/port/meson.build b/src/port/meson.build
index c696f1b..0ba83cc7930 100644
--- a/src/port/meson.build
+++ b/src/port/meson.build
@@ -40,6 +40,10 @@ if host_system == 'windows'
 'win32setlocale.c',
 'win32stat.c',
   )
+elif host_system == 'cygwin'
+  pgport_sources += files(
+'dirmod.c',
+  )
 endif
 
 if cc.get_id() == 'msvc'
diff --git a/src/test/regress/meson.build b/src/test/regress/meson.build
index 3dcfc11278f..6ec3c77af53 100644
--- a/src/test/regress/meson.build
+++ b/src/test/regress/meson.build
@@ -10,7 +10,7 @@ regress_sources = pg_regress_c + files(
 # patterns like ".*-.*-mingw.*". We probably can do better, but for now just
 # replace 'gcc' with 'mingw' on windows.
 host_tuple_cc = cc.get_id()
-if host_system == 'windows' and host_tuple_cc == 'gcc'
+if host_system in ['windows', 'cygwin'] and host_tuple_cc == 'gcc'
   host_tuple_cc = 'mingw'
 endif
 host_tuple = '@0@-@1@-@2@'.format(host_cpu, host_system, host_tuple_cc)
-- 
2.25.1

>From a054c0bfb95213e0fd37a7f0c59ed29510f2b873 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 25 Jul 2022 23:05:10 +1200
Subject: [PATCH 3/4] WIP CI support for Cygwin.

ci-os-only: cygwin

See also: d8e78714-dc77-4a64-783f-e863ba4d9...@2ndquadrant.com

https://cirrus-ci.com/task/5145086722834432

XXX This should use a canned Docker image with all the right packages
installed?  But if the larger image is slower to start, then maybe not...
---
 .cirrus.yml   | 62 +++
 configure |  2 +-
 configure.ac  |  2 +-
 

Re: Getting rid of SQLValueFunction

2022-10-20 Thread Michael Paquier
On Thu, Oct 20, 2022 at 11:10:22PM -0400, Tom Lane wrote:
> The entire point of SQLValueFunction IMO was to hide the underlying
> implementation(s).  Replacing it with something that leaks
> implementation details does not seem like a step forward.

Hmm..  Okay, thanks.  So this just comes down to needing one different
pg_proc entry per SQL keyword, then, or this won't fly far.  For
example, note that on HEAD or with the patch, a view with a
SQL keyword in a FROM clause translates the same way with quotes
applied in the same places, as of:
=# create view test as select (SELECT * FROM CURRENT_USER) as cu;
CREATE VIEW
=# select pg_get_viewdef('test', true);
   pg_get_viewdef
-
  SELECT ( SELECT "current_user"."current_user" +
FROM CURRENT_USER "current_user"("current_user")) AS cu;
(1 row)

A sticky point is that this would need the creation of a pg_proc entry
for "user", which is a generic word, or a shortcut around
FigureColnameInternal().  The overall code gain in the executor still
looks appealing even if we do all that, and the resulting backend code
gets kind of nicer and easier to maintain long-term, IMO.
--
Michael




Re: Getting rid of SQLValueFunction

2022-10-20 Thread Tom Lane
Michael Paquier  writes:
> But the patch enforces the attribute name to be the underlying
> function name, switching the previous "current_catalog" to
> "current_database".

The entire point of SQLValueFunction IMO was to hide the underlying
implementation(s).  Replacing it with something that leaks
implementation details does not seem like a step forward.

regards, tom lane




Re: Getting rid of SQLValueFunction

2022-10-20 Thread Michael Paquier
On Wed, Oct 19, 2022 at 03:45:48PM +0900, Michael Paquier wrote:
> With this in mind, would somebody complain if I commit that?  That's a
> nice reduction in code, while completing the work done in 40c24bf:
>  25 files changed, 338 insertions(+), 477 deletions(-)

On second look, there is something I have underestimated here with
FigureColnameInternal().  This function creates an attribute name
based on the SQL keyword given as input.  For example, on HEAD we
get this:
=# SELECT * FROM CURRENT_CATALOG;
 current_catalog 
-
 postgres
(1 row)

But the patch enforces the attribute name to be the underlying
function name, switching the previous "current_catalog" to
"current_database".  For example:
=# SELECT * FROM CURRENT_CATALOG;
 current_database 
--
 postgres
(1 row)

I am not sure how much it matters in practice, but this could break
some queries.  One way to tackle that is to extend
FigureColnameInternal() so that we use a compatible name when the node
is a T_FuncCall, but that won't be entirely watertight as long as
there is not a one-to-one mapping between the SQL keywords and the
underlying function names, aka we would need a current_catalog.
"user" would also be too generic as a catalog function name, so we
would have to name its proc entry something like pg_user anyway,
requiring a shortcut in FigureColnameInternal().  Or perhaps I am
worrying too much and keeping the code simpler is better?  Does the SQL
specification require that the attribute name match its SQL keyword
when it is specified in a FROM clause with no alias?

Thoughts?
--
Michael




Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)

2022-10-20 Thread Masahiko Sawada
On Thu, Oct 20, 2022 at 6:57 PM Amit Kapila  wrote:
>
> On Wed, Oct 19, 2022 at 9:40 AM Masahiko Sawada  wrote:
> >
> > I've attached patches for Change-3 with a test case. Please review them as 
> > well.
> >
>
> The patch looks mostly good to me, apart from a few minor comments,
> which are as follows:
> 1.
> +# The last decoding restarts from the first checkpoint, and add
> invalidation messages
> +# generated by "s0_truncate" to the subtransaction. When decoding the
> commit record of
> +# the top-level transaction, we mark both top-level transaction and
> its subtransactions
> +# as containing catalog changes. However, we check if we don't create
> the association
> +# between top-level and subtransactions at this time. Otherwise, we
> miss executing
> +# invalidation messages when forgetting the transaction.
> +permutation "s0_init" "s0_begin" "s0_savepoint" "s0_insert"
> "s1_checkpoint" "s1_get_changes" "s0_truncate" "s0_commit" "s0_begin"
> "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_commit"
> "s1_get_changes"
>
> The second part of this comment seems to say things more than required
> which makes it less clear. How about something like: "The last
> decoding restarts from the first checkpoint and adds invalidation
> messages generated by "s0_truncate" to the subtransaction. While
> processing the commit record for the top-level transaction, we decide
> to skip this xact but ensure that corresponding invalidation messages
> get processed."?
>
> 2.
> + /*
> + * We will assign subtransactions to the top transaction before
> + * replaying the contents of the transaction.
> + */
>
> I don't think we need this comment.
>

Thank you for the comments! I agree with all of them and I've updated
the patches accordingly.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


v12-v15_v2-0001-Fix-executing-invalidation-messages-generated-by-.patch
Description: Binary data


v11_v2-0001-Fix-executing-invalidation-messages-generated-by-.patch
Description: Binary data


Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)

2022-10-20 Thread Masahiko Sawada
On Thu, Oct 20, 2022 at 8:09 PM Amit Kapila  wrote:
>
> On Wed, Oct 19, 2022 at 4:47 PM Amit Kapila  wrote:
> >
> > On Wed, Oct 19, 2022 at 1:08 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Wed, Oct 19, 2022 at 11:58 AM Masahiko Sawada  
> > > wrote:
> > > >
> > > >
> > > > I've attached two patches that need to be back-patched to all branches
> > > > and includes Change-1, Change-2, and a test case for them. FYI this
> > > > patch resolves the assertion failure reported in this thread as well
> > > > as one reported in another thread[2]. So I borrowed some of the
> > > > changes from the patch[2] Osumi-san recently proposed.
> > > >
> > >
> > > Amit pointed out offlist that the changes in reorderbuffer.c is not
> > > pgindent'ed. I've run pgindent and attached updated patches.
> > >
> >
> > Thanks, I have tested these across all branches till v10 and it works
> > as expected. I am planning to push this tomorrow unless I see any
> > further comments.
> >
>
> Pushed.

Thank you!

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: pg_recvlogical prints bogus error when interrupted

2022-10-20 Thread Kyotaro Horiguchi
At Thu, 20 Oct 2022 13:28:45 +0530, Bharath Rupireddy 
 wrote in 
> On Thu, Oct 20, 2022 at 3:10 AM Andres Freund  wrote:
> >
> > Hi,
> >
> > While reviewing
> > https://postgr.es/m/CAD21AoBe2o2D%3Dxyycsxw2bQOD%3DzPj7ETuJ5VYGN%3DdpoTiCMRJQ%40mail.gmail.com
> > I noticed that pg_recvlogical prints
> > "pg_recvlogical: error: unexpected termination of replication stream: "
> >
> > when signalled with SIGINT/TERM.
> >
> > Oddly enough, that looks to have "always" been the case, even though clearly
> > the code tried to make provisions for a different outcome.
> >
> >
> > It looks to me like all that's needed is to gate the block printing the
> > message with an !time_to_abort.

+1

> +1. How about emitting a message like its friend pg_receivewal, like
> the attached patch?

I'm not a fan of treating SIGINT as an error in this case. The code
calls prepareToTerminate() when time_to_abort is set, and everything
goes fine after that. So I think we should do the same thing after
receiving an interrupt.  This also syncs the file naturally as part of
the normal shutdown.  I'm also not a fan of doing an fsync on error.

> > I also then noticed that we don't fsync the output file in cases of errors -
> > that seems wrong to me? Looks to me like that block should be moved till 
> > after
> > the error:?
> 
> How about something like the attached patch?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/pg_basebackup/pg_recvlogical.c 
b/src/bin/pg_basebackup/pg_recvlogical.c
index 5f2e6af445..e33c204df0 100644
--- a/src/bin/pg_basebackup/pg_recvlogical.c
+++ b/src/bin/pg_basebackup/pg_recvlogical.c
@@ -55,6 +55,7 @@ static const char *plugin = "test_decoding";
 /* Global State */
 static int outfd = -1;
 static volatile sig_atomic_t time_to_abort = false;
+static volatile sig_atomic_t interrupted = false;
 static volatile sig_atomic_t output_reopen = false;
 static bool output_isfile;
 static TimestampTz output_last_fsync = -1;
@@ -206,6 +207,7 @@ StreamLogicalLog(void)
char   *copybuf = NULL;
TimestampTz last_status = -1;
int i;
+   XLogRecPtr  cur_record_lsn = InvalidXLogRecPtr;
PQExpBuffer query;
 
output_written_lsn = InvalidXLogRecPtr;
@@ -275,7 +277,6 @@ StreamLogicalLog(void)
int bytes_written;
TimestampTz now;
int hdr_len;
-   XLogRecPtr  cur_record_lsn = InvalidXLogRecPtr;
 
if (copybuf != NULL)
{
@@ -487,7 +488,7 @@ StreamLogicalLog(void)
 
if (endposReached)
{
-   prepareToTerminate(conn, endpos, true, 
InvalidXLogRecPtr);
+   cur_record_lsn = InvalidXLogRecPtr;
time_to_abort = true;
break;
}
@@ -527,7 +528,6 @@ StreamLogicalLog(void)
 */
if (!flushAndSendFeedback(conn, ))
goto error;
-   prepareToTerminate(conn, endpos, false, cur_record_lsn);
time_to_abort = true;
break;
}
@@ -572,12 +572,14 @@ StreamLogicalLog(void)
/* endpos was exactly the record we just processed, 
we're done */
if (!flushAndSendFeedback(conn, ))
goto error;
-   prepareToTerminate(conn, endpos, false, cur_record_lsn);
time_to_abort = true;
break;
}
}
 
+   if (time_to_abort)
+   prepareToTerminate(conn, endpos, false, cur_record_lsn);
+
res = PQgetResult(conn);
if (PQresultStatus(res) == PGRES_COPY_OUT)
{
@@ -657,6 +659,7 @@ static void
 sigexit_handler(SIGNAL_ARGS)
 {
time_to_abort = true;
+   interrupted = true;
 }
 
 /*
@@ -1031,6 +1034,8 @@ prepareToTerminate(PGconn *conn, XLogRecPtr endpos, bool 
keepalive, XLogRecPtr l
if (keepalive)
pg_log_info("end position %X/%X reached by keepalive",
LSN_FORMAT_ARGS(endpos));
+   else if (interrupted)
+   pg_log_info("interrupted after %X/%X", 
LSN_FORMAT_ARGS(lsn));
else
pg_log_info("end position %X/%X reached by WAL record 
at %X/%X",
LSN_FORMAT_ARGS(endpos), 
LSN_FORMAT_ARGS(lsn));


Re: explain_regress, explain(MACHINE), and default to explain(BUFFERS) (was: BUFFERS enabled by default in EXPLAIN (ANALYZE))

2022-10-20 Thread Justin Pryzby
Rebased.

BTW, I think it may be that the GUC should be marked PGDLLIMPORT ?
>From 12a605ca84bf21439e4ae51cc3f3a891b3cb4989 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 22 Feb 2020 21:17:10 -0600
Subject: [PATCH 1/7] Add GUC: explain_regress

This changes the defaults for explain to: costs off, timing off, summary off.
It'd be reasonable to use this for new regression tests which are not intended
to be backpatched.
---
 contrib/postgres_fdw/postgres_fdw.c |  2 +-
 src/backend/commands/explain.c  | 13 +++--
 src/backend/utils/misc/guc_tables.c | 13 +
 src/include/commands/explain.h  |  2 ++
 src/test/regress/expected/explain.out   |  3 +++
 src/test/regress/expected/inherit.out   |  2 +-
 src/test/regress/expected/stats_ext.out |  2 +-
 src/test/regress/pg_regress.c   |  3 ++-
 src/test/regress/sql/explain.sql|  4 
 src/test/regress/sql/inherit.sql|  2 +-
 src/test/regress/sql/stats_ext.sql  |  2 +-
 11 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 8d7500abfbd..3a4f56695b1 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -3138,7 +3138,7 @@ estimate_path_cost_size(PlannerInfo *root,
 		 * values, so don't request params_list.
 		 */
 		initStringInfo();
-		appendStringInfoString(, "EXPLAIN ");
+		appendStringInfoString(, "EXPLAIN (COSTS)");
 		deparseSelectStmtForRel(, root, foreignrel, fdw_scan_tlist,
 remote_conds, pathkeys,
 fpextra ? fpextra->has_final_sort : false,
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f86983c6601..373fde4d498 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -154,6 +154,7 @@ static void ExplainJSONLineEnding(ExplainState *es);
 static void ExplainYAMLLineStarting(ExplainState *es);
 static void escape_yaml(StringInfo buf, const char *str);
 
+bool explain_regress = false; /* GUC */
 
 
 /*
@@ -172,6 +173,7 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	ListCell   *lc;
 	bool		timing_set = false;
 	bool		summary_set = false;
+	bool		costs_set = false;
 
 	/* Parse options list. */
 	foreach(lc, stmt->options)
@@ -183,7 +185,11 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 		else if (strcmp(opt->defname, "verbose") == 0)
 			es->verbose = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "costs") == 0)
+		{
+			/* Need to keep track if it was explicitly set to ON */
+			costs_set = true;
 			es->costs = defGetBoolean(opt);
+		}
 		else if (strcmp(opt->defname, "buffers") == 0)
 			es->buffers = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "wal") == 0)
@@ -227,13 +233,16 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	 parser_errposition(pstate, opt->location)));
 	}
 
+	/* if the costs option was not set explicitly, set default value */
+	es->costs = (costs_set) ? es->costs : es->costs && !explain_regress;
+
 	if (es->wal && !es->analyze)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("EXPLAIN option WAL requires ANALYZE")));
 
 	/* if the timing was not set explicitly, set default value */
-	es->timing = (timing_set) ? es->timing : es->analyze;
+	es->timing = (timing_set) ? es->timing : es->analyze && !explain_regress;
 
 	/* check that timing is used with EXPLAIN ANALYZE */
 	if (es->timing && !es->analyze)
@@ -242,7 +251,7 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
  errmsg("EXPLAIN option TIMING requires ANALYZE")));
 
 	/* if the summary was not set explicitly, set default value */
-	es->summary = (summary_set) ? es->summary : es->analyze;
+	es->summary = (summary_set) ? es->summary : es->analyze && !explain_regress;
 
 	query = castNode(Query, stmt->query);
 	if (IsQueryIdEnabled())
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 05ab087934c..1d34e6bdb7b 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -36,6 +36,7 @@
 #include "catalog/namespace.h"
 #include "catalog/storage.h"
 #include "commands/async.h"
+#include "commands/explain.h"
 #include "commands/tablespace.h"
 #include "commands/trigger.h"
 #include "commands/user.h"
@@ -967,6 +968,18 @@ struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+
+	{
+		{"explain_regress", PGC_USERSET, DEVELOPER_OPTIONS,
+			gettext_noop("Change defaults of EXPLAIN to avoid unstable output."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_EXPLAIN
+		},
+		_regress,
+		false,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
 			gettext_noop("Enables genetic query optimization."),
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089aed..912bc9484ef 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -61,6 +61,8 @@ typedef struct 

Re: shared memory stats ideas

2022-10-20 Thread Kyotaro Horiguchi
Thanks for the nice list.

At Wed, 19 Oct 2022 12:37:30 -0700, Peter Geoghegan  wrote in 
> On Wed, Oct 19, 2022 at 11:19 AM Andres Freund  wrote:
> > We e.g. currently can't track the number of blocks written out in a 
> > relation,
> > because we don't have a Relation at that point. Nor can't we really get hold
> > of one, as the writeback can happen in a different database without access 
> > to
> > pg_class. Which is also the reason why the per-relation IO stats aren't
> > populated by the startup process, even though it'd obviously sometimes be
> > helpful to know where the most IO time is spent on a standby.
> >
> > There's also quite a bit of contortions of the bufmgr interface related to
> > this.
> 
> This seems related to the difficulty with distinguishing between
> internal pages and leaf pages (or some generalized AM-agnostic
> definition) in views like pg_statio_*_indexes.
>
> Differentiating between leaf pages and internal pages would definitely
> be a big improvement, but it's kind of an awkward thing to implement
> [1] because you have to somehow invent the general concept of multiple
> distinct kinds of buffers/pages within a relation. A lot of code would
> need to be taught about that.
> 
> This work would be more likely to actually happen if it was tied to
> some bigger project that promised other benefits.

Stickier buffers for index pages seem related. I haven't seen that work
even get started, though.  But this might be an additional reason for
starting it.

> > 2) Split index and table statistics into different types of stats
> 
> > This e.g. would allow us keep track of the number of index entries killed 
> > via
> > the killtuples mechanism, which in turn would allow us to more intelligently
> > decide whether we should vacuum indexes (often the most expensive part of
> > vacuum). In a lot of workload killtuples takes care of most of the cleanup,
> > but in others it doesn't do much.
> 
> While I do agree that it would be nice to record information about the
> number of deletion operations per index, that information will still
> be tricky to interpret and act upon relative to other kinds of
> information. As a general rule, we should prefer to focus on signals
> that show things really aren't going well in some specific and
> unambiguous way. Signals about things that are going well seem harder
> to work with -- they don't generalize well.

I think some statistics can be for purely internal purposes. We can
maintain some statistics hidden from users, if we want. (However, I
think people will eventually request that the numbers be revealed..)

> What I really mean here is this: I think that page split stuff is
> going to be much more interesting than index deletion stuff. Index
> deletion exists to prevent page splits. So it's natural to ask
> questions about where that seems like it ought to have happened, but
> didn't actually happen. This likely requires bucketing page splits
> into different categories (since most individual page splits aren't
> like that at all). Then it becomes much easier to (say) compare
> indexes on the same table -- the user can follow a procedure that is
> likely to generalize well to many different kinds of situations.
> 
> It's not completely clear how the bucketization would work. We ought
> to remember how many page splits were caused by INSERT statements
> rather than non-HOT UPDATEs, though -- that much seems likely to be
> very useful and actionable. The DBA can probably consume this
> information in a low context way by looking at the proportions of one
> kind of split to the other at the level of each index.
> 
> One type of split is mostly just a "cost of doing business" for B-Tree
> indexing. The other type really isn't.
>
> > 3) Maintain more historical statistics about vacuuming
> 
> > However, none of that allows the user to identify which relations are 
> > causing
> > autovacuum to not keep up. Even just keeping track of the the total time
> > autovacuum has spent on certain relations would be a significant 
> > improvement,
> > with more easily imaginable (total IO [time], autovacuum delay time, xid 
> > age).
> 
> With VACUUM in particular the picture over time can be far easier to
> work with than any given snapshot, from any single VACUUM operation.
> Focusing on how things seem to be changing can make it a lot easier to
> spot concerning trends, especially if you're a non-expert.

Agreed.  It seems like an easy (low-hanging) one. I'll give it a
try.  There should be some other numbers for which time-series stats
are useful.

> I would also expect a similar focus on the picture over time to be
> useful with the indexing stuff, for roughly the same underlying
> reasons.
> 
> [1] 
> https://postgr.es/m/CAA8Fd-pB=mr42YQuoaLPO_o2=xo9yjnjq23cyjdfwc8sxgm...@mail.gmail.com

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: fix stats_fetch_consistency value in postgresql.conf.sample

2022-10-20 Thread Peter Smith
Hi, I was hoping to use this patch in my other thread [1], but your
latest attachment is reported broken in cfbot [2]. Please rebase it.

--
[1]  GUC C var sanity check -
https://www.postgresql.org/message-id/CAHut%2BPs91wgaE9P7JORnK_dGq7zPB56WLDJwLNCLgGXxqrh9%3DQ%40mail.gmail.com
[2] cfbot fail - http://cfbot.cputube.org/patch_40_3736.log

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-20 Thread Michael Paquier
On Wed, Oct 19, 2022 at 10:45:44AM +0200, Drouvot, Bertrand wrote:
> Please find attached v1-0001-regex-handling-for-db-and-roles-in-hba.patch to
> implement regexes for databases and roles in hba.
> 
> It does also contain new regexes related TAP tests and doc updates.

Thanks for the updated version.  This is really easy to look at now.
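
For readers following along, this is the sort of entry the patch enables — a
sketch only, with hypothetical role and database names; per the proposed
syntax, a field beginning with '/' is treated as a regular expression:

    # TYPE  DATABASE     USER          ADDRESS      METHOD
    host    /^db_\d+$    /^user_\d+$   0.0.0.0/0    scram-sha-256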

> It relies on the refactoring made in fc579e11c6 (but changes the
> regcomp_auth_token() parameters so that it is now responsible for emitting
> the compilation error message (if any), to avoid code duplication in
> parse_hba_line() and parse_ident_line() for roles, databases and user name
> mapping).

@@ -652,13 +670,18 @@ check_role(const char *role, Oid roleid, List *tokens)
[...]
-   if (!tok->quoted && tok->string[0] == '+')
+   if (!token_has_regexp(tok))
{
Hmm.  Do we need token_has_regexp() here in all the cases?  We know
that the string can begin with a '+', hence it is not a regex.  The same
applies to the case of "all".  The remaining case is the one where
the user name matches the AuthToken string exactly, which should come
last as we want to treat anything beginning with a '/' as a regex.  It
seems like we could use an order like this?  Say:
if (!tok->quoted && tok->string[0] == '+')
//do
else if (token_is_keyword(tok, "all"))
//do
else if (token_has_regexp(tok))
//do regex compilation, handling failures
else if (token_matches(tok, role))
//do exact match

The same approach with keywords first, regex, and exact match could be
applied as well for the databases?  Perhaps it is just mainly a matter
of taste, and it depends on how much you want to prioritize the place
of the regex over the rest but that could make the code easier to
understand in the long run. This is a very sensitive code area, and
the case of physical walsenders (in short, specific process types)
requiring specific conditions is also something to take into account. 

foreach(tokencell, tokens)
{
-   parsedline->roles = lappend(parsedline->roles,
-   copy_auth_token(lfirst(tokencell)));
+   AuthToken  *tok = copy_auth_token(lfirst(tokencell));
+
+   /*
+* Compile a regex from the role token, if necessary.
+*/
+   if (regcomp_auth_token(tok, HbaFileName, line_num, err_msg, elevel))
+   return NULL;
+
+   parsedline->roles = lappend(parsedline->roles, tok);
}

Compiling the expressions for the user and database lists one-by-one
in parse_hba_line() as you do is correct.  However there is a gotcha
that you are forgetting here: the memory allocations done for the
regexp compilations are not linked to the memory context where each
line is processed (named hbacxt in load_hba()) and need a separate
cleanup.  In the same fashion as load_ident(), it seems to me that we
need two extra things for this patch:
- if !ok (see where we do MemoryContextDelete(hbacxt)), we need to go
through new_parsed_lines and free for each line the AuthTokens for the
database and user lists.
- if ok and new_parsed_lines != NIL, the same cleanup needs to
happen.
My guess is that you could do both the same way as load_ident() does,
keeping some symmetry between the two code paths.  Unifying both into
a common routine would be sort of annoying as HbaLines uses lists
within the lists of parsed lines, and IdentLine can have one at most
in each line.

I am wondering whether we should link the regexp code to not use
malloc(), actually..  This would require a separate analysis, though,
and I suspect that palloc() would be very expensive for this job. 

For now, I have made your last patch a bit shorter by applying the
refactoring of regcomp_auth_token() separately with a few tweaks to
the comments.
--
Michael




Re: XMAX_LOCK_ONLY and XMAX_COMMITTED (fk/multixact code)

2022-10-20 Thread Corey Huinker
On Tue, Sep 20, 2022 at 2:32 PM Nathan Bossart 
wrote:

> Here is a rebased patch for cfbot.
>
>
>
Applies, passes make check world.

Patch is straightforward, but the previous code is less so. It purported to
set XMAX_COMMITTED _or_ XMAX_INVALID, but never seemed to un-set
XMAX_COMMITTED; was that the source of the double-setting?


Re: Testing DDL Deparser

2022-10-20 Thread Runqi Tian
Hello:

Many thanks for providing feedbacks and sorry for late reply. I
updated the testing harness, please apply patches in [1] before apply
the attached patch.

>Not clear on what this means. Are you talking about ALTER TABLE
>subcommands? If so, then what you have to do (ISTM) is construct a
>single ALTER TABLE command containing several subcommands, when that is
>what is being deparsed; the reason for the user to submit several
>subcommands is to apply optimizations such as avoiding multiple table
>rewrites for several operations, when they can all share a single table
>rewrite. Therefore I think replaying such a command should try and do
>the same, if at all possible.

My question regarding subcommand is actually on commands other than
ALTER TABLE. Let me give an example (You can also find this example in
the patch), when command like

CREATE SCHEMA element_test CREATE TABLE foo (id int)

is caught by ddl_command_end trigger, function
pg_event_trigger_ddl_commands() will return 2 records which I called
as subcommands in the previous email. One is deparsed as

CREATE SCHEMA element_test,
another is deparsed as
CREATE TABLE element_test.foo (id pg_catalog.int4 ).

Is this behavior expected? I thought the deparser is designed to
deparse the entire command to a single command instead of dividing
them into 2 commands.

>The whole reason some objects are *not* dropped is precisely so that
>this type of testing has something to work on. If we find that there
>are object types that would be good to have in order to increase
>coverage, what we can do is change the .sql files to not drop those
>objects. This should be as minimal as possible

It seems that keeping separate test cases in deparser tests folder is
better than using the test cases in core regression tests folder
directly. I will write some test cases in the new deparser test
folder.

>Yeah, the idea is that a DDL processor needs to handle both the DROP and
>the other cases separately in these two event types. As I recall, we
>needed to handle DROP separately because there was no way to collect the
>necessary info otherwise.

I see, it seems that a function to deparse DROP command to JSON output
is needed but not provided yet. I implemented a function
deparse_drop_ddl() in the testing harness, maybe we could consider
exposing a function in engine to deparse DROP command as
deparse_ddl_command() does.

>No opinion on this. I don't think the deparser should be controlled by
>individual GUCs, since it will probably have multiple simultaneous uses.

I see. If the ddl command is not supported, the sub node will not
receive the ddl command, so the sub node will dump different results.
We are still able to detect the error, but because the error message
is not explicit, the developer needs to dig through the logs to find
the cause.

About the new implementation, I divided the 3 testing goals into 4 goals, from:

>1. The deparsed JSON output is as expected.
>2. The SQL commands re-formed from deparsed JSON should make the same schema 
>change as the original SQL commands.
>3. Any DDL change without modifying the deparser should fail the testing 
>harness.

updated to:

1, The deparsed JSON is the same as the expected string
2, The reformed SQL command is the same as the expected string
3, The original command and re-formed command can dump the same schema
4, Any DDL change without modifying the deparser should fail the
testing harness.

This new implementation achieves the first 3 goals. For the first 2
goals, the implementation is the same as test_ddl_deparse, I just
changed the notice content from command tag to deparsed JSON and
reformed SQL command. For the 3rd goal, it’s implemented using TAP
framework. The pub node will run the same test cases again, the SQL
files will be executed sequentially. After the execution of each SQL
file, the reformed commands will be applied to the sub node and the
dumped results from pub node and sub node will be compared.

For the next step, I’m going to support goal 4. Because if any DDL
change is made, the core regression test cases should also be changed.
We need to execute the core regression test cases with DDL event
triggers and the DDL deparser enabled. There are 2 solutions in my
mind:

1, Build DDL event triggers and DDL deparser into pg_regress tests so
that DDL commands in these tests can be captured and deparsed.
2, Let the goal 3 implementation, aka the TAP test to execute test
cases from pg_regress, if sub and pub node don’t dump the same
results, some DDL commands must be changed.

Solution 1 is lighter weight, as we only need to run pg_regress
once. Any other thoughts?

Next I’m going to add more test cases for each command type into this framework.

Regards,
Runqi Tian
Amazon RDS/Aurora for PostgreSQL

[1] 
https://www.postgresql.org/message-id/CALDaNm08gZq9a7xnsbaJMmHmi29_kbEuyShHHfxAKLXPh6btWQ%40mail.gmail.com


On Thu, Oct 6, 2022 at 12:22 PM Alvaro Herrera  wrote:
>
> Hello
>
> 

Re: Documentation refinement for Parallel Scans

2022-10-20 Thread David Rowley
On Thu, 20 Oct 2022 at 19:33, David Rowley  wrote:
> I'll push this soon if nobody has any other wording suggestions.

Pushed.

Thanks for the report.

David




Re: has_privs_of_role vs. is_member_of_role, redux

2022-10-20 Thread Jeff Davis
On Mon, 2022-09-26 at 15:40 -0400, Stephen Frost wrote:
> Predefined roles are special in that they should GRANT just the
> privileges that the role is described to GRANT and that users really
> shouldn't be able to SET ROLE to them nor should they be allowed to
> own
> objects, or at least that's my general feeling on them.

What about granting privileges to others? I don't think that makes
sense for a predefined role, either, because then they'd own a bunch of
grants, which is as awkward as owning objects.

> If an administrator doesn't wish for a user to have the privileges
> provided by the predefined role by default, they should be able to
> set
> that up by creating another role who has that privilege which the
> user
> is able to SET ROLE to.

And that other role could be used for grants, if needed, too.

But I don't think we need to special-case predefined roles though. I
think a lot of administrators would like to declare some roles that are
just a collection of inheritable privileges.


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS






Re: Allow WindowFuncs prosupport function to use more optimal WindowClause options

2022-10-20 Thread David Rowley
On Thu, 13 Oct 2022 at 13:34, David Rowley  wrote:
> So it looks like the same can be done for rank() and dense_rank() too.
> I've added support for those in the attached.

The attached adds support for percent_rank(), cume_dist() and ntile().

David
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5d0fd6e072..fa28fb539a 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -38,6 +38,7 @@
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "nodes/supportnodes.h"
 #ifdef OPTIMIZER_DEBUG
 #include "nodes/print.h"
 #endif
@@ -207,6 +208,8 @@ static PathTarget *make_partial_grouping_target(PlannerInfo *root,
 												PathTarget *grouping_target,
 												Node *havingQual);
 static List *postprocess_setop_tlist(List *new_tlist, List *orig_tlist);
+static void optimize_window_clauses(PlannerInfo *root,
+									WindowFuncLists *wflists);
 static List *select_active_windows(PlannerInfo *root, WindowFuncLists *wflists);
 static PathTarget *make_window_input_target(PlannerInfo *root,
 											PathTarget *final_target,
@@ -1422,7 +1425,16 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 			wflists = find_window_functions((Node *) root->processed_tlist,
 											list_length(parse->windowClause));
 			if (wflists->numWindowFuncs > 0)
+			{
+				/*
+				 * See if any modifications can be made to each WindowClause
+				 * to allow the executor to execute the WindowFuncs more
+				 * quickly.
+				 */
+				optimize_window_clauses(root, wflists);
+
 				activeWindows = select_active_windows(root, wflists);
+			}
 			else
 				parse->hasWindowFuncs = false;
 		}
@@ -5391,6 +5403,85 @@ postprocess_setop_tlist(List *new_tlist, List *orig_tlist)
 	return new_tlist;
 }
 
+/*
+ * optimize_window_clauses
+ *		Call each WindowFunc's prosupport function to see if we're able to
+ *		make any adjustments to any of the WindowClause's so that the executor
+ *		can execute the window functions in a more optimal way.
+ *
+ * Currently we only allow adjustments to the WindowClause's frameOptions.  We
+ * may allow more things to be done here in the future.
+ */
+static void
+optimize_window_clauses(PlannerInfo *root, WindowFuncLists *wflists)
+{
+	List	   *windowClause = root->parse->windowClause;
+	ListCell   *lc;
+
+	foreach(lc, windowClause)
+	{
+		WindowClause *wc = lfirst_node(WindowClause, lc);
+		ListCell   *lc2;
+		int			optimizedFrameOptions = 0;
+
+		Assert(wc->winref <= wflists->maxWinRef);
+
+		/* skip any WindowClauses that have no WindowFuncs */
+		if (wflists->windowFuncs[wc->winref] == NIL)
+			continue;
+
+		foreach(lc2, wflists->windowFuncs[wc->winref])
+		{
+			SupportRequestOptimizeWindowClause req;
+			SupportRequestOptimizeWindowClause *res;
+			WindowFunc *wfunc = lfirst_node(WindowFunc, lc2);
+			Oid			prosupport;
+
+			prosupport = get_func_support(wfunc->winfnoid);
+
+			/* Check if there's a support function for 'wfunc' */
+			if (!OidIsValid(prosupport))
+				break;			/* can't optimize this WindowClause */
+
+			req.type = T_SupportRequestOptimizeWindowClause;
+			req.window_clause = wc;
+			req.window_func = wfunc;
+			req.frameOptions = wc->frameOptions;
+
+			/* call the support function */
+			res = (SupportRequestOptimizeWindowClause *)
+				DatumGetPointer(OidFunctionCall1(prosupport,
+												 PointerGetDatum(&req)));
+
+			/*
+			 * Skip to next WindowClause if the support function does not
+			 * support this request type.
+			 */
+			if

Re: has_privs_of_role vs. is_member_of_role, redux

2022-10-20 Thread Jeff Davis
On Mon, 2022-09-19 at 15:32 -0400, Robert Haas wrote:
> One could take the view that the issue here is that
> pg_read_all_settings shouldn't have the right to create objects in
> the
> first place, and that this INHERIT vs. SET ROLE distinction is just a
> distraction. However, that would require accepting the idea that it's
> possible for a role to lack privileges granted to PUBLIC, which also
> sounds pretty unsatisfying. On the whole, I'm inclined to think it's
> reasonable to suppose that if you want to grant a role to someone
> without letting them create objects owned by that role, it should be
> a
> role that doesn't own any existing objects either. Essentially,
> that's
> legislating that predefined roles should be minimally privileged:
> they
> should hold the ability to do whatever it is that they are there to
> do
> (like read all settings) but not have any other privileges (like the
> ability to do stuff to objects they own).

I like this approach -- the idea that you can create a role that can't
own anything, can't create anything, and to which nobody else can "SET
ROLE".

Creating a "virtual" role like that feels much more declarative and
easy to document: "this isn't a real user, it's just a collection of
inheritable privileges". Even superusers couldn't "SET ROLE
pg_read_all_settings" or "OWNER TO pg_signal_backend".

I wouldn't call it "minimally privileged" (which feels wrong because it
wouldn't even have privileges on PUBLIC, as you say); I'd just say that
it's a type of role where those things just don't make sense.

-- 
Jeff Davis
PostgreSQL Contributor Team - AWS






Re: Make finding openssl program a configure or meson option

2022-10-20 Thread Peter Eisentraut

On 19.10.22 05:06, Michael Paquier wrote:

Looks fine as a whole, except for one nit.

src/test/ssl/t/001_ssltests.pl: warn 'couldn\'t run `openssl x509` to get client cert serialno';
Perhaps this warning should mention $ENV{OPENSSL} instead?


Committed with that change.





Re: cross-platform pg_basebackup

2022-10-20 Thread Andrew Dunstan


On 2022-10-20 Th 14:47, Robert Haas wrote:
> On Thu, Oct 20, 2022 at 1:28 PM Tom Lane  wrote:
>> Robert Haas  writes:
>>> Cool. Here's a patch.
>> LGTM, except I'd be inclined to ensure that all the macros
>> are function-style, ie
>>
>> +#define IS_DIR_SEP(ch) IS_NONWINDOWS_DIR_SEP(ch)
>>
>> not just
>>
>> +#define IS_DIR_SEP IS_NONWINDOWS_DIR_SEP
>>
>> I don't recall the exact rules, but I know that the second style
>> can lead to expanding the macro in more cases, which we likely
>> don't want.  It also seems like better documentation to show
>> the expected arguments.
> OK, thanks. v2 attached.
>


Looks good.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2022-10-20 Thread Peter Geoghegan
On Thu, Oct 20, 2022 at 11:09 AM Jeff Davis  wrote:
> The terminology is getting slightly confusing here: by
> "antiwraparound", you mean that it's not skipping unfrozen pages, and
> therefore is able to advance relfrozenxid. Whereas the
> PROC_VACUUM_FOR_WRAPAROUND is the same thing, except done with greater
> urgency because wraparound is imminent. Right?

Not really.

I started this thread to discuss a behavior in autovacuum.c and proc.c
(the autocancellation behavior), which is, strictly speaking, not
related to the current vacuumlazy.c behavior we call aggressive mode
VACUUM. Various hackers have in the past described antiwraparound
autovacuum as "implying aggressive", which makes sense; what's the
point in doing an antiwraparound autovacuum that can almost never
advance relfrozenxid?

It is nevertheless true that antiwraparound autovacuum is an
independent behavior to aggressive VACUUM. The former is an autovacuum
thing, and the latter is a VACUUM thing. That's just how it works,
mechanically.

If this division seems artificial or pedantic to you, then consider
the fact that you can quite easily get a non-aggressive antiwraparound
autovacuum by using the storage option called
autovacuum_freeze_max_age (instead of the GUC):

https://postgr.es/m/CAH2-Wz=DJAokY_GhKJchgpa8k9t_H_OVOvfPEn97jGNr9W=d...@mail.gmail.com
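
Concretely, that means setting the reloption rather than the GUC — a sketch
with a hypothetical table name:

```sql
-- Per-table storage option; if set well below vacuum_freeze_table_age,
-- autovacuum can launch an antiwraparound autovacuum that is
-- nevertheless non-aggressive.
ALTER TABLE static_tbl SET (autovacuum_freeze_max_age = 100000000);
```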

This is even a case where we'll output a distinct description in the
server log when autovacuum logging is enabled and gets triggered. So
while there may be no point in an antiwraparound autovacuum that is
non-aggressive, that doesn't stop them from happening. Regardless of
whether or not that's an intended behavior, that's just how the
mechanism has been constructed.

> > There is no inherent reason why we have to do both
> > things at exactly the same XID-age-wise time. But there is reason to
> > think that doing so could make matters worse rather than better [1].
>
> Can you explain?

Why should the special autocancellation behavior for antiwraparound
autovacuums kick in at exactly the same point that we first launch an
antiwraparound autovacuum? Maybe that aggressive intervention will be
needed, in the end, but why start there?

With my patch series, antiwraparound autovacuums still occur, but
they're confined to things like static tables -- things that are
pretty much edge cases. They still need to behave sensibly (i.e.
reliably advance relfrozenxid based on some principled approach), but
now they're more like "an autovacuum that happens because no other
condition triggered an autovacuum". To some degree this is already the
case, but I'd like to be more deliberate about it.

Leaving my patch series aside, I still don't think that it makes sense
to make it impossible to auto-cancel antiwraparound autovacuums,
across the board, regardless of the underlying table age. We still
need something like that, but why not give a still-cancellable
autovacuum worker a chance to resolve the problem? Why take a risk of
causing much bigger problems (e.g., blocking automated DDL that blocks
simple SELECT queries) before the point that that starts to look like
the lesser risk (compared to hitting xidStopLimit)?

--
Peter Geoghegan




Re: Avoid memory leaks during base backups

2022-10-20 Thread Robert Haas
On Thu, Oct 20, 2022 at 1:35 PM Bharath Rupireddy
 wrote:
> I think elsewhere in the code we reset dangling pointers either ways -
> before or after deleting/resetting memory context. But placing them
> before would give us extra safety in case memory context
> deletion/reset fails. Not sure what's the best way.

I think it's OK to assume that deallocating memory will always
succeed, so it doesn't matter whether you do it just before or just
after that. But it's not OK to assume that *allocating* memory will
always succeed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: cross-platform pg_basebackup

2022-10-20 Thread Robert Haas
On Thu, Oct 20, 2022 at 1:28 PM Tom Lane  wrote:
> Robert Haas  writes:
> > Cool. Here's a patch.
>
> LGTM, except I'd be inclined to ensure that all the macros
> are function-style, ie
>
> +#define IS_DIR_SEP(ch) IS_NONWINDOWS_DIR_SEP(ch)
>
> not just
>
> +#define IS_DIR_SEP IS_NONWINDOWS_DIR_SEP
>
> I don't recall the exact rules, but I know that the second style
> can lead to expanding the macro in more cases, which we likely
> don't want.  It also seems like better documentation to show
> the expected arguments.

OK, thanks. v2 attached.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v2-0001-pg_basebackup-Fix-cross-platform-tablespace-reloc.patch
Description: Binary data


Re: [PoC] Let libpq reject unexpected authentication requests

2022-10-20 Thread Jacob Champion
On Wed, Oct 12, 2022 at 9:40 AM Jacob Champion  wrote:
> On 10/5/22 06:33, Peter Eisentraut wrote:
> > I think it would be good to put some provisions in place here, even if
> > they are elementary.  Otherwise, there will be a significant burden on
> > the person who implements the next SASL method (i.e., you ;-) ) to
> > figure that out then.
>
> Sounds good, I'll work on that. v10 does not yet make changes in this area.

v11 makes an attempt at this (see 0003), using the proposed string list.

Personally I'm not happy with the amount of complexity it adds in
exchange for flexibility we can't use yet. Maybe there's a way to
simplify it, but I think the two-tiered approach of the patch has to
remain, unless we find a way to move SASL mechanism selection to a
different part of the code. I'm not sure that'd be helpful.

Maybe I should just add a basic Assert here, to trip if someone adds a
new SASL mechanism, and point that lucky person to this thread with a
comment?

--Jacob
From d747093ac9fe9875ae1f9cc5273e6133627f9691 Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Tue, 18 Oct 2022 16:55:36 -0700
Subject: [PATCH v11 3/3] require_auth: decouple SASL and SCRAM

Rather than assume that an AUTH_REQ_SASL* code refers to SCRAM-SHA-256,
future-proof by separating the single allowlist into a list of allowed
authentication request codes and a list of allowed SASL mechanisms.

The require_auth check is now separated into two tiers. The
AUTH_REQ_SASL* codes are always allowed. If the server sends one,
responsibility for the check then falls to pg_SASL_init(), which
compares the selected mechanism against the list of allowed mechanisms.
(Other SCRAM code is already responsible for rejecting unexpected or
out-of-order AUTH_REQ_SASL_* codes, so that's not explicitly handled
with this addition.)

Since there's only one recognized SASL mechanism, conn->sasl_mechs
currently only points at static hardcoded lists. Whenever a second
mechanism is added, the list will need to be managed dynamically.
---
 src/interfaces/libpq/fe-auth.c| 35 +++
 src/interfaces/libpq/fe-connect.c | 42 +++
 src/interfaces/libpq/libpq-int.h  |  2 ++
 src/test/authentication/t/001_password.pl | 10 +++---
 src/test/ssl/t/002_scram.pl   |  6 
 5 files changed, 84 insertions(+), 11 deletions(-)

diff --git a/src/interfaces/libpq/fe-auth.c b/src/interfaces/libpq/fe-auth.c
index 295b978525..24c31ae624 100644
--- a/src/interfaces/libpq/fe-auth.c
+++ b/src/interfaces/libpq/fe-auth.c
@@ -536,6 +536,41 @@ pg_SASL_init(PGconn *conn, int payloadlen)
 		goto error;
 	}
 
+	/*
+	 * Before going ahead with any SASL exchange, ensure that the user has
+	 * allowed (or, alternatively, has not forbidden) this particular mechanism.
+	 *
+	 * In a hypothetical future where a server responds with multiple SASL
+	 * mechanism families, we would need to instead consult this list up above,
+	 * during mechanism negotiation. We don't live in that world yet. The server
+	 * presents one auth method and we decide whether that's acceptable or not.
+	 */
+	if (conn->sasl_mechs)
+	{
+		const char **mech;
+		bool		found = false;
+
+		Assert(conn->require_auth);
+
+		for (mech = conn->sasl_mechs; *mech; mech++)
+		{
+			if (strcmp(*mech, selected_mechanism) == 0)
+			{
+found = true;
+break;
+			}
+		}
+
+		if ((conn->sasl_mechs_denied && found)
+			|| (!conn->sasl_mechs_denied && !found))
+		{
+			appendPQExpBuffer(&conn->errorMessage,
+			  libpq_gettext("auth method \"%s\" requirement failed: server requested unacceptable SASL mechanism \"%s\"\n"),
+			  conn->require_auth, selected_mechanism);
+			goto error;
+		}
+	}
+
 	if (conn->channel_binding[0] == 'r' &&	/* require */
 		strcmp(selected_mechanism, SCRAM_SHA_256_PLUS_NAME) != 0)
 	{
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index eaca8cee1f..be002b4fda 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -1259,11 +1259,25 @@ connectOptions2(PGconn *conn)
 		bool		first, more;
 		bool		negated = false;
 
+		static const uint32 default_methods = (
+			1 << AUTH_REQ_SASL
+			| 1 << AUTH_REQ_SASL_CONT
+			| 1 << AUTH_REQ_SASL_FIN
+		);
+		static const char *no_mechs[] = { NULL };
+
 		/*
-		 * By default, start from an empty set of allowed options and add to it.
+		 * By default, start from a minimum set of allowed options and add to
+		 * it.
+		 *
+		 * NB: The SASL method codes are always "allowed" here. If the server
+		 * requests SASL auth, pg_SASL_init() will enforce adherence to the
+		 * sasl_mechs list, which by default is empty.
 		 */
 		conn->auth_required = true;
-		conn->allowed_auth_methods = 0;
+		conn->allowed_auth_methods = default_methods;
+		conn->sasl_mechs = no_mechs;
+		conn->sasl_mechs_denied = false;
 
 		for (first = true, more = true; more; first = false)
 		{
@@ -1289,6 +1303,9 @@ connectOptions2(PGconn *conn)
 	 */
 	

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2022-10-20 Thread Jeff Davis
On Wed, 2022-10-19 at 14:58 -0700, Peter Geoghegan wrote:
> Why should the PROC_VACUUM_FOR_WRAPAROUND behavior happen on
> *exactly*
> the same timeline as the one used to launch an antiwraparound
> autovacuum, though?

The terminology is getting slightly confusing here: by
"antiwraparound", you mean that it's not skipping unfrozen pages, and
therefore is able to advance relfrozenxid. Whereas the
PROC_VACUUM_FOR_WRAPAROUND is the same thing, except done with greater
urgency because wraparound is imminent. Right?

> There is no inherent reason why we have to do both
> things at exactly the same XID-age-wise time. But there is reason to
> think that doing so could make matters worse rather than better [1].

Can you explain?

Regards,
Jeff Davis





Re: Avoid memory leaks during base backups

2022-10-20 Thread Alvaro Herrera
On 2022-Oct-20, Bharath Rupireddy wrote:

> I think elsewhere in the code we reset dangling pointers either ways -
> before or after deleting/resetting memory context. But placing them
> before would give us extra safety in case memory context
> deletion/reset fails. Not sure what's the best way. However, I'm
> nullifying the dangling pointers after deleting/resetting memory
> context.

I agree that's a good idea, and the patch looks good to me, but I don't
think asserting that they are null afterwards is useful.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: Avoid memory leaks during base backups

2022-10-20 Thread Bharath Rupireddy
On Thu, Oct 20, 2022 at 9:48 PM Robert Haas  wrote:
>
> On Thu, Oct 20, 2022 at 6:47 AM Bharath Rupireddy
>  wrote:
> > I tried implementing this, please see the attached v7 patch.
>
> I haven't checked this in detail but it looks much more reasonable in
> terms of code footprint. However, we should, I think, set backup_state
> = NULL and tablespace_map = NULL before deleting the memory context.
> As you have it, I believe that if backup_state = (BackupState *)
> palloc0(sizeof(BackupState)) fails -- say due to running out of memory
> -- then those variables could end up pointing to garbage because the
> context had already been reset before initializing them. I don't know
> whether it's possible for that to cause any concrete harm, but nulling
> out the pointers seems like cheap insurance.

I think elsewhere in the code we reset dangling pointers either ways -
before or after deleting/resetting memory context. But placing them
before would give us extra safety in case memory context
deletion/reset fails. Not sure what's the best way. However, I'm
nullifying the dangling pointers after deleting/resetting memory
context.
MemoryContextDelete(Conf->buildCxt);
MemoryContextDelete(PostmasterContext);
MemoryContextDelete(rulescxt);

Please see the attached v8 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v8-0001-Avoid-memory-leaks-during-backups-using-SQL-calla.patch
Description: Binary data


Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2022-10-20 Thread Andres Freund
Hi,

- we shouldn't do pgstat_count_io_op() while the buffer header lock is held,
  if possible.

  I wonder if we should add a "source" output argument to
  StrategyGetBuffer(). Then nearly all the counting can happen in
  BufferAlloc().

- "repossession" is a very unintuitive name for me. If we want something like
  it, can't we just name it reuse_failed or such?

- Wonder if the column names should be reads, writes, extends, etc instead of
  the current naming pattern

- Is it actually correct to count evictions in StrategyGetBuffer()? What if we
  then decide to not use that buffer in BufferAlloc()? Yes, that'll be counted
  via rejected, but that still leaves the eviction count to be "misleading"?


On 2022-10-19 15:26:51 -0400, Melanie Plageman wrote:
> I have made some major changes in this area to make the columns more
> useful. I have renamed and split "clocksweeps". It is now "evicted" and
> "freelist acquired". This makes it clear when a block must be evicted
> from a shared buffer, which may help to identify misconfiguration
> of shared buffers.

I'm not sure freelist acquired is really that useful? If we don't add it, we
should however definitely not count buffers from the freelist as evictions.


> There is some nuance here that I tried to make clear in the docs.
> "freelist acquired" in a shared context is straightforward.
> "freelist acquired" in a strategy context is counted when a shared
> buffer is added to the strategy ring (not when it is reused).

Not sure what the second half here means - why would a buffer that's not from
the freelist ever be counted as being from the freelist?


> "freelist_acquired" is confusing for local buffers but I wanted to
> distinguish between reuse/eviction of local buffers and initial
> allocation. "freelist_acquired" seemed more fitting because there is a
> clocksweep to find a local buffer and if it hasn't been allocated yet it
> is allocated in a place similar to where shared buffers acquire a buffer
> from the freelist. If I didn't count it here, I would need to make a new
> column only for local buffers called "allocated" or something like that.

I think you're making this too granular. We need to have more detail than
today. But we don't necessarily need to catch every nuance.


> I chose not to call "evicted" "sb_evicted"
> because then we would need a separate "local_evicted". I could instead
> make "local_evicted", "sb_evicted", and rename "reused" to
> "strat_evicted". If I did that we would end up with separate columns for
> every IO Context describing behavior when a buffer is initially acquired
> vs when it is reused.
>
> It would look something like this:
>
> shared buffers:
> initial: freelist_acquired
> reused: sb_evicted
>
> local buffers:
> initial: allocated
> reused: local_evicted
>
> strategy buffers:
> initial: sb_evicted | freelist_acquired
> reused: strat_evicted
> replaced: sb_evicted | freelist_acquired
>
> This seems not too bad at first, but if you consider that later we will
> add other kinds of IO -- eg WAL IO or temporary file IO, we won't be
> able to use these existing columns and will need to add even more
> columns describing the exact behavior in those cases.

I think it's clearly not the right direction.



> I have also added the columns "repossessed" and "rejected". "rejected"
> is when a bulkread rejects a strategy buffer because it is dirty and
> requires flush. Seeing a lot of rejections could indicate you need to
> vacuum. "repossessed" is the number of times a strategy buffer was
> pinned or in use by another backend and had to be removed from the
> strategy ring and replaced with a new shared buffer. This gives you some
> indication that there is contention on blocks recently used by a
> strategy.

I don't immediately see a real use case for repossessed. Why isn't it
sufficient to count it as part of rejected?

Greetings,

Andres Freund




Re: cross-platform pg_basebackup

2022-10-20 Thread Tom Lane
Robert Haas  writes:
> Cool. Here's a patch.

LGTM, except I'd be inclined to ensure that all the macros
are function-style, ie

+#define IS_DIR_SEP(ch) IS_NONWINDOWS_DIR_SEP(ch)

not just

+#define IS_DIR_SEP IS_NONWINDOWS_DIR_SEP

I don't recall the exact rules, but I know that the second style
can lead to expanding the macro in more cases, which we likely
don't want.  It also seems like better documentation to show
the expected arguments.

regards, tom lane




Re: cross-platform pg_basebackup

2022-10-20 Thread Robert Haas
On Thu, Oct 20, 2022 at 12:17 PM Tom Lane  wrote:
> Robert Haas  writes:
> > However, I think we could relax the check a little bit, something
> > along the lines of !is_nonwindows_absolute_path(cell->old_dir) &&
> > !is_windows_absolute_path(dir). We can't actually know whether the
> > remote side is Windows or non-Windows, but if the string we're given
> > is plausibly an absolute path under either set of conventions, it's
> > probably fine to just search the list for it and see if it shows up.
>
> Seems reasonable.

Cool. Here's a patch.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v1-0001-pg_basebackup-When-Tx-y-is-used-weaken-absolute-p.patch
Description: Binary data


Re: Fix GetWALAvailability function code comments for WALAVAIL_REMOVED return value

2022-10-20 Thread sirisha chamarthi
On Wed, Oct 19, 2022 at 7:59 PM Kyotaro Horiguchi 
wrote:

> At Wed, 19 Oct 2022 13:06:08 +0530, Bharath Rupireddy <
> bharath.rupireddyforpostg...@gmail.com> wrote in
> > On Wed, Oct 19, 2022 at 12:39 PM sirisha chamarthi
> >  wrote:
> > >
> > > The current code comment says that the replication stream on a slot
> with the given targetLSN can't continue after a restart but even without a
> restart the stream cannot continue. The slot is invalidated and the
> walsender process is terminated by the checkpoint process. Attaching a
> small patch to fix the comment.
>
> In short, the proposed fix alone seems fine to me. If we want to show
> further details, I would add a bit as follows.
>
> | * * WALAVAIL_REMOVED means it has been removed. A replication stream on
> | *   a slot with this LSN cannot continue.  Note that the affected
> | *   processes have been terminated by checkpointer, too.
>

Thanks for your comments! Attached the patch with your suggestions.

Thanks,
Sirisha


v2-0001-Fix-GetWALAvailability-function-code-comments.patch
Description: Binary data


Re: Nicely exiting PG_TRY and PG_CATCH

2022-10-20 Thread Mikhail Gribkov
> Yeah, you can't return or goto out of the PG_TRY part.

So these early returns would be a real problem if those checks ever fired.
(Sorry for such a delayed answer.)

Then we need to fix it. Attached is a minimal patch, which changes nothing
except for correct PG_TRY exiting.
Isn't it better this way?

CCing to Peter Eisentraut, whose patch originally introduced these returns.
Peter, will such a patch work somehow against your initial idea?

--
 best regards,
Mikhail A. Gribkov

e-mail: youzh...@gmail.com
http://www.flickr.com/photos/youzhick/albums
http://www.strava.com/athletes/5085772
phone: +7(916)604-71-12
Telegram: @youzhick



On Sat, Oct 8, 2022 at 12:19 AM Tom Lane  wrote:

> Mikhail Gribkov  writes:
> > Usually it's not a good idea to exit PG_TRY() block via return statement.
> > Otherwise it would leave PG_exception_stack global variable in a wrong
> > state and next ereport() will jump to some junk address.
>
> Yeah, you can't return or goto out of the PG_TRY part.
>
> > Another suspicious case is PG_CATCH block in jsonb_plpython.c:
>
> This should be OK.  The PG_CATCH and PG_FINALLY macros are set up so that
> we've fully restored that state *before* we execute any of the
> error-handling code.  It would be basically impossible to have a guarantee
> that CATCH blocks never throw errors; they'd be so restricted as to be
> near useless, like signal handlers.
>
> regards, tom lane
>
diff --git a/src/pl/plpython/plpy_exec.c b/src/pl/plpython/plpy_exec.c
index 923703535a..0c8550e53c 100644
--- a/src/pl/plpython/plpy_exec.c
+++ b/src/pl/plpython/plpy_exec.c
@@ -418,7 +418,7 @@ PLy_function_build_args(FunctionCallInfo fcinfo, PLyProcedure *proc)
 	{
 		args = PyList_New(proc->nargs);
 		if (!args)
-			return NULL;
+			goto end_of_try;
 
 		for (i = 0; i < proc->nargs; i++)
 		{
@@ -458,6 +458,7 @@ PLy_function_build_args(FunctionCallInfo fcinfo, PLyProcedure *proc)
 			/* cache the output conversion functions */
 			PLy_output_setup_record(>result, desc, proc);
 		}
+end_of_try:
 	}
 	PG_CATCH();
 	{
@@ -694,7 +695,7 @@ PLy_trigger_build_args(FunctionCallInfo fcinfo, PLyProcedure *proc, HeapTuple *r
 	{
 		pltdata = PyDict_New();
 		if (!pltdata)
-			return NULL;
+			goto end_of_try;
 
 		pltname = PLyUnicode_FromString(tdata->tg_trigger->tgname);
 		PyDict_SetItemString(pltdata, "name", pltname);
@@ -839,7 +840,8 @@ PLy_trigger_build_args(FunctionCallInfo fcinfo, PLyProcedure *proc, HeapTuple *r
 			if (!pltargs)
 			{
 Py_DECREF(pltdata);
-return NULL;
+pltdata = NULL;
+goto end_of_try;
 			}
 			for (i = 0; i < tdata->tg_trigger->tgnargs; i++)
 			{
@@ -858,6 +860,7 @@ PLy_trigger_build_args(FunctionCallInfo fcinfo, PLyProcedure *proc, HeapTuple *r
 		}
 		PyDict_SetItemString(pltdata, "args", pltargs);
 		Py_DECREF(pltargs);
+end_of_try:
 	}
 	PG_CATCH();
 	{


Re: Avoid memory leaks during base backups

2022-10-20 Thread Robert Haas
On Thu, Oct 20, 2022 at 6:47 AM Bharath Rupireddy
 wrote:
> I tried implementing this, please see the attached v7 patch.

I haven't checked this in detail but it looks much more reasonable in
terms of code footprint. However, we should, I think, set backup_state
= NULL and tablespace_map = NULL before deleting the memory context.
As you have it, I believe that if backup_state = (BackupState *)
palloc0(sizeof(BackupState)) fails -- say due to running out of memory
-- then those variables could end up pointing to garbage because the
context had already been reset before initializing them. I don't know
whether it's possible for that to cause any concrete harm, but nulling
out the pointers seems like cheap insurance.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: cross-platform pg_basebackup

2022-10-20 Thread Tom Lane
Robert Haas  writes:
> However, I think we could relax the check a little bit, something
> along the lines of !is_nonwindows_absolute_path(cell->old_dir) &&
> !is_windows_absolute_path(dir). We can't actually know whether the
> remote side is Windows or non-Windows, but if the string we're given
> is plausibly an absolute path under either set of conventions, it's
> probably fine to just search the list for it and see if it shows up.

Seems reasonable.

> This would have the disadvantage that if a Linux user creates a
> tablespace directory inside $PGDATA and gives it a name like
> /home/rhaas/pgdata/C:\Program Files\PostgreSQL\Data, and then attempts
> a backup with '-TC:\Program Files\PostgreSQL\Data=/tmp/ts1' it will
> not relocate the tablespace, yet the user won't get a message
> explaining why. I'm prepared to dismiss that scenario as "not a real
> use case".

Agreed.

regards, tom lane




cross-platform pg_basebackup

2022-10-20 Thread Robert Haas
Suppose that, for some reason, you want to use pg_basebackup on a
Linux machine to back up a database cluster on a Windows machine.
Suppose further that you attempt to use the -T option. Then you might
run afoul of this check:

/*
 * This check isn't absolutely necessary.  But all tablespaces are created
 * with absolute directories, so specifying a non-absolute path here would
 * just never match, possibly confusing users.  It's also good to be
 * consistent with the new_dir check.
 */
if (!is_absolute_path(cell->old_dir))
    pg_fatal("old directory is not an absolute path in tablespace mapping: %s",
             cell->old_dir);

The problem is that the definition of is_absolute_path() here differs
depending on whether you are on Windows or not. So this code is, I
think, subtly incorrect. What it is testing is whether the
user-specified pathname is an absolute pathname *on the local machine*
whereas what it should be testing is whether the user-specified
pathname is an absolute pathname *on the remote machine*. There's no
problem if both sides are Windows or neither side is Windows, but if
the remote side is and the local side isn't, then something like
-TC:\foo=/backup/foo will fail. As far as I know, there's no reason
why that shouldn't be permitted to work.

What this check is actually intending to prevent, I believe, is
something like -T../mytablespace=/bkp/ts1, because that wouldn't
actually work: the value in the list will be an absolute path. The
tablespace wouldn't get remapped, and the user might be confused about
why it didn't, so it is good that we tell them what they did wrong.
However, I think we could relax the check a little bit, something
along the lines of !is_nonwindows_absolute_path(cell->old_dir) &&
!is_windows_absolute_path(dir). We can't actually know whether the
remote side is Windows or non-Windows, but if the string we're given
is plausibly an absolute path under either set of conventions, it's
probably fine to just search the list for it and see if it shows up.

This would have the disadvantage that if a Linux user creates a
tablespace directory inside $PGDATA and gives it a name like
/home/rhaas/pgdata/C:\Program Files\PostgreSQL\Data, and then attempts
a backup with '-TC:\Program Files\PostgreSQL\Data=/tmp/ts1' it will
not relocate the tablespace, yet the user won't get a message
explaining why. I'm prepared to dismiss that scenario as "not a real
use case".

Thoughts?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: date_part/extract parse curiosity

2022-10-20 Thread Japin Li


On Thu, 20 Oct 2022 at 22:12, Tom Lane  wrote:
> Japin Li  writes:
>> On Thu, 20 Oct 2022 at 20:45, Erik Rijkers  wrote:
>>> I noticed that
>>> select date_part('millennium', now()); --> 3
>>> 
>>> will execute also, unperturbed, in this form:
>>> select date_part('millennium x', now()); --> 3
>
>> Maybe we should document this.  I'd be inclined to change the code to
>> match only the exact valid field names.
>
> I think changing this behavior has a significant chance of drawing
> complaints and zero chance of making anyone happier.
>

Maybe.

> The current state of affairs (including the lack of unnecessary
> documentation detail) is likely quite intentional.
>

I'm curious: why not document this?

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: ccache, MSVC, and meson

2022-10-20 Thread Justin Pryzby
Note that ccache 4.7 was released (and also uploaded to chocolatey).
That supports depend mode with MSVC.

PCH made cache misses a lot less slow.  However, I still haven't found
any way to improve compilation speed much on cirrusci...

-- 
Justin




Re: date_part/extract parse curiosity

2022-10-20 Thread Tom Lane
Japin Li  writes:
> On Thu, 20 Oct 2022 at 20:45, Erik Rijkers  wrote:
>> I noticed that
>> select date_part('millennium', now()); --> 3
>> 
>> will execute also, unperturbed, in this form:
>> select date_part('millennium x', now()); --> 3

> Maybe we should document this.  I'd be inclined to change the code to
> match only the exact valid field names.

I think changing this behavior has a significant chance of drawing
complaints and zero chance of making anyone happier.

The current state of affairs (including the lack of unnecessary
documentation detail) is likely quite intentional.

regards, tom lane




Re: date_part/extract parse curiosity

2022-10-20 Thread Japin Li


On Thu, 20 Oct 2022 at 20:45, Erik Rijkers  wrote:
> Hi,
>
> I noticed that
>   select date_part('millennium', now()); --> 3
>
> will execute also, unperturbed, in this form:
>   select date_part('millennium x', now()); --> 3
>
> By the same token
>
>   select extract(millennium from now()) --> 3
>   select extract(millenniumx from now()) --> 3
>
> This laxness occurs in all releases, and with 'millennium',
> 'millisecond', and 'microsecond' (at least).
>
> Even though it's not likely to cause much real-life headaches, and I
> hesitate to call it a real bug, perhaps it would be better if it could
> be a bit stricter.
>

According to the documentation [1], the extract() only has some field names,
however, the code use strncmp() to compare the units and tokens.

int
DecodeUnits(int field, char *lowtoken, int *val)
{
int type;
const datetkn *tp;

tp = deltacache[field];
/* use strncmp so that we match truncated tokens */    <-- here
if (tp == NULL || strncmp(lowtoken, tp->token, TOKMAXLEN) != 0)
{
tp = datebsearch(lowtoken, deltatktbl, szdeltatktbl);
}
if (tp == NULL)
{
type = UNKNOWN_FIELD;
*val = 0;
}
else
{
deltacache[field] = tp;
type = tp->type;
*val = tp->value;
}

return type;
}

This is convenient for field names such as millennium and millenniums;
however, it also accepts inputs like 'millennium x', which looks strange.

Maybe we should document this.  I'd be inclined to change the code to
match only the exact valid field names.

Any thoughts?

[1] 
https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Logical WAL sender unresponsive during decoding commit

2022-10-20 Thread Robert Haas
On Thu, Oct 20, 2022 at 1:37 AM Amit Kapila  wrote:
> On Thu, Oct 20, 2022 at 5:17 AM Robert Haas  wrote:
> > > Pushed.
> >
> > I think this was a good change, but there's at least one other problem
> > here: within ReorderBufferRestoreChanges, the while (restored <
> > max_changes_in_memory && *segno <= last_segno) doesn't seem to contain
> > a CFI. Note that this can loop either by repeatedly failing to open a
> > file, or by repeatedly reading from a file and passing the data read
> > to ReorderBufferRestoreChange. So I think there should just be a CFI
> > at the top of this loop to make sure both cases are covered.
>
> Agreed. The failures due to file operations can make this loop
> unpredictable in terms of time, so it is a good idea to have CFI at
> the top of this loop.
>
> I can take care of this unless there are any objections or you want to
> do it. We have backpatched the previous similar change, so I think we
> should backpatch this as well. What do you think?

Please go ahead. +1 for back-patching.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




date_part/extract parse curiosity

2022-10-20 Thread Erik Rijkers

Hi,

I noticed that
  select date_part('millennium', now()); --> 3

will execute also, unperturbed, in this form:
  select date_part('millennium x', now()); --> 3

By the same token

  select extract(millennium from now()) --> 3
  select extract(millenniumx from now()) --> 3

This laxness occurs in all releases, and with 'millennium', 
'millisecond', and 'microsecond' (at least).


Even though it's not likely to cause much real-life headaches, and I 
hesitate to call it a real bug, perhaps it would be better if it could 
be a bit stricter.


Thanks,

Erik Rijkers




Re: Standby recovers records from wrong timeline

2022-10-20 Thread Ants Aasma
On Thu, 20 Oct 2022 at 11:30, Kyotaro Horiguchi  wrote:
>
> primary_restored did a time-travel to past a bit because of the
> recovery_target=immediate. In other words, the primary_restored and
> the replica diverge. I don't think it is legit to connect a diverged
> standby to a primary.

primary_restored did time-travel to the past; since we're doing PITR on
the primary, that's the expected behavior. However, the replica has not
diverged: it's a copy of the exact same basebackup. The use case is
restoring a cluster from backup using PITR and using the same backup to
create a standby. Currently this breaks when the primary has not yet
archived any segments.

> So, about the behavior in doubt, it is the correct behavior to
> seemingly ignore the history file in the archive. Recovery assumes
> that the first half of the first segment of the new timeline is the
> same with the same segment of the old timeline (.partial) so it is
> legit to read the  file til the end and that causes the
> replica goes beyond the divergence point.

What is happening is that primary_restored has a timeline switch at
tli 2, lsn 0/2000100, and the next insert record starts in the same
segment. The replica starts on the same backup on timeline 1, tries to
find tli 2 seg 2, which is not archived yet, so it falls back to tli 1 seg 2
and replays tli 1 seg 2 continuing to tli 1 seg 3, then connects to the
primary and starts applying WAL from tli 2 seg 4. To me that seems
completely broken.

> As you know, when new primary starts a diverged history, the
> recommended way is to blow (or stash) away the archive, then take a
> new backup from the running primary.

My understanding is that backup archives are supposed to remain valid
even after PITR or equivalently a lagging standby promoting.

--
Ants Aasma
Senior Database Engineer
www.cybertec-postgresql.com




session_user and current_user on LOG

2022-10-20 Thread Marcos Pegoraro
Having a sup_user and a normal_user, login with sup_user
select session_user, current_user
sup_user, sup_user

set role normal_user;
select session_user, current_user
sup_user, normal_user

But then, when sup_user is running with normal_user's grants, an exception
occurs:
select * from Some_Schema.Some_Table;

I was running with SET ROLE NORMAL_USER, but I cannot see that info in the LOG:

user_name;error_severity;message
sup_user;ERROR;permission denied for schema Some_Schema

It would be good to have session_user / current_user in the LOG when they
differ. What do you think? Which approach is better:
- Put session_user / current_user in the same %u prefix and fill in
current_user only if it differs from session_user?
- Create another prefix for it, %o for example?
thanks,
Marcos


Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)

2022-10-20 Thread Amit Kapila
On Wed, Oct 19, 2022 at 4:47 PM Amit Kapila  wrote:
>
> On Wed, Oct 19, 2022 at 1:08 PM Masahiko Sawada  wrote:
> >
> > On Wed, Oct 19, 2022 at 11:58 AM Masahiko Sawada  
> > wrote:
> > >
> > >
> > > I've attached two patches that need to be back-patched to all branches
> > > and includes Change-1, Change-2, and a test case for them. FYI this
> > > patch resolves the assertion failure reported in this thread as well
> > > as one reported in another thread[2]. So I borrowed some of the
> > > changes from the patch[2] Osumi-san recently proposed.
> > >
> >
> > Amit pointed out offlist that the changes in reorderbuffer.c is not
> > pgindent'ed. I've run pgindent and attached updated patches.
> >
>
> Thanks, I have tested these across all branches till v10 and it works
> as expected. I am planning to push this tomorrow unless I see any
> further comments.
>

Pushed.

-- 
With Regards,
Amit Kapila.




Re: Avoid memory leaks during base backups

2022-10-20 Thread Bharath Rupireddy
On Wed, Oct 19, 2022 at 9:23 PM Bharath Rupireddy
 wrote:
>
> On Wed, Oct 19, 2022 at 8:10 PM Robert Haas  wrote:
>
> > One option is to just have do_pg_start_backup() blow
> > away any old memory context before it allocates any new memory, and
> > forget about releasing anything in PostgresMain(). That means memory
> > could remain allocated after a failure until you next retry the
> > operation, but I don't think that really matters. It's not a lot of
> > memory; we just don't want it to accumulate across many repetitions.
>
> This seems reasonable to me.

I tried implementing this, please see the attached v7 patch.
Currently, memory allocated in the new memory context is around 4KB
[1]. Even in the rarest of cases -- where somebody executes
select pg_backup_start(repeat('foo', 1024)); or a failure occurs
before reaching pg_backup_stop() in all of the sessions
(max_connections) at once -- the peak memory leak is around
max_connections*4KB, which is still far less than the total amount of
RAM. Hence, I think this approach is reasonable and non-invasive yet
solves the memory leak problem. Thoughts?

[1]
(gdb) p *backupcontext
$4 = {type = T_AllocSetContext, isReset = false, allowInCritSection =
false, mem_allocated = 4232,
  methods = 0x55c925b81f90 , parent =
0x55c92766d2a0, firstchild = 0x0, prevchild = 0x0,
  nextchild = 0x55c92773f1f0, name = 0x55c9258be05c "on-line backup
context", ident = 0x0, reset_cbs = 0x0}

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 657933ab5cc6001ad13d56b89fbf220af541f216 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Thu, 20 Oct 2022 10:09:59 +
Subject: [PATCH v7] Avoid memory leaks during backups using SQL-callable
 functions

---
 src/backend/access/transam/xlogfuncs.c | 43 --
 1 file changed, 20 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index a801a94fe8..198a99a1ed 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -45,6 +45,9 @@
 static BackupState *backup_state = NULL;
 static StringInfo tablespace_map = NULL;
 
+/* A long-lived workspace for SQL-callable backup functions. */
+static MemoryContext backupcontext = NULL;
+
 /*
  * pg_backup_start: set up for taking an on-line backup dump
  *
@@ -72,27 +75,22 @@ pg_backup_start(PG_FUNCTION_ARGS)
 
 	/*
 	 * backup_state and tablespace_map need to be long-lived as they are used
-	 * in pg_backup_stop().
+	 * in pg_backup_stop(). Create a special session-level memory context as a
+	 * direct child of TopMemoryContext so that the memory allocated is carried
+	 * across. We keep the memory allocated in this memory context less,
+	 * because any error before reaching pg_backup_stop() can leak the memory
+	 * until pg_backup_start() is called again. While this is not smart, it
+	 * helps to keep things simple.
 	 */
-	oldcontext = MemoryContextSwitchTo(TopMemoryContext);
-
-	/* Allocate backup state or reset it, if it comes from a previous run */
-	if (backup_state == NULL)
-		backup_state = (BackupState *) palloc0(sizeof(BackupState));
+	if (backupcontext == NULL)
+		backupcontext = AllocSetContextCreate(TopMemoryContext,
+			  "on-line backup context",
+			  ALLOCSET_START_SMALL_SIZES);
 	else
-		MemSet(backup_state, 0, sizeof(BackupState));
-
-	/*
-	 * tablespace_map may have been created in a previous backup, so take this
-	 * occasion to clean it.
-	 */
-	if (tablespace_map != NULL)
-	{
-		pfree(tablespace_map->data);
-		pfree(tablespace_map);
-		tablespace_map = NULL;
-	}
+		MemoryContextReset(backupcontext);
 
+	oldcontext = MemoryContextSwitchTo(backupcontext);
+	backup_state = (BackupState *) palloc0(sizeof(BackupState));
 	tablespace_map = makeStringInfo();
 	MemoryContextSwitchTo(oldcontext);
 
@@ -157,13 +155,12 @@ pg_backup_stop(PG_FUNCTION_ARGS)
 	values[2] = CStringGetTextDatum(tablespace_map->data);
 
 	/* Deallocate backup-related variables */
-	pfree(backup_state);
-	backup_state = NULL;
-	pfree(tablespace_map->data);
-	pfree(tablespace_map);
-	tablespace_map = NULL;
 	pfree(backup_label);
 
+	/* Clean up the session-level backup memory context */
+	MemoryContextDelete(backupcontext);
+	backupcontext = NULL;
+
 	/* Returns the record as Datum */
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
-- 
2.34.1



Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)

2022-10-20 Thread Amit Kapila
On Wed, Oct 19, 2022 at 9:40 AM Masahiko Sawada  wrote:
>
> I've attached patches for Change-3 with a test case. Please review them as 
> well.
>

The patch looks mostly good to me apart from few minor comments which
are as follows:
1.
+# The last decoding restarts from the first checkpoint, and add invalidation messages
+# generated by "s0_truncate" to the subtransaction. When decoding the commit record of
+# the top-level transaction, we mark both top-level transaction and its subtransactions
+# as containing catalog changes. However, we check if we don't create the association
+# between top-level and subtransactions at this time. Otherwise, we miss executing
+# invalidation messages when forgetting the transaction.
+permutation "s0_init" "s0_begin" "s0_savepoint" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_truncate" "s0_commit" "s0_begin" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes"

The second part of this comment says more than required, which makes it
less clear. How about something like: "The last
decoding restarts from the first checkpoint and adds invalidation
messages generated by "s0_truncate" to the subtransaction. While
processing the commit record for the top-level transaction, we decide
to skip this xact but ensure that corresponding invalidation messages
get processed."?

2.
+ /*
+ * We will assign subtransactions to the top transaction before
+ * replaying the contents of the transaction.
+ */

I don't think we need this comment.

-- 
With Regards,
Amit Kapila.




Re: Perform streaming logical transactions by background workers and parallel apply

2022-10-20 Thread Amit Kapila
On Thu, Oct 20, 2022 at 2:08 PM Peter Smith  wrote:
>
> 7. get_transaction_apply_action
>
> > 12. get_transaction_apply_action
> >
> > I still felt like there should be some tablesync checks/comments in
> > this function, just for sanity, even if it works as-is now.
> >
> > For example, are you saying ([3] #22b) that there might be rare cases
> > where a Tablesync would call to parallel_apply_find_worker? That seems
> > strange, given that "for streaming transactions that are being applied
> > in the parallel ... we disallow applying changes on a table that is
> > not in the READY state".
> >
> > --
>
> Houz wrote [2] -
>
> I think because we won't try to start a parallel apply worker in the
> table sync worker (see the check in parallel_apply_can_start()), we
> won't find any worker in parallel_apply_find_worker(), which means
> get_transaction_apply_action will return TRANS_LEADER_SERIALIZE. And
> get_transaction_apply_action is a function which can be invoked for all
> kinds of workers (the same is true for all apply_handle_xxx functions),
> so I am not sure if a table sync check/comment is necessary.
>
> ~
>
> Sure, and I believe you when you say it all works OK - but IMO there
> is something still not quite right with this current code. For
> example,
>
> e.g.1 the function will return TRANS_LEADER_SERIALIZE for a Tablesync
> worker, and yet the comment for TRANS_LEADER_SERIALIZE says "means
> that we are in the leader apply worker" (except we are not)
>
> e.g.2 we know for a fact that Tablesync workers cannot start their own
> parallel apply workers, so then why do we even let the Tablesync
> worker make a call to parallel_apply_find_worker() looking for
> something we know will not be found?
>

I don't see much benefit in adding an additional check for tablesync
workers here. It will unnecessarily make this part of the code look a
bit ugly.

--
With Regards,
Amit Kapila.




RE: Perform streaming logical transactions by background workers and parallel apply

2022-10-20 Thread houzj.f...@fujitsu.com
On Wednesday, October 19, 2022 8:50 PM Kuroda, Hayato/黒田 隼人 
 wrote:

Thanks for the comments.

> 03. applyparallelwprker.c - LogicalParallelApplyLoop
> 
> ```
>   case SHM_MQ_WOULD_BLOCK:
>   {
>   int rc;
> 
>   if (!in_streamed_transaction)
>   {
>   /*
>    * If we didn't get any transactions for a while there might be
>    * unconsumed invalidation messages in the queue, consume them
>    * now.
>    */
>   AcceptInvalidationMessages();
>   maybe_reread_subscription();
>   }
> 
>   MemoryContextReset(ApplyMessageContext);
> ```
> 
> Is MemoryContextReset() needed? IIUC no one uses ApplyMessageContext if we 
> reach here.

I was concerned that some code at a deeper level might allocate memory,
as there are lots of functions that could be invoked in the loop (for
example, the functions in ProcessInterrupts()). Although it might not
matter in practice, I think it is better to reset here to make the code
robust. Besides, this keeps the code consistent with the logic in
LogicalRepApplyLoop.

> 04. applyparallelwprker.c - HandleParallelApplyMessages
> 
> ```
>   else if (res == SHM_MQ_SUCCESS)
>   {
>   StringInfoData msg;
> 
>   initStringInfo(&msg);
>   appendBinaryStringInfo(&msg, data, nbytes);
>   HandleParallelApplyMessage(winfo, &msg);
>   pfree(msg.data);
>   }
> ```
> 
> In LogicalParallelApplyLoop(), appendBinaryStringInfo() is not used but
> initialized StringInfoData directly initialized. Why there is a difrerence? 
> The
> function will do repalloc() and memcpy(), so it may be inefficient.

I think both styles are fine; the code in HandleParallelApplyMessages is
written to stay consistent with the similar function
HandleParallelMessages(), which is not a performance-sensitive function.


> 05. applyparallelwprker.c - parallel_apply_send_data
> 
> ```
>   if (result != SHM_MQ_SUCCESS)
>       ereport(ERROR,
>               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>                errmsg("could not send data to shared-memory queue")));
> 
> ```
> 
> I checked the enumeration of shm_mq_result, and I felt that shm_mq_send(nowait
> = false) failed only when the opposite process has been exited. How about add 
> a
> hint or detailed message like "lost connection to parallel apply worker"?

Thanks for analyzing, but I am not sure if "lost connection to xx
worker" is an appropriate errhint or detail. The current error message
looks clear to me.


> 07. worker.c - apply_handle_commit_internal
> 
> I think we can add an assertion like Assert(replorigin_session_origin_lsn !=
> InvalidXLogRecPtr && replorigin_session_origin = InvalidRepOriginId), to
> avoid missing replorigin_session_setup. Previously it was set at the entry
> point at never reset.

I feel adding the assert for replorigin_session_origin is fine here. For
replorigin_session_origin_lsn, I am not sure it looks better to check it
here, as we would need to distinguish between the streaming=on and
streaming=parallel cases if we wanted to check that.


> 10. general
> 
> IIUC parallel apply workers could not detect the deadlock automatically,
> right? I thought we might be able to use the heartbeat protocol between a
> leader worker and parallel workers.
>  
> You have already implemented a mechanism to send and receive messages between
> workers. My idea is that each parallel apply worker records a timestamp that
> gets a message from the leader and if a certain time (30s?) has passed it
> sends a heartbeat message like 'H'. The leader consumes 'H' and sends a reply
> like LOGICAL_REP_MSG_KEEPALIVE in HandleParallelApplyMessage(). If the
> parallel apply worker does not receive any message for more than one minute,
> it regards that the deadlock has occurred and can change the retry flag to on
> and exit.
> 
> The above assumes that the leader cannot reply to the message while waiting
> for the lock. Moreover, it may have notable overhead and we must use a new
> logical replication message type.
> 
> How do you think? Have you already considered about it?

Thanks for the suggestion. But we are trying to detect this kind of
problem before the problematic case happens, and to disallow parallelism
in these cases by checking the unique/constr/partitioned... conditions
in the 0003/0004 patches.

About the keepalive design. We could do that, but the leader 

Re: libpq support for NegotiateProtocolVersion

2022-10-20 Thread Peter Eisentraut

On 13.10.22 23:00, Nathan Bossart wrote:

On Thu, Oct 13, 2022 at 10:33:01AM +0200, Peter Eisentraut wrote:

+   if (their_version != conn->pversion)


Shouldn't this be 'their_version < conn->pversion'?  If the server supports
a later protocol than what is requested but not all the requested protocol
extensions, I think libpq would still report "protocol version not
supported."


Ok, changed.


+   appendPQExpBuffer(&conn->errorMessage,
+                     libpq_gettext("protocol version not supported by server: client uses %d.%d, server supports %d.%d\n"),
+                     PG_PROTOCOL_MAJOR(conn->pversion), PG_PROTOCOL_MINOR(conn->pversion),
+                     PG_PROTOCOL_MAJOR(their_version), PG_PROTOCOL_MINOR(their_version));


Should this match the error in postmaster.c and provide the range of
versions the server supports?  The FATAL in postmaster.c is for the major
version, but I believe the same information is relevant when a
NegotiateProtocolVersion message is sent.

ereport(FATAL,
        (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
         errmsg("unsupported frontend protocol %u.%u: server supports %u.0 to %u.%u",

If you increase the libpq minor protocol version and connect to an older 
server, you would get an error like "server supports 3.0 to 3.0", which 
is probably a bit confusing.  I changed it to "up to 3.0" to convey that 
it could be a range.



+   else
+       appendPQExpBuffer(&conn->errorMessage,
+                         libpq_gettext("protocol extensions not supported by server: %s\n"), buf.data);


nitpick: s/extension/extensions


Ok, added proper plural support.


What if neither the protocol version nor the requested extensions are
supported?  Right now, I think only the unsupported protocol version is
supported in that case, but presumably we could report both pretty easily.


Ok, I just appended both error messages in that case.
From 723027fe61a6f34acb7d7c0d518b4dbff03af9cd Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 20 Oct 2022 11:18:50 +0200
Subject: [PATCH v2] libpq: Handle NegotiateProtocolVersion message

Before, receiving a NegotiateProtocolVersion message would result in a
confusing error message like

expected authentication request from server, but received v

This adds proper handling of this protocol message and produces an
on-topic error message from it.

Discussion: 
https://www.postgresql.org/message-id/flat/f9c7862f-b864-8ef7-a861-c4638c83e209%40enterprisedb.com
---
 src/interfaces/libpq/fe-connect.c   | 18 +--
 src/interfaces/libpq/fe-protocol3.c | 50 +
 src/interfaces/libpq/libpq-int.h|  1 +
 3 files changed, 66 insertions(+), 3 deletions(-)

diff --git a/src/interfaces/libpq/fe-connect.c 
b/src/interfaces/libpq/fe-connect.c
index 746e9b4f1efc..1c0d8243a6ca 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -3246,10 +3246,11 @@ PQconnectPoll(PGconn *conn)
 
 				/*
 				 * Validate message type: we expect only an authentication
-				 * request or an error here.  Anything else probably means
-				 * it's not Postgres on the other end at all.
+				 * request, NegotiateProtocolVersion, or an error here.
+				 * Anything else probably means it's not Postgres on the other
+				 * end at all.
 				 */
-				if (!(beresp == 'R' || beresp == 'E'))
+				if (!(beresp == 'R' || beresp == 'v' || beresp == 'E'))
 				{
 					appendPQExpBuffer(&conn->errorMessage,
 									  libpq_gettext("expected authentication request from server, but received %c\n"),
@@ -3405,6 +3406,17 @@ PQconnectPoll(PGconn *conn)
 
 					goto error_return;
 				}
+				else if (beresp == 'v')
+				{
+					if (pqGetNegotiateProtocolVersion3(conn))
+					{
+						/* We'll come back when there is more data */
+						return PGRES_POLLING_READING;
+					}
+					/* OK, we read the message; mark data consumed */
+					conn->inStart = conn->inCursor;
+					goto error_return;
+				}
 
  

Re: Exponentiation confusion

2022-10-20 Thread Dean Rasheed
On Tue, 18 Oct 2022 at 20:18, Robert Haas  wrote:
>
> On Tue, Oct 18, 2022 at 6:18 AM Dean Rasheed  wrote:
> > Overall, I'm quite happy with these results. The question is, should
> > this be back-patched?
> >
> > In the past, I think I've only back-patched numeric bug-fixes where
> > the digits output by the old code were incorrect or an error was
> > thrown, not changes that resulted in a different number of digits
> > being output, changing the precision of already-correct results.
> > However, having 10.0^(-18) produce zero seems pretty bad, so my
> > inclination is to back-patch, unless anyone objects.
>
> I don't think that back-patching is a very good idea. The bar for
> changing query results should be super-high. Applications can depend
> on the existing behavior even if it's wrong.
>

OK, on reflection, I think that makes sense. Applied to HEAD only.

Regards,
Dean




Re: Standby recovers records from wrong timeline

2022-10-20 Thread Kyotaro Horiguchi
Forgot a caveat.

At Thu, 20 Oct 2022 17:34:13 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Wed, 19 Oct 2022 18:50:09 +0300, Ants Aasma  wrote in 
> > When standby is recovering to a timeline that doesn't have any segments
> > archived yet it will just blindly blow past the timeline switch point and
> > keeps on recovering on the old timeline. Typically that will eventually
> > result in an error about incorrect prev-link, but under unhappy
> > circumstances can result in standby silently having different contents.
> > 
> > Attached is a shell script that reproduces the issue. Goes back to at least
> > v12, probably longer.
> > 
> > I think we should be keeping track of where the current replay timeline is
> > going to end and not read any records past it on the old timeline. Maybe
> > while at it, we should also track that the next record should be a
> > checkpoint record for the timeline switch and error out if not. Thoughts?
> 
> primary_restored did a time-travel to past a bit because of the
> recovery_target=immediate. In other words, the primary_restored and
> the replica diverge. I don't think it is legit to connect a diverged
> standby to a primary.
> 
> So, about the behavior in doubt, it is the correct behavior to
> seemingly ignore the history file in the archive. Recovery assumes
> that the first half of the first segment of the new timeline is the
> same as the same segment of the old timeline (.partial), so it is
> legit to read the  file until the end, and that causes the
> replica to go beyond the divergence point.
> 
> As you know, when a new primary starts a diverged history, the
> recommended way is to blow (or stash) away the archive, then take a
> new backup from the running primary.
> 
> If you don't want to trash all the past backups, remove the archived
> files equal to or after the divergence point before starting the
> standby. They're  in this case. Also you must remove
> replica/pg_wal/ before starting the replica. That file
> causes recovery to run beyond the divergence point before fetching from
> archive or stream.

The reason this is workable is (as far as I can see) that
recovery_target=immediate is used to stop replication and the two
clusters share a completely identical disk image. Otherwise these steps
result in a broken standby.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: problems with making relfilenodes 56-bits

2022-10-20 Thread Dilip Kumar
On Thu, Oct 20, 2022 at 12:51 AM Andres Freund  wrote:
>
> Hi,
>
> On 2022-10-17 17:14:21 -0400, Robert Haas wrote:
> > I have to admit that I worried about the same thing that Matthias
> > raises, more or less. But I don't know whether I'm right to be
> > worried. A variable-length representation of any kind is essentially a
> > gamble that values requiring fewer bytes will be more common than
> > values requiring more bytes, and by enough to justify the overhead
> > that the method has. And, you want it to be more common for each
> > individual user, not just overall. For example, more people are going
> > to have small relations than large ones, but nobody wants performance
> > to drop off a cliff when the relation passes a certain size threshold.
> > Now, it wouldn't drop off a cliff here, but what about someone with a
> > really big, append-only relation? Won't they just end up writing more
> > to WAL than with the present system?
>
> Perhaps. But I suspect it'd be a very small increase because they'd be using
> bulk-insert paths in all likelihood anyway, if they managed to get to a very
> large relation. And even in that case, if we e.g. were to make the record size
> variable length, they'd still pretty much never reach that and it'd be an
> overall win.

I think the number of cases where we will reduce the WAL size will be
far more than the cases where it will slightly increase.  And the number
of bytes we save in the winning cases is far bigger than the number of
bytes we add in the others.  So IMHO it seems like an overall win, at
least from the WAL-size-reduction POV.  Do we have any numbers on how
much overhead the encoding/decoding has?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Perform streaming logical transactions by background workers and parallel apply

2022-10-20 Thread Peter Smith
Hi, here are my review comments for the patch v39-0001

==

src/backend/libpq/pqmq.c

1. mq_putmessage

+ if (IsParallelWorker())
+ SendProcSignal(pq_mq_parallel_leader_pid,
+PROCSIG_PARALLEL_MESSAGE,
+pq_mq_parallel_leader_backend_id);
+ else
+ {
+ Assert(IsLogicalParallelApplyWorker());
+ SendProcSignal(pq_mq_parallel_leader_pid,
+PROCSIG_PARALLEL_APPLY_MESSAGE,
+pq_mq_parallel_leader_backend_id);
+ }

The generically named macro (IsParallelWorker) makes it seem like a
parallel apply worker is NOT a kind of parallel worker (e.g. it is in
the 'else'), which seems odd. But I am not sure what you can do to
improve this... e.g. reversing the if/else might look logically saner,
but might also be less efficient for the IsParallelWorker case (??)

==

.../replication/logical/applyparallelworker.c

2. LogicalParallelApplyLoop

+ /* Ensure we are reading the data into our memory context. */
+ (void) MemoryContextSwitchTo(ApplyMessageContext);

Why did you use the (void) cast for this MemoryContextSwitchTo but not
for the next one later in the same function?

~~~

3.

+ if (len == 0)
+ break;

As mentioned in my previous review ([1] #3), we are still in the
ApplyMessageContext here. Shouldn't the code be switching to the
previous context before escaping from the loop?

~~~

4.

+ switch (shmq_res)
+ {
+ case SHM_MQ_SUCCESS:
+ {
+ StringInfoData s;
+ int c;
+
+ if (len == 0)
+ break;

I think this introduces a subtle bug.

IIUC the intent of the "break" when len == 0 is to escape from the
loop. But now, this will only break from the switch case. So, it looks
like you need some kind of loop "done" flag, or maybe have to revert
back to using if/else to fix this.
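To illustrate the control-flow point with a standalone sketch (the stand-in types and function here are hypothetical, not the patch's actual code): a break in a switch arm only leaves the switch, so leaving the enclosing loop needs an explicit flag.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
	MQ_SUCCESS,
	MQ_DETACHED
} mq_result;

/*
 * Counts messages until a zero-length one arrives.  In the MQ_SUCCESS
 * arm, a bare "break" only exits the switch statement, so a "done"
 * flag is needed to terminate the surrounding loop as well.
 */
static int
consume(const int *lens, int n)
{
	bool		done = false;
	int			processed = 0;

	for (int i = 0; i < n && !done; i++)
	{
		mq_result	res = (lens[i] >= 0) ? MQ_SUCCESS : MQ_DETACHED;

		switch (res)
		{
			case MQ_SUCCESS:
				if (lens[i] == 0)
				{
					done = true;	/* "break" alone would not end the loop */
					break;
				}
				processed++;
				break;
			case MQ_DETACHED:
				done = true;
				break;
		}
	}
	return processed;
}
```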

~~~

5.

+ /*
+ * The first byte of message for additional communication between
+ * leader apply worker and parallel apply workers can only be 'w'.
+ */
+ c = pq_getmsgbyte(&s);

Why does it refer to "additional communication"? Isn’t it enough just
to say something like below:

SUGGESTION
The first byte of messages sent from leader apply worker to parallel
apply workers can only be 'w'.

~~~

src/backend/replication/logical/worker.c

6. apply_handle_stream_start

+ *
+ * XXX We can avoid sending pairs of the START/STOP messages to the parallel
+ * worker because unlike apply worker it will process only one transaction at a
+ * time. However, it is not clear whether that is worth the effort because
+ * these messages are sent after logical_decoding_work_mem changes.
  */
 static void
 apply_handle_stream_start(StringInfo s)


I don't know what the "changes" part means. IIUC, the meaning of the
last sentence is like below:

SUGGESTION
However, it is not clear whether any optimization is worthwhile
because these messages are sent only when the
logical_decoding_work_mem threshold is exceeded.

~~~

7. get_transaction_apply_action

> 12. get_transaction_apply_action
>
> I still felt like there should be some tablesync checks/comments in
> this function, just for sanity, even if it works as-is now.
>
> For example, are you saying ([3] #22b) that there might be rare cases
> where a Tablesync would call to parallel_apply_find_worker? That seems
> strange, given that "for streaming transactions that are being applied
> in the parallel ... we disallow applying changes on a table that is
> not in the READY state".
>
> --

Houz wrote [2] -

I think because we won't try to start a parallel apply worker in the
table sync worker (see the check in parallel_apply_can_start()), we
won't find any worker in parallel_apply_find_worker(), which means
get_transaction_apply_action will return TRANS_LEADER_SERIALIZE. And
get_transaction_apply_action is a function which can be invoked for all
kinds of workers (the same is true for all apply_handle_xxx functions),
so I am not sure if a table sync check/comment is necessary.

~

Sure, and I believe you when you say it all works OK - but IMO there
is something still not quite right with this current code. For
example,

e.g.1 the function will return TRANS_LEADER_SERIALIZE for a Tablesync
worker, and yet the comment for TRANS_LEADER_SERIALIZE says "means
that we are in the leader apply worker" (except we are not)

e.g.2 we know for a fact that Tablesync workers cannot start their own
parallel apply workers, so then why do we even let the Tablesync
worker make a call to parallel_apply_find_worker() looking for
something we know will not be found?

--
[1] My review of v38-0001 -
https://www.postgresql.org/message-id/CAHut%2BPsY0aevdVqeCUJOrRQMrwpg5Wz3-Mo%2BbU%3DmCxW2%2B9EBTg%40mail.gmail.com
[2] Houz reply for my review v38 -
https://www.postgresql.org/message-id/OS0PR01MB5716D738A8F27968806957B194289%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Standby recovers records from wrong timeline

2022-10-20 Thread Kyotaro Horiguchi
Sorry, a correction needed..

At Thu, 20 Oct 2022 17:29:57 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Wed, 19 Oct 2022 18:50:09 +0300, Ants Aasma  wrote in 
> > When standby is recovering to a timeline that doesn't have any segments
> > archived yet it will just blindly blow past the timeline switch point and
> > keeps on recovering on the old timeline. Typically that will eventually
> > result in an error about incorrect prev-link, but under unhappy
> > circumstances can result in standby silently having different contents.
> > 
> > Attached is a shell script that reproduces the issue. Goes back to at least
> > v12, probably longer.
> > 
> > I think we should be keeping track of where the current replay timeline is
> > going to end and not read any records past it on the old timeline. Maybe
> > while at it, we should also track that the next record should be a
> > checkpoint record for the timeline switch and error out if not. Thoughts?
> 
> primary_restored did a time-travel to past a bit because of the
> recovery_target=immediate. In other words, the primary_restored and
> the replica diverge. I don't think it is legit to connect a diverged
> standby to a primary.
> 
> So, about the behavior in doubt, it is the correct behavior to
> seemingly ignore the history file in the archive. Recovery assumes
> that the first half of the first segment of the new timeline is the
> same as the same segment of the old timeline (.partial), so it is
> legit to read the  file until the end, and that causes the
> replica to go beyond the divergence point.
> 
> As you know, when a new primary starts a diverged history, the
> recommended way is to blow (or stash) away the archive, then take a
> new backup from the running primary.
> 
> If you don't want to trash all the past backups, remove the archived
> files equal to or after the divergence point before starting the
> standby. They're  in this case. Also you must remove

 => 

> replica/pg_wal/ before starting the replica. That file
> causes recovery to run beyond the divergence point before fetching from
> archive or stream.

  =>  

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Standby recovers records from wrong timeline

2022-10-20 Thread Kyotaro Horiguchi
At Wed, 19 Oct 2022 18:50:09 +0300, Ants Aasma  wrote in 
> When standby is recovering to a timeline that doesn't have any segments
> archived yet it will just blindly blow past the timeline switch point and
> keeps on recovering on the old timeline. Typically that will eventually
> result in an error about incorrect prev-link, but under unhappy
> circumstances can result in standby silently having different contents.
> 
> Attached is a shell script that reproduces the issue. Goes back to at least
> v12, probably longer.
> 
> I think we should be keeping track of where the current replay timeline is
> going to end and not read any records past it on the old timeline. Maybe
> while at it, we should also track that the next record should be a
> checkpoint record for the timeline switch and error out if not. Thoughts?

primary_restored did a time-travel a bit into the past because of
recovery_target=immediate. In other words, primary_restored and the
replica diverge. I don't think it is legit to connect a diverged
standby to a primary.

So, about the behavior in doubt, it is the correct behavior to
seemingly ignore the history file in the archive. Recovery assumes
that the first half of the first segment of the new timeline is the
same as the same segment of the old timeline (.partial), so it is
legit to read the  file until the end, and that causes the
replica to go beyond the divergence point.

As you know, when a new primary starts a diverged history, the
recommended way is to blow (or stash) away the archive, then take a
new backup from the running primary.

If you don't want to trash all the past backups, remove the archived
files equal to or after the divergence point before starting the
standby. They're  in this case. Also you must remove
replica/pg_wal/ before starting the replica. That file
causes recovery to run beyond the divergence point before fetching from
archive or stream.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: thinko in basic_archive.c

2022-10-20 Thread Bharath Rupireddy
On Thu, Oct 20, 2022 at 6:57 AM Kyotaro Horiguchi
 wrote:
>
> > The archive module must be responsible for cleaning up the temp file
> > that it creates. One way to do it is in the archive module's shutdown
> > callback, this covers most of the cases, but not all.
>
> True. But I agree with Robert that such temporary files should be
> cleanable without needing temporarily-valid knowledge (the current file
> name, in this case). A common strategy for this is to give those files
> names that can be identified as garbage.

I'm not sure how we can distinguish temp files as garbage based on the
name alone. As Robert pointed out upthread, using the system identifier
may not help, as standbys share the same system identifier and it's
possible that they might archive to the same directory. Is there any
other way?

> But since a power cut is a typical crash source, we need to identify
> ruined temporary files, and the current naming convention is incomplete
> in this regard.

Please note that the basic_archive module creates one temp file at a
time to make file copying/moving atomic, and it can keep track of the
temp file name and delete it using the shutdown callback, which helps in
most of the scenarios. As said upthread, repeated crashes while the
basic_archive module is atomically copying files around are a big
problem in themselves, and the basic_archive module need not worry about
them much.

> flock() on nfs..
>
> The worst case I can come up with regardless of feasibility is a
> multi-standby physical replication set where all hosts share one
> archive directory. Indeed live and dead temporary files can coexist
> there.  However, I think we can identify truly rotten temp files by
> inserting the host name or cluster name (meaning cluster_name in
> postgresql.conf) even in that case.  This presumes that the DBA names
> every cluster differently, but I think DBAs who are going to configure
> such a system are required to be very cautious about that kind of aspect.

Well, these ideas are great! However, we can leave defining such
strategies to archive module implementors. IMO, the basic_archive
module ought to be as simple and elegant as possible while still
showing the usability of the archive modules feature.

> Of course, this premises that a cluster uses the same name for a
> segment. If we insert cluster_name into the temporary name, a starting
> cluster can identify garbage files to clean up. For example, if we
> name them as follows.
>
> ARCHTEMP_cluster1_pid_time_
>
> A starting cluster can clean up all files starting with
> "archtemp_cluster1_*". (We need to choose the delimiter carefully,
> though.)

Postgres cleaning up the basic_archive module's temp files at startup
isn't a great idea IMO, because these files are not related to server
functionality in any way, unlike the temp files removed in
RemovePgTempFiles(). IMO, we ought to keep the basic_archive module
simple.
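
As a rough sketch of the prefix-based naming idea discussed above (the names, layout, and helper functions here are hypothetical, not what basic_archive actually does):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAXPGPATH 1024	/* stand-in for PostgreSQL's MAXPGPATH */

/*
 * Build a temp-file name that embeds the cluster name, so a starting
 * cluster could later recognize its own leftovers by prefix alone.
 */
static void
make_temp_name(char *buf, size_t sz, const char *cluster,
			   int pid, long long epoch_ms, const char *segment)
{
	snprintf(buf, sz, "archtemp_%s_%d_%lld_%s",
			 cluster, pid, epoch_ms, segment);
}

/*
 * True if "name" looks like a leftover temp file of "cluster".  Note
 * the delimiter caveat from the discussion: a cluster name that itself
 * contains '_' could make one cluster's prefix match another's files.
 */
static int
is_own_garbage(const char *name, const char *cluster)
{
	char		prefix[SKETCH_MAXPGPATH];

	snprintf(prefix, sizeof(prefix), "archtemp_%s_", cluster);
	return strncmp(name, prefix, strlen(prefix)) == 0;
}
```

A startup cleanup (if one wanted it) would then just scan the archive directory and unlink anything for which is_own_garbage() returns true.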
> No, I didn't mean that. If the server stops after a successful
> durable_rename but before the next call to
> basic_archive_file_internal, that callback makes a false complaint
> since that temporary file is actually gone.

Right. Fixed it.

> > Please see the attached v2 patch.
>
> +static chartempfilepath[MAXPGPATH + 256];
>
> MAXPGPATH is the maximum length of a file name that PG assumes to be
> able to handle. Thus extending that length seems wrong.

I think it was to accommodate the temp file name - "archtemp", file,
MyProcPid, epoch - but I agree that it can just be MAXPGPATH. However,
in most places the core code defines the path name to be MAXPGPATH +
some bytes.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 591e58e31881d86dc0acc3bd9ca67aeed80a1041 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Thu, 20 Oct 2022 07:12:17 +
Subject: [PATCH v3] Remove leftover temporary files via basic_archive shutdown
 callback

---
 contrib/basic_archive/basic_archive.c | 32 +++
 doc/src/sgml/basic-archive.sgml   |  5 -
 2 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c
index 9f221816bb..f971b166be 100644
--- a/contrib/basic_archive/basic_archive.c
+++ b/contrib/basic_archive/basic_archive.c
@@ -42,10 +42,12 @@ PG_MODULE_MAGIC;
 
 static char *archive_directory = NULL;
 static MemoryContext basic_archive_context;
+static char	tempfilepath[MAXPGPATH + 256];
 
 static bool basic_archive_configured(void);
 static bool basic_archive_file(const char *file, const char *path);
 static void basic_archive_file_internal(const char *file, const char *path);
+static void basic_archive_shutdown(void);
 static bool check_archive_directory(char **newval, void **extra, GucSource source);
 static bool compare_files(const char *file1, const char *file2);
 
@@ -85,6 +87,7 @@ _PG_archive_module_init(ArchiveModuleCallbacks *cb)
 
 	cb->check_configured_cb = basic_archive_configured;
 	cb->archive_file_cb = 

Re: pg_recvlogical prints bogus error when interrupted

2022-10-20 Thread Bharath Rupireddy
On Thu, Oct 20, 2022 at 3:10 AM Andres Freund  wrote:
>
> Hi,
>
> While reviewing
> https://postgr.es/m/CAD21AoBe2o2D%3Dxyycsxw2bQOD%3DzPj7ETuJ5VYGN%3DdpoTiCMRJQ%40mail.gmail.com
> I noticed that pg_recvlogical prints
> "pg_recvlogical: error: unexpected termination of replication stream: "
>
> when signalled with SIGINT/TERM.
>
> Oddly enough, that looks to have "always" been the case, even though clearly
> the code tried to make provisions for a different outcome.
>
>
> It looks to me like all that's needed is to gate the block printing the
> message with an !time_to_abort.

+1. How about emitting a message like its friend pg_receivewal does, as
in the attached patch?

> I also then noticed that we don't fsync the output file in cases of errors -
> that seems wrong to me? Looks to me like that block should be moved till after
> the error:?

How about something like the attached patch?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v1-0001-pg_recvlogical-fixes.patch
Description: Binary data


Re: GUC values - recommended way to declare the C variables?

2022-10-20 Thread Peter Smith
On Thu, Oct 20, 2022 at 3:16 PM Justin Pryzby  wrote:
>
> On Thu, Oct 20, 2022 at 11:56:58AM +1100, Peter Smith wrote:
> > Patch 0002 adds a sanity-check function called by
> > InitializeGUCOptions, as suggested by Tom [2]. This is to ensure that
> > the GUC C variable initial values are sensible and/or have not gone
> > stale compared with the compiled-in defaults of guc_tables.c. This
> > patch also changes some GUC C variable initial values which were
> > already found (by this sanity-checker) to be different.
>
> I like it.
>
> However, it fails on Windows:
>
> https://cirrus-ci.com/task/5545965036765184
>
> running bootstrap script ... FATAL:  GUC (PGC_BOOL) update_process_title, boot_val=0, C-var=1
>
> Maybe you need to exclude dynamically-set GUCs?
> See also this other thread, where I added a flag identifying exactly
> that:  https://commitfest.postgresql.org/40/3736/
> I need to polish that patch some, but maybe it'll be useful for you, too.
>

Great, this looks very helpful. I will try again tomorrow by skipping
over such GUCs.

Also, I noticed that a couple of the other C initial values I had changed
coincide with ones you've marked as GUC_DYNAMIC_DEFAULT, so I'll restore
those to how they were before, too.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: list of TransactionIds

2022-10-20 Thread Alvaro Herrera
Hello

On 2022-Oct-20, houzj.f...@fujitsu.com wrote:

> While trying to use the newly introduced list_member_xid(), I noticed that it
> internally uses lfirst_oid instead of lfirst_xid. It works OK for now, but in
> case we change xid to 64 bits in the future, I think we'd better use lfirst_xid
> here.

Egad.

> Here is a tiny patch to fix that.

Pushed, thanks.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"I can't go to a restaurant and order food because I keep looking at the
fonts on the menu.  Five minutes later I realize that it's also talking
about food" (Donald Knuth)




RE: list of TransactionIds

2022-10-20 Thread houzj.f...@fujitsu.com
On Monday, July 4, 2022 9:27 PM Alvaro Herrera  wrote:

Hi,

> 
> Pushed now, to master only.

Thanks for introducing these APIs!

While trying to use the newly introduced list_member_xid(), I noticed that it
internally uses lfirst_oid instead of lfirst_xid. It works OK for now, but in
case we change xid to 64 bits in the future, I think we'd better use lfirst_xid
here.

Here is a tiny patch to fix that.

Best regards,
Hou Zhijie


0001-use-proper-macros-to-access-xid.patch
Description: 0001-use-proper-macros-to-access-xid.patch


Re: shared memory stats ideas

2022-10-20 Thread Drouvot, Bertrand

Hi,

On 10/19/22 8:19 PM, Andres Freund wrote:


Hi,


Here's a largely unordered list of ideas. I'm not planning to work on them
myself, but thought it'd nevertheless be useful to have them memorialized
somewhere.



Thanks for sharing this list of ideas!




2) Split index and table statistics into different types of stats

We track both types of statistics in the same format and rename columns in
views etc. to make them somewhat sensible. A number of the "columns" in index
stats are currently unused.

If we split the stats for indexes and relations we can have reasonable names
for the fields, shrink the current memory usage by halving the set of fields
we keep for indexes, and extend the stats in a more targeted fashion.


I started to work on this.
I should be able to provide a patch attempt in the next couple of weeks.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




RE: Logical replication timeout problem

2022-10-20 Thread wangw.f...@fujitsu.com
On Thu, Oct 20, 2022 at 13:47 Fabrice Chapuis  wrote:
> Yes, the refresh of the MV is on the publisher side.
> Thanks for your draft patch, I'll try it.
> I'll get back to you as soon as possible.

Thanks a lot.

> One question: why the refresh of the MV is a DDL not a DML?

Since in the source code the command type of `REFRESH MATERIALIZED VIEW` is
`CMD_UTILITY`, I think this command is DDL (see CmdType in nodes.h).

BTW, after searching for DML in the docs, I found the relevant description
at the link below:
https://www.postgresql.org/docs/devel/logical-replication-publication.html

Regards,
Wang wei


Re: Documentation refinement for Parallel Scans

2022-10-20 Thread David Rowley
On Thu, 20 Oct 2022 at 16:03, Zhang Mingli  wrote:
> As said in parallel.smgl:
>
> In a parallel sequential scan, the table's blocks will be divided among the 
> cooperating processes. Blocks are handed out one at a time, so that access to 
> the table remains sequential.

> Shall we update the documents?

Yeah, 56788d215 should have updated that. Seems I didn't expect that
level of detail in the docs. I've attached a patch to address this.

I didn't feel the need to go into too much detail about how the sizes
of the ranges are calculated. I tried to be brief, but I think I did
leave enough in there so that a reader will know that we don't just
make the range length  / .

I'll push this soon if nobody has any other wording suggestions.

Thanks for the report.

David
diff --git a/doc/src/sgml/parallel.sgml b/doc/src/sgml/parallel.sgml
index c37fb67065..e556786e2b 100644
--- a/doc/src/sgml/parallel.sgml
+++ b/doc/src/sgml/parallel.sgml
@@ -272,8 +272,9 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 
   
    In a parallel sequential scan, the table's blocks will
-be divided among the cooperating processes.  Blocks are handed out one
-at a time, so that access to the table remains sequential.
+be divided into ranges and shared among the cooperating processes.  Each
+worker process will complete the scanning of its given range of blocks before
+requesting an additional range of blocks.