Add WAL recovery messages with log_wal_traffic GUC (was: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display)

2022-02-26 Thread Bharath Rupireddy
On Wed, Dec 29, 2021 at 6:50 AM Bharath Rupireddy
 wrote:
>
> On Thu, Dec 9, 2021 at 9:28 PM Alvaro Herrera  wrote:
> >
> > Maybe some tunable like
> > log_wal_traffic = { none, medium, high }
> > where "none" is current behavior of no noise, "medium" gets (say) once
> > every 256 segments (e.g., when going from XFF to (X+1)00), "high" gets
> > you one message per segment.
>
> On Fri, Dec 24, 2021 at 7:19 PM Justin Pryzby  wrote:
> >
> > If you're talking about a new feature that uses the infrastructure from 9ce3,
> > but is controlled by a separate GUC like log_wal_traffic, that could be 
> > okay.
>
> Thanks for the suggestion. I've added a new GUC log_wal_traffic as
> suggested above. PSA v7 patch.

Here's the rebased v8 patch, please review it.
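
As a rough, standalone sketch of the semantics proposed upthread (the type and
helper names here are made up for illustration; this is not code from the
patch), the three levels could behave like this:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical levels mirroring log_wal_traffic = { none, medium, high }. */
typedef enum
{
	LOG_WAL_TRAFFIC_NONE,		/* current behavior: no extra messages */
	LOG_WAL_TRAFFIC_MEDIUM,		/* one message every 256 segments */
	LOG_WAL_TRAFFIC_HIGH		/* one message per segment */
} WalTrafficLevel;

/* Would a message be emitted for this WAL segment number? */
static int
should_log_segment(WalTrafficLevel level, uint64_t segno)
{
	switch (level)
	{
		case LOG_WAL_TRAFFIC_HIGH:
			return 1;
		case LOG_WAL_TRAFFIC_MEDIUM:
			/* e.g., when going from XFF to (X+1)00 */
			return segno % 256 == 0;
		default:
			return 0;
	}
}
```

With "medium", segment 0x255... sorry, segment 0x100 (256) would log while 0xFF
(255) would not; "none" keeps today's quiet behavior.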

https://commitfest.postgresql.org/37/3443/

Regards,
Bharath Rupireddy.


v8-0001-Add-WAL-recovery-messages-with-log_wal_traffic-GU.patch
Description: Binary data


Re: [Proposal] Global temporary tables

2022-02-26 Thread Pavel Stehule
On Sun, Feb 27, 2022 at 5:13 AM Andres Freund  wrote:

> Hi,
>
> On 2022-02-27 04:17:52 +0100, Pavel Stehule wrote:
> > > You redirect stats from pg_class and pg_statistics to a local hash
> > > table.
> > > This is pretty hairy :(
>
> As is I think the patch is architecturally completely unacceptable. Having
> code everywhere to redirect to manually written in-memory catalog table
> code isn't maintainable.
>
>
> > > I guess you'd also need to handle pg_statistic_ext and ext_data.
> > > pg_stats doesn't work, since the data isn't in pg_statistic - it'd need to
> > > look at pg_get_gtt_statistics.
> >
> > Without this, the GTT will be terribly slow like current temporary tables
> > with a lot of problems with bloating of pg_class, pg_attribute and
> > pg_depend tables.
>
> I think it's not a great idea to solve multiple complicated problems at
> once...
>

I thought about this issue for a very long time, and I didn't find anything
better (without more significant rewriting of pg storage). In a lot of
projects that I know, temporary tables are strictly prohibited due to their
possible devastating impact on system catalog bloat.  It is a serious
problem. So any implementation of GTT should answer these questions: a) how to
reduce catalog bloat, b) how to allow session-related statistics for
GTT. I agree that an implementation of GTT like template-based LTT (local
temporary tables) can be very simple (it is possible via an extension), but
it would have the same unhappy performance impacts.

I don't say that the current design should be accepted without any discussion
and without changes. Maybe GTT based on LTT would be better than nothing
(what we have now), and good enough for a lot of projects where the
load is not too high (and almost all projects have low load).
Unfortunately, it can be a trap for a lot of projects in the future, so there
should be discussion and proposed solutions for fixing the related issues. The
performance of GTT should be fixable, so any discussion about this topic
should cover protections against catalog bloat and the cost of frequent
catalog updates.

But anyway, I invite (and probably not just me) any discussion on how to
implement this feature, how to solve the performance issues, and how to divide
the implementation into smaller steps. I am sure that a fast GTT
implementation could be used for a fast implementation of LTT too, and maybe
for all other temporary objects.

Regards

Pavel


> Greetings,
>
> Andres Freund
>


Re: [Proposal] Global temporary tables

2022-02-26 Thread Andres Freund
Hi,

On 2022-02-27 04:17:52 +0100, Pavel Stehule wrote:
> > You redirect stats from pg_class and pg_statistics to a local hash table.
> > This is pretty hairy :(

As is I think the patch is architecturally completely unacceptable. Having
code everywhere to redirect to manually written in-memory catalog table code
isn't maintainable.


> > I guess you'd also need to handle pg_statistic_ext and ext_data.
> > pg_stats doesn't work, since the data isn't in pg_statistic - it'd need to
> > look at pg_get_gtt_statistics.
>
> Without this, the GTT will be terribly slow like current temporary tables
> with a lot of problems with bloating of pg_class, pg_attribute and
> pg_depend tables.

I think it's not a great idea to solve multiple complicated problems at
once...

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-02-26 Thread Andres Freund
Hi,

On 2022-02-26 21:10:57 -0600, Justin Pryzby wrote:
> I did git commit --amend --no-edit and repushed to github to trigger a new CI
> run, and it did this: https://github.com/justinpryzby/postgres/runs/5347878714
>
> This is in a branch with changes to doc.  I wasn't intending it to skip
> building docs on this branch just because the same, changed docs were
> previously built.

But why not? If nothing in docs/ changes, there's little point? It'd probably
be good to check .cirrus.yml and docs/**, to make sure that .cirrus logic
changes rerun as well.


> Why wouldn't the docs be built following the same logic as the rest of the
> sources?

Tests have a certain rate of spurious failure, so rerunning them makes
sense. But what do we gain by rebuilding the docs? And especially, what do we
gain about uploading the docs e.g. in the postgres/postgres repo?


> If someone renames or removes an xref target, shouldn't CI fail on its next
> run for a patch which tries to reference it ?

Why wouldn't it?


> It would fail on the buildfarm, and I think one major use for the CI is to
> minimize the post-push cleanup cycles.

I personally see it more as access to a "mini buildfarm" before patches are
committable, but that's of course compatible.


> Are you sure about cfbot ?  AIUI cirrus would see that docs didn't change
> relative to the previous run for branch: commitfest/NN/.

Not entirely sure, but it's what I remember observing when amending commits
in a repo using changesInclude. If I were to build a feature like it I'd look
at the list of files of
  git diff $(git merge-base last_green new_commit)..new_commit

or such.  cfbot doesn't commit incrementally but replaces the prior commit, so
I suspect it'll always be viewed as new. But who knows, shouldn't be hard to
figure out.

Greetings,

Andres Freund




Re: [Proposal] Global temporary tables

2022-02-26 Thread Pavel Stehule
Hi

> You redirect stats from pg_class and pg_statistics to a local hash table.
> This is pretty hairy :(
> I guess you'd also need to handle pg_statistic_ext and ext_data.
> pg_stats doesn't work, since the data isn't in pg_statistic - it'd need to
> look at pg_get_gtt_statistics.
>

Without this, the GTT will be terribly slow like current temporary tables
with a lot of problems with bloating of pg_class, pg_attribute and
pg_depend tables.

Regards

Pavel


Re: Adding CI to our tree

2022-02-26 Thread Justin Pryzby
On Sat, Feb 26, 2022 at 06:50:00PM -0800, Andres Freund wrote:
> Hi,
> 
> On 2022-02-26 20:43:52 -0600, Justin Pryzby wrote:
> > This doesn't do the right thing - I just tried.
> > https://cirrus-ci.org/guide/writing-tasks/#environment-variables
> > | changesInclude function can be very useful for skipping some tasks when 
> > no changes to sources have been made since the last successful Cirrus CI 
> > build.
> 
> > That means it will not normally rebuild docs (and then this still requires
> > resolving the "base branch").
> 
> Why would we want to rebuild docs if they're the same as in the last build for
> the same branch?  For cfbot purposes each commit is independent from the prior
> commit, so it should rebuild it every time if the CF entry has changes to the
> docs.

I did git commit --amend --no-edit and repushed to github to trigger a new CI
run, and it did this: https://github.com/justinpryzby/postgres/runs/5347878714

This is in a branch with changes to doc.  I wasn't intending it to skip
building docs on this branch just because the same, changed docs were
previously built.

Why wouldn't the docs be built following the same logic as the rest of the
sources ?  If someone renames or removes an xref target, shouldn't CI fail on
its next run for a patch which tries to reference it ?  It would fail on the
buildfarm, and I think one major use for the CI is to minimize the post-push
cleanup cycles.

Are you sure about cfbot ?  AIUI cirrus would see that docs didn't change
relative to the previous run for branch: commitfest/NN/.

-- 
Justin




Re: Adding CI to our tree

2022-02-26 Thread Andres Freund
Hi,

On 2022-02-26 20:43:52 -0600, Justin Pryzby wrote:
> This doesn't do the right thing - I just tried.
> https://cirrus-ci.org/guide/writing-tasks/#environment-variables
> | changesInclude function can be very useful for skipping some tasks when no 
> changes to sources have been made since the last successful Cirrus CI build.

> That means it will not normally rebuild docs (and then this still requires
> resolving the "base branch").

Why would we want to rebuild docs if they're the same as in the last build for
the same branch?  For cfbot purposes each commit is independent from the prior
commit, so it should rebuild it every time if the CF entry has changes to the
docs.

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-02-26 Thread Justin Pryzby
On Sat, Feb 26, 2022 at 05:09:08PM -0800, Andres Freund wrote:
> > XXX: if this is run in the same task, the configure flags should probably be
> > consistent ?
> 
> What do you mean?

I mean that the commit to run CompilerWarnings unconditionally built docs with
different flags than the other stuff in that task.  If it's going to be a
separate task, then that doesn't matter.

> > +# Verify docs can be built, and upload changed docs as artifacts
> > +task:
> > +  name: HTML docs
> > +
> > +  env:
> > +CPUS: 1
> > +
> > +  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || 
> > $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*(docs|html).*'
> > +
> > +  container:
> > +image: $CONTAINER_REPO/linux_debian_bullseye_ci:latest
> > +cpu: $CPUS
> > +
> 
> how about using something like (the syntax might be slightly off)
>   skip: !changesInclude('doc/**')
> to avoid running it for the many pushes where no docs are changed?

This doesn't do the right thing - I just tried.
https://cirrus-ci.org/guide/writing-tasks/#environment-variables
| changesInclude function can be very useful for skipping some tasks when no 
changes to sources have been made since the last successful Cirrus CI build.

That means it will not normally rebuild docs (and then this still requires
resolving the "base branch").

-- 
Justin




Re: convert libpq uri-regress tests to tap test

2022-02-26 Thread Andres Freund
Hi,

On 2022-02-25 17:52:29 -0800, Andres Freund wrote:
> I'd like to commit 0001 and 0002 soon, unless somebody sees a reason not to?

Pushed.  Attached is the remainder, 0003, the move of libpq_pipeline to
src/interfaces/libpq that I'm not planning to push for now.

Regards,

Andres
>From 61a89721c8aab3a1fd2300067c470807b4fe87bc Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Thu, 24 Feb 2022 08:27:41 -0800
Subject: [PATCH v4] Move libpq_pipeline test into src/interfaces/libpq.

---
 .../libpq/t/002_pipeline.pl}  |  2 +-
 src/interfaces/libpq/test/.gitignore  |  1 +
 src/interfaces/libpq/test/Makefile|  2 +-
 .../libpq/test}/libpq_pipeline.c  |  0
 .../test}/traces/disallowed_in_pipeline.trace |  0
 .../libpq/test}/traces/multi_pipelines.trace  |  0
 .../libpq/test}/traces/nosync.trace   |  0
 .../libpq/test}/traces/pipeline_abort.trace   |  0
 .../libpq/test}/traces/prepared.trace |  0
 .../libpq/test}/traces/simple_pipeline.trace  |  0
 .../libpq/test}/traces/singlerow.trace|  0
 .../libpq/test}/traces/transaction.trace  |  0
 src/test/modules/Makefile |  1 -
 src/test/modules/libpq_pipeline/.gitignore|  5 
 src/test/modules/libpq_pipeline/Makefile  | 25 ---
 src/test/modules/libpq_pipeline/README|  1 -
 16 files changed, 3 insertions(+), 34 deletions(-)
 rename src/{test/modules/libpq_pipeline/t/001_libpq_pipeline.pl => interfaces/libpq/t/002_pipeline.pl} (96%)
 rename src/{test/modules/libpq_pipeline => interfaces/libpq/test}/libpq_pipeline.c (100%)
 rename src/{test/modules/libpq_pipeline => interfaces/libpq/test}/traces/disallowed_in_pipeline.trace (100%)
 rename src/{test/modules/libpq_pipeline => interfaces/libpq/test}/traces/multi_pipelines.trace (100%)
 rename src/{test/modules/libpq_pipeline => interfaces/libpq/test}/traces/nosync.trace (100%)
 rename src/{test/modules/libpq_pipeline => interfaces/libpq/test}/traces/pipeline_abort.trace (100%)
 rename src/{test/modules/libpq_pipeline => interfaces/libpq/test}/traces/prepared.trace (100%)
 rename src/{test/modules/libpq_pipeline => interfaces/libpq/test}/traces/simple_pipeline.trace (100%)
 rename src/{test/modules/libpq_pipeline => interfaces/libpq/test}/traces/singlerow.trace (100%)
 rename src/{test/modules/libpq_pipeline => interfaces/libpq/test}/traces/transaction.trace (100%)
 delete mode 100644 src/test/modules/libpq_pipeline/.gitignore
 delete mode 100644 src/test/modules/libpq_pipeline/Makefile
 delete mode 100644 src/test/modules/libpq_pipeline/README

diff --git a/src/test/modules/libpq_pipeline/t/001_libpq_pipeline.pl b/src/interfaces/libpq/t/002_pipeline.pl
similarity index 96%
rename from src/test/modules/libpq_pipeline/t/001_libpq_pipeline.pl
rename to src/interfaces/libpq/t/002_pipeline.pl
index 0c164dcaba5..2e288d8ba7c 100644
--- a/src/test/modules/libpq_pipeline/t/001_libpq_pipeline.pl
+++ b/src/interfaces/libpq/t/002_pipeline.pl
@@ -49,7 +49,7 @@ for my $testname (@tests)
 		my $expected;
 		my $result;
 
-		$expected = slurp_file_eval("traces/$testname.trace");
+		$expected = slurp_file_eval("test/traces/$testname.trace");
 		next unless $expected ne "";
 		$result = slurp_file_eval($traceout);
 		next unless $result ne "";
diff --git a/src/interfaces/libpq/test/.gitignore b/src/interfaces/libpq/test/.gitignore
index 5e803d8816a..e24d7f64dc3 100644
--- a/src/interfaces/libpq/test/.gitignore
+++ b/src/interfaces/libpq/test/.gitignore
@@ -1 +1,2 @@
 /uri-regress
+/libpq_pipeline
diff --git a/src/interfaces/libpq/test/Makefile b/src/interfaces/libpq/test/Makefile
index 54212159065..9f99309653f 100644
--- a/src/interfaces/libpq/test/Makefile
+++ b/src/interfaces/libpq/test/Makefile
@@ -11,7 +11,7 @@ endif
 override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS_INTERNAL += $(libpq_pgport)
 
-PROGS = uri-regress
+PROGS = uri-regress libpq_pipeline
 
 all: $(PROGS)
 
diff --git a/src/test/modules/libpq_pipeline/libpq_pipeline.c b/src/interfaces/libpq/test/libpq_pipeline.c
similarity index 100%
rename from src/test/modules/libpq_pipeline/libpq_pipeline.c
rename to src/interfaces/libpq/test/libpq_pipeline.c
diff --git a/src/test/modules/libpq_pipeline/traces/disallowed_in_pipeline.trace b/src/interfaces/libpq/test/traces/disallowed_in_pipeline.trace
similarity index 100%
rename from src/test/modules/libpq_pipeline/traces/disallowed_in_pipeline.trace
rename to src/interfaces/libpq/test/traces/disallowed_in_pipeline.trace
diff --git a/src/test/modules/libpq_pipeline/traces/multi_pipelines.trace b/src/interfaces/libpq/test/traces/multi_pipelines.trace
similarity index 100%
rename from src/test/modules/libpq_pipeline/traces/multi_pipelines.trace
rename to src/interfaces/libpq/test/traces/multi_pipelines.trace
diff --git a/src/test/modules/libpq_pipeline/traces/nosync.trace b/src/interfaces/libpq/test/traces/nosync.trace
similarity index 100%
rename from 

Re: range_agg with multirange inputs

2022-02-26 Thread Chapman Flack
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

This applies (with some fuzz) and passes installcheck-world, but a rebase
is needed, because 3 lines of context aren't enough to get the doc changes
in the right place in the aggregate function table. (I think generating
the patch with 4 lines of context would be enough to keep that from being
a recurring issue.)

One thing that seems a bit funny is this message in the new
multirange_agg_transfn:

+   if (!type_is_multirange(mltrngtypoid))
+   ereport(ERROR,
+   (errcode(ERRCODE_DATATYPE_MISMATCH),
+errmsg("range_agg must be called with a multirange")));

It's clearly copied from the corresponding test and message in
range_agg_transfn. They both say "range_agg must be called ...", which
makes perfect sense, as from the user's perspective both messages come
from (different overloads of) a function named range_agg.

Still, it could be odd to have (again from the user's perspective)
a function named range_agg that sometimes says "range_agg must be
called with a range" and other times says "range_agg must be called
with a multirange".

I'm not sure how to tweak the wording (of either message or both) to
make that less weird, but there's probably a way.

I kind of wonder whether either message is really reachable, at least
through the aggregate machinery in the expected way. Won't that machinery
ensure that it is calling the right transfn with the right type of
argument? If that's the case, maybe the messages could only be seen
by someone calling the transfn directly ... which also seems ruled out
for these transfns because the state type is internal. Is this whole test
more in the nature of an assertion?

Regards,
-Chap

The new status of this patch is: Waiting on Author


Re: Adding CI to our tree

2022-02-26 Thread Andres Freund
On 2022-02-26 17:09:08 -0800, Andres Freund wrote:
> You could put the script in src/tools/ci and call it from the script to avoid
> the quoting issues.

Might also be a good idea for the bulk of the docs / coverage stuff, even if
there are no quoting issues.




Re: Adding CI to our tree

2022-02-26 Thread Andres Freund
Hi,

> Subject: [PATCH 2/7] cirrus: upload changed html docs as artifacts

I still think the determination of the base branch needs to be resolved before
this can be considered.


> Always run doc build; to allow them to be shown in cfbot, they should not be
> skipped if the linux build fails.
> 
> This could be done on the client side (cfbot).  One advantage of doing it here
> is that fewer docs are uploaded - many patches won't upload docs at all.

Imo this stuff is largely independent of the commit subject.


> XXX: if this is run in the same task, the configure flags should probably be
> consistent ?

What do you mean?


> Subject: [PATCH 3/7] s!build docs as a separate task..

Could you reorder this earlier? Then we can merge it before resolving the
branch issues, and we don't waffle on the CompilerWarnings dependency.


> I believe this'll automatically show up as a separate "column" on the cfbot
> page.

Yup.


> +# Verify docs can be built, and upload changed docs as artifacts
> +task:
> +  name: HTML docs
> +
> +  env:
> +CPUS: 1
> +
> +  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || 
> $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*(docs|html).*'
> +
> +  container:
> +image: $CONTAINER_REPO/linux_debian_bullseye_ci:latest
> +cpu: $CPUS
> +

how about using something like (the syntax might be slightly off)
  skip: !changesInclude('doc/**')
to avoid running it for the many pushes where no docs are changed?


> +  sysinfo_script: |
> +id
> +uname -a
> +cat /proc/cmdline
> +ulimit -a -H && ulimit -a -S
> +export
> +
> +git remote -v
> +git branch -a
> +git remote add postgres https://github.com/postgres/postgres
> +time git fetch -v postgres master
> +git log -1 postgres/master
> +git diff --name-only postgres/master..

Hardly "sysinfo"?


> Subject: [PATCH 4/7] wip: cirrus: code coverage
> 
> XXX: lcov should be installed in the OS image

FWIW, you can open a PR in https://github.com/anarazel/pg-vm-images/
both the debian docker and VM have their packages installed
via scripts/linux_debian_install_deps.sh


> From 226699150e3e224198fc297689add21bece51c4b Mon Sep 17 00:00:00 2001
> From: Justin Pryzby 
> Date: Sun, 9 Jan 2022 18:25:02 -0600
> Subject: [PATCH 5/7] cirrus/vcregress: test modules/contrib with
>  NO_INSTALLCHECK=1

I don't want to commit the vcregress.pl part myself. But if you split it off
I'm happy to push the --temp-config bits.


> From 08933bcd93d4f57ad73ab6df2f1627b93e61b459 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby 
> Date: Sun, 16 Jan 2022 12:51:13 -0600
> Subject: [PATCH 6/7] wip: cirrus/windows: save tmp_install as an artifact..
> 
> ..to allow users to easily download compiled binaries to try a patch.
> If they don't have a development environment handy or not familiar with
> compilation.
> 
> XXX: maybe this should be conditional or commented out ?

Yea, I don't want to do this by default, that's a fair bit of data that very
likely nobody will ever access. One can make entire tasks triggered manually,
but that'd then require building again :/.



> From a7d2bba6f51d816412fb645b0d4821c36ee5c400 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby 
> Date: Sun, 20 Feb 2022 15:01:59 -0600
> Subject: [PATCH 7/7] wip: cirrus/windows: add compiler_warnings_script
> 
> I'm not sure how to write this test in windows shell; it's also not easy to
> write it in posix sh, since windows shell is somehow interpreting && and
> ||...

You could put the script in src/tools/ci and call it from the script to avoid
the quoting issues.

Would be good to add a comment explaining the fileLoggerParameters1 thing and
a warning that compiler_warnings_script should be the last script.


Greetings,

Andres Freund




Re: Fix a typo in pg_receivewal.c's get_destination_dir()

2022-02-26 Thread Andres Freund
Hi,

On 2022-02-26 22:04:13 +0530, Bharath Rupireddy wrote:
> It looks like there's a typo in pg_receivewal.c's
> get_destination_dir(), that is, use the function parameter dest_folder
> instead of global variable basedir in the error message. Although
> there's no harm in it as basedir is a global variable in
> pg_receivewal.c, let's use the function parameter to be in sync with
> close_destination_dir.
> 
> Attaching a tiny patch to fix it.

Good catch, pushed.

Greetings,

Andres Freund




Re: Commitfest manager for 2022-03

2022-02-26 Thread Justin Pryzby
Can I suggest updating the CF app to allow:
| Target version: 16

I also suggest updating patches to indicate which are (not) being considered
for v15.

A few specific ones from myself:

|https://commitfest.postgresql.org/37/2573/
|pg_dump - read data for some options from external file    Ready for Committer    15
Marked RFC by Daniel Gustafsson since 2021-10-01.
Is it more likely than not to be included in v15, or should the "v15" be
removed ?

|https://commitfest.postgresql.org/37/3499/
|libpq compression  Waiting on Author   15
I re-raised the same concerns made ~12 months ago but haven't heard back.
WOA since 35 days.  Unlikely to be considered/included in v15.

|https://commitfest.postgresql.org/37/3571/
|Add LZ4 compression in pg_dump Needs review
This patch was just submitted on 2022-02-25.
I did a lot of the same things this patch does for a previous patch submission
(32/2888) for pg_dump/ZSTD, so I could review this, if there were interest to
include it in v15.

|https://commitfest.postgresql.org/37/2349/
|Global temporary table Needs review15
The handling/hijacking of pg_class and pg_statistic is not likely to be
acceptable.  I doubt this will be progressed for v15.

|https://commitfest.postgresql.org/37/3448/
|reduce impact of lengthy startup and checkpoint tasks  Needs review15
I think this is in the design/concept phase, and not likely for v15, except
maybe in reduced scope.

|https://commitfest.postgresql.org/37/2906/
|Row filtering for logical replication
If I'm not wrong, this is merged and should be closed?

For my own patches, it'd be helpful if someone would suggest if any are (not)
likely to be progressed for v15.

These are marked RFC, but have no "committer" set.  Are they all targeting
v15?

https://commitfest.postgresql.org/37/2863/
Function to log backtrace of postgres processes    Ready for Committer
Should target v15 ?

https://commitfest.postgresql.org/37/2897/
Faster pglz compression    Ready for Committer    15
Tomas, are you still planning to merge this one ?

https://commitfest.postgresql.org/37/2992/
Allow batched insert during cross-partition updates    Ready for Committer

https://commitfest.postgresql.org/37/3073/
Add callback table access method to reset filenode when dropping relation    Ready for Committer

https://commitfest.postgresql.org/37/3384/
use has_privs_for_role for predefined roles    Ready for Committer    15

Do all of the RFC patches target v15 (except bugfixes) ?
https://commitfest.postgresql.org/37/?status=3

These are not marked as targeting any version .. are any of them
being considered for v15 ?

https://commitfest.postgresql.org/37/2903/
Parallel Hash Full Join    Needs review

https://commitfest.postgresql.org/37/3508/
Avoid smgrimmedsync() during index build and add unbuffered IO API    Needs review

https://commitfest.postgresql.org/37/3182/
automatically generating node support functions    Needs review

https://commitfest.postgresql.org/37/3183/
Detectable crashes and unlogged table resets    Needs review    Jeff Davis (jdavis)

These are set as "targeting v15", but have no committer set.  If someone
thinks it's unrealistic that they'll be included in v15, I suggest updating
them to not say so.

https://commitfest.postgresql.org/37/3272/
Add system view tracking shared buffer actions    Needs review    15

https://commitfest.postgresql.org/37/3224/
Fix ExecRTCheckPerms() inefficiency with many prunable partitions    Needs review    15

https://commitfest.postgresql.org/37/3270/
Cache tuple routing info during bulk loads into partitioned tables    Needs review    15

https://commitfest.postgresql.org/37/3461/
In-place persistence change of a relation (fast ALTER TABLE ... SET LOGGED with wal_level=minimal)    Needs review    15

https://commitfest.postgresql.org/37/3539/
Allow parallel plan for referential integrity checks    Needs review    15

https://commitfest.postgresql.org/37/3397/
Prefetching in recovery, take II    Needs review    15    Thomas Munro (macdice)

https://commitfest.postgresql.org/37/2482/
jsonpath syntax extensions    Needs review    15

https://commitfest.postgresql.org/37/2901/
SQL/JSON: functions    Needs review    15

https://commitfest.postgresql.org/37/2902/
SQL/JSON: JSON_TABLE    Needs review    15

https://commitfest.postgresql.org/37/3099/
Asymmetric partition-wise JOIN    Needs review    15

https://commitfest.postgresql.org/37/3293/
Tags in errordata    Needs review    15

https://commitfest.postgresql.org/37/3358/
Update relfrozenxmin when 

Re: [Proposal] Global temporary tables

2022-02-26 Thread Justin Pryzby
I read through this.
Find attached some language fixes.  You should be able to apply each "fix"
patch on top of your own local branch with git am, and then squish them
together.  Let me know if you have trouble with that.

I think get_seqence_start_value() should be static.  (Or otherwise, it should
be in lsyscache.c).

The include added to execPartition.c seems to be unused.

+#define RELATION_IS_TEMP_ON_CURRENT_SESSION(relation) \
+#define RELATION_IS_TEMP(relation) \
+#define RelpersistenceTsTemp(relpersistence) \
+#define RELATION_GTT_ON_COMMIT_DELETE(relation)\

=> These macros can evaluate their arguments multiple times.
You should add a comment to warn about that.  And maybe avoid passing them a
function argument, like: RelpersistenceTsTemp(get_rel_persistence(rte->relid))
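
To illustrate the multiple-evaluation hazard, here is a self-contained toy
example; the macro and function here are hypothetical stand-ins, not the
patch's actual code:

```c
#include <assert.h>

/*
 * Toy stand-in: like the patch's macros, it expands its argument more
 * than once.
 */
#define PERSISTENCE_IS_TEMP(p) ((p) == 't' || (p) == 'g')

static int ncalls = 0;

/* An argument with a side effect: each call bumps a counter. */
static char
get_persistence(void)
{
	ncalls++;
	return 'p';			/* permanent: forces both comparisons */
}
```

Passing get_persistence() to the macro runs the function twice, which is why a
warning comment (or passing only side-effect-free expressions) matters.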

+list_all_backend_gtt_frozenxids should return TransactionId not int.
The function name should say "oldest" and not "all" ?

I think the GUC should have a longer name.  max_active_gtt is too short for a
global var.

+#defineMIN_NUM_ACTIVE_GTT  0
+#defineDEFAULT_NUM_ACTIVE_GTT  1000
+#defineMAX_NUM_ACTIVE_GTT  100

+intmax_active_gtt = MIN_NUM_ACTIVE_GTT

It's being initialized to MIN, but then the GUC machinery sets it to DEFAULT.
By convention, it should be initialized to default.

fout->remoteVersion >= 14

=> should say 15

describe.c has gettext_noop("session"), which is a half-truth.  The data is
per-session but the table definition is persistent..

You redirect stats from pg_class and pg_statistics to a local hash table.
This is pretty hairy :(
I guess you'd also need to handle pg_statistic_ext and ext_data.
pg_stats doesn't work, since the data isn't in pg_statistic - it'd need to look
at pg_get_gtt_statistics.

I wonder if there's a better way to do it, like updating pg_statistic but
forcing the changes to be rolled back when the session ends...  But I think
that would make longrunning sessions behave badly, the same as "longrunning
transactions".

Have you looked at Gilles Darold's GTT extension ?
>From b89f3cc5c78e7b4c3e10ab39ef527b524d0d112d Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 1 Jan 2022 17:02:30 -0600
Subject: [PATCH 02/11] f!0002-gtt-v64-doc.patch

---
 doc/src/sgml/ref/create_table.sgml | 42 --
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 408373323c..9cd5fc17f2 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -174,18 +174,18 @@ WITH ( MODULUS numeric_literal, REM
  
   If specified, the table is created as a temporary table.
   Optionally, GLOBAL or LOCAL
-  can be written before TEMPORARY or TEMP.
+  may be written before TEMPORARY or TEMP.
   They represent two types of temporary tables supported by PostgreSQL:
-  global temporary table and local temporary table. Without specified
-  GLOBAL or LOCAL, a local temporary table is created by default.
+  global temporary table and local temporary table. If neither
+  GLOBAL or LOCAL is specified, a local temporary table is created by default.
  
 
 
- Both types of temporary tables’ data are truncated at the
+ Data of both types of temporary tables is truncated at the
  end of a session or optionally at the end of the current transaction.
- (see ON COMMIT below). For global temporary table,
- its schema is reserved and reused by future sessions or transactions.
- For local temporary table, both its data and its schema are dropped.
+ (see ON COMMIT below). For a global temporary table,
+ its table definition is preserved and reused by future sessions or transactions.
+ For a local temporary table, both its data and its table definition are dropped.
 
 
 
@@ -194,10 +194,10 @@ WITH ( MODULUS numeric_literal, REM
   

 Global temporary table are defined just once and automatically exist
-(starting with empty contents) in every session that needs them.
-The schema definition of temporary tables is persistent and shared among sessions.
-However, the data in temporary tables are kept private to sessions themselves,
-even though they use same name and same schema.
+(initially with empty contents) in every session.
+The schema definition of a temporary table is persistent and common to all sessions.
+However, the data in temporary tables is private to each sessions,
+even though they share the same name and same table definition.

   
  
@@ -206,15 +206,15 @@ WITH ( MODULUS numeric_literal, REM
   Local Temporary Table
  
  
-  Local temporary table are automatically dropped at the end of a
-  session (include schema and data). Future sessions need to create
-  their own temporary tables when they are used.
+  Local 

Re: Missed condition-variable wakeups on FreeBSD

2022-02-26 Thread Melanie Plageman
On Sat, Feb 26, 2022 at 2:07 PM Tom Lane  wrote:
>
> About once a month over the last six months, my buildfarm animal
> florican has gotten stuck while running the core regression tests.
> The symptoms have looked very much the same each time: there is
> a backend with two parallel worker processes that are just sitting
> and not consuming any CPU time.  Each time I've attached to these
> processes with gdb to check their stack traces, and seen pretty
> much the same story every time (traces below).  What is really
> interesting is that after I detach from the second worker, the
> processes resume running and finish out the test successfully!
> I don't know much about how gdb interacts with kernel calls on
> FreeBSD, but I speculate that the poll(2) call returns with EINTR
> after gdb releases the process, and then things resume fine,
> suggesting that we lost an interrupt somewhere.
>
> I have observed this three times in the REL_11 branch, once
> in REL_12, and a couple of times last summer before it occurred
> to me to start keeping notes.  Over that time the machine has
> been running various patchlevels of FreeBSD 13.0.
>
> Here's the stack trace from the leader process in the most
> recent event (REL_11 as of yesterday).  It's not always the
> same query that gets stuck, but it's always a parallel hash join:
>
> (gdb) p debug_query_string
> $1 = 0x21873090 "select count(*) from simple r join simple s using (id);"
> (gdb) bt
> #0  _poll () at _poll.S:4
> #1  0x21701361 in __thr_poll (fds=0x219dc170, nfds=2, timeout=-1) at 
> /usr/src/lib/libthr/thread/thr_syscalls.c:338
> #2  0x215eaf3f in poll (pfd=0x219dc170, nfds=2, timeout=-1) at 
> /usr/src/lib/libc/sys/poll.c:47
> #3  0x0097b0fd in WaitEventSetWaitBlock (set=, cur_timeout=-1, 
> occurred_events=, nevents=) at latch.c:1171
> #4  WaitEventSetWait (set=0x219dc138, timeout=-1, occurred_events=<optimized out>, nevents=1, wait_event_info=134217738) at latch.c:1000
> #5  0x0099842b in ConditionVariableSleep (cv=0x2bf0b230, 
> wait_event_info=134217738) at condition_variable.c:157
> #6  0x00977e12 in BarrierArriveAndWait (barrier=0x2bf0b218, 
> wait_event_info=134217738) at barrier.c:191
> #7  0x00873441 in ExecHashJoinImpl (pstate=, parallel=true) at 
> nodeHashjoin.c:328
> #8  ExecParallelHashJoin (pstate=0x2a887740) at nodeHashjoin.c:600
> #9  0x0086a509 in ExecProcNode (node=0x2a887740) at 
> ../../../src/include/executor/executor.h:248
> #10 fetch_input_tuple (aggstate=aggstate@entry=0x2a887500) at nodeAgg.c:407
> #11 0x00869646 in agg_retrieve_direct (aggstate=) at 
> nodeAgg.c:1746
> #12 ExecAgg (pstate=0x2a887500) at nodeAgg.c:1561
> #13 0x0086e181 in ExecProcNode (node=0x2a887500) at 
> ../../../src/include/executor/executor.h:248
> #14 gather_getnext (gatherstate=0x2a887414) at nodeGather.c:276
> #15 ExecGather (pstate=0x2a887414) at nodeGather.c:207
> #16 0x0086a509 in ExecProcNode (node=0x2a887414) at 
> ../../../src/include/executor/executor.h:248
> #17 fetch_input_tuple (aggstate=aggstate@entry=0x2a8871b8) at nodeAgg.c:407
> #18 0x00869646 in agg_retrieve_direct (aggstate=) at 
> nodeAgg.c:1746
> #19 ExecAgg (pstate=0x2a8871b8) at nodeAgg.c:1561
> #20 0x0085956b in ExecProcNode (node=0x2a8871b8) at 
> ../../../src/include/executor/executor.h:248
> #21 ExecutePlan (estate=0x2a887090, planstate=0x2a8871b8, 
> use_parallel_mode=, operation=CMD_SELECT, 
> sendTuples=,
> numberTuples=, direction=, dest=0x2ab629b4, 
> execute_once=) at execMain.c:1712
> #22 standard_ExecutorRun (queryDesc=0x218a1490, 
> direction=ForwardScanDirection, count=0, execute_once=) at 
> execMain.c:353
> #23 0x00859445 in ExecutorRun (queryDesc=0x218a1490, 
> direction=ForwardScanDirection, count=0, execute_once=) at 
> execMain.c:296
> #24 0x009a3d57 in PortalRunSelect (portal=portal@entry=0x2197f090, 
> forward=, count=0, dest=0x2ab629b4) at pquery.c:941
> #25 0x009a39d6 in PortalRun (portal=0x2197f090, count=2147483647, 
> isTopLevel=, run_once=, dest=0x2ab629b4, 
> altdest=0x2ab629b4,
> completionTag=0xffbfdc70 "") at pquery.c:782
> #26 0x009a2b53 in exec_simple_query 
> (query_string=query_string@entry=0x21873090 "select count(*) from simple r 
> join simple s using (id);") at postgres.c:1145
> #27 0x009a04ec in PostgresMain (argc=1, argv=0x21954ed4, dbname=<optimized out>, username=0x21954d38 "buildfarm") at postgres.c:4052
>
> Worker 1 looks like this:
>
> (gdb) bt
> #0  _poll () at _poll.S:4
> #1  0x21701361 in __thr_poll (fds=0x2187cb48, nfds=2, timeout=-1) at 
> /usr/src/lib/libthr/thread/thr_syscalls.c:338
> #2  0x215eaf3f in poll (pfd=0x2187cb48, nfds=2, timeout=-1) at 
> /usr/src/lib/libc/sys/poll.c:47
> #3  0x0097b0fd in WaitEventSetWaitBlock (set=, cur_timeout=-1, 
> occurred_events=, nevents=) at latch.c:1171
> #4  WaitEventSetWait (set=0x2187cb10, timeout=-1, occurred_events=<optimized out>, nevents=1, wait_event_info=134217738) at latch.c:1000
> #5  0x0099842b in ConditionVariableSleep (cv=0x2196b230, 
> wait_event_info=134217738) at condition_variable.c:157
> #6  

Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-02-26 Thread Nathan Bossart
On Sat, Feb 26, 2022 at 05:03:04PM -0500, Chapman Flack wrote:
> I've now looked through the rest, and the only further thing I noticed
> was that xlog.c's do_pg_start_backup still has a tablespaces parameter
> to receive a List* of tablespaces if the caller wants, but this patch
> removes the comment describing it:
> 
> 
> - * If "tablespaces" isn't NULL, it receives a list of tablespaceinfo structs
> - * describing the cluster's tablespaces.
> 
> 
> which seems like collateral damage.

Thanks.  I will fix this and the proofreading nits you noted upthread in
the next revision.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-02-26 Thread Nathan Bossart
On Sat, Feb 26, 2022 at 04:48:52PM +, Chapman Flack wrote:
> My biggest concerns are the changes to the SQL-visible pg_start_backup
> and pg_stop_backup functions. When the non-exclusive API was implemented
> (in 7117685), that was done with care (with a new optional argument to
> pg_start_backup, and a new overload of pg_stop_backup) to avoid immediate
> breakage of working backup scripts.
> 
> With this patch, even scripts that were dutifully migrated to that new API and
> now invoke pg_start_backup(label, false) or (label, exclusive => false) will
> immediately and unnecessarily break. What I would suggest for this patch
> would be to change the exclusive default from true to false, and have the
> function report an ERROR if true is passed.
> 
> Otherwise, for sites using a third-party backup solution, there will be an
> unnecessary requirement to synchronize a PostgreSQL upgrade with an
> upgrade of the backup solution that won't be broken by the change. For
> a site with their backup procedures scripted in-house, there will be an
> unnecessarily urgent need for the current admin team to study and patch
> the currently-working scripts.
> 
> That can be avoided by just changing the default to false and rejecting calls
> where true is passed. That will break only scripts that never got the memo
> about moving to non-exclusive backup, available for six years now.
> 
> Assuming the value is false, so no error is thrown, is it practical to determine
> from flinfo->fn_expr whether the value was defaulted or supplied? If so, I would
> further suggest reporting a deprecation WARNING if it was explicitly supplied,
> with a HINT that the argument can simply be removed at the call site, and will
> become unrecognized in some future release.

This is a good point.  I think I agree with your proposed changes.  I
believe it is possible to add a deprecation warning only when 'exclusive'
is specified.  If anything, we can create a separate function that accepts
the 'exclusive' parameter and that always emits a NOTICE or WARNING.

> pg_stop_backup needs thought, because 7117685 added a new overload for that
> function, rather than just an optional argument. This patch removes the old
> niladic version that returned pg_lsn, leaving just one version, with an optional
> argument, that returns a record.
> 
> Here again, the old niladic one was only suitable for exclusive backups, so there
> can't be any script existing in 2022 that still calls that unless it has never been
> updated in six years to nonexclusive backups, and that breakage can't be helped.
> 
> Any scripts that did get dutifully updated over the last six years will be calling
> the record-returning version, passing false, or exclusive => false. This patch as
> it stands will unnecessarily break those, but here again I think that can be
> avoided just by making the exclusive parameter optional with default false, and
> reporting an error if true is passed.
> 
> Here again, I would consider also issuing a deprecation warning if the argument
> is explicitly supplied, if it is practical to determine that from fn_expr. (I
> haven't looked yet to see how practical that is.)

Agreed.  I will look into updating this one, too.  I think the 'exclusive'
parameter should remain documented for now for both pg_start_backup() and
pg_stop_backup(), but this documentation will just note that it is there
for backward compatibility and must be set to false.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-02-26 Thread Chapman Flack
On 02/26/22 11:48, Chapman Flack wrote:
> This patch applies cleanly for me and passes installcheck-world.
> I have not yet studied all of the changes in detail.

I've now looked through the rest, and the only further thing I noticed
was that xlog.c's do_pg_start_backup still has a tablespaces parameter
to receive a List* of tablespaces if the caller wants, but this patch
removes the comment describing it:


- * If "tablespaces" isn't NULL, it receives a list of tablespaceinfo structs
- * describing the cluster's tablespaces.


which seems like collateral damage.

Regards,
-Chap




Checkpointer sync queue fills up / loops around pg_usleep() are bad

2022-02-26 Thread Andres Freund
Hi,

In two recent investigations in occasional test failures
(019_replslot_limit.pl failures, AIO rebase) the problems are somehow tied to
checkpointer.

I don't yet know if it's actually causally related to precisely those failures, but
when running e.g. 027_stream_regress.pl, I see phases in which many backends
are looping in RegisterSyncRequest() repeatedly, each time sleeping with
pg_usleep(10000L).


Without adding instrumentation this is completely invisible at any log
level. There are no log messages, there are no wait events, nothing.

ISTM, we should not have any loops around pg_usleep(). And shorter term, we
shouldn't have any loops around pg_usleep() that don't emit log messages / set
wait events. Therefore I propose that we "prohibit" such loops without at
least a DEBUG2 elog() or so. It's just too hard to debug.


The reason for the sync queue filling up in 027_stream_regress.pl is actually
fairly simple:

1) The test runs with shared_buffers = 1MB, leading to a small sync queue of
   128 entries.
2) CheckpointWriteDelay() does pg_usleep(100000L)

ForwardSyncRequest() wakes up the checkpointer using SetLatch() if the sync
queue is more than half full.

But at least on linux and freebsd that doesn't actually interrupt pg_usleep()
anymore (due to using signalfd / kqueue rather than a signal handler). And on
all platforms the signal might arrive just before the pg_usleep() rather than
during, also not causing usleep to be interrupted.
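That before-the-sleep variant of the race is easy to reproduce outside the
server. A minimal Python sketch (purely illustrative; the server-side
equivalents are pg_usleep() and the latch signal):

```python
import signal
import time

# Install a harmless handler so SIGUSR1 only records its delivery.
hits = []
signal.signal(signal.SIGUSR1, lambda signum, frame: hits.append(signum))

# The "wakeup" arrives before the sleep starts rather than during it...
signal.raise_signal(signal.SIGUSR1)

t0 = time.monotonic()
time.sleep(0.2)  # ...so the sleep is not interrupted and runs its full 200 ms
elapsed = time.monotonic() - t0
```

The sleep completes in full even though the signal was delivered, which is
exactly why a fixed-length sleep adds latency bounded only by its duration.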

If I shorten the sleep in CheckpointWriteDelay() the problem goes away. This
actually reduces the time for a single run of 027_stream_regress.pl on my
workstation noticeably.  With default sleep time it's ~32s, with shortened time
it's ~27s.

I suspect we need to do something about this concrete problem for 14 and
master, because it's certainly worse than before on linux / freebsd.

I suspect the easiest is to just convert that usleep to a WaitLatch(). That'd
require adding a new enum value to WaitEventTimeout in 14. Which probably is
fine?
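As a stand-in for the proposed WaitLatch() conversion, here is the same idea
with a plain event object in Python (illustrative only; the real change would
use WaitLatch()/ResetLatch() and a new WaitEventTimeout member):

```python
import threading
import time

latch = threading.Event()  # stands in for the checkpointer's process latch

def forward_sync_request():
    time.sleep(0.05)
    latch.set()            # analogue of SetLatch() when the queue fills up

threading.Thread(target=forward_sync_request).start()

t0 = time.monotonic()
woke = latch.wait(timeout=1.0)  # analogue of WaitLatch() with a 1 s timeout
latency = time.monotonic() - t0
```

The wait returns as soon as the event is set instead of burning the rest of
the timeout, which is the behavior the pg_usleep() call lost once
signalfd/kqueue stopped interrupting it.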



Greetings,

Andres Freund




Re: Commitfest manager for 2022-03

2022-02-26 Thread Greg Stark
On Sat, 26 Feb 2022 at 01:33, Julien Rouhaud  wrote:
>
> On Sat, Feb 26, 2022 at 02:42:33PM +0900, Michael Paquier wrote:
> > On Fri, Feb 25, 2022 at 01:58:55PM -0600, David Steele wrote:
> > > On 2/25/22 12:39, Greg Stark wrote:
> > >> I would like to volunteer.
> >
> > > I've been hoping somebody would volunteer, so I'm all in favor of you 
> > > being
> > > CF.
> >
> > Greg as CFM would be fine.  It's nice to see someone volunteer.
>
> +1, thanks a lot Greg!
>
> Note that the last CF of a major version is probably even more work than
> "regular" CF, so if other people are interested there's probably room for
> multiple CFMs.

I do have the time available. What I don't necessarily have is insight
into everything that needs to be done, especially behind the scenes.
So if someone could give me tips about things I'm missing, or review
what I'm doing to check that I've covered everything, that would be
great.


-- 
greg




Re: explain_regress, explain(MACHINE), and default to explain(BUFFERS) (was: BUFFERS enabled by default in EXPLAIN (ANALYZE))

2022-02-26 Thread Justin Pryzby
Rebased over ebf6c5249b7db525e59563fb149642665c88f747.
It looks like that patch handles only query_id, and this patch also tries to
handle a bunch of other stuff.

If it's helpful, feel free to kick this patch to a future CF.
From e58fffedc6f1cf471228fb3234faba35898678c3 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 22 Feb 2020 21:17:10 -0600
Subject: [PATCH 1/6] Add GUC: explain_regress

This changes the defaults for explain to: costs off, timing off, summary off.
It'd be reasonable to use this for new regression tests which are not intended
to be backpatched.
---
 contrib/postgres_fdw/postgres_fdw.c |  2 +-
 src/backend/commands/explain.c  | 13 +++--
 src/backend/utils/misc/guc.c| 13 +
 src/bin/psql/describe.c |  2 +-
 src/include/commands/explain.h  |  2 ++
 src/test/regress/expected/explain.out   |  3 +++
 src/test/regress/expected/inherit.out   |  2 +-
 src/test/regress/expected/stats_ext.out |  2 +-
 src/test/regress/pg_regress.c   |  3 ++-
 src/test/regress/sql/explain.sql|  4 
 src/test/regress/sql/inherit.sql|  2 +-
 src/test/regress/sql/stats_ext.sql  |  2 +-
 12 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 56654844e8..54da40d662 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -3124,7 +3124,7 @@ estimate_path_cost_size(PlannerInfo *root,
 		 * values, so don't request params_list.
 		 */
 		initStringInfo(&sql);
-		appendStringInfoString(&sql, "EXPLAIN ");
+		appendStringInfoString(&sql, "EXPLAIN (COSTS)");
 		deparseSelectStmtForRel(&sql, root, foreignrel, fdw_scan_tlist,
 remote_conds, pathkeys,
 fpextra ? fpextra->has_final_sort : false,
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index de81379da3..22a3cf6349 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -154,6 +154,7 @@ static void ExplainJSONLineEnding(ExplainState *es);
 static void ExplainYAMLLineStarting(ExplainState *es);
 static void escape_yaml(StringInfo buf, const char *str);
 
+bool explain_regress = false; /* GUC */
 
 
 /*
@@ -172,6 +173,7 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	ListCell   *lc;
 	bool		timing_set = false;
 	bool		summary_set = false;
+	bool		costs_set = false;
 
 	/* Parse options list. */
 	foreach(lc, stmt->options)
@@ -183,7 +185,11 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 		else if (strcmp(opt->defname, "verbose") == 0)
 			es->verbose = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "costs") == 0)
+		{
+			/* Need to keep track if it was explicitly set to ON */
+			costs_set = true;
 			es->costs = defGetBoolean(opt);
+		}
 		else if (strcmp(opt->defname, "buffers") == 0)
 			es->buffers = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "wal") == 0)
@@ -227,13 +233,16 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	 parser_errposition(pstate, opt->location)));
 	}
 
+	/* if the costs option was not set explicitly, set default value */
+	es->costs = (costs_set) ? es->costs : es->costs && !explain_regress;
+
 	if (es->wal && !es->analyze)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("EXPLAIN option WAL requires ANALYZE")));
 
 	/* if the timing was not set explicitly, set default value */
-	es->timing = (timing_set) ? es->timing : es->analyze;
+	es->timing = (timing_set) ? es->timing : es->analyze && !explain_regress;
 
 	/* check that timing is used with EXPLAIN ANALYZE */
 	if (es->timing && !es->analyze)
@@ -242,7 +251,7 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
  errmsg("EXPLAIN option TIMING requires ANALYZE")));
 
 	/* if the summary was not set explicitly, set default value */
-	es->summary = (summary_set) ? es->summary : es->analyze;
+	es->summary = (summary_set) ? es->summary : es->analyze && !explain_regress;
 
 	query = castNode(Query, stmt->query);
 	if (IsQueryIdEnabled())
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 1e3650184b..87266cf7bd 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -46,6 +46,7 @@
 #include "catalog/pg_authid.h"
 #include "catalog/storage.h"
 #include "commands/async.h"
+#include "commands/explain.h"
 #include "commands/prepare.h"
 #include "commands/tablespace.h"
 #include "commands/trigger.h"
@@ -1474,6 +1475,18 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+
+	{
+		{"explain_regress", PGC_USERSET, DEVELOPER_OPTIONS,
+			gettext_noop("Change defaults of EXPLAIN to avoid unstable output."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_EXPLAIN
+		},
+		&explain_regress,
+		false,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"log_parser_stats", PGC_SUSET, STATS_MONITORING,
 			gettext_noop("Writes parser performance statistics to the server log."),
diff --git 

Re: Missed condition-variable wakeups on FreeBSD

2022-02-26 Thread Justin Pryzby
On Sat, Feb 26, 2022 at 02:07:05PM -0500, Tom Lane wrote:
> I don't know much about how gdb interacts with kernel calls on
> FreeBSD, but I speculate that the poll(2) call returns with EINTR
> after gdb releases the process, and then things resume fine,
> suggesting that we lost an interrupt somewhere.

I've seen some similar interactions with strace under linux causing a process
to be woken up or otherwise incur a different behavior (not necessarily
postgres).  

> Thoughts?  Ideas on debugging this?

Before attaching a debugger, figure out what syscall each process is in.
In linux, that's ps O wchan PID.

>> Besides trying to make the issue more likely as suggested above, it might be
>> worth checking if signalling the stuck processes with SIGUSR1 gets them
>> unstuck.

And SIGCONT.

Maybe already did this, but you can dump a corefile of the running processes to
allow future inspection.

-- 
Justin




Re: Missed condition-variable wakeups on FreeBSD

2022-02-26 Thread Andres Freund
Hi,

On 2022-02-26 14:07:05 -0500, Tom Lane wrote:
> About once a month over the last six months, my buildfarm animal
> florican has gotten stuck while running the core regression tests.
> The symptoms have looked very much the same each time: there is
> a backend with two parallel worker processes that are just sitting
> and not consuming any CPU time.  Each time I've attached to these
> processes with gdb to check their stack traces, and seen pretty
> much the same story every time (traces below).  What is really
> interesting is that after I detach from the second worker, the
> processes resume running and finish out the test successfully!
> I don't know much about how gdb interacts with kernel calls on
> FreeBSD, but I speculate that the poll(2) call returns with EINTR
> after gdb releases the process, and then things resume fine,
> suggesting that we lost an interrupt somewhere.
>
> I have observed this three times in the REL_11 branch, once
> in REL_12, and a couple of times last summer before it occurred
> to me to start keeping notes.  Over that time the machine has
> been running various patchlevels of FreeBSD 13.0.

It's certainly interesting that it appears to happen only in the branches
using poll rather than kqueue to implement latches. That changed between 12
and 13.

Have you tried running the core regression tests with force_parallel_mode =
on, or with the parallel costs lowered, to see if that makes the problem
appear more often?
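For anyone wanting to try that, a postgresql.conf fragment along these lines
(standard GUCs on these branches; the exact values are only a suggestion)
pushes the planner toward parallel plans for nearly every query:

```
force_parallel_mode = on
parallel_setup_cost = 0
parallel_tuple_cost = 0
min_parallel_table_scan_size = 0
min_parallel_index_scan_size = 0
```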


> Here's the stack trace from the leader process in the most
> recent event (REL_11 as of yesterday).  It's not always the
> same query that gets stuck, but it's always a parallel hash join:

Which does make me wonder if it's a latch issue or a logic issue in hashjoin /
barriers. HJ is the only user of barrier.c...


There's this commit, which subsequently was reverted due to some issue, and
not re-applied since...

commit 4e0f0995e923948631c4114ab353b256b51b58ad
Author: Thomas Munro 
Date:   2021-03-17 17:46:39 +1300
 
Fix race in Parallel Hash Join batch cleanup.

It doesn't seem super likely to be related, but...


> (gdb) p debug_query_string
> $1 = 0x21873090 "select count(*) from simple r join simple s using (id);"
> (gdb) bt
> #0  _poll () at _poll.S:4
> #1  0x21701361 in __thr_poll (fds=0x219dc170, nfds=2, timeout=-1) at 
> /usr/src/lib/libthr/thread/thr_syscalls.c:338
> #2  0x215eaf3f in poll (pfd=0x219dc170, nfds=2, timeout=-1) at 
> /usr/src/lib/libc/sys/poll.c:47
> #3  0x0097b0fd in WaitEventSetWaitBlock (set=, cur_timeout=-1, 
> occurred_events=, nevents=) at latch.c:1171

The next time this happens / if you still have this open, perhaps it could be
worth checking if there's a byte in the self pipe?


> Thoughts?  Ideas on debugging this?

Besides trying to make the issue more likely as suggested above, it might be
worth checking if signalling the stuck processes with SIGUSR1 gets them
unstuck.

Greetings,

Andres Freund




Missed condition-variable wakeups on FreeBSD

2022-02-26 Thread Tom Lane
About once a month over the last six months, my buildfarm animal
florican has gotten stuck while running the core regression tests.
The symptoms have looked very much the same each time: there is
a backend with two parallel worker processes that are just sitting
and not consuming any CPU time.  Each time I've attached to these
processes with gdb to check their stack traces, and seen pretty
much the same story every time (traces below).  What is really
interesting is that after I detach from the second worker, the
processes resume running and finish out the test successfully!
I don't know much about how gdb interacts with kernel calls on
FreeBSD, but I speculate that the poll(2) call returns with EINTR
after gdb releases the process, and then things resume fine,
suggesting that we lost an interrupt somewhere.

I have observed this three times in the REL_11 branch, once
in REL_12, and a couple of times last summer before it occurred
to me to start keeping notes.  Over that time the machine has
been running various patchlevels of FreeBSD 13.0.

Here's the stack trace from the leader process in the most
recent event (REL_11 as of yesterday).  It's not always the
same query that gets stuck, but it's always a parallel hash join:

(gdb) p debug_query_string 
$1 = 0x21873090 "select count(*) from simple r join simple s using (id);"
(gdb) bt
#0  _poll () at _poll.S:4
#1  0x21701361 in __thr_poll (fds=0x219dc170, nfds=2, timeout=-1) at 
/usr/src/lib/libthr/thread/thr_syscalls.c:338
#2  0x215eaf3f in poll (pfd=0x219dc170, nfds=2, timeout=-1) at 
/usr/src/lib/libc/sys/poll.c:47
#3  0x0097b0fd in WaitEventSetWaitBlock (set=, cur_timeout=-1, 
occurred_events=, nevents=) at latch.c:1171
#4  WaitEventSetWait (set=0x219dc138, timeout=-1, occurred_events=<optimized out>, nevents=1, wait_event_info=134217738) at latch.c:1000
#5  0x0099842b in ConditionVariableSleep (cv=0x2bf0b230, 
wait_event_info=134217738) at condition_variable.c:157
#6  0x00977e12 in BarrierArriveAndWait (barrier=0x2bf0b218, 
wait_event_info=134217738) at barrier.c:191
#7  0x00873441 in ExecHashJoinImpl (pstate=, parallel=true) at 
nodeHashjoin.c:328
#8  ExecParallelHashJoin (pstate=0x2a887740) at nodeHashjoin.c:600
#9  0x0086a509 in ExecProcNode (node=0x2a887740) at 
../../../src/include/executor/executor.h:248
#10 fetch_input_tuple (aggstate=aggstate@entry=0x2a887500) at nodeAgg.c:407
#11 0x00869646 in agg_retrieve_direct (aggstate=) at 
nodeAgg.c:1746
#12 ExecAgg (pstate=0x2a887500) at nodeAgg.c:1561
#13 0x0086e181 in ExecProcNode (node=0x2a887500) at 
../../../src/include/executor/executor.h:248
#14 gather_getnext (gatherstate=0x2a887414) at nodeGather.c:276
#15 ExecGather (pstate=0x2a887414) at nodeGather.c:207
#16 0x0086a509 in ExecProcNode (node=0x2a887414) at 
../../../src/include/executor/executor.h:248
#17 fetch_input_tuple (aggstate=aggstate@entry=0x2a8871b8) at nodeAgg.c:407
#18 0x00869646 in agg_retrieve_direct (aggstate=) at 
nodeAgg.c:1746
#19 ExecAgg (pstate=0x2a8871b8) at nodeAgg.c:1561
#20 0x0085956b in ExecProcNode (node=0x2a8871b8) at 
../../../src/include/executor/executor.h:248
#21 ExecutePlan (estate=0x2a887090, planstate=0x2a8871b8, 
use_parallel_mode=, operation=CMD_SELECT, sendTuples=, 
numberTuples=, direction=, dest=0x2ab629b4, 
execute_once=) at execMain.c:1712
#22 standard_ExecutorRun (queryDesc=0x218a1490, direction=ForwardScanDirection, 
count=0, execute_once=) at execMain.c:353
#23 0x00859445 in ExecutorRun (queryDesc=0x218a1490, 
direction=ForwardScanDirection, count=0, execute_once=) at 
execMain.c:296
#24 0x009a3d57 in PortalRunSelect (portal=portal@entry=0x2197f090, 
forward=, count=0, dest=0x2ab629b4) at pquery.c:941
#25 0x009a39d6 in PortalRun (portal=0x2197f090, count=2147483647, 
isTopLevel=, run_once=, dest=0x2ab629b4, 
altdest=0x2ab629b4, 
completionTag=0xffbfdc70 "") at pquery.c:782
#26 0x009a2b53 in exec_simple_query (query_string=query_string@entry=0x21873090 
"select count(*) from simple r join simple s using (id);") at postgres.c:1145
#27 0x009a04ec in PostgresMain (argc=1, argv=0x21954ed4, dbname=<optimized out>, username=0x21954d38 "buildfarm") at postgres.c:4052

Worker 1 looks like this:

(gdb) bt
#0  _poll () at _poll.S:4
#1  0x21701361 in __thr_poll (fds=0x2187cb48, nfds=2, timeout=-1) at 
/usr/src/lib/libthr/thread/thr_syscalls.c:338
#2  0x215eaf3f in poll (pfd=0x2187cb48, nfds=2, timeout=-1) at 
/usr/src/lib/libc/sys/poll.c:47
#3  0x0097b0fd in WaitEventSetWaitBlock (set=, cur_timeout=-1, 
occurred_events=, nevents=) at latch.c:1171
#4  WaitEventSetWait (set=0x2187cb10, timeout=-1, occurred_events=<optimized out>, nevents=1, wait_event_info=134217738) at latch.c:1000
#5  0x0099842b in ConditionVariableSleep (cv=0x2196b230, 
wait_event_info=134217738) at condition_variable.c:157
#6  0x00977e12 in BarrierArriveAndWait (barrier=0x2196b218, 
wait_event_info=134217738) at barrier.c:191
#7  0x00873441 in ExecHashJoinImpl (pstate=, parallel=true) at 
nodeHashjoin.c:328
#8  ExecParallelHashJoin (pstate=0x2a8378d8) at nodeHashjoin.c:600
#9  
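For readers following along, the reason a lost notify is survivable at all is
that condition-variable sleepers re-check a predicate in a loop, much like the
BarrierArriveAndWait()/ConditionVariableSleep() callers in these traces. A
rough Python analogue (threading.Condition only approximates
condition_variable.c, which additionally registers the waiter before the
predicate check):

```python
import threading

cv = threading.Condition()
ready = False   # the predicate the sleeper actually waits for
done = []

def producer():
    global ready
    with cv:
        ready = True
        cv.notify()  # edge-triggered: a notify with no waiter yet is simply lost

def consumer():
    with cv:
        # Sleep in a predicate loop; with a timeout, a genuinely lost
        # wakeup costs one timeout period instead of hanging forever.
        while not ready:
            cv.wait(timeout=0.1)
        done.append(True)

p = threading.Thread(target=producer)
p.start()
p.join()  # the notify fires before anyone is waiting

c = threading.Thread(target=consumer)
c.start()
c.join(timeout=2.0)
```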

Re: Reducing power consumption on idle servers

2022-02-26 Thread Tom Lane
Magnus Hagander  writes:
>> Deprecating explicit file-based promotion is possible and simple, so
>> that is the approach in the latest version of the patch.

> Is there any actual use-case for this other than backwards
> compatibility?

The fundamental problem with signal-based promotion is that it's
an edge-triggered rather than level-triggered condition, ie if you
miss the single notification event you're screwed.  I'm not sure
why we'd want to go there, considering all the problems we're having
with edge-triggered APIs in nearby threads.

We could combine the two approaches, perhaps: have "pg_ctl promote"
create a trigger file and then send a signal.  The trigger file
would record the existence of the promotion request, so that if the
postmaster isn't running right now or misses the signal, it'd still
promote when restarted or otherwise at the next opportunity.
But with an approach like this, we could expect quick response to
the signal in normal conditions, without need for constant wakeups
to check for the file.  Also, this route would allow overloading
of the signal, since it would just mean "wake up, I asked you to
do something" rather than needing to carry the full semantics of
what is to be done.
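That combined scheme is simple to sketch. In this illustrative Python analogue
(the trigger path and function names are hypothetical; the real logic would
live in the postmaster/startup process), the file makes the request
level-triggered and the signal merely provides a prompt wakeup:

```python
import os
import signal
import tempfile

TRIGGER = os.path.join(tempfile.mkdtemp(), "promote")  # hypothetical path

promote_signalled = False

def on_sigusr1(signum, frame):
    global promote_signalled
    promote_signalled = True

signal.signal(signal.SIGUSR1, on_sigusr1)

def request_promotion():
    # Record the request durably first...
    with open(TRIGGER, "w"):
        pass
    # ...then nudge the server; missing this signal is now harmless.
    signal.raise_signal(signal.SIGUSR1)

def promotion_pending(signalled):
    # Checked on wakeup and again at startup: the file survives a
    # missed signal or a restart of the process.
    return signalled or os.path.exists(TRIGGER)

request_promotion()
```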

regards, tom lane




Re: Reducing power consumption on idle servers

2022-02-26 Thread Magnus Hagander
On Mon, Feb 21, 2022 at 5:11 PM Simon Riggs
 wrote:
>
> On Sat, 19 Feb 2022 at 17:03, Andres Freund  wrote:
> >
> > Hi,
> >
> > On 2022-02-19 14:10:39 +, Simon Riggs wrote:
> > IMO we should instead consider either deprecating file based promotion, or
> > adding an optional dependency on filesystem monitoring APIs (i.e. inotify 
> > etc)
> > that avoid the need to poll for file creation.

Came here to suggest exactly that :)


> Deprecating explicit file-based promotion is possible and simple, so
> that is the approach in the latest version of the patch.

Is there any actual use-case for this other than backwards
compatibility? The alternative has certainly been around for some time
now, so if we don't know a specific use-case for the file-based one
it's definitely time to deprecate it properly.


-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: Document ordering guarantees on INSERT/UPDATE RETURNING clause

2022-02-26 Thread Shay Rojansky
> > > That seems very reasonable; if the situation is similar on PostgreSQL,
> > > then I'd suggest making that very clear in the INSERT[2] and
> > > UPDATE[3] docs.
> >
> > There is clearly no mention of such a guarantee in our documentation.
>
> Yes, which is just how SQL works: a set doesn't have any ordering unless an
> explicit one has been defined, RETURNING is no exception to that.

Thanks for confirming that such a guarantee doesn't exist. I would still
suggest explicitly calling that out in the docs around RETURNING, since
that seems like an understandable pitfall; personally speaking, this certainly
wasn't clear to me when first looking at it (even if it is now).


Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-02-26 Thread Chapman Flack
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

This patch applies cleanly for me and passes installcheck-world.
I have not yet studied all of the changes in detail.

Some proofreading nits in the documentation: the pg_stop_backup
with arguments has lost the 'exclusive' argument, but still shows a comma
before the 'wait_for_archive' argument. And the niladic pg_stop_backup
is still documented, though it no longer exists.

My biggest concerns are the changes to the SQL-visible pg_start_backup
and pg_stop_backup functions. When the non-exclusive API was implemented
(in 7117685), that was done with care (with a new optional argument to
pg_start_backup, and a new overload of pg_stop_backup) to avoid immediate
breakage of working backup scripts.

With this patch, even scripts that were dutifully migrated to that new API and
now invoke pg_start_backup(label, false) or (label, exclusive => false) will
immediately and unnecessarily break. What I would suggest for this patch
would be to change the exclusive default from true to false, and have the
function report an ERROR if true is passed.

Otherwise, for sites using a third-party backup solution, there will be an
unnecessary requirement to synchronize a PostgreSQL upgrade with an
upgrade of the backup solution that won't be broken by the change. For
a site with their backup procedures scripted in-house, there will be an
unnecessarily urgent need for the current admin team to study and patch
the currently-working scripts.

That can be avoided by just changing the default to false and rejecting calls
where true is passed. That will break only scripts that never got the memo
about moving to non-exclusive backup, available for six years now.

Assuming the value is false, so no error is thrown, is it practical to determine
from flinfo->fn_expr whether the value was defaulted or supplied? If so, I would
further suggest reporting a deprecation WARNING if it was explicitly supplied,
with a HINT that the argument can simply be removed at the call site, and will
become unrecognized in some future release.
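
Concretely, the suggested compatibility behavior would look something like this
at call sites (a sketch of the proposed semantics, not the patch as posted;
'mylabel' is a placeholder):

```sql
-- Scripts already migrated to non-exclusive backups keep working,
-- perhaps with a deprecation WARNING when the argument is explicit:
SELECT pg_start_backup('mylabel', exclusive => false);

-- With the default changed to false, the argument can simply be dropped:
SELECT pg_start_backup('mylabel');

-- Only never-migrated scripts break, with a clear error:
SELECT pg_start_backup('mylabel', exclusive => true);  -- ERROR
```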

pg_stop_backup needs thought, because 7117685 added a new overload for that
function, rather than just an optional argument. This patch removes the old
niladic version that returned pg_lsn, leaving just one version, with an optional
argument, that returns a record.

Here again, the old niladic one was only suitable for exclusive backups, so there
can't be any script existing in 2022 that still calls that unless it has never been
updated in six years to nonexclusive backups, and that breakage can't be helped.

Any scripts that did get dutifully updated over the last six years will be calling the
record-returning version, passing false, or exclusive => false. This patch as it
stands will unnecessarily break those, but here again I think that can be avoided
just by making the exclusive parameter optional with default false, and reporting
an error if true is passed.

Here again, I would consider also issuing a deprecation warning if the argument
is explicitly supplied, if it is practical to determine that from fn_expr. (I haven't
looked yet to see how practical that is.)

Regards,
-Chap

Fix a typo in pg_receivewal.c's get_destination_dir()

2022-02-26 Thread Bharath Rupireddy
Hi,

It looks like there's a typo in pg_receivewal.c's
get_destination_dir(), that is, use the function parameter dest_folder
instead of global variable basedir in the error message. Although
there's no harm in it as basedir is a global variable in
pg_receivewal.c, let's use the function parameter to be in sync with
close_destination_dir.

Attaching a tiny patch to fix it.

Regards,
Bharath Rupireddy.


v1-0001-Fix-a-typo-in-pg_receivewal.c-s-get_destination_d.patch
Description: Binary data


Re: Mingw task for Cirrus CI

2022-02-26 Thread Andrew Dunstan


On 2/25/22 19:27, Andres Freund wrote:
> Hi,
>
> Andrew, CCIng you both because you might be interested in the CI bit, and
> because you might know the answer.
>
>
>
>> 2-  src/pl/plpython/Makefile looks under "C:/Windows/system32/" for
>> PYTHONDLL. gendef cannot open that file, give an error like " failed to
>> open()" when creating a .def file.
> On my win10 VM in which I installed python.org python I don't see a python dll
> in c:/windows/system32 either. Looks like none of our windows mingw animals
> build with python. So it might just be bitrot.



There certainly was a time when python from python.org used to install
its DLL in the system32 directory, so I imagine that's why it's there.
I'm very glad to see that's no longer the case.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)

2022-02-26 Thread Nathan Bossart
On Sat, Feb 26, 2022 at 02:17:50PM +0530, Bharath Rupireddy wrote:
> A global min LSN of SendRqstPtr of all the sync standbys can be
> calculated and the async standbys can send WAL up to global min LSN.
> This is unlike what the v1 patch does i.e. async standbys will wait
> until the sync standbys report flush LSN back to the primary. Problem
> with the global min LSN approach is that there can still be a small
> window where async standbys can get ahead of sync standbys. Imagine
> async standbys being closer to the primary than sync standbys and if
> the failover has to happen while the WAL at SendRqstPtr isn't received
> by the sync standbys, but the async standbys can receive them as they
> are closer. We hit the same problem that we are trying to solve with
> this patch. This is the reason, we are waiting till the sync flush LSN
> as it guarantees more transactional protection.

Do you mean that the application of WAL gets ahead on your async standbys
or that the writing/flushing of WAL gets ahead?  If synchronous_commit is
set to 'remote_write' or 'on', I think either approach can lead to
situations where the async standbys are ahead of the sync standbys with WAL
application.  For example, a conflict between WAL replay and a query on
your sync standby could delay WAL replay, but the primary will not wait for
this conflict to resolve before considering a transaction synchronously
replicated and sending it to the async standbys.

If writing/flushing WAL gets ahead on async standbys, I think something is
wrong with the patch.  If you aren't sending WAL to async standbys until
it is synchronously replicated to the sync standbys, it should by
definition be impossible for this to happen.

If you wanted to make sure that WAL was not applied to async standbys
before it was applied to sync standbys, I think you'd need to set
synchronous_commit to 'remote_apply'.  This would ensure that the WAL is
replayed on sync standbys before the primary considers the transaction
synchronously replicated and sends it to the async standbys.
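
If the goal is that async standbys never see WAL before the sync standbys have
applied it, the corresponding configuration would be along these lines (a
sketch; the standby names s1/s2 are placeholders):

```
synchronous_commit = remote_apply
synchronous_standby_names = 'FIRST 1 (s1, s2)'
```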

> Do you think allowing async standbys optionally wait for either remote
> write or flush or apply or global min LSN of SendRqstPtr so that users
> can choose what they want?

I'm not sure I follow the difference between "global min LSN of
SendRqstPtr" and remote write/flush/apply.  IIUC you are saying that we
could use the LSN of what is being sent to sync standbys instead of the LSN
of what the primary considers synchronously replicated.  I don't think we
should do that because it provides no guarantee that the WAL has even been
sent to the sync standbys before it is sent to the async standbys.  For
this feature, I think we always need to consider what the primary considers
synchronously replicated.  My suggested approach doesn't change that.  I'm
saying that instead of spinning in a loop waiting for the WAL to be
synchronously replicated, we just immediately send WAL up to the LSN that
is presently known to be synchronously replicated.

You do bring up an interesting point, though.  Is there a use-case for
specifying synchronous_commit='on' but not sending WAL to async replicas
until it is synchronously applied?  Or alternatively, would anyone want to
set synchronous_commit='remote_apply' but send WAL to async standbys as
soon as it is written to the sync standbys?  My initial reaction is that we
should depend on the synchronous replication setup.  As long as the primary
considers an LSN synchronously replicated, it would be okay to send it to
the async standbys.  I personally don't think it is worth taking on the
extra complexity for that level of configuration just yet.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: why do hash index builds use smgrextend() for new splitpoint pages

2022-02-26 Thread Melanie Plageman
On Fri, Feb 25, 2022 at 11:17 PM Amit Kapila  wrote:
>
> On Sat, Feb 26, 2022 at 3:01 AM Melanie Plageman
>  wrote:
> >
> > Since _hash_alloc_buckets() WAL-logs the last page of the
> > splitpoint, is it safe to skip the smgrimmedsync()? What if the last
> > page of the splitpoint doesn't end up having any tuples added to it
> > during the index build and the redo pointer is moved past the WAL for
> > this page and then later there is a crash sometime before this page
> > makes it to permanent storage. Does it matter that this page is lost? If
> > not, then why bother WAL-logging it?
> >
>
> I think we don't care if the page is lost before we update the
> meta-page in the caller because we will try to reallocate in that
> case. But we do care after meta page update (having the updated
> information about this extension via different masks) in which case we
> won't lose this last page because it would have registered the sync
> request for it via sgmrextend before meta page update.

and could it happen that during smgrextend() for the last page, a
checkpoint starts and finishes between FileWrite() and
register_dirty_segment(), then index build finishes, and then a crash
occurs before another checkpoint completes the pending fsync for that
last page?




Re: Document ordering guarantees on INSERT/UPDATE RETURNING clause

2022-02-26 Thread Julien Rouhaud
Hi,

On Sat, Feb 26, 2022 at 06:25:22AM -0700, David G. Johnston wrote:
> On Sat, Feb 26, 2022 at 5:42 AM Shay Rojansky  wrote:
> 
> > FWIW I've received feedback from a SQL Server engineer that one definitely
> > should *not* depend on such ordering there, and that future optimizations
> > (e.g. parallel insertion of many rows) could result in row ordering which
> > differs from the lexical ordering of the VALUES clause.
> >
> 
> 
> > That seems very reasonable; if the situation is similar on PostgreSQL,
> > then I'd suggest making that very clear in the INSERT[2] and UPDATE[3] docs.
> >
> 
> There is clearly no mention of such a guarantee in our documentation.

Yes, which is just how SQL works: a set doesn't have any ordering unless an
explicit one has been defined, RETURNING is no exception to that.




Re: Document ordering guarantees on INSERT/UPDATE RETURNING clause

2022-02-26 Thread David G. Johnston
On Sat, Feb 26, 2022 at 5:42 AM Shay Rojansky  wrote:

> FWIW I've received feedback from a SQL Server engineer that one definitely
> should *not* depend on such ordering there, and that future optimizations
> (e.g. parallel insertion of many rows) could result in row ordering which
> differs from the lexical ordering of the VALUES clause.
>


> That seems very reasonable; if the situation is similar on PostgreSQL,
> then I'd suggest making that very clear in the INSERT[2] and UPDATE[3] docs.
>

There is clearly no mention of such a guarantee in our documentation.

David J.


Document ordering guarantees on INSERT/UPDATE RETURNING clause

2022-02-26 Thread Shay Rojansky
Hi all,

I've seen various discussions around whether PG makes any guarantees on the
ordering of rows returned by the RETURNING clause (e.g. [1]). In a
nutshell, when executing a statement such as the following:

CREATE TABLE foo (id INT PRIMARY KEY GENERATED ALWAYS AS IDENTITY, data INT);
INSERT INTO foo (data) VALUES (8), (9), (10) RETURNING id, data;

... is the INSERT guaranteed to return the rows (1,8), (2,9) and (3,10)
(and in that order)? This point is important when inserting multiple rows
and wanting to e.g. match a database-generated ID back to memory structures
on the client.

FWIW I've received feedback from a SQL Server engineer that one definitely
should *not* depend on such ordering there, and that future optimizations
(e.g. parallel insertion of many rows) could result in row ordering which
differs from the lexical ordering of the VALUES clause. That seems very
reasonable; if the situation is similar on PostgreSQL, then I'd suggest
making that very clear in the INSERT[2] and UPDATE[3] docs. I'd also
possibly point to the workaround of wrapping the INSERT/UPDATE in a CTE
which then defines the ordering.
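
One shape such a CTE workaround might take (illustrative; the explicit ORDER BY
is what actually defines the output ordering):

```sql
WITH inserted AS (
    INSERT INTO foo (data) VALUES (8), (9), (10)
    RETURNING id, data
)
SELECT id, data FROM inserted ORDER BY id;
```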

Thanks,

Shay

[1]
https://stackoverflow.com/questions/5439293/is-insert-returning-guaranteed-to-return-things-in-the-right-order
[2] https://www.postgresql.org/docs/current/sql-insert.html
[3] https://www.postgresql.org/docs/current/sql-update.html


Re: C++ Trigger Framework

2022-02-26 Thread Shmuel Kamensky
OK, great, thank you, Bruce! I'll check back in with any interesting results
when I have something running (probably in a few months to a year since I
don't yet know C++ very well :-)).


Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)

2022-02-26 Thread Bharath Rupireddy
On Sat, Feb 26, 2022 at 3:22 AM Hsu, John  wrote:
>
> > On Fri, Feb 25, 2022 at 08:31:37PM +0530, Bharath Rupireddy wrote:
> >> Thanks Satya and others for the inputs. Here's the v1 patch that
> >> basically allows async wal senders to wait until the sync standbys
> >> report their flush lsn back to the primary. Please let me know your
> >> thoughts.
> > I haven't had a chance to look too closely yet, but IIUC this adds a new
> > function that waits for synchronous replication.  This new function
> > essentially spins until the synchronous LSN has advanced.
> >
> > I don't think it's a good idea to block sending any WAL like this.  AFAICT
> > it is possible that there will be a lot of synchronously replicated WAL
> > that we can send, and it might just be the last several bytes that cannot
> yet be replicated to the asynchronous standbys.  I believe this patch will
> > cause the server to avoid sending _any_ WAL until the synchronous LSN
> > advances.
> >
> > Perhaps we should instead just choose the SendRqstPtr based on the current
> > synchronous LSN.  Presumably there are other things we'd need to consider,
> > but in general, I think we ought to send as much WAL as possible for a
> > given call to XLogSendPhysical().
>
> I think you're right that we'll avoid sending any WAL until sync_lsn
> advances. We could setup a contrived situation where the async-walsender
> never advances because it terminates before the flush_lsn of the
> synchronous_node catches up. And when the async-walsender restarts,
> it'll start with the latest flushed on the primary and we could go into
> a perpetual loop.

The async walsender looks at the flush LSN from
walsndctl->lsn[SYNC_REP_WAIT_FLUSH] after it comes up and decides to
send the WAL up to it. If there are no sync replicas after it comes
up (users can make sync standbys async without a postmaster restart
because synchronous_standby_names is effective with SIGHUP), then it
doesn't wait at all and continues to send WAL. I don't see any problem
with it. Am I missing something here?

> I took a look at the patch and tested basic streaming with async
> replicas ahead of the synchronous standby and with logical clients as
> well and it works as expected.

Thanks for reviewing and testing the patch.

>  > ereport(LOG,
>  >(errmsg("async standby WAL sender with request LSN %X/%X
> is waiting as sync standbys are ahead with flush LSN %X/%X",
>  >LSN_FORMAT_ARGS(flushLSN),
> LSN_FORMAT_ARGS(sendRqstPtr)),
>  > errhidestmt(true)));
>
> I think this log formatting is incorrect.
> s/sync standbys are ahead/sync standbys are behind/ and I think you need
> to swap flushLsn and sendRqstPtr

I will correct it. "async standby WAL sender with request LSN %X/%X is
waiting as sync standbys are ahead with flush LSN %X/%X",
LSN_FORMAT_ARGS(sendRqstPtr), LSN_FORMAT_ARGS(flushLSN). I will think
more about better wording for these messages; any suggestions
here?

> When a walsender is waiting for the lsn on the synchronous replica to
> advance and a database stop is issued to the writer, the pg_ctl stop
> isn't able to proceed and the database seems to never shutdown.

I too observed this once or twice. It looks like the walsender isn't
detecting postmaster death in the for (;;) loop with WalSndWait. I'm not
sure if this is expected, or true of other wait loops in walsender code. Any
more thoughts here?

>  > Assert(priority >= 0);
>
> What's the point of the assert here?

Just for safety. I can remove it as the sync_standby_priority can
never be negative.

> Also the comments/code refer to AsyncStandbys, however it's also used
> for logical clients, which may or may not be standbys. Don't feel too
> strongly about the naming here but something to note.

I will try to be more informative by adding something like "async
standbys and logical replication subscribers".

>  > if (!ShouldWaitForSyncRepl())
>  >return;
>  > ...
>  > for (;;)
>  > {
>  >// rest of work
>  > }
>
> If we had a walsender already waiting for an ack, and the conditions of
> ShouldWaitForSyncRepl() change, such as disabling
> async_standbys_wait_for_sync_replication or synchronous replication
> it'll still wait since we never re-check the condition.

Yeah, I will add the checks inside the async walsender wait-loop.

> postgres=# select wait_event from pg_stat_activity where wait_event like
> 'AsyncWal%';
>wait_event
> --
>   AsyncWalSenderWaitForSyncReplication
>   AsyncWalSenderWaitForSyncReplication
>   AsyncWalSenderWaitForSyncReplication
> (3 rows)
>
> postgres=# show synchronous_standby_names;
>   synchronous_standby_names
> ---
>
> (1 row)
>
> postgres=# show async_standbys_wait_for_sync_replication;
>   async_standbys_wait_for_sync_replication
> --
>   off
> (1 row)
>
>  >LWLockAcquire(SyncRepLock, LW_SHARED);
>  >flushLSN = 

Re: PATCH: add "--config-file=" option to pg_rewind

2022-02-26 Thread Gunnar "Nick" Bluth

Am 26.02.22 um 06:51 schrieb Michael Paquier:

On Fri, Feb 25, 2022 at 10:35:49AM +0100, Gunnar "Nick" Bluth wrote:

Am 24.02.22 um 14:46 schrieb Daniel Gustafsson:

Actually, I think this looks like a saner approach.  Putting a config setting
in two place (postgresql.conf and on the commandline for pg_rewind) is a recipe
for them diverging.


FWIW, I have a bad feeling about passing down directly a command
through an option itself part of a command, so what's discussed on
this thread is refreshing.


Ta.



+   } else {
+   snprintf(postgres_cmd, sizeof(postgres_cmd),
+"\"%s\" -D \"%s\" --config_file=\"%s\" -C restore_command",
+postgres_exec_path, datadir_target, config_file);
+   }

Shouldn't this one use appendShellString() on config_file?


It probably should, yes. I don't fancy this repetitive code myself.
But then, pg_rewind as a whole could use an overhaul. I don't see any 
use of PQExpBuffer in it, but a lot of char* ...


I myself am not a C hacker (as you can probably tell already) and don't 
really feel fit (enough) to rewrite pg_rewind.c to use 
fe_utils/string_utils.h :/


I might give it a try anyhow, but that may be a bit... heavy, just to
add one option?


While we're at it (I'm really good at opening cans of worms):
09:47 $ grep -r -l --include \*.c fe_utils/string_utils.h src/bin/ | wc -l
28
09:48 $ grep -r -L --include \*.c fe_utils/string_utils.h src/bin/ | wc -l
66

GSOC project? ;-)

Best,
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Cato




Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)

2022-02-26 Thread Bharath Rupireddy
On Sat, Feb 26, 2022 at 1:08 AM Nathan Bossart  wrote:
>
> On Fri, Feb 25, 2022 at 08:31:37PM +0530, Bharath Rupireddy wrote:
> > Thanks Satya and others for the inputs. Here's the v1 patch that
> > basically allows async wal senders to wait until the sync standbys
> > report their flush lsn back to the primary. Please let me know your
> > thoughts.
>
> I haven't had a chance to look too closely yet, but IIUC this adds a new
> function that waits for synchronous replication.  This new function
> essentially spins until the synchronous LSN has advanced.
>
> I don't think it's a good idea to block sending any WAL like this.  AFAICT
> it is possible that there will be a lot of synchronously replicated WAL
> that we can send, and it might just be the last several bytes that cannot
> yet be replicated to the asynchronous standbys.  I believe this patch will
> cause the server to avoid sending _any_ WAL until the synchronous LSN
> advances.
>
> Perhaps we should instead just choose the SendRqstPtr based on the current
> synchronous LSN.  Presumably there are other things we'd need to consider,
> but in general, I think we ought to send as much WAL as possible for a
> given call to XLogSendPhysical().

A global min LSN of SendRqstPtr of all the sync standbys can be
calculated and the async standbys can send WAL up to global min LSN.
This is unlike what the v1 patch does i.e. async standbys will wait
until the sync standbys report flush LSN back to the primary. Problem
with the global min LSN approach is that there can still be a small
window where async standbys can get ahead of sync standbys. Imagine
async standbys being closer to the primary than sync standbys and if
the failover has to happen while the WAL at SendRqstPtr isn't received
by the sync standbys, but the async standbys can receive them as they
are closer. We hit the same problem that we are trying to solve with
this patch. This is the reason, we are waiting till the sync flush LSN
as it guarantees more transactional protection.

Do you think allowing async standbys optionally wait for either remote
write or flush or apply or global min LSN of SendRqstPtr so that users
can choose what they want?

> > I've done pgbench testing to see if the patch causes any problems. I
> > ran tests two times, there isn't much difference in the txns per
> > seconds (tps), although there's a delay in the async standby receiving
> > the WAL, after all, that's the feature we are pursuing.
>
> I'm curious what a longer pgbench run looks like when the synchronous
> replicas are in the same region.  That is probably a more realistic
> use-case.

We are performing more tests, I will share the results once done.

Regards,
Bharath Rupireddy.




Re: [PATCH] pg_permissions

2022-02-26 Thread Joel Jacobson
On Fri, Feb 25, 2022, at 22:12, Chapman Flack wrote:
> I would be happy to review this patch, but a look through the email leaves me
> thinking it may still be waiting on a C implementation of pg_get_acl(). Is 
> that
> right?

Not sure.

> And perhaps a view rename to pg_privileges, following Peter's comment?

+1

/Joel




RE: Design of pg_stat_subscription_workers vs pgstats

2022-02-26 Thread osumi.takami...@fujitsu.com
On Saturday, February 26, 2022 11:51 AM Amit Kapila  
wrote:
> I have reviewed the latest version and made a few changes along with fixing
> some of the pending comments by Peter Smith. The changes are as
> follows: (a) Removed m_databaseid in PgStat_MsgSubscriptionError as that is
> not required now; (b) changed the struct name PgStat_MsgSubscriptionPurge
> to PgStat_MsgSubscriptionDrop to make it similar to DropDb; (c) changed the
> view name to pg_stat_subscription_stats, we can reconsider it in future if 
> there
> is a consensus on some other name, accordingly changed the reset function
> name to pg_stat_reset_subscription_stats; (d) moved some of the newly
> added subscription stats functions adjacent to slots to main the consistency 
> in
> code; (e) changed comments at few places; (f) added LATERAL back to
> system_views query as we refer pg_subscription's oid in the function call,
> previously that was not clear.
> 
> Do let me know what you think of the attached?
Hi, thank you for updating the patch !


I have a couple of comments on v4.

(1)

I'm not sure if I'm correct, but I'd say sync_error_count
could come right after subname in the column order.
I feel there are cases where the column order is somewhat
related to the time/processing order (I'm thinking of
pg_stat_replication's LSN-related columns).
If that's right, the table-sync-related column could be the
first counter column within this patch.

(2) doc/src/sgml/monitoring.sgml

+Resets statistics for a single subscription shown in the
+pg_stat_subscription_stats view to zero. If
+the argument is NULL, reset statistics for all
+subscriptions.


I felt we could improve the first sentence.

From:
Resets statistics for a single subscription shown in the..

To(idea1):
Resets statistics for a single subscription defined by the argument to zero.

Or,
To(idea2):
Resets statistics to zero for a single subscription or for all subscriptions.
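
For reference, usage under the patch's naming would look something like this
(assuming the reset function takes a subscription OID, per the system_views
query described above; 'mysub' is a placeholder):

```sql
-- Reset statistics for a single subscription ...
SELECT pg_stat_reset_subscription_stats(
         (SELECT oid FROM pg_subscription WHERE subname = 'mysub'));

-- ... or pass NULL to reset statistics for all subscriptions.
SELECT pg_stat_reset_subscription_stats(NULL);
```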



Best Regards,
Takamichi Osumi