Re: Minimal logical decoding on standbys
On Mon, Apr 3, 2023 at 8:51 PM Drouvot, Bertrand wrote:
>
> On 4/2/23 10:10 PM, Andres Freund wrote:
> > Hi,
> >
> >>          restart_lsn = s->data.restart_lsn;
> >> -
> >> -        /*
> >> -         * If the slot is already invalid or is fresh enough, we don't need to
> >> -         * do anything.
> >> -         */
> >> -        if (XLogRecPtrIsInvalid(restart_lsn) || restart_lsn >= oldestLSN)
> >> +        slot_xmin = s->data.xmin;
> >> +        slot_catalog_xmin = s->data.catalog_xmin;
> >> +
> >> +        /* the slot has been invalidated (logical decoding conflict case) */
> >> +        if ((xid && ((LogicalReplicationSlotIsInvalid(s)) ||
> >> +        /* or the xid is valid and this is a non conflicting slot */
> >> +        (TransactionIdIsValid(*xid) && !(LogicalReplicationSlotXidsConflict(slot_xmin, slot_catalog_xmin, *xid) ||
> >> +        /* or the slot has been invalidated (obsolete LSN case) */
> >> +        (!xid && (XLogRecPtrIsInvalid(restart_lsn) || restart_lsn >= oldestLSN)))
> >>         {
> >
> > This still looks nearly unreadable. I suggest moving comments outside of the
> > if (), remove redundant parentheses, use a function to detect if the slot has
> > been invalidated.
>
> I made it as simple as:
>
>     /*
>      * If the slot is already invalid or is a non conflicting slot, we don't
>      * need to do anything.
>      */
>     islogical = xid ? true : false;
>
>     if (SlotIsInvalid(s, islogical) || SlotIsNotConflicting(s, islogical, xid, )
>
> in V56 attached.

Here the variable 'islogical' doesn't seem to convey its actual meaning, because one can imagine that it indicates whether the slot is logical, which I don't think is the actual intent. One idea to simplify this is to introduce a single function CanInvalidateSlot() or something like that, and move the logic from both the functions SlotIsInvalid() and SlotIsNotConflicting() into the new function.

--
With Regards,
Amit Kapila.
Re: same query but different result on pg16devel and pg15.2
The attached file includes the table schema information, but no data.

tender wang wrote on Tue, Apr 4, 2023 at 10:53:
> Hi hackers,
> I encounter a problem, as shown below:
>
> query:
>
> select
>   ref_0.ps_suppkey as c0,
>   ref_1.c_acctbal as c1,
>   ref_2.o_totalprice as c2,
>   ref_2.o_orderpriority as c3,
>   ref_2.o_clerk as c4
> from
>   public.partsupp as ref_0
>   left join public.nation as sample_0
>     inner join public.customer as sample_1
>     on (false)
>   on (true)
>   left join public.customer as ref_1
>     right join public.orders as ref_2
>     on (false)
>     left join public.supplier as ref_3
>     on (false)
>   on (sample_0.n_comment = ref_1.c_name)
> where (8 <= NULLIF(CASE WHEN (o_orderkey IS NOT NULL) THEN 4 ELSE 4 END,
>                    CASE WHEN (o_orderdate >= o_orderdate) THEN 95 ELSE 95 END))
> order by c0, c1, c2, c3, c4 limit 1;
>
> on pg16devel:
>
>  c0 | c1 | c2 | c3 | c4
> ----+----+----+----+----
>   1 |    |    |    |
> (1 row)
>
> plan:
>
>  Limit
>    ->  Sort
>          Sort Key: ref_0.ps_suppkey, c_acctbal, o_totalprice, o_orderpriority, o_clerk
>          ->  Nested Loop Left Join
>                ->  Seq Scan on partsupp ref_0
>                ->  Result
>                      One-Time Filter: false
> (7 rows)
>
> on pg15.2:
>
>  c0 | c1 | c2 | c3 | c4
> ----+----+----+----+----
> (0 rows)
>
> plan:
>
>  Limit
>    ->  Sort
>          Sort Key: ref_0.ps_suppkey, c_acctbal, o_totalprice, o_orderpriority, o_clerk
>          ->  Hash Left Join
>                Hash Cond: ((n_comment)::text = (c_name)::text)
>                Filter: (8 <= NULLIF(CASE WHEN (o_orderkey IS NOT NULL) THEN 4 ELSE 4 END, CASE WHEN (o_orderdate >= o_orderdate) THEN 95 ELSE 95 END))
>                ->  Nested Loop Left Join
>                      ->  Seq Scan on partsupp ref_0
>                      ->  Result
>                            One-Time Filter: false
>                ->  Hash
>                      ->  Result
>                            One-Time Filter: false
> (13 rows)
>
> regards,
> tender wang

dbt3-s0.01-janm.sql Description: Binary data
Re: Prefetch the next tuple's memory during seqscans
On Tue, 4 Apr 2023 at 07:47, Gregory Stark (as CFM) wrote:
> The referenced patch was committed March 19th but there's been no
> comment here. Is this patch likely to go ahead this release or should
> I move it forward again?

Thanks for the reminder on this. I have done some work on it but just didn't post it here as I didn't have good news.

The problem I'm facing is that after Melanie's recent refactoring work around heapgettup() [1], I can no longer get the same speedup as before with pg_prefetch_mem(). While testing Melanie's patches, I did some performance tests and did see a good increase in performance from them. I really don't know why the prefetching no longer shows the gains it did before. Perhaps the rearranged code is better able to make use of hardware prefetching of cache lines.

I am, however, inclined not to drop the pg_prefetch_mem() macro altogether just because I can no longer demonstrate any performance gains during sequential scans, so I decided to try what Thomas mentioned in [2]: use the prefetching macro to fetch the required tuples in PageRepairFragmentation() so that they're in CPU cache by the time we get to compactify_tuples().

I tried this using the same test as I described in [3], after adjusting the following line to use PANIC instead of LOG:

    ereport(LOG,
            (errmsg("redo done at %X/%X system usage: %s",
                    LSN_FORMAT_ARGS(xlogreader->ReadRecPtr),
                    pg_rusage_show(&ru0))));

Doing that allows me to repeat the test using the same WAL each time.

AMD 3990x CPU on Ubuntu 22.10 with 64GB RAM.

shared_buffers = 10GB
checkpoint_timeout = '1 h'
max_wal_size = 100GB
max_connections = 300

Master:
2023-04-04 15:54:55.635 NZST [15958] PANIC: redo done at 0/DC447610 system usage: CPU: user: 44.46 s, system: 0.97 s, elapsed: 45.45 s
2023-04-04 15:56:33.380 NZST [16109] PANIC: redo done at 0/DC447610 system usage: CPU: user: 43.80 s, system: 0.86 s, elapsed: 44.69 s
2023-04-04 15:57:25.968 NZST [16134] PANIC: redo done at 0/DC447610 system usage: CPU: user: 44.08 s, system: 0.74 s, elapsed: 44.84 s
2023-04-04 15:58:53.820 NZST [16158] PANIC: redo done at 0/DC447610 system usage: CPU: user: 44.20 s, system: 0.72 s, elapsed: 44.94 s

Prefetch Memory in PageRepairFragmentation():
2023-04-04 16:03:16.296 NZST [25921] PANIC: redo done at 0/DC447610 system usage: CPU: user: 41.73 s, system: 0.77 s, elapsed: 42.52 s
2023-04-04 16:04:07.384 NZST [25945] PANIC: redo done at 0/DC447610 system usage: CPU: user: 40.87 s, system: 0.86 s, elapsed: 41.74 s
2023-04-04 16:05:01.090 NZST [25968] PANIC: redo done at 0/DC447610 system usage: CPU: user: 41.20 s, system: 0.72 s, elapsed: 41.94 s
2023-04-04 16:05:49.235 NZST [25996] PANIC: redo done at 0/DC447610 system usage: CPU: user: 41.56 s, system: 0.66 s, elapsed: 42.24 s

That's about a 6.7% performance increase over master. Since I really just did the seqscan patch as a means to get the pg_prefetch_mem() patch in, I wonder if it's ok to scrap that in favour of the PageRepairFragmentation patch.

Updated patches attached.

David

[1] https://postgr.es/m/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ%40mail.gmail.com
[2] https://postgr.es/m/CA%2BhUKGJRtzbbhVmb83vbCiMRZ4piOAi7HWLCqs%3DGQ74mUPrP_w%40mail.gmail.com
[3] https://postgr.es/m/CAApHDvoKwqAzhiuxEt8jSquPJKDpH8DNUZDFUSX9P7DXrJdc3Q%40mail.gmail.com

v1-0001-Add-pg_prefetch_mem-macro-to-load-cache-lines.patch Description: Binary data
prefetch_in_PageRepairFragmentation.patch Description: Binary data
RE: Add missing copyright for pg_upgrade/t/* files
Dear Amit,

Thank you for responding!

> Yeah, it is good to have the Copyright to keep it consistent with
> other test files and otherwise as well.
>
> --- a/src/bin/pg_upgrade/t/001_basic.pl
> +++ b/src/bin/pg_upgrade/t/001_basic.pl
> @@ -1,3 +1,5 @@
> +# Copyright (c) 2022-2023, PostgreSQL Global Development Group
>
> How did you decide on the starting year as 2022?

I checked the commit log. 001_basic.pl was added once in 2017 but soon reverted [1][2]. Commit 322bec added the file again in 2022 [3], so I chose 2022. 002_pg_upgrade.pl was added at the same time [3], so it should definitely be 2022 as well.

[1]: https://github.com/postgres/postgres/commit/f41e56c76e39f02bef7ba002c9de03d62b76de4d
[2]: https://github.com/postgres/postgres/commit/58ffe141eb37c3f027acd25c1fc6b36513bf9380
[3]: https://github.com/postgres/postgres/commit/322becb6085cb92d3708635eea61b45776bf27b6

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Re: RFC: logical publication via inheritance root?
FYI, the WIP patch does not seem to apply cleanly anymore using the latest HEAD. See the cfbot rebase logs [1]. -- [1] http://cfbot.cputube.org/patch_42_4225.log Kind Regards, Peter Smith. Fujitsu Australia
Re: Add missing copyright for pg_upgrade/t/* files
On Mon, Apr 3, 2023 at 7:25 PM Hayato Kuroda (Fujitsu) wrote:
>
> While reading codes, I noticed that pg_upgrade/t/001_basic.pl and
> pg_upgrade/t/002_pg_upgrade.pl do not contain the copyright.
>
> I checked briefly and almost all files have that, so I thought they
> missed it. PSA the patch to fix them.

Yeah, it is good to have the Copyright to keep it consistent with other test files and otherwise as well.

--- a/src/bin/pg_upgrade/t/001_basic.pl
+++ b/src/bin/pg_upgrade/t/001_basic.pl
@@ -1,3 +1,5 @@
+# Copyright (c) 2022-2023, PostgreSQL Global Development Group

How did you decide on the starting year as 2022?

--
With Regards,
Amit Kapila.
Re: O(n) tasks cause lengthy startups and checkpoints
I sent this one to the next commitfest and marked it as waiting-on-author and targeted for v17. I'm aiming to have something that addresses the latest feedback ready for the July commitfest. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: pg_usleep for multisecond delays
On Mon, Apr 03, 2023 at 04:33:27PM -0400, Gregory Stark (as CFM) wrote: > Is this still a WIP? Is it targeting this release? There are only a > few days left before the feature freeze. I'm guessing it should just > move to the next CF for the next release? I moved it to the next commitfest and marked the target version as v17. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: [BUG] pg_stat_statements and extended query protocol
> Maybe, but is there any field demand for that?

I don't think there is.

> We clearly do need to fix the
> reported rowcount for cases where ExecutorRun is invoked more than
> once per ExecutorEnd call; but I think that's sufficient.

Sure, the original proposed fix, but with tracking es_total_processed only in EState, should be enough for now.

Regards,

Sami Imseih
Amazon Web Services (AWS)
RE: Support logical replication of DDLs
On Friday, March 31, 2023 6:31 AM Peter Smith wrote:

Hi,

> It seems that lately, the patch attachments are lacking version
> numbers. It causes unnecessary confusion. For example, I sometimes
> fetch patches from this thread locally to "diff" them with previous
> patches to get a rough overview of the changes -- that has now become
> more difficult.
>
> Can you please reinstate the name convention of having version numbers
> for all patch attachments?
>
> IMO *every* post that includes patches should unconditionally increment
> the patch version -- even if the new patches are just a rebase or some
> other trivial change. Version numbers make it clear what patches are
> the latest, you will be easily able to unambiguously refer to them by
> name in subsequent posts, and when copied to your local computer they
> won't clash with any older copied patches.

The patch currently uses the date as the version number. I think the reason is that multiple people are working on the patch, which would cause the version numbers to change very frequently (soon becoming a very large number). So, to avoid this, we used the date to distinguish different versions.

Best Regards,
Hou zj
Re: [BUG] pg_stat_statements and extended query protocol
"Imseih (AWS), Sami" writes:
> I wonder if the right answer here is to track fetches as
> a separate counter in pg_stat_statements, in which fetch
> refers to the number of times a portal is executed?

Maybe, but is there any field demand for that?

IMV, the existing behavior is that we count one "call" per overall query execution (that is, per ExecutorEnd invocation). The argument that that's a bug and we should change it seems unsupportable to me, and even the argument that we should also count ExecutorRun calls seems quite lacking in evidence. We clearly do need to fix the reported rowcount for cases where ExecutorRun is invoked more than once per ExecutorEnd call; but I think that's sufficient.

			regards, tom lane
Re: zstd compression for pg_dump
On Mon, Apr 03, 2023 at 11:26:09PM +0200, Tomas Vondra wrote:
> On 4/3/23 21:17, Justin Pryzby wrote:
> > On Sat, Apr 01, 2023 at 10:26:01PM +0200, Tomas Vondra wrote:
> >>> Feel free to mess around with threads (but I'd much rather see the
> >>> patch progress for zstd:long).
> >>
> >> OK, understood. The long mode patch is pretty simple. IIUC it does not
> >> change the format, i.e. in the worst case we could leave it for PG17
> >> too. Correct?
> >
> > Right, libzstd only has one "format", which is the same as what's used
> > by the commandline tool. zstd:long doesn't change the format of the
> > output: the library just uses a larger memory buffer to allow better
> > compression. There's no format change for zstd:workers, either.
>
> OK. I plan to do a bit more review/testing on this, and get it committed
> over the next day or two, likely including the long mode. One thing I
> noticed today is that maybe long_distance should be a bool, not int.
> Yes, ZSTD_c_enableLongDistanceMatching() accepts int, but it'd be
> cleaner to cast the value during a call and keep it bool otherwise.

Thanks for noticing. Evidently I wrote it using "int" to get the feature working, and then later wrote the bool parsing bits but never changed the data structure.

This version also updates a few comments and some indentation, removes a useless assertion, and updates the warning about zstd:workers.
--
Justin

>From df0eb4d3c4799f24e58f1e5b0a9470e5af355ad6 Mon Sep 17 00:00:00 2001
From: Justin Pryzby
Date: Sat, 7 Jan 2023 15:45:06 -0600
Subject: [PATCH 1/4] pg_dump: zstd compression

Previously proposed at: 20201221194924.gi30...@telsasoft.com
---
 doc/src/sgml/ref/pg_dump.sgml         |  13 +-
 src/bin/pg_dump/Makefile              |   2 +
 src/bin/pg_dump/compress_io.c         |  66 ++--
 src/bin/pg_dump/compress_zstd.c       | 537 ++
 src/bin/pg_dump/compress_zstd.h       |  25 ++
 src/bin/pg_dump/meson.build           |   4 +-
 src/bin/pg_dump/pg_backup_archiver.c  |   9 +-
 src/bin/pg_dump/pg_backup_directory.c |   2 +
 src/bin/pg_dump/pg_dump.c             |  20 +-
 src/bin/pg_dump/t/002_pg_dump.pl      |  79 +++-
 src/tools/pginclude/cpluspluscheck    |   1 +
 src/tools/pgindent/typedefs.list      |   1 +
 12 files changed, 705 insertions(+), 54 deletions(-)
 create mode 100644 src/bin/pg_dump/compress_zstd.c
 create mode 100644 src/bin/pg_dump/compress_zstd.h

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 77299878e02..8de38e0fd0d 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -330,8 +330,9 @@ PostgreSQL documentation
        machine-readable format that pg_restore
        can read. A directory format archive can be manipulated with
        standard Unix tools; for example, files in an uncompressed archive
-       can be compressed with the gzip or
-       lz4 tools.
+       can be compressed with the gzip,
+       lz4, or
+       zstd tools.
        This format is compressed by default using gzip
        and also supports parallel dumps.
@@ -655,7 +656,8 @@ PostgreSQL documentation
        Specify the compression method and/or the compression level to use.
        The compression method can be set to gzip,
-       lz4, or none for no compression.
+       lz4, zstd,
+       or none for no compression.
        A compression detail string can optionally be specified. If the
        detail string is an integer, it specifies the compression level.
        Otherwise, it should be a comma-separated list of items, each of the
@@ -676,8 +678,9 @@ PostgreSQL documentation
        individual table-data segments, and the default is to compress using
        gzip at a moderate level. For plain text output,
        setting a nonzero compression level causes the entire output file to
        be compressed,
-       as though it had been fed through gzip or
-       lz4; but the default is not to compress.
+       as though it had been fed through gzip,
+       lz4, or zstd;
+       but the default is not to compress.
        The tar archive format currently does not support compression at all.
diff --git a/src/bin/pg_dump/Makefile b/src/bin/pg_dump/Makefile
index eb8f59459a1..24de7593a6a 100644
--- a/src/bin/pg_dump/Makefile
+++ b/src/bin/pg_dump/Makefile
@@ -18,6 +18,7 @@ include $(top_builddir)/src/Makefile.global
 export GZIP_PROGRAM=$(GZIP)
 export LZ4
+export ZSTD
 export with_icu

 override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
@@ -29,6 +30,7 @@ OBJS = \
 	compress_io.o \
 	compress_lz4.o \
 	compress_none.o \
+	compress_zstd.o \
 	dumputils.o \
 	parallel.o \
 	pg_backup_archiver.o \
diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c
index 0972a4f934a..4f06bb024f9 100644
--- a/src/bin/pg_dump/compress_io.c
+++ b/src/bin/pg_dump/compress_io.c
@@ -52,8 +52,8 @@
  *
  * InitDiscoverCompressFileHandle tries to infer the
Re: [BUG] pg_stat_statements and extended query protocol
> Why should that be the definition? Partial execution of a portal
> might be something that is happening at the driver level, behind the
> user's back. You can't make rational calculations of, say, plan
> time versus execution time if that's how "calls" is measured.

Correct, and there are also drivers that implement fetch size using cursor statements, i.e. DECLARE CURSOR, FETCH CURSOR, and each FETCH gets counted as 1 call.

I wonder if the right answer here is to track fetches as a separate counter in pg_stat_statements, in which fetch refers to the number of times a portal is executed?

Regards,

Sami Imseih
Amazon Web Services (AWS)
same query but different result on pg16devel and pg15.2
Hi hackers,

I encounter a problem, as shown below:

query:

select
  ref_0.ps_suppkey as c0,
  ref_1.c_acctbal as c1,
  ref_2.o_totalprice as c2,
  ref_2.o_orderpriority as c3,
  ref_2.o_clerk as c4
from
  public.partsupp as ref_0
  left join public.nation as sample_0
    inner join public.customer as sample_1
    on (false)
  on (true)
  left join public.customer as ref_1
    right join public.orders as ref_2
    on (false)
    left join public.supplier as ref_3
    on (false)
  on (sample_0.n_comment = ref_1.c_name)
where (8 <= NULLIF(CASE WHEN (o_orderkey IS NOT NULL) THEN 4 ELSE 4 END,
                   CASE WHEN (o_orderdate >= o_orderdate) THEN 95 ELSE 95 END))
order by c0, c1, c2, c3, c4 limit 1;

on pg16devel:

 c0 | c1 | c2 | c3 | c4
----+----+----+----+----
  1 |    |    |    |
(1 row)

plan:

 Limit
   ->  Sort
         Sort Key: ref_0.ps_suppkey, c_acctbal, o_totalprice, o_orderpriority, o_clerk
         ->  Nested Loop Left Join
               ->  Seq Scan on partsupp ref_0
               ->  Result
                     One-Time Filter: false
 (7 rows)

on pg15.2:

 c0 | c1 | c2 | c3 | c4
----+----+----+----+----
(0 rows)

plan:

 Limit
   ->  Sort
         Sort Key: ref_0.ps_suppkey, c_acctbal, o_totalprice, o_orderpriority, o_clerk
         ->  Hash Left Join
               Hash Cond: ((n_comment)::text = (c_name)::text)
               Filter: (8 <= NULLIF(CASE WHEN (o_orderkey IS NOT NULL) THEN 4 ELSE 4 END, CASE WHEN (o_orderdate >= o_orderdate) THEN 95 ELSE 95 END))
               ->  Nested Loop Left Join
                     ->  Seq Scan on partsupp ref_0
                     ->  Result
                           One-Time Filter: false
               ->  Hash
                     ->  Result
                           One-Time Filter: false
 (13 rows)

regards,
tender wang
Re: [BUG] pg_stat_statements and extended query protocol
"Imseih (AWS), Sami" writes:
>> Also, I'm doubtful that counting calls this way is a great idea,
>> which would mean you only need one new counter field not two. The
>> fact that you're having trouble defining what it means certainly
>> suggests that the implementation is out front of the design.

> ISTM you are not in agreement that a call count should be incremented
> after every executorRun, but should only be incremented after
> the portal is closed, at executorEnd. Is that correct?

Right. That makes the "call count" equal to the number of times the query is invoked.

> FWIW, The rationale for incrementing calls in executorRun is that calls
> refers to the number of times a client executes a portal, whether
> partially or to completion.

Why should that be the definition? Partial execution of a portal might be something that is happening at the driver level, behind the user's back. You can't make rational calculations of, say, plan time versus execution time if that's how "calls" is measured.

			regards, tom lane
Re: [BUG] pg_stat_statements and extended query protocol
> * Yeah, it'd be nice to have an in-core test, but it's folly to insist
> on one that works via libpq and psql. That requires a whole new set
> of features that you're apparently designing on-the-fly with no other
> use cases in mind. I don't think that will accomplish much except to
> ensure that this bug fix doesn't make it into v16.

I agree. Solving the lack of in-core testing for this area belongs in a different discussion.

> * I don't understand why it was thought good to add two new counters
> to struct Instrumentation. In EXPLAIN ANALYZE cases those will be
> wasted space *per plan node*, not per Query.

Indeed, looking at ExplainNode, the Instrumentation struct is allocated per node and the new fields will be wasted space. Thanks for highlighting this.

> * It also seems quite bizarre to add counters to a core data structure
> and then leave it to pg_stat_statements to maintain them.

That is a fair point.

> I'm inclined to think that adding the counters to struct EState is
> fine. That's 304 bytes already on 64-bit platforms, another 8 or 16
> won't matter.

With the point you raise about Instrumentation being per node, EState is the better place for the new counters.

> Also, I'm doubtful that counting calls this way is a great idea,
> which would mean you only need one new counter field not two. The
> fact that you're having trouble defining what it means certainly
> suggests that the implementation is out front of the design.

ISTM you are not in agreement that a call count should be incremented after every ExecutorRun, but should only be incremented after the portal is closed, at ExecutorEnd. Is that correct?

FWIW, the rationale for incrementing calls in ExecutorRun is that calls refers to the number of times a client executes a portal, whether partially or to completion. Clients can also fetch rows from portals at various rates, so to determine the "rows per call" accurately from pg_stat_statements, we should track calls as the number of times ExecutorRun was called on a portal.

> In short, what I think I'd suggest is adding an es_total_processed
> field to EState and having standard_ExecutorRun do "es_total_processed
> += es_processed" near the end, then just change pg_stat_statements to
> use es_total_processed not es_processed.

The original proposal, in 0001-correct-pg_stat_statements-tracking-of-portals.patch, was to add a "calls" and an "es_total_processed" field to EState.

Regards,

Sami Imseih
Amazon Web Services (AWS)
Re: running logical replication as the subscription owner
On Mon, Apr 03, 2023 at 12:05:29PM -0700, Jeff Davis wrote:
> On Sun, 2023-04-02 at 20:21 -0700, Noah Misch wrote:
> > The most-plausible-to-me attack involves an ENABLE ALWAYS trigger
> > that logs CURRENT_USER to an audit table.
>
> How does requiring that the subscription owner have SET ROLE privileges
> on the table owner help that case? As Robert pointed out, users coming
> from v15 will have superuser subscription owners anyway, so the change
> will be silent for them.

For subscriptions upgraded from v15, it doesn't matter. Requiring SET ROLE prevents this sequence:

- Make a table with such an audit trigger. Grant INSERT and UPDATE to Alice.
- Upgrade to v15.
- Grant pg_create_subscription and database-level CREATE to Alice.
- Alice creates a subscription as a tool to impersonate the table owner, bypassing audit.

To put it another way, the benefit of the SET ROLE requirement is not really making subscriptions more secure. The benefit of the requirement is pg_create_subscription not becoming a tool for bypassing audit. I gather we agree on what to do for v16, which is good.

> But I feel like we can do better in version 17 when we have time to
> actually work through common use cases and the exceptional cases and
> weight them appropriately. Like, how common is it to want to get the
> user from a trigger on the subscriber side?

Fair. I don't think the community has arrived at a usable approach for answering questions like that. It would be valuable to have an approach.

> Should that trigger be using SESSION_USER instead of CURRENT_USER?

Apart from evaluating the argument of SET ROLE, I've not heard of a valid use case for SESSION_USER.
Re: Minimal logical decoding on standbys
On Tue, Apr 4, 2023 at 3:17 AM Drouvot, Bertrand wrote:
>
> Hi,
>
> On 4/3/23 8:10 AM, Drouvot, Bertrand wrote:
> > Hi,
> >
> > On 4/3/23 7:35 AM, Amit Kapila wrote:
> >> On Mon, Apr 3, 2023 at 4:26 AM Jeff Davis wrote:
> >>
> >> Agreed, even Bertrand and myself discussed the same approach few
> >> emails above. BTW, if we have this selective logic to wake
> >> physical/logical walsenders and for standby's, we only wake logical
> >> walsenders at the time of ApplyWalRecord() then do we need the new
> >> conditional variable enhancement being discussed, and if so, why?
> >
> > Thank you both for this new idea and discussion. In that case I don't
> > think we need the new CV API and the use of a CV anymore. As just said
> > up-thread I'll submit a new proposal with this new approach.
>
> Please find enclosed V57 implementing the new approach in 0004.

Regarding the 0004 patch:

@@ -2626,6 +2626,12 @@ InitWalSenderSlot(void)
 			walsnd->sync_standby_priority = 0;
 			walsnd->latch = &MyProc->procLatch;
 			walsnd->replyTime = 0;
+
+			if (MyDatabaseId == InvalidOid)
+				walsnd->kind = REPLICATION_KIND_PHYSICAL;
+			else
+				walsnd->kind = REPLICATION_KIND_LOGICAL;
+

I think we might want to set the replication kind when processing the START_REPLICATION command. The walsender using a logical replication slot is not necessarily streaming (e.g. when COPYing table data).

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
fairywren exiting in ecpg
Hi,

Looks like fairywren is possibly seeing something I saw before and spent many days looking into:
https://postgr.es/m/20220909235836.lz3igxtkcjb5w7zb%40awork3.anarazel.de

which led me to add the following to .cirrus.yml:

  # Cirrus defaults to SetErrorMode(SEM_NOGPFAULTERRORBOX | ...). That
  # prevents crash reporting from working unless binaries do SetErrorMode()
  # themselves. Furthermore, it appears that either python or, more likely,
  # the C runtime has a bug where SEM_NOGPFAULTERRORBOX can very
  # occasionally *trigger* a crash on process exit - which is hard to debug,
  # given that it explicitly prevents crash dumps from working...
  # 0x8001 is SEM_FAILCRITICALERRORS | SEM_NOOPENFILEERRORBOX
  CIRRUS_WINDOWS_ERROR_MODE: 0x8001

The mingw folks also spent a lot of time looking into this ([1]), without a lot of success. It sure looks like it might be a Windows C runtime issue - none of the stacktrace handling python has gets invoked. I could not find any relevant behavioural differences in python's code that depend on SEM_NOGPFAULTERRORBOX being set.

It'd be interesting to see if fairywren's occasional failures go away if you set MSYS=winjitdebug (which prevents msys from adding SEM_NOGPFAULTERRORBOX to ErrorMode).

Greetings,

Andres Freund

[1] https://github.com/msys2/MINGW-packages/issues/11864
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode
On Tue, 4 Apr 2023 at 02:49, Melanie Plageman wrote:
> v9 attached.

I've made a pass on the v9-0001 patch only. Here's what I noted down:

v9-0001:

1. In the documentation and comments, generally we always double-space after a period. I see quite often you're not following this.

2. Doc: We generally seem to break tags within paragraphs into multiple lines. You're doing that quite a bit, e.g:

   linkend="glossary-buffer-access-strategy">Buffer Access
   Strategy. 0 will disable use of a

2. This is not a command:

   BUFFER_USAGE_LIMIT parameter.

   is probably what you want.

3. I'm not sure I agree that it's a good idea to refer to the strategy with multiple different names. Here you've called it a "ring buffer", but in the next sentence, you're calling it a Buffer Access Strategy:

   Specifies the ring buffer size for VACUUM. This size is used to
   calculate the number of shared buffers which will be reused as part
   of a Buffer Access Strategy. 0 disables use of a

4. Can you explain your choice in not just making < 128 a hard error rather than clamping? I guess it means checks like this are made more simple, but that does not seem like a good enough reason:

   /* check for out-of-bounds */
   if (result < -1 || result > MAX_BAS_RING_SIZE_KB)

   postgres=# vacuum (parallel -1) pg_class;
   ERROR:  parallel workers for vacuum must be between 0 and 1024

   Maybe the above is a good guide to follow. To allow you to get rid of the clamping code, you'd likely need an assign hook function for vacuum_buffer_usage_limit.

5. I see vacuum.sgml is full of inconsistencies around the use of vs . I was going to complain about your:

   ONLY_DATABASE_STATS option. If ANALYZE is
   also specified, the BUFFER_USAGE_LIMIT value is used for both the vacuum

   but I see you've likely just copied what's nearby. There are also plenty of usages of in that file. I'd rather see you use . Maybe there can be some other patch that sweeps the entire docs to look for OPTION_NAME and replaces them to use .

6. I was surprised to see you've added both GetAccessStrategyWithSize() and GetAccessStrategyWithNBuffers(). I think the former is suitable for both. GetAccessStrategyWithNBuffers() seems to be just used once outside of freelist.c.

7. I don't think bas_nbuffers() is a good name for an external function. StrategyGetBufferCount() seems better.

8. I don't quite follow this comment:

   /*
    * TODO: should this be 0 so that we are sure that vacuum() never
    * allocates a new bstrategy for us, even if we pass in NULL for that
    * parameter? maybe could change how failsafe NULLs out bstrategy if
    * so?
    */

   Can you explain under what circumstances would vacuum() allocate a bstrategy when do_autovacuum() would not? Is this a case of a config reload where someone changes vacuum_buffer_usage_limit from 0 to something non-zero? If so, perhaps do_autovacuum() needs to detect this and allocate a strategy rather than having vacuum() do it once per table (wastefully).

9. buffer/README. I think it might be overkill to document details about how the new vacuum option works in a section talking about Buffer Ring Replacement Strategy. Perhaps it is just worth something like: "In v16, the 256KB ring was made configurable by way of the vacuum_buffer_usage_limit GUC and the BUFFER_USAGE_LIMIT VACUUM option."

10. I think if you do #4 then you can get rid of all the range checks and DEBUG1 elogs in GetAccessStrategyWithSize().

11. This seems a bit badly done:

   int vacuum_buffer_usage_limit = -1;

   int VacuumCostPageHit = 1; /* GUC parameters for vacuum */
   int VacuumCostPageMiss = 2;
   int VacuumCostPageDirty = 20;

   I'd class vacuum_buffer_usage_limit as a "GUC parameters for vacuum" too. Probably the CamelCase naming should be followed too.

12. ANALYZE too?

   {"vacuum_buffer_usage_limit", PGC_USERSET, RESOURCES_MEM,
   gettext_noop("Sets the buffer pool size for VACUUM and autovacuum."),

13. VacuumParams.ring_size has no comments explaining what it is.

14. vacuum_buffer_usage_limit seems to be lumped in with unrelated GUCs:

   extern PGDLLIMPORT int maintenance_work_mem;
   extern PGDLLIMPORT int max_parallel_maintenance_workers;
   +extern PGDLLIMPORT int vacuum_buffer_usage_limit;

   extern PGDLLIMPORT int VacuumCostPageHit;
   extern PGDLLIMPORT int VacuumCostPageMiss;

15. No comment explaining what these are:

   #define MAX_BAS_RING_SIZE_KB (16 * 1024 * 1024)
   #define MIN_BAS_RING_SIZE_KB 128

16. Parameter names in function declaration and definition don't match in:

   extern BufferAccessStrategy GetAccessStrategyWithNBuffers(BufferAccessStrategyType btype, int nbuffers);
   extern BufferAccessStrategy GetAccessStrategyWithSize(BufferAccessStrategyType btype, int nbuffers);

   Also, line wraps at 79 chars. (80 including line feed)

17. If you want to test the 16GB upper limit, maybe going 1KB (or 8KB?) rather than 1GB over 16GB is better? 2097153kB, I think.

   VACUUM (BUFFER_USAGE_LIMIT '17 GB') vac_option_tab;

David
Re: Why enable_hashjoin Completely disables HashJoin
David Rowley writes: > I think there would be quite a bit of work to do before we could ever > start to think about that. The planner does quite a bit of writing on > the parse, e.g adding new RangeTblEntrys to the query's rtable. We'd > either need to fix all those first or make a copy of the parse before > planning. Yeah, we'd have to be sure that all that preliminary work is teased apart from the actual path-making. I think we are probably pretty close to that but not there yet. Subqueries might be problematic, but perhaps we could define our way out of that by saying that this retry principle applies independently in each planner recursion level. > It's also not clear to > me how you'd know what you'd need to enable again to get the 2nd > attempt to produce a plan this time around. I'd assume you'd want the > minimum possible set of enable_* GUCs turned back on, but what would > you do in cases where there's an aggregate and both enable_hashagg and > enable_sort are both disabled and there are no indexes providing > pre-sorted input? As I commented concurrently, I think we should simply not try to solve that conundrum: if you want control, don't pose impossible problems. There's no principled way that we could decide which of enable_hashagg and enable_sort to ignore first, for example. regards, tom lane
Re: Why enable_hashjoin Completely disables HashJoin
Andres Freund writes: > It sounds too hard compared to the gains, but another way could be to plan > with the relevant path generation hard disabled, and plan from scratch, with > additional scan types enabled, if we end up being unable to generate valid > plan. Actually, I kind of like that. It would put the extra cost in a place it belongs: if you have enough enable_foo turned off to prevent generating a valid plan, it'll cost you extra to make a plan ... but likely you'll be paying even more in runtime due to not getting a good plan, so maybe that doesn't matter anyway. I'd limit it to two passes: first try honors all enable_foo switches, second try ignores all. I'm not quite sure how this could be wedged into the existing code structure --- in particular I am not sure that we're prepared to do two passes of baserel path generation. (GEQO is an existence proof that we could handle it for join planning, though.) Or we could rethink the design goal of not allowing enable_foo switches to cause us to fail to make a plan. That might be unusable though. regards, tom lane
Re: Why enable_hashjoin Completely disables HashJoin
On Tue, 4 Apr 2023 at 11:18, Andres Freund wrote:
> It sounds too hard compared to the gains, but another way could be to plan
> with the relevant path generation hard disabled, and plan from scratch, with
> additional scan types enabled, if we end up being unable to generate valid
> plan.

I think there would be quite a bit of work to do before we could ever
start to think about that. The planner does quite a bit of writing on
the parse, e.g. adding new RangeTblEntrys to the query's rtable. We'd
either need to fix all those first or make a copy of the parse before
planning. The latter is quite expensive today.

It's also not clear to me how you'd know what you'd need to enable again
to get the 2nd attempt to produce a plan this time around. I'd assume
you'd want the minimum possible set of enable_* GUCs turned back on, but
what would you do in cases where there's an aggregate, both
enable_hashagg and enable_sort are disabled, and there are no indexes
providing pre-sorted input?

David
Re: Why enable_hashjoin Completely disables HashJoin
Hi,

On 2023-04-03 14:04:30 -0400, Tom Lane wrote:
> Robert Haas writes:
> > On Mon, Apr 3, 2023 at 8:13 AM Tom Lane wrote:
> >> Personally, I'd get rid of disable_cost altogether if I could.
> >> I'm not in a hurry to extend its use to more places.
>
> > I agree. I've wondered if we should put some work into that. It feels
> > bad to waste CPU cycles generating paths we intend to basically just
> > throw away, and it feels even worse if they manage to beat out some
> > other path on cost.
>
> > It hasn't been obvious to me how we could restructure the existing
> > logic to avoid relying on disable_cost.
>
> Yeah. In some places it would not be too hard; for example, if we
> generated seqscan paths last instead of first for baserels, the rule
> could be "generate it if enable_seqscan is on OR if we made no other
> path for the rel". It's much messier for joins though, partly because
> the same joinrel will be considered multiple times as we process
> different join orderings, plus it's usually unclear whether failing
> to generate any paths for joinrel X will lead to overall failure.
>
> A solution that would work is to treat disable_cost as a form of infinity
> that's counted separately from the actual cost estimate, that is we
> label paths as "cost X, plus there are N uses of disabled plan types".
> Then you sort first on N and after that on X. But this'd add a good
> number of cycles to add_path, which I've not wanted to expend on a
> non-mainstream usage.

It sounds too hard compared to the gains, but another way could be to
plan with the relevant path generation hard disabled, and plan from
scratch, with additional scan types enabled, if we end up being unable
to generate a valid plan.

Greetings,

Andres Freund
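The "separately counted infinity" idea amounts to comparing paths on a two-part key: number of disabled node types first, estimated cost second. A hedged sketch of just that sort order (the Path stub and field names are invented stand-ins, not PostgreSQL's add_path machinery):

```c
#include <assert.h>

/*
 * Toy stand-in for a Path: estimated cost plus a count of disabled
 * plan-node types used anywhere in the path ("N uses of disabled types").
 */
typedef struct Path
{
	double		cost;
	int			disabled_nodes;
} Path;

/*
 * Returns <0 when a is preferable: fewer disabled nodes wins outright,
 * and cost only breaks ties between equally-disabled paths.
 */
static int
path_compare(const Path *a, const Path *b)
{
	if (a->disabled_nodes != b->disabled_nodes)
		return (a->disabled_nodes < b->disabled_nodes) ? -1 : 1;
	if (a->cost != b->cost)
		return (a->cost < b->cost) ? -1 : 1;
	return 0;
}
```

With this ordering, an expensive but fully-enabled path always beats a cheap path that relies on a disabled node type, with no disable_cost fuzz polluting the cost estimates themselves.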
Re: Kerberos delegation support in libpq and postgres_fdw
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

Did a code review pass here; here is some feedback.

+	/* If password was used to connect, make sure it was one provided */
+	if (PQconnectionUsedPassword(conn) && dblink_connstr_has_pw(connstr))
+		return;

Do we need to consider whether these passwords are the same? Is there a
different vector where a different password could be acquired from a
different source (PGPASSWORD, say) while both of these criteria are
true? Seems like it probably doesn't matter that much, considering we
only checked password alone in the previous version of this code.

---

Looks like the pg_gssinfo struct hides the `proxy_creds` def behind:

#if defined(ENABLE_GSS) || defined(ENABLE_SSPI)
typedef struct
{
	gss_buffer_desc outbuf;		/* GSSAPI output token buffer */
#ifdef ENABLE_GSS
...
	bool		proxy_creds;	/* GSSAPI Delegated/proxy credentials */
#endif
} pg_gssinfo;
#endif

Which means that in the later check in `be_gssapi_get_proxy()` we have:

/*
 * Return if GSSAPI delegated/proxy credentials were included on this
 * connection.
 */
bool
be_gssapi_get_proxy(Port *port)
{
	if (!port || !port->gss)
		return NULL;
	return port->gss->proxy_creds;
}

So in theory it'd be possible to have SSPI enabled but GSS disabled and
we'd fail to compile in that case. (It may be that this routine is never
*actually* called in that case, just noting compile-time
considerations.) I'm not seeing guards in the actual PQ* routines, but I
don't think I've done an exhaustive search.

---

gss_accept_deleg

+
+        Forward (delegate) GSS credentials to the server. The default is
+        disable which means credentials will not be forwarded
+        to the server. Set this to enable to have
+        credentials forwarded when possible.

When is this not possible? Server policy? External factors?
---

 Only superusers may connect to foreign servers without password
-authentication, so always specify the password option
-for user mappings belonging to non-superusers.
+authentication or using gssapi proxied credentials, so specify the
+password option for user mappings belonging to
+non-superusers who are not able to proxy GSSAPI credentials.

s/gssapi/GSSAPI/; this is kind of confusing, as this makes it sound like
only superusers may use GSSAPI proxied credentials, which I disbelieve
to be true. Additionally, it sounds like you're wanting to explicitly
maintain a denylist of users not allowed to proxy; is that correct?

---

libpq/auth.c:

	if (proxy != NULL)
	{
		pg_store_proxy_credential(proxy);
		port->gss->proxy_creds = true;
	}

Per GSS docs, it seems like we should be comparing to
GSS_C_NO_CREDENTIAL and validating that gflags has the `deleg_flag` bit
set before considering whether there are valid credentials; in practice
this might have the same effect (I haven't looked at what that symbol
actually resolves to, but NULL would be sensible). Are there other cases
we might need to consider here, like credentials that are valid but
expired? (GSS_S_CREDENTIALS_EXPIRED)

---

+	/*
+	 * Set KRB5CCNAME for this backend, so that later calls to
+	 * gss_acquire_cred will find the proxied credentials we stored.
+	 */

I'm not seeing this in other use in the code; I assume this is just used
by the krb5 libs? Similar questions for the other places
pg_gss_accept_deleg is used.

---

+int
+PQconnectionUsedGSSAPI(const PGconn *conn)
+{
+	if (!conn)
+		return false;
+	if (conn->gssapi_used)
+		return true;
+	else
+		return false;
+}

Micro-gripe: this routine seems like it could be simpler, though the
compiler probably has the same thing to say for either, so maybe code
clarity is better as written:

int
PQconnectionUsedGSSAPI(const PGconn *conn)
{
	return conn && conn->gssapi_used;
}

---

Anything required for adding meson support?
I notice src/test/kerberos has its Makefile updated, but no meson.build
files are changed.

---

Two tests in src/test/kerberos/t/001_auth.pl at :535 and :545 have the
same test description:

+	'succeeds with GSS-encrypted access required and hostgssenc hba and credentials not forwarded',

Since the first test has only `gssencmode` defined (so an implicit
`gssdeleg` value) and the second has `gssdeleg=disable`, I'd suggest
that the test on :545 should have its description updated to add the
word "explicitly":

'succeeds with GSS-encrypted access required and hostgssenc hba and
Re: Should vacuum process config file reload more often
On Mon, Apr 3, 2023 at 3:08 PM Andres Freund wrote: > On 2023-04-03 14:43:14 -0400, Tom Lane wrote: > > Melanie Plageman writes: > > > v13 attached with requested updates. > > > > I'm afraid I'd not been paying any attention to this discussion, > > but better late than never. I'm okay with letting autovacuum > > processes reload config files more often than now. However, > > I object to allowing ProcessConfigFile to be called from within > > commands in a normal user backend. The existing semantics are > > that user backends respond to SIGHUP only at the start of processing > > a user command, and I'm uncomfortable with suddenly deciding that > > that can work differently if the command happens to be VACUUM. > > It seems unprincipled and perhaps actively unsafe. > > I think it should be ok in commands like VACUUM that already internally start > their own transactions, and thus require to be run outside of a transaction > and at the toplevel. I share your concerns about allowing config reload in > arbitrary places. While we might want to go there, it would require a lot more > analysis. As an alternative for your consideration, attached v14 set implements the config file reload for autovacuum only (in 0003) and then enables it for VACUUM and ANALYZE not in a nested transaction command (in 0004). Previously I had the commits in the reverse order for ease of review (to separate changes to worker limit balancing logic from config reload code). - Melanie From 1781dd7174d5d6eaaeb4bd02029212f3c23d4dbe Mon Sep 17 00:00:00 2001 From: Melanie Plageman Date: Sat, 25 Mar 2023 14:14:55 -0400 Subject: [PATCH v14 3/4] Autovacuum refreshes cost-based delay params more often Allow autovacuum to reload the config file more often so that cost-based delay parameters can take effect while VACUUMing a relation. Previously autovacuum workers only reloaded the config file once per relation vacuumed, so config changes could not take effect until beginning to vacuum the next table. 
Now, check if a reload is pending roughly once per block, when checking if we need to delay. In order for autovacuum workers to safely update their own cost delay and cost limit parameters without impacting performance, we had to rethink when and how these values were accessed. Previously, an autovacuum worker's wi_cost_limit was set only at the beginning of vacuuming a table, after reloading the config file. Therefore, at the time that autovac_balance_cost() is called, workers vacuuming tables with no table options could still have different values for their wi_cost_limit_base and wi_cost_delay. Now that the cost parameters can be updated while vacuuming a table, workers will (within some margin of error) have no reason to have different values for cost limit and cost delay (in the absence of table options). This removes the rationale for keeping cost limit and cost delay in shared memory. Balancing the cost limit requires only the number of active autovacuum workers vacuuming a table with no cost-based table options. 
Reviewed-by: Masahiko Sawada Reviewed-by: Kyotaro Horiguchi Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_ZngzqnEODc7LmS1NH04Kt6Y9huSjz5pp7%2BDXhrjDA0gw%40mail.gmail.com --- src/backend/commands/vacuum.c | 44 - src/backend/postmaster/autovacuum.c | 253 +++- src/include/commands/vacuum.h | 1 + 3 files changed, 180 insertions(+), 118 deletions(-) diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c index 96df5e2920..a51a3f78a0 100644 --- a/src/backend/commands/vacuum.c +++ b/src/backend/commands/vacuum.c @@ -48,6 +48,7 @@ #include "pgstat.h" #include "postmaster/autovacuum.h" #include "postmaster/bgworker_internals.h" +#include "postmaster/interrupt.h" #include "storage/bufmgr.h" #include "storage/lmgr.h" #include "storage/pmsignal.h" @@ -510,9 +511,9 @@ vacuum(List *relations, VacuumParams *params, { ListCell *cur; - VacuumUpdateCosts(); in_vacuum = true; - VacuumCostActive = (VacuumCostDelay > 0); + VacuumFailsafeActive = false; + VacuumUpdateCosts(); VacuumCostBalance = 0; VacuumPageHit = 0; VacuumPageMiss = 0; @@ -566,12 +567,20 @@ vacuum(List *relations, VacuumParams *params, CommandCounterIncrement(); } } + + /* + * Ensure VacuumFailsafeActive has been reset before vacuuming the + * next relation relation. + */ + VacuumFailsafeActive = false; } } PG_FINALLY(); { in_vacuum = false; VacuumCostActive = false; + VacuumFailsafeActive = false; + VacuumCostBalance = 0; } PG_END_TRY(); @@ -2238,7 +2247,27 @@ vacuum_delay_point(void) /* Always check for interrupts */ CHECK_FOR_INTERRUPTS(); - if (!VacuumCostActive || InterruptPending) + if (InterruptPending || + (!VacuumCostActive && !ConfigReloadPending)) + return; + + /* + * Reload the configuration file if requested. This allows changes to + * autovacuum_vacuum_cost_limit and
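The balancing described in the commit message reduces to an equal split of the global cost limit across the workers that have no per-table cost options, clamped so no worker drops below 1. A rough sketch under that reading; the function and parameter names here are assumptions for illustration, not the patch's actual code:

```c
#include <assert.h>

/*
 * Share of the global vacuum_cost_limit for one autovacuum worker,
 * given how many active workers participate in balancing (i.e. are
 * vacuuming a table with no cost-based table options).
 */
static int
balanced_cost_limit(int vacuum_cost_limit, int nworkers_for_balance)
{
	int			limit;

	if (nworkers_for_balance <= 0)
		return vacuum_cost_limit;	/* nothing to balance against */

	limit = vacuum_cost_limit / nworkers_for_balance;

	/* never round a worker's share down to zero */
	return (limit < 1) ? 1 : limit;
}
```

Because the share depends only on a worker count, nothing per-worker needs to live in shared memory for this, which matches the commit message's rationale for removing cost limit and delay from shared state.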
Re: POC: Lock updated tuples in tuple_update() and tuple_delete()
On Mon, Apr 3, 2023 at 5:12 PM Pavel Borisov wrote:
> Upon Alexander reverting patches v15 from master, I've rebased what
> was correction patches v4 in a message above on a fresh master
> (together with patches v15). The resulting patch v16 is attached.

Pavel, thank you for your review, revisions, and rebase. We'll
reconsider this once v17 is branched.

--
Regards,
Alexander Korotkov
Re: Patch proposal: New hooks in the connection path
This looks like it was a good discussion -- last summer. But it doesn't seem to be a patch under active development now. It sounds like there were some design constraints that still need some new ideas to solve and a new patch will be needed to address them. Should this be marked Returned With Feedback? -- Gregory Stark As Commitfest Manager
Re: psql: Add role's membership options to the \du+ command
On Wed, Mar 22, 2023 at 11:11 AM Pavel Luzanov wrote: > In the previous version, I didn't notice (unlike cfbot) the compiler > warning. Fixed in version 6. > > I've marked this Ready for Committer. My opinion is that this is a necessary modification due to the already-committed changes to the membership grant implementation and so only needs to be accepted prior to v16 going live, not feature freeze. I've added Robert to this thread as the committer of said changes. David J.
Re: [PATCH] Introduce array_shuffle() and array_sample()
> On 3 Apr 2023, at 23:46, Tom Lane wrote: > > Daniel Gustafsson writes: >> On 29 Sep 2022, at 21:33, Tom Lane wrote: >>> I find this behavior a bit surprising: >>> >>> +SELECT >>> array_dims(array_sample('[-1:2][2:3]={{1,2},{3,NULL},{5,6},{7,8}}'::int[], >>> 3)); >>> + array_dims >>> +- >>> + [-1:1][2:3] >>> +(1 row) >>> >>> I can buy preserving the lower bound in array_shuffle(), but >>> array_sample() is not preserving the first-dimension indexes of >>> the array, so ISTM it ought to reset the first lower bound to 1. > >> I might be daft but I'm not sure I follow why not preserving here, can you >> explain? > > Because array_sample selects only some of the (first level) array > elements, those elements are typically not going to have the same > indexes in the output as they did in the input. So I don't see why > it would be useful to preserve the same lower-bound index. It does > make sense to preserve the lower-order index bounds ([2:3] in this > example) because we are including or not including those array > slices as a whole. Ah, ok, now I see what you mean, thanks! I'll try to fix up the patch like this tomorrow. -- Daniel Gustafsson
Re: Split index and table statistics into different types of stats
On Thu, 16 Mar 2023 at 05:25, Drouvot, Bertrand wrote: > > My plan was to get [1] done before resuming working on the "Split index and > table statistics into different types of stats" one. > [1]: https://commitfest.postgresql.org/42/4106/ Generate pg_stat_get_xact*() functions with Macros ([1]) was committed March 27. There's only a few days left in this CF. Would you like to leave this here? Should it be marked Needs Review or Ready for Commit? Or should we move it to the next CF now? -- Gregory Stark As Commitfest Manager
Re: [PATCH] Introduce array_shuffle() and array_sample()
Daniel Gustafsson writes: > On 29 Sep 2022, at 21:33, Tom Lane wrote: >> I find this behavior a bit surprising: >> >> +SELECT >> array_dims(array_sample('[-1:2][2:3]={{1,2},{3,NULL},{5,6},{7,8}}'::int[], >> 3)); >> + array_dims >> +- >> + [-1:1][2:3] >> +(1 row) >> >> I can buy preserving the lower bound in array_shuffle(), but >> array_sample() is not preserving the first-dimension indexes of >> the array, so ISTM it ought to reset the first lower bound to 1. > I might be daft but I'm not sure I follow why not preserving here, can you > explain? Because array_sample selects only some of the (first level) array elements, those elements are typically not going to have the same indexes in the output as they did in the input. So I don't see why it would be useful to preserve the same lower-bound index. It does make sense to preserve the lower-order index bounds ([2:3] in this example) because we are including or not including those array slices as a whole. regards, tom lane
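A toy illustration of the point above: whichever first-level elements are sampled, their positions in the result are renumbered from the front, so carrying over the input's first lower bound would be meaningless. This uses a fixed-size stand-in for a PostgreSQL array and a deterministic "sample" for clarity; it is not the actual array_sample() code:

```c
#include <assert.h>

/* toy 1-D array with an explicit lower bound, like '[-1:2]={10,20,30,40}' */
typedef struct IntArray
{
	int			lbound;
	int			nelems;
	int			vals[8];
} IntArray;

/*
 * Deterministically "sample" the first k elements. The point is only
 * that the result's lower bound is reset to 1 regardless of the
 * input's, because the selected elements land at new positions anyway.
 */
static IntArray
sample_first_k(const IntArray *in, int k)
{
	IntArray	out = {.lbound = 1, .nelems = k};

	for (int i = 0; i < k; i++)
		out.vals[i] = in->vals[i];
	return out;
}
```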
Re: generic plans and "initial" pruning
Amit Langote writes: > [ v38 patchset ] I spent a little bit of time looking through this, and concluded that it's not something I will be wanting to push into v16 at this stage. The patch doesn't seem very close to being committable on its own terms, and even if it was now is not a great time in the dev cycle to be making significant executor API changes. Too much risk of having to thrash the API during beta, or even change it some more in v17. I suggest that we push this forward to the next CF with the hope of landing it early in v17. A few concrete thoughts: * I understand that your plan now is to acquire locks on all the originally-named tables, then do permissions checks (which will involve only those tables), then dynamically lock just inheritance and partitioning child tables as we descend the plan tree. That seems more or less okay to me, but it could be reflected better in the structure of the patch perhaps. * In particular I don't much like the "viewRelations" list, which seems like a wart; those ought to be handled more nearly the same way as other RTEs. (One concrete reason why is that this scheme is going to result in locking views in a different order than they were locked during original parsing, which perhaps could contribute to deadlocks.) Maybe we should store an integer list of which RTIs need to be locked in the early phase? Building that in the parser/rewriter would provide a solid guide to the original locking order, so we'd be trivially sure of duplicating that. (It might be close enough to follow the RT list order, which is basically what AcquireExecutorLocks does today, but this'd be more certain to do the right thing.) I'm less concerned about lock order for child tables because those are just going to follow the inheritance or partitioning structure. 
* I don't understand the need for changes like this: /* clean up tuple table */ - ExecClearTuple(node->ps.ps_ResultTupleSlot); + if (node->ps.ps_ResultTupleSlot) + ExecClearTuple(node->ps.ps_ResultTupleSlot); ISTM that the process ought to involve taking a lock (if needed) before we have built any execution state for a given plan node, and if we find we have to fail, returning NULL instead of a partially-valid planstate node. Otherwise, considerations of how to handle partially-valid nodes are going to metastasize into all sorts of places, almost certainly including EXPLAIN for instance. I think we ought to be able to limit the damage to "parent nodes might have NULL child links that you wouldn't have expected". That wouldn't faze ExecEndNode at all, nor most other code. * More attention is needed to comments. For example, in a couple of places in plancache.c you have removed function header comments defining API details and not replaced them with any info about the new details, despite the fact that those details are more complex than the old. > It seems I hadn't noted in the ExecEndNode()'s comment that all node > types' recursive subroutines need to handle the change made by this > patch that the corresponding ExecInitNode() subroutine may now return > early without having initialized all state struct fields. > Also noted in the documentation for CustomScan and ForeignScan that > the Begin*Scan callback may not have been called at all, so the > End*Scan should handle that gracefully. Yeah, I think we need to avoid adding such requirements. It's the sort of thing that would far too easily get past developer testing and only fail once in a blue moon in the field. regards, tom lane
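The "integer list of RTIs to lock early" could be as simple as an append-only list built during parse/rewrite and replayed verbatim at executor startup, which pins the relocking order to the original locking order. A hedged sketch with invented names (not actual plancache.c code):

```c
#include <assert.h>

#define MAX_RTIS 16

/*
 * Append-only record of range-table indexes needing an early lock,
 * kept in the order the parser/rewriter first saw them. Replaying this
 * list at executor startup repeats the original lock order exactly,
 * avoiding lock-order inversions against the parse-time locking.
 */
typedef struct LockOrderList
{
	int			rtis[MAX_RTIS];
	int			n;
} LockOrderList;

static void
note_rti_to_lock(LockOrderList *list, int rti)
{
	for (int i = 0; i < list->n; i++)
		if (list->rtis[i] == rti)
			return;				/* already recorded; keep first position */
	list->rtis[list->n++] = rti;
}
```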
Re: WIP: Aggregation push-down - take2
It looks like in November 2022 Tomas Vondra said: > I did a quick initial review of the v20 patch series. > I plan to do a more thorough review over the next couple days, if time permits. > In general I think the patch is in pretty good shape. Following which Antonin Houska updated the patch responding to his review comments. Since then this patch has demonstrated the unfortunate "please rebase thx" followed by the author rebasing and getting no feedback until "please rebase again thx"... So while the patch doesn't currently apply it seems like it really should be either Needs Review or Ready for Commit. That said, I suspect this patch has missed the boat for this CF. Hopefully it will get more attention next release. I'll move it to the next CF but set it to Needs Review even though it needs a rebase. -- Gregory Stark As Commitfest Manager
Re: GUC for temporarily disabling event triggers
> On 3 Apr 2023, at 16:09, Robert Haas wrote: > On Mon, Apr 3, 2023 at 9:15 AM Daniel Gustafsson wrote: >>> On 3 Apr 2023, at 15:09, Robert Haas wrote: >>> I continue to think it's odd that the sense of this is inverted as >>> compared with row_security. >> >> I'm not sure I follow. Do you propose that the GUC enables classes of event >> triggers, the default being "all" (or similar) and one would remove the type >> of >> EVT for which debugging is needed? That doesn't seem like a bad idea, just >> one >> that hasn't come up in the discussion (and I didn't think about). > > Right. Although to be fair, that idea doesn't sound as good if we're > going to have settings other than "on" or "off". Yeah. The patch as it stands allow for disabling specific types rather than all-or-nothing, which is why the name was "ignore". > I'm not sure what the best thing to do is here, I just think it > deserves some thought. Absolutely, the discussion is much appreciated. Having done some thinking I think I'm still partial to framing it as a disabling GUC rather than an enabling; with the act of setting it being "As an admin I want to skip execution of all evt's of type X". -- Daniel Gustafsson
Re: zstd compression for pg_dump
On 4/3/23 21:17, Justin Pryzby wrote: > On Sat, Apr 01, 2023 at 10:26:01PM +0200, Tomas Vondra wrote: >>> Feel free to mess around with threads (but I'd much rather see the patch >>> progress for zstd:long). >> >> OK, understood. The long mode patch is pretty simple. IIUC it does not >> change the format, i.e. in the worst case we could leave it for PG17 >> too. Correct? > > Right, libzstd only has one "format", which is the same as what's used > by the commandline tool. zstd:long doesn't change the format of the > output: the library just uses a larger memory buffer to allow better > compression. There's no format change for zstd:workers, either. > OK. I plan to do a bit more review/testing on this, and get it committed over the next day or two, likely including the long mode. One thing I noticed today is that maybe long_distance should be a bool, not int. Yes, ZSTD_c_enableLongDistanceMatching() accepts int, but it'd be cleaner to cast the value during a call and keep it bool otherwise. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PATCH] Introduce array_shuffle() and array_sample()
> On 29 Sep 2022, at 21:33, Tom Lane wrote: > > Martin Kalcher writes: >> New patch: array_shuffle() and array_sample() use pg_global_prng_state now. > > I took a closer look at the patch today. Since this seems pretty close to going in, and seems like quite useful functions, I took a look to see if I could get it across the line (although I noticed that CFM beat me to the clock in sending this =)). > I find this behavior a bit surprising: > > +SELECT > array_dims(array_sample('[-1:2][2:3]={{1,2},{3,NULL},{5,6},{7,8}}'::int[], > 3)); > + array_dims > +- > + [-1:1][2:3] > +(1 row) > > I can buy preserving the lower bound in array_shuffle(), but > array_sample() is not preserving the first-dimension indexes of > the array, so ISTM it ought to reset the first lower bound to 1. I might be daft but I'm not sure I follow why not preserving here, can you explain? The rest of your comments have been addressed in the attached v6 I think (although I'm pretty sure the docs part is just as bad now, explaining these in concise words is hard, will take another look with fresh eyes tomorrow). -- Daniel Gustafsson v6-0001-Introduce-array_shuffle-and-array_sample.patch Description: Binary data
Re: [PATCH] Introduce array_shuffle() and array_sample()
Given that there's been no updates since September 22 I'm going to make this patch Returned with Feedback. The patch can be resurrected when there's more work done -- don't forget to move it to the new CF when you do that. -- Gregory Stark As Commitfest Manager
Re: pg_stats and range statistics
On Fri, 24 Mar 2023 at 14:48, Egor Rogov wrote: > > Done. > There is one thing I'm not sure what to do about. This check: > > if (typentry->typtype != TYPTYPE_RANGE) > ereport(ERROR, > (errcode(ERRCODE_DATATYPE_MISMATCH), >errmsg("expected array of ranges"))); > > doesn't work, because the range_get_typcache() call errors out first > ("type %u is not a range type"). The message doesn't look friendly > enough for user-faced SQL function. Should we duplicate > range_get_typcache's logic and replace the error message? > Okay. I've corrected the examples a bit. It sounds like you've addressed Tomas's feedback and still have one open question. Fwiw I rebased it, it seemed to merge fine automatically. I've updated the CF entry to Needs Review. But at this late date it may have to wait until the next release. -- Gregory Stark As Commitfest Manager From 87424880f1a970448979681684e6916f33567eeb Mon Sep 17 00:00:00 2001 From: Greg Stark Date: Mon, 3 Apr 2023 17:04:11 -0400 Subject: [PATCH] pg_stats and range statistics --- doc/src/sgml/func.sgml| 36 ++ doc/src/sgml/system-views.sgml| 40 +++ src/backend/catalog/system_views.sql | 30 - src/backend/utils/adt/rangetypes_typanalyze.c | 107 ++ src/include/catalog/pg_proc.dat | 10 ++ src/test/regress/expected/rangetypes.out | 22 src/test/regress/expected/rules.out | 34 +- src/test/regress/sql/rangetypes.sql | 7 ++ 8 files changed, 284 insertions(+), 2 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 918a492234..548078a12e 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -19643,6 +19643,24 @@ SELECT NULLIF(value, '(none)') ... + + + + ranges_lower + +ranges_lower ( anyarray ) +anyarray + + +Extracts lower bounds of ranges in the array (NULL if +the range is empty or the lower bound is infinite). + + +lower(ARRAY[numrange(1.1,2.2),numrange(3.3,4.4)]) +{1.1,3.3} + + + @@ -19661,6 +19679,24 @@ SELECT NULLIF(value, '(none)') ... 
+ + + + ranges_upper + +ranges_upper ( anyarray ) +anyarray + + +Extracts upper bounds of ranges (NULL if the +range is empty or the upper bound is infinite). + + +upper(ARRAY[numrange(1.1,2.2),numrange(3.3,4.4)]) +{2.2,4.4} + + + diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml index bb1a418450..d7760838ae 100644 --- a/doc/src/sgml/system-views.sgml +++ b/doc/src/sgml/system-views.sgml @@ -3784,6 +3784,46 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx non-null elements. (Null for scalar types.) + + + + range_length_histogram anyarray + + + A histogram of the lengths of non-empty and non-null range values of a + range type column. (Null for non-range types.) + + + + + + range_empty_frac float4 + + + Fraction of column entries whose values are empty ranges. + (Null for non-range types.) + + + + + + range_lower_histogram anyarray + + + A histogram of lower bounds of non-empty and non-null range values. + (Null for non-range types.) + + + + + + range_upper_histogram anyarray + + + A histogram of upper bounds of non-empty and non-null range values. + (Null for non-range types.) 
+ + diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 574cbc2e44..3fb7ee448f 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -243,7 +243,35 @@ CREATE VIEW pg_stats WITH (security_barrier) AS WHEN stakind3 = 5 THEN stanumbers3 WHEN stakind4 = 5 THEN stanumbers4 WHEN stakind5 = 5 THEN stanumbers5 -END AS elem_count_histogram +END AS elem_count_histogram, +CASE +WHEN stakind1 = 6 THEN stanumbers1[1] +WHEN stakind2 = 6 THEN stanumbers2[1] +WHEN stakind3 = 6 THEN stanumbers3[1] +WHEN stakind4 = 6 THEN stanumbers4[1] +WHEN stakind5 = 6 THEN stanumbers5[1] +END AS range_empty_frac, +CASE +WHEN stakind1 = 6 THEN stavalues1 +WHEN stakind2 = 6 THEN stavalues2 +WHEN stakind3 = 6 THEN stavalues3 +WHEN stakind4 = 6 THEN stavalues4 +WHEN stakind5 = 6 THEN stavalues5 +END AS range_length_histogram, +CASE +
Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
On Mon, Apr 3, 2023 at 12:40 PM Daniel Gustafsson wrote: > Doh, sorry, my bad. I read and wrote 1.0.1 but was thinking about 1.0.2. You > are right, in 1.0.1 that API does not exist. I'm not all too concerned with > skipping this tests on OpenSSL versions that by the time 16 ships are 6 years > EOL - and I'm not convinced that spending meson/autoconf cycles to include > them > is warranted. Cool. v10 keys off of HAVE_SSL_CTX_SET_CERT_CB, instead. > > We could maybe have them connect to a known host: > > > >$ echo Q | openssl s_client -connect postgresql.org:443 > > -verify_return_error > > Something along these lines is probably best, if we do it at all. Needs > sleeping on. Sounds good. Thanks! --Jacob 1: 18fd368e0e ! 1: 957a011364 libpq: add sslrootcert=system to use default CAs @@ .cirrus.yml: task: ccache_cache: - ## configure ## -@@ configure: $as_echo "$ac_res" >&6; } - - } # ac_fn_c_check_func - -+# ac_fn_c_check_decl LINENO SYMBOL VAR INCLUDES -+# - -+# Tests whether SYMBOL is declared in INCLUDES, setting cache variable VAR -+# accordingly. -+ac_fn_c_check_decl () -+{ -+ as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack -+ # Initialize each $ac_[]_AC_LANG_ABBREV[]_decl_warn_flag once. -+ as_decl_name=`echo $2|sed 's/ *(.*//'` -+ as_decl_use=`echo $2|sed -e 's/(/((/' -e 's/)/) 0&/' -e 's/,/) 0& (/g'` -+ { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $as_decl_name is declared" >&5 -+$as_echo_n "checking whether $as_decl_name is declared... " >&6; } -+if eval \${$3+:} false; then : -+ $as_echo_n "(cached) " >&6 -+else -+ ac_save_werror_flag=$ac_c_werror_flag -+ ac_c_werror_flag="$ac_c_decl_warn_flag$ac_c_werror_flag" -+ cat confdefs.h - <<_ACEOF >conftest.$ac_ext -+/* end confdefs.h. 
*/ -+$4 -+int -+main () -+{ -+#ifndef $as_decl_name -+#ifdef __cplusplus -+ (void) $as_decl_use; -+#else -+ (void) $as_decl_name; -+#endif -+#endif -+ -+ ; -+ return 0; -+} -+_ACEOF -+if ac_fn_c_try_compile "$LINENO"; then : -+ eval "$3=yes" -+else -+ eval "$3=no" -+fi -+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -+ ac_c_werror_flag=$ac_save_werror_flag -+fi -+eval ac_res=\$$3 -+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 -+$as_echo "$ac_res" >&6; } -+ eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno -+ -+} # ac_fn_c_check_decl -+ - # ac_fn_c_check_type LINENO TYPE VAR INCLUDES - # --- - # Tests whether TYPE exists after having included INCLUDES, setting cache -@@ configure: rm -f conftest.val - as_fn_set_status $ac_retval - - } # ac_fn_c_compute_int -- --# ac_fn_c_check_decl LINENO SYMBOL VAR INCLUDES --# - --# Tests whether SYMBOL is declared in INCLUDES, setting cache variable VAR --# accordingly. --ac_fn_c_check_decl () --{ -- as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack -- # Initialize each $ac_[]_AC_LANG_ABBREV[]_decl_warn_flag once. -- as_decl_name=`echo $2|sed 's/ *(.*//'` -- as_decl_use=`echo $2|sed -e 's/(/((/' -e 's/)/) 0&/' -e 's/,/) 0& (/g'` -- { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $as_decl_name is declared" >&5 --$as_echo_n "checking whether $as_decl_name is declared... " >&6; } --if eval \${$3+:} false; then : -- $as_echo_n "(cached) " >&6 --else -- ac_save_werror_flag=$ac_c_werror_flag -- ac_c_werror_flag="$ac_c_decl_warn_flag$ac_c_werror_flag" -- cat confdefs.h - <<_ACEOF >conftest.$ac_ext --/* end confdefs.h. 
*/ --$4 --int --main () --{ --#ifndef $as_decl_name --#ifdef __cplusplus -- (void) $as_decl_use; --#else -- (void) $as_decl_name; --#endif --#endif -- -- ; -- return 0; --} --_ACEOF --if ac_fn_c_try_compile "$LINENO"; then : -- eval "$3=yes" --else -- eval "$3=no" --fi --rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -- ac_c_werror_flag=$ac_save_werror_flag --fi --eval ac_res=\$$3 -- { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 --$as_echo "$ac_res" >&6; } -- eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno -- --} # ac_fn_c_check_decl - cat >config.log <<_ACEOF - This file contains any messages produced by compilers while - running configure, to aid debugging if configure makes a mistake. -@@ configure: _ACEOF - fi - done - -+ # Let tests differentiate between vanilla OpenSSL and LibreSSL. -+ # The Clang
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Mon, Apr 3, 2023 at 12:13 AM Pavel Luzanov wrote: > > Hello, > > I found that the 'standalone backend' backend type is not documented > right now. > Adding something like (from commit message) would be helpful: > > Both the bootstrap backend and single user mode backends will have > backend_type STANDALONE_BACKEND. Thanks for the report. Attached is a tiny patch to add standalone backend type to pg_stat_activity documentation (referenced by pg_stat_io). I mentioned both the bootstrap process and single user mode process in the docs, though I can't imagine that the bootstrap process is relevant for pg_stat_activity. I also noticed that the pg_stat_activity docs call background workers "parallel workers" (though it also mentions that extensions could have other background workers registered), but this seems a bit weird because pg_stat_activity uses GetBackendTypeDesc() and this prints "background worker" for type B_BG_WORKER. Background workers doing parallelism tasks is what users will most often see in pg_stat_activity, but I feel like it is confusing to have it documented as something different than what would appear in the view. Unless I am misunderstanding something... 
- Melanie From d9218d082397d9b87a3e126bce4a45e9ec720ff2 Mon Sep 17 00:00:00 2001 From: Melanie Plageman Date: Mon, 3 Apr 2023 16:38:47 -0400 Subject: [PATCH v1] Document standalone backend type in pg_stat_activity Reported-by: Pavel Luzanov Discussion: https://www.postgresql.org/message-id/fcbe2851-f1fb-9863-54bc-a95dc7a0d946%40postgrespro.ru --- doc/src/sgml/monitoring.sgml | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index d5a45f996d..a00fe9c6a3 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -989,10 +989,12 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser parallel worker, background writer, client backend, checkpointer, archiver, - startup, walreceiver, - walsender and walwriter. - In addition, background workers registered by extensions may have - additional types. + startup, + standalone backend (which includes both the + single user mode process and bootstrap + process), walreceiver, walsender + and walwriter. In addition, background workers + registered by extensions may have additional types. -- 2.37.2
Re: pg_usleep for multisecond delays
On Mon, 13 Mar 2023 at 17:17, Nathan Bossart wrote: > > On Fri, Mar 10, 2023 at 12:28:28PM -0500, Tom Lane wrote: > > A quick grep for pg_usleep with large intervals finds rather more > > than you touched: > > > > [...] > > > > Did you have reasons for excluding the rest of these? > > I'm still looking into each of these, but my main concerns were 1) ensuring > latch support had been set up and 2) figuring out the right interrupt > function to use. Thus far, I don't think latch support is a problem > because InitializeLatchSupport() is called quite early. However, I'm not > as confident with the interrupt functions yet. Some of these multisecond > sleeps are done very early before the signal handlers are set up. Others > are done within the sigsetjmp() block. And at least one is in a code path > that both the startup process and single-user mode touch. Is this still a WIP? Is it targeting this release? There are only a few days left before the feature freeze. I'm guessing it should just move to the next CF for the next release? -- Gregory Stark As Commitfest Manager
Re: Removing unneeded self joins
On Mon, 6 Mar 2023 at 00:30, Michał Kłeczek wrote: > > Hi All, > > I just wanted to ask about the status and plans for this patch. > I can see it being stuck at “Waiting for Author” status in several commit > tests. Sadly it seems to now be badly in need of a rebase. There are large hunks failing in the guts of analyzejoins.c as well as minor failures elsewhere and lots of offsets which need to be reviewed. I think given the lack of activity it's out of time for this release at this point. I'm moving it ahead to the next CF. -- Gregory Stark As Commitfest Manager
Re: psql - factor out echo code
On Mon, 13 Feb 2023 at 05:41, Peter Eisentraut wrote: > > I think this patch requires an up-to-date summary and explanation. The > thread is over a year old and the patch has evolved quite a bit. There > are some test changes that are not explained. Please provide more > detail so that the patch can be considered. Given this feedback I'm going to mark this Returned with Feedback. I think it'll be clearer to start with a new thread explaining the intent of the patch as it is now. -- Gregory Stark As Commitfest Manager
Re: SQL/JSON revisited
Hi Alvaro, 03.04.2023 20:16, Alvaro Herrera wrote: So I pushed 0001 on Friday, and here are 0002 (which I intend to push shortly, since it shouldn't be controversial) and the "JSON query functions" patch as 0003. After looking at it some more, I think there are some things that need to be addressed by one of the authors: - the gram.y solution to the "ON ERROR/ON EMPTY" clauses is quite ugly. I think we could make that stuff use something similar to ConstraintAttributeSpec with an accompanying post-processing function. That would reduce the number of ad-hoc hacks, which seem excessive. - the changes in formatting.h have no explanation whatsoever. At the very least, the new function should have a comment in the .c file. (And why is it at end of file? I bet there's a better location) - some nasty hacks are being used in the ECPG grammar with no tests at all. It's easy to add a few lines to the .pgc file I added in prior commits. - Some functions in jsonfuncs.c have changed from throwing hard errors into soft ones. I think this deserves more commentary. - func.sgml: The new functions are documented in a separate table for no reason that I can see. Needs to be merged into one of the existing tables. I didn't actually review the docs. Please take a look at the following minor issues in v15-0002-SQL-JSON-query-functions.patch: 1) s/addreess/address/ 2) ECPGColLabelCommon gone with 83f1c7b74, but is still mentioned in ecpg.trailer. 3) s/ExecEvalJsonCoercion/ExecEvalJsonExprCoercion/ ? (there is no ExecEvalJsonCoercion() function) 4) json_table mentioned in func.sgml: details the SQL/JSON functions that can be used to query JSON data, except for json_table. But if JSON_TABLE is not going to be committed in v16, maybe remove that reference to it. There is also a reference to JSON_TABLE in src/backend/parser/README: parse_jsontable.c handle JSON_TABLE (It was added with 9853bf6ab and survived the revert of SQL JSON last year somehow.) Best regards, Alexander
Re: Prefetch the next tuple's memory during seqscans
On Sun, 29 Jan 2023 at 21:24, David Rowley wrote: > > I've moved this patch to the next CF. This patch has a dependency on > what's being proposed in [1]. The referenced patch was committed March 19th but there's been no comment here. Is this patch likely to go ahead this release or should I move it forward again? -- Gregory Stark As Commitfest Manager
Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
> On 3 Apr 2023, at 21:04, Jacob Champion wrote: > > On Sun, Apr 2, 2023 at 1:36 PM Daniel Gustafsson wrote: >>> On 31 Mar 2023, at 19:59, Jacob Champion wrote: >>> I can make that change; note that it'll also skip some of the new tests >>> with OpenSSL 1.0.1, where there's no SSL_CTX_set_cert_cb. If that's >>> acceptable, it should be an easy switch. >> >> I'm not sure I follow, AFAICT it's present all the way till 3.1 at least? >> What >> am I missing? > > I don't see it anywhere in my 1.0.1 setup, and Meson doesn't define > HAVE_SSL_CTX_SET_CERT_CB when built against it. Doh, sorry, my bad. I read and wrote 1.0.1 but was thinking about 1.0.2. You are right, in 1.0.1 that API does not exist. I'm not all too concerned with skipping these tests on OpenSSL versions that by the time 16 ships are 6 years EOL - and I'm not convinced that spending meson/autoconf cycles to include them is warranted. Longer term I'd want to properly distinguish between LibreSSL and OpenSSL, but then we should have a bigger discussion on what we want to use these values for. >>> Is there something we could document that's more helpful than "make sure >>> your installation isn't broken"? >> >> I wonder if there is an openssl command line example for verifying defaults >> that we can document and refer to? > > We could maybe have them connect to a known host: > >$ echo Q | openssl s_client -connect postgresql.org:443 > -verify_return_error Something along these lines is probably best, if we do it at all. Needs sleeping on. -- Daniel Gustafsson
Re: Minimal logical decoding on standbys
Hi, On 2023-04-03 17:34:52 +0200, Alvaro Herrera wrote: > On 2023-Apr-03, Drouvot, Bertrand wrote: > > > +/* > > + * Report terminating or conflicting message. > > + * > > + * For both, logical conflict on standby and obsolete slot are handled. > > + */ > > +static void > > +ReportTerminationInvalidation(bool terminating, bool islogical, int pid, > > + NameData slotname, > > TransactionId *xid, > > + XLogRecPtr > > restart_lsn, XLogRecPtr oldestLSN) > > +{ > > > + if (terminating) > > + appendStringInfo(_msg, _("terminating process %d to release > > replication slot \"%s\""), > > +pid, > > +NameStr(slotname)); > > + else > > + appendStringInfo(_msg, _("invalidating")); > > + > > + if (islogical) > > + { > > + if (terminating) > > + appendStringInfo(_msg, _(" because it conflicts > > with recovery")); > > You can't build the strings this way, because it's not possible to put > the strings into the translation machinery. You need to write full > strings for each separate case instead, without appending other string > parts later. Hm? That's what the _'s do. We build strings in parts in other places too. You do need to use errmsg_internal() later, to prevent that format string from being translated as well. I'm not saying that this is exactly the right way, don't get me wrong. Greetings, Andres Freund
Re: Pass heaprel to GlobalVisTestFor() in vacuumRedirectAndPlaceholder()
On Mon, Apr 3, 2023 at 12:09 AM Drouvot, Bertrand wrote: > Right. Please find enclosed V2 also taking care of BTPageIsRecyclable() > and _bt_pendingfsm_finalize(). Pushed that as two separate patches just now. Thanks. BTW, I'm not overly happy about the extent of the changes to nbtree from commit 61b313e4. I understand that it was necessary to pass down a heaprel in a lot of places, which is bound to create a lot of churn. However, a lot of the churn from the commit seems completely avoidable. There is no reason why the BT_READ path in _bt_getbuf() could possibly require a valid heaprel. In fact, most individual BT_WRITE calls don't need heaprel, either -- only those that pass P_NEW. The changes affecting places like _bt_mkscankey() and _bt_metaversion() seem particularly bad. Anyway, I'll take care of this myself at some point after feature freeze. -- Peter Geoghegan
Re: zstd compression for pg_dump
On Sat, Apr 01, 2023 at 10:26:01PM +0200, Tomas Vondra wrote: > > Feel free to mess around with threads (but I'd much rather see the patch > > progress for zstd:long). > > OK, understood. The long mode patch is pretty simple. IIUC it does not > change the format, i.e. in the worst case we could leave it for PG17 > too. Correct? Right, libzstd only has one "format", which is the same as what's used by the commandline tool. zstd:long doesn't change the format of the output: the library just uses a larger memory buffer to allow better compression. There's no format change for zstd:workers, either. -- Justin
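Justin's point above — that zstd:long changes only the size of the back-reference window, not the output format — can be illustrated by analogy. The Python standard library has no zstd bindings, so the sketch below uses zlib's `wbits` parameter, which controls the same kind of match window (this is an illustration of the concept, not pg_dump or libzstd code):

```python
import random
import zlib

# Two identical 1 KiB blocks of (deterministic) random bytes, separated by
# 4 KiB of unrelated filler. The only redundancy is long-range.
random.seed(0)
block = bytes(random.randrange(256) for _ in range(1024))
filler = bytes(random.randrange(256) for _ in range(4096))
data = block + filler + block  # second copy sits ~5 KiB after the first

def deflate(payload, wbits):
    # wbits sets the match-window size to 2**wbits bytes (9..15 for zlib).
    c = zlib.compressobj(level=9, wbits=wbits)
    return c.compress(payload) + c.flush()

small_window = deflate(data, 9)   # 512-byte window: first copy out of reach
large_window = deflate(data, 15)  # 32 KiB window: duplicate block is matched

assert len(large_window) < len(small_window)
```

The larger window finds the distant duplicate and saves roughly the size of the repeated block, while the decompressed bytes are identical either way — which mirrors why zstd:long can be deferred without any on-disk format risk.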
Re: running logical replication as the subscription owner
On Mon, 2023-04-03 at 10:26 -0400, Robert Haas wrote: > Not very much. I think the biggest risk is user confusion, but I > don't > think that's a huge risk because I don't think this scenario will > come > up very often. Also, it's kind of hard to imagine that there's a > security model here which never does anything potentially surprising. Alright, let's just proceed as-is then. I believe these patches are a major improvement to the usability of logical replication and will put up with the weirdness. I wanted to understand better why it's there, and I'm not sure I fully do, but we'll have more time to discuss later. Regards, Jeff Davis
Re: Should vacuum process config file reload more often
Hi, On 2023-04-03 14:43:14 -0400, Tom Lane wrote: > Melanie Plageman writes: > > v13 attached with requested updates. > > I'm afraid I'd not been paying any attention to this discussion, > but better late than never. I'm okay with letting autovacuum > processes reload config files more often than now. However, > I object to allowing ProcessConfigFile to be called from within > commands in a normal user backend. The existing semantics are > that user backends respond to SIGHUP only at the start of processing > a user command, and I'm uncomfortable with suddenly deciding that > that can work differently if the command happens to be VACUUM. > It seems unprincipled and perhaps actively unsafe. I think it should be ok in commands like VACUUM that already internally start their own transactions, and thus require to be run outside of a transaction and at the toplevel. I share your concerns about allowing config reload in arbitrary places. While we might want to go there, it would require a lot more analysis. Greetings, Andres Freund
Re: Thoughts on using Text::Template for our autogenerated code?
> > Yeah, it's somewhat hard to believe that the cost/benefit ratio would be > attractive. But maybe you could mock up some examples of what the input > could look like, and get people on board (or not) before writing any > code. > > tl;dr - I tried a few things, nothing that persuades me, let alone the community, but perhaps some ideas for the future. I borrowed Bertrand's ongoing work for waiteventnames.* because that is what got me thinking about this in the first place. I considered a few different templating libraries: There is no perl implementation of the golang template library (example of that here: https://blog.gopheracademy.com/advent-2017/using-go-templates/ ) that I could find. Text::Template does not support loops, and as such it is no better than here-docs. Template Toolkit seems to do what we need, but it has a kitchen sink of dependencies that make it an unattractive option, so I didn't even attempt it. HTML::Template has looping and if/then/else constructs, and it is a single standalone library. It also does a "separation of concerns" wherein you pass in parameter names and values, and some parameters can be for loops, which means you pass an arrayref of hashrefs that the template engine loops over. That's where the advantages stop, however. It is fairly verbose, and because it is HTML-centric it isn't very good about controlling whitespace, which leads to piling template directives onto the same line in order to avoid spurious newlines. As such I cannot recommend it. My ideal template library would have text something like this: [% loop events %] [% $enum_value %] [% if __first__ +%] = [%+ $initial_value %][% endif %] [% if ! __last__ %],[% endif +%] [% end loop %] [% loop xml_blocks indent: relative,spaces,4 %] [%element_body%]/> [% end loop %] [%+ means "leading whitespace matters", +%] means "trailing whitespace matters" That pseudocode is a mix of ASP, HTML::Template. 
The special variables __first__ and __last__ refer to which iteration of the loop we are on. You would pass it a data structure like this: {events: [ { enum_value: "abc", initial_value: "def"}, ... { enum_value: "wuv", initial_value: "xyz" } ], xml_block: [ {attrib_val: "one", element_body: "two"} ] } I did one initial pass with just converting printf statements to here-docs, and the results were pretty unsatisfying. It wasn't really possible to "see" the output files take shape. My next attempt was to take the "separation of concerns" code from the HTML::Template version, constructing the nested data structure of resolved output values, and then iterating over that once per output file. This resulted in something cleaner, partly because we're only writing one file type at a time, partly because the interpolated variables have names much closer to their output meaning. In doing this, it occurred to me that a lot of this effort is in getting the code to conform to our own style guide, at the cost of the generator code being less readable. What if we wrote the generator and formatted the code in a way that made sense for the generator, and then pgindented it? That's not the workflow right now, but perhaps it could be. Conclusions: - There is no "good enough" template engine that doesn't require big changes in dependencies. - pgindent will not save you from a run-on sentence, like putting all of a typedef enum's values on one line - There is some clarity value in either separating input processing from the output processing, or making the input align more closely with the outputs - Fiddling with indentation and spacing detracts from legibility no matter what method is used. - here docs are basically ok but they necessarily confuse output indentation with code indentation. it is possible to de-indent them with <<~ but that's a 5.25+ feature. - Any of these principles can be applied at any time, with no overhaul required. 
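The [% loop %] pseudocode with its __first__/__last__ markers can be sketched in a few lines of plain code. The Python below is a stand-in for the proposed template engine, and the wait-event names in it are made up for illustration, not taken from the real waiteventnames input:

```python
def render_enum(events):
    # Mimics the pseudo-template: only the first member gets "= initial",
    # and the last member drops the trailing comma (__first__/__last__).
    lines = []
    for i, ev in enumerate(events):
        first = i == 0                   # __first__
        last = i == len(events) - 1      # __last__
        line = "\t" + ev["enum_value"]
        if first:
            line += " = " + ev["initial_value"]
        if not last:
            line += ","
        lines.append(line)
    return "\n".join(lines)

# Hypothetical input, shaped like the arrayref-of-hashrefs described above.
events = [
    {"enum_value": "WAIT_EVENT_ARCHIVER_MAIN", "initial_value": "PG_WAIT_ACTIVITY"},
    {"enum_value": "WAIT_EVENT_AUTOVACUUM_MAIN", "initial_value": ""},
    {"enum_value": "WAIT_EVENT_BGWRITER_MAIN", "initial_value": ""},
]
print(render_enum(events))
```

This is roughly the amount of logic a loop-capable template engine would have to express declaratively; whether the [% %] syntax or the imperative version is more legible is exactly the trade-off under discussion.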
"sorted-" is the slightly modified version of Bertrand's code. "eof-as-is-" is a direct conversion of the original but using here-docs. "heredoc-fone-file-at-a-time-" first generates an output-friendly data structure "needs-pgindent-" is what is possible if we format for our own readability and make pgindent fix the output, though it was not a perfect output match sorted-generate-waiteventnames.pl Description: Perl program eof-as-is-generate-waiteventnames.pl Description: Perl program heredoc-one-file-at-a-time-generate-waiteventnames.pl Description: Perl program needs-pgindent-generate-waiteventnames.pl Description: Perl program
Re: running logical replication as the subscription owner
On Sun, 2023-04-02 at 20:21 -0700, Noah Misch wrote: > The most-plausible-to-me attack involves an ENABLE ALWAYS trigger > that logs > CURRENT_USER to an audit table. How does requiring that the subscription owner have SET ROLE privileges on the table owner help that case? As Robert pointed out, users coming from v15 will have superuser subscription owners anyway, so the change will be silent for them. We need support to apply changes as the table owner, and we need that to be the default; and this patch provides those things, and almost all users of logical replication will be better off after this is committed. The small number of users for whom this new model is not good still need the right documentation in front of them to understand the consequences, so that they can opt out one way or another (as 0002 offers). Release notes are probably the most powerful tool we have for notifying users, unfortunately. Requiring SET ROLE for users that are almost certainly superusers doesn't give an opportunity to educate people about the change in behavior. As I said before, I'm fine with requiring that the subscription owner can SET ROLE to the table owner for v16. It's the most conservative choice and the most "correct" (in that no lesser privilege we have today is a perfect match). But I feel like we can do better in version 17 when we have time to actually work through common use cases and the exceptional cases and weight them appropriately. Like, how common is it to want to get the user from a trigger on the subscriber side? Should that trigger be using SESSION_USER instead of CURRENT_USER? Security is best when it takes into account what people actually want to do and makes it easy to do that securely. Regards, Jeff Davis
Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
On Sun, Apr 2, 2023 at 1:36 PM Daniel Gustafsson wrote: > > On 31 Mar 2023, at 19:59, Jacob Champion wrote: > > I can make that change; note that it'll also skip some of the new tests > > with OpenSSL 1.0.1, where there's no SSL_CTX_set_cert_cb. If that's > > acceptable, it should be an easy switch. > > I'm not sure I follow, AFAICT it's present all the way till 3.1 at least? > What > am I missing? I don't see it anywhere in my 1.0.1 setup, and Meson doesn't define HAVE_SSL_CTX_SET_CERT_CB when built against it. > > Is there something we could document that's more helpful than "make sure > > your installation isn't broken"? > > I wonder if there is an openssl command line example for verifying defaults > that we can document and refer to? We could maybe have them connect to a known host: $ echo Q | openssl s_client -connect postgresql.org:443 -verify_return_error Alternatively, OpenSSL will show you the OPENSSLDIR: $ openssl version -d OPENSSLDIR: "/usr/lib/ssl" and then we could tell users to ensure they have a populated certs/ directory or a cert.pem file underneath it. That'll be prone to rot, though (e.g. OpenSSL 3 introduces the default store in addition to the default file+directory). Thanks, --Jacob
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock
Hi, On 2023-04-03 14:25:59 +0200, Tomas Vondra wrote: > On 4/3/23 00:40, Andres Freund wrote: > > Hi, > > > > On 2023-03-28 19:17:21 -0700, Andres Freund wrote: > >> On 2023-03-28 18:21:02 -0700, Andres Freund wrote: > >>> Here's a draft patch. > >> > >> Attached is v2, with a stupid bug fixed and a bit of comment / pgindent > >> polish. > > > > I'd welcome some review (Tomas?), but otherwise I'm planning to push ahead > > with this. > > > > I guess the 0001 part was already pushed, so I should be looking only at > 0002, correct? Yes. > I think 0002 makes RelationGetBufferForTuple() harder to understand. I'm > not saying it's incorrect, but I find it hard to reason about the new > combinations of conditions :-( > I mean, it only had this condition: > > if (otherBuffer != InvalidBuffer) > { > ... > } > > but now it have > > if (unlockedTargetBuffer) > { >... > } > else if (otherBuffer != InvalidBuffer) > { > ... > } > > if (unlockedTargetBuffer || otherBuffer != InvalidBuffer) > { > ... > } > > Not sure how to improve that :-/ but not exactly trivial to figure out > what's going to happen. It's not great, I agree. I tried to make it easier to read in this version by a) changing GetVisibilityMapPins() as I proposed b) added a new variable "recheckVmPins", that gets set in if (unlockedTargetBuffer) and if (otherBuffer != InvalidBuffer) c) reformulated comments > Maybe this > > * If we unlocked the target buffer above, it's unlikely, but possible, > * that another backend used space on this page. > > might say what we're going to do in this case. I mean - I understand > some backend may use space in unlocked page, but what does that mean for > this code? What is it going to do? (The same comment talks about the > next condition in much more detail, for example.) There's a comment about that detail further below. But you're right, it wasn't clear as-is. How about now? > Also, isn't it a bit strange the free space check now happens outside > any if condition? 
It used to happen in the > > if (otherBuffer != InvalidBuffer) > { > ... > } > > block, but now it happens outside. Well, the alternative is to repeat it in the two branches, which doesn't seem great either. Particularly because there'll be a third branch after the bulk extension patch. Greetings, Andres Freund >From 49183283d6701d22cf12f6a6280ed1aba02fc063 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Mon, 3 Apr 2023 11:18:10 -0700 Subject: [PATCH v3 1/2] hio: Relax rules for calling GetVisibilityMapPins() Author: Reviewed-by: Discussion: https://postgr.es/m/ Backpatch: --- src/backend/access/heap/hio.c | 37 --- 1 file changed, 26 insertions(+), 11 deletions(-) diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c index 7479212d4e0..e8aea42f8a9 100644 --- a/src/backend/access/heap/hio.c +++ b/src/backend/access/heap/hio.c @@ -131,9 +131,9 @@ ReadBufferBI(Relation relation, BlockNumber targetBlock, * For each heap page which is all-visible, acquire a pin on the appropriate * visibility map page, if we haven't already got one. * - * buffer2 may be InvalidBuffer, if only one buffer is involved. buffer1 - * must not be InvalidBuffer. If both buffers are specified, block1 must - * be less than block2. + * To avoid complexity in the callers, either buffer1 or buffer2 may be + * InvalidBuffer if only one buffer is involved. For the same reason, block2 + * may be smaller than block1. */ static void GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2, @@ -143,6 +143,26 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2, bool need_to_pin_buffer1; bool need_to_pin_buffer2; + /* + * Swap buffers around to handle case of a single block/buffer, and to + * handle if lock ordering rules require to lock block2 first. 
+ */ + if (!BufferIsValid(buffer1) || + (BufferIsValid(buffer2) && block1 > block2)) + { + Buffer tmpbuf = buffer1; + Buffer *tmpvmbuf = vmbuffer1; + BlockNumber tmpblock = block1; + + buffer1 = buffer2; + vmbuffer1 = vmbuffer2; + block1 = block2; + + buffer2 = tmpbuf; + vmbuffer2 = tmpvmbuf; + block2 = tmpblock; + } + Assert(BufferIsValid(buffer1)); Assert(buffer2 == InvalidBuffer || block1 <= block2); @@ -502,14 +522,9 @@ loop: * done a bit of extra work for no gain, but there's no real harm * done. */ - if (otherBuffer == InvalidBuffer || targetBlock <= otherBlock) - GetVisibilityMapPins(relation, buffer, otherBuffer, - targetBlock, otherBlock, vmbuffer, - vmbuffer_other); - else - GetVisibilityMapPins(relation, otherBuffer, buffer, - otherBlock, targetBlock, vmbuffer_other, - vmbuffer); + GetVisibilityMapPins(relation, buffer, otherBuffer, + targetBlock, otherBlock, vmbuffer, + vmbuffer_other); /* * Now we can check to
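The GetVisibilityMapPins() patch above moves the argument-ordering burden out of the callers by normalizing inside the callee: tolerate an invalid first buffer, and swap so the lower block number is handled first. A minimal sketch of that pattern (Python illustration only, not the C code):

```python
# Stand-in for InvalidBuffer.
INVALID = None

def normalize_pair(buf1, buf2):
    # Mirror the patch's rule: if buf1 is invalid, or lock-ordering
    # requires handling buf2 first (lower block number), swap them so the
    # rest of the function can assume a canonical order.
    if buf1 is INVALID or (buf2 is not INVALID and buf1 > buf2):
        buf1, buf2 = buf2, buf1
    return buf1, buf2

assert normalize_pair(7, 3) == (3, 7)        # reordered for lock rules
assert normalize_pair(3, 7) == (3, 7)        # already in canonical order
assert normalize_pair(None, 5) == (5, None)  # single-buffer case
```

Pushing the swap into one place is what lets the two call sites in RelationGetBufferForTuple() collapse into a single unconditional call, at the cost of a slightly more subtle callee.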
Re: Why enable_hashjoin Completely disables HashJoin
On Mon, Apr 3, 2023 at 2:04 PM Tom Lane wrote: > Yeah. In some places it would not be too hard; for example, if we > generated seqscan paths last instead of first for baserels, the rule > could be "generate it if enable_seqscan is on OR if we made no other > path for the rel". It's much messier for joins though, partly because > the same joinrel will be considered multiple times as we process > different join orderings, plus it's usually unclear whether failing > to generate any paths for joinrel X will lead to overall failure. Yeah, good point. I'm now remembering that at one point I'd had the idea of running the whole find-a-plan-for-a-jointree step and then running it a second time if it fails to find a plan. But I think that requires some restructuring, because I think right now it does some things that we should only do once we know we're definitely getting a plan out. Or else we have to reset some state. Like if we want to go back and maybe add more paths then we have to undo and redo whatever set_cheapest() did. > A solution that would work is to treat disable_cost as a form of infinity > that's counted separately from the actual cost estimate, that is we > label paths as "cost X, plus there are N uses of disabled plan types". > Then you sort first on N and after that on X. But this'd add a good > number of cycles to add_path, which I've not wanted to expend on a > non-mainstream usage. Yeah, I thought of that at one point too and rejected it for the same reason. -- Robert Haas EDB: http://www.enterprisedb.com
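Tom's "separate infinity" idea — label each path with the number of disabled plan types it uses and sort on that count before the ordinary cost — amounts to a lexicographic comparison. A minimal sketch of the comparison (hypothetical, not planner code):

```python
# Each path is (disabled_nodes, cost): N uses of disabled plan types,
# then the ordinary cost estimate as the tie-breaker.
def path_key(path):
    disabled_nodes, cost = path
    return (disabled_nodes, cost)  # compare N first, then cost

paths = [
    (1, 10.0),  # very cheap, but uses one disabled node type
    (0, 1e9),   # enormous cost, but nothing disabled
    (2, 5.0),   # cheaper still, two disabled node types
]
best = min(paths, key=path_key)
assert best == (0, 1e9)  # any fully-enabled path beats any disabled one
```

With this scheme no finite disable_cost constant can be swamped by (or pollute) real cost arithmetic — the trade-off, as noted above, is that add_path would pay for the extra comparison on every path.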
Re: possible proposal plpgsql GET DIAGNOSTICS oid = PG_ROUTINE_OID
Hi po 3. 4. 2023 v 19:37 odesílatel Tom Lane napsal: > Pavel Stehule writes: > > po 27. 3. 2023 v 5:36 odesílatel Kirk Wolak napsal: > >> I have marked the item Ready for Commiter... > > > Thank you for doc and for review > > I'm kind of surprised there was any interest in this proposal at all, > TBH, but apparently there is some. Still, I think you over-engineered > it by doing more than the original proposal of making the function OID > available. The other things can be had by casting the OID to regproc > or regprocedure, so I'd be inclined to add just one new keyword not > three. Besides, your implementation is a bit inconsistent: relying > on fn_signature could return a result that is stale or doesn't conform > to the current search_path. > ok There is reduced patch + regress tests Regards Pavel > regards, tom lane > diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml index 7c8a49fe43..4df1b71904 100644 --- a/doc/src/sgml/plpgsql.sgml +++ b/doc/src/sgml/plpgsql.sgml @@ -1639,6 +1639,11 @@ GET DIAGNOSTICS integer_var = ROW_COUNT; line(s) of text describing the current call stack (see ) + + PG_CURRENT_ROUTINE_OID + oid + oid of the function currently running + diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c index b0a2cac227..ed9d26258b 100644 --- a/src/pl/plpgsql/src/pl_exec.c +++ b/src/pl/plpgsql/src/pl_exec.c @@ -2475,6 +2475,12 @@ exec_stmt_getdiag(PLpgSQL_execstate *estate, PLpgSQL_stmt_getdiag *stmt) } break; + case PLPGSQL_GETDIAG_CURRENT_ROUTINE_OID: +exec_assign_value(estate, var, + ObjectIdGetDatum(estate->func->fn_oid), + false, OIDOID, -1); + break; + default: elog(ERROR, "unrecognized diagnostic item kind: %d", diag_item->kind); diff --git a/src/pl/plpgsql/src/pl_funcs.c b/src/pl/plpgsql/src/pl_funcs.c index 5a6eadccd5..4ebf393646 100644 --- a/src/pl/plpgsql/src/pl_funcs.c +++ b/src/pl/plpgsql/src/pl_funcs.c @@ -325,6 +325,8 @@ plpgsql_getdiag_kindname(PLpgSQL_getdiag_kind kind) return "TABLE_NAME"; case 
PLPGSQL_GETDIAG_SCHEMA_NAME: return "SCHEMA_NAME"; + case PLPGSQL_GETDIAG_CURRENT_ROUTINE_OID: + return "PG_CURRENT_ROUTINE_OID"; } return "unknown"; diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y index edeb72c380..6e81cf774d 100644 --- a/src/pl/plpgsql/src/pl_gram.y +++ b/src/pl/plpgsql/src/pl_gram.y @@ -318,6 +318,7 @@ static void check_raise_parameters(PLpgSQL_stmt_raise *stmt); %token K_OR %token K_PERFORM %token K_PG_CONTEXT +%token K_PG_CURRENT_ROUTINE_OID %token K_PG_DATATYPE_NAME %token K_PG_EXCEPTION_CONTEXT %token K_PG_EXCEPTION_DETAIL @@ -1035,6 +1036,7 @@ stmt_getdiag : K_GET getdiag_area_opt K_DIAGNOSTICS getdiag_list ';' break; /* these fields are allowed in either case */ case PLPGSQL_GETDIAG_CONTEXT: +case PLPGSQL_GETDIAG_CURRENT_ROUTINE_OID: break; default: elog(ERROR, "unrecognized diagnostic item kind: %d", @@ -1123,6 +1125,9 @@ getdiag_item : else if (tok_is_keyword(tok, , K_RETURNED_SQLSTATE, "returned_sqlstate")) $$ = PLPGSQL_GETDIAG_RETURNED_SQLSTATE; + else if (tok_is_keyword(tok, , +K_PG_CURRENT_ROUTINE_OID, "pg_current_routine_oid")) + $$ = PLPGSQL_GETDIAG_CURRENT_ROUTINE_OID; else yyerror("unrecognized GET DIAGNOSTICS item"); } @@ -2523,6 +2528,7 @@ unreserved_keyword : | K_OPEN | K_OPTION | K_PERFORM +| K_PG_CURRENT_ROUTINE_OID | K_PG_CONTEXT | K_PG_DATATYPE_NAME | K_PG_EXCEPTION_CONTEXT diff --git a/src/pl/plpgsql/src/pl_unreserved_kwlist.h b/src/pl/plpgsql/src/pl_unreserved_kwlist.h index 466bdc7a20..b569d75708 100644 --- a/src/pl/plpgsql/src/pl_unreserved_kwlist.h +++ b/src/pl/plpgsql/src/pl_unreserved_kwlist.h @@ -81,6 +81,7 @@ PG_KEYWORD("open", K_OPEN) PG_KEYWORD("option", K_OPTION) PG_KEYWORD("perform", K_PERFORM) PG_KEYWORD("pg_context", K_PG_CONTEXT) +PG_KEYWORD("pg_current_routine_oid", K_PG_CURRENT_ROUTINE_OID) PG_KEYWORD("pg_datatype_name", K_PG_DATATYPE_NAME) PG_KEYWORD("pg_exception_context", K_PG_EXCEPTION_CONTEXT) PG_KEYWORD("pg_exception_detail", K_PG_EXCEPTION_DETAIL) diff --git 
a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h index 355c9f678d..258ce4dec7 100644 --- a/src/pl/plpgsql/src/plpgsql.h +++ b/src/pl/plpgsql/src/plpgsql.h @@ -157,7 +157,8 @@ typedef enum PLpgSQL_getdiag_kind PLPGSQL_GETDIAG_DATATYPE_NAME, PLPGSQL_GETDIAG_MESSAGE_TEXT, PLPGSQL_GETDIAG_TABLE_NAME, - PLPGSQL_GETDIAG_SCHEMA_NAME + PLPGSQL_GETDIAG_SCHEMA_NAME, + PLPGSQL_GETDIAG_CURRENT_ROUTINE_OID } PLpgSQL_getdiag_kind; /* diff --git a/src/test/regress/expected/plpgsql.out b/src/test/regress/expected/plpgsql.out index 2d26be1a81..56ed4e14bf 100644 --- a/src/test/regress/expected/plpgsql.out
Re: [BUG] Logical replica crash if there was an error in a function.
"Anton A. Melnikov" writes: > Now there are no any pending questions, so moved it to RFC. I did not think this case was worth memorializing in a test before, and I still do not. I'm inclined to reject this patch. regards, tom lane
Re: Should vacuum process config file reload more often
Melanie Plageman writes: > v13 attached with requested updates. I'm afraid I'd not been paying any attention to this discussion, but better late than never. I'm okay with letting autovacuum processes reload config files more often than now. However, I object to allowing ProcessConfigFile to be called from within commands in a normal user backend. The existing semantics are that user backends respond to SIGHUP only at the start of processing a user command, and I'm uncomfortable with suddenly deciding that that can work differently if the command happens to be VACUUM. It seems unprincipled and perhaps actively unsafe. regards, tom lane
Re: Minimal logical decoding on standbys
Hi, On 4/3/23 8:10 AM, Drouvot, Bertrand wrote: Hi, On 4/3/23 7:35 AM, Amit Kapila wrote: On Mon, Apr 3, 2023 at 4:26 AM Jeff Davis wrote: Agreed, even Bertrand and myself discussed the same approach few emails above. BTW, if we have this selective logic to wake physical/logical walsenders and for standby's, we only wake logical walsenders at the time of ApplyWalRecord() then do we need the new conditional variable enhancement being discussed, and if so, why? Thank you both for this new idea and discussion. In that case I don't think we need the new CV API and the use of a CV anymore. As just said up-thread I'll submit a new proposal with this new approach. Please find enclosed V57 implementing the new approach in 0004. With the new approach in place the TAP tests (0005) work like a charm (no delay and even after a promotion). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comFrom 0d198484e008090a524562076326054be56935ca Mon Sep 17 00:00:00 2001 From: bdrouvotAWS Date: Tue, 7 Feb 2023 14:08:11 + Subject: [PATCH v57 6/6] Doc changes describing details about logical decoding. Author: Andres Freund (in an older version), Amit Khandekar, Bertrand Drouvot Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes Mello --- doc/src/sgml/logicaldecoding.sgml | 22 ++ 1 file changed, 22 insertions(+) 100.0% doc/src/sgml/ diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml index 4e912b4bd4..3da254ed1f 100644 --- a/doc/src/sgml/logicaldecoding.sgml +++ b/doc/src/sgml/logicaldecoding.sgml @@ -316,6 +316,28 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU may consume changes from a slot at any given time. + + A logical replication slot can also be created on a hot standby. To prevent + VACUUM from removing required rows from the system + catalogs, hot_standby_feedback should be set on the + standby. 
In spite of that, if any required rows get removed, the slot gets + invalidated. It's highly recommended to use a physical slot between the primary + and the standby. Otherwise, hot_standby_feedback will work, but only while the + connection is alive (for example a node restart would break it). Existing + logical slots on standby also get invalidated if wal_level on primary is reduced to + less than 'logical'. + + + + For a logical slot to be created, it builds a historic snapshot, for which + information of all the currently running transactions is essential. On + primary, this information is available, but on standby, this information + has to be obtained from primary. So, slot creation may wait for some + activity to happen on the primary. If the primary is idle, creating a + logical slot on standby may take a noticeable time. One option to speed it + is to call the pg_log_standby_snapshot on the primary. + + Replication slots persist across crashes and know nothing about the state -- 2.34.1 From 4e392c06e39f36c4185780a21f2b90c7c6a97de4 Mon Sep 17 00:00:00 2001 From: bdrouvotAWS Date: Tue, 7 Feb 2023 09:04:12 + Subject: [PATCH v57 5/6] New TAP test for logical decoding on standby. In addition to the new TAP test, this commit introduces a new pg_log_standby_snapshot() function. The idea is to be able to take a snapshot of running transactions and write this to WAL without requesting for a (costly) checkpoint. 
Author: Craig Ringer (in an older version), Amit Khandekar, Bertrand Drouvot Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes Mello --- doc/src/sgml/func.sgml| 15 + src/backend/access/transam/xlogfuncs.c| 32 + src/backend/catalog/system_functions.sql | 2 + src/include/catalog/pg_proc.dat | 3 + src/test/perl/PostgreSQL/Test/Cluster.pm | 37 + src/test/recovery/meson.build | 1 + .../t/035_standby_logical_decoding.pl | 705 ++ 7 files changed, 795 insertions(+) 3.1% src/backend/ 4.0% src/test/perl/PostgreSQL/Test/ 89.7% src/test/recovery/t/ diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 918a492234..939fb8019f 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -27034,6 +27034,21 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset prepared with . + + + + pg_log_standby_snapshot + +pg_log_standby_snapshot () +pg_lsn + + +Take a snapshot of running transactions and write this to WAL without +having to wait bgwriter or checkpointer to log one. This one is useful for +logical decoding on standby for which logical slot creation is hanging +until such a
Re: [Proposal] Add foreign-server health checks infrastructure
Fujii Masao writes: > Regarding 0001 patch, on second thought, to me, it seems odd to expose > a function that doesn't have anything to directly do with PostgreSQL > as a libpq function. The function simply calls poll() on the socket > with POLLRDHUP if it is supported. While it is certainly convenient to > have this function, I'm not sure that it fits within the scope of libpq. > Thought? Yeah, that does not seem great, partly because the semantics would be platform-dependent. I don't think we want that to become part of libpq's API. regards, tom lane
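The platform-dependence concern here is concrete: POLLRDHUP is a Linux-only extension, and other platforms only report POLLHUP/POLLERR. A minimal sketch of this style of socket-liveness probe (the function name is hypothetical, not the patch's actual code, and the fallback behavior illustrates exactly why the semantics differ across platforms):

```c
/*
 * Minimal sketch of a non-blocking socket-liveness probe using poll().
 * POLLRDHUP is a Linux-specific extension (requires _GNU_SOURCE); on
 * other platforms we fall back to POLLHUP/POLLERR only, which is the
 * platform-dependence concern raised above.  Hypothetical name, not
 * libpq's actual API.
 */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>			/* socketpair(), for a quick self-check */
#include <unistd.h>
#include <assert.h>

#ifndef POLLRDHUP
#define POLLRDHUP 0				/* not available outside Linux */
#endif

static bool
socket_peer_seems_closed(int sock)
{
	struct pollfd pfd;
	int			rc;

	pfd.fd = sock;
	pfd.events = POLLRDHUP;
	pfd.revents = 0;

	rc = poll(&pfd, 1, 0);		/* timeout 0: probe without blocking */
	if (rc < 0)
		return true;			/* treat a poll() failure as suspect */

	/* POLLHUP/POLLERR/POLLNVAL are reported even if not requested */
	return (pfd.revents & (POLLRDHUP | POLLHUP | POLLERR | POLLNVAL)) != 0;
}
```

A quick way to exercise it is a socketpair: the probe returns false while the peer end is open, and true once the peer end has been closed.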
Re: Why enable_hashjoin Completely disables HashJoin
Robert Haas writes: > On Mon, Apr 3, 2023 at 8:13 AM Tom Lane wrote: >> Personally, I'd get rid of disable_cost altogether if I could. >> I'm not in a hurry to extend its use to more places. > I agree. I've wondered if we should put some work into that. It feels > bad to waste CPU cycles generating paths we intend to basically just > throw away, and it feels even worse if they manage to beat out some > other path on cost. > It hasn't been obvious to me how we could restructure the existing > logic to avoid relying on disable_cost. Yeah. In some places it would not be too hard; for example, if we generated seqscan paths last instead of first for baserels, the rule could be "generate it if enable_seqscan is on OR if we made no other path for the rel". It's much messier for joins though, partly because the same joinrel will be considered multiple times as we process different join orderings, plus it's usually unclear whether failing to generate any paths for joinrel X will lead to overall failure. A solution that would work is to treat disable_cost as a form of infinity that's counted separately from the actual cost estimate, that is we label paths as "cost X, plus there are N uses of disabled plan types". Then you sort first on N and after that on X. But this'd add a good number of cycles to add_path, which I've not wanted to expend on a non-mainstream usage. regards, tom lane
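The "disable_cost as a separately counted infinity" idea amounts to a lexicographic comparison: first on the number of disabled plan-type uses, then on the actual cost. A toy sketch (hypothetical names; the real add_path() also weighs pathkeys and row counts and uses fuzzy cost comparison, all ignored here):

```c
/*
 * Sketch of comparing paths by (number of disabled plan-type uses,
 * then estimated cost), per the idea above.  Field and function names
 * are hypothetical, not planner code.
 */
typedef struct SketchPath
{
	double		total_cost;		/* estimated cost, no disable_cost mixed in */
	int			disabled_nodes; /* how many disabled plan types it uses */
} SketchPath;

/* Returns <0 if a is the better path, >0 if b is, 0 if tied. */
static int
sketch_compare_paths(const SketchPath *a, const SketchPath *b)
{
	/* sort first on N (disabled uses)... */
	if (a->disabled_nodes != b->disabled_nodes)
		return (a->disabled_nodes < b->disabled_nodes) ? -1 : 1;
	/* ...and after that on X (cost) */
	if (a->total_cost != b->total_cost)
		return (a->total_cost < b->total_cost) ? -1 : 1;
	return 0;
}
```

With this ordering, a cheap path that uses a disabled plan type always loses to any path that uses none, yet can still win if every alternative is also disabled — which is the behavior disable_cost only approximates.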
Re: Non-superuser subscription owners
On Sat, Apr 1, 2023 at 12:00 PM Alexander Lakhin wrote: > I've managed to reproduce it using the following script: > for ((i=1;i<=10;i++)); do > echo "iteration $i" > echo " > CREATE ROLE sub_user; > CREATE SUBSCRIPTION testsub CONNECTION 'dbname=db' > PUBLICATION testpub WITH (connect = false); > ALTER SUBSCRIPTION testsub ENABLE; > DROP SUBSCRIPTION testsub; > SELECT pg_sleep(0.001); > DROP ROLE sub_user; > " | psql > psql -c "ALTER SUBSCRIPTION testsub DISABLE;" > psql -c "ALTER SUBSCRIPTION testsub SET (slot_name = NONE);" > psql -c "DROP SUBSCRIPTION testsub;" > grep 'TRAP' server.log && break > done After a bit of experimentation this repro worked for me -- I needed -DRELCACHE_FORCE_RELEASE as well, and a bigger iteration count. I verified that the patch fixed it, and committed the patch with the addition of a comment. Thanks very much for this repro, and likewise many thanks to Hou Zhijie for the report and patch. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [Proposal] Add foreign-server health checks infrastructure
On 2023/03/13 16:05, Hayato Kuroda (Fujitsu) wrote: Thank you so much for your reviewing! Now we can wait comments from senior members and committers. Thanks for working on this patch! Regarding 0001 patch, on second thought, to me, it seems odd to expose a function that doesn't have anything to directly do with PostgreSQL as a libpq function. The function simply calls poll() on the socket with POLLRDHUP if it is supported. While it is certainly convenient to have this function, I'm not sure that it fits within the scope of libpq. Thought? Regarding 0002 patch, the behavior of postgres_fdw_verify_connection_states() in regards to multiple connections using different user mappings seems not well documented. The function seems to return false if either of those connections has been closed. This behavior means that it's difficult to identify which connection has been closed when there are multiple ones to the given server. To make it easier to identify that, it could be helpful to extend the postgres_fdw_verify_connection_states() function so that it accepts a unique connection as an input instead of just the server name. One suggestion is to extend the function so that it accepts both the server name and the user name used for the connection, and checks the specified connection. If only the server name is specified, the function should check all connections to the server and return false if any of them are closed. This would be helpful since there is typically only one connection to the server in most cases. Additionally, it would be helpful to extend the postgres_fdw_get_connections() function to also return the user name used for each connection, as currently there is no straightforward way to identify it. The function name "postgres_fdw_verify_connection_states" may seem unnecessarily long to me. A simpler name like "postgres_fdw_verify_connection" may be enough? 
The patch may not be ready for commit due to the review comments, and with the feature freeze approaching in a few days, it may not be possible to include this feature in v16. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Why enable_hashjoin Completely disables HashJoin
On Mon, Apr 3, 2023 at 8:13 AM Tom Lane wrote: > Personally, I'd get rid of disable_cost altogether if I could. > I'm not in a hurry to extend its use to more places. I agree. I've wondered if we should put some work into that. It feels bad to waste CPU cycles generating paths we intend to basically just throw away, and it feels even worse if they manage to beat out some other path on cost. It hasn't been obvious to me how we could restructure the existing logic to avoid relying on disable_cost. I sort of feel like it should be a two-pass algorithm: go through and generate all the path types that aren't disabled, and then if that results in no paths, try a do-over where you ignore the disable flags (or just some of them). But the code structure doesn't seem particularly amenable to that kind of thing. This hasn't caused me enough headaches yet that I've been willing to invest time in it, but it has caused me more than zero headaches... -- Robert Haas EDB: http://www.enterprisedb.com
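The two-pass "do-over" idea described above can be sketched roughly like this (all names hypothetical, nothing like actual planner code — it only shows the control flow of generating enabled paths first and retrying without the disable flags when that yields nothing):

```c
/*
 * Sketch of the two-pass idea: generate paths for enabled strategies
 * first, and only if that yields no path at all, retry ignoring the
 * disable flags.  Hypothetical names, not planner code.
 */
#include <stdbool.h>

#define NUM_STRATEGIES 3		/* e.g. seqscan, indexscan, bitmapscan */

static int
generate_paths(const bool *enabled, bool ignore_disables,
			   const bool *feasible, int *paths_out)
{
	int			npaths = 0;

	for (int i = 0; i < NUM_STRATEGIES; i++)
	{
		if (!feasible[i])
			continue;			/* strategy can't produce a plan here */
		if (!enabled[i] && !ignore_disables)
			continue;			/* respect the disable flag in pass 1 */
		paths_out[npaths++] = i;
	}
	return npaths;
}

static int
generate_paths_two_pass(const bool *enabled, const bool *feasible,
						int *paths_out)
{
	int			npaths = generate_paths(enabled, false, feasible, paths_out);

	if (npaths == 0)			/* everything usable was disabled: do-over */
		npaths = generate_paths(enabled, true, feasible, paths_out);
	return npaths;
}
```

The join-planning difficulty mentioned above is that, unlike this flat loop, the same joinrel is revisited under different join orders, so "no paths yet" at one point does not mean the do-over is actually needed.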
Re: possible proposal plpgsql GET DIAGNOSTICS oid = PG_ROUTINE_OID
Pavel Stehule writes: > po 27. 3. 2023 v 5:36 odesílatel Kirk Wolak napsal: >> I have marked the item Ready for Commiter... > Thank you for doc and for review I'm kind of surprised there was any interest in this proposal at all, TBH, but apparently there is some. Still, I think you over-engineered it by doing more than the original proposal of making the function OID available. The other things can be had by casting the OID to regproc or regprocedure, so I'd be inclined to add just one new keyword not three. Besides, your implementation is a bit inconsistent: relying on fn_signature could return a result that is stale or doesn't conform to the current search_path. regards, tom lane
Re: [BUG] pg_stat_statements and extended query protocol
"Imseih (AWS), Sami" writes: >> So... The idea here is to set a custom fetch size so as the number of >> calls can be deterministic in the tests, still more than 1 for the >> tests we'd have. And your point is that libpq enforces always 0 when >> sending the EXECUTE message causing it to always return all the rows >> for any caller of PQsendQueryGuts(). > That is correct. Hi, I took a quick look through this thread, and I have a couple of thoughts: * Yeah, it'd be nice to have an in-core test, but it's folly to insist on one that works via libpq and psql. That requires a whole new set of features that you're apparently designing on-the-fly with no other use cases in mind. I don't think that will accomplish much except to ensure that this bug fix doesn't make it into v16. * I don't understand why it was thought good to add two new counters to struct Instrumentation. In EXPLAIN ANALYZE cases those will be wasted space *per plan node*, not per Query. * It also seems quite bizarre to add counters to a core data structure and then leave it to pg_stat_statements to maintain them. (BTW, I didn't much care for putting that maintenance into pgss_ExecutorRun without updating its header comment.) I'm inclined to think that adding the counters to struct EState is fine. That's 304 bytes already on 64-bit platforms, another 8 or 16 won't matter. Also, I'm doubtful that counting calls this way is a great idea, which would mean you only need one new counter field not two. The fact that you're having trouble defining what it means certainly suggests that the implementation is out front of the design. In short, what I think I'd suggest is adding an es_total_processed field to EState and having standard_ExecutorRun do "es_total_processed += es_processed" near the end, then just change pg_stat_statements to use es_total_processed not es_processed. regards, tom lane
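The suggested fix boils down to accumulating the per-ExecutorRun row count into a query-lifetime total, so that partial fetches over the extended query protocol still sum correctly. A heavily simplified sketch (the real EState lives in execnodes.h and carries far more state; standard_ExecutorRun does much more than this):

```c
/*
 * Toy sketch of the suggested fix: accumulate the per-run row count
 * (es_processed) into a query-lifetime total (es_total_processed),
 * so that fetching a portal in chunks still yields the right total.
 * Struct and names are simplified stand-ins, not executor code.
 */
#include <stdint.h>

typedef struct SketchEState
{
	uint64_t	es_processed;		/* rows from the most recent run */
	uint64_t	es_total_processed; /* rows across all runs of this query */
} SketchEState;

/* Stand-in for one ExecutorRun call that fetches some rows. */
static void
sketch_executor_run(SketchEState *estate, uint64_t rows_this_call)
{
	estate->es_processed = rows_this_call;
	/* ...plan execution would happen here... */

	/* near the end, per the suggestion above: */
	estate->es_total_processed += estate->es_processed;
}
```

With a FETCH size of 10 over 25 rows, three runs leave es_processed at 5 but es_total_processed at 25 — the number pg_stat_statements would then report.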
Re: Should vacuum process config file reload more often
On Sun, Apr 2, 2023 at 10:28 PM Masahiko Sawada wrote: > Thank you for updating the patches. Here are comments for 0001, 0002, > and 0003 patches: Thanks for the review! v13 attached with requested updates. > 0001: > > @@ -391,7 +389,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, > Assert(params->index_cleanup != VACOPTVALUE_UNSPECIFIED); > Assert(params->truncate != VACOPTVALUE_UNSPECIFIED && > params->truncate != VACOPTVALUE_AUTO); > -vacrel->failsafe_active = false; > +VacuumFailsafeActive = false; > > If we go with the idea of using VacuumCostActive + > VacuumFailsafeActive, we need to make sure that both are cleared at > the end of the vacuum per table. Since the patch clears it only here, > it remains true even after vacuum() if we trigger the failsafe mode > for the last table in the table list. > > In addition to that, to ensure that also in an error case, I think we > need to clear it also in PG_FINALLY() block in vacuum(). So, in 0001, I tried to keep it exactly the same as LVRelState->failsafe_active except for it being a global. We don't actually use VacuumFailsafeActive in this commit except in vacuumlazy.c, which does its own management of the value (it resets it to false at the top of heap_vacuum_rel()). In the later commit which references VacuumFailsafeActive outside of vacuumlazy.c, I had reset it in PG_FINALLY(). I hadn't reset it in the relation list loop in vacuum(). Autovacuum calls vacuum() for each relation. However, you are right that for VACUUM with a list of relations for a table access method other than heap, once set to true, if the table AM forgets to reset the value to false at the end of vacuuming the relation, it would stay true. I've set it to false now at the bottom of the loop through relations in vacuum(). 
> --- > @@ -306,6 +306,7 @@ extern PGDLLIMPORT pg_atomic_uint32 > *VacuumSharedCostBalance; > extern PGDLLIMPORT pg_atomic_uint32 *VacuumActiveNWorkers; > extern PGDLLIMPORT int VacuumCostBalanceLocal; > > +extern bool VacuumFailsafeActive; > > Do we need PGDLLIMPORT for VacuumFailSafeActive? I didn't add one because I thought extensions and other code probably shouldn't access this variable. I thought PGDLLIMPORT was only needed for extensions built on windows to access variables. > 0002: > > @@ -2388,6 +2398,7 @@ vac_max_items_to_alloc_size(int max_items) > return offsetof(VacDeadItems, items) + > sizeof(ItemPointerData) * max_items; > } > > + > /* > * vac_tid_reaped() -- is a particular tid deletable? > * > > Unnecessary new line. There are some other unnecessary new lines in this > patch. Thanks! I think I got them all. > --- > @@ -307,6 +309,8 @@ extern PGDLLIMPORT pg_atomic_uint32 *VacuumActiveNWorkers; > extern PGDLLIMPORT int VacuumCostBalanceLocal; > > extern bool VacuumFailsafeActive; > +extern int VacuumCostLimit; > +extern double VacuumCostDelay; > > and > > @@ -266,8 +266,6 @@ extern PGDLLIMPORT int max_parallel_maintenance_workers; > extern PGDLLIMPORT int VacuumCostPageHit; > extern PGDLLIMPORT int VacuumCostPageMiss; > extern PGDLLIMPORT int VacuumCostPageDirty; > -extern PGDLLIMPORT int VacuumCostLimit; > -extern PGDLLIMPORT double VacuumCostDelay; > > Do we need PGDLLIMPORT too? I was on the fence about this. I annotated the new guc variables vacuum_cost_delay and vacuum_cost_limit with PGDLLIMPORT, but I did not annotate the variables used in vacuum code (VacuumCostLimit/Delay). I think whether or not this is the right choice depends on two things: whether or not my understanding of PGDLLIMPORT is correct and, if it is, whether or not we want extensions to be able to access VacuumCostLimit/Delay or if just access to the guc variables is sufficient/desirable. 
> --- > @@ -1773,20 +1773,33 @@ FreeWorkerInfo(int code, Datum arg) > } > } > > + > /* > - * Update the cost-based delay parameters, so that multiple workers consume > - * each a fraction of the total available I/O. > + * Update vacuum cost-based delay-related parameters for autovacuum workers > and > + * backends executing VACUUM or ANALYZE using the value of relevant gucs and > + * global state. This must be called during setup for vacuum and after every > + * config reload to ensure up-to-date values. > */ > void > -AutoVacuumUpdateDelay(void) > +VacuumUpdateCosts(void > > Isn't it better to define VacuumUpdateCosts() in vacuum.c rather than > autovacuum.c as this is now a common code for both vacuum and > autovacuum? We can't access members of WorkerInfoData from inside vacuum.c > 0003: > > @@ -501,9 +502,9 @@ vacuum(List *relations, VacuumParams *params, > { > ListCell *cur; > > -VacuumUpdateCosts(); > in_vacuum = true; > -VacuumCostActive = (VacuumCostDelay > 0); > +VacuumFailsafeActive = false; > +VacuumUpdateCosts(); > > Hmm, if we initialize
Re: Request for comment on setting binary format output per session
>> participating clients to receive GUC configured format. I do not think that libpq's result format being able to be overridden by GUC is a good idea at all; the library has to participate, and I think it can be made to do so without adjusting the interface (say, by resultFormat = 3). > Interesting idea. I suppose you'd need to specify 3 for all result columns? That is a protocol change, but wouldn't "break" older clients. The newer clients would need to make sure that they're connecting to v16+, so using the protocol version alone wouldn't be enough. Hmm. So this only works with extended protocol and not simple queries. Again, as Peter mentioned, it's already easy enough to confuse psql using binary cursors, so it makes sense to fix psql either way. If you use resultFormat (3) I think you'd still end up doing the Describe (which we are trying to avoid) to make sure you could receive all the columns in binary. Dave
Re: Minimal logical decoding on standbys
On 2023-Apr-03, Drouvot, Bertrand wrote: > +/* > + * Report terminating or conflicting message. > + * > + * For both, logical conflict on standby and obsolete slot are handled. > + */ > +static void > +ReportTerminationInvalidation(bool terminating, bool islogical, int pid, > + NameData slotname, > TransactionId *xid, > + XLogRecPtr > restart_lsn, XLogRecPtr oldestLSN) > +{ > + if (terminating) > + appendStringInfo(_msg, _("terminating process %d to release > replication slot \"%s\""), > + pid, > + NameStr(slotname)); > + else > + appendStringInfo(_msg, _("invalidating")); > + > + if (islogical) > + { > + if (terminating) > + appendStringInfo(_msg, _(" because it conflicts > with recovery")); You can't build the strings this way, because it's not possible to put the strings into the translation machinery. You need to write full strings for each separate case instead, without appending other string parts later. Thanks -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)
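The underlying rule is that gettext extracts whole string literals at build time, so every user-facing message must exist as one complete literal; fragments appended at runtime never reach the translators. A schematic contrast, with _() stubbed to the identity (in PostgreSQL it expands to a gettext lookup, and the message texts here are only illustrative):

```c
/*
 * Why piecewise message building defeats translation: message catalogs
 * are keyed on whole literals extracted at build time.  Here _() is
 * stubbed to the identity; in PostgreSQL it is a gettext lookup.
 */
#include <stdio.h>
#include <string.h>

#define _(x) (x)				/* stub; really gettext(x) */

/* Broken: translators only ever see the two fragments, never the sentence. */
static void
report_fragmented(char *buf, size_t len, int pid, const char *slot)
{
	snprintf(buf, len,
			 _("terminating process %d to release replication slot \"%s\""),
			 pid, slot);
	strncat(buf, _(" because it conflicts with recovery"),
			len - strlen(buf) - 1);
}

/* Translatable: one complete literal per distinct message. */
static void
report_whole(char *buf, size_t len, int pid, const char *slot)
{
	snprintf(buf, len,
			 _("terminating process %d to release replication slot \"%s\" because it conflicts with recovery"),
			 pid, slot);
}
```

Both produce the same English text, but only the second form gives a translator a complete sentence to work with — hence the advice to write a full string per case rather than appending parts.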
Re: Minimal logical decoding on standbys
Hi, On 4/2/23 10:10 PM, Andres Freund wrote: Hi, Btw, most of the patches have some things that pgindent will change (and some that my editor will highlight). It wouldn't hurt to run pgindent for the later patches... done. Pushed the WAL format change. Thanks! On 2023-04-02 10:27:45 +0200, Drouvot, Bertrand wrote: 5.3% doc/src/sgml/ 6.2% src/backend/access/transam/ 4.6% src/backend/replication/logical/ 55.6% src/backend/replication/ 4.4% src/backend/storage/ipc/ 6.9% src/backend/tcop/ 5.3% src/backend/ 3.8% src/include/catalog/ 5.3% src/include/replication/ I think it might be worth trying to split this up a bit. Okay. Split in 2 parts in V56 enclosed. One part to handle logical slot conflicts on standby, and one part to arrange for a new pg_stat_database_conflicts and pg_replication_slots field. restart_lsn = s->data.restart_lsn; - - /* -* If the slot is already invalid or is fresh enough, we don't need to -* do anything. -*/ - if (XLogRecPtrIsInvalid(restart_lsn) || restart_lsn >= oldestLSN) + slot_xmin = s->data.xmin; + slot_catalog_xmin = s->data.catalog_xmin; + + /* the slot has been invalidated (logical decoding conflict case) */ + if ((xid && ((LogicalReplicationSlotIsInvalid(s)) || + /* or the xid is valid and this is a non conflicting slot */ +(TransactionIdIsValid(*xid) && !(LogicalReplicationSlotXidsConflict(slot_xmin, slot_catalog_xmin, *xid) || + /* or the slot has been invalidated (obsolete LSN case) */ + (!xid && (XLogRecPtrIsInvalid(restart_lsn) || restart_lsn >= oldestLSN))) { This still looks nearly unreadable. I suggest moving comments outside of the if (), remove redundant parentheses, use a function to detect if the slot has been invalidated. I made it as simple as: /* * If the slot is already invalid or is a non conflicting slot, we don't * need to do anything. */ islogical = xid ? true : false; if (SlotIsInvalid(s, islogical) || SlotIsNotConflicting(s, islogical, xid, )) in V56 attached. 
@@ -1329,16 +1345,45 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN, */ if (last_signaled_pid != active_pid) { + boolsend_signal = false; + + initStringInfo(_msg); + initStringInfo(_detail); + + appendStringInfo(_msg, "terminating process %d to release replication slot \"%s\"", +active_pid, + NameStr(slotname)); For this to be translatable you need to use _("message"). Thanks! + if (xid) + { + appendStringInfo(_msg, " because it conflicts with recovery"); + send_signal = true; + + if (TransactionIdIsValid(*xid)) + appendStringInfo(_detail, "The slot conflicted with xid horizon %u.", *xid); + else + appendStringInfo(_detail, "Logical decoding on standby requires wal_level to be at least logical on the primary server"); + } + else + { + appendStringInfo(_detail, "The slot's restart_lsn %X/%X exceeds the limit by %llu bytes.", + LSN_FORMAT_ARGS(restart_lsn), + (unsigned long long) (oldestLSN - restart_lsn)); + } + ereport(LOG, - errmsg("terminating process %d to release replication slot \"%s\"", - active_pid, NameStr(slotname)), - errdetail("The slot's restart_lsn %X/%X exceeds the limit by %llu bytes.", - LSN_FORMAT_ARGS(restart_lsn), - (unsigned long long) (oldestLSN - restart_lsn)), - errhint("You might need to increase max_slot_wal_keep_size.")); - - (void)
Re: Initial Schema Sync for Logical Replication
On Mon, Apr 3, 2023 at 3:54 PM Kumar, Sachin wrote: > > > > > -Original Message- > > From: Masahiko Sawada > > > > I was thinking each TableSync process will call pg_dump --table, > > > > This way if we have N tableSync process, we can have N pg_dump -- > > table=table_name called in parallel. > > > > In fact we can use --schema-only to get schema and then let COPY > > > > take care of data syncing . We will use same snapshot for pg_dump as > > > > well > > as COPY table. > > > > > > How can we postpone creating the pg_subscription_rel entries until the > > > tablesync worker starts and does the schema sync? I think that since > > > pg_subscription_rel entry needs the table OID, we need either to do > > > the schema sync before creating the entry (i.e, during CREATE > > > SUBSCRIPTION) or to postpone creating entries as Amit proposed[1]. The > > > apply worker needs the information of tables to sync in order to > > > launch the tablesync workers, but it needs to create the table schema > > > to get that information. > > > > For the above reason, I think that step 6 of the initial proposal won't > > work. > > > > If we can have the tablesync worker create an entry of pg_subscription_rel > > after > > creating the table, it may give us the flexibility to perform the initial > > sync. One > > idea is that we add a relname field to pg_subscription_rel so that we can > > create > > entries with relname instead of OID if the table is not created yet. Once > > the > > table is created, we clear the relname field and set the OID of the table > > instead. > > It's not an ideal solution but we might make it simpler later. > > > > Assuming that it's feasible, I'm considering another approach for the > > initial sync > > in order to address the concurrent DDLs. > > > > The basic idea is to somewhat follow how pg_dump/restore to dump/restore > > the database data. 
We divide the synchronization phase (including both > > schema > > and data) up into three phases: pre-data, table-data, post-data. These > > mostly > > follow the --section option of pg_dump. > > > > 1. The backend process performing CREATE SUBSCRIPTION creates the > > subscription but doesn't create pg_subscription_rel entries yet. > > > > 2. Before starting the streaming, the apply worker fetches the table list > > from the > > publisher, create pg_subscription_rel entries for them, and dumps+restores > > database objects that tables could depend on but don't depend on tables > > such as > > TYPE, OPERATOR, ACCESS METHOD etc (i.e. > > pre-data). > > We will not have slot starting snapshot, So somehow we have to get a new > snapshot > And skip all the wal_log between starting of slot and snapshot creation lsn ? > . Yes. Or we can somehow postpone creating pg_subscription_rel entries until the tablesync workers create tables, or we request walsender to establish a new snapshot using a technique proposed in email[1]. > > > > > 3. The apply worker launches the tablesync workers for tables that need to > > be > > synchronized. > > > > There might be DDLs executed on the publisher for tables before the > > tablesync > > worker starts. But the apply worker needs to apply DDLs for pre-data > > database > > objects. OTOH, it can ignore DDLs for not-synced-yet tables and other > > database > > objects such as INDEX, TRIGGER, RULE, etc (i.e. post-data). > > > > 4. The tablesync worker creates its replication slot, dumps+restores the > > table > > schema, update the pg_subscription_rel, and perform COPY. > > > > These operations should be done in the same transaction. > > pg_restore wont be rollbackable, So we need to maintain states in > pg_subscription_rel. Yes. But I think it depends on how we restore them. 
For example, if we have the tablesync worker somehow restore the table using a new SQL function returning the table schema as we discussed or executing the dump file via SPI, we can do that in the same transaction. > > > > > 5. After finishing COPY, the tablesync worker dumps indexes (and perhaps > > constraints) of the table and creates them (which possibly takes a long > > time). > > Then it starts to catch up, same as today. The apply worker needs to wait > > for the > > tablesync worker to catch up. > > I don’t think we can have a CATCHUP stage. We can have a DDL on publisher which > can add a new column (And this DDL will be executed by applier later). Then > we get an INSERT > because we have old definition of table, insert will fail. All DMLs and DDLs associated with the table being synchronized are applied by the tablesync worker until it catches up with the apply worker. > > > > > We need to repeat these steps until we complete the initial data copy and > > create > > indexes for all tables, IOW until all pg_subscription_rel status becomes > > READY. > > > > 6. If the apply worker confirms all tables are READY, it starts another sync > > worker who is responsible for the post-data database objects such as >
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode
On Mon, Apr 3, 2023 at 1:09 AM David Rowley wrote: > > On Sat, 1 Apr 2023 at 13:24, Melanie Plageman > wrote: > > Your diff LGTM. > > > > Earlier upthread in [1], Bharath had mentioned in a review comment about > > removing the global variables that he would have expected the analogous > > global in analyze.c to also be removed (vac_strategy [and analyze.c also > > has anl_context]). > > > > I looked into doing this, and this is what I found out (see full > > rationale in [2]): > > > > > it is a bit harder to remove it from analyze because acquire_func > > > doesn't take the buffer access strategy as a parameter and > > > acquire_sample_rows uses the vac_context global variable to pass to > > > table_scan_analyze_next_block(). > > > > I don't know if this is worth mentioning in the commit removing the > > other globals? Maybe it will just make it more confusing... > > I did look at that, but it seems a little tricky to make work unless > the AcquireSampleRowsFunc signature was changed. To me, it just does > not seem worth doing that to get rid of the two globals in analyze.c. Yes, I came to basically the same conclusion. On Mon, Apr 3, 2023 at 7:57 AM David Rowley wrote: > > I've now pushed up v8-0004. Can rebase the remaining 2 patches on top > of master again and resend? v9 attached. > On Mon, 3 Apr 2023 at 08:11, Melanie Plageman > wrote: > > I still have a few open questions: > > - what the initial value of ring_size for autovacuum should be (see the > > one remaining TODO in the code) > > I assume you're talking about the 256KB BAS_VACUUM one set in > GetAccessStrategy()? I don't think this patch should be doing anything > to change those defaults. Anything that does that should likely have > a new thread and come with analysis or reasoning about why the newly > proposed defaults are better than the old ones. I actually was talking about something much more trivial but a little more confusing. 
In table_recheck_autovac(), I initialize the autovac_table->at_params.ring_size to the value of the vacuum_buffer_usage_limit guc. However, autovacuum makes its own BufferAccessStrategy object (instead of relying on vacuum() to do it) and passes that in to vacuum(). So, if we wanted autovacuum to disable use of a strategy (and use as many shared buffers as it likes), it would pass in NULL to vacuum(). If vacuum_buffer_usage_limit is not 0, then we would end up making and using a BufferAccessStrategy in vacuum(). If we instead initialized autovac_table->at_params.ring_size to 0, even if the passed in BufferAccessStrategy is NULL, we wouldn't make a ring for autovacuum. Right now, we don't disable the strategy for autovacuum except in failsafe mode. And it is unclear when or why we would want to. I also thought it might be weird to have the value of the ring_size be initialized to something other than the value of vacuum_buffer_usage_limit for autovacuum, since it is supposed to use that guc value. In fact, right now, we don't use the autovac_table->at_params.ring_size set in table_recheck_autovac() when making the ring in do_autovacuum() but instead use the guc directly. I actually don't really like how vacuum() relies on the BufferAccessStrategy parameter being NULL for autovacuum and feel like there is a more intuitive way to handle all this. But, I didn't want to make major changes at this point. Anyway, the above is quite a bit more analysis than the issue is really worth. We should pick something and then document it in a comment. > > - should ANALYZE allow specifying BUFFER_USAGE_LIMIT since it uses the guc > > value when that is set? > > That's a good question... I kinda think we should just skip it. It adds to the surface area of the feature. > > - should INDEX_CLEANUP off cause VACUUM to use shared buffers and > > disable use of a strategy (like failsafe vacuum) > > I don't see why it should. 
It seems strange to have one option > magically make changes to some other option. Sure, sounds good. > > - should we add anything to VACUUM VERBOSE output about the number of > > reuses of strategy buffers? > > Sounds like this would require an extra array of counter variables in > BufferAccessStrategyData? I think it might be a bit late to start > experimenting with this. Makes sense. I hadn't thought through the implementation. We count reuses in pg_stat_io data structures but that is global and not per BufferAccessStrategyData instance, so I agree to scrapping this idea. > > - Should we make BufferAccessStrategyData non-opaque so that we don't > > have to add a getter for nbuffers. I could have implemented this in > > another way, but I don't really see why BufferAccessStrategyData > > should be opaque > > If nothing outside of the .c file requires access then there's little > need to make the members known outside of the file. Same as you'd want > to make classes private rather than public when possible in OOP. > > If you do come up with a reason to be able to determine
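[Editor's illustration] The ring_size = 0 versus NULL-strategy distinction discussed in this thread can be modeled in a few lines. This is a hypothetical sketch of the general ring-buffer-strategy pattern, not PostgreSQL's actual BufferAccessStrategyData:

```python
BLCKSZ_KB = 8  # assume 8kB blocks, as in a default build

class RingStrategy:
    """A fixed-size ring of buffer slots, reused round-robin."""
    def __init__(self, nbuffers):
        self.nbuffers = nbuffers
        self.current = 0

    def next_buffer(self):
        # Hand out the next ring slot, wrapping around; the wrap-around is
        # the "reuse" of strategy buffers mentioned in the thread.
        slot = self.current
        self.current = (self.current + 1) % self.nbuffers
        return slot

def make_strategy(ring_size_kb):
    # A ring size of 0 means "no ring": return None so the caller reads
    # through shared buffers without a strategy, mirroring the behavior
    # described above.
    if ring_size_kb == 0:
        return None
    return RingStrategy(ring_size_kb // BLCKSZ_KB)
```

Under this model, autovacuum building its own strategy and passing it in bypasses make_strategy() entirely, which is the asymmetry Melanie describes.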
Re: is_superuser is not documented
On 2023/04/01 22:34, Joseph Koshakow wrote: The patch updated the guc table for is_superuser in src/backend/utils/misc/guc_tables.c Yes, this patch moves the descriptions of is_superuser to config.sgml and changes its group to PRESET_OPTIONS. However, when I look at the code on master I don't see this update Yes, the patch has not been committed yet because of lack of review comments. Do you have any review comments on this patch? Or you think it's ready for committer? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: daitch_mokotoff module
On 4/3/23 15:19, Dag Lem wrote: > Dag Lem writes: > >> I sincerely hope this resolves any blocking issues with copyright / >> legalese / credits. >> > > Can this now be considered ready for committer, so that Paul or someone > else can flip the bit? > Hi, I think from the technical point of view it's sound and ready for commit. The patch stalled on the copyright/credit stuff, which is a somewhat separate and mostly non-technical aspect of patches. Sorry for that, I'm sure it's annoying/frustrating :-( I see the current patch has two simple lines: * This module was originally sponsored by Finance Norway / * Trafikkforsikringsforeningen, and implemented by Dag Lem Any objections to this level of attribution in comments? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: running logical replication as the subscription owner
Thank you for this email. It's very helpful to get your opinion on this. On Sun, Apr 2, 2023 at 11:21 PM Noah Misch wrote: > On Wed, Mar 29, 2023 at 04:00:45PM -0400, Robert Haas wrote: > > > The dangerous cases seem to be something along the lines of a security-invoker > > > trigger function that builds and executes arbitrary SQL based on > > > the row contents. And the table owner would then still need to set > > > ENABLE ALWAYS TRIGGER. > > > > > > Do we really want to take that case on as our security responsibility? > > > > That's something about which I would like to get more opinions. > > The most-plausible-to-me attack involves an ENABLE ALWAYS trigger that logs > CURRENT_USER to an audit table. The "SQL based on the row contents" scenario > feels remote. Another remotely-possible attack involves a trigger that > internally queries some other table having RLS. (Switching to the table owner > can change the rows visible to that query.) I had thought of the first of these cases, but not the second one. > If having INSERT/UPDATE privileges on the table were enough to make a > subscription that impersonates the table owner, then relatively-unprivileged > roles could make a subscription to bypass the aforementioned auditing. Commit > c3afe8c has imposed weighty requirements beyond I/U privileges, namely holding > the pg_create_subscription role and database-level CREATE privilege. Since > database-level CREATE is already powerful, it would be plausible to drop the > SET ROLE requirement and add this audit bypass to its powers. The SET ROLE > requirement is nice for keeping the powers disentangled. One drawback is > making people do GRANTs regardless of whether a relevant audit trigger exists. > Another drawback is the subscription role having more privileges than ideally > needed. I do like keeping strong privileges orthogonal, so I lean toward > keeping the SET ROLE requirement. The orthogonality argument weighs extremely heavily with me in this case. 
As I said to Jeff, I would not mind having a more granular way to control which tables a user can replicate into; e.g. a grantable REPLICAT{E,ION} privilege, or if we want something global we could have a predefined role for it, e.g. pg_replicate_into_any_table. But I think any such thing should definitely be separate from pg_create_subscription. I'll fix the typos. Thanks for reporting them. -- Robert Haas EDB: http://www.enterprisedb.com
Re: running logical replication as the subscription owner
On Fri, Mar 31, 2023 at 6:46 PM Jeff Davis wrote: > I guess the "more convenient" is where I'm confused, because the "grant > subscription_owner to table owner with set role true" is not likely to > be conveniently already present; it would need to be issued manually to > take advantage of this special case. You and I disagree about the likelihood of that, but I could well be wrong. > Do you have any concern about the weirdness where assigning the > subscription to a higher-privilege user Z would cause B's trigger to > fail? Not very much. I think the biggest risk is user confusion, but I don't think that's a huge risk because I don't think this scenario will come up very often. Also, it's kind of hard to imagine that there's a security model here which never does anything potentially surprising. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [EXTERNAL] Re: [PATCH] Report the query string that caused a memory error under Valgrind
Onur Tirtir writes: > Thank you for sharing your proposal as a patch. It looks much nicer and > useful than mine. > I've also tested it for a few cases --by injecting a memory error on > purpose-- and seen that it helps reporting the problematic query. > Should we move forward with v3 then? OK, I pushed v3 as-is. We can refine it later if anyone has suggestions. Thanks for the contribution! regards, tom lane
Re: POC: Lock updated tuples in tuple_update() and tuple_delete()
After Alexander reverted the v15 patches from master, I've rebased the v4 correction patches from the message above onto a fresh master (together with the v15 patches). The resulting v16 patch is attached. Pavel. v16-0002-Add-EvalPlanQual-delete-returning-isolation-test.patch Description: Binary data v16-0001-Allow-locking-updated-tuples-in-tuple_update-and.patch Description: Binary data
Re: Infinite Interval
Hi Joseph, On Mon, Apr 3, 2023 at 6:02 AM Joseph Koshakow wrote: > > I've attached a patch with all of the errcontext calls removed. None of > the existing out of range errors have an errdetail call so I think this > is more consistent. If we do want to add errdetail, then we should > probably do it in a later patch and add it to all out of range errors, > not just the ones related to infinity. Hmm, I realize my errcontext suggestion was in the wrong direction. We can use errdetail if required in the future. But not for this patch. Here are comments on the test and output. + infinity | | | || Infinity | Infinity | | | Infinity | Infinity | Infinity | Infinity | Infinity + -infinity | | | || -Infinity | -Infinity | | | -Infinity | -Infinity | -Infinity | -Infinity | -Infinity This is more for my education. It looks like for oscillating units we report NULL here but for monotonically increasing units we report infinity. I came across those terms in the code. But I didn't find definitions of those terms. Can you please point me to the documents/resources defining those terms? diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql index f7f8c8d2dd..1d0ab322c0 100644 --- a/src/test/regress/sql/horology.sql +++ b/src/test/regress/sql/horology.sql @@ -207,14 +207,17 @@ SELECT t.d1 AS t, i.f1 AS i, t.d1 + i.f1 AS "add", t.d1 - i.f1 AS "subtract" FROM TIMESTAMP_TBL t, INTERVAL_TBL i WHERE t.d1 BETWEEN '1990-01-01' AND '2001-01-01' AND i.f1 BETWEEN '00:00' AND '23:00' +AND isfinite(i.f1) I removed this and it did not have any effect on results. I think the isfinite(i.f1) is already covered by the two existing conditions. 
SELECT t.f1 AS t, i.f1 AS i, t.f1 + i.f1 AS "add", t.f1 - i.f1 AS "subtract" FROM TIME_TBL t, INTERVAL_TBL i + WHERE isfinite(i.f1) ORDER BY 1,2; SELECT t.f1 AS t, i.f1 AS i, t.f1 + i.f1 AS "add", t.f1 - i.f1 AS "subtract" FROM TIMETZ_TBL t, INTERVAL_TBL i + WHERE isfinite(i.f1) ORDER BY 1,2; -- SQL9x OVERLAPS operator @@ -287,11 +290,12 @@ SELECT f1 AS "timestamp" SELECT d.f1 AS "timestamp", t.f1 AS "interval", d.f1 + t.f1 AS plus FROM TEMP_TIMESTAMP d, INTERVAL_TBL t + WHERE isfinite(t.f1) ORDER BY plus, "timestamp", "interval"; SELECT d.f1 AS "timestamp", t.f1 AS "interval", d.f1 - t.f1 AS minus FROM TEMP_TIMESTAMP d, INTERVAL_TBL t - WHERE isfinite(d.f1) + WHERE isfinite(t.f1) ORDER BY minus, "timestamp", "interval"; IIUC, the isfinite() conditions are added to avoid any changes to the output due to new values added to INTERVAL_TBL. Instead, it might be a good idea to not add these conditions and avoid extra queries testing infinity arithmetic in interval.sql, timestamptz.sql and timestamp.sql like below + +-- infinite intervals ... some lines folded + +SELECT date '1995-08-06' + interval 'infinity'; +SELECT date '1995-08-06' + interval '-infinity'; +SELECT date '1995-08-06' - interval 'infinity'; +SELECT date '1995-08-06' - interval '-infinity'; ... block truncated With that I have reviewed the entire patch-set. Once you address these comments, we can mark it as ready for committer. I already see Tom looking at the patch. So that might be just a formality. -- Best Wishes, Ashutosh Bapat
Re: GUC for temporarily disabling event triggers
On Mon, Apr 3, 2023 at 9:15 AM Daniel Gustafsson wrote: > > On 3 Apr 2023, at 15:09, Robert Haas wrote: > > On Mon, Apr 3, 2023 at 8:46 AM Daniel Gustafsson wrote: > >> All comments above addressed in the attached v5, thanks for review! > > > > I continue to think it's odd that the sense of this is inverted as > > compared with row_security. > > I'm not sure I follow. Do you propose that the GUC enables classes of event > triggers, the default being "all" (or similar) and one would remove the type > of > EVT for which debugging is needed? That doesn't seem like a bad idea, just > one > that hasn't come up in the discussion (and I didn't think about). Right. Although to be fair, that idea doesn't sound as good if we're going to have settings other than "on" or "off". If this is just disable_event_triggers = on | off, then why not flip the sense around and make it event_triggers = off | on, just as we do for row_security? But if we're going to allow specific types of event triggers to be disabled, and we think it's likely that people will want to disable one specific type of event trigger while leaving the others alone, that might not be very convenient, because you could end up having to list all the things you do want instead of the one thing you don't want. On the third hand, in other contexts, I've often advocated for giving options a positive sense (what are we going to do?) rather than a negative sense (what are we not going to do?). For example, the TIMING option to EXPLAIN was originally proposed with a name like DISABLE_TIMING or something, and the value inverted, and I said let's not do that. And similarly in other places. A case where I didn't follow that principle is VACUUM (DISABLE_PAGE_SKIPPING) which now seems like a wart to me. Why isn't it VACUUM (PAGE_SKIPPING) again with the opposite value? I'm not sure what the best thing to do is here, I just think it deserves some thought. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Infinite Interval
Hi Joseph, thanks for addressing the comments. On Sat, Apr 1, 2023 at 10:53 PM Joseph Koshakow wrote: > So I added a check for FLOAT8_FITS_IN_INT64() and a test with this > scenario. I like that. Thanks. > > For what it's worth I think that 2147483647 months only became a valid > interval in v15 as part of this commit [0]. It's also outside of the > documented valid range [1], which is > [-17800 years, 17800 years] or > [-1483 months, 1483 months]. you mean +/-213600 months :). In that sense the current code actually fixes a bug introduced in v15. So I am fine with it. > > The rationale for only checking the month's field is that it's faster > than checking all three fields, though I'm not entirely sure if it's > the right trade-off. Any thoughts on this? Hmm, comparing one integer is certainly faster than comparing three. We do that check at least once per interval operation. So tripling the comparisons might show some impact when millions of rows are processed. Given that we have clear documentation of bounds, just using the months field is fine. If needed we can always expand it later. > > > There's some way to avoid separate checks for infinite-ness of > > interval and factor and use a single block using some integer > > arithmetic. But I think this is more readable. So I avoided doing > > that. Let me know if this works for you. > > I think the patch looks good, I've combined it with the existing patch. > > > Also added some test cases. > > I didn't see any tests in the patch, did you forget to include it? Sorry, I forgot to include those. Attached. Please see my reply to your latest email as well. 
-- Best Wishes, Ashutosh Bapat diff --git a/src/test/regress/expected/interval.out b/src/test/regress/expected/interval.out index 0adfe5f9fb..ebb4208ad7 100644 --- a/src/test/regress/expected/interval.out +++ b/src/test/regress/expected/interval.out @@ -2161,6 +2161,26 @@ SELECT interval '-infinity' * 'nan'; ERROR: interval out of range SELECT interval '-1073741824 months -1073741824 days -4611686018427387904 us' * 2; ERROR: interval out of range +SELECT interval 'infinity' * 0; +ERROR: interval out of range +SELECT interval '-infinity' * 0; +ERROR: interval out of range +SELECT interval '0 days' * 'infinity'::float; +ERROR: interval out of range +SELECT interval '0 days' * '-infinity'::float; +ERROR: interval out of range +SELECT interval '5 days' * 'infinity'::float; + ?column? +-- + infinity +(1 row) + +SELECT interval '5 days' * '-infinity'::float; + ?column? +--- + -infinity +(1 row) + SELECT interval 'infinity' / 'infinity'; ERROR: interval out of range SELECT interval 'infinity' / '-infinity'; diff --git a/src/test/regress/sql/interval.sql b/src/test/regress/sql/interval.sql index 1e1d8560bf..549ceb57c1 100644 --- a/src/test/regress/sql/interval.sql +++ b/src/test/regress/sql/interval.sql @@ -690,6 +690,12 @@ SELECT -interval '-2147483647 months -2147483647 days -9223372036854775807 us'; SELECT interval 'infinity' * 'nan'; SELECT interval '-infinity' * 'nan'; SELECT interval '-1073741824 months -1073741824 days -4611686018427387904 us' * 2; +SELECT interval 'infinity' * 0; +SELECT interval '-infinity' * 0; +SELECT interval '0 days' * 'infinity'::float; +SELECT interval '0 days' * '-infinity'::float; +SELECT interval '5 days' * 'infinity'::float; +SELECT interval '5 days' * '-infinity'::float; SELECT interval 'infinity' / 'infinity'; SELECT interval 'infinity' / '-infinity';
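[Editor's illustration] The behavior exercised by the tests above can be modeled compactly: multiplying zero by infinity errors out, a nonzero finite interval times infinity yields a signed infinite interval, and finite results must fit the field's integer range. This sketch is a hypothetical model of those rules applied to the months field, not the patch's C code; the range check stands in for the FLOAT8_FITS_IN_INT64()-style test mentioned earlier in the thread:

```python
import math

# The months field of an interval is a 32-bit integer.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def interval_months_mul(months, factor):
    """Multiply an interval's months field by a float factor (model only)."""
    if math.isinf(factor):
        if months == 0:
            # 0 * infinity has no meaningful sign: "interval out of range"
            raise OverflowError("interval out of range")
        return math.inf if (months > 0) == (factor > 0) else -math.inf
    result = months * factor
    # Range-check before converting back to the integer field, analogous
    # to the FLOAT8_FITS_IN_INT64() check discussed above.
    if not (INT32_MIN <= result <= INT32_MAX):
        raise OverflowError("interval out of range")
    return int(result)
```

This mirrors the regression tests above: `interval '0 days' * 'infinity'::float` errors, while `interval '5 days' * 'infinity'::float` yields infinity.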
Re: POC: Lock updated tuples in tuple_update() and tuple_delete()
Hi, Alexander! On 2023-04-02 03:37:19 +0300, Alexander Korotkov wrote: > On Sat, Apr 1, 2023 at 8:21 AM Andres Freund wrote: > > Given that the in-tree state has been broken for a week, I think it probably > > is time to revert the commits that already went in. > > The revised patch is attached. The most notable change is getting rid > of LazyTupleTableSlot. Also get rid of complex computations to detect > how to initialize LazyTupleTableSlot. Instead just pass the oldSlot > as an argument of ExecUpdate() and ExecDelete(). The price for this > is just preallocation of ri_oldTupleSlot before calling ExecDelete(). > The slot allocation is quite cheap. After all wrappers it's > table_slot_callbacks(), which is very cheap, single palloc() and few > fields initialization. It doesn't seem reasonable to introduce an > infrastructure to evade this. > > I think patch resolves all the major issues you've highlighted. Even > if there are some minor things missed, I'd prefer to push this rather > than reverting the whole work. I looked into the latest patch v3. In my view, it addresses all the issues discussed in [1]. Also, with the oldslot logic pushed outside, the code becomes more transparent. I've added some very minor modifications to the code and comments in patch v4-0001. Also, I'm for committing Andres' isolation test. I've added some minor revisions to make the test run routinely among the other isolation tests. The test could also be made a part of the existing eval-plan-qual.spec, but I have left it separate for now. Also, I think that the signatures of the ExecUpdate() and ExecDelete() functions, especially the latter, are somewhat overloaded with different status bool variables added by different authors on different occasions. It would be nice if they were combined into some kind of status variable. But as this doesn't touch the API and is not related to the current update/delete optimization, it could be modified anytime in the future as well. 
The changes that do touch the API are the addition of a TupleTableSlot argument and the conversion of the bool wait flag into a four-state options variable for tuple_update(), tuple_delete(), heap_update(), heap_delete() and heap_lock_tuple(), and a couple of Exec*DeleteTriggers(). I think they are justified. One thing that is not clear to me is that we pass oldSlot into simple_table_tuple_update() even though, as per the comment on this function, "concurrent updates of the target tuple is not expected (for example, because we have a lock on the relation associated with the tuple)". It seems not to break anything but maybe this could be simplified. Overall I think the patch is good enough. Regards, Pavel Borisov, Supabase. [1] https://www.postgresql.org/message-id/CAPpHfdtwKb5UVXKkDQZYW8nQCODy0fY_S7mV5Z%2Bcg7urL%3DzDEA%40mail.gmail.com v4-0002-Add-EvalPlanQual-delete-returning-isolation-test.patch Description: Binary data v4-0001-Revise-changes-in-764da7710b-and-11470f544e.patch Description: Binary data
Re: POC: Lock updated tuples in tuple_update() and tuple_delete()
On Sun, Apr 2, 2023 at 3:47 AM Andres Freund wrote: > On 2023-04-02 03:37:19 +0300, Alexander Korotkov wrote: > > On Sat, Apr 1, 2023 at 8:21 AM Andres Freund wrote: > > > Given that the in-tree state has been broken for a week, I think it > > > probably > > > is time to revert the commits that already went in. > > > > The revised patch is attached. The most notable change is getting rid > > of LazyTupleTableSlot. Also get rid of complex computations to detect > > how to initialize LazyTupleTableSlot. Instead just pass the oldSlot > > as an argument of ExecUpdate() and ExecDelete(). The price for this > > is just preallocation of ri_oldTupleSlot before calling ExecDelete(). > > The slot allocation is quite cheap. After all wrappers it's > > table_slot_callbacks(), which is very cheap, single palloc() and few > > fields initialization. It doesn't seem reasonable to introduce an > > infrastructure to evade this. > > > > I think patch resolves all the major issues you've highlighted. Even > > if there are some minor things missed, I'd prefer to push this rather > > than reverting the whole work. > > Shrug. You're designing new APIs, days before the feature freeze. This just > doesn't seem ready in time for 16. I certainly won't have time to look at it > sufficiently in the next 5 days. OK. Reverted. -- Regards, Alexander Korotkov
Add missing copyright for pg_upgrade/t/* files
Dear hackers, While reading codes, I noticed that pg_upgrade/t/001_basic.pl and pg_upgrade/t/002_pg_upgrade.pl do not contain the copyright. I checked briefly and almost all files have that, so I thought they missed it. PSA the patch to fix them. Best Regards, Hayato Kuroda FUJITSU LIMITED add_copyright.patch Description: add_copyright.patch
RE: Support logical replication of DDLs
Hi, Sorry about the list. Since it was a question about the specifications I thought I had to ask it first in the general list. I will reply in the hackers list only for new features. Replicating from orcl to postgres was difficult. You mentioned renaming of columns; the ordinal position of a column is reused with a drop/add column in orcl, so you can wrongly think it is a renaming from an external point of view. The only "advantage" with orcl is that you can drop/add columns thousands of times if you want, not with postgres. From PostgreSQL to PostgreSQL it's now easier of course, but the difficulty is that we have to separate DDL things. The "+" things have to be executed first on the replicated db (new tables, new columns, enlargement of columns). The "-" things have to be executed first on the source db (dropped tables, dropped columns, downsize of columns). DSS and OLTP teams are different, OLTP teams cannot or don't want to deal with DSS concerns etc. If replication is delayed it's not so trivial anyway to know when you can drop a table on the replicated db for example. The DSS team has in fact to build a system that detects a posteriori why the subscription is KO if something goes wrong. It can also be a human mistake, e.g a "create table very_important_table_to_save as select * from very_important_table;" and the replication is KO if the _save table is created in the published schema. I had read too fast. I read the proposals and Vignesh's suggestion & syntax seem very promising. If I understand well, an existing "for all tables" / "tables in schema" DML publication would have to be altered with ALTER PUBLICATION simple_applicative_schema_replication_that_wont_be_interrupted_for_an_rdbms_reason WITH (ddl = 'table:create,alter'); to get rid of the majority of possible interruptions. > Additionally, there could be some additional problems to deal with > like say if the column list has been specified then we ideally > shouldn't send those columns even in DDL. 
For example, if one uses > Alter Table .. Rename Column and the new column name is not present in > the published column list then we shouldn't send it. Perhaps I miss something but the names are not relevant here. The column is part of the publication and the corresponding DDL has to be sent, the column is not part of the publication and the DDL should not be sent. Dependencies are not based on names, it currently works like that with DML publication but also with views for example. Quick test : bas=# \dRp+ Publication test_phil Propriétaire | Toutes les tables | Insertions | Mises à jour | Suppressions | Tronque | Via la racine --+---++--+--+-+--- postgres | f | t | t| t| t | f Tables : "test_phil.t1" (c1, c2) bas=# alter table test_phil.t1 rename column c2 to c4; ALTER TABLE bas=# \dRp+ Publication test_phil Propriétaire | Toutes les tables | Insertions | Mises à jour | Suppressions | Tronque | Via la racine --+---++--+--+-+--- postgres | f | t | t| t| t | f Tables : "test_phil.t1" (c1, c4) "rename column" DDL has to be sent and the new name is not relevant in the decision to send it. If "rename column" DDL had concerned a column that is not part of the publication you wouldn't have to send the DDL, no matter the new name. Drop is not a problem. You cannot drop an existing column that is part of a publication without a "cascade". What could be problematic is a "add column" DDL and after that the column is added to the publication via "alter publication set". Such a case is difficult to deal with I guess. But the initial DDL to create a table is also not sent anyway right ? It could be a known limitation. I usually only test things in beta to report but I will try to have a look earlier at this patch since it is very interesting. That and the TDE thing but TDE is an external obligation and not a real interest. 
I obtained a delay but hopefully we will have this encryption thing or perhaps we will be obliged to go back to the proprietary RDBMS for some needs even if the feature is in fact mostly useless... Best regards, Phil De : Amit Kapila Envoyé : lundi 3 avril 2023 06:07 À : Phil Florent Cc : vignesh C ; houzj.f...@fujitsu.com ; Ajin Cherian ; wangw.f...@fujitsu.com ; Runqi Tian ; Peter Smith ; Tom Lane ; li jie ; Dilip Kumar ; Alvaro Herrera ; Masahiko Sawada ; Japin Li ; rajesh singarapu ; Zheng Li ; PostgreSQL Hackers Objet : Re: Support logical replication of DDLs On Sun, Apr 2, 2023 at 3:25 PM Phil Florent wrote: > > As an end-user, I am highly interested in the
Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?
Thomas Munro writes: > I think this experiment worked out pretty well. I think it's a nice > side-effect that you can see what memory the regexp subsystem is > using, and that's likely to lead to more improvements. (Why is it > limited to caching 32 entries? Why is it a linear search, not a hash > table? Why is LRU implemented with memmove() and not a list? Could > we have a GUC regex_cache_memory, so someone who uses a lot of regexes > can opt into a large cache?) On the other hand it also uses a bit > more RAM, like other code using the reparenting trick, which is a > topic for future research. > I vote for proceeding with this approach. I wish we didn't have to > tackle either a regexp interface/management change (done here) or a > CFI() redesign (not done, but probably also a good idea for other > reasons) before getting this signal stuff straightened out, but here > we are. This approach seems good to me. Anyone have a different > take? Sorry for not looking at this sooner. I am okay with the regex changes proposed in v5-0001 through 0003, but I think you need to take another mopup pass there. Some specific complaints: * header comment for pg_regprefix has been falsified (s/malloc/palloc/) * in spell.c, regex_affix_deletion_callback could be got rid of * check other callers of pg_regerror for now-useless CHECK_FOR_INTERRUPTS In general there's a lot of comments referring to regexes being malloc'd. I'm disinclined to change the ones inside the engine, because as far as it knows it is still using malloc, but maybe we should work harder on our own comments. In particular, it'd likely be useful to have something somewhere pointing out that pg_regfree is only needed when you can't get rid of the regex by context cleanup. Maybe write a short section about memory management in backend/regex/README? I've not really looked at 0004. regards, tom lane
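[Editor's illustration] For context on the cache questions quoted above (32 entries, linear search, memmove-based LRU), the general scheme can be modeled in a few lines. The constants and structure here are an illustrative sketch of that move-to-front design, not the actual backend/regex code:

```python
NUM_CACHED = 32  # assumed fixed cache size, as in the discussion above

class MoveToFrontCache:
    """Linear-search cache whose LRU policy is 'move the hit entry to the
    front' -- the role memmove() plays in the C version."""

    def __init__(self, capacity=NUM_CACHED):
        self.capacity = capacity
        self.entries = []  # (key, value) pairs, most recently used first

    def lookup(self, key):
        for i, (k, v) in enumerate(self.entries):
            if k == key:
                # Move the hit to the front; everything before it shifts
                # down one slot (what memmove() does over the C array).
                self.entries.insert(0, self.entries.pop(i))
                return v
        return None  # cache miss: the caller would compile the regex

    def insert(self, key, value):
        if len(self.entries) == self.capacity:
            self.entries.pop()  # evict the least recently used entry
        self.entries.insert(0, (key, value))
```

A hash table would make lookup O(1) and a linked list would make the shuffle cheaper, which is essentially what the quoted questions ask; with only 32 entries, a linear scan over a small array is cheap, which may be why the simple scheme has sufficed.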
Re: daitch_mokotoff module
Dag Lem writes: > I sincerely hope this resolves any blocking issues with copyright / > legalese / credits. > Can this now be considered ready for committer, so that Paul or someone else can flip the bit? Best regards Dag Lem
Re: GUC for temporarily disabling event triggers
> On 3 Apr 2023, at 15:09, Robert Haas wrote: > > On Mon, Apr 3, 2023 at 8:46 AM Daniel Gustafsson wrote: >> All comments above addressed in the attached v5, thanks for review! > > I continue to think it's odd that the sense of this is inverted as > compared with row_security. I'm not sure I follow. Do you propose that the GUC enables classes of event triggers, the default being "all" (or similar) and one would remove the type of EVT for which debugging is needed? That doesn't seem like a bad idea, just one that hasn't come up in the discussion (and I didn't think about). -- Daniel Gustafsson
Re: RFC: logical publication via inheritance root?
Hi, > > Outside the scope of special TimescaleDB tables and the speculated > > pg_partman old-style table migration, will this proposed new feature > > have any other application? In other words, do you know if this > > proposal will be of any benefit to the *normal* users who just have > > native PostgreSQL inherited tables they want to replicate? > > I think it comes down to why an inheritance scheme was used. If it's > because you want to group rows into named, queryable subsets (e.g. the > "cities/capitals" example in the docs [1]), I don't think this has any > utility, because I assume you'd want to replicate your subsets as-is. > > But if it's because you've implemented a partitioning scheme of your > own (the docs still list reasons you might want to [2], even today), > and all you ever really do is interact with the root table, I think > this feature will give you some of the same benefits that > publish_via_partition_root gives native partition users. We're very > much in that boat, but I don't know how many others are. I would like to point out that inheritance is merely a tool for modeling data. Its use cases are not limited to only partitioning, although many people ended up using it for this purpose back when we didn't have proper built-in partitioning. So unless we are going to remove inheritance in the nearest releases (*) I believe it should work with logical replication in a sane and convenient way. Correct me if I'm wrong, but I got the impression that the patch tries to accomplish just that. (*) Which personally I believe would be a good change. Unlikely to happen, though. -- Best regards, Aleksander Alekseev
Re: GUC for temporarily disabling event triggers
On Mon, Apr 3, 2023 at 8:46 AM Daniel Gustafsson wrote: > All comments above addressed in the attached v5, thanks for review! I continue to think it's odd that the sense of this is inverted as compared with row_security. -- Robert Haas EDB: http://www.enterprisedb.com
Re: GUC for temporarily disabling event triggers
> On 2 Apr 2023, at 21:48, Justin Pryzby wrote: > + gettext_noop("Disable event triggers for the duration > of the session."), > > Why does it say "for the duration of the session" ? > > It's possible to disable ignoring, and within the same session. > GUCs are typically "for the duration of the session" .. but they don't > say so (and don't need to). > > + elog(ERROR, "unsupport event trigger: %d", event); > > typo: unsupported > > +Allows to temporarily disable event triggers from executing in order > > => Allow temporarily disabling execution of event triggers .. > > +to troubleshoot and repair faulty event triggers. The value matches > +the type of event trigger to be ignored: > +ddl_command_start, > ddl_command_end, > +table_rewrite and sql_drop. > +Additionally, all event triggers can be disabled by setting it to > +all. Setting the value to none > +will ensure that all event triggers are enabled, this is the default > > It doesn't technically "ensure that they're enabled", since they can be > disabled by ALTER. Better to say that it "doesn't disable any event triggers". All comments above addressed in the attached v5, thanks for review! -- Daniel Gustafsson v5-0001-Add-GUC-for-temporarily-disabling-event-triggers.patch Description: Binary data
Re: Sketch of a fix for that truncation data corruption issue
Alvaro Herrera writes: > Has this problem been fixed? I was under the impression that it had > been, but I spent some 20 minutes now looking for code, commits, or > patches in the archives, and I can't find anything relevant. Maybe it > was fixed in some different way that's not so obviously connected? As far as I can see from a quick look at the code, nothing has been done that would alleviate this problem: smgrtruncate still calls DropRelationBuffers before truncating. Have you run into a new case of it? I don't recall having seen many field complaints about this since 2018. regards, tom lane
Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound
Hi John, Many thanks for all the great feedback! > Okay, the changes look good. To go further, I think we need to combine into > two patches, one with 0001-0003 and one with 0004: > > 1. Correct false statements about "shutdown" etc. This should contain changes > that can safely be patched all the way to v11. > 2. Change bad advice (single-user mode) into good advice. We can target head > first, and then try to adopt as far back as we safely can (and test). Done. > > > In swapping this topic back in my head, I also saw [2] where Robert > > > suggested > > > > > > "that old prepared transactions and stale replication > > > slots should be emphasized more prominently. Maybe something like: > > > > > > HINT: Commit or roll back old prepared transactions, drop stale > > > replication slots, or kill long-running sessions. > > > Ensure that autovacuum is progressing, or run a manual database-wide > > > VACUUM." > > > > It looks like the hint regarding replication slots was added at some > > point. Currently we have: > > > > ``` > > errhint( [...] > > "You might also need to commit or roll back old prepared > > transactions, or drop stale replication slots."))); > > ``` > > Yes, the exact same text as it appeared in the [2] thread above, which > prompted Robert's comment I quoted. I take the point to mean: All of these > things need to be taken care of *first*, before vacuuming, so the hint should > order things so that it is clear. > > > Please let me know if you think > > we should also add a suggestion to kill long-running sessions, etc. > > +1 for also adding that. OK, done. I included this change as a separate patch. It can be squashed with another one if necessary. While on it, I noticed that multixact.c still talks about a "shutdown". I made corresponding changes in 0001. 
> - errmsg("database is not accepting commands to avoid wraparound data loss in > database \"%s\"", > + errmsg("database is not accepting commands that generate new XIDs to avoid > wraparound data loss in database \"%s\"", > > I'm not quite on board with the new message, but welcome additional opinions. > For one, it's a bit longer and now ambiguous. I also bet that "generate XIDs" > doesn't really communicate anything useful. The people who understand > exactly what that means, and what the consequences are, are unlikely to let > the system get near wraparound in the first place, and might even know enough > to ignore the hint. > > I'm thinking we might need to convey something about "writes". While it's > less technically correct, I imagine it's more useful. Remember, many users > have it drilled in their heads that they need to drop immediately to > single-user mode. I'd like to give some idea of what they can and cannot do. This particular wording was chosen for consistency with multixact.c: ``` errmsg("database is not accepting commands that generate new MultiXactIds to avoid wraparound data loss in database \"%s\"", ``` The idea of using "writes" is sort of OK, but note that the same message will appear for a query like: ``` SELECT pg_current_xact_id(); ``` ... which doesn't do writes. The message will be misleading in this case. On top of that, although a PostgreSQL user may not be aware of MultiXactIds, arguably there are many users that are aware of XIDs. Not to mention the fact that XIDs are well documented. I didn't make this change in v4 since it seems to be controversial and probably not the highest priority at the moment. I suggest we discuss it separately. > I propose for discussion that 0004 should show in the docs all the queries > for finding prepared xacts, repl slots etc. If we ever show the info at > runtime, we can dispense with the queries, but there seems to be no urgency > for that... Good idea. 
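As a concrete starting point, the documented queries could look roughly like this (the column selection below is my own sketch, not text from the patch; `pg_prepared_xacts`, `pg_replication_slots`, and `pg_stat_activity` are the standard system views):

```sql
-- Old prepared transactions, oldest first:
SELECT gid, prepared, owner, database, age(transaction) AS xid_age
FROM pg_prepared_xacts
ORDER BY age(transaction) DESC;

-- Replication slots holding back the xmin horizons:
SELECT slot_name, slot_type, active,
       age(xmin) AS xmin_age, age(catalog_xmin) AS catalog_xmin_age
FROM pg_replication_slots
ORDER BY greatest(age(xmin), age(catalog_xmin)) DESC;

-- Long-running sessions pinning an old xmin:
SELECT pid, datname, state,
       age(backend_xid) AS xid_age, age(backend_xmin) AS xmin_age
FROM pg_stat_activity
WHERE backend_xid IS NOT NULL OR backend_xmin IS NOT NULL
ORDER BY greatest(age(backend_xid), age(backend_xmin)) DESC;
```

The ordering mirrors the hint: clear prepared transactions and slots, terminate old sessions, then vacuum.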
> + Previously it was required to stop the postmaster and VACUUM the > database > + in a single-user mode. There is no need to use a single-user mode > anymore. > > I think we need to go further and actively warn against it: It's slow, > impossible to monitor, disables replication and disables safeguards against > wraparound. (Other bad things too, but these are easily understandable for > users) > > Maybe mention also that its main use in wraparound situations is for a way > to perform DROPs and TRUNCATEs if that would help speed up resolution. Fixed. -- Best regards, Aleksander Alekseev v4-0001-Correct-the-docs-and-messages-about-preventing-XI.patch Description: Binary data v4-0002-Don-t-recommend-running-VACUUM-in-a-single-user-m.patch Description: Binary data v4-0003-Modify-the-hints-about-preventing-XID-wraparound.patch Description: Binary data
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock
On 4/3/23 00:40, Andres Freund wrote: > Hi, > > On 2023-03-28 19:17:21 -0700, Andres Freund wrote: >> On 2023-03-28 18:21:02 -0700, Andres Freund wrote: >>> Here's a draft patch. >> >> Attached is v2, with a stupid bug fixed and a bit of comment / pgindent >> polish. > > I'd welcome some review (Tomas?), but otherwise I'm planning to push ahead > with this. > I guess the 0001 part was already pushed, so I should be looking only at 0002, correct? I think 0002 makes RelationGetBufferForTuple() harder to understand. I'm not saying it's incorrect, but I find it hard to reason about the new combinations of conditions :-( I mean, it only had this condition: if (otherBuffer != InvalidBuffer) { ... } but now it has if (unlockedTargetBuffer) { ... } else if (otherBuffer != InvalidBuffer) { ... } if (unlockedTargetBuffer || otherBuffer != InvalidBuffer) { ... } Not sure how to improve that :-/ but not exactly trivial to figure out what's going to happen. Maybe this * If we unlocked the target buffer above, it's unlikely, but possible, * that another backend used space on this page. might say what we're going to do in this case. I mean - I understand some backend may use space in an unlocked page, but what does that mean for this code? What is it going to do? (The same comment talks about the next condition in much more detail, for example.) Also, isn't it a bit strange that the free space check now happens outside any if condition? It used to happen in the if (otherBuffer != InvalidBuffer) { ... } block, but now it happens outside.
> I'm still debating with myself whether this commit (or a prerequisite commit) > should move logic dealing with the buffer ordering into > GetVisibilityMapPins(), so we don't need two blocks like this: > > > if (otherBuffer == InvalidBuffer || targetBlock <= otherBlock) > GetVisibilityMapPins(relation, buffer, otherBuffer, >targetBlock, > otherBlock, vmbuffer, > > vmbuffer_other); > else > GetVisibilityMapPins(relation, otherBuffer, buffer, >otherBlock, > targetBlock, vmbuffer_other, >vmbuffer); > ... > > if (otherBuffer != InvalidBuffer) > { > if (GetVisibilityMapPins(relation, otherBuffer, buffer, > > otherBlock, targetBlock, vmbuffer_other, > > vmbuffer)) > unlockedTargetBuffer = true; > } > else > { > if (GetVisibilityMapPins(relation, buffer, > InvalidBuffer, > > targetBlock, InvalidBlockNumber, > > vmbuffer, InvalidBuffer)) > unlockedTargetBuffer = true; > } > } > Yeah. I haven't tried, but I imagine it'd make RelationGetBufferForTuple a little bit simpler. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Why enable_hashjoin Completely disables HashJoin
Quan Zongliang writes: > I found that the enable_hashjoin disables HashJoin completely. Well, yeah. It's what you asked for. > Instead, it should add a disable cost to the cost calculation of > hashjoin. Why? The disable-cost stuff is a crude hack that we use when turning off a particular plan type entirely might render us unable to generate a valid plan. Hash join is not in that category. > After disabling all three, the HashJoin path should still be chosen. Why? Personally, I'd get rid of disable_cost altogether if I could. I'm not in a hurry to extend its use to more places. regards, tom lane
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode
I've now pushed up v8-0004. Can you rebase the remaining two patches on top of master again and resend? On Mon, 3 Apr 2023 at 08:11, Melanie Plageman wrote: > I still have a few open questions: > - what the initial value of ring_size for autovacuum should be (see the > one remaining TODO in the code) I assume you're talking about the 256KB BAS_VACUUM one set in GetAccessStrategy()? I don't think this patch should be doing anything to change those defaults. Anything that does that should likely have a new thread and come with analysis or reasoning about why the newly proposed defaults are better than the old ones. > - should ANALYZE allow specifying BUFFER_USAGE_LIMIT since it uses the guc > value when that is set? That's a good question... > - should INDEX_CLEANUP off cause VACUUM to use shared buffers and > disable use of a strategy (like failsafe vacuum) I don't see why it should. It seems strange to have one option magically make changes to some other option. > - should we add anything to VACUUM VERBOSE output about the number of > reuses of strategy buffers? Sounds like this would require an extra array of counter variables in BufferAccessStrategyData? I think it might be a bit late to start experimenting with this. > - Should we make BufferAccessStrategyData non-opaque so that we don't > have to add a getter for nbuffers. I could have implemented this in > another way, but I don't really see why BufferAccessStrategyData > should be opaque If nothing outside of the .c file requires access then there's little need to make the members known outside of the file. Same as you'd want to make classes private rather than public when possible in OOP. If you do come up with a reason to be able to determine the size of the BufferAccessStrategy from outside freelist.c, I'd say an accessor method is the best way. David
Re: running logical replication as the subscription owner
On Thu, Mar 30, 2023 at 7:12 PM Robert Haas wrote: > > I don't think run_as_owner is terrible, despite the ambiguity. It's > talking about the owner of the object on which the property is being > set. > I find this justification quite reasonable to keep the option name as run_as_owner. So, +1 to use the new option name as run_as_owner. -- With Regards, Amit Kapila.
Re: Why enable_hashjoin Completely disables HashJoin
On 4/3/23 12:23, Quan Zongliang wrote: > Hi, > > I found that the enable_hashjoin disables HashJoin completely. > It's in the function add_paths_to_joinrel: > > if (enable_hashjoin || jointype == JOIN_FULL) > hash_inner_and_outer(root, joinrel, outerrel, innerrel, > jointype, ); > > Instead, it should add a disable cost to the cost calculation of > hashjoin. And now final_cost_hashjoin does the same thing: > > if (!enable_hashjoin) > startup_cost += disable_cost; > > > enable_mergejoin has the same problem. > > Test case: > > CREATE TABLE t_score_01( > s_id int, > s_score int, > s_course char(8), > c_id int); > > CREATE TABLE t_student_01( > s_id int, > s_name char(8)); > > insert into t_score_01 values( > generate_series(1, 100), random()*100, 'course', generate_series(1, > 100)); > > insert into t_student_01 values(generate_series(1, 100), 'name'); > > analyze t_score_01; > analyze t_student_01; > > SET enable_hashjoin TO off; > SET enable_nestloop TO off; > SET enable_mergejoin TO off; > > explain select count(*) > from t_student_01 a join t_score_01 b on a.s_id=b.s_id; > > After disabling all three, the HashJoin path should still be chosen. > It's not clear to me why that behavior would be desirable. Why is this an issue you need to solve? AFAIK the reason why some paths are actually disabled (not built at all) while others are only penalized by adding disable_cost is that we need to end up with at least one way to execute the query. So we pick a path that we know is possible (e.g. seqscan) and hard-disable other paths. But the always-possible path is only soft-disabled by disable_cost. For joins, we do the same thing. The hash/merge joins may not be possible, because the data types may not have hash/sort operators, etc. Nestloop is always possible. So we soft-disable nestloop but hard-disable hash/merge joins. I doubt we want to change this behavior, unless there's a good reason to do that ... 
regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company