Re: Minimal logical decoding on standbys
On Mon, Apr 3, 2023 at 8:51 PM Drouvot, Bertrand wrote:
>
> On 4/2/23 10:10 PM, Andres Freund wrote:
> > Hi,
> >
> >>          restart_lsn = s->data.restart_lsn;
> >> -
> >> -        /*
> >> -         * If the slot is already invalid or is fresh enough, we don't need to
> >> -         * do anything.
> >> -         */
> >> -        if (XLogRecPtrIsInvalid(restart_lsn) || restart_lsn >= oldestLSN)
> >> +        slot_xmin = s->data.xmin;
> >> +        slot_catalog_xmin = s->data.catalog_xmin;
> >> +
> >> +        /* the slot has been invalidated (logical decoding conflict case) */
> >> +        if ((xid && ((LogicalReplicationSlotIsInvalid(s)) ||
> >> +        /* or the xid is valid and this is a non conflicting slot */
> >> +        (TransactionIdIsValid(*xid) && !(LogicalReplicationSlotXidsConflict(slot_xmin, slot_catalog_xmin, *xid) ||
> >> +        /* or the slot has been invalidated (obsolete LSN case) */
> >> +        (!xid && (XLogRecPtrIsInvalid(restart_lsn) || restart_lsn >= oldestLSN)))
> >>         {
> >
> > This still looks nearly unreadable. I suggest moving comments outside of the
> > if (), remove redundant parentheses, use a function to detect if the slot has
> > been invalidated.
>
> I made it as simple as:
>
>     /*
>      * If the slot is already invalid or is a non conflicting slot, we don't
>      * need to do anything.
>      */
>     islogical = xid ? true : false;
>
>     if (SlotIsInvalid(s, islogical) || SlotIsNotConflicting(s, islogical, xid, )
>
> in V56 attached.

Here the variable 'islogical' doesn't seem to convey its actual meaning, because one can imagine that it indicates whether the slot is logical, which I don't think is the actual intent. One idea to simplify this is to introduce a single function CanInvalidateSlot() or something like that, and move the logic from both the functions SlotIsInvalid() and SlotIsNotConflicting() into the new function.

--
With Regards,
Amit Kapila.
Re: same query but different result on pg16devel and pg15.2
The attached file includes the table schema information, but no data.

tender wang wrote on Tue, Apr 4, 2023 at 10:53:
> Hi hackers,
> I encounter a problem, as shown below:
>
> query:
>
> select
>   ref_0.ps_suppkey as c0,
>   ref_1.c_acctbal as c1,
>   ref_2.o_totalprice as c2,
>   ref_2.o_orderpriority as c3,
>   ref_2.o_clerk as c4
> from
>   public.partsupp as ref_0
>   left join public.nation as sample_0
>     inner join public.customer as sample_1
>     on (false)
>   on (true)
>   left join public.customer as ref_1
>     right join public.orders as ref_2
>     on (false)
>     left join public.supplier as ref_3
>     on (false)
>   on (sample_0.n_comment = ref_1.c_name)
> where (8 <= NULLIF(CASE WHEN (o_orderkey IS NOT NULL) THEN 4 ELSE 4 END,
>                    CASE WHEN (o_orderdate >= o_orderdate) THEN 95 ELSE 95 END))
> order by c0, c1, c2, c3, c4 limit 1;
>
> on pg16devel:
>
>  c0 | c1 | c2 | c3 | c4
> ----+----+----+----+----
>   1 |    |    |    |
> (1 row)
>
> plan:
>
>  Limit
>    ->  Sort
>          Sort Key: ref_0.ps_suppkey, c_acctbal, o_totalprice, o_orderpriority, o_clerk
>          ->  Nested Loop Left Join
>                ->  Seq Scan on partsupp ref_0
>                ->  Result
>                      One-Time Filter: false
> (7 rows)
>
> on pg15.2:
>
>  c0 | c1 | c2 | c3 | c4
> ----+----+----+----+----
> (0 rows)
>
> plan:
>
>  Limit
>    ->  Sort
>          Sort Key: ref_0.ps_suppkey, c_acctbal, o_totalprice, o_orderpriority, o_clerk
>          ->  Hash Left Join
>                Hash Cond: ((n_comment)::text = (c_name)::text)
>                Filter: (8 <= NULLIF(CASE WHEN (o_orderkey IS NOT NULL) THEN 4 ELSE 4 END, CASE WHEN (o_orderdate >= o_orderdate) THEN 95 ELSE 95 END))
>                ->  Nested Loop Left Join
>                      ->  Seq Scan on partsupp ref_0
>                      ->  Result
>                            One-Time Filter: false
>                ->  Hash
>                      ->  Result
>                            One-Time Filter: false
> (13 rows)
>
> regards,
> tender wang

dbt3-s0.01-janm.sql Description: Binary data
Re: Prefetch the next tuple's memory during seqscans
On Tue, 4 Apr 2023 at 07:47, Gregory Stark (as CFM) wrote:
> The referenced patch was committed March 19th but there's been no
> comment here. Is this patch likely to go ahead this release or should
> I move it forward again?

Thanks for the reminder on this. I have done some work on it but just didn't post it here as I didn't have good news.

The problem I'm facing is that after Melanie's recent refactoring work around heapgettup() [1], I can no longer get the same speedup as before with pg_prefetch_mem(). While testing Melanie's patches, I did some performance tests and did see a good increase in performance from them. I really don't know why the prefetching no longer shows the gains it did before. Perhaps the rearranged code is better able to make use of hardware prefetching of cache lines.

I am, however, inclined not to drop the pg_prefetch_mem() macro altogether just because I can no longer demonstrate any performance gains during sequential scans, so I decided to try what Thomas mentioned in [2]: use the prefetching macro to fetch the required tuples in PageRepairFragmentation() so that they're in CPU cache by the time we get to compactify_tuples().

I tried this using the same test as I described in [3], after adjusting the following line to use PANIC instead of LOG:

    ereport(LOG,
            (errmsg("redo done at %X/%X system usage: %s",
                    LSN_FORMAT_ARGS(xlogreader->ReadRecPtr),
                    pg_rusage_show(&ru0))));

Doing that allows me to repeat the test using the same WAL each time.

AMD 3990x CPU on Ubuntu 22.10 with 64GB RAM.

shared_buffers = 10GB
checkpoint_timeout = '1 h'
max_wal_size = 100GB
max_connections = 300

Master:
2023-04-04 15:54:55.635 NZST [15958] PANIC: redo done at 0/DC447610 system usage: CPU: user: 44.46 s, system: 0.97 s, elapsed: 45.45 s
2023-04-04 15:56:33.380 NZST [16109] PANIC: redo done at 0/DC447610 system usage: CPU: user: 43.80 s, system: 0.86 s, elapsed: 44.69 s
2023-04-04 15:57:25.968 NZST [16134] PANIC: redo done at 0/DC447610 system usage: CPU: user: 44.08 s, system: 0.74 s, elapsed: 44.84 s
2023-04-04 15:58:53.820 NZST [16158] PANIC: redo done at 0/DC447610 system usage: CPU: user: 44.20 s, system: 0.72 s, elapsed: 44.94 s

Prefetch Memory in PageRepairFragmentation():
2023-04-04 16:03:16.296 NZST [25921] PANIC: redo done at 0/DC447610 system usage: CPU: user: 41.73 s, system: 0.77 s, elapsed: 42.52 s
2023-04-04 16:04:07.384 NZST [25945] PANIC: redo done at 0/DC447610 system usage: CPU: user: 40.87 s, system: 0.86 s, elapsed: 41.74 s
2023-04-04 16:05:01.090 NZST [25968] PANIC: redo done at 0/DC447610 system usage: CPU: user: 41.20 s, system: 0.72 s, elapsed: 41.94 s
2023-04-04 16:05:49.235 NZST [25996] PANIC: redo done at 0/DC447610 system usage: CPU: user: 41.56 s, system: 0.66 s, elapsed: 42.24 s

That's about a 6.7% performance increase over master. Since I really just did the seqscan patch as a means to get the pg_prefetch_mem() patch in, I wonder if it's ok to scrap that in favour of the PageRepairFragmentation patch.

Updated patches attached.

David

[1] https://postgr.es/m/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ%40mail.gmail.com
[2] https://postgr.es/m/CA%2BhUKGJRtzbbhVmb83vbCiMRZ4piOAi7HWLCqs%3DGQ74mUPrP_w%40mail.gmail.com
[3] https://postgr.es/m/CAApHDvoKwqAzhiuxEt8jSquPJKDpH8DNUZDFUSX9P7DXrJdc3Q%40mail.gmail.com

v1-0001-Add-pg_prefetch_mem-macro-to-load-cache-lines.patch Description: Binary data
prefetch_in_PageRepairFragmentation.patch Description: Binary data
RE: Add missing copyright for pg_upgrade/t/* files
Dear Amit,

Thank you for responding!

> Yeah, it is good to have the Copyright to keep it consistent with
> other test files and otherwise as well.
>
> --- a/src/bin/pg_upgrade/t/001_basic.pl
> +++ b/src/bin/pg_upgrade/t/001_basic.pl
> @@ -1,3 +1,5 @@
> +# Copyright (c) 2022-2023, PostgreSQL Global Development Group
>
> How did you decide on the starting year as 2022?

I checked the commit log. 001_basic.pl was added once in 2017 but soon reverted [1][2]. Commit 322bec added the file again in 2022 [3], so I chose 2022. 002_pg_upgrade.pl was added at the same time [3], so it should definitely be 2022 as well.

[1]: https://github.com/postgres/postgres/commit/f41e56c76e39f02bef7ba002c9de03d62b76de4d
[2]: https://github.com/postgres/postgres/commit/58ffe141eb37c3f027acd25c1fc6b36513bf9380
[3]: https://github.com/postgres/postgres/commit/322becb6085cb92d3708635eea61b45776bf27b6

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Re: RFC: logical publication via inheritance root?
FYI, the WIP patch does not seem to apply cleanly anymore using the latest HEAD. See the cfbot rebase logs [1]. -- [1] http://cfbot.cputube.org/patch_42_4225.log Kind Regards, Peter Smith. Fujitsu Australia
Re: Add missing copyright for pg_upgrade/t/* files
On Mon, Apr 3, 2023 at 7:25 PM Hayato Kuroda (Fujitsu) wrote:
>
> While reading codes, I noticed that pg_upgrade/t/001_basic.pl and
> pg_upgrade/t/002_pg_upgrade.pl do not contain the copyright.
>
> I checked briefly and almost all files have that, so I thought they
> missed it. PSA the patch to fix them.

Yeah, it is good to have the Copyright to keep it consistent with other test files and otherwise as well.

--- a/src/bin/pg_upgrade/t/001_basic.pl
+++ b/src/bin/pg_upgrade/t/001_basic.pl
@@ -1,3 +1,5 @@
+# Copyright (c) 2022-2023, PostgreSQL Global Development Group

How did you decide on the starting year as 2022?

--
With Regards,
Amit Kapila.
Re: O(n) tasks cause lengthy startups and checkpoints
I sent this one to the next commitfest and marked it as waiting-on-author and targeted for v17. I'm aiming to have something that addresses the latest feedback ready for the July commitfest. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: pg_usleep for multisecond delays
On Mon, Apr 03, 2023 at 04:33:27PM -0400, Gregory Stark (as CFM) wrote: > Is this still a WIP? Is it targeting this release? There are only a > few days left before the feature freeze. I'm guessing it should just > move to the next CF for the next release? I moved it to the next commitfest and marked the target version as v17. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: [BUG] pg_stat_statements and extended query protocol
> Maybe, but is there any field demand for that?

I don't think there is.

> We clearly do need to fix the
> reported rowcount for cases where ExecutorRun is invoked more than
> once per ExecutorEnd call; but I think that's sufficient.

Sure, the original proposed fix, but with tracking es_total_processed only in EState, should be enough for now.

Regards,

Sami Imseih
Amazon Web Services (AWS)
RE: Support logical replication of DDLs
On Friday, March 31, 2023 6:31 AM Peter Smith wrote:

Hi,

> It seems that lately, the patch attachments are lacking version
> numbers. It causes unnecessary confusion. For example, I sometimes
> fetch patches from this thread locally to "diff" them with previous
> patches to get a rough overview of the changes -- that has now become
> more difficult.
>
> Can you please reinstate the name convention of having version numbers
> for all patch attachments?
>
> IMO *every* post that includes patches should unconditionally increment
> the patch version -- even if the new patches are just a rebase or some
> other trivial change. Version numbers make it clear what patches are
> the latest, you will be easily able to unambiguously refer to them by
> name in subsequent posts, and when copied to your local computer they
> won't clash with any older copied patches.

The patch currently uses the date as the version number. I think the reason is that multiple people are working on the patch, which would cause the version numbers to change very frequently (soon becoming a very large number). So, to avoid this, we used the date to distinguish different versions.

Best Regards,
Hou zj
Re: [BUG] pg_stat_statements and extended query protocol
"Imseih (AWS), Sami" writes:
> I wonder if the right answer here is to track fetches as
> a separate counter in pg_stat_statements, in which fetch
> refers to the number of times a portal is executed?

Maybe, but is there any field demand for that?

IMV, the existing behavior is that we count one "call" per overall query execution (that is, per ExecutorEnd invocation). The argument that that's a bug and we should change it seems unsupportable to me, and even the argument that we should also count ExecutorRun calls seems quite lacking in evidence. We clearly do need to fix the reported rowcount for cases where ExecutorRun is invoked more than once per ExecutorEnd call; but I think that's sufficient.

			regards, tom lane
Re: zstd compression for pg_dump
On Mon, Apr 03, 2023 at 11:26:09PM +0200, Tomas Vondra wrote:
> On 4/3/23 21:17, Justin Pryzby wrote:
> > On Sat, Apr 01, 2023 at 10:26:01PM +0200, Tomas Vondra wrote:
> >>> Feel free to mess around with threads (but I'd much rather see the
> >>> patch progress for zstd:long).
> >>
> >> OK, understood. The long mode patch is pretty simple. IIUC it does not
> >> change the format, i.e. in the worst case we could leave it for PG17
> >> too. Correct?
> >
> > Right, libzstd only has one "format", which is the same as what's used
> > by the commandline tool. zstd:long doesn't change the format of the
> > output: the library just uses a larger memory buffer to allow better
> > compression. There's no format change for zstd:workers, either.
>
> OK. I plan to do a bit more review/testing on this, and get it committed
> over the next day or two, likely including the long mode. One thing I
> noticed today is that maybe long_distance should be a bool, not int.
> Yes, ZSTD_c_enableLongDistanceMatching() accepts int, but it'd be
> cleaner to cast the value during a call and keep it bool otherwise.

Thanks for noticing. Evidently I wrote it using "int" to get the feature working, and then later wrote the bool parsing bits but never changed the data structure.

This version also updates a few comments and some indentation, removes a useless assertion, and updates the warning about zstd:workers.
--
Justin

>From df0eb4d3c4799f24e58f1e5b0a9470e5af355ad6 Mon Sep 17 00:00:00 2001
From: Justin Pryzby
Date: Sat, 7 Jan 2023 15:45:06 -0600
Subject: [PATCH 1/4] pg_dump: zstd compression

Previously proposed at: 20201221194924.gi30...@telsasoft.com
---
 doc/src/sgml/ref/pg_dump.sgml         |  13 +-
 src/bin/pg_dump/Makefile              |   2 +
 src/bin/pg_dump/compress_io.c         |  66 ++--
 src/bin/pg_dump/compress_zstd.c       | 537 ++
 src/bin/pg_dump/compress_zstd.h       |  25 ++
 src/bin/pg_dump/meson.build           |   4 +-
 src/bin/pg_dump/pg_backup_archiver.c  |   9 +-
 src/bin/pg_dump/pg_backup_directory.c |   2 +
 src/bin/pg_dump/pg_dump.c             |  20 +-
 src/bin/pg_dump/t/002_pg_dump.pl      |  79 +++-
 src/tools/pginclude/cpluspluscheck    |   1 +
 src/tools/pgindent/typedefs.list      |   1 +
 12 files changed, 705 insertions(+), 54 deletions(-)
 create mode 100644 src/bin/pg_dump/compress_zstd.c
 create mode 100644 src/bin/pg_dump/compress_zstd.h

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 77299878e02..8de38e0fd0d 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -330,8 +330,9 @@ PostgreSQL documentation
        machine-readable format that pg_restore
        can read. A directory format archive can be manipulated with
        standard Unix tools; for example, files in an uncompressed archive
-       can be compressed with the gzip or
-       lz4 tools.
+       can be compressed with the gzip,
+       lz4, or
+       zstd tools.
        This format is compressed by default using gzip
        and also supports parallel dumps.
@@ -655,7 +656,8 @@ PostgreSQL documentation
        Specify the compression method and/or the compression level to use.
        The compression method can be set to gzip,
-       lz4, or none for no compression.
+       lz4, zstd,
+       or none for no compression.
        A compression detail string can optionally be specified. If the
        detail string is an integer, it specifies the compression level.
        Otherwise, it should be a comma-separated list of items, each of the
@@ -676,8 +678,9 @@ PostgreSQL documentation
        individual table-data segments, and the default is to compress using
        gzip at a moderate level. For plain text output,
        setting a nonzero compression level causes the entire output file to
        be compressed,
-       as though it had been fed through gzip or
-       lz4; but the default is not to compress.
+       as though it had been fed through gzip,
+       lz4, or zstd;
+       but the default is not to compress.
        The tar archive format currently does not support compression at all.
diff --git a/src/bin/pg_dump/Makefile b/src/bin/pg_dump/Makefile
index eb8f59459a1..24de7593a6a 100644
--- a/src/bin/pg_dump/Makefile
+++ b/src/bin/pg_dump/Makefile
@@ -18,6 +18,7 @@ include $(top_builddir)/src/Makefile.global
 export GZIP_PROGRAM=$(GZIP)
 export LZ4
+export ZSTD
 export with_icu

 override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
@@ -29,6 +30,7 @@ OBJS = \
 	compress_io.o \
 	compress_lz4.o \
 	compress_none.o \
+	compress_zstd.o \
 	dumputils.o \
 	parallel.o \
 	pg_backup_archiver.o \
diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c
index 0972a4f934a..4f06bb024f9 100644
--- a/src/bin/pg_dump/compress_io.c
+++ b/src/bin/pg_dump/compress_io.c
@@ -52,8 +52,8 @@
  *
  * InitDiscoverCompressFileHandle tries to infer the
Re: [BUG] pg_stat_statements and extended query protocol
> Why should that be the definition? Partial execution of a portal
> might be something that is happening at the driver level, behind the
> user's back. You can't make rational calculations of, say, plan
> time versus execution time if that's how "calls" is measured.

Correct, and there are also drivers that implement fetch size using cursor statements, i.e. DECLARE CURSOR, FETCH CURSOR, and each FETCH gets counted as 1 call.

I wonder if the right answer here is to track fetches as a separate counter in pg_stat_statements, in which fetch refers to the number of times a portal is executed?

Regards,

Sami Imseih
Amazon Web Services (AWS)
same query but different result on pg16devel and pg15.2
Hi hackers,

I encounter a problem, as shown below:

query:

select
  ref_0.ps_suppkey as c0,
  ref_1.c_acctbal as c1,
  ref_2.o_totalprice as c2,
  ref_2.o_orderpriority as c3,
  ref_2.o_clerk as c4
from
  public.partsupp as ref_0
  left join public.nation as sample_0
    inner join public.customer as sample_1
    on (false)
  on (true)
  left join public.customer as ref_1
    right join public.orders as ref_2
    on (false)
    left join public.supplier as ref_3
    on (false)
  on (sample_0.n_comment = ref_1.c_name)
where (8 <= NULLIF(CASE WHEN (o_orderkey IS NOT NULL) THEN 4 ELSE 4 END,
                   CASE WHEN (o_orderdate >= o_orderdate) THEN 95 ELSE 95 END))
order by c0, c1, c2, c3, c4 limit 1;

on pg16devel:

 c0 | c1 | c2 | c3 | c4
----+----+----+----+----
  1 |    |    |    |
(1 row)

plan:

 Limit
   ->  Sort
         Sort Key: ref_0.ps_suppkey, c_acctbal, o_totalprice, o_orderpriority, o_clerk
         ->  Nested Loop Left Join
               ->  Seq Scan on partsupp ref_0
               ->  Result
                     One-Time Filter: false
 (7 rows)

on pg15.2:

 c0 | c1 | c2 | c3 | c4
----+----+----+----+----
(0 rows)

plan:

 Limit
   ->  Sort
         Sort Key: ref_0.ps_suppkey, c_acctbal, o_totalprice, o_orderpriority, o_clerk
         ->  Hash Left Join
               Hash Cond: ((n_comment)::text = (c_name)::text)
               Filter: (8 <= NULLIF(CASE WHEN (o_orderkey IS NOT NULL) THEN 4 ELSE 4 END, CASE WHEN (o_orderdate >= o_orderdate) THEN 95 ELSE 95 END))
               ->  Nested Loop Left Join
                     ->  Seq Scan on partsupp ref_0
                     ->  Result
                           One-Time Filter: false
               ->  Hash
                     ->  Result
                           One-Time Filter: false
 (13 rows)

regards,
tender wang
Re: [BUG] pg_stat_statements and extended query protocol
"Imseih (AWS), Sami" writes:
>> Also, I'm doubtful that counting calls this way is a great idea,
>> which would mean you only need one new counter field not two. The
>> fact that you're having trouble defining what it means certainly
>> suggests that the implementation is out front of the design.

> ISTM you are not in agreement that a call count should be incremented
> after every executorRun, but should only be incremented after
> the portal is closed, at executorEnd. Is that correct?

Right. That makes the "call count" equal to the number of times the query is invoked.

> FWIW, The rationale for incrementing calls in executorRun is that calls
> refers to the number of times a client executes a portal, whether
> partially or to completion.

Why should that be the definition? Partial execution of a portal might be something that is happening at the driver level, behind the user's back. You can't make rational calculations of, say, plan time versus execution time if that's how "calls" is measured.

			regards, tom lane
Re: [BUG] pg_stat_statements and extended query protocol
> * Yeah, it'd be nice to have an in-core test, but it's folly to insist
> on one that works via libpq and psql. That requires a whole new set
> of features that you're apparently designing on-the-fly with no other
> use cases in mind. I don't think that will accomplish much except to
> ensure that this bug fix doesn't make it into v16.

I agree. Solving the lack of in-core testing for this area belongs in a different discussion.

> * I don't understand why it was thought good to add two new counters
> to struct Instrumentation. In EXPLAIN ANALYZE cases those will be
> wasted space *per plan node*, not per Query.

Indeed, looking at ExplainNode, the Instrumentation struct is allocated per node and the new fields will be wasted space. Thanks for highlighting this.

> * It also seems quite bizarre to add counters to a core data structure
> and then leave it to pg_stat_statements to maintain them.

That is a fair point.

> I'm inclined to think that adding the counters to struct EState is
> fine. That's 304 bytes already on 64-bit platforms, another 8 or 16
> won't matter.

With the point you raise about Instrumentation being per node, EState is the better place for the new counters.

> Also, I'm doubtful that counting calls this way is a great idea,
> which would mean you only need one new counter field not two. The
> fact that you're having trouble defining what it means certainly
> suggests that the implementation is out front of the design.

ISTM you are not in agreement that a call count should be incremented after every ExecutorRun, but should only be incremented after the portal is closed, at ExecutorEnd. Is that correct?

FWIW, the rationale for incrementing calls in ExecutorRun is that calls refers to the number of times a client executes a portal, whether partially or to completion. Clients can also fetch rows from portals at various rates, so to determine the "rows per call" accurately from pg_stat_statements, we should track calls as the number of times ExecutorRun was called on a portal.

> In short, what I think I'd suggest is adding an es_total_processed
> field to EState and having standard_ExecutorRun do "es_total_processed
> += es_processed" near the end, then just change pg_stat_statements to
> use es_total_processed not es_processed.

The original proposal, in 0001-correct-pg_stat_statements-tracking-of-portals.patch, was to add a "calls" and an "es_total_processed" field to EState.

Regards,

Sami Imseih
Amazon Web Services (AWS)
Re: running logical replication as the subscription owner
On Mon, Apr 03, 2023 at 12:05:29PM -0700, Jeff Davis wrote:
> On Sun, 2023-04-02 at 20:21 -0700, Noah Misch wrote:
> > The most-plausible-to-me attack involves an ENABLE ALWAYS trigger
> > that logs CURRENT_USER to an audit table.
>
> How does requiring that the subscription owner have SET ROLE privileges
> on the table owner help that case? As Robert pointed out, users coming
> from v15 will have superuser subscription owners anyway, so the change
> will be silent for them.

For subscriptions upgraded from v15, it doesn't matter. Requiring SET ROLE prevents this sequence:

- Make a table with such an audit trigger. Grant INSERT and UPDATE to Alice.
- Upgrade to v15.
- Grant pg_create_subscription and database-level CREATE to Alice.
- Alice creates a subscription as a tool to impersonate the table owner, bypassing audit.

To put it another way, the benefit of the SET ROLE requirement is not really making subscriptions more secure. The benefit of the requirement is pg_create_subscription not becoming a tool for bypassing audit. I gather we agree on what to do for v16, which is good.

> But I feel like we can do better in version 17 when we have time to
> actually work through common use cases and the exceptional cases and
> weight them appropriately. Like, how common is it to want to get the
> user from a trigger on the subscriber side?

Fair. I don't think the community has arrived at a usable approach for answering questions like that. It would be valuable to have an approach.

> Should that trigger be using SESSION_USER instead of CURRENT_USER?

Apart from evaluating the argument of SET ROLE, I've not heard of a valid use case for SESSION_USER.
Re: Minimal logical decoding on standbys
On Tue, Apr 4, 2023 at 3:17 AM Drouvot, Bertrand wrote:
>
> Hi,
>
> On 4/3/23 8:10 AM, Drouvot, Bertrand wrote:
> > Hi,
> >
> > On 4/3/23 7:35 AM, Amit Kapila wrote:
> >> On Mon, Apr 3, 2023 at 4:26 AM Jeff Davis wrote:
> >>
> >> Agreed, even Bertrand and myself discussed the same approach few
> >> emails above. BTW, if we have this selective logic to wake
> >> physical/logical walsenders and for standby's, we only wake logical
> >> walsenders at the time of ApplyWalRecord() then do we need the new
> >> conditional variable enhancement being discussed, and if so, why?
> >
> > Thank you both for this new idea and discussion. In that case I don't
> > think we need the new CV API and the use of a CV anymore. As just said
> > up-thread I'll submit a new proposal with this new approach.
>
> Please find enclosed V57 implementing the new approach in 0004.

Regarding the 0004 patch:

@@ -2626,6 +2626,12 @@ InitWalSenderSlot(void)
 			walsnd->sync_standby_priority = 0;
 			walsnd->latch = &MyProc->procLatch;
 			walsnd->replyTime = 0;
+
+			if (MyDatabaseId == InvalidOid)
+				walsnd->kind = REPLICATION_KIND_PHYSICAL;
+			else
+				walsnd->kind = REPLICATION_KIND_LOGICAL;
+

I think we might want to set the replication kind when processing the START_REPLICATION command. The walsender using a logical replication slot is not necessarily streaming (e.g. when COPYing table data).

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
fairywren exiting in ecpg
Hi,

Looks like fairywren is possibly seeing something I saw before and spent many days looking into:
https://postgr.es/m/20220909235836.lz3igxtkcjb5w7zb%40awork3.anarazel.de

which led me to add the following to .cirrus.yml:

  # Cirrus defaults to SetErrorMode(SEM_NOGPFAULTERRORBOX | ...). That
  # prevents crash reporting from working unless binaries do SetErrorMode()
  # themselves. Furthermore, it appears that either python or, more likely,
  # the C runtime has a bug where SEM_NOGPFAULTERRORBOX can very
  # occasionally *trigger* a crash on process exit - which is hard to debug,
  # given that it explicitly prevents crash dumps from working...
  # 0x8001 is SEM_FAILCRITICALERRORS | SEM_NOOPENFILEERRORBOX
  CIRRUS_WINDOWS_ERROR_MODE: 0x8001

The mingw folks also spent a lot of time looking into this ([1]), without a lot of success. It sure looks like it might be a Windows C runtime issue - none of the stacktrace handling python has gets invoked. I could not find any relevant behavioural differences in python's code that depend on SEM_NOGPFAULTERRORBOX being set.

It'd be interesting to see if fairywren's occasional failures go away if you set MSYS=winjitdebug (which prevents msys from adding SEM_NOGPFAULTERRORBOX to ErrorMode).

Greetings,

Andres Freund

[1] https://github.com/msys2/MINGW-packages/issues/11864
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode
On Tue, 4 Apr 2023 at 02:49, Melanie Plageman wrote:
> v9 attached.

I've made a pass on the v9-0001 patch only. Here's what I noted down:

v9-0001:

1. In the documentation and comments, generally we always double-space after a period. I see quite often you're not following this.

2. Doc: We generally seem to break tags within paragraphs into multiple lines. You're doing that quite a bit, e.g:

   linkend="glossary-buffer-access-strategy">Buffer Access
   Strategy. 0 will disable use of a

2. This is not a command:

   BUFFER_USAGE_LIMIT parameter.

   is probably what you want.

3. I'm not sure I agree that it's a good idea to refer to the strategy with multiple different names. Here you've called it a "ring buffer", but in the next sentence, you're calling it a Buffer Access Strategy:

   Specifies the ring buffer size for VACUUM. This size is used to
   calculate the number of shared buffers which will be reused as part
   of a Buffer Access Strategy. 0 disables use of a

4. Can you explain your choice in not just making < 128 a hard error rather than clamping? I guess it means checks like this are made more simple, but that does not seem like a good enough reason:

   /* check for out-of-bounds */
   if (result < -1 || result > MAX_BAS_RING_SIZE_KB)

   postgres=# vacuum (parallel -1) pg_class;
   ERROR:  parallel workers for vacuum must be between 0 and 1024

   Maybe the above is a good guide to follow. To allow you to get rid of the clamping code, you'd likely need an assign hook function for vacuum_buffer_usage_limit.

5. I see vacuum.sgml is full of inconsistencies around the use of vs . I was going to complain about your:

   ONLY_DATABASE_STATS option. If ANALYZE is
   also specified, the BUFFER_USAGE_LIMIT value is used for both the vacuum

   but I see you've likely just copied what's nearby. There are also plenty of usages of in that file. I'd rather see you use . Maybe there can be some other patch that sweeps the entire docs to look for OPTION_NAME and replaces them to use .

6. I was surprised to see you've added both GetAccessStrategyWithSize() and GetAccessStrategyWithNBuffers(). I think the former is suitable for both. GetAccessStrategyWithNBuffers() seems to be just used once outside of freelist.c.

7. I don't think bas_nbuffers() is a good name for an external function. StrategyGetBufferCount() seems better.

8. I don't quite follow this comment:

   /*
    * TODO: should this be 0 so that we are sure that vacuum() never
    * allocates a new bstrategy for us, even if we pass in NULL for that
    * parameter? maybe could change how failsafe NULLs out bstrategy if
    * so?
    */

   Can you explain under what circumstances would vacuum() allocate a bstrategy when do_autovacuum() would not? Is this a case of a config reload where someone changes vacuum_buffer_usage_limit from 0 to something non-zero? If so, perhaps do_autovacuum() needs to detect this and allocate a strategy rather than having vacuum() do it once per table (wastefully).

9. buffer/README. I think it might be overkill to document details about how the new vacuum option works in a section talking about Buffer Ring Replacement Strategy. Perhaps it is just worth something like: "In v16, the 256KB ring was made configurable by way of the vacuum_buffer_usage_limit GUC and the BUFFER_USAGE_LIMIT VACUUM option."

10. I think if you do #4 then you can get rid of all the range checks and DEBUG1 elogs in GetAccessStrategyWithSize().

11. This seems a bit badly done:

   int vacuum_buffer_usage_limit = -1;

   int VacuumCostPageHit = 1; /* GUC parameters for vacuum */
   int VacuumCostPageMiss = 2;
   int VacuumCostPageDirty = 20;

   I'd class vacuum_buffer_usage_limit as a "GUC parameters for vacuum" too. Probably the CamelCase naming should be followed too.

12. ANALYZE too?

   {"vacuum_buffer_usage_limit", PGC_USERSET, RESOURCES_MEM,
   gettext_noop("Sets the buffer pool size for VACUUM and autovacuum."),

13. VacuumParams.ring_size has no comments explaining what it is.

14. vacuum_buffer_usage_limit seems to be lumped in with unrelated GUCs:

   extern PGDLLIMPORT int maintenance_work_mem;
   extern PGDLLIMPORT int max_parallel_maintenance_workers;
   +extern PGDLLIMPORT int vacuum_buffer_usage_limit;

   extern PGDLLIMPORT int VacuumCostPageHit;
   extern PGDLLIMPORT int VacuumCostPageMiss;

15. No comment explaining what these are:

   #define MAX_BAS_RING_SIZE_KB (16 * 1024 * 1024)
   #define MIN_BAS_RING_SIZE_KB 128

16. Parameter names in function declaration and definition don't match in:

   extern BufferAccessStrategy GetAccessStrategyWithNBuffers(BufferAccessStrategyType btype, int nbuffers);
   extern BufferAccessStrategy GetAccessStrategyWithSize(BufferAccessStrategyType btype, int nbuffers);

   Also, line wraps at 79 chars. (80 including line feed)

17. If you want to test the 16GB upper limit, maybe going 1KB (or 8KB?) rather than 1GB over 16GB is better? 2097153kB, I think.

   VACUUM (BUFFER_USAGE_LIMIT '17 GB') vac_option_tab;

David
Re: Why enable_hashjoin Completely disables HashJoin
David Rowley writes: > I think there would be quite a bit of work to do before we could ever > start to think about that. The planner does quite a bit of writing on > the parse, e.g adding new RangeTblEntrys to the query's rtable. We'd > either need to fix all those first or make a copy of the parse before > planning. Yeah, we'd have to be sure that all that preliminary work is teased apart from the actual path-making. I think we are probably pretty close to that but not there yet. Subqueries might be problematic, but perhaps we could define our way out of that by saying that this retry principle applies independently in each planner recursion level. > It's also not clear to > me how you'd know what you'd need to enable again to get the 2nd > attempt to produce a plan this time around. I'd assume you'd want the > minimum possible set of enable_* GUCs turned back on, but what would > you do in cases where there's an aggregate and both enable_hashagg and > enable_sort are both disabled and there are no indexes providing > pre-sorted input? As I commented concurrently, I think we should simply not try to solve that conundrum: if you want control, don't pose impossible problems. There's no principled way that we could decide which of enable_hashagg and enable_sort to ignore first, for example. regards, tom lane
Re: Why enable_hashjoin Completely disables HashJoin
Andres Freund writes: > It sounds too hard compared to the gains, but another way could be to plan > with the relevant path generation hard disabled, and plan from scratch, with > additional scan types enabled, if we end up being unable to generate valid > plan. Actually, I kind of like that. It would put the extra cost in a place it belongs: if you have enough enable_foo turned off to prevent generating a valid plan, it'll cost you extra to make a plan ... but likely you'll be paying even more in runtime due to not getting a good plan, so maybe that doesn't matter anyway. I'd limit it to two passes: first try honors all enable_foo switches, second try ignores all. I'm not quite sure how this could be wedged into the existing code structure --- in particular I am not sure that we're prepared to do two passes of baserel path generation. (GEQO is an existence proof that we could handle it for join planning, though.) Or we could rethink the design goal of not allowing enable_foo switches to cause us to fail to make a plan. That might be unusable though. regards, tom lane
Re: Why enable_hashjoin Completely disables HashJoin
On Tue, 4 Apr 2023 at 11:18, Andres Freund wrote:
> It sounds too hard compared to the gains, but another way could be to plan
> with the relevant path generation hard disabled, and plan from scratch, with
> additional scan types enabled, if we end up being unable to generate valid
> plan.

I think there would be quite a bit of work to do before we could ever
start to think about that. The planner does quite a bit of writing on
the parse, e.g. adding new RangeTblEntrys to the query's rtable. We'd
either need to fix all those first or make a copy of the parse before
planning. The latter is quite expensive today.

It's also not clear to me how you'd know what you'd need to enable again
to get the 2nd attempt to produce a plan this time around. I'd assume
you'd want the minimum possible set of enable_* GUCs turned back on, but
what would you do in cases where there's an aggregate, both
enable_hashagg and enable_sort are disabled, and there are no indexes
providing pre-sorted input?

David
Re: Why enable_hashjoin Completely disables HashJoin
Hi,

On 2023-04-03 14:04:30 -0400, Tom Lane wrote:
> Robert Haas writes:
> > On Mon, Apr 3, 2023 at 8:13 AM Tom Lane wrote:
> >> Personally, I'd get rid of disable_cost altogether if I could.
> >> I'm not in a hurry to extend its use to more places.
>
> > I agree. I've wondered if we should put some work into that. It feels
> > bad to waste CPU cycles generating paths we intend to basically just
> > throw away, and it feels even worse if they manage to beat out some
> > other path on cost.
>
> > It hasn't been obvious to me how we could restructure the existing
> > logic to avoid relying on disable_cost.
>
> Yeah. In some places it would not be too hard; for example, if we
> generated seqscan paths last instead of first for baserels, the rule
> could be "generate it if enable_seqscan is on OR if we made no other
> path for the rel". It's much messier for joins though, partly because
> the same joinrel will be considered multiple times as we process
> different join orderings, plus it's usually unclear whether failing
> to generate any paths for joinrel X will lead to overall failure.
>
> A solution that would work is to treat disable_cost as a form of infinity
> that's counted separately from the actual cost estimate, that is we
> label paths as "cost X, plus there are N uses of disabled plan types".
> Then you sort first on N and after that on X. But this'd add a good
> number of cycles to add_path, which I've not wanted to expend on a
> non-mainstream usage.

It sounds too hard compared to the gains, but another way could be to
plan with the relevant path generation hard disabled, and plan from
scratch, with additional scan types enabled, if we end up being unable
to generate a valid plan.

Greetings,

Andres Freund
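The "separately counted infinity" idea amounts to comparing paths on a two-part key: number of disabled node types first, estimated cost second. A hedged sketch of just that sort order (the Path stub and field names are invented stand-ins, not PostgreSQL's add_path machinery):

```c
#include <assert.h>

/*
 * Toy stand-in for a Path: estimated cost plus a count of disabled
 * plan-node types used anywhere in the path ("N uses of disabled types").
 */
typedef struct Path
{
	double		cost;
	int			disabled_nodes;
} Path;

/*
 * Returns <0 when a is preferable: fewer disabled nodes wins outright,
 * and cost only breaks ties between equally-disabled paths.
 */
static int
path_compare(const Path *a, const Path *b)
{
	if (a->disabled_nodes != b->disabled_nodes)
		return (a->disabled_nodes < b->disabled_nodes) ? -1 : 1;
	if (a->cost != b->cost)
		return (a->cost < b->cost) ? -1 : 1;
	return 0;
}
```

With this ordering, an expensive but fully-enabled path always beats a cheap path that relies on a disabled node type, with no disable_cost fuzz polluting the cost estimates themselves.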
Re: Kerberos delegation support in libpq and postgres_fdw
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

Did a code review pass here; here is some feedback.

+	/* If password was used to connect, make sure it was one provided */
+	if (PQconnectionUsedPassword(conn) && dblink_connstr_has_pw(connstr))
+		return;

Do we need to consider whether these passwords are the same? Is there a
different vector where a different password could be acquired from a
different source (PGPASSWORD, say) while both of these criteria are
true? Seems like it probably doesn't matter that much, considering we
only checked password alone in the previous version of this code.

---

Looks like the pg_gssinfo struct hides the `proxy_creds` def behind:

#if defined(ENABLE_GSS) || defined(ENABLE_SSPI)
typedef struct
{
	gss_buffer_desc outbuf;		/* GSSAPI output token buffer */
#ifdef ENABLE_GSS
...
	bool		proxy_creds;	/* GSSAPI Delegated/proxy credentials */
#endif
} pg_gssinfo;
#endif

Which means that in the later check in `be_gssapi_get_proxy()` we have:

/*
 * Return if GSSAPI delegated/proxy credentials were included on this
 * connection.
 */
bool
be_gssapi_get_proxy(Port *port)
{
	if (!port || !port->gss)
		return NULL;
	return port->gss->proxy_creds;
}

So in theory it'd be possible to have SSPI enabled but GSS disabled and
we'd fail to compile in that case. (It may be that this routine is never
*actually* called in that case, just noting compile-time
considerations.) I'm not seeing guards in the actual PQ* routines, but I
don't think I've done an exhaustive search.

---

gss_accept_deleg

+
+        Forward (delegate) GSS credentials to the server. The default is
+        disable which means credentials will not be forwarded
+        to the server. Set this to enable to have
+        credentials forwarded when possible.

When is this not possible? Server policy? External factors?
---

 Only superusers may connect to foreign servers without password
-authentication, so always specify the password option
-for user mappings belonging to non-superusers.
+authentication or using gssapi proxied credentials, so specify the
+password option for user mappings belonging to
+non-superusers who are not able to proxy GSSAPI credentials.

s/gssapi/GSSAPI/; this is kind of confusing, as this makes it sound like
only superusers may use GSSAPI proxied credentials, which I disbelieve
to be true. Additionally, it sounds like you're wanting to explicitly
maintain a denylist of users not allowed to proxy; is that correct?

---

libpq/auth.c:

	if (proxy != NULL)
	{
		pg_store_proxy_credential(proxy);
		port->gss->proxy_creds = true;
	}

Per GSS docs, it seems like we should be comparing to
GSS_C_NO_CREDENTIAL and validating that gflags has the `deleg_flag` bit
set before considering whether there are valid credentials; in practice
this might have the same effect (I haven't looked at what that symbol
actually resolves to, but NULL would be sensible). Are there other cases
we might need to consider here, like credentials that are valid but
expired? (GSS_S_CREDENTIALS_EXPIRED)

---

+	/*
+	 * Set KRB5CCNAME for this backend, so that later calls to
+	 * gss_acquire_cred will find the proxied credentials we stored.
+	 */

I'm not seeing this in other use in the code; I assume this is just used
by the krb5 libs? Similar questions for the other places
pg_gss_accept_deleg is used.

---

+int
+PQconnectionUsedGSSAPI(const PGconn *conn)
+{
+	if (!conn)
+		return false;
+	if (conn->gssapi_used)
+		return true;
+	else
+		return false;
+}

Micro-gripe: this routine seems like it could be simpler, though the
compiler probably has the same thing to say for either, so maybe code
clarity is better as written:

int
PQconnectionUsedGSSAPI(const PGconn *conn)
{
	return conn && conn->gssapi_used;
}

---

Anything required for adding meson support?
I notice src/test/kerberos has its Makefile updated, but no meson.build
files are changed.

---

Two tests in src/test/kerberos/t/001_auth.pl at :535 and :545 have the
same test description:

+	'succeeds with GSS-encrypted access required and hostgssenc hba and credentials not forwarded',

Since the first test has only `gssencmode` defined (so an implicit
`gssdeleg` value) and the second has `gssdeleg=disable`, I'd suggest
that the test on :545 should have its description updated to add the
word "explicitly":

'succeeds with GSS-encrypted access required and hostgssenc hba and
Re: Should vacuum process config file reload more often
On Mon, Apr 3, 2023 at 3:08 PM Andres Freund wrote: > On 2023-04-03 14:43:14 -0400, Tom Lane wrote: > > Melanie Plageman writes: > > > v13 attached with requested updates. > > > > I'm afraid I'd not been paying any attention to this discussion, > > but better late than never. I'm okay with letting autovacuum > > processes reload config files more often than now. However, > > I object to allowing ProcessConfigFile to be called from within > > commands in a normal user backend. The existing semantics are > > that user backends respond to SIGHUP only at the start of processing > > a user command, and I'm uncomfortable with suddenly deciding that > > that can work differently if the command happens to be VACUUM. > > It seems unprincipled and perhaps actively unsafe. > > I think it should be ok in commands like VACUUM that already internally start > their own transactions, and thus require to be run outside of a transaction > and at the toplevel. I share your concerns about allowing config reload in > arbitrary places. While we might want to go there, it would require a lot more > analysis. As an alternative for your consideration, attached v14 set implements the config file reload for autovacuum only (in 0003) and then enables it for VACUUM and ANALYZE not in a nested transaction command (in 0004). Previously I had the commits in the reverse order for ease of review (to separate changes to worker limit balancing logic from config reload code). - Melanie From 1781dd7174d5d6eaaeb4bd02029212f3c23d4dbe Mon Sep 17 00:00:00 2001 From: Melanie Plageman Date: Sat, 25 Mar 2023 14:14:55 -0400 Subject: [PATCH v14 3/4] Autovacuum refreshes cost-based delay params more often Allow autovacuum to reload the config file more often so that cost-based delay parameters can take effect while VACUUMing a relation. Previously autovacuum workers only reloaded the config file once per relation vacuumed, so config changes could not take effect until beginning to vacuum the next table. 
Now, check if a reload is pending roughly once per block, when checking if we need to delay. In order for autovacuum workers to safely update their own cost delay and cost limit parameters without impacting performance, we had to rethink when and how these values were accessed. Previously, an autovacuum worker's wi_cost_limit was set only at the beginning of vacuuming a table, after reloading the config file. Therefore, at the time that autovac_balance_cost() is called, workers vacuuming tables with no table options could still have different values for their wi_cost_limit_base and wi_cost_delay. Now that the cost parameters can be updated while vacuuming a table, workers will (within some margin of error) have no reason to have different values for cost limit and cost delay (in the absence of table options). This removes the rationale for keeping cost limit and cost delay in shared memory. Balancing the cost limit requires only the number of active autovacuum workers vacuuming a table with no cost-based table options. 
Reviewed-by: Masahiko Sawada Reviewed-by: Kyotaro Horiguchi Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_ZngzqnEODc7LmS1NH04Kt6Y9huSjz5pp7%2BDXhrjDA0gw%40mail.gmail.com --- src/backend/commands/vacuum.c | 44 - src/backend/postmaster/autovacuum.c | 253 +++- src/include/commands/vacuum.h | 1 + 3 files changed, 180 insertions(+), 118 deletions(-) diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c index 96df5e2920..a51a3f78a0 100644 --- a/src/backend/commands/vacuum.c +++ b/src/backend/commands/vacuum.c @@ -48,6 +48,7 @@ #include "pgstat.h" #include "postmaster/autovacuum.h" #include "postmaster/bgworker_internals.h" +#include "postmaster/interrupt.h" #include "storage/bufmgr.h" #include "storage/lmgr.h" #include "storage/pmsignal.h" @@ -510,9 +511,9 @@ vacuum(List *relations, VacuumParams *params, { ListCell *cur; - VacuumUpdateCosts(); in_vacuum = true; - VacuumCostActive = (VacuumCostDelay > 0); + VacuumFailsafeActive = false; + VacuumUpdateCosts(); VacuumCostBalance = 0; VacuumPageHit = 0; VacuumPageMiss = 0; @@ -566,12 +567,20 @@ vacuum(List *relations, VacuumParams *params, CommandCounterIncrement(); } } + + /* + * Ensure VacuumFailsafeActive has been reset before vacuuming the + * next relation relation. + */ + VacuumFailsafeActive = false; } } PG_FINALLY(); { in_vacuum = false; VacuumCostActive = false; + VacuumFailsafeActive = false; + VacuumCostBalance = 0; } PG_END_TRY(); @@ -2238,7 +2247,27 @@ vacuum_delay_point(void) /* Always check for interrupts */ CHECK_FOR_INTERRUPTS(); - if (!VacuumCostActive || InterruptPending) + if (InterruptPending || + (!VacuumCostActive && !ConfigReloadPending)) + return; + + /* + * Reload the configuration file if requested. This allows changes to + * autovacuum_vacuum_cost_limit and
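The balancing described in the commit message reduces to an equal split of the global cost limit across the workers that have no per-table cost options, clamped so no worker drops below 1. A rough sketch under that reading; the function and parameter names here are assumptions for illustration, not the patch's actual code:

```c
#include <assert.h>

/*
 * Share of the global vacuum_cost_limit for one autovacuum worker,
 * given how many active workers participate in balancing (i.e. are
 * vacuuming a table with no cost-based table options).
 */
static int
balanced_cost_limit(int vacuum_cost_limit, int nworkers_for_balance)
{
	int			limit;

	if (nworkers_for_balance <= 0)
		return vacuum_cost_limit;	/* nothing to balance against */

	limit = vacuum_cost_limit / nworkers_for_balance;

	/* never round a worker's share down to zero */
	return (limit < 1) ? 1 : limit;
}
```

Because the share depends only on a worker count, nothing per-worker needs to live in shared memory for this, which matches the commit message's rationale for removing cost limit and delay from shared state.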
Re: POC: Lock updated tuples in tuple_update() and tuple_delete()
On Mon, Apr 3, 2023 at 5:12 PM Pavel Borisov wrote:
> Upon Alexander reverting patches v15 from master, I've rebased what
> was correction patches v4 in a message above on a fresh master
> (together with patches v15). The resulting patch v16 is attached.

Pavel, thank you for your review, revisions, and rebase. We'll
reconsider this once v17 is branched.

--
Regards,
Alexander Korotkov
Re: Patch proposal: New hooks in the connection path
This looks like it was a good discussion -- last summer. But it doesn't seem to be a patch under active development now. It sounds like there were some design constraints that still need some new ideas to solve and a new patch will be needed to address them. Should this be marked Returned With Feedback? -- Gregory Stark As Commitfest Manager
Re: psql: Add role's membership options to the \du+ command
On Wed, Mar 22, 2023 at 11:11 AM Pavel Luzanov wrote: > In the previous version, I didn't notice (unlike cfbot) the compiler > warning. Fixed in version 6. > > I've marked this Ready for Committer. My opinion is that this is a necessary modification due to the already-committed changes to the membership grant implementation and so only needs to be accepted prior to v16 going live, not feature freeze. I've added Robert to this thread as the committer of said changes. David J.
Re: [PATCH] Introduce array_shuffle() and array_sample()
> On 3 Apr 2023, at 23:46, Tom Lane wrote: > > Daniel Gustafsson writes: >> On 29 Sep 2022, at 21:33, Tom Lane wrote: >>> I find this behavior a bit surprising: >>> >>> +SELECT >>> array_dims(array_sample('[-1:2][2:3]={{1,2},{3,NULL},{5,6},{7,8}}'::int[], >>> 3)); >>> + array_dims >>> +- >>> + [-1:1][2:3] >>> +(1 row) >>> >>> I can buy preserving the lower bound in array_shuffle(), but >>> array_sample() is not preserving the first-dimension indexes of >>> the array, so ISTM it ought to reset the first lower bound to 1. > >> I might be daft but I'm not sure I follow why not preserving here, can you >> explain? > > Because array_sample selects only some of the (first level) array > elements, those elements are typically not going to have the same > indexes in the output as they did in the input. So I don't see why > it would be useful to preserve the same lower-bound index. It does > make sense to preserve the lower-order index bounds ([2:3] in this > example) because we are including or not including those array > slices as a whole. Ah, ok, now I see what you mean, thanks! I'll try to fix up the patch like this tomorrow. -- Daniel Gustafsson
Re: Split index and table statistics into different types of stats
On Thu, 16 Mar 2023 at 05:25, Drouvot, Bertrand wrote: > > My plan was to get [1] done before resuming working on the "Split index and > table statistics into different types of stats" one. > [1]: https://commitfest.postgresql.org/42/4106/ Generate pg_stat_get_xact*() functions with Macros ([1]) was committed March 27. There's only a few days left in this CF. Would you like to leave this here? Should it be marked Needs Review or Ready for Commit? Or should we move it to the next CF now? -- Gregory Stark As Commitfest Manager
Re: [PATCH] Introduce array_shuffle() and array_sample()
Daniel Gustafsson writes: > On 29 Sep 2022, at 21:33, Tom Lane wrote: >> I find this behavior a bit surprising: >> >> +SELECT >> array_dims(array_sample('[-1:2][2:3]={{1,2},{3,NULL},{5,6},{7,8}}'::int[], >> 3)); >> + array_dims >> +- >> + [-1:1][2:3] >> +(1 row) >> >> I can buy preserving the lower bound in array_shuffle(), but >> array_sample() is not preserving the first-dimension indexes of >> the array, so ISTM it ought to reset the first lower bound to 1. > I might be daft but I'm not sure I follow why not preserving here, can you > explain? Because array_sample selects only some of the (first level) array elements, those elements are typically not going to have the same indexes in the output as they did in the input. So I don't see why it would be useful to preserve the same lower-bound index. It does make sense to preserve the lower-order index bounds ([2:3] in this example) because we are including or not including those array slices as a whole. regards, tom lane
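A toy illustration of the point above: whichever first-level elements are sampled, their positions in the result are renumbered from the front, so carrying over the input's first lower bound would be meaningless. This uses a fixed-size stand-in for a PostgreSQL array and a deterministic "sample" for clarity; it is not the actual array_sample() code:

```c
#include <assert.h>

/* toy 1-D array with an explicit lower bound, like '[-1:2]={10,20,30,40}' */
typedef struct IntArray
{
	int			lbound;
	int			nelems;
	int			vals[8];
} IntArray;

/*
 * Deterministically "sample" the first k elements. The point is only
 * that the result's lower bound is reset to 1 regardless of the
 * input's, because the selected elements land at new positions anyway.
 */
static IntArray
sample_first_k(const IntArray *in, int k)
{
	IntArray	out = {.lbound = 1, .nelems = k};

	for (int i = 0; i < k; i++)
		out.vals[i] = in->vals[i];
	return out;
}
```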
Re: generic plans and "initial" pruning
Amit Langote writes: > [ v38 patchset ] I spent a little bit of time looking through this, and concluded that it's not something I will be wanting to push into v16 at this stage. The patch doesn't seem very close to being committable on its own terms, and even if it was now is not a great time in the dev cycle to be making significant executor API changes. Too much risk of having to thrash the API during beta, or even change it some more in v17. I suggest that we push this forward to the next CF with the hope of landing it early in v17. A few concrete thoughts: * I understand that your plan now is to acquire locks on all the originally-named tables, then do permissions checks (which will involve only those tables), then dynamically lock just inheritance and partitioning child tables as we descend the plan tree. That seems more or less okay to me, but it could be reflected better in the structure of the patch perhaps. * In particular I don't much like the "viewRelations" list, which seems like a wart; those ought to be handled more nearly the same way as other RTEs. (One concrete reason why is that this scheme is going to result in locking views in a different order than they were locked during original parsing, which perhaps could contribute to deadlocks.) Maybe we should store an integer list of which RTIs need to be locked in the early phase? Building that in the parser/rewriter would provide a solid guide to the original locking order, so we'd be trivially sure of duplicating that. (It might be close enough to follow the RT list order, which is basically what AcquireExecutorLocks does today, but this'd be more certain to do the right thing.) I'm less concerned about lock order for child tables because those are just going to follow the inheritance or partitioning structure. 
* I don't understand the need for changes like this: /* clean up tuple table */ - ExecClearTuple(node->ps.ps_ResultTupleSlot); + if (node->ps.ps_ResultTupleSlot) + ExecClearTuple(node->ps.ps_ResultTupleSlot); ISTM that the process ought to involve taking a lock (if needed) before we have built any execution state for a given plan node, and if we find we have to fail, returning NULL instead of a partially-valid planstate node. Otherwise, considerations of how to handle partially-valid nodes are going to metastasize into all sorts of places, almost certainly including EXPLAIN for instance. I think we ought to be able to limit the damage to "parent nodes might have NULL child links that you wouldn't have expected". That wouldn't faze ExecEndNode at all, nor most other code. * More attention is needed to comments. For example, in a couple of places in plancache.c you have removed function header comments defining API details and not replaced them with any info about the new details, despite the fact that those details are more complex than the old. > It seems I hadn't noted in the ExecEndNode()'s comment that all node > types' recursive subroutines need to handle the change made by this > patch that the corresponding ExecInitNode() subroutine may now return > early without having initialized all state struct fields. > Also noted in the documentation for CustomScan and ForeignScan that > the Begin*Scan callback may not have been called at all, so the > End*Scan should handle that gracefully. Yeah, I think we need to avoid adding such requirements. It's the sort of thing that would far too easily get past developer testing and only fail once in a blue moon in the field. regards, tom lane
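The "integer list of RTIs to lock early" could be as simple as an append-only list built during parse/rewrite and replayed verbatim at executor startup, which pins the relocking order to the original locking order. A hedged sketch with invented names (not actual plancache.c code):

```c
#include <assert.h>

#define MAX_RTIS 16

/*
 * Append-only record of range-table indexes needing an early lock,
 * kept in the order the parser/rewriter first saw them. Replaying this
 * list at executor startup repeats the original lock order exactly,
 * avoiding lock-order inversions against the parse-time locking.
 */
typedef struct LockOrderList
{
	int			rtis[MAX_RTIS];
	int			n;
} LockOrderList;

static void
note_rti_to_lock(LockOrderList *list, int rti)
{
	for (int i = 0; i < list->n; i++)
		if (list->rtis[i] == rti)
			return;				/* already recorded; keep first position */
	list->rtis[list->n++] = rti;
}
```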
Re: WIP: Aggregation push-down - take2
It looks like in November 2022 Tomas Vondra said: > I did a quick initial review of the v20 patch series. > I plan to do a more thorough review over the next couple days, if time permits. > In general I think the patch is in pretty good shape. Following which Antonin Houska updated the patch responding to his review comments. Since then this patch has demonstrated the unfortunate "please rebase thx" followed by the author rebasing and getting no feedback until "please rebase again thx"... So while the patch doesn't currently apply it seems like it really should be either Needs Review or Ready for Commit. That said, I suspect this patch has missed the boat for this CF. Hopefully it will get more attention next release. I'll move it to the next CF but set it to Needs Review even though it needs a rebase. -- Gregory Stark As Commitfest Manager
Re: GUC for temporarily disabling event triggers
> On 3 Apr 2023, at 16:09, Robert Haas wrote: > On Mon, Apr 3, 2023 at 9:15 AM Daniel Gustafsson wrote: >>> On 3 Apr 2023, at 15:09, Robert Haas wrote: >>> I continue to think it's odd that the sense of this is inverted as >>> compared with row_security. >> >> I'm not sure I follow. Do you propose that the GUC enables classes of event >> triggers, the default being "all" (or similar) and one would remove the type >> of >> EVT for which debugging is needed? That doesn't seem like a bad idea, just >> one >> that hasn't come up in the discussion (and I didn't think about). > > Right. Although to be fair, that idea doesn't sound as good if we're > going to have settings other than "on" or "off". Yeah. The patch as it stands allow for disabling specific types rather than all-or-nothing, which is why the name was "ignore". > I'm not sure what the best thing to do is here, I just think it > deserves some thought. Absolutely, the discussion is much appreciated. Having done some thinking I think I'm still partial to framing it as a disabling GUC rather than an enabling; with the act of setting it being "As an admin I want to skip execution of all evt's of type X". -- Daniel Gustafsson
Re: zstd compression for pg_dump
On 4/3/23 21:17, Justin Pryzby wrote: > On Sat, Apr 01, 2023 at 10:26:01PM +0200, Tomas Vondra wrote: >>> Feel free to mess around with threads (but I'd much rather see the patch >>> progress for zstd:long). >> >> OK, understood. The long mode patch is pretty simple. IIUC it does not >> change the format, i.e. in the worst case we could leave it for PG17 >> too. Correct? > > Right, libzstd only has one "format", which is the same as what's used > by the commandline tool. zstd:long doesn't change the format of the > output: the library just uses a larger memory buffer to allow better > compression. There's no format change for zstd:workers, either. > OK. I plan to do a bit more review/testing on this, and get it committed over the next day or two, likely including the long mode. One thing I noticed today is that maybe long_distance should be a bool, not int. Yes, ZSTD_c_enableLongDistanceMatching() accepts int, but it'd be cleaner to cast the value during a call and keep it bool otherwise. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PATCH] Introduce array_shuffle() and array_sample()
> On 29 Sep 2022, at 21:33, Tom Lane wrote: > > Martin Kalcher writes: >> New patch: array_shuffle() and array_sample() use pg_global_prng_state now. > > I took a closer look at the patch today. Since this seems pretty close to going in, and seems like quite useful functions, I took a look to see if I could get it across the line (although I noticed that CFM beat me to the clock in sending this =)). > I find this behavior a bit surprising: > > +SELECT > array_dims(array_sample('[-1:2][2:3]={{1,2},{3,NULL},{5,6},{7,8}}'::int[], > 3)); > + array_dims > +- > + [-1:1][2:3] > +(1 row) > > I can buy preserving the lower bound in array_shuffle(), but > array_sample() is not preserving the first-dimension indexes of > the array, so ISTM it ought to reset the first lower bound to 1. I might be daft but I'm not sure I follow why not preserving here, can you explain? The rest of your comments have been addressed in the attached v6 I think (although I'm pretty sure the docs part is just as bad now, explaining these in concise words is hard, will take another look with fresh eyes tomorrow). -- Daniel Gustafsson v6-0001-Introduce-array_shuffle-and-array_sample.patch Description: Binary data
Re: [PATCH] Introduce array_shuffle() and array_sample()
Given that there's been no updates since September 22 I'm going to make this patch Returned with Feedback. The patch can be resurrected when there's more work done -- don't forget to move it to the new CF when you do that. -- Gregory Stark As Commitfest Manager
Re: pg_stats and range statistics
On Fri, 24 Mar 2023 at 14:48, Egor Rogov wrote: > > Done. > There is one thing I'm not sure what to do about. This check: > > if (typentry->typtype != TYPTYPE_RANGE) > ereport(ERROR, > (errcode(ERRCODE_DATATYPE_MISMATCH), >errmsg("expected array of ranges"))); > > doesn't work, because the range_get_typcache() call errors out first > ("type %u is not a range type"). The message doesn't look friendly > enough for user-faced SQL function. Should we duplicate > range_get_typcache's logic and replace the error message? > Okay. I've corrected the examples a bit. It sounds like you've addressed Tomas's feedback and still have one open question. Fwiw I rebased it, it seemed to merge fine automatically. I've updated the CF entry to Needs Review. But at this late date it may have to wait until the next release. -- Gregory Stark As Commitfest Manager From 87424880f1a970448979681684e6916f33567eeb Mon Sep 17 00:00:00 2001 From: Greg Stark Date: Mon, 3 Apr 2023 17:04:11 -0400 Subject: [PATCH] pg_stats and range statistics --- doc/src/sgml/func.sgml| 36 ++ doc/src/sgml/system-views.sgml| 40 +++ src/backend/catalog/system_views.sql | 30 - src/backend/utils/adt/rangetypes_typanalyze.c | 107 ++ src/include/catalog/pg_proc.dat | 10 ++ src/test/regress/expected/rangetypes.out | 22 src/test/regress/expected/rules.out | 34 +- src/test/regress/sql/rangetypes.sql | 7 ++ 8 files changed, 284 insertions(+), 2 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 918a492234..548078a12e 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -19643,6 +19643,24 @@ SELECT NULLIF(value, '(none)') ... + + + + ranges_lower + +ranges_lower ( anyarray ) +anyarray + + +Extracts lower bounds of ranges in the array (NULL if +the range is empty or the lower bound is infinite). + + +lower(ARRAY[numrange(1.1,2.2),numrange(3.3,4.4)]) +{1.1,3.3} + + + @@ -19661,6 +19679,24 @@ SELECT NULLIF(value, '(none)') ... 
+ + + + ranges_upper + +ranges_upper ( anyarray ) +anyarray + + +Extracts upper bounds of ranges (NULL if the +range is empty or the upper bound is infinite). + + +upper(ARRAY[numrange(1.1,2.2),numrange(3.3,4.4)]) +{2.2,4.4} + + + diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml index bb1a418450..d7760838ae 100644 --- a/doc/src/sgml/system-views.sgml +++ b/doc/src/sgml/system-views.sgml @@ -3784,6 +3784,46 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx non-null elements. (Null for scalar types.) + + + + range_length_histogram anyarray + + + A histogram of the lengths of non-empty and non-null range values of a + range type column. (Null for non-range types.) + + + + + + range_empty_frac float4 + + + Fraction of column entries whose values are empty ranges. + (Null for non-range types.) + + + + + + range_lower_histogram anyarray + + + A histogram of lower bounds of non-empty and non-null range values. + (Null for non-range types.) + + + + + + range_upper_histogram anyarray + + + A histogram of upper bounds of non-empty and non-null range values. + (Null for non-range types.) 
+ + diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 574cbc2e44..3fb7ee448f 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -243,7 +243,35 @@ CREATE VIEW pg_stats WITH (security_barrier) AS WHEN stakind3 = 5 THEN stanumbers3 WHEN stakind4 = 5 THEN stanumbers4 WHEN stakind5 = 5 THEN stanumbers5 -END AS elem_count_histogram +END AS elem_count_histogram, +CASE +WHEN stakind1 = 6 THEN stanumbers1[1] +WHEN stakind2 = 6 THEN stanumbers2[1] +WHEN stakind3 = 6 THEN stanumbers3[1] +WHEN stakind4 = 6 THEN stanumbers4[1] +WHEN stakind5 = 6 THEN stanumbers5[1] +END AS range_empty_frac, +CASE +WHEN stakind1 = 6 THEN stavalues1 +WHEN stakind2 = 6 THEN stavalues2 +WHEN stakind3 = 6 THEN stavalues3 +WHEN stakind4 = 6 THEN stavalues4 +WHEN stakind5 = 6 THEN stavalues5 +END AS range_length_histogram, +CASE +
Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
On Mon, Apr 3, 2023 at 12:40 PM Daniel Gustafsson wrote: > Doh, sorry, my bad. I read and wrote 1.0.1 but was thinking about 1.0.2. You > are right, in 1.0.1 that API does not exist. I'm not all too concerned with > skipping this tests on OpenSSL versions that by the time 16 ships are 6 years > EOL - and I'm not convinced that spending meson/autoconf cycles to include > them > is warranted. Cool. v10 keys off of HAVE_SSL_CTX_SET_CERT_CB, instead. > > We could maybe have them connect to a known host: > > > >$ echo Q | openssl s_client -connect postgresql.org:443 > > -verify_return_error > > Something along these lines is probably best, if we do it at all. Needs > sleeping on. Sounds good. Thanks! --Jacob 1: 18fd368e0e ! 1: 957a011364 libpq: add sslrootcert=system to use default CAs @@ .cirrus.yml: task: ccache_cache: - ## configure ## -@@ configure: $as_echo "$ac_res" >&6; } - - } # ac_fn_c_check_func - -+# ac_fn_c_check_decl LINENO SYMBOL VAR INCLUDES -+# - -+# Tests whether SYMBOL is declared in INCLUDES, setting cache variable VAR -+# accordingly. -+ac_fn_c_check_decl () -+{ -+ as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack -+ # Initialize each $ac_[]_AC_LANG_ABBREV[]_decl_warn_flag once. -+ as_decl_name=`echo $2|sed 's/ *(.*//'` -+ as_decl_use=`echo $2|sed -e 's/(/((/' -e 's/)/) 0&/' -e 's/,/) 0& (/g'` -+ { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $as_decl_name is declared" >&5 -+$as_echo_n "checking whether $as_decl_name is declared... " >&6; } -+if eval \${$3+:} false; then : -+ $as_echo_n "(cached) " >&6 -+else -+ ac_save_werror_flag=$ac_c_werror_flag -+ ac_c_werror_flag="$ac_c_decl_warn_flag$ac_c_werror_flag" -+ cat confdefs.h - <<_ACEOF >conftest.$ac_ext -+/* end confdefs.h. 
*/ -+$4 -+int -+main () -+{ -+#ifndef $as_decl_name -+#ifdef __cplusplus -+ (void) $as_decl_use; -+#else -+ (void) $as_decl_name; -+#endif -+#endif -+ -+ ; -+ return 0; -+} -+_ACEOF -+if ac_fn_c_try_compile "$LINENO"; then : -+ eval "$3=yes" -+else -+ eval "$3=no" -+fi -+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -+ ac_c_werror_flag=$ac_save_werror_flag -+fi -+eval ac_res=\$$3 -+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 -+$as_echo "$ac_res" >&6; } -+ eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno -+ -+} # ac_fn_c_check_decl -+ - # ac_fn_c_check_type LINENO TYPE VAR INCLUDES - # --- - # Tests whether TYPE exists after having included INCLUDES, setting cache -@@ configure: rm -f conftest.val - as_fn_set_status $ac_retval - - } # ac_fn_c_compute_int -- --# ac_fn_c_check_decl LINENO SYMBOL VAR INCLUDES --# - --# Tests whether SYMBOL is declared in INCLUDES, setting cache variable VAR --# accordingly. --ac_fn_c_check_decl () --{ -- as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack -- # Initialize each $ac_[]_AC_LANG_ABBREV[]_decl_warn_flag once. -- as_decl_name=`echo $2|sed 's/ *(.*//'` -- as_decl_use=`echo $2|sed -e 's/(/((/' -e 's/)/) 0&/' -e 's/,/) 0& (/g'` -- { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $as_decl_name is declared" >&5 --$as_echo_n "checking whether $as_decl_name is declared... " >&6; } --if eval \${$3+:} false; then : -- $as_echo_n "(cached) " >&6 --else -- ac_save_werror_flag=$ac_c_werror_flag -- ac_c_werror_flag="$ac_c_decl_warn_flag$ac_c_werror_flag" -- cat confdefs.h - <<_ACEOF >conftest.$ac_ext --/* end confdefs.h. 
*/ --$4 --int --main () --{ --#ifndef $as_decl_name --#ifdef __cplusplus -- (void) $as_decl_use; --#else -- (void) $as_decl_name; --#endif --#endif -- -- ; -- return 0; --} --_ACEOF --if ac_fn_c_try_compile "$LINENO"; then : -- eval "$3=yes" --else -- eval "$3=no" --fi --rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -- ac_c_werror_flag=$ac_save_werror_flag --fi --eval ac_res=\$$3 -- { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 --$as_echo "$ac_res" >&6; } -- eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno -- --} # ac_fn_c_check_decl - cat >config.log <<_ACEOF - This file contains any messages produced by compilers while - running configure, to aid debugging if configure makes a mistake. -@@ configure: _ACEOF - fi - done - -+ # Let tests differentiate between vanilla OpenSSL and LibreSSL. -+ # The Clang
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Mon, Apr 3, 2023 at 12:13 AM Pavel Luzanov wrote: > > Hello, > > I found that the 'standalone backend' backend type is not documented > right now. > Adding something like (from commit message) would be helpful: > > Both the bootstrap backend and single user mode backends will have > backend_type STANDALONE_BACKEND. Thanks for the report. Attached is a tiny patch to add standalone backend type to pg_stat_activity documentation (referenced by pg_stat_io). I mentioned both the bootstrap process and single user mode process in the docs, though I can't imagine that the bootstrap process is relevant for pg_stat_activity. I also noticed that the pg_stat_activity docs call background workers "parallel workers" (though it also mentions that extensions could have other background workers registered), but this seems a bit weird because pg_stat_activity uses GetBackendTypeDesc() and this prints "background worker" for type B_BG_WORKER. Background workers doing parallelism tasks is what users will most often see in pg_stat_activity, but I feel like it is confusing to have it documented as something different than what would appear in the view. Unless I am misunderstanding something... 
- Melanie From d9218d082397d9b87a3e126bce4a45e9ec720ff2 Mon Sep 17 00:00:00 2001 From: Melanie Plageman Date: Mon, 3 Apr 2023 16:38:47 -0400 Subject: [PATCH v1] Document standalone backend type in pg_stat_activity Reported-by: Pavel Luzanov Discussion: https://www.postgresql.org/message-id/fcbe2851-f1fb-9863-54bc-a95dc7a0d946%40postgrespro.ru --- doc/src/sgml/monitoring.sgml | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index d5a45f996d..a00fe9c6a3 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -989,10 +989,12 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser parallel worker, background writer, client backend, checkpointer, archiver, - startup, walreceiver, - walsender and walwriter. - In addition, background workers registered by extensions may have - additional types. + startup, + standalone backend (which includes both the + single user mode process and bootstrap + process), walreceiver, walsender + and walwriter. In addition, background workers + registered by extensions may have additional types. -- 2.37.2
Re: pg_usleep for multisecond delays
On Mon, 13 Mar 2023 at 17:17, Nathan Bossart wrote: > > On Fri, Mar 10, 2023 at 12:28:28PM -0500, Tom Lane wrote: > > A quick grep for pg_usleep with large intervals finds rather more > > than you touched: > > > > [...] > > > > Did you have reasons for excluding the rest of these? > > I'm still looking into each of these, but my main concerns were 1) ensuring > latch support had been set up and 2) figuring out the right interrupt > function to use. Thus far, I don't think latch support is a problem > because InitializeLatchSupport() is called quite early. However, I'm not > as confident with the interrupt functions yet. Some of these multisecond > sleeps are done very early before the signal handlers are set up. Others > are done within the sigsetjmp() block. And at least one is in a code path > that both the startup process and single-user mode touch. Is this still a WIP? Is it targeting this release? There are only a few days left before the feature freeze. I'm guessing it should just move to the next CF for the next release? -- Gregory Stark As Commitfest Manager
Re: Removing unneeded self joins
On Mon, 6 Mar 2023 at 00:30, Michał Kłeczek wrote: > > Hi All, > > I just wanted to ask about the status and plans for this patch. > I can see it being stuck at “Waiting for Author” status in several commit > tests. Sadly it seems to now be badly in need of a rebase. There are large hunks failing in the guts of analyzejoins.c as well as minor failures elsewhere and lots of offsets which need to be reviewed. I think given the lack of activity it's out of time for this release at this point. I'm moving it ahead to the next CF. -- Gregory Stark As Commitfest Manager
Re: psql - factor out echo code
On Mon, 13 Feb 2023 at 05:41, Peter Eisentraut wrote: > > I think this patch requires an up-to-date summary and explanation. The > thread is over a year old and the patch has evolved quite a bit. There > are some test changes that are not explained. Please provide more > detail so that the patch can be considered. Given this feedback I'm going to mark this Returned with Feedback. I think it'll be clearer to start with a new thread explaining the intent of the patch as it is now. -- Gregory Stark As Commitfest Manager
Re: SQL/JSON revisited
Hi Alvaro, 03.04.2023 20:16, Alvaro Herrera wrote: So I pushed 0001 on Friday, and here are 0002 (which I intend to push shortly, since it shouldn't be controversial) and the "JSON query functions" patch as 0003. After looking at it some more, I think there are some things that need to be addressed by one of the authors: - the gram.y solution to the "ON ERROR/ON EMPTY" clauses is quite ugly. I think we could make that stuff use something similar to ConstraintAttributeSpec with an accompanying post-processing function. That would reduce the number of ad-hoc hacks, which seem excessive. - the changes in formatting.h have no explanation whatsoever. At the very least, the new function should have a comment in the .c file. (And why is it at end of file? I bet there's a better location) - some nasty hacks are being used in the ECPG grammar with no tests at all. It's easy to add a few lines to the .pgc file I added in prior commits. - Some functions in jsonfuncs.c have changed from throwing hard errors into soft ones. I think this deserves more commentary. - func.sgml: The new functions are documented in a separate table for no reason that I can see. Needs to be merged into one of the existing tables. I didn't actually review the docs. Please take a look at the following minor issues in v15-0002-SQL-JSON-query-functions.patch: 1) s/addreess/address/ 2) ECPGColLabelCommon gone with 83f1c7b74, but is still mentioned in ecpg.trailer. 3) s/ExecEvalJsonCoercion/ExecEvalJsonExprCoercion/ ? (there is no ExecEvalJsonCoercion() function) 4) json_table mentioned in func.sgml: details the SQL/JSON functions that can be used to query JSON data, except for json_table. But if JSON_TABLE is not going to be committed in v16, maybe remove that reference to it. There is also a reference to JSON_TABLE in src/backend/parser/README: parse_jsontable.c handle JSON_TABLE (It was added with 9853bf6ab and survived the revert of SQL JSON last year somehow.) Best regards, Alexander
Re: Prefetch the next tuple's memory during seqscans
On Sun, 29 Jan 2023 at 21:24, David Rowley wrote: > > I've moved this patch to the next CF. This patch has a dependency on > what's being proposed in [1]. The referenced patch was committed March 19th but there's been no comment here. Is this patch likely to go ahead this release or should I move it forward again? -- Gregory Stark As Commitfest Manager
Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
> On 3 Apr 2023, at 21:04, Jacob Champion wrote: > > On Sun, Apr 2, 2023 at 1:36 PM Daniel Gustafsson wrote: >>> On 31 Mar 2023, at 19:59, Jacob Champion wrote: >>> I can make that change; note that it'll also skip some of the new tests >>> with OpenSSL 1.0.1, where there's no SSL_CTX_set_cert_cb. If that's >>> acceptable, it should be an easy switch. >> >> I'm not sure I follow, AFAICT it's present all the way till 3.1 at least? >> What >> am I missing? > > I don't see it anywhere in my 1.0.1 setup, and Meson doesn't define > HAVE_SSL_CTX_SET_CERT_CB when built against it. Doh, sorry, my bad. I read and wrote 1.0.1 but was thinking about 1.0.2. You are right, in 1.0.1 that API does not exist. I'm not all too concerned with skipping these tests on OpenSSL versions that by the time 16 ships are 6 years EOL - and I'm not convinced that spending meson/autoconf cycles to include them is warranted. Longer term I'd want to properly distinguish between LibreSSL and OpenSSL, but then we should have a bigger discussion on what we want to use these values for. >>> Is there something we could document that's more helpful than "make sure >>> your installation isn't broken"? >> >> I wonder if there is an openssl command line example for verifying defaults >> that we can document and refer to? > > We could maybe have them connect to a known host: > >$ echo Q | openssl s_client -connect postgresql.org:443 > -verify_return_error Something along these lines is probably best, if we do it at all. Needs sleeping on. -- Daniel Gustafsson
Re: Minimal logical decoding on standbys
Hi, On 2023-04-03 17:34:52 +0200, Alvaro Herrera wrote: > On 2023-Apr-03, Drouvot, Bertrand wrote: > > > +/* > > + * Report terminating or conflicting message. > > + * > > + * For both, logical conflict on standby and obsolete slot are handled. > > + */ > > +static void > > +ReportTerminationInvalidation(bool terminating, bool islogical, int pid, > > + NameData slotname, > > TransactionId *xid, > > + XLogRecPtr > > restart_lsn, XLogRecPtr oldestLSN) > > +{ > > > + if (terminating) > > + appendStringInfo(_msg, _("terminating process %d to release > > replication slot \"%s\""), > > +pid, > > +NameStr(slotname)); > > + else > > + appendStringInfo(_msg, _("invalidating")); > > + > > + if (islogical) > > + { > > + if (terminating) > > + appendStringInfo(_msg, _(" because it conflicts > > with recovery")); > > You can't build the strings this way, because it's not possible to put > the strings into the translation machinery. You need to write full > strings for each separate case instead, without appending other string > parts later. Hm? That's what the _'s do. We build strings in parts in other places too. You do need to use errmsg_internal() later, to prevent that format string from being translated as well. I'm not saying that this is exactly the right way, don't get me wrong. Greetings, Andres Freund
Re: Pass heaprel to GlobalVisTestFor() in vacuumRedirectAndPlaceholder()
On Mon, Apr 3, 2023 at 12:09 AM Drouvot, Bertrand wrote: > Right. Please find enclosed V2 also taking care of BTPageIsRecyclable() > and _bt_pendingfsm_finalize(). Pushed that as two separate patches just now. Thanks. BTW, I'm not overly happy about the extent of the changes to nbtree from commit 61b313e4. I understand that it was necessary to pass down a heaprel in a lot of places, which is bound to create a lot of churn. However, a lot of the churn from the commit seems completely avoidable. There is no reason why the BT_READ path in _bt_getbuf() could possibly require a valid heaprel. In fact, most individual BT_WRITE calls don't need heaprel, either -- only those that pass P_NEW. The changes affecting places like _bt_mkscankey() and _bt_metaversion() seem particularly bad. Anyway, I'll take care of this myself at some point after feature freeze. -- Peter Geoghegan
Re: zstd compression for pg_dump
On Sat, Apr 01, 2023 at 10:26:01PM +0200, Tomas Vondra wrote: > > Feel free to mess around with threads (but I'd much rather see the patch > > progress for zstd:long). > > OK, understood. The long mode patch is pretty simple. IIUC it does not > change the format, i.e. in the worst case we could leave it for PG17 > too. Correct? Right, libzstd only has one "format", which is the same as what's used by the commandline tool. zstd:long doesn't change the format of the output: the library just uses a larger memory buffer to allow better compression. There's no format change for zstd:workers, either. -- Justin
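Justin's point above — that zstd:long changes only the size of the back-reference window, not the output format — can be illustrated by analogy. The Python standard library has no zstd bindings, so the sketch below uses zlib's `wbits` parameter, which controls the same kind of match window (this is an illustration of the concept, not pg_dump or libzstd code):

```python
import random
import zlib

# Two identical 1 KiB blocks of (deterministic) random bytes, separated by
# 4 KiB of unrelated filler. The only redundancy is long-range.
random.seed(0)
block = bytes(random.randrange(256) for _ in range(1024))
filler = bytes(random.randrange(256) for _ in range(4096))
data = block + filler + block  # second copy sits ~5 KiB after the first

def deflate(payload, wbits):
    # wbits sets the match-window size to 2**wbits bytes (9..15 for zlib).
    c = zlib.compressobj(level=9, wbits=wbits)
    return c.compress(payload) + c.flush()

small_window = deflate(data, 9)   # 512-byte window: first copy out of reach
large_window = deflate(data, 15)  # 32 KiB window: duplicate block is matched

assert len(large_window) < len(small_window)
```

The larger window finds the distant duplicate and saves roughly the size of the repeated block, while the decompressed bytes are identical either way — which mirrors why zstd:long can be deferred without any on-disk format risk.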
Re: running logical replication as the subscription owner
On Mon, 2023-04-03 at 10:26 -0400, Robert Haas wrote: > Not very much. I think the biggest risk is user confusion, but I > don't > think that's a huge risk because I don't think this scenario will > come > up very often. Also, it's kind of hard to imagine that there's a > security model here which never does anything potentially surprising. Alright, let's just proceed as-is then. I believe these patches are a major improvement to the usability of logical replication and will put up with the weirdness. I wanted to understand better why it's there, and I'm not sure I fully do, but we'll have more time to discuss later. Regards, Jeff Davis
Re: Should vacuum process config file reload more often
Hi, On 2023-04-03 14:43:14 -0400, Tom Lane wrote: > Melanie Plageman writes: > > v13 attached with requested updates. > > I'm afraid I'd not been paying any attention to this discussion, > but better late than never. I'm okay with letting autovacuum > processes reload config files more often than now. However, > I object to allowing ProcessConfigFile to be called from within > commands in a normal user backend. The existing semantics are > that user backends respond to SIGHUP only at the start of processing > a user command, and I'm uncomfortable with suddenly deciding that > that can work differently if the command happens to be VACUUM. > It seems unprincipled and perhaps actively unsafe. I think it should be ok in commands like VACUUM that already internally start their own transactions, and thus require to be run outside of a transaction and at the toplevel. I share your concerns about allowing config reload in arbitrary places. While we might want to go there, it would require a lot more analysis. Greetings, Andres Freund
Re: Thoughts on using Text::Template for our autogenerated code?
> > Yeah, it's somewhat hard to believe that the cost/benefit ratio would be > attractive. But maybe you could mock up some examples of what the input > could look like, and get people on board (or not) before writing any > code. > > tl;dr - I tried a few things, nothing that persuades me, let alone the community, but perhaps some ideas for the future. I borrowed Bertrand's ongoing work for waiteventnames.* because that is what got me thinking about this in the first place. I considered a few different templating libraries: There is no perl implementation of the golang template library (example of that here: https://blog.gopheracademy.com/advent-2017/using-go-templates/ ) that I could find. Text::Template does not support loops, and as such it is no better than here-docs. Template Toolkit seems to do what we need, but it has a kitchen sink of dependencies that make it an unattractive option, so I didn't even attempt it. HTML::Template has looping and if/then/else constructs, and it is a single standalone library. It also does a "separation of concerns" wherein you pass in parameter names and values, and some parameters can be for loops, which means you pass an arrayref of hashrefs that the template engine loops over. That's where the advantages stop, however. It is fairly verbose, and because it is HTML-centric it isn't very good about controlling whitespace, which leads to piling template directives onto the same line in order to avoid spurious newlines. As such I cannot recommend it. My ideal template library would have text something like this: [% loop events %] [% $enum_value %] [% if __first__ +%] = [%+ $initial_value %][% endif %] [% if ! __last__ %],[% endif +%] [% end loop %] [% loop xml_blocks indent: relative,spaces,4 %] [%element_body%]/> [% end loop %] [%+ means "leading whitespace matters", +%] means "trailing whitespace matters" That pseudocode is a mix of ASP, HTML::Template. 
The special variables __first__ and __last__ refer to which iteration of the loop we are on. You would pass it a data structure like this: {events: [ { enum_value: "abc", initial_value: "def"}, ... { enum_value: "wuv", initial_value: "xyz" } ], xml_block: [ {attrib_val: "one", element_body: "two"} ] } I did one initial pass with just converting printf statements to here-docs, and the results were pretty unsatisfying. It wasn't really possible to "see" the output files take shape. My next attempt was to take the "separation of concerns" code from the HTML::Template version, constructing the nested data structure of resolved output values, and then iterating over that once per output file. This resulted in something cleaner, partly because we're only writing one file type at a time, partly because the interpolated variables have names much closer to their output meaning. In doing this, it occurred to me that a lot of this effort is in getting the code to conform to our own style guide, at the cost of the generator code being less readable. What if we wrote the generator and formatted the code in a way that made sense for the generator, and then pgindented it? That's not the workflow right now, but perhaps it could be. Conclusions: - There is no "good enough" template engine that doesn't require big changes in dependencies. - pgindent will not save you from a run-on sentence, like putting all of a typedef enum's values on one line - There is some clarity value in either separating input processing from the output processing, or making the input align more closely with the outputs - Fiddling with indentation and spacing detracts from legibility no matter what method is used. - here docs are basically ok but they necessarily confuse output indentation with code indentation. it is possible to de-indent them with <<~ but that's a 5.25+ feature. - Any of these principles can be applied at any time, with no overhaul required. 
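The [% loop %] pseudocode with its __first__/__last__ markers can be sketched in a few lines of plain code. The Python below is a stand-in for the proposed template engine, and the wait-event names in it are made up for illustration, not taken from the real waiteventnames input:

```python
def render_enum(events):
    # Mimics the pseudo-template: only the first member gets "= initial",
    # and the last member drops the trailing comma (__first__/__last__).
    lines = []
    for i, ev in enumerate(events):
        first = i == 0                   # __first__
        last = i == len(events) - 1      # __last__
        line = "\t" + ev["enum_value"]
        if first:
            line += " = " + ev["initial_value"]
        if not last:
            line += ","
        lines.append(line)
    return "\n".join(lines)

# Hypothetical input, shaped like the arrayref-of-hashrefs described above.
events = [
    {"enum_value": "WAIT_EVENT_ARCHIVER_MAIN", "initial_value": "PG_WAIT_ACTIVITY"},
    {"enum_value": "WAIT_EVENT_AUTOVACUUM_MAIN", "initial_value": ""},
    {"enum_value": "WAIT_EVENT_BGWRITER_MAIN", "initial_value": ""},
]
print(render_enum(events))
```

This is roughly the amount of logic a loop-capable template engine would have to express declaratively; whether the [% %] syntax or the imperative version is more legible is exactly the trade-off under discussion.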
"sorted-" is the slightly modified version of Bertrand's code. "eof-as-is-" is a direct conversion of the original but using here-docs. "heredoc-fone-file-at-a-time-" first generates an output-friendly data structure "needs-pgindent-" is what is possible if we format for our own readability and make pgindent fix the output, though it was not a perfect output match sorted-generate-waiteventnames.pl Description: Perl program eof-as-is-generate-waiteventnames.pl Description: Perl program heredoc-one-file-at-a-time-generate-waiteventnames.pl Description: Perl program needs-pgindent-generate-waiteventnames.pl Description: Perl program
Re: running logical replication as the subscription owner
On Sun, 2023-04-02 at 20:21 -0700, Noah Misch wrote: > The most-plausible-to-me attack involves an ENABLE ALWAYS trigger > that logs > CURRENT_USER to an audit table. How does requiring that the subscription owner have SET ROLE privileges on the table owner help that case? As Robert pointed out, users coming from v15 will have superuser subscription owners anyway, so the change will be silent for them. We need support to apply changes as the table owner, and we need that to be the default; and this patch provides those things, and almost all users of logical replication will be better off after this is committed. The small number of users for whom this new model is not good still need the right documentation in front of them to understand the consequences, so that they can opt out one way or another (as 0002 offers). Release notes are probably the most powerful tool we have for notifying users, unfortunately. Requiring SET ROLE for users that are almost certainly superusers doesn't give an opportunity to educate people about the change in behavior. As I said before, I'm fine with requiring that the subscription owner can SET ROLE to the table owner for v16. It's the most conservative choice and the most "correct" (in that no lesser privilege we have today is a perfect match). But I feel like we can do better in version 17 when we have time to actually work through common use cases and the exceptional cases and weight them appropriately. Like, how common is it to want to get the user from a trigger on the subscriber side? Should that trigger be using SESSION_USER instead of CURRENT_USER? Security is best when it takes into account what people actually want to do and makes it easy to do that securely. Regards, Jeff Davis
Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
On Sun, Apr 2, 2023 at 1:36 PM Daniel Gustafsson wrote: > > On 31 Mar 2023, at 19:59, Jacob Champion wrote: > > I can make that change; note that it'll also skip some of the new tests > > with OpenSSL 1.0.1, where there's no SSL_CTX_set_cert_cb. If that's > > acceptable, it should be an easy switch. > > I'm not sure I follow, AFAICT it's present all the way till 3.1 at least? > What > am I missing? I don't see it anywhere in my 1.0.1 setup, and Meson doesn't define HAVE_SSL_CTX_SET_CERT_CB when built against it. > > Is there something we could document that's more helpful than "make sure > > your installation isn't broken"? > > I wonder if there is an openssl command line example for verifying defaults > that we can document and refer to? We could maybe have them connect to a known host: $ echo Q | openssl s_client -connect postgresql.org:443 -verify_return_error Alternatively, OpenSSL will show you the OPENSSLDIR: $ openssl version -d OPENSSLDIR: "/usr/lib/ssl" and then we could tell users to ensure they have a populated certs/ directory or a cert.pem file underneath it. That'll be prone to rot, though (e.g. OpenSSL 3 introduces the default store in addition to the default file+directory). Thanks, --Jacob
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock
Hi, On 2023-04-03 14:25:59 +0200, Tomas Vondra wrote: > On 4/3/23 00:40, Andres Freund wrote: > > Hi, > > > > On 2023-03-28 19:17:21 -0700, Andres Freund wrote: > >> On 2023-03-28 18:21:02 -0700, Andres Freund wrote: > >>> Here's a draft patch. > >> > >> Attached is v2, with a stupid bug fixed and a bit of comment / pgindent > >> polish. > > > > I'd welcome some review (Tomas?), but otherwise I'm planning to push ahead > > with this. > > > > I guess the 0001 part was already pushed, so I should be looking only at > 0002, correct? Yes. > I think 0002 makes RelationGetBufferForTuple() harder to understand. I'm > not saying it's incorrect, but I find it hard to reason about the new > combinations of conditions :-( > I mean, it only had this condition: > > if (otherBuffer != InvalidBuffer) > { > ... > } > > but now it have > > if (unlockedTargetBuffer) > { >... > } > else if (otherBuffer != InvalidBuffer) > { > ... > } > > if (unlockedTargetBuffer || otherBuffer != InvalidBuffer) > { > ... > } > > Not sure how to improve that :-/ but not exactly trivial to figure out > what's going to happen. It's not great, I agree. I tried to make it easier to read in this version by a) changing GetVisibilityMapPins() as I proposed b) added a new variable "recheckVmPins", that gets set in if (unlockedTargetBuffer) and if (otherBuffer != InvalidBuffer) c) reformulated comments > Maybe this > > * If we unlocked the target buffer above, it's unlikely, but possible, > * that another backend used space on this page. > > might say what we're going to do in this case. I mean - I understand > some backend may use space in unlocked page, but what does that mean for > this code? What is it going to do? (The same comment talks about the > next condition in much more detail, for example.) There's a comment about that detail further below. But you're right, it wasn't clear as-is. How about now? > Also, isn't it a bit strange the free space check now happens outside > any if condition? 
It used to happen in the > > if (otherBuffer != InvalidBuffer) > { > ... > } > > block, but now it happens outside. Well, the alternative is to repeat it in the two branches, which doesn't seem great either. Particularly because there'll be a third branch after the bulk extension patch. Greetings, Andres Freund >From 49183283d6701d22cf12f6a6280ed1aba02fc063 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Mon, 3 Apr 2023 11:18:10 -0700 Subject: [PATCH v3 1/2] hio: Relax rules for calling GetVisibilityMapPins() Author: Reviewed-by: Discussion: https://postgr.es/m/ Backpatch: --- src/backend/access/heap/hio.c | 37 --- 1 file changed, 26 insertions(+), 11 deletions(-) diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c index 7479212d4e0..e8aea42f8a9 100644 --- a/src/backend/access/heap/hio.c +++ b/src/backend/access/heap/hio.c @@ -131,9 +131,9 @@ ReadBufferBI(Relation relation, BlockNumber targetBlock, * For each heap page which is all-visible, acquire a pin on the appropriate * visibility map page, if we haven't already got one. * - * buffer2 may be InvalidBuffer, if only one buffer is involved. buffer1 - * must not be InvalidBuffer. If both buffers are specified, block1 must - * be less than block2. + * To avoid complexity in the callers, either buffer1 or buffer2 may be + * InvalidBuffer if only one buffer is involved. For the same reason, block2 + * may be smaller than block1. */ static void GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2, @@ -143,6 +143,26 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2, bool need_to_pin_buffer1; bool need_to_pin_buffer2; + /* + * Swap buffers around to handle case of a single block/buffer, and to + * handle if lock ordering rules require to lock block2 first. 
+ */ + if (!BufferIsValid(buffer1) || + (BufferIsValid(buffer2) && block1 > block2)) + { + Buffer tmpbuf = buffer1; + Buffer *tmpvmbuf = vmbuffer1; + BlockNumber tmpblock = block1; + + buffer1 = buffer2; + vmbuffer1 = vmbuffer2; + block1 = block2; + + buffer2 = tmpbuf; + vmbuffer2 = tmpvmbuf; + block2 = tmpblock; + } + Assert(BufferIsValid(buffer1)); Assert(buffer2 == InvalidBuffer || block1 <= block2); @@ -502,14 +522,9 @@ loop: * done a bit of extra work for no gain, but there's no real harm * done. */ - if (otherBuffer == InvalidBuffer || targetBlock <= otherBlock) - GetVisibilityMapPins(relation, buffer, otherBuffer, - targetBlock, otherBlock, vmbuffer, - vmbuffer_other); - else - GetVisibilityMapPins(relation, otherBuffer, buffer, - otherBlock, targetBlock, vmbuffer_other, - vmbuffer); + GetVisibilityMapPins(relation, buffer, otherBuffer, + targetBlock, otherBlock, vmbuffer, + vmbuffer_other); /* * Now we can check to
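The GetVisibilityMapPins() patch above moves the argument-ordering burden out of the callers by normalizing inside the callee: tolerate an invalid first buffer, and swap so the lower block number is handled first. A minimal sketch of that pattern (Python illustration only, not the C code):

```python
# Stand-in for InvalidBuffer.
INVALID = None

def normalize_pair(buf1, buf2):
    # Mirror the patch's rule: if buf1 is invalid, or lock-ordering
    # requires handling buf2 first (lower block number), swap them so the
    # rest of the function can assume a canonical order.
    if buf1 is INVALID or (buf2 is not INVALID and buf1 > buf2):
        buf1, buf2 = buf2, buf1
    return buf1, buf2

assert normalize_pair(7, 3) == (3, 7)        # reordered for lock rules
assert normalize_pair(3, 7) == (3, 7)        # already in canonical order
assert normalize_pair(None, 5) == (5, None)  # single-buffer case
```

Pushing the swap into one place is what lets the two call sites in RelationGetBufferForTuple() collapse into a single unconditional call, at the cost of a slightly more subtle callee.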
Re: Why enable_hashjoin Completely disables HashJoin
On Mon, Apr 3, 2023 at 2:04 PM Tom Lane wrote: > Yeah. In some places it would not be too hard; for example, if we > generated seqscan paths last instead of first for baserels, the rule > could be "generate it if enable_seqscan is on OR if we made no other > path for the rel". It's much messier for joins though, partly because > the same joinrel will be considered multiple times as we process > different join orderings, plus it's usually unclear whether failing > to generate any paths for joinrel X will lead to overall failure. Yeah, good point. I'm now remembering that at one point I'd had the idea of running the whole find-a-plan-for-a-jointree step and then running it a second time if it fails to find a plan. But I think that requires some restructuring, because I think right now it does some things that we should only do once we know we're definitely getting a plan out. Or else we have to reset some state. Like if we want to go back and maybe add more paths then we have to undo and redo whatever set_cheapest() did. > A solution that would work is to treat disable_cost as a form of infinity > that's counted separately from the actual cost estimate, that is we > label paths as "cost X, plus there are N uses of disabled plan types". > Then you sort first on N and after that on X. But this'd add a good > number of cycles to add_path, which I've not wanted to expend on a > non-mainstream usage. Yeah, I thought of that at one point too and rejected it for the same reason. -- Robert Haas EDB: http://www.enterprisedb.com
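Tom's "separate infinity" idea — label each path with the number of disabled plan types it uses and sort on that count before the ordinary cost — amounts to a lexicographic comparison. A minimal sketch of the comparison (hypothetical, not planner code):

```python
# Each path is (disabled_nodes, cost): N uses of disabled plan types,
# then the ordinary cost estimate as the tie-breaker.
def path_key(path):
    disabled_nodes, cost = path
    return (disabled_nodes, cost)  # compare N first, then cost

paths = [
    (1, 10.0),  # very cheap, but uses one disabled node type
    (0, 1e9),   # enormous cost, but nothing disabled
    (2, 5.0),   # cheaper still, two disabled node types
]
best = min(paths, key=path_key)
assert best == (0, 1e9)  # any fully-enabled path beats any disabled one
```

With this scheme no finite disable_cost constant can be swamped by (or pollute) real cost arithmetic — the trade-off, as noted above, is that add_path would pay for the extra comparison on every path.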
Re: possible proposal plpgsql GET DIAGNOSTICS oid = PG_ROUTINE_OID
Hi po 3. 4. 2023 v 19:37 odesílatel Tom Lane napsal: > Pavel Stehule writes: > > po 27. 3. 2023 v 5:36 odesílatel Kirk Wolak napsal: > >> I have marked the item Ready for Commiter... > > > Thank you for doc and for review > > I'm kind of surprised there was any interest in this proposal at all, > TBH, but apparently there is some. Still, I think you over-engineered > it by doing more than the original proposal of making the function OID > available. The other things can be had by casting the OID to regproc > or regprocedure, so I'd be inclined to add just one new keyword not > three. Besides, your implementation is a bit inconsistent: relying > on fn_signature could return a result that is stale or doesn't conform > to the current search_path. > ok There is reduced patch + regress tests Regards Pavel > regards, tom lane > diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml index 7c8a49fe43..4df1b71904 100644 --- a/doc/src/sgml/plpgsql.sgml +++ b/doc/src/sgml/plpgsql.sgml @@ -1639,6 +1639,11 @@ GET DIAGNOSTICS integer_var = ROW_COUNT; line(s) of text describing the current call stack (see ) + + PG_CURRENT_ROUTINE_OID + oid + oid of the function currently running + diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c index b0a2cac227..ed9d26258b 100644 --- a/src/pl/plpgsql/src/pl_exec.c +++ b/src/pl/plpgsql/src/pl_exec.c @@ -2475,6 +2475,12 @@ exec_stmt_getdiag(PLpgSQL_execstate *estate, PLpgSQL_stmt_getdiag *stmt) } break; + case PLPGSQL_GETDIAG_CURRENT_ROUTINE_OID: +exec_assign_value(estate, var, + ObjectIdGetDatum(estate->func->fn_oid), + false, OIDOID, -1); + break; + default: elog(ERROR, "unrecognized diagnostic item kind: %d", diag_item->kind); diff --git a/src/pl/plpgsql/src/pl_funcs.c b/src/pl/plpgsql/src/pl_funcs.c index 5a6eadccd5..4ebf393646 100644 --- a/src/pl/plpgsql/src/pl_funcs.c +++ b/src/pl/plpgsql/src/pl_funcs.c @@ -325,6 +325,8 @@ plpgsql_getdiag_kindname(PLpgSQL_getdiag_kind kind) return "TABLE_NAME"; case 
PLPGSQL_GETDIAG_SCHEMA_NAME: return "SCHEMA_NAME"; + case PLPGSQL_GETDIAG_CURRENT_ROUTINE_OID: + return "PG_CURRENT_ROUTINE_OID"; } return "unknown"; diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y index edeb72c380..6e81cf774d 100644 --- a/src/pl/plpgsql/src/pl_gram.y +++ b/src/pl/plpgsql/src/pl_gram.y @@ -318,6 +318,7 @@ static void check_raise_parameters(PLpgSQL_stmt_raise *stmt); %token K_OR %token K_PERFORM %token K_PG_CONTEXT +%token K_PG_CURRENT_ROUTINE_OID %token K_PG_DATATYPE_NAME %token K_PG_EXCEPTION_CONTEXT %token K_PG_EXCEPTION_DETAIL @@ -1035,6 +1036,7 @@ stmt_getdiag : K_GET getdiag_area_opt K_DIAGNOSTICS getdiag_list ';' break; /* these fields are allowed in either case */ case PLPGSQL_GETDIAG_CONTEXT: +case PLPGSQL_GETDIAG_CURRENT_ROUTINE_OID: break; default: elog(ERROR, "unrecognized diagnostic item kind: %d", @@ -1123,6 +1125,9 @@ getdiag_item : else if (tok_is_keyword(tok, , K_RETURNED_SQLSTATE, "returned_sqlstate")) $$ = PLPGSQL_GETDIAG_RETURNED_SQLSTATE; + else if (tok_is_keyword(tok, , +K_PG_CURRENT_ROUTINE_OID, "pg_current_routine_oid")) + $$ = PLPGSQL_GETDIAG_CURRENT_ROUTINE_OID; else yyerror("unrecognized GET DIAGNOSTICS item"); } @@ -2523,6 +2528,7 @@ unreserved_keyword : | K_OPEN | K_OPTION | K_PERFORM +| K_PG_CURRENT_ROUTINE_OID | K_PG_CONTEXT | K_PG_DATATYPE_NAME | K_PG_EXCEPTION_CONTEXT diff --git a/src/pl/plpgsql/src/pl_unreserved_kwlist.h b/src/pl/plpgsql/src/pl_unreserved_kwlist.h index 466bdc7a20..b569d75708 100644 --- a/src/pl/plpgsql/src/pl_unreserved_kwlist.h +++ b/src/pl/plpgsql/src/pl_unreserved_kwlist.h @@ -81,6 +81,7 @@ PG_KEYWORD("open", K_OPEN) PG_KEYWORD("option", K_OPTION) PG_KEYWORD("perform", K_PERFORM) PG_KEYWORD("pg_context", K_PG_CONTEXT) +PG_KEYWORD("pg_current_routine_oid", K_PG_CURRENT_ROUTINE_OID) PG_KEYWORD("pg_datatype_name", K_PG_DATATYPE_NAME) PG_KEYWORD("pg_exception_context", K_PG_EXCEPTION_CONTEXT) PG_KEYWORD("pg_exception_detail", K_PG_EXCEPTION_DETAIL) diff --git 
a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h index 355c9f678d..258ce4dec7 100644 --- a/src/pl/plpgsql/src/plpgsql.h +++ b/src/pl/plpgsql/src/plpgsql.h @@ -157,7 +157,8 @@ typedef enum PLpgSQL_getdiag_kind PLPGSQL_GETDIAG_DATATYPE_NAME, PLPGSQL_GETDIAG_MESSAGE_TEXT, PLPGSQL_GETDIAG_TABLE_NAME, - PLPGSQL_GETDIAG_SCHEMA_NAME + PLPGSQL_GETDIAG_SCHEMA_NAME, + PLPGSQL_GETDIAG_CURRENT_ROUTINE_OID } PLpgSQL_getdiag_kind; /* diff --git a/src/test/regress/expected/plpgsql.out b/src/test/regress/expected/plpgsql.out index 2d26be1a81..56ed4e14bf 100644 --- a/src/test/regress/expected/plpgsql.out
Re: [BUG] Logical replica crash if there was an error in a function.
"Anton A. Melnikov" writes: > Now there are no any pending questions, so moved it to RFC. I did not think this case was worth memorializing in a test before, and I still do not. I'm inclined to reject this patch. regards, tom lane
Re: Should vacuum process config file reload more often
Melanie Plageman writes: > v13 attached with requested updates. I'm afraid I'd not been paying any attention to this discussion, but better late than never. I'm okay with letting autovacuum processes reload config files more often than now. However, I object to allowing ProcessConfigFile to be called from within commands in a normal user backend. The existing semantics are that user backends respond to SIGHUP only at the start of processing a user command, and I'm uncomfortable with suddenly deciding that that can work differently if the command happens to be VACUUM. It seems unprincipled and perhaps actively unsafe. regards, tom lane
Re: Minimal logical decoding on standbys
Hi, On 4/3/23 8:10 AM, Drouvot, Bertrand wrote: Hi, On 4/3/23 7:35 AM, Amit Kapila wrote: On Mon, Apr 3, 2023 at 4:26 AM Jeff Davis wrote: Agreed, even Bertrand and myself discussed the same approach few emails above. BTW, if we have this selective logic to wake physical/logical walsenders and for standby's, we only wake logical walsenders at the time of ApplyWalRecord() then do we need the new conditional variable enhancement being discussed, and if so, why? Thank you both for this new idea and discussion. In that case I don't think we need the new CV API and the use of a CV anymore. As just said up-thread I'll submit a new proposal with this new approach. Please find enclosed V57 implementing the new approach in 0004. With the new approach in place the TAP tests (0005) work like a charm (no delay and even after a promotion). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comFrom 0d198484e008090a524562076326054be56935ca Mon Sep 17 00:00:00 2001 From: bdrouvotAWS Date: Tue, 7 Feb 2023 14:08:11 + Subject: [PATCH v57 6/6] Doc changes describing details about logical decoding. Author: Andres Freund (in an older version), Amit Khandekar, Bertrand Drouvot Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes Mello --- doc/src/sgml/logicaldecoding.sgml | 22 ++ 1 file changed, 22 insertions(+) 100.0% doc/src/sgml/ diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml index 4e912b4bd4..3da254ed1f 100644 --- a/doc/src/sgml/logicaldecoding.sgml +++ b/doc/src/sgml/logicaldecoding.sgml @@ -316,6 +316,28 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU may consume changes from a slot at any given time. + + A logical replication slot can also be created on a hot standby. To prevent + VACUUM from removing required rows from the system + catalogs, hot_standby_feedback should be set on the + standby. 
In spite of that, if any required rows get removed, the slot gets + invalidated. It's highly recommended to use a physical slot between the primary + and the standby. Otherwise, hot_standby_feedback will work, but only while the + connection is alive (for example a node restart would break it). Existing + logical slots on standby also get invalidated if wal_level on primary is reduced to + less than 'logical'. + + + + For a logical slot to be created, it builds a historic snapshot, for which + information of all the currently running transactions is essential. On + primary, this information is available, but on standby, this information + has to be obtained from primary. So, slot creation may wait for some + activity to happen on the primary. If the primary is idle, creating a + logical slot on standby may take a noticeable time. One option to speed it + is to call the pg_log_standby_snapshot on the primary. + + Replication slots persist across crashes and know nothing about the state -- 2.34.1 From 4e392c06e39f36c4185780a21f2b90c7c6a97de4 Mon Sep 17 00:00:00 2001 From: bdrouvotAWS Date: Tue, 7 Feb 2023 09:04:12 + Subject: [PATCH v57 5/6] New TAP test for logical decoding on standby. In addition to the new TAP test, this commit introduces a new pg_log_standby_snapshot() function. The idea is to be able to take a snapshot of running transactions and write this to WAL without requesting for a (costly) checkpoint. 
Author: Craig Ringer (in an older version), Amit Khandekar, Bertrand Drouvot Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes Mello --- doc/src/sgml/func.sgml| 15 + src/backend/access/transam/xlogfuncs.c| 32 + src/backend/catalog/system_functions.sql | 2 + src/include/catalog/pg_proc.dat | 3 + src/test/perl/PostgreSQL/Test/Cluster.pm | 37 + src/test/recovery/meson.build | 1 + .../t/035_standby_logical_decoding.pl | 705 ++ 7 files changed, 795 insertions(+) 3.1% src/backend/ 4.0% src/test/perl/PostgreSQL/Test/ 89.7% src/test/recovery/t/ diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 918a492234..939fb8019f 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -27034,6 +27034,21 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset prepared with . + + + + pg_log_standby_snapshot + +pg_log_standby_snapshot () +pg_lsn + + +Take a snapshot of running transactions and write this to WAL without +having to wait bgwriter or checkpointer to log one. This one is useful for +logical decoding on standby for which logical slot creation is hanging +until such a
Re: [Proposal] Add foreign-server health checks infrastructure
Fujii Masao writes: > Regarding 0001 patch, on second thought, to me, it seems odd to expose > a function that doesn't have anything to directly do with PostgreSQL > as a libpq function. The function simply calls poll() on the socket > with POLLRDHUP if it is supported. While it is certainly convenient to > have this function, I'm not sure that it fits within the scope of libpq. > Thought? Yeah, that does not seem great, partly because the semantics would be platform-dependent. I don't think we want that to become part of libpq's API. regards, tom lane
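The platform-dependence concern here is concrete: POLLRDHUP is a Linux-only extension, and other platforms only report POLLHUP/POLLERR. A minimal sketch of this style of socket-liveness probe (the function name is hypothetical, not the patch's actual code, and the fallback behavior illustrates exactly why the semantics differ across platforms):

```c
/*
 * Minimal sketch of a non-blocking socket-liveness probe using poll().
 * POLLRDHUP is a Linux-specific extension (requires _GNU_SOURCE); on
 * other platforms we fall back to POLLHUP/POLLERR only, which is the
 * platform-dependence concern raised above.  Hypothetical name, not
 * libpq's actual API.
 */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>			/* socketpair(), for a quick self-check */
#include <unistd.h>
#include <assert.h>

#ifndef POLLRDHUP
#define POLLRDHUP 0				/* not available outside Linux */
#endif

static bool
socket_peer_seems_closed(int sock)
{
	struct pollfd pfd;
	int			rc;

	pfd.fd = sock;
	pfd.events = POLLRDHUP;
	pfd.revents = 0;

	rc = poll(&pfd, 1, 0);		/* timeout 0: probe without blocking */
	if (rc < 0)
		return true;			/* treat a poll() failure as suspect */

	/* POLLHUP/POLLERR/POLLNVAL are reported even if not requested */
	return (pfd.revents & (POLLRDHUP | POLLHUP | POLLERR | POLLNVAL)) != 0;
}
```

A quick way to exercise it is a socketpair: the probe returns false while the peer end is open, and true once the peer end has been closed.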
Re: Why enable_hashjoin Completely disables HashJoin
Robert Haas writes: > On Mon, Apr 3, 2023 at 8:13 AM Tom Lane wrote: >> Personally, I'd get rid of disable_cost altogether if I could. >> I'm not in a hurry to extend its use to more places. > I agree. I've wondered if we should put some work into that. It feels > bad to waste CPU cycles generating paths we intend to basically just > throw away, and it feels even worse if they manage to beat out some > other path on cost. > It hasn't been obvious to me how we could restructure the existing > logic to avoid relying on disable_cost. Yeah. In some places it would not be too hard; for example, if we generated seqscan paths last instead of first for baserels, the rule could be "generate it if enable_seqscan is on OR if we made no other path for the rel". It's much messier for joins though, partly because the same joinrel will be considered multiple times as we process different join orderings, plus it's usually unclear whether failing to generate any paths for joinrel X will lead to overall failure. A solution that would work is to treat disable_cost as a form of infinity that's counted separately from the actual cost estimate, that is we label paths as "cost X, plus there are N uses of disabled plan types". Then you sort first on N and after that on X. But this'd add a good number of cycles to add_path, which I've not wanted to expend on a non-mainstream usage. regards, tom lane
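The "disable_cost as a separately counted infinity" idea amounts to a lexicographic comparison: first on the number of disabled plan-type uses, then on the actual cost. A toy sketch (hypothetical names; the real add_path() also weighs pathkeys and row counts and uses fuzzy cost comparison, all ignored here):

```c
/*
 * Sketch of comparing paths by (number of disabled plan-type uses,
 * then estimated cost), per the idea above.  Field and function names
 * are hypothetical, not planner code.
 */
typedef struct SketchPath
{
	double		total_cost;		/* estimated cost, no disable_cost mixed in */
	int			disabled_nodes; /* how many disabled plan types it uses */
} SketchPath;

/* Returns <0 if a is the better path, >0 if b is, 0 if tied. */
static int
sketch_compare_paths(const SketchPath *a, const SketchPath *b)
{
	/* sort first on N (disabled uses)... */
	if (a->disabled_nodes != b->disabled_nodes)
		return (a->disabled_nodes < b->disabled_nodes) ? -1 : 1;
	/* ...and after that on X (cost) */
	if (a->total_cost != b->total_cost)
		return (a->total_cost < b->total_cost) ? -1 : 1;
	return 0;
}
```

With this ordering, a cheap path that uses a disabled plan type always loses to any path that uses none, yet can still win if every alternative is also disabled — which is the behavior disable_cost only approximates.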
Re: Non-superuser subscription owners
On Sat, Apr 1, 2023 at 12:00 PM Alexander Lakhin wrote: > I've managed to reproduce it using the following script: > for ((i=1;i<=10;i++)); do > echo "iteration $i" > echo " > CREATE ROLE sub_user; > CREATE SUBSCRIPTION testsub CONNECTION 'dbname=db' > PUBLICATION testpub WITH (connect = false); > ALTER SUBSCRIPTION testsub ENABLE; > DROP SUBSCRIPTION testsub; > SELECT pg_sleep(0.001); > DROP ROLE sub_user; > " | psql > psql -c "ALTER SUBSCRIPTION testsub DISABLE;" > psql -c "ALTER SUBSCRIPTION testsub SET (slot_name = NONE);" > psql -c "DROP SUBSCRIPTION testsub;" > grep 'TRAP' server.log && break > done After a bit of experimentation this repro worked for me -- I needed -DRELCACHE_FORCE_RELEASE as well, and a bigger iteration count. I verified that the patch fixed it, and committed the patch with the addition of a comment. Thanks very much for this repro, and likewise many thanks to Hou Zhijie for the report and patch. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [Proposal] Add foreign-server health checks infrastructure
On 2023/03/13 16:05, Hayato Kuroda (Fujitsu) wrote: Thank you so much for your reviewing! Now we can wait comments from senior members and committers. Thanks for working on this patch! Regarding 0001 patch, on second thought, to me, it seems odd to expose a function that doesn't have anything to directly do with PostgreSQL as a libpq function. The function simply calls poll() on the socket with POLLRDHUP if it is supported. While it is certainly convenient to have this function, I'm not sure that it fits within the scope of libpq. Thought? Regarding 0002 patch, the behavior of postgres_fdw_verify_connection_states() in regards to multiple connections using different user mappings seems not well documented. The function seems to return false if either of those connections has been closed. This behavior means that it's difficult to identify which connection has been closed when there are multiple ones to the given server. To make it easier to identify that, it could be helpful to extend the postgres_fdw_verify_connection_states() function so that it accepts a unique connection as an input instead of just the server name. One suggestion is to extend the function so that it accepts both the server name and the user name used for the connection, and checks the specified connection. If only the server name is specified, the function should check all connections to the server and return false if any of them are closed. This would be helpful since there is typically only one connection to the server in most cases. Additionally, it would be helpful to extend the postgres_fdw_get_connections() function to also return the user name used for each connection, as currently there is no straightforward way to identify it. The function name "postgres_fdw_verify_connection_states" may seem unnecessarily long to me. A simpler name like "postgres_fdw_verify_connection" may be enough? 
The patch may not be ready for commit due to the review comments, and with the feature freeze approaching in a few days, it may not be possible to include this feature in v16. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Why enable_hashjoin Completely disables HashJoin
On Mon, Apr 3, 2023 at 8:13 AM Tom Lane wrote: > Personally, I'd get rid of disable_cost altogether if I could. > I'm not in a hurry to extend its use to more places. I agree. I've wondered if we should put some work into that. It feels bad to waste CPU cycles generating paths we intend to basically just throw away, and it feels even worse if they manage to beat out some other path on cost. It hasn't been obvious to me how we could restructure the existing logic to avoid relying on disable_cost. I sort of feel like it should be a two-pass algorithm: go through and generate all the path types that aren't disabled, and then if that results in no paths, try a do-over where you ignore the disable flags (or just some of them). But the code structure doesn't seem particularly amenable to that kind of thing. This hasn't caused me enough headaches yet that I've been willing to invest time in it, but it has caused me more than zero headaches... -- Robert Haas EDB: http://www.enterprisedb.com
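The two-pass "do-over" idea described above can be sketched roughly like this (all names hypothetical, nothing like actual planner code — it only shows the control flow of generating enabled paths first and retrying without the disable flags when that yields nothing):

```c
/*
 * Sketch of the two-pass idea: generate paths for enabled strategies
 * first, and only if that yields no path at all, retry ignoring the
 * disable flags.  Hypothetical names, not planner code.
 */
#include <stdbool.h>

#define NUM_STRATEGIES 3		/* e.g. seqscan, indexscan, bitmapscan */

static int
generate_paths(const bool *enabled, bool ignore_disables,
			   const bool *feasible, int *paths_out)
{
	int			npaths = 0;

	for (int i = 0; i < NUM_STRATEGIES; i++)
	{
		if (!feasible[i])
			continue;			/* strategy can't produce a plan here */
		if (!enabled[i] && !ignore_disables)
			continue;			/* respect the disable flag in pass 1 */
		paths_out[npaths++] = i;
	}
	return npaths;
}

static int
generate_paths_two_pass(const bool *enabled, const bool *feasible,
						int *paths_out)
{
	int			npaths = generate_paths(enabled, false, feasible, paths_out);

	if (npaths == 0)			/* everything usable was disabled: do-over */
		npaths = generate_paths(enabled, true, feasible, paths_out);
	return npaths;
}
```

The join-planning difficulty mentioned above is that, unlike this flat loop, the same joinrel is revisited under different join orders, so "no paths yet" at one point does not mean the do-over is actually needed.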
Re: possible proposal plpgsql GET DIAGNOSTICS oid = PG_ROUTINE_OID
Pavel Stehule writes: > po 27. 3. 2023 v 5:36 odesílatel Kirk Wolak napsal: >> I have marked the item Ready for Commiter... > Thank you for doc and for review I'm kind of surprised there was any interest in this proposal at all, TBH, but apparently there is some. Still, I think you over-engineered it by doing more than the original proposal of making the function OID available. The other things can be had by casting the OID to regproc or regprocedure, so I'd be inclined to add just one new keyword not three. Besides, your implementation is a bit inconsistent: relying on fn_signature could return a result that is stale or doesn't conform to the current search_path. regards, tom lane
Re: [BUG] pg_stat_statements and extended query protocol
"Imseih (AWS), Sami" writes: >> So... The idea here is to set a custom fetch size so as the number of >> calls can be deterministic in the tests, still more than 1 for the >> tests we'd have. And your point is that libpq enforces always 0 when >> sending the EXECUTE message causing it to always return all the rows >> for any caller of PQsendQueryGuts(). > That is correct. Hi, I took a quick look through this thread, and I have a couple of thoughts: * Yeah, it'd be nice to have an in-core test, but it's folly to insist on one that works via libpq and psql. That requires a whole new set of features that you're apparently designing on-the-fly with no other use cases in mind. I don't think that will accomplish much except to ensure that this bug fix doesn't make it into v16. * I don't understand why it was thought good to add two new counters to struct Instrumentation. In EXPLAIN ANALYZE cases those will be wasted space *per plan node*, not per Query. * It also seems quite bizarre to add counters to a core data structure and then leave it to pg_stat_statements to maintain them. (BTW, I didn't much care for putting that maintenance into pgss_ExecutorRun without updating its header comment.) I'm inclined to think that adding the counters to struct EState is fine. That's 304 bytes already on 64-bit platforms, another 8 or 16 won't matter. Also, I'm doubtful that counting calls this way is a great idea, which would mean you only need one new counter field not two. The fact that you're having trouble defining what it means certainly suggests that the implementation is out front of the design. In short, what I think I'd suggest is adding an es_total_processed field to EState and having standard_ExecutorRun do "es_total_processed += es_processed" near the end, then just change pg_stat_statements to use es_total_processed not es_processed. regards, tom lane
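The suggested fix boils down to accumulating the per-ExecutorRun row count into a query-lifetime total, so that partial fetches over the extended query protocol still sum correctly. A heavily simplified sketch (the real EState lives in execnodes.h and carries far more state; standard_ExecutorRun does much more than this):

```c
/*
 * Toy sketch of the suggested fix: accumulate the per-run row count
 * (es_processed) into a query-lifetime total (es_total_processed),
 * so that fetching a portal in chunks still yields the right total.
 * Struct and names are simplified stand-ins, not executor code.
 */
#include <stdint.h>

typedef struct SketchEState
{
	uint64_t	es_processed;		/* rows from the most recent run */
	uint64_t	es_total_processed; /* rows across all runs of this query */
} SketchEState;

/* Stand-in for one ExecutorRun call that fetches some rows. */
static void
sketch_executor_run(SketchEState *estate, uint64_t rows_this_call)
{
	estate->es_processed = rows_this_call;
	/* ...plan execution would happen here... */

	/* near the end, per the suggestion above: */
	estate->es_total_processed += estate->es_processed;
}
```

With a FETCH size of 10 over 25 rows, three runs leave es_processed at 5 but es_total_processed at 25 — the number pg_stat_statements would then report.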
Re: Should vacuum process config file reload more often
On Sun, Apr 2, 2023 at 10:28 PM Masahiko Sawada wrote: > Thank you for updating the patches. Here are comments for 0001, 0002, > and 0003 patches: Thanks for the review! v13 attached with requested updates. > 0001: > > @@ -391,7 +389,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, > Assert(params->index_cleanup != VACOPTVALUE_UNSPECIFIED); > Assert(params->truncate != VACOPTVALUE_UNSPECIFIED && > params->truncate != VACOPTVALUE_AUTO); > -vacrel->failsafe_active = false; > +VacuumFailsafeActive = false; > > If we go with the idea of using VacuumCostActive + > VacuumFailsafeActive, we need to make sure that both are cleared at > the end of the vacuum per table. Since the patch clears it only here, > it remains true even after vacuum() if we trigger the failsafe mode > for the last table in the table list. > > In addition to that, to ensure that also in an error case, I think we > need to clear it also in PG_FINALLY() block in vacuum(). So, in 0001, I tried to keep it exactly the same as LVRelState->failsafe_active except for it being a global. We don't actually use VacuumFailsafeActive in this commit except in vacuumlazy.c, which does its own management of the value (it resets it to false at the top of heap_vacuum_rel()). In the later commit which references VacuumFailsafeActive outside of vacuumlazy.c, I had reset it in PG_FINALLY(). I hadn't reset it in the relation list loop in vacuum(). Autovacuum calls vacuum() for each relation. However, you are right that for VACUUM with a list of relations for a table access method other than heap, once set to true, if the table AM forgets to reset the value to false at the end of vacuuming the relation, it would stay true. I've set it to false now at the bottom of the loop through relations in vacuum(). 
> --- > @@ -306,6 +306,7 @@ extern PGDLLIMPORT pg_atomic_uint32 > *VacuumSharedCostBalance; > extern PGDLLIMPORT pg_atomic_uint32 *VacuumActiveNWorkers; > extern PGDLLIMPORT int VacuumCostBalanceLocal; > > +extern bool VacuumFailsafeActive; > > Do we need PGDLLIMPORT for VacuumFailSafeActive? I didn't add one because I thought extensions and other code probably shouldn't access this variable. I thought PGDLLIMPORT was only needed for extensions built on windows to access variables. > 0002: > > @@ -2388,6 +2398,7 @@ vac_max_items_to_alloc_size(int max_items) > return offsetof(VacDeadItems, items) + > sizeof(ItemPointerData) * max_items; > } > > + > /* > * vac_tid_reaped() -- is a particular tid deletable? > * > > Unnecessary new line. There are some other unnecessary new lines in this > patch. Thanks! I think I got them all. > --- > @@ -307,6 +309,8 @@ extern PGDLLIMPORT pg_atomic_uint32 *VacuumActiveNWorkers; > extern PGDLLIMPORT int VacuumCostBalanceLocal; > > extern bool VacuumFailsafeActive; > +extern int VacuumCostLimit; > +extern double VacuumCostDelay; > > and > > @@ -266,8 +266,6 @@ extern PGDLLIMPORT int max_parallel_maintenance_workers; > extern PGDLLIMPORT int VacuumCostPageHit; > extern PGDLLIMPORT int VacuumCostPageMiss; > extern PGDLLIMPORT int VacuumCostPageDirty; > -extern PGDLLIMPORT int VacuumCostLimit; > -extern PGDLLIMPORT double VacuumCostDelay; > > Do we need PGDLLIMPORT too? I was on the fence about this. I annotated the new guc variables vacuum_cost_delay and vacuum_cost_limit with PGDLLIMPORT, but I did not annotate the variables used in vacuum code (VacuumCostLimit/Delay). I think whether or not this is the right choice depends on two things: whether or not my understanding of PGDLLIMPORT is correct and, if it is, whether or not we want extensions to be able to access VacuumCostLimit/Delay or if just access to the guc variables is sufficient/desirable. 
> --- > @@ -1773,20 +1773,33 @@ FreeWorkerInfo(int code, Datum arg) > } > } > > + > /* > - * Update the cost-based delay parameters, so that multiple workers consume > - * each a fraction of the total available I/O. > + * Update vacuum cost-based delay-related parameters for autovacuum workers > and > + * backends executing VACUUM or ANALYZE using the value of relevant gucs and > + * global state. This must be called during setup for vacuum and after every > + * config reload to ensure up-to-date values. > */ > void > -AutoVacuumUpdateDelay(void) > +VacuumUpdateCosts(void > > Isn't it better to define VacuumUpdateCosts() in vacuum.c rather than > autovacuum.c as this is now a common code for both vacuum and > autovacuum? We can't access members of WorkerInfoData from inside vacuum.c > 0003: > > @@ -501,9 +502,9 @@ vacuum(List *relations, VacuumParams *params, > { > ListCell *cur; > > -VacuumUpdateCosts(); > in_vacuum = true; > -VacuumCostActive = (VacuumCostDelay > 0); > +VacuumFailsafeActive = false; > +VacuumUpdateCosts(); > > Hmm, if we initialize
Re: Request for comment on setting binary format output per session
>> participating clients to receive GUC configured format. I do not think that libpq's result format being able to be overridden by GUC is a good idea at all; the library has to participate, and I think it can be made to do so without adjusting the interface (say, by resultFormat = 3). > Interesting idea. I suppose you'd need to specify 3 for all result columns? That is a protocol change, but wouldn't "break" older clients. The newer clients would need to make sure that they're connecting to v16+, so using the protocol version alone wouldn't be enough. Hmm. So this only works with extended protocol and not simple queries. Again, as Peter mentioned, it's already easy enough to confuse psql using binary cursors, so it makes sense to fix psql either way. If you use resultFormat (3) I think you'd still end up doing the Describe (which we are trying to avoid) to make sure you could receive all the columns in binary. Dave
Re: Minimal logical decoding on standbys
On 2023-Apr-03, Drouvot, Bertrand wrote: > +/* > + * Report terminating or conflicting message. > + * > + * For both, logical conflict on standby and obsolete slot are handled. > + */ > +static void > +ReportTerminationInvalidation(bool terminating, bool islogical, int pid, > + NameData slotname, > TransactionId *xid, > + XLogRecPtr > restart_lsn, XLogRecPtr oldestLSN) > +{ > + if (terminating) > + appendStringInfo(_msg, _("terminating process %d to release > replication slot \"%s\""), > + pid, > + NameStr(slotname)); > + else > + appendStringInfo(_msg, _("invalidating")); > + > + if (islogical) > + { > + if (terminating) > + appendStringInfo(_msg, _(" because it conflicts > with recovery")); You can't build the strings this way, because it's not possible to put the strings into the translation machinery. You need to write full strings for each separate case instead, without appending other string parts later. Thanks -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)
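The underlying rule is that gettext extracts whole string literals at build time, so every user-facing message must exist as one complete literal; fragments appended at runtime never reach the translators. A schematic contrast, with _() stubbed to the identity (in PostgreSQL it expands to a gettext lookup, and the message texts here are only illustrative):

```c
/*
 * Why piecewise message building defeats translation: message catalogs
 * are keyed on whole literals extracted at build time.  Here _() is
 * stubbed to the identity; in PostgreSQL it is a gettext lookup.
 */
#include <stdio.h>
#include <string.h>

#define _(x) (x)				/* stub; really gettext(x) */

/* Broken: translators only ever see the two fragments, never the sentence. */
static void
report_fragmented(char *buf, size_t len, int pid, const char *slot)
{
	snprintf(buf, len,
			 _("terminating process %d to release replication slot \"%s\""),
			 pid, slot);
	strncat(buf, _(" because it conflicts with recovery"),
			len - strlen(buf) - 1);
}

/* Translatable: one complete literal per distinct message. */
static void
report_whole(char *buf, size_t len, int pid, const char *slot)
{
	snprintf(buf, len,
			 _("terminating process %d to release replication slot \"%s\" because it conflicts with recovery"),
			 pid, slot);
}
```

Both produce the same English text, but only the second form gives a translator a complete sentence to work with — hence the advice to write a full string per case rather than appending parts.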
Re: Minimal logical decoding on standbys
Hi, On 4/2/23 10:10 PM, Andres Freund wrote: Hi, Btw, most of the patches have some things that pgindent will change (and some that my editor will highlight). It wouldn't hurt to run pgindent for the later patches... done. Pushed the WAL format change. Thanks! On 2023-04-02 10:27:45 +0200, Drouvot, Bertrand wrote: 5.3% doc/src/sgml/ 6.2% src/backend/access/transam/ 4.6% src/backend/replication/logical/ 55.6% src/backend/replication/ 4.4% src/backend/storage/ipc/ 6.9% src/backend/tcop/ 5.3% src/backend/ 3.8% src/include/catalog/ 5.3% src/include/replication/ I think it might be worth trying to split this up a bit. Okay. Split in 2 parts in V56 enclosed. One part to handle logical slot conflicts on standby, and one part to arrange for a new pg_stat_database_conflicts and pg_replication_slots field. restart_lsn = s->data.restart_lsn; - - /* -* If the slot is already invalid or is fresh enough, we don't need to -* do anything. -*/ - if (XLogRecPtrIsInvalid(restart_lsn) || restart_lsn >= oldestLSN) + slot_xmin = s->data.xmin; + slot_catalog_xmin = s->data.catalog_xmin; + + /* the slot has been invalidated (logical decoding conflict case) */ + if ((xid && ((LogicalReplicationSlotIsInvalid(s)) || + /* or the xid is valid and this is a non conflicting slot */ +(TransactionIdIsValid(*xid) && !(LogicalReplicationSlotXidsConflict(slot_xmin, slot_catalog_xmin, *xid) || + /* or the slot has been invalidated (obsolete LSN case) */ + (!xid && (XLogRecPtrIsInvalid(restart_lsn) || restart_lsn >= oldestLSN))) { This still looks nearly unreadable. I suggest moving comments outside of the if (), remove redundant parentheses, use a function to detect if the slot has been invalidated. I made it as simple as: /* * If the slot is already invalid or is a non conflicting slot, we don't * need to do anything. */ islogical = xid ? true : false; if (SlotIsInvalid(s, islogical) || SlotIsNotConflicting(s, islogical, xid, )) in V56 attached. 
@@ -1329,16 +1345,45 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN, */ if (last_signaled_pid != active_pid) { + boolsend_signal = false; + + initStringInfo(_msg); + initStringInfo(_detail); + + appendStringInfo(_msg, "terminating process %d to release replication slot \"%s\"", +active_pid, + NameStr(slotname)); For this to be translatable you need to use _("message"). Thanks! + if (xid) + { + appendStringInfo(_msg, " because it conflicts with recovery"); + send_signal = true; + + if (TransactionIdIsValid(*xid)) + appendStringInfo(_detail, "The slot conflicted with xid horizon %u.", *xid); + else + appendStringInfo(_detail, "Logical decoding on standby requires wal_level to be at least logical on the primary server"); + } + else + { + appendStringInfo(_detail, "The slot's restart_lsn %X/%X exceeds the limit by %llu bytes.", + LSN_FORMAT_ARGS(restart_lsn), + (unsigned long long) (oldestLSN - restart_lsn)); + } + ereport(LOG, - errmsg("terminating process %d to release replication slot \"%s\"", - active_pid, NameStr(slotname)), - errdetail("The slot's restart_lsn %X/%X exceeds the limit by %llu bytes.", - LSN_FORMAT_ARGS(restart_lsn), - (unsigned long long) (oldestLSN - restart_lsn)), - errhint("You might need to increase max_slot_wal_keep_size.")); - - (void)
Re: Initial Schema Sync for Logical Replication
On Mon, Apr 3, 2023 at 3:54 PM Kumar, Sachin wrote: > > > > > -Original Message- > > From: Masahiko Sawada > > > > I was thinking each TableSync process will call pg_dump --table, > > > > This way if we have N tableSync process, we can have N pg_dump -- > > table=table_name called in parallel. > > > > In fact we can use --schema-only to get schema and then let COPY > > > > take care of data syncing . We will use same snapshot for pg_dump as > > > > well > > as COPY table. > > > > > > How can we postpone creating the pg_subscription_rel entries until the > > > tablesync worker starts and does the schema sync? I think that since > > > pg_subscription_rel entry needs the table OID, we need either to do > > > the schema sync before creating the entry (i.e, during CREATE > > > SUBSCRIPTION) or to postpone creating entries as Amit proposed[1]. The > > > apply worker needs the information of tables to sync in order to > > > launch the tablesync workers, but it needs to create the table schema > > > to get that information. > > > > For the above reason, I think that step 6 of the initial proposal won't > > work. > > > > If we can have the tablesync worker create an entry of pg_subscription_rel > > after > > creating the table, it may give us the flexibility to perform the initial > > sync. One > > idea is that we add a relname field to pg_subscription_rel so that we can > > create > > entries with relname instead of OID if the table is not created yet. Once > > the > > table is created, we clear the relname field and set the OID of the table > > instead. > > It's not an ideal solution but we might make it simpler later. > > > > Assuming that it's feasible, I'm considering another approach for the > > initial sync > > in order to address the concurrent DDLs. > > > > The basic idea is to somewhat follow how pg_dump/restore to dump/restore > > the database data. 
We divide the synchronization phase (including both > > schema > > and data) up into three phases: pre-data, table-data, post-data. These > > mostly > > follow the --section option of pg_dump. > > > > 1. The backend process performing CREATE SUBSCRIPTION creates the > > subscription but doesn't create pg_subscription_rel entries yet. > > > > 2. Before starting the streaming, the apply worker fetches the table list > > from the > > publisher, create pg_subscription_rel entries for them, and dumps+restores > > database objects that tables could depend on but don't depend on tables > > such as > > TYPE, OPERATOR, ACCESS METHOD etc (i.e. > > pre-data). > > We will not have slot starting snapshot, So somehow we have to get a new > snapshot > And skip all the wal_log between starting of slot and snapshot creation lsn ? > . Yes. Or we can somehow postpone creating pg_subscription_rel entries until the tablesync workers create tables, or we request walsender to establish a new snapshot using a technique proposed in email[1]. > > > > > 3. The apply worker launches the tablesync workers for tables that need to > > be > > synchronized. > > > > There might be DDLs executed on the publisher for tables before the > > tablesync > > worker starts. But the apply worker needs to apply DDLs for pre-data > > database > > objects. OTOH, it can ignore DDLs for not-synced-yet tables and other > > database > > objects such as INDEX, TRIGGER, RULE, etc (i.e. post-data). > > > > 4. The tablesync worker creates its replication slot, dumps+restores the > > table > > schema, update the pg_subscription_rel, and perform COPY. > > > > These operations should be done in the same transaction. > > pg_restore wont be rollbackable, So we need to maintain states in > pg_subscription_rel. Yes. But I think it depends on how we restore them. 
For example, if we have the tablesync worker somehow restore the table using a new SQL function returning the table schema as we discussed or executing the dump file via SPI, we can do that in the same transaction. > > > > > 5. After finishing COPY, the tablesync worker dumps indexes (and perhaps > > constraints) of the table and creates them (which possibly takes a long > > time). > > Then it starts to catch up, same as today. The apply worker needs to wait > > for the > > tablesync worker to catch up. > > I don’t think we can have a CATCHUP stage. We can have a DDL on publisher which > can add a new column (And this DDL will be executed by applier later). Then > we get an INSERT > because we have old definition of table, insert will fail. All DMLs and DDLs associated with the table being synchronized are applied by the tablesync worker until it catches up with the apply worker. > > > > > We need to repeat these steps until we complete the initial data copy and > > create > > indexes for all tables, IOW until all pg_subscription_rel status becomes > > READY. > > > > 6. If the apply worker confirms all tables are READY, it starts another sync > > worker who is responsible for the post-data database objects such as >
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode
On Mon, Apr 3, 2023 at 1:09 AM David Rowley wrote: > > On Sat, 1 Apr 2023 at 13:24, Melanie Plageman > wrote: > > Your diff LGTM. > > > > Earlier upthread in [1], Bharath had mentioned in a review comment about > > removing the global variables that he would have expected the analogous > > global in analyze.c to also be removed (vac_strategy [and analyze.c also > > has anl_context]). > > > > I looked into doing this, and this is what I found out (see full > > rationale in [2]): > > > > > it is a bit harder to remove it from analyze because acquire_func > > > doesn't take the buffer access strategy as a parameter and > > > acquire_sample_rows uses the vac_context global variable to pass to > > > table_scan_analyze_next_block(). > > > > I don't know if this is worth mentioning in the commit removing the > > other globals? Maybe it will just make it more confusing... > > I did look at that, but it seems a little tricky to make work unless > the AcquireSampleRowsFunc signature was changed. To me, it just does > not seem worth doing that to get rid of the two globals in analyze.c. Yes, I came to basically the same conclusion. On Mon, Apr 3, 2023 at 7:57 AM David Rowley wrote: > > I've now pushed up v8-0004. Can rebase the remaining 2 patches on top > of master again and resend? v9 attached. > On Mon, 3 Apr 2023 at 08:11, Melanie Plageman > wrote: > > I still have a few open questions: > > - what the initial value of ring_size for autovacuum should be (see the > > one remaining TODO in the code) > > I assume you're talking about the 256KB BAS_VACUUM one set in > GetAccessStrategy()? I don't think this patch should be doing anything > to change those defaults. Anything that does that should likely have > a new thread and come with analysis or reasoning about why the newly > proposed defaults are better than the old ones. I actually was talking about something much more trivial but a little more confusing. 
In table_recheck_autovac(), I initialize the autovac_table->at_params.ring_size to the value of the vacuum_buffer_usage_limit guc. However, autovacuum makes its own BufferAccessStrategy object (instead of relying on vacuum() to do it) and passes that in to vacuum(). So, if we wanted autovacuum to disable use of a strategy (and use as many shared buffers as it likes), it would pass in NULL to vacuum(). If vacuum_buffer_usage_limit is not 0, then we would end up making and using a BufferAccessStrategy in vacuum(). If we instead initialized autovac_table->at_params.ring_size to 0, even if the passed in BufferAccessStrategy is NULL, we wouldn't make a ring for autovacuum. Right now, we don't disable the strategy for autovacuum except in failsafe mode. And it is unclear when or why we would want to. I also thought it might be weird to have the value of the ring_size be initialized to something other than the value of vacuum_buffer_usage_limit for autovacuum, since it is supposed to use that guc value. In fact, right now, we don't use the autovac_table->at_params.ring_size set in table_recheck_autovac() when making the ring in do_autovacuum() but instead use the guc directly. I actually don't really like how vacuum() relies on the BufferAccessStrategy parameter being NULL for autovacuum and feel like there is a more intuitive way to handle all this. But, I didn't want to make major changes at this point. Anyway, the above is quite a bit more analysis than the issue is really worth. We should pick something and then document it in a comment. > > - should ANALYZE allow specifying BUFFER_USAGE_LIMIT since it uses the guc > > value when that is set? > > That's a good question... I kinda think we should just skip it. It adds to the surface area of the feature. > > - should INDEX_CLEANUP off cause VACUUM to use shared buffers and > > disable use of a strategy (like failsafe vacuum) > > I don't see why it should. 
It seems strange to have one option > magically make changes to some other option. Sure, sounds good. > > - should we add anything to VACUUM VERBOSE output about the number of > > reuses of strategy buffers? > > Sounds like this would require an extra array of counter variables in > BufferAccessStrategyData? I think it might be a bit late to start > experimenting with this. Makes sense. I hadn't thought through the implementation. We count reuses in pg_stat_io data structures but that is global and not per BufferAccessStrategyData instance, so I agree to scrapping this idea. > > - Should we make BufferAccessStrategyData non-opaque so that we don't > > have to add a getter for nbuffers. I could have implemented this in > > another way, but I don't really see why BufferAccessStrategyData > > should be opaque > > If nothing outside of the .c file requires access then there's little > need to make the members known outside of the file. Same as you'd want > to make classes private rather than public when possible in OOP. > > If you do come up with a reason to be able to determine
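[Editor's illustration] The ring_size = 0 versus NULL-strategy distinction discussed in this thread can be modeled in a few lines. This is a hypothetical sketch of the general ring-buffer-strategy pattern, not PostgreSQL's actual BufferAccessStrategyData:

```python
BLCKSZ_KB = 8  # assume 8kB blocks, as in a default build

class RingStrategy:
    """A fixed-size ring of buffer slots, reused round-robin."""
    def __init__(self, nbuffers):
        self.nbuffers = nbuffers
        self.current = 0

    def next_buffer(self):
        # Hand out the next ring slot, wrapping around; the wrap-around is
        # the "reuse" of strategy buffers mentioned in the thread.
        slot = self.current
        self.current = (self.current + 1) % self.nbuffers
        return slot

def make_strategy(ring_size_kb):
    # A ring size of 0 means "no ring": return None so the caller reads
    # through shared buffers without a strategy, mirroring the behavior
    # described above.
    if ring_size_kb == 0:
        return None
    return RingStrategy(ring_size_kb // BLCKSZ_KB)
```

Under this model, autovacuum building its own strategy and passing it in bypasses make_strategy() entirely, which is the asymmetry Melanie describes.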
Re: is_superuser is not documented
On 2023/04/01 22:34, Joseph Koshakow wrote: The patch updated the guc table for is_superuser in src/backend/utils/misc/guc_tables.c Yes, this patch moves the descriptions of is_superuser to config.sgml and changes its group to PRESET_OPTIONS. However, when I look at the code on master I don't see this update Yes, the patch has not been committed yet because of lack of review comments. Do you have any review comments on this patch? Or you think it's ready for committer? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: daitch_mokotoff module
On 4/3/23 15:19, Dag Lem wrote: > Dag Lem writes: > >> I sincerely hope this resolves any blocking issues with copyright / >> legalese / credits. >> > > Can this now be considered ready for committer, so that Paul or someone > else can flip the bit? > Hi, I think from the technical point of view it's sound and ready for commit. The patch stalled on the copyright/credit stuff, which is a somewhat separate and mostly non-technical aspect of patches. Sorry for that, I'm sure it's annoying/frustrating :-( I see the current patch has two simple lines: * This module was originally sponsored by Finance Norway / * Trafikkforsikringsforeningen, and implemented by Dag Lem Any objections to this level of attribution in comments? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: running logical replication as the subscription owner
Thank you for this email. It's very helpful to get your opinion on this. On Sun, Apr 2, 2023 at 11:21 PM Noah Misch wrote: > On Wed, Mar 29, 2023 at 04:00:45PM -0400, Robert Haas wrote: > > > The dangerous cases seem to be something along the lines of a security-invoker > > > trigger function that builds and executes arbitrary SQL based on > > > the row contents. And the table owner would then still need to set > > > ENABLE ALWAYS TRIGGER. > > > > > > Do we really want to take that case on as our security responsibility? > > > > That's something about which I would like to get more opinions. > > The most-plausible-to-me attack involves an ENABLE ALWAYS trigger that logs > CURRENT_USER to an audit table. The "SQL based on the row contents" scenario > feels remote. Another remotely-possible attack involves a trigger that > internally queries some other table having RLS. (Switching to the table owner > can change the rows visible to that query.) I had thought of the first of these cases, but not the second one. > If having INSERT/UPDATE privileges on the table were enough to make a > subscription that impersonates the table owner, then relatively-unprivileged > roles could make a subscription to bypass the aforementioned auditing. Commit > c3afe8c has imposed weighty requirements beyond I/U privileges, namely holding > the pg_create_subscription role and database-level CREATE privilege. Since > database-level CREATE is already powerful, it would be plausible to drop the > SET ROLE requirement and add this audit bypass to its powers. The SET ROLE > requirement is nice for keeping the powers disentangled. One drawback is > making people do GRANTs regardless of whether a relevant audit trigger exists. > Another drawback is the subscription role having more privileges than ideally > needed. I do like keeping strong privileges orthogonal, so I lean toward > keeping the SET ROLE requirement. The orthogonality argument weighs extremely heavily with me in this case. 
As I said to Jeff, I would not mind having a more granular way to control which tables a user can replicate into; e.g. a grantable REPLICAT{E,ION} privilege, or if we want something global we could have a predefined role for it, e.g. pg_replicate_into_any_table. But I think any such thing should definitely be separate from pg_create_subscription. I'll fix the typos. Thanks for reporting them. -- Robert Haas EDB: http://www.enterprisedb.com
Re: running logical replication as the subscription owner
On Fri, Mar 31, 2023 at 6:46 PM Jeff Davis wrote: > I guess the "more convenient" is where I'm confused, because the "grant > subscription_owner to table owner with set role true" is not likely to > be conveniently already present; it would need to be issued manually to > take advantage of this special case. You and I disagree about the likelihood of that, but I could well be wrong. > Do you have any concern about the weirdness where assigning the > subscription to a higher-privilege user Z would cause B's trigger to > fail? Not very much. I think the biggest risk is user confusion, but I don't think that's a huge risk because I don't think this scenario will come up very often. Also, it's kind of hard to imagine that there's a security model here which never does anything potentially surprising. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [EXTERNAL] Re: [PATCH] Report the query string that caused a memory error under Valgrind
Onur Tirtir writes: > Thank you for sharing your proposal as a patch. It looks much nicer and > useful than mine. > I've also tested it for a few cases --by injecting a memory error on > purpose-- and seen that it helps reporting the problematic query. > Should we move forward with v3 then? OK, I pushed v3 as-is. We can refine it later if anyone has suggestions. Thanks for the contribution! regards, tom lane
Re: POC: Lock updated tuples in tuple_update() and tuple_delete()
After Alexander reverted the v15 patches from master, I've rebased the v4 correction patches from the message above onto a fresh master (together with the v15 patches). The resulting v16 patch is attached. Pavel. v16-0002-Add-EvalPlanQual-delete-returning-isolation-test.patch Description: Binary data v16-0001-Allow-locking-updated-tuples-in-tuple_update-and.patch Description: Binary data
Re: Infinite Interval
Hi Joseph, On Mon, Apr 3, 2023 at 6:02 AM Joseph Koshakow wrote: > > I've attached a patch with all of the errcontext calls removed. None of > the existing out of range errors have an errdetail call so I think this > is more consistent. If we do want to add errdetail, then we should > probably do it in a later patch and add it to all out of range errors, > not just the ones related to infinity. Hmm, I realize my errcontext suggestion was in the wrong direction. We can use errdetail if required in the future. But not for this patch. Here are comments on the test and output. + infinity | | | || Infinity | Infinity | | | Infinity | Infinity | Infinity | Infinity | Infinity + -infinity | | | || -Infinity | -Infinity | | | -Infinity | -Infinity | -Infinity | -Infinity | -Infinity This is more for my education. It looks like for oscillating units we report NULL here but for monotonically increasing units we report infinity. I came across those terms in the code. But I didn't find definitions of those terms. Can you please point me to the documents/resources defining those terms? diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql index f7f8c8d2dd..1d0ab322c0 100644 --- a/src/test/regress/sql/horology.sql +++ b/src/test/regress/sql/horology.sql @@ -207,14 +207,17 @@ SELECT t.d1 AS t, i.f1 AS i, t.d1 + i.f1 AS "add", t.d1 - i.f1 AS "subtract" FROM TIMESTAMP_TBL t, INTERVAL_TBL i WHERE t.d1 BETWEEN '1990-01-01' AND '2001-01-01' AND i.f1 BETWEEN '00:00' AND '23:00' +AND isfinite(i.f1) I removed this and it did not have any effect on results. I think the isfinite(i.f1) is already covered by the two existing conditions. 
SELECT t.f1 AS t, i.f1 AS i, t.f1 + i.f1 AS "add", t.f1 - i.f1 AS "subtract" FROM TIME_TBL t, INTERVAL_TBL i + WHERE isfinite(i.f1) ORDER BY 1,2; SELECT t.f1 AS t, i.f1 AS i, t.f1 + i.f1 AS "add", t.f1 - i.f1 AS "subtract" FROM TIMETZ_TBL t, INTERVAL_TBL i + WHERE isfinite(i.f1) ORDER BY 1,2; -- SQL9x OVERLAPS operator @@ -287,11 +290,12 @@ SELECT f1 AS "timestamp" SELECT d.f1 AS "timestamp", t.f1 AS "interval", d.f1 + t.f1 AS plus FROM TEMP_TIMESTAMP d, INTERVAL_TBL t + WHERE isfinite(t.f1) ORDER BY plus, "timestamp", "interval"; SELECT d.f1 AS "timestamp", t.f1 AS "interval", d.f1 - t.f1 AS minus FROM TEMP_TIMESTAMP d, INTERVAL_TBL t - WHERE isfinite(d.f1) + WHERE isfinite(t.f1) ORDER BY minus, "timestamp", "interval"; IIUC, the isfinite() conditions are added to avoid any changes to the output due to new values added to INTERVAL_TBL. Instead, it might be a good idea to not add these conditions and avoid extra queries testing infinity arithmetic in interval.sql, timestamptz.sql and timestamp.sql like below + +-- infinite intervals ... some lines folded + +SELECT date '1995-08-06' + interval 'infinity'; +SELECT date '1995-08-06' + interval '-infinity'; +SELECT date '1995-08-06' - interval 'infinity'; +SELECT date '1995-08-06' - interval '-infinity'; ... block truncated With that I have reviewed the entire patch-set. Once you address these comments, we can mark it as ready for committer. I already see Tom looking at the patch. So that might be just a formality. -- Best Wishes, Ashutosh Bapat
Re: GUC for temporarily disabling event triggers
On Mon, Apr 3, 2023 at 9:15 AM Daniel Gustafsson wrote: > > On 3 Apr 2023, at 15:09, Robert Haas wrote: > > On Mon, Apr 3, 2023 at 8:46 AM Daniel Gustafsson wrote: > >> All comments above addressed in the attached v5, thanks for review! > > > > I continue to think it's odd that the sense of this is inverted as > > compared with row_security. > > I'm not sure I follow. Do you propose that the GUC enables classes of event > triggers, the default being "all" (or similar) and one would remove the type > of > EVT for which debugging is needed? That doesn't seem like a bad idea, just > one > that hasn't come up in the discussion (and I didn't think about). Right. Although to be fair, that idea doesn't sound as good if we're going to have settings other than "on" or "off". If this is just disable_event_triggers = on | off, then why not flip the sense around and make it event_triggers = off | on, just as we do for row_security? But if we're going to allow specific types of event triggers to be disabled, and we think it's likely that people will want to disable one specific type of event trigger while leaving the others alone, that might not be very convenient, because you could end up having to list all the things you do want instead of the one thing you don't want. On the third hand, in other contexts, I've often advocated for giving options a positive sense (what are we going to do?) rather than a negative sense (what are we not going to do?). For example, the TIMING option to EXPLAIN was originally proposed with a name like DISABLE_TIMING or something, and the value inverted, and I said let's not do that. And similarly in other places. A case where I didn't follow that principle is VACUUM (DISABLE_PAGE_SKIPPING) which now seems like a wart to me. Why isn't it VACUUM (PAGE_SKIPPING) again with the opposite value? I'm not sure what the best thing to do is here, I just think it deserves some thought. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Infinite Interval
Hi Joseph, thanks for addressing the comments. On Sat, Apr 1, 2023 at 10:53 PM Joseph Koshakow wrote: > So I added a check for FLOAT8_FITS_IN_INT64() and a test with this > scenario. I like that. Thanks. > > For what it's worth I think that 2147483647 months only became a valid > interval in v15 as part of this commit [0]. It's also outside of the > documented valid range [1], which is > [-17800 years, 17800 years] or > [-1483 months, 1483 months]. you mean +/-213600 months :). In that sense the current code actually fixes a bug introduced in v15. So I am fine with it. > > The rationale for only checking the month's field is that it's faster > than checking all three fields, though I'm not entirely sure if it's > the right trade-off. Any thoughts on this? Hmm, comparing one integer is certainly faster than comparing three. We do that check at least once per interval operation. So tripling the comparisons might show some impact when millions of rows are processed. Given that we have clear documentation of bounds, just using the months field is fine. If needed we can always expand it later. > > > There's some way to avoid separate checks for infinite-ness of > > interval and factor and use a single block using some integer > > arithmetic. But I think this is more readable. So I avoided doing > > that. Let me know if this works for you. > > I think the patch looks good, I've combined it with the existing patch. > > > Also added some test cases. > > I didn't see any tests in the patch, did you forget to include it? Sorry, I forgot to include those. Attached. Please see my reply to your latest email as well. 
-- Best Wishes, Ashutosh Bapat diff --git a/src/test/regress/expected/interval.out b/src/test/regress/expected/interval.out index 0adfe5f9fb..ebb4208ad7 100644 --- a/src/test/regress/expected/interval.out +++ b/src/test/regress/expected/interval.out @@ -2161,6 +2161,26 @@ SELECT interval '-infinity' * 'nan'; ERROR: interval out of range SELECT interval '-1073741824 months -1073741824 days -4611686018427387904 us' * 2; ERROR: interval out of range +SELECT interval 'infinity' * 0; +ERROR: interval out of range +SELECT interval '-infinity' * 0; +ERROR: interval out of range +SELECT interval '0 days' * 'infinity'::float; +ERROR: interval out of range +SELECT interval '0 days' * '-infinity'::float; +ERROR: interval out of range +SELECT interval '5 days' * 'infinity'::float; + ?column? +-- + infinity +(1 row) + +SELECT interval '5 days' * '-infinity'::float; + ?column? +--- + -infinity +(1 row) + SELECT interval 'infinity' / 'infinity'; ERROR: interval out of range SELECT interval 'infinity' / '-infinity'; diff --git a/src/test/regress/sql/interval.sql b/src/test/regress/sql/interval.sql index 1e1d8560bf..549ceb57c1 100644 --- a/src/test/regress/sql/interval.sql +++ b/src/test/regress/sql/interval.sql @@ -690,6 +690,12 @@ SELECT -interval '-2147483647 months -2147483647 days -9223372036854775807 us'; SELECT interval 'infinity' * 'nan'; SELECT interval '-infinity' * 'nan'; SELECT interval '-1073741824 months -1073741824 days -4611686018427387904 us' * 2; +SELECT interval 'infinity' * 0; +SELECT interval '-infinity' * 0; +SELECT interval '0 days' * 'infinity'::float; +SELECT interval '0 days' * '-infinity'::float; +SELECT interval '5 days' * 'infinity'::float; +SELECT interval '5 days' * '-infinity'::float; SELECT interval 'infinity' / 'infinity'; SELECT interval 'infinity' / '-infinity';
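[Editor's illustration] The behavior exercised by the tests above can be modeled compactly: multiplying zero by infinity errors out, a nonzero finite interval times infinity yields a signed infinite interval, and finite results must fit the field's integer range. This sketch is a hypothetical model of those rules applied to the months field, not the patch's C code; the range check stands in for the FLOAT8_FITS_IN_INT64()-style test mentioned earlier in the thread:

```python
import math

# The months field of an interval is a 32-bit integer.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def interval_months_mul(months, factor):
    """Multiply an interval's months field by a float factor (model only)."""
    if math.isinf(factor):
        if months == 0:
            # 0 * infinity has no meaningful sign: "interval out of range"
            raise OverflowError("interval out of range")
        return math.inf if (months > 0) == (factor > 0) else -math.inf
    result = months * factor
    # Range-check before converting back to the integer field, analogous
    # to the FLOAT8_FITS_IN_INT64() check discussed above.
    if not (INT32_MIN <= result <= INT32_MAX):
        raise OverflowError("interval out of range")
    return int(result)
```

This mirrors the regression tests above: `interval '0 days' * 'infinity'::float` errors, while `interval '5 days' * 'infinity'::float` yields infinity.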
Re: POC: Lock updated tuples in tuple_update() and tuple_delete()
Hi, Alexander! On 2023-04-02 03:37:19 +0300, Alexander Korotkov wrote: > On Sat, Apr 1, 2023 at 8:21 AM Andres Freund wrote: > > Given that the in-tree state has been broken for a week, I think it probably > > is time to revert the commits that already went in. > > The revised patch is attached. The most notable change is getting rid > of LazyTupleTableSlot. Also get rid of complex computations to detect > how to initialize LazyTupleTableSlot. Instead just pass the oldSlot > as an argument of ExecUpdate() and ExecDelete(). The price for this > is just preallocation of ri_oldTupleSlot before calling ExecDelete(). > The slot allocation is quite cheap. After all wrappers it's > table_slot_callbacks(), which is very cheap, single palloc() and few > fields initialization. It doesn't seem reasonable to introduce an > infrastructure to evade this. > > I think patch resolves all the major issues you've highlighted. Even > if there are some minor things missed, I'd prefer to push this rather > than reverting the whole work. I looked into the latest patch v3. In my view, it addresses all the issues discussed in [1]. Also, with the oldslot logic pushed outside, the code becomes more transparent. I've added some very minor modifications to the code and comments in patch v4-0001. Also, I'm for committing Andres' isolation test. I've added some minor revisions to make the test run routinely among the other isolation tests. The test could also be made a part of the existing eval-plan-qual.spec, but I have left it separate for now. Also, I think that the signatures of the ExecUpdate() and ExecDelete() functions, especially the latter, are somewhat overloaded with different status bool variables added by different authors on different occasions. It would be nice if they were combined into some kind of status variable. But as this doesn't touch the API and is not related to the current update/delete optimization, it could be modified anytime in the future as well. 
The changes that do touch the API are the addition of a TupleTableSlot argument and the conversion of the bool wait flag into a four-state options variable for tuple_update(), tuple_delete(), heap_update(), heap_delete() and heap_lock_tuple(), and a couple of Exec*DeleteTriggers(). I think they are justified. One thing that is not clear to me is that we pass oldSlot into simple_table_tuple_update() even though, as per the comment on this function, "concurrent updates of the target tuple is not expected (for example, because we have a lock on the relation associated with the tuple)". It seems not to break anything but maybe this could be simplified. Overall I think the patch is good enough. Regards, Pavel Borisov, Supabase. [1] https://www.postgresql.org/message-id/CAPpHfdtwKb5UVXKkDQZYW8nQCODy0fY_S7mV5Z%2Bcg7urL%3DzDEA%40mail.gmail.com v4-0002-Add-EvalPlanQual-delete-returning-isolation-test.patch Description: Binary data v4-0001-Revise-changes-in-764da7710b-and-11470f544e.patch Description: Binary data
Re: POC: Lock updated tuples in tuple_update() and tuple_delete()
On Sun, Apr 2, 2023 at 3:47 AM Andres Freund wrote: > On 2023-04-02 03:37:19 +0300, Alexander Korotkov wrote: > > On Sat, Apr 1, 2023 at 8:21 AM Andres Freund wrote: > > > Given that the in-tree state has been broken for a week, I think it > > > probably > > > is time to revert the commits that already went in. > > > > The revised patch is attached. The most notable change is getting rid > > of LazyTupleTableSlot. Also get rid of complex computations to detect > > how to initialize LazyTupleTableSlot. Instead just pass the oldSlot > > as an argument of ExecUpdate() and ExecDelete(). The price for this > > is just preallocation of ri_oldTupleSlot before calling ExecDelete(). > > The slot allocation is quite cheap. After all wrappers it's > > table_slot_callbacks(), which is very cheap, single palloc() and few > > fields initialization. It doesn't seem reasonable to introduce an > > infrastructure to evade this. > > > > I think patch resolves all the major issues you've highlighted. Even > > if there are some minor things missed, I'd prefer to push this rather > > than reverting the whole work. > > Shrug. You're designing new APIs, days before the feature freeze. This just > doesn't seem ready in time for 16. I certainly won't have time to look at it > sufficiently in the next 5 days. OK. Reverted. -- Regards, Alexander Korotkov
Add missing copyright for pg_upgrade/t/* files
Dear hackers, While reading codes, I noticed that pg_upgrade/t/001_basic.pl and pg_upgrade/t/002_pg_upgrade.pl do not contain the copyright. I checked briefly and almost all files have that, so I thought they missed it. PSA the patch to fix them. Best Regards, Hayato Kuroda FUJITSU LIMITED add_copyright.patch Description: add_copyright.patch
RE: Support logical replication of DDLs
Hi, Sorry about the list. Since it was a question about the specifications I thought I had to ask it first in the general list. I will reply in the hackers list only for new features. Replicating from orcl to postgres was difficult. You mentioned renaming of columns; the ordinal position of a column is reused with a drop/add column in orcl, so you can wrongly think it is a renaming from an external point of view. The only "advantage" with orcl is that you can drop/add columns thousands of times if you want, not with postgres. From PostgreSQL to PostgreSQL it's now easier of course, but the difficulty is that we have to separate DDL things. The "+" things have to be executed first on the replicated db (new tables, new columns, enlargement of columns). The "-" things have to be executed first on the source db (dropped tables, dropped columns, downsize of columns). DSS and OLTP teams are different, OLTP teams cannot or don't want to deal with DSS concerns etc. If replication is delayed it's not so trivial anyway to know when you can drop a table on the replicated db for example. The DSS team has in fact to build a system that detects a posteriori why the subscription is KO if something goes wrong. It can also be a human mistake, e.g a "create table very_important_table_to_save as select * from very_important_table;" and the replication is KO if the _save table is created in the published schema. I had read too fast. I read the proposals and Vignesh's suggestion & syntax seem very promising. If I understand well, an existing "for all tables" / "tables in schema" DML publication would have to be altered with ALTER PUBLICATION simple_applicative_schema_replication_that_wont_be_interrupted_for_an_rdbms_reason WITH (ddl = 'table:create,alter'); to get rid of the majority of possible interruptions. > Additionally, there could be some additional problems to deal with > like say if the column list has been specified then we ideally > shouldn't send those columns even in DDL. 
For example, if one uses > Alter Table .. Rename Column and the new column name is not present in > the published column list then we shouldn't send it. Perhaps I miss something but the names are not relevant here. The column is part of the publication and the corresponding DDL has to be sent, the column is not part of the publication and the DDL should not be sent. Dependencies are not based on names, it currently works like that with DML publication but also with views for example. Quick test : bas=# \dRp+ Publication test_phil Propriétaire | Toutes les tables | Insertions | Mises à jour | Suppressions | Tronque | Via la racine --+---++--+--+-+--- postgres | f | t | t| t| t | f Tables : "test_phil.t1" (c1, c2) bas=# alter table test_phil.t1 rename column c2 to c4; ALTER TABLE bas=# \dRp+ Publication test_phil Propriétaire | Toutes les tables | Insertions | Mises à jour | Suppressions | Tronque | Via la racine --+---++--+--+-+--- postgres | f | t | t| t| t | f Tables : "test_phil.t1" (c1, c4) "rename column" DDL has to be sent and the new name is not relevant in the decision to send it. If "rename column" DDL had concerned a column that is not part of the publication you wouldn't have to send the DDL, no matter the new name. Drop is not a problem. You cannot drop an existing column that is part of a publication without a "cascade". What could be problematic is a "add column" DDL and after that the column is added to the publication via "alter publication set". Such a case is difficult to deal with I guess. But the initial DDL to create a table is also not sent anyway right ? It could be a known limitation. I usually only test things in beta to report but I will try to have a look earlier at this patch since it is very interesting. That and the TDE thing but TDE is an external obligation and not a real interest. 
I obtained a delay but hopefully we will have this encryption thing or perhaps we will be obliged to go back to the proprietary RDBMS for some needs even if the feature is in fact mostly useless... Best regards, Phil De : Amit Kapila Envoyé : lundi 3 avril 2023 06:07 À : Phil Florent Cc : vignesh C ; houzj.f...@fujitsu.com ; Ajin Cherian ; wangw.f...@fujitsu.com ; Runqi Tian ; Peter Smith ; Tom Lane ; li jie ; Dilip Kumar ; Alvaro Herrera ; Masahiko Sawada ; Japin Li ; rajesh singarapu ; Zheng Li ; PostgreSQL Hackers Objet : Re: Support logical replication of DDLs On Sun, Apr 2, 2023 at 3:25 PM Phil Florent wrote: > > As an end-user, I am highly interested in the
Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?
Thomas Munro writes: > I think this experiment worked out pretty well. I think it's a nice > side-effect that you can see what memory the regexp subsystem is > using, and that's likely to lead to more improvements. (Why is it > limited to caching 32 entries? Why is it a linear search, not a hash > table? Why is LRU implemented with memmove() and not a list? Could > we have a GUC regex_cache_memory, so someone who uses a lot of regexes > can opt into a large cache?) On the other hand it also uses a bit > more RAM, like other code using the reparenting trick, which is a > topic for future research. > I vote for proceeding with this approach. I wish we didn't have to > tackle either a regexp interface/management change (done here) or a > CFI() redesign (not done, but probably also a good idea for other > reasons) before getting this signal stuff straightened out, but here > we are. This approach seems good to me. Anyone have a different > take? Sorry for not looking at this sooner. I am okay with the regex changes proposed in v5-0001 through 0003, but I think you need to take another mopup pass there. Some specific complaints: * header comment for pg_regprefix has been falsified (s/malloc/palloc/) * in spell.c, regex_affix_deletion_callback could be got rid of * check other callers of pg_regerror for now-useless CHECK_FOR_INTERRUPTS In general there's a lot of comments referring to regexes being malloc'd. I'm disinclined to change the ones inside the engine, because as far as it knows it is still using malloc, but maybe we should work harder on our own comments. In particular, it'd likely be useful to have something somewhere pointing out that pg_regfree is only needed when you can't get rid of the regex by context cleanup. Maybe write a short section about memory management in backend/regex/README? I've not really looked at 0004. regards, tom lane
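[Editor's illustration] For context on the cache questions quoted above (32 entries, linear search, memmove-based LRU), the general scheme can be modeled in a few lines. The constants and structure here are an illustrative sketch of that move-to-front design, not the actual backend/regex code:

```python
NUM_CACHED = 32  # assumed fixed cache size, as in the discussion above

class MoveToFrontCache:
    """Linear-search cache whose LRU policy is 'move the hit entry to the
    front' -- the role memmove() plays in the C version."""

    def __init__(self, capacity=NUM_CACHED):
        self.capacity = capacity
        self.entries = []  # (key, value) pairs, most recently used first

    def lookup(self, key):
        for i, (k, v) in enumerate(self.entries):
            if k == key:
                # Move the hit to the front; everything before it shifts
                # down one slot (what memmove() does over the C array).
                self.entries.insert(0, self.entries.pop(i))
                return v
        return None  # cache miss: the caller would compile the regex

    def insert(self, key, value):
        if len(self.entries) == self.capacity:
            self.entries.pop()  # evict the least recently used entry
        self.entries.insert(0, (key, value))
```

A hash table would make lookup O(1) and a linked list would make the shuffle cheaper, which is essentially what the quoted questions ask; with only 32 entries, a linear scan over a small array is cheap, which may be why the simple scheme has sufficed.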
Re: daitch_mokotoff module
Dag Lem writes: > I sincerely hope this resolves any blocking issues with copyright / > legalese / credits. > Can this now be considered ready for committer, so that Paul or someone else can flip the bit? Best regards Dag Lem
Re: GUC for temporarily disabling event triggers
> On 3 Apr 2023, at 15:09, Robert Haas wrote: > > On Mon, Apr 3, 2023 at 8:46 AM Daniel Gustafsson wrote: >> All comments above addressed in the attached v5, thanks for review! > > I continue to think it's odd that the sense of this is inverted as > compared with row_security. I'm not sure I follow. Do you propose that the GUC enables classes of event triggers, the default being "all" (or similar) and one would remove the type of EVT for which debugging is needed? That doesn't seem like a bad idea, just one that hasn't come up in the discussion (and I didn't think about). -- Daniel Gustafsson
Re: RFC: logical publication via inheritance root?
Hi, > > Outside the scope of special TimescaleDB tables and the speculated > > pg_partman old-style table migration, will this proposed new feature > > have any other application? In other words, do you know if this > > proposal will be of any benefit to the *normal* users who just have > > native PostgreSQL inherited tables they want to replicate? > > I think it comes down to why an inheritance scheme was used. If it's > because you want to group rows into named, queryable subsets (e.g. the > "cities/capitals" example in the docs [1]), I don't think this has any > utility, because I assume you'd want to replicate your subsets as-is. > > But if it's because you've implemented a partitioning scheme of your > own (the docs still list reasons you might want to [2], even today), > and all you ever really do is interact with the root table, I think > this feature will give you some of the same benefits that > publish_via_partition_root gives native partition users. We're very > much in that boat, but I don't know how many others are. I would like to point out that inheritance is merely a tool for modeling data. Its use cases are not limited to only partitioning, although many people ended up using it for this purpose back when we didn't have proper built-in partitioning. So unless we are going to remove inheritance in the nearest releases (*) I believe it should work with logical replication in a sane and convenient way. Correct me if I'm wrong, but I got the impression that the patch tries to accomplish just that. (*) Which personally I believe would be a good change. Unlikely to happen, though. -- Best regards, Aleksander Alekseev
Re: GUC for temporarily disabling event triggers
On Mon, Apr 3, 2023 at 8:46 AM Daniel Gustafsson wrote: > All comments above addressed in the attached v5, thanks for review! I continue to think it's odd that the sense of this is inverted as compared with row_security. -- Robert Haas EDB: http://www.enterprisedb.com
Re: GUC for temporarily disabling event triggers
> On 2 Apr 2023, at 21:48, Justin Pryzby wrote: > + gettext_noop("Disable event triggers for the duration > of the session."), > > Why does it say "for the duration of the session" ? > > It's possible to disable ignoring, and within the same session. > GUCs are typically "for the duration of the session" .. but they don't > say so (and don't need to). > > + elog(ERROR, "unsupport event trigger: %d", event); > > typo: unsupported > > +Allows to temporarily disable event triggers from executing in order > > => Allow temporarily disabling execution of event triggers .. > > +to troubleshoot and repair faulty event triggers. The value matches > +the type of event trigger to be ignored: > +ddl_command_start, > ddl_command_end, > +table_rewrite and sql_drop. > +Additionally, all event triggers can be disabled by setting it to > +all. Setting the value to none > +will ensure that all event triggers are enabled, this is the default > > It doesn't technically "ensure that they're enabled", since they can be > disabled by ALTER. Better to say that it "doesn't disable any event triggers". All comments above addressed in the attached v5, thanks for review! -- Daniel Gustafsson v5-0001-Add-GUC-for-temporarily-disabling-event-triggers.patch Description: Binary data
Re: Sketch of a fix for that truncation data corruption issue
Alvaro Herrera writes: > Has this problem been fixed? I was under the impression that it had > been, but I spent some 20 minutes now looking for code, commits, or > patches in the archives, and I can't find anything relevant. Maybe it > was fixed in some different way that's not so obviously connected? As far as I can see from a quick look at the code, nothing has been done that would alleviate this problem: smgrtruncate still calls DropRelationBuffers before truncating. Have you run into a new case of it? I don't recall having seen many field complaints about this since 2018. regards, tom lane
Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound
Hi John, Many thanks for all the great feedback! > Okay, the changes look good. To go further, I think we need to combine into > two patches, one with 0001-0003 and one with 0004: > > 1. Correct false statements about "shutdown" etc. This should contain changes > that can safely be patched all the way to v11. > 2. Change bad advice (single-user mode) into good advice. We can target head > first, and then try to adopt as far back as we safely can (and test). Done. > > > In swapping this topic back in my head, I also saw [2] where Robert > > > suggested > > > > > > "that old prepared transactions and stale replication > > > slots should be emphasized more prominently. Maybe something like: > > > > > > HINT: Commit or roll back old prepared transactions, drop stale > > > replication slots, or kill long-running sessions. > > > Ensure that autovacuum is progressing, or run a manual database-wide > > > VACUUM." > > > > It looks like the hint regarding replication slots was added at some > > point. Currently we have: > > > > ``` > > errhint( [...] > > "You might also need to commit or roll back old prepared > > transactions, or drop stale replication slots."))); > > ``` > > Yes, the exact same text as it appeared in the [2] thread above, which > prompted Robert's comment I quoted. I take the point to mean: All of these > things need to be taken care of *first*, before vacuuming, so the hint should > order things so that it is clear. > > > Please let me know if you think > > we should also add a suggestion to kill long-running sessions, etc. > > +1 for also adding that. OK, done. I included this change as a separate patch. It can be squashed with another one if necessary. While on it, I noticed that multixact.c still talks about a "shutdown". I made corresponding changes in 0001. 
> - errmsg("database is not accepting commands to avoid wraparound data loss in > database \"%s\"", > + errmsg("database is not accepting commands that generate new XIDs to avoid > wraparound data loss in database \"%s\"", > > I'm not quite on board with the new message, but welcome additional opinions. > For one, it's a bit longer and now ambiguous. I also bet that "generate XIDs" > doesn't really communicate anything useful. The people who understand > exactly what that means, and what the consequences are, are unlikely to let > the system get near wraparound in the first place, and might even know enough > to ignore the hint. > > I'm thinking we might need to convey something about "writes". While it's > less technically correct, I imagine it's more useful. Remember, many users > have it drilled in their heads that they need to drop immediately to > single-user mode. I'd like to give some idea of what they can and cannot do. This particular wording was chosen for consistency with multixact.c: ``` errmsg("database is not accepting commands that generate new MultiXactIds to avoid wraparound data loss in database \"%s\"", ``` The idea of using "writes" is sort of OK, but note that the same message will appear for a query like: ``` SELECT pg_current_xact_id(); ``` ... which doesn't do writes. The message will be misleading in this case. On top of that, although a PostgreSQL user may not be aware of MultiXactIds, arguably there are many users that are aware of XIDs. Not to mention the fact that XIDs are well documented. I didn't make this change in v4 since it seems to be controversial and probably not the highest priority at the moment. I suggest we discuss it separately. > I propose for discussion that 0004 should show in the docs all the queries > for finding prepared xacts, repl slots etc. If we ever show the info at > runtime, we can dispense with the queries, but there seems to be no urgency > for that... Good idea. 
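As a concrete starting point, the documented queries could look roughly like this (the column selection below is my own sketch, not text from the patch; `pg_prepared_xacts`, `pg_replication_slots`, and `pg_stat_activity` are the standard system views):

```sql
-- Old prepared transactions, oldest first:
SELECT gid, prepared, owner, database, age(transaction) AS xid_age
FROM pg_prepared_xacts
ORDER BY age(transaction) DESC;

-- Replication slots holding back the xmin horizons:
SELECT slot_name, slot_type, active,
       age(xmin) AS xmin_age, age(catalog_xmin) AS catalog_xmin_age
FROM pg_replication_slots
ORDER BY greatest(age(xmin), age(catalog_xmin)) DESC;

-- Long-running sessions pinning an old xmin:
SELECT pid, datname, state,
       age(backend_xid) AS xid_age, age(backend_xmin) AS xmin_age
FROM pg_stat_activity
WHERE backend_xid IS NOT NULL OR backend_xmin IS NOT NULL
ORDER BY greatest(age(backend_xid), age(backend_xmin)) DESC;
```

The ordering mirrors the hint: clear prepared transactions and slots, terminate old sessions, then vacuum.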
> + Previously it was required to stop the postmaster and VACUUM the > database > + in a single-user mode. There is no need to use a single-user mode > anymore. > > I think we need to go further and actively warn against it: It's slow, > impossible to monitor, disables replication and disables safeguards against > wraparound. (Other bad things too, but these are easily understandable for > users) > > Maybe mention also that its main use in wraparound situations is for a way > to perform DROPs and TRUNCATEs if that would help speed up resolution. Fixed. -- Best regards, Aleksander Alekseev v4-0001-Correct-the-docs-and-messages-about-preventing-XI.patch Description: Binary data v4-0002-Don-t-recommend-running-VACUUM-in-a-single-user-m.patch Description: Binary data v4-0003-Modify-the-hints-about-preventing-XID-wraparound.patch Description: Binary data
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock
On 4/3/23 00:40, Andres Freund wrote: > Hi, > > On 2023-03-28 19:17:21 -0700, Andres Freund wrote: >> On 2023-03-28 18:21:02 -0700, Andres Freund wrote: >>> Here's a draft patch. >> >> Attached is v2, with a stupid bug fixed and a bit of comment / pgindent >> polish. > > I'd welcome some review (Tomas?), but otherwise I'm planning to push ahead > with this. > I guess the 0001 part was already pushed, so I should be looking only at 0002, correct? I think 0002 makes RelationGetBufferForTuple() harder to understand. I'm not saying it's incorrect, but I find it hard to reason about the new combinations of conditions :-( I mean, it only had this condition: if (otherBuffer != InvalidBuffer) { ... } but now it has if (unlockedTargetBuffer) { ... } else if (otherBuffer != InvalidBuffer) { ... } if (unlockedTargetBuffer || otherBuffer != InvalidBuffer) { ... } Not sure how to improve that :-/ but not exactly trivial to figure out what's going to happen. Maybe this * If we unlocked the target buffer above, it's unlikely, but possible, * that another backend used space on this page. might say what we're going to do in this case. I mean - I understand some backend may use space in an unlocked page, but what does that mean for this code? What is it going to do? (The same comment talks about the next condition in much more detail, for example.) Also, isn't it a bit strange that the free space check now happens outside any if condition? It used to happen in the if (otherBuffer != InvalidBuffer) { ... } block, but now it happens outside.
> I'm still debating with myself whether this commit (or a prerequisite commit) > should move logic dealing with the buffer ordering into > GetVisibilityMapPins(), so we don't need two blocks like this: > > > if (otherBuffer == InvalidBuffer || targetBlock <= otherBlock) > GetVisibilityMapPins(relation, buffer, otherBuffer, >targetBlock, > otherBlock, vmbuffer, > > vmbuffer_other); > else > GetVisibilityMapPins(relation, otherBuffer, buffer, >otherBlock, > targetBlock, vmbuffer_other, >vmbuffer); > ... > > if (otherBuffer != InvalidBuffer) > { > if (GetVisibilityMapPins(relation, otherBuffer, buffer, > > otherBlock, targetBlock, vmbuffer_other, > > vmbuffer)) > unlockedTargetBuffer = true; > } > else > { > if (GetVisibilityMapPins(relation, buffer, > InvalidBuffer, > > targetBlock, InvalidBlockNumber, > > vmbuffer, InvalidBuffer)) > unlockedTargetBuffer = true; > } > } > Yeah. I haven't tried, but I imagine it'd make RelationGetBufferForTuple a little bit simpler. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Why enable_hashjoin Completely disables HashJoin
Quan Zongliang writes: > I found that the enable_hashjoin disables HashJoin completely. Well, yeah. It's what you asked for. > Instead, it should add a disable cost to the cost calculation of > hashjoin. Why? The disable-cost stuff is a crude hack that we use when turning off a particular plan type entirely might render us unable to generate a valid plan. Hash join is not in that category. > After disabling all three, the HashJoin path should still be chosen. Why? Personally, I'd get rid of disable_cost altogether if I could. I'm not in a hurry to extend its use to more places. regards, tom lane
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode
I've now pushed up v8-0004. Can you rebase the remaining two patches on top of master again and resend? On Mon, 3 Apr 2023 at 08:11, Melanie Plageman wrote: > I still have a few open questions: > - what the initial value of ring_size for autovacuum should be (see the > one remaining TODO in the code) I assume you're talking about the 256KB BAS_VACUUM one set in GetAccessStrategy()? I don't think this patch should be doing anything to change those defaults. Anything that does that should likely have a new thread and come with analysis or reasoning about why the newly proposed defaults are better than the old ones. > - should ANALYZE allow specifying BUFFER_USAGE_LIMIT since it uses the guc > value when that is set? That's a good question... > - should INDEX_CLEANUP off cause VACUUM to use shared buffers and > disable use of a strategy (like failsafe vacuum) I don't see why it should. It seems strange to have one option magically make changes to some other option. > - should we add anything to VACUUM VERBOSE output about the number of > reuses of strategy buffers? Sounds like this would require an extra array of counter variables in BufferAccessStrategyData? I think it might be a bit late to start experimenting with this. > - Should we make BufferAccessStrategyData non-opaque so that we don't > have to add a getter for nbuffers. I could have implemented this in > another way, but I don't really see why BufferAccessStrategyData > should be opaque If nothing outside of the .c file requires access then there's little need to make the members known outside of the file. Same as you'd want to make classes private rather than public when possible in OOP. If you do come up with a reason to be able to determine the size of the BufferAccessStrategy from outside freelist.c, I'd say an accessor method is the best way. David
Re: running logical replication as the subscription owner
On Thu, Mar 30, 2023 at 7:12 PM Robert Haas wrote: > > I don't think run_as_owner is terrible, despite the ambiguity. It's > talking about the owner of the object on which the property is being > set. > I find this justification quite reasonable to keep the option name as run_as_owner. So, +1 to use the new option name as run_as_owner. -- With Regards, Amit Kapila.
Re: Why enable_hashjoin Completely disables HashJoin
On 4/3/23 12:23, Quan Zongliang wrote: > Hi, > > I found that the enable_hashjoin disables HashJoin completely. > It's in the function add_paths_to_joinrel: > > if (enable_hashjoin || jointype == JOIN_FULL) > hash_inner_and_outer(root, joinrel, outerrel, innerrel, > jointype, ); > > Instead, it should add a disable cost to the cost calculation of > hashjoin. And now final_cost_hashjoin does the same thing: > > if (!enable_hashjoin) > startup_cost += disable_cost; > > > enable_mergejoin has the same problem. > > Test case: > > CREATE TABLE t_score_01( > s_id int, > s_score int, > s_course char(8), > c_id int); > > CREATE TABLE t_student_01( > s_id int, > s_name char(8)); > > insert into t_score_01 values( > generate_series(1, 100), random()*100, 'course', generate_series(1, > 100)); > > insert into t_student_01 values(generate_series(1, 100), 'name'); > > analyze t_score_01; > analyze t_student_01; > > SET enable_hashjoin TO off; > SET enable_nestloop TO off; > SET enable_mergejoin TO off; > > explain select count(*) > from t_student_01 a join t_score_01 b on a.s_id=b.s_id; > > After disabling all three, the HashJoin path should still be chosen. > It's not clear to me why that behavior would be desirable. Why is this an issue you need to solve? AFAIK the reason why some paths are actually disabled (not built at all) while others are only penalized by adding disable_cost is that we need to end up with at least one way to execute the query. So we pick a path that we know is possible (e.g. seqscan) and hard-disable other paths. But the always-possible path is only soft-disabled by disable_cost. For joins, we do the same thing. The hash/merge joins may not be possible, because the data types may not have hash/sort operators, etc. Nestloop is always possible. So we soft-disable nestloop but hard-disable hash/merge joins. I doubt we want to change this behavior, unless there's a good reason to do that ... 
regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company