Re: Allow placeholders in ALTER ROLE w/o superuser
Thanks a lot for the feedback Nathan. Taking your options into consideration, the correct behaviour for me would be:

- The ALTER ROLE placeholder should always be stored with a PGC_USERSET GucContext. It's a placeholder anyway, so it should be the least restrictive one. If the user wants to define it as PGC_SUSET or another context, this should be done through a custom extension.
- When an extension claims the placeholder, we should check the DefineCustomXXXVariable GucContext against PGC_USERSET. If there's a match, then the value gets applied; otherwise WARN or ERROR.

The role GUCs get applied at login time, right? So at that point we can WARN or ERROR about the defined role GUCs.

What do you think?

On Mon, 18 Jul 2022 at 19:03, Nathan Bossart wrote:
> On Fri, Jul 01, 2022 at 04:40:27PM -0700, Nathan Bossart wrote:
> > On Sun, Jun 05, 2022 at 11:20:38PM -0500, Steve Chavez wrote:
> >> However, defining placeholders at the role level requires superuser:
> >>
> >> alter role myrole set my.username to 'tomas';
> >> ERROR:  permission denied to set parameter "my.username"
> >>
> >> Which is inconsistent and surprising behavior. I think it doesn't make
> >> sense since you can already set them at the session or transaction
> >> level (SET LOCAL my.username = 'tomas'). Enabling this would allow sidecar
> >> services to store metadata scoped to its pertaining role.
> >>
> >> I've attached a patch that removes this restriction. From my testing, this
> >> doesn't affect permission checking when an extension defines its custom GUC
> >> variables.
> >>
> >> DefineCustomStringVariable("my.custom", NULL, NULL, _custom, NULL,
> >> PGC_SUSET, ..);
> >>
> >> Using PGC_SUSET or PGC_SIGHUP will fail accordingly. Also no tests fail
> >> when doing "make installcheck".
> >
> > IIUC you are basically proposing to revert a6dcd19 [0], but it is not clear
> > to me why that is safe. Am I missing something?
>
> I spent some more time looking into this, and I think I've constructed a
> simple example that demonstrates the problem with removing this
> restriction.
>
> postgres=# CREATE ROLE test CREATEROLE;
> CREATE ROLE
> postgres=# CREATE ROLE other LOGIN;
> CREATE ROLE
> postgres=# GRANT CREATE ON DATABASE postgres TO other;
> GRANT
> postgres=# SET ROLE test;
> SET
> postgres=> ALTER ROLE other SET plperl.on_plperl_init = 'test';
> ALTER ROLE
> postgres=> \c postgres other
> You are now connected to database "postgres" as user "other".
> postgres=> CREATE EXTENSION plperl;
> CREATE EXTENSION
> postgres=> SHOW plperl.on_plperl_init;
>  plperl.on_plperl_init
> -----------------------
>  test
> (1 row)
>
> In this example, the non-superuser role sets a placeholder GUC for another
> role. This GUC becomes a PGC_SUSET GUC when plperl is loaded, so a
> non-superuser role will have successfully set a PGC_SUSET GUC. If we had a
> record of who ran ALTER ROLE, we might be able to apply appropriate
> permissions checking when the module is loaded, but this information
> doesn't exist in pg_db_role_setting. IIUC we have the following options:
>
> 1. Store information about who ran ALTER ROLE. I think there are a
>    couple of problems with this. For example, what happens if the
>    grantor was dropped or its privileges were altered? Should we
>    instead store the context of the user (i.e., PGC_USERSET or
>    PGC_SUSET)? Do we need to add entries to pg_depend?
> 2. Ignore or ERROR for any ALTER ROLE settings for custom GUCs. Since
>    we don't know who ran ALTER ROLE, we can't trust the value.
> 3. Require superuser to use ALTER ROLE for a placeholder. This is what
>    we do today. Since we know a superuser set the value, we can always
>    apply it when the custom GUC is finally defined.
>
> If this is an accurate representation of the options, it seems clear why
> the superuser restriction is in place.
>
> --
> Nathan Bossart
> Amazon Web Services: https://aws.amazon.com
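To make the behaviour proposed upthread concrete, here is a tiny model of the rule being suggested. This is hypothetical illustration code, not PostgreSQL source: the enum is trimmed to three contexts, and the helper name is made up. The idea is that a placeholder stored via ALTER ROLE always carries PGC_USERSET, and when an extension later claims the variable with DefineCustomXXXVariable, the stored value is only applied if the extension also declared it PGC_USERSET; anything stricter draws a WARNING or ERROR.

```c
#include <assert.h>
#include <stdbool.h>

/* Trimmed-down model of GUC contexts; the real enum in guc.h has more. */
typedef enum GucContext
{
	PGC_SIGHUP,
	PGC_SUSET,
	PGC_USERSET
} GucContext;

/*
 * Under the proposal, every placeholder stored by ALTER ROLE carries the
 * least restrictive context, regardless of who ran the command.
 */
static const GucContext placeholder_context = PGC_USERSET;

/*
 * Decide whether a stored placeholder value may be applied once an
 * extension claims the variable: only if the declared context matches
 * the (always-PGC_USERSET) placeholder context.
 */
static bool
placeholder_value_applies(GucContext declared)
{
	return declared == placeholder_context;
}
```

Under this model, an extension declaring its GUC as PGC_SUSET or PGC_SIGHUP would reject the role-level placeholder value at load time rather than silently privilege-escalating it.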
Re: Windows now has fdatasync()
On Tue, Jul 19, 2022 at 4:54 PM Michael Paquier wrote:
> Do you still need HAVE_DECL_FDATASYNC?

I guess so, because that is currently used for macOS, and with this patch it would also be used to control the declaration for Windows. The alternative would be to explicitly test for WIN32 or __darwin__.

The reason we need it for macOS is that they have had an fdatasync function for many years now, and configure detects it, but they haven't ever declared it in a header, so we (accidentally?) do it in c.h. We didn't set that up for Apple! The commit that added it was 33cc5d8a, which was about a month before Apple shipped the first version of OS X (and long before they defined the function). So there must have been another Unix with that problem, lost in the mists of time.
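The c.h arrangement described above boils down to supplying the prototype ourselves when the C library ships the function but its headers do not declare it. A minimal standalone illustration (hedged: HAVE_DECL_FDATASYNC is hard-coded to 0 here purely for demonstration; in a real build it would come from configure's declaration probe):

```c
#include <assert.h>

/*
 * Pretend configure found fdatasync() in libc but no declaration in any
 * system header (the macOS situation described above); it would then
 * define HAVE_DECL_FDATASYNC as 0 and the caller supplies the prototype.
 */
#define HAVE_DECL_FDATASYNC 0

#if !HAVE_DECL_FDATASYNC
extern int	fdatasync(int fildes);
#endif
```

With the declaration in place, calls such as fdatasync(fd) compile without implicit-declaration warnings and link against the function the C library already provides.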
Re: Making pg_rewind faster
On Mon, Jul 18, 2022 at 05:14:00PM +, Justin Kwan wrote:
> Thank you for taking a look at this and that sounds good. I will
> send over a patch compatible with Postgres v16.

+$node_2->psql(
+	'postgres',
+	"SELECT extract(epoch from modification) FROM pg_stat_file('pg_wal/00010003');",
+	stdout => \my $last_common_tli1_wal_last_modified_at);

Please note that you should not rely on the FS-level stats for anything that touches the WAL segments. A rough guess about what you could do here to make sure that only the set of WAL segments you are looking for is being copied over would be to either:
- Scan the logs produced by pg_rewind and see if the segments are copied or not, depending on the divergence point (aka the last checkpoint before WAL forked).
- Clean up pg_wal/ in the target node before running pg_rewind, checking that only the segments you want are available once the operation completes.
--
Michael
Re: System catalog documentation chapter
On Tue, Jul 19, 2022 at 1:22 PM Tom Lane wrote:
> Bruce Momjian writes:
> > On Sat, Jul 16, 2022 at 10:53:17AM +0200, Peter Eisentraut wrote:
> >> Maybe this whole notion that "system views" is one thing is not suitable.
>
> > Are you thinking we should just call the chapter "System Catalogs and
> > Views" and just place them alphabetically in a single chapter?
>
> I didn't think that was Peter's argument at all. He's complaining
> that "system views" isn't a monolithic category, which is a reasonable
> point, especially since we have a bunch of built-in views that appear
> in other chapters. But to then also confuse them with catalogs isn't
> improving the situation.

My original post was prompted when I was scrolling in the table-of-contents for chapter 53 "System Catalogs", unable to find a Catalog because I did not realise it was really a View. It was only when I couldn't find it alphabetically that I noticed there was *another* appended list of Views, but then the "System Views" heading seemed strangely buried at the same heading level as everything else... and although there was an "Overview" section for Catalogs there was no "Overview" section for the Views...

Maybe I was only seeing the tip of the iceberg. I'm not sure anymore what the best solution is. I do prefer the recent changes over how it used to be, but perhaps they also introduce a whole new set of problems.

---
(It used to look like this)

Chapter 53. "System Catalogs"
=============================
53.1. Overview
53.2. pg_aggregate
53.3. pg_am
53.4. pg_amop
53.5. pg_amproc
...
53.66. System Views  <--- 2nd heading just hiding here
53.67. pg_available_extensions
53.68. pg_available_extension_versions
53.69. pg_backend_memory_contexts
53.70. pg_config
...

--
Kind Regards,
Peter Smith.
Fujitsu Australia.
Re: Windows now has fdatasync()
On Mon, Jul 18, 2022 at 03:26:36PM +1200, Thomas Munro wrote:
> My plan now is to commit this patch so that problem #1 is solved, prod
> conchuela's owner to upgrade to solve #2, and wait until Tom shuts
> down prairiedog to solve #3. Then we could consider removing the
> HAVE_FDATASYNC probe and associated #ifdefs when convenient. For that
> reason, I'm not too bothered about the slight weirdness of defining
> HAVE_FDATASYNC on Windows even though that doesn't come from
> configure; it'd hopefully be short-lived. Better ideas welcome,
> though. Does that make sense?

Do you still need HAVE_DECL_FDATASYNC?
--
Michael
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
On Tue, Jul 19, 2022 at 6:34 AM Masahiko Sawada wrote:
>
> On Mon, Jul 18, 2022 at 1:12 PM Amit Kapila wrote:
> >
> > On Fri, Jul 15, 2022 at 8:09 PM Masahiko Sawada wrote:
> > >
> > > This patch should have the fix for the issue that Shi yu reported. Shi
> > > yu, could you please test it again with this patch?
> > >
> >
> > Can you explain the cause of the failure and your fix for the same?
>
> @@ -1694,6 +1788,8 @@ out:
>  	/* be tidy */
>  	if (ondisk)
>  		pfree(ondisk);
> +	if (catchange_xip)
> +		pfree(catchange_xip);
>
> Regarding the above code in the previous version patch, looking at the
> generated assembler code shared by Shi yu offlist, I realized that the
> "if (catchange_xip)" is removed (folded) by gcc optimization. This is
> because we dereference catchange_xip before the null-pointer check, as
> follows:
>
> +	/* copy catalog modifying xacts */
> +	sz = sizeof(TransactionId) * catchange_xcnt;
> +	memcpy(ondisk_c, catchange_xip, sz);
> +	COMP_CRC32C(ondisk->checksum, ondisk_c, sz);
> +	ondisk_c += sz;
>
> Since sz is 0 in this case, memcpy doesn't actually do anything.
>
> By checking the assembler code, I've confirmed that gcc does this
> optimization on this code, and that setting the
> -fno-delete-null-pointer-checks flag prevents the if statement from
> being folded. Also, I've confirmed that adding a check for
> "catchange.xcnt > 0" before the null-pointer check can also prevent
> that. Adding a check for "catchange.xcnt > 0" looks more robust. I've
> added a similar check for builder->committed.xcnt as well for
> consistency; builder->committed.xip could have no transactions.

Good work. I wonder whether, without comments, this may create a problem in the future. OTOH, I don't see adding a check for "catchange.xcnt > 0" before freeing the memory as any less robust.
Also, for consistency, we can use a similar check based on xcnt in SnapBuildRestore to free the memory in the below code:

+	/* set catalog modifying transactions */
+	if (builder->catchange.xip)
+		pfree(builder->catchange.xip);

--
With Regards,
Amit Kapila.
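The hazard discussed in this thread can be reproduced outside PostgreSQL: memcpy's pointer parameters are declared nonnull, so passing a NULL source even with a zero length is undefined behaviour, and gcc is then entitled to assume the pointer is non-NULL and fold away a later NULL test on it. A standalone sketch of the safe pattern (hypothetical names, not the actual snapbuild.c code):

```c
#include <stddef.h>
#include <string.h>

typedef unsigned int TransactionId;

/*
 * Serialize a possibly-empty xid array into a buffer.  Guarding on the
 * count keeps memcpy() from ever seeing a NULL source pointer, so the
 * compiler cannot use the call to "prove" the pointer is non-NULL and
 * delete a later "if (xip)" check.  Returns the number of bytes written.
 */
static size_t
serialize_xids(const TransactionId *xip, size_t xcnt, unsigned char *out)
{
	size_t		sz = sizeof(TransactionId) * xcnt;

	if (xcnt > 0)				/* never reach memcpy() with xip == NULL */
		memcpy(out, xip, sz);
	return sz;
}
```

Without the xcnt guard, a call like serialize_xids(NULL, 0, buf) would be undefined behaviour even though memcpy copies zero bytes, which is exactly the case gcc exploited here.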
Re: In-placre persistance change of a relation
(Mmm. I haven't noticed an annoying misspelling in the subejct X( ) > Thank you for checking that! It got a wider attack by b0a55e4329 > (RelFileNumber). The commit message suggests "relfilenode" as files Then, now I stepped on my own foot. Rebased also on nodefuncs autogeneration. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From cdd0282eb373ce79b8495b9a1160fb8c5122315e Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Tue, 19 Jul 2022 13:23:01 +0900 Subject: [PATCH v23 1/2] In-place table persistence change Even though ALTER TABLE SET LOGGED/UNLOGGED does not require data rewriting, currently it runs heap rewrite which causes large amount of file I/O. This patch makes the command run without heap rewrite. Addition to that, SET LOGGED while wal_level > minimal emits WAL using XLOG_FPI instead of massive number of HEAP_INSERT's, which should be smaller. Also this allows for the cleanup of files left behind in the crash of the transaction that created it. --- src/backend/access/rmgrdesc/smgrdesc.c| 49 ++ src/backend/access/transam/README | 10 + src/backend/access/transam/xact.c | 7 + src/backend/access/transam/xlogrecovery.c | 18 + src/backend/catalog/storage.c | 561 +- src/backend/commands/tablecmds.c | 267 -- src/backend/replication/basebackup.c | 9 +- src/backend/storage/buffer/bufmgr.c | 85 src/backend/storage/file/fd.c | 4 +- src/backend/storage/file/reinit.c | 340 + src/backend/storage/smgr/md.c | 95 +++- src/backend/storage/smgr/smgr.c | 32 ++ src/backend/storage/sync/sync.c | 21 +- src/bin/pg_rewind/parsexlog.c | 22 + src/bin/pg_rewind/pg_rewind.c | 1 - src/common/relpath.c | 47 +- src/include/catalog/storage.h | 3 + src/include/catalog/storage_xlog.h| 43 +- src/include/common/relpath.h | 9 +- src/include/storage/bufmgr.h | 2 + src/include/storage/fd.h | 1 + src/include/storage/md.h | 8 +- src/include/storage/reinit.h | 10 +- src/include/storage/smgr.h| 17 + src/tools/pgindent/typedefs.list | 7 + 25 files changed, 1485 insertions(+), 
183 deletions(-) diff --git a/src/backend/access/rmgrdesc/smgrdesc.c b/src/backend/access/rmgrdesc/smgrdesc.c index e0ee8a078a..2f92c06f70 100644 --- a/src/backend/access/rmgrdesc/smgrdesc.c +++ b/src/backend/access/rmgrdesc/smgrdesc.c @@ -40,6 +40,46 @@ smgr_desc(StringInfo buf, XLogReaderState *record) xlrec->blkno, xlrec->flags); pfree(path); } + else if (info == XLOG_SMGR_UNLINK) + { + xl_smgr_unlink *xlrec = (xl_smgr_unlink *) rec; + char *path = relpathperm(xlrec->rlocator, xlrec->forkNum); + + appendStringInfoString(buf, path); + pfree(path); + } + else if (info == XLOG_SMGR_MARK) + { + xl_smgr_mark *xlrec = (xl_smgr_mark *) rec; + char *path = GetRelationPath(xlrec->rlocator.dbOid, + xlrec->rlocator.spcOid, + xlrec->rlocator.relNumber, + InvalidBackendId, + xlrec->forkNum, xlrec->mark); + char *action = ""; + + switch (xlrec->action) + { + case XLOG_SMGR_MARK_CREATE: +action = "CREATE"; +break; + case XLOG_SMGR_MARK_UNLINK: +action = "DELETE"; +break; + } + + appendStringInfo(buf, "%s %s", action, path); + pfree(path); + } + else if (info == XLOG_SMGR_BUFPERSISTENCE) + { + xl_smgr_bufpersistence *xlrec = (xl_smgr_bufpersistence *) rec; + char *path = relpathperm(xlrec->rlocator, MAIN_FORKNUM); + + appendStringInfoString(buf, path); + appendStringInfo(buf, " persistence %d", xlrec->persistence); + pfree(path); + } } const char * @@ -55,6 +95,15 @@ smgr_identify(uint8 info) case XLOG_SMGR_TRUNCATE: id = "TRUNCATE"; break; + case XLOG_SMGR_UNLINK: + id = "UNLINK"; + break; + case XLOG_SMGR_MARK: + id = "MARK"; + break; + case XLOG_SMGR_BUFPERSISTENCE: + id = "BUFPERSISTENCE"; + break; } return id; diff --git a/src/backend/access/transam/README b/src/backend/access/transam/README index 734c39a4d0..f08bd7f42d 100644 --- a/src/backend/access/transam/README +++ b/src/backend/access/transam/README @@ -724,6 +724,16 @@ we must panic and abort recovery. The DBA will have to manually clean up then restart recovery. 
This is part of the reason for not writing a WAL entry until we've successfully done the original action. + +The Smgr MARK files + + +An smgr mark file is an empty file that is created when a new relation +storage file is created to signal that the storage file needs to be +cleaned up at recovery time. In contrast to the four actions above, +failure to remove smgr mark files will lead to data loss, in which +case the server will shut down. +
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Tue, Jul 19, 2022 at 9:11 AM Masahiko Sawada wrote:
> I'd like to keep the first version simple. We can improve it and add
> more optimizations later. Using a radix tree for vacuum TID storage
> would still be a big win compared to using a flat array, even without
> all these optimizations. In terms of the single-value leaves method, I'm
> also concerned about an extra pointer traversal and extra memory
> allocation. It's the most flexible, but the multi-value leaves method is also
> flexible enough for many use cases. Using the single-value method
> seems to be too much as the first step for me.
>
> Overall, using 64-bit keys and 64-bit values would be a reasonable
> choice for me as the first step. It can cover wider use cases
> including the vacuum TID use case. And possibly it can cover other use
> cases by combining it with a hash table, or using a tree of trees, for
> example.

These two aspects would also bring it closer to Andres' prototype, which 1) makes review easier and 2) makes it easier to preserve optimization work already done, so +1 from me.

--
John Naylor
EDB: http://www.enterprisedb.com
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Mon, Jul 18, 2022 at 9:10 PM John Naylor wrote:
> On Tue, Jul 19, 2022 at 9:24 AM Andres Freund wrote:
> > FWIW, I think the best path forward would be to do something similar to the
> > simplehash.h approach, so it can be customized to the specific user.
>
> I figured that would come up at some point. It may be worth doing in the
> future, but I think it's way too much to ask for the first use case.

I have a prototype patch that creates a read-only snapshot of the visibility map, and has vacuumlazy.c work off of that when determining which pages to skip. The patch also gets rid of the SKIP_PAGES_THRESHOLD stuff.

This is very effective with TPC-C, principally because it really cuts down on the number of scanned_pages that are scanned only because the VM bit is unset concurrently by DML. The window for this is very large when the table is large (and naturally takes a long time to scan), resulting in many more "dead but not yet removable" tuples being encountered than necessary. Which itself causes bogus information in the FSM -- information about the space that VACUUM could free from the page, which is often highly misleading.

There are remaining questions about how to do this properly. Right now I'm just copying pages from the VM into local memory, right after OldestXmin is first acquired -- we "lock in" a snapshot of the VM at the earliest opportunity, which is what lazy_scan_skip() actually works off now. There needs to be some consideration given to the resource management aspects of this -- it needs to use memory sensibly, which the current prototype patch doesn't do at all. I'm probably going to seriously pursue this as a project soon, and will probably need some kind of data structure for the local copy. The raw pages are usually quite space inefficient, considering we only need an immutable snapshot of the VM. I wonder if it makes sense to use this as part of this project.
It will be possible to know the exact heap pages that will become scanned_pages before scanning even one page with this design (perhaps with caveats about low memory conditions). It could also be very effective as a way of speeding up TID lookups in the reasonably common case where most scanned_pages don't have any LP_DEAD items -- just look it up in our local/materialized copy of the VM first. But even when LP_DEAD items are spread fairly evenly, it could still give us reliable information about the distribution of LP_DEAD items very early on. Maybe the two data structures could even be combined in some way? You can use more memory for the local copy of the VM if you know that you won't need the memory for dead_items. It's kinda the same problem, in a way. -- Peter Geoghegan
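A local immutable snapshot of the VM along the lines described above could be as simple as a bitmap keyed by heap block number. A minimal sketch under stated assumptions: all names here are made up, and the real visibility map tracks two bits per heap page (all-visible and all-frozen), while this toy tracks only one.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef unsigned int BlockNumber;

/*
 * Hypothetical local snapshot of the visibility map: one bit per heap
 * page for "all-visible", populated once (e.g. right after OldestXmin
 * is acquired) and never updated afterwards, so every lookup during the
 * scan sees the same set of skippable pages.
 */
typedef struct vm_snapshot
{
	BlockNumber nblocks;
	unsigned char *bits;		/* nblocks bits, rounded up to whole bytes */
} vm_snapshot;

static vm_snapshot *
vm_snapshot_create(BlockNumber nblocks)
{
	vm_snapshot *snap = malloc(sizeof(vm_snapshot));

	snap->nblocks = nblocks;
	snap->bits = calloc((nblocks + 7) / 8, 1);	/* all pages "not visible" */
	return snap;
}

/* Record that a heap page was all-visible at snapshot time. */
static void
vm_snapshot_set_all_visible(vm_snapshot *snap, BlockNumber blk)
{
	snap->bits[blk / 8] |= (unsigned char) (1 << (blk % 8));
}

/* Cheap lookup used when deciding whether a page can be skipped. */
static bool
vm_snapshot_is_all_visible(const vm_snapshot *snap, BlockNumber blk)
{
	return (snap->bits[blk / 8] >> (blk % 8)) & 1;
}
```

At one bit per heap page this is 8 KB of local memory per GB of 8 KB-page heap, which illustrates why a dedicated structure beats keeping raw copies of VM pages around.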
Re: Allowing REINDEX to have an optional name
On Mon, Jul 18, 2022 at 09:26:53PM -0500, Justin Pryzby wrote: > Sorry, I meant to send this earlier.. No problem. > It looks like you named the table "toast_relfilenodes", but then also store > to it data for non-toast tables. How about naming that index_relfilenodes? One difference with what I posted previously and 5fb5b6 is the addition of an extra regclass that stores the parent table, for reference in the output. > It's also a bit weird to call the column "relname" but use it to store the > ::regclass. You later need to cast the column to text, so you may as well > store it as text, either relname or oid::regclass. I have used "indname" at the end. > It seems like cluster.sql does this more succinctly. Except that this does not include the relfilenodes from the toast indexes, which is something I wanted to add a check for when it comes to both user tables and catalogs. > Why {4,5} ? Looks like a brain fade from here, while looking the relation names this generated. This could just match with an integer. -- Michael diff --git a/src/bin/scripts/t/090_reindexdb.pl b/src/bin/scripts/t/090_reindexdb.pl index 864707ff92..b5fff5a9cf 100644 --- a/src/bin/scripts/t/090_reindexdb.pl +++ b/src/bin/scripts/t/090_reindexdb.pl @@ -40,7 +40,7 @@ my $toast_index = $node->safe_psql('postgres', # REINDEX operations. A set of relfilenodes is saved from the catalogs # and then compared with pg_class. $node->safe_psql('postgres', - 'CREATE TABLE toast_relfilenodes (parent regclass, indname regclass, relfilenode oid);' + 'CREATE TABLE index_relfilenodes (parent regclass, indname regclass, relfilenode oid);' ); # Save the relfilenode of a set of toast indexes, one from the catalog # pg_constraint and one from the test table. 
@@ -58,8 +58,8 @@ my $fetch_index_relfilenodes = qq{SELECT i.indrelid, a.oid, a.relfilenode JOIN pg_index i ON (i.indexrelid = a.oid) WHERE a.relname IN ('pg_constraint_oid_index', 'test1x')}; my $save_relfilenodes = - "INSERT INTO toast_relfilenodes $fetch_toast_relfilenodes;" - . "INSERT INTO toast_relfilenodes $fetch_index_relfilenodes;"; + "INSERT INTO index_relfilenodes $fetch_toast_relfilenodes;" + . "INSERT INTO index_relfilenodes $fetch_index_relfilenodes;"; # Query to compare a set of relfilenodes saved with the contents of pg_class. # Note that this does not join using OIDs, as CONCURRENTLY would change them @@ -68,10 +68,10 @@ my $save_relfilenodes = # based on the name is enough to ensure a fixed output, where the name of the # parent table is included to provide more context. my $compare_relfilenodes = qq(SELECT b.parent::regclass, - regexp_replace(b.indname::text, '(pg_toast.pg_toast_)\\d{4,5}(_index)', '\\1\\2'), + regexp_replace(b.indname::text, '(pg_toast.pg_toast_)\\d+(_index)', '\\1\\2'), CASE WHEN a.relfilenode = b.relfilenode THEN 'relfilenode is unchanged' ELSE 'relfilenode has changed' END - FROM toast_relfilenodes b + FROM index_relfilenodes b JOIN pg_class a ON b.indname::text = a.oid::regclass::text ORDER BY b.parent::text, b.indname::text); @@ -91,7 +91,7 @@ test1|test1x|relfilenode has changed), # Re-save and run the second one. $node->safe_psql('postgres', - "TRUNCATE toast_relfilenodes; $save_relfilenodes"); + "TRUNCATE index_relfilenodes; $save_relfilenodes"); $node->issues_sql_like( [ 'reindexdb', '-s', 'postgres' ], qr/statement: REINDEX SYSTEM postgres;/, signature.asc Description: PGP signature
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Tue, Jul 19, 2022 at 9:24 AM Andres Freund wrote: > FWIW, I think the best path forward would be to do something similar to the > simplehash.h approach, so it can be customized to the specific user. I figured that would come up at some point. It may be worth doing in the future, but I think it's way too much to ask for the first use case. -- John Naylor EDB: http://www.enterprisedb.com
Re: Rename some rel truncation related constants at the top of vacuumlazy.c
On Mon, Jul 18, 2022 at 8:55 PM Tom Lane wrote:
> Um ... you seem to have removed some useful comments?

I don't think that the stuff about making them into a GUC is useful myself.

> Personally I wouldn't do this, as I don't think the renaming
> brings much benefit, and it will create a hazard for back-patching
> any fixes that might be needed in that code. I'm not hugely upset
> about it, but that's the way I'd vote if asked.

In that case I withdraw the patch. FWIW I wrote the patch during the course of work on new feature development, in a patch that added a couple of similar constants a bit further down. Seemed neater this way, but it's certainly not worth arguing over.

--
Peter Geoghegan
Re: Rename some rel truncation related constants at the top of vacuumlazy.c
Peter Geoghegan writes: > I propose to rename some of the rel truncation related constants at > the top of vacuumlazy.c, per the attached patch. The patch > consolidates related constants into a single block/grouping, and > imposes a uniform naming scheme. Um ... you seem to have removed some useful comments? Personally I wouldn't do this, as I don't think the renaming brings much benefit, and it will create a hazard for back-patching any fixes that might be needed in that code. I'm not hugely upset about it, but that's the way I'd vote if asked. regards, tom lane
Rename some rel truncation related constants at the top of vacuumlazy.c
I propose to rename some of the rel truncation related constants at the top of vacuumlazy.c, per the attached patch. The patch consolidates related constants into a single block/grouping, and imposes a uniform naming scheme. -- Peter Geoghegan 0001-vacuumlazy.c-rename-rel-truncation-constants.patch Description: Binary data
Re: Allowing REINDEX to have an optional name
On Sun, Jul 17, 2022 at 10:58:26AM +0100, Simon Riggs wrote:
> Sounds great, looks fine. Thanks for your review.

Ok, cool. At the end, I have decided to split the tests and the main patch into two different commits, as each is useful on its own. Doing so also helps in seeing the difference in behavior when issuing a REINDEX DATABASE. Another thing that was itching me with the test is that it was not possible to tell the difference between the toast index of the catalog and that of the user table, so I have added the parent table name as an extra thing stored in the table storing the relfilenodes.
--
Michael
Re: Doc about how to set max_wal_senders when setting minimal wal_level
On Mon, Jul 18, 2022 at 8:16 PM Bruce Momjian wrote:
> On Mon, Jul 18, 2022 at 07:39:55PM -0700, David G. Johnston wrote:
> > On Mon, Jul 18, 2022 at 6:27 PM Japin Li wrote:
> > >
> > > +0.90
> >
> > Consider changing:
> >
> > "makes any base backups taken before this unusable"
> >
> > to:
> >
> > "makes existing base backups unusable"
> >
> > As I try to justify this, though, it isn't quite true, maybe:
> >
> > "makes point-in-time recovery, using existing base backups, unable to replay
> > future WAL."
>
> I went with simpler wording.

+1 Thanks!

David J.
Re: Doc about how to set max_wal_senders when setting minimal wal_level
On Mon, Jul 18, 2022 at 07:39:55PM -0700, David G. Johnston wrote: > On Mon, Jul 18, 2022 at 6:27 PM Japin Li wrote: > > > +0.90 > > Consider changing: > > "makes any base backups taken before this unusable" > > to: > > "makes existing base backups unusable" > > As I try to justify this, though, it isn't quite true, maybe: > > "makes point-in-time recovery, using existing base backups, unable to replay > future WAL." I went with simpler wording. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 37fd80388c..c2bdacb6a7 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2764,9 +2764,10 @@ include_dir 'conf.d' levels. This parameter can only be set at server start. -In minimal level, no information is logged for -permanent relations for the remainder of a transaction that creates or -rewrites them. This can make operations much faster (see +The minimal level generates the least WAL +volume. It logs no row information for permanent relations +in transactions that create or +rewrite them. This can make operations much faster (see ). Operations that initiate this optimization include: @@ -2778,19 +2779,20 @@ include_dir 'conf.d' REINDEX TRUNCATE -But minimal WAL does not contain enough information to reconstruct the -data from a base backup and the WAL logs, so replica or -higher must be used to enable WAL archiving -() and streaming replication. +However, minimal WAL does not contain sufficient information for +point-in-time recovery, so replica or +higher must be used to enable continuous archiving +() and streaming binary replication. +In fact, the server will not even start in this mode if +max_wal_senders is non-zero. Note that changing wal_level to -minimal makes any base backups taken before -unavailable for archive recovery and standby server, which may -lead to data loss. 
+minimal makes previous base backups unusable +for point-in-time recovery and standby servers. In logical level, the same information is logged as -with replica, plus information needed to allow -extracting logical change sets from the WAL. Using a level of +with replica, plus information needed to +extract logical change sets from the WAL. Using a level of logical will increase the WAL volume, particularly if many tables are configured for REPLICA IDENTITY FULL and many UPDATE and DELETE statements are
Re: doc: New cumulative stats subsystem obsoletes comment in maintenance.sgml
Hi,

On 2022-07-18 19:47:39 -0700, David G. Johnston wrote:
> On Thu, Jul 14, 2022 at 4:31 PM Andres Freund wrote:
> > It might make sense to still say semi-accurate, but adjust the explanation
> > to say that stats reporting is not instantaneous?
>
> Unless that delay manifests in executing an UPDATE in a session then
> looking at these views in the same session and not seeing that update
> reflected I wouldn't mention it.

Depending on which stats you're looking at, yes, that could totally happen. I don't think the issue is not seeing changes from the current transaction though - it's that *after* commit you might not see them for a while (they're transmitted not more than once a second, and can be delayed up to 60s if there's contention).

Greetings,

Andres Freund
Re: Error "initial slot snapshot too large" in create replication slot
At Tue, 5 Jul 2022 11:32:42 -0700, Jacob Champion wrote in > On Thu, Mar 31, 2022 at 11:53 PM Kyotaro Horiguchi > wrote: > > So this is that. v5 creates a regular snapshot. > > This patch will need a quick rebase over 905c020bef9, which added > `extern` to several missing locations. Thanks! Just rebased. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From abcf0a0e0b3e2de9927d8943a3e3c145ab189508 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Tue, 19 Jul 2022 11:50:29 +0900 Subject: [PATCH v6] Create correct snapshot during CREATE_REPLICATION_SLOT Snapshot can hold top XIDs up to the number of server processes but SnapBuildInitialSnapshot tries to store all top-level and sub XIDs to there and easily fails with "initial slot snapshot too large" error. We could instead create a "takenDuringRecovery" snapshot, which later leads to unnecessary visibility checks. Therefore we take trouble to create a regular snapshot by identifying whether each xids is top-level and storing it in the appropriate xid array. Author: Dilip Kumar and Kyotaro Horiguchi --- .../replication/logical/reorderbuffer.c | 20 + src/backend/replication/logical/snapbuild.c | 45 +++ src/include/replication/reorderbuffer.h | 1 + 3 files changed, 58 insertions(+), 8 deletions(-) diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c index 88a37fde72..44ea3f31aa 100644 --- a/src/backend/replication/logical/reorderbuffer.c +++ b/src/backend/replication/logical/reorderbuffer.c @@ -3332,6 +3332,26 @@ ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid) } +/* + * ReorderBufferXidIsKnownSubXact + * Returns true if the xid is a known subtransaction. + */ +bool +ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid) +{ + ReorderBufferTXN *txn; + + txn = ReorderBufferTXNByXid(rb, xid, false, +NULL, InvalidXLogRecPtr, false); + + /* a known subtxn? 
*/ + if (txn && rbtxn_is_known_subxact(txn)) + return true; + + return false; +} + + /* * --- * Disk serialization support diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c index 73c0f15214..2b5282788a 100644 --- a/src/backend/replication/logical/snapbuild.c +++ b/src/backend/replication/logical/snapbuild.c @@ -532,7 +532,10 @@ SnapBuildInitialSnapshot(SnapBuild *builder) Snapshot snap; TransactionId xid; TransactionId *newxip; - int newxcnt = 0; + TransactionId *newsubxip; + int newxcnt; + int newsubxcnt; + bool overflowed = false; Assert(!FirstSnapshotSet); Assert(XactIsoLevel == XACT_REPEATABLE_READ); @@ -568,9 +571,17 @@ SnapBuildInitialSnapshot(SnapBuild *builder) MyProc->xmin = snap->xmin; - /* allocate in transaction context */ + /* + * Allocate in transaction context. + * + * We could use only subxip to store all xids (takenduringrecovery + * snapshot) but that causes useless visibility checks later so we hasle to + * create a normal snapshot. + */ newxip = (TransactionId *) palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount()); + newsubxip = (TransactionId *) + palloc(sizeof(TransactionId) * GetMaxSnapshotSubxidCount()); /* * snapbuild.c builds transactions in an "inverted" manner, which means it @@ -578,6 +589,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder) * classical snapshot by marking all non-committed transactions as * in-progress. This can be expensive. 
*/ + newxcnt = 0; + newsubxcnt = 0; + for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);) { void *test; @@ -591,12 +605,24 @@ SnapBuildInitialSnapshot(SnapBuild *builder) if (test == NULL) { - if (newxcnt >= GetMaxSnapshotXidCount()) -ereport(ERROR, - (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE), - errmsg("initial slot snapshot too large"))); - - newxip[newxcnt++] = xid; + /* Store the xid to the appropriate xid array */ + if (ReorderBufferXidIsKnownSubXact(builder->reorder, xid)) + { +if (!overflowed) +{ + if (newsubxcnt >= GetMaxSnapshotSubxidCount()) + overflowed = true; + else + newsubxip[newsubxcnt++] = xid; +} + } + else + { +if (newxcnt >= GetMaxSnapshotXidCount()) + elog(ERROR, + "too many transactions while creating snapshot"); +newxip[newxcnt++] = xid; + } } TransactionIdAdvance(xid); @@ -606,6 +632,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder) snap->snapshot_type = SNAPSHOT_MVCC; snap->xcnt = newxcnt; snap->xip = newxip; + snap->subxcnt = newsubxcnt; + snap->subxip = newsubxip; + snap->suboverflowed = overflowed; return snap; } diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h index d109d0baed..736f22a04e 100644 --- a/src/include/replication/reorderbuffer.h +++ b/src/include/replication/reorderbuffer.h @@ -669,6 +669,7 @@
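The core of the patch is the loop that sorts each in-progress xid into the right array: known subtransactions go to subxip (silently setting suboverflowed when that array fills), everything else goes to xip (a hard error when it fills). A rough Python model of that logic — names and limits are invented for illustration, this is not the C implementation:

```python
# Toy model of the xid-classification loop added to
# SnapBuildInitialSnapshot(); names and limits are illustrative only.

def build_snapshot_arrays(xids, is_known_subxact, max_xcnt, max_subxcnt):
    """Split xids into top-level (xip) and subtransaction (subxip) arrays.

    Mirrors the patch's behavior: overflowing subxip just sets the
    'suboverflowed' flag, while overflowing xip is a hard error.
    """
    xip, subxip = [], []
    overflowed = False
    for xid in xids:
        if is_known_subxact(xid):
            if not overflowed:
                if len(subxip) >= max_subxcnt:
                    overflowed = True
                else:
                    subxip.append(xid)
        else:
            if len(xip) >= max_xcnt:
                raise RuntimeError("too many transactions while creating snapshot")
            xip.append(xid)
    return xip, subxip, overflowed
```

The point of the split is that a snapshot sized for top-level xids alone no longer fails just because many subtransaction xids are in flight.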
Re: doc: New cumulative stats subsystem obsoletes comment in maintenance.sgml
On Thu, Jul 14, 2022 at 4:31 PM Andres Freund wrote: > Hi, > > I had missed David's original email on this topic... > > On 2022-07-14 18:58:09 -0400, Bruce Momjian wrote: > > On Wed, Apr 20, 2022 at 04:40:44PM -0700, David G. Johnston wrote: > > > The new cumulative stats subsystem no longer has a "lost under heavy > load" > > > problem so that parenthetical should go (or at least be modified). > > > > > > These stats can be reset so some discussion about how the system uses > them > > > given that possibility seems like it would be good to add here. I'm > not sure > > > what that should look like though. > > > > > > diff --git a/doc/src/sgml/maintenance.sgml > b/doc/src/sgml/maintenance.sgml > > > index 04a04e0e5f..360807c8f9 100644 > > > --- a/doc/src/sgml/maintenance.sgml > > > +++ b/doc/src/sgml/maintenance.sgml > > > @@ -652,9 +652,8 @@ vacuum insert threshold = vacuum base insert > threshold + > > > vacuum insert scale fac > > > tuples to be frozen by earlier vacuums. The number of obsolete > tuples and > > > the number of inserted tuples are obtained from the cumulative > statistics > > > system; > > > it is a semi-accurate count updated by each > UPDATE, > > > -DELETE and INSERT > operation. (It is > > > -only semi-accurate because some information might be lost under > heavy > > > -load.) If the relfrozenxid value of > the table > > > +DELETE and INSERT operation. > > > +If the relfrozenxid value of the table > > > is more than vacuum_freeze_table_age > transactions old, > > > an aggressive vacuum is performed to freeze old tuples and advance > > > relfrozenxid; otherwise, only pages > that have > > > been modified > > > > Yes, I agree and plan to apply this patch soon. > > It might make sense to still say semi-accurate, but adjust the explanation > to > say that stats reporting is not instantaneous? 
Unless that delay manifests as executing an UPDATE in a session, then looking at these views in the same session and not seeing that update reflected, I wouldn't mention it. Concurrency aspects are reasonably expected here. But if we do want to mention it, maybe: "...it is an eventually-consistent count updated by..." David J.
Re: Windows default locale vs initdb
On Tue, Jul 19, 2022 at 10:58 AM Thomas Munro wrote: > Here's a patch. I added this to the next commitfest, and cfbot promptly told me about some warnings I needed to fix. That'll teach me to post a patch tested with "ci-os-only: windows". Looking more closely at some error messages that report GetLastError() where I'd mixed up %d and %lu, I see also that I didn't quite follow existing conventions for wording when reporting Windows error numbers, so I fixed that too. In the "startcreate" step on CI you can see that it says: The database cluster will be initialized with locale "en-US". The default database encoding has accordingly been set to "WIN1252". The default text search configuration will be set to "english". As for whether "accordingly" still applies, by the logic of win32_langinfo()... Windows still considers WIN1252 to be the default ANSI code page for "en-US", though it'd work with UTF-8 too. I'm not sure what to make of that. The goal here was to give Windows users good defaults, but WIN1252 is probably not what most people actually want. Hmph. From 95f2684150e2938f2e555d16bbed4295a6dad279 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 19 Jul 2022 06:31:17 +1200 Subject: [PATCH v2 1/2] Default to BCP 47 locale in initdb on Windows. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Avoid selecting traditional Windows locale names written with English words, because they are unstable and not recommended for use in databases. Since setlocale() returns such names, on Windows use GetUserDefaultLocaleName() if the user didn't provide an explicit locale. Also update the documentation to recommend BCP 47 over the traditional names when providing explicit values to initdb.
Reviewed-by: Juan José Santamaría Flecha Discussion: https://postgr.es/m/CA%2BhUKGJ%3DXThErgAQRoqfCy1bKPxXVuF0%3D2zDbB%2BSxDs59pv7Fw%40mail.gmail.com --- doc/src/sgml/charset.sgml | 10 -- src/bin/initdb/initdb.c | 31 +-- 2 files changed, 37 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml index 445fd175d8..b656ca489f 100644 --- a/doc/src/sgml/charset.sgml +++ b/doc/src/sgml/charset.sgml @@ -83,8 +83,14 @@ initdb --locale=sv_SE system under what names depends on what was provided by the operating system vendor and what was installed. On most Unix systems, the command locale -a will provide a list of available locales. -Windows uses more verbose locale names, such as German_Germany -or Swedish_Sweden.1252, but the principles are the same. + + + +Windows uses BCP 47 language tags, like ICU. +For example, sv-SE represents Swedish as spoken in Sweden. +Windows also supports more verbose locale names based on English words, +such as German_Germany or Swedish_Sweden.1252, +but these are not recommended. 
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index 89b888eaa5..3af08b7b99 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -59,6 +59,10 @@ #include "sys/mman.h" #endif +#ifdef WIN32 +#include +#endif + #include "access/xlog_internal.h" #include "catalog/pg_authid_d.h" #include "catalog/pg_class_d.h" /* pgrminclude ignore */ @@ -2007,6 +2011,7 @@ locale_date_order(const char *locale) static void check_locale_name(int category, const char *locale, char **canonname) { + char *locale_copy; char *save; char *res; @@ -2022,10 +2027,30 @@ check_locale_name(int category, const char *locale, char **canonname) /* for setlocale() call */ if (!locale) - locale = ""; + { +#ifdef WIN32 + wchar_t wide_name[LOCALE_NAME_MAX_LENGTH]; + char name[LOCALE_NAME_MAX_LENGTH]; + + /* use Windows API to find the default in BCP47 format */ + if (GetUserDefaultLocaleName(wide_name, LOCALE_NAME_MAX_LENGTH) == 0) + pg_fatal("failed to get default locale name: error code %lu", + GetLastError()); + if (WideCharToMultiByte(CP_ACP, 0, wide_name, -1, name, +LOCALE_NAME_MAX_LENGTH, NULL, NULL) == 0) + pg_fatal("failed to convert locale name: error code %lu", + GetLastError()); + locale_copy = pg_strdup(name); +#else + /* use environment to find the default */ + locale_copy = pg_strdup(""); +#endif + } + else + locale_copy = pg_strdup(locale); /* set the locale with setlocale, to see if it accepts it. */ - res = setlocale(category, locale); + res = setlocale(category, locale_copy); /* save canonical name if requested. */ if (res && canonname) @@ -2054,6 +2079,8 @@ check_locale_name(int category, const char *locale, char **canonname) pg_fatal("invalid locale settings; check LANG and LC_* environment variables"); } } + + free(locale_copy); } /* -- 2.35.1 From 1e0b75b4c8958397a8e660fa0b8759f1da78a753 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 19 Jul 2022 08:53:08 +1200 Subject: [PATCH v2 2/2] Remove support for old Windows
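The decision the patch adds to check_locale_name() boils down to: if the user gave no explicit locale, on Windows ask the OS for the user's default BCP 47 tag (via GetUserDefaultLocaleName()) instead of passing "" to setlocale(), which would hand back a legacy English-worded name. A small Python sketch of that decision, with the platform calls stubbed out as assumptions:

```python
# Toy model of the locale-default selection added to initdb's
# check_locale_name(); the callable stands in for the Windows API.

def choose_locale(explicit, on_windows, get_user_default_locale_name):
    """Return the locale string to hand to setlocale().

    On Windows, an empty request is resolved to the user's default
    BCP 47 tag (e.g. "en-US") rather than letting setlocale("")
    pick a legacy name like "English_United States.1252".
    """
    if explicit:
        return explicit
    if on_windows:
        name = get_user_default_locale_name()
        if not name:
            raise RuntimeError("failed to get default locale name")
        return name
    return ""  # Unix: let setlocale() consult the environment
```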
Re: Doc about how to set max_wal_senders when setting minimal wal_level
On Mon, Jul 18, 2022 at 6:27 PM Japin Li wrote: > > On Tue, 19 Jul 2022 at 03:58, Bruce Momjian wrote: > > On Fri, Jul 15, 2022 at 09:29:20PM +0800, Japin Li wrote: > >> > >> On Fri, 15 Jul 2022 at 08:49, Bruce Momjian wrote: > >> > On Tue, Jul 5, 2022 at 08:02:33PM -0400, Tom Lane wrote: > >> >> "Precondition" is an overly fancy word that makes things less clear > >> >> not more so. Does it mean that setting wal_level = minimal will fail > >> >> if you don't do these other things, or does it just mean that you > >> >> won't be getting the absolute minimum WAL volume? If the former, > >> >> I think it'd be better to say something like "To set wal_level to > minimal, > >> >> you must also set [these variables], which has the effect of > disabling > >> >> both WAL archiving and streaming replication." > >> > > >> > I have created the attached patch to try to improve this text. > >> > >> IMO we can add the following sentence for wal_level description, since > >> if wal_level = minimal and max_wal_senders > 0, we cannot start the > database. > >> > >> To set wal_level to minimal, you must also set max_wal_senders to 0, > >> which has the effect of disabling both WAL archiving and streaming > >> replication. > > > > Okay, text added in the attached patch. > > Thanks for updating the patch! LGTM. > > +0.90 Consider changing: "makes any base backups taken before this unusable" to: "makes existing base backups unusable" As I try to justify this, though, it isn't quite true, maybe: "makes point-in-time recovery, using existing base backups, unable to replay future WAL." David J.
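The startup-time rule being documented here can be stated as a single check: wal_level = minimal is only accepted when max_wal_senders is 0. A hedged Python sketch of that check (the real one lives in the server's C startup code, and the error wording below is illustrative):

```python
# Toy model of the consistency check between wal_level and
# max_wal_senders that the documentation change describes.

def check_wal_settings(wal_level, max_wal_senders):
    """Raise if the combination cannot start, mirroring the documented rule."""
    if wal_level == "minimal" and max_wal_senders > 0:
        raise ValueError(
            "WAL streaming (max_wal_senders > 0) requires wal_level "
            '"replica" or "logical"')
```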
Re: Allowing REINDEX to have an optional name
Sorry, I meant to send this earlier.. On Sun, Jul 17, 2022 at 03:19:47PM +0900, Michael Paquier wrote: > The second thing is test coverage. Using a REINDEX DATABASE/SYSTEM > +my $catalog_toast_index = $node->safe_psql('postgres', > + "SELECT indexrelid::regclass FROM pg_index WHERE indrelid = > '$toast_table'::regclass;" > +); > + > +# Set of SQL queries to cross-check the state of relfilenodes across > +# REINDEX operations. A set of relfilenodes is saved from the catalogs > +# and then compared with pg_class. > +$node->safe_psql('postgres', > + 'CREATE TABLE toast_relfilenodes (relname regclass, relfilenode oid);'); It looks like you named the table "toast_relfilenodes", but then also store to it data for non-toast tables. It's also a bit weird to call the column "relname" but use it to store the ::regclass. You later need to cast the column to text, so you may as well store it as text, either relname or oid::regclass. It seems like cluster.sql does this more succinctly. -- Check that clustering sets new relfilenodes: CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ; CLUSTER clstrpart USING clstrpart_idx; CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ; SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C"; > +# Save the relfilenode of a set of toast indexes, one from the catalog > +# pg_constraint and one from the test table. This data is used for checks > +# after some of the REINDEX operations done below, checking if they are > +# changed. 
> +my $fetch_toast_relfilenodes = qq{SELECT c.oid::regclass, c.relfilenode > + FROM pg_class a > +JOIN pg_class b ON (a.oid = b.reltoastrelid) > +JOIN pg_index i on (a.oid = i.indrelid) > +JOIN pg_class c on (i.indexrelid = c.oid) > + WHERE b.oid IN ('pg_constraint'::regclass, 'test1'::regclass)}; > +# Same for relfilenodes of normal indexes. This saves the relfilenode > +# from a catalog of pg_constraint, and the one from the test table. > +my $fetch_index_relfilenodes = qq{SELECT oid, relfilenode > + FROM pg_class > + WHERE relname IN ('pg_constraint_oid_index', 'test1x')}; > +my $save_relfilenodes = > + "INSERT INTO toast_relfilenodes $fetch_toast_relfilenodes;" > + . "INSERT INTO toast_relfilenodes $fetch_index_relfilenodes;"; > + > +# Query to compare a set of relfilenodes saved with the contents of pg_class. > +# Note that this does not join using OIDs, as CONCURRENTLY would change them > +# when reindexing. A filter is applied on the toast index names, even if > this > +# does not make a difference between the catalog and normal ones, the > ordering > +# based on the name is enough to ensure a fixed output. > +my $compare_relfilenodes = > + qq(SELECT regexp_replace(b.relname::text, > '(pg_toast.pg_toast_)\\d{4,5}(_index)', '\\1\\2'), Why {4,5} ? > + CASE WHEN a.relfilenode = b.relfilenode THEN 'relfilenode is unchanged' > +ELSE 'relfilenode has changed' END > + FROM toast_relfilenodes b > +JOIN pg_class a ON b.relname::text = a.oid::regclass::text > + ORDER BY b.relname::text); > + > +# Save the set of relfilenodes and compare them. 
> +$node->safe_psql('postgres', $save_relfilenodes); > +$node->issues_sql_like( > + [ 'reindexdb', 'postgres' ], > + qr/statement: REINDEX DATABASE postgres;/, > + 'SQL REINDEX run'); > +my $relnode_info = $node->safe_psql('postgres', $compare_relfilenodes); > +is( $relnode_info, qq(pg_constraint_oid_index|relfilenode is unchanged > +pg_toast.pg_toast__index|relfilenode has changed > +pg_toast.pg_toast__index|relfilenode is unchanged > +test1x|relfilenode has changed), 'relfilenode change after REINDEX > DATABASE'); > + > +# Re-save and run the second one. > +$node->safe_psql('postgres', > + "TRUNCATE toast_relfilenodes; $save_relfilenodes"); > +$node->issues_sql_like( > + [ 'reindexdb', '-s', 'postgres' ], > + qr/statement: REINDEX SYSTEM postgres;/, > + 'reindex system tables'); > +$relnode_info = $node->safe_psql('postgres', $compare_relfilenodes); > +is( $relnode_info, qq(pg_constraint_oid_index|relfilenode has changed > +pg_toast.pg_toast__index|relfilenode is unchanged > +pg_toast.pg_toast__index|relfilenode has changed > +test1x|relfilenode is unchanged), 'relfilenode change after REINDEX SYSTEM');
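The normalization step that the "Why {4,5}?" question is about replaces the OID embedded in a toast index name so the expected test output is stable across runs. A Python equivalent of that regexp, written with `\d+` instead of a fixed quantifier so it matches an OID of any width (which would also sidestep the question):

```python
import re

# Normalize toast index names such as "pg_toast.pg_toast_16385_index"
# so test output does not depend on the OID assigned in a given run.
TOAST_INDEX_RE = re.compile(r"(pg_toast\.pg_toast_)\d+(_index)")

def normalize_toast_index(name):
    return TOAST_INDEX_RE.sub(r"\1\2", name)
```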
Re: [PoC] Improve dead tuple storage for lazy vacuum
Hi, On 2022-07-08 11:09:44 +0900, Masahiko Sawada wrote: > I think that at this stage it's better to define the design first. For > example, key size and value size: are these sizes fixed, or can they be > set to an arbitrary size? Given the use case of buffer mapping, we would > need a wider key to store RelFileNode, ForkNumber, and BlockNumber. On > the other hand, limiting the key size to a 64-bit integer makes the > logic simple, and possibly it could still be used in buffer mapping > cases by using a tree of trees. For value size, if we support > different value sizes specified by the user, we can either embed > multiple values in the leaf node (called Multi-value leaves in the ART > paper) or introduce a leaf node that stores one value (called > Single-value leaves). FWIW, I think the best path forward would be to do something similar to the simplehash.h approach, so it can be customized to the specific user. Greetings, Andres Freund
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Thu, Jul 14, 2022 at 1:17 PM John Naylor wrote: > > On Tue, Jul 12, 2022 at 8:16 AM Masahiko Sawada wrote: > > > > > I think that at this stage it's better to define the design first. For > > > > example, key size and value size, and these sizes are fixed or can be > > > > set the arbitary size? > > > > > > I don't think we need to start over. Andres' prototype had certain > > > design decisions built in for the intended use case (although maybe > > > not clearly documented as such). Subsequent patches in this thread > > > substantially changed many design aspects. If there were any changes > > > that made things wonderful for vacuum, it wasn't explained, but Andres > > > did explain how some of these changes were not good for other uses. > > > Going to fixed 64-bit keys and values should still allow many future > > > applications, so let's do that if there's no reason not to. > > > > I thought Andres pointed out that given that we store BufferTag (or > > part of that) into the key, the fixed 64-bit keys might not be enough > > for buffer mapping use cases. If we want to use wider keys more than > > 64-bit, we would need to consider it. > > It sounds like you've answered your own question, then. If so, I'm > curious what your current thinking is. > > If we *did* want to have maximum flexibility, then "single-value > leaves" method would be the way to go, since it seems to be the > easiest way to have variable-length both keys and values. I do have a > concern that the extra pointer traversal would be a drag on > performance, and also require lots of small memory allocations. Agreed. > I also have some concerns about also simultaneously trying to design > for the use for buffer mappings. I certainly want to make this good > for as many future uses as possible, and I'd really like to preserve > any optimizations already fought for. 
However, to make concrete > progress on the thread subject, I also don't think it's the most > productive use of time to get tied up about the fine details of > something that will not likely happen for several years at the > earliest. I'd like to keep the first version simple. We can improve it and add more optimizations later. Using a radix tree for vacuum TID storage would still be a big win compared to using a flat array, even without all these optimizations. As for the single-value leaves method, I'm also concerned about the extra pointer traversal and extra memory allocation. It's the most flexible, but the multi-value leaves method is also flexible enough for many use cases. Using the single-value method seems to be too much as the first step for me. Overall, using 64-bit keys and 64-bit values would be a reasonable choice for me as the first step. It can cover wider use cases, including the vacuum TID use case. And possibly it can cover further use cases by combining it with a hash table or using a tree of trees, for example. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
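The shape being converged on above — fixed 64-bit keys and 64-bit values — can be sketched quickly. This toy Python radix tree consumes the key in 8-bit chunks and stores the value directly in the final node (closer in spirit to multi-value leaves, with no separate per-value leaf allocation). It models only the structure, not the C node types, sizes, or optimizations discussed in this thread:

```python
# Toy radix tree over fixed 64-bit keys: each level consumes one byte
# of the key, most-significant byte first.  Values live directly in the
# final node, so no per-value leaf object is allocated.

class RadixTree:
    def __init__(self):
        self.root = {}

    @staticmethod
    def _chunks(key):
        """Split a 64-bit key into eight byte-sized chunks, MSB first."""
        return [(key >> shift) & 0xFF for shift in range(56, -1, -8)]

    def set(self, key, value):
        node = self.root
        *inner, last = self._chunks(key)
        for c in inner:
            node = node.setdefault(c, {})
        # Tag value slots so they never collide with child-node keys.
        node[("v", last)] = value

    def get(self, key):
        node = self.root
        *inner, last = self._chunks(key)
        for c in inner:
            node = node.get(c)
            if node is None:
                return None
        return node.get(("v", last))
```

Using Python dicts defeats the memory-compactness that motivates a real radix tree; the sketch is only meant to show the key-chunking and leaf placement.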
Re: PATCH: Add Table Access Method option to pgbench
On Mon, Jul 18, 2022 at 01:53:21PM +0300, Alexander Korotkov wrote: > Looks good to me as well. I'm going to push this if no objections. FWIW, I find the extra mention of PGOPTIONS, added for the specific point of table AMs within the environment-variables part, a bit confusing, because we already mention PGOPTIONS for serializable transactions a bit further down. Hence, my choice would be the addition of an extra paragraph in the "Notes", named "Table Access Methods", just before or after "Good Practices". My 2c. -- Michael
Re: NAMEDATALEN increase because of non-latin languages
On Mon, Jul 18, 2022 at 9:58 AM Andres Freund wrote: > > 0001 is just boilerplate, same as v1 > > If we were to go for this, I wonder if we should backpatch the cast-containing > version of GETSTRUCT for less pain backpatching bugfixes. That'd likely require > using a different name for the cast-containing one. The new version in this series was meant to be temporary scaffolding, but in the back of my mind I wondered if we should go ahead and keep the simple cast for catalogs that have no varlenas or alignment issues. It sounds like you'd be in favor of that. > > 0003 generates static inline functions that work the same as the current > > GETSTRUCT macro, i.e. just cast to the right pointer and return it. > > It seems likely that inline functions are going to be too large for > this. There's a lot of GETSTRUCTs in a lot of files, and emitting a copy of the > function every time doesn't seem great. Ok. > > current offset is the previous offset plus the previous type length, plus > > any alignment padding suitable for the current type (there is none here, so > > the alignment aspect is not tested). I'm hoping something like this will be > > sufficient for what's in the current structs, but of course will need > > additional work when expanding those to include pointers to varlen > > attributes. I've not yet inspected the emitted assembly language, but > > regression tests pass. > > Hm. Wouldn't it make sense to just use the normal tuple deforming routines and > then map the results to the structs? I wasn't sure if they'd be suitable for this, but if they are, that'd make this easier and more maintainable. I'll look into it. -- John Naylor EDB: http://www.enterprisedb.com
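The offset rule quoted above — each member's offset is the previous offset plus the previous type's length, plus any alignment padding needed by the current type — is easy to model. A Python sketch of what such a generator would compute (the field sizes and alignments below are invented examples, not catalog definitions):

```python
# Toy model of computing member offsets for a C-style struct:
# each field starts at the previous end, rounded up to its alignment.

def field_offsets(fields):
    """fields: list of (name, size, align) tuples; returns {name: offset}."""
    offsets = {}
    pos = 0
    for name, size, align in fields:
        pos = (pos + align - 1) // align * align  # pad up to alignment
        offsets[name] = pos
        pos += size
    return offsets
```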
Re: Doc about how to set max_wal_senders when setting minimal wal_level
On Tue, 19 Jul 2022 at 03:58, Bruce Momjian wrote: > On Fri, Jul 15, 2022 at 09:29:20PM +0800, Japin Li wrote: >> >> On Fri, 15 Jul 2022 at 08:49, Bruce Momjian wrote: >> > On Tue, Jul 5, 2022 at 08:02:33PM -0400, Tom Lane wrote: >> >> "Precondition" is an overly fancy word that makes things less clear >> >> not more so. Does it mean that setting wal_level = minimal will fail >> >> if you don't do these other things, or does it just mean that you >> >> won't be getting the absolute minimum WAL volume? If the former, >> >> I think it'd be better to say something like "To set wal_level to minimal, >> >> you must also set [these variables], which has the effect of disabling >> >> both WAL archiving and streaming replication." >> > >> > I have created the attached patch to try to improve this text. >> >> IMO we can add the following sentence for the wal_level description, since >> if wal_level = minimal and max_wal_senders > 0, we cannot start the database. >> >> To set wal_level to minimal, you must also set max_wal_senders to 0, >> which has the effect of disabling both WAL archiving and streaming >> replication. > > Okay, text added in the attached patch. Thanks for updating the patch! LGTM. -- Regards, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: First draft of the PG 15 release notes
> Increase hash_mem_multiplier default to 2.0 (Peter Geoghegan) > This allows query hash operations to use double the amount of work_mem memory > as other operations. I wonder if it's worth pointing out that a query may end up using not just 2x more memory (since work_mem*hash_mem_multiplier is per node), but 2**N more, for N nodes. > Remove pg_dump's --no-synchronized-snapshots option since all supported > server versions support synchronized snapshots (Tom Lane) It'd be better to put that after the note about dropping support for upgrading clusters older than v9.2 in psql/pg_dump/pg_upgrade. > Enable system and TOAST btree indexes to efficiently store duplicates (Peter > Geoghegan) Say "btree indexes on system [and TOAST] tables" > Prevent changes to columns only indexed by BRIN indexes from disabling HOT > updates (Josef Simanek) This was reverted > Generate periodic log message during slow server starts (Nitin Jadhav, Robert > Haas) messages plural > Messages report the cause of the delay. The time interval for notification is > controlled by the new server variable log_startup_progress_interval. *The messages > Add server variable shared_memory_size to report the size of allocated shared > memory (Nathan Bossart) > Add server variable shared_memory_size_in_huge_pages to report the number of > huge memory pages required (Nathan Bossart) Maybe these should say server *parameter* since they're not really "variable". > 0. Add support for LZ4 and Zstandard compression of server-side base backups > (Jeevan Ladhe, Robert Haas) > 1. Allow pg_basebackup to use LZ4 and Zstandard compression on server-side > base backup files (Dipesh Pandit, Jeevan Ladhe) > 2. Allow pg_basebackup's --compress option to control the compression method > and options (Michael Paquier, Robert Haas) >New options include server-gzip (gzip on the server), client-gzip (same as > gzip). > 3. 
Allow pg_basebackup to compress on the server side and decompress on the > client side before storage (Dipesh Pandit) > This is accomplished by specifying compression on the server side and > plain output format. I still think these expose the incremental development rather than the user-facing change. 1. It seems wrong to say "server-side" since client-side compression with LZ4/zstd is also supported. 2. It's confusing to say that the new options are server-gzip and client-gzip, since it just mentioned new algorithms; 3. I'm not sure this needs to be mentioned at all; maybe it should be a "detail" following the item about server-side compression. > Tables added to the listed schemas in the future will also be replicated. "Tables later added" is clearer. Otherwise "in the future" sounds like maybe in v16 or v17 we'll start replicating those tables. > Allow subscribers to stop logical replication application on error (Osumi > Takamichi, Mark Dilger) "application" sounds off. > Add new default WAL-logged method for database creation (Dilip Kumar) "New default" sounds off. Say "Add new WAL-logged method for database creation, used by default". > Have pg_upgrade preserve relfilenodes, tablespace, and database OIDs between > old and new clusters (Shruthi KC, Antonin Houska) "tablespace OIDs" or "tablespace and database OIDs and relfilenodes" > Limit support of pg_upgrade to old servers running PostgreSQL 9.2 and later > (Tom Lane) The word "old" doesn't appear in the two release-note items about pg_dump and psql, and "old" makes it sound like "antique" rather than "source". > Some internal-use-only types have also been assigned this column. this *value > Allow custom scan providers to indicate if they support projections (Sven > Klemm) > The default is now that custom scan providers can't support projections, so > they need to be updated for this release. Per the commit message, they don't "need" to be updated.
I think this should say "The default now assumes that a custom scan provider does not support projections; to retain optimal performance, they should be updated to indicate whether that's supported."
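On the hash_mem_multiplier item near the top of this review: because the limit is work_mem × hash_mem_multiplier per hash-based plan node, the total a query can consume scales with the number of such nodes. A trivial sketch of the arithmetic (illustrative numbers only, not a statement about any particular plan):

```python
# Per-node memory allowance under hash_mem_multiplier: each hash-based
# plan node may use work_mem * hash_mem_multiplier, so a plan with N
# such nodes can use N times that amount in total for hashing.

def hash_mem_budget(work_mem_kb, hash_mem_multiplier, n_hash_nodes):
    per_node = work_mem_kb * hash_mem_multiplier
    return per_node, per_node * n_hash_nodes
```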
Re: System catalog documentation chapter
Bruce Momjian writes: > On Sat, Jul 16, 2022 at 10:53:17AM +0200, Peter Eisentraut wrote: >> Maybe this whole notion that "system views" is one thing is not suitable. > Are you thinking we should just call the chapter "System Catalogs and > Views" and just place them alphabetically in a single chapter? I didn't think that was Peter's argument at all. He's complaining that "system views" isn't a monolithic category, which is a reasonable point, especially since we have a bunch of built-in views that appear in other chapters. But to then also confuse them with catalogs isn't improving the situation. The views that are actually reinterpretations of catalog contents should probably be documented near the catalogs. But a lot of stuff in that chapter is no such thing. For example, it's really unclear why pg_backend_memory_contexts is documented here and not somewhere near the stats views. We also have stuff like pg_available_extensions, pg_file_settings, and pg_timezone_names, which are reporting ground truth of some sort that didn't come from the catalogs. I'm not sure if those belong near the catalogs or not. The larger point, perhaps, is that this whole area is underneath "Part VII: Internals", and that being the case what you would expect to find here is stuff that we don't intend people to interact with in day-to-day usage. Most of the "system views" are specifically intended for day-to-day use, maybe only by DBAs, but nonetheless they are user-facing in a way that the catalogs aren't. Maybe we should move them all to Part IV, in a chapter or chapters adjacent to the Information Schema chapter. Or maybe try to separate "user" views from "DBA" views, and put user views in Part IV while DBA views go into a new chapter in Part III, near the stats views. regards, tom lane
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
On Mon, Jul 18, 2022 at 12:28 PM shiy.f...@fujitsu.com wrote: > > On Fri, Jul 15, 2022 10:39 PM Masahiko Sawada wrote: > > > > This patch should have the fix for the issue that Shi yu reported. Shi > > yu, could you please test it again with this patch? > > > > Thanks for updating the patch! > I have tested and confirmed that the problem I found has been fixed. Thank you for testing! Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: [PATCH] Introduce array_shuffle() and array_sample()
On Tue, Jul 19, 2022 at 8:15 AM Martin Kalcher wrote: > Am 18.07.22 um 21:29 schrieb Tom Lane: > > The preferred thing to do is to add it to our "commitfest" queue, > > which will ensure that it gets looked at eventually. The currently > > open cycle is 2022-09 [1] (see the "New Patch" button there). > > Thanks Tom, did that. FYI that brought your patch to the attention of this CI robot, which is showing a couple of warnings. See the FAQ link there for an explanation of that infrastructure. http://cfbot.cputube.org/martin-kalcher.html
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
On Mon, Jul 18, 2022 at 1:12 PM Amit Kapila wrote: > > On Fri, Jul 15, 2022 at 8:09 PM Masahiko Sawada wrote: > > > > This patch should have the fix for the issue that Shi yu reported. Shi > > yu, could you please test it again with this patch? > > > > Can you explain the cause of the failure and your fix for the same? @@ -1694,6 +1788,8 @@ out: /* be tidy */ if (ondisk) pfree(ondisk); + if (catchange_xip) + pfree(catchange_xip); Regarding the above code in the previous version patch, looking at the generated assembler code shared by Shi yu offlist, I realized that the “if (catchange_xip)” is removed (folded) by gcc optimization. This is because we dereference catchange_xip before null-pointer check as follow: + /* copy catalog modifying xacts */ + sz = sizeof(TransactionId) * catchange_xcnt; + memcpy(ondisk_c, catchange_xip, sz); + COMP_CRC32C(ondisk->checksum, ondisk_c, sz); + ondisk_c += sz; Since sz is 0 in this case, memcpy doesn’t do anything actually. By checking the assembler code, I’ve confirmed that gcc does the optimization for these code and setting -fno-delete-null-pointer-checks flag prevents the if statement from being folded. Also, I’ve confirmed that adding the check if "catchange.xcnt > 0” before the null-pointer check also can prevent that. Adding a check if "catchange.xcnt > 0” looks more robust. I’ve added a similar check for builder->committed.xcnt as well for consistency. builder->committed.xip could have no transactions. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Costing elided SubqueryScans more nearly correctly
On Tue, Jul 19, 2022 at 1:30 AM Tom Lane wrote: > Alvaro Herrera writes: > > On 2022-Jul-18, Richard Guo wrote: > >> BTW, not related to this patch, the new lines for parallel_aware check > >> in setrefs.c are very wide. How about wrap them to keep consistent with > >> arounding codes? > > > Not untrue! Something like this, you mean? Fixed the nearby typo while > > at it. > > WFM. (I'd fixed the comment typo in my patch, but I don't mind if > you get there first.) +1 The fix looks good to me. Thanks Richard
Re: System catalog documentation chapter
On Sat, Jul 16, 2022 at 10:53:17AM +0200, Peter Eisentraut wrote: > Now that I see the result, I don't think this is really the right > improvement yet. > > The new System Views chapter lists views that are effectively quasi-system > catalogs, such as pg_shadow or pg_replication_origin_status -- the fact that > these are views and not tables is secondary. On the other hand, it lists > views that are more on the level of information schema views, that is, they > are explicitly user-facing wrappers around information available elsewhere, > such as pg_sequences, pg_views. > > I think most of them are in the second category. So having this chapter in > the "Internals" part seems wrong. But then moving it, say, closer to where > the information schema is documented wouldn't be right either, unless we > move the views in the first category elsewhere. > > Also, consider that we document the pg_stats_ views in yet another place, > and it's not really clear why something like pg_replication_slots, which > might often be used together with stats views, is documented so far away > from them. > > Maybe this whole notion that "system views" is one thing is not suitable. Are you thinking we should just call the chapter "System Catalogs and Views" and just place them alphabetically in a single chapter? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
Re: Allow placeholders in ALTER ROLE w/o superuser
On Fri, Jul 01, 2022 at 04:40:27PM -0700, Nathan Bossart wrote: > On Sun, Jun 05, 2022 at 11:20:38PM -0500, Steve Chavez wrote: >> However, defining placeholders at the role level requires superuser: >> >> alter role myrole set my.username to 'tomas'; >> ERROR: permission denied to set parameter "my.username" >> >> This is inconsistent and surprising behavior. I think it doesn't make >> sense since you can already set them at the session or transaction >> level (SET LOCAL my.username = 'tomas'). Enabling this would allow sidecar >> services to store metadata scoped to the pertaining role. >> >> I've attached a patch that removes this restriction. From my testing, this >> doesn't affect permission checking when an extension defines its custom GUC >> variables. >> >>DefineCustomStringVariable("my.custom", NULL, NULL, _custom, NULL, >> PGC_SUSET, ..); >> >> Using PGC_SUSET or PGC_SIGHUP will fail accordingly. Also no tests fail >> when doing "make installcheck". > > IIUC you are basically proposing to revert a6dcd19 [0], but it is not clear > to me why that is safe. Am I missing something? I spent some more time looking into this, and I think I've constructed a simple example that demonstrates the problem with removing this restriction. postgres=# CREATE ROLE test CREATEROLE; CREATE ROLE postgres=# CREATE ROLE other LOGIN; CREATE ROLE postgres=# GRANT CREATE ON DATABASE postgres TO other; GRANT postgres=# SET ROLE test; SET postgres=> ALTER ROLE other SET plperl.on_plperl_init = 'test'; ALTER ROLE postgres=> \c postgres other You are now connected to database "postgres" as user "other". postgres=> CREATE EXTENSION plperl; CREATE EXTENSION postgres=> SHOW plperl.on_plperl_init; plperl.on_plperl_init --- test (1 row) In this example, the non-superuser role sets a placeholder GUC for another role. This GUC becomes a PGC_SUSET GUC when plperl is loaded, so a non-superuser role will have successfully set a PGC_SUSET GUC.
If we had a record of who ran ALTER ROLE, we might be able to apply appropriate permissions checking when the module is loaded, but this information doesn't exist in pg_db_role_setting. IIUC we have the following options:

1. Store information about who ran ALTER ROLE. I think there are a couple of problems with this. For example, what happens if the grantor was dropped or its privileges were altered? Should we instead store the context of the user (i.e., PGC_USERSET or PGC_SUSET)? Do we need to add entries to pg_depend?

2. Ignore or ERROR for any ALTER ROLE settings for custom GUCs. Since we don't know who ran ALTER ROLE, we can't trust the value.

3. Require superuser to use ALTER ROLE for a placeholder. This is what we do today. Since we know a superuser set the value, we can always apply it when the custom GUC is finally defined.

If this is an accurate representation of the options, it seems clear why the superuser restriction is in place.

-- Nathan Bossart Amazon Web Services: https://aws.amazon.com
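For reference, the restriction that option 3 preserves is the behavior reported at the start of this thread; a minimal sketch from a non-superuser session (role and parameter names taken from the original report):

```sql
-- Current behavior: a non-superuser cannot persist a placeholder via ALTER ROLE
ALTER ROLE myrole SET my.username TO 'tomas';
-- ERROR:  permission denied to set parameter "my.username"

-- ...even though the same placeholder can be set at session or transaction level:
BEGIN;
SET LOCAL my.username = 'tomas';
COMMIT;
```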
Re: Add WAL recovery messages with log_wal_traffic GUC (was: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display)
On Fri, May 13, 2022 at 09:10:52AM -0400, Robert Haas wrote: > I suggest that if log_startup_progress_interval doesn't meet your > needs here, we should try to understand why not and maybe enhance it, > instead of adding a separate facility. +1. AFAICT it should be possible to make the log_startup_progress_interval machinery extensible enough to be reused for several other tasks that can take a while but have little visibility. Since this thread has been inactive for over 2 months, I'm marking the commitfest entry as Waiting on Author. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Windows default locale vs initdb
On Wed, Dec 15, 2021 at 11:32 PM Juan José Santamaría Flecha wrote: > On Sun, May 16, 2021 at 6:29 AM Noah Misch wrote: >> On Mon, Apr 19, 2021 at 05:42:51PM +1200, Thomas Munro wrote: >> > The question we asked ourselves >> > multiple times in the other thread was how we're supposed to get to >> > the modern BCP 47 form when creating the template databases. It looks >> > like one possibility, since Vista, is to call >> > GetUserDefaultLocaleName()[2] >> >> > No patch, but I wondered if any Windows hackers have any feedback on >> > relative sanity of trying to fix all these problems this way. >> >> Sounds reasonable. If PostgreSQL v15 would otherwise run on Windows Server >> 2003 R2, this is a good time to let that support end. >> > The value returned by GetUserDefaultLocaleName() is a system configured > parameter, independent of what you set with setlocale(). It might be > reasonable for initdb but not for a backend in most cases. Agreed. Only for initdb, and only if you didn't specify a locale name on the command line. > You can get the locale POSIX-ish name using GetLocaleInfoEx(), but this is no > longer recommended, because using LCIDs is no longer recommended [1]. > Although, this would work for legacy locales. Please find attached a POC > patch showing this approach. Now that museum-grade Windows has been defenestrated, we are free to call GetUserDefaultLocaleName(). Here's a patch. One thing you did in your patch that I disagree with, I think, was to convert a BCP 47 name to a POSIX name early, that is, s/-/_/. I think we should use the locale name exactly as Windows (really, under the covers, ICU) spells it. There is only one place in the tree today that really wants a POSIX locale name, and that's LC_MESSAGES, accessed by GNU gettext, not Windows. We already had code to cope with that. 
I think we should also convert to POSIX format when making the collname in your pg_import_system_collations() proposal, so that COLLATE "en_US" works (= a SQL identifier), but that's another thread[1]. I don't think we should do it in collcollate or datcollate, which is a string for the OS to interpret. With my garbage collector hat on, I would like to rip out all of the support for traditional locale names, eventually. Deleting kludgy code is easy and fun -- 0002 is a first swing at that -- but there remains an important unanswered question. How should someone pg_upgrade a "English_Canada.1521" cluster if we now reject that name? We'd need to do a conversion to "en-CA", or somehow tell the user to. H. [1] https://www.postgresql.org/message-id/flat/CAC%2BAXB0WFjJGL1n33bRv8wsnV-3PZD0A7kkjJ2KjPH0dOWqQdg%40mail.gmail.com From d6d677fd185242590f0f716cf69d09e735122ff7 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 19 Jul 2022 06:31:17 +1200 Subject: [PATCH 1/2] Default to BCP 47 locale in initdb on Windows. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Avoid selecting traditional Windows locale names written with English words, because they are unstable and not recommended for use in databases. Since setlocale() returns such names, instead use GetUserDefaultLocaleName() if the user didn't provide an explicit locale. Also update the documentation to recommend BCP 47 over the traditional names when providing explicit values to initdb. 
Reviewed-by: Juan José Santamaría Flecha Discussion: https://postgr.es/m/CA%2BhUKGJ%3DXThErgAQRoqfCy1bKPxXVuF0%3D2zDbB%2BSxDs59pv7Fw%40mail.gmail.com --- doc/src/sgml/charset.sgml | 10 -- src/bin/initdb/initdb.c | 28 +++- 2 files changed, 35 insertions(+), 3 deletions(-) diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml index 445fd175d8..22e33f0f57 100644 --- a/doc/src/sgml/charset.sgml +++ b/doc/src/sgml/charset.sgml @@ -83,8 +83,14 @@ initdb --locale=sv_SE system under what names depends on what was provided by the operating system vendor and what was installed. On most Unix systems, the command locale -a will provide a list of available locales. -Windows uses more verbose locale names, such as German_Germany -or Swedish_Sweden.1252, but the principles are the same. + + + +Windows uses BCP 47 language tags. +For example, sv-SE represents Swedish as spoken in Sweden. +Windows also supports more verbose locale names based on English words, +such as German_Germany or Swedish_Sweden.1252, +but these are not recommended. diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index 89b888eaa5..57c5ecf3cf 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -59,6 +59,10 @@ #include "sys/mman.h" #endif +#ifdef WIN32 +#include +#endif + #include "access/xlog_internal.h" #include "catalog/pg_authid_d.h" #include "catalog/pg_class_d.h" /* pgrminclude ignore */ @@ -2022,7 +2026,27 @@ check_locale_name(int category, const char *locale, char **canonname) /* for
Re: Commitfest Update
On 7/18/22 15:32, Justin Pryzby wrote: > On Mon, Jul 18, 2022 at 12:00:01PM -0700, Jacob Champion wrote: >> And thank you for speaking up so quickly! It's a lot easier to undo >> partial damage :D (Speaking of which: where is that CF update stream you >> mentioned?) > > https://commitfest.postgresql.org/activity/ Thank you. At this point, I think I've repaired all the entries. --Jacob
Re: [PATCH] Introduce array_shuffle() and array_sample()
On 19.07.22 at 00:18, Tom Lane wrote: > Independently of the dimensionality question --- I'd imagined that array_sample would select a random subset of the array elements but keep their order intact. If you want the behavior shown above, you can do array_shuffle(array_sample(...)). But if we randomize it, and that's not what the user wanted, she has no recourse. Now, if you're convinced that the set of people wanting sampling-without-shuffling is the empty set, then making everybody else call two functions is a loser. But I'm not convinced. At the least, I'd like to see the argument made why nobody would want that. On the contrary! I am pretty sure there are people out there wanting sampling-without-shuffling. I will think about that.
Re: [PATCH] Introduce array_shuffle() and array_sample()
"David G. Johnston" writes: > On Mon, Jul 18, 2022 at 3:18 PM Tom Lane wrote: >> Independently of the dimensionality question --- I'd imagined that >> array_sample would select a random subset of the array elements >> but keep their order intact. If you want the behavior shown >> above, you can do array_shuffle(array_sample(...)). But if we >> randomize it, and that's not what the user wanted, she has no >> recourse. > And for those that want to know in what order those elements were chosen > they have no recourse in the other setup. Um ... why is "the order in which the elements were chosen" a concept we want to expose? ISTM sample() is a black box in which notionally the decisions could all be made at once. > I really think this function needs to grow an algorithm argument that can > be used to specify stuff like ordering, replacement/without-replacement, > etc...just some enums separated by commas that can be added to the call. I think you might run out of gold paint somewhere around here. I'm still not totally convinced we should bother with the sample() function at all, let alone that it needs algorithm variants. At some point we say to the user "here's a PL, write what you want for yourself". regards, tom lane
Re: Commitfest Update
On Mon, Jul 18, 2022 at 12:00:01PM -0700, Jacob Champion wrote: > And thank you for speaking up so quickly! It's a lot easier to undo > partial damage :D (Speaking of which: where is that CF update stream you > mentioned?) https://commitfest.postgresql.org/activity/ -- Justin
Re: [PATCH] Introduce array_shuffle() and array_sample()
On Mon, Jul 18, 2022 at 3:18 PM Tom Lane wrote: > Independently of the dimensionality question --- I'd imagined that > array_sample would select a random subset of the array elements > but keep their order intact. If you want the behavior shown > above, you can do array_shuffle(array_sample(...)). But if we > randomize it, and that's not what the user wanted, she has no > recourse. And for those that want to know in what order those elements were chosen they have no recourse in the other setup. I really think this function needs to grow an algorithm argument that can be used to specify stuff like ordering, replacement/without-replacement, etc...just some enums separated by commas that can be added to the call. David J.
Re: [PATCH] Introduce array_shuffle() and array_sample()
Martin Kalcher writes: > If we go with (1) array_shuffle() and array_sample() should shuffle each > element individually and always return a one-dimensional array. >select array_shuffle('{{1,2},{3,4},{5,6}}'); >--- > {1,4,3,5,6,2} >select array_sample('{{1,2},{3,4},{5,6}}', 3); >-- > {1,4,3} Independently of the dimensionality question --- I'd imagined that array_sample would select a random subset of the array elements but keep their order intact. If you want the behavior shown above, you can do array_shuffle(array_sample(...)). But if we randomize it, and that's not what the user wanted, she has no recourse. Now, if you're convinced that the set of people wanting sampling-without-shuffling is the empty set, then making everybody else call two functions is a loser. But I'm not convinced. At the least, I'd like to see the argument made why nobody would want that. regards, tom lane
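To make the two proposed semantics concrete (a hedged sketch: array_sample and array_shuffle exist only in the patch under discussion, and their outputs are random, so the result shown is merely illustrative):

```sql
-- Order-preserving sample: a random subset that keeps the input order,
-- e.g. {1,3,4}
SELECT array_sample('{1,2,3,4,5}'::int[], 3);

-- Callers who also want a randomized order compose the two functions:
SELECT array_shuffle(array_sample('{1,2,3,4,5}'::int[], 3));
```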
Re: allow specifying action when standby encounters incompatible parameter settings
On Fri, Jun 24, 2022 at 11:42:29AM +0100, Simon Riggs wrote: > This patch would undo a very important change - to keep servers > available by default and go back to the old behavior for a huge fleet > of Postgres databases. The old behavior of shutdown-on-change caused > catastrophe many times for users and in one case brought down a rather > large and important service provider, whose CTO explained to me quite > clearly how stupid he thought that old behavior was. So I will not > easily agree to allowing it to be put back into PostgreSQL, simply to > avoid adding a small amount of easy code into an orchestration layer > that could and should implement documented best practice. > > I am otherwise very appreciative of your insightful contributions, > just not this specific one. Given this feedback, I intend to mark the associated commitfest entry as Withdrawn at the conclusion of the current commitfest. > Happy to discuss how we might introduce new parameters/behavior to > reduce the list of parameters that need to be kept in lock-step. Me, too. I don't have anything concrete to propose at the moment. I thought Horiguchi-san's idea of automatically running ALTER SYSTEM was intriguing, but I have not explored that in depth. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Use -fvisibility=hidden for shared libraries
Hi, On 2022-07-18 00:05:16 -0700, Andres Freund wrote: > It looks like we might be missing out on benefiting from this on more > platforms - the test right now is in the gcc specific section of configure.ac, > but it looks like at least Intel's, sun's and AIX's compilers might all > support -fvisibility with the same syntax. > > Given that that's just about all compilers we support using configure, perhaps > we should just move that out of the compiler specific section? Doesn't look > like there's much precedent for that so far... Here's a potential patch along those lines. I wonder if we also should move the -fno-strict-aliasing, -fwrapv tests out. But that'd be something for later. Greetings, Andres Freund >From 713236eb696b6b60bbde3582d0fd31f1de23d7b9 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Mon, 18 Jul 2022 15:05:58 -0700 Subject: [PATCH v4] configure: Check if -fvisibility is supported for all compilers. --- configure| 305 ++- configure.ac | 31 -- 2 files changed, 177 insertions(+), 159 deletions(-) diff --git a/configure b/configure index a4f4d321fb7..df68b86b09b 100755 --- a/configure +++ b/configure @@ -6304,154 +6304,6 @@ if test x"$pgac_cv_prog_CC_cflags__ftree_vectorize" = x"yes"; then fi - # - # If the compiler knows how to hide symbols add the switch needed for that - # to CFLAGS_SL_MODULE and define HAVE_VISIBILITY_ATTRIBUTE. - { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CC} supports -fvisibility=hidden, for CFLAGS_SL_MODULE" >&5 -$as_echo_n "checking whether ${CC} supports -fvisibility=hidden, for CFLAGS_SL_MODULE... " >&6; } -if ${pgac_cv_prog_CC_cflags__fvisibility_hidden+:} false; then : - $as_echo_n "(cached) " >&6 -else - pgac_save_CFLAGS=$CFLAGS -pgac_save_CC=$CC -CC=${CC} -CFLAGS="${CFLAGS_SL_MODULE} -fvisibility=hidden" -ac_save_c_werror_flag=$ac_c_werror_flag -ac_c_werror_flag=yes -cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. 
*/ - -int -main () -{ - - ; - return 0; -} -_ACEOF -if ac_fn_c_try_compile "$LINENO"; then : - pgac_cv_prog_CC_cflags__fvisibility_hidden=yes -else - pgac_cv_prog_CC_cflags__fvisibility_hidden=no -fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -ac_c_werror_flag=$ac_save_c_werror_flag -CFLAGS="$pgac_save_CFLAGS" -CC="$pgac_save_CC" -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_prog_CC_cflags__fvisibility_hidden" >&5 -$as_echo "$pgac_cv_prog_CC_cflags__fvisibility_hidden" >&6; } -if test x"$pgac_cv_prog_CC_cflags__fvisibility_hidden" = x"yes"; then - CFLAGS_SL_MODULE="${CFLAGS_SL_MODULE} -fvisibility=hidden" -fi - - - if test "$pgac_cv_prog_CC_cflags__fvisibility_hidden" = yes; then - -$as_echo "#define HAVE_VISIBILITY_ATTRIBUTE 1" >>confdefs.h - - fi - # For C++ we additionally want -fvisibility-inlines-hidden - { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CXX} supports -fvisibility=hidden, for CXXFLAGS_SL_MODULE" >&5 -$as_echo_n "checking whether ${CXX} supports -fvisibility=hidden, for CXXFLAGS_SL_MODULE... " >&6; } -if ${pgac_cv_prog_CXX_cxxflags__fvisibility_hidden+:} false; then : - $as_echo_n "(cached) " >&6 -else - pgac_save_CXXFLAGS=$CXXFLAGS -pgac_save_CXX=$CXX -CXX=${CXX} -CXXFLAGS="${CXXFLAGS_SL_MODULE} -fvisibility=hidden" -ac_save_cxx_werror_flag=$ac_cxx_werror_flag -ac_cxx_werror_flag=yes -ac_ext=cpp -ac_cpp='$CXXCPP $CPPFLAGS' -ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_cxx_compiler_gnu - -cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. 
*/ - -int -main () -{ - - ; - return 0; -} -_ACEOF -if ac_fn_cxx_try_compile "$LINENO"; then : - pgac_cv_prog_CXX_cxxflags__fvisibility_hidden=yes -else - pgac_cv_prog_CXX_cxxflags__fvisibility_hidden=no -fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -ac_ext=c -ac_cpp='$CPP $CPPFLAGS' -ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_c_compiler_gnu - -ac_cxx_werror_flag=$ac_save_cxx_werror_flag -CXXFLAGS="$pgac_save_CXXFLAGS" -CXX="$pgac_save_CXX" -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_prog_CXX_cxxflags__fvisibility_hidden" >&5 -$as_echo "$pgac_cv_prog_CXX_cxxflags__fvisibility_hidden" >&6; } -if test x"$pgac_cv_prog_CXX_cxxflags__fvisibility_hidden" = x"yes"; then - CXXFLAGS_SL_MODULE="${CXXFLAGS_SL_MODULE} -fvisibility=hidden" -fi - - { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CXX} supports -fvisibility-inlines-hidden, for CXXFLAGS_SL_MODULE" >&5 -$as_echo_n "checking whether ${CXX} supports -fvisibility-inlines-hidden, for CXXFLAGS_SL_MODULE... " >&6; } -if ${pgac_cv_prog_CXX_cxxflags__fvisibility_inlines_hidden+:} false; then : - $as_echo_n "(cached) " >&6 -else -
Re: pg_parameter_aclcheck() and trusted extensions
On Thu, Jul 14, 2022 at 03:57:35PM -0700, Nathan Bossart wrote: > However, ALTER ROLE RESET ALL will be blocked, while resetting only the > individual GUC will go through. > > postgres=> ALTER ROLE other RESET ALL; > ALTER ROLE > postgres=> SELECT * FROM pg_db_role_setting; >setdatabase | setrole |setconfig > -+-+- > 0 | 16385 | {zero_damaged_pages=on} > (1 row) > > postgres=> ALTER ROLE other RESET zero_damaged_pages; > ALTER ROLE > postgres=> SELECT * FROM pg_db_role_setting; >setdatabase | setrole | setconfig > -+-+--- > (0 rows) > > I think this is because GUCArrayReset() is the only caller of > validate_option_array_item() that sets skipIfNoPermissions to true. The > others fall through to set_config_option(), which does a > pg_parameter_aclcheck(). So, you are right. Here's a small patch that seems to fix this case. However, I wonder if a better way to fix this is to provide a way to stop set_config_option() from throwing errors (e.g., setting elevel to -1). That way, we could remove the manual permissions checks in favor of always using the real ones, which might help prevent similar bugs in the future. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 0328029d43..fbc1014824 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -11689,7 +11689,8 @@ validate_option_array_item(const char *name, const char *value, * We cannot do any meaningful check on the value, so only permissions * are useful to check. 
*/ - if (superuser()) + if (superuser() || + pg_parameter_aclcheck(gconf->name, GetUserId(), ACL_SET) == ACLCHECK_OK) return true; if (skipIfNoPermissions) return false; @@ -11703,6 +11704,8 @@ validate_option_array_item(const char *name, const char *value, /* ok */ ; else if (gconf->context == PGC_SUSET && superuser()) /* ok */ ; + else if (pg_parameter_aclcheck(gconf->name, GetUserId(), ACL_SET) == ACLCHECK_OK) + /* ok */ ; else if (skipIfNoPermissions) return false; /* if a permissions error should be thrown, let set_config_option do it */
Re: [PATCH] Introduce array_shuffle() and array_sample()
On 18.07.22 at 23:03, Tom Lane wrote: > I wrote: >> Martin had originally proposed (2), which I rejected on the grounds that we don't treat multi-dimensional arrays as arrays-of-arrays for any other purpose. > Actually, after poking at it for awhile, that's an overstatement. It's true that the type system doesn't think N-D arrays are arrays-of-arrays, but there are individual functions/operators that do. Thanks Robert for pointing out the inconsistent behavior of array_sample(). That needs to be fixed. As Tom's investigation showed, there is no consensus in the code on whether multi-dimensional arrays are treated as arrays-of-arrays or not. We need to decide what the correct treatment should be. If we go with (1), array_shuffle() and array_sample() should shuffle each element individually and always return a one-dimensional array. select array_shuffle('{{1,2},{3,4},{5,6}}'); --- {1,4,3,5,6,2} select array_sample('{{1,2},{3,4},{5,6}}', 3); -- {1,4,3} If we go with (2), both functions should only operate on the first dimension, shuffling whole subarrays and keeping the dimensions intact. select array_shuffle('{{1,2},{3,4},{5,6}}'); - {{3,4},{1,2},{5,6}} select array_sample('{{1,2},{3,4},{5,6}}', 2); --- {{3,4},{1,2}} I do not feel qualified to make that decision. (2) complicates the code a bit, but that should not be the main argument here. Martin
Re: [PATCH] Introduce array_shuffle() and array_sample()
I wrote: > Martin had originally proposed (2), which I rejected on the grounds > that we don't treat multi-dimensional arrays as arrays-of-arrays for > any other purpose. Actually, after poking at it for awhile, that's an overstatement. It's true that the type system doesn't think N-D arrays are arrays-of-arrays, but there are individual functions/operators that do. For instance, @> is in the its-a-flat-list-of-elements camp: regression=# select array[[1,2],[3,4]] @> array[1,3]; ?column? -- t (1 row) but || wants to preserve dimensionality: regression=# select array[[1,2],[3,4]] || array[1]; ERROR: cannot concatenate incompatible arrays DETAIL: Arrays with differing dimensions are not compatible for concatenation. Various other functions dodge the issue by refusing to work on arrays of more than one dimension. There seem to be more functions that think arrays are flat than otherwise, but it's not as black-and-white as I was thinking. regards, tom lane
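For completeness, the third camp (functions that refuse to work on arrays of more than one dimension) can be illustrated with array_position; the error text below is quoted from memory and should be treated as approximate:

```sql
-- Flat-list camp: containment ignores dimensionality
SELECT array[[1,2],[3,4]] @> array[1,3];   -- t

-- Refuse-to-play camp: array_position only accepts one-dimensional arrays
SELECT array_position(array[[1,2],[3,4]], 2);
-- ERROR:  searching for elements in multidimensional arrays is not supported
```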
Re: [Commitfest 2022-07] Begins Now
Hi, On 2022-07-18 13:34:52 -0700, Jacob Champion wrote: > On 7/18/22 12:32, Andres Freund wrote: > > I'm not following - I'm talking about the patch author needing a while to > > address the higher level feedback given by a reviewer. The author might put > > out a couple new versions, which each might still benefit from review. In > > that > > - pretty common imo - situation I don't think it's useful for the reviewer > > that provided the higher level feedback to be removed from the patch. > > Okay, I think I get it now. Thanks. > > There's still something off in that case that I can't quite > articulate... Is it your intent to use Reviewer as a signal that "I'll > come back to this eventually"? That, and as a way to find out what I possibly should look at again. > As a signal to other prospective reviewers that you're handling the patch? Definitely not. I think no reviewer on a patch should be taken as that. There are often many angles to a patch, and leaving trivial patches aside, no reviewer is an expert in all of them. > How should a CFM move things forward when they come to a patch that's been > responded to by the author but the sole Reviewer has been silent? Ping the reviewer and/or thread, ensure the patch is in needs-review state. I don't think removing reviewers in the CF app would help with that anyway - often some reviewers explicitly state that they're only reviewing a specific part of the patch, or that they looked at everything but lack expertise to be confident in their positions etc. Such reviewers might do more rounds of feedback to newer patches, but the patch might still need more feedback. ISTM that you're trying to get patches to have zero reviewers if they need more reviewers, because that can serve as a signal in the CF app. But to me that's a bad proxy. Greetings, Andres Freund
Re: doc: Clarify Routines and Extension Membership
On Thu, Jul 14, 2022 at 06:27:17PM -0700, David G. Johnston wrote: > Thank you and apologies for being quiet here and on a few of the other > threads. > I've been on vacation and flagged as ToDo some of the non-simple feedback > items > that have come this way. No need to worry --- we will incorporate your suggestions whenever you can supply them. I know you waited months for these to be addressed. > The change to restrict and description in drop extension needs to be fixed up > (the other pages look good). > > "This option prevents the specified extensions from being dropped if there > exists non-extension-member objects that depends on any the extensions. This > is > the default." > > At minimum: "...that depend on any of the extensions." Agreed. > I did just now confirm that if any of the named extensions failed to be > dropped > the entire command fails. There is no partial success mode. > > I'd like to avoid non-extension-member, and one of the main points is that the > routine dependency is member-like, not actual membership. Hence the separate > wording. Okay. > I thus propose to replace the drop extension / restrict paragraph and replace > it with the following: > > "This option prevents the specified extensions from being dropped if other > objects - besides these extensions, their members, and their explicitly > dependent routines - depend on them. This is the default." Good. > Also, I'm thinking to change, on the same page (description): > > "Dropping an extension causes its component objects," > > to be: > > "Dropping an extension causes its member objects," > > I'm not sure why I originally chose component over member... All done, in the attached patch. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. 
Mark Batterson diff --git a/doc/src/sgml/ref/drop_extension.sgml b/doc/src/sgml/ref/drop_extension.sgml index c01ddace84..549b8d3b52 100644 --- a/doc/src/sgml/ref/drop_extension.sgml +++ b/doc/src/sgml/ref/drop_extension.sgml @@ -30,7 +30,7 @@ DROP EXTENSION [ IF EXISTS ] name [ DROP EXTENSION removes extensions from the database. - Dropping an extension causes its component objects, and other explicitly + Dropping an extension causes its member objects, and other explicitly dependent routines (see , the depends on extension action), to be dropped as well. @@ -79,9 +79,9 @@ DROP EXTENSION [ IF EXISTS ] name [ RESTRICT - This option prevents the specified extensions from being dropped - if there exists non-extension-member objects that depends on any - the extensions. This is the default. + This option prevents the specified extensions from being dropped if + other objects, besides these extensions, their members, and their + explicitly dependent routines, depend on them. This is the default.
Re: [PATCH] Introduce array_shuffle() and array_sample()
Robert Haas writes: > On Mon, Jul 18, 2022 at 3:03 PM Martin Kalcher > wrote: >> array_shuffle(anyarray) -> anyarray >> array_sample(anyarray, integer) -> anyarray > I think it's questionable whether the behavior of array_shuffle() is > correct for a multi-dimensional array. The implemented behavior is to > keep the dimensions as they were, but permute the elements across all > levels at random. But there are at least two other behaviors that seem > potentially defensible: (1) always return a 1-dimensional array, (2) > shuffle the sub-arrays at the top-level without the possibility of > moving elements within or between sub-arrays. What behavior we decide > is best here should be documented. Martin had originally proposed (2), which I rejected on the grounds that we don't treat multi-dimensional arrays as arrays-of-arrays for any other purpose. Maybe we should have, but that ship sailed decades ago, and I doubt that shuffle() is the place to start changing it. I could get behind your option (1) though, to make it clearer that the input array's dimensionality is not of interest. Especially since, as you say, (1) is pretty much the only sensible choice for array_sample. > I also think you should add test cases involving multi-dimensional > arrays, as well as arrays with non-default bounds. e.g. trying > shuffling or sampling some values like > '[8:10][-6:-5]={{1,2},{3,4},{5,6}}'::int[] This'd only matter if we decide not to ignore the input's dimensionality. regards, tom lane
Re: [Commitfest 2022-07] Begins Now
On 7/18/22 12:32, Andres Freund wrote: > I'm not following - I'm talking about the patch author needing a while to > address the higher level feedback given by a reviewer. The author might put > out a couple new versions, which each might still benefit from review. In that > - pretty common imo - situation I don't think it's useful for the reviewer > that provided the higher level feedback to be removed from the patch. Okay, I think I get it now. Thanks. There's still something off in that case that I can't quite articulate... Is it your intent to use Reviewer as a signal that "I'll come back to this eventually"? As a signal to other prospective reviewers that you're handling the patch? How should a CFM move things forward when they come to a patch that's been responded to by the author but the sole Reviewer has been silent? --Jacob
Re: pg15b2: large objects lost on upgrade
On Mon, Jul 18, 2022 at 4:06 PM Andres Freund wrote: > How about adding a new binary_upgrade_* helper function for this purpose > instead, instead of tying it into truncate? I considered that briefly, but it would need to do a lot of things that TRUNCATE already knows how to do, so it does not seem like a good idea. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [PATCH] Introduce array_shuffle() and array_sample()
On Mon, Jul 18, 2022 at 3:03 PM Martin Kalcher wrote: > Thanks for all your feedback and help. I got a patch that i consider > ready for review. It introduces two new functions: > >array_shuffle(anyarray) -> anyarray >array_sample(anyarray, integer) -> anyarray > > array_shuffle() shuffles an array (obviously). array_sample() picks n > random elements from an array. I like this idea. I think it's questionable whether the behavior of array_shuffle() is correct for a multi-dimensional array. The implemented behavior is to keep the dimensions as they were, but permute the elements across all levels at random. But there are at least two other behaviors that seem potentially defensible: (1) always return a 1-dimensional array, (2) shuffle the sub-arrays at the top-level without the possibility of moving elements within or between sub-arrays. What behavior we decide is best here should be documented. array_sample() will return elements in random order when sample_size < array_size, but in the original order when sample_size >= array_size. Similarly, it will always return a 1-dimensional array in the former case, but will keep the original dimensions in the latter case. That seems pretty hard to defend. I think it should always return a 1-dimensional array with elements in random order, and I think this should be documented. I also think you should add test cases involving multi-dimensional arrays, as well as arrays with non-default bounds. e.g. trying shuffling or sampling some values like '[8:10][-6:-5]={{1,2},{3,4},{5,6}}'::int[] -- Robert Haas EDB: http://www.enterprisedb.com
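As a sketch of the suggested test material (array_shuffle is from the proposed patch; its output is random and is therefore not shown):

```sql
-- The literal parses as a 3x2 int[] with non-default bounds on both dimensions
SELECT array_dims('[8:10][-6:-5]={{1,2},{3,4},{5,6}}'::int[]);
-- [8:10][-6:-5]

-- Whatever dimension handling is chosen for shuffle/sample should be pinned
-- down by regression tests on values like this one:
SELECT array_shuffle('[8:10][-6:-5]={{1,2},{3,4},{5,6}}'::int[]);
```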
Re: [RFC] building postgres with meson - v10
Hi,

On 2022-07-18 11:33:09 +0200, Peter Eisentraut wrote:
> The following patches are ok to commit IMO:
>
> a1c5542929 prereq: Deal with paths containing \ and spaces in
>            basebackup_to_shell tests
> e37951875d meson: prereq: psql: Output dir and dependency generation for
>            sql_help
> 18cc9fbd02 meson: prereq: ecpg: Add and use output directory argument for
>            preproc/*.pl
> 58a32694e9 meson: prereq: Move snowball_create.sql creation into perl file
> 59b8bffdaf meson: prereq: Add output path arg in generate-lwlocknames.pl
> 2db97b00d5 meson: prereq: generate-errcodes.pl: Accept output file
> fb8f52f21d meson: prereq: unicode: Allow to specify output directory
> 8f1e4410d6 meson: prereq: Refactor dtrace postprocessing make rules
> 3d18a20b11 meson: prereq: Add --outdir option to gen_node_support.pl

I pushed these. Thanks for the reviews and patches! The symbol export
stuff has also been pushed (discussed in a separate thread). It's nice to
see the meson patchset length reduced by this much.

I pushed a rebased version of the remaining branches to git. I'll be on
vacation for a bit, so I'm not sure I can get a new version with further
cleanups out before then.

Given that we can't use src/tools/gen_versioning_script.pl for the make
build, due to not depending on perl for tarball builds, I'm inclined to
rewrite it in Python (which we depend on via meson anyway) and consider it
a meson-specific wrapper?

Bilal, Peter previously commented on the pg_regress change for ecpg,
perhaps you can comment on that? In
https://postgr.es/m/0e81e45c-c9a5-e95b-2782-ab2dfec8bf57%40enterprisedb.com

On 2022-07-06 11:03:31 +0200, Peter Eisentraut wrote:
> dff7b5a960 meson: prereq: regress: allow to specify directory containing
>            expected files.
>
> This could use a bit more explanation, but it doesn't look
> controversial so far.

Greetings,

Andres Freund
Re: [PATCH] Introduce array_shuffle() and array_sample()
On 18.07.22 21:29, Tom Lane wrote:
> The preferred thing to do is to add it to our "commitfest" queue, which
> will ensure that it gets looked at eventually. The currently open cycle
> is 2022-09 [1] (see the "New Patch" button there).

Thanks Tom, did that. I am not sure whether "SQL Commands" is the correct
topic for this patch, but I guess it's not that important. I am impressed
by all the infrastructure built around this project.

Martin
Re: System column support for partitioned tables using heap
On Sun, Jul 17, 2022 at 9:04 PM Morris de Oryx wrote: > This fails on a partitioned table because xmax() may not exist. In fact, it > does exist in all of those tables, but the system doesn't know how to > guarantee that. I know which tables are partitioned, and can downgrade the > result on partitioned tables to the count(*) I've been using to date. But now > I'm wondering if working with xmax() like this is a poor idea going forward. > I don't want to lean on a feature/behavior that's likely to change. For > example, I noticed the other day that MERGE does not support RETURNING. > > I'd appreciate any insight or advice you can offer. What is motivating you to want to see the xmax value here? It's not an unreasonable thing to want to do, IMHO, but it's a little bit niche so I'm just curious what the motivation is. I do agree with you that it would be nice if this worked better than it does, but I don't really know exactly how to make that happen. The column list for a partitioned table must be fixed at the time it is created, but we do not know what partitions might be added in the future, and thus we don't know whether they will have an xmax column. I guess we could have tried to work things out so that a 0 value would be passed up from children that lack an xmax column, and that would allow the parent to have such a column, but I don't feel too bad that we didn't do that ... should I? -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg15b2: large objects lost on upgrade
On Mon, Jul 18, 2022 at 02:57:40PM -0400, Robert Haas wrote: > So I tried implementing this but I didn't get it quite right the first > time. It's not enough to call smgrdounlinkall() instead of > RelationDropStorage(), because just as RelationDropStorage() does not > actually drop the storage but only schedules it to be dropped later, > so also smgrdounlinkall() does not in fact unlink all, but only some. > It leaves the first segment of the main relation fork around to guard > against the hazards discussed in the header comments for mdunlink(). > But those hazards don't really matter here either, because, again, any > failure will necessarily require that the entire new cluster be > destroyed, and also because there shouldn't be any concurrent activity > in the new cluster while pg_upgrade is running. So I adjusted > smgrdounlinkall() to actually remove everything when IsBinaryUpgrade = > true. And then it all seems to work: pg_upgrade of a cluster that has > had a rewrite of pg_largeobject works, and pg_upgrade of a cluster > that has not had such a rewrite works too. Wa-hoo. Using the IsBinaryUpgrade flag makes sense to me. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
Re: pg15b2: large objects lost on upgrade
Hi,

On 2022-07-18 14:57:40 -0400, Robert Haas wrote:
> As to whether this is a good fix, I think someone could certainly
> argue otherwise. This is all a bit grotty. However, I don't find it
> all that bad. As long as we're moving files between one PG cluster
> and another using an external tool rather than logic inside the
> server itself, I think we're bound to have some hacks someplace to
> make it all work. To me, extending them to a few more places to avoid
> leaving files behind on disk seems like a good trade-off. Your mileage
> may vary.

How about adding a new binary_upgrade_* helper function for this purpose
instead of tying it into truncate?

Greetings,

Andres Freund
Re: Commitfest Update
On Mon, Jul 18, 2022 at 12:06:34PM -0700, Jacob Champion wrote: > On 7/17/22 08:17, Nathan Bossart wrote: >> Yeah, I happened to look in my commitfest update folder this morning and >> was surprised to learn that I was no longer reviewing 3612. I spent a good >> amount of time getting that patch into a state where I felt it was >> ready-for-committer, and I intended to follow up on any further >> developments in the thread. It's not uncommon for me to use the filter >> functionality in the app to keep track of patches I'm reviewing. > > I'm sorry again for interrupting that flow. Thank you for speaking up > and establishing the use case. No worries. Thanks for managing this commitfest! -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Doc about how to set max_wal_senders when setting minimal wal_level
On Fri, Jul 15, 2022 at 09:29:20PM +0800, Japin Li wrote: > > On Fri, 15 Jul 2022 at 08:49, Bruce Momjian wrote: > > On Tue, Jul 5, 2022 at 08:02:33PM -0400, Tom Lane wrote: > >> "Precondition" is an overly fancy word that makes things less clear > >> not more so. Does it mean that setting wal_level = minimal will fail > >> if you don't do these other things, or does it just mean that you > >> won't be getting the absolute minimum WAL volume? If the former, > >> I think it'd be better to say something like "To set wal_level to minimal, > >> you must also set [these variables], which has the effect of disabling > >> both WAL archiving and streaming replication." > > > > I have created the attached patch to try to improve this text. > > IMO we can add the following sentence for wal_level description, since > if wal_level = minimal and max_wal_senders > 0, we cannot start the database. > > To set wal_level to minimal, you must also set max_wal_senders to 0, > which has the effect of disabling both WAL archiving and streaming > replication. Okay, text added in the attached patch. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 37fd80388c..4c0489c9c3 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2764,9 +2764,10 @@ include_dir 'conf.d' levels. This parameter can only be set at server start. -In minimal level, no information is logged for -permanent relations for the remainder of a transaction that creates or -rewrites them. This can make operations much faster (see +The minimal level generates the least WAL +volume. It logs no row information for permanent relations +in transactions that create or +rewrite them. This can make operations much faster (see ). 
Operations that initiate this optimization include: @@ -2778,19 +2779,20 @@ include_dir 'conf.d' REINDEX TRUNCATE -But minimal WAL does not contain enough information to reconstruct the -data from a base backup and the WAL logs, so replica or -higher must be used to enable WAL archiving -() and streaming replication. +However, minimal WAL does not contain sufficient information for +point-in-time recovery, so replica or +higher must be used to enable continuous archiving +() and streaming binary replication. +In fact, the server will not even start in this mode if +max_wal_senders is non-zero. Note that changing wal_level to -minimal makes any base backups taken before -unavailable for archive recovery and standby server, which may -lead to data loss. +minimal makes any base backups taken before this +unusable for point-in-time recovery and standby servers. In logical level, the same information is logged as -with replica, plus information needed to allow -extracting logical change sets from the WAL. Using a level of +with replica, plus information needed to +extract logical change sets from the WAL. Using a level of logical will increase the WAL volume, particularly if many tables are configured for REPLICA IDENTITY FULL and many UPDATE and DELETE statements are
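The requirement the patched documentation describes can be summarized as a
postgresql.conf fragment (illustrative only; the two parameter names are
the real GUCs discussed above):

```ini
# wal_level = minimal generates the least WAL, but rules out WAL
# archiving, streaming replication, and point-in-time recovery.
wal_level = minimal

# The server will refuse to start with wal_level = minimal unless
# max_wal_senders is zero.
max_wal_senders = 0
```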
Re: Use fadvise in wal replay
On Thu, Jun 23, 2022 at 5:49 AM Jakub Wartak wrote:
> Cool. As for the GUC, I'm afraid there's going to be resistance to
> adding yet another GUC (to avoid many knobs). Ideally it would be nice
> if we had some advanced/deep/hidden parameters, but there is no such
> thing. Maybe another option would be to use (N *
> maintenance_io_concurrency * XLOG_BLCKSZ), so for N=1 that's 80kB and
> for N=2 160kB (pretty close to the default value, and still tweakable
> by the end user). Let's wait and see what others say?

I don't think adding more parameters is a problem intrinsically. A good
question to ask, though, is how the user is supposed to know what value
they should configure. If we don't have any idea what value is likely to
be optimal, odds are users won't either.

It's not very clear to me that we have any kind of agreement on what the
basic approach should be here, though.

--
Robert Haas
EDB: http://www.enterprisedb.com
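The arithmetic behind Jakub's suggestion checks out against the stock
defaults — XLOG_BLCKSZ of 8 kB and maintenance_io_concurrency of 10 are
the shipped values, though either can differ per build or configuration:

```python
XLOG_BLCKSZ = 8192                 # default WAL block size, in bytes
MAINTENANCE_IO_CONCURRENCY = 10    # default value of the GUC

def readahead_bytes(n):
    # Prefetch window from the proposed formula
    # N * maintenance_io_concurrency * XLOG_BLCKSZ.
    return n * MAINTENANCE_IO_CONCURRENCY * XLOG_BLCKSZ

print(readahead_bytes(1) // 1024)  # 80 (kB), matching the email
print(readahead_bytes(2) // 1024)  # 160 (kB)
```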
Re: [Commitfest 2022-07] Begins Now
Hi,

On 2022-07-18 12:22:25 -0700, Jacob Champion wrote:
> [dev hat]
>
> On 7/15/22 18:07, Andres Freund wrote:
> > IDK, I've plenty of times given feedback and it took months till it
> > all was implemented. What's the point of doing further rounds of
> > review until then?
>
> I guess I would wonder why we're optimizing for that case. Is it helpful
> for that patch to stick around in an active CF for months?

I'm not following - I'm talking about the patch author needing a while to
address the higher-level feedback given by a reviewer. The author might
put out a couple of new versions, which each might still benefit from
review. In that - pretty common imo - situation I don't think it's useful
for the reviewer that provided the higher-level feedback to be removed
from the patch.

Greetings,

Andres Freund
Re: [PATCH] Introduce array_shuffle() and array_sample()
Martin Kalcher writes: > Is someone interested in looking at it? What are the next steps? The preferred thing to do is to add it to our "commitfest" queue, which will ensure that it gets looked at eventually. The currently open cycle is 2022-09 [1] (see the "New Patch" button there). I believe you have to have signed up as a community member[2] in order to put yourself in as the patch author. I think "New Patch" will work anyway, but it's better to have an author listed. regards, tom lane [1] https://commitfest.postgresql.org/39/ [2] https://www.postgresql.org/account/login/
Re: replacing role-level NOINHERIT with a grant-level option
On Thu, Jul 14, 2022 at 10:53 AM tushar wrote: > GRANT "g2" TO "u1" WITH GRANTED BY "edb"; Another good catch. Here is v5 with a fix for that problem. -- Robert Haas EDB: http://www.enterprisedb.com v5-0001-Allow-grant-level-control-of-role-inheritance-beh.patch Description: Binary data
Re: [Commitfest 2022-07] Begins Now
[dev hat]

On 7/15/22 18:07, Andres Freund wrote:
> IDK, I've plenty of times given feedback and it took months till it all
> was implemented. What's the point of doing further rounds of review
> until then?

I guess I would wonder why we're optimizing for that case. Is it helpful
for that patch to stick around in an active CF for months? There's an
established need for keeping a "TODO item" around and not letting it fall
off, but I think that should remain separate in an application which seems
to be focused on organizing active volunteers.

And if that's supposed to be what Waiting on Author is for, then I think
we need more guidance on how to use that status effectively. Some
reviewers seem to use it as a "replied" flag. I think there's a meaningful
difference between being soft-blocked on review feedback and hard-blocked
on new implementation. And maybe there's even a middle state, where the
patch just needs someone to do a mindless rebase.

I think you're in a better position than most to "officially" decide that
a patch can no longer benefit from review. Most of us can't do that, I
imagine -- nor should we.

Thanks,
--Jacob
Re: fix crash with Python 3.11
On 23.06.22 09:41, Markus Wanner wrote:
> On 6/21/22 18:33, Tom Lane wrote:
> > My inclination at this point is to not back-patch the second change
> > 12d768e70 ("Don't use static storage for
> > SaveTransactionCharacteristics"). It's not clear that the benefit
> > would be worth even a small risk of somebody being unhappy about the
> > API break.
>
> Actually, the backport of 2e517818f ("Fix SPI's handling of errors")
> already broke the API for code using SPICleanup, as that function had
> been removed. Granted, it's not documented, but still exported. I
> propose to re-introduce a no-op placeholder similar to what we have for
> SPI_start_transaction, somewhat like the attached patch.

I have applied your patch to branches 11 through 14.
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size
On 2022-07-15 Fr 17:07, Andres Freund wrote:
> Hi,
>
> On 2022-07-08 17:05:49 -0400, Andrew Dunstan wrote:
>> On 2022-07-05 Tu 15:04, Andrew Dunstan wrote:
>>> On 2022-07-05 Tu 14:36, Andres Freund wrote:
>> I think Andrew's beta 2 comment was more about my other architectural
>> complaints around the json expression eval stuff.
> Right. That's being worked on but it's not going to be a mechanical fix.

Any updates here?

>>> Not yet. A colleague and I are working on it. I'll post a status this
>>> week if we can't post a fix.
>> We're still working on it. We've made substantial progress but there are
>> some tests failing that we need to fix.
> I think we need to resolve this soon - or consider the alternatives. A
> lot of the new json stuff doesn't seem fully baked, so I'm starting to
> wonder if we have to consider pushing it a release further down.
>
> Perhaps you could post your current state? I might be able to help
> resolve some of the problems.

Ok. Here is the state of things. This has proved to be rather more
intractable than I expected. Almost all the legwork here has been done by
Amit Langote, for which he deserves both my thanks and considerable
credit, but I take responsibility for it.

I just discovered today that this scheme is failing under
"force_parallel_mode = regress". I have as yet no idea if that can be
fixed simply or not. Apart from that, I think the main outstanding issue
is to fill in the gaps in llvm_compile_expr(). If you have help to offer,
that would be very welcome.

I'd still very much like to get this done, but if the decision is that
we've run out of time, I'll be sad but understand.
cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com From 7c0d831f3baf6fb6ec27d1e033209535945e4858 Mon Sep 17 00:00:00 2001 From: Andrew Dunstan Date: Mon, 18 Jul 2022 11:03:13 -0400 Subject: [PATCH 1/3] in JsonExprState just store a pointer to the input FmgrInfo --- src/backend/executor/execExpr.c | 5 - src/backend/executor/execExprInterp.c | 2 +- src/include/executor/execExpr.h | 2 +- 3 files changed, 6 insertions(+), 3 deletions(-) diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c index c8d7145fe3..a55e5000e2 100644 --- a/src/backend/executor/execExpr.c +++ b/src/backend/executor/execExpr.c @@ -2606,11 +2606,14 @@ ExecInitExprRec(Expr *node, ExprState *state, (jexpr->result_coercion && jexpr->result_coercion->via_io)) { Oid typinput; + FmgrInfo *finfo; /* lookup the result type's input function */ getTypeInputInfo(jexpr->returning->typid, , >input.typioparam); - fmgr_info(typinput, >input.func); + finfo = palloc0(sizeof(FmgrInfo)); + fmgr_info(typinput, finfo); + jsestate->input.finfo = finfo; } jsestate->args = NIL; diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index 723770fda0..0512a81c7c 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -4664,7 +4664,7 @@ ExecEvalJsonExprCoercion(ExprEvalStep *op, ExprContext *econtext, /* strip quotes and call typinput function */ char *str = *isNull ? 
NULL : JsonbUnquote(jb); - return InputFunctionCall(>input.func, str, + return InputFunctionCall(jsestate->input.finfo, str, jsestate->input.typioparam, jexpr->returning->typmod); } diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h index 1e3f1bbee8..e55a572854 100644 --- a/src/include/executor/execExpr.h +++ b/src/include/executor/execExpr.h @@ -754,7 +754,7 @@ typedef struct JsonExprState struct { - FmgrInfo func; /* typinput function for output type */ + FmgrInfo *finfo; /* typinput function for output type */ Oid typioparam; } input; /* I/O info for output type */ -- 2.34.1 From ee39dadaa18f45c59224d4da908b36db7a2a8b0f Mon Sep 17 00:00:00 2001 From: amitlan Date: Mon, 18 Jul 2022 11:04:17 -0400 Subject: [PATCH 2/3] Evaluate various JsonExpr sub-expressions using parent ExprState These include PASSING args, ON ERROR, ON EMPTY default expressions, and result_coercion expression. To do so, this moves a bunch of code from ExecEvalJson(), ExecEvalJsonExpr(), and ExecEvalJsonExprCoercion(), all of which would previously run under the single step EEOP_JSONEXPR steps into a number of new (sub-) ExprEvalSteps that are now added for a given JsonExpr. TODO: update llvm_compile_expr() to handle newly added EEOP_JSONEXPR_* steps. --- src/backend/executor/execExpr.c | 288 +++- src/backend/executor/execExprInterp.c | 366 ++ src/backend/jit/llvm/llvmjit_expr.c | 35 ++- src/include/executor/execExpr.h | 136 -- 4 files changed, 639 insertions(+), 186 deletions(-) diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c index a55e5000e2..4dbb24aaee 100644 ---
Re: Commitfest Update
On 7/17/22 08:17, Nathan Bossart wrote: > On Fri, Jul 15, 2022 at 09:37:14PM -0500, Justin Pryzby wrote: >> I'm not suggesting to give the community regulars special treatment, but you >> could reasonably assume that when they added themselves and then "didn't >> remove >> themself", it was on purpose and not by omission. I think most of those >> people >> would be surprised to learn that they're no longer considered to be reviewing >> the patch. > > Yeah, I happened to look in my commitfest update folder this morning and > was surprised to learn that I was no longer reviewing 3612. I spent a good > amount of time getting that patch into a state where I felt it was > ready-for-committer, and I intended to follow up on any further > developments in the thread. It's not uncommon for me to use the filter > functionality in the app to keep track of patches I'm reviewing. I'm sorry again for interrupting that flow. Thank you for speaking up and establishing the use case. > Of course, there are probably patches where I could be removed from the > reviewers field. I can try to stay on top of that better. If you think I > shouldn't be marked as a reviewer and that it's hindering further review > for a patch, feel free to message me publicly or privately about it. Sure. I don't plan on removing anyone else from a Reviewer list this commitfest, but if I do come across a reason I'll make sure to ask first. :) --Jacob
Re: Commitfest Update
On 7/15/22 19:59, Michael Paquier wrote: > On this point, I'd like to think that a window of two weeks is a right > balance. That's half of the commit fest, so that leaves plenty of > time for one to answer. There is always the case where one is on > vacations for a period longer than that, but it is also possible for > an author to add a new entry in a future CF for the same patch. That seems reasonable. My suggestion was going to be more aggressive, at five days, but really anywhere in that range seems good. --Jacob
[PATCH] Introduce array_shuffle() and array_sample()
Thanks for all your feedback and help. I got a patch that i consider ready for review. It introduces two new functions: array_shuffle(anyarray) -> anyarray array_sample(anyarray, integer) -> anyarray array_shuffle() shuffles an array (obviously). array_sample() picks n random elements from an array. Is someone interested in looking at it? What are the next steps? MartinFrom 5498bb2d9f1fab4cad56cd0d3a6eeafa24a26c8e Mon Sep 17 00:00:00 2001 From: Martin Kalcher Date: Sun, 17 Jul 2022 18:06:04 +0200 Subject: [PATCH] Introduce array_shuffle() and array_sample() * array_shuffle() shuffles to elements of an array. * array_sample() chooses n elements from an array by random. Shuffling of arrays can already be accomplished with SQL using unnest() and array_agg(order by random()). But that is very slow compared to the new functions. --- doc/src/sgml/func.sgml | 34 ++ src/backend/utils/adt/arrayfuncs.c | 163 +++ src/include/catalog/pg_proc.dat | 6 + src/test/regress/expected/arrays.out | 30 + src/test/regress/sql/arrays.sql | 11 ++ 5 files changed, 244 insertions(+) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index b6783b7ad0..c676031b4a 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -19395,6 +19395,40 @@ SELECT NULLIF(value, '(none)') ... + + + + array_sample + +array_sample ( array anyarray, n integer ) +anyarray + + +Returns n randomly chosen elements from array. + + +array_sample(ARRAY[1,2,3,4,5,6], 3) +{4,5,1} + + + + + + + array_shuffle + +array_shuffle ( anyarray ) +anyarray + + +Shuffles the elements of the array. + + +array_shuffle(ARRAY[1,2,3,4,5,6]) +{4,5,1,3,2,6} + + + diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c index fb167f226a..a6769a8ebf 100644 --- a/src/backend/utils/adt/arrayfuncs.c +++ b/src/backend/utils/adt/arrayfuncs.c @@ -6872,3 +6872,166 @@ trim_array(PG_FUNCTION_ARGS) PG_RETURN_DATUM(result); } + +/* + * Shuffle the elements of an array. 
+ */ +Datum +array_shuffle(PG_FUNCTION_ARGS) +{ + ArrayType *array, + *result; + int16 elmlen; + bool elmbyval; + char elmalign; + Oid elmtyp; + TypeCacheEntry *typentry; + Datum *elms, +elm; + bool *nuls, +nul; + int nelms, +i, +j; + + array = PG_GETARG_ARRAYTYPE_P(0); + nelms = ArrayGetNItems(ARR_NDIM(array), ARR_DIMS(array)); + + /* There is no point in shuffling arrays with less then two elements. */ + if (nelms < 2) + PG_RETURN_ARRAYTYPE_P(array); + + elmtyp = ARR_ELEMTYPE(array); + typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra; + + if (typentry == NULL || typentry->type_id != elmtyp) + { + typentry = lookup_type_cache(elmtyp, 0); + fcinfo->flinfo->fn_extra = (void *) typentry; + } + elmlen = typentry->typlen; + elmbyval = typentry->typbyval; + elmalign = typentry->typalign; + + deconstruct_array(array, elmtyp, elmlen, elmbyval, elmalign, + , , ); + + /* Shuffle elements and nulls using Fisher-Yates shuffle algorithm. */ + for (i = nelms - 1; i > 0; i--) + { + j = random() % (i + 1); + elm = elms[i]; + nul = nuls[i]; + elms[i] = elms[j]; + nuls[i] = nuls[j]; + elms[j] = elm; + nuls[j] = nul; + } + + result = construct_md_array(elms, nuls, +ARR_NDIM(array), ARR_DIMS(array), ARR_LBOUND(array), +elmtyp, elmlen, elmbyval, elmalign); + + pfree(elms); + pfree(nuls); + PG_FREE_IF_COPY(array, 0); + + PG_RETURN_ARRAYTYPE_P(result); +} + +/* + * Choose N random elements from an array. 
+ */ +Datum +array_sample(PG_FUNCTION_ARGS) +{ + ArrayType *array, + *result; + int16 elmlen; + bool elmbyval; + char elmalign; + Oid elmtyp; + TypeCacheEntry *typentry; + Datum *elms, +elm; + bool *nuls, +nul; + int nelms, +rnelms, +rdims[1], +rlbs[1], +i, +j; + + array = PG_GETARG_ARRAYTYPE_P(0); + nelms = ArrayGetNItems(ARR_NDIM(array), ARR_DIMS(array)); + elmtyp = ARR_ELEMTYPE(array); + rnelms = PG_GETARG_INT32(1); + + if (rnelms < 0) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("second parameter must not be negative"))); + + /* Return an empty array if the requested sample size is zero. */ + if (rnelms == 0) + { + PG_FREE_IF_COPY(array, 0); + PG_RETURN_ARRAYTYPE_P(construct_empty_array(elmtyp)); + } + + /* + * Return the original array if the requested sample size is greater than + * or equal to its own size. + */ + if (rnelms >= nelms) + PG_RETURN_ARRAYTYPE_P(array); + + typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra; + + if (typentry == NULL || typentry->type_id != elmtyp) + { + typentry = lookup_type_cache(elmtyp, 0); + fcinfo->flinfo->fn_extra = (void *)
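The shuffle loop in the array_shuffle() patch above is, as its comment
says, the Fisher-Yates algorithm. A minimal Python rendering of the same
loop (a sketch only — the patch's C code additionally swaps the `nuls`
flags in lockstep with the element Datums):

```python
import random

def fisher_yates(items, rng=random):
    # Walk from the last position down, swapping each position with a
    # uniformly chosen position at or below it.
    for i in range(len(items) - 1, 0, -1):
        # The C patch uses random() % (i + 1); randrange() gives the
        # same uniform choice without modulo bias.
        j = rng.randrange(i + 1)
        items[i], items[j] = items[j], items[i]
    return items
```

Each of the n! permutations is produced with equal probability, which is
the property a shuffle built from unnest() and array_agg(order by
random()) also has, but at much higher cost.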
Re: Commitfest Update
On 7/18/22 06:13, Justin Pryzby wrote: > On Mon, Jul 18, 2022 at 03:05:51PM +0200, Alvaro Herrera wrote: >> Maybe we should have two reviewers columns -- one for history-tracking >> purposes (and commit msg credit) and another for current ones. > > Maybe. Or, the list of reviewers shouldn't be shown prominently in the list > of > patches. But changing that would currently break cfbot's web scraping. I think separating use cases of "what you can currently do for this patch" and "what others have historically done for this patch" is important. Whether that's best done with more columns or with some other workflow, I'm not sure. It seems like being able to mark items on a personal level, in a way that doesn't interfere with recordkeeping being done centrally, could help as well. --Jacob
Re: Commitfest Update
Justin,

(Consolidating replies here.)

On 7/15/22 19:13, Justin Pryzby wrote:
> cfbot is Thomas's project, so moving it to run on the postgres VM was
> one step, but I imagine the "integration with cfapp" requires
> coordination with Magnus.
>
> What patch ?

https://www.postgresql.org/message-id/CAAWbhmg84OsO5VkaSjX4jokHy8mdpWpNKFgZJHHbb4mprXmtiQ%40mail.gmail.com

It was intended to be a toe in the water -- see if I'm following
conventions, and if I even have the right list.

>>> Similarly, patches could be summarily set to "waiting on author" if
>>> they didn't recently apply, compile, and pass tests. That's the
>>> minimum standard. However, I think it's better not to do this
>>> immediately after the patch stops applying/compiling/passing tests,
>>> since it's usually easy enough to review it.
>>
>> It's hard to argue with that, but without automation, this is plenty of
>> busy work too.
>
> I don't think that's busywork, since it's understood to require human
> judgement, like 1) dealing with false-positive test failures; 2)
> checking if there's actually anything left for the author to do; 3)
> checking if it passed tests recently; 4) evaluating existing opinions in
> the thread and making a judgement call.

[Dev hat; strong opinions ahead.]

I maintain that 1) and 3) are busy work. You should not have to do those
things, in the ideal end state.

>> I think this may have been the goal but I don't think it actually works
>> in practice. The app refuses to let you carry a RwF patch to a new CF.
>
> I was able to do what Peter said. I don't know why the cfapp has that
> restriction, it seems like an artificial constraint.

Thanks, I'll work on a patch.

> On Fri, Jul 15, 2022 at 05:23:48PM -0700, Jacob Champion wrote:
> I'm not suggesting to give the community regulars special treatment, but
> you could reasonably assume that when they added themselves and then
> "didn't remove themself", it was on purpose and not by omission.
I think most of those > people > would be surprised to learn that they're no longer considered to be reviewing > the patch. For some people, I can maybe(?) assume that, but I'm being honest when I say that I don't really know who that's true for. I definitely don't think it's true for everybody. And once I start making those decisions as a CFM, then it really does boil down to who I know and have interacted with before. > Can you give an example of a patch where you sent a significant review, and > added yourself as a reviewer, but wouldn't mind if someone summarily removed > you, in batch ? Literally all of them. That's probably the key disconnect here, and why I didn't think too hard about doing it. (I promise, I didn't think to myself "I would really hate it if someone did this to me", and then go ahead and do it to twenty-some other people. :D) I come from OSS communities that discourage cookie-licking, whether accidental or on purpose. I don't like marking myself as a Reviewer in general (although I have done it, because it seems like the thing to do here?). Simultaneous reviews are never "wasted work" and I'd just rather not call dibs on a patch. So I wouldn't have a problem with someone coming along, seeing that I haven't interacted with a patch for a while, and removing my name. I trust that committers will give credit if credit is due. > The stated goal was to avoid the scenario that a would-be reviewer decides not > to review a patch because cfapp already shows someone else as a reviewer. I'm > sure that could happen, but I doubt it's something that happens frequently.. I do that every commitfest. It's one of the first things I look at to determine what to pay attention to, because I'm trying to find patches that have slipped through the cracks. As Tom pointed out, others do it too, though I don't know how many or if their motivations match mine. >> Why do you think it's useful to remove annotations that people added ? 
(And, >> if >> it were useful, why shouldn't that be implemented in the cfapp, which already >> has all the needed information.) > > Or, to say it differently, since "reviewers" are preserved when a patch is > moved to the next CF, it comes as a surprise when by some other mechanism or > policy the field doesn't stay there. (If it's intended to be more like a > per-CF field, I think its behavior should be changed in the cfapp, to avoid > manual effort, and to avoid other people executing it differently.) It was my assumption, based on the upthread discussion, that that was the end goal, and that we just hadn't implemented it yet for lack of time. >> It undoes work that you and others have done to make the historical >> record more accurate, and I think that's understandably frustrating. But >> I thought we were trying to move away from that usage of it altogether. > > I gather that your goal was to make the "reviewers" field more like "people > who > are reviewing the current version of the patch", to make it easy to > find/isolate
Re: pg15b2: large objects lost on upgrade
On Tue, Jul 12, 2022 at 4:51 PM Robert Haas wrote: > I have a few more ideas to try here. It occurs to me that we could fix > this more cleanly if we could get the dump itself to set the > relfilenode for pg_largeobject to the desired value. Right now, it's > just overwriting the relfilenode stored in the catalog without > actually doing anything that would cause a change on disk. But if we > could make it change the relfilenode in a more principled way that > would actually cause an on-disk change, then the orphaned-file problem > would be fixed, because we'd always be installing the new file over > top of the old file. I'm going to investigate how hard it would be to > make that work. Well, it took a while to figure out how to make that work, but I believe I've got it now. Attached please find a couple of patches that should get the job done. They might need a bit of polish, but I think the basic concepts are sound. My first thought was to have the dump issue VACUUM FULL pg_largeobject after first calling binary_upgrade_set_next_heap_relfilenode() and binary_upgrade_set_next_index_relfilenode(), and have the VACUUM FULL use the values configured by those calls for the new heap and index OID. I got this working in standalone testing, only to find that this doesn't work inside pg_upgrade. The complaint is "ERROR: VACUUM cannot run inside a transaction block", and I think that happens because pg_restore sends the entire TOC entry for a single object to the server as a single query, and here it contains multiple SQL commands. That multi-command string ends up being treated like an implicit transaction block. So my second thought was to have the dump still call binary_upgrade_set_next_heap_relfilenode() and binary_upgrade_set_next_index_relfilenode(), but then afterwards call TRUNCATE rather than VACUUM FULL. 
I found that a simple change to RelationSetNewRelfilenumber() did the trick: I could then easily change the heap and index relfilenodes for pg_largeobject to any new values I liked. However, I realized that this approach had a problem: what if the pg_largeobject relation had never been rewritten in the old cluster? Then the heap and index relfilenodes would be unchanged. It's also possible that someone might have run REINDEX in the old cluster, in which case it might happen that the heap relfilenode was unchanged, but the index relfilenode had changed. I spent some time fumbling around with trying to get the non-transactional truncate path to cover these cases, but the fact that we might need to change the relfilenode for the index but not the heap makes this approach seem pretty awful. But it occurred to me that in the case of a pg_upgrade, we don't really need to keep the old storage around until commit time. We can unlink it first, before creating the new storage, and then if the old and new storage happen to be the same, it still works. I can think of two possible objections to this way forward. First, it means that the operation isn't properly transactional. However, if the upgrade fails at this stage, the new cluster is going to have to be nuked and recreated anyway, so the fact that things might be in an unclean state after an ERROR is irrelevant. Second, one might wonder whether such a fix is really sufficient. For example, what happens if the relfilenode allocated to pg_largeobject in the old cluster is assigned to its index in the new cluster, and vice versa? To make that work, we would need to remove all storage for all relfilenodes first, and then recreate them all afterward. However, I don't think we need to make that work. If an object in the old cluster has a relfilenode < 16384, then that should mean it's never been rewritten and therefore its relfilenode in the new cluster should be the same. 
The only way this wouldn't be true is if we shuffled around the initial relfilenode assignments in a new version of PG so that the same values were used but now for different objects, which would be a really dumb idea. And on the other hand, if the object in the old cluster has a relfilenode > 16384, then that relfilenode value should be unused in the new cluster. If not, the user has tinkered with the new cluster more than they ought. So I tried implementing this but I didn't get it quite right the first time. It's not enough to call smgrdounlinkall() instead of RelationDropStorage(), because just as RelationDropStorage() does not actually drop the storage but only schedules it to be dropped later, so also smgrdounlinkall() does not in fact unlink all, but only some. It leaves the first segment of the main relation fork around to guard against the hazards discussed in the header comments for mdunlink(). But those hazards don't really matter here either, because, again, any failure will necessarily require that the entire new cluster be destroyed, and also because there shouldn't be any concurrent activity in the new cluster while pg_upgrade is running. So I adjusted smgrdounlinkall() to
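The unlink-before-create ordering argued for above can be sketched outside PostgreSQL with plain files. This is a minimal analogy, not pg_upgrade code: the file names stand in for relfilenode segments, and `set_new_storage` stands in for the adjusted RelationSetNewRelfilenumber() path.

```python
# Sketch: remove the old storage *before* creating the new storage, so the
# sequence also works when the old and new relfilenodes are the same file.
# All names here are hypothetical stand-ins for relation storage files.
import os

def write_file(path, data):
    with open(path, "w") as f:
        f.write(data)

def set_new_storage(old_path, new_path):
    """Unlink old storage first, then create the new storage."""
    try:
        os.unlink(old_path)        # a failure here just forces a cluster redo
    except FileNotFoundError:
        pass
    write_file(new_path, "new")    # stands in for creating the relation file

# Case 1: distinct old/new relfilenodes -- old file gone, new file present.
write_file("old_16384", "old")
set_new_storage("old_16384", "new_16385")

# Case 2: old and new relfilenodes coincide (a never-rewritten catalog) --
# the same path still ends up holding the new contents, with no window where
# the new file would be clobbered by dropping the old one at commit time.
write_file("same_2613", "old")
set_new_storage("same_2613", "same_2613")
```

The non-transactional behavior is harmless here for the reason given above: any failure mid-sequence means the whole new cluster gets recreated anyway.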
Re: proposal: possibility to read dumped table's name from file
ne 17. 7. 2022 v 16:01 odesílatel Justin Pryzby napsal: > Thanks for updating the patch. > > This failed to build on windows. > http://cfbot.cputube.org/pavel-stehule.html > > Yes, there was a significant problem with the function exit_nicely, that is differently implemented in pg_dump and pg_dumpall. > Some more comments inline. > > On Sun, Jul 17, 2022 at 08:20:47AM +0200, Pavel Stehule wrote: > > The attached patch implements the --filter option for pg_dumpall and for > > pg_restore too. > > > diff --git a/doc/src/sgml/ref/pg_dump.sgml > b/doc/src/sgml/ref/pg_dump.sgml > > index 5efb442b44..ba2920dbee 100644 > > --- a/doc/src/sgml/ref/pg_dump.sgml > > +++ b/doc/src/sgml/ref/pg_dump.sgml > > @@ -779,6 +779,80 @@ PostgreSQL documentation > > > > > > > > + > > + --filter= class="parameter">filename > > + > > + > > +Specify a filename from which to read patterns for objects to > include > > +or exclude from the dump. The patterns are interpreted > according to the > > +same rules as the corresponding options: > > +-t/--table for tables, > > +-n/--schema for schemas, > > +--include-foreign-data for data on foreign > servers and > > +--exclude-table-data for table data. > > +To read from STDIN use > - as the STDIN comma > fixed > > > + > > +Lines starting with # are considered > comments and > > +are ignored. Comments can be placed after filter as well. Blank > lines > > change "are ignored" to "ignored", I think. > changed > > > diff --git a/doc/src/sgml/ref/pg_dumpall.sgml > b/doc/src/sgml/ref/pg_dumpall.sgml > > index 8a081f0080..137491340c 100644 > > --- a/doc/src/sgml/ref/pg_dumpall.sgml > > +++ b/doc/src/sgml/ref/pg_dumpall.sgml > > @@ -122,6 +122,29 @@ PostgreSQL documentation > > > > > > > > + > > + --filter= class="parameter">filename > > + > > + > > +Specify a filename from which to read patterns for databases > excluded > > +from dump. The patterns are interpretted according to the same > rules > > +like --exclude-database. 
> > same rules *as* > fixed > > > +To read from STDIN use > - as the > > comma > fixed > > > +filename. The --filter option can be > specified in > > +conjunction with the above listed options for including or > excluding > > For dumpall, remove "for including or" > change "above listed options" to "exclude-database" ? > fixed > > > diff --git a/doc/src/sgml/ref/pg_restore.sgml > b/doc/src/sgml/ref/pg_restore.sgml > > index 526986eadb..5f16c4a333 100644 > > --- a/doc/src/sgml/ref/pg_restore.sgml > > +++ b/doc/src/sgml/ref/pg_restore.sgml > > @@ -188,6 +188,31 @@ PostgreSQL documentation > > > > > > > > + > > + --filter= class="parameter">filename > > + > > + > > +Specify a filename from which to read patterns for objects > excluded > > +or included from restore. The patterns are interpretted > according to the > > +same rules like --schema, > --exclude-schema, > > s/like/as/ > changed > > > +--function, --index, > --table > > +or --trigger. > > +To read from STDIN use > - as the > > STDIN comma > fixed > > > +/* > > + * filter_get_keyword - read the next filter keyword from buffer > > + * > > + * Search for keywords (limited to containing ascii alphabetic > characters) in > > remove "containing" > fixed > > > + /* > > + * If the object name pattern has been quoted we must take care > parse out > > + * the entire quoted pattern, which may contain whitespace and can > span > > + * over many lines. > > quoted comma > *to parse > remove "over" > fixed > > > + * The pattern is either simple without any whitespace, or properly > quoted > > double space > fixed > > > + * in case there is whitespace in the object name. The pattern handling > follows > > s/is/may be/ > fixed > > > + if (size == 7 && pg_strncasecmp(keyword, > "include", 7) == 0) > > + *is_include = true; > > + else if (size == 7 && pg_strncasecmp(keyword, > "exclude", 7) == 0) > > + *is_include = false; > > Can't you write strncasecmp(keyword, "include", size) to avoid hardcoding > "7" ? 
> I need to compare the size of the keyword with expected size, but I can use strlen(conststr). I wrote new macro is_keyword_str to fix this issue fixed > > > + > > + if (size == 4 && pg_strncasecmp(keyword, "data", > 4) == 0) > > + *objtype = FILTER_OBJECT_TYPE_DATA; > > + else if (size == 8 && pg_strncasecmp(keyword, > "database", 8) == 0) > > + *objtype = FILTER_OBJECT_TYPE_DATABASE; > > + else if (size == 12 && pg_strncasecmp(keyword, > "foreign_data", 12) == 0) > > +
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Tomas Vondra writes: > Thanks. I'll switch this to "needs review" now. OK, I looked through this, and attach some review suggestions in the form of a delta patch. (0001 below is your two patches merged, 0002 is my delta.) A lot of the delta is comment-smithing, but not all. After reflection I think that what you've got, ie use reltuples but don't try to sample if reltuples <= 0, is just fine. The remote would only have reltuples <= 0 in a never-analyzed table, which shouldn't be a situation that persists for long unless the table is tiny. Also, if reltuples is in error, the way to bet is that it's too small thanks to the table having been enlarged. But an error in that direction doesn't hurt us: we'll overestimate the required sample_frac and pull back more data than we need, but we'll still end up with a valid sample of the right size. So I doubt it's worth the complication to try to correct based on relpages etc. (Note that any such correction would almost certainly end in increasing our estimate of reltuples. But it's safer to have an underestimate than an overestimate.) I messed around with the sample_frac choosing logic slightly, to make it skip pointless calculations if we decide right off the bat to disable sampling. That way we don't need to worry about avoiding zero divide, nor do we have to wonder if any of the later calculations could misbehave. I left your logic about "disable if saving fewer than 100 rows" alone, but I have to wonder if using an absolute threshold rather than a relative one is well-conceived. Sampling at a rate of 99.9 percent seems pretty pointless, but this code is perfectly capable of accepting that if reltuples is big enough. So personally I'd do that more like if (sample_frac > 0.95) method = ANALYZE_SAMPLE_OFF; which is simpler and would also eliminate the need for the previous range-clamp step. I'm not sure what the right cutoff is, but your "100 tuples" constant is just as arbitrary. I rearranged the docs patch too. 
Where you had it, analyze_sampling was between fdw_startup_cost/fdw_tuple_cost and the following para discussing them, which didn't seem to me to flow well at all. I ended up putting analyze_sampling in its own separate list. You could almost make a case for giving it its own , but I concluded that was probably overkill. One thing I'm not happy about, but did not touch here, is the expense of the test cases you added. On my machine, that adds a full 10% to the already excessively long runtime of postgres_fdw.sql --- and I do not think it's buying us anything. It is not this module's job to test whether bernoulli sampling works on partitioned tables. I think you should just add enough to make sure we exercise the relevant code paths in postgres_fdw itself. With these issues addressed, I think this'd be committable. regards, tom lane diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index a9766f9734..ea2139fbfa 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -2369,14 +2369,56 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel) appendStringInfo(buf, "::pg_catalog.regclass) / %d", BLCKSZ); } +/* + * Construct SELECT statement to acquire a number of rows of a relation. + * + * Note: Maybe this should compare relpages and current relation size + * and adjust reltuples accordingly? + */ +void +deparseAnalyzeTuplesSql(StringInfo buf, Relation rel) +{ + StringInfoData relname; + + /* We'll need the remote relation name as a literal. */ + initStringInfo(&relname); + deparseRelation(&relname, rel); + + appendStringInfoString(buf, "SELECT reltuples FROM pg_catalog.pg_class WHERE oid = "); + deparseStringLiteral(buf, relname.data); + appendStringInfoString(buf, "::pg_catalog.regclass"); +} + /* * Construct SELECT statement to acquire sample rows of given relation. * * SELECT command is appended to buf, and list of columns retrieved * is returned to *retrieved_attrs. 
+ * + * XXX We allow customizing the sampling method, but we only support methods + * we can decide based on server version. Allowing custom TSM modules (for + * example tsm_system_rows) might be useful, but it would require detecting + * which extensions are installed, to allow automatic fall-back. Moreover, the + * methods use different parameters (not sampling rate). So we don't do this + * for now, leaving it for future improvements. + * + * XXX Using remote random() to sample rows has advantages & disadvantages. + * The advantages are that this works on all PostgreSQL versions (unlike + * TABLESAMPLE), and that it does the sampling on the remote side (unlike + * the old approach, which transfers everything and then discards most data). + * We could also do "ORDER BY random() LIMIT x", which would always pick + * the expected number of rows, but it requires sorting so it's a bit more + * expensive. + * + * The disadvantage is that we still have to read all rows and
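The sample_frac handling discussed above (skip the later math entirely when sampling is disabled, and use a relative cutoff rather than the 100-tuple absolute threshold) can be sketched as follows. The function name, the 0.95 constant, and the method labels are illustrative assumptions, not the actual postgres_fdw code:

```python
# Sketch of the sampling-rate decision: given a target sample size and the
# remote reltuples estimate, pick a sampling method and fraction.
ANALYZE_SAMPLE_OFF = "off"
ANALYZE_SAMPLE_BERNOULLI = "bernoulli"

def choose_sampling(targrows, reltuples, cutoff=0.95):
    """Return (method, sample_frac) for acquiring targrows sample rows."""
    # A never-analyzed remote table reports reltuples <= 0: disable sampling
    # right away, so none of the later calculations (including the division
    # below) can misbehave.
    if reltuples <= 0:
        return ANALYZE_SAMPLE_OFF, 1.0
    sample_frac = targrows / reltuples
    # Sampling at ~100% saves almost nothing over fetching everything; this
    # relative cutoff also subsumes clamping sample_frac into (0, 1].
    if sample_frac > cutoff:
        return ANALYZE_SAMPLE_OFF, 1.0
    return ANALYZE_SAMPLE_BERNOULLI, sample_frac
```

Note how an underestimate of reltuples is safe under this scheme: it only inflates sample_frac, pulling back more rows than needed while still yielding a valid sample.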
Re: Use "WAL segment" instead of "log segment" consistently in user-facing messages
Overall, these patches look reasonable. On Mon, Jul 18, 2022 at 04:24:12PM +0530, Bharath Rupireddy wrote: > record. Because the entire content of data pages is saved in the > - log on the first page modification after a checkpoint (assuming > + WAL record on the first page modification after a checkpoint (assuming > is not disabled), all pages > changed since the checkpoint will be restored to a consistent > state. nitpick: I would remove the word "record" in this change. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
On Mon, Jul 18, 2022 at 04:53:18PM +0530, Bharath Rupireddy wrote: > Just wondering - do we ever have a problem if we can't remove the > snapshot or mapping file? Besides running out of disk space, there appears to be a transaction ID wraparound risk with the mappings files. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Commitfest Update
On Fri, Jul 15, 2022 at 05:23:48PM -0700, Jacob Champion wrote: >> This is important stuff to discuss, for sure, but I also want to revisit >> the thing I put on pause, which is to clear out old Reviewer entries to >> make it easier for new reviewers to find work to do. If we're not using >> Reviewers as a historical record, is there any reason for me not to keep >> clearing that out? On Fri, Jul 15, 2022 at 09:37:14PM -0500, Justin Pryzby wrote: > Why do you think it's useful to remove annotations that people added? (And, if it were useful, why shouldn't that be implemented in the cfapp, which already has all the needed information.) Or, to say it differently, since "reviewers" are preserved when a patch is moved to the next CF, it comes as a surprise when by some other mechanism or policy the field doesn't stay there. (If it's intended to be more like a per-CF field, I think its behavior should be changed in the cfapp, to avoid manual effort, and to avoid other people executing it differently.) > It undoes work that you and others have done to make the historical > record more accurate, and I think that's understandably frustrating. But > I thought we were trying to move away from that usage of it altogether. I gather that your goal was to make the "reviewers" field more like "people who are reviewing the current version of the patch", to make it easy to find/isolate patch-versions which need to be reviewed, and hopefully accelerate the process. But IMO there's already no trouble finding the list of patches which need to be reviewed - it's the long list that says "Needs Review" - which is what's actually needed; that's not easy to do, which is why it's a long list, and no amount of updating the annotations will help with that. I doubt many people search for patches to review by seeking out those which have no reviewer (which is not a short list anyway). 
I think they look for the most interesting patches, or the ones that are going to be specifically useful to them. Here's an idea: send out batch mails to people who are listed as reviewers for patches which "Need Review". That starts to treat the reviewers field as a functional thing rather than purely an annotation. Be sure in your message to say "You are receiving this message because you're listed as a reviewer for a patch which -Needs Review-". I think it's reasonable to get a message like that 1 or 2 times per month (but not per-month-per-patch). Ideally it'd include the list of patches specific to that reviewer, but I think it'd be okay to get an un-personalized email reminder 1x/month with a link. BTW, one bulk update to make is for the few dozen patches that say "v15" on them, and (excluding bugfixes) those are nearly all wrong. Since the field isn't visible in cfbot, it's mostly ignored. The field is useful toward the end of a release cycle to indicate patches that aren't intended for consideration for the next release. Ideally, it'd also be used to indicate the patches *are* being considered, but it seems like nobody does that and it ends up being a surprise which patches are or are not committed (this seems weird and easy to avoid but .. ). The patches which say "v15" are probably from patches submitted during the v15 cycle, and now the version should be removed, unless there's a reason to believe the patch is going to target v16 (like if a committer has assigned themself). Thanks for receiving my criticism well :) -- Justin
Re: support for CREATE MODULE
On Thu, Mar 17, 2022 at 04:30:43PM -0700, Nathan Bossart wrote: > On Thu, Mar 17, 2022 at 04:26:31PM -0700, Swaha Miller wrote: >> On Thu, Mar 17, 2022 at 4:16 PM Nathan Bossart >> wrote: >>> It seems unlikely that this will be committed for v15. Swaha, should the >>> commitfest entry be adjusted to v16 and moved to the next commitfest? >>> >>> >> Yes please, thank you Nathan > > Done! I spoke with Swaha off-list, and we agreed that this commitfest entry can be closed for now. I'm going to mark it as returned-with-feedback. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: [Commitfest 2022-07] Begins Now
On 7/15/22 16:42, Jacob Champion wrote: > If you have thoughts/comments on this approach, please share them! Okay, plenty of feedback to sift through here. [CFM hat] First of all: mea culpa. I unilaterally made a change that I had assumed would be uncontroversial; it clearly was not, and I interrupted the flow of the CF for people when my goal was to be mostly invisible this month. (My single email to a single thread saying "any objections?" is, in retrospect, not nearly enough reach or mandate to have made this change.) Big thank you to Justin for seeing it happen and speaking up immediately. Here is a rough summary of opinions that have been shared so far; pulled from the other thread [1] as well: There are at least three major use cases for the Reviewer field at the moment. 1) As a new reviewer, find a patch that needs help moving forward. 2) As a committer, give credit to people who moved the patch forward. 3) As an established reviewer, keep track of patches "in flight." I had never realized the third case existed. To those of you who I've interrupted by modifying your checklist without permission, I'm sorry. I see that several of you have already added yourselves back, which is great; I will try to find the CF update stream that has been alluded to elsewhere and see if I can restore the original Reviewers lists that I nulled out on Friday. It was suggested that we track historical reviewers and current reviews separately from each other, to handle both cases 1 and 2. There appears to be a need for people to be able to consider a patch "blocked" pending some action, so that further review cycles aren't burned on it. Some people use Waiting on Author for that, but others use WoA as soon as an email is sent. The two cases have similarities but, to me at least, aren't the same and may be working at cross purposes. 
It is apparently possible to pull one of your closed patches from a prior commitfest into the new one, but you have to set it back to Needs Review first. I plan to work on a CF patch to streamline that, if someone does not beat me to it. Okay, I think those are the broad strokes. I will put my [dev hat] on now and respond more granularly to threads, with stronger opinions. Thanks, --Jacob [1] https://www.postgresql.org/message-id/flat/34b32cb2-a728-090a-00d5-067305874174%40timescale.com#3247e661b219f8736ae418c9b5452d63
Re: Convert planner's AggInfo and AggTransInfo to Nodes
Tom Lane writes: > I got annoyed just now upon finding that pprint() applied to the planner's > "root" pointer doesn't dump root->agginfos or root->aggtransinfos. That's > evidently because AggInfo and AggTransInfo aren't proper Nodes, just bare > structs, which presumably is because somebody couldn't be bothered to > write outfuncs support for them. I'd say that was a questionable shortcut > even when it was made, and there's certainly precious little excuse now > that gen_node_support.pl can do all the heavy lifting. Hence, PFA a > little finger exercise to turn them into Nodes. I took the opportunity > to improve related comments too, and in particular to fix some comments > that leave the impression that preprocess_minmax_aggregates still does > its own scan of the query tree. (It was momentary confusion over that > idea that got me to the point of being annoyed in the first place.) > > Any objections so far? It seems like a reasonable idea, but I don't know enough to judge the wider ramifications of it. But one thing that the patch should also do, is switch to using the l*_node() functions instead of manual casting. The ones I noticed in the patch/context are below, but there are a few more: > foreach(lc, root->agginfos) > { > AggInfo *agginfo = (AggInfo *) lfirst(lc); AggInfo *agginfo = lfirst_node(AggInfo, lc); […] > foreach(lc, transnos) > { > int transno = lfirst_int(lc); > - AggTransInfo *pertrans = (AggTransInfo *) list_nth(root->aggtransinfos, transno); > + AggTransInfo *pertrans = (AggTransInfo *) list_nth(root->aggtransinfos, > + transno); AggTransInfo *pertrans = list_nth_node(AggTransInfo, root->aggtransinfos, transno); - ilmari
Re: Making pg_rewind faster
Hi Tom, Thank you for taking a look at this and that sounds good. I will send over a patch compatible with Postgres v16. Justin From: Tom Lane Sent: July 17, 2022 2:40 PM To: Justin Kwan Cc: pgsql-hackers ; vignesh ; jk...@cloudflare.com ; vignesh ravichandran ; hlinn...@iki.fi Subject: Re: Making pg_rewind faster Justin Kwan writes: > I've also attached the pg_rewind optimization patch file for Postgres version > 14.4. The previous patch file targets version Postgres version 15 Beta 1/2. It's very unlikely that we would consider committing such changes into released branches. In fact, it's too late even for v15. You should be submitting non-bug-fix patches against master (v16-to-be). regards, tom lane
Re: Costing elided SubqueryScans more nearly correctly
Alvaro Herrera writes: > On 2022-Jul-18, Richard Guo wrote: >> BTW, not related to this patch, the new lines for parallel_aware check >> in setrefs.c are very wide. How about wrap them to keep consistent with >> arounding codes? > Not untrue! Something like this, you mean? Fixed the nearby typo while > at it. WFM. (I'd fixed the comment typo in my patch, but I don't mind if you get there first.) regards, tom lane
Re: Costing elided SubqueryScans more nearly correctly
On 2022-Jul-18, Richard Guo wrote: > BTW, not related to this patch, the new lines for parallel_aware check > in setrefs.c are very wide. How about wrap them to keep consistent with > arounding codes? Not untrue! Something like this, you mean? Fixed the nearby typo while at it. -- Álvaro Herrera        Breisgau, Deutschland — https://www.EnterpriseDB.com/ >From 985ffec3086fc01585cb784a58ddb8975f832350 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Mon, 18 Jul 2022 16:40:01 +0200 Subject: [PATCH] Wrap overly long lines --- src/backend/optimizer/plan/setrefs.c | 29 +++++++++++++++---------- 1 file changed, 17 insertions(+), 12 deletions(-) diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c index 9cef92cab2..707c1016c2 100644 --- a/src/backend/optimizer/plan/setrefs.c +++ b/src/backend/optimizer/plan/setrefs.c @@ -1637,14 +1637,16 @@ set_append_references(PlannerInfo *root, * See if it's safe to get rid of the Append entirely. For this to be * safe, there must be only one child plan and that child plan's parallel * awareness must match that of the Append's. The reason for the latter - * is that the if the Append is parallel aware and the child is not then - * the calling plan may execute the non-parallel aware child multiple - * times. + * is that if the Append is parallel aware and the child is not, then the + * calling plan may execute the non-parallel aware child multiple times. */ - if (list_length(aplan->appendplans) == 1 && - ((Plan *) linitial(aplan->appendplans))->parallel_aware == aplan->plan.parallel_aware) - return clean_up_removed_plan_level((Plan *) aplan, - (Plan *) linitial(aplan->appendplans)); + if (list_length(aplan->appendplans) == 1) + { + Plan *p = (Plan *) linitial(aplan->appendplans); + + if (p->parallel_aware == aplan->plan.parallel_aware) + return clean_up_removed_plan_level((Plan *) aplan, p); + } /* * Otherwise, clean up the Append as needed. 
It's okay to do this after @@ -1709,14 +1711,17 @@ set_mergeappend_references(PlannerInfo *root, * See if it's safe to get rid of the MergeAppend entirely. For this to * be safe, there must be only one child plan and that child plan's * parallel awareness must match that of the MergeAppend's. The reason - * for the latter is that the if the MergeAppend is parallel aware and the + * for the latter is that if the MergeAppend is parallel aware and the * child is not then the calling plan may execute the non-parallel aware * child multiple times. */ - if (list_length(mplan->mergeplans) == 1 && - ((Plan *) linitial(mplan->mergeplans))->parallel_aware == mplan->plan.parallel_aware) - return clean_up_removed_plan_level((Plan *) mplan, - (Plan *) linitial(mplan->mergeplans)); + if (list_length(mplan->mergeplans) == 1) + { + Plan *p = (Plan *) linitial(mplan->mergeplans); + + if (p->parallel_aware == mplan->plan.parallel_aware) + return clean_up_removed_plan_level((Plan *) mplan, p); + } /* * Otherwise, clean up the MergeAppend as needed. It's okay to do this -- 2.30.2
Convert planner's AggInfo and AggTransInfo to Nodes
I got annoyed just now upon finding that pprint() applied to the planner's "root" pointer doesn't dump root->agginfos or root->aggtransinfos. That's evidently because AggInfo and AggTransInfo aren't proper Nodes, just bare structs, which presumably is because somebody couldn't be bothered to write outfuncs support for them. I'd say that was a questionable shortcut even when it was made, and there's certainly precious little excuse now that gen_node_support.pl can do all the heavy lifting. Hence, PFA a little finger exercise to turn them into Nodes. I took the opportunity to improve related comments too, and in particular to fix some comments that leave the impression that preprocess_minmax_aggregates still does its own scan of the query tree. (It was momentary confusion over that idea that got me to the point of being annoyed in the first place.) Any objections so far? I'm kind of tempted to mount an effort to get rid of as many of pathnodes.h's "read_write_ignore" annotations as possible. Some are necessary to prevent infinite recursion, and others represent considered judgments that they'd bloat node dumps more than they're worth --- but I think quite a lot of them arose from plain laziness about updating outfuncs.c. With the infrastructure we have now, that's no longer a good reason. In particular, I'm tempted to make a dump of PlannerInfo include all the baserel RelOptInfos (not joins though; there could be a mighty lot of those.) I think we didn't print the simple_rel_array[] array before mostly because outfuncs didn't use to have reasonable support for printing arrays. Thoughts? 
regards, tom lane diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c index 9330908cbf..0f5d8fd978 100644 --- a/src/backend/optimizer/plan/planagg.c +++ b/src/backend/optimizer/plan/planagg.c @@ -137,8 +137,8 @@ preprocess_minmax_aggregates(PlannerInfo *root) return; /* - * Scan the tlist and HAVING qual to find all the aggregates and verify - * all are MIN/MAX aggregates. Stop as soon as we find one that isn't. + * Examine all the aggregates and verify all are MIN/MAX aggregates. Stop + * as soon as we find one that isn't. */ aggs_list = NIL; if (!can_minmax_aggs(root, &aggs_list)) @@ -227,21 +227,21 @@ preprocess_minmax_aggregates(PlannerInfo *root) /* * can_minmax_aggs - * Walk through all the aggregates in the query, and check - * if they are all MIN/MAX aggregates. If so, build a list of the - * distinct aggregate calls in the tree. + * Examine all the aggregates in the query, and check if they are + * all MIN/MAX aggregates. If so, build a list of MinMaxAggInfo + * nodes for them. * * Returns false if a non-MIN/MAX aggregate is found, true otherwise. - * - * This does not descend into subqueries, and so should be used only after - * reduction of sublinks to subplans. There mustn't be outer-aggregate - * references either. */ static bool can_minmax_aggs(PlannerInfo *root, List **context) { ListCell *lc; + /* + * This function used to have to scan the query for itself, but now we can + * just thumb through the AggInfo list made by preprocess_aggrefs. 
+ */ foreach(lc, root->agginfos) { AggInfo*agginfo = (AggInfo *) lfirst(lc); diff --git a/src/backend/optimizer/prep/prepagg.c b/src/backend/optimizer/prep/prepagg.c index 404a5f1dac..8f4686 100644 --- a/src/backend/optimizer/prep/prepagg.c +++ b/src/backend/optimizer/prep/prepagg.c @@ -229,7 +229,7 @@ preprocess_aggref(Aggref *aggref, PlannerInfo *root) } else { - AggInfo*agginfo = palloc(sizeof(AggInfo)); + AggInfo*agginfo = makeNode(AggInfo); agginfo->finalfn_oid = aggfinalfn; agginfo->representative_aggref = aggref; @@ -266,7 +266,7 @@ preprocess_aggref(Aggref *aggref, PlannerInfo *root) same_input_transnos); if (transno == -1) { - AggTransInfo *transinfo = palloc(sizeof(AggTransInfo)); + AggTransInfo *transinfo = makeNode(AggTransInfo); transinfo->args = aggref->args; transinfo->aggfilter = aggref->aggfilter; @@ -452,7 +452,8 @@ find_compatible_trans(PlannerInfo *root, Aggref *newagg, bool shareable, foreach(lc, transnos) { int transno = lfirst_int(lc); - AggTransInfo *pertrans = (AggTransInfo *) list_nth(root->aggtransinfos, transno); + AggTransInfo *pertrans = (AggTransInfo *) list_nth(root->aggtransinfos, + transno); /* * if the transfns or transition state types are not the same then the diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index 69ba254372..e650af5ff2 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -442,15 +442,15 @@ struct PlannerInfo * Information about aggregates. Filled by preprocess_aggrefs(). */ /* AggInfo structs */ - List *agginfos pg_node_attr(read_write_ignore); + List *agginfos; /* AggTransInfo structs */ - List *aggtransinfos
Re: Transparent column encryption
On Mon, Jul 18, 2022 at 6:53 AM Peter Eisentraut wrote: > I think a way to look at this is that this column encryption feature > isn't suitable for disguising the existence or absence of data, it can > only disguise the particular data that you know exists. +1. Even there, what can be accomplished with a feature that only encrypts individual column values is by nature somewhat limited. If you have a text column that, for one row, stores the value 'a', and for some other row, stores the entire text of Don Quixote in the original Spanish, it is going to be really difficult to keep an adversary who can read from the disk from distinguishing those rows. If you want to fix that, you're going to need to do block-level encryption or something of that sort. And even then, you still have another version of the problem, because now imagine you have one *table* that contains nothing but the value 'a' and another that contains nothing but the entire text of Don Quixote, it is going to be possible for an adversary to tell those tables apart, even with the corresponding files encrypted in their entirety. But I don't think that this means that a feature like this has no value. I think it just means that we need to clearly document how the feature works and not over-promise. -- Robert Haas EDB: http://www.enterprisedb.com
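The point about value sizes can be demonstrated with a toy cipher. The XOR keystream below is a deliberately fake stand-in (not the patch's actual encryption scheme, and not real cryptography): any scheme that encrypts each value without padding leaks the plaintext length the same way.

```python
# Toy demonstration: per-value encryption cannot hide how large each value
# is.  An adversary reading the stored bytes can distinguish 'a' from a
# novel-sized text purely by ciphertext length.
import os

def toy_encrypt(plaintext: bytes) -> bytes:
    # One-shot XOR keystream; a stand-in for any length-preserving cipher.
    keystream = os.urandom(len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, keystream))

short_ct = toy_encrypt(b"a")
novel_ct = toy_encrypt(b"x" * 400_000)   # stand-in for the text of Don Quixote
```

Even though both ciphertexts are individually indistinguishable from random bytes, their lengths alone reveal which row held the one-character value.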
Re: [RFC] building postgres with meson - v10
Hi,

On 2022-07-18 11:12:10 +0300, Aleksander Alekseev wrote:
> > Just a quick question - is there a reason for changing the subject of
> > the emails?
> >
> > Not all email clients handle this well, e.g. Google Mail considers
> > this being 10 separate threads. The CF application and/or
> > pgsql-hackers@ archive also don't recognise this as a continuation of
> > the original thread. So all the discussions in -v8, -v9, etc.
> > threads get lost.
> >
> > May I suggest using a single thread?
>
> OK, the part about the archive is wrong - I scrolled right to the end
> of the thread, didn't notice the v10 patch above and assumed it was lost.
> Sorry for the confusion. However, the part about various email clients
> is accurate.

For me the thread is too long to look through without some separation. I wouldn't put the version in the subject for a small patchset / thread, but at this size I think it's reasonable.

Greetings,

Andres Freund
Re: doc: mentioned CREATE+ATTACH PARTITION as an alternative to CREATE TABLE..PARTITION OF
On 2022-07-18 Mo 10:33, Justin Pryzby wrote:
> It's easy to use CREATE TABLE..LIKE + ALTER..ATTACH PARTITION to avoid
> acquiring a strong lock when creating a new partition.
> But it's also easy to forget.
>
> commit 76c0d1198cf2908423b321cd3340d296cb668c8e
> Author: Justin Pryzby
> Date:   Mon Jul 18 09:24:55 2022 -0500
>
>     doc: mention CREATE+ATTACH PARTITION as an alternative to CREATE TABLE..PARTITION OF
>
>     See also: 898e5e3290a72d288923260143930fb32036c00c
>     Should backpatch to v12
>
> diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
> index 6bbf15ed1a4..db7d8710bae 100644
> --- a/doc/src/sgml/ref/create_table.sgml
> +++ b/doc/src/sgml/ref/create_table.sgml
> @@ -619,6 +619,16 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
>     with DROP TABLE requires taking an ACCESS
>     EXCLUSIVE lock on the parent table.
>
> +
> +   Note that creating a partition acquires an ACCESS
> +   EXCLUSIVE lock on the parent table.
> +   It may be preferable to first CREATE a separate table and then ATTACH it,
> +   which does not require as strong of a lock.
> +   See ATTACH PARTITION
> +   and for more information.
> +

Style nitpick. I would prefer "does not require as strong a lock."

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: limits of max, min optimization
po 18. 7. 2022 v 16:29 odesílatel Tom Lane napsal:
> Alvaro Herrera writes:
> > On 2022-Jul-18, Pavel Stehule wrote:
> >> I am trying to fix one slow query, and found that optimization of min, max
> >> functions is possible only when there is no JOIN in the query.
>
> > See preprocess_minmax_aggregates() in
> > src/backend/optimizer/plan/planagg.c
> > Maybe it is possible to hack that code so that this case can be handled
> > better.
>
> The comments show this was already thought about:
>
>  * We also restrict the query to reference exactly one table, since join
>  * conditions can't be handled reasonably.  (We could perhaps handle a
>  * query containing cartesian-product joins, but it hardly seems worth the
>  * trouble.)

Thank you for the reply

Regards

Pavel

> regards, tom lane
doc: mentioned CREATE+ATTACH PARTITION as an alternative to CREATE TABLE..PARTITION OF
It's easy to use CREATE TABLE..LIKE + ALTER..ATTACH PARTITION to avoid acquiring a strong lock when creating a new partition. But it's also easy to forget.

commit 76c0d1198cf2908423b321cd3340d296cb668c8e
Author: Justin Pryzby
Date:   Mon Jul 18 09:24:55 2022 -0500

    doc: mention CREATE+ATTACH PARTITION as an alternative to CREATE TABLE..PARTITION OF

    See also: 898e5e3290a72d288923260143930fb32036c00c
    Should backpatch to v12

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 6bbf15ed1a4..db7d8710bae 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -619,6 +619,16 @@ WITH ( MODULUS numeric_literal, REM
    with DROP TABLE requires taking an ACCESS
    EXCLUSIVE lock on the parent table.

+
+   Note that creating a partition acquires an ACCESS
+   EXCLUSIVE lock on the parent table.
+   It may be preferable to first CREATE a separate table and then ATTACH it,
+   which does not require as strong of a lock.
+   See ATTACH PARTITION
+   and for more information.
+
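For readers of the archive, the pattern the doc change recommends looks roughly like this (a sketch; the "measurement" table and the partition bounds are made-up names, not from the patch):

```sql
-- Create the new partition as a free-standing table; this takes no lock
-- on the (assumed) partitioned parent "measurement".
CREATE TABLE measurement_y2022
    (LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS);

-- Attaching takes only SHARE UPDATE EXCLUSIVE on the parent (since v12,
-- commit 898e5e3290a7), rather than the ACCESS EXCLUSIVE lock that
-- CREATE TABLE .. PARTITION OF acquires.
ALTER TABLE measurement
    ATTACH PARTITION measurement_y2022
    FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');
```

The weaker lock means concurrent reads and writes on the parent are not blocked while the new partition is attached.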
Re: limits of max, min optimization
Alvaro Herrera writes:
> On 2022-Jul-18, Pavel Stehule wrote:
>> I am trying to fix one slow query, and found that optimization of min, max
>> functions is possible only when there is no JOIN in the query.

> See preprocess_minmax_aggregates() in
> src/backend/optimizer/plan/planagg.c
> Maybe it is possible to hack that code so that this case can be handled
> better.

The comments show this was already thought about:

 * We also restrict the query to reference exactly one table, since join
 * conditions can't be handled reasonably.  (We could perhaps handle a
 * query containing cartesian-product joins, but it hardly seems worth the
 * trouble.)

regards, tom lane
Re: limits of max, min optimization
On 2022-Jul-18, Pavel Stehule wrote:

> Hi
>
> I am trying to fix one slow query, and found that optimization of min, max
> functions is possible only when there is no JOIN in the query.
>
> Is it true?

See preprocess_minmax_aggregates() in src/backend/optimizer/plan/planagg.c

> select max(insert_date) from foo join boo on foo.boo_id = boo.id
> where foo.item_id = 100 and boo.is_ok

Maybe it is possible to hack that code so that this case can be handled better.

--
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
Re: Use fadvise in wal replay
> On 23 Jun 2022, at 12:50, Jakub Wartak wrote:
>
> Thoughts?

I've looked into the patch one more time, and I propose to change this line

+ posix_fadvise(readFile, readOff + RACHUNK, RACHUNK, POSIX_FADV_WILLNEED);

to

+ posix_fadvise(readFile, readOff + XLOG_BLCKSZ, RACHUNK, POSIX_FADV_WILLNEED);

Currently the first 128 kB of the file are not prefetched. I expect that this change will produce similar performance results; I propose it only for consistency, so that we prefetch all the data that we have not prefetched yet and are going to read.

What do you think?

Best regards, Andrey Borodin.
Re: Proposal to introduce a shuffle function to intarray extension
John Naylor writes:
> On Mon, Jul 18, 2022 at 2:47 PM Martin Kalcher <martin.kalc...@aboutsource.net> wrote:
>> One more question. How do i pick a Oid for the functions?

> For this, we recommend running src/include/catalog/unused_oids, and it will
> give you a random range to pick from. That reduces the chance of different
> patches conflicting with each other. It doesn't really matter what the oid
> here is, since at feature freeze a committer will change them anyway.

If you want the nitty gritty details here, see

https://www.postgresql.org/docs/devel/system-catalog-initial-data.html#SYSTEM-CATALOG-OID-ASSIGNMENT

regards, tom lane
limits of max, min optimization
Hi

I am trying to fix one slow query, and found that optimization of min, max functions is possible only when there is no JOIN in the query.

Is it true? I need to do a manual transformation of the query

select max(insert_date) from foo join boo on foo.boo_id = boo.id
where foo.item_id = 100 and boo.is_ok

to

select insert_date from foo join boo on foo.boo_id = boo.id
where foo.item_id = 100 and boo.is_ok
order by insert_date desc limit 1;

Regards

Pavel
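For what it's worth, the manual ORDER BY .. LIMIT 1 rewrite above only pays off with a suitable index. A sketch (the column types and index name are assumptions, not taken from the thread):

```sql
-- Hypothetical index matching the filter and sort of the query above:
CREATE INDEX foo_item_insert_idx ON foo (item_id, insert_date DESC);

-- With this index, the rewritten query can walk the index in descending
-- insert_date order for item_id = 100 and stop at the first row whose
-- join partner satisfies boo.is_ok, instead of aggregating every match:
select insert_date from foo join boo on foo.boo_id = boo.id
where foo.item_id = 100 and boo.is_ok
order by insert_date desc limit 1;
```

This is essentially the same plan shape preprocess_minmax_aggregates() produces automatically for the single-table case.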
Re: MERGE and parsing with prepared statements
On Fri, Jul 15, 2022 at 03:43:41PM -0500, Justin Pryzby wrote:
> Should that sentence be removed from MERGE ?

Also, I think these examples should be more similar.

doc/src/sgml/ref/merge.sgml:

> MERGE INTO CustomerAccount CA
> USING RecentTransactions T
> ON T.CustomerId = CA.CustomerId
> WHEN MATCHED THEN
>   UPDATE SET Balance = Balance + TransactionValue
> WHEN NOT MATCHED THEN
>   INSERT (CustomerId, Balance)
>   VALUES (T.CustomerId, T.TransactionValue);
>
> Notice that this would be exactly equivalent to the following
> statement because the MATCHED result does not change
> during execution.
>
> MERGE INTO CustomerAccount CA
> USING (Select CustomerId, TransactionValue From RecentTransactions) AS T
> ON CA.CustomerId = T.CustomerId
> WHEN NOT MATCHED THEN
>   INSERT (CustomerId, Balance)
>   VALUES (T.CustomerId, T.TransactionValue)
> WHEN MATCHED THEN
>   UPDATE SET Balance = Balance + TransactionValue;

The "ON" lines can be the same. The "MATCHED" clauses can be in the same order.

--
Justin
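Concretely, applying both suggestions, the second example would read something like this (a sketch of the proposed harmonization, not committed documentation wording):

```sql
-- Same ON line and same WHEN-clause order as the first example;
-- only the USING source differs:
MERGE INTO CustomerAccount CA
USING (SELECT CustomerId, TransactionValue FROM RecentTransactions) AS T
ON T.CustomerId = CA.CustomerId
WHEN MATCHED THEN
  UPDATE SET Balance = Balance + TransactionValue
WHEN NOT MATCHED THEN
  INSERT (CustomerId, Balance)
  VALUES (T.CustomerId, T.TransactionValue);
```

With the incidental differences removed, the only remaining difference is the one the surrounding doc text is actually about.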
Re: Commitfest Update
On Mon, Jul 18, 2022 at 03:05:51PM +0200, Alvaro Herrera wrote: > Maybe we should have two reviewers columns -- one for history-tracking > purposes (and commit msg credit) and another for current ones. Maybe. Or, the list of reviewers shouldn't be shown prominently in the list of patches. But changing that would currently break cfbot's web scraping. -- Justin
Re: Commitfest Update
Maybe we should have two reviewers columns -- one for history-tracking purposes (and commit msg credit) and another for current ones. Personally, I don't use the CF app when building reviewer lists. I scan the threads instead. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/