Re: [PATCH] json_lex_string: don't overread on bad UTF8
On Tue, Apr 30, 2024 at 10:39:04AM -0700, Jacob Champion wrote:
> When json_lex_string() hits certain types of invalid input, it calls
> pg_encoding_mblen_bounded(), which assumes that its input is
> null-terminated and calls strnlen(). But the JSON lexer is constructed
> with an explicit string length, and we don't ensure that the string is
> null-terminated in all cases, so we can walk off the end of the
> buffer. This isn't really relevant on the server side, where you'd
> have to get a superuser to help you break string encodings, but for
> client-side usage on untrusted input (such as my OAuth patch) it would
> be more important.

I don't much like the fact that this advances token_terminator first.
Wouldn't it be better to calculate pg_encoding_mblen() first, then
save token_terminator? I feel a bit uneasy about saving a value in
token_terminator past the end of the string. That's a nit in this
context, still..

> Attached is a draft patch that explicitly checks against the
> end-of-string pointer and clamps the token_terminator to it. Note that
> this removes the only caller of pg_encoding_mblen_bounded() and I'm
> not sure what we should do with that function. It seems like a
> reasonable API, just not here.

Hmm. Keeping it around as currently designed means that it could
cause more harm than anything in the long term, even in the stable
branches if new code uses it. There is a risk of seeing this new code
incorrectly using it again, even if its top comment says that we rely
on the string being nul-terminated. A safer alternative would be to
redesign it so that the length of the string is provided as input,
removing the dependency on strlen in its internals, perhaps. Anyway,
without any callers of it, I'd be tempted to wipe it from HEAD and
call it a day.

> The new test needs to record two versions of the error message, one
> for invalid token and one for invalid escape sequence. This is
> because, for smaller chunk sizes, the partial-token logic in the
> incremental JSON parser skips the affected code entirely when it can't
> find an ending double-quote.

Ah, that makes sense. That looks OK here. A comment around the test
to document that would be appropriate, I guess.

> Tangentially: Should we maybe rethink pieces of the json_lex_string
> error handling? For example, do we really want to echo an incomplete
> multibyte sequence once we know it's bad? It also looks like there are
> places where the FAIL_AT_CHAR_END macro is called after the `s`
> pointer has already advanced past the code point of interest. I'm not
> sure if that's intentional.

Advancing the tracking pointer 's' before reporting an error related
to the end of the string is a bad practice, IMO, and we should avoid
that. json_lex_string() does not offer a warm feeling regarding that
with escape characters, at least :/
--
Michael
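A minimal sketch of the length-aware redesign suggested above; the
function name and exact shape are hypothetical, not from the thread or
the PostgreSQL tree:

    #include "mb/pg_wchar.h"

    /*
     * Hypothetical replacement for pg_encoding_mblen_bounded(): the
     * caller supplies how many bytes remain in the buffer, and the
     * result is clamped to that, so no null terminator is required.
     */
    static inline int
    pg_encoding_mblen_len(int encoding, const char *mbstr, size_t remaining)
    {
        int         mblen = pg_encoding_mblen(encoding, mbstr);

        return (mblen <= (int) remaining) ? mblen : (int) remaining;
    }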
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
On Sat, Apr 27, 2024 at 08:33:55PM +0200, Daniel Gustafsson wrote:
>> On 27 Apr 2024, at 20:32, Daniel Gustafsson wrote:
>>
>>> That's a good point, there is potential for more code removal here.
>>> The attached 0001 takes a stab at it while it's fresh in mind, I'll
>>> revisit before the July CF to see if there is more that can be done.
>
> ..and again with the attachment. Not enough coffee.

My remark was originally about pq_init_crypto_lib, which does the
locking initialization, and your new patch removes a bit more, as of:

-	/* This stuff need be done only once. */
-	if (!SSL_initialized)
-	{
-#ifdef HAVE_OPENSSL_INIT_SSL
-		OPENSSL_init_ssl(OPENSSL_INIT_LOAD_CONFIG, NULL);
-#else
-		OPENSSL_config(NULL);
-		SSL_library_init();
-		SSL_load_error_strings();
-#endif
-		SSL_initialized = true;
-	}

OPENSSL_init_ssl() has replaced SSL_library_init(), which is marked as
deprecated, and even this step is mentioned as no longer required with
1.1.0~:
https://www.openssl.org/docs/man1.1.1/man3/OPENSSL_init_ssl.html
Same with OPENSSL_init_crypto(), replacing OPENSSL_config(), again not
required in 1.1.0~:
https://www.openssl.org/docs/man1.1.1/man3/OPENSSL_init_crypto.html
SSL_load_error_strings() is documented as not to be used in 1.1.0,
replaced by the others:
https://www.openssl.org/docs/man3.2/man3/SSL_load_error_strings.html

While OpenSSL will be able to cope with that, how much of that applies
to LibreSSL? SSL_load_error_strings(), OPENSSL_init_ssl() and
OPENSSL_config() are OK based on the docs:
https://man.archlinux.org/man/extra/libressl/libressl-OPENSSL_config.3.en
https://man.archlinux.org/man/extra/libressl/libressl-OPENSSL_init_ssl.3.en
https://man.archlinux.org/man/extra/libressl/libressl-ERR_load_crypto_strings.3.en

So +1 to remove all this code after a closer lookup. I would recommend
updating the documentation of PQinitSSL and PQinitOpenSSL to say that
these have become useless and are deprecated.

 	ERR_clear_error();
-
 #ifdef USE_RESOWNER_FOR_HMAC

Some diff noise.
--
Michael
Re: partitioning and identity column
On Tue, Apr 30, 2024 at 04:29:11PM +0530, Ashutosh Bapat wrote:
> PFA patch which fixes all the three problems.

Please note that this was not tracked as an open item, so I have added
one referring to the failures reported by Alexander.
--
Michael
Re: Fix parallel vacuum buffer usage reporting
Hi!

On 30.04.2024 05:18, Masahiko Sawada wrote:
> On Fri, Apr 26, 2024 at 9:12 PM Alena Rybakina wrote:
>> Hi! The same script was run, but using vacuum verbose analyze, and I
>> saw the difference again in the fifth step:
>> with your patch: buffer usage: 32312 hits, 607 misses, 1566 dirtied
>> master: buffer usage: 32346 hits, 573 misses, 1360 dirtied
>
> Isn't there a chance for the checkpointer to run during this time?
> That could make the conditions between the two runs slightly
> different and explain the change in buffer report.
>
> [0] https://github.com/postgres/postgres/blob/8a1b31e6e59631807a08a4e9465134c343bbdf5e/src/backend/access/heap/vacuumlazy.c#L2826-L2831
>
> Looking at the script, you won't trigger the problem.

Thank you for the link; I accounted for it in my next experiments.

I repeated the test without processing checkpoints, with a single
index, and the number of pages in the buffer usage almost matched:

master branch: buffer usage: 32315 hits, 606 misses, 4486 dirtied
with applied patch v4 version: buffer usage: 32315 hits, 606 misses, 4489 dirtied

I think you are right - the problem was interference from the
checkpoint process; by the way, I checked the first version of the
patch. To cut a long story short, everything is fine now with one
index. Just in case, I'll explain: I considered this case because your
patch could have influenced it too.

On 25.04.2024 10:17, Anthonin Bonnefoy wrote:
> On Wed, Apr 24, 2024 at 4:01 PM Alena Rybakina wrote:
>> I tested the main postgres branch with and without your fix using a
>> script that was written by me. It consists of five scenarios and I
>> made a comparison in the logs between the original version of the
>> master branch and the master branch with your patch:
>
> Hi! Thanks for the tests.
>
>> I have attached a test file (vacuum_check_logs.sql)
>
> The reporting issue will only happen if there's a parallel index
> vacuum and it will only happen if there's at least 2 indexes [0].
> You will need to create an additional index.

Speaking of the problem, I added another index and repeated the test,
and found a significant difference (see the sketch after this mail for
a minimal setup that reaches the parallel path):

I found it when I committed the transaction (3):
master: 2964 hits, 0 misses, 0 dirtied
with applied patch v4 version: buffer usage: 33013 hits, 0 misses, 3 dirtied

When I deleted all the data from the table and later started vacuum
verbose again (4):
master: buffer usage: 51486 hits, 0 misses, 0 dirtied
with applied patch v4 version: buffer usage: 77924 hits, 0 misses, 0 dirtied

When I inserted 1 million rows into the table and updated them (5):
master: buffer usage: 27904 hits, 5021 misses, 1777 dirtied
with applied patch v4 version: buffer usage: 41051 hits, 9973 misses, 2564 dirtied

As I see, the number of pages is significantly more than it was in the
master branch and, frankly, I couldn't fully figure out whether that
is a mistake or not.

> I think that the patch fixes the problem correctly.
>
> I've run pgindent and updated the commit message.
>
> I realized that parallel vacuum was introduced in pg13 but buffer
> usage reporting in VACUUM command was implemented in pg15. Therefore,
> in pg13 and pg14, VACUUM (PARALLEL) is available but VACUUM
> (PARALLEL, VERBOSE) doesn't show the buffer usage report. Autovacuum
> does show the buffer usage report but parallel autovacuum is not
> supported. Therefore, we should backpatch it down to 15, not 13.
>
> I'm going to push the patch down to pg15, barring any objections.

I agree with you about the porting, and I saw that the patch is
working correctly.
--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
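To make Anthonin's point above concrete, a minimal setup that reaches
the parallel index vacuum path looks roughly like this (a sketch; the
table name and row counts are illustrative):

    CREATE TABLE vestat (a int, b int);
    CREATE INDEX vestat_a_idx ON vestat (a);
    CREATE INDEX vestat_b_idx ON vestat (b);  -- at least two indexes
    INSERT INTO vestat SELECT i, i FROM generate_series(1, 1000000) i;
    DELETE FROM vestat WHERE a % 2 = 0;
    -- Parallel index vacuum plus the buffer usage report:
    VACUUM (PARALLEL 2, VERBOSE) vestat;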
Re: pg_sequence_last_value() for unlogged sequences on standbys
On Tue, Apr 30, 2024 at 09:05:31PM -0500, Nathan Bossart wrote:
> This ended up being easier than I expected. While unlogged sequences
> are only supported on v15 and above, temporary sequences have been
> around since v7.2, so this will probably need to be back-patched to
> all supported versions.

Unlogged and temporary relations cannot be accessed during recovery,
so I'm OK with your change to force a NULL for both relpersistences.
However, it seems to me that you should also drop the
pg_is_other_temp_schema() in system_views.sql for the definition of
pg_sequences. Doing that on HEAD now would be OK, but there's nothing
urgent to it, so it may be better done once v18 opens up. Note that
pg_is_other_temp_schema() is only used for this sequence view, so this
would be a nice cleanup.

By the way, shouldn't we also change the function to return NULL for a
failed permission check? That would make it possible to remove the
has_sequence_privilege() as well, removing a duplication between the
code and the view definition. I've been looking around a bit, noticing
one use of this function in check_pgactivity (nagios agent), and its
query also has a has_sequence_privilege(), so returning NULL would
simplify its definition in the long run. I'd suspect other monitoring
queries do something similar to bypass permission errors.

> The added test case won't work for v12-v14 since it uses an
> unlogged sequence.

That would require a BackgroundPsql to maintain the connection to the
primary, so not having a test is OK by me.
--
Michael
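For reference, a sketch of how the view could shrink if both
NULL-returning behaviors were adopted (a hypothetical simplification,
not the committed definition of pg_sequences):

    -- Assuming pg_sequence_last_value() returns NULL both for
    -- non-permanent sequences during recovery and on failed
    -- permission checks:
    SELECT N.nspname AS schemaname,
           C.relname AS sequencename,
           pg_sequence_last_value(C.oid) AS last_value
    FROM pg_sequence S
         JOIN pg_class C ON C.oid = S.seqrelid
         LEFT JOIN pg_namespace N ON N.oid = C.relnamespace
    WHERE C.relkind = 'S';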
Re: Fix parallel vacuum buffer usage reporting
On Tue, Apr 30, 2024 at 3:34 PM Anthonin Bonnefoy wrote:
>
> I've done some additional tests to validate the reported numbers.
> Using pg_statio, it's possible to get the minimum number of block
> hits (full script attached).
>
> -- Save block hits before vacuum
> SELECT pg_stat_force_next_flush();
> SELECT heap_blks_hit, idx_blks_hit FROM pg_statio_all_tables where relname='vestat' \gset
> vacuum (verbose, index_cleanup on) vestat;
> -- Check the difference
> SELECT pg_stat_force_next_flush();
> SELECT heap_blks_hit - :heap_blks_hit as delta_heap_hit,
>        idx_blks_hit - :idx_blks_hit as delta_idx_hit,
>        heap_blks_hit - :heap_blks_hit + idx_blks_hit - :idx_blks_hit as sum
> FROM pg_statio_all_tables where relname='vestat';
>
> Output:
> ...
> buffer usage: 14676 hits, 0 misses, 667 dirtied
> buffer usage (new): 16081 hits, 0 misses, 667 dirtied
> ...
> -[ RECORD 1 ]--+------
> delta_heap_hit | 9747
> delta_idx_hit  | 6325
> sum            | 16072
>
> From pg_statio, we had 16072 blocks for the relation + indexes.
> Pre-patch, we are under-reporting with 14676.
> Post-patch, we have 16081. The 9 additional block hits come from
> vacuum accessing catalog tables like pg_class or pg_class_oid_index.

Thank you for further testing! I've pushed the patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Re: Streaming I/O, vectored I/O (WIP)
On Wed, May 1, 2024 at 2:51 PM David Rowley wrote:
> On Wed, 24 Apr 2024 at 14:32, David Rowley wrote:
>> I've attached a patch with a few typo fixes and what looks like an
>> incorrect type for max_ios. It's an int16 and I think it needs to be
>> an int. Doing "max_ios = Min(max_ios, PG_INT16_MAX);" doesn't do
>> anything when max_ios is int16.
>
> No feedback, so I'll just push this in a few hours unless anyone has
> anything.

Patch looks correct, thanks. Please do.

(Sorry, running a bit behind on email ATM... I also have a few more
typos around here from an off-list email from Mr Lakhin, will get to
that soon...)
Re: query_id, pg_stat_activity, extended query protocol
Here is a new rev of the patch which deals with the scenario mentioned
by Andrei [1] in which the queryId may change due to a cached query
invalidation.

[1] https://www.postgresql.org/message-id/724348C9-8023-41BC-895E-80634E79A538%40amazon.com

Regards,

Sami

Attachment: 0001-v2-Fix-Extended-Query-Protocol-handling-of-queryId.patch
Re: add tab-complete for memory, serialize option and other minor issues.
On Sat, Apr 27, 2024 at 11:15:47AM -0400, Tom Lane wrote:
> https://www.postgresql.org/message-id/3870833.1712696581%40sss.pgh.pa.us
>
> Post-feature-freeze is no time to be messing with behavior as basic
> as WORD_BREAKS, though.

Indeed.

By the way, that psql completion patch has fallen through the cracks,
and I don't see a point in waiting for that, so I have applied it.
Note that the patch did not order the options according to the docs:
the ordering was consistent on HEAD, but not anymore with the patch
applied.
--
Michael
Re: Streaming I/O, vectored I/O (WIP)
On Wed, 24 Apr 2024 at 14:32, David Rowley wrote:
> I've attached a patch with a few typo fixes and what looks like an
> incorrect type for max_ios. It's an int16 and I think it needs to be
> an int. Doing "max_ios = Min(max_ios, PG_INT16_MAX);" doesn't do
> anything when max_ios is int16.

No feedback, so I'll just push this in a few hours unless anyone has
anything.

David
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Wed, May 1, 2024 at 2:29 AM Tom Lane wrote:
>
> Alexander Korotkov writes:
>> I agree that storing char signedness might seem weird. But it appears
>> that we already store indexes that depend on char signedness. So,
>> it's effectively a property of bits-on-disk even though it affects
>> them indirectly. Then I see two options to make the picture
>> consistent.
>> 1) Assume that char signedness is somehow a property of bits-on-disk
>> even though it's weird. Then pg_trgm indexes are correct, but we need
>> to store char signedness in pg_control.
>> 2) Assume that char signedness is not a property of bits-on-disk.
>> Then pg_trgm indexes are buggy and need to be fixed.
>> What do you think?
>
> The problem with option (2) is the assumption that pg_trgm's behavior
> is the only bug of this kind, either now or in the future. I think
> that's just about an impossible standard to meet, because there's no
> realistic way to test whether char signedness is affecting things.
> (Sure, you can compare results across platforms, but maybe you
> just didn't test the right case.)
>
> Also, the bigger picture here is the seeming assumption that "if
> we change pg_trgm then it will be safe to replicate from x86 to
> arm". I don't believe that that's a good idea and I'm unwilling
> to promise that it will work, regardless of what we do about
> char signedness. That being the case, I don't want to invest a
> lot of effort in the signedness issue.

I think that the char signedness issue is also an issue for developers
(and extension authors), since it could lead to confusion and
potential bugs in the future. x86 developers would think of char as
always being signed and write code that will misbehave on arm
machines. For example, since logical replication should behave
correctly even in cross-arch replication, all developers need to be
aware of that. I thought of using the -funsigned-char (or
-fsigned-char) compiler flag to avoid that, but it would have a
broader impact, not only on indexes.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
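As a standalone illustration of the divergence in question (a sketch,
not code from any patch; the byte value is arbitrary):

    #include <stdio.h>

    int
    main(void)
    {
        char    c = (char) 0xE9;    /* 'é' in Latin-1 */

        /*
         * Prints -23 where plain char is signed (x86) and 233 where
         * it is unsigned (ARM, PowerPC), so any comparison or sort
         * order computed from raw char values diverges across
         * platforms.
         */
        printf("%d\n", (int) c);
        return 0;
    }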
Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
Hi Noah,

On Wed, May 1, 2024 at 5:24 AM Noah Misch wrote:
> On Thu, Apr 25, 2024 at 04:59:54PM +0400, Pavel Borisov wrote:
>> 0001: Optimize speed by avoiding heap visibility checking for
>> different non-deduplicated index tuples as proposed by Noah Misch
>>
>> Speed measurements on my laptop using the exact method recommended
>> by Noah upthread:
>> Current master branch: checkunique off: 144s, checkunique on: 419s
>> With patch 0001: checkunique off: 141s, checkunique on: 171s
>
> Where is the CPU time going to make it still be 21% slower w/
> checkunique on? It's a great improvement vs. current master, but I
> don't have an obvious explanation for the remaining +21%.

I think there is at least the extra comparison of index tuples.

--
Regards,
Alexander Korotkov
Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
On Thu, Apr 25, 2024 at 04:59:54PM +0400, Pavel Borisov wrote:
> 0001: Optimize speed by avoiding heap visibility checking for
> different non-deduplicated index tuples as proposed by Noah Misch
>
> Speed measurements on my laptop using the exact method recommended
> by Noah upthread:
> Current master branch: checkunique off: 144s, checkunique on: 419s
> With patch 0001: checkunique off: 141s, checkunique on: 171s

Where is the CPU time going to make it still be 21% slower w/
checkunique on? It's a great improvement vs. current master, but I
don't have an obvious explanation for the remaining +21%.
Re: pg_sequence_last_value() for unlogged sequences on standbys
On Tue, Apr 30, 2024 at 08:13:17PM -0500, Nathan Bossart wrote:
> Good point. I'll work on a patch along these lines, then.

This ended up being easier than I expected. While unlogged sequences
are only supported on v15 and above, temporary sequences have been
around since v7.2, so this will probably need to be back-patched to
all supported versions. The added test case won't work for v12-v14
since it uses an unlogged sequence. I'm not sure it's worth
constructing a test case for temporary sequences.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

From 71008f13da88f41a205e0643129162df9d2ebc81 Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Tue, 30 Apr 2024 20:54:51 -0500
Subject: [PATCH v1 1/1] Fix pg_sequence_last_value() for non-permanent
 sequences on standbys.

---
 src/backend/commands/sequence.c       | 18 +++++++++++++-----
 src/test/recovery/t/001_stream_rep.pl |  8 ++++++++
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index 46103561c3..659d2ad4fc 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -1780,7 +1780,7 @@ pg_sequence_last_value(PG_FUNCTION_ARGS)
 	Buffer		buf;
 	HeapTupleData seqtuple;
 	Form_pg_sequence_data seq;
-	bool		is_called;
+	bool		is_called = false;
 	int64		result;
 
 	/* open and lock sequence */
@@ -1792,12 +1792,20 @@ pg_sequence_last_value(PG_FUNCTION_ARGS)
 				 errmsg("permission denied for sequence %s",
 						RelationGetRelationName(seqrel))));
 
-	seq = read_seq_tuple(seqrel, &buf, &seqtuple);
+	/*
+	 * For the benefit of the pg_sequences system view, we return NULL for
+	 * temporary and unlogged sequences on standbys instead of throwing an
+	 * error.
+	 */
+	if (RelationIsPermanent(seqrel) || !RecoveryInProgress())
+	{
+		seq = read_seq_tuple(seqrel, &buf, &seqtuple);
 
-	is_called = seq->is_called;
-	result = seq->last_value;
+		is_called = seq->is_called;
+		result = seq->last_value;
 
-	UnlockReleaseBuffer(buf);
+		UnlockReleaseBuffer(buf);
+	}
 
 	sequence_close(seqrel, NoLock);
 
 	if (is_called)
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index 5311ade509..4c698b5ce1 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -95,6 +95,14 @@ $result = $node_standby_2->safe_psql('postgres', "SELECT * FROM seq1");
 print "standby 2: $result\n";
 is($result, qq(33|0|t), 'check streamed sequence content on standby 2');
 
+# Check pg_sequence_last_value() returns NULL for unlogged sequence on standby
+$node_primary->safe_psql('postgres',
+	"CREATE UNLOGGED SEQUENCE ulseq; SELECT nextval('ulseq')");
+$node_primary->wait_for_replay_catchup($node_standby_1);
+is($node_standby_1->safe_psql('postgres',
+	"SELECT pg_sequence_last_value('ulseq'::regclass) IS NULL"),
+	't', 'pg_sequence_last_value() on unlogged sequence on standby 1');
+
 # Check that only READ-only queries can run on standbys
 is($node_standby_1->psql('postgres', 'INSERT INTO tab_int VALUES (1)'),
 	3, 'read-only queries on standby 1');
-- 
2.25.1
Re: pg_sequence_last_value() for unlogged sequences on standbys
On Tue, Apr 30, 2024 at 09:06:04PM -0400, Tom Lane wrote:
> Nathan Bossart writes:
>> If you create an unlogged sequence on a primary,
>> pg_sequence_last_value() for that sequence on a standby will error
>> like so:
>> postgres=# select pg_sequence_last_value('test'::regclass);
>> ERROR: could not open file "base/5/16388": No such file or directory
>
>> As pointed out a few years ago [0], this function is undocumented, so
>> there's no stated contract to uphold. I lean towards just returning
>> NULL because that's what we'll have to put in the relevant
>> pg_sequences field anyway, but I can see an argument for fixing the
>> ERROR to align with what you see when you try to access unlogged
>> relations on a standby (i.e., "cannot access temporary or unlogged
>> relations during recovery").
>
> Yeah, I agree with putting that logic into the function. Putting
> such conditions into the SQL of a system view is risky because it
> is really, really painful to adjust the SQL in a released version.
> You could back-patch a fix for this if done at the C level, but
> I doubt we'd go to the trouble if it's done in the view.

Good point. I'll work on a patch along these lines, then.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: pg_sequence_last_value() for unlogged sequences on standbys
Nathan Bossart writes:
> If you create an unlogged sequence on a primary,
> pg_sequence_last_value() for that sequence on a standby will error
> like so:
> postgres=# select pg_sequence_last_value('test'::regclass);
> ERROR: could not open file "base/5/16388": No such file or directory

> As pointed out a few years ago [0], this function is undocumented, so
> there's no stated contract to uphold. I lean towards just returning
> NULL because that's what we'll have to put in the relevant
> pg_sequences field anyway, but I can see an argument for fixing the
> ERROR to align with what you see when you try to access unlogged
> relations on a standby (i.e., "cannot access temporary or unlogged
> relations during recovery").

Yeah, I agree with putting that logic into the function. Putting
such conditions into the SQL of a system view is risky because it
is really, really painful to adjust the SQL in a released version.
You could back-patch a fix for this if done at the C level, but
I doubt we'd go to the trouble if it's done in the view.

			regards, tom lane
pg_sequence_last_value() for unlogged sequences on standbys
If you create an unlogged sequence on a primary,
pg_sequence_last_value() for that sequence on a standby will error
like so:

postgres=# select pg_sequence_last_value('test'::regclass);
ERROR:  could not open file "base/5/16388": No such file or directory

This function is used by the pg_sequences system view, which fails
with the same error on standbys. The two options I see are:

* Return a better ERROR and adjust pg_sequences to avoid calling this
  function for unlogged sequences on standbys.

* Return NULL from pg_sequence_last_value() if called for an unlogged
  sequence on a standby.

As pointed out a few years ago [0], this function is undocumented, so
there's no stated contract to uphold. I lean towards just returning
NULL because that's what we'll have to put in the relevant
pg_sequences field anyway, but I can see an argument for fixing the
ERROR to align with what you see when you try to access unlogged
relations on a standby (i.e., "cannot access temporary or unlogged
relations during recovery"). Thoughts?

[0] https://postgr.es/m/CAAaqYe8JL8Et2DoO0RRjGaMvy7-C6eDH-2wHXK-gp3dOssvBkQ%40mail.gmail.com

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: TLS certificate alternate trust paths issue in libpq - certificate chain validation failing
On Tue, Apr 30, 2024 at 2:41 PM Thomas Spear wrote:
> The full details can be found at
> github.com/pgjdbc/pgjdbc/discussions/3236 - in summary, both
> jdbc-postgres and the psql cli seem to be affected by an issue
> validating the certificate chain up to a publicly trusted root
> certificate that has cross-signed an intermediate certificate coming
> from a Postgres server in Azure, when using sslmode=verify-full and
> trying to rely on the default path for sslrootcert.

Hopefully someone more familiar with the Azure cross-signing setup
sees something obvious and chimes in, but in the meantime there are a
couple things I can think to ask:

1. Are you sure that the server is actually putting the cross-signed
intermediate in the chain it's serving to the client?

2. What version of OpenSSL? There used to be validation bugs with
alternate trust paths; hopefully you're not using any of those (I
think they're old as dirt), but it doesn't hurt to know.

3. Can you provide a sample public certificate chain that should
validate and doesn't?

Thanks,
--Jacob
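One way to answer the first question is to dump the chain the server
actually sends during the handshake, for example (a sketch; the
hostname is a placeholder, and -starttls postgres requires OpenSSL
1.1.1 or newer):

    openssl s_client -starttls postgres \
        -connect example.postgres.database.azure.com:5432 \
        -showcerts </dev/null

Each certificate printed between the BEGIN/END CERTIFICATE markers is
part of the chain as served, so a missing cross-signed intermediate
shows up immediately.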
TLS certificate alternate trust paths issue in libpq - certificate chain validation failing
Hello,

I've recently joined the list on a tip from one of the maintainers of
jdbc-postgres, as I wanted to discuss an issue we've run into and find
out if the fix we've worked out is the right thing to do, or if there
is actually a bug that needs to be fixed.

The full details can be found at
github.com/pgjdbc/pgjdbc/discussions/3236 - in summary, both
jdbc-postgres and the psql cli seem to be affected by an issue
validating the certificate chain up to a publicly trusted root
certificate that has cross-signed an intermediate certificate coming
from a Postgres server in Azure, when using sslmode=verify-full and
trying to rely on the default path for sslrootcert.

The workaround we came up with is to add the original root cert, not
the root that cross-signed the intermediate, to root.crt, in order to
avoid needing to specify sslrootcert=. This allows the full chain to
be verified. I believe that either one should be able to be placed
there without me needing to explicitly specify sslrootcert=, but if I
use the CA that cross-signed the intermediate cert and don't specify
sslrootcert=, the chain validation fails.

Thank you,

Thomas
pg_parse_json() should not leak token copies on failure
Hi,

When a client of our JSON parser registers semantic action callbacks,
the parser will allocate copies of the lexed tokens to pass into those
callbacks. The intent is for the callbacks to always take ownership of
the copied memory, and if they don't want it then they can pfree() it.
However, if parsing fails on the token before the callback is invoked,
that allocation is leaked. That's not a problem for the server side,
or for clients that immediately exit on parse failure, but if the JSON
parser gets added to libpq for OAuth, it will become more of a
problem. (I'm building a fuzzer to flush out these sorts of issues;
not only does it complain loudly about the leaks, but the accumulation
of leaked memory puts a hard limit on how long a fuzzer run can
last. :D)

Attached is a draft patch to illustrate what I mean, but it's
incomplete: it only solves the problem for scalar values. We also make
a copy of object field names, which is much harder to fix, because we
make only a single copy and pass that to both the object_field_start
and object_field_end callbacks. Consider:

- If a client only implements object_field_start, it takes ownership
  of the field name when we call it. It can free the allocation if it
  decides that the field is irrelevant.

- The same is true for clients that only implement object_field_end.

- If a client implements both callbacks, it takes ownership of the
  field name when we call object_field_start. But irrelevant field
  names can't be freed during that callback, because the same copy is
  going to be passed to object_field_end. And object_field_end isn't
  guaranteed to be called, because parsing could fail for any number
  of reasons between now and then. So what code should be responsible
  for cleanup? The parser doesn't know whether the callback already
  freed the pointer or kept a reference to it in semstate.

Any thoughts on how we can improve this? I was thinking we could maybe
make two copies of the field name and track ownership individually?

Thanks,
--Jacob

Attachment: 0001-WIP-fix-leak-of-scalar-value-on-lex-failure.patch
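A compressed sketch of the two-copy idea floated above (illustrative
only: in the real parser, object_field_end fires after the field's
value has been parsed, not immediately as shown here):

    /*
     * Illustrative sketch, not the parser's actual code: each callback
     * receives its own copy of the field name, so each callback
     * unambiguously owns what it is given, and a parse failure between
     * the two calls leaks at most the parser's own working copy.
     */
    if (sem->object_field_start != NULL)
        sem->object_field_start(sem->semstate, pstrdup(fname), isnull);

    /* ... the field's value is parsed here; failures return early ... */

    if (sem->object_field_end != NULL)
        sem->object_field_end(sem->semstate, pstrdup(fname), isnull);

    pfree(fname);               /* parser releases its working copy */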
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
Hi!

30.04.2024 06:00, Alexander Lakhin wrote:
> Maybe I'm doing something wrong, but the following script:
>
> CREATE TABLE t (i int, PRIMARY KEY(i)) PARTITION BY RANGE (i);
> CREATE TABLE tp_0 PARTITION OF t FOR VALUES FROM (0) TO (1);
> CREATE TABLE tp_1 PARTITION OF t FOR VALUES FROM (1) TO (2);
> CREATE TABLE t2 (LIKE t INCLUDING ALL);
> CREATE TABLE tp2 (LIKE tp_0 INCLUDING ALL);
>
> creates tables t2, tp2 without not-null constraints.

To create partitions, the "CREATE TABLE ... LIKE ..." command is used
with the "EXCLUDING INDEXES" modifier (to speed up the insertion of
values) [1]:

CREATE TABLE t (i int, PRIMARY KEY(i)) PARTITION BY RANGE(i);
CREATE TABLE t2 (LIKE t INCLUDING ALL EXCLUDING INDEXES EXCLUDING IDENTITY);
\d+ t2;
...
Not-null constraints:
    "t2_i_not_null" NOT NULL "i"
Access method: heap

[1] https://github.com/postgres/postgres/blob/d12b4ba1bd3eedd862064cf1dad5ff107c5cba90/src/backend/commands/tablecmds.c#L21215

--
With best regards,
Dmitry Koval

Postgres Professional: http://postgrespro.com
Re: Build with meson + clang + sanitizer resulted in undefined reference
> On Thu, Apr 25, 2024 at 06:38:58PM +0300, Maxim Orlov wrote:
>
>> And then upon build I've got overwhelmed by thousands of undefined
>> reference errors.
>>
>> fe-auth-scram.c:(.text+0x17a): undefined reference to `__ubsan_handle_builtin_unreachable'
>> /usr/bin/ld: fe-auth-scram.c:(.text+0x189): undefined reference to `__ubsan_handle_type_mismatch_v1_abort'
>> /usr/bin/ld: fe-auth-scram.c:(.text+0x195): undefined reference to `__ubsan_handle_type_mismatch_v1_abort'
>> /usr/bin/ld: fe-auth-scram.c:(.text+0x1a1): undefined reference to `__ubsan_handle_type_mismatch_v1_abort'
>> /usr/bin/ld: fe-auth-scram.c:(.text+0x1ad): undefined reference to `__ubsan_handle_type_mismatch_v1_abort'
>> /usr/bin/ld: fe-auth-scram.c:(.text+0x1b9): undefined reference to `__ubsan_handle_type_mismatch_v1_abort'

Seems to be a meson quirk [1]. I could reproduce this, and adding
-Db_lundef=false on top of your configuration solved the issue.

[1]: https://github.com/mesonbuild/meson/issues/3853
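In other words, a working configuration looks something like this (a
sketch; the sanitizer list stands in for whatever was used
originally):

    meson setup build -Db_sanitize=address,undefined -Db_lundef=false
    ninja -C build

b_lundef controls whether the linker is told to reject undefined
symbols, which conflicts with the sanitizer runtime's deliberately
unresolved symbols; the meson documentation recommends disabling it
when b_sanitize is in use.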
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
On Thu, Apr 11, 2024 at 08:00:00PM +0300, Alexander Lakhin wrote:
> 11.04.2024 16:27, Dmitry Koval wrote:
>>
>> Added correction (and test), see
>> v3-0001-Fix-for-SPLIT-MERGE-partitions-of-temporary-table.patch.
>
> Thank you for the correction, but maybe an attempt to merge into
> implicit pg_temp should fail just like CREATE TABLE ... PARTITION OF
> ... does?
>
> Please look also at another anomaly with schemas:
> CREATE SCHEMA s1;
> CREATE TABLE t (i int) PARTITION BY RANGE (i);
> CREATE TABLE tp_0_2 PARTITION OF t FOR VALUES FROM (0) TO (2);
> ALTER TABLE t SPLIT PARTITION tp_0_2 INTO
>   (PARTITION s1.tp0 FOR VALUES FROM (0) TO (1),
>    PARTITION s1.tp1 FOR VALUES FROM (1) TO (2));
> results in:
> \d+ s1.*
> Did not find any relation named "s1.*"
> \d+ tp*
> Table "public.tp0"

Hi,

Is this issue already fixed? I wasn't able to reproduce it. Maybe it
only happened with earlier patch versions applied?

Thanks,
--
Justin
Re: tablecmds.c/MergeAttributes() cleanup
On Tue, Apr 30, 2024 at 2:19 AM Ashutosh Bapat wrote:
> On Mon, Apr 29, 2024 at 6:46 PM Robert Haas wrote:
>> On Sat, Apr 20, 2024 at 12:17 AM Ashutosh Bapat wrote:
>>> Yes please. Probably this issue surfaced again after we reverted
>>> the compression and storage fix? If that's the case, please add it
>>> to the open items.
>>
>> This is still on the open items list and I'm not clear who, if
>> anyone, is working on fixing it.
>>
>> It would be good if someone fixed it. :-)
>
> Here's a patch fixing it.
>
> I have added the reproducer provided by Alexander as a test. I
> thought of improving that test further to test the compression of the
> inherited table but did not implement it since we haven't documented
> the behaviour of compression with inheritance. Defining and
> implementing compression behaviour for inherited tables was the goal
> of 0413a556990ba628a3de8a0b58be020fd9a14ed0, which has been reverted.

I took a look at this patch. Currently this case crashes:

CREATE TABLE cmdata(f1 text COMPRESSION pglz);
CREATE TABLE cmdata3(f1 text);
CREATE TABLE cminh() INHERITS (cmdata, cmdata3);

The patch makes this succeed, but I was initially unclear why it
didn't make it fail with an error instead: you can argue that cmdata
has pglz and cmdata3 has default, and those are different. It seems
that prior precedent goes both ways -- we treat the absence of a
STORAGE specification as STORAGE EXTENDED and it conflicts with an
explicit storage specification on some other inheritance parent --
but on the other hand, we treat the absence of a default as compatible
with any explicit default, similar to what happens here.

But I eventually realized that you're just putting back behavior that
we had in previous releases: pre-v17, the code already works the way
this patch makes it do, and MergeChildAttribute() is already coded
similarly to this. As Alexander Lakhin said upthread, 4d969b2f8 seems
to have broken this.

So now I think this is committable, but I can't do it now because I
won't be around for the next few hours in case the buildfarm blows up.
I can do it tomorrow, or perhaps Peter would like to handle it since
it seems to have been his commit that introduced the issue.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: Support tid range scan in parallel?
Hi David,

Thank you for your reply.

> From a CPU point of view, it'd be hard to imagine that a SELECT *
> query without any other items in the WHERE clause other than the TID
> range quals would run faster with multiple workers than with 1. The
> problem is the overhead of pushing tuples to the main process often
> outweighs the benefits of the parallelism. However, from an I/O point
> of view on a server with slow enough disks, I can imagine there'd be
> a speedup.

Yeah, this is generally true. With everything set to default, the
planner would not choose a parallel sequential scan if the scan range
covers mostly all tuples of a table (to reduce the overhead of pushing
tuples to the main process, as you mentioned). It is preferred when
the target data is small but the table is huge. In my case it is also
the same: the planner by default uses a normal tid range scan, so I
had to alter cost parameters to influence the planner's decision. This
is where I found that a WHERE clause containing only TID ranges that
cover the entire table would run faster with parallel workers, at
least in my environment.

> Of course, it may be beneficial to have parallel TID Range for other
> cases when more row filtering or aggregation is being done as that
> requires pushing fewer tuples over from the parallel worker to the
> main process. It just would be good to get to the bottom of if
> there's still any advantage to parallelism when no filtering other
> than the ctid quals is being done now that we've less chance of
> having to wait for I/O coming from disk with the read streams code.

I believe so too. I shared my test procedure below, with ctid being
the only quals.

>> below is the timing to complete a select query covering all the
>> records in a simple 2-column table with 40 million records,
>>
>> - tid range scan takes 10216ms
>> - tid range scan with 2 workers takes 7109ms
>> - sequential scan with 2 workers takes 8499ms
>
> Can you share more details about this test? i.e. the query, what the
> times are that you've measured (EXPLAIN ANALYZE, or SELECT, COPY?).
> Also, which version/commit did you patch against? I was wondering if
> the read stream code added in v17 would result in the serial case
> running faster because the parallelism just resulted in more I/O
> concurrency.

Yes, of course. These numbers were obtained earlier this year on
master with the patch applied, most likely without the read stream
code you mentioned. The patch attached here is rebased to commit
dd0183469bb779247c96e86c2272dca7ff4ec9e7 on master, which is quite
recent and should have the read stream code for v17, as I can
immediately tell that the serial scans run much faster now in my
setup. I increased the records on the test table from 40 to 100
million because serial scans are much faster now. Below is the summary
and details of my test. Note that I only include the EXPLAIN ANALYZE
details of the round 1 test. Round 2 is the same except for different
execution times.
[env]
- OS: Ubuntu 18.04
- CPU: 4 cores @ 3.40 GHz
- MEM: 16 GB

[test table setup]
initdb with all default values

CREATE TABLE test (a INT, b TEXT);
INSERT INTO test VALUES(generate_series(1,100000000), 'testing');

SELECT min(ctid), max(ctid) from test;
  min  |     max
-------+--------------
 (0,1) | (540540,100)
(1 row)

[summary]
round 1:
tid range scan: 14915ms
tid range scan 2 workers: 12265ms
seq scan with 2 workers: 12675ms

round 2:
tid range scan: 12112ms
tid range scan 2 workers: 10649ms
seq scan with 2 workers: 11206ms

[details of EXPLAIN ANALYZE below]

[default tid range scan]
EXPLAIN ANALYZE SELECT a FROM test
  WHERE ctid >= '(1,0)' AND ctid <= '(540540,100)';
                               QUERY PLAN
-------------------------------------------------------------------------
 Tid Range Scan on test  (cost=0.01..1227029.81 rows=68648581 width=4)
     (actual time=0.188..12280.791 rows=99999815 loops=1)
   TID Cond: ((ctid >= '(1,0)'::tid) AND (ctid <= '(540540,100)'::tid))
 Planning Time: 0.817 ms
 Execution Time: 14915.035 ms
(4 rows)

[parallel tid range scan with 2 workers]
set parallel_setup_cost=0;
set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;
set max_parallel_workers_per_gather=2;

EXPLAIN ANALYZE SELECT a FROM test
  WHERE ctid >= '(1,0)' AND ctid <= '(540540,100)';
                               QUERY PLAN
-------------------------------------------------------------------------
 Gather  (cost=0.01..511262.43 rows=68648581 width=4)
     (actual time=1.322..9249.197 rows=99999815 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Tid Range Scan on test
       (cost=0.01..511262.43 rows=28603575 width=4)
       (actual time=0.332..4906.262 rows=33333272 loops=3)
         TID Cond: ((ctid >= '(1,0)'::tid) AND (ctid <= '(540540,100)'::tid))
 Planning Time: 0.213 ms
Re: pg17 issues with not-null contraints
On Tue, Apr 30, 2024 at 01:52:02PM -0400, Robert Haas wrote:
> On Thu, Apr 18, 2024 at 12:52 PM Justin Pryzby wrote:
>> I'm not totally clear on what's intended in v17 - maybe it'd be dead
>> code, and maybe it shouldn't even be applied to master branch. But I
>> do think it's worth patching earlier versions (even though it'll be
>> less useful than having done so 5 years ago).
>
> This thread is still on the open items list, but I'm not sure whether
> there's still stuff here that needs to be fixed for the current
> release. If not, this thread should be moved to the "resolved before
> 17beta1" section. If so, we should try to reach consensus on what the
> remaining issues are and what we're going to do about them.

I think the only thing that's relevant for v17 is this:

On Tue, Apr 16, 2024 at 08:11:49PM +0200, Alvaro Herrera wrote:
> Speaking of which, I wonder if I should modify pg16's tests so that
> they leave behind tables set up in this way, to immortalize
> pg_upgrade testing.

The patch on the other thread for pg_upgrade --check is an old issue
affecting all stable releases.

--
Justin
Control flow in logical replication walsender
Hi,

I wanted to check my understanding of how control flows in a walsender
doing logical replication. My understanding is that the (single)
thread in each walsender process, in the simplest case, loops on:

1. Pull a record out of the WAL.
2. Pass it to the reorder buffer code, which,
3. Sorts it out into the appropriate in-memory structure for that
   transaction (spilling to disk as required), and then continues with
   #1, or,
4. If it's a commit record, it iteratively passes the transaction data
   one change at a time to,
5. The logical decoding plugin, which returns the output format of
   that plugin, and then,
6. The walsender sends the output from the plugin to the client.

It cycles on passing the data to the plugin and sending it to the
client until it runs out of changes in that transaction, and then
resumes reading the WAL in #1.

In particular, I wanted to confirm that while it is pulling the
reordered transaction and sending it to the plugin (and thence to the
client), that particular walsender is *not* reading new WAL records or
putting them in the reorder buffer.

The specific issue I'm trying to track down is an enormous pileup of
spill files. This is in a non-supported version of PostgreSQL (v11),
so an upgrade may fix it, but at the moment, I'm trying to find a
cause and a mitigation.
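For what it's worth, the pileup can be quantified from the OS, since
logical decoding writes its spill files under the slot's directory (a
sketch; the slot name is a placeholder):

    # Count spill files for one replication slot
    ls $PGDATA/pg_replslot/my_slot/*.spill 2>/dev/null | wc -l

Note that on v11 the spill threshold is hardcoded (a transaction
spills once it accumulates 4096 changes in memory); the
logical_decoding_work_mem knob only arrived in v13.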
Re: pg17 issues with not-null contraints
On Thu, Apr 18, 2024 at 12:52 PM Justin Pryzby wrote:
> I'm not totally clear on what's intended in v17 - maybe it'd be dead
> code, and maybe it shouldn't even be applied to master branch. But I
> do think it's worth patching earlier versions (even though it'll be
> less useful than having done so 5 years ago).

This thread is still on the open items list, but I'm not sure whether
there's still stuff here that needs to be fixed for the current
release. If not, this thread should be moved to the "resolved before
17beta1" section. If so, we should try to reach consensus on what the
remaining issues are and what we're going to do about them.

--
Robert Haas
EDB: http://www.enterprisedb.com
[PATCH] json_lex_string: don't overread on bad UTF8
Hi all,

When json_lex_string() hits certain types of invalid input, it calls
pg_encoding_mblen_bounded(), which assumes that its input is
null-terminated and calls strnlen(). But the JSON lexer is constructed
with an explicit string length, and we don't ensure that the string is
null-terminated in all cases, so we can walk off the end of the
buffer. This isn't really relevant on the server side, where you'd
have to get a superuser to help you break string encodings, but for
client-side usage on untrusted input (such as my OAuth patch) it would
be more important.

Attached is a draft patch that explicitly checks against the
end-of-string pointer and clamps the token_terminator to it. Note that
this removes the only caller of pg_encoding_mblen_bounded() and I'm
not sure what we should do with that function. It seems like a
reasonable API, just not here.

The new test needs to record two versions of the error message, one
for invalid token and one for invalid escape sequence. This is
because, for smaller chunk sizes, the partial-token logic in the
incremental JSON parser skips the affected code entirely when it can't
find an ending double-quote.

Tangentially: Should we maybe rethink pieces of the json_lex_string
error handling? For example, do we really want to echo an incomplete
multibyte sequence once we know it's bad? It also looks like there are
places where the FAIL_AT_CHAR_END macro is called after the `s`
pointer has already advanced past the code point of interest. I'm not
sure if that's intentional.

Thanks,
--Jacob

Attachment: 0001-json_lex_string-don-t-overread-on-bad-UTF8.patch
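The clamping idea described above, in sketch form (illustrative, not
the actual patch hunk; assume s walks the input and end points one
past the last input byte):

    /*
     * Sketch: let the reported multibyte length advance the token
     * terminator only as far as the end of the input, so a truncated
     * or invalid trailing sequence cannot push it past the buffer.
     */
    s += pg_encoding_mblen(lex->input_encoding, s);
    lex->token_terminator = (s <= end) ? s : end;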
Re: A problem about partitionwise join
On Wed, Feb 21, 2024 at 6:25 AM Richard Guo wrote:
> This patch was returned due to 'lack of interest'. However, upon
> verification, it appears that the reported issue still exists, and
> the proposed fix in the thread remains valid. Hence, resurrect this
> patch after rebasing it on master. I've also written a detailed
> commit message which hopefully can help people review the changes
> more effectively.

I think it's slightly questionable whether this patch is worthwhile.
The case memorialized in the regression tests, t1.a = t2.a AND t1.a =
t2.b, is a very weird thing to do. The case mentioned in the original
email, foo.k1 = bar.k1 and foo.k2 = bar.k2 and foo.k2 = 16, seems like
something that could realistically happen, especially when there are
views in use (e.g. the view joins foo and bar, and then someone
queries the view for one of the join columns). In such a case, it's
possible that foo.k2 = 16 is selective enough that we really don't
care about partition-wise join any more, but it's also possible that
it's not too selective and we do care about partition-wise join.

So it's not that I think the case the patch fixes can never happen,
but I do think it's probably fairly rare that it brings any benefit,
which is why I thought that EC-based matching was an OK approach to
this problem initially. Perhaps that was the wrong idea, though. Does
the additional logic added by this patch have a noticeable performance
cost?

--
Robert Haas
EDB: http://www.enterprisedb.com
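For readers following along, the realistic case has roughly the
following shape (a sketch of the scenario, not taken from the thread;
on an unpatched tree the constant qual is expected to defeat the
partition-wise join):

    SET enable_partitionwise_join = on;

    CREATE TABLE foo (k1 int, k2 int) PARTITION BY HASH (k1, k2);
    CREATE TABLE foo_p0 PARTITION OF foo FOR VALUES WITH (MODULUS 2, REMAINDER 0);
    CREATE TABLE foo_p1 PARTITION OF foo FOR VALUES WITH (MODULUS 2, REMAINDER 1);
    CREATE TABLE bar (k1 int, k2 int) PARTITION BY HASH (k1, k2);
    CREATE TABLE bar_p0 PARTITION OF bar FOR VALUES WITH (MODULUS 2, REMAINDER 0);
    CREATE TABLE bar_p1 PARTITION OF bar FOR VALUES WITH (MODULUS 2, REMAINDER 1);

    -- The planner folds foo.k2 = bar.k2 plus foo.k2 = 16 into
    -- per-relation constant quals, so no join clause on k2 survives
    -- for the partition-wise join match to see.
    EXPLAIN (COSTS OFF)
    SELECT * FROM foo JOIN bar ON foo.k1 = bar.k1 AND foo.k2 = bar.k2
    WHERE foo.k2 = 16;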
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
Alexander Korotkov writes:
> I agree that storing char signedness might seem weird. But it appears
> that we already store indexes that depend on char signedness. So,
> it's effectively a property of bits-on-disk even though it affects
> them indirectly. Then I see two options to make the picture
> consistent.
> 1) Assume that char signedness is somehow a property of bits-on-disk
> even though it's weird. Then pg_trgm indexes are correct, but we need
> to store char signedness in pg_control.
> 2) Assume that char signedness is not a property of bits-on-disk.
> Then pg_trgm indexes are buggy and need to be fixed.
> What do you think?

The problem with option (2) is the assumption that pg_trgm's behavior
is the only bug of this kind, either now or in the future. I think
that's just about an impossible standard to meet, because there's no
realistic way to test whether char signedness is affecting things.
(Sure, you can compare results across platforms, but maybe you
just didn't test the right case.)

Also, the bigger picture here is the seeming assumption that "if
we change pg_trgm then it will be safe to replicate from x86 to
arm". I don't believe that that's a good idea and I'm unwilling
to promise that it will work, regardless of what we do about
char signedness. That being the case, I don't want to invest a
lot of effort in the signedness issue. Option (1) is clearly a
small change with little if any risk of future breakage.
Option (2) ... not so much.

			regards, tom lane
Re: TerminateOtherDBBackends code comments inconsistency.
On Tue, Apr 30, 2024 at 09:10:52AM +0530, Amit Kapila wrote:
> On Tue, Apr 30, 2024 at 2:58 AM Noah Misch wrote:
>> On Mon, Apr 29, 2024 at 10:18:35AM +0530, Amit Kapila wrote:
>>> On Mon, Apr 22, 2024 at 9:56 PM Noah Misch wrote:
>>> 3a9b18b309 didn't change the docs of pg_terminate_backend, and
>>> whatever is mentioned w.r.t permissions in the doc of that function
>>> sounds valid for drop database force to me. Do you have any
>>> specific proposal in your mind?
>>
>> Something like the attached.
>
> LGTM.
>
>> One could argue the function should also check isBackgroundWorker
>> and ignore even bgworkers that set proc->roleId, but I've not done
>> that.
>
> What is the argument for ignoring such workers?

One of the proposed code comments says, "For bgworker authors, it's
convenient to be able to recommend FORCE if a worker is blocking DROP
DATABASE unexpectedly." That argument is debatable, but I do think it
applies equally to bgworkers whether or not they set proc->roleId.
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Tue, Apr 30, 2024 at 7:54 PM Tom Lane wrote:
> Alexander Korotkov writes:
>> Given this, should we try to do better with binary compatibility
>> checks using ControlFileData? AFAICS they are supposed to check if
>> the database cluster is binary compatible with the running
>> architecture. But it obviously allows incompatibilities.
>
> Perhaps. pg_control already covers endianness, which I think
> is the root of the hashing differences I showed. Adding a field
> for char signedness feels a little weird, since it's not directly
> a property of the bits-on-disk, but maybe we should.

I agree that storing char signedness might seem weird. But it appears
that we already store indexes that depend on char signedness. So,
it's effectively a property of bits-on-disk even though it affects
them indirectly. Then I see two options to make the picture
consistent.
1) Assume that char signedness is somehow a property of bits-on-disk
even though it's weird. Then pg_trgm indexes are correct, but we need
to store char signedness in pg_control.
2) Assume that char signedness is not a property of bits-on-disk.
Then pg_trgm indexes are buggy and need to be fixed.
What do you think?

--
Regards,
Alexander Korotkov
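If option (1) were taken, detecting the build's signedness is a
one-liner; a sketch of what initdb could record (the field name is
hypothetical):

    #include <limits.h>

    /*
     * Hypothetical pg_control addition: record at initdb time whether
     * plain char was signed for this cluster, so a replica on another
     * architecture could cross-check before accepting the data
     * directory.
     */
    ControlFile->default_char_signedness = (CHAR_MIN < 0);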
Re: Avoid orphaned objects dependencies, take 3
Hi Bertrand,

25.04.2024 10:20, Bertrand Drouvot wrote:
>> postgres=# CREATE FUNCTION f() RETURNS int LANGUAGE SQL RETURN f() + 1;
>> ERROR: cache lookup failed for function 16400
>
> This stuff does appear before we get a chance to call the new
> depLockAndCheckObject() function. I think this is what Tom was
> referring to in [1]:
>
> "So the only real fix for this would be to make every object lookup
> in the entire system do the sort of dance that's done in
> RangeVarGetRelidExtended."
>
> The fact that those kind of errors appear also somehow ensures that
> no orphaned dependencies can be created.

I agree; the only thing that I'd change here is the error code.

But I've discovered yet another possibility to get a broken
dependency. Please try this script:

res=0
numclients=20
for ((i=1;i<=100;i++)); do
for ((c=1;c<=numclients;c++)); do
  echo "
CREATE SCHEMA s_$c;
CREATE CONVERSION myconv_$c FOR 'LATIN1' TO 'UTF8' FROM iso8859_1_to_utf8;
ALTER CONVERSION myconv_$c SET SCHEMA s_$c;
  " | psql >psql1-$c.log 2>&1 &
  echo "DROP SCHEMA s_$c RESTRICT;" | psql >psql2-$c.log 2>&1 &
done
wait
pg_dump -f db.dump || { echo "on iteration $i"; res=1; break; }
for ((c=1;c<=numclients;c++)); do
  echo "DROP SCHEMA s_$c CASCADE;" | psql >psql3-$c.log 2>&1
done
done
psql -c "SELECT * FROM pg_conversion WHERE connamespace NOT IN (SELECT oid FROM pg_namespace);"

It fails for me (with the v4 patch applied) as follows:

pg_dump: error: schema with OID 16392 does not exist
on iteration 1
  oid  | conname  | connamespace | conowner | conforencoding | contoencoding |      conproc      | condefault
-------+----------+--------------+----------+----------------+---------------+-------------------+------------
 16396 | myconv_6 |        16392 |       10 |              8 |             6 | iso8859_1_to_utf8 | f

Best regards,
Alexander
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
Alexander Korotkov writes:
> Given this, should we try to do better with binary compatibility
> checks using ControlFileData? AFAICS they are supposed to check if
> the database cluster is binary compatible with the running
> architecture. But it obviously allows incompatibilities.

Perhaps. pg_control already covers endianness, which I think
is the root of the hashing differences I showed. Adding a field
for char signedness feels a little weird, since it's not directly
a property of the bits-on-disk, but maybe we should.

			regards, tom lane
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Tue, Apr 23, 2024 at 5:57 PM Tom Lane wrote:
> "Guo, Adam" writes:
>> I would like to report an issue with the pg_trgm extension on
>> cross-architecture replication scenarios. When an x86_64 standby
>> server is replicating from an aarch64 primary server or vice versa,
>> the gist_trgm_ops opclass returns different results on the primary
>> and standby.
>
> I do not think that is a supported scenario. Hash functions and
> suchlike are not guaranteed to produce the same results on different
> CPU architectures. As a quick example, I get
>
> regression=# select hashfloat8(34);
>  hashfloat8
> ------------
>    21570837
> (1 row)
>
> on x86_64 but
>
> postgres=# select hashfloat8(34);
>  hashfloat8
> ------------
>  -602898821
> (1 row)
>
> on ppc32 thanks to the endianness difference.

Given this, should we try to do better with binary compatibility
checks using ControlFileData? AFAICS they are supposed to check if
the database cluster is binary compatible with the running
architecture. But it obviously allows incompatibilities.

--
Regards,
Alexander Korotkov
Re: SQL:2011 application time
On 4/30/24 09:24, Robert Haas wrote:
> Peter, could you have a look at
> http://postgr.es/m/47550967-260b-4180-9791-b224859fe...@illuminatedcomputing.com
> and express an opinion about whether each of those proposals are (a)
> good or bad ideas and (b) whether they need to be fixed for the
> current release?

Here are the same patches but rebased. I've added a fourth which is my
progress on adding the CHECK constraint. I don't really consider it
finished though, because it has these problems:

- The CHECK constraint should be marked as an internal dependency of
the PK, so that you can't drop it, and it gets dropped when you drop
the PK. I don't see a good way to tie the two together though, so I'd
appreciate any advice there. They are separate AlterTableCmds, so how
do I get the ObjectAddress of both constraints at the same time? I
wanted to store the PK's ObjectAddress on the Constraint node, but
since ObjectAddress isn't a Node it doesn't work.

- The CHECK constraint should maybe be hidden when you say `\d foo`?
Or maybe not, but that's what we do with FK triggers.

- When you create partitions you get a warning about the constraint
already existing, because it gets created via the PK and then also the
partitioning code tries to copy it. Solving the first issue here
should solve this nicely though.

Alternately we could just fix the GROUP BY functional dependency code
to only accept b-tree indexes. But I think the CHECK constraint
approach is a better solution.

Thanks,

--
Paul              ~{:-)
p...@illuminatedcomputing.com

From 0dbc008a654ab1fdc5f492345ee4575c352499d3 Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth"
Date: Sun, 24 Mar 2024 21:46:30 -0700
Subject: [PATCH v2 1/4] Fix ON CONFLICT DO NOTHING/UPDATE for temporal
 indexes

A PRIMARY KEY or UNIQUE constraint with WITHOUT OVERLAPS will be a GiST
index, not a B-Tree, but it will still have indisunique set. The code
for ON CONFLICT fails if it sees a non-btree index that has
indisunique. This commit fixes that and adds some tests. But now that
we can't just test indisunique, we also need some extra checks to
prevent DO UPDATE from running against a WITHOUT OVERLAPS constraint
(because the conflict could happen against more than one row, and we'd
only update one).
---
 src/backend/catalog/index.c                 |   1 +
 src/backend/executor/execIndexing.c         |   2 +-
 src/backend/optimizer/util/plancat.c        |   4 +-
 src/include/nodes/execnodes.h               |   1 +
 .../regress/expected/without_overlaps.out   | 176 ++++++++++++++++++
 src/test/regress/sql/without_overlaps.sql   | 113 ++++++++++++
 6 files changed, 294 insertions(+), 3 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5a8568c55c9..1fd543cc550 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2443,6 +2443,7 @@ BuildIndexInfo(Relation index)
 							   &ii->ii_ExclusionOps,
 							   &ii->ii_ExclusionProcs,
 							   &ii->ii_ExclusionStrats);
+		ii->ii_HasWithoutOverlaps = ii->ii_Unique;
 	}
 
 	return ii;
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9f05b3654c1..faa37ca56db 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -210,7 +210,7 @@ ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative)
 		 * If the indexes are to be used for speculative insertion, add extra
 		 * information required by unique index entries.
 		 */
-		if (speculative && ii->ii_Unique)
+		if (speculative && ii->ii_Unique && !ii->ii_HasWithoutOverlaps)
 			BuildSpeculativeIndexInfo(indexDesc, ii);
 
 		relationDescs[i] = indexDesc;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 130f838629f..a398d7a78d1 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -815,7 +815,7 @@ infer_arbiter_indexes(PlannerInfo *root)
 		 */
 		if (indexOidFromConstraint == idxForm->indexrelid)
 		{
-			if (!idxForm->indisunique && onconflict->action == ONCONFLICT_UPDATE)
+			if ((!idxForm->indisunique || idxForm->indisexclusion) && onconflict->action == ONCONFLICT_UPDATE)
 				ereport(ERROR,
 						(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 						 errmsg("ON CONFLICT DO UPDATE not supported with exclusion constraints")));
@@ -837,7 +837,7 @@ infer_arbiter_indexes(PlannerInfo *root)
 		 * constraints), so index under consideration can be immediately
 		 * skipped if it's not unique
 		 */
-		if (!idxForm->indisunique)
+		if (!idxForm->indisunique || idxForm->indisexclusion)
 			goto next;
 
 		/* Build BMS representation of plain (non expression) index attrs */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index d927ac44a82..fdfaef284e9 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -204,6 +204,7 @@ typedef struct IndexInfo
 	bool		ii_Summarizing;
 	int			ii_ParallelWorkers;
 	Oid			ii
Re: SQL:2011 application time
On Fri, Apr 26, 2024 at 3:41 PM Paul Jungwirth wrote:
> On 4/26/24 12:25, Robert Haas wrote:
> > I think this thread should be added to the open items list.
>
> Thanks! I sent a request to pgsql-www to get edit permission. I didn't
> realize there was a wiki page tracking things like this. I agree it needs
> to be fixed if we want to include the feature.

Great, I see that it's on the list now.

Peter, could you have a look at http://postgr.es/m/47550967-260b-4180-9791-b224859fe...@illuminatedcomputing.com and express an opinion about whether each of those proposals are (a) good or bad ideas and (b) whether they need to be fixed for the current release?

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
Masahiko Sawada writes:
> On Tue, Apr 30, 2024 at 12:37 PM Tom Lane wrote:
>> I think this will break existing indexes that are working fine.
>> Yeah, it would have been better to avoid the difference, but
>> it's too late now.

> True. So it will be a PG18 item.

How will it be any better in v18? It's still an on-disk compatibility break for affected platforms. Now, people could recover by reindexing affected indexes, but I think we need to have a better justification than this for making them do so.

			regards, tom lane
Re: Support LIKE with nondeterministic collations
	Peter Eisentraut wrote:

> This patch adds support for using LIKE with nondeterministic
> collations. So you can do things such as
>
>     col LIKE 'foo%' COLLATE case_insensitive

Nice!

> The pattern is partitioned into substrings at wildcard characters
> (so 'foo%bar' is partitioned into 'foo', '%', 'bar') and then the
> whole predicate matches if a match can be found for each partition
> under the applicable collation

Trying with a collation that ignores punctuation:

postgres=# CREATE COLLATION "ign_punct" ( provider = 'icu', locale='und-u-ka-shifted', deterministic = false );

postgres=# SELECT '.foo.' like 'foo' COLLATE ign_punct;
 ?column? 
----------
 t
(1 row)

postgres=# SELECT '.foo.' like 'f_o' COLLATE ign_punct;
 ?column? 
----------
 t
(1 row)

postgres=# SELECT '.foo.' like '_oo' COLLATE ign_punct;
 ?column? 
----------
 f
(1 row)

The first two results look fine, but the third one is inconsistent: if 'f_o' matches because the ignored punctuation can be absorbed around the fixed parts, then '_oo' should match the same way, with '_' consuming the 'f'.

Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
Re: BitmapHeapScan streaming read user and prelim refactoring
> On 26 Apr 2024, at 15:04, Melanie Plageman wrote:
> If this seems correct to you, are you okay with the rest of the fix
> and test? We could close this open item once the patch is acceptable.

From reading the discussion and the patch, this seems like the right fix to me. Does the test added here aptly cover 04e72ed617be in terms of its functionality?

-- 
Daniel Gustafsson
Re: CREATE TABLE/ProcessUtility hook behavior change
On Tue, Apr 30, 2024 at 4:35 PM David Steele wrote:
>
> On 4/30/24 12:57, jian he wrote:
> > On Tue, Apr 30, 2024 at 10:26 AM David Steele wrote:
> >>
> >> Since bb766cde cannot be readily applied to older commits in master I'm
> >> unable to continue bisecting to find the ALTER TABLE behavioral change.
> >>
> >> This seems to leave me with manual code inspection and there have been a
> >> lot of changes in this area, so I'm hoping somebody will know why this
> >> ALTER TABLE change happened before I start digging into it.

I just tested these two commits.

https://git.postgresql.org/cgit/postgresql.git/commit/?id=3da13a6257bc08b1d402c83feb2a35450f988365
https://git.postgresql.org/cgit/postgresql.git/commit/?id=b0e96f311985bceba79825214f8e43f65afa653a

I think it's related to the catalog not-null commit. It will alter the table and add a NOT NULL constraint internally (equivalent to `alter table test alter id set not null`).
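To make that concrete, a hypothetical sketch (the table name is invented) of what a ProcessUtility hook or event trigger might now observe:

    -- one statement from the user:
    CREATE TABLE test (id int PRIMARY KEY);
    -- ...but since the catalog not-null work, the backend may internally
    -- also run the equivalent of:
    --   ALTER TABLE test ALTER id SET NOT NULL;
    -- which would be the extra command the hook sees.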
Re: partitioning and identity column
On Sun, Apr 28, 2024 at 12:29 PM Alexander Lakhin wrote:
> 27.04.2024 18:00, Alexander Lakhin wrote:
> >
> > Please look also at another script, which produces the same error:
>
> I've discovered yet another problematic case:
> CREATE TABLE tbl1 (a int GENERATED ALWAYS AS IDENTITY, b text)
>   PARTITION BY LIST (a);
> CREATE TABLE tbl2 (b text, a int NOT NULL);
> ALTER TABLE tbl1 ATTACH PARTITION tbl2 DEFAULT;
>
> INSERT INTO tbl2 DEFAULT VALUES;
> ERROR:  no owned sequence found
>
> Though it works with tbl2(a int NOT NULL, b text)...
> Take a look at this too, please.

Thanks Alexander for the report. PFA a patch which fixes all three problems.

I had not fixed getIdentitySequence() to fetch the identity sequence associated with the partition because I thought it would be better to fail with an error when it's not used correctly. But these bugs show that 1. the error is misleading and unwanted, and 2. there are more places where adding that logic to getIdentitySequence() makes sense. The function is fixed in these patches. Now callers like transformAlterTableStmt have to be careful not to call the function on a partition. I have examined all the callers of getIdentitySequence() and they seem to be fine. The code related to SetIdentity and DropIdentity is not called for partitions; errors for those are thrown elsewhere earlier.

-- 
Best Wishes,
Ashutosh Bapat

From bce2e8d573040901b18c0c9f6884f0fd44f50724 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat
Date: Tue, 30 Apr 2024 15:32:34 +0530
Subject: [PATCH] Fix assorted bugs related to identity column in partitioned tables

When changing the data type of a column of a partitioned table, craft
the ALTER SEQUENCE command only once. Partitions do not have identity
sequences of their own and thus do not need an ALTER SEQUENCE command
for each partition.

Fix getIdentitySequence() to fetch the identity sequence associated
with the top level partitioned table when the Relation of a partition
is passed to it. While doing so, translate the attribute number of the
partition into the attribute number of the partitioned table.

Author: Ashutosh Bapat
Discussion: https://www.postgresql.org/message-id/3b8a9dc1-bbc7-0ef5-6863-c432afac7...@gmail.com
---
 src/backend/catalog/pg_depend.c        | 31 ++-
 src/backend/commands/tablecmds.c       |  2 +-
 src/backend/parser/parse_utilcmd.c     | 52 +++---
 src/backend/rewrite/rewriteHandler.c   | 19 +-
 src/include/catalog/dependency.h       |  2 +-
 src/test/regress/expected/identity.out | 32 +++-
 src/test/regress/sql/identity.sql      | 11 +-
 7 files changed, 101 insertions(+), 48 deletions(-)

diff --git a/src/backend/catalog/pg_depend.c b/src/backend/catalog/pg_depend.c
index f85a898de8..5366f7820c 100644
--- a/src/backend/catalog/pg_depend.c
+++ b/src/backend/catalog/pg_depend.c
@@ -23,10 +23,12 @@
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_depend.h"
 #include "catalog/pg_extension.h"
+#include "catalog/partition.h"
 #include "commands/extension.h"
 #include "miscadmin.h"
 #include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
+#include "utils/syscache.h"
 #include "utils/rel.h"
@@ -941,10 +943,35 @@ getOwnedSequences(Oid relid)
 * Get owned identity sequence, error if not exactly one.
 */
 Oid
-getIdentitySequence(Oid relid, AttrNumber attnum, bool missing_ok)
+getIdentitySequence(Relation rel, AttrNumber attnum, bool missing_ok)
 {
-	List	   *seqlist = getOwnedSequences_internal(relid, attnum, DEPENDENCY_INTERNAL);
+	Oid			relid;
+	List	   *seqlist;
+
+	/*
+	 * The identity sequence is associated with the topmost partitioned table,
+	 * which might have column order different than the given partition.
+	 */
+	if (RelationGetForm(rel)->relispartition)
+	{
+		List	   *ancestors =
+			get_partition_ancestors(RelationGetRelid(rel));
+		HeapTuple	ctup = SearchSysCacheAttNum(RelationGetRelid(rel), attnum);
+		const char *attname = NameStr(((Form_pg_attribute) GETSTRUCT(ctup))->attname);
+		HeapTuple	ptup;
+
+		relid = llast_oid(ancestors);
+		ptup = SearchSysCacheAttName(relid, attname);
+		attnum = ((Form_pg_attribute) GETSTRUCT(ptup))->attnum;
+
+		ReleaseSysCache(ctup);
+		ReleaseSysCache(ptup);
+		list_free(ancestors);
+	}
+	else
+		relid = RelationGetRelid(rel);
+
+	seqlist = getOwnedSequences_internal(relid, attnum, DEPENDENCY_INTERNAL);
 	if (list_length(seqlist) > 1)
 		elog(ERROR, "more than one owned sequence found");
 	else if (seqlist == NIL)
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3556240c8e..9d32b2c495 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8516,7 +8516,7 @@ ATExecDropIdentity(Relation rel, const char *colName, bool missing_ok, LOCKMODE
 	if (!recursing)
 	{
 		/* drop the internal sequence */
-		seqid = getIdentitySequence(RelationGetRelid(rel), attnum, false);
+		seqid = getIdentitySequence(rel, attnum, false);
 		deleteDependencyRecordsForClass(RelationRelat
Re: pg_input_error_info doc 2 exampled crammed together
On Sun, Apr 28, 2024 at 10:07:49PM -0700, David G. Johnston wrote:
> Agreed. The column names are self-explanatory if you’ve seen errors
> before. The values are immaterial. Plus we don’t generally use
> psql-specific features in our examples.

Okay, I've just cleaned that up a bit with f6ab942f5de0.
-- 
Michael
Re: Removing unneeded self joins
On Tue, Apr 30, 2024 at 9:00 AM Alexander Lakhin wrote:
> 23.10.2023 12:47, Alexander Korotkov wrote:
> > I think this patch makes substantial improvement to query planning.
> > It has received plenty of reviews. The code is currently in quite
> > good shape. I didn't manage to find the cases when this optimization
> > causes significant overhead to planning time. Even if such cases will
> > be spotted there is a GUC option to disable this feature. So, I'll
> > push this if there are no objections.
>
> I've discovered another failure, introduced by d3d55ce57.
> Please try the following:
> CREATE TABLE t (a int unique, b float);
> SELECT * FROM t NATURAL JOIN LATERAL
>   (SELECT * FROM t t2 TABLESAMPLE SYSTEM (t.b)) t2;

I think we should just forbid SJE in the case where the relations to be merged have cross-references with lateral vars. The draft patch for this is attached. I'd like to ask Alexander to test it, and Richard and Andrei to review it. Thank you!

-- 
Regards,
Alexander Korotkov

sje_skip_cross_lateral_vars.patch
Description: Binary data
Re: Direct SSL connection with ALPN and HBA rules
> On 29 Apr 2024, at 21:06, Heikki Linnakangas wrote:
> Oh I was not aware sslrootcert=system works like that. That's a bit
> surprising, none of the other ssl-related settings imply or require that
> SSL is actually used. Did we intend to set a precedent for new settings
> with that?

It was very much intentional, and documented: an sslmode other than verify-full makes little sense when combined with sslrootcert=system. It wasn't intended to set a precedent (though there is probably a fair bit we could do; getting this right is hard enough as it is), rather it was footgun prevention.

-- 
Daniel Gustafsson
Re: using extended statistics to improve join estimates
Hello Justin,

Thanks for showing interest in this!

> On Sun, Apr 28, 2024 at 10:07:01AM +0800, Andy Fan wrote:
>> 's/estimiatedcluases/estimatedclauses/' typo error in the
>> commit message is not fixed since I have to regenerate all the commits
>
> Maybe you know this, but some of these patches need to be squashed.
> Regenerating the patches to address feedback is the usual process.
> When they're not squashed, it makes it hard to review the content of the
> patches.

You might have overlooked the fact that each individual commit exists just to make communication effective (easy to review), and all of them will be merged into one commit at the end of the review process. Even so, if something makes review hard, I am happy to regenerate the patches; but does 's/estimiatedcluases/estimatedclauses/' belong in that category? I'm pretty sure that is not the only typo or inappropriate word, and if we need to regenerate the 22 patches because of that, we would have to regenerate them pretty often. Would you mind providing more feedback at once so I can merge it all in one modification, or do you think the typo has blocked the review process?

> For example:
> [PATCH v1 18/22] Fix error "unexpected system attribute" when join with
> system attr
> ..adds .sql regression tests, but the expected .out isn't updated until
> [PATCH v1 19/22] Fix the incorrect comment on extended stats.
>
> That fixes an elog() in Tomas' original commit, so it should probably be
> 002 or 003.

Which elog are you talking about?

> It might make sense to keep the first commit separate for
> now, since it's nice to keep Tomas' original patch "pristine" to make
> more apparent the changes you're proposing.

This is my goal as well; did you find anything I did that breaks this rule? That's absolutely not my intention.

> Another:
> [PATCH v1 20/22] Add fastpath when combine the 2 MCV like eqjoinsel_inner.
> ..doesn't compile without
> [PATCH v1 21/22] When mcv->ndimensions == list_length(clauses), handle it
> same as
>
> Your 022 patch fixes a typo in your 002 patch, which means that first
> one reads a patch with a typo, and then later, a 10 line long patch
> reflowing the comment with a typo fixed.

I would be willing to regenerate the 22 patches if the typo makes the review process hard, but I'm not willing to do that often.

> A good guideline is that each patch should be self-contained, compiling
> and passing tests. Which is more difficult with a long stack of
> patches.

I agree.

-- 
Best Regards
Andy Fan
Re: [PATCH] Fix bug when calling strncmp in check_authmethod_valid
> On 30 Apr 2024, at 04:41, Jingxian Li wrote:
> Attached is a patch that fixes a bug when calling the strncmp function, in
> which case the third argument (authmethod - strchr(authmethod, ' '))
> may be negative, which is not as expected.

The calculation is indeed incorrect, but the lack of complaints about it being broken made me wonder why this exists in the first place. This dates back to e7029b212755, just shy of 2 decades old, which added --auth with support for strings with auth-options to ident and pam, like --auth 'pam ' and 'ident sameuser'. Support for options to ident was removed in 01c1a12a5bb4, but options to pam are still supported (although not documented); they were AFAICT broken in commit 8a02339e9ba3 some 12 years ago with this strncmp().

-	if (strncmp(authmethod, *p, (authmethod - strchr(authmethod, ' '))) == 0)
+	if (strncmp(authmethod, *p, (strchr(authmethod, ' ') - authmethod)) == 0)

This will compare "pam postgresql" with "pam" and not "pam ", so the length should be "(strchr(authmethod, ' ') - authmethod + 1)" since "pam " is a method separate from "pam" in auth_methods_{host|local}. We don't want to allow "md5 " as that's not a method in the array of valid methods.

But, since it's been broken in all supported versions of postgres and has AFAICT never been documented to exist, should we fix it or just remove it? We don't support auth-options for any other methods, like clientcert to cert for example. If we fix it we should also document that it works IMHO.

-- 
Daniel Gustafsson
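For reference, a small standalone C sketch of the arithmetic (the strings are example values, not initdb code):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        const char *authmethod = "pam postgresql";

        /* the broken expression subtracts in the wrong order: -3 here */
        long broken = authmethod - strchr(authmethod, ' ');
        /* the corrected order gives 3, the length of "pam" */
        long fixed = strchr(authmethod, ' ') - authmethod;

        printf("broken=%ld fixed=%ld\n", broken, fixed);

        /* a negative length converts to a huge size_t, so strncmp()
         * effectively compares the full strings; with the corrected
         * length, the input matches "pam", while length + 1 would be
         * needed to select the separate "pam " entry instead */
        printf("%d\n", strncmp(authmethod, "pam", fixed) == 0);      /* 1 */
        printf("%d\n", strncmp(authmethod, "pam ", fixed + 1) == 0); /* 1 */
        return 0;
    }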
Re: Removing unneeded self joins
Hi, Alexander!

On Tue, Apr 30, 2024 at 9:00 AM Alexander Lakhin wrote:
> 23.10.2023 12:47, Alexander Korotkov wrote:
> > I think this patch makes substantial improvement to query planning.
> > It has received plenty of reviews. The code is currently in quite
> > good shape. I didn't manage to find the cases when this optimization
> > causes significant overhead to planning time. Even if such cases will
> > be spotted there is a GUC option to disable this feature. So, I'll
> > push this if there are no objections.
>
> I've discovered another failure, introduced by d3d55ce57.
> Please try the following:
> CREATE TABLE t (a int unique, b float);
> SELECT * FROM t NATURAL JOIN LATERAL
>   (SELECT * FROM t t2 TABLESAMPLE SYSTEM (t.b)) t2;
>
> With asserts enabled, it triggers
> TRAP: failed Assert("!bms_is_member(rti, lateral_relids)"), File: "initsplan.c", Line: 697, PID: 3074054
> ExceptionalCondition at assert.c:52:13
> create_lateral_join_info at initsplan.c:700:8
> query_planner at planmain.c:257:2
> grouping_planner at planner.c:1523:17
> subquery_planner at planner.c:1098:2
> standard_planner at planner.c:415:9
> planner at planner.c:282:12
> pg_plan_query at postgres.c:904:9
> pg_plan_queries at postgres.c:996:11
> exec_simple_query at postgres.c:1193:19
> PostgresMain at postgres.c:4684:27
>
> With no asserts, I get:
> ERROR:  failed to construct the join relation
>
> Please take a look at this.

I'm looking into this, thank you!

-- 
Regards,
Alexander Korotkov
Re: speed up a logical replica setup
On Tue, Apr 30, 2024 at 12:04 PM Amit Kapila wrote:
>
> On Mon, Apr 29, 2024 at 5:28 PM Amit Kapila wrote:
> >
> > On Mon, Apr 29, 2024 at 5:23 PM Euler Taveira wrote:
> >
> > I was trying to test this utility when 'sync_replication_slots' is on
> > and it gets into an ERROR loop [1] and never finishes. Please find the
> > postgresql.auto used on the standby attached. I think if the standby
> > has enabled sync_slots, you need to pass dbname in
> > GenerateRecoveryConfig().

The other possibility is that when we start the standby from pg_createsubscriber, we specifically set 'sync_replication_slots' to false.

> > I couldn't test it further but I wonder if there are already synced
> > slots on the standby (either due to 'sync_replication_slots' or users
> > have used pg_sync_replication_slots() before invoking
> > pg_createsubscriber), those would be retained as is on the new
> > subscriber and lead to unnecessary WAL retention and dead rows.

This still needs some handling.

BTW, I don't see the use of the following messages in --dry-run mode, or rather they could be misleading:

pg_createsubscriber: hint: If pg_createsubscriber fails after this point, you must recreate the physical replica before continuing.
...
...
pg_createsubscriber: setting the replication progress (node name "pg_0" ; LSN 0/0) on database "postgres"

Similarly, we should think about whether the messages below are useful in --dry-run mode:

pg_createsubscriber: dropping publication "pg_createsubscriber_5_887f7991" on database "postgres"
pg_createsubscriber: creating subscription "pg_createsubscriber_5_887f7991" on database "postgres"
...
pg_createsubscriber: enabling subscription "pg_createsubscriber_5_887f7991" on database "postgres"

-- 
With Regards,
Amit Kapila.
Re: CREATE TABLE/ProcessUtility hook behavior change
On 4/30/24 12:57, jian he wrote:
> On Tue, Apr 30, 2024 at 10:26 AM David Steele wrote:
>> Since bb766cde cannot be readily applied to older commits in master I'm
>> unable to continue bisecting to find the ALTER TABLE behavioral change.
>>
>> This seems to leave me with manual code inspection and there have been a
>> lot of changes in this area, so I'm hoping somebody will know why this
>> ALTER TABLE change happened before I start digging into it.
>
> probably this commit:
> https://git.postgresql.org/cgit/postgresql.git/commit/?id=0cd711271d42b0888d36f8eda50e1092c2fed4b3

Hmm, I don't think so, since the problem already existed in bb766cde, which was committed on Apr 8 vs Apr 19 for the above patch.

Probably I'll need to figure out which exact part of bb766cde fixes the event trigger issue so I can backpatch just that part and continue bisecting.

Regards,
-David
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Tue, Apr 30, 2024 at 12:37 PM Tom Lane wrote:
>
> Masahiko Sawada writes:
> > On Tue, Apr 23, 2024 at 11:57 PM Tom Lane wrote:
> >> Reject as not a bug. Discourage people from thinking that physical
> >> replication will work across architectures.
>
> > While cross-arch physical replication is not supported, I think having
> > architecture dependent differences is not good and it's legitimate to
> > fix it. FYI the 'char' data type comparison is done as though char is
> > unsigned. I've attached a small patch to fix it. What do you think?
>
> I think this will break existing indexes that are working fine.
> Yeah, it would have been better to avoid the difference, but
> it's too late now.

True. So it will be a PG18 item.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
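For anyone following along, a minimal standalone C sketch of the underlying portability difference (the byte value is an arbitrary example):

    #include <stdio.h>

    int main(void)
    {
        char a = (char) 0xe9;   /* a high-bit byte, e.g. Latin-1 e-acute */
        char b = 'e';

        /* On x86, plain char is signed, so a is -23 and a < b holds;
         * on typical ARM/POWER ABIs char is unsigned, so a is 233 and
         * a > b.  An ordering built from such comparisons therefore
         * differs across architectures. */
        printf("a < b: %d\n", a < b);

        /* the arch-independent behavior: compare as unsigned explicitly */
        printf("unsigned: %d\n", (unsigned char) a < (unsigned char) b);
        return 0;
    }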