Re: Assert name/short_desc to prevent SHOW ALL segfault
Michael Paquier writes:
> And I've learnt today that we enforce a definition of __has_attribute
> at the top of c.h, and that we already rely on that.  So I agree that
> what you are doing in 0002 should be enough.  Should we wait until 16~
> opens for business though?  I don't see a strong argument to push
> forward with that now that we are in beta mode on HEAD.

Agreed.  This part isn't a bug fix.

			regards, tom lane
Re: Assert name/short_desc to prevent SHOW ALL segfault
On Fri, May 27, 2022 at 10:43:17AM -0700, Nathan Bossart wrote:
> Makes sense.  Here's a new patch set.  0001 is the part intended for
> back-patching, and 0002 is the rest (i.e., adding pg_attribute_nonnull()).
> I switched to using __has_attribute to discover whether nonnull was

Okay, I have looked at 0001 this morning and applied it down to 12.  The
change in GetConfigOptionByNum() is not required in 10 and 11, as the
strings of pg_show_all_settings() have begun to be translated in 12~.

> supported, as that seemed cleaner.  I didn't see any need for a new
> configure check, but maybe I am missing something.

And I've learnt today that we enforce a definition of __has_attribute
at the top of c.h, and that we already rely on that.  So I agree that
what you are doing in 0002 should be enough.  Should we wait until 16~
opens for business though?  I don't see a strong argument to push
forward with that now that we are in beta mode on HEAD.
--
Michael
Re: SLRUs in the main buffer pool, redux
On Fri, May 27, 2022 at 11:24 PM Thomas Munro wrote:
> Rebased, debugged and fleshed out a tiny bit more, but still with
> plenty of TODO notes and questions.  I will talk about this idea at
> PGCon, so I figured it'd help to have a patch that actually applies,
> even if it doesn't work quite right yet.  It's quite a large patch but
> that's partly because it removes a lot of lines...

FWIW, here are my PGCon slides about this:
https://speakerdeck.com/macdice/improving-the-slru-subsystem

There was a little bit of discussion on #pgcon-stream2 which I could
summarise as: can we figure out a way to keep parts of the CLOG pinned so
that backends don't have to do that for each lookup?  Then CLOG checks
become simple reads.  There may be some relation to the idea of "nailing"
btree root pages that I've heard of from a couple of people now (with
ProcSignalBarrier or something more fine-grained along those lines if you
need to unnail anything).  Something to think about.

I'm also wondering if it would be possible to do "optimistic" pinning
instead for reads that normally need only a pin, using some kind of
counter scheme with read barriers to tell you if the page might have been
evicted after you read the data...
Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)
Em sex., 27 de mai. de 2022 às 18:22, Andres Freund escreveu:
> Hi,
>
> On 2022-05-27 10:35:08 -0300, Ranier Vilela wrote:
> > Em qui., 26 de mai. de 2022 às 22:30, Tomas Vondra <
> > tomas.von...@enterprisedb.com> escreveu:
> >
> > > On 5/27/22 02:11, Ranier Vilela wrote:
> > > >
> > > > ...
> > > >
> > > > Here the results with -T 60:
> > >
> > > Might be a good idea to share your analysis / interpretation of the
> > > results, not just the raw data.  After all, the change is being
> > > proposed by you, so do you think this shows the change is beneficial?
> > >
> > I think so, but the expectation has diminished.
> > I expected that the more connections, the better the performance.
> > And for both patch and head, this doesn't happen in tests.
> > Performance degrades with a greater number of connections.
>
> Your system has four CPUs.  Once they're all busy, adding more
> connections won't improve performance.  It'll just add more and more
> context switching, cache misses, and make the OS scheduler do more work.

conns	tps head
10	82365.634750
50	74593.714180
80	69219.756038
90	67419.574189
100	66613.771701

Yes, it is quite disappointing that with 100 connections, tps loses to
10 connections.

> > GetSnapshotData() isn't a bottleneck?
>
> I'd be surprised if it showed up in a profile on your machine with that
> workload in any sort of meaningful way.  The snapshot reuse logic will
> always work - because there are no writes - and thus the only work that
> needs to be done is to acquire the ProcArrayLock briefly.  And because
> there is only a small number of cores, contention on the cacheline for
> that isn't a problem.

Thanks for sharing this.

> > > These results look much saner, but IMHO it also does not show any
> > > clear benefit of the patch.  Or are you still claiming there is a
> > > benefit?
> > >
> > We agree that they are micro-optimizations.  However, I think they
> > should be considered micro-optimizations in inner loops, because all
> > in procarray.c is a hotpath.
>
> As explained earlier, I don't agree that they optimize anything - you're
> making some of the scalability behaviour *worse*, if it's changed at all.
>
> > The first objective, I believe, was achieved, with no performance
> > regression.
> > I agree, the gains are small, by the tests done.
>
> There are no gains.

IMHO, I must disagree.

> > But, IMHO, this is a good way, small gains turn into big gains in the
> > end, when applied to all code.
> >
> > Consider GetSnapshotData()
> > 1. Most of the time the snapshot is not null, so:
> > if (snapshot == NULL), will fail most of the time.
> >
> > With the patch:
> > if (snapshot->xip != NULL)
> > {
> >     if (GetSnapshotDataReuse(snapshot))
> >         return snapshot;
> > }
> >
> > Most of the time the test is true and GetSnapshotDataReuse is not
> > called for new snapshots.
> > count, subcount and suboverflowed, will not be initialized, for all
> > snapshots.
>
> But that's irrelevant.  There's only a few "new" snapshots in the life
> of a connection.  You're optimizing something irrelevant.

IMHO, when GetSnapshotData() is the bottleneck, all is relevant.

> > 2. If snapshot is taken during recovery
> > The pgprocnos and ProcGlobal->subxidStates are not touched
> > unnecessarily.
>
> That code isn't reached when in recovery?

Currently it is reached *even* when not in recovery.
With the patch, it is reached *only* when in recovery.

> > 3. Calling GetSnapshotDataReuse() without first acquiring
> > ProcArrayLock.
> > There's an agreement that this would be fine, for now.
>
> There's no such agreement at all.  It's not correct.

Ok, but there is a chance it will work correctly.

> > Consider ComputeXidHorizons()
> > 1. ProcGlobal->statusFlags is touched before the lock.
>
> Hard to believe that'd have a measurable effect.

IMHO, anything you take out of the lock is a benefit.

> > 2. allStatusFlags[index] is not touched for all numProcs.
>
> I'd be surprised if the compiler couldn't defer that load on its own.

Better be sure of that, no?

regards,
Ranier Vilela
Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)
Em sex., 27 de mai. de 2022 às 18:08, Andres Freund escreveu:
> Hi,
>
> On 2022-05-27 03:30:46 +0200, Tomas Vondra wrote:
> > On 5/27/22 02:11, Ranier Vilela wrote:
> > > ./pgbench -M prepared -c $conns -j $conns -T 60 -S -n -U postgres
> > >
> > > pgbench (15beta1)
> > > transaction type:
> > > scaling factor: 1
> > > query mode: prepared
> > > number of clients: 100
> > > number of threads: 100
> > > maximum number of tries: 1
> > > duration: 60 s
> > >
> > > conns	tps head	tps patched
> > >
> > > 1	17126.326108	17792.414234
> > > 10	82068.123383	82468.334836
> > > 50	73808.731404	74678.839428
> > > 80	73290.191713	73116.553986
> > > 90	67558.483043	68384.906949
> > > 100	65960.982801	66997.793777
> > > 200	62216.011998	62870.243385
> > > 300	62924.225658	62796.157548
> > > 400	62278.099704	63129.555135
> > > 500	63257.930870	62188.825044
> > > 600	61479.890611	61517.913967
> > > 700	61139.354053	61327.898847
> > > 800	60833.663791	61517.913967
> > > 900	61305.129642	61248.336593
> > > 1000	60990.918719	61041.670996
> > >
> >
> > These results look much saner, but IMHO it also does not show any clear
> > benefit of the patch.  Or are you still claiming there is a benefit?
>
> They don't look all that sane to me - isn't that way lower than one would
> expect?

Yes, quite disappointing.

> Restricting both client and server to the same four cores, a
> thermically challenged older laptop I have around I get 150k tps at both
> 10 and 100 clients.

And you can share the benchmark details?
Hardware, postgres and pgbench, please?

> Either way, I'd not expect to see any GetSnapshotData() scalability
> effects to show up on an "Intel® Core™ i5-8250U CPU Quad Core" - there's
> just not enough concurrency.

This means that our customers will not see any connections scalability
with PG15, using the simplest hardware?

> The correct pieces of these changes seem very unlikely to affect
> GetSnapshotData() performance meaningfully.
>
> To improve something like GetSnapshotData() you first have to come up
> with a workload that shows it being a meaningful part of a profile.
> Unless it is, performance differences are going to just be due to
> various forms of noise.

Actually, in the profiles I got with perf, GetSnapshotData() didn't show up.

regards,
Ranier Vilela
RE: [PATCH] Support % wildcard in extension upgrade filenames
> At PostGIS we've been thinking of ways to have better support, from
> PostgreSQL proper, for our upgrade strategy, which has always consisted
> in a SINGLE upgrade file good for upgrading from any older version.
>
> The current lack of such support for EXTENSIONs forced us to install a
> lot of files on disk and we'd like to stop doing that at some point in
> the future.
>
> The proposal is to support wildcards for versions encoded in the
> filename so that (for our case) a single wildcard could match "any
> version".  I've been thinking about the '%' character for that, to
> resemble the wildcard used for LIKE.
>
> Here's the proposal:
> https://lists.osgeo.org/pipermail/postgis-devel/2022-February/029500.html
>
> A very very short (and untested) patch which might (or might not)
> support our case is attached.
>
> The only problem with my proposal/patch would be the presence, in the
> wild, of PostgreSQL EXTENSIONs actually using the '%' character in their
> version strings, which is currently considered legit by PostgreSQL.
>
> How do you feel about the proposal (which is wider than the patch)?
>
> --strk;
>
>   Libre GIS consultant/developer
>   https://strk.kbt.io/services.html

[Regina Obe]
Just a heads up about the above: Sandro has added it as a commitfest item,
which hopefully we can polish in time for PG16.
https://commitfest.postgresql.org/38/3654/

Does anyone think this is such a horrible idea that we should abandon all
hope?

The main impetus is that many extensions (postgis, pgRouting, and I'm sure
others) have over 300 extension script files that are exactly the same.
We'd like to reduce this footprint significantly.

strk said the patch is crappy, so don't look at it just yet.  We'll work
on polishing it.  I'll review and provide docs for it.

Thanks,
Regina
Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)
Hi,

On 2022-05-27 10:35:08 -0300, Ranier Vilela wrote:
> Em qui., 26 de mai. de 2022 às 22:30, Tomas Vondra <
> tomas.von...@enterprisedb.com> escreveu:
>
> > On 5/27/22 02:11, Ranier Vilela wrote:
> > >
> > > ...
> > >
> > > Here the results with -T 60:
> >
> > Might be a good idea to share your analysis / interpretation of the
> > results, not just the raw data.  After all, the change is being
> > proposed by you, so do you think this shows the change is beneficial?
> >
> I think so, but the expectation has diminished.
> I expected that the more connections, the better the performance.
> And for both patch and head, this doesn't happen in tests.
> Performance degrades with a greater number of connections.

Your system has four CPUs.  Once they're all busy, adding more connections
won't improve performance.  It'll just add more and more context
switching, cache misses, and make the OS scheduler do more work.

> GetSnapshotData() isn't a bottleneck?

I'd be surprised if it showed up in a profile on your machine with that
workload in any sort of meaningful way.  The snapshot reuse logic will
always work - because there are no writes - and thus the only work that
needs to be done is to acquire the ProcArrayLock briefly.  And because
there is only a small number of cores, contention on the cacheline for
that isn't a problem.

> > These results look much saner, but IMHO it also does not show any
> > clear benefit of the patch.  Or are you still claiming there is a
> > benefit?
> >
> We agree that they are micro-optimizations.  However, I think they
> should be considered micro-optimizations in inner loops, because all in
> procarray.c is a hotpath.

As explained earlier, I don't agree that they optimize anything - you're
making some of the scalability behaviour *worse*, if it's changed at all.

> The first objective, I believe, was achieved, with no performance
> regression.
> I agree, the gains are small, by the tests done.

There are no gains.

> But, IMHO, this is a good way, small gains turn into big gains in the
> end, when applied to all code.
>
> Consider GetSnapshotData()
> 1. Most of the time the snapshot is not null, so:
> if (snapshot == NULL), will fail most of the time.
>
> With the patch:
> if (snapshot->xip != NULL)
> {
>     if (GetSnapshotDataReuse(snapshot))
>         return snapshot;
> }
>
> Most of the time the test is true and GetSnapshotDataReuse is not called
> for new snapshots.
> count, subcount and suboverflowed, will not be initialized, for all
> snapshots.

But that's irrelevant.  There's only a few "new" snapshots in the life of
a connection.  You're optimizing something irrelevant.

> 2. If snapshot is taken during recovery
> The pgprocnos and ProcGlobal->subxidStates are not touched unnecessarily.

That code isn't reached when in recovery?

> 3. Calling GetSnapshotDataReuse() without first acquiring ProcArrayLock.
> There's an agreement that this would be fine, for now.

There's no such agreement at all.  It's not correct.

> Consider ComputeXidHorizons()
> 1. ProcGlobal->statusFlags is touched before the lock.

Hard to believe that'd have a measurable effect.

> 2. allStatusFlags[index] is not touched for all numProcs.

I'd be surprised if the compiler couldn't defer that load on its own.

Greetings,

Andres Freund
Ignore heap rewrites for materialized views in logical replication
Hi, While investigating an internal report, I concluded that it is a bug. The reproducible test case is simple (check 0002) and it consists of a FOR ALL TABLES publication and a non-empty materialized view on publisher. After the setup, if you refresh the MV, you got the following message on the subscriber: ERROR: logical replication target relation "public.pg_temp_N" does not exist That's because the commit 1a499c2520 (that fixes the heap rewrite for tables) forgot to consider that materialized views can also create transient heaps and they should also be skipped. The affected version is only 10 because 11 contains a different solution (commit 325f2ec555) that provides a proper fix for the heap rewrite handling in logical decoding. 0001 is a patch to skip MV too. I attached 0002 to demonstrate the issue but it doesn't seem appropriate to be included. The test was written to detect the error and bail out. After this fix, it takes a considerable amount of time to finish the test because it waits for a message that never arrives. Since nobody reports this bug in 5 years and considering that version 10 will be EOL in 6 months, I don't think an additional test is crucial here. -- Euler Taveira EDB https://www.enterprisedb.com/ From 652efe45665d91f2f4ae865dba078fcaffdc0a17 Mon Sep 17 00:00:00 2001 From: Euler Taveira Date: Fri, 27 May 2022 11:35:27 -0300 Subject: [PATCH v1 1/2] Ignore heap rewrites for materialized views in logical replication If you have a publication that specifies FOR ALL TABLES clause, a REFRESH MATERIALIZED VIEW can break your setup with the following message ERROR: logical replication target relation "public.pg_temp_NNN" does not exist Commit 1a499c2520 introduces a heuristic to prevent that transient heaps (due to table rewrites) are included for a FOR ALL TABLES publication. However, it forgot to exclude materialized views (heap rewrites occurs when you execute REFRESH MATERIALIZED VIEW). 
Since materialized views are not supported in logical decoding, it is
appropriate to filter them too.

As explained in the commit message, a more robust solution was included in
version 11 (commit 325f2ec555), hence only version 10 is affected.
---
 src/backend/replication/pgoutput/pgoutput.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index f96fde3df0..e03ff4a11e 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -540,7 +540,9 @@ get_rel_sync_entry(PGOutputData *data, Oid relid)
 		if (sscanf(relname, "pg_temp_%u%n", &u, &n) == 1 &&
 			relname[n] == '\0')
 		{
-			if (get_rel_relkind(u) == RELKIND_RELATION)
+			char		relkind = get_rel_relkind(u);
+
+			if (relkind == RELKIND_RELATION || relkind == RELKIND_MATVIEW)
 				break;
 		}
 	}
--
2.30.2

From 7ba28dea6800359ae631c1a132f8f6c6d418548b Mon Sep 17 00:00:00 2001
From: Euler Taveira
Date: Fri, 27 May 2022 14:43:10 -0300
Subject: [PATCH v1 2/2] test: heap rewrite for materialized view in logical
 replication

---
 src/test/subscription/t/006_rewrite.pl | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/src/test/subscription/t/006_rewrite.pl b/src/test/subscription/t/006_rewrite.pl
index 5e3211aefa..e5e0fdabe7 100644
--- a/src/test/subscription/t/006_rewrite.pl
+++ b/src/test/subscription/t/006_rewrite.pl
@@ -3,7 +3,8 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 2;
+use Time::HiRes qw(usleep);
+use Test::More tests => 3;

 sub wait_for_caught_up
 {
@@ -26,6 +27,13 @@ my $ddl = "CREATE TABLE test1 (a int, b text);";
 $node_publisher->safe_psql('postgres', $ddl);
 $node_subscriber->safe_psql('postgres', $ddl);

+$ddl = "CREATE TABLE test2 (a int, b text);";
+$node_publisher->safe_psql('postgres', $ddl);
+$node_subscriber->safe_psql('postgres', $ddl);
+
+$node_publisher->safe_psql('postgres', q{INSERT INTO test2 (a, b) VALUES (10,
'ten'), (20, 'twenty');}); +$node_publisher->safe_psql('postgres', 'CREATE MATERIALIZED VIEW test3 AS SELECT a, b FROM test2;'); + my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres'; my $appname = 'encoding_test'; @@ -69,5 +77,21 @@ is($node_subscriber->safe_psql('postgres', q{SELECT a, b, c FROM test1}), 3|three|33), 'data replicated to subscriber'); +$node_publisher->safe_psql('postgres', 'REFRESH MATERIALIZED VIEW test3;'); + +my $max_attempts = 2 * $TestLib::timeout_default; +my $attempts = 0; +my $errmsg = qr/logical replication target relation.*does not exist/; +{ + while ($attempts < $max_attempts) + { + last if (slurp_file($node_subscriber->logfile) =~ $errmsg); + + usleep(100_000); + $attempts++; + } +} +is($attempts, $max_attempts, 'heap rewrite for materialized view on subscriber'); + $node_subscriber->stop; $node_publisher->stop; -- 2.30.2
Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)
Hi, On 2022-05-27 03:30:46 +0200, Tomas Vondra wrote: > On 5/27/22 02:11, Ranier Vilela wrote: > > ./pgbench -M prepared -c $conns -j $conns -T 60 -S -n -U postgres > > > > pgbench (15beta1) > > transaction type: > > scaling factor: 1 > > query mode: prepared > > number of clients: 100 > > number of threads: 100 > > maximum number of tries: 1 > > duration: 60 s > > > > conns tps head tps patched > > > > 1 17126.326108 17792.414234 > > 10 82068.123383 82468.334836 > > 50 73808.731404 74678.839428 > > 80 73290.191713 73116.553986 > > 90 67558.483043 68384.906949 > > 100 65960.982801 66997.793777 > > 200 62216.011998 62870.243385 > > 300 62924.225658 62796.157548 > > 400 62278.099704 63129.555135 > > 500 63257.930870 62188.825044 > > 600 61479.890611 61517.913967 > > 700 61139.354053 61327.898847 > > 800 60833.663791 61517.913967 > > 900 61305.129642 61248.336593 > > 1000 60990.918719 61041.670996 > > > > These results look much saner, but IMHO it also does not show any clear > benefit of the patch. Or are you still claiming there is a benefit? They don't look all that sane to me - isn't that way lower than one would expect? Restricting both client and server to the same four cores, a thermically challenged older laptop I have around I get 150k tps at both 10 and 100 clients. Either way, I'd not expect to see any GetSnapshotData() scalability effects to show up on an "Intel® Core™ i5-8250U CPU Quad Core" - there's just not enough concurrency. The correct pieces of these changes seem very unlikely to affect GetSnapshotData() performance meaningfully. To improve something like GetSnapshotData() you first have to come up with a workload that shows it being a meaningful part of a profile. Unless it is, performance differences are going to just be due to various forms of noise. Greetings, Andres Freund
Re: Bump MIN_WINNT to 0x0600 (Vista) as minimal runtime in 16~
On Fri, May 27, 2022 at 3:53 PM Michael Paquier wrote: > Windows 8 ends its support > in 2023, it seems, so that sounds short even for PG16. I guess you meant 8.1 here, and corresponding server release 2012 R2. These will come to the end of their "extended" support phase in 2023, before PG16 comes out. If I understand correctly (and I'm not a Windows user, I just googled this), they will start showing blue full-screen danger-Will-Robinson alerts about viruses and malware. Why would we have explicit support for that in a new release? Do we want people putting their users' data in such a system? Can you go to GDPR jail for that in Europe? (Joking, I think). We should go full Marie Kondo on EOL'd OSes that are not in our CI or build farm, IMHO.
Re: "ERROR: latch already owned" on gharial
Robert Haas writes: > On Fri, May 27, 2022 at 10:21 AM Tom Lane wrote: >> What I'd suggest is to promote that failure to elog(PANIC), which >> would at least give us the PID and if we're lucky a stack trace. > That proposed change is fine with me. > As to the question of whether it's a real bug, nobody can prove > anything unless we actually run it down. Agreed, and I'll even grant your point that if it is an HPUX-specific or IA64-specific bug, it is not worth spending huge amounts of time to isolate. The problem is that we don't know that. What we do know so far is that if it can occur elsewhere, it's rare --- so we'd better be prepared to glean as much info as possible if we do get such a failure. Hence my thought of s/ERROR/PANIC/. And I'd be in favor of any other low-effort change we can make to instrument the case better. regards, tom lane
Re: generic plans and "initial" pruning
On Fri, May 27, 2022 at 1:10 AM Amit Langote wrote: > On Mon, Apr 11, 2022 at 12:53 PM Zhihong Yu wrote: > > On Sun, Apr 10, 2022 at 8:05 PM Amit Langote > wrote: > >> Sending v15 that fixes that to keep the cfbot green for now. > > > > Hi, > > > > + /* RT index of the partitione table. */ > > > > partitione -> partitioned > > Thanks, fixed. > > Also, I broke this into patches: > > 0001 contains the mechanical changes of moving PartitionPruneInfo out > of Append/MergeAppend into a list in PlannedStmt. > > 0002 is the main patch to "Optimize AcquireExecutorLocks() by locking > only unpruned partitions". > > -- > Thanks, Amit Langote > EDB: http://www.enterprisedb.com Hi, In the description: is made available to the actual execution via PartitionPruneResult, made available along with the PlannedStmt by the I think the second `made available` is redundant (can be omitted). + * Initial pruning is performed here if needed (unless it has already been done + * by ExecDoInitialPruning()), and in that case only the surviving subplans' I wonder if there is a typo above - I don't find ExecDoInitialPruning either in PG codebase or in the patches (except for this in the comment). I think it should be ExecutorDoInitialPruning. +* bit from it just above to prevent empty tail bits resulting in I searched in the code base but didn't find mentioning of `empty tail bit`. Do you mind explaining a bit about it ? Cheers
Re: "ERROR: latch already owned" on gharial
On Fri, May 27, 2022 at 10:21 AM Tom Lane wrote: > That's possible, certainly. It's also possible that it's a real bug > that so far has only manifested there for (say) timing reasons. > The buildfarm is not so large that we can write off single-machine > failures as being unlikely to hit in the real world. > > What I'd suggest is to promote that failure to elog(PANIC), which > would at least give us the PID and if we're lucky a stack trace. That proposed change is fine with me. As to the question of whether it's a real bug, nobody can prove anything unless we actually run it down. It's just a question of what you think the odds are. Noah's PGCon talk a few years back on the long tail of buildfarm failures convinced me (perhaps unintentionally) that low-probability failures that occur only on obscure systems or configurations are likely not worth running down, because while they COULD be real bugs, a lot of them aren't, and the time it would take to figure it out could be spent on other things - for instance, fixing things that we know for certain are bugs. Spending 40 hours of person-time on something with a 10% chance of being a bug in the PostgreSQL code doesn't necessarily make sense to me, because while you are correct that the buildfarm isn't that large, neither is the developer community. -- Robert Haas EDB: http://www.enterprisedb.com
Re: suboverflowed subtransactions concurrency performance optimize
On Fri, May 27, 2022 at 11:59 AM Andres Freund wrote: > On 2022-05-27 11:48:45 -0700, Peter Geoghegan wrote: > > I find it hard to believe that there wasn't even a cursory effort at > > performance validation before this was committed, but that's what it > > looks like. > > Yea. Imo this pretty clearly should be reverted. It has correctness issues, > testing issues and we don't know whether it does anything useful. It should definitely be reverted. -- Peter Geoghegan
Re: Allowing REINDEX to have an optional name
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed

Hello

The patch applies and tests fine, and I think this patch has good
intentions: to prevent the default behavior of REINDEX DATABASE from
causing a deadlock.  However, I am not in favor of simply omitting the
database name after the DATABASE clause, for consistency reasons.  Almost
all other queries involving a DATABASE clause require the database name to
follow it, for example ALTER DATABASE [dbname].  Being able to omit the
database name for REINDEX DATABASE seems inconsistent to me.

The documentation states that REINDEX DATABASE only works on the current
database, but it still requires the user to provide a database name and
requires that it match the current database.  Not a very useful option, is
it?  But it is still required from the user to stay consistent with other
DATABASE clauses.

Maybe the best way is to keep the query clause as is (with the database
name still required) and simply not let it reindex the system catalogs, to
prevent the deadlock.  At the end, give the user a notification that the
system catalogs have not been reindexed, and tell them to use REINDEX
SYSTEM instead.

Cary Huang
HighGo Software Canada
www.highgo.ca
Re: suboverflowed subtransactions concurrency performance optimize
On 2022-05-27 11:48:45 -0700, Peter Geoghegan wrote: > On Fri, May 27, 2022 at 8:55 AM Andres Freund wrote: > > > Anyway, how about if we clear this cache for subtrans whenever > > > TransactionXmin is advanced and cachedFetchSubXid precedes it? The > > > comments atop SubTransGetTopmostTransaction seem to state that we > > > don't care about the exact topmost parent when the intermediate one > > > precedes TransactionXmin. I think it should preserve the optimization > > > because anyway for such cases there is a fast path in > > > SubTransGetTopmostTransaction. > > > > There's not even a proof this does speed up anything useful! There's not a > > single benchmark for the patch. > > I find it hard to believe that there wasn't even a cursory effort at > performance validation before this was committed, but that's what it > looks like. Yea. Imo this pretty clearly should be reverted. It has correctness issues, testing issues and we don't know whether it does anything useful.
Re: suboverflowed subtransactions concurrency performance optimize
On Fri, May 27, 2022 at 8:55 AM Andres Freund wrote: > > Anyway, how about if we clear this cache for subtrans whenever > > TransactionXmin is advanced and cachedFetchSubXid precedes it? The > > comments atop SubTransGetTopmostTransaction seem to state that we > > don't care about the exact topmost parent when the intermediate one > > precedes TransactionXmin. I think it should preserve the optimization > > because anyway for such cases there is a fast path in > > SubTransGetTopmostTransaction. > > There's not even a proof this does speed up anything useful! There's not a > single benchmark for the patch. I find it hard to believe that there wasn't even a cursory effort at performance validation before this was committed, but that's what it looks like. -- Peter Geoghegan
Re: Assert name/short_desc to prevent SHOW ALL segfault
On Thu, May 26, 2022 at 11:01:44PM -0400, Tom Lane wrote: > Michael Paquier writes: >> FWIW, I would be fine to backpatch the NULL handling for short_desc, >> while treating the addition of nonnull as a HEAD-only change. > > Yeah, sounds about right to me. My guess is that we will need > a configure check for nonnull, but perhaps I'm wrong. Makes sense. Here's a new patch set. 0001 is the part intended for back-patching, and 0002 is the rest (i.e., adding pg_attribute_nonnull()). I switched to using __has_attribute to discover whether nonnull was supported, as that seemed cleaner. I didn't see any need for a new configure check, but maybe I am missing something. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From c2760bf3db292fb55b6448f22eaeee38656083b5 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Fri, 27 May 2022 09:48:51 -0700 Subject: [PATCH v4 1/2] Properly handle NULL short descriptions for custom variables. If a NULL short description is specified in one of the DefineCustomXXXVariable functions, SHOW ALL will segfault. This change teaches SHOW ALL to properly handle NULL short descriptions. Back-patch to all supported versions. 
Reported by: Steve Chavez Author: Steve Chavez Reviewed by: Nathan Bossart, Michael Paquier, Andred Freund, Tom Lane Discussion: https://postgr.es/m/CAGRrpzY6hO-Kmykna_XvsTv8P2DshGiU6G3j8yGao4mk0CqjHA%40mail.gmail.com --- src/backend/utils/misc/guc.c | 16 +--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 8e9b71375c..55d41ae7d6 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -9780,7 +9780,16 @@ ShowAllGUCConfig(DestReceiver *dest) isnull[1] = true; } - values[2] = PointerGetDatum(cstring_to_text(conf->short_desc)); + if (conf->short_desc) + { + values[2] = PointerGetDatum(cstring_to_text(conf->short_desc)); + isnull[2] = false; + } + else + { + values[2] = PointerGetDatum(NULL); + isnull[2] = true; + } /* send it to dest */ do_tup_output(tstate, values, isnull); @@ -9792,7 +9801,8 @@ ShowAllGUCConfig(DestReceiver *dest) pfree(setting); pfree(DatumGetPointer(values[1])); } - pfree(DatumGetPointer(values[2])); + if (conf->short_desc) + pfree(DatumGetPointer(values[2])); } end_tup_output(tstate); @@ -10002,7 +10012,7 @@ GetConfigOptionByNum(int varnum, const char **values, bool *noshow) values[3] = _(config_group_names[conf->group]); /* short_desc */ - values[4] = _(conf->short_desc); + values[4] = conf->short_desc != NULL ? _(conf->short_desc) : NULL; /* extra_desc */ values[5] = conf->long_desc != NULL ? _(conf->long_desc) : NULL; -- 2.25.1 >From bd05e7092f45f416476cc02a21f8d0bb069a313f Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Fri, 27 May 2022 10:10:09 -0700 Subject: [PATCH v4 2/2] Introduce pg_attribute_nonnull() and use it for DefineCustomXXXVariable(). pg_attribute_nonnull() can be used to generate compiler warnings when a function is called with the specified arguments set to NULL. An empty argument list indicates that all pointer arguments must not be NULL. 
pg_attribute_nonnull() only works for compilers that support the nonnull function attribute. If nonnull is not supported, pg_attribute_nonnull() has no effect. In addition to introducing pg_attribute_nonnull(), this change uses it for the DefineCustomXXXVariable() functions to generate warnings when the "name" and "valueAddr" arguments are set to NULL. Idea by: Andres Freund Author: Nathan Bossart Reviewed by: Michael Paquier, Tom Lane Discussion: https://postgr.es/m/20220525061739.ur7x535vtzyzkmqo%40alap3.anarazel.de --- src/include/c.h | 11 +++ src/include/utils/guc.h | 10 +- 2 files changed, 16 insertions(+), 5 deletions(-) diff --git a/src/include/c.h b/src/include/c.h index 4f16e589b3..0c64557c62 100644 --- a/src/include/c.h +++ b/src/include/c.h @@ -144,6 +144,17 @@ #define pg_attribute_no_sanitize_alignment() #endif +/* + * pg_attribute_nonnull means the compiler should warn if the function is called + * with the listed arguments set to NULL. If no arguments are listed, the + * compiler should warn if any pointer arguments are set to NULL. + */ +#if __has_attribute (nonnull) +#define pg_attribute_nonnull(...) __attribute__((nonnull(__VA_ARGS__))) +#else +#define pg_attribute_nonnull(...) +#endif + /* * Append PG_USED_FOR_ASSERTS_ONLY to definitions of variables that are only * used in assert-enabled builds, to avoid compiler warnings about unused diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h index efcbad7842..be691c5e9c 100644 --- a/src/include/utils/guc.h +++ b/src/include/utils/guc.h @@ -303,7 +303,7 @@ extern void DefineCustomBoolVariable(const char *name, int flags, GucBoolCheckHook check_hook, GucBoolAssignHook assign_hook, - GucShowHook show_hook); +
Re: postgres and initdb not working inside docker
Only in an ideal world are all standards observed... Docker has different
standards inside.

$ ls -l /home/neo/
drwxr-xr-x 2 pgsql pgsql 8192 May 27 10:37 data
drwxr-sr-x 2 pgsql pgsql 4096 May 24 09:35 data2

/home/pgsql/data - mounted volume. Therefore, the permissions have changed
to drwxr-xr-x:

$ mkdir /home/pgsql/data/pgtest
$ ls -l /home/pgsql/data
drwxr-xr-x 2 pgsql pgsql 0 May 27 11:08 pgtest
$ chmod 700 /home/pgsql/data/pgtest
$ ls -l /home/pgsql/data
drwxr-xr-x 2 pgsql pgsql 0 May 27 11:08 pgtest

Oops... If it's a regular "data2" folder and there is no "read_only: true"
flag for the container:

$ mkdir /home/pgsql/data2/pgtest
$ chmod 700 /home/pgsql/data2/pgtest
$ ls -l /home/pgsql/data2
drwx------ 2 pgsql pgsql 4096 May 27 11:19 pgtest

Roffild writes:
> postgres and initdb not working inside docker. chmod 755 always for a
> mounted volume inside docker.

This patch will never be accepted. You don't need it if you take the
standard advice[1] that the Postgres data directory should not itself be a
mount point. Instead, make a subdirectory in the mounted volume, and that
can have the ownership and permissions that the server expects.

regards, tom lane

[1] https://www.postgresql.org/message-id/12168.1312921709%40sss.pgh.pa.us
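[Editorial note: Tom's advice maps directly onto container configuration: mount the volume one level above the data directory and point PGDATA at a subdirectory, so the directory initdb creates keeps its 0700 mode. A hypothetical compose sketch; the image tag, volume name, and paths are examples, not from this thread:]

```yaml
services:
  db:
    image: postgres:15
    environment:
      PGDATA: /var/lib/postgresql/data   # a subdirectory, not the mount point
    volumes:
      - pgdata_vol:/var/lib/postgresql   # the volume is mounted at the parent
volumes:
  pgdata_vol:
```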
Re: Support logical replication of DDLs
On Fri, May 27, 2022 at 7:19 AM Zheng Li wrote: > > Hi Masahiko, > > > Thank you for updating the patches! > > > > I've not looked at these patches in-depth yet but with this approach, > > what do you think we can handle the DDL syntax differences between > > major versions? DDL syntax or behavior could be changed by future > > changes and I think we need to somehow deal with the differences. For > > > example, if the user uses logical replication for major version > > upgrade, the publisher is older than the subscriber. We might have to > > rewrite the DDL before applying to the subscriber because the DDL > > executed on the publisher no longer work on a new PostgreSQL version > > I don't think we will allow this kind of situation to happen in the > first place for > backward compatibility. It seems like a big limitation to me. > If a DDL no longer works on a new version of > PostgreSQL, the user will have to change the application code as well. > So even if it happens for > whatever reason, we could either > 1. fail the apply worker and let the user fix such DDL because they'll > have to fix the application code anyway when this happens. Once the apply worker received the DDL, if the DDL doesn't work on the subscriber, it will enter an infinite loop until the problem is fixed. If the failure is due to a syntax error, how does the user fix it? > 2. add guard rail logic in the apply worker to automatically fix such > DDL if possible, knowing the version of the source and target. Similar > logic must have been implemented for pg_dump/restore/upgrade. If I'm not missing something, there is no such implementation in pg_dump/restore/upgrade. When we use pg_dump/pg_restore for major version upgrades, we usually use the newer version pg_dump to fetch objects from the older version server, then restore the objects by using the newer version pg_restore. > > > or we might have to add some options to the DDL before the application > > in order to keep the same behavior. 
This seems to require a different > > solution from what the patch does for the problem you mentioned such > > > as "DDL involving multiple tables where only some tables are > > replicated”. > > First of all, this case can only happen when the customer chooses to > only replicate a subset of the tables in a database in which case > table level DDL replication is chosen instead of database level DDL > replication (where all tables > and DDLs are replicated). I think the solution would be: > 1. make best effort to detect such DDLs on the publisher and avoid > logging of such DDLs in table level DDL replication. I think it's better to support this case. > 2. apply worker will fail to replay such command due to missing > objects if such DDLs didn't get filtered on the publisher for some > reason. This should be rare and I think it's OK even if it happens, > we'll find out > why and fix it. I'm not sure it's rare since replicating a subset of tables is a common use case of logical replication. But even if we want to go this way I think we should consider how to fix it at this stage, otherwise we will end up redesigning it. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: selectivity function
On Thu, May 26, 2022 at 3:10 PM Tom Lane wrote:
> Can you do anything useful with attaching selectivity estimates to the
> functions it references, instead?

I may have been going down a bad path before. The function I'm working to
improve has five arguments, the last being "degrees", which is the match
radius. Obviously a larger match radius should cause more matches. For a
small value of a match radius (0.005 degrees):

q3c_test=# explain (analyze, buffers) select * from test as a, test1 as b where q3c_join(a.ra,a.dec,b.ra,b.dec,.005);
                                   QUERY PLAN
 Nested Loop  (cost=92.28..22787968818.00 rows=5 width=32) (actual time=7.799..10758.566 rows=31 loops=1)
   Buffers: shared hit=8005684
   ->  Seq Scan on test a  (cost=0.00..15406.00 rows=100 width=16) (actual time=0.008..215.570 rows=100 loops=1)
         Buffers: shared hit=5406
   ->  Bitmap Heap Scan on test1 b  (cost=92.28..22785.45 rows=250 width=16) (actual time=0.009..0.009 rows=0 loops=100)

(note: I deleted some of the output, since I think I'm keeping the
important bits)

So, the cost of the query is calculated as 2e10; it expects five rows,
found 31, with a hot cache reading 8 million units of disk space (I'd have
to check the fine manual to remind myself of the units of that).
When I do the same sort of query on a much larger match radius (5 deg) I get:

q3c_test=# explain (analyze, buffers) select * from test as a, test1 as b where q3c_join(a.ra,a.dec,b.ra,b.dec,5);
                                   QUERY PLAN
 Nested Loop  (cost=92.28..22787968818.00 rows=4766288 width=32) (actual time=0.086..254995.691 rows=38051626 loops=1)
   Buffers: shared hit=104977026
   ->  Seq Scan on test a  (cost=0.00..15406.00 rows=100 width=16) (actual time=0.008..261.425 rows=100 loops=1)
         Buffers: shared hit=5406
   ->  Bitmap Heap Scan on test1 b  (cost=92.28..22785.45 rows=250 width=16) (actual time=0.053..0.247 rows=38 loops=100)

The "total cost" is the same identical 2e10. This time the number of rows
expected is 4.7 million and the number delivered is 38 million (so the
estimate is off by a factor of 8; I'm not sure that is important), but the
I/O is now 104 million units. So while we are doing a lot more I/O, and
dealing with a lot more rows, the calculated cost is identical. That seems
strange to me. Is that a normal thing? Is it possible that the cost
calculation isn't including the selectivity calculation?

Greg
Re: suboverflowed subtransactions concurrency performance optimize
Hi, On 2022-05-27 15:44:39 +0530, Amit Kapila wrote: > On Thu, May 26, 2022 at 12:53 PM Michael Paquier wrote: > > > > On Tue, May 24, 2022 at 04:52:50PM -0700, Andres Freund wrote: > > > > > 2) xid wraparound. There's nothing forcing > > > SubTransGetTopmostTransaction() to > > >be called regularly, so even if a backend isn't idle, the cache could > > > just > > >get more and more outdated until hitting wraparound > > > > Hence, you mean that the non-regularity of the call makes it more > > exposed to an inconsistent result after a wraparound? > > > > Won't in theory the similar cache in transam.c is also prone to > similar behavior? It's not quite the same risk, because there we are likely to actually hit the cache regularly. Whereas quite normal workloads might not hit this cache for days on end. > Anyway, how about if we clear this cache for subtrans whenever > TransactionXmin is advanced and cachedFetchSubXid precedes it? The > comments atop SubTransGetTopmostTransaction seem to state that we > don't care about the exact topmost parent when the intermediate one > precedes TransactionXmin. I think it should preserve the optimization > because anyway for such cases there is a fast path in > SubTransGetTopmostTransaction. There's not even a proof this does speed up anything useful! There's not a single benchmark for the patch. Greetings, Andres Freund
Re: PG15 beta1 sort performance regression due to Generation context change
Yura Sokolov writes:
> On Tue, 24/05/2022 at 17:39 -0700, Andres Freund wrote:
>> A variation on your patch would be to only store the offset to the block
>> header - that should always fit into 32bit (huge allocations being their own
>> block, which is why this wouldn't work for storing an offset to the
>> context).

> I'm +1 for this.

Given David's results in the preceding message, I don't think I am.
A scheme like this would add more arithmetic and at least one more
indirection to GetMemoryChunkContext(), and we already know that adding
even a test-and-branch there has measurable cost. (I wonder if using
unlikely() on the test would help? But it's not unlikely in a
generation-context-heavy use case.)

There would also be a good deal of complication and ensuing slowdown
created by the need for oversize chunks to be a completely different kind
of animal with a different header.

I'm also not very happy about this:

> And with this change every memory context kind can have the same header:

IMO that's a bug not a feature. It puts significant constraints on how
context types can be designed.

regards, tom lane
Re: "ERROR: latch already owned" on gharial
Robert Haas writes: > On Fri, May 27, 2022 at 7:55 AM Thomas Munro wrote: >> Thanks. Hmm. So far it's always a parallel worker. The best idea I >> have is to include the ID of the mystery PID in the error message and >> see if that provides a clue next time. > ... Even if we find a bug in PostgreSQL, > it's likely to be a bug that only matters on systems nobody cares > about. That's possible, certainly. It's also possible that it's a real bug that so far has only manifested there for (say) timing reasons. The buildfarm is not so large that we can write off single-machine failures as being unlikely to hit in the real world. What I'd suggest is to promote that failure to elog(PANIC), which would at least give us the PID and if we're lucky a stack trace. regards, tom lane
Re: "ERROR: latch already owned" on gharial
On Fri, May 27, 2022 at 7:55 AM Thomas Munro wrote: > Thanks. Hmm. So far it's always a parallel worker. The best idea I > have is to include the ID of the mystery PID in the error message and > see if that provides a clue next time. What I'm inclined to do is get gharial and anole removed from the buildfarm. anole was set up by Heikki in 2011. I don't know when gharial was set up, or by whom. I don't think anyone at EDB cares about these machines any more, or has any interest in maintaining them. I think the only reason they're still running is that, just by good fortune, they haven't fallen over and died yet. The hardest part of getting them taken out of the buildfarm is likely to be finding someone who has a working username and password to log into them and take the jobs out of the crontab. If someone really cares about figuring out what's going on here, it's probably possible to get someone who is an EDB employee access to the box to chase it down. But I'm having a hard time understanding what value we get out of that given that the machines are running an 11-year-old compiler version on discontinued hardware on a discontinued operating system. Even if we find a bug in PostgreSQL, it's likely to be a bug that only matters on systems nobody cares about. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)
On Thu, May 26, 2022 at 22:30, Tomas Vondra < tomas.von...@enterprisedb.com> wrote:
> On 5/27/22 02:11, Ranier Vilela wrote:
> >
> > ...
> >
> > Here the results with -T 60:
>
> Might be a good idea to share your analysis / interpretation of the
> results, not just the raw data. After all, the change is being proposed
> by you, so do you think this shows the change is beneficial?

I think so, but the expectation has diminished. I expected that the more
connections, the better the performance. And for both patch and head, this
doesn't happen in tests. Performance degrades with a greater number of
connections. GetSnapshotData() isn't a bottleneck?

> > Linux Ubuntu 64 bits
> > shared_buffers = 128MB
> >
> > ./pgbench -M prepared -c $conns -j $conns -T 60 -S -n -U postgres
> >
> > pgbench (15beta1)
> > transaction type:
> > scaling factor: 1
> > query mode: prepared
> > number of clients: 100
> > number of threads: 100
> > maximum number of tries: 1
> > duration: 60 s
> >
> > conns      tps head          tps patched
> >
> > 1          17126.326108      17792.414234
> > 10         82068.123383      82468.334836
> > 50         73808.731404      74678.839428
> > 80         73290.191713      73116.553986
> > 90         67558.483043      68384.906949
> > 100        65960.982801      66997.793777
> > 200        62216.011998      62870.243385
> > 300        62924.225658      62796.157548
> > 400        62278.099704      63129.555135
> > 500        63257.930870      62188.825044
> > 600        61479.890611      61517.913967
> > 700        61139.354053      61327.898847
> > 800        60833.663791      61517.913967
> > 900        61305.129642      61248.336593
> > 1000       60990.918719      61041.670996
>
> These results look much saner, but IMHO it also does not show any clear
> benefit of the patch. Or are you still claiming there is a benefit?

We agree that they are micro-optimizations. However, I think they should be
considered micro-optimizations in inner loops, because everything in
procarray.c is a hot path. The first objective, I believe, was achieved,
with no performance regression. I agree the gains are small, by the tests
done.
But, IMHO, this is a good way: small gains turn into big gains in the end,
when applied to all the code.

Consider GetSnapshotData():

1. Most of the time the snapshot is not null, so the test
if (snapshot == NULL) will fail most of the time. With the patch:

    if (snapshot->xip != NULL)
    {
        if (GetSnapshotDataReuse(snapshot))
            return snapshot;
    }

Most of the time the test is true and GetSnapshotDataReuse is not called
for new snapshots. count, subcount and suboverflowed will not be
initialized for all snapshots.

2. If the snapshot is taken during recovery: the pgprocnos and
ProcGlobal->subxidStates are not touched unnecessarily, only if it is not
suboverflowed; skipping all InvalidTransactionId, mypgxactoff, backends
doing logical decoding, and XIDs >= xmax.

3. Calling GetSnapshotDataReuse() without first acquiring ProcArrayLock.
There's an agreement that this would be fine, for now.

Consider ComputeXidHorizons():

1. ProcGlobal->statusFlags is touched before the lock.

2. allStatusFlags[index] is not touched for all numProcs.

All changes were made with the aim of avoiding or postponing unnecessary
work.

> BTW it's generally a good idea to do multiple runs and then use the
> average and/or median. Results from a single run may be quite noisy.
> > Linux Ubuntu 64 bits
> > shared_buffers = 2048MB
> >
> > ./pgbench -M prepared -c $conns -j $conns -S -n -U postgres
> >
> > pgbench (15beta1)
> > transaction type:
> > scaling factor: 1
> > query mode: prepared
> > number of clients: 100
> > number of threads: 100
> > maximum number of tries: 1
> > number of transactions per client: 10
> >
> > conns      tps head          tps patched
> >
> > 1          2918.004085       3211.303789
> > 10         12262.415696      15540.015540
> > 50         13656.724571      16701.182444
> > 80         14338.202348      16628.559551
> > 90         16597.510373      16835.016835
> > 100        17706.775793      16607.433487
> > 200        16877.067441      16426.969799
> > 300        16942.260775      16319.780662
> > 400        16794.514911      16155.023607
> > 500        16598.502151      16051.106724
> > 600        16717.935001      16007.171213
> > 700        16651.204834      16004.353184
> > 800        16467.546583      16834.591719
> > 900        16588.241149      16693.902459
> > 1000       16564.985265      16936.952195

I think we've agreed these results are useless.

> > Linux Ubuntu 64 bits
> > shared_buffers = 2048MB
> >
> > ./pgbench -M prepared -c $conns -j $conns -T 60 -S -n -U postgres
> >
> > pgbench (15beta1)
> > transaction type:
> > scaling factor: 1
> > query mode: prepared
> > number of clients: 100
> > number of threads: 100
> >
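[Editorial note: the advice to aggregate multiple runs is easy to act on; for example, the median of several tps readings can be computed with a one-liner. The numbers below are illustrative, not taken from the thread:]

```shell
# median of five hypothetical pgbench tps readings
printf '%s\n' 17126.3 17015.9 17244.1 16987.2 17101.4 |
  sort -n |
  awk '{ a[NR] = $1 }
       END { m = (NR % 2) ? a[(NR+1)/2] : (a[NR/2] + a[NR/2+1]) / 2; print m }'
```

The median is usually preferable to the mean here, since a single stalled run cannot drag it far.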
Re: PG15 beta1 sort performance regression due to Generation context change
On Tue, 24/05/2022 at 17:39 -0700, Andres Freund wrote:
> A variation on your patch would be to only store the offset to the block
> header - that should always fit into 32bit (huge allocations being their own
> block, which is why this wouldn't work for storing an offset to the
> context). With a bit of care that'd allow aset.c to halve its overhead, by
> using 4 bytes of space for all non-huge allocations. Of course, it'd increase
> the cost of pfree() of small allocations, because AllocSetFree() currently
> doesn't need to access the block for those. But I'd guess that'd be outweighed
> by the reduced memory usage.

I'm +1 for this.

And with this change every memory context kind can have the same header:

typedef struct MemoryChunk
{
#ifdef MEMORY_CONTEXT_CHECKING
	Size		requested_size;
#endif
	uint32		encoded_size;		/* encoded allocation size */
	uint32		offset_to_block;	/* backward offset to block header */
} MemoryChunk;

The allocated size can always be encoded into a uint32, since it is rounded
for large allocations (I believe large allocations are certainly rounded to
at least 4096 bytes):

encoded_size = size < (1u << 31) ? size : (1u << 31) | (size >> 12);
/* and the reverse: mask off the flag bit before shifting back */
size = (encoded_size >> 31) ? ((Size) (encoded_size & 0x7fffffff) << 12) : (Size) encoded_size;

There is a glitch with Aset, since it currently reuses the `aset` pointer
for the freelist link. With such a change this link would have to be
encoded in the chunk body itself instead of the header. I was confused by
this, since there are valgrind hooks, and I was not sure how to change
them (I'm not good at valgrind hooks). But after thinking more about it,
I believe it is doable.

regards

---

Yura Sokolov
Re: pg_upgrade test writes to source directory
On 26.05.22 22:52, Justin Pryzby wrote:
> On Fri, May 27, 2022 at 05:43:04AM +0900, Michael Paquier wrote:
>> On Thu, May 26, 2022 at 04:36:47PM +0200, Peter Eisentraut wrote:
>>> I chose TESTOUTDIR because it corresponds to the tmp_check directory,
>>> so that the output files of the pg_upgrade run are removed when the
>>> test artifacts are cleaned up. When using TESTDIR, the pg_upgrade
>>> output files end up in the build directory, which is less bad than the
>>> source directory, but still not ideal.
>>
>> Where does the choice of TESTOUTDIR come from? I am a bit surprised by
>> this choice, to be honest, because there is no trace of it in the
>> buildfarm client or the core code. TESTDIR, on the other hand, points
>> to tmp_check/ if not set.
>
> It gets set in vcregress.pl and Makefile.global.in.
>
> It looks like Peter is working on top of the meson branch. TESTOUTDIR is
> not yet in master.

Ooops, yeah. :) I think you can just chdir to
${PostgreSQL::Test::Utils::tmp_check}.
Re: enable/disable broken for statement triggers on partitioned tables
On Fri, May 27, 2022 at 5:11 PM Peter Eisentraut wrote: > On 24.05.22 23:23, Zhihong Yu wrote: > > Hi, > > > > AT_EnableTrig, /* ENABLE TRIGGER name */ > > + AT_EnableTrigRecurse, /* internal to commands/tablecmds.c */ > > AT_EnableAlwaysTrig,/* ENABLE ALWAYS TRIGGER name */ > > + AT_EnableAlwaysTrigRecurse, /* internal to commands/tablecmds.c */ > > > > Is it better to put the new enum's at the end of the AlterTableType? > > > > This way the numeric values for existing ones don't change. > > That's a concern if backpatching. Otherwise, it's better to put them > like shown in the patch. Agreed. -- Thanks, Amit Langote EDB: http://www.enterprisedb.com
Re: doc: CREATE FOREIGN TABLE .. PARTITION OF .. DEFAULT
On Fri, May 27, 2022 at 7:15 PM Etsuro Fujita wrote: > On Fri, May 27, 2022 at 1:58 AM Robert Haas wrote: > > Committed, except I adjusted the v11 version so that the CREATE > > FOREIGN TABLE documentation would match the CREATE TABLE documentation > > in that branch. > > I think we should fix the syntax synopsis in the Parameters section > of the CREATE FOREIGN TABLE reference page as well. Oops, good catch. > Attached is a patch for that. Thank you. I think we should also rewrite the description to match the CREATE TABLE's text, as in the attached updated patch. -- Thanks, Amit Langote EDB: http://www.enterprisedb.com Further-fix-CREATE-FOREIGN-TABLE-synopsis_v2.patch Description: Binary data
Re: allow building trusted languages without the untrusted versions
On 24.05.22 22:58, Nathan Bossart wrote: FWIW this was my original thinking. I can choose to build/install extensions separately, but when it comes to PL/Tcl and PL/Perl, you've got to build the trusted and untrusted stuff at the same time, and the untrusted symbols remain even if you remove the control file and installation scripts. Of course, this isn't a complete solution for removing the ability to do any sort of random file system access, though. This only makes sense to me if you install directly from the source tree to your production installation. Presumably, there is usually a packaging step in between. And you can decide at that point which files to install or not to install.
Re: Prevent writes on large objects in read-only transactions
On Fri, 2022-05-27 at 15:30 +0900, Yugo NAGATA wrote:
> Currently, lo_creat(e), lo_import, lo_unlink, lowrite, lo_put,
> and lo_from_bytea are allowed even in read-only transactions.
> By using them, pg_largeobject and pg_largeobject_metadata can
> be modified in read-only transactions, and the effect remains
> after the transaction finishes. That is unacceptable behaviour,
> isn't it?

+1

Yours,
Laurenz Albe
Re: "ERROR: latch already owned" on gharial
On Thu, May 26, 2022 at 2:35 PM Tom Lane wrote:
> Thomas Munro writes:
> > On a more practical note, I don't have access to the BF database right
> > now. Would you mind checking if "latch already owned" has occurred on
> > any other animals?
>
> Looking back 6 months, these are the only occurrences of that string
> in failed tests:
>
>  sysname | branch |      snapshot       |     stage      |                                l
> ---------+--------+---------------------+----------------+---------------------------------------------------------------
>  gharial | HEAD   | 2022-04-28 23:37:51 | Check          | 2022-04-28 18:36:26.981 MDT [22642:1] ERROR: latch already owned
>  gharial | HEAD   | 2022-05-06 11:33:11 | IsolationCheck | 2022-05-06 10:10:52.727 MDT [7366:1] ERROR: latch already owned
>  gharial | HEAD   | 2022-05-24 06:31:31 | IsolationCheck | 2022-05-24 02:44:51.850 MDT [13089:1] ERROR: latch already owned
> (3 rows)

Thanks. Hmm. So far it's always a parallel worker. The best idea I
have is to include the ID of the mystery PID in the error message and
see if that provides a clue next time.

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 78c6a89271..07b8273a7d 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -402,6 +402,8 @@ InitSharedLatch(Latch *latch)
 void
 OwnLatch(Latch *latch)
 {
+	pid_t		previous_owner;
+
 	/* Sanity checks */
 	Assert(latch->is_shared);
 
@@ -410,8 +412,11 @@ OwnLatch(Latch *latch)
 	Assert(selfpipe_readfd >= 0 && selfpipe_owner_pid == MyProcPid);
 #endif
 
-	if (latch->owner_pid != 0)
-		elog(ERROR, "latch already owned");
+	previous_owner = latch->owner_pid;
+	if (previous_owner != 0)
+		elog(ERROR,
+			 "latch already owned by PID %lu",
+			 (unsigned long) previous_owner);
 
 	latch->owner_pid = MyProcPid;
 }
Re: doc: CREATE FOREIGN TABLE .. PARTITION OF .. DEFAULT
On Fri, May 27, 2022 at 1:58 AM Robert Haas wrote: > Committed, except I adjusted the v11 version so that the CREATE > FOREIGN TABLE documentation would match the CREATE TABLE documentation > in that branch. I think we should fix the syntax synopsis in the Parameters section of the CREATE FOREIGN TABLE reference page as well. Attached is a patch for that. Sorry for being late for this. Best regards, Etsuro Fujita Further-fix-CREATE-FOREIGN-TABLE-synopsis.patch Description: Binary data
Re: suboverflowed subtransactions concurrency performance optimize
On Thu, May 26, 2022 at 12:53 PM Michael Paquier wrote: > > On Tue, May 24, 2022 at 04:52:50PM -0700, Andres Freund wrote: > > > 2) xid wraparound. There's nothing forcing SubTransGetTopmostTransaction() > > to > >be called regularly, so even if a backend isn't idle, the cache could > > just > >get more and more outdated until hitting wraparound > > Hence, you mean that the non-regularity of the call makes it more > exposed to an inconsistent result after a wraparound? > Won't in theory the similar cache in transam.c is also prone to similar behavior? Anyway, how about if we clear this cache for subtrans whenever TransactionXmin is advanced and cachedFetchSubXid precedes it? The comments atop SubTransGetTopmostTransaction seem to state that we don't care about the exact topmost parent when the intermediate one precedes TransactionXmin. I think it should preserve the optimization because anyway for such cases there is a fast path in SubTransGetTopmostTransaction. -- With Regards, Amit Kapila.
Re: Handle infinite recursion in logical replication setup
On Fri, May 27, 2022 at 5:04 PM shiy.f...@fujitsu.com wrote: > > On Wed, May 25, 2022 7:55 PM vignesh C wrote: > > > > The attached v16 patch has the changes for the same. > > > > Thanks for updating the patch. > > Some comments for the document in 0002 patch. > > 1. > + > +Lock the required tables in node1 and > +node2 till the setup is completed. > + > + > + > +Create a publication in node1: > + > +node1=# CREATE PUBLICATION pub_node1 FOR TABLE t1; > +CREATE PUBLICATION > + > > If the table is locked in the very beginning, we will not be able to create > the > publication (because the locks have conflict). Maybe we should switch the > order > of creating publication and locking tables here. > I agree. It seems some of the locking instructions in the earlier sections 31.11.1 - 31.11.4 are contradictory to the later "generic" steps given in "31.11.5. Generic steps to add a new node to the existing set of nodes". I'm assuming the generic steps are the "correct" steps e.g. generic steps say get the lock on new node tables AFTER the publication of new node. e.g. generic steps say do NOT get a table lock on the node one you are (first) joining to. ~~~ Furthermore, the generic steps are describing attaching a new N+1th node to some (1 ... N) other nodes. So I think in the TWO-node case (section 31.11.1) node2 should be treated as the "new" node that you are attaching to the "first" node1. IMO the section 31.11.1 example should be reversed so that it obeys the "generic" pattern. e.g. It should be doing the CREATE PUBLICATION pub_node2 first (since that is the "new" node) -- Kind Regards, Peter Smith. Fujitsu Australia
Remove useless tests about TRUNCATE on foreign table
Hello, I found that tests for TRUNCATE on foreign tables are left in the foreign_data regression test. Now TRUNCATE on foreign tables are allowed, so I think the tests should be removed. Currently, the results of the test is "ERROR: foreign-data wrapper "dummy" has no handler", but it is just because the foreign table has no handler, not due to TRUNCATE. The patch is attached. Regards, Yugo Nagata -- Yugo NAGATA diff --git a/src/test/regress/expected/foreign_data.out b/src/test/regress/expected/foreign_data.out index 5bf03680d2..33505352cc 100644 --- a/src/test/regress/expected/foreign_data.out +++ b/src/test/regress/expected/foreign_data.out @@ -1822,11 +1822,6 @@ Server: s0 FDW options: (delimiter ',', quote '"', "be quoted" 'value') Inherits: fd_pt1 --- TRUNCATE doesn't work on foreign tables, either directly or recursively -TRUNCATE ft2; -- ERROR -ERROR: foreign-data wrapper "dummy" has no handler -TRUNCATE fd_pt1; -- ERROR -ERROR: foreign-data wrapper "dummy" has no handler DROP TABLE fd_pt1 CASCADE; NOTICE: drop cascades to foreign table ft2 -- IMPORT FOREIGN SCHEMA @@ -2047,11 +2042,6 @@ ALTER TABLE fd_pt2 ATTACH PARTITION fd_pt2_1 FOR VALUES IN (1); -- ERROR ERROR: child table is missing constraint "fd_pt2chk1" ALTER FOREIGN TABLE fd_pt2_1 ADD CONSTRAINT fd_pt2chk1 CHECK (c1 > 0); ALTER TABLE fd_pt2 ATTACH PARTITION fd_pt2_1 FOR VALUES IN (1); --- TRUNCATE doesn't work on foreign tables, either directly or recursively -TRUNCATE fd_pt2_1; -- ERROR -ERROR: foreign-data wrapper "dummy" has no handler -TRUNCATE fd_pt2; -- ERROR -ERROR: foreign-data wrapper "dummy" has no handler DROP FOREIGN TABLE fd_pt2_1; DROP TABLE fd_pt2; -- foreign table cannot be part of partition tree made of temporary diff --git a/src/test/regress/sql/foreign_data.sql b/src/test/regress/sql/foreign_data.sql index 9dfff66f54..eefb860adc 100644 --- a/src/test/regress/sql/foreign_data.sql +++ b/src/test/regress/sql/foreign_data.sql @@ -746,10 +746,6 @@ ALTER TABLE fd_pt1 RENAME CONSTRAINT 
fd_pt1chk3 TO f2_check; \d+ fd_pt1 \d+ ft2 --- TRUNCATE doesn't work on foreign tables, either directly or recursively -TRUNCATE ft2; -- ERROR -TRUNCATE fd_pt1; -- ERROR - DROP TABLE fd_pt1 CASCADE; -- IMPORT FOREIGN SCHEMA @@ -833,10 +829,6 @@ ALTER TABLE fd_pt2 ATTACH PARTITION fd_pt2_1 FOR VALUES IN (1); -- ERROR ALTER FOREIGN TABLE fd_pt2_1 ADD CONSTRAINT fd_pt2chk1 CHECK (c1 > 0); ALTER TABLE fd_pt2 ATTACH PARTITION fd_pt2_1 FOR VALUES IN (1); --- TRUNCATE doesn't work on foreign tables, either directly or recursively -TRUNCATE fd_pt2_1; -- ERROR -TRUNCATE fd_pt2; -- ERROR - DROP FOREIGN TABLE fd_pt2_1; DROP TABLE fd_pt2;
Re: enable/disable broken for statement triggers on partitioned tables
On 24.05.22 23:23, Zhihong Yu wrote: Hi, AT_EnableTrig, /* ENABLE TRIGGER name */ + AT_EnableTrigRecurse, /* internal to commands/tablecmds.c */ AT_EnableAlwaysTrig, /* ENABLE ALWAYS TRIGGER name */ + AT_EnableAlwaysTrigRecurse, /* internal to commands/tablecmds.c */ Is it better to put the new enum's at the end of the AlterTableType? This way the numeric values for existing ones don't change. That's a concern if backpatching. Otherwise, it's better to put them like shown in the patch.
RE: bogus: logical replication rows/cols combinations
On Friday, May 27, 2022 1:54 PM Justin Pryzby wrote: > > On Fri, May 27, 2022 at 11:17:00AM +0530, Amit Kapila wrote: > > On Tue, May 24, 2022 at 11:03 AM houzj.f...@fujitsu.com > wrote: > > > > > > On Friday, May 20, 2022 11:06 AM Amit Kapila > wrote: > > > > > > Thanks for pointing it out. Here is the new version patch which add this > version check. > > > > I have added/edited a few comments and ran pgindent. The attached > > looks good to me. I'll push this early next week unless there are more > > comments/suggestions. > > A minor doc review. > Note that I also sent some doc comments at > 20220519120724.go19...@telsasoft.com. > > + lists among publications in which case ALTER > PUBLICATION > + command will be successful but later the WalSender in publisher > + or the > > COMMA in which > > remove "command" ? > > s/in publisher/on the publisher/ > > + Subscription having several publications in which the same table has been > + published with different column lists is not supported. > > Either "Subscriptions having .. are not supported"; or, "A subscription > having .. > is not supported". Thanks for the comments. Here is the new version patch set which fixes these. Best regards, Hou zj 0001-language-fixes-on-HEAD-from-Justin.patch Description: 0001-language-fixes-on-HEAD-from-Justin.patch v6-0001-Prohibit-combining-publications-with-different-co.patch Description: v6-0001-Prohibit-combining-publications-with-different-co.patch
RE: Handle infinite recursion in logical replication setup
On Wed, May 25, 2022 7:55 PM vignesh C wrote:
>
> The attached v16 patch has the changes for the same.
>

Thanks for updating the patch. Some comments on the documentation in the 0002 patch.

1.
+
+     Lock the required tables in node1 and
+     node2 till the setup is completed.
+
+
+     Create a publication in node1:
+
+ node1=# CREATE PUBLICATION pub_node1 FOR TABLE t1;
+ CREATE PUBLICATION
+

If the table is locked at the very beginning, we will not be able to create the publication (because the locks conflict). Maybe we should switch the order of creating the publication and locking the tables here.

2.
In the case of "Adding a new node when data is present in the new node", we need to truncate table t1 in node3, but the truncate operation would be blocked because the table was locked earlier. Maybe we need some changes for this.

Regards,
Shi yu
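Comment 1 amounts to reordering the quoted doc steps; a sketch of the suggested order (same names as the doc draft; the lock mode shown is an assumption for the example):

```sql
-- Suggested order: create the publication first, then take the lock,
-- so that CREATE PUBLICATION does not block on the already-held lock.
node1=# CREATE PUBLICATION pub_node1 FOR TABLE t1;
CREATE PUBLICATION
node1=# BEGIN;
node1=# LOCK TABLE t1 IN EXCLUSIVE MODE;  -- held until the setup is completed
```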
Re: [BUG] Panic due to incorrect missingContrecPtr after promotion
At Fri, 27 May 2022 02:01:27 +0000, "Imseih (AWS), Sami" wrote in
> After further research, we found the following.
>
> Testing on 13.6 with the attached patch we see
> that the missingContrecPtr is being incorrectly
> set on the standby and the promote in the tap
> test fails.
>
> Per the comments in xlog.c, the
> missingContrecPtr should not be set when
> in standby mode.
>
> 	/*
> 	 * When not in standby mode we find that WAL ends in an incomplete
> 	 * record, keep track of that record.  After recovery is done,
> 	 * we'll write a record to indicate downstream WAL readers that
> 	 * that portion is to be ignored.
> 	 */
> 	if (!StandbyMode &&
> 		!XLogRecPtrIsInvalid(xlogreader->abortedRecPtr))
> 	{
> 		abortedRecPtr = xlogreader->abortedRecPtr;
> 		missingContrecPtr = xlogreader->missingContrecPtr;
> 		elog(LOG, "missingContrecPtr == %ld", missingContrecPtr);
> 	}
>
> If StandbyModeRequested is checked instead, which
> checks for the presence of a standby signal file,
> the missingContrecPtr is not set on the
> standby and the test succeeds.
>
> 	if (!StandbyModeRequested &&
> 		!XLogRecPtrIsInvalid(xlogreader->abortedRecPtr))
> 	{
> 		abortedRecPtr = xlogreader->abortedRecPtr;
> 		missingContrecPtr = xlogreader->missingContrecPtr;
> 		elog(LOG, "missingContrecPtr == %ld", missingContrecPtr);
> 	}
>
> If this is a bug as it appears, it appears the original patch
> to resolve this issue is not needed and the ideal fix
> is to ensure that a standby does not set
> missingContrecPtr.
>
> Would like to see what others think.

Due to lack of information, I don't have a clear idea of what is happening here. If that change "fixed" the "issue", that seems to mean that crash recovery hit an immature contrecord (@6/A860), but recovery did not stop there and instead continued until 10/B1FA3D88. One possibility is that crash recovery ended at the immature contrecord, then the server moved to archive recovery mode and was able to continue from the same (aborted) LSN using the archived WAL.
This means that segment 6/A8 in pg_wal had somehow been removed after being archived, or that segment 6/A7 (not A8) in pg_wal and in the archive are from different histories, which seems to me to be exactly what XLOG_OVERWRITE_CONTRECORD was meant to prevent. Didn't you reuse the same archive content for repeated recovery testing?

Also, the fact that the server entered crash recovery means the cluster had no idea of how far it should recover to reach consistency; that information is contained in the backup label or control file. That can happen when a crashed primary is reused as-is as the new standby of the new primary.

So... I'd like to hear exactly what you did in this testing. When standby mode is requested and crash recovery fails on an immature contrecord, I think we shouldn't continue recovery. But I'm not sure we need to explicitly reject that case. Further study is needed.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Prevent writes on large objects in read-only transactions
Hello,

Currently, lo_creat(e), lo_import, lo_unlink, lowrite, lo_put, and lo_from_bytea are allowed even in read-only transactions. By using them, pg_largeobject and pg_largeobject_metadata can be modified in a read-only transaction, and the effect remains after the transaction finishes. This is unacceptable behaviour, isn't it?

Also, when such transactions are used in recovery mode, they fail, but the error messages are not user friendly, like:

postgres=# select lo_creat(42);
ERROR:  cannot assign OIDs during recovery

postgres=# select lo_create(42);
ERROR:  cannot assign TransactionIds during recovery

postgres=# select lo_unlink(16389);
ERROR:  cannot acquire lock mode AccessExclusiveLock on database objects while recovery is in progress
HINT:  Only RowExclusiveLock or less can be acquired on database objects during recovery.

So, I would like to propose explicitly preventing such write operations on large objects in read-only transactions, like:

postgres=# SELECT lo_create(42);
ERROR:  cannot execute lo_create in a read-only transaction

The patch is attached.

Regards,
Yugo Nagata

--
Yugo NAGATA

diff --git a/src/backend/libpq/be-fsstubs.c b/src/backend/libpq/be-fsstubs.c
index 5804532881..2ffdba53d6 100644
--- a/src/backend/libpq/be-fsstubs.c
+++ b/src/backend/libpq/be-fsstubs.c
@@ -245,6 +245,8 @@ be_lo_creat(PG_FUNCTION_ARGS)
 {
 	Oid			lobjId;
 
+	PreventCommandIfReadOnly("lo_creat");
+
 	lo_cleanup_needed = true;
 	lobjId = inv_create(InvalidOid);
@@ -256,6 +258,8 @@ be_lo_create(PG_FUNCTION_ARGS)
 {
 	Oid			lobjId = PG_GETARG_OID(0);
 
+	PreventCommandIfReadOnly("lo_create");
+
 	lo_cleanup_needed = true;
 	lobjId = inv_create(lobjId);
@@ -306,6 +310,8 @@ be_lo_unlink(PG_FUNCTION_ARGS)
 {
 	Oid			lobjId = PG_GETARG_OID(0);
 
+	PreventCommandIfReadOnly("lo_unlink");
+
 	/*
 	 * Must be owner of the large object.  It would be cleaner to check this
 	 * in inv_drop(), but we want to throw the error before not after closing
@@ -368,6 +374,8 @@ be_lowrite(PG_FUNCTION_ARGS)
 	int			bytestowrite;
 	int			totalwritten;
 
+	PreventCommandIfReadOnly("lowrite");
+
 	bytestowrite = VARSIZE_ANY_EXHDR(wbuf);
 	totalwritten = lo_write(fd, VARDATA_ANY(wbuf), bytestowrite);
 	PG_RETURN_INT32(totalwritten);
@@ -413,6 +421,8 @@ lo_import_internal(text *filename, Oid lobjOid)
 	LargeObjectDesc *lobj;
 	Oid			oid;
 
+	PreventCommandIfReadOnly("lo_import");
+
 	/*
 	 * open the file to be read in
 	 */
@@ -539,6 +549,8 @@ lo_truncate_internal(int32 fd, int64 len)
 {
 	LargeObjectDesc *lobj;
 
+	PreventCommandIfReadOnly("lo_truncate");
+
 	if (fd < 0 || fd >= cookies_size || cookies[fd] == NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_OBJECT),
@@ -815,6 +827,8 @@ be_lo_from_bytea(PG_FUNCTION_ARGS)
 	LargeObjectDesc *loDesc;
 	int			written PG_USED_FOR_ASSERTS_ONLY;
 
+	PreventCommandIfReadOnly("lo_from_bytea");
+
 	lo_cleanup_needed = true;
 	loOid = inv_create(loOid);
 	loDesc = inv_open(loOid, INV_WRITE, CurrentMemoryContext);
@@ -837,6 +851,8 @@ be_lo_put(PG_FUNCTION_ARGS)
 	LargeObjectDesc *loDesc;
 	int			written PG_USED_FOR_ASSERTS_ONLY;
 
+	PreventCommandIfReadOnly("lo_put");
+
 	lo_cleanup_needed = true;
 	loDesc = inv_open(loOid, INV_WRITE, CurrentMemoryContext);
diff --git a/src/test/regress/expected/largeobject.out b/src/test/regress/expected/largeobject.out
index 8e32de04e6..14344edfba 100644
--- a/src/test/regress/expected/largeobject.out
+++ b/src/test/regress/expected/largeobject.out
@@ -506,6 +506,39 @@ SELECT lo_create(2121);
 (1 row)
 
 COMMENT ON LARGE OBJECT 2121 IS 'testing comments';
+-- Test writes on large objects in read-only transactions
+START TRANSACTION READ ONLY;
+SELECT lo_create(42);
+ERROR:  cannot execute lo_create in a read-only transaction
+ROLLBACK;
+START TRANSACTION READ ONLY;
+SELECT lo_creat(42);
+ERROR:  cannot execute lo_creat in a read-only transaction
+ROLLBACK;
+START TRANSACTION READ ONLY;
+SELECT lo_unlink(42);
+ERROR:  cannot execute lo_unlink in a read-only transaction
+ROLLBACK;
+START TRANSACTION READ ONLY;
+SELECT lowrite(42, 'x');
+ERROR:  cannot execute lowrite in a read-only transaction
+ROLLBACK;
+START TRANSACTION READ ONLY;
+SELECT lo_import(:'filename');
+ERROR:  cannot execute lo_import in a read-only transaction
+ROLLBACK;
+START TRANSACTION READ ONLY;
+SELECT lo_truncate(42, 0);
+ERROR:  cannot execute lo_truncate in a read-only transaction
+ROLLBACK;
+START TRANSACTION READ ONLY;
+SELECT lo_from_bytea(0, 'x');
+ERROR:  cannot execute lo_from_bytea in a read-only transaction
+ROLLBACK;
+START TRANSACTION READ ONLY;
+SELECT lo_put(42, 0, 'x');
+ERROR:  cannot execute lo_put in a read-only transaction
+ROLLBACK;
 -- Clean up
 DROP TABLE lotest_stash_values;
 DROP ROLE regress_lo_user;
diff --git a/src/test/regress/sql/largeobject.sql b/src/test/regress/sql/largeobject.sql
index fd3518889e..8b1a2b70c5 100644
--- a/src/test/regress/sql/largeobject.sql
+++ b/src/test/regress/sql/largeobject.sql
@@ -271,6 +271,39 @@ SELECT
Re: doc phrase: "inheritance child"
On Wed, May 25, 2022 at 1:30 PM Ashutosh Bapat wrote:
>
> -       If true, the stats include inheritance child columns, not just the
> +       If true, the stats include child tables, not just the
>
> We are replacing columns with tables; is that intentional?
>
> Partitioned tables do not have their own stats, it's just aggregated
> partition stats.
> ...
> -       If true, the stats include inheritance child columns, not just the
> +       If true, the stats include child tables, not just the
>         values in the specified relation
>
> @@ -13152,7 +13152,7 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
>         inherited bool
>
> -       If true, this row includes inheritance child columns, not just the
> +       If true, this row includes child tables, not just the
>         values in the specified table
>
> Replacing inheritance child "column" with "tables", is that intentional?

I was a bit confused by these too, though perhaps the original text is not as clear as it could be? Would the following be a good rewrite:

    If true, the stats cover the contents not only of the specified table,
    but also of its child tables or partitions.  (If the table is
    partitioned, the parent itself contains no data, so the stats cover
    only the contents of its partitions.)

Although, maybe the parenthetical is unnecessary.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
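If it helps to ground the wording, here is a sketch of what the inherited flag distinguishes (hypothetical table names; assumes a recent PostgreSQL where ANALYZE on a partitioned table also updates the leaf partitions):

```sql
CREATE TABLE parent (x int) PARTITION BY RANGE (x);
CREATE TABLE part1 PARTITION OF parent FOR VALUES FROM (0) TO (100);
INSERT INTO part1 SELECT generate_series(0, 99);
ANALYZE parent;

-- For the partitioned parent, only inherited = true rows exist, since the
-- parent stores no rows of its own; for the leaf partition, the stats
-- describe the partition's own contents.
SELECT tablename, attname, inherited
FROM pg_stats
WHERE tablename IN ('parent', 'part1');
```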