Re: File truncation within PostgresNode::issues_sql_like() wrong on Windows
On Wed, Apr 14, 2021 at 09:26:19PM -0400, Andrew Dunstan wrote:
> Well, let me try it on fairywren tomorrow. Since we manage this on the
> buildfarm without any use at all of Win32API::File it might not be
> necessary in TAP code either, particularly if we're not truncating the file.

Thanks.  If that proves to not be necessary, +1 to remove this code.

I have been playing with this stuff, and the attached patch seems to
work properly on Windows.  On top of that, I have also tested the
non-Win32 path on an MSVC box to see that it was working, but my
environment is not really noisy usually with such compatibility issues.
--
Michael

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index e26b2b3f30..e209ea7163 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -1917,21 +1917,7 @@ sub connect_ok
         @log_unlike = @{ $params{log_unlike} };
     }
 
-    if (@log_like or @log_unlike)
-    {
-        # Don't let previous log entries match for this connection.
-        # On Windows, the truncation would not work, so rotate the log
-        # file before restarting the server afresh.
-        if ($TestLib::windows_os)
-        {
-            $self->rotate_logfile;
-            $self->restart;
-        }
-        else
-        {
-            truncate $self->logfile, 0;
-        }
-    }
+    my $log_location = -s $self->logfile;
 
     # Never prompt for a password, any callers of this routine should
     # have set up things properly, and this should not block.
@@ -1950,7 +1936,8 @@ sub connect_ok
     }
     if (@log_like or @log_unlike)
     {
-        my $log_contents = TestLib::slurp_file($self->logfile);
+        my $log_contents = TestLib::slurp_file($self->logfile,
+            $log_location);
 
         while (my $regex = shift @log_like)
         {
@@ -2001,21 +1988,7 @@ sub connect_fails
         @log_unlike = @{ $params{log_unlike} };
     }
 
-    if (@log_like or @log_unlike)
-    {
-        # Don't let previous log entries match for this connection.
-        # On Windows, the truncation would not work, so rotate the log
-        # file before restarting the server afresh.
-        if ($TestLib::windows_os)
-        {
-            $self->rotate_logfile;
-            $self->restart;
-        }
-        else
-        {
-            truncate $self->logfile, 0;
-        }
-    }
+    my $log_location = -s $self->logfile;
 
     # Never prompt for a password, any callers of this routine should
     # have set up things properly, and this should not block.
@@ -2034,7 +2007,8 @@ sub connect_fails
 
     if (@log_like or @log_unlike)
     {
-        my $log_contents = TestLib::slurp_file($self->logfile);
+        my $log_contents = TestLib::slurp_file($self->logfile,
+            $log_location);
 
         while (my $regex = shift @log_like)
         {
@@ -2194,9 +2168,6 @@ sub command_checks_all
 Run a command on the node, then verify that $expected_sql appears in
 the server log file.
 
-Reads the whole log file so be careful when working with large log outputs.
-The log file is truncated prior to running the command, however.
-
 =cut
 
 sub issues_sql_like
@@ -2207,10 +2178,11 @@ sub issues_sql_like
 
     local %ENV = $self->_get_env();
 
-    truncate $self->logfile, 0;
+    my $log_location = -s $self->logfile;
+
     my $result = TestLib::run_log($cmd);
     ok($result, "@$cmd exit code 0");
-    my $log = TestLib::slurp_file($self->logfile);
+    my $log = TestLib::slurp_file($self->logfile, $log_location);
     like($log, $expected_sql, "$test_name: SQL found in server log");
     return;
 }
diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm
index 1baf6bd001..017d0f08e0 100644
--- a/src/test/perl/TestLib.pm
+++ b/src/test/perl/TestLib.pm
@@ -124,7 +124,7 @@ BEGIN
     if ($windows_os)
     {
         require Win32API::File;
-        Win32API::File->import(qw(createFile OsFHandleOpen CloseHandle));
+        Win32API::File->import(qw(createFile OsFHandleOpen CloseHandle setFilePointer));
     }
 
     # Specifies whether to use Unix sockets for test setups.  On
@@ -430,21 +430,27 @@ sub slurp_dir
 
 =pod
 
-=item slurp_file(filename)
+=item slurp_file(filename [, $offset])
 
-Return the full contents of the specified file.
+Return the full contents of the specified file, beginning from an
+offset position if specified.
 
 =cut
 
 sub slurp_file
 {
-    my ($filename) = @_;
+    my ($filename, $offset) = @_;
     local $/;
     my $contents;
     if ($Config{osname} ne 'MSWin32')
     {
         open(my $in, '<', $filename)
           or croak "could not read \"$filename\": $!";
+        if (defined($offset))
+        {
+            seek($in, $offset, 0)
+              or croak "could not seek \"$filename\": $!";
+        }
         $contents = <$in>;
         close $in;
     }
@@ -454,6 +460,11 @@ sub slurp_file
           or croak "could not open \"$filename\": $^E";
         OsFHandleOpen(my $fh = IO::Handle->new(), $fHandle, 'r')
           or croak "could not read \"$filename\": $^E\n";
+        if (defined($offset))
+        {
+            setFilePointer($fh, $offset, qw(FILE_BEGIN))
+              or croak "could not seek \"$filename\": $^E\n";
+        }
         $contents = <$fh>;
         CloseHandle($fHandle)
           or croak "could not close \"$filename\": $^E\n";
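[Editor's note] The pattern in the patch above -- record the log file's size before the action under test, then read only what was appended after that offset -- can be sketched outside the TAP framework like this (the file names and helper names here are illustrative, not part of the patch):

```python
import os
import tempfile

def slurp_from(path, offset=None):
    # Read the whole file, or only what follows a saved offset,
    # mirroring the two-argument slurp_file() in the patch.
    with open(path, "r") as f:
        if offset is not None:
            f.seek(offset)
        return f.read()

# Simulate a server log that already has entries in it.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as log:
    log.write("old entry: connection received\n")
    path = log.name

# Remember where the log ends before triggering the action under test.
log_location = os.path.getsize(path)

# The action under test appends to the log.
with open(path, "a") as f:
    f.write("new entry: statement: SELECT 1\n")

# Only entries emitted after the captured offset are considered, so
# stale log lines can neither satisfy nor spoil the test's patterns.
tail = slurp_from(path, log_location)
assert "SELECT 1" in tail
assert "connection received" not in tail
os.unlink(path)
```

This sidesteps both truncation (broken on Windows while the server holds the file open) and log rotation plus restart, which is what the patch removes.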
Re: New IndexAM API controlling index vacuum strategies
On Wed, Apr 14, 2021 at 8:38 PM Andres Freund wrote:
> The reason I didn't do further reviews for things in this thread was
> that I was trying really hard to get the shared memory stats patch into
> a committable shape - there were just not enough hours in the day. I
> think it's to be expected that, during the final CF, there aren't a lot
> of resources for reviewing patches that are substantially new. Why
> should these new patches have gotten priority over a much older patch
> set that also addresses significant operational issues?

We're all doing our best.

> It's very common for larger / busier databases to *substantially*
> increase autovacuum_freeze_max_age, so there won't be 1.6 billion XIDs
> of headroom, but a few hundred million. The cost of doing unnecessary
> anti-wraparound vacuums is just too great. And databases on the busier &
> larger side of things are precisely the ones that are more likely to hit
> wraparound issues (otherwise you're just not that likely to burn through
> that many xids).

I think that this was once true, but is now much less common, mostly
due to the freeze map stuff in 9.6. And due to a general recognition
that the *risk* of increasing them is just too great (a risk that we
can hope was diminished by the failsafe, incidentally).

As an example of this, Christophe Pettus had a Damascene conversion
when it came to increasing autovacuum_freeze_max_age aggressively,
which he explains here:

https://thebuild.com/blog/2019/02/08/do-not-change-autovacuum-age-settings/

In short, he went from regularly advising clients to increase
autovacuum_freeze_max_age to specifically advising them to never touch
it.
Even if we assume that I'm 100% wrong about autovacuum_freeze_max_age,
it's still true that the vacuum_failsafe_age GUC is interpreted with
reference to autovacuum_freeze_max_age -- it will always be interpreted
as if it was set to 105% of whatever the current value of
autovacuum_freeze_max_age happens to be (so it's symmetric with the
freeze_table_age GUC and its 95% behavior). So it's never completely
unreasonable in the sense that it directly clashes with an existing
autovacuum_freeze_max_age setting from before the upgrade.

Of course this doesn't mean that there couldn't possibly be any
problems with the new mechanism clashing with autovacuum_freeze_max_age
in some unforeseen way. But, the worst that can happen is that a user
that is sophisticated enough to very aggressively increase
autovacuum_freeze_max_age upgrades to Postgres 14, and then finds that
index vacuuming is sometimes skipped. Which they'll see lots of
annoying and scary messages about if they ever look in the logs. I
think that that's an acceptable price to pay to protect the majority
of less sophisticated users.

> And my concern isn't really that vacuum would have finished without a
> problem if cost limiting hadn't been disabled, but that having multiple
> autovacuum workers going all out will cause problems. Like the system
> slowing down so much that it's hard to fix the actual root cause of the
> wraparound - I've seen systems with a bunch of unthrottled autovacuum
> workers overwhelm the IO subsystem so much that simply opening a
> connection to fix the issue took 10+ minutes. Especially on systems
> with provisioned IO (i.e. just about all cloud storage) that's not too
> hard to hit.

I don't think that it's reasonable to expect an intervention like this
to perfectly eliminate all risk, while at the same time never
introducing any new theoretical risks. (Especially while also being
simple and obviously correct.)
> > If it's intrinsically impossible to advance relfrozenxid, then surely
> > all bets are off. But even in this scenario it's very unlikely that we
> > wouldn't at least do index vacuuming for those index tuples that are
> > dead and safe to delete according to the OldestXmin cutoff. You still
> > have 1.6 billion XIDs before the failsafe first kicks in, regardless
> > of the issue of the OldestXmin/FreezeLimit being excessively far in
> > the past.
>
> As I said above, I don't think the "1.6 billion XIDs" argument has
> merit, because it's so reasonable (and common) to set
> autovacuum_freeze_max_age to something much larger.

No merit? Really? Not even a teeny, tiny, microscopic little bit of
merit? You're sure?

As I said, we handle the case where autovacuum_freeze_max_age is set to
something larger than vacuum_failsafe_age in a straightforward and
pretty sensible way.

I am curious, though: what autovacuum_freeze_max_age setting is "much
higher" than 1.6 billion, but somehow also not extremely ill-advised
and dangerous? What number is that, precisely? Apparently this is
common, but I must confess that it's the first I've heard about it.
--
Peter Geoghegan
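[Editor's note] The interpretation Peter describes -- the failsafe cutoff is taken to be at least 105% of autovacuum_freeze_max_age, mirroring the 95% treatment of the freeze-table threshold -- can be modeled in a few lines. This is a simplified sketch of the described semantics, not the server's actual C code:

```python
def effective_failsafe_age(vacuum_failsafe_age, autovacuum_freeze_max_age):
    # The cutoff never falls below 105% of autovacuum_freeze_max_age, so
    # raising autovacuum_freeze_max_age past the failsafe setting cannot
    # make the failsafe fire before an anti-wraparound vacuum would start.
    return max(vacuum_failsafe_age, autovacuum_freeze_max_age * 105 // 100)

# With the defaults (1.6 billion / 200 million), the failsafe GUC governs:
assert effective_failsafe_age(1_600_000_000, 200_000_000) == 1_600_000_000

# If autovacuum_freeze_max_age is pushed past the failsafe setting, the
# cutoff tracks it at 105% instead of clashing with it:
assert effective_failsafe_age(1_600_000_000, 1_800_000_000) == 1_890_000_000
```

The integer arithmetic (`* 105 // 100`) avoids floating-point rounding on billion-scale XID counts.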
Re: New IndexAM API controlling index vacuum strategies
Hi,

On 2021-04-14 19:53:29 -0700, Peter Geoghegan wrote:
> > Or at least
> > tests for it should be added (pg_resetwal + autovacuum_naptime=1s or
> > such should make that doable, or even just running a small test with
> > lower thresholds).
>
> You know what else doesn't have test coverage? Any kind of aggressive
> VACUUM. There is a problem with our culture around testing. I would
> like to address that in the scope of this project, but you know how it
> is. Can I take it that I'll have your support with adding those tests?

Sure!

> > I think there are good arguments for having logic for an "emergency
> > vacuum" mode (and also some good ones against). I'm not convinced that
> > the current set of things that are [not] skipped in failsafe mode is
> > the "obviously right set of things"™ but am convinced that there
> > wasn't enough consensus building on what that set of things is. This
> > all also would be different if it were the start of the development
> > window, rather than the end.
>
> I all but begged you to review the patches. Same with Robert. While
> the earlier patches (where almost all of the complexity is) did get
> review from both you and Robert (which I was grateful to receive), for
> whatever reason neither of you looked at the later patches in detail.

Based on a quick scan of the thread, the first version of a patch that
kind of resembles what got committed around the topic at hand seems to be
https://postgr.es/m/CAH2-Wzm7Y%3D_g3FjVHv7a85AfUbuSYdggDnEqN1hodVeOctL%2BOw%40mail.gmail.com
posted 2021-03-15. That's well into the last CF.

The reason I didn't do further reviews for things in this thread was
that I was trying really hard to get the shared memory stats patch into
a committable shape - there were just not enough hours in the day. I
think it's to be expected that, during the final CF, there aren't a lot
of resources for reviewing patches that are substantially new. Why
should these new patches have gotten priority over a much older patch
set that also addresses significant operational issues?

> > I think there's also a clear danger in having "cliffs" where the
> > behaviour changes abruptly once a certain threshold is reached. It's
> > not unlikely for systems to fall over entirely when
> >
> > a) autovacuum cost limiting is disabled. E.g. reaching your disk
> >    iops/throughput quota and barely being able to log into postgres
> >    anymore to kill the stuck connection causing the wraparound issue.
>
> Let me get this straight: You're concerned that hurrying up vacuuming
> when we have 500 million XIDs left to burn will overwhelm the system,
> which would presumably have finished in time otherwise?
>
> Even though it would have to do way more work in absolute terms in the
> absence of the failsafe? And even though the 1.6 billion XID age that
> we got to before the failsafe kicked in was clearly not enough? You'd
> want to "play it safe", and stick with the original plan at that
> point?

It's very common for larger / busier databases to *substantially*
increase autovacuum_freeze_max_age, so there won't be 1.6 billion XIDs
of headroom, but a few hundred million. The cost of doing unnecessary
anti-wraparound vacuums is just too great. And databases on the busier
& larger side of things are precisely the ones that are more likely to
hit wraparound issues (otherwise you're just not that likely to burn
through that many xids).

And my concern isn't really that vacuum would have finished without a
problem if cost limiting hadn't been disabled, but that having multiple
autovacuum workers going all out will cause problems. Like the system
slowing down so much that it's hard to fix the actual root cause of the
wraparound - I've seen systems with a bunch of unthrottled autovacuum
workers overwhelm the IO subsystem so much that simply opening a
connection to fix the issue took 10+ minutes. Especially on systems
with provisioned IO (i.e. just about all cloud storage) that's not too
hard to hit.

> > b) No index cleanup happens anymore. E.g. a workload with a lot of
> >    bitmap index scans (which do not support killtuples) could end up
> >    a lot worse off because index pointers to dead tuples aren't being
> >    cleaned up. In cases where an old transaction or leftover
> >    replication slot is causing the problem (together a significant
> >    percentage of wraparound situations) this situation will persist
> >    across repeated (explicit or automatic) vacuums for a table,
> >    because relfrozenxid won't actually be advanced. And this in turn
> >    might actually end up slowing resolution of the wraparound issue
> >    more than doing all the index scans.
>
> If it's intrinsically impossible to advance relfrozenxid, then surely
> all bets are off. But even in this scenario it's very unlikely that we
> wouldn't at least do index vacuuming for those index tuples that are
> dead and safe to delete according to the OldestXmin cutoff. You still
> have 1.6 billion XIDs before the failsafe first kicks in, regardless
> of the issue of the OldestXmin/FreezeLimit being excessively far in
> the past.
Re: logical replication empty transactions
On Thu, Sep 17, 2020 at 3:29 PM Michael Paquier wrote:
> On Wed, Jul 29, 2020 at 08:08:06PM +0530, Rahila Syed wrote:
> > The make check passes.
>
> Since then, the patch is failing to apply, waiting on author and the
> thread has died 6 weeks or so ago, so I am marking it as RwF in the
> CF.

I've rebased the patch and made changes so that the patch supports
"streaming in-progress transactions" and handling of logical decoding
messages (transactional and non-transactional).

I see that this patch not only makes sure that empty transactions are
not sent but also does call OutputPluginUpdateProgress when an empty
transaction is not sent, as a result the confirmed_flush_lsn is kept
moving. I also see no hangs when synchronous_standby is configured.

Do let me know your thoughts on this patch.

regards,
Ajin Cherian
Fujitsu Australia

v2-0001-Skip-empty-transactions-for-logical-replication.patch
Description: Binary data
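[Editor's note] The usual way to skip empty transactions in a logical decoding output plugin is to defer the BEGIN message until the first real change arrives; if COMMIT comes first, neither BEGIN nor COMMIT is sent, but progress is still reported so the confirmed flush position keeps advancing. A toy model of that behavior (illustrative names; the real work happens in pgoutput's C callbacks):

```python
class SkipEmptyXactOutput:
    """Toy model of deferring BEGIN so empty transactions send nothing."""

    def __init__(self):
        self.sent = []            # messages actually emitted downstream
        self.flush_lsn = 0        # stand-in for confirmed_flush_lsn
        self.pending_begin = None

    def begin_cb(self, xid):
        # Don't emit BEGIN yet; remember it until a real change arrives.
        self.pending_begin = xid

    def change_cb(self, data):
        if self.pending_begin is not None:
            self.sent.append(("BEGIN", self.pending_begin))
            self.pending_begin = None
        self.sent.append(("CHANGE", data))

    def commit_cb(self, xid, lsn):
        if self.pending_begin is None:
            self.sent.append(("COMMIT", xid))
        else:
            # Empty transaction: emit neither BEGIN nor COMMIT ...
            self.pending_begin = None
        # ... but still report progress so the flush position advances
        # (the OutputPluginUpdateProgress analogue).
        self.flush_lsn = lsn

out = SkipEmptyXactOutput()
out.begin_cb(1)
out.commit_cb(1, lsn=100)         # empty transaction: nothing is sent
assert out.sent == [] and out.flush_lsn == 100

out.begin_cb(2)
out.change_cb("row v1")
out.commit_cb(2, lsn=200)
assert out.sent == [("BEGIN", 2), ("CHANGE", "row v1"), ("COMMIT", 2)]
assert out.flush_lsn == 200
```

Keeping the flush position moving is what avoids stalling synchronous standbys while still sending nothing for transactions the subscriber doesn't care about.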
Re: New IndexAM API controlling index vacuum strategies
On Wed, Apr 14, 2021 at 6:53 PM Andres Freund wrote:
> > To a large degree the failsafe is something that is written in the
> > hope that it will never be needed. This is unlike most other things,
> > and has its own unique risks.
>
> Among them that the code is not covered by tests and is unlikely to be
> meaningfully exercised within the beta timeframe due to the timeframes
> for hitting it (hard to actually hit below a 1/2 day running extreme
> workloads, weeks for more realistic ones). Which means that this code
> has to be extra vigorously reviewed, not the opposite.

There is test coverage for the optimization to bypass index vacuuming
with very few dead tuples. Plus we can expect it to kick in very
frequently. That's not as good as testing this other mechanism
directly, which I agree ought to happen too. But the difference
between those two cases is pretty much entirely around when and how
they kick in. I have normalized the idea that index vacuuming is
optional in principle, so in an important sense it is tested all the
time.

> Or at least
> tests for it should be added (pg_resetwal + autovacuum_naptime=1s or
> such should make that doable, or even just running a small test with
> lower thresholds).

You know what else doesn't have test coverage? Any kind of aggressive
VACUUM. There is a problem with our culture around testing. I would
like to address that in the scope of this project, but you know how it
is. Can I take it that I'll have your support with adding those tests?

> This line of argumentation scares me. Not explained arguments, about
> running in conditions that we otherwise don't run in, when in
> exceptional circumstances. This code has a history of super subtle
> interactions, with quite a few data loss causing bugs due to us not
> foreseeing some combination of circumstances.

I'll say it again: I was wrong to not make that clearer prior to
committing the fixup. I regret that error, which probably had a lot to
do with being fatigued.
> I think there are good arguments for having logic for an "emergency
> vacuum" mode (and also some good ones against). I'm not convinced that
> the current set of things that are [not] skipped in failsafe mode is
> the "obviously right set of things"™ but am convinced that there
> wasn't enough consensus building on what that set of things is. This
> all also would be different if it were the start of the development
> window, rather than the end.

I all but begged you to review the patches. Same with Robert. While
the earlier patches (where almost all of the complexity is) did get
review from both you and Robert (which I was grateful to receive), for
whatever reason neither of you looked at the later patches in detail.
(Robert said that the failsafe ought to cover single-pass/no-indexes
VACUUM at one point, which did influence the design of the failsafe,
but for the most part his input on the later stuff was minimal and
expressed in general terms.)

Of course, nothing stops us from improving the mechanism in the
future. Though I maintain that the fundamental approach of finishing
as quickly as possible is basically sound (short of fixing the problem
directly, for example by obviating the need for freezing).

> In my experience the big problem with vacuums in a wraparound situation
> isn't actually things like truncation or even the index scans (although
> they certainly can cause bad problems), but that VACUUM modifies
> (prune/vacuum and WAL log or just setting hint bits) a crapton of pages
> that don't actually need to be modified just to be able to get out of
> the wraparound situation.

Things like UUID indexes are very popular, and are likely to have an
outsized impact on dirtying pages (which I agree is the real problem).
Plus some people just have a ridiculous amount of indexes (e.g., the
Discourse table that they pointed out as a particularly good target
for deduplication had a total of 13 indexes).
There is an excellent chance that stuff like that is involved in
installations that actually have huge problems. The visibility map
works pretty well these days -- but not for indexes.

> And that the overhead of writing out all those
> dirty pages + WAL logging causes the VACUUM to take unacceptably
> long. E.g. because your storage is cloud storage with a few ms of
> latency, and the ringbuffer + wal_buffer sizes cause so many
> synchronous writes that you end up with < 10MB/s of data being
> processed.

This is a false dichotomy. There probably is an argument for making
the failsafe not do pruning that isn't strictly necessary (or
something like that) in a future release. I don't see what particular
significance that has for the failsafe mechanism now. The sooner we
can advance relfrozenxid when it's dangerously far in the past, the
better. It's true that the mechanism doesn't exploit every possible
opportunity to do so. But it mostly manages to do that.

> I think there's also a clear danger in having "cliffs" where the
> behaviour changes abruptly once a certain threshold is reached.
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96
On Wed, Apr 14, 2021 at 5:42 PM James Coleman wrote:
>
> On Mon, Apr 12, 2021 at 8:37 AM Tomas Vondra wrote:
> >
> > On 4/12/21 2:24 PM, Luc Vlaming wrote:
> > > Hi,
> > >
> > > When trying to run on master (but afaik also PG-13) TPC-DS queries 94,
> > > 95 and 96 on a SF10 I get the error "could not find pathkey item to
> > > sort". When I disable enable_gathermerge the problem goes away and
> > > then the plan for query 94 looks like below. I tried figuring out what
> > > the problem is but to be honest I would need some pointers as the code
> > > that tries to matching equivalence members in
> > > prepare_sort_from_pathkeys is something i'm really not familiar with.
> >
> > Could be related to incremental sort, which allowed some gather merge
> > paths that were impossible before. We had a couple issues related to
> > that fixed in November, IIRC.
> >
> > > To reproduce you can either ingest and test using the toolkit I used
> > > too (see https://github.com/swarm64/s64da-benchmark-toolkit/), or
> > > alternatively just use the schema (see
> > > https://github.com/swarm64/s64da-benchmark-toolkit/tree/master/benchmarks/tpcds/schemas/psql_native)
> >
> > Thanks, I'll see if I can reproduce that with your schema.
> >
> > regards
> >
> > --
> > Tomas Vondra
> > EnterpriseDB: http://www.enterprisedb.com
> > The Enterprise PostgreSQL Company

The query in question is:

select count(*)
from store_sales
    ,household_demographics
    ,time_dim, store
where ss_sold_time_sk = time_dim.t_time_sk
    and ss_hdemo_sk = household_demographics.hd_demo_sk
    and ss_store_sk = s_store_sk
    and time_dim.t_hour = 15
    and time_dim.t_minute >= 30
    and household_demographics.hd_dep_count = 7
    and store.s_store_name = 'ese'
order by count(*)
limit 100;

From debugging output it looks like this is the plan being chosen
(cheapest total path):

    Gather(store_sales household_demographics time_dim) rows=60626
        cost=3145.73..699910.15
      HashJoin(store_sales household_demographics time_dim) rows=25261
          cost=2145.73..692847.55
        clauses: store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
        HashJoin(store_sales time_dim) rows=252609 cost=1989.73..692028.08
          clauses: store_sales.ss_sold_time_sk = time_dim.t_time_sk
          SeqScan(store_sales) rows=11998564 cost=0.00..658540.64
          SeqScan(time_dim) rows=1070 cost=0.00..1976.35
        SeqScan(household_demographics) rows=720 cost=0.00..147.00

prepare_sort_from_pathkeys fails to find a pathkey because
tlist_member_ignore_relabel returns null -- which seemed weird because
the sortexpr is an Aggref (in a single member equivalence class) and
the tlist contains a single member that's also an Aggref. It turns out
that the only difference between the two Aggrefs is that the tlist
entry has "aggsplit = AGGSPLIT_INITIAL_SERIAL" while the sortexpr has
aggsplit = AGGSPLIT_SIMPLE.

That's as far as I've gotten so far, but I figured I'd get that info
out to see if it means anything obvious to anyone else.

This really goes back to [1] where we fixed a similar issue by making
find_em_expr_usable_for_sorting_rel parallel the rules in
prepare_sort_from_pathkeys.
Most of those conditions got copied, and the case we were trying to
handle is the fact that prepare_sort_from_pathkeys can generate a
target list entry under those conditions if one doesn't exist. However
there's a further restriction there I don't remember looking at: it
uses pull_var_clause and tlist_member_ignore_relabel to ensure that
all of the vars that feed into the sort expression are found in the
target list. As I understand it, that is: it will build a target list
entry for something like "md5(column)" if "column" (and that was one
of our test cases for the previous fix) is in the target list already.

But there's an additional detail here: the call to pull_var_clause
requests aggregates, window functions, and placeholders be treated as
vars. That means for our Aggref case it would require that the two
Aggrefs be fully equal, so the differing aggsplit member would cause a
target list entry not to be built, hence our error here.

I've attached a quick and dirty patch that encodes that final rule
from prepare_sort_from_pathkeys into
find_em_expr_usable_for_sorting_rel. I can't help but think that
there's a cleaner way to do this with less code duplication, but
hindering that is that prepare_sort_from_pathkeys is working with a
TargetList while find_em_expr_usable_for_sorting_rel is working with a
list of expressions.

James

1: https://www.postgresql.org/message-id/CAAaqYe9C3f6A_tZCRfr9Dm7hPpgGwpp4i-K_%3DNS9GWXuNiFANg%40mail.gmail.com
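[Editor's note] The failure mode James describes reduces to a small model: the target-list lookup uses full structural equality, so a field like aggsplit that differs between two otherwise-identical aggregates makes the match fail. A toy illustration (the names here are illustrative, not the planner's actual data structures):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Aggref:
    aggfnoid: str
    args: tuple
    aggsplit: str   # AGGSPLIT_SIMPLE vs. AGGSPLIT_INITIAL_SERIAL

def tlist_member(expr, tlist):
    # Structural equality over every field, like equal() underneath
    # tlist_member_ignore_relabel -- aggsplit participates too.
    return any(expr == tle for tle in tlist)

# The partial-aggregation target list entry carries INITIAL_SERIAL ...
tlist = [Aggref("count", (), "AGGSPLIT_INITIAL_SERIAL")]
# ... while the sort expression carries SIMPLE.
sortexpr = Aggref("count", (), "AGGSPLIT_SIMPLE")

# Same aggregate, same arguments, but the differing aggsplit field
# means no target-list entry is found: the "could not find pathkey
# item to sort" situation in miniature.
assert not tlist_member(sortexpr, tlist)
assert tlist_member(Aggref("count", (), "AGGSPLIT_INITIAL_SERIAL"), tlist)
```

This is why the proposed fix makes find_em_expr_usable_for_sorting_rel apply the same "every var-like node must match exactly" rule up front, so such paths are rejected before plan creation rather than failing inside it.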
Re: New IndexAM API controlling index vacuum strategies
Hi,

On 2021-04-14 14:55:36 -0700, Peter Geoghegan wrote:
> On Wed, Apr 14, 2021 at 12:33 PM Andres Freund wrote:
> > I'm getting a bit bothered by the speed at which you're pushing fairly
> > substantial behavioural changes for vacuum. In this case without even
> > a warning that you're about to do so.
>
> To a large degree the failsafe is something that is written in the
> hope that it will never be needed. This is unlike most other things,
> and has its own unique risks.

Among them that the code is not covered by tests and is unlikely to be
meaningfully exercised within the beta timeframe due to the timeframes
for hitting it (hard to actually hit below a 1/2 day running extreme
workloads, weeks for more realistic ones). Which means that this code
has to be extra vigorously reviewed, not the opposite. Or at least
tests for it should be added (pg_resetwal + autovacuum_naptime=1s or
such should make that doable, or even just running a small test with
lower thresholds).

> I just went one step further than that in this recent commit. I didn't
> point these details out before now because (to me) this is beside the
> point. Which is that the failsafe is just that -- a failsafe. Anything
> that adds unnecessary unpredictable delay in reaching the point of
> advancing relfrozenxid should be avoided. (Besides, the design of
> should_attempt_truncation() and lazy_truncate_heap() is very far from
> guaranteeing that truncation will take place at the best of times.)

This line of argumentation scares me. Not explained arguments, about
running in conditions that we otherwise don't run in, when in
exceptional circumstances. This code has a history of super subtle
interactions, with quite a few data loss causing bugs due to us not
foreseeing some combination of circumstances.

I think there are good arguments for having logic for an "emergency
vacuum" mode (and also some good ones against). I'm not convinced that
the current set of things that are [not] skipped in failsafe mode is
the "obviously right set of things"™ but am convinced that there
wasn't enough consensus building on what that set of things is. This
all also would be different if it were the start of the development
window, rather than the end.

In my experience the big problem with vacuums in a wraparound
situation isn't actually things like truncation or even the index
scans (although they certainly can cause bad problems), but that
VACUUM modifies (prune/vacuum and WAL log or just setting hint bits) a
crapton of pages that don't actually need to be modified just to be
able to get out of the wraparound situation. And that the overhead of
writing out all those dirty pages + WAL logging causes the VACUUM to
take unacceptably long. E.g. because your storage is cloud storage
with a few ms of latency, and the ringbuffer + wal_buffer sizes cause
so many synchronous writes that you end up with < 10MB/s of data being
processed.

I think there's also a clear danger in having "cliffs" where the
behaviour changes abruptly once a certain threshold is reached. It's
not unlikely for systems to fall over entirely when

a) autovacuum cost limiting is disabled. E.g. reaching your disk
   iops/throughput quota and barely being able to log into postgres
   anymore to kill the stuck connection causing the wraparound issue.

b) No index cleanup happens anymore. E.g. a workload with a lot of
   bitmap index scans (which do not support killtuples) could end up a
   lot worse off because index pointers to dead tuples aren't being
   cleaned up. In cases where an old transaction or leftover
   replication slot is causing the problem (together a significant
   percentage of wraparound situations) this situation will persist
   across repeated (explicit or automatic) vacuums for a table, because
   relfrozenxid won't actually be advanced. And this in turn might
   actually end up slowing resolution of the wraparound issue more than
   doing all the index scans.

Because this is a hard cliff rather than something phasing in, it's
not really possible for a user to see this slowly getting worse and
address the issue.

Especially for a) this could be addressed by not turning off cost
limiting at once, but instead have it decrease the closer you get to
some limit.

Greetings,

Andres Freund
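[Editor's note] Andres's closing suggestion -- phase the cost limiting out gradually instead of switching it off at a threshold -- could look something like this. A sketch with made-up numbers, not a proposal for actual GUC semantics:

```python
def scaled_cost_delay(base_delay_ms, age, soft_limit, hard_limit):
    # Full throttling below soft_limit, none at hard_limit, and a
    # linear ramp in between instead of an abrupt cliff.
    if age <= soft_limit:
        return base_delay_ms
    if age >= hard_limit:
        return 0.0
    frac = (age - soft_limit) / (hard_limit - soft_limit)
    return base_delay_ms * (1.0 - frac)

# A vacuum_cost_delay of 2ms, ramping off between 1.0 and 1.6 billion XIDs:
assert scaled_cost_delay(2.0, 800_000_000, 1_000_000_000, 1_600_000_000) == 2.0
assert scaled_cost_delay(2.0, 1_300_000_000, 1_000_000_000, 1_600_000_000) == 1.0
assert scaled_cost_delay(2.0, 1_700_000_000, 1_000_000_000, 1_600_000_000) == 0.0
```

Under a scheme like this, vacuum throughput rises steadily as relfrozenxid age grows, so an operator watching I/O would see the pressure building rather than an all-at-once jump when the failsafe fires.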
Re: psql - add SHOW_ALL_RESULTS option
Michael Paquier writes:
> On Mon, Apr 12, 2021 at 03:33:01PM -0400, Alvaro Herrera wrote:
>> I, for one, would prefer to see the feature repaired in this cycle.

> If it is possible to get that fixed, I would not mind waiting a bit
> more but it would be nice to see some actual proposals. There are
> already three identified bugs in psql introduced by this commit,
> including the query cancellation.

> That's a lot IMO, so my vote would be to discard this feature for now
> and revisit it properly in the 15 dev cycle, so as resources are
> redirected into more urgent issues (13 open items as of the moment of
> writing this email).

I don't wish to tell people which open issues they ought to work on ...
but this patch seems like it could be quite a large can of worms, and
I'm not detecting very much urgency about getting it fixed. If it's
not to be reverted then some significant effort needs to be put into
it soon.

regards, tom lane
Replacing pg_depend PIN entries with a fixed range check
In [1] Andres and I speculated about whether we really need all those
PIN entries in pg_depend. Here is a draft patch that gets rid of them.

It turns out to be no big problem to replace the PIN entries with an
OID range check, because there's a well-defined point in initdb where
it wants to pin (almost) all existing objects, and then no objects
created after that are pinned.

In the earlier thread I'd imagined having initdb record the OID
counter at that point in pg_control, and then we could look at the
recorded counter value to make is-it-pinned decisions. However, that
idea has a fatal problem: what shall pg_resetwal fill into that field
when it has to gin up a pg_control file from scratch? There's no good
way to reconstruct the value. Hence, what this patch does is to
establish a manually-managed cutoff point akin to
FirstBootstrapObjectId, and make initdb push the OID counter up to
that once it's made the small number of pinned objects it's
responsible for.

With the value I used here, a couple hundred OIDs are wasted, but
there seems to be a reasonable amount of headroom still beyond that.
On my build, the OID counter at the end of initdb is 15485 (with a
reasonable number of glibc and ICU locales loaded). So we still have
about 900 free OIDs there; and there are 500 or so free just below
FirstBootstrapObjectId, too. So this approach does hasten the day when
we're going to run out of free OIDs below 16384, but not by all that
much.

There are a couple of objects, namely template1 and the public schema,
that are in the catalog .dat files but are not supposed to be pinned.
The existing code accomplishes that by excluding them (in two
different ways :-() while filling pg_depend. This patch just
hard-wires exceptions for them in IsPinnedObject(), which seems to me
not much uglier than what we had before. The existing code also
handles pinning of the standard tablespaces in an idiosyncratic way; I
just dropped that and made them be treated as pinned.
One interesting point about doing things this way is that IsPinnedObject() will give correct answers throughout initdb, whereas before the backend couldn't tell what was supposed to be pinned until after initdb loaded pg_depend. This means we don't need the hacky truncation of pg_depend and pg_shdepend that initdb used to do, because now the backend will correctly not make entries relating to objects it now knows are pinned. Aside from saving a few cycles, this is more correct. For example, if some object that initdb made after bootstrap but before truncating pg_depend had a dependency on the public schema, the existing coding would lose track of that fact. (There's no live issue of that sort, I hasten to say, and really it would be a bug to set things up that way because then you couldn't drop the public schema. But the existing coding would make things worse by not detecting the mistake.)

Anyway, as to concrete results:

* pg_depend's total relation size, in a freshly made database, drops from 1269760 bytes to 368640 bytes.

* There seems to be a small but noticeable reduction in the time to run check-world. I compared runtimes on a not-particularly-modern machine with spinning-rust storage, using -j4 parallelism:

HEAD:
        real    5m4.248s
        user    2m59.390s
        sys     1m21.473s

+ patch:
        real    5m2.924s
        user    2m36.196s
        sys     1m19.724s

These top-line numbers don't look too impressive, but the CPU-time reduction seems quite significant. Probably on a different hardware platform that would translate more directly to runtime savings. I didn't try to reproduce the original performance bottleneck that was complained of in [1], but that might be fun to check.

Anyway, I'll stick this in the next CF so we don't lose track of it.
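[Editorial note] The range-check idea with its hard-wired exceptions can be sketched roughly as below. This is an illustration, not the committed code: the name FirstUnpinnedObjectId and its value are placeholders for the manually-managed cutoff described above, and the OID constants are written out as their well-known catalog values rather than pulled from the real headers.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;

/* Hypothetical manually-managed cutoff: initdb pushes the OID counter
 * up to this value after creating the last pinned object. */
#define FirstUnpinnedObjectId   12000

/* Well-known OIDs, written out for the sketch. */
#define Template1DbOid          1       /* template1 database */
#define PublicNamespaceOid      2200    /* public schema */
#define DatabaseRelationId      1262    /* pg_database */
#define NamespaceRelationId     2615    /* pg_namespace */

/* Is the object (classId, objectId) pinned, i.e. undroppable? */
static bool
IsPinnedObject(Oid classId, Oid objectId)
{
    /* Hard-wired exceptions: these are created before the cutoff
     * (they're in the .dat files) but must remain droppable. */
    if (classId == DatabaseRelationId && objectId == Template1DbOid)
        return false;
    if (classId == NamespaceRelationId && objectId == PublicNamespaceOid)
        return false;

    /* Everything created before initdb advanced the counter is pinned. */
    return objectId < FirstUnpinnedObjectId;
}
```

No pg_depend lookup is needed at all; the decision becomes a pure arithmetic test plus two special cases.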
regards, tom lane [1] https://www.postgresql.org/message-id/flat/947172.1617684433%40sss.pgh.pa.us#6a3d250a9c4a994cb3a26c87384fc823 diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 2656786d1e..83a391eed1 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -3249,8 +3249,7 @@ SCRAM-SHA-256$iteration count: (references pg_class.oid) - The OID of the system catalog the dependent object is in, - or zero for a DEPENDENCY_PIN entry + The OID of the system catalog the dependent object is in @@ -3260,8 +3259,7 @@ SCRAM-SHA-256$iteration count: (references any OID column) - The OID of the specific dependent object, - or zero for a DEPENDENCY_PIN entry + The OID of the specific dependent object @@ -3464,19 +3462,6 @@ SCRAM-SHA-256$iteration count: - - - DEPENDENCY_PIN (p) - - - There is no dependent object; this type of entry is a signal - that the system itself depends on the referenced object, and so - that object must never be deleted. Entries of this type are - created
Re: Proposal: Save user's original authenticated identity for logging
On Tue, Apr 13, 2021 at 03:47:21PM +, Jacob Champion wrote: > Looks like the farm has gone green, after some test fixups. Thanks for > all the reviews! You may want to follow this thread as well; its topic is related to what has been discussed here, since there is an impact in a different code path for the TAP tests, not only the connection tests: https://www.postgresql.org/message-id/yhajnhcmai3++...@paquier.xyz -- Michael signature.asc Description: PGP signature
Re: File truncation within PostgresNode::issues_sql_like() wrong on Windows
On 4/14/21 8:10 PM, Michael Paquier wrote: > On Wed, Apr 14, 2021 at 05:10:41PM -0400, Andrew Dunstan wrote: >> That seems rather heavy-handed. The buildfarm's approach is a bit >> different. Essentially it seeks to the previous position of the log file >> before reading contents. Here is its equivalent of slurp_file: >> >> use Fcntl qw(:seek); >> sub file_lines >> { >> my $filename = shift; >> my $filepos = shift; >> my $handle; >> open($handle, '<', $filename) || croak "opening $filename: $!"; >> seek($handle, $filepos, SEEK_SET) if $filepos; >> my @lines = <$handle>; >> close $handle; >> return @lines; >> } > That's a bit surprising to see that you can safely open a file handle > with perl like that without using Win32API::File, and I would have > assumed that this would have conflicted with the backend redirecting > its output to stderr the same way as a truncation on Windows. > >> A client wanting what's done in PostgresNode would do something like: >> >> my $logpos = -s $logfile; >> do_some_stuff(); >> my @lines = file_lines($logfile, $logpos); >> >> This has the benefit of working the same on all platforms, and no >> truncation, rotation, or restart is required. > Jacob has suggested something like that a couple of days ago, but all > this code was not centralized yet in a single place. > > For this code, the cleanest approach would be to extend slurp_file() > with an extra argument to seek the file before fetching its contents > based on a location given by the caller? Looking at the docs of > Win32API::File, we'd need to use SetFilePointer() instead of seek(). Well, let me try it on fairywren tomorrow. Since we manage this on the buildfarm without any use at all of Win32API::File it might not be necessary in TAP code either, particularly if we're not truncating the file. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: psql - add SHOW_ALL_RESULTS option
On Mon, Apr 12, 2021 at 03:33:01PM -0400, Alvaro Herrera wrote: > Please note that there's no "for now" about it -- if the patch is > reverted, the only way to get it back is to wait for PG15. That's > undesirable. A better approach is to collect all those bugs and get > them fixed. There's plenty of time to do that. > > I, for one, would prefer to see the feature repaired in this cycle. If it is possible to get that fixed, I would not mind waiting a bit more, but it would be nice to see some actual proposals. There are already three identified bugs in psql introduced by this commit, including the query cancellation. That's a lot IMO, so my vote would be to discard this feature for now and revisit it properly in the 15 dev cycle, so that resources are redirected into more urgent issues (13 open items as of the moment of writing this email). -- Michael signature.asc Description: PGP signature
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96
On Wed, Apr 14, 2021 at 8:21 PM Robert Haas wrote: > > On Wed, Apr 14, 2021 at 5:43 PM James Coleman wrote: > > The query in question is: > > select count(*) > > from store_sales > > ,household_demographics > > ,time_dim, store > > where ss_sold_time_sk = time_dim.t_time_sk > > and ss_hdemo_sk = household_demographics.hd_demo_sk > > and ss_store_sk = s_store_sk > > and time_dim.t_hour = 15 > > and time_dim.t_minute >= 30 > > and household_demographics.hd_dep_count = 7 > > and store.s_store_name = 'ese' > > order by count(*) > > limit 100; > > > > From debugging output it looks like this is the plan being chosen > > (cheapest total path): > > Gather(store_sales household_demographics time_dim) rows=60626 > > cost=3145.73..699910.15 > > HashJoin(store_sales household_demographics time_dim) > > rows=25261 cost=2145.73..692847.55 > > clauses: store_sales.ss_hdemo_sk = > > household_demographics.hd_demo_sk > > HashJoin(store_sales time_dim) rows=252609 > > cost=1989.73..692028.08 > > clauses: store_sales.ss_sold_time_sk = > > time_dim.t_time_sk > > SeqScan(store_sales) rows=11998564 > > cost=0.00..658540.64 > > SeqScan(time_dim) rows=1070 > > cost=0.00..1976.35 > > SeqScan(household_demographics) rows=720 > > cost=0.00..147.00 > > This doesn't really make sense to me given the strack trace in the OP. > That seems to go Limit -> Sort -> Agg -> NestLoop -> NestLoop -> > NestLoop -> GatherMerge -> Sort. If the plan were as you have it here, > there would be no Sort and no Gather Merge, so where would be getting > a failure related to pathkeys? Also I just realized why this didn't make sense -- I'm not sure what the above path is. It'd gotten logged as part of the debug options I have configured, but it must be 1.) incomplete (perhaps at a lower level of path generation) and/or not the final path selected. My apologies for the confusion. James
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96
On Wed, Apr 14, 2021 at 8:21 PM Robert Haas wrote: > > On Wed, Apr 14, 2021 at 5:43 PM James Coleman wrote: > > The query in question is: > > select count(*) > > from store_sales > > ,household_demographics > > ,time_dim, store > > where ss_sold_time_sk = time_dim.t_time_sk > > and ss_hdemo_sk = household_demographics.hd_demo_sk > > and ss_store_sk = s_store_sk > > and time_dim.t_hour = 15 > > and time_dim.t_minute >= 30 > > and household_demographics.hd_dep_count = 7 > > and store.s_store_name = 'ese' > > order by count(*) > > limit 100; > > > > From debugging output it looks like this is the plan being chosen > > (cheapest total path): > > Gather(store_sales household_demographics time_dim) rows=60626 > > cost=3145.73..699910.15 > > HashJoin(store_sales household_demographics time_dim) > > rows=25261 cost=2145.73..692847.55 > > clauses: store_sales.ss_hdemo_sk = > > household_demographics.hd_demo_sk > > HashJoin(store_sales time_dim) rows=252609 > > cost=1989.73..692028.08 > > clauses: store_sales.ss_sold_time_sk = > > time_dim.t_time_sk > > SeqScan(store_sales) rows=11998564 > > cost=0.00..658540.64 > > SeqScan(time_dim) rows=1070 > > cost=0.00..1976.35 > > SeqScan(household_demographics) rows=720 > > cost=0.00..147.00 > > This doesn't really make sense to me given the strack trace in the OP. > That seems to go Limit -> Sort -> Agg -> NestLoop -> NestLoop -> > NestLoop -> GatherMerge -> Sort. If the plan were as you have it here, > there would be no Sort and no Gather Merge, so where would be getting > a failure related to pathkeys? 
Ah, yeah, I'm not sure where the original stacktrace came from, but here's the stack for the query I reproduced it with (perhaps it does so on different queries or there was some other GUC change in the reported plan):

#0  errfinish (filename=filename@entry=0x56416eefa845 "createplan.c", lineno=lineno@entry=6186, funcname=funcname@entry=0x56416eefb660 <__func__.24872> "prepare_sort_from_pathkeys") at elog.c:514
#1  0x56416eb6ed52 in prepare_sort_from_pathkeys (lefttree=0x564170552658, pathkeys=0x5641704f2640, relids=0x0, reqColIdx=reqColIdx@entry=0x0, adjust_tlist_in_place=adjust_tlist_in_place@entry=false, p_numsortkeys=p_numsortkeys@entry=0x7fff1252817c, p_sortColIdx=0x7fff12528170, p_sortOperators=0x7fff12528168, p_collations=0x7fff12528160, p_nullsFirst=0x7fff12528158) at createplan.c:6186
#2  0x56416eb6ee69 in make_sort_from_pathkeys (lefttree=, pathkeys=, relids=) at createplan.c:6313
#3  0x56416eb71fc7 in create_sort_plan (root=root@entry=0x564170511a70, best_path=best_path@entry=0x56417054f650, flags=flags@entry=1) at createplan.c:2118
#4  0x56416eb6f638 in create_plan_recurse (root=root@entry=0x564170511a70, best_path=0x56417054f650, flags=flags@entry=1) at createplan.c:489
#5  0x56416eb72e06 in create_gather_merge_plan (root=root@entry=0x564170511a70, best_path=best_path@entry=0x56417054f6e8) at createplan.c:1885
#6  0x56416eb6f723 in create_plan_recurse (root=root@entry=0x564170511a70, best_path=0x56417054f6e8, flags=flags@entry=4) at createplan.c:541
#7  0x56416eb726fb in create_agg_plan (root=root@entry=0x564170511a70, best_path=best_path@entry=0x56417054f8c8) at createplan.c:2238
#8  0x56416eb6f67b in create_plan_recurse (root=root@entry=0x564170511a70, best_path=0x56417054f8c8, flags=flags@entry=3) at createplan.c:509
#9  0x56416eb71f8e in create_sort_plan (root=root@entry=0x564170511a70, best_path=best_path@entry=0x56417054f560, flags=flags@entry=1) at createplan.c:2109
#10 0x56416eb6f638 in create_plan_recurse (root=root@entry=0x564170511a70, best_path=0x56417054f560, flags=flags@entry=1) at createplan.c:489
#11 0x56416eb72c83 in create_limit_plan (root=root@entry=0x564170511a70, best_path=best_path@entry=0x56417054ffa0, flags=flags@entry=1) at createplan.c:2784
#12 0x56416eb6f713 in create_plan_recurse (root=root@entry=0x564170511a70, best_path=0x56417054ffa0, flags=flags@entry=1) at createplan.c:536
#13 0x56416eb6f79d in create_plan (root=root@entry=0x564170511a70, best_path=) at createplan.c:349
#14 0x56416eb7fe93 in standard_planner (parse=0x564170437268, query_string=, cursorOptions=2048, boundParams=) at planner.c:407

> I think if we can get the correct plan the thing to look at would be
> the tlists at the relevant levels of the plan.

Does the information in [1] help at all? The tlist does have an Aggref, as expected, but its aggsplit value doesn't match the pathkey's Aggref's aggsplit value.

James

1: https://www.postgresql.org/message-id/CAAaqYe_NU4hO9COoJdcXWqjtH%3DdGMknYdsSdJjZ%3DJOHPTea-Nw%40mail.gmail.com
Re: New IndexAM API controlling index vacuum strategies
Hi, On 2021-04-14 20:08:10 -0400, Robert Haas wrote: > On Wed, Apr 14, 2021 at 5:55 PM Peter Geoghegan wrote: > > On Wed, Apr 14, 2021 at 12:33 PM Andres Freund wrote: > > > I'm getting a bit bothered by the speed at which you're pushing fairly > > > substantial behavioural for vacuum. In this case without even a warning > > > that you're about to do so. > > > > To a large degree the failsafe is something that is written in the > > hope that it will never be needed. This is unlike most other things, > > and has its own unique risks. > > > > I think that the proper thing to do is to accept a certain amount of > > risk in this area. The previous status quo was *appalling*, and so it > > seems very unlikely that the failsafe hasn't mostly eliminated a lot > > of risk for users. That factor is not everything, but it should count > > for a lot. The only way that we're going to have total confidence in > > anything like this is through the experience of it mostly working over > > several releases. > > I think this is largely missing the point Andres was making, which is > that you made a significant behavior change after feature freeze > without any real opportunity for discussion. More generally, you've > changed a bunch of other stuff relatively quickly based on input from > a relatively limited number of people. Now, it's fair to say that it's > often hard to get input on things, and sometimes you have to just take > your best shot and hope you're right. But in this particular case, you > didn't even try to get broader participation or buy-in. That's not > good. Yep, that was what I was trying to get at. - Andres
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96
On Wed, Apr 14, 2021 at 8:20 PM James Coleman wrote: > > Hmm, could be. Although, the stack trace at issue doesn't seem to show > > a call to create_incrementalsort_plan(). > > The changes to gather merge path generation made it possible to use > those paths in more cases for both incremental sort and regular sort, > so by "incremental sort" I read Tomas as saying "the patches that > brought in incremental sort" not specifically "incremental sort > itself". I agree. That's why I said "hmm, could be" even though the plan doesn't involve one. -- Robert Haas EDB: http://www.enterprisedb.com
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96
On Wed, Apr 14, 2021 at 5:43 PM James Coleman wrote: > The query in question is: > select count(*) > from store_sales > ,household_demographics > ,time_dim, store > where ss_sold_time_sk = time_dim.t_time_sk > and ss_hdemo_sk = household_demographics.hd_demo_sk > and ss_store_sk = s_store_sk > and time_dim.t_hour = 15 > and time_dim.t_minute >= 30 > and household_demographics.hd_dep_count = 7 > and store.s_store_name = 'ese' > order by count(*) > limit 100; > > From debugging output it looks like this is the plan being chosen > (cheapest total path): > Gather(store_sales household_demographics time_dim) rows=60626 > cost=3145.73..699910.15 > HashJoin(store_sales household_demographics time_dim) > rows=25261 cost=2145.73..692847.55 > clauses: store_sales.ss_hdemo_sk = > household_demographics.hd_demo_sk > HashJoin(store_sales time_dim) rows=252609 > cost=1989.73..692028.08 > clauses: store_sales.ss_sold_time_sk = > time_dim.t_time_sk > SeqScan(store_sales) rows=11998564 > cost=0.00..658540.64 > SeqScan(time_dim) rows=1070 > cost=0.00..1976.35 > SeqScan(household_demographics) rows=720 > cost=0.00..147.00 This doesn't really make sense to me given the stack trace in the OP. That seems to go Limit -> Sort -> Agg -> NestLoop -> NestLoop -> NestLoop -> GatherMerge -> Sort. If the plan were as you have it here, there would be no Sort and no Gather Merge, so where would we be getting a failure related to pathkeys? I think if we can get the correct plan the thing to look at would be the tlists at the relevant levels of the plan. -- Robert Haas EDB: http://www.enterprisedb.com
Re: New IndexAM API controlling index vacuum strategies
On Wed, Apr 14, 2021 at 5:08 PM Robert Haas wrote: > I think this is largely missing the point Andres was making, which is > that you made a significant behavior change after feature freeze > without any real opportunity for discussion. I don't believe that it was a significant behavior change, for the reason I gave: the fact of the matter is that it's practically impossible for us to truncate the heap anyway, provided we have already decided to not vacuum (as opposed to prune) heap pages that almost certainly have some LP_DEAD items in them. Note that later heap pages are the most likely to still have some LP_DEAD items once the failsafe triggers, which are precisely the ones that will affect whether or not we can truncate the whole heap. I accept that I could have done better with the messaging. I'll try to avoid repeating that mistake in the future. > More generally, you've > changed a bunch of other stuff relatively quickly based on input from > a relatively limited number of people. Now, it's fair to say that it's > often hard to get input on things, and sometimes you have to just take > your best shot and hope you're right. I agree in general, and I agree that that's what I've done in this instance. It goes without saying, but I'll say it anyway: I accept full responsibility. > But in this particular case, you > didn't even try to get broader participation or buy-in. That's not > good. I will admit to being somewhat burned out by this project. That might have been a factor. -- Peter Geoghegan
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96
On Wed, Apr 14, 2021 at 8:16 PM Robert Haas wrote: > > On Mon, Apr 12, 2021 at 8:37 AM Tomas Vondra > wrote: > > Could be related to incremental sort, which allowed some gather merge > > paths that were impossible before. We had a couple issues related to > > that fixed in November, IIRC. > > Hmm, could be. Although, the stack trace at issue doesn't seem to show > a call to create_incrementalsort_plan(). The changes to gather merge path generation made it possible to use those paths in more cases for both incremental sort and regular sort, so by "incremental sort" I read Tomas as saying "the patches that brought in incremental sort" not specifically "incremental sort itself". James
Re: Converting contrib SQL functions to new style
On 4/15/21 12:18 AM, Mark Dilger wrote: > > >> On Apr 14, 2021, at 2:47 PM, Vik Fearing wrote: >> >> On 4/14/21 7:36 PM, Tom Lane wrote: >>> Mark Dilger writes: > On Apr 13, 2021, at 3:26 PM, Tom Lane wrote: > However I think we may still need an assumption that earthdistance > and cube are in the same schema --- any comments on that? >>> This is probably not worth doing, and we are already past feature freeze, but adding syntax to look up the namespace of an extension might help. >>> >>> Yeah, that idea was discussed before (perhaps only in private >>> security-team threads, though). We didn't do anything about it because >>> at the time there didn't seem to be pressing need, but in the context >>> of SQL function bodies there's an obvious use-case. >>> We could get something like this working just inside the CREATE EXTENSION command if we expanded on the @extschema@ idea a bit. At first I thought this idea would suffer race conditions with concurrent modifications of pg_extension or pg_namespace, but it looks like we already have a snapshot when processing the script file, so: >>> -CREATE DOMAIN earth AS cube +CREATE DOMAIN @@earthdistance@@::earth AS @@cube@@::cube >>> >>> Right, extending the @extschema@ mechanism is what was discussed, >>> though I think I'd lean towards something like @extschema:cube@ >>> to denote the schema of a referenced extension "cube". >>> >>> I'm not sure this is useful enough to break feature freeze for, >>> but I'm +1 for investigating it for v15. >> Just like we have a pseudo "$user" schema, could we have a pseudo >> "$extension" catalog? That should avoid changing grammar rules too much. >> >> CREATE TABLE unaccented_words ( >>word "$extension".citext.citext, >>CHECK (word = "$extension".unaccent.unaccent(word) >> ); > > Having a single variable $extension might help in many cases, but I don't see > how to use it to handle the remaining cross-extension references, such as > earthdistance needing to reference cube. 
Sorry, I hadn't realized that was a real example so I made up my own. Basically my idea is to use the fully qualified catalog.schema.object syntax where the catalog is a special "$extension" value (meaning we would have to forbid that as an actual database name) and the schema is the name of the extension whose schema we want. The object is then just the object.

CREATE DOMAIN earth AS "$extension".cube.cube
  CONSTRAINT not_point check("$extension".cube.cube_is_point(value))
  CONSTRAINT not_3d check("$extension".cube.cube_dim(value) <= 3)
  ...;

CREATE FUNCTION earth_box(earth, float8)
RETURNS "$extension".cube.cube
LANGUAGE sql
IMMUTABLE PARALLEL SAFE STRICT
RETURN "$extension".cube.cube_enlarge($1, gc_to_sec($2), 3);

If I had my druthers, we would spell it pg_extension instead of "$extension" because I hate double-quoting identifiers, but that's just bikeshedding and has little to do with the concept itself.

-- Vik Fearing
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96
On Mon, Apr 12, 2021 at 8:37 AM Tomas Vondra wrote: > Could be related to incremental sort, which allowed some gather merge > paths that were impossible before. We had a couple issues related to > that fixed in November, IIRC. Hmm, could be. Although, the stack trace at issue doesn't seem to show a call to create_incrementalsort_plan(). -- Robert Haas EDB: http://www.enterprisedb.com
Re: File truncation within PostgresNode::issues_sql_like() wrong on Windows
On Wed, Apr 14, 2021 at 05:10:41PM -0400, Andrew Dunstan wrote: > That seems rather heavy-handed. The buildfarm's approach is a bit > different. Essentially it seeks to the previous position of the log file > before reading contents. Here is its equivalent of slurp_file: > > use Fcntl qw(:seek); > sub file_lines > { > my $filename = shift; > my $filepos = shift; > my $handle; > open($handle, '<', $filename) || croak "opening $filename: $!"; > seek($handle, $filepos, SEEK_SET) if $filepos; > my @lines = <$handle>; > close $handle; > return @lines; > } That's a bit surprising to see that you can safely open a file handle with perl like that without using Win32API::File, and I would have assumed that this would have conflicted with the backend redirecting its output to stderr the same way as a truncation on Windows. > A client wanting what's done in PostgresNode would do something like: > > my $logpos = -s $logfile; > do_some_stuff(); > my @lines = file_lines($logfile, $logpos); > > This has the benefit of working the same on all platforms, and no > truncation, rotation, or restart is required. Jacob has suggested something like that a couple of days ago, but all this code was not centralized yet in a single place. For this code, the cleanest approach would be to extend slurp_file() with an extra argument to seek the file before fetching its contents based on a location given by the caller? Looking at the docs of Win32API::File, we'd need to use SetFilePointer() instead of seek(). -- Michael signature.asc Description: PGP signature
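[Editorial note] The record-the-offset-then-read pattern Andrew describes (and that the attached patch adopts for slurp_file) is easy to show standalone. Below is a C transcription for illustration only; slurp_from is a made-up name, not PostgreSQL code. The point is the same as the buildfarm's Perl: note the file's size before the action under test, then read only the bytes appended after that point, with no truncation or rotation needed.

```c
#include <stdio.h>
#include <stdlib.h>

/* Return everything appended to `path` at or after byte offset `start`,
 * as a malloc'd NUL-terminated string, or NULL on error.  Reading from
 * a saved offset replaces truncating the file between checks. */
static char *
slurp_from(const char *path, long start)
{
    FILE   *f = fopen(path, "rb");
    long    end;
    char   *buf;
    size_t  nread;

    if (f == NULL)
        return NULL;
    fseek(f, 0, SEEK_END);
    end = ftell(f);
    if (start < 0 || start > end)
        start = end;            /* e.g. file was rotated behind our back */
    fseek(f, start, SEEK_SET);
    buf = malloc((size_t) (end - start) + 1);
    if (buf == NULL)
    {
        fclose(f);
        return NULL;
    }
    nread = fread(buf, 1, (size_t) (end - start), f);
    buf[nread] = '\0';
    fclose(f);
    return buf;
}
```

A caller records the position first (the C analogue of `my $logpos = -s $logfile;`), does its work, and then calls slurp_from(path, logpos) to see only the new log entries.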
Re: New IndexAM API controlling index vacuum strategies
On Wed, Apr 14, 2021 at 5:55 PM Peter Geoghegan wrote: > On Wed, Apr 14, 2021 at 12:33 PM Andres Freund wrote: > > I'm getting a bit bothered by the speed at which you're pushing fairly > > substantial behavioural for vacuum. In this case without even a warning > > that you're about to do so. > > To a large degree the failsafe is something that is written in the > hope that it will never be needed. This is unlike most other things, > and has its own unique risks. > > I think that the proper thing to do is to accept a certain amount of > risk in this area. The previous status quo was *appalling*, and so it > seems very unlikely that the failsafe hasn't mostly eliminated a lot > of risk for users. That factor is not everything, but it should count > for a lot. The only way that we're going to have total confidence in > anything like this is through the experience of it mostly working over > several releases. I think this is largely missing the point Andres was making, which is that you made a significant behavior change after feature freeze without any real opportunity for discussion. More generally, you've changed a bunch of other stuff relatively quickly based on input from a relatively limited number of people. Now, it's fair to say that it's often hard to get input on things, and sometimes you have to just take your best shot and hope you're right. But in this particular case, you didn't even try to get broader participation or buy-in. That's not good. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Can a child process detect postmaster death when in pg_usleep?
Hi Bharath, On Thu, Apr 15, 2021 at 2:06 AM Bharath Rupireddy wrote: > 1) Is it really harmful to use pg_usleep in a postmaster child process > as it doesn't let the child process detect postmaster death? Yeah, that's a bad idea. Any long-term waiting (including short waits in a loop) should ideally be done with the latch infrastructure. One interesting and unusual case is recovery: it can run for a very long time without reaching a waiting primitive of any kind (other than LWLock, which doesn't count), because it can be busy applying records for hours at a time. In that case, we take special measures and explicitly check if the postmaster is dead in the redo loop. In theory, you could do the same in any loop containing pg_usleep() (we used to have several loops doing that, especially around replication code), but it'd be better to use the existing wait-event-multiplexing technology we have, and keep improving that. Some people have argued that long running queries should *also* react faster when the PM exits, a bit like recovery ... which leads to the next point... > 2) Can pg_usleep() always detect signals? I see the caution in the > pg_usleep function definition in pgsleep.c, saying the signal handling > is platform dependent. We have code blocks like below in the code. Do > we actually process interrupts before going to sleep with pg_usleep()? > while/for loop > { > .. > .. > CHECK_FOR_INTERRUPTS(); > pg_usleep(); > } > and > if (PostAuthDelay) >pg_usleep(); CHECK_FOR_INTERRUPTS() has nothing to do with postmaster death detection, currently, so that'd be for dealing with interrupts, not for that. Also, there would be a race: a signal on its own isn't enough on systems where we have them and where select() is guaranteed to wake up, because the signal might arrive between CFI() and pg_usleep(100 years). latch.c knows how to void such problems. 
There may be an argument that CFI() *should* be a potential postmaster-death-exit point, instead of having WaitLatch() (or its caller) handle it directly, but it's complicated. At the time the postmaster pipe system was invented we didn't have a signal for this so it wasn't even a candidate for treatment as an "interrupt". On systems that have postmaster death signals today (Linux + FreeBSD, but I suspect we can extend this to every Unix we support, see CF #3066, and a solution for Windows has been mentioned too), clearly the signal handler could set a new interrupt flag PostmasterLost + InterruptPending, and then CHECK_FOR_INTERRUPTS() could see it and exit. The argument against this is that exiting isn't always the right thing! In a couple of places, we do something special, such as printing a special error message (examples: sync rep and the main FEBE client read). Look for WL_POSTMASTER_DEATH (as opposed to WL_EXIT_ON_PM_DEATH). So I guess you'd need to reverse those decisions and standardise on "exit immediately, no message", or invent a way to suppress that behaviour in code regions.

> 3) Is it intentional to use pg_usleep in some places in the code? If > yes, what are they? At least, I see one place where it's intentional > in the wait_pid function which is used while running the regression > tests.

There are plenty of places that do a short sleep for various reasons, more like a deliberate stall or backoff or auth thing, and it's probably OK if they're shortish and not really a condition polling loop with an obvious latch/CV-based replacement. Note also that LWLock waits are similar.

> 4) Are there any places where we need to replace pg_usleep with > WaitLatch/equivalent of pg_sleep to detect the postmaster death > properly?

We definitely have replaced a lot of sleeps with latch.c primitives over the past few years, since we got WL_EXIT_ON_PM_DEATH and condition variables. There may be many more to improve... You mentioned autovacuum... 
yeah, Stephen fixed one of these with commit 4753ef37, but yeah it's not great to have those others in there...
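[Editorial note] The postmaster pipe mechanism referred to above is small enough to show standalone. The postmaster keeps the write end of a pipe open for its entire life and never writes to it; every child holds the read end, which becomes readable (EOF) the instant the postmaster exits and the kernel closes the write end. A self-contained sketch follows; the function name is invented for illustration, and the real entry points are WaitLatch() with WL_EXIT_ON_PM_DEATH and PostmasterIsAlive() in latch.c.

```c
#include <poll.h>

/* Wait up to timeout_ms on the read end of the "death pipe".
 * Returns 1 if the process holding the write end has exited (the fd
 * polls readable/hung-up on EOF), or 0 if the timeout expired with
 * the "postmaster" still alive.  Unlike pg_usleep(), this wakes up
 * promptly on parent exit instead of sleeping through it. */
static int
sleep_or_detect_pm_death(int death_fd, int timeout_ms)
{
    struct pollfd pfd;

    pfd.fd = death_fd;
    pfd.events = POLLIN;        /* EOF shows up as readable/POLLHUP */
    pfd.revents = 0;

    return poll(&pfd, 1, timeout_ms) > 0 ? 1 : 0;
}
```

latch.c multiplexes this same fd together with the latch and any socket of interest in one wait, which is why replacing a bare pg_usleep() with a WaitLatch() call gets death detection for free.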
Re: Converting contrib SQL functions to new style
> On Apr 14, 2021, at 2:47 PM, Vik Fearing wrote: > > On 4/14/21 7:36 PM, Tom Lane wrote: >> Mark Dilger writes: On Apr 13, 2021, at 3:26 PM, Tom Lane wrote: However I think we may still need an assumption that earthdistance and cube are in the same schema --- any comments on that? >> >>> This is probably not worth doing, and we are already past feature >>> freeze, but adding syntax to look up the namespace of an extension might >>> help. >> >> Yeah, that idea was discussed before (perhaps only in private >> security-team threads, though). We didn't do anything about it because >> at the time there didn't seem to be pressing need, but in the context >> of SQL function bodies there's an obvious use-case. >> >>> We could get something like this working just inside the CREATE EXTENSION >>> command if we expanded on the @extschema@ idea a bit. At first I thought >>> this idea would suffer race conditions with concurrent modifications of >>> pg_extension or pg_namespace, but it looks like we already have a snapshot >>> when processing the script file, so: >> >>> -CREATE DOMAIN earth AS cube >>> +CREATE DOMAIN @@earthdistance@@::earth AS @@cube@@::cube >> >> Right, extending the @extschema@ mechanism is what was discussed, >> though I think I'd lean towards something like @extschema:cube@ >> to denote the schema of a referenced extension "cube". >> >> I'm not sure this is useful enough to break feature freeze for, >> but I'm +1 for investigating it for v15. > Just like we have a pseudo "$user" schema, could we have a pseudo > "$extension" catalog? That should avoid changing grammar rules too much. > > CREATE TABLE unaccented_words ( >word "$extension".citext.citext, >CHECK (word = "$extension".unaccent.unaccent(word) > ); Having a single variable $extension might help in many cases, but I don't see how to use it to handle the remaining cross-extension references, such as earthdistance needing to reference cube. 
— Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: New IndexAM API controlling index vacuum strategies
On Wed, Apr 14, 2021 at 12:33 PM Andres Freund wrote: > I'm getting a bit bothered by the speed at which you're pushing fairly > substantial behavioural changes for vacuum. In this case without even a warning > that you're about to do so. To a large degree the failsafe is something that is written in the hope that it will never be needed. This is unlike most other things, and has its own unique risks. I think that the proper thing to do is to accept a certain amount of risk in this area. The previous status quo was *appalling*, and so it seems very unlikely that the failsafe hasn't mostly eliminated a lot of risk for users. That factor is not everything, but it should count for a lot. The only way that we're going to have total confidence in anything like this is through the experience of it mostly working over several releases. > I don't think it's that blindingly obvious that skipping truncation is > the right thing to do that it doesn't need review. Consider e.g. the > case that you're close to wraparound because you ran out of space for > the amount of WAL VACUUM produces, previously leading to autovacuums > being aborted / the server restarted. The user might then stop regular > activity and try to VACUUM. Skipping the truncation might now make it > harder to actually vacuum all the tables without running out of space. Note that the criteria for whether or not "hastup=false" for a page is slightly different in lazy_scan_prune() -- I added a comment that points this out directly (the fact that it works that way is not new, and might have originally been a happy mistake). Unlike count_nondeletable_pages(), which is used by heap truncation, lazy_scan_prune() is concerned about whether or not it's *likely to be possible* to truncate away the page by the time lazy_truncate_heap() is reached (if it is reached at all). And so it's optimistic about LP_DEAD items that it observes being removed by lazy_vacuum_heap_page() before we get to lazy_truncate_heap(). 
It's inherently race-prone anyway, so it might as well assume that LP_DEAD items will eventually become LP_UNUSED items later on. It follows that the chances of lazy_truncate_heap() failing to truncate anything when the failsafe has already triggered are exceptionally high -- all the LP_DEAD items are still there, and cannot be safely removed during truncation (for the usual reasons). I just went one step further than that in this recent commit. I didn't point these details out before now because (to me) this is beside the point. Which is that the failsafe is just that -- a failsafe. Anything that adds unnecessary unpredictable delay in reaching the point of advancing relfrozenxid should be avoided. (Besides, the design of should_attempt_truncation() and lazy_truncate_heap() is very far from guaranteeing that truncation will take place at the best of times.) FWIW, my intention is to try to get as much feedback about the failsafe as I possibly can -- it's hard to reason about exceptional events. I'm also happy to further discuss the specifics with you now. -- Peter Geoghegan
Re: Converting contrib SQL functions to new style
On 4/14/21 7:36 PM, Tom Lane wrote: > Mark Dilger writes: >>> On Apr 13, 2021, at 3:26 PM, Tom Lane wrote: >>> However I think we may still need an assumption that earthdistance >>> and cube are in the same schema --- any comments on that? > >> This is probably not worth doing, and we are already past feature >> freeze, but adding syntax to look up the namespace of an extension might >> help. > > Yeah, that idea was discussed before (perhaps only in private > security-team threads, though). We didn't do anything about it because > at the time there didn't seem to be pressing need, but in the context > of SQL function bodies there's an obvious use-case. > >> We could get something like this working just inside the CREATE EXTENSION >> command if we expanded on the @extschema@ idea a bit. At first I thought >> this idea would suffer race conditions with concurrent modifications of >> pg_extension or pg_namespace, but it looks like we already have a snapshot >> when processing the script file, so: > >> -CREATE DOMAIN earth AS cube >> +CREATE DOMAIN @@earthdistance@@::earth AS @@cube@@::cube > > Right, extending the @extschema@ mechanism is what was discussed, > though I think I'd lean towards something like @extschema:cube@ > to denote the schema of a referenced extension "cube". > > I'm not sure this is useful enough to break feature freeze for, > but I'm +1 for investigating it for v15. Just like we have a pseudo "$user" schema, could we have a pseudo "$extension" catalog? That should avoid changing grammar rules too much.

CREATE TABLE unaccented_words (
    word "$extension".citext.citext,
    CHECK (word = "$extension".unaccent.unaccent(word))
);

-- Vik Fearing
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96
On Mon, Apr 12, 2021 at 8:37 AM Tomas Vondra wrote: > > On 4/12/21 2:24 PM, Luc Vlaming wrote: > > Hi, > > > > When trying to run on master (but afaik also PG-13) TPC-DS queries 94, > > 95 and 96 on a SF10 I get the error "could not find pathkey item to sort". > > When I disable enable_gathermerge the problem goes away and then the > > plan for query 94 looks like below. I tried figuring out what the > > problem is but to be honest I would need some pointers as the code that > > tries to matching equivalence members in prepare_sort_from_pathkeys is > > something i'm really not familiar with. > > > > Could be related to incremental sort, which allowed some gather merge > paths that were impossible before. We had a couple issues related to > that fixed in November, IIRC. > > > To reproduce you can either ingest and test using the toolkit I used too > > (see https://github.com/swarm64/s64da-benchmark-toolkit/), or > > alternatively just use the schema (see > > https://github.com/swarm64/s64da-benchmark-toolkit/tree/master/benchmarks/tpcds/schemas/psql_native) > > > > Thanks, I'll see if I can reproduce that with your schema. 
> > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company

The query in question is:

select count(*)
from store_sales, household_demographics, time_dim, store
where ss_sold_time_sk = time_dim.t_time_sk
  and ss_hdemo_sk = household_demographics.hd_demo_sk
  and ss_store_sk = s_store_sk
  and time_dim.t_hour = 15
  and time_dim.t_minute >= 30
  and household_demographics.hd_dep_count = 7
  and store.s_store_name = 'ese'
order by count(*)
limit 100;

From debugging output it looks like this is the plan being chosen (cheapest total path):

Gather(store_sales household_demographics time_dim) rows=60626 cost=3145.73..699910.15
    HashJoin(store_sales household_demographics time_dim) rows=25261 cost=2145.73..692847.55
        clauses: store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
        HashJoin(store_sales time_dim) rows=252609 cost=1989.73..692028.08
            clauses: store_sales.ss_sold_time_sk = time_dim.t_time_sk
            SeqScan(store_sales) rows=11998564 cost=0.00..658540.64
            SeqScan(time_dim) rows=1070 cost=0.00..1976.35
        SeqScan(household_demographics) rows=720 cost=0.00..147.00

prepare_sort_from_pathkeys fails to find a pathkey because tlist_member_ignore_relabel returns null -- which seemed weird because the sortexpr is an Aggref (in a single-member equivalence class) and the tlist contains a single member that's also an Aggref. It turns out that the only difference between the two Aggrefs is that the tlist entry has "aggsplit = AGGSPLIT_INITIAL_SERIAL" while the sortexpr has aggsplit = AGGSPLIT_SIMPLE. That's as far as I've gotten so far, but I figured I'd get that info out to see if it means anything obvious to anyone else.

James
Re: File truncation within PostgresNode::issues_sql_like() wrong on Windows
On 4/14/21 4:13 AM, Michael Paquier wrote: > Hi all, > > As fairywren has proved a couple of days ago, it is not really a good > idea to rely on a file truncation to check for patterns in the logs of > the backend: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren=2021-04-07%2013%3A29%3A28 > > Visibly, a logic based on the log file truncation fails on Windows > because of the concurrent access of the backend that outputs its logs > there. In PostgresNode.pm, connect_ok() and connect_fails() enforce > a rotation of the log file before restarting the server on Windows to > make sure that a given step does not find logs generated by a previous > test, but that's not the case of issues_sql_like(). Looking at the > existing tests using this routine (src/bin/scripts/), I have found one > test in 090_reindexdb.pl that could lead to a false positive. The > test is marked in the patch attached, just for awareness. > > Would there be any objections to change this routine so as we avoid > the file truncation on Windows? The patch attached achieves that. > > Any thoughts? That seems rather heavy-handed. The buildfarm's approach is a bit different. Essentially it seeks to the previous position of the log file before reading contents. Here is its equivalent of slurp_file:

use Fcntl qw(:seek);

sub file_lines
{
	my $filename = shift;
	my $filepos  = shift;
	my $handle;
	open($handle, '<', $filename) || croak "opening $filename: $!";
	seek($handle, $filepos, SEEK_SET) if $filepos;
	my @lines = <$handle>;
	close $handle;
	return @lines;
}

A client wanting what's done in PostgresNode would do something like:

my $logpos = -s $logfile;
do_some_stuff();
my @lines = file_lines($logfile, $logpos);

This has the benefit of working the same on all platforms, and no truncation, rotation, or restart is required. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Possible typo/unclear comment in joinpath.c
On Wed, Apr 14, 2021 at 1:27 PM Tom Lane wrote: > > Justin Pryzby writes: > > On Wed, Apr 14, 2021 at 11:36:38AM -0400, James Coleman wrote: > >> In joinpath.c three times we reference "extra_lateral_rels" (with > >> underscores like it's a field), but as far as I can tell that's not a > >> field anywhere in the source code, and looking at the code that > >> follows it seems like it should be referencing "lateral_relids" (and > >> the "extra" is really "extra [in relation to relids]"). > > > It looks like a loose end from > > > commit edca44b1525b3d591263d032dc4fe500ea771e0e > > Author: Tom Lane > > Date: Mon Dec 7 18:56:14 2015 -0500 > > > Simplify LATERAL-related calculations within add_paths_to_joinrel(). > > Yeah :-(. I'm usually pretty careful about grepping for comment > references as well as code references to a field when I do something > like that, but obviously I missed that step that time. > > Will fix, thanks James! > > regards, tom lane Thanks for fixing, Tom! James
Re: Converting contrib SQL functions to new style
Andrew Dunstan writes: > On 4/14/21 2:03 PM, Tom Lane wrote: >> This may mean that squeezing these contrib changes into v14 is a lost >> cause. We certainly shouldn't try to do what I suggest above for >> v14; but without it, these changes are just moving the security >> issue to a different place rather than eradicating it completely. > Is there anything else we should be doing along the eat your own dogfood > line that don't have these security implications? We can still convert the initdb-created SQL functions to new style, since there's no security threat during initdb. I'll make a patch for that soon. regards, tom lane
Re: Possible typo/unclear comment in joinpath.c
On Wed, Apr 14, 2021 at 2:32 PM Tom Lane wrote: > No, I take that back. There were no references to extra_lateral_rels > after that commit; these comments were added by 45be99f8c, about > six weeks later. The latter was a pretty large patch and had > presumably been under development for quite some time, so the comments > were probably accurate when written but didn't get updated. Woops. -- Robert Haas EDB: http://www.enterprisedb.com
Re: New IndexAM API controlling index vacuum strategies
Hi, On 2021-04-13 12:59:03 -0700, Peter Geoghegan wrote: > I agree. Bypassing heap truncation is exactly the kind of thing that > risks adding significant, unpredictable delay at a time when we need > to advance relfrozenxid as quickly as possible. > > I pushed a trivial commit that makes the failsafe bypass heap > truncation as well just now. I'm getting a bit bothered by the speed at which you're pushing fairly substantial behavioural changes for vacuum. In this case without even a warning that you're about to do so. I don't think it's that blindingly obvious that skipping truncation is the right thing to do that it doesn't need review. Consider e.g. the case that you're close to wraparound because you ran out of space for the amount of WAL VACUUM produces, previously leading to autovacuums being aborted / the server restarted. The user might then stop regular activity and try to VACUUM. Skipping the truncation might now make it harder to actually vacuum all the tables without running out of space. FWIW, I also don't like that substantial behaviour changes to how vacuum works were discussed only in a thread titled "New IndexAM API controlling index vacuum strategies". Greetings, Andres Freund
Re: Converting contrib SQL functions to new style
On 4/14/21 2:03 PM, Tom Lane wrote: > Robert Haas writes: >> On Wed, Apr 14, 2021 at 1:41 PM Tom Lane wrote: >>> Could we hack things so that extension scripts are only allowed to >>> reference objects created (a) by the system, (b) earlier in the >>> same script, or (c) owned by one of the declared prerequisite >>> extensions? Seems like that might provide a pretty bulletproof >>> defense against trojan-horse objects, though I'm not sure how much >>> of a pain it'd be to implement. >> That doesn't seem like a crazy idea, but the previous idea of having >> some magic syntax that means "the schema where extension FOO is" seems >> like it might be easier to implement and more generally useful. > I think that's definitely useful, but it's not a fix for the > reference-capture problem unless you care to assume that the other > extension's schema is free of trojan-horse objects. So I'm thinking > that we really ought to pursue both ideas. > > This may mean that squeezing these contrib changes into v14 is a lost > cause. We certainly shouldn't try to do what I suggest above for > v14; but without it, these changes are just moving the security > issue to a different place rather than eradicating it completely. > > Is there anything else we should be doing along the eat your own dogfood line that doesn't have these security implications? cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Tue, Apr 13, 2021 at 01:30:16PM -0400, Álvaro Herrera wrote: > On 2021-Apr-12, Bruce Momjian wrote: > > > OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I > > have not changed any of the APIs which existed before this feature was > > added, and are called "queryid" or "queryId" --- it is kind of a mess. > > I assume I should leave those unchanged. It will also need a catversion > > bump. > > I think it is fine actually. These names appear in structs Query and > PlannedStmt, and every single member of those already uses camelCase > naming. Changing those to use "query_id" would look out of place. > You did change the one in PgBackendStatus to st_query_id, which also > matches the naming style in that struct, so that looks fine also. > > So I'm -1 on Julien's first proposed change, and +1 on his second > proposed change (the name of the first argument of > pgstat_report_query_id should be query_id). Thanks for your analysis. Updated patch attached with the change suggested above. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion. query_id.diff.gz Description: application/gzip
Re: Possible typo/unclear comment in joinpath.c
I wrote: > Justin Pryzby writes: >> It looks like a loose end from >> commit edca44b1525b3d591263d032dc4fe500ea771e0e > Yeah :-(. I'm usually pretty careful about grepping for comment > references as well as code references to a field when I do something > like that, but obviously I missed that step that time. No, I take that back. There were no references to extra_lateral_rels after that commit; these comments were added by 45be99f8c, about six weeks later. The latter was a pretty large patch and had presumably been under development for quite some time, so the comments were probably accurate when written but didn't get updated. regards, tom lane
Re: Converting contrib SQL functions to new style
Robert Haas writes: > On Wed, Apr 14, 2021 at 1:41 PM Tom Lane wrote: >> Could we hack things so that extension scripts are only allowed to >> reference objects created (a) by the system, (b) earlier in the >> same script, or (c) owned by one of the declared prerequisite >> extensions? Seems like that might provide a pretty bulletproof >> defense against trojan-horse objects, though I'm not sure how much >> of a pain it'd be to implement. > That doesn't seem like a crazy idea, but the previous idea of having > some magic syntax that means "the schema where extension FOO is" seems > like it might be easier to implement and more generally useful. I think that's definitely useful, but it's not a fix for the reference-capture problem unless you care to assume that the other extension's schema is free of trojan-horse objects. So I'm thinking that we really ought to pursue both ideas. This may mean that squeezing these contrib changes into v14 is a lost cause. We certainly shouldn't try to do what I suggest above for v14; but without it, these changes are just moving the security issue to a different place rather than eradicating it completely. regards, tom lane
Re: Converting contrib SQL functions to new style
On Wed, Apr 14, 2021 at 1:41 PM Tom Lane wrote: > Doesn't help that much, because you still have to reference objects > already created by your own extension, so it's hard to see how the > target schema won't need to be in the path. Oh, woops. > Could we hack things so that extension scripts are only allowed to > reference objects created (a) by the system, (b) earlier in the > same script, or (c) owned by one of the declared prerequisite > extensions? Seems like that might provide a pretty bulletproof > defense against trojan-horse objects, though I'm not sure how much > of a pain it'd be to implement. That doesn't seem like a crazy idea, but the previous idea of having some magic syntax that means "the schema where extension FOO is" seems like it might be easier to implement and more generally useful. If we taught the core system that %!!**&^%?(earthdistance) means "the schema where the earthdistance is located" that syntax might get some use even outside of extension creation scripts, which seems like it could be a good thing, just because code that is used more widely is more likely to have been debugged to the point where it actually works. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Options given both on cmd-line and in the config with different values
Honza Horak writes: > I'm trying to understand what is happening in the following bug report: > https://bugzilla.redhat.com/show_bug.cgi?id=1935301 > The upgrade process makes it a bit more difficult, but it seems to boil > down to this problem -- even when pg_ctl gets clear guidance where to > find datadir using -D option on the command-line, it forgets this > guidance once finding data_directory option in the postgresql.conf. > Is this the expected behavior actually? The rule actually is that -D on the command line says where to find the configuration file. While -D is then also the default for where to find the data directory, the config file can override that by giving data_directory explicitly. This is intended to support situations where the config file is kept outside the data directory for management reasons. If you are not actively doing that, I'd recommend *not* setting data_directory explicitly in the file. While I've not studied the bug report carefully, it sounds like the update process you're using involves copying the old config file across verbatim. You'd at minimum need to filter out data_directory and related settings to make that safe. regards, tom lane
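Concretely, the precedence Tom describes can be laid out like this (the paths below are purely illustrative, not a recommendation):

```
# /etc/postgresql/postgresql.conf -- a config file kept outside the data directory
data_directory = '/var/lib/pgsql/data'   # overrides -D as the data location

# Started with:
#   pg_ctl -D /etc/postgresql start
# -D names the directory holding postgresql.conf; data_directory inside the
# file, if present, then decides where the data files actually live.
# Without an explicit data_directory, -D serves both roles.
```

So a copied-over config file that still carries the old cluster's data_directory will silently redirect pg_ctl away from the -D value, which matches the symptom in the bug report.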
Re: Converting contrib SQL functions to new style
Robert Haas writes: > On Wed, Apr 14, 2021 at 10:49 AM Tom Lane wrote: >> The situation of interest is where you are trying to install an extension >> into a schema that also contains malicious objects. We've managed to make >> most of the commands you might use in an extension script secure against >> that situation, and Noah wants to hold SQL-function creation to that same >> standard. > Oh, I was forgetting that the creation schema has to be first in your > search path. :-( > Does the idea of allowing the creation schema to be set separately > have any legs? Because it seems like that would help here. Doesn't help that much, because you still have to reference objects already created by your own extension, so it's hard to see how the target schema won't need to be in the path. [ thinks for awhile ... ] Could we hack things so that extension scripts are only allowed to reference objects created (a) by the system, (b) earlier in the same script, or (c) owned by one of the declared prerequisite extensions? Seems like that might provide a pretty bulletproof defense against trojan-horse objects, though I'm not sure how much of a pain it'd be to implement. regards, tom lane
Re: Converting contrib SQL functions to new style
Mark Dilger writes: >> On Apr 13, 2021, at 3:26 PM, Tom Lane wrote: >> However I think we may still need an assumption that earthdistance >> and cube are in the same schema --- any comments on that? > This is probably not worth doing, and we are already past feature > freeze, but adding syntax to look up the namespace of an extension might > help. Yeah, that idea was discussed before (perhaps only in private security-team threads, though). We didn't do anything about it because at the time there didn't seem to be pressing need, but in the context of SQL function bodies there's an obvious use-case. > We could get something like this working just inside the CREATE EXTENSION > command if we expanded on the @extschema@ idea a bit. At first I thought > this idea would suffer race conditions with concurrent modifications of > pg_extension or pg_namespace, but it looks like we already have a snapshot > when processing the script file, so: > -CREATE DOMAIN earth AS cube > +CREATE DOMAIN @@earthdistance@@::earth AS @@cube@@::cube Right, extending the @extschema@ mechanism is what was discussed, though I think I'd lean towards something like @extschema:cube@ to denote the schema of a referenced extension "cube". I'm not sure this is useful enough to break feature freeze for, but I'm +1 for investigating it for v15. regards, tom lane
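For illustration only, the substitution being discussed could behave roughly like the sketch below. The @extschema:name@ token syntax is the proposal here, not an implemented API, and the dictionary argument stands in for the real catalog lookup (pg_extension joined to pg_namespace under the script's snapshot):

```python
import re

def expand_extschema_tokens(script, schema_of_extension):
    """Expand hypothetical @extschema:extname@ tokens in an extension
    script, replacing each with the schema that holds the named
    (prerequisite) extension."""
    def replace(match):
        extname = match.group(1)
        if extname not in schema_of_extension:
            raise ValueError(f'required extension "{extname}" is not installed')
        return schema_of_extension[extname]

    return re.sub(r"@extschema:([A-Za-z_][A-Za-z0-9_]*)@", replace, script)
```

With {"cube": "public"}, a script line like CREATE DOMAIN earth AS @extschema:cube@.cube would come out as CREATE DOMAIN earth AS public.cube, pinning the reference to the schema that actually contains the prerequisite extension instead of trusting search_path.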
Re: Possible typo/unclear comment in joinpath.c
Justin Pryzby writes: > On Wed, Apr 14, 2021 at 11:36:38AM -0400, James Coleman wrote: >> In joinpath.c three times we reference "extra_lateral_rels" (with >> underscores like it's a field), but as far as I can tell that's not a >> field anywhere in the source code, and looking at the code that >> follows it seems like it should be referencing "lateral_relids" (and >> the "extra" is really "extra [in relation to relids]"). > It looks like a loose end from > commit edca44b1525b3d591263d032dc4fe500ea771e0e > Author: Tom Lane > Date: Mon Dec 7 18:56:14 2015 -0500 > Simplify LATERAL-related calculations within add_paths_to_joinrel(). Yeah :-(. I'm usually pretty careful about grepping for comment references as well as code references to a field when I do something like that, but obviously I missed that step that time. Will fix, thanks James! regards, tom lane
Re: Converting contrib SQL functions to new style
On Wed, Apr 14, 2021 at 10:49 AM Tom Lane wrote: > The situation of interest is where you are trying to install an extension > into a schema that also contains malicious objects. We've managed to make > most of the commands you might use in an extension script secure against > that situation, and Noah wants to hold SQL-function creation to that same > standard. Oh, I was forgetting that the creation schema has to be first in your search path. :-( Does the idea of allowing the creation schema to be set separately have any legs? Because it seems like that would help here. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Converting contrib SQL functions to new style
> On Apr 13, 2021, at 3:26 PM, Tom Lane wrote: > > However I think we may still need an assumption that earthdistance > and cube are in the same schema --- any comments on that? This is probably not worth doing, and we are already past feature freeze, but adding syntax to look up the namespace of an extension might help. The problem seems to be that we can't syntactically refer to the schema of an extension. We have to instead query pg_catalog.pg_extension joined against pg_catalog.pg_namespace and then interpolate the namespace name into strings that get executed, which is ugly. This syntax is perhaps a non-starter, but conceptually something like:

-CREATE DOMAIN earth AS cube
+CREATE DOMAIN earthdistance::->earth AS cube::->cube

Then we'd perhaps extend RangeVar with an extensionname field and have either a schemaname or an extensionname be looked up in places where we currently lookup schemas, adding a catcache for extensions. (Like I said, probably not worth doing.) We could get something like this working just inside the CREATE EXTENSION command if we expanded on the @extschema@ idea a bit. At first I thought this idea would suffer race conditions with concurrent modifications of pg_extension or pg_namespace, but it looks like we already have a snapshot when processing the script file, so:

-CREATE DOMAIN earth AS cube
+CREATE DOMAIN @@earthdistance@@::earth AS @@cube@@::cube

or such, with @@foo@@ being parsed out, looked up in pg_extension join pg_namespace, and substituted back in. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Bogus collation version recording in recordMultipleDependencies
I noticed some broken-looking logic in recordMultipleDependencies concerning how it records collation versions. It was a bit harder than I expected to demonstrate the bugs, but I eventually succeeded with

u8=# create function foo(varchar) returns bool language sql return false;
CREATE FUNCTION
u8=# create collation mycoll from "en_US";
CREATE COLLATION
u8=# CREATE DOMAIN d4 AS character varying(3) COLLATE "aa_DJ"
       CONSTRAINT yes_or_no_check CHECK (value = 'YES' collate mycoll or foo(value));
CREATE DOMAIN
u8=# select objid, pg_describe_object(classid,objid,objsubid) as obj,
       pg_describe_object(refclassid,refobjid,refobjsubid) as ref,
       deptype, refobjversion
     from pg_depend where objid = 'd4'::regtype;
 objid |   obj   |        ref        | deptype | refobjversion
-------+---------+-------------------+---------+---------------
 37421 | type d4 | schema public     | n       |
 37421 | type d4 | collation "aa_DJ" | n       |
(2 rows)

u8=# select objid, pg_describe_object(classid,objid,objsubid) as obj,
       pg_describe_object(refclassid,refobjid,refobjsubid) as ref,
       deptype, refobjversion
     from pg_depend where refobjid = 'd4'::regtype;
 objid |            obj             |   ref   | deptype | refobjversion
-------+----------------------------+---------+---------+---------------
 37420 | type d4[]                  | type d4 | i       |
 37422 | constraint yes_or_no_check | type d4 | a       |
(2 rows)

u8=# select objid, pg_describe_object(classid,objid,objsubid) as obj,
       pg_describe_object(refclassid,refobjid,refobjsubid) as ref,
       deptype, refobjversion
     from pg_depend where objid = 37422;
 objid |            obj             |               ref               | deptype | refobjversion
-------+----------------------------+---------------------------------+---------+---------------
 37422 | constraint yes_or_no_check | type d4                         | a       |
 37422 | constraint yes_or_no_check | collation mycoll                | n       | 2.28
 37422 | constraint yes_or_no_check | function foo(character varying) | n       | 2.28
 37422 | constraint yes_or_no_check | collation "default"             | n       |
(4 rows)

(This is in a glibc-based build, with C as the database's default collation.) One question here is whether it's correct that the domain's dependency on collation "aa_DJ" is unversioned. Maybe that's intentional, but it seems worth asking.
Anyway, there are two pretty obvious bugs in the dependencies for the domain's CHECK constraint: the version for collation mycoll leaks into the entry for function foo, and an entirely useless (because unversioned) dependency is recorded on the default collation. ... well, it's almost entirely useless. If we fix things to not do that (as per patch 0001 below), the results of the create_index regression test become unstable, because there's two queries that inquire into the dependencies of indexes, and their results change depending on whether the default collation has a version or not. I'd be inclined to just take out the portions of that test that depend on that question, but maybe somebody will complain that there's a loss of useful coverage. I don't agree, but maybe I'll be overruled. If we do feel we need to stay bug-compatible with that behavior, then the alternate 0002 patch just fixes the version-leakage-across-entries problem, while still removing the unnecessary assumption that C, POSIX, and DEFAULT are the only pinned collations. (To be clear: 0002 passes check-world as-is, while 0001 is not committable without some regression-test fiddling.) Thoughts?

regards, tom lane

diff --git a/src/backend/catalog/pg_depend.c b/src/backend/catalog/pg_depend.c
index 362db7fe91..1217c01b8a 100644
--- a/src/backend/catalog/pg_depend.c
+++ b/src/backend/catalog/pg_depend.c
@@ -73,7 +73,6 @@ recordMultipleDependencies(const ObjectAddress *depender,
 				max_slots,
 				slot_init_count,
 				slot_stored_count;
-	char	   *version = NULL;
 
 	if (nreferenced <= 0)
 		return;					/* nothing to do */
@@ -104,31 +103,22 @@ recordMultipleDependencies(const ObjectAddress *depender,
 	slot_init_count = 0;
 	for (i = 0; i < nreferenced; i++, referenced++)
 	{
-		bool		ignore_systempin = false;
+		char	   *version = NULL;
 
 		if (record_version)
 		{
 			/* For now we only know how to deal with collations. */
 			if (referenced->classId == CollationRelationId)
 			{
-				/* C and POSIX don't need version tracking. */
+				/* These are unversioned, so don't waste cycles on them. */
 				if (referenced->objectId == C_COLLATION_OID ||
 					referenced->objectId == POSIX_COLLATION_OID)
 					continue;
 
 				version = get_collation_version_for_oid(referenced->objectId,
 														false);
-
-				/*
-				 * Default collation is pinned, so we need to force recording
-				 * the dependency to store the version.
-				 */
-				if (referenced->objectId == DEFAULT_COLLATION_OID)
-					ignore_systempin = true;
 			}
 		}
-		else
-
Re: pg_amcheck contrib application
On Mon, Apr 12, 2021 at 11:06 PM Mark Dilger wrote:

> It now reports:
>
> # heap table "postgres"."public"."test", block 0, offset 18, attribute 2:
> # toast value 16461 missing chunk 3 with expected size 1996
> # heap table "postgres"."public"."test", block 0, offset 18, attribute 2:
> # toast value 16461 was expected to end at chunk 6 with total size 1, but ended at chunk 5 with total size 8004
>
> It sounds like you weren't expecting the second of these reports. I think it is valuable, especially when there are multiple missing chunks and multiple extraneous chunks, as it makes it easier for the user to reconcile the missing chunks against the extraneous chunks.

I wasn't, but I'm not overwhelmingly opposed to it, either. I do think I would be in favor of splitting this kind of thing up into two messages:

# toast value 16459 unexpected chunks 1000 through 1004 each with size 1996 followed by chunk 1005 with size 20

We'll have fewer message variants and I don't think any real regression in usability if we say:

# toast value 16459 has unexpected chunks 1000 through 1004 each with size 1996
# toast value 16459 has unexpected chunk 1005 with size 20

(Notice that I also inserted "has" so that the sentence has a verb. Or we could use "contains.")

I committed 0001.

-- Robert Haas EDB: http://www.enterprisedb.com
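The grouping Robert suggests amounts to collapsing consecutive runs of equal-size chunks into one message each. A rough Python sketch of that reporting logic (the helper name and message wording are illustrative only, not pg_amcheck's actual code; it assumes the chunk list is sorted and runs of equal size are consecutive):

```python
from itertools import groupby

def report_unexpected_chunks(toast_value, chunks):
    """chunks: list of (chunk_seq, chunk_size) pairs, sorted by chunk_seq.
    Emits one message per run of equal-size chunks."""
    messages = []
    for size, run in groupby(chunks, key=lambda c: c[1]):
        run = list(run)
        first, last = run[0][0], run[-1][0]
        if first == last:
            messages.append(
                f"toast value {toast_value} has unexpected chunk {first} with size {size}")
        else:
            messages.append(
                f"toast value {toast_value} has unexpected chunks {first} through {last} each with size {size}")
    return messages
```

Feeding it the example from the mail (five 1996-byte chunks followed by one 20-byte chunk) yields exactly the two proposed messages, one per size run.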
Re: jsonb subscripting assignment performance
On Wed, Apr 14, 2021 at 10:57 AM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > sure - there is big room for optimization. But this patch was big enough > > without its optimization. And it was not clean, if I will be committed or > > not (it waited in commitfest application for 4 years). So I accepted > > implemented behaviour (without inplace update). Now, this patch is in core, > > and anybody can work on others possible optimizations. > > Right, jsonb subscripting deals mostly with the syntax part and doesn't > change internal jsonb behaviour. If I understand the original question > correctly, "in-place" here means updating of e.g. just one particular > key within a jsonb object, since jsonb_set looks like an overwrite of > the whole jsonb. If so, then update will still cause the whole jsonb to > be updated, there is no partial update functionality for the on-disk > format. Although there is work going on to optimize this in case when > jsonb is big enough to be put into a toast table (partial toast > decompression thread, or bytea appendable toast). +1 -- Regards, Alexander Korotkov
[PATCH] expand the units that pg_size_pretty supports on output
Hi folks, Enclosed is a patch that expands the unit output for pg_size_pretty(numeric) going up to Yottabytes; I reworked the existing numeric output code to account for the larger number of units we're using rather than just adding nesting levels. There are also a few other places that could benefit from expanded units, but this is a standalone starting point. Best, David 0001-Expand-the-units-that-pg_size_pretty-numeric-knows-a.patch Description: Binary data
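For reference, pg_size_pretty()'s unit selection can be sketched as a loop over a unit table, so extending the output range is mostly a matter of lengthening that table. This is an illustrative Python approximation of the behaviour (stay in a unit until the value reaches 10 * 1024, then shift down with round-half-up), not the patch itself; negative-value rounding in the real C code may differ in detail:

```python
def size_pretty(size):
    """Approximate pg_size_pretty(): binary units, switching units
    once the magnitude reaches 10240 of the current unit."""
    units = ["bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]
    for unit in units[:-1]:
        if abs(size) < 10 * 1024:
            return f"{size} {unit}"
        size = (size + 512) // 1024  # divide by 1024, rounding half up
    return f"{size} {units[-1]}"
```

For example, 10240 bytes prints as "10 kB", while 1 TiB prints as "1024 GB" because 1024 is still below the 10240 threshold for switching to TB.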
Re: Possible typo/unclear comment in joinpath.c
On Wed, Apr 14, 2021 at 11:36:38AM -0400, James Coleman wrote: > In joinpath.c three times we reference "extra_lateral_rels" (with > underscores like it's a field), but as far as I can tell that's not a > field anywhere in the source code, and looking at the code that > follows it seems like it should be referencing "lateral_relids" (and > the "extra" is really "extra [in relation to relids]"). It looks like a loose end from commit edca44b1525b3d591263d032dc4fe500ea771e0e Author: Tom Lane Date: Mon Dec 7 18:56:14 2015 -0500 Simplify LATERAL-related calculations within add_paths_to_joinrel(). -- Justin
Options given both on cmd-line and in the config with different values
Hello hackers, I'm trying to understand what is happening in the following bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1935301 The upgrade process makes it a bit more difficult, but it seems to boil down to this problem -- even when pg_ctl gets clear guidance on where to find the datadir via the -D option on the command line, it forgets this guidance once it finds the data_directory option in postgresql.conf. Is this actually the expected behavior? Or is the behavior in this case (i.e. when the same option is specified on the cmd-line and also in the datadir, with different values) defined at all? (I couldn't find it in the docs, and even googling does not turn up anything useful.) Thanks for any tips, Honza
Possible typo/unclear comment in joinpath.c
In joinpath.c three times we reference "extra_lateral_rels" (with underscores like it's a field), but as far as I can tell that's not a field anywhere in the source code, and looking at the code that follows it seems like it should be referencing "lateral_relids" (and the "extra" is really "extra [in relation to relids]"). Assuming that interpretation is correct, I've attached a patch to change all three occurrences to "extra lateral_relids" to reduce confusion. Thanks, James v1-0001-Fix-extra_lateral_rels-typo.patch Description: Binary data
Re: Converting contrib SQL functions to new style
Robert Haas writes: > On Wed, Apr 14, 2021 at 8:58 AM Noah Misch wrote: >> Once CREATE EXTENSION is over, things are a great deal safer under this >> proposal, as you say. I suspect it makes CREATE EXTENSION more hazardous. >> Today, typical SQL commands in extension creation scripts don't activate >> inexact argument type matching. You were careful to make each script clear >> the search_path around commands deviating from that (commit 7eeb1d9). I >> think >> "CREATE FUNCTION plus1dot1(int) RETURNS numeric LANGUAGE SQL RETURN $1 + >> 1.1;" >> in a trusted extension script would constitute a security vulnerability, >> since >> it can lock in the wrong operator. > I don't understand how that can happen, unless we've failed to secure > the search_path. And, if we've failed to secure the search_path, I > think we are in a lot of trouble no matter what else we do. The situation of interest is where you are trying to install an extension into a schema that also contains malicious objects. We've managed to make most of the commands you might use in an extension script secure against that situation, and Noah wants to hold SQL-function creation to that same standard. My concern in this patch is rendering SQL functions safe against untrusted search_path at *time of use*, which is really an independent security concern. If you're willing to assume there's nothing untrustworthy in your search_path, then there's no issue and nothing to fix. Unfortunately, that seems like a rather head-in-the-sand standpoint. regards, tom lane
Re: [PATCH] Identify LWLocks in tracepoints
On Tue, Apr 13, 2021 at 10:42 PM Craig Ringer wrote: > I'd really love it if a committer could add an explanatory comment or > two in the area though. I'm happy to draft a comment patch if anyone > agrees my suggestion is sensible. The key things I needed to know when > studying the code were: > > * A LWLock* is always part of a tranche, but locks within a given > tranche are not necessarily co-located in memory, cross referenced or > associated in any way. > * A LWLock tranche does not track how many LWLocks are in the tranche > or where they are in memory. It only groups up LWLocks into categories > and maps the tranche ID to a name. > * Not all LWLocks are part of the main LWLock array; others can be > embedded in shmem structs elsewhere, including in DSM segments. > * LWLocks in DSM segments may not have the same address between > different backends, because the DSM base address can vary, so a > LWLock* cannot be reliably compared between backends unless you know > it's in the main LWLock array or in static shmem. > > Having that info in lwlock.c near the tranche management code or the > tranche and main lwlock arrays would've been very handy. I'm willing to review a comment patch along those lines. > I'm actually inclined to revise the patch I sent in order to *remove* > the LWLock name from the tracepoint argument. At least for the > fast-path tracepoints on start-of-acquire and end-of-acquire. I think > it's probably OK to report it in the lock wait tracepoints, which is > where it's most important to have anyway. So the tracepoint will > always report the LWLock* and tranche ID, but it won't report the > tranche name for the fast-path. I'll add trace events for tranche ID > registration, so trace tools can either remember the tranche ID->name > mappings from there, or capture them from lock wait events and > remember them. > > Reasonable? That way we retain the most important trace functionality, > but we reduce the overheads. 
Reducing the overheads is good, but I have no opinion on what's important for people doing tracing, because I am not one of those people. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [PATCH] Identify LWLocks in tracepoints
On Tue, Apr 13, 2021 at 4:46 PM Andres Freund wrote: > I still don't like the two bytes, fwiw ;). Especially because it's 4 > bytes due to padding right now. I'm not surprised by that disclosure. But I think it's entirely worth it. Making wait states visible in pg_stat_activity isn't the most useful thing I've ever done to PostgreSQL, but it's far from the least useful. If we can get the same benefit with less overhead, that's great. > I'd like to move the LWLock->waiters list to outside of the lwlock > itself - at most TotalProcs LWLocks can be waited for, so we don't need > millions of empty proclist_heads. That way we could also remove the > proclist indirection - which shows up a fair bit in contended workloads. > > And if we had a separate "lwlocks being waited for" structure, we could > also add more information to it if we wanted to... > > The difficulty of course is having space to indicate which of these > "waiting for" lists are being used - there's not enough space in ->state > right now to represent that. Two possible approaches: > > - We could make it work if we restricted MAX_BACKENDS to be 2**14 - but > while I personally think that's a sane upper limit, I already had a > hard time getting consensus to lower the limit to 2^18-1. > > - Use a 64bit integer. Then we can easily fit MAX_BACKENDS lockers, as > well as an offset to one of MAX_BACKENDS "wait lists" into LWLock. I'd rather not further reduce MAX_BACKENDS. I still think some day we're going to want to make that bigger again. Maybe not for a while, admittedly. But, do you need to fit this into "state"? If you just replaced "waiters" with a 32-bit integer, you'd save 4 bytes and have bits left over (and maybe restrict the tranche ID to 2^14 and squeeze that in too, as you mention). -- Robert Haas EDB: http://www.enterprisedb.com
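The arithmetic behind the "use a 64bit integer" option can be sketched as follows. This is an illustration only, with hypothetical field widths, not PostgreSQL's actual LWLock state layout: with MAX_BACKENDS = 2^18 - 1, a locker count and a wait-list offset each fit in 18 bits, so both fit comfortably in one 64-bit word with bits to spare for flags:

```python
# Illustrative packing of two 18-bit fields into one integer, along the
# lines of the 64-bit-state option discussed above.  The layout is
# invented for this sketch, not PostgreSQL's real LWLock layout.
MAX_BACKENDS_BITS = 18                  # matches MAX_BACKENDS = 2**18 - 1
FIELD_MASK = (1 << MAX_BACKENDS_BITS) - 1

def pack_state(lockers, waitlist_offset):
    """Pack a locker count and a wait-list offset into one integer."""
    assert 0 <= lockers <= FIELD_MASK
    assert 0 <= waitlist_offset <= FIELD_MASK
    return (waitlist_offset << MAX_BACKENDS_BITS) | lockers

def unpack_state(state):
    """Recover (lockers, waitlist_offset) from the packed word."""
    return state & FIELD_MASK, state >> MAX_BACKENDS_BITS

print(unpack_state(pack_state(42, 7)))   # (42, 7)
```

Two maxed-out 18-bit fields use 36 bits, leaving 28 bits of a 64-bit word free, which is what makes the trade-off in the message above workable without shrinking MAX_BACKENDS.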
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements
Hello, Kuroda! On Fri, 2021-04-09 at 00:23 +, kuroda.hay...@fujitsu.com wrote: > I think you are right. > According to [1] we can bump up the version per one PG major version, > and any features are not committed yet for 15. > > [1]: https://www.postgresql.org/message-id/20201202040516.GA43757@nol Version 2 of patch attached. pg_stat_statements version is now 1.10 and patch is based on 0f61727. -- Andrei Zubkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company From 321ce82f22894be39297515268c1f3bed74778e2 Mon Sep 17 00:00:00 2001 From: Andrei Zubkov Date: Wed, 14 Apr 2021 16:35:56 +0300 Subject: [PATCH] pg_stat_statements: Track statement entry timestamp Added first_seen column in pg_stat_statements view. This field is populated with current timestamp when a new statement is added to pg_stat_statements hashtable. This field provides clean information about statistics collection time interval for each statement. Besides it can be used by sampling solutions to detect situations when a statement was evicted and returned back between two samples. 
Discussion: https://www.postgresql.org/message-id/flat/72e80e7b160a6eb189df9ef6f068cce3765d37f8.camel%40moonset.ru --- contrib/pg_stat_statements/Makefile | 3 +- .../expected/pg_stat_statements.out | 29 + .../pg_stat_statements--1.9--1.10.sql | 59 +++ .../pg_stat_statements/pg_stat_statements.c | 31 +- .../pg_stat_statements.control| 2 +- .../sql/pg_stat_statements.sql| 9 +++ doc/src/sgml/pgstatstatements.sgml| 9 +++ 7 files changed, 137 insertions(+), 5 deletions(-) create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile index 3ec627b956..7d7b2f4808 100644 --- a/contrib/pg_stat_statements/Makefile +++ b/contrib/pg_stat_statements/Makefile @@ -6,7 +6,8 @@ OBJS = \ pg_stat_statements.o EXTENSION = pg_stat_statements -DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.8--1.9.sql \ +DATA = pg_stat_statements--1.4.sql \ + pg_stat_statements--1.9--1.10.sql pg_stat_statements--1.8--1.9.sql \ pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \ pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \ pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \ diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out index 674ed270a8..f2268dfd87 100644 --- a/contrib/pg_stat_statements/expected/pg_stat_statements.out +++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out @@ -941,4 +941,33 @@ SELECT query, toplevel, plans, calls FROM pg_stat_statements WHERE query LIKE '% $$ LANGUAGE plpgsql | | | (3 rows) +-- +-- statement timestamps +-- +SELECT pg_stat_statements_reset(); + pg_stat_statements_reset +-- + +(1 row) + +SELECT 1 AS "STMTTS1"; + STMTTS1 +- + 1 +(1 row) + +SELECT now() AS ref_ts \gset +SELECT 1,2 AS "STMTTS2"; + ?column? 
| STMTTS2 +--+- +1 | 2 +(1 row) + +SELECT first_seen >= :'ref_ts', count(*) FROM pg_stat_statements WHERE query LIKE '%STMTTS%' GROUP BY first_seen >= :'ref_ts' ORDER BY first_seen >= :'ref_ts'; + ?column? | count +--+--- + f| 1 + t| 1 +(2 rows) + DROP EXTENSION pg_stat_statements; diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql b/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql new file mode 100644 index 00..969a30fbdc --- /dev/null +++ b/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql @@ -0,0 +1,59 @@ +/* contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql */ + +-- complain if script is sourced in psql, rather than via ALTER EXTENSION +\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.10'" to load this file. \quit + +/* We need to redefine a view and a function */ +/* First we have to remove them from the extension */ +ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements; +ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(boolean); + +/* Then we can drop them */ +DROP VIEW pg_stat_statements; +DROP FUNCTION pg_stat_statements(boolean); + +/* Now redefine */ +CREATE FUNCTION pg_stat_statements(IN showtext boolean, +OUT userid oid, +OUT dbid oid, +OUT toplevel bool, +OUT queryid bigint, +OUT query text, +OUT plans int8, +OUT total_plan_time float8, +OUT min_plan_time float8, +OUT max_plan_time float8, +OUT mean_plan_time float8, +OUT stddev_plan_time float8, +OUT calls int8, +OUT total_exec_time float8, +OUT min_exec_time float8, +OUT max_exec_time float8, +OUT mean_exec_time float8, +OUT stddev_exec_time float8, +OUT rows int8, +OUT
Re: [BUG] Autovacuum not dynamically decreasing cost_limit and cost_delay
> On Mar 1, 2021, at 8:43 PM, Masahiko Sawada wrote: > > On Mon, Feb 8, 2021 at 11:49 PM Mead, Scott wrote: >> >> Hello, >> I recently looked at what it would take to make a running autovacuum >> pick-up a change to either cost_delay or cost_limit. Users frequently will >> have a conservative value set, and then wish to change it when autovacuum >> initiates a freeze on a relation. Most users end up finding out they are in >> ‘to prevent wraparound’ after it has happened, this means that if they want >> the vacuum to take advantage of more I/O, they need to stop and then restart >> the currently running vacuum (after reloading the GUCs). >> >> Initially, my goal was to determine feasibility for making this dynamic. I >> added debug code to vacuum.c:vacuum_delay_point(void) and found that changes >> to cost_delay and cost_limit are already processed by a running vacuum. >> There was a bug preventing the cost_delay or cost_limit from being >> configured to allow higher throughput however. >> >> I believe this is a bug because currently, autovacuum will dynamically >> detect and increase the cost_limit or cost_delay, but it can never decrease >> those values beyond their setting when the vacuum began. The current >> behavior is for vacuum to limit the maximum throughput of currently running >> vacuum processes to the cost_limit that was set when the vacuum process >> began. > > Thanks for your report. > > I've not looked at the patch yet but I agree that the calculation for > autovacuum cost delay seems not to work fine if vacuum-delay-related > parameters (e.g., autovacuum_vacuum_cost_delay etc) are changed during > vacuuming a table to speed up running autovacuums. Here is my > analysis: I appreciate your in-depth analysis and will comment in-line.
That said, I still think it’s important that the attached patch is applied. As it is today, a simple few lines of code prevent users from being able to increase the throughput on vacuums that are running without having to cancel them first. The patch that I’ve provided allows users to decrease their vacuum_cost_delay and get an immediate boost in performance to their running vacuum jobs. > > Suppose we have the following parameters and 3 autovacuum workers are > running on different tables: > > autovacuum_vacuum_cost_delay = 100 > autovacuum_vacuum_cost_limit = 100 > > Vacuum cost-based delay parameters for each worker are as follows: > > worker->wi_cost_limit_base = 100 > worker->wi_cost_limit = 66 > worker->wi_cost_delay = 100 > > Each running autovacuum has "wi_cost_limit = 66" because the total > limit (100) is equally rationed. And another point is that the total > wi_cost_limit (198 = 66*3) is less than autovacuum_vacuum_cost_limit, > 100. Which are fine. > > Here let's change autovacuum_vacuum_cost_delay/limit value to speed up > running autovacuums. > > Case 1 : increasing autovacuum_vacuum_cost_limit to 1000. > > After reloading the configuration file, vacuum cost-based delay > parameters for each worker become as follows: > > worker->wi_cost_limit_base = 100 > worker->wi_cost_limit = 100 > worker->wi_cost_delay = 100 > > If we rationed autovacuum_vacuum_cost_limit, 1000, to 3 workers, it > would be 333. But since we cap it by wi_cost_limit_base, the > wi_cost_limit is 100. I think this is what Mead reported here. Yes, this is exactly correct. The cost_limit is capped at the cost_limit that was set during the start of a running vacuum. My patch changes this cap to be the max allowed cost_limit (10,000). > > Case 2 : decreasing autovacuum_vacuum_cost_delay to 10.
> > After reloading the configuration file, vacuum cost-based delay > parameters for each worker become as follows: > > worker->wi_cost_limit_base = 100 > worker->wi_cost_limit = 100 > worker->wi_cost_delay = 100 > > Actually, the result is the same as case 1. But in this case, the > total cost among the three workers is 300, which is greater than > autovacuum_vacuum_cost_limit, 100. This behavior violates what the > documentation explains in the description of > autovacuum_vacuum_cost_limit: > > --- > Note that the value is distributed proportionally among the running > autovacuum workers, if there is more than one, so that the sum of the > limits for each worker does not exceed the value of this variable. > --- > > It seems to me that those problems come from the fact that we don't > change both wi_cost_limit_base and wi_cost_delay during auto-vacuuming > a table in spite of using autovacuum_vac_cost_limit/delay to calculate > cost_avail. Such a wrong calculation happens until all running > autovacuum workers finish the current vacuums. When a worker starts to > process a new table, it resets both wi_cost_limit_base and > wi_cost_delay.
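The rationing-plus-cap arithmetic described above can be sketched in a few lines. This is an illustration of the behaviour being discussed, not the server's actual balancing code: the total limit is split evenly across workers, but each worker is capped at the wi_cost_limit_base it captured when its vacuum started, which is why raising the GUC mid-vacuum has no effect:

```python
def ration_cost_limit(total_limit, cost_limit_bases):
    """Sketch of the rationing described above (not the real autovacuum
    code): split total_limit evenly among the workers, then cap each
    worker at its wi_cost_limit_base, the value captured when that
    worker's vacuum started."""
    share = total_limit // len(cost_limit_bases)
    return [min(share, base) for base in cost_limit_bases]

bases = [100, 100, 100]                 # workers started under cost_limit = 100
print(ration_cost_limit(100, bases))    # [33, 33, 33]
print(ration_cost_limit(1000, bases))   # [100, 100, 100] -- capped, not 333 each
```

The cap in the second call is the behaviour the patch targets: even after the limit is raised to 1000, each worker stays pinned at its starting base.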
ANALYZE counts LP_DEAD line pointers as n_dead_tup
Hi all, If we create a table with vacuum_index_cleanup = off or execute VACUUM with INDEX_CLEANUP = off, vacuum updates pg_stat_all_tables.n_dead_tup to the number of HEAPTUPLE_RECENTLY_DEAD tuples, whereas analyze updates it to the sum of the number of HEAPTUPLE_DEAD/RECENTLY_DEAD tuples and LP_DEAD line pointers. So if the table has many LP_DEAD line pointers due to skipping index cleanup, autovacuum is triggered every time after analyze/autoanalyze. This issue also seems to happen on back branches, probably from 12, where the INDEX_CLEANUP option was introduced. I think we can have heapam_scan_analyze_next_tuple() not count LP_DEAD line pointers, as lazy_scan_prune() does. Attached is a patch for that. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/ not_count_lp_dead_by_analyze.patch Description: Binary data
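For readers who want to see the effect, a hypothetical repro sketch follows (object names are invented; this is not taken from the attached patch or the original report):

```sql
-- Table whose vacuums skip index cleanup, leaving LP_DEAD line pointers.
CREATE TABLE t (i int) WITH (vacuum_index_cleanup = off);
CREATE INDEX ON t (i);
INSERT INTO t SELECT generate_series(1, 10000);
DELETE FROM t;
VACUUM t;      -- heap pruning leaves LP_DEAD line pointers behind
ANALYZE t;     -- counts those LP_DEAD pointers back into n_dead_tup
SELECT n_dead_tup FROM pg_stat_all_tables WHERE relname = 't';
```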
Can a child process detect postmaster death when in pg_usleep?
Hi, In my dev system (Ubuntu), when the postmaster is killed with SIGKILL, SIGPWR is being sent to its child processes (backends/any other bg process). If a child process is waiting with pg_usleep, it looks like it is not detecting the postmaster's death and it doesn't exit immediately but stays forever until it gets killed explicitly. For this experiment, I did 2 things to simulate the scenario, i.e. a backend waiting in pg_usleep and killing the postmaster. 1) I wrote a wait function that uses pg_usleep and called it in a backend. This backend doesn't exit on postmaster death. 2) I set PostAuthDelay to 100 seconds and started the postmaster. Then, the "autovacuum launcher" process still stays around (as it has pg_usleep in its main function), even after postmaster death. Questions: 1) Is it really harmful to use pg_usleep in a postmaster child process, as it doesn't let the child process detect postmaster death? 2) Can pg_usleep() always detect signals? I see the caution in the pg_usleep function definition in pgsleep.c, saying the signal handling is platform dependent. We have code blocks like below in the code. Do we actually process interrupts before going to sleep with pg_usleep()? while/for loop { .. .. CHECK_FOR_INTERRUPTS(); pg_usleep(); } and if (PostAuthDelay) pg_usleep(); 3) Is it intentional to use pg_usleep in some places in the code? If yes, what are they? At least, I see one place where it's intentional in the wait_pid function which is used while running the regression tests. 4) Are there any places where we need to replace pg_usleep with WaitLatch or an equivalent to detect postmaster death properly? Correct me if I'm missing something or if my observation/understanding of pg_usleep() is wrong. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
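The reason latch-based waits can notice postmaster death while pg_usleep cannot comes down to a "death pipe": children inherit the read end of a pipe the parent keeps open, and a select/poll on that fd returns the instant the last write end closes (EOF), whereas a plain sleep runs its full duration. Here is a minimal sketch of that mechanism (illustration only, not PostgreSQL's implementation; the function name is invented):

```python
import os
import select

def wait_for_parent_death_or_timeout(read_fd, timeout_s):
    """Return True if the 'parent' died (pipe hit EOF), False on timeout.

    select() reports a pipe's read end as readable once every write end
    is closed, so a dead parent wakes the waiter immediately -- unlike a
    fixed-length sleep, which cannot be interrupted by the event.
    """
    readable, _, _ = select.select([read_fd], [], [], timeout_s)
    return bool(readable)

r, w = os.pipe()
os.close(w)   # simulate the parent process exiting
# Returns immediately even though the timeout is 100 seconds:
print(wait_for_parent_death_or_timeout(r, 100.0))   # True
```

A WaitLatch-style wait multiplexes this death-pipe fd with the latch and a timeout, which is why it is preferred over pg_usleep in long waits.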
Re: Converting contrib SQL functions to new style
On Wed, Apr 14, 2021 at 8:58 AM Noah Misch wrote: > Once CREATE EXTENSION is over, things are a great deal safer under this > proposal, as you say. I suspect it makes CREATE EXTENSION more hazardous. > Today, typical SQL commands in extension creation scripts don't activate > inexact argument type matching. You were careful to make each script clear > the search_path around commands deviating from that (commit 7eeb1d9). I think > "CREATE FUNCTION plus1dot1(int) RETURNS numeric LANGUAGE SQL RETURN $1 + 1.1;" > in a trusted extension script would constitute a security vulnerability, since > it can lock in the wrong operator. I don't understand how that can happen, unless we've failed to secure the search_path. And, if we've failed to secure the search_path, I think we are in a lot of trouble no matter what else we do. -- Robert Haas EDB: http://www.enterprisedb.com
Re: sepgsql logging
On Wed, Apr 14, 2021 at 8:42 AM Dave Page wrote: > Attached is a patch to clean this up. It will log denials as such regardless > of whether or not either selinux or sepgsql is in permissive mode. When > either is in permissive mode, it'll add " permissive=1" to the end of the log > messages. e.g. Looks superficially reasonable on first glance, but I think we should try to get an opinion from someone who knows more about SELinux. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [PATCH] PREPARE TRANSACTION unexpected behavior with TEMP TABLE
On Wed, Apr 14, 2021 at 6:45 AM Himanshu Upadhyaya wrote: > The purpose of this FIX is to mainly focus on getting consistent behavior > with PREPARE TRANSACTION. With the case that I had mentioned > previously, my expectation was that either both PREPARE TRANSACTION commands should fail > or both should succeed, but here the second, identical "PREPARE TRANSACTION" was > successful while the first one failed with an error, which is kind of > weird to me. I agree that it's weird, but that doesn't mean that your patch is an improvement, and I don't think it is. If we could make this thing more consistent without incurring any negatives, I'd be in favor of that. But the patch does have some negatives, which in my opinion are more substantial than the problem you're trying to fix. Namely, possible performance consequences, and undocumented and fragile assumptions that, as it seems to me, may easily get broken in the future. I see that you've repeatedly capitalized the word FIX in your reply, but it's just not that simple. If this had really bad consequences like corrupting data or crashing the server then it would be essential to do something about it, but so far the worst consequence you've indicated is that an obscure sequence of SQL commands that no real user is likely to issue produces a slightly surprising result. That's not great, but neither is it an emergency. > I have also tried to reproduce the behavior. Your test case isn't ideal for reproducing the problem that the comment is worrying about. Normally, when we take a lock on a table, we hold it until commit. But, that only applies when we run a query that mentions the table, like a SELECT or an UPDATE. In your case, we only end up opening the table to build a relcache entry for it, so that we can look at the metadata. And, catalog scans used to build syscache and relcache entries release locks immediately, rather than waiting until the end of the transaction.
So it might be that if we failed to ever set XACT_FLAGS_ACCESSEDTEMPNAMESPACE in your test case, everything would be fine. That doesn't seem to be true in general though. I tried changing "cannot PREPARE a transaction that has operated on temporary objects" from an ERROR to a NOTICE and then ran 'make check'. It hung. I think this test case is the same problem as the regression tests hit; in any case, it also hangs: rhaas=# begin; BEGIN rhaas=*# create temp table first (); CREATE TABLE rhaas=*# prepare transaction 'whatever'; NOTICE: cannot PREPARE a transaction that has operated on temporary objects PREPARE TRANSACTION rhaas=# create temp table second (); [ hangs ] I haven't checked, but I think the problem here is that the first transaction had to create this backend's pg_temp schema and the second one can't see the results of the first one doing it so it wants to do the same thing and that results in waiting for a lock the prepared transaction already holds. I made a quick attempt to reproduce a hang at backend exit time, but couldn't find a case where that happened. That doesn't mean there isn't one, though. There's a very good chance that the person who wrote that comment knew that a real problem existed, and just didn't describe it well enough for you or I to immediately know what it is. It is also possible that they were completely wrong, or that things have changed since the comment was written, but we can't assume that without considerably more research and analysis than either of us has done so far. I think one point to take away here is that question of whether a temporary relation has been "accessed" is not all black and white. If I ran a SELECT statement against a relation, I think it's clear that I've accessed it. But, if I just used the name of that relation as a type name in some other SQL command, did I really access it? 
The current code's answer to that is that if we had to open and lock the relation to get its metadata, then we accessed it, and if we already had the details that we needed in cache, then we did not access it. Now, again, I agree that looks a little weird from a user point of view, but looking at the implementation, you can kind of see why it ends up like that. From a certain point of view, it would be more surprising if we never opened or locked the relation and yet ended up deciding that we'd accessed it. Now maybe we should further explore going the other direction and avoiding setting the flag at all in this case, since I think we're neither retaining a lock nor touching any relation buffers, but I think that needs more analysis. Even if we decide that's safe, there's still the problem of finding a better implementation that's not overly complex for what really feels like a very minor issue. -- Robert Haas EDB: http://www.enterprisedb.com
RE: Support tab completion for upper character inputs in psql
On Thursday, April 8, 2021 4:14 PM, Peter Eisentraut wrote >Seeing the tests you provided, it's pretty obvious that the current >behavior is insufficient. I think we could probably think of a few more >tests, for example exercising the "If case insensitive matching was >requested initially, adjust the case according to setting." case, or >something with quoted identifiers. Thanks for your review and suggestions on my patch. I've added more tests in the latest patch, V5; the added tests helped me find some bugs in my patch, which I have fixed. Now the patch supports not only SET/SHOW [PARAMETER] but also UPDATE ["aTable"|ATABLE], as well as UPDATE atable SET ["aColumn"|ACOLUMN]. I would appreciate further test suggestions for my patch, or if anyone could kindly test it and let me know of any bugs. Differences from V4 are: * fix some bugs related to quoted identifiers. * add some tap tests. Regards, Tang V5-0001-Support-tab-completion-with-a-query-result-for-upper.patch Description: V5-0001-Support-tab-completion-with-a-query-result-for-upper.patch
Re: [PATCH] Identify LWLocks in tracepoints
On 12.04.21 07:46, Craig Ringer wrote: >>> To use systemtap semaphores (the _ENABLED macros) you need to run dtrace -g to generate a probes.o then link that into postgres. >>> I don't think we do that. I'll double check soon. >> We do that. (It's -G.) > Huh. I could've sworn we didn't. My mistake, it's there in src/backend/Makefile. In that case I'll amend the patch to use semaphore guards. This whole thread is now obviously moved to consideration for PG15, but I did add an open item about this particular issue (https://wiki.postgresql.org/wiki/PostgreSQL_14_Open_Items, search for "dtrace"). So if you could produce a separate patch that adds the _ENABLED guards targeting PG14 (and PG13), that would be helpful.
Re: CTE push down
On Tue, Apr 13, 2021 at 6:58 PM Alexander Pyhalov wrote: > > Hi. > > Currently PostgreSQL supports CTE push down for SELECT statements, but > it is implemented as turning each CTE reference into a subquery. > > When a CTE is referenced multiple times, we have a choice: materialize the > CTE (and disable qual distribution to the CTE query) or inline it (and > so run the CTE query multiple times, > which can be inefficient, for example, when the CTE references foreign > tables). > > I was looking at whether it is possible to collect the quals referencing a CTE, > combine them in an OR qual, and add them to the CTE query. > > So far I consider the following changes. > > 1) Modify SS_process_ctes() to add a list of RestrictInfo* to > PlannerInfo - one NULL RestrictInfo pointer per CTE (let's call this > list cte_restrictinfos for now). > 2) In distribute_restrictinfo_to_rels(), when we get a rel of RTE_CTE > relkind and are sure that we can safely push down the restrictinfo, preserve the > restrictinfo in cte_restrictinfos, converting multiple restrictions to > "OR" RestrictInfos. > 3) At the end of subquery_planner() (after inheritance_planner() or > grouping_planner()) we can check if cte_restrictinfos contain some > non-null RestrictInfo pointers and recreate the plan for the corresponding CTEs, > distributing quals to relations inside the CTE queries. > > For now I'm not sure how to handle vars mapping when we push > restrictinfos to the level of the cte root or when we push it down to the > cte plan, but properly mapping vars seems to be doable. I think similar mapping happens when we push quals that reference a named JOIN down to join rels. I didn't take a look at it, but I think it happens before planning time. But some similar machinery might help in this case. I believe step 2 is needed to avoid materializing rows which will never be selected. That would be a good improvement. However, care needs to be taken for volatile quals.
I think the quals on the CTE will be evaluated twice: once when materializing the CTE result and a second time when scanning the materialized result. Volatile quals may produce different results when run multiple times. > > Is there something else I miss? > Does somebody work on alternative solution or see issues in such approach? IMO, a POC patch would help in understanding your idea. -- Best Wishes, Ashutosh Bapat
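A sketch of the transformation being proposed (table and column names are invented for illustration):

```sql
-- Today, a materialized CTE referenced twice gets no quals pushed into it:
WITH cte AS MATERIALIZED (SELECT * FROM remote_tbl)
SELECT * FROM cte WHERE key = 1
UNION ALL
SELECT * FROM cte WHERE key = 2;

-- Collecting the reference quals and OR-ing them, the CTE query could be
-- re-planned roughly as
--   SELECT * FROM remote_tbl WHERE key = 1 OR key = 2
-- so fewer rows are materialized, while each reference still applies its
-- own qual to the materialized result.
```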
Re: Converting contrib SQL functions to new style
On Tue, Apr 13, 2021 at 11:11:13PM -0400, Tom Lane wrote: > Noah Misch writes: > > On Tue, Apr 13, 2021 at 06:26:34PM -0400, Tom Lane wrote: > >> Attached are some draft patches to convert almost all of the > >> contrib modules' SQL functions to use SQL-standard function bodies. > >> The point of this is to remove the residual search_path security > >> hazards that we couldn't fix in commits 7eeb1d986 et al. Since > >> a SQL-style function body is fully parsed at creation time, > >> its object references are not subject to capture by the run-time > >> search path. > > > Are there any inexact matches in those function/operator calls? Will that > > matter more or less than it does today? > > I can't claim to have looked closely for inexact matches. It should > matter less than today, since there's a hazard only during creation > (with a somewhat-controlled search path) and not during use. But > that doesn't automatically eliminate the issue. Once CREATE EXTENSION is over, things are a great deal safer under this proposal, as you say. I suspect it makes CREATE EXTENSION more hazardous. Today, typical SQL commands in extension creation scripts don't activate inexact argument type matching. You were careful to make each script clear the search_path around commands deviating from that (commit 7eeb1d9). I think "CREATE FUNCTION plus1dot1(int) RETURNS numeric LANGUAGE SQL RETURN $1 + 1.1;" in a trusted extension script would constitute a security vulnerability, since it can lock in the wrong operator.
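The inexact-matching hazard can be made concrete with a hypothetical sketch (the schema, function, and operator names are all invented for illustration): an exact int + numeric operator planted in the path wins resolution over the intended built-in numeric + numeric operator, which requires coercing the int argument.

```sql
-- Hypothetical demonstration only.
CREATE SCHEMA evil;
CREATE FUNCTION evil.bogus_plus(int, numeric) RETURNS numeric
  LANGUAGE SQL AS 'SELECT -1::numeric';   -- not the arithmetic the author meant
CREATE OPERATOR evil.+ (LEFTARG = int, RIGHTARG = numeric,
                        PROCEDURE = evil.bogus_plus);

SET search_path = evil, pg_catalog;
-- A function body like "RETURN $1 + 1.1" parsed under this path binds to
-- evil.+, since it matches int + numeric exactly:
SELECT 1 + 1.1;   -- yields -1, not 2.1
```

Because a SQL-standard function body is parsed at creation time, whichever operator wins here is locked in permanently, which is the vulnerability being described.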
Re: sepgsql logging
Hi

On Thu, Apr 1, 2021 at 3:30 PM Dave Page wrote:
>
> On Thu, Apr 1, 2021 at 3:23 PM Tom Lane wrote:
>
>> Andrew Dunstan writes:
>> > On 4/1/21 8:32 AM, Dave Page wrote:
>> >> It seems to me that sepgsql should also log the denial, but flag that
>> >> permissive mode is on.
>>
>> > +1 for doing what selinux does if possible.
>>
>> +1. If selinux itself is doing that, it's hard to see a reason why
>> we should not; and I concur that the info is useful.
>
> Thanks both. I'll take a look at the code and see if I can whip up a patch
> (it'll be a week or so as I'm taking some time off for Easter).

Attached is a patch to clean this up. It will log denials as such regardless of whether either selinux or sepgsql is in permissive mode. When either is in permissive mode, it will add " permissive=1" to the end of the log messages, e.g.

Regular user in permissive mode, with a restricted table column:

2021-04-14 13:20:30.401 BST [23073] LOG: SELinux: allowed { select } scontext=user_u:user_r:user_t:s0 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_table name="public.tb_users" permissive=1
2021-04-14 13:20:30.401 BST [23073] STATEMENT: SELECT * FROM tb_users;
2021-04-14 13:20:30.401 BST [23073] LOG: SELinux: allowed { select } scontext=user_u:user_r:user_t:s0 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column name="column uid of table tb_users" permissive=1
2021-04-14 13:20:30.401 BST [23073] STATEMENT: SELECT * FROM tb_users;
2021-04-14 13:20:30.401 BST [23073] LOG: SELinux: allowed { select } scontext=user_u:user_r:user_t:s0 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column name="column name of table tb_users" permissive=1
2021-04-14 13:20:30.401 BST [23073] STATEMENT: SELECT * FROM tb_users;
2021-04-14 13:20:30.401 BST [23073] LOG: SELinux: allowed { select } scontext=user_u:user_r:user_t:s0 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column name="column mail of table tb_users" permissive=1
2021-04-14 13:20:30.401 BST [23073] STATEMENT: SELECT * FROM tb_users;
2021-04-14 13:20:30.401 BST [23073] LOG: SELinux: allowed { select } scontext=user_u:user_r:user_t:s0 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column name="column address of table tb_users" permissive=1
2021-04-14 13:20:30.401 BST [23073] STATEMENT: SELECT * FROM tb_users;
2021-04-14 13:20:30.401 BST [23073] LOG: SELinux: allowed { select } scontext=user_u:user_r:user_t:s0 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column name="column salt of table tb_users" permissive=1
2021-04-14 13:20:30.401 BST [23073] STATEMENT: SELECT * FROM tb_users;
2021-04-14 13:20:30.401 BST [23073] LOG: SELinux: denied { select } scontext=user_u:user_r:user_t:s0 tcontext=system_u:object_r:sepgsql_secret_table_t:s0 tclass=db_column name="column phash of table tb_users" permissive=1
2021-04-14 13:20:30.401 BST [23073] STATEMENT: SELECT * FROM tb_users;

The same user/table, but in enforcing mode:

2021-04-14 13:17:21.645 BST [22974] LOG: SELinux: allowed { search } scontext=user_u:user_r:user_t:s0 tcontext=system_u:object_r:sepgsql_schema_t:s0 tclass=db_schema name="public" at character 15
2021-04-14 13:17:21.645 BST [22974] STATEMENT: SELECT * FROM tb_users;
2021-04-14 13:17:21.646 BST [22974] LOG: SELinux: allowed { select } scontext=user_u:user_r:user_t:s0 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_table name="public.tb_users"
2021-04-14 13:17:21.646 BST [22974] STATEMENT: SELECT * FROM tb_users;
2021-04-14 13:17:21.646 BST [22974] LOG: SELinux: allowed { select } scontext=user_u:user_r:user_t:s0 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column name="column uid of table tb_users"
2021-04-14 13:17:21.646 BST [22974] STATEMENT: SELECT * FROM tb_users;
2021-04-14 13:17:21.646 BST [22974] LOG: SELinux: allowed { select } scontext=user_u:user_r:user_t:s0 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column name="column name of table tb_users"
2021-04-14 13:17:21.646 BST [22974] STATEMENT: SELECT * FROM tb_users;
2021-04-14 13:17:21.646 BST [22974] LOG: SELinux: allowed { select } scontext=user_u:user_r:user_t:s0 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column name="column mail of table tb_users"
2021-04-14 13:17:21.646 BST [22974] STATEMENT: SELECT * FROM tb_users;
2021-04-14 13:17:21.646 BST [22974] LOG: SELinux: allowed { select } scontext=user_u:user_r:user_t:s0 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column name="column address of table tb_users"
2021-04-14 13:17:21.646 BST [22974] STATEMENT: SELECT * FROM tb_users;
2021-04-14 13:17:21.646 BST [22974] LOG: SELinux: allowed { select } scontext=user_u:user_r:user_t:s0 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column name="column salt of table tb_users"
2021-04-14 13:17:21.646 BST [22974] STATEMENT: SELECT * FROM tb_users;
2021-04-14 13:17:21.646 BST [22974] LOG: SELinux: denied {
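The change discussed above is essentially string assembly in sepgsql's audit logging: report the verdict as computed (so a denial still logs as "denied" even when not enforced), and append a permissive marker when either selinux or sepgsql is in permissive mode. A toy Python sketch of that formatting logic (the function and parameter names are illustrative, not sepgsql's actual C symbols):

```python
def format_audit(denied, scontext, tcontext, tclass, name,
                 selinux_permissive, sepgsql_permissive, av="select"):
    # Log the real verdict: in permissive mode the access proceeds
    # anyway, but the message should still say "denied", flagged so
    # the reader knows it was not enforced.
    verdict = "denied" if denied else "allowed"
    msg = (f'SELinux: {verdict} {{ {av} }} '
           f'scontext={scontext} tcontext={tcontext} '
           f'tclass={tclass} name="{name}"')
    if selinux_permissive or sepgsql_permissive:
        msg += " permissive=1"
    return msg
```

This reproduces the shape of the log lines in the examples above: the permissive-mode run gets the trailing marker, the enforcing-mode run does not.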
Re: Extensions not dumped when --schema is used
On Wed, Apr 14, 2021 at 10:38:17AM +0900, Michael Paquier wrote: > On Tue, Apr 13, 2021 at 08:00:34AM -0700, Noah Misch wrote: > > On Tue, Apr 13, 2021 at 02:43:11PM +0900, Michael Paquier wrote: > >>> - If extschema='public', "pg_dump -e plpgsql --schema=public" includes > >>> commands to dump the relation data. This surprised me. (The > >>> --schema=public argument causes selectDumpableNamespace() to set > >>> nsinfo->dobj.dump=DUMP_COMPONENT_ALL instead of DUMP_COMPONENT_ACL.) > > This isn't a problem of selecting schemas for inclusion in the dump. This > > is > > a problem of associating extensions with their pg_extension_config_dump() > > relations and omitting those extension-member relations when "-e" causes > > omission of the extension. > > At code level, the decision to dump the data of any extension's > dumpable table is done in processExtensionTables(). I have to admit > that your feeling here keeps the code simpler than what I have been > thinking if we apply an extra filtering based on the list of > extensions in this code path. So I can see the value in your argument > to not dump at all the data of an extension's dumpable table as long > as its extension is not listed, and this, even if its schema is > explicitly listed. > > So I got down to make the behavior more consistent with the patch > attached. This passes your case. Yes. > It is worth noting that if a > table is part of a schema created by an extension, but that the table > is not dependent on the extension, we would still dump its data if > using --schema with this table's schema while the extension is not > part of the list from --extension. In the attached, that's just the > extra test with without_extension_implicit_schema. That's consistent with v13, and it seems fine. I've not used a non-test extension that creates a schema. 
> --- a/src/test/modules/test_pg_dump/t/001_base.pl
> +++ b/src/test/modules/test_pg_dump/t/001_base.pl
> @@ -208,6 +208,30 @@ my %pgdump_runs = (
>  			'pg_dump', '--no-sync',
>  			"--file=$tempdir/without_extension.sql",
>  			'--extension=plpgsql', 'postgres',
>  		],
> +	},
> +
> +	# plgsql in the list of extensions blocks the dump of extension
> +	# test_pg_dump.
> +	without_extension_explicit_schema => {
> +		dump_cmd => [
> +			'pg_dump',
> +			'--no-sync',
> +			"--file=$tempdir/without_extension_explicit_schema.sql",
> +			'--extension=plpgsql',
> +			'--schema=public',
> +			'postgres',
> +		],
> +	},
> +
> +	without_extension_implicit_schema => {
> +		dump_cmd => [
> +			'pg_dump',
> +			'--no-sync',
> +			"--file=$tempdir/without_extension_implicit_schema.sql",
> +			'--extension=plpgsql',
> +			'--schema=regress_pg_dump_schema',
> +			'postgres',
> +		],
> +	},);

The name "without_extension_explicit_schema" arose because that test differs from the "without_extension" test by adding --schema=public. The test named "without_extension_implicit_schema" differs from "without_extension" by adding --schema=regress_pg_dump_schema, so the word "implicit" feels not-descriptive of the test. I recommend picking a different name. Other than that, the change looks good.
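The behavior agreed on in this thread can be summarized as a small decision rule: a table whose data is dumpable only via pg_extension_config_dump() follows its owning extension's fate under --extension, even when its schema is explicitly listed with --schema; a plain table that merely lives in an extension-created schema follows normal schema selection. A toy Python model of that rule (a sketch of the logic, not pg_dump's actual processExtensionTables() code):

```python
def dump_table_data(owning_extension, schema,
                    extension_filter, schema_filter):
    """Decide whether a table's data should appear in the dump.

    owning_extension: name of the extension the table depends on, or
        None for a plain table (even one in an extension-created schema).
    extension_filter / schema_filter: sets built from repeated
        --extension / --schema options; an empty set means "no filter".
    """
    if schema_filter and schema not in schema_filter:
        return False
    if owning_extension is not None:
        # An extension config table's data follows its extension,
        # even when its schema is explicitly listed.
        return not extension_filter or owning_extension in extension_filter
    return True
```

This matches the two new TAP cases: with `--extension=plpgsql --schema=public`, test_pg_dump's config table data is omitted; a non-dependent table in regress_pg_dump_schema is still dumped.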
Re: Replication slot stats misgivings
On Wed, Apr 14, 2021 at 12:09 PM Amit Kapila wrote:
>
> On Tue, Apr 13, 2021 at 1:37 PM vignesh C wrote:
> >
> > On Mon, Apr 12, 2021 at 7:03 PM Masahiko Sawada wrote:
> > >
> > > The following test for the latest v8 patch seems to show different.
> > > total_bytes is 1808 whereas spill_bytes is 1320. Am I missing
> > > something?
> > >
> > > postgres(1:85969)=# select pg_create_logical_replication_slot('s', 'test_decoding');
> > >  pg_create_logical_replication_slot
> > > ------------------------------------
> > >  (s,0/1884468)
> > > (1 row)
> > >
> > > postgres(1:85969)=# create table a (i int);
> > > CREATE TABLE
> > > postgres(1:85969)=# insert into a select generate_series(1, 10);
> > > INSERT 0 10
> > > postgres(1:85969)=# set logical_decoding_work_mem to 64;
> > > SET
> > > postgres(1:85969)=# select * from pg_stat_replication_slots ;
> > >  slot_name | total_txns | total_bytes | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | stats_reset
> > > -----------+------------+-------------+------------+-------------+-------------+-------------+--------------+--------------+-------------
> > >  s         |          0 |           0 |          0 |           0 |           0 |           0 |            0 |            0 |
> > > (1 row)
> > >
> > > postgres(1:85969)=# select count(*) from pg_logical_slot_peek_changes('s', NULL, NULL);
> > >  count
> > > -------
> > >     14
> > > (1 row)
> > >
> > > postgres(1:85969)=# select * from pg_stat_replication_slots ;
> > >  slot_name | total_txns | total_bytes | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | stats_reset
> > > -----------+------------+-------------+------------+-------------+-------------+-------------+--------------+--------------+-------------
> > >  s         |          2 |        1808 |          1 |         202 |        1320 |           0 |            0 |            0 |
> > > (1 row)
> > >
> > Thanks for identifying this issue, while spilling the transactions
> > reorder buffer changes gets released, we will not be able to get the
> > total size for spilled transactions from reorderbuffer size. I have
> > fixed it by including spilledbytes to totalbytes in case of spilled
> > transactions. Attached patch has the fix for this.
> > Thoughts?
> > > > I am not sure if that is the best way to fix it because sometimes we > clear the serialized flag in which case it might not give the correct > answer. Another way to fix it could be that before we try to restore a > new set of changes, we update totalBytes counter. See, the attached > patch atop your v6-0002-* patch. I felt calculating totalbytes this way is better than depending on spill_bytes. I have taken your changes. Attached patch includes the changes suggested. Thoughts? Regards, Vignesh From 9a4ca0fcef85d000856339cfcb2a58eb87ee5e72 Mon Sep 17 00:00:00 2001 From: vignesh Date: Wed, 14 Apr 2021 10:08:13 +0530 Subject: [PATCH v10 1/4] Added total txns and total txn bytes to replication statistics. This adds the statistics about total transactions count and total transaction data logically replicated to the decoding output plugin from ReorderBuffer. Users can query the pg_stat_replication_slots view to check these stats. --- contrib/test_decoding/expected/stats.out | 79 +-- contrib/test_decoding/sql/stats.sql | 48 +++ doc/src/sgml/monitoring.sgml | 25 ++ src/backend/catalog/system_views.sql | 2 + src/backend/postmaster/pgstat.c | 6 ++ src/backend/replication/logical/logical.c | 16 ++-- .../replication/logical/reorderbuffer.c | 17 src/backend/utils/adt/pgstatfuncs.c | 8 +- src/include/catalog/pg_proc.dat | 6 +- src/include/pgstat.h | 4 + src/include/replication/reorderbuffer.h | 4 + src/test/regress/expected/rules.out | 4 +- 12 files changed, 170 insertions(+), 49 deletions(-) diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out index bca36fa903..bc8e601eab 100644 --- a/contrib/test_decoding/expected/stats.out +++ b/contrib/test_decoding/expected/stats.out @@ -8,7 +8,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d CREATE TABLE stats_test(data text); -- function to wait for counters to advance -CREATE FUNCTION wait_for_decode_stats(check_reset bool) RETURNS void AS $$ 
+CREATE FUNCTION wait_for_decode_stats(check_reset bool, check_spill_txns bool) RETURNS void AS $$ DECLARE start_time timestamptz := clock_timestamp(); updated bool; @@ -16,12 +16,25 @@ BEGIN -- we don't want to wait forever; loop will exit after 30 seconds FOR i IN 1 .. 300 LOOP --- check to see if all updates have been reset/updated -SELECT CASE WHEN check_reset THEN (spill_txns = 0) -ELSE
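The accounting issue discussed above is that once changes are serialized to disk, their bytes disappear from the in-memory reorder buffer size, so the suggestion is to fold the bytes into the total at each spill (i.e. before a new set of changes is restored) rather than trying to recover them at the end. A toy Python model of that bookkeeping (a sketch only; the names loosely mirror pg_stat_replication_slots columns, not ReorderBuffer's actual C fields):

```python
class SlotStats:
    def __init__(self):
        self.total_txns = 0
        self.total_bytes = 0
        self.spill_count = 0
        self.spill_bytes = 0

def decode_transaction(stats, change_sizes, work_mem):
    """Model decoding one transaction's changes under a memory limit."""
    mem = 0
    for size in change_sizes:
        mem += size
        if mem > work_mem:
            # Serialize to disk: account these bytes in the total
            # *before* the in-memory changes are released, so they
            # are never lost from total_bytes.
            stats.spill_count += 1
            stats.spill_bytes += mem
            stats.total_bytes += mem
            mem = 0
    stats.total_bytes += mem  # remainder sent to the output plugin
    stats.total_txns += 1
```

With this ordering, total_bytes always equals the sum of all decoded change sizes and is never smaller than spill_bytes, which is the invariant Sawada-san's test showed being honored (1808 total vs 1320 spilled).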
Re: logical replication worker accesses catalogs in error context callback
On Wed, Mar 17, 2021 at 4:52 PM Bharath Rupireddy wrote: > > On Tue, Mar 16, 2021 at 2:21 AM Tom Lane wrote: > > > Thanks for pointing to the changes in the commit message. I corrected > > > them. Attaching v4 patch set, consider it for further review. > > > > I took a quick look at this. I'm quite worried about the potential > > performance cost of the v4-0001 patch (the one for fixing > > slot_store_error_callback). Previously, we didn't pay any noticeable > > cost for having the callback unless there actually was an error. > > As patched, we perform several catalog lookups per column per row, > > even in the non-error code path. That seems like it'd be a noticeable > > performance hit. Just to add insult to injury, it leaks memory. > > > > I propose a more radical but simpler solution: let's just not bother > > with including the type names in the context message. How much are > > they really buying? > > Thanks. In that case, the message can only return the local and remote > columns names and ignore the types (something like below). And the > user will have to figure out what are the types of those columns in > local and remote separately in case of error. Then the function > logicalrep_typmap_gettypname can also be removed. I'm not sure if this > is okay. Thoughts? Hi Tom, As suggested earlier, I'm attaching a v5 patch that avoids printing the column type names in the context message thus no cache lookups have to be done in the error context callback. I think the column name is enough to know on which column the error occurred and if required it's type can be known by the user. This patch gets rid of printing local and remote type names in slot_store_error_callback and also logicalrep_typmap_gettypname because it's unnecessary. I'm not sure if this solution is acceptable. Thoughts? With Regards, Bharath Rupireddy. 
EnterpriseDB: http://www.enterprisedb.com From a65e13d591bb151eb71eb8ec42834ce28a3db304 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Wed, 14 Apr 2021 13:25:16 +0530 Subject: [PATCH v5] Avoid Catalogue Accesses In slot_store_error_callback Avoid accessing system catalogues inside slot_store_error_callback error context callback, because the the entire transaction might have been broken at this point. logicalrep_typmap_gettypname() and format_type_be could search the sys cache. As per suggestion from Tom, a simple solution is to just avoid printing the column type names in the message, just the column name is enough to know on which column the error occurred. The above solution makes logicalrep_typmap_gettypname redundant hence removed it. --- src/backend/replication/logical/relation.c | 49 -- src/backend/replication/logical/worker.c | 25 +-- src/include/replication/logicalrelation.h | 2 - 3 files changed, 2 insertions(+), 74 deletions(-) diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c index e861c0ff80..cbc5612dbe 100644 --- a/src/backend/replication/logical/relation.c +++ b/src/backend/replication/logical/relation.c @@ -490,55 +490,6 @@ logicalrep_typmap_update(LogicalRepTyp *remotetyp) MemoryContextSwitchTo(oldctx); } -/* - * Fetch type name from the cache by remote type OID. - * - * Return a substitute value if we cannot find the data type; no message is - * sent to the log in that case, because this is used by error callback - * already. - */ -char * -logicalrep_typmap_gettypname(Oid remoteid) -{ - LogicalRepTyp *entry; - bool found; - - /* Internal types are mapped directly. */ - if (remoteid < FirstGenbkiObjectId) - { - if (!get_typisdefined(remoteid)) - { - /* - * This can be caused by having a publisher with a higher - * PostgreSQL major version than the subscriber. 
-			 */
-			return psprintf("unrecognized %u", remoteid);
-		}
-
-		return format_type_be(remoteid);
-	}
-
-	if (LogicalRepTypMap == NULL)
-	{
-		/*
-		 * If the typemap is not initialized yet, we cannot possibly attempt
-		 * to search the hash table; but there's no way we know the type
-		 * locally yet, since we haven't received a message about this type,
-		 * so this is the best we can do.
-		 */
-		return psprintf("unrecognized %u", remoteid);
-	}
-
-	/* search the mapping */
-	entry = hash_search(LogicalRepTypMap, (void *) &remoteid,
-						HASH_FIND, &found);
-	if (!found)
-		return psprintf("unrecognized %u", remoteid);
-
-	Assert(OidIsValid(entry->remoteid));
-	return psprintf("%s.%s", entry->nspname, entry->typname);
-}
-
 /*
  * Partition cache: look up partition LogicalRepRelMapEntry's
  *
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index fb3ba5c415..6494bef1d9 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -132,7 +132,6 @@ static dlist_head lsn_mapping = DLIST_STATIC_INIT(lsn_mapping);
 typedef struct SlotErrCallbackArg
 {
 	LogicalRepRelMapEntry *rel;
-	int			local_attnum;
 	int
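The principle behind this patch is that an error context callback may fire while the transaction is already broken, so it must not perform catalog or cache lookups; everything it will print has to be captured before entering the risky code. A toy Python sketch of that pattern (illustrative names only, not the worker.c API):

```python
import contextlib

class SlotErrCallbackArg:
    """Everything the context callback may print, captured up front."""
    def __init__(self, relname, attname):
        self.relname = relname
        self.attname = attname

@contextlib.contextmanager
def error_context(arg, context_lines):
    """Append a context line if the wrapped code raises.

    Safe by construction: it only formats pre-captured strings, and
    never performs lookups that could touch an aborted transaction's
    caches (the analogue of dropping logicalrep_typmap_gettypname /
    format_type_be calls from slot_store_error_callback).
    """
    try:
        yield
    except Exception:
        context_lines.append(
            f'processing remote data for replication target relation '
            f'"{arg.relname}" column "{arg.attname}"')
        raise
```

The trade-off matches Tom's suggestion: the message loses the type names, but the column name alone identifies where the error occurred, and the user can look up the types separately.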
Re: Monitoring stats docs inconsistency
On Tue, Apr 13, 2021 at 6:08 PM vignesh C wrote: > > Few of the statistics description in monitoring_stats.sgml doc is not > consistent. Made all the descriptions consistent by including > punctuation marks at the end of each description. > Thoughts? > I think monitoring.sgml uses a similar pattern as we use for system catalogs. I am not sure of the rules in this regard but it appears that normally for single line descriptions (for fields like OID, name, etc.), we don't use a full stop at the end. -- With Regards, Amit Kapila.
Re: [PATCH] Add extra statistics to explain for Nested Loop
Thank you for working on this issue. Your comments helped me make this patch more correct.

> Lines with "colon" format shouldn't use equal signs, and should use two
> spaces between fields.

Done. The extra line now looks like "Loop min_rows: %.0f max_rows: %.0f total_rows: %.0f" or "Loop min_time: %.3f max_time: %.3f min_rows: %.0f max_rows: %.0f total_rows: %.0f".

> Since this is now on a separate line, the "if (nloops > 1 && es->verbose)"
> can be after the existing "if (es->timing)", and shouldn't need its own
> "if (es->timing)". It should conditionally add a separate line, rather
> than duplicating the "(actual.*" line.
> - if (es->timing)
> + if (nloops > 1 && es->verbose)

The new version of the patch contains this correction; it helped make the patch shorter.

> In non-text mode, think you should not check "nloops > 1". Rather, print
> the field as 0.

The fields would not be zeros: the new line would almost repeat the line with the main statistics.

> I think the labels in non-text format should say "Loop Min Time" or
> similar. And these variables should have a loop_ prefix like loop_min_t ?

These are good ideas, so I changed it. I am attaching a new version of this patch; I hope it has improved. Please don't hesitate to share any thoughts on this topic.
-- Ekaterina Sokolova Postgres Professional: http://www.postgrespro.com The Russian Postgres CompanyFrom: "Ekaterina Sokolova" diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index b62a76e7e5a..bf8c37baefd 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -1615,6 +1615,11 @@ ExplainNode(PlanState *planstate, List *ancestors, double startup_ms = 1000.0 * planstate->instrument->startup / nloops; double total_ms = 1000.0 * planstate->instrument->total / nloops; double rows = planstate->instrument->ntuples / nloops; + double loop_total_rows = planstate->instrument->ntuples; + double loop_min_r = planstate->instrument->min_tuples; + double loop_max_r = planstate->instrument->max_tuples; + double loop_min_t_ms = 1000.0 * planstate->instrument->min_t; + double loop_max_t_ms = 1000.0 * planstate->instrument->max_t; if (es->format == EXPLAIN_FORMAT_TEXT) { @@ -1626,6 +1631,19 @@ ExplainNode(PlanState *planstate, List *ancestors, appendStringInfo(es->str, " (actual rows=%.0f loops=%.0f)", rows, nloops); + if (nloops > 1 && es->verbose) + { +appendStringInfo(es->str, "\n"); +ExplainIndentText(es); +if (es->timing) + appendStringInfo(es->str, + "Loop min_time: %.3f max_time: %.3f min_rows: %.0f max_rows: %.0f total_rows: %.0f", + loop_min_t_ms, loop_max_t_ms, loop_min_r, loop_max_r, loop_total_rows); +else + appendStringInfo(es->str, + "Loop min_rows: %.0f max_rows: %.0f total_rows: %.0f", + loop_min_r, loop_max_r, loop_total_rows); + } } else { @@ -1635,8 +1653,21 @@ ExplainNode(PlanState *planstate, List *ancestors, 3, es); ExplainPropertyFloat("Actual Total Time", "s", total_ms, 3, es); +if (nloops > 1 && es->verbose) +{ + ExplainPropertyFloat("Loop Min Time", "s", loop_min_t_ms, + 3, es); + ExplainPropertyFloat("Loop Max Time", "s", loop_max_t_ms, + 3, es); +} } ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es); + if (nloops > 1 && es->verbose) + { +ExplainPropertyFloat("Loop Min Rows", NULL, loop_min_r, 
0, es); +ExplainPropertyFloat("Loop Max Rows", NULL, loop_max_r, 0, es); +ExplainPropertyFloat("Loop Total Rows", NULL, loop_total_rows, 0, es); + } ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es); } } @@ -1646,6 +1677,7 @@ ExplainNode(PlanState *planstate, List *ancestors, appendStringInfoString(es->str, " (never executed)"); else { + /* without min and max values because actual result is 0 */ if (es->timing) { ExplainPropertyFloat("Actual Startup Time", "ms", 0.0, 3, es); @@ -1672,12 +1704,22 @@ ExplainNode(PlanState *planstate, List *ancestors, double startup_ms; double total_ms; double rows; + double loop_min_t_ms; + double loop_max_t_ms; + double loop_min_r; + double loop_max_r; + double loop_total_rows; if (nloops <= 0) continue; startup_ms = 1000.0 * instrument->startup / nloops; total_ms = 1000.0 * instrument->total / nloops; rows = instrument->ntuples / nloops; + loop_min_t_ms = 1000.0 * instrument->min_t; + loop_max_t_ms = 1000.0 * instrument->max_t; + loop_min_r = instrument->min_tuples; + loop_max_r = instrument->max_tuples; + loop_total_rows = instrument->ntuples; ExplainOpenWorker(n, es); @@ -1692,6 +1734,19 @@ ExplainNode(PlanState *planstate, List *ancestors, appendStringInfo(es->str, "actual rows=%.0f loops=%.0f\n", rows, nloops); +if (nloops > 1) +{ +
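The patch above adds per-loop minimum, maximum, and total row counters beside the existing averaged "actual rows" figure. A toy Python model of that bookkeeping and of the verbose-only extra line (a sketch only; the names mirror the patch's loop_min/loop_max variables, not PostgreSQL's actual Instrumentation struct):

```python
class Instrumentation:
    def __init__(self):
        self.nloops = 0
        self.ntuples = 0.0
        self.min_tuples = float("inf")
        self.max_tuples = 0.0

    def stop_loop(self, tuples):
        # Accumulate totals and track per-loop extremes, as the patch
        # does for min_tuples / max_tuples in instrument.c.
        self.nloops += 1
        self.ntuples += tuples
        self.min_tuples = min(self.min_tuples, tuples)
        self.max_tuples = max(self.max_tuples, tuples)

def explain_line(instr):
    rows = instr.ntuples / instr.nloops          # existing averaged figure
    line = f"(actual rows={rows:.0f} loops={instr.nloops:.0f})"
    if instr.nloops > 1:                         # verbose-only extra line
        line += (f"\nLoop min_rows: {instr.min_tuples:.0f}  "
                 f"max_rows: {instr.max_tuples:.0f}  "
                 f"total_rows: {instr.ntuples:.0f}")
    return line
```

This shows why the extra line is worth having: an average of 5 rows over 3 loops can hide loops that produced anywhere from 2 to 10 rows.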
Re: jsonb subscripting assignment performance
On Wed, Apr 14, 2021, at 11:07, Oleg Bartunov wrote: > I and Nikita are working on OLTP jsonb > http://www.sai.msu.su/~megera/postgres/talks/jsonb-pgconfonline-2021.pdf > Page 49/55 in the PDF: "UPDATE test_toast SET jb = jsonb_set(jb, {keyN,0}, ?);" Would you get similar improvements if updating jsonb variables in PL/pgSQL? If not, could the infrastructure somehow be reused to improve the PL/pgSQL use-case as well? I would be happy to help out if there is something I can do, such as testing. /Joel
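The surprise in this thread comes down to value immutability: both `jb['key'] = x` and `jsonb_set(jb, '{key}', x)` construct a brand-new jsonb value, so a one-key change still costs a rewrite of the whole (possibly detoasted) document. A toy Python model of that copy semantics (an illustration of the cost model only, not PostgreSQL's storage code):

```python
import copy

def jsonb_set_model(doc, path, new_value):
    """Model PostgreSQL's copy semantics for jsonb updates.

    The result is a fresh value and the input is left untouched, so
    the work scales with the whole document even for a one-key change.
    True in-place mutation would require something other than the
    immutable TOASTed representation, which is what the OLTP-jsonb
    work referenced above is exploring.
    """
    out = copy.deepcopy(doc)       # full rewrite, the dominant cost
    target = out
    for key in path[:-1]:
        target = target[key]
    target[path[-1]] = new_value
    return out
```

This is also why repeated jsonb_set calls in a PL/pgSQL loop are the bottleneck Joel describes: each call pays the full-document copy again.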
Re: [PATCH] PREPARE TRANSACTION unexpected behavior with TEMP TABLE
Hi Robert,

Thanks for sharing your thoughts. The purpose of this fix is mainly to get consistent behavior from PREPARE TRANSACTION. In the case I mentioned previously, my expectation was that either both PREPARE TRANSACTIONs should fail or both should succeed, but here the second, identical "PREPARE TRANSACTION" succeeded while the first one failed with an error, which seems weird to me. I have also tried to reproduce the behavior.

[session:1]
postgres=# create temp table fullname (first text, last text);
CREATE TABLE

[session:2]
postgres=# select oid::regclass from pg_class where relname = 'fullname';
        oid
--------------------
 pg_temp_3.fullname

postgres=# BEGIN; create function longname1(pg_temp_3.fullname) returns text language sql as $$select $1.first || ' ' || $1.last$$;
BEGIN
CREATE FUNCTION
postgres=*# prepare transaction 'mytran2';
ERROR:  cannot PREPARE a transaction that has operated on temporary objects
postgres=# BEGIN; create function longname1(pg_temp_3.fullname) returns text language sql as $$select $1.first || ' ' || $1.last$$;
BEGIN
CREATE FUNCTION

[session:1]
postgres=# \q    -- no problem in exiting

[session:2]
postgres=*# prepare transaction 'mytran2';
PREPARE TRANSACTION
postgres=# \df
ERROR:  cache lookup failed for type 16429

Looking at the comment in the code, [session:1] should hang while exiting, but I don't see a problem here; you have already explained that in your reply. Even so, I feel the behavior should be consistent when we mix temporary objects into PREPARE TRANSACTION.

> The comments in PrepareTransaction() justify this prohibition by saying
> that "Having the prepared xact hold locks on another backend's temp
> table seems a bad idea --- for instance it would prevent the backend
> from exiting. There are other problems too, such as how to clean up the
> source backend's local buffers and ON COMMIT state if the prepared xact
> includes a DROP of a temp table."

I can see from the above experiment that there is no problem with the lock in this case, but I am not sure whether there is any issue with "clean up the source backend's local buffers"; if not, do we even need this ERROR ("cannot PREPARE a transaction that has operated on temporary objects") in PREPARE TRANSACTION?

> To really fix this, you'd need CREATE FUNCTION to take a lock on the
> containing namespace, whether permanent or temporary. You'd also need
> every other CREATE statement that creates a schema-qualified object to
> do the same thing. Maybe that's a good idea, but we've been reluctant
> to go that far in the past due to performance consequences, and it's
> not clear whether any of those problems are related to the issue that
> prompted you to submit the patch.

Yes, the purpose of this patch is to have a valid value in XACT_FLAGS_ACCESSEDTEMPNAMESPACE; that is, it should always be true if we access a temporary object and false otherwise. Even if we change CREATE FUNCTION to take such a lock, we still need this fix in place so that a "PREPARE TRANSACTION" mixed with temporary objects is always rejected and cannot cause the hang issue (which we would start observing once those CREATE-statement changes are implemented) mentioned in the comment in PrepareTransaction().

I am just wondering whether it is acceptable to fix this and make it consistent by properly setting XACT_FLAGS_ACCESSEDTEMPNAMESPACE, so that PREPARE TRANSACTION always fails when we access a temporary object. I agree that the xact lock does not actually cause a problem here, but from a user's perspective it seems weird that the same PREPARE TRANSACTION works from the second attempt onwards. Thoughts?

Thanks,
Himanshu

On Thu, Apr 8, 2021 at 7:43 PM Robert Haas wrote:
> On Wed, Apr 7, 2021 at 12:19 PM Himanshu Upadhyaya wrote:
> > Please find attached the V2 patch.
> > Hi, > > This patch is essentially taking the position that calling > load_typcache_tupdesc before using that tupdesc for anything is > required for correctness. I'm not sure whether that's a good > architectural decision: to me, it looks like whoever wrote this code > originally - I think it was Tom - had the idea that it would be OK to > skip calling that function whenever we already have the value. > Changing that has some small performance cost, and it also just looks > kind of weird, because you don't expect a function called > load_typcache_tupdesc() to have the side effect of preventing some > kind of bad thing from happening. You just expect it to be loading > stuff. The comments in this code are not exactly stellar as things > stand, but the patch also doesn't update them in a meaningful way. > Sure, it corrects a few comments that would be flat-out wrong > otherwise, but it doesn't add any kind of explanation that would help > the next person who looks at this code understand why they shouldn't > just put back the exact same performance optimization you're proposing > to rip
Re: Unresolved repliaction hang and stop problem.
On Tue, Apr 13, 2021 at 1:18 PM Krzysztof Kois wrote:
>
> Hello,
> After upgrading the cluster from 10.x to 13.1 we've started getting a
> problem described on pgsql-general:
> https://www.postgresql.org/message-id/8bf8785c-f47d-245c-b6af-80dc1eed40db%40unitygroup.com
> We've noticed a similar issue being described on this list in
> https://www.postgresql-archive.org/Logical-replication-CPU-bound-with-TRUNCATE-DROP-CREATE-many-tables-tt6155123.html
> with a fix being rolled out in 13.2.

The fix for the problem discussed in the above threads is committed only in PG-14, see [1]. I don't know what makes you think it is fixed in 13.2. Also, it is not easy to back-patch, because the fix depends on some of the infrastructure introduced in PG-14.

[1] - https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d7eb52d7181d83cf2363570f7a205b8eb1008dbc

--
With Regards,
Amit Kapila.
Re: Truncate in synchronous logical replication failed
On Tue, Apr 13, 2021 at 8:07 PM Petr Jelinek wrote: > > > On 12 Apr 2021, at 08:58, Amit Kapila wrote: > > > > The problem happens only when we try to fetch IDENTITY_KEY attributes > > because pgoutput uses RelationGetIndexAttrBitmap() to get that > > information which locks the required indexes. Now, because TRUNCATE > > has already acquired an exclusive lock on the index, it seems to > > create a sort of deadlock where the actual Truncate operation waits > > for logical replication of operation to complete and logical > > replication waits for actual Truncate operation to finish. > > > > Do we really need to use RelationGetIndexAttrBitmap() to build > > IDENTITY_KEY attributes? During decoding, we don't even lock the main > > relation, we just scan the system table and build that information > > using a historic snapshot. Can't we do something similar here? > > > > Adding Petr J. and Peter E. to know their views as this seems to be an > > old problem (since the decoding of Truncate operation is introduced). > > We used RelationGetIndexAttrBitmap because it already existed, no other > reason. > Fair enough. But I think we should do something about it because using the same (RelationGetIndexAttrBitmap) just breaks the synchronous logical replication. I think this is broken since the logical replication of Truncate is supported. > I am not sure what exact locking we need but I would have guessed at least > AccessShareLock would be needed. > Are you telling that we need AccessShareLock on the index? If so, why is it different from how we access the relation during decoding (basically in ReorderBufferProcessTXN, we directly use RelationIdGetRelation() without any lock on the relation)? I think we do it that way because we need it to process WAL entries and we need the relation state based on the historic snapshot, so, even if the relation is later changed/dropped, we are fine with the old state we got. Isn't the same thing applies here in logicalrep_write_attrs? 
If that is true, then some equivalent of RelationGetIndexAttrBitmap that uses RelationIdGetRelation instead of index_open should work. Am I missing something?

--
With Regards,
Amit Kapila.
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements
On Wed, 2021-04-14 at 17:32 +0800, Julien Rouhaud wrote:
> did you enable compute_query_id new parameter?

Hi, Julien! Thank you very much! I had missed it.
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements
On Wed, Apr 14, 2021 at 17:22, Andrei Zubkov wrote:
> But I'm unable to test the patch - it seems that pg_stat_statements is
> receiving queryId = 0 for every statement in every hook now and
> statements are not tracked at all.
>
> Am I mistaken somewhere? Maybe you know why this is happening?

Did you enable the new compute_query_id parameter?
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements
Hi, Kuroda! I've intended to change the pg_stat_statements version with rebasing this patch to the current master branch state. Now this is commit 07e5e66. But I'm unable to test the patch - it seems that pg_stat_statements is receiving queryId = 0 for every statements in every hook now and statements are not tracked at all. Am I mistaken somewhere? Maybe you know why this is happening? Thank you! -- Andrei Zubkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company On Fri, 2021-04-09 at 00:23 +, kuroda.hay...@fujitsu.com wrote: > Dear Andrei, > > > I think, yes, version of pg_stat_statements is need to be updated. > > Is > > it will be 1.10 in PG15? > > I think you are right. > According to [1] we can bump up the version per one PG major version, > and any features are not committed yet for 15. > > [1]: https://www.postgresql.org/message-id/20201202040516.GA43757@nol > > > Best Regards, > Hayato Kuroda > FUJITSU LIMITED >
Re: jsonb subscripting assignment performance
st 14. 4. 2021 v 11:07 odesílatel Oleg Bartunov napsal: > > > On Wed, Apr 14, 2021 at 11:09 AM Pavel Stehule > wrote: > >> >> >> st 14. 4. 2021 v 9:57 odesílatel Dmitry Dolgov <9erthali...@gmail.com> >> napsal: >> >>> > On Wed, Apr 14, 2021 at 09:20:08AM +0200, Pavel Stehule wrote: >>> > st 14. 4. 2021 v 7:39 odesílatel Joel Jacobson >>> napsal: >>> > >>> > > Hi, >>> > > >>> > > commit 676887a3 added support for jsonb subscripting. >>> > > >>> > > Many thanks for working on this. I really like the improved syntax. >>> > > >>> > > I was also hoping for some performance benefits, >>> > > but my testing shows that >>> > > >>> > >jsonb_value['existing_key'] = new_value; >>> > > >>> > > takes just as long time as >>> > > >>> > >jsonb_value := jsonb_set(jsonb_value, ARRAY['existing_key'], >>> new_value); >>> > > >>> > > which is a bit surprising to me. Shouldn't subscripting be a lot >>> faster, >>> > > since it could modify the existing data structure in-place? What am I >>> > > missing here? >>> > > >>> > >>> > no - it doesn't support in-place modification. Only arrays and records >>> > support it. >>> > >>> > >>> > > I came to think of the this new functionality when trying to >>> optimize some >>> > > PL/pgSQL code where the bottle-neck turned out to be lots of calls >>> > > to jsonb_set() for large jsonb objects. >>> > > >>> > >>> > sure - there is big room for optimization. But this patch was big >>> enough >>> > without its optimization. And it was not clean, if I will be committed >>> or >>> > not (it waited in commitfest application for 4 years). So I accepted >>> > implemented behaviour (without inplace update). Now, this patch is in >>> core, >>> > and anybody can work on others possible optimizations. >>> >>> Right, jsonb subscripting deals mostly with the syntax part and doesn't >>> change internal jsonb behaviour. If I understand the original question >>> correctly, "in-place" here means updating of e.g. 
just one particular >>> key within a jsonb object, since jsonb_set looks like an overwrite of >>> the whole jsonb. If so, then update will still cause the whole jsonb to >>> be updated, there is no partial update functionality for the on-disk >>> format. Although there is work going on to optimize this in case when >>> jsonb is big enough to be put into a toast table (partial toast >>> decompression thread, or bytea appendable toast). >>> >> >> Almost all and almost everywhere Postgres's values are immutable. There >> is only one exception - runtime plpgsql. "local variables" can hold values >> of complex values unboxed. Then the repeated update is significantly >> cheaper. Normal non repeated updates have the same speed, because the value >> should be unboxed and boxed. Outside plpgsql the values are immutable. I >> think this is a very hard problem, how to update big toasted values >> effectively, and I am not sure if there is a solution. TOAST value is >> immutable. It needs to introduce some alternative to TOAST. The benefits >> are clear - it can be nice to have fast append arrays for time series. But >> this is a very different topic. >> > > I and Nikita are working on OLTP jsonb > http://www.sai.msu.su/~megera/postgres/talks/jsonb-pgconfonline-2021.pdf > +1 Pavel > >> >> Regards >> >> Pavel >> >> >> >> >> >> >> >> > > -- > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company >
Re: jsonb subscripting assignment performance
On Wed, Apr 14, 2021 at 11:09 AM Pavel Stehule wrote: > > > st 14. 4. 2021 v 9:57 odesílatel Dmitry Dolgov <9erthali...@gmail.com> > napsal: > >> > On Wed, Apr 14, 2021 at 09:20:08AM +0200, Pavel Stehule wrote: >> > st 14. 4. 2021 v 7:39 odesílatel Joel Jacobson >> napsal: >> > >> > > Hi, >> > > >> > > commit 676887a3 added support for jsonb subscripting. >> > > >> > > Many thanks for working on this. I really like the improved syntax. >> > > >> > > I was also hoping for some performance benefits, >> > > but my testing shows that >> > > >> > >jsonb_value['existing_key'] = new_value; >> > > >> > > takes just as long time as >> > > >> > >jsonb_value := jsonb_set(jsonb_value, ARRAY['existing_key'], >> new_value); >> > > >> > > which is a bit surprising to me. Shouldn't subscripting be a lot >> faster, >> > > since it could modify the existing data structure in-place? What am I >> > > missing here? >> > > >> > >> > no - it doesn't support in-place modification. Only arrays and records >> > support it. >> > >> > >> > > I came to think of the this new functionality when trying to optimize >> some >> > > PL/pgSQL code where the bottle-neck turned out to be lots of calls >> > > to jsonb_set() for large jsonb objects. >> > > >> > >> > sure - there is big room for optimization. But this patch was big enough >> > without its optimization. And it was not clean, if I will be committed >> or >> > not (it waited in commitfest application for 4 years). So I accepted >> > implemented behaviour (without inplace update). Now, this patch is in >> core, >> > and anybody can work on others possible optimizations. >> >> Right, jsonb subscripting deals mostly with the syntax part and doesn't >> change internal jsonb behaviour. If I understand the original question >> correctly, "in-place" here means updating of e.g. just one particular >> key within a jsonb object, since jsonb_set looks like an overwrite of >> the whole jsonb. 
If so, then update will still cause the whole jsonb to >> be updated, there is no partial update functionality for the on-disk >> format. Although there is work going on to optimize this in case when >> jsonb is big enough to be put into a toast table (partial toast >> decompression thread, or bytea appendable toast). >> > > Almost all and almost everywhere Postgres's values are immutable. There is > only one exception - runtime plpgsql. "local variables" can hold values of > complex values unboxed. Then the repeated update is significantly cheaper. > Normal non repeated updates have the same speed, because the value should > be unboxed and boxed. Outside plpgsql the values are immutable. I think > this is a very hard problem, how to update big toasted values effectively, > and I am not sure if there is a solution. TOAST value is immutable. It > needs to introduce some alternative to TOAST. The benefits are clear - it > can be nice to have fast append arrays for time series. But this is a very > different topic. > I and Nikita are working on OLTP jsonb http://www.sai.msu.su/~megera/postgres/talks/jsonb-pgconfonline-2021.pdf > > Regards > > Pavel > > > > > > > > -- Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: More sepgsql weirdness
Hi On Tue, Apr 13, 2021 at 6:22 PM Robert Haas wrote: > On Tue, Apr 13, 2021 at 10:33 AM Dave Page wrote: > > On a system with selinux and sepgsql configured, search path resolution > appears to fail if sepgsql is in enforcing mode, but selinux is in > permissive mode (which, as I understand it, should cause sepgsql to behave > as if it's in permissive mode anyway - and does for other operations). > Regardless of whether my understanding of the interaction of the two > permissive modes is correct, I don't believe the following should happen: > > I agree that this sounds like something which shouldn't happen if the > system is in permissive mode, I realised that my test database hadn't had the sepgsql SQL script run in it (I must have created it before running it on template1). I guess the error was caused by lack of proper labelling. So, clearly my fault, but I think there are a couple of things we need to do here: 1) Improve the docs for sepgsql. The *only* vaguely useful source of info I've found on using this is "SELinux System Administration", a Packt book by Sven Vermeulen. Our own docs don't even list the supported object classes (e.g. db_table) or types (e.g. sepgsql_ro_table_t) for example. 2) Improve the way we handle cases like the one I ran into. I only realised what was going on when I tried to run sepgsql_getcon() to confirm I was running in undefined_t. Clearly very weird things can happen if labelling hasn't been run; perhaps we could raise a notice if the sepgsql module is loaded but sepgsql_getcon() isn't present (though that seems flakey at best)? I'd hesitate to try to check for the presence of one or more labels as the admin could have intentionally removed them or changed them of course. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EDB: http://www.enterprisedb.com
RE: Truncate in synchronous logical replication failed
On Wednesday, April 14, 2021 11:38 AM Japin Li wrote: > On Tue, 13 Apr 2021 at 21:54, osumi.takami...@fujitsu.com > wrote: > > On Monday, April 12, 2021 3:58 PM Amit Kapila > wrote: > >> On Mon, Apr 12, 2021 at 10:03 AM osumi.takami...@fujitsu.com > >> wrote: > >> > but if we take a measure to fix the doc, we have to be careful for > >> > the description, because when we remove the primary keys of 'test' > >> > tables on the > >> scenario in [1], we don't have this issue. > >> > It means TRUNCATE in synchronous logical replication is not always > >> blocked. > >> > > >> > >> The problem happens only when we try to fetch IDENTITY_KEY attributes > >> because pgoutput uses RelationGetIndexAttrBitmap() to get that > >> information which locks the required indexes. Now, because TRUNCATE > >> has already acquired an exclusive lock on the index, it seems to > >> create a sort of deadlock where the actual Truncate operation waits > >> for logical replication of operation to complete and logical replication > >> waits > for actual Truncate operation to finish. > >> > >> Do we really need to use RelationGetIndexAttrBitmap() to build > >> IDENTITY_KEY attributes? During decoding, we don't even lock the main > >> relation, we just scan the system table and build that information > >> using a historic snapshot. Can't we do something similar here? > > I think we can build the IDENTITY_KEY attributes with NoLock instead > > of calling RelationGetIndexAttrBitmap(). > > > > When we trace back the caller side of logicalrep_write_attrs(), doing > > the thing equivalent to RelationGetIndexAttrBitmap() for > > INDEX_ATTR_BITMAP_IDENTITY_KEY impacts only pgoutput_truncate. > > > > OTOH, I can't find codes similar to RelationGetIndexAttrBitmap() in > > pgoutput_* functions and in the file of relcache.c. > > Therefore, I'd like to discuss how to address the hang. 
> > > > My first idea is to extract some parts of RelationGetIndexAttrBitmap() > > only for INDEX_ATTR_BITMAP_IDENTITY_KEY and implement those either > in > > a logicalrep_write_attrs() or as a new function. > > RelationGetIndexAttrBitmap() has 'restart' label for goto statement in > > order to ensure to return up-to-date attribute bitmaps, so I prefer > > having a new function when we choose this direction. > > Having that goto in logicalrep_write_attrs() makes it a little bit messy, I > > felt. > > > > The other direction might be to extend RelationGetIndexAttrBitmap's > > function definition to accept lockmode to give NoLock from > logicalrep_write_attrs(). > > But, this change impacts on other several callers so is not as good as the > > first > direction above, I think. > > > > If someone has any better idea, please let me know. > > > > I think the first idea is better than the second. OTOH, can we release the > locks > before SyncRepWaitForLSN(), since it already flush to local WAL files. Thank you for your comments. I didn't mean to touch the TRUNCATE side to release the locks, because I expected that its AccessExclusiveLock would block any other operation (e.g. DROP INDEX) on the table that affects IDENTITY KEY building. But, as I said in another e-mail, neither of the ideas above can work. Really sorry for the noise. Best Regards, Takamichi Osumi
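The hang discussed in this thread is a classic wait cycle: TRUNCATE holds an AccessExclusiveLock on the index and waits for synchronous replication to confirm, while the decoding side blocks trying to take that same index lock inside RelationGetIndexAttrBitmap(). A toy Python sketch of the cycle follows — ordinary threads and locks, not PostgreSQL internals; the timeouts exist only so the demonstration terminates instead of hanging:

```python
import threading
import time

index_lock = threading.Lock()      # stands in for the index's AccessExclusiveLock
decoding_done = threading.Event()  # TRUNCATE waits on this, as on sync-rep confirmation
results = {}

def truncate():
    with index_lock:                      # TRUNCATE locks the index first...
        decoding_done.wait(timeout=0.5)   # ...then waits for decoding to confirm

def decode_truncate():
    # The decoder needs the index to build IDENTITY KEY attributes,
    # so it blocks on the very lock TRUNCATE already holds.
    got = index_lock.acquire(timeout=0.2)
    results["got_lock"] = got
    if got:
        index_lock.release()
        decoding_done.set()

t1 = threading.Thread(target=truncate)
t1.start()
time.sleep(0.05)                   # let TRUNCATE grab the lock first
t2 = threading.Thread(target=decode_truncate)
t2.start()
t1.join()
t2.join()
print(results["got_lock"])         # False: each side is waiting on the other
```

With real locks and no timeouts, neither side ever proceeds, which matches the observed hang; building the attribute bitmap without taking the index lock (as during normal decoding with a historic snapshot) breaks the cycle.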
File truncation within PostgresNode::issues_sql_like() wrong on Windows
Hi all, As fairywren has proved a couple of days ago, it is not really a good idea to rely on file truncation to check for patterns in the logs of the backend: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren=2021-04-07%2013%3A29%3A28 Visibly, logic based on log file truncation fails on Windows because of the concurrent access of the backend that outputs its logs there. In PostgresNode.pm, connect_ok() and connect_fails() enforce a rotation of the log file before restarting the server on Windows to make sure that a given step does not find logs generated by a previous test, but that's not the case for issues_sql_like(). Looking at the existing tests using this routine (src/bin/scripts/), I have found one test in 090_reindexdb.pl that could lead to a false positive. The test is marked in the patch attached, just for awareness. Would there be any objections to changing this routine so that we avoid the file truncation on Windows? The patch attached achieves that. Any thoughts? -- Michael diff --git a/src/bin/scripts/t/090_reindexdb.pl b/src/bin/scripts/t/090_reindexdb.pl index 159b637230..8b06218d6b 100644 --- a/src/bin/scripts/t/090_reindexdb.pl +++ b/src/bin/scripts/t/090_reindexdb.pl @@ -174,6 +174,7 @@ $node->command_fails( $node->command_fails( [ 'reindexdb', '-j', '2', '-i', 'i1', 'postgres' ], 'parallel reindexdb cannot process indexes'); +# XXX The first query maps with a test above. $node->issues_sql_like( [ 'reindexdb', '-j', '2', 'postgres' ], qr/statement:\ REINDEX SYSTEM postgres; diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm index e26b2b3f30..9daa438ccc 100644 --- a/src/test/perl/PostgresNode.pm +++ b/src/test/perl/PostgresNode.pm @@ -2195,7 +2195,8 @@ Run a command on the node, then verify that $expected_sql appears in the server log file. Reads the whole log file so be careful when working with large log outputs. -The log file is truncated prior to running the command, however.
+The log file is truncated prior to running the command, and on Windows, a +rotation of the log file is done before restarting the node. =cut @@ -2207,7 +2208,18 @@ sub issues_sql_like local %ENV = $self->_get_env(); - truncate $self->logfile, 0; + # On Windows, the truncation would not work, so rotate the log + # file before restarting the server afresh. + if ($TestLib::windows_os) + { + $self->rotate_logfile; + $self->restart; + } + else + { + truncate $self->logfile, 0; + } + my $result = TestLib::run_log($cmd); ok($result, "@$cmd exit code 0"); my $log = TestLib::slurp_file($self->logfile);
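The offset-based approach used for connect_ok()/connect_fails() earlier in the thread — remember the log's size before running the command, then match only against what was appended after that point — avoids both truncation and rotation. A minimal Python sketch of that idea, with made-up file names rather than the actual Perl TAP code:

```python
import os
import re

def issues_log_like(logfile, run_command, pattern):
    """Check that run_command() appends a line matching pattern to the log,
    without truncating or rotating the log file (so it is Windows-safe)."""
    # Remember where the log ends *before* the command runs...
    offset = os.path.getsize(logfile)
    run_command()
    # ...and only look at what was appended after that point.
    with open(logfile, "rb") as f:
        f.seek(offset)
        new_contents = f.read().decode()
    return re.search(pattern, new_contents) is not None

# Demo with a fake "server log": a stale entry must not match.
log = "fake_server.log"
with open(log, "w") as f:
    f.write("statement: REINDEX TABLE old_run;\n")

def fake_command():
    with open(log, "a") as f:
        f.write("statement: REINDEX SYSTEM postgres;\n")

print(issues_log_like(log, fake_command, r"REINDEX SYSTEM"))  # True
print(issues_log_like(log, lambda: None, r"REINDEX TABLE"))   # False: stale entry ignored
os.remove(log)
```

Because the pre-command portion of the file is never read, previous tests cannot produce false positives, and the backend can keep writing to the same file concurrently.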
Re: jsonb subscripting assignment performance
st 14. 4. 2021 v 9:57 odesílatel Dmitry Dolgov <9erthali...@gmail.com> napsal: > > On Wed, Apr 14, 2021 at 09:20:08AM +0200, Pavel Stehule wrote: > > st 14. 4. 2021 v 7:39 odesílatel Joel Jacobson > napsal: > > > > > Hi, > > > > > > commit 676887a3 added support for jsonb subscripting. > > > > > > Many thanks for working on this. I really like the improved syntax. > > > > > > I was also hoping for some performance benefits, > > > but my testing shows that > > > > > >jsonb_value['existing_key'] = new_value; > > > > > > takes just as long time as > > > > > >jsonb_value := jsonb_set(jsonb_value, ARRAY['existing_key'], > new_value); > > > > > > which is a bit surprising to me. Shouldn't subscripting be a lot > faster, > > > since it could modify the existing data structure in-place? What am I > > > missing here? > > > > > > > no - it doesn't support in-place modification. Only arrays and records > > support it. > > > > > > > I came to think of the this new functionality when trying to optimize > some > > > PL/pgSQL code where the bottle-neck turned out to be lots of calls > > > to jsonb_set() for large jsonb objects. > > > > > > > sure - there is big room for optimization. But this patch was big enough > > without its optimization. And it was not clean, if I will be committed or > > not (it waited in commitfest application for 4 years). So I accepted > > implemented behaviour (without inplace update). Now, this patch is in > core, > > and anybody can work on others possible optimizations. > > Right, jsonb subscripting deals mostly with the syntax part and doesn't > change internal jsonb behaviour. If I understand the original question > correctly, "in-place" here means updating of e.g. just one particular > key within a jsonb object, since jsonb_set looks like an overwrite of > the whole jsonb. If so, then update will still cause the whole jsonb to > be updated, there is no partial update functionality for the on-disk > format. 
Although there is work going on to optimize this in case when > jsonb is big enough to be put into a toast table (partial toast > decompression thread, or bytea appendable toast). > Almost all values in Postgres, almost everywhere, are immutable. There is only one exception - PL/pgSQL at runtime: "local variables" can hold complex values unboxed, so a repeated update is significantly cheaper. A normal, non-repeated update has the same speed, because the value still has to be unboxed and boxed. Outside PL/pgSQL, values are immutable. I think this is a very hard problem - how to update big toasted values effectively - and I am not sure there is a solution. A TOAST value is immutable, so some alternative to TOAST would need to be introduced. The benefits are clear - it would be nice to have fast append arrays for time series - but this is a very different topic. Regards Pavel
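The boxed-versus-unboxed distinction Pavel describes can be illustrated outside Postgres. A toy Python sketch (not PostgreSQL code) contrasting copy-on-write updates, which is roughly what jsonb_set and subscripting assignment do today, with true in-place mutation, which is only safe while a value lives unboxed in a local variable:

```python
# Copy-on-write update: every assignment rebuilds the whole value,
# which is roughly what jsonb_set / jsonb subscripting do today.
def set_key_immutable(doc, key, value):
    new_doc = dict(doc)      # full copy: cost grows with the size of doc
    new_doc[key] = value
    return new_doc

# In-place update: only possible while the value lives "unboxed",
# as with a PL/pgSQL local variable; the boxed/TOASTed form never allows it.
def set_key_inplace(doc, key, value):
    doc[key] = value         # constant cost, but mutates the original

original = {"a": 1, "b": 2}
updated = set_key_immutable(original, "a", 99)
print(original["a"], updated["a"])  # 1 99 -> the original is untouched

set_key_inplace(original, "a", 99)
print(original["a"])                # 99   -> the original was mutated
```

This is why repeated updates to a large jsonb are expensive today: each one pays the full copy, and nothing short of a mutable alternative to TOAST changes that for on-disk values.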
Re: jsonb subscripting assignment performance
> On Wed, Apr 14, 2021 at 09:20:08AM +0200, Pavel Stehule wrote: > st 14. 4. 2021 v 7:39 odesílatel Joel Jacobson napsal: > > > Hi, > > > > commit 676887a3 added support for jsonb subscripting. > > > > Many thanks for working on this. I really like the improved syntax. > > > > I was also hoping for some performance benefits, > > but my testing shows that > > > >jsonb_value['existing_key'] = new_value; > > > > takes just as long time as > > > >jsonb_value := jsonb_set(jsonb_value, ARRAY['existing_key'], new_value); > > > > which is a bit surprising to me. Shouldn't subscripting be a lot faster, > > since it could modify the existing data structure in-place? What am I > > missing here? > > > > no - it doesn't support in-place modification. Only arrays and records > support it. > > > > I came to think of the this new functionality when trying to optimize some > > PL/pgSQL code where the bottle-neck turned out to be lots of calls > > to jsonb_set() for large jsonb objects. > > > > sure - there is big room for optimization. But this patch was big enough > without its optimization. And it was not clean, if I will be committed or > not (it waited in commitfest application for 4 years). So I accepted > implemented behaviour (without inplace update). Now, this patch is in core, > and anybody can work on others possible optimizations. Right, jsonb subscripting deals mostly with the syntax part and doesn't change internal jsonb behaviour. If I understand the original question correctly, "in-place" here means updating of e.g. just one particular key within a jsonb object, since jsonb_set looks like an overwrite of the whole jsonb. If so, then update will still cause the whole jsonb to be updated, there is no partial update functionality for the on-disk format. Although there is work going on to optimize this in case when jsonb is big enough to be put into a toast table (partial toast decompression thread, or bytea appendable toast).
Re: jsonb subscripting assignment performance
On Wed, Apr 14, 2021, at 09:20, Pavel Stehule wrote: > sure - there is big room for optimization. But this patch was big enough > without its optimization. And it was not clean, if I will be committed or not > (it waited in commitfest application for 4 years). So I accepted implemented > behaviour (without inplace update). Now, this patch is in core, and anybody > can work on others possible optimizations. Thanks for explaining. Do we have a rough idea of how in-place updates could be implemented in a non-invasive, non-controversial way that ought to be accepted by the project, if done right? Or are there other complicated problems that need to be solved first? I'm asking because I could be interested in working on this, but I know my limitations when it comes to C, so I want to get an idea of whether it should be more or less straightforward, or whether we already know beforehand that it would require committer-level expertise of the PostgreSQL code base for any realistic chance of success. /Joel
RE: Truncate in synchronous logical replication failed
On Tuesday, April 13, 2021 11:38 PM Petr Jelinek wrote: > > On 12 Apr 2021, at 08:58, Amit Kapila wrote: > > On Mon, Apr 12, 2021 at 10:03 AM osumi.takami...@fujitsu.com > > wrote: > >> > >>> I checked the PG-DOC, found it says that “Replication of TRUNCATE > >>> commands is supported”[1], so maybe TRUNCATE is not supported in > >>> synchronous logical replication? > >>> > >>> If my understanding is right, maybe PG-DOC can be modified like > >>> this. Any thought? > >>> Replication of TRUNCATE commands is supported > >>> -> > >>> Replication of TRUNCATE commands is supported in asynchronous > mode > >> I'm not sure if this becomes the final solution, > >> > > > > I think unless the solution is not possible or extremely complicated > > going via this route doesn't seem advisable. > > > >> but if we take a measure to fix the doc, we have to be careful for > >> the description, because when we remove the primary keys of 'test' tables > on the scenario in [1], we don't have this issue. > >> It means TRUNCATE in synchronous logical replication is not always > blocked. > >> > > > > The problem happens only when we try to fetch IDENTITY_KEY attributes > > because pgoutput uses RelationGetIndexAttrBitmap() to get that > > information which locks the required indexes. Now, because TRUNCATE > > has already acquired an exclusive lock on the index, it seems to > > create a sort of deadlock where the actual Truncate operation waits > > for logical replication of operation to complete and logical > > replication waits for actual Truncate operation to finish. > > > > Do we really need to use RelationGetIndexAttrBitmap() to build > > IDENTITY_KEY attributes? During decoding, we don't even lock the main > > relation, we just scan the system table and build that information > > using a historic snapshot. Can't we do something similar here? > > > > Adding Petr J. and Peter E. 
to know their views as this seems to be an > old problem (since the decoding of Truncate operation is introduced). > > We used RelationGetIndexAttrBitmap because it already existed, no other > reason. I am not sure what exact locking we need but I would have guessed at > least AccessShareLock would be needed. This was true. Looking at the comment of index_open(), there is a description of the basic rule that NoLock should be used only if an appropriate lock on the index has already been taken. And making the walsender use NoLock to build the attributes trips the Assert in relation_open(). Please ignore the two ideas I suggested in another mail; they don't follow this basic rule and don't work. Best Regards, Takamichi Osumi
View invoker privileges
Hello guys! In Postgres we can create a view with view-owner privileges only. What’s the reason that there is no option to create a view with invoker privileges? Are there any technical or security subtleties related to the absence of this feature?
Re: jsonb subscripting assignment performance
st 14. 4. 2021 v 7:39 odesílatel Joel Jacobson napsal: > Hi, > > commit 676887a3 added support for jsonb subscripting. > > Many thanks for working on this. I really like the improved syntax. > > I was also hoping for some performance benefits, > but my testing shows that > >jsonb_value['existing_key'] = new_value; > > takes just as long time as > >jsonb_value := jsonb_set(jsonb_value, ARRAY['existing_key'], new_value); > > which is a bit surprising to me. Shouldn't subscripting be a lot faster, > since it could modify the existing data structure in-place? What am I > missing here? > no - it doesn't support in-place modification. Only arrays and records support it. > I came to think of the this new functionality when trying to optimize some > PL/pgSQL code where the bottle-neck turned out to be lots of calls > to jsonb_set() for large jsonb objects. > sure - there is big room for optimization. But this patch was big enough even without optimizations. And it was not clear whether it would be committed or not (it waited in the commitfest application for 4 years). So I accepted the implemented behaviour (without in-place update). Now that this patch is in core, anybody can work on other possible optimizations.
Regards Pavel > > Here is the output from attached bench: > > n=1 > 00:00:00.002628 jsonb := jsonb_set(jsonb, ARRAY[existing key], value); > 00:00:00.002778 jsonb := jsonb_set(jsonb, ARRAY[new key], value); > 00:00:00.002332 jsonb[existing key] := value; > 00:00:00.002794 jsonb[new key] := value; > n=10 > 00:00:00.042843 jsonb := jsonb_set(jsonb, ARRAY[existing key], value); > 00:00:00.046515 jsonb := jsonb_set(jsonb, ARRAY[new key], value); > 00:00:00.044974 jsonb[existing key] := value; > 00:00:00.075429 jsonb[new key] := value; > n=100 > 00:00:00.420808 jsonb := jsonb_set(jsonb, ARRAY[existing key], value); > 00:00:00.449622 jsonb := jsonb_set(jsonb, ARRAY[new key], value); > 00:00:00.31834 jsonb[existing key] := value; > 00:00:00.527904 jsonb[new key] := value; > > Many thanks for clarifying. > > Best regards, > > Joel >
Re: [PATCH] force_parallel_mode and GUC categories
On Tue, Apr 13, 2021 at 07:31:39AM -0500, Justin Pryzby wrote: > Good point. Thanks. I have used the wording that Tom has proposed upthread, added one GUC_NOT_IN_SAMPLE that you forgot, and applied the force_parallel_mode patch. -- Michael
Re: Replication slot stats misgivings
On Tue, Apr 13, 2021 at 1:37 PM vignesh C wrote: > > On Mon, Apr 12, 2021 at 7:03 PM Masahiko Sawada wrote: > > > > > > The following test for the latest v8 patch seems to show different. > > total_bytes is 1808 whereas spill_bytes is 1320. Am I missing > > something? > > > > postgres(1:85969)=# select pg_create_logical_replication_slot('s', > > 'test_decoding'); > > pg_create_logical_replication_slot > > > > (s,0/1884468) > > (1 row) > > > > postgres(1:85969)=# create table a (i int); > > CREATE TABLE > > postgres(1:85969)=# insert into a select generate_series(1, 10); > > INSERT 0 10 > > postgres(1:85969)=# set logical_decoding_work_mem to 64; > > SET > > postgres(1:85969)=# select * from pg_stat_replication_slots ; > > slot_name | total_txns | total_bytes | spill_txns | spill_count | > > spill_bytes | stream_txns | stream_count | stream_bytes | stats_reset > > ---++-++-+-+-+--+--+- > > s | 0 | 0 | 0 | 0 | > > 0 | 0 |0 |0 | > > (1 row) > > > > postgres(1:85969)=# select count(*) from > > pg_logical_slot_peek_changes('s', NULL, NULL); > > count > > > > 14 > > (1 row) > > > > postgres(1:85969)=# select * from pg_stat_replication_slots ; > > slot_name | total_txns | total_bytes | spill_txns | spill_count | > > spill_bytes | stream_txns | stream_count | stream_bytes | stats_reset > > ---++-++-+-+-+--+--+- > > s | 2 |1808 | 1 | 202 | > > 1320 | 0 |0 |0 | > > (1 row) > > > > Thanks for identifying this issue, while spilling the transactions > reorder buffer changes gets released, we will not be able to get the > total size for spilled transactions from reorderbuffer size. I have > fixed it by including spilledbytes to totalbytes in case of spilled > transactions. Attached patch has the fix for this. > Thoughts? > I am not sure if that is the best way to fix it because sometimes we clear the serialized flag in which case it might not give the correct answer. Another way to fix it could be that before we try to restore a new set of changes, we update totalBytes counter. 
See the attached patch, applied on top of your v6-0002-* patch. -- With Regards, Amit Kapila.
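The accounting problem the test above exposes, and both proposed fixes address, is that bytes serialized to disk leave the reorder buffer before they are counted, so total_bytes comes out smaller than it should be. A toy Python model of the counters — not the actual ReorderBuffer code, and all names are illustrative — showing spilled bytes folded into the total before the in-memory changes are released:

```python
class SlotStats:
    """Toy model of the pg_stat_replication_slots byte counters."""
    def __init__(self):
        self.total_bytes = 0
        self.spill_bytes = 0

class Txn:
    def __init__(self):
        self.mem_size = 0   # bytes of changes currently held in memory

    def add_change(self, nbytes, stats, work_mem):
        self.mem_size += nbytes
        if self.mem_size > work_mem:
            self.spill(stats)

    def spill(self, stats):
        # Account the bytes *before* they are released from memory;
        # otherwise total_bytes silently misses everything serialized.
        stats.total_bytes += self.mem_size
        stats.spill_bytes += self.mem_size
        self.mem_size = 0

    def commit(self, stats):
        stats.total_bytes += self.mem_size  # whatever never spilled
        self.mem_size = 0

stats = SlotStats()
txn = Txn()
for _ in range(10):                  # ten 100-byte changes, 250-byte budget
    txn.add_change(100, stats, work_mem=250)
txn.commit(stats)
print(stats.total_bytes, stats.spill_bytes)  # 1000 900: total covers spilled bytes
```

In this model total_bytes is always at least spill_bytes, which is the invariant the earlier test showed being violated (1808 total vs. 1320 spilled against a larger decoded volume).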