On Thu, Jan 26, 2023 at 12:45 PM Andres Freund <and...@anarazel.de> wrote:
> > Most of the overhead of FREEZE WAL records (with freeze plan
> > deduplication and page-level freezing in) is generic WAL record header
> > overhead. Your recent adversarial test case is going to choke on that,
> > too. At least if you set checkpoint_timeout to 1 minute again.
>
> I don't quite follow. What do you mean with "record header overhead"? Unless
> that includes FPIs, I don't think that's that commonly true?

Even if there are no directly observable FPIs, there is still extra
WAL, which can cause FPIs indirectly, just by making checkpoints more
frequent. I feel ridiculous even having to explain this to you.

> The problematic case I am talking about is when we do *not* emit a WAL record
> during pruning (because there's nothing to prune), but want to freeze the
> table. If you don't log an FPI, the remaining big overhead is that increasing
> the LSN on the page will often cause an XLogFlush() when writing out the
> buffer.
>
> I don't see what your reference to checkpoint timeout is about here?
>
> Also, as I mentioned before, the problem isn't specific to checkpoint_timeout
> = 1min. It just makes it cheaper to reproduce.

That's flagrantly intellectually dishonest. Sure, it made it easier to
reproduce. But that's not all it did!

You had *lots* of specific numbers and technical details in your first
email, such as "Time for vacuuming goes up to ~5x. WAL volume to
~9x.". But you did not feel that it was worth bothering with details
like having set checkpoint_timeout to 1 minute, which is a setting
that nobody uses, and obviously had a multiplicative effect. That
detail was unimportant. I had to drag it out of you!

You basically found a way to add WAL overhead to a system/workload
that is already in a write amplification vicious cycle, with latent
tipping point type behavior.

There is a practical point here, that is equally obvious, and yet
somehow still needs to be said: benchmarks like that one are basically
completely free of useful information. If we can't agree on how to
assess such things in general, then what can we agree on when it comes
to what should be done about it, what trade-off to make, when it comes
to any similar question?

> > In many cases we'll have to dirty the page anyway, just to set
> > PD_ALL_VISIBLE.
> > The whole way the logic works is conditioned (whether
> > triggered by an FPI or triggered by my now-reverted GUC) on being able
> > to set the whole page all-frozen in the VM.
>
> IIRC setting PD_ALL_VISIBLE doesn't trigger an FPI unless we need to log hint
> bits. But freezing does trigger one even without wal_log_hint_bits.

That is correct.

> You're right, it makes sense to consider whether we'll emit a
> XLOG_HEAP2_VISIBLE anyway.

As written the page-level freezing FPI mechanism probably doesn't
really stand to benefit much from doing that. Either checksums are
disabled and it's just a hint, or they're enabled and there is a very
high chance that we'll get an FPI inside lazy_scan_prune rather than
right after it is called, when PD_ALL_VISIBLE is set.

That's not perfect, of course, but it doesn't have to be. Perhaps it
should still be improved, just on general principle.

> > > A less aggressive version would be to check if any WAL records were emitted
> > > during heap_page_prune() (instead of FPIs) and whether we'd emit an FPI if we
> > > modified the page again. Similar to what we do now, except not requiring an
> > > FPI to have been emitted.
> >
> > Also way more aggressive. Not nearly enough on its own.
>
> In which cases will it be problematically more aggressive?
>
> If we emitted a WAL record during pruning we've already set the LSN of the
> page to a very recent LSN. We know the page is dirty. Thus we'll already
> trigger an XLogFlush() during ringbuffer replacement. We won't emit an FPI.

You seem to be talking about this as if the only thing that could
matter is the immediate FPI -- the first order effects -- and not any
second order effects. You certainly didn't get to 9x extra WAL
overhead by controlling for that before. Should I take it that you've
decided to assess these things more sensibly now? Out of curiosity:
why the change of heart?

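For readers following the thread, here is a rough sketch of the two freeze triggers being compared above. It is a toy model in Python, not PostgreSQL code. `PruneOutcome` and both function names are hypothetical. The sketch also assumes that the proposed WAL-record-based trigger keeps the "page can be set all-frozen in the VM" precondition, which the thread implies ("similar to what we do now") but does not spell out.

```python
from dataclasses import dataclass


@dataclass
class PruneOutcome:
    """Hypothetical summary of one heap page after pruning (not a PostgreSQL struct)."""
    can_set_all_frozen: bool     # freezing would let the page be marked all-frozen in the VM
    prune_emitted_fpi: bool      # pruning logged a full-page image
    prune_emitted_wal: bool      # pruning logged any WAL record at all
    next_change_needs_fpi: bool  # modifying the page again would log an FPI


def fpi_trigger(page: PruneOutcome) -> bool:
    # Trigger as described in the thread: freeze only if pruning already
    # paid for an FPI and the page would end up all-frozen.
    return page.can_set_all_frozen and page.prune_emitted_fpi


def wal_record_trigger(page: PruneOutcome) -> bool:
    # Proposed alternative: (a) pruning wrote some WAL record, and
    # (b) freezing now would not itself cause an FPI.
    # Keeping the all-frozen precondition here is an assumption.
    return (page.can_set_all_frozen
            and page.prune_emitted_wal
            and not page.next_change_needs_fpi)


# A page that was pruned without an FPI: only the proposed trigger fires.
pruned_no_fpi = PruneOutcome(True, False, True, False)

# A page with nothing to prune (the pgbench_history-style case):
# neither trigger fires, because pruning wrote no WAL at all.
unpruned = PruneOutcome(True, False, False, False)

for name, page in [("pruned_no_fpi", pruned_no_fpi), ("unpruned", unpruned)]:
    print(name, fpi_trigger(page), wal_record_trigger(page))
```

The model covers only the immediate, per-page decision. It says nothing about second order effects such as extra WAL volume making checkpoints more frequent, which is the point under dispute.
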
> > > But to me it seems a bit odd that VACUUM now is more aggressive if checksums /
> > > wal_log_hint bits is on, than without them. Which I think is how using either
> > > of pgWalUsage.wal_fpi, pgWalUsage.wal_records ends up working?
> >
> > Which part is the odd part? Is it odd that page-level freezing works
> > that way, or is it odd that page-level checksums work that way?
>
> That page-level freezing works that way.

I think that it will probably cause a little confusion, and should be
specifically documented. But other than that, it seems reasonable
enough to me. I mean, should I not do something that's going to be of
significant help to users with checksums, just because it'll be
somewhat confusing to a small minority of them?

> > In any case this seems like an odd thing for you to say, having
> > eviscerated a patch that really just made the same behavior trigger
> > independently of FPIs in some tables, controlled via a GUC.
>
> That behaviour I criticized was causing a torrent of FPIs and additional
> dirtying of pages. My proposed replacement for the current FPI check doesn't,
> because a) it only triggers when we wrote a WAL record b) It doesn't trigger
> if we would write an FPI.

It increases the WAL written in many important cases that
vacuum_freeze_strategy_threshold avoided. Sure, that GUC did have some
problems, but the general idea of adding some high level
context/strategies seems essential to me.

You also seem to be suggesting that your proposed change to how basic
page-level freezing works will make freezing of pages on databases
with page-level checksums similar to an equivalent case without
checksums enabled. Even assuming that that's an important goal, you
won't be much closer to achieving it under your scheme, since hint
bits being set during VACUUM and requiring an FPI still make a huge
difference.
Tables like pgbench_history have pages that generally aren't
pruned, that don't need to log an FPI just to set PD_ALL_VISIBLE once
checksums are disabled. That's the difference that users are going to
notice between checksums enabled vs disabled, if they notice any --
it's the most important one by far.

--
Peter Geoghegan