Hi,

On 3/17/21 10:43 PM, Stephen Frost wrote:
> Greetings,
>
> * Tomas Vondra (tomas.von...@enterprisedb.com) wrote:
>> Right, I was just going to point out the FPIs are not necessary - what
>> matters is the presence of long streaks of WAL records touching the
>> same set of blocks. But people with workloads where this is common
>> likely don't need the WAL prefetching at all - the replica can keep up
>> just fine, because it doesn't need to do much I/O anyway (and if it
>> can't then prefetching won't help much anyway). So just don't enable
>> the prefetching, and there'll be no overhead.
>
> Isn't this exactly the common case though..? Checkpoints happening
> every 5 minutes, the replay of the FPI happens first and then the
> record is updated and everything's in SB for the later changes?
Well, as I said before, the FPIs are not very significant - you'll have
mostly the same issue with any repeated changes to the same block. It
does not matter much if you do

    FPI for block 1
    WAL record for block 2
    WAL record for block 1
    WAL record for block 2
    WAL record for block 1

or just

    WAL record for block 1
    WAL record for block 2
    WAL record for block 1
    WAL record for block 2
    WAL record for block 1

In both cases some of the prefetches are probably unnecessary. But the
frequency of checkpoints does not really matter - the important bit is
repeated changes to the same block(s).

If you have an active set much larger than RAM, this is quite unlikely,
and we know from the pgbench tests that prefetching has a huge positive
effect in this case. On smaller active sets, with frequent updates to
the same blocks, we may issue unnecessary prefetches - that's true. But
(a) you have not shown any numbers suggesting this is actually an issue,
and (b) those cases don't really need prefetching, because all the data
is already either in shared buffers or in the page cache. So if it
happens to be an issue, the user can simply disable it.

So what exactly would a problematic workload look like?

> You mentioned elsewhere that this would improve 80% of cases but that
> doesn't seem to be backed up by anything and certainly doesn't seem
> likely to be the case if we're talking about across all PG
> deployments.

Obviously, the 80% was just a figure of speech, illustrating my belief
that the proposed patch is beneficial for most users who currently have
issues with replication lag. That is based on my experience with support
customers who have such issues - it's almost invariably an OLTP workload
with a large active set, and we know (from the benchmarks) that in these
cases it helps.

Users who don't have issues with replication lag can disable (or not
enable) the prefetching, and won't get any negative effects. Perhaps
there are users with weird workloads that have replication lag issues
but that this patch won't help - bummer, we can't solve everything in
one go. Also, no one has actually demonstrated such a workload in this
thread so far.

But as you're suggesting we don't have data to support the claim that
this actually helps many users (with no risk to others), I'd point out
that you have not actually provided any numbers showing that it is an
issue in practice.

> I also disagree that asking the kernel to go do random I/O for us,
> even as a prefetch, is entirely free simply because we won't
> actually need those pages. At the least, it potentially pushes out
> pages that we might need shortly from the filesystem cache, no?

Where exactly did I say it's free? I said that workloads where this
happens a lot most likely don't need the prefetching at all, so it can
simply be disabled, eliminating all the negative effects.

Moreover, looking at a limited number of recently prefetched blocks
won't eliminate this problem anyway - imagine a random OLTP workload on
a data set that is large but nevertheless fits into RAM. After a while
no read I/O needs to be done, but you'd need a pretty much infinite list
of prefetched blocks to eliminate the unnecessary prefetches, and with
smaller lists you'll still issue 99% of them. Just disabling prefetching
on such instances seems quite reasonable.
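To make that concrete, here's a toy simulation (Python, purely
illustrative - the data set size and list size are numbers I made up,
and this is not code from the patch) of tracking the last K prefetched
blocks in an LRU and skipping a prefetch when the block is still on the
list. With uniform random access to N blocks, the list only ever helps
for roughly K/N of the accesses:

    # LRU-based suppression of prefetches under uniform random access.
    # A prefetch is skipped only if the block is still among the last
    # `lru_size` prefetched blocks, so for lru_size << n_blocks nearly
    # all prefetches get issued anyway.
    from collections import OrderedDict
    import random

    def issued_fraction(n_blocks, lru_size, n_accesses=1_000_000):
        rng = random.Random(42)
        lru = OrderedDict()              # block -> None, recency order
        issued = 0
        for _ in range(n_accesses):
            block = rng.randrange(n_blocks)
            if block in lru:
                lru.move_to_end(block)   # recently prefetched, skip
            else:
                issued += 1              # prefetch issued
                lru[block] = None
                if len(lru) > lru_size:
                    lru.popitem(last=False)   # evict the oldest entry
        return issued / n_accesses

    # 1M blocks (~8GB at 8kB pages) with a 10k-entry list: ~99% of
    # the prefetches are still issued.
    print(f"{issued_fraction(1_000_000, 10_000):.1%}")

To actually suppress the unnecessary prefetches you'd have to track
more or less the whole active set - that's the "pretty much infinite
list" I mean above.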
>> If it was up to me, I'd just get the patch committed as is. Delaying
>> the feature because of concerns that it might have some negative
>> effect in some cases, when that can be simply mitigated by disabling
>> the feature, is not really beneficial for our users.
>
> I don't know that we actually know how many cases it might have a
> negative effect on, or what the actual extent of such negative cases
> might be - that's really why we should probably try to actually
> benchmark it and get real numbers behind it, particularly when the
> chances of running into such a negative effect with the default
> configuration (that is, FPWs enabled) on the more typical platforms
> (as in, not ZFS) are higher in the field than in the cases where FPWs
> are disabled and someone's running on ZFS.
>
> Perhaps more to the point, it'd be nice to see how this change
> actually improves the cases where PG is running with more-or-less the
> defaults on the more commonly deployed filesystems. If it doesn't,
> then maybe it shouldn't be the default..? Surely the folks running on
> ZFS with FPWs disabled would be able to manage to enable it if they
> wished to, and we could avoid entirely the question of whether this
> has a negative impact on the more common cases.
>
> Guess I'm just not a fan of pushing out a change that will impact
> everyone by default, in a possibly negative way (or positive, though
> that doesn't seem terribly likely, but who knows), without actually
> measuring what that impact will look like in those more common cases.
> Showing that it's a great win when you're on ZFS or running with FPWs
> disabled is good and the expected best case, but we should be
> considering the worst case too when it comes to performance
> improvements.

Well, maybe it'll behave differently on systems with ZFS. I don't know,
and I have no such machine to test that at the moment. My argument
however remains the same - if it happens to be a problem, just don't
enable (or disable) the prefetching, and you get the current behavior.

FWIW I'm not sure there was a discussion or argument about what the
default setting should be (enabled or disabled). I'm fine with not
enabling this by default, so that people have to enable it explicitly.
In a way that'd be consistent with effective_io_concurrency being 1 by
default, which almost disables regular prefetching.

> Anyhow, ultimately I don't know that there's much more to discuss on
> this thread with regard to this particular topic, at least. As I said
> before, if everyone else is on board and not worried about it then so
> be it; I feel that at least the concern that I raised has been heard.

OK, thanks for the discussions.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company