Hi,

(I wrote the first part of this email before Robert and I chatted on a call;
I left it in for posterity.)
On 2023-09-26 13:49:32 -0400, Robert Haas wrote:
> On Tue, Sep 26, 2023 at 11:11 AM Andres Freund <and...@anarazel.de> wrote:
> > As long as the most extreme cases are prevented, unnecessarily freezing is imo
> > far less harmful than freezing too little.
> >
> > I'm worried that using something as long as 100-200%
> > time-between-recent-checkpoints won't handle insert-mostly workloads well,
> > which IME are also workloads suffering quite badly under our current scheme -
> > and which are quite common.
>
> I wrote about this problem in my other reply and I'm curious as to
> your thoughts about it. Basically, suppose we forget about all of
> Melanie's tests except for three cases: (1) an insert-only table, (2)
> an update-heavy workload with uniform distribution, and (3) an
> update-heavy workload with skew. In case (1), freezing is good. In
> case (2), freezing is bad. In case (3), freezing is good for cooler
> pages and bad for hotter ones. I postulate that any
> recency-of-modification threshold that handles (1) well will handle
> (2) poorly, and that the only way to get both right is to take some
> other factor into account. You seem to be arguing that we can just
> freeze aggressively in case (2) and it won't cost much, but it doesn't
> sound to me like Melanie believes that and I don't think I do either.

I don't believe we can freeze aggressively in all cases of (2) without causing
problems. A small-ish table that's vacuumed constantly, where all rows are
constantly frozen and then updated, will suffer a lot from the WAL overhead.
Whereas superfluously freezing a row in a table with many millions of rows,
where each row is only occasionally updated - the update rate being much
smaller than the number of rows - will have acceptable overhead.

What I *do* believe is that for all but the most extreme cases, it's safer to
freeze too much than to freeze too little.
There definitely are negative consequences, but they're more bounded and less
surprising than not freezing for ages and then suddenly freezing everything at
once.

Whether (2) really exists in the real world for huge tables is of course
somewhat debatable...

> > > This doesn't seem completely stupid, but I fear it would behave
> > > dramatically differently on a workload a little smaller than s_b vs.
> > > one a little larger than s_b, and that doesn't seem good.
> >
> > Hm. I'm not sure that that's a real problem. In the case of a workload bigger
> > than s_b, having to actually read the page again increases the cost of
> > freezing later, even if the workload is just a bit bigger than s_b.
>
> That is true, but I don't think it means that there is no problem. It
> could lead to a situation where, for a while, a table never needs any
> significant freezing, because we always freeze aggressively.

What do you mean by "always freeze aggressively" - do you mean 'aggressive'
autovacuums? Or opportunistic freezing being aggressive? I don't see why the
former would be the case.

> When it grows large enough, we suddenly stop freezing it aggressively, and
> now it starts experiencing vacuums that do a whole bunch of work all at
> once. A user who notices that is likely to be pretty confused about what
> happened, and maybe not too happy when they find out.

Hm - isn't my proposal exactly the other way round? I'm proposing that a page
is frozen more aggressively if it is *not* already in shared buffers - which
will become more common once the table has grown "large enough"?


(the remainder was written after that call)

I think there were three main ideas that we discussed:

1) We don't need to be accurate in the freezing decisions for individual
   pages, we "just" need to avoid the situation that over time we commonly
   freeze pages that will be updated again "soon".

2) It might be valuable to adjust the "should freeze page opportunistically"
   decision based on feedback.
3) We might need to classify the workload for a table and use different
   heuristics for different workloads.

For 2), one of the mechanisms we discussed was to collect information about
the "age" of a page when we "unfreeze" it. If we frequently unfreeze pages
that were recently frozen, we need to be less aggressive with opportunistic
freezing going forward. If that never happens, we can be more aggressive.

The basic idea for classifying the age of a page when unfreezing it is to use
"insert_lsn - page_lsn" - pretty simple. We can convert that into time using
the averaged WAL generation rate.

What's a bit harder is figuring out how to usefully aggregate the age across
multiple "unfreezes". I was initially thinking of just using the mean, but
Robert was rightly concerned that the mean would be moved a lot by
occasionally freezing very old pages, potentially leading to opportunistically
freezing young pages too aggressively. The median would be a better choice,
but at least with naive algorithms we can't maintain it incrementally in the
stats.

One way to deal with that would be to not track the average age in
LSN-difference-bytes, but to convert the value to some coarser age metric at
that point. If we e.g. were to convert the byte-age into an approximate age in
checkpoints, with quadratic bucketing (e.g. 0 -> current checkpoint, 1 -> 1
checkpoint ago, 2 -> 2 checkpoints ago, 3 -> 4 checkpoints ago, ...), using a
mean of that age would probably be fine.

Once we have a metric like that, we can increase the aggressiveness of
opportunistic freezing if recently frozen pages aren't frequently unfrozen,
and decrease the aggressiveness when they are.

For 3), we could do something similar. If we made updates/deletes track how
long ago (again in LSN difference) the page was last modified, we should be
able to quite effectively differentiate between a workload that only modifies
recent data and one that doesn't have such locality.

Greetings,

Andres Freund