On Thu, Nov 20, 2025 at 5:12 PM David Rowley <[email protected]> wrote:
> It wasn't intended to be offensive.

OK.

> I suspect the most likely area the new prioritisation order could
> cause issues is from the lack of randomness. Will multiple workers
> working into the same database be more likely to bump into each other
> somehow in a bad way? Maybe that's a good area to focus testing.

I agree that lack of randomness could cause problems, but I don't see
how it could cause regressions, because the current system isn't
random, either. Even if the order of pg_class is unpredictable, it may
(depending on the workload) not change very much from one day to the
next.

> Yeah partly, but mostly I just really doubt that this matters that
> much. It's been said on this thread already that prioritisation isn't
> as important as the autovacuum-configured-to-run-too-slowly issue, and
> I agree with that. I just find it hard to believe that the highly
> volatile pg_class order has been just perfect all these years and that
> sorting by percentage-over-threshold-desc will make things worse
> overall. There was mention that pg_catalog tables are first in
> pg_class, but I don't really agree with that as if I create some new
> tables on a fresh database, I see those getting lower ctids than any
> pg_catalog table. The space for that is finite, but there's no
> shortage of other reasons for user tables to become mentioned in
> pg_class before catalogue tables as the database gets used. I see that
> table_beginscan_catalog() uses SO_ALLOW_SYNC too, so there's an extra
> layer of randomness from sync scans. I don't recall any complaints
> from the order autovacuum works on tables, so, to me, it just seems
> strange to think that the volatile order of pg_class just happened to
> be right all these years. I suspect what's happening is that the extra
> bloat or stale statistics that people get as a result of the
> pg_class-order autovacuum just gets unnoticed, ignored or attended to
> via adjustments to the corresponding scale_factor reloption.

Interesting.
I don't have any real knowledge of how jumbled-up the order of pg_class
is on real production systems, and I agree that if the answer is "it's
usually quite jumbled up" then that is good news for this patch. In any
case, I'm not trying to say that prioritization is an intrinsically bad
idea, because I don't believe that. What I'm trying to say is that
there's a limited number of ways for this patch to make things worse,
and one of them is if someone is winning right now by accident, so
therefore we should think about how many people might be in that
situation. I would argue that if a large number of users end up with a
very similar pattern in terms of how pg_class is ordered, that makes
the patch higher-risk than if, as I think you're arguing here, there's
enough randomness in terms of where things end up in pg_class to
prevent any particular pattern from predominating. In the latter case,
one or two really unlucky users could end up worse off, but that's not
really an issue. What would be an issue is if we regressed some kind of
common pattern.

I admit that's a bit speculative and I'm probably being a little
paranoid here: doing smart things is typically better than doing dumb
things, and what we're doing right now is dumb. On the other hand, once
we ship something, we can't pull it back. If it causes a problem,
someone will call me at 2am and need their system fixed right now. If
my answer is "well, there are no configuration knobs we can change and
no way to get back to the old behavior and I'm sorry you're having that
problem but the only answer is for you to run all your VACUUMs manually
until two years from now when maybe the algorithm will have been
improved," it's not going to be a very good night. After 15 years at
EDB, I've learned that the problem isn't being wrong per se; it's
having no way to get out from under being wrong.
It is absolutely inevitable that I will screw up, you will screw up,
the project as a whole will screw up, and that doesn't worry me a bit.
What does worry me is the prospect that we won't have thought hard
enough about what we're going to do if and when that happens. Most of
the customers that I've gotten to work with over the years are very
gracious about things going wrong with the software as long as there
are some options to deal with the problem. I fully admit that this
patch may already be good enough that I'll never hear a single customer
complain about it, but the time to think through the reverse scenario,
where some users are unhappy, is before we ship, not after. That
necessarily involves some speculation about what might go wrong and
some of that speculation may be groundless, but speculation causes a
lot less pain than angry customers whose problems you can't fix.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
