On Thu, Nov 20, 2025 at 5:12 PM David Rowley <[email protected]> wrote:
> It wasn't intended to be offensive.

OK.

> I suspect the most likely area the new prioritisation order could
> cause issues is from the lack of randomness. Will multiple workers
> working into the same database be more likely to bump into each other
> somehow in a bad way? Maybe that's a good area to focus testing.

I agree that lack of randomness could cause problems, but I don't see
how it could cause regressions, because the current system isn't
random, either. Even if the order of pg_class is unpredictable, it may
(depending on the workload) not change very much from one day to the
next.

> Yeah partly, but mostly I just really doubt that this matters that
> much. It's been said on this thread already that prioritisation isn't
> as important as the autovacuum-configured-to-run-too-slowly issue, and
> I agree with that. I just find it hard to believe that the highly
> volatile pg_class order has been just perfect all these years and that
> sorting by percentage-over-threshold-desc will make things worse
> overall. There was mention that pg_catalog tables are first in
> pg_class, but I don't really agree with that as if I create some new
> tables on a fresh database, I see those getting lower ctids than any
> pg_catalog table. The space for that is finite, but there's no
> shortage of other reasons for user tables to become mentioned in
> pg_class before catalogue tables as the database gets used. I see that
> table_beginscan_catalog() uses SO_ALLOW_SYNC too, so there's an extra
> layer of randomness from sync scans. I don't recall any complaints
> from the order autovacuum works on tables, so, to me, it just seems
> strange to think that the volatile order of pg_class just happened to
> be right all these years. I suspect what's happening is that the extra
> bloat or stale statistics that people get as a result of the
> pg_class-order autovacuum just gets unnoticed, ignored or attended to
> via adjustments to the corresponding scale_factor reloption.

Interesting. I don't have any real knowledge of how jumbled-up the
order of pg_class is on real production systems, and I agree that if
the answer is "it's usually quite jumbled up" then that is good news
for this patch. In any case, I'm not trying to say that prioritization
is an intrinsically bad idea, because I don't believe that. What I'm
trying to say is that there's a limited number of ways for this patch
to make things worse, and one of them is if someone is winning right
now by accident, so we should think about how many people
might be in that situation. I would argue that if a large number of
users end up with a very similar pattern in terms of how pg_class is
ordered, that makes the patch higher-risk than if, as I think you're
arguing here, there's enough randomness in terms of where things end
up in pg_class to prevent any particular pattern from predominating.
In the latter case, one or two really unlucky users could end up worse
off, but that's not really an issue. What would be an issue is if we
regressed some kind of common pattern. I admit that's a bit
speculative and I'm probably being a little paranoid here: doing smart
things is typically better than doing dumb things, and what we're
doing right now is dumb.

On the other hand, once we ship something, we can't pull it back. If
it causes a problem, someone will call me at 2am and need their system
fixed right now. If my answer is "well, there are no configuration
knobs we can change and no way to get back to the old behavior and I'm
sorry you're having that problem but the only answer is for you to run
all your VACUUMs manually until two years from now when maybe the
algorithm will have been improved," it's not going to be a very good
night. After 15 years at EDB, I've learned that the problem isn't
being wrong per se; it's having no way to get out from under being
wrong. It is absolutely inevitable that I will screw up, you will
screw up, the project as a whole will screw up, and that doesn't worry
me a bit. What does worry me is the prospect that we won't have
thought hard enough about what we're going to do if and when that
happens. Most of the customers that I've gotten to work with over the
years are very gracious about things going wrong with the software as
long as there are some options to deal with the problem. I fully admit
that this patch may already be good enough that I'll never hear a
single customer complain about it, but the time to think through the
reverse scenario, where some users are unhappy, is before we ship, not
after. That necessarily involves some speculation about what might go
wrong and some of that speculation may be groundless, but speculation
causes a lot less pain than angry customers whose problems you can't
fix.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

