Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

Robert Haas Mon, 17 Jan 2022 07:12:58 -0800

On Thu, Jan 13, 2022 at 4:27 PM Peter Geoghegan <p...@bowt.ie> wrote:
> 1. Cases where our inability to get a cleanup lock signifies nothing
> at all about the page in question, or any page in the same table, with
> the same workload.
>
> 2. Pathological cases. Cases where we're at least at the mercy of the
> application to do something about an idle cursor, where the situation
> may be entirely hopeless on a long enough timeline. (Whether or not it
> actually happens in the end is less significant.)


Sure. I'm worrying about case (2). I agree that in case (1) waiting
for the lock is almost always the wrong idea.

> I think that you're focussing on individual VACUUM operations, whereas
> I'm more concerned about the aggregate effect of a particular policy
> over time.

I don't think so. I think I'm worrying about the aggregate effect of a
particular policy over time *in the pathological cases* i.e. (2).

> This is my concern -- what I've called category 2 cases have this
> exact quality. So given that, why not freeze what you can, elsewhere,
> on other pages that don't have the same issue (presumably the vast
> vast majority in the table)? That way you have the best possible
> chance of recovering once the DBA gets a clue and fixes the issue.

That's the part I'm not sure I believe. Imagine a table with a
gigantic number of pages that are not yet all-visible, a small number
of all-visible pages, and one page containing very old XIDs on which a
cursor holds a pin. I don't think it's obvious that not waiting is
best. Maybe you're going to end up vacuuming the table repeatedly and
doing nothing useful. If you avoid vacuuming it repeatedly, you still
have a lot of work to do once the DBA locates a clue.

I think there's probably an important principle buried in here: the
XID threshold that forces a vacuum had better also force waiting for
pins. If it doesn't, you can tight-loop on that table without getting
anything done.

> That's kind of what I meant. The difference between 50 million and 150
> million is rather unclear indeed. So having accepted that that might
> be true, why not be open to the possibility that it won't turn out to
> be true in the long run, for any given table? With the enhancements
> from the patch series in place (particularly the early freezing
> stuff), what do we have to lose by making the FreezeLimit XID cutoff
> for freezing much higher than your typical vacuum_freeze_min_age?
> Maybe the same as autovacuum_freeze_max_age or vacuum_freeze_table_age
> (it can't be higher than that without also making these other settings
> become meaningless, of course).

We should probably distinguish between the situation where (a) an
adverse pin is held continuously and effectively forever and (b)
adverse pins are held frequently but for short periods of time. I
think it's possible to imagine a small, very hot table (or portion of
a table) where very high concurrency means there are often pins. In
case (a), it's not obvious that waiting will ever resolve anything,
although it might prevent other problems like infinite looping. In
case (b), a brief wait will do a lot of good. But maybe that doesn't
even matter. I think part of your argument is that if we fail to
update relfrozenxid for a while, that really isn't that bad.

I think I agree, up to a point. One consequence of failing to
immediately advance relfrozenxid might be that pg_clog and friends are
bigger, but that's pretty minor. Another consequence might be that we
might vacuum the table more times, which is more serious. I'm not
really sure that can happen to a degree that is meaningful, apart from
the infinite loop case already described, but I'm also not entirely
sure that it can't.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

Reply via email to