Hi, On 2026-03-10 17:03:14 -0400, Peter Geoghegan wrote: > On Tue, Feb 10, 2026 at 4:26 AM Jakub Wartak > <[email protected]> wrote: > > I just worry that if get_actual_variable_range() starts reporting much worse > > estimates what do we do then? Well maybe, we'll have pg_plan_advice then, > > or maybe the depth (time?) of search should be somehow bound to the column's > > statistics_target too, but on second thought it looks like it should be more > > property of the query itself (so maybe some GUC?) > > The purpose of get_actual_variable_range is to get the extremal value > in the index -- a single value from a single tuple. If we can't manage > that by reading only 2 or 3 pages, then I see no good reason to > believe we can by reading many more pages. > > In other words, we're perfectly justified in making a soft expectation > that it'll require very little work to get an extremal value. But once > we notice that our soft assumption doesn't hold, all bets are off -- > we're then justified in giving up relatively quickly. Having several > contiguous pages that are completely empty (or empty of > non-LP_DEAD-marked tuples) at the extreme leftmost or rightmost en of > the index is absolutely a pathological case. It strongly suggests a > queue-like workload of the kind that my testing shows that > VISITED_PAGES_LIMIT can't deal with sensibly.
You don't need a queuing workload to occasionally have a few pages of dead tuples at one end of the value range. E.g. a small batch insert that rolls back due to a unique conflict is enough. The insert case is also the one where, without get_actual_variable_range(), we will often end up with completely bogus estimates, as obviously the stored stats won't yet know about newly inserted data. > get_actual_variable_range exists to ascertain information about a > particular index. To me, it makes perfect sense that a mechanism such > as this would work in terms of costs paid on the index AM side. (The > heap fetches matter a great deal too, of course, but it's reasonable > to use index costs as a proxy for heap/table AM costs -- but it's not > reasonable to use table/heap AM costs as a proxy for index AM costs. > It doesn't work both ways.) I am not convinced it's true either direction - tids from three leaf pages of tids can point to a very small number of heap pages or hundreds of heap pages. Why is the index AM cost a good proxy for the table? If anything it's more sane the other way round (i.e. how it is today). Greetings, Andres Freund
