On Tue, Mar 10, 2026 at 6:08 PM Andres Freund <[email protected]> wrote:
> You don't need a queuing workload to occasionally have a few pages of dead
> tuples at one end of the value range. E.g. a small batch insert that rolls
> back due to a unique conflict is enough. The insert case is also the one
> where, without get_actual_variable_range(), we will often end up with
> completely bogus estimates, as obviously the stored stats won't yet know
> about newly inserted data.
True, but why would you expect the problem to be any worse with the
approach that I've proposed? If we conservatively assume that each leaf
page has 6 distinct heap pages, as is the case with pgbench_accounts_pkey
(which is very low and thus favors your argument), we can expect to
examine about 2.5 x 6 = 15 heap pages before giving up, with the approach
I'm proposing. That's not all that different from the current behavior.

Besides all this, the scenario you're describing is unlikely to last
long. Clients encountering the problem you describe tend to repeat the
transactions that initially failed. All it takes is another transaction
that inserts another tuple into the rightmost leaf page, or one close by.
That will immediately make it possible for get_actual_variable_range()
to find a valid extremal value in the index once more.

> > get_actual_variable_range exists to ascertain information about a
> > particular index. To me, it makes perfect sense that a mechanism such
> > as this would work in terms of costs paid on the index AM side. (The
> > heap fetches matter a great deal too, of course, but it's reasonable
> > to use index costs as a proxy for heap/table AM costs -- but it's not
> > reasonable to use table/heap AM costs as a proxy for index AM costs.
> > It doesn't work both ways.)
>
> I am not convinced it's true either direction - tids from three leaf
> pages of tids can point to a very small number of heap pages or
> hundreds of heap pages.

I was unclear about this point. There is definitely a wide natural
variation in how many distinct heap blocks appear on each leaf page, and
the approach I have in mind is almost completely indifferent to that.
Though, of course, it still imposes a natural ceiling on how many heap
pages can be read: there can only be so many distinct heap blocks on one
leaf page (that was what I referred to above). I believe that it's
actually a good thing that my approach doesn't *directly* care about the
number of heap accesses.
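To spell out that back-of-the-envelope arithmetic (the 2.5 leaf pages and the 6 distinct heap pages per leaf are the illustrative figures assumed above, not measured constants):

```python
# Rough estimate of heap pages examined before giving up, under the
# proposed leaf-page-based cap. Both inputs are illustrative figures
# from the discussion, not measured constants.
LEAF_PAGES_SCANNED = 2.5        # expected leaf pages read before giving up
HEAP_PAGES_PER_LEAF = 6         # e.g. pgbench_accounts_pkey (conservative)

expected_heap_pages = LEAF_PAGES_SCANNED * HEAP_PAGES_PER_LEAF
print(expected_heap_pages)  # 15.0
```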
Ignoring the natural variation in heap layout will produce more
consistent behavior over time, and make it less likely that we'll give up
too early. This is helped by the fact that we can set LP_DEAD bits on
dead tuples in passing (and by the natural ceiling).

> Why is the index AM cost a good proxy for the table? If anything it's
> more sane the other way round (i.e. how it is today).

At a high level we need to reason about the general conditions in the
index without ever doing excessive work. The boundaries between leaf
pages and leaf page density provide relevant hints about the overall
picture in the index. If we see a leaf page with one 16 byte index tuple,
it is very reasonable to surmise that it once held hundreds. To a lesser
degree it is valid to expect that its sibling leaf pages will also have
very few index tuples, likely due to a sparse deletion pattern.

Indexes are organized logically/form a key space, so making these kinds
of inferences is much more reasonable than it would be when looking at a
group of physically contiguous heap pages. By the same token, if we
encounter a small run of logically contiguous leaf pages that are all
completely empty, it's a pathological case; the chances of there being
many more empty leaf pages after that shoot up. The more empty leaf pages
we see, the more we should expect to see. As I keep pointing out, all we
need to do is find a single index tuple -- which shouldn't be hard at
all, barring these truly pathological cases.

I'm trying to be practical here. I want to design a solution that is
compatible with the changes that we're making to the table AM interface,
and that fixes Mark's complaint. Ideally, it should avoid adding much new
code to hot code paths. Any design that meets those goals is acceptable
to me.

--
Peter Geoghegan
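P.S. A minimal sketch of the giving-up behavior I have in mind, in Python as pseudocode. Everything here (find_extremal_value, the page model, max_leaf_pages) is a hypothetical illustration of the cap, not PostgreSQL code:

```python
# Sketch of a leaf-page-based cap: stop searching for an extremal value
# after examining a fixed number of leaf pages, regardless of how many
# distinct heap pages their tids point to. All names are hypothetical.

def find_extremal_value(leaf_pages, max_leaf_pages=3):
    """Return the first live key found, scanning at most max_leaf_pages
    leaf pages from one end of the index; None means "give up and fall
    back on the stored stats".

    Each leaf page is modeled as a list of (key, is_live) pairs. In the
    real system, determining liveness would require a heap fetch unless
    the tuple's LP_DEAD bit is already set.
    """
    for page_no, page in enumerate(leaf_pages):
        if page_no >= max_leaf_pages:
            return None  # pathological run of dead/empty pages: give up
        for key, is_live in page:
            if is_live:
                return key  # all we need is a single live index tuple
    return None

# Usage: two fully-dead leaf pages, then a page with live tuples.
pages = [[(100, False), (99, False)], [(98, False)], [(97, True), (96, True)]]
print(find_extremal_value(pages))  # 97
```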
