On Mon, 8 Aug 2022 at 18:48, Peter Geoghegan <p...@bowt.ie> wrote:
>
> On Mon, Aug 8, 2022 at 9:17 AM Matthias van de Meent
> <boekewurm+postg...@gmail.com> wrote:
> > Because if a subset of the pages of a relation contains more tuples
> > than your current total expected tuples in the table, you should
> > update your expectations regardless of which blocks or which number
> > of blocks you've scanned - the previous stored value is a strictly
> > worse estimation than your last measurement.
>
> The previous stored value could be -1, which represents the idea that
> we don't know the tuple density yet. So it doesn't necessarily follow
> that the new estimate is strictly better, even in this exact scenario.
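(To make the trade-off concrete for anyone following along: the
heuristic we're discussing has roughly the shape below. This is a
simplified sketch in my own words, loosely modeled on
vac_estimate_reltuples() in src/backend/commands/vacuum.c -- the
variable names and the 2% cutoff are assumptions for illustration, not
the literal patch.)

#include <math.h>

/*
 * Simplified sketch of the heuristic under discussion.  The names and
 * the 2% cutoff are illustrative assumptions, not the literal patch.
 */
static double
estimate_reltuples_sketch(double old_rel_tuples, /* reltuples; -1 means "unknown" */
                          double total_pages,    /* current relation size, in pages */
                          double scanned_pages,  /* pages this VACUUM actually read */
                          double scanned_tuples) /* live tuples counted on those pages */
{
    /* Nothing scanned: nothing to learn, keep whatever we had. */
    if (scanned_pages <= 0)
        return old_rel_tuples;

    /*
     * No previous estimate (the -1 sentinel): any measurement beats
     * nothing, so extrapolate the observed density to the whole table.
     */
    if (old_rel_tuples < 0)
        return floor(scanned_tuples / scanned_pages * total_pages + 0.5);

    /*
     * Tiny sample: distrust it and keep the old estimate.  This is the
     * deliberate bias -- it stops repeated small scans from slowly
     * dragging reltuples away from reality, at the cost of sometimes
     * ignoring genuinely better information.
     */
    if (scanned_pages < total_pages * 0.02)
        return old_rel_tuples;

    /* A meaningful sample: extrapolate its density across the table. */
    return floor(scanned_tuples / scanned_pages * total_pages + 0.5);
}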
> > A 33-block relation with first 32 1-tuple pages is still enough to
> > have a last page with 250 tuples, which would be ignored in that
> > scheme and have a total tuple count of 33 or so.
>
> The simple fact is that there is only so much we can do with the
> limited information/context that we have. Heuristics are not usually
> free of all bias. Often the bias is the whole point -- the goal can be
> to make sure that we have the bias that we know we can live with, and
> not the opposite bias, which is much worse. Details of which are
> usually very domain specific.
>
> I presented my patch with a very simple test case -- a very clear
> problem. Can you do the same for this scenario?

CREATE TABLE tst (id int PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
                  payload text)
    WITH (fillfactor = 50);  -- fillfactor to make pages fill up fast

INSERT INTO tst (payload)  -- 32 pages filled with large tuples
    SELECT repeat('a', 5000) FROM generate_series(1, 32);

INSERT INTO tst (payload)  -- small tuple on the last page
    SELECT repeat('a', 4);

VACUUM (VERBOSE, FREEZE) tst;  -- 33 tuples on 33 pages, with lots of
                               -- space left on the last page

INSERT INTO tst (payload)  -- now we have 64 tuples on the last page
    SELECT repeat('a', 4) FROM generate_series(1, 63);

VACUUM VERBOSE tst;  -- with your patch this reports only 33 tuples
                     -- total, while the single scanned page contains
                     -- 64 tuples and the table contains 96 tuples

> I accept that it is possible that we'll keep an old reltuples which is
> provably less accurate than doing something with the latest
> information from vacuumlazy.c. But the conditions under which this can
> happen are *very* narrow. I am not inclined to do anything about it
> for that reason.

I think I understand your reasoning, but I don't agree with the
conclusion. The attached patch 0002 does fix that skew too, at what I
consider negligible cost. 0001 is your patch with a new version number.

I'm fine with your patch as is, but would appreciate it if known
estimation mistakes were also fixed. An alternative solution could be
double-vetting, where we ignore tuples_scanned only when we scanned
both <2% of the pages AND <2% of the previously estimated tuples; a
rough sketch of that check is below.
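To make that concrete, something along these lines -- reusing the names
from the sketch upthread, with the two 2% thresholds from this mail;
a sketch of the idea, not the literal 0002 patch:

#include <math.h>

/*
 * Sketch of the proposed "double-vetting" variant: keep the old
 * reltuples only when the scan was small by *both* yardsticks -- the
 * fraction of pages read AND the fraction of previously estimated
 * tuples actually seen.
 */
static double
estimate_reltuples_double_vetted(double old_rel_tuples,
                                 double total_pages,
                                 double scanned_pages,
                                 double scanned_tuples)
{
    /* Nothing scanned: nothing to learn, keep whatever we had. */
    if (scanned_pages <= 0)
        return old_rel_tuples;

    /* No previous estimate (the -1 sentinel): take the measurement. */
    if (old_rel_tuples < 0)
        return floor(scanned_tuples / scanned_pages * total_pages + 0.5);

    /*
     * Distrust the sample only when we scanned <2% of the pages AND saw
     * <2% of the tuples we previously believed the table held.  A small
     * scan that nevertheless finds a large share of the expected tuples
     * (like the 64-tuples-on-one-page case above) now updates the
     * estimate instead of being ignored.
     */
    if (scanned_pages < total_pages * 0.02 &&
        scanned_tuples < old_rel_tuples * 0.02)
        return old_rel_tuples;

    /* Otherwise extrapolate the observed density across the table. */
    return floor(scanned_tuples / scanned_pages * total_pages + 0.5);
}

The only behavioral difference from the single-test version is the
second condition in that guard; the extrapolation itself is unchanged.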
Kind regards,

Matthias van de Meent

v2-0001-Avoid-reltuples-distortion-in-very-small-tables.patch
v2-0002-Avoid-reltuples-distortion-in-very-small-tables.patch