On Fri, 2 Mar 2018 18:47:44 +0300
Alexander Kuzmenkov <a.kuzmen...@postgrespro.ru> wrote:
> The calculation I made for the first step applies to the next steps too,
> with minor differences. So, the estimate increases at each step. Just
> out of interest, I plotted the reltuples for 60 steps, and it doesn't
> look like it's going to converge anytime soon (see attached).
> Looking at the formula, this overshoot term is created when we multiply
> the old density by the new number of pages. I'm not sure how to fix
> this. I think we could average the number of tuples, not the densities.
> The attached patch demonstrates what I mean.
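To make the convergence behaviour concrete, here is a toy sketch of the moving-average density update used by vac_estimate_reltuples (a simplification: fixed table size, fixed sample size, stable true density; all numbers are hypothetical, not from the case below):

```python
# Toy model of the density moving average in vac_estimate_reltuples
# (pre-patch behaviour). The stale estimate is pulled toward the
# sample density only by the tiny scanned fraction each run.

total_pages = 1_000_000     # pages in the table (assumed constant)
scanned_pages = 30_000      # pages ANALYZE samples each run
true_density = 2.0          # actual live tuples per page
reltuples = 40_000_000.0    # stale estimate: 20x the true 2M rows

def analyze_step(reltuples):
    old_density = reltuples / total_pages
    new_density = true_density               # what the sample reports
    multiplier = scanned_pages / total_pages # scanned fraction, 0.03 here
    # Old and new densities are averaged, weighted by the scanned
    # fraction, so the error shrinks only geometrically per run.
    updated_density = old_density + (new_density - old_density) * multiplier
    return updated_density * total_pages

for _ in range(60):
    reltuples = analyze_step(reltuples)
# After 60 runs the estimate is still roughly 4x the true row count.
```

With a realistic scanned fraction (0.0006 in the real table below, not 0.03) the decay is far slower still, which matches the non-convergence seen in the plotted 60 steps.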
I'm confused at this point: I provided a patch that addresses this, along with
a test case and an analysis. We seem to be discussing everything as if we had
just noticed the issue. Have you reviewed the patch, read the attached
analysis, and tested it? Please comment on that.
Also, here is a datapoint that I found just this morning on a client's system:
INFO: "staging_xyz": scanned 30000 of 50000953 pages, containing 63592 live rows
and 964346 dead rows; 30000 rows in sample, 1959918155 estimated total rows
# select (50000953.0/30000*63592)::int as nrows;
This table's reltuples is 18 times the actual row count. It will never
converge, because with 50000953 pages each ANALYZE can only move reltuples
about 0.0006 of the way toward the correct value (30000 sampled pages out of
50000953).
It will also almost never get vacuumed, because the autovacuum threshold of
0.2 * 1959918155 = 391983631 is about 3.7 times larger than the actual row count.
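The arithmetic above can be checked directly (assuming the default autovacuum_vacuum_scale_factor of 0.2):

```python
# Checking the numbers quoted above for the staging_xyz table.
pages = 50000953          # relpages
sample_pages = 30000      # pages ANALYZE scanned
live_in_sample = 63592    # live rows seen in the sample
reltuples = 1959918155    # stale estimate in pg_class

# Extrapolated actual row count, as in the SELECT above: ~105,988,687
actual_rows = pages / sample_pages * live_in_sample

overestimate = reltuples / actual_rows            # ~18.5x too high
vacuum_threshold = 0.2 * reltuples                # 391,983,631 changed rows
threshold_ratio = vacuum_threshold / actual_rows  # ~3.7x the whole table
per_analyze_adjust = sample_pages / pages         # ~0.0006 per run
```

So autovacuum would wait for more row changes than the table even contains, while each ANALYZE corrects only ~0.06% of the error.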
The submitted patch makes ANALYZE effective in setting reltuples to within
a few percent of the count(*) value.
David Gould da...@sonic.net
If simplicity worked, the world would be overrun with insects.