Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-04-23 Thread David Gould
On Mon, 23 Apr 2018 20:09:21 -0500 Justin Pryzby <pry...@telsasoft.com> wrote: > Just want to add for the archive that I happened to run across what appears to > be a 7-year old report of (I think) both of these vacuum/analyze bugs: > > Re: [patch] BUG #15005: ANALYZE can make

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-04-23 Thread Justin Pryzby
Just want to add for the archive that I happened to run across what appears to be a 7-year old report of (I think) both of these vacuum/analyze bugs: Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-13 Thread Tom Lane
David Gould writes: > I have attached the patch we are currently using. It applies to 9.6.8. I > have versions for older releases in 9.4, 9.5, 9.6. I fails to apply to 10, > and presumably head but I can update it if there is any interest. > The patch has three main features: >

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-13 Thread Tom Lane
David Gould writes: > I also thought about the theory and am confident that there really is no way > to trick it. Basically if there are enough pages that are different to affect > the overall density, say 10% empty or so, there is no way a random sample > larger than a few

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-13 Thread David Gould
On Mon, 12 Mar 2018 12:21:34 -0400 Tom Lane wrote: > I wrote: > > Maybe this type of situation is an argument for trusting an ANALYZE-based > > estimate more than the VACUUM-based estimate. I remain uncomfortable with > > that in cases where VACUUM looked at much more of the

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-12 Thread Tom Lane
I wrote: > Re-reading that thread, it seems like we should have applied Jeff's > initial trivial patch[1] (to not hold AutovacuumScheduleLock across > table_recheck_autovac) rather than waiting around for a super duper > improvement to get agreed on. I'm a bit tempted to go do that; > if nothing

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-12 Thread Tom Lane
I wrote: > Maybe this type of situation is an argument for trusting an ANALYZE-based > estimate more than the VACUUM-based estimate. I remain uncomfortable with > that in cases where VACUUM looked at much more of the table than ANALYZE > did, though. Maybe we need some heuristic based on the

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-12 Thread Tom Lane
David Gould writes: > On Wed, 7 Mar 2018 21:39:08 -0800 > Jeff Janes wrote: >> As for preventing it in the first place, based on your description of your >> hardware and operations, I was going to say you need to increase the max >> number of autovac

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-08 Thread David Gould
On Wed, 7 Mar 2018 21:39:08 -0800 Jeff Janes wrote: > As for preventing it in the first place, based on your description of your > hardware and operations, I was going to say you need to increase the max > number of autovac workers, but then I remembered you from

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-07 Thread Jeff Janes
On Sun, Mar 4, 2018 at 3:18 PM, David Gould wrote: > On Sun, 4 Mar 2018 07:49:46 -0800 > Jeff Janes wrote: > > > On Wed, Jan 17, 2018 at 4:49 PM, David Gould wrote: > ... > > > > Maybe a well-timed crash caused n_dead_tup to get reset to

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-07 Thread Robert Haas
On Fri, Mar 2, 2018 at 5:17 PM, Tom Lane wrote: > (1) do we really want to go over to treating ANALYZE's tuple density > result as gospel, contradicting the entire thrust of the 2011 discussion? > >> This tables reltuples is 18 times the actual row count. It will never >>

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-07 Thread David Gould
On Tue, 06 Mar 2018 11:16:04 -0500 Tom Lane wrote: > so that we can decide whether this bug is bad enough to justify > back-patching a behavioral change. I remain concerned that the proposed > fix is too simplistic and will have some unforeseen side-effects, so > I'd really

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-06 Thread Tom Lane
David Gould writes: > On Thu, 01 Mar 2018 18:49:20 -0500 > Tom Lane wrote: >> The sticking point in my mind right now is, if we do that, what to do with >> VACUUM's estimates. > For what it's worth, I think the current estimate formula for VACUUM is > pretty

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-06 Thread Tom Lane
David Gould writes: > On Sun, 4 Mar 2018 07:49:46 -0800 > Jeff Janes wrote: >> I don't see how it could have caused the problem in the first place. In >> your demonstration case, you had to turn off autovac in order to get it to >> happen, and then when

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-04 Thread David Gould
On Sun, 4 Mar 2018 07:49:46 -0800 Jeff Janes wrote: > I don't see how it could have caused the problem in the first place. In > your demonstration case, you had to turn off autovac in order to get it to > happen, and then when autovac is turned back on, it is all primed

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-04 Thread David Gould
On Sun, 4 Mar 2018 07:49:46 -0800 Jeff Janes wrote: > In any event, I agree with your analysis that ANALYZE should set the number > of tuples from scratch. After all, it sets the other estimates, such as > MCV, from scratch, and those are much more fragile to sampling than

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-04 Thread David Gould
On Sun, 4 Mar 2018 07:49:46 -0800 Jeff Janes wrote: > On Wed, Jan 17, 2018 at 4:49 PM, David Gould wrote: > > # analyze verbose pg_attribute; > > INFO: "pg_attribute": scanned 3 of 24519424 pages, containing 6475 > > live rows and 83 dead rows; 6475

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-04 Thread Jeff Janes
On Wed, Jan 17, 2018 at 4:49 PM, David Gould wrote: > > Please add the attached patch and this discussion to the open commit fest. > The > original bugs thread is here: 2018011254.1408.8342@wrigl > eys.postgresql.org. > > Bug reference: 15005 > Logged by: David

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-02 Thread David Gould
On Fri, 02 Mar 2018 17:17:29 -0500 Tom Lane wrote: > But by the same token, analyze only looked at 0.0006 of the pages. It's > nice that for you, that's enough to get a robust estimate of the density > everywhere; but I have a nasty feeling that that won't hold good for >

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-02 Thread David Gould
On Thu, 01 Mar 2018 18:49:20 -0500 Tom Lane wrote: > The sticking point in my mind right now is, if we do that, what to do with > VACUUM's estimates. If you believe the argument in the PDF that we'll > necessarily overshoot reltuples in the face of declining true density, >

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-02 Thread Tom Lane
David Gould writes: > I'm confused at this point, I provided a patch that addresses this and a > test case. We seem to be discussing everything as if we first noticed the > issue. Have you reviewed the patch and and attached analysis and tested it? > Please commment on that?

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-02 Thread David Gould
On Fri, 2 Mar 2018 18:47:44 +0300 Alexander Kuzmenkov wrote: > The calculation I made for the first step applies to the next steps too, > with minor differences. So, the estimate increases at each step. Just > out of interest, I plotted the reltuples for 60 steps,

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-02 Thread Alexander Kuzmenkov
On 02.03.2018 02:49, Tom Lane wrote: I looked at this and don't think it really answers the question. What happens is that, precisely because we only slowly adapt our estimate of density towards the new measurement, we will have an overestimate of density if the true density is decreasing (even

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-01 Thread Tom Lane
Alexander Kuzmenkov writes: > On 01.03.2018 18:09, Tom Lane wrote: >> Ideally, at least, the estimate would remain on-target. > The test shows that under this particular scenario the estimated number > of tuples grows after each ANALYZE. I tried to explain how this

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-01 Thread David Gould
On Thu, 1 Mar 2018 17:25:09 +0300 Alexander Kuzmenkov wrote: > Well, that sounds reasonable. But the problem with the moving average > calculation remains. Suppose you run vacuum and not analyze. If the > updates are random enough, vacuum won't be able to reclaim

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-01 Thread Alexander Kuzmenkov
On 01.03.2018 18:09, Tom Lane wrote: Ideally, at least, the estimate would remain on-target. The test shows that under this particular scenario the estimated number of tuples grows after each ANALYZE. I tried to explain how this happens in the attached pdf. The direct averaging of the

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-01 Thread Tom Lane
Alexander Kuzmenkov writes: > On 01.03.2018 06:23, David Gould wrote: >> In theory the sample pages analyze uses should represent the whole table >> fairly well. We rely on this to generate pg_statistic and it is a key >> input to the planner. Why should we not believe

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-01 Thread Alexander Kuzmenkov
On 01.03.2018 06:23, David Gould wrote: In theory the sample pages analyze uses should represent the whole table fairly well. We rely on this to generate pg_statistic and it is a key input to the planner. Why should we not believe in it as much only for reltuples? If the analyze sampling does

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-02-28 Thread Alexander Kuzmenkov
Hi David, I was able to reproduce the problem using your script. analyze_counts.awk is missing, though. The idea of using the result of ANALYZE as-is, without additional averaging, was discussed when vac_estimate_reltuples() was introduced originally. Ultimately, it was decided not to do