Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls

2016-08-14 Thread Andreas Joseph Krogh
På søndag 07. august 2016 kl. 18:35:56, skrev Tom Lane >: Dean Rasheed writes: > On 5 August 2016 at 21:48, Tom Lane wrote: >> OK, thanks.  What shall we do about Andreas' request to back-patch this?

Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls

2016-08-07 Thread Tom Lane
Dean Rasheed writes: > On 5 August 2016 at 21:48, Tom Lane wrote: >> OK, thanks. What shall we do about Andreas' request to back-patch this? >> I'm personally willing to do it, but there is the old bugaboo of "maybe >> it will destabilize a plan

Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls

2016-08-07 Thread Andreas Joseph Krogh
På søndag 07. august 2016 kl. 09:01:40, skrev Dean Rasheed < dean.a.rash...@gmail.com >: On 5 August 2016 at 21:48, Tom Lane wrote: > OK, thanks.  What shall we do about Andreas' request to back-patch this? > I'm personally willing to do it,

Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls

2016-08-07 Thread Dean Rasheed
On 5 August 2016 at 21:48, Tom Lane wrote: > OK, thanks. What shall we do about Andreas' request to back-patch this? > I'm personally willing to do it, but there is the old bugaboo of "maybe > it will destabilize a plan that someone is happy with". > My inclination would be

Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls

2016-08-05 Thread Tom Lane
Andrew Gierth writes: > Objection withdrawn. OK, thanks. What shall we do about Andreas' request to back-patch this? I'm personally willing to do it, but there is the old bugaboo of "maybe it will destabilize a plan that someone is happy with".

Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls

2016-08-05 Thread Andrew Gierth
> "Tom" == Tom Lane writes: Tom> Also, the way that the value is calculated in the Tom> samples-not-all-distinct case corresponds to the way I have it in Tom> the patch. Ahh, gotcha. You're referring to this: /* * If we estimated the number of

Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls

2016-08-05 Thread Tom Lane
Andrew Gierth writes: > Hm. I am wrong about this, since it's the fact that consumers are taking > stanullfrac into account that makes the value wrong in the first place. Also, the way that the value is calculated in the samples-not-all-distinct case corresponds to

Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls

2016-08-05 Thread Andrew Gierth
> "Andrew" == Andrew Gierth writes: > "Tom" == Tom Lane writes: Tom> What I did in the patch is to scale the formerly fixed "-1.0" Tom> stadistinct estimate to discount the fraction of nulls we found. Andrew> This seems quite dubious

Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls

2016-08-05 Thread Andreas Joseph Krogh
På fredag 05. august 2016 kl. 01:01:06, skrev Tom Lane >: I wrote: > Looking around, there are a couple of places outside commands/analyze.c > that are making the same mistake, so this patch isn't complete, but it > illustrates what needs to be

Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls

2016-08-05 Thread Andrew Gierth
> "Tom" == Tom Lane writes: Tom> What I did in the patch is to scale the formerly fixed "-1.0" Tom> stadistinct estimate to discount the fraction of nulls we found. This seems quite dubious to me. stadistinct representing only the non-null values seems to me to be

Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls

2016-08-04 Thread Tom Lane
I wrote: > Looking around, there are a couple of places outside commands/analyze.c > that are making the same mistake, so this patch isn't complete, but it > illustrates what needs to be done. Here's a more complete patch. regards, tom lane diff --git

[HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls

2016-08-04 Thread Tom Lane
I looked into the problem described at https://www.postgresql.org/message-id/flat/VisenaEmail.26.df42f82acae38a58.156463942b8%40tc7-visena and I believe I've reproduced it: the requirement is that the inner join column for the antijoin must contain a lot of NULL values, and what isn't NULL must be