On 3/16/19 11:55 AM, Dean Rasheed wrote: > On Fri, 15 Mar 2019 at 00:06, Tomas Vondra <tomas.von...@2ndquadrant.com> > wrote: >> I've noticed an annoying thing when modifying type of column not >> included in a statistics... >> >> That is, we don't remove the statistics, but the estimate still changes. >> But that's because the ALTER TABLE also resets reltuples/relpages: >> >> That's a bit unfortunate, and it kinda makes the whole effort to not >> drop the statistics unnecessarily kinda pointless :-( >> > > Well not entirely. Repeating that test with 100,000 rows, I get an > initial estimate of 9850 (actual 10,000), which then drops to 2451 > after altering the column. But if you drop the dependency statistics, > the estimate drops to 241, so clearly there is some benefit in keeping > them in that case. >
Sure. What I meant is that to correct the relpages/reltuples estimates you need to do ANALYZE, which rebuilds the statistics anyway. Although VACUUM also fixes the estimates, without the stats rebuild. > Besides, I thought there was no extra effort in keeping the extended > statistics in this case -- isn't it just using the column > dependencies, so in this case UpdateStatisticsForTypeChange() never > gets called anyway? > Yes, it does not get called at all. My point was that I was a little bit confused because the test says "check change of unrelated column type does not reset the MCV statistics" yet the estimates do actually change. I wonder why we reset the relpages/reltuples to 0, instead of retaining the original values, though. That would likely give us better density estimates in estimate_rel_size, I think. So I've tried doing that, and I've included it as 0001 into the patch series. It seems to work, but I suppose the reset is there for a reason. In any case, this is a preexisting issue, independent of what this patch does or changes. I've discovered another issue, though. Currently, clauselist_selectivity has this as the very beginning: /* * If there's exactly one clause, just go directly to * clause_selectivity(). None of what we might do below is relevant. */ if (list_length(clauses) == 1) return clause_selectivity(root, (Node *) linitial(clauses), varRelid, jointype, sjinfo); Which however fails with queries like this: WHERE (a = 1 OR b = 1) because clauselist_selectivity sees it as a single clause, passes it to clause_selectivity and the OR-clause handling simply relies on (s1 + s2 - s1 * s2) which entirely ignores the multivariate stats. The other similar places in clause_selectivity() simply call clauselist_selectivity() so that's OK, but OR-clauses don't do that. For functional dependencies this is not a huge issue because those apply only to AND-clauses. But there were proposals to maybe apply them to other types of clauses, in which case it might become issue. I think the best fix is moving the optimization after the multivariate stats are applied. The only alternative I can think of is modifying clauselist_selectivity so that it can be executed on OR-clauses. But that seems much more complicated than the former option for almost no other advantages. I've also changed how statext_is_compatible_clause_internal() handles the attnums bitmapset - you were right in your 3/10 message that we can just pass the value, without creating a local bitmapset. So I've just done that. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
0001-retain-reltuples-relpages-on-ALTER-TABLE.patch.gz
Description: application/gzip
0002-multivariate-MCV-lists.patch.gz
Description: application/gzip
0003-multivariate-histograms.patch.gz
Description: application/gzip