On 5/12/17 4:46 AM, Tom Lane wrote:
Alvaro Herrera <alvhe...@2ndquadrant.com> writes:
Hmm, yeah, makes sense.  Here's a patch for this approach.

...

Also, while reading the docs changes, I got rather ill from the
inconsistent and not very grammatical handling of "statistics" as a
noun, particularly the inability to decide from one sentence to the next
if that is singular or plural.  In the attached, I fixed this in the
ref/*_statistics.sgml files by always saying "statistics object" instead.
If people agree that that reads better, we should make an effort to
propagate that terminology elsewhere in the docs, and into error messages,
objectaddress.c output, etc.


I'm fine with the 'statistics object' wording. I've been struggling with this bit while working on the patch, and I agree it sounds better and getting it consistent is definitely worthwhile.

>
Although I've not done anything about it here, I'm not happy about the
handling of dependencies for stats objects.  I do not think that cloning
RemoveStatistics as RemoveStatisticsExt is sane at all.  The former is
meant to deal with cleaning up pg_statistic rows that we know will be
there, so there's no real need to expend pg_depend overhead to track them.
For objects that are only loosely connected, the right thing is to use
the dependency system; in particular, this design has no real chance of
working well with cross-table stats.  Also, it's really silly to have
*both* this hard-wired mechanism and a pg_depend entry; the latter is
surely redundant if you have the former.  IMO we should revert
RemoveStatisticsExt and instead handle things by making stats objects
auto-dependent on the individual column(s) they reference (not the whole
table).

I'm also of the opinion that having an AUTO dependency, rather than
a NORMAL dependency, on the stats object's schema is the wrong semantics.
There isn't any other case where you can drop a non-empty schema without
saying CASCADE, and I'm mystified why this case should act that way.


Yeah, it's a bit frankensteinian ...

>
Lastly, I tried the example given in the CREATE STATISTICS reference page,
and it doesn't seem to work.  Without the stats object, the two queries
are both estimated at one matching row, whereas they really produce 100
and 0 rows respectively.  With the stats object, they seem to both get
estimated at ~100 rows, which is a considerable improvement for one case
but a considerable disimprovement for the other.  If this is not broken,
I'd like to know why not.  What's the point of an expensive extended-
stats mechanism if it can't get this right?


I assume you're talking about the functional dependencies and in that case that's expected behavior, because f. dependencies do assume the conditions are "consistent" with the functional dependencies.

This is an inherent limitation of functional dependencies, and perhaps a price for the simplicity of this statistics type. If your application executes queries that are likely not consistent with this, don't use functional dependencies.

The simplicity is why dependencies were implemented first, mostly to introduce all the infrastructure. The other statistics types (MCV, histograms) don't have this limitation, but those did not make it into pg10.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to