On 12/30/2016 02:12 PM, Petr Jelinek wrote:
On 12/12/16 22:50, Tomas Vondra wrote:
On 12/12/2016 12:26 PM, Amit Langote wrote:
On 2016/10/30 4:23, Tomas Vondra wrote:
Attached is v20 of the multivariate statistics patch series, doing
the changes outlined in the preceding e-mail from October 11.
The patch series currently has these parts:
* 0001 : (FIX) teach pull_varno about RestrictInfo
* 0002 : (PATCH) shared infrastructure and ndistinct coefficients
I went over these two (IMHO those could easily be considered as minimal
committable set even if the user visible functionality they provide is
Yes, although I still have my doubts 0001 is the right way to make
pull_varnos work. It's probably related to the bigger design question,
because moving the statistics selection to an earlier phase could make
it unnecessary I guess.
The statistics may be dropped automatically using DROP STATISTICS.
After ALTER TABLE ... DROP COLUMN, statistics referencing are:
(a) dropped, if the statistics would reference only one column
(b) retained, but modified on the next ANALYZE
This should be documented in user visible form if you plan to keep it
(it does make sense to me).
Yes, I plan to keep it. I agree it should be documented, probably on the
ALTER TABLE page (and linked from CREATE/DROP statistics pages).
+ therefore perfectly correlated. Providing additional information about
+ correlation between columns is the purpose of multivariate statistics,
+ and the rest of this section thoroughly explains how the planner
+ leverages them to improve estimates.
+ For additional details about multivariate statistics, see
+ <filename>src/backend/utils/mvstats/README.stats</>. There are additional
+ <literal>READMEs</> for each type of statistics, mentioned in the following
I don't think this qualifies as "thoroughly explains" ;)
OK, I'll drop the "thoroughly" ;-)
+get_statistics_oid(List *names, bool missing_ok)
+ case OBJECT_STATISTICS:
+ msg = gettext_noop("statistics \"%s\" does not exist,
+ name = NameListToString(objname);
This sounds somewhat weird (plural vs singular).
Ah, right - it should be either "statistic ... does not" or "statistics
... do not". I think "statistics" is the right choice here, because (a)
we have CREATE STATISTICS and (b) it may be a combination of statistics,
e.g. histogram + MCV.
+ * XXX Maybe this should check for duplicate stats. Although it's not clear
+ * what "duplicate" would mean here (wheter to compare only keys or also
+ * options). Moreover, we don't do such checks for indexes, although those
+ * store tuples and recreating a new index may be a way to fix bloat (which
+ * is a problem statistics don't have).
I don't think we should check duplicates TBH so I would remove the XXX
(also "wheter" is typo but if you remove that paragraph it does not matter).
Yes, I came to the same conclusion - we can only really check for exact
matches (same set of columns, same choice of statistic types), but
that's fairly useless. I'll remove the XXX.
+ if (true)
Yeah, that's a bit weird pattern. It's a remainder of copy-pasting the
preceding block, which looks like this
But we've decided to not add similar flag for the statistics. I'll move
the block to a separate function (instead of merging it directly into
the function, which is already a bit largeish).
+update_mv_stats(Oid mvoid, MVNDistinct ndistinct,
+ int2vector *attrs, VacAttrStats **stats)
+ndistinct_for_combination(double totalrows, int numrows, HeapTuple *rows,
+ int2vector *attrs, VacAttrStats **stats,
+ int k, int *combination)
Again, these deserve comment.
OK, will add.
I'll try to look at other patches in the series as time permits.
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription: