Re: [HACKERS] multivariate statistics (v19)

Tomas Vondra Tue, 03 Jan 2017 13:56:43 -0800

On 12/30/2016 02:12 PM, Petr Jelinek wrote:

On 12/12/16 22:50, Tomas Vondra wrote:

On 12/12/2016 12:26 PM, Amit Langote wrote:


Hi Tomas,

On 2016/10/30 4:23, Tomas Vondra wrote:

Hi,

Attached is v20 of the multivariate statistics patch series, doing
mostly
the changes outlined in the preceding e-mail from October 11.

The patch series currently has these parts:

* 0001 : (FIX) teach pull_varno about RestrictInfo
* 0002 : (PATCH) shared infrastructure and ndistinct coefficients


Hi,

I went over these two (IMHO those could easily be considered as minimal
committable set even if the user visible functionality they provide is
rather limited).

Yes, although I still have my doubts 0001 is the right way to makepull_varnos work. It's probably related to the bigger design question,because moving the statistics selection to an earlier phase could makeit unnecessary I guess.

dropping statistics
-------------------

The statistics may be dropped automatically using DROP STATISTICS.

After ALTER TABLE ... DROP COLUMN, statistics referencing are:

  (a) dropped, if the statistics would reference only one column

  (b) retained, but modified on the next ANALYZE


This should be documented in user visible form if you plan to keep it
(it does make sense to me).

Yes, I plan to keep it. I agree it should be documented, probably on theALTER TABLE page (and linked from CREATE/DROP statistics pages).

+   therefore perfectly correlated. Providing additional information about
+   correlation between columns is the purpose of multivariate statistics,
+   and the rest of this section thoroughly explains how the planner
+   leverages them to improve estimates.
+  </para>
+
+  <para>
+   For additional details about multivariate statistics, see
+   <filename>src/backend/utils/mvstats/README.stats</>. There are additional
+   <literal>READMEs</> for each type of statistics, mentioned in the following
+   sections.
+  </para>
+
+ </sect1>


I don't think this qualifies as "thoroughly explains" ;)


OK, I'll drop the "thoroughly" ;-)

+
+Oid
+get_statistics_oid(List *names, bool missing_ok)


No comment?

+               case OBJECT_STATISTICS:
+                       msg = gettext_noop("statistics \"%s\" does not exist, 
skipping");
+                       name = NameListToString(objname);
+                       break;


This sounds somewhat weird (plural vs singular).

Ah, right - it should be either "statistic ... does not" or "statistics... do not". I think "statistics" is the right choice here, because (a)we have CREATE STATISTICS and (b) it may be a combination of statistics,e.g. histogram + MCV.

+ * XXX Maybe this should check for duplicate stats. Although it's not clear
+ * what "duplicate" would mean here (wheter to compare only keys or also
+ * options). Moreover, we don't do such checks for indexes, although those
+ * store tuples and recreating a new index may be a way to fix bloat (which
+ * is a problem statistics don't have).
+ */
+ObjectAddress
+CreateStatistics(CreateStatsStmt *stmt)


I don't think we should check duplicates TBH so I would remove the XXX
(also "wheter" is typo but if you remove that paragraph it does not matter).

Yes, I came to the same conclusion - we can only really check for exactmatches (same set of columns, same choice of statistic types), butthat's fairly useless. I'll remove the XXX.

+       if (true)
+       {


Huh?

Yeah, that's a bit weird pattern. It's a remainder of copy-pasting thepreceding block, which looks like this


    if (hasindex)
    {
        ...
    }

But we've decided to not add similar flag for the statistics. I'll movethe block to a separate function (instead of merging it directly intothe function, which is already a bit largeish).

+
+List *
+RelationGetMVStatList(Relation relation)
+{

...

+
+void
+update_mv_stats(Oid mvoid, MVNDistinct ndistinct,
+                               int2vector *attrs, VacAttrStats **stats)

...

+static double
+ndistinct_for_combination(double totalrows, int numrows, HeapTuple *rows,
+                                  int2vector *attrs, VacAttrStats **stats,
+                                  int k, int *combination)
+{



Again, these deserve comment.


OK, will add.

I'll try to look at other patches in the series as time permits.


thanks

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] multivariate statistics (v19)

Reply via email to