On 12/30/2016 02:12 PM, Petr Jelinek wrote:
On 12/12/16 22:50, Tomas Vondra wrote:
On 12/12/2016 12:26 PM, Amit Langote wrote:

Hi Tomas,

On 2016/10/30 4:23, Tomas Vondra wrote:
Hi,

Attached is v20 of the multivariate statistics patch series, doing
mostly
the changes outlined in the preceding e-mail from October 11.

The patch series currently has these parts:

* 0001 : (FIX) teach pull_varno about RestrictInfo
* 0002 : (PATCH) shared infrastructure and ndistinct coefficients

Hi,

I went over these two (IMHO those could easily be considered as minimal
committable set even if the user visible functionality they provide is
rather limited).


Yes, although I still have my doubts 0001 is the right way to make pull_varnos work. It's probably related to the bigger design question, because moving the statistics selection to an earlier phase could make it unnecessary I guess.

dropping statistics
-------------------

The statistics may be dropped automatically using DROP STATISTICS.

After ALTER TABLE ... DROP COLUMN, statistics referencing are:

  (a) dropped, if the statistics would reference only one column

  (b) retained, but modified on the next ANALYZE

This should be documented in user visible form if you plan to keep it
(it does make sense to me).


Yes, I plan to keep it. I agree it should be documented, probably on the ALTER TABLE page (and linked from CREATE/DROP statistics pages).

+   therefore perfectly correlated. Providing additional information about
+   correlation between columns is the purpose of multivariate statistics,
+   and the rest of this section thoroughly explains how the planner
+   leverages them to improve estimates.
+  </para>
+
+  <para>
+   For additional details about multivariate statistics, see
+   <filename>src/backend/utils/mvstats/README.stats</>. There are additional
+   <literal>READMEs</> for each type of statistics, mentioned in the following
+   sections.
+  </para>
+
+ </sect1>

I don't think this qualifies as "thoroughly explains" ;)


OK, I'll drop the "thoroughly" ;-)

+
+Oid
+get_statistics_oid(List *names, bool missing_ok)

No comment?

+               case OBJECT_STATISTICS:
+                       msg = gettext_noop("statistics \"%s\" does not exist, 
skipping");
+                       name = NameListToString(objname);
+                       break;

This sounds somewhat weird (plural vs singular).


Ah, right - it should be either "statistic ... does not" or "statistics ... do not". I think "statistics" is the right choice here, because (a) we have CREATE STATISTICS and (b) it may be a combination of statistics, e.g. histogram + MCV.

+ * XXX Maybe this should check for duplicate stats. Although it's not clear
+ * what "duplicate" would mean here (wheter to compare only keys or also
+ * options). Moreover, we don't do such checks for indexes, although those
+ * store tuples and recreating a new index may be a way to fix bloat (which
+ * is a problem statistics don't have).
+ */
+ObjectAddress
+CreateStatistics(CreateStatsStmt *stmt)

I don't think we should check duplicates TBH so I would remove the XXX
(also "wheter" is typo but if you remove that paragraph it does not matter).


Yes, I came to the same conclusion - we can only really check for exact matches (same set of columns, same choice of statistic types), but that's fairly useless. I'll remove the XXX.

+       if (true)
+       {

Huh?


Yeah, that's a bit weird pattern. It's a remainder of copy-pasting the preceding block, which looks like this

    if (hasindex)
    {
        ...
    }

But we've decided to not add similar flag for the statistics. I'll move the block to a separate function (instead of merging it directly into the function, which is already a bit largeish).

+
+List *
+RelationGetMVStatList(Relation relation)
+{
...
+
+void
+update_mv_stats(Oid mvoid, MVNDistinct ndistinct,
+                               int2vector *attrs, VacAttrStats **stats)
...
+static double
+ndistinct_for_combination(double totalrows, int numrows, HeapTuple *rows,
+                                  int2vector *attrs, VacAttrStats **stats,
+                                  int k, int *combination)
+{


Again, these deserve comment.


OK, will add.

I'll try to look at other patches in the series as time permits.

thanks

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to