Re: [HACKERS] multivariate statistics (v19)

Petr Jelinek Fri, 30 Dec 2016 05:13:17 -0800

On 12/12/16 22:50, Tomas Vondra wrote:
> On 12/12/2016 12:26 PM, Amit Langote wrote:
>>
>> Hi Tomas,
>>
>> On 2016/10/30 4:23, Tomas Vondra wrote:
>>> Hi,
>>>
>>> Attached is v20 of the multivariate statistics patch series, doing
>>> mostly
>>> the changes outlined in the preceding e-mail from October 11.
>>>
>>> The patch series currently has these parts:
>>>
>>> * 0001 : (FIX) teach pull_varno about RestrictInfo
>>> * 0002 : (PATCH) shared infrastructure and ndistinct coefficients


Hi,

I went over these two (IMHO those could easily be considered as minimal
committable set even if the user visible functionality they provide is
rather limited).

> dropping statistics
> -------------------
> 
> The statistics may be dropped automatically using DROP STATISTICS.
> 
> After ALTER TABLE ... DROP COLUMN, statistics referencing are:
> 
>   (a) dropped, if the statistics would reference only one column
> 
>   (b) retained, but modified on the next ANALYZE

This should be documented in user visible form if you plan to keep it
(it does make sense to me).

> +   therefore perfectly correlated. Providing additional information about
> +   correlation between columns is the purpose of multivariate statistics,
> +   and the rest of this section thoroughly explains how the planner
> +   leverages them to improve estimates.
> +  </para>
> +
> +  <para>
> +   For additional details about multivariate statistics, see
> +   <filename>src/backend/utils/mvstats/README.stats</>. There are additional
> +   <literal>READMEs</> for each type of statistics, mentioned in the 
> following
> +   sections.
> +  </para>
> +
> + </sect1>

I don't think this qualifies as "thoroughly explains" ;)

> +
> +Oid
> +get_statistics_oid(List *names, bool missing_ok)

No comment?

> +             case OBJECT_STATISTICS:
> +                     msg = gettext_noop("statistics \"%s\" does not exist, 
> skipping");
> +                     name = NameListToString(objname);
> +                     break;

This sounds somewhat weird (plural vs singular).

> + * XXX Maybe this should check for duplicate stats. Although it's not clear
> + * what "duplicate" would mean here (wheter to compare only keys or also
> + * options). Moreover, we don't do such checks for indexes, although those
> + * store tuples and recreating a new index may be a way to fix bloat (which
> + * is a problem statistics don't have).
> + */
> +ObjectAddress
> +CreateStatistics(CreateStatsStmt *stmt)

I don't think we should check duplicates TBH so I would remove the XXX
(also "wheter" is typo but if you remove that paragraph it does not matter).

> +     if (true)
> +     {

Huh?

> +
> +List *
> +RelationGetMVStatList(Relation relation)
> +{
...
> +
> +void
> +update_mv_stats(Oid mvoid, MVNDistinct ndistinct,
> +                             int2vector *attrs, VacAttrStats **stats)
...
> +static double
> +ndistinct_for_combination(double totalrows, int numrows, HeapTuple *rows,
> +                                int2vector *attrs, VacAttrStats **stats,
> +                                int k, int *combination)
> +{


Again, these deserve comment.

I'll try to look at other patches in the series as time permits.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] multivariate statistics (v19)

Reply via email to