Re: PoC/WIP: Extended statistics on expressions

Zhihong Yu Sat, 16 Jan 2021 18:54:16 -0800

Hi,
+    * Check that only the base rel is mentioned.  (This should be dead code
+    * now that add_missing_from is history.)
+    */
+   if (list_length(pstate->p_rtable) != 1)


If it is dead code, it can be removed, right ?

For statext_mcv_clauselist_selectivity:

+                   // bms_free(list_attnums[listidx]);

The commented line can be removed.

+bool
+examine_clause_args2(List *args, Node **exprp, Const **cstp, bool
*expronleftp)

Better add some comment for examine_clause_args2 since there
is examine_clause_args() already.

+   if (!ok || stats->compute_stats == NULL || stats->minrows <= 0)

When would stats->minrows have negative value ?

For serialize_expr_stats():

+   sd = table_open(StatisticRelationId, RowExclusiveLock);
+
+   /* lookup OID of composite type for pg_statistic */
+   typOid = get_rel_type_id(StatisticRelationId);
+   if (!OidIsValid(typOid))
+       ereport(ERROR,

Looks like the table_open() call can be made after the typOid check.

+       Datum       values[Natts_pg_statistic];
+       bool        nulls[Natts_pg_statistic];
+       HeapTuple   stup;
+
+       if (!stats->stats_valid)

It seems the local arrays can be declared after the validity check.

+           if (enabled[i] == STATS_EXT_NDISTINCT)
+               ndistinct_enabled = true;
+           if (enabled[i] == STATS_EXT_DEPENDENCIES)
+               dependencies_enabled = true;
+           if (enabled[i] == STATS_EXT_MCV)

the second and third if should be preceded with 'else'

+ReleaseDummy(HeapTuple tuple)
+{
+   pfree(tuple);

Since ReleaseDummy() is just a single pfree call, maybe you don't need this
method - call pfree in its place.

Cheers

On Sat, Jan 16, 2021 at 4:24 PM Tomas Vondra <tomas.von...@enterprisedb.com>
wrote:

> On 1/17/21 12:22 AM, Justin Pryzby wrote:
> > On Sat, Jan 16, 2021 at 05:48:43PM +0100, Tomas Vondra wrote:
> >> +      <entry role="catalog_table_entry"><para role="column_definition">
> >> +       <structfield>expr</structfield> <type>text</type>
> >> +      </para>
> >> +      <para>
> >> +       Expression the extended statistics is defined on
> >> +      </para></entry>
> >
> > Expression the extended statistics ARE defined on
> > Or maybe say "on which the extended statistics are defined"
> >
>
> I'm pretty sure "is" is correct because "expression" is singular.
>
> >> +  <para>
> >> +   The <command>CREATE STATISTICS</command> command has two basic
> forms. The
> >> +   simple variant allows to build statistics for a single expression,
> does
> >
> > .. ALLOWS BUILDING statistics for a single expression, AND does (or BUT
> does)
> >
> >> +   Expression statistics are per-expression and are similar to
> creating an
> >> +   index on the expression, except that they avoid the overhead of the
> index.
> >
> > Maybe say "overhead of index maintenance"
> >
>
> Yeah, that sounds better.
>
> >> +   All functions and operators used in a statistics definition must be
> >> +   <quote>immutable</quote>, that is, their results must depend only on
> >> +   their arguments and never on any outside influence (such as
> >> +   the contents of another table or the current time).  This
> restriction
> >
> > say "outside factor" or "external factor"
> >
>
> In fact, we've removed the immutability restriction, so this paragraph
> should have been removed.
>
> >> +   results of those expression, and uses default estimates as
> illustrated
> >> +   by the first query.  The planner also does not realize the value of
> the
> >
> > realize THAT
> >
>
> OK, changed.
>
> >> +   second column fully defines the value of the other column, because
> date
> >> +   truncated to day still identifies the month. Then expression and
> >> +   ndistinct statistics are built on those two columns:
> >
> > I got an error doing this:
> >
> > CREATE TABLE t AS SELECT generate_series(1,9) AS i;
> > CREATE STATISTICS s ON (i+1) ,(i+1+0) FROM t;
> > ANALYZE t;
> > SELECT i+1 FROM t GROUP BY 1;
> > ERROR:  corrupt MVNDistinct entry
> >
>
> Thanks. There was a thinko in estimate_multivariate_ndistinct, resulting
> in mismatching the ndistinct coefficient items. The attached patch fixes
> that, but I've realized the way we pick the "best" statistics may need
> some improvements (I added an XXX comment about that).
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

Re: PoC/WIP: Extended statistics on expressions

Reply via email to