On Wed, 24 Mar 2021 at 10:22, Tomas Vondra <tomas.von...@enterprisedb.com> wrote: > > Thanks, it seems to be some thinko in handling in PlaceHolderVars, which > seem to break the code's assumptions about varnos. This fixes it for me, > but I need to look at it more closely. >
I think that makes sense. Reviewing the docs, I noticed a couple of omissions, and had a few other suggestions (attached). Regards, Dean
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml new file mode 100644 index dadca67..382cbd7 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -7377,6 +7377,15 @@ SCRAM-SHA-256$<replaceable><iteration <literal>m</literal> for most common values (MCV) list statistics </para></entry> </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>stxexprs</structfield> <type>pg_node_tree</type> + </para> + <para> + A list of any expressions covered by this statistics object. + </para></entry> + </row> </tbody> </tgroup> </table> @@ -7474,6 +7483,16 @@ SCRAM-SHA-256$<replaceable><iteration <structname>pg_mcv_list</structname> type </para></entry> </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>stxdexpr</structfield> <type>pg_statistic[]</type> + </para> + <para> + Per-expression statistics, serialized as an array of + <structname>pg_statistic</structname> type + </para></entry> + </row> </tbody> </tgroup> </table> @@ -12843,7 +12862,8 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_p <para> The view <structname>pg_stats_ext</structname> provides access to - the information stored in the <link + information about each extended statistics object in the database, + combining information stored in the <link linkend="catalog-pg-statistic-ext"><structname>pg_statistic_ext</structname></link> and <link linkend="catalog-pg-statistic-ext-data"><structname>pg_statistic_ext_data</structname></link> catalogs. This view allows access only to rows of @@ -12930,7 +12950,16 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_p (references <link linkend="catalog-pg-attribute"><structname>pg_attribute</structname></link>.<structfield>attname</structfield>) </para> <para> - Names of the columns the extended statistics is defined on + Names of the columns included in the extended statistics + </para></entry> + </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>exprs</structfield> <type>text[]</type> + </para> + <para> + Expressions included in the extended statistics </para></entry> </row> @@ -13033,7 +13062,8 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_p <para> The view <structname>pg_stats_ext_exprs</structname> provides access to - the information stored in the <link + information about all expressions included in extended statistics objects, + combining information stored in the <link linkend="catalog-pg-statistic-ext"><structname>pg_statistic_ext</structname></link> and <link linkend="catalog-pg-statistic-ext-data"><structname>pg_statistic_ext_data</structname></link> catalogs. This view allows access only to rows of @@ -13119,7 +13149,7 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_p <structfield>expr</structfield> <type>text</type> </para> <para> - Expression the extended statistics is defined on + Expression included in the extended statistics </para></entry> </row> diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml new file mode 100644 index 5f3aefd..f561599 --- a/doc/src/sgml/ref/create_statistics.sgml +++ b/doc/src/sgml/ref/create_statistics.sgml @@ -27,7 +27,7 @@ CREATE STATISTICS [ IF NOT EXISTS ] <rep CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="parameter">statistics_name</replaceable> [ ( <replaceable class="parameter">statistics_kind</replaceable> [, ... ] ) ] - ON { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [, ...] + ON { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) }, { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [, ...] FROM <replaceable class="parameter">table_name</replaceable> </synopsis> @@ -45,12 +45,15 @@ CREATE STATISTICS [ IF NOT EXISTS ] <rep <para> The <command>CREATE STATISTICS</command> command has two basic forms. The - simple variant allows building statistics for a single expression, does - not allow specifying any statistics kinds and provides benefits similar - to an expression index. The full variant allows defining statistics objects - on multiple columns and expressions, and selecting which statistics kinds will - be built. The per-expression statistics are built automatically when there - is at least one expression. + first form allows univariate statistics for a single expression to be + collected, providing benefits similar to an expression index without the + overhead of index maintenance. This form does not allow the statistics + kind to be specified, since the various statistics kinds refer only to + multivariate statistics. The second form of the command allows + multivariate statistics on multiple columns and/or expressions to be + collected, optionally specifying which statistics kinds to include. This + form will also automatically cause univariate statistics to be collected on + any expressions included in the list. </para> <para> @@ -93,16 +96,16 @@ CREATE STATISTICS [ IF NOT EXISTS ] <rep <term><replaceable class="parameter">statistics_kind</replaceable></term> <listitem> <para> - A statistics kind to be computed in this statistics object. + A multivariate statistics kind to be computed in this statistics object. Currently supported kinds are <literal>ndistinct</literal>, which enables n-distinct statistics, <literal>dependencies</literal>, which enables functional dependency statistics, and <literal>mcv</literal> which enables most-common values lists. If this clause is omitted, all supported statistics kinds are - included in the statistics object. Expression statistics are built - automatically when the statistics definition includes complex - expressions and not just simple column references. + included in the statistics object. Univariate expression statistics are + built automatically if the statistics definition includes any complex + expressions rather than just simple column references. For more information, see <xref linkend="planner-stats-extended"/> and <xref linkend="multivariate-statistics-examples"/>. </para> @@ -114,8 +117,9 @@ CREATE STATISTICS [ IF NOT EXISTS ] <rep <listitem> <para> The name of a table column to be covered by the computed statistics. - At least two column names must be given; the order of the column names - is insignificant. + This is only allowed when building multivariate statistics. At least + two column names or expressions must be specified, and their order is + not significant. </para> </listitem> </varlistentry> @@ -124,9 +128,11 @@ CREATE STATISTICS [ IF NOT EXISTS ] <rep <term><replaceable class="parameter">expression</replaceable></term> <listitem> <para> - The expression to be covered by the computed statistics. In this case - only a single expression is required, in which case only statistics - for the expression are built. + An expression to be covered by the computed statistics. This may be + used to build univariate statistics on a single expression, or as part + of a list of multiple column names and/or expressions to build + multivariate statistics. In the latter case, separate univariate + statistics are built automatically for each expression in the list. </para> </listitem> </varlistentry> @@ -156,8 +162,8 @@ CREATE STATISTICS [ IF NOT EXISTS ] <rep <para> Expression statistics are per-expression and are similar to creating an index on the expression, except that they avoid the overhead of index - maintenance. Expression statistics are built automatically when there - is at least one expression in the statistics object definition. + maintenance. Expression statistics are built automatically for each + expression in the statistics object definition. </para> </refsect1> @@ -232,13 +238,12 @@ EXPLAIN ANALYZE SELECT * FROM t2 WHERE ( <para> Create table <structname>t3</structname> with a single timestamp column, - and run a query using an expression on that column. Without extended - statistics, the planner has no information about data distribution for - results of those expression, and uses default estimates as illustrated - by the first query. The planner also does not realize that the value of - the second column fully determines the value of the other column, because - date truncated to day still identifies the month. Then expression and - ndistinct statistics are built on those two columns: + and run queries using expressions on that column. Without extended + statistics, the planner has no information about the data distribution for + the expressions, and uses default estimates. The planner also does not + realize that the value of the date truncated to the month is fully + determined by the value of the date truncated to the day. Then expression + and ndistinct statistics are built on those two expressions: <programlisting> CREATE TABLE t3 ( @@ -262,7 +267,8 @@ EXPLAIN ANALYZE SELECT * FROM t3 EXPLAIN ANALYZE SELECT date_trunc('month', a), date_trunc('day', a) FROM t3 GROUP BY 1, 2; --- per-expression statistics are built automatically +-- build ndistinct statistics on the pair of expressions (per-expression +-- statistics are built automatically) CREATE STATISTICS s3 (ndistinct) ON date_trunc('month', a), date_trunc('day', a) FROM t3; ANALYZE t3;