Add support for "exprs" in pg_restore_extended_stats()
This commit adds support for the restore of extended statistics of the
kind "exprs", counting for the statistics data computed for expressions.
The input format consists of a jsonb object which must be an array of
objects which are keyed by statistics parameter names, like this:
[{"stat_type1": "...", "stat_type2": "...", ...},
{"stat_type1": "...", "stat_type2": "...", ...}, ...]
The outer array must have as many elements as there are expressions
defined in the statistics object, mapping with the way extended
statistics are built with one pg_statistic tuple stored for each
expression whose statistics have been computed. The elements of the
array must be either objects or null values (equivalent of invalid data,
case also supported by the stats computations when its data is inserted
in the catalogs).
The keys of the inner objects are names of the statistical columns in
pg_stats_ext_exprs (i.e. everything after "inherited"). Not all
parameter keys need to be provided, those omitted being silently
ignored. Key values that do not match a statistical column name will
cause a warning to be issued, but do not otherwise fail the expression
or the import as a whole.
The expected value type for all parameters is jbvString, which allows
us to validate the values using the input function specific to that
parameter. Any parameters with a null value are silently ignored, same
as if they were not provided in the first place.
This commit includes a battery of test cases:
- Sanity checks for what-should-be-all the failures in restore code
paths, including parsing errors, parameter sanity checks depending on
the extended stats object definition, etc.
- Value injection, for scalar, array, range, multi-range cases.
- Stats data cloning, with differential checks between the source
relation and its target. The source and the target should hold the same
stats data after restore.
- While expressions are supported in extended statistics since v14,
range_length_histogram, range_empty_frac, and range_bounds_histogram
have been added to pg_stat_ext_exprs only in v19. A test case has been
added to emulate a dump taken from v18, with expression stats restored
for a range data type where these three fields are NULL.
Support for pg_dump is included, with expressions supported since v14,
inherited since v15, and data for range types in expressions in v19.
pg_upgrade is the main use-case of this feature; it is also possible to
inject statistics, same as for the other extstat kinds.
As of this commit, ANALYZE should not be required after pg_upgrade when
the cluster upgrading from uses extended statistics, as MCV,
dependencies, expressions and ndistinct stats are all covered. The
stats data related to range types used in expressions requires v19,
whose support has also been added.
Author: Corey Huinker <[email protected]>
Co-authored-by: Michael Paquier <[email protected]>
Discussion:
https://postgr.es/m/CADkLM=fpcci6opyuyez0f4bwqaa7hzawo+zpptufux5_uwt...@mail.gmail.com
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/ba97bf9cb7b4ca184ab669569be2633a46cfdd0c
Modified Files
--------------
doc/src/sgml/func/func-admin.sgml | 42 +-
src/backend/statistics/extended_stats_funcs.c | 901 ++++++++++++++++-
src/bin/pg_dump/pg_dump.c | 56 +-
src/test/regress/expected/stats_import.out | 1343 ++++++++++++++++++++++++-
src/test/regress/sql/stats_import.sql | 847 +++++++++++++++-
5 files changed, 3133 insertions(+), 56 deletions(-)