Re: Statistics Import and Export

Ashutosh Bapat Mon, 01 Apr 2024 04:27:14 -0700

Hi Corey,


On Mon, Mar 25, 2024 at 3:38 PM Ashutosh Bapat <[email protected]>
wrote:

> Hi Corey,
>
>
> On Sat, Mar 23, 2024 at 7:21 AM Corey Huinker <[email protected]>
> wrote:
>
>> v12 attached.
>>
>> 0001 -
>>
>>
> Some random comments
>
> +SELECT
> +    format('SELECT pg_catalog.pg_set_attribute_stats( '
> +            || 'relation => %L::regclass::oid, attname => %L::name, '
> +            || 'inherited => %L::boolean, null_frac => %L::real, '
> +            || 'avg_width => %L::integer, n_distinct => %L::real, '
> +            || 'most_common_vals => %L::text, '
> +            || 'most_common_freqs => %L::real[], '
> +            || 'histogram_bounds => %L::text, '
> +            || 'correlation => %L::real, '
> +            || 'most_common_elems => %L::text, '
> +            || 'most_common_elem_freqs => %L::real[], '
> +            || 'elem_count_histogram => %L::real[], '
> +            || 'range_length_histogram => %L::text, '
> +            || 'range_empty_frac => %L::real, '
> +            || 'range_bounds_histogram => %L::text) ',
> +        'stats_export_import.' || s.tablename || '_clone', s.attname,
> +        s.inherited, s.null_frac,
> +        s.avg_width, s.n_distinct,
> +        s.most_common_vals, s.most_common_freqs, s.histogram_bounds,
> +        s.correlation, s.most_common_elems, s.most_common_elem_freqs,
> +        s.elem_count_histogram, s.range_length_histogram,
> +        s.range_empty_frac, s.range_bounds_histogram)
> +FROM pg_catalog.pg_stats AS s
> +WHERE s.schemaname = 'stats_export_import'
> +AND s.tablename IN ('test', 'is_odd')
> +\gexec
>
> Why do we need to construct the command and execute? Can we instead
> execute the function directly? That would also avoid ECHO magic.
>

Addressed in v15


>
> +   <table id="functions-admin-statsimport">
> +    <title>Database Object Statistics Import Functions</title>
> +    <tgroup cols="1">
> +     <thead>
> +      <row>
> +       <entry role="func_table_entry"><para role="func_signature">
> +        Function
> +       </para>
> +       <para>
> +        Description
> +       </para></entry>
> +      </row>
> +     </thead>
>
> COMMENT: The functions throw many validation errors. Do we want to list
> the acceptable/unacceptable input values in the documentation corresponding
> to those? I don't expect one line per argument validation. Something like
> "these, these and these arguments can not be NULL" or "both arguments in
> each of the pairs x and y, a and b, and c and d should be non-NULL or NULL
> respectively".
>

Addressed in v15.


> + /* Statistics are dependent on the definition, not the data */
> + /* Views don't have stats */
> + if ((tbinfo->dobj.dump & DUMP_COMPONENT_STATISTICS) &&
> + (tbinfo->relkind == RELKIND_VIEW))
> + dumpRelationStats(fout, &tbinfo->dobj, reltypename,
> +  tbinfo->dobj.dumpId);
> +
>
> Statistics are about data. Whenever pg_dump dumps some filtered data, the
> statistics collected for the whole table are uselss. We should avoide
> dumping
> statistics in such a case. E.g. when only schema is dumped what good is
> statistics? Similarly the statistics on a partitioned table may not be
> useful
> if some its partitions are not dumped. Said that dumping statistics on
> foreign
> table makes sense since they do not contain data but the statistics still
> makes sense.
>

Dumping statistics without data is required for pg_upgrade. This is being
discussed in the same thread. But I don't see some of the suggestions e.g.
using binary-mode switch being used in v15.

Also, should we handle sequences, composite types the same way? THe latter
is probably not dumped, but in case.


>
> Whether or not I pass --no-statistics, there is no difference in the dump
> output. Am I missing something?
> $ pg_dump -d postgres > /tmp/dump_no_arguments.out
> $ pg_dump -d postgres --no-statistics > /tmp/dump_no_statistics.out
> $ diff /tmp/dump_no_arguments.out /tmp/dump_no_statistics.out
> $
>
> IIUC, pg_dump includes statistics by default. That means all our pg_dump
> related tests will have statistics output by default. That's good since the
> functionality will always be tested. 1. We need additional tests to ensure
> that the statistics is installed after restore. 2. Some of those tests
> compare dumps before and after restore. In case the statistics is changed
> because of auto-analyze happening post-restore, these tests will fail.
>

Fixed.

Thanks for addressing those comments.

-- 
Best Wishes,
Ashutosh Bapat

Re: Statistics Import and Export

Reply via email to