Hi,
> Per side conversation in [1], this patch exposes pg_statistic.starelid in the
> pg_stats view (0001)
> and then changes pg_dump to use the starelid in the queries on pg_stats
> rather than the
> combination of schemaname+relname, which gets a bit clumsy in bulk array
> fetching, and
> also leads to bad query plans against pg_stats because the security barrier
> is also an
> optimization barrier, which means that queries often try to hash join, which
> causes the
> very large table pg_statistic to be sequentially scanned, and that's a bad
> time. Currently
> we get around this by adding a redundant qual to the query which gooses the
> plan towards
> an index, and that works fine for now, but there's no guarantee that it will
> continue to work
> in the future, so this change brings some plan safety as well.
This alone seems like a good enough reason to include startled in
pg_stat, besides other
future use-cases this will simplify as discussed in [1]
> 0001 also exposes pg_statistic.attnum. This is no direct application of this
> in 0002,
> but people have often expressed surprise that pg_dump orders the
> pg_restore_attribute_stats() calls by attname rather than attnum,
> and if we wanted to change that now we could (albeit only on new versions).
This seems logical
v1-0001 comments:
1/
- nspname AS schemaname,
- relname AS tablename,
- attname AS attname,
+ n.nspname AS schemaname,
+ c.relname AS tablename,
+ s.starelid AS starelid,
+ a.attnum AS attnum,
+ a.attname AS attname,
Ideally, we would want the OID and attnum to be first columns to match
the pg_stat_activity/pg_stat_progress style, but that will be too invasive.
The new attname column should be moved after tablename, however.
2/
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>attnum</structfield> <type>name</type>
+ (references <link
linkend="catalog-pg-attribute"><structname>pg_attribute</structname></link>.<structfield>attnum</structfield>)
+ </para>
+ <para>
+ Position of column described by this row
+ </para></entry>
+ </row>
- This <type> should be int2
- Maybe for the describtion, it should be something like:
"The number of the column, as in
<structname>pg_attribute</structname>.<structfield>attnum</structfield>"
to be close to the description in pg_attribute
3/
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>starelid</structfield> <type>oid</type>
+ </para>
+ <para>
+ ID of the relation
+ </para></entry>
+ </row>
This should indicate it is a reference to pg_class.oid, like so:
```
(references <link
linkend="catalog-pg-class"><structname>pg_class</structname></link>.<structfield>oid</structfield>)
```
Maybe "ID of the table or index" is better, since this can only be a
table or index
for pg_stats.
I dislike the existing "pg_stats.tablename", since this can also be an
expression index.
"pg_stats.relation" with a description of "Name of table or index" is
more appropriate.
It is a change that we can possibly make in a major version. Looked
through the archives,
and did not see this being reported/discussed.
v1-0002:
I examined the version variants of getAttributeStats and all looks good,
and also ran a test on a 18 and 19 server version with the patched pg_dump
client and all looks good.
One minor comment is:
+ pg_fatal("statistics table oid information missing");
I noticed in other pg_fatal messages, we include OIDs
pg_fatal("could not find function definition for function
with OID %u",
cast->castfunc);
Should we do the same here?
[1] https://www.postgresql.org/message-id/aZ3RbTE8Racscykc@nathan
--
Sami Imseih
Amazon Web Services (AWS)