Hi,

> Per side conversation in [1], this patch exposes pg_statistic.starelid in the 
> pg_stats view (0001)
> and then changes pg_dump to use the starelid in the queries on pg_stats 
> rather than the
> combination of schemaname+relname, which gets a bit clumsy in bulk array 
> fetching, and
> also leads to bad query plans against pg_stats because the security barrier 
> is also an
> optimization barrier, which means that queries often try to hash join, which 
> causes the
> very large table pg_statistic to be sequentially scanned, and that's a bad 
> time. Currently
> we get around this by adding a redundant qual to the query which gooses the 
> plan towards
> an index, and that works fine for now, but there's no guarantee that it will 
> continue to work
> in the future, so this change brings some plan safety as well.

This alone seems like a good enough reason to include startled in
pg_stat, besides other
future use-cases this will simplify as discussed in [1]

> 0001 also exposes pg_statistic.attnum. This is no direct application of this 
> in 0002,
> but people have often expressed surprise that pg_dump orders the
> pg_restore_attribute_stats() calls by attname rather than attnum,
> and if we wanted to change that now we could (albeit only on new versions).

This seems logical

v1-0001 comments:

1/

-        nspname AS schemaname,
-        relname AS tablename,
-        attname AS attname,
+        n.nspname AS schemaname,
+        c.relname AS tablename,
+        s.starelid AS starelid,
+        a.attnum AS attnum,
+        a.attname AS attname,

Ideally, we would want the OID and attnum to be first columns to match
the pg_stat_activity/pg_stat_progress style, but that will be too invasive.

The new attname column should be moved after tablename, however.

2/

+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>attnum</structfield> <type>name</type>
+       (references <link
linkend="catalog-pg-attribute"><structname>pg_attribute</structname></link>.<structfield>attnum</structfield>)
+      </para>
+      <para>
+       Position of column described by this row
+      </para></entry>
+     </row>

- This <type> should be int2
- Maybe for the describtion, it should be something like:

  "The number of the column, as in
<structname>pg_attribute</structname>.<structfield>attnum</structfield>"

  to be close to the description in pg_attribute

3/

+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>starelid</structfield> <type>oid</type>
+      </para>
+      <para>
+       ID of the relation
+      </para></entry>
+     </row>

This should indicate it is a reference to pg_class.oid, like so:

```
(references <link
linkend="catalog-pg-class"><structname>pg_class</structname></link>.<structfield>oid</structfield>)
```

Maybe "ID of the table or index" is better, since this can only be a
table or index
for pg_stats.

I dislike the existing "pg_stats.tablename", since this can also be an
expression index.
"pg_stats.relation" with a description of "Name of table or index" is
more appropriate.
It is a change that we can possibly make in a major version. Looked
through the archives,
and did not see this being reported/discussed.

v1-0002:

I examined the version variants of getAttributeStats and all looks good,
and also ran a test on a 18 and 19 server version with the patched pg_dump
client and all looks good.

One minor comment is:


+                    pg_fatal("statistics table oid information missing");


I noticed in other pg_fatal messages, we include OIDs

            pg_fatal("could not find function definition for function
with OID %u",
                     cast->castfunc);

Should we do the same here?



[1] https://www.postgresql.org/message-id/aZ3RbTE8Racscykc@nathan

--
Sami Imseih
Amazon Web Services (AWS)


Reply via email to