Re: Import Statistics in postgres_fdw before resorting to sampling.

Etsuro Fujita Sat, 03 Jan 2026 02:06:47 -0800

On Mon, Dec 15, 2025 at 3:40 AM Etsuro Fujita <[email protected]> wrote:
> On Sun, Dec 14, 2025 at 4:12 AM Corey Huinker <[email protected]> wrote:
> > Anyway, here's v6 incorporating both threads of feedback.
>
> I will review the postgres_fdw changes next.


I spent some time reviewing the changes.

+     <term><literal>fetch_stats</literal> (<type>boolean</type>)</term>
+     <listitem>
+      <para>
+       This option, which can be specified for a foreign table or a foreign
+       server, determines if <command>ANALYZE</command> on a foreign table
+       will first attempt to fetch and import the existing relation and
+       attribute statistics from the remote table, and only attempt regular
+       data sampling if no statistics were available. This option is only
+       useful if the remote relation is one that can have regular statistics
+       (tables and materialized views).
+       The default is <literal>true</literal>.

This version of the patch wouldn't fall back to the normal ANALYZE
processing anymore, so this documentation should be updated as such.
Also, as I think it's the user's responsibility to ensure the existing
statistics are up-to-date, as I said before, I think we should add a
note about that here.  Also, as some users wouldn't be able to ensure
it, I'm wondering if the default should be false.

+     <term><literal>remote_analyze</literal> (<type>boolean</type>)</term>
+     <listitem>
+      <para>
+       This option, which can be specified for a foreign table or a foreign
+       server, determines whether an <command>ANALYZE</command> on a foreign
+       table will attempt to <command>ANALYZE</command> the remote table if
+       the first attempt to fetch remote statistics fails, and will then
+       make a second and final attempt to fetch remote statistics. This option
+       is only meaningful if the foreign table has
+       <literal>fetch_stats</literal> enabled at either the server or table
+       level.

If the user wasn't able to ensure the statistics are up-to-date, I
think he/she might want to do remote ANALYZE *before* fetching the
statistics, not after trying to do so.  So I think we could instead
provide this option as such.  What do you think about that?  Anyway,
I'd vote for leaving this for another patch, as I think it's a
nice-to-have option rather than a must-have one, as I said before.

ISTM that the code is well organized overall.  Here are a few comments:

+static const char *relstats_query_18 =
+   "SELECT c.relkind, c.relpages, c.reltuples, c.relallvisible,
c.relallfrozen "
+   "FROM pg_catalog.pg_class AS c "
+   "JOIN pg_catalog.pg_namespace AS n ON n.oid = c.relnamespace "
+   "WHERE n.nspname = $1 AND c.relname = $2";

We don't use relallvisible and relallfrozen for foreign tables (note
that do_analyze_rel() calls vac_update_relstats() with relallvisible=0
and relallfrozen=0 for them).  Do we really need to retrieve (and
restore) them?

+static const char *attstats_query_17 =
+   "SELECT DISTINCT ON (s.attname) attname, s.null_frac, s.avg_width, "
+   "s.n_distinct, s.most_common_vals, s.most_common_freqs, "
+   "s.histogram_bounds, s.correlation, s.most_common_elems, "
+   "s.most_common_elem_freqs, s.elem_count_histogram, "
+   "s.range_length_histogram, s.range_empty_frac, s.range_bounds_histogram "
+   "FROM pg_catalog.pg_stats AS s "
+   "WHERE s.schemaname = $1 AND s.tablename = $2 "
+   "ORDER BY s.attname, s.inherited DESC";

I think we should retrieve the attribute statistics for only the
referenced columns of the remote table, not all the columns of it, to
reduce the data transfer and the cost of matching local/remote
attributes in import_fetched_statistics().

In fetch_remote_statistics()

+   if (server_version_num >= 180000)
+       relation_sql = relstats_query_18;
+   else if (server_version_num >= 140000)
+       relation_sql = relstats_query_14;
+   else
+       relation_sql = relstats_query_default;

I think that having the definition for each of relstats_query_18,
relstats_query_14 and relstats_query_default makes the code redundant
and the maintenance hard.  To avoid that, how about building the
relation_sql query dynamically as done for the query to fetch all
table data from the remote server in postgresImportForeignSchema().
Same for the attribute_sql query.

That's all I have for now.  I will continue to review the changes.

Best regards,
Etsuro Fujita

Re: Import Statistics in postgres_fdw before resorting to sampling.

Reply via email to