> I agree, if there is no fallback, then the default should be false. When
> I was initially brainstorming this patch, Nathan Bossart had suggested
> making it the default because 1) that would be an automatic benefit to
> users and 2) the cost for attempting to import stats was small in
> comparison to a table sample, so it was worth the attempt. I still want
> users to get that automatic benefit, but if there is no fallback to
> sampling then the default only makes sense as false.
>
> I think that the FDW API that I proposed could actually allow us to
> fall back to sampling, by modifying StatisticsAreImportable so that it
> also checks if 1) there are statistics on the remote server and 2) the
> data is fresh enough, and if so, returns true; otherwise, returns
> false; in the latter case we could fall back to sampling. And if we
> modified it as such, I think we could change the default to true.
> (Checking #2 is necessary to avoid importing stale data, which would
> degrade plan quality.)
I've given this some more thought. First, we'd have to add the va_cols
parameter to StatisticsAreImportable, which isn't itself terrible. Then we'd
have to determine that there are stats available for every mapped column
(filtered by va_cols, if any). But the only way to do that is to query the
pg_stats view on the remote end, and once we've done that, we've already
fetched the stats. Yes, we could avoid selecting the actual statistical
values, and that would save some network bandwidth, but at the cost of having
to run the query a second time to get the stats. So I don't really see the
savings.

Also, the pg_stats view is our security-barrier black box into statistics,
and it gives no insight into how recently those stats were acquired. We could
poke pg_stat_all_tables, assuming we can even query it, and then make a
judgement based on CURRENT_TIMESTAMP - GREATEST(last_analyze,
last_autoanalyze), but any freshness threshold there is highly subjective
(a sketch of the sort of probe I mean follows at the end of this note,
before the patch).

I suppose we could move all of the statistics fetching into
StatisticsAreImportable and carry those values forward if they are
satisfactory. That would leave ImportStatistics() with little to do other
than form up the calls to pg_restore_*_stats(), but those calls could still
fail, and at that point we'd have no way to fall back to sampling and
analysis. I really want to make sampling fallback possible.

Anyway, here's v8, incorporating the documentation feedback and Matheus's
notes.
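To make the freshness idea concrete, the remote-side probe I have in mind
would look something like this. It is only a sketch: it assumes the mapped
user is allowed to read pg_stat_all_tables on the remote server, the
schema/table names are placeholders, and the one-day cutoff is an arbitrary
threshold -- which is exactly the subjective part I'm unhappy about:

    -- Sketch only: "fresh enough" probe against the remote side.
    -- Assumes pg_stat_all_tables is readable; the 1-day cutoff is arbitrary.
    SELECT st.schemaname,
           st.relname,
           GREATEST(st.last_analyze, st.last_autoanalyze) AS last_stats,
           CURRENT_TIMESTAMP - GREATEST(st.last_analyze, st.last_autoanalyze)
               < interval '1 day' AS stats_look_fresh
    FROM pg_catalog.pg_stat_all_tables AS st
    WHERE st.schemaname = 'public'
      AND st.relname = 'some_remote_table';

(If the table has never been analyzed, both timestamps are NULL and
stats_look_fresh comes back NULL, which we'd have to treat as "not
importable".)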
From 57733548ed2e06b10f18f5299dae3e58aa453e67 Mon Sep 17 00:00:00 2001
From: Corey Huinker <[email protected]>
Date: Thu, 7 Aug 2025 23:58:38 -0400
Subject: [PATCH v8] Add remote statistics fetching to postgres_fdw.

This adds the ability to fetch and import statistics from a remote server
table rather than fetching the data or a data sample from that table. This
is managed via two new options, fetch_stats and remote_analyze, both of
which are available at the server level and the table level.

If fetch_stats is true, then the ANALYZE command will attempt to fetch
statistics from the remote table and import those statistics locally. If
remote_analyze is true, and the first attempt to fetch remote statistics
finds no attribute statistics, then an attempt will be made to ANALYZE the
remote table before a second and final attempt to fetch remote statistics.
If the attempts to fetch statistics leave some analyzed columns without
imported statistics, then the ANALYZE will raise an error.

This operation will only work on remote relations that can have stored
statistics: tables, partitioned tables, and materialized views. If the
remote relation is a view, then remote fetching/analyzing is just wasted
effort, and the user is better off setting fetch_stats to false for that
table.

The default for fetch_stats is, for now, true at both the server and table
level. The default for remote_analyze is false at both the server and table
level. In both cases, setting a value at the table level will override the
corresponding server-level setting.
---
 src/include/foreign/fdwapi.h               |   9 +-
 src/backend/commands/analyze.c             |  52 +-
 doc/src/sgml/postgres-fdw.sgml             |  38 +-
 .../postgres_fdw/expected/postgres_fdw.out |  52 +-
 contrib/postgres_fdw/option.c              |  10 +
 contrib/postgres_fdw/postgres_fdw.c        | 797 ++++++++++++++++++
 contrib/postgres_fdw/sql/postgres_fdw.sql  |  46 +-
 7 files changed, 983 insertions(+), 21 deletions(-)

diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h index 96b6f692d2a..e54d441125a 100644 --- a/src/include/foreign/fdwapi.h +++ b/src/include/foreign/fdwapi.h @@ -19,7 +19,6 @@ /* avoid including explain_state.h here */ typedef struct ExplainState ExplainState; - /* * Callback function signatures --- see fdwhandler.sgml for more info. 
*/ @@ -157,6 +156,12 @@ typedef bool (*AnalyzeForeignTable_function) (Relation relation, AcquireSampleRowsFunc *func, BlockNumber *totalpages); +typedef bool (*StatisticsAreImportable_function) (Relation relation); + +typedef void (*ImportStatistics_function) (Relation relation, + List *va_cols, + int elevel); + typedef List *(*ImportForeignSchema_function) (ImportForeignSchemaStmt *stmt, Oid serverOid); @@ -255,6 +260,8 @@ typedef struct FdwRoutine /* Support functions for ANALYZE */ AnalyzeForeignTable_function AnalyzeForeignTable; + StatisticsAreImportable_function StatisticsAreImportable; + ImportStatistics_function ImportStatistics; /* Support functions for IMPORT FOREIGN SCHEMA */ ImportForeignSchema_function ImportForeignSchema; diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c index a483424152c..e69288968fc 100644 --- a/src/backend/commands/analyze.c +++ b/src/backend/commands/analyze.c @@ -113,6 +113,8 @@ analyze_rel(Oid relid, RangeVar *relation, int elevel; AcquireSampleRowsFunc acquirefunc = NULL; BlockNumber relpages = 0; + FdwRoutine *fdwroutine = NULL; + bool import_stats = false; /* Select logging level */ if (params.options & VACOPT_VERBOSE) @@ -195,26 +197,33 @@ analyze_rel(Oid relid, RangeVar *relation, else if (onerel->rd_rel->relkind == RELKIND_FOREIGN_TABLE) { /* - * For a foreign table, call the FDW's hook function to see whether it - * supports analysis. + * For a foreign table, call the FDW's hook functions to see whether + * it supports statistics import or analysis. */ - FdwRoutine *fdwroutine; - bool ok = false; fdwroutine = GetFdwRoutineForRelation(onerel, false); - if (fdwroutine->AnalyzeForeignTable != NULL) - ok = fdwroutine->AnalyzeForeignTable(onerel, - &acquirefunc, - &relpages); - - if (!ok) + if (fdwroutine->ImportStatistics != NULL && + fdwroutine->StatisticsAreImportable != NULL && + fdwroutine->StatisticsAreImportable(onerel)) + import_stats = true; + else { - ereport(WARNING, - (errmsg("skipping \"%s\" --- cannot analyze this foreign table", - RelationGetRelationName(onerel)))); - relation_close(onerel, ShareUpdateExclusiveLock); - return; + bool ok = false; + + if (fdwroutine->AnalyzeForeignTable != NULL) + ok = fdwroutine->AnalyzeForeignTable(onerel, + &acquirefunc, + &relpages); + + if (!ok) + { + ereport(WARNING, + errmsg("skipping \"%s\" -- cannot analyze this foreign table.", + RelationGetRelationName(onerel))); + relation_close(onerel, ShareUpdateExclusiveLock); + return; + } } } else if (onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) @@ -248,9 +257,18 @@ analyze_rel(Oid relid, RangeVar *relation, /* * Do the normal non-recursive ANALYZE. We can skip this for partitioned - * tables, which don't contain any rows. + * tables, which don't contain any rows. For foreign tables, if they + * support importing statistics, do that instead of the non-recursive + * ANALYZE. 
*/ - if (onerel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE) + if (import_stats) + { + Assert(onerel->rd_rel->relkind == RELKIND_FOREIGN_TABLE); + Assert(fdwroutine != NULL); + Assert(fdwroutine->ImportStatistics != NULL); + fdwroutine->ImportStatistics(onerel, va_cols, elevel); + } + else if (onerel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE) do_analyze_rel(onerel, params, va_cols, acquirefunc, relpages, false, in_outer_xact, elevel); diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml index 9b032fbf675..05928835355 100644 --- a/doc/src/sgml/postgres-fdw.sgml +++ b/doc/src/sgml/postgres-fdw.sgml @@ -332,7 +332,7 @@ OPTIONS (ADD password_required 'false'); </para> <para> - The following option controls how such an <command>ANALYZE</command> + The following options control how such an <command>ANALYZE</command> operation behaves: </para> @@ -364,6 +364,42 @@ OPTIONS (ADD password_required 'false'); </listitem> </varlistentry> + <varlistentry> + <term><literal>fetch_stats</literal> (<type>boolean</type>)</term> + <listitem> + <para> + This option, which can be specified for a foreign table or a foreign + server, determines whether <command>ANALYZE</command> on a foreign table + will instead attempt to fetch the existing relation and attribute + statistics from the remote table, and if all of the attributes being + analyzed have statistics in the remote table, then it will import + those statistics directly using + <function>pg_restore_relation_stats</function> and + <function>pg_restore_attribute_stats</function>. This option is only + useful if the remote relation is one that can have regular statistics + (tables and materialized views). + The default is <literal>true</literal>. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>remote_analyze</literal> (<type>boolean</type>)</term> + <listitem> + <para> + This option, which can be specified for a foreign table or a foreign + server, determines whether an <command>ANALYZE</command> on a foreign + table will attempt to <command>ANALYZE</command> the remote table if + the first attempt to fetch remote statistics fails, and will then + make a second and final attempt to fetch remote statistics. This option + is only meaningful if the foreign table has + <literal>fetch_stats</literal> enabled at either the server or table + level. + The default is <literal>false</literal>. + </para> + </listitem> + </varlistentry> + </variablelist> </sect3> diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 6066510c7c0..1f7ebeda82c 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -252,6 +252,7 @@ SELECT c3, c4 FROM ft1 ORDER BY c3, c1 LIMIT 1; -- should work again -- Now we should be able to run ANALYZE. -- To exercise multiple code paths, we use local stats on ft1 -- and remote-estimate mode on ft2. +ALTER SERVER loopback OPTIONS (ADD fetch_stats 'false'); ANALYZE ft1; ALTER FOREIGN TABLE ft2 OPTIONS (use_remote_estimate 'true'); -- =================================================================== @@ -4551,7 +4552,8 @@ REINDEX TABLE reind_fdw_parent; -- ok REINDEX TABLE CONCURRENTLY reind_fdw_parent; -- ok DROP TABLE reind_fdw_parent; -- =================================================================== --- conversion error +-- conversion error, will generate a WARNING for imported stats and an +-- error on locally computed stats. 
-- =================================================================== ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 TYPE int; SELECT * FROM ft1 ftx(x1,x2,x3,x4,x5,x6,x7,x8) WHERE x1 = 1; -- ERROR @@ -11462,6 +11464,11 @@ CREATE FOREIGN TABLE async_p2 PARTITION OF async_pt FOR VALUES FROM (2000) TO (3 SERVER loopback2 OPTIONS (table_name 'base_tbl2'); INSERT INTO async_p1 SELECT 1000 + i, i, to_char(i, 'FM0000') FROM generate_series(0, 999, 5) i; INSERT INTO async_p2 SELECT 2000 + i, i, to_char(i, 'FM0000') FROM generate_series(0, 999, 5) i; +-- Will fail because fetch_stats = true (the default) on async_p2 +ANALYZE async_pt; +ERROR: Failed to import statistics from remote table public.base_tbl2, no statistics found. +-- Turn off fetch_stats at the table level. +ALTER FOREIGN TABLE async_p2 OPTIONS (ADD fetch_stats 'false'); ANALYZE async_pt; -- simple queries CREATE TABLE result_tbl (a int, b int, c text); @@ -11568,6 +11575,11 @@ CREATE TABLE base_tbl3 (a int, b int, c text); CREATE FOREIGN TABLE async_p3 PARTITION OF async_pt FOR VALUES FROM (3000) TO (4000) SERVER loopback2 OPTIONS (table_name 'base_tbl3'); INSERT INTO async_p3 SELECT 3000 + i, i, to_char(i, 'FM0000') FROM generate_series(0, 999, 5) i; +-- Will fail because fetch_stats = true (the default) on async_p3/loopback2 +ANALYZE async_pt; +ERROR: Failed to import statistics from remote table public.base_tbl3, no statistics found. +-- Turn off fetch_stats at the server level. +ALTER SERVER loopback2 OPTIONS (ADD fetch_stats 'false'); ANALYZE async_pt; EXPLAIN (VERBOSE, COSTS OFF) INSERT INTO result_tbl SELECT * FROM async_pt WHERE b === 505; @@ -12665,6 +12677,44 @@ ANALYZE analyze_ftable; DROP FOREIGN TABLE analyze_ftable; DROP TABLE analyze_table; -- =================================================================== +-- test remote analyze +-- =================================================================== +CREATE TABLE remote_analyze_table (id int, a text, b bigint); +INSERT INTO remote_analyze_table (SELECT x FROM generate_series(1,1000) x); +CREATE FOREIGN TABLE remote_analyze_ftable (id int, a text, b bigint) + SERVER loopback + OPTIONS (table_name 'remote_analyze_table', + fetch_stats 'true', + remote_analyze 'true'); +-- no stats before +SELECT s.tablename, COUNT(*) AS num_stats +FROM pg_stats AS s +WHERE s.schemaname = 'public' +AND s.tablename IN ('remote_analyze_table', 'remote_analyze_ftable') +GROUP BY s.tablename +ORDER BY s.tablename; + tablename | num_stats +-----------+----------- +(0 rows) + +ANALYZE remote_analyze_ftable; +-- both stats after +SELECT s.tablename, COUNT(*) AS num_stats +FROM pg_stats AS s +WHERE s.schemaname = 'public' +AND s.tablename IN ('remote_analyze_table', 'remote_analyze_ftable') +GROUP BY s.tablename +ORDER BY s.tablename; + tablename | num_stats +-----------------------+----------- + remote_analyze_ftable | 3 + remote_analyze_table | 3 +(2 rows) + +-- cleanup +DROP FOREIGN TABLE remote_analyze_ftable; +DROP TABLE remote_analyze_table; +-- =================================================================== -- test for postgres_fdw_get_connections function with check_conn = true -- =================================================================== -- Disable debug_discard_caches in order to manage remote connections diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c index b0bd72d1e58..2941ecbfb87 100644 --- a/contrib/postgres_fdw/option.c +++ b/contrib/postgres_fdw/option.c @@ -120,6 +120,8 @@ postgres_fdw_validator(PG_FUNCTION_ARGS) strcmp(def->defname, 
"async_capable") == 0 || strcmp(def->defname, "parallel_commit") == 0 || strcmp(def->defname, "parallel_abort") == 0 || + strcmp(def->defname, "fetch_stats") == 0 || + strcmp(def->defname, "remote_analyze") == 0 || strcmp(def->defname, "keep_connections") == 0) { /* these accept only boolean values */ @@ -278,6 +280,14 @@ InitPgFdwOptions(void) {"use_scram_passthrough", ForeignServerRelationId, false}, {"use_scram_passthrough", UserMappingRelationId, false}, + /* fetch_stats is available on both server and table */ + {"fetch_stats", ForeignServerRelationId, false}, + {"fetch_stats", ForeignTableRelationId, false}, + + /* remote_analyze is available on both server and table */ + {"remote_analyze", ForeignServerRelationId, false}, + {"remote_analyze", ForeignTableRelationId, false}, + /* * sslcert and sslkey are in fact libpq options, but we repeat them * here to allow them to appear in both foreign server context (when diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index 3572689e33b..e69cb421523 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -22,6 +22,8 @@ #include "commands/explain_format.h" #include "commands/explain_state.h" #include "executor/execAsync.h" +#include "executor/spi.h" +#include "fmgr.h" #include "foreign/fdwapi.h" #include "funcapi.h" #include "miscadmin.h" @@ -48,6 +50,7 @@ #include "utils/rel.h" #include "utils/sampling.h" #include "utils/selfuncs.h" +#include "utils/syscache.h" PG_MODULE_MAGIC_EXT( .name = "postgres_fdw", @@ -317,6 +320,20 @@ typedef struct List *already_used; /* expressions already dealt with */ } ec_member_foreign_arg; +/* Result sets that are returned from a foreign statistics scan */ +typedef struct +{ + PGresult *rel; + PGresult *att; +} RemoteStatsResults; + +/* Pairs of remote columns with local attnums */ +typedef struct +{ + char remote_attname[NAMEDATALEN]; + AttrNumber local_attnum; +} RemoteAttributeMapping; + /* * SQL functions */ @@ -402,6 +419,10 @@ static void postgresExecForeignTruncate(List *rels, static bool postgresAnalyzeForeignTable(Relation relation, AcquireSampleRowsFunc *func, BlockNumber *totalpages); +static bool postgresStatisticsAreImportable(Relation relation); +static void postgresImportStatistics(Relation relation, + List *va_cols, + int elevel); static List *postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid); static void postgresGetForeignJoinPaths(PlannerInfo *root, @@ -546,6 +567,86 @@ static void merge_fdw_options(PgFdwRelationInfo *fpinfo, const PgFdwRelationInfo *fpinfo_i); static int get_batch_size_option(Relation rel); +/* + * Static queries for querying remote statistics. 
+ */ + +/* All static relstats queries have the same column order */ +enum RelStatsColumns { + RELSTATS_RELKIND = 0, + RELSTATS_RELPAGES, + RELSTATS_RELTUPLES, + RELSTATS_NUM_FIELDS +}; + +/* range stats introduced in v17 */ +static const char *attstats_query_17 = + "SELECT DISTINCT ON (s.attname) attname, s.null_frac, s.avg_width, " + "s.n_distinct, s.most_common_vals, s.most_common_freqs, " + "s.histogram_bounds, s.correlation, s.most_common_elems, " + "s.most_common_elem_freqs, s.elem_count_histogram, " + "s.range_length_histogram, s.range_empty_frac, s.range_bounds_histogram " + "FROM pg_catalog.pg_stats AS s " + "WHERE s.schemaname = $1 AND s.tablename = $2 " + "AND s.attname = ANY($3::text[]) " + "ORDER BY s.attname, s.inherited DESC"; + +/* elements stats introduced in 9.2 */ +static const char *attstats_query_9_2 = + "SELECT DISTINCT ON (s.attname) attname, s.null_frac, s.avg_width, " + "s.n_distinct, s.most_common_vals, s.most_common_freqs, " + "s.histogram_bounds, s.correlation, s.most_common_elems, " + "s.most_common_elem_freqs, s.elem_count_histogram, " + "NULL AS range_length_histogram, NULL AS range_empty_frac, " + "NULL AS range_bounds_histogram " + "FROM pg_catalog.pg_stats AS s " + "WHERE s.schemaname = $1 AND s.tablename = $2 " + "AND s.attname = ANY($3::text[]) " + "ORDER BY s.attname, s.inherited DESC"; + +/* inherited introduced in 9.0 */ +static const char *attstats_query_9_0 = + "SELECT DISTINCT ON (s.attname) attname, s.null_frac, s.avg_width, " + "s.n_distinct, s.most_common_vals, s.most_common_freqs, " + "s.histogram_bounds, s.correlation, NULL AS most_common_elems, " + "NULL AS most_common_elem_freqs, NULL AS elem_count_histogram, " + "NULL AS range_length_histogram, NULL AS range_empty_frac, " + "NULL AS range_bounds_histogram " + "FROM pg_catalog.pg_stats AS s " + "WHERE s.schemaname = $1 AND s.tablename = $2 " + "AND s.attname = ANY($3::text[]) " + "ORDER BY s.attname, s.inherited DESC"; + +static const char *attstats_query_default = + "SELECT s.attname, s.null_frac, s.avg_width, " + "s.n_distinct, s.most_common_vals, s.most_common_freqs, " + "s.histogram_bounds, s.correlation, NULL AS most_common_elems, " + "NULL AS most_common_elem_freqs, NULL AS elem_count_histogram, " + "NULL AS range_length_histogram, NULL AS range_empty_frac, " + "NULL AS range_bounds_histogram " + "FROM pg_catalog.pg_stats AS s " + "WHERE s.schemaname = $1 AND s.tablename = $2 " + "AND s.attname = ANY($3::text[]) " + "ORDER BY s.attname"; + +/* All static attstats queries have the same column order */ +enum AttStatsColumns { + ATTSTATS_ATTNAME = 0, + ATTSTATS_NULL_FRAC, + ATTSTATS_AVG_WIDTH, + ATTSTATS_N_DISTINCT, + ATTSTATS_MOST_COMMON_VALS, + ATTSTATS_MOST_COMMON_FREQS, + ATTSTATS_HISTOGRAM_BOUNDS, + ATTSTATS_CORRELATION, + ATTSTATS_MOST_COMMON_ELEMS, + ATTSTATS_MOST_COMMON_ELEM_FREQS, + ATTSTATS_ELEM_COUNT_HISTOGRAM, + ATTSTATS_RANGE_LENGTH_HISTOGRAM, + ATTSTATS_RANGE_EMPTY_FRAC, + ATTSTATS_RANGE_BOUNDS_HISTOGRAM, + ATTSTATS_NUM_FIELDS +}; /* * Foreign-data wrapper handler function: return a struct with pointers @@ -595,6 +696,8 @@ postgres_fdw_handler(PG_FUNCTION_ARGS) /* Support functions for ANALYZE */ routine->AnalyzeForeignTable = postgresAnalyzeForeignTable; + routine->StatisticsAreImportable = postgresStatisticsAreImportable; + routine->ImportStatistics = postgresImportStatistics; /* Support functions for IMPORT FOREIGN SCHEMA */ routine->ImportForeignSchema = postgresImportForeignSchema; @@ -4935,6 +5038,700 @@ postgresAnalyzeForeignTable(Relation relation, return true; } +/* 
+ * Process optional argument. + * + * Cannot be the first argument in the SQL function call. + * + * It is safe to presume that argname and argtype are quote-safe. + * + * Argument values can potentially be quite large, so free the quoted string + * after use. + */ +static void +append_optional(StringInfo str, PGresult *res, int row, int field, + const char *argname, const char *argtype) +{ + if (!PQgetisnull(res, row, field)) + { + /* Argument values can be quite large, so free after use */ + char *argval_l = quote_literal_cstr(PQgetvalue(res, row, field)); + + appendStringInfo(str, ",\n\t'%s', %s::%s", argname, argval_l, argtype); + + pfree(argval_l); + } +} + + +/* + * Generate a pg_restore_relation_stats command. + */ +static char * +restore_relation_stats_sql(PGresult *res, const char *schemaname, + const char *relname, const int server_version_num) +{ + StringInfoData sql; + + char *schemaname_l = quote_literal_cstr(schemaname); + char *relname_l = quote_literal_cstr(relname); + + initStringInfo(&sql); + appendStringInfo(&sql, "SELECT pg_catalog.pg_restore_relation_stats(\n" + "\t'version', %d::integer,\n" + "\t'schemaname', %s,\n" + "\t'relname', %s", + server_version_num, schemaname_l, relname_l); + + pfree(schemaname_l); + pfree(relname_l); + + append_optional(&sql, res, 0, RELSTATS_RELPAGES, "relpages", "integer"); + append_optional(&sql, res, 0, RELSTATS_RELTUPLES, "reltuples", "real"); + + appendStringInfoChar(&sql, ')'); + + return sql.data; +} + +/* + * Generate a pg_restore_attribute_stats command. + */ +static char * +restore_attribute_stats_sql(PGresult *res, int row, + const char *schemaname, const char *relname, + const AttrNumber attnum, const int server_version_num) +{ + StringInfoData sql; + + char *schemaname_l = quote_literal_cstr(schemaname); + char *relname_l = quote_literal_cstr(relname); + + initStringInfo(&sql); + appendStringInfo(&sql, "SELECT pg_catalog.pg_restore_attribute_stats(\n" + "\t'version', %d::integer,\n" + "\t'schemaname', %s,\n" + "\t'relname', %s,\n" + "\t'attnum', %d::smallint,\n" + "\t'inherited', false::boolean", + server_version_num, schemaname_l, relname_l, attnum); + + pfree(schemaname_l); + pfree(relname_l); + + append_optional(&sql, res, row, ATTSTATS_NULL_FRAC, "null_frac", "real"); + append_optional(&sql, res, row, ATTSTATS_AVG_WIDTH, "avg_width", "integer"); + append_optional(&sql, res, row, ATTSTATS_N_DISTINCT, "n_distinct", "real"); + append_optional(&sql, res, row, ATTSTATS_MOST_COMMON_VALS, "most_common_vals", "text"); + append_optional(&sql, res, row, ATTSTATS_MOST_COMMON_FREQS, "most_common_freqs", "real[]"); + append_optional(&sql, res, row, ATTSTATS_HISTOGRAM_BOUNDS, "histogram_bounds", "text"); + append_optional(&sql, res, row, ATTSTATS_CORRELATION, "correlation", "real"); + append_optional(&sql, res, row, ATTSTATS_MOST_COMMON_ELEMS, "most_common_elems", "text"); + append_optional(&sql, res, row, ATTSTATS_MOST_COMMON_ELEM_FREQS, "most_common_elem_freqs", "real[]"); + append_optional(&sql, res, row, ATTSTATS_ELEM_COUNT_HISTOGRAM, "elem_count_histogram", "real[]"); + append_optional(&sql, res, row, ATTSTATS_RANGE_LENGTH_HISTOGRAM, "range_length_histogram", "text"); + append_optional(&sql, res, row, ATTSTATS_RANGE_EMPTY_FRAC, "range_empty_frac", "real"); + append_optional(&sql, res, row, ATTSTATS_RANGE_BOUNDS_HISTOGRAM, "range_bounds_histogram", "text"); + + appendStringInfoChar(&sql, ')'); + + return sql.data; +} + +/* + * Test if an attribute name is in the list. 
+ * + * An empty list means that all attribute names are in the list. + */ +static bool +attname_in_list(const char *attname, List *va_cols) +{ + ListCell *le; + + if (va_cols == NIL) + return true; + + foreach(le, va_cols) + { + char *col = strVal(lfirst(le)); + + if (strcmp(attname, col) == 0) + return true; + } + return false; +} + +/* + * Import fetched statistics into the local statistics tables. + */ +static void +import_fetched_statistics(const char *schemaname, const char *relname, + int server_version_num, int natts, + const RemoteAttributeMapping *remattrmap, + RemoteStatsResults *remstats) +{ + PGresult *res = remstats->att; + int spirc; + char *relimport_sql; + int mapidx = 0; + + Assert(natts > 0); + + SPI_connect(); + + /* + * Match result set rows to local attnums. + * + * Every row of the result should be an attribute that we specifically + * filtered for, so every row should have at least one match in the + * RemoteAttributeMapping, which is also ordered by attname, so we only + * need to walk that array once. + * + * We import the attribute statistics first, because those are more prone + * to errors. This avoids making a modification of pg_class that will just + * get rolled back by a failed attribute import. + */ + for (int i = 0; i < PQntuples(remstats->att); i++) + { + char *remote_attname; + char *attimport_sql; + bool match_found = false; + int cmp = 0; + + if (PQgetisnull(res, i, ATTSTATS_ATTNAME)) + ereport(ERROR, + errcode(ERRCODE_FDW_SCHEMA_NOT_FOUND), + errmsg("Remote statistics returned a row with NULL attribute name")); + + remote_attname = PQgetvalue(res, i, ATTSTATS_ATTNAME); + + /* + * Handle a series of matching elements in the mapping. + * + * We exit this loop in the following ways: + * + * 1. Having found at least match and exhausting the list. We're done. + * 2. Having found at least match and ending on a <, advance. + * 3. Having found at least match and ending on a >, bad sort. + * 4. No match, list exhausted. + * 5. No match, <. + * 6. No match, >. + * + */ + for (; mapidx < natts; mapidx++) + { + cmp = strcmp(remote_attname, remattrmap[mapidx].remote_attname); + + /* Stop scanning on the first non-match */ + if (cmp != 0) + break; + + match_found = true; + + attimport_sql = restore_attribute_stats_sql(res, i, + schemaname, relname, + remattrmap[mapidx].local_attnum, + server_version_num); + + spirc = SPI_execute(attimport_sql, false, 1); + pfree(attimport_sql); + + /* + * It takes a lot to make a restore command fail outright, so any + * actual failure is a sign that the statistics are seriously + * malformed, and we should give up on importing stats for this + * table. + */ + if (spirc != SPI_OK_SELECT) + ereport(ERROR, + errcode(ERRCODE_FDW_SCHEMA_NOT_FOUND), + errmsg("Attribute statistics import failed %s", + attimport_sql)); + } + + /* We found a match, move onto the next result. */ + if (match_found) + continue; + + /* + * If no match and we ended on a <, then the mapidx will never find a + * match, so stop now and let the cleanup report the error. + */ + if (cmp > 0) + break; + + /* + * Otherwise this result found no matching mapidx, which should only + * happen if the results are out of sort or the wrong query was + * executed. 
+ */ + if (cmp < 0) + ereport(ERROR, + errcode(ERRCODE_FDW_SCHEMA_NOT_FOUND), + errmsg("No remote statistics found for %s of %s.", + quote_identifier(remattrmap[mapidx].remote_attname), + quote_qualified_identifier(schemaname, relname))); + } + + /* Look for a mapidx with no possible matching result */ + if (mapidx < natts) + ereport(ERROR, + errcode(ERRCODE_FDW_SCHEMA_NOT_FOUND), + errmsg("No remote statistics found for %s of %s.", + quote_identifier(remattrmap[mapidx].remote_attname), + quote_qualified_identifier(schemaname, relname))); + + /* + * Import relation stats. + */ + relimport_sql = restore_relation_stats_sql(remstats->rel, schemaname, + relname, server_version_num); + + spirc = SPI_execute(relimport_sql, false, 1); + + /* + * It takes a lot to make a restore command fail outright, so any actual + * failure is a sign that the statistics are seriously malformed, and + * we should give up on importing stats for this table. + */ + if (spirc != SPI_OK_SELECT) + ereport(ERROR, + errcode(ERRCODE_FDW_SCHEMA_NOT_FOUND), + errmsg("Relation statistics import failed: %s.", + relimport_sql)); + + pfree(relimport_sql); + SPI_finish(); +} + +/* + * Analyze a remote table. + */ +static void +analyze_remote_table(PGconn *conn, const char *remote_schemaname, + const char *remote_relname) +{ + StringInfoData buf; + PGresult *res; + + initStringInfo(&buf); + + appendStringInfo(&buf, "ANALYZE %s", + quote_qualified_identifier(remote_schemaname, remote_relname)); + + res = pgfdw_exec_query(conn, buf.data, NULL); + + if (res == NULL || + PQresultStatus(res) != PGRES_COMMAND_OK) + pgfdw_report_error(res, conn, buf.data); + + PQclear(res); + pfree(buf.data); +} + +/* + * Attempt to fetch remote relations stats. + * Verify that the result is of the proper shape. + */ +static PGresult * +fetch_relstats(PGconn *conn, + const char *remote_schemaname, const char *remote_relname) +{ + const char *params[2] = { remote_schemaname, remote_relname }; + + /* + * Before v14, a reltuples value of 0 was ambiguous: it could either mean + * the relation is empty, or it could mean that it hadn't yet been + * vacuumed or analyzed. (Newer versions use -1 for the latter case). + * + * We can ignore this change, because if the remote table wasn't analyzed, + * then it would have no attribute stats, and thus we wouldn't have stats that + * we would try to import. So we can take the reltuples value as-is. + */ + const char *sql = "SELECT c.relkind, c.relpages, c.reltuples " + "FROM pg_catalog.pg_class AS c " + "JOIN pg_catalog.pg_namespace AS n " + "ON n.oid = c.relnamespace " + "WHERE n.nspname = $1 AND c.relname = $2"; + + PGresult *res; + + if (!PQsendQueryParams(conn, sql, 2, NULL, params, NULL, NULL, 0)) + pgfdw_report_error(NULL, conn, sql); + + res = pgfdw_get_result(conn); + + if (res == NULL + || PQresultStatus(res) != PGRES_TUPLES_OK + || PQntuples(res) != 1 + || PQnfields(res) != RELSTATS_NUM_FIELDS + || PQgetisnull(res, 0, RELSTATS_RELKIND)) + pgfdw_report_error(res, conn, sql); + + return res; +} + +/* + * Attempt to fetch remote attribute stats. + * Verify that the result is of the proper shape. + * Note that we do not verify the row count. 
+ */ +static PGresult * +fetch_attstats(PGconn *conn, + const char *remote_schemaname, const char *remote_relname, + const char *column_list, bool is_retry) +{ + const char *params[3] = { remote_schemaname, remote_relname, column_list }; + int version = PQserverVersion(conn); + const char *sql; + PGresult *res; + + if (version >= 170000) + sql = attstats_query_17; + else if (version >= 90200) + sql = attstats_query_9_2; + else if (version >= 90000) + sql = attstats_query_9_0; + else + sql = attstats_query_default; + + if (!PQsendQueryParams(conn, sql, 3, NULL, params, NULL, NULL, 0)) + pgfdw_report_error(NULL, conn, sql); + + res = pgfdw_get_result(conn); + + if (res == NULL + || PQresultStatus(res) != PGRES_TUPLES_OK + || PQnfields(res) != ATTSTATS_NUM_FIELDS) + pgfdw_report_error(res, conn, sql); + + /* Getting nothing on the second try is a failure */ + if (is_retry && PQntuples(res) == 0) + pgfdw_report_error(res, conn, sql); + + return res; +} + +/* + * Attempt to fetch statistics from a remote server. + */ +static void +fetch_remote_statistics(PGconn *conn, + const char *remote_schemaname, + const char *remote_relname, + int server_version_num, int natts, + const RemoteAttributeMapping *remattrmap, + bool remote_analyze, RemoteStatsResults *remstats) +{ + StringInfoData column_list; + + char relkind; + + remstats->rel = fetch_relstats(conn, remote_schemaname, remote_relname); + + /* + * Verify that the remote table is the sort that can have meaningful stats + * in pg_stats. + * + * Note that while relations of kinds RELKIND_INDEX and + * RELKIND_PARTITIONED_INDEX can have rows in pg_stats, they obviously can't + * support a foreign table. + */ + relkind = *PQgetvalue(remstats->rel, 0, RELSTATS_RELKIND); + + switch(relkind) + { + case RELKIND_RELATION: + case RELKIND_PARTITIONED_TABLE: + case RELKIND_FOREIGN_TABLE: + case RELKIND_MATVIEW: + break; + default: + ereport(ERROR, + errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("Remote table %s does not support statistics.", + quote_qualified_identifier(remote_schemaname, + remote_relname))); + return; + } + + initStringInfo(&column_list); + appendStringInfoChar(&column_list, '{'); + for (int i = 0; i < natts; i++) + { + if (i > 0) + appendStringInfoChar(&column_list, ','); + appendStringInfoString(&column_list, + quote_identifier(remattrmap[i].remote_attname)); + } + appendStringInfoChar(&column_list, '}'); + + /* See if it actually has any attribute stats. */ + remstats->att = fetch_attstats(conn, remote_schemaname, remote_relname, + column_list.data, false); + + /* + * If we got attribute statistics results, then we are done with fetching. + */ + if (PQntuples(remstats->att) > 0) + return; + + /* + * Clear off any existing fetched statistics, if any. If the analyze works + * then they we want to fetch the new ones. + */ + PQclear(remstats->att); + PQclear(remstats->rel); + remstats->att = NULL; + remstats->rel = NULL; + + /* + * If remote_analyze is enabled, then we will try to analyze the table and + * then try again. + */ + if (remote_analyze) + analyze_remote_table(conn, remote_schemaname, remote_relname); + else + ereport(ERROR, + errcode(ERRCODE_NO_DATA_FOUND), + errmsg("Failed to import statistics from remote table %s, " + "no statistics found.", + quote_qualified_identifier(remote_schemaname, + remote_relname))); + + /* + * Remote ANALYZE complete, so re-fetch attribute stats query. 
+ */ + remstats->att = fetch_attstats(conn, remote_schemaname, remote_relname, + column_list.data, true); + + /* Re-fetch basic relation stats, as they have been updated. */ + remstats->rel = fetch_relstats(conn, remote_schemaname, remote_relname); + + pfree(column_list.data); +} + +static bool +postgresStatisticsAreImportable(Relation relation) +{ + ForeignTable *table; + ForeignServer *server; + ListCell *lc; + bool fetch_stats = true; + + table = GetForeignTable(RelationGetRelid(relation)); + server = GetForeignServer(table->serverid); + + /* + * Server-level options can be overridden by table-level options, so check + * server-level first. + */ + foreach(lc, server->options) + { + DefElem *def = (DefElem *) lfirst(lc); + + if (strcmp(def->defname, "fetch_stats") == 0) + { + fetch_stats = defGetBoolean(def); + break; + } + } + + foreach(lc, table->options) + { + DefElem *def = (DefElem *) lfirst(lc); + + if (strcmp(def->defname, "fetch_stats") == 0) + { + fetch_stats = defGetBoolean(def); + break; + } + } + + return fetch_stats; +} + +/* + * Compare two RemoteAttributeMappings for sorting. + */ +static int +remattrmap_cmp(const void *v1, const void *v2) +{ + const RemoteAttributeMapping *r1 = v1; + const RemoteAttributeMapping *r2 = v2; + + return strncmp(r1->remote_attname, r2->remote_attname, NAMEDATALEN); +} + +/* + * Fetch attstattarget from a pg_attribute tuple. + */ +static int16 +get_attstattarget(Relation relation, AttrNumber attnum) +{ + HeapTuple atttuple; + bool isnull; + Datum dat; + int16 attstattarget; + Oid relid = RelationGetRelid(relation); + + atttuple = SearchSysCache2(ATTNUM, + ObjectIdGetDatum(relid), + Int16GetDatum(attnum)); + if (!HeapTupleIsValid(atttuple)) + elog(ERROR, "cache lookup failed for attribute %d of relation %u", + attnum, relid); + dat = SysCacheGetAttr(ATTNUM, atttuple, + Anum_pg_attribute_attstattarget, + &isnull); + attstattarget = isnull ? -1 : DatumGetInt16(dat); + ReleaseSysCache(atttuple); + + return attstattarget; +} + +/* + * postgresImportStatistics + * Attempt to fetch remote statistics and apply those instead of analyzing. + */ +static void +postgresImportStatistics(Relation relation, List *va_cols, int elevel) +{ + + ForeignTable *table; + ForeignServer *server; + UserMapping *user; + PGconn *conn; + ListCell *lc; + bool remote_analyze = false; + int server_version_num = 0; + const char *schemaname = NULL; + const char *relname = NULL; + const char *remote_schemaname = NULL; + const char *remote_relname = NULL; + TupleDesc tupdesc = RelationGetDescr(relation); + int natts = 0; + + RemoteAttributeMapping *remattrmap; + + RemoteStatsResults remstats = { .rel = NULL, .att = NULL }; + + table = GetForeignTable(RelationGetRelid(relation)); + server = GetForeignServer(table->serverid); + user = GetUserMapping(GetUserId(), table->serverid); + conn = GetConnection(user, false, NULL); + server_version_num = PQserverVersion(conn); + schemaname = get_namespace_name(RelationGetNamespace(relation)); + relname = RelationGetRelationName(relation); + + /* + * Server-level options can be overridden by table-level options, so check + * server-level first. 
+ */ + foreach(lc, server->options) + { + DefElem *def = (DefElem *) lfirst(lc); + + if (strcmp(def->defname, "remote_analyze") == 0) + remote_analyze = defGetBoolean(def); + } + + foreach(lc, table->options) + { + DefElem *def = (DefElem *) lfirst(lc); + + if (strcmp(def->defname, "schema_name") == 0) + remote_schemaname = defGetString(def); + else if (strcmp(def->defname, "table_name") == 0) + remote_relname = defGetString(def); + else if (strcmp(def->defname, "remote_analyze") == 0) + remote_analyze = defGetBoolean(def); + } + + /* + * Assume the relation/schema names are the same as the local name unless + * the options tell us otherwise. + */ + if (remote_schemaname == NULL) + remote_schemaname = schemaname; + if (remote_relname == NULL) + remote_relname = relname; + + /* + * Build attnum/remote-attname list. + * + */ + remattrmap = palloc_array(RemoteAttributeMapping, tupdesc->natts); + for (int i = 0; i < tupdesc->natts; i++) + { + char *attname; + char *remote_colname; + List *fc_options; + ListCell *fc_lc; + AttrNumber attnum; + + Form_pg_attribute attr = TupleDescAttr(tupdesc, i); + + /* Ignore dropped columns. */ + if (attr->attisdropped) + continue; + + /* Ignore generated columns. */ + if (attr->attgenerated) + continue; + + attname = NameStr(attr->attname); + + /* If a list is specified, exclude any attnames not in it. */ + if (!attname_in_list(attname, va_cols)) + continue; + + attnum = attr->attnum; + + /* Ignore if attstatarget is 0 */ + if (get_attstattarget(relation,attnum) == 0) + continue; + + /* If column_name is not specified, go with attname. */ + remote_colname = attname; + fc_options = GetForeignColumnOptions(RelationGetRelid(relation), attnum); + + foreach(fc_lc, fc_options) + { + DefElem *def = (DefElem *) lfirst(fc_lc); + + if (strcmp(def->defname, "column_name") == 0) + { + remote_colname = defGetString(def); + break; + } + } + + remattrmap[natts].local_attnum = attnum; + strncpy(remattrmap[natts].remote_attname, remote_colname, NAMEDATALEN); + natts++; + } + + /* Sort mapping by remote attribute name */ + qsort(remattrmap, natts, sizeof(RemoteAttributeMapping), remattrmap_cmp); + + fetch_remote_statistics(conn, remote_schemaname, remote_relname, + server_version_num, natts, remattrmap, + remote_analyze, &remstats); + + ReleaseConnection(conn); + + Assert(remstats.rel != NULL); + Assert(remstats.att != NULL); + import_fetched_statistics(schemaname, relname, server_version_num, natts, + remattrmap, &remstats); + + pfree(remattrmap); + PQclear(remstats.att); + PQclear(remstats.rel); +} + + /* * postgresGetAnalyzeInfoForForeignTable * Count tuples in foreign table (just get pg_class.reltuples). diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql index 4f7ab2ed0ac..c527a1fbb15 100644 --- a/contrib/postgres_fdw/sql/postgres_fdw.sql +++ b/contrib/postgres_fdw/sql/postgres_fdw.sql @@ -241,6 +241,7 @@ SELECT c3, c4 FROM ft1 ORDER BY c3, c1 LIMIT 1; -- should work again -- Now we should be able to run ANALYZE. -- To exercise multiple code paths, we use local stats on ft1 -- and remote-estimate mode on ft2. 
+ALTER SERVER loopback OPTIONS (ADD fetch_stats 'false'); ANALYZE ft1; ALTER FOREIGN TABLE ft2 OPTIONS (use_remote_estimate 'true'); @@ -1286,7 +1287,8 @@ REINDEX TABLE CONCURRENTLY reind_fdw_parent; -- ok DROP TABLE reind_fdw_parent; -- =================================================================== --- conversion error +-- conversion error, will generate a WARNING for imported stats and an +-- error on locally computed stats. -- =================================================================== ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 TYPE int; SELECT * FROM ft1 ftx(x1,x2,x3,x4,x5,x6,x7,x8) WHERE x1 = 1; -- ERROR @@ -3903,6 +3905,10 @@ CREATE FOREIGN TABLE async_p2 PARTITION OF async_pt FOR VALUES FROM (2000) TO (3 SERVER loopback2 OPTIONS (table_name 'base_tbl2'); INSERT INTO async_p1 SELECT 1000 + i, i, to_char(i, 'FM0000') FROM generate_series(0, 999, 5) i; INSERT INTO async_p2 SELECT 2000 + i, i, to_char(i, 'FM0000') FROM generate_series(0, 999, 5) i; +-- Will fail because fetch_stats = true (the default) on async_p2 +ANALYZE async_pt; +-- Turn off fetch_stats at the table level. +ALTER FOREIGN TABLE async_p2 OPTIONS (ADD fetch_stats 'false'); ANALYZE async_pt; -- simple queries @@ -3940,6 +3946,10 @@ CREATE TABLE base_tbl3 (a int, b int, c text); CREATE FOREIGN TABLE async_p3 PARTITION OF async_pt FOR VALUES FROM (3000) TO (4000) SERVER loopback2 OPTIONS (table_name 'base_tbl3'); INSERT INTO async_p3 SELECT 3000 + i, i, to_char(i, 'FM0000') FROM generate_series(0, 999, 5) i; +-- Will fail because fetch_stats = true (the default) on async_p3/loopback2 +ANALYZE async_pt; +-- Turn off fetch_stats at the server level. +ALTER SERVER loopback2 OPTIONS (ADD fetch_stats 'false'); ANALYZE async_pt; EXPLAIN (VERBOSE, COSTS OFF) @@ -4387,6 +4397,40 @@ ANALYZE analyze_ftable; DROP FOREIGN TABLE analyze_ftable; DROP TABLE analyze_table; +-- =================================================================== +-- test remote analyze +-- =================================================================== +CREATE TABLE remote_analyze_table (id int, a text, b bigint); +INSERT INTO remote_analyze_table (SELECT x FROM generate_series(1,1000) x); + +CREATE FOREIGN TABLE remote_analyze_ftable (id int, a text, b bigint) + SERVER loopback + OPTIONS (table_name 'remote_analyze_table', + fetch_stats 'true', + remote_analyze 'true'); + +-- no stats before +SELECT s.tablename, COUNT(*) AS num_stats +FROM pg_stats AS s +WHERE s.schemaname = 'public' +AND s.tablename IN ('remote_analyze_table', 'remote_analyze_ftable') +GROUP BY s.tablename +ORDER BY s.tablename; + +ANALYZE remote_analyze_ftable; + +-- both stats after +SELECT s.tablename, COUNT(*) AS num_stats +FROM pg_stats AS s +WHERE s.schemaname = 'public' +AND s.tablename IN ('remote_analyze_table', 'remote_analyze_ftable') +GROUP BY s.tablename +ORDER BY s.tablename; + +-- cleanup +DROP FOREIGN TABLE remote_analyze_ftable; +DROP TABLE remote_analyze_table; + -- =================================================================== -- test for postgres_fdw_get_connections function with check_conn = true -- =================================================================== base-commit: a2e632ece1691be18771644182f769b525991f97 -- 2.52.0
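P.S. For anyone who wants to kick the tires without wading through the whole
diff, usage looks roughly like this. The server and table names below are
made up, and since fetch_stats already defaults to true in this version, the
first ALTER is only there for clarity:

    -- Hypothetical server/table names.
    ALTER SERVER remote_pg OPTIONS (ADD fetch_stats 'true');
    ALTER FOREIGN TABLE orders_ft OPTIONS (ADD remote_analyze 'true');
    -- Imports the remote table's statistics instead of sampling; with
    -- remote_analyze enabled, the remote table is ANALYZEd first if it
    -- has no attribute statistics yet.
    ANALYZE orders_ft;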
