Hi all,

On Fri, Feb 13, 2026 at 11:51 PM Corey Huinker <[email protected]>
wrote:

> A few notes:
>
> - attnum variables can be of type AttrNumber
> - The SearchSysCache3 lookup can't assume inh = false. Partitioned tables
> will ONLY have inh = true. Inheritance parents will have both.
> - Same inh concerns for extended stats.
>
Done.

I am attaching v5 of the MISSING_STATS_ONLY patch.

The main changes are in how we detect existing statistics for inherited and
partitioned tables.

Previously, the column stats check assumed stainherit = false in the
syscache lookup. As Corey pointed out, that assumption is incorrect:

Partitioned tables only have stainherit = true.

Inheritance parents can have both true and false entries.

In this version, the column stats check now tries both cases (stainherit =
false and stainherit = true) before concluding that stats are missing. A
column is considered missing stats only if neither entry exists.

The same logic has been applied to extended statistics as well.

Other than this correction, the overall semantics of ANALYZE
(MISSING_STATS_ONLY) remain unchanged:

Relations with at least one analyzable attribute or extended stats object
lacking statistics are analyzed.

Relations with complete statistics are skipped.

Repeated runs converge toward a no-op once stats are fully populated.

Regression tests were updated and re-run after this change.

Please let me know if the inheritance handling now matches expectations, or
if there are additional cases I should consider.

As discussed earlier, I will start a separate thread for the modified-stats
behavior once this direction is settled.

Thanks again for the review and guidance.

Regards,
Vasuki M
C-DAC,Chennai
From 673bce4f84972c548bbf07551ffe0f2e15e31e26 Mon Sep 17 00:00:00 2001
From: Vasuki M <[email protected]>
Date: Fri, 13 Feb 2026 12:09:44 +0530
Subject: [PATCH v5] ANALYZE: add MISSING_STATS_ONLY option

Introduce a new ANALYZE option, MISSING_STATS_ONLY, which limits
ANALYZE to relations that are missing statistics.

When specified, ANALYZE will process only those relations that
have at least one analyzable attribute or extended statistics object
without corresponding entries in pg_statistic or pg_statistic_ext_data.

Relations that already have complete statistics are skipped.

This provides SQL-level functionality similar in spirit to
vacuumdb --missing-stats-only, allowing users to reduce unnecessary
work during repeated manual ANALYZE runs across many relations.

The behavior is conservative:
- Tables that have never been analyzed are processed.
- Tables that lost statistics (e.g., due to crash recovery) are processed.
- Tables with newly added columns lacking statistics are processed.
- Tables with complete statistics are skipped.

Default ANALYZE behavior is unchanged.

Regression tests are included.
---
 src/backend/commands/analyze.c                | 143 +++++++++++++++++-
 src/backend/commands/vacuum.c                 |   7 +-
 src/include/commands/vacuum.h                 |   2 +
 .../expected/analyze_missing_stats_only.out   |  54 +++++++
 src/test/regress/parallel_schedule            |   1 +
 .../sql/analyze_missing_stats_only.sql        |  47 ++++++
 6 files changed, 252 insertions(+), 2 deletions(-)
 create mode 100644 src/test/regress/expected/analyze_missing_stats_only.out
 create mode 100644 src/test/regress/sql/analyze_missing_stats_only.sql

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index a4834241..71d53877 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -29,6 +29,8 @@
 #include "catalog/index.h"
 #include "catalog/indexing.h"
 #include "catalog/pg_inherits.h"
+#include "catalog/pg_statistic_ext.h"
+#include "catalog/pg_statistic_ext_data.h"
 #include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
@@ -46,6 +48,7 @@
 #include "storage/procarray.h"
 #include "utils/attoptcache.h"
 #include "utils/datum.h"
+#include "utils/fmgroids.h"
 #include "utils/guc.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -55,7 +58,6 @@
 #include "utils/syscache.h"
 #include "utils/timestamp.h"
 
-
 /* Per-index data for ANALYZE */
 typedef struct AnlIndexData
 {
@@ -96,6 +98,118 @@ static void update_attstats(Oid relid, bool inh,
 static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
 static Datum ind_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
 
+static bool
+relation_has_missing_column_stats(Relation rel)
+{
+	AttrNumber attnum;
+
+	for (attnum = 1; attnum <= rel->rd_att->natts; attnum++)
+	{
+		VacAttrStats *stats;
+		HeapTuple	statstup;
+		bool		has_stats = false;
+
+		/*
+		 * Step 1: Is this attribute analyzable at all?
+		 * (skips dropped columns, system cols, attstattarget=0, etc.)
+		 */
+		stats = examine_attribute(rel, attnum, NULL);
+		if (stats == NULL)
+			continue;
+
+		/*
+		 * Step 2: Try lookup with stainherit = false
+		 */
+		statstup = SearchSysCache3(STATRELATTINH,
+						ObjectIdGetDatum(RelationGetRelid(rel)),
+						Int16GetDatum(attnum),
+						BoolGetDatum(false));
+
+		if (HeapTupleIsValid(statstup))
+		{
+			has_stats = true;
+			ReleaseSysCache(statstup);
+		}
+		else
+		{
+			/*
+			 * Step 3: Try lookup with stainherit = true
+			 * (needed for partitioned tables / inheritance parents)
+			 */
+			statstup = SearchSysCache3(STATRELATTINH,
+							ObjectIdGetDatum(RelationGetRelid(rel)),
+							Int16GetDatum(attnum),
+							BoolGetDatum(true));
+
+			if (HeapTupleIsValid(statstup))
+			{
+				has_stats = true;
+				ReleaseSysCache(statstup);
+			}
+		}
+
+		/*
+		 * If neither lookup found stats → this column is missing stats
+		 */
+		if (!has_stats)
+			return true;
+	}
+
+	/*
+	 * All analyzable columns have stats
+	 */
+	return false;
+}
+
+
+static bool
+relation_has_missing_extended_stats(Relation rel)
+{
+	Relation extrel;
+	SysScanDesc scan;
+	ScanKeyData key;
+	HeapTuple tup;
+
+	extrel = table_open(StatisticExtRelationId, AccessShareLock);
+
+	ScanKeyInit(&key,
+			Anum_pg_statistic_ext_stxrelid,
+			BTEqualStrategyNumber,
+			F_OIDEQ,
+			ObjectIdGetDatum(RelationGetRelid(rel)));
+
+	scan = systable_beginscan(extrel,
+					StatisticExtRelidIndexId,
+					true,
+					NULL,
+					1,
+					&key);
+
+	while (HeapTupleIsValid(tup = systable_getnext(scan)))
+	{
+		Form_pg_statistic_ext e =
+			(Form_pg_statistic_ext) GETSTRUCT(tup);
+
+		HeapTuple dtup =
+			SearchSysCache2(STATEXTDATASTXOID,
+						ObjectIdGetDatum(e->oid),
+						BoolGetDatum(false));
+
+		if (!HeapTupleIsValid(dtup))
+		{
+			systable_endscan(scan);
+			table_close(extrel, AccessShareLock);
+			return true;
+		}
+
+		ReleaseSysCache(dtup);
+	}
+
+	systable_endscan(scan);
+	table_close(extrel, AccessShareLock);
+
+	return false;
+}
 
 /*
  *	analyze_rel() -- analyze one relation
@@ -234,6 +348,29 @@ analyze_rel(Oid relid, RangeVar *relation,
 		return;
 	}
 
+	/*
+	 * ANALYZE (MISSING_STATS_ONLY):
+	 * Skip relation if it has no missing column or extended statistics.
+	 */
+	if (params.options & VACOPT_MISSING_STATS_ONLY)
+	{
+		if (!relation_has_missing_column_stats(onerel) &&
+			!relation_has_missing_extended_stats(onerel))
+		{
+			elog(DEBUG1,
+				"ANALYZE (MISSING_STATS_ONLY): skipping relation \"%s\"",
+				RelationGetRelationName(onerel));
+
+			table_close(onerel, ShareUpdateExclusiveLock);
+			return;
+		}
+
+		elog(DEBUG1,
+			"ANALYZE (MISSING_STATS_ONLY): relation eligible \"%s\"",
+			RelationGetRelationName(onerel));
+	}
+
+
 	/*
 	 * OK, let's do it.  First, initialize progress reporting.
 	 */
@@ -314,6 +451,10 @@ do_analyze_rel(Relation onerel, const VacuumParams params,
 	PgStat_Counter startreadtime = 0;
 	PgStat_Counter startwritetime = 0;
 
+	elog(DEBUG1, "ANALYZE processing relation \"%s\" (OID %u)",
+			RelationGetRelationName(onerel),
+			RelationGetRelid(onerel));
+
 	verbose = (params.options & VACOPT_VERBOSE) != 0;
 	instrument = (verbose || (AmAutoVacuumWorkerProcess() &&
 							  params.log_analyze_min_duration >= 0));
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 03932f45..20981a9d 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -165,6 +165,7 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 	BufferAccessStrategy bstrategy = NULL;
 	bool		verbose = false;
 	bool		skip_locked = false;
+	bool		missing_stats_only = false;
 	bool		analyze = false;
 	bool		freeze = false;
 	bool		full = false;
@@ -229,6 +230,9 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 
 			ring_size = result;
 		}
+		else if (strcmp(opt->defname, "missing_stats_only") == 0)
+			missing_stats_only = defGetBoolean(opt);
+
 		else if (!vacstmt->is_vacuumcmd)
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -305,6 +309,7 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 		(vacstmt->is_vacuumcmd ? VACOPT_VACUUM : VACOPT_ANALYZE) |
 		(verbose ? VACOPT_VERBOSE : 0) |
 		(skip_locked ? VACOPT_SKIP_LOCKED : 0) |
+		(missing_stats_only ? VACOPT_MISSING_STATS_ONLY :0) |
 		(analyze ? VACOPT_ANALYZE : 0) |
 		(freeze ? VACOPT_FREEZE : 0) |
 		(full ? VACOPT_FULL : 0) |
@@ -315,7 +320,7 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 		(only_database_stats ? VACOPT_ONLY_DATABASE_STATS : 0);
 
 	/* sanity checks on options */
-	Assert(params.options & (VACOPT_VACUUM | VACOPT_ANALYZE));
+	Assert(params.options & (VACOPT_VACUUM | VACOPT_ANALYZE | VACOPT_MISSING_STATS_ONLY));
 	Assert((params.options & VACOPT_VACUUM) ||
 		   !(params.options & (VACOPT_FULL | VACOPT_FREEZE)));
 
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index e885a4b9..63e9d437 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -188,6 +188,8 @@ typedef struct VacAttrStats
 #define VACOPT_DISABLE_PAGE_SKIPPING 0x100	/* don't skip any pages */
 #define VACOPT_SKIP_DATABASE_STATS 0x200	/* skip vac_update_datfrozenxid() */
 #define VACOPT_ONLY_DATABASE_STATS 0x400	/* only vac_update_datfrozenxid() */
+#define VACOPT_MISSING_STATS_ONLY	0x800	/* ANALYZE if stats are missing */
+
 
 /*
  * Values used by index_cleanup and truncate params.
diff --git a/src/test/regress/expected/analyze_missing_stats_only.out b/src/test/regress/expected/analyze_missing_stats_only.out
new file mode 100644
index 00000000..9d33cd70
--- /dev/null
+++ b/src/test/regress/expected/analyze_missing_stats_only.out
@@ -0,0 +1,54 @@
+--
+-- ANALYZE (MISSING_STATS_ONLY) regression test
+--
+-- 1. Brand new empty table
+CREATE TABLE ms1 (a int);
+-- No statistics yet → should analyze
+ANALYZE (MISSING_STATS_ONLY);
+-- pg_statistic should still be empty (table has no rows)
+SELECT count(*) AS stat_rows
+FROM pg_statistic
+WHERE starelid = 'ms1'::regclass;
+ stat_rows 
+-----------
+         0
+(1 row)
+
+-- 2. Insert data
+INSERT INTO ms1 SELECT generate_series(1,10);
+-- Stats missing → should analyze
+ANALYZE (MISSING_STATS_ONLY);
+-- Now stats must exist
+SELECT count(*) AS stat_rows
+FROM pg_statistic
+WHERE starelid = 'ms1'::regclass;
+ stat_rows 
+-----------
+         1
+(1 row)
+
+-- 3. Re-run → stats exist → should skip
+ANALYZE (MISSING_STATS_ONLY);
+-- Stats count should remain the same
+SELECT count(*) AS stat_rows
+FROM pg_statistic
+WHERE starelid = 'ms1'::regclass;
+ stat_rows 
+-----------
+         1
+(1 row)
+
+-- 4. Add new column → missing stats for column b
+ALTER TABLE ms1 ADD COLUMN b int;
+-- Should analyze again
+ANALYZE (MISSING_STATS_ONLY);
+-- Now both columns should have stats
+SELECT count(*) AS stat_rows
+FROM pg_statistic
+WHERE starelid = 'ms1'::regclass;
+ stat_rows 
+-----------
+         2
+(1 row)
+
+DROP TABLE ms1;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 549e9b2d..a64be382 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -90,6 +90,7 @@ test: rules psql psql_crosstab psql_pipeline amutils stats_ext collate.linux.utf
 test: select_parallel
 test: write_parallel
 test: vacuum_parallel
+test: analyze_missing_stats_only
 
 # Run this alone, because concurrent DROP TABLE would make non-superuser
 # "ANALYZE;" fail with "relation with OID $n does not exist".
diff --git a/src/test/regress/sql/analyze_missing_stats_only.sql b/src/test/regress/sql/analyze_missing_stats_only.sql
new file mode 100644
index 00000000..0a6ed73c
--- /dev/null
+++ b/src/test/regress/sql/analyze_missing_stats_only.sql
@@ -0,0 +1,47 @@
+--
+-- ANALYZE (MISSING_STATS_ONLY) regression test
+--
+
+-- 1. Brand new empty table
+CREATE TABLE ms1 (a int);
+
+-- No statistics yet → should analyze
+ANALYZE (MISSING_STATS_ONLY);
+
+-- pg_statistic should still be empty (table has no rows)
+SELECT count(*) AS stat_rows
+FROM pg_statistic
+WHERE starelid = 'ms1'::regclass;
+
+-- 2. Insert data
+INSERT INTO ms1 SELECT generate_series(1,10);
+
+-- Stats missing → should analyze
+ANALYZE (MISSING_STATS_ONLY);
+
+-- Now stats must exist
+SELECT count(*) AS stat_rows
+FROM pg_statistic
+WHERE starelid = 'ms1'::regclass;
+
+-- 3. Re-run → stats exist → should skip
+ANALYZE (MISSING_STATS_ONLY);
+
+-- Stats count should remain the same
+SELECT count(*) AS stat_rows
+FROM pg_statistic
+WHERE starelid = 'ms1'::regclass;
+
+-- 4. Add new column → missing stats for column b
+ALTER TABLE ms1 ADD COLUMN b int;
+
+-- Should analyze again
+ANALYZE (MISSING_STATS_ONLY);
+
+-- Now both columns should have stats
+SELECT count(*) AS stat_rows
+FROM pg_statistic
+WHERE starelid = 'ms1'::regclass;
+
+DROP TABLE ms1;
+
-- 
2.43.0

Reply via email to