> I am attaching v5 of the MISSING_STATS_ONLY patch.
Thanks for the patch. Here are some comments I have:
1/
+relation_has_missing_column_stats(Relation rel)
+{
+ AttrNumber attnum;
+
+ for (attnum = 1; attnum <= rel->rd_att->natts; attnum++)
+ {
+ VacAttrStats *stats;
+ HeapTuple statstup;
+ bool has_stats = false;
+
+ /*
+ * Step 1: Is this attribute analyzable at all?
+ * (skips dropped columns, system cols, attstattarget=0, etc.)
+ */
+ stats = examine_attribute(rel, attnum, NULL);
- and -
+ /*
+ * ANALYZE (MISSING_STATS_ONLY):
+ * Skip relation if it has no missing column or extended statistics.
+ */
+ if (params.options & VACOPT_MISSING_STATS_ONLY)
+ {
+ if (!relation_has_missing_column_stats(onerel) &&
+ !relation_has_missing_extended_stats(onerel))
+ {
+ elog(DEBUG1,
+ "ANALYZE (MISSING_STATS_ONLY):
skipping relation \"%s\"",
+ RelationGetRelationName(onerel));
+
This approach seems off. relation_has_missing_column_stats() and
elation_has_missing_extended_stats()
are being called early, and relation_has_missing_column_stats() a also
calls examine_attribute() for all attributes.
Later on, examine_attribute() is called again for the required
attributes, so it ends up running twice.
I think the natural thing to do here is to make
relation_has_missing_column_stats() take in
an attnum to perform the lookup in the stats catalog on a specific
attribute. This call should occur inside
examine_attribute().
examine_attribute() can also take in a boolean, missing_stats_only, so
we only look up
the stats catalog only when the missing_stats_only option is
specified. Columns that already
have stats simply return NULL, causing them to be skipped naturally.
With that, inside do_analyze_rel(), we can then check if attr_cnt = 0,
the number of expr_indexes = 0
and we are not missing extended stats, we can just skip the relation
from analyze.
Something like this:
```
if (params.options & VACOPT_MISSING_STATS_ONLY &&
(attr_cnt == 0 && expr_cnt == 0 &&
!relation_has_missing_extended_stats(onerel)))
{
vac_close_indexes(nindexes, Irel, NoLock);
return;
}
```
2/
I think we need more test coverage, inheritance partitions,
declarative partitions,
index on expressions, ANALYZE with a column specified, etc.
3/
+ elog(DEBUG1,
+ "ANALYZE (MISSING_STATS_ONLY):
skipping relation \"%s\"",
+ RelationGetRelationName(onerel));
I think logging should be at info level, and we should only log when
skipping a relation
The following seems unnecessary.
+ elog(DEBUG1,
+ "ANALYZE (MISSING_STATS_ONLY): relation
eligible \"%s\"",
+ RelationGetRelationName(onerel));
+ }
+
+
/*
* OK, let's do it. First, initialize progress reporting.
*/
@@ -314,6 +451,10 @@ do_analyze_rel(Relation onerel, const VacuumParams params,
PgStat_Counter startreadtime = 0;
PgStat_Counter startwritetime = 0;
+ elog(DEBUG1, "ANALYZE processing relation \"%s\" (OID %u)",
+ RelationGetRelationName(onerel),
+ RelationGetRelid(onerel));
Also, I think it's better to use ereport like is being done for the other
ANALYZE logging, i.e.
```
appendStringInfo(&buf,
_("WAL usage:
%" PRId64 " records, %" PRId64 " full page images, %" PRIu64 " bytes,
%" PRIu64 " full page image bytes, %" PRId64 " buffers full\n"),
walusage.wal_records,
walusage.wal_fpi,
walusage.wal_bytes,
walusage.wal_fpi_bytes,
walusage.wal_buffers_full);
appendStringInfo(&buf, _("system usage: %s"),
pg_rusage_show(&ru0));
ereport(verbose ? INFO : LOG,
(errmsg_internal("%s", buf.data)));
```
4/
Make sure to run pgindent
> And while I think you might be able to
> argue that MODIFIED_STATS should also include MISSING_STATS (I do
> wonder though, does autoanalyze do that?)
No, autoanalyze decisions are driven purely off of thresholds.
--
Sami Imseih
Amazon Web Services (AWS)