> I am attaching v5 of the MISSING_STATS_ONLY patch.

Thanks for the patch. Here are some comments I have:

1/
+relation_has_missing_column_stats(Relation rel)
+{
+       AttrNumber attnum;
+
+       for (attnum = 1; attnum <= rel->rd_att->natts; attnum++)
+       {
+               VacAttrStats *stats;
+               HeapTuple       statstup;
+               bool            has_stats = false;
+
+               /*
+                * Step 1: Is this attribute analyzable at all?
+                * (skips dropped columns, system cols, attstattarget=0, etc.)
+                */
+               stats = examine_attribute(rel, attnum, NULL);

- and -

+       /*
+        * ANALYZE (MISSING_STATS_ONLY):
+        * Skip relation if it has no missing column or extended statistics.
+        */
+       if (params.options & VACOPT_MISSING_STATS_ONLY)
+       {
+               if (!relation_has_missing_column_stats(onerel) &&
+                       !relation_has_missing_extended_stats(onerel))
+               {
+                       elog(DEBUG1,
+                               "ANALYZE (MISSING_STATS_ONLY):
skipping relation \"%s\"",
+                               RelationGetRelationName(onerel));
+


This approach seems off. relation_has_missing_column_stats() and
elation_has_missing_extended_stats()
are being called early, and relation_has_missing_column_stats() a also
calls examine_attribute() for all attributes.
Later on, examine_attribute() is called again for the required
attributes, so it ends up running twice.

I think the natural thing to do here is to make
relation_has_missing_column_stats() take in
an attnum to perform the lookup in the stats catalog on a specific
attribute. This call should occur inside
examine_attribute().
examine_attribute() can also take in a boolean, missing_stats_only, so
we only look up
the stats catalog only when the missing_stats_only option is
specified. Columns that already
have stats simply return NULL, causing them to be skipped naturally.

With that, inside do_analyze_rel(), we can then check if attr_cnt = 0,
the number of expr_indexes = 0
and we are not missing extended stats, we can just skip the relation
from analyze.

Something like this:

```
if (params.options & VACOPT_MISSING_STATS_ONLY &&
    (attr_cnt == 0 && expr_cnt == 0 &&
!relation_has_missing_extended_stats(onerel)))
{
    vac_close_indexes(nindexes, Irel, NoLock);
    return;
}
```

2/
I think we need more test coverage, inheritance partitions,
declarative partitions,
index on expressions, ANALYZE with a column specified, etc.

3/

+                       elog(DEBUG1,
+                               "ANALYZE (MISSING_STATS_ONLY):
skipping relation \"%s\"",
+                               RelationGetRelationName(onerel));

I think logging should be at info level, and we should only log when
skipping a relation

The following seems unnecessary.

+               elog(DEBUG1,
+                       "ANALYZE (MISSING_STATS_ONLY): relation
eligible \"%s\"",
+                       RelationGetRelationName(onerel));
+       }
+
+
        /*
         * OK, let's do it.  First, initialize progress reporting.
         */
@@ -314,6 +451,10 @@ do_analyze_rel(Relation onerel, const VacuumParams params,
        PgStat_Counter startreadtime = 0;
        PgStat_Counter startwritetime = 0;

+       elog(DEBUG1, "ANALYZE processing relation \"%s\" (OID %u)",
+                       RelationGetRelationName(onerel),
+                       RelationGetRelid(onerel));

Also, I think it's better to use ereport like is being done for the other
ANALYZE logging, i.e.

```
                        appendStringInfo(&buf,
                                                         _("WAL usage:
%" PRId64 " records, %" PRId64 " full page images, %" PRIu64 " bytes,
%" PRIu64 " full page image bytes, %" PRId64 " buffers full\n"),
                                                         walusage.wal_records,
                                                         walusage.wal_fpi,
                                                         walusage.wal_bytes,
                                                         walusage.wal_fpi_bytes,

walusage.wal_buffers_full);
                        appendStringInfo(&buf, _("system usage: %s"),
pg_rusage_show(&ru0));

                        ereport(verbose ? INFO : LOG,
                                        (errmsg_internal("%s", buf.data)));
```

4/
Make sure to run pgindent



> And while I think you might be able to
> argue that MODIFIED_STATS should also include MISSING_STATS (I do
> wonder though, does autoanalyze do that?)

No, autoanalyze decisions are driven purely off of thresholds.

--
Sami Imseih
Amazon Web Services (AWS)


Reply via email to