> > > Relation to vacuumdb --missing-stats-only
> > > I agree this is related but slightly different in intent. 
> > > --missing-stats-only
> > > answers “does this table have any statistics at all?”, while SMART ANALYZE
> > > answers “has this table changed since the last statistics collection?”. 
> > > Both seem
> > > useful, but they target different use cases. I see SMART ANALYZE primarily
> > > as a performance optimization for repeated manual ANALYZE runs on 
> > > mostly-static schemas.
> >
> > SMART ANALYZE is trying to answer 2 questions "which table does not
> > have any statistics at all"
> > and "has this table changed since the last statistics collection?”, right?
> >
> > So, maybe they need to be 2 separate options.
> >
> > > Although as sami said this SMART is not smart enough as it should be ,
> > > I will change name accordingly in the further patches
> >
> > Yup, I am not too fond of SMART in the name. Also, then name itself
> > is vague. SKIP_LOCKED and BUFFER_USAGE_LIMIT on the other
> > hand tell you exactly what they[re used for.
> >
>
> So, tossing out a new proposal here, which is to offer ANALYZE with 2
> new options... MISSING_STATS and MODIFIED_STATS.

Yes, that is what I am thinking as well.

> When MISSING_STATS is passed, we attempt to analyze only tables that
> have missing stats, essentially implementing a version of
> --missing-stats-only but for the ANALYZE command. In successive runs,
> this should reduce towards a no-op, although we need to decide what to
> do about system tables, which, iirc --missing-stats-only always
> assumes to be true, but this version probably doesn't want to assume
> that.

>From a quick test, I don't see system tables being treated different

```
postgres=# select max(last_analyze), max(last_autoanalyze) from
pg_stat_all_tables where relname = 'pg_class';
 max | max
-----+-----
     |
(1 row)

postgres=# delete from pg_statistic;
DELETE 417
postgres=# \! vacuumdb --analyze-only --missing-stats-only postgres
vacuumdb: vacuuming database "postgres"
postgres=# select max(last_analyze), max(last_autoanalyze) from
pg_stat_all_tables where relname = 'pg_class';
              max              | max
-------------------------------+-----
 2026-01-25 16:31:42.839329+00 |
(1 row)

postgres=# \! vacuumdb --analyze-only --missing-stats-only postgres
vacuumdb: vacuuming database "postgres"
postgres=#
postgres=# select max(last_analyze), max(last_autoanalyze) from
pg_stat_all_tables where relname = 'pg_class';
              max              | max
-------------------------------+-----
 2026-01-25 16:31:42.839329+00 |
(1 row)
```

Tables that remain empty, will always be analyzed since they will
always have "missing stats".
For example pg_sequence, if a sequence is never created. The same
applies for normal
user tables.

> When MODIFIED_STATS is passed, we would instead only analyze tables
> where some threshold of rows has been modified. I feel like the most
> obvious choice for this calculation would be based on a formula like
> "analyze threshold = analyze base threshold + analyze scale factor *
> number of tuples". Astute observers will note that this is the same
> threshold used by autoanalyze, which means if you had the same
> defaults you are just doing the work manually that autoanalyze would
> eventually get around to doing (which seems potentially useful on its
> own).

Yes, we would want to use the same calculation as autoanalyze.

> But also if these were based on gucs, the OP could modify those
> gucs to achieve their desired behavior, ie.
> set analyze_base_threshold=1; set analyze_scale_factor=0; analyze
> (modified_stats);  // this should analyze anything with 1 modified row
> Granted, I don't like that it is both more wordy than the original
> idea, and that we would need to add new gucs, but this would be pretty
> flexible.

We can either allow the threshold and scale_factor be an argument to
the option; but I do really think the GUC approach is much better.

Not in scope, but I can even see vacuum_threshold and vacuum_scale_factor
 to allow us to control VACUUM the same way.

Overall, this becomes very handy for scripting of manual ANALYZE
and VACUUM.

--
Sami Imseih
Amazon Web Services (AWS)


Reply via email to