> > For variable-numbered statistics, perhaps we can do things a bit
> > differently than what is currently proposed. 0005 requires every
> > anytime update of a relation stat to call
> > pgstat_schedule_anytime_update(). It is done this way because
> > it allows long-running queries to update their stats every
> > stats_flush_interval using a timeout.
> >
> > But maybe what we should be doing for variable-numbered stats is
> > to schedule an anytime update whenever a "transaction goes idle".
>
> I think the logic for fixed stats and variable stats should be the same. If
> not, we could observe discrepancies: for example, a long-running select could
> generate reads/hits I/O visible in pg_stat_io, but tuples_returned,
> tuples_fetched, blocks_fetched or blocks_hit would not be updated until the
> session goes idle.

After having more time to think about this, I believe it can be much simpler.
As soon as we enter an idle-in-transaction (aborted) state, we can simply
schedule an anytime update. This ensures that a flush is scheduled whenever
the fixed stats trigger one, which will likely be the most common reason
(e.g., I/O stats, WAL stats, etc.). To cover the cases where fixed stats
do not schedule a flush, we can also schedule one as soon as a transaction
goes idle.
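
To make the idea concrete, here is a rough, standalone sketch of one way to
read it. It is not code from the patch set: every name prefixed with sketch_
is hypothetical, the state enum only loosely mirrors the backend states, and
the pending flag merely stands in for whatever pgstat_schedule_anytime_update()
does in 0005 (whose signature I am not reproducing here).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the backend activity states. */
typedef enum SketchBackendState
{
    SKETCH_STATE_RUNNING,
    SKETCH_STATE_IDLE,
    SKETCH_STATE_IDLE_IN_TRANSACTION,
    SKETCH_STATE_IDLE_IN_TRANSACTION_ABORTED
} SketchBackendState;

/* True once an anytime flush has been requested and not yet performed. */
bool sketch_anytime_update_pending = false;

/* Stand-in for pgstat_schedule_anytime_update(); assumed to be idempotent. */
void
sketch_schedule_anytime_update(void)
{
    sketch_anytime_update_pending = true;
}

/* Assumed to be called on every backend state change. */
void
sketch_report_state(SketchBackendState new_state)
{
    switch (new_state)
    {
        case SKETCH_STATE_RUNNING:
            /* Still executing: rely on flushes scheduled by the fixed stats. */
            break;
        case SKETCH_STATE_IDLE:
            /* Assumption: the usual end-of-transaction reporting covers this. */
            break;
        case SKETCH_STATE_IDLE_IN_TRANSACTION:
        case SKETCH_STATE_IDLE_IN_TRANSACTION_ABORTED:
            /* The transaction went idle: request a flush unconditionally. */
            sketch_schedule_anytime_update();
            break;
        default:
            assert(false && "unknown backend state");
            break;
    }
}
```

The point of the sketch is that the scheduling decision lives in one place,
the state transition, instead of at each site that bumps a counter.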

In my mind, this makes the flush scheduling behavior easy to reason about,
and if we introduce anytime stats elsewhere in the future, we will not need
to schedule a flush for each individual field. The flush callback will of
course still need to decide what can be flushed at any time and what must
wait for the transaction boundary.
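
As an illustration of that last point, a flush callback could look roughly
like the sketch below. This is only a model under stated assumptions: the
struct, the callback name, and the "anytime" flag are hypothetical, and the
split of counters into anytime-safe and boundary-only is my assumption for
the example, not something the patch defines this way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-relation counters; used for both pending and shared. */
typedef struct SketchRelCounters
{
    int64_t blocks_hit;       /* assumed safe to flush at any time */
    int64_t tuples_returned;  /* assumed safe to flush at any time */
    int64_t tuples_inserted;  /* assumed to wait for commit/abort */
} SketchRelCounters;

/*
 * Move pending counts into the shared entry. When "anytime" is true we are
 * inside a transaction, so only the anytime-safe counters are moved and the
 * rest stays pending until the transaction boundary.
 */
void
sketch_flush_cb(SketchRelCounters *pending, SketchRelCounters *shared,
                bool anytime)
{
    assert(pending != NULL && shared != NULL);

    shared->blocks_hit += pending->blocks_hit;
    pending->blocks_hit = 0;

    shared->tuples_returned += pending->tuples_returned;
    pending->tuples_returned = 0;

    if (anytime)
        return;                 /* boundary-only counters stay pending */

    shared->tuples_inserted += pending->tuples_inserted;
    pending->tuples_inserted = 0;
}
```

With this shape, whoever schedules the flush does not need to know which
fields are anytime-safe; that knowledge stays inside the callback.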

What do you think?

--
Sami Imseih
Amazon Web Services (AWS)

