> > For variable-length statistics, perhaps we can do things a bit
> > differently than what is currently proposed. 0005 requires
> > a relation anytime stat update to call
> > pgstat_schedule_anytime_update(). This is done this way because
> > it allows long-running queries to update their stats every
> > stats_flush_interval using a timeout.
> >
> > But maybe what we should be doing for variable-numbered stats is
> > to schedule an anytime update whenever a "transaction goes idle".
>
> I think the logic for fixed stats and variable stats should be the same. If
> not we could observe discrepancies: for example a long running select could
> generate reads/hits IO visible in pg_stat_io but tuples_returned,
> tuples_fetched, blocks_fetched or blocks_hit would not be updated until
> the session goes idle.

After having more time to think about this, I believe it can be much
simpler. As soon as we enter an idle-in-transaction (aborted) state, we
can simply schedule an anytime update.

This ensures that a flush is scheduled whenever the fixed stats trigger
one, which will likely be the most common reason (e.g., I/O stats, WAL
stats, etc.). To cover the cases where fixed stats do not schedule a
flush, we can also schedule one as soon as a transaction goes idle.

In my mind, this makes the whole flush-scheduling behavior easy to reason
about, and if we introduce future anytime stats anywhere, we are not
required to schedule a flush for each individual field. The flush
callback will of course still need to decide what to flush anytime and
what to flush at the transaction boundary.

What do you think?

--
Sami Imseih
Amazon Web Services (AWS)
