Hi,

On 2025-02-26 15:37:10 +0900, Michael Paquier wrote:
> That's bad, worse for a logical WAL sender, because it means that we
> have no idea what kind of I/O happens in this process until it exits,
> and logical WAL senders could loop forever, since v16 where we've
> begun tracking I/O.

FWIW, I think medium term we need to work on splitting stats flushing into two
separate kinds of flushes:

1) non-transactional stats, which should be flushed at a regular interval,
   unless a process is completely idle
2) transaction stats, which can only be flushed at transaction boundaries,
   because before the transaction boundary we don't know if e.g. newly
   inserted rows should be counted as live or dead

So far we have some timer logic for 2), but we have basically no support for
1). Which means we have weird ad-hoc logic in various kinds of
non-plain-connection processes. And that will often have holes, as Bertrand
noticed here.

I think it's also bad that we don't have a solution for 1), even just for
normal connections. If a backend causes a lot of IO we might want to know
about that long before the long-running transaction commits.

I suspect the right design here would be to have a generalized form of the
timeout mechanism we have for 2).

For that we'd need to make sure that pgstat_report_stat() can be safely called
inside a transaction. The second part would be to redesign the
IdleStatsUpdateTimeoutPending mechanism so it is triggered independent of
idleness, without introducing unacceptable overhead - I think that's doable.

Greetings,

Andres Freund