On Tue, Nov 2, 2021 at 11:18 PM Jan Wieck <j...@wi3ck.info> wrote:
> The thing I don't want to see us doing is *nothing at all* when pretty
> much everyone with some customer experience in the field is saying "this
> is the information we want to see post incident and nobody has it so we
> sit there waiting for the next time it happens."
Quite so. I'm not convinced that the proposal to log checkpoints only when they're triggered by WAL rather than by time is really solving anything. It isn't as if a time-based checkpoint couldn't have caused a problem. What you're going to be looking for is something much more complicated than that. Were the fsyncs slow? Did the checkpoint around the time the user reported a problem write significantly more data than the other checkpoints?

I guess if a checkpoint wrote 1MB of data and took 0.1 seconds to complete the fsyncs, I don't much care whether it shows up in the log or not. If it wrote 4GB of data, or if it took 15 seconds to complete the fsyncs, I care. That's easily enough to account for some problem that somebody had. I'm not sure whether there are any other interesting criteria.

-- 
Robert Haas
EDB: http://www.enterprisedb.com