Hi,

On 2024-10-02 18:36:44 +0200, Tomas Vondra wrote:
> On 10/2/24 17:02, Tony Wayne wrote:
> >
> > On Wed, Oct 2, 2024 at 8:14 PM Laurenz Albe <laurenz.a...@cybertec.at> wrote:
> >
> >     On Wed, 2024-10-02 at 16:48 +0800, wenhui qiu wrote:
> >     > Whenever I check the checkpoint information in a log, most dirty
> >     > pages are written by the checkpoint process
> >
> >     That's exactly how it should be!
> >
> > is it because if bgwriter frequently flushes, the disk io will be more?🤔
>
> Yes, pretty much. But it's also about where the writes happen.
>
> Checkpoint flushes dirty buffers only once per checkpoint interval,
> which is the lowest amount of write I/O that needs to happen.
>
> Every other way of flushing buffers is less efficient, and is mostly a
> sign of memory pressure (shared buffers not large enough for active part
> of the data).
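To make the "lowest amount of write I/O" point concrete, here is a toy model, not PostgreSQL code. It counts writes for a stream of page modifications within one checkpoint interval: once if every dirty page is flushed a single time at checkpoint, and once per modification in the extreme case where a page is written out after every change (say, because it keeps being evicted under memory pressure). The function names and the workload are hypothetical, chosen only for this sketch.

```python
from collections.abc import Iterable


def checkpoint_writes(modifications: Iterable[int]) -> int:
    """Writes if each dirtied page is flushed once, at the end of the interval.

    Repeated modifications of the same page are absorbed in shared buffers,
    so the cost is the number of distinct pages touched.
    """
    return len(set(modifications))


def eager_writes(modifications: Iterable[int]) -> int:
    """Extreme case: the page is written out after every single modification,
    e.g. because it keeps getting evicted under memory pressure."""
    return sum(1 for _ in modifications)


# Hypothetical workload: 3 hot pages updated over and over, plus 2 cold ones.
workload = [1, 2, 3] * 100 + [40, 41]
print(checkpoint_writes(workload))  # 5
print(eager_writes(workload))       # 302
```

Real eviction behaviour falls somewhere between these two numbers; the point is only that hot pages dirtied many times cost one write each when the checkpointer handles them.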
It's implied, but to make it more explicit: one big efficiency advantage of
writes by checkpointer is that they are sorted and can often be combined into
larger writes. That's often a lot more efficient: for network-attached storage
it saves you IOPS, and for local SSDs it's much friendlier to wear leveling.

> But it's also about where the writes happen. Checkpoint does
> that in the background, not as part of regular query execution. What we
> don't want is for the user backends to flush buffers, because it's
> expensive and can result in much higher latency.
>
> The bgwriter is somewhere in between: it happens in the background,
> but may not be as efficient as doing it in the checkpointer. Still much
> better than having to do this in regular backends.

Another aspect is that checkpointer's writes are much easier to pace over time
than, e.g., bgwriter's, because bgwriter is triggered by a fairly short-term
signal.

Eventually we'll want to combine writes by bgwriter too, but that's always
going to be more expensive than doing it in a large batched fashion like
checkpointer does.

I think we could improve checkpointer's pacing further, fwiw, by taking into
account that the WAL volume at the start of a spread-out checkpoint typically
is bigger than at the end (with full_page_writes on, the first modification
of each page after the checkpoint starts logs a full-page image).

Greetings,

Andres Freund
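As an illustration of the point above that checkpointer's writes are sorted and can often be combined into larger writes, here is a simplified model, not checkpointer's actual code. It sorts the identities of dirty pages by (relation, fork, block) and merges runs of consecutive block numbers into single larger writes. `BufferTag`, `WriteRun`, `plan_writes` and the `max_blocks` cap are hypothetical names chosen for this sketch.

```python
from typing import NamedTuple


class BufferTag(NamedTuple):
    # Hypothetical, simplified identity of a dirty page.
    rel: int    # relation file identifier
    fork: int   # fork number (main, fsm, vm, ...)
    block: int  # block number within that fork


class WriteRun(NamedTuple):
    rel: int
    fork: int
    start: int    # first block of the run
    nblocks: int  # number of consecutive blocks written together


def plan_writes(dirty: list[BufferTag], max_blocks: int = 16) -> list[WriteRun]:
    """Sort dirty pages and merge consecutive blocks into larger writes."""
    runs: list[WriteRun] = []
    # Tuples sort field by field: relation, then fork, then block number.
    for tag in sorted(set(dirty)):
        if runs:
            last = runs[-1]
            if (last.rel == tag.rel and last.fork == tag.fork
                    and last.start + last.nblocks == tag.block
                    and last.nblocks < max_blocks):
                # Extend the current run instead of issuing a new write.
                runs[-1] = last._replace(nblocks=last.nblocks + 1)
                continue
        runs.append(WriteRun(tag.rel, tag.fork, tag.block, 1))
    return runs


# Dirty pages in the (random) order they might sit in shared buffers.
dirty = [BufferTag(10, 0, 7), BufferTag(10, 0, 5), BufferTag(11, 0, 0),
         BufferTag(10, 0, 6), BufferTag(10, 0, 9)]
for run in plan_writes(dirty):
    print(run)
```

Five dirty pages become three writes here: blocks 5-7 of relation 10 in one run, then block 9, then relation 11's block 0. The `max_blocks` cap stands in for whatever upper bound on I/O size a real system would impose.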