Hi all,

I did some more research and found this explanation in a presentation by
2ndQuadrant
<https://www.2ndquadrant.com/wp-content/uploads/2019/05/Inside-the-PostgreSQL-Shared-Buffer-Cache.pdf>:

> When a process wants a buffer, it asks BufferAlloc for the file/block. If
> the block is already cached, it gets pinned and then returned. Otherwise, a
> new buffer must be found to hold this data. If there are no buffers free
> (there usually aren’t) BufferAlloc selects a buffer to evict to make space
> for the new one. If that page is dirty, it is written out to disk. This can
> cause the backend trying to allocate that buffer to block as it waits for
> that write I/O to complete.

So it seems that both reads and writes can potentially have to wait for I/O.
And the bgwriter reduces the risk of hitting a dirty page and needing to
write it before evicting. So perhaps the documentation should say:

"There is a separate server process called the background writer, whose
function is to issue writes of “dirty” (new or modified) shared buffers.
This reduces the chances that a backend needing an empty buffer must write a
dirty one back to disk before evicting it."

Thanks, Chris.

On Mon, 2 Nov 2020 at 12:38, Chris Wilson <chris+goo...@qwirx.com> wrote:

> Hi all,
>
> Thanks Thomas.
>
> When the bgwriter flushes (cleans) a dirty Postgres buffer, it generates a
> write() syscall of its own, which I think must increase the number of dirty
> cache buffers in the Linux kernel (temporarily, until it actually flushes
> those cache buffers to disk). Therefore it temporarily increases the risk
> of a write stall (in any process, not just Postgres backends), is that
> correct?
>
> I suppose that if dirty buffers are being cleaned regularly, then it
> reduces the risk that (1) a Postgres backend which is writing (dirtying
> buffers) suddenly needs an empty buffer when there are no clean buffers to
> evict, so it needs to flush a dirty one and (2) the resulting write()
> syscall would take the kernel over its background dirty limit, so the
> kernel must flush it immediately, and make the backend wait. By that
> mechanism I can see that it might reduce the chance of backends having to
> wait, but by writing more in general (as above) it could also increase it.
>
> So when it says "It writes shared buffers so server processes handling
> user queries seldom or never need to wait for a write to occur", is that
> really justified, or is that sentence incorrect and we should remove it? Or
> have I missed something?
>
> Thanks, Chris.
>
> On Sun, 1 Nov 2020 at 21:00, Thomas Munro <thomas.mu...@gmail.com> wrote:
>
>> On Fri, Oct 30, 2020 at 11:24 AM PG Doc comments form
>> <nore...@postgresql.org> wrote:
>> > The following documentation comment has been logged on the website:
>> >
>> > Page: https://www.postgresql.org/docs/13/runtime-config-resource.html
>> > Description:
>> >
>> > https://www.postgresql.org/docs/13/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-BACKGROUND-WRITER
>> >
>> > says:
>> >
>> > "There is a separate server process called the background writer, whose
>> > function is to issue writes of “dirty” (new or modified) shared buffers.
>> > It writes shared buffers so server processes handling user queries seldom
>> > or never need to wait for a write to occur."
>> >
>> > It's not clear what "wait for a write to occur" means: a write() syscall
>> > or an fsync() syscall?
>>
>> It means pwrite(). That could block if your kernel cache is swamped,
>> but hopefully it just copies the data into the kernel and returns.
>> There is an fsync() call, but it's usually queued up for handling by
>> the checkpointer process some time later.
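
P.S. To make the eviction path from the 2ndQuadrant excerpt concrete, here is a rough toy model in Python. It is only a sketch under loud assumptions: the class `ToyBufferPool`, its methods, and the simple oldest-first eviction are all made up for illustration and are not PostgreSQL's actual buffer manager or victim-selection algorithm. It only counts how often a "backend" has to write a dirty victim itself versus how often a "bgwriter" pass has already cleaned it.

```python
from collections import OrderedDict


class ToyBufferPool:
    """Toy shared-buffer model (hypothetical; NOT PostgreSQL's real algorithm)."""

    def __init__(self, size):
        self.size = size
        self.buffers = OrderedDict()  # block number -> dirty flag, oldest first
        self.backend_writes = 0       # dirty victims written by the backend itself
        self.bgwriter_writes = 0      # dirty buffers cleaned by the toy bgwriter

    def alloc(self, block, dirty=False):
        """Backend asks for a block, optionally dirtying it."""
        if block in self.buffers:
            # Already cached: keep it, and remember if it is now dirty.
            self.buffers[block] = self.buffers[block] or dirty
            self.buffers.move_to_end(block)
            return
        if len(self.buffers) >= self.size:
            # No free buffer: evict the oldest one (assumed policy).
            _victim, victim_dirty = self.buffers.popitem(last=False)
            if victim_dirty:
                # The backend must write the page out before reusing the buffer.
                self.backend_writes += 1
        self.buffers[block] = dirty

    def bgwriter_round(self, max_pages):
        """Clean up to max_pages dirty buffers, oldest first."""
        cleaned = 0
        for block, is_dirty in list(self.buffers.items()):
            if cleaned >= max_pages:
                break
            if is_dirty:
                self.buffers[block] = False
                cleaned += 1
        self.bgwriter_writes += cleaned


def simulate(use_bgwriter):
    """Dirty 12 distinct blocks through a 4-buffer pool."""
    pool = ToyBufferPool(size=4)
    for block in range(12):
        pool.alloc(block, dirty=True)
        if use_bgwriter:
            pool.bgwriter_round(max_pages=2)
    return pool.backend_writes, pool.bgwriter_writes


if __name__ == "__main__":
    print("without bgwriter:", simulate(False))  # (8, 0)
    print("with bgwriter:   ", simulate(True))   # (0, 12)
```

In this toy run the backend writes 8 dirty victims itself when there is no bgwriter, and none when the bgwriter runs after every allocation. The total number of writes does not go down (the bgwriter issues 12), which matches the concern raised earlier in the thread: the work is moved out of the backend's allocation path rather than removed, and whether any individual write() blocks still depends on the kernel.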