On Sun, Jan 13, 2019 at 10:35:55AM +1300, Thomas Munro wrote:
> 1. We need a new "bgreader" process to do read-ahead. I think you'd
> want a way to tell it with explicit hints (for example, perhaps
> sequential scans would advertise that they're reading sequentially so
> that it starts to slurp future blocks into the buffer pool, and
> streaming replicas might look ahead in the WAL and tell it what's
> coming). In theory this might be better than the heuristics OSes use
> to guess our access pattern and pre-fetch into the page cache, since
> we have better information (and of course we're skipping a buffer
> layer).

Yes, that could be interesting, mainly for analytics, by being able to
snipe better than the OS readahead.

> 2. We need a new kind of bgwriter/syncer that aggressively creates
> clean pages so that foreground processes rarely have to evict (since
> that is now super slow), but also efficiently finds ranges of dirty
> blocks that it can write in big sequential chunks.

Okay, that's a new idea. A bgwriter able to do syncs in chunks would
also be interesting with O_DIRECT, no?

> 3. We probably want SLRUs to use the main buffer pool, instead of
> their own mini-pools, so they can benefit from the above.

Wasn't there a thread about that on -hackers actually? I cannot see
any reference to it.

> Whether we need multiple bgreader and bgwriter processes or perhaps a
> general IO scheduler process may depend on whether we also want to
> switch to async (multiplexing from a single process). Starting simple
> with a traditional sync IO and N processes seems OK to me.

So you mean that we could just have a simple switch as a first step?
Or did I misunderstand you? :)

One of the reasons why I have begun this thread is that, since we have
heard about the fsync issues on Linux, I think that there is room for
giving our user base more control of their fate, without relying on
the Linux community's decisions, which can potentially eat data and
corrupt a cluster when a page's dirty bit is cleared without its data
actually being flushed. Even the latest kernels are not fixing all the
patterns with open fds across processes, switching the problem from
one corner of the table to another, and there are folks patching the
Linux kernel to make Postgres more reliable from this perspective, and
living happily with this option. As long as the option can be
controlled and defaults to false, it seems to me that we could do
something. Even if the performance is bad, this gives the user control
of how he/she wants things to be done.
--
Michael