On 2/12/21 5:46 AM, Andres Freund wrote:
Hi,
On 2021-02-12 00:42:04 +0100, Tomas Vondra wrote:
Yeah, that's a good point. I think it'd make sense to keep track of recent
FPIs and skip prefetching such blocks. But how exactly should we implement
that, how many blocks do we need to track? If you get an FPI, how long
should we skip prefetching of that block?
I don't think the history needs to be very long, for two reasons. Firstly,
the usual pattern is that we have FPI + several changes for that block
shortly after it. Secondly, maintenance_io_concurrency limits this naturally
- after crossing that, redo should place the FPI into shared buffers,
allowing us to skip the prefetch.
So I think using maintenance_io_concurrency is sufficient. We might track
more buffers to allow skipping prefetches of blocks that were evicted from
shared buffers, but that seems like an overkill.
However, maintenance_io_concurrency can be quite high, so just a simple
queue is not very suitable - searching it linearly for each block would be
too expensive. But I think we can use a simple hash table, tracking
(relfilenode, block, LSN), over-sized to minimize collisions.
Imagine it's a simple array with (2 * maintenance_io_concurrency) elements,
and whenever we prefetch a block or find an FPI, we simply add the block to
the array as determined by hash(relfilenode, block)
hashtable[hash(...)] = {relfilenode, block, LSN}
and then when deciding whether to prefetch a block, we look at that one
position. If the (relfilenode, block) match, we check the LSN and skip the
prefetch if it's sufficiently recent. Otherwise we prefetch.
I'm a bit doubtful this is really needed at this point. Yes, the
prefetching will do a buffer table lookup - but it's a lookup that
already happens today. And the patch already avoids doing a second
lookup after prefetching (by optimistically caching the last Buffer id,
and re-checking).
I think there's potential for some significant optimization going
forward, but I think it's basically optimization over what we're doing
today. As this is already a nontrivial patch, I'd argue for doing so
separately.
I agree with treating this as an improvement - it's not something that
needs to be solved in the first verson. OTOH I think Stephen has a point
that just skipping FPIs like we do now has limited effect, because the
WAL usually contains additional changes to the same block.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company