On 2/12/21 5:46 AM, Andres Freund wrote:
Hi, On 2021-02-12 00:42:04 +0100, Tomas Vondra wrote:Yeah, that's a good point. I think it'd make sense to keep track of recent FPIs and skip prefetching such blocks. But how exactly should we implement that, how many blocks do we need to track? If you get an FPI, how long should we skip prefetching of that block? I don't think the history needs to be very long, for two reasons. Firstly, the usual pattern is that we have FPI + several changes for that block shortly after it. Secondly, maintenance_io_concurrency limits this naturally - after crossing that, redo should place the FPI into shared buffers, allowing us to skip the prefetch. So I think using maintenance_io_concurrency is sufficient. We might track more buffers to allow skipping prefetches of blocks that were evicted from shared buffers, but that seems like an overkill. However, maintenance_io_concurrency can be quite high, so just a simple queue is not very suitable - searching it linearly for each block would be too expensive. But I think we can use a simple hash table, tracking (relfilenode, block, LSN), over-sized to minimize collisions. Imagine it's a simple array with (2 * maintenance_io_concurrency) elements, and whenever we prefetch a block or find an FPI, we simply add the block to the array as determined by hash(relfilenode, block) hashtable[hash(...)] = {relfilenode, block, LSN} and then when deciding whether to prefetch a block, we look at that one position. If the (relfilenode, block) match, we check the LSN and skip the prefetch if it's sufficiently recent. Otherwise we prefetch.I'm a bit doubtful this is really needed at this point. Yes, the prefetching will do a buffer table lookup - but it's a lookup that already happens today. And the patch already avoids doing a second lookup after prefetching (by optimistically caching the last Buffer id, and re-checking). I think there's potential for some significant optimization going forward, but I think it's basically optimization over what we're doing today. As this is already a nontrivial patch, I'd argue for doing so separately.
I agree with treating this as an improvement - it's not something that needs to be solved in the first verson. OTOH I think Stephen has a point that just skipping FPIs like we do now has limited effect, because the WAL usually contains additional changes to the same block.
regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
