On Fri, Jun 15, 2018 at 1:08 PM, Konstantin Knizhnik
<k.knizh...@postgrespro.ru> wrote:
>
>
> On 15.06.2018 07:36, Amit Kapila wrote:
>>
>> On Fri, Jun 15, 2018 at 12:16 AM, Stephen Frost <sfr...@snowman.net>
>> wrote:
>>>>
>>>> I have tested wal_prefetch at two powerful servers with 24 cores, 3Tb
>>>> NVME RAID 10 storage device and 256Gb of RAM connected using InfiniBand.
>>>> The speed of synchronous replication between two nodes is increased from
>>>> 56k TPS to 60k TPS (on pgbench with scale 1000).
>>>
>>> I'm also surprised that it wasn't a larger improvement.
>>>
>>> Seems like it would make sense to implement in core using
>>> posix_fadvise(), perhaps in the wal receiver and in RestoreArchivedFile
>>> or nearby..  At least, that's the thinking I had when I was chatting w/
>>> Sean.
>>>
>> Doing in-core certainly has some advantage such as it can easily reuse
>> the existing xlog code rather than trying to make a copy as is currently
>> done in the patch, but I think it also depends on whether this is
>> really a win in a number of common cases or is it just a win in some
>> limited cases.
>>
> I completely agree. It was my main concern: in which use cases this
> prefetch will be efficient.
> If "full_page_writes" is on (and it is the safe and default value), then the
> first update of a page since the last checkpoint will be written to WAL as a
> full page, and applying it will not require reading any data from disk.
>

What exactly do you mean by the above?  AFAIU, it needs to read the WAL to
apply a full-page image.  See the code below:

XLogReadBufferForRedoExtended()
{
..
    /* If it has a full-page image and it should be restored, do it. */
    if (XLogRecBlockImageApply(record, block_id))
    {
        Assert(XLogRecHasBlockImage(record, block_id));
        *buf = XLogReadBufferExtended(rnode, forknum, blkno,
            get_cleanup_lock ? RBM_ZERO_AND_CLEANUP_LOCK : RBM_ZERO_AND_LOCK);
        page = BufferGetPage(*buf);
        if (!RestoreBlockImage(record, block_id, page))
..
}

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
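
For reference, the posix_fadvise() hint suggested in the quoted discussion
could look roughly like the sketch below.  This is only an illustration, not
code from the patch or from core: sketch_block_offset(),
sketch_prefetch_block() and SKETCH_BLCKSZ are hypothetical names, and the
8192-byte page size is an assumption that matches the default BLCKSZ.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* expose posix_fadvise() */
#endif

#include <fcntl.h>
#include <sys/types.h>

/* Assumed page size for this sketch; PostgreSQL's default BLCKSZ is 8192. */
#define SKETCH_BLCKSZ 8192

/* Byte offset of block number blkno within an already-opened data file. */
static off_t
sketch_block_offset(unsigned int blkno)
{
    return (off_t) blkno * SKETCH_BLCKSZ;
}

/*
 * Hypothetical helper: ask the kernel to start reading one block into the
 * OS page cache ahead of redo.  Returns 0 on success or an error number on
 * failure (posix_fadvise() reports errors via its return value, not errno).
 */
static int
sketch_prefetch_block(int fd, unsigned int blkno)
{
    return posix_fadvise(fd, sketch_block_offset(blkno),
                         SKETCH_BLCKSZ, POSIX_FADV_WILLNEED);
}
```

A caller would issue sketch_prefetch_block(fd, blkno) for blocks referenced
by upcoming WAL records.  Such a hint seems useful only for records without
a full-page image, since the RBM_ZERO_AND_LOCK path quoted above should not
need the old page contents from disk.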