On 15.06.2018 07:36, Amit Kapila wrote:
On Fri, Jun 15, 2018 at 12:16 AM, Stephen Frost <sfr...@snowman.net> wrote:
I have tested wal_prefetch on two powerful servers (24 cores, 3 TB NVMe RAID 10
storage, 256 GB of RAM) connected via InfiniBand.
Synchronous replication throughput between the two nodes increased from 56k
TPS to 60k TPS, about 7% (pgbench, scale factor 1000).
I'm also surprised that it wasn't a larger improvement.

It seems like it would make sense to implement this in core using
posix_fadvise(), perhaps in the WAL receiver and in RestoreArchivedFile()
or nearby. At least, that's the thinking I had when I was chatting with
Sean.
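A minimal sketch of the posix_fadvise() approach, assuming Linux, the default 8 KB block size, and that WAL decoding has already produced the block numbers a record touches within one relation segment file. That mapping, and the splitting of relations into 1 GB segments, are not shown. The function names are hypothetical and this is not the wal_prefetch extension's code; Python's os.posix_fadvise wraps the same system call.

```python
import os
import tempfile

# Assumption: PostgreSQL's default block size (BLCKSZ) of 8 KB.
BLCKSZ = 8192


def prefetch_block(fd: int, blkno: int) -> None:
    """Ask the kernel to start reading one block into the OS page cache.

    POSIX_FADV_WILLNEED is only advice: the call returns immediately and the
    read happens asynchronously, so the later synchronous read done by redo
    can (hopefully) be served from the page cache.
    """
    os.posix_fadvise(fd, blkno * BLCKSZ, BLCKSZ, os.POSIX_FADV_WILLNEED)


def prefetch_blocks(path: str, blknos) -> int:
    """Issue one WILLNEED hint per distinct block of a (hypothetical) segment file.

    Returns the number of hints issued.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        count = 0
        for blkno in sorted(set(blknos)):  # drop repeated references to a block
            prefetch_block(fd, blkno)
            count += 1
        return count
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Demo on a scratch file standing in for a relation segment.
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"\0" * (4 * BLCKSZ))
        f.flush()
        print(prefetch_blocks(f.name, [3, 0, 3, 1]))
```

Sorting the block numbers is a design choice, not a requirement: hints issued in file order tend to be friendlier to readahead than hints issued in WAL order.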

Doing it in core certainly has some advantages, such as being able to
reuse the existing xlog code rather than making a copy of it, as the
patch currently does, but I think it also depends on whether this is
really a win in a number of common cases or just a win in some
limited cases.

I completely agree. That was my main concern: in which use cases will this prefetch be effective? If full_page_writes is on (which is safe and the default), then the first update of a page after the last checkpoint is written to WAL as a full-page image, and applying it does not require reading any data from disk. If this page is updated again in subsequent transactions, it will most likely still be present in the OS file cache, unless the checkpoint interval (the volume of WAL between checkpoints) exceeds the OS cache size (the amount of free memory in the system). So if these conditions are satisfied, prefetch does not seem to be needed. And they seem to hold for most real configurations: the checkpoint interval is rarely set larger than a hundred gigabytes, and modern servers usually have more RAM than that.

But once this condition is not satisfied and the lag is larger than the OS cache, prefetch may be ineffective, because prefetched pages may be evicted from the OS cache before they are actually accessed by the redo process. In this case, extra synchronization between the prefetch and replay processes is needed, so that prefetch does not move too far ahead of the replayed LSN.
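One way to express that synchronization is to let the prefetcher work only within a fixed window beyond the replayed LSN. A minimal sketch, assuming LSNs modelled as plain integers and a hypothetical `max_distance` tuning parameter; a real implementation would wait (for example on a latch) until replay advances instead of being polled, as here.

```python
from bisect import bisect_right


class BoundedPrefetcher:
    """Keep a WAL prefetcher at most `max_distance` bytes ahead of replay.

    `record_lsns` is a sorted list of start LSNs of WAL records whose
    blocks should be prefetched. Each advance() call returns the records
    that have newly entered the allowed window.
    """

    def __init__(self, record_lsns, max_distance: int):
        self.record_lsns = record_lsns
        self.max_distance = max_distance
        self.next_idx = 0  # index of the first record not yet handed out

    def advance(self, replay_lsn: int):
        limit = replay_lsn + self.max_distance
        # First record beyond the window: it must wait for replay to catch up.
        end = bisect_right(self.record_lsns, limit)
        # Records at or before replay_lsn are already replayed; prefetching
        # them would be wasted work, so skip past them.
        start = max(self.next_idx, bisect_right(self.record_lsns, replay_lsn))
        batch = self.record_lsns[start:end]
        self.next_idx = max(self.next_idx, end)
        return batch
```

For example, with records at 100, 200, 300, 400 and 500 and `max_distance=250`, `advance(0)` yields `[100, 200]`, `advance(100)` yields `[300]`, and `advance(450)` yields `[500]`: record 400 is skipped because replay has already passed it, which is exactly the case where the prefetcher fell behind rather than ran ahead.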

It is not a problem to integrate this code into Postgres core and run it in a background worker. I do not think that performing prefetch in the WAL receiver process itself is a good idea: it may slow down receiving changes from the master. And in that case I really could throw away the cut-and-pasted code. But it is easier to experiment with an extension than with a patch to Postgres core. I have published this extension to make it possible to run experiments and check whether it is useful on real workloads.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

