On 15.06.2018 18:03, Amit Kapila wrote:
On Fri, Jun 15, 2018 at 1:08 PM, Konstantin Knizhnik
<k.knizh...@postgrespro.ru> wrote:

On 15.06.2018 07:36, Amit Kapila wrote:
On Fri, Jun 15, 2018 at 12:16 AM, Stephen Frost <sfr...@snowman.net>
wrote:
I have tested wal_prefetch on two powerful servers with 24 cores, 3TB
NVMe RAID 10 storage and 256GB of RAM, connected using InfiniBand.
The speed of synchronous replication between the two nodes increased
from 56k TPS to 60k TPS (on pgbench with scale 1000).
I'm also surprised that it wasn't a larger improvement.

Seems like it would make sense to implement in core using
posix_fadvise(), perhaps in the wal receiver and in RestoreArchivedFile
or nearby..  At least, that's the thinking I had when I was chatting w/
Sean.

Doing it in core certainly has some advantages, such as being able to
reuse the existing xlog code rather than trying to make a copy, as is
currently done in the patch, but I think it also depends on whether this
is really a win in a number of common cases or just a win in some
limited cases.

I completely agree. That was my main concern: in which use cases this
prefetch will be efficient.
If "full_page_writes" is on (which is the safe and default value), then the
first update of a page since the last checkpoint will be written to WAL as a
full page, and applying it will not require reading any data from disk.

What exactly do you mean by the above?  AFAIU, it needs to read WAL to
apply the full page image.  See the code below:

XLogReadBufferForRedoExtended()
{
    ..
    /* If it has a full-page image and it should be restored, do it. */
    if (XLogRecBlockImageApply(record, block_id))
    {
        Assert(XLogRecHasBlockImage(record, block_id));
        *buf = XLogReadBufferExtended(rnode, forknum, blkno,
           get_cleanup_lock ? RBM_ZERO_AND_CLEANUP_LOCK : RBM_ZERO_AND_LOCK);
        page = BufferGetPage(*buf);
        if (!RestoreBlockImage(record, block_id, page))
        ..
}



Sorry for my confusing statement.
We definitely need to read the page image from WAL.
What I meant is that in the case of a "full page write" we do not need to read the page being updated from the database.
It can simply be overwritten.

pg_prefaulter and my wal_prefetch do not prefetch the WAL pages themselves.
There is no point in doing so, because they have just been written by wal_receiver and so should already be present in the file system cache. wal_prefetch prefetches the blocks referenced by WAL records. But in the case of "full page writes" such a prefetch is not needed and is even harmful.
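
To make the distinction concrete, here is a minimal sketch of that decision. It is not code from pg_prefaulter or from the wal_prefetch patch: the BlockRef struct, its fields, and both function names are hypothetical simplifications, and the only real API used is posix_fadvise() with POSIX_FADV_WILLNEED. It assumes the default 8kB block size and that the caller has already opened the relation file.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define BLOCK_SIZE 8192			/* assumption: PostgreSQL's default BLCKSZ */

/* Hypothetical, simplified view of one block reference in a WAL record. */
typedef struct BlockRef
{
	int			fd;				/* open descriptor of the relation file */
	unsigned	blkno;			/* block number within that file */
	bool		has_full_image;	/* record carries a full-page image to apply */
} BlockRef;

/*
 * A block restored from a full-page image is overwritten during redo, so
 * reading its old contents from disk would be wasted I/O.
 */
static bool
block_needs_prefetch(const BlockRef *ref)
{
	return !ref->has_full_image;
}

/*
 * Hint the kernel about every referenced block that redo will have to
 * read.  Returns the number of hints that were accepted.
 */
static int
prefetch_block_refs(const BlockRef *refs, size_t nrefs)
{
	int			issued = 0;

	assert(refs != NULL || nrefs == 0);

	for (size_t i = 0; i < nrefs; i++)
	{
		if (!block_needs_prefetch(&refs[i]))
			continue;			/* full-page image: skip the read entirely */

		/* posix_fadvise() returns 0 on success, an error number otherwise */
		if (posix_fadvise(refs[i].fd,
						  (off_t) refs[i].blkno * BLOCK_SIZE,
						  BLOCK_SIZE,
						  POSIX_FADV_WILLNEED) == 0)
			issued++;
	}
	return issued;
}
```

With full_page_writes on, the first record touching a page after a checkpoint would have has_full_image set, so the loop would issue a hint only for later updates of that page, and by then the page is likely to be in shared buffers or the file system cache already.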

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

