Attached now...

On Mon, 9 Jun 2025, Dimitrios Apostolou wrote:

> On Mon, 14 Apr 2025, Tom Lane wrote:
>
>> You should add your patch to the July commitfest [1] to make sure
>> we don't lose track of it.
>
> I rebased the patch (attached) and created an entry in the commitfest:
>
> https://commitfest.postgresql.org/patch/5809/
>
> Thanks!
> Dimitris


From ea8072d7a2481db002a94c2bdc487772bc26599f Mon Sep 17 00:00:00 2001
From: Dimitrios Apostolou <ji...@qt.io>
Date: Sat, 29 Mar 2025 01:16:07 +0100
Subject: [PATCH v2] parallel pg_restore: avoid disk seeks when moving short
 distance forward

Improve the performance of parallel pg_restore (-j) from a custom format
pg_dump archive that does not include data offsets - typically happening
when pg_dump has generated it by writing to stdout instead of a file.

In this case, pg_restore workers loop constantly, reading small amounts
(4KB) and seeking forward by small lengths (around 10KB for a compressed
archive):

read(4, "..."..., 4096) = 4096
lseek(4, 55544369152, SEEK_SET)         = 55544369152
read(4, "..."..., 4096) = 4096
lseek(4, 55544381440, SEEK_SET)         = 55544381440
read(4, "..."..., 4096) = 4096
lseek(4, 55544397824, SEEK_SET)         = 55544397824
read(4, "..."..., 4096) = 4096
lseek(4, 55544414208, SEEK_SET)         = 55544414208
read(4, "..."..., 4096) = 4096
lseek(4, 55544426496, SEEK_SET)         = 55544426496

This happens because each worker scans the whole file until it finds the
entry it wants, skipping forward one block at a time. In combination with
the small block size of the custom-format dump, this causes many seeks
and low performance.

Fix by avoiding forward seeks for jumps shorter than 1MB, doing
sequential reads instead.

The performance gain can be significant, depending on the size of the
dump and the I/O subsystem. On my local NVMe drive, read speed during
that phase of pg_restore increased from 150MB/s to 3GB/s.
---
 src/bin/pg_dump/pg_backup_custom.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_dump/pg_backup_custom.c b/src/bin/pg_dump/pg_backup_custom.c
index f7c3af56304..27695e24dde 100644
--- a/src/bin/pg_dump/pg_backup_custom.c
+++ b/src/bin/pg_dump/pg_backup_custom.c
@@ -624,17 +624,21 @@ _skipData(ArchiveHandle *AH)
 	lclContext *ctx = (lclContext *) AH->formatData;
 	size_t		blkLen;
 	char	   *buf = NULL;
 	int			buflen = 0;
 
 	blkLen = ReadInt(AH);
 	while (blkLen != 0)
 	{
-		if (ctx->hasSeek)
+		/*
+		 * Sequential access is usually faster, so avoid seeking if the jump
+		 * forward is shorter than 1MB.
+		 */
+		if (ctx->hasSeek && blkLen > 1024 * 1024)
 		{
 			if (fseeko(AH->FH, blkLen, SEEK_CUR) != 0)
 				pg_fatal("error during file seek: %m");
 		}
 		else
 		{
 			if (blkLen > buflen)
 			{
-- 
2.49.0
