I wrote:
> Current status is that I've filed a bug report with Apple and am waiting
> to see their response before deciding what to do next.  If they fix the
> issue promptly then there's little need for us to do anything.

Not having heard a peep from Apple yet, I decided to do a bit more
experimenting.  I found that indeed, issuing fewer and larger mmap/msync
requests helps enormously.  If you're willing to go as far as issuing
only one per file, the speed seems on par with non-fsync.  But that
requires being able to mmap 1GB-sized files, so it's probably not
something we want to do.

What I did instead was to adjust the logic in copy_file() so that the
unit of flush requests can be a multiple of the unit of read/write
requests.  (My original thought of just raising the buffer size seems
like not as good an idea; it's less cache-friendly for one thing.)

I find that on both Linux and macOS-with-HFS, requesting a flush only
every 1MB seems to be a win compared to flushing every 64KB as we
currently do.  Actually it seems that on macOS, every increment of
increase in the flush distance helps, but with HFS you hit diminishing
returns after 1MB or so.  With APFS you need a flush distance of 32MB
or more to have credible performance.

Accordingly I propose the attached patch.  If anyone's interested in
experimenting on other platforms, we might be able to refine/complicate
the FLUSH_DISTANCE selection further, but I think this is probably good
enough for a first cut.

                        regards, tom lane

diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
index a5e074e..eae9f5a 100644
--- a/src/backend/storage/file/copydir.c
+++ b/src/backend/storage/file/copydir.c
@@ -139,10 +139,24 @@ copy_file(char *fromfile, char *tofile)
 	int			dstfd;
 	int			nbytes;
 	off_t		offset;
+	off_t		flush_offset;
-	/* Use palloc to ensure we get a maxaligned buffer */
+	/* Size of copy buffer (read and write requests) */
 #define COPY_BUF_SIZE (8 * BLCKSZ)
+	/*
+	 * Size of data flush requests.  It seems beneficial on most platforms to
+	 * do this every 1MB or so.  But macOS, at least with early releases of
+	 * APFS, is really unfriendly to small mmap/msync requests, so there do it
+	 * only every 32MB.
+	 */
+#if defined(__darwin__)
+#define FLUSH_DISTANCE (32 * 1024 * 1024)
+#define FLUSH_DISTANCE (1024 * 1024)
+	/* Use palloc to ensure we get a maxaligned buffer */
 	buffer = palloc(COPY_BUF_SIZE);
@@ -163,11 +177,23 @@ copy_file(char *fromfile, char *tofile)
 	 * Do the data copying.
+	flush_offset = 0;
 	for (offset = 0;; offset += nbytes)
 		/* If we got a cancel signal during the copy of the file, quit */
+		/*
+		 * We fsync the files later, but during the copy, flush them every so
+		 * often to avoid spamming the cache and hopefully get the kernel to
+		 * start writing them out before the fsync comes.
+		 */
+		if (offset - flush_offset >= FLUSH_DISTANCE)
+		{
+			pg_flush_data(dstfd, flush_offset, offset - flush_offset);
+			flush_offset = offset;
+		}
 		nbytes = read(srcfd, buffer, COPY_BUF_SIZE);
@@ -190,15 +216,11 @@ copy_file(char *fromfile, char *tofile)
 					 errmsg("could not write to file \"%s\": %m", tofile)));
-		/*
-		 * We fsync the files later but first flush them to avoid spamming the
-		 * cache and hopefully get the kernel to start writing them out before
-		 * the fsync comes.
-		 */
-		pg_flush_data(dstfd, offset, nbytes);
+	if (offset > flush_offset)
+		pg_flush_data(dstfd, flush_offset, offset - flush_offset);
 	if (CloseTransientFile(dstfd))
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to