Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra

2017-11-08 Thread Tom Lane
Peter Eisentraut  writes:
> On 10/7/17 16:46, Tom Lane wrote:
>> Accordingly I propose the attached patch.  If anyone's interested in
>> experimenting on other platforms, we might be able to refine/complicate
>> the FLUSH_DISTANCE selection further, but I think this is probably good
>> enough for a first cut.

> What is the status of this?  Is performance on High Sierra still bad?

I committed the fix at 643c27e36.  If Apple have done anything about the
underlying problem, you couldn't tell it from their non-response to my
bug report.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra

2017-11-08 Thread Andres Freund


On November 8, 2017 7:31:17 AM PST, Peter Eisentraut 
 wrote:
>On 10/7/17 16:46, Tom Lane wrote:
>> I wrote:
>>> Current status is that I've filed a bug report with Apple and am
>waiting
>>> to see their response before deciding what to do next.  If they fix
>the
>>> issue promptly then there's little need for us to do anything.
>
>> Accordingly I propose the attached patch.  If anyone's interested in
>> experimenting on other platforms, we might be able to
>refine/complicate
>> the FLUSH_DISTANCE selection further, but I think this is probably
>good
>> enough for a first cut.
>
>What is the status of this?  Is performance on High Sierra still bad?

Didn't Tom commit a workaround?

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra

2017-11-08 Thread Peter Eisentraut
On 10/7/17 16:46, Tom Lane wrote:
> I wrote:
>> Current status is that I've filed a bug report with Apple and am waiting
>> to see their response before deciding what to do next.  If they fix the
>> issue promptly then there's little need for us to do anything.

> Accordingly I propose the attached patch.  If anyone's interested in
> experimenting on other platforms, we might be able to refine/complicate
> the FLUSH_DISTANCE selection further, but I think this is probably good
> enough for a first cut.

What is the status of this?  Is performance on High Sierra still bad?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra

2017-10-07 Thread Tom Lane
I wrote:
> Current status is that I've filed a bug report with Apple and am waiting
> to see their response before deciding what to do next.  If they fix the
> issue promptly then there's little need for us to do anything.

Not having heard a peep from Apple yet, I decided to do a bit more
experimenting.  I found that indeed, issuing fewer and larger mmap/msync
requests helps enormously.  If you're willing to go as far as issuing
only one per file, the speed seems on par with non-fsync.  But that
requires being able to mmap 1GB-sized files, so it's probably not
something we want to do.

What I did instead was to adjust the logic in copy_file() so that the
unit of flush requests can be a multiple of the unit of read/write
requests.  (My original thought of just raising the buffer size seems
like not as good an idea; it's less cache-friendly for one thing.)

I find that on both Linux and macOS-with-HFS, requesting a flush only
every 1MB seems to be a win compared to flushing every 64KB as we
currently do.  Actually it seems that on macOS, every increment of
increase in the flush distance helps, but with HFS you hit diminishing
returns after 1MB or so.  With APFS you need a flush distance of 32MB
or more to have credible performance.

Accordingly I propose the attached patch.  If anyone's interested in
experimenting on other platforms, we might be able to refine/complicate
the FLUSH_DISTANCE selection further, but I think this is probably good
enough for a first cut.

regards, tom lane

diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
index a5e074e..eae9f5a 100644
--- a/src/backend/storage/file/copydir.c
+++ b/src/backend/storage/file/copydir.c
@@ -139,10 +139,24 @@ copy_file(char *fromfile, char *tofile)
 	int			dstfd;
 	int			nbytes;
 	off_t		offset;
+	off_t		flush_offset;
 
-	/* Use palloc to ensure we get a maxaligned buffer */
+	/* Size of copy buffer (read and write requests) */
 #define COPY_BUF_SIZE (8 * BLCKSZ)
 
+	/*
+	 * Size of data flush requests.  It seems beneficial on most platforms to
+	 * do this every 1MB or so.  But macOS, at least with early releases of
+	 * APFS, is really unfriendly to small mmap/msync requests, so there do it
+	 * only every 32MB.
+	 */
+#if defined(__darwin__)
+#define FLUSH_DISTANCE (32 * 1024 * 1024)
+#else
+#define FLUSH_DISTANCE (1024 * 1024)
+#endif
+
+	/* Use palloc to ensure we get a maxaligned buffer */
 	buffer = palloc(COPY_BUF_SIZE);
 
 	/*
@@ -163,11 +177,23 @@ copy_file(char *fromfile, char *tofile)
 	/*
 	 * Do the data copying.
 	 */
+	flush_offset = 0;
 	for (offset = 0;; offset += nbytes)
 	{
 		/* If we got a cancel signal during the copy of the file, quit */
 		CHECK_FOR_INTERRUPTS();
 
+		/*
+		 * We fsync the files later, but during the copy, flush them every so
+		 * often to avoid spamming the cache and hopefully get the kernel to
+		 * start writing them out before the fsync comes.
+		 */
+		if (offset - flush_offset >= FLUSH_DISTANCE)
+		{
+			pg_flush_data(dstfd, flush_offset, offset - flush_offset);
+			flush_offset = offset;
+		}
+
 		pgstat_report_wait_start(WAIT_EVENT_COPY_FILE_READ);
 		nbytes = read(srcfd, buffer, COPY_BUF_SIZE);
 		pgstat_report_wait_end();
@@ -190,15 +216,11 @@ copy_file(char *fromfile, char *tofile)
 	 errmsg("could not write to file \"%s\": %m", tofile)));
 		}
 		pgstat_report_wait_end();
-
-		/*
-		 * We fsync the files later but first flush them to avoid spamming the
-		 * cache and hopefully get the kernel to start writing them out before
-		 * the fsync comes.
-		 */
-		pg_flush_data(dstfd, offset, nbytes);
 	}
 
+	if (offset > flush_offset)
+		pg_flush_data(dstfd, flush_offset, offset - flush_offset);
+
 	if (CloseTransientFile(dstfd))
 		ereport(ERROR,
 (errcode_for_file_access(),

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra

2017-10-04 Thread Tom Lane
Brent Dearth  writes:
> Is there an issue tracker I could be looking at to follow along on the
> progress on this issue?

This email thread is pretty much it ...

Current status is that I've filed a bug report with Apple and am waiting
to see their response before deciding what to do next.  If they fix the
issue promptly then there's little need for us to do anything.

If you want to help move things along, it would be useful to run some
experiments and see if there's a way to ameliorate the problem short of
the brute-force answer of disabling copy_file()'s pg_flush_data() call.
One thing that occurs to me is that even a 64KB file copy buffer is pretty
small on any modern machine.  If we were to up that to, say, 1MB, does
that help any?  See COPY_BUF_SIZE in src/backend/storage/file/copydir.c.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra

2017-10-04 Thread Brent Dearth
Tom, Andres -

Is there an issue tracker I could be looking at to follow along on the
progress on this issue?

Thanks so much!

On Mon, Oct 2, 2017 at 9:06 PM, Tom Lane  wrote:

> Andres Freund  writes:
> > On 2017-10-02 19:50:51 -0400, Tom Lane wrote:
> >> What I saw was that the backend process was consuming 100% of (one) CPU,
> >> while the I/O transaction rate viewed by "iostat 1" started pretty low
> >> --- under 10% of what the machine is capable of --- and dropped from
> >> there as the copy proceeded.  I did not think to check if that was user
> >> or kernel-space CPU, but I imagine it has to be the latter.
>
> > So that's pretty clearly a kernel bug... Hm. I wonder if it's mmap() or
> > msync() that's the problem here. I guess you didn't run a profile?
>
> Interestingly, profiling with Activity Monitor seems to blame the problem
> entirely on munmap() ... which squares with the place I hit every time
> when randomly stopping the process with gdb^Hlldb, so I'm inclined to
> believe it.
>
> This still offers no insight as to why CREATE DATABASE is hitting the
> problem while regular flush activity doesn't.
>
> > One interesting thing here is that in the CREATE DATABASE case there'll
> > probably be a lot larger contiguous mappings than in *_flush_after
> > cases. So it might be related to the size of the mapping / flush "unit".
>
> Meh, the mapping is only 64K in this case vs. 8K in the other.  Hard
> to credit that it breaks that easily.
>
> regards, tom lane
>


Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra

2017-10-02 Thread Tom Lane
Andres Freund  writes:
> On 2017-10-02 19:50:51 -0400, Tom Lane wrote:
>> What I saw was that the backend process was consuming 100% of (one) CPU,
>> while the I/O transaction rate viewed by "iostat 1" started pretty low
>> --- under 10% of what the machine is capable of --- and dropped from
>> there as the copy proceeded.  I did not think to check if that was user
>> or kernel-space CPU, but I imagine it has to be the latter.

> So that's pretty clearly a kernel bug... Hm. I wonder if it's mmap() or
> msync() that's the problem here. I guess you didn't run a profile?

Interestingly, profiling with Activity Monitor seems to blame the problem
entirely on munmap() ... which squares with the place I hit every time
when randomly stopping the process with gdb^Hlldb, so I'm inclined to
believe it.

This still offers no insight as to why CREATE DATABASE is hitting the
problem while regular flush activity doesn't.

> One interesting thing here is that in the CREATE DATABASE case there'll
> probably be a lot larger contiguous mappings than in *_flush_after
> cases. So it might be related to the size of the mapping / flush "unit".

Meh, the mapping is only 64K in this case vs. 8K in the other.  Hard
to credit that it breaks that easily.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra

2017-10-02 Thread Andres Freund
On 2017-10-02 19:50:51 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-10-02 18:33:17 -0400, Tom Lane wrote:
> >> I'm kind of surprised that machine B doesn't show obvious tanking in this
> >> test given that msync() makes it suck so badly at copying a database.
> >> I wonder what is different from the kernel's standpoint ... maybe the
> >> sheer number of different files mmap'd by a single process during the
> >> copy?
> 
> > Yea, that's curious. I've really no clue about OSX, so pardon my
> > question: With HEAD and CREATE DATABASE, is it IO blocked or kernel cpu
> > blocked?
> 
> What I saw was that the backend process was consuming 100% of (one) CPU,
> while the I/O transaction rate viewed by "iostat 1" started pretty low
> --- under 10% of what the machine is capable of --- and dropped from
> there as the copy proceeded.  I did not think to check if that was user
> or kernel-space CPU, but I imagine it has to be the latter.

So that's pretty clearly a kernel bug... Hm. I wonder if it's mmap() or
msync() that's the problem here. I guess you didn't run a profile?

One interesting thing here is that in the CREATE DATABASE case there'll
probably be a lot larger contiguous mappings than in *_flush_after
cases. So it might be related to the size of the mapping / flush "unit".

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra

2017-10-02 Thread Tom Lane
Andres Freund  writes:
> On 2017-10-02 18:33:17 -0400, Tom Lane wrote:
>> I'm kind of surprised that machine B doesn't show obvious tanking in this
>> test given that msync() makes it suck so badly at copying a database.
>> I wonder what is different from the kernel's standpoint ... maybe the
>> sheer number of different files mmap'd by a single process during the
>> copy?

> Yea, that's curious. I've really no clue about OSX, so pardon my
> question: With HEAD and CREATE DATABASE, is it IO blocked or kernel cpu
> blocked?

What I saw was that the backend process was consuming 100% of (one) CPU,
while the I/O transaction rate viewed by "iostat 1" started pretty low
--- under 10% of what the machine is capable of --- and dropped from
there as the copy proceeded.  I did not think to check if that was user
or kernel-space CPU, but I imagine it has to be the latter.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra

2017-10-02 Thread Andres Freund
Hi,

On 2017-10-02 18:33:17 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > To demonstrate what I'm observing here, on linux with a fairly fast ssd:
> > ...
> 
> I tried to replicate this test as closely as I could on the Mac hardware
> I have laying about.

Thanks!

> I only bothered with the synchronous_commit=off case, though, since
> you say that shows the worst effects.

That's the case on linux at least, but I'd suspect it's a much more
general thing - you just can't get that much data dirty with pgbench
otherwise.


> I'm kind of surprised that machine B doesn't show obvious tanking in this
> test given that msync() makes it suck so badly at copying a database.
> I wonder what is different from the kernel's standpoint ... maybe the
> sheer number of different files mmap'd by a single process during the
> copy?

Yea, that's curious. I've really no clue about OSX, so pardon my
question: With HEAD and CREATE DATABASE, is it IO blocked or kernel cpu
blocked?


> If we could arrange to not use pg_flush_after in copydir.c on macOS,
> I'd be okay with leaving it alone for the configurable flush_after
> calls.  But I can't think of a way to do that that wouldn't be a
> complete kluge.  I don't much want to do
> 
> +#ifndef __darwin__
>   pg_flush_data(dstfd, offset, nbytes);
> +#endif
> 
> but I don't see any better alternative ...

What we could do is introduce a guc (~create_database_flush_data) that
controls whether we flush here, and have the defaults set differently on
OSX. Not sure if that's actually any better.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra

2017-10-02 Thread Tom Lane
Andres Freund  writes:
> To demonstrate what I'm observing here, on linux with a fairly fast ssd:
> ...

I tried to replicate this test as closely as I could on the Mac hardware
I have laying about.  I only bothered with the synchronous_commit=off
case, though, since you say that shows the worst effects.  I used the
same parameters you did and the same pgbench settings.  I attach the
pgbench output for six cases, flush_after disabled or enabled on three
different machines:

(A) 2016 MacBook Pro, 2.7GHz i7 + SSD, Sierra, HFS+ file system
(B) 2013 MacBook Pro, 2.3GHz i7 + SSD, High Sierra, APFS file system
(C) 2012 Mac Mini, 2.3GHz i7 + 5400-RPM SATA, High Sierra, HFS+ file system

There is some benefit on the SSD machines, but it's in the range of a
few percent --- clearly, these kernels are not as subject to the basic
I/O-scheduling problem as Linux is.  The spinning-rust machine shows a
nice gain in overall TPS with flush enabled, but it's actually a bit
worse off in terms of the worst-case slowdown --- note that only that
case shows things coming to a complete halt.  It'd be interesting to
check the behavior of a pre-High-Sierra kernel with spinning rust,
but I don't have any remotely modern machine answering that description.

I'm kind of surprised that machine B doesn't show obvious tanking in this
test given that msync() makes it suck so badly at copying a database.
I wonder what is different from the kernel's standpoint ... maybe the
sheer number of different files mmap'd by a single process during the
copy?

> What I'm basically wondering is whether we're screwing somebody over
> that made the effort to manually configure this on OSX. It's fairly
> obvious we need to find a way to disable the msync() by default.

I suspect that anybody who cares about DB performance on macOS will
be running it on SSD-based hardware these days.  The benefit seen on
the Mac Mini would have been worth the trouble of a custom configuration
a few years ago, but I'm dubious that it matters in the real world
anymore.

If we could arrange to not use pg_flush_after in copydir.c on macOS,
I'd be okay with leaving it alone for the configurable flush_after
calls.  But I can't think of a way to do that that wouldn't be a
complete kluge.  I don't much want to do

+#ifndef __darwin__
pg_flush_data(dstfd, offset, nbytes);
+#endif

but I don't see any better alternative ...

regards, tom lane

progress: 1.0 s, 8132.9 tps, lat 0.978 ms stddev 0.434
progress: 2.0 s, 8841.0 tps, lat 0.905 ms stddev 0.260
progress: 3.0 s, 9020.1 tps, lat 0.887 ms stddev 0.418
progress: 4.0 s, 9005.9 tps, lat 0.888 ms stddev 0.353
progress: 5.0 s, 9167.1 tps, lat 0.873 ms stddev 0.259
progress: 6.0 s, 9248.1 tps, lat 0.865 ms stddev 0.333
progress: 7.0 s, 9295.0 tps, lat 0.861 ms stddev 0.325
progress: 8.0 s, 9435.0 tps, lat 0.848 ms stddev 0.190
progress: 9.0 s, 9453.0 tps, lat 0.846 ms stddev 0.261
progress: 10.0 s, 9717.1 tps, lat 0.823 ms stddev 0.230
progress: 11.0 s, 9658.8 tps, lat 0.828 ms stddev 0.179
progress: 12.0 s, 9332.8 tps, lat 0.857 ms stddev 0.110
progress: 13.0 s, 9430.2 tps, lat 0.848 ms stddev 0.272
progress: 14.0 s, 9479.0 tps, lat 0.844 ms stddev 0.143
progress: 15.0 s, 9381.0 tps, lat 0.853 ms stddev 0.132
progress: 16.0 s, 9464.8 tps, lat 0.845 ms stddev 0.260
progress: 17.0 s, 9563.3 tps, lat 0.836 ms stddev 0.183
progress: 18.0 s, 9646.0 tps, lat 0.829 ms stddev 0.138
progress: 19.0 s, 9554.0 tps, lat 0.837 ms stddev 0.146
progress: 20.0 s, 9296.0 tps, lat 0.861 ms stddev 0.213
progress: 21.0 s, 9255.9 tps, lat 0.864 ms stddev 0.174
progress: 22.0 s, 9224.1 tps, lat 0.867 ms stddev 0.163
progress: 23.0 s, 9381.2 tps, lat 0.853 ms stddev 0.226
progress: 24.0 s, 9263.9 tps, lat 0.864 ms stddev 0.225
progress: 25.0 s, 9320.8 tps, lat 0.858 ms stddev 0.226
progress: 26.0 s, 9538.8 tps, lat 0.837 ms stddev 0.185
progress: 27.0 s, 9588.3 tps, lat 0.836 ms stddev 0.192
progress: 28.0 s, 9720.0 tps, lat 0.823 ms stddev 0.135
progress: 29.0 s, 9958.0 tps, lat 0.803 ms stddev 0.121
progress: 30.0 s, 9080.1 tps, lat 0.881 ms stddev 0.249
progress: 31.0 s, 8969.8 tps, lat 0.892 ms stddev 0.334
progress: 32.0 s, 9059.9 tps, lat 0.883 ms stddev 0.210
progress: 33.0 s, 9052.2 tps, lat 0.884 ms stddev 0.308
progress: 34.0 s, 8738.9 tps, lat 0.915 ms stddev 0.313
progress: 35.0 s, 8969.0 tps, lat 0.891 ms stddev 0.484
progress: 36.0 s, 8959.8 tps, lat 0.893 ms stddev 0.530
progress: 37.0 s, 8943.3 tps, lat 0.895 ms stddev 0.486
progress: 38.0 s, 8593.7 tps, lat 0.930 ms stddev 0.699
progress: 39.0 s, 8715.1 tps, lat 0.919 ms stddev 0.661
progress: 40.0 s, 8990.3 tps, lat 0.890 ms stddev 0.366
progress: 41.0 s, 8940.8 tps, lat 0.895 ms stddev 0.459
progress: 42.0 s, 9226.0 tps, lat 0.867 ms stddev 0.278
progress: 43.0 s, 9149.2 tps, lat 0.874 ms stddev 0.353
progress: 44.0 s, 9252.0 tps, lat 0.865 ms stddev 0.215
progress: 45.0 s, 7473.9 tps, lat 1.070 ms stddev 2.826
progress: 46.0 s, 

Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra

2017-10-02 Thread Andres Freund
On 2017-10-02 15:59:05 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-10-02 15:54:43 -0400, Tom Lane wrote:
> >> Should I expect there to be any difference at all?  We don't enable
> >> *_flush_after by default on non-Linux platforms.
> 
> > Right, you'd have to enable that. But your patch would neuter an
> > intentionally enabled config too, no?
> 
> Well, if you want to suggest a specific scenario to try, I'm happy to.
> I am not going to guess as to what will satisfy you.

To demonstrate what I'm observing here, on linux with a fairly fast ssd:

with:
-c autovacuum_analyze_threshold=2147483647 # to avoid analyze snapshot issue
-c fsync=on
-c synchronous_commit=on
-c shared_buffers=4GB
-c max_wal_size=30GB
-c checkpoint_timeout=30s
-c checkpoint_flush_after=0
-c bgwriter_flush_after=0
and
pgbench -i -s 100 -q

a pgbench -M prepared -c 8 -j 8 -n -P1 -T 100
often has periods like:

synchronous_commit = on:
progress: 73.0 s, 395.0 tps, lat 20.029 ms stddev 4.001
progress: 74.0 s, 289.0 tps, lat 23.730 ms stddev 23.337
progress: 75.0 s, 88.0 tps, lat 104.029 ms stddev 178.038
progress: 76.0 s, 400.0 tps, lat 20.055 ms stddev 4.844
latency average = 21.599 ms
latency stddev = 13.865 ms
tps = 370.346506 (including connections establishing)
tps = 370.372550 (excluding connections establishing)

with synchronous_commit=off those periods are a lot worse:
progress: 57.0 s, 21104.3 tps, lat 0.379 ms stddev 0.193
progress: 58.0 s, 9994.1 tps, lat 0.536 ms stddev 3.140
progress: 59.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 60.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 61.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 62.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 63.0 s, 3319.6 tps, lat 12.860 ms stddev 253.664
progress: 64.0 s, 20997.0 tps, lat 0.381 ms stddev 0.190
progress: 65.0 s, 20409.1 tps, lat 0.392 ms stddev 0.303
...
latency average = 0.745 ms
latency stddev = 20.470 ms
tps = 10743.53 (including connections establishing)
tps = 10743.815591 (excluding connections establishing)

contrasting that to checkpoint_flush_after=256kB and
bgwriter_flush_after=512kB:

synchronous_commit=on
worst:
progress: 87.0 s, 298.0 tps, lat 26.874 ms stddev 26.691

latency average = 21.898 ms
latency stddev = 6.416 ms
tps = 365.308180 (including connections establishing)
tps = 365.318793 (excluding connections establishing)

synchronous_commit=on

worst:

progress: 30.0 s, 7026.8 tps, lat 1.137 ms stddev 11.070

latency average = 0.550 ms
latency stddev = 5.599 ms
tps = 14547.842213 (including connections establishing)
tps = 14548.325102 (excluding connections establishing)

If you do the same on rotational disks, the stall periods can get a
*lot* worse (multi-minute stalls with pretty much no activity).


What I'm basically wondering is whether we're screwing somebody over
that made the effort to manually configure this on OSX. It's fairly
obvious we need to find a way to disable the msync() by default.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra

2017-10-02 Thread Tom Lane
Andres Freund  writes:
> On 2017-10-02 15:54:43 -0400, Tom Lane wrote:
>> Should I expect there to be any difference at all?  We don't enable
>> *_flush_after by default on non-Linux platforms.

> Right, you'd have to enable that. But your patch would neuter an
> intentionally enabled config too, no?

Well, if you want to suggest a specific scenario to try, I'm happy to.
I am not going to guess as to what will satisfy you.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra

2017-10-02 Thread Andres Freund
On 2017-10-02 15:54:43 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-10-02 15:42:25 -0400, Tom Lane wrote:
> >> I experimented with this further by seeing whether the msync() code path
> >> is of any value on Sierra either.  The answer seems to be "no": cloning
> >> a scale-1000 pgbench database takes about 17-18 seconds on my Sierra
> >> laptop using unmodified HEAD, but if I dike out the msync() logic then
> >> it takes 16-17 seconds.  Both numbers jump around a little, but using
> >> msync is strictly worse.
> 
> > Well, that's only measuring one type of workload. Could you run a normal
> > pgbench with -P1 or so for 2-3 checkpoint cycles and see how big the
> > latency differences are?
> 
> Should I expect there to be any difference at all?  We don't enable
> *_flush_after by default on non-Linux platforms.

Right, you'd have to enable that. But your patch would neuter an
intentionally enabled config too, no?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra

2017-10-02 Thread Tom Lane
Andres Freund  writes:
> On 2017-10-02 15:42:25 -0400, Tom Lane wrote:
>> I experimented with this further by seeing whether the msync() code path
>> is of any value on Sierra either.  The answer seems to be "no": cloning
>> a scale-1000 pgbench database takes about 17-18 seconds on my Sierra
>> laptop using unmodified HEAD, but if I dike out the msync() logic then
>> it takes 16-17 seconds.  Both numbers jump around a little, but using
>> msync is strictly worse.

> Well, that's only measuring one type of workload. Could you run a normal
> pgbench with -P1 or so for 2-3 checkpoint cycles and see how big the
> latency differences are?

Should I expect there to be any difference at all?  We don't enable
*_flush_after by default on non-Linux platforms.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra

2017-10-02 Thread Andres Freund
On 2017-10-02 15:42:25 -0400, Tom Lane wrote:
> I wrote:
> > In short, therefore, APFS cannot cope with the way we're using msync().
> 
> I experimented with this further by seeing whether the msync() code path
> is of any value on Sierra either.  The answer seems to be "no": cloning
> a scale-1000 pgbench database takes about 17-18 seconds on my Sierra
> laptop using unmodified HEAD, but if I dike out the msync() logic then
> it takes 16-17 seconds.  Both numbers jump around a little, but using
> msync is strictly worse.

Well, that's only measuring one type of workload. Could you run a normal
pgbench with -P1 or so for 2-3 checkpoint cycles and see how big the
latency differences are?


> I propose therefore that an appropriate fix is to unconditionally disable
> the msync code path on Darwin, as we have already done for Windows.  When
> and if Apple changes their kernel so that this path is actually of some
> value, we can figure out how to detect whether to use it.

I'm inclined to think you're right.

This is a surprisingly massive regression for a new OS release - we're
not the only database that uses msync...


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra

2017-10-02 Thread Brent Dearth
Thanks for this breakdown Tom!

FWIW - I'm on Postgres 9.6.5 as bundled with Postgres.app (2.0.5) running
on 2013 MBP (2.7GHz i7 / 16GB / SSD) setup. It looks like this might be a
priority for an upcoming release, so I might try to hold out for
downstream, but thanks for the patch. It will help if we need get custom
builds out to fellow devs if this becomes too unbearable.

On Mon, Oct 2, 2017 at 1:42 PM, Tom Lane  wrote:

> I wrote:
> > In short, therefore, APFS cannot cope with the way we're using msync().
>
> I experimented with this further by seeing whether the msync() code path
> is of any value on Sierra either.  The answer seems to be "no": cloning
> a scale-1000 pgbench database takes about 17-18 seconds on my Sierra
> laptop using unmodified HEAD, but if I dike out the msync() logic then
> it takes 16-17 seconds.  Both numbers jump around a little, but using
> msync is strictly worse.
>
> I propose therefore that an appropriate fix is to unconditionally disable
> the msync code path on Darwin, as we have already done for Windows.  When
> and if Apple changes their kernel so that this path is actually of some
> value, we can figure out how to detect whether to use it.
>
> The msync logic seems to date back to this thread:
>
> https://www.postgresql.org/message-id/flat/alpine.DEB.2.
> 10.150601132.28433%40sto
>
> wherein Andres opined
> >> I think this patch primarily needs:
> >> * Benchmarking on FreeBSD/OSX to see whether we should enable the
> >>   mmap()/msync(MS_ASYNC) method by default. Unless somebody does so, I'm
> >>   inclined to leave it off till then.
>
> but so far as I can tell from the thread, only testing on FreeBSD ever
> got done.  So there's no evidence that this was ever beneficial on macOS,
> and we now have evidence that it's between counterproductive and
> catastrophic depending on which kernel version you look at.
>
> regards, tom lane
>


Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra

2017-10-02 Thread Tom Lane
I wrote:
> In short, therefore, APFS cannot cope with the way we're using msync().

I experimented with this further by seeing whether the msync() code path
is of any value on Sierra either.  The answer seems to be "no": cloning
a scale-1000 pgbench database takes about 17-18 seconds on my Sierra
laptop using unmodified HEAD, but if I dike out the msync() logic then
it takes 16-17 seconds.  Both numbers jump around a little, but using
msync is strictly worse.

I propose therefore that an appropriate fix is to unconditionally disable
the msync code path on Darwin, as we have already done for Windows.  When
and if Apple changes their kernel so that this path is actually of some
value, we can figure out how to detect whether to use it.

The msync logic seems to date back to this thread:

https://www.postgresql.org/message-id/flat/alpine.DEB.2.10.150601132.28433%40sto

wherein Andres opined
>> I think this patch primarily needs:
>> * Benchmarking on FreeBSD/OSX to see whether we should enable the
>>   mmap()/msync(MS_ASYNC) method by default. Unless somebody does so, I'm
>>   inclined to leave it off till then.

but so far as I can tell from the thread, only testing on FreeBSD ever
got done.  So there's no evidence that this was ever beneficial on macOS,
and we now have evidence that it's between counterproductive and
catastrophic depending on which kernel version you look at.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra

2017-10-02 Thread Tom Lane
Brent Dearth  writes:
> I just recently "upgraded" to High Sierra and experiencing horrendous CREATE
> DATABASE performance. Creating a database from a 3G template DB used to
> take ~1m but post-upgrade is taking ~22m at a sustained write of around
> 4MB/s. Occasionally, attempting to create an empty database hangs
> indefinitely as well. When this happens, restarting the Postgres server
> allows empty database initialization in ~1s.

What PG version are you running?

I tried to reproduce this, using HEAD and what I had handy:
(a) Late 2016 MacBook Pro, 2.7GHz i7, still on Sierra
(b) Late 2013 MacBook Pro, 2.3GHz i7, High Sierra, drive is converted to APFS

I made a ~7.5GB test database using "pgbench -i -s 500 bench" and
then cloned it with "create database b2 with template bench".

Case 1: fsync off.
Machine A did the clone in 5.6 seconds, machine B in 12.9 seconds.

Considering the CPU speed difference and the fact that Apple put
significantly faster SSDs into the 2016 models, I'm not sure this
difference is due to anything but better hardware.

Case 2: fsync on.
Machine A did the clone in 7.5 seconds, machine B in 2523.5 sec (42 min!).

So something is badly busted in APFS' handling of fsync, and/or
we're doing it in a bad way.

Interestingly, pg_test_fsync shows only about a factor-of-2 difference
in the timings for regular file fsyncs.  So I poked into non-fsync
logic that we'd added recently, and after awhile found that diking out
the msync code path in pg_flush_data reduces machine B's time to an
entirely reasonable 11.5 seconds.

In short, therefore, APFS cannot cope with the way we're using msync().
I observe that the copy gets slower and slower as it runs (watching the
transfer rate with "iostat 1"), which makes me wonder if there's some
sort of O(N^2) issue in the kernel logic for this.  But anyway, as
a short-term workaround you might try

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index b0c174284b..af35de5f7d 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -451,7 +451,7 @@ pg_flush_data(int fd, off_t offset, off_t nbytes)
return;
}
 #endif
-#if !defined(WIN32) && defined(MS_ASYNC)
+#if 0
{
void   *p;
static int  pagesize = 0;


regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Horrible CREATE DATABASE Performance in High Sierra

2017-10-02 Thread Brent Dearth
I just recently "upgraded" to High Sierra and experiencing horrendous CREATE
DATABASE performance. Creating a database from a 3G template DB used to
take ~1m but post-upgrade is taking ~22m at a sustained write of around
4MB/s. Occasionally, attempting to create an empty database hangs
indefinitely as well. When this happens, restarting the Postgres server
allows empty database initialization in ~1s.

I had been running on an encrypted APFS volume (FileVault), but after
dropping encryption, saw the tasks drop to about *~15m* per run. Still a
far cry from the previous *~1m* threshold.

A multi-threaded pg_restore seems to sustain writes of ~38M/s and completes
in about the same time as pre-upgrade (Sierra), so I'm not sure it's
entirely related to APFS / disk IO.

I've completely rebuilt the Postgres data directory, re-installed Postgres
(Postgres.app 2.0.5) etc. I don't have any reasonable explanation for what
could have broken so catastrophically.

Coworker has seen the exact same issue. Has anyone else experienced this
yet or have any insight as to what could be happening?


Thanks in advance!