On 2016-05-31 16:03:46 -0400, Robert Haas wrote:
> On Fri, May 27, 2016 at 12:37 AM, Andres Freund <and...@anarazel.de> wrote:
> > I don't think the situation is quite that simple. By *disabling* backend
> > flushing it's also easy to see massive performance regressions. In
> > situations where shared buffers was configured appropriately for the
> > workload (not the case here IIRC).
> On what kind of workload does setting backend_flush_after=0 represent
> a large regression vs. the default settings?
> I think we have to consider that pgbench and parallel copy are pretty
> common things to want to do, and a non-zero default setting hurts
> those workloads a LOT.
I don't think pgbench's workload has much to do with reality. Even less
so in the setup presented here.
The slowdown comes from the fact that default pgbench randomly, but
uniformly, updates a large table. Which is slower with
backend_flush_after if the workload is considerably bigger than
shared_buffers, but, and that's a very important restriction, the
workload at the same time largely fits into less than
/proc/sys/vm/dirty_ratio / 20% (probably even 10% /
/proc/sys/vm/dirty_background_ratio) of the free OS memory. The "trick"
in that case is that very often, before a buffer has been written back
to storage by the OS, it'll be re-dirtied by postgres. Which means
triggering flushing by postgres increases the total amount of writes.
That only matters if the kernel doesn't trigger writeback because of the
above ratios, or because of time limits (30s /
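
The re-dirtying effect described above can be illustrated with a toy model. This is only a sketch under loud assumptions: it is not how the kernel or postgres actually schedule writeback, the function name and the `expire_after` parameter are made up for illustration, and it models only a time limit, not the dirty ratios. Each element of the stream is one page handed to the OS; "eager" counts every such write as a storage write, while "lazy" lets a page sit dirty in the OS cache and absorbs re-dirtying until the page expires.

```python
import random
from collections import deque


def storage_writes(page_writes, expire_after=None):
    """Count storage writes caused by a stream of page writes.

    expire_after=None: eager flushing, every page write goes to storage.
    expire_after=N: a page stays dirty in the modelled OS cache and is
    written back once it has been dirty for N steps; re-dirtying it in
    the meantime costs no extra storage write.
    """
    if expire_after is None:
        return len(page_writes)

    dirty = set()    # pages currently dirty in the modelled OS cache
    queue = deque()  # (step_first_dirtied, page), oldest first
    writes = 0
    for step, page in enumerate(page_writes):
        # Time-based writeback of pages that have been dirty long enough.
        while queue and step - queue[0][0] >= expire_after:
            _, old = queue.popleft()
            dirty.discard(old)
            writes += 1
        if page not in dirty:
            dirty.add(page)
            queue.append((step, page))
    return writes + len(dirty)  # final writeback of what is still dirty


if __name__ == "__main__":
    rng = random.Random(0)
    # Uniform random updates over a small set of pages, pgbench-like.
    stream = [rng.randrange(1_000) for _ in range(100_000)]
    print("eager:", storage_writes(stream))
    print("lazy: ", storage_writes(stream, expire_after=5_000))
```

With a uniform stream over a page set that is small relative to the expiry window, most updates hit a page that is still dirty, so the lazy count ends up well below the eager one. Once the page set is much larger than what can stay dirty, the two counts converge.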
> I have a really hard time believing that the benefits on other
> workloads are large enough to compensate for the slowdowns we're
> seeing here.
As a random example, without looking for good parameters, on my laptop:
pgbench -i -q -s 1000
Ram: 24GB of memory
Storage: Samsung SSD 850 PRO 1TB, encrypted
postgres -c shared_buffers=6GB -c backend_flush_after=128 -c max_wal_size=100GB
-c fsync=on -c synchronous_commit=off
pgbench -M prepared -c 16 -j 16 -T 520 -P 1 -n -N
(note the -N)
latency average = 2.774 ms
latency stddev = 10.388 ms
tps = 5761.883323 (including connections establishing)
tps = 5762.027278 (excluding connections establishing)
latency average = 2.543 ms
latency stddev = 3.554 ms
tps = 6284.069846 (including connections establishing)
tps = 6284.184570 (excluding connections establishing)
Note the latency stddev, which is about 3x better in the second run
(3.554 ms vs. 10.388 ms). And the improved throughput (about 9% more tps).
That's for a workload which even fits into the OS memory. Without
backend flushing there's several periods looking like
progress: 249.0 s, 7237.6 tps, lat 1.997 ms stddev 4.365
progress: 250.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 251.0 s, 1880.6 tps, lat 17.761 ms stddev 169.682
progress: 252.0 s, 6904.4 tps, lat 2.328 ms stddev 3.256
i.e. moments in which no transactions are executed. And that's on
storage that can do 500MB/sec, and tens of thousands of IOPS.
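
To spot such stalls in a longer run, the `pgbench -P 1` progress lines can be scanned mechanically. A minimal sketch, assuming the progress line format shown above; the function name `find_stalls` and the `min_tps` threshold are hypothetical choices for illustration, not part of pgbench:

```python
import re

# Matches the start of a progress line: elapsed seconds and tps.
PROGRESS_RE = re.compile(r"^progress: (\d+(?:\.\d+)?) s, (\d+(?:\.\d+)?) tps")


def find_stalls(lines, min_tps=1.0):
    """Return (elapsed_seconds, tps) for intervals whose tps is below min_tps."""
    stalls = []
    for line in lines:
        m = PROGRESS_RE.match(line.strip())
        if m is None:
            continue  # not a progress line
        elapsed, tps = float(m.group(1)), float(m.group(2))
        if tps < min_tps:
            stalls.append((elapsed, tps))
    return stalls


# The four progress lines quoted in this mail.
SAMPLE = """\
progress: 249.0 s, 7237.6 tps, lat 1.997 ms stddev 4.365
progress: 250.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 251.0 s, 1880.6 tps, lat 17.761 ms stddev 169.682
progress: 252.0 s, 6904.4 tps, lat 2.328 ms stddev 3.256
""".splitlines()

if __name__ == "__main__":
    for elapsed, tps in find_stalls(SAMPLE):
        print(f"stall at {elapsed:.1f} s: {tps:.1f} tps")
```

Raising `min_tps` (for example to a fraction of the run's average tps) also catches the slow recovery interval right after a full stall.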
If you change to a workload that uses synchronous_commit, is
bigger than OS memory and/or doesn't have very fast storage, the
differences can be a *LOT* bigger.
In general, any workload which doesn't a) fit the above criteria of
likely re-dirtying blocks it already dirtied before kernel-triggered
writeback happens, or b) concurrently COPY into an individual file, is
likely to be faster (or unchanged if within s_b) with backend flushing.
Which means that transactional workloads that are bigger than the OS
memory, or which have a non-uniform distribution leading to some
locality, are likely to be faster. In practice those are *hugely* more
likely than the uniform distribution that pgbench has.
Similarly, this *considerably* reduces the impact a concurrently running
vacuum or COPY has on concurrent queries. Because suddenly VACUUM/COPY
can't create a couple gigabytes of dirty buffers which will be written
back at some random point in time later, stalling everything.
I think the benefits of more predictable (and often faster!)
performance in a bunch of actual real-world-ish workloads outweigh
optimizing for benchmarks.
> We have nobody writing in to say that
> backend_flush_after>0 is making things way better for them, and
> Ashutosh and I have independently hit massive slowdowns on unrelated
Actually, we have some evidence of that? Just so far not in this
thread, which I don't find particularly surprising.