On 03/11/2016 02:34 AM, Andres Freund wrote:

I just pushed the two major remaining patches in this thread. Let's see
what the buildfarm has to say; I'd not be surprised if there's some
lingering portability problem in the flushing code.

There's one remaining issue we definitely want to resolve before the
next release:  Right now we always use one writeback context across all
tablespaces in a checkpoint, but Fabien's testing shows that that's
likely to hurt in a number of cases. I've some data suggesting the
contrary in others.

Things that'd be good:
* Some benchmarking. Right now controlled flushing is enabled by default
  on linux, but disabled by default on other operating systems. Somebody
  running benchmarks on e.g. freebsd or OSX might be good.

So I've done some benchmarks of this, and I think the results are very good. I've compared a298a1e06 and 23a27b039d (so the two patches mentioned here are in between those two), and I've done a few long pgbench runs, 24h each:

1) master (a298a1e06), regular pgbench
2) master (a298a1e06), throttled to 5000 tps
3) patched (23a27b039), regular pgbench
4) patched (23a27b039), throttled to 5000 tps

All of this was done on a quite large machine:

* 4 x CPU E5-4620 (2.2GHz)
* 256GB of RAM
* 24x SSD on LSI 2208 controller (with 1GB BBWC)

The page cache was using the default config, although in production setups we'd probably lower the limits (particularly the background threshold):

* vm.dirty_background_ratio = 10
* vm.dirty_ratio = 20
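
To give an idea, lowering those thresholds might look something like this (illustrative values only, not what was used in these runs):

# setting the *_bytes variants overrides the corresponding *_ratio ones
sysctl -w vm.dirty_background_bytes=268435456   # 256MB
sysctl -w vm.dirty_bytes=2147483648             # 2GB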

The main PostgreSQL configuration changes are these:

* shared_buffers=64GB
* bgwriter_delay = 10ms
* bgwriter_lru_maxpages = 1000
* checkpoint_timeout = 30min
* max_wal_size = 64GB
* min_wal_size = 32GB

I haven't touched the flush_after values, so those are at default. Full config in the github repo, along with all the results and scripts used to generate the charts etc:


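For reference, the flush_after values mentioned above should be the new GUCs introduced by these patches:

* checkpoint_flush_after
* bgwriter_flush_after
* backend_flush_after
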
I'd like to see some benchmarks on machines with regular rotational storage, but I don't have a suitable system at hand.

The pgbench database was scale 60000, i.e. ~750GB of data on disk, and the benchmark was executed either like this (the "default"):

pgbench -c 32 -j 8 -T 86400 -l --aggregate-interval=1 pgbench

or like this ("throttled"):

pgbench -c 32 -j 8 -T 86400 -R 5000 -l --aggregate-interval=1 pgbench
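
For completeness, a database of that scale is created with something along the lines of:

pgbench -i -s 60000 pgbench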

The reason for the throttling is that people generally don't run production databases 100% saturated, so it'd be sad to improve the 100% saturated case and hurt the common case by increasing latency. The machine does ~8000 tps, so 5000 tps is ~60% of that.

It's difficult to judge based on a single run (although a long one), but it seems the throughput increased a tiny bit, from 7725 to 8000 tps. That's a ~4% difference, but I guess more runs would be needed to see if this is noise or an actual improvement.

Now, let's look at the per-second results, i.e. how much the performance fluctuates over time (due to checkpoints etc.). That's where the per-second aggregated log gets useful, as it's what the various charts of tps, max latency, stddev of latency etc. are generated from.

All those charts are CDFs (cumulative distribution functions), i.e. they plot a metric on the x-axis and the probability P(X <= x) on the y-axis.

In general, the steeper the curve the better (more consistent behavior over time). It also allows comparing two curves - e.g. for the tps metric the "lower" curve is better, because it means higher values are more likely.
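
For the record, building such a CDF from the per-second aggregated log is trivial; a rough sketch (not the actual script from the repo, assuming the default aggregated log format where the second column is the number of transactions per interval, and with a placeholder file name):

# print per-second tps values, sort them, and emit "value P(X <= value)" pairs
awk '{print $2}' pgbench_log.12345 | sort -n | \
    awk '{v[NR] = $1} END {for (i = 1; i <= NR; i++) printf "%s %.4f\n", v[i], i/NR}'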

default (non-throttled) pgbench runs

Let's see the regular (non-throttled) pgbench runs first:

* regular-tps.png (per-second TPS)

Clearly, the patched version is much more consistent - the curve is much less "wobbly" and considerably steeper, which means the per-second throughput fluctuates much less. That's good.

We already know the total throughput is almost exactly the same (just a ~4% difference); this chart also shows that the medians are almost exactly the same (the curves intersect at pretty much exactly 50%).

* regular-max-lat.png (per-second maximum latency)
* regular-stddev-lat.png (per-second latency stddev)

Apparently the additional processing slightly increases both the maximum latency and standard deviation, as the green line (patched) is consistently below the pink one (unpatched).

Notice however that the x-axis uses a log scale, so the differences are actually very small, and we also know that the total throughput slightly increased. So while those two metrics slightly increased, the overall impact on latency has to be positive.

throttled pgbench runs

* throttled-tps.png (per-second TPS)

OK, this is great - the chart shows that the performance is way more consistent. Originally, about 10% of the samples were at or below ~2000 tps, while with the flushing you have to go all the way to ~4600 tps to cover the same 10%. It's actually pretty difficult to tell this from the chart because the curve got so steep, so I had to check the data used to generate it.

Similarly for the upper end, but I assume that's a consequence of the throttling not having to compensate for the "slow" seconds anymore.
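
Checking that kind of thing directly in the data is easy enough - e.g. a rough way to get the 10th percentile of the per-second tps (assuming a file with one tps value per line; the file name is just a placeholder):

sort -n tps-per-second.txt | awk '{v[NR] = $1} END {print v[int(NR * 0.10)]}'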

* throttled-max-lat.png (per-second maximum latency)
* throttled-stddev-lat.png (per-second latency stddev)

This time the stddev/max latency charts are actually in favor of the patched code. It's a bit worse for the low latencies (the green line is below the pink one, so there are fewer low values), but then it starts winning for the higher values. And that's what counts when it comes to consistency.

Again, notice that the x-axis is log scale, so the differences for large values are actually way more significant than it might look.

So, good work I guess!


Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services