Re: [HACKERS] checkpointer continuous flushing

Tomas Vondra Sat, 19 Mar 2016 04:04:01 -0700

Hi,

On 03/11/2016 02:34 AM, Andres Freund wrote:

> Hi,
>
> I just pushed the two major remaining patches in this thread. Let's see
> what the buildfarm has to say; I'd not be surprised if there's some
> lingering portability problem in the flushing code.
>
> There's one remaining issue we definitely want to resolve before the
> next release:  Right now we always use one writeback context across all
> tablespaces in a checkpoint, but Fabien's testing shows that that's
> likely to hurt in a number of cases. I've some data suggesting the
> contrary in others.
>
> Things that'd be good:
> * Some benchmarking. Right now controlled flushing is enabled by default
>   on linux, but disabled by default on other operating systems. Somebody
>   running benchmarks on e.g. freebsd or OSX might be good.

So I've done some benchmarks of this, and I think the results are very
good. I've compared a298a1e06 and 23a27b039d (so the two patches
mentioned here are in-between those two), and I've done a few long
pgbench runs - 24h each:


1) master (a298a1e06), regular pgbench
2) master (a298a1e06), throttled to 5000 tps
3) patched (23a27b039), regular pgbench
4) patched (23a27b039), throttled to 5000 tps

All of this was done on a quite large machine:

* 4 x CPU E5-4620 (2.2GHz)
* 256GB of RAM
* 24x SSD on LSI 2208 controller (with 1GB BBWC)

The page cache was using the default config, although in production
setups we'd probably lower the limits (particularly the background
threshold):


* vm.dirty_background_ratio = 10
* vm.dirty_ratio = 20
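
To give a feel for why those defaults are high on a box this size, here is a rough back-of-the-envelope sketch. It assumes the ratios apply to the full 256GB of RAM; the kernel actually computes them against the memory it considers dirtyable, which is somewhat less, so treat the numbers as upper bounds. The function name is made up for illustration.

```python
def dirty_threshold_gb(ram_gb: float, ratio_percent: float) -> float:
    """Approximate dirty-page threshold, assuming the ratio applies to all RAM.

    Assumption: the real kernel calculation uses "dirtyable" memory rather
    than total RAM, so the actual threshold is somewhat lower.
    """
    return ram_gb * ratio_percent / 100.0


# Values from the machine and sysctl settings described above.
background_gb = dirty_threshold_gb(256, 10)  # vm.dirty_background_ratio
blocking_gb = dirty_threshold_gb(256, 20)    # vm.dirty_ratio

print(f"background writeback starts at roughly {background_gb:.1f} GB dirty")
print(f"writers get throttled at roughly {blocking_gb:.1f} GB dirty")
```

So under that assumption some tens of GB of dirty data can pile up before the kernel starts writing it out in the background.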

The main PostgreSQL configuration changes are these:

* shared_buffers=64GB
* bgwriter_delay = 10ms
* bgwriter_lru_maxpages = 1000
* checkpoint_timeout = 30min
* max_wal_size = 64GB
* min_wal_size = 32GB

I haven't touched the flush_after values, so those are at default. Full
config in the github repo, along with all the results and scripts used
to generate the charts etc:


    https://github.com/tvondra/flushing-benchmark

I'd like to see some benchmarks on machines with regular rotational
storage, but I don't have a suitable system at hand.

The pgbench was scale 60000, so ~750GB of data on disk, and was executed
either like this (the "default"):


pgbench -c 32 -j 8 -T 86400 -l --aggregate-interval=1 pgbench

or like this ("throttled"):

pgbench -c 32 -j 8 -T 86400 -R 5000 -l --aggregate-interval=1 pgbench

The reason for the throttling is that people generally don't run
production databases 100% saturated, so it'd be sad to improve the 100%
saturated case and hurt the common case by increasing latency. The
machine does ~8000 tps, so 5000 tps is ~60% of that.

It's difficult to judge based on a single run (although a long one), but
it seems the throughput increased a tiny bit from 7725 to 8000. That's
~4% difference, but I guess more runs would be needed to see if this is
noise or actual improvement.
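
For reference, the two rounded percentages above (the "~60%" throttle level and the "~4%" throughput difference) can be rechecked with a few lines of Python. The helper name is made up; the inputs are the tps figures quoted in this mail.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a fraction."""
    return (after - before) / before


# Figures quoted in the text: 7725 tps (master) vs. 8000 tps (patched),
# and a 5000 tps throttle on a machine that does ~8000 tps.
throughput_gain = relative_change(7725, 8000)
throttle_fraction = 5000 / 8000

print(f"{throughput_gain:.1%}")    # 3.6%
print(f"{throttle_fraction:.1%}")  # 62.5%
```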

Now, let's look at the per-second results, i.e. how much the performance
fluctuates over time (due to checkpoints etc.). That's where the
aggregated log (per-second) gets useful, as it's used for generating the
various charts for tps, max latency, stddev of latency etc.

All those charts are CDF, i.e. cumulative distribution function, i.e.
they plot a metric on x-axis, and probability P(X <= x) on y-axis.

In general the steeper the curve the better (more consistent behavior
over time). It also allows comparing two curves - e.g. for tps metric
the "lower" curve is better, as it means higher values are more likely.
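
To make the CDF construction concrete, here is a minimal sketch of how such a curve can be derived from the aggregated pgbench log. It is not the script from the repository. It assumes the documented aggregate line layout (interval start, number of transactions, latency sum, sum of squared latencies, min latency, max latency, optionally followed by lag fields), that latencies are in microseconds, and that the per-thread logs have already been merged into one stream. The function names and the sample line are made up.

```python
from typing import Iterable, List, Tuple


def parse_aggregate_line(line: str) -> Tuple[int, int, float]:
    """Return (interval_start, num_transactions, max_latency_ms).

    Assumed field order: interval_start num_transactions latency_sum
    latency_2_sum min_latency max_latency [lag fields...].
    Assumed unit for latencies: microseconds.
    """
    fields = line.split()
    interval_start = int(fields[0])
    num_transactions = int(fields[1])
    max_latency_us = int(fields[5])
    return interval_start, num_transactions, max_latency_us / 1000.0


def empirical_cdf(samples: Iterable[float]) -> List[Tuple[float, float]]:
    """Return (x, P(X <= x)) points, sorted by x."""
    ordered = sorted(samples)
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]


# Made-up sample line, for illustration only.
sample = "1458385441 7900 4000000 2100000000 150 9000"
start, tps, max_lat_ms = parse_aggregate_line(sample)

# With --aggregate-interval=1 the transaction count per line is the tps.
for x, p in empirical_cdf([7900, 8100, 7600, 8000]):
    print(x, p)
```

Plotting x against p for each run gives curves like the ones described here: a steep curve means most seconds land in a narrow range of values.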


default (non-throttled) pgbench runs
------------------------------------

Let's see the regular (non-throttled) pgbench runs first:

* regular-tps.png (per-second TPS)

Clearly, the patched version is much more consistent - firstly it's much
less "wobbly" and it's considerably steeper, which means the per-second
throughput fluctuates much less. That's good.

We already know the total throughput is almost exactly the same (just 4%
difference); this also shows that the medians are almost exactly the
same (the curves intersect at pretty much exactly 50%).


* regular-max-lat.png (per-second maximum latency)
* regular-stddev-lat.png (per-second latency stddev)

Apparently the additional processing slightly increases both the maximum
latency and standard deviation, as the green line (patched) is
consistently below the pink one (unpatched).

Notice however that x-axis is using log scale, so the differences are
actually very small, and we also know that the total throughput slightly
increased. So while those two metrics slightly increased, the overall
impact on latency has to be positive.


throttled pgbench runs
----------------------

* throttled-tps.png (per-second TPS)

OK, this is great - the chart shows that the performance is way more
consistent. Originally ~10% of the samples were at ~2000 tps or below,
but with the flushing you'd have to go up to ~4600 tps to cover the same
share. It's actually pretty difficult to determine this from the chart,
because the curve got so steep, and I had to check the data used to
generate the charts.
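
Reading such a value off the data rather than off the chart boils down to a percentile lookup. Here is a minimal sketch using the nearest-rank method; the sample values are invented for illustration and are not taken from the benchmark results, and the function names are made up.

```python
import math
from typing import Sequence


def fraction_at_or_below(samples: Sequence[float], threshold: float) -> float:
    """Share of samples with value <= threshold, i.e. P(X <= threshold)."""
    return sum(1 for s in samples if s <= threshold) / len(samples)


def nearest_rank_percentile(samples: Sequence[float], percent: float) -> float:
    """Smallest sample value such that at least `percent` % are <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(percent / 100 * len(ordered)))
    return ordered[rank - 1]


# Invented per-second tps samples, NOT real benchmark data.
tps_samples = [2000, 4600, 4800, 4900, 5000, 5000, 5000, 5100, 5100, 5200]

print(fraction_at_or_below(tps_samples, 2000))   # 0.1
print(nearest_rank_percentile(tps_samples, 10))  # 2000
```

Comparing the 10th percentile of the per-second tps between the two runs gives the kind of "~2000 vs. ~4600" statement made above.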

Similarly for the upper end, but I assume that's a consequence of the
throttling not having to compensate for the "slow" seconds anymore.


* throttled-max-lat.png (per-second maximum latency)
* throttled-stddev-lat.png (per-second latency stddev)

This time the stddev/max latency charts are actually in favor of the
patched code. It's actually a bit worse for the low latencies (the green
line is below the pink one, so there are fewer low values), but then it
starts winning for higher values. And that's what counts when it comes
to consistency.

Again, notice that the x-axis is log scale, so the differences for large
values are actually way more significant than it might look.



So, good work I guess!

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
