Hi,

On 12/31/2016 04:00 PM, Magnus Hagander wrote:
> Cycling back to this topic again, but this time at the beginning of a CF.
>
> Here's an actual patch to change:
>
> wal_level=replica
> max_wal_senders=10
> max_replication_slots=20
>
> Based on feedback from last year
> (https://www.postgresql.org/message-id/CABUevEwfV7zDutescm2PHGvsJdYA0RWHFMTRGhwrJPGgSbzZDQ%40mail.gmail.com):
>
> There were requests for benchmarks of performance difference. Tomas
> has promised to run a couple of benchmarks on his standard
> benchmarking setups to give numbers on that. Thanks Tomas, please
> pipe in with your results when you have them!

As promised, I'm running some benchmarks, and I have some early results to report. And perhaps we can discuss whether we need to test some additional workloads.

I'm 100% on board with the idea that we should switch to a wal_level that allows taking backups or setting up a streaming replica, as long as it does not cause a severe performance regression in common workloads.

It'd be trivial to construct workloads demonstrating the optimizations available only with wal_level=minimal (e.g. initial loads doing CREATE TABLE + COPY + CREATE INDEX in a single transaction), but that would be mostly irrelevant, I guess.
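For illustration (table and input file names are made up), this is the shape of workload wal_level=minimal optimizes: because the table is created in the same transaction, the COPY and the index build can skip WAL entirely and just sync the relation files at commit:

```sql
BEGIN;
-- relation created in this transaction, so under wal_level=minimal
-- the COPY below does not have to be WAL-logged
CREATE TABLE t (id int, payload text);
COPY t FROM '/tmp/data.csv' (FORMAT csv);  -- hypothetical input file
CREATE INDEX t_id_idx ON t (id);
COMMIT;
```

With wal_level=replica or higher, all of that data has to go through WAL, which is exactly the difference such a synthetic benchmark would exaggerate.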

Instead, I've decided to run the regular pgbench TPC-B-like workload at a bunch of different scales, and measure throughput + some xlog stats with each of the three wal_level options.
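Roughly like this (client count, duration and scale shown here are placeholders; the exact values for each run are in the repos linked below):

```shell
# initialize the TPC-B-like dataset; the actual scales were
# 50 on the i5 box and 100 on the e5 box
pgbench -i -s 100 bench

# run the regular (read-write, TPC-B-like) workload;
# -c/-j/-T values here are illustrative only
pgbench -c 16 -j 8 -T 3600 bench
```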

Note: I tweaked the code a bit to allow archiving with wal_level=minimal, so that I can compute WAL stats on the archived segments (instead of having to keep all segments in the data directory).
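(Stock PostgreSQL refuses archive_mode=on when wal_level=minimal, hence the code tweak. The archiving setup itself is the usual one; the archive destination below is just an illustration:)

```
archive_mode = on
archive_command = 'cp %p /mnt/archive/%f'   # illustrative destination
```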

As usual, I'm running it on two machines - a small old one (i5-2500k box with 4 cores and 8GB of RAM) and a new one (2x e5-2620v4 with 16/32 cores, 64GB of RAM). Both machines have SSD-based storage.

The clusters on both machines were reasonably tuned, see 'settings.log' for each run. The tests are fairly long, covering multiple checkpoints etc. In other words, the results should be fairly stable.

The scripts/results/stats/configs are available here:

* https://bitbucket.org/tvondra/wal-levels-e2620-v4/src
* https://bitbucket.org/tvondra/wal-levels-i5/src

So far I only have results for the smallest data sets (scale 50 on the i5, scale 100 on the e5), which easily fit into shared_buffers in both cases. The numbers (throughput in tps) look like this:

                minimal      replica      standby
  ------------------------------------------------
  i5-2500k         5884         5896         5873
  e5-2620v4       23968        24393        24259

So the performance penalty of the replica/standby WAL levels on this workload is pretty much non-existent. On the larger machine those levels are actually a tad faster than 'minimal' (replica by ~1.8%, standby by ~1.2%), but differences within 2% might easily be noise.

I'll push results for larger ones once those tests complete (possibly tomorrow).


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers