On 09/23/2016 03:07 PM, Amit Kapila wrote:
On Fri, Sep 23, 2016 at 6:16 PM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
On 09/23/2016 01:44 AM, Tomas Vondra wrote:

The 4.5 kernel clearly changed the results significantly:


(c) Although it's not visible in the results, 4.5.5 almost perfectly
eliminated the fluctuations between runs. For example, where 3.2.80
produced these results (10 runs with the same parameters):

    12118 11610 27939 11771 18065
    12152 14375 10983 13614 11077

we get this on 4.5.5:

    37354 37650 37371 37190 37233
    38498 37166 36862 37928 38509

Notice how much more even the 4.5.5 results are, compared to 3.2.80.
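
To put a number on "more even", here is a minimal sketch that computes the mean and the coefficient of variation (sample standard deviation divided by the mean) for the two series above. The variable and function names are only illustrative, and the coefficient of variation is just one reasonable way to quantify the spread, not something the benchmark scripts themselves report.

```python
from statistics import mean, stdev

# tps values copied from the two 10-run series quoted above
runs_3_2_80 = [12118, 11610, 27939, 11771, 18065,
               12152, 14375, 10983, 13614, 11077]
runs_4_5_5 = [37354, 37650, 37371, 37190, 37233,
              38498, 37166, 36862, 37928, 38509]


def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean (unitless)."""
    return stdev(samples) / mean(samples)


for label, runs in (("3.2.80", runs_3_2_80), ("4.5.5", runs_4_5_5)):
    cv = coefficient_of_variation(runs)
    print(f"{label}: mean={mean(runs):.0f} tps, cv={cv:.1%}")
```

A lower coefficient of variation means the runs are more tightly clustered around their mean.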

The more I think about these random spikes in pgbench performance on 3.2.80,
the more I find them intriguing. Let me show you another example (from
Dilip's workload and group-update patch on 64 clients).

This is on 3.2.80:

  44175  34619  51944  38384  49066
  37004  47242  36296  46353  36180

and on 4.5.5 it looks like this:

  34400  35559  35436  34890  34626
  35233  35756  34876  35347  35486

So the 4.5.5 results are much more even, but overall clearly below 3.2.80.
How does 3.2.80 manage to do ~50k tps in some of the runs? Clearly we
randomly do something right, but what is it and why doesn't it happen on the
new kernel? And how could we do it every time?
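
For anyone who wants to compare the two series side by side, a small sketch along these lines summarizes each one and counts how many 3.2.80 runs beat the best 4.5.5 run. The helper name summarize is hypothetical; the numbers are the ones quoted above.

```python
from statistics import mean, median

# tps values from Dilip's workload with the group-update patch, 64 clients
runs_3_2_80 = [44175, 34619, 51944, 38384, 49066,
               37004, 47242, 36296, 46353, 36180]
runs_4_5_5 = [34400, 35559, 35436, 34890, 34626,
              35233, 35756, 34876, 35347, 35486]


def summarize(samples):
    """Return basic location statistics for one series of tps values."""
    return {
        "min": min(samples),
        "median": median(samples),
        "mean": mean(samples),
        "max": max(samples),
    }


for label, runs in (("3.2.80", runs_3_2_80), ("4.5.5", runs_4_5_5)):
    print(label, summarize(runs))

# How many 3.2.80 runs are faster than the best run on 4.5.5?
best_4_5_5 = max(runs_4_5_5)
faster = sum(1 for tps in runs_3_2_80 if tps > best_4_5_5)
print(f"{faster} of {len(runs_3_2_80)} runs on 3.2.80 exceed {best_4_5_5} tps")
```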

As far as I can see you are using default values of min_wal_size,
max_wal_size, and the checkpoint-related params. Have you changed the
default shared_buffers setting? That can have a bigger impact.

Huh? Where do you see me using default values? There is a settings.log with a dump of pg_settings data, and the modified values are:

checkpoint_completion_target = 0.9
checkpoint_timeout = 3600
effective_io_concurrency = 32
log_autovacuum_min_duration = 100
log_checkpoints = on
log_line_prefix = %m
log_timezone = UTC
maintenance_work_mem = 524288
max_connections = 300
max_wal_size = 8192
min_wal_size = 1024
shared_buffers = 2097152
synchronous_commit = on
work_mem = 524288

(ignoring some irrelevant stuff like locales, timezone etc.).

Using default values of mentioned parameters can lead to checkpoints in
between your runs.

So I'm using 16GB shared buffers (so with scale 300 everything fits into shared buffers), min_wal_size=16GB, max_wal_size=128GB, checkpoint timeout 1h etc. So no, there are no checkpoints during the 5-minute runs, only those triggered explicitly before each run.
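
The raw pg_settings numbers map to those sizes as in the sketch below. It assumes the units pg_settings reports on this version: 8 kB blocks for shared_buffers, 16 MB WAL segments for min_wal_size and max_wal_size, and seconds for checkpoint_timeout. The constant names are only illustrative.

```python
# Assumed units (not stated in the dump above):
BLOCK_KB = 8          # shared_buffers is counted in 8 kB blocks
WAL_SEGMENT_MB = 16   # min/max_wal_size are counted in 16 MB WAL segments

# Raw values as listed in the settings dump
settings = {
    "shared_buffers": 2097152,
    "min_wal_size": 1024,
    "max_wal_size": 8192,
    "checkpoint_timeout": 3600,  # seconds
}

shared_buffers_gb = settings["shared_buffers"] * BLOCK_KB / (1024 * 1024)
min_wal_gb = settings["min_wal_size"] * WAL_SEGMENT_MB / 1024
max_wal_gb = settings["max_wal_size"] * WAL_SEGMENT_MB / 1024
checkpoint_timeout_min = settings["checkpoint_timeout"] / 60

print(f"shared_buffers     = {shared_buffers_gb:g} GB")
print(f"min_wal_size       = {min_wal_gb:g} GB")
print(f"max_wal_size       = {max_wal_gb:g} GB")
print(f"checkpoint_timeout = {checkpoint_timeout_min:g} min")
```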

Also, I think instead of 5 mins, read-write runs should be run for 15
mins to get consistent data.

Where does the inconsistency come from? Lack of warmup? Considering how uniform the results from the 10 runs are (at least on 4.5.5), I claim this is not an issue.

For Dilip's workload where he is using only Select ... For Update, I
think it is okay, but otherwise you need to drop and re-create the
database between each run, otherwise data bloat could impact the

And why should it affect 3.2.80 and 4.5.5 differently?

I think in general, the impact should be the same for both kernels
because you are using the same parameters, but I think if you use
appropriate parameters, then you can get consistent results for
3.2.80. I have also seen variation in read-write tests, but the
variation you are showing is really a matter of concern, because it
will be difficult to rely on final data.

Both kernels use exactly the same parameters (fairly tuned, IMHO).

Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)