On 09/23/2016 03:07 PM, Amit Kapila wrote:
On Fri, Sep 23, 2016 at 6:16 PM, Tomas Vondra
On 09/23/2016 01:44 AM, Tomas Vondra wrote:
The 4.5 kernel clearly changed the results significantly:
(c) Although it's not visible in the results, 4.5.5 almost perfectly
eliminated the fluctuations in the results. For example when 3.2.80
produced these results (10 runs with the same parameters):
12118 11610 27939 11771 18065
12152 14375 10983 13614 11077
we get this on 4.5.5:
37354 37650 37371 37190 37233
38498 37166 36862 37928 38509
Notice how much more even the 4.5.5 results are, compared to 3.2.80.
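To put a number on "more even", here is a minimal sketch that computes the mean, sample standard deviation and coefficient of variation for the two series above. The tps values are copied from this message; the `summarize` helper and the choice of coefficient of variation as the metric are illustrative assumptions, not part of the original benchmark tooling.

```python
from statistics import mean, stdev

# tps values copied from the two 10-run series quoted above
runs = {
    "3.2.80": [12118, 11610, 27939, 11771, 18065,
               12152, 14375, 10983, 13614, 11077],
    "4.5.5": [37354, 37650, 37371, 37190, 37233,
              38498, 37166, 36862, 37928, 38509],
}


def summarize(tps):
    """Return (mean, sample stdev, coefficient of variation) for one series."""
    m = mean(tps)
    s = stdev(tps)
    return m, s, s / m


# Hypothetical summary table, keyed by kernel version
summary = {kernel: summarize(tps) for kernel, tps in runs.items()}

for kernel, (m, s, cv) in summary.items():
    print(f"{kernel}: mean={m:.0f} tps, stdev={s:.0f}, cv={cv:.1%}")
```

A small coefficient of variation means the runs are tightly clustered around their mean, which is the property that separates 4.5.5 from 3.2.80 here.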
The more I think about these random spikes in pgbench performance on 3.2.80,
the more I find them intriguing. Let me show you another example (from
Dilip's workload and group-update patch on 64 clients).
This is on 3.2.80:
44175 34619 51944 38384 49066
37004 47242 36296 46353 36180
and on 4.5.5 it looks like this:
34400 35559 35436 34890 34626
35233 35756 34876 35347 35486
So the 4.5.5 results are much more even, but overall clearly below 3.2.80.
How does 3.2.80 manage to do ~50k tps in some of the runs? Clearly we
randomly do something right, but what is it and why doesn't it happen on the
new kernel? And how could we do it every time?
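A rough way to see how often 3.2.80 "does something right" is to count the runs that beat the best 4.5.5 run. The sketch below uses only the numbers from the 64-client example above; the variable names are made up for illustration.

```python
# tps values copied from the 64-client group-update example above
tps_3_2_80 = [44175, 34619, 51944, 38384, 49066,
              37004, 47242, 36296, 46353, 36180]
tps_4_5_5 = [34400, 35559, 35436, 34890, 34626,
             35233, 35756, 34876, 35347, 35486]

# Best result the new kernel ever achieved in this series
best_4_5_5 = max(tps_4_5_5)

# Runs on the old kernel that were faster than that best result
faster_old_runs = [t for t in tps_3_2_80 if t > best_4_5_5]

mean_old = sum(tps_3_2_80) / len(tps_3_2_80)
mean_new = sum(tps_4_5_5) / len(tps_4_5_5)

# Range (max - min) as a crude measure of run-to-run fluctuation
spread_old = max(tps_3_2_80) - min(tps_3_2_80)
spread_new = max(tps_4_5_5) - min(tps_4_5_5)

print(f"3.2.80 runs above best 4.5.5 run: {len(faster_old_runs)} of {len(tps_3_2_80)}")
print(f"mean tps: 3.2.80={mean_old:.0f}, 4.5.5={mean_new:.0f}")
print(f"range:    3.2.80={spread_old}, 4.5.5={spread_new}")
```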
As far as I can see you are using default values of min_wal_size,
max_wal_size, checkpoint related params. Have you changed the default
shared_buffers setting? That can have a bigger impact.
Huh? Where do you see me using default values? There is a settings.log
with a dump of pg_settings data, and the modified values are:
checkpoint_completion_target = 0.9
checkpoint_timeout = 3600
effective_io_concurrency = 32
log_autovacuum_min_duration = 100
log_checkpoints = on
log_line_prefix = %m
log_timezone = UTC
maintenance_work_mem = 524288
max_connections = 300
max_wal_size = 8192
min_wal_size = 1024
shared_buffers = 2097152
synchronous_commit = on
work_mem = 524288
(ignoring some irrelevant stuff like locales, timezone etc.).
Using default values of mentioned parameters can lead to checkpoints in
between your runs.
So I'm using 16GB shared buffers (so with scale 300 everything fits into
shared buffers), min_wal_size=16GB, max_wal_size=128GB, checkpoint
timeout 1h etc. So no, there are no checkpoints during the 5-minute
runs, only those triggered explicitly before each run.
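The raw numbers in the pg_settings dump are expressed in each parameter's base unit, which is why 2097152 and 16GB describe the same shared_buffers value. A minimal sketch of the conversion follows; the unit assignments (8kB pages for shared_buffers, 16MB WAL segments for min_wal_size and max_wal_size, kB for work_mem) are assumptions based on what the pg_settings `unit` column reports for servers of this era, and they agree with the sizes stated above.

```python
KIB = 1024
GIB = 1024 ** 3

# Bytes per base unit (assumed, as reported in the pg_settings "unit" column)
UNIT_BYTES = {
    "8kB": 8 * KIB,          # buffer pages
    "kB": KIB,               # memory settings
    "16MB": 16 * KIB * KIB,  # WAL segments
}

# (raw value from the settings dump above, assumed unit)
settings = {
    "shared_buffers": (2097152, "8kB"),
    "min_wal_size": (1024, "16MB"),
    "max_wal_size": (8192, "16MB"),
    "work_mem": (524288, "kB"),
    "maintenance_work_mem": (524288, "kB"),
}


def to_gib(value, unit):
    """Convert a raw pg_settings value in the given unit to GiB."""
    return value * UNIT_BYTES[unit] / GIB


sizes_gib = {name: to_gib(value, unit) for name, (value, unit) in settings.items()}

for name, gib in sizes_gib.items():
    print(f"{name} = {gib:g} GiB")
```

Likewise, checkpoint_timeout = 3600 is in seconds, which is the 1h mentioned above.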
Also, I think instead of 5 mins, read-write runs should be run for 15
mins to get consistent data.
Where does the inconsistency come from? Lack of warmup? Considering how
uniform the results from the 10 runs are (at least on 4.5.5), I claim
this is not an issue.
For Dilip's workload where he is using only Select ... For Update, I
think it is okay, but otherwise you need to drop and re-create the
database between each run, otherwise data bloat could impact the
And why should it affect 3.2.80 and 4.5.5 differently?
I think in general, the impact should be same for both the kernels
because you are using same parameters, but I think if you use
appropriate parameters, then you can get consistent results for
3.2.80. I have also seen variation in read-write tests, but the
variation you are showing is really a matter of concern, because it
will be difficult to rely on final data.
Both kernels use exactly the same parameters (fairly tuned, IMHO).
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services