Hi,
On 05/10/2016 10:29 AM, Kevin Grittner wrote:
> On Mon, May 9, 2016 at 9:01 PM, Tomas Vondra
> <tomas.von...@2ndquadrant.com> wrote:
>> Over the past few days I've been running benchmarks on a fairly
>> large NUMA box (4 sockets, 32 cores / 64 with HT, 256GB of RAM)
>> to see the impact of the 'snapshot too old' feature - both when
>> disabled and enabled with various values in the
>> old_snapshot_threshold GUC.
> Thanks!
>> The benchmark is a simple read-only pgbench with prepared
>> statements, i.e. doing something like this:
>>
>>    pgbench -S -M prepared -j N -c N
> Do you have any plans to benchmark cases where the patch can have a
> benefit? (Clearly, nobody would be interested in using the feature
> with a read-only load; so while that makes a good "worst case"
> scenario and is very valuable for testing the "off" versus
> "reverted" comparison, it's not an intended use or one that's
> likely to happen in production.)

Yes, I'd like to repeat the tests with other workloads - I'm thinking
about regular (read-write) pgbench and perhaps something that'd qualify
as 'mostly read-only', though I don't have a clear idea yet of what that
mix should look like.

Feel free to propose other tests.
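
To be a bit more concrete, for the 'mostly read-only' case I'm thinking
of something like this (assuming the per-script weights recently added
to pgbench; the 90/10 split and the duration are just a first guess, not
a settled plan):

   # ~9 read-only transactions per read-write one, same prepared-statement
   # setup as before (N = number of clients/threads)
   pgbench -M prepared -j N -c N -T 300 \
           -b select-only@9 -b tpcb-like@1
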
>> master-10-new   - 91fd1df4 + old_snapshot_threshold=10
>> master-10-new-2 - 91fd1df4 + old_snapshot_threshold=10 (rerun)
> So, these runs were with identical software on the same data? Any
> differences are just noise?

Yes, same config. The differences are either noise or something
unexpected (like the sudden drops of tps with high client counts).
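
Just to spell out what the configuration names mean: the enabled and
disabled variants differ only in the old_snapshot_threshold GUC, i.e.
roughly the following (illustrative restart commands; -1 disables the
feature, 0 is 'immediate', positive values are minutes):

   pg_ctl restart -D "$PGDATA" -o "-c old_snapshot_threshold=-1"   # disabled
   pg_ctl restart -D "$PGDATA" -o "-c old_snapshot_threshold=0"    # immediate
   pg_ctl restart -D "$PGDATA" -o "-c old_snapshot_threshold=10"   # master-10-*
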
>> * The results are a bit noisy, but I think in general this shows
>> that for certain cases there's a clearly measurable difference
>> (up to 5%) between the "disabled" and "reverted" cases. This is
>> particularly visible on the smallest data set.
> In some cases, the differences are in favor of disabled over
> reverted.

Well, that's a good question. I think the results for the higher client
counts (>=64) are fairly noisy, so in those cases the differences may
easily be just noise. For the lower client counts the results seem much
less noisy, though.

>> * What's fairly strange is that on the largest dataset (scale
>> 10000), the "disabled" case is actually consistently faster than
>> "reverted" - that seems a bit suspicious, I think. It's possible
>> that I did the revert wrong, though - the revert.patch is
>> included in the tgz. This is why I also tested 689f9a05, but
>> that's also slower than "disabled".
> Since there is not a consistent win of disabled or reverted over
> the other, and what difference there is is often far less than the
> difference between the two runs with identical software, is there
> any reasonable interpretation of this except that the difference is
> "in the noise"?

Are we both looking at the results for scale 10000? I think there's a
pretty clear difference between "disabled" and "reverted" (or 689f9a05,
for that matter). The gap is also much larger than the gap between the
two "identical" runs (ignoring the runs with 128 clients).

>> * The performance impact with the feature enabled seems rather
>> significant, especially once you exceed the number of physical
>> cores (32 in this case). Then the drop is pretty clear - often
>> ~50% or more.
>>
>> * 7e3da1c4 claims to bring the performance within 5% of the
>> disabled case, but that seems not to be the case.
> The commit comment says "At least in the tested case this brings
> performance within 5% of when the feature is off, compared to
> several times slower without this patch." The tested case was a
> read-write load, so your read-only tests do nothing to determine
> whether this was the case in general for this type of load.
> Partly, the patch decreases chasing through HOT chains and
> increases the number of HOT updates, so there are compensating
> benefits of performing early vacuum in a read-write load.

OK. Sadly the commit message does not mention what the tested case was,
so I wasn't really sure ...
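
For the next round of tests I plan to include a plain read-write run
too, which should be much closer to that kind of workload - just the
standard TPC-B-like pgbench, something like (duration is arbitrary):

   pgbench -M prepared -j N -c N -T 300
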
>> What it does, however, is bring the 'non-immediate' cases close to
>> the immediate ones (before, the performance drop came much sooner
>> in those cases - at 16 clients).
> Right. This is, of course, just the first optimization that we
> were able to get in "under the wire" before beta, but the other
> optimizations under consideration would only tend to bring the
> "enabled" cases closer together in performance, not make an enabled
> case perform the same as when the feature was off -- especially for
> a read-only workload.

OK

>> * It also seems to me the feature greatly amplifies the
>> variability of the results, somehow. It's not uncommon to see
>> results like this:
>>
>>    master-10-new-2   235516   331976   133316   155563   133396
>>
>> where after the first runs (already fairly variable) the
>> performance tanks to ~50%. This happens particularly with higher
>> client counts; otherwise the max-min spread is within ~5% of the
>> max.
>>
>> There are a few cases where this happens without the feature
>> (i.e. old master, reverted or disabled), but it's usually much
>> smaller than with it enabled (immediate, 10 or 60). See the
>> 'summary' sheet in the ODS spreadsheet.
>>
>> I don't know what the problem is here - at first I thought that
>> maybe something else was running on the machine, or that
>> anti-wraparound autovacuum kicked in, but that seems not to be
>> the case. There's nothing like that in the postgres log (also
>> included in the .tgz).

> I'm inclined to suspect NUMA effects. It would be interesting to
> try with the NUMA patch and cpuset I submitted a while back or with
> fixes in place for the Linux scheduler bugs which were reported
> last month. Which kernel version was this?

I can try that, sure. Can you point me to the latest versions of the
patches, possibly rebased to current master if needed?

The kernel is 3.19.0-031900-generic.
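
In the meantime I can do a quick sanity check on the NUMA side, roughly
along these lines (just a sketch, the exact numactl/cpuset setup may
need tweaking):

   numactl --hardware      # NUMA nodes and per-node memory
   uname -r                # 3.19.0-031900-generic

   # pin the server to a single socket and its local memory, to see
   # whether cross-node traffic is behind the sudden drops
   numactl --cpunodebind=0 --membind=0 pg_ctl start -D "$PGDATA"
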
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services