On Tue, Nov 12, 2019 at 10:49:49AM +0000, k.jami...@fujitsu.com wrote:
On Thurs, November 7, 2019 1:27 AM (GMT+9), Robert Haas wrote:
On Tue, Nov 5, 2019 at 10:34 AM Tomas Vondra <tomas.von...@2ndquadrant.com>
wrote:
> 2) This adds another hashtable maintenance to BufferAlloc etc. but
> you've only done tests / benchmark for the case this optimizes. I
> think we need to see a benchmark for workload that allocates and
> invalidates lot of buffers. A pgbench with a workload that fits into
> RAM but not into shared buffers would be interesting.
Yeah, it seems pretty hard to believe that this won't be bad for some workloads.
Not only do you have the overhead of the hash table operations, but you also
have locking overhead around that. A whole new set of LWLocks where you have
to take and release one of them every time you allocate or invalidate a buffer
seems likely to cause a pretty substantial contention problem.
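(To illustrate the kind of extra work being discussed: on every buffer
allocation or invalidation the patch has to do something along these lines in
addition to the existing buffer-mapping maintenance. This is only a sketch,
not code from the patch; CachedBufHash, CachedBufPartitionLock and newTag are
invented names standing in for the cached-buffer hash table, one of the new
LWLocks, and the buffer tag being inserted:

    uint32   hashcode;
    LWLock  *partitionLock;
    bool     found;

    /* hash the buffer tag and pick the corresponding new LWLock */
    hashcode = get_hash_value(CachedBufHash, (void *) &newTag);
    partitionLock = CachedBufPartitionLock(hashcode);

    /* extra acquire/release on a hot path -> potential contention */
    LWLockAcquire(partitionLock, LW_EXCLUSIVE);
    hash_search_with_hash_value(CachedBufHash, (void *) &newTag,
                                hashcode, HASH_ENTER, &found);
    LWLockRelease(partitionLock);

i.e. one extra hash lookup plus one extra lock acquire/release per buffer
allocated or invalidated.)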
I'm sorry for the late reply. Thank you Tomas and Robert for checking this
patch.
Attached is the v3 of the patch.
- I moved the unnecessary items from buf_internals.h to cached_buf.c since most
of those items are only used in that file.
- Fixed the bug in v2. It seems to pass both the regression and TAP tests now.
Thanks for the advice on the benchmark test. Please see the test setup and
results below.
[Machine spec]
CPU: 16, Number of cores per socket: 8
RHEL6.5, Memory: 240GB
scale: 3125 (about 46GB DB size)
shared_buffers = 8GB
[workload that fits into RAM but not into shared buffers]
pgbench -i -s 3125 cachetest
pgbench -c 16 -j 8 -T 600 cachetest
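(Rough sizing check, assuming the usual ~15 MB of table-plus-index data per
pgbench scale unit:

    3125 * ~15 MB  ≈ 46 GB   -- whole dataset fits in the 240 GB of RAM
    shared_buffers =  8 GB   -- dataset is ~6x larger, so buffers are
                                constantly evicted and re-allocated

which matches the "fits into RAM but not into shared buffers" case asked for
above.)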
[Patched]
scaling factor: 3125
query mode: simple
number of clients: 16
number of threads: 8
duration: 600 s
number of transactions actually processed: 8815123
latency average = 1.089 ms
tps = 14691.436343 (including connections establishing)
tps = 14691.482714 (excluding connections establishing)
[Master/Unpatched]
...
number of transactions actually processed: 8852327
latency average = 1.084 ms
tps = 14753.814648 (including connections establishing)
tps = 14753.861589 (excluding connections establishing)
My patch caused an overhead of about 0.42-0.46%, which I think is small.
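(The overhead figures follow directly from the numbers above:

    tps:     1 - 14691.44 / 14753.81 ≈ 0.42% lower
    latency: 1.089 / 1.084 - 1       ≈ 0.46% higher
)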
Kindly let me know your opinions/comments about the patch or tests, etc.
Now try measuring that with a read-only workload, with prepared
statements. I've tried that on a machine with 16 cores, doing
# 16 clients
pgbench -n -S -j 16 -c 16 -M prepared -T 60 test
# 1 client
pgbench -n -S -c 1 -M prepared -T 60 test
and the averages from 30 runs of each look like this:
 # clients      master     patched        %
---------------------------------------------------------
         1       29690       27833     93.7%
        16      300935      283383     94.1%
That's quite a significant regression, considering it's optimizing an
operation that is expected to be pretty rare (people generally don't drop
objects as often as they query them).
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services