On Tue, Nov 12, 2019 at 10:49:49AM +0000, k.jami...@fujitsu.com wrote:
On Thurs, November 7, 2019 1:27 AM (GMT+9), Robert Haas wrote:
On Tue, Nov 5, 2019 at 10:34 AM Tomas Vondra <tomas.von...@2ndquadrant.com>
wrote:
> 2) This adds another hashtable maintenance to BufferAlloc etc. but
>     you've only done tests / benchmark for the case this optimizes. I
>     think we need to see a benchmark for workload that allocates and
>     invalidates lot of buffers. A pgbench with a workload that fits into
>     RAM but not into shared buffers would be interesting.

Yeah, it seems pretty hard to believe that this won't be bad for some workloads.
Not only do you have the overhead of the hash table operations, but you also
have locking overhead around that. A whole new set of LWLocks where you have
to take and release one of them every time you allocate or invalidate a buffer
seems likely to cause a pretty substantial contention problem.

I'm sorry for the late reply. Thank you Tomas and Robert for checking this 
patch.
Attached is the v3 of the patch.
- I moved the unnecessary items from buf_internals.h to cached_buf.c, since most
 of those items are only used in that file.
- Fixed the bug in v2. The patch now passes both the regression and TAP tests.

Thanks for the advice on the benchmark test. Please see the test setup and
results below.

[Machine spec]
CPU: 16, Number of cores per socket: 8
RHEL6.5, Memory: 240GB

scale: 3125 (about 46GB DB size)
shared_buffers = 8GB

[workload that fits into RAM but not into shared buffers]
pgbench -i -s 3125 cachetest
pgbench -c 16 -j 8 -T 600 cachetest

[Patched]
scaling factor: 3125
query mode: simple
number of clients: 16
number of threads: 8
duration: 600 s
number of transactions actually processed: 8815123
latency average = 1.089 ms
tps = 14691.436343 (including connections establishing)
tps = 14691.482714 (excluding connections establishing)

[Master/Unpatched]
...
number of transactions actually processed: 8852327
latency average = 1.084 ms
tps = 14753.814648 (including connections establishing)
tps = 14753.861589 (excluding connections establishing)


My patch introduced a small overhead of about 0.42-0.46%, which I think is acceptable.
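
The overhead range quoted above can be cross-checked from the tps and latency
figures reported in this mail; a quick arithmetic sketch:

```shell
# Cross-check of the ~0.42-0.46% overhead using the numbers reported above
awk 'BEGIN {
  # tps regression: (unpatched - patched) / unpatched
  printf "tps overhead:     %.2f%%\n", (14753.814648 - 14691.436343) / 14753.814648 * 100
  # latency increase: (patched - unpatched) / unpatched
  printf "latency overhead: %.2f%%\n", (1.089 - 1.084) / 1.084 * 100
}'
# prints "tps overhead:     0.42%" and "latency overhead: 0.46%"
```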
Kindly let me know your opinions/comments about the patch or tests, etc.


Now try measuring that with a read-only workload, with prepared
statements. I've tried that on a machine with 16 cores, doing

  # 16 clients
  pgbench -n -S -j 16 -c 16 -M prepared -T 60 test

  # 1 client
  pgbench -n -S -c 1 -M prepared -T 60 test

and average from 30 runs of each looks like this:

   # clients      master         patched         %
  ---------------------------------------------------------
   1              29690          27833           93.7%
   16            300935         283383           94.1%
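
For anyone wanting to reproduce the averaging, a rough sketch of how the 30
runs might be scripted (the 'test' database name and pgbench flags are taken
from the commands above; the awk extraction assumes pgbench's standard
"tps = ... (excluding connections establishing)" output line):

```shell
# Rough sketch: average the "excluding connections" tps over 30 pgbench runs
total=0
for i in $(seq 1 30); do
    tps=$(pgbench -n -S -j 16 -c 16 -M prepared -T 60 test \
          | awk '/excluding connections/ { print $3 }')
    total=$(awk -v a="$total" -v b="$tps" 'BEGIN { print a + b }')
done
awk -v t="$total" 'BEGIN { printf "average tps: %.0f\n", t / 30 }'
```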

That's quite a significant regression, considering it's optimizing an
operation that is expected to be pretty rare (people generally don't drop
objects as often as they query them).
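
The percentage column follows directly from the averaged tps values; as a
sanity check (small rounding differences against the table are possible,
since the averages themselves were presumably rounded):

```shell
# Patched-vs-master ratio computed from the averaged tps values above
awk 'BEGIN {
  printf "1 client:   %.1f%%\n",  27833 /  29690 * 100
  printf "16 clients: %.1f%%\n", 283383 / 300935 * 100
}'
```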

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

