On Sat, Apr 2, 2016 at 5:25 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:

> On Thu, Mar 31, 2016 at 3:48 PM, Andres Freund <and...@anarazel.de> wrote:
> Here is the performance data (configuration of machine used to perform
> this test is mentioned at end of mail):
> Non-default parameters
> ------------------------------------
> max_connections = 300
> shared_buffers=8GB
> min_wal_size=10GB
> max_wal_size=15GB
> checkpoint_timeout = 35min
> maintenance_work_mem = 1GB
> checkpoint_completion_target = 0.9
> wal_buffers = 256MB
> median of 3, 20-min pgbench tpc-b results for --unlogged-tables

I have run exactly the same test on an Intel x86 m/c and the results are as below:

Patch_ver / Client Count (tps)           2      128     256
HEAD (commit 2143f5e1)                2832    35001   26756
clog_buf_128                          2909    50685   40998
clog_buf_128 + group_update_clog_v8   2981    53043   50779
clog_buf_128 + content_lock           2843    56261   54059
clog_buf_128 + nocontent_lock         2630    56554   54429
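For reference, the percentage gains discussed below can be recomputed directly from this table; a quick sanity check (the dictionary keys are just shorthand labels for the rows above):

```python
# TPS numbers copied from the table above (client count -> tps).
tps = {
    "HEAD":           {2: 2832, 128: 35001, 256: 26756},
    "clog_buf_128":   {2: 2909, 128: 50685, 256: 40998},
    "group_update":   {2: 2981, 128: 53043, 256: 50779},
    "content_lock":   {2: 2843, 128: 56261, 256: 54059},
    "nocontent_lock": {2: 2630, 128: 56554, 256: 54429},
}

def gain(new, base, clients):
    """Percentage TPS improvement of `new` over `base` at a client count."""
    return 100.0 * (tps[new][clients] / tps[base][clients] - 1)

print(round(gain("clog_buf_128", "HEAD", 256), 1))          # 53.2
print(round(gain("group_update", "clog_buf_128", 256), 1))  # 23.9
print(round(gain("content_lock", "group_update", 256), 1))  # 6.5
```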

On this m/c, I don't see any run-to-run variation; however, the trend of
the results is similar to the power m/c.  Clearly, the first patch, which
increases clog bufs to 128, shows up to ~50% performance improvement at 256
client count.  We can also observe that the group clog patch gives a ~24%
gain on top of the increase-clog-bufs patch at 256 client count.  Both the
content lock and no-content-lock patches show similar performance, which is
6~7% better than the group clog patch.  Also, as on the power m/c, the
no-content-lock patch shows some regression at lower client counts (2
clients in this case).

Based on the above results, increasing clog bufs to 128 is a clear winner,
and I think we might not want to proceed with the no-content-lock patch, as
it shows some regression and is no better than the content-lock patch.
That leaves a decision between the group clog update patch and the content
lock patch.  The difference between the two is not large (6~7%), and I
think that when sub-transactions are involved (sub-transactions on the same
page as the main transaction), the group clog patch should give better
performance, because the content lock itself will then start to become a
bottleneck.  We could address that case for the content-lock approach by
applying a similar grouping technique to the content lock, but I am not
sure that is worth the effort.  Also, I see some variation in the
performance data with the content lock patch on the power m/c, though that
might be attributable to m/c characteristics.  So, I think we can proceed
with either the group clog patch or the content lock patch; if we want to
proceed with the content lock approach, it needs some more work.
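To make the trade-off concrete, here is an illustrative Python sketch of the grouping idea, with everything invented for illustration (the real patch works on clog pages in C): the first backend to queue a status update becomes the leader, takes the simulated content lock once, and applies every update queued so far in a single critical section.

```python
import threading

class GroupStatusUpdater:
    """Toy model of grouped transaction-status updates (not the real patch)."""

    def __init__(self):
        self.content_lock = threading.Lock()   # stands in for the clog content lock
        self.pending_lock = threading.Lock()   # stands in for the lock-free CAS push
        self.pending = []                      # queued (xid, status, done-event) entries
        self.status = {}                       # simulated clog: xid -> status
        self.content_lock_acquisitions = 0

    def set_status(self, xid, status):
        done = threading.Event()
        with self.pending_lock:
            self.pending.append((xid, status, done))
            is_leader = len(self.pending) == 1  # first in the queue leads the group
        if is_leader:
            with self.content_lock:
                self.content_lock_acquisitions += 1
                with self.pending_lock:        # grab everything queued so far
                    batch, self.pending = self.pending, []
                for x, s, ev in batch:         # one lock acquisition, many updates
                    self.status[x] = s
                    ev.set()
        done.wait()                            # followers sleep until the leader is done

updater = GroupStatusUpdater()
threads = [threading.Thread(target=updater.set_status, args=(xid, "committed"))
           for xid in range(200)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every update is applied, but under contention the content lock is typically
# taken far fewer than 200 times, which is the point of the grouping technique.
print(len(updater.status), updater.content_lock_acquisitions)
```

The actual patch gets the same batching shape with an atomic compare-and-swap list of waiting backends and the leader holding the clog control lock once for the whole group.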

Note - For both the content lock and no-content-lock patches, I have
applied the 0001-Improve-64bit-atomics-support patch.

m/c config (lscpu)
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                128
On-line CPU(s) list:   0-127
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             8
NUMA node(s):          8
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 47
Model name:            Intel(R) Xeon(R) CPU E7- 8830  @ 2.13GHz
Stepping:              2
CPU MHz:               1064.000
BogoMIPS:              4266.62
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              24576K
NUMA node0 CPU(s):     0,65-71,96-103
NUMA node1 CPU(s):     72-79,104-111
NUMA node2 CPU(s):     80-87,112-119
NUMA node3 CPU(s):     88-95,120-127
NUMA node4 CPU(s):     1-8,33-40
NUMA node5 CPU(s):     9-16,41-48
NUMA node6 CPU(s):     17-24,49-56
NUMA node7 CPU(s):     25-32,57-64

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
