On Thu, Mar 24, 2016 at 8:08 AM, Amit Kapila <amit.kapil...@gmail.com>
wrote:
>
> On Thu, Mar 24, 2016 at 5:40 AM, Andres Freund <and...@anarazel.de> wrote:
> >
> > Have you, in your evaluation of the performance of this patch, done
> > profiles over time? I.e. whether the performance benefits are the
> > immediately, or only after a significant amount of test time? Comparing
> > TPS over time, for both patched/unpatched looks relevant.
> >
>
> I have mainly done it with half-hour read-write tests.  What do you want
> to observe via shorter tests?  Read-write tests sometimes give
> inconsistent data over short durations.
>

I have run tests on both the Intel and POWER machines (configurations are
given at the end of this mail) to see the results at different time
intervals, and the patch consistently shows more than 50% improvement on
the POWER machine at 128 clients and more than 29% improvement on the
Intel machine at 88 clients.


Non-default parameters
------------------------------------
max_connections = 300
shared_buffers = 8GB
min_wal_size = 10GB
max_wal_size = 15GB
checkpoint_timeout = 35min
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
wal_buffers = 256MB

pgbench setup
------------------------
scale factor - 300
used *unlogged* tables: pgbench -i --unlogged-tables -s 300 ..
pgbench -M prepared tpc-b
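
For reference, a run of this kind can be reproduced with something along
the lines below; the database name, -j thread count, and -P progress
interval are illustrative assumptions rather than the exact commands used:

  # initialize at scale factor 300 with unlogged tables
  pgbench -i --unlogged-tables -s 300 postgres

  # 30-minute prepared TPC-B run at 88 clients; -P prints per-interval
  # TPS, which also makes the TPS-over-time trend easy to see
  pgbench -M prepared -c 88 -j 88 -T 1800 -P 10 postgres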


Results on Intel m/c
--------------------------------
client-count - 88

Time (min)    Base (TPS)    Patch (TPS)    Improvement (%)
     5           39978         51858            29.71
    10           38169         52195            36.74
    20           36992         52173            41.03
    30           37042         52149            40.78

Results on power m/c
-----------------------------------
Client-count - 128

Time (min)    Base (TPS)    Patch (TPS)    Improvement (%)
     5           42479         65655            54.55
    10           41876         66050            57.72
    20           38099         65200            71.13
    30           37838         61908            63.61
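
The improvement column is simply the relative gain of the patched TPS
over base, i.e. (patch - base) * 100 / base.  For example, the 5-minute
Intel row can be checked with:

  echo "scale=2; (51858 - 39978) * 100 / 39978" | bc    # prints 29.71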
>
> >
> > Even after changing to scale 500, the performance benefits on this,
> > older 2 socket, machine were minor; even though contention on the
> > ClogControlLock was the second most severe (after ProcArrayLock).
> >
>
> I have tried this patch mainly on an 8-socket machine with scale factors
> of 300 and 1000.  I am hoping that you have tried this test on unlogged
> tables; by the way, at what client count have you seen these results?
>

Do you think the lack of performance increase in your tests is due to the
machine difference (sockets/CPU cores) or to the client count?
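
To help narrow that down, it may be worth comparing where the waits
actually are on both machines during the run.  Assuming a recent master
build (with the newly added wait-event reporting in pg_stat_activity),
periodically sampling the view should show whether ClogControlLock is
near the top of the list; the query below is only a sketch of that idea:

  psql -c "
    -- count backends by what they are currently waiting on
    SELECT wait_event_type, wait_event, count(*)
    FROM pg_stat_activity
    WHERE wait_event IS NOT NULL
    GROUP BY 1, 2
    ORDER BY 3 DESC;"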


Intel m/c config (lscpu)
-------------------------------------
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                128
On-line CPU(s) list:   0-127
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             8
NUMA node(s):          8
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 47
Model name:            Intel(R) Xeon(R) CPU E7- 8830  @ 2.13GHz
Stepping:              2
CPU MHz:               1064.000
BogoMIPS:              4266.62
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              24576K
NUMA node0 CPU(s):     0,65-71,96-103
NUMA node1 CPU(s):     72-79,104-111
NUMA node2 CPU(s):     80-87,112-119
NUMA node3 CPU(s):     88-95,120-127
NUMA node4 CPU(s):     1-8,33-40
NUMA node5 CPU(s):     9-16,41-48
NUMA node6 CPU(s):     17-24,49-56
NUMA node7 CPU(s):     25-32,57-64

Power m/c config (lscpu)
-------------------------------------
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                192
On-line CPU(s) list:   0-191
Thread(s) per core:    8
Core(s) per socket:    1
Socket(s):             24
NUMA node(s):          4
Model:                 IBM,8286-42A
L1d cache:             64K
L1i cache:             32K
L2 cache:              512K
L3 cache:              8192K
NUMA node0 CPU(s):     0-47
NUMA node1 CPU(s):     48-95
NUMA node2 CPU(s):     96-143
NUMA node3 CPU(s):     144-191

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
