Re: Benchmarking: How to identify bottleneck (limiting factor) and achieve "linear scalability"?

2019-01-28 Thread Saurabh Nanda
Yet another update:

a) I've tried everything with my EX41-SSD server on Hetzner and nothing is
increasing the performance over & above the default configuration.
b) I tried commissioning a new EX41-SSD server and was able to replicate
the same pathetic performance numbers.
c) I tried another cloud hosting provider (E2E Networks) and just the raw
performance numbers (with default configuration) are blowing Hetzner out of
the water.

This leads me to believe that my assumption that the first server's hardware
(or SSD) was faulty is incorrect. Something is wrong with either the EX41-SSD
hardware itself or its out-of-the-box configuration. I'm commissioning a server
from their PX line (which is marked as "Datacenter Edition") to check whether
that makes things better.

+--------+----------------------+------------------------+
| client | Hetzner              | E2E Networks           |
|        | EX41-SSD [1] (TPS)   | Cloud Server [2] (TPS) |
+--------+----------------------+------------------------+
| 1      | ~160                 | ~400                   |
+--------+----------------------+------------------------+
| 4      | ~460                 | ~1450                  |
+--------+----------------------+------------------------+
| 8      | ~850                 | ~2600                  |
+--------+----------------------+------------------------+
| 12     | ~1200                | ~4000                  |
+--------+----------------------+------------------------+
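
(For context: TPS figures like these are typically produced with pgbench; the
scale factor, run duration, thread count and database name below are assumed
examples, not values taken from this thread.)

  # one-time setup; db name "bench" and scale factor 100 are assumptions
  $ createdb bench
  $ pgbench -i -s 100 bench

  # one run per client count; -j (threads) is commonly set equal to -c (clients)
  $ for c in 1 4 8 12; do pgbench -c "$c" -j "$c" -T 120 bench; done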


[1] lshw output for Hetzner -
https://gist.github.com/saurabhnanda/613813d0d58fe1a406a8ce9b62ad10a9
[2] lshw output for E2E -
https://gist.github.com/saurabhnanda/d276603990aa773269bad35f335344eb -
since this is a cloud server, low-level hardware info is not available. It's
advertised as a 9vCPU + 30GB RAM + SSD cloud instance.

-- Saurabh.


Re: Benchmarking: How to identify bottleneck (limiting factor) and achieve "linear scalability"?

2019-01-28 Thread Saurabh Nanda
>
> Do you know which of the settings is causing the lower TPS?
>
> I suggest checking shared_buffers.
>
> If you haven't done it already, disabling THP and KSM can resolve performance
> issues, especially with a large RAM setting like shared_buffers, at least
> with older kernels.
>
> https://www.postgresql.org/message-id/20170718180152.GE17566%40telsasoft.com
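
For reference, the THP and KSM changes suggested above are usually applied
roughly as follows (runtime-only changes; making them persistent across
reboots is distro-specific and not shown here):

  # disable transparent huge pages at runtime
  $ echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
  $ echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag

  # stop KSM (writing 2 instead of 0 also unmerges already-shared pages)
  $ echo 0 | sudo tee /sys/kernel/mm/ksm/run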


I've tried reducing the number of variables to a bare minimum and have the
following three cases now with RAID disabled:

a) only default settings [1]
b) default settings with shared_buffers=2G [2]
c) default settings with shared_buffers=2G & huge_pages=on [3]

The numbers are still not making any sense whatsoever.

+--------+--------------+----------------+-------------------+
| client | Defaults [1] | buffers=2G [2] | buffers=2G        |
|        |              |                | huge_pages=on [3] |
+--------+--------------+----------------+-------------------+
| 1      | 348-475 (??) | 529-583 (??)   | 155-290           |
+--------+--------------+----------------+-------------------+
| 4      | 436-452      | 451-452        | 388-403           |
+--------+--------------+----------------+-------------------+
| 8      | 862-869      | 859-861        | 778-781           |
+--------+--------------+----------------+-------------------+
| 12     | 1210-1219    | 1220-1225      | 1110-             |
+--------+--------------+----------------+-------------------+
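
(The configuration listings [1]-[3] below are name/setting/unit triples; a
pg_settings query of roughly this shape would produce them. The exact command
used is not shown in the thread, so treat this as an assumption:)

  # "bench" is a placeholder database name
  $ psql -d bench -c "SELECT name, setting, unit FROM pg_settings
      WHERE name IN ('shared_buffers', 'wal_buffers', 'work_mem',
                     'max_wal_size', 'min_wal_size', 'random_page_cost');"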



[1] Default settings
 checkpoint_completion_target=0.5
 default_statistics_target=100
 effective_io_concurrency=1
 max_parallel_workers=8
 max_parallel_workers_per_gather=2
 max_wal_size=1024 MB
 max_worker_processes=20
 min_wal_size=80 MB
 random_page_cost=4
* shared_buffers=1024 8kB*
 wal_buffers=32 8kB
 work_mem=4096 kB

[2] Increased shared_buffers
 checkpoint_completion_target=0.5
 default_statistics_target=100
 effective_io_concurrency=1
 max_parallel_workers=8
 max_parallel_workers_per_gather=2
 max_wal_size=1024 MB
 max_worker_processes=20
 min_wal_size=80 MB
 random_page_cost=4
* shared_buffers=262144 8kB*
 wal_buffers=2048 8kB
 work_mem=4096 kB

[3] Same settings as [2] with huge_pages=on and the following changes:

 $ cat /sys/kernel/mm/transparent_hugepage/enabled
 always madvise [never]

 $ cat /proc/meminfo | grep -i huge
 AnonHugePages:         0 kB
 ShmemHugePages:        0 kB
 HugePages_Total:    5000
 HugePages_Free:     3940
 HugePages_Rsvd:        1
 HugePages_Surp:        0
 Hugepagesize:       2048 kB
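
As a rough cross-check of the reservation above: shared_buffers=2GB divided by
the 2048 kB huge page size is 1024 pages, plus some headroom for other shared
memory, so the 5000 pages reserved here are more than enough. A minimal sketch
of how such a reservation is typically made (5000 is simply the value already
shown above):

  # 2 GB / 2 MB per huge page = 1024 pages; 5000 leaves generous headroom
  $ sudo sysctl -w vm.nr_hugepages=5000

  # persist across reboots; the file name under /etc/sysctl.d/ is an assumption
  $ echo "vm.nr_hugepages = 5000" | sudo tee /etc/sysctl.d/90-pg-hugepages.conf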

-- Saurabh.


Re: dsa_allocate() faliure

2019-01-28 Thread Thomas Munro
On Tue, Jan 29, 2019 at 2:50 AM Arne Roland  wrote:
> does anybody have any idea what is going wrong here? Is there some additional
> information that could be helpful?

Hi Arne,

This seems to be a bug; that error should not be reached.  I wonder if
it is a different manifestation of the bug reported as #15585 (i.e. some
type of rare corruption).  Are you able to reproduce this
consistently?  Can you please show the query plan?

-- 
Thomas Munro
http://www.enterprisedb.com
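
(For anyone following along: a plan is commonly captured with plain EXPLAIN,
which shows the plan without executing the failing query; the database and
table names below are placeholders, not taken from this thread:)

  # "mydb" and "some_table" stand in for the real database and query
  $ psql -d mydb -c "EXPLAIN (VERBOSE) SELECT count(*) FROM some_table;"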



Re: Benchmarking: How to identify bottleneck (limiting factor) and achieve "linear scalability"?

2019-01-28 Thread Saurabh Nanda
I've disabled transparent huge pages and enabled huge_pages as given below.
Let's see what happens. (I feel like a monkey pressing random buttons
trying to turn a light bulb on... and I'm sure the monkey would've had it
easier!)

AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:    5000
HugePages_Free:     4954
HugePages_Rsvd:     1015
HugePages_Surp:        0
Hugepagesize:       2048 kB

-- Saurabh.


RE: dsa_allocate() faliure

2019-01-28 Thread Arne Roland
Hello,

does anybody have any idea what is going wrong here? Is there some additional
information that could be helpful?

All the best
Arne Roland


Re: Benchmarking: How to identify bottleneck (limiting factor) and achieve "linear scalability"?

2019-01-28 Thread Saurabh Nanda
>
> You should probably include the detailed hardware you are working on -
> especially for the SSD, the model can have a big impact, as well as its
> wear.
>

What's the best tool to get meaningful information for SSD drives?

-- Saurabh.
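
(One common answer: smartmontools reports the drive model, firmware and SMART
wear/health attributes; a minimal sketch, where the device names are assumed:)

  # /dev/sda is an assumed device name; substitute the actual SSD
  $ sudo smartctl -a /dev/sda     # model, firmware, SMART attributes
  $ sudo smartctl -x /dev/sda     # extended output, incl. wear on many SSDs

  # for NVMe drives, nvme-cli exposes similar data
  $ sudo nvme smart-log /dev/nvme0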


Re: Benchmarking: How to identify bottleneck (limiting factor) and achieve "linear scalability"?

2019-01-28 Thread Nicolas Charles

On 28/01/2019 at 15:03, Saurabh Nanda wrote:
An update. It seems (to my untrained eye) that something is wrong with 
the second SSD in the RAID configuration. Here's my question on 
serverfault related to what I saw with iostat - 
https://serverfault.com/questions/951096/difference-in-utilisation-reported-by-iostat-for-two-identical-disks-in-raid1


I've disabled RAID and rebooted the server to run the benchmarks for 
client=1,4,8,12, comparing shared_buffers=8MB (default) vs 
shared_buffers=2GB (optimised?), and will report back.



You should probably include the detailed hardware you are working on - 
especially for the SSD, the model can have a big impact, as well as its 
wear.



Nicolas
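
(The per-disk utilisation comparison referenced above is typically taken from
extended iostat output while the benchmark is running, e.g.:)

  # refresh extended per-device statistics every second; compare %util and
  # await of the two RAID1 member disks during a pgbench run
  $ iostat -x 1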





Re: upgrade from 9.6 to 10/11

2019-01-28 Thread Laurenz Albe
Mariel Cherkassky wrote:
> I'm planning our db upgrade from 9.6. Basically I wanted to check how stable
> the pg11 version is. I'm considering upgrading from 9.6 to 10 and then to 11
> immediately.
> Is there a way to upgrade directly to 11 and skip 10?

v11 is stable; otherwise the PGDG would not have released it.

There is no need to upgrade via v10. I recommend that you upgrade from 9.6
to v11 directly, either via dump/restore or with pg_upgrade.

https://www.postgresql.org/docs/current/upgrading.html
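
A minimal pg_upgrade sketch for the 9.6 -> v11 jump; the paths below are
typical packaging-layout assumptions, not taken from this thread, and the v11
cluster must already have been initialised with initdb:

  # stop both clusters first; run as the postgres OS user
  $ /usr/lib/postgresql/11/bin/pg_upgrade \
        --old-bindir  /usr/lib/postgresql/9.6/bin \
        --new-bindir  /usr/lib/postgresql/11/bin \
        --old-datadir /var/lib/postgresql/9.6/main \
        --new-datadir /var/lib/postgresql/11/main \
        --check
  # --check does a dry run only; drop it for the actual upgrade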

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com