Re: Postgresql TPS Bottleneck

2022-03-31 Thread Mladen Gogala

On 3/31/22 07:50, wakandavis...@outlook.com wrote:

Hi everyone,

I am a bachelor's student and writing my thesis about the scaling and
performance of an application. The application is using postgresql as a
database but we can't scale any further currently as it seems postgres
is hitting the limit.

With the application, as well as with pgbench, we don't get more than
(max) 70k TPS on postgres. But the servers' resources are not utilized
completely (more below).


I would try monitoring with "perf top" and "atop -d" to see what is 
going on on the system. Also, try using sar: are you paging, waiting 
for I/O, or hitting some other kind of bottleneck? Once you figure out 
where your system is spending its time, you can address the problem. In 
addition to that, analyze the log files with pgbadger to find out which 
queries are time-consuming and try optimizing them.
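
For example (log path and sampling intervals are just placeholders):

perf top                                  # hottest functions, kernel and user space
atop -d 5                                 # disk view, 5 second samples
sar -u 5                                  # CPU: %user / %system / %iowait
sar -B 5                                  # paging activity
sar -d 5                                  # per-device I/O
pgbadger /var/log/postgresql/postgresql-*.log -o report.html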


--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com


Re: Postgresql TPS Bottleneck

2022-03-31 Thread Tomas Vondra



On 3/31/22 13:50, wakandavis...@outlook.com wrote:
> Hi everyone,
> 
> I am a bachelor's student and writing my thesis about the scaling and
> performance of an application. The application is using postgresql as a
> database but we can't scale any further currently as it seems postgres
> is hitting the limit.
> 
> With the application, as well as with pgbench, we don't get more than
> (max) 70k TPS on postgres. But the servers' resources are not utilized
> completely (more below).
> 
> I've tried many different configurations but none of them had any major
> performance impact (except for setting fsync and synchronous_commit = off).
> 
> This is the (custom) configuration I am using:
> 
> shared_buffers=65551953kB
> effective_cache_size=147491895kB
> huge_pages=on
> min_wal_size=20GB
> max_wal_size=200GB
> wal_buffers=1GB
> max_wal_senders=0
> archive_mode=off
> wal_level=minimal
> work_mem=2GB
> maintenance_work_mem=4GB
> checkpoint_completion_target=0.9
> checkpoint_timeout = 30min
> random_page_cost=1.1
> bgwriter_flush_after = 2MB
> effective_io_concurrency = 200
> # Disabled just for performance experiments
> fsync = off
> synchronous_commit = off
> full_page_writes = on
> max_worker_processes=64
> max_parallel_workers=64
> max_parallel_workers_per_gather=10
> max_parallel_maintenance_workers=12
> 
> The system is as follows:
> 
> * 64 Cores (Intel Xeon Gold 6130 (Skylake, 2.10GHz, 2 CPUs/node, 16
> cores/CPU))
> * 192 GiB RAM (12 * 16GiB DIMM DDR4 Synchronous Registered (Buffered)
> 2666 MHz (0.4 ns))
> * 2 * SSD SATA Samsung MZ7KM240HMHQ0D3 (one is used for the WAL and the
> other for the data)
> * 10 Gbps network link
> * OS: Debian 11
> * Postgres 13 from apt
> 
> (I've also written a stackoverflow post about it -
> https://stackoverflow.com/questions/71631348/postgresql-bottleneck-neither-cpu-network-nor-i-o
> 
> )
> 
> 
> Below is just an example of the pgbench I ran:
> 
> pgbench -i -s 50 -U postgres -h  -d 
> pgbench -c 64 -j 32 -t 10 -h  -U postgres 
> 

I'd bet you need to use "pgbench -N", because the regular transaction
updates the "branches" table and you only have 50 branches, which
probably means a lot of conflicts and locking. The other thing you might
try is "-M prepared", which saves time on query planning.
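
For instance, something like this (host and database names omitted, as
in your example; -T gives a timed run instead of a fixed number of
transactions per client):

pgbench -N -M prepared -c 64 -j 32 -T 60 -P 10 -h <host> -U postgres <db>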

FWIW I really doubt "fsync=off" will give you any meaningful results.

Maybe try assessing the hardware capability first, using tools like fio
to measure IOPS with different workloads.
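
A rough sketch with fio (directories are placeholders, the numbers are
just a starting point):

# random 8kB writes against the data disk
fio --name=datadisk --directory=/path/to/data --rw=randwrite --bs=8k \
    --size=2G --numjobs=4 --iodepth=32 --ioengine=libaio --direct=1 \
    --runtime=60 --time_based --group_reporting

# WAL-like pattern: sequential 8kB writes with an fsync after every write
fio --name=waldisk --directory=/path/to/wal --rw=write --bs=8k \
    --size=2G --numjobs=1 --fsync=1 --runtime=60 --time_based --group_reporting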

Then try pgbench with a single client, and finally increase the number
of clients and see how it behaves and compare it to what you expect.
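
e.g. a quick sweep (same placeholders as above):

for c in 1 2 4 8 16 32 64 96 128; do
    pgbench -N -M prepared -c $c -j $c -T 30 -h <host> -U postgres <db>
done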

In any case, every system has a bottleneck. You're clearly hitting one,
otherwise the numbers would keep climbing. Usually it's either CPU bound,
in which case "perf top" might tell us more, or it's IO bound, in which
case try e.g. "iostat -x -k 1" or something.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Postgresql TPS Bottleneck

2022-03-31 Thread MichaelDBA
While setting these 2 parameters to off will make things go faster 
(especially fsync), it is unrealistic to have these settings in a 
production environment, especially fsync=off.  You might get by with 
synchronous_commit=off, but with fsync=off you could end up with 
corruption in your database.  synchronous_commit=off may not make 
anything go faster; it may just change where the time is being spent.
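
If you want a production-safe compromise, the sketch would be to keep
fsync on and only relax the commit guarantee:

fsync = on
full_page_writes = on
synchronous_commit = off   # may lose the last few commits on a crash, but no corruption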


Regards,
Michael Vitale


wakandavis...@outlook.com wrote on 3/31/2022 7:50 AM:

fsync = off
synchronous_commit = off






Re: HIGH IO and Less CPU utilization

2022-03-31 Thread Mladen Gogala

On 3/29/22 14:04, Rambabu g wrote:

Hi All,

We have an issue with high load and I/O waits but low CPU usage on a 
Postgres database. The emp table size is around 500GB, and there are 
very few connections.


Please suggest whether we need to change any config parameters at the 
system level or in the Postgres configuration.



The "emp" table is 500 GB? You're doing something wrong, The "emp" table 
should have 14 rows and the "dept" table should have 4 rows The "bonus" 
and "salgrade" tables should also be very small. The guy named Bruce 
Scott could probably help you with that schema. Other than that, do you 
have a SQL causing all this ruckus and a detailed explain plan ("explain 
(analyze,costs,buffers)") for the SQL using most of the time? You can 
analyze the log file with PgBadger to get the queries consuming the most 
time.
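
For example (the database name and the query itself are placeholders):

psql -d yourdb -c "EXPLAIN (ANALYZE, COSTS, BUFFERS) <your slow query>;"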


--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com


Re: Postgresql TPS Bottleneck

2022-03-31 Thread Guillaume Cottenceau
 writes:

> Optimally I would like to fully use the CPU and get about 3-4 times
> more TPS (if even possible).

Disclaimer: I'm really not a pg performance expert.
I don't understand your hope to fully use the CPU; if your
scenario is disk-limited, which may very well be the case, then
of course you cannot fully use the CPU. With synchronous commits
and fsync, the system is probably spending its time just waiting
for the disks to report write completion. Are iostat/vmstat
showing a lot of IO-wait?
Also, if you can live with a few lost transactions in case of a
server crash, synchronous_commit=off is perfectly fine and
provides a lot of performance gain.
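
For example:

vmstat 5           # watch the "wa" column
iostat -x -k 5     # per-device %util, await and throughput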

-- 
Guillaume Cottenceau




Postgresql TPS Bottleneck

2022-03-31 Thread wakandavision
Hi everyone,

I am a bachelor's student and writing my thesis about the scaling and
performance of an application. The application is using postgresql as a
database but we can't scale any further currently as it seems postgres
is hitting the limit.

With the application, as well as with pgbench, we don't get more than
(max) 70k TPS on postgres. But the servers' resources are not utilized
completely (more below).

I've tried many different configurations but none of them had any major
performance impact (except for setting fsync and synchronous_commit = off).

This is the (custom) configuration I am using:

shared_buffers=65551953kB
effective_cache_size=147491895kB
huge_pages=on
min_wal_size=20GB
max_wal_size=200GB
wal_buffers=1GB
max_wal_senders=0
archive_mode=off
wal_level=minimal
work_mem=2GB
maintenance_work_mem=4GB
checkpoint_completion_target=0.9
checkpoint_timeout = 30min
random_page_cost=1.1
bgwriter_flush_after = 2MB
effective_io_concurrency = 200
# Disabled just for performance experiments
fsync = off
synchronous_commit = off
full_page_writes = on
max_worker_processes=64
max_parallel_workers=64
max_parallel_workers_per_gather=10
max_parallel_maintenance_workers=12

The system is as follows:

* 64 Cores (Intel Xeon Gold 6130 (Skylake, 2.10GHz, 2 CPUs/node, 16
cores/CPU))
* 192 GiB RAM (12 * 16GiB DIMM DDR4 Synchronous Registered (Buffered)
2666 MHz (0.4 ns))
* 2 * SSD SATA Samsung MZ7KM240HMHQ0D3 (one is used for the WAL and the
other for the data)
* 10 Gbps network link
* OS: Debian 11
* Postgres 13 from apt

(I've also written a stackoverflow post about it -
https://stackoverflow.com/questions/71631348/postgresql-bottleneck-neither-cpu-network-nor-i-o
 )


Below is just an example of the pgbench I ran:

pgbench -i -s 50 -U postgres -h  -d 
pgbench -c 64 -j 32 -t 10 -h  -U postgres 

starting vacuum...end.
transaction type: 
scaling factor: 50
query mode: simple
number of clients: 64
number of threads: 32
number of transactions per client: 10
number of transactions actually processed: 640/640
latency average = 0.976 ms
tps = 65584.664360 (including connections establishing)
tps = 65594.330678 (excluding connections establishing)

As comparison (average of three runs with pgbench as above):

num clients default config  custom config above

10  11336   16848
20  19528   30187
30  25769   39430
40  29792   50150
50  31096   60133
60  33900   64916
70  34986   64308
80  34170   63075
90  35108   59910
100 34864   58320
120 35124   55414
140 33403   53610

(with fsync=off alone I already get almost the TPS of the right-hand column)

For `-S -M prepared` the TPS is ~700k and for `-S` ~500k but as the
application is very write heavy this is not really useful for me.

With the app the CPU is only at 25% load and the disks are also not a
problem. For pgbench it's about 75% CPU but still no disk bottleneck
(about 5%).

There are also Grafana snapshots I created for the system (node-
exporter) and postgres (prometheus-postgres-exporter) while running
with our application (same configuration as above). Neither shows
any kind of bottleneck (except high amounts of context switches and
pages in/out):

node: 
https://147.87.255.221:3000/dashboard/snapshot/3eXe1sS3QDL6cbvI7HkPjYnjrVLCNOOF
postgres: 
https://147.87.255.221:3000/dashboard/snapshot/wHkRphdr3D4k5kRckhn57Pc6ZD3st1x7

I have also looked at postgresql's lock tables while running the above
experiment, but there is nothing which seemed strange to me. There are
about 300 locks but all are granted (select * from pg_locks).
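
For example, grouped by lock mode:

psql -c "SELECT mode, granted, count(*) FROM pg_locks GROUP BY 1, 2 ORDER BY 3 DESC;"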

Also, the following query:

select wait_event, count(*) from pg_stat_activity where state='idle in
transaction' group by wait_event;

did not show any contention; the output always looks similar to
this (80 clients):

        wait_event         | count
---------------------------+-------
 ClientRead                |     2
 SerializableFinishedList  |     1

Thanks to the Slack channel I got a link to EDB, which used a more
powerful server and also achieved about 70k TPS, but they did not set
fsync=off. So maybe they were limited by disk IO (just guessing, as
unfortunately, 

Re: HIGH IO and Less CPU utilization

2022-03-31 Thread Justin Pryzby
On Wed, Mar 30, 2022 at 10:17:38AM +0530, Rambabu g wrote:
> Hi Justin,
> 
> Only one query is causing the issue, sharing the def of indexes. Please
> have a look.
> 
> > > There are three indexes defined on the table, each one is around 20 to 
> > > 25GB
> 
>  tp| character varying(2000)   | yes| tp   | extended |
> 
>852 | 00:09:56.131136 | IO  | DataFileRead  | explain
> analyze select distinct  empno  from emp where sname='test' and tp='EMP
> NAME 1'

The server is doing a scan of the large table.
The tp index matches a lot of rows (13e6) which probably aren't clustered, so
it elects to scan the 500GB table each time.

Looking at this in isolation, maybe it'd be enough to create an index on
tp,empno (and maybe drop the tp index).  CREATE INDEX CONCURRENTLY if you don't
want to disrupt other queries.
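
Roughly (the index names are made up):

psql -d yourdb -c "CREATE INDEX CONCURRENTLY emp_tp_empno_idx ON emp (tp, empno);"
# and once nothing uses the old single-column index any more:
psql -d yourdb -c "DROP INDEX CONCURRENTLY emp_tp_idx;"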

But this seems like something that should be solved in a better way, though;
like keeping a table with all the necessary "empno" values maintained with
"INSERT ... ON CONFLICT DO NOTHING", or with a trigger.
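
A rough sketch of the side-table idea (table and column names/types are
made up, keyed like the index suggested above):

psql -d yourdb <<'SQL'
CREATE TABLE emp_numbers (tp text NOT NULL, empno bigint NOT NULL,
                          PRIMARY KEY (tp, empno));
-- one-off backfill
INSERT INTO emp_numbers (tp, empno)
    SELECT DISTINCT tp, empno FROM emp
    WHERE tp IS NOT NULL AND empno IS NOT NULL
    ON CONFLICT DO NOTHING;
SQL

# then every code path that inserts into emp also does:
#   INSERT INTO emp_numbers (tp, empno) VALUES (...) ON CONFLICT DO NOTHING;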