Hi everyone,

I've written a new open source tool for easily parallelising SQL scripts in
postgres. [Obligatory plug: https://github.com/gbb/par_psql ]

Using it, I'm seeing a problem that I've also seen in other postgres projects 
involving high degrees of parallelisation in the last 12 months.

Basically:

- I have machines here with up to 16 CPU cores and 128GB memory, very fast SSDs 
and controllers, and a kernel and postgresql.conf carefully configured for high 
performance.

- Ordinary queries (e.g. SELECT some_stuff ...) parallelise nearly perfectly, 
with almost a 16x performance improvement on 16 cores.

- Non-DB stuff like GDAL, Python etc. parallelises nearly perfectly.

- HOWEVER, calls to CPU-intensive user-defined pl/pgsql functions (e.g. SELECT 
myfunction(some_stuff)) do not parallelise well, even when each session calls 
its own independently defined function, or only reads from its tables. They 
hit a limit of 2.5x performance improvement relative to single-CPU performance 
(pg9.4) and merely 2x (pg9.3), regardless of how many CPU cores I throw at 
them. That is about 6 times slower than the near-16x I'm expecting.
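
As a rough sanity check on those figures, here is a back-of-envelope sketch in 
Python. The Amdahl's-law model is purely an assumption on my part (nothing 
above shows the workload actually behaves that way), and the function names 
are made up for illustration. It only restates the 16-core, 2.5x and 2x 
numbers as a shortfall factor and an implied "serial fraction".

```python
def shortfall(ideal_speedup, observed_speedup):
    """How many times slower the observed run is than the ideal one."""
    return ideal_speedup / observed_speedup


def amdahl_serial_fraction(observed_speedup, workers):
    """Serial fraction s implied by Amdahl's law, S = 1 / (s + (1 - s) / N).

    ASSUMPTION: the workload follows Amdahl's law at all; this is a model
    chosen for illustration, not a measured property of pl/pgsql.
    """
    return (1.0 / observed_speedup - 1.0 / workers) / (1.0 - 1.0 / workers)


CORES = 16  # from the machine description above

# Observed speedups quoted above for each server version.
for version, observed in (("pg9.4", 2.5), ("pg9.3", 2.0)):
    print(
        version,
        "shortfall vs ideal: %.1fx" % shortfall(CORES, observed),
        "implied serial fraction: %.2f" % amdahl_serial_fraction(observed, CORES),
    )
```

Under that assumed model, something like a third to a half of the work would 
have to be effectively serialised to produce these ceilings, which is why it 
looks like contention rather than ordinary overhead.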


I can't see what would be locking. It seems like it's the pl/pgsql environment 
itself that is somehow locking or incurring some huge frictional costs. Whether 
I use independently defined functions, independent source tables, or 
independent output tables makes no difference whatsoever, so it doesn't feel 
'lock-related'. It also doesn't seem to be WAL/synchronisation related, as the 
machines I'm using can hit absurdly high pgbench rates, and I'm using unlogged 
tables for output.
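
For anyone wanting to reproduce the shape of the test without par_psql, here 
is a minimal sketch of a timing harness in Python. It is not how par_psql 
works internally; it simply launches N concurrent psql sessions and compares 
elapsed times. The database name "bench" and the helper names are 
placeholders I've invented; the psql flags (-X, -q, -d, -c) are standard.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def psql_command(sql, dbname="bench"):
    """Build one psql invocation. "bench" is a placeholder database name."""
    # -X skips ~/.psqlrc, -q suppresses chatter, -c runs a single command.
    return ["psql", "-X", "-q", "-d", dbname, "-c", sql]


def time_sessions(commands, run=subprocess.run):
    """Run each command in its own concurrent session; return wall seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        list(pool.map(run, commands))  # list() waits for every session
    return time.perf_counter() - start


def scaling_table(sql_for_worker, worker_counts, run=subprocess.run):
    """Map each session count to its throughput relative to the first count.

    Every session runs the same amount of work, so with perfect scaling the
    elapsed time stays flat and the relative throughput grows linearly.
    """
    elapsed = {}
    for n in worker_counts:
        commands = [psql_command(sql_for_worker(i)) for i in range(n)]
        elapsed[n] = time_sessions(commands, run=run)
    first = worker_counts[0]
    base = elapsed[first]
    return {n: (n * base) / (first * t) for n, t in elapsed.items()}
```

Calling scaling_table with a function that returns, say, "SELECT 
myfunction(x) FROM source_<i>" for worker i (hypothetical table names, one 
source table per session) and worker counts of 1, 2, 4, 8, 16 gives the 
speedup curve directly. Keeping each session on its own function and tables 
is what rules out ordinary row or table lock contention.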

Take a quick peek here: 
https://github.com/gbb/par_psql/blob/master/BENCHMARKS.md

I'm wondering what I'm missing here. Any ideas? 

Graeme.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)