On Tue, May 13, 2025, at 5:07 PM, Greg Sabino Mullane wrote:
> On Tue, May 13, 2025 at 4:37 PM Scott Mead <sc...@meads.us> wrote:
>> I'll open by proposing that we prevent the planner from automatically
>> selecting parallel plans by default
>
> That seems a pretty heavy hammer, when we have things like
> parallel_setup_cost that should be tweaked first.
I agree it's a big hammer, and I thought through parallel_setup_cost quite a bit
myself. The problem with parallel_setup_cost is that it doesn't actually
represent the overhead of setting up a parallel query on a busy system. It
does define the setup cost of a *single* parallel session, but it cannot
accurately express the CPU and other overhead associated with the
second, third, fourth, etc. query that is executed in parallel. The expense
to the operating system is a function of the _rate_ at which parallel queries
are being issued. Without new infrastructure, there's no way to define
something that will give me a true representation of the cost of issuing a
query with parallelism.
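For context, here is a minimal sketch of the knob being discussed (big_table
is a stand-in table name, and the value is illustrative):

```sql
-- Raising parallel_setup_cost makes the planner charge more for each
-- Gather node, so cheap queries stop choosing parallel plans.
-- (The default is 1000; 100000 here is just an illustrative value.)
SET parallel_setup_cost = 100000;
EXPLAIN SELECT count(*) FROM big_table;  -- big_table is a stand-in name
-- The cost is added once per plan; nothing in the cost model accounts
-- for how many such plans per second the system is already running.
```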
>> The recommendation that I give to users is pretty straightforward: "Disable
>> automatic parallel query, enable it for queries where you find substantial
>> savings and can control the rate of execution." I always tell users that if
>> they're using parallel query for anything that should execute in less than 5
>> minutes, they're probably pushing on the wrong tuning strategy as the load
>> induced by the parallel query infrastructure is likely going to negate the
>> savings that they're getting.
>
> Five minutes?! That's not been my experience. Not claiming parallelism is
> perfect yet, but there are plenty of parallel performance savings under the
> five minute mark.
Absolutely, I've seen 1-second queries go to 200 ms with a parallel degree of 2.
The problem isn't making that query faster in isolation; the problem is that
every single one of those executions means new backend processes.
If you have a connection pool from your application and you issue 60 such
queries per minute, each going from 1 second to 200 ms with two workers, that
means you are effectively making 120 new connections per minute back to the DB.
As we know, connection establishment is brutal....
Executing a pgbench run with 15 clients for 10 seconds using "SELECT 1;" as the
workload gives me 315,449 TPS. If I add the -C flag (connect / disconnect for
each transaction), I get 684 TPS. The overhead of a connection is more than
just the cost to the specific query being optimized; it has far-reaching impact
even outside of the postgres processes on the machine.
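(For reference, select1.sql here is just the one-line workload file:)

```sql
-- select1.sql: the trivial workload used in both runs below
SELECT 1;
```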
$ pgbench -f select1.sql --no-vacuum -c 15 -T 10
pgbench (16.4)
transaction type: select1.sql
scaling factor: 1
query mode: simple
number of clients: 15
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 3146966
number of failed transactions: 0 (0.000%)
latency average = 0.048 ms
initial connection time = 28.854 ms
tps = 315449.609763 (without initial connection time)
$ pgbench -f select1.sql --no-vacuum -c 15 -T 10 -C
pgbench (16.4)
transaction type: select1.sql
scaling factor: 1
query mode: simple
number of clients: 15
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 6848
number of failed transactions: 0 (0.000%)
latency average = 21.910 ms
average connection time = 1.455 ms
tps = 684.615907 (including reconnection times)
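To make the recommendation I quoted above concrete, here's a hedged sketch of
what I tell users to do (the worker count is illustrative, not a prescription):

```sql
-- postgresql.conf: disable automatic parallel plan selection globally
--   max_parallel_workers_per_gather = 0

-- Then re-enable parallelism only in the sessions where it has been
-- measured to pay off:
SET max_parallel_workers_per_gather = 4;  -- illustrative value
-- ... run the long-running query that benefits here ...
RESET max_parallel_workers_per_gather;
```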
>
> Cheers,
> Greg
>
> --
> Crunchy Data - https://www.crunchydata.com
> Enterprise Postgres Software Products & Tech Support
>
--
Scott Mead
Amazon Web Services
sc...@meads.us