On Wed, May 14, 2025, at 4:06 AM, Laurenz Albe wrote:
> On Tue, 2025-05-13 at 17:53 -0400, Scott Mead wrote:
> > On Tue, May 13, 2025, at 5:07 PM, Greg Sabino Mullane wrote:
> > > On Tue, May 13, 2025 at 4:37 PM Scott Mead <sc...@meads.us> wrote:
> > > > I'll open by proposing that we prevent the planner from automatically
> > > > selecting parallel plans by default
> > > 
> > > That seems a pretty heavy hammer, when we have things like
> > > parallel_setup_cost that should be tweaked first.
> > 
> > I agree it's a big hammer and I thought through parallel_setup_cost
> > quite a bit myself.  The problem with parallel_setup_cost is that it
> > doesn't actually represent the overhead of setting up a parallel
> > query for a busy system.  It does define the cost of setup for a
> > *single* parallel session, but it cannot accurately express the
> > cost of CPU and other overhead associated with the second, third,
> > fourth, etc... query that is executed as parallel.  The expense to
> > the operating system is a function of the _rate_ of parallel query
> > executions being issued.  Without new infrastructure, there's no way
> > to define something that will give me a true representation of the
> > cost of issuing a query with parallelism.
> 
> There is no way for the optimizer to represent that your system is
> under CPU overload currently.  But I agree with Greg that
> parallel_setup_cost is the setting that should be adjusted.
> If PostgreSQL is more reluctant to even start considering a parallel plan,
> that would be a move in the right direction in a case like this:
> 
> > > > What is the fallout?  When a high-volume, low-latency query flips to
> > > > parallel execution on a busy system, we end up in a situation where
> > > > the database is effectively DDOSing itself with a very high rate of
> > > > connection establish and tear-down requests.  Even if the query ends
> > > > up being faster (it generally does not), the CPU requirements for the
> > > > same workload rapidly double or worse, with most of it being spent
> > > > in the OS (context switch, fork(), destroy()).  When looking at the
> > > > database, you'll see a high load average, and high wait for CPU with
> > > > very little actual work being done within the database.
> 
> You are painting a bleak picture indeed.  I get to see PostgreSQL databases
> in trouble regularly, but I have not seen anything like what you describe.
> If a rather cheap, very frequent query is suddenly estimated to be
> expensive enough to warrant a parallel plan, I'd suspect that the estimates
> must be seriously off.
> 
> With an argument like that, you may as well disable nested loop joins.
> I have seen enough cases where disabling nested loop joins, without any
> deeper analysis, made very slow queries reasonably fast.

My argument is that parallel query should not be invoked without explicit user 
intervention.  Yes, nested loop joins can have a similar impact, but let's take 
a look at how PQ breaks down at scale: 

1. pgbench -i -s 100
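(Scale factor 100 populates pgbench_accounts with 10,000,000 rows and 
pgbench_branches with 100 rows.)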

2. Make a query that will execute in parallel

SELECT aid, a.bid, bbalance
  FROM pgbench_accounts a, pgbench_branches b
 WHERE a.bid = b.bid
 ORDER BY bbalance DESC;

Non-parallel query = 4506.559 ms
Parallel query     = 2849.073 ms

Arguably, much better.
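
To confirm the plan is actually parallel before benchmarking, EXPLAIN on stock 
settings should include a Gather or Gather Merge node, e.g.:

EXPLAIN (ANALYZE, BUFFERS)
SELECT aid, a.bid, bbalance
  FROM pgbench_accounts a, pgbench_branches b
 WHERE a.bid = b.bid
 ORDER BY bbalance DESC;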

3. Use pgbench to execute this query (saved as pselect.sql) with a concurrency of 10 and a rate limit of 1 tps

pgbench -R 1 -r -T 120 -P 5 --no-vacuum -f pselect.sql -c 10

4. The parallel query ran in ~2.8 seconds in isolation, but when running with 
10 concurrent sessions it breaks down to 5.8 seconds, while the non-parallel 
version averages 5.5 seconds.  The gains are completely erased with a 
concurrency of only 5 (and that's with max_parallel_workers = 8).  If you 
increase max_parallel_workers, this quickly becomes worse.

Even though parallel query is faster in isolation, even a small amount of 
concurrency has a compounding effect that degrades performance very quickly 
(again, defaults on a 16-processor machine).

Concurrency   Non-Parallel Runtime (ms)   Parallel Runtime (ms)
          1                    5003.951                3681.452
          5                    4936.565                4565.171
         10                    5573.239                5894.397
         15                    6224.292                8470.982
         20                    5632.948               13277.857

Even with the default max_parallel_workers = 8 acting as a cap, we erase our 
advantage by the time we reach a concurrency of 5 clients.
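
For anyone who wants to reproduce the table, a sweep along these lines should 
do it (a sketch; it assumes pselect.sql holds the query from step 2 and 
produces the non-parallel column by disabling parallel plans via 
max_parallel_workers_per_gather):

for c in 1 5 10 15 20; do
    # non-parallel column: suppress parallel plans for this run
    PGOPTIONS='-c max_parallel_workers_per_gather=0' \
        pgbench -R 1 -r -T 120 -P 5 --no-vacuum -f pselect.sql -c $c
    # parallel column: stock settings
    pgbench -R 1 -r -T 120 -P 5 --no-vacuum -f pselect.sql -c $c
done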

Going back to the original commit that enabled PQ by default[1], it was done 
so that the feature would get tested during beta.  I think it's time we limit 
the accidental impact this can have on users by disabling the feature by 
default.
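
With today's knobs, that would presumably mean shipping 
max_parallel_workers_per_gather = 0 again, as it was before that commit.  
Anyone who wants parallelism back would be one setting away, e.g.:

SET max_parallel_workers_per_gather = 2;                -- per session

ALTER SYSTEM SET max_parallel_workers_per_gather = 2;   -- cluster-wide
SELECT pg_reload_conf();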



[1]-
https://github.com/postgres/postgres/commit/77cd477c4ba885cfa1ba67beaa82e06f2e182b85

"
Enable parallel query by default.
Change max_parallel_degree default from 0 to 2.  It is possible that
this is not a good idea, or that we should go with 1 worker rather
than 2, but we won't find out without trying it.  Along the way,
reword the documentation for max_parallel_degree a little bit to
hopefully make it more clear.

Discussion: 20160420174631.3qjjhpwsvvx5b...@alap3.anarazel.de
"

> 
> Sure enough, I often see systems where I recommend disabling parallel
> query - in fact, whenever throughput is more important than response time.
> But I also see many cases where parallel query works just like it should
> and leads to a better user experience.
> 
> I have come to disable JIT by default, but not parallel query.
> 
> The primary problem that I encounter with parallel query is that dynamic
> shared memory segments grow to a size where they cause OOM errors.
> That's the most frequent reason for me to recommend disabling parallel query.
> 
> Yours,
> Laurenz Albe
> 

--
Scott Mead
Amazon Web Services
sc...@meads.us

