On Wed, May 14, 2025, at 4:06 AM, Laurenz Albe wrote:
> On Tue, 2025-05-13 at 17:53 -0400, Scott Mead wrote:
> > On Tue, May 13, 2025, at 5:07 PM, Greg Sabino Mullane wrote:
> > > On Tue, May 13, 2025 at 4:37 PM Scott Mead <sc...@meads.us> wrote:
> > > > I'll open by proposing that we prevent the planner from automatically
> > > > selecting parallel plans by default
> > >
> > > That seems a pretty heavy hammer, when we have things like
> > > parallel_setup_cost that should be tweaked first.
> >
> > I agree it's a big hammer and I thought through parallel_setup_cost
> > quite a bit myself.  The problem with parallel_setup_cost is that it
> > doesn't actually represent the overhead of setting up parallel
> > query for a busy system.  It does define the cost of setup for a
> > *single* parallel session, but it cannot accurately express the
> > cost of CPU and other overhead associated with the second, third,
> > fourth, etc... query that is executed in parallel.  The expense to
> > the operating system is a function of the _rate_ of parallel query
> > executions being issued.  Without new infrastructure, there's no way
> > to define something that will give me a true representation of the
> > cost of issuing a query with parallelism.
>
> There is no way for the optimizer to represent that your system is
> under CPU overload currently.  But I agree with Greg that
> parallel_setup_cost is the setting that should be adjusted.
> If PostgreSQL is more reluctant to even start considering a parallel plan,
> that would be a move in the right direction in a case like this:
>
> > > > What is the fallout?  When a high-volume, low-latency query flips to
> > > > parallel execution on a busy system, we end up in a situation where
> > > > the database is effectively DDOSing itself with a very high rate of
> > > > connection establish and tear-down requests.  Even if the query ends
> > > > up being faster (it generally does not), the CPU requirements for the
> > > > same workload rapidly double or worse, with most of it being spent
> > > > in the OS (context switch, fork(), destroy()).  When looking at the
> > > > database, you'll see a high load average, and high wait for CPU with
> > > > very little actual work being done within the database.
>
> You are painting a bleak picture indeed.  I get to see PostgreSQL databases
> in trouble regularly, but I have not seen anything like what you describe.
> If a rather cheap, very frequent query is suddenly estimated to be
> expensive enough to warrant a parallel plan, I'd suspect that the estimates
> must be seriously off.
>
> With an argument like that, you may as well disable nested loop joins.
> I have seen enough cases where disabling nested loop joins, without any
> deeper analysis, made very slow queries reasonably fast.
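(To make the suggestion above concrete: the tuning Greg and Laurenz are
pointing at would be something along these lines; the value here is purely
illustrative, the shipped default is 1000:

    -- Inflate the planner's estimated cost of launching parallel workers
    -- so that only genuinely expensive queries choose a parallel plan.
    SET parallel_setup_cost = 100000;

But as I said above, that is still a per-plan cost; it has no notion of the
rate at which parallel queries are being launched.)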
My argument is that parallel query should not be allowed to be invoked without
user intervention.  Yes, nested loop joins can have a similar impact, but
let's take a look at how PQ breaks down at scale:

1. pgbench -i -s 100

2. Make a query that will execute in parallel:

     SELECT aid, a.bid, bbalance
       FROM pgbench_accounts a, pgbench_branches b
      WHERE a.bid = b.bid
      ORDER BY bbalance DESC;

   Non-parallel query = 4506.559 ms
   Parallel query     = 2849.073 ms

   Arguably, much better.

3. Use pgbench to execute these with a concurrency of 10 and a rate limit of
   1 tps:

     pgbench -R 1 -r -T 120 -P 5 --no-vacuum -f pselect.sql -c 10

4. The parallel query executed in ~2.8 seconds in isolation, but when running
   with 10 concurrent sessions it breaks down to 5.8 seconds, while the
   non-parallel version executes in 5.5 seconds on average.  You've
   completely erased the gains with a concurrency of only 5 (that's with
   max_parallel_workers = 8).  If you increase max_parallel_workers, this
   quickly becomes worse.

Even though parallel query is faster in isolation, even a small amount of
concurrency has a compounding effect that degrades performance very quickly
(again, defaults on a 16-processor machine):

   Concurrency | Non-parallel runtime (ms) | Parallel runtime (ms)
  -------------+---------------------------+----------------------
             1 |                  5003.951 |              3681.452
             5 |                  4936.565 |              4565.171
            10 |                  5573.239 |              5894.397
            15 |                  6224.292 |              8470.982
            20 |                  5632.948 |             13277.857

Even with max_parallel_workers capping us at the default of 8, we erase our
advantage by the time we reach a concurrency of 5 clients.

Going back to the original commit that enabled PQ by default [1], it was done
so that the feature would get tested during beta.  I think it's time that we
limit the accidental impact this can have on users by disabling the feature
by default.

[1] https://github.com/postgres/postgres/commit/77cd477c4ba885cfa1ba67beaa82e06f2e182b85

"
    Enable parallel query by default.

    Change max_parallel_degree default from 0 to 2.  It is possible that this
    is not a good idea, or that we should go with 1 worker rather than 2, but
    we won't find out without trying it.  Along the way, reword the
    documentation for max_parallel_degree a little bit to hopefully make it
    more clear.

    Discussion: 20160420174631.3qjjhpwsvvx5b...@alap3.anarazel.de
"
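To be clear about what "disabling by default" would look like in practice:
the knob that commit flipped (max_parallel_degree, since renamed
max_parallel_workers_per_gather) would simply ship as 0 rather than 2, and
anyone who wants parallelism could still opt in explicitly.  A sketch, using
nothing but the existing GUCs:

    -- what the shipped default would be: planner never picks a parallel plan
    SET max_parallel_workers_per_gather = 0;

    -- users who want parallel query opt in per session, role, or database
    SET max_parallel_workers_per_gather = 2;

The second statement is also exactly what an operator can do today to work
around the problem, so nothing new is needed to try this out.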