On Fri, Jun 6, 2025 at 1:29 PM Tatsuro Yamada <yamatat...@gmail.com> wrote:
> Hi hackers, > > When I measured the execution time of a certain query with parallel query > enabled and disabled, I found that the execution time was slower when > parallel query was enabled. > > To improve the performance of the parallel query, I considered adjusting > the execution plan and attempted to switch from GroupAggregate to > HashAggregate. However, I noticed that there was no GUC parameter to > disable GroupAggregate. > > Therefore, I propose adding a new GUC parameter: enable_groupagg. > > Below are the results of a performance test where I disabled > GroupAggregate using enable_groupagg. In this case, the planner chose > HashAggregate instead, which improved performance by about 35 times. > > # Query Execution Results (Average of 3 measurements) > - With parallel query: 39546 seconds > - With parallel query and enable_groupagg turned off: 1115 seconds > > # Query and Data Used (attached to this email) > - Query: test_query.sql > - Data: create_table.sql > > # The steps to run the test are as follows. > For example, on psql: > > 1. Create tables: > \i create_table.sql > > 2. Execute a query: > \i test_query.sql > > 3. Execute a query using the new GUC parameter: > set enable_groupagg to off; > \i test_query.sql > > As a benefit to users, while there has previously been a GUC parameter > to control HashAggregate, there was no corresponding way to control > GroupAggregate. This patch addresses that, giving users more flexibility > in tuning execution plans. > > I've attached a WIP patch that adds this GUC parameter. I would > appreciate any feedback, especially regarding how many test cases I > should create. > I first thought enable_hashagg should be sufficient to choose one strategy over the other. But that is not true, enable_hashagg = true allows both the strategies, enable_hashagg = false disables just hash strategy. There's no way to disable group agg alone. So I think it makes sense to have this GUC. I am surprised that we didn't see this being a problem for so long. We seem to disable mixed strategy when enable_hashagg is false. Do we want to do the same when enable_groupagg = false as well? > > To create new test cases for enable_groupagg, I looked into existing > test cases that use enable_hashagg and found that it is used in many > places (62 places). Should I add a test case for enable_groupagg in > the same place as enable_hashagg? I think that adding a new feature > requires a minimum number of test cases, so I would appreciate your > advice. > Some of those instances are for plan stability, all of which need not be replicated. But some of them explicitly test sort based grouping. For rest of them hash based plan seems to be the best one, so explicit enable_groupagg = false is not needed. We will need some test to test the switch though. > > > Additionally, based on the execution plan, I suspect the slowdown in the > parallel query might be caused by misestimates related to Sort or > Gather Merge. > While resolving those misestimates would ideally improve the root issue, > I'd like to keep the focus of this thread on adding the GUC parameter. > Then, I plan to report or address the estimation problem in a separate > thread. > > +1. -- Best Wishes, Ashutosh Bapat