On Thu, May 16, 2024 at 3:47 PM Andrei Lepikhov <a.lepik...@postgrespro.ru> wrote: > > On 24.04.2024 13:25, jian he wrote: > > hi. > > I found an interesting case. > > > > CREATE TABLE t1 AS > > SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS > > z, i::int4 AS w > > FROM generate_series(1, 100) AS i; > > CREATE INDEX t1_x_y_idx ON t1 (x, y); > > ANALYZE t1; > > SET enable_hashagg = off; > > SET enable_seqscan = off; > > > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w; > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,y,z; > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,w,y; > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,z,y; > > the above part will use: > > -> Incremental Sort > > Sort Key: x, $, $, $ > > Presorted Key: x > > -> Index Scan using t1_x_y_idx on t1 > > > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY z,y,w,x; > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY w,y,z,x; > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,z,x,w; > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,w,x,z; > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,x,z,w; > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,x,w,z; > > > > these will use: > > -> Incremental Sort > > Sort Key: x, y, $, $ > > Presorted Key: x, y > > -> Index Scan using t1_x_y_idx on t1 > > > > I guess this is fine, but not optimal? > It looks like a bug right now - in current implementation we don't > differentiate different orders. So: > 1. Applying all the patches from the thread which I proposed as an > answer to T.Lane last rebuke - does behavior still the same?.
I've applied these 4 patches in this thread. final_improvements.patch minor_comment.patch get_useful_group_keys_orderings.patch group_by_asymmetry.diff The behavior is still the same as the master. meaning that below quoted queries are still using "Presorted Key: x". > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w; > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,y,z; > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,w,y; > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,z,y; > > the above part will use: > > -> Incremental Sort > > Sort Key: x, $, $, $ > > Presorted Key: x > > -> Index Scan using t1_x_y_idx on t1 > 2. Could you try to find the reason? the following are my understanding, it may be wrong... function get_useful_group_keys_orderings, may generate alternative pathkeys and group clauses. for different PathKeyInfo sub functions within add_paths_to_grouping_rel: make_ordered_path, create_agg_path will yield the same cost. function add_path (within add_paths_to_grouping_rel) will add these paths (different PathKeyInfos) in a *sequential* order. then later in the function set_cheapest. `foreach(p, parent_rel->pathlist)` will iterate these different paths in a sequential order. so in set_cheapest if 2 path costs are the same, then the first path residing in parent_rel->pathlist wins. fo a concrete example: CREATE TABLE t1 AS SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS z, i::float4 AS w FROM generate_series(1, 100) AS i; CREATE INDEX t1_x_y_idx ON t1 (x, y); ANALYZE t1; SET enable_hashagg = off; SET enable_seqscan = off; EXPLAIN (COSTS OFF, verbose) SELECT count(*) FROM t1 GROUP BY x,z,y,w; add_paths_to_grouping_rel will do the following in a sequential order. A. generate and add original PathKeyInfo(groupby x,z,y,w) B. generate and add alternative PathKeyInfo(groupby x,y,z,w). (because of the index path) C. can_hash is true, generate a HashAgg Path (because of enable_hashagg is set to off, so this path the cost is very high) As mentioned previously, both A and B can use Incremental Sort, set_cheapest will choose the A instead of B, because A step generated path **first** satisfies increment sort.