On Thu, 28 Nov 2024 at 17:39, jian he <jian.universal...@gmail.com> wrote: > v4 logic is to choose one with the least number of columns. > if there is more than one with the least number of columns, simply > choose the first one > in the matched list.
The new code inside remove_useless_groupby_columns() is still more complex than what's needed. There's no need to build a 2nd list with the matching indexes and loop over it. Just skip the non-matching indexes and record the best match. I imagined it would look a bit more like the attached patch. I also removed the pk_pos column as I didn't see any point in it. You were not using it. If some future code needs that, it can be added then. I also fixed up a few comments that needed attention. I didn't look at the regression tests being added. I think given that there's now no special case for primary key indexes that we don't need a completely new set of tests. I think it'll make more sense to adjust some of the existing tests to use a unique constraint instead of a PK and then adjust a column's NOT NULL property to check that part of the code is working correctly. Another issue with this is that the list of indexes being used is not sorted by Oid. If you look at RelationGetIndexList() you'll see that we perform a sort. That gives us more consistency in the planner. I think this patch should be doing that too, otherwise, you could end up with a plan change after some operation that changes the order that the indexes are stored in the pg_index table. It's probably fairly unlikely, but it is the sort of issue that someone will eventually discover and report. David
v5-0001-remove-useless-group-by-columns-via-unique-not-nu.patch
Description: Binary data