The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: tested, passed
This is a review of the patch to remove useless DISTINCT columns from the DISTINCT clause https://www.postgresql.org/message-id/CAKJS1f8UMJ137sRuVSnEMDDpa57Q71JuLZi4yLCFMekNYVYqaQ%40mail.gmail.com Contents & Purpose ================== This patch removes any additional columns in the DISTINCT clause when the table's primary key columns are entirely present in the DISTINCT clause. This optimization works because the PK columns functionally determine the other columns in the relation. The patch includes regression test cases. Initial Run =========== The patch applies cleanly to HEAD. All tests which are run as part of the `installcheck` target pass correctly (all existing tests pass, all new tests pass with the patch applied and fail without it). All TAP tests pass. All tests which are run as part of the `installcheck-world` target pass except one of the Bloom contrib module tests in `contrib/bloom/sql/bloom.sql`, `CREATE INDEX bloomidx ON tst USING bloom (i, t) WITH (col1 = 3);` which is currently also failing (crashing) for me on master, so it is unrelated to the patch. The test cases seem to cover the new behavior. Functionality ============= Given that this optimization is safe for the GROUP BY clause (you can remove functionally dependent attributes from the clause there as well), the logic does seem to follow for DISTINCT. It seems semantically correct to do this. As for the implementation, the author seems to have reasonably addressed concerns expressed over the distinction between DISTINCT and DISTINCT ON. As far as I can see, this should be fine. Code Review =========== For a small performance hit but an improvement in readability, the length check can be moved from the individual group by and distinct clause checks into the helper function if (list_length(parse->distinctClause) < 2) return; and if (list_length(parse->groupClause) < 2) return; can be moved into `remove_functionally_dependent_clauses` Comments/Documentation ======================= The main helper function that is added `remove_functionally_dependent_clauses` uses one style of comment--with the name of the function and then the rest of the description indented which is found some places in the code, /* * remove_functionally_dependent_clauses * Processes clauselist and removes any items which are deemed to be * functionally dependent on other clauselist items. * * If any item from the list can be removed, then a new list is built which * does not contain the removed items. If no item can be removed then the * original list is returned. */ while other helper functions in the same file use a different style--all lines flush to the side with a space. I'm not sure which is preferred /* * limit_needed - do we actually need a Limit plan node? * * If we have constant-zero OFFSET and constant-null LIMIT, we can skip adding * a Limit node. This is worth checking for because "OFFSET 0" is a common * locution for an optimization fence. (Because other places in the planner ... it looks like the non-indented ones are from older commits (2013 vs 2016). The list length change is totally optional, but I'll leave it as Waiting On Author in case the author wants to make that change. Overall, LGTM. The new status of this patch is: Waiting on Author