Re: Removing useless DISTINCT clauses

Melanie Plageman Tue, 20 Mar 2018 20:30:54 -0700

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed


This is a review of the patch to remove useless DISTINCT columns from the 
DISTINCT clause
https://www.postgresql.org/message-id/CAKJS1f8UMJ137sRuVSnEMDDpa57Q71JuLZi4yLCFMekNYVYqaQ%40mail.gmail.com


Contents & Purpose
==================

This patch removes any additional columns in the DISTINCT clause when the
table's primary key columns are entirely present in the DISTINCT clause. This
optimization works because the PK columns functionally determine the other
columns in the relation. The patch includes regression test cases.

Initial Run
===========
The patch applies cleanly to HEAD. All tests which are run as part of the
`installcheck` target pass correctly (all existing tests pass, all new tests
pass with the patch applied and fail without it). All TAP tests pass. All tests
which are run as part of the `installcheck-world` target pass except one of the
Bloom contrib module tests in `contrib/bloom/sql/bloom.sql`,
 `CREATE INDEX bloomidx ON tst USING bloom (i, t) WITH (col1 = 3);`
which is currently also failing (crashing) for me on master, so it is unrelated 
to the
patch. The test cases seem to cover the new behavior.

Functionality
=============
Given that this optimization is safe for the GROUP BY clause (you can remove
functionally dependent attributes from the clause there as well), the logic
does seem to follow for DISTINCT. It seems semantically correct to do this. As
for the implementation, the author seems to have reasonably addressed concerns
expressed over the distinction between DISTINCT and DISTINCT ON. As far as I
can see, this should be fine.

Code Review
===========
For a small performance hit but an improvement in readability, the length check
can be moved from the individual group by and distinct clause checks into the
helper function

        if (list_length(parse->distinctClause) < 2)
          return;

        and

        if (list_length(parse->groupClause) < 2)
          return;

can be moved into `remove_functionally_dependent_clauses`


Comments/Documentation
=======================

The main helper function that is added `remove_functionally_dependent_clauses`
uses one style of comment--with the name of the function and then the rest of
the description indented which is found some places in the code, 
/*
 * remove_functionally_dependent_clauses
 *              Processes clauselist and removes any items which are deemed to 
be
 *              functionally dependent on other clauselist items.
 *
 * If any item from the list can be removed, then a new list is built which
 * does not contain the removed items.  If no item can be removed then the
 * original list is returned.
 */

while other helper functions in the same file use a different style--all lines
flush to the side with a space. I'm not sure which is preferred

/*
 * limit_needed - do we actually need a Limit plan node?
 *
 * If we have constant-zero OFFSET and constant-null LIMIT, we can skip adding
 * a Limit node.  This is worth checking for because "OFFSET 0" is a common
 * locution for an optimization fence.  (Because other places in the planner
 ...

it looks like the non-indented ones are from older commits (2013 vs 2016).

The list length change is totally optional, but I'll leave it as Waiting On 
Author in case the author wants to make that change.

Overall, LGTM.

The new status of this patch is: Waiting on Author

Re: Removing useless DISTINCT clauses

Reply via email to