Re: Is "DISTINCT" making a diffence in: SELECT [DISTINCT] ... EXCEPT
On Wed, 2023-11-15 at 10:57 +0100, Dimitrios Apostolou wrote:
> SELECT [DISTINCT] ... EXCEPT ...
>
> In this query I get the same results regardless of including DISTINCT or
> not. But I get different query plans, I get an extra HashAggregate node
> in the case of SELECT DISTINCT. Any idea why?

The DISTINCT is superfluous, because EXCEPT already removes duplicate rows.
However, the planner does not invest extra processing cycles to detect that
you wrote a superfluous DISTINCT, and it does not remove it.

As a consequence, you end up with a pointless extra execution plan node
that does not achieve anything except slowing down the query.

Remove the DISTINCT.

Yours,
Laurenz Albe
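To illustrate the point that EXCEPT already removes duplicates, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The tables a and b and their contents are made up for the demonstration and do not come from the original query. SQLite's EXCEPT also removes duplicate rows, so the result sets match; the sketch says nothing about PostgreSQL's query plans or the extra HashAggregate node.

```python
import sqlite3

# Hypothetical tables "a" and "b"; SQLite is only a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (1), (2), (2), (3);  -- note the duplicates
    INSERT INTO b VALUES (3);
""")

# EXCEPT without DISTINCT on the left-hand SELECT.
plain = sorted(conn.execute(
    "SELECT x FROM a EXCEPT SELECT x FROM b").fetchall())

# The same query with an explicit (superfluous) DISTINCT.
distinct = sorted(conn.execute(
    "SELECT DISTINCT x FROM a EXCEPT SELECT x FROM b").fetchall())

conn.close()

print(plain)     # [(1,), (2,)]
print(distinct)  # [(1,), (2,)]
```

Both queries return each remaining value once, even though a contains every value twice. To see the plan difference itself, you would run EXPLAIN on both variants against your own PostgreSQL tables.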
Re: Is "DISTINCT" making a diffence in: SELECT [DISTINCT] ... EXCEPT
On 2023-11-15 12:12 +0100, Dimitrios Apostolou wrote:
> On Wed, 15 Nov 2023, Erik Wienhold wrote:
>
> > On 2023-11-15 10:57 +0100, Dimitrios Apostolou wrote:
> > > SELECT [DISTINCT] ... EXCEPT ...
> > >
> > > In this query I get the same results regardless of including DISTINCT or
> > > not. But I get different query plans, I get an extra HashAggregate node
> > > in the case of SELECT DISTINCT. Any idea why?
> >
> > As Tom Lane recently wrote[1] EXCEPT is not optimized and will operate
> > on the subqueries which are planned independently.
> >
> > [1] https://www.postgresql.org/message-id/2664450.1698799...@sss.pgh.pa.us
>
> Heh, as he wrote to me even. :-) I just wanted to make sure that this is
> indeed a missing optimisation of the planner, and that the queries are
> effectively the same. Thank you for clarifying.
>
> As mentioned, the docs don't make it clear if the SELECT DISTINCT part is
> implied or not, only the EXCEPT DISTINCT part is clearly on by default.

SELECT ALL is the default, as spelled out in [1]. DISTINCT as the default
for UNION/EXCEPT/INTERSECT makes sense because those are set operators.

I guess SELECT ALL is the default because SQL allows duplicate rows
(contrary to the relational model), and the user should instead be explicit
about wanting distinct rows, which requires additional computation.

But when combining subqueries with the default UNION/EXCEPT/INTERSECT, you
effectively get SELECT DISTINCT ... UNION SELECT DISTINCT ... when it comes
to the result.

[1] https://www.postgresql.org/docs/current/sql-select.html#SQL-DISTINCT

-- 
Erik
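The defaults described above can be checked with a small sketch. It again uses Python's sqlite3 module as a stand-in for PostgreSQL, with made-up tables a and b. SQLite accepts SELECT ALL, UNION and UNION ALL and follows the same duplicate-handling defaults for them, but it is not PostgreSQL, so treat this as an illustration of the result sets only.

```python
import sqlite3

# Hypothetical tables "a" and "b"; SQLite is only a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (1), (2);
    INSERT INTO b VALUES (2), (3);
""")

def rows(sql):
    """Run a query and return its rows sorted, for stable comparison."""
    return sorted(conn.execute(sql).fetchall())

# A plain SELECT behaves like SELECT ALL: duplicates are kept.
select_default = rows("SELECT x FROM a")
select_all = rows("SELECT ALL x FROM a")

# A plain UNION removes duplicates, so adding DISTINCT to the inputs
# does not change the result.
union_default = rows("SELECT x FROM a UNION SELECT x FROM b")
union_of_distinct = rows(
    "SELECT DISTINCT x FROM a UNION SELECT DISTINCT x FROM b")

# UNION ALL has to be requested explicitly to keep duplicates.
union_all = rows("SELECT x FROM a UNION ALL SELECT x FROM b")

conn.close()

print(select_default)     # [(1,), (1,), (2,)]
print(union_default)      # [(1,), (2,), (3,)]
print(union_of_distinct)  # [(1,), (2,), (3,)]
print(union_all)          # [(1,), (1,), (2,), (2,), (3,)]
```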
Re: Is "DISTINCT" making a diffence in: SELECT [DISTINCT] ... EXCEPT
On Wed, 15 Nov 2023, Erik Wienhold wrote:

> On 2023-11-15 10:57 +0100, Dimitrios Apostolou wrote:
> > SELECT [DISTINCT] ... EXCEPT ...
> >
> > In this query I get the same results regardless of including DISTINCT or
> > not. But I get different query plans, I get an extra HashAggregate node
> > in the case of SELECT DISTINCT. Any idea why?
>
> As Tom Lane recently wrote[1] EXCEPT is not optimized and will operate
> on the subqueries which are planned independently.
>
> [1] https://www.postgresql.org/message-id/2664450.1698799...@sss.pgh.pa.us

Heh, as he wrote to me even. :-) I just wanted to make sure that this is
indeed a missing optimisation of the planner, and that the queries are
effectively the same. Thank you for clarifying.

As mentioned, the docs don't make it clear if the SELECT DISTINCT part is
implied or not, only the EXCEPT DISTINCT part is clearly on by default.

Dimitris
Re: Is "DISTINCT" making a diffence in: SELECT [DISTINCT] ... EXCEPT
On 2023-11-15 10:57 +0100, Dimitrios Apostolou wrote:
> SELECT [DISTINCT] ... EXCEPT ...
>
> In this query I get the same results regardless of including DISTINCT or
> not. But I get different query plans, I get an extra HashAggregate node
> in the case of SELECT DISTINCT. Any idea why?

As Tom Lane recently wrote[1] EXCEPT is not optimized and will operate
on the subqueries which are planned independently.

[1] https://www.postgresql.org/message-id/2664450.1698799...@sss.pgh.pa.us

-- 
Erik