[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17272890#comment-17272890 ] Martin Raszyk commented on CALCITE-4242: Are there some updates on this issue? > Wrong plan for nested NOT EXISTS subqueries > --- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug >Reporter: Martin Raszyk >Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalTableScan(table=[[Bug, Q]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$0], $f0=[true]) > LogicalTableScan(table=[[Bug, R]]) > {code} > that corresponds to the following SQL query > {code:java} > SELECT P.X > FROM Bug.P > LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 > FROM Bug.Q > LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 > FROM Bug.R > GROUP BY Z) AS t0 ON TRUE > WHERE t0.$f1 IS NULL > GROUP BY t0.Z) AS t3 ON P.X = t3.Z > WHERE t3.$f1 IS NULL > {code} > which yields the (non-empty) table P as the result. > Hence, the parsed and converted query is not equivalent to the input query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199183#comment-17199183 ] Martin Raszyk commented on CALCITE-4242: You're right that I still have a correlated query in my proposal. It can be avoided by changing the pattern to {code:java} WITH filter AS -- an equivalent query under set semantics, e.g., using left-associative LEFT JOIN, distinct_filter AS (SELECT DISTINCT * FROM filter) SELECT my_p.x FROM P AS my_p LEFT JOIN distinct_filter AS df ON my_p.x = df.x WHERE df.x IS NOT NULL {code} I understand how you handled an OR condition, but my point was that it could be tricky to preserve the multiplicites throughout (e.g., with conditions that are even more complex than a single OR). On the other hand, the set semantics is achieved by using left-associative LEFT JOINs and the multiplicities can be restored by using a single LEFT JOIN together with a single DISTINCT _at the top-level_. The comparison of the performance of the two approaches is likely to be data-dependent: both approaches use three LEFT JOINs in total, two of them are identical, but the third one in your query is on the table R whereas in my case on a subset of the table P. > Wrong plan for nested NOT EXISTS subqueries > --- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug >Reporter: Martin Raszyk >Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalTableScan(table=[[Bug, Q]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$0], $f0=[true]) > LogicalTableScan(table=[[Bug, R]]) > {code} > that corresponds to the following SQL query > {code:java} > SELECT P.X > FROM Bug.P > LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 > FROM Bug.Q > LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 > FROM Bug.R > GROUP BY Z) AS t0 ON TRUE > WHERE t0.$f1 IS NULL > GROUP BY t0.Z) AS t3 ON P.X = t3.Z > WHERE t3.$f1 IS NULL > {code} > which yields the (non-empty) table P as the result. > Hence, the parsed and converted query is not equivalent to the input query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199085#comment-17199085 ] Martin Raszyk commented on CALCITE-4242: I see - thank you for your explanation. However, making the condition slightly more complex, e.g., in the query {code:java} SELECT x FROM P WHERE NOT EXISTS ( SELECT y FROM Q WHERE NOT EXISTS ( SELECT z FROM R WHERE x = z OR 2 * x = z ) ); {code} evaluated on the database obtained by {code:java} DELETE FROM P; DELETE FROM Q; DELETE FROM R; INSERT INTO P VALUES (2); INSERT INTO P VALUES (2); INSERT INTO P VALUES (1); INSERT INTO P VALUES (1); INSERT INTO Q VALUES (2); INSERT INTO Q VALUES (1); INSERT INTO R VALUES (2); INSERT INTO R VALUES (1); {code} makes it difficult to reuse your trick with the distinct keyword. In particular, the query obtained by replacing the condition in your proposal does not preserve multiplicities {code:java} WITH q_present AS (SELECT DISTINCT TRUE as present FROM q), r_present_given_z AS (SELECT DISTINCT TRUE as present, z FROM r GROUP BY z) SELECT my_p.x FROM p my_p LEFT JOIN q_present ON TRUE LEFT JOIN r_present_given_z ON (r_present_given_z.z = my_p.x OR r_present_given_z.z = 2 * my_p.x) WHERE NOT (q_present.present IS NOT NULL AND NOT (r_present_given_z.present IS NOT NULL)); {code} Yet, the filtering approach captured by the pattern {code:java} WITH filter AS -- an equivalent query under set semantics, e.g., the previous query using left-associative LEFT JOIN SELECT x FROM P WHERE x IN (SELECT * FROM filter) {code} would work here as well. > Wrong plan for nested NOT EXISTS subqueries > --- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug >Reporter: Martin Raszyk >Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalTableScan(table=[[Bug, Q]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$0], $f0=[true]) > LogicalTableScan(table=[[Bug, R]]) > {code} > that corresponds to the following SQL query > {code:java} > SELECT P.X > FROM Bug.P > LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 > FROM Bug.Q > LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 > FROM Bug.R > GROUP BY Z) AS t0 ON TRUE > WHERE t0.$f1 IS NULL > GROUP BY t0.Z) AS t3 ON P.X = t3.Z > WHERE t3.$f1 IS NULL > {code} > which yields the (non-empty) table P as the result. > Hence, the parsed and converted query is not equivalent to the input query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199074#comment-17199074 ] Martin Raszyk commented on CALCITE-4242: I'm confused how the distinct keyword would matter for the tables Q and R which do not contain any duplicate values in my counter-example. Could you please elaborate more on this? > Wrong plan for nested NOT EXISTS subqueries > --- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug >Reporter: Martin Raszyk >Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalTableScan(table=[[Bug, Q]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$0], $f0=[true]) > LogicalTableScan(table=[[Bug, R]]) > {code} > that corresponds to the following SQL query > {code:java} > SELECT P.X > FROM Bug.P > LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 > FROM Bug.Q > LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 > FROM Bug.R > GROUP BY Z) AS t0 ON TRUE > WHERE t0.$f1 IS NULL > GROUP BY t0.Z) AS t3 ON P.X = t3.Z > WHERE t3.$f1 IS NULL > {code} > which yields the (non-empty) table P as the result. > Hence, the parsed and converted query is not equivalent to the input query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198921#comment-17198921 ] Martin Raszyk commented on CALCITE-4242: I'm afraid we are running in a circle. Your most recent query is equivalent to your second suggestion which works under the set semantics, but not under the multiset semantics as it does not preserve multiplicities. I would conclude the following: 1. only NOT EXISTS subqueries *without data dependencies crossing two nesting levels* -> use right-associative LEFT JOIN (equivalent also w.r.t. multiplicities) 2. NOT EXISTS subquery *with data dependency crossing two nesting levels* -> cannot use left-associative LEFT JOIN (does not preserve multiplicities) nor right-associative LEFT JOIN (cannot reflect all conditions crossing two nesting levels) Nevertheless, one can use left-associative LEFT JOIN which yields an equivalent table w.r.t. set semantics and use it to filter the original table P, thus obtaining the following query which *is equivalent* to my original query *under the multiset semantics*. The following query can be evaluated using a relational algebra plan without subplans. {code:java} WITH filter AS (WITH q_present AS (SELECT TRUE as present FROM q), r_present_given_z AS (SELECT TRUE as present, z FROM r GROUP BY z) SELECT my_p.x FROM p my_p LEFT JOIN q_present ON TRUE LEFT JOIN r_present_given_z ON r_present_given_z.z = my_p.x WHERE NOT (q_present.present IS NOT NULL AND NOT (r_present_given_z.present IS NOT NULL))) SELECT x FROM P WHERE x IN (SELECT * FROM filter) {code} > Wrong plan for nested NOT EXISTS subqueries > --- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug >Reporter: Martin Raszyk >Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalTableScan(table=[[Bug, Q]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$0], $f0=[true]) > LogicalTableScan(table=[[Bug, R]]) > {code} > that corresponds to the following SQL query > {code:java} > SELECT P.X > FROM Bug.P > LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 > FROM Bug.Q > LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 > FROM Bug.R > GROUP BY Z) AS t0 ON TRUE > WHERE t0.$f1 IS NULL > GROUP BY t0.Z) AS t3 ON P.X = t3.Z > WHERE t3.$f1 IS NULL > {code} > which yields the (non-empty) table P as the result. > Hence, the parsed and converted query is not equivalent to the input query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198801#comment-17198801 ] Martin Raszyk commented on CALCITE-4242: On your version of my query, I got the same output as you, namely {code:java} CREATE TABLE P(x INTEGER); CREATE TABLE Q(y INTEGER); CREATE TABLE R(z INTEGER); INSERT INTO P VALUES (1); INSERT INTO Q VALUES (1); -- evaluate your query x --- 1 (1 row){code} However, my original query {code:java} SELECT x FROM P WHERE NOT EXISTS ( SELECT y FROM Q WHERE NOT EXISTS ( SELECT z FROM R WHERE x = z ) ) {code} yields an empty table (0 rows). So you didn't provide an equivalent query. > Wrong plan for nested NOT EXISTS subqueries > --- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug >Reporter: Martin Raszyk >Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalTableScan(table=[[Bug, Q]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$0], $f0=[true]) > LogicalTableScan(table=[[Bug, R]]) > {code} > that corresponds to the following SQL query > {code:java} > SELECT P.X > FROM Bug.P > LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 > FROM Bug.Q > LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 > FROM Bug.R > GROUP BY Z) AS t0 ON TRUE > WHERE t0.$f1 IS NULL > GROUP BY t0.Z) AS t3 ON P.X = t3.Z > WHERE t3.$f1 IS NULL > {code} > which yields the (non-empty) table P as the result. > Hence, the parsed and converted query is not equivalent to the input query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17197422#comment-17197422 ] Martin Raszyk edited comment on CALCITE-4242 at 9/17/20, 6:45 AM: -- I tried to use `LEFT SEMI JOIN` in PostgreSQL version `psql (PostgreSQL) 12.4 (Ubuntu 12.4-0ubuntu0.20.04.1)`, but I didn't succeed. The query {code:java} SELECT x FROM P LEFT SEMI JOIN Q ON TRUE; {code} leads to syntax error. The query {code:java} SELECT x FROM P SEMI JOIN Q ON TRUE; {code} is evaluated, but does not yield the proper multiplicities. Finally, the query {code:java} SELECT my_p.x FROM P AS my_p SEMI JOIN Q ON TRUE; {code} leads to syntax error, too. Moreover, I said that "one could keep the LEFT JOIN right-associative for nested NOT EXISTS subqueries *without data dependencies crossing two nesting levels*". In my very original query {code:java} SELECT x FROM P WHERE NOT EXISTS ( SELECT y FROM Q WHERE NOT EXISTS ( SELECT z FROM R WHERE x = z ) ) {code} this is not the case and your most recent query which I fixed as follows so that it evaluates {code:java} WITH q_present AS (SELECT TRUE as present FROM q), r_present_given_z AS (SELECT TRUE as present, z FROM r GROUP BY z), q_present_given_r_present_given_z AS ( SELECT TRUE AS present, r_present_given_z.present AS r_present, r_present_given_z.z AS r_z FROM q_present LEFT JOIN r_present_given_z ON TRUE ) SELECT my_p.x FROM P my_p LEFT JOIN q_present_given_r_present_given_z ON q_present_given_r_present_given_z.r_z = my_p.x WHERE NOT( q_present_given_r_present_given_z.present AND NOT q_present_given_r_present_given_z.r_present ); {code} is not equivalent to my very original query with the same counter-example as your very first suggestion: {code:java} CREATE TABLE P(x INTEGER); CREATE TABLE Q(y INTEGER); CREATE TABLE R(z INTEGER); INSERT INTO P VALUES (1); {code} was (Author: mraszyk): I tried to use `LEFT SEMI JOIN` in PostgreSQL version `psql (PostgreSQL) 12.4 (Ubuntu 12.4-0ubuntu0.20.04.1)`, but I didn't succeed. The query {code:java} SELECT x FROM P LEFT SEMI JOIN Q ON TRUE; {code} leads to syntax error. The query {code:java} SELECT x FROM P SEMI JOIN Q ON TRUE; {code} is evaluated, but does not yield the proper multiplicities. Finally, the query {code:java} SELECT my_p.x FROM P AS my_p SEMI JOIN Q ON TRUE; {code} leads to syntax error, too. Moreover, I said that "one could keep the LEFT JOIN right-associative for nested NOT EXISTS subqueries *without data dependencies crossing two nesting levels*". In my very original query {code:java} SELECT x FROM P WHERE NOT EXISTS ( SELECT y FROM Q WHERE NOT EXISTS ( SELECT z FROM R WHERE x = z ) ) {code} this is not the case and your most recent query which I fixed as follows so that it evaluates {code:java} WITH q_present AS (SELECT TRUE as present FROM q), r_present_given_z AS (SELECT TRUE as present, z FROM r GROUP BY z), q_present_given_r_present_given_z AS ( SELECT TRUE AS present, r_present_given_z.present AS r_present, r_present_given_z.z AS r_z FROM q_present LEFT JOIN r_present_given_z ON TRUE ) SELECT my_p.x FROM P my_p LEFT JOIN q_present_given_r_present_given_z ON q_present_given_r_present_given_z.r_z = my_p.x WHERE NOT( q_present_given_r_present_given_z.present AND NOT q_present_given_r_present_given_z.r_present ); {code} is not equivalent to my very original query with the same counter-example as you very first suggestion: {code:java} CREATE TABLE P(x INTEGER); CREATE TABLE Q(y INTEGER); CREATE TABLE R(z INTEGER); INSERT INTO P VALUES (1); {code} > Wrong plan for nested NOT EXISTS subqueries > --- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug >Reporter: Martin Raszyk >Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) >
[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17197422#comment-17197422 ] Martin Raszyk commented on CALCITE-4242: I tried to use `LEFT SEMI JOIN` in PostgreSQL version `psql (PostgreSQL) 12.4 (Ubuntu 12.4-0ubuntu0.20.04.1)`, but I didn't succeed. The query {code:java} SELECT x FROM P LEFT SEMI JOIN Q ON TRUE; {code} leads to syntax error. The query {code:java} SELECT x FROM P SEMI JOIN Q ON TRUE; {code} is evaluated, but does not yield the proper multiplicities. Finally, the query {code:java} SELECT my_p.x FROM P AS my_p SEMI JOIN Q ON TRUE; {code} leads to syntax error, too. Moreover, I said that "one could keep the LEFT JOIN right-associative for nested NOT EXISTS subqueries *without data dependencies crossing two nesting levels*". In my very original query {code:java} SELECT x FROM P WHERE NOT EXISTS ( SELECT y FROM Q WHERE NOT EXISTS ( SELECT z FROM R WHERE x = z ) ) {code} this is not the case and your most recent query which I fixed as follows so that it evaluates {code:java} WITH q_present AS (SELECT TRUE as present FROM q), r_present_given_z AS (SELECT TRUE as present, z FROM r GROUP BY z), q_present_given_r_present_given_z AS ( SELECT TRUE AS present, r_present_given_z.present AS r_present, r_present_given_z.z AS r_z FROM q_present LEFT JOIN r_present_given_z ON TRUE ) SELECT my_p.x FROM P my_p LEFT JOIN q_present_given_r_present_given_z ON q_present_given_r_present_given_z.r_z = my_p.x WHERE NOT( q_present_given_r_present_given_z.present AND NOT q_present_given_r_present_given_z.r_present ); {code} is not equivalent to my very original query with the same counter-example as you very first suggestion: {code:java} CREATE TABLE P(x INTEGER); CREATE TABLE Q(y INTEGER); CREATE TABLE R(z INTEGER); INSERT INTO P VALUES (1); {code} > Wrong plan for nested NOT EXISTS subqueries > --- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug >Reporter: Martin Raszyk >Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalTableScan(table=[[Bug, Q]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$0], $f0=[true]) > LogicalTableScan(table=[[Bug, R]]) > {code} > that corresponds to the following SQL query > {code:java} > SELECT P.X > FROM Bug.P > LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 > FROM Bug.Q > LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 > FROM Bug.R > GROUP BY Z) AS t0 ON TRUE > WHERE t0.$f1 IS NULL > GROUP BY t0.Z) AS t3 ON P.X = t3.Z > WHERE t3.$f1 IS NULL > {code} > which yields the (non-empty) table P as the result. > Hence, the parsed and converted query is not equivalent to the input query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194049#comment-17194049 ] Martin Raszyk commented on CALCITE-4242: Your last query makes sense in the set semantics. However, it seems hard to enforce the right multiplicities, e.g., on a database obtained by {code:java} DELETE FROM P; DELETE FROM Q; DELETE FROM R; INSERT INTO P VALUES (1); INSERT INTO P VALUES (1); INSERT INTO P VALUES (0); INSERT INTO P VALUES (0); INSERT INTO Q VALUES (1); INSERT INTO Q VALUES (0); INSERT INTO R VALUES (1); INSERT INTO R VALUES (0); {code} Let me also point out that it still makes sense to keep the LEFT JOIN right-associative for nested NOT EXISTS subqueries without data dependencies crossing two nesting levels, e.g., in the input query {code:java} SELECT x FROM P WHERE NOT EXISTS ( SELECT y FROM Q WHERE x = y AND NOT EXISTS ( SELECT z FROM R WHERE y = z ) ) {code} for which the currently converted query {code:java} SELECT P.X FROM PQR.P LEFT JOIN (SELECT Y, MIN(TRUE) AS $f1 FROM (SELECT Q.Y, t0.$f1 AS $f0 FROM PQR.Q LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 FROM PQR.R GROUP BY Z) AS t0 ON Q.Y = t0.Z) AS t1 WHERE t1.$f0 IS NULL GROUP BY Y) AS t4 ON P.X = t4.Y WHERE t4.$f1 IS NULL {code} is equivalent (also w.r.t. multiplicities) and could be more efficient than with a left-associative LEFT JOIN. Finally, let me point out that the WHERE clause from your last query becomes more involved if the nesting depth increases, e.g., on the input query {code:java} SELECT x FROM P WHERE NOT EXISTS ( SELECT y FROM Q WHERE x = y AND NOT EXISTS ( SELECT z FROM R WHERE x = z AND y = z AND NOT EXISTS ( SELECT w FROM S WHERE x = w AND y = w AND z = w AND NOT EXISTS ( SELECT v FROM T WHERE x = v AND y = v AND z = v AND w = v ) ) ) ) {code} where it should read {code:java} WHERE my_q.present IS NULL OR (my_r.present IS NOT NULL AND (my_s.present IS NULL OR my_t.present IS NOT NULL)) {code} > Wrong plan for nested NOT EXISTS subqueries > --- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug >Reporter: Martin Raszyk >Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalTableScan(table=[[Bug, Q]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$0], $f0=[true]) > LogicalTableScan(table=[[Bug, R]]) > {code} > that corresponds to the following SQL query > {code:java} > SELECT P.X > FROM Bug.P > LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 > FROM Bug.Q > LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 > FROM Bug.R > GROUP BY Z) AS t0 ON TRUE > WHERE t0.$f1 IS NULL > GROUP BY t0.Z) AS t3 ON P.X = t3.Z > WHERE t3.$f1 IS NULL > {code} > which yields the (non-empty) table P as the result. > Hence, the parsed and converted query is not equivalent to the input query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193811#comment-17193811 ] Martin Raszyk commented on CALCITE-4242: Your proposal is not equivalent to the input query if you initialize the database as follows. {code:java} CREATE TABLE P(x INTEGER); CREATE TABLE Q(y INTEGER); CREATE TABLE R(z INTEGER); INSERT INTO P VALUES (1); {code} Now the input query yields the table P as the result, but your query yields an empty result. > Wrong plan for nested NOT EXISTS subqueries > --- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug >Reporter: Martin Raszyk >Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalTableScan(table=[[Bug, Q]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$0], $f0=[true]) > LogicalTableScan(table=[[Bug, R]]) > {code} > that corresponds to the following SQL query > {code:java} > SELECT P.X > FROM Bug.P > LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 > FROM Bug.Q > LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 > FROM Bug.R > GROUP BY Z) AS t0 ON TRUE > WHERE t0.$f1 IS NULL > GROUP BY t0.Z) AS t3 ON P.X = t3.Z > WHERE t3.$f1 IS NULL > {code} > which yields the (non-empty) table P as the result. > Hence, the parsed and converted query is not equivalent to the input query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CALCITE-4243) Wrong SQL conversion for JOIN and NOT EXISTS subquery
Martin Raszyk created CALCITE-4243: -- Summary: Wrong SQL conversion for JOIN and NOT EXISTS subquery Key: CALCITE-4243 URL: https://issues.apache.org/jira/browse/CALCITE-4243 Project: Calcite Issue Type: Bug Reporter: Martin Raszyk Suppose we initialize an empty database as follows. {code:java} CREATE TABLE P(x INTEGER); CREATE TABLE Q(y INTEGER); CREATE TABLE R(z INTEGER); {code} The following query {code:java} SELECT * FROM P JOIN Q ON TRUE WHERE NOT EXISTS ( SELECT * FROM R WHERE x = z ) {code} is parsed and converted to the following plan {code:java} LogicalProject(X=[$0], Y=[$1]) LogicalFilter(condition=[IS NULL($3)]) LogicalJoin(condition=[=($0, $2)], joinType=[left]) LogicalJoin(condition=[true], joinType=[inner]) LogicalTableScan(table=[[Bug, P]]) LogicalTableScan(table=[[Bug, Q]]) LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) LogicalProject(Z=[$0], $f0=[true]) LogicalTableScan(table=[[Bug, R]]) {code} that is subsequently converted to the following SQL query {code:java} SELECT P.X, Q.Y FROM Bug.P, Bug.Q LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 FROM Bug.R GROUP BY Z) AS t0 ON P.X = t0.Z WHERE t0.$f1 IS NULL{code} which is not valid in SQL, because the LEFT JOIN operation acts only on the tables Q and R, so the ON condition cannot access the column X from the table P. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
Martin Raszyk created CALCITE-4242: -- Summary: Wrong plan for nested NOT EXISTS subqueries Key: CALCITE-4242 URL: https://issues.apache.org/jira/browse/CALCITE-4242 Project: Calcite Issue Type: Bug Reporter: Martin Raszyk Suppose we initialize an empty database as follows. {code:java} CREATE TABLE P(x INTEGER); CREATE TABLE Q(y INTEGER); CREATE TABLE R(z INTEGER); INSERT INTO P VALUES (1); INSERT INTO Q VALUES (1);{code} The following query is supposed to yield an empty table as the result. {code:java} SELECT x FROM P WHERE NOT EXISTS ( SELECT y FROM Q WHERE NOT EXISTS ( SELECT z FROM R WHERE x = z ) ){code} However, the query is parsed and converted to the following plan {code:java} LogicalProject(X=[$0]) LogicalFilter(condition=[IS NULL($2)]) LogicalJoin(condition=[=($0, $1)], joinType=[left]) LogicalTableScan(table=[[Bug, P]]) LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) LogicalProject(Z=[$1], $f0=[true]) LogicalFilter(condition=[IS NULL($2)]) LogicalJoin(condition=[true], joinType=[left]) LogicalTableScan(table=[[Bug, Q]]) LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) LogicalProject(Z=[$0], $f0=[true]) LogicalTableScan(table=[[Bug, R]]) {code} that corresponds to the following SQL query {code:java} SELECT P.X FROM Bug.P LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 FROM Bug.Q LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 FROM Bug.R GROUP BY Z) AS t0 ON TRUE WHERE t0.$f1 IS NULL GROUP BY t0.Z) AS t3 ON P.X = t3.Z WHERE t3.$f1 IS NULL {code} which yields the (non-empty) table P as the result. Hence, the parsed and converted query is not equivalent to the input query. -- This message was sent by Atlassian Jira (v8.3.4#803005)