[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17272890#comment-17272890 ] Martin Raszyk commented on CALCITE-4242: Are there some updates on this issue? > Wrong plan for nested NOT EXISTS subqueries > --- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug >Reporter: Martin Raszyk >Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalTableScan(table=[[Bug, Q]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$0], $f0=[true]) > LogicalTableScan(table=[[Bug, R]]) > {code} > that corresponds to the following SQL query > {code:java} > SELECT P.X > FROM Bug.P > LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 > FROM Bug.Q > LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 > FROM Bug.R > GROUP BY Z) AS t0 ON TRUE > WHERE t0.$f1 IS NULL > GROUP BY t0.Z) AS t3 ON P.X = t3.Z > WHERE t3.$f1 IS NULL > {code} > which yields the (non-empty) table P as the result. > Hence, the parsed and converted query is not equivalent to the input query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199548#comment-17199548 ] James Starr commented on CALCITE-4242: -- One nit for your example, what if the resulting filter should return x when it is null? I think adding a simple boolean column handles some of the edge cases but I think the join could be much more complicated. I think the an important insight from you approach is that the filtered query needs to be apart of the filter. So the algorithm goes: Terms(open to suggestions): Data Query - the query that will eventually be filtered Data Filter - the filter with sub-queries Filter Query - A query that will generate all combinations of columns used from data query that will pass the filters 1. Create the base of the filter query from the data query. 2. Add rewrite sub query filters as LEFT JOINS to build up the filter query. 3. Use some mechanism to perform a LEFT JOIN from the data query to filter query, this needs to handle the multiplicity 4. Rewrite the the sub-query filter in the data filter to use a field from the filter query. > Wrong plan for nested NOT EXISTS subqueries > --- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug >Reporter: Martin Raszyk >Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalTableScan(table=[[Bug, Q]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$0], $f0=[true]) > LogicalTableScan(table=[[Bug, R]]) > {code} > that corresponds to the following SQL query > {code:java} > SELECT P.X > FROM Bug.P > LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 > FROM Bug.Q > LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 > FROM Bug.R > GROUP BY Z) AS t0 ON TRUE > WHERE t0.$f1 IS NULL > GROUP BY t0.Z) AS t3 ON P.X = t3.Z > WHERE t3.$f1 IS NULL > {code} > which yields the (non-empty) table P as the result. > Hence, the parsed and converted query is not equivalent to the input query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199183#comment-17199183 ] Martin Raszyk commented on CALCITE-4242: You're right that I still have a correlated query in my proposal. It can be avoided by changing the pattern to {code:java} WITH filter AS -- an equivalent query under set semantics, e.g., using left-associative LEFT JOIN, distinct_filter AS (SELECT DISTINCT * FROM filter) SELECT my_p.x FROM P AS my_p LEFT JOIN distinct_filter AS df ON my_p.x = df.x WHERE df.x IS NOT NULL {code} I understand how you handled an OR condition, but my point was that it could be tricky to preserve the multiplicites throughout (e.g., with conditions that are even more complex than a single OR). On the other hand, the set semantics is achieved by using left-associative LEFT JOINs and the multiplicities can be restored by using a single LEFT JOIN together with a single DISTINCT _at the top-level_. The comparison of the performance of the two approaches is likely to be data-dependent: both approaches use three LEFT JOINs in total, two of them are identical, but the third one in your query is on the table R whereas in my case on a subset of the table P. > Wrong plan for nested NOT EXISTS subqueries > --- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug >Reporter: Martin Raszyk >Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalTableScan(table=[[Bug, Q]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$0], $f0=[true]) > LogicalTableScan(table=[[Bug, R]]) > {code} > that corresponds to the following SQL query > {code:java} > SELECT P.X > FROM Bug.P > LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 > FROM Bug.Q > LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 > FROM Bug.R > GROUP BY Z) AS t0 ON TRUE > WHERE t0.$f1 IS NULL > GROUP BY t0.Z) AS t3 ON P.X = t3.Z > WHERE t3.$f1 IS NULL > {code} > which yields the (non-empty) table P as the result. > Hence, the parsed and converted query is not equivalent to the input query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199086#comment-17199086 ] James Starr commented on CALCITE-4242: -- On an OR condition you will have to fork the join if you do not have access to semi join. But why re-invent the wheel when when SEMI LEFT JOIN were created for this use case. Your proposal is largely self defeating because you still have correlated query after you decorrelated. {code:sql} WITH q_present AS (SELECT DISTINCT TRUE as present FROM q), r_present_given_z AS (SELECT DISTINCT TRUE as present, z FROM r GROUP BY z) SELECT my_p.x FROM p my_p LEFT JOIN q_present ON TRUE LEFT JOIN r_present_given_z AS r1 ON r_present_given_z.z = my_p.x LEFT JOIN r_present_given_z AS r2 ON r_present_given_z.z = 2 * my_p.x WHERE NOT (q_present.present IS NOT NULL AND NOT (r1.present IS NOT NULL OR r2.present IS NOT NULL)); {code} > Wrong plan for nested NOT EXISTS subqueries > --- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug >Reporter: Martin Raszyk >Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalTableScan(table=[[Bug, Q]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$0], $f0=[true]) > LogicalTableScan(table=[[Bug, R]]) > {code} > that corresponds to the following SQL query > {code:java} > SELECT P.X > FROM Bug.P > LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 > FROM Bug.Q > LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 > FROM Bug.R > GROUP BY Z) AS t0 ON TRUE > WHERE t0.$f1 IS NULL > GROUP BY t0.Z) AS t3 ON P.X = t3.Z > WHERE t3.$f1 IS NULL > {code} > which yields the (non-empty) table P as the result. > Hence, the parsed and converted query is not equivalent to the input query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199085#comment-17199085 ] Martin Raszyk commented on CALCITE-4242: I see - thank you for your explanation. However, making the condition slightly more complex, e.g., in the query {code:java} SELECT x FROM P WHERE NOT EXISTS ( SELECT y FROM Q WHERE NOT EXISTS ( SELECT z FROM R WHERE x = z OR 2 * x = z ) ); {code} evaluated on the database obtained by {code:java} DELETE FROM P; DELETE FROM Q; DELETE FROM R; INSERT INTO P VALUES (2); INSERT INTO P VALUES (2); INSERT INTO P VALUES (1); INSERT INTO P VALUES (1); INSERT INTO Q VALUES (2); INSERT INTO Q VALUES (1); INSERT INTO R VALUES (2); INSERT INTO R VALUES (1); {code} makes it difficult to reuse your trick with the distinct keyword. In particular, the query obtained by replacing the condition in your proposal does not preserve multiplicities {code:java} WITH q_present AS (SELECT DISTINCT TRUE as present FROM q), r_present_given_z AS (SELECT DISTINCT TRUE as present, z FROM r GROUP BY z) SELECT my_p.x FROM p my_p LEFT JOIN q_present ON TRUE LEFT JOIN r_present_given_z ON (r_present_given_z.z = my_p.x OR r_present_given_z.z = 2 * my_p.x) WHERE NOT (q_present.present IS NOT NULL AND NOT (r_present_given_z.present IS NOT NULL)); {code} Yet, the filtering approach captured by the pattern {code:java} WITH filter AS -- an equivalent query under set semantics, e.g., the previous query using left-associative LEFT JOIN SELECT x FROM P WHERE x IN (SELECT * FROM filter) {code} would work here as well. > Wrong plan for nested NOT EXISTS subqueries > --- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug >Reporter: Martin Raszyk >Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalTableScan(table=[[Bug, Q]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$0], $f0=[true]) > LogicalTableScan(table=[[Bug, R]]) > {code} > that corresponds to the following SQL query > {code:java} > SELECT P.X > FROM Bug.P > LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 > FROM Bug.Q > LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 > FROM Bug.R > GROUP BY Z) AS t0 ON TRUE > WHERE t0.$f1 IS NULL > GROUP BY t0.Z) AS t3 ON P.X = t3.Z > WHERE t3.$f1 IS NULL > {code} > which yields the (non-empty) table P as the result. > Hence, the parsed and converted query is not equivalent to the input query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199083#comment-17199083 ] James Starr commented on CALCITE-4242: -- Adding distincts generates the correct results. Also, in the original queries, if the exists clauses are turned to IN clauses, then a similiar hoisting algorithm would be needed. Finally, it could be generalized and work in the absents of NOTs. {code:sql} CREATE OR REPLACE VIEW CALCITE_4242 AS WITH q_present AS (SELECT DISTINCT TRUE as present FROM q), r_present_given_z AS (SELECT DISTINCT TRUE as present, z FROM r GROUP BY z) SELECT my_p.x FROM p my_p LEFT JOIN q_present ON TRUE LEFT JOIN r_present_given_z ON r_present_given_z.z = my_p.x WHERE NOT (q_present.present IS NOT NULL AND NOT (r_present_given_z.present IS NOT NULL)); {code} Generates {code} ?column? | x --+--- (0 rows) ?column? | x --+--- QUERY_2 | 1 QUERY_2 | 1 QUERY_2 | 0 QUERY_2 | 0 (4 rows) ?column? | x --+--- QUERY_3 | 1 (1 row) {code} > Wrong plan for nested NOT EXISTS subqueries > --- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug >Reporter: Martin Raszyk >Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalTableScan(table=[[Bug, Q]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$0], $f0=[true]) > LogicalTableScan(table=[[Bug, R]]) > {code} > that corresponds to the following SQL query > {code:java} > SELECT P.X > FROM Bug.P > LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 > FROM Bug.Q > LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 > FROM Bug.R > GROUP BY Z) AS t0 ON TRUE > WHERE t0.$f1 IS NULL > GROUP BY t0.Z) AS t3 ON P.X = t3.Z > WHERE t3.$f1 IS NULL > {code} > which yields the (non-empty) table P as the result. > Hence, the parsed and converted query is not equivalent to the input query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199074#comment-17199074 ] Martin Raszyk commented on CALCITE-4242: I'm confused how the distinct keyword would matter for the tables Q and R which do not contain any duplicate values in my counter-example. Could you please elaborate more on this? > Wrong plan for nested NOT EXISTS subqueries > --- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug >Reporter: Martin Raszyk >Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalTableScan(table=[[Bug, Q]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$0], $f0=[true]) > LogicalTableScan(table=[[Bug, R]]) > {code} > that corresponds to the following SQL query > {code:java} > SELECT P.X > FROM Bug.P > LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 > FROM Bug.Q > LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 > FROM Bug.R > GROUP BY Z) AS t0 ON TRUE > WHERE t0.$f1 IS NULL > GROUP BY t0.Z) AS t3 ON P.X = t3.Z > WHERE t3.$f1 IS NULL > {code} > which yields the (non-empty) table P as the result. > Hence, the parsed and converted query is not equivalent to the input query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199043#comment-17199043 ] James Starr commented on CALCITE-4242: -- As brought up before, the group by/aggregate function/distinct is needed to handle multiplicity. If you add distinct to the selects for q_present and r_present_given, then the multiplicity is correct. There probably needs to be a more general fix for this in calcite because the same logic also needs to applied to all other multi nested subqueries, where the sub queries are not traversing down a tree. > Wrong plan for nested NOT EXISTS subqueries > --- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug >Reporter: Martin Raszyk >Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalTableScan(table=[[Bug, Q]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$0], $f0=[true]) > LogicalTableScan(table=[[Bug, R]]) > {code} > that corresponds to the following SQL query > {code:java} > SELECT P.X > FROM Bug.P > LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 > FROM Bug.Q > LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 > FROM Bug.R > GROUP BY Z) AS t0 ON TRUE > WHERE t0.$f1 IS NULL > GROUP BY t0.Z) AS t3 ON P.X = t3.Z > WHERE t3.$f1 IS NULL > {code} > which yields the (non-empty) table P as the result. > Hence, the parsed and converted query is not equivalent to the input query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198921#comment-17198921 ] Martin Raszyk commented on CALCITE-4242: I'm afraid we are running in a circle. Your most recent query is equivalent to your second suggestion which works under the set semantics, but not under the multiset semantics as it does not preserve multiplicities. I would conclude the following: 1. only NOT EXISTS subqueries *without data dependencies crossing two nesting levels* -> use right-associative LEFT JOIN (equivalent also w.r.t. multiplicities) 2. NOT EXISTS subquery *with data dependency crossing two nesting levels* -> cannot use left-associative LEFT JOIN (does not preserve multiplicities) nor right-associative LEFT JOIN (cannot reflect all conditions crossing two nesting levels) Nevertheless, one can use left-associative LEFT JOIN which yields an equivalent table w.r.t. set semantics and use it to filter the original table P, thus obtaining the following query which *is equivalent* to my original query *under the multiset semantics*. The following query can be evaluated using a relational algebra plan without subplans. {code:java} WITH filter AS (WITH q_present AS (SELECT TRUE as present FROM q), r_present_given_z AS (SELECT TRUE as present, z FROM r GROUP BY z) SELECT my_p.x FROM p my_p LEFT JOIN q_present ON TRUE LEFT JOIN r_present_given_z ON r_present_given_z.z = my_p.x WHERE NOT (q_present.present IS NOT NULL AND NOT (r_present_given_z.present IS NOT NULL))) SELECT x FROM P WHERE x IN (SELECT * FROM filter) {code} > Wrong plan for nested NOT EXISTS subqueries > --- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug >Reporter: Martin Raszyk >Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalTableScan(table=[[Bug, Q]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$0], $f0=[true]) > LogicalTableScan(table=[[Bug, R]]) > {code} > that corresponds to the following SQL query > {code:java} > SELECT P.X > FROM Bug.P > LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 > FROM Bug.Q > LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 > FROM Bug.R > GROUP BY Z) AS t0 ON TRUE > WHERE t0.$f1 IS NULL > GROUP BY t0.Z) AS t3 ON P.X = t3.Z > WHERE t3.$f1 IS NULL > {code} > which yields the (non-empty) table P as the result. > Hence, the parsed and converted query is not equivalent to the input query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198831#comment-17198831 ] James Starr commented on CALCITE-4242: -- [~mraszyk], So I was wrong about, it does need to be to left associative. {code:sql} CREATE OR REPLACE VIEW CALCITE_4242 AS WITH q_present AS (SELECT TRUE as present FROM q), r_present_given_z AS (SELECT TRUE as present, z FROM r GROUP BY z) SELECT my_p.x FROM p my_p LEFT JOIN q_present ON TRUE LEFT JOIN r_present_given_z ON r_present_given_z.z = my_p.x WHERE NOT (q_present.present IS NOT NULL AND NOT (r_present_given_z.present IS NOT NULL)) CREATE TABLE IF NOT EXISTS P(x INTEGER); CREATE TABLE IF NOT EXISTS Q(y INTEGER); CREATE TABLE IF NOT EXISTS R(z INTEGER); DELETE FROM P; DELETE FROM Q; DELETE FROM R; INSERT INTO P VALUES (1); INSERT INTO Q VALUES (1); \o results.txt SELECT 'QUERY_1', * FROM CALCITE_4242; \o DELETE FROM P; DELETE FROM Q; DELETE FROM R; INSERT INTO P VALUES (1); INSERT INTO P VALUES (1); INSERT INTO P VALUES (0); INSERT INTO P VALUES (0); INSERT INTO Q VALUES (1); INSERT INTO Q VALUES (0); INSERT INTO R VALUES (1); INSERT INTO R VALUES (0); \o | cat - >> results.txt SELECT 'QUERY_2', * FROM CALCITE_4242; \o DELETE FROM P; DELETE FROM Q; DELETE FROM R; INSERT INTO P VALUES (1); \o | cat - >> results.txt SELECT 'QUERY_3', * FROM CALCITE_4242; \o {code} Generates the following file {code} ?column? | x --+--- (0 rows) ?column? | x --+--- QUERY_2 | 1 QUERY_2 | 1 QUERY_2 | 1 QUERY_2 | 1 QUERY_2 | 0 QUERY_2 | 0 QUERY_2 | 0 QUERY_2 | 0 (8 rows) ?column? | x --+--- QUERY_3 | 1 (1 row) {code} > Wrong plan for nested NOT EXISTS subqueries > --- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug >Reporter: Martin Raszyk >Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalTableScan(table=[[Bug, Q]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$0], $f0=[true]) > LogicalTableScan(table=[[Bug, R]]) > {code} > that corresponds to the following SQL query > {code:java} > SELECT P.X > FROM Bug.P > LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 > FROM Bug.Q > LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 > FROM Bug.R > GROUP BY Z) AS t0 ON TRUE > WHERE t0.$f1 IS NULL > GROUP BY t0.Z) AS t3 ON P.X = t3.Z > WHERE t3.$f1 IS NULL > {code} > which yields the (non-empty) table P as the result. > Hence, the parsed and converted query is not equivalent to the input query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198801#comment-17198801 ] Martin Raszyk commented on CALCITE-4242: On your version of my query, I got the same output as you, namely {code:java} CREATE TABLE P(x INTEGER); CREATE TABLE Q(y INTEGER); CREATE TABLE R(z INTEGER); INSERT INTO P VALUES (1); INSERT INTO Q VALUES (1); -- evaluate your query x --- 1 (1 row){code} However, my original query {code:java} SELECT x FROM P WHERE NOT EXISTS ( SELECT y FROM Q WHERE NOT EXISTS ( SELECT z FROM R WHERE x = z ) ) {code} yields an empty table (0 rows). So you didn't provide an equivalent query. > Wrong plan for nested NOT EXISTS subqueries > --- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug >Reporter: Martin Raszyk >Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalTableScan(table=[[Bug, Q]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$0], $f0=[true]) > LogicalTableScan(table=[[Bug, R]]) > {code} > that corresponds to the following SQL query > {code:java} > SELECT P.X > FROM Bug.P > LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 > FROM Bug.Q > LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 > FROM Bug.R > GROUP BY Z) AS t0 ON TRUE > WHERE t0.$f1 IS NULL > GROUP BY t0.Z) AS t3 ON P.X = t3.Z > WHERE t3.$f1 IS NULL > {code} > which yields the (non-empty) table P as the result. > Hence, the parsed and converted query is not equivalent to the input query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17197921#comment-17197921 ] James Starr commented on CALCITE-4242: -- Sorry, about not validating the previous query on postgres. The null conditional tripped me up. The correct query for postgres would be: {code:sql} // Some comments here public String getFoo() WITH q_present AS (SELECT TRUE as present FROM q), r_present_given_z AS (SELECT TRUE as present, z FROM r GROUP BY z), q_present_given_r_present_given_z AS ( SELECT distinct TRUE AS present, r_present_given_z.present AS r_present, r_present_given_z.z AS r_z FROM q_present LEFT JOIN r_present_given_z ON TRUE ) SELECT my_p.x FROM P my_p LEFT JOIN q_present_given_r_present_given_z ON q_present_given_r_present_given_z.r_z = my_p.x WHERE NOT( (q_present_given_r_present_given_z.present IS NOT NULL) AND NOT (q_present_given_r_present_given_z.r_present IS NOT NULL) ); {code} {code:sql} CREATE TABLE P(x INTEGER); CREATE TABLE Q(y INTEGER); CREATE TABLE R(z INTEGER); INSERT INTO P VALUES (1); INSERT INTO Q VALUES (1); x --- 1 (1 row) DELETE FROM P; DELETE FROM Q; DELETE FROM R; INSERT INTO P VALUES (1); INSERT INTO P VALUES (1); INSERT INTO P VALUES (0); INSERT INTO P VALUES (0); INSERT INTO Q VALUES (1); INSERT INTO Q VALUES (0); INSERT INTO R VALUES (1); INSERT INTO R VALUES (0); x --- 1 1 0 0 (4 rows) DELETE FROM P; DELETE FROM Q; DELETE FROM R; INSERT INTO P VALUES (1); x --- 1 (1 row) {code} The semi left join/distinct/aggregation handles the multiplicity. All of the conditional need to hoisted to the top where clause. If the joins are kept right-associative, then only top most query joined, in the example q_present_given_r_present_given_z, needs to have an aggregation. However, I think this could result in cross product joins if there are 2 nested queries in a query that is already nested. However, most likely the optimization rules should be able to rearrange the join such that it be right or left associative is not a question of performance, simply of correctness, assuming that the rewrite is in such a way that it does not block the rule. This query could be very expensive if it was right associative and the planner did not rearrange the joins. {code:sql} CREATE TABLE t_1(t1_id INTEGER); CREATE TABLE t_2(y INTEGER); CREATE TABLE t_2_1(t1_id INTEGER); CREATE TABLE t_2_2(t1_id INTEGER); SELECT * FROM t_1 WHERE EXISTS ( SELECT * FROM t_2 WHERE EXISTS(SELECT * FROM t_2_1 WHERE t1.t1_id = t2_1.t1_id) AND EXISTS(SELECT * FROM t_2_2 WHERE t1.t1_id = t2_2.t1_id) ); {code} So I believe the algorithm works out to something like this: {code:java} //De-morgan the expression such that you have a normalized AND expression List demorganSubExpression = demorganToAnd(subExpression); List resultExpression = new ArrayList<>(); for(RexNode subExpresion: demorganSubExpression){ if(hasSubQuery(subExpression)) { bb.hoiste(subExpression); } else { resultExpression.add(subExpression); } return buildAndExpression(resultExpression); {code} With bb.hoiste traversing up the tree until until all elements are in scope. > Wrong plan for nested NOT EXISTS subqueries > --- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug >Reporter: Martin Raszyk >Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalTableScan(table=[[Bug, Q]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$0], $f0=[true]) > LogicalTableScan(table=[[Bug, R]]) > {code} > that corresponds to the following SQL query > {code:java} > SELECT P.X > FROM Bug.P > LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 > FROM Bug.Q > LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 > FROM Bug.R > GROUP BY Z) AS t0 ON TRUE > WHERE t0.$f1 IS NULL > GROUP BY
[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17197422#comment-17197422 ] Martin Raszyk commented on CALCITE-4242: I tried to use `LEFT SEMI JOIN` in PostgreSQL version `psql (PostgreSQL) 12.4 (Ubuntu 12.4-0ubuntu0.20.04.1)`, but I didn't succeed. The query {code:java} SELECT x FROM P LEFT SEMI JOIN Q ON TRUE; {code} leads to syntax error. The query {code:java} SELECT x FROM P SEMI JOIN Q ON TRUE; {code} is evaluated, but does not yield the proper multiplicities. Finally, the query {code:java} SELECT my_p.x FROM P AS my_p SEMI JOIN Q ON TRUE; {code} leads to syntax error, too. Moreover, I said that "one could keep the LEFT JOIN right-associative for nested NOT EXISTS subqueries *without data dependencies crossing two nesting levels*". In my very original query {code:java} SELECT x FROM P WHERE NOT EXISTS ( SELECT y FROM Q WHERE NOT EXISTS ( SELECT z FROM R WHERE x = z ) ) {code} this is not the case and your most recent query which I fixed as follows so that it evaluates {code:java} WITH q_present AS (SELECT TRUE as present FROM q), r_present_given_z AS (SELECT TRUE as present, z FROM r GROUP BY z), q_present_given_r_present_given_z AS ( SELECT TRUE AS present, r_present_given_z.present AS r_present, r_present_given_z.z AS r_z FROM q_present LEFT JOIN r_present_given_z ON TRUE ) SELECT my_p.x FROM P my_p LEFT JOIN q_present_given_r_present_given_z ON q_present_given_r_present_given_z.r_z = my_p.x WHERE NOT( q_present_given_r_present_given_z.present AND NOT q_present_given_r_present_given_z.r_present ); {code} is not equivalent to my very original query with the same counter-example as you very first suggestion: {code:java} CREATE TABLE P(x INTEGER); CREATE TABLE Q(y INTEGER); CREATE TABLE R(z INTEGER); INSERT INTO P VALUES (1); {code} > Wrong plan for nested NOT EXISTS subqueries > --- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug >Reporter: Martin Raszyk >Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalTableScan(table=[[Bug, Q]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$0], $f0=[true]) > LogicalTableScan(table=[[Bug, R]]) > {code} > that corresponds to the following SQL query > {code:java} > SELECT P.X > FROM Bug.P > LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 > FROM Bug.Q > LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 > FROM Bug.R > GROUP BY Z) AS t0 ON TRUE > WHERE t0.$f1 IS NULL > GROUP BY t0.Z) AS t3 ON P.X = t3.Z > WHERE t3.$f1 IS NULL > {code} > which yields the (non-empty) table P as the result. > Hence, the parsed and converted query is not equivalent to the input query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194478#comment-17194478 ] James Starr commented on CALCITE-4242: -- I believe you are correct about keeping the {code:sql}LEFT JOIN{code} right associative due to solving the multiplicity issue. If we make the top most join `LEFT SEMI`, it would solve the multiplicity issue. Further more, I think this could be expanded to all exists clauses not just NOT exists. {code:sql} WITH q_present AS (SELECT TRUE as present FROM q), r_present_given_z AS (SELECT TRUE, z FROM r GROUP BY z) q_present_given_r_present_given_z AS ( SELECT TRUE AS present, r_present_given_x.present AS r_present, r_present_given_z.z AS r_z FROM q_present LEFT JOIN r_present_given_z ON TRUE ) SELECT my_p.x FROM P my_p LEFT SEMI JOIN q_present_given_r_present_given_z ON q_present_given_r_present_given_z.z = my_p.x WHERE NOT( q_present_given_r_present_given_z.present = TRUE AND NOT q_present_given_r_present_given_z.r_present ); {code} > Wrong plan for nested NOT EXISTS subqueries > --- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug >Reporter: Martin Raszyk >Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalTableScan(table=[[Bug, Q]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$0], $f0=[true]) > LogicalTableScan(table=[[Bug, R]]) > {code} > that corresponds to the following SQL query > {code:java} > SELECT P.X > FROM Bug.P > LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 > FROM Bug.Q > LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 > FROM Bug.R > GROUP BY Z) AS t0 ON TRUE > WHERE t0.$f1 IS NULL > GROUP BY t0.Z) AS t3 ON P.X = t3.Z > WHERE t3.$f1 IS NULL > {code} > which yields the (non-empty) table P as the result. > Hence, the parsed and converted query is not equivalent to the input query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194049#comment-17194049 ] Martin Raszyk commented on CALCITE-4242: Your last query makes sense in the set semantics. However, it seems hard to enforce the right multiplicities, e.g., on a database obtained by {code:java} DELETE FROM P; DELETE FROM Q; DELETE FROM R; INSERT INTO P VALUES (1); INSERT INTO P VALUES (1); INSERT INTO P VALUES (0); INSERT INTO P VALUES (0); INSERT INTO Q VALUES (1); INSERT INTO Q VALUES (0); INSERT INTO R VALUES (1); INSERT INTO R VALUES (0); {code} Let me also point out that it still makes sense to keep the LEFT JOIN right-associative for nested NOT EXISTS subqueries without data dependencies crossing two nesting levels, e.g., in the input query {code:java} SELECT x FROM P WHERE NOT EXISTS ( SELECT y FROM Q WHERE x = y AND NOT EXISTS ( SELECT z FROM R WHERE y = z ) ) {code} for which the currently converted query {code:java} SELECT P.X FROM PQR.P LEFT JOIN (SELECT Y, MIN(TRUE) AS $f1 FROM (SELECT Q.Y, t0.$f1 AS $f0 FROM PQR.Q LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 FROM PQR.R GROUP BY Z) AS t0 ON Q.Y = t0.Z) AS t1 WHERE t1.$f0 IS NULL GROUP BY Y) AS t4 ON P.X = t4.Y WHERE t4.$f1 IS NULL {code} is equivalent (also w.r.t. multiplicities) and could be more efficient than with a left-associative LEFT JOIN. Finally, let me point out that the WHERE clause from your last query becomes more involved if the nesting depth increases, e.g., on the input query {code:java} SELECT x FROM P WHERE NOT EXISTS ( SELECT y FROM Q WHERE x = y AND NOT EXISTS ( SELECT z FROM R WHERE x = z AND y = z AND NOT EXISTS ( SELECT w FROM S WHERE x = w AND y = w AND z = w AND NOT EXISTS ( SELECT v FROM T WHERE x = v AND y = v AND z = v AND w = v ) ) ) ) {code} where it should read {code:java} WHERE my_q.present IS NULL OR (my_r.present IS NOT NULL AND (my_s.present IS NULL OR my_t.present IS NOT NULL)) {code} > Wrong plan for nested NOT EXISTS subqueries > --- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug >Reporter: Martin Raszyk >Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalTableScan(table=[[Bug, Q]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$0], $f0=[true]) > LogicalTableScan(table=[[Bug, R]]) > {code} > that corresponds to the following SQL query > {code:java} > SELECT P.X > FROM Bug.P > LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 > FROM Bug.Q > LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 > FROM Bug.R > GROUP BY Z) AS t0 ON TRUE > WHERE t0.$f1 IS NULL > GROUP BY t0.Z) AS t3 ON P.X = t3.Z > WHERE t3.$f1 IS NULL > {code} > which yields the (non-empty) table P as the result. > Hence, the parsed and converted query is not equivalent to the input query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193868#comment-17193868 ] James Starr commented on CALCITE-4242: -- This is or at least reasonable close to what the decorrelated query should look like. So perhaps how logic for pushing up joins is not correct. {code:java} SELECT my_p.X FROM P my_p LEFT JOIN (SELECT TRUE as present FROM Q ) AS my_q ON TRUE LEFT JOIN (SELECT Z, TRUE AS present FROM R GROUP BY Z) AS my_r ON my_p.X = my_r.Z WHERE my_q.present IS NULL OR my_r.present IS NOT NULL {code} > Wrong plan for nested NOT EXISTS subqueries > --- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug >Reporter: Martin Raszyk >Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalTableScan(table=[[Bug, Q]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$0], $f0=[true]) > LogicalTableScan(table=[[Bug, R]]) > {code} > that corresponds to the following SQL query > {code:java} > SELECT P.X > FROM Bug.P > LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 > FROM Bug.Q > LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 > FROM Bug.R > GROUP BY Z) AS t0 ON TRUE > WHERE t0.$f1 IS NULL > GROUP BY t0.Z) AS t3 ON P.X = t3.Z > WHERE t3.$f1 IS NULL > {code} > which yields the (non-empty) table P as the result. > Hence, the parsed and converted query is not equivalent to the input query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries
[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193811#comment-17193811 ] Martin Raszyk commented on CALCITE-4242: Your proposal is not equivalent to the input query if you initialize the database as follows. {code:java} CREATE TABLE P(x INTEGER); CREATE TABLE Q(y INTEGER); CREATE TABLE R(z INTEGER); INSERT INTO P VALUES (1); {code} Now the input query yields the table P as the result, but your query yields an empty result. > Wrong plan for nested NOT EXISTS subqueries > --- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug >Reporter: Martin Raszyk >Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalTableScan(table=[[Bug, Q]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$0], $f0=[true]) > LogicalTableScan(table=[[Bug, R]]) > {code} > that corresponds to the following SQL query > {code:java} > SELECT P.X > FROM Bug.P > LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 > FROM Bug.Q > LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 > FROM Bug.R > GROUP BY Z) AS t0 ON TRUE > WHERE t0.$f1 IS NULL > GROUP BY t0.Z) AS t3 ON P.X = t3.Z > WHERE t3.$f1 IS NULL > {code} > which yields the (non-empty) table P as the result. > Hence, the parsed and converted query is not equivalent to the input query. -- This message was sent by Atlassian Jira (v8.3.4#803005)