[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2021-01-27 Thread Martin Raszyk (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17272890#comment-17272890
 ] 

Martin Raszyk commented on CALCITE-4242:


Are there some updates on this issue?

> Wrong plan for nested NOT EXISTS subqueries
> ---
>
> Key: CALCITE-4242
> URL: https://issues.apache.org/jira/browse/CALCITE-4242
> Project: Calcite
>  Issue Type: Bug
>Reporter: Martin Raszyk
>Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
> SELECT z FROM R
> WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[=($0, $1)], joinType=[left])
>   LogicalTableScan(table=[[Bug, P]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$1], $f0=[true])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[true], joinType=[left])
>   LogicalTableScan(table=[[Bug, Q]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$0], $f0=[true])
>   LogicalTableScan(table=[[Bug, R]])
> {code}
> that corresponds to the following SQL query
> {code:java}
> SELECT P.X
> FROM Bug.P
> LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
> FROM Bug.Q
> LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
> FROM Bug.R
> GROUP BY Z) AS t0 ON TRUE
> WHERE t0.$f1 IS NULL
> GROUP BY t0.Z) AS t3 ON P.X = t3.Z
> WHERE t3.$f1 IS NULL
> {code}
> which yields the (non-empty) table P as the result.
> Hence, the parsed and converted query is not equivalent to the input query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2020-09-21 Thread Martin Raszyk (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199183#comment-17199183
 ] 

Martin Raszyk commented on CALCITE-4242:


You're right that I still have a correlated query in my proposal. It can be 
avoided by changing the pattern to
{code:java}
WITH filter AS
-- an equivalent query under set semantics, e.g., using left-associative LEFT 
JOIN,
distinct_filter AS (SELECT DISTINCT * FROM filter)
SELECT my_p.x
FROM P AS my_p LEFT JOIN distinct_filter AS df ON my_p.x = df.x
WHERE df.x IS NOT NULL
{code}
I understand how you handled an OR condition, but my point was that it could be 
tricky to preserve the multiplicites throughout (e.g., with conditions that are 
even more complex than a single OR). On the other hand, the set semantics is 
achieved by using left-associative LEFT JOINs and the multiplicities can be 
restored by using a single LEFT JOIN together with a single DISTINCT _at the 
top-level_.

The comparison of the performance of the two approaches is likely to be 
data-dependent: both approaches use three LEFT JOINs in total, two of them are 
identical, but the third one in your query is on the table R whereas in my case 
on a subset of the table P.

> Wrong plan for nested NOT EXISTS subqueries
> ---
>
> Key: CALCITE-4242
> URL: https://issues.apache.org/jira/browse/CALCITE-4242
> Project: Calcite
>  Issue Type: Bug
>Reporter: Martin Raszyk
>Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
> SELECT z FROM R
> WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[=($0, $1)], joinType=[left])
>   LogicalTableScan(table=[[Bug, P]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$1], $f0=[true])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[true], joinType=[left])
>   LogicalTableScan(table=[[Bug, Q]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$0], $f0=[true])
>   LogicalTableScan(table=[[Bug, R]])
> {code}
> that corresponds to the following SQL query
> {code:java}
> SELECT P.X
> FROM Bug.P
> LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
> FROM Bug.Q
> LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
> FROM Bug.R
> GROUP BY Z) AS t0 ON TRUE
> WHERE t0.$f1 IS NULL
> GROUP BY t0.Z) AS t3 ON P.X = t3.Z
> WHERE t3.$f1 IS NULL
> {code}
> which yields the (non-empty) table P as the result.
> Hence, the parsed and converted query is not equivalent to the input query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2020-09-20 Thread Martin Raszyk (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199085#comment-17199085
 ] 

Martin Raszyk commented on CALCITE-4242:


I see - thank you for your explanation. However, making the condition slightly 
more complex, e.g., in the query

 
{code:java}
SELECT x FROM P
WHERE NOT EXISTS (
  SELECT y FROM Q
  WHERE NOT EXISTS (
SELECT z FROM R
WHERE x = z OR 2 * x = z
  )
);
{code}
evaluated on the database obtained by

 

 
{code:java}
DELETE FROM P;
DELETE FROM Q;
DELETE FROM R;
INSERT INTO P VALUES (2);
INSERT INTO P VALUES (2);
INSERT INTO P VALUES (1);
INSERT INTO P VALUES (1);
INSERT INTO Q VALUES (2);
INSERT INTO Q VALUES (1);
INSERT INTO R VALUES (2);
INSERT INTO R VALUES (1);
{code}
makes it difficult to reuse your trick with the distinct keyword. In 
particular, the query obtained by replacing the condition in your proposal does 
not preserve multiplicities

 

 
{code:java}
WITH
q_present AS (SELECT DISTINCT TRUE as present FROM q),
r_present_given_z AS (SELECT DISTINCT TRUE as present, z FROM r GROUP BY z)
SELECT my_p.x
FROM p my_p
LEFT JOIN q_present ON TRUE
LEFT JOIN r_present_given_z ON (r_present_given_z.z = my_p.x OR 
r_present_given_z.z = 2 * my_p.x)
WHERE NOT (q_present.present IS NOT NULL AND NOT (r_present_given_z.present IS 
NOT NULL));
{code}
Yet, the filtering approach captured by the pattern

 

 
{code:java}
WITH filter AS
-- an equivalent query under set semantics, e.g., the previous query using 
left-associative LEFT JOIN
SELECT x
FROM P
WHERE x IN (SELECT * FROM filter)
{code}
would work here as well.

 

> Wrong plan for nested NOT EXISTS subqueries
> ---
>
> Key: CALCITE-4242
> URL: https://issues.apache.org/jira/browse/CALCITE-4242
> Project: Calcite
>  Issue Type: Bug
>Reporter: Martin Raszyk
>Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
> SELECT z FROM R
> WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[=($0, $1)], joinType=[left])
>   LogicalTableScan(table=[[Bug, P]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$1], $f0=[true])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[true], joinType=[left])
>   LogicalTableScan(table=[[Bug, Q]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$0], $f0=[true])
>   LogicalTableScan(table=[[Bug, R]])
> {code}
> that corresponds to the following SQL query
> {code:java}
> SELECT P.X
> FROM Bug.P
> LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
> FROM Bug.Q
> LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
> FROM Bug.R
> GROUP BY Z) AS t0 ON TRUE
> WHERE t0.$f1 IS NULL
> GROUP BY t0.Z) AS t3 ON P.X = t3.Z
> WHERE t3.$f1 IS NULL
> {code}
> which yields the (non-empty) table P as the result.
> Hence, the parsed and converted query is not equivalent to the input query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2020-09-20 Thread Martin Raszyk (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199074#comment-17199074
 ] 

Martin Raszyk commented on CALCITE-4242:


I'm confused how the distinct keyword would matter for the tables Q and R which 
do not contain any duplicate values in my counter-example. Could you please 
elaborate more on this?

> Wrong plan for nested NOT EXISTS subqueries
> ---
>
> Key: CALCITE-4242
> URL: https://issues.apache.org/jira/browse/CALCITE-4242
> Project: Calcite
>  Issue Type: Bug
>Reporter: Martin Raszyk
>Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
> SELECT z FROM R
> WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[=($0, $1)], joinType=[left])
>   LogicalTableScan(table=[[Bug, P]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$1], $f0=[true])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[true], joinType=[left])
>   LogicalTableScan(table=[[Bug, Q]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$0], $f0=[true])
>   LogicalTableScan(table=[[Bug, R]])
> {code}
> that corresponds to the following SQL query
> {code:java}
> SELECT P.X
> FROM Bug.P
> LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
> FROM Bug.Q
> LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
> FROM Bug.R
> GROUP BY Z) AS t0 ON TRUE
> WHERE t0.$f1 IS NULL
> GROUP BY t0.Z) AS t3 ON P.X = t3.Z
> WHERE t3.$f1 IS NULL
> {code}
> which yields the (non-empty) table P as the result.
> Hence, the parsed and converted query is not equivalent to the input query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2020-09-20 Thread Martin Raszyk (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198921#comment-17198921
 ] 

Martin Raszyk commented on CALCITE-4242:


I'm afraid we are running in a circle. Your most recent query is equivalent to 
your second suggestion which works under the set semantics, but not under the 
multiset semantics as it does not preserve multiplicities. I would conclude the 
following:

1. only NOT EXISTS subqueries *without data dependencies crossing two nesting 
levels* -> use right-associative LEFT JOIN (equivalent also w.r.t. 
multiplicities)

2. NOT EXISTS subquery *with data dependency crossing two nesting levels* -> 
cannot use left-associative LEFT JOIN (does not preserve multiplicities) nor 
right-associative LEFT JOIN (cannot reflect all conditions crossing two nesting 
levels)

Nevertheless, one can use left-associative LEFT JOIN which yields an equivalent 
table w.r.t. set semantics and use it to filter the original table P, thus 
obtaining the following query which *is equivalent* to my original query *under 
the multiset semantics*. The following query can be evaluated using a 
relational algebra plan without subplans.
{code:java}
WITH filter AS
 (WITH
 q_present AS (SELECT TRUE as present FROM q),
 r_present_given_z AS (SELECT TRUE as present, z FROM r GROUP BY z)
 SELECT my_p.x
 FROM p my_p
 LEFT JOIN q_present ON TRUE
 LEFT JOIN r_present_given_z ON r_present_given_z.z = my_p.x
 WHERE NOT (q_present.present IS NOT NULL AND NOT (r_present_given_z.present IS 
NOT NULL)))
SELECT x
FROM P
WHERE x IN (SELECT * FROM filter)
{code}

> Wrong plan for nested NOT EXISTS subqueries
> ---
>
> Key: CALCITE-4242
> URL: https://issues.apache.org/jira/browse/CALCITE-4242
> Project: Calcite
>  Issue Type: Bug
>Reporter: Martin Raszyk
>Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
> SELECT z FROM R
> WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[=($0, $1)], joinType=[left])
>   LogicalTableScan(table=[[Bug, P]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$1], $f0=[true])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[true], joinType=[left])
>   LogicalTableScan(table=[[Bug, Q]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$0], $f0=[true])
>   LogicalTableScan(table=[[Bug, R]])
> {code}
> that corresponds to the following SQL query
> {code:java}
> SELECT P.X
> FROM Bug.P
> LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
> FROM Bug.Q
> LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
> FROM Bug.R
> GROUP BY Z) AS t0 ON TRUE
> WHERE t0.$f1 IS NULL
> GROUP BY t0.Z) AS t3 ON P.X = t3.Z
> WHERE t3.$f1 IS NULL
> {code}
> which yields the (non-empty) table P as the result.
> Hence, the parsed and converted query is not equivalent to the input query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2020-09-19 Thread Martin Raszyk (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198801#comment-17198801
 ] 

Martin Raszyk commented on CALCITE-4242:


On your version of my query, I got the same output as you, namely
{code:java}
CREATE TABLE P(x INTEGER);
CREATE TABLE Q(y INTEGER);
CREATE TABLE R(z INTEGER);
INSERT INTO P VALUES (1);
INSERT INTO Q VALUES (1);

-- evaluate your query

 x 
---
 1
(1 row){code}
However, my original query
{code:java}
SELECT x FROM P
WHERE NOT EXISTS (
  SELECT y FROM Q
  WHERE NOT EXISTS (
SELECT z FROM R
WHERE x = z
  )
)
{code}
yields an empty table (0 rows). So you didn't provide an equivalent query.

> Wrong plan for nested NOT EXISTS subqueries
> ---
>
> Key: CALCITE-4242
> URL: https://issues.apache.org/jira/browse/CALCITE-4242
> Project: Calcite
>  Issue Type: Bug
>Reporter: Martin Raszyk
>Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
> SELECT z FROM R
> WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[=($0, $1)], joinType=[left])
>   LogicalTableScan(table=[[Bug, P]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$1], $f0=[true])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[true], joinType=[left])
>   LogicalTableScan(table=[[Bug, Q]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$0], $f0=[true])
>   LogicalTableScan(table=[[Bug, R]])
> {code}
> that corresponds to the following SQL query
> {code:java}
> SELECT P.X
> FROM Bug.P
> LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
> FROM Bug.Q
> LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
> FROM Bug.R
> GROUP BY Z) AS t0 ON TRUE
> WHERE t0.$f1 IS NULL
> GROUP BY t0.Z) AS t3 ON P.X = t3.Z
> WHERE t3.$f1 IS NULL
> {code}
> which yields the (non-empty) table P as the result.
> Hence, the parsed and converted query is not equivalent to the input query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2020-09-17 Thread Martin Raszyk (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17197422#comment-17197422
 ] 

Martin Raszyk edited comment on CALCITE-4242 at 9/17/20, 6:45 AM:
--

I tried to use `LEFT SEMI JOIN` in PostgreSQL version `psql (PostgreSQL) 12.4 
(Ubuntu 12.4-0ubuntu0.20.04.1)`, but I didn't succeed.

The query

 
{code:java}
SELECT x FROM P LEFT SEMI JOIN Q ON TRUE;
{code}
leads to syntax error.

 

The query
{code:java}
SELECT x FROM P SEMI JOIN Q ON TRUE;
{code}
is evaluated, but does not yield the proper multiplicities.

Finally, the query
{code:java}
SELECT my_p.x FROM P AS my_p SEMI JOIN Q ON TRUE;
{code}
leads to syntax error, too.

Moreover, I said that "one could keep the LEFT JOIN right-associative for 
nested NOT EXISTS subqueries *without data dependencies crossing two nesting 
levels*". In my very original query
{code:java}
SELECT x FROM P
WHERE NOT EXISTS (
  SELECT y FROM Q
  WHERE NOT EXISTS (
SELECT z FROM R
WHERE x = z
  )
)
{code}
this is not the case and your most recent query which I fixed as follows so 
that it evaluates
{code:java}
WITH
q_present AS (SELECT TRUE as present FROM q),
r_present_given_z AS (SELECT TRUE as present, z FROM r GROUP BY z),
q_present_given_r_present_given_z AS (
SELECT TRUE AS present, r_present_given_z.present AS r_present, 
r_present_given_z.z AS r_z
FROM q_present
LEFT JOIN r_present_given_z ON TRUE
)
SELECT my_p.x
FROM P my_p
LEFT JOIN q_present_given_r_present_given_z ON 
q_present_given_r_present_given_z.r_z = my_p.x
WHERE NOT(
  q_present_given_r_present_given_z.present
  AND NOT q_present_given_r_present_given_z.r_present
);
{code}
is not equivalent to my very original query with the same counter-example as 
your very first suggestion:
{code:java}
CREATE TABLE P(x INTEGER);
CREATE TABLE Q(y INTEGER);
CREATE TABLE R(z INTEGER);
INSERT INTO P VALUES (1);
{code}


was (Author: mraszyk):
I tried to use `LEFT SEMI JOIN` in PostgreSQL version `psql (PostgreSQL) 12.4 
(Ubuntu 12.4-0ubuntu0.20.04.1)`, but I didn't succeed.

The query

 
{code:java}
SELECT x FROM P LEFT SEMI JOIN Q ON TRUE;
{code}
leads to syntax error.

 

The query
{code:java}
SELECT x FROM P SEMI JOIN Q ON TRUE;
{code}
is evaluated, but does not yield the proper multiplicities.

Finally, the query
{code:java}
SELECT my_p.x FROM P AS my_p SEMI JOIN Q ON TRUE;
{code}
leads to syntax error, too.

Moreover, I said that "one could keep the LEFT JOIN right-associative for 
nested NOT EXISTS subqueries *without data dependencies crossing two nesting 
levels*". In my very original query
{code:java}
SELECT x FROM P
WHERE NOT EXISTS (
  SELECT y FROM Q
  WHERE NOT EXISTS (
SELECT z FROM R
WHERE x = z
  )
)
{code}
this is not the case and your most recent query which I fixed as follows so 
that it evaluates
{code:java}
WITH
q_present AS (SELECT TRUE as present FROM q),
r_present_given_z AS (SELECT TRUE as present, z FROM r GROUP BY z),
q_present_given_r_present_given_z AS (
SELECT TRUE AS present, r_present_given_z.present AS r_present, 
r_present_given_z.z AS r_z
FROM q_present
LEFT JOIN r_present_given_z ON TRUE
)
SELECT my_p.x
FROM P my_p
LEFT JOIN q_present_given_r_present_given_z ON 
q_present_given_r_present_given_z.r_z = my_p.x
WHERE NOT(
  q_present_given_r_present_given_z.present
  AND NOT q_present_given_r_present_given_z.r_present
);
{code}
is not equivalent to my very original query with the same counter-example as 
you very first suggestion:
{code:java}
CREATE TABLE P(x INTEGER);
CREATE TABLE Q(y INTEGER);
CREATE TABLE R(z INTEGER);
INSERT INTO P VALUES (1);
{code}

> Wrong plan for nested NOT EXISTS subqueries
> ---
>
> Key: CALCITE-4242
> URL: https://issues.apache.org/jira/browse/CALCITE-4242
> Project: Calcite
>  Issue Type: Bug
>Reporter: Martin Raszyk
>Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
> SELECT z FROM R
> WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[=($0, $1)], joinType=[left])
>   LogicalTableScan(table=[[Bug, P]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$1], $f0=[true])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[true], joinType=[left])
>   

[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2020-09-17 Thread Martin Raszyk (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17197422#comment-17197422
 ] 

Martin Raszyk commented on CALCITE-4242:


I tried to use `LEFT SEMI JOIN` in PostgreSQL version `psql (PostgreSQL) 12.4 
(Ubuntu 12.4-0ubuntu0.20.04.1)`, but I didn't succeed.

The query

 
{code:java}
SELECT x FROM P LEFT SEMI JOIN Q ON TRUE;
{code}
leads to syntax error.

 

The query
{code:java}
SELECT x FROM P SEMI JOIN Q ON TRUE;
{code}
is evaluated, but does not yield the proper multiplicities.

Finally, the query
{code:java}
SELECT my_p.x FROM P AS my_p SEMI JOIN Q ON TRUE;
{code}
leads to syntax error, too.

Moreover, I said that "one could keep the LEFT JOIN right-associative for 
nested NOT EXISTS subqueries *without data dependencies crossing two nesting 
levels*". In my very original query
{code:java}
SELECT x FROM P
WHERE NOT EXISTS (
  SELECT y FROM Q
  WHERE NOT EXISTS (
SELECT z FROM R
WHERE x = z
  )
)
{code}
this is not the case and your most recent query which I fixed as follows so 
that it evaluates
{code:java}
WITH
q_present AS (SELECT TRUE as present FROM q),
r_present_given_z AS (SELECT TRUE as present, z FROM r GROUP BY z),
q_present_given_r_present_given_z AS (
SELECT TRUE AS present, r_present_given_z.present AS r_present, 
r_present_given_z.z AS r_z
FROM q_present
LEFT JOIN r_present_given_z ON TRUE
)
SELECT my_p.x
FROM P my_p
LEFT JOIN q_present_given_r_present_given_z ON 
q_present_given_r_present_given_z.r_z = my_p.x
WHERE NOT(
  q_present_given_r_present_given_z.present
  AND NOT q_present_given_r_present_given_z.r_present
);
{code}
is not equivalent to my very original query with the same counter-example as 
you very first suggestion:
{code:java}
CREATE TABLE P(x INTEGER);
CREATE TABLE Q(y INTEGER);
CREATE TABLE R(z INTEGER);
INSERT INTO P VALUES (1);
{code}

> Wrong plan for nested NOT EXISTS subqueries
> ---
>
> Key: CALCITE-4242
> URL: https://issues.apache.org/jira/browse/CALCITE-4242
> Project: Calcite
>  Issue Type: Bug
>Reporter: Martin Raszyk
>Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
> SELECT z FROM R
> WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[=($0, $1)], joinType=[left])
>   LogicalTableScan(table=[[Bug, P]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$1], $f0=[true])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[true], joinType=[left])
>   LogicalTableScan(table=[[Bug, Q]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$0], $f0=[true])
>   LogicalTableScan(table=[[Bug, R]])
> {code}
> that corresponds to the following SQL query
> {code:java}
> SELECT P.X
> FROM Bug.P
> LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
> FROM Bug.Q
> LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
> FROM Bug.R
> GROUP BY Z) AS t0 ON TRUE
> WHERE t0.$f1 IS NULL
> GROUP BY t0.Z) AS t3 ON P.X = t3.Z
> WHERE t3.$f1 IS NULL
> {code}
> which yields the (non-empty) table P as the result.
> Hence, the parsed and converted query is not equivalent to the input query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2020-09-11 Thread Martin Raszyk (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194049#comment-17194049
 ] 

Martin Raszyk commented on CALCITE-4242:


Your last query makes sense in the set semantics. However, it seems hard to 
enforce the right multiplicities, e.g., on a database obtained by
{code:java}
DELETE FROM P;
DELETE FROM Q;
DELETE FROM R;
INSERT INTO P VALUES (1);
INSERT INTO P VALUES (1);
INSERT INTO P VALUES (0);
INSERT INTO P VALUES (0);
INSERT INTO Q VALUES (1);
INSERT INTO Q VALUES (0);
INSERT INTO R VALUES (1);
INSERT INTO R VALUES (0);
{code}
Let me also point out that it still makes sense to keep the LEFT JOIN 
right-associative for nested NOT EXISTS subqueries without data dependencies 
crossing two nesting levels, e.g., in the input query
{code:java}
SELECT x FROM P
WHERE NOT EXISTS (
  SELECT y FROM Q
  WHERE x = y AND NOT EXISTS (
SELECT z FROM R
WHERE y = z
  )
)
{code}
for which the currently converted query
{code:java}
SELECT P.X
FROM PQR.P
LEFT JOIN (SELECT Y, MIN(TRUE) AS $f1
FROM (SELECT Q.Y, t0.$f1 AS $f0
FROM PQR.Q
LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
FROM PQR.R
GROUP BY Z) AS t0 ON Q.Y = t0.Z) AS t1
WHERE t1.$f0 IS NULL
GROUP BY Y) AS t4 ON P.X = t4.Y
WHERE t4.$f1 IS NULL
{code}
is equivalent (also w.r.t. multiplicities) and could be more efficient than 
with a left-associative LEFT JOIN.

Finally, let me point out that the WHERE clause from your last query becomes 
more involved if the nesting depth increases, e.g., on the input query
{code:java}
SELECT x FROM P
WHERE NOT EXISTS (
  SELECT y FROM Q
  WHERE x = y AND NOT EXISTS (
SELECT z FROM R
WHERE x = z AND y = z AND NOT EXISTS (
  SELECT w FROM S
  WHERE x = w AND y = w AND z = w AND NOT EXISTS (
SELECT v FROM T
WHERE x = v AND y = v AND z = v AND w = v
  )
)
  )
)
{code}
where it should read
{code:java}
WHERE my_q.present IS NULL OR (my_r.present IS NOT NULL AND (my_s.present IS 
NULL OR my_t.present IS NOT NULL))
{code}

> Wrong plan for nested NOT EXISTS subqueries
> ---
>
> Key: CALCITE-4242
> URL: https://issues.apache.org/jira/browse/CALCITE-4242
> Project: Calcite
>  Issue Type: Bug
>Reporter: Martin Raszyk
>Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
> SELECT z FROM R
> WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[=($0, $1)], joinType=[left])
>   LogicalTableScan(table=[[Bug, P]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$1], $f0=[true])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[true], joinType=[left])
>   LogicalTableScan(table=[[Bug, Q]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$0], $f0=[true])
>   LogicalTableScan(table=[[Bug, R]])
> {code}
> that corresponds to the following SQL query
> {code:java}
> SELECT P.X
> FROM Bug.P
> LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
> FROM Bug.Q
> LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
> FROM Bug.R
> GROUP BY Z) AS t0 ON TRUE
> WHERE t0.$f1 IS NULL
> GROUP BY t0.Z) AS t3 ON P.X = t3.Z
> WHERE t3.$f1 IS NULL
> {code}
> which yields the (non-empty) table P as the result.
> Hence, the parsed and converted query is not equivalent to the input query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2020-09-10 Thread Martin Raszyk (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193811#comment-17193811
 ] 

Martin Raszyk commented on CALCITE-4242:


Your proposal is not equivalent to the input query if you initialize the 
database as follows.

 
{code:java}
CREATE TABLE P(x INTEGER);
CREATE TABLE Q(y INTEGER);
CREATE TABLE R(z INTEGER);
INSERT INTO P VALUES (1);
{code}
Now the input query yields the table P as the result, but your query yields an 
empty result.

 

> Wrong plan for nested NOT EXISTS subqueries
> ---
>
> Key: CALCITE-4242
> URL: https://issues.apache.org/jira/browse/CALCITE-4242
> Project: Calcite
>  Issue Type: Bug
>Reporter: Martin Raszyk
>Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
> SELECT z FROM R
> WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[=($0, $1)], joinType=[left])
>   LogicalTableScan(table=[[Bug, P]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$1], $f0=[true])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[true], joinType=[left])
>   LogicalTableScan(table=[[Bug, Q]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$0], $f0=[true])
>   LogicalTableScan(table=[[Bug, R]])
> {code}
> that corresponds to the following SQL query
> {code:java}
> SELECT P.X
> FROM Bug.P
> LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
> FROM Bug.Q
> LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
> FROM Bug.R
> GROUP BY Z) AS t0 ON TRUE
> WHERE t0.$f1 IS NULL
> GROUP BY t0.Z) AS t3 ON P.X = t3.Z
> WHERE t3.$f1 IS NULL
> {code}
> which yields the (non-empty) table P as the result.
> Hence, the parsed and converted query is not equivalent to the input query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-4243) Wrong SQL conversion for JOIN and NOT EXISTS subquery

2020-09-10 Thread Martin Raszyk (Jira)
Martin Raszyk created CALCITE-4243:
--

 Summary: Wrong SQL conversion for JOIN and NOT EXISTS subquery
 Key: CALCITE-4243
 URL: https://issues.apache.org/jira/browse/CALCITE-4243
 Project: Calcite
  Issue Type: Bug
Reporter: Martin Raszyk


Suppose we initialize an empty database as follows.
{code:java}
CREATE TABLE P(x INTEGER);
CREATE TABLE Q(y INTEGER);
CREATE TABLE R(z INTEGER);
{code}
 

The following query
{code:java}
SELECT *
FROM P JOIN Q ON TRUE
WHERE NOT EXISTS (
  SELECT * FROM R
  WHERE x = z
)
{code}
is parsed and converted to the following plan
{code:java}
LogicalProject(X=[$0], Y=[$1])
  LogicalFilter(condition=[IS NULL($3)])
LogicalJoin(condition=[=($0, $2)], joinType=[left])
  LogicalJoin(condition=[true], joinType=[inner])
LogicalTableScan(table=[[Bug, P]])
LogicalTableScan(table=[[Bug, Q]])
  LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
LogicalProject(Z=[$0], $f0=[true])
  LogicalTableScan(table=[[Bug, R]])
{code}
that is subsequently converted to the following SQL query
{code:java}
SELECT P.X, Q.Y
FROM Bug.P,
Bug.Q
LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
FROM Bug.R
GROUP BY Z) AS t0 ON P.X = t0.Z
WHERE t0.$f1 IS NULL{code}
which is not valid in SQL, because the LEFT JOIN operation acts only
on the tables Q and R, so the ON condition cannot access the column X
from the table P.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2020-09-10 Thread Martin Raszyk (Jira)
Martin Raszyk created CALCITE-4242:
--

 Summary: Wrong plan for nested NOT EXISTS subqueries
 Key: CALCITE-4242
 URL: https://issues.apache.org/jira/browse/CALCITE-4242
 Project: Calcite
  Issue Type: Bug
Reporter: Martin Raszyk


Suppose we initialize an empty database as follows.

 
{code:java}
CREATE TABLE P(x INTEGER);
CREATE TABLE Q(y INTEGER);
CREATE TABLE R(z INTEGER);
INSERT INTO P VALUES (1);
INSERT INTO Q VALUES (1);{code}
 

The following query is supposed to yield an empty table as the result.

 
{code:java}
SELECT x FROM P
WHERE NOT EXISTS (
  SELECT y FROM Q
  WHERE NOT EXISTS (
SELECT z FROM R
WHERE x = z
  )
){code}
 

However, the query is parsed and converted to the following plan
{code:java}
LogicalProject(X=[$0])
  LogicalFilter(condition=[IS NULL($2)])
LogicalJoin(condition=[=($0, $1)], joinType=[left])
  LogicalTableScan(table=[[Bug, P]])
  LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
LogicalProject(Z=[$1], $f0=[true])
  LogicalFilter(condition=[IS NULL($2)])
LogicalJoin(condition=[true], joinType=[left])
  LogicalTableScan(table=[[Bug, Q]])
  LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
LogicalProject(Z=[$0], $f0=[true])
  LogicalTableScan(table=[[Bug, R]])
{code}
that corresponds to the following SQL query
{code:java}
SELECT P.X
FROM Bug.P
LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
FROM Bug.Q
LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
FROM Bug.R
GROUP BY Z) AS t0 ON TRUE
WHERE t0.$f1 IS NULL
GROUP BY t0.Z) AS t3 ON P.X = t3.Z
WHERE t3.$f1 IS NULL
{code}
which yields the (non-empty) table P as the result.

Hence, the parsed and converted query is not equivalent to the input query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)