[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2021-01-27 Thread Martin Raszyk (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17272890#comment-17272890
 ] 

Martin Raszyk commented on CALCITE-4242:


Are there some updates on this issue?

> Wrong plan for nested NOT EXISTS subqueries
> ---
>
> Key: CALCITE-4242
> URL: https://issues.apache.org/jira/browse/CALCITE-4242
> Project: Calcite
>  Issue Type: Bug
>Reporter: Martin Raszyk
>Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
> SELECT z FROM R
> WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[=($0, $1)], joinType=[left])
>   LogicalTableScan(table=[[Bug, P]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$1], $f0=[true])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[true], joinType=[left])
>   LogicalTableScan(table=[[Bug, Q]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$0], $f0=[true])
>   LogicalTableScan(table=[[Bug, R]])
> {code}
> that corresponds to the following SQL query
> {code:java}
> SELECT P.X
> FROM Bug.P
> LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
> FROM Bug.Q
> LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
> FROM Bug.R
> GROUP BY Z) AS t0 ON TRUE
> WHERE t0.$f1 IS NULL
> GROUP BY t0.Z) AS t3 ON P.X = t3.Z
> WHERE t3.$f1 IS NULL
> {code}
> which yields the (non-empty) table P as the result.
> Hence, the parsed and converted query is not equivalent to the input query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2020-09-21 Thread James Starr (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199548#comment-17199548
 ] 

James Starr commented on CALCITE-4242:
--

One nit for your example, what if the resulting filter should return x when it 
is null?  I think adding a simple boolean column handles some of the edge cases 
but I think the join could be much more complicated. 

I think the an important insight from you approach is that the filtered query 
needs to be apart of the filter.  So the algorithm goes:

Terms(open to suggestions):
Data Query - the query that will eventually be filtered
Data Filter - the filter with sub-queries 
Filter Query - A query that will generate all combinations of columns used from 
data query that will pass the filters

1.  Create the base of the filter query from the data query.
2.  Add rewrite sub query filters as LEFT JOINS to build up the filter query.
3.  Use some mechanism to perform a LEFT JOIN from the data query to filter 
query, this needs to handle the multiplicity
4.  Rewrite the the sub-query filter in the data filter to use a field from the 
filter query.

> Wrong plan for nested NOT EXISTS subqueries
> ---
>
> Key: CALCITE-4242
> URL: https://issues.apache.org/jira/browse/CALCITE-4242
> Project: Calcite
>  Issue Type: Bug
>Reporter: Martin Raszyk
>Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
> SELECT z FROM R
> WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[=($0, $1)], joinType=[left])
>   LogicalTableScan(table=[[Bug, P]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$1], $f0=[true])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[true], joinType=[left])
>   LogicalTableScan(table=[[Bug, Q]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$0], $f0=[true])
>   LogicalTableScan(table=[[Bug, R]])
> {code}
> that corresponds to the following SQL query
> {code:java}
> SELECT P.X
> FROM Bug.P
> LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
> FROM Bug.Q
> LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
> FROM Bug.R
> GROUP BY Z) AS t0 ON TRUE
> WHERE t0.$f1 IS NULL
> GROUP BY t0.Z) AS t3 ON P.X = t3.Z
> WHERE t3.$f1 IS NULL
> {code}
> which yields the (non-empty) table P as the result.
> Hence, the parsed and converted query is not equivalent to the input query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2020-09-21 Thread Martin Raszyk (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199183#comment-17199183
 ] 

Martin Raszyk commented on CALCITE-4242:


You're right that I still have a correlated query in my proposal. It can be 
avoided by changing the pattern to
{code:java}
WITH filter AS
-- an equivalent query under set semantics, e.g., using left-associative LEFT 
JOIN,
distinct_filter AS (SELECT DISTINCT * FROM filter)
SELECT my_p.x
FROM P AS my_p LEFT JOIN distinct_filter AS df ON my_p.x = df.x
WHERE df.x IS NOT NULL
{code}
I understand how you handled an OR condition, but my point was that it could be 
tricky to preserve the multiplicites throughout (e.g., with conditions that are 
even more complex than a single OR). On the other hand, the set semantics is 
achieved by using left-associative LEFT JOINs and the multiplicities can be 
restored by using a single LEFT JOIN together with a single DISTINCT _at the 
top-level_.

The comparison of the performance of the two approaches is likely to be 
data-dependent: both approaches use three LEFT JOINs in total, two of them are 
identical, but the third one in your query is on the table R whereas in my case 
on a subset of the table P.

> Wrong plan for nested NOT EXISTS subqueries
> ---
>
> Key: CALCITE-4242
> URL: https://issues.apache.org/jira/browse/CALCITE-4242
> Project: Calcite
>  Issue Type: Bug
>Reporter: Martin Raszyk
>Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
> SELECT z FROM R
> WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[=($0, $1)], joinType=[left])
>   LogicalTableScan(table=[[Bug, P]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$1], $f0=[true])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[true], joinType=[left])
>   LogicalTableScan(table=[[Bug, Q]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$0], $f0=[true])
>   LogicalTableScan(table=[[Bug, R]])
> {code}
> that corresponds to the following SQL query
> {code:java}
> SELECT P.X
> FROM Bug.P
> LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
> FROM Bug.Q
> LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
> FROM Bug.R
> GROUP BY Z) AS t0 ON TRUE
> WHERE t0.$f1 IS NULL
> GROUP BY t0.Z) AS t3 ON P.X = t3.Z
> WHERE t3.$f1 IS NULL
> {code}
> which yields the (non-empty) table P as the result.
> Hence, the parsed and converted query is not equivalent to the input query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2020-09-20 Thread James Starr (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199086#comment-17199086
 ] 

James Starr commented on CALCITE-4242:
--

On an OR condition you will have to fork the join if you do not have access to 
semi join.  But why re-invent the wheel when when SEMI LEFT JOIN were created 
for this use case.  Your proposal is largely self defeating because you still 
have correlated query after you decorrelated. 
{code:sql}
WITH
q_present AS (SELECT DISTINCT TRUE as present FROM q),
r_present_given_z AS (SELECT DISTINCT TRUE as present, z FROM r GROUP BY z)
SELECT my_p.x
FROM p my_p
LEFT JOIN q_present ON TRUE
LEFT JOIN r_present_given_z AS r1 ON r_present_given_z.z = my_p.x
LEFT JOIN r_present_given_z AS r2 ON r_present_given_z.z = 2 * my_p.x
WHERE NOT (q_present.present IS NOT NULL AND NOT (r1.present IS NOT NULL OR 
r2.present IS NOT NULL));
{code}

> Wrong plan for nested NOT EXISTS subqueries
> ---
>
> Key: CALCITE-4242
> URL: https://issues.apache.org/jira/browse/CALCITE-4242
> Project: Calcite
>  Issue Type: Bug
>Reporter: Martin Raszyk
>Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
> SELECT z FROM R
> WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[=($0, $1)], joinType=[left])
>   LogicalTableScan(table=[[Bug, P]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$1], $f0=[true])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[true], joinType=[left])
>   LogicalTableScan(table=[[Bug, Q]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$0], $f0=[true])
>   LogicalTableScan(table=[[Bug, R]])
> {code}
> that corresponds to the following SQL query
> {code:java}
> SELECT P.X
> FROM Bug.P
> LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
> FROM Bug.Q
> LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
> FROM Bug.R
> GROUP BY Z) AS t0 ON TRUE
> WHERE t0.$f1 IS NULL
> GROUP BY t0.Z) AS t3 ON P.X = t3.Z
> WHERE t3.$f1 IS NULL
> {code}
> which yields the (non-empty) table P as the result.
> Hence, the parsed and converted query is not equivalent to the input query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2020-09-20 Thread Martin Raszyk (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199085#comment-17199085
 ] 

Martin Raszyk commented on CALCITE-4242:


I see - thank you for your explanation. However, making the condition slightly 
more complex, e.g., in the query

 
{code:java}
SELECT x FROM P
WHERE NOT EXISTS (
  SELECT y FROM Q
  WHERE NOT EXISTS (
SELECT z FROM R
WHERE x = z OR 2 * x = z
  )
);
{code}
evaluated on the database obtained by

 

 
{code:java}
DELETE FROM P;
DELETE FROM Q;
DELETE FROM R;
INSERT INTO P VALUES (2);
INSERT INTO P VALUES (2);
INSERT INTO P VALUES (1);
INSERT INTO P VALUES (1);
INSERT INTO Q VALUES (2);
INSERT INTO Q VALUES (1);
INSERT INTO R VALUES (2);
INSERT INTO R VALUES (1);
{code}
makes it difficult to reuse your trick with the distinct keyword. In 
particular, the query obtained by replacing the condition in your proposal does 
not preserve multiplicities

 

 
{code:java}
WITH
q_present AS (SELECT DISTINCT TRUE as present FROM q),
r_present_given_z AS (SELECT DISTINCT TRUE as present, z FROM r GROUP BY z)
SELECT my_p.x
FROM p my_p
LEFT JOIN q_present ON TRUE
LEFT JOIN r_present_given_z ON (r_present_given_z.z = my_p.x OR 
r_present_given_z.z = 2 * my_p.x)
WHERE NOT (q_present.present IS NOT NULL AND NOT (r_present_given_z.present IS 
NOT NULL));
{code}
Yet, the filtering approach captured by the pattern

 

 
{code:java}
WITH filter AS
-- an equivalent query under set semantics, e.g., the previous query using 
left-associative LEFT JOIN
SELECT x
FROM P
WHERE x IN (SELECT * FROM filter)
{code}
would work here as well.

 

> Wrong plan for nested NOT EXISTS subqueries
> ---
>
> Key: CALCITE-4242
> URL: https://issues.apache.org/jira/browse/CALCITE-4242
> Project: Calcite
>  Issue Type: Bug
>Reporter: Martin Raszyk
>Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
> SELECT z FROM R
> WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[=($0, $1)], joinType=[left])
>   LogicalTableScan(table=[[Bug, P]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$1], $f0=[true])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[true], joinType=[left])
>   LogicalTableScan(table=[[Bug, Q]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$0], $f0=[true])
>   LogicalTableScan(table=[[Bug, R]])
> {code}
> that corresponds to the following SQL query
> {code:java}
> SELECT P.X
> FROM Bug.P
> LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
> FROM Bug.Q
> LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
> FROM Bug.R
> GROUP BY Z) AS t0 ON TRUE
> WHERE t0.$f1 IS NULL
> GROUP BY t0.Z) AS t3 ON P.X = t3.Z
> WHERE t3.$f1 IS NULL
> {code}
> which yields the (non-empty) table P as the result.
> Hence, the parsed and converted query is not equivalent to the input query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2020-09-20 Thread James Starr (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199083#comment-17199083
 ] 

James Starr commented on CALCITE-4242:
--

Adding distincts generates the correct results.  Also, in the original queries, 
if the exists clauses are turned to IN clauses, then a similiar hoisting 
algorithm would be needed.  Finally, it could be generalized and work in the 
absents of NOTs.

{code:sql}
CREATE OR REPLACE VIEW CALCITE_4242 AS 
WITH
q_present AS (SELECT DISTINCT TRUE as present FROM q),
r_present_given_z AS (SELECT DISTINCT TRUE as present, z FROM r GROUP BY z)
SELECT my_p.x
FROM p my_p
LEFT JOIN q_present ON TRUE
LEFT JOIN r_present_given_z ON r_present_given_z.z = my_p.x
WHERE NOT (q_present.present IS NOT NULL AND NOT (r_present_given_z.present IS 
NOT NULL));
{code}

Generates
{code}
 ?column? | x 
--+---
(0 rows)

 ?column? | x 
--+---
 QUERY_2  | 1
 QUERY_2  | 1
 QUERY_2  | 0
 QUERY_2  | 0
(4 rows)

 ?column? | x 
--+---
 QUERY_3  | 1
(1 row)
{code}

> Wrong plan for nested NOT EXISTS subqueries
> ---
>
> Key: CALCITE-4242
> URL: https://issues.apache.org/jira/browse/CALCITE-4242
> Project: Calcite
>  Issue Type: Bug
>Reporter: Martin Raszyk
>Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
> SELECT z FROM R
> WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[=($0, $1)], joinType=[left])
>   LogicalTableScan(table=[[Bug, P]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$1], $f0=[true])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[true], joinType=[left])
>   LogicalTableScan(table=[[Bug, Q]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$0], $f0=[true])
>   LogicalTableScan(table=[[Bug, R]])
> {code}
> that corresponds to the following SQL query
> {code:java}
> SELECT P.X
> FROM Bug.P
> LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
> FROM Bug.Q
> LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
> FROM Bug.R
> GROUP BY Z) AS t0 ON TRUE
> WHERE t0.$f1 IS NULL
> GROUP BY t0.Z) AS t3 ON P.X = t3.Z
> WHERE t3.$f1 IS NULL
> {code}
> which yields the (non-empty) table P as the result.
> Hence, the parsed and converted query is not equivalent to the input query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2020-09-20 Thread Martin Raszyk (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199074#comment-17199074
 ] 

Martin Raszyk commented on CALCITE-4242:


I'm confused how the distinct keyword would matter for the tables Q and R which 
do not contain any duplicate values in my counter-example. Could you please 
elaborate more on this?

> Wrong plan for nested NOT EXISTS subqueries
> ---
>
> Key: CALCITE-4242
> URL: https://issues.apache.org/jira/browse/CALCITE-4242
> Project: Calcite
>  Issue Type: Bug
>Reporter: Martin Raszyk
>Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
> SELECT z FROM R
> WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[=($0, $1)], joinType=[left])
>   LogicalTableScan(table=[[Bug, P]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$1], $f0=[true])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[true], joinType=[left])
>   LogicalTableScan(table=[[Bug, Q]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$0], $f0=[true])
>   LogicalTableScan(table=[[Bug, R]])
> {code}
> that corresponds to the following SQL query
> {code:java}
> SELECT P.X
> FROM Bug.P
> LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
> FROM Bug.Q
> LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
> FROM Bug.R
> GROUP BY Z) AS t0 ON TRUE
> WHERE t0.$f1 IS NULL
> GROUP BY t0.Z) AS t3 ON P.X = t3.Z
> WHERE t3.$f1 IS NULL
> {code}
> which yields the (non-empty) table P as the result.
> Hence, the parsed and converted query is not equivalent to the input query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2020-09-20 Thread James Starr (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199043#comment-17199043
 ] 

James Starr commented on CALCITE-4242:
--

As brought up before, the group by/aggregate function/distinct is needed to 
handle multiplicity.  If you add distinct to the selects for q_present and 
r_present_given, then the multiplicity is correct.  There probably needs to be 
a more general fix for this in calcite because the same logic also needs to 
applied to all other multi nested subqueries, where the sub queries are not 
traversing down a tree. 

> Wrong plan for nested NOT EXISTS subqueries
> ---
>
> Key: CALCITE-4242
> URL: https://issues.apache.org/jira/browse/CALCITE-4242
> Project: Calcite
>  Issue Type: Bug
>Reporter: Martin Raszyk
>Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
> SELECT z FROM R
> WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[=($0, $1)], joinType=[left])
>   LogicalTableScan(table=[[Bug, P]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$1], $f0=[true])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[true], joinType=[left])
>   LogicalTableScan(table=[[Bug, Q]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$0], $f0=[true])
>   LogicalTableScan(table=[[Bug, R]])
> {code}
> that corresponds to the following SQL query
> {code:java}
> SELECT P.X
> FROM Bug.P
> LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
> FROM Bug.Q
> LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
> FROM Bug.R
> GROUP BY Z) AS t0 ON TRUE
> WHERE t0.$f1 IS NULL
> GROUP BY t0.Z) AS t3 ON P.X = t3.Z
> WHERE t3.$f1 IS NULL
> {code}
> which yields the (non-empty) table P as the result.
> Hence, the parsed and converted query is not equivalent to the input query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2020-09-20 Thread Martin Raszyk (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198921#comment-17198921
 ] 

Martin Raszyk commented on CALCITE-4242:


I'm afraid we are running in a circle. Your most recent query is equivalent to 
your second suggestion which works under the set semantics, but not under the 
multiset semantics as it does not preserve multiplicities. I would conclude the 
following:

1. only NOT EXISTS subqueries *without data dependencies crossing two nesting 
levels* -> use right-associative LEFT JOIN (equivalent also w.r.t. 
multiplicities)

2. NOT EXISTS subquery *with data dependency crossing two nesting levels* -> 
cannot use left-associative LEFT JOIN (does not preserve multiplicities) nor 
right-associative LEFT JOIN (cannot reflect all conditions crossing two nesting 
levels)

Nevertheless, one can use left-associative LEFT JOIN which yields an equivalent 
table w.r.t. set semantics and use it to filter the original table P, thus 
obtaining the following query which *is equivalent* to my original query *under 
the multiset semantics*. The following query can be evaluated using a 
relational algebra plan without subplans.
{code:java}
WITH filter AS
 (WITH
 q_present AS (SELECT TRUE as present FROM q),
 r_present_given_z AS (SELECT TRUE as present, z FROM r GROUP BY z)
 SELECT my_p.x
 FROM p my_p
 LEFT JOIN q_present ON TRUE
 LEFT JOIN r_present_given_z ON r_present_given_z.z = my_p.x
 WHERE NOT (q_present.present IS NOT NULL AND NOT (r_present_given_z.present IS 
NOT NULL)))
SELECT x
FROM P
WHERE x IN (SELECT * FROM filter)
{code}

> Wrong plan for nested NOT EXISTS subqueries
> ---
>
> Key: CALCITE-4242
> URL: https://issues.apache.org/jira/browse/CALCITE-4242
> Project: Calcite
>  Issue Type: Bug
>Reporter: Martin Raszyk
>Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
> SELECT z FROM R
> WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[=($0, $1)], joinType=[left])
>   LogicalTableScan(table=[[Bug, P]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$1], $f0=[true])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[true], joinType=[left])
>   LogicalTableScan(table=[[Bug, Q]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$0], $f0=[true])
>   LogicalTableScan(table=[[Bug, R]])
> {code}
> that corresponds to the following SQL query
> {code:java}
> SELECT P.X
> FROM Bug.P
> LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
> FROM Bug.Q
> LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
> FROM Bug.R
> GROUP BY Z) AS t0 ON TRUE
> WHERE t0.$f1 IS NULL
> GROUP BY t0.Z) AS t3 ON P.X = t3.Z
> WHERE t3.$f1 IS NULL
> {code}
> which yields the (non-empty) table P as the result.
> Hence, the parsed and converted query is not equivalent to the input query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2020-09-19 Thread James Starr (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198831#comment-17198831
 ] 

James Starr commented on CALCITE-4242:
--

[~mraszyk], So I was wrong about, it does need to be to left associative.
{code:sql}
CREATE OR REPLACE VIEW CALCITE_4242 AS 
WITH
q_present AS (SELECT TRUE as present FROM q),
r_present_given_z AS (SELECT TRUE as present, z FROM r GROUP BY z)
SELECT my_p.x
FROM p my_p
LEFT JOIN q_present ON TRUE
LEFT JOIN r_present_given_z ON r_present_given_z.z = my_p.x
WHERE NOT (q_present.present IS NOT NULL AND NOT (r_present_given_z.present IS 
NOT NULL))


CREATE TABLE IF NOT EXISTS P(x INTEGER);
CREATE TABLE IF NOT EXISTS  Q(y INTEGER);
CREATE TABLE IF NOT EXISTS  R(z INTEGER);
DELETE FROM P;
DELETE FROM Q;
DELETE FROM R;

INSERT INTO P VALUES (1);
INSERT INTO Q VALUES (1);

\o results.txt
SELECT 'QUERY_1', * FROM CALCITE_4242;
\o


DELETE FROM P;
DELETE FROM Q;
DELETE FROM R;

INSERT INTO P VALUES (1);
INSERT INTO P VALUES (1);
INSERT INTO P VALUES (0);
INSERT INTO P VALUES (0);
INSERT INTO Q VALUES (1);
INSERT INTO Q VALUES (0);
INSERT INTO R VALUES (1);
INSERT INTO R VALUES (0);

\o | cat - >> results.txt
SELECT 'QUERY_2', * FROM CALCITE_4242;
\o

DELETE FROM P;
DELETE FROM Q;
DELETE FROM R;

INSERT INTO P VALUES (1);

\o | cat - >> results.txt
SELECT 'QUERY_3', * FROM CALCITE_4242;
\o

{code}

Generates the following file
{code}
 ?column? | x 
--+---
(0 rows)

 ?column? | x 
--+---
 QUERY_2  | 1
 QUERY_2  | 1
 QUERY_2  | 1
 QUERY_2  | 1
 QUERY_2  | 0
 QUERY_2  | 0
 QUERY_2  | 0
 QUERY_2  | 0
(8 rows)

 ?column? | x 
--+---
 QUERY_3  | 1
(1 row)
{code}

> Wrong plan for nested NOT EXISTS subqueries
> ---
>
> Key: CALCITE-4242
> URL: https://issues.apache.org/jira/browse/CALCITE-4242
> Project: Calcite
>  Issue Type: Bug
>Reporter: Martin Raszyk
>Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
> SELECT z FROM R
> WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[=($0, $1)], joinType=[left])
>   LogicalTableScan(table=[[Bug, P]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$1], $f0=[true])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[true], joinType=[left])
>   LogicalTableScan(table=[[Bug, Q]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$0], $f0=[true])
>   LogicalTableScan(table=[[Bug, R]])
> {code}
> that corresponds to the following SQL query
> {code:java}
> SELECT P.X
> FROM Bug.P
> LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
> FROM Bug.Q
> LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
> FROM Bug.R
> GROUP BY Z) AS t0 ON TRUE
> WHERE t0.$f1 IS NULL
> GROUP BY t0.Z) AS t3 ON P.X = t3.Z
> WHERE t3.$f1 IS NULL
> {code}
> which yields the (non-empty) table P as the result.
> Hence, the parsed and converted query is not equivalent to the input query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2020-09-19 Thread Martin Raszyk (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198801#comment-17198801
 ] 

Martin Raszyk commented on CALCITE-4242:


On your version of my query, I got the same output as you, namely
{code:java}
CREATE TABLE P(x INTEGER);
CREATE TABLE Q(y INTEGER);
CREATE TABLE R(z INTEGER);
INSERT INTO P VALUES (1);
INSERT INTO Q VALUES (1);

-- evaluate your query

 x 
---
 1
(1 row){code}
However, my original query
{code:java}
SELECT x FROM P
WHERE NOT EXISTS (
  SELECT y FROM Q
  WHERE NOT EXISTS (
SELECT z FROM R
WHERE x = z
  )
)
{code}
yields an empty table (0 rows). So you didn't provide an equivalent query.

> Wrong plan for nested NOT EXISTS subqueries
> ---
>
> Key: CALCITE-4242
> URL: https://issues.apache.org/jira/browse/CALCITE-4242
> Project: Calcite
>  Issue Type: Bug
>Reporter: Martin Raszyk
>Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
> SELECT z FROM R
> WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[=($0, $1)], joinType=[left])
>   LogicalTableScan(table=[[Bug, P]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$1], $f0=[true])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[true], joinType=[left])
>   LogicalTableScan(table=[[Bug, Q]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$0], $f0=[true])
>   LogicalTableScan(table=[[Bug, R]])
> {code}
> that corresponds to the following SQL query
> {code:java}
> SELECT P.X
> FROM Bug.P
> LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
> FROM Bug.Q
> LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
> FROM Bug.R
> GROUP BY Z) AS t0 ON TRUE
> WHERE t0.$f1 IS NULL
> GROUP BY t0.Z) AS t3 ON P.X = t3.Z
> WHERE t3.$f1 IS NULL
> {code}
> which yields the (non-empty) table P as the result.
> Hence, the parsed and converted query is not equivalent to the input query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2020-09-17 Thread James Starr (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17197921#comment-17197921
 ] 

James Starr commented on CALCITE-4242:
--

Sorry, about not validating the previous query on postgres.  The null 
conditional tripped me up.  The correct query for postgres would be:

{code:sql}
// Some comments here
public String getFoo()
 WITH
q_present AS (SELECT TRUE as present FROM q),
r_present_given_z AS (SELECT TRUE as present, z FROM r GROUP BY z),
q_present_given_r_present_given_z AS (
SELECT distinct TRUE AS present, r_present_given_z.present AS r_present, 
r_present_given_z.z AS r_z
FROM q_present
LEFT JOIN r_present_given_z ON TRUE
)
SELECT my_p.x
FROM P my_p
LEFT JOIN q_present_given_r_present_given_z ON 
q_present_given_r_present_given_z.r_z = my_p.x
WHERE NOT(
  (q_present_given_r_present_given_z.present IS NOT NULL)
  AND NOT (q_present_given_r_present_given_z.r_present IS NOT NULL)
);
{code}

{code:sql}
CREATE TABLE P(x INTEGER);
CREATE TABLE Q(y INTEGER);
CREATE TABLE R(z INTEGER);
INSERT INTO P VALUES (1);
INSERT INTO Q VALUES (1);

 x 
---
 1
(1 row)

DELETE FROM P;
DELETE FROM Q;
DELETE FROM R;
INSERT INTO P VALUES (1);
INSERT INTO P VALUES (1);
INSERT INTO P VALUES (0);
INSERT INTO P VALUES (0);
INSERT INTO Q VALUES (1);
INSERT INTO Q VALUES (0);
INSERT INTO R VALUES (1);
INSERT INTO R VALUES (0);

 x 
---
 1
 1
 0
 0
(4 rows)

DELETE FROM P;
DELETE FROM Q;
DELETE FROM R;
INSERT INTO P VALUES (1);

 x 
---
 1
(1 row)

{code}

The semi left join/distinct/aggregation handles the multiplicity.  All of the 
conditional need to hoisted to the top where clause.  If the joins are kept 
right-associative, then only top most query joined, in the example 
q_present_given_r_present_given_z, needs to have an aggregation.  However, I 
think this could result in cross product joins if there are 2 nested queries in 
a query that is already nested.  However, most likely the optimization rules 
should be able to rearrange the join such that it be right or left associative 
is not a question of performance, simply of correctness, assuming that the 
rewrite is in such a way that it does not block the rule. 

This query could be very expensive if it was right associative and the planner 
did not rearrange the joins.  
{code:sql}
CREATE TABLE t_1(t1_id INTEGER);
CREATE TABLE t_2(y INTEGER);
CREATE TABLE t_2_1(t1_id INTEGER);
CREATE TABLE t_2_2(t1_id INTEGER);

SELECT * FROM t_1
WHERE EXISTS (
  SELECT * FROM t_2
  WHERE EXISTS(SELECT * FROM t_2_1 WHERE t1.t1_id = t2_1.t1_id)
  AND EXISTS(SELECT * FROM t_2_2 WHERE t1.t1_id = t2_2.t1_id)
);
{code}

So I believe the algorithm works out to something like this:
{code:java}
//De-morgan the expression such that you have a normalized AND expression
List demorganSubExpression = demorganToAnd(subExpression);
List resultExpression = new ArrayList<>();
for(RexNode subExpresion:  demorganSubExpression){
  if(hasSubQuery(subExpression)) {
bb.hoiste(subExpression);
  } else {
resultExpression.add(subExpression);
  }
  return buildAndExpression(resultExpression);
{code}

With bb.hoiste traversing up the tree until until all elements are in scope.


> Wrong plan for nested NOT EXISTS subqueries
> ---
>
> Key: CALCITE-4242
> URL: https://issues.apache.org/jira/browse/CALCITE-4242
> Project: Calcite
>  Issue Type: Bug
>Reporter: Martin Raszyk
>Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
> SELECT z FROM R
> WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[=($0, $1)], joinType=[left])
>   LogicalTableScan(table=[[Bug, P]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$1], $f0=[true])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[true], joinType=[left])
>   LogicalTableScan(table=[[Bug, Q]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$0], $f0=[true])
>   LogicalTableScan(table=[[Bug, R]])
> {code}
> that corresponds to the following SQL query
> {code:java}
> SELECT P.X
> FROM Bug.P
> LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
> FROM Bug.Q
> LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
> FROM Bug.R
> GROUP BY Z) AS t0 ON TRUE
> WHERE t0.$f1 IS NULL
> GROUP BY 

[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2020-09-17 Thread Martin Raszyk (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17197422#comment-17197422
 ] 

Martin Raszyk commented on CALCITE-4242:


I tried to use `LEFT SEMI JOIN` in PostgreSQL version `psql (PostgreSQL) 12.4 
(Ubuntu 12.4-0ubuntu0.20.04.1)`, but I didn't succeed.

The query

 
{code:java}
SELECT x FROM P LEFT SEMI JOIN Q ON TRUE;
{code}
leads to syntax error.

 

The query
{code:java}
SELECT x FROM P SEMI JOIN Q ON TRUE;
{code}
is evaluated, but does not yield the proper multiplicities.

Finally, the query
{code:java}
SELECT my_p.x FROM P AS my_p SEMI JOIN Q ON TRUE;
{code}
leads to syntax error, too.

Moreover, I said that "one could keep the LEFT JOIN right-associative for 
nested NOT EXISTS subqueries *without data dependencies crossing two nesting 
levels*". In my very original query
{code:java}
SELECT x FROM P
WHERE NOT EXISTS (
  SELECT y FROM Q
  WHERE NOT EXISTS (
SELECT z FROM R
WHERE x = z
  )
)
{code}
this is not the case and your most recent query which I fixed as follows so 
that it evaluates
{code:java}
WITH
q_present AS (SELECT TRUE as present FROM q),
r_present_given_z AS (SELECT TRUE as present, z FROM r GROUP BY z),
q_present_given_r_present_given_z AS (
SELECT TRUE AS present, r_present_given_z.present AS r_present, 
r_present_given_z.z AS r_z
FROM q_present
LEFT JOIN r_present_given_z ON TRUE
)
SELECT my_p.x
FROM P my_p
LEFT JOIN q_present_given_r_present_given_z ON 
q_present_given_r_present_given_z.r_z = my_p.x
WHERE NOT(
  q_present_given_r_present_given_z.present
  AND NOT q_present_given_r_present_given_z.r_present
);
{code}
is not equivalent to my very original query with the same counter-example as 
you very first suggestion:
{code:java}
CREATE TABLE P(x INTEGER);
CREATE TABLE Q(y INTEGER);
CREATE TABLE R(z INTEGER);
INSERT INTO P VALUES (1);
{code}

> Wrong plan for nested NOT EXISTS subqueries
> ---
>
> Key: CALCITE-4242
> URL: https://issues.apache.org/jira/browse/CALCITE-4242
> Project: Calcite
>  Issue Type: Bug
>Reporter: Martin Raszyk
>Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
> SELECT z FROM R
> WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[=($0, $1)], joinType=[left])
>   LogicalTableScan(table=[[Bug, P]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$1], $f0=[true])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[true], joinType=[left])
>   LogicalTableScan(table=[[Bug, Q]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$0], $f0=[true])
>   LogicalTableScan(table=[[Bug, R]])
> {code}
> that corresponds to the following SQL query
> {code:java}
> SELECT P.X
> FROM Bug.P
> LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
> FROM Bug.Q
> LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
> FROM Bug.R
> GROUP BY Z) AS t0 ON TRUE
> WHERE t0.$f1 IS NULL
> GROUP BY t0.Z) AS t3 ON P.X = t3.Z
> WHERE t3.$f1 IS NULL
> {code}
> which yields the (non-empty) table P as the result.
> Hence, the parsed and converted query is not equivalent to the input query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2020-09-11 Thread James Starr (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194478#comment-17194478
 ] 

James Starr commented on CALCITE-4242:
--

I believe you are correct about keeping the {code:sql}LEFT JOIN{code} right 
associative due to solving the multiplicity issue. If we make the top most join 
`LEFT SEMI`, it would solve the multiplicity issue.

Further more, I think this could be expanded to all exists clauses not just NOT 
exists.

 
{code:sql}
WITH 
q_present AS (SELECT TRUE as present FROM q),
r_present_given_z AS (SELECT TRUE, z FROM r GROUP BY z)
q_present_given_r_present_given_z AS (
SELECT TRUE AS present, r_present_given_x.present AS r_present, 
r_present_given_z.z AS r_z
FROM q_present
LEFT JOIN r_present_given_z ON TRUE
)
SELECT my_p.x
FROM P my_p
LEFT SEMI JOIN q_present_given_r_present_given_z ON 
q_present_given_r_present_given_z.z = my_p.x
WHERE NOT(
  q_present_given_r_present_given_z.present = TRUE 
  AND NOT q_present_given_r_present_given_z.r_present
);
{code}

> Wrong plan for nested NOT EXISTS subqueries
> ---
>
> Key: CALCITE-4242
> URL: https://issues.apache.org/jira/browse/CALCITE-4242
> Project: Calcite
>  Issue Type: Bug
>Reporter: Martin Raszyk
>Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
> SELECT z FROM R
> WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[=($0, $1)], joinType=[left])
>   LogicalTableScan(table=[[Bug, P]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$1], $f0=[true])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[true], joinType=[left])
>   LogicalTableScan(table=[[Bug, Q]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$0], $f0=[true])
>   LogicalTableScan(table=[[Bug, R]])
> {code}
> that corresponds to the following SQL query
> {code:java}
> SELECT P.X
> FROM Bug.P
> LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
> FROM Bug.Q
> LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
> FROM Bug.R
> GROUP BY Z) AS t0 ON TRUE
> WHERE t0.$f1 IS NULL
> GROUP BY t0.Z) AS t3 ON P.X = t3.Z
> WHERE t3.$f1 IS NULL
> {code}
> which yields the (non-empty) table P as the result.
> Hence, the parsed and converted query is not equivalent to the input query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2020-09-11 Thread Martin Raszyk (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194049#comment-17194049
 ] 

Martin Raszyk commented on CALCITE-4242:


Your last query makes sense in the set semantics. However, it seems hard to 
enforce the right multiplicities, e.g., on a database obtained by
{code:java}
DELETE FROM P;
DELETE FROM Q;
DELETE FROM R;
INSERT INTO P VALUES (1);
INSERT INTO P VALUES (1);
INSERT INTO P VALUES (0);
INSERT INTO P VALUES (0);
INSERT INTO Q VALUES (1);
INSERT INTO Q VALUES (0);
INSERT INTO R VALUES (1);
INSERT INTO R VALUES (0);
{code}
Let me also point out that it still makes sense to keep the LEFT JOIN 
right-associative for nested NOT EXISTS subqueries without data dependencies 
crossing two nesting levels, e.g., in the input query
{code:java}
SELECT x FROM P
WHERE NOT EXISTS (
  SELECT y FROM Q
  WHERE x = y AND NOT EXISTS (
SELECT z FROM R
WHERE y = z
  )
)
{code}
for which the currently converted query
{code:java}
SELECT P.X
FROM PQR.P
LEFT JOIN (SELECT Y, MIN(TRUE) AS $f1
FROM (SELECT Q.Y, t0.$f1 AS $f0
FROM PQR.Q
LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
FROM PQR.R
GROUP BY Z) AS t0 ON Q.Y = t0.Z) AS t1
WHERE t1.$f0 IS NULL
GROUP BY Y) AS t4 ON P.X = t4.Y
WHERE t4.$f1 IS NULL
{code}
is equivalent (also w.r.t. multiplicities) and could be more efficient than 
with a left-associative LEFT JOIN.

Finally, let me point out that the WHERE clause from your last query becomes 
more involved if the nesting depth increases, e.g., on the input query
{code:java}
SELECT x FROM P
WHERE NOT EXISTS (
  SELECT y FROM Q
  WHERE x = y AND NOT EXISTS (
SELECT z FROM R
WHERE x = z AND y = z AND NOT EXISTS (
  SELECT w FROM S
  WHERE x = w AND y = w AND z = w AND NOT EXISTS (
SELECT v FROM T
WHERE x = v AND y = v AND z = v AND w = v
  )
)
  )
)
{code}
where it should read
{code:java}
WHERE my_q.present IS NULL OR (my_r.present IS NOT NULL AND (my_s.present IS 
NULL OR my_t.present IS NOT NULL))
{code}

> Wrong plan for nested NOT EXISTS subqueries
> ---
>
> Key: CALCITE-4242
> URL: https://issues.apache.org/jira/browse/CALCITE-4242
> Project: Calcite
>  Issue Type: Bug
>Reporter: Martin Raszyk
>Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
> SELECT z FROM R
> WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[=($0, $1)], joinType=[left])
>   LogicalTableScan(table=[[Bug, P]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$1], $f0=[true])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[true], joinType=[left])
>   LogicalTableScan(table=[[Bug, Q]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$0], $f0=[true])
>   LogicalTableScan(table=[[Bug, R]])
> {code}
> that corresponds to the following SQL query
> {code:java}
> SELECT P.X
> FROM Bug.P
> LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
> FROM Bug.Q
> LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
> FROM Bug.R
> GROUP BY Z) AS t0 ON TRUE
> WHERE t0.$f1 IS NULL
> GROUP BY t0.Z) AS t3 ON P.X = t3.Z
> WHERE t3.$f1 IS NULL
> {code}
> which yields the (non-empty) table P as the result.
> Hence, the parsed and converted query is not equivalent to the input query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2020-09-10 Thread James Starr (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193868#comment-17193868
 ] 

James Starr commented on CALCITE-4242:
--

This is or at least reasonable close to what the decorrelated query should look 
like.  So perhaps how logic for pushing up joins is not correct.
{code:java}
SELECT my_p.X
FROM P my_p
LEFT JOIN (SELECT TRUE as present FROM Q ) AS my_q ON TRUE
LEFT JOIN (SELECT Z, TRUE AS present FROM R GROUP BY Z) AS my_r ON my_p.X = 
my_r.Z
WHERE my_q.present IS NULL OR my_r.present IS NOT NULL
{code}
 

> Wrong plan for nested NOT EXISTS subqueries
> ---
>
> Key: CALCITE-4242
> URL: https://issues.apache.org/jira/browse/CALCITE-4242
> Project: Calcite
>  Issue Type: Bug
>Reporter: Martin Raszyk
>Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
> SELECT z FROM R
> WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[=($0, $1)], joinType=[left])
>   LogicalTableScan(table=[[Bug, P]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$1], $f0=[true])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[true], joinType=[left])
>   LogicalTableScan(table=[[Bug, Q]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$0], $f0=[true])
>   LogicalTableScan(table=[[Bug, R]])
> {code}
> that corresponds to the following SQL query
> {code:java}
> SELECT P.X
> FROM Bug.P
> LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
> FROM Bug.Q
> LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
> FROM Bug.R
> GROUP BY Z) AS t0 ON TRUE
> WHERE t0.$f1 IS NULL
> GROUP BY t0.Z) AS t3 ON P.X = t3.Z
> WHERE t3.$f1 IS NULL
> {code}
> which yields the (non-empty) table P as the result.
> Hence, the parsed and converted query is not equivalent to the input query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

2020-09-10 Thread Martin Raszyk (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193811#comment-17193811
 ] 

Martin Raszyk commented on CALCITE-4242:


Your proposal is not equivalent to the input query if you initialize the 
database as follows.

 
{code:java}
CREATE TABLE P(x INTEGER);
CREATE TABLE Q(y INTEGER);
CREATE TABLE R(z INTEGER);
INSERT INTO P VALUES (1);
{code}
Now the input query yields the table P as the result, but your query yields an 
empty result.

 

> Wrong plan for nested NOT EXISTS subqueries
> ---
>
> Key: CALCITE-4242
> URL: https://issues.apache.org/jira/browse/CALCITE-4242
> Project: Calcite
>  Issue Type: Bug
>Reporter: Martin Raszyk
>Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
> SELECT z FROM R
> WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[=($0, $1)], joinType=[left])
>   LogicalTableScan(table=[[Bug, P]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$1], $f0=[true])
>   LogicalFilter(condition=[IS NULL($2)])
> LogicalJoin(condition=[true], joinType=[left])
>   LogicalTableScan(table=[[Bug, Q]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(Z=[$0], $f0=[true])
>   LogicalTableScan(table=[[Bug, R]])
> {code}
> that corresponds to the following SQL query
> {code:java}
> SELECT P.X
> FROM Bug.P
> LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
> FROM Bug.Q
> LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
> FROM Bug.R
> GROUP BY Z) AS t0 ON TRUE
> WHERE t0.$f1 IS NULL
> GROUP BY t0.Z) AS t3 ON P.X = t3.Z
> WHERE t3.$f1 IS NULL
> {code}
> which yields the (non-empty) table P as the result.
> Hence, the parsed and converted query is not equivalent to the input query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)