francis0407 commented on issue #23783: [SPARK-26854][SQL] Support ANY/SOME subquery URL: https://github.com/apache/spark/pull/23783#issuecomment-477161182 Hi @dongjoon-hyun , I reconsidered your suggestion, and made some tests. Semantically, all predicate subqueries(`IN`, `ANY`, `ALL`) are able to be transformed to `EXISTS`. ``` a in (select b from t) -> exists(select b from t where a = b) a > any (select b from t) -> exists(select b from t where a > b) a > all (select b from t) -> not exists(select b from t where not(a > b)) ``` And this is truly useful because we only need to match the `exists` case after the transformation. But currently, this may cause some issues. Spark supports this query: ``` scala> spark.sql( | """ | select * | from a | where exists (select * | from b | where a1 in (select c1 | from c)) | """).explain(true) == Parsed Logical Plan == 'Project [*] +- 'Filter exists#15 [] : +- 'Project [*] : +- 'Filter 'a1 IN (list#14 []) : : +- 'Project ['c1] : : +- 'UnresolvedRelation `c` : +- 'UnresolvedRelation `b` +- 'UnresolvedRelation `a` == Analyzed Logical Plan == a1: int, a2: int Project [a1#2, a2#3] +- Filter exists#15 [a1#2] : +- Project [b1#4, b2#5] : +- Filter outer(a1#2) IN (list#14 []) : : +- Project [c1#6] : : +- SubqueryAlias `default`.`c` : : +- Relation[c1#6,c2#7] parquet : +- SubqueryAlias `default`.`b` : +- Relation[b1#4,b2#5] parquet +- SubqueryAlias `default`.`a` +- Relation[a1#2,a2#3] parquet == Optimized Logical Plan == Join LeftSemi, exists#19: boolean :- Join ExistenceJoin(exists#19), (a1#2 = c1#6) : :- Relation[a1#2,a2#3] parquet : +- Project [c1#6] : +- Relation[c1#6,c2#7] parquet +- Project +- Relation[b1#4,b2#5] parquet ``` But not this one: ``` scala> spark.sql( | """ | select * | from a | where exists(select * | from b | where exists(select * | from c | where a1 = c1)) | """).explain(true) org.apache.spark.sql.AnalysisException: cannot resolve '`a1`' given input columns: [default.c.c1, default.c.c2]; line 8 pos 32; 'Project [*] +- 'Filter exists#13 [] : +- 'Project [*] : +- 'Filter exists#12 [] : : +- 'Project [*] : : +- 'Filter ('a1 = c1#6) : : +- SubqueryAlias `default`.`c` : : +- Relation[c1#6,c2#7] parquet : +- SubqueryAlias `default`.`b` : +- Relation[b1#4,b2#5] parquet +- SubqueryAlias `default`.`a` +- Relation[a1#2,a2#3] parquet ``` Therefore, I think we are not able to use this method in this PR.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
