[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r215333831 --- Diff: sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/nested-not-in.sql --- @@ -0,0 +1,198 @@ +-- Tests NOT-IN subqueries nested inside OR expression(s). --- End diff -- @maropu Oh.. thank you very much for trying out. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r214007983 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteSubquerySuite.scala --- @@ -52,4 +52,21 @@ class RewriteSubquerySuite extends PlanTest { comparePlans(optimized, correctAnswer) } + test("NOT-IN subquery nested inside OR") { +val relation1 = LocalRelation('a.int, 'b.int) +val relation2 = LocalRelation('c.int, 'd.int) +val exists = 'exists.boolean.notNull + +val query = relation1.where('b === 1 || Not('a.in(ListQuery(relation2.select('c).select('a) + +val plan = relation1.select('a).where('b === 1 || Not('exists)) +val correctAnswer = relation1 + .join(relation2.select('c), ExistenceJoin(exists), Some('a === 'c || IsNull('a === 'c))) + .where('b === 1 || Not(exists)) + .select('a) + .analyze +val optimized = Optimize.execute(query.analyze) + --- End diff -- The end-to-end tests for conflicting attr case already exist though, IMO it'd be better to add deduplication tests here, too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r214005837 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { plan: LogicalPlan): (Option[Expression], LogicalPlan) = { var newPlan = plan val newExprs = exprs.map { e => - e transformUp { + e transformDown { case Exists(sub, conditions, _) => val exists = AttributeReference("exists", BooleanType, nullable = false)() // Deduplicate conflicting attributes if any. newPlan = dedupJoin( Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And))) exists +case (Not(InSubquery(values, ListQuery(sub, conditions, _, _ => --- End diff -- nit: removes outermost parentheses --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r213988466 --- Diff: sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/nested-not-in.sql --- @@ -0,0 +1,198 @@ +-- Tests NOT-IN subqueries nested inside OR expression(s). --- End diff -- https://github.com/apache/spark/pull/22141/files#r211835129 To make sure all the results are correct, I also checked that postgresql had the same output by the queries below. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r212785260 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { plan: LogicalPlan): (Option[Expression], LogicalPlan) = { var newPlan = plan val newExprs = exprs.map { e => - e transformUp { + e transformDown { case Exists(sub, conditions, _) => val exists = AttributeReference("exists", BooleanType, nullable = false)() // Deduplicate conflicting attributes if any. newPlan = dedupJoin( Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And))) exists +case (Not(InSubquery(values, ListQuery(sub, conditions, _, _ => + val exists = AttributeReference("exists", BooleanType, nullable = false)() --- End diff -- yea, it's ok to keep the current one. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r211986809 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { plan: LogicalPlan): (Option[Expression], LogicalPlan) = { var newPlan = plan val newExprs = exprs.map { e => - e transformUp { + e transformDown { --- End diff -- thanks @dilipbiswal :) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r211985537 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { plan: LogicalPlan): (Option[Expression], LogicalPlan) = { var newPlan = plan val newExprs = exprs.map { e => - e transformUp { + e transformDown { --- End diff -- @mgaido91 How can i say "no" to more testing :-) ? I will add the tests. Thanks !! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r211979620 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { plan: LogicalPlan): (Option[Expression], LogicalPlan) = { var newPlan = plan val newExprs = exprs.map { e => - e transformUp { + e transformDown { --- End diff -- yes, thanks, but that doesn't test when the outer values are null, right? I think it would be good to have also cases with: - more than 2 attributes; - with the outer values being null; - complex data types involved (eg. structs) What do you think? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r211971929 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { plan: LogicalPlan): (Option[Expression], LogicalPlan) = { var newPlan = plan val newExprs = exprs.map { e => - e transformUp { + e transformDown { --- End diff -- @mgaido91 >> I don't see any test (please correct me if I am wrong) where multiple attributes are used as output of the subquery. Can we add and compare with other RDBMS? Thanks. In [here](https://github.com/apache/spark/blob/844a3ff82a688e7398bb130a44750aec78420698/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/nested-not-in.sql#L113-L134) ? Is this what you meant ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r211969009 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { plan: LogicalPlan): (Option[Expression], LogicalPlan) = { var newPlan = plan val newExprs = exprs.map { e => - e transformUp { + e transformDown { --- End diff -- to be able to see Not(In) first before (In) ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r211961738 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { plan: LogicalPlan): (Option[Expression], LogicalPlan) = { var newPlan = plan val newExprs = exprs.map { e => - e transformUp { + e transformDown { --- End diff -- why did you change this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r211961021 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { plan: LogicalPlan): (Option[Expression], LogicalPlan) = { var newPlan = plan val newExprs = exprs.map { e => - e transformUp { + e transformDown { case Exists(sub, conditions, _) => val exists = AttributeReference("exists", BooleanType, nullable = false)() // Deduplicate conflicting attributes if any. newPlan = dedupJoin( Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And))) exists +case (Not(InSubquery(values, ListQuery(sub, conditions, _, _ => + val exists = AttributeReference("exists", BooleanType, nullable = false)() + val inConditions = values.zip(sub.output).map(EqualTo.tupled) + val nullAwareJoinConds = inConditions.map(c => Or(c, IsNull(c))) --- End diff -- makes sense, thanks for your answer @dilipbiswal --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r211955605 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { plan: LogicalPlan): (Option[Expression], LogicalPlan) = { var newPlan = plan val newExprs = exprs.map { e => - e transformUp { + e transformDown { case Exists(sub, conditions, _) => val exists = AttributeReference("exists", BooleanType, nullable = false)() // Deduplicate conflicting attributes if any. newPlan = dedupJoin( Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And))) exists +case (Not(InSubquery(values, ListQuery(sub, conditions, _, _ => + val exists = AttributeReference("exists", BooleanType, nullable = false)() + val inConditions = values.zip(sub.output).map(EqualTo.tupled) + val nullAwareJoinConds = inConditions.map(c => Or(c, IsNull(c))) --- End diff -- @mgaido91 Thanks !! Actually i have been thinking about it for last few days :-). We probably need a new optimizer rule that simplifies the join conditions based on its child's constraints. So we should be able to simplify - ``` SQL select * from t1 join t2 on (t1c1 = t2c1 OR isnull(t1c1 = t2c1) where t1c1 is not null and t2c1 is not null ``` to ```SQL select * from t1 join t2 on (t1c1 = t2c1) where t1c1 is not null and t2c1 is not null I wanted to handle it as a follow-up. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r211864482 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { plan: LogicalPlan): (Option[Expression], LogicalPlan) = { var newPlan = plan val newExprs = exprs.map { e => - e transformUp { + e transformDown { case Exists(sub, conditions, _) => val exists = AttributeReference("exists", BooleanType, nullable = false)() // Deduplicate conflicting attributes if any. newPlan = dedupJoin( Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And))) exists +case (Not(InSubquery(values, ListQuery(sub, conditions, _, _ => + val exists = AttributeReference("exists", BooleanType, nullable = false)() + val inConditions = values.zip(sub.output).map(EqualTo.tupled) + val nullAwareJoinConds = inConditions.map(c => Or(c, IsNull(c))) --- End diff -- shall we add this only when the condition is nullable? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r211839392 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { plan: LogicalPlan): (Option[Expression], LogicalPlan) = { var newPlan = plan val newExprs = exprs.map { e => - e transformUp { + e transformDown { case Exists(sub, conditions, _) => val exists = AttributeReference("exists", BooleanType, nullable = false)() // Deduplicate conflicting attributes if any. newPlan = dedupJoin( Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And))) exists +case (Not(InSubquery(values, ListQuery(sub, conditions, _, _ => + val exists = AttributeReference("exists", BooleanType, nullable = false)() --- End diff -- @maropu The common method looks like this - ``` scala def getCommonParams( values: Seq[Expression], sub: LogicalPlan) : (AttributeReference, Seq[Expression]) = { ( AttributeReference("exists", BooleanType, nullable = false)(), values.zip(sub.output).map(EqualTo.tupled) ) } ``` Not much in terms of lines saved since things don't fit in single line.. You still wanted to do it :-) ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r211835129 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { plan: LogicalPlan): (Option[Expression], LogicalPlan) = { var newPlan = plan val newExprs = exprs.map { e => - e transformUp { + e transformDown { case Exists(sub, conditions, _) => val exists = AttributeReference("exists", BooleanType, nullable = false)() // Deduplicate conflicting attributes if any. newPlan = dedupJoin( Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And))) exists +case (Not(InSubquery(values, ListQuery(sub, conditions, _, _ => + val exists = AttributeReference("exists", BooleanType, nullable = false)() + val inConditions = values.zip(sub.output).map(EqualTo.tupled) + val nullAwareJoinConds = inConditions.map(c => Or(c, IsNull(c))) --- End diff -- @liwensun I tried all the five queries and they work fine. I verified the results with another database just to make sure. I briefly looked at the plan and they look ok to me. Also i have added all the five tests in my last commit. Please take a look and let me know if anything amiss. Thanks a lot. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user liwensun commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r211806238 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { plan: LogicalPlan): (Option[Expression], LogicalPlan) = { var newPlan = plan val newExprs = exprs.map { e => - e transformUp { + e transformDown { case Exists(sub, conditions, _) => val exists = AttributeReference("exists", BooleanType, nullable = false)() // Deduplicate conflicting attributes if any. newPlan = dedupJoin( Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And))) exists +case (Not(InSubquery(values, ListQuery(sub, conditions, _, _ => + val exists = AttributeReference("exists", BooleanType, nullable = false)() + val inConditions = values.zip(sub.output).map(EqualTo.tupled) + val nullAwareJoinConds = inConditions.map(c => Or(c, IsNull(c))) --- End diff -- Thanks for the follow up! I think these should be enough to reveal the issue if I understand it correctly. Make sure c2 has null values. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r211804802 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { plan: LogicalPlan): (Option[Expression], LogicalPlan) = { var newPlan = plan val newExprs = exprs.map { e => - e transformUp { + e transformDown { case Exists(sub, conditions, _) => val exists = AttributeReference("exists", BooleanType, nullable = false)() // Deduplicate conflicting attributes if any. newPlan = dedupJoin( Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And))) exists +case (Not(InSubquery(values, ListQuery(sub, conditions, _, _ => + val exists = AttributeReference("exists", BooleanType, nullable = false)() + val inConditions = values.zip(sub.output).map(EqualTo.tupled) + val nullAwareJoinConds = inConditions.map(c => Or(c, IsNull(c))) --- End diff -- @liwensun Just to confirm, here is a list of queries i was going to try - select * from t1 where not (c1 not in (select c2 from t2)) select * from t1 where not (c1 > 5 or c1 in (select c2 from t2)) select * from t1 where not (c1 > 5 or c1 not in (select c2 from t2)) select * from t1 where not (c1 > 5 and c1 in (select c2 from t2)) select * from t1 where not (c1 > 5 and c1 not in (select c2 from t2)) Does this look ok to you ? Please let me know if i missed out on any cases ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r211802296 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { plan: LogicalPlan): (Option[Expression], LogicalPlan) = { var newPlan = plan val newExprs = exprs.map { e => - e transformUp { + e transformDown { case Exists(sub, conditions, _) => val exists = AttributeReference("exists", BooleanType, nullable = false)() // Deduplicate conflicting attributes if any. newPlan = dedupJoin( Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And))) exists +case (Not(InSubquery(values, ListQuery(sub, conditions, _, _ => + val exists = AttributeReference("exists", BooleanType, nullable = false)() + val inConditions = values.zip(sub.output).map(EqualTo.tupled) + val nullAwareJoinConds = inConditions.map(c => Or(c, IsNull(c))) --- End diff -- @liwensun Thank you very much. Let me try the scenarios and get back.. I may have some questions on the semantics. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r211801302 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { plan: LogicalPlan): (Option[Expression], LogicalPlan) = { var newPlan = plan val newExprs = exprs.map { e => - e transformUp { + e transformDown { case Exists(sub, conditions, _) => val exists = AttributeReference("exists", BooleanType, nullable = false)() // Deduplicate conflicting attributes if any. newPlan = dedupJoin( Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And))) exists +case (Not(InSubquery(values, ListQuery(sub, conditions, _, _ => + val exists = AttributeReference("exists", BooleanType, nullable = false)() --- End diff -- nit: There are duplicate codes in `case (Not(InSubqyer...` and `case InSubquery...`. Can we make a simple helper method to remove these? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user liwensun commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r211798295 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { plan: LogicalPlan): (Option[Expression], LogicalPlan) = { var newPlan = plan val newExprs = exprs.map { e => - e transformUp { + e transformDown { case Exists(sub, conditions, _) => val exists = AttributeReference("exists", BooleanType, nullable = false)() // Deduplicate conflicting attributes if any. newPlan = dedupJoin( Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And))) exists +case (Not(InSubquery(values, ListQuery(sub, conditions, _, _ => + val exists = AttributeReference("exists", BooleanType, nullable = false)() + val inConditions = values.zip(sub.output).map(EqualTo.tupled) + val nullAwareJoinConds = inConditions.map(c => Or(c, IsNull(c))) --- End diff -- Thanks for working on this! But I'm not sure if this can handle the expression like this correctly: ```Not(And/Or(InSubquery, otherExpressiions*))``` or this ```Not(Not(InSubquery))``` Based on my understanding I think fundamentally what we want is probably to change the handling for the InSubquery case here by making the ExistenceJoin null aware somehow instead of adding another `Not(InSubquery(..))` case, right? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r211788303 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { plan: LogicalPlan): (Option[Expression], LogicalPlan) = { var newPlan = plan val newExprs = exprs.map { e => - e transformUp { + e transformDown { case Exists(sub, conditions, _) => val exists = AttributeReference("exists", BooleanType, nullable = false)() // Deduplicate conflicting attributes if any. newPlan = dedupJoin( Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And))) exists +case (Not(InSubquery(values, ListQuery(sub, conditions, _, _ => + val exists = AttributeReference("exists", BooleanType, nullable = false)() + val inConditions = values.zip(sub.output).map(EqualTo.tupled) + val nullAwareJoinConds = inConditions.map(c => Or(c, IsNull(c))) --- End diff -- This sounds a reasonable fix. cc @hvanhovell @liwensun --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org