[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...

2018-08-21 Thread liwensun
Github user liwensun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22141#discussion_r211806238
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
@@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
       plan: LogicalPlan): (Option[Expression], LogicalPlan) = {
     var newPlan = plan
     val newExprs = exprs.map { e =>
-      e transformUp {
+      e transformDown {
         case Exists(sub, conditions, _) =>
           val exists = AttributeReference("exists", BooleanType, nullable = false)()
           // Deduplicate conflicting attributes if any.
           newPlan = dedupJoin(
             Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And)))
           exists
+        case (Not(InSubquery(values, ListQuery(sub, conditions, _, _)))) =>
+          val exists = AttributeReference("exists", BooleanType, nullable = false)()
+          val inConditions = values.zip(sub.output).map(EqualTo.tupled)
+          val nullAwareJoinConds = inConditions.map(c => Or(c, IsNull(c)))
--- End diff --

Thanks for the follow-up! If I understand it correctly, these should be enough 
to reveal the issue. Make sure c2 has null values. 
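
A rough sketch of why null values matter for NOT IN, using a toy model of SQL
three-valued logic rather than Spark itself. `None` stands for NULL; every
name below (`NotInNullSketch`, `sqlNotIn`, and so on) is hypothetical, and
treating c2 as the subquery-side column is an assumption the comment above
does not state.

```scala
// Toy model of SQL three-valued logic; None stands for NULL/unknown.
// These helpers are illustrative only and are not Spark APIs.
object NotInNullSketch {
  type SqlBool = Option[Boolean]

  // SQL equality: a NULL on either side yields unknown.
  def sqlEq(a: Option[Int], b: Option[Int]): SqlBool =
    for (x <- a; y <- b) yield x == y

  // SQL OR: true wins; otherwise unknown wins over false.
  def sqlOr(a: SqlBool, b: SqlBool): SqlBool = (a, b) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false)) => Some(false)
    case _ => None
  }

  def sqlNot(a: SqlBool): SqlBool = a.map(!_)

  // value IN (list) is the OR of pairwise equalities; an empty list is false.
  def sqlIn(value: Option[Int], list: Seq[Option[Int]]): SqlBool =
    list.map(sqlEq(value, _)).foldLeft(Some(false): SqlBool)(sqlOr)

  def sqlNotIn(value: Option[Int], list: Seq[Option[Int]]): SqlBool =
    sqlNot(sqlIn(value, list))

  def main(args: Array[String]): Unit = {
    val c2NoNull = Seq[Option[Int]](Some(1), Some(2))
    val c2WithNull = Seq[Option[Int]](Some(1), None)
    println(sqlNotIn(Some(3), c2NoNull))   // Some(true): the row is kept
    println(sqlNotIn(Some(3), c2WithNull)) // None: unknown, a filter drops the row
    println(sqlNotIn(Some(1), c2WithNull)) // Some(false): a match was found
  }
}
```

In this model, a single NULL in the list turns every non-matching NOT IN
result from true into unknown, which is the behavior a test with nulls in c2
would exercise.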


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...

2018-08-21 Thread liwensun
Github user liwensun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22141#discussion_r211798295
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
@@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
       plan: LogicalPlan): (Option[Expression], LogicalPlan) = {
     var newPlan = plan
     val newExprs = exprs.map { e =>
-      e transformUp {
+      e transformDown {
         case Exists(sub, conditions, _) =>
           val exists = AttributeReference("exists", BooleanType, nullable = false)()
           // Deduplicate conflicting attributes if any.
           newPlan = dedupJoin(
             Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And)))
           exists
+        case (Not(InSubquery(values, ListQuery(sub, conditions, _, _)))) =>
+          val exists = AttributeReference("exists", BooleanType, nullable = false)()
+          val inConditions = values.zip(sub.output).map(EqualTo.tupled)
+          val nullAwareJoinConds = inConditions.map(c => Or(c, IsNull(c)))
--- End diff --

Thanks for working on this! 

But I'm not sure this handles expressions like the following correctly:
```Not(And/Or(InSubquery, otherExpressions*))```
or
```Not(Not(InSubquery))``` 

As I understand it, what we fundamentally want is to change the handling of 
the existing InSubquery case here, making the ExistenceJoin null-aware 
somehow, rather than adding a separate `Not(InSubquery(..))` case. Right?
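
To make the concern concrete, here is a minimal sketch on a toy expression
tree. The case classes only mimic the shape of the expressions named in the
diff; they are not the Catalyst classes, and `NullAwareAntiJoin`,
`PlainExistenceJoin`, and the simplified `transformDown` are hypothetical
markers showing which branch of the rule fires.

```scala
// Toy expression tree; these case classes only mimic the shape of the
// Catalyst expressions named in the diff and are not Spark classes.
object NotInPatternSketch {
  sealed trait Expr
  case class InSubquery(name: String) extends Expr
  case class Other(name: String) extends Expr
  case class Not(child: Expr) extends Expr
  case class And(left: Expr, right: Expr) extends Expr
  case class Or(left: Expr, right: Expr) extends Expr
  // Hypothetical markers recording which rewrite branch fired.
  case class NullAwareAntiJoin(name: String) extends Expr
  case class PlainExistenceJoin(name: String) extends Expr

  // Pre-order rewrite, loosely analogous to transformDown: try the rule on
  // the node first, then recurse into the children of the result.
  def transformDown(e: Expr)(rule: PartialFunction[Expr, Expr]): Expr = {
    val applied = rule.applyOrElse(e, (x: Expr) => x)
    applied match {
      case Not(c) => Not(transformDown(c)(rule))
      case And(l, r) => And(transformDown(l)(rule), transformDown(r)(rule))
      case Or(l, r) => Or(transformDown(l)(rule), transformDown(r)(rule))
      case leaf => leaf
    }
  }

  // The special case only fires when Not is the direct parent of InSubquery.
  val rule: PartialFunction[Expr, Expr] = {
    case Not(InSubquery(n)) => NullAwareAntiJoin(n)
    case InSubquery(n) => PlainExistenceJoin(n)
  }

  def rewrite(e: Expr): Expr = transformDown(e)(rule)

  def main(args: Array[String]): Unit = {
    println(rewrite(Not(InSubquery("s"))))
    println(rewrite(Not(Or(InSubquery("s"), Other("p")))))
    println(rewrite(Not(Not(InSubquery("s")))))
  }
}
```

In this toy model, `Not(Or(InSubquery, ...))` falls through to the plain
InSubquery branch, and `Not(Not(InSubquery))` applies the special case to the
inner pair while leaving the outer `Not` in place. Whether either outcome
yields wrong results in Spark is the open question raised above; the sketch
only shows that a syntactic `Not(InSubquery(..))` pattern does not cover these
shapes uniformly.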

