[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...

dilipbiswal Wed, 22 Aug 2018 06:35:28 -0700

Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22141#discussion_r211955605
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
    @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
           plan: LogicalPlan): (Option[Expression], LogicalPlan) = {
         var newPlan = plan
         val newExprs = exprs.map { e =>
    -      e transformUp {
    +      e transformDown {
             case Exists(sub, conditions, _) =>
               val exists = AttributeReference("exists", BooleanType, nullable = false)()
               // Deduplicate conflicting attributes if any.
               newPlan = dedupJoin(
                 Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And)))
               exists
    +        case (Not(InSubquery(values, ListQuery(sub, conditions, _, _)))) =>
    +          val exists = AttributeReference("exists", BooleanType, nullable = false)()
    +          val inConditions = values.zip(sub.output).map(EqualTo.tupled)
    +          val nullAwareJoinConds = inConditions.map(c => Or(c, IsNull(c)))
    --- End diff --
    
    @mgaido91 Thanks! I have actually been thinking about this for the last few days :-). We probably need a new optimizer rule that simplifies join conditions based on the constraints of the join's children. With such a rule we should be able to simplify
    
    ```SQL
    select * from t1 join t2 on (t1c1 = t2c1 OR isnull(t1c1 = t2c1)) where t1c1 is not null and t2c1 is not null
    ```
    to
    ```SQL
    select * from t1 join t2 on (t1c1 = t2c1) where t1c1 is not null and t2c1 is not null
    ```
    I would like to handle this as a follow-up.
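    
    To illustrate why this simplification should be safe, here is a minimal sketch that models SQL three-valued logic with `Option` (`None` stands for NULL). It is not Catalyst code: `sqlEq`, `sqlOr`, `sqlIsNull` and `nullAware` are hypothetical helpers written only for this example. When both inputs are non-null, `c OR isnull(c)` evaluates to the same value as `c`.
    
    ```scala
    object NullAwareJoinCondition {
      // SQL equality under three-valued logic; None models NULL.
      def sqlEq(a: Option[Int], b: Option[Int]): Option[Boolean] =
        for (x <- a; y <- b) yield x == y

      // SQL OR under three-valued logic.
      def sqlOr(l: Option[Boolean], r: Option[Boolean]): Option[Boolean] =
        (l, r) match {
          case (Some(true), _) | (_, Some(true)) => Some(true)
          case (Some(false), Some(false))        => Some(false)
          case _                                 => None
        }

      // isnull is never NULL itself: it is true exactly when its input is NULL.
      def sqlIsNull(v: Option[Boolean]): Option[Boolean] = Some(v.isEmpty)

      // The null-aware join condition from the diff: c OR isnull(c).
      def nullAware(a: Option[Int], b: Option[Int]): Option[Boolean] = {
        val c = sqlEq(a, b)
        sqlOr(c, sqlIsNull(c))
      }

      def main(args: Array[String]): Unit = {
        val nonNull = Seq(1, 2, 3).map(Option(_))
        // With both sides non-null, the two conditions agree on every pair.
        val agree =
          for (a <- nonNull; b <- nonNull) yield nullAware(a, b) == sqlEq(a, b)
        assert(agree.forall(identity))
        // With a NULL input they differ, so the non-null constraints are required.
        assert(nullAware(None, Some(1)) == Some(true))
        assert(sqlEq(None, Some(1)).isEmpty)
      }
    }
    ```
    
    The last two assertions show why the rewrite depends on the `is not null` constraints: with a NULL input the null-aware condition is true while plain equality is NULL.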



