[GitHub] [spark] wangyum commented on a change in pull request #28642: [SPARK-31809][SQL] Infer IsNotNull for non null intolerant child of null intolerant in join condition

GitBox Thu, 04 Jun 2020 23:59:12 -0700


wangyum commented on a change in pull request #28642:
URL: https://github.com/apache/spark/pull/28642#discussion_r435725267




##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InferFiltersFromConstraintsSuite.scala
##########
@@ -316,4 +316,19 @@ class InferFiltersFromConstraintsSuite extends PlanTest {
         condition)
     }
   }
+
+  test("Infer IsNotNull for non null-intolerant child of null intolerant join condition") {
+    testConstraintsAfterJoin(
+      testRelation.subquery('left),
+      testRelation.subquery('right),
+      testRelation.where(IsNotNull(Coalesce(Seq('a, 'b)))).subquery('left),

Review comment:
```
hive> EXPLAIN SELECT t1.* FROM t1 JOIN t2 ON coalesce(t1.a, t1.b)=t2.a;
OK
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        $hdt$_0:t1
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        $hdt$_0:t1
          TableScan
            alias: t1
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Filter Operator
              predicate: COALESCE(a,b) is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Select Operator
                expressions: a (type: string), b (type: string), c (type: string)
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                HashTable Sink Operator
                  keys:
                    0 COALESCE(_col0,_col1) (type: string)
                    1 _col0 (type: string)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: t2
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Filter Operator
              predicate: a is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Select Operator
                expressions: a (type: string)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                Map Join Operator
                  condition map:
                       Inner Join 0 to 1
                  keys:
                    0 COALESCE(_col0,_col1) (type: string)
                    1 _col0 (type: string)
                  outputColumnNames: _col0, _col1, _col2
                  Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Execution mode: vectorized
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
```
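In the plan above, Hive pushes `COALESCE(a,b) is not null` below the join on the `t1` side and `a is not null` on the `t2` side. The inference can be sketched roughly as follows. This is a toy model under stated assumptions, not Catalyst's actual constraint code. `Expr`, `Attr`, `Coalesce`, `Add`, `EqualTo`, `IsNotNull` and `InferNotNull` are hypothetical stand-ins. The rule walks down from a null-intolerant join condition, infers `IsNotNull` for every attribute reached through null-intolerant expressions, and stops at the first non-null-intolerant child (such as `Coalesce`), inferring `IsNotNull` on that child as a whole.

```scala
// Hypothetical toy expression tree: these are NOT Catalyst's classes.
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Coalesce(children: Seq[Expr]) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr
final case class EqualTo(left: Expr, right: Expr) extends Expr
final case class IsNotNull(child: Expr) extends Expr

object InferNotNull {
  // Null-intolerant: any null input yields a null output.
  // Coalesce is not null-intolerant: coalesce(null, x) returns x.
  def isNullIntolerant(e: Expr): Boolean = e match {
    case _: EqualTo | _: Add => true
    case _                   => false
  }

  def children(e: Expr): Seq[Expr] = e match {
    case Attr(_)       => Nil
    case Coalesce(cs)  => cs
    case Add(l, r)     => Seq(l, r)
    case EqualTo(l, r) => Seq(l, r)
    case IsNotNull(c)  => Seq(c)
  }

  // For a null-intolerant condition that must evaluate to true (for example an
  // inner-join condition), every child must be non-null.
  def infer(cond: Expr): Set[Expr] =
    if (isNullIntolerant(cond)) children(cond).flatMap(visit).toSet
    else Set.empty

  private def visit(e: Expr): Set[Expr] = e match {
    // An attribute reached through null-intolerant parents cannot be null.
    case a: Attr => Set(IsNotNull(a))
    // Keep descending through null-intolerant expressions.
    case other if isNullIntolerant(other) => children(other).flatMap(visit).toSet
    // A non-null-intolerant child (e.g. Coalesce): infer IsNotNull on it whole.
    case other => Set(IsNotNull(other))
  }
}
```

For the condition `EqualTo(Coalesce(Seq(Attr("t1.a"), Attr("t1.b"))), Attr("t2.a"))`, the sketch yields `IsNotNull(Coalesce(...))` and `IsNotNull(Attr("t2.a"))`, which mirrors the two `Filter Operator` predicates in the Hive plan.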



