GitHub user KaiXinXiaoLei opened a pull request:

    https://github.com/apache/spark/pull/20670

    add constranits

    ## What changes were proposed in this pull request?
    
    I ran the following SQL: `select ls.cs_order_number from ls left semi join
    catalog_sales cs on ls.cs_order_number = cs.cs_order_number`. The `ls` table
    is small, with a single row. The `catalog_sales` table is large, with 10
    billion rows. The task hangs. I found that `cs_order_number` has many null
    values in the `catalog_sales` table. I think rows with a null
    `cs_order_number` should be filtered out in the logical plan, since a null
    key can never satisfy the equality join condition.
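
To illustrate why dropping the null keys does not change the result, here is a minimal sketch. It uses Python's built-in `sqlite3` module instead of Spark, and a tiny made-up data set, so it only demonstrates the join semantics and not the actual optimizer change in this patch. SQLite has no `LEFT SEMI JOIN` keyword, so `EXISTS` is used as an equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ls (cs_order_number INTEGER);
    CREATE TABLE catalog_sales (cs_order_number INTEGER);
    INSERT INTO ls VALUES (1);
    -- Hypothetical sample rows: two real keys and three null keys.
    INSERT INTO catalog_sales VALUES (1), (2), (NULL), (NULL), (NULL);
    """
)

# Semi join expressed with EXISTS (stand-in for LEFT SEMI JOIN).
original = conn.execute(
    """
    SELECT ls.cs_order_number FROM ls
    WHERE EXISTS (SELECT 1 FROM catalog_sales cs
                  WHERE ls.cs_order_number = cs.cs_order_number)
    """
).fetchall()

# Same query, but null keys are filtered out of the big table first.
null_filtered = conn.execute(
    """
    SELECT ls.cs_order_number FROM ls
    WHERE EXISTS (SELECT 1
                  FROM (SELECT cs_order_number FROM catalog_sales
                        WHERE cs_order_number IS NOT NULL) cs
                  WHERE ls.cs_order_number = cs.cs_order_number)
    """
).fetchall()

# Number of rows the filter removes before the join runs.
dropped = conn.execute(
    "SELECT COUNT(*) FROM catalog_sales WHERE cs_order_number IS NULL"
).fetchone()[0]

print(original, null_filtered, dropped)
```

Both queries return the same rows, while the filtered variant feeds fewer rows into the join. In the sample data, three of the five `catalog_sales` rows are removed up front.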
    
    ## How was this patch tested?
    
    
    
    Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/KaiXinXiaoLei/spark Spark-23405

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20670.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20670
    
----
commit 705ed462bb307871e65199ce02576f12d60d2176
Author: KaiXinXiaoLei <584620569@...>
Date:   2018-02-25T06:06:39Z

    add constranits

----

