GitHub user KaiXinXiaoLei opened a pull request:
https://github.com/apache/spark/pull/20670
add constraints
## What changes were proposed in this pull request?
I ran the following SQL: `select ls.cs_order_number from ls left semi join
catalog_sales cs on ls.cs_order_number = cs.cs_order_number`. The `ls` table is
small and contains one row. The `catalog_sales` table is large and contains
10 billion rows. The task hangs. I found that many values of `cs_order_number`
in the `catalog_sales` table are null. Because a null key never satisfies the
`=` join condition, those rows can never match, so I think they should be
filtered out in the logical plan (e.g. by adding an is-not-null constraint on
the join key) before the join runs.
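To make the reasoning concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for Spark SQL. It is an illustration under stated assumptions, not the patch itself: SQLite has no `LEFT SEMI JOIN` keyword, so the semi join is written with `EXISTS`, and the table contents are hypothetical toy data. It checks that adding an `IS NOT NULL` filter on the join key leaves the result unchanged while shrinking the big side from 1003 rows to 3. It does not show how Spark's optimizer would infer that constraint.

```python
import sqlite3

# Hypothetical toy data standing in for the tables in the report.
# `ls` has one real key plus a NULL; `catalog_sales` has a few keys and many NULLs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ls (cs_order_number INTEGER)")
conn.execute("CREATE TABLE catalog_sales (cs_order_number INTEGER)")
conn.executemany("INSERT INTO ls VALUES (?)", [(7,), (None,)])
conn.executemany(
    "INSERT INTO catalog_sales VALUES (?)",
    [(7,), (8,), (9,)] + [(None,)] * 1000,
)

# SQLite has no LEFT SEMI JOIN keyword, so the semi join is expressed with EXISTS.
original = """
SELECT ls.cs_order_number FROM ls
WHERE EXISTS (
  SELECT 1 FROM catalog_sales cs
  WHERE ls.cs_order_number = cs.cs_order_number)
"""

# Same query with an IS NOT NULL filter pushed onto the big side's join key.
rewritten = """
SELECT ls.cs_order_number FROM ls
WHERE EXISTS (
  SELECT 1 FROM (SELECT * FROM catalog_sales
                 WHERE cs_order_number IS NOT NULL) cs
  WHERE ls.cs_order_number = cs.cs_order_number)
"""

before = conn.execute(original).fetchall()
after = conn.execute(rewritten).fetchall()

# How many big-side rows the join has to consider with and without the filter.
rows_without_filter = conn.execute(
    "SELECT COUNT(*) FROM catalog_sales"
).fetchone()[0]
rows_with_filter = conn.execute(
    "SELECT COUNT(*) FROM catalog_sales WHERE cs_order_number IS NOT NULL"
).fetchone()[0]

print(before, after)                           # [(7,)] [(7,)]
print(rows_without_filter, rows_with_filter)   # 1003 3
```

The NULL row in `ls` is dropped by both queries, because `NULL = x` is never true. Filtering null keys on either side of this equi-join therefore cannot change the semi-join result.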
## How was this patch tested?
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/KaiXinXiaoLei/spark Spark-23405
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20670.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20670
----
commit 705ed462bb307871e65199ce02576f12d60d2176
Author: KaiXinXiaoLei <584620569@...>
Date: 2018-02-25T06:06:39Z
add constranits
----