GitHub user sddyljsx opened a pull request:
https://github.com/apache/spark/pull/21859
[SPARK-24900][SQL] Speed up sort when the dataset is small
## What changes were proposed in this pull request?
When running a query such as 'select * from order where order_status = 4
order by order_id', the file scan and filter are executed twice: once to
sample the data for range partitioning and once for the sort itself. This
may take a long time. If the final dataset is small and the sample already
covers all of the data, the second pass is unnecessary.
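The idea can be illustrated with a minimal plain-Python sketch. It is not the code in this patch and does not use Spark APIs; `CountingSource`, `sample_rows`, and `sort_with_sampling` are hypothetical names, and the sampling is deliberately simplified. It only shows that when the sample holds every row, the sorted sample is already the final result and the second scan can be skipped.

```python
import bisect
import random


class CountingSource:
    """Hypothetical stand-in for a file scan plus filter; counts its runs."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.scans = 0

    def scan(self):
        self.scans += 1
        return list(self.rows)


def sample_rows(source, sample_size, rng):
    """First pass: scan the input and keep at most sample_size rows."""
    rows = source.scan()
    if len(rows) <= sample_size:
        return rows, True  # the sample covers the whole dataset
    return rng.sample(rows, sample_size), False


def sort_with_sampling(source, sample_size=100, num_partitions=4, seed=0):
    rng = random.Random(seed)
    sample, covers_all = sample_rows(source, sample_size, rng)
    if covers_all:
        # Shortcut: the sample is the full result set, so sort it directly
        # and avoid scanning the source a second time.
        return sorted(sample)

    # General path: derive range bounds from the sample, then scan again
    # and route each row to the partition whose range contains it.
    ordered = sorted(sample)
    step = len(ordered) // num_partitions
    bounds = [ordered[i * step] for i in range(1, num_partitions)]
    partitions = [[] for _ in range(num_partitions)]
    for row in source.scan():  # second scan of the source
        partitions[bisect.bisect_right(bounds, row)].append(row)
    # Partitions cover increasing ranges, so concatenating the sorted
    # partitions yields a globally sorted result.
    return [row for part in partitions for row in sorted(part)]


small = CountingSource([5, 3, 1, 4])
small_result = sort_with_sampling(small, sample_size=10)

big_rows = list(range(1000))
random.Random(1).shuffle(big_rows)
big = CountingSource(big_rows)
big_result = sort_with_sampling(big, sample_size=100)
```

In this sketch the small input is scanned once and the large input twice, which mirrors the saving described above under the stated assumptions.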
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sddyljsx/spark order-optimization
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21859.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21859
----
commit dd50783d638ca5804531061c0a8aef2c8fef9dc1
Author: neal <neal_song@...>
Date: 2018-07-24T07:26:58Z
speed up sort when the dataset is small
----