GitHub user sddyljsx opened a pull request:

    https://github.com/apache/spark/pull/21859

    [SPARK-24900][SQL]speed up sort when the dataset is small

    ## What changes were proposed in this pull request?
    
    When running a SQL query such as 'select * from order where order_status = 4 
order by order_id', the file scan and filter are executed twice (once to sample 
the data for range partitioning and once for the sort itself), which may take a 
long time. If the final dataset is small and the sample data covers all the 
data, the second pass is unnecessary.
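
As a rough illustration of the idea (not the code in this patch), the sketch below 
models a range-partitioned sort in plain Python. `CountingSource` and 
`sort_small_aware` are hypothetical names, the single global sample is a 
simplification of how Spark samples per partition, and `sample_size` is assumed 
to be at least 1.

```python
import random


class CountingSource:
    """Hypothetical stand-in for a file scan plus filter; counts full passes."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.scans = 0

    def scan(self):
        self.scans += 1
        return list(self.rows)


def sort_small_aware(source, sample_size, num_partitions=2):
    # Pass 1: read the input in order to sample it for range boundaries.
    scanned = source.scan()
    if len(scanned) <= sample_size:
        # The sample would cover every row, so sort it directly and
        # skip the second pass over the source.
        return sorted(scanned)

    sample = sorted(random.sample(scanned, sample_size))
    # Pick num_partitions - 1 boundaries from the sorted sample.
    step = len(sample) / num_partitions
    bounds = [sample[int(step * i)] for i in range(1, num_partitions)]

    # Pass 2: rescan the source and route each row to its range partition.
    partitions = [[] for _ in range(num_partitions)]
    for row in source.scan():
        idx = sum(1 for b in bounds if row > b)
        partitions[idx].append(row)

    # Partitions are ordered by range, so sorting each one and
    # concatenating them yields a globally sorted result.
    return [row for part in partitions for row in sorted(part)]
```

With a source smaller than the sample size, `source.scans` stays at 1 after the 
call; with a larger source it reaches 2, which is the repeated scan and filter 
described above.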
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
    (If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)
    
    Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sddyljsx/spark order-optimization

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21859.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21859
    
----
commit dd50783d638ca5804531061c0a8aef2c8fef9dc1
Author: neal <neal_song@...>
Date:   2018-07-24T07:26:58Z

    speed up sort when the dataset is small

----

