GitHub user jinxing64 opened a pull request:

    https://github.com/apache/spark/pull/21252

    [SPARK-24193] Sort by disk when the limit is large in 
TakeOrderedAndProjectExec

    ## What changes were proposed in this pull request?
    
    The physical plan of `select colA from t order by colB limit M` is 
`TakeOrderedAndProject`.
    Currently `TakeOrderedAndProject` sorts the data in memory; see 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala#L158
 
    Shall we add a config so that, when the limit (M) is too large, we 
sort on disk instead? That would avoid the memory issue.
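
    The idea can be sketched outside Spark as follows. This is a minimal,
language-agnostic illustration in Python, not the actual
`TakeOrderedAndProjectExec` implementation: the `SPILL_THRESHOLD` constant and
the `run_size` parameter are hypothetical stand-ins for whatever config entry
the patch would introduce.

```python
# Sketch (NOT Spark code): take the top-M rows in memory for small M,
# but spill sorted runs to disk and merge them when M exceeds a threshold.
import heapq
import os
import pickle
import tempfile

SPILL_THRESHOLD = 1000  # hypothetical config value


def take_ordered_in_memory(rows, m, key):
    """Small M: bounded in-memory selection, as the current code path does."""
    return heapq.nsmallest(m, rows, key=key)


def take_ordered_via_disk(rows, m, key, run_size=10_000):
    """Large M: write sorted runs to temp files, then k-way merge and take M."""
    run_files = []
    buf = []

    def flush():
        # Sort the buffered rows and spill them to a temporary file.
        if not buf:
            return
        buf.sort(key=key)
        f = tempfile.NamedTemporaryFile(delete=False)
        pickle.dump(buf, f)
        f.close()
        run_files.append(f.name)
        buf.clear()

    for row in rows:
        buf.append(row)
        if len(buf) >= run_size:
            flush()
    flush()

    # Reload the sorted runs and merge them lazily, keeping only M rows.
    runs = []
    for name in run_files:
        with open(name, "rb") as f:
            runs.append(pickle.load(f))
        os.unlink(name)
    merged = heapq.merge(*runs, key=key)
    total = sum(len(r) for r in runs)
    return [next(merged) for _ in range(min(m, total))]


def take_ordered(rows, m, key=lambda r: r):
    """Dispatch on M, mirroring the proposed config-driven fallback."""
    if m <= SPILL_THRESHOLD:
        return take_ordered_in_memory(rows, m, key)
    return take_ordered_via_disk(rows, m, key)
```

    In Spark itself the disk path would presumably reuse an external sorter
rather than hand-rolled temp files, but the dispatch-on-M shape is the same.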
    
    ## How was this patch tested?
    
    No tests added yet.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jinxing64/spark SPARK-24193

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21252.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21252
    
----
commit 522c374f2f757b384132b3227e82e9688b2c9ffd
Author: jinxing <jinxing6042@...>
Date:   2018-05-05T04:40:42Z

    [SPARK-24193] Sort by disk when number of limit is big in 
TakeOrderedAndProjectExec

----

