GitHub user jinxing64 opened a pull request:
https://github.com/apache/spark/pull/21252
[SPARK-24193] Sort by disk when number of limit is big in
TakeOrderedAndProjectExec
## What changes were proposed in this pull request?
The physical plan of `select colA from t order by colB limit M` is
`TakeOrderedAndProject`.
Currently `TakeOrderedAndProject` sorts data in memory, see
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala#L158
Shall we add a config so that, when the limit (M) is too big, we fall
back to a disk-based sort? That way the memory issue can be resolved.
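The idea above can be sketched in plain Scala. This is a minimal, self-contained illustration, not Spark's actual implementation: `sortFallbackThreshold` stands in for the proposed (hypothetical at this point) config, the small-limit branch mimics an in-memory `takeOrdered` with a bounded max-heap, and the large-limit branch stands in for a full sort, which in Spark would be an external, disk-spilling sort followed by a limit.

```scala
object TopKSketch {
  // Hypothetical threshold standing in for the proposed config.
  val sortFallbackThreshold = 1000

  /** Returns the smallest `limit` elements of `data` in ascending order. */
  def takeOrdered(data: Seq[Int], limit: Int): Seq[Int] = {
    if (limit < sortFallbackThreshold) {
      // Small limit: keep a bounded max-heap of size `limit` in memory,
      // so memory use is O(limit) regardless of input size.
      val heap = scala.collection.mutable.PriorityQueue.empty[Int] // max-heap
      data.foreach { x =>
        if (heap.size < limit) heap.enqueue(x)
        else if (x < heap.head) { heap.dequeue(); heap.enqueue(x) }
      }
      heap.dequeueAll.reverse
    } else {
      // Large limit: the bounded heap would hold M elements in memory anyway,
      // so fall back to a full sort (in Spark: an external sort that can
      // spill to disk) followed by a limit.
      data.sorted.take(limit)
    }
  }
}
```

With a small `limit`, only the heap branch runs; once `limit` crosses the threshold, the sort branch takes over, which is where a disk-backed sort would bound memory usage.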
## How was this patch tested?
Tests have not been added yet.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jinxing64/spark SPARK-24193
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21252.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21252
----
commit 522c374f2f757b384132b3227e82e9688b2c9ffd
Author: jinxing <jinxing6042@...>
Date: 2018-05-05T04:40:42Z
[SPARK-24193] Sort by disk when number of limit is big in
TakeOrderedAndProjectExec
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]