GitHub user viirya opened a pull request:
https://github.com/apache/spark/pull/22344
[SPARK-25352][SQL] Perform ordered global limit when limit number is bigger
than topKSortFallbackThreshold
## What changes were proposed in this pull request?
We have optimization on global limit to evenly distribute limit rows across
all partitions. This optimization doesn't work for ordered results.
For a query ending with sort + limit, in most cases it is performed by
`TakeOrderedAndProjectExec`.
But if limit number is bigger than `SQLConf.TOP_K_SORT_FALLBACK_THRESHOLD`,
global limit will be used. At this moment, we need to do ordered global limit.
## How was this patch tested?
Unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/viirya/spark-1 SPARK-25352
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22344.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22344
----
commit 8d49c1afdbd6c0219d6cc182e53311201f73489f
Author: Liang-Chi Hsieh <viirya@...>
Date: 2018-09-06T04:43:15Z
Do ordered global limit when limit number is bigger than
topKSortFallbackThreshold.
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]