GitHub user viirya opened a pull request:
https://github.com/apache/spark/pull/22524
[SPARK-25497][SQL] Limit operation within whole stage codegen should not consume all the inputs
## What changes were proposed in this pull request?
This issue was discovered during https://github.com/apache/spark/pull/21738.
It turns out that limit is not whole-stage-codegened correctly and always
consumes all the inputs.
This patch fixes limit's whole-stage codegen. Some nodes, such as hash
aggregate and range, generate loop structures that don't properly check the
condition to stop early. They are fixed so that they stop consuming inputs
once the limit number is reached, as illustrated by the sketch below.
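For intuition only, here is a minimal Scala sketch (with hypothetical names, not Spark's actual generated Java code) of a consume loop that re-checks a stop flag on every iteration, so it stops pulling rows once the limit is reached instead of draining the whole input:

```scala
// Illustrative sketch only: a produce/consume loop that honors an
// early-stop flag instead of consuming every input row.
object LimitLoopSketch {
  def main(args: Array[String]): Unit = {
    val input = Iterator.from(1)          // conceptually unbounded input
    val limit = 5
    var count = 0
    var stopEarly = false                 // flipped once the limit is reached

    val out = scala.collection.mutable.ArrayBuffer.empty[Int]
    while (!stopEarly && input.hasNext) { // re-check the flag every iteration
      val row = input.next()
      out += row
      count += 1
      if (count >= limit) stopEarly = true
    }
    println(out.mkString(", "))           // prints: 1, 2, 3, 4, 5
  }
}
```

The fix described in the PR applies the same idea to the loops generated for nodes like hash aggregate and range: the loop condition must include the early-stop check, not just input availability.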
## How was this patch tested?
Added tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/viirya/spark-1 SPARK-25497
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22524.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22524
----
commit 12703bded143002be417ffa247eef4a970ffd54c
Author: Liang-Chi Hsieh <viirya@...>
Date: 2018-09-22T09:34:41Z
limit operation within whole stage codegen should not consume all the inputs.
----