GitHub user viirya opened a pull request:
https://github.com/apache/spark/pull/22524
[SPARK-25497][SQL] Limit operation within whole stage codegen should not consume all the inputs
## What changes were proposed in this pull request?
This issue was discovered during https://github.com/apache/spark/pull/21738.
It turns out that limit is not whole-stage-codegened correctly and always
consumes all the inputs.
This patch fixes limit's whole-stage codegen. Some nodes, such as hash
aggregate and range, generate loop structures that don't properly check the
condition to stop early. They are fixed so that they stop consuming inputs
once the limit number is reached, as illustrated by the sketch below.
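For intuition only, here is a minimal Scala sketch (with hypothetical names, not Spark's actual generated Java code) of a consume loop that re-checks a stop flag on every iteration, so it stops pulling rows once the limit is reached instead of draining the whole input:

```scala
// Illustrative sketch only: a produce/consume loop that honors an
// early-stop flag instead of consuming every input row.
object LimitLoopSketch {
  def main(args: Array[String]): Unit = {
    val input = Iterator.from(1)          // conceptually unbounded input
    val limit = 5
    var count = 0
    var stopEarly = false                 // flipped once the limit is reached

    val out = scala.collection.mutable.ArrayBuffer.empty[Int]
    while (!stopEarly && input.hasNext) { // re-check the flag every iteration
      val row = input.next()
      out += row
      count += 1
      if (count >= limit) stopEarly = true
    }
    println(out.mkString(", "))           // prints: 1, 2, 3, 4, 5
  }
}
```

The fix described in the PR applies the same idea to the loops generated for nodes like hash aggregate and range: the loop condition must include the early-stop check, not just input availability.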
## How was this patch tested?
Added tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/viirya/spark-1 SPARK-25497
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22524.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22524
----
commit 12703bded143002be417ffa247eef4a970ffd54c
Author: Liang-Chi Hsieh <viirya@...>
Date: 2018-09-22T09:34:41Z
limit operation within whole stage codegen should not consume all the inputs.
----