GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/19393

    [SPARK-21644][SQL] LocalLimit.maxRows is defined incorrectly

    ## What changes were proposed in this pull request?
    The definition of `maxRows` in the `LocalLimit` operator was wrong. This 
patch introduces a new `maxRowsPerPartition` method and uses it in pruning. 
The patch also adds more documentation on why we need both a local limit and 
a global limit.
    
    Note that this has never caused a bug because of the way the code is 
structured, but future uses of `maxRows` could lead to bugs.
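
To illustrate why a local limit bounds rows per partition rather than rows overall, here is a minimal toy sketch in Python. It is not Spark code: partitions are modeled as plain lists, and the helper names `local_limit`, `global_limit`, and `total_rows` are hypothetical, chosen only for this example.

```python
from typing import List

Partition = List[int]


def local_limit(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy model: keep at most n rows in each partition independently."""
    return [p[:n] for p in partitions]


def global_limit(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy model: keep at most n rows across all partitions combined."""
    out: List[Partition] = []
    remaining = n
    for p in partitions:
        taken = p[:remaining]
        out.append(taken)
        remaining -= len(taken)
    return out


def total_rows(partitions: List[Partition]) -> int:
    """Count rows over all partitions."""
    return sum(len(p) for p in partitions)


# Three partitions of three rows each (made-up data).
partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

after_local = local_limit(partitions, 2)
after_global = global_limit(after_local, 2)

# The local limit caps each partition at 2 rows, so the total can exceed 2.
print(total_rows(after_local))
# Only the global limit caps the total at 2 rows.
print(total_rows(after_global))
```

In this model, treating the limit value as the maximum total row count after a local limit would understate the true bound, which is the kind of mistake a per-partition bound avoids.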
    
    ## How was this patch tested?
    Should be covered by existing test cases.
    
    Closes #18851 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark pr-18851

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19393.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19393
    
----
commit 9911176d901a6bdfadec3ef68b28e6dd2c82be9e
Author: Reynold Xin <r...@databricks.com>
Date:   2017-08-05T01:19:26Z

    [SPARK-21644][SQL] LocalLimit.maxRows is defined incorrectly

commit fb97a73cd9ad217f04992cd70a44d22b29dc6a9b
Author: gatorsmile <gatorsm...@gmail.com>
Date:   2017-09-29T21:34:05Z

    fix

----

