GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/22698

    [SPARK-25710][SQL] range should report metrics correctly

    ## What changes were proposed in this pull request?
    
    Currently `Range` reports metrics at batch granularity. This is acceptable,
    but it would be better to report them at row granularity, provided that
    incurs no performance penalty.
    
    Before this PR, the metrics are updated when preparing the batch, which is
    before we actually consume the data. In this PR, the metrics are updated
    after the data are consumed. There are 2 different cases:
    1. The data processing loop has a stop check. The metrics are updated when
       we need to stop.
    2. The data processing loop has no stop check. The metrics are updated
       after the loop.
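
The two cases above can be illustrated with a small stand-alone simulation. This is only a sketch of the idea, not the code generated by Spark: `Metric`, `BATCH_SIZE`, `range_batch_granularity`, `range_row_granularity` and the `should_stop` callback are all hypothetical names invented for the illustration, and the batch size of 1000 is an assumption.

```python
from dataclasses import dataclass

BATCH_SIZE = 1000  # hypothetical batch size, chosen only for this sketch


@dataclass
class Metric:
    """Stand-in for a row-count metric (hypothetical, not a Spark class)."""
    value: int = 0

    def add(self, n: int) -> None:
        self.value += n


def range_batch_granularity(start, end, metric, should_stop):
    """Behaviour described as 'before this PR': count a whole batch up front."""
    produced = []
    pos = start
    while pos < end:
        batch_end = min(pos + BATCH_SIZE, end)
        metric.add(batch_end - pos)  # updated while preparing the batch
        for i in range(pos, batch_end):
            produced.append(i)
            if should_stop(produced):
                return produced  # metric already includes unconsumed rows
        pos = batch_end
    return produced


def range_row_granularity(start, end, metric, should_stop=None):
    """Behaviour described as 'in this PR': count rows after consuming them."""
    produced = []
    pos = start
    while pos < end:
        batch_end = min(pos + BATCH_SIZE, end)
        if should_stop is None:
            # Case 2: no stop check, so one update after the inner loop.
            for i in range(pos, batch_end):
                produced.append(i)
            metric.add(batch_end - pos)
        else:
            # Case 1: stop check present, so update when we need to stop.
            consumed = 0
            for i in range(pos, batch_end):
                produced.append(i)
                consumed += 1
                if should_stop(produced):
                    metric.add(consumed)
                    return produced
            metric.add(consumed)
        pos = batch_end
    return produced


def stop_after_10(rows):
    # Simulates a downstream consumer that only needs 10 rows.
    return len(rows) >= 10


old, new = Metric(), Metric()
range_batch_granularity(0, 5000, old, stop_after_10)
range_row_granularity(0, 5000, new, stop_after_10)
print(old.value, new.value)  # 1000 10
```

In this simulation the batch-granularity version reports 1000 rows although only 10 were consumed, while the row-granularity version reports 10. Neither version updates the metric per row inside the hot loop, which is the property the description relies on to avoid a performance penalty.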
    
    ## How was this patch tested?
    
    Existing tests and a new benchmark.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark range

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22698.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22698
    
----
commit 1071c14a1fac551a3dc8025f7b7da1a89892d05a
Author: Wenchen Fan <wenchen@...>
Date:   2018-10-11T14:12:28Z

    range should report metrics correctly

----

