[GitHub] spark pull request #19324: [SPARK-22103] Move HashAggregateExec parent consu...

juliuszsompolski Fri, 22 Sep 2017 08:40:57 -0700

GitHub user juliuszsompolski opened a pull request:

    https://github.com/apache/spark/pull/19324


    [SPARK-22103] Move HashAggregateExec parent consume to a separate function in codegen

    ## What changes were proposed in this pull request?
    
    HashAggregateExec codegen uses two code paths: one for the fast hash table and a generic one.
    It generates a loop iterating over each of them, and both loops generate the consume code of the parent operator, so that code is expanded twice.
    This leads to a long generated function that might be an issue for the compiler (see e.g. SPARK-21603).
    I propose to remove the double expansion by generating the consume code in a helper function that can simply be called from both iterating loops.
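A minimal sketch of the idea, assuming a toy generator written in Python that emits Java-like source as strings. The names `CONSUME_BODY`, `gen_loop`, `consumeRow`, `fastMap`, `genericMap` and `sink` are hypothetical; this is not Spark's actual generated code.

```python
# Toy model of the codegen change, emitting Java-like source as strings.
# All names here are hypothetical stand-ins, not Spark's generated code.

# Stand-in for the parent operator's consume code for one aggregated row.
CONSUME_BODY = "if (key > 10) {\n  sink.append(key, value);\n}"


def indent(code: str, spaces: int) -> str:
    """Prefix every line of `code` with `spaces` spaces."""
    pad = " " * spaces
    return "".join(pad + line + "\n" for line in code.splitlines())


def gen_loop(map_name: str, body: str) -> str:
    """Emit a loop over one hash map that runs `body` for each entry."""
    return (
        f"while ({map_name}.next()) {{\n"
        + indent(f"int key = {map_name}.getKey();", 2)
        + indent(f"long value = {map_name}.getValue();", 2)
        + indent(body, 2)
        + "}\n"
    )


def gen_inlined() -> str:
    # Before: the parent's consume code is expanded inside both loops.
    return gen_loop("fastMap", CONSUME_BODY) + gen_loop("genericMap", CONSUME_BODY)


def gen_with_helper() -> str:
    # After: the consume code is emitted once, in a helper method,
    # and each loop only contains a call to that helper.
    helper = (
        "private void consumeRow(int key, long value) {\n"
        + indent(CONSUME_BODY, 2)
        + "}\n"
    )
    call = "consumeRow(key, value);"
    return helper + gen_loop("fastMap", call) + gen_loop("genericMap", call)
```

Printing `gen_with_helper()` shows the helper defined once, followed by two loops that each contain only the call. For a tiny consume body like this one the helper wrapper can cost more than it saves; the benefit appears when the parent's consume code is large, because only the one-line call is duplicated.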
    
    An issue with separating the `consume` code into a helper function was that a number of places relied on being in the scope of an outer `produce` loop, e.g. using `continue` to jump out of it.
    I replaced such control flows with nested scopes. The compiler should handle this code the same way, and it removes the dependency on assumptions from outside of `consume`'s own scope.
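A rough analogue of the `continue` problem, sketched in Python as a stand-in for the generated Java (an assumption for illustration; Java likewise rejects `continue` outside a loop at compile time). The function names and the `value > 10` filter are hypothetical. The sketch checks that the nested-scope helper keeps the same rows as the inlined `continue` version, and that a helper body using `continue` does not compile.

```python
# Toy model of the control-flow change. Rows are (key, value) pairs and the
# parent operator keeps only rows with value > 10. All names are hypothetical.


def produce_with_continue(rows):
    """Old style: consume code inlined in the loop, skipping rows via `continue`."""
    out = []
    for key, value in rows:
        if value <= 10:
            continue  # only valid because we are inside the enclosing loop
        out.append(key)
    return out


def consume_row(key, value, out):
    """New style: the same check as a nested scope, with no `continue`."""
    if value > 10:
        out.append(key)


def produce_with_helper(rows):
    """Loop that delegates each row to the standalone helper."""
    out = []
    for key, value in rows:
        consume_row(key, value, out)
    return out


def continue_compiles_outside_loop() -> bool:
    """Check whether a helper body that uses `continue` compiles at all."""
    try:
        compile("def consume_row():\n    continue\n", "<generated>", "exec")
        return True
    except SyntaxError:
        return False
```

For example, with rows `[(1, 5), (2, 20), (3, 7), (4, 42)]` both producers keep keys `2` and `4`, while `continue_compiles_outside_loop()` returns `False`.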
    
    ## How was this patch tested?
    
    Existing test coverage.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/juliuszsompolski/apache-spark aggrconsumecodegen

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19324.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19324
    
----
commit ca64368b369e2afa6b49e4cdcfa0a3d80704cadb
Author: Juliusz Sompolski <ju...@databricks.com>
Date:   2017-09-18T15:53:06Z

    Move HashAggregateExec parent consume to a separate function in codegen

----


