GitHub user juliuszsompolski opened a pull request: https://github.com/apache/spark/pull/19324
[SPARK-22103] Move HashAggregateExec parent consume to a separate function in codegen

## What changes were proposed in this pull request?

HashAggregateExec codegen uses two paths: one for the fast hash table and a generic one. It generates code for iterating over both, and both iteration loops generate the consume code of the parent operator, so that code is expanded twice. This leads to a long generated function that might be an issue for the compiler (see e.g. SPARK-21603). I propose to remove the double expansion by generating the consume code in a helper function that can simply be called from both iterating loops.

An issue with separating the `consume` code into a helper function was that a number of places relied on being in the scope of an enclosing `produce` loop and, e.g., used `continue` to jump out. I replaced such code flows with nested scopes. The compiler should handle the resulting code the same way, and the code no longer depends on assumptions that are outside of the `consume`'s own scope.

## How was this patch tested?

Existing test coverage.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/juliuszsompolski/apache-spark aggrconsumecodegen

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19324.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19324

----
commit ca64368b369e2afa6b49e4cdcfa0a3d80704cadb
Author: Juliusz Sompolski <ju...@databricks.com>
Date: 2017-09-18T15:53:06Z

    Move HashAggregateExec parent consume to a separate function in codegen
----
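
For illustration only, the two ideas in the description (calling one `consume` helper from both iteration loops, and replacing `continue` with a nested scope) can be sketched in plain Java. This is a hand-written toy, not the code Spark generates: the class `ConsumeSketch`, the methods `inlined`, `consume` and `viaHelper`, the `int[]` stand-ins for the two hash maps, and the "filter negatives, then double" parent logic are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class ConsumeSketch {

    // Before: the parent's consume logic is expanded inside both loops.
    // The filter uses `continue`, which is only legal because the code
    // happens to sit directly inside an enclosing loop.
    static List<Integer> inlined(int[] fastMap, int[] regularMap) {
        List<Integer> out = new ArrayList<>();
        for (int v : fastMap) {
            if (v < 0) continue;   // depends on the enclosing loop
            out.add(v * 2);
        }
        for (int v : regularMap) {
            if (v < 0) continue;   // same logic, expanded a second time
            out.add(v * 2);
        }
        return out;
    }

    // After: the consume logic lives in one helper. There is no enclosing
    // loop here, so the `continue` becomes a nested scope guarded by the
    // negated condition.
    static void consume(int v, List<Integer> out) {
        if (!(v < 0)) {
            out.add(v * 2);
        }
    }

    // Both iteration loops now just call the helper.
    static List<Integer> viaHelper(int[] fastMap, int[] regularMap) {
        List<Integer> out = new ArrayList<>();
        for (int v : fastMap) consume(v, out);
        for (int v : regularMap) consume(v, out);
        return out;
    }

    public static void main(String[] args) {
        int[] fast = {1, -2, 3};
        int[] regular = {-4, 5};
        System.out.println(inlined(fast, regular));   // [2, 6, 10]
        System.out.println(viaHelper(fast, regular)); // [2, 6, 10]
    }
}
```

Both variants produce the same rows; the second keeps each loop body to a single call, which is the property the PR relies on to shorten the generated function.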