GitHub user nongli opened a pull request:
https://github.com/apache/spark/pull/9480
[SPARK-10371] [SQL] Implement subexpr elimination for UnsafeProjections
This patch adds the building blocks for code-generating subexpression
elimination and implements it end to end for UnsafeProjection. The building
blocks can be reused to do the same thing for other operators.
It introduces utilities for computing common subexpressions. Expressions are
added to a data structure; each expression and its children are recursively
matched against the expressions added previously, and equivalent expressions
are grouped together. Matching is built on the existing `semanticEquals`, so
it does not understand things like commutative or associative expressions.
That can be done as future work.
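
A minimal sketch of the grouping idea, written as a toy Python model and not
taken from the patch. The names `Expr`, `EquivalentExprs`, `add_expr_tree` and
`common_groups` are hypothetical. Structural equality stands in for
`semanticEquals`, and skipping leaf expressions is an assumption of the
sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Expr:
    """Toy expression node; frozen so that equality and hashing are structural."""
    op: str
    children: Tuple["Expr", ...] = ()


class EquivalentExprs:
    """Groups structurally equal expressions (a stand-in for semanticEquals)."""

    def __init__(self) -> None:
        self._groups: Dict[Expr, List[Expr]] = {}

    def add_expr_tree(self, expr: Expr) -> None:
        # Assumption of this sketch: leaves (columns, literals) are not tracked.
        if not expr.children:
            return
        group = self._groups.get(expr)
        if group is not None:
            # Already seen: record the occurrence. Its children were
            # registered when the first occurrence was added.
            group.append(expr)
            return
        self._groups[expr] = [expr]
        for child in expr.children:
            self.add_expr_tree(child)

    def common_groups(self) -> List[List[Expr]]:
        # Only groups with at least two occurrences are worth sharing.
        return [g for g in self._groups.values() if len(g) >= 2]


a, b = Expr("col_a"), Expr("col_b")
add_ab = Expr("add", (a, b))

equiv = EquivalentExprs()
equiv.add_expr_tree(Expr("mul", (add_ab, Expr("lit_2"))))               # (a + b) * 2
equiv.add_expr_tree(Expr("add", (Expr("add", (a, b)), Expr("lit_1"))))  # (a + b) + 1
equiv.add_expr_tree(Expr("add", (b, a)))                                # b + a

common = equiv.common_groups()
```

Here `common` holds a single group with the two occurrences of `a + b`. The
expression `b + a` lands in a group of its own, which mirrors the
commutativity limitation described above.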
After building this data structure, the codegen process takes advantage of it
by:
1. Generating a helper function in the generated class that computes the
   common subexpression. This is done for every common subexpression that has
   at least two occurrences and whose expression tree is sufficiently complex.
2. When generating the apply() function, calling the helper function, if it
   exists, instead of regenerating the expression tree. Repeated calls to the
   helper function short-circuit the evaluation logic.
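
The two steps above can be illustrated with a hand-written stand-in for a
generated class. This is a Python sketch, not the Java that the patch emits.
The class name, the `_subexpr0` helper, and the per-row "done" flag are all
hypothetical; how the real generated code tracks helper state is not described
here.

```python
class GeneratedProjection:
    """Stand-in for a generated class projecting a row (a, b) to
    ((a + b) * 2, (a + b) + 1), where a + b is the common subexpression."""

    def __init__(self) -> None:
        self.helper_evaluations = 0   # instrumentation for the example only
        self._subexpr0_done = False   # assumed per-row cache flag
        self._subexpr0_value = None

    def _subexpr0(self, row):
        # Helper for the common subexpression a + b. Repeated calls for the
        # same row short-circuit and return the cached value.
        if not self._subexpr0_done:
            self.helper_evaluations += 1
            self._subexpr0_value = row[0] + row[1]
            self._subexpr0_done = True
        return self._subexpr0_value

    def apply(self, row):
        # A new row invalidates the cached subexpression result.
        self._subexpr0_done = False
        # Both output columns call the helper instead of re-evaluating a + b.
        return (self._subexpr0(row) * 2, self._subexpr0(row) + 1)


proj = GeneratedProjection()
first = proj.apply((3, 4))
evals_after_first = proj.helper_evaluations
second = proj.apply((1, 1))
evals_after_second = proj.helper_evaluations
```

Each call to `apply` evaluates `a + b` once even though two output columns use
it, so `helper_evaluations` grows by one per row, not two.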
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/nongli/spark spark-10371
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/9480.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #9480
----
commit 2feafbcc2472503048d9d81c6985c1fcdd1dab80
Author: Nong Li <[email protected]>
Date: 2015-10-28T20:40:17Z
[SPARK-10371] [SQL] Implement subexpr elimination for UnsafeProjections
----