[GitHub] spark pull request: [SPARK-10371] [SQL] Implement subexpr eliminat...

nongli Wed, 04 Nov 2015 17:22:55 -0800

GitHub user nongli opened a pull request:

    https://github.com/apache/spark/pull/9480


    [SPARK-10371] [SQL] Implement subexpr elimination for UnsafeProjections

    This patch adds the building blocks for code-generating common subexpression
    elimination and implements it end to end for UnsafeProjection. The same building
    blocks can be reused to do the same thing for other operators.
    
    It introduces utilities for computing common subexpressions. Expressions can be
    added to a data structure; each expression and its children are recursively matched
    against previously added expressions and grouped into sets of equivalent expressions.
    Matching is built on the existing `semanticEquals`, so it does not understand
    commutative or associative expressions. Handling those is left as future work.
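
As a rough illustration of the grouping described above, here is a minimal Python sketch. It is not the patch's code: `Expr`, `EquivalentExpressions`, `add_expr_tree`, and `common_groups` are hypothetical names, plain structural equality stands in for `semanticEquals`, and skipping leaf nodes is an assumption made to keep the output small.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Expr:
    """Hypothetical stand-in for an expression tree node."""
    op: str
    children: Tuple["Expr", ...] = ()


class EquivalentExpressions:
    """Groups structurally equal expressions (illustrative only)."""

    def __init__(self) -> None:
        self._groups: Dict[Expr, List[Expr]] = {}

    def add_expr_tree(self, expr: Expr) -> None:
        # Assumption: leaves (columns, literals) are not worth grouping.
        if not expr.children:
            return
        # Record this node, then recurse so shared subtrees are found too.
        self._groups.setdefault(expr, []).append(expr)
        for child in expr.children:
            self.add_expr_tree(child)

    def common_groups(self) -> List[List[Expr]]:
        # Only expressions seen at least twice are elimination candidates.
        return [g for g in self._groups.values() if len(g) >= 2]


a, b = Expr("a"), Expr("b")
add_ab = Expr("+", (a, b))
add_ba = Expr("+", (b, a))  # commutative twin: not matched structurally

eq = EquivalentExpressions()
eq.add_expr_tree(Expr("*", (add_ab, Expr("lit2"))))
eq.add_expr_tree(Expr("-", (add_ab, Expr("lit1"))))
eq.add_expr_tree(add_ba)
common = eq.common_groups()
```

Here only `a + b` ends up in a common group; `b + a` stays separate, which mirrors the stated limitation around commutative expressions.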
    
    After building this data structure, the codegen process takes advantage of it by:
      1. Generating a helper function in the generated class that computes the common
         subexpression. This is done for every common subexpression that occurs at
         least twice and whose expression tree is sufficiently complex.
      2. When generating the apply() function, calling the helper function, if it
         exists, instead of regenerating the expression tree. Repeated calls to the
         helper function short-circuit the evaluation logic.
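
The two steps above can be pictured with a small hand-written Python sketch. It is only an analogy for the generated class, not the code the patch emits: `GeneratedProjection`, `_eval_sub_expr`, and the `evaluations` counter are hypothetical, and resetting a per-row flag is just one assumed way to make repeated helper calls short-circuit.

```python
class GeneratedProjection:
    """Hand-written stand-in for a generated projection class (hypothetical)."""

    def __init__(self) -> None:
        self.evaluations = 0     # counts real evaluations, for illustration
        self._sub_done = False   # has the helper already run for this row?
        self._sub_value = None   # cached result of the common subexpression

    def _eval_sub_expr(self, row):
        # Helper for the common subexpression (row[0] + row[1]).
        # Repeated calls for the same row return the cached value.
        if not self._sub_done:
            self.evaluations += 1
            self._sub_value = row[0] + row[1]
            self._sub_done = True
        return self._sub_value

    def apply(self, row):
        self._sub_done = False  # new row: invalidate the cached value
        # Projects ((a + b) * 2, (a + b) - 1); both columns call the helper.
        return (self._eval_sub_expr(row) * 2, self._eval_sub_expr(row) - 1)


proj = GeneratedProjection()
result = proj.apply((3, 4))
```

Both output columns reference `a + b`, yet the helper does the addition only once per row; the second call returns the cached value.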

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nongli/spark spark-10371

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9480.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9480
    
----
commit 2feafbcc2472503048d9d81c6985c1fcdd1dab80
Author: Nong Li <[email protected]>
Date:   2015-10-28T20:40:17Z

    [SPARK-10371] [SQL] Implement subexpr elimination for UnsafeProjections
    

----

