GitHub user rednaxelafx opened a pull request:

    https://github.com/apache/spark/pull/18255

    [SPARK-21038][SQL] Reduce redundant generated init code in Catalyst codegen

    ## What changes were proposed in this pull request?
    
    In Java, instance fields are guaranteed to be first initialized to their 
corresponding default values (zero values) before the constructor is invoked. 
Thus, explicitly code to initialize fields to their zero values is redundant 
and should be avoided.
    
    It's usually harmless to have such code in hand-written constructors, but 
in the case of mechanically generating code, such code could contribute to a 
significant portion of the code size and cause issues.
    
    This ticket is a step in reducing the likelihood of hitting the 64KB 
bytecode method size limit in the Java Class files. This PR uses simple 
heuristics to filter out redundant code of initializing mutable state to their 
default (zero) values in `CodegenContext.addMutableState`, by matching string 
patterns. Basically, if the `initCode` in the added mutable state is like:
    ```
    [this.]${variableName} = ${defaultValue};
    ```
    (where `[this.]` indicates it's optional)
    Then it'll be replaced by an empty string instead.
    
    This pattern will catch the most common cases. An example of drawn from 
production is:
    ```
    /* 262651 */     value24709 = -1L;
    /* 262652 */     isNull24710 = false;
    /* 262653 */     value24710 = -1L;
    /* 262654 */     isNull24718 = false;
    /* 262655 */     value24718 = false;
    /* 262656 */     isNull24719 = false;
    /* 262657 */     value24719 = -1L;
    /* 262658 */     isNull24720 = false;
    /* 262659 */     value24720 = -1L;
    /* 262660 */     isNull24728 = false;
    /* 262661 */     value24728 = false;
    /* 262662 */     isNull24729 = false;
    /* 262663 */     value24729 = -1L;
    /* 262664 */     isNull24730 = false;
    /* 262665 */     value24730 = -1L;
    /* 262666 */     isNull24738 = false;
    ...
    ```
    where all of the `isNullNNN = false;` initialization code is redundant.
    
    Alternatives are:
    * Manually go through all the places where redundant initialization code is 
specified in `addMutableState()` call sites, and change them to empty string. 
That's tedious and involves changing a lot of code. But if we do it this way, 
we could considering giving `initCode` an default value of `""` in the 
declaration of `addMutableState()`, which could be a nice improvement too.
    * Instead of using a simple regular-expression-based heuristic, use an 
actual parse tree to verify the redundant code pattern. That would make it very 
precise and be able to catch all cases, but it'll involve a bigger change, 
potentially changing a bit of infrastructure in the codegen, e.g. codegen with 
parse tree (AST) rather than strings. I'd like to take baby steps first and 
move to more heavyweight options later.
    
    ## How was this patch tested?
    
    Ran SQL and Catalyst unit tests. Also added some unit tests for the new 
filtering heuristic.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rednaxelafx/apache-spark 
codegen-reduce-redundant-init

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18255.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18255
    
----
commit de9ce5b19719a629ceccc48389e856a4c1959174
Author: Kris Mok <[email protected]>
Date:   2017-06-08T17:46:15Z

    Normalize the initCode in CodegenContext's mutableState by filtering out 
redundant
    default initialization code with a simple heuristic.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to