[GitHub] spark pull request #18075: [SPARK-18016][SQL][CATALYST] Code Generation: Con...

bdrillard Tue, 23 May 2017 13:42:04 -0700

GitHub user bdrillard opened a pull request:

    https://github.com/apache/spark/pull/18075


    [SPARK-18016][SQL][CATALYST] Code Generation: Constant Pool Limit - Class 
Splitting

    ## What changes were proposed in this pull request?
    
    This pull request contains only the class splitting feature described in 
#16648. When the code for a given class would grow beyond 1600k bytes, a 
private, nested sub-class is generated, and subsequent functions are inlined 
into it. Each time the threshold is reached again, another nested sub-class is 
generated. The change has three parts:
    
    1. Adds helper maps, lists, and functions that keep track of sub-classes 
during code generation (in the `CodeGenerator` class). These helpers let nested 
classes and split functions be initialized, declared, and inlined at the 
appropriate locations in the various projection classes.
    2. Changes `addNewFunction` to return the function's name as a string. When 
a split function is inlined into a nested class rather than the outer class, it 
must be invoked by its class-qualified name, so every use of `addNewFunction` 
in the codebase now calls the function through the returned name.
    3. Removes the `this` keyword from references to data inside generated 
classes. All state declared in the outer class is global by default and 
accessible to the nested classes. If a reference to that state inside a nested 
class is prefixed with `this`, however, it resolves against the nested class, 
where the variable does not exist, rather than against the outer class.
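
    The three changes above can be illustrated with a minimal, self-contained 
sketch. It is not the code in this pull request: `SplittingContext`, 
`OuterState`, the `NestedClassN` naming scheme, and the instance-qualified 
return value are all assumptions made for illustration, and the size threshold 
is counted in characters rather than bytes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the bookkeeping described above (not Spark's API). */
final class SplittingContext {
    private final String outerClass;
    private final int threshold; // assumed: max code size per class, in chars
    // class name -> function bodies placed in that class, in creation order
    private final Map<String, List<String>> functions = new LinkedHashMap<>();
    private String currentClass;
    private int currentSize = 0;

    SplittingContext(String outerClass, int threshold) {
        this.outerClass = outerClass;
        this.threshold = threshold;
        this.currentClass = outerClass;
        functions.put(outerClass, new ArrayList<>());
    }

    /** Registers a function and returns the name callers must use to invoke it. */
    String addNewFunction(String name, String code) {
        if (currentSize + code.length() > threshold) {
            // The current class would grow past the threshold: open a new nested class.
            currentClass = "NestedClass" + (functions.size() - 1);
            functions.put(currentClass, new ArrayList<>());
            currentSize = 0;
        }
        functions.get(currentClass).add(code);
        currentSize += code.length();
        // Functions in the outer class keep their bare name; functions in a
        // nested class are reached through an instance of that class.
        return currentClass.equals(outerClass)
            ? name
            : instanceName(currentClass) + "." + name;
    }

    private static String instanceName(String cls) {
        return Character.toLowerCase(cls.charAt(0)) + cls.substring(1);
    }

    /** Emits the outer class with its nested classes declared and instantiated. */
    String render() {
        StringBuilder sb = new StringBuilder("class " + outerClass + " {\n");
        for (Map.Entry<String, List<String>> e : functions.entrySet()) {
            String cls = e.getKey();
            boolean nested = !cls.equals(outerClass);
            if (nested) {
                sb.append("private ").append(cls).append(' ').append(instanceName(cls))
                  .append(" = new ").append(cls).append("();\n");
                sb.append("private class ").append(cls).append(" {\n");
            }
            for (String body : e.getValue()) {
                sb.append(body).append('\n');
            }
            if (nested) {
                sb.append("}\n");
            }
        }
        return sb.append("}\n").toString();
    }
}

/** Shows why `this.` must be dropped once code moves into a nested class. */
final class OuterState {
    private int value = 42; // "global" state declared in the outer class
    final Nested nested = new Nested();

    final class Nested {
        int read() { return value; } // unqualified name resolves to the outer field
        // int broken() { return this.value; } // would not compile: Nested has no field `value`
    }
}
```

    With a threshold of 30 and three 20-character function bodies, 
`addNewFunction` in this sketch returns `f0`, `nestedClass0.f1`, and 
`nestedClass1.f2` in turn, which is why callers have to use the returned name 
rather than the name they passed in.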
    
    ## How was this patch tested?
    
    Added a test case to `GeneratedProjectionSuite` that increases the number 
of columns tested in various projections to a threshold that would previously 
have triggered a `JaninoRuntimeException` for exceeding the Constant Pool limit.
    
    Note: This PR does not address the second Constant Pool issue with code 
generation (also mentioned in #16648): excess global mutable state. A second PR 
may be opened to resolve that issue. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bdrillard/spark class_splitting_only

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18075.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18075
    
----
commit a73fdded9ea15bbe66f5f86ca158bb76cdf79033
Author: ALeksander Eskilson <[email protected]>
Date:   2017-05-23T19:41:21Z

    class_splitting_only adding class splitting

----


