GitHub user bdrillard opened a pull request:
https://github.com/apache/spark/pull/18075
[SPARK-18016][SQL][CATALYST] Code Generation: Constant Pool Limit - Class Splitting
## What changes were proposed in this pull request?
This pull request includes only the class-splitting feature
described in #16648. When the code for a given class would grow beyond 1600k bytes,
a private, nested sub-class is generated, and subsequent functions are
inlined into it. A further sub-class is generated each time the threshold is
reached again. This PR includes three changes:
1. Adds helper maps, lists, and functions to the `CodeGenerator` class for
keeping track of sub-classes during code generation. These helpers allow nested
classes and split functions to be initialized, declared, or inlined at the
appropriate locations in the various projection classes.
2. Changes `addNewFunction` to return a string: the name under which the new
function must be invoked. This is needed because a split function may be inlined
into a nested class rather than the outer class, in which case it must be called
by its class-qualified name. All uses of `addNewFunction` throughout the codebase
are updated to use the returned name.
3. Removes uses of the `this` keyword when referencing data inside generated
classes. All state declared in the outer class is global by default and
accessible to the nested classes. However, if a reference to global state in a
nested class is prefixed with `this`, it refers to state belonging to the nested
class (which does not exist) rather than to the intended variable belonging to
the outer class.
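The bookkeeping behind changes 1 and 2 can be sketched in Java as follows. This is a hypothetical, simplified model, not Spark's `CodeGenerator`: the class `SplittingCodegen`, the `NestedClassN` / `nestedClassNInstance` naming scheme, and measuring the threshold in characters of function source are assumptions made for illustration. The generator tracks which class currently receives new functions, opens a new nested class once adding a function would exceed the threshold, and `addNewFunction` returns the name a caller should use to invoke the function.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of class-splitting bookkeeping.
// NOT Spark's CodeGenerator: names, naming scheme, and threshold are illustrative.
class SplittingCodegen {
  static final String OUTER_CLASS = "OuterClass";

  private final int sizeThreshold;
  // Function source placed in each class, in creation order (outer class first).
  private final Map<String, StringBuilder> classFunctions = new LinkedHashMap<>();
  // Names of the nested classes generated so far.
  private final List<String> nestedClasses = new ArrayList<>();
  // Class that currently receives new functions.
  private String currentClass = OUTER_CLASS;

  SplittingCodegen(int sizeThreshold) {
    this.sizeThreshold = sizeThreshold;
    classFunctions.put(OUTER_CLASS, new StringBuilder());
  }

  // Adds a function and returns the name callers must use to invoke it.
  String addNewFunction(String funcName, String funcCode) {
    StringBuilder current = classFunctions.get(currentClass);
    // Open a new nested class if this function would push the current class
    // past the threshold (an empty class always accepts at least one function).
    if (current.length() > 0 && current.length() + funcCode.length() > sizeThreshold) {
      currentClass = "NestedClass" + (nestedClasses.size() + 1);
      nestedClasses.add(currentClass);
      current = new StringBuilder();
      classFunctions.put(currentClass, current);
    }
    current.append(funcCode).append('\n');
    // Functions in the outer class are called by bare name; functions in a
    // nested class must be called through that class's instance.
    return currentClass.equals(OUTER_CLASS)
        ? funcName
        : instanceName(currentClass) + "." + funcName;
  }

  // Field declarations the outer class needs in order to reach each nested class.
  List<String> nestedInstanceDeclarations() {
    List<String> decls = new ArrayList<>();
    for (String cls : nestedClasses) {
      decls.add("private " + cls + " " + instanceName(cls) + " = new " + cls + "();");
    }
    return decls;
  }

  List<String> nestedClasses() {
    return nestedClasses;
  }

  // Hypothetical naming scheme: NestedClass1 -> nestedClass1Instance.
  private static String instanceName(String nestedClass) {
    return Character.toLowerCase(nestedClass.charAt(0)) + nestedClass.substring(1) + "Instance";
  }
}
```

With a threshold of 40 and three calls that each add the 23-character function `void f() { int a = 1; }`, the returned names are `apply_0`, `nestedClass1Instance.apply_1`, and `nestedClass2Instance.apply_2`. Because call sites always use the returned name, they do not need to know which class a function ended up in.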
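To illustrate the `this` issue in change 3, the hand-written Java class below approximates the shape of a projection after splitting. It is not actual Spark output; `GeneratedProjection`, `mutableRow`, `apply_0`, `apply_1`, and `nestedClass1Instance` are hypothetical names. The nested (inner) class reads and writes the outer class's field by its unqualified name, which Java resolves to the enclosing instance's field.

```java
// Hypothetical shape of a generated projection after class splitting
// (hand-written illustration, not real Spark codegen output).
class GeneratedProjection {
  // Global (outer-class) mutable state, e.g. a result buffer.
  private int[] mutableRow = new int[2];

  // Nested class created when the outer class hit the size threshold.
  private class NestedClass1 {
    void apply_1(int input) {
      // Correct: the unqualified name resolves to the outer class's field.
      mutableRow[1] = input * 2;
      // Wrong: `this.mutableRow[1] = ...` would not compile, because `this`
      // here is the NestedClass1 instance, which declares no such field.
      // `GeneratedProjection.this.mutableRow` would also compile.
    }
  }

  private final NestedClass1 nestedClass1Instance = new NestedClass1();

  // Function that stayed in the outer class.
  private void apply_0(int input) {
    mutableRow[0] = input + 1;
  }

  int[] apply(int input) {
    apply_0(input);                      // bare call: function lives in the outer class
    nestedClass1Instance.apply_1(input); // qualified call into the nested class
    return mutableRow;
  }
}
```

Calling `new GeneratedProjection().apply(5)` fills the shared buffer from both classes, yielding `{6, 10}`.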
## How was this patch tested?
Added a test case to `GeneratedProjectionSuite` that increases the number of
columns used in various projections to a level that would previously have
triggered a `JaninoRuntimeException` for exceeding the constant pool limit.
Note: This PR does not address the second Constant Pool issue with code
generation (also mentioned in #16648): excess global mutable state. A second PR
may be opened to resolve that issue.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/bdrillard/spark class_splitting_only
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18075.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18075
----
commit a73fdded9ea15bbe66f5f86ca158bb76cdf79033
Author: ALeksander Eskilson <[email protected]>
Date: 2017-05-23T19:41:21Z
class_splitting_only adding class splitting
----