Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19082
Let me summarize the recent PRs for code generation that deal with the JVM
bytecode size limit for JIT compilation. These PRs aim to get more methods
JIT-compiled, since most JIT compilers skip a method whose bytecode is too
large (e.g. more than 8000 bytes in the HotSpot compiler).
The PRs fall into two categories.
1. Limit the total JVM bytecode size of the generated Java method (#18810,
#19083).
2. Generate Java methods of smaller size (#18931, #19082).
I think that the two categories are complementary. I like both efforts.
Category 1 disables whole-stage codegen for a large method (i.e. more than
8000 bytes of JVM bytecode) that would not be JIT-compiled.
#18810 tries to **estimate whether the JVM bytecode size** is under 8000 by
using the number of lines in a method. The line threshold is 2667. If the
estimated bytecode size exceeds 8000, whole-stage codegen is disabled. This
threshold worked well for most programs. However, as @maropu summarized
[here](https://github.com/apache/spark/pull/19082#issuecomment-335336076), it
did not work for some programs (e.g.
[q66](https://github.com/apache/spark/pull/18810#issuecomment-323620029)).
Then, #19083 **checks the actual JVM bytecode size** by using the compiled JVM
bytecode. This PR can precisely avoid generating a whole-stage method that
would not be JIT-compiled.
Category 1 cannot make the whole-stage method itself eligible for JIT
compilation; it only avoids running a method that would stay interpreted.
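As a rough sketch of the two category 1 checks (the class and method names here are hypothetical and are not the code in #18810 or #19083; treating a size exactly at the threshold as acceptable is also my assumption):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; none of these names come from Spark.
final class CodegenSizeCheck {
    // HotSpot skips JIT compilation for methods with more bytecode than this.
    static final int HUGE_METHOD_LIMIT = 8000;
    // Line threshold quoted for #18810 (roughly 8000 / 3).
    static final int MAX_LINES = 2667;

    // Estimate-based check: only looks at the number of source lines.
    static boolean keepByLineEstimate(int sourceLines) {
        return sourceLines <= MAX_LINES;
    }

    // Exact check: looks at the bytecode size of every compiled method.
    static boolean keepByActualSize(Map<String, Integer> bytecodeSizes) {
        for (int size : bytecodeSizes.values()) {
            if (size > HUGE_METHOD_LIMIT) {
                return false; // at least one method would stay interpreted
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative numbers only, not measurements from q66 or any real query.
        int sourceLines = 1000;
        Map<String, Integer> sizes = new LinkedHashMap<>();
        sizes.put("processNext", 9000);
        System.out.println("line estimate keeps codegen: " + keepByLineEstimate(sourceLines));
        System.out.println("actual size keeps codegen:   " + keepByActualSize(sizes));
    }
}
```

With these made-up inputs the line estimate accepts the method while the actual-size check rejects it, which is the kind of mismatch the second approach is meant to remove.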
In category 2, each part of code generation tries to produce smaller methods
(i.e. at most 8000 bytes of JVM bytecode per method) so that JIT compilation
applies to more methods, or to avoid JVM bytecode generation failure (beyond
64KB per method). This will not
dis
One of these activities is to use `CodeGenerator.splitExpressions()`.
#18931 splits a set of `consume()` functions in a physical plan [into multiple
methods](https://github.com/apache/spark/pull/18931#issuecomment-325907224)
instead of embedding them into one method (e.g. `processNext()`).
#19082 splits the operations in an aggregation into multiple methods instead
of embedding them into one method (e.g. `agg_doAggregateWithoutKey()`).
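To make the idea of splitting concrete, here is a minimal sketch of greedy splitting of generated statements into helper methods. It is not the implementation of `CodeGenerator.splitExpressions()`; the class `SplitSketch`, the character budget, and the emitted method names are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not Spark's splitting logic.
final class SplitSketch {
    // Greedily packs statements into groups whose total length stays within maxChars.
    static List<List<String>> split(List<String> statements, int maxChars) {
        List<List<String>> groups = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int length = 0;
        for (String s : statements) {
            if (!current.isEmpty() && length + s.length() > maxChars) {
                groups.add(current);
                current = new ArrayList<>();
                length = 0;
            }
            current.add(s);
            length += s.length();
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    // Emits a small caller method followed by one helper method per group.
    static String emit(String name, List<List<String>> groups) {
        StringBuilder caller = new StringBuilder("private void " + name + "() {\n");
        StringBuilder helpers = new StringBuilder();
        for (int i = 0; i < groups.size(); i++) {
            String helper = name + "_" + i;
            caller.append("  ").append(helper).append("();\n");
            helpers.append("private void ").append(helper).append("() {\n");
            for (String s : groups.get(i)) {
                helpers.append("  ").append(s).append("\n");
            }
            helpers.append("}\n");
        }
        return caller.append("}\n").append(helpers).toString();
    }

    public static void main(String[] args) {
        List<String> stmts = Arrays.asList("a += 1;", "b += 2;", "c += 3;", "d += 4;");
        System.out.println(emit("agg_update", split(stmts, 16)));
    }
}
```

The sketch assumes the statements share state through fields, so moving them into separate methods does not change their meaning; statements that depend on local variables would need those values passed as parameters.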
Even if these PRs create smaller methods, the *JIT compiler can make the
compilation unit larger* by applying method inlining. A larger compilation
unit lets the compiler apply more optimizations. For example,
in HotSpot, a method whose JVM bytecode size is up to [325 (frequently
executed)](http://isuru-perera.blogspot.jp/2014/12/java-jit-compilation-inlining-jitwatch.html)
or [35
(normal)](http://www.oracle.com/technetwork/java/vmoptions-jsp-140102.html)
can be inlined. Thus, I think that we will rarely see performance regressions.
Category 2 tries to make the whole-stage generated code eligible for JIT
compilation by making each of its methods smaller.
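The size thresholds mentioned in this comment can be put side by side in a small classifier. This is only an approximation under the numbers quoted above (35, 325, 8000); the class `JitSizeClass` is hypothetical, and real HotSpot inlining decisions also depend on other heuristics such as call-site hotness and inlining depth.

```java
// Hypothetical classifier; names are illustrative, thresholds are the ones quoted above.
final class JitSizeClass {
    static final int MAX_INLINE_SIZE = 35;     // "normal" inlining limit
    static final int FREQ_INLINE_SIZE = 325;   // limit for frequently executed methods
    static final int HUGE_METHOD_LIMIT = 8000; // above this, HotSpot skips JIT compilation

    // Maps a method's bytecode size to the treatment it can expect at best.
    static String classify(int bytecodeSize) {
        if (bytecodeSize <= MAX_INLINE_SIZE) {
            return "inlinable";
        }
        if (bytecodeSize <= FREQ_INLINE_SIZE) {
            return "inlinable when hot";
        }
        if (bytecodeSize <= HUGE_METHOD_LIMIT) {
            return "JIT-compiled, not inlined";
        }
        return "interpreted only";
    }

    public static void main(String[] args) {
        for (int size : new int[] {30, 300, 3000, 30000}) {
            System.out.println(size + " bytes: " + classify(size));
        }
    }
}
```

The actual values on a given JVM can be listed with `java -XX:+PrintFlagsFinal -version` and may differ by platform and version.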
@gatorsmile, @viirya, @maropu, @rednaxelafx what do you think? Do you have
any comments or questions?