Liya Fan created FLINK-11421:
--------------------------------

             Summary: Providing more compilation options for code-generated 
operators
                 Key: FLINK-11421
                 URL: https://issues.apache.org/jira/browse/FLINK-11421
             Project: Flink
          Issue Type: New Feature
          Components: Core
            Reporter: Liya Fan
            Assignee: Liya Fan


Flink supports some operators (like Calc, Hash Agg, Hash Join, etc.) by code 
generation. That is, Flink generates their source code dynamically, and then 
compile it into Java Byte Code, which is load and executed at runtime.

 

By default, Flink compiles the generated source code by Janino. This is fast, 
as the compilation often finishes in hundreds of milliseconds. The generated 
Java Byte Code, however, is of poor quality. To illustrate, we use Java 
Compiler API (JCA) to compile the generated code. Experiments on TPC-H (1 TB) 
queries show that the E2E time can be more than 10% shorter, when operators are 
compiled by JCA, despite that it takes more time (a few seconds) to compile 
with JCA.

 

Therefore, we believe it is beneficial to compile generated code by JCA in the 
following scenarios: 1) For batch jobs, the E2E time is relatively long, so it 
is worth of spending more time compiling and generating high quality Java Byte 
Code. 2) For repeated stream jobs, the generated code will be compiled once and 
run many times. Therefore, it pays to spend more time compiling for the first 
time, and enjoy the high byte code qualities for later runs.

 

According to the above observations, we want to provide a compilation option 
(Janino, JCA, or dynamic) for Flink, so that the user can choose the one 
suitable for their specific scenario and obtain better performance whenever 
possible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to