[jira] [Commented] (FLINK-11421) Add compilation options to allow compiling generated code with JDK compiler
[ https://issues.apache.org/jira/browse/FLINK-11421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334029#comment-17334029 ]

Flink Jira Bot commented on FLINK-11421:

This issue was marked "stale-assigned" and has not received an update in 7 days. It is now automatically unassigned. If you are still working on it, you can assign it to yourself again. Please also give an update about the status of the work.

> Add compilation options to allow compiling generated code with JDK compiler
>
> Key: FLINK-11421
> URL: https://issues.apache.org/jira/browse/FLINK-11421
> Project: Flink
> Issue Type: New Feature
> Components: Table SQL / Runtime
> Reporter: Liya Fan
> Assignee: Liya Fan
> Priority: Major
> Labels: pull-request-available, stale-assigned
> Original Estimate: 240h
> Time Spent: 40m
> Remaining Estimate: 239h 20m
>
> Flink supports some operators (like Calc, Hash Agg, Hash Join, etc.) by code
> generation. That is, Flink generates their source code dynamically and then
> compiles it into Java byte code, which is loaded and executed at runtime.
>
> By default, Flink compiles the generated source code with Janino. This is fast,
> as the compilation often finishes in hundreds of milliseconds. The generated
> Java byte code, however, is of poor quality. To illustrate, we used the Java
> Compiler API (JCA) to compile the generated code. Experiments on TPC-H (1 TB)
> queries show that the E2E time can be more than 10% shorter when operators
> are compiled by JCA, even though it takes more time (a few seconds) to
> compile with JCA.
>
> Therefore, we believe it is beneficial to compile generated code with JCA in
> the following scenarios:
> 1) For batch jobs, the E2E time is relatively long, so it is worth spending
> more time on compilation to generate higher-quality Java byte code.
> 2) For repeated stream jobs, the generated code is compiled once and run many
> times. Therefore, it pays to spend more time compiling the first time and
> benefit from the higher-quality byte code in later runs.
>
> Based on the above observations, we want to provide a compilation option
> (Janino, JCA, or dynamic) for Flink, so that users can choose the one
> suitable for their specific scenario and obtain better performance whenever
> possible.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
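
For readers unfamiliar with the Java Compiler API (JCA) named in the description, here is a minimal sketch of compiling a generated source string in memory with the JDK's javax.tools package and loading the resulting class. It is not the code from the pull request. The names JcaCompileSketch, SourceFile, ClassFile, and the sample Adder source are hypothetical and used only for illustration. The sketch assumes a full JDK, since ToolProvider.getSystemJavaCompiler() returns null on a JRE-only runtime.

```java
import javax.tools.FileObject;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of Flink.
class JcaCompileSketch {

    /** Holds one generated source file in memory. */
    static final class SourceFile extends SimpleJavaFileObject {
        private final String code;

        SourceFile(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Captures the byte code of one compiled class in memory. */
    static final class ClassFile extends SimpleJavaFileObject {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();

        ClassFile(String className) {
            super(URI.create("mem:///" + className.replace('.', '/') + Kind.CLASS.extension), Kind.CLASS);
        }

        @Override
        public OutputStream openOutputStream() {
            return bytes;
        }
    }

    /** Compiles one source string with the JDK compiler and loads the class. */
    static Class<?> compile(String className, String source) throws Exception {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler found; a JDK is required");
        }
        Map<String, ClassFile> outputs = new HashMap<>();
        try (StandardJavaFileManager std = compiler.getStandardFileManager(null, null, null)) {
            // Redirect class file output from disk to the in-memory map.
            JavaFileManager manager = new ForwardingJavaFileManager<StandardJavaFileManager>(std) {
                @Override
                public JavaFileObject getJavaFileForOutput(
                        JavaFileManager.Location location, String name,
                        JavaFileObject.Kind kind, FileObject sibling) {
                    ClassFile out = new ClassFile(name);
                    outputs.put(name, out);
                    return out;
                }
            };
            boolean ok = compiler.getTask(null, manager, null, null, null,
                    Collections.singletonList(new SourceFile(className, source))).call();
            if (!ok) {
                throw new IllegalStateException("Compilation failed for " + className);
            }
        }
        // Define the compiled classes from the captured bytes.
        ClassLoader loader = new ClassLoader(JcaCompileSketch.class.getClassLoader()) {
            @Override
            protected Class<?> findClass(String name) throws ClassNotFoundException {
                ClassFile file = outputs.get(name);
                if (file == null) {
                    throw new ClassNotFoundException(name);
                }
                byte[] b = file.bytes.toByteArray();
                return defineClass(name, b, 0, b.length);
            }
        };
        return loader.loadClass(className);
    }

    public static void main(String[] args) throws Exception {
        String src = "public class Adder { public static int add(int a, int b) { return a + b; } }";
        Class<?> adder = compile("Adder", src);
        System.out.println(adder.getMethod("add", int.class, int.class).invoke(null, 2, 3)); // prints 5
    }
}
```

Janino is an external library with its own API, so it is left out of this sketch. A compilation option as proposed in the issue would presumably choose between the two code paths, but how that choice is made is not specified in this thread.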
[jira] [Commented] (FLINK-11421) Add compilation options to allow compiling generated code with JDK compiler
[ https://issues.apache.org/jira/browse/FLINK-11421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17323384#comment-17323384 ]

Flink Jira Bot commented on FLINK-11421:

This issue is assigned but has not received an update in 7 days so it has been labeled "stale-assigned". If you are still working on the issue, please give an update and remove the label. If you are no longer working on the issue, please unassign so someone else may work on it. In 7 days the issue will be automatically unassigned.
[jira] [Commented] (FLINK-11421) Add compilation options to allow compiling generated code with JDK compiler
[ https://issues.apache.org/jira/browse/FLINK-11421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833687#comment-16833687 ]

Liya Fan commented on FLINK-11421:

[~lzljs3620320] Thanks a lot for the confirmation.

>> I think we should know which patterns can be improved and which may not be
>> (and by how much they improve).

This is an interesting question, and I would like to spend some effort on it.

>> I mean you can run these tests manually, just to test the JCA logic.

Sounds good. Thanks for your suggestion.
[jira] [Commented] (FLINK-11421) Add compilation options to allow compiling generated code with JDK compiler
[ https://issues.apache.org/jira/browse/FLINK-11421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833673#comment-16833673 ]

Jingsong Lee commented on FLINK-11421:

[~fan_li_ya] I think you can restart it for the Blink planner.

>> Do some benchmark to measure how fast E2E run after compiling by JCA?

I think we should know which patterns can be improved and which may not be (and by how much they improve).

>> 4. Open the JCA compiler to run all tests of table-planner?

I mean you can run these tests manually, just to test the JCA logic.
[jira] [Commented] (FLINK-11421) Add compilation options to allow compiling generated code with JDK compiler
[ https://issues.apache.org/jira/browse/FLINK-11421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833664#comment-16833664 ]

Liya Fan commented on FLINK-11421:

Hi [~lzljs3620320], thanks a lot for the information. Can I restart the PR now? My comments inline:

1. Why is Java Compiler faster than Janino? Any technical details and evidence?

Generally speaking, if a compiler takes longer to compile the code, the compiled output tends to be of higher quality, because a compiler that takes longer usually applies more optimizations. Similarly, native-language compilers such as gcc offer different optimization levels: -O0, -O1, -O2, and -O3. A higher optimization level means a longer compilation time, but the generated machine code is of better quality.

2. Do some benchmark to measure how fast E2E run after compiling by JCA?

We first found that JCA could improve E2E performance when we were trying to support vectorization of TPC-H Q1. JCA compilation provided a performance improvement of about 27% (from 27-28s to 20s). We also observed this in some other TPC-H queries, such as Q12 and Q18.

3. Do some benchmark to measure how slowly JCA compiles?

Good question. It takes about 2s to finish a JCA compilation task, which is more than 10 times slower than compiling with Janino. To alleviate the performance impact, we introduce two improvements:
1) Compilation by chain: the compilation time does not seem to increase much as the number of source files increases, so we compile the source code for all operators in a chain in a single batch.
2) Class cache: once a source file is compiled, we save the result in a cache, so other tasks in the same JVM can reuse the compilation results.

4. Open the JCA compiler to run all tests of table-planner?

Sounds reasonable. The only drawback is that running the tests can take much longer, since compiling with JCA is much slower.
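
The two mitigations described under point 3 of Liya Fan's reply above (compilation by chain and the class cache) can be sketched roughly as follows. This is an illustration under assumptions, not the implementation from the pull request. The class CompiledClassCache and its batchCompiler function are hypothetical. The real compilation step, for example one javax.tools compilation task that receives all sources of a chain, is passed in as a function so that the sketch stays independent of any particular compiler.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical JVM-wide cache of compiled classes; not part of Flink.
final class CompiledClassCache {

    // Keyed by the full generated source text, so identical sources share one class.
    private final Map<String, Class<?>> cache = new HashMap<>();

    // Takes (class name -> source) for one batch, returns (class name -> compiled class).
    private final Function<Map<String, String>, Map<String, Class<?>>> batchCompiler;

    CompiledClassCache(Function<Map<String, String>, Map<String, Class<?>>> batchCompiler) {
        this.batchCompiler = batchCompiler;
    }

    /**
     * Returns the classes for all operators of a chain. Sources that are not yet
     * cached are handed to the compiler together, in a single invocation.
     */
    synchronized Map<String, Class<?>> getOrCompile(Map<String, String> chainSources) {
        Map<String, String> missing = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : chainSources.entrySet()) {
            if (!cache.containsKey(e.getValue())) {
                missing.put(e.getKey(), e.getValue());
            }
        }
        if (!missing.isEmpty()) {
            Map<String, Class<?>> compiled = batchCompiler.apply(missing);
            for (Map.Entry<String, String> e : missing.entrySet()) {
                Class<?> clazz = compiled.get(e.getKey());
                if (clazz == null) {
                    throw new IllegalStateException("Compiler returned no class for " + e.getKey());
                }
                cache.put(e.getValue(), clazz);
            }
        }
        Map<String, Class<?>> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : chainSources.entrySet()) {
            result.put(e.getKey(), cache.get(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        int[] compilerCalls = {0};
        // Stand-in for a real batch compiler; it only counts invocations.
        CompiledClassCache cache = new CompiledClassCache(missing -> {
            compilerCalls[0]++;
            Map<String, Class<?>> out = new HashMap<>();
            for (String name : missing.keySet()) {
                out.put(name, Object.class);
            }
            return out;
        });
        Map<String, String> chain = new LinkedHashMap<>();
        chain.put("Calc$1", "/* generated source of Calc$1 */");
        chain.put("HashAgg$2", "/* generated source of HashAgg$2 */");
        cache.getOrCompile(chain); // one batch compilation for the whole chain
        cache.getOrCompile(chain); // served entirely from the cache
        System.out.println("compiler invocations: " + compilerCalls[0]); // prints "compiler invocations: 1"
    }
}
```

Keying the cache by source text rather than by class name is a design choice of this sketch: it avoids returning a stale class when two different generated sources happen to share a name.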
[jira] [Commented] (FLINK-11421) Add compilation options to allow compiling generated code with JDK compiler
[ https://issues.apache.org/jira/browse/FLINK-11421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833578#comment-16833578 ]

Jingsong Lee commented on FLINK-11421:

Hi [~fan_li_ya], the merge of Blink's CodeGenerator is almost done. I have a few questions here:

1. Why is Java Compiler faster than Janino? Any technical details and evidence?
2. Do some benchmark to measure how fast E2E run after compiling by JCA?
3. Do some benchmark to measure how slowly JCA compiles?
4. Open the JCA compiler to run all tests of table-planner?
[jira] [Commented] (FLINK-11421) Add compilation options to allow compiling generated code with JDK compiler
[ https://issues.apache.org/jira/browse/FLINK-11421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772937#comment-16772937 ]

Liya Fan commented on FLINK-11421:

[~ykt836] Closed. Thanks for the reminder.
[jira] [Commented] (FLINK-11421) Add compilation options to allow compiling generated code with JDK compiler
[ https://issues.apache.org/jira/browse/FLINK-11421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772832#comment-16772832 ]

Kurt Young commented on FLINK-11421:

[~fan_li_ya] Sounds great, thanks! BTW, could you please close the pull request for now? We want the open PRs to reflect the current status, and you can reopen it anytime once you think the timing is good.
[jira] [Commented] (FLINK-11421) Add compilation options to allow compiling generated code with JDK compiler
[ https://issues.apache.org/jira/browse/FLINK-11421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772760#comment-16772760 ]

Liya Fan commented on FLINK-11421:

[~ykt836] I see. Thanks a lot for the comments. I will work on this Jira after Blink is merged.
[jira] [Commented] (FLINK-11421) Add compilation options to allow compiling generated code with JDK compiler
[ https://issues.apache.org/jira/browse/FLINK-11421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772679#comment-16772679 ]

Kurt Young commented on FLINK-11421:

[~fan_li_ya] I'm aware of this information. But according to the current merge plan, Flink and Blink will each have their own independent planner implementation. The planner is responsible for translating a SQL statement or a table directly into the underlying runtime APIs, and thus includes the code generation part. It's highly possible that during the merging of Blink we will end up with a totally new code generation for Blink and forget to adopt improvements like this. So I think it may be better to evaluate this after the merge work has been done.
[jira] [Commented] (FLINK-11421) Add compilation options to allow compiling generated code with JDK compiler
[ https://issues.apache.org/jira/browse/FLINK-11421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772585#comment-16772585 ]

Liya Fan commented on FLINK-11421:

Hi Kurt Young, thank you so much for your attention. I understand that a lot of effort is going into merging Blink. I want to clarify that our evaluations were based on Blink (I am also from Alibaba), and the 10% performance improvement mentioned in the description was observed on Blink for TPC-H queries (1 TB; for example, Q1). We opened this Jira and PR now because the change is independent of Blink features, and we want the community to be able to use it as early as possible and benefit from the performance and flexibility.
[jira] [Commented] (FLINK-11421) Add compilation options to allow compiling generated code with JDK compiler
[ https://issues.apache.org/jira/browse/FLINK-11421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771941#comment-16771941 ]

Kurt Young commented on FLINK-11421:

Hi [~fan_li_ya], thanks for the proposal. As far as I know, Flink currently does not do a lot of code generation for batch and streaming jobs. There is also a big Blink merge effort going on, and Blink relies on code generation much more heavily. Would it make sense to you if we evaluate this approach after the Blink merge is done?