[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Michael Smith has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. IMPALA-11805: Use llvm ObjectCache for codegen caching Currently, we employ llvm::ExecutionEngine for codegen caching, providing access to compiled functions within the cached engine. However, the real challenge is the ExecutionEngine uses a lot of memory which largely exceeds our memory estimates and it is very hard to predict. This patch addresses this issue by using llvm::ObjectCache for codegen caching. In our case, each execution engine would have only one module, and after the compilation of the module, the compiled codegened functions of the module would be set to the execution engine, therefore functions could be used by Impala. During function compilation within the module, if an ObjectCache is set to the execution engine, the compiled codegened functions would be also written into the cache. This way, if we keep the cache, when revisiting the same module (fragment), we can efficiently reuse the specific ObjectCache, loading pre-compiled codegened functions and saving time. The tpch performance test indicates no significant regression compared to the previous use of ExecutionEngine. Post-change, the actual memory usage of each codegen caching entry is notably reduced. +--+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +--+---+-++++ | TPCH(1) | parquet / none / none | 0.22| -0.65% | 0.20 | -0.75% | +--+---+-++++ +--+--+---++-++++---++-+---+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval | +--+--+---++-++++---++-+---+ | TPCH(1) | TPCH-Q13 | parquet / none / none | 0.49 | 0.47| +2.80% | 5.32%| 5.07%| 10| +1.22% | 1.63| 1.19 | | TPCH(1) | TPCH-Q4 | parquet / none / none | 0.16 | 0.16| +3.51% | 1.32%| * 10.38% * | 10| +0.06% | 0.49| 1.06 | | TPCH(1) | TPCH-Q11 | parquet / none / none | 0.12 | 0.12| +1.39% | 2.27%| 2.24%| 10| +1.50% | 1.90| 1.37 | | TPCH(1) | TPCH-Q19 | parquet / none / none | 0.21 | 0.21| +1.56% | * 10.02% * | * 11.42% * | 10| +1.18% | 0.57| 0.32 | | TPCH(1) | TPCH-Q18 | parquet / none / none | 0.27 | 0.27| +1.71% | 6.46%| 1.29%| 10| -0.19% | -1.19 | 0.81 | | TPCH(1) | TPCH-Q6 | parquet / none / none | 0.11 | 0.11| +0.79% | 2.76%| 2.15%| 10| +0.10% | 1.46| 0.71 | | TPCH(1) | TPCH-Q3 | parquet / none / none | 0.26 | 0.26| +0.71% | 6.63%| 6.18%| 10| +0.04% | 0.49| 0.25 | | TPCH(1) | TPCH-Q17 | parquet / none / none | 0.17 | 0.17| +0.41% | * 14.66% * | * 13.01% * | 10| +0.05% | 0.40| 0.07 | | TPCH(1) | TPCH-Q14 | parquet / none / none | 0.16 | 0.16| +0.19% | 1.41%| 1.39%| 10| +0.25% | 1.46| 0.31 | | TPCH(1) | TPCH-Q20 | parquet / none / none | 0.17 | 0.17| +0.22% | 1.70%| 1.77%| 10| -0.05% | -0.40 | 0.28 | | TPCH(1) | TPCH-Q12 | parquet / none / none | 0.16 | 0.16| -0.27% | 0.54%| 1.46%| 10| +0.14% | 0.93| -0.54 | | TPCH(1) | TPCH-Q22 | parquet / none / none | 0.11 | 0.11| -0.38% | 0.81%| 2.06%| 10| +0.03% | 0.22| -0.54 | | TPCH(1) | TPCH-Q16 | parquet / none / none | 0.17 | 0.17| -0.38% | 0.67%| 1.58%| 10| -0.01% | -0.13 | -0.70 | | TPCH(1) | TPCH-Q8 | parquet / none / none | 0.27 | 0.27| -0.08% | 1.24%| 1.15%| 10| -0.33% | -1.37 | -0.15 | | TPCH(1) | TPCH-Q15 | parquet / none / none | 0.16 | 0.16| -1.18% | * 16.61% * | * 10.25% * | 10| +0.33% | 0.40| -0.19 | | TPCH(1) | TPCH-Q1 | parquet / none / none | 0.22 | 0.22| -1.67% | 1.62%| 7.45%| 10| +0.43% | 1.02| -0.70 | | TPCH(1) | TPCH-Q5 | parquet / none / none | 0.22 | 0.22| -0.98% | 0.22%|
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 8: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 8 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 19 Dec 2023 00:34:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 8 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 18 Dec 2023 20:06:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 8: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10069/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 8 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 18 Dec 2023 20:06:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14766/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 8 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 18 Dec 2023 08:16:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. IMPALA-11805: Use llvm ObjectCache for codegen caching Currently, we employ llvm::ExecutionEngine for codegen caching, providing access to compiled functions within the cached engine. However, the real challenge is the ExecutionEngine uses a lot of memory which largely exceeds our memory estimates and it is very hard to predict. This patch addresses this issue by using llvm::ObjectCache for codegen caching. In our case, each execution engine would have only one module, and after the compilation of the module, the compiled codegened functions of the module would be set to the execution engine, therefore functions could be used by Impala. During function compilation within the module, if an ObjectCache is set to the execution engine, the compiled codegened functions would be also written into the cache. This way, if we keep the cache, when revisiting the same module (fragment), we can efficiently reuse the specific ObjectCache, loading pre-compiled codegened functions and saving time. The tpch performance test indicates no significant regression compared to the previous use of ExecutionEngine. Post-change, the actual memory usage of each codegen caching entry is notably reduced. +--+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +--+---+-++++ | TPCH(1) | parquet / none / none | 0.22| -0.65% | 0.20 | -0.75% | +--+---+-++++ +--+--+---++-++++---++-+---+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval | +--+--+---++-++++---++-+---+ | TPCH(1) | TPCH-Q13 | parquet / none / none | 0.49 | 0.47| +2.80% | 5.32%| 5.07%| 10| +1.22% | 1.63| 1.19 | | TPCH(1) | TPCH-Q4 | parquet / none / none | 0.16 | 0.16| +3.51% | 1.32%| * 10.38% * | 10| +0.06% | 0.49| 1.06 | | TPCH(1) | TPCH-Q11 | parquet / none / none | 0.12 | 0.12| +1.39% | 2.27%| 2.24%| 10| +1.50% | 1.90| 1.37 | | TPCH(1) | TPCH-Q19 | parquet / none / none | 0.21 | 0.21| +1.56% | * 10.02% * | * 11.42% * | 10| +1.18% | 0.57| 0.32 | | TPCH(1) | TPCH-Q18 | parquet / none / none | 0.27 | 0.27| +1.71% | 6.46%| 1.29%| 10| -0.19% | -1.19 | 0.81 | | TPCH(1) | TPCH-Q6 | parquet / none / none | 0.11 | 0.11| +0.79% | 2.76%| 2.15%| 10| +0.10% | 1.46| 0.71 | | TPCH(1) | TPCH-Q3 | parquet / none / none | 0.26 | 0.26| +0.71% | 6.63%| 6.18%| 10| +0.04% | 0.49| 0.25 | | TPCH(1) | TPCH-Q17 | parquet / none / none | 0.17 | 0.17| +0.41% | * 14.66% * | * 13.01% * | 10| +0.05% | 0.40| 0.07 | | TPCH(1) | TPCH-Q14 | parquet / none / none | 0.16 | 0.16| +0.19% | 1.41%| 1.39%| 10| +0.25% | 1.46| 0.31 | | TPCH(1) | TPCH-Q20 | parquet / none / none | 0.17 | 0.17| +0.22% | 1.70%| 1.77%| 10| -0.05% | -0.40 | 0.28 | | TPCH(1) | TPCH-Q12 | parquet / none / none | 0.16 | 0.16| -0.27% | 0.54%| 1.46%| 10| +0.14% | 0.93| -0.54 | | TPCH(1) | TPCH-Q22 | parquet / none / none | 0.11 | 0.11| -0.38% | 0.81%| 2.06%| 10| +0.03% | 0.22| -0.54 | | TPCH(1) | TPCH-Q16 | parquet / none / none | 0.17 | 0.17| -0.38% | 0.67%| 1.58%| 10| -0.01% | -0.13 | -0.70 | | TPCH(1) | TPCH-Q8 | parquet / none / none | 0.27 | 0.27| -0.08% | 1.24%| 1.15%| 10| -0.33% | -1.37 | -0.15 | | TPCH(1) | TPCH-Q15 | parquet / none / none | 0.16 | 0.16| -1.18% | * 16.61% * | * 10.25% * | 10| +0.33% | 0.40| -0.19 | | TPCH(1) | TPCH-Q1 | parquet / none / none | 0.22 | 0.22| -1.67% | 1.62%| 7.45%| 10| +0.43% | 1.02| -0.70 | | TPCH(1) | TPCH-Q5 | parquet / none / none | 0.22 | 0.22| -0.98% | 0.22%| 1.55%|
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 7: (10 comments) Thanks Yida! http://gerrit.cloudera.org:8080/#/c/20733/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20733/7//COMMIT_MSG@17 PS7, Line 17: after Could add "and" before "after". http://gerrit.cloudera.org:8080/#/c/20733/7//COMMIT_MSG@19 PS7, Line 19: funtions Nit: funCtions http://gerrit.cloudera.org:8080/#/c/20733/7/be/src/codegen/llvm-codegen-cache-test.cc File be/src/codegen/llvm-codegen-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/20733/7/be/src/codegen/llvm-codegen-cache-test.cc@37 PS7, Line 37: // The object cache size for an engine containing a single compiled Could include that it is in bytes. http://gerrit.cloudera.org:8080/#/c/20733/7/be/src/codegen/llvm-codegen-cache-test.cc@41 PS7, Line 41: _LARGE Are there tests with small (non-large) capacity? Otherwise it doesn't need to be included in the name. http://gerrit.cloudera.org:8080/#/c/20733/7/be/src/codegen/llvm-codegen-cache-test.cc@50 PS7, Line 50: // Using single shard makes the logic of scenarios simple for capacity and : // eviction-related behavior. : FLAGS_cache_force_single_shard = true; Duplicate lines from L46. http://gerrit.cloudera.org:8080/#/c/20733/7/be/src/codegen/llvm-codegen-cache-test.cc@114 PS7, Line 114: shared_ptr CreateObjCache( I don't think the arguments should have default values. There is only one place where this function is called without arguments, it could explicitly pass (false, nullptr). http://gerrit.cloudera.org:8080/#/c/20733/7/be/src/codegen/llvm-codegen-cache-test.cc@271 PS7, Line 271: codegen This could also be 'codegen_echo'. http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc File be/src/codegen/llvm-codegen-object-cache.cc: http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@57 PS4, Line 57: // ID. If the ID doesn't match the one in the object cache, a warning is logged for > From the definition of the interface getObject(), "Returns a pointer to a n I haven't checked what MemoryBuffer objects are, but is it true that we've already copied the buffer into 'cached_obj_'? Or who owns it? Is it possible that the reference we give out will outlive this object? If so, we should put the buffer behind a shared pointer, if not, then we could safely give out references to it. I'm not familiar with how these objects can be used in LLVM but always copying the whole module buffer seems wasteful. http://gerrit.cloudera.org:8080/#/c/20733/7/be/src/codegen/llvm-codegen-object-cache.cc File be/src/codegen/llvm-codegen-object-cache.cc: http://gerrit.cloudera.org:8080/#/c/20733/7/be/src/codegen/llvm-codegen-object-cache.cc@39 PS7, Line 39: << ObjBuffer.getBuffer().size(); Some context could be provided, like "size: " and the unit (bytes?). Applies also to L62. http://gerrit.cloudera.org:8080/#/c/20733/7/be/src/codegen/llvm-codegen-object-cache.cc@56 PS7, Line 56: object cache Isn't it the "cached object" instead? See also on the next line. -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 7 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 15 Dec 2023 12:23:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 7: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 7 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 12 Dec 2023 19:26:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14657/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 7 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 12 Dec 2023 06:32:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 7: (1 comment) Rebased and updated the commit message over the perf A-B result. http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@28 PS3, Line 28: compared to the previous use of ExecutionEngine. Post-change, > https://jenkins.impala.io/job/perf-AB-test-ub2004/29 for example shows Thanks Michael. Yeah, that makes sense. Reran the perf A-B test in https://jenkins.impala.io/job/perf-AB-test-ub2004/33/. And added below to the commit message. +--+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +--+---+-++++ | TPCH(1) | parquet / none / none | 0.22| -0.65% | 0.20 | -0.75% | +--+---+-++++ -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 7 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 12 Dec 2023 06:06:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. IMPALA-11805: Use llvm ObjectCache for codegen caching Currently, we employ llvm::ExecutionEngine for codegen caching, providing access to compiled functions within the cached engine. However, the real challenge is the ExecutionEngine uses a lot of memory which largely exceeds our memory estimates and it is very hard to predict. This patch addresses this issue by using llvm::ObjectCache for codegen caching. In our case, each execution engine would have only one module, after the compilation of the module, the compiled codegened functions of the module would be set to the execution engine, therefore funtions could be used by Impala. During function compilation within the module, if an ObjectCache is set to the execution engine, the compiled codegened functions would be also written into the cache. This way, if we keep the cache, when revisiting the same module (fragment), we can efficiently reuse the specific ObjectCache, loading pre-compiled codegened functions and saving time. The tpch performance test indicates no significant regression compared to the previous use of ExecutionEngine. Post-change, the actual memory usage of each codegen caching entry is notably reduced. +--+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +--+---+-++++ | TPCH(1) | parquet / none / none | 0.22| -0.65% | 0.20 | -0.75% | +--+---+-++++ +--+--+---++-++++---++-+---+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval | +--+--+---++-++++---++-+---+ | TPCH(1) | TPCH-Q13 | parquet / none / none | 0.49 | 0.47| +2.80% | 5.32%| 5.07%| 10| +1.22% | 1.63| 1.19 | | TPCH(1) | TPCH-Q4 | parquet / none / none | 0.16 | 0.16| +3.51% | 1.32%| * 10.38% * | 10| +0.06% | 0.49| 1.06 | | TPCH(1) | TPCH-Q11 | parquet / none / none | 0.12 | 0.12| +1.39% | 2.27%| 2.24%| 10| +1.50% | 1.90| 1.37 | | TPCH(1) | TPCH-Q19 | parquet / none / none | 0.21 | 0.21| +1.56% | * 10.02% * | * 11.42% * | 10| +1.18% | 0.57| 0.32 | | TPCH(1) | TPCH-Q18 | parquet / none / none | 0.27 | 0.27| +1.71% | 6.46%| 1.29%| 10| -0.19% | -1.19 | 0.81 | | TPCH(1) | TPCH-Q6 | parquet / none / none | 0.11 | 0.11| +0.79% | 2.76%| 2.15%| 10| +0.10% | 1.46| 0.71 | | TPCH(1) | TPCH-Q3 | parquet / none / none | 0.26 | 0.26| +0.71% | 6.63%| 6.18%| 10| +0.04% | 0.49| 0.25 | | TPCH(1) | TPCH-Q17 | parquet / none / none | 0.17 | 0.17| +0.41% | * 14.66% * | * 13.01% * | 10| +0.05% | 0.40| 0.07 | | TPCH(1) | TPCH-Q14 | parquet / none / none | 0.16 | 0.16| +0.19% | 1.41%| 1.39%| 10| +0.25% | 1.46| 0.31 | | TPCH(1) | TPCH-Q20 | parquet / none / none | 0.17 | 0.17| +0.22% | 1.70%| 1.77%| 10| -0.05% | -0.40 | 0.28 | | TPCH(1) | TPCH-Q12 | parquet / none / none | 0.16 | 0.16| -0.27% | 0.54%| 1.46%| 10| +0.14% | 0.93| -0.54 | | TPCH(1) | TPCH-Q22 | parquet / none / none | 0.11 | 0.11| -0.38% | 0.81%| 2.06%| 10| +0.03% | 0.22| -0.54 | | TPCH(1) | TPCH-Q16 | parquet / none / none | 0.17 | 0.17| -0.38% | 0.67%| 1.58%| 10| -0.01% | -0.13 | -0.70 | | TPCH(1) | TPCH-Q8 | parquet / none / none | 0.27 | 0.27| -0.08% | 1.24%| 1.15%| 10| -0.33% | -1.37 | -0.15 | | TPCH(1) | TPCH-Q15 | parquet / none / none | 0.16 | 0.16| -1.18% | * 16.61% * | * 10.25% * | 10| +0.33% | 0.40| -0.19 | | TPCH(1) | TPCH-Q1 | parquet / none / none | 0.22 | 0.22| -1.67% | 1.62%| 7.45%| 10| +0.43% | 1.02| -0.70 | | TPCH(1) | TPCH-Q5 | parquet / none / none | 0.22 | 0.22| -0.98% | 0.22%| 1.55%| 10
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@28 PS3, Line 28: compared to the previous use of ExecutionEngine. Post-change, > Double checked, but it doesn't seem to have the grand total of our benchmar https://jenkins.impala.io/job/perf-AB-test-ub2004/29 for example shows +---+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +---+---+-++++ | TPCDS(20) | parquet / none / none | 1.23| -1.89% | 0.54 | -6.53% | +---+---+-++++ I guess that's added some other way? -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 6 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 11 Dec 2023 18:04:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@28 PS3, Line 28: compared to the previous use of ExecutionEngine. Post-change, > Doesn't the benchmarking framework print a grand total? If not, we can let Double checked, but it doesn't seem to have the grand total of our benchmarking framework. Filed IMPALA-12615 for it. -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 6 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 11 Dec 2023 03:18:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14616/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 6 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 08 Dec 2023 00:55:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 6: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 6 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 08 Dec 2023 00:34:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/20733/5/be/src/codegen/llvm-codegen-cache-test.cc File be/src/codegen/llvm-codegen-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/20733/5/be/src/codegen/llvm-codegen-cache-test.cc@39 PS5, Line 39: const int ENGINE_CACHE_SIZE = 990; > This is larger than the value that was here before. Was the test failing un Yeah, this is the real number of the cache containing one compiled function. The earlier count was misleading as the cache was empty. I've adjusted the testcase, and added LlvmCodeGenCacheTest::CheckObjCacheExists to confirm the existence of contents within the cache. http://gerrit.cloudera.org:8080/#/c/20733/5/be/src/codegen/llvm-codegen-cache-test.cc@176 PS5, Line 176: // The function is to create and return a CodeGenObjectCache which contains a specific > nit: "contains" not "contians" Done -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 6 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 08 Dec 2023 00:28:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. IMPALA-11805: Use llvm ObjectCache for codegen caching Currently, we employ llvm::ExecutionEngine for codegen caching, providing access to compiled functions within the cached engine. However, the real challenge is the ExecutionEngine uses a lot of memory which largely exceeds our memory estimates and it is very hard to predict. This patch addresses this issue by using llvm::ObjectCache for codegen caching. In our case, each execution engine would have only one module, after the compilation of the module, the compiled codegened functions of the module would be set to the execution engine, therefore funtions could be used by Impala. During function compilation within the module, if an ObjectCache is set to the execution engine, the compiled codegened functions would be also written into the cache. This way, if we keep the cache, when revisiting the same module (fragment), we can efficiently reuse the specific ObjectCache, loading pre-compiled codegened functions and saving time. The tpch performance test indicates no significant regression compared to the previous use of ExecutionEngine. Post-change, the actual memory usage of each codegen caching entry is notably reduced. +--+--+---++-++---++---++-+---+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval | +--+--+---++-++---++---++-+---+ | TPCH(1) | TPCH-Q15 | parquet / none / none | 0.48 | 0.46| +5.16% | 4.31% | 4.78%| 8 | +3.51% | 2.11| 2.22 | | TPCH(1) | TPCH-Q17 | parquet / none / none | 1.28 | 1.25| +1.76% | 1.83% | 1.88%| 8 | +3.53% | 1.34| 1.88 | | TPCH(1) | TPCH-Q8 | parquet / none / none | 0.77 | 0.75| +3.11% | 3.88% | 2.75%| 8 | +0.79% | 0.83| 1.81 | | TPCH(1) | TPCH-Q7 | parquet / none / none | 0.72 | 0.71| +1.89% | 3.48% | 3.15%| 8 | +1.61% | 0.83| 1.13 | | TPCH(1) | TPCH-Q22 | parquet / none / none | 0.43 | 0.41| +3.40% | 1.21% | 6.11%| 8 | +0.05% | 0.45| 1.54 | | TPCH(1) | TPCH-Q10 | parquet / none / none | 0.74 | 0.73| +2.04% | 0.75% | 2.57%| 8 | +1.13% | 2.11| 2.16 | | TPCH(1) | TPCH-Q20 | parquet / none / none | 0.55 | 0.54| +1.42% | 2.95% | 1.14%| 8 | +0.19% | 0.57| 1.25 | | TPCH(1) | TPCH-Q11 | parquet / none / none | 0.25 | 0.25| +0.37% | 2.41% | 3.09%| 8 | +0.82% | 0.45| 0.27 | | TPCH(1) | TPCH-Q19 | parquet / none / none | 0.62 | 0.61| +0.92% | 3.17% | 1.53%| 8 | +0.05% | 0.19| 0.73 | | TPCH(1) | TPCH-Q4 | parquet / none / none | 0.42 | 0.42| +0.37% | 1.24% | 1.08%| 8 | +0.03% | 0.19| 0.63 | | TPCH(1) | TPCH-Q5 | parquet / none / none | 0.72 | 0.72| +0.19% | 3.08% | 3.23%| 8 | +0.07% | 0.57| 0.12 | | TPCH(1) | TPCH-Q18 | parquet / none / none | 1.20 | 1.19| +0.04% | 0.12% | 0.15%| 8 | +0.03% | 0.57| 0.53 | | TPCH(1) | TPCH-Q9 | parquet / none / none | 1.13 | 1.13| +0.02% | 2.05% | 2.04%| 8 | -0.01% | -0.57 | 0.02 | | TPCH(1) | TPCH-Q3 | parquet / none / none | 0.60 | 0.60| +0.03% | 3.71% | 3.99%| 8 | -0.04% | -0.06 | 0.01 | | TPCH(1) | TPCH-Q2 | parquet / none / none | 0.44 | 0.44| -0.70% | 8.56% | 7.42%| 8 | +0.61% | 0.45| -0.18 | | TPCH(1) | TPCH-Q6 | parquet / none / none | 0.31 | 0.31| -0.07% | 1.73% | 1.56%| 8 | -0.06% | -1.34 | -0.09 | | TPCH(1) | TPCH-Q14 | parquet / none / none | 0.48 | 0.48| -0.25% | 1.20% | 1.47%| 8 | -0.09% | -0.57 | -0.37 | | TPCH(1) | TPCH-Q12 | parquet / none / none | 0.54 | 0.54| -0.86% | 1.90% | 4.48%| 8 | +0.11% | 0.32| -0.50 | | TPCH(1) | TPCH-Q21 | parquet / none / none | 1.35 | 1.37| -1.56% | 3.11% | 2.55%| 8 | -0.19% | -0.70 | -1.11 | | TPCH(1) | TPCH-Q16 | parquet / none / none | 0.26 | 0.27| -1.74% | 9.53% | * 13.61% * | 8
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/20733/5/be/src/codegen/llvm-codegen-cache-test.cc File be/src/codegen/llvm-codegen-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/20733/5/be/src/codegen/llvm-codegen-cache-test.cc@39 PS5, Line 39: const int ENGINE_CACHE_SIZE = 990; This is larger than the value that was here before. Was the test failing until this patch set? http://gerrit.cloudera.org:8080/#/c/20733/5/be/src/codegen/llvm-codegen-cache-test.cc@176 PS5, Line 176: // The function to create and return a CodeGenObjectCache which contians a specific nit: "contains" not "contians" -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 5 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 08 Dec 2023 00:07:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14611/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 5 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 07 Dec 2023 13:52:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/20733/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20733/4//COMMIT_MSG@17 PS4, Line 17: after the compila > Could you clarify the relationship of the module and exec engine here? I.e. Yes. Added some comments. -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 5 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 07 Dec 2023 13:38:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 5: (15 comments) http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@28 PS3, Line 28: compared to the previous use of ExecutionEngine. Post-change, > Doesn't the benchmarking framework print a grand total? If not, we can let It seems our benchmarking framework doesn't print the total. Will double check and file a jira if we don't have it. http://gerrit.cloudera.org:8080/#/c/20733/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20733/4//COMMIT_MSG@18 PS4, Line 18: the executi > Why are they pre-compiled? Aren't they compiled already? Or you mean they a I think the "pre-compiled" means it is compiled in the last round and ready for use. Yeah, maybe call it "compiled" is better, because in this case it means the first round to write the compiled stuff into the cache. http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.h File be/src/codegen/llvm-codegen-cache.h: http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.h@164 PS4, Line 164: cached_engine > Thinking about this, wouldn't 'cached_engine...' express the meaning better Done http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.cc File be/src/codegen/llvm-codegen-cache.cc: http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.cc@155 PS4, Line 155: y > I think removing the negation and inverting the branches is better, one les Done http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc File be/src/codegen/llvm-codegen-object-cache.cc: http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@39 PS4, Line 39: uffer().size(); > Is a copy necessary here? Can't we move the buffer somehow? CodeGenObjectCache doesn't have the ownership of ObjBuffer, the llvm engine may use the ObjBuffer later, so it is safer to make a copy. (And MemoryBufferRef may not allow to move the real Buffer, the example I can see is to do a copy) http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@42 PS4, Line 42: // can conserve memory. > Shouldn't we add a DCHECK here? If this happens it is a bug. Done http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@57 PS4, Line 57: // ID. If the ID doesn't match the one in the object cache, a warning is logged for > Will it always copy the buffer? Is there a way to access it without copying >From the definition of the interface getObject(), "Returns a pointer to a >newly allocated MemoryBuffer that contains the object which corresponds with >Module M", it would be safer to return the copy of the buffer. >https://llvm.org/doxygen/classllvm_1_1ObjectCache.html#details http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@60 PS4, Line 60: std::lock_guard l(mutex_); > Shouldn't we add a DCHECK here? If this happens it is a bug. Done http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@63 PS4, Line 63: << cached_obj_->getMemBufferRef().getBufferSize(); > Can this function be called when 'key_' is empty? Isn't that a bug? If there is no cache yet, would return nullptr, for example, the llvm engine would look up the cache to see if it contains the pre-compiled module, if it doesn't contain any, the llvm engine would proceed to compile the module and then write compiled module to the cache. This happens in cache look up and doesn't hit any. http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h@330 PS4, Line 330: > See comment on llvm-codegen-cache.h:164. Changed the one in llvm-codegen-cache.h, but would prefer to keep this, because this one is the cache for llvm engine to write the compiled functions to, it doesn't have to be anything "cached". Added some comments. http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h@925 PS4, Line 925: > Do I understand it correctly that this is for writing to the cache (storing Yes, we can say that the engine_cache_ is for writing, and engine_cache_cached_ is for reading. Added some comments. http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h@929 PS4, Line 929: > We could extend the comment with what is written on llvm-codegen.cc:1499, i Added some comments. http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.cc@1192 PS4,
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. IMPALA-11805: Use llvm ObjectCache for codegen caching Currently, we employ llvm::ExecutionEngine for codegen caching, providing access to compiled functions within the cached engine. However, the real challenge is the ExecutionEngine uses a lot of memory which largely exceeds our memory estimates and it is very hard to predict. This patch addresses this issue by using llvm::ObjectCache for codegen caching. In our case, each execution engine would have only one module, after the compilation of the module, the compiled codegened functions of the module would be set to the execution engine, therefore funtions could be used by Impala. During function compilation within the module, if an ObjectCache is set to the execution engine, the compiled codegened functions would be also written into the cache. This way, if we keep the cache, when revisiting the same module (fragment), we can efficiently reuse the specific ObjectCache, loading pre-compiled codegened functions and saving time. The tpch performance test indicates no significant regression compared to the previous use of ExecutionEngine. Post-change, the actual memory usage of each codegen caching entry is notably reduced. +--+--+---++-++---++---++-+---+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval | +--+--+---++-++---++---++-+---+ | TPCH(1) | TPCH-Q15 | parquet / none / none | 0.48 | 0.46| +5.16% | 4.31% | 4.78%| 8 | +3.51% | 2.11| 2.22 | | TPCH(1) | TPCH-Q17 | parquet / none / none | 1.28 | 1.25| +1.76% | 1.83% | 1.88%| 8 | +3.53% | 1.34| 1.88 | | TPCH(1) | TPCH-Q8 | parquet / none / none | 0.77 | 0.75| +3.11% | 3.88% | 2.75%| 8 | +0.79% | 0.83| 1.81 | | TPCH(1) | TPCH-Q7 | parquet / none / none | 0.72 | 0.71| +1.89% | 3.48% | 3.15%| 8 | +1.61% | 0.83| 1.13 | | TPCH(1) | TPCH-Q22 | parquet / none / none | 0.43 | 0.41| +3.40% | 1.21% | 6.11%| 8 | +0.05% | 0.45| 1.54 | | TPCH(1) | TPCH-Q10 | parquet / none / none | 0.74 | 0.73| +2.04% | 0.75% | 2.57%| 8 | +1.13% | 2.11| 2.16 | | TPCH(1) | TPCH-Q20 | parquet / none / none | 0.55 | 0.54| +1.42% | 2.95% | 1.14%| 8 | +0.19% | 0.57| 1.25 | | TPCH(1) | TPCH-Q11 | parquet / none / none | 0.25 | 0.25| +0.37% | 2.41% | 3.09%| 8 | +0.82% | 0.45| 0.27 | | TPCH(1) | TPCH-Q19 | parquet / none / none | 0.62 | 0.61| +0.92% | 3.17% | 1.53%| 8 | +0.05% | 0.19| 0.73 | | TPCH(1) | TPCH-Q4 | parquet / none / none | 0.42 | 0.42| +0.37% | 1.24% | 1.08%| 8 | +0.03% | 0.19| 0.63 | | TPCH(1) | TPCH-Q5 | parquet / none / none | 0.72 | 0.72| +0.19% | 3.08% | 3.23%| 8 | +0.07% | 0.57| 0.12 | | TPCH(1) | TPCH-Q18 | parquet / none / none | 1.20 | 1.19| +0.04% | 0.12% | 0.15%| 8 | +0.03% | 0.57| 0.53 | | TPCH(1) | TPCH-Q9 | parquet / none / none | 1.13 | 1.13| +0.02% | 2.05% | 2.04%| 8 | -0.01% | -0.57 | 0.02 | | TPCH(1) | TPCH-Q3 | parquet / none / none | 0.60 | 0.60| +0.03% | 3.71% | 3.99%| 8 | -0.04% | -0.06 | 0.01 | | TPCH(1) | TPCH-Q2 | parquet / none / none | 0.44 | 0.44| -0.70% | 8.56% | 7.42%| 8 | +0.61% | 0.45| -0.18 | | TPCH(1) | TPCH-Q6 | parquet / none / none | 0.31 | 0.31| -0.07% | 1.73% | 1.56%| 8 | -0.06% | -1.34 | -0.09 | | TPCH(1) | TPCH-Q14 | parquet / none / none | 0.48 | 0.48| -0.25% | 1.20% | 1.47%| 8 | -0.09% | -0.57 | -0.37 | | TPCH(1) | TPCH-Q12 | parquet / none / none | 0.54 | 0.54| -0.86% | 1.90% | 4.48%| 8 | +0.11% | 0.32| -0.50 | | TPCH(1) | TPCH-Q21 | parquet / none / none | 1.35 | 1.37| -1.56% | 3.11% | 2.55%| 8 | -0.19% | -0.70 | -1.11 | | TPCH(1) | TPCH-Q16 | parquet / none / none | 0.26 | 0.27| -1.74% | 9.53% | * 13.61% * | 8
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 4: (16 comments) http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@28 PS3, Line 28: > Do you mean run the time running all the tpch queries for once? Did a test, Doesn't the benchmarking framework print a grand total? If not, we can let it be like this. http://gerrit.cloudera.org:8080/#/c/20733/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20733/4//COMMIT_MSG@17 PS4, Line 17: (execution engine) Could you clarify the relationship of the module and exec engine here? I.e. one execution engine corresponds to exactly one module - is it correct? http://gerrit.cloudera.org:8080/#/c/20733/4//COMMIT_MSG@18 PS4, Line 18: pre-compiled Why are they pre-compiled? Aren't they compiled already? Or you mean they are not compiled to machine code? Then "partially compiled" or "optimized" may be better. http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.h File be/src/codegen/llvm-codegen-cache.h: http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.h@164 PS4, Line 164: engine_cache_ Thinking about this, wouldn't 'cached_engine...' express the meaning better? This is not a cache of multiple engines but a cached instance of an engine, or even better, its functions. Maybe 'cached_module...'? This applies also to the other variables named similarly in this file. http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.cc File be/src/codegen/llvm-codegen-cache.cc: http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.cc@155 PS4, Line 155: ! I think removing the negation and inverting the branches is better, one less negation to process mentally. See also L179. http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc File be/src/codegen/llvm-codegen-object-cache.cc: http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@39 PS4, Line 39: getMemBufferCopy Is a copy necessary here? Can't we move the buffer somehow? http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@42 PS4, Line 42: LOG(WARNING) << "Inserting a different module in CodeGenObjectCache " << this Shouldn't we add a DCHECK here? If this happens it is a bug. http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@57 PS4, Line 57: return llvm::MemoryBuffer::getMemBufferCopy( Will it always copy the buffer? Is there a way to access it without copying? http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@60 PS4, Line 60: LOG(WARNING) << "Object for module id " << module_id << " can't be loaded from cache." Shouldn't we add a DCHECK here? If this happens it is a bug. http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@63 PS4, Line 63: return nullptr; Can this function be called when 'key_' is empty? Isn't that a bug? http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h@330 PS4, Line 330: engine_cache See comment on llvm-codegen-cache.h:164. http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h@925 PS4, Line 925: /// Object cache for engine. Do I understand it correctly that this is for writing to the cache (storing the module we compiled) and 'engine_cache_cached_' is for reading (when we retrieve a module from the cache)? It could be clarified in the comment(s). http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h@929 PS4, Line 929: /// Not null only when there is a cache hit. We could extend the comment with what is written on llvm-codegen.cc:1499, i.e. that we need the shared_ptr so that the lifetime of this object doesn't end even if it is evicted from the cache while we're using it. http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.cc@1192 PS4, Line 1192: entire execution engine We no longer cache the engine. http://gerrit.cloudera.org:8080/#/c/20733/4/tests/custom_cluster/test_codegen_cache.py File tests/custom_cluster/test_codegen_cache.py: http://gerrit.cloudera.org:8080/#/c/20733/4/tests/custom_cluster/test_codegen_cache.py@169 PS4, Line 169: When an 'llvm::ExecutionEngine', which is part of the cache entry, is destroyed, it The situation described in this paragraph is no longer true after this change. We should write that this is how it used to work and also
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache-test.cc File be/src/codegen/llvm-codegen-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache-test.cc@298 PS2, Line 298: void LlvmCodeGenCacheTest::TestAtCapacity(TCodeGenCacheMode::type mode) { > Did a test in both arm and x86. In our testcase of mem_charge_1, the mem ch Ack http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@a1185 PS2, Line 1185: > Done Ack http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.h File be/src/codegen/llvm-object-cache-wrapper.h: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.h@44 PS2, Line 44: > Done Ack http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.cc File be/src/codegen/llvm-object-cache-wrapper.cc: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.cc@66 PS2, Line 66: > Yeah, this is a good point. Added here. Ack -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 4 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 04 Dec 2023 18:45:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 4: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 4 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 04 Dec 2023 18:44:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 4: (12 comments) Thanks Daniel for reviewing. http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@12 PS3, Line 12: which largely exceeds our memo > I think this should come before "which largely exceeds...", because "which Done http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@17 PS3, Line 17: > Nit: space before '('. Done http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@18 PS3, Line 18: pre-compiled : codegened > Do you mean the codegened functions? First I thought it referred to the fun Yeah, changed it to pre-compiled codegened functions. http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@28 PS3, Line 28: > Could you also include the aggregate result over all queries? Do you mean run the time running all the tpch queries for once? Did a test, run only once for all the TPCH queries, the result of TPCH-Q1 for Execution Engine looks suspicious (maybe because it is the first of all to run), but all others look similar. Query Execution EngineObject Cache Q1 4 1.68 Q10 1.521.67 Q11 0.8 0.8 Q12 0.680.58 Q13 1.091.29 Q14 0.820.82 Q15 0.890.88 Q16 0.8 0.75 Q17 1.091.09 Q18 1.151.25 Q19 2.072.11 Q2 1.361.36 Q20 1.051.06 Q21 1.911.91 Q22 0.630.63 Q3 0.940.93 Q4 0.480.47 Q5 1.451.45 Q6 0.260.26 Q7 2.2 2.15 Q8 1.811.81 Q9 1.851.85 Total 28.85 26.8 http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc File be/src/codegen/llvm-codegen-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@99 PS3, Line 99: dule_id, bool > We could include the parameter names here and on the next line if 'codegen' Done http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@115 PS3, Line 115: > Maybe 'cached_engine' would be better, also on L117. Done http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@132 PS3, Line 132: > I'd prefer something like 'codegen_obj_cache_', it's easier to get what it Done http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@138 PS3, Line 138: > Could it be const? Done http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@142 PS3, Line 142: > Could use 'nullptr' instead. Done http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@225 PS3, Line 225: te a Llvm > 'module_id_echo' would be better, cf. 'module_id_double'. Done http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache.h File be/src/codegen/llvm-codegen-cache.h: http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache.h@231 PS3, Line 231: engine_cache_raw > Maybe 'engine_cache_raw[_ptr]' would be better. Done http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache.cc File be/src/codegen/llvm-codegen-cache.cc: http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache.cc@154 PS3, Line 154: Slice key; > The assignment could be made in an IF-ELSE (or a ? : operator). Now we get/ Done -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 4 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Sat, 02 Dec 2023 13:15:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14575/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 4 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Sat, 02 Dec 2023 13:08:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. IMPALA-11805: Use llvm ObjectCache for codegen caching Currently, we employ llvm::ExecutionEngine for codegen caching, providing access to compiled functions within the cached engine. However, the real challenge is the ExecutionEngine uses a lot of memory which largely exceeds our memory estimates and it is very hard to predict. This patch addresses this issue by using llvm::ObjectCache for codegen caching. The approach involves associating an ObjectCache with the module (execution engine). During function compilation within the module, if an ObjectCache is set, the pre-compiled codegened functions are written into the cache. This way, when revisiting the same module (fragment), we can efficiently reuse the specific ObjectCache, loading pre-compiled codegened functions and saving time. The tpch performance test indicates no significant regression compared to the previous use of ExecutionEngine. Post-change, the actual memory usage of each codegen caching entry is notably reduced. +--+--+---++-++---++---++-+---+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval | +--+--+---++-++---++---++-+---+ | TPCH(1) | TPCH-Q15 | parquet / none / none | 0.48 | 0.46| +5.16% | 4.31% | 4.78%| 8 | +3.51% | 2.11| 2.22 | | TPCH(1) | TPCH-Q17 | parquet / none / none | 1.28 | 1.25| +1.76% | 1.83% | 1.88%| 8 | +3.53% | 1.34| 1.88 | | TPCH(1) | TPCH-Q8 | parquet / none / none | 0.77 | 0.75| +3.11% | 3.88% | 2.75%| 8 | +0.79% | 0.83| 1.81 | | TPCH(1) | TPCH-Q7 | parquet / none / none | 0.72 | 0.71| +1.89% | 3.48% | 3.15%| 8 | +1.61% | 0.83| 1.13 | | TPCH(1) | TPCH-Q22 | parquet / none / none | 0.43 | 0.41| +3.40% | 1.21% | 6.11%| 8 | +0.05% | 0.45| 1.54 | | TPCH(1) | TPCH-Q10 | parquet / none / none | 0.74 | 0.73| +2.04% | 0.75% | 2.57%| 8 | +1.13% | 2.11| 2.16 | | TPCH(1) | TPCH-Q20 | parquet / none / none | 0.55 | 0.54| +1.42% | 2.95% | 1.14%| 8 | +0.19% | 0.57| 1.25 | | TPCH(1) | TPCH-Q11 | parquet / none / none | 0.25 | 0.25| +0.37% | 2.41% | 3.09%| 8 | +0.82% | 0.45| 0.27 | | TPCH(1) | TPCH-Q19 | parquet / none / none | 0.62 | 0.61| +0.92% | 3.17% | 1.53%| 8 | +0.05% | 0.19| 0.73 | | TPCH(1) | TPCH-Q4 | parquet / none / none | 0.42 | 0.42| +0.37% | 1.24% | 1.08%| 8 | +0.03% | 0.19| 0.63 | | TPCH(1) | TPCH-Q5 | parquet / none / none | 0.72 | 0.72| +0.19% | 3.08% | 3.23%| 8 | +0.07% | 0.57| 0.12 | | TPCH(1) | TPCH-Q18 | parquet / none / none | 1.20 | 1.19| +0.04% | 0.12% | 0.15%| 8 | +0.03% | 0.57| 0.53 | | TPCH(1) | TPCH-Q9 | parquet / none / none | 1.13 | 1.13| +0.02% | 2.05% | 2.04%| 8 | -0.01% | -0.57 | 0.02 | | TPCH(1) | TPCH-Q3 | parquet / none / none | 0.60 | 0.60| +0.03% | 3.71% | 3.99%| 8 | -0.04% | -0.06 | 0.01 | | TPCH(1) | TPCH-Q2 | parquet / none / none | 0.44 | 0.44| -0.70% | 8.56% | 7.42%| 8 | +0.61% | 0.45| -0.18 | | TPCH(1) | TPCH-Q6 | parquet / none / none | 0.31 | 0.31| -0.07% | 1.73% | 1.56%| 8 | -0.06% | -1.34 | -0.09 | | TPCH(1) | TPCH-Q14 | parquet / none / none | 0.48 | 0.48| -0.25% | 1.20% | 1.47%| 8 | -0.09% | -0.57 | -0.37 | | TPCH(1) | TPCH-Q12 | parquet / none / none | 0.54 | 0.54| -0.86% | 1.90% | 4.48%| 8 | +0.11% | 0.32| -0.50 | | TPCH(1) | TPCH-Q21 | parquet / none / none | 1.35 | 1.37| -1.56% | 3.11% | 2.55%| 8 | -0.19% | -0.70 | -1.11 | | TPCH(1) | TPCH-Q16 | parquet / none / none | 0.26 | 0.27| -1.74% | 9.53% | * 13.61% * | 8 | -0.32% | -0.06 | -0.30 | | TPCH(1) | TPCH-Q1 | parquet / none / none | 0.64 | 0.67| -4.64% | 1.13% | * 12.59% * | 8 | -0.28% | -0.19 | -1.04 | |
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 3: (12 comments) Couldn't go through all of it yet, here are some comments. http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@12 PS3, Line 12: and it is very hard to predict I think this should come before "which largely exceeds...", because "which largely exceeds..." refers to "uses a lot of memory". http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@17 PS3, Line 17: ( Nit: space before '('. Could you clarify the relationship of the module and exec engine here? http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@18 PS3, Line 18: pre-compiled : functions Do you mean the codegened functions? First I thought it referred to the functions compiled from C++ to IR, maybe it could be clarified. See also L21. http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@28 PS3, Line 28: +--+--+---++-++---++---++-+---+ Could you also include the aggregate result over all queries? http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc File be/src/codegen/llvm-codegen-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@99 PS3, Line 99: string*, bool We could include the parameter names here and on the next line if 'codegen' is included... I can see that on L101 the name of the 'Function**' parameter is omitted, but on L98 all names are included. This should be uniform, I think the best would be to add the name on L101 too. http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@115 PS3, Line 115: engine Maybe 'cached_engine' would be better, also on L117. http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@132 PS3, Line 132: obj_cache_ I'd prefer something like 'codegen_obj_cache_', it's easier to get what it is just by looking at the name. http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@138 PS3, Line 138: string& Could it be const? http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@142 PS3, Line 142: NULL Could use 'nullptr' instead. http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@225 PS3, Line 225: module_id 'module_id_echo' would be better, cf. 'module_id_double'. http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache.h File be/src/codegen/llvm-codegen-cache.h: http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache.h@231 PS3, Line 231: engine_cache_ptr Maybe 'engine_cache_raw[_ptr]' would be better. Also, in the .cc file, the parameter names of this function have not been updated, please update them. http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache.cc File be/src/codegen/llvm-codegen-cache.cc: http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache.cc@154 PS3, Line 154: Slice key = cache_key.optimal_key_slice(); The assignment could be made in an IF-ELSE (or a ? : operator). Now we get/calculate the optimal key slice even in non-optimal mode. -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 3 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 30 Nov 2023 16:19:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache-test.cc File be/src/codegen/llvm-codegen-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache-test.cc@298 PS2, Line 298: // 172B for optimal mode, or 180B on ARM systems > Oh, I meant the largest difference I saw between normal and optimal mode is Did a test in both arm and x86. In our testcase of mem_charge_1, the mem charge in arm is always 8B larger than x86 in normal and optimal mode. For mem_charge_1, arm 188B/180B, x86 180B/172B. The current size setting should be good for testing, and I can pass the test in arm. http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@1385 PS2, Line 1385: > So bytes_allocated is still needed, which means we still want https://gerri >From the source code and gdb debugging that I can see, the bytes_allocated >will be used when loading from the pre-compiled object. >https://llvm.org/doxygen/MCJIT_8cpp_source.html#l00224. While the manage usage >of pre-compiled object is currently managed by the codegen cache if codegen >cache is enabled, the bytes_allocated is managed by the runtime execution >engine, I think bytes_allocated is still useful for runtime use no matter >codegen cache is on or off. -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 3 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 30 Nov 2023 10:00:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 3: (9 comments) http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache-test.cc File be/src/codegen/llvm-codegen-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache-test.cc@298 PS2, Line 298: // 172B for optimal mode, or 180B on ARM systems > Adjusted the value after adding sizeof(CodeGenObjectCache). Oh, I meant the largest difference I saw between normal and optimal mode is 8B. We should double-check the difference between x86 and aarch64. http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache.h File be/src/codegen/llvm-codegen-cache.h: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache.h@308 PS2, Line 308: std::unordered_map> > When creating a temporary shared_ptr from the raw pointer, there might be a Ack http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h@33 PS2, Line 33: #include > Done Ack http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h@353 PS2, Line 353: /// retrieval. If module_id is not nullptr, the final module id is returned. > Done Ack http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h@920 PS2, Line 920: llvm::Module* module_; > Done Ack http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@496 PS2, Line 496: engine_cache_ = make_shared(); > Done Ack http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@498 PS2, Line 498: RETURN_IF_ERROR(LoadIntrinsics()); > Done Ack http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@499 PS2, Line 499: > Done Ack http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@1385 PS2, Line 1385: > Done So bytes_allocated is still needed, which means we still want https://gerrit.cloudera.org/c/20697/ as well. -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 3 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Wed, 29 Nov 2023 16:36:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 3: (12 comments) Thanks Michael for the review. http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache-test.cc File be/src/codegen/llvm-codegen-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache-test.cc@298 PS2, Line 298: // 172B for optimal mode, or 180B on ARM systems > I experimented with this. In practice I saw minimal (<= 8B) difference betw Adjusted the value after adding sizeof(CodeGenObjectCache). http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache.h File be/src/codegen/llvm-codegen-cache.h: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache.h@308 PS2, Line 308: std::unordered_map> > This would also work as a std::unordered_set. https://en.cppreference.com/w When creating a temporary shared_ptr from the raw pointer, there might be a double-free problem. I'm uncertain if there's a more optimal solution. However, considering the cache entry is typically over 1k bytes while the 8-byte pointer is relatively small, it doesn't seem like a significant concern in this context. http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h@33 PS2, Line 33: #include > What's this used for? Done http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h@353 PS2, Line 353: /// retrieval. If module_id is not nullptr, the final module id is returned. > Can you expand what module_id is used for. Would help to identify when to p Done http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h@920 PS2, Line 920: llvm::Module* module_; > engine_cache_wrapper no longer owns an execution engine, so this comment is Done http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@a1185 PS2, Line 1185: > I thought this comment was kind of useful. And I see you still use it at li Done http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@496 PS2, Line 496: engine_cache_ = make_shared(); > Why initialize it this way? You could also just pass "new CodegenObjectCach Done http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@498 PS2, Line 498: RETURN_IF_ERROR(LoadIntrinsics()); > You could directly assign to execution_engine_ on line 479. Lots of extra m Done http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@499 PS2, Line 499: > Why not assign directly to symbol_emitter_? Done http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@1385 PS2, Line 1385: > Is this still right? Shouldn't it be the object cache size, since that's wh Done http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.h File be/src/codegen/llvm-object-cache-wrapper.h: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.h@44 PS2, Line 44: > possibly other objects seems weird as there aren't other objects. Do we rea Done http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.cc File be/src/codegen/llvm-object-cache-wrapper.cc: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.cc@66 PS2, Line 66: > Is sizeof(CodeGenObjectCache) accounted for elsewhere? Yeah, this is a good point. Added here. -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 3 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Wed, 29 Nov 2023 14:37:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14548/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 3 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Wed, 29 Nov 2023 14:33:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. IMPALA-11805: Use llvm ObjectCache for codegen caching Currently, we employ llvm::ExecutionEngine for codegen caching, providing access to compiled functions within the cached engine. However, the real challenge is the ExecutionEngine uses a lot of memory and it is very hard to predict which largely exceeds our memory estimates. This patch addresses this issue by using llvm::ObjectCache for codegen caching. The approach involves associating an ObjectCache with the module(execution engine). During function compilation within the module, if an ObjectCache is set, the pre-compiled functions are written into the cache. This way, when revisiting the same module (fragment), we can efficiently reuse the specific ObjectCache, loading pre-compiled functions and saving time. The tpch performance test indicates no significant regression compared to the previous use of ExecutionEngine. Post-change, the actual memory usage of each codegen caching entry is notably reduced. +--+--+---++-++---++---++-+---+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval | +--+--+---++-++---++---++-+---+ | TPCH(1) | TPCH-Q15 | parquet / none / none | 0.48 | 0.46| +5.16% | 4.31% | 4.78%| 8 | +3.51% | 2.11| 2.22 | | TPCH(1) | TPCH-Q17 | parquet / none / none | 1.28 | 1.25| +1.76% | 1.83% | 1.88%| 8 | +3.53% | 1.34| 1.88 | | TPCH(1) | TPCH-Q8 | parquet / none / none | 0.77 | 0.75| +3.11% | 3.88% | 2.75%| 8 | +0.79% | 0.83| 1.81 | | TPCH(1) | TPCH-Q7 | parquet / none / none | 0.72 | 0.71| +1.89% | 3.48% | 3.15%| 8 | +1.61% | 0.83| 1.13 | | TPCH(1) | TPCH-Q22 | parquet / none / none | 0.43 | 0.41| +3.40% | 1.21% | 6.11%| 8 | +0.05% | 0.45| 1.54 | | TPCH(1) | TPCH-Q10 | parquet / none / none | 0.74 | 0.73| +2.04% | 0.75% | 2.57%| 8 | +1.13% | 2.11| 2.16 | | TPCH(1) | TPCH-Q20 | parquet / none / none | 0.55 | 0.54| +1.42% | 2.95% | 1.14%| 8 | +0.19% | 0.57| 1.25 | | TPCH(1) | TPCH-Q11 | parquet / none / none | 0.25 | 0.25| +0.37% | 2.41% | 3.09%| 8 | +0.82% | 0.45| 0.27 | | TPCH(1) | TPCH-Q19 | parquet / none / none | 0.62 | 0.61| +0.92% | 3.17% | 1.53%| 8 | +0.05% | 0.19| 0.73 | | TPCH(1) | TPCH-Q4 | parquet / none / none | 0.42 | 0.42| +0.37% | 1.24% | 1.08%| 8 | +0.03% | 0.19| 0.63 | | TPCH(1) | TPCH-Q5 | parquet / none / none | 0.72 | 0.72| +0.19% | 3.08% | 3.23%| 8 | +0.07% | 0.57| 0.12 | | TPCH(1) | TPCH-Q18 | parquet / none / none | 1.20 | 1.19| +0.04% | 0.12% | 0.15%| 8 | +0.03% | 0.57| 0.53 | | TPCH(1) | TPCH-Q9 | parquet / none / none | 1.13 | 1.13| +0.02% | 2.05% | 2.04%| 8 | -0.01% | -0.57 | 0.02 | | TPCH(1) | TPCH-Q3 | parquet / none / none | 0.60 | 0.60| +0.03% | 3.71% | 3.99%| 8 | -0.04% | -0.06 | 0.01 | | TPCH(1) | TPCH-Q2 | parquet / none / none | 0.44 | 0.44| -0.70% | 8.56% | 7.42%| 8 | +0.61% | 0.45| -0.18 | | TPCH(1) | TPCH-Q6 | parquet / none / none | 0.31 | 0.31| -0.07% | 1.73% | 1.56%| 8 | -0.06% | -1.34 | -0.09 | | TPCH(1) | TPCH-Q14 | parquet / none / none | 0.48 | 0.48| -0.25% | 1.20% | 1.47%| 8 | -0.09% | -0.57 | -0.37 | | TPCH(1) | TPCH-Q12 | parquet / none / none | 0.54 | 0.54| -0.86% | 1.90% | 4.48%| 8 | +0.11% | 0.32| -0.50 | | TPCH(1) | TPCH-Q21 | parquet / none / none | 1.35 | 1.37| -1.56% | 3.11% | 2.55%| 8 | -0.19% | -0.70 | -1.11 | | TPCH(1) | TPCH-Q16 | parquet / none / none | 0.26 | 0.27| -1.74% | 9.53% | * 13.61% * | 8 | -0.32% | -0.06 | -0.30 | | TPCH(1) | TPCH-Q1 | parquet / none / none | 0.64 | 0.67| -4.64% | 1.13% | * 12.59% * | 8 | -0.28% | -0.19 | -1.04 | | TPCH(1) | TPCH-Q13 |
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@1385 PS2, Line 1385: if (!mem_tracker_->TryConsume(bytes_allocated)) { Is this still right? Shouldn't it be the object cache size, since that's what we're retaining? I would expect bytes_allocated() to be removed in this patch. -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 2 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Tue, 28 Nov 2023 18:22:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 2: (11 comments) http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache-test.cc File be/src/codegen/llvm-codegen-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache-test.cc@298 PS2, Line 298: // 130B for optimal mode, or 148B on ARM systems I experimented with this. In practice I saw minimal (<= 8B) difference between normal and optimal modes. http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache.h File be/src/codegen/llvm-codegen-cache.h: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache.h@308 PS2, Line 308: std::unordered_map> This would also work as a std::unordered_set. https://en.cppreference.com/w/cpp/memory/shared_ptr/hash means the lookup would be based on pointer value. Lookups would construct a temporary shared_ptr, but it would reduce memory use. http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h@33 PS2, Line 33: #include What's this used for? http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h@353 PS2, Line 353: /// Returns the module id if module_id is not nullptr. Can you expand what module_id is used for. Would help to identify when to pass a non-null argument. http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h@920 PS2, Line 920: /// engine_cache_wrapper_. engine_cache_wrapper no longer owns an execution engine, so this comment is incorrect. Should be module_ is owned by execution_engine_. http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@a1185 PS2, Line 1185: I thought this comment was kind of useful. And I see you still use it at line 1207, so we should keep them all. http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@496 PS2, Line 496: unique_ptr obj_cache(new CodeGenObjectCache); Why initialize it this way? You could also just pass "new CodegenObjectCache" on line 503. http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@498 PS2, Line 498: execution_engine_ = std::move(execution_engine); You could directly assign to execution_engine_ on line 479. Lots of extra moves here for no additional use or clarity. http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@499 PS2, Line 499: unique_ptr symbol_emitter = Why not assign directly to symbol_emitter_? http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.h File be/src/codegen/llvm-object-cache-wrapper.h: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.h@44 PS2, Line 44: /// Class that encapsulates a non-NULL 'CodeGenObjectCache' and possibly other objects possibly other objects seems weird as there aren't other objects. Do we really need this wrapper? http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.cc File be/src/codegen/llvm-object-cache-wrapper.cc: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.cc@66 PS2, Line 66: size_t CodeGenObjectCache::objSize() { Is sizeof(CodeGenObjectCache) accounted for elsewhere? -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 2 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Tue, 28 Nov 2023 18:19:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14526/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 2 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 28 Nov 2023 07:35:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. IMPALA-11805: Use llvm ObjectCache for codegen caching Currently, we employ llvm::ExecutionEngine for codegen caching, providing access to compiled functions within the cached engine. However, the real challenge is the ExecutionEngine uses a lot of memory and it is very hard to predict which largely exceeds our memory estimates. This patch addresses this issue by using llvm::ObjectCache for codegen caching. The approach involves associating an ObjectCache with the module(execution engine). During function compilation within the module, if an ObjectCache is set, the pre-compiled functions are written into the cache. This way, when revisiting the same module (fragment), we can efficiently reuse the specific ObjectCache, loading pre-compiled functions and saving time. The tpch performance test indicates no significant regression compared to the previous use of ExecutionEngine. Post-change, the actual memory usage of each codegen caching entry is notably reduced. +--+--+---++-++---++---++-+---+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval | +--+--+---++-++---++---++-+---+ | TPCH(1) | TPCH-Q15 | parquet / none / none | 0.48 | 0.46| +5.16% | 4.31% | 4.78%| 8 | +3.51% | 2.11| 2.22 | | TPCH(1) | TPCH-Q17 | parquet / none / none | 1.28 | 1.25| +1.76% | 1.83% | 1.88%| 8 | +3.53% | 1.34| 1.88 | | TPCH(1) | TPCH-Q8 | parquet / none / none | 0.77 | 0.75| +3.11% | 3.88% | 2.75%| 8 | +0.79% | 0.83| 1.81 | | TPCH(1) | TPCH-Q7 | parquet / none / none | 0.72 | 0.71| +1.89% | 3.48% | 3.15%| 8 | +1.61% | 0.83| 1.13 | | TPCH(1) | TPCH-Q22 | parquet / none / none | 0.43 | 0.41| +3.40% | 1.21% | 6.11%| 8 | +0.05% | 0.45| 1.54 | | TPCH(1) | TPCH-Q10 | parquet / none / none | 0.74 | 0.73| +2.04% | 0.75% | 2.57%| 8 | +1.13% | 2.11| 2.16 | | TPCH(1) | TPCH-Q20 | parquet / none / none | 0.55 | 0.54| +1.42% | 2.95% | 1.14%| 8 | +0.19% | 0.57| 1.25 | | TPCH(1) | TPCH-Q11 | parquet / none / none | 0.25 | 0.25| +0.37% | 2.41% | 3.09%| 8 | +0.82% | 0.45| 0.27 | | TPCH(1) | TPCH-Q19 | parquet / none / none | 0.62 | 0.61| +0.92% | 3.17% | 1.53%| 8 | +0.05% | 0.19| 0.73 | | TPCH(1) | TPCH-Q4 | parquet / none / none | 0.42 | 0.42| +0.37% | 1.24% | 1.08%| 8 | +0.03% | 0.19| 0.63 | | TPCH(1) | TPCH-Q5 | parquet / none / none | 0.72 | 0.72| +0.19% | 3.08% | 3.23%| 8 | +0.07% | 0.57| 0.12 | | TPCH(1) | TPCH-Q18 | parquet / none / none | 1.20 | 1.19| +0.04% | 0.12% | 0.15%| 8 | +0.03% | 0.57| 0.53 | | TPCH(1) | TPCH-Q9 | parquet / none / none | 1.13 | 1.13| +0.02% | 2.05% | 2.04%| 8 | -0.01% | -0.57 | 0.02 | | TPCH(1) | TPCH-Q3 | parquet / none / none | 0.60 | 0.60| +0.03% | 3.71% | 3.99%| 8 | -0.04% | -0.06 | 0.01 | | TPCH(1) | TPCH-Q2 | parquet / none / none | 0.44 | 0.44| -0.70% | 8.56% | 7.42%| 8 | +0.61% | 0.45| -0.18 | | TPCH(1) | TPCH-Q6 | parquet / none / none | 0.31 | 0.31| -0.07% | 1.73% | 1.56%| 8 | -0.06% | -1.34 | -0.09 | | TPCH(1) | TPCH-Q14 | parquet / none / none | 0.48 | 0.48| -0.25% | 1.20% | 1.47%| 8 | -0.09% | -0.57 | -0.37 | | TPCH(1) | TPCH-Q12 | parquet / none / none | 0.54 | 0.54| -0.86% | 1.90% | 4.48%| 8 | +0.11% | 0.32| -0.50 | | TPCH(1) | TPCH-Q21 | parquet / none / none | 1.35 | 1.37| -1.56% | 3.11% | 2.55%| 8 | -0.19% | -0.70 | -1.11 | | TPCH(1) | TPCH-Q16 | parquet / none / none | 0.26 | 0.27| -1.74% | 9.53% | * 13.61% * | 8 | -0.32% | -0.06 | -0.30 | | TPCH(1) | TPCH-Q1 | parquet / none / none | 0.64 | 0.67| -4.64% | 1.13% | * 12.59% * | 8 | -0.28% | -0.19 | -1.04 | | TPCH(1) | TPCH-Q13 |
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 1: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/14525/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 1 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 28 Nov 2023 06:44:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20733 Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. IMPALA-11805: Use llvm ObjectCache for codegen caching Currently, we employ llvm::ExecutionEngine for codegen caching, providing access to compiled functions within the cached engine. However, the real challenge is the ExecutionEngine uses a lot of memory and it is very hard to predict which largely exceeds our memory estimates. This patch addresses this issue by using llvm::ObjectCache for codegen caching. The approach involves associating an ObjectCache with the module(execution engine). During function compilation within the module, if an ObjectCache is set, the pre-compiled functions are written into the cache. This way, when revisiting the same module (fragment), we can efficiently reuse the specific ObjectCache, loading pre-compiled functions and saving time. The tpch performance test indicates no significant regression compared to the previous use of ExecutionEngine. Post-change, the actual memory usage of each codegen caching entry is notably reduced. +--+--+---++-++---++---++-+---+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval | +--+--+---++-++---++---++-+---+ | TPCH(1) | TPCH-Q15 | parquet / none / none | 0.48 | 0.46| +5.16% | 4.31% | 4.78%| 8 | +3.51% | 2.11| 2.22 | | TPCH(1) | TPCH-Q17 | parquet / none / none | 1.28 | 1.25| +1.76% | 1.83% | 1.88%| 8 | +3.53% | 1.34| 1.88 | | TPCH(1) | TPCH-Q8 | parquet / none / none | 0.77 | 0.75| +3.11% | 3.88% | 2.75%| 8 | +0.79% | 0.83| 1.81 | | TPCH(1) | TPCH-Q7 | parquet / none / none | 0.72 | 0.71| +1.89% | 3.48% | 3.15%| 8 | +1.61% | 0.83| 1.13 | | TPCH(1) | TPCH-Q22 | parquet / none / none | 0.43 | 0.41| +3.40% | 1.21% | 6.11%| 8 | +0.05% | 0.45| 1.54 | | TPCH(1) | TPCH-Q10 | parquet / none / none | 0.74 | 0.73| +2.04% | 0.75% | 2.57%| 8 | +1.13% | 2.11| 2.16 | | TPCH(1) | TPCH-Q20 | parquet / none / none | 0.55 | 0.54| +1.42% | 2.95% | 1.14%| 8 | +0.19% | 0.57| 1.25 | | TPCH(1) | TPCH-Q11 | parquet / none / none | 0.25 | 0.25| +0.37% | 2.41% | 3.09%| 8 | +0.82% | 0.45| 0.27 | | TPCH(1) | TPCH-Q19 | parquet / none / none | 0.62 | 0.61| +0.92% | 3.17% | 1.53%| 8 | +0.05% | 0.19| 0.73 | | TPCH(1) | TPCH-Q4 | parquet / none / none | 0.42 | 0.42| +0.37% | 1.24% | 1.08%| 8 | +0.03% | 0.19| 0.63 | | TPCH(1) | TPCH-Q5 | parquet / none / none | 0.72 | 0.72| +0.19% | 3.08% | 3.23%| 8 | +0.07% | 0.57| 0.12 | | TPCH(1) | TPCH-Q18 | parquet / none / none | 1.20 | 1.19| +0.04% | 0.12% | 0.15%| 8 | +0.03% | 0.57| 0.53 | | TPCH(1) | TPCH-Q9 | parquet / none / none | 1.13 | 1.13| +0.02% | 2.05% | 2.04%| 8 | -0.01% | -0.57 | 0.02 | | TPCH(1) | TPCH-Q3 | parquet / none / none | 0.60 | 0.60| +0.03% | 3.71% | 3.99%| 8 | -0.04% | -0.06 | 0.01 | | TPCH(1) | TPCH-Q2 | parquet / none / none | 0.44 | 0.44| -0.70% | 8.56% | 7.42%| 8 | +0.61% | 0.45| -0.18 | | TPCH(1) | TPCH-Q6 | parquet / none / none | 0.31 | 0.31| -0.07% | 1.73% | 1.56%| 8 | -0.06% | -1.34 | -0.09 | | TPCH(1) | TPCH-Q14 | parquet / none / none | 0.48 | 0.48| -0.25% | 1.20% | 1.47%| 8 | -0.09% | -0.57 | -0.37 | | TPCH(1) | TPCH-Q12 | parquet / none / none | 0.54 | 0.54| -0.86% | 1.90% | 4.48%| 8 | +0.11% | 0.32| -0.50 | | TPCH(1) | TPCH-Q21 | parquet / none / none | 1.35 | 1.37| -1.56% | 3.11% | 2.55%| 8 | -0.19% | -0.70 | -1.11 | | TPCH(1) | TPCH-Q16 | parquet / none / none | 0.26 | 0.27| -1.74% | 9.53% | * 13.61% * | 8 | -0.32% | -0.06 | -0.30 | | TPCH(1) | TPCH-Q1 | parquet / none / none | 0.64 | 0.67| -4.64% | 1.13% | * 12.59% * | 8 | -0.28% | -0.19 | -1.04 | | TPCH(1) | TPCH-Q13
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/20733/1/be/src/codegen/llvm-object-cache-wrapper.cc File be/src/codegen/llvm-object-cache-wrapper.cc: http://gerrit.cloudera.org:8080/#/c/20733/1/be/src/codegen/llvm-object-cache-wrapper.cc@50 PS1, Line 50: // Return a copy of the object cache containing the compiled module with the specified ID. line too long (92 > 90) -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 1 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 28 Nov 2023 06:17:54 + Gerrit-HasComments: Yes