[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-19 Thread Michael Smith (Code Review)
Michael Smith has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..

IMPALA-11805: Use llvm ObjectCache for codegen caching

Currently, we employ llvm::ExecutionEngine for codegen caching,
providing access to compiled functions within the cached engine.
However, the real challenge is the ExecutionEngine uses a lot of
memory which largely exceeds our memory estimates and it is very
hard to predict.

This patch addresses this issue by using llvm::ObjectCache for
codegen caching. In our case, each execution engine would have
only one module, and after the compilation of the module, the
compiled codegened functions of the module would be set to the
execution engine, therefore functions could be used by Impala.
During function compilation within the module, if an ObjectCache
is set to the execution engine, the compiled codegened functions
would be also written into the cache. This way, if we keep the
cache, when revisiting the same module (fragment), we can
efficiently reuse the specific ObjectCache, loading pre-compiled
codegened functions and saving time.

The tpch performance test indicates no significant regression
compared to the previous use of ExecutionEngine. Post-change,
the actual memory usage of each codegen caching entry is notably
reduced.

+--+---+-++++
| Workload | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
+--+---+-++++
| TPCH(1)  | parquet / none / none | 0.22| -0.65% | 0.20   | -0.75% 
|
+--+---+-++++
+--+--+---++-++++---++-+---+
| Workload | Query| File Format   | Avg(s) | Base Avg(s) | 
Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | 
Tval  |
+--+--+---++-++++---++-+---+
| TPCH(1)  | TPCH-Q13 | parquet / none / none | 0.49   | 0.47|   +2.80% 
  |   5.32%|   5.07%| 10|   +1.22%   | 1.63| 1.19  |
| TPCH(1)  | TPCH-Q4  | parquet / none / none | 0.16   | 0.16|   +3.51% 
  |   1.32%| * 10.38% * | 10|   +0.06%   | 0.49| 1.06  |
| TPCH(1)  | TPCH-Q11 | parquet / none / none | 0.12   | 0.12|   +1.39% 
  |   2.27%|   2.24%| 10|   +1.50%   | 1.90| 1.37  |
| TPCH(1)  | TPCH-Q19 | parquet / none / none | 0.21   | 0.21|   +1.56% 
  | * 10.02% * | * 11.42% * | 10|   +1.18%   | 0.57| 0.32  |
| TPCH(1)  | TPCH-Q18 | parquet / none / none | 0.27   | 0.27|   +1.71% 
  |   6.46%|   1.29%| 10|   -0.19%   | -1.19   | 0.81  |
| TPCH(1)  | TPCH-Q6  | parquet / none / none | 0.11   | 0.11|   +0.79% 
  |   2.76%|   2.15%| 10|   +0.10%   | 1.46| 0.71  |
| TPCH(1)  | TPCH-Q3  | parquet / none / none | 0.26   | 0.26|   +0.71% 
  |   6.63%|   6.18%| 10|   +0.04%   | 0.49| 0.25  |
| TPCH(1)  | TPCH-Q17 | parquet / none / none | 0.17   | 0.17|   +0.41% 
  | * 14.66% * | * 13.01% * | 10|   +0.05%   | 0.40| 0.07  |
| TPCH(1)  | TPCH-Q14 | parquet / none / none | 0.16   | 0.16|   +0.19% 
  |   1.41%|   1.39%| 10|   +0.25%   | 1.46| 0.31  |
| TPCH(1)  | TPCH-Q20 | parquet / none / none | 0.17   | 0.17|   +0.22% 
  |   1.70%|   1.77%| 10|   -0.05%   | -0.40   | 0.28  |
| TPCH(1)  | TPCH-Q12 | parquet / none / none | 0.16   | 0.16|   -0.27% 
  |   0.54%|   1.46%| 10|   +0.14%   | 0.93| -0.54 |
| TPCH(1)  | TPCH-Q22 | parquet / none / none | 0.11   | 0.11|   -0.38% 
  |   0.81%|   2.06%| 10|   +0.03%   | 0.22| -0.54 |
| TPCH(1)  | TPCH-Q16 | parquet / none / none | 0.17   | 0.17|   -0.38% 
  |   0.67%|   1.58%| 10|   -0.01%   | -0.13   | -0.70 |
| TPCH(1)  | TPCH-Q8  | parquet / none / none | 0.27   | 0.27|   -0.08% 
  |   1.24%|   1.15%| 10|   -0.33%   | -1.37   | -0.15 |
| TPCH(1)  | TPCH-Q15 | parquet / none / none | 0.16   | 0.16|   -1.18% 
  | * 16.61% * | * 10.25% * | 10|   +0.33%   | 0.40| -0.19 |
| TPCH(1)  | TPCH-Q1  | parquet / none / none | 0.22   | 0.22|   -1.67% 
  |   1.62%|   7.45%| 10|   +0.43%   | 1.02| -0.70 |
| TPCH(1)  | TPCH-Q5  | parquet / none / none | 0.22   | 0.22|   -0.98% 
  |   0.22%|   

[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 8: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 8
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Tue, 19 Dec 2023 00:34:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-18 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 8: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 8
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Mon, 18 Dec 2023 20:06:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10069/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 8
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Mon, 18 Dec 2023 20:06:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14766/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 8
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Mon, 18 Dec 2023 08:16:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-17 Thread Yida Wu (Code Review)
Yida Wu has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..

IMPALA-11805: Use llvm ObjectCache for codegen caching

Currently, we employ llvm::ExecutionEngine for codegen caching,
providing access to compiled functions within the cached engine.
However, the real challenge is the ExecutionEngine uses a lot of
memory which largely exceeds our memory estimates and it is very
hard to predict.

This patch addresses this issue by using llvm::ObjectCache for
codegen caching. In our case, each execution engine would have
only one module, and after the compilation of the module, the
compiled codegened functions of the module would be set to the
execution engine, therefore functions could be used by Impala.
During function compilation within the module, if an ObjectCache
is set to the execution engine, the compiled codegened functions
would be also written into the cache. This way, if we keep the
cache, when revisiting the same module (fragment), we can
efficiently reuse the specific ObjectCache, loading pre-compiled
codegened functions and saving time.

The tpch performance test indicates no significant regression
compared to the previous use of ExecutionEngine. Post-change,
the actual memory usage of each codegen caching entry is notably
reduced.

+--+---+-++++
| Workload | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
+--+---+-++++
| TPCH(1)  | parquet / none / none | 0.22| -0.65% | 0.20   | -0.75% 
|
+--+---+-++++
+--+--+---++-++++---++-+---+
| Workload | Query| File Format   | Avg(s) | Base Avg(s) | 
Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | 
Tval  |
+--+--+---++-++++---++-+---+
| TPCH(1)  | TPCH-Q13 | parquet / none / none | 0.49   | 0.47|   +2.80% 
  |   5.32%|   5.07%| 10|   +1.22%   | 1.63| 1.19  |
| TPCH(1)  | TPCH-Q4  | parquet / none / none | 0.16   | 0.16|   +3.51% 
  |   1.32%| * 10.38% * | 10|   +0.06%   | 0.49| 1.06  |
| TPCH(1)  | TPCH-Q11 | parquet / none / none | 0.12   | 0.12|   +1.39% 
  |   2.27%|   2.24%| 10|   +1.50%   | 1.90| 1.37  |
| TPCH(1)  | TPCH-Q19 | parquet / none / none | 0.21   | 0.21|   +1.56% 
  | * 10.02% * | * 11.42% * | 10|   +1.18%   | 0.57| 0.32  |
| TPCH(1)  | TPCH-Q18 | parquet / none / none | 0.27   | 0.27|   +1.71% 
  |   6.46%|   1.29%| 10|   -0.19%   | -1.19   | 0.81  |
| TPCH(1)  | TPCH-Q6  | parquet / none / none | 0.11   | 0.11|   +0.79% 
  |   2.76%|   2.15%| 10|   +0.10%   | 1.46| 0.71  |
| TPCH(1)  | TPCH-Q3  | parquet / none / none | 0.26   | 0.26|   +0.71% 
  |   6.63%|   6.18%| 10|   +0.04%   | 0.49| 0.25  |
| TPCH(1)  | TPCH-Q17 | parquet / none / none | 0.17   | 0.17|   +0.41% 
  | * 14.66% * | * 13.01% * | 10|   +0.05%   | 0.40| 0.07  |
| TPCH(1)  | TPCH-Q14 | parquet / none / none | 0.16   | 0.16|   +0.19% 
  |   1.41%|   1.39%| 10|   +0.25%   | 1.46| 0.31  |
| TPCH(1)  | TPCH-Q20 | parquet / none / none | 0.17   | 0.17|   +0.22% 
  |   1.70%|   1.77%| 10|   -0.05%   | -0.40   | 0.28  |
| TPCH(1)  | TPCH-Q12 | parquet / none / none | 0.16   | 0.16|   -0.27% 
  |   0.54%|   1.46%| 10|   +0.14%   | 0.93| -0.54 |
| TPCH(1)  | TPCH-Q22 | parquet / none / none | 0.11   | 0.11|   -0.38% 
  |   0.81%|   2.06%| 10|   +0.03%   | 0.22| -0.54 |
| TPCH(1)  | TPCH-Q16 | parquet / none / none | 0.17   | 0.17|   -0.38% 
  |   0.67%|   1.58%| 10|   -0.01%   | -0.13   | -0.70 |
| TPCH(1)  | TPCH-Q8  | parquet / none / none | 0.27   | 0.27|   -0.08% 
  |   1.24%|   1.15%| 10|   -0.33%   | -1.37   | -0.15 |
| TPCH(1)  | TPCH-Q15 | parquet / none / none | 0.16   | 0.16|   -1.18% 
  | * 16.61% * | * 10.25% * | 10|   +0.33%   | 0.40| -0.19 |
| TPCH(1)  | TPCH-Q1  | parquet / none / none | 0.22   | 0.22|   -1.67% 
  |   1.62%|   7.45%| 10|   +0.43%   | 1.02| -0.70 |
| TPCH(1)  | TPCH-Q5  | parquet / none / none | 0.22   | 0.22|   -0.98% 
  |   0.22%|   1.55%| 

[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-15 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 7:

(10 comments)

Thanks Yida!

http://gerrit.cloudera.org:8080/#/c/20733/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20733/7//COMMIT_MSG@17
PS7, Line 17: after
Could add "and" before "after".


http://gerrit.cloudera.org:8080/#/c/20733/7//COMMIT_MSG@19
PS7, Line 19: funtions
Nit: funCtions


http://gerrit.cloudera.org:8080/#/c/20733/7/be/src/codegen/llvm-codegen-cache-test.cc
File be/src/codegen/llvm-codegen-cache-test.cc:

http://gerrit.cloudera.org:8080/#/c/20733/7/be/src/codegen/llvm-codegen-cache-test.cc@37
PS7, Line 37: // The object cache size for an engine containing a single 
compiled
Could include that it is in bytes.


http://gerrit.cloudera.org:8080/#/c/20733/7/be/src/codegen/llvm-codegen-cache-test.cc@41
PS7, Line 41: _LARGE
Are there tests with small (non-large) capacity? Otherwise it doesn't need to 
be included in the name.


http://gerrit.cloudera.org:8080/#/c/20733/7/be/src/codegen/llvm-codegen-cache-test.cc@50
PS7, Line 50: // Using single shard makes the logic of scenarios simple for 
capacity and
: // eviction-related behavior.
: FLAGS_cache_force_single_shard = true;
Duplicate lines from L46.


http://gerrit.cloudera.org:8080/#/c/20733/7/be/src/codegen/llvm-codegen-cache-test.cc@114
PS7, Line 114:   shared_ptr CreateObjCache(
I don't think the arguments should have default values.
There is only one place where this function is called without arguments, it 
could explicitly pass (false, nullptr).


http://gerrit.cloudera.org:8080/#/c/20733/7/be/src/codegen/llvm-codegen-cache-test.cc@271
PS7, Line 271: codegen
This could also be 'codegen_echo'.


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc
File be/src/codegen/llvm-codegen-object-cache.cc:

http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@57
PS4, Line 57:   // ID. If the ID doesn't match the one in the object cache, a 
warning is logged for
> From the definition of the interface getObject(), "Returns a pointer to a n
I haven't checked what MemoryBuffer objects are, but is it true that we've 
already copied the buffer into 'cached_obj_'? Or who owns it? Is it possible 
that the reference we give out will outlive this object? If so, we should put 
the buffer behind a shared pointer, if not, then we could safely give out 
references to it.

I'm not familiar with how these objects can be used in LLVM but always copying 
the whole module buffer seems wasteful.


http://gerrit.cloudera.org:8080/#/c/20733/7/be/src/codegen/llvm-codegen-object-cache.cc
File be/src/codegen/llvm-codegen-object-cache.cc:

http://gerrit.cloudera.org:8080/#/c/20733/7/be/src/codegen/llvm-codegen-object-cache.cc@39
PS7, Line 39: << ObjBuffer.getBuffer().size();
Some context could be provided, like "size: " and the unit (bytes?). Applies 
also to L62.


http://gerrit.cloudera.org:8080/#/c/20733/7/be/src/codegen/llvm-codegen-object-cache.cc@56
PS7, Line 56: object cache
Isn't it the "cached object" instead? See also on the next line.



--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 7
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Fri, 15 Dec 2023 12:23:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-12 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 7: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 7
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Tue, 12 Dec 2023 19:26:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14657/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 7
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Tue, 12 Dec 2023 06:32:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-11 Thread Yida Wu (Code Review)
Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 7:

(1 comment)

Rebased and updated the commit message over the perf A-B result.

http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@28
PS3, Line 28: compared to the previous use of ExecutionEngine. Post-change,
> https://jenkins.impala.io/job/perf-AB-test-ub2004/29 for example shows
Thanks Michael. Yeah, that makes sense. Reran the perf A-B test in 
https://jenkins.impala.io/job/perf-AB-test-ub2004/33/. And added below to the 
commit message.
+--+---+-++++
| Workload | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
+--+---+-++++
| TPCH(1)  | parquet / none / none | 0.22| -0.65% | 0.20   | -0.75% 
|
+--+---+-++++



--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 7
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Tue, 12 Dec 2023 06:06:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-11 Thread Yida Wu (Code Review)
Yida Wu has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..

IMPALA-11805: Use llvm ObjectCache for codegen caching

Currently, we employ llvm::ExecutionEngine for codegen caching,
providing access to compiled functions within the cached engine.
However, the real challenge is the ExecutionEngine uses a lot of
memory which largely exceeds our memory estimates and it is very
hard to predict.

This patch addresses this issue by using llvm::ObjectCache for
codegen caching. In our case, each execution engine would have
only one module, after the compilation of the module, the compiled
codegened functions of the module would be set to the execution
engine, therefore funtions could be used by Impala. During function
compilation within the module, if an ObjectCache is set to the
execution engine, the compiled codegened functions would be also
written into the cache. This way, if we keep the cache, when
revisiting the same module (fragment), we can efficiently reuse
the specific ObjectCache, loading pre-compiled codegened functions
and saving time.

The tpch performance test indicates no significant regression
compared to the previous use of ExecutionEngine. Post-change,
the actual memory usage of each codegen caching entry is notably
reduced.

+--+---+-++++
| Workload | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
+--+---+-++++
| TPCH(1)  | parquet / none / none | 0.22| -0.65% | 0.20   | -0.75% 
|
+--+---+-++++
+--+--+---++-++++---++-+---+
| Workload | Query| File Format   | Avg(s) | Base Avg(s) | 
Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | 
Tval  |
+--+--+---++-++++---++-+---+
| TPCH(1)  | TPCH-Q13 | parquet / none / none | 0.49   | 0.47|   +2.80% 
  |   5.32%|   5.07%| 10|   +1.22%   | 1.63| 1.19  |
| TPCH(1)  | TPCH-Q4  | parquet / none / none | 0.16   | 0.16|   +3.51% 
  |   1.32%| * 10.38% * | 10|   +0.06%   | 0.49| 1.06  |
| TPCH(1)  | TPCH-Q11 | parquet / none / none | 0.12   | 0.12|   +1.39% 
  |   2.27%|   2.24%| 10|   +1.50%   | 1.90| 1.37  |
| TPCH(1)  | TPCH-Q19 | parquet / none / none | 0.21   | 0.21|   +1.56% 
  | * 10.02% * | * 11.42% * | 10|   +1.18%   | 0.57| 0.32  |
| TPCH(1)  | TPCH-Q18 | parquet / none / none | 0.27   | 0.27|   +1.71% 
  |   6.46%|   1.29%| 10|   -0.19%   | -1.19   | 0.81  |
| TPCH(1)  | TPCH-Q6  | parquet / none / none | 0.11   | 0.11|   +0.79% 
  |   2.76%|   2.15%| 10|   +0.10%   | 1.46| 0.71  |
| TPCH(1)  | TPCH-Q3  | parquet / none / none | 0.26   | 0.26|   +0.71% 
  |   6.63%|   6.18%| 10|   +0.04%   | 0.49| 0.25  |
| TPCH(1)  | TPCH-Q17 | parquet / none / none | 0.17   | 0.17|   +0.41% 
  | * 14.66% * | * 13.01% * | 10|   +0.05%   | 0.40| 0.07  |
| TPCH(1)  | TPCH-Q14 | parquet / none / none | 0.16   | 0.16|   +0.19% 
  |   1.41%|   1.39%| 10|   +0.25%   | 1.46| 0.31  |
| TPCH(1)  | TPCH-Q20 | parquet / none / none | 0.17   | 0.17|   +0.22% 
  |   1.70%|   1.77%| 10|   -0.05%   | -0.40   | 0.28  |
| TPCH(1)  | TPCH-Q12 | parquet / none / none | 0.16   | 0.16|   -0.27% 
  |   0.54%|   1.46%| 10|   +0.14%   | 0.93| -0.54 |
| TPCH(1)  | TPCH-Q22 | parquet / none / none | 0.11   | 0.11|   -0.38% 
  |   0.81%|   2.06%| 10|   +0.03%   | 0.22| -0.54 |
| TPCH(1)  | TPCH-Q16 | parquet / none / none | 0.17   | 0.17|   -0.38% 
  |   0.67%|   1.58%| 10|   -0.01%   | -0.13   | -0.70 |
| TPCH(1)  | TPCH-Q8  | parquet / none / none | 0.27   | 0.27|   -0.08% 
  |   1.24%|   1.15%| 10|   -0.33%   | -1.37   | -0.15 |
| TPCH(1)  | TPCH-Q15 | parquet / none / none | 0.16   | 0.16|   -1.18% 
  | * 16.61% * | * 10.25% * | 10|   +0.33%   | 0.40| -0.19 |
| TPCH(1)  | TPCH-Q1  | parquet / none / none | 0.22   | 0.22|   -1.67% 
  |   1.62%|   7.45%| 10|   +0.43%   | 1.02| -0.70 |
| TPCH(1)  | TPCH-Q5  | parquet / none / none | 0.22   | 0.22|   -0.98% 
  |   0.22%|   1.55%| 10

[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-11 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@28
PS3, Line 28: compared to the previous use of ExecutionEngine. Post-change,
> Double checked, but it doesn't seem to have the grand total of our benchmar
https://jenkins.impala.io/job/perf-AB-test-ub2004/29 for example shows

  
+---+---+-++++
  | Workload  | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
  
+---+---+-++++
  | TPCDS(20) | parquet / none / none | 1.23| -1.89% | 0.54   | 
-6.53% |
  
+---+---+-++++

I guess that's added some other way?



--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 6
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Mon, 11 Dec 2023 18:04:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-10 Thread Yida Wu (Code Review)
Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@28
PS3, Line 28: compared to the previous use of ExecutionEngine. Post-change,
> Doesn't the benchmarking framework print a grand total? If not, we can let
Double checked, but it doesn't seem to have the grand total of our benchmarking 
framework. Filed IMPALA-12615 for it.



--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 6
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Mon, 11 Dec 2023 03:18:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14616/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 6
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Fri, 08 Dec 2023 00:55:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-07 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 6: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 6
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Fri, 08 Dec 2023 00:34:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-07 Thread Yida Wu (Code Review)
Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20733/5/be/src/codegen/llvm-codegen-cache-test.cc
File be/src/codegen/llvm-codegen-cache-test.cc:

http://gerrit.cloudera.org:8080/#/c/20733/5/be/src/codegen/llvm-codegen-cache-test.cc@39
PS5, Line 39: const int ENGINE_CACHE_SIZE = 990;
> This is larger than the value that was here before. Was the test failing un
Yeah, this is the real number of the cache containing one compiled function. 
The earlier count was misleading as the cache was empty. I've adjusted the 
testcase, and added LlvmCodeGenCacheTest::CheckObjCacheExists to confirm the 
existence of contents within the cache.


http://gerrit.cloudera.org:8080/#/c/20733/5/be/src/codegen/llvm-codegen-cache-test.cc@176
PS5, Line 176: // The function is to create and return a CodeGenObjectCache 
which contains a specific
> nit: "contains" not "contians"
Done



--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 6
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Fri, 08 Dec 2023 00:28:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-07 Thread Yida Wu (Code Review)
Yida Wu has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..

IMPALA-11805: Use llvm ObjectCache for codegen caching

Currently, we employ llvm::ExecutionEngine for codegen caching,
providing access to compiled functions within the cached engine.
However, the real challenge is the ExecutionEngine uses a lot of
memory which largely exceeds our memory estimates and it is very
hard to predict.

This patch addresses this issue by using llvm::ObjectCache for
codegen caching. In our case, each execution engine would have
only one module, after the compilation of the module, the compiled
codegened functions of the module would be set to the execution
engine, therefore funtions could be used by Impala. During function
compilation within the module, if an ObjectCache is set to the
execution engine, the compiled codegened functions would be also
written into the cache. This way, if we keep the cache, when
revisiting the same module (fragment), we can efficiently reuse
the specific ObjectCache, loading pre-compiled codegened functions
and saving time.

The tpch performance test indicates no significant regression
compared to the previous use of ExecutionEngine. Post-change,
the actual memory usage of each codegen caching entry is notably
reduced.

+--+--+---++-++---++---++-+---+
| Workload | Query| File Format   | Avg(s) | Base Avg(s) | 
Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | 
Tval  |
+--+--+---++-++---++---++-+---+
| TPCH(1)  | TPCH-Q15 | parquet / none / none | 0.48   | 0.46|   +5.16% 
  |   4.31%   |   4.78%| 8 |   +3.51%   | 2.11| 2.22  |
| TPCH(1)  | TPCH-Q17 | parquet / none / none | 1.28   | 1.25|   +1.76% 
  |   1.83%   |   1.88%| 8 |   +3.53%   | 1.34| 1.88  |
| TPCH(1)  | TPCH-Q8  | parquet / none / none | 0.77   | 0.75|   +3.11% 
  |   3.88%   |   2.75%| 8 |   +0.79%   | 0.83| 1.81  |
| TPCH(1)  | TPCH-Q7  | parquet / none / none | 0.72   | 0.71|   +1.89% 
  |   3.48%   |   3.15%| 8 |   +1.61%   | 0.83| 1.13  |
| TPCH(1)  | TPCH-Q22 | parquet / none / none | 0.43   | 0.41|   +3.40% 
  |   1.21%   |   6.11%| 8 |   +0.05%   | 0.45| 1.54  |
| TPCH(1)  | TPCH-Q10 | parquet / none / none | 0.74   | 0.73|   +2.04% 
  |   0.75%   |   2.57%| 8 |   +1.13%   | 2.11| 2.16  |
| TPCH(1)  | TPCH-Q20 | parquet / none / none | 0.55   | 0.54|   +1.42% 
  |   2.95%   |   1.14%| 8 |   +0.19%   | 0.57| 1.25  |
| TPCH(1)  | TPCH-Q11 | parquet / none / none | 0.25   | 0.25|   +0.37% 
  |   2.41%   |   3.09%| 8 |   +0.82%   | 0.45| 0.27  |
| TPCH(1)  | TPCH-Q19 | parquet / none / none | 0.62   | 0.61|   +0.92% 
  |   3.17%   |   1.53%| 8 |   +0.05%   | 0.19| 0.73  |
| TPCH(1)  | TPCH-Q4  | parquet / none / none | 0.42   | 0.42|   +0.37% 
  |   1.24%   |   1.08%| 8 |   +0.03%   | 0.19| 0.63  |
| TPCH(1)  | TPCH-Q5  | parquet / none / none | 0.72   | 0.72|   +0.19% 
  |   3.08%   |   3.23%| 8 |   +0.07%   | 0.57| 0.12  |
| TPCH(1)  | TPCH-Q18 | parquet / none / none | 1.20   | 1.19|   +0.04% 
  |   0.12%   |   0.15%| 8 |   +0.03%   | 0.57| 0.53  |
| TPCH(1)  | TPCH-Q9  | parquet / none / none | 1.13   | 1.13|   +0.02% 
  |   2.05%   |   2.04%| 8 |   -0.01%   | -0.57   | 0.02  |
| TPCH(1)  | TPCH-Q3  | parquet / none / none | 0.60   | 0.60|   +0.03% 
  |   3.71%   |   3.99%| 8 |   -0.04%   | -0.06   | 0.01  |
| TPCH(1)  | TPCH-Q2  | parquet / none / none | 0.44   | 0.44|   -0.70% 
  |   8.56%   |   7.42%| 8 |   +0.61%   | 0.45| -0.18 |
| TPCH(1)  | TPCH-Q6  | parquet / none / none | 0.31   | 0.31|   -0.07% 
  |   1.73%   |   1.56%| 8 |   -0.06%   | -1.34   | -0.09 |
| TPCH(1)  | TPCH-Q14 | parquet / none / none | 0.48   | 0.48|   -0.25% 
  |   1.20%   |   1.47%| 8 |   -0.09%   | -0.57   | -0.37 |
| TPCH(1)  | TPCH-Q12 | parquet / none / none | 0.54   | 0.54|   -0.86% 
  |   1.90%   |   4.48%| 8 |   +0.11%   | 0.32| -0.50 |
| TPCH(1)  | TPCH-Q21 | parquet / none / none | 1.35   | 1.37|   -1.56% 
  |   3.11%   |   2.55%| 8 |   -0.19%   | -0.70   | -1.11 |
| TPCH(1)  | TPCH-Q16 | parquet / none / none | 0.26   | 0.27|   -1.74% 
  |   9.53%   | * 13.61% * | 8 

[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-07 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20733/5/be/src/codegen/llvm-codegen-cache-test.cc
File be/src/codegen/llvm-codegen-cache-test.cc:

http://gerrit.cloudera.org:8080/#/c/20733/5/be/src/codegen/llvm-codegen-cache-test.cc@39
PS5, Line 39: const int ENGINE_CACHE_SIZE = 990;
This is larger than the value that was here before. Was the test failing until 
this patch set?


http://gerrit.cloudera.org:8080/#/c/20733/5/be/src/codegen/llvm-codegen-cache-test.cc@176
PS5, Line 176: // The function to create and return a CodeGenObjectCache which 
contians a specific
nit: "contains" not "contians"



--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 5
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Fri, 08 Dec 2023 00:07:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14611/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 5
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 07 Dec 2023 13:52:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-07 Thread Yida Wu (Code Review)
Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20733/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20733/4//COMMIT_MSG@17
PS4, Line 17:  after the compila
> Could you clarify the relationship of the module and exec engine here? I.e.
Yes. Added some comments.



--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 5
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 07 Dec 2023 13:38:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-07 Thread Yida Wu (Code Review)
Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 5:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@28
PS3, Line 28: compared to the previous use of ExecutionEngine. Post-change,
> Doesn't the benchmarking framework print a grand total? If not, we can let
It seems our benchmarking framework doesn't print the total. Will double check 
and file a jira if we don't have it.


http://gerrit.cloudera.org:8080/#/c/20733/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20733/4//COMMIT_MSG@18
PS4, Line 18:  the executi
> Why are they pre-compiled? Aren't they compiled already? Or you mean they a
I think the "pre-compiled" means it is compiled in the last round and ready for 
use. Yeah, maybe call it "compiled" is better, because in this case it means 
the first round to write the compiled stuff into the cache.


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.h
File be/src/codegen/llvm-codegen-cache.h:

http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.h@164
PS4, Line 164: cached_engine
> Thinking about this, wouldn't 'cached_engine...' express the meaning better
Done


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.cc
File be/src/codegen/llvm-codegen-cache.cc:

http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.cc@155
PS4, Line 155: y
> I think removing the negation and inverting the branches is better, one les
Done


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc
File be/src/codegen/llvm-codegen-object-cache.cc:

http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@39
PS4, Line 39: uffer().size();
> Is a copy necessary here? Can't we move the buffer somehow?
CodeGenObjectCache doesn't have the ownership of ObjBuffer, the llvm engine may 
use the ObjBuffer later, so it is safer to make a copy. (And MemoryBufferRef 
may not allow to move the real Buffer, the example I can see is to do a copy)


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@42
PS4, Line 42: // can conserve memory.
> Shouldn't we add a DCHECK here? If this happens it is a bug.
Done


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@57
PS4, Line 57:   // ID. If the ID doesn't match the one in the object cache, a 
warning is logged for
> Will it always copy the buffer? Is there a way to access it without copying
>From the definition of the interface getObject(), "Returns a pointer to a 
>newly allocated MemoryBuffer that contains the object which corresponds with 
>Module M", it would be safer to return the copy of the buffer. 
>https://llvm.org/doxygen/classllvm_1_1ObjectCache.html#details


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@60
PS4, Line 60:   std::lock_guard l(mutex_);
> Shouldn't we add a DCHECK here? If this happens it is a bug.
Done


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@63
PS4, Line 63:   << cached_obj_->getMemBufferRef().getBufferSize();
> Can this function be called when 'key_' is empty? Isn't that a bug?
If there is no cache yet, would return nullptr, for example, the llvm engine 
would look up the cache to see if it contains the pre-compiled module, if it 
doesn't contain any, the llvm engine would proceed to compile the module and 
then write compiled module to the cache. This happens in cache look up and 
doesn't hit any.


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h@330
PS4, Line 330:
> See comment on llvm-codegen-cache.h:164.
Changed the one in llvm-codegen-cache.h, but would prefer to keep this, because 
this one is the cache for llvm engine to write the compiled functions to, it 
doesn't have to be anything "cached". Added some comments.


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h@925
PS4, Line 925:
> Do I understand it correctly that this is for writing to the cache (storing
Yes, we can say that the engine_cache_ is for writing, and engine_cache_cached_ 
is for reading. Added some comments.


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h@929
PS4, Line 929:
> We could extend the comment with what is written on llvm-codegen.cc:1499, i
Added some comments.


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.cc@1192
PS4, 

[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-07 Thread Yida Wu (Code Review)
Yida Wu has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..

IMPALA-11805: Use llvm ObjectCache for codegen caching

Currently, we employ llvm::ExecutionEngine for codegen caching,
providing access to compiled functions within the cached engine.
However, the real challenge is the ExecutionEngine uses a lot of
memory which largely exceeds our memory estimates and it is very
hard to predict.

This patch addresses this issue by using llvm::ObjectCache for
codegen caching. In our case, each execution engine would have
only one module, after the compilation of the module, the compiled
codegened functions of the module would be set to the execution
engine, therefore funtions could be used by Impala. During function
compilation within the module, if an ObjectCache is set to the
execution engine, the compiled codegened functions would be also
written into the cache. This way, if we keep the cache, when
revisiting the same module (fragment), we can efficiently reuse
the specific ObjectCache, loading pre-compiled codegened functions
and saving time.

The tpch performance test indicates no significant regression
compared to the previous use of ExecutionEngine. Post-change,
the actual memory usage of each codegen caching entry is notably
reduced.

+--+--+---++-++---++---++-+---+
| Workload | Query| File Format   | Avg(s) | Base Avg(s) | 
Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | 
Tval  |
+--+--+---++-++---++---++-+---+
| TPCH(1)  | TPCH-Q15 | parquet / none / none | 0.48   | 0.46|   +5.16% 
  |   4.31%   |   4.78%| 8 |   +3.51%   | 2.11| 2.22  |
| TPCH(1)  | TPCH-Q17 | parquet / none / none | 1.28   | 1.25|   +1.76% 
  |   1.83%   |   1.88%| 8 |   +3.53%   | 1.34| 1.88  |
| TPCH(1)  | TPCH-Q8  | parquet / none / none | 0.77   | 0.75|   +3.11% 
  |   3.88%   |   2.75%| 8 |   +0.79%   | 0.83| 1.81  |
| TPCH(1)  | TPCH-Q7  | parquet / none / none | 0.72   | 0.71|   +1.89% 
  |   3.48%   |   3.15%| 8 |   +1.61%   | 0.83| 1.13  |
| TPCH(1)  | TPCH-Q22 | parquet / none / none | 0.43   | 0.41|   +3.40% 
  |   1.21%   |   6.11%| 8 |   +0.05%   | 0.45| 1.54  |
| TPCH(1)  | TPCH-Q10 | parquet / none / none | 0.74   | 0.73|   +2.04% 
  |   0.75%   |   2.57%| 8 |   +1.13%   | 2.11| 2.16  |
| TPCH(1)  | TPCH-Q20 | parquet / none / none | 0.55   | 0.54|   +1.42% 
  |   2.95%   |   1.14%| 8 |   +0.19%   | 0.57| 1.25  |
| TPCH(1)  | TPCH-Q11 | parquet / none / none | 0.25   | 0.25|   +0.37% 
  |   2.41%   |   3.09%| 8 |   +0.82%   | 0.45| 0.27  |
| TPCH(1)  | TPCH-Q19 | parquet / none / none | 0.62   | 0.61|   +0.92% 
  |   3.17%   |   1.53%| 8 |   +0.05%   | 0.19| 0.73  |
| TPCH(1)  | TPCH-Q4  | parquet / none / none | 0.42   | 0.42|   +0.37% 
  |   1.24%   |   1.08%| 8 |   +0.03%   | 0.19| 0.63  |
| TPCH(1)  | TPCH-Q5  | parquet / none / none | 0.72   | 0.72|   +0.19% 
  |   3.08%   |   3.23%| 8 |   +0.07%   | 0.57| 0.12  |
| TPCH(1)  | TPCH-Q18 | parquet / none / none | 1.20   | 1.19|   +0.04% 
  |   0.12%   |   0.15%| 8 |   +0.03%   | 0.57| 0.53  |
| TPCH(1)  | TPCH-Q9  | parquet / none / none | 1.13   | 1.13|   +0.02% 
  |   2.05%   |   2.04%| 8 |   -0.01%   | -0.57   | 0.02  |
| TPCH(1)  | TPCH-Q3  | parquet / none / none | 0.60   | 0.60|   +0.03% 
  |   3.71%   |   3.99%| 8 |   -0.04%   | -0.06   | 0.01  |
| TPCH(1)  | TPCH-Q2  | parquet / none / none | 0.44   | 0.44|   -0.70% 
  |   8.56%   |   7.42%| 8 |   +0.61%   | 0.45| -0.18 |
| TPCH(1)  | TPCH-Q6  | parquet / none / none | 0.31   | 0.31|   -0.07% 
  |   1.73%   |   1.56%| 8 |   -0.06%   | -1.34   | -0.09 |
| TPCH(1)  | TPCH-Q14 | parquet / none / none | 0.48   | 0.48|   -0.25% 
  |   1.20%   |   1.47%| 8 |   -0.09%   | -0.57   | -0.37 |
| TPCH(1)  | TPCH-Q12 | parquet / none / none | 0.54   | 0.54|   -0.86% 
  |   1.90%   |   4.48%| 8 |   +0.11%   | 0.32| -0.50 |
| TPCH(1)  | TPCH-Q21 | parquet / none / none | 1.35   | 1.37|   -1.56% 
  |   3.11%   |   2.55%| 8 |   -0.19%   | -0.70   | -1.11 |
| TPCH(1)  | TPCH-Q16 | parquet / none / none | 0.26   | 0.27|   -1.74% 
  |   9.53%   | * 13.61% * | 8 

[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-05 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 4:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@28
PS3, Line 28:
> Do you mean run the time running all the tpch queries for once? Did a test,
Doesn't the benchmarking framework print a grand total? If not, we can let it 
be like this.


http://gerrit.cloudera.org:8080/#/c/20733/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20733/4//COMMIT_MSG@17
PS4, Line 17: (execution engine)
Could you clarify the relationship of the module and exec engine here? I.e. one 
execution engine corresponds to exactly one module - is it correct?


http://gerrit.cloudera.org:8080/#/c/20733/4//COMMIT_MSG@18
PS4, Line 18: pre-compiled
Why are they pre-compiled? Aren't they compiled already? Or you mean they are 
not compiled to machine code? Then "partially compiled" or "optimized" may be 
better.


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.h
File be/src/codegen/llvm-codegen-cache.h:

http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.h@164
PS4, Line 164: engine_cache_
Thinking about this, wouldn't 'cached_engine...' express the meaning better? 
This is not a cache of multiple engines but a cached instance of an engine, or 
even better, its functions. Maybe 'cached_module...'?

This applies also to the other variables named similarly in this file.


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.cc
File be/src/codegen/llvm-codegen-cache.cc:

http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.cc@155
PS4, Line 155: !
I think removing the negation and inverting the branches is better, one less 
negation to process mentally. See also L179.


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc
File be/src/codegen/llvm-codegen-object-cache.cc:

http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@39
PS4, Line 39: getMemBufferCopy
Is a copy necessary here? Can't we move the buffer somehow?


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@42
PS4, Line 42: LOG(WARNING) << "Inserting a different module in 
CodeGenObjectCache " << this
Shouldn't we add a DCHECK here? If this happens it is a bug.


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@57
PS4, Line 57: return llvm::MemoryBuffer::getMemBufferCopy(
Will it always copy the buffer? Is there a way to access it without copying?


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@60
PS4, Line 60: LOG(WARNING) << "Object for module id " << module_id << " 
can't be loaded from cache."
Shouldn't we add a DCHECK here? If this happens it is a bug.


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@63
PS4, Line 63:   return nullptr;
Can this function be called when 'key_' is empty? Isn't that a bug?


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h@330
PS4, Line 330: engine_cache
See comment on llvm-codegen-cache.h:164.


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h@925
PS4, Line 925:   /// Object cache for engine.
Do I understand it correctly that this is for writing to the cache (storing the 
module we compiled) and 'engine_cache_cached_' is for reading (when we retrieve 
a module from the cache)? It could be clarified in the comment(s).


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h@929
PS4, Line 929:   /// Not null only when there is a cache hit.
We could extend the comment with what is written on llvm-codegen.cc:1499, i.e. 
that we need the shared_ptr so that the lifetime of this object doesn't end 
even if it is evicted from the cache while we're using it.


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.cc@1192
PS4, Line 1192: entire execution engine
We no longer cache the engine.


http://gerrit.cloudera.org:8080/#/c/20733/4/tests/custom_cluster/test_codegen_cache.py
File tests/custom_cluster/test_codegen_cache.py:

http://gerrit.cloudera.org:8080/#/c/20733/4/tests/custom_cluster/test_codegen_cache.py@169
PS4, Line 169: When an 'llvm::ExecutionEngine', which is part of the cache 
entry, is destroyed, it
The situation described in this paragraph is no longer true after this change. 
We should write that this is how it used to work and also 

[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-04 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache-test.cc
File be/src/codegen/llvm-codegen-cache-test.cc:

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache-test.cc@298
PS2, Line 298: void 
LlvmCodeGenCacheTest::TestAtCapacity(TCodeGenCacheMode::type mode) {
> Did a test in both arm and x86. In our testcase of mem_charge_1, the mem ch
Ack


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@a1185
PS2, Line 1185: 
> Done
Ack


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.h
File be/src/codegen/llvm-object-cache-wrapper.h:

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.h@44
PS2, Line 44:
> Done
Ack


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.cc
File be/src/codegen/llvm-object-cache-wrapper.cc:

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.cc@66
PS2, Line 66:
> Yeah, this is a good point. Added here.
Ack



--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 4
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Mon, 04 Dec 2023 18:45:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-04 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 4: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 4
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Mon, 04 Dec 2023 18:44:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-02 Thread Yida Wu (Code Review)
Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 4:

(12 comments)

Thanks Daniel for reviewing.

http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@12
PS3, Line 12: which largely exceeds our memo
> I think this should come before "which largely exceeds...", because "which
Done


http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@17
PS3, Line 17:
> Nit: space before '('.
Done


http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@18
PS3, Line 18: pre-compiled
: codegened
> Do you mean the codegened functions? First I thought it referred to the fun
Yeah, changed it to pre-compiled codegened functions.


http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@28
PS3, Line 28:
> Could you also include the aggregate result over all queries?
Do you mean run the time running all the tpch queries for once? Did a test, run 
only once for all the TPCH queries, the result of TPCH-Q1 for Execution Engine 
looks suspicious (maybe because it is the first of all to run), but all others 
look similar.
Query   Execution EngineObject Cache
Q1  4   1.68
Q10 1.521.67
Q11 0.8 0.8
Q12 0.680.58
Q13 1.091.29
Q14 0.820.82
Q15 0.890.88
Q16 0.8 0.75
Q17 1.091.09
Q18 1.151.25
Q19 2.072.11
Q2  1.361.36
Q20 1.051.06
Q21 1.911.91
Q22 0.630.63
Q3  0.940.93
Q4  0.480.47
Q5  1.451.45
Q6  0.260.26
Q7  2.2 2.15
Q8  1.811.81
Q9  1.851.85
Total   28.85   26.8


http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc
File be/src/codegen/llvm-codegen-cache-test.cc:

http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@99
PS3, Line 99: dule_id, bool
> We could include the parameter names here and on the next line if 'codegen'
Done


http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@115
PS3, Line 115:
> Maybe 'cached_engine' would be better, also on L117.
Done


http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@132
PS3, Line 132:
> I'd prefer something like 'codegen_obj_cache_', it's easier to get what it
Done


http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@138
PS3, Line 138:
> Could it be const?
Done


http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@142
PS3, Line 142:
> Could use 'nullptr' instead.
Done


http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@225
PS3, Line 225: te a Llvm
> 'module_id_echo' would be better, cf. 'module_id_double'.
Done


http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache.h
File be/src/codegen/llvm-codegen-cache.h:

http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache.h@231
PS3, Line 231: engine_cache_raw
> Maybe 'engine_cache_raw[_ptr]' would be better.
Done


http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache.cc
File be/src/codegen/llvm-codegen-cache.cc:

http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache.cc@154
PS3, Line 154:   Slice key;
> The assignment could be made in an IF-ELSE (or a ? : operator). Now we get/
Done



--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 4
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Sat, 02 Dec 2023 13:15:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14575/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 4
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Sat, 02 Dec 2023 13:08:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-02 Thread Yida Wu (Code Review)
Yida Wu has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..

IMPALA-11805: Use llvm ObjectCache for codegen caching

Currently, we employ llvm::ExecutionEngine for codegen caching,
providing access to compiled functions within the cached engine.
However, the real challenge is the ExecutionEngine uses a lot of
memory which largely exceeds our memory estimates and it is very
hard to predict.

This patch addresses this issue by using llvm::ObjectCache for
codegen caching. The approach involves associating an ObjectCache
with the module (execution engine). During function compilation
within the module, if an ObjectCache is set, the pre-compiled
codegened functions are written into the cache. This way, when
revisiting the same module (fragment), we can efficiently reuse
the specific ObjectCache, loading pre-compiled codegened functions
and saving time.

The tpch performance test indicates no significant regression
compared to the previous use of ExecutionEngine. Post-change,
the actual memory usage of each codegen caching entry is notably
reduced.

+--+--+---++-++---++---++-+---+
| Workload | Query| File Format   | Avg(s) | Base Avg(s) | 
Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | 
Tval  |
+--+--+---++-++---++---++-+---+
| TPCH(1)  | TPCH-Q15 | parquet / none / none | 0.48   | 0.46|   +5.16% 
  |   4.31%   |   4.78%| 8 |   +3.51%   | 2.11| 2.22  |
| TPCH(1)  | TPCH-Q17 | parquet / none / none | 1.28   | 1.25|   +1.76% 
  |   1.83%   |   1.88%| 8 |   +3.53%   | 1.34| 1.88  |
| TPCH(1)  | TPCH-Q8  | parquet / none / none | 0.77   | 0.75|   +3.11% 
  |   3.88%   |   2.75%| 8 |   +0.79%   | 0.83| 1.81  |
| TPCH(1)  | TPCH-Q7  | parquet / none / none | 0.72   | 0.71|   +1.89% 
  |   3.48%   |   3.15%| 8 |   +1.61%   | 0.83| 1.13  |
| TPCH(1)  | TPCH-Q22 | parquet / none / none | 0.43   | 0.41|   +3.40% 
  |   1.21%   |   6.11%| 8 |   +0.05%   | 0.45| 1.54  |
| TPCH(1)  | TPCH-Q10 | parquet / none / none | 0.74   | 0.73|   +2.04% 
  |   0.75%   |   2.57%| 8 |   +1.13%   | 2.11| 2.16  |
| TPCH(1)  | TPCH-Q20 | parquet / none / none | 0.55   | 0.54|   +1.42% 
  |   2.95%   |   1.14%| 8 |   +0.19%   | 0.57| 1.25  |
| TPCH(1)  | TPCH-Q11 | parquet / none / none | 0.25   | 0.25|   +0.37% 
  |   2.41%   |   3.09%| 8 |   +0.82%   | 0.45| 0.27  |
| TPCH(1)  | TPCH-Q19 | parquet / none / none | 0.62   | 0.61|   +0.92% 
  |   3.17%   |   1.53%| 8 |   +0.05%   | 0.19| 0.73  |
| TPCH(1)  | TPCH-Q4  | parquet / none / none | 0.42   | 0.42|   +0.37% 
  |   1.24%   |   1.08%| 8 |   +0.03%   | 0.19| 0.63  |
| TPCH(1)  | TPCH-Q5  | parquet / none / none | 0.72   | 0.72|   +0.19% 
  |   3.08%   |   3.23%| 8 |   +0.07%   | 0.57| 0.12  |
| TPCH(1)  | TPCH-Q18 | parquet / none / none | 1.20   | 1.19|   +0.04% 
  |   0.12%   |   0.15%| 8 |   +0.03%   | 0.57| 0.53  |
| TPCH(1)  | TPCH-Q9  | parquet / none / none | 1.13   | 1.13|   +0.02% 
  |   2.05%   |   2.04%| 8 |   -0.01%   | -0.57   | 0.02  |
| TPCH(1)  | TPCH-Q3  | parquet / none / none | 0.60   | 0.60|   +0.03% 
  |   3.71%   |   3.99%| 8 |   -0.04%   | -0.06   | 0.01  |
| TPCH(1)  | TPCH-Q2  | parquet / none / none | 0.44   | 0.44|   -0.70% 
  |   8.56%   |   7.42%| 8 |   +0.61%   | 0.45| -0.18 |
| TPCH(1)  | TPCH-Q6  | parquet / none / none | 0.31   | 0.31|   -0.07% 
  |   1.73%   |   1.56%| 8 |   -0.06%   | -1.34   | -0.09 |
| TPCH(1)  | TPCH-Q14 | parquet / none / none | 0.48   | 0.48|   -0.25% 
  |   1.20%   |   1.47%| 8 |   -0.09%   | -0.57   | -0.37 |
| TPCH(1)  | TPCH-Q12 | parquet / none / none | 0.54   | 0.54|   -0.86% 
  |   1.90%   |   4.48%| 8 |   +0.11%   | 0.32| -0.50 |
| TPCH(1)  | TPCH-Q21 | parquet / none / none | 1.35   | 1.37|   -1.56% 
  |   3.11%   |   2.55%| 8 |   -0.19%   | -0.70   | -1.11 |
| TPCH(1)  | TPCH-Q16 | parquet / none / none | 0.26   | 0.27|   -1.74% 
  |   9.53%   | * 13.61% * | 8 |   -0.32%   | -0.06   | -0.30 |
| TPCH(1)  | TPCH-Q1  | parquet / none / none | 0.64   | 0.67|   -4.64% 
  |   1.13%   | * 12.59% * | 8 |   -0.28%   | -0.19   | -1.04 |
| 

[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-11-30 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 3:

(12 comments)

Couldn't go through all of it yet, here are some comments.

http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@12
PS3, Line 12: and it is very hard to predict
I think this should come before "which largely exceeds...", because "which 
largely exceeds..." refers to "uses a lot of memory".


http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@17
PS3, Line 17: (
Nit: space before '('.
Could you clarify the relationship of the module and exec engine here?


http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@18
PS3, Line 18: pre-compiled
: functions
Do you mean the codegened functions? First I thought it referred to the 
functions compiled from C++ to IR, maybe it could be clarified.
See also L21.


http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@28
PS3, Line 28: 
+--+--+---++-++---++---++-+---+
Could you also include the aggregate result over all queries?


http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc
File be/src/codegen/llvm-codegen-cache-test.cc:

http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@99
PS3, Line 99: string*, bool
We could include the parameter names here and on the next line if 'codegen' is 
included... I can see that on L101 the name of the 'Function**' parameter is 
omitted, but on L98 all names are included. This should be uniform, I think the 
best would be to add the name on L101 too.


http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@115
PS3, Line 115: engine
Maybe 'cached_engine' would be better, also on L117.


http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@132
PS3, Line 132: obj_cache_
I'd prefer something like 'codegen_obj_cache_', it's easier to get what it is 
just by looking at the name.


http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@138
PS3, Line 138: string&
Could it be const?


http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@142
PS3, Line 142: NULL
Could use 'nullptr' instead.


http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@225
PS3, Line 225: module_id
'module_id_echo' would be better, cf. 'module_id_double'.


http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache.h
File be/src/codegen/llvm-codegen-cache.h:

http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache.h@231
PS3, Line 231: engine_cache_ptr
Maybe 'engine_cache_raw[_ptr]' would be better.
Also, in the .cc file, the parameter names of this function have not been 
updated, please update them.


http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache.cc
File be/src/codegen/llvm-codegen-cache.cc:

http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache.cc@154
PS3, Line 154:   Slice key = cache_key.optimal_key_slice();
The assignment could be made in an IF-ELSE (or a ? : operator). Now we 
get/calculate the optimal key slice even in non-optimal mode.



--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 3
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 30 Nov 2023 16:19:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-11-30 Thread Yida Wu (Code Review)
Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache-test.cc
File be/src/codegen/llvm-codegen-cache-test.cc:

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache-test.cc@298
PS2, Line 298:   // 172B for optimal mode, or 180B on ARM systems
> Oh, I meant the largest difference I saw between normal and optimal mode is
Did a test in both arm and x86. In our testcase of mem_charge_1, the mem charge 
in arm is always 8B larger than x86 in normal and optimal mode.
For mem_charge_1, arm 188B/180B, x86 180B/172B. The current size setting should 
be good for testing, and I can pass the test in arm.


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@1385
PS2, Line 1385:
> So bytes_allocated is still needed, which means we still want https://gerri
>From the source code and gdb debugging that I can see, the bytes_allocated 
>will be used when loading from the pre-compiled object. 
>https://llvm.org/doxygen/MCJIT_8cpp_source.html#l00224. While the manage usage 
>of pre-compiled object is currently managed by the codegen cache if codegen 
>cache is enabled, the bytes_allocated is managed by the runtime execution 
>engine, I think bytes_allocated is still useful for runtime use no matter 
>codegen cache is on or off.



--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 3
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 30 Nov 2023 10:00:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-11-29 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 3:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache-test.cc
File be/src/codegen/llvm-codegen-cache-test.cc:

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache-test.cc@298
PS2, Line 298:   // 172B for optimal mode, or 180B on ARM systems
> Adjusted the value after adding sizeof(CodeGenObjectCache).
Oh, I meant the largest difference I saw between normal and optimal mode is 8B. 
We should double-check the difference between x86 and aarch64.


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache.h
File be/src/codegen/llvm-codegen-cache.h:

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache.h@308
PS2, Line 308:   std::unordered_map>
> When creating a temporary shared_ptr from the raw pointer, there might be a
Ack


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h@33
PS2, Line 33: #include 
> Done
Ack


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h@353
PS2, Line 353:   /// retrieval. If module_id is not nullptr, the final module 
id is returned.
> Done
Ack


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h@920
PS2, Line 920:   llvm::Module* module_;
> Done
Ack


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@496
PS2, Line 496:   engine_cache_ = make_shared();
> Done
Ack


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@498
PS2, Line 498:   RETURN_IF_ERROR(LoadIntrinsics());
> Done
Ack


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@499
PS2, Line 499:
> Done
Ack


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@1385
PS2, Line 1385:
> Done
So bytes_allocated is still needed, which means we still want 
https://gerrit.cloudera.org/c/20697/ as well.



--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 3
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Wed, 29 Nov 2023 16:36:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-11-29 Thread Yida Wu (Code Review)
Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 3:

(12 comments)

Thanks Michael for the review.

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache-test.cc
File be/src/codegen/llvm-codegen-cache-test.cc:

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache-test.cc@298
PS2, Line 298:   // 172B for optimal mode, or 180B on ARM systems
> I experimented with this. In practice I saw minimal (<= 8B) difference betw
Adjusted the value after adding sizeof(CodeGenObjectCache).


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache.h
File be/src/codegen/llvm-codegen-cache.h:

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache.h@308
PS2, Line 308:   std::unordered_map>
> This would also work as a std::unordered_set. https://en.cppreference.com/w
When creating a temporary shared_ptr from the raw pointer, there might be a 
double-free problem. I'm uncertain if there's a more optimal solution. However, 
considering the cache entry is typically over 1k bytes while the 8-byte pointer 
is relatively small, it doesn't seem like a significant concern in this context.


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h@33
PS2, Line 33: #include 
> What's this used for?
Done


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h@353
PS2, Line 353:   /// retrieval. If module_id is not nullptr, the final module 
id is returned.
> Can you expand what module_id is used for. Would help to identify when to p
Done


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h@920
PS2, Line 920:   llvm::Module* module_;
> engine_cache_wrapper no longer owns an execution engine, so this comment is
Done


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@a1185
PS2, Line 1185:
> I thought this comment was kind of useful. And I see you still use it at li
Done


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@496
PS2, Line 496:   engine_cache_ = make_shared();
> Why initialize it this way? You could also just pass "new CodegenObjectCach
Done


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@498
PS2, Line 498:   RETURN_IF_ERROR(LoadIntrinsics());
> You could directly assign to execution_engine_ on line 479. Lots of extra m
Done


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@499
PS2, Line 499:
> Why not assign directly to symbol_emitter_?
Done


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@1385
PS2, Line 1385:
> Is this still right? Shouldn't it be the object cache size, since that's wh
Done


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.h
File be/src/codegen/llvm-object-cache-wrapper.h:

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.h@44
PS2, Line 44:
> possibly other objects seems weird as there aren't other objects. Do we rea
Done


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.cc
File be/src/codegen/llvm-object-cache-wrapper.cc:

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.cc@66
PS2, Line 66:
> Is sizeof(CodeGenObjectCache) accounted for elsewhere?
Yeah, this is a good point. Added here.



--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 3
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Wed, 29 Nov 2023 14:37:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-11-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14548/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 3
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Wed, 29 Nov 2023 14:33:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-11-29 Thread Yida Wu (Code Review)
Yida Wu has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..

IMPALA-11805: Use llvm ObjectCache for codegen caching

Currently, we employ llvm::ExecutionEngine for codegen caching,
providing access to compiled functions within the cached engine.
However, the real challenge is the ExecutionEngine uses a lot of
memory and it is very hard to predict which largely exceeds our
memory estimates.

This patch addresses this issue by using llvm::ObjectCache for
codegen caching. The approach involves associating an ObjectCache
with the module(execution engine). During function compilation
within the module, if an ObjectCache is set, the pre-compiled
functions are written into the cache. This way, when revisiting
the same module (fragment), we can efficiently reuse the specific
ObjectCache, loading pre-compiled functions and saving time.

The tpch performance test indicates no significant regression
compared to the previous use of ExecutionEngine. Post-change,
the actual memory usage of each codegen caching entry is notably
reduced.

+--+--+---++-++---++---++-+---+
| Workload | Query| File Format   | Avg(s) | Base Avg(s) | 
Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | 
Tval  |
+--+--+---++-++---++---++-+---+
| TPCH(1)  | TPCH-Q15 | parquet / none / none | 0.48   | 0.46|   +5.16% 
  |   4.31%   |   4.78%| 8 |   +3.51%   | 2.11| 2.22  |
| TPCH(1)  | TPCH-Q17 | parquet / none / none | 1.28   | 1.25|   +1.76% 
  |   1.83%   |   1.88%| 8 |   +3.53%   | 1.34| 1.88  |
| TPCH(1)  | TPCH-Q8  | parquet / none / none | 0.77   | 0.75|   +3.11% 
  |   3.88%   |   2.75%| 8 |   +0.79%   | 0.83| 1.81  |
| TPCH(1)  | TPCH-Q7  | parquet / none / none | 0.72   | 0.71|   +1.89% 
  |   3.48%   |   3.15%| 8 |   +1.61%   | 0.83| 1.13  |
| TPCH(1)  | TPCH-Q22 | parquet / none / none | 0.43   | 0.41|   +3.40% 
  |   1.21%   |   6.11%| 8 |   +0.05%   | 0.45| 1.54  |
| TPCH(1)  | TPCH-Q10 | parquet / none / none | 0.74   | 0.73|   +2.04% 
  |   0.75%   |   2.57%| 8 |   +1.13%   | 2.11| 2.16  |
| TPCH(1)  | TPCH-Q20 | parquet / none / none | 0.55   | 0.54|   +1.42% 
  |   2.95%   |   1.14%| 8 |   +0.19%   | 0.57| 1.25  |
| TPCH(1)  | TPCH-Q11 | parquet / none / none | 0.25   | 0.25|   +0.37% 
  |   2.41%   |   3.09%| 8 |   +0.82%   | 0.45| 0.27  |
| TPCH(1)  | TPCH-Q19 | parquet / none / none | 0.62   | 0.61|   +0.92% 
  |   3.17%   |   1.53%| 8 |   +0.05%   | 0.19| 0.73  |
| TPCH(1)  | TPCH-Q4  | parquet / none / none | 0.42   | 0.42|   +0.37% 
  |   1.24%   |   1.08%| 8 |   +0.03%   | 0.19| 0.63  |
| TPCH(1)  | TPCH-Q5  | parquet / none / none | 0.72   | 0.72|   +0.19% 
  |   3.08%   |   3.23%| 8 |   +0.07%   | 0.57| 0.12  |
| TPCH(1)  | TPCH-Q18 | parquet / none / none | 1.20   | 1.19|   +0.04% 
  |   0.12%   |   0.15%| 8 |   +0.03%   | 0.57| 0.53  |
| TPCH(1)  | TPCH-Q9  | parquet / none / none | 1.13   | 1.13|   +0.02% 
  |   2.05%   |   2.04%| 8 |   -0.01%   | -0.57   | 0.02  |
| TPCH(1)  | TPCH-Q3  | parquet / none / none | 0.60   | 0.60|   +0.03% 
  |   3.71%   |   3.99%| 8 |   -0.04%   | -0.06   | 0.01  |
| TPCH(1)  | TPCH-Q2  | parquet / none / none | 0.44   | 0.44|   -0.70% 
  |   8.56%   |   7.42%| 8 |   +0.61%   | 0.45| -0.18 |
| TPCH(1)  | TPCH-Q6  | parquet / none / none | 0.31   | 0.31|   -0.07% 
  |   1.73%   |   1.56%| 8 |   -0.06%   | -1.34   | -0.09 |
| TPCH(1)  | TPCH-Q14 | parquet / none / none | 0.48   | 0.48|   -0.25% 
  |   1.20%   |   1.47%| 8 |   -0.09%   | -0.57   | -0.37 |
| TPCH(1)  | TPCH-Q12 | parquet / none / none | 0.54   | 0.54|   -0.86% 
  |   1.90%   |   4.48%| 8 |   +0.11%   | 0.32| -0.50 |
| TPCH(1)  | TPCH-Q21 | parquet / none / none | 1.35   | 1.37|   -1.56% 
  |   3.11%   |   2.55%| 8 |   -0.19%   | -0.70   | -1.11 |
| TPCH(1)  | TPCH-Q16 | parquet / none / none | 0.26   | 0.27|   -1.74% 
  |   9.53%   | * 13.61% * | 8 |   -0.32%   | -0.06   | -0.30 |
| TPCH(1)  | TPCH-Q1  | parquet / none / none | 0.64   | 0.67|   -4.64% 
  |   1.13%   | * 12.59% * | 8 |   -0.28%   | -0.19   | -1.04 |
| TPCH(1)  | TPCH-Q13 | 

[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-11-28 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@1385
PS2, Line 1385: if (!mem_tracker_->TryConsume(bytes_allocated)) {
Is this still right? Shouldn't it be the object cache size, since that's what 
we're retaining?

I would expect bytes_allocated() to be removed in this patch.



--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 2
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Tue, 28 Nov 2023 18:22:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-11-28 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 2:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache-test.cc
File be/src/codegen/llvm-codegen-cache-test.cc:

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache-test.cc@298
PS2, Line 298:   // 130B for optimal mode, or 148B on ARM systems
I experimented with this. In practice I saw minimal (<= 8B) difference between 
normal and optimal modes.


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache.h
File be/src/codegen/llvm-codegen-cache.h:

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache.h@308
PS2, Line 308:   std::unordered_map>
This would also work as a std::unordered_set. 
https://en.cppreference.com/w/cpp/memory/shared_ptr/hash means the lookup would 
be based on pointer value. Lookups would construct a temporary shared_ptr, but 
it would reduce memory use.


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h@33
PS2, Line 33: #include 
What's this used for?


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h@353
PS2, Line 353:   /// Returns the module id  if module_id is not nullptr.
Can you expand what module_id is used for. Would help to identify when to pass 
a non-null argument.


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h@920
PS2, Line 920:   /// engine_cache_wrapper_.
engine_cache_wrapper no longer owns an execution engine, so this comment is 
incorrect. Should be

  module_ is owned by execution_engine_.


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@a1185
PS2, Line 1185:
I thought this comment was kind of useful. And I see you still use it at line 
1207, so we should keep them all.


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@496
PS2, Line 496:   unique_ptr obj_cache(new 
CodeGenObjectCache);
Why initialize it this way? You could also just pass "new CodegenObjectCache" 
on line 503.


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@498
PS2, Line 498:   execution_engine_ = std::move(execution_engine);
You could directly assign to execution_engine_ on line 479. Lots of extra moves 
here for no additional use or clarity.


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@499
PS2, Line 499:   unique_ptr symbol_emitter =
Why not assign directly to symbol_emitter_?


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.h
File be/src/codegen/llvm-object-cache-wrapper.h:

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.h@44
PS2, Line 44: /// Class that encapsulates a non-NULL 'CodeGenObjectCache' and 
possibly other objects
possibly other objects seems weird as there aren't other objects. Do we really 
need this wrapper?


http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.cc
File be/src/codegen/llvm-object-cache-wrapper.cc:

http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.cc@66
PS2, Line 66: size_t CodeGenObjectCache::objSize() {
Is sizeof(CodeGenObjectCache) accounted for elsewhere?



--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 2
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Tue, 28 Nov 2023 18:19:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14526/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 2
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 28 Nov 2023 07:35:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-11-27 Thread Yida Wu (Code Review)
Yida Wu has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..

IMPALA-11805: Use llvm ObjectCache for codegen caching

Currently, we employ llvm::ExecutionEngine for codegen caching,
providing access to compiled functions within the cached engine.
However, the real challenge is the ExecutionEngine uses a lot of
memory and it is very hard to predict which largely exceeds our
memory estimates.

This patch addresses this issue by using llvm::ObjectCache for
codegen caching. The approach involves associating an ObjectCache
with the module(execution engine). During function compilation
within the module, if an ObjectCache is set, the pre-compiled
functions are written into the cache. This way, when revisiting
the same module (fragment), we can efficiently reuse the specific
ObjectCache, loading pre-compiled functions and saving time.

The tpch performance test indicates no significant regression
compared to the previous use of ExecutionEngine. Post-change,
the actual memory usage of each codegen caching entry is notably
reduced.

+--+--+---++-++---++---++-+---+
| Workload | Query| File Format   | Avg(s) | Base Avg(s) | 
Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | 
Tval  |
+--+--+---++-++---++---++-+---+
| TPCH(1)  | TPCH-Q15 | parquet / none / none | 0.48   | 0.46|   +5.16% 
  |   4.31%   |   4.78%| 8 |   +3.51%   | 2.11| 2.22  |
| TPCH(1)  | TPCH-Q17 | parquet / none / none | 1.28   | 1.25|   +1.76% 
  |   1.83%   |   1.88%| 8 |   +3.53%   | 1.34| 1.88  |
| TPCH(1)  | TPCH-Q8  | parquet / none / none | 0.77   | 0.75|   +3.11% 
  |   3.88%   |   2.75%| 8 |   +0.79%   | 0.83| 1.81  |
| TPCH(1)  | TPCH-Q7  | parquet / none / none | 0.72   | 0.71|   +1.89% 
  |   3.48%   |   3.15%| 8 |   +1.61%   | 0.83| 1.13  |
| TPCH(1)  | TPCH-Q22 | parquet / none / none | 0.43   | 0.41|   +3.40% 
  |   1.21%   |   6.11%| 8 |   +0.05%   | 0.45| 1.54  |
| TPCH(1)  | TPCH-Q10 | parquet / none / none | 0.74   | 0.73|   +2.04% 
  |   0.75%   |   2.57%| 8 |   +1.13%   | 2.11| 2.16  |
| TPCH(1)  | TPCH-Q20 | parquet / none / none | 0.55   | 0.54|   +1.42% 
  |   2.95%   |   1.14%| 8 |   +0.19%   | 0.57| 1.25  |
| TPCH(1)  | TPCH-Q11 | parquet / none / none | 0.25   | 0.25|   +0.37% 
  |   2.41%   |   3.09%| 8 |   +0.82%   | 0.45| 0.27  |
| TPCH(1)  | TPCH-Q19 | parquet / none / none | 0.62   | 0.61|   +0.92% 
  |   3.17%   |   1.53%| 8 |   +0.05%   | 0.19| 0.73  |
| TPCH(1)  | TPCH-Q4  | parquet / none / none | 0.42   | 0.42|   +0.37% 
  |   1.24%   |   1.08%| 8 |   +0.03%   | 0.19| 0.63  |
| TPCH(1)  | TPCH-Q5  | parquet / none / none | 0.72   | 0.72|   +0.19% 
  |   3.08%   |   3.23%| 8 |   +0.07%   | 0.57| 0.12  |
| TPCH(1)  | TPCH-Q18 | parquet / none / none | 1.20   | 1.19|   +0.04% 
  |   0.12%   |   0.15%| 8 |   +0.03%   | 0.57| 0.53  |
| TPCH(1)  | TPCH-Q9  | parquet / none / none | 1.13   | 1.13|   +0.02% 
  |   2.05%   |   2.04%| 8 |   -0.01%   | -0.57   | 0.02  |
| TPCH(1)  | TPCH-Q3  | parquet / none / none | 0.60   | 0.60|   +0.03% 
  |   3.71%   |   3.99%| 8 |   -0.04%   | -0.06   | 0.01  |
| TPCH(1)  | TPCH-Q2  | parquet / none / none | 0.44   | 0.44|   -0.70% 
  |   8.56%   |   7.42%| 8 |   +0.61%   | 0.45| -0.18 |
| TPCH(1)  | TPCH-Q6  | parquet / none / none | 0.31   | 0.31|   -0.07% 
  |   1.73%   |   1.56%| 8 |   -0.06%   | -1.34   | -0.09 |
| TPCH(1)  | TPCH-Q14 | parquet / none / none | 0.48   | 0.48|   -0.25% 
  |   1.20%   |   1.47%| 8 |   -0.09%   | -0.57   | -0.37 |
| TPCH(1)  | TPCH-Q12 | parquet / none / none | 0.54   | 0.54|   -0.86% 
  |   1.90%   |   4.48%| 8 |   +0.11%   | 0.32| -0.50 |
| TPCH(1)  | TPCH-Q21 | parquet / none / none | 1.35   | 1.37|   -1.56% 
  |   3.11%   |   2.55%| 8 |   -0.19%   | -0.70   | -1.11 |
| TPCH(1)  | TPCH-Q16 | parquet / none / none | 0.26   | 0.27|   -1.74% 
  |   9.53%   | * 13.61% * | 8 |   -0.32%   | -0.06   | -0.30 |
| TPCH(1)  | TPCH-Q1  | parquet / none / none | 0.64   | 0.67|   -4.64% 
  |   1.13%   | * 12.59% * | 8 |   -0.28%   | -0.19   | -1.04 |
| TPCH(1)  | TPCH-Q13 | 

[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 1:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/14525/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 1
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 28 Nov 2023 06:44:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-11-27 Thread Yida Wu (Code Review)
Yida Wu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20733


Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..

IMPALA-11805: Use llvm ObjectCache for codegen caching

Currently, we employ llvm::ExecutionEngine for codegen caching,
providing access to compiled functions within the cached engine.
However, the real challenge is the ExecutionEngine uses a lot of
memory and it is very hard to predict which largely exceeds our
memory estimates.

This patch addresses this issue by using llvm::ObjectCache for
codegen caching. The approach involves associating an ObjectCache
with the module(execution engine). During function compilation
within the module, if an ObjectCache is set, the pre-compiled
functions are written into the cache. This way, when revisiting
the same module (fragment), we can efficiently reuse the specific
ObjectCache, loading pre-compiled functions and saving time.

The tpch performance test indicates no significant regression
compared to the previous use of ExecutionEngine. Post-change,
the actual memory usage of each codegen caching entry is notably
reduced.

+--+--+---++-++---++---++-+---+
| Workload | Query| File Format   | Avg(s) | Base Avg(s) | 
Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | 
Tval  |
+--+--+---++-++---++---++-+---+
| TPCH(1)  | TPCH-Q15 | parquet / none / none | 0.48   | 0.46|   +5.16% 
  |   4.31%   |   4.78%| 8 |   +3.51%   | 2.11| 2.22  |
| TPCH(1)  | TPCH-Q17 | parquet / none / none | 1.28   | 1.25|   +1.76% 
  |   1.83%   |   1.88%| 8 |   +3.53%   | 1.34| 1.88  |
| TPCH(1)  | TPCH-Q8  | parquet / none / none | 0.77   | 0.75|   +3.11% 
  |   3.88%   |   2.75%| 8 |   +0.79%   | 0.83| 1.81  |
| TPCH(1)  | TPCH-Q7  | parquet / none / none | 0.72   | 0.71|   +1.89% 
  |   3.48%   |   3.15%| 8 |   +1.61%   | 0.83| 1.13  |
| TPCH(1)  | TPCH-Q22 | parquet / none / none | 0.43   | 0.41|   +3.40% 
  |   1.21%   |   6.11%| 8 |   +0.05%   | 0.45| 1.54  |
| TPCH(1)  | TPCH-Q10 | parquet / none / none | 0.74   | 0.73|   +2.04% 
  |   0.75%   |   2.57%| 8 |   +1.13%   | 2.11| 2.16  |
| TPCH(1)  | TPCH-Q20 | parquet / none / none | 0.55   | 0.54|   +1.42% 
  |   2.95%   |   1.14%| 8 |   +0.19%   | 0.57| 1.25  |
| TPCH(1)  | TPCH-Q11 | parquet / none / none | 0.25   | 0.25|   +0.37% 
  |   2.41%   |   3.09%| 8 |   +0.82%   | 0.45| 0.27  |
| TPCH(1)  | TPCH-Q19 | parquet / none / none | 0.62   | 0.61|   +0.92% 
  |   3.17%   |   1.53%| 8 |   +0.05%   | 0.19| 0.73  |
| TPCH(1)  | TPCH-Q4  | parquet / none / none | 0.42   | 0.42|   +0.37% 
  |   1.24%   |   1.08%| 8 |   +0.03%   | 0.19| 0.63  |
| TPCH(1)  | TPCH-Q5  | parquet / none / none | 0.72   | 0.72|   +0.19% 
  |   3.08%   |   3.23%| 8 |   +0.07%   | 0.57| 0.12  |
| TPCH(1)  | TPCH-Q18 | parquet / none / none | 1.20   | 1.19|   +0.04% 
  |   0.12%   |   0.15%| 8 |   +0.03%   | 0.57| 0.53  |
| TPCH(1)  | TPCH-Q9  | parquet / none / none | 1.13   | 1.13|   +0.02% 
  |   2.05%   |   2.04%| 8 |   -0.01%   | -0.57   | 0.02  |
| TPCH(1)  | TPCH-Q3  | parquet / none / none | 0.60   | 0.60|   +0.03% 
  |   3.71%   |   3.99%| 8 |   -0.04%   | -0.06   | 0.01  |
| TPCH(1)  | TPCH-Q2  | parquet / none / none | 0.44   | 0.44|   -0.70% 
  |   8.56%   |   7.42%| 8 |   +0.61%   | 0.45| -0.18 |
| TPCH(1)  | TPCH-Q6  | parquet / none / none | 0.31   | 0.31|   -0.07% 
  |   1.73%   |   1.56%| 8 |   -0.06%   | -1.34   | -0.09 |
| TPCH(1)  | TPCH-Q14 | parquet / none / none | 0.48   | 0.48|   -0.25% 
  |   1.20%   |   1.47%| 8 |   -0.09%   | -0.57   | -0.37 |
| TPCH(1)  | TPCH-Q12 | parquet / none / none | 0.54   | 0.54|   -0.86% 
  |   1.90%   |   4.48%| 8 |   +0.11%   | 0.32| -0.50 |
| TPCH(1)  | TPCH-Q21 | parquet / none / none | 1.35   | 1.37|   -1.56% 
  |   3.11%   |   2.55%| 8 |   -0.19%   | -0.70   | -1.11 |
| TPCH(1)  | TPCH-Q16 | parquet / none / none | 0.26   | 0.27|   -1.74% 
  |   9.53%   | * 13.61% * | 8 |   -0.32%   | -0.06   | -0.30 |
| TPCH(1)  | TPCH-Q1  | parquet / none / none | 0.64   | 0.67|   -4.64% 
  |   1.13%   | * 12.59% * | 8 |   -0.28%   | -0.19   | -1.04 |
| TPCH(1)  | TPCH-Q13 

[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20733/1/be/src/codegen/llvm-object-cache-wrapper.cc
File be/src/codegen/llvm-object-cache-wrapper.cc:

http://gerrit.cloudera.org:8080/#/c/20733/1/be/src/codegen/llvm-object-cache-wrapper.cc@50
PS1, Line 50:   // Return a copy of the object cache containing the compiled 
module with the specified ID.
line too long (92 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 1
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 28 Nov 2023 06:17:54 +
Gerrit-HasComments: Yes