Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19181 )

Change subject: IMPALA-11470: Add Cache For Codegen Functions
......................................................................


Patch Set 13:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/19181/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19181/11//COMMIT_MSG@53
PS11, Line 53: codegen_cache_mode
> What will happen if the user changes between the normal and optimal? This c
Yeah, for the same entry, normal and optimal keys will use the same hashcode, 
but the normal one will have the content of keys, so always longer and 
different from the optimal one.
NORMAL: hashcode + keys
OPTIMAL: hashcode
So, they will be different keys.
Added a unit test for testing the mode switch.


http://gerrit.cloudera.org:8080/#/c/19181/12/be/src/codegen/llvm-codegen-cache.h
File be/src/codegen/llvm-codegen-cache.h:

http://gerrit.cloudera.org:8080/#/c/19181/12/be/src/codegen/llvm-codegen-cache.h@30
PS12, Line 30: #include "thirdparty/datasketches/MurmurHash3.h"
> nit: not actually used in the header.
Header thirdparty/datasketches/MurmurHash3.h is needed by the struct HashState.
It is okay to change to thirdparty/murmurhash, but it looks to me the 
datasketches one is a better implementation, because according to the header 
description it is modified from the "thirdparty/murmurhash" version, using a 
uint64_t seed and passing a reference of a defined structure instead of setting 
values on a (void*) pointer, which can be scary.


http://gerrit.cloudera.org:8080/#/c/19181/11/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

http://gerrit.cloudera.org:8080/#/c/19181/11/be/src/codegen/llvm-codegen.cc@932
PS11, Line 932: fn.binary_type == TFunctionBinaryType::NATIVE
> My understanding is that this can be false only in case broken_builtin == t
Did an investigation, the broken_builtin, such as from_utc_timestamp and 
months_add (*AddSub*) would go in this logic. However, the builtin function 
address should not be changed during runtime because the predefined database 
_impala_builtins which holds the builtin functions couldn't be dropped and they 
should be loaded from current_process_handle_, 
https://github.com/apache/impala/blob/84fa6d210d3966e5ece8b4ac84ff8bd8780dec4e/be/src/runtime/lib-cache.cc#L147,
 and current_process_handle_ is loaded only when initialization, at least for 
from_utc_timestamp and months_add, there seems no chance to change the function 
address in runtime.
So it looks fine to me if we don't handle the broken builtin, because they 
won't be changed in runtime (unless I have missed anything), it should be good 
to use the old address from the cache.
Added an EE test using from_utc_timestamp for codegen cache.


http://gerrit.cloudera.org:8080/#/c/19181/12/be/src/runtime/test-env.h
File be/src/runtime/test-env.h:

http://gerrit.cloudera.org:8080/#/c/19181/12/be/src/runtime/test-env.h@23
PS12, Line 23: #include "runtime/io/disk-io-mgr.h"
> Why was this include needed? Only new identifier used that I see is CodeGen
It is from the old code, the formatter rearranges the headers to a correct 
order.


http://gerrit.cloudera.org:8080/#/c/19181/11/tests/custom_cluster/test_codegen_cache.py
File tests/custom_cluster/test_codegen_cache.py:

http://gerrit.cloudera.org:8080/#/c/19181/11/tests/custom_cluster/test_codegen_cache.py@51
PS11, Line 51: test_codegen_cache_int_col
> optional: I think that it could make the test s bit faster if these similar
It could need some work to change. But the tests are running in exhaustive mode 
only, like the data cache EE tests, should be good for now because all the 
tests can be finished in two minutes in total, very small compared to the 
entire exhaustive tests.


http://gerrit.cloudera.org:8080/#/c/19181/12/tests/custom_cluster/test_codegen_cache.py
File tests/custom_cluster/test_codegen_cache.py:

http://gerrit.cloudera.org:8080/#/c/19181/12/tests/custom_cluster/test_codegen_cache.py@71
PS12, Line 71:       "select * from functional.alltypes where bigint_col > 0")
> nit: add space before '>' operator
Done


http://gerrit.cloudera.org:8080/#/c/19181/12/tests/custom_cluster/test_codegen_cache.py@77
PS12, Line 77:       "select * from functional.alltypes where float_col > 0")
> nit: add space before '>' operator
Done


http://gerrit.cloudera.org:8080/#/c/19181/12/tests/custom_cluster/test_codegen_cache.py@83
PS12, Line 83:       "select * from functional.alltypes where double_col > 0")
> nit: add space before '>' operator
Done


http://gerrit.cloudera.org:8080/#/c/19181/12/tests/custom_cluster/test_codegen_cache.py@89
PS12, Line 89:       "select * from functional.alltypes where date_string_col 
!= ''")
> nit: add space before '!' operator
Done


http://gerrit.cloudera.org:8080/#/c/19181/12/tests/custom_cluster/test_codegen_cache.py@95
PS12, Line 95:       "select * from functional.alltypes where string_col !=''")
> nit: add space before '>' operator
Done



--
To view, visit http://gerrit.cloudera.org:8080/19181
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If42c78a7f51fd582e5fe331fead494dadf544eb1
Gerrit-Change-Number: 19181
Gerrit-PatchSet: 13
Gerrit-Owner: Yida Wu <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Thu, 01 Dec 2022 06:16:04 +0000
Gerrit-HasComments: Yes

Reply via email to