Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/19181 )
Change subject: IMPALA-11470: Add Cache For Codegen Functions ...................................................................... Patch Set 13: (10 comments) http://gerrit.cloudera.org:8080/#/c/19181/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19181/11//COMMIT_MSG@53 PS11, Line 53: codegen_cache_mode > What will happen if the user changes between the normal and optimal? This c Yeah, for the same entry, normal and optimal keys will use the same hashcode, but the normal one will have the content of keys, so always longer and different from the optimal one. NORMAL: hashcode + keys OPTIMAL: hashcode So, they will be different keys. Added a unit test for testing the mode switch. http://gerrit.cloudera.org:8080/#/c/19181/12/be/src/codegen/llvm-codegen-cache.h File be/src/codegen/llvm-codegen-cache.h: http://gerrit.cloudera.org:8080/#/c/19181/12/be/src/codegen/llvm-codegen-cache.h@30 PS12, Line 30: #include "thirdparty/datasketches/MurmurHash3.h" > nit: not actually used in the header. Header thirdparty/datasketches/MurmurHash3.h is needed by the struct HashState. It is okay to change to thirdparty/murmurhash, but it looks to me the datasketches one is a better implementation, because according to the header description it is modified from the "thirdparty/murmurhash" version, using a uint64_t seed and passing a reference of a defined structure instead of setting values on a (void*) pointer, which can be scary. http://gerrit.cloudera.org:8080/#/c/19181/11/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/19181/11/be/src/codegen/llvm-codegen.cc@932 PS11, Line 932: fn.binary_type == TFunctionBinaryType::NATIVE > My understanding is that this can be false only in case broken_builtin == t Did an investigation, the broken_builtin, such as from_utc_timestamp and months_add (*AddSub*) would go in this logic. However, the builtin function address should not be changed during runtime because the predefined database _impala_builtins which holds the builtin functions couldn't be dropped and they should be loaded from current_process_handle_, https://github.com/apache/impala/blob/84fa6d210d3966e5ece8b4ac84ff8bd8780dec4e/be/src/runtime/lib-cache.cc#L147, and current_process_handle_ is loaded only when initialization, at least for from_utc_timestamp and months_add, there seems no chance to change the function address in runtime. So it looks fine to me if we don't handle the broken builtin, because they won't be changed in runtime (unless I have missed anything), it should be good to use the old address from the cache. Added an EE test using from_utc_timestamp for codegen cache. http://gerrit.cloudera.org:8080/#/c/19181/12/be/src/runtime/test-env.h File be/src/runtime/test-env.h: http://gerrit.cloudera.org:8080/#/c/19181/12/be/src/runtime/test-env.h@23 PS12, Line 23: #include "runtime/io/disk-io-mgr.h" > Why was this include needed? Only new identifier used that I see is CodeGen It is from the old code, the formatter rearranges the headers to a correct order. http://gerrit.cloudera.org:8080/#/c/19181/11/tests/custom_cluster/test_codegen_cache.py File tests/custom_cluster/test_codegen_cache.py: http://gerrit.cloudera.org:8080/#/c/19181/11/tests/custom_cluster/test_codegen_cache.py@51 PS11, Line 51: test_codegen_cache_int_col > optional: I think that it could make the test s bit faster if these similar It could need some work to change. But the tests are running in exhaustive mode only, like the data cache EE tests, should be good for now because all the tests can be finished in two minutes in total, very small compared to the entire exhaustive tests. http://gerrit.cloudera.org:8080/#/c/19181/12/tests/custom_cluster/test_codegen_cache.py File tests/custom_cluster/test_codegen_cache.py: http://gerrit.cloudera.org:8080/#/c/19181/12/tests/custom_cluster/test_codegen_cache.py@71 PS12, Line 71: "select * from functional.alltypes where bigint_col > 0") > nit: add space before '>' operator Done http://gerrit.cloudera.org:8080/#/c/19181/12/tests/custom_cluster/test_codegen_cache.py@77 PS12, Line 77: "select * from functional.alltypes where float_col > 0") > nit: add space before '>' operator Done http://gerrit.cloudera.org:8080/#/c/19181/12/tests/custom_cluster/test_codegen_cache.py@83 PS12, Line 83: "select * from functional.alltypes where double_col > 0") > nit: add space before '>' operator Done http://gerrit.cloudera.org:8080/#/c/19181/12/tests/custom_cluster/test_codegen_cache.py@89 PS12, Line 89: "select * from functional.alltypes where date_string_col != ''") > nit: add space before '!' operator Done http://gerrit.cloudera.org:8080/#/c/19181/12/tests/custom_cluster/test_codegen_cache.py@95 PS12, Line 95: "select * from functional.alltypes where string_col !=''") > nit: add space before '>' operator Done -- To view, visit http://gerrit.cloudera.org:8080/19181 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If42c78a7f51fd582e5fe331fead494dadf544eb1 Gerrit-Change-Number: 19181 Gerrit-PatchSet: 13 Gerrit-Owner: Yida Wu <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Thu, 01 Dec 2022 06:16:04 +0000 Gerrit-HasComments: Yes
