[Impala-ASF-CR] IMPALA-12269: Codegen cache false negative because of function names hash
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20168 ) Change subject: IMPALA-12269: Codegen cache false negative because of function names hash .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13714/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 Gerrit-Change-Number: 20168 Gerrit-PatchSet: 8 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 04 Aug 2023 10:09:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12269: Codegen cache false negative because of function names hash
Daniel Becker has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20168 ) Change subject: IMPALA-12269: Codegen cache false negative because of function names hash .. IMPALA-12269: Codegen cache false negative because of function names hash Codegen cache entries (execution engines holding an LLVM code module) are stored by keys derived from the unoptimised llvm modules: the key is either the whole unoptimised module (normal mode) or its hash (optimal mode). Because hash collisions are possible (in optimal mode), as an extra precaution we also compare the hashes of the function names in the current and the cached module. However, when assembling the function name list we do not filter out duplicate function names, which may result in cases where the unoptimised llvm modules are identical but the function name hashes do not match. Example: First query: select int_col, tinyint_col from alltypessmall order by int_col desc limit 20; Second query: select tinyint_col from alltypessmall order by int_col desc limit 20; In the first query, there are two 'SlotRef' objects referencing 'tinyint_col' which want to codegen a 'GetSlotRef()' function. The second invokation of 'SlotRef::GetCodegendComputeFnImpl()' checks the already codegen'd functions, finds the function created by its first invokation and returns that. The two 'SlotRef' objects will use the same 'llvm::Function' and there will be only one copy of it in the module, but both 'SlotRef's will call 'LlvmCodeGen::AddFunctionToJit()' with this function in order for their respective function pointers to be set after JIT-compilation. 'LlvmCodeGen::GetAllFunctionNames()' will return the names of all functions with which 'LlvmCodeGen::AddFunctionToJit()' has been called, including duplicates. The second query generates the same unoptimised module as the first query (for the corresponding fragment), but does not have a duplicated 'GetSlotRef()' function in its function name list, so the cached module is rejected. Note that this also results in the cached module being evicted when the new module from the second query is inserted into the cache because the new module will have the same key as the cached one (the modules are identical). This change fixes this problem by using a de-duplicated and sorted function name list. Testing: - Added a test in test_codegen_cache.py that asserts that there is a cache hit and no eviction in the above example. Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 Reviewed-on: http://gerrit.cloudera.org:8080/20168 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/codegen/llvm-codegen-cache-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M tests/custom_cluster/test_codegen_cache.py 4 files changed, 125 insertions(+), 61 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/20168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 Gerrit-Change-Number: 20168 Gerrit-PatchSet: 9 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-12269: Codegen cache false negative because of function names hash
Daniel Becker has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/20168 ) Change subject: IMPALA-12269: Codegen cache false negative because of function names hash .. IMPALA-12269: Codegen cache false negative because of function names hash Codegen cache entries (execution engines holding an LLVM code module) are stored by keys derived from the unoptimised llvm modules: the key is either the whole unoptimised module (normal mode) or its hash (optimal mode). Because hash collisions are possible (in optimal mode), as an extra precaution we also compare the hashes of the function names in the current and the cached module. However, when assembling the function name list we do not filter out duplicate function names, which may result in cases where the unoptimised llvm modules are identical but the function name hashes do not match. Example: First query: select int_col, tinyint_col from alltypessmall order by int_col desc limit 20; Second query: select tinyint_col from alltypessmall order by int_col desc limit 20; In the first query, there are two 'SlotRef' objects referencing 'tinyint_col' which want to codegen a 'GetSlotRef()' function. The second invokation of 'SlotRef::GetCodegendComputeFnImpl()' checks the already codegen'd functions, finds the function created by its first invokation and returns that. The two 'SlotRef' objects will use the same 'llvm::Function' and there will be only one copy of it in the module, but both 'SlotRef's will call 'LlvmCodeGen::AddFunctionToJit()' with this function in order for their respective function pointers to be set after JIT-compilation. 'LlvmCodeGen::GetAllFunctionNames()' will return the names of all functions with which 'LlvmCodeGen::AddFunctionToJit()' has been called, including duplicates. The second query generates the same unoptimised module as the first query (for the corresponding fragment), but does not have a duplicated 'GetSlotRef()' function in its function name list, so the cached module is rejected. Note that this also results in the cached module being evicted when the new module from the second query is inserted into the cache because the new module will have the same key as the cached one (the modules are identical). This change fixes this problem by using a de-duplicated and sorted function name list. Testing: - Added a test in test_codegen_cache.py that asserts that there is a cache hit and no eviction in the above example. Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 --- M be/src/codegen/llvm-codegen-cache-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M tests/custom_cluster/test_codegen_cache.py 4 files changed, 125 insertions(+), 61 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/20168/8 -- To view, visit http://gerrit.cloudera.org:8080/20168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 Gerrit-Change-Number: 20168 Gerrit-PatchSet: 8 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-12269: Codegen cache false negative because of function names hash
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20168 ) Change subject: IMPALA-12269: Codegen cache false negative because of function names hash .. Patch Set 6: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/20168/5/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20168/5/be/src/codegen/llvm-codegen.cc@1438 PS5, Line 1438: for (auto& entry : fns_to_jit_compile_) { > It's already checked by the DCHECKS on L1448 and L1469. Ack -- To view, visit http://gerrit.cloudera.org:8080/20168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 Gerrit-Change-Number: 20168 Gerrit-PatchSet: 6 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 03 Aug 2023 15:02:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12269: Codegen cache false negative because of function names hash
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20168 ) Change subject: IMPALA-12269: Codegen cache false negative because of function names hash .. Patch Set 6: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/20168/6/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20168/6/be/src/codegen/llvm-codegen.cc@1196 PS6, Line 1196: // The function names are sorted in 'fns_to_jit_compile_'. nit: extra space? -- To view, visit http://gerrit.cloudera.org:8080/20168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 Gerrit-Change-Number: 20168 Gerrit-PatchSet: 6 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 03 Aug 2023 14:43:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12269: Codegen cache false negative because of function names hash
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20168 ) Change subject: IMPALA-12269: Codegen cache false negative because of function names hash .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13705/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 Gerrit-Change-Number: 20168 Gerrit-PatchSet: 6 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 03 Aug 2023 10:17:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12269: Codegen cache false negative because of function names hash
Daniel Becker has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/20168 ) Change subject: IMPALA-12269: Codegen cache false negative because of function names hash .. IMPALA-12269: Codegen cache false negative because of function names hash Codegen cache entries (execution engines holding an LLVM code module) are stored by keys derived from the unoptimised llvm modules: the key is either the whole unoptimised module (normal mode) or its hash (optimal mode). Because hash collisions are possible (in optimal mode), as an extra precaution we also compare the hashes of the function names in the current and the cached module. However, when assembling the function name list we do not filter out duplicate function names, which may result in cases where the unoptimised llvm modules are identical but the function name hashes do not match. Example: First query: select int_col, tinyint_col from alltypessmall order by int_col desc limit 20; Second query: select tinyint_col from alltypessmall order by int_col desc limit 20; In the first query, there are two 'SlotRef' objects referencing 'tinyint_col' which want to codegen a 'GetSlotRef()' function. The second invokation of 'SlotRef::GetCodegendComputeFnImpl()' checks the already codegen'd functions, finds the function created by its first invokation and returns that. The two 'SlotRef' objects will use the same 'llvm::Function' and there will be only one copy of it in the module, but both 'SlotRef's will call 'LlvmCodeGen::AddFunctionToJit()' with this function in order for their respective function pointers to be set after JIT-compilation. 'LlvmCodeGen::GetAllFunctionNames()' will return the names of all functions with which 'LlvmCodeGen::AddFunctionToJit()' has been called, including duplicates. The second query generates the same unoptimised module as the first query (for the corresponding fragment), but does not have a duplicated 'GetSlotRef()' function in its function name list, so the cached module is rejected. Note that this also results in the cached module being evicted when the new module from the second query is inserted into the cache because the new module will have the same key as the cached one (the modules are identical). This change fixes this problem by using a de-duplicated and sorted function name list. Testing: - Added a test in test_codegen_cache.py that asserts that there is a cache hit and no eviction in the above example. Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 --- M be/src/codegen/llvm-codegen-cache-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M tests/custom_cluster/test_codegen_cache.py 4 files changed, 125 insertions(+), 61 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/20168/6 -- To view, visit http://gerrit.cloudera.org:8080/20168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 Gerrit-Change-Number: 20168 Gerrit-PatchSet: 6 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-12269: Codegen cache false negative because of function names hash
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20168 ) Change subject: IMPALA-12269: Codegen cache false negative because of function names hash .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/20168/5/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20168/5/be/src/codegen/llvm-codegen.cc@1202 PS5, Line 1202: return result.str(); > nit. unnecessary newline Done http://gerrit.cloudera.org:8080/#/c/20168/5/be/src/codegen/llvm-codegen.cc@1438 PS5, Line 1438: for (auto& entry : fns_to_jit_compile_) { > Because for SetFunctionPointers() "either both or none of 'cache' and 'cach It's already checked by the DCHECKS on L1448 and L1469. -- To view, visit http://gerrit.cloudera.org:8080/20168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 Gerrit-Change-Number: 20168 Gerrit-PatchSet: 6 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 03 Aug 2023 09:52:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12269: Codegen cache false negative because of function names hash
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20168 ) Change subject: IMPALA-12269: Codegen cache false negative because of function names hash .. Patch Set 5: Code-Review+1 (2 comments) It looks good. Could Csaba have one more look at the patch after changing to use map for codegen functions? http://gerrit.cloudera.org:8080/#/c/20168/5/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20168/5/be/src/codegen/llvm-codegen.cc@1202 PS5, Line 1202: nit. unnecessary newline http://gerrit.cloudera.org:8080/#/c/20168/5/be/src/codegen/llvm-codegen.cc@1438 PS5, Line 1438: // Get pointers to all codegen'd functions. Because for SetFunctionPointers() "either both or none of 'cache' and 'cache_key' should be NULL", can we add a DCHECK here for cache and cache_key to make sure they meet our requirement? -- To view, visit http://gerrit.cloudera.org:8080/20168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 Gerrit-Change-Number: 20168 Gerrit-PatchSet: 5 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 01 Aug 2023 23:18:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12269: Codegen cache false negative because of function names hash
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20168 ) Change subject: IMPALA-12269: Codegen cache false negative because of function names hash .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13639/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 Gerrit-Change-Number: 20168 Gerrit-PatchSet: 5 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Wed, 26 Jul 2023 16:34:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12269: Codegen cache false negative because of function names hash
Daniel Becker has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/20168 ) Change subject: IMPALA-12269: Codegen cache false negative because of function names hash .. IMPALA-12269: Codegen cache false negative because of function names hash Codegen cache entries (execution engines holding an LLVM code module) are stored by keys derived from the unoptimised llvm modules: the key is either the whole unoptimised module (normal mode) or its hash (optimal mode). Because hash collisions are possible (in optimal mode), as an extra precaution we also compare the hashes of the function names in the current and the cached module. However, when assembling the function name list we do not filter out duplicate function names, which may result in cases where the unoptimised llvm modules are identical but the function name hashes do not match. Example: First query: select int_col, tinyint_col from alltypessmall order by int_col desc limit 20; Second query: select tinyint_col from alltypessmall order by int_col desc limit 20; In the first query, there are two 'SlotRef' objects referencing 'tinyint_col' which want to codegen a 'GetSlotRef()' function. The second invokation of 'SlotRef::GetCodegendComputeFnImpl()' checks the already codegen'd functions, finds the function created by its first invokation and returns that. The two 'SlotRef' objects will use the same 'llvm::Function' and there will be only one copy of it in the module, but both 'SlotRef's will call 'LlvmCodeGen::AddFunctionToJit()' with this function in order for their respective function pointers to be set after JIT-compilation. 'LlvmCodeGen::GetAllFunctionNames()' will return the names of all functions with which 'LlvmCodeGen::AddFunctionToJit()' has been called, including duplicates. The second query generates the same unoptimised module as the first query (for the corresponding fragment), but does not have a duplicated 'GetSlotRef()' function in its function name list, so the cached module is rejected. Note that this also results in the cached module being evicted when the new module from the second query is inserted into the cache because the new module will have the same key as the cached one (the modules are identical). This change fixes this problem by using a de-duplicated and sorted function name list. Testing: - Added a test in test_codegen_cache.py that asserts that there is a cache hit and no eviction in the above example. Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 --- M be/src/codegen/llvm-codegen-cache-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M tests/custom_cluster/test_codegen_cache.py 4 files changed, 126 insertions(+), 61 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/20168/5 -- To view, visit http://gerrit.cloudera.org:8080/20168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 Gerrit-Change-Number: 20168 Gerrit-PatchSet: 5 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-12269: Codegen cache false negative because of function names hash
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20168 ) Change subject: IMPALA-12269: Codegen cache false negative because of function names hash .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13625/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 Gerrit-Change-Number: 20168 Gerrit-PatchSet: 4 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 24 Jul 2023 13:06:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12269: Codegen cache false negative because of function names hash
Daniel Becker has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/20168 ) Change subject: IMPALA-12269: Codegen cache false negative because of function names hash .. IMPALA-12269: Codegen cache false negative because of function names hash Codegen cache entries (execution engines holding an LLVM code module) are stored by keys derived from the unoptimised llvm modules: the key is either the whole unoptimised module (normal mode) or its hash (optimal mode). Because hash collisions are possible (in optimal mode), as an extra precaution we also compare the hashes of the function names in the current and the cached module. However, when assembling the function name list we do not filter out duplicate function names, which may result in cases where the unoptimised llvm modules are identical but the function name hashes do not match. Example: First query: select int_col, tinyint_col from alltypessmall order by int_col desc limit 20; Second query: select tinyint_col from alltypessmall order by int_col desc limit 20; In the first query, there are two 'SlotRef' objects referencing 'tinyint_col' which want to codegen a 'GetSlotRef()' function. The second invokation of 'SlotRef::GetCodegendComputeFnImpl()' checks the already codegen'd functions, finds the function created by its first invokation and returns that. The two 'SlotRef' objects will use the same 'llvm::Function' and there will be only one copy of it in the module, but both 'SlotRef's will call 'LlvmCodeGen::AddFunctionToJit()' with this function in order for their respective function pointers to be set after JIT-compilation. 'LlvmCodeGen::GetAllFunctionNames()' will return the names of all functions with which 'LlvmCodeGen::AddFunctionToJit()' has been called, including duplicates. The second query generates the same unoptimised module as the first query (for the corresponding fragment), but does not have a duplicated 'GetSlotRef()' function in its function name list, so the cached module is rejected. Note that this also results in the cached module being evicted when the new module from the second query is inserted into the cache because the new module will have the same key as the cached one (the modules are identical). This change fixes this problem by filtering out duplicates and sorting the elements when creating the list of function names. Testing: - Added a test in test_codegen_cache.py that asserts that there is a cache hit and no eviction in the above example. Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 --- M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M tests/custom_cluster/test_codegen_cache.py 3 files changed, 51 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/20168/4 -- To view, visit http://gerrit.cloudera.org:8080/20168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 Gerrit-Change-Number: 20168 Gerrit-PatchSet: 4 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-12269: Codegen cache false negative because of function names hash
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20168 ) Change subject: IMPALA-12269: Codegen cache false negative because of function names hash .. Patch Set 3: Code-Review+1 (1 comment) Thanks Daniel for fixing this. A question maybe not related to this fix, does the same function name always mean the same function in different modules? i.e, is GetSlotRef_xxx in module a always same as GetSlotRef_xxx in module b? http://gerrit.cloudera.org:8080/#/c/20168/3/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20168/3/be/src/codegen/llvm-codegen.cc@1256 PS3, Line 1256: exported_fn_names Here doing a similar thing to put all the names to a set. Seems the repetitive computation can be avoided. Do you think it good to let them use the same set (this one uses unordered_set, but we require an order)? Or even maybe change the fns_to_jit_compile_ to a map with the function name being the key? -- To view, visit http://gerrit.cloudera.org:8080/20168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 Gerrit-Change-Number: 20168 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 11 Jul 2023 21:10:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12269: Codegen cache false negative because of function names hash
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20168 ) Change subject: IMPALA-12269: Codegen cache false negative because of function names hash .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 Gerrit-Change-Number: 20168 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 11 Jul 2023 10:00:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12269: Codegen cache false negative because of function names hash
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20168 ) Change subject: IMPALA-12269: Codegen cache false negative because of function names hash .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13498/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 Gerrit-Change-Number: 20168 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 10 Jul 2023 15:00:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12269: Codegen cache false negative because of function names hash
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20168 ) Change subject: IMPALA-12269: Codegen cache false negative because of function names hash .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/20168/2/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20168/2/be/src/codegen/llvm-codegen.cc@1216 PS2, Line 1216: fns_to_jit_compile_ > Can you add a comment to the definition of fns_to_jit_compile_ that it can Done -- To view, visit http://gerrit.cloudera.org:8080/20168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 Gerrit-Change-Number: 20168 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 10 Jul 2023 14:36:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12269: Codegen cache false negative because of function names hash
Daniel Becker has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/20168 ) Change subject: IMPALA-12269: Codegen cache false negative because of function names hash .. IMPALA-12269: Codegen cache false negative because of function names hash Codegen cache entries (execution engines holding an LLVM code module) are stored by keys derived from the unoptimised llvm modules: the key is either the whole unoptimised module (normal mode) or its hash (optimal mode). Because hash collisions are possible (in optimal mode), as an extra precaution we also compare the hashes of the function names in the current and the cached module. However, when assembling the function name list we do not filter out duplicate function names, which may result in cases where the unoptimised llvm modules are identical but the function name hashes do not match. Example: First query: select int_col, tinyint_col from alltypessmall order by int_col desc limit 20; Second query: select tinyint_col from alltypessmall order by int_col desc limit 20; In the first query, there are two 'SlotRef' objects referencing 'tinyint_col' which want to codegen a 'GetSlotRef()' function. The second invokation of 'SlotRef::GetCodegendComputeFnImpl()' checks the already codegen'd functions, finds the function created by its first invokation and returns that. The two 'SlotRef' objects will use the same 'llvm::Function' and there will be only one copy of it in the module, but both 'SlotRef's will call 'LlvmCodeGen::AddFunctionToJit()' with this function in order for their respective function pointers to be set after JIT-compilation. 'LlvmCodeGen::GetAllFunctionNames()' will return the names of all functions with which 'LlvmCodeGen::AddFunctionToJit()' has been called, including duplicates. The second query generates the same unoptimised module as the first query (for the corresponding fragment), but does not have a duplicated 'GetSlotRef()' function in its function name list, so the cached module is rejected. Note that this also results in the cached module being evicted when the new module from the second query is inserted into the cache because the new module will have the same key as the cached one (the modules are identical). This change fixes this problem by filtering out duplicates and sorting the elements when creating the list of function names. Testing: - Added a test in test_codegen_cache.py that asserts that there is a cache hit and no eviction in the above example. Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 --- M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M tests/custom_cluster/test_codegen_cache.py 3 files changed, 51 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/20168/3 -- To view, visit http://gerrit.cloudera.org:8080/20168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 Gerrit-Change-Number: 20168 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-12269: Codegen cache false negative because of function names hash
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20168 ) Change subject: IMPALA-12269: Codegen cache false negative because of function names hash .. Patch Set 2: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/20168/2/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20168/2/be/src/codegen/llvm-codegen.cc@1216 PS2, Line 1216: fns_to_jit_compile_ Can you add a comment to the definition of fns_to_jit_compile_ that it can contain duplicates? -- To view, visit http://gerrit.cloudera.org:8080/20168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 Gerrit-Change-Number: 20168 Gerrit-PatchSet: 2 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 10 Jul 2023 14:06:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12269: Codegen cache false negative because of function names hash
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20168 ) Change subject: IMPALA-12269: Codegen cache false negative because of function names hash .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13489/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 Gerrit-Change-Number: 20168 Gerrit-PatchSet: 2 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 07 Jul 2023 17:18:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12269: Codegen cache false negative because of function names hash
Daniel Becker has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/20168 ) Change subject: IMPALA-12269: Codegen cache false negative because of function names hash .. IMPALA-12269: Codegen cache false negative because of function names hash Codegen cache entries (execution engines holding an LLVM code module) are stored by keys derived from the unoptimised llvm modules: the key is either the whole unoptimised module (normal mode) or its hash (optimal mode). Because hash collisions are possible (in optimal mode), as an extra precaution we also compare the hashes of the function names in the current and the cached module. However, when assembling the function name list we do not filter out duplicate function names, which may result in cases where the unoptimised llvm modules are identical but the function name hashes do not match. Example: First query: select int_col, tinyint_col from alltypessmall order by int_col desc limit 20; Second query: select tinyint_col from alltypessmall order by int_col desc limit 20; In the first query, there are two 'SlotRef' objects referencing 'tinyint_col' which want to codegen a 'GetSlotRef()' function. The second invokation of 'SlotRef::GetCodegendComputeFnImpl()' checks the already codegen'd functions, finds the function created by its first invokation and returns that. The two 'SlotRef' objects will use the same 'llvm::Function' and there will be only one copy of it in the module, but both 'SlotRef's will call 'LlvmCodeGen::AddFunctionToJit()' with this function in order for their respective function pointers to be set after JIT-compilation. 'LlvmCodeGen::GetAllFunctionNames()' will return the names of all functions with which 'LlvmCodeGen::AddFunctionToJit()' has been called, including duplicates. The second query generates the same unoptimised module as the first query (for the corresponding fragment), but does not have a duplicated 'GetSlotRef()' function in its function name list, so the cached module is rejected. Note that this also results in the cached module being evicted when the new module from the second query is inserted into the cache because the new module will have the same key as the cached one (the modules are identical). This change fixes this problem by filtering out duplicates and sorting the elements when creating the list of function names. Testing: - Added a test in test_codegen_cache.py that asserts that there is a cache hit and no eviction in the above example. Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 --- M be/src/codegen/llvm-codegen.cc M tests/custom_cluster/test_codegen_cache.py 2 files changed, 43 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/20168/2 -- To view, visit http://gerrit.cloudera.org:8080/20168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 Gerrit-Change-Number: 20168 Gerrit-PatchSet: 2 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-12269: Codegen cache false negative because of function names hash
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20168 ) Change subject: IMPALA-12269: Codegen cache false negative because of function names hash .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13487/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 Gerrit-Change-Number: 20168 Gerrit-PatchSet: 1 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 07 Jul 2023 14:31:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12269: Codegen cache false negative because of function names hash
Daniel Becker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20168 Change subject: IMPALA-12269: Codegen cache false negative because of function names hash .. IMPALA-12269: Codegen cache false negative because of function names hash Codegen cache entries (execution engines holding an LLVM code module) are stored by keys derived from the unoptimised llvm modules: the key is either the whole unoptimised module (normal mode) or its hash (optimal mode). Because hash collisions are possible (in optimal mode), as an extra precaution we also compare the hashes of the function names in the current and the cached module. However, when assembling the function name list we do not filter out duplicate function names, which may result in cases where the unoptimised llvm modules are identical but the function name hashes do not match. Example: First query: select int_col, tinyint_col from alltypessmall order by int_col desc limit 20; Second query: select tinyint_col from alltypessmall order by int_col desc limit 20; In the first query, there are two 'SlotRef' objects referencing 'tinyint_col' which want to codegen a 'GetSlotRef()' function. The second invokation of 'SlotRef::GetCodegendComputeFnImpl()' checks the already codegen'd functions, finds the function created by its first invokation and returns that. The two 'SlotRef' objects will use the same 'llvm::Function' and there will be only one copy of it in the module, but both 'SlotRef's will call 'LlvmCodeGen::AddFunctionToJit()' with this function in order for their respective function pointers to be set after JIT-compilation. 'LlvmCodeGen::GetAllFunctionNames()' will return the names of all functions with which 'LlvmCodeGen::AddFunctionToJit()' has been called, including duplicates. The second query generates the same unoptimised module as the first query (for the corresponding fragment), but does not have a duplicated 'GetSlotRef()' function in its function name list, so the cached module is rejected. Note that this also results in the cached module being evicted when the new module from the second query is inserted into the cache because the new module will have the same key as the cached one (the modules are identical). This change fixes this problem by filtering out duplicates and sorting the elements when creating the list of function names. Testing: - Added a test in test_codegen_cache.py that asserts that there is a cache hit and no eviction in the above example. Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 --- M be/src/codegen/llvm-codegen.cc M tests/custom_cluster/test_codegen_cache.py 2 files changed, 44 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/20168/1 -- To view, visit http://gerrit.cloudera.org:8080/20168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 Gerrit-Change-Number: 20168 Gerrit-PatchSet: 1 Gerrit-Owner: Daniel Becker
[Impala-ASF-CR] IMPALA-12269: Codegen cache false negative because of function names hash
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20168 ) Change subject: IMPALA-12269: Codegen cache false negative because of function names hash .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/20168/1/tests/custom_cluster/test_codegen_cache.py File tests/custom_cluster/test_codegen_cache.py: http://gerrit.cloudera.org:8080/#/c/20168/1/tests/custom_cluster/test_codegen_cache.py@21 PS1, Line 21: import re flake8: F401 're' imported but unused http://gerrit.cloudera.org:8080/#/c/20168/1/tests/custom_cluster/test_codegen_cache.py@163 PS1, Line 163: r flake8: F841 local variable 'result' is assigned to but never used -- To view, visit http://gerrit.cloudera.org:8080/20168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 Gerrit-Change-Number: 20168 Gerrit-PatchSet: 1 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 07 Jul 2023 14:09:33 + Gerrit-HasComments: Yes