[Impala-ASF-CR] IMPALA-13031: Enhancing logging for spilling configuration with local buffer directory details
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21350 ) Change subject: IMPALA-13031: Enhancing logging for spilling configuration with local buffer directory details .. Patch Set 1: Thanks wenzhe for reviewing it. -- To view, visit http://gerrit.cloudera.org:8080/21350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8fb357016d72a363ee5016f7881b0f6b0426aff5 Gerrit-Change-Number: 21350 Gerrit-PatchSet: 1 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Wed, 01 May 2024 23:27:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13031: Enhancing logging for spilling configuration with local buffer directory details
Yida Wu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21350 Change subject: IMPALA-13031: Enhancing logging for spilling configuration with local buffer directory details .. IMPALA-13031: Enhancing logging for spilling configuration with local buffer directory details The patch adds logging for local buffer directory when using remote scratch space. The printed log would be like "Using local buffer directory for scratch space /tmp/test/impala-scratch on disk 8 limit: 500.00 MB, priority: 2147483647". Tests: Manally tests the logging working as described. Change-Id: I8fb357016d72a363ee5016f7881b0f6b0426aff5 --- M be/src/runtime/tmp-file-mgr-internal.h M be/src/runtime/tmp-file-mgr.cc 2 files changed, 14 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/21350/1 -- To view, visit http://gerrit.cloudera.org:8080/21350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I8fb357016d72a363ee5016f7881b0f6b0426aff5 Gerrit-Change-Number: 21350 Gerrit-PatchSet: 1 Gerrit-Owner: Yida Wu
[Impala-ASF-CR] IMPALA-13015: Dataload fails due to concurrency issue with test.jceks
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21346 ) Change subject: IMPALA-13015: Dataload fails due to concurrency issue with test.jceks .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/21346 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7fbeffc19f2b78c19fee9acf7f96466c8f4f9bcd Gerrit-Change-Number: 21346 Gerrit-PatchSet: 3 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 23 Apr 2024 02:28:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13015: Dataload fails due to concurrency issue with test.jceks
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21346 ) Change subject: IMPALA-13015: Dataload fails due to concurrency issue with test.jceks .. Patch Set 2: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/21346/2/testdata/bin/create-load-data.sh File testdata/bin/create-load-data.sh: http://gerrit.cloudera.org:8080/#/c/21346/2/testdata/bin/create-load-data.sh@555 PS2, Line 555: run-step "Creating hadoop credential" create-hadoop-credential.log \ How about adding a comment here to explain why we place the command here and the potential concurrency issue with the Jira number? -- To view, visit http://gerrit.cloudera.org:8080/21346 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7fbeffc19f2b78c19fee9acf7f96466c8f4f9bcd Gerrit-Change-Number: 21346 Gerrit-PatchSet: 2 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 23 Apr 2024 00:28:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21315 ) Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/21315/2/be/src/exprs/ai-functions.inline.h File be/src/exprs/ai-functions.inline.h: http://gerrit.cloudera.org:8080/#/c/21315/2/be/src/exprs/ai-functions.inline.h@108 PS2, Line 108: plac > 'Move' is good from the context of the change, but if someone is reading th Done http://gerrit.cloudera.org:8080/#/c/21315/2/be/src/exprs/ai-functions.inline.h@108 PS2, Line 108: > Nit: it is not a loop, I wrote it wrong in my comment. "'if' statement" wou Done http://gerrit.cloudera.org:8080/#/c/21315/2/be/src/exprs/ai-functions.inline.h@178 PS2, Line 178: std::string response = AiGenerateTextParseOpenAiResponse( > The other alternative would've been to create rapid::json Document and pass Agree. Preferring string for now -- To view, visit http://gerrit.cloudera.org:8080/21315 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5 Gerrit-Change-Number: 21315 Gerrit-PatchSet: 3 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 18 Apr 2024 13:38:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest
Yida Wu has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/21315 ) Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest .. IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest The issue is that the code previously used a std::string_view to hold the data which is actually returned by rapidjson::Document. However, the rapidjson::Document object gets destroyed after creating the std::string_view. This meant the std::string_view referenced memory that was no longer valid, leading to a heap-use-after-free error. This patch fixes this issue by modifying the function to return a std::string instead of a std::string_view. When the function returns a string, it creates a copy of the data from rapidjson::Document. This ensures the returned string has its own memory allocation and doesn't rely on the destroyed rapidjson::Document. Tests: Reran the asan build and passed. Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5 --- M be/src/exprs/ai-functions-ir.cc M be/src/exprs/ai-functions.h M be/src/exprs/ai-functions.inline.h M be/src/exprs/expr-test.cc 4 files changed, 11 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/21315/3 -- To view, visit http://gerrit.cloudera.org:8080/21315 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5 Gerrit-Change-Number: 21315 Gerrit-PatchSet: 3 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-12874: Identify active and standby catalog and statestore in the web debug endpoint
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21294 ) Change subject: IMPALA-12874: Identify active and standby catalog and statestore in the web debug endpoint .. Patch Set 4: Irrelevant Iceberg issue: IMPALA-12621 -- To view, visit http://gerrit.cloudera.org:8080/21294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie9435ba7a9549ea56f9d080a9315aecbcc630cd2 Gerrit-Change-Number: 21294 Gerrit-PatchSet: 4 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 18 Apr 2024 13:21:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12874: Identify active and standby catalog and statestore in the web debug endpoint
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21294 ) Change subject: IMPALA-12874: Identify active and standby catalog and statestore in the web debug endpoint .. Patch Set 3: (3 comments) Thanks for the review. This patch fixes previous feedbacks, and combines the logic of adding status to a function AddActiveStatus() in default-path-handlers.cc. http://gerrit.cloudera.org:8080/#/c/21294/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21294/2//COMMIT_MSG@7 PS2, Line 7: IMPALA-12874: Identify active and standby catalog and statestore in the web debug endpoint > Incomplete sentence: Done http://gerrit.cloudera.org:8080/#/c/21294/2//COMMIT_MSG@20 PS2, Line 20: Manually tests the web page, and verified the status display is : correct. > If there is a failover does the status get updated? Can we also test that? Tested and worked. Added in the commit message. http://gerrit.cloudera.org:8080/#/c/21294/2/be/src/util/default-path-handlers.cc File be/src/util/default-path-handlers.cc: http://gerrit.cloudera.org:8080/#/c/21294/2/be/src/util/default-path-handlers.cc@271 PS2, Line 271: > admissiond also uses DaemonEnv, we should handle that case also. Maybe exit Done -- To view, visit http://gerrit.cloudera.org:8080/21294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie9435ba7a9549ea56f9d080a9315aecbcc630cd2 Gerrit-Change-Number: 21294 Gerrit-PatchSet: 3 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Wed, 17 Apr 2024 18:18:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12874: Identify active and standby catalog and statestore in the web debug endpoint
Yida Wu has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/21294 ) Change subject: IMPALA-12874: Identify active and standby catalog and statestore in the web debug endpoint .. IMPALA-12874: Identify active and standby catalog and statestore in the web debug endpoint This patch adds support to display the HA status of catalog and statestore on the root web page. The status will be presented as "Catalog Status: Active" or "Statestore Status: Standby" based on the values retrieved from the metrics catalogd-server.active-status and statestore.active-status. If the catalog or statestore is standalone, it will show active as the status, which is same as the metric. Tests: Ran core tests. Manually tests the web page, and verified the status display is correct. Also checked the situation when the failover happens, the current 'standby' status can be changed to 'active'. Change-Id: Ie9435ba7a9549ea56f9d080a9315aecbcc630cd2 --- M be/src/common/daemon-env.h M be/src/util/default-path-handlers.cc M be/src/util/default-path-handlers.h M www/root.tmpl 4 files changed, 69 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/21294/3 -- To view, visit http://gerrit.cloudera.org:8080/21294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie9435ba7a9549ea56f9d080a9315aecbcc630cd2 Gerrit-Change-Number: 21294 Gerrit-PatchSet: 3 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21315 ) Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest .. Patch Set 2: (2 comments) Thanks Daniel for the review. http://gerrit.cloudera.org:8080/#/c/21315/1/be/src/exprs/ai-functions-ir.cc File be/src/exprs/ai-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/21315/1/be/src/exprs/ai-functions-ir.cc@88 PS1, Line 88: string > Nit: we could keep the "std::" prefix, it is also used at L81 and L74. Thanks for pointing this out. But I would prefer removing unnecessary prefixes like line 49. http://gerrit.cloudera.org:8080/#/c/21315/1/be/src/exprs/ai-functions.inline.h File be/src/exprs/ai-functions.inline.h: http://gerrit.cloudera.org:8080/#/c/21315/1/be/src/exprs/ai-functions.inline.h@107 PS1, Line 107: // Caution: 'payload' might reference data owned by 'overrides'. > Is it intentional that 'overrides' is taken out of the loop? Is it because Yes. Added a comment here. -- To view, visit http://gerrit.cloudera.org:8080/21315 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5 Gerrit-Change-Number: 21315 Gerrit-PatchSet: 2 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Wed, 17 Apr 2024 16:27:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest
Yida Wu has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/21315 ) Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest .. IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest The issue is that the code previously used a std::string_view to hold the data which is actually returned by rapidjson::Document. However, the rapidjson::Document object gets destroyed after creating the std::string_view. This meant the std::string_view referenced memory that was no longer valid, leading to a heap-use-after-free error. This patch fixes this issue by modifying the function to return a std::string instead of a std::string_view. When the function returns a string, it creates a copy of the data from rapidjson::Document. This ensures the returned string has its own memory allocation and doesn't rely on the destroyed rapidjson::Document. Tests: Reran the asan build and passed. Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5 --- M be/src/exprs/ai-functions-ir.cc M be/src/exprs/ai-functions.h M be/src/exprs/ai-functions.inline.h M be/src/exprs/expr-test.cc 4 files changed, 11 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/21315/2 -- To view, visit http://gerrit.cloudera.org:8080/21315 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5 Gerrit-Change-Number: 21315 Gerrit-PatchSet: 2 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest
Yida Wu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21315 Change subject: IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest .. IMPALA-13004: Fix heap-use-after-free error in ExprTest AiFunctionsTest The issue is that the code previously used a std::string_view to hold the data which is actually returned by rapidjson::Document. However, the rapidjson::Document object gets destroyed after creating the std::string_view. This meant the std::string_view referenced memory that was no longer valid, leading to a heap-use-after-free error. This patch fixes this issue by modifying the function to return a std::string instead of a std::string_view. When the function returns a string, it creates a copy of the data from rapidjson::Document. This ensures the returned string has its own memory allocation and doesn't rely on the destroyed rapidjson::Document. Tests: Reran the asan build and passed. Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5 --- M be/src/exprs/ai-functions-ir.cc M be/src/exprs/ai-functions.h M be/src/exprs/ai-functions.inline.h M be/src/exprs/expr-test.cc 4 files changed, 7 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/21315/1 -- To view, visit http://gerrit.cloudera.org:8080/21315 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I3bb9dcf9d72cce7ad37d5bc25821cf6ee55a8ab5 Gerrit-Change-Number: 21315 Gerrit-PatchSet: 1 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat
[Impala-ASF-CR] WIP: IMPALA-12046: Add profile counter for scan range queueing time on disk queues
Yida Wu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21309 Change subject: WIP: IMPALA-12046: Add profile counter for scan range queueing time on disk queues .. WIP: IMPALA-12046: Add profile counter for scan range queueing time on disk queues This patch introduces a new timer counter in the query profile to track the time spent by scan ranges in the disk io queues. The solution is to add a timer to each request range and starting and stopping the timer when the request range is in the disk io queue. Each request context will have a queueing timer counter, and every timer associated with the range belonging to the request context will aggregate the time count to this counter. Also adds Reset() function to existing timer to enable fast resetting of the timer. Since ranges are now reusable, it is more efficient to reset the timer instead of reallocating it every time the range is reused. Tests: Manuall tests and it shows the aggregated queueing time in the profile. TODO: Test the performance impact and add tests. Change-Id: Ia0cb24db36cd89933c150b88e166e68a3abc7a60 --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-context.h M be/src/runtime/io/request-ranges.h M be/src/util/runtime-profile-counters.h M be/src/util/stopwatch.h 8 files changed, 90 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/21309/1 -- To view, visit http://gerrit.cloudera.org:8080/21309 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ia0cb24db36cd89933c150b88e166e68a3abc7a60 Gerrit-Change-Number: 21309 Gerrit-PatchSet: 1 Gerrit-Owner: Yida Wu
[Impala-ASF-CR] IMPALA-12874: Identify active and standby catalog and statestore the web debug endpoint
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21294 ) Change subject: IMPALA-12874: Identify active and standby catalog and statestore the web debug endpoint .. Patch Set 2: (2 comments) Thanks Wenzhe for reviewing. http://gerrit.cloudera.org:8080/#/c/21294/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21294/1//COMMIT_MSG@15 PS1, Line 15: the : status as > nit: status as active Done http://gerrit.cloudera.org:8080/#/c/21294/1/be/src/util/default-path-handlers.cc File be/src/util/default-path-handlers.cc: http://gerrit.cloudera.org:8080/#/c/21294/1/be/src/util/default-path-handlers.cc@283 PS1, Line 283: if (metric->GetValue()) { : document->AddMember( : "catalogd_active_status", "Active", document->GetAllocator()); : } else { : document->AddMember( : "catalogd_active_status", "Standby", document->GetAllocator()); : } > nit: this can be simplified as Tried, but the initializer of rapidjson::GenericValue seems not working this way.. -- To view, visit http://gerrit.cloudera.org:8080/21294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie9435ba7a9549ea56f9d080a9315aecbcc630cd2 Gerrit-Change-Number: 21294 Gerrit-PatchSet: 2 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 15 Apr 2024 17:06:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12874: Identify active and standby catalog and statestore the web debug endpoint
Yida Wu has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/21294 ) Change subject: IMPALA-12874: Identify active and standby catalog and statestore the web debug endpoint .. IMPALA-12874: Identify active and standby catalog and statestore the web debug endpoint This patch adds support to display the HA status of catalog and statestore on the root web page. The status will be presented as "Catalog Status: Active" or "Statestore Status: Standby" based on the values retrieved from the metrics catalogd-server.active-status and statestore.active-status. If the catalog or statestore is standalone, it will show the status as active, which is same as the metric. Tests: Ran core tests. Manually tests the web page, and verified the status display is correct. Change-Id: Ie9435ba7a9549ea56f9d080a9315aecbcc630cd2 --- M be/src/common/daemon-env.h M be/src/util/default-path-handlers.cc M be/src/util/default-path-handlers.h M www/root.tmpl 4 files changed, 61 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/21294/2 -- To view, visit http://gerrit.cloudera.org:8080/21294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie9435ba7a9549ea56f9d080a9315aecbcc630cd2 Gerrit-Change-Number: 21294 Gerrit-PatchSet: 2 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-12874: Identify active and standby catalog and statestore the web debug endpoint
Yida Wu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21294 Change subject: IMPALA-12874: Identify active and standby catalog and statestore the web debug endpoint .. IMPALA-12874: Identify active and standby catalog and statestore the web debug endpoint This patch adds support to display the HA status of catalog and statestore on the root web page. The status will be presented as "Catalog Status: Active" or "Statestore Status: Standby" based on the values retrieved from the metrics catalogd-server.active-status and statestore.active-status. If the catalog or statestore is standalone, it will show active as the status, which is same as the metric. Tests: Ran core tests. Manually tests the web page, and verified the status display is correct. Change-Id: Ie9435ba7a9549ea56f9d080a9315aecbcc630cd2 --- M be/src/common/daemon-env.h M be/src/util/default-path-handlers.cc M be/src/util/default-path-handlers.h M www/root.tmpl 4 files changed, 61 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/21294/1 -- To view, visit http://gerrit.cloudera.org:8080/21294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ie9435ba7a9549ea56f9d080a9315aecbcc630cd2 Gerrit-Change-Number: 21294 Gerrit-PatchSet: 1 Gerrit-Owner: Yida Wu
[Impala-ASF-CR] IMPALA-12963: Return parent PID when children spawned
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21278 ) Change subject: IMPALA-12963: Return parent PID when children spawned .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/21278 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I214e79507c717340863d27f68f6ea54c169e4090 Gerrit-Change-Number: 21278 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Wed, 10 Apr 2024 16:58:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12920: Support ai generate text built-in function for OpenAI's chat completion API
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21168 ) Change subject: IMPALA-12920: Support ai_generate_text built-in function for OpenAI's chat completion API .. Patch Set 16: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/21168/16/be/src/exprs/ai-functions.h File be/src/exprs/ai-functions.h: http://gerrit.cloudera.org:8080/#/c/21168/16/be/src/exprs/ai-functions.h@18 PS16, Line 18: #ifndef IMPALA_EXPRS_AI_FUNCTIONS_H : #define IMPALA_EXPRS_AI_FUNCTIONS_H nit. #pragma once seems simpler http://gerrit.cloudera.org:8080/#/c/21168/16/be/src/exprs/ai-functions.inline.h File be/src/exprs/ai-functions.inline.h: http://gerrit.cloudera.org:8080/#/c/21168/16/be/src/exprs/ai-functions.inline.h@18 PS16, Line 18: #ifndef IMPALA_EXPRS_AI_FUNCTIONS_INLINE_H : #define IMPALA_EXPRS_AI_FUNCTIONS_INLINE_H nit. #pragma once seems simpler, but not a big deal -- To view, visit http://gerrit.cloudera.org:8080/21168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id4446957f6030bab1f985fdd69185c3da07d7c4b Gerrit-Change-Number: 21168 Gerrit-PatchSet: 16 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Wed, 10 Apr 2024 15:41:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12905: Disk-based tuple caching
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21171 ) Change subject: IMPALA-12905: Disk-based tuple caching .. Patch Set 9: (2 comments) http://gerrit.cloudera.org:8080/#/c/21171/9/be/src/exec/tuple-file-writer.h File be/src/exec/tuple-file-writer.h: http://gerrit.cloudera.org:8080/#/c/21171/9/be/src/exec/tuple-file-writer.h@46 PS9, Line 46: wtihout nit. without http://gerrit.cloudera.org:8080/#/c/21171/9/be/src/runtime/tuple-cache-mgr.h File be/src/runtime/tuple-cache-mgr.h: http://gerrit.cloudera.org:8080/#/c/21171/9/be/src/runtime/tuple-cache-mgr.h@110 PS9, Line 110: increament nit. increment -- To view, visit http://gerrit.cloudera.org:8080/21171 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I13a65c4c0559cad3559d5f714a074dd06e9cc9bf Gerrit-Change-Number: 21171 Gerrit-PatchSet: 9 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 08 Apr 2024 22:28:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12960: Fix Incorrect RowsPassedThrough Metric in Streaming Aggregation
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21235 ) Change subject: IMPALA-12960: Fix Incorrect RowsPassedThrough Metric in Streaming Aggregation .. Patch Set 1: Thanks Riza and Wenzhe for the review. -- To view, visit http://gerrit.cloudera.org:8080/21235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59205a4b06824ee1607a25e906db1f96dc4eda9f Gerrit-Change-Number: 21235 Gerrit-PatchSet: 1 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 04 Apr 2024 15:27:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12960: Fix Incorrect RowsPassedThrough Metric in Streaming Aggregation
Yida Wu has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21235 ) Change subject: IMPALA-12960: Fix Incorrect RowsPassedThrough Metric in Streaming Aggregation .. IMPALA-12960: Fix Incorrect RowsPassedThrough Metric in Streaming Aggregation This patch fixes a bug in the RowsPassedThrough metric within the query profile while using Streaming Aggregation. The issue is from the AddBatchStreaming() function's logic, where the number of rows in the output batch isn't necessarily initialized to 0, while the function uses num_rows() of the output batch directly to be the actual number of rows returned and passed through of this specific aggregator. This discrepancy can significantly impact the accuracy of the returned and passed through numbers, as well as the calculation of reduction rates during hash table expansion in Streaming Aggregation. Huge differences can be observed especially when using the rollup function. The solution is to calculate the actual number of rows added to the output batch within each round of the AddBatchStreaming() function. Tests: Passed exhaustive tests. Added a corresponding case in tpch-passthrough-aggregations.test. Change-Id: I59205a4b06824ee1607a25e906db1f96dc4eda9f Reviewed-on: http://gerrit.cloudera.org:8080/21235 Reviewed-by: Wenzhe Zhou Reviewed-by: Riza Suminto Tested-by: Impala Public Jenkins --- M be/src/exec/grouping-aggregator.cc M testdata/workloads/tpch/queries/tpch-passthrough-aggregations.test 2 files changed, 27 insertions(+), 2 deletions(-) Approvals: Wenzhe Zhou: Looks good to me, but someone else must approve Riza Suminto: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/21235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I59205a4b06824ee1607a25e906db1f96dc4eda9f Gerrit-Change-Number: 21235 Gerrit-PatchSet: 2 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-12960: Fix Incorrect RowsPassedThrough Metric in Streaming Aggregation
Yida Wu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21235 Change subject: IMPALA-12960: Fix Incorrect RowsPassedThrough Metric in Streaming Aggregation .. IMPALA-12960: Fix Incorrect RowsPassedThrough Metric in Streaming Aggregation This patch fixes a bug in the RowsPassedThrough metric within the query profile while using Streaming Aggregation. The issue is from the AddBatchStreaming() function's logic, where the number of rows in the output batch isn't necessarily initialized to 0, while the function uses num_rows() of the output batch directly to be the actual number of rows returned and passed through of this specific aggregator. This discrepancy can significantly impact the accuracy of the returned and passed through numbers, as well as the calculation of reduction rates during hash table expansion in Streaming Aggregation. Huge differences can be observed especially when using the rollup function. The solution is to calculate the actual number of rows added to the output batch within each round of the AddBatchStreaming() function. Tests: Passed exhaustive tests. Added a corresponding case in tpch-passthrough-aggregations.test. Change-Id: I59205a4b06824ee1607a25e906db1f96dc4eda9f --- M be/src/exec/grouping-aggregator.cc M testdata/workloads/tpch/queries/tpch-passthrough-aggregations.test 2 files changed, 27 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/21235/1 -- To view, visit http://gerrit.cloudera.org:8080/21235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I59205a4b06824ee1607a25e906db1f96dc4eda9f Gerrit-Change-Number: 21235 Gerrit-PatchSet: 1 Gerrit-Owner: Yida Wu
[Impala-ASF-CR] IMPALA-12920: Support ai generate text built-in function for OpenAI's chat completion API
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21168 ) Change subject: IMPALA-12920: Support ai_generate_text built-in function for OpenAI's chat completion API .. Patch Set 6: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/21168/5/be/src/exprs/ai-functions-ir.cc File be/src/exprs/ai-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/21168/5/be/src/exprs/ai-functions-ir.cc@83 PS5, Line 83: != nullptr); > This was suggested by clang to improve readability, I think. I'm inclined t Ack http://gerrit.cloudera.org:8080/#/c/21168/5/be/src/exprs/ai-functions-ir.cc@251 PS5, Line 251: response > Some of that would be controlled by parameters such as max_tokens that you That makes sense. Ack. -- To view, visit http://gerrit.cloudera.org:8080/21168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id4446957f6030bab1f985fdd69185c3da07d7c4b Gerrit-Change-Number: 21168 Gerrit-PatchSet: 6 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Wed, 03 Apr 2024 15:05:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12920: Support ai generate text built-in function for OpenAI's chat completion API
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21168 ) Change subject: IMPALA-12920: Support ai_generate_text built-in function for OpenAI's chat completion API .. Patch Set 5: Code-Review+1 (4 comments) LGTM! Some minor comments. http://gerrit.cloudera.org:8080/#/c/21168/5/be/src/exprs/ai-functions-ir.cc File be/src/exprs/ai-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/21168/5/be/src/exprs/ai-functions-ir.cc@83 PS5, Line 83: != nullptr); nit. seems we can save one line here because above line has enough space http://gerrit.cloudera.org:8080/#/c/21168/5/be/src/exprs/ai-functions-ir.cc@225 PS5, Line 225: kudu::EasyCurl curl; : curl.set_timeout(kudu::MonoDelta::FromSeconds(FLAGS_ai_connection_timeout_s)); : curl.set_fail_on_http_error(true); : kudu::faststring resp; : kudu::Status status = curl.PostToURL(endpoint_str, payload_str, , headers); : VLOG(2) << "AI Generate Text: \noriginal response: " << resp.ToString(); : if (!status.ok()) { : string msg = status.ToString(); : return StringVal::CopyFrom( : ctx, reinterpret_cast(msg.c_str()), msg.size()); : } Can we add a summary comment about what this part of code is doing? I think it does the real thing to contact with the Ai endpoint, and can be critical to performance, maybe a comment can make it clearer for others to understand its importance. http://gerrit.cloudera.org:8080/#/c/21168/5/be/src/exprs/ai-functions-ir.cc@236 PS5, Line 236: JSON string nit. how about saying "response JSON string" http://gerrit.cloudera.org:8080/#/c/21168/5/be/src/exprs/ai-functions-ir.cc@251 PS5, Line 251: response Thinking about whether we need to worry about the case when the length of response is very large or unexpected large. How about we print a warning log if the response is larger than certain size? -- To view, visit http://gerrit.cloudera.org:8080/21168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id4446957f6030bab1f985fdd69185c3da07d7c4b Gerrit-Change-Number: 21168 Gerrit-PatchSet: 5 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 01 Apr 2024 22:19:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4545: Simplify test dimension in test decimal casting.py
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21174 ) Change subject: IMPALA-4545: Simplify test dimension in test_decimal_casting.py .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21174 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibe269e08a955097ad9e924d5d64b42438ad15be2 Gerrit-Change-Number: 21174 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 22 Mar 2024 21:24:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12920: Support ai generate text built-in function for OpenAI's chat completion API
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21168 ) Change subject: IMPALA-12920: Support ai_generate_text built-in function for OpenAI's chat completion API .. Patch Set 3: (8 comments) Thanks for bringing this up! It looks good overall. I noticed some formatting issues in the file ai-functions-ir.cc, could you please rerun clang-format on that file? http://gerrit.cloudera.org:8080/#/c/21168/3/be/src/exprs/ai-functions-ir.cc File be/src/exprs/ai-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/21168/3/be/src/exprs/ai-functions-ir.cc@74 PS3, Line 74: https:// Maybe declare a const string for "https://;, we can also use size of it instead of strlen every time. http://gerrit.cloudera.org:8080/#/c/21168/3/be/src/exprs/ai-functions-ir.cc@75 PS3, Line 75: no need spaces http://gerrit.cloudera.org:8080/#/c/21168/3/be/src/exprs/ai-functions-ir.cc@79 PS3, Line 79: openai.azure.com Maybe declare a const string for it? Also same to "api.openai.com" http://gerrit.cloudera.org:8080/#/c/21168/3/be/src/exprs/ai-functions-ir.cc@81 PS3, Line 81: no need spaces http://gerrit.cloudera.org:8080/#/c/21168/3/be/src/exprs/ai-functions-ir.cc@85 PS3, Line 85: choices Thinking may be good the have const strings for all the json field names, easier to maintain the field name later http://gerrit.cloudera.org:8080/#/c/21168/3/be/src/exprs/ai-functions-ir.cc@122 PS3, Line 122: should be two spaces http://gerrit.cloudera.org:8080/#/c/21168/3/be/src/exprs/ai-functions-ir.cc@196 PS3, Line 196: std:: doesn't need the namespace http://gerrit.cloudera.org:8080/#/c/21168/3/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/21168/3/be/src/exprs/string-functions-ir.cc@1950 PS3, Line 1950: } Seems the change is unnecessary. -- To view, visit http://gerrit.cloudera.org:8080/21168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id4446957f6030bab1f985fdd69185c3da07d7c4b Gerrit-Change-Number: 21168 Gerrit-PatchSet: 3 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 22 Mar 2024 01:14:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4545: Simplify test dimension in test decimal casting.py
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21174 ) Change subject: IMPALA-4545: Simplify test dimension in test_decimal_casting.py .. Patch Set 3: Code-Review+1 LGTM! I can upgrade to +2 if no one else comments -- To view, visit http://gerrit.cloudera.org:8080/21174 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibe269e08a955097ad9e924d5d64b42438ad15be2 Gerrit-Change-Number: 21174 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 21 Mar 2024 18:14:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4545: Simplify test dimension in test decimal casting.py
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21174 ) Change subject: IMPALA-4545: Simplify test dimension in test_decimal_casting.py .. Patch Set 2: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/21174/1/tests/query_test/test_decimal_casting.py File tests/query_test/test_decimal_casting.py: http://gerrit.cloudera.org:8080/#/c/21174/1/tests/query_test/test_decimal_casting.py@54 PS1, Line 54: Core/pairwise only deals with precision > Yes, core is permuted the same as pairwise now. Can you also mention it in the commit message? Because the testing behavior for "pairwise" is changed. -- To view, visit http://gerrit.cloudera.org:8080/21174 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibe269e08a955097ad9e924d5d64b42438ad15be2 Gerrit-Change-Number: 21174 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 21 Mar 2024 15:47:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4545: Simplify test dimension in test decimal casting.py
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21174 ) Change subject: IMPALA-4545: Simplify test dimension in test_decimal_casting.py .. Patch Set 1: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/21174/1/tests/query_test/test_decimal_casting.py File tests/query_test/test_decimal_casting.py: http://gerrit.cloudera.org:8080/#/c/21174/1/tests/query_test/test_decimal_casting.py@54 PS1, Line 54: Core only deals with precision 6, 16, 26 Not familiar with the "pairwise" mode, is it the same as the core now? http://gerrit.cloudera.org:8080/#/c/21174/1/tests/query_test/test_decimal_casting.py@109 PS1, Line 109: # precision, scale = vector.get_value('decimal_type') This line can be removed? -- To view, visit http://gerrit.cloudera.org:8080/21174 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibe269e08a955097ad9e924d5d64b42438ad15be2 Gerrit-Change-Number: 21174 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 21 Mar 2024 01:47:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12883: Support updating the charge on an entry in the cache
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21122 ) Change subject: IMPALA-12883: Support updating the charge on an entry in the cache .. Patch Set 7: Code-Review+2 Carry Michael's +1. -- To view, visit http://gerrit.cloudera.org:8080/21122 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I34f54fb3a91a77821651c25d8d3bc3a2a3945025 Gerrit-Change-Number: 21122 Gerrit-PatchSet: 7 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 19 Mar 2024 17:16:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12883: Support updating the charge on an entry in the cache
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21122 ) Change subject: IMPALA-12883: Support updating the charge on an entry in the cache .. Patch Set 5: Code-Review+1 (6 comments) http://gerrit.cloudera.org:8080/#/c/21122/5/be/src/util/cache/lirs-cache-test.cc File be/src/util/cache/lirs-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/21122/5/be/src/util/cache/lirs-cache-test.cc@274 PS5, Line 274: nit. unnecessary new line http://gerrit.cloudera.org:8080/#/c/21122/5/be/src/util/cache/lirs-cache-test.cc@424 PS5, Line 424: nit. unnecessary new line http://gerrit.cloudera.org:8080/#/c/21122/5/be/src/util/cache/lirs-cache-test.cc@455 PS5, Line 455: nit. unnecessary new line http://gerrit.cloudera.org:8080/#/c/21122/5/be/src/util/cache/lirs-cache.cc File be/src/util/cache/lirs-cache.cc: http://gerrit.cloudera.org:8080/#/c/21122/5/be/src/util/cache/lirs-cache.cc@990 PS5, Line 990: CHECK(false); Maybe add a logging for unexpected state? CHECK(false) << "Unexpected state: " << e->state(); http://gerrit.cloudera.org:8080/#/c/21122/5/be/src/util/cache/rl-cache.cc File be/src/util/cache/rl-cache.cc: http://gerrit.cloudera.org:8080/#/c/21122/5/be/src/util/cache/rl-cache.cc@324 PS5, Line 324: while (usage_ > capacity_ && rl_.next != _) { : RLHandle* old = rl_.next; : RL_Remove(old); : table_.Remove(old->key(), old->hash()); : if (Unref(old)) { : old->next = to_remove_head; : to_remove_head = old; : } : } This section of code is identical to a part of RLCacheShard::Insert. Could we consider extracting this segment into a separate function for it to reduce redundancy? http://gerrit.cloudera.org:8080/#/c/21122/5/be/src/util/cache/rl-cache.cc@336 PS5, Line 336: while (to_remove_head != nullptr) { : RLHandle* next = to_remove_head->next; : FreeEntry(to_remove_head); : to_remove_head = next; : } Same as above -- To view, visit http://gerrit.cloudera.org:8080/21122 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I34f54fb3a91a77821651c25d8d3bc3a2a3945025 Gerrit-Change-Number: 21122 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 14 Mar 2024 18:03:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12818: Intermediate Result Caching plan node framework
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21035 ) Change subject: IMPALA-12818: Intermediate Result Caching plan node framework .. Patch Set 5: Code-Review+1 (3 comments) Looks good for an initial patch. Could be +2 if no other major issues http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/PlanNode.java@530 PS3, Line 530: have not been calcula > Refactored this to be a ThriftSerializationCtx that is passed in. This code Done http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/PlanNode.java@1237 PS3, Line 1237: > Reworked this comment to describe how this is a bottom-up tree traversal. Done http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java File fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java: http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java@59 PS3, Line 59: > Changed this to use StringBuilder. I also added an explicit finalize() call Done -- To view, visit http://gerrit.cloudera.org:8080/21035 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia1f36a87dcce6efd5d1e1f0bc04009bf009b1961 Gerrit-Change-Number: 21035 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 12 Mar 2024 17:14:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12818: Intermediate Result Caching plan node framework
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21035 ) Change subject: IMPALA-12818: Intermediate Result Caching plan node framework .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/21035/3/be/src/exec/tuple-cache-node.cc File be/src/exec/tuple-cache-node.cc: http://gerrit.cloudera.org:8080/#/c/21035/3/be/src/exec/tuple-cache-node.cc@28 PS3, Line 28: allow_tuple_caching > The way I see it working is: Agreed. In this case, I think including a DCHECK(FLAGS_allow_tuple_caching) within TupleCacheNode::TupleCacheNode or TupleCacheNode::Open would be a beneficial assertion. -- To view, visit http://gerrit.cloudera.org:8080/21035 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia1f36a87dcce6efd5d1e1f0bc04009bf009b1961 Gerrit-Change-Number: 21035 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 27 Feb 2024 23:43:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12818: Intermediate Result Caching plan node framework
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21035 ) Change subject: IMPALA-12818: Intermediate Result Caching plan node framework .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/21035/3/be/src/exec/tuple-cache-node.cc File be/src/exec/tuple-cache-node.cc: http://gerrit.cloudera.org:8080/#/c/21035/3/be/src/exec/tuple-cache-node.cc@28 PS3, Line 28: allow_tuple_caching > Used in query-options.cc. Does that mean the allow_tuple_caching is always true when it comes to the class TupleCacheNode so that we don't need to care about this flag in TupleCacheNode? If that is the case, maybe we can add a DCHECK in the class http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java File fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java: http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java@59 PS3, Line 59: // TODO: incorporate column information into the key : // verifyDifferentCacheKeys("select id from functional.alltypes", : // "sele > I'm less concerned about adding commented test cases that are part of a TOD Ack -- To view, visit http://gerrit.cloudera.org:8080/21035 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia1f36a87dcce6efd5d1e1f0bc04009bf009b1961 Gerrit-Change-Number: 21035 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 27 Feb 2024 21:00:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12818: Intermediate Result Caching plan node framework
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21035 ) Change subject: IMPALA-12818: Intermediate Result Caching plan node framework .. Patch Set 3: (10 comments) http://gerrit.cloudera.org:8080/#/c/21035/3/be/src/exec/tuple-cache-node.cc File be/src/exec/tuple-cache-node.cc: http://gerrit.cloudera.org:8080/#/c/21035/3/be/src/exec/tuple-cache-node.cc@28 PS3, Line 28: allow_tuple_caching It seems the flag is not used in the class, or do I miss something? http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/PlanNode.java@530 PS3, Line 530: tupleCacheInfo == null Could you please add a comment to describe when or why the tupleCacheInfo is null http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/PlanNode.java@1237 PS3, Line 1237: recurse to children For clarity, maybe it is better to have a full sentence like, "Recursively compute tuple cache info for children nodes" http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/PlanNode.java@1251 PS3, Line 1251: TODO Would be good if we have a Jira to follow the TODOs http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java File fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java: http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java@59 PS3, Line 59: String May consider StringBuilder for hashTrace_ due to string concatenations in mergeChild(), StringBuilder could be more efficient http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java@94 PS3, Line 94: TODO Would be good if we have a Jira to follow the TODOs http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/TupleCachePlanner.java File fe/src/main/java/org/apache/impala/planner/TupleCachePlanner.java: http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/TupleCachePlanner.java@75 PS3, Line 75: TODO Would be good if we have a Jira to follow the TODOs http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java File fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java: http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java@59 PS3, Line 59: // TODO: incorporate column information into the key : // verifyDifferentCacheKeys("select id from functional.alltypes", : // "sele May simply remove the case from the testcase for the commit if we don't support it yet http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java@69 PS3, Line 69: TODO same as above http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java@79 PS3, Line 79: // TODO: Random functions should make a location ineligible same as above -- To view, visit http://gerrit.cloudera.org:8080/21035 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia1f36a87dcce6efd5d1e1f0bc04009bf009b1961 Gerrit-Change-Number: 21035 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 26 Feb 2024 02:35:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12828: Remove Usage of Unnecessary this->
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21047 ) Change subject: IMPALA-12828: Remove Usage of Unnecessary this-> .. Patch Set 3: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/21047/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21047/3//COMMIT_MSG@9 PS3, Line 9: 408c606 nit. include the related Jira ticket IMPALA-12426 in the commit message might be beneficial. -- To view, visit http://gerrit.cloudera.org:8080/21047 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia6b1d8ba7e27d20ba1ac83caff5e56fd0f9347c0 Gerrit-Change-Number: 21047 Gerrit-PatchSet: 3 Gerrit-Owner: Jason Fehr Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Wed, 21 Feb 2024 22:25:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12790: Fix overestimation in ScanNode.getInputCardinality
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21005 ) Change subject: IMPALA-12790: Fix overestimation in ScanNode.getInputCardinality .. Patch Set 9: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/21005/9/tests/query_test/test_codegen.py File tests/query_test/test_codegen.py: http://gerrit.cloudera.org:8080/#/c/21005/9/tests/query_test/test_codegen.py@51 PS9, Line 51: {'disable_codegen_rows_threshold': 7000} > disable_codegen_rows_threshold default value is 5. Ack -- To view, visit http://gerrit.cloudera.org:8080/21005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icc5b39a7684fb8748185349d0b80baf8dcd6b126 Gerrit-Change-Number: 21005 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Wed, 14 Feb 2024 00:58:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12790: Fix overestimation in ScanNode.getInputCardinality
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21005 ) Change subject: IMPALA-12790: Fix overestimation in ScanNode.getInputCardinality .. Patch Set 6: Code-Review+1 (2 comments) It looks good to me, just minor things. http://gerrit.cloudera.org:8080/#/c/21005/6/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/21005/6/fe/src/main/java/org/apache/impala/planner/ScanNode.java@336 PS6, Line 336: if (hasSimpleLimit()) { It would be good to add comments describing the high-level logic of the function http://gerrit.cloudera.org:8080/#/c/21005/6/fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java File fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java: http://gerrit.cloudera.org:8080/#/c/21005/6/fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java@54 PS6, Line 54: hasSimpleLimit Would it be simpler to use scan.hasSimpleLimit() directly since hasSimpleLimit is not used anywhere else in the code -- To view, visit http://gerrit.cloudera.org:8080/21005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icc5b39a7684fb8748185349d0b80baf8dcd6b126 Gerrit-Change-Number: 21005 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 12 Feb 2024 01:16:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12721: Fix flaky tests involving check deleted file fd()
Yida Wu has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/21000 ) Change subject: IMPALA-12721: Fix flaky tests involving check_deleted_file_fd() .. IMPALA-12721: Fix flaky tests involving check_deleted_file_fd() check_deleted_file_fd() is introduced in IMPALA-12681, however some spilling testcases involving check_deleted_file_fd() seem flaky. This patch fixed the issue by adding a retry mechanism within the check_deleted_file_fd() function. If the function encounters a failure, it retries the process of verifying the presence of a deleted referencing file. Based on my local test, the file will be removed after the test even when the test fails and the call to delete the file handle is ahead of the call to remove the file (This has been confirmed through additional testing logs). While there is no theory why this would happen, introducing a retry mechanism has allowed the test case to run successfully for 200 times without encountering any failures. It is possible that a delay may be occurring at some point in the process which leads to this kind of failure. Tests: Reran the testcase 200 times without a failure. Change-Id: I900aab7dc9833015ce140253ff40da28a6ed3ba6 --- M tests/custom_cluster/test_scratch_disk.py 1 file changed, 5 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/21000/3 -- To view, visit http://gerrit.cloudera.org:8080/21000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I900aab7dc9833015ce140253ff40da28a6ed3ba6 Gerrit-Change-Number: 21000 Gerrit-PatchSet: 3 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-12721: Fix flaky tests involving check deleted file fd()
Yida Wu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21000 Change subject: IMPALA-12721: Fix flaky tests involving check_deleted_file_fd() .. IMPALA-12721: Fix flaky tests involving check_deleted_file_fd() check_deleted_file_fd() is introduced in IMPALA-12681, however some spilling testcases involving check_deleted_file_fd() seem flaky. This patch fixed the issue by adding a retry to the testcase if a failure happens. Based on my local test, the file will be removed after the test even when the test fails and the call to delete the file handle is ahead of the call to remove the file (This has been confirmed through additional testing logs). While there is no theory why this would happen, introducing a retry mechanism has allowed the test case to run successfully for 200 times without encountering any failures. It is possible that a delay may be occurring at some point in the process which leads to this kind of failure. Tests: Reran the testcase 200 times without a failure. Change-Id: I900aab7dc9833015ce140253ff40da28a6ed3ba6 --- M tests/custom_cluster/test_scratch_disk.py 1 file changed, 5 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/21000/1 -- To view, visit http://gerrit.cloudera.org:8080/21000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I900aab7dc9833015ce140253ff40da28a6ed3ba6 Gerrit-Change-Number: 21000 Gerrit-PatchSet: 1 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer
[Impala-ASF-CR] IMPALA-12745: Skip parallel symbol dumping with RPM/DEB packages
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20943 ) Change subject: IMPALA-12745: Skip parallel symbol dumping with RPM/DEB packages .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20943 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2885a9cfb36a4f616b539599e7f744bd23552c3 Gerrit-Change-Number: 20943 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 29 Jan 2024 18:15:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12125: Support for dumping symbols from RPMs without separate symbols
Yida Wu has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20944 ) Change subject: IMPALA-12125: Support for dumping symbols from RPMs without separate symbols .. IMPALA-12125: Support for dumping symbols from RPMs without separate symbols Some RPMs contain binaries with debug symbols with no separate debuginfo package needed. bin/dump_breakpad_symbols.py does not allow this combination, as it expects a corresponding symbol package. This adds a --no_symbol_pkg option to dump_breakpad_symbols.py to turn off the requirement that --pkg be combined with --symbol_pkg. Testing: - Tested with an RPM package with an unstripped impalad binary - Tested with the usual RPM + debuginfo RPM combination Change-Id: I9589b0ed7855fe49c6989ec3dcc51a9e9c4f476b Reviewed-on: http://gerrit.cloudera.org:8080/20944 Reviewed-by: Impala Public Jenkins Tested-by: Yida Wu --- M bin/dump_breakpad_symbols.py 1 file changed, 22 insertions(+), 13 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved Yida Wu: Verified -- To view, visit http://gerrit.cloudera.org:8080/20944 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I9589b0ed7855fe49c6989ec3dcc51a9e9c4f476b Gerrit-Change-Number: 20944 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-12125: Support for dumping symbols from RPMs without separate symbols
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20944 ) Change subject: IMPALA-12125: Support for dumping symbols from RPMs without separate symbols .. Patch Set 2: Verified+1 Irrelevant failure at TestEventProcessingCustomConfigs.test_skipping_older_events -- To view, visit http://gerrit.cloudera.org:8080/20944 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9589b0ed7855fe49c6989ec3dcc51a9e9c4f476b Gerrit-Change-Number: 20944 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 29 Jan 2024 17:32:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12125: Support for dumping symbols from RPMs without separate symbols
Yida Wu has removed a vote on this change. Change subject: IMPALA-12125: Support for dumping symbols from RPMs without separate symbols .. Removed Verified-1 by Impala Public Jenkins -- To view, visit http://gerrit.cloudera.org:8080/20944 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I9589b0ed7855fe49c6989ec3dcc51a9e9c4f476b Gerrit-Change-Number: 20944 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-12745: Skip parallel symbol dumping with RPM/DEB packages
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20943 ) Change subject: IMPALA-12745: Skip parallel symbol dumping with RPM/DEB packages .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/20943/1/bin/dump_breakpad_symbols.py File bin/dump_breakpad_symbols.py: http://gerrit.cloudera.org:8080/#/c/20943/1/bin/dump_breakpad_symbols.py@369 PS1, Line 369: status = 1 Do we need a break here when something happens? -- To view, visit http://gerrit.cloudera.org:8080/20943 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2885a9cfb36a4f616b539599e7f744bd23552c3 Gerrit-Change-Number: 20943 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 29 Jan 2024 02:28:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12125: Support for dumping symbols from RPMs without separate symbols
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20944 ) Change subject: IMPALA-12125: Support for dumping symbols from RPMs without separate symbols .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20944 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9589b0ed7855fe49c6989ec3dcc51a9e9c4f476b Gerrit-Change-Number: 20944 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 29 Jan 2024 02:00:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12698: Restrict check deleted file fd() for fixing flaky tests
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20898 ) Change subject: IMPALA-12698: Restrict check_deleted_file_fd() for fixing flaky tests .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/20898/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20898/2//COMMIT_MSG@10 PS2, Line 10: spillin > nit: typo in 'spilling' Thanks. Done. -- To view, visit http://gerrit.cloudera.org:8080/20898 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I55f5aa1cdbc0c74f6c7ebd25575e71d2b238bf98 Gerrit-Change-Number: 20898 Gerrit-PatchSet: 3 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 15 Jan 2024 10:59:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12698: Restrict check deleted file fd() for fixing flaky tests
Yida Wu has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/20898 ) Change subject: IMPALA-12698: Restrict check_deleted_file_fd() for fixing flaky tests .. IMPALA-12698: Restrict check_deleted_file_fd() for fixing flaky tests The introduction of check_deleted_file_fd() in IMPALA-12681 aimed to detect a bug related to remote spilling where local temporary file handles were not being released after deletion. However, the tests associated with this function seem flaky in exhaustive builds with occasionally some files of hdfs may not be promptly released after deletion, though locally, I observed that these files are eventually removed from /proc/xx/fd in a few minutes, the reason is unclear yet. To fix the flaky build failure, this patch confines the scope of check_deleted_file_fd() to detect files containing the keyword "scratch" only. Given that hdfs files eventually get removed, and it seems not an urgent issue, a separate Jira will be filed to track and investigate this behavior further. Testing: Reran the tests a couple times and passed. Change-Id: I55f5aa1cdbc0c74f6c7ebd25575e71d2b238bf98 --- M tests/custom_cluster/test_scratch_disk.py 1 file changed, 6 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/20898/3 -- To view, visit http://gerrit.cloudera.org:8080/20898 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I55f5aa1cdbc0c74f6c7ebd25575e71d2b238bf98 Gerrit-Change-Number: 20898 Gerrit-PatchSet: 3 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-12698: Restrict check deleted file fd() for fixing flaky tests
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20898 ) Change subject: IMPALA-12698: Restrict check_deleted_file_fd() for fixing flaky tests .. Patch Set 2: (1 comment) Thanks Csaba for the review. http://gerrit.cloudera.org:8080/#/c/20898/1/tests/custom_cluster/test_scratch_disk.py File tests/custom_cluster/test_scratch_disk.py: http://gerrit.cloudera.org:8080/#/c/20898/1/tests/custom_cluster/test_scratch_disk.py@286 PS1, Line 286: # Look for the files with keywords 'scratch' and '(deleted)'. > Can you mention IMPALA-12698 and that there are temporary files like this c Done -- To view, visit http://gerrit.cloudera.org:8080/20898 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I55f5aa1cdbc0c74f6c7ebd25575e71d2b238bf98 Gerrit-Change-Number: 20898 Gerrit-PatchSet: 2 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 15 Jan 2024 09:49:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12698: Restrict check deleted file fd() for fixing flaky tests
Yida Wu has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/20898 ) Change subject: IMPALA-12698: Restrict check_deleted_file_fd() for fixing flaky tests .. IMPALA-12698: Restrict check_deleted_file_fd() for fixing flaky tests The introduction of check_deleted_file_fd() in IMPALA-12681 aimed to detect a bug related to remote splling where local temporary file handles were not being released after deletion. However, the tests associated with this function seem flaky in exhaustive builds with occasionally some files of hdfs may not be promptly released after deletion, though locally, I observed that these files are eventually removed from /proc/xx/fd in a few minutes, the reason is unclear yet. To fix the flaky build failure, this patch confines the scope of check_deleted_file_fd() to detect files containing the keyword "scratch" only. Given that hdfs files eventually get removed, and it seems not an urgent issue, a separate Jira will be filed to track and investigate this behavior further. Testing: Reran the tests a couple times and passed. Change-Id: I55f5aa1cdbc0c74f6c7ebd25575e71d2b238bf98 --- M tests/custom_cluster/test_scratch_disk.py 1 file changed, 6 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/20898/2 -- To view, visit http://gerrit.cloudera.org:8080/20898 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I55f5aa1cdbc0c74f6c7ebd25575e71d2b238bf98 Gerrit-Change-Number: 20898 Gerrit-PatchSet: 2 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-12698: Restrict check deleted file fd() for fixing flaky tests
Yida Wu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20898 Change subject: IMPALA-12698: Restrict check_deleted_file_fd() for fixing flaky tests .. IMPALA-12698: Restrict check_deleted_file_fd() for fixing flaky tests The introduction of check_deleted_file_fd() in IMPALA-12681 aimed to detect a bug related to remote splling where local temporary file handles were not being released after deletion. However, the tests associated with this function seem flaky in exhaustive builds with occasionally some files of hdfs may not be promptly released after deletion, though locally, I observed that these files are eventually removed from /proc/xx/fd in a few minutes, the reason is unclear yet. To fix the flaky build failure, this patch confines the scope of check_deleted_file_fd() to detect files containing the keyword "scratch" only. Given that hdfs files eventually get removed, and it seems not an urgent issue, a separate Jira will be filed to track and investigate this behavior further. Testing: Reran the tests a couple times and passed. Change-Id: I55f5aa1cdbc0c74f6c7ebd25575e71d2b238bf98 --- M tests/custom_cluster/test_scratch_disk.py 1 file changed, 2 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/20898/1 -- To view, visit http://gerrit.cloudera.org:8080/20898 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I55f5aa1cdbc0c74f6c7ebd25575e71d2b238bf98 Gerrit-Change-Number: 20898 Gerrit-PatchSet: 1 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-12681: Release file descriptors for partially written temporary files
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20852 ) Change subject: IMPALA-12681: Release file descriptors for partially written temporary files .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/20852/6/tests/custom_cluster/test_scratch_disk.py File tests/custom_cluster/test_scratch_disk.py: http://gerrit.cloudera.org:8080/#/c/20852/6/tests/custom_cluster/test_scratch_disk.py@288 PS6, Line 288: result = subprocess.check_output(command, shell=True) : return result.strip() > nit: -2 indentation Done http://gerrit.cloudera.org:8080/#/c/20852/6/tests/custom_cluster/test_scratch_disk.py@291 PS6, Line 291: if not e.output: : # If there is no output, return None. : return None : return "Error checking the fd path with error '{}'".format(e) > nit: -2 indentation Done -- To view, visit http://gerrit.cloudera.org:8080/20852 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I58a2bac419ced806d6f5a32bcdf24d79e078ab14 Gerrit-Change-Number: 20852 Gerrit-PatchSet: 7 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 08 Jan 2024 12:10:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12681: Release file descriptors for partially written temporary files
Yida Wu has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/20852 ) Change subject: IMPALA-12681: Release file descriptors for partially written temporary files .. IMPALA-12681: Release file descriptors for partially written temporary files This patch fixes a bug where partially written temporary files are removed without releasing the file descriptors. This patch fixes the bug by adding a call to Close() of the local file writer during the Delete() of the DiskFile class, which could be called when the local buffer file is being evicted or the query ends, ensuring proper release of the file handle. Testing: Passed core tests. Additionally, a check has been added in the test test_scratch_disk.py to verify that there are no deleted files in the /proc/x/fd/ directory. Change-Id: I58a2bac419ced806d6f5a32bcdf24d79e078ab14 --- M be/src/runtime/io/disk-file.cc M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/local-file-writer.cc M be/src/runtime/io/local-file-writer.h M be/src/runtime/tmp-file-mgr-test.cc M tests/custom_cluster/test_scratch_disk.py 6 files changed, 41 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/20852/7 -- To view, visit http://gerrit.cloudera.org:8080/20852 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I58a2bac419ced806d6f5a32bcdf24d79e078ab14 Gerrit-Change-Number: 20852 Gerrit-PatchSet: 7 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-12681: Release file descriptors for partially written temporary files
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20852 ) Change subject: IMPALA-12681: Release file descriptors for partially written temporary files .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/20852/5/tests/custom_cluster/test_scratch_disk.py File tests/custom_cluster/test_scratch_disk.py: http://gerrit.cloudera.org:8080/#/c/20852/5/tests/custom_cluster/test_scratch_disk.py@300 PS5, Line 300: for impalad in self.cluster.impalads: > You can get the list of impalads with Thanks, this is better. Done. http://gerrit.cloudera.org:8080/#/c/20852/5/tests/custom_cluster/test_scratch_disk.py@335 PS5, Line 335: @pytest.mark.execute_serially > It would make sense to check this on each scratch test, not just on remote Done -- To view, visit http://gerrit.cloudera.org:8080/20852 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I58a2bac419ced806d6f5a32bcdf24d79e078ab14 Gerrit-Change-Number: 20852 Gerrit-PatchSet: 6 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 05 Jan 2024 22:17:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12681: Release file descriptors for partially written temporary files
Yida Wu has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/20852 ) Change subject: IMPALA-12681: Release file descriptors for partially written temporary files .. IMPALA-12681: Release file descriptors for partially written temporary files This patch fixes a bug where partially written temporary files are removed without releasing the file descriptors. This patch fixes the bug by adding a call to Close() of the local file writer during the Delete() of the DiskFile class, which could be called when the local buffer file is being evicted or the query ends, ensuring proper release of the file handle. Testing: Passed core tests. Additionally, a check has been added in the test test_scratch_disk.py to verify that there are no deleted files in the /proc/x/fd/ directory. Change-Id: I58a2bac419ced806d6f5a32bcdf24d79e078ab14 --- M be/src/runtime/io/disk-file.cc M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/local-file-writer.cc M be/src/runtime/io/local-file-writer.h M be/src/runtime/tmp-file-mgr-test.cc M tests/custom_cluster/test_scratch_disk.py 6 files changed, 41 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/20852/6 -- To view, visit http://gerrit.cloudera.org:8080/20852 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I58a2bac419ced806d6f5a32bcdf24d79e078ab14 Gerrit-Change-Number: 20852 Gerrit-PatchSet: 6 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-12681: Release file descriptors for partially written temporary files
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20852 ) Change subject: IMPALA-12681: Release file descriptors for partially written temporary files .. Patch Set 5: The new patch fixes some test cases, and added a check in test_scratch_disk.py to automatically check the path. Also manually tested the case to cancel the query in the middle during spilling, everything looks fine, I don't see any deleted files in /proc/x/fd. -- To view, visit http://gerrit.cloudera.org:8080/20852 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I58a2bac419ced806d6f5a32bcdf24d79e078ab14 Gerrit-Change-Number: 20852 Gerrit-PatchSet: 5 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 05 Jan 2024 14:20:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12681: Release file descriptors for partially written temporary files
Yida Wu has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/20852 ) Change subject: IMPALA-12681: Release file descriptors for partially written temporary files .. IMPALA-12681: Release file descriptors for partially written temporary files This patch fixes a bug where partially written temporary files are removed without releasing the file descriptors. This patch fixes the bug by adding a call to Close() of the local file writer during the Delete() of the DiskFile class, which could be called when the local buffer file is being evicted or the query ends, ensuring proper release of the file handle. Testing: Passed core tests. Additionally, a check has been added in the test test_scratch_disk.py to verify that there are no deleted files in the /proc/x/fd/ directory. Change-Id: I58a2bac419ced806d6f5a32bcdf24d79e078ab14 --- M be/src/runtime/io/disk-file.cc M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/local-file-writer.cc M be/src/runtime/io/local-file-writer.h M be/src/runtime/tmp-file-mgr-test.cc M tests/custom_cluster/test_scratch_disk.py 6 files changed, 50 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/20852/5 -- To view, visit http://gerrit.cloudera.org:8080/20852 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I58a2bac419ced806d6f5a32bcdf24d79e078ab14 Gerrit-Change-Number: 20852 Gerrit-PatchSet: 5 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-12681: Release file descriptors for partially written temporary files
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20852 ) Change subject: IMPALA-12681: Release file descriptors for partially written temporary files .. Patch Set 4: Thanks Csaba and Abhishek for the review. The patch still needs to change one or two tests before merge. Still testing, will upload a new patch soon. -- To view, visit http://gerrit.cloudera.org:8080/20852 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I58a2bac419ced806d6f5a32bcdf24d79e078ab14 Gerrit-Change-Number: 20852 Gerrit-PatchSet: 4 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 05 Jan 2024 11:42:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12681: Release file descriptors for partially written temporary files
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20852 ) Change subject: IMPALA-12681: Release file descriptors for partially written temporary files .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/20852/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20852/3//COMMIT_MSG@7 PS3, Line 7: IMPALA-12681: Release file descriptors for partially written temporary files > nit: partially Done http://gerrit.cloudera.org:8080/#/c/20852/3//COMMIT_MSG@9 PS3, Line 9: This patch fixes a bug where partially written temporary files are > nit: This patch fixes a bug where partially written temporary files... Done -- To view, visit http://gerrit.cloudera.org:8080/20852 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I58a2bac419ced806d6f5a32bcdf24d79e078ab14 Gerrit-Change-Number: 20852 Gerrit-PatchSet: 4 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 04 Jan 2024 14:47:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12681: Release file descriptors for partially written temporary files
Yida Wu has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/20852 ) Change subject: IMPALA-12681: Release file descriptors for partially written temporary files .. IMPALA-12681: Release file descriptors for partially written temporary files This patch fixes a bug where partially written temporary files are removed without releasing the file descriptors. This patch fixes the bug by adding a call to Close() of the local file writer during the Delete() of the DiskFile class, which could be called when the local buffer file is being evicted or the query ends, ensuring proper release of the file handle. Testing: Manual testing of remote spilling, configure a local path as the local buffer and a S3 remote path for remote storage in scratch dir. Ran tpcds q78 with a low buffer_pool_limit in the query option to trigger spilling, and checked the path of local buffer path of the scratch dir that the temporary files did exist during running the query. Without the fix, after running the query, we can find references of deleted files in /proc/x/fd. With the patch that there are no references to the deleted files in the /proc/x/fd/. Change-Id: I58a2bac419ced806d6f5a32bcdf24d79e078ab14 --- M be/src/runtime/io/disk-file.cc M be/src/runtime/io/local-file-writer.cc M be/src/runtime/io/local-file-writer.h 3 files changed, 11 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/20852/4 -- To view, visit http://gerrit.cloudera.org:8080/20852 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I58a2bac419ced806d6f5a32bcdf24d79e078ab14 Gerrit-Change-Number: 20852 Gerrit-PatchSet: 4 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-12681: Release file descriptors for partial written temporary files
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20852 ) Change subject: IMPALA-12681: Release file descriptors for partial written temporary files .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/20852/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20852/2//COMMIT_MSG@16 PS2, Line 16: Testing: > can you add a bit more context here or in the Jira on how to trigger this? The query that can spill (contains something like aggregation or hash join or others) should be good enough, in my test, I use tpcds q78, dont need to cancel, simply monitor the /proc/x/fd. Added some comments. -- To view, visit http://gerrit.cloudera.org:8080/20852 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I58a2bac419ced806d6f5a32bcdf24d79e078ab14 Gerrit-Change-Number: 20852 Gerrit-PatchSet: 3 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 04 Jan 2024 12:34:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12681: Release file descriptors for partial written temporary files
Yida Wu has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/20852 ) Change subject: IMPALA-12681: Release file descriptors for partial written temporary files .. IMPALA-12681: Release file descriptors for partial written temporary files The patch fixes a bug that the partial written temporary files are removed without releasing the file descriptors. This patch fixes the bug by adding a call to Close() of the local file writer during the Delete() of the DiskFile class, which could be called when the local buffer file is being evicted or the query ends, ensuring proper release of the file handle. Testing: Manual testing of remote spilling, configure a local path as the local buffer and a S3 remote path for remote storage in scratch dir. Ran tpcds q78 with a low buffer_pool_limit in the query option to trigger spilling, and checked the path of local buffer path of the scratch dir that the temporary files did exist during running the query. Without the fix, after running the query, we can find references of deleted files in /proc/x/fd. With the patch that there are no references to the deleted files in the /proc/x/fd/. Change-Id: I58a2bac419ced806d6f5a32bcdf24d79e078ab14 --- M be/src/runtime/io/disk-file.cc M be/src/runtime/io/local-file-writer.cc M be/src/runtime/io/local-file-writer.h 3 files changed, 11 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/20852/3 -- To view, visit http://gerrit.cloudera.org:8080/20852 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I58a2bac419ced806d6f5a32bcdf24d79e078ab14 Gerrit-Change-Number: 20852 Gerrit-PatchSet: 3 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-12681: Release file descriptors for partial written temporary files
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20852 ) Change subject: IMPALA-12681: Release file descriptors for partial written temporary files .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/20852/1/be/src/runtime/io/disk-file.cc File be/src/runtime/io/disk-file.cc: http://gerrit.cloudera.org:8080/#/c/20852/1/be/src/runtime/io/disk-file.cc@41 PS1, Line 41: // Close the file writer to release the file handle. : RETURN_IF_ERROR(file_writer_->Close()); : } > Is it safe to delete it here, so can we exclude the possibility of having p Thanks Csaba! Yeah, it is better to call Close() here and add a DCHECK in the destructor of file writer. Changed. -- To view, visit http://gerrit.cloudera.org:8080/20852 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I58a2bac419ced806d6f5a32bcdf24d79e078ab14 Gerrit-Change-Number: 20852 Gerrit-PatchSet: 2 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 04 Jan 2024 09:06:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12681: Release file descriptors for partial written temporary files
Yida Wu has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/20852 ) Change subject: IMPALA-12681: Release file descriptors for partial written temporary files .. IMPALA-12681: Release file descriptors for partial written temporary files The patch fixes a bug that the partial written temporary files are removed without releasing the file descriptors. This patch fixes the bug by adding a call to Close() of the local file writer during the Delete() of the DiskFile class, which could be called when the local buffer file is being evicted or the query ends, ensuring proper release of the file handle. Testing: Manual testing of remote spilling, with the patch that there are no references to the deleted files in the /proc/x/fd/. Conversely, in the absence of the patch, such references are observed. Change-Id: I58a2bac419ced806d6f5a32bcdf24d79e078ab14 --- M be/src/runtime/io/disk-file.cc M be/src/runtime/io/local-file-writer.cc M be/src/runtime/io/local-file-writer.h 3 files changed, 11 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/20852/2 -- To view, visit http://gerrit.cloudera.org:8080/20852 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I58a2bac419ced806d6f5a32bcdf24d79e078ab14 Gerrit-Change-Number: 20852 Gerrit-PatchSet: 2 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12681: Release file descriptors for partial written temporary files
Yida Wu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20852 Change subject: IMPALA-12681: Release file descriptors for partial written temporary files .. IMPALA-12681: Release file descriptors for partial written temporary files The patch fixes a bug that the partial written temporary files are removed without releasing the file descriptors. This patch fixes the bug by adding a call to Close() during the destruction of the LocalFileWriter class, ensuring proper release of the file handle. Additionally, the patch introduces the immediate release of the file handle by adding the file writer reset when the Delete() is called within the DiskFile class. Testing: Manual testing of remote spilling, with the patch that there are no references to the deleted files in the /proc/x/fd/. Conversely, in the absence of the patch, such references are observed. Change-Id: I58a2bac419ced806d6f5a32bcdf24d79e078ab14 --- M be/src/runtime/io/disk-file.cc M be/src/runtime/io/local-file-writer.cc M be/src/runtime/io/local-file-writer.h 3 files changed, 13 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/20852/1 -- To view, visit http://gerrit.cloudera.org:8080/20852 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I58a2bac419ced806d6f5a32bcdf24d79e078ab14 Gerrit-Change-Number: 20852 Gerrit-PatchSet: 1 Gerrit-Owner: Yida Wu
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. IMPALA-11805: Use llvm ObjectCache for codegen caching Currently, we employ llvm::ExecutionEngine for codegen caching, providing access to compiled functions within the cached engine. However, the real challenge is the ExecutionEngine uses a lot of memory which largely exceeds our memory estimates and it is very hard to predict. This patch addresses this issue by using llvm::ObjectCache for codegen caching. In our case, each execution engine would have only one module, and after the compilation of the module, the compiled codegened functions of the module would be set to the execution engine, therefore functions could be used by Impala. During function compilation within the module, if an ObjectCache is set to the execution engine, the compiled codegened functions would be also written into the cache. This way, if we keep the cache, when revisiting the same module (fragment), we can efficiently reuse the specific ObjectCache, loading pre-compiled codegened functions and saving time. The tpch performance test indicates no significant regression compared to the previous use of ExecutionEngine. Post-change, the actual memory usage of each codegen caching entry is notably reduced. +--+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +--+---+-++++ | TPCH(1) | parquet / none / none | 0.22| -0.65% | 0.20 | -0.75% | +--+---+-++++ +--+--+---++-++++---++-+---+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval | +--+--+---++-++++---++-+---+ | TPCH(1) | TPCH-Q13 | parquet / none / none | 0.49 | 0.47| +2.80% | 5.32%| 5.07%| 10| +1.22% | 1.63| 1.19 | | TPCH(1) | TPCH-Q4 | parquet / none / none | 0.16 | 0.16| +3.51% | 1.32%| * 10.38% * | 10| +0.06% | 0.49| 1.06 | | TPCH(1) | TPCH-Q11 | parquet / none / none | 0.12 | 0.12| +1.39% | 2.27%| 2.24%| 10| +1.50% | 1.90| 1.37 | | TPCH(1) | TPCH-Q19 | parquet / none / none | 0.21 | 0.21| +1.56% | * 10.02% * | * 11.42% * | 10| +1.18% | 0.57| 0.32 | | TPCH(1) | TPCH-Q18 | parquet / none / none | 0.27 | 0.27| +1.71% | 6.46%| 1.29%| 10| -0.19% | -1.19 | 0.81 | | TPCH(1) | TPCH-Q6 | parquet / none / none | 0.11 | 0.11| +0.79% | 2.76%| 2.15%| 10| +0.10% | 1.46| 0.71 | | TPCH(1) | TPCH-Q3 | parquet / none / none | 0.26 | 0.26| +0.71% | 6.63%| 6.18%| 10| +0.04% | 0.49| 0.25 | | TPCH(1) | TPCH-Q17 | parquet / none / none | 0.17 | 0.17| +0.41% | * 14.66% * | * 13.01% * | 10| +0.05% | 0.40| 0.07 | | TPCH(1) | TPCH-Q14 | parquet / none / none | 0.16 | 0.16| +0.19% | 1.41%| 1.39%| 10| +0.25% | 1.46| 0.31 | | TPCH(1) | TPCH-Q20 | parquet / none / none | 0.17 | 0.17| +0.22% | 1.70%| 1.77%| 10| -0.05% | -0.40 | 0.28 | | TPCH(1) | TPCH-Q12 | parquet / none / none | 0.16 | 0.16| -0.27% | 0.54%| 1.46%| 10| +0.14% | 0.93| -0.54 | | TPCH(1) | TPCH-Q22 | parquet / none / none | 0.11 | 0.11| -0.38% | 0.81%| 2.06%| 10| +0.03% | 0.22| -0.54 | | TPCH(1) | TPCH-Q16 | parquet / none / none | 0.17 | 0.17| -0.38% | 0.67%| 1.58%| 10| -0.01% | -0.13 | -0.70 | | TPCH(1) | TPCH-Q8 | parquet / none / none | 0.27 | 0.27| -0.08% | 1.24%| 1.15%| 10| -0.33% | -1.37 | -0.15 | | TPCH(1) | TPCH-Q15 | parquet / none / none | 0.16 | 0.16| -1.18% | * 16.61% * | * 10.25% * | 10| +0.33% | 0.40| -0.19 | | TPCH(1) | TPCH-Q1 | parquet / none / none | 0.22 | 0.22| -1.67% | 1.62%| 7.45%| 10| +0.43% | 1.02| -0.70 | | TPCH(1) | TPCH-Q5 | parquet / none / none | 0.22 | 0.22| -0.98% | 0.22%| 1.55%|
[Impala-ASF-CR] Revert "IMPALA-11805: Fix LLVM memory manager bytes allocated"
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20796 ) Change subject: Revert "IMPALA-11805: Fix LLVM memory manager bytes allocated" .. Patch Set 1: Code-Review+1 Thanks Michael for the prompt response. Wait to see if there are any additional comments. If not, I'll be ready to give it a +2. -- To view, visit http://gerrit.cloudera.org:8080/20796 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f7a6b21b3a9fc7c9e675c8d3349725c4863f744 Gerrit-Change-Number: 20796 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 15 Dec 2023 00:13:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 7: (1 comment) Rebased and updated the commit message over the perf A-B result. http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@28 PS3, Line 28: compared to the previous use of ExecutionEngine. Post-change, > https://jenkins.impala.io/job/perf-AB-test-ub2004/29 for example shows Thanks Michael. Yeah, that makes sense. Reran the perf A-B test in https://jenkins.impala.io/job/perf-AB-test-ub2004/33/. And added below to the commit message. +--+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +--+---+-++++ | TPCH(1) | parquet / none / none | 0.22| -0.65% | 0.20 | -0.75% | +--+---+-++++ -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 7 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 12 Dec 2023 06:06:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. IMPALA-11805: Use llvm ObjectCache for codegen caching Currently, we employ llvm::ExecutionEngine for codegen caching, providing access to compiled functions within the cached engine. However, the real challenge is the ExecutionEngine uses a lot of memory which largely exceeds our memory estimates and it is very hard to predict. This patch addresses this issue by using llvm::ObjectCache for codegen caching. In our case, each execution engine would have only one module, after the compilation of the module, the compiled codegened functions of the module would be set to the execution engine, therefore funtions could be used by Impala. During function compilation within the module, if an ObjectCache is set to the execution engine, the compiled codegened functions would be also written into the cache. This way, if we keep the cache, when revisiting the same module (fragment), we can efficiently reuse the specific ObjectCache, loading pre-compiled codegened functions and saving time. The tpch performance test indicates no significant regression compared to the previous use of ExecutionEngine. Post-change, the actual memory usage of each codegen caching entry is notably reduced. +--+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +--+---+-++++ | TPCH(1) | parquet / none / none | 0.22| -0.65% | 0.20 | -0.75% | +--+---+-++++ +--+--+---++-++++---++-+---+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval | +--+--+---++-++++---++-+---+ | TPCH(1) | TPCH-Q13 | parquet / none / none | 0.49 | 0.47| +2.80% | 5.32%| 5.07%| 10| +1.22% | 1.63| 1.19 | | TPCH(1) | TPCH-Q4 | parquet / none / none | 0.16 | 0.16| +3.51% | 1.32%| * 10.38% * | 10| +0.06% | 0.49| 1.06 | | TPCH(1) | TPCH-Q11 | parquet / none / none | 0.12 | 0.12| +1.39% | 2.27%| 2.24%| 10| +1.50% | 1.90| 1.37 | | TPCH(1) | TPCH-Q19 | parquet / none / none | 0.21 | 0.21| +1.56% | * 10.02% * | * 11.42% * | 10| +1.18% | 0.57| 0.32 | | TPCH(1) | TPCH-Q18 | parquet / none / none | 0.27 | 0.27| +1.71% | 6.46%| 1.29%| 10| -0.19% | -1.19 | 0.81 | | TPCH(1) | TPCH-Q6 | parquet / none / none | 0.11 | 0.11| +0.79% | 2.76%| 2.15%| 10| +0.10% | 1.46| 0.71 | | TPCH(1) | TPCH-Q3 | parquet / none / none | 0.26 | 0.26| +0.71% | 6.63%| 6.18%| 10| +0.04% | 0.49| 0.25 | | TPCH(1) | TPCH-Q17 | parquet / none / none | 0.17 | 0.17| +0.41% | * 14.66% * | * 13.01% * | 10| +0.05% | 0.40| 0.07 | | TPCH(1) | TPCH-Q14 | parquet / none / none | 0.16 | 0.16| +0.19% | 1.41%| 1.39%| 10| +0.25% | 1.46| 0.31 | | TPCH(1) | TPCH-Q20 | parquet / none / none | 0.17 | 0.17| +0.22% | 1.70%| 1.77%| 10| -0.05% | -0.40 | 0.28 | | TPCH(1) | TPCH-Q12 | parquet / none / none | 0.16 | 0.16| -0.27% | 0.54%| 1.46%| 10| +0.14% | 0.93| -0.54 | | TPCH(1) | TPCH-Q22 | parquet / none / none | 0.11 | 0.11| -0.38% | 0.81%| 2.06%| 10| +0.03% | 0.22| -0.54 | | TPCH(1) | TPCH-Q16 | parquet / none / none | 0.17 | 0.17| -0.38% | 0.67%| 1.58%| 10| -0.01% | -0.13 | -0.70 | | TPCH(1) | TPCH-Q8 | parquet / none / none | 0.27 | 0.27| -0.08% | 1.24%| 1.15%| 10| -0.33% | -1.37 | -0.15 | | TPCH(1) | TPCH-Q15 | parquet / none / none | 0.16 | 0.16| -1.18% | * 16.61% * | * 10.25% * | 10| +0.33% | 0.40| -0.19 | | TPCH(1) | TPCH-Q1 | parquet / none / none | 0.22 | 0.22| -1.67% | 1.62%| 7.45%| 10| +0.43% | 1.02| -0.70 | | TPCH(1) | TPCH-Q5 | parquet / none / none | 0.22 | 0.22| -0.98% | 0.22%| 1.55%| 10
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@28 PS3, Line 28: compared to the previous use of ExecutionEngine. Post-change, > Doesn't the benchmarking framework print a grand total? If not, we can let Double checked, but it doesn't seem to have the grand total of our benchmarking framework. Filed IMPALA-12615 for it. -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 6 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 11 Dec 2023 03:18:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/20733/5/be/src/codegen/llvm-codegen-cache-test.cc File be/src/codegen/llvm-codegen-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/20733/5/be/src/codegen/llvm-codegen-cache-test.cc@39 PS5, Line 39: const int ENGINE_CACHE_SIZE = 990; > This is larger than the value that was here before. Was the test failing un Yeah, this is the real number of the cache containing one compiled function. The earlier count was misleading as the cache was empty. I've adjusted the testcase, and added LlvmCodeGenCacheTest::CheckObjCacheExists to confirm the existence of contents within the cache. http://gerrit.cloudera.org:8080/#/c/20733/5/be/src/codegen/llvm-codegen-cache-test.cc@176 PS5, Line 176: // The function is to create and return a CodeGenObjectCache which contains a specific > nit: "contains" not "contians" Done -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 6 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 08 Dec 2023 00:28:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. IMPALA-11805: Use llvm ObjectCache for codegen caching Currently, we employ llvm::ExecutionEngine for codegen caching, providing access to compiled functions within the cached engine. However, the real challenge is the ExecutionEngine uses a lot of memory which largely exceeds our memory estimates and it is very hard to predict. This patch addresses this issue by using llvm::ObjectCache for codegen caching. In our case, each execution engine would have only one module, after the compilation of the module, the compiled codegened functions of the module would be set to the execution engine, therefore funtions could be used by Impala. During function compilation within the module, if an ObjectCache is set to the execution engine, the compiled codegened functions would be also written into the cache. This way, if we keep the cache, when revisiting the same module (fragment), we can efficiently reuse the specific ObjectCache, loading pre-compiled codegened functions and saving time. The tpch performance test indicates no significant regression compared to the previous use of ExecutionEngine. Post-change, the actual memory usage of each codegen caching entry is notably reduced. +--+--+---++-++---++---++-+---+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval | +--+--+---++-++---++---++-+---+ | TPCH(1) | TPCH-Q15 | parquet / none / none | 0.48 | 0.46| +5.16% | 4.31% | 4.78%| 8 | +3.51% | 2.11| 2.22 | | TPCH(1) | TPCH-Q17 | parquet / none / none | 1.28 | 1.25| +1.76% | 1.83% | 1.88%| 8 | +3.53% | 1.34| 1.88 | | TPCH(1) | TPCH-Q8 | parquet / none / none | 0.77 | 0.75| +3.11% | 3.88% | 2.75%| 8 | +0.79% | 0.83| 1.81 | | TPCH(1) | TPCH-Q7 | parquet / none / none | 0.72 | 0.71| +1.89% | 3.48% | 3.15%| 8 | +1.61% | 0.83| 1.13 | | TPCH(1) | TPCH-Q22 | parquet / none / none | 0.43 | 0.41| +3.40% | 1.21% | 6.11%| 8 | +0.05% | 0.45| 1.54 | | TPCH(1) | TPCH-Q10 | parquet / none / none | 0.74 | 0.73| +2.04% | 0.75% | 2.57%| 8 | +1.13% | 2.11| 2.16 | | TPCH(1) | TPCH-Q20 | parquet / none / none | 0.55 | 0.54| +1.42% | 2.95% | 1.14%| 8 | +0.19% | 0.57| 1.25 | | TPCH(1) | TPCH-Q11 | parquet / none / none | 0.25 | 0.25| +0.37% | 2.41% | 3.09%| 8 | +0.82% | 0.45| 0.27 | | TPCH(1) | TPCH-Q19 | parquet / none / none | 0.62 | 0.61| +0.92% | 3.17% | 1.53%| 8 | +0.05% | 0.19| 0.73 | | TPCH(1) | TPCH-Q4 | parquet / none / none | 0.42 | 0.42| +0.37% | 1.24% | 1.08%| 8 | +0.03% | 0.19| 0.63 | | TPCH(1) | TPCH-Q5 | parquet / none / none | 0.72 | 0.72| +0.19% | 3.08% | 3.23%| 8 | +0.07% | 0.57| 0.12 | | TPCH(1) | TPCH-Q18 | parquet / none / none | 1.20 | 1.19| +0.04% | 0.12% | 0.15%| 8 | +0.03% | 0.57| 0.53 | | TPCH(1) | TPCH-Q9 | parquet / none / none | 1.13 | 1.13| +0.02% | 2.05% | 2.04%| 8 | -0.01% | -0.57 | 0.02 | | TPCH(1) | TPCH-Q3 | parquet / none / none | 0.60 | 0.60| +0.03% | 3.71% | 3.99%| 8 | -0.04% | -0.06 | 0.01 | | TPCH(1) | TPCH-Q2 | parquet / none / none | 0.44 | 0.44| -0.70% | 8.56% | 7.42%| 8 | +0.61% | 0.45| -0.18 | | TPCH(1) | TPCH-Q6 | parquet / none / none | 0.31 | 0.31| -0.07% | 1.73% | 1.56%| 8 | -0.06% | -1.34 | -0.09 | | TPCH(1) | TPCH-Q14 | parquet / none / none | 0.48 | 0.48| -0.25% | 1.20% | 1.47%| 8 | -0.09% | -0.57 | -0.37 | | TPCH(1) | TPCH-Q12 | parquet / none / none | 0.54 | 0.54| -0.86% | 1.90% | 4.48%| 8 | +0.11% | 0.32| -0.50 | | TPCH(1) | TPCH-Q21 | parquet / none / none | 1.35 | 1.37| -1.56% | 3.11% | 2.55%| 8 | -0.19% | -0.70 | -1.11 | | TPCH(1) | TPCH-Q16 | parquet / none / none | 0.26 | 0.27| -1.74% | 9.53% | * 13.61% * | 8
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/20733/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20733/4//COMMIT_MSG@17 PS4, Line 17: after the compila > Could you clarify the relationship of the module and exec engine here? I.e. Yes. Added some comments. -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 5 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 07 Dec 2023 13:38:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 5: (15 comments) http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@28 PS3, Line 28: compared to the previous use of ExecutionEngine. Post-change, > Doesn't the benchmarking framework print a grand total? If not, we can let It seems our benchmarking framework doesn't print the total. Will double check and file a jira if we don't have it. http://gerrit.cloudera.org:8080/#/c/20733/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20733/4//COMMIT_MSG@18 PS4, Line 18: the executi > Why are they pre-compiled? Aren't they compiled already? Or you mean they a I think the "pre-compiled" means it is compiled in the last round and ready for use. Yeah, maybe call it "compiled" is better, because in this case it means the first round to write the compiled stuff into the cache. http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.h File be/src/codegen/llvm-codegen-cache.h: http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.h@164 PS4, Line 164: cached_engine > Thinking about this, wouldn't 'cached_engine...' express the meaning better Done http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.cc File be/src/codegen/llvm-codegen-cache.cc: http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.cc@155 PS4, Line 155: y > I think removing the negation and inverting the branches is better, one les Done http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc File be/src/codegen/llvm-codegen-object-cache.cc: http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@39 PS4, Line 39: uffer().size(); > Is a copy necessary here? Can't we move the buffer somehow? CodeGenObjectCache doesn't have the ownership of ObjBuffer, the llvm engine may use the ObjBuffer later, so it is safer to make a copy. (And MemoryBufferRef may not allow to move the real Buffer, the example I can see is to do a copy) http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@42 PS4, Line 42: // can conserve memory. > Shouldn't we add a DCHECK here? If this happens it is a bug. Done http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@57 PS4, Line 57: // ID. If the ID doesn't match the one in the object cache, a warning is logged for > Will it always copy the buffer? Is there a way to access it without copying >From the definition of the interface getObject(), "Returns a pointer to a >newly allocated MemoryBuffer that contains the object which corresponds with >Module M", it would be safer to return the copy of the buffer. >https://llvm.org/doxygen/classllvm_1_1ObjectCache.html#details http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@60 PS4, Line 60: std::lock_guard l(mutex_); > Shouldn't we add a DCHECK here? If this happens it is a bug. Done http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@63 PS4, Line 63: << cached_obj_->getMemBufferRef().getBufferSize(); > Can this function be called when 'key_' is empty? Isn't that a bug? If there is no cache yet, would return nullptr, for example, the llvm engine would look up the cache to see if it contains the pre-compiled module, if it doesn't contain any, the llvm engine would proceed to compile the module and then write compiled module to the cache. This happens in cache look up and doesn't hit any. http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h@330 PS4, Line 330: > See comment on llvm-codegen-cache.h:164. Changed the one in llvm-codegen-cache.h, but would prefer to keep this, because this one is the cache for llvm engine to write the compiled functions to, it doesn't have to be anything "cached". Added some comments. http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h@925 PS4, Line 925: > Do I understand it correctly that this is for writing to the cache (storing Yes, we can say that the engine_cache_ is for writing, and engine_cache_cached_ is for reading. Added some comments. http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h@929 PS4, Line 929: > We could extend the comment with what is written on llvm-codegen.cc:1499, i Added some comments. http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.cc@1192 PS4,
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. IMPALA-11805: Use llvm ObjectCache for codegen caching Currently, we employ llvm::ExecutionEngine for codegen caching, providing access to compiled functions within the cached engine. However, the real challenge is the ExecutionEngine uses a lot of memory which largely exceeds our memory estimates and it is very hard to predict. This patch addresses this issue by using llvm::ObjectCache for codegen caching. In our case, each execution engine would have only one module, after the compilation of the module, the compiled codegened functions of the module would be set to the execution engine, therefore funtions could be used by Impala. During function compilation within the module, if an ObjectCache is set to the execution engine, the compiled codegened functions would be also written into the cache. This way, if we keep the cache, when revisiting the same module (fragment), we can efficiently reuse the specific ObjectCache, loading pre-compiled codegened functions and saving time. The tpch performance test indicates no significant regression compared to the previous use of ExecutionEngine. Post-change, the actual memory usage of each codegen caching entry is notably reduced. +--+--+---++-++---++---++-+---+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval | +--+--+---++-++---++---++-+---+ | TPCH(1) | TPCH-Q15 | parquet / none / none | 0.48 | 0.46| +5.16% | 4.31% | 4.78%| 8 | +3.51% | 2.11| 2.22 | | TPCH(1) | TPCH-Q17 | parquet / none / none | 1.28 | 1.25| +1.76% | 1.83% | 1.88%| 8 | +3.53% | 1.34| 1.88 | | TPCH(1) | TPCH-Q8 | parquet / none / none | 0.77 | 0.75| +3.11% | 3.88% | 2.75%| 8 | +0.79% | 0.83| 1.81 | | TPCH(1) | TPCH-Q7 | parquet / none / none | 0.72 | 0.71| +1.89% | 3.48% | 3.15%| 8 | +1.61% | 0.83| 1.13 | | TPCH(1) | TPCH-Q22 | parquet / none / none | 0.43 | 0.41| +3.40% | 1.21% | 6.11%| 8 | +0.05% | 0.45| 1.54 | | TPCH(1) | TPCH-Q10 | parquet / none / none | 0.74 | 0.73| +2.04% | 0.75% | 2.57%| 8 | +1.13% | 2.11| 2.16 | | TPCH(1) | TPCH-Q20 | parquet / none / none | 0.55 | 0.54| +1.42% | 2.95% | 1.14%| 8 | +0.19% | 0.57| 1.25 | | TPCH(1) | TPCH-Q11 | parquet / none / none | 0.25 | 0.25| +0.37% | 2.41% | 3.09%| 8 | +0.82% | 0.45| 0.27 | | TPCH(1) | TPCH-Q19 | parquet / none / none | 0.62 | 0.61| +0.92% | 3.17% | 1.53%| 8 | +0.05% | 0.19| 0.73 | | TPCH(1) | TPCH-Q4 | parquet / none / none | 0.42 | 0.42| +0.37% | 1.24% | 1.08%| 8 | +0.03% | 0.19| 0.63 | | TPCH(1) | TPCH-Q5 | parquet / none / none | 0.72 | 0.72| +0.19% | 3.08% | 3.23%| 8 | +0.07% | 0.57| 0.12 | | TPCH(1) | TPCH-Q18 | parquet / none / none | 1.20 | 1.19| +0.04% | 0.12% | 0.15%| 8 | +0.03% | 0.57| 0.53 | | TPCH(1) | TPCH-Q9 | parquet / none / none | 1.13 | 1.13| +0.02% | 2.05% | 2.04%| 8 | -0.01% | -0.57 | 0.02 | | TPCH(1) | TPCH-Q3 | parquet / none / none | 0.60 | 0.60| +0.03% | 3.71% | 3.99%| 8 | -0.04% | -0.06 | 0.01 | | TPCH(1) | TPCH-Q2 | parquet / none / none | 0.44 | 0.44| -0.70% | 8.56% | 7.42%| 8 | +0.61% | 0.45| -0.18 | | TPCH(1) | TPCH-Q6 | parquet / none / none | 0.31 | 0.31| -0.07% | 1.73% | 1.56%| 8 | -0.06% | -1.34 | -0.09 | | TPCH(1) | TPCH-Q14 | parquet / none / none | 0.48 | 0.48| -0.25% | 1.20% | 1.47%| 8 | -0.09% | -0.57 | -0.37 | | TPCH(1) | TPCH-Q12 | parquet / none / none | 0.54 | 0.54| -0.86% | 1.90% | 4.48%| 8 | +0.11% | 0.32| -0.50 | | TPCH(1) | TPCH-Q21 | parquet / none / none | 1.35 | 1.37| -1.56% | 3.11% | 2.55%| 8 | -0.19% | -0.70 | -1.11 | | TPCH(1) | TPCH-Q16 | parquet / none / none | 0.26 | 0.27| -1.74% | 9.53% | * 13.61% * | 8
[Impala-ASF-CR] IMPALA-11805: Fix LLVM memory manager bytes allocated
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20697 ) Change subject: IMPALA-11805: Fix LLVM memory manager bytes allocated .. Patch Set 13: Code-Review+1 LGTM!! Considering the rise in bytes_allocated after the change, have you run exhaustive tests to ensure the memory tracker can handle the increased demand? -- To view, visit http://gerrit.cloudera.org:8080/20697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5a0193a1694db0718d75651dc2b2f9f92c0d6064 Gerrit-Change-Number: 20697 Gerrit-PatchSet: 13 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 04 Dec 2023 02:01:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 4: (12 comments) Thanks Daniel for reviewing. http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@12 PS3, Line 12: which largely exceeds our memo > I think this should come before "which largely exceeds...", because "which Done http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@17 PS3, Line 17: > Nit: space before '('. Done http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@18 PS3, Line 18: pre-compiled : codegened > Do you mean the codegened functions? First I thought it referred to the fun Yeah, changed it to pre-compiled codegened functions. http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@28 PS3, Line 28: > Could you also include the aggregate result over all queries? Do you mean run the time running all the tpch queries for once? Did a test, run only once for all the TPCH queries, the result of TPCH-Q1 for Execution Engine looks suspicious (maybe because it is the first of all to run), but all others look similar. Query Execution EngineObject Cache Q1 4 1.68 Q10 1.521.67 Q11 0.8 0.8 Q12 0.680.58 Q13 1.091.29 Q14 0.820.82 Q15 0.890.88 Q16 0.8 0.75 Q17 1.091.09 Q18 1.151.25 Q19 2.072.11 Q2 1.361.36 Q20 1.051.06 Q21 1.911.91 Q22 0.630.63 Q3 0.940.93 Q4 0.480.47 Q5 1.451.45 Q6 0.260.26 Q7 2.2 2.15 Q8 1.811.81 Q9 1.851.85 Total 28.85 26.8 http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc File be/src/codegen/llvm-codegen-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@99 PS3, Line 99: dule_id, bool > We could include the parameter names here and on the next line if 'codegen' Done http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@115 PS3, Line 115: > Maybe 'cached_engine' would be better, also on L117. Done http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@132 PS3, Line 132: > I'd prefer something like 'codegen_obj_cache_', it's easier to get what it Done http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@138 PS3, Line 138: > Could it be const? Done http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@142 PS3, Line 142: > Could use 'nullptr' instead. Done http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache-test.cc@225 PS3, Line 225: te a Llvm > 'module_id_echo' would be better, cf. 'module_id_double'. Done http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache.h File be/src/codegen/llvm-codegen-cache.h: http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache.h@231 PS3, Line 231: engine_cache_raw > Maybe 'engine_cache_raw[_ptr]' would be better. Done http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache.cc File be/src/codegen/llvm-codegen-cache.cc: http://gerrit.cloudera.org:8080/#/c/20733/3/be/src/codegen/llvm-codegen-cache.cc@154 PS3, Line 154: Slice key; > The assignment could be made in an IF-ELSE (or a ? : operator). Now we get/ Done -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 4 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Sat, 02 Dec 2023 13:15:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. IMPALA-11805: Use llvm ObjectCache for codegen caching Currently, we employ llvm::ExecutionEngine for codegen caching, providing access to compiled functions within the cached engine. However, the real challenge is the ExecutionEngine uses a lot of memory which largely exceeds our memory estimates and it is very hard to predict. This patch addresses this issue by using llvm::ObjectCache for codegen caching. The approach involves associating an ObjectCache with the module (execution engine). During function compilation within the module, if an ObjectCache is set, the pre-compiled codegened functions are written into the cache. This way, when revisiting the same module (fragment), we can efficiently reuse the specific ObjectCache, loading pre-compiled codegened functions and saving time. The tpch performance test indicates no significant regression compared to the previous use of ExecutionEngine. Post-change, the actual memory usage of each codegen caching entry is notably reduced. +--+--+---++-++---++---++-+---+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval | +--+--+---++-++---++---++-+---+ | TPCH(1) | TPCH-Q15 | parquet / none / none | 0.48 | 0.46| +5.16% | 4.31% | 4.78%| 8 | +3.51% | 2.11| 2.22 | | TPCH(1) | TPCH-Q17 | parquet / none / none | 1.28 | 1.25| +1.76% | 1.83% | 1.88%| 8 | +3.53% | 1.34| 1.88 | | TPCH(1) | TPCH-Q8 | parquet / none / none | 0.77 | 0.75| +3.11% | 3.88% | 2.75%| 8 | +0.79% | 0.83| 1.81 | | TPCH(1) | TPCH-Q7 | parquet / none / none | 0.72 | 0.71| +1.89% | 3.48% | 3.15%| 8 | +1.61% | 0.83| 1.13 | | TPCH(1) | TPCH-Q22 | parquet / none / none | 0.43 | 0.41| +3.40% | 1.21% | 6.11%| 8 | +0.05% | 0.45| 1.54 | | TPCH(1) | TPCH-Q10 | parquet / none / none | 0.74 | 0.73| +2.04% | 0.75% | 2.57%| 8 | +1.13% | 2.11| 2.16 | | TPCH(1) | TPCH-Q20 | parquet / none / none | 0.55 | 0.54| +1.42% | 2.95% | 1.14%| 8 | +0.19% | 0.57| 1.25 | | TPCH(1) | TPCH-Q11 | parquet / none / none | 0.25 | 0.25| +0.37% | 2.41% | 3.09%| 8 | +0.82% | 0.45| 0.27 | | TPCH(1) | TPCH-Q19 | parquet / none / none | 0.62 | 0.61| +0.92% | 3.17% | 1.53%| 8 | +0.05% | 0.19| 0.73 | | TPCH(1) | TPCH-Q4 | parquet / none / none | 0.42 | 0.42| +0.37% | 1.24% | 1.08%| 8 | +0.03% | 0.19| 0.63 | | TPCH(1) | TPCH-Q5 | parquet / none / none | 0.72 | 0.72| +0.19% | 3.08% | 3.23%| 8 | +0.07% | 0.57| 0.12 | | TPCH(1) | TPCH-Q18 | parquet / none / none | 1.20 | 1.19| +0.04% | 0.12% | 0.15%| 8 | +0.03% | 0.57| 0.53 | | TPCH(1) | TPCH-Q9 | parquet / none / none | 1.13 | 1.13| +0.02% | 2.05% | 2.04%| 8 | -0.01% | -0.57 | 0.02 | | TPCH(1) | TPCH-Q3 | parquet / none / none | 0.60 | 0.60| +0.03% | 3.71% | 3.99%| 8 | -0.04% | -0.06 | 0.01 | | TPCH(1) | TPCH-Q2 | parquet / none / none | 0.44 | 0.44| -0.70% | 8.56% | 7.42%| 8 | +0.61% | 0.45| -0.18 | | TPCH(1) | TPCH-Q6 | parquet / none / none | 0.31 | 0.31| -0.07% | 1.73% | 1.56%| 8 | -0.06% | -1.34 | -0.09 | | TPCH(1) | TPCH-Q14 | parquet / none / none | 0.48 | 0.48| -0.25% | 1.20% | 1.47%| 8 | -0.09% | -0.57 | -0.37 | | TPCH(1) | TPCH-Q12 | parquet / none / none | 0.54 | 0.54| -0.86% | 1.90% | 4.48%| 8 | +0.11% | 0.32| -0.50 | | TPCH(1) | TPCH-Q21 | parquet / none / none | 1.35 | 1.37| -1.56% | 3.11% | 2.55%| 8 | -0.19% | -0.70 | -1.11 | | TPCH(1) | TPCH-Q16 | parquet / none / none | 0.26 | 0.27| -1.74% | 9.53% | * 13.61% * | 8 | -0.32% | -0.06 | -0.30 | | TPCH(1) | TPCH-Q1 | parquet / none / none | 0.64 | 0.67| -4.64% | 1.13% | * 12.59% * | 8 | -0.28% | -0.19 | -1.04 | |
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache-test.cc File be/src/codegen/llvm-codegen-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache-test.cc@298 PS2, Line 298: // 172B for optimal mode, or 180B on ARM systems > Oh, I meant the largest difference I saw between normal and optimal mode is Did a test in both arm and x86. In our testcase of mem_charge_1, the mem charge in arm is always 8B larger than x86 in normal and optimal mode. For mem_charge_1, arm 188B/180B, x86 180B/172B. The current size setting should be good for testing, and I can pass the test in arm. http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@1385 PS2, Line 1385: > So bytes_allocated is still needed, which means we still want https://gerri >From the source code and gdb debugging that I can see, the bytes_allocated >will be used when loading from the pre-compiled object. >https://llvm.org/doxygen/MCJIT_8cpp_source.html#l00224. While the manage usage >of pre-compiled object is currently managed by the codegen cache if codegen >cache is enabled, the bytes_allocated is managed by the runtime execution >engine, I think bytes_allocated is still useful for runtime use no matter >codegen cache is on or off. -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 3 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 30 Nov 2023 10:00:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. Patch Set 3: (12 comments) Thanks Michael for the review. http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache-test.cc File be/src/codegen/llvm-codegen-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache-test.cc@298 PS2, Line 298: // 172B for optimal mode, or 180B on ARM systems > I experimented with this. In practice I saw minimal (<= 8B) difference betw Adjusted the value after adding sizeof(CodeGenObjectCache). http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache.h File be/src/codegen/llvm-codegen-cache.h: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen-cache.h@308 PS2, Line 308: std::unordered_map> > This would also work as a std::unordered_set. https://en.cppreference.com/w When creating a temporary shared_ptr from the raw pointer, there might be a double-free problem. I'm uncertain if there's a more optimal solution. However, considering the cache entry is typically over 1k bytes while the 8-byte pointer is relatively small, it doesn't seem like a significant concern in this context. http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h@33 PS2, Line 33: #include > What's this used for? Done http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h@353 PS2, Line 353: /// retrieval. If module_id is not nullptr, the final module id is returned. > Can you expand what module_id is used for. Would help to identify when to p Done http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.h@920 PS2, Line 920: llvm::Module* module_; > engine_cache_wrapper no longer owns an execution engine, so this comment is Done http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@a1185 PS2, Line 1185: > I thought this comment was kind of useful. And I see you still use it at li Done http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@496 PS2, Line 496: engine_cache_ = make_shared(); > Why initialize it this way? You could also just pass "new CodegenObjectCach Done http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@498 PS2, Line 498: RETURN_IF_ERROR(LoadIntrinsics()); > You could directly assign to execution_engine_ on line 479. Lots of extra m Done http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@499 PS2, Line 499: > Why not assign directly to symbol_emitter_? Done http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-codegen.cc@1385 PS2, Line 1385: > Is this still right? Shouldn't it be the object cache size, since that's wh Done http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.h File be/src/codegen/llvm-object-cache-wrapper.h: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.h@44 PS2, Line 44: > possibly other objects seems weird as there aren't other objects. Do we rea Done http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.cc File be/src/codegen/llvm-object-cache-wrapper.cc: http://gerrit.cloudera.org:8080/#/c/20733/2/be/src/codegen/llvm-object-cache-wrapper.cc@66 PS2, Line 66: > Is sizeof(CodeGenObjectCache) accounted for elsewhere? Yeah, this is a good point. Added here. -- To view, visit http://gerrit.cloudera.org:8080/20733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677 Gerrit-Change-Number: 20733 Gerrit-PatchSet: 3 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Wed, 29 Nov 2023 14:37:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. IMPALA-11805: Use llvm ObjectCache for codegen caching Currently, we employ llvm::ExecutionEngine for codegen caching, providing access to compiled functions within the cached engine. However, the real challenge is the ExecutionEngine uses a lot of memory and it is very hard to predict which largely exceeds our memory estimates. This patch addresses this issue by using llvm::ObjectCache for codegen caching. The approach involves associating an ObjectCache with the module(execution engine). During function compilation within the module, if an ObjectCache is set, the pre-compiled functions are written into the cache. This way, when revisiting the same module (fragment), we can efficiently reuse the specific ObjectCache, loading pre-compiled functions and saving time. The tpch performance test indicates no significant regression compared to the previous use of ExecutionEngine. Post-change, the actual memory usage of each codegen caching entry is notably reduced. +--+--+---++-++---++---++-+---+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval | +--+--+---++-++---++---++-+---+ | TPCH(1) | TPCH-Q15 | parquet / none / none | 0.48 | 0.46| +5.16% | 4.31% | 4.78%| 8 | +3.51% | 2.11| 2.22 | | TPCH(1) | TPCH-Q17 | parquet / none / none | 1.28 | 1.25| +1.76% | 1.83% | 1.88%| 8 | +3.53% | 1.34| 1.88 | | TPCH(1) | TPCH-Q8 | parquet / none / none | 0.77 | 0.75| +3.11% | 3.88% | 2.75%| 8 | +0.79% | 0.83| 1.81 | | TPCH(1) | TPCH-Q7 | parquet / none / none | 0.72 | 0.71| +1.89% | 3.48% | 3.15%| 8 | +1.61% | 0.83| 1.13 | | TPCH(1) | TPCH-Q22 | parquet / none / none | 0.43 | 0.41| +3.40% | 1.21% | 6.11%| 8 | +0.05% | 0.45| 1.54 | | TPCH(1) | TPCH-Q10 | parquet / none / none | 0.74 | 0.73| +2.04% | 0.75% | 2.57%| 8 | +1.13% | 2.11| 2.16 | | TPCH(1) | TPCH-Q20 | parquet / none / none | 0.55 | 0.54| +1.42% | 2.95% | 1.14%| 8 | +0.19% | 0.57| 1.25 | | TPCH(1) | TPCH-Q11 | parquet / none / none | 0.25 | 0.25| +0.37% | 2.41% | 3.09%| 8 | +0.82% | 0.45| 0.27 | | TPCH(1) | TPCH-Q19 | parquet / none / none | 0.62 | 0.61| +0.92% | 3.17% | 1.53%| 8 | +0.05% | 0.19| 0.73 | | TPCH(1) | TPCH-Q4 | parquet / none / none | 0.42 | 0.42| +0.37% | 1.24% | 1.08%| 8 | +0.03% | 0.19| 0.63 | | TPCH(1) | TPCH-Q5 | parquet / none / none | 0.72 | 0.72| +0.19% | 3.08% | 3.23%| 8 | +0.07% | 0.57| 0.12 | | TPCH(1) | TPCH-Q18 | parquet / none / none | 1.20 | 1.19| +0.04% | 0.12% | 0.15%| 8 | +0.03% | 0.57| 0.53 | | TPCH(1) | TPCH-Q9 | parquet / none / none | 1.13 | 1.13| +0.02% | 2.05% | 2.04%| 8 | -0.01% | -0.57 | 0.02 | | TPCH(1) | TPCH-Q3 | parquet / none / none | 0.60 | 0.60| +0.03% | 3.71% | 3.99%| 8 | -0.04% | -0.06 | 0.01 | | TPCH(1) | TPCH-Q2 | parquet / none / none | 0.44 | 0.44| -0.70% | 8.56% | 7.42%| 8 | +0.61% | 0.45| -0.18 | | TPCH(1) | TPCH-Q6 | parquet / none / none | 0.31 | 0.31| -0.07% | 1.73% | 1.56%| 8 | -0.06% | -1.34 | -0.09 | | TPCH(1) | TPCH-Q14 | parquet / none / none | 0.48 | 0.48| -0.25% | 1.20% | 1.47%| 8 | -0.09% | -0.57 | -0.37 | | TPCH(1) | TPCH-Q12 | parquet / none / none | 0.54 | 0.54| -0.86% | 1.90% | 4.48%| 8 | +0.11% | 0.32| -0.50 | | TPCH(1) | TPCH-Q21 | parquet / none / none | 1.35 | 1.37| -1.56% | 3.11% | 2.55%| 8 | -0.19% | -0.70 | -1.11 | | TPCH(1) | TPCH-Q16 | parquet / none / none | 0.26 | 0.27| -1.74% | 9.53% | * 13.61% * | 8 | -0.32% | -0.06 | -0.30 | | TPCH(1) | TPCH-Q1 | parquet / none / none | 0.64 | 0.67| -4.64% | 1.13% | * 12.59% * | 8 | -0.28% | -0.19 | -1.04 | | TPCH(1) | TPCH-Q13 |
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/20733 ) Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. IMPALA-11805: Use llvm ObjectCache for codegen caching Currently, we employ llvm::ExecutionEngine for codegen caching, providing access to compiled functions within the cached engine. However, the real challenge is the ExecutionEngine uses a lot of memory and it is very hard to predict which largely exceeds our memory estimates. This patch addresses this issue by using llvm::ObjectCache for codegen caching. The approach involves associating an ObjectCache with the module(execution engine). During function compilation within the module, if an ObjectCache is set, the pre-compiled functions are written into the cache. This way, when revisiting the same module (fragment), we can efficiently reuse the specific ObjectCache, loading pre-compiled functions and saving time. The tpch performance test indicates no significant regression compared to the previous use of ExecutionEngine. Post-change, the actual memory usage of each codegen caching entry is notably reduced. +--+--+---++-++---++---++-+---+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval | +--+--+---++-++---++---++-+---+ | TPCH(1) | TPCH-Q15 | parquet / none / none | 0.48 | 0.46| +5.16% | 4.31% | 4.78%| 8 | +3.51% | 2.11| 2.22 | | TPCH(1) | TPCH-Q17 | parquet / none / none | 1.28 | 1.25| +1.76% | 1.83% | 1.88%| 8 | +3.53% | 1.34| 1.88 | | TPCH(1) | TPCH-Q8 | parquet / none / none | 0.77 | 0.75| +3.11% | 3.88% | 2.75%| 8 | +0.79% | 0.83| 1.81 | | TPCH(1) | TPCH-Q7 | parquet / none / none | 0.72 | 0.71| +1.89% | 3.48% | 3.15%| 8 | +1.61% | 0.83| 1.13 | | TPCH(1) | TPCH-Q22 | parquet / none / none | 0.43 | 0.41| +3.40% | 1.21% | 6.11%| 8 | +0.05% | 0.45| 1.54 | | TPCH(1) | TPCH-Q10 | parquet / none / none | 0.74 | 0.73| +2.04% | 0.75% | 2.57%| 8 | +1.13% | 2.11| 2.16 | | TPCH(1) | TPCH-Q20 | parquet / none / none | 0.55 | 0.54| +1.42% | 2.95% | 1.14%| 8 | +0.19% | 0.57| 1.25 | | TPCH(1) | TPCH-Q11 | parquet / none / none | 0.25 | 0.25| +0.37% | 2.41% | 3.09%| 8 | +0.82% | 0.45| 0.27 | | TPCH(1) | TPCH-Q19 | parquet / none / none | 0.62 | 0.61| +0.92% | 3.17% | 1.53%| 8 | +0.05% | 0.19| 0.73 | | TPCH(1) | TPCH-Q4 | parquet / none / none | 0.42 | 0.42| +0.37% | 1.24% | 1.08%| 8 | +0.03% | 0.19| 0.63 | | TPCH(1) | TPCH-Q5 | parquet / none / none | 0.72 | 0.72| +0.19% | 3.08% | 3.23%| 8 | +0.07% | 0.57| 0.12 | | TPCH(1) | TPCH-Q18 | parquet / none / none | 1.20 | 1.19| +0.04% | 0.12% | 0.15%| 8 | +0.03% | 0.57| 0.53 | | TPCH(1) | TPCH-Q9 | parquet / none / none | 1.13 | 1.13| +0.02% | 2.05% | 2.04%| 8 | -0.01% | -0.57 | 0.02 | | TPCH(1) | TPCH-Q3 | parquet / none / none | 0.60 | 0.60| +0.03% | 3.71% | 3.99%| 8 | -0.04% | -0.06 | 0.01 | | TPCH(1) | TPCH-Q2 | parquet / none / none | 0.44 | 0.44| -0.70% | 8.56% | 7.42%| 8 | +0.61% | 0.45| -0.18 | | TPCH(1) | TPCH-Q6 | parquet / none / none | 0.31 | 0.31| -0.07% | 1.73% | 1.56%| 8 | -0.06% | -1.34 | -0.09 | | TPCH(1) | TPCH-Q14 | parquet / none / none | 0.48 | 0.48| -0.25% | 1.20% | 1.47%| 8 | -0.09% | -0.57 | -0.37 | | TPCH(1) | TPCH-Q12 | parquet / none / none | 0.54 | 0.54| -0.86% | 1.90% | 4.48%| 8 | +0.11% | 0.32| -0.50 | | TPCH(1) | TPCH-Q21 | parquet / none / none | 1.35 | 1.37| -1.56% | 3.11% | 2.55%| 8 | -0.19% | -0.70 | -1.11 | | TPCH(1) | TPCH-Q16 | parquet / none / none | 0.26 | 0.27| -1.74% | 9.53% | * 13.61% * | 8 | -0.32% | -0.06 | -0.30 | | TPCH(1) | TPCH-Q1 | parquet / none / none | 0.64 | 0.67| -4.64% | 1.13% | * 12.59% * | 8 | -0.28% | -0.19 | -1.04 | | TPCH(1) | TPCH-Q13 |
[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching
Yida Wu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20733 Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching .. IMPALA-11805: Use llvm ObjectCache for codegen caching Currently, we employ llvm::ExecutionEngine for codegen caching, providing access to compiled functions within the cached engine. However, the real challenge is the ExecutionEngine uses a lot of memory and it is very hard to predict which largely exceeds our memory estimates. This patch addresses this issue by using llvm::ObjectCache for codegen caching. The approach involves associating an ObjectCache with the module(execution engine). During function compilation within the module, if an ObjectCache is set, the pre-compiled functions are written into the cache. This way, when revisiting the same module (fragment), we can efficiently reuse the specific ObjectCache, loading pre-compiled functions and saving time. The tpch performance test indicates no significant regression compared to the previous use of ExecutionEngine. Post-change, the actual memory usage of each codegen caching entry is notably reduced. +--+--+---++-++---++---++-+---+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval | +--+--+---++-++---++---++-+---+ | TPCH(1) | TPCH-Q15 | parquet / none / none | 0.48 | 0.46| +5.16% | 4.31% | 4.78%| 8 | +3.51% | 2.11| 2.22 | | TPCH(1) | TPCH-Q17 | parquet / none / none | 1.28 | 1.25| +1.76% | 1.83% | 1.88%| 8 | +3.53% | 1.34| 1.88 | | TPCH(1) | TPCH-Q8 | parquet / none / none | 0.77 | 0.75| +3.11% | 3.88% | 2.75%| 8 | +0.79% | 0.83| 1.81 | | TPCH(1) | TPCH-Q7 | parquet / none / none | 0.72 | 0.71| +1.89% | 3.48% | 3.15%| 8 | +1.61% | 0.83| 1.13 | | TPCH(1) | TPCH-Q22 | parquet / none / none | 0.43 | 0.41| +3.40% | 1.21% | 6.11%| 8 | +0.05% | 0.45| 1.54 | | TPCH(1) | TPCH-Q10 | parquet / none / none | 0.74 | 0.73| +2.04% | 0.75% | 2.57%| 8 | +1.13% | 2.11| 2.16 | | TPCH(1) | TPCH-Q20 | parquet / none / none | 0.55 | 0.54| +1.42% | 2.95% | 1.14%| 8 | +0.19% | 0.57| 1.25 | | TPCH(1) | TPCH-Q11 | parquet / none / none | 0.25 | 0.25| +0.37% | 2.41% | 3.09%| 8 | +0.82% | 0.45| 0.27 | | TPCH(1) | TPCH-Q19 | parquet / none / none | 0.62 | 0.61| +0.92% | 3.17% | 1.53%| 8 | +0.05% | 0.19| 0.73 | | TPCH(1) | TPCH-Q4 | parquet / none / none | 0.42 | 0.42| +0.37% | 1.24% | 1.08%| 8 | +0.03% | 0.19| 0.63 | | TPCH(1) | TPCH-Q5 | parquet / none / none | 0.72 | 0.72| +0.19% | 3.08% | 3.23%| 8 | +0.07% | 0.57| 0.12 | | TPCH(1) | TPCH-Q18 | parquet / none / none | 1.20 | 1.19| +0.04% | 0.12% | 0.15%| 8 | +0.03% | 0.57| 0.53 | | TPCH(1) | TPCH-Q9 | parquet / none / none | 1.13 | 1.13| +0.02% | 2.05% | 2.04%| 8 | -0.01% | -0.57 | 0.02 | | TPCH(1) | TPCH-Q3 | parquet / none / none | 0.60 | 0.60| +0.03% | 3.71% | 3.99%| 8 | -0.04% | -0.06 | 0.01 | | TPCH(1) | TPCH-Q2 | parquet / none / none | 0.44 | 0.44| -0.70% | 8.56% | 7.42%| 8 | +0.61% | 0.45| -0.18 | | TPCH(1) | TPCH-Q6 | parquet / none / none | 0.31 | 0.31| -0.07% | 1.73% | 1.56%| 8 | -0.06% | -1.34 | -0.09 | | TPCH(1) | TPCH-Q14 | parquet / none / none | 0.48 | 0.48| -0.25% | 1.20% | 1.47%| 8 | -0.09% | -0.57 | -0.37 | | TPCH(1) | TPCH-Q12 | parquet / none / none | 0.54 | 0.54| -0.86% | 1.90% | 4.48%| 8 | +0.11% | 0.32| -0.50 | | TPCH(1) | TPCH-Q21 | parquet / none / none | 1.35 | 1.37| -1.56% | 3.11% | 2.55%| 8 | -0.19% | -0.70 | -1.11 | | TPCH(1) | TPCH-Q16 | parquet / none / none | 0.26 | 0.27| -1.74% | 9.53% | * 13.61% * | 8 | -0.32% | -0.06 | -0.30 | | TPCH(1) | TPCH-Q1 | parquet / none / none | 0.64 | 0.67| -4.64% | 1.13% | * 12.59% * | 8 | -0.28% | -0.19 | -1.04 | | TPCH(1) | TPCH-Q13
[Impala-ASF-CR] IMPALA-12535: Fix misleading metric keys for the threadz page
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20658 ) Change subject: IMPALA-12535: Fix misleading metric keys for the threadz page .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20658 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I15a8cf0a318bc7122d1f5df29f18d8e467249ef7 Gerrit-Change-Number: 20658 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Yida Wu Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Thu, 16 Nov 2023 23:41:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12306: (Part 2) Make codegen cache tests with symbol emitter more robust
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20318 ) Change subject: IMPALA-12306: (Part 2) Make codegen cache tests with symbol emitter more robust .. Patch Set 6: Code-Review+1 (1 comment) It makes sense to me. http://gerrit.cloudera.org:8080/#/c/20318/4/be/src/codegen/codegen-symbol-emitter.cc File be/src/codegen/codegen-symbol-emitter.cc: http://gerrit.cloudera.org:8080/#/c/20318/4/be/src/codegen/codegen-symbol-emitter.cc@58 PS4, Line 58: stringstream s; > I think it's a good idea to check it and doesn't make normal execution slow Done -- To view, visit http://gerrit.cloudera.org:8080/20318 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I61b9b0de9c896f3de7eb1be7de33d822b1ab70d0 Gerrit-Change-Number: 20318 Gerrit-PatchSet: 6 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 13 Nov 2023 19:15:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12306: (Part 2) Make codegen cache tests with symbol emitter more robust
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20318 ) Change subject: IMPALA-12306: (Part 2) Make codegen cache tests with symbol emitter more robust .. Patch Set 4: Code-Review+1 (2 comments) It looks a great idea to me http://gerrit.cloudera.org:8080/#/c/20318/4/be/src/codegen/codegen-symbol-emitter.cc File be/src/codegen/codegen-symbol-emitter.cc: http://gerrit.cloudera.org:8080/#/c/20318/4/be/src/codegen/codegen-symbol-emitter.cc@58 PS4, Line 58: if (non_freed_objects_ > 0) { I am not very sure if we also need to consider the abnormal case non_freed_objects_ less than 0 in release version, because DCHECK only handles debug version. What do you think? http://gerrit.cloudera.org:8080/#/c/20318/4/be/src/codegen/codegen-symbol-emitter.cc@64 PS4, Line 64: nit. seems no need for the new line -- To view, visit http://gerrit.cloudera.org:8080/20318 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I61b9b0de9c896f3de7eb1be7de33d822b1ab70d0 Gerrit-Change-Number: 20318 Gerrit-PatchSet: 4 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 13 Nov 2023 07:14:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12535: Fix misleading metric keys for the threadz page
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20658 ) Change subject: IMPALA-12535: Fix misleading metric keys for the threadz page .. Patch Set 2: Code-Review+1 LGTM! -- To view, visit http://gerrit.cloudera.org:8080/20658 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I15a8cf0a318bc7122d1f5df29f18d8e467249ef7 Gerrit-Change-Number: 20658 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Sat, 11 Nov 2023 06:41:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12535: Fix misleading metric keys for the threadz page
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20658 ) Change subject: IMPALA-12535: Fix misleading metric keys for the threadz page .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20658 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I15a8cf0a318bc7122d1f5df29f18d8e467249ef7 Gerrit-Change-Number: 20658 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 06 Nov 2023 02:15:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12510: Floor PlanFragment.maxParallelism at 1
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20606 ) Change subject: IMPALA-12510: Floor PlanFragment.maxParallelism_ at 1 .. Patch Set 2: Code-Review+2 (2 comments) Thanks Riza for the quick fix. http://gerrit.cloudera.org:8080/#/c/20606/2/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/20606/2/fe/src/main/java/org/apache/impala/planner/ScanNode.java@89 PS2, Line 89: 1 nit. seems MIN_NUM_SCAN_THREADS is better http://gerrit.cloudera.org:8080/#/c/20606/2/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test File testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test: http://gerrit.cloudera.org:8080/#/c/20606/2/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test@1 PS2, Line 1: # select star on empty table nit. Can we add a comment: # Regression test for IMPALA-12510. -- To view, visit http://gerrit.cloudera.org:8080/20606 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibfa50abfdb9cdb994c5c3d7904b377a25f5b8b97 Gerrit-Change-Number: 20606 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Sat, 21 Oct 2023 02:48:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12510: Floor PlanFragment.maxParallelism at 1
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20606 ) Change subject: IMPALA-12510: Floor PlanFragment.maxParallelism_ at 1 .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/20606/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20606/1//COMMIT_MSG@9 PS1, Line 9: bug I assume the bug is only happening over an empty table, could you please describe a bit more how the bug looks like, what error it may throw. http://gerrit.cloudera.org:8080/#/c/20606/1//COMMIT_MSG@14 PS1, Line 14: empty nit. an empty http://gerrit.cloudera.org:8080/#/c/20606/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/20606/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1139 PS1, Line 1139: 1 How about sharing the same const mentioned in ScanNode.java? Or maybe use a separate const for this number? http://gerrit.cloudera.org:8080/#/c/20606/1/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/20606/1/fe/src/main/java/org/apache/impala/planner/ScanNode.java@89 PS1, Line 89: 1 How about we name a const for it? Like MIN_SCANNER_THREADS? http://gerrit.cloudera.org:8080/#/c/20606/1/fe/src/main/java/org/apache/impala/planner/ScanNode.java@373 PS1, Line 373: 1 May use the const mentioned above. -- To view, visit http://gerrit.cloudera.org:8080/20606 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibfa50abfdb9cdb994c5c3d7904b377a25f5b8b97 Gerrit-Change-Number: 20606 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 20 Oct 2023 20:58:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12478: Invalid length in StringValue::LeastSmallerString()
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20579 ) Change subject: IMPALA-12478: Invalid length in StringValue::LeastSmallerString() .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0888f89070a1efeb9efefd1c3ca96dfa873942fd Gerrit-Change-Number: 20579 Gerrit-PatchSet: 1 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Yida Wu Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 16 Oct 2023 16:27:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5081: Add codegen opt level query option
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20399 ) Change subject: IMPALA-5081: Add codegen_opt_level query option .. Patch Set 7: Code-Review+1 (4 comments) http://gerrit.cloudera.org:8080/#/c/20399/7/be/src/codegen/llvm-codegen-cache.h File be/src/codegen/llvm-codegen-cache.h: http://gerrit.cloudera.org:8080/#/c/20399/7/be/src/codegen/llvm-codegen-cache.h@178 PS7, Line 178: opt_level nit. do you think it good to add a DCHECK for opt_level to see if it out of TCodeGenOptLevel range? Because we will reset the cache entry from existing cache entry in the look up process, maybe some abnormal could be detected earlier here if opt value goes wrong. http://gerrit.cloudera.org:8080/#/c/20399/7/be/src/codegen/llvm-codegen-cache.h@183 PS7, Line 183: num_opt_functions nit. could you please add comments for the new fields? Would be good to tell the difference between the existing num_functions and num_instructions. http://gerrit.cloudera.org:8080/#/c/20399/7/be/src/codegen/llvm-codegen-cache.h@189 PS7, Line 189: TCodeGenOptLevel::type opt_level; nit. could you please add a simple comment for the field. http://gerrit.cloudera.org:8080/#/c/20399/7/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20399/7/be/src/codegen/llvm-codegen.cc@1454 PS7, Line 1454: break; nit. do we need the default case to handle unexpected values? -- To view, visit http://gerrit.cloudera.org:8080/20399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I371f8758b6552263e91a1fbfd9a6e1c28e1fa2bd Gerrit-Change-Number: 20399 Gerrit-PatchSet: 7 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 12 Sep 2023 17:51:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5081: Add codegen opt level query option
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20399 ) Change subject: IMPALA-5081: Add codegen_opt_level query option .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen-cache.h File be/src/codegen/llvm-codegen-cache.h: http://gerrit.cloudera.org:8080/#/c/20399/2/be/src/codegen/llvm-codegen-cache.h@166 PS2, Line 166: bool Empty() { return engine_pointer == nullptr; } > Not relevant to this change but wouldn't it be better to call Reset() here Thanks Daniel for pointing this out. It makes sense to me. I think it was because previously the struct contained shared_ptr, and somehow needs some hack to initialize it, but now it changes to a pointer. It should be okay if all the codegen caching tests can pass. -- To view, visit http://gerrit.cloudera.org:8080/20399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I371f8758b6552263e91a1fbfd9a6e1c28e1fa2bd Gerrit-Change-Number: 20399 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 25 Aug 2023 00:09:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12253: Improve the integration of codegen cache and async codegen
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20211 ) Change subject: IMPALA-12253: Improve the integration of codegen cache and async codegen .. Patch Set 2: Thanks Michael for the review. Passed exhaustive tests. -- To view, visit http://gerrit.cloudera.org:8080/20211 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5ae4b342ff8ef1c3b7ce35c927baa8b59d72908 Gerrit-Change-Number: 20211 Gerrit-PatchSet: 2 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 08 Aug 2023 19:39:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12253: Improve the integration of codegen cache and async codegen
Yida Wu has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/20211 ) Change subject: IMPALA-12253: Improve the integration of codegen cache and async codegen .. IMPALA-12253: Improve the integration of codegen cache and async codegen To improve the integration of codegen cache and async codegen, the seperate the FinalizeModule() into two parts, PrepareFinalizeModule() and FinalizeModule(). If codegen cache enabled, PrepareFinalizeModule() would generate the key of the module, so that if there is a cache hit, we could skip the FinalizeModule() no matter in async or sync. Otherwise, if the cache misses, we would keep going and call FinalizeModule() to finish the compilation of the module. This helps the codegen cache to have the advantage of async codegen for the first visit while reducing the need of creating new threads for async codegen if cache exists. Did the perf a-b test with async codegen and codegen cache enabled, didn't see any significant performance regression. Change-Id: Ic5ae4b342ff8ef1c3b7ce35c927baa8b59d72908 --- M be/src/benchmarks/expr-benchmark.cc M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/llvm-codegen-cache-test.cc M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exprs/expr-codegen-test.cc M be/src/runtime/fragment-state.cc M be/src/runtime/test-env.cc M be/src/service/fe-support.cc 10 files changed, 125 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/20211/2 -- To view, visit http://gerrit.cloudera.org:8080/20211 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic5ae4b342ff8ef1c3b7ce35c927baa8b59d72908 Gerrit-Change-Number: 20211 Gerrit-PatchSet: 2 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith
[native-toolchain-CR] IMPALA-12334: Add debug symbols to LLVM assert build
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20311 ) Change subject: IMPALA-12334: Add debug symbols to LLVM assert build .. Patch Set 2: Paste a link of related doc: https://llvm.org/docs/CMake.html#cmake-build-type -- To view, visit http://gerrit.cloudera.org:8080/20311 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9c58bfe2233dcec39fcad92fc52ce372165229d8 Gerrit-Change-Number: 20311 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 03 Aug 2023 18:05:50 + Gerrit-HasComments: No
[native-toolchain-CR] IMPALA-12334: Add debug symbols to LLVM assert build
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20311 ) Change subject: IMPALA-12334: Add debug symbols to LLVM assert build .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20311 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9c58bfe2233dcec39fcad92fc52ce372165229d8 Gerrit-Change-Number: 20311 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 03 Aug 2023 18:04:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12269: Codegen cache false negative because of function names hash
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20168 ) Change subject: IMPALA-12269: Codegen cache false negative because of function names hash .. Patch Set 6: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/20168/5/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20168/5/be/src/codegen/llvm-codegen.cc@1438 PS5, Line 1438: for (auto& entry : fns_to_jit_compile_) { > It's already checked by the DCHECKS on L1448 and L1469. Ack -- To view, visit http://gerrit.cloudera.org:8080/20168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 Gerrit-Change-Number: 20168 Gerrit-PatchSet: 6 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 03 Aug 2023 15:02:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12269: Codegen cache false negative because of function names hash
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20168 ) Change subject: IMPALA-12269: Codegen cache false negative because of function names hash .. Patch Set 5: Code-Review+1 (2 comments) It looks good. Could Csaba have one more look at the patch after changing to use map for codegen functions? http://gerrit.cloudera.org:8080/#/c/20168/5/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20168/5/be/src/codegen/llvm-codegen.cc@1202 PS5, Line 1202: nit. unnecessary newline http://gerrit.cloudera.org:8080/#/c/20168/5/be/src/codegen/llvm-codegen.cc@1438 PS5, Line 1438: // Get pointers to all codegen'd functions. Because for SetFunctionPointers() "either both or none of 'cache' and 'cache_key' should be NULL", can we add a DCHECK here for cache and cache_key to make sure they meet our requirement? -- To view, visit http://gerrit.cloudera.org:8080/20168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibf1d2b424c969fbba181ab90bf9c7bf22355f139 Gerrit-Change-Number: 20168 Gerrit-PatchSet: 5 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 01 Aug 2023 23:18:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12292: TestCodegenCache.{test codegen cache with asm module dir,test codegen cache with perf map} fail in builds
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20224 ) Change subject: IMPALA-12292: TestCodegenCache.{test_codegen_cache_with_asm_module_dir,test_codegen_cache_with_perf_map} fail in builds .. Patch Set 4: Code-Review+2 Thanks Daniel for fixing this. To enhance the robustness of the testcase, we could follow up on IMPALA-12306. -- To view, visit http://gerrit.cloudera.org:8080/20224 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I15320b8c0d06f4d93927b19731c11bd4e15b3690 Gerrit-Change-Number: 20224 Gerrit-PatchSet: 4 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 27 Jul 2023 18:20:06 + Gerrit-HasComments: No