[Impala-ASF-CR] IMPALA-12562: Cast double and float to string with exact presicion
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21441 ) Change subject: IMPALA-12562: Cast double and float to string with exact presicion .. Patch Set 5: > Patch Set 4: > > > Patch Set 4: Verified-1 > > > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10665/ > > Tests show some real failures: > https://jenkins.impala.io/job/ubuntu-20.04-from-scratch/2665/testReport/ Related tests have been updated in PS5. TestCast used boost::lexical_cast to cast double/float to string, which also loses precision, so I now compare against string values directly. -- To view, visit http://gerrit.cloudera.org:8080/21441 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd79c55dd57dc0fa13e4ec11c2284ef2800e8b1a Gerrit-Change-Number: 21441 Gerrit-PatchSet: 5 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Fri, 24 May 2024 14:59:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12562: Cast double and float to string with exact presicion
Hello Daniel Becker, Gabor Kaszab, Csaba Ringhofer, Michael Smith, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21441 to look at the new patch set (#5). Change subject: IMPALA-12562: Cast double and float to string with exact presicion .. IMPALA-12562: Cast double and float to string with exact presicion The builtin functions casttostring(DOUBLE) and casttostring(FLOAT) printed more digits than necessary when converting double and float values to strings. This patch fixes this by switching to the existing methods DoubleToBuffer and FloatToBuffer, which are simple and fast implementations that print only the necessary digits. Testing: - Add end-to-end tests to verify the fixes - Add benchmarks for modified functions - Update tests in expr-test Change-Id: Icd79c55dd57dc0fa13e4ec11c2284ef2800e8b1a --- M be/src/benchmarks/expr-benchmark.cc M be/src/exprs/cast-functions-ir.cc M be/src/exprs/expr-test.cc M testdata/workloads/functional-query/queries/QueryTest/exprs.test 4 files changed, 173 insertions(+), 71 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/21441/5 -- To view, visit http://gerrit.cloudera.org:8080/21441 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icd79c55dd57dc0fa13e4ec11c2284ef2800e8b1a Gerrit-Change-Number: 21441 Gerrit-PatchSet: 5 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yifan Zhang
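To illustrate what "print only the necessary digits" can mean for a double, here is a minimal sketch. It is not the code from this patch or from gutil/strings; the helper name ShortestDoubleString is hypothetical, and the strategy (try 15 significant digits, fall back to 17 if the text does not parse back to the same value) is an assumption about how such a conversion is commonly done.

```cpp
#include <cassert>
#include <cfloat>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper (not Impala's or gutil's actual code): format `value`
// with 15 significant digits, and fall back to 17 only when the shorter text
// does not parse back to exactly the same double.
std::string ShortestDoubleString(double value) {
  char buf[32];
  // DBL_DIG (15) digits never show representation noise, but may not
  // round-trip for every value.
  int len = std::snprintf(buf, sizeof(buf), "%.*g", DBL_DIG, value);
  assert(len > 0 && len < static_cast<int>(sizeof(buf)));
  if (std::strtod(buf, nullptr) != value) {
    // DBL_DIG + 2 (17) digits are enough to round-trip any finite double.
    len = std::snprintf(buf, sizeof(buf), "%.*g", DBL_DIG + 2, value);
    assert(len > 0 && len < static_cast<int>(sizeof(buf)));
  }
  return std::string(buf, static_cast<std::size_t>(len));
}
```

Under these assumptions, ShortestDoubleString(1.33) yields "1.33", while a value that genuinely needs all 17 digits still round-trips through strtod. A float variant would follow the same pattern with FLT_DIG.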
[Impala-ASF-CR] IMPALA-12562: Cast double and float to string with exact presicion
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21441 ) Change subject: IMPALA-12562: Cast double and float to string with exact presicion .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/21441/3/be/src/exprs/cast-functions-ir.cc File be/src/exprs/cast-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/21441/3/be/src/exprs/cast-functions-ir.cc@352 PS3, Line 352: strlen > strlen could be avoided by saving the original pointer and comparing the diff It seems that the returned pointer doesn't point to the terminating null byte, so it can't be compared with the original pointer to get the length. I also found this strlen redundant, since snprintf already returns the length of the formatted string. We may still have to use strlen if we don't want to change the implementations in gutil/strings. -- To view, visit http://gerrit.cloudera.org:8080/21441 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd79c55dd57dc0fa13e4ec11c2284ef2800e8b1a Gerrit-Change-Number: 21441 Gerrit-PatchSet: 3 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Thu, 23 May 2024 08:33:27 + Gerrit-HasComments: Yes
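The point about snprintf's return value can be sketched as follows. FormatDouble is a hypothetical helper written for this illustration only; it is not a function in gutil/strings or in cast-functions-ir.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: writes `value` into `buf` and returns the string
// length taken from snprintf's return value, so the caller needs no
// separate strlen() pass over the buffer.
std::size_t FormatDouble(double value, char* buf, std::size_t buf_size) {
  int written = std::snprintf(buf, buf_size, "%.15g", value);
  // snprintf returns the number of characters the full output needs,
  // excluding the terminating null byte; a negative value signals an error.
  assert(written >= 0 && static_cast<std::size_t>(written) < buf_size);
  return static_cast<std::size_t>(written);
}
```

A caller that only receives the start of the buffer back, as described in the comment above, cannot derive the length by pointer arithmetic, which is why strlen remains necessary unless the formatting function itself is changed to expose this return value.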
[Impala-ASF-CR] IMPALA-12562: Cast double and float to string with exact presicion
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21441 ) Change subject: IMPALA-12562: Cast double and float to string with exact presicion .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/21441/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21441/2//COMMIT_MSG@12 PS2, Line 12: le a > Can you provide some benchmarks? I expect it to be an improvement, but it w I added some benchmarks. I guess this could bring a slight performance degradation, since it does more work to print more accurate results. According to the comments in gutil/strings/numbers.cc, these implementations are about as fast as other strategies and do not require introducing a new library. http://gerrit.cloudera.org:8080/#/c/21441/2/be/src/exprs/cast-functions-ir.cc File be/src/exprs/cast-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/21441/2/be/src/exprs/cast-functions-ir.cc@344 PS2, Line 344: if (val.is_null) return StringVal::null(); \ > At this point no allocation happened yet so this check is not useful. Done http://gerrit.cloudera.org:8080/#/c/21441/2/be/src/exprs/cast-functions-ir.cc@355 PS2, Line 355: sary(ctx-> > I would prefer using DoubleToBuffer / FloatToBuffer as it avoids the extra Done -- To view, visit http://gerrit.cloudera.org:8080/21441 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd79c55dd57dc0fa13e4ec11c2284ef2800e8b1a Gerrit-Change-Number: 21441 Gerrit-PatchSet: 3 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Wed, 22 May 2024 09:56:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12562: Cast double and float to string with exact presicion
Hello Daniel Becker, Gabor Kaszab, Csaba Ringhofer, Michael Smith, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21441 to look at the new patch set (#3). Change subject: IMPALA-12562: Cast double and float to string with exact presicion .. IMPALA-12562: Cast double and float to string with exact presicion The builtin functions casttostring(DOUBLE) and casttostring(FLOAT) printed more digits than necessary when converting double and float values to strings. This patch fixes this by switching to the existing methods DoubleToBuffer and FloatToBuffer, which are simple and fast implementations that print only the necessary digits. Testing: - Add end-to-end tests to verify the fixes - Add benchmarks for modified functions Change-Id: Icd79c55dd57dc0fa13e4ec11c2284ef2800e8b1a --- M be/src/benchmarks/expr-benchmark.cc M be/src/exprs/cast-functions-ir.cc M testdata/workloads/functional-query/queries/QueryTest/exprs.test 3 files changed, 135 insertions(+), 47 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/21441/3 -- To view, visit http://gerrit.cloudera.org:8080/21441 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icd79c55dd57dc0fa13e4ec11c2284ef2800e8b1a Gerrit-Change-Number: 21441 Gerrit-PatchSet: 3 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yifan Zhang
[Impala-ASF-CR] IMPALA-12562: Cast double and float to string with exact presicion
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21441 ) Change subject: IMPALA-12562: Cast double and float to string with exact presicion .. Patch Set 2: (1 comment) > Patch Set 1: Code-Review+1 > > (1 comment) > > Thanks for fixing this! Thanks for your review! http://gerrit.cloudera.org:8080/#/c/21441/1/testdata/workloads/functional-query/queries/QueryTest/exprs.test File testdata/workloads/functional-query/queries/QueryTest/exprs.test: http://gerrit.cloudera.org:8080/#/c/21441/1/testdata/workloads/functional-query/queries/QueryTest/exprs.test@3300 PS1, Line 3300: select cast(round(cast(1.33 as double), 2) as string); > A few more test cases around the limits of large/small number of decimals w Done -- To view, visit http://gerrit.cloudera.org:8080/21441 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd79c55dd57dc0fa13e4ec11c2284ef2800e8b1a Gerrit-Change-Number: 21441 Gerrit-PatchSet: 2 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Sat, 18 May 2024 01:14:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12562: Cast double and float to string with exact presicion
Hello Daniel Becker, Gabor Kaszab, Csaba Ringhofer, Michael Smith, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21441 to look at the new patch set (#2). Change subject: IMPALA-12562: Cast double and float to string with exact presicion .. IMPALA-12562: Cast double and float to string with exact presicion The builtin functions casttostring(DOUBLE) and casttostring(FLOAT) printed more digits than necessary when converting double and float values to strings. This patch fixes this by switching to the existing methods SimpleDtoa and SimpleFtoa, which are simple and fast implementations that print only the necessary digits. Testing: - Add end-to-end tests to query_test/test_exprs.py Change-Id: Icd79c55dd57dc0fa13e4ec11c2284ef2800e8b1a --- M be/src/exprs/cast-functions-ir.cc M testdata/workloads/functional-query/queries/QueryTest/exprs.test 2 files changed, 104 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/21441/2 -- To view, visit http://gerrit.cloudera.org:8080/21441 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icd79c55dd57dc0fa13e4ec11c2284ef2800e8b1a Gerrit-Change-Number: 21441 Gerrit-PatchSet: 2 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-12562: Cast double and float to string with exact presicion
Yifan Zhang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21441 ) Change subject: IMPALA-12562: Cast double and float to string with exact presicion .. IMPALA-12562: Cast double and float to string with exact presicion The builtin functions casttostring(DOUBLE) and casttostring(FLOAT) printed more digits than necessary when converting double and float values to strings. This patch fixes this by switching to the existing methods SimpleDtoa and SimpleFtoa, which are simple and fast implementations that print only the necessary digits. Testing: - Add end-to-end tests to query_test/test_exprs.py Change-Id: Icd79c55dd57dc0fa13e4ec11c2284ef2800e8b1a --- M be/src/exprs/cast-functions-ir.cc M testdata/workloads/functional-query/queries/QueryTest/exprs.test 2 files changed, 43 insertions(+), 25 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/21441/1 -- To view, visit http://gerrit.cloudera.org:8080/21441 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Icd79c55dd57dc0fa13e4ec11c2284ef2800e8b1a Gerrit-Change-Number: 21441 Gerrit-PatchSet: 1 Gerrit-Owner: Yifan Zhang
[Impala-ASF-CR](branch-3.4.2) IMPALA-12288: Add BUILD WITH NO TESTS option to remove test targets
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21262 ) Change subject: IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/21262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: branch-3.4.2 Gerrit-MessageType: comment Gerrit-Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a Gerrit-Change-Number: 21262 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Tue, 09 Apr 2024 08:19:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12852: Make Kudu service start and stop independent
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21090 ) Change subject: IMPALA-12852: Make Kudu service start and stop independent .. Patch Set 2: > Patch Set 2: > > Please rename the new scripts to start-kudu and stop-kudu Considering that the other scripts in the folder 'testdata/bin/' used for starting and stopping cluster services are named 'run-xxx.sh' and 'kill-xxx.sh', I'd like to keep the new scripts' names consistent with them. -- To view, visit http://gerrit.cloudera.org:8080/21090 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9624aaa61353bb4520e879570e5688d5e3493201 Gerrit-Change-Number: 21090 Gerrit-PatchSet: 2 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Wed, 27 Mar 2024 09:26:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12852: Make Kudu service start and stop independent
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21090 to look at the new patch set (#2). Change subject: IMPALA-12852: Make Kudu service start and stop independent .. IMPALA-12852: Make Kudu service start and stop independent This patch decouples run-kudu.sh and kill-kudu.sh from run-mini-dfs.sh and kill-mini-dfs.sh. These scripts are useful for setting up test environments that need either no Kudu service or only the Kudu service. Testing: - Ran the modified and new scripts and checked that they worked as expected. Change-Id: I9624aaa61353bb4520e879570e5688d5e3493201 --- A testdata/bin/kill-kudu.sh M testdata/bin/run-all.sh A testdata/bin/run-kudu.sh M testdata/cluster/admin 4 files changed, 123 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/21090/2 -- To view, visit http://gerrit.cloudera.org:8080/21090 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9624aaa61353bb4520e879570e5688d5e3493201 Gerrit-Change-Number: 21090 Gerrit-PatchSet: 2 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12834: Add number of concurrent queries to profile
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21063 ) Change subject: IMPALA-12834: Add number of concurrent queries to profile .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/21063/3/be/src/scheduling/admission-controller.cc File be/src/scheduling/admission-controller.cc: http://gerrit.cloudera.org:8080/#/c/21063/3/be/src/scheduling/admission-controller.cc@189 PS3, Line 189: Number of running queries in designated executor group when ad > I think this description is somewhat misleading, its actual meaning seems t The description is updated. Collecting the query load during the execution of a query is a good idea, but it's hard to define and calculate an average value. -- To view, visit http://gerrit.cloudera.org:8080/21063 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8389215b60022b39e7d171d6fc2418acca7c0658 Gerrit-Change-Number: 21063 Gerrit-PatchSet: 4 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Thu, 29 Feb 2024 10:14:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12834: Add number of concurrent queries to profile
Hello Zihao Ye, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21063 to look at the new patch set (#4). Change subject: IMPALA-12834: Add number of concurrent queries to profile .. IMPALA-12834: Add number of concurrent queries to profile This patch adds a profile info string for the number of currently running queries in the executor group on which the query is scheduled, to help diagnose potential performance issues caused by resource limits. Testing: - Add an e2e test to verify the information appears in profile Change-Id: I8389215b60022b39e7d171d6fc2418acca7c0658 --- M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M tests/custom_cluster/test_admission_controller.py 3 files changed, 43 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/21063/4 -- To view, visit http://gerrit.cloudera.org:8080/21063 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8389215b60022b39e7d171d6fc2418acca7c0658 Gerrit-Change-Number: 21063 Gerrit-PatchSet: 4 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Zihao Ye
[Impala-ASF-CR] IMPALA-12852: Make Kudu service start and stop independent
Yifan Zhang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21090 ) Change subject: IMPALA-12852: Make Kudu service start and stop independent .. IMPALA-12852: Make Kudu service start and stop independent This patch decouples run-kudu.sh and kill-kudu.sh from run-mini-dfs.sh and kill-mini-dfs.sh. These scripts are useful for setting up test environments that need either no Kudu service or only the Kudu service. Testing: - Ran the modified and new scripts and checked that they worked as expected. Change-Id: I9624aaa61353bb4520e879570e5688d5e3493201 --- A testdata/bin/kill-kudu.sh A testdata/bin/run-kudu.sh M testdata/cluster/admin 3 files changed, 118 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/21090/1 -- To view, visit http://gerrit.cloudera.org:8080/21090 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I9624aaa61353bb4520e879570e5688d5e3493201 Gerrit-Change-Number: 21090 Gerrit-PatchSet: 1 Gerrit-Owner: Yifan Zhang
[Impala-ASF-CR] IMPALA-12834: Add number of concurrent queries to profile
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21063 ) Change subject: IMPALA-12834: Add number of concurrent queries to profile .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/21063/2/be/src/scheduling/admission-controller.h File be/src/scheduling/admission-controller.h: http://gerrit.cloudera.org:8080/#/c/21063/2/be/src/scheduling/admission-controller.h@1171 PS2, Line 1171: for the given executor group. > nit: for the given executor group Done http://gerrit.cloudera.org:8080/#/c/21063/2/be/src/scheduling/admission-controller.cc File be/src/scheduling/admission-controller.cc: http://gerrit.cloudera.org:8080/#/c/21063/2/be/src/scheduling/admission-controller.cc@189 PS2, Line 189: designate > nit: designated ? Done http://gerrit.cloudera.org:8080/#/c/21063/2/tests/custom_cluster/test_admission_controller.py File tests/custom_cluster/test_admission_controller.py: http://gerrit.cloudera.org:8080/#/c/21063/2/tests/custom_cluster/test_admission_controller.py@928 PS2, Line 928: "Admission result: Admitted im > nit: indent spaces Done -- To view, visit http://gerrit.cloudera.org:8080/21063 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8389215b60022b39e7d171d6fc2418acca7c0658 Gerrit-Change-Number: 21063 Gerrit-PatchSet: 3 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Tue, 27 Feb 2024 03:05:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12834: Add number of concurrent queries to profile
Hello Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21063 to look at the new patch set (#3). Change subject: IMPALA-12834: Add number of concurrent queries to profile .. IMPALA-12834: Add number of concurrent queries to profile This patch adds a profile info string for the number of currently running queries in the executor group on which the query is scheduled, to help diagnose potential performance issues caused by resource limits. Testing: - Add an e2e test to verify the information appears in profile Change-Id: I8389215b60022b39e7d171d6fc2418acca7c0658 --- M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M tests/custom_cluster/test_admission_controller.py 3 files changed, 43 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/21063/3 -- To view, visit http://gerrit.cloudera.org:8080/21063 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8389215b60022b39e7d171d6fc2418acca7c0658 Gerrit-Change-Number: 21063 Gerrit-PatchSet: 3 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-12834: Add number of concurrent queries to profile
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21063 to look at the new patch set (#2). Change subject: IMPALA-12834: Add number of concurrent queries to profile .. IMPALA-12834: Add number of concurrent queries to profile This patch adds a profile info string for the number of currently running queries in the executor group on which the query is scheduled, to help diagnose potential performance issues caused by resource limits. Testing: - Add an e2e test to verify the information appears in profile Change-Id: I8389215b60022b39e7d171d6fc2418acca7c0658 --- M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M tests/custom_cluster/test_admission_controller.py 3 files changed, 43 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/21063/2 -- To view, visit http://gerrit.cloudera.org:8080/21063 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8389215b60022b39e7d171d6fc2418acca7c0658 Gerrit-Change-Number: 21063 Gerrit-PatchSet: 2 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12834: Add number of concurrent queries to profile
Yifan Zhang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21063 ) Change subject: IMPALA-12834: Add number of concurrent queries to profile .. IMPALA-12834: Add number of concurrent queries to profile This patch adds a profile info string for the number of currently running queries in the executor group on which the query is scheduled, to help diagnose potential performance issues caused by resource limits. Testing: - Add an e2e test to verify the information appears in profile Change-Id: I8389215b60022b39e7d171d6fc2418acca7c0658 --- M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M tests/custom_cluster/test_admission_controller.py 3 files changed, 42 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/21063/1 -- To view, visit http://gerrit.cloudera.org:8080/21063 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I8389215b60022b39e7d171d6fc2418acca7c0658 Gerrit-Change-Number: 21063 Gerrit-PatchSet: 1 Gerrit-Owner: Yifan Zhang
[Impala-ASF-CR] IMPALA-12801: Increase query log default size and bound its memory.
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21020 ) Change subject: IMPALA-12801: Increase query_log_ default size and bound its memory. .. Patch Set 8: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/21020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I107e2c2c7f2b239557be37360e8eecf5479e8602 Gerrit-Change-Number: 21020 Gerrit-PatchSet: 8 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Tue, 20 Feb 2024 03:09:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12426: Adds the Impala built-in functions prettyprint duration and prettyprint memory.
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21038 ) Change subject: IMPALA-12426: Adds the Impala built-in functions prettyprint_duration and prettyprint_memory. .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/21038/4/docs/topics/impala_string_functions.xml File docs/topics/impala_string_functions.xml: http://gerrit.cloudera.org:8080/#/c/21038/4/docs/topics/impala_string_functions.xml@1183 PS4, Line 1183: PRETTYPRINT_MEMORY nit: Maybe we can rename it to 'prettyprint_size' or 'prettyprint_bytes'? -- To view, visit http://gerrit.cloudera.org:8080/21038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3e76632ce21ad2ca5df474160338699a542a6913 Gerrit-Change-Number: 21038 Gerrit-PatchSet: 4 Gerrit-Owner: Jason Fehr Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Sun, 18 Feb 2024 10:14:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12801: Increase query log size from 100 to 200.
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21020 ) Change subject: IMPALA-12801: Increase query_log_size from 100 to 200. .. Patch Set 6: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/21020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I107e2c2c7f2b239557be37360e8eecf5479e8602 Gerrit-Change-Number: 21020 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Sun, 18 Feb 2024 10:13:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12801: Increase query log size from 100 to 200.
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21020 ) Change subject: IMPALA-12801: Increase query_log_size from 100 to 200. .. Patch Set 4: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/21020/4/www/queries.tmpl File www/queries.tmpl: http://gerrit.cloudera.org:8080/#/c/21020/4/www/queries.tmpl@30 PS4, Line 30: The size of that archive is controlled with the : --query_log_size command line parameter. nit: This should be updated. -- To view, visit http://gerrit.cloudera.org:8080/21020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I107e2c2c7f2b239557be37360e8eecf5479e8602 Gerrit-Change-Number: 21020 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Sun, 18 Feb 2024 04:01:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20804 ) Change subject: IMPALA-12631: Improve count star performance for parquet scans .. Patch Set 15: (1 comment) http://gerrit.cloudera.org:8080/#/c/20804/15/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/20804/15/be/src/exec/parquet/hdfs-parquet-scanner.cc@444 PS15, Line 444: if (file_metadata_.num_rows > 0) { Add this for backward compatibility: See L893 in this file. -- To view, visit http://gerrit.cloudera.org:8080/20804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd Gerrit-Change-Number: 20804 Gerrit-PatchSet: 15 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Zihao Ye Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 05 Feb 2024 04:24:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Hello Riza Suminto, Zoltan Borok-Nagy, Zihao Ye, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20804 to look at the new patch set (#15). Change subject: IMPALA-12631: Improve count star performance for parquet scans .. IMPALA-12631: Improve count star performance for parquet scans Before this patch, the frontend generates multiple scan ranges for a parquet file when count star optimization is enabled. The backend function HdfsParquetScanner::GetNextInternal() also calls NextRowGroup() multiple times to find row groups and sum up RowGroup.num_rows. This could be inefficient because we only need to read file metadata to compute count star. This patch optimizes it by creating only one scan range that contains the file footer for each parquet file. The following table shows a performance comparison before and after the patch. The primitive_count_star_multiblock query is a modified primitive_count_star query that targets a multi-block tpch10_parquet.lineitem table. The files of the table are generated by the command `hdfs dfs -Ddfs.block.size=1048576 -cp -f -d`.
+---+-+---++-++++---++-++ | Workload | Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval | +---+-+---++-++++---++-++ | TPCDS(10) | TPCDS-Q_COUNT_OPTIMIZED | parquet / none / none | 0.17 | 0.16| +2.58% | * 29.53% * | * 27.16% * | 30| +1.20% | 0.58| 0.35 | | TPCDS(10) | TPCDS-Q_COUNT_UNOPTIMIZED | parquet / none / none | 0.27 | 0.26| +2.96% | 8.97%| 9.94%| 30| +0.16% | 0.44| 1.19 | | TPCDS(10) | TPCDS-Q_COUNT_ZERO_SLOT | parquet / none / none | 0.18 | 0.18| -0.69% | 1.65%| 1.99%| 30| -0.34% | -1.55 | -1.47 | | TARGETED-PERF(10) | primitive_count_star_multiblock | parquet / none / none | 0.06 | 0.12| I -49.88% | 4.11%| 3.53%| 30| I -99.97% | -6.54 | -66.81 | +---+-+---++-++++---++-++ Testing: - Ran PlannerTest#testParquetStatsAgg - Added new test cases to query_test/test_aggregation.py Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd --- M be/src/exec/parquet/hdfs-parquet-scanner.cc M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M testdata/workloads/functional-query/queries/QueryTest/hdfs-tiny-scan.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_optimized.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_unoptimized.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_zero_slot.test M tests/util/parse_util.py 11 files changed, 138 insertions(+), 63 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/15 -- To view, visit http://gerrit.cloudera.org:8080/20804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: 
Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd Gerrit-Change-Number: 20804 Gerrit-PatchSet: 15 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Zihao Ye Gerrit-Reviewer: Zoltan Borok-Nagy
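The idea in this commit message, that count star can be answered from footer metadata alone, can be illustrated with a minimal sketch. The two classes below are hypothetical stand-ins that only mirror the `num_rows` and `row_groups` fields of the Parquet footer structures; they are not Impala's classes, and `count_star_from_footer` is an illustrative name rather than a function from the patch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RowGroup:
    # Hypothetical stand-in for the per-row-group footer entry.
    num_rows: int


@dataclass
class FileMetaData:
    # Hypothetical stand-in for the file-level footer metadata.
    num_rows: int
    row_groups: List[RowGroup] = field(default_factory=list)


def count_star_from_footer(metadata: FileMetaData) -> int:
    """Return the row count using only footer metadata.

    Summing RowGroup.num_rows keeps the result identical to a scanner
    that visits each row group, without reading any column data.
    """
    return sum(rg.num_rows for rg in metadata.row_groups)


footer = FileMetaData(num_rows=6,
                      row_groups=[RowGroup(1), RowGroup(2), RowGroup(3)])
print(count_star_from_footer(footer))  # prints 6
```

Because the whole footer is available after reading a single range at the end of the file, one scan range per file is enough for this computation.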
[Impala-ASF-CR] IMPALA-12381: Set JDBC related properties in JDBC data source
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20941 ) Change subject: IMPALA-12381: Set JDBC related properties in JDBC data source .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20941 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0a0b5d9d7b06825842828c3722c2bcdb4ea8 Gerrit-Change-Number: 20941 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: gaurav singh Gerrit-Comment-Date: Sun, 04 Feb 2024 09:59:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Yifan Zhang has abandoned this change. ( http://gerrit.cloudera.org:8080/20992 ) Change subject: IMPALA-12631: Improve count star performance for parquet scans .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/20992 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: Idf1177617477d19a92c4526adac3c486ae65ccd5 Gerrit-Change-Number: 20992 Gerrit-PatchSet: 1 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Hello Riza Suminto, Zoltan Borok-Nagy, Zihao Ye, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20804 to look at the new patch set (#14).

Change subject: IMPALA-12631: Improve count star performance for parquet scans ..

IMPALA-12631: Improve count star performance for parquet scans

Before this patch, the frontend generated multiple scan ranges for a Parquet file when the count star optimization was enabled. The backend function HdfsParquetScanner::GetNextInternal() also called NextRowGroup() multiple times to find row groups and sum up RowGroup.num_rows. This was inefficient because only the file metadata needs to be read to compute count star. This patch optimizes it by creating only one scan range, containing the file footer, for each Parquet file.

The following table shows a performance comparison before and after the patch. The primitive_count_star_multiblock query is a modified primitive_count_star query that targets a multi-block tpch10_parquet.lineitem table. The files of the table were generated with the command `hdfs dfs -Ddfs.block.size=1048576 -cp -f -d`.

+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| Workload          | Query                           | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| TPCDS(10)         | TPCDS-Q_COUNT_OPTIMIZED         | parquet / none / none | 0.17   | 0.16        | +2.58%     | * 29.53% * | * 27.16% *     | 30    | +1.20%         | 0.58    | 0.35   |
| TPCDS(10)         | TPCDS-Q_COUNT_UNOPTIMIZED       | parquet / none / none | 0.27   | 0.26        | +2.96%     | 8.97%      | 9.94%          | 30    | +0.16%         | 0.44    | 1.19   |
| TPCDS(10)         | TPCDS-Q_COUNT_ZERO_SLOT         | parquet / none / none | 0.18   | 0.18        | -0.69%     | 1.65%      | 1.99%          | 30    | -0.34%         | -1.55   | -1.47  |
| TARGETED-PERF(10) | primitive_count_star_multiblock | parquet / none / none | 0.06   | 0.12        | I -49.88%  | 4.11%      | 3.53%          | 30    | I -99.97%      | -6.54   | -66.81 |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Added new test cases to query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/hdfs-tiny-scan.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_optimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_unoptimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_zero_slot.test
M tests/util/parse_util.py
11 files changed, 144 insertions(+), 71 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/14

-- To view, visit http://gerrit.cloudera.org:8080/20804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd Gerrit-Change-Number: 20804 Gerrit-PatchSet: 14 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Zihao Ye Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Yifan Zhang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20992 )

Change subject: IMPALA-12631: Improve count star performance for parquet scans ..

IMPALA-12631: Improve count star performance for parquet scans

Before this patch, the frontend generated multiple scan ranges for a Parquet file when the count star optimization was enabled. The backend function HdfsParquetScanner::GetNextInternal() also called NextRowGroup() multiple times to find row groups and sum up RowGroup.num_rows. This was inefficient because only the file metadata needs to be read to compute count star. This patch optimizes it by creating only one scan range, containing the file footer, for each Parquet file.

The following table shows a performance comparison before and after the patch. The primitive_count_star_multiblock query is a modified primitive_count_star query that targets a multi-block tpch10_parquet.lineitem table. The files of the table were generated with the command `hdfs dfs -Ddfs.block.size=1048576 -cp -f -d`.

+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| Workload          | Query                           | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| TPCDS(10)         | TPCDS-Q_COUNT_OPTIMIZED         | parquet / none / none | 0.17   | 0.16        | +2.58%     | * 29.53% * | * 27.16% *     | 30    | +1.20%         | 0.58    | 0.35   |
| TPCDS(10)         | TPCDS-Q_COUNT_UNOPTIMIZED       | parquet / none / none | 0.27   | 0.26        | +2.96%     | 8.97%      | 9.94%          | 30    | +0.16%         | 0.44    | 1.19   |
| TPCDS(10)         | TPCDS-Q_COUNT_ZERO_SLOT         | parquet / none / none | 0.18   | 0.18        | -0.69%     | 1.65%      | 1.99%          | 30    | -0.34%         | -1.55   | -1.47  |
| TARGETED-PERF(10) | primitive_count_star_multiblock | parquet / none / none | 0.06   | 0.12        | I -49.88%  | 4.11%      | 3.53%          | 30    | I -99.97%      | -6.54   | -66.81 |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Added new test cases to query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd

rm query option

Change-Id: Idf1177617477d19a92c4526adac3c486ae65ccd5
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/hdfs-tiny-scan.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_optimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_unoptimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_zero_slot.test
M tests/util/parse_util.py
11 files changed, 144 insertions(+), 71 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/20992/1

-- To view, visit http://gerrit.cloudera.org:8080/20992 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Idf1177617477d19a92c4526adac3c486ae65ccd5 Gerrit-Change-Number: 20992 Gerrit-PatchSet: 1 Gerrit-Owner: Yifan Zhang
[Impala-ASF-CR] IMPALA-12381: Set JDBC related properties in JDBC data source
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20941 )

Change subject: IMPALA-12381: Set JDBC related properties in JDBC data source ..

Patch Set 2: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20941/2/fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
File fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java:

http://gerrit.cloudera.org:8080/#/c/20941/2/fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java@324
PS2, Line 324: if (Strings.isNullOrEmpty(tblInitString)) {
             :   if (dsPropertyMap != null) {
             :     // Change keys to lower case.
             :     for (Map.Entry entry : dsPropertyMap.entrySet()) {
             :       combinedPropertyMap.put(entry.getKey().toLowerCase(), entry.getValue());
             :     }
             :   }
             : } else {
             :   // Append additional properties of DataSource to initString.
             :   try {
             :     Map tblPropertyMap =
             :         JsonUtil.convertJSonToPropertyMap(tblInitString);
             :     if (tblPropertyMap != null) {
             :       // Change keys to lower case.
             :       for (Map.Entry entry : tblPropertyMap.entrySet()) {
             :         combinedPropertyMap.put(entry.getKey().toLowerCase(), entry.getValue());
             :       }
             :     }
             :   } catch (ImpalaRuntimeException e) {
             :     // Return initString which is set in the table creation statement if it's
             :     // invalid JSON string. This could happen for non JDBC data source.
             :     return tblInitString;
             :   }
             :   if (dsPropertyMap != null) {
             :     for (Map.Entry entry : dsPropertyMap.entrySet()) {
             :       if (!combinedPropertyMap.containsKey(entry.getKey().toLowerCase())) {
             :         combinedPropertyMap.put(entry.getKey().toLowerCase(), entry.getValue());
             :       }
             :     }
             :   }
             : }
Can we reorganize this to:

  if (dsPropertyMap != null) {
    // Change keys to lower case.
    for (Map.Entry entry : dsPropertyMap.entrySet()) {
      combinedPropertyMap.put(entry.getKey().toLowerCase(), entry.getValue());
    }
  }
  if (!Strings.isNullOrEmpty(tblInitString)) {
    Map tblPropertyMap = JsonUtil.convertJSonToPropertyMap(tblInitString);
    ...
  }

That would be more readable. The data source properties are added first, and the table properties, when present, then overwrite any keys they share.
-- To view, visit http://gerrit.cloudera.org:8080/20941 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0a0b5d9d7b06825842828c3722c2bcdb4ea8 Gerrit-Change-Number: 20941 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: gaurav singh Gerrit-Comment-Date: Fri, 02 Feb 2024 11:53:39 + Gerrit-HasComments: Yes
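The reorganization suggested in this review comment can be sketched in Python as follows. This is only an illustration of the merge order (data source properties first, table properties overriding them, keys lower-cased); `combine_properties` and the property names in the usage line are hypothetical, and Python's `json` module stands in for the Java `JsonUtil` helper.

```python
import json
from typing import Dict, Optional, Union


def combine_properties(
    tbl_init_string: Optional[str],
    ds_properties: Optional[Dict[str, str]],
) -> Union[str, Dict[str, str]]:
    """Merge data source and table properties with lower-cased keys."""
    combined: Dict[str, str] = {}
    # Step 1: start from the data source properties.
    if ds_properties:
        for key, value in ds_properties.items():
            combined[key.lower()] = value
    # Step 2: table properties, if any, override data source properties.
    if tbl_init_string:
        try:
            tbl_properties = json.loads(tbl_init_string)
        except json.JSONDecodeError:
            # Not JSON (e.g. a non-JDBC data source): return it unchanged.
            return tbl_init_string
        if not isinstance(tbl_properties, dict):
            return tbl_init_string
        for key, value in tbl_properties.items():
            combined[key.lower()] = value
    return combined


# Hypothetical property names, for illustration only.
print(combine_properties('{"JDBC.URL": "from-table"}',
                         {"jdbc.url": "from-ds", "Driver": "x"}))
```

The result matches the original branch structure: a key set in the table's init string wins over the same key set on the data source.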
[Impala-ASF-CR] IMPALA-12762: Fix cmake error in package building
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20965 ) Change subject: IMPALA-12762: Fix cmake error in package building .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/20965/1/bin/jenkins/build-all-flag-combinations.sh File bin/jenkins/build-all-flag-combinations.sh: http://gerrit.cloudera.org:8080/#/c/20965/1/bin/jenkins/build-all-flag-combinations.sh@42 PS1, Line 42: notests - > Let's change this to "notests" so new changes won't break the build again. Done http://gerrit.cloudera.org:8080/#/c/20965/1/bin/jenkins/build-all-flag-combinations.sh@50 PS1, Line 50: notests - > Let's change this to "notests" as well. Done -- To view, visit http://gerrit.cloudera.org:8080/20965 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ice0cbb0671d915f997fa74217521a82be164ae57 Gerrit-Change-Number: 20965 Gerrit-PatchSet: 2 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Tue, 30 Jan 2024 08:31:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12762: Fix cmake error in package building
Hello Quanlong Huang, Zihao Ye, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20965 to look at the new patch set (#2).

Change subject: IMPALA-12762: Fix cmake error in package building ..

IMPALA-12762: Fix cmake error in package building

This patch adds handling of the 'BUILD_WITH_NO_TESTS' option in be/src/exec/json/CMakeLists.txt, so that CMake does not generate test targets when Impala is built with -package and -notests.

Testing:
- Ran './buildall.sh -noclean -notests -package' with no errors

Change-Id: Ice0cbb0671d915f997fa74217521a82be164ae57
---
M be/src/exec/json/CMakeLists.txt
M bin/jenkins/build-all-flag-combinations.sh
2 files changed, 6 insertions(+), 2 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/20965/2

-- To view, visit http://gerrit.cloudera.org:8080/20965 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ice0cbb0671d915f997fa74217521a82be164ae57 Gerrit-Change-Number: 20965 Gerrit-PatchSet: 2 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Zihao Ye
[Impala-ASF-CR] IMPALA-12762: Fix cmake error in package building
Yifan Zhang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20965 )

Change subject: IMPALA-12762: Fix cmake error in package building ..

IMPALA-12762: Fix cmake error in package building

This patch adds handling of the 'BUILD_WITH_NO_TESTS' option in be/src/exec/json/CMakeLists.txt, so that CMake does not generate test targets when Impala is built with -package and -notests.

Testing:
- Ran './buildall.sh -noclean -notests -package' with no errors

Change-Id: Ice0cbb0671d915f997fa74217521a82be164ae57
---
M be/src/exec/json/CMakeLists.txt
1 file changed, 4 insertions(+), 0 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/20965/1

-- To view, visit http://gerrit.cloudera.org:8080/20965 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ice0cbb0671d915f997fa74217521a82be164ae57 Gerrit-Change-Number: 20965 Gerrit-PatchSet: 1 Gerrit-Owner: Yifan Zhang
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20804 )

Change subject: IMPALA-12631: Improve count star performance for parquet scans ..

Patch Set 13:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/20804/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20804/12//COMMIT_MSG@16
PS12, Line 16: A new query option parquet_count_star_use_file_metadata is added for
             : forward compatibility. Its default value is true, if any inconsistency
             : between FileMetaData.num_rows and RowGroup.num_rows is found, we can
             : set it to false to get same results as old versions.
> Probably that would be a corrupt Parquet file. But if we are afraid of inco
Yeah. In PS13 I changed it to sum RowGroup.num_rows, and the single node perf test shows the same performance improvement. Since this no longer changes behavior, I think we do not need to introduce the new query option. What do you think?

http://gerrit.cloudera.org:8080/#/c/20804/12/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20804/12/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1452
PS12, Line 1452: if (isFooterOnly) {
               :   // Only generate one scan range for footer only scans.
               :   currentOffset += remainingLength - currentLength;
               :   remainingLength = currentLength;
               : }
> Why do we need to this now? We didn't do that for partition key scans.
A count star optimization scan is not a zero-slot scan: it has one slot for the num rows statistic. A partition key scan, by contrast, is a zero-slot scan. In HdfsScanner::IssueFooterRanges(), we create a footer range for every scan range unless the scan is a zero-slot scan, so the frontend has to limit the number of scan ranges itself.

http://gerrit.cloudera.org:8080/#/c/20804/12/tests/query_test/test_aggregation.py
File tests/query_test/test_aggregation.py:

http://gerrit.cloudera.org:8080/#/c/20804/12/tests/query_test/test_aggregation.py@275
PS12, Line 275:
> flake8: E501 line too long (91 > 90 characters)
Done

http://gerrit.cloudera.org:8080/#/c/20804/12/tests/query_test/test_aggregation.py@277
PS12, Line 277:
> flake8: E501 line too long (91 > 90 characters)
Done

-- To view, visit http://gerrit.cloudera.org:8080/20804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd Gerrit-Change-Number: 20804 Gerrit-PatchSet: 13 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Zihao Ye Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 22 Jan 2024 09:02:58 + Gerrit-HasComments: Yes
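The effect of the `isFooterOnly` branch quoted in this review can be sketched as a small loop. This is an assumption-laden illustration, not the code in HdfsScanNode.java: the loop structure, the function name `generate_scan_ranges`, and its parameters are hypothetical, and only the two assignments inside the `footer_only` branch are taken from the quoted snippet.

```python
from typing import List, Tuple


def generate_scan_ranges(block_offset: int, block_length: int,
                         max_scan_range_length: int,
                         footer_only: bool) -> List[Tuple[int, int]]:
    """Split one block into (offset, length) scan ranges."""
    ranges = []
    current_offset = block_offset
    remaining_length = block_length
    while remaining_length > 0:
        current_length = remaining_length
        # A positive limit caps the size of each range (0 means no limit).
        if 0 < max_scan_range_length < remaining_length:
            current_length = max_scan_range_length
        if footer_only:
            # Skip ahead so that only the final range of the block is kept.
            current_offset += remaining_length - current_length
            remaining_length = current_length
        ranges.append((current_offset, current_length))
        remaining_length -= current_length
        current_offset += current_length
    return ranges


print(generate_scan_ranges(0, 10, 4, footer_only=False))
print(generate_scan_ranges(0, 10, 4, footer_only=True))
```

With a 10-byte block and a 4-byte limit, the regular path yields three ranges, while the footer-only path yields a single range covering the last 4 bytes.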
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Hello Riza Suminto, Zoltan Borok-Nagy, Zihao Ye, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20804 to look at the new patch set (#13).

Change subject: IMPALA-12631: Improve count star performance for parquet scans ..

IMPALA-12631: Improve count star performance for parquet scans

The backend function HdfsParquetScanner::GetNextInternal() uses the data stored in the Parquet RowGroup.num_rows field to compute count star, but it still needs to find row groups and sum all RowGroup.num_rows. This patch uses the 'num_rows' field in the Parquet file metadata instead, which avoids NextRowGroup() function calls and generates and processes only one footer range per file.

A new query option, parquet_count_star_use_file_metadata, is added for forward compatibility. Its default value is true; if any inconsistency between FileMetaData.num_rows and RowGroup.num_rows is found, we can set it to false to get the same results as old versions.

The following table shows a performance comparison before and after the patch. The primitive_count_star_multiblock query is a modified primitive_count_star query that targets a multi-block tpch10_parquet.lineitem table. The files of the table were generated with the command `hdfs dfs -Ddfs.block.size=1048576 -cp -f -d`.

+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| Workload          | Query                           | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| TPCDS(10)         | TPCDS-Q_COUNT_OPTIMIZED         | parquet / none / none | 0.17   | 0.16        | +2.58%     | * 29.53% * | * 27.16% *     | 30    | +1.20%         | 0.58    | 0.35   |
| TPCDS(10)         | TPCDS-Q_COUNT_UNOPTIMIZED       | parquet / none / none | 0.27   | 0.26        | +2.96%     | 8.97%      | 9.94%          | 30    | +0.16%         | 0.44    | 1.19   |
| TPCDS(10)         | TPCDS-Q_COUNT_ZERO_SLOT         | parquet / none / none | 0.18   | 0.18        | -0.69%     | 1.65%      | 1.99%          | 30    | -0.34%         | -1.55   | -1.47  |
| TARGETED-PERF(10) | primitive_count_star_multiblock | parquet / none / none | 0.06   | 0.12        | I -49.88%  | 4.11%      | 3.53%          | 30    | I -99.97%      | -6.54   | -66.81 |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Added new test cases to query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/hdfs-tiny-scan.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_optimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_unoptimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_zero_slot.test
M tests/query_test/test_aggregation.py
M tests/util/parse_util.py
16 files changed, 177 insertions(+), 52 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/13

-- To view, visit http://gerrit.cloudera.org:8080/20804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd Gerrit-Change-Number: 20804 Gerrit-PatchSet: 13 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Zihao Ye Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20804 )

Change subject: IMPALA-12631: Improve count star performance for parquet scans ..

Patch Set 12:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/20804/11/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/20804/11/be/src/exec/parquet/hdfs-parquet-scanner.cc@445
PS11, Line 445: _file_metadata) {
> What is the reason behind this check?
Done. When max_scan_range_length is set to a small value, we may generate more than one scan range per block, and we need to handle those cases as well. The related checks have been updated in the frontend.

http://gerrit.cloudera.org:8080/#/c/20804/11/be/src/exec/parquet/hdfs-parquet-scanner.cc@447
PS11, Line 447: capacity = 1;
> Add this check before assignment:
Done

http://gerrit.cloudera.org:8080/#/c/20804/11/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20804/11/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1260
PS11, Line 1260: If parquet_count_star_use_file_metadata is enabled, we only need the
               : // 'num_ro
> Mention the difference between enabling/disabling parquet_count_star_use_fi
Done

http://gerrit.cloudera.org:8080/#/c/20804/11/tests/query_test/test_aggregation.py
File tests/query_test/test_aggregation.py:

http://gerrit.cloudera.org:8080/#/c/20804/11/tests/query_test/test_aggregation.py@271
PS11, Line 271: vector.get_value('exec_option')['batch_size'] = 1
              : self.run_test_case('QueryTest/parquet-stats-agg', vector, unique_database)
              :
              : vector.get_value('exec_option')['parquet_count_star_
> If parquet_count_star_use_file_metadata = true becomes default, I'd prefer
Done

-- To view, visit http://gerrit.cloudera.org:8080/20804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd Gerrit-Change-Number: 20804 Gerrit-PatchSet: 12 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Zihao Ye Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 18 Jan 2024 09:25:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Hello Riza Suminto, Zoltan Borok-Nagy, Zihao Ye, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20804 to look at the new patch set (#12).

Change subject: IMPALA-12631: Improve count star performance for parquet scans ..

IMPALA-12631: Improve count star performance for parquet scans

The backend function HdfsParquetScanner::GetNextInternal() uses the data stored in the Parquet RowGroup.num_rows field to compute count star, but it still needs to find row groups and sum all RowGroup.num_rows. This patch uses the 'num_rows' field in the Parquet file metadata instead, which avoids NextRowGroup() function calls and generates and processes only one footer range per file.

A new query option, parquet_count_star_use_file_metadata, is added for forward compatibility. Its default value is true; if any inconsistency between FileMetaData.num_rows and RowGroup.num_rows is found, we can set it to false to get the same results as old versions.

The following table shows a performance comparison before and after the patch. The primitive_count_star_multiblock query is a modified primitive_count_star query that targets a multi-block tpch10_parquet.lineitem table. The files of the table were generated with the command `hdfs dfs -Ddfs.block.size=1048576 -cp -f -d`.

+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| Workload          | Query                           | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| TPCDS(10)         | TPCDS-Q_COUNT_OPTIMIZED         | parquet / none / none | 0.17   | 0.16        | +2.58%     | * 29.53% * | * 27.16% *     | 30    | +1.20%         | 0.58    | 0.35   |
| TPCDS(10)         | TPCDS-Q_COUNT_UNOPTIMIZED       | parquet / none / none | 0.27   | 0.26        | +2.96%     | 8.97%      | 9.94%          | 30    | +0.16%         | 0.44    | 1.19   |
| TPCDS(10)         | TPCDS-Q_COUNT_ZERO_SLOT         | parquet / none / none | 0.18   | 0.18        | -0.69%     | 1.65%      | 1.99%          | 30    | -0.34%         | -1.55   | -1.47  |
| TARGETED-PERF(10) | primitive_count_star_multiblock | parquet / none / none | 0.06   | 0.12        | I -49.88%  | 4.11%      | 3.53%          | 30    | I -99.97%      | -6.54   | -66.81 |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Added new test cases to query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/hdfs-tiny-scan.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_optimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_unoptimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_zero_slot.test
M tests/query_test/test_aggregation.py
M tests/util/parse_util.py
16 files changed, 172 insertions(+), 52 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/12

-- To view, visit http://gerrit.cloudera.org:8080/20804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd Gerrit-Change-Number: 20804 Gerrit-PatchSet: 12 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Zihao Ye Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12054: Lazily check Kudu flags in tests
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20904 ) Change subject: IMPALA-12054: Lazily check Kudu flags in tests .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20904 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic7a8282b59d72322085c21c70a5019c51b586a52 Gerrit-Change-Number: 20904 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Tue, 16 Jan 2024 03:17:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12642: Support query options for Impala external JDBC table
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20837 ) Change subject: IMPALA-12642: Support query options for Impala external JDBC table .. Patch Set 7: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20837 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I47687b7a93e90cea8ebd5f3fc280c9135bd97992 Gerrit-Change-Number: 20837 Gerrit-PatchSet: 7 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Mon, 15 Jan 2024 12:36:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Hello Riza Suminto, Zoltan Borok-Nagy, Zihao Ye, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20804 to look at the new patch set (#11).

Change subject: IMPALA-12631: Improve count star performance for parquet scans ..

IMPALA-12631: Improve count star performance for parquet scans

The backend function HdfsParquetScanner::GetNextInternal() uses the data stored in the Parquet RowGroup.num_rows field to compute count star, but it still needs to find row groups and sum all RowGroup.num_rows. This patch uses the 'num_rows' field in the Parquet file metadata instead, which avoids NextRowGroup() function calls and generates and processes only one footer range per file.

A new query option, parquet_count_star_use_file_metadata, is added for forward compatibility. Its default value is true; if any inconsistency between FileMetaData.num_rows and RowGroup.num_rows is found, we can set it to false to get the same results as old versions.

The following table shows a performance comparison before and after the patch. The primitive_count_star_multiblock query is a modified primitive_count_star query that targets a multi-block tpch10_parquet.lineitem table. The files of the table were generated with the command `hdfs dfs -Ddfs.block.size=1048576 -cp -f -d`.

+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| Workload          | Query                           | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| TPCDS(10)         | TPCDS-Q_COUNT_OPTIMIZED         | parquet / none / none | 0.17   | 0.16        | +2.58%     | * 29.53% * | * 27.16% *     | 30    | +1.20%         | 0.58    | 0.35   |
| TPCDS(10)         | TPCDS-Q_COUNT_UNOPTIMIZED       | parquet / none / none | 0.27   | 0.26        | +2.96%     | 8.97%      | 9.94%          | 30    | +0.16%         | 0.44    | 1.19   |
| TPCDS(10)         | TPCDS-Q_COUNT_ZERO_SLOT         | parquet / none / none | 0.18   | 0.18        | -0.69%     | 1.65%      | 1.99%          | 30    | -0.34%         | -1.55   | -1.47  |
| TARGETED-PERF(10) | primitive_count_star_multiblock | parquet / none / none | 0.06   | 0.12        | I -49.88%  | 4.11%      | 3.53%          | 30    | I -99.97%      | -6.54   | -66.81 |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Added new test cases to query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test
A testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg-default.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_optimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_unoptimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_zero_slot.test
M tests/query_test/test_aggregation.py
13 files changed, 331 insertions(+), 36 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/11

-- To view, visit http://gerrit.cloudera.org:8080/20804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd Gerrit-Change-Number: 20804 Gerrit-PatchSet: 11 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Zihao Ye Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12642: Support query options for Impala external JDBC table
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20837 ) Change subject: IMPALA-12642: Support query options for Impala external JDBC table .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/20837/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20837/1//COMMIT_MSG@11 PS1, Line 11: comma-delimited key=value string, > A few query options have string type of value like request_pool and client_ I found that the query option 'ENABLED_RUNTIME_FILTER_TYPES' has a set type, so it can be set to multiple values separated by commas. Also, can the string value in 'jdbc.options' be used to pass settings for MySQL or other JDBC tables, not only external Impala tables? I'm not sure whether some of those settings may contain commas. -- To view, visit http://gerrit.cloudera.org:8080/20837 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I47687b7a93e90cea8ebd5f3fc280c9135bd97992 Gerrit-Change-Number: 20837 Gerrit-PatchSet: 6 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Fri, 12 Jan 2024 13:23:10 + Gerrit-HasComments: Yes
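The concern in this comment about a comma-delimited key=value string can be illustrated with a small sketch. It is not the parser used by the change under review: both functions are hypothetical helpers, the option values are made-up examples, and the quoting convention is an assumption rather than something the change is known to support. A naive split on commas breaks a set-valued option such as ENABLED_RUNTIME_FILTER_TYPES, while a quote-aware split keeps it intact.

```python
import csv


def parse_options_naive(text):
    """Split on every comma, then on the first '=' of each piece."""
    options = {}
    for piece in text.split(","):
        key, sep, value = piece.partition("=")
        # A piece without '=' is a stray fragment of a previous value.
        options[key.strip()] = value.strip() if sep else None
    return options


def parse_options_quoted(text):
    """Split on commas, but treat a double-quoted pair as a single field."""
    # csv.reader honors quotes only when they open a field, so the whole
    # key=value pair is quoted in this sketch: "key=a,b" (assumed convention).
    (fields,) = csv.reader([text], skipinitialspace=True)
    options = {}
    for field in fields:
        key, _, value = field.partition("=")
        options[key.strip()] = value.strip()
    return options


raw = "mem_limit=1g,enabled_runtime_filter_types=BLOOM,MIN_MAX"
print(parse_options_naive(raw))  # MIN_MAX ends up as a separate, valueless key

quoted = 'mem_limit=1g,"enabled_runtime_filter_types=BLOOM,MIN_MAX"'
print(parse_options_quoted(quoted))  # the set-valued option stays whole
```

Any delimiter-based format needs some escape or quoting rule once values may contain the delimiter; which rule fits 'jdbc.options' is left open here.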
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20804 ) Change subject: IMPALA-12631: Improve count star performance for parquet scans .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/20804/8/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/20804/8/be/src/exec/parquet/hdfs-parquet-scanner.cc@473 PS8, Line 473: > The following query return different output with and without PARQUET_COUNT_ It turned out that every 'dst_row' points to the same 'dst_tuple', so we can't reuse the same 'tuple_buf' to hold different values. I have updated this and added new test cases. -- To view, visit http://gerrit.cloudera.org:8080/20804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd Gerrit-Change-Number: 20804 Gerrit-PatchSet: 10 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Zihao Ye Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 11 Jan 2024 13:09:16 + Gerrit-HasComments: Yes
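The aliasing problem described in this reply can be sketched in a few lines. This is a simplified Python analogy, not the scanner code: `Row` is a hypothetical stand-in for a row that only holds a reference to its tuple, and the counts are invented. When every row refers to the same tuple, the last write wins; giving each row its own tuple preserves each value.

```python
class Row:
    """Hypothetical stand-in: a row only references its tuple's storage."""

    def __init__(self, tuple_storage):
        self.tuple_storage = tuple_storage

    def count(self):
        return self.tuple_storage[0]


def fill_shared(counts):
    """Buggy pattern: all rows alias one tuple, so earlier values are lost."""
    shared_tuple = [0]
    rows = []
    for c in counts:
        shared_tuple[0] = c
        rows.append(Row(shared_tuple))
    return [r.count() for r in rows]


def fill_separate(counts):
    """Fixed pattern: each row gets its own tuple storage."""
    rows = []
    for c in counts:
        rows.append(Row([c]))
    return [r.count() for r in rows]


file_counts = [10, 20, 30]
print(fill_shared(file_counts))    # every row reports the last value written
print(fill_separate(file_counts))  # each row keeps its own value
```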
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Hello Riza Suminto, Zoltan Borok-Nagy, Zihao Ye, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20804

to look at the new patch set (#10).

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..

IMPALA-12631: Improve count star performance for parquet scans

The backend function HdfsParquetScanner::GetNextInternal() uses the Parquet RowGroup.num_rows field to compute count star, so it still has to find every row group and sum all RowGroup.num_rows values. This patch uses the 'num_rows' field in the Parquet file metadata instead, which avoids the NextRowGroup() calls and means only one footer range per file is generated and processed.

A new query option, parquet_count_star_use_file_metadata, is added for forward compatibility. Its default value is true. If any inconsistency between FileMetaData.num_rows and RowGroup.num_rows is found, it can be set to false to get the same results as older versions.

The following table shows a performance comparison before and after the patch. The primitive_count_star_multiblock query is a modified primitive_count_star query that targets a multi-block tpch10_parquet.lineitem table. The files of the table are generated by the command `hdfs dfs -Ddfs.block.size=1048576 -cp -f -d`.
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| Workload          | Query                           | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| TPCDS(10)         | TPCDS-Q_COUNT_OPTIMIZED         | parquet / none / none | 0.17   | 0.16        | +2.58%     | * 29.53% * | * 27.16% *     | 30    | +1.20%         | 0.58    | 0.35   |
| TPCDS(10)         | TPCDS-Q_COUNT_UNOPTIMIZED       | parquet / none / none | 0.27   | 0.26        | +2.96%     | 8.97%      | 9.94%          | 30    | +0.16%         | 0.44    | 1.19   |
| TPCDS(10)         | TPCDS-Q_COUNT_ZERO_SLOT         | parquet / none / none | 0.18   | 0.18        | -0.69%     | 1.65%      | 1.99%          | 30    | -0.34%         | -1.55   | -1.47  |
| TARGETED-PERF(10) | primitive_count_star_multiblock | parquet / none / none | 0.06   | 0.12        | I -49.88%  | 4.11%      | 3.53%          | 30    | I -99.97%      | -6.54   | -66.81 |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Added new test cases to query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg-default.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_optimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_unoptimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_zero_slot.test
M tests/query_test/test_aggregation.py
12 files changed, 327 insertions(+), 32 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/10
--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804 Gerrit-PatchSet: 10 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Zihao Ye Gerrit-Reviewer: Zoltan Borok-Nagy
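The two counting strategies contrasted in the commit message can be sketched as follows. This is a simplified Python model, not Impala's scanner: `RowGroup`, `FileMetaData`, and both functions are hypothetical stand-ins that only mirror the `num_rows` fields named in the message, and the row counts are invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RowGroup:
    num_rows: int


@dataclass
class FileMetaData:
    num_rows: int
    row_groups: List[RowGroup] = field(default_factory=list)


def count_star_by_row_groups(files):
    """Old approach in this model: visit every row group and sum its count."""
    total = 0
    visited = 0
    for meta in files:
        for group in meta.row_groups:
            total += group.num_rows
            visited += 1
    return total, visited


def count_star_by_file_metadata(files):
    """New approach in this model: read one file-level count per footer."""
    total = sum(meta.num_rows for meta in files)
    return total, len(files)


files = [
    FileMetaData(num_rows=300, row_groups=[RowGroup(100), RowGroup(200)]),
    FileMetaData(num_rows=50, row_groups=[RowGroup(50)]),
]
print(count_star_by_row_groups(files))     # (350, 3)
print(count_star_by_file_metadata(files))  # (350, 2)
```

In this model the saving grows with the number of row groups per file. When every file holds a single row group, both functions perform the same number of reads, which is consistent with the remark elsewhere in this thread that 'NumFileMetadataRead' equals 'NumRowGroups' for the TPCDS count queries.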
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20804 ) Change subject: IMPALA-12631: Improve count star performance for parquet scans .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/20804/8/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/20804/8/be/src/exec/parquet/hdfs-parquet-scanner.cc@473 PS8, Line 473: dst_tuple > I still think there's an issue here. Only tuple_buf is updated in the loop, There is already a test case covering this: see 'QueryTest/parquet-stats-agg', where we query multiblock tables including 'functional_parquet.lineitem_multiblock' and 'functional_parquet.lineitem_sixblocks'. I tested it with this patch and got the correct results. I didn't quite understand what you meant, but 'So, in each iteration, data is written to the first Tuple' is not true. We create a new 'dst_row' in the loop (see L470). -- To view, visit http://gerrit.cloudera.org:8080/20804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd Gerrit-Change-Number: 20804 Gerrit-PatchSet: 9 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Fri, 05 Jan 2024 07:07:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20804 ) Change subject: IMPALA-12631: Improve count star performance for parquet scans .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/20804/8/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/20804/8/be/src/exec/parquet/hdfs-parquet-scanner.cc@473 PS8, Line 473: dst_tuple > Is it okay if we don't update dst_tuple inside the loop? If there are multi Yes, it is. Since 'dst_slot' points to the slot of the 'dst_tuple', we only need to update 'dst_slot' in the loop. http://gerrit.cloudera.org:8080/#/c/20804/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/20804/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1274 PS8, Line 1274: isFooterOnly > nit: This parameter seems unnecessary. Can't we do the same check (countSta This addresses the review feedback in PS7: https://gerrit.cloudera.org/c/20804/7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java#1397. I think wrapping it in a boolean variable makes the code in transformBlocksToScanRanges() more readable. -- To view, visit http://gerrit.cloudera.org:8080/20804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd Gerrit-Change-Number: 20804 Gerrit-PatchSet: 8 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Fri, 05 Jan 2024 05:09:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Hello Riza Suminto, Zihao Ye, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20804

to look at the new patch set (#9).

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..

IMPALA-12631: Improve count star performance for parquet scans

The backend function HdfsParquetScanner::GetNextInternal() uses the Parquet RowGroup.num_rows field to compute count star, so it still has to find every row group and sum all RowGroup.num_rows values. This patch uses the 'num_rows' field in the Parquet file metadata instead, which avoids the NextRowGroup() calls and means only one footer range per file is generated and processed.

A new query option, parquet_count_star_use_file_metadata, is added for forward compatibility. Its default value is true. If any inconsistency between FileMetaData.num_rows and RowGroup.num_rows is found, it can be set to false to get the same results as older versions.

The following table shows a performance comparison before and after the patch. The primitive_count_star_multiblock query is a modified primitive_count_star query that targets a multi-block tpch10_parquet.lineitem table. The files of the table are generated by the command `hdfs dfs -Ddfs.block.size=1048576 -cp -f -d`.
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| Workload          | Query                           | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| TPCDS(10)         | TPCDS-Q_COUNT_OPTIMIZED         | parquet / none / none | 0.17   | 0.16        | +2.58%     | * 29.53% * | * 27.16% *     | 30    | +1.20%         | 0.58    | 0.35   |
| TPCDS(10)         | TPCDS-Q_COUNT_UNOPTIMIZED       | parquet / none / none | 0.27   | 0.26        | +2.96%     | 8.97%      | 9.94%          | 30    | +0.16%         | 0.44    | 1.19   |
| TPCDS(10)         | TPCDS-Q_COUNT_ZERO_SLOT         | parquet / none / none | 0.18   | 0.18        | -0.69%     | 1.65%      | 1.99%          | 30    | -0.34%         | -1.55   | -1.47  |
| TARGETED-PERF(10) | primitive_count_star_multiblock | parquet / none / none | 0.06   | 0.12        | I -49.88%  | 4.11%      | 3.53%          | 30    | I -99.97%      | -6.54   | -66.81 |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Added new test cases to query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg-default.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_optimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_unoptimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_zero_slot.test
M tests/query_test/test_aggregation.py
12 files changed, 316 insertions(+), 33 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/9
--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804 Gerrit-PatchSet: 9 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Zihao Ye
[Impala-ASF-CR] IMPALA-12642: Support query options for Impala external JDBC table
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20837 ) Change subject: IMPALA-12642: Support query options for Impala external JDBC table .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20837 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I47687b7a93e90cea8ebd5f3fc280c9135bd97992 Gerrit-Change-Number: 20837 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Fri, 05 Jan 2024 02:47:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12642: Support query options for Impala external JDBC table
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20837 ) Change subject: IMPALA-12642: Support query options for Impala external JDBC table .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20837 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I47687b7a93e90cea8ebd5f3fc280c9135bd97992 Gerrit-Change-Number: 20837 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Wed, 03 Jan 2024 02:22:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12642: Support query options for Impala external JDBC table
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20837 ) Change subject: IMPALA-12642: Support query options for Impala external JDBC table .. Patch Set 1: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/20837/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20837/1//COMMIT_MSG@11 PS1, Line 11: comma-delimited key=value string, nit: Just curious whether there is a case where the value of an option contains commas. -- To view, visit http://gerrit.cloudera.org:8080/20837 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I47687b7a93e90cea8ebd5f3fc280c9135bd97992 Gerrit-Change-Number: 20837 Gerrit-PatchSet: 1 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Tue, 02 Jan 2024 09:26:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12665: Update ScratchMicroBatch length to new scratch_batch_->capacity after ScratchTupleBatch::Reset
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20832 ) Change subject: IMPALA-12665: Update ScratchMicroBatch length to new scratch_batch_->capacity after ScratchTupleBatch::Reset .. Patch Set 2: (2 comments) I think tests are needed to verify that the problem is fixed, and we should also check whether the ORC scanner has the same problem. http://gerrit.cloudera.org:8080/#/c/20832/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20832/2//COMMIT_MSG@7 PS2, Line 7: IMPALA-12665: Update ScratchMicroBatch length to new scratch_batch_->capacity after ScratchTupleBatch::Reset You need to add more information to the commit message, including what the problem was and how it was fixed. http://gerrit.cloudera.org:8080/#/c/20832/2/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/20832/2/be/src/exec/parquet/hdfs-parquet-scanner.cc@2504 PS2, Line 2504: // Update length to new scratch_batch_->capacity after ScratchTupleBatch::Reset Can we update ScratchMicroBatch just after calling ScratchTupleBatch::Reset()? -- To view, visit http://gerrit.cloudera.org:8080/20832 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I75e19c4e7a566441510a1172ea01537046c5c885 Gerrit-Change-Number: 20832 Gerrit-PatchSet: 2 Gerrit-Owner: Zinway Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Tue, 26 Dec 2023 12:28:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20804 ) Change subject: IMPALA-12631: Improve count star performance for parquet scans .. Patch Set 8: (4 comments) http://gerrit.cloudera.org:8080/#/c/20804/7/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/20804/7/be/src/exec/parquet/hdfs-parquet-scanner.cc@480 PS7, Line 480: // There are no materialized slots and we are not optimizing coun > Is it possible to unify the code here? Done http://gerrit.cloudera.org:8080/#/c/20804/7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/20804/7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@426 PS7, Line 426: // compute scan range locations with optional sampling : computeScanRangeLocations(analyzer); : : if (hasParquet(fileFormats_)) { : // Compute min-max conjuncts only if the PARQUET_READ_STATISTICS query option is : // set to true. : if (analyzer.getQueryOptions().parquet_read_statistics) { : computeStatsTupleAndConjuncts(analyzer); : } : // Compute dictionary conjuncts only if the PARQUET_DICTIONARY_FILTERING query : > Is it OK to move these into computeScanRangeLocations? Done http://gerrit.cloudera.org:8080/#/c/20804/7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1396 PS7, Line 1396: :* Given a fileDesc of partition, transforms the blocks into TScanRanges. Eac > Contain this into a boolean variable. 
Possibly at caller of transformBlocks Done http://gerrit.cloudera.org:8080/#/c/20804/7/testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test File testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test: http://gerrit.cloudera.org:8080/#/c/20804/7/testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test@10 PS7, Line 10: RUNTIME_PROFILE : aggregation(SUM, NumRowGroups): 24 : aggregation(SUM, NumFileMetadataRead): 24 : aggregation(SUM, RowsRead): 0 > For modified tests here, can you also add the same test with parquet_count_ Done -- To view, visit http://gerrit.cloudera.org:8080/20804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd Gerrit-Change-Number: 20804 Gerrit-PatchSet: 8 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Mon, 25 Dec 2023 11:28:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Hello Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20804

to look at the new patch set (#8).

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..

IMPALA-12631: Improve count star performance for parquet scans

The backend function HdfsParquetScanner::GetNextInternal() uses the Parquet RowGroup.num_rows field to compute count star, so it still has to find every row group and sum all RowGroup.num_rows values. This patch uses the 'num_rows' field in the Parquet file metadata instead, which avoids the NextRowGroup() calls and means only one footer range per file is generated and processed.

A new query option, parquet_count_star_use_file_metadata, is added for forward compatibility. Its default value is true. If any inconsistency between FileMetaData.num_rows and RowGroup.num_rows is found, it can be set to false to get the same results as older versions.

The following table shows a performance comparison before and after the patch. The primitive_count_star_multiblock query is a modified primitive_count_star query that targets a multi-block tpch10_parquet.lineitem table. The files of the table are generated by the command `hdfs dfs -Ddfs.block.size=1048576 -cp -f -d`.
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| Workload          | Query                           | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| TPCDS(10)         | TPCDS-Q_COUNT_OPTIMIZED         | parquet / none / none | 0.17   | 0.16        | +2.58%     | * 29.53% * | * 27.16% *     | 30    | +1.20%         | 0.58    | 0.35   |
| TPCDS(10)         | TPCDS-Q_COUNT_UNOPTIMIZED       | parquet / none / none | 0.27   | 0.26        | +2.96%     | 8.97%      | 9.94%          | 30    | +0.16%         | 0.44    | 1.19   |
| TPCDS(10)         | TPCDS-Q_COUNT_ZERO_SLOT         | parquet / none / none | 0.18   | 0.18        | -0.69%     | 1.65%      | 1.99%          | 30    | -0.34%         | -1.55   | -1.47  |
| TARGETED-PERF(10) | primitive_count_star_multiblock | parquet / none / none | 0.06   | 0.12        | I -49.88%  | 4.11%      | 3.53%          | 30    | I -99.97%      | -6.54   | -66.81 |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Added new test cases to query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg-default.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_optimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_unoptimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_zero_slot.test
M tests/query_test/test_aggregation.py
12 files changed, 316 insertions(+), 33 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/8
--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
Gerrit-Change-Number: 20804 Gerrit-PatchSet: 8 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yifan Zhang
[Impala-ASF-CR] IMPALA-12581: Fix issue of ILIKE and IREGEXP not working correctly with non-const pattern
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20785 ) Change subject: IMPALA-12581: Fix issue of ILIKE and IREGEXP not working correctly with non-const pattern .. Patch Set 4: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20785 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d66680f5a7660e6a41859754c4230f276e66712 Gerrit-Change-Number: 20785 Gerrit-PatchSet: 4 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Peter Rozsa Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Fri, 22 Dec 2023 08:43:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20804 ) Change subject: IMPALA-12631: Improve count star performance for parquet scans .. Patch Set 7: (4 comments) http://gerrit.cloudera.org:8080/#/c/20804/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20804/5//COMMIT_MSG@30 PS5, Line 30: TPCDS(10) | TPCDS-Q_COUNT_OPTIMIZED | parquet / none / none | 0.17 | 0.16 > Any idea why this benchmark does not improve? I thought this patch should i I checked profiles without this patch and found that 'NumFileMetadataRead' is the same as 'NumRowGroups'. In this case, the calculation cost is not saved. http://gerrit.cloudera.org:8080/#/c/20804/5/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/20804/5/be/src/exec/parquet/hdfs-parquet-scanner.cc@464 PS5, Line 464: int capacity = min( : static_cast(file_metadata_.row_groups.size()), row_batch->capacity()); : RETURN_IF_ERROR(RowBatch::ResizeAndAllocateTupleBuffer(state_, : row_batch->tuple_data_pool(), row_batch->row_desc()->GetRowSize(), : , _buf_size, _buf)); > Unnecessary changes? Done http://gerrit.cloudera.org:8080/#/c/20804/5/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/20804/5/be/src/service/query-options.cc@1198 PS5, Line 1198: case TImpalaQueryOptions::PARQUET_COUNT_STAR_USE_FILE_METADATA: { : query_options->__set_parquet_count_star_use_file_metadata(IsTrue(value)); : break; : } > Option parsing does not look right to me. 
Fixed http://gerrit.cloudera.org:8080/#/c/20804/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/20804/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@a411 PS5, Line 411: > I think canApplyCountStarOptimization should not change, especially this ch Done -- To view, visit http://gerrit.cloudera.org:8080/20804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd Gerrit-Change-Number: 20804 Gerrit-PatchSet: 7 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Fri, 22 Dec 2023 06:12:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Hello Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20804

to look at the new patch set (#7).

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..

IMPALA-12631: Improve count star performance for parquet scans

The backend function HdfsParquetScanner::GetNextInternal() uses the Parquet RowGroup.num_rows field to compute count star, so it still has to find every row group and sum all RowGroup.num_rows values. This patch uses the 'num_rows' field in the Parquet file metadata instead, which avoids the NextRowGroup() calls and means only one footer range per file is generated and processed.

A new query option, parquet_count_star_use_file_metadata, is added for forward compatibility. Its default value is true. If any inconsistency between FileMetaData.num_rows and RowGroup.num_rows is found, it can be set to false to get the same results as older versions.

The following table shows a performance comparison before and after the patch. The primitive_count_star_multiblock query is a modified primitive_count_star query that targets a multi-block tpch10_parquet.lineitem table. The files of the table are generated by the command `hdfs dfs -Ddfs.block.size=1048576 -cp -f -d`.
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| Workload          | Query                           | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| TPCDS(10)         | TPCDS-Q_COUNT_OPTIMIZED         | parquet / none / none | 0.17   | 0.16        | +2.58%     | * 29.53% * | * 27.16% *     | 30    | +1.20%         | 0.58    | 0.35   |
| TPCDS(10)         | TPCDS-Q_COUNT_UNOPTIMIZED       | parquet / none / none | 0.27   | 0.26        | +2.96%     | 8.97%      | 9.94%          | 30    | +0.16%         | 0.44    | 1.19   |
| TPCDS(10)         | TPCDS-Q_COUNT_ZERO_SLOT         | parquet / none / none | 0.18   | 0.18        | -0.69%     | 1.65%      | 1.99%          | 30    | -0.34%         | -1.55   | -1.47  |
| TARGETED-PERF(10) | primitive_count_star_multiblock | parquet / none / none | 0.06   | 0.12        | I -49.88%  | 4.11%      | 3.53%          | 30    | I -99.97%      | -6.54   | -66.81 |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Ran query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_optimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_unoptimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_zero_slot.test
10 files changed, 119 insertions(+), 14 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/7
--
To view, visit http://gerrit.cloudera.org:8080/20804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd Gerrit-Change-Number: 20804 Gerrit-PatchSet: 7 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
Gerrit-Reviewer: Yifan Zhang
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Hello Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20804

to look at the new patch set (#6).

Change subject: IMPALA-12631: Improve count star performance for parquet scans
..

IMPALA-12631: Improve count star performance for parquet scans

The backend function HdfsParquetScanner::GetNextInternal() uses the Parquet RowGroup.num_rows field to compute count star, so it still has to find every row group and sum all RowGroup.num_rows values. This patch uses the 'num_rows' field in the Parquet file metadata instead, which avoids the NextRowGroup() calls and means only one footer range per file is generated and processed.

A new query option, parquet_count_star_use_file_metadata, is added for forward compatibility. Its default value is true. If any inconsistency between FileMetaData.num_rows and RowGroup.num_rows is found, it can be set to false to get the same results as older versions.

The following table shows a performance comparison before and after the patch. The primitive_count_star_multiblock query is a modified primitive_count_star query that targets a multi-block tpch10_parquet.lineitem table. The files of the table are generated by the command `hdfs dfs -Ddfs.block.size=1048576 -cp -f -d`.
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| Workload          | Query                           | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| TPCDS(10)         | TPCDS-Q_COUNT_OPTIMIZED         | parquet / none / none | 0.17   | 0.16        | +2.58%     | * 29.53% * | * 27.16% *     | 30    | +1.20%         | 0.58    | 0.35   |
| TPCDS(10)         | TPCDS-Q_COUNT_UNOPTIMIZED       | parquet / none / none | 0.27   | 0.26        | +2.96%     | 8.97%      | 9.94%          | 30    | +0.16%         | 0.44    | 1.19   |
| TPCDS(10)         | TPCDS-Q_COUNT_ZERO_SLOT         | parquet / none / none | 0.18   | 0.18        | -0.69%     | 1.65%      | 1.99%          | 30    | -0.34%         | -1.55   | -1.47  |
| TARGETED-PERF(10) | primitive_count_star_multiblock | parquet / none / none | 0.06   | 0.12        | I -49.88%  | 4.11%      | 3.53%          | 30    | I -99.97%      | -6.54   | -66.81 |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Ran query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_optimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_unoptimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_zero_slot.test
10 files changed, 119 insertions(+), 14 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/6
-- To view, visit http://gerrit.cloudera.org:8080/20804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd Gerrit-Change-Number: 20804 Gerrit-PatchSet: 6 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
Gerrit-Reviewer: Yifan Zhang
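
The difference between the two counting strategies described in the commit message above can be sketched in a few lines of Python. This is only an illustrative model, not Impala's scanner code: the `FileMetaData` and `RowGroup` classes below are toy stand-ins that borrow the Parquet field names mentioned in the message, and the `use_file_metadata` parameter is a hypothetical analogue of the parquet_count_star_use_file_metadata query option.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroup:
    """Toy stand-in for a Parquet row group; only the counter matters here."""
    num_rows: int


@dataclass
class FileMetaData:
    """Toy stand-in for a Parquet file footer."""
    num_rows: int
    row_groups: List[RowGroup]


def count_star(footer: FileMetaData, use_file_metadata: bool = True) -> int:
    """Return the row count of one file using only footer information."""
    if use_file_metadata:
        # New behaviour: read the file-level counter, no row-group walk.
        return footer.num_rows
    # Old behaviour: visit every row group and add up its counter.
    return sum(rg.num_rows for rg in footer.row_groups)


def is_consistent(footer: FileMetaData) -> bool:
    """True when the file-level counter matches the sum over row groups."""
    return footer.num_rows == sum(rg.num_rows for rg in footer.row_groups)
```

For a well-formed file both paths return the same number; they differ only when the file-level counter was written inconsistently, which is the case the fallback setting is meant to cover.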
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Hello Riza Suminto, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20804 to look at the new patch set (#5). Change subject: IMPALA-12631: Improve count star performance for parquet scans .. IMPALA-12631: Improve count star performance for parquet scans Backend function HdfsParquetScanner::GetNextInternal() uses the data stored in the Parquet RowGroup.num_rows field to compute count star, so it still needs to find every row group and sum all RowGroup.num_rows values. This patch uses the 'num_rows' field in the Parquet file metadata instead, which avoids NextRowGroup() function calls, so only one footer range per file is generated and processed. A new query option, parquet_count_star_use_file_metadata, is added for forward compatibility. Its default value is true; if any inconsistency between FileMetaData.num_rows and RowGroup.num_rows is found, it can be set to false to get the same results as older versions. The following table shows a performance comparison before and after the patch. The primitive_count_star_multiblock query is a modified primitive_count_star query that targets a multi-block tpch10_parquet.lineitem table. The files of the table are generated by the command `hdfs dfs -Ddfs.block.size=1048576 -cp -f -d`.
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| Workload          | Query                           | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| TPCDS(10)         | TPCDS-Q_COUNT_OPTIMIZED         | parquet / none / none | 0.17   | 0.16        | +2.58%     | * 29.53% * | * 27.16% *     | 30    | +1.20%         | 0.58    | 0.35   |
| TPCDS(10)         | TPCDS-Q_COUNT_UNOPTIMIZED       | parquet / none / none | 0.27   | 0.26        | +2.96%     | 8.97%      | 9.94%          | 30    | +0.16%         | 0.44    | 1.19   |
| TPCDS(10)         | TPCDS-Q_COUNT_ZERO_SLOT         | parquet / none / none | 0.18   | 0.18        | -0.69%     | 1.65%      | 1.99%          | 30    | -0.34%         | -1.55   | -1.47  |
| TARGETED-PERF(10) | primitive_count_star_multiblock | parquet / none / none | 0.06   | 0.12        | I -49.88%  | 4.11%      | 3.53%          | 30    | I -99.97%      | -6.54   | -66.81 |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Ran query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_optimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_unoptimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_zero_slot.test
10 files changed, 124 insertions(+), 21 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/5
-- To view, visit http://gerrit.cloudera.org:8080/20804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd Gerrit-Change-Number: 20804 Gerrit-PatchSet: 5 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
Gerrit-Reviewer: Yifan Zhang
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20804 ) Change subject: IMPALA-12631: Improve count star performance for parquet scans .. Patch Set 5: I chose to add a query option instead of a backend flag to control whether this optimization is enabled. The reasons are: - This patch also contains changes to the frontend. - Different configurations for backends in a cluster can lead to incorrect query results. -- To view, visit http://gerrit.cloudera.org:8080/20804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd Gerrit-Change-Number: 20804 Gerrit-PatchSet: 5 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Thu, 21 Dec 2023 12:35:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Hello Riza Suminto, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20804 to look at the new patch set (#4). Change subject: IMPALA-12631: Improve count star performance for parquet scans .. IMPALA-12631: Improve count star performance for parquet scans Backend function HdfsParquetScanner::GetNextInternal() uses the data stored in the Parquet RowGroup.num_rows field to compute count star, so it still needs to find every row group and sum all RowGroup.num_rows values. This patch uses the 'num_rows' field in the Parquet file metadata instead, which avoids NextRowGroup() function calls, so only one footer range per file is generated and processed. A new query option, parquet_count_star_use_file_metadata, is added for forward compatibility. Its default value is true; if any inconsistency between FileMetaData.num_rows and RowGroup.num_rows is found, it can be set to false to get the same results as older versions. The following table shows a performance comparison before and after the patch. The primitive_count_star_multiblock query is a modified primitive_count_star query that targets a multi-block tpch10_parquet.lineitem table. The files of the table are generated by the command `hdfs dfs -Ddfs.block.size=1048576 -cp -f -d`.
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| Workload          | Query                           | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| TPCDS(10)         | TPCDS-Q_COUNT_OPTIMIZED         | parquet / none / none | 0.17   | 0.16        | +2.58%     | * 29.53% * | * 27.16% *     | 30    | +1.20%         | 0.58    | 0.35   |
| TPCDS(10)         | TPCDS-Q_COUNT_UNOPTIMIZED       | parquet / none / none | 0.27   | 0.26        | +2.96%     | 8.97%      | 9.94%          | 30    | +0.16%         | 0.44    | 1.19   |
| TPCDS(10)         | TPCDS-Q_COUNT_ZERO_SLOT         | parquet / none / none | 0.18   | 0.18        | -0.69%     | 1.65%      | 1.99%          | 30    | -0.34%         | -1.55   | -1.47  |
| TARGETED-PERF(10) | primitive_count_star_multiblock | parquet / none / none | 0.06   | 0.12        | I -49.88%  | 4.11%      | 3.53%          | 30    | I -99.97%      | -6.54   | -66.81 |
+-------------------+---------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+

Testing:
- Ran PlannerTest#testParquetStatsAgg
- Ran query_test/test_aggregation.py

Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_optimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_unoptimized.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q_count_zero_slot.test
10 files changed, 124 insertions(+), 21 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/4
-- To view, visit http://gerrit.cloudera.org:8080/20804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd Gerrit-Change-Number: 20804 Gerrit-PatchSet: 4 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
Gerrit-Reviewer: Yifan Zhang
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20804 ) Change subject: IMPALA-12631: Improve count star performance for parquet scans .. Patch Set 3:

> Since this will be a behavior change, do you mind adding a backend flag to
> control this? Default to count using FileMetaData.num_rows, but back to
> RowGroups.num_rows when flag is disabled. This way, user can revert to old
> behavior if they do hit an inaccurate FileMetaData.num_rows issue.
>
> Basic performance benchmark is also desirable to ensure no regression happen
> like IMPALA-11123. Maybe you can steal TPCDS-Q_COUNT_OPTIMIZED,
> TPCDS-Q_COUNT_UNOPTIMIZED, and TPCDS-Q_COUNT_ZERO_SLOT from
> https://gerrit.cloudera.org/c/19927 and run single_node_perf_run.py such as:
>
> ./bin/single_node_perf_run.py --num_impalads=3 \
> --workloads=tpcds --iterations=9 --table_formats=parquet/none/none \
> --query_names=TPCDS-Q_COUNT_OPTIMIZED,TPCDS-Q_COUNT_UNOPTIMIZED,TPCDS-Q_COUNT_ZERO_SLOT \
> asf-master
>
> Even better if you can do it with larger scale TPC-DS like 10GB:
>
> ./bin/single_node_perf_run.py --num_impalads=3 --load --scale=10 \
> --workloads=tpcds --iterations=9 --table_formats=parquet/none/none \
> --query_names=TPCDS-Q_COUNT_OPTIMIZED,TPCDS-Q_COUNT_UNOPTIMIZED,TPCDS-Q_COUNT_ZERO_SLOT \
> asf-master
>
> Using tpch_parquet.lineitem should be fine as well.

Thanks for the guidance! I'll try to add a backend flag and do some performance tests.

-- To view, visit http://gerrit.cloudera.org:8080/20804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd Gerrit-Change-Number: 20804 Gerrit-PatchSet: 3 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Tue, 19 Dec 2023 09:33:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20804 ) Change subject: IMPALA-12631: Improve count star performance for parquet scans .. Patch Set 3:

> Patch Set 3:
>
> I think there is a strong reason why Impala trust RowGroups.num_rows more
> than FileMetaData.num_rows. Maybe there are still invalid parquet files out
> there that is written with mismatched FileMetaData.num_rows.
>
> See:
> https://issues.apache.org/jira/browse/IMPALA-3943
> https://issues.apache.org/jira/browse/IMPALA-2230

Thanks for your reply, Riza! I investigated the issues, especially IMPALA-3943. For parquet scans, Impala now treats a file with FileMetaData.num_rows=0 as an empty file, see: https://github.com/apache/impala/blob/4114fe8db6ec80b2e1679e946555f91ab7043f2e/be/src/exec/parquet/hdfs-parquet-scanner.cc#L895C1-L898. I also found other places that use FileMetaData.num_rows to generate query results for metadata-only queries, see: https://github.com/apache/impala/blob/4114fe8db6ec80b2e1679e946555f91ab7043f2e/be/src/exec/parquet/hdfs-parquet-scanner.cc#L477-L481. Besides, IMPALA-3943 also added warning logs for inconsistency between FileMetaData.num_rows and RowGroups.num_rows. So I think using FileMetaData.num_rows for the count star optimization should be acceptable.

-- To view, visit http://gerrit.cloudera.org:8080/20804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd Gerrit-Change-Number: 20804 Gerrit-PatchSet: 3 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Mon, 18 Dec 2023 09:56:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20804 to look at the new patch set (#3). Change subject: IMPALA-12631: Improve count star performance for parquet scans .. IMPALA-12631: Improve count star performance for parquet scans Backend function HdfsParquetScanner::GetNextInternal() uses the data stored in the Parquet 'RowGroup.num_rows' field to compute count star, so it still needs to find every row group and sum all 'RowGroup.num_rows' values. This patch uses the 'num_rows' field in the Parquet file metadata instead, which avoids NextRowGroup() function calls, so each file only needs to be processed once. The planner is also modified to generate only one scan range per file. Testing: - Ran PlannerTest#testParquetStatsAgg - Ran query_test/test_aggregation.py Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd --- M be/src/exec/parquet/hdfs-parquet-scanner.cc M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test 3 files changed, 70 insertions(+), 35 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/3 -- To view, visit http://gerrit.cloudera.org:8080/20804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd Gerrit-Change-Number: 20804 Gerrit-PatchSet: 3 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20804 to look at the new patch set (#2). Change subject: IMPALA-12631: Improve count star performance for parquet scans .. IMPALA-12631: Improve count star performance for parquet scans Backend function HdfsParquetScanner::GetNextInternal() uses the data stored in the Parquet RowGroup.num_rows field to compute count star, so it still needs to find every row group and sum all RowGroup.num_rows values. This patch uses the 'num_rows' field in the Parquet file metadata instead, which avoids NextRowGroup() function calls, so each file only needs to be processed once. The planner is also modified to generate only one scan range per file. Testing: - Ran PlannerTest#testParquetStatsAgg - Ran query_test/test_aggregation.py Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd --- M be/src/exec/parquet/hdfs-parquet-scanner.cc M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test 3 files changed, 70 insertions(+), 35 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/2 -- To view, visit http://gerrit.cloudera.org:8080/20804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd Gerrit-Change-Number: 20804 Gerrit-PatchSet: 2 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12631: Improve count star performance for parquet scans
Yifan Zhang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20804 Change subject: IMPALA-12631: Improve count star performance for parquet scans .. IMPALA-12631: Improve count star performance for parquet scans Backend function HdfsParquetScanner::GetNextInternal() uses the data stored in the Parquet RowGroup.num_rows field to compute count star, so it still needs to find every row group and sum all RowGroup.num_rows values. This patch uses the 'num_rows' field in the Parquet file metadata instead, which avoids NextRowGroup() function calls, so each file only needs to be processed once. The planner is also modified to generate only one scan range per file. Testing: - Ran PlannerTest#testParquetStatsAgg - Ran query_test/test_aggregation.py Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd --- M be/src/exec/parquet/hdfs-parquet-scanner.cc M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test 3 files changed, 68 insertions(+), 35 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20804/1 -- To view, visit http://gerrit.cloudera.org:8080/20804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ib9cd2448fe51a420d4559d0cc861c4d30822f4fd Gerrit-Change-Number: 20804 Gerrit-PatchSet: 1 Gerrit-Owner: Yifan Zhang
[Impala-ASF-CR] IMPALA-12229: Support soft-delete Kudu table
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20773 ) Change subject: IMPALA-12229: Support soft-delete Kudu table .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/20773/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20773/2//COMMIT_MSG@11 PS2, Line 11: prevent users from del > nit: prevent users from deleting Done http://gerrit.cloudera.org:8080/#/c/20773/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20773/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2683 PS2, Line 2683: dropTablesFromKudu(db, kudu_table_reserve_seconds) > What's behavior of Kudu engine for soft table deletion when the database is The managed Kudu tables will be in the 'soft-deleted' state for the reservation period; during this time the tables can be recovered by calling Kudu's 'recall table' API. -- To view, visit http://gerrit.cloudera.org:8080/20773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7 Gerrit-Change-Number: 20773 Gerrit-PatchSet: 4 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Wed, 13 Dec 2023 07:29:02 + Gerrit-HasComments: Yes
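
The soft-delete behaviour described in the reply above (a dropped table stays recoverable for the reservation period) can be illustrated with a small in-memory model. This is a sketch under stated assumptions and does not use the Kudu client API: `ToyKuduCatalog`, `drop` and `recall` are hypothetical names, and treating a reservation of 0 seconds as an immediate, unrecoverable delete is an assumption based on the stated default value of 0.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ToyKuduCatalog:
    """Hypothetical in-memory model of soft-deleted tables."""
    live: Set[str] = field(default_factory=set)
    # Maps a soft-deleted table name to the time its reservation expires.
    soft_deleted: Dict[str, float] = field(default_factory=dict)

    def drop(self, name: str, now: float, reserve_seconds: int = 0) -> None:
        """Drop a table; keep it recoverable if a reservation is requested."""
        self.live.remove(name)
        if reserve_seconds > 0:
            self.soft_deleted[name] = now + reserve_seconds
        # Assumption: with reserve_seconds == 0 the table is gone at once.

    def recall(self, name: str, now: float) -> bool:
        """Restore a soft-deleted table if its reservation has not expired."""
        expiry = self.soft_deleted.pop(name, None)
        if expiry is None or now >= expiry:
            return False
        self.live.add(name)
        return True
```

In this model a table dropped with a 60-second reservation can be recalled 20 seconds later, while one dropped with the default of 0 cannot be recalled at all.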
[Impala-ASF-CR] IMPALA-12229: Support soft-delete Kudu table
Hello Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20773 to look at the new patch set (#4). Change subject: IMPALA-12229: Support soft-delete Kudu table .. IMPALA-12229: Support soft-delete Kudu table Adds 'kudu_table_reserve_seconds' query option to set reserved time for deleted Impala managed Kudu tables. The default value is 0. This option can prevent users from deleting important Kudu tables by mistake. Testing: - Added e2e tests. Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/CatalogService.thrift M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java M infra/python/deps/kudu-requirements.txt M tests/query_test/test_kudu.py 10 files changed, 112 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/20773/4 -- To view, visit http://gerrit.cloudera.org:8080/20773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7 Gerrit-Change-Number: 20773 Gerrit-PatchSet: 4 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-12229: Support soft-delete Kudu table
Hello Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20773 to look at the new patch set (#3). Change subject: IMPALA-12229: Support soft-delete Kudu table .. IMPALA-12229: Support soft-delete Kudu table Adds 'kudu_table_reserve_seconds' query option to set reserved time for deleted Impala managed Kudu tables. The default value is 0. This option can prevent users from deleting important Kudu tables by mistake. Testing: - Added e2e tests. Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/CatalogService.thrift M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java M infra/python/deps/kudu-requirements.txt M tests/query_test/test_kudu.py 10 files changed, 107 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/20773/3 -- To view, visit http://gerrit.cloudera.org:8080/20773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7 Gerrit-Change-Number: 20773 Gerrit-PatchSet: 3 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-12229: Support soft-delete Kudu table
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20773 to look at the new patch set (#2). Change subject: IMPALA-12229: Support soft-delete Kudu table .. IMPALA-12229: Support soft-delete Kudu table Adds 'kudu_table_reserve_seconds' query option to set reserved time for deleted Impala managed Kudu tables. The default value is 0. This option can prevent users deleting important Kudu tables by mistake. Testing: - Added an e2e test. Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/CatalogService.thrift M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java M infra/python/deps/kudu-requirements.txt M tests/query_test/test_kudu.py 10 files changed, 72 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/20773/2 -- To view, visit http://gerrit.cloudera.org:8080/20773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7 Gerrit-Change-Number: 20773 Gerrit-PatchSet: 2 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12229: Support soft-delete Kudu table
Yifan Zhang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20773 Change subject: IMPALA-12229: Support soft-delete Kudu table .. IMPALA-12229: Support soft-delete Kudu table Adds 'kudu_table_reserve_seconds' query option to set reserved time for deleted Impala managed Kudu tables. The default value is 0. This option can prevent users deleting important Kudu tables by mistake. Testing: - Added an e2e test. Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/CatalogService.thrift M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java M infra/python/deps/kudu-requirements.txt M tests/query_test/test_kudu.py 10 files changed, 73 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/20773/1 -- To view, visit http://gerrit.cloudera.org:8080/20773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7 Gerrit-Change-Number: 20773 Gerrit-PatchSet: 1 Gerrit-Owner: Yifan Zhang
[Impala-ASF-CR] IMPALA-12535: Fix misleading metric keys for the threadz page
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20658 ) Change subject: IMPALA-12535: Fix misleading metric keys for the threadz page .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20658 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I15a8cf0a318bc7122d1f5df29f18d8e467249ef7 Gerrit-Change-Number: 20658 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Yida Wu Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Tue, 14 Nov 2023 12:44:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12544: Add additional query progress reporting for the shell
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20672 ) Change subject: IMPALA-12544: Add additional query progress reporting for the shell .. Patch Set 5: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/20672/5/shell/impala_shell.py File shell/impala_shell.py: http://gerrit.cloudera.org:8080/#/c/20672/5/shell/impala_shell.py@1317 PS5, Line 1317: nit: Remove this blank? -- To view, visit http://gerrit.cloudera.org:8080/20672 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I11a704885505442b7499a026fcee3b86696cd064 Gerrit-Change-Number: 20672 Gerrit-PatchSet: 5 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Thu, 09 Nov 2023 07:28:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12377: Improve count(*) performance for jdbc external table
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20653 ) Change subject: IMPALA-12377: Improve count(*) performance for jdbc external table .. Patch Set 4: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20653 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9953dca949eb773022f1d6dcf48d8877857635d6 Gerrit-Change-Number: 20653 Gerrit-PatchSet: 4 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Thu, 09 Nov 2023 07:19:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12377: Improve count(*) performance for jdbc external table
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20653 ) Change subject: IMPALA-12377: Improve count(*) performance for jdbc external table .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20653 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9953dca949eb773022f1d6dcf48d8877857635d6 Gerrit-Change-Number: 20653 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Sun, 05 Nov 2023 06:46:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12376: DataSourceScanNode drop some returned rows
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20636 ) Change subject: IMPALA-12376: DataSourceScanNode drop some returned rows .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20636 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I978d0a65faa63a47ec86a0127c0bee8dfb79530b Gerrit-Change-Number: 20636 Gerrit-PatchSet: 1 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Thu, 02 Nov 2023 06:26:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12312: Using correct executor group set info for planning
Yifan Zhang has abandoned this change. ( http://gerrit.cloudera.org:8080/20273 ) Change subject: IMPALA-12312: Using correct executor group set info for planning .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/20273 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80 Gerrit-Change-Number: 20273 Gerrit-PatchSet: 7 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yifan Zhang
[Impala-ASF-CR] IMPALA-12312: Using correct executor group set info for planning
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20273 ) Change subject: IMPALA-12312: Using correct executor group set info for planning .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/20273/6/be/src/scheduling/cluster-membership-mgr.cc File be/src/scheduling/cluster-membership-mgr.cc: http://gerrit.cloudera.org:8080/#/c/20273/6/be/src/scheduling/cluster-membership-mgr.cc@671 PS6, Line 671: // Add a default exec group set if no expected group sets were specified. : exec_group_sets.emplace_back(); : exec_group_sets.back().__set_expected_num_executors(FLAGS_num_expected_executors); > Agree with Riza. We only add 'default' EG if expected EGs are not set. With and without this change, if some backends/executors in a cluster are configured with '--executor_groups=default' or without setting the 'executor_groups' flag (an empty string in this flag means 'default'), the coordinator can find the 'default' group in 'all_groups'. Actually, all executors and executor groups in the cluster can be found in the cluster membership snapshot. Currently (without this change), Impala sends only the 'default' EG to the frontend as long as a 'default' group can be found in the cluster membership snapshot, regardless of whether the flag 'expected_exec_group_sets' is set. IIUC, 'cluster in multiple executor group set mode' means multiple EG sets are configured in the startup flag 'expected_exec_group_sets'. Impala should assume that the cluster only has one 'default' EG if the flag 'expected_exec_group_sets' is not set. Is this right? If so, it seems we need a check on whether the EGs in 'expected_exec_group_sets' and the EGs in the cluster membership snapshot are consistent, and default groups should not coexist with non-default groups. Otherwise, the frontend will see different cluster members than the backends and can't make good query plans.
-- To view, visit http://gerrit.cloudera.org:8080/20273 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80 Gerrit-Change-Number: 20273 Gerrit-PatchSet: 7 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Tue, 26 Sep 2023 08:56:19 + Gerrit-HasComments: Yes
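Editor's note: the consistency check proposed in the comment above could be sketched roughly as follows. This is a minimal illustration, not Impala code: the helper name `check_group_consistency`, the plain-string representation of groups, and the idea of matching a group to an expected set by name prefix are all assumptions made for the example.

```python
from typing import Iterable, List

DEFAULT_GROUP = "default"  # group name implied by an empty 'executor_groups' flag


def check_group_consistency(expected_set_prefixes: Iterable[str],
                            snapshot_groups: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means the two views agree.

    Hypothetical helper: matching a group to an expected set by name
    prefix is an assumption made for this sketch.
    """
    prefixes = list(expected_set_prefixes)
    groups = set(snapshot_groups)
    non_default = sorted(groups - {DEFAULT_GROUP})
    problems = []

    # A 'default' group should not coexist with non-default groups.
    if DEFAULT_GROUP in groups and non_default:
        problems.append("default group coexists with: " + ", ".join(non_default))

    # Every non-default group should belong to some expected group set.
    # With no expected sets configured, every non-default group is flagged.
    for group in non_default:
        if not any(group.startswith(p) for p in prefixes):
            problems.append("group not covered by expected sets: " + group)

    return problems
```

Returning a list of messages rather than a boolean lets a caller log every inconsistency at once instead of stopping at the first one.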
[Impala-ASF-CR] IMPALA-12312: Using correct executor group set info for planning
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20273 ) Change subject: IMPALA-12312: Using correct executor group set info for planning .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/20273/6/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/20273/6/fe/src/main/java/org/apache/impala/service/Frontend.java@1974 PS6, Line 1974: isDefaultExecGroupSet(e)) { : result.add(new TExecutorGroupSet(e)); > Wrap this into a function? Done -- To view, visit http://gerrit.cloudera.org:8080/20273 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80 Gerrit-Change-Number: 20273 Gerrit-PatchSet: 7 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Fri, 22 Sep 2023 12:51:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12312: Using correct executor group set info for planning
Hello Riza Suminto, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20273 to look at the new patch set (#7). Change subject: IMPALA-12312: Using correct executor group set info for planning .. IMPALA-12312: Using correct executor group set info for planning Prior to this patch, the planner always selects the default group if there is a default group in an Impala cluster. When a client sets a non-default request pool, the planner still assumes the query runs on the default group and calculates the wrong number of nodes and instances. This patch fixes it by including both default and non-default groups in the update message sent from BE to FE, so the planner can generate a plan based on correct executor group set info. Besides, if no matching executor group is found, the planner falls back to using the default group for planning, which is consistent with BE's behavior in GetExecutorGroupsForQuery. Tests: - Add new test cases to ClusterMembershipMgrUnitTest. - Add e2e test to verify the new behavior of planner. Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80 --- M be/src/scheduling/cluster-membership-mgr-test.cc M be/src/scheduling/cluster-membership-mgr.cc M be/src/scheduling/cluster-membership-mgr.h M fe/src/main/java/org/apache/impala/service/Frontend.java M tests/custom_cluster/test_executor_groups.py 5 files changed, 166 insertions(+), 55 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/20273/7 -- To view, visit http://gerrit.cloudera.org:8080/20273 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80 Gerrit-Change-Number: 20273 Gerrit-PatchSet: 7 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yifan Zhang
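Editor's note: the selection-with-fallback behavior described in this commit message might look roughly like the sketch below. It is an illustration under stated assumptions, not the code in Frontend.java: the `GroupSet` type, the function `select_group_sets`, and matching a request pool to a group set by exact name are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import List

DEFAULT_NAME = "default"  # assumed name of the default group set


@dataclass(frozen=True)
class GroupSet:
    """Hypothetical stand-in for the executor group set info sent from BE to FE."""
    name: str
    num_executors: int


def select_group_sets(all_sets: List[GroupSet], request_pool: str) -> List[GroupSet]:
    """Pick the group sets to plan against for a given request pool.

    Exact-name matching is an assumption made for this sketch.
    """
    matched = [s for s in all_sets if s.name == request_pool]
    if matched:
        return matched
    # No matching group set: fall back to the default group, mirroring
    # the backend behavior the commit message refers to.
    return [s for s in all_sets if s.name == DEFAULT_NAME]
```

For example, with sets `default` (3 executors) and `root.large` (10 executors), a query in pool `root.large` would be planned against 10 executors, while a query in an unknown pool would be planned against the 3 default executors.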
[Impala-ASF-CR] IMPALA-12312: Using correct executor group set info for planning
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20273 ) Change subject: IMPALA-12312: Using correct executor group set info for planning .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/20273/6/be/src/scheduling/cluster-membership-mgr.cc File be/src/scheduling/cluster-membership-mgr.cc: http://gerrit.cloudera.org:8080/#/c/20273/6/be/src/scheduling/cluster-membership-mgr.cc@671 PS6, Line 671: // Add a default exec group set if no expected group sets were specified. : exec_group_sets.emplace_back(); : exec_group_sets.back().__set_expected_num_executors(FLAGS_num_expected_executors); > Can we double check what is Impala rules on executor group sets configurati AFAIK, Impala does allow users to configure a 'default' EG alongside other EGs in a cluster. Yes, the 'default' EG here is just a fallback. If the REQUEST_POOL query option is set to a non-existent pool (due to misconfiguration, or because some EG sets have been destroyed by auto-scaling), the coordinator should schedule the query on the 'default' group: https://github.com/apache/impala/blob/4d15558b5eaa69e872917c8bbf69dc1dc2146bc5/be/src/scheduling/admission-controller.cc#L2603-L2608. -- To view, visit http://gerrit.cloudera.org:8080/20273 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80 Gerrit-Change-Number: 20273 Gerrit-PatchSet: 6 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Thu, 21 Sep 2023 13:36:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12312: Using correct executor group set info for planning
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20273 to look at the new patch set (#6). Change subject: IMPALA-12312: Using correct executor group set info for planning .. IMPALA-12312: Using correct executor group set info for planning Prior to this patch, the planner always selects the default group if there is a default group in an Impala cluster. When a client sets a non-default request pool, the planner still assumes the query runs on the default group and calculates the wrong number of nodes and instances. This patch fixes it by including both default and non-default groups in the update message sent from BE to FE, so the planner can generate a plan based on correct executor group set info. Besides, if no matching executor group is found, the planner falls back to using the default group for planning, which is consistent with BE's behavior in GetExecutorGroupsForQuery. Tests: - Add new test cases to ClusterMembershipMgrUnitTest. - Add e2e test to verify the new behavior of planner. Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80 --- M be/src/scheduling/cluster-membership-mgr-test.cc M be/src/scheduling/cluster-membership-mgr.cc M be/src/scheduling/cluster-membership-mgr.h M fe/src/main/java/org/apache/impala/service/Frontend.java M tests/custom_cluster/test_executor_groups.py 5 files changed, 165 insertions(+), 54 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/20273/6 -- To view, visit http://gerrit.cloudera.org:8080/20273 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80 Gerrit-Change-Number: 20273 Gerrit-PatchSet: 6 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12312: Using correct executor group set info for planning
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20273 to look at the new patch set (#5). Change subject: IMPALA-12312: Using correct executor group set info for planning .. IMPALA-12312: Using correct executor group set info for planning Prior to this patch, the planner always selects the default group if there is a default group in an Impala cluster. When a client sets a non-default request pool, the planner still assumes the query runs on the default group and calculates the wrong number of nodes and instances. This patch fixes it by including both default and non-default groups in the update message sent from BE to FE, so the planner can generate a plan based on correct executor group set info. Besides, if no matching executor group is found, the planner falls back to using the default group for planning, which is consistent with BE's behavior in GetExecutorGroupsForQuery. Tests: - Add new test cases to ClusterMembershipMgrUnitTest. - Add e2e test to verify the new behavior of planner. Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80 --- M be/src/scheduling/cluster-membership-mgr-test.cc M be/src/scheduling/cluster-membership-mgr.cc M be/src/scheduling/cluster-membership-mgr.h M fe/src/main/java/org/apache/impala/service/Frontend.java M tests/custom_cluster/test_executor_groups.py 5 files changed, 163 insertions(+), 54 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/20273/5 -- To view, visit http://gerrit.cloudera.org:8080/20273 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80 Gerrit-Change-Number: 20273 Gerrit-PatchSet: 5 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12312: Using correct executor group set info for planning
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20273 to look at the new patch set (#4). Change subject: IMPALA-12312: Using correct executor group set info for planning .. IMPALA-12312: Using correct executor group set info for planning Prior to this patch, the planner always selects the default group if there is a default group in an Impala cluster. When a client sets a non-default request pool, the planner still assumes the query runs on the default group and calculates the wrong number of nodes and instances. This patch fixes it by including both default and non-default groups in the update message sent from BE to FE, so the planner can generate a plan based on correct executor group set info. Besides, if no matching executor group is found, the planner falls back to using the default group for planning, which is consistent with BE's behavior in GetExecutorGroupsForQuery. Tests: - Add new test cases to ClusterMembershipMgrUnitTest. - Add e2e test to verify the new behavior of planner. Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80 --- M be/src/scheduling/admission-controller.cc M be/src/scheduling/cluster-membership-mgr-test.cc M be/src/scheduling/cluster-membership-mgr.cc M be/src/scheduling/cluster-membership-mgr.h M be/src/service/impala-server.cc M fe/src/main/java/org/apache/impala/service/Frontend.java M tests/custom_cluster/test_executor_groups.py 7 files changed, 161 insertions(+), 54 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/20273/4 -- To view, visit http://gerrit.cloudera.org:8080/20273 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80 Gerrit-Change-Number: 20273 Gerrit-PatchSet: 4 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20294 ) Change subject: IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/20294/5/testdata/bin/copy-udfs-udas.sh File testdata/bin/copy-udfs-udas.sh: http://gerrit.cloudera.org:8080/#/c/20294/5/testdata/bin/copy-udfs-udas.sh@58 PS5, Line 58: cd "${IMPALA_HOME}/java/test-hive-udfs" > optional: Can we skip this when we are building Impala without the 'notests It is indeed a bit tricky to set the CMake option again like this. I updated 'buildall.sh' in the newest patch set: 'BUILD_WITH_NO_TESTS' is now set to ON only when the '-notests' and '-package' flags are used together, so the previous test workflow is not impacted. -- To view, visit http://gerrit.cloudera.org:8080/20294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a Gerrit-Change-Number: 20294 Gerrit-PatchSet: 7 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Xiang Yang Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Thu, 10 Aug 2023 12:31:00 + Gerrit-HasComments: Yes
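Editor's note: the rule stated in this comment (the option is ON only when both flags are given) can be written down as a small truth-table sketch. The real logic lives in the shell script 'buildall.sh'; the Python function names below are hypothetical, and passing the option to CMake as a '-D' cache definition is an assumption about how the script forwards it.

```python
from typing import List


def build_with_no_tests(buildall_flags: List[str]) -> bool:
    """True only when both '-notests' and '-package' were passed."""
    return "-notests" in buildall_flags and "-package" in buildall_flags


def cmake_definition(buildall_flags: List[str]) -> str:
    """Render the option as a CMake cache definition (assumed form)."""
    value = "ON" if build_with_no_tests(buildall_flags) else "OFF"
    return "-DBUILD_WITH_NO_TESTS=" + value
```

Under this rule, a plain 'buildall.sh -notests' keeps generating test targets, which is why existing test workflows that rely on building a few targets later are unaffected.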
[Impala-ASF-CR] IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets
Hello Quanlong Huang, Michael Smith, Joe McDonnell, Impala Public Jenkins, Xiang Yang, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20294 to look at the new patch set (#7). Change subject: IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets .. IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets This patch adds a new option 'BUILD_WITH_NO_TESTS' to tell CMake not to generate test targets. To stay consistent with the previous test workflow, this option is set to ON only when building Impala using the 'buildall.sh' script with both the '-notests' and '-package' flags. This is useful for a packaging build, which does not need to build all test binaries. Testing: - Ran 'buildall.sh -release -package' with and without '-notests' flag and verified generated executables. Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a --- M be/CMakeLists.txt M be/src/benchmarks/CMakeLists.txt M be/src/catalog/CMakeLists.txt M be/src/codegen/CMakeLists.txt M be/src/common/CMakeLists.txt M be/src/exec/CMakeLists.txt M be/src/exec/avro/CMakeLists.txt M be/src/exec/parquet/CMakeLists.txt M be/src/experiments/CMakeLists.txt M be/src/exprs/CMakeLists.txt M be/src/gutil/CMakeLists.txt M be/src/rpc/CMakeLists.txt M be/src/runtime/CMakeLists.txt M be/src/runtime/bufferpool/CMakeLists.txt M be/src/runtime/io/CMakeLists.txt M be/src/scheduling/CMakeLists.txt M be/src/service/CMakeLists.txt M be/src/statestore/CMakeLists.txt M be/src/testutil/CMakeLists.txt M be/src/udf/CMakeLists.txt M be/src/udf_samples/CMakeLists.txt M be/src/util/CMakeLists.txt M be/src/util/cache/CMakeLists.txt M buildall.sh 24 files changed, 167 insertions(+), 66 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/20294/7 -- To view, visit http://gerrit.cloudera.org:8080/20294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: 
I575ce76176c9f6a05fd2db0f420ebe6926d0272a Gerrit-Change-Number: 20294 Gerrit-PatchSet: 7 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Xiang Yang Gerrit-Reviewer: Yifan Zhang
[Impala-ASF-CR] IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets
Hello Quanlong Huang, Michael Smith, Joe McDonnell, Impala Public Jenkins, Xiang Yang, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20294 to look at the new patch set (#6). Change subject: IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets .. IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets This patch adds a new option 'BUILD_WITH_NO_TESTS' to tell CMake not to generate test targets. To stay consistent with the previous test workflow, this option is set to ON only when building Impala using the 'buildall.sh' script with both the '-notests' and '-package' flags. This is useful for a packaging build, which does not need to build all test binaries. Testing: - Ran 'buildall.sh -release -package' with and without '-notests' flag and verified generated executables. Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a --- M be/CMakeLists.txt M be/src/benchmarks/CMakeLists.txt M be/src/catalog/CMakeLists.txt M be/src/codegen/CMakeLists.txt M be/src/common/CMakeLists.txt M be/src/exec/CMakeLists.txt M be/src/exec/avro/CMakeLists.txt M be/src/exec/parquet/CMakeLists.txt M be/src/experiments/CMakeLists.txt M be/src/exprs/CMakeLists.txt M be/src/gutil/CMakeLists.txt M be/src/rpc/CMakeLists.txt M be/src/runtime/CMakeLists.txt M be/src/runtime/bufferpool/CMakeLists.txt M be/src/runtime/io/CMakeLists.txt M be/src/scheduling/CMakeLists.txt M be/src/service/CMakeLists.txt M be/src/statestore/CMakeLists.txt M be/src/testutil/CMakeLists.txt M be/src/udf/CMakeLists.txt M be/src/udf_samples/CMakeLists.txt M be/src/util/CMakeLists.txt M be/src/util/cache/CMakeLists.txt M buildall.sh 24 files changed, 167 insertions(+), 66 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/20294/6 -- To view, visit http://gerrit.cloudera.org:8080/20294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: 
I575ce76176c9f6a05fd2db0f420ebe6926d0272a Gerrit-Change-Number: 20294 Gerrit-PatchSet: 6 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Xiang Yang Gerrit-Reviewer: Yifan Zhang
[Impala-ASF-CR] IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20294 ) Change subject: IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/20294/5/be/src/catalog/CMakeLists.txt File be/src/catalog/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/20294/5/be/src/catalog/CMakeLists.txt@29 PS5, Line 29: if (BUILD_WITH_NO_TESTS) > Well, for example, if someone add new release target at the bottom of this Yeah, that makes sense. But I think this kind of error can be easily detected by regression tests. -- To view, visit http://gerrit.cloudera.org:8080/20294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a Gerrit-Change-Number: 20294 Gerrit-PatchSet: 5 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Xiang Yang Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Mon, 07 Aug 2023 13:21:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20294 ) Change subject: IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/20294/5/be/src/catalog/CMakeLists.txt File be/src/catalog/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/20294/5/be/src/catalog/CMakeLists.txt@29 PS5, Line 29: if (BUILD_WITH_NO_TESTS) > Hi yifan, I think it'd better to wrap the following code block within the ' Well, I think we can get the same result by returning early, without modifying too much code. Could you elaborate on why it is recommended to wrap the code in an if block? -- To view, visit http://gerrit.cloudera.org:8080/20294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a Gerrit-Change-Number: 20294 Gerrit-PatchSet: 5 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Xiang Yang Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Mon, 07 Aug 2023 06:38:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20294 ) Change subject: IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/20294/3/bin/bootstrap_build.sh File bin/bootstrap_build.sh: http://gerrit.cloudera.org:8080/#/c/20294/3/bin/bootstrap_build.sh@64 PS3, Line 64: ./buildall.sh -notests -so > Ah, create-load-data calls copy-udfs-udas, which makes a few specific targe This change was intended to fix the failure in copy-udfs-udas. But it turns out that 'bootstrap_build.sh' is called in 'jenkins/build-only.sh', which is not used to run all tests. It seems that we run tests using 'jenkins/dockerized-impala-run-tests.sh', which calls './buildall.sh -format -testdata -notests' to build Impala and load data. Considering that building all backend tests is not necessary, I fixed the issue in copy-udfs-udas by manually running cmake again before building tests. -- To view, visit http://gerrit.cloudera.org:8080/20294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a Gerrit-Change-Number: 20294 Gerrit-PatchSet: 5 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Fri, 04 Aug 2023 04:53:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets
Hello Michael Smith, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20294 to look at the new patch set (#5). Change subject: IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets .. IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets This patch adds a new option 'BUILD_WITH_NO_TESTS' to tell CMake not to generate test targets. The option is set to ON when building Impala using the command 'buildall.sh -notests'. This should be useful for a packaging build because cpack built all targets prior to this change. Testing: - Ran 'buildall.sh -release -package' with and without '-notests' flag and verified generated executables. Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a --- M be/CMakeLists.txt M be/src/benchmarks/CMakeLists.txt M be/src/catalog/CMakeLists.txt M be/src/codegen/CMakeLists.txt M be/src/common/CMakeLists.txt M be/src/exec/CMakeLists.txt M be/src/exec/avro/CMakeLists.txt M be/src/exec/parquet/CMakeLists.txt M be/src/experiments/CMakeLists.txt M be/src/exprs/CMakeLists.txt M be/src/gutil/CMakeLists.txt M be/src/rpc/CMakeLists.txt M be/src/runtime/CMakeLists.txt M be/src/runtime/bufferpool/CMakeLists.txt M be/src/runtime/io/CMakeLists.txt M be/src/scheduling/CMakeLists.txt M be/src/service/CMakeLists.txt M be/src/statestore/CMakeLists.txt M be/src/testutil/CMakeLists.txt M be/src/udf/CMakeLists.txt M be/src/udf_samples/CMakeLists.txt M be/src/util/CMakeLists.txt M be/src/util/cache/CMakeLists.txt M buildall.sh M testdata/bin/copy-udfs-udas.sh 25 files changed, 174 insertions(+), 66 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/20294/5 -- To view, visit http://gerrit.cloudera.org:8080/20294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a Gerrit-Change-Number: 20294 
Gerrit-PatchSet: 5 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yifan Zhang
[Impala-ASF-CR] IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets
Hello Michael Smith, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20294 to look at the new patch set (#4). Change subject: IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets .. IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets This patch adds a new option 'BUILD_WITH_NO_TESTS' to tell CMake not to generate test targets. The option is set to ON when building Impala using the command 'buildall.sh -notests'. This should be useful for a packaging build because cpack built all targets prior to this change. Testing: - Ran 'buildall.sh -release -package' with and without '-notests' flag and verified generated executables. Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a --- M be/CMakeLists.txt M be/src/benchmarks/CMakeLists.txt M be/src/catalog/CMakeLists.txt M be/src/codegen/CMakeLists.txt M be/src/common/CMakeLists.txt M be/src/exec/CMakeLists.txt M be/src/exec/avro/CMakeLists.txt M be/src/exec/parquet/CMakeLists.txt M be/src/experiments/CMakeLists.txt M be/src/exprs/CMakeLists.txt M be/src/gutil/CMakeLists.txt M be/src/rpc/CMakeLists.txt M be/src/runtime/CMakeLists.txt M be/src/runtime/bufferpool/CMakeLists.txt M be/src/runtime/io/CMakeLists.txt M be/src/scheduling/CMakeLists.txt M be/src/service/CMakeLists.txt M be/src/statestore/CMakeLists.txt M be/src/testutil/CMakeLists.txt M be/src/udf/CMakeLists.txt M be/src/udf_samples/CMakeLists.txt M be/src/util/CMakeLists.txt M be/src/util/cache/CMakeLists.txt M buildall.sh M testdata/bin/copy-udfs-udas.sh 25 files changed, 174 insertions(+), 66 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/20294/4 -- To view, visit http://gerrit.cloudera.org:8080/20294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a Gerrit-Change-Number: 20294 
Gerrit-PatchSet: 4 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yifan Zhang
[Impala-ASF-CR] IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets
Hello Michael Smith, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20294 to look at the new patch set (#3). Change subject: IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets .. IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets This patch adds a new option 'BUILD_WITH_NO_TESTS' to tell CMake not to generate test targets. The option is set to ON when building Impala using the command 'buildall.sh -notests'. This should be useful for a packaging build because cpack built all targets prior to this change. Testing: - Ran 'buildall.sh -release -package' with and without '-notests' flag and verified generated executables. Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a --- M be/CMakeLists.txt M be/src/benchmarks/CMakeLists.txt M be/src/catalog/CMakeLists.txt M be/src/codegen/CMakeLists.txt M be/src/common/CMakeLists.txt M be/src/exec/CMakeLists.txt M be/src/exec/avro/CMakeLists.txt M be/src/exec/parquet/CMakeLists.txt M be/src/experiments/CMakeLists.txt M be/src/exprs/CMakeLists.txt M be/src/gutil/CMakeLists.txt M be/src/rpc/CMakeLists.txt M be/src/runtime/CMakeLists.txt M be/src/runtime/bufferpool/CMakeLists.txt M be/src/runtime/io/CMakeLists.txt M be/src/scheduling/CMakeLists.txt M be/src/service/CMakeLists.txt M be/src/statestore/CMakeLists.txt M be/src/testutil/CMakeLists.txt M be/src/udf/CMakeLists.txt M be/src/udf_samples/CMakeLists.txt M be/src/util/CMakeLists.txt M be/src/util/cache/CMakeLists.txt M bin/bootstrap_build.sh M buildall.sh 25 files changed, 168 insertions(+), 67 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/20294/3 -- To view, visit http://gerrit.cloudera.org:8080/20294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a Gerrit-Change-Number: 20294 
Gerrit-PatchSet: 3 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yifan Zhang
[Impala-ASF-CR] IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets
Hello Michael Smith, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20294 to look at the new patch set (#2). Change subject: IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets .. IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets This patch adds a new option 'BUILD_WITH_NO_TESTS' to tell CMake not to generate test targets. The option is set to ON when building Impala using the command 'buildall.sh -notests'. This should be useful for a packaging build because cpack built all targets prior to this change. Testing: - Ran 'buildall.sh -release -package' with and without '-notests' flag and verified generated executables. Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a --- M be/CMakeLists.txt M be/src/benchmarks/CMakeLists.txt M be/src/catalog/CMakeLists.txt M be/src/codegen/CMakeLists.txt M be/src/common/CMakeLists.txt M be/src/exec/CMakeLists.txt M be/src/exec/avro/CMakeLists.txt M be/src/exec/parquet/CMakeLists.txt M be/src/experiments/CMakeLists.txt M be/src/exprs/CMakeLists.txt M be/src/gutil/CMakeLists.txt M be/src/rpc/CMakeLists.txt M be/src/runtime/CMakeLists.txt M be/src/runtime/bufferpool/CMakeLists.txt M be/src/runtime/io/CMakeLists.txt M be/src/scheduling/CMakeLists.txt M be/src/service/CMakeLists.txt M be/src/statestore/CMakeLists.txt M be/src/testutil/CMakeLists.txt M be/src/udf/CMakeLists.txt M be/src/udf_samples/CMakeLists.txt M be/src/util/CMakeLists.txt M be/src/util/cache/CMakeLists.txt M buildall.sh 24 files changed, 167 insertions(+), 66 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/20294/2 -- To view, visit http://gerrit.cloudera.org:8080/20294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a Gerrit-Change-Number: 20294 Gerrit-PatchSet: 2 Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets
Yifan Zhang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20294 ) Change subject: IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets .. IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets This patch adds a new option 'BUILD_WITH_NO_TESTS' to tell CMake not to generate test targets. The option is set to ON when building Impala using the command 'buildall.sh -notests'. This should be useful for a packaging build because cpack built all targets prior to this change. Testing: - Ran 'buildall.sh -release -package' with and without '-notests' flag and verified generated executables. Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a --- M be/CMakeLists.txt M be/src/benchmarks/CMakeLists.txt M be/src/catalog/CMakeLists.txt M be/src/codegen/CMakeLists.txt M be/src/common/CMakeLists.txt M be/src/exec/CMakeLists.txt M be/src/exec/avro/CMakeLists.txt M be/src/exec/parquet/CMakeLists.txt M be/src/experiments/CMakeLists.txt M be/src/exprs/CMakeLists.txt M be/src/gutil/CMakeLists.txt M be/src/rpc/CMakeLists.txt M be/src/runtime/CMakeLists.txt M be/src/runtime/bufferpool/CMakeLists.txt M be/src/runtime/io/CMakeLists.txt M be/src/scheduling/CMakeLists.txt M be/src/service/CMakeLists.txt M be/src/statestore/CMakeLists.txt M be/src/testutil/CMakeLists.txt M be/src/udf/CMakeLists.txt M be/src/udf_samples/CMakeLists.txt M be/src/util/CMakeLists.txt M be/src/util/cache/CMakeLists.txt M buildall.sh 24 files changed, 164 insertions(+), 63 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/20294/1 -- To view, visit http://gerrit.cloudera.org:8080/20294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a Gerrit-Change-Number: 20294 Gerrit-PatchSet: 1 Gerrit-Owner: Yifan Zhang
[Impala-ASF-CR] IMPALA-12312: Using correct executor group set info for planning
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20273 to look at the new patch set (#2). Change subject: IMPALA-12312: Using correct executor group set info for planning .. IMPALA-12312: Using correct executor group set info for planning Prior to this patch, the planner always selected the default group if there was a default group in an Impala cluster. When a client set a non-default request pool, the planner still assumed the query would run on the default group and calculated the wrong number of nodes and instances. This patch fixes it by including both default and non-default groups in the update message sent from BE to FE, so the planner can generate a plan based on the correct executor group set info. In addition, if no matching executor group is found, the planner falls back to using the default group for planning, which is consistent with BE's behavior in GetExecutorGroupsForQuery. Tests: - Add new test cases to ClusterMembershipMgrUnitTest. - Add e2e test to verify the new behavior of the planner. Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80 --- M be/src/scheduling/cluster-membership-mgr-test.cc M be/src/scheduling/cluster-membership-mgr.cc M be/src/scheduling/cluster-membership-mgr.h M fe/src/main/java/org/apache/impala/service/Frontend.java M tests/custom_cluster/test_executor_groups.py 5 files changed, 149 insertions(+), 43 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/20273/2 -- To view, visit http://gerrit.cloudera.org:8080/20273 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80 Gerrit-Change-Number: 20273 Gerrit-PatchSet: 2 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins
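The selection-with-fallback behavior described in the IMPALA-12312 commit message above could be sketched roughly as follows. This is an illustration only, not the code from the patch: the class ExecutorGroupInfo, the function select_group_for_planning, the convention of marking the default group with request_pool=None, and the pool names are all hypothetical, and the real logic lives in Frontend.java and cluster-membership-mgr.cc.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ExecutorGroupInfo:
    """Hypothetical summary of one executor group as seen by the planner."""
    name: str
    request_pool: Optional[str]  # None marks the default group (assumption)
    num_executors: int


def select_group_for_planning(
        groups: List[ExecutorGroupInfo],
        request_pool: Optional[str]) -> Optional[ExecutorGroupInfo]:
    """Pick the group the planner should size the plan against."""
    # Prefer a non-default group associated with the requested pool.
    if request_pool is not None:
        for group in groups:
            if group.request_pool == request_pool:
                return group
    # No match: fall back to the default group, if the cluster has one.
    for group in groups:
        if group.request_pool is None:
            return group
    return None


# Example cluster with one default and one non-default group (made-up names).
groups = [
    ExecutorGroupInfo("default", None, 3),
    ExecutorGroupInfo("large-group", "root.large", 10),
]
chosen = select_group_for_planning(groups, "root.large")
fallback = select_group_for_planning(groups, "root.unknown")
```

In this sketch, a query submitted to the matching pool is planned against the 10-executor group, while a query for an unknown pool is planned against the default group, mirroring the fallback the commit message attributes to GetExecutorGroupsForQuery.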
[Impala-ASF-CR] IMPALA-12312: Using correct executor group set info for planning
Yifan Zhang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20273 ) Change subject: IMPALA-12312: Using correct executor group set info for planning .. IMPALA-12312: Using correct executor group set info for planning Prior to this patch, the planner always selected the default group if there was a default group in an Impala cluster. When a client set a non-default request pool, the planner still assumed the query would run on the default group and calculated the wrong number of nodes and instances. This patch fixes it by including both default and non-default groups in the update message sent from BE to FE, so the planner can generate a plan based on the correct executor group set info. In addition, if no matching executor group is found, the planner falls back to using the default group for planning, which is consistent with BE's behavior in GetExecutorGroupsForQuery. Tests: - Add new test cases to ClusterMembershipMgrUnitTest. - Add e2e test to verify the new behavior of the planner. Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80 --- M be/src/scheduling/cluster-membership-mgr-test.cc M be/src/scheduling/cluster-membership-mgr.cc M be/src/scheduling/cluster-membership-mgr.h M fe/src/main/java/org/apache/impala/service/Frontend.java M tests/custom_cluster/test_executor_groups.py 5 files changed, 149 insertions(+), 43 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/20273/1 -- To view, visit http://gerrit.cloudera.org:8080/20273 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ia13fb40558441d4dcc0b3e7910d3746bb61e6b80 Gerrit-Change-Number: 20273 Gerrit-PatchSet: 1 Gerrit-Owner: Yifan Zhang
[Impala-ASF-CR] IMPALA-10262: RPM/DEB Packaging Support
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18939 ) Change subject: IMPALA-10262: RPM/DEB Packaging Support .. Patch Set 10: > Patch Set 10: Code-Review+2 > > I filed IMPALA-12288 to track having a mode that avoids building the tests > when packaging. > > I'm bumping up to +2, because I think we can address any issues we find in > subsequent changes. That makes sense to me. Thanks for filing the JIRA. -- To view, visit http://gerrit.cloudera.org:8080/18939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I64419fd400fe8d233dac016b6306157fe9461d82 Gerrit-Change-Number: 18939 Gerrit-PatchSet: 10 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Xiang Yang Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Sat, 15 Jul 2023 01:19:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10262: RPM/DEB Packaging Support
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18939 ) Change subject: IMPALA-10262: RPM/DEB Packaging Support .. Patch Set 10: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/18939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I64419fd400fe8d233dac016b6306157fe9461d82 Gerrit-Change-Number: 18939 Gerrit-PatchSet: 10 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Xiang Yang Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Sat, 15 Jul 2023 01:16:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10262: RPM/DEB Packaging Support
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18939 ) Change subject: IMPALA-10262: RPM/DEB Packaging Support .. Patch Set 10: I ran the command './buildall.sh -noclean -notests -release -package' and found that the ctests were still built even with the '-notests' option. -- To view, visit http://gerrit.cloudera.org:8080/18939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I64419fd400fe8d233dac016b6306157fe9461d82 Gerrit-Change-Number: 18939 Gerrit-PatchSet: 10 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Xiang Yang Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Thu, 13 Jul 2023 09:26:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12188: Avoid unnecessary output from sourcing bin/impala-config.sh
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20098 ) Change subject: IMPALA-12188: Avoid unnecessary output from sourcing bin/impala-config.sh .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20098 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib4e39f50c7efb8c42a6d3597be0e18c4c79457c5 Gerrit-Change-Number: 20098 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Mon, 10 Jul 2023 08:16:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12249: Fix the unexpected word wrap of 'Progress' in WebUI queries page
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20130 ) Change subject: IMPALA-12249: Fix the unexpected word wrap of 'Progress' in WebUI queries page .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20130 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I894ada826282d33c3f2395231db1ddf97bc82367 Gerrit-Change-Number: 20130 Gerrit-PatchSet: 1 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Wed, 28 Jun 2023 09:00:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10262: RPM/DEB Packaging Support
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18939 ) Change subject: IMPALA-10262: RPM/DEB Packaging Support .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/18939/8/package/bin/impala-env.sh File package/bin/impala-env.sh: http://gerrit.cloudera.org:8080/#/c/18939/8/package/bin/impala-env.sh@32 PS8, Line 32: export LIBHDFS_OPTS="${LIBHDFS_OPTS:-} -Djava.library.path=${HADOOP_LIB_DIR}/native/" : echo "Using hadoop native libs in ${HADOOP_LIB_DIR}/native/" : else : echo "WARNING: HDFS short-circuit reads are not enabled due to HADOOP_HOME not set." nit: Could we also pack hadoop native libs into the final package? -- To view, visit http://gerrit.cloudera.org:8080/18939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I64419fd400fe8d233dac016b6306157fe9461d82 Gerrit-Change-Number: 18939 Gerrit-PatchSet: 8 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Xiang Yang Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Mon, 26 Jun 2023 07:04:21 + Gerrit-HasComments: Yes