[Impala-ASF-CR] IMPALA-9691: Support Kudu Timestamp and Date bloom filter
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16094 ) Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter .. Patch Set 6: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48 Gerrit-Change-Number: 16094 Gerrit-PatchSet: 6 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 26 Jun 2020 06:56:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9691: Support Kudu Timestamp and Date bloom filter
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16094 ) Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter .. IMPALA-9691: Support Kudu Timestamp and Date bloom filter Impala saves timestamps as a 12-byte TimestampValue structure with time in nanoseconds. Kudu stores timestamps as 8 bytes of Unix time in microseconds. To avoid the data truncation issue in the bloom filter, add a FunctionCallExpr with 'utc_to_unix_micros' as the root of the bloom filter's source expression to convert timestamp values to microseconds when building a timestamp bloom filter for Kudu. Generated the functional date_tbl table in Kudu format for unit tests. Added new test cases for Kudu Timestamp and Date bloom filters. Testing: Passed all core tests. Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48 Reviewed-on: http://gerrit.cloudera.org:8080/16094 Reviewed-by: Thomas Tauber-Marshall Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test 7 files changed, 311 insertions(+), 30 deletions(-) Approvals: Thomas Tauber-Marshall: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/16094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48 Gerrit-Change-Number: 16094 Gerrit-PatchSet: 7 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou
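To illustrate why the conversion described in the commit message above matters, here is a minimal Python sketch. It is not the Impala implementation. It assumes timestamps are modelled as integer nanoseconds since the Unix epoch (the real TimestampValue is a 12-byte structure), it uses a Python set as a stand-in for a bloom filter, and the helper names to_unix_micros, build_filter and probe are hypothetical. Flooring to microseconds is also an assumption of the sketch.

```python
NANOS_PER_MICRO = 1_000


def to_unix_micros(unix_nanos: int) -> int:
    """Drop sub-microsecond digits (hypothetical stand-in for the conversion)."""
    return unix_nanos // NANOS_PER_MICRO


def build_filter(build_side_nanos):
    """Build a set (stand-in for a bloom filter) keyed on microsecond values."""
    return {to_unix_micros(n) for n in build_side_nanos}


def probe(filter_values, kudu_micros: int) -> bool:
    """Check a microsecond-precision timestamp against the filter."""
    return kudu_micros in filter_values


# A nanosecond-precision value on the build side of the join.
build_nanos = 1_593_154_575_123_456_789
# The same instant as a microsecond-precision store would hold it.
kudu_micros = 1_593_154_575_123_456

converted = build_filter([build_nanos])
unconverted = {build_nanos}  # filter built without the conversion

print(probe(converted, kudu_micros))    # True
print(probe(unconverted, kudu_micros))  # False
```

Without the conversion, the filter holds values at a different precision than the scan side, so matching rows would be wrongly filtered out.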
[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16091 ) Change subject: IMPALA-9697: Support priority based scratch directory selection .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16091 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17 Gerrit-Change-Number: 16091 Gerrit-PatchSet: 6 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 26 Jun 2020 05:19:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16091 ) Change subject: IMPALA-9697: Support priority based scratch directory selection .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6055/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16091 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17 Gerrit-Change-Number: 16091 Gerrit-PatchSet: 6 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 26 Jun 2020 05:19:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16091 ) Change subject: IMPALA-9697: Support priority based scratch directory selection .. Patch Set 5: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6052/ -- To view, visit http://gerrit.cloudera.org:8080/16091 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17 Gerrit-Change-Number: 16091 Gerrit-PatchSet: 5 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 26 Jun 2020 03:43:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9691: Support Kudu Timestamp and Date bloom filter
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/16094 ) Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48 Gerrit-Change-Number: 16094 Gerrit-PatchSet: 6 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 26 Jun 2020 01:57:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9691: Support Kudu Timestamp and Date bloom filter
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16094 ) Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6054/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48 Gerrit-Change-Number: 16094 Gerrit-PatchSet: 6 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 26 Jun 2020 01:57:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9691: Support Kudu Timestamp and Date bloom filter
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16094 ) Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6427/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48 Gerrit-Change-Number: 16094 Gerrit-PatchSet: 6 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 26 Jun 2020 01:43:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9691: Support Kudu Timestamp and Date bloom filter
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/16094 ) Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter .. Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/16094/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/16094/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@199 PS5, Line 199: // If set, indicates that the filter is targeted for Kudu scan node with source > mention that this is for Kudu Ok, will fix it. http://gerrit.cloudera.org:8080/#/c/16094/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@346 PS5, Line 346: } catch (AnalysisException e) { > Let's add a Log.warn with the error message here. Will add a log message. http://gerrit.cloudera.org:8080/#/c/16094/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@644 PS5, Line 644: joinNode, filterType, bloomFilterSizeLimits_, > Add a brief comment here saying something like "For timestamp bloom filters Will add comments. http://gerrit.cloudera.org:8080/#/c/16094/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@773 PS5, Line 773: nue; > I think it would make this 'if' clearer if you surrounded this with parenth Will fix it. -- To view, visit http://gerrit.cloudera.org:8080/16094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48 Gerrit-Change-Number: 16094 Gerrit-PatchSet: 6 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 26 Jun 2020 01:16:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9691: Support Kudu Timestamp and Date bloom filter
Wenzhe Zhou has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/16094 ) Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter .. IMPALA-9691: Support Kudu Timestamp and Date bloom filter Impala saves timestamps as a 12-byte TimestampValue structure with time in nanoseconds. Kudu stores timestamps as 8 bytes of Unix time in microseconds. To avoid the data truncation issue in the bloom filter, add a FunctionCallExpr with 'utc_to_unix_micros' as the root of the bloom filter's source expression to convert timestamp values to microseconds when building a timestamp bloom filter for Kudu. Generated the functional date_tbl table in Kudu format for unit tests. Added new test cases for Kudu Timestamp and Date bloom filters. Testing: Passed all core tests. Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48 --- M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test 7 files changed, 311 insertions(+), 30 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/16094/6 -- To view, visit http://gerrit.cloudera.org:8080/16094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48 Gerrit-Change-Number: 16094 Gerrit-PatchSet: 6 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-7020: fix costing of non-trivial CAST expressions
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16073 ) Change subject: IMPALA-7020: fix costing of non-trivial CAST expressions .. IMPALA-7020: fix costing of non-trivial CAST expressions Some cast operations are quite expensive to evaluate, which was not reflected in the uniform costing of CAST expressions. We fix this by increasing the cost of non-trivial casts to be the same as an arbitrary function call. Testing: Ran exhaustive tests. Added planner tests to check that CAST expressions are materialized or not based on the input and output types - the planner output lists 'materialized:' expressions for the SORT operator. A few existing planner tests had changes in predicate ordering. I checked manually that these changes made sense. Perf: I sanity-checked that this actually helped (a variant of) the example query from IMPALA-7020. The following query went from ~8s to ~2s in my dev environment: select * FROM ( SELECT o.*, ROW_NUMBER() OVER(ORDER BY evt_ts DESC) AS rn FROM ( SELECT l_orderkey,l_partkey,l_linenumber,l_quantity, cast (l_shipdate as date) evt_ts FROM tpch_parquet.lineitem ) o ) r WHERE rn BETWEEN 1 AND 101 ORDER BY rn; Change-Id: I3f1a16fc45191a2eedf38cc243c70173d44806c6 Reviewed-on: http://gerrit.cloudera.org:8080/16073 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/analysis/CastExpr.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test M testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test 6 files changed, 212 insertions(+), 10 deletions(-) Approvals: Tim Armstrong: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/16073 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I3f1a16fc45191a2eedf38cc243c70173d44806c6 Gerrit-Change-Number: 16073 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7020: fix costing of non-trivial CAST expressions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16073 ) Change subject: IMPALA-7020: fix costing of non-trivial CAST expressions .. Patch Set 6: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16073 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3f1a16fc45191a2eedf38cc243c70173d44806c6 Gerrit-Change-Number: 16073 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 26 Jun 2020 00:05:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9691: Support Kudu Timestamp and Date bloom filter
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/16094 ) Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter .. Patch Set 5: (4 comments) http://gerrit.cloudera.org:8080/#/c/16094/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/16094/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@199 PS5, Line 199: // If set, indicates that the filter need to truncate timestamp. Mention that this is for Kudu. http://gerrit.cloudera.org:8080/#/c/16094/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@346 PS5, Line 346: return null; Let's add a Log.warn with the error message here. http://gerrit.cloudera.org:8080/#/c/16094/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@644 PS5, Line 644: if (filterType == TRuntimeFilterType.BLOOM Add a brief comment here saying something like "For timestamp bloom filters we also generate a RuntimeFilter with the src timestamp truncated for Kudu scan node targets" http://gerrit.cloudera.org:8080/#/c/16094/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@773 PS5, Line 773: targetExpr.getType().isTimestamp() && !filter.isTimestampTruncation() I think it would make this 'if' clearer if you surrounded this with parentheses, to show that these conditions are related. -- To view, visit http://gerrit.cloudera.org:8080/16094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48 Gerrit-Change-Number: 16094 Gerrit-PatchSet: 5 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 26 Jun 2020 00:00:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9294: Support DATE for min-max runtime filter
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16103 ) Change subject: IMPALA-9294: Support DATE for min-max runtime filter .. Patch Set 2: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6050/ -- To view, visit http://gerrit.cloudera.org:8080/16103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2f6e2dc6949735d5f0fcf317361cc2969a5e82c Gerrit-Change-Number: 16103 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 25 Jun 2020 23:19:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9691: Support Kudu Timestamp and Date bloom filter
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16094 ) Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6426/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48 Gerrit-Change-Number: 16094 Gerrit-PatchSet: 5 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 25 Jun 2020 23:11:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16091 ) Change subject: IMPALA-9697: Support priority based scratch directory selection .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6052/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16091 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17 Gerrit-Change-Number: 16091 Gerrit-PatchSet: 5 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 22:45:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16091 ) Change subject: IMPALA-9697: Support priority based scratch directory selection .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16091 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17 Gerrit-Change-Number: 16091 Gerrit-PatchSet: 5 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 22:45:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9691: Support Kudu Timestamp and Date bloom filter
Wenzhe Zhou has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/16094 ) Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter .. IMPALA-9691: Support Kudu Timestamp and Date bloom filter Impala saves timestamps as a 12-byte TimestampValue structure with time in nanoseconds. Kudu stores timestamps as 8 bytes of Unix time in microseconds. To avoid the data truncation issue in the bloom filter, add a FunctionCallExpr with 'utc_to_unix_micros' as the root of the bloom filter's source expression to convert timestamp values to microseconds when building a timestamp bloom filter for Kudu. Generated the functional date_tbl table in Kudu format for unit tests. Added new test cases for Kudu Timestamp and Date bloom filters. Testing: Passed all core tests. Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48 --- M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test 7 files changed, 299 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/16094/5 -- To view, visit http://gerrit.cloudera.org:8080/16094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48 Gerrit-Change-Number: 16094 Gerrit-PatchSet: 5 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-9691: Support Kudu Timestamp and Date bloom filter
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/16094 ) Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/16094/4/fe/src/main/java/org/apache/impala/analysis/TimestampTruncationExpr.java File fe/src/main/java/org/apache/impala/analysis/TimestampTruncationExpr.java: http://gerrit.cloudera.org:8080/#/c/16094/4/fe/src/main/java/org/apache/impala/analysis/TimestampTruncationExpr.java@41 PS4, Line 41: public class TimestampTruncationExpr extends Expr { > So I'm sorry I didn't think of this before and wasted your time, but I real Right, the code change will be much simpler and won't need any changes in the backend. Thanks. -- To view, visit http://gerrit.cloudera.org:8080/16094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48 Gerrit-Change-Number: 16094 Gerrit-PatchSet: 4 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 25 Jun 2020 22:18:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9829: Add Write Metrics for Spilling
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16083 ) Change subject: IMPALA-9829: Add Write Metrics for Spilling .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6425/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16083 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914 Gerrit-Change-Number: 16083 Gerrit-PatchSet: 13 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 25 Jun 2020 22:14:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16091 ) Change subject: IMPALA-9697: Support priority based scratch directory selection .. Patch Set 4: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6049/ -- To view, visit http://gerrit.cloudera.org:8080/16091 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17 Gerrit-Change-Number: 16091 Gerrit-PatchSet: 4 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 22:09:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2658: Extend the NDV function to accept a precision
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15997 ) Change subject: IMPALA-2658: Extend the NDV function to accept a precision .. IMPALA-2658: Extend the NDV function to accept a precision This work addresses the current limitation in the NDV function by extending the function to optionally take a secondary argument called scale. NDV([DISTINCT | ALL] expression [, scale]) Without the secondary argument, all the syntax and semantics are preserved. The precision, which determines the total number of different estimators in the HLL algorithm, is still 10. When supplied, the scale argument must be an integer literal in the range from 1 to 10. Its value is internally mapped to a precision used by the HLL algorithm, with the following mapping formula: precision = scale + 8. Thus, a scale of 1 is mapped to a precision of 9 and a scale of 10 is mapped to a precision of 18. A large precision value generally produces a better estimation (i.e. with less error) than a small precision value, because more estimators are involved, at the expense of the extra memory needed. For a given precision p, the amount of memory used by the HLL algorithm is on the order of 2^p bytes. Testing: 1. Ran unit tests against table store_sales in TPC-DS and table customer in TPC-H in both serial and parallel plan settings; 2. Added and ran a new regression test (test_ndv) in the TestAggregationQueries section to compute NDV() for every supported Impala data type over all valid scale values; 3. Ran "core" tests. Performance: 1. Ran estimation error tests against a total of 22 distinct data sets loaded into external Impala tables. The error was computed as abs(<true value> - <estimated value>) / <true value>. Overall, the precision of 18 (or the scale value of 10) gave the best result, with a worst estimation error of 0.42% (for one set of 10 million integers) and an average error of no more than 0.17%, at the cost of 256 KB of memory for the internal data structure per evaluation of the HLL algorithm. Other precisions (such as 16 and 17) were also very reasonable but with slightly larger estimation errors. 2. Ran execution time tests against a total of 6 distinct data files on a single-node EC2 VM in debug mode. These data files were loaded in turn into a single column in an external Impala table. It was found that the total execution time was roughly the same across different scales for a given table configuration. The execution time for tables involving multiple data files across multiple nodes remains to be seen. 3. Ran execution time tests comparing the before- and after-enhancement versions of NDV(). Change-Id: I48a4517bd0959f7021143073d37505a46c551a58 Reviewed-on: http://gerrit.cloudera.org:8080/15997 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/common/logging.h M be/src/exec/incr-stats-util-test.cc M be/src/exec/incr-stats-util.cc M be/src/exec/incr-stats-util.h M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M tests/query_test/test_aggregation.py 9 files changed, 426 insertions(+), 82 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/15997 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58 Gerrit-Change-Number: 15997 Gerrit-PatchSet: 42 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar
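The scale-to-precision mapping and the memory estimate in the commit message above can be sketched in a few lines of Python. This is only an illustration of the arithmetic stated there, not Impala's code: the function names ndv_precision, hll_memory_bytes and relative_error are hypothetical, the memory figure takes "on the order of 2^p bytes" literally as exactly 2^p, and relative_error assumes the reported estimation error is the usual relative error against the true distinct count.

```python
DEFAULT_PRECISION = 10  # precision used when no scale argument is given


def ndv_precision(scale=None) -> int:
    """Map the optional NDV scale argument to an HLL precision."""
    if scale is None:
        return DEFAULT_PRECISION
    if isinstance(scale, bool) or not isinstance(scale, int) or not 1 <= scale <= 10:
        raise ValueError("scale must be an integer from 1 to 10")
    return scale + 8


def hll_memory_bytes(precision: int) -> int:
    """Approximate memory footprint, taken here as exactly 2^precision bytes."""
    return 1 << precision


def relative_error(true_ndv: int, estimated_ndv: int) -> float:
    """Relative estimation error (assumed definition)."""
    return abs(true_ndv - estimated_ndv) / true_ndv


for scale in (1, 2, 10):
    p = ndv_precision(scale)
    print(scale, p, hll_memory_bytes(p))
# prints:
# 1 9 512
# 2 10 1024
# 10 18 262144
```

Under this mapping a scale of 2 reproduces the default precision of 10, and a scale of 10 gives 262144 bytes, which matches the 256 KB figure quoted in the performance notes.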
[Impala-ASF-CR] IMPALA-2658: Extend the NDV function to accept a precision
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15997 ) Change subject: IMPALA-2658: Extend the NDV function to accept a precision .. Patch Set 41: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/15997 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58 Gerrit-Change-Number: 15997 Gerrit-PatchSet: 41 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Thu, 25 Jun 2020 21:55:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9829: Add Write Metrics for Spilling
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/16083 ) Change subject: IMPALA-9829: Add Write Metrics for Spilling .. Patch Set 13: (13 comments) http://gerrit.cloudera.org:8080/#/c/16083/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16083/10//COMMIT_MSG@9 PS10, Line 9: Currently, we have read metrics for spilling, in this patch, we add > Nit: in the commit message it can be best to be more high level, and descri Thank you for the suggestion. I have added a summary of the task and switched to more descriptive terms in the commit message. http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc File be/src/runtime/io/disk-io-mgr-test.cc: http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1648 PS10, Line 1648: // the write operations. > Finish comment with a period. Done http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1664 PS10, Line 1664: // Reset the Metric if it exists. > Add period. Done http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1686 PS10, Line 1686: // Issue a number of writes to the disks. > Add period. Done http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1706 PS10, Line 1706: // Check the count and max/min of the histogram metric. > Add period. Done http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1712 PS10, Line 1712: // The count should be added by num_ranges/num_disks per disk. > Add period. Done http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1714 PS10, Line 1714: // Check if the min and max of write size are the same as the written len. > Add period. Done http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1732 PS10, Line 1732: // Issue a writing operation to a non-existent tmp file path. > Add periods Done http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1737 PS10, Line 1737: string tmp_file = "/non-existent/file.txt"; > Another test uses "/non-existent/file.txt" to indicate a non-existing file, Done http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1739 PS10, Line 1739: // Reset the Metric if it exists. > Add period. Done http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1746 PS10, Line 1746: // Remove the path in case it exists. > Add period. Done http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1767 PS10, Line 1767: // One IO Error should be added to the metrics counter. > Add period. Done http://gerrit.cloudera.org:8080/#/c/16083/10/common/thrift/metrics.json File common/thrift/metrics.json: http://gerrit.cloudera.org:8080/#/c/16083/10/common/thrift/metrics.json@663 PS10, Line 663: "description": "The number of write io errors on disk.", > Should be "errors". Done -- To view, visit http://gerrit.cloudera.org:8080/16083 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914 Gerrit-Change-Number: 16083 Gerrit-PatchSet: 13 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 25 Jun 2020 21:51:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9829: Add Write Metrics for Spilling
Yida Wu has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/16083 ) Change subject: IMPALA-9829: Add Write Metrics for Spilling .. IMPALA-9829: Add Write Metrics for Spilling Currently, we have read metrics for spilling, in this patch, we add support for write metrics. The new metrics could be useful to measure the write operations and target performance issues when involving in spilling to remote disks(S3) (IMPALA-9828). The metrics added record the information includes: 1. write latency of each write operation to the disk, metric kind: HistogramMetric, unit: nanosecond. 2. write size of each write operation to the disk, metric kind: HistogramMetric, unit: Bytes. 3. number of write IO errors when writing to the disk, metric kind: IntCounter. Testing: * added DiskIoMgrTest.MetricsOfWriteSizeAndLatency * added DiskIoMgrTest.MetricsOfWriteIoError Ran unit test disk-io-mgr-test and pre-commit test Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914 --- M be/src/runtime/io/disk-io-mgr-internal.h M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/util/histogram-metric.h M common/thrift/metrics.json 6 files changed, 258 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/16083/13 -- To view, visit http://gerrit.cloudera.org:8080/16083 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914 Gerrit-Change-Number: 16083 Gerrit-PatchSet: 13 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu
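The three metrics listed in this commit message (a latency histogram, a size histogram, and an IO error counter) can be sketched roughly as follows. This is only an illustration in Python, not the patch's C++ code: the names `Histogram`, `DiskWriteMetrics` and `timed_write` are hypothetical, and recording latency and size only for successful writes is an assumption the thread does not confirm.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Histogram:
    """Tiny stand-in for a histogram metric: it just keeps the raw samples."""
    samples: list = field(default_factory=list)

    def update(self, value):
        self.samples.append(value)

    def count(self):
        return len(self.samples)

    def min_value(self):
        return min(self.samples)

    def max_value(self):
        return max(self.samples)


@dataclass
class DiskWriteMetrics:
    """Hypothetical per-disk-queue metrics, one field per metric kind above."""
    write_latency_ns: Histogram = field(default_factory=Histogram)
    write_size_bytes: Histogram = field(default_factory=Histogram)
    write_io_errors: int = 0


def timed_write(metrics, path, data):
    """Write `data` to `path`; record size and latency, or count an IO error."""
    start = time.monotonic_ns()
    try:
        with open(path, "wb") as f:
            f.write(data)
    except OSError:
        # Assumption: a failed write only bumps the error counter.
        metrics.write_io_errors += 1
        return False
    metrics.write_latency_ns.update(time.monotonic_ns() - start)
    metrics.write_size_bytes.update(len(data))
    return True
```

Under these assumptions, a write to a missing directory such as "/non-existent/file.txt" (the path used in the test discussed in this thread) increments only the error counter, while a successful write adds one sample to each histogram.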
[Impala-ASF-CR] IMPALA-9829: Add Write Metrics for Spilling
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16083 ) Change subject: IMPALA-9829: Add Write Metrics for Spilling .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6424/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16083 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914 Gerrit-Change-Number: 16083 Gerrit-PatchSet: 11 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 25 Jun 2020 21:43:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9829: Add Write Metrics for Spilling
Yida Wu has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/16083 ) Change subject: IMPALA-9829: Add Write Metrics for Spilling .. IMPALA-9829: Add Write Metrics for Spilling Currently, we have read metrics for spilling, in this patch, we add support for write metrics. The new metrics could be useful to measure the write operations and target performance issues when involving in spilling to remote disks(S3) (IMPALA-9828). Three types of metrics are added in disk-io-mgr: 1. impala-server.io-mgr.queue-$0.write-latency, unit: ns, kind: HistogramMetric 2. impala-server.io-mgr.queue-$0.write-size, unit: Bytes, kind: HistogramMetric 3. impala-server.io-mgr.queue-$0.write-io-error, kind: IntCounter Write size, latency and io errors will be recorded in impala::io::DiskIoMgr::Write. Testing: * added DiskIoMgrTest.MetricsOfWriteSizeAndLatency * added DiskIoMgrTest.MetricsOfWriteIoError Ran unit test disk-io-mgr-test and pre-commit test Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914 --- M be/src/runtime/io/disk-io-mgr-internal.h M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/util/histogram-metric.h M common/thrift/metrics.json 6 files changed, 258 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/16083/11 -- To view, visit http://gerrit.cloudera.org:8080/16083 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914 Gerrit-Change-Number: 16083 Gerrit-PatchSet: 11 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-9691: Support Kudu Timestamp and Date bloom filter
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/16094 ) Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/16094/4/fe/src/main/java/org/apache/impala/analysis/TimestampTruncationExpr.java File fe/src/main/java/org/apache/impala/analysis/TimestampTruncationExpr.java: http://gerrit.cloudera.org:8080/#/c/16094/4/fe/src/main/java/org/apache/impala/analysis/TimestampTruncationExpr.java@41 PS4, Line 41: public class TimestampTruncationExpr extends Expr { So I'm sorry I didn't think of this before and wasted your time, but I realized I don't think this is actually necessary. There's an existing function called 'utc_to_unix_micros' that does exactly what you're doing here. All you should need to do is create a FunctionCallExpr with that function name in RuntimeFilterGenerator. -- To view, visit http://gerrit.cloudera.org:8080/16094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48 Gerrit-Change-Number: 16094 Gerrit-PatchSet: 4 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 25 Jun 2020 20:27:56 + Gerrit-HasComments: Yes
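To make the conversion behind this suggestion concrete, here is a minimal Python sketch of turning a nanosecond-precision timestamp into Unix microseconds before it is added to a filter. It is not Impala code: `to_unix_micros` and `build_filter` are hypothetical names, a plain set stands in for the bloom filter, and dropping the sub-microsecond digits by truncation is an assumption (the thread does not say how 'utc_to_unix_micros' rounds).

```python
NANOS_PER_MICRO = 1_000
MICROS_PER_SECOND = 1_000_000


def to_unix_micros(epoch_seconds, nanos):
    """Collapse (seconds, nanoseconds) into a single microsecond count.

    Assumption: sub-microsecond digits are truncated, not rounded.
    """
    return epoch_seconds * MICROS_PER_SECOND + nanos // NANOS_PER_MICRO


def build_filter(build_side_timestamps):
    """Build a filter over microsecond values; a set stands in for a bloom filter."""
    return {to_unix_micros(s, n) for s, n in build_side_timestamps}


def may_contain(filter_values, stored_micros):
    """Probe with a value that is already stored as Unix microseconds."""
    return stored_micros in filter_values
```

The point of the sketch is that both sides of the filter must use the same representation: once the build side is converted to microseconds, a probe value stored as microseconds can match it.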
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 12: (1 comment) http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc File be/src/exec/sort-node.cc: http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@90 PS12, Line 90: GetRowSize > I didn't dig too deep, but row_descriptor_->GetRowSize() seems to contain t I will look into the possibility of accessing that average size data in the backend. But just to make sure I get it right: for rows that contain varlen data, GetRowSize() will most likely underestimate the size, since it only accounts for the pointer, but not the string length itself? That, in turn, will cause the return value of ComputeInputSizeEstimate() to be an underestimate as well. But isn't underestimating the input size better than overestimating it? With underestimation, the worst case is that we don't enforce sort_run_bytes_limit for the first run (hoping that everything will fit in memory), that turns out to be wrong and we spill, but we then enforce sort_run_bytes_limit for the next runs. Overestimation is worse, because we unnecessarily spill from the beginning when the input could possibly fit in memory. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 12 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 20:09:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8984: Uncorrelated scalar subqueries in the select list
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16007 ) Change subject: IMPALA-8984: Uncorrelated scalar subqueries in the select list .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6422/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16007 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibcf55d26889aa01d69bb85f18c9241dda095fb66 Gerrit-Change-Number: 16007 Gerrit-PatchSet: 4 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 20:01:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9784: Non correlated subqueries in HAVING.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16052 ) Change subject: IMPALA-9784: Non correlated subqueries in HAVING. .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6423/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16052 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I124a58a09a1a47e1222a22d84b54fe7d07844461 Gerrit-Change-Number: 16052 Gerrit-PatchSet: 2 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 19:57:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9784: Non correlated subqueries in HAVING.
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16052 ) Change subject: IMPALA-9784: Non correlated subqueries in HAVING. .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/16052/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/16052/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@466 PS2, Line 466: // TODO: IMPALA-5100 to cover all cases, we do let through runtime scalars with I relaxed some of these rules to let through subqueries such as (select count(a) from t group by b where b=1). I referenced the JIRA to enhance the scalar subquery planner checks to handle more expression evaluation, but for now thought the better tradeoff was to let these queries through wrapped in a CardinalityCheckNode. There are cases where two different runtime scalar subqueries in nested query blocks could run and have runtime errors that interfere with each other since we don't have independent execution, but I checked and many other databases (hive, vertica, vectorwise) have this kind of behavior. It feels like a worthwhile tradeoff to allow more queries to run, where some might occasionally hit a runtime error, when we'd otherwise not let the query run at all. Also it's needed to support Q44 from TPC-DS. -- To view, visit http://gerrit.cloudera.org:8080/16052 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I124a58a09a1a47e1222a22d84b54fe7d07844461 Gerrit-Change-Number: 16052 Gerrit-PatchSet: 2 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 19:52:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7020: fix costing of non-trivial CAST expressions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16073 ) Change subject: IMPALA-7020: fix costing of non-trivial CAST expressions .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6421/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16073 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3f1a16fc45191a2eedf38cc243c70173d44806c6 Gerrit-Change-Number: 16073 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 19:40:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8984: Uncorrelated scalar subqueries in the select list
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16007 ) Change subject: IMPALA-8984: Uncorrelated scalar subqueries in the select list .. Patch Set 4: (9 comments) Some more tests and review comments addressed. Still want to get a good test run out of jenkins. http://gerrit.cloudera.org:8080/#/c/16007/3/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/16007/3/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@292 PS3, Line 292: "Invariant violated: Only subqueries that are guaranteed to return a " > nit: "guaranteed" Done http://gerrit.cloudera.org:8080/#/c/16007/3/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/16007/3/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@937 PS3, Line 937: * supported in the FROM clause, WHERE clause and SELECT list. The rewrite is > Update this comment for SELECT clause. Done http://gerrit.cloudera.org:8080/#/c/16007/3/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1137 PS3, Line 1137: *returned per group so a run time cardinality check must be applied. 
An exception > nit: duplicate 'primary' Done http://gerrit.cloudera.org:8080/#/c/16007/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/16007/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1117 PS4, Line 1117: * Done http://gerrit.cloudera.org:8080/#/c/16007/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1117 PS4, Line 1117: * Done http://gerrit.cloudera.org:8080/#/c/16007/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1135 PS4, Line 1135: * Done http://gerrit.cloudera.org:8080/#/c/16007/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1230 PS4, Line 1230: // rewrite to a LOJ. I added a test for this. I know it feels weird, but since the slotref for the subquery is marked as materialized and the other join queries get bound by the USING/ON clause, nothing explodes. Since there are scalar subqueries, the only weird situation is that if the cardinality of all the joins were equal to 1, then it might get reordered, but the results would still be correct. If it were a correlated subquery then we'd need to handle things more carefully, but that's for a later commit. http://gerrit.cloudera.org:8080/#/c/16007/4/testdata/workloads/functional-query/queries/QueryTest/subquery.test File testdata/workloads/functional-query/queries/QueryTest/subquery.test: http://gerrit.cloudera.org:8080/#/c/16007/4/testdata/workloads/functional-query/queries/QueryTest/subquery.test@1044 PS4, Line 1044: select id, 1+(select min(id) from functional.alltypessmall) Added the tpc-ds query, it's a pretty complex plan. 
http://gerrit.cloudera.org:8080/#/c/16007/4/testdata/workloads/tpcds/queries/tpcds-decimal_v2-q9.test File testdata/workloads/tpcds/queries/tpcds-decimal_v2-q9.test: http://gerrit.cloudera.org:8080/#/c/16007/4/testdata/workloads/tpcds/queries/tpcds-decimal_v2-q9.test@3 PS4, Line 3: select case when (select count(*) Done -- To view, visit http://gerrit.cloudera.org:8080/16007 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibcf55d26889aa01d69bb85f18c9241dda095fb66 Gerrit-Change-Number: 16007 Gerrit-PatchSet: 4 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 19:38:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9784: Non correlated subqueries in HAVING.
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16052 to look at the new patch set (#2). Change subject: IMPALA-9784: Non correlated subqueries in HAVING. .. IMPALA-9784: Non correlated subqueries in HAVING. Support rewriting subqueries in the HAVING clause by nesting the aggregation query and pulling up the subquery predicates into the outer WHERE clause. Testing: * New analyzer tests * New functional subquery tests * Added Q23, Q24 and Q44 to the tpcds workload Change-Id: I124a58a09a1a47e1222a22d84b54fe7d07844461 --- M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test A testdata/workloads/tpcds/queries/tpcds-q23-1.test A testdata/workloads/tpcds/queries/tpcds-q23-2.test A testdata/workloads/tpcds/queries/tpcds-q24-1.test A testdata/workloads/tpcds/queries/tpcds-q24-2.test A testdata/workloads/tpcds/queries/tpcds-q44.test M tests/query_test/test_tpcds_queries.py 11 files changed, 1,123 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/16052/2 -- To view, visit http://gerrit.cloudera.org:8080/16052 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I124a58a09a1a47e1222a22d84b54fe7d07844461 Gerrit-Change-Number: 16052 Gerrit-PatchSet: 2 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-8984: Uncorrelated scalar subqueries in the select list
Hello Aman Sinha, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16007 to look at the new patch set (#4). Change subject: IMPALA-8984: Uncorrelated scalar subqueries in the select list .. IMPALA-8984: Uncorrelated scalar subqueries in the select list Extend StmtRewriter with the ability to rewrite scalar subqueries in the select list into cross joins. Currently the subquery must pass plan-time checks to determine that it returns a single row which may miss cases that may be valid at runtime or with more complex evaluation of the predicate expressions in the planner. Support for correlated subqueries will be a follow on change. With this change Q9 of TPC-DS is supported, we now load the 'reasons' table as part of the TPC-DS workload for use by Q9. Testing: * Added new analyzer tests, updated previous subquery tests * test_queries.py::TestQueries::test_subquery Change-Id: Ibcf55d26889aa01d69bb85f18c9241dda095fb66 --- M fe/src/main/java/org/apache/impala/analysis/SelectList.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M testdata/datasets/tpcds/tpcds_schema_template.sql M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test M testdata/workloads/tpcds/queries/count.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q9.test A testdata/workloads/tpcds/queries/tpcds-q9.test M tests/query_test/test_tpcds_queries.py 11 files changed, 1,455 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/16007/4 -- To view, visit http://gerrit.cloudera.org:8080/16007 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset 
Gerrit-Change-Id: Ibcf55d26889aa01d69bb85f18c9241dda095fb66 Gerrit-Change-Number: 16007 Gerrit-PatchSet: 4 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8253: Parquet delta encoding and decoding.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 ) Change subject: IMPALA-8253: Parquet delta encoding and decoding. .. Patch Set 16: Hmm, this CR somehow got forgotten. Anyway, I'm planning to take a look in the following days. Daniel, do you plan to continue this work? -- To view, visit http://gerrit.cloudera.org:8080/12621 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8 Gerrit-Change-Number: 12621 Gerrit-PatchSet: 16 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 25 Jun 2020 19:12:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7020: fix costing of non-trivial CAST expressions
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16073 ) Change subject: IMPALA-7020: fix costing of non-trivial CAST expressions .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16073 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3f1a16fc45191a2eedf38cc243c70173d44806c6 Gerrit-Change-Number: 16073 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 19:10:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7020: fix costing of non-trivial CAST expressions
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16073 ) Change subject: IMPALA-7020: fix costing of non-trivial CAST expressions .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/16073/4/fe/src/main/java/org/apache/impala/analysis/CastExpr.java File fe/src/main/java/org/apache/impala/analysis/CastExpr.java: http://gerrit.cloudera.org:8080/#/c/16073/4/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@a280 PS4, Line 280: > I think this isn't used anywhere now, so you could remove its definition in Done -- To view, visit http://gerrit.cloudera.org:8080/16073 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3f1a16fc45191a2eedf38cc243c70173d44806c6 Gerrit-Change-Number: 16073 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 19:10:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7020: fix costing of non-trivial CAST expressions
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16073 ) Change subject: IMPALA-7020: fix costing of non-trivial CAST expressions .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16073 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3f1a16fc45191a2eedf38cc243c70173d44806c6 Gerrit-Change-Number: 16073 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 19:11:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7020: fix costing of non-trivial CAST expressions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16073 ) Change subject: IMPALA-7020: fix costing of non-trivial CAST expressions .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6051/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16073 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3f1a16fc45191a2eedf38cc243c70173d44806c6 Gerrit-Change-Number: 16073 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 19:10:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7020: fix costing of non-trivial CAST expressions
Hello Aman Sinha, Thomas Tauber-Marshall, Shant Hovsepian, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16073 to look at the new patch set (#5). Change subject: IMPALA-7020: fix costing of non-trivial CAST expressions .. IMPALA-7020: fix costing of non-trivial CAST expressions Some cast operations are quite expensive to evaluate, which was not reflected in the uniform costing of CAST expressions. We fix this by increasing the cost of non-trivial casts to be the same as an arbitrary function call. Testing: Ran exhaustive tests. Added planner tests to check that CAST expressions are materialized or not based on the input and output types - the planner output lists 'materialized:' expressions for the SORT operator. A few existing planner tests had changes in predicate ordering. I checked manually that these changes made sense. Perf: I sanity-checked that this actually helped (a variant of) the example query from IMPALA-7020. The following query went from ~8s to ~2s in my dev environment: select * FROM ( SELECT o.*, ROW_NUMBER() OVER(ORDER BY evt_ts DESC) AS rn FROM ( SELECT l_orderkey,l_partkey,l_linenumber,l_quantity, cast (l_shipdate as date) evt_ts FROM tpch_parquet.lineitem ) o ) r WHERE rn BETWEEN 1 AND 101 ORDER BY rn; Change-Id: I3f1a16fc45191a2eedf38cc243c70173d44806c6 --- M fe/src/main/java/org/apache/impala/analysis/CastExpr.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test M testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test 6 files changed, 212 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/16073/5 -- To view, visit http://gerrit.cloudera.org:8080/16073 
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3f1a16fc45191a2eedf38cc243c70173d44806c6 Gerrit-Change-Number: 16073 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7020: fix costing of non-trivial CAST expressions
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/16073 ) Change subject: IMPALA-7020: fix costing of non-trivial CAST expressions .. Patch Set 4: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/16073/4/fe/src/main/java/org/apache/impala/analysis/CastExpr.java File fe/src/main/java/org/apache/impala/analysis/CastExpr.java: http://gerrit.cloudera.org:8080/#/c/16073/4/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@a280 PS4, Line 280: I think this isn't used anywhere now, so you could remove its definition in Expr.java -- To view, visit http://gerrit.cloudera.org:8080/16073 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3f1a16fc45191a2eedf38cc243c70173d44806c6 Gerrit-Change-Number: 16073 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 18:30:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9294: Support DATE for min-max runtime filter
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16103 ) Change subject: IMPALA-9294: Support DATE for min-max runtime filter .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6050/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2f6e2dc6949735d5f0fcf317361cc2969a5e82c Gerrit-Change-Number: 16103 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 25 Jun 2020 18:14:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9294: Support DATE for min-max runtime filter
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/16103 ) Change subject: IMPALA-9294: Support DATE for min-max runtime filter .. Patch Set 2: Code-Review+2 Thanks for doing this -- To view, visit http://gerrit.cloudera.org:8080/16103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2f6e2dc6949735d5f0fcf317361cc2969a5e82c Gerrit-Change-Number: 16103 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 25 Jun 2020 18:14:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 12: (1 comment) http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc File be/src/exec/sort-node.cc: http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@90 PS12, Line 90: GetRowSize > So what is the nature of varlen column? Is each row possibly will have diff I didn't dig too deep, but row_descriptor_->GetRowSize() seems to return the size of the tuple that holds a row. For string and varchar columns the tuple only contains a pointer (+length), so there is additional data in some buffer. The column stats contain AvgSize and MaxSize. These are constants for fixed-size types, but we calculate them for strings during COMPUTE STATS, so we can get a more or less accurate estimate of the total amount of memory consumed. I don't know off the top of my head how to access this data in the backend. Strings are very common, so many queries contain varlen slots. I am not sure if it is a good idea to create an optimization specifically for queries without strings. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 12 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 18:11:01 + Gerrit-HasComments: Yes
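The point in the comment above, that a tuple-only row size can underestimate input with strings, can be illustrated with a small sketch. This is not Impala code: the `ColumnEstimate` class, its field names, and the byte counts are hypothetical, and the only idea taken from the thread is that an AvgSize-like statistic could account for the out-of-tuple string data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ColumnEstimate:
    """Hypothetical per-column sizes, for illustration only."""
    slot_bytes: int                # bytes stored in the tuple itself
    avg_varlen_bytes: float = 0.0  # average out-of-tuple bytes; 0 for fixed-size types


def fixed_row_size(columns: Sequence[ColumnEstimate]) -> int:
    """Size of the tuple only, ignoring data held in separate buffers."""
    return sum(c.slot_bytes for c in columns)


def estimated_row_size(columns: Sequence[ColumnEstimate]) -> float:
    """Tuple size plus the average varlen payload per column."""
    return sum(c.slot_bytes + c.avg_varlen_bytes for c in columns)


def estimated_input_bytes(cardinality: int, columns: Sequence[ColumnEstimate]) -> float:
    """Total bytes expected for 'cardinality' rows."""
    return cardinality * estimated_row_size(columns)


# One fixed-size column and one string column averaging 100 bytes (made-up values).
columns = [ColumnEstimate(slot_bytes=8), ColumnEstimate(slot_bytes=16, avg_varlen_bytes=100.0)]
print(fixed_row_size(columns), estimated_row_size(columns))
```

With these made-up numbers the tuple-only size is a small fraction of the estimate that includes string payloads, which is the kind of gap the comment warns about.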
[Impala-ASF-CR] IMPALA-9829: Add Write Metrics for Spilling
Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/16083 ) Change subject: IMPALA-9829: Add Write Metrics for Spilling .. Patch Set 10: (13 comments) Good change! I like the detailed unit tests. I think a few cosmetic changes are all that is needed. This may seem like a picky nit but in Impala code, comments start with a capital letter and end with a period. This may seem weirdly prescriptive but it does enhance readability. http://gerrit.cloudera.org:8080/#/c/16083/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16083/10//COMMIT_MSG@9 PS10, Line 9: Three types of metrics are added in disk-io-mgr: Nit: in the commit message it can be best to be more high level, and describe what is added in more descriptive terms. http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc File be/src/runtime/io/disk-io-mgr-test.cc: http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1648 PS10, Line 1648: // the write operations Finish comment with a period. http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1664 PS10, Line 1664: // Reset the Metric if it exists Add period. http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1686 PS10, Line 1686: // Issue a number of writes to the disks Add period. http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1706 PS10, Line 1706: // Check the count and max/min of the histogram metric Add period. http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1712 PS10, Line 1712: // The count should be added by num_ranges/num_disks per disk Add period. http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1714 PS10, Line 1714: // Check if the min and max of write size are the same as the written len Add period. 
http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1732 PS10, Line 1732: // Issue a writing operation to a non-existent tmp file path Add periods http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1737 PS10, Line 1737: string tmp_file = "/tmp/disk_io_mgr_test/MetricsOfWriteIoError"; Another test uses "/non-existent/file.txt" to indicate a non-existing file, this makes it clearer what is happening http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1739 PS10, Line 1739: // Reset the Metric if it exists Add period. http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1746 PS10, Line 1746: // Remove the path in case it exists Add period. http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1767 PS10, Line 1767: // One IO Error should be added to the metrics counter Add period. http://gerrit.cloudera.org:8080/#/c/16083/10/common/thrift/metrics.json File common/thrift/metrics.json: http://gerrit.cloudera.org:8080/#/c/16083/10/common/thrift/metrics.json@663 PS10, Line 663: "description": "The number of write io error on disk.", Should be "errors". -- To view, visit http://gerrit.cloudera.org:8080/16083 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914 Gerrit-Change-Number: 16083 Gerrit-PatchSet: 10 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 25 Jun 2020 17:36:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9294: Support DATE for min-max runtime filter
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16103 ) Change subject: IMPALA-9294: Support DATE for min-max runtime filter .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6420/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2f6e2dc6949735d5f0fcf317361cc2969a5e82c Gerrit-Change-Number: 16103 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 25 Jun 2020 17:34:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9790: option to use resolved hostname everywhere
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16108 ) Change subject: IMPALA-9790: option to use resolved hostname everywhere .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6419/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16108 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0d5cb9c68c60ce8dc838cde9dcf1c590017f5c9a Gerrit-Change-Number: 16108 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Thu, 25 Jun 2020 17:17:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9294: Support DATE for min-max runtime filter
Wenzhe Zhou has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/16103 ) Change subject: IMPALA-9294: Support DATE for min-max runtime filter .. IMPALA-9294: Support DATE for min-max runtime filter Implemented Date min-max filter and applied it to Kudu as other min-max runtime filters. Added new test cases for Date min-max filters. Testing: Passed all core tests. Change-Id: Ic2f6e2dc6949735d5f0fcf317361cc2969a5e82c --- M be/src/codegen/gen_ir_descriptions.py M be/src/runtime/date-value.h M be/src/util/min-max-filter-ir.cc M be/src/util/min-max-filter-test.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/protobuf/common.proto M common/thrift/Data.thrift M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test M testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test M testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test 12 files changed, 274 insertions(+), 164 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/16103/2 -- To view, visit http://gerrit.cloudera.org:8080/16103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic2f6e2dc6949735d5f0fcf317361cc2969a5e82c Gerrit-Change-Number: 16103 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou
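As a rough illustration of what a min-max runtime filter over dates does in general, here is a minimal sketch. It is not the code from this patch: the class name `DateMinMaxFilterSketch` and its methods are hypothetical, and it only shows the generic technique of tracking the smallest and largest build-side value and rejecting probe-side values outside that range.

```python
from datetime import date
from typing import Optional


class DateMinMaxFilterSketch:
    """Hypothetical min-max filter over dates, for illustration only."""

    def __init__(self) -> None:
        self.min: Optional[date] = None
        self.max: Optional[date] = None

    def insert(self, value: Optional[date]) -> None:
        """Widen the [min, max] range to include a build-side value; NULLs are skipped."""
        if value is None:
            return
        if self.min is None or value < self.min:
            self.min = value
        if self.max is None or value > self.max:
            self.max = value

    def may_contain(self, value: date) -> bool:
        """False means the value cannot match any inserted value."""
        if self.min is None or self.max is None:
            return False  # nothing was inserted, so nothing can match
        return self.min <= value <= self.max


f = DateMinMaxFilterSketch()
for d in (date(2020, 6, 1), None, date(2020, 6, 25)):
    f.insert(d)
print(f.may_contain(date(2020, 6, 10)), f.may_contain(date(2021, 1, 1)))
```

A value inside the range may still have no match, so a filter like this can only rule rows out, never confirm them.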
[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16091 ) Change subject: IMPALA-9697: Support priority based scratch directory selection .. Patch Set 3: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/16091/2/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/16091/2/be/src/runtime/tmp-file-mgr.cc@540 PS2, Line 540: for (int index = start; index <= end; ++index) { > Maybe print the values in the DCHECK error Checked that this was fixed -- To view, visit http://gerrit.cloudera.org:8080/16091 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17 Gerrit-Change-Number: 16091 Gerrit-PatchSet: 3 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 17:04:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16091 ) Change subject: IMPALA-9697: Support priority based scratch directory selection .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6049/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16091 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17 Gerrit-Change-Number: 16091 Gerrit-PatchSet: 4 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 17:04:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16091 ) Change subject: IMPALA-9697: Support priority based scratch directory selection .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16091 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17 Gerrit-Change-Number: 16091 Gerrit-PatchSet: 4 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 17:04:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2658: Extend the NDV function to accept a precision
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15997 ) Change subject: IMPALA-2658: Extend the NDV function to accept a precision .. Patch Set 41: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6048/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/15997 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58 Gerrit-Change-Number: 15997 Gerrit-PatchSet: 41 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Thu, 25 Jun 2020 16:56:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2658: Extend the NDV function to accept a precision
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15997 ) Change subject: IMPALA-2658: Extend the NDV function to accept a precision .. Patch Set 41: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15997 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58 Gerrit-Change-Number: 15997 Gerrit-PatchSet: 41 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Thu, 25 Jun 2020 16:56:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2658: Extend the NDV function to accept a precision
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15997 ) Change subject: IMPALA-2658: Extend the NDV function to accept a precision .. Patch Set 40: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15997 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58 Gerrit-Change-Number: 15997 Gerrit-PatchSet: 40 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Thu, 25 Jun 2020 16:56:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7020: fix costing of non-trivial CAST expressions
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16073 ) Change subject: IMPALA-7020: fix costing of non-trivial CAST expressions .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/16073/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16073/4//COMMIT_MSG@11 PS4, Line 11: expresions spelling -- To view, visit http://gerrit.cloudera.org:8080/16073 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3f1a16fc45191a2eedf38cc243c70173d44806c6 Gerrit-Change-Number: 16073 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 16:52:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9515: Full ACID Milestone 3: Read support for "original files"
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16001 ) Change subject: IMPALA-9515: Full ACID Milestone 3: Read support for "original files" .. Patch Set 11: Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6047/ -- To view, visit http://gerrit.cloudera.org:8080/16001 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I176497ef9873ed7589bd3dee07d048a42dfad953 Gerrit-Change-Number: 16001 Gerrit-PatchSet: 11 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 25 Jun 2020 16:51:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9294: Support DATE for min-max runtime filter
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/16103 ) Change subject: IMPALA-9294: Support DATE for min-max runtime filter .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/16103/1/be/src/util/min-max-filter.h File be/src/util/min-max-filter.h: http://gerrit.cloudera.org:8080/#/c/16103/1/be/src/util/min-max-filter.h@250 PS1, Line 250: class DateMinMaxFilter : public MinMaxFilter { > Oh, I actually meant macro, which is more consistent with the rest of this Will define macros for timestamp and date as suggested. -- To view, visit http://gerrit.cloudera.org:8080/16103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2f6e2dc6949735d5f0fcf317361cc2969a5e82c Gerrit-Change-Number: 16103 Gerrit-PatchSet: 1 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 25 Jun 2020 16:51:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9790: option to use resolved hostname everywhere
Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16108 ) Change subject: IMPALA-9790: option to use resolved hostname everywhere .. IMPALA-9790: option to use resolved hostname everywhere This adds a flag --use_resolved_hostname, which replaces --hostname with a resolved IP on startup. This is useful for containerized environments where the hostname -> IP mapping can be very dynamic. This flag is used by default in the dockerized minicluster. This also fixes a bug in the test code that incorrectly identified command line flags. Specifically, it only checked the suffix, so it confused use_resolved_hostname and hostname. Change-Id: I0d5cb9c68c60ce8dc838cde9dcf1c590017f5c9a --- M be/src/common/global-flags.cc M be/src/common/init.cc M docker/catalogd/Dockerfile M docker/impalad_coord_exec/Dockerfile M docker/impalad_coordinator/Dockerfile M docker/impalad_executor/Dockerfile M docker/statestored/Dockerfile M tests/common/impala_cluster.py 8 files changed, 23 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/16108/2 -- To view, visit http://gerrit.cloudera.org:8080/16108 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I0d5cb9c68c60ce8dc838cde9dcf1c590017f5c9a Gerrit-Change-Number: 16108 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins
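The suffix-matching bug described in the commit message above can be reproduced in a few lines. The sketch below is not the code from tests/common/impala_cluster.py: both helper functions and the sample argument list are hypothetical and only show why a suffix check confuses the two flag names while an exact name comparison does not.

```python
from typing import Sequence


def has_flag_suffix_match(args: Sequence[str], name: str) -> bool:
    """Buggy variant: matches any flag whose name merely ends with 'name'."""
    return any(arg.split("=", 1)[0].endswith(name) for arg in args)


def has_flag_exact_match(args: Sequence[str], name: str) -> bool:
    """Compares the full flag name after stripping the leading dashes."""
    return any(arg.split("=", 1)[0].lstrip("-") == name for arg in args)


# Hypothetical command line that sets use_resolved_hostname but not hostname.
args = ["--use_resolved_hostname=true", "--be_port=22000"]
print(has_flag_suffix_match(args, "hostname"))  # suffix check reports a match
print(has_flag_exact_match(args, "hostname"))   # exact check does not
```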
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 12: (3 comments) Thank you Csaba for your feedback! I have couple follow up questions. http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc File be/src/exec/sort-node.cc: http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@89 PS12, Line 89: cardinality > What is the default value of this? Can it be -1 (unknown)? The result seems Ok, in that case, we should just abandon estimate in case of cardinality is -1. http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@90 PS12, Line 90: GetRowSize > I think that this doesn't contain varlen data, so it can greatly underestim So what is the nature of varlen column? Is each row possibly will have different sizes with large variations? And what is GetRowSize() return in that case? Thinking if we should abandon the estimate entirely for input rows having varlen data. http://gerrit.cloudera.org:8080/#/c/15963/12/tests/query_test/test_sort.py File tests/query_test/test_sort.py: http://gerrit.cloudera.org:8080/#/c/15963/12/tests/query_test/test_sort.py@74 PS12, Line 74: """The first sort run is given a privilege to ignore sort_run_bytes_limit, except :when estimate hints that spill is inevitable. The lower sort_run_bytes_limit of :a query is, the more sort runs are likely to be produced. :Case 1 : 1 run produced, because all rows fit within the maximum reservation. : sort_run_bytes_limit is not enforced. :Case 2 : 3 run produced, because the first run hit reservation limit, and the : next 2 runs are capped to 150m. :Case 3 : 4 run produced, because sort node estimate that spill is inevitable. : So all runs are capped to 130m, including the first one.""" > Isn't there something in query_result.runtime_profile that could be used to I will look at that 'query_result.runtime_profile'. 
Otherwise, I will change this test to run_test_case and verify the profile via regex. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 12 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 16:15:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 12: (13 comments) Sorry for the many grammar comments, I was also the victim of this in the past :) My only real concern is about the case when the cardinality is unknown. My preference would be to try to allow spilling in that case. http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.h File be/src/exec/sort-node.h: http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.h@77 PS12, Line 77: going nit: "will go"? http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc File be/src/exec/sort-node.cc: http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@89 PS12, Line 89: cardinality What is the default value of this? Can it be -1 (unknown)? The result seems pretty wrong in that case. http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@90 PS12, Line 90: GetRowSize I think that this doesn't contain varlen data, so it can greatly underestimate the input size if there are strings. http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@92 PS12, Line 92: 2) I think that VLOG(3) is enough. http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h File be/src/runtime/sorter.h: http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@101 PS12, Line 101: a "the" would be better http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@101 PS12, Line 101: /// 'estimated_input_size' is a total rows in bytes that estimated to get added into nit: missing "are" http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@102 PS12, Line 102: /// this sorter. 
This is used to decide if sorter need to proactively spilling for nit: needs http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@102 PS12, Line 102: spilling nit: spill http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@223 PS12, Line 223: run nit: "do an"? http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.cc@816 PS12, Line 816: VLOG(2) << Substitute( I think that VLOG(3) is enough here - this should happen if the cardinality estimation was wrong, which may make WARNING logical, but this seems unavoidable for many queries, so I wouldn't spam the warning log. http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.cc@907 PS12, Line 907: VLOG(2) << Substitute( Same as line 816. http://gerrit.cloudera.org:8080/#/c/15963/12/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/15963/12/common/thrift/ImpalaInternalService.thrift@645 PS12, Line 645: backeds typo: backends http://gerrit.cloudera.org:8080/#/c/15963/12/tests/query_test/test_sort.py File tests/query_test/test_sort.py: http://gerrit.cloudera.org:8080/#/c/15963/12/tests/query_test/test_sort.py@74 PS12, Line 74: """The first sort run is given a privilege to ignore sort_run_bytes_limit, except :when estimate hints that spill is inevitable. The lower sort_run_bytes_limit of :a query is, the more sort runs are likely to be produced. :Case 1 : 1 run produced, because all rows fit within the maximum reservation. : sort_run_bytes_limit is not enforced. :Case 2 : 3 run produced, because the first run hit reservation limit, and the : next 2 runs are capped to 150m. :Case 3 : 4 run produced, because sort node estimate that spill is inevitable. 
: So all runs are capped to 130m, including the first one.""" Isn't there something in query_result.runtime_profile that could be used to check some of these statements? E.g. I think we can check that no spilling occurred for case 1, but it did occur for case 2 and 3 -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 12 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 15:00:47 + Gerrit-HasComments: Yes
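The concern about an unknown cardinality can be shown with a short sketch. It is not the sort-node code: the function names and the constant `UNKNOWN_CARDINALITY` are hypothetical, and the only fact taken from the review is that the cardinality may be -1 when it is unknown. What the caller should do when no estimate is available is the open question in this thread, so the sketch deliberately leaves that decision out.

```python
from typing import Optional

UNKNOWN_CARDINALITY = -1  # assumed sentinel for "no estimate available"


def naive_input_bytes(cardinality: int, row_size: int) -> int:
    """Multiplies blindly, so an unknown cardinality yields a negative size."""
    return cardinality * row_size


def guarded_input_bytes(cardinality: int, row_size: int) -> Optional[int]:
    """Returns None instead of a bogus number when the inputs are not usable."""
    if cardinality < 0 or row_size <= 0:
        return None
    return cardinality * row_size


print(naive_input_bytes(UNKNOWN_CARDINALITY, 24))
print(guarded_input_bytes(UNKNOWN_CARDINALITY, 24))
print(guarded_input_bytes(1000, 24))
```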
[Impala-ASF-CR] IMPALA-9829: Add Write Metrics for Spilling
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16083 ) Change subject: IMPALA-9829: Add Write Metrics for Spilling .. Patch Set 10: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6418/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16083 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914 Gerrit-Change-Number: 16083 Gerrit-PatchSet: 10 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 25 Jun 2020 14:59:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9829: Add Write Metrics for Spilling
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/16083 ) Change subject: IMPALA-9829: Add Write Metrics for Spilling .. Patch Set 10: (10 comments) http://gerrit.cloudera.org:8080/#/c/16083/8/be/src/runtime/io/disk-io-mgr-test.cc File be/src/runtime/io/disk-io-mgr-test.cc: http://gerrit.cloudera.org:8080/#/c/16083/8/be/src/runtime/io/disk-io-mgr-test.cc@1665 PS8, Line 1665: for (int i = 0; i < num_disks; i++) { > Missing spaces: Have got the clang-format-diff setting done. The style problem should be solved now. http://gerrit.cloudera.org:8080/#/c/16083/8/be/src/runtime/io/disk-io-mgr-test.cc@1666 PS8, Line 1666: string key_prefix = "impala-server.io-mgr.queue-"; > The space at the end before the semicolon can probably be removed. Same app Done http://gerrit.cloudera.org:8080/#/c/16083/8/be/src/runtime/io/disk-io-mgr-test.cc@1673 PS8, Line 1673: auto write_latency_org = > Use proper spacing like follows: Done http://gerrit.cloudera.org:8080/#/c/16083/8/be/src/runtime/io/disk-io-mgr-test.cc@1677 PS8, Line 1677: if (write_latency_org != nullptr) write_latency_org->Reset(); > Use proper spacing like follows: Done http://gerrit.cloudera.org:8080/#/c/16083/8/be/src/runtime/io/disk-io-mgr-test.cc@1699 PS8, Line 1699: ASSERT_OK(writer->AddWriteRange(*new_range)); > Note the disk id is always 0 because of `num_ranges % num_disks`. Was the i Done http://gerrit.cloudera.org:8080/#/c/16083/8/be/src/runtime/io/disk-io-mgr-test.cc@1711 PS8, Line 1711: uint64_t max_value = metric->MaxValue(); > Suggest removing the space before semicolon. Done http://gerrit.cloudera.org:8080/#/c/16083/8/be/src/runtime/io/disk-io-mgr-test.cc@1721 PS8, Line 1721: for (int i = 0; i < num_disks; i++) { > Don't think you need the if else block here. 
The code right now is only usi Done http://gerrit.cloudera.org:8080/#/c/16083/8/be/src/runtime/io/disk-io-mgr-test.cc@1735 PS8, Line 1735: InitRootReservation(LARGE_RESERVATION_LIMIT); > Missing space after 'for' Done http://gerrit.cloudera.org:8080/#/c/16083/8/be/src/runtime/io/disk-io-mgr-test.cc@1783 PS8, Line 1783: > The space before closing bracket could be removed. Done http://gerrit.cloudera.org:8080/#/c/16083/8/be/src/util/histogram-metric.h File be/src/util/histogram-metric.h: http://gerrit.cloudera.org:8080/#/c/16083/8/be/src/util/histogram-metric.h@49 PS8, Line 49: uint64_t MinValue() const { return histogram_->MinValue(); } > The preferred style is no space before the semicolon, but space between sem Done -- To view, visit http://gerrit.cloudera.org:8080/16083 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914 Gerrit-Change-Number: 16083 Gerrit-PatchSet: 10 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 25 Jun 2020 14:32:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9829: Add Write Metrics for Spilling
Yida Wu has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/16083 ) Change subject: IMPALA-9829: Add Write Metrics for Spilling .. IMPALA-9829: Add Write Metrics for Spilling Three types of metrics are added in disk-io-mgr: 1. impala-server.io-mgr.queue-$0.write-latency, unit: ns, kind: HistogramMetric 2. impala-server.io-mgr.queue-$0.write-size, unit: Bytes, kind: HistogramMetric 3. impala-server.io-mgr.queue-$0.write-io-error, kind: IntCounter Write size, latency and io errors will be recorded in impala::io::DiskIoMgr::Write. Testing: * added DiskIoMgrTest.MetricsOfWriteSizeAndLatency * added DiskIoMgrTest.MetricsOfWriteIoError Ran unit test disk-io-mgr-test and pre-commit test Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914 --- M be/src/runtime/io/disk-io-mgr-internal.h M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/util/histogram-metric.h M common/thrift/metrics.json 6 files changed, 258 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/16083/10 -- To view, visit http://gerrit.cloudera.org:8080/16083 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914 Gerrit-Change-Number: 16083 Gerrit-PatchSet: 10 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu
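For readers unfamiliar with the `$0` placeholder in the three metric names above, the following sketch shows how per-queue keys could be derived from them. The template strings are copied from the commit message; the helper `metric_key` and the assumption that `$0` is replaced by a numeric queue index are illustrative and not taken from the patch.

```python
# Metric key templates as listed in the commit message.
METRIC_TEMPLATES = {
    "write-latency": "impala-server.io-mgr.queue-$0.write-latency",
    "write-size": "impala-server.io-mgr.queue-$0.write-size",
    "write-io-error": "impala-server.io-mgr.queue-$0.write-io-error",
}


def metric_key(template: str, queue_id: int) -> str:
    """Fill the $0 placeholder with a queue index (assumed substitution rule)."""
    return template.replace("$0", str(queue_id))


# Keys for two hypothetical disk queues.
for queue_id in range(2):
    for template in METRIC_TEMPLATES.values():
        print(metric_key(template, queue_id))
```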
[Impala-ASF-CR] IMPALA-2658: Extend the NDV function to accept a precision
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15997 ) Change subject: IMPALA-2658: Extend the NDV function to accept a precision .. Patch Set 40: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6417/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15997 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58 Gerrit-Change-Number: 15997 Gerrit-PatchSet: 40 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Thu, 25 Jun 2020 14:28:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2658: Extend the NDV function to accept a precision
Qifan Chen has uploaded a new patch set (#40). ( http://gerrit.cloudera.org:8080/15997 )

Change subject: IMPALA-2658: Extend the NDV function to accept a precision
..

IMPALA-2658: Extend the NDV function to accept a precision

This work addresses the current limitation in the NDV function by extending the function to optionally take a secondary argument called scale.

NDV([DISTINCT | ALL] expression [, scale])

Without the secondary argument, all the syntax and semantics are preserved. The precision, which determines the total number of different estimators in the HLL algorithm, is still 10.

When supplied, the scale argument must be an integer literal in the range from 1 to 10. Its value is internally mapped to a precision used by the HLL algorithm, with the following mapping formula: precision = scale + 8. Thus, a scale of 1 is mapped to a precision of 9 and a scale of 10 is mapped to a precision of 18.

A larger precision value generally produces a better estimate (i.e. with less error) than a smaller one, because more estimators are involved, at the expense of extra memory. For a given precision p, the amount of memory used by the HLL algorithm is in the order of 2^p bytes.

Testing:
1. Ran unit tests against table store_sales in TPC-DS and table customer in TPC-H in both serial and parallel plan settings;
2. Added and ran a new regression test (test_ndv) in TestAggregationQueries section to compute NDV() for every supported Impala data type over all valid scale values;
3. Ran "core" tests.

Performance:
1. Ran estimation error tests against a total of 22 distinct data sets loaded into external Impala tables. The error was computed as abs( - ) / . Overall, the precision of 18 (or the scale value of 10) gave the best result, with a worst estimation error of 0.42% (for one set of 10 million integers) and an average error of no more than 0.17%, at the cost of 256 KB of memory for the internal data structure per evaluation of the HLL algorithm. Other precisions (such as 16 and 17) were also very reasonable but with slightly larger estimation errors.
2. Ran execution time tests against a total of 6 distinct data files on a single node EC2 VM in debug mode. These data files were loaded in turn into a single column in an external Impala table. It was found that the total execution time was roughly the same across different scales for a given table configuration. The execution time for tables involving multiple data files across multiple nodes remains to be seen.
3. Ran execution time tests comparing the before- and after-enhancement version of NDV().

Change-Id: I48a4517bd0959f7021143073d37505a46c551a58
---
M be/src/common/logging.h
M be/src/exec/incr-stats-util-test.cc
M be/src/exec/incr-stats-util.cc
M be/src/exec/incr-stats-util.h
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M tests/query_test/test_aggregation.py
9 files changed, 426 insertions(+), 82 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/15997/40
--
To view, visit http://gerrit.cloudera.org:8080/15997
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58
Gerrit-Change-Number: 15997
Gerrit-PatchSet: 40
Gerrit-Owner: Qifan Chen
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Qifan Chen
Gerrit-Reviewer: Sahil Takiar
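
The scale-to-precision mapping and the memory estimate described in the IMPALA-2658 commit message can be sketched as a few lines of arithmetic. This is only an illustration, not the code from the patch: the function names are hypothetical, and the error formula 1.04 / sqrt(2^p) is the textbook HyperLogLog relative standard error, assumed here for comparison rather than taken from Impala's implementation.

```python
import math
from typing import Optional

DEFAULT_PRECISION = 10  # precision used when no scale is supplied (per the commit message)
MIN_SCALE, MAX_SCALE = 1, 10  # valid range of the scale argument (per the commit message)


def scale_to_precision(scale: Optional[int] = None) -> int:
    """Map the optional NDV scale argument to an HLL precision (hypothetical helper)."""
    if scale is None:
        return DEFAULT_PRECISION
    if not MIN_SCALE <= scale <= MAX_SCALE:
        raise ValueError(f"scale must be in [{MIN_SCALE}, {MAX_SCALE}], got {scale}")
    return scale + 8  # mapping formula from the commit message


def hll_memory_bytes(precision: int) -> int:
    """Approximate memory use, on the order of 2^p bytes."""
    return 2 ** precision


def hll_relative_standard_error(precision: int) -> float:
    """Textbook HyperLogLog estimate: 1.04 / sqrt(m) with m = 2^p registers.

    ASSUMPTION: this is the generic HLL formula, not a measured or
    documented property of Impala's NDV().
    """
    return 1.04 / math.sqrt(2 ** precision)


if __name__ == "__main__":
    for scale in range(MIN_SCALE, MAX_SCALE + 1):
        p = scale_to_precision(scale)
        print(
            f"scale={scale:2d} precision={p:2d} "
            f"memory={hll_memory_bytes(p) // 1024:4d} KiB "
            f"std_err~{hll_relative_standard_error(p):.2%}"
        )
```

At scale 10 the precision is 18 and 2^18 bytes is 262144 bytes, i.e. 256 KiB, which matches the memory figure quoted in the performance notes. The textbook error at that precision is roughly 0.2%, the same order of magnitude as the reported average error of no more than 0.17%.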
[Impala-ASF-CR] IMPALA-9515: Full ACID Milestone 3: Read support for "original files"
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16001 ) Change subject: IMPALA-9515: Full ACID Milestone 3: Read support for "original files" .. Patch Set 11: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6047/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16001 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I176497ef9873ed7589bd3dee07d048a42dfad953 Gerrit-Change-Number: 16001 Gerrit-PatchSet: 11 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 25 Jun 2020 11:45:05 + Gerrit-HasComments: No