[Impala-ASF-CR] IMPALA-10361: Use field id to resolve columns for Iceberg tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16788 ) Change subject: IMPALA-10361: Use field id to resolve columns for Iceberg tables .. Patch Set 7: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6734/ -- To view, visit http://gerrit.cloudera.org:8080/16788 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I057bdc6ab2859cc4d40de5ed428d0c20028b8435 Gerrit-Change-Number: 16788 Gerrit-PatchSet: 7 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 08 Dec 2020 07:35:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10377 Improve the accuracy of resource estimation PlanNode does not consider some factors when estimating memory, this will cause a large error rate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16822 ) Change subject: IMPALA-10377 Improve the accuracy of resource estimation PlanNode does not consider some factors when estimating memory, this will cause a large error rate .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6735/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/16822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I46c656bc88b969f4de99e187df16be3887592f3d Gerrit-Change-Number: 16822 Gerrit-PatchSet: 1 Gerrit-Owner: Anonymous Coward <54liu...@163.com> Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 08 Dec 2020 07:14:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16792 ) Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint .. Patch Set 7: Code-Review+1 (2 comments) Thanks! http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@223 PS5, Line 223: // for single table query blocks, limit push > I am doing that check in HdfsPartitionPruner.pruneForSimpleLimit() line #20 Okay. That check will do the job. Thanks for pointing it out. http://gerrit.cloudera.org:8080/#/c/16792/7/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java: http://gerrit.cloudera.org:8080/#/c/16792/7/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@208 PS7, Line 208: tblRef.hasConvertLimitToSampleHint() We may need to document the case where the sample rate may be too small. For example, the sample rate is 1 on a 1000 row table, and the limit value is 500. Understand it is a user mistake now. -- To view, visit http://gerrit.cloudera.org:8080/16792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b Gerrit-Change-Number: 16792 Gerrit-PatchSet: 7 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 08 Dec 2020 03:57:17 + Gerrit-HasComments: Yes
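To make the "sample rate too small" case in the comment above concrete, here is a rough arithmetic sketch. It assumes the rate is a percentage applied uniformly to rows, which is a simplification for illustration and may not match how Impala actually samples; the function names are hypothetical and not part of the patch.

```python
def expected_sample_rows(table_rows: int, sample_percent: float) -> float:
    """Expected row count of a sample, assuming the rate is a row percentage."""
    return table_rows * sample_percent / 100.0


def sample_can_satisfy_limit(table_rows: int, sample_percent: float, limit: int) -> bool:
    """True if the expected sample is at least as large as the LIMIT value."""
    return expected_sample_rows(table_rows, sample_percent) >= limit


# The reviewer's example: rate 1 on a 1000-row table with LIMIT 500.
rows = expected_sample_rows(1000, 1)
ok = sample_can_satisfy_limit(1000, 1, 500)
```

Under these assumptions the sample is expected to hold about 10 rows, far short of the 500 the limit asks for, which is the situation the comment suggests documenting.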
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16622 ) Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. Patch Set 7: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Gerrit-Change-Number: 16622 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 08 Dec 2020 03:15:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16622 ) Change subject: IMPALA-10252: fix invalid runtime filters for outer joins ..

IMPALA-10252: fix invalid runtime filters for outer joins

The planner generates runtime filters for non-join conjuncts assigned to LEFT OUTER and FULL OUTER JOIN nodes. This is correct in many cases where NULLs stemming from unmatched rows would result in the predicate evaluating to false. E.g. x = y is always false if y is NULL.

However, it is incorrect if the NULL returned from the unmatched row can result in the predicate evaluating to true. E.g. x = isnull(y, 1) can return true even if y is NULL.

The fix is to detect cases when the source expression from the left input of the join returns non-NULL for null inputs and then skip generating the filter. Examples of expressions that may be affected by this change are COALESCE and ISNULL.

Testing:
Added regression tests:
* Planner tests for LEFT OUTER and FULL OUTER where the runtime filter was incorrectly generated before this patch.
* Enabled end-to-end test that was previously failing.
* Added a new runtime filter test that will execute on both Parquet and Kudu (which are subtly different because of nullability of slots).

Ran exhaustive tests.
Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Reviewed-on: http://gerrit.cloudera.org:8080/16622 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test M tests/query_test/test_queries.py 6 files changed, 144 insertions(+), 9 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/16622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Gerrit-Change-Number: 16622 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
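The rule described in the IMPALA-10252 commit message can be illustrated with a small model. This is only a sketch, not the planner code: it represents SQL NULL as Python `None`, treats an expression as a one-argument callable, and probes it with a NULL input instead of doing the static analysis a real planner would. All names below are hypothetical.

```python
from typing import Callable, Optional

Value = Optional[int]  # None stands in for SQL NULL


def isnull(value: Value, default: int) -> int:
    """Model of ISNULL(value, default): replaces NULL with the default."""
    return default if value is None else value


def returns_null_for_null_input(source_expr: Callable[[Value], Value]) -> bool:
    """True if the expression yields NULL when its input is NULL."""
    return source_expr(None) is None


def should_generate_filter(source_expr: Callable[[Value], Value]) -> bool:
    """Skip the runtime filter when the expression turns NULL into a non-NULL value."""
    return returns_null_for_null_input(source_expr)


plain_column = lambda y: y                # models "x = y"
wrapped_column = lambda y: isnull(y, 1)   # models "x = isnull(y, 1)"
```

With this model, `should_generate_filter(plain_column)` holds, while `should_generate_filter(wrapped_column)` does not, mirroring the two examples in the commit message.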
[Impala-ASF-CR] IMPALA-10361: Use field id to resolve columns for Iceberg tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16788 ) Change subject: IMPALA-10361: Use field id to resolve columns for Iceberg tables .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6734/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16788 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I057bdc6ab2859cc4d40de5ed428d0c20028b8435 Gerrit-Change-Number: 16788 Gerrit-PatchSet: 7 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 08 Dec 2020 02:00:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16792 ) Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@223 PS5, Line 223: // for single table query blocks, limit push > Thanks for the explanation. Yes, I am totally fine with non-sampling based I am doing that check in HdfsPartitionPruner.pruneForSimpleLimit() line #208: if (tblRef.hasConvertLimitToSampleHint()) {. That will make sure that even in the single table case, do the sampling only if the hint is present. Were you wanting that check to be applied here ? Given the way the code is structured, checking for that hint here will cause the other (non-sample) optimization to be skipped which is not what we want. -- To view, visit http://gerrit.cloudera.org:8080/16792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b Gerrit-Change-Number: 16792 Gerrit-PatchSet: 7 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 08 Dec 2020 01:21:12 + Gerrit-HasComments: Yes
[native-toolchain-CR] IMPALA-9985/IMPALA-10011: Update supported platforms, fix Maven download
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16823 ) Change subject: IMPALA-9985/IMPALA-10011: Update supported platforms, fix Maven download .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16823 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib622f1e636e8b17b72c67b3d48964acea20db9ff Gerrit-Change-Number: 16823 Gerrit-PatchSet: 2 Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 08 Dec 2020 01:13:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16792 ) Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@223 PS5, Line 223: // for single table query blocks, limit push > Right, in case 1, the limit to sampling will not be applied but if you reca Thanks for the explanation. Yes, I am totally fine with non-sampling based simple-limit pushdown. For the sampling based simple-limit pushdown, would it be better to check that the hint is present before applying the optimization? That is, if not asked, do not do it. -- To view, visit http://gerrit.cloudera.org:8080/16792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b Gerrit-Change-Number: 16792 Gerrit-PatchSet: 7 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 08 Dec 2020 01:00:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10381: Fix overloading of --ldap_passwords_in_clear_ok
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16829 ) Change subject: IMPALA-10381: Fix overloading of --ldap_passwords_in_clear_ok .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7791/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16829 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12ee3a857365c0fca261a8b06de2321ed6b40a83 Gerrit-Change-Number: 16829 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Dec 2020 23:56:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10381: Fix overloading of --ldap_passwords_in_clear_ok
Thomas Tauber-Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16829 ) Change subject: IMPALA-10381: Fix overloading of --ldap_passwords_in_clear_ok .. IMPALA-10381: Fix overloading of --ldap_passwords_in_clear_ok The --ldap_passwords_in_clear_ok flag was originally intended to allow configurations where Impala connects to LDAP without SSL, for testing purposes. Since then, two other uses of the flag have been added: 1) for controlling whether cookies include the 'Secure' attribute and 2) for controlling whether the webserver allows LDAP auth to be enabled if SSL isn't. Some use cases may prefer to control these values separately, so this patch separates them into three different flags. Testing: - Updated existing tests that use --ldap_passwords_in_clear_ok Change-Id: I12ee3a857365c0fca261a8b06de2321ed6b40a83 --- M be/src/rpc/authentication-util.cc M be/src/util/webserver-test.cc M be/src/util/webserver.cc M fe/src/test/java/org/apache/impala/customcluster/LdapJdbcTest.java M fe/src/test/java/org/apache/impala/customcluster/LdapWebserverTest.java 5 files changed, 17 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/16829/1 -- To view, visit http://gerrit.cloudera.org:8080/16829 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I12ee3a857365c0fca261a8b06de2321ed6b40a83 Gerrit-Change-Number: 16829 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16720 ) Change subject: IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. Patch Set 29: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7790/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 29 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 07 Dec 2020 23:15:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16720 ) Change subject: IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. Patch Set 29: (8 comments) http://gerrit.cloudera.org:8080/#/c/16720/29/be/src/util/min-max-filter-test.cc File be/src/util/min-max-filter-test.cc: http://gerrit.cloudera.org:8080/#/c/16720/29/be/src/util/min-max-filter-test.cc@589 PS29, Line 589: EXPECT_EQ(overflow, false); \ line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/16720/29/be/src/util/min-max-filter-test.cc@592 PS29, Line 592: EXPECT_EQ(overflow, false); \ line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/16720/29/be/src/util/min-max-filter-test.cc@597 PS29, Line 597: EXPECT_EQ(overflow, false); \ line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/16720/29/be/src/util/min-max-filter-test.cc@600 PS29, Line 600: EXPECT_EQ(overflow, false); \ line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/16720/29/be/src/util/min-max-filter-test.cc@649 PS29, Line 649: CheckDecimalVals(filter##SIZE, decimal##SIZE##_type, d1##SIZE, d1##SIZE); \ line too long (108 > 90) http://gerrit.cloudera.org:8080/#/c/16720/29/be/src/util/min-max-filter-test.cc@653 PS29, Line 653: CheckDecimalVals(filter##SIZE, decimal##SIZE##_type, d1##SIZE, d2##SIZE); \ line too long (108 > 90) http://gerrit.cloudera.org:8080/#/c/16720/29/be/src/util/min-max-filter-test.cc@657 PS29, Line 657: CheckDecimalVals(filter##SIZE, decimal##SIZE##_type, d3##SIZE, d2##SIZE); \ line too long (108 > 90) http://gerrit.cloudera.org:8080/#/c/16720/29/be/src/util/min-max-filter-test.cc@669 PS29, Line 669: CheckDecimalVals(filter##SIZE##2, decimal##SIZE##_type, d3##SIZE, d2##SIZE); \ line too long (110 > 90) -- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment 
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 29 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 07 Dec 2020 22:54:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Qifan Chen has uploaded a new patch set (#29). ( http://gerrit.cloudera.org:8080/16720 ) Change subject: IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate ..

IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate

This patch adds a new class of predicates called overlap predicates to help determine whether a Parquet row group or page overlaps with a range computed from an equi hash join. If it does not, the entire Parquet row group or page is skipped. The new class of overlap predicates exists in the form of min/max filters. For the following query, the min and max in the min/max filter are computed after the hash table is populated with data from table 'b'. These two values are compared against the min/max of each row group or page at the scan node for 'a'.

  select straight_join count(*)
  from lineitem_sorted_l_shipdate a join [SHUFFLE] lineitem_sorted_l_shipdate b
  where a.l_shipdate = b.l_receiptdate and b.l_commitdate = "1992-01-31";

An overlap predicate associated with a join column of type J and a scan column of type S will be formed provided one of the following is true:
- Both J and S are Booleans
- Both J and S are Integers (tinyint, smallint, int, or bigint)
- Both J and S are approximate numeric (float or double)
- Both J and S are Decimals with the same precision and scale
- Both J and S are strings (STRING, CHAR or VARCHAR)
- Both J and S are date
- Both J and S are timestamp

As with any existing min/max filters, the MAX_NUM_RUNTIME_FILTERS query option does not apply to the min/max filters created for overlap predicates. The overlap predicates will always be evaluated, after the min/max conjuncts (if any).

Two new run-time profile counters are added to report the number of row groups or pages filtered out via the overlap predicates respectively:
1. NumMinMaxFilteredRowGroups
2. NumMinMaxFilteredPages

Testing:
1. Unit tested on various column types with TPCH and TPCDS tables. Benefits were significant when the join column on the outer table is sorted, or when the min/max boundary values of the pages or row groups are monotonic;
2. Added new tests in min_max_filters.test for join column type compatibility and to demonstrate the number of filtered out pages and row groups with the two new profile counters;
3. Added data type specific overlap method tests in min-max-filter-test.cc;
4. Core testing.

TBD in this patch:
1. Performance measurement.

To do in follow-up JIRAs:
1. Apply the overlap predicate on partition columns;
2. IR code-gen for various MinMaxFilter::EvalOverlap methods.

Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
---
M be/src/exec/exec-node.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-column-stats.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/date-value.cc
M be/src/runtime/date-value.h
M be/src/runtime/decimal-value.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test
23 files changed, 1,272 insertions(+), 153 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/16720/29
-- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 29 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy
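The overlap test at the heart of IMPALA-10325 can be sketched as a plain interval-intersection check. The snippet below is a minimal illustration and not the `MinMaxFilter::EvalOverlap` implementation: it assumes both ranges are inclusive, and the `Range` class and function names are made up for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, List


@dataclass(frozen=True)
class Range:
    """Inclusive [lo, hi] range of comparable values (an assumption of this sketch)."""
    lo: Any
    hi: Any


def overlaps(filter_range: Range, stats: Range) -> bool:
    """Two inclusive ranges intersect iff each one starts before the other ends."""
    return filter_range.lo <= stats.hi and stats.lo <= filter_range.hi


def row_groups_to_scan(filter_range: Range, row_group_stats: List[Range]) -> List[int]:
    """Indexes of row groups whose min/max statistics overlap the filter range."""
    return [i for i, s in enumerate(row_group_stats) if overlaps(filter_range, s)]


# Min/max collected from the build side of the join (hypothetical values).
join_filter = Range(date(1992, 2, 1), date(1992, 3, 1))

# Per-row-group min/max of the scan column, sorted as in the commit message's best case.
stats = [
    Range(date(1992, 1, 1), date(1992, 6, 30)),
    Range(date(1992, 7, 1), date(1992, 12, 31)),
    Range(date(1993, 1, 1), date(1993, 6, 30)),
]

kept = row_groups_to_scan(join_filter, stats)
```

Here only the first row group is kept and the other two are skipped, which matches the note in the testing section that the benefit is largest when the scan column is sorted and the per-group ranges barely overlap.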
[Impala-ASF-CR] IMPALA-9865: part 1: basic profile log parser
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16821 ) Change subject: IMPALA-9865: part 1: basic profile log parser .. Patch Set 4: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/16821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6178399ac96e176f7067cc47347e51cda2f3 Gerrit-Change-Number: 16821 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Dec 2020 22:39:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9865: part 1: basic profile log parser
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16821 ) Change subject: IMPALA-9865: part 1: basic profile log parser .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7789/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6178399ac96e176f7067cc47347e51cda2f3 Gerrit-Change-Number: 16821 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Dec 2020 22:29:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9865: part 1: basic profile log parser
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16821 ) Change subject: IMPALA-9865: part 1: basic profile log parser .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7788/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6178399ac96e176f7067cc47347e51cda2f3 Gerrit-Change-Number: 16821 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Dec 2020 22:27:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9865: part 1: basic profile log parser
Hello Riza Suminto, David Rorke, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16821 to look at the new patch set (#4). Change subject: IMPALA-9865: part 1: basic profile log parser .. IMPALA-9865: part 1: basic profile log parser This adds a utility that consumes the Impala profile log format from stdin and pretty-prints the profiles. It supports some basic filters - --query_id, --min_timestamp and --max_timestamp. If --gen_experimental_profile=true is set, it dumps the aggregated part of the profile with the full output for the new experimental profiles. In a future change, we should detect this based on the profile version set. This utility will be extended in future with more options, but is already useful in that it can handle the new experimental profile format and produce pretty-printed output consistent with the Impala web UI and impala-shell. Change-Id: I6178399ac96e176f7067cc47347e51cda2f3 --- M be/src/util/CMakeLists.txt A be/src/util/impala-profile-tool.cc M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h 4 files changed, 116 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/16821/4 -- To view, visit http://gerrit.cloudera.org:8080/16821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6178399ac96e176f7067cc47347e51cda2f3 Gerrit-Change-Number: 16821 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong
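The filtering behaviour described for the profile tool can be sketched roughly as follows. This is an illustration only, not the C++ utility from the patch: it assumes each profile log line has the form `<timestamp> <query_id> <encoded_profile>` separated by whitespace, leaves the encoded profile undecoded, and uses a hypothetical function name. The parameter names mirror the `--query_id`, `--min_timestamp` and `--max_timestamp` options mentioned above.

```python
from typing import Iterable, Iterator, Optional, Tuple


def filter_profile_lines(
    lines: Iterable[str],
    query_id: Optional[str] = None,
    min_timestamp: Optional[int] = None,
    max_timestamp: Optional[int] = None,
) -> Iterator[Tuple[int, str, str]]:
    """Yield (timestamp, query_id, payload) for log lines that pass every filter."""
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed three-field layout
        ts_text, qid, payload = parts
        try:
            ts = int(ts_text)
        except ValueError:
            continue  # skip lines whose first field is not an integer timestamp
        if query_id is not None and qid != query_id:
            continue
        if min_timestamp is not None and ts < min_timestamp:
            continue
        if max_timestamp is not None and ts > max_timestamp:
            continue
        yield ts, qid, payload.strip()
```

Because the function accepts any iterable of lines, it could be fed `sys.stdin` directly to mimic reading the log from standard input one line at a time.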
[Impala-ASF-CR] IMPALA-9865: part 1: basic profile log parser
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16821 ) Change subject: IMPALA-9865: part 1: basic profile log parser .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/16821/2/be/src/util/impala-profile-tool.cc File be/src/util/impala-profile-tool.cc: http://gerrit.cloudera.org:8080/#/c/16821/2/be/src/util/impala-profile-tool.cc@31 PS2, Line 31: // is pretty-printed to standard output. > Add simple usage example in the doc maybe? like Done http://gerrit.cloudera.org:8080/#/c/16821/2/be/src/util/impala-profile-tool.cc@59 PS2, Line 59: getline(cin, line); > Tried to run the parser against my local runtime profile log. It seems It a Thanks, this seems to be a better way to use getline. I haven't done a lot of parsing using C++ streams so kinda learning as I go. -- To view, visit http://gerrit.cloudera.org:8080/16821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6178399ac96e176f7067cc47347e51cda2f3 Gerrit-Change-Number: 16821 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Dec 2020 22:04:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9865: part 1: basic profile log parser
Hello Riza Suminto, David Rorke, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16821 to look at the new patch set (#3). Change subject: IMPALA-9865: part 1: basic profile log parser .. IMPALA-9865: part 1: basic profile log parser This adds a utility that consumes the Impala profile log format from stdin and pretty-prints the profiles. It supports some basic filters - --query_id, --min_timestamp and --max_timestamp. If --gen_experimental_profile=true is set, it dumps the aggregated part of the profile with the full output for the new experimental profiles. In a future change, we should detect this based on the profile version set. This utility will be extended in future with more options, but is already useful in that it can handle the new experimental profile format and produce pretty-printed output consistent with the Impala web UI and impala-shell. Change-Id: I6178399ac96e176f7067cc47347e51cda2f3 --- M be/src/util/CMakeLists.txt A be/src/util/impala-profile-tool.cc M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h 4 files changed, 117 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/16821/3 -- To view, visit http://gerrit.cloudera.org:8080/16821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6178399ac96e176f7067cc47347e51cda2f3 Gerrit-Change-Number: 16821 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16792 ) Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint .. Patch Set 7: (1 comment) > Patch Set 7: > > (3 comments) > > Looks good! http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@223 PS5, Line 223: // for single table query blocks, limit push > Little bit of confusion here for case 1. Since no convert_limit_to_sample h Right, in case 1, the limit to sampling will not be applied but if you recall in https://gerrit.cloudera.org/c/16723/ we added the non-sampling based simple-limit pushdown ..so that's what I meant for case 1. This method gets used in both scenarios. I added a comment in the last patchset but maybe it is still confusing, I can clarify further...but let me know if this makes sense or not. -- To view, visit http://gerrit.cloudera.org:8080/16792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b Gerrit-Change-Number: 16792 Gerrit-PatchSet: 7 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Dec 2020 22:01:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16622 ) Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6733/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Gerrit-Change-Number: 16622 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Dec 2020 21:34:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16622 ) Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Gerrit-Change-Number: 16622 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Dec 2020 21:34:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16622 ) Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Gerrit-Change-Number: 16622 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Dec 2020 21:33:06 + Gerrit-HasComments: No
[Impala-ASF-CR] [WIP] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16720 ) Change subject: [WIP] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. Patch Set 28: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7787/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 28 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 07 Dec 2020 20:35:12 + Gerrit-HasComments: No
[Impala-ASF-CR] [WIP] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16720 ) Change subject: [WIP] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. Patch Set 28: (8 comments) http://gerrit.cloudera.org:8080/#/c/16720/28/be/src/util/min-max-filter-test.cc File be/src/util/min-max-filter-test.cc: http://gerrit.cloudera.org:8080/#/c/16720/28/be/src/util/min-max-filter-test.cc@589 PS28, Line 589: EXPECT_EQ(overflow, false); \ line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/16720/28/be/src/util/min-max-filter-test.cc@592 PS28, Line 592: EXPECT_EQ(overflow, false); \ line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/16720/28/be/src/util/min-max-filter-test.cc@597 PS28, Line 597: EXPECT_EQ(overflow, false); \ line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/16720/28/be/src/util/min-max-filter-test.cc@600 PS28, Line 600: EXPECT_EQ(overflow, false); \ line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/16720/28/be/src/util/min-max-filter-test.cc@649 PS28, Line 649: CheckDecimalVals(filter##SIZE, decimal##SIZE##_type, d1##SIZE, d1##SIZE); \ line too long (108 > 90) http://gerrit.cloudera.org:8080/#/c/16720/28/be/src/util/min-max-filter-test.cc@653 PS28, Line 653: CheckDecimalVals(filter##SIZE, decimal##SIZE##_type, d1##SIZE, d2##SIZE); \ line too long (108 > 90) http://gerrit.cloudera.org:8080/#/c/16720/28/be/src/util/min-max-filter-test.cc@657 PS28, Line 657: CheckDecimalVals(filter##SIZE, decimal##SIZE##_type, d3##SIZE, d2##SIZE); \ line too long (108 > 90) http://gerrit.cloudera.org:8080/#/c/16720/28/be/src/util/min-max-filter-test.cc@669 PS28, Line 669: CheckDecimalVals(filter##SIZE##2, decimal##SIZE##_type, d3##SIZE, d2##SIZE); \ line too long (110 > 90) -- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment 
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 28 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 07 Dec 2020 20:14:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [WIP] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Qifan Chen has uploaded a new patch set (#28). ( http://gerrit.cloudera.org:8080/16720 ) Change subject: [WIP] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. [WIP] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate This patch adds a new class of predicates called overlap predicates to help determine whether a Parquet row group or a page overlaps with a range computed from an equi hash join. If not, then the entire Parquet row group or page is skipped. The new class of overlap predicates exists in the form of min/max filters. For the following query, the min and max in the min/max filter are computed after the hash table is populated with data from table 'b'. These two values are compared against the min/max of each row group or page at the scan node for 'a'. select straight_join count(*) from lineitem_sorted_l_shipdate a join [SHUFFLE] lineitem_sorted_l_shipdate b where a.l_shipdate = b.l_receiptdate and b.l_commitdate = "1992-01-31"; An overlap predicate associated with a join column of type J and a scan column of type S will be formed provided one of the following is true: both J and S are Booleans; both J and S are integers (tinyint, smallint, int, or bigint); both J and S are approximate numeric (float or double); both J and S are Decimals with the same precision and scale; both J and S are strings (STRING, CHAR or VARCHAR); both J and S are date; both J and S are timestamp. As with any existing min/max filters, the MAX_NUM_RUNTIME_FILTERS query option does not apply to the min/max filters created for overlap predicates. The overlap predicates will always be evaluated, after the min/max conjuncts (if any). Two new run-time profile counters are added to report the number of row groups and pages, respectively, filtered out via the overlap predicates: 1. NumMinMaxFilteredRowGroups 2. NumMinMaxFilteredPages Testing: 1. 
Added data type specific overlap method tests in min-max-filter-test.cc (boolean, int, string, date, timestamp and decimal); 2. Unit tested on various column types (int, bigint, string and decimal) with TPCH and TPCDS tables. Benefits were significant when the join column on the outer table is sorted, or when the min/max boundary values of the pages or row groups are monotonic; 3. Added new tests in min_max_filters.test to demonstrate the number of filtered out pages and row groups. 4. Core testing. TBD: 1. Performance measurement; Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 --- M be/src/exec/exec-node.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-stats.cc M be/src/exec/parquet/parquet-column-stats.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator.cc M be/src/runtime/date-value.cc M be/src/runtime/date-value.h M be/src/runtime/decimal-value.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/util/min-max-filter-test.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test 23 files changed, 1,090 insertions(+), 153 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/16720/28 -- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 
28 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy
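The overlap test described in the commit message above can be sketched as a closed-interval intersection check. This is a minimal illustration, not the C++ code in min-max-filter.cc; the names Range, overlaps and row_groups_to_scan are hypothetical, and the sketch assumes both ranges are inclusive and hold values of the same comparable type.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class Range:
    """Inclusive [lo, hi] range, e.g. row-group statistics or a join-side filter."""
    lo: Any
    hi: Any


def overlaps(stats: Range, filt: Range) -> bool:
    """True when the two inclusive ranges share at least one value."""
    return stats.lo <= filt.hi and filt.lo <= stats.hi


def row_groups_to_scan(row_groups: List[Range], filt: Range) -> List[Range]:
    """Keep only the row groups (or pages) whose min/max range overlaps the filter."""
    return [rg for rg in row_groups if overlaps(rg, filt)]
```

When the scan column is sorted, the per-row-group ranges barely intersect each other, so a narrow filter range discards most of them; this matches the observation in the testing notes that the benefit is largest for sorted join columns.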
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16622 ) Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. Patch Set 6: Code-Review+1 > Patch Set 5: > > Updated the commit message as requested. > > Shant, I think hasNullRejectingConjucts (sp) in Analyzer.java handles at > least this case correctly - it does call isTrueWithNullSlots() on the > expression. I guess it's possible that it might handle more complex > expressions incorrectly, e.g. if the expression has slots from both sides of > the join and is false when all slots are null but true if a subset of slots > is null. > > > > [localhost.EXAMPLE.COM:21050] default> set > ENABLE_OUTER_JOIN_TO_INNER_TRANSFORMATION=1; > ENABLE_OUTER_JOIN_TO_INNER_TRANSFORMATION set to 1 > [localhost.EXAMPLE.COM:21050] default> explain select * from > functional.alltypes t1 left outer join functional.alltypestiny t2 on t1.id = > t2.id where zeroifnull(t2.int_col) = 0; > Query: explain select * from functional.alltypes t1 left outer join > functional.alltypestiny t2 on t1.id = t2.id where zeroifnull(t2.int_col) = 0 > ++ > | Explain String | > ++ > | Max Per-Host Resource Reservation: Memory=1.98MB Threads=5 | > | Per-Host Resource Estimates: Memory=163MB | > | Codegen disabled by planner| > || > | PLAN-ROOT SINK | > | | | > | 04:EXCHANGE [UNPARTITIONED]| > | | | > | 02:HASH JOIN [LEFT OUTER JOIN, BROADCAST] | > | | hash predicates: t1.id = t2.id | > | | other predicates: zeroifnull(t2.int_col) = 0| > | | row-size=178B cardinality=7.30K | > | | | > | |--03:EXCHANGE [BROADCAST] | > | | | | > | | 01:SCAN HDFS [functional.alltypestiny t2] | > | | HDFS partitions=4/4 files=4 size=460B| > | | row-size=89B cardinality=8 | > | | | > | 00:SCAN HDFS [functional.alltypes t1] | > |HDFS partitions=24/24 files=24 size=478.45KB| > |row-size=89B cardinality=7.30K | > ++ > Fetched 22 row(s) in 0.05s > [localhost.EXAMPLE.COM:21050] default> explain select * from > functional.alltypes t1 
left outer join functional.alltypestiny t2 on t1.id = > t2.id where t2.int_col = 0; > Query: explain select * from functional.alltypes t1 left outer join > functional.alltypestiny t2 on t1.id = t2.id where t2.int_col = 0 > ++ > | Explain String | > ++ > | Max Per-Host Resource Reservation: Memory=2.98MB Threads=5 | > | Per-Host Resource Estimates: Memory=163MB | > | Codegen disabled by planner| > || > | PLAN-ROOT SINK | > | | | > | 04:EXCHANGE [UNPARTITIONED]| > | | | > | 02:HASH JOIN [INNER JOIN, BROADCAST] | > | | hash predicates: t1.id = t2.id | > | | runtime filters: RF000 <- t2.id | > | | row-size=178B cardinality=4 | > | | | > | |--03:EXCHANGE [BROADCAST] | > | | | | > | | 01:SCAN HDFS [functional.alltypestiny t2] | > | | HDFS partitions=4/4 files=4 size=460B| > | | predicates: t2.int_col = 0 | > | | row-size=89B cardinality=4 | > | | | > | 00:SCAN HDFS [functional.alltypes t1] | > |HDFS partitions=24/24 files=24 size=478.45KB| > |
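The difference between the two quoted plans comes down to whether the WHERE predicate can be true when the t2 slots are NULL. A minimal sketch of that check follows, using Python's None for SQL NULL. It only models a single-slot predicate and is not the logic in Analyzer.java; the helper names eq and is_null_rejecting are hypothetical.

```python
from typing import Callable, Optional


def zeroifnull(value: Optional[int]) -> int:
    """Return 0 for NULL, otherwise the value itself."""
    return 0 if value is None else value


def eq(left: Optional[int], right: Optional[int]) -> Optional[bool]:
    """SQL-style equality: the result is NULL when either operand is NULL."""
    if left is None or right is None:
        return None
    return left == right


def is_null_rejecting(pred: Callable[[Optional[int]], Optional[bool]]) -> bool:
    """A predicate rejects NULLs if it does not evaluate to TRUE on a NULL slot."""
    return pred(None) is not True


# zeroifnull(t2.int_col) = 0 is TRUE for NULL-extended rows, so the join stays outer.
keeps_outer_join = not is_null_rejecting(lambda col: eq(zeroifnull(col), 0))
# t2.int_col = 0 is NULL (not TRUE) for NULL-extended rows, so an inner join is safe.
allows_inner_join = is_null_rejecting(lambda col: eq(col, 0))
```

The multi-slot case raised in the quoted comment (false when all slots are NULL but true when only a subset is NULL) is not covered by this single-slot sketch.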
[Impala-ASF-CR] IMPALA-10380: INSERT INTO identity-partitioned Iceberg tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16825 ) Change subject: IMPALA-10380: INSERT INTO identity-partitioned Iceberg tables .. Patch Set 2: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6732/ -- To view, visit http://gerrit.cloudera.org:8080/16825 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If98797a2bfdc038d0467c8f83aadf1a12e1d69d4 Gerrit-Change-Number: 16825 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Mon, 07 Dec 2020 19:29:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10380: INSERT INTO identity-partitioned Iceberg tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16825 ) Change subject: IMPALA-10380: INSERT INTO identity-partitioned Iceberg tables .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6732/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16825 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If98797a2bfdc038d0467c8f83aadf1a12e1d69d4 Gerrit-Change-Number: 16825 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Mon, 07 Dec 2020 18:09:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10380: INSERT INTO identity-partitioned Iceberg tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16825 ) Change subject: IMPALA-10380: INSERT INTO identity-partitioned Iceberg tables .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7786/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16825 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If98797a2bfdc038d0467c8f83aadf1a12e1d69d4 Gerrit-Change-Number: 16825 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Mon, 07 Dec 2020 17:36:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10380: INSERT INTO identity-partitioned Iceberg tables
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16825 ) Change subject: IMPALA-10380: INSERT INTO identity-partitioned Iceberg tables .. IMPALA-10380: INSERT INTO identity-partitioned Iceberg tables This patch adds support for INSERT INTO identity-partitioned Iceberg tables. Identity-partitioned Iceberg tables are similar to regular partitioned tables; they are even stored in the same directory structure. The difference is that the data files still store the partitioning columns. Partitioned Iceberg tables are stored as non-partitioned tables in the Hive Metastore (similarly to partitioned Kudu tables). However, the InsertStmt still generates the partition expressions for them. These partition expressions are used to shuffle and sort the input data so we don't end up writing too many files. The HdfsTableSink also uses the partition expressions to write the data files with the proper partition paths. Iceberg is able to parse the partition paths to generate the corresponding metadata for the partitions. This happens at the end in IcebergCatalogOpExecutor. 
Testing: * added planner test to verify shuffling and sorting * added negative tests for unsupported features like PARTITION clause and non-identity partition transforms * e2e tests with partitioned inserts Change-Id: If98797a2bfdc038d0467c8f83aadf1a12e1d69d4 --- M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/runtime/coordinator.cc M be/src/runtime/dml-exec-state.cc M be/src/service/client-request-state.cc M common/fbs/IcebergObjects.fbs M common/thrift/CatalogService.thrift M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionTransform.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/catalog/IcebergColumn.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test A testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test M tests/query_test/test_iceberg.py 23 files changed, 400 insertions(+), 45 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/16825/1 -- To view, visit http://gerrit.cloudera.org:8080/16825 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: If98797a2bfdc038d0467c8f83aadf1a12e1d69d4 Gerrit-Change-Number: 16825 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan 
Borok-Nagy
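The role of the partition expressions in the change above can be sketched in a few lines: sort the rows by partition key so each partition is written contiguously, and derive the directory path from the key. This is only an illustration of the idea, not the HdfsTableSink code; it assumes the key=value directory layout of regular partitioned tables, omits any escaping of special characters, and the names partition_path and group_rows_by_partition are hypothetical.

```python
from itertools import groupby
from typing import Any, Dict, List, Sequence, Tuple


def partition_path(names: Sequence[str], values: Tuple[Any, ...]) -> str:
    """Build a key=value/key=value directory path (no escaping in this sketch)."""
    return "/".join(f"{name}={value}" for name, value in zip(names, values))


def group_rows_by_partition(
    rows: List[Dict[str, Any]], names: Sequence[str]
) -> Dict[str, List[Dict[str, Any]]]:
    """Sort rows by partition key, then collect each partition's rows under its path."""
    def key(row: Dict[str, Any]) -> Tuple[Any, ...]:
        return tuple(row[name] for name in names)

    ordered = sorted(rows, key=key)  # sorting makes each partition's rows contiguous
    return {partition_path(names, k): list(group) for k, group in groupby(ordered, key=key)}
```

Because the rows still carry the partition columns, the written files would keep those columns in the data as well, which is the difference from regular partitioned tables that the commit message points out.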
[native-toolchain-CR] IMPALA-9985/IMPALA-10011: Update supported platforms, fix Maven download
Hello Zoltan Borok-Nagy, Tim Armstrong, Joe McDonnell, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16823 to look at the new patch set (#2). Change subject: IMPALA-9985/IMPALA-10011: Update supported platforms, fix Maven download .. IMPALA-9985/IMPALA-10011: Update supported platforms, fix Maven download Based on recent operating system releases and retirements, the list of platforms supported by the binary toolchain build is updated as follows: Add CentOS 8. CentOS 8 support starts with version 8.2, which is the most recent minor release for this platform. Earlier releases were affected by several glibc problems, which impacted the GCC binaries in the toolchain (both 4.9.2 and 7.5), causing compilation failures when building Impala. A few of these problems are listed in [1], all of which are fixed in CentOS 8.2 by including an updated version of glibc. Remove Debian 8. Debian 8 reached end-of-support on June 30th, 2020; see the announcement at [2]. The repo signature verification problems described in IMPALA-10011 were a corollary to this retirement, as infrastructure fell into disuse. This patch removes Debian 8 from the list of Docker containers built for toolchain builds and from the toolchain target list. Additionally, change the Maven download URL to point to the new Apache official archive site for a stable and maintained repo location. [1] https://issues.apache.org/jira/browse/IMPALA-9985#comment-17163097 [2] https://www.debian.org/News/2020/20200709 Tests: 1. Built all the containers for toolchain builds 2. Built and published a new build of the toolchain for the new list of platforms, using the containers prepared in step 1. 3. Ran Impala core-mode tests on CentOS 7 and CentOS 8. Impala compiled successfully, so this fixes IMPALA-9985. Tests passed on CentOS 7. CentOS 8 showed a single cipherset-related failure in thrift-server-test, for which an upstream Jira will be filed. 
All these steps (except the thrift-server-test failure in step 3) were successful; they were executed on private infrastructure. Change-Id: Ib622f1e636e8b17b72c67b3d48964acea20db9ff --- M Makefile M docker/all/postinstall.sh D docker/debian/8/99no-check-valid-until D docker/debian/8/sources.list D docker/debian8.df M docker/redhat8.df M in-docker.py 7 files changed, 5 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/23/16823/2 -- To view, visit http://gerrit.cloudera.org:8080/16823 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib622f1e636e8b17b72c67b3d48964acea20db9ff Gerrit-Change-Number: 16823 Gerrit-PatchSet: 2 Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16792 ) Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint .. Patch Set 7: (3 comments) Looks good! http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@223 PS5, Line 223: // for single table query blocks, limit push > We cannot remove this because as I mentioned in a previous comment (patchse Little bit of confusion here for case 1. Since no convert_limit_to_sample hint is available, I thought we will not apply the limit to sample optimization. Note in the commit message, it says the hint is needed: "If this hint is attached to a table either in the main query block or within a view/subquery and simple limit optimization conditions are satisfied (according to IMPALA-10314), the limit is converted to a table sample" http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java: http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@209 PS2, Line 209: f.setTableSampleCl > I haven't looked into why the past decision was to only support whole numbe Done http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java: http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@217 PS5, Line 217: partitions) { > I went ahead and modified the hint to require a sampling percent be specifi Sounds good to me! 
-- To view, visit http://gerrit.cloudera.org:8080/16792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b Gerrit-Change-Number: 16792 Gerrit-PatchSet: 7 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Dec 2020 16:26:52 + Gerrit-HasComments: Yes
[native-toolchain-CR] IMPALA-9985/IMPALA-10011: Update supported platforms, fix Maven download
Laszlo Gaal has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16823 ) Change subject: IMPALA-9985/IMPALA-10011: Update supported platforms, fix Maven download .. IMPALA-9985/IMPALA-10011: Update supported platforms, fix Maven download Based on recent operating system releases and retirements, the list of platforms supported by the binary toolchain build is updated as follows: Add CentOS 8. CentOS 8 support starts with version 8.2, which is the most recent minor release for this platform. Earlier releases were affected by several glibc problems, which impacted the GCC binaries in the toolchain (both 4.9.2 and 7.5), causing compilation failures when building Impala. A few of these problems are listed in [1], all of which are fixed in CentOS 8.2 by including an updated version of glibc. Remove Debian 8. Debian 8 reached end-of-support on June 30th, 2020; see the announcement at [2]. The repo signature verification problems described in IMPALA-10011 were a corollary to this retirement, as infrastructure fell into disuse. This patch removes Debian 8 from the list of Docker containers built for toolchain builds and from the toolchain target list. Additionally, change the Maven download URL to point to the new Apache official archive site for a stable and maintained repo location. 
[1] https://issues.apache.org/jira/browse/IMPALA-9985#comment-17163097 [2] https://www.debian.org/News/2020/20200709 Change-Id: Ib622f1e636e8b17b72c67b3d48964acea20db9ff --- M Makefile M docker/all/postinstall.sh D docker/debian/8/99no-check-valid-until D docker/debian/8/sources.list D docker/debian8.df M docker/redhat8.df M in-docker.py 7 files changed, 5 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/23/16823/1 -- To view, visit http://gerrit.cloudera.org:8080/16823 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ib622f1e636e8b17b72c67b3d48964acea20db9ff Gerrit-Change-Number: 16823 Gerrit-PatchSet: 1 Gerrit-Owner: Laszlo Gaal