[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16792 ) Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7775/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b Gerrit-Change-Number: 16792 Gerrit-PatchSet: 4 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 04 Dec 2020 06:43:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint
Hello Qifan Chen, Shant Hovsepian, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16792 to look at the new patch set (#4). Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint .. IMPALA-10360: Allow simple limit to be treated as sampling hint As a follow-up to IMPALA-10314, it is sometimes useful to consider a simple limit as a way to sample from a table if a relevant hint has been provided. Doing a sample instead of a pure limit serves dual purposes: (a) it still helps with reducing the planning time since the scan ranges need to be computed only for the sample files, (b) it allows a sufficient number of files/rows to be read from the table such that after applying filter conditions or joins with another table, the query may still produce the N rows needed for the limit. This functionality is especially useful if the query is against a view (note that the TABLESAMPLE clause cannot be applied to a view). In this patch, a new table level hint, 'convert_limit_to_sample', is added. If this hint is attached to a table either in the main query block or within a view/subquery and the simple limit optimization conditions are satisfied (according to IMPALA-10314), the limit is converted to a table sample. For example: set optimize_simple_limit = true; CREATE VIEW v1 as SELECT * FROM T [convert_limit_to_sample] WHERE [always_true] ; SELECT * FROM v1 LIMIT 10; In this case, the limit 10 is converted to a sample of T and the sampling percent is the greater of 1% or the ratio (in percent) of the limit to the estimated row count of the table (after partition pruning). Testing: - Added an alltypes_date_partition_2 table where the date and timestamp values match (this helps with setting the 'always_true' hint). - Added views with 'convert_limit_to_sample' and 'always_true' hints and added new tests against the views. Modified a few existing tests to reference the new table variant. - Added an end-to-end test. 
Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b --- M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/TableRef.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M testdata/datasets/functional/functional_schema_template.sql M testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test M testdata/workloads/functional-query/queries/QueryTest/range-constant-propagation.test 9 files changed, 279 insertions(+), 35 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/16792/4 -- To view, visit http://gerrit.cloudera.org:8080/16792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b Gerrit-Change-Number: 16792 Gerrit-PatchSet: 4 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
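The sampling-percent rule stated in the commit message above (the greater of 1% or the ratio, in percent, of the limit to the estimated row count after partition pruning) can be sketched as follows. This is only an illustration of the arithmetic, not the planner code: the function name `sample_percent_for_limit`, the cap at 100%, and the fallback for an unusable row estimate are assumptions made for the example.

```python
def sample_percent_for_limit(limit: int, estimated_rows: int) -> float:
    """Return a sampling percent for a simple LIMIT query.

    Hypothetical helper that mirrors the rule "greater of 1% or
    limit / estimated row count (in percent)". The 100% cap and the
    fallback for a non-positive estimate are assumptions, not taken
    from the patch.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    if estimated_rows <= 0:
        # Assumption: with no usable estimate, fall back to the 1% floor.
        return 1.0
    ratio_percent = 100.0 * limit / estimated_rows
    # Apply the 1% floor from the commit message, then cap at 100%.
    return min(100.0, max(1.0, ratio_percent))
```

For the `LIMIT 10` example in the message, a table estimated at one million rows would be sampled at the 1% floor, while a table estimated at 1000 rows with `LIMIT 50` would be sampled at 5%.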
[Impala-ASF-CR] IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16435 ) Change subject: IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6728/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/16435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38 Gerrit-Change-Number: 16435 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 04 Dec 2020 05:41:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7774/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 04 Dec 2020 05:15:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 7: (3 comments) Thanks Bikram, http://gerrit.cloudera.org:8080/#/c/16765/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16765/6//COMMIT_MSG@14 PS6, Line 14: computing the min reservation for the PlanRootS > nit: i think this description is slightly opaque since the reader would hav Done http://gerrit.cloudera.org:8080/#/c/16765/6/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/16765/6/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@90 PS6, Line 90: bufferSize, queryOptions.getMax_row_size()); > nit: can you add a small comment on why we need 2 maxRowBufferSize, either I edited the method documentation a bit to explain this. Hope it is clear. http://gerrit.cloudera.org:8080/#/c/16765/6/tests/query_test/test_result_spooling.py File tests/query_test/test_result_spooling.py: http://gerrit.cloudera.org:8080/#/c/16765/6/tests/query_test/test_result_spooling.py@426 PS6, Line 426: DEBUG_ACTION_V > nit: switch to ALL_CAPS to be consistent with rest of the codebase. Done -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 04 Dec 2020 04:55:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation
Hello Quanlong Huang, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16765 to look at the new patch set (#7). Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation PlanRootSink can fail silently if result spooling is enabled and maxMemReservationBytes is less than 2 * MAX_ROW_SIZE. This happens because results are spilled using a SpillableRowBatchQueue, which needs 2 buffers (read and write) with at least MAX_ROW_SIZE bytes per buffer. This patch fixes this by setting a lower bound of 2 * MAX_ROW_SIZE while computing the min reservation for the PlanRootSink. Testing: - Pass exhaustive tests. - Add e2e TestResultSpoolingMaxReservation. - Lower MAX_ROW_SIZE on tests where MAX_RESULT_SPOOLING_MEM is set to an extremely low value. Also verify that PLAN_ROOT_SINK's ReservationLimit remains unchanged after lowering the MAX_ROW_SIZE. Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 --- M be/src/runtime/buffered-tuple-stream.h M be/src/runtime/spillable-row-batch-queue.cc M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M tests/custom_cluster/test_query_retries.py M tests/query_test/test_result_spooling.py 5 files changed, 118 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/16765/7 -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
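The lower bound described in the commit message above can be illustrated with a small calculation. This is a sketch under stated assumptions, not the code in PlanRootSink.java: the function name `plan_root_sink_reservations` is hypothetical, the sketch ignores any rounding of MAX_ROW_SIZE to a buffer size, and raising the max reservation to at least the min reservation is an assumption about how the two values are kept consistent.

```python
def plan_root_sink_reservations(max_row_size: int, max_result_spooling_mem: int):
    """Return (min_reservation, max_reservation) in bytes.

    Hypothetical helper. The queue needs one read buffer and one write
    buffer, each able to hold a row of MAX_ROW_SIZE bytes, so the minimum
    reservation is bounded below by 2 * max_row_size.
    """
    min_reservation = 2 * max_row_size
    # Assumption: the max reservation is never allowed below the minimum.
    max_reservation = max(max_result_spooling_mem, min_reservation)
    return min_reservation, max_reservation
```

With a 512 KB MAX_ROW_SIZE and a 1 KB spooling memory limit, the sketch yields 1 MB for both values, which is the situation the patch guards against failing silently.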
[Impala-ASF-CR] IMPALA-10361: Use field id to resolve columns for Iceberg tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16788 ) Change subject: IMPALA-10361: Use field id to resolve columns for Iceberg tables .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7773/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16788 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I057bdc6ab2859cc4d40de5ed428d0c20028b8435 Gerrit-Change-Number: 16788 Gerrit-PatchSet: 4 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Fri, 04 Dec 2020 03:40:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10361: Use field id to resolve columns for Iceberg tables
wangsheng has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/16788 ) Change subject: IMPALA-10361: Use field id to resolve columns for Iceberg tables .. IMPALA-10361: Use field id to resolve columns for Iceberg tables This patch adds support for resolving columns by field id for Iceberg tables. Use 'set PARQUET_FALLBACK_SCHEMA_RESOLUTION=FIELD_ID' or 'set PARQUET_FALLBACK_SCHEMA_RESOLUTION=2' to choose field id resolution. Note that if this option is used for a non-Iceberg table, the result will be NULL. Change-Id: I057bdc6ab2859cc4d40de5ed428d0c20028b8435 --- M be/src/exec/parquet/parquet-metadata-utils.cc M be/src/exec/parquet/parquet-metadata-utils.h M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/types.cc M be/src/runtime/types.h M be/src/service/query-options-test.cc M common/thrift/CatalogObjects.thrift M common/thrift/Descriptors.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/Types.thrift M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/IcebergColumn.java A fe/src/main/java/org/apache/impala/catalog/IcebergStructField.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/StructType.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/Type.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/1-1-13d79bd6-4b97-4680-b4e1-52e93b6ce04e-0.parquet A 
testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/6-6-305c9b7a-f42d-4245-b806-dfa7a792593f-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/9-9-224fe2d6-b0d9-42d6-bc95-15f52ecb29ad-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-1a38e294-5992-48d9-a18e-08e129bb418c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-74cfcf22-3de2-489a-b1ec-d5141e75a8e8-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-5f91dc85-b8f3-4cc2-a5c6-38b7fee49709-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-dc3510cc-e765-43bc-be03-c5561a8d50a3-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-364afc4a-b718-406d-a532-58fab5c8f85d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/4-4-7a1a8e89-8aeb-4405-be64-76557432cf21-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00014-14-765d552a-fddc-42f3-adfd-ecba20a01d80-0.parquet A 
testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00015-15-9957db43-3b9a-4a50-9946-d003cc1d461c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00019-19-1e1895d0-1f42-4c30-989f-968802831077-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00020-20-bb59ac6d-aeee-4c35-9f8a-1a03127d33b8-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00028-28-44ba3ad9-737c-4416-a32c-501cc9a4aa90-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-10/action=download/3-3-31478795-ff6a-4
[Impala-ASF-CR] IMPALA-10361: Use field id to resolve columns for Iceberg tables
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16788 ) Change subject: IMPALA-10361: Use field id to resolve columns for Iceberg tables .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/16788/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16788/3//COMMIT_MSG@10 PS3, Line 10: FIELD_I FIELD_ID -- To view, visit http://gerrit.cloudera.org:8080/16788 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I057bdc6ab2859cc4d40de5ed428d0c20028b8435 Gerrit-Change-Number: 16788 Gerrit-PatchSet: 4 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Fri, 04 Dec 2020 03:18:55 + Gerrit-HasComments: Yes
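The idea behind field-id resolution in IMPALA-10361 can be sketched roughly as below: table columns are matched to file columns by their field id rather than by name or position. This is a simplified illustration, not Impala's resolver. The function name `resolve_by_field_id`, the tuple layout, and the use of `None` to stand in for a column that resolves to NULL (for example, when a file carries no field ids) are all assumptions made for the example.

```python
def resolve_by_field_id(table_columns, file_columns):
    """Map each table column name to the index of the matching file column.

    Hypothetical helper. Both arguments are lists of (name, field_id)
    tuples; a field_id of None means the id is absent. A table column
    with no match maps to None, which stands in for a NULL result.
    """
    file_index_by_id = {
        fid: idx for idx, (_name, fid) in enumerate(file_columns) if fid is not None
    }
    resolved = {}
    for name, fid in table_columns:
        # Names are ignored on purpose: only the field id decides the match.
        resolved[name] = file_index_by_id.get(fid) if fid is not None else None
    return resolved
```

In this sketch a renamed or reordered file column is still found as long as its field id matches, while a file without field ids leaves every column unresolved.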
[Impala-ASF-CR] [WIP] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16720 ) Change subject: [WIP] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. Patch Set 25: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7772/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 25 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 04 Dec 2020 02:40:47 + Gerrit-HasComments: No
[Impala-ASF-CR] [WIP] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16720 ) Change subject: [WIP] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. Patch Set 25: (8 comments) http://gerrit.cloudera.org:8080/#/c/16720/25/be/src/util/min-max-filter-test.cc File be/src/util/min-max-filter-test.cc: http://gerrit.cloudera.org:8080/#/c/16720/25/be/src/util/min-max-filter-test.cc@589 PS25, Line 589: EXPECT_EQ(overflow, false); \ line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/16720/25/be/src/util/min-max-filter-test.cc@592 PS25, Line 592: EXPECT_EQ(overflow, false); \ line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/16720/25/be/src/util/min-max-filter-test.cc@597 PS25, Line 597: EXPECT_EQ(overflow, false); \ line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/16720/25/be/src/util/min-max-filter-test.cc@600 PS25, Line 600: EXPECT_EQ(overflow, false); \ line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/16720/25/be/src/util/min-max-filter-test.cc@649 PS25, Line 649: CheckDecimalVals(filter##SIZE, decimal##SIZE##_type, d1##SIZE, d1##SIZE); \ line too long (108 > 90) http://gerrit.cloudera.org:8080/#/c/16720/25/be/src/util/min-max-filter-test.cc@653 PS25, Line 653: CheckDecimalVals(filter##SIZE, decimal##SIZE##_type, d1##SIZE, d2##SIZE); \ line too long (108 > 90) http://gerrit.cloudera.org:8080/#/c/16720/25/be/src/util/min-max-filter-test.cc@657 PS25, Line 657: CheckDecimalVals(filter##SIZE, decimal##SIZE##_type, d3##SIZE, d2##SIZE); \ line too long (108 > 90) http://gerrit.cloudera.org:8080/#/c/16720/25/be/src/util/min-max-filter-test.cc@669 PS25, Line 669: CheckDecimalVals(filter##SIZE##2, decimal##SIZE##_type, d3##SIZE, d2##SIZE); \ line too long (110 > 90) -- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment 
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 25 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 04 Dec 2020 02:19:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [WIP] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Qifan Chen has uploaded a new patch set (#25). ( http://gerrit.cloudera.org:8080/16720 ) Change subject: [WIP] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. [WIP] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate This patch adds the logic to utilize min/max stats for Parquet row groups or pages to skip these entities when they do not satisfy an equi-join predicate. A new class of predicates called overlap predicates is introduced to aid in the determination of whether a Parquet row group or a page overlaps with a range computed from the hash join. If not, then the entire Parquet row group or page is skipped. The new class of predicates co-exists with the existing min/max conjuncts that are introduced based on the local or transitive scan predicates. Both classes of predicates can work individually or together with each other. The overlap predicates are evaluated after the existing min/max conjuncts. Two new run-time profile counters are added for the number of row groups or pages filtered via the overlap predicates respectively: 1. NumMinMaxFilteredRowGroups 2. NumMinMaxFilteredPages Testing: 1. Added data type specific overlap method tests in min-max-filter-test.cc (boolean, int, string, date, timestamp and decimal); 2. Unit tested on various column types (int, bigint, string and decimal) with TPCH tables. Benefits were significant when the join column on the outer table is sorted, and somewhat observable when the min/max boundary values of the pages or row groups are monotonic; 3. Added new tests in min_max_filters.test (invoked from test_runtime_filters.py) to demonstrate filtered pages in run-time counter NumMinMaxFilteredPages. TBD: 1. Convert remaining unit tests into query tests; 2. Performance measurement; 3. Check out the effect of implicit casting in join predicate on overlap evaluation; 4. 
Compute a usefulness score for the overlap predicate and integrate it into MAX_NUM_RUNTIME_FILTERS limit; 5. Core testing. Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 --- M be/src/exec/exec-node.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-stats.cc M be/src/exec/parquet/parquet-column-stats.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator.cc M be/src/runtime/date-value.cc M be/src/runtime/date-value.h M be/src/runtime/decimal-value.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/util/min-max-filter-test.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/datasets/tpch/tpch_schema_template.sql M testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test 24 files changed, 1,046 insertions(+), 153 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/16720/25 -- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 25 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy
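The overlap test described in the IMPALA-10325 commit message can be sketched as an interval check between a page's (or row group's) min/max statistics and the range computed from the hash join. This is a minimal illustration under assumptions, not the code in min-max-filter.cc: the function names are hypothetical, both ranges are treated as closed intervals, and a page with missing statistics is conservatively kept.

```python
def ranges_overlap(stats_min, stats_max, filter_min, filter_max) -> bool:
    """Return True if [stats_min, stats_max] intersects [filter_min, filter_max].

    Hypothetical helper; both ranges are assumed to be closed intervals.
    """
    return stats_min <= filter_max and filter_min <= stats_max


def pages_to_read(page_stats, filter_min, filter_max):
    """Return indexes of pages that cannot be skipped.

    page_stats is a list of (min, max) tuples, or None when a page has
    no statistics. Assumption: pages without statistics are always read.
    """
    keep = []
    for idx, stats in enumerate(page_stats):
        if stats is None or ranges_overlap(stats[0], stats[1], filter_min, filter_max):
            keep.append(idx)
    return keep
```

The sketch also shows why the commit message reports larger benefits on sorted join columns: when values are sorted, page ranges are narrow and disjoint, so most pages fall entirely outside the join range.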
[Impala-ASF-CR] IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16435 ) Change subject: IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7771/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38 Gerrit-Change-Number: 16435 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 04 Dec 2020 01:59:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16435 ) Change subject: IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators .. Patch Set 3: > Patch Set 3: Verified-1 > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6726/ The DCHECK added in PS3 captures that LOAD DATA DDL still receives the full table object from catalogd. Fixed it in PS4. -- To view, visit http://gerrit.cloudera.org:8080/16435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38 Gerrit-Change-Number: 16435 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 04 Dec 2020 01:37:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16435 to look at the new patch set (#4). Change subject: IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators .. IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators Catalogd RPC responses contain the updated catalog objects in full form. For instance, an RPC for adding a new partition to an HdfsTable will return the whole HdfsTable object (metadata) containing all the partitions. This is required by legacy coordinators, where the whole HdfsTable object is used to replace the stale object (metadata snapshot). However, LocalCatalog coordinators just need the object names for invalidations. It's a waste of space to send the full catalog objects to LocalCatalog coordinators. In addition, there is a risk of OOM due to hitting the Java array limit when serializing a table that has a huge metadata footprint. This patch refactors the catalogd RPC responses to send back only the invalidations that are needed. To distinguish between legacy and LocalCatalog coordinators, a new field, want_minimal_response, is introduced in TCatalogServiceRequestHeader, which is the header for most of the Catalogd RPC requests (e.g. TDdlExecRequest, TUpdateCatalogRequest and TResetMetadataRequest). LocalCatalog coordinators will set this field to true. When adding updated catalog objects to the response, catalogd will add invalidations which only contain the object names (e.g. db name, table name). Note that function objects are small so are ignored in this optimization. Tests: - Add DCHECKs in catalog-op-executor.cc to verify the catalog objects received by LocalCatalog coordinators are in minimal mode. - Run test_ddl.py in both legacy catalog mode and local catalog mode. 
Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38 --- M be/src/exec/catalog-op-executor.cc M be/src/service/client-request-state.cc M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogObject.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java 10 files changed, 239 insertions(+), 139 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/16435/4 -- To view, visit http://gerrit.cloudera.org:8080/16435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38 Gerrit-Change-Number: 16435 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins
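The effect of the want_minimal_response flag described in the IMPALA-9936 commit message can be sketched as follows. This is an illustration only: `TableObject` and `build_response_object` are hypothetical stand-ins for the Thrift catalog objects and the catalogd response code, and the field names are assumptions. Only the flag name `want_minimal_response` comes from the message.

```python
from dataclasses import dataclass, field


@dataclass
class TableObject:
    """Hypothetical stand-in for a catalog table object."""
    db_name: str
    table_name: str
    partitions: list = field(default_factory=list)


def build_response_object(table: TableObject, want_minimal_response: bool) -> TableObject:
    """Return the object to place in a catalogd RPC response.

    Legacy coordinators get the full object so they can replace their
    stale copy. Coordinators that set want_minimal_response get an
    invalidation carrying only the object names.
    """
    if want_minimal_response:
        return TableObject(table.db_name, table.table_name)
    return table
```

Dropping the partition list from the response is what avoids serializing a table with a very large metadata footprint for coordinators that only need the names.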
[Impala-ASF-CR] IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 6: (3 comments) looks good, will +2 once the nits are resolved http://gerrit.cloudera.org:8080/#/c/16765/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16765/6//COMMIT_MSG@14 PS6, Line 14: minMemReservationBytes as 2 * maxRowBufferSize. nit: i think this description is slightly opaque since the reader would have to know how maxRowBufferSize is computed. I think something like this would suffice: This patch fixes this by setting a lower bound of 2 * MAX_ROW_SIZE while computing the min reservation for the PlanRootSink http://gerrit.cloudera.org:8080/#/c/16765/6/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/16765/6/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@90 PS6, Line 90: long minMemReservationBytes = 2 * maxRowBufferSize; nit: can you add a small comment on why we need 2 maxRowBufferSize, either here or in the method comment above http://gerrit.cloudera.org:8080/#/c/16765/6/tests/query_test/test_result_spooling.py File tests/query_test/test_result_spooling.py: http://gerrit.cloudera.org:8080/#/c/16765/6/tests/query_test/test_result_spooling.py@426 PS6, Line 426: _debug_actions nit: switch to ALL_CAPS to be consistent with rest of the codebase. 
DEBUG_ACTION_VALUES -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 04 Dec 2020 01:08:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16412 ) Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc service .. IMPALA-9930 (part 2): Introduce new admission control rpc service This patch introduces a new krpc service, AdmissionControlService, which coordinators can use to submit queries for admission. This patch adds some simple configuration flags that make it possible to have coordinators use this service to submit their queries for admission to other coordinators. These flags are only to make this patch testable and will be replaced when the separate admission control daemon is introduced in IMPALA-9975. The interface consists of the following RPCs: - AdmitQuery: takes a TQueryExecRequest and a TQueryOptions (serialized into sidecars), places the request on a queue to be processed by a thread pool and then immediately returns. - GetQueryStatus: takes a query id and returns the current admission status, including the QuerySchedulePB if admission has completed successfully but the query has not been released yet. - ReleaseQueryBackends: called when individual backends complete but the overall query is still running to release resources incrementally. This RPC will be called at most O(log(# backends)) per query due to BackendResourceState, which batches backends to release together. - ReleaseQuery: called when the query has completely finished. Releases all remaining resources. - CancelAdmission: called if a query is cancelled before an admission decision has been made to indicate that it should no longer be considered for admission. The majority of the patch consists of two classes: - AdmissionControlClient: used to abstract whether admission is being performed locally or remotely. In the local case, it is basically just a wrapper around AdmissionController. 
In the remote case, it handles serializing/deserializing of RPC params, polling GetQueryStatus() until a decision has been made, etc. - AdmissionControlService: exports the RPC interface and acts as a wrapper around AdmissionController. Some notable changes involved: - AdmissionController::SubmitForAdmission() no longer blocks while a query is queued. Instead, a new function WaitOnQueued() can be used to monitor the admission status of a queued query. - Adding events to the query timeline is moved out of AdmissionController and into the AdmissionControlClient classes, so that it always happens on the coordinator. - When a cluster is run in the new admission control service mode, only the impalad that is performing admission control exposes the /admission http endpoint. Observability will be cleaned up in a subsequent patch. Testing: - Modified existing admission control tests to run both with and without the admission control service enabled, including both the functional and stress tests. The 'num_queries' param in the stress test is modified to only use a single value to reduce the number of tests that are run and keep the running time reasonable. - Ran tpch10 on a local minicluster and observed no significant regressions. 
Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae Reviewed-on: http://gerrit.cloudera.org:8080/16412 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/scheduling/CMakeLists.txt M be/src/scheduling/admission-control-client.cc M be/src/scheduling/admission-control-client.h A be/src/scheduling/admission-control-service.cc A be/src/scheduling/admission-control-service.h M be/src/scheduling/admission-controller-test.cc M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/local-admission-control-client.cc M be/src/scheduling/local-admission-control-client.h A be/src/scheduling/remote-admission-control-client.cc A be/src/scheduling/remote-admission-control-client.h M be/src/scheduling/schedule-state.cc M be/src/scheduling/schedule-state.h M be/src/service/client-request-state.cc M be/src/service/impala-http-handler.cc M be/src/util/sharded-query-map-util.cc M common/protobuf/admission_control_service.proto M tests/common/resource_pool_config.py M tests/custom_cluster/test_admission_controller.py M tests/hs2/hs2_test_suite.py M tests/util/web_pages_util.py 24 files changed, 1,270 insertions(+), 171 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/16412 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae Gerrit-Change-Number: 16412 Gerrit-PatchSet: 14 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Pub
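The remote-client flow described in the commit message above (submit with AdmitQuery, which returns immediately, then poll GetQueryStatus until a decision is made, and call CancelAdmission if the query is abandoned first) can be sketched roughly as below. This is only an illustration in Python, not the krpc/C++ code from the patch: `FakeAdmissionService`, `AdmissionStatus`, `submit_and_wait`, and the `max_polls` cutoff are all hypothetical names and assumptions introduced here.

```python
import time
from enum import Enum


class AdmissionStatus(Enum):
    # Hypothetical status values; the real service returns richer state.
    QUEUED = 1
    ADMITTED = 2
    CANCELLED = 3


class FakeAdmissionService:
    """In-memory stand-in for the remote service (assumption, for illustration)."""

    def __init__(self, polls_until_decision):
        self._polls_until_decision = polls_until_decision
        self._remaining = {}
        self.cancelled = set()

    def admit_query(self, query_id, request):
        # Enqueue the request and return immediately, like AdmitQuery.
        self._remaining[query_id] = self._polls_until_decision

    def get_query_status(self, query_id):
        # Report QUEUED until the simulated decision has been reached.
        if self._remaining[query_id] > 0:
            self._remaining[query_id] -= 1
            return AdmissionStatus.QUEUED
        return AdmissionStatus.ADMITTED

    def cancel_admission(self, query_id):
        self.cancelled.add(query_id)
        self._remaining.pop(query_id, None)


def submit_and_wait(service, query_id, request, poll_interval_s=0.0, max_polls=100):
    """Submit a query, then poll for the admission decision."""
    service.admit_query(query_id, request)
    for _ in range(max_polls):
        status = service.get_query_status(query_id)
        if status is not AdmissionStatus.QUEUED:
            return status
        time.sleep(poll_interval_s)
    # Giving up before a decision: tell the service to stop considering us.
    service.cancel_admission(query_id)
    return AdmissionStatus.CANCELLED
```

For example, `submit_and_wait(FakeAdmissionService(3), "q1", {})` sees three QUEUED responses before the admitted one, while passing `max_polls=2` makes the caller give up and cancel instead.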
[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16412 ) Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc service .. Patch Set 13: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16412 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae Gerrit-Change-Number: 16412 Gerrit-PatchSet: 13 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 03 Dec 2020 23:46:27 + Gerrit-HasComments: No
[Impala-ASF-CR] [WIP] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16720 ) Change subject: [WIP] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. Patch Set 24: (3 comments) http://gerrit.cloudera.org:8080/#/c/16720/24/be/src/runtime/date-value.cc File be/src/runtime/date-value.cc: http://gerrit.cloudera.org:8080/#/c/16720/24/be/src/runtime/date-value.cc@369 PS24, Line 369: DateValue DateValue::SubtractDays(int64_t days) const { I am not sure if this is really useful, as we already have AddDays that can be called with negative values. http://gerrit.cloudera.org:8080/#/c/16720/24/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: http://gerrit.cloudera.org:8080/#/c/16720/24/be/src/runtime/timestamp-value.cc@217 PS24, Line 217: add There is already an implementation to add intervals to timestamps at https://github.com/apache/impala/blob/master/be/src/exprs/timestamp-functions-ir.cc#L685 It would be good to use the same implementation, because for bit time_duration the add/sub can be tricky. If I saw correctly then you only add nanoseconds, so it would be enough to add an "addNanoSecond" function that could handle negative values too. timestamp-functions-ir.h could expose a function that does the add and we could call it from here. 
http://gerrit.cloudera.org:8080/#/c/16720/24/be/src/runtime/timestamp-value.cc@226 PS24, Line 226: TimestampValue(date_ + boost::gregorian::date_duration(1), this doesn't work correctly if 't' is more than one day -- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 24 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 03 Dec 2020 22:29:10 + Gerrit-HasComments: Yes
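The two timestamp comments above (one helper that accepts negative values, and a carry that is wrong when the offset spans more than one day) come down to the same arithmetic. A minimal sketch, assuming a timestamp is modelled as an integer day number plus nanoseconds within that day; this is not Impala's boost-based TimestampValue, and `add_nanoseconds` is a hypothetical name:

```python
NANOS_PER_DAY = 24 * 60 * 60 * 10**9


def add_nanoseconds(day_number, nanos_of_day, delta_nanos):
    """Add a signed nanosecond offset, carrying any number of whole days."""
    total = nanos_of_day + delta_nanos
    # Floor division carries a negative or multi-day overflow into the day
    # part and leaves the remainder in the range [0, NANOS_PER_DAY).
    day_carry, new_nanos = divmod(total, NANOS_PER_DAY)
    return day_number + day_carry, new_nanos
```

Because the carry is computed by division rather than by adding a fixed one-day duration, the same function covers subtraction (negative `delta_nanos`) and offsets longer than a day.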
[Impala-ASF-CR] IMPALA-9910: [DOCS] update retry failed queries query option
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16819 ) Change subject: IMPALA-9910: [DOCS] update retry failed queries query option .. Patch Set 1: Verified+1 Build Successful https://jenkins.impala.io/job/gerrit-docs-auto-test/609/ : Doc tests passed. -- To view, visit http://gerrit.cloudera.org:8080/16819 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3a65357a6e3d0bffa840b8636171a38bd9b22d17 Gerrit-Change-Number: 16819 Gerrit-PatchSet: 1 Gerrit-Owner: Shajini Thayasingh Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 03 Dec 2020 21:55:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9910: [DOCS] update retry failed queries query option
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16819 ) Change subject: IMPALA-9910: [DOCS] update retry failed queries query option .. Patch Set 1: Build Started https://jenkins.impala.io/job/gerrit-docs-auto-test/609/ Testing docs change - this change appears to modify docs/ and no code. This is experimental - please report any issues to tarmstr...@cloudera.com or on this JIRA: IMPALA-7317 -- To view, visit http://gerrit.cloudera.org:8080/16819 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3a65357a6e3d0bffa840b8636171a38bd9b22d17 Gerrit-Change-Number: 16819 Gerrit-PatchSet: 1 Gerrit-Owner: Shajini Thayasingh Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 03 Dec 2020 21:47:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9910: [DOCS] update retry failed queries query option
Shajini Thayasingh has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16819 ) Change subject: IMPALA-9910: [DOCS] update retry failed queries query option .. IMPALA-9910: [DOCS] update retry failed queries query option elaborated the existing content; talked about the new query option spool_all_results_for_retries Change-Id: I3a65357a6e3d0bffa840b8636171a38bd9b22d17 --- M docs/topics/impala_retry_failed_queries.xml 1 file changed, 10 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/16819/1 -- To view, visit http://gerrit.cloudera.org:8080/16819 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I3a65357a6e3d0bffa840b8636171a38bd9b22d17 Gerrit-Change-Number: 16819 Gerrit-PatchSet: 1 Gerrit-Owner: Shajini Thayasingh
[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16792 ) Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@223 PS2, Line 223: if (getTableRefs().size() == 1) > By looking at the following view DDL, I have the impression that the conver Yes, the convert_limit_to_sample hint is per table only. The expectation is that a user may want to apply that for the fact table typically but not the dimension table. http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java: http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@209 PS2, Line 209: estimatedTotalRows > Okay. It seems getTable().getNumRows() returns the raw row count as recorde The TABLESAMPLE is a long type, so yeah the minimum can be 1%. You're right that the sampling is getting applied after partition pruning but I just want to make clear that there are 2 types of partition pruning: (a) based on predicates on partition column and (b) based on the simple limit. When this method is called, (a) has already been applied. If the sampling hint is provided we don't do the pruning for (b) at all. We will use the supplied list of partitions and sample across all those partitions. Our docs (https://impala.apache.org/docs/build/html/topics/impala_tablesample.html) say this: Partitioning: When you query a partitioned table, any partition pruning happens before Impala selects the data files to sample. 
For example, in a table partitioned by year, a query with WHERE year = 2017 and a TABLESAMPLE SYSTEM(10) clause would sample data files representing at least 10% of the bytes present in the 2017 partition. The expectation of the user is that if they have supplied a sample percent, just use that against the final pruned partitions rather than inflating the percent. I could make the ratio better by considering a heuristic of uniform distribution across partitions and scaling down the total row count in the denominator by multiplying it with num_pruned_partitions/num_total_partitions. I want to avoid having to add up all the partitions' row counts. All this is based on the row count... the alternative is the other option I mentioned before with having the percent specified in the hint which makes it explicit but I think in vast majority of cases since simple limit is small (10-100), having a minimum of 1% for a fact table even after partition pruning is going to be sufficient. In fact, it would have been useful to sample in fractional percentage, e.g. 0.01% of a 10B row table. -- To view, visit http://gerrit.cloudera.org:8080/16792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b Gerrit-Change-Number: 16792 Gerrit-PatchSet: 3 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 03 Dec 2020 18:23:53 + Gerrit-HasComments: Yes
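The percent computation discussed in the reply above (an integer percent, the greater of 1% or the ratio of the limit to the estimated row count, optionally with the row count scaled by num_pruned_partitions/num_total_partitions) can be sketched as follows. This is a rough Python illustration, not the Java code in HdfsPartitionPruner: the function names, rounding the ratio up, the 100% cap, and returning `None` when the row count is unusable are all assumptions made here.

```python
import math


def scaled_row_estimate(table_rows, pruned_partitions, total_partitions):
    """Uniform-distribution heuristic: scale the table row count down to
    the partitions that survived predicate-based pruning."""
    if total_partitions <= 0:
        return table_rows
    return table_rows * pruned_partitions / total_partitions


def sample_percent(limit, estimated_rows, min_percent=1, max_percent=100):
    """Integer sampling percent: the greater of min_percent or limit/rows."""
    if estimated_rows <= 0:
        # Assumption: without a usable row count, do not convert the limit.
        return None
    ratio = math.ceil(100.0 * limit / estimated_rows)
    return max(min_percent, min(max_percent, ratio))
```

With a limit of 10 against a 10B row table the ratio is far below 1%, so the 1% floor applies; with a limit of 100 against 1000 rows scaled down to one of four partitions (250 rows), the sketch yields 40%.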
[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16412 ) Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc service .. Patch Set 13: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6727/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16412 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae Gerrit-Change-Number: 16412 Gerrit-PatchSet: 13 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 03 Dec 2020 18:17:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16412 ) Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc service .. Patch Set 13: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16412 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae Gerrit-Change-Number: 16412 Gerrit-PatchSet: 13 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 03 Dec 2020 18:17:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16412 ) Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc service .. Patch Set 12: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7770/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16412 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae Gerrit-Change-Number: 16412 Gerrit-PatchSet: 12 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 03 Dec 2020 18:16:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/16412 ) Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc service .. Patch Set 12: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16412 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae Gerrit-Change-Number: 16412 Gerrit-PatchSet: 12 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 03 Dec 2020 17:54:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service
Hello Sahil Takiar, Joe McDonnell, Tim Armstrong, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16412 to look at the new patch set (#12). Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc service .. IMPALA-9930 (part 2): Introduce new admission control rpc service This patch introduces a new krpc service, AdmissionControlService, which coordinators can use to submit queries for admission. This patch adds some simple configuration flags that make it possible to have coordinators use this service to submit their queries for admission to other coordinators. These flags are only to make this patch testable and will be replaced when the separate admission control daemon is introduced in IMPALA-9975. The interface consists of the following RPCs: - AdmitQuery: takes a TQueryExecRequest and a TQueryOptions (serialized into sidecars), places the request on a queue to be processed by a thread pool and then immediately returns. - GetQueryStatus: takes a query id and returns the current admission status, including the QuerySchedulePB if admission has completed successfully but the query has not been released yet. - ReleaseQueryBackends: called when individual backends complete but the overall query is still running to release resources incrementally. This RPC will be called at most O(log(# backends)) per query due to BackendResourceState, which batches backends to release together. - ReleaseQuery: called when the query has completely finished. Releases all remaining resources. - CancelAdmission: called if a query is cancelled before an admission decision has been made to indicate that it should no longer be considered for admission. The majority of the patch consists of two classes: - AdmissionControlClient: used to abstract whether admission is being performed locally or remotely. In the local case, it is basically just a wrapper around AdmissionController. 
In the remote case, it handles serializing/deserializing of RPC params, polling GetQueryStatus() until a decision has been made, etc. - AdmissionControlService: exports the RPC interface and acts as a wrapper around AdmissionController. Some notable changes involved: - AdmissionController::SubmitForAdmission() no longer blocks while a query is queued. Instead, a new function WaitOnQueued() can be used to monitor the admission status of a queued query. - Adding events to the query timeline is moved out of AdmissionController and into the AdmissionControlClient classes, so that it always happens on the coordinator. - When a cluster is run in the new admission control service mode, only the impalad that is performing admission control exposes the /admission http endpoint. Observability will be cleaned up in a subsequent patch. Testing: - Modified existing admission control tests to run both with and without the admission control service enabled, including both the functional and stress tests. The 'num_queries' param in the stress test is modified to only use a single value to reduce the number of tests that are run and keep the running time reasonable. - Ran tpch10 on a local minicluster and observed no significant regressions. 
Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae --- M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/scheduling/CMakeLists.txt M be/src/scheduling/admission-control-client.cc M be/src/scheduling/admission-control-client.h A be/src/scheduling/admission-control-service.cc A be/src/scheduling/admission-control-service.h M be/src/scheduling/admission-controller-test.cc M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/local-admission-control-client.cc M be/src/scheduling/local-admission-control-client.h A be/src/scheduling/remote-admission-control-client.cc A be/src/scheduling/remote-admission-control-client.h M be/src/scheduling/schedule-state.cc M be/src/scheduling/schedule-state.h M be/src/service/client-request-state.cc M be/src/service/impala-http-handler.cc M be/src/util/sharded-query-map-util.cc M common/protobuf/admission_control_service.proto M tests/common/resource_pool_config.py M tests/custom_cluster/test_admission_controller.py M tests/hs2/hs2_test_suite.py M tests/util/web_pages_util.py 24 files changed, 1,270 insertions(+), 171 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/16412/12 -- To view, visit http://gerrit.cloudera.org:8080/16412 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae Gerrit-Change-Number: 16412 Gerrit-PatchSet: 12 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Pub
[Impala-ASF-CR] IMPALA-10361: Use field id to resolve columns for Iceberg tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16788 ) Change subject: IMPALA-10361: Use field id to resolve columns for Iceberg tables .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7769/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16788 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I057bdc6ab2859cc4d40de5ed428d0c20028b8435 Gerrit-Change-Number: 16788 Gerrit-PatchSet: 3 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Thu, 03 Dec 2020 15:55:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16792 ) Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint .. Patch Set 3: (5 comments) Thanks a lot for the explanation. Two additional questions follow :-). http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java File fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java: http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java@79 PS2, Line 79: if ((op == Operator.AND && : (Expr.IS_ALWAYS_TRUE_PREDICATE.apply(e1) && : Expr.IS_ALWAYS_TRUE_P > Makes sense to handle OR for completeness. I have added it. One thing tha Done http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/Expr.java@520 PS2, Line 520: if (conjuncts.size() > 1) { > For the single conjunct case it should have already been set on the line 51 Done http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@223 PS2, Line 223: if (getTableRefs().size() == 1) > Since this is a qualifying check to decide whether this query block is elig By looking at the following view DDL, I have the impression that the convert_limit_to_sample hint is only for table alltypes_date_partition_2, not for table alltypessmall. Is that about right? DATASET functional BASE_TABLE_NAME alltypes_dp_2_view_2 CREATE DROP VIEW IF EXISTS {db_name}{db_suffix}.{table_name}; -- view which references a table with hint and a WHERE clause with hint. -- WHERE clause has a compound predicate. 
CREATE VIEW {db_name}{db_suffix}.{table_name} AS SELECT * FROM {db_name}{db_suffix}.alltypes_date_partition_2 [convert_limit_to_sample] where [always_true] date_col = cast(timestamp_col as date) and int_col in (select bigint_col from functional.alltypessmall); LOAD Looks like making it a table level hint helps provide an extra level of safety. http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java: http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@209 PS2, Line 209: estimatedTotalRows > Actually, the getTable().getNumRows() is the estimated row count. If stats Okay. It seems getTable().getNumRows() returns the raw row count as recorded in HMS for the table. In that case, your code ignores a table if the stats are missing or corrupt, which is good. public void setTableStats(org.apache.hadoop.hive.metastore.api.Table msTbl) { tableStats_ = new TTableStats(FeCatalogUtils.getRowCount(msTbl.getParameters())); tableStats_.setTotal_file_bytes(FeCatalogUtils.getTotalSize(msTbl.getParameters())); } (catalog/Table.java) The current approach computes the sampling percentage automatically and has a lower bound of 1%. Since simple limit optimization reduces # of partitions to be scanned, I wonder if the sampling rate would still hold on the surviving partitions. For example, there are 4 partitions, p1, p2, p3, p4. The share of the total data in each partition: 20% (p1), 20% (p2), 50% (p3) and 10% (p4). Assume p4 is surviving. Then, for a limit that is close to #rows in p4, I assume we need almost 100% sample rate. 
http://gerrit.cloudera.org:8080/#/c/16792/2/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test File testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test: http://gerrit.cloudera.org:8080/#/c/16792/2/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test@275 PS2, Line 275: select * from functional.alltypes_dp_2_view_2 limit 10; > Yup..makes sense. I added a query at the end that is against the same base Done -- To view, visit http://gerrit.cloudera.org:8080/16792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b Gerrit-Change-Number: 16792 Gerrit-PatchSet: 3 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-C
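The four-partition concern raised in the comment above can be made concrete with a little arithmetic. The shares (20%, 20%, 50%, 10%) come from the comment; the total row count of 1000 and the limit of 90 are made-up numbers chosen only for illustration, and the sketch assumes rows are sampled uniformly at the given percent.

```python
# Partition shares from the review comment, in percent of total rows.
share_pct = {"p1": 20, "p2": 20, "p3": 50, "p4": 10}

total_rows = 1000  # assumed for illustration
limit = 90         # assumed: close to the number of rows in p4
surviving = "p4"

rows_in_surviving = total_rows * share_pct[surviving] // 100

# Percent derived from the whole-table row count, with the 1% floor.
pct_vs_table = max(1, 100 * limit / total_rows)

# Rows expected if that percent is applied only to the surviving partition.
expected_rows = rows_in_surviving * pct_vs_table / 100

# Percent actually needed to reach the limit from the surviving partition.
pct_vs_partition = 100 * limit / rows_in_surviving

print(rows_in_surviving, pct_vs_table, expected_rows, pct_vs_partition)
```

Under these assumptions, a percent computed against the whole table returns roughly a tenth of the rows the limit asks for once only p4 is scanned, which is the gap the comment points at.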
[Impala-ASF-CR] IMPALA-10361: Use field id to resolve columns for Iceberg tables
wangsheng has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/16788 ) Change subject: IMPALA-10361: Use field id to resolve columns for Iceberg tables .. IMPALA-10361: Use field id to resolve columns for Iceberg tables This patch adds support for resolving columns by field id for Iceberg tables. We can use 'set PARQUET_FALLBACK_SCHEMA_RESOLUTION=FIELDID' or 'set PARQUET_FALLBACK_SCHEMA_RESOLUTION=2' to choose field id resolution. Note that if you use this for a non-Iceberg table, the result will be NULL. Change-Id: I057bdc6ab2859cc4d40de5ed428d0c20028b8435 --- M be/src/exec/parquet/parquet-metadata-utils.cc M be/src/exec/parquet/parquet-metadata-utils.h M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/types.cc M be/src/runtime/types.h M common/thrift/CatalogObjects.thrift M common/thrift/Descriptors.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/Types.thrift M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/IcebergColumn.java A fe/src/main/java/org/apache/impala/catalog/IcebergStructField.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/StructType.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/Type.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/1-1-13d79bd6-4b97-4680-b4e1-52e93b6ce04e-0.parquet A 
testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/6-6-305c9b7a-f42d-4245-b806-dfa7a792593f-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/9-9-224fe2d6-b0d9-42d6-bc95-15f52ecb29ad-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-1a38e294-5992-48d9-a18e-08e129bb418c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-74cfcf22-3de2-489a-b1ec-d5141e75a8e8-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-5f91dc85-b8f3-4cc2-a5c6-38b7fee49709-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-dc3510cc-e765-43bc-be03-c5561a8d50a3-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-364afc4a-b718-406d-a532-58fab5c8f85d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/4-4-7a1a8e89-8aeb-4405-be64-76557432cf21-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00014-14-765d552a-fddc-42f3-adfd-ecba20a01d80-0.parquet A 
testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00015-15-9957db43-3b9a-4a50-9946-d003cc1d461c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00019-19-1e1895d0-1f42-4c30-989f-968802831077-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00020-20-bb59ac6d-aeee-4c35-9f8a-1a03127d33b8-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00028-28-44ba3ad9-737c-4416-a32c-501cc9a4aa90-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-10/action=download/3-3-31478795-ff6a-4a20-9fff-8dc4907c1ba7-0.parquet A t
[Impala-ASF-CR] IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16435 ) Change subject: IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators .. Patch Set 3: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6726/ -- To view, visit http://gerrit.cloudera.org:8080/16435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38 Gerrit-Change-Number: 16435 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 03 Dec 2020 14:20:37 + Gerrit-HasComments: No
[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 ) Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7768/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16741 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd Gerrit-Change-Number: 16741 Gerrit-PatchSet: 6 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 03 Dec 2020 09:43:16 + Gerrit-HasComments: No
[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 ) Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/16741/6/be/src/thirdparty/murmurhash/MurmurHash3.h File be/src/thirdparty/murmurhash/MurmurHash3.h: http://gerrit.cloudera.org:8080/#/c/16741/6/be/src/thirdparty/murmurhash/MurmurHash3.h@21 PS6, Line 21: #else // defined(_MSC_VER) tab used for whitespace -- To view, visit http://gerrit.cloudera.org:8080/16741 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd Gerrit-Change-Number: 16741 Gerrit-PatchSet: 6 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 03 Dec 2020 09:21:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
Hello Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16741 to look at the new patch set (#6). Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions .. WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd --- M be/src/codegen/impala-ir.cc M be/src/exprs/CMakeLists.txt A be/src/exprs/iceberg-functions-ir.cc A be/src/exprs/iceberg-functions-test.cc A be/src/exprs/iceberg-functions.h M be/src/exprs/scalar-expr-evaluator.cc A be/src/thirdparty/murmurhash/MurmurHash3.cpp A be/src/thirdparty/murmurhash/MurmurHash3.h A be/src/thirdparty/murmurhash/README.md M bin/rat_exclude_files.txt M bin/run_clang_tidy.sh M common/function-registry/impala_functions.py 12 files changed, 860 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/16741/6 -- To view, visit http://gerrit.cloudera.org:8080/16741 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd Gerrit-Change-Number: 16741 Gerrit-PatchSet: 6 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16435 ) Change subject: IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7767/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38 Gerrit-Change-Number: 16435 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 03 Dec 2020 09:05:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16435 ) Change subject: IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7766/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38 Gerrit-Change-Number: 16435 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 03 Dec 2020 08:48:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16435 ) Change subject: IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6726/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/16435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38 Gerrit-Change-Number: 16435 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 03 Dec 2020 08:44:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16435 to look at the new patch set (#3). Change subject: IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators .. IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators Catalogd RPC responses contain the updated catalog objects in full form. For instance, an RPC for adding a new partition to an HdfsTable will return the whole HdfsTable object (metadata) containing all the partitions. This is required by legacy coordinators, where the whole HdfsTable object is used to replace the stale object (metadata snapshot). However, LocalCatalog coordinators just need the object names for invalidations. It's a waste of space to send the full catalog objects to LocalCatalog coordinators. In addition, there is a risk of OOM due to hitting the Java array limit when serializing a table that has a huge metadata footprint. This patch refactors the catalogd RPC responses to send back only the invalidations that are needed. To distinguish between legacy and LocalCatalog coordinators, a new field, want_minimal_response, is introduced in TCatalogServiceRequestHeader, which is the header for most of the Catalogd RPC requests (e.g. TDdlExecRequest, TUpdateCatalogRequest and TResetMetadataRequest). LocalCatalog coordinators will set this field to true. When adding updated catalog objects to the response, catalogd will add invalidations which only contain the object names (e.g. db name, table name). Note that function objects are small, so they are ignored in this optimization. Tests: - Add DCHECKs in catalog-op-executor.cc to verify the catalog objects received by LocalCatalog coordinators are in minimal mode. - Run test_ddl.py in both legacy catalog mode and local catalog mode. 
Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38 --- M be/src/exec/catalog-op-executor.cc M be/src/service/client-request-state.cc M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogObject.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java 10 files changed, 237 insertions(+), 139 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/16435/3 -- To view, visit http://gerrit.cloudera.org:8080/16435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38 Gerrit-Change-Number: 16435 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins
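The want_minimal_response idea described in the commit message above can be sketched roughly as follows. This is an illustrative Python model only, not the patch's Thrift/Java/C++ code: apart from the field name want_minimal_response, every class and function name here (CatalogObject, RequestHeader, build_updated_objects, to_minimal) is a hypothetical stand-in. It also assumes that "ignored in this optimization" means function objects are still returned in full.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CatalogObject:
    """Hypothetical stand-in for a catalog object carried in an RPC response."""
    obj_type: str                      # e.g. "DATABASE", "TABLE", "FUNCTION"
    name: str                          # e.g. "db" or "db.tbl"
    metadata: Dict[str, object] = field(default_factory=dict)

    def to_minimal(self) -> "CatalogObject":
        # Keep only what a coordinator needs to invalidate its cache:
        # the object type and name, with no metadata payload.
        return CatalogObject(self.obj_type, self.name)


@dataclass
class RequestHeader:
    """Hypothetical stand-in for the request header; only the flag matters here."""
    want_minimal_response: bool = False


def build_updated_objects(header: RequestHeader,
                          updated: List[CatalogObject]) -> List[CatalogObject]:
    """Return the objects to place in the response for this caller."""
    if not header.want_minimal_response:
        # Legacy coordinators replace their stale copy with the full object.
        return list(updated)
    # Assumption: functions are small, so they are left in full form.
    return [o if o.obj_type == "FUNCTION" else o.to_minimal() for o in updated]
```

With a flag in the request header, the server can pick the response shape per caller, so legacy and LocalCatalog coordinators can share the same RPCs.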
[Impala-ASF-CR] IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16435 ) Change subject: IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/16435/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/16435/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3413 PS2, Line 3413: response.result.addToRemoved_catalog_objects(result.first.toMinimalTCatalogObject()); line too long (91 > 90) -- To view, visit http://gerrit.cloudera.org:8080/16435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38 Gerrit-Change-Number: 16435 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 03 Dec 2020 08:27:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16435 to look at the new patch set (#2). Change subject: IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators .. IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators Catalogd RPC responses contain the updated catalog objects in full form. For instance, an RPC for adding a new partition to an HdfsTable will return the whole HdfsTable object (metadata) containing all the partitions. This is required by legacy coordinators, where the whole HdfsTable object is used to replace the stale object (metadata snapshot). However, LocalCatalog coordinators just need the object names for invalidations. It's a waste of space to send the full catalog objects to LocalCatalog coordinators. In addition, there is a risk of OOM due to hitting the Java array limit when serializing a table that has a huge metadata footprint. This patch refactors the catalogd RPC responses to send back only the invalidations that are needed. To distinguish between legacy and LocalCatalog coordinators, a new field, want_minimal_response, is introduced in TCatalogServiceRequestHeader, which is the header for most of the Catalogd RPC requests (e.g. TDdlExecRequest, TUpdateCatalogRequest and TResetMetadataRequest). LocalCatalog coordinators will set this field to true. When adding updated catalog objects to the response, catalogd will add invalidations which only contain the object names (e.g. db name, table name). Note that function objects are small, so they are ignored in this optimization. Tests: - Add DCHECKs in catalog-op-executor.cc to verify the catalog objects received by LocalCatalog coordinators are in minimal mode. - Run test_ddl.py in both legacy catalog mode and local catalog mode. 
Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38 --- M be/src/exec/catalog-op-executor.cc M be/src/service/client-request-state.cc M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogObject.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java 10 files changed, 236 insertions(+), 139 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/16435/2 -- To view, visit http://gerrit.cloudera.org:8080/16435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38 Gerrit-Change-Number: 16435 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16800 ) Change subject: IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout .. IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout Recently, we have seen many timeout failures of test_concurrent_ddls.py in S3 builds, e.g. IMPALA-10280, IMPALA-10301, IMPALA-10363. It'd be helpful to dump the server stacktraces so we can understand why some RPCs are slow/stuck. This patch extracts the logic of dumping stacktraces in script-timeout-check.sh to a separate script, dump-stacktraces.sh. The script also dumps jstacks of HMS and NameNode. Dumping all these stacktraces is time-consuming, so we do them in parallel, which also helps to get consistent snapshots of all servers. When any test in test_concurrent_ddls.py times out, we use dump-stacktraces.sh to dump the stacktraces before exiting. Previously, some tests depended on pytest.mark.timeout for detecting timeouts. It's hard to add a customized callback for dumping server stacktraces with that approach. So this patch refactors test_concurrent_ddls.py to only use the timeout of multiprocessing. Tests: - Tested the scripts locally. 
- Verified the error handling of the timeout logic in Jenkins jobs Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173 Reviewed-on: http://gerrit.cloudera.org:8080/16800 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- A bin/dump-stacktraces.sh M bin/script-timeout-check.sh M tests/custom_cluster/test_concurrent_ddls.py M tests/util/shell_util.py 4 files changed, 105 insertions(+), 38 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/16800 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173 Gerrit-Change-Number: 16800 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang
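The pattern described in this commit message (detect the timeout through multiprocessing, then dump all server stacktraces in parallel before failing) might look roughly like the sketch below. It is not the code from test_concurrent_ddls.py: run_with_timeout and dump_all are hypothetical names, the dumpers are plain Python callables standing in for whatever dump-stacktraces.sh invokes, and a thread-backed pool is used only to keep the example self-contained.

```python
from multiprocessing import TimeoutError
from multiprocessing.pool import ThreadPool


def dump_all(dumpers):
    """Run every dumper concurrently and return {server name: dump output}."""
    with ThreadPool(len(dumpers)) as pool:
        # Start all dumps first so they overlap, then wait for each result.
        pending = {name: pool.apply_async(fn) for name, fn in dumpers.items()}
        return {name: res.get() for name, res in pending.items()}


def run_with_timeout(task, timeout_s, dumpers):
    """Run task(); if it exceeds timeout_s, dump stacks and raise."""
    pool = ThreadPool(1)
    try:
        # AsyncResult.get raises multiprocessing.TimeoutError on expiry.
        return pool.apply_async(task).get(timeout=timeout_s)
    except TimeoutError:
        # This is the hook that a decorator-based timeout makes hard to add:
        # collect diagnostics from all servers before reporting the failure.
        dumps = dump_all(dumpers)
        raise RuntimeError(
            "task timed out; dumped stacks for: " + ", ".join(sorted(dumps)))
    finally:
        pool.terminate()
```

Handling the timeout in the caller keeps the diagnostic step under the test's own control, which is the motivation the commit message gives for dropping pytest.mark.timeout.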
[Impala-ASF-CR] IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16800 ) Change subject: IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16800 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173 Gerrit-Change-Number: 16800 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 03 Dec 2020 08:05:22 + Gerrit-HasComments: No