[Impala-ASF-CR] IMPALA-10361: Use field id to resolve columns for Iceberg tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16788 ) Change subject: IMPALA-10361: Use field id to resolve columns for Iceberg tables .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7784/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16788 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I057bdc6ab2859cc4d40de5ed428d0c20028b8435 Gerrit-Change-Number: 16788 Gerrit-PatchSet: 6 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Sat, 05 Dec 2020 04:12:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10361: Use field id to resolve columns for Iceberg tables
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16788 ) Change subject: IMPALA-10361: Use field id to resolve columns for Iceberg tables .. Patch Set 6: (2 comments) Hi Zoltan, thanks for reviewing again. I think it is a good idea to handle ORC tables in another patch; I will look into this later. I also modified the code to always use FIELD_ID resolution for Iceberg tables, which means 'PARQUET_FALLBACK_SCHEMA_RESOLUTION' no longer applies to Iceberg tables. If you agree with this design, I will update the commit message later. http://gerrit.cloudera.org:8080/#/c/16788/5/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java File fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java: http://gerrit.cloudera.org:8080/#/c/16788/5/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@339 PS5, Line 339: boolean isFullAcidTable = AcidUtils.isFullAcidTable(msTbl.getParameters()); > Iceberg tables cannot be full ACID, maybe it can be a precondition. Done http://gerrit.cloudera.org:8080/#/c/16788/5/testdata/data/README File testdata/data/README: http://gerrit.cloudera.org:8080/#/c/16788/5/testdata/data/README@608 PS5, Line 608: generated file will contains multi blocks, multi pages per block. > Please add information about the newly added files and tests. Done -- To view, visit http://gerrit.cloudera.org:8080/16788 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I057bdc6ab2859cc4d40de5ed428d0c20028b8435 Gerrit-Change-Number: 16788 Gerrit-PatchSet: 6 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Sat, 05 Dec 2020 03:54:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10361: Use field id to resolve columns for Iceberg tables
wangsheng has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/16788 ) Change subject: IMPALA-10361: Use field id to resolve columns for Iceberg tables .. IMPALA-10361: Use field id to resolve columns for Iceberg tables This patch adds support for resolving columns by field id for Iceberg tables. Use 'set PARQUET_FALLBACK_SCHEMA_RESOLUTION=FIELD_ID' or 'set PARQUET_FALLBACK_SCHEMA_RESOLUTION=2' to choose field id resolution. Note that if you use this setting for a non-Iceberg table, the result will be NULL. Change-Id: I057bdc6ab2859cc4d40de5ed428d0c20028b8435 --- M be/src/exec/parquet/parquet-metadata-utils.cc M be/src/exec/parquet/parquet-metadata-utils.h M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/types.cc M be/src/runtime/types.h M be/src/service/query-options-test.cc M common/thrift/CatalogObjects.thrift M common/thrift/Descriptors.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/Types.thrift M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/IcebergColumn.java A fe/src/main/java/org/apache/impala/catalog/IcebergStructField.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/StructType.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/Type.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/1-1-13d79bd6-4b97-4680-b4e1-52e93b6ce04e-0.parquet A 
testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/6-6-305c9b7a-f42d-4245-b806-dfa7a792593f-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/9-9-224fe2d6-b0d9-42d6-bc95-15f52ecb29ad-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-1a38e294-5992-48d9-a18e-08e129bb418c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-74cfcf22-3de2-489a-b1ec-d5141e75a8e8-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-5f91dc85-b8f3-4cc2-a5c6-38b7fee49709-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-dc3510cc-e765-43bc-be03-c5561a8d50a3-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-364afc4a-b718-406d-a532-58fab5c8f85d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/4-4-7a1a8e89-8aeb-4405-be64-76557432cf21-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00014-14-765d552a-fddc-42f3-adfd-ecba20a01d80-0.parquet A 
testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00015-15-9957db43-3b9a-4a50-9946-d003cc1d461c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00019-19-1e1895d0-1f42-4c30-989f-968802831077-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00020-20-bb59ac6d-aeee-4c35-9f8a-1a03127d33b8-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00028-28-44ba3ad9-737c-4416-a32c-501cc9a4aa90-0.parquet A
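To illustrate the idea behind the IMPALA-10361 commit message above, here is a minimal sketch of field-id resolution compared with name-based resolution. It is not Impala's implementation: FileColumn, ResolveByFieldId, ResolveByName, and the -1 convention for "no field id" are hypothetical names and assumptions made for this example only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified view of one column in a Parquet file schema.
struct FileColumn {
  std::string name;
  int field_id;  // -1 when the writer stored no field id (assumed convention).
};

// Returns the index of the file column carrying `table_field_id`, or -1 when
// no column matches, in which case a reader would fill the column with NULLs.
int ResolveByFieldId(const std::vector<FileColumn>& file_schema,
                     int table_field_id) {
  assert(table_field_id >= 0);
  for (int i = 0; i < static_cast<int>(file_schema.size()); ++i) {
    if (file_schema[i].field_id == table_field_id) return i;
  }
  return -1;
}

// Name-based lookup for comparison; it stops matching once a column is renamed.
int ResolveByName(const std::vector<FileColumn>& file_schema,
                  const std::string& table_column_name) {
  for (int i = 0; i < static_cast<int>(file_schema.size()); ++i) {
    if (file_schema[i].name == table_column_name) return i;
  }
  return -1;
}
```

With a file written as {"id", 1}, {"user", 2} and a table column later renamed to "user_name" (still field id 2), the field-id lookup finds index 1 while the name lookup finds nothing. A file whose columns carry no field ids never matches, which is consistent with the commit message's warning that non-Iceberg tables return NULL under this setting.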
[Impala-ASF-CR] IMPALA-10377 Improve the accuracy of resource estimation PlanNode does not consider some factors when estimating memory, this will cause a large error rate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16822 ) Change subject: IMPALA-10377 Improve the accuracy of resource estimation PlanNode does not consider some factors when estimating memory, this will cause a large error rate .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7783/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I46c656bc88b969f4de99e187df16be3887592f3d Gerrit-Change-Number: 16822 Gerrit-PatchSet: 1 Gerrit-Owner: Anonymous Coward <54liu...@163.com> Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sat, 05 Dec 2020 03:39:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10377 Improve the accuracy of resource estimation PlanNode does not consider some factors when estimating memory, this will cause a large error rate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16822 ) Change subject: IMPALA-10377 Improve the accuracy of resource estimation PlanNode does not consider some factors when estimating memory, this will cause a large error rate .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6731/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/16822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I46c656bc88b969f4de99e187df16be3887592f3d Gerrit-Change-Number: 16822 Gerrit-PatchSet: 1 Gerrit-Owner: Anonymous Coward <54liu...@163.com> Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sat, 05 Dec 2020 03:31:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16435 ) Change subject: IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7782/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38 Gerrit-Change-Number: 16435 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Sat, 05 Dec 2020 03:27:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16435 ) Change subject: IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6730/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38 Gerrit-Change-Number: 16435 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Sat, 05 Dec 2020 03:18:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10377 Improve the accuracy of resource estimation PlanNode does not consider some factors when estimating memory, this will cause a large error rate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16822 ) Change subject: IMPALA-10377 Improve the accuracy of resource estimation PlanNode does not consider some factors when estimating memory, this will cause a large error rate .. Patch Set 1: (39 comments) http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@64 PS1, Line 64: // If the group clause is empty ( aggInfo.getGroupingExprs() is empty ), line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@538 PS1, Line 538: // A skew factor of 1.5 was added to account for data skew among multiple fragment instances. line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@538 PS1, Line 538: // A skew factor of 1.5 was added to account for data skew among multiple fragment instances. line too long (106 > 90) http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@539 PS1, Line 539: // This number was derived using empirical analysis of real-world and benchmark (tpch, tpcds) queries. 
line too long (114 > 90) http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@544 PS1, Line 544: perInstanceInputCardinality = (long) Math.ceil(inputCardinality / numInstances); line too long (92 > 90) http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@551 PS1, Line 551: line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@553 PS1, Line 553: // A reduction factor of 2 (input rows divided by output rows) was added to grow hash tables. line too long (103 > 90) http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@554 PS1, Line 554: // If the reduction factor is lower than 2, only part of the data will be inserted into the hash table. line too long (113 > 90) http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@562 PS1, Line 562: // The memory of the data stored in hash table and the memory of the hash table's structure line too long (99 > 90) http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@563 PS1, Line 563: perInstanceDataBytes = (long)Math.ceil(perInstanceCardinality * line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@564 PS1, Line 564:(avgRowSize_ + PlannerContext.SIZE_OF_BUCKET)); line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@239 PS1, Line 239: // The memory of the data stored in hash table and line has trailing whitespace 
http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@242 PS1, Line 242: BitUtil.roundUpToPowerOf2((long) Math.ceil(3 * rhsCard / 2)) * line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@245 PS1, Line 245: perBuildInstanceDataBytes += (rhsCard - rhsNdv) * line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/PlannerContext.java File fe/src/main/java/org/apache/impala/planner/PlannerContext.java: http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/PlannerContext.java@45 PS1, Line 45: line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/SortNode.java File fe/src/main/java/org/apache/impala/planner/SortNode.java: http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/SortNode.java@338 PS1, Line 338: perInstanceMemEstimate = fullInputSize < 0 ? line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/test/java/org/apache/impala/planner/CardinalityTest.java File fe/src/test/java/org/apache/impala/planner/CardinalityTest.java:
[Impala-ASF-CR] IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16435 ) Change subject: IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators .. Patch Set 5: Code-Review+2 (1 comment) Thanks for the quick review, Tim! Carrying over Tim's +2. http://gerrit.cloudera.org:8080/#/c/16435/4/be/src/exec/catalog-op-executor.cc File be/src/exec/catalog-op-executor.cc: http://gerrit.cloudera.org:8080/#/c/16435/4/be/src/exec/catalog-op-executor.cc@72 PS4, Line 72: VerifyMinimalResponse > nit: VerifyMinimalResponse Done -- To view, visit http://gerrit.cloudera.org:8080/16435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38 Gerrit-Change-Number: 16435 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Sat, 05 Dec 2020 03:17:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10377 Improve the accuracy of resource estimation PlanNode does not consider some factors when estimating memory, this will cause a large error rate
54liu...@163.com has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16822 ) Change subject: IMPALA-10377 Improve the accuracy of resource estimation PlanNode does not consider some factors when estimating memory, this will cause a large error rate .. IMPALA-10377 Improve the accuracy of resource estimation PlanNode does not consider some factors when estimating memory, this will cause a large error rate
AggregationNode
1. The memory occupied by the hash table's own data structure is not considered. Each new value inserted into the hash table adds a bucket, and a bucket is 16 bytes.
2. When estimating the NDV of a merge aggregation with multiple grouping exprs, the NDV may be divided by the number of fragment instances several times; it should be divided only once.
3. When estimating the NDV of a merge aggregation with multiple grouping exprs, the estimated memory is much smaller than the actual usage.
4. If there are no grouping exprs, the estimated memory is much larger than the actual usage.
5. If the NDV of the grouping exprs is very small, the estimated memory is much larger than the actual usage.
SortNode
1. Estimate the memory usage of external sort. The estimated memory is much smaller than the actual usage.
HashJoinNode
1. The memory occupied by the hash table's own data structure is not considered. The hash table keeps duplicate data, so the size of DuplicateNode should be considered.
2. The hash table creates multiple buckets in advance. The size of these buckets should be considered.
KuduScanNode
1. Memory is estimated by scanning all columns, so the estimated memory is much larger than the actual usage.
Change-Id: I46c656bc88b969f4de99e187df16be3887592f3d --- M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlannerContext.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java 8 files changed, 227 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/16822/1 -- To view, visit http://gerrit.cloudera.org:8080/16822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I46c656bc88b969f4de99e187df16be3887592f3d Gerrit-Change-Number: 16822 Gerrit-PatchSet: 1 Gerrit-Owner: Anonymous Coward <54liu...@163.com>
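The hash table overhead described in the IMPALA-10377 commit message can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch's code: the 16-byte bucket size comes from the commit message, and the power-of-two rounding of 1.5 times the build cardinality mirrors the HashJoinNode line quoted in the review comments, but the function names, the duplicate_node_bytes parameter, and the way the terms are summed are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bucket size stated in the commit message.
constexpr int64_t kBucketBytes = 16;

// Smallest power of two that is >= v, for v > 0.
int64_t RoundUpToPowerOf2(int64_t v) {
  assert(v > 0);
  int64_t p = 1;
  while (p < v) p <<= 1;
  return p;
}

// Aggregation: each distinct group stores its row plus one bucket.
int64_t AggHashTableBytes(int64_t num_groups, int64_t avg_row_bytes) {
  assert(num_groups >= 0 && avg_row_bytes >= 0);
  return num_groups * (avg_row_bytes + kBucketBytes);
}

// Hash join build side: buckets allocated up front plus one duplicate node
// for every build row beyond the first with the same key.
// `duplicate_node_bytes` is a caller-supplied assumption in this sketch.
int64_t JoinHashTableOverheadBytes(int64_t rhs_card, int64_t rhs_ndv,
                                   int64_t duplicate_node_bytes) {
  assert(rhs_card >= 1 && rhs_ndv >= 0 && rhs_ndv <= rhs_card);
  // (3 * rhs_card + 1) / 2 is ceil(1.5 * rhs_card) in integer arithmetic.
  int64_t buckets = RoundUpToPowerOf2((3 * rhs_card + 1) / 2);
  int64_t duplicates = rhs_card - rhs_ndv;
  return buckets * kBucketBytes + duplicates * duplicate_node_bytes;
}
```

For example, 1000 build rows with 600 distinct keys and an assumed 16-byte duplicate node give 2048 buckets (32768 bytes) plus 400 duplicate nodes (6400 bytes). The ceiling is computed with integer arithmetic on purpose, because dividing two integers first and applying a ceiling afterwards would already have truncated the result.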
[Impala-ASF-CR] IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators
Hello Vihang Karajgaonkar, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16435 to look at the new patch set (#5). Change subject: IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators .. IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators Catalogd RPC responses contain the updated catalog objects in full form. For instance, an RPC for adding a new partition to an HdfsTable will return the whole HdfsTable object (metadata) containing all the partitions. This is required by legacy coordinators, where the whole HdfsTable object is used to replace the stale object (metadata snapshot). However, LocalCatalog coordinators just need the object names for invalidations. It's a waste of space to send the full catalog objects to LocalCatalog coordinators. There is also a risk of OOM due to hitting the Java array limit when serializing a table that has a huge metadata footprint. This patch refactors the catalogd RPC responses to send back only the invalidations that are needed. To distinguish between legacy and LocalCatalog coordinators, a new field, want_minimal_response, is introduced in TCatalogServiceRequestHeader, which is the header for most of the Catalogd RPC requests (e.g. TDdlExecRequest, TUpdateCatalogRequest and TResetMetadataRequest). LocalCatalog coordinators will set this field to true. When adding updated catalog objects to the response, catalogd will add invalidations which only contain the object names (e.g. db name, table name). Note that function objects are small, so they are ignored in this optimization. Tests: - Add DCHECKs in catalog-op-executor.cc to verify that the catalog objects received by LocalCatalog coordinators are in minimal mode. - Run test_ddl.py in both legacy catalog mode and local catalog mode. 
Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38 --- M be/src/exec/catalog-op-executor.cc M be/src/service/client-request-state.cc M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogObject.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java 10 files changed, 239 insertions(+), 139 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/16435/5 -- To view, visit http://gerrit.cloudera.org:8080/16435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38 Gerrit-Change-Number: 16435 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar
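As a rough illustration of the want_minimal_response behaviour described in the IMPALA-9936 commit message, the sketch below strips metadata from non-function objects when the requester asks for a minimal response. The CatalogObject struct and BuildDdlResponse function are hypothetical stand-ins for this example and do not reflect the real Thrift types or catalogd code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a catalog object carried in a DDL response.
struct CatalogObject {
  std::string name;      // e.g. "db.tbl"; enough for an invalidation.
  std::string metadata;  // Potentially huge serialized metadata.
  bool is_function;      // Function objects are small and sent in full.
};

// Builds the objects to return. With `want_minimal_response` set, only names
// are kept for non-function objects, so the response acts as an invalidation.
std::vector<CatalogObject> BuildDdlResponse(
    const std::vector<CatalogObject>& updated, bool want_minimal_response) {
  std::vector<CatalogObject> out;
  out.reserve(updated.size());
  for (const CatalogObject& obj : updated) {
    assert(!obj.name.empty());
    CatalogObject copy = obj;
    if (want_minimal_response && !obj.is_function) copy.metadata.clear();
    out.push_back(copy);
  }
  return out;
}
```

A legacy coordinator would call this with want_minimal_response set to false and receive the full objects, while a LocalCatalog coordinator would pass true and receive names only for tables and databases.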
[Impala-ASF-CR] IMPALA-9865: part 1: basic profile log parser
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16821 ) Change subject: IMPALA-9865: part 1: basic profile log parser .. Patch Set 2: (2 comments) > Uploaded patch set 2. Hi Tim, Thanks for writing this parser. This is very useful. I just have 2 comments. http://gerrit.cloudera.org:8080/#/c/16821/2/be/src/util/impala-profile-tool.cc File be/src/util/impala-profile-tool.cc: http://gerrit.cloudera.org:8080/#/c/16821/2/be/src/util/impala-profile-tool.cc@31 PS2, Line 31: // is pretty-printed to standard output. Maybe add a simple usage example in the doc? For example: impala-profile-tool < impala_profile_log_1.1-1607057366897 http://gerrit.cloudera.org:8080/#/c/16821/2/be/src/util/impala-profile-tool.cc@59 PS2, Line 59: getline(cin, line); I tried to run the parser against my local runtime profile log. It seems it always hits "Error reading line" when it reaches EOF. What if we move this getline into the loop condition? Say, for (std::string line; std::getline(cin, line); ) { -- To view, visit http://gerrit.cloudera.org:8080/16821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6178399ac96e176f7067cc47347e51cda2f3 Gerrit-Change-Number: 16821 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 05 Dec 2020 02:48:40 + Gerrit-HasComments: Yes
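The loop shape suggested in the review comment above can be sketched as below. This is a standalone illustration, not the code in impala-profile-tool.cc; ReadAllLines and its output parameter are hypothetical names. Because std::getline is the loop condition, reaching EOF simply ends the loop, and only a real I/O failure (badbit) is reported as an error.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads every line from `in` into `lines`. Returns false only if the stream
// hit an unrecoverable I/O error; a normal EOF is not treated as an error.
bool ReadAllLines(std::istream& in, std::vector<std::string>* lines) {
  assert(lines != nullptr);
  // The loop exits as soon as getline fails, which includes plain EOF.
  for (std::string line; std::getline(in, line);) {
    lines->push_back(line);
  }
  return !in.bad();
}
```

In the tool itself the call would presumably be ReadAllLines(std::cin, &lines), or the loop body would process each line directly instead of collecting them. Input without a trailing newline still yields its last line, and empty input yields no lines and no error.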
[Impala-ASF-CR] IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16435 ) Change subject: IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators .. Patch Set 4: Code-Review+2 (1 comment) This makes sense to me. http://gerrit.cloudera.org:8080/#/c/16435/4/be/src/exec/catalog-op-executor.cc File be/src/exec/catalog-op-executor.cc: http://gerrit.cloudera.org:8080/#/c/16435/4/be/src/exec/catalog-op-executor.cc@72 PS4, Line 72: verifyMinimalResponse nit: VerifyMinimalResponse -- To view, visit http://gerrit.cloudera.org:8080/16435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38 Gerrit-Change-Number: 16435 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Sat, 05 Dec 2020 01:48:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9865: part 1: basic profile log parser
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16821 ) Change subject: IMPALA-9865: part 1: basic profile log parser .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7781/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6178399ac96e176f7067cc47347e51cda2f3 Gerrit-Change-Number: 16821 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 05 Dec 2020 01:24:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9865: part 1: basic profile log parser
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16821 ) Change subject: IMPALA-9865: part 1: basic profile log parser .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7780/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6178399ac96e176f7067cc47347e51cda2f3 Gerrit-Change-Number: 16821 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 05 Dec 2020 01:23:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9865: part 1: basic profile log parser
Hello Riza Suminto, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16821 to look at the new patch set (#2). Change subject: IMPALA-9865: part 1: basic profile log parser .. IMPALA-9865: part 1: basic profile log parser This adds a utility that consumes the Impala profile log format from stdin and pretty-prints the profiles. It supports some basic filters - --query_id, --min_timestamp and --max_timestamp. If --gen_experimental_profile=true is set, it dumps the aggregated part of the profile with the full output for the new experimental profiles. In a future change, we should detect this based on the profile version set. This utility will be extended in future with more options, but is already useful in that it can handle the new experimental profile format and produce pretty-printed output consistent with the Impala web UI and impala-shell. Change-Id: I6178399ac96e176f7067cc47347e51cda2f3 --- M be/src/util/CMakeLists.txt A be/src/util/impala-profile-tool.cc M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h 4 files changed, 115 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/16821/2 -- To view, visit http://gerrit.cloudera.org:8080/16821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6178399ac96e176f7067cc47347e51cda2f3 Gerrit-Change-Number: 16821 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto
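The basic filters named in the commit message above (--query_id, --min_timestamp and --max_timestamp) could be combined along these lines. This is a guess at the semantics for illustration only: the ProfileFilter struct, the Matches function, the integer timestamp type, the inclusive bounds, and "empty query id means no filter" are all assumptions, not behaviour confirmed by the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical container for the three filter flags.
struct ProfileFilter {
  std::string query_id;  // Empty is assumed to mean "match any query".
  int64_t min_timestamp;
  int64_t max_timestamp;
  ProfileFilter()
      : min_timestamp(std::numeric_limits<int64_t>::min()),
        max_timestamp(std::numeric_limits<int64_t>::max()) {}
};

// Returns true if a profile with the given id and timestamp passes the
// filter. Both timestamp bounds are treated as inclusive in this sketch.
bool Matches(const ProfileFilter& filter, const std::string& query_id,
             int64_t timestamp) {
  assert(filter.min_timestamp <= filter.max_timestamp);
  if (!filter.query_id.empty() && filter.query_id != query_id) return false;
  return timestamp >= filter.min_timestamp &&
         timestamp <= filter.max_timestamp;
}
```

A default-constructed filter accepts every profile, which matches the intuition that the tool pretty-prints everything when no filter flag is given.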
[Impala-ASF-CR] IMPALA-9865: part 1: basic profile log parser
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16821 ) Change subject: IMPALA-9865: part 1: basic profile log parser .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/16821/1/be/src/util/impala-profile-tool.cc File be/src/util/impala-profile-tool.cc: http://gerrit.cloudera.org:8080/#/c/16821/1/be/src/util/impala-profile-tool.cc@34 PS1, Line 34: // --query_id=: given an impala query ID, only process profiles with this query id line too long (92 > 90) -- To view, visit http://gerrit.cloudera.org:8080/16821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6178399ac96e176f7067cc47347e51cda2f3 Gerrit-Change-Number: 16821 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sat, 05 Dec 2020 01:02:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9865: part 1: basic profile log parser
Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16821 ) Change subject: IMPALA-9865: part 1: basic profile log parser .. IMPALA-9865: part 1: basic profile log parser This adds a utility that consumes the Impala profile log format from stdin and pretty-prints the profiles. It supports some basic filters - --query_id, --min_timestamp and --max_timestamp. If --gen_experimental_profile=true is set, it dumps the aggregated part of the profile with the full output for the new experimental profiles. In a future change, we should detect this based on the profile version set. This utility will be extended in future with more options, but is already useful in that it can handle the new experimental profile format and produce pretty-printed output consistent with the Impala web UI and impala-shell. Change-Id: I6178399ac96e176f7067cc47347e51cda2f3 --- M be/src/util/CMakeLists.txt A be/src/util/impala-profile-tool.cc M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h 4 files changed, 114 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/16821/1 -- To view, visit http://gerrit.cloudera.org:8080/16821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I6178399ac96e176f7067cc47347e51cda2f3 Gerrit-Change-Number: 16821 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation PlanRootSink can fail silently if result spooling is enabled and maxMemReservationBytes is less than 2 * MAX_ROW_SIZE. This happens because results are spilled using a SpillableRowBatchQueue, which needs 2 buffers (read and write) with at least MAX_ROW_SIZE bytes per buffer. This patch fixes this by setting a lower bound of 2 * MAX_ROW_SIZE while computing the min reservation for the PlanRootSink. Testing: - Pass exhaustive tests. - Add e2e TestResultSpoolingMaxReservation. - Lower MAX_ROW_SIZE on tests where MAX_RESULT_SPOOLING_MEM is set to an extremely low value. Also verify that PLAN_ROOT_SINK's ReservationLimit remains unchanged after lowering the MAX_ROW_SIZE. Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Reviewed-on: http://gerrit.cloudera.org:8080/16765 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/runtime/buffered-tuple-stream.h M be/src/runtime/spillable-row-batch-queue.cc M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M tests/custom_cluster/test_query_retries.py M tests/query_test/test_result_spooling.py 5 files changed, 118 insertions(+), 9 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
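The lower bound described in the commit message above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name and the way the inputs are passed are hypothetical, and the real logic lives in PlanRootSink.java and may be structured differently.

```python
def reservation_with_row_size_floor(requested_bytes: int, max_row_size: int) -> int:
    """Clamp a reservation so it can hold one read and one write buffer.

    Hypothetical helper (not Impala's API): the spillable queue needs two
    buffers of at least max_row_size bytes each, so the floor is
    2 * max_row_size.
    """
    if max_row_size <= 0:
        raise ValueError("max_row_size must be positive")
    return max(requested_bytes, 2 * max_row_size)
```

With a 512 KiB row size, a very small requested reservation would be raised to 1 MiB, while a larger request passes through unchanged.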
[Impala-ASF-CR] IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 8: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 8 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 04 Dec 2020 23:55:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16792 ) Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7779/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b Gerrit-Change-Number: 16792 Gerrit-PatchSet: 6 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 04 Dec 2020 22:30:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint
Hello Qifan Chen, Shant Hovsepian, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16792 to look at the new patch set (#6). Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint .. IMPALA-10360: Allow simple limit to be treated as sampling hint As a follow-up to IMPALA-10314, it is sometimes useful to consider a simple limit as a way to sample from a table if a relevant hint has been provided. Doing a sample instead of a pure limit serves dual purposes: (a) it still helps with reducing the planning time since the scan ranges need to be computed only for the sample files, (b) it allows a sufficient number of files/rows to be read from the table such that after applying filter conditions or joins with another table, the query may still produce the N rows needed for limit. This functionality is especially useful if the query is against a view (note that TABLESAMPLE clause cannot be applied to a view). In this patch, a new table level hint, 'convert_limit_to_sample' is added. If this hint is attached to a table either in the main query block or within a view/subquery and simple limit optimization conditions are satisfied (according to IMPALA-10314), the limit is converted to a table sample. For example: set optimize_simple_limit = true; CREATE VIEW v1 as SELECT * FROM T [convert_limit_to_sample] WHERE [always_true] ; SELECT * FROM v1 LIMIT 10; In this case, the limit 10 is converted to a sample of T and the sampling percent is the greater of 1% or ratio (in percent) of limit to the estimated row count of the table (after partition pruning). Testing: - Added an alltypes_date_partition_2 table where the date and timestamp values match (this helps with setting the 'always_true' hint). - Added views with 'convert_limit_to_sample' and 'always_true' hints and added new tests against the views. Modified a few existing tests to reference the new table variant. - Added an end-to-end test. 
Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b --- M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/TableRef.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M testdata/bin/compute-table-stats.sh M testdata/datasets/functional/functional_schema_template.sql M testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test M testdata/workloads/functional-query/queries/QueryTest/range-constant-propagation.test 10 files changed, 285 insertions(+), 34 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/16792/6 -- To view, visit http://gerrit.cloudera.org:8080/16792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b Gerrit-Change-Number: 16792 Gerrit-PatchSet: 6 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
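The sampling-percent rule stated in the commit message above (the greater of 1% or the ratio of the limit to the estimated row count) might look like this in a minimal sketch. The function name, the rounding up to a whole percent, the cap at 100, and the fallback when no row estimate is available are all assumptions made for illustration, not details confirmed by the patch.

```python
def sample_percent(limit: int, estimated_rows: int) -> int:
    """Pick a whole-number sampling percent for a LIMIT converted to a sample.

    Hypothetical helper. Assumptions: the ratio is rounded up to a whole
    percent, the result is capped at 100, and a missing row estimate
    (<= 0) falls back to reading everything.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    if estimated_rows <= 0:
        return 100  # assumption: without stats, do not sample at all
    # Integer ceiling of 100 * limit / estimated_rows.
    ratio = (100 * limit + estimated_rows - 1) // estimated_rows
    return min(100, max(1, ratio))
```

For a LIMIT 10 against an estimated one million rows the ratio is far below 1%, so the 1% floor applies; against an estimated 100 rows the sample would be 10%.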
[Impala-ASF-CR] [WIP] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16720 ) Change subject: [WIP] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. Patch Set 27: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7778/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 27 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 04 Dec 2020 21:24:11 + Gerrit-HasComments: No
[Impala-ASF-CR] [WIP] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16720 ) Change subject: [WIP] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. Patch Set 27: (8 comments) http://gerrit.cloudera.org:8080/#/c/16720/27/be/src/util/min-max-filter-test.cc File be/src/util/min-max-filter-test.cc: http://gerrit.cloudera.org:8080/#/c/16720/27/be/src/util/min-max-filter-test.cc@589 PS27, Line 589: EXPECT_EQ(overflow, false); \ line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/16720/27/be/src/util/min-max-filter-test.cc@592 PS27, Line 592: EXPECT_EQ(overflow, false); \ line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/16720/27/be/src/util/min-max-filter-test.cc@597 PS27, Line 597: EXPECT_EQ(overflow, false); \ line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/16720/27/be/src/util/min-max-filter-test.cc@600 PS27, Line 600: EXPECT_EQ(overflow, false); \ line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/16720/27/be/src/util/min-max-filter-test.cc@649 PS27, Line 649: CheckDecimalVals(filter##SIZE, decimal##SIZE##_type, d1##SIZE, d1##SIZE); \ line too long (108 > 90) http://gerrit.cloudera.org:8080/#/c/16720/27/be/src/util/min-max-filter-test.cc@653 PS27, Line 653: CheckDecimalVals(filter##SIZE, decimal##SIZE##_type, d1##SIZE, d2##SIZE); \ line too long (108 > 90) http://gerrit.cloudera.org:8080/#/c/16720/27/be/src/util/min-max-filter-test.cc@657 PS27, Line 657: CheckDecimalVals(filter##SIZE, decimal##SIZE##_type, d3##SIZE, d2##SIZE); \ line too long (108 > 90) http://gerrit.cloudera.org:8080/#/c/16720/27/be/src/util/min-max-filter-test.cc@669 PS27, Line 669: CheckDecimalVals(filter##SIZE##2, decimal##SIZE##_type, d3##SIZE, d2##SIZE); \ line too long (110 > 90) -- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment 
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 27 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 04 Dec 2020 21:05:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [WIP] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Qifan Chen has uploaded a new patch set (#27). ( http://gerrit.cloudera.org:8080/16720 ) Change subject: [WIP] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. [WIP] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate This patch adds the logic to utilize min/max stats for Parquet row groups or pages to skip these entities when they don't qualify for an equi-join predicate. A new class of predicates called overlap predicates is introduced to aid in the determination of whether a Parquet row group or a page overlaps with a range computed from the hash join. If not, then the entire Parquet row group or the page is skipped. The new class of predicates coexists with the existing min/max conjuncts that are introduced based on the local or transitive scan predicates. Both classes of predicates can work individually or together with each other. The overlap predicates are evaluated after the existing min/max conjuncts. Two new run-time profile counters are added for the number of row groups and pages, respectively, filtered out via the overlap predicates: 1. NumMinMaxFilteredRowGroups 2. NumMinMaxFilteredPages An overlap predicate associated with a join column of type J and a scan column of type S will be formed provided one of the following is true: both S and J are Booleans; both S and J are integers (tinyint, smallint, int, or bigint); both S and J are approximate numerics (float or double); both S and J are Decimals with the same precision and scale; both S and J are strings (STRING, CHAR or VARCHAR); both S and J are dates; both S and J are timestamps. Testing: 1. Added data type specific overlap method tests in min-max-filter-test.cc (boolean, int, string, date, timestamp and decimal); 2. Unit tested on various column types (int, bigint, string and decimal) with TPCH and TPCDS tables. 
Benefits were significant when the join column on the outer table is sorted, or when the min/max boundary values of the pages or row groups are monotonic; 3. Added new tests in min_max_filters.test to demonstrate filtered pages and row groups. TBD: 1. Compute a usefulness score for the overlap predicate and integrate it into MAX_NUM_RUNTIME_FILTERS limit; 2. Performance measurement; 3. Core testing. Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 --- M be/src/exec/exec-node.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-stats.cc M be/src/exec/parquet/parquet-column-stats.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator.cc M be/src/runtime/date-value.cc M be/src/runtime/date-value.h M be/src/runtime/decimal-value.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/util/min-max-filter-test.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test 23 files changed, 1,090 insertions(+), 153 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/16720/27 -- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 27 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy
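The overlap test described in the patch set above can be illustrated with a small sketch: a row group or page is kept only if its [min, max] statistics intersect the range derived from the join's build side. The names below are hypothetical and the ranges are plain integers; the actual implementation in min-max-filter.cc handles several column types and is not reproduced here.

```python
from typing import List, Sequence, Tuple

Range = Tuple[int, int]  # inclusive (min, max)


def overlaps(stats: Range, join_range: Range) -> bool:
    """Return True if two inclusive ranges share at least one value."""
    lo, hi = stats
    join_lo, join_hi = join_range
    return lo <= join_hi and join_lo <= hi


def pages_to_read(page_stats: Sequence[Range], join_range: Range) -> List[int]:
    """Indexes of pages whose min/max stats overlap the join range.

    Hypothetical helper: pages that do not overlap can be skipped because
    no row in them can match the equi-join predicate.
    """
    return [i for i, stats in enumerate(page_stats) if overlaps(stats, join_range)]
```

This also suggests why the reported benefit is largest on sorted join columns: when page ranges are disjoint and monotonic, a narrow join range intersects only a few of them.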
[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16792 ) Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@223 PS5, Line 223: if (getTableRefs().size() == 1) return true; > Should we remove this? It seems hasConvertLimitToSampleHint() can return tr We cannot remove this because as I mentioned in a previous comment (patchset 2) the table level hint is not required in order for simple limit optimization to be applied. For example, there are 2 cases: 1. select * from (select * from t where [always_true] a > 0) limit 10; 2. select * from (select * from t [convert_limit_to_sample] where [always_true] a > 0) limit 10; In both cases, we want to be able to apply the optimization. In case 1, it will just pick first 10 files while in case 2 it will sample across multiple partitions. Case 1 will typically be much faster planning time, so we should support that. http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java: http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@209 PS2, Line 209: estimatedTotalRows > Sounds about right. I haven't looked into why the past decision was to only support whole numbers for the sampling but probably the use case wasn't there to motivate supporting fractional values. You may want to look into the history but yeah as I said smaller sample size would be useful in this situation. 
http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java: http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@217 PS5, Line 217: partitions.size()/numTotalPartitions > Cool! This will work well when the partitions are about the same size, whic Yes, there is an assumption of uniform distribution (I should add a comment) as this is still a heuristic. I would like to avoid the per partition numRows estimate since that can be way off and in the past Tim has also discouraged using it. The total estimated row count can be completely off as well, so I acknowledge there's weakness here. Even if it was accurate, I also didn't want to add a for loop to add up the numRows of the surviving partitions since that could potentially run into tens of thousands or hundreds of thousands (especially if no pruning happens which is quite common). I am beginning to think the only foolproof way is to let the user specify exact percentage in the hint. e.g [convert_limit_to_sample=5]. This guarantees a 5% sampling of surviving partitions if limit is present and does not rely on stats etc. What do you guys think ? I will have to add a bit of parsing logic to the hint processing. -- To view, visit http://gerrit.cloudera.org:8080/16792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b Gerrit-Change-Number: 16792 Gerrit-PatchSet: 5 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 04 Dec 2020 18:57:36 + Gerrit-HasComments: Yes
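The uniform-distribution heuristic discussed in this reply (scaling the estimated total row count by partitions.size()/numTotalPartitions) can be sketched as below. The function name and the handling of a zero partition count are assumptions for illustration only; the code in HdfsPartitionPruner.java may differ.

```python
def scaled_row_estimate(estimated_total_rows: int,
                        num_surviving_partitions: int,
                        num_total_partitions: int) -> int:
    """Scale a table-level row estimate to the partitions left after pruning.

    Hypothetical helper. Assumes rows are spread evenly across partitions,
    which is the acknowledged weakness of the heuristic. If the table has
    no partitions recorded, the estimate is returned unscaled (assumption).
    """
    if num_total_partitions <= 0:
        return estimated_total_rows
    return estimated_total_rows * num_surviving_partitions // num_total_partitions
```

The appeal of this form is that it needs only two counts, so it avoids looping over what may be tens of thousands of surviving partitions.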
[Impala-ASF-CR] IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 8 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 04 Dec 2020 18:23:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 8: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6729/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 8 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 04 Dec 2020 18:23:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 04 Dec 2020 17:46:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16792 ) Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint .. Patch Set 5: (3 comments) Looks good to me! http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@223 PS5, Line 223: if (getTableRefs().size() == 1) return true; Should we remove this? It seems hasConvertLimitToSampleHint() can return true or false depending on whether the hint has been set on the only table ref here. It might not be set. http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java: http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@209 PS2, Line 209: estimatedTotalRows > I made this change to use a scaled down value of the estimated row count ( Sounds about right. I also like the idea of specifying the sample size in terms of number of rows, which will speed up the sampling of a few rows from a very large table, where 1% could be on the order of a million rows. I can file a JIRA on this and work on it after the min/max work. http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java: http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@217 PS5, Line 217: partitions.size()/numTotalPartitions Cool! This will work well when the partitions are about the same size, which is mostly true with hash partitions. 
For other partition schemes with unequal sizes, such as range partitioning, I wonder if the use of HdfsPartition::numRows_ would work: sample rate = #rows to return / # rows in the surviving partitions. -- To view, visit http://gerrit.cloudera.org:8080/16792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b Gerrit-Change-Number: 16792 Gerrit-PatchSet: 5 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 04 Dec 2020 14:34:48 + Gerrit-HasComments: Yes
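The alternative proposed in this comment (sample rate = rows to return / rows in the surviving partitions) could be sketched as follows. The function name is hypothetical, and treating negative per-partition counts as "unknown" and falling back to a rate of 1.0 are assumptions for illustration; the comment itself only states the ratio.

```python
from typing import Iterable


def sample_rate_from_partitions(rows_to_return: int,
                                partition_row_counts: Iterable[int]) -> float:
    """Sample rate as rows wanted divided by rows in surviving partitions.

    Hypothetical helper. Assumptions: a negative count means the
    partition's row count is unknown and is ignored, and when nothing is
    known the whole input is read (rate 1.0).
    """
    known_rows = sum(n for n in partition_row_counts if n > 0)
    if known_rows == 0:
        return 1.0
    return min(1.0, rows_to_return / known_rows)
```

Unlike the uniform-scaling heuristic, this walks every surviving partition, which is the cost raised in the reply to this comment.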
[Impala-ASF-CR] IMPALA-10361: Use field id to resolve columns for Iceberg tables
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16788 ) Change subject: IMPALA-10361: Use field id to resolve columns for Iceberg tables .. Patch Set 5: Code-Review+1 (2 comments) Thanks for adding the tests, the change looks great. I'm planning to do another round next week, so only giving it +1 for now. I think for Iceberg tables we should always try to resolve columns via field id, i.e. for Iceberg tables we can ignore the value of PARQUET_FALLBACK_SCHEMA_RESOLUTION. Do you plan to implement this for ORC tables as well (in a separate patch)? Maybe we should open another Jira/subtask for that. http://gerrit.cloudera.org:8080/#/c/16788/5/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java File fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java: http://gerrit.cloudera.org:8080/#/c/16788/5/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@339 PS5, Line 339: boolean isFullAcidTable = AcidUtils.isFullAcidTable(msTbl.getParameters()); Iceberg tables cannot be full ACID, maybe it can be a precondition. http://gerrit.cloudera.org:8080/#/c/16788/5/testdata/data/README File testdata/data/README: http://gerrit.cloudera.org:8080/#/c/16788/5/testdata/data/README@608 PS5, Line 608: generated file will contains multi blocks, multi pages per block. Please add information about the newly added files and tests. -- To view, visit http://gerrit.cloudera.org:8080/16788 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I057bdc6ab2859cc4d40de5ed428d0c20028b8435 Gerrit-Change-Number: 16788 Gerrit-PatchSet: 5 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Fri, 04 Dec 2020 13:19:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10361: Use field id to resolve columns for Iceberg tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16788 ) Change subject: IMPALA-10361: Use field id to resolve columns for Iceberg tables .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks// : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16788 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I057bdc6ab2859cc4d40de5ed428d0c20028b8435 Gerrit-Change-Number: 16788 Gerrit-PatchSet: 5 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Fri, 04 Dec 2020 11:48:08 + Gerrit-HasComments: No
[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 ) Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/16741/7/be/src/exprs/iceberg-functions-ir.cc File be/src/exprs/iceberg-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/16741/7/be/src/exprs/iceberg-functions-ir.cc@59 PS7, Line 59: if (input.val4 < 0 && result.val4 > 0) { : return TruncatePartitionTransformDecimalImpl(input.val4, width.val); : } Could you add a comment what happens here? Shouldn't we use something like RETURN_IF_OVERFLOW in decimal-operators-ir.cc? impala_udf::DecimalVal is able to hold decimals with any size, but impala::DecimalVal might only have 4 bytes of storage, this might be problematic in some cases. -- To view, visit http://gerrit.cloudera.org:8080/16741 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd Gerrit-Change-Number: 16741 Gerrit-PatchSet: 7 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 04 Dec 2020 11:48:30 + Gerrit-HasComments: Yes
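To make the overflow concern in this comment concrete, here is a small sketch of truncating a decimal's unscaled value toward negative infinity, as the Iceberg truncate transform does, together with a check for whether the result still fits in 4 bytes. The function names are hypothetical, and reading the quoted sign check (negative input, positive result) as wrap-around detection is an interpretation, not something the patch states.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def truncate_unscaled(value: int, width: int) -> int:
    """Truncate an unscaled decimal value down to a multiple of width.

    Python's % returns a non-negative remainder for a positive width, so
    negative inputs move toward negative infinity.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    return value - (value % width)


def fits_in_int32(value: int) -> bool:
    """True if value can be stored in a 4-byte signed integer."""
    return INT32_MIN <= value <= INT32_MAX
```

For example, an unscaled 1065 (10.65 at scale 2) with width 50 becomes 1050, and -1065 becomes -1100. A negative input close to the 4-byte minimum can truncate to a value below that minimum; in fixed-width arithmetic that would wrap to a positive number, which is presumably what the quoted check guards against.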
[Impala-ASF-CR] IMPALA-10361: Use field id to resolve columns for Iceberg tables
wangsheng has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/16788 ) Change subject: IMPALA-10361: Use field id to resolve columns for Iceberg tables .. IMPALA-10361: Use field id to resolve columns for Iceberg tables This patch adds support for resolving columns by field id for Iceberg tables. Use 'set PARQUET_FALLBACK_SCHEMA_RESOLUTION=FIELD_ID' or 'set PARQUET_FALLBACK_SCHEMA_RESOLUTION=2' to choose field id resolution. Note that if this option is used on a non-Iceberg table, the result will be NULL. Change-Id: I057bdc6ab2859cc4d40de5ed428d0c20028b8435 --- M be/src/exec/parquet/parquet-metadata-utils.cc M be/src/exec/parquet/parquet-metadata-utils.h M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/types.cc M be/src/runtime/types.h M be/src/service/query-options-test.cc M common/thrift/CatalogObjects.thrift M common/thrift/Descriptors.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/Types.thrift M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/IcebergColumn.java A fe/src/main/java/org/apache/impala/catalog/IcebergStructField.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/StructType.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/Type.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/1-1-13d79bd6-4b97-4680-b4e1-52e93b6ce04e-0.parquet A 
testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/6-6-305c9b7a-f42d-4245-b806-dfa7a792593f-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/9-9-224fe2d6-b0d9-42d6-bc95-15f52ecb29ad-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-1a38e294-5992-48d9-a18e-08e129bb418c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-74cfcf22-3de2-489a-b1ec-d5141e75a8e8-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-5f91dc85-b8f3-4cc2-a5c6-38b7fee49709-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-dc3510cc-e765-43bc-be03-c5561a8d50a3-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-364afc4a-b718-406d-a532-58fab5c8f85d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/4-4-7a1a8e89-8aeb-4405-be64-76557432cf21-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00014-14-765d552a-fddc-42f3-adfd-ecba20a01d80-0.parquet A 
testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00015-15-9957db43-3b9a-4a50-9946-d003cc1d461c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00019-19-1e1895d0-1f42-4c30-989f-968802831077-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00020-20-bb59ac6d-aeee-4c35-9f8a-1a03127d33b8-0.parquet A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00028-28-44ba3ad9-737c-4416-a32c-501cc9a4aa90-0.parquet A
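To illustrate what resolving columns by field id means in the commit message above, here is a minimal Python sketch. It is not Impala's actual implementation (which lives in the C++ backend); the function name `resolve_by_field_id` and the tuple-based schema representation are hypothetical, chosen only for illustration. It assumes each table column carries a field id and each file column carries either a field id or none at all.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: a schema is a list of (column_name, field_id).
# File columns written without Iceberg metadata have field_id = None.
TableSchema = List[Tuple[str, int]]
FileSchema = List[Tuple[str, Optional[int]]]


def resolve_by_field_id(table_cols: TableSchema,
                        file_cols: FileSchema) -> Dict[str, Optional[int]]:
    """Map each table column name to the index of the file column that has
    the same field id, or to None when no file column carries that id."""
    # Index the file columns by field id, skipping columns that have none.
    index_by_id = {
        field_id: idx
        for idx, (_, field_id) in enumerate(file_cols)
        if field_id is not None
    }
    # Column names are ignored on purpose: only the ids are compared.
    return {name: index_by_id.get(field_id) for name, field_id in table_cols}


# A column renamed in the table still resolves, because its id is unchanged.
table = [("id", 1), ("user_name", 2)]
old_file = [("id", 1), ("name", 2)]
print(resolve_by_field_id(table, old_file))

# A file without field ids resolves nothing; a reader would then return NULL
# for every column, matching the warning about non-Iceberg tables.
plain_file = [("id", None), ("name", None)]
print(resolve_by_field_id(table, plain_file))
```

Matching on ids rather than names or positions is what lets a renamed or reordered column keep pointing at the same data in older files.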
[Impala-ASF-CR] IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16435 ) Change subject: IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38 Gerrit-Change-Number: 16435 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 04 Dec 2020 11:12:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16792 ) Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7776/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b Gerrit-Change-Number: 16792 Gerrit-PatchSet: 5 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 04 Dec 2020 08:31:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16792 ) Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java: http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@209 PS2, Line 209: estimatedTotalRows > The TABLESAMPLE is a long type, so yeah the minimum can be 1%. You're righ I made this change to use a scaled-down value of the estimated row count (after partition pruning). I also added a test that exercises both partition pruning and convert_limit_to_sample. When adding the test, I realized that in my previous patch set compute stats was not run on the alltypes_date_partition_2 table. I added that to the compute-table-stats.sh script and made related updates to the plans. -- To view, visit http://gerrit.cloudera.org:8080/16792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b Gerrit-Change-Number: 16792 Gerrit-PatchSet: 5 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 04 Dec 2020 08:13:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint
Hello Qifan Chen, Shant Hovsepian, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16792 to look at the new patch set (#5). Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint .. IMPALA-10360: Allow simple limit to be treated as sampling hint As a follow-up to IMPALA-10314, it is sometimes useful to consider a simple limit as a way to sample from a table if a relevant hint has been provided. Doing a sample instead of a pure limit serves dual purposes: (a) it still helps with reducing the planning time since the scan ranges need to be computed only for the sample files, (b) it allows a sufficient number of files/rows to be read from the table such that after applying filter conditions or joins with another table, the query may still produce the N rows needed for the limit. This functionality is especially useful if the query is against a view (note that the TABLESAMPLE clause cannot be applied to a view). In this patch, a new table-level hint, 'convert_limit_to_sample', is added. If this hint is attached to a table either in the main query block or within a view/subquery and the simple limit optimization conditions are satisfied (according to IMPALA-10314), the limit is converted to a table sample. For example: set optimize_simple_limit = true; CREATE VIEW v1 as SELECT * FROM T [convert_limit_to_sample] WHERE [always_true] ; SELECT * FROM v1 LIMIT 10; In this case, the limit 10 is converted to a sample of T and the sampling percent is the greater of 1% or the ratio (in percent) of the limit to the estimated row count of the table (after partition pruning). Testing: - Added an alltypes_date_partition_2 table where the date and timestamp values match (this helps with setting the 'always_true' hint). - Added views with 'convert_limit_to_sample' and 'always_true' hints and added new tests against the views. Modified a few existing tests to reference the new table variant. - Added an end-to-end test.
Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b --- M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/TableRef.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M testdata/bin/compute-table-stats.sh M testdata/datasets/functional/functional_schema_template.sql M testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test M testdata/workloads/functional-query/queries/QueryTest/range-constant-propagation.test 10 files changed, 279 insertions(+), 34 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/16792/5 -- To view, visit http://gerrit.cloudera.org:8080/16792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b Gerrit-Change-Number: 16792 Gerrit-PatchSet: 5 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
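The sampling-percent rule stated in the commit message above (the greater of 1% or the ratio, in percent, of the limit to the estimated row count after partition pruning) can be sketched in Python as follows. This is only an illustration, not the code in HdfsPartitionPruner.java. The function name `sample_percent` is hypothetical, and three details are assumptions the message does not state: the ratio is rounded up to a whole percent, the result is capped at 100, and a missing or non-positive row estimate falls back to 100.

```python
def sample_percent(limit: int, estimated_rows: int) -> int:
    """Return an integer sampling percent for a LIMIT converted to a sample.

    Follows the rule 'greater of 1% or limit / estimated rows (in percent)'.
    Rounding up, the cap at 100, and the fallback for an unusable row
    estimate are assumptions made for this sketch.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    if estimated_rows <= 0:
        # Assumption: without a usable estimate, sample the whole table.
        return 100
    # Integer ceiling of 100 * limit / estimated_rows, avoiding floats.
    ratio_percent = -(-100 * limit // estimated_rows)
    # Apply the 1% floor from the commit message and the assumed 100% cap.
    return max(1, min(100, ratio_percent))


# LIMIT 10 against roughly 100,000 rows hits the 1% floor.
print(sample_percent(10, 100_000))
# LIMIT 10 against roughly 200 rows samples 5% of the table.
print(sample_percent(10, 200))
```

The 1% floor reflects the review comment that the sampling percent is an integer type, so smaller fractions cannot be expressed.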