[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17289 ) Change subject: IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17289 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f Gerrit-Change-Number: 17289 Gerrit-PatchSet: 6 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 15 Apr 2021 04:02:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17289 ) Change subject: IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7077/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17289 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f Gerrit-Change-Number: 17289 Gerrit-PatchSet: 6 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 15 Apr 2021 04:02:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17299 ) Change subject: IMPALA-10652: Optimize the checking of the size of incremental stats .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8586/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee Gerrit-Change-Number: 17299 Gerrit-PatchSet: 4 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Thu, 15 Apr 2021 02:46:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17289 ) Change subject: IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky .. Patch Set 5: retest -- To view, visit http://gerrit.cloudera.org:8080/17289 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f Gerrit-Change-Number: 17289 Gerrit-PatchSet: 5 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 15 Apr 2021 02:33:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 ) Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator .. Patch Set 21: retest -- To view, visit http://gerrit.cloudera.org:8080/17252 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964 Gerrit-Change-Number: 17252 Gerrit-PatchSet: 21 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 15 Apr 2021 02:33:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17299 ) Change subject: IMPALA-10652: Optimize the checking of the size of incremental stats .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/17299/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/17299/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1784 PS3, Line 1784: fu > nit. table alltypes? Done -- To view, visit http://gerrit.cloudera.org:8080/17299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee Gerrit-Change-Number: 17299 Gerrit-PatchSet: 4 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Thu, 15 Apr 2021 02:27:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats
Hello Aman Sinha, Qifan Chen, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17299 to look at the new patch set (#4). Change subject: IMPALA-10652: Optimize the checking of the size of incremental stats .. IMPALA-10652: Optimize the checking of the size of incremental stats Modify the estimation method of incremental statistics size: incremental statistics size = Existing partition statistics + This time calculation partition stats - Repeated calculation partition stats Testing: All partitions of a table have no incremental stats. --Calculate the incremental stats of all partitions, the incremental stats size exceeds the threshold, an error is reported. --Calculate the incremental stats of one partition, no error is reported. Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee --- M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 2 files changed, 45 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/17299/4 -- To view, visit http://gerrit.cloudera.org:8080/17299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee Gerrit-Change-Number: 17299 Gerrit-PatchSet: 4 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-10631: Upgrade DataSketches to version 3.0.0
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17294 ) Change subject: IMPALA-10631: Upgrade DataSketches to version 3.0.0 .. Patch Set 4: All tests run with the pre-review-test job passed, see: https://jenkins.impala.io/job/pre-review-test/909/. The failed test in the gerrit-verify-dryrun job (query_test/test_fetch.py, from: https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/13658/) did not reappear. How to deal with this situation, re-run the gerrit-verify-dryrun job? -- To view, visit http://gerrit.cloudera.org:8080/17294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9 Gerrit-Change-Number: 17294 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 15 Apr 2021 01:57:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10645: Log catalogd HMS API metrics
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17284 ) Change subject: IMPALA-10645: Log catalogd HMS API metrics .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8585/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17284 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id41afe89bbe3395c158919bddd09f302c6752287 Gerrit-Change-Number: 17284 Gerrit-PatchSet: 7 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 15 Apr 2021 01:10:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10645: Log catalogd HMS API metrics
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17284 ) Change subject: IMPALA-10645: Log catalogd HMS API metrics .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7076/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17284 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id41afe89bbe3395c158919bddd09f302c6752287 Gerrit-Change-Number: 17284 Gerrit-PatchSet: 7 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 15 Apr 2021 00:50:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10645: Log catalogd HMS API metrics
Vihang Karajgaonkar has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/17284 ) Change subject: IMPALA-10645: Log catalogd HMS API metrics .. IMPALA-10645: Log catalogd HMS API metrics Expose rpc duration, cache hit ratio, etc for Catalogd HMS APIs. The metrics currently are only logged at debug level when the catalogd starts a HMS endpoint. A followup will be done separately to expose them to the debug UI. This patch was originally contributed by Kishen Das. Testing: 1. Deployed the catalogd's metastore server and made sure that the metrics are logged in the catalogd.INFO logs. Change-Id: Id41afe89bbe3395c158919bddd09f302c6752287 --- M common/thrift/JniCatalog.thrift M common/thrift/metrics.json M fe/src/main/java/org/apache/impala/catalog/CatalogHmsAPIHelper.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java A fe/src/main/java/org/apache/impala/catalog/metastore/HmsApiNameEnum.java M fe/src/main/java/org/apache/impala/catalog/metastore/ICatalogMetastoreServer.java M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/catalog/metastore/NoOpCatalogMetastoreServer.java M fe/src/main/java/org/apache/impala/catalog/monitor/CatalogMonitor.java 11 files changed, 455 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/17284/7 -- To view, visit http://gerrit.cloudera.org:8080/17284 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id41afe89bbe3395c158919bddd09f302c6752287 Gerrit-Change-Number: 17284 Gerrit-PatchSet: 7 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-10645: Log catalogd HMS API metrics
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17284 ) Change subject: IMPALA-10645: Log catalogd HMS API metrics .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/17284/4/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java File fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java: http://gerrit.cloudera.org:8080/#/c/17284/4/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@96 PS4, Line 96: public static final String CATALOGD_CACHE_HIT_RATIO_METRIC = "catalogd-hms-cache.cache-hit-ratio"; > line too long (100 > 90) Removed this since it was unused. -- To view, visit http://gerrit.cloudera.org:8080/17284 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id41afe89bbe3395c158919bddd09f302c6752287 Gerrit-Change-Number: 17284 Gerrit-PatchSet: 4 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 15 Apr 2021 00:49:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17298 ) Change subject: IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions. .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8584/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17298 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98 Gerrit-Change-Number: 17298 Gerrit-PatchSet: 5 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 15 Apr 2021 00:40:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17298 ) Change subject: IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions. .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/17298/5/tests/custom_cluster/test_metastore_service.py File tests/custom_cluster/test_metastore_service.py: http://gerrit.cloudera.org:8080/#/c/17298/5/tests/custom_cluster/test_metastore_service.py@565 PS5, Line 565: q flake8: E501 line too long (96 > 90 characters) -- To view, visit http://gerrit.cloudera.org:8080/17298 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98 Gerrit-Change-Number: 17298 Gerrit-PatchSet: 5 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 15 Apr 2021 00:21:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions.
Hello Quanlong Huang, Vihang Karajgaonkar, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17298 to look at the new patch set (#5). Change subject: IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions. .. IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions. For non transactional tables, invalidate the table from cache if HMS DDL apis are accessed from catalogd's metastore server. Any subsequent get table request fetches the table from HMS and loads it in cache. This ensures that any get_table/get_partition requests after DDL operations on the same table return updated table. This behavior has a performance penalty (since table loading in cache takes time) but ensures consistency. This change is behind catalogd server's flag: invalidate_hms_cache_on_ddls which is enabled by default. The flag needs to be turned off if this change becomes a performance bottleneck. Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M tests/custom_cluster/test_metastore_service.py 6 files changed, 508 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/17298/5 -- To view, visit http://gerrit.cloudera.org:8080/17298 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98 Gerrit-Change-Number: 17298 Gerrit-PatchSet: 5 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17298 ) Change subject: IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions. .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8583/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17298 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98 Gerrit-Change-Number: 17298 Gerrit-PatchSet: 4 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 14 Apr 2021 23:38:50 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-7825: Upgrade Thrift version to 0.11.0
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17170 ) Change subject: WIP IMPALA-7825: Upgrade Thrift version to 0.11.0 .. Patch Set 5: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7075/ -- To view, visit http://gerrit.cloudera.org:8080/17170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 Gerrit-Change-Number: 17170 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 14 Apr 2021 23:27:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17298 ) Change subject: IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions. .. Patch Set 4: (18 comments) http://gerrit.cloudera.org:8080/#/c/17298/4/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java File fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java: http://gerrit.cloudera.org:8080/#/c/17298/4/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@1067 PS4, Line 1067: client.getHiveClient().getThriftClient().drop_partitions_req(dropPartitionsRequest); line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py File tests/custom_cluster/test_metastore_service.py: http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@470 PS4, Line 470: . flake8: E131 continuation line unaligned for hanging indent http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@476 PS4, Line 476: e flake8: E501 line too long (92 > 90 characters) http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@489 PS4, Line 489: # flake8: E265 block comment should start with '# ' http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@490 PS4, Line 490: e flake8: E501 line too long (92 > 90 characters) http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@496 PS4, Line 496: flake8: E501 line too long (96 > 90 characters) http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@499 PS4, Line 499: t flake8: E501 line too long (92 > 90 characters) http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@526 PS4, Line 526: # flake8: E303 too many blank lines (2) http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@531 PS4, Line 531: flake8: E501 line too long (96 > 90 characters) http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@533 PS4, Line 533: t flake8: E501 line too long (92 > 90 characters) http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@550 PS4, Line 550: # flake8: E303 too many blank lines (2) http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@554 PS4, Line 554: flake8: E501 line too long (96 > 90 characters) http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@561 PS4, Line 561: e flake8: E501 line too long (100 > 90 characters) http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@568 PS4, Line 568: q flake8: E501 line too long (96 > 90 characters) http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@577 PS4, Line 577: flake8: E251 unexpected spaces around keyword / parameter equals http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@577 PS4, Line 577: flake8: E251 unexpected spaces around keyword / parameter equals http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@578 PS4, Line 578: flake8: E251 unexpected spaces around keyword / parameter equals http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@578 PS4, Line 578: flake8: E251 unexpected spaces around keyword / parameter equals -- To view, visit http://gerrit.cloudera.org:8080/17298 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98 Gerrit-Change-Number: 17298 Gerrit-PatchSet: 4 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Comment-Date: Wed, 14 Apr 2021 23:16:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions.
Sourabh Goyal has posted comments on this change. ( http://gerrit.cloudera.org:8080/17298 ) Change subject: IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions. .. Patch Set 4: (26 comments) http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java File fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java: http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@141 PS3, Line 141: import org.apache.hadoop.hive.metastore.api.LockRequest; > Is this used somewhere? No. Will remove it. Thanks for pointing it out. http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@222 PS3, Line 222: UnlockRequest; > Is this used somewhere? Will remove it. http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@347 PS3, Line 347: "is > nit: We use 4 spaces indention for multi-line statement. However, many chan Ack http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@348 PS3, Line 348: } > line too long (91 > 90) Ack http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@349 PS3, Line 349: > nit: 4 spaces indention Ack http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@890 PS3, Line 890: client.getHiveClient().getThriftClient().add_partition(partition); > line too long (99 > 90) Ack http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@891 PS3, Line 891: invalidateNonTransactionalTableIfExists(partition.getDbName(), > line too long (94 > 90) Ack http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@904 PS3, Line 904: invalidateNonTransactionalTableIfExists(partition.getDbName(), > line too long (94 > 90) Ack http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@914 PS3, Line 914: try (MetaStoreClient client = catalog_.getMetaStoreClient()) { > line too long (102 > 90) Ack http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@915 PS3, Line 915: > Is it possible that these partitions belong to different tables? Very likely the partitions belong to the same table. I will confirm and make the change accordingly. Thanks for pointing it out. http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@929 PS3, Line 929: int numPartitionsAdded = client.getHiveCl > Same question as above Will check this one too. http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@930 PS3, Line 930: .getThriftClient().add_partitions_pspec(list); > line too long (104 > 90) Ack http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@955 PS3, Line 955: AddPartitionsResult result = client.getHiveClient().getThriftClient() > line too long (114 > 90) Ack http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@968 PS3, Line 968: try (MetaStoreClient client = catalog_.getMetaStoreClient()) { > line too long (104 > 90) Ack http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@994 PS3, Line 994: throws InvalidObjectException, AlreadyExistsException, MetaException, TException { > line too long (91 > 90) Ack http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@1021 PS3, Line 1021: throws NoSuchObjectException, MetaException, TException { > line too long (94 > 90) Ack http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@1034 PS3, Line 1034: boolean deleteData) > line too long (106 > 90) Ack http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@1049 PS3, Line 1049: String partName, boolean deleteData, EnvironmentContext envContext) > line too long (91 > 90) Ack
[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions.
Hello Quanlong Huang, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17298 to look at the new patch set (#4). Change subject: IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions. .. IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions. For non transactional tables, invalidate the table from cache if HMS DDL apis are accessed from catalogd's metastore server. Any subsequent get table request fetches the table from HMS and loads it in cache. This ensures that any get_table/get_partition requests after DDL operations on the same table return updated table. This behavior has a performance penalty (since table loading in cache takes time) but ensures consistency. This change is behind catalogd server's flag: invalidate_hms_cache_on_ddls which is enabled by default. The flag needs to be turned off if this change becomes a performance bottleneck. Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M tests/custom_cluster/test_metastore_service.py 6 files changed, 510 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/17298/4 -- To view, visit http://gerrit.cloudera.org:8080/17298 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98 Gerrit-Change-Number: 17298 Gerrit-PatchSet: 4 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-10584: Defer advancing read page if stream only has 2 pages.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17195 ) Change subject: IMPALA-10584: Defer advancing read page if stream only has 2 pages. .. Patch Set 6: > Patch Set 6: > > (8 comments) > > Patch set 5 change the condition on when to NOT advance the read page. > Read page will not be advanced in UnpinStream if read page is attached to > output RowBatch and there are only 2 pages in the stream. Sorry, I should mention patch set 6 instead of 5. -- To view, visit http://gerrit.cloudera.org:8080/17195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Gerrit-Change-Number: 17195 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 14 Apr 2021 22:43:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10584: Defer advancing read page if stream only has 2 pages.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17195 ) Change subject: IMPALA-10584: Defer advancing read page if stream only has 2 pages. .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8582/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Gerrit-Change-Number: 17195 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 14 Apr 2021 22:36:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10584: Defer advancing read page if stream only has 2 pages.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17195 ) Change subject: IMPALA-10584: Defer advancing read page if stream only has 2 pages. .. Patch Set 6: (8 comments) Patch set 5 change the condition on when to NOT advance the read page. Read page will not be advanced in UnpinStream if read page is attached to output RowBatch and there are only 2 pages in the stream. http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc File be/src/runtime/buffered-tuple-stream-test.cc: http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1271 PS5, Line 1271: ASSERT_OK(stream.UnpinStream(BufferedTupleStream::UNPIN_ALL_EXCEPT_CURRENT)); > Not related to this patch, but I'm confused how do we verify we have advanc I think this verify the read page has been advanced by not hitting DCHECK at ExpectedPinCount(). http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1300 PS5, Line 1300: // Test that UnpinStream defer advancing the read page when all rows from the read page > Could you add some comments above this? e.g. Done http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1333 PS5, Line 1333: ASSERT_TRUE(read_batch.num_rows() < num_rows); > Could you add a comment here that we are adding rows without releasing the Done http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1334 PS5, Line 1334: ASSERT_TRUE(!eos); > Could you leave a comment about why do we run this twice? I guess we want t Done http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1354 PS5, Line 1354: // Retry inserting this row by decreasing the index. > Could you add ASSERT_FALSE(stream.is_pinned()) at the end? Done http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1356 PS5, Line 1356: // even if we're not immediately cleaning up the read_batch. > Do we need read_batch.Reset() as well? Done http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc File be/src/runtime/buffered-tuple-stream.cc: http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc@728 PS5, Line 728: // defer advancing the read page until the next GetNext() call by the reader : // (see IMPALA-10584). : defer_advancing_read_page = true; : } : } : : if > Ok, after some investigation, I'm not confident I can implement solution 2) Marking this as Done. Lets continue this discussion in patch set 5. http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc@747 PS5, Line 747: > May DCHECK() on this assertion here. Done -- To view, visit http://gerrit.cloudera.org:8080/17195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Gerrit-Change-Number: 17195 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 14 Apr 2021 22:22:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10645: Log catalogd HMS API metrics
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17284 ) Change subject: IMPALA-10645: Log catalogd HMS API metrics .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8581/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17284 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id41afe89bbe3395c158919bddd09f302c6752287 Gerrit-Change-Number: 17284 Gerrit-PatchSet: 4 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 14 Apr 2021 22:20:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10584: Defer advancing read page if stream only has 2 pages.
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17195 to look at the new patch set (#6). Change subject: IMPALA-10584: Defer advancing read page if stream only has 2 pages. .. IMPALA-10584: Defer advancing read page if stream only has 2 pages. TestScratchLimit::test_with_unlimited_scratch_limit has been intermittently crashing in ubuntu-16.04-dockerised-tests environment after result spooling is enabled by default in IMPALA-9856. DCHECK violation occurs in ReservationTracker::CheckConsistency() due to BufferedTupleStream wrongly tries to reclaim memory reservation while unpinning the stream. For this bug to surface, all of the following needs to happen: - Stream is in pinned mode. - There are only 2 pages in the stream: 1 read and 1 write. - Stream can not increase reservation anymore either due to memory pressure or low buffer/memory limit. - The stream read page has been fully read and is attached to output RowBatch. But the output RowBatch has not cleaned up yet. - BufferedTupleStream::UnpinStream is invoked. The memory accounting bug happens because UnpinStream proceeds to NextReadPage where the read page buffer was mistakenly assumed as released. default_page_len_ bytes were added into write_page_reservation_ and subsequently violates the total memory reservation. This patch fixes the bug by deferring advancement of the read iterator in UnpinStream if the read page is attached to output RowBatch and there are only 2 pages in the stream. This is OK because after UnpinStream finished, the stream is now in unpinned mode and has_read_write_page is false. The next AddRow operation is then allowed to unpin the previous write page first before reusing the reservation to allocate a new write page. The next GetNext call will be responsible to advance the read page. Testing: - Add be test DeferAdvancingReadPage. - Loop the TestScratchLimit::test_with_unlimited_scratch_limit in my local dev machine and verify that each test passed without triggering the DCHECK violation. - Reenable result spooling in TestScratchLimit that was disabled in IMPALA-10559. - Pass core tests. Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a --- M be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h M tests/query_test/test_scratch_limit.py 4 files changed, 100 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17195/6 -- To view, visit http://gerrit.cloudera.org:8080/17195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Gerrit-Change-Number: 17195 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-9287: Add support for embedded metastore mode
Vihang Karajgaonkar has abandoned this change. ( http://gerrit.cloudera.org:8080/15223 ) Change subject: IMPALA-9287: Add support for embedded metastore mode .. Abandoned I am not actively working on this patch anymore. -- To view, visit http://gerrit.cloudera.org:8080/15223 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: Iab7de6a7f11ef0bff85c0fb03f004756c3cde784 Gerrit-Change-Number: 15223 Gerrit-PatchSet: 8 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-10645: Log catalogd HMS API metrics
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17284 ) Change subject: IMPALA-10645: Log catalogd HMS API metrics .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/17284/4/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java File fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java: http://gerrit.cloudera.org:8080/#/c/17284/4/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@96 PS4, Line 96: public static final String CATALOGD_CACHE_HIT_RATIO_METRIC = "catalogd-hms-cache.cache-hit-ratio"; line too long (100 > 90) -- To view, visit http://gerrit.cloudera.org:8080/17284 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id41afe89bbe3395c158919bddd09f302c6752287 Gerrit-Change-Number: 17284 Gerrit-PatchSet: 4 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 14 Apr 2021 22:02:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8795 : Enable event polling by default in tests
Vihang Karajgaonkar has abandoned this change. ( http://gerrit.cloudera.org:8080/13922 ) Change subject: IMPALA-8795 : Enable event polling by default in tests .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/13922 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I7279349d4900e24fbcf558f290549496844ce138 Gerrit-Change-Number: 13922 Gerrit-PatchSet: 10 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-8795 : Enable event polling by default in dockerized tests.
Vihang Karajgaonkar has abandoned this change. ( http://gerrit.cloudera.org:8080/14272 ) Change subject: IMPALA-8795 : Enable event polling by default in dockerized tests. .. Abandoned I don't think this patch is actively being worked upon anymore. -- To view, visit http://gerrit.cloudera.org:8080/14272 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I222b64236060b3c4c2d554e2f10e129984ebe972 Gerrit-Change-Number: 14272 Gerrit-PatchSet: 18 Gerrit-Owner: Anurag Mantripragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-10645: Log catalogd HMS API metrics
Vihang Karajgaonkar has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17284 ) Change subject: IMPALA-10645: Log catalogd HMS API metrics .. IMPALA-10645: Log catalogd HMS API metrics Expose rpc duration, cache hit ratio, etc for Catalogd HMS APIs. The metrics currently are only logged at trace level. A followup will be done separately to expose them to the debug UI. This patch was originally contributed by Kishen Das. Testing: 1. Deployed the catalogd's metastore server and made sure that the metrics are logged in the catalogd.INFO logs. Change-Id: Id41afe89bbe3395c158919bddd09f302c6752287 --- M common/thrift/JniCatalog.thrift M common/thrift/metrics.json M fe/src/main/java/org/apache/impala/catalog/CatalogHmsAPIHelper.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java A fe/src/main/java/org/apache/impala/catalog/metastore/HmsApiNameEnum.java M fe/src/main/java/org/apache/impala/catalog/metastore/ICatalogMetastoreServer.java M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/catalog/metastore/NoOpCatalogMetastoreServer.java M fe/src/main/java/org/apache/impala/catalog/monitor/CatalogMonitor.java 12 files changed, 470 insertions(+), 40 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/17284/4 -- To view, visit http://gerrit.cloudera.org:8080/17284 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id41afe89bbe3395c158919bddd09f302c6752287 Gerrit-Change-Number: 17284 Gerrit-PatchSet: 4 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-10645: Log catalogd HMS API metrics
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17284 ) Change subject: IMPALA-10645: Log catalogd HMS API metrics .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/17284/2/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: http://gerrit.cloudera.org:8080/#/c/17284/2/common/thrift/JniCatalog.thrift@849 PS2, Line 849: > nit: redundant blank line Done http://gerrit.cloudera.org:8080/#/c/17284/2/common/thrift/JniCatalog.thrift@884 PS2, Line 884: > nit: redundant blank line Done http://gerrit.cloudera.org:8080/#/c/17284/2/common/thrift/JniCatalog.thrift@909 PS2, Line 909: > nit: redundant blank line Done http://gerrit.cloudera.org:8080/#/c/17284/2/common/thrift/JniCatalog.thrift@934 PS2, Line 934: > nit: redundant blank lines Done http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/HmsApiNameEnum.java File fe/src/main/java/org/apache/impala/catalog/metastore/HmsApiNameEnum.java: http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/HmsApiNameEnum.java@23 PS2, Line 23: * > nit: remove this line Done -- To view, visit http://gerrit.cloudera.org:8080/17284 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id41afe89bbe3395c158919bddd09f302c6752287 Gerrit-Change-Number: 17284 Gerrit-PatchSet: 2 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 14 Apr 2021 22:01:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10645: Log catalogd HMS API metrics
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17284 ) Change subject: IMPALA-10645: Log catalogd HMS API metrics .. Patch Set 2: (26 comments) Sorry for the formatting errors. Hopefully they are all gone now. http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogHmsAPIHelper.java File fe/src/main/java/org/apache/impala/catalog/CatalogHmsAPIHelper.java: http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogHmsAPIHelper.java@142 PS2, Line 142: catalog, reqBuilder.build(), dbName, tblName, HmsApiNameEnum.GET_TABLE_REQ.apiName()); > line too long (94 > 90) Done http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogHmsAPIHelper.java@245 PS2, Line 245: catalogReq.build(), dbName, tblName, HmsApiNameEnum.GET_PARTITION_BY_EXPR.apiName()); > line too long (93 > 90) Done http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogHmsAPIHelper.java@367 PS2, Line 367: requestBuilder.build(), dbName, tblName, HmsApiNameEnum.GET_PARTITION_BY_NAMES.apiName()); > line too long (98 > 90) Done http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2221 PS2, Line 2221: > nit: I think we use 4 spaces idention here Done http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2227 PS2, Line 2227: > nit: 4 spaces idention Yeah, looks like the original patch from Kishen used wrong code-style template. Done. http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2230 PS2, Line 2230: if > nit: need one space after "if" Done http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2231 PS2, Line 2231: // Update the cache miss metric, as the valid write id list did not match and we have to reload the table. > line too long (114 > 90) Done http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2233 PS2, Line 2233: > nit: 4 spaces idention Done http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2067 PS2, Line 2067: Counter misses = CatalogMonitor.INSTANCE.getCatalogdHmsCacheMetrics().getCounter(FILEMETADATA_CACHE_MISS_METRIC); > Previously we count the metrics per table, this changes it to use a global Yes, I think changing this from table level to global level doesn't make a lot of sense. The metric value is not very useful if it is at global level. Let me see how can I change it to table level back. http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2068 PS2, Line 2068: Counter hits = CatalogMonitor.INSTANCE.getCatalogdHmsCacheMetrics().getCounter(FILEMETADATA_CACHE_HIT_METRIC); > line too long (114 > 90) Done http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2131 PS2, Line 2131: getFileMetadataCacheHitRate > After this patch, the hit rate will be a global hit rate counted on all tab I think we should keep it at table level. http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java File fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java: http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@77 PS2, Line 77: private static final Logger LOG = : LoggerFactory.getLogger(CatalogMetastoreServer.class); > nit: don't need changes since the original line fits in 90 chars. Done http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@94 PS2, Line 94: private static final String ACTIVE_CONNECTIONS_METRIC : = "metastore.active.connections"; > nit: don't need changes as well. Done http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@99 PS2, Line 99: = > nit: I think our code style tend to put this in the above line Done http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@143 PS2, Line 143: > nit:
[Impala-ASF-CR] IMPALA-10502: Handle CREATE/DROP events correctly
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17308 ) Change subject: IMPALA-10502: Handle CREATE/DROP events correctly .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8580/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17308 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia2c5e96b48abac015240f20295b3ec3b1d71f24a Gerrit-Change-Number: 17308 Gerrit-PatchSet: 3 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 14 Apr 2021 21:49:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10502: Handle CREATE/DROP events correctly
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17308 ) Change subject: IMPALA-10502: Handle CREATE/DROP events correctly .. Patch Set 3: (33 comments) http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1526 PS3, Line 1526: .removePartitionsIfNotAddedLater(eventId_, dbName_, tblName_, droppedPartitions_, line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java: http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@257 PS3, Line 257: String.format(CatalogOpExecutor.HMS_RPC_ERROR_FORMAT_STR, "getNextNotification")); line too long (92 > 90) http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@737 PS3, Line 737: CatalogOpExecutor catalogOpExecutor, long startSyncFromId, long eventPollingInterval) line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@744 PS3, Line 744: new MetastoreEventsProcessor(catalogOpExecutor, startSyncFromId, eventPollingInterval); line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1304 PS3, Line 1304: private Table addHdfsPartitions(MetaStoreClient msClient, Table tbl, List partitions) line too long (98 > 90) http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1704 PS3, Line 1704: CreateDatabaseEvent.CREATE_DATABASE_EVENT_TYPE.equals(notificationEvent.getEventType()) line too long (105 > 90) http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1715 PS3, Line 1715: // Due to HIVE-24899 we cannot rely on the database object present in the event line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1918 PS3, Line 1918: eventIdToPartVals.computeIfAbsent(eventId, l -> new ArrayList<>()).add(partVals); line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2544 PS3, Line 2544: events = MetastoreEventsProcessor.getNextMetastoreEvents(catalog_, eventId, new NotificationFilter() { line too long (110 > 90) http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2547 PS3, Line 2547: return DropTableEvent.DROP_TABLE_EVENT_TYPE.equals(notificationEvent.getEventType()) && finalMsTbl line too long (110 > 90) http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3119 PS3, Line 3119: Pair eventTblPair = getTableFromEvents( line too long (99 > 90) http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3188 PS3, Line 3188: CreateTableEvent.CREATE_TABLE_EVENT_TYPE.equals(notificationEvent.getEventType()) line too long (97 > 90) http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3385 PS3, Line 3385: return CreateTableEvent.CREATE_TABLE_EVENT_TYPE.equals(notificationEvent.getEventType()) line too long (104 > 90) http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3386 PS3, Line 3386: && newTable.getDbName().equalsIgnoreCase(notificationEvent.getDbName()) line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3387 PS3, Line 3387: && newTable.getTableName().equalsIgnoreCase(notificationEvent.getTableName()); line too long (98 > 90) http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3780 PS3, Line 3780: "EventId: {} Not adding partitions since the database {} does not exist anymore.", line too long (92 > 90)
[Impala-ASF-CR] IMPALA-10502: Handle CREATE/DROP events correctly
Vihang Karajgaonkar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17308 Change subject: IMPALA-10502: Handle CREATE/DROP events correctly .. IMPALA-10502: Handle CREATE/DROP events correctly The current way to detect self-events in case of CREATE/DROP events on database, table and partition is problematic when the same object is created and dropped repeatedly in quick succession. This happens mainly due to couple of reasons. For example if we have a sequence of CREATE_TABLE, DROP_TABLE, CREATE_TABLE ... on the same table, it is possible that when the create table event is being processed, the table is not present in catalog because it was dropped recently. In such a case, events processor does not have enough state information in catalogd to determine that this table has been dropped from the catalogd and the event should be ignored. Similarly, if a drop event is being processed, it is possible that the table has been recreated with the same name when the drop event is received. In such a case, events processor removes the table from the catalogd. This can cause problems for queries which expect the table to exist or not exist. E.g create table query fails with a table already exists or a drop table query fails with table does not exist error. In order to fix this issue, catalogd now keeps track of dropped objects in a deleteLog which are garbage collected as the events come in. Every time a database, table or parition is dropped, the deleteLog is populated with the the drop event id generated due to the drop operation. This deleteLog is looked up when the event is received to determine if the event can be ignored. Testing: 1. Added a new test which loops to create create/drop events for database, table and partitions. 2. Added new metrics which the test verifies to ensure that events don't create or drop the object. Change-Id: Ia2c5e96b48abac015240f20295b3ec3b1d71f24a --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java A fe/src/main/java/org/apache/impala/catalog/events/DeleteEventLog.java A fe/src/main/java/org/apache/impala/catalog/events/EventFactory.java M fe/src/main/java/org/apache/impala/catalog/events/ExternalEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/events/InFlightEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/events/NoOpEventProcessor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/catalog/events/SynchronousHMSEventProcessorForTests.java M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java M tests/custom_cluster/test_event_processing.py 21 files changed, 2,158 insertions(+), 834 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/17308/3 -- To view, visit http://gerrit.cloudera.org:8080/17308 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ia2c5e96b48abac015240f20295b3ec3b1d71f24a Gerrit-Change-Number: 17308 Gerrit-PatchSet: 3 Gerrit-Owner: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17195 ) Change subject: IMPALA-10584: Defer advancing read page when reservation is tight. .. Patch Set 5: (1 comment) s http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc File be/src/runtime/buffered-tuple-stream.cc: http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc@728 PS5, Line 728: if (buffer_pool_client_->GetUnusedReservation() < last_attached_page_len_) { : // Since attached_to_output_batch is true and GetUnusedReservation() is less : // than last_attached_page_len_, the reader is most likely has not clean up the : // RowBatch where the read page is attached to. We defer advancing the read page : // until the next GetNext() call by the reader (see IMPALA-10584). : defer_advancing_read_page = true; : } > Your understanding is correct. Ideally, BufferedTupleStream should be 100% Ok, after some investigation, I'm not confident I can implement solution 2) safely. The refactoring will touch several code that I don't fully understand yet such as grouping-aggregator, partitioned-hash-join-builder, and analytic-eval-node. What if I change the logic into this instead: defer_advancing_read_page = num_pages_ <= 2; Basically, we avoid getting into situation where we ended up with only 1 read-write page while unpinning the stream. I think this is also a valid reason not to advance the read page (the other reason is that the reader has not freed the attached buffer). The "stealing" still won't happen and the DCHECK still won't hit. -- To view, visit http://gerrit.cloudera.org:8080/17195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Gerrit-Change-Number: 17195 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 14 Apr 2021 21:27:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17316 ) Change subject: IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17316 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id807e8a200b83283a09d3a917185cabab930017d Gerrit-Change-Number: 17316 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Wed, 14 Apr 2021 18:29:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17195 ) Change subject: IMPALA-10584: Defer advancing read page when reservation is tight. .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc File be/src/runtime/buffered-tuple-stream.cc: http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc@728 PS5, Line 728: if (buffer_pool_client_->GetUnusedReservation() < last_attached_page_len_) { : // Since attached_to_output_batch is true and GetUnusedReservation() is less : // than last_attached_page_len_, the reader is most likely has not clean up the : // RowBatch where the read page is attached to. We defer advancing the read page : // until the next GetNext() call by the reader (see IMPALA-10584). : defer_advancing_read_page = true; : } > Maybe I'm misunderstanding something. This still seems tricky for me. I agr Your understanding is correct. Ideally, BufferedTupleStream should be 100% aware whether its client has freed the attached buffer or not. It raise a question though on how to do that, because the stream lost reference to the buffer once it attach the buffer to output row batch. In relation of BufferedPlanRootSink and SpillableRowBatchQueue, I can think of 2 solution: 1). Track the amount of both used and unused reservation at the end of GetNext() call. We can then check the reservation amount again in UnpinStream() to determine whether client has freed the buffer or not. However, this can be tricky and lead to more corner cases such as what if we increase/reduce the total reservation in between? 2). Add an explicit method in BufferedTupleStream to signal the free. SpillableRowBatchQueue then need similar method as well to chain the signal down from BufferedPlanRootSink to BufferedTupleStream. The mechanics might becomes awkward, but it maintain correctness. I'll try to implement solution 2) first. -- To view, visit http://gerrit.cloudera.org:8080/17195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Gerrit-Change-Number: 17195 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 14 Apr 2021 18:13:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP IMPALA-7825: Upgrade Thrift version to 0.11.0
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17170 ) Change subject: WIP IMPALA-7825: Upgrade Thrift version to 0.11.0 .. Patch Set 4: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/8579/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 Gerrit-Change-Number: 17170 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 14 Apr 2021 18:04:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 ) Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator .. Patch Set 21: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7072/ -- To view, visit http://gerrit.cloudera.org:8080/17252 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964 Gerrit-Change-Number: 17252 Gerrit-PatchSet: 21 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 14 Apr 2021 17:48:33 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-7825: Upgrade Thrift version to 0.11.0
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17170 ) Change subject: WIP IMPALA-7825: Upgrade Thrift version to 0.11.0 .. Patch Set 3: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/8578/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 Gerrit-Change-Number: 17170 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 14 Apr 2021 17:48:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17289 ) Change subject: IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky .. Patch Set 5: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7070/ -- To view, visit http://gerrit.cloudera.org:8080/17289 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f Gerrit-Change-Number: 17289 Gerrit-PatchSet: 5 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 14 Apr 2021 17:46:59 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-7825: Upgrade Thrift version to 0.11.0
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17170 ) Change subject: WIP IMPALA-7825: Upgrade Thrift version to 0.11.0 .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7075/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 Gerrit-Change-Number: 17170 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 14 Apr 2021 17:42:50 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-7825: Upgrade Thrift version to 0.11.0
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17170 to look at the new patch set (#4). Change subject: WIP IMPALA-7825: Upgrade Thrift version to 0.11.0 .. WIP IMPALA-7825: Upgrade Thrift version to 0.11.0 Before this patch Impala mainly used Thrift 0.9.3, but it was possible to compile Impala shell with Thrift 0.11.0, so the 0.11.0 Thrift lib was already included in toolchain. Most of the changes are related to replacing boost:: with std:: shared_ptr-s in cpp code (this is a continuation of patch by Vihang). The Thrift upgrade also needs an Impyla relase with Thrift 0.11.0, as Impala's test framework relies on Impyla. A thrift_sasl release is also needed, because it currently pins Thrift version to 0.9.3 for Python 2. The current patch uses alpha releases from Impyla and thrift_sasl that use thrift 0.11.0. Testing: - ran Impyla's test suite with Python 2 and 3 Other TODOs: - remove preexisting extra logic needed to use 0.11.0 for python - the thrift compilation was changed to generated template code - this was an easy solution to avoid a compilation issue after merging IMPALA-10600, but I should check its effect on compilation time Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 --- M be/src/benchmarks/network-perf-benchmark.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalog-service-client-wrapper.h M be/src/catalog/catalogd-main.cc M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/TAcceptQueueServer.h M be/src/rpc/auth-provider.h M be/src/rpc/authentication.cc M be/src/rpc/hs2-http-test.cc M be/src/rpc/thrift-client.h M be/src/rpc/thrift-server-test.cc M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-server.h M be/src/rpc/thrift-thread.cc M be/src/rpc/thrift-thread.h M be/src/rpc/thrift-util.cc M be/src/rpc/thrift-util.h M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/impalad-main.cc M be/src/statestore/statestore-service-client-wrapper.h M be/src/statestore/statestore-subscriber-client-wrapper.h M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore-subscriber.h M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M be/src/testutil/in-process-servers.h M be/src/transport/THttpServer.cpp M be/src/transport/THttpServer.h M be/src/transport/THttpTransport.cpp M be/src/transport/THttpTransport.h M be/src/transport/TSaslClientTransport.cpp M be/src/transport/TSaslClientTransport.h M be/src/transport/TSaslServerTransport.cpp M be/src/transport/TSaslServerTransport.h M be/src/transport/TSaslTransport.cpp M be/src/transport/TSaslTransport.h M be/src/util/parquet-reader.cc M bin/impala-config.sh M common/thrift/CMakeLists.txt M infra/python/deps/requirements.txt M java/pom.xml M shell/ext-py/thrift_sasl-0.4.2/setup.py 43 files changed, 189 insertions(+), 189 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/17170/4 -- To view, visit http://gerrit.cloudera.org:8080/17170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 Gerrit-Change-Number: 17170 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10657: Remove accidental usage of shaded imports
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17314 ) Change subject: IMPALA-10657: Remove accidental usage of shaded imports .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17314 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifa67aec96539db7861416f3e71cca83607e3d8c9 Gerrit-Change-Number: 17314 Gerrit-PatchSet: 2 Gerrit-Owner: John Sherman Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Wed, 14 Apr 2021 17:40:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10657: Remove accidental usage of shaded imports
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17314 ) Change subject: IMPALA-10657: Remove accidental usage of shaded imports .. IMPALA-10657: Remove accidental usage of shaded imports - This changes seemingly accidental usages of shaded imports with the direct dependency - This is helpful to reduce confusion and possibly reduce the required jars for certain usages of the frontend jars Change-Id: Ifa67aec96539db7861416f3e71cca83607e3d8c9 Reviewed-on: http://gerrit.cloudera.org:8080/17314 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java M fe/src/main/java/org/apache/impala/catalog/Transaction.java M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java 4 files changed, 6 insertions(+), 4 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/17314 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ifa67aec96539db7861416f3e71cca83607e3d8c9 Gerrit-Change-Number: 17314 Gerrit-PatchSet: 3 Gerrit-Owner: John Sherman Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell
[Impala-ASF-CR] WIP IMPALA-7825: Upgrade Thrift version to 0.11.0
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17170 ) Change subject: WIP IMPALA-7825: Upgrade Thrift version to 0.11.0 .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7074/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 Gerrit-Change-Number: 17170 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 14 Apr 2021 17:28:46 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-7825: Upgrade Thrift version to 0.11.0
Csaba Ringhofer has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17170 ) Change subject: WIP IMPALA-7825: Upgrade Thrift version to 0.11.0 .. WIP IMPALA-7825: Upgrade Thrift version to 0.11.0 Before this patch Impala mainly used Thrift 0.9.3, but it was possible to compile Impala shell with Thrift 0.11.0, so the 0.11.0 Thrift lib was already included in toolchain. Most of the changes are related to replacing boost:: with std:: shared_ptr-s in cpp code (this is a continuation of patch by Vihang). The Thrift upgrade also needs an Impyla relase with Thrift 0.11.0, as Impala's test framework relies on Impyla. A thrift_sasl release is also needed, because it currently pins Thrift version to 0.9.3 for Python 2. The current patch uses alpha releases from Impyla and thrift_sasl that use thrift 0.11.0. Testing: - ran Impyla's test suite with Python 2 and 3 Other TODOs: - remove preexisting extra logic needed to use 0.11.0 for python - the thrift compilation was changed to generated template code - this was an easy solution to avoid a compilation issue after merging IMPALA-10600, but I should check its effect on compilation time Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 --- M be/src/benchmarks/network-perf-benchmark.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalog-service-client-wrapper.h M be/src/catalog/catalogd-main.cc M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/TAcceptQueueServer.h M be/src/rpc/auth-provider.h M be/src/rpc/authentication.cc M be/src/rpc/hs2-http-test.cc M be/src/rpc/thrift-client.h M be/src/rpc/thrift-server-test.cc M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-server.h M be/src/rpc/thrift-thread.cc M be/src/rpc/thrift-thread.h M be/src/rpc/thrift-util.cc M be/src/rpc/thrift-util.h M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/impalad-main.cc M be/src/statestore/statestore-service-client-wrapper.h M be/src/statestore/statestore-subscriber-client-wrapper.h M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore-subscriber.h M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M be/src/testutil/in-process-servers.h M be/src/transport/THttpServer.cpp M be/src/transport/THttpServer.h M be/src/transport/THttpTransport.cpp M be/src/transport/THttpTransport.h M be/src/transport/TSaslClientTransport.cpp M be/src/transport/TSaslClientTransport.h M be/src/transport/TSaslServerTransport.cpp M be/src/transport/TSaslServerTransport.h M be/src/transport/TSaslTransport.cpp M be/src/transport/TSaslTransport.h M be/src/util/parquet-reader.cc M bin/impala-config.sh M common/thrift/CMakeLists.txt M infra/python/deps/requirements.txt M java/pom.xml M shell/ext-py/thrift_sasl-0.4.2/setup.py 43 files changed, 189 insertions(+), 190 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/17170/3 -- To view, visit http://gerrit.cloudera.org:8080/17170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 Gerrit-Change-Number: 17170 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba Ringhofer
[Impala-ASF-CR] IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17316 ) Change subject: IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17316/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: http://gerrit.cloudera.org:8080/#/c/17316/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@588 PS1, Line 588: return fs.exists(path); If the qualified paths aren't the same and fs.exists(path) returns true, do we know they are on the same filesystem? I'm wondering whether this should just return false. -- To view, visit http://gerrit.cloudera.org:8080/17316 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id807e8a200b83283a09d3a917185cabab930017d Gerrit-Change-Number: 17316 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Wed, 14 Apr 2021 16:52:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10654: Fix precision loss in DecimalValue to double conversion.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17303 ) Change subject: IMPALA-10654: Fix precision loss in DecimalValue to double conversion. .. Patch Set 5: (4 comments) The change looks great. A test with Parquet would be nice, other than that I only found nitpicks. When you upload a new PS please reply to the comments. Most of the time clicking on "Done" is enough. This way we'll know we won't left anything open. http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/runtime/decimal-value.inline.h@753 PS5, Line 753: ( nit: parentheses not needed http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/runtime/decimal-value.inline.h@754 PS5, Line 754: nit: we indent with 4 spaces if the line belongs to the same statement as the previous line. http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/runtime/decimal-value.inline.h@755 PS5, Line 755: is_negative, nit: I think we should either put each argument to a separate line, or put all of them to the previous line. In this case I think the latter makes more sense. http://gerrit.cloudera.org:8080/#/c/17303/5/testdata/workloads/functional-query/queries/QueryTest/values.test File testdata/workloads/functional-query/queries/QueryTest/values.test: http://gerrit.cloudera.org:8080/#/c/17303/5/testdata/workloads/functional-query/queries/QueryTest/values.test@104 PS5, Line 104: For write path, the default precision of 16 is a limitation yet to be That's true for text tables, but could you please add a test with a Parquet table? I think we should getter better precision there. -- To view, visit http://gerrit.cloudera.org:8080/17303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56f0652cb8f81a491b87d9b108a94c00ae6c99a1 Gerrit-Change-Number: 17303 Gerrit-PatchSet: 5 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 14 Apr 2021 16:31:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 ) Change subject: IMPALA-10656: Fire insert events before commit .. Patch Set 6: Code-Review+1 (4 comments) Looked at the backend, the change looks great! http://gerrit.cloudera.org:8080/#/c/17313/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17313/6//COMMIT_MSG@17 PS6, Line 17: d, nit: some lines are longer than 72 chars http://gerrit.cloudera.org:8080/#/c/17313/6/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/17313/6/be/src/exec/hdfs-table-sink.cc@484 PS6, Line 484: current_file +1 for the rename :) http://gerrit.cloudera.org:8080/#/c/17313/6/be/src/runtime/dml-exec-state.h File be/src/runtime/dml-exec-state.h: http://gerrit.cloudera.org:8080/#/c/17313/6/be/src/runtime/dml-exec-state.h@63 PS6, Line 63: staging_dir_to_clean_up nit: could you please mention this parameter in the comment? http://gerrit.cloudera.org:8080/#/c/17313/6/be/src/runtime/dml-exec-state.cc File be/src/runtime/dml-exec-state.cc: http://gerrit.cloudera.org:8080/#/c/17313/6/be/src/runtime/dml-exec-state.cc@496 PS6, Line 496: } nit: add extra line -- To view, visit http://gerrit.cloudera.org:8080/17313 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9 Gerrit-Change-Number: 17313 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 14 Apr 2021 16:12:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option
fifteencai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's scale with query option .. Patch Set 15: Resolved assert problem in newly added unit tests. -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 15 Gerrit-Owner: fifteencai Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: fifteencai Gerrit-Comment-Date: Wed, 14 Apr 2021 15:18:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's scale with query option .. Patch Set 15: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8577/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 15 Gerrit-Owner: fifteencai Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: fifteencai Gerrit-Comment-Date: Wed, 14 Apr 2021 14:50:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's scale with query option .. Patch Set 14: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8576/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 14 Gerrit-Owner: fifteencai Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: fifteencai Gerrit-Comment-Date: Wed, 14 Apr 2021 14:42:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option
fifteencai has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's scale with query option .. IMPALA-10445: Adjust NDV's scale with query option This is a new way to control NDV's scale. Since IMPALA-2658, we can trade memory for more accurate estimation by setting larger `scale` in SQL function NDV(, ). However the use of larger NDV scale requires the modification of SQL queries which may not be practical in certain applications: - Firstly, SQL writers are reluctant to lower that scale. They prone to fill up the scale, which will make the cluster unstable. Especially when there are `group by`s with high cardinalities. So it is wiser to let cluster admin other than sql writer choose appropriate scale. - Secondly, In some application scenarios, queries are stored in DBs. In a BI system, for example, rewriting thousands of SQLs is risky. In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE` with the following semantics: 1. The allowed value is in the range [1..10]; 2. Previously, the scale used in NDV() functions was fixed at 2. Now the scale is provided by the newly added query options. 3. It does not influence the NDV scale for SQL function NDV(, ) in which the NDV scale is provided by the 2nd argument . We also refactored method `Analyze` to make sure APPX_COUNT_DISTINCT can work with this query option. After this, cluster admins can substitute `count(distinct )` with `ndv(, scale)`. Implementation details: - The default value of DEFAULT_NDV_SCALE is 2, so we won't change the default ndv behavior. - We port `CountDistinctToNdv` transform logic from `SelectStmt.analyze()` to `ExprRewriter`, making it compatible with further rewrite rules. - The newly added rewrite rule `DefaultNdvScaleRule` is applied after `CountDistinctToNdvRule`. Usage: To set a default ndv scale: ``` SET DEFAULT_NDV_SCALE = 10; ``` To unset: ``` SET DEFAULT_NDV_SCALE = 2; ``` Here are test results of a typical workload (cardinality=40,090,650): ++ | Metric| Count Distinct |NDV2|NDV5|NDV10 | ++ | Memory(GB) | 3.83 |1.84|1.85| 1.89 | | Duration(s) | 182.89| 30.22|29.72 | 29.24 | | ErrorRate |0% |1.8%|1.17% | 0.06% | ++ Testing: 1) Added 3 unit test cases in `ExprRewriteRulesTest`. 2) Added 5 unit test cases in `ExprRewriterTest`. 3) Ran all front-end unit test, passed. 4) Added a new query-option test. Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java 11 files changed, 315 insertions(+), 34 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/15 -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 15 Gerrit-Owner: fifteencai Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: fifteencai
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's scale with query option .. Patch Set 14: (4 comments) http://gerrit.cloudera.org:8080/#/c/17306/14/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java: http://gerrit.cloudera.org:8080/#/c/17306/14/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@541 PS14, Line 541: //-- line too long (96 > 90) http://gerrit.cloudera.org:8080/#/c/17306/14/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@546 PS14, Line 546: // CASE 1: DEFAULT_NDV_SCALE=5(same as the value in original sql), APPX_COUNT_DISTINCT=True line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/17306/14/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@552 PS14, Line 552: //-- line too long (96 > 90) http://gerrit.cloudera.org:8080/#/c/17306/14/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@553 PS14, Line 553: String sql1 = "SELECT ndv(id), ndv(id, 5), count(DISTINCT id) FROM functional.alltypes"; line too long (92 > 90) -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 14 Gerrit-Owner: fifteencai Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: fifteencai Gerrit-Comment-Date: Wed, 14 Apr 2021 14:23:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option
fifteencai has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's scale with query option .. IMPALA-10445: Adjust NDV's scale with query option This is a new way to control NDV's scale. Since IMPALA-2658, we can trade memory for more accurate estimation by setting larger `scale` in SQL function NDV(, ). However the use of larger NDV scale requires the modification of SQL queries which may not be practical in certain applications: - Firstly, SQL writers are reluctant to lower that scale. They prone to fill up the scale, which will make the cluster unstable. Especially when there are `group by`s with high cardinalities. So it is wiser to let cluster admin other than sql writer choose appropriate scale. - Secondly, In some application scenarios, queries are stored in DBs. In a BI system, for example, rewriting thousands of SQLs is risky. In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE` with the following semantics: 1. The allowed value is in the range [1..10]; 2. Previously, the scale used in NDV() functions was fixed at 2. Now the scale is provided by the newly added query options. 3. It does not influence the NDV scale for SQL function NDV(, ) in which the NDV scale is provided by the 2nd argument . We also refactored method `Analyze` to make sure APPX_COUNT_DISTINCT can work with this query option. After this, cluster admins can substitute `count(distinct )` with `ndv(, scale)`. Implementation details: - The default value of DEFAULT_NDV_SCALE is 2, so we won't change the default ndv behavior. - We port `CountDistinctToNdv` transform logic from `SelectStmt.analyze()` to `ExprRewriter`, making it compatible with further rewrite rules. - The newly added rewrite rule `DefaultNdvScaleRule` is applied after `CountDistinctToNdvRule`. Usage: To set a default ndv scale: ``` SET DEFAULT_NDV_SCALE = 10; ``` To unset: ``` SET DEFAULT_NDV_SCALE = 2; ``` Here are test results of a typical workload (cardinality=40,090,650): ++ | Metric| Count Distinct |NDV2|NDV5|NDV10 | ++ | Memory(GB) | 3.83 |1.84|1.85| 1.89 | | Duration(s) | 182.89| 30.22|29.72 | 29.24 | | ErrorRate |0% |1.8%|1.17% | 0.06% | ++ Testing: 1) Added 3 unit test cases in `ExprRewriteRulesTest`. 2) Added 5 unit test cases in `ExprRewriterTest`. 3) Ran all front-end unit test, passed. 4) Added a new query-option test. Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java 11 files changed, 312 insertions(+), 34 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/14 -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 14 Gerrit-Owner: fifteencai Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: fifteencai
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's scale with query option .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8575/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 13 Gerrit-Owner: fifteencai Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: fifteencai Gerrit-Comment-Date: Wed, 14 Apr 2021 14:20:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17195 ) Change subject: IMPALA-10584: Defer advancing read page when reservation is tight. .. Patch Set 5: (8 comments) http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc File be/src/runtime/buffered-tuple-stream-test.cc: http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1271 PS5, Line 1271: ASSERT_OK(stream.UnpinStream(BufferedTupleStream::UNPIN_ALL_EXCEPT_CURRENT)); Not related to this patch, but I'm confused how do we verify we have advanced the read page here. I think this just verify that UnpinStream won't hit any DCHECKs. Do you have any ideas on this? http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1300 PS5, Line 1300: TEST_F(SimpleTupleStreamTest, DeferAdvancingReadPage) { Could you add some comments above this? e.g. Test that UnpinStream defer advancing the read page when all rows from the read page are attached to a returned RowBatch but got not enough reservation. http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1333 PS5, Line 1333: Could you add a comment here that we are adding rows without releasing the read_batch (i.e. read_batch.Reset())? So we will hit not enough reservation and need unpinning the stream to continue. http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1334 PS5, Line 1334: for (int j = 0; j < 2; ++j) { Could you leave a comment about why do we run this twice? I guess we want to test it when 'stream' is pinned and unpinned. BTW, could you add ASSERT_TRUE(stream.is_pinned()) before the loop? http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1354 PS5, Line 1354: } Could you add ASSERT_FALSE(stream.is_pinned()) at the end? http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1356 PS5, Line 1356: } Do we need read_batch.Reset() as well? http://gerrit.cloudera.org:8080/#/c/17195/3/be/src/runtime/buffered-tuple-stream.cc File be/src/runtime/buffered-tuple-stream.cc: http://gerrit.cloudera.org:8080/#/c/17195/3/be/src/runtime/buffered-tuple-stream.cc@723 PS3, Line 723: > That is a good point, thanks! Ah yeah, you are right. The page handle is destroyed in AttachBufferToBatch(). Thanks for catching this! http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc File be/src/runtime/buffered-tuple-stream.cc: http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc@728 PS5, Line 728: if (buffer_pool_client_->GetUnusedReservation() < last_attached_page_len_) { : // Since attached_to_output_batch is true and GetUnusedReservation() is less : // than last_attached_page_len_, the reader is most likely has not clean up the : // RowBatch where the read page is attached to. We defer advancing the read page : // until the next GetNext() call by the reader (see IMPALA-10584). : defer_advancing_read_page = true; : } Maybe I'm misunderstanding something. This still seems tricky for me. I agree that this can avoid hitting the DCHECK in this JIRA, because we now won't advance to get a read/write page so won't check consistency (i.e. won't find that unused reservation become negative). But in the case when the unused reservation >= last_attached_page_len_, we always advance the read page regardless whether the reader has freed the row batch buffers. The accounting seems wrong in the period after we advance the read page and before the reader frees the attached buffer: If we advance to get a read/write page in NextReadPage() and save default_page_len_ of reservation for a later write page, it seems like "stealing" default_page_len_ worth of space from the unused reservation, and then after the reader frees the attached buffer, we returns the stealed space back to the unused reservation. Although the unused reservation won't be negative so won't hit any DCHECKs in this case, it might be possible that we run out of unused reservation earlier than it should (if somehow the reader doesn't free the buffer for a long time). I wonder if we should detect whether the reader has freed the attached buffer and make the decision by that. Please correct me if I'm misunderstanding anything. -- To view, visit http://gerrit.cloudera.org:8080/17195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Gerrit-Change-Number: 17195 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer:
[Impala-ASF-CR] WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17262 ) Change subject: WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types .. Patch Set 3: (7 comments) http://gerrit.cloudera.org:8080/#/c/17262/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17262/3//COMMIT_MSG@16 PS3, Line 16: TBL_PROPS TBL_PROPS will configure bloom filters per column, right? http://gerrit.cloudera.org:8080/#/c/17262/3//COMMIT_MSG@16 PS3, Line 16: TBL_PROPS - write Parquet Bloom filters as set in table properties : IF_NO_DICT - write Parquet Bloom filters if the row group is not :fully dictionary encoded What is the relation of TBL_PROPS and IF_NO_DICT? E.g. with TBL_PROPS will you write bloom filter even if the column dictionary encoded? http://gerrit.cloudera.org:8080/#/c/17262/1/be/src/exec/parquet/hdfs-parquet-table-writer.cc File be/src/exec/parquet/hdfs-parquet-table-writer.cc: http://gerrit.cloudera.org:8080/#/c/17262/1/be/src/exec/parquet/hdfs-parquet-table-writer.cc@605 PS1, Line 605: // The ParquetBloomFilter object if one is being written. If : // 'ShouldInitParquetBloomFilter()' is false, the combination of the impala type and the : // parquet type is not supported or some error occurs during the initialisation of the : // ParquetBloomFilter object, it is set to NULL. : unique_ptr I find using dst_ptr a bit risky: I like the solution, but I disagree here: "in case of int8 and int16 we need to pad them to int32 because parquet doesn't support smaller ints, so in these cases we cannot use the buffer, adding even more special cases." Why can't we use the buffer? dst_ptr points to the plain encoded page buffer, where the i16/i8 are already converted to i32 http://gerrit.cloudera.org:8080/#/c/17262/3/be/src/exec/parquet/hdfs-parquet-table-writer.cc File be/src/exec/parquet/hdfs-parquet-table-writer.cc: http://gerrit.cloudera.org:8080/#/c/17262/3/be/src/exec/parquet/hdfs-parquet-table-writer.cc@493 PS3, Line 493: if (parent_ I would prefer to move this to a separate function like FlushDictionaryToBloomFilterIfNeeded() ProcessValue() is a critical function when trying to understand how Impala writes Parquet files, so I think that we should try to minimize noise. http://gerrit.cloudera.org:8080/#/c/17262/3/be/src/exec/parquet/hdfs-parquet-table-writer.cc@542 PS3, Line 542: if (ShouldUpdateParquetBloomFilter()) { I would prefer to move this to a separate function like UpdateBloomFilterIfNeeded() http://gerrit.cloudera.org:8080/#/c/17262/3/be/src/exec/parquet/hdfs-parquet-table-writer.cc@597 PS3, Line 597: whwn typo http://gerrit.cloudera.org:8080/#/c/17262/3/be/src/exec/parquet/parquet-bloom-filter-util.cc File be/src/exec/parquet/parquet-bloom-filter-util.cc: http://gerrit.cloudera.org:8080/#/c/17262/3/be/src/exec/parquet/parquet-bloom-filter-util.cc@220 PS3, Line 220: { nit: brace placement -- To view, visit http://gerrit.cloudera.org:8080/17262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792 Gerrit-Change-Number: 17262 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 14 Apr 2021 14:03:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's scale with query option .. Patch Set 13: (7 comments) http://gerrit.cloudera.org:8080/#/c/17306/13/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java: http://gerrit.cloudera.org:8080/#/c/17306/13/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@180 PS13, Line 180: // Given a list of `rule`, this function checks whether the rewritten is as expected line too long (93 > 90) http://gerrit.cloudera.org:8080/#/c/17306/13/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@181 PS13, Line 181: // Caution: if no rule in `rules` is expected to be fired, should set expectedExprStr to null line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/17306/13/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@182 PS13, Line 182: // or "NULL". otherwise, this function would throw an exception like "Rule xxx didn't fire" line too long (93 > 90) http://gerrit.cloudera.org:8080/#/c/17306/13/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java: http://gerrit.cloudera.org:8080/#/c/17306/13/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@541 PS13, Line 541: //-- line too long (96 > 90) http://gerrit.cloudera.org:8080/#/c/17306/13/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@546 PS13, Line 546: // CASE 1: DEFAULT_NDV_SCALE=5(same as the value in original sql), APPX_COUNT_DISTINCT=True line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/17306/13/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@552 PS13, Line 552: //-- line too long (96 > 90) http://gerrit.cloudera.org:8080/#/c/17306/13/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@553 PS13, Line 553: String sql1 = "SELECT ndv(id), ndv(id, 5), count(DISTINCT id) FROM functional.alltypes"; line too long (92 > 90) -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 13 Gerrit-Owner: fifteencai Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: fifteencai Gerrit-Comment-Date: Wed, 14 Apr 2021 14:01:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option
fifteencai has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's scale with query option .. IMPALA-10445: Adjust NDV's scale with query option This is a new way to control NDV's scale. Since IMPALA-2658, we can trade memory for more accurate estimation by setting larger `scale` in SQL function NDV(, ). However the use of larger NDV scale requires the modification of SQL queries which may not be practical in certain applications: - Firstly, SQL writers are reluctant to lower that scale. They prone to fill up the scale, which will make the cluster unstable. Especially when there are `group by`s with high cardinalities. So it is wiser to let cluster admin other than sql writer choose appropriate scale. - Secondly, In some application scenarios, queries are stored in DBs. In a BI system, for example, rewriting thousands of SQLs is risky. In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE` with the following semantics: 1. The allowed value is in the range [1..10]; 2. Previously, the scale used in NDV() functions was fixed at 2. Now the scale is provided by the newly added query options. 3. It does not influence the NDV scale for SQL function NDV(, ) in which the NDV scale is provided by the 2nd argument . We also refactored method `Analyze` to make sure APPX_COUNT_DISTINCT can work with this query option. After this, cluster admins can substitute `count(distinct )` with `ndv(, scale)`. Implementation details: - The default value of DEFAULT_NDV_SCALE is 2, so we won't change the default ndv behavior. - We port `CountDistinctToNdv` transform logic from `SelectStmt.analyze()` to `ExprRewriter`, making it compatible with further rewrite rules. - The newly added rewrite rule `DefaultNdvScaleRule` is applied after `CountDistinctToNdvRule`. Usage: To set a default ndv scale: ``` SET DEFAULT_NDV_SCALE = 10; ``` To unset: ``` SET DEFAULT_NDV_SCALE = 2; ``` Here are test results of a typical workload (cardinality=40,090,650): ++ | Metric| Count Distinct |NDV2|NDV5|NDV10 | ++ | Memory(GB) | 3.83 |1.84|1.85| 1.89 | | Duration(s) | 182.89| 30.22|29.72 | 29.24 | | ErrorRate |0% |1.8%|1.17% | 0.06% | ++ Testing: 1) Added 3 unit test cases in `ExprRewriteRulesTest`. 2) Added 5 unit test cases in `ExprRewriterTest`. 3) Ran all front-end unit test, passed. 4) Added a new query-option test. Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java 11 files changed, 302 insertions(+), 34 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/13 -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 13 Gerrit-Owner: fifteencai Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: fifteencai
[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 ) Change subject: IMPALA-10656: Fire insert events before commit .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8574/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17313 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9 Gerrit-Change-Number: 17313 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 14 Apr 2021 13:20:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17299 ) Change subject: IMPALA-10652: Optimize the checking of the size of incremental stats .. Patch Set 3: Code-Review+1 (2 comments) Looks good! http://gerrit.cloudera.org:8080/#/c/17299/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/17299/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@452 PS2, Line 452: if (partitionSet_ == null) { : numOfAllIncStatsPartitions = allPartitio > It is a pre-check on the size of incremental stats to prevent the increment Okay. Done. http://gerrit.cloudera.org:8080/#/c/17299/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/17299/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1784 PS3, Line 1784: aa nit. table alltypes? -- To view, visit http://gerrit.cloudera.org:8080/17299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee Gerrit-Change-Number: 17299 Gerrit-PatchSet: 3 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Wed, 14 Apr 2021 13:04:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 ) Change subject: IMPALA-10656: Fire insert events before commit .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/17313/6/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/17313/6/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@879 PS6, Line 879: unpartTable.getFileSystem(), new Path(unpartTable.getHdfsBaseDir()), overwrite, line too long (91 > 90) -- To view, visit http://gerrit.cloudera.org:8080/17313 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9 Gerrit-Change-Number: 17313 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 14 Apr 2021 13:03:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit
Hello Vihang Karajgaonkar, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17313 to look at the new patch set (#6). Change subject: IMPALA-10656: Fire insert events before commit .. IMPALA-10656: Fire insert events before commit Before this fix Impala committed an insert first, then reloaded the table from HMS, and generated the insert events based on the difference between the two snapshots. (e.g. which file was not present in the old snapshot but are there in the new one). Hive replication expects the insert events before the commit, so this may potentially lead to issues there. The solution is to collect the new files during the insert in the backend, and send the insert events based on this file set. This wasn't very hard to do as we were already collecting the files in some cases: - to move them from staging dir to their final location in case of non-partitioned tables - to write the file list to snapshot files in case of Iceberg tables This patch unifies the paths above and collects all information about the created files regardless of the table type. Testing: - no new tests, insert events were already covered in test_event_processing.py and MetastoreEventsProcessorTest.java - ran core tests Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9 --- M be/src/exec/hbase-table-sink.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-text-table-writer.cc M be/src/exec/output-partition.h M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/runtime/dml-exec-state.cc M be/src/runtime/dml-exec-state.h M be/src/service/client-request-state.cc M common/protobuf/control_service.proto M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 12 files changed, 219 insertions(+), 215 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/17313/6 -- To view, visit http://gerrit.cloudera.org:8080/17313 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9 Gerrit-Change-Number: 17313 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17316 ) Change subject: IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8573/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17316 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id807e8a200b83283a09d3a917185cabab930017d Gerrit-Change-Number: 17316 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Wed, 14 Apr 2021 13:02:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's scale with query option .. Patch Set 12: Code-Review+1 Looks good! -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 12 Gerrit-Owner: fifteencai Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: fifteencai Gerrit-Comment-Date: Wed, 14 Apr 2021 13:01:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17195 ) Change subject: IMPALA-10584: Defer advancing read page when reservation is tight. .. Patch Set 5: Code-Review+1 (1 comment) Looks good to me! http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc File be/src/runtime/buffered-tuple-stream.cc: http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc@747 PS5, Line 747: the first page must be a read page May DCHECK() on this assertion here. -- To view, visit http://gerrit.cloudera.org:8080/17195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Gerrit-Change-Number: 17195 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 14 Apr 2021 12:58:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17316 ) Change subject: IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7073/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17316 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id807e8a200b83283a09d3a917185cabab930017d Gerrit-Change-Number: 17316 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 14 Apr 2021 12:42:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17316 Change subject: IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS .. IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS LOAD DATA INPATH silently fails when Impala tries to move files from HDFS to ABFS. The problem is that we use FileSystem.makeQualified(Path) to decide if path is on a given filesystem. We expect to get an IllegalArgumentException if path is on a different filesystem. However, the Azure FileSystem implementation doesn't throw this exception. Because of that Impala thinks that an 'hdfs://' path and an 'abfs://' path is on the same filesystem, so it tries to move files with FileSystem.rename(). In case of errors rename() might throw an Exception, or return false. Impala doesn't check the return value, therefore if rename() returns false then the error remains silent. This patch fixes Impala's isPathOnFileSystem() and also adds a check for the return value of rename(). Testing: * tested manually between HDFS and Azure ABFS. Change-Id: Id807e8a200b83283a09d3a917185cabab930017d --- M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java 1 file changed, 8 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/17316/1 -- To view, visit http://gerrit.cloudera.org:8080/17316 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Id807e8a200b83283a09d3a917185cabab930017d Gerrit-Change-Number: 17316 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 ) Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator .. Patch Set 21: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7072/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17252 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964 Gerrit-Change-Number: 17252 Gerrit-PatchSet: 21 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 14 Apr 2021 12:03:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10657: Remove accidental usage of shaded imports
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17314 ) Change subject: IMPALA-10657: Remove accidental usage of shaded imports .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7071/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17314 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifa67aec96539db7861416f3e71cca83607e3d8c9 Gerrit-Change-Number: 17314 Gerrit-PatchSet: 2 Gerrit-Owner: John Sherman Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Wed, 14 Apr 2021 12:01:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17289 ) Change subject: IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7070/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17289 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f Gerrit-Change-Number: 17289 Gerrit-PatchSet: 5 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 14 Apr 2021 11:57:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 ) Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator .. Patch Set 21: > Patch Set 21: > > Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7067/ > DRY_RUN=false I think we need to rerun the gerrit-verify-dryrun since jenkins.impala.io down yesterday. -- To view, visit http://gerrit.cloudera.org:8080/17252 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964 Gerrit-Change-Number: 17252 Gerrit-PatchSet: 21 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 14 Apr 2021 11:39:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 ) Change subject: IMPALA-10656: Fire insert events before commit .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8572/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17313 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9 Gerrit-Change-Number: 17313 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 14 Apr 2021 10:43:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 ) Change subject: IMPALA-10656: Fire insert events before commit .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8571/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17313 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9 Gerrit-Change-Number: 17313 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 14 Apr 2021 10:40:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 ) Change subject: IMPALA-10656: Fire insert events before commit .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/17313/5/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/17313/5/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@879 PS5, Line 879: unpartTable.getFileSystem(), new Path(unpartTable.getHdfsBaseDir()), overwrite, line too long (91 > 90) -- To view, visit http://gerrit.cloudera.org:8080/17313 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9 Gerrit-Change-Number: 17313 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 14 Apr 2021 10:24:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit
Hello Vihang Karajgaonkar, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17313 to look at the new patch set (#5). Change subject: IMPALA-10656: Fire insert events before commit .. IMPALA-10656: Fire insert events before commit Before this fix Impala committed an insert first, then reloaded the table from HMS, and generated the insert events based on the difference between the two snapshots. (e.g. which file was not present in the old snapshot but are there in the new one). Hive replication expects the insert events before the commit, so this may potentially lead to issues there. The solution is to collect the new files during the insert in the backend, and send the insert events based on this file set. This wasn't very hard to do as we were already collecting the files in some cases: - to move them from staging dir to their final location in case of non-partitioned tables - to write the file list to snapshot files in case of Iceberg tables This patch unifies the paths above and collects all information about the created files regardless of the table type. Testing: - no new tests, insert events were already covered in test_event_processing.py and MetastoreEventsProcessorTest.java - ran core tests Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9 --- M be/src/exec/hbase-table-sink.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-text-table-writer.cc M be/src/exec/output-partition.h M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/runtime/dml-exec-state.cc M be/src/runtime/dml-exec-state.h M be/src/service/client-request-state.cc M common/protobuf/control_service.proto M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 12 files changed, 220 insertions(+), 216 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/17313/5 -- To view, visit http://gerrit.cloudera.org:8080/17313 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9 Gerrit-Change-Number: 17313 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 ) Change subject: IMPALA-10656: Fire insert events before commit .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/17313/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/17313/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@879 PS4, Line 879: unpartTable.getFileSystem(), new Path(unpartTable.getHdfsBaseDir()), overwrite, line too long (91 > 90) -- To view, visit http://gerrit.cloudera.org:8080/17313 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9 Gerrit-Change-Number: 17313 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 14 Apr 2021 10:22:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 ) Change subject: IMPALA-10656: Fire insert events before commit .. Patch Set 4: PS 4 fixes the issue found during the jenkins tests. -- To view, visit http://gerrit.cloudera.org:8080/17313 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9 Gerrit-Change-Number: 17313 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 14 Apr 2021 10:21:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit
Csaba Ringhofer has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17313 ) Change subject: IMPALA-10656: Fire insert events before commit .. IMPALA-10656: Fire insert events before commit Before this fix Impala committed an insert first, then reloaded the table from HMS, and generated the insert events based on the difference between the two snapshots. (e.g. which file was not present in the old snapshot but are there in the new one). Hive replication expects the insert events before the commit, so this may potentially lead to issues there. The solution is to collect the new files during the insert in the backend, and send the insert events based on this file set. This wasn't very hard to do as we were already collecting the files in some cases: - to move them from staging dir to their final location in case of non-partitioned tables - to write the file list to snapshot files in case of Iceberg tables This patch unifies the paths above and collects all information about the created files regardless of the table type. Testing: - no new tests, insert events were already covered in test_event_processing.py and MetastoreEventsProcessorTest.java - ran core tests Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9 --- M be/src/exec/hbase-table-sink.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-text-table-writer.cc M be/src/exec/output-partition.h M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/runtime/dml-exec-state.cc M be/src/runtime/dml-exec-state.h M be/src/service/client-request-state.cc M common/protobuf/control_service.proto M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 12 files changed, 225 insertions(+), 216 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/17313/4 -- To view, visit http://gerrit.cloudera.org:8080/17313 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9 Gerrit-Change-Number: 17313 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types
Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/17026 ) Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most common types .. Patch Set 25: Code-Review+1 Thanks everyone, there are a few styling nits and a missing include guard (comment on patch set 23) that are still there. LGTM! -- To view, visit http://gerrit.cloudera.org:8080/17026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287 Gerrit-Change-Number: 17026 Gerrit-PatchSet: 25 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 14 Apr 2021 08:54:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's scale with query option .. Patch Set 12: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 12 Gerrit-Owner: fifteencai Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: fifteencai Gerrit-Comment-Date: Wed, 14 Apr 2021 08:46:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's scale with query option .. Patch Set 11: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/8570/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 11 Gerrit-Owner: fifteencai Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: fifteencai Gerrit-Comment-Date: Wed, 14 Apr 2021 08:15:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10645: Log catalogd HMS API metrics
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17284 ) Change subject: IMPALA-10645: Log catalogd HMS API metrics .. Patch Set 2: (25 comments) http://gerrit.cloudera.org:8080/#/c/17284/2/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: http://gerrit.cloudera.org:8080/#/c/17284/2/common/thrift/JniCatalog.thrift@849 PS2, Line 849: nit: redundant blank line http://gerrit.cloudera.org:8080/#/c/17284/2/common/thrift/JniCatalog.thrift@884 PS2, Line 884: nit: redundant blank line http://gerrit.cloudera.org:8080/#/c/17284/2/common/thrift/JniCatalog.thrift@909 PS2, Line 909: nit: redundant blank line http://gerrit.cloudera.org:8080/#/c/17284/2/common/thrift/JniCatalog.thrift@934 PS2, Line 934: nit: redundant blank lines http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2219 PS2, Line 2219: if nit: need one space after "if" http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2221 PS2, Line 2221: nit: I think we use 4 spaces idention here http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2227 PS2, Line 2227: nit: 4 spaces idention http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2230 PS2, Line 2230: if nit: need one space after "if" http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2233 PS2, Line 2233: nit: 4 spaces idention http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2067 PS2, Line 2067: Counter misses = CatalogMonitor.INSTANCE.getCatalogdHmsCacheMetrics().getCounter(FILEMETADATA_CACHE_MISS_METRIC); Previously we count the metrics per table, this changes it to use a global counter. Is it intended? http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2131 PS2, Line 2131: getFileMetadataCacheHitRate After this patch, the hit rate will be a global hit rate counted on all tables. Is it intended? http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java File fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java: http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@77 PS2, Line 77: private static final Logger LOG = : LoggerFactory.getLogger(CatalogMetastoreServer.class); nit: don't need changes since the original line fits in 90 chars. http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@94 PS2, Line 94: private static final String ACTIVE_CONNECTIONS_METRIC : = "metastore.active.connections"; nit: don't need changes as well. http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@99 PS2, Line 99: = nit: I think our code style tend to put this in the above line http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@143 PS2, Line 143: nit: redundant blank line http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@200 PS2, Line 200: throws Throwable { nit: move to the above line http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@201 PS2, Line 201: nit: redundant blank line http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@202 PS2, Line 202: contains nit: Unnecessary 'contains' check. We can always call add() directly. http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@256 PS2, Line 256: /** :* :*/ Do you plan to add something here? http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@486 PS2, Line 486: nit: redundant blank line http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@488 PS2, Line
[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17298 ) Change subject: IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions. .. Patch Set 3: (13 comments) The solution looks good to me. Please adjust the code style to match existing codes. Thanks! http://gerrit.cloudera.org:8080/#/c/17298/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17298/3//COMMIT_MSG@9 PS3, Line 9: For non transactional tables Could you explain why we don't do this for transactional tables in the commit message? http://gerrit.cloudera.org:8080/#/c/17298/3//COMMIT_MSG@13 PS3, Line 13: (since table loading in cache takes time) but ensures consistency. This change is behind catalogd nit: please adjust the commit message body to fit into at-most 72 chars per line. http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java File fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java: http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@141 PS3, Line 141: import org.apache.hadoop.hive.metastore.api.InvalidPartitionException; Is this used somewhere? http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@222 PS3, Line 222: UnknownPartitionException Is this used somewhere? http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@347 PS3, Line 347: nit: We use 4 spaces indention for multi-line statement. However, many changes in this patch use 8 spaces. Could you configure your IDE to adjust this? http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@349 PS3, Line 349: nit: 4 spaces indention http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@915 PS3, Line 915: partitionList Is it possible that these partitions belong to different tables? * If no, let's call invalidateNonTransactionalTableIfExists() once * If yes, let's de-duplicate the table names first and then call invalidateNonTransactionalTableIfExists() for each table. http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@929 PS3, Line 929: for (PartitionSpec partitionSpec : list) { Same question as above http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@1086 PS3, Line 1086: nit: 4 spaces indention http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@1087 PS3, Line 1087: , nit: 8 spaces indention and move the comma above http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@2849 PS3, Line 2849: if nit: In our code style, we need one space after "if". Many changes in this patch don't have it. Please adjust them. Thanks! http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@2903 PS3, Line 2903: // return immediately if flag invalidateCacheOnDDLs_ is false : if(!invalidateCacheOnDDLs_) { : LOG.debug("Not removing table {}.{} from catalogd cache because " + : "invalidateCacheOnDDLs_ flag is set to {} ", dbNameWithCatalog, : tableName, invalidateCacheOnDDLs_); : return; : } : // Parse db name. Throw error if parsing fails. : String dbName = dbNameWithCatalog; : try { : dbName = MetaStoreUtils.parseDbName(dbNameWithCatalog, serverConf_)[1]; : } catch (MetaException ex) { : LOG.error("Successfully executed HMS api: {} but encountered error " + : "when parsing dbName {} to invalidate/remove table from cache with error message: {}", : apiName, dbNameWithCatalog, ex.getMessage()); : throw ex; : } : Db db = catalog_.getDb(dbName); : if(db == null) { : LOG.debug("Not removing table {}.{} because db {} does not exist in catalogd cache", : dbName, tableName, dbName); : return; : } : if(!db.containsTable(tableName)) { : LOG.debug("Not removing table {}.{}
[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17299 ) Change subject: IMPALA-10652: Optimize the checking of the size of incremental stats .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/17299/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/17299/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@448 PS2, Line 448: numOfAllIncStats > May rename the variable to numOfAllIncStatsPartitions Done http://gerrit.cloudera.org:8080/#/c/17299/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@452 PS2, Line 452: if (partitionSet_ == null) { : numOfAllIncStatsPartitions = allPartitio > We may not need to verify the size limit when the partition set for increme It is a pre-check on the size of incremental stats to prevent the incremental stats from occupying too much memory after calculation. If no partition is specified, all partitions are calculated. Need to check whether the incremental stats of all partitions exceeds the threshold after calculation. -- To view, visit http://gerrit.cloudera.org:8080/17299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee Gerrit-Change-Number: 17299 Gerrit-PatchSet: 3 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Wed, 14 Apr 2021 06:18:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats
Hello Aman Sinha, Qifan Chen, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17299 to look at the new patch set (#3). Change subject: IMPALA-10652: Optimize the checking of the size of incremental stats .. IMPALA-10652: Optimize the checking of the size of incremental stats Modify the estimation method of incremental statistics size: incremental statistics size = Existing partition statistics + This time calculation partition stats - Repeated calculation partition stats Testing: All partitions of a table have no incremental stats. --Calculate the incremental stats of all partitions, the incremental stats size exceeds the threshold, an error is reported. --Calculate the incremental stats of one partition, no error is reported. Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee --- M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 2 files changed, 45 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/17299/3 -- To view, visit http://gerrit.cloudera.org:8080/17299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee Gerrit-Change-Number: 17299 Gerrit-PatchSet: 3 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option
fifteencai has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's scale with query option .. IMPALA-10445: Adjust NDV's scale with query option This is a new way to control NDV's scale. Since IMPALA-2658, we can trade memory for more accurate estimation by setting larger `scale` in SQL function NDV(, ). However the use of larger NDV scale requires the modification of SQL queries which may not be practical in certain applications: - Firstly, SQL writers are reluctant to lower that scale. They prone to fill up the scale, which will make the cluster unstable. Especially when there are `group by`s with high cardinalities. So it is wiser to let cluster admin other than sql writer choose appropriate scale. - Secondly, In some application scenarios, queries are stored in DBs. In a BI system, for example, rewriting thousands of SQLs is risky. In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE` with the following semantics: 1. The allowed value is in the range [1..10]; 2. Previously, the scale used in NDV() functions was fixed at 2. Now the scale is provided by the newly added query options. 3. It does not influence the NDV scale for SQL function NDV(, ) in which the NDV scale is provided by the 2nd argument . We also refactored method `Analyze` to make sure APPX_COUNT_DISTINCT can work with this query option. After this, cluster admins can substitute `count(distinct )` with `ndv(, scale)`. Implementation details: - The default value of DEFAULT_NDV_SCALE is 2, so we won't change the default ndv behavior. - We port `CountDistinctToNdv` transform logic from `SelectStmt.analyze()` to `ExprRewriter`, making it compatible with further rewrite rules. - The newly added rewrite rule `DefaultNdvScaleRule` is applied after `CountDistinctToNdvRule`. Usage: To set a default ndv scale: ``` SET DEFAULT_NDV_SCALE = 10; ``` To unset: ``` SET DEFAULT_NDV_SCALE = 2; ``` Here are test results of a typical workload (cardinality=40,090,650): ++ | Metric| Count Distinct |NDV2|NDV5|NDV10 | ++ | Memory(GB) | 3.83 |1.84|1.85| 1.89 | | Duration(s) | 182.89| 30.22|29.72 | 29.24 | | ErrorRate |0% |1.8%|1.17% | 0.06% | ++ Testing: 1) Added 3 unit test cases in `ExprRewriteRulesTest`. 2) Added 5 unit test cases in `ExprRewriterTest`. 3) Ran all front-end unit test, passed. 4) Added a new query-option test. Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java 11 files changed, 298 insertions(+), 34 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/12 -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 12 Gerrit-Owner: fifteencai Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: fifteencai