[Impala-ASF-CR](asf-site) Add slack channel in the community page
Jim Apple has posted comments on this change. ( http://gerrit.cloudera.org:8080/17311 ) Change subject: Add slack channel in the community page .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17311 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-MessageType: comment Gerrit-Change-Id: I26112aff63cabf6f4ca437f12d71928e80106687 Gerrit-Change-Number: 17311 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 13 Apr 2021 05:12:06 + Gerrit-HasComments: No
[Impala-ASF-CR](asf-site) Add slack channel in the community page
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17311 ) Change subject: Add slack channel in the community page .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17311/1/community.html File community.html: http://gerrit.cloudera.org:8080/#/c/17311/1/community.html@155 PS1, Line 155: sign up > Do join links expire after a certain amount of time or a certain number of I remember those things too. But this time when I copy the link in "Invite people to Apache Impala", it notifies me "never expires". So I think it's fine (including user limits). BTW, Presto uses such kind of a link as well in their doc: https://prestodb.io/community.html -- To view, visit http://gerrit.cloudera.org:8080/17311 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-MessageType: comment Gerrit-Change-Id: I26112aff63cabf6f4ca437f12d71928e80106687 Gerrit-Change-Number: 17311 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 13 Apr 2021 03:48:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR](asf-site) Add slack channel in the community page
Jim Apple has posted comments on this change. ( http://gerrit.cloudera.org:8080/17311 ) Change subject: Add slack channel in the community page .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17311/1/community.html File community.html: http://gerrit.cloudera.org:8080/#/c/17311/1/community.html@155 PS1, Line 155: sign up Do join links expire after a certain amount of time or a certain number of users? I have a recollection of running into that. -- To view, visit http://gerrit.cloudera.org:8080/17311 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-MessageType: comment Gerrit-Change-Id: I26112aff63cabf6f4ca437f12d71928e80106687 Gerrit-Change-Number: 17311 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 13 Apr 2021 02:51:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR](asf-site) Add slack channel in the community page
Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17311 Change subject: Add slack channel in the community page .. Add slack channel in the community page This patch adds links to join the apache-impala slack channel. Change-Id: I26112aff63cabf6f4ca437f12d71928e80106687 --- M community.html 1 file changed, 1 insertion(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/17311/1 -- To view, visit http://gerrit.cloudera.org:8080/17311 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-MessageType: newchange Gerrit-Change-Id: I26112aff63cabf6f4ca437f12d71928e80106687 Gerrit-Change-Number: 17311 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang
[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17289 ) Change subject: IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/17289/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17289/2//COMMIT_MSG@10 PS2, Line 10: due to the query plan change (from 3-node scan to : 2-node scan) which splits the row groups among scan nodes differently. > One more piece of details on instance 2. The overlap ratio of 0 indicates t Ah, I see! Thanks for the detailed explanation! It's really helpful. http://gerrit.cloudera.org:8080/#/c/17289/2/tests/query_test/test_runtime_filters.py File tests/query_test/test_runtime_filters.py: http://gerrit.cloudera.org:8080/#/c/17289/2/tests/query_test/test_runtime_filters.py@270 PS2, Line 270: @SkipIfNotHdfsMinicluster.scheduling IIUC, the issue is due to the underlying parquet files have different stats between EC and non-EC runs. It's not about scheduling. So I suggest we add a new skip type in SkipIfEC and use that instead, e.g. SkipIfEC.different_stats, and adding this in SkipIfEC (tests/common/skip.py) different_stats = pytest.mark.skipif(IS_EC, reason="Parquet/ORC files that are generated" " in tests have different stats in EC mode due to larger block group size.") -- To view, visit http://gerrit.cloudera.org:8080/17289 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f Gerrit-Change-Number: 17289 Gerrit-PatchSet: 2 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Tue, 13 Apr 2021 02:15:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17298 ) Change subject: IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions. .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8557/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17298 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98 Gerrit-Change-Number: 17298 Gerrit-PatchSet: 3 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 12 Apr 2021 22:29:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17298 ) Change subject: IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions. .. Patch Set 3: (50 comments) http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java File fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java: http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@348 PS3, Line 348: LOG.info("Invalidate catalogd cache for DDLs on non transactional tables is set to {}", line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@658 PS3, Line 658: removeNonTransactionalTableIfExists(dbname, tblname, "drop_table_with_environment_context"); line too long (98 > 90) http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@843 PS3, Line 843: invalidateNonTransactionalTableIfExists(newTable.getDbName(), newTable.getTableName(), line too long (92 > 90) http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@890 PS3, Line 890: Partition addedPartition = client.getHiveClient().getThriftClient().add_partition(partition); line too long (99 > 90) http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@891 PS3, Line 891: invalidateNonTransactionalTableIfExists(partition.getDbName(), partition.getTableName(), line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@904 PS3, Line 904: invalidateNonTransactionalTableIfExists(partition.getDbName(), partition.getTableName(), line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@914 PS3, Line 914: int numPartitionsAdded = client.getHiveClient().getThriftClient().add_partitions(partitionList); line too long (102 > 90) http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@916 PS3, Line 916: invalidateNonTransactionalTableIfExists(partition.getDbName(), partition.getTableName(), line too long (96 > 90) http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@930 PS3, Line 930: invalidateNonTransactionalTableIfExists(partitionSpec.getDbName(), partitionSpec.getTableName(), line too long (104 > 90) http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@955 PS3, Line 955: invalidateNonTransactionalTableIfExists(addPartitionsRequest.getDbName(), addPartitionsRequest.getTblName(), line too long (114 > 90) http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@968 PS3, Line 968: .append_partition_with_environment_context(dbname, tblname, partVals, environmentContext); line too long (104 > 90) http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@994 PS3, Line 994: .append_partition_by_name_with_environment_context(dbname, tblname, partName, line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@1021 PS3, Line 1021: .drop_partition_with_environment_context(dbname, tblname, partNames, deleteData, line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@1034 PS3, Line 1034: boolean partitionsDropped = client.getHiveClient().getThriftClient().drop_partition_by_name(dbname, line too long (106 > 90) http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@1049 PS3, Line 1049: .drop_partition_by_name_with_environment_context(dbName, tableName, partName, line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@1064 PS3, Line 1064: invalidateNonTransactionalTableIfExists(dropPartitionsRequest.getDbName(), dropPartitionsRequest.getTblName(), line too long (116 > 90)
[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions.
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17298 to look at the new patch set (#3). Change subject: IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions. .. IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions. For non transactional tables, invalidate the table from cache if HMS DDL apis are accessed from catalogd's metastore server. Any subsequent get table request fetches the table from HMS and loads it in cache. This ensures that any get_table/get_partition requests after DDL operations on the same table return updated table. This behavior has a performance penalty (since table loading in cache takes time) but ensures consistency. This change is behind catalogd server's flag: invalidate_hms_cache_on_ddls which is enabled by default. The flag needs to be turned off if this change becomes a performance bottleneck. Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M tests/custom_cluster/test_metastore_service.py 6 files changed, 420 insertions(+), 48 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/17298/3 -- To view, visit http://gerrit.cloudera.org:8080/17298 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98 Gerrit-Change-Number: 17298 Gerrit-PatchSet: 3 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 ) Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator .. Patch Set 18: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8556/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17252 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964 Gerrit-Change-Number: 17252 Gerrit-PatchSet: 18 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 12 Apr 2021 21:12:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10619: Minor refactoring of analytic function methods
Aman Sinha has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17237 ) Change subject: IMPALA-10619: Minor refactoring of analytic function methods .. IMPALA-10619: Minor refactoring of analytic function methods The FIRST_VALUE, LAST_VALUE functions go through standardization process in AnalyticExpr where they may be rewritten with different number of parameters or with different window frame. In order for an external FE to leverage this standardization, this patch creates a wrapper method for FunctionCallExpr creation and does minor refactoring. Also added accessor methods to AnalyticEvalNode and changed visibility of couple of methods in PlanNode for use by external FE. Testing: Ran PlannerTests. No new tests are added since this does not change the existing behavior. Change-Id: I39e4268c0c5500f09acf98357a80763c28f615c2 Reviewed-on: http://gerrit.cloudera.org:8080/17237 Reviewed-by: Impala Public Jenkins Tested-by: Aman Sinha Reviewed-by: Aman Sinha --- M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java 3 files changed, 18 insertions(+), 7 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved Aman Sinha: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/17237 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I39e4268c0c5500f09acf98357a80763c28f615c2 Gerrit-Change-Number: 17237 Gerrit-PatchSet: 5 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-10619: Minor refactoring of analytic function methods
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17237 ) Change subject: IMPALA-10619: Minor refactoring of analytic function methods .. Patch Set 4: Code-Review+2 Carry forward +2 -- To view, visit http://gerrit.cloudera.org:8080/17237 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I39e4268c0c5500f09acf98357a80763c28f615c2 Gerrit-Change-Number: 17237 Gerrit-PatchSet: 4 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 12 Apr 2021 21:06:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 ) Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator .. Patch Set 18: Code-Review+1 (1 comment) Looks good to me. Just have one more nit. Please carry my +1 after fix. http://gerrit.cloudera.org:8080/#/c/17252/18//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17252/18//COMMIT_MSG@12 PS18, Line 12: always_false_flipped_ nit: always_false_flipped_to_false_ -- To view, visit http://gerrit.cloudera.org:8080/17252 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964 Gerrit-Change-Number: 17252 Gerrit-PatchSet: 18 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 12 Apr 2021 21:00:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator
Qifan Chen has uploaded a new patch set (#18). ( http://gerrit.cloudera.org:8080/17252 ) Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator .. IMPALA-10647 Improve always-true min/max filter handling in coordinator The change improves how a coordinator behaves when a just arriving min/max filter is always true. A new member 'always_true_filter_received_' is introduced to record such a fact. Similarily, the new member always_false_flipped_ is added to indicate that the always false flag is flipped from 'true' to 'false'. These two members only influence how the min and max columns in "Filter routing table" and "Final filter table" in profile are displayed as follows. 1. 'PartialUpdates' - The min and the max are partially updated; 2. 'AlwaysTrue' - One received filter is AlwaysTrue; 3. 'AlwaysFalse'- No filter is received or all received filters are empty; 4. 'Real values'- The final accumulated min/max from all received filters. A second change introduced is to record, in scan node, the arrival time of min/max filters (as a timestamp since the system is rebooted, obtained by calling MonotonicMillis()). A timestamp of similar nature is recorded for hdfs parquet scanners when a row group is processed. By comparing these two timestamps, one can easily diagnose issues related to late arrival of min/max filters. Testing: 1. Ran unit tests; 2. Ran core tests. Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964 --- M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test 8 files changed, 223 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/17252/18 -- To view, visit http://gerrit.cloudera.org:8080/17252 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964 Gerrit-Change-Number: 17252 Gerrit-PatchSet: 18 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17289 ) Change subject: IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/17289/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17289/2//COMMIT_MSG@10 PS2, Line 10: due to the query plan change (from 3-node scan to : 2-node scan) which splits the row groups among scan nodes differently. > Here are some details if it is not too much :-) The bottom line is that onl One more piece of details on instance 2. The overlap ratio of 0 indicates that the entire row group is filtered out (which is very good). We do not drill down to figure out how many pages there are in the group, for the sake of reporting. -- To view, visit http://gerrit.cloudera.org:8080/17289 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f Gerrit-Change-Number: 17289 Gerrit-PatchSet: 2 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 12 Apr 2021 20:49:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17195 ) Change subject: IMPALA-10584: Defer advancing read page when reservation is tight. .. Patch Set 3: It looks like the solution of this problem is much more simple than I thought: just don't advance the read page. The initial reason why we are advancing read page in the first place is explained in IMPALA-8890 (review is in https://gerrit.cloudera.org/c/14144/): a DCHECK hit in ExpectedPinCount() because we tried to unpin a fully exhausted and attached read_page_. IMPALA-8890 choose to advance the read iterator to avoid this DCHECK. Patch set 3 in this review choose to NOT advance the read page, but instead skip the first page (that is the read page) when iterating pages_ to call UnpinPageIfNeeded. -- To view, visit http://gerrit.cloudera.org:8080/17195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Gerrit-Change-Number: 17195 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 12 Apr 2021 20:18:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17195 ) Change subject: IMPALA-10584: Defer advancing read page when reservation is tight. .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8555/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Gerrit-Change-Number: 17195 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 12 Apr 2021 20:12:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.
Hello Quanlong Huang, Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17195 to look at the new patch set (#3). Change subject: IMPALA-10584: Defer advancing read page when reservation is tight. .. IMPALA-10584: Defer advancing read page when reservation is tight. TestScratchLimit::test_with_unlimited_scratch_limit has been intermittently crashing in ubuntu-16.04-dockerised-tests environment after result spooling is enabled by default in IMPALA-9856. DCHECK violation occurs in ReservationTracker::CheckConsistency() due to BufferedTupleStream wrongly tries to reclaim memory reservation while unpinning the stream. For this bug to surface, all of the following needs to happen: - Stream is in pinned mode. - There are only 2 pages in the stream: 1 read and 1 write. - Stream can not increase reservation anymore either due to memory pressure or low buffer/memory limit. - The stream read page has been fully read and is attached to output RowBatch. But the output RowBatch has not cleaned up yet. - BufferedTupleStream::UnpinStream is invoked. The memory accounting bug happens because UnpinStream proceeds to NextReadPage where the read page buffer was mistakenly assumed as released. default_page_len_ bytes were added into write_page_reservation_ and subsequently violates the total memory reservation. This patch fixes the bug by deferring advancement of the read iterator in UnpinStream if the read page is attached to output RowBatch and there is no more unused reservation. This is OK because after UnpinStream finished, the stream is now in unpinned mode and has_read_write_page is false. The next AddRow operation is then allowed to unpin the previous write page first before reusing the reservation to allocate a new write page. Testing: - Loop the TestScratchLimit::test_with_unlimited_scratch_limit in my local dev machine and verify that each test passed without triggering the DCHECK violation. - Reenable result spooling in TestScratchLimit that was disabled in IMPALA-10559. - Pass core tests. Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a --- M be/src/runtime/buffered-tuple-stream.cc M tests/query_test/test_scratch_limit.py 2 files changed, 24 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17195/3 -- To view, visit http://gerrit.cloudera.org:8080/17195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Gerrit-Change-Number: 17195 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10350 Fix precision loss while converting DecimalValue to double.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17303 ) Change subject: IMPALA-10350 Fix precision loss while converting DecimalValue to double. .. Patch Set 4: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/8554/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56f0652cb8f81a491b87d9b108a94c00ae6c99a1 Gerrit-Change-Number: 17303 Gerrit-PatchSet: 4 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 12 Apr 2021 19:18:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10350 Fix precision loss while converting DecimalValue to double.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17303 ) Change subject: IMPALA-10350 Fix precision loss while converting DecimalValue to double. .. Patch Set 4: (20 comments) http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h File be/src/thirdparty/fast_double_parser/fast_double_parser.h: http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@13 PS4, Line 13: #if (defined(sun) || defined(__sun)) line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@17 PS4, Line 17: #if defined(__CYGWIN__) || defined(__MINGW32__) || defined(__MINGW64__) line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@22 PS4, Line 22: * Determining whether we should import xlocale.h or not is line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@25 PS4, Line 25: #if defined(FAST_DOUBLE_PARSER_SOLARIS) || defined(FAST_DOUBLE_PARSER_CYGWIN) line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@67 PS4, Line 67: #endif // defined(FAST_DOUBLE_PARSER_SOLARIS) || defined(FAST_DOUBLE_PARSER_CYGWIN) line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@84 PS4, Line 84: * However, we have that line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@86 PS4, Line 86: * Thus it is possible for a number of the form w * 10^-342 where line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@94 PS4, Line 94: * Any number of form w * 10^309 where w>= 1 is going to be line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@153 PS4, Line 153: // credit: https://stackoverflow.com/questions/28868367/getting-the-high-part-of-64-bit-integer-multiplication line too long (110 > 90) http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@154 PS4, Line 154: really_inline uint64_t Emulate64x64to128(uint64_t& r_hi, const uint64_t x, const uint64_t y) { line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@159 PS4, Line 159: line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@224 PS4, Line 224: */ line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@958 PS4, Line 958: line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@960 PS4, Line 960: // The exponent is 1024 + 63 + power line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@976 PS4, Line 976: // The 65536 is (1<<16) and corresponds to line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@979 PS4, Line 979: // ((152170 * power ) >> 16) is equal to line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@980 PS4, Line 980: // floor(log(5**power)/log(2)) line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@982 PS4, Line 982: // Note that this is not magic: 152170/(1<<16) is line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@984 PS4, Line 984: // The 1<<16 value is a power of two; we could use a line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@1097 PS4, Line 1097: #if defined(FAST_DOUBLE_PARSER_SOLARIS) || defined(FAST_DOUBLE_PARSER_CYGWIN) line has trailing whitespace -- To view, visit http://gerrit.cloudera.org:8080/17303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56f0652cb8f81a491b87d9b108a94c00ae6c99a1 Gerrit-Change-Number: 17303 Gerrit-PatchSet: 4 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 12 Apr 2021 18:59:23 +
[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 ) Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator .. Patch Set 17: (5 comments) http://gerrit.cloudera.org:8080/#/c/17252/15//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17252/15//COMMIT_MSG@29 PS15, Line 29: row group is processed. By comparing these two timestamps, one : can easily diagnose issues related to late arrival of min/max : filters. > Consider adding these three tests for Final table as its content is sending The test added in Patch set 17 looks good to me, thanks! http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/runtime/coordinator-filter-state.h File be/src/runtime/coordinator-filter-state.h: http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/runtime/coordinator-filter-state.h@178 PS17, Line 178: True value means the always false flag in aggregated filter is flipped. To further clarify the doc, please mention that a True value means the filter was flipped from True to False by coordinator. http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/runtime/coordinator.cc@659 PS15, Line 659: > If AlwaysTrueFilterReceived() is true, then the accumulated filter is logic New logic looks good to me. I'll continue my comments on patch set 17. http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/runtime/coordinator.cc@668 PS17, Line 668: state.AlwaysFalseFlipped() The method name sounds ambiguous. Was it flipped from true to false, or false to true. Seems like the former. The method return true IF filter was an AlwaysFalse before being disabled (flipped to an AlwaysTrue) by coordinator. Maybe "WasAlwaysFalse" is more descriptive? http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/util/min-max-filter.h File be/src/util/min-max-filter.h: http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/util/min-max-filter.h@358 PS17, Line 358: : std::string DebugString(const MinMaxFilterPB& filter); : bool AlwaysTrue(const MinMaxFilterPB& filter); : bool AlwaysFalse(const MinMaxFilterPB& filter); : std::string DebugString(const ColumnValuePB& value); Any reason not making these methods as static methods under MinMaxFilter class? -- To view, visit http://gerrit.cloudera.org:8080/17252 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964 Gerrit-Change-Number: 17252 Gerrit-PatchSet: 17 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 12 Apr 2021 18:59:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10350 Fix precision loss while converting DecimalValue to double.
Amogh Margoor has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17303 ) Change subject: IMPALA-10350 Fix precision loss while converting DecimalValue to double. .. IMPALA-10350 Fix precision loss while converting DecimalValue to double. Original approach to convert DecimalValue(internal representation of decimals) was not accurate. It was: static_cast(value_) / pow(10.0, scale). However only integers from −2^53 to 2^53 can be represented accurately by double precision without any loss. Hence, it would not work for numbers like -0.43149576573887316. For DecimalValue representing -0.43149576573887316, value_ would be -43149576573887316 and scale would be 17. As value_ < -2^53, result would not be accurate. In newer approach we are using third party library https://github.com/lemire/fast_double_parser, which handles above scenario in a performant manner. Change-Id: I56f0652cb8f81a491b87d9b108a94c00ae6c99a1 --- M be/src/runtime/decimal-value.inline.h A be/src/thirdparty/fast_double_parser/LICENSE A be/src/thirdparty/fast_double_parser/LICENSE.BSL A be/src/thirdparty/fast_double_parser/README.md A be/src/thirdparty/fast_double_parser/fast_double_parser.h M bin/rat_exclude_files.txt M testdata/workloads/functional-query/queries/QueryTest/values.test 7 files changed, 1,551 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/17303/4 -- To view, visit http://gerrit.cloudera.org:8080/17303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I56f0652cb8f81a491b87d9b108a94c00ae6c99a1 Gerrit-Change-Number: 17303 Gerrit-PatchSet: 4 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] [WIP] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: [WIP] IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/17295/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17295/5//COMMIT_MSG@10 PS5, Line 10: periodically measuring > What's the average sampling rate for typical load? It is per every batch of rows (1024) for now. The performance test is pending. -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 5 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 12 Apr 2021 17:53:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10619: Minor refactoring of analytic function methods
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17237 ) Change subject: IMPALA-10619: Minor refactoring of analytic function methods .. Patch Set 4: Verified+1 > Patch Set 4: Verified-1 > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7063/ 2 HBASE test failures are flaky tests as mentioned in IMPALA-1995. -- To view, visit http://gerrit.cloudera.org:8080/17237 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I39e4268c0c5500f09acf98357a80763c28f615c2 Gerrit-Change-Number: 17237 Gerrit-PatchSet: 4 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 12 Apr 2021 17:51:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10619: Minor refactoring of analytic function methods
Aman Sinha has removed a vote on this change. Change subject: IMPALA-10619: Minor refactoring of analytic function methods .. Removed Verified-1 by Impala Public Jenkins -- To view, visit http://gerrit.cloudera.org:8080/17237 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I39e4268c0c5500f09acf98357a80763c28f615c2 Gerrit-Change-Number: 17237 Gerrit-PatchSet: 4 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 ) Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator .. Patch Set 17: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8553/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17252 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964 Gerrit-Change-Number: 17252 Gerrit-PatchSet: 17 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 12 Apr 2021 17:48:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17289 ) Change subject: IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/17289/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17289/2//COMMIT_MSG@10 PS2, Line 10: due to the query plan change (from 3-node scan to : 2-node scan) which splits the row groups among scan nodes differently. > It's not clear to me why no pages are filtered by the min-max filter. Could Here are some details if it is not too much :-) The bottom line is that only 2 nodes are involved and the split is such that in instance 1, the overlap ratio (0.749748) is higher than the threshold (0.5) which implies no min/max filtering at the page and row level. In instance 2, the overlap ration is 0 which implies no min/max filtering. Combined, no page level filtering is done and thus the total pages filtered is 0. The details is purposely left out in the commit message. Executor instance 1: 1732 E RowGroup Debug: Try to filter out a rowgroup via overlap predicate: MonotonicMillis=13987739, fid=1, SchemaNode=optional int64 l_orderkey [i:0 d:1 r:0], columnType=BIGINT, overlap ratio=0.749748, threshold=0.5, worthiness=0, enabled for page=0, enabled for row=0, data min=1, data max=3209607, content=BigIntMinMaxFilter(min=224167, max=2630562, always_false=false), always_true=false) 1778 E - NumRuntimeFilteredPages: 0 (0) 1779 E - NumRuntimeFilteredRowGroups: 0 (0) Executor instance 2 1919 E RowGroup Debug: Try to filter out a rowgroup via overlap predicate: MonotonicMillis=13987773, fid=1, SchemaNode=optional int64 l_orderkey [i:0 d:1 r:0], columnType=BIGINT, overlap ratio=0, threshold=0.5, worthiness=1, enabled for page=1, enabled for row=0, data min=3209632, data max=600, content=BigIntMinMaxFilter(min=224167, max=2630562, always_false=false), always_true=false) 1965 E - NumRuntimeFilteredPages: 0 (0) 1966 E - NumRuntimeFilteredRowGroups: 1 (1) -- To view, visit http://gerrit.cloudera.org:8080/17289 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f Gerrit-Change-Number: 17289 Gerrit-PatchSet: 2 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 12 Apr 2021 17:45:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 ) Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator .. Patch Set 17: Three new tests are added in Test/overlap_min_max_filters.test. -- To view, visit http://gerrit.cloudera.org:8080/17252 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964 Gerrit-Change-Number: 17252 Gerrit-PatchSet: 17 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 12 Apr 2021 17:29:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator
Qifan Chen has uploaded a new patch set (#17). ( http://gerrit.cloudera.org:8080/17252 ) Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator .. IMPALA-10647 Improve always-true min/max filter handling in coordinator The change improves how a coordinator behaves when a just arriving min/max filter is always true. A new member 'always_true_filter_received_' is introduced to record such a fact. Similarily, the new member always_false_flipped_ is added to indicate that the always false flag is flipped from true to false. These two members only influence how the min and max columns in "Filter routing table" and "Final filter table" in profile are displayed as follows. 1. 'PartialUpdates' - The min and the max are partially updated; 2. 'AlwaysTrue' - One received filter is AlwaysTrue; 3. 'AlwaysFalse'- No filter is received or all received filters are empty; 4. 'Real values'- The final accumulated min/max from all received filters. A second change introduced is to record, in scan node, the arrival time of min/max filters (as a timestamp since the system is rebooted obtained by calling MonotonicMillis()). A timestamp of similar nature is recorded for hdfs parquet scanners when a row group is processed. By comparing these two timestamps, one can easily diagnose issues related to late arrival of min/max filters. Testing: 1. Ran unit tests; 2. Ran core tests. Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964 --- M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test 8 files changed, 214 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/17252/17 -- To view, visit http://gerrit.cloudera.org:8080/17252 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964 Gerrit-Change-Number: 17252 Gerrit-PatchSet: 17 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-9732: Improve exceptions of unsupported HdfsTableSink formats
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17300 ) Change subject: IMPALA-9732: Improve exceptions of unsupported HdfsTableSink formats .. Patch Set 1: Code-Review+2 (3 comments) Nice improvement! http://gerrit.cloudera.org:8080/#/c/17300/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17300/1//COMMIT_MSG@13 PS1, Line 13: eaerlier earlier http://gerrit.cloudera.org:8080/#/c/17300/1//COMMIT_MSG@15 PS1, Line 15: Iceberg Actually Iceberg tables are related because they might use Parquet or ORC underneath. FeIcebergTable inherits from FeFsTable and it overrides 'getFileFormats()' so your patch works well on iceberg tables. http://gerrit.cloudera.org:8080/#/c/17300/1//COMMIT_MSG@16 PS1, Line 16: realted related -- To view, visit http://gerrit.cloudera.org:8080/17300 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7fa2f949336a422acb4d01c9347b9b2e808e4aec Gerrit-Change-Number: 17300 Gerrit-PatchSet: 1 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 12 Apr 2021 16:44:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's precision/scale by query option
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's precision/scale by query option .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8552/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 8 Gerrit-Owner: fifteencai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 12 Apr 2021 16:37:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's precision/scale by query option
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's precision/scale by query option .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8551/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 7 Gerrit-Owner: fifteencai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 12 Apr 2021 16:33:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's precision/scale by query option
fifteencai has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's precision/scale by query option .. IMPALA-10445: Adjust NDV's precision/scale by query option This is a new way to control NDV's scale. Since IMPALA-2658, we can trade additional memory for more accurate estimation by setting larger `scale`. The scale is decided by SQL writers. However, it is a bumpy road to smoothly upgrade ndv by setting scales larger than the default value of 2. Here lies 2 reasons: - Firstly, SQL writers are reluctant to lower expectations, they prone to write ndv(id, 10) other than ndv(id, 9), ndv(id, 8) and so on. But larger scales like 10 will use much more memories, especially when there are `group by`s with high cardinality. So it is wiser to let cluster admin to choose appropriate scale instead. - Secondly, The queries are stored in a BI tool's configuration DB. Rewriting thousands of SQLs is a risky job. In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`. During to its runtime essence, We can either set it before query submission according to cluster's overall load, or set it by placing a default query option for dynamic resource pool. We also modified the `Analyzer` to make sure APPX_COUNT_DISTINCT can work with this query option. So if needed, we can degrade service by transforming `count (distinct id)` to `ndv(id, scale)`. Implementation details: - The default value of DEFAULT_NDV_SCALE is 2, so we won't change the default ndv behavior. - We port `CountDistinctToNdv` transform logic from `SelectStmt.analyze()` to `ExprRewriter`, making it compatible with further rewrite rules. - The newly added rewrite rule `DefaultNdvScaleRule` is applied after `CountDistinctToNdvRule`. Usage: To set a default ndv scale: ``` SET DEFAULT_NDV_SCALE = 10; -- The range is [1, 10] ``` To unset: ``` SET DEFAULT_NDV_SCALE = 2; ``` Here are test results of a typical workload (cardinality=40,090,650): ++ | Metric| Count Distinct |NDV2|NDV5|NDV10 | ++ | Memory(GB) | 3.83 |1.84|1.85| 1.89 | | Duration(s) | 182.89| 30.22|29.72 | 29.24 | | ErrorRate |0% |1.8%|1.17% | 0.06% | ++ Testing: 1) Added 3 unit test cases in `ExprRewriteRulesTest`. 2) Added 5 unit test cases in `ExprRewriterTest`. 3) Ran all front-end unit test, passed. 4) Added a new query-option test. Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java 11 files changed, 229 insertions(+), 34 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/8 -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 8 Gerrit-Owner: fifteencai Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's precision/scale by query option
fifteencai has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's precision/scale by query option .. IMPALA-10445: Adjust NDV's precision/scale by query option We introduce a new way to control NDV's scale. Since IMPALA-2658, we can trade additional memory for more accurate estimation by setting larger `scale`. The scale is decided by SQL writers. However, it is a bumpy road to smoothly upgrade ndv by setting scales larger than the default value of 2. Here lies 2 reasons: -Firstly, SQL writers are reluctant to lower expectations, they prone to write ndv(id, 10) other than ndv(id, 9), ndv(id, 8) and so on. But larger scales like 10 will use much more memories, especially when there are `group by`s with high cardinality. So it is wiser to let cluster admin to choose appropriate scale instead. Secondly, The queries are stored in a BI tool's configuration DB. Rewriting thousands of SQLs is a risky job. In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`. During to its runtime essence, We can either set it before query submission according to cluster's overall load, or set it by placing a default query option for dynamic resource pool. We also modified the `Analyzer` to make sure APPX_COUNT_DISTINCT can work with this query option. So if needed, we can degrade service by transforming `count (distinct id)` to `ndv(id, scale)`. Implementation details: - The default value of DEFAULT_NDV_SCALE is 2, so we won't change the default ndv behavior. - We port `CountDistinctToNdv` transform logic from `SelectStmt.analyze()` to `ExprRewriter`, making it compatible with further rewrite rules. - The newly added rewrite rule `DefaultNdvScaleRule` is applied after `CountDistinctToNdvRule`. Usage: To set a default ndv scale: ``` SET DEFAULT_NDV_SCALE = 10; -- The range is [1, 10] ``` To unset: ``` SET DEFAULT_NDV_SCALE = 2; ``` Here are test results of a typical workload (cardinality=40,090,650): ++ | Metric| Count Distinct |NDV2|NDV5|NDV10 | ++ | Memory(GB) | 3.83 |1.84|1.85| 1.89 | | Duration(s) | 182.89| 30.22|29.72 | 29.24 | | ErrorRate |0% |1.8%|1.17% | 0.06% | ++ Testing: 1) Added 3 unit test cases in `ExprRewriteRulesTest`. 2) Added 5 unit test cases in `ExprRewriterTest`. 3) Ran all front-end unit test, passed. 4) Added a new query-option test. Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java 11 files changed, 229 insertions(+), 34 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/7 -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 7 Gerrit-Owner: fifteencai Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's precision/scale by query option
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's precision/scale by query option .. Patch Set 3: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/8550/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 3 Gerrit-Owner: fifteencai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 12 Apr 2021 15:58:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17026 ) Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most common types .. Patch Set 25: Code-Review+1 (5 comments) Thanks Daniel, the change LGTM. Tamas said he's planning to take another look so I'm only giving +1 for now. http://gerrit.cloudera.org:8080/#/c/17026/24//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17026/24//COMMIT_MSG@7 PS24, Line 7: IMPALA-10640: Support reading Parquet Bloom filters - most common types > I think it can only apply to top level columns as I check whether the type I see. Is there a plan to extend it to non top level fields? Btw, does Parquet-MR generate bloom filters for them as well? http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@887 PS24, Line 887: > I've added the query option and put this code and the call to CreateColIdx2 Yeah, adding a test is never a bad idea and it shouldn't be hard to add such a test in a .test file. http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@1483 PS24, Line 1483: > I've considered that but ParquetPageIndex::ReadAll() Oh, I see. Thanks for taking a look. http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@1716 PS24, Line 1716: bloom_filter.InitFromDirectoryNoCopy(data_buffer.buffer(), data_buffer.Size())); > I included the row group idx, column idx and filename in the log message. I I think it's OK as it is. Based on the column idx and query text I think we should be able to identify the conjunct. http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/parquet-bloom-filter-util.h File be/src/exec/parquet/parquet-bloom-filter-util.h: http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/parquet-bloom-filter-util.h@37 PS24, Line 37: . > I do not want to guarantee in which cases '*storage' will be used as the im I see, thanks for extending the comment. -- To view, visit http://gerrit.cloudera.org:8080/17026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287 Gerrit-Change-Number: 17026 Gerrit-PatchSet: 25 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 12 Apr 2021 15:55:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10350 Fix precision loss while converting DecimalValue to double.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17303 ) Change subject: IMPALA-10350 Fix precision loss while converting DecimalValue to double. .. Patch Set 3: (7 comments) The change looks great overall! http://gerrit.cloudera.org:8080/#/c/17303/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17303/3//COMMIT_MSG@7 PS3, Line 7: IMPALA-10350 Please add : after IMPALA-10350 http://gerrit.cloudera.org:8080/#/c/17303/3//COMMIT_MSG@9 PS3, Line 9: imals) was not accurate. In commit message bodies we use lines with 72 chars width. http://gerrit.cloudera.org:8080/#/c/17303/3//COMMIT_MSG@20 PS3, Line 20: Please add section about testing. http://gerrit.cloudera.org:8080/#/c/17303/3/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: http://gerrit.cloudera.org:8080/#/c/17303/3/be/src/runtime/decimal-value.inline.h@727 PS3, Line 727: // Original approach was to use: Thanks for the detailed comments! http://gerrit.cloudera.org:8080/#/c/17303/3/be/src/runtime/decimal-value.inline.h@745 PS3, Line 745: int64_t> (value_ Type of value_ can be __int128_t. If that's the case and the value is greater than the max value of int64_t, then I think we also want to fallback to the original algorithm. http://gerrit.cloudera.org:8080/#/c/17303/3/be/src/runtime/decimal-value.inline.h@747 PS3, Line 747: // Fallback to original approach. This is not always accurate as described above. : // Other alternative would be to convert value_ to string and parse it into : // double using std:strtod. However std::strtod is atleast 4X slower : // than this approach (https://github.com/lemire/fast_double_parser#sample-results) : // and that is excluding cost to convert value_ to string. : if (!success) { : result = static_cast(value_) / pow(10.0, scale); : } I'm happy with the current patch as it is a big step forward, but IMPALA-10350 is about correctness and I think we still want to track that problem in Jira. How about creating a sub-task for IMPALA-10350 and commit this patch with that Jira ID? We'll close the sub-task once this patch is committed, but leave IMPALA-10350 open so we'll be aware of the limitations. Hopefully we'll find a proper algorithm in the future. (If this patch was using strtod then we'd still need another Jira to track perf improvements possibilities in the future). http://gerrit.cloudera.org:8080/#/c/17303/3/testdata/workloads/functional-query/queries/QueryTest/values.test File testdata/workloads/functional-query/queries/QueryTest/values.test: http://gerrit.cloudera.org:8080/#/c/17303/3/testdata/workloads/functional-query/queries/QueryTest/values.test@106 PS3, Line 106: # insert one negative value < -2^53 * 10^(-17) Please add test where the value needs to be stored in __int128_t. -- To view, visit http://gerrit.cloudera.org:8080/17303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56f0652cb8f81a491b87d9b108a94c00ae6c99a1 Gerrit-Change-Number: 17303 Gerrit-PatchSet: 3 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 12 Apr 2021 15:50:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's precision/scale by query option
fifteencai has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's precision/scale by query option .. IMPALA-10445: Adjust NDV's precision/scale by query option In this commit, we introduced a new Query Option to control NDV's default HyperLogLog scale. Since IMPALA-2658, we can trade additional memory for more accurate estimation by setting larger `scale`. The scale is decided by SQL writers. However, it is a bumpy road to smoothly upgrade ndv by setting scales larger than the default value of 2. Here lies 2 reasons, firstly, SQL writers are reluctant to lower expectations, they prone to write ndv(id, 10) other than ndv(id, 9), ndv(id, 8) and so on. But larger scales like 10 will use much more memories, especially when there are `group by`s with high cardinality. So it is wiser to let cluster admin to choose appropriate scale instead. Secondly, The queries are stored in a BI tool's configuration DB. Rewriting thousands of SQLs is a risky job. In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`. During to its runtime essence, We can either set it before query submission according to cluster's overall load, or set it by placing a default query option for dynamic resource pool. We also modified the `Analyzer` to make sure APPX_COUNT_DISTINCT works with this query option. So if needed, we can degrade the service by transforming `count (distinct id)` to `ndv(id, scale)`. Implementation details: - The default value of DEFAULT_NDV_SCALE is 2, so we won't change the default ndv behavior. - We port `CountDistinctToNdv` transform logic from `SelectStmt.analyze()` to `ExprRewriter`, making it compatible with further rewrite rules. - The newly added rewrite rule `DefaultNdvScaleRule` is applied after `CountDistinctToNdvRule`. Usage: To set a default ndv scale: ``` SET DEFAULT_NDV_SCALE = 10; -- The range is [1, 10] ``` To unset: ``` SET DEFAULT_NDV_SCALE = 2; ``` Here are test results of a typical workload (cardinality = 40,090,650): +=+ | Metric| Count Distinct |NDV2|NDV5|NDV10 | +-+ | Memory(GB) | 3.83 |1.84|1.85| 1.89 | | Duration(s) | 182.89| 30.22|29.72 | 29.24 | | ErrorRate |0% |1.8%|1.17% | 0.06% | +=+ Testing: 1) Added 3 unit test cases in `ExprRewriteRulesTest`. 2) Added 5 unit test cases in `ExprRewriterTest`. 3) Ran all front-end unit test, passed. 4) Added a new query-option test. Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java 11 files changed, 196 insertions(+), 34 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/3 -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 3 Gerrit-Owner: fifteencai Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17262 ) Change subject: WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8549/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792 Gerrit-Change-Number: 17262 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 12 Apr 2021 15:32:01 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17262 ) Change subject: WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/17262/3/be/src/exec/parquet/hdfs-parquet-table-writer.cc File be/src/exec/parquet/hdfs-parquet-table-writer.cc: http://gerrit.cloudera.org:8080/#/c/17262/3/be/src/exec/parquet/hdfs-parquet-table-writer.cc@495 PS3, Line 495: // Write dictionary keys to Parquet Bloom filter if we haven't been filling it so line too long (91 > 90) -- To view, visit http://gerrit.cloudera.org:8080/17262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792 Gerrit-Change-Number: 17262 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 12 Apr 2021 15:12:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types
Daniel Becker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17262 Change subject: WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types .. WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types This change adds support for writing Parquet Bloom filters for the types for which read support was added in IMPALA-10640. Writing of Parquet Bloom filters can be controlled by the 'parquet_bloom_filter_write' query option which has the following possible values: NEVER - never write Parquet Bloom filters TBL_PROPS - write Parquet Bloom filters as set in table properties IF_NO_DICT - write Parquet Bloom filters if the row group is not fully dictionary encoded ALWAYS - always write Parquet Bloom filters, even if the row group is fully dictionary encoded TODO: Implement table properties involving Parquet Bloom filters. TODO: Decide size of Parquet Bloom filter based on NDV heuristics or configuration. Testing: TODO Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792 --- M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/exec/parquet/hdfs-parquet-table-writer.h M be/src/exec/parquet/parquet-bloom-filter-util.cc M be/src/exec/parquet/parquet-bloom-filter-util.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/debug-util.cc M be/src/util/debug-util.h M be/src/util/dict-encoding.h M be/src/util/parquet-bloom-filter.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift 12 files changed, 313 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/17262/3 -- To view, visit http://gerrit.cloudera.org:8080/17262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792 Gerrit-Change-Number: 17262 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's precision/scale by query option
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's precision/scale by query option .. Patch Set 2: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/8548/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 2 Gerrit-Owner: fifteencai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 12 Apr 2021 14:57:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's precision/scale by query option
fifteencai has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's precision/scale by query option .. IMPALA-10445: Adjust NDV's precision/scale by query option In this commit, we introduced a new Query Option to control NDV's default HyperLogLog scale. Since IMPALA-2658, we can trade additional memory for more accurate estimation by setting larger `scale`. The scale is decided by SQL writers. However, it is a bumpy road to smoothly upgrade ndv by setting scales larger than the default value of 2. Here lies 2 reasons, firstly, SQL writers are reluctant to lower expectations, they prone to write ndv(id, 10) other than ndv(id, 9), ndv(id, 8) and so on. But larger scales like 10 will use much more memories, especially when there are `group by`s with high cardinality. So it is wiser to let cluster admin to choose appropriate scale instead. Secondly, The queries are stored in a BI tool's configuration DB. Rewriting thousands of SQLs is a risky job. In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`. During to its runtime essence, We can either set it before query submission according to cluster's overall load, or set it by placing a default query option for dynamic resource pool. We also modified the `Analyzer` to make sure APPX_COUNT_DISTINCT works with this query option. So if needed, we can degrade the service by transforming `count (distinct id)` to `ndv(id, scale)`. Implementation details: - The default value of DEFAULT_NDV_SCALE is 2, so we won't change the default ndv behavior. - We port `CountDistinctToNdv` transform logic from `SelectStmt.analyze()` to `ExprRewriter`, making it compatible with further rewrite rules. - The newly added rewrite rule `DefaultNdvScaleRule` is applied after `CountDistinctToNdvRule`. Usage: To set a default ndv scale: ``` SET DEFAULT_NDV_SCALE = 10; -- The range is [1, 10] ``` To unset: ``` SET DEFAULT_NDV_SCALE = 2; ``` Here are test results of a typical workload (cardinality = 40,090,650): +=+ | Metric| Count Distinct |NDV2|NDV5|NDV10 | +-+ | Memory(GB) | 3.83 |1.84|1.85| 1.89 | | Duration(s) | 182.89| 30.22|29.72 | 29.24 | | ErrorRate |0% |1.8%|1.17% | 0.06% | +=+ Testing: 1) Added 3 unit test cases in `ExprRewriteRulesTest`. 2) Added 5 unit test cases in `ExprRewriterTest`. 3) Ran all front-end unit test, passed. 4) Added a new query-option test. Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java 11 files changed, 196 insertions(+), 34 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/2 -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 2 Gerrit-Owner: fifteencai Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR](asf-site) Switch to .asf.yaml for site publishing
Jim Apple has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17301 ) Change subject: Switch to .asf.yaml for site publishing .. Switch to .asf.yaml for site publishing Our existing method of web publication will be going offline. .asf.yaml is the replacement. See also https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-Publishingabranchtoyourprojectwebsite (not visible to all users) https://lists.apache.org/thread.html/r8d023c0f5afefca7f6ce4e26d02404762bd6234fbe328011e1564249%40%3Cusers.infra.apache.org%3E https://infra-reports.apache.org/site-source/ Change-Id: I9154ab397674d4c98e20f7c4f3774dfc19ff2932 Reviewed-on: http://gerrit.cloudera.org:8080/17301 Reviewed-by: Jim Apple Tested-by: Jim Apple --- A .asf.yaml 1 file changed, 2 insertions(+), 0 deletions(-) Approvals: Jim Apple: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/17301 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-MessageType: merged Gerrit-Change-Id: I9154ab397674d4c98e20f7c4f3774dfc19ff2932 Gerrit-Change-Number: 17301 Gerrit-PatchSet: 3 Gerrit-Owner: Jim Apple Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Shajini Thayasingh
[Impala-ASF-CR](asf-site) Switch to .asf.yaml for site publishing
Jim Apple has posted comments on this change. ( http://gerrit.cloudera.org:8080/17301 ) Change subject: Switch to .asf.yaml for site publishing .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17301 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-MessageType: comment Gerrit-Change-Id: I9154ab397674d4c98e20f7c4f3774dfc19ff2932 Gerrit-Change-Number: 17301 Gerrit-PatchSet: 2 Gerrit-Owner: Jim Apple Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Shajini Thayasingh Gerrit-Comment-Date: Mon, 12 Apr 2021 14:22:31 + Gerrit-HasComments: No
[Impala-ASF-CR](asf-site) Switch to .asf.yaml for site publishing
Jim Apple has posted comments on this change. ( http://gerrit.cloudera.org:8080/17301 ) Change subject: Switch to .asf.yaml for site publishing .. Patch Set 2: Code-Review+2 Carry +2 (only change is in formatting of commit description -- To view, visit http://gerrit.cloudera.org:8080/17301 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-MessageType: comment Gerrit-Change-Id: I9154ab397674d4c98e20f7c4f3774dfc19ff2932 Gerrit-Change-Number: 17301 Gerrit-PatchSet: 2 Gerrit-Owner: Jim Apple Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Shajini Thayasingh Gerrit-Comment-Date: Mon, 12 Apr 2021 14:20:42 + Gerrit-HasComments: No
[Impala-ASF-CR](asf-site) Switch to .asf.yaml for site publishing
Jim Apple has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17301 Change subject: Switch to .asf.yaml for site publishing .. Switch to .asf.yaml for site publishing Our existing method of web publication will be going offline. .asf.yaml is the replacement. See also https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-Publishingabranchtoyourprojectwebsite (not visible to all users) https://lists.apache.org/thread.html/r8d023c0f5afefca7f6ce4e26d02404762bd6234fbe328011e1564249%40%3Cusers.infra.apache.org%3E https://infra-reports.apache.org/site-source/ Change-Id: I9154ab397674d4c98e20f7c4f3774dfc19ff2932 --- A .asf.yaml 1 file changed, 2 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/17301/1 -- To view, visit http://gerrit.cloudera.org:8080/17301 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-MessageType: newchange Gerrit-Change-Id: I9154ab397674d4c98e20f7c4f3774dfc19ff2932 Gerrit-Change-Number: 17301 Gerrit-PatchSet: 1 Gerrit-Owner: Jim Apple Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Shajini Thayasingh
[Impala-ASF-CR](asf-site) Switch to .asf.yaml for site publishing
Hello Quanlong Huang, Shajini Thayasingh, Sahil Takiar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17301 to look at the new patch set (#2). Change subject: Switch to .asf.yaml for site publishing .. Switch to .asf.yaml for site publishing Our existing method of web publication will be going offline. .asf.yaml is the replacement. See also https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-Publishingabranchtoyourprojectwebsite (not visible to all users) https://lists.apache.org/thread.html/r8d023c0f5afefca7f6ce4e26d02404762bd6234fbe328011e1564249%40%3Cusers.infra.apache.org%3E https://infra-reports.apache.org/site-source/ Change-Id: I9154ab397674d4c98e20f7c4f3774dfc19ff2932 --- A .asf.yaml 1 file changed, 2 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/17301/2 -- To view, visit http://gerrit.cloudera.org:8080/17301 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9154ab397674d4c98e20f7c4f3774dfc19ff2932 Gerrit-Change-Number: 17301 Gerrit-PatchSet: 2 Gerrit-Owner: Jim Apple Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Shajini Thayasingh
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's precision/scale by query option
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's precision/scale by query option .. Patch Set 1: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/8547/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 1 Gerrit-Owner: fifteencai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 12 Apr 2021 13:47:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's precision/scale by query option
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's precision/scale by query option .. Patch Set 1: (8 comments) http://gerrit.cloudera.org:8080/#/c/17306/1/fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java File fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java: http://gerrit.cloudera.org:8080/#/c/17306/1/fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java@13 PS1, Line 13: public static org.apache.impala.rewrite.CountDistinctToNdvRule INSTANCE = new org.apache.impala.rewrite.CountDistinctToNdvRule(); line too long (133 > 90) http://gerrit.cloudera.org:8080/#/c/17306/1/fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java@25 PS1, Line 25: if (!analyzer.getQueryCtx().client_request.query_options.appx_count_distinct) return expr; line too long (98 > 90) http://gerrit.cloudera.org:8080/#/c/17306/1/fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java@31 PS1, Line 31: || !oldFunctionCallExpr.isDistinct() || oldFunctionCallExpr.getParams().exprs().size() != 1 ) { line too long (111 > 90) http://gerrit.cloudera.org:8080/#/c/17306/1/fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java@36 PS1, Line 36: FunctionCallExpr ndvFunc = new FunctionCallExpr("ndv", oldFunctionCallExpr.getParams().exprs()); line too long (104 > 90) http://gerrit.cloudera.org:8080/#/c/17306/1/fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java File fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java: http://gerrit.cloudera.org:8080/#/c/17306/1/fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java@28 PS1, Line 28: * changes were made, the transformed Expr is guaranteed to be a different Expr object, line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/17306/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java: http://gerrit.cloudera.org:8080/#/c/17306/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@521 PS1, Line 521: assertToSql(createAnalysisCtx(options), countDistinctSql, countDistinctSql, "SELECT ndv(id) FROM functional.alltypes"); line too long (123 > 90) http://gerrit.cloudera.org:8080/#/c/17306/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@525 PS1, Line 525: assertToSql(createAnalysisCtx(options), countDistinctSql, countDistinctSql, "SELECT ndv(id, 10) FROM functional.alltypes"); line too long (127 > 90) http://gerrit.cloudera.org:8080/#/c/17306/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@535 PS1, Line 535: assertToSql(createAnalysisCtx(options), ndvSql, ndvSql, "SELECT ndv(id, 9) FROM functional.alltypes"); line too long (106 > 90) -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 1 Gerrit-Owner: fifteencai Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 12 Apr 2021 13:29:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's precision/scale by query option
fifteencai has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17306 Change subject: IMPALA-10445: Adjust NDV's precision/scale by query option .. IMPALA-10445: Adjust NDV's precision/scale by query option In this commit, we introduced a new Query Option to control NDV's default HyperLogLog scale. Since IMPALA-2658, we can trade additional memory for more accurate estimation by setting larger `scale`. The scale is decided by SQL writers. However, it is a bumpy road to smoothly upgrade ndv by setting scales larger than the default value of 2. Here lies 2 reasons, firstly, SQL writers are reluctant to lower expectations, they prone to write ndv(id, 10) other than ndv(id, 9), ndv(id, 8) and so on. But larger scales like 10 will use much more memories, especially when there are `group by`s with high cardinality. So it is wiser to let cluster admin to choose appropriate scale instead. Secondly, The queries are stored in a BI tool's configuration DB. Rewriting thousands of SQLs is a risky job. In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`. During to its runtime essence, We can either set it before query submission according to cluster's overall load, or set it by placing a default query option for dynamic resource pool. We also modified the `Analyzer` to make sure APPX_COUNT_DISTINCT works with this query option. So if needed, we can degrade the service by transforming `count (distinct id)` to `ndv(id, scale)`. Implementation details: - The default value of DEFAULT_NDV_SCALE is 2, so we won't change the default ndv behavior. - We port `CountDistinctToNdv` transform logic from `SelectStmt.analyze()` to `ExprRewriter`, making it compatible with further rewrite rules. - The newly added rewrite rule `DefaultNdvScaleRule` is applied after `CountDistinctToNdvRule`. Usage: To set a default ndv scale: ``` SET DEFAULT_NDV_SCALE = 10; -- The range is [1, 10] ``` To unset: ``` SET DEFAULT_NDV_SCALE = 2; ``` Here are test results of a typical workload (cardinality = 40,090,650): +=+ | Metric| Count Distinct |NDV2|NDV5|NDV10 | +-+ | Memory(GB) | 3.83 |1.84|1.85| 1.89 | | Duration(s) | 182.89| 30.22|29.72 | 29.24 | | ErrorRate |0% |1.8%|1.17% | 0.06% | +=+ Testing: 1) Added 3 unit test cases in `ExprRewriteRulesTest`. 2) Added 5 unit test cases in `ExprRewriterTest`. 3) Ran all front-end unit test, passed. 4) Added a new query-option test. Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java 11 files changed, 190 insertions(+), 34 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/1 -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 1 Gerrit-Owner: fifteencai
[Impala-ASF-CR] IMPALA-10619: Minor refactoring of analytic function methods
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17237 ) Change subject: IMPALA-10619: Minor refactoring of analytic function methods .. Patch Set 4: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7063/ -- To view, visit http://gerrit.cloudera.org:8080/17237 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I39e4268c0c5500f09acf98357a80763c28f615c2 Gerrit-Change-Number: 17237 Gerrit-PatchSet: 4 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 12 Apr 2021 13:12:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10631: Upgrade DataSketches to version 3.0.0
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17294 ) Change subject: IMPALA-10631: Upgrade DataSketches to version 3.0.0 .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8546/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9 Gerrit-Change-Number: 17294 Gerrit-PatchSet: 3 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 12 Apr 2021 10:37:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10631: Upgrade DataSketches to version 3.0.0
Fucun Chu has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17294 ) Change subject: IMPALA-10631: Upgrade DataSketches to version 3.0.0 .. IMPALA-10631: Upgrade DataSketches to version 3.0.0 Upgrade the external DataSketches files CPC/HLL/KLL/Theta to version 3.0.0 tests: -Ran the tests from tests/query_test/test_datasketches.py Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9 --- M be/src/exprs/datasketches-test.cc M be/src/thirdparty/datasketches/AuxHashMap-internal.hpp M be/src/thirdparty/datasketches/AuxHashMap.hpp M be/src/thirdparty/datasketches/CompositeInterpolationXTable.hpp M be/src/thirdparty/datasketches/CouponHashSet-internal.hpp M be/src/thirdparty/datasketches/CouponHashSet.hpp M be/src/thirdparty/datasketches/CouponList-internal.hpp M be/src/thirdparty/datasketches/CouponList.hpp M be/src/thirdparty/datasketches/CubicInterpolation.hpp M be/src/thirdparty/datasketches/HarmonicNumbers.hpp M be/src/thirdparty/datasketches/Hll4Array-internal.hpp M be/src/thirdparty/datasketches/Hll4Array.hpp M be/src/thirdparty/datasketches/Hll6Array-internal.hpp M be/src/thirdparty/datasketches/Hll6Array.hpp M be/src/thirdparty/datasketches/Hll8Array-internal.hpp M be/src/thirdparty/datasketches/Hll8Array.hpp M be/src/thirdparty/datasketches/HllArray-internal.hpp M be/src/thirdparty/datasketches/HllArray.hpp M be/src/thirdparty/datasketches/HllSketch-internal.hpp M be/src/thirdparty/datasketches/HllSketchImpl.hpp M be/src/thirdparty/datasketches/HllSketchImplFactory.hpp M be/src/thirdparty/datasketches/HllUnion-internal.hpp M be/src/thirdparty/datasketches/HllUtil.hpp M be/src/thirdparty/datasketches/MurmurHash3.h M be/src/thirdparty/datasketches/README.md M be/src/thirdparty/datasketches/RelativeErrorTables.hpp A be/src/thirdparty/datasketches/bounds_on_ratios_in_sampled_sets.hpp A be/src/thirdparty/datasketches/bounds_on_ratios_in_theta_sketched_sets.hpp M be/src/thirdparty/datasketches/cpc_common.hpp M be/src/thirdparty/datasketches/cpc_compressor.hpp M be/src/thirdparty/datasketches/cpc_compressor_impl.hpp M be/src/thirdparty/datasketches/cpc_sketch.hpp M be/src/thirdparty/datasketches/cpc_sketch_impl.hpp M be/src/thirdparty/datasketches/cpc_union.hpp M be/src/thirdparty/datasketches/cpc_union_impl.hpp M be/src/thirdparty/datasketches/cpc_util.hpp M be/src/thirdparty/datasketches/hll.hpp M be/src/thirdparty/datasketches/icon_estimator.hpp M be/src/thirdparty/datasketches/kll_quantile_calculator.hpp M be/src/thirdparty/datasketches/kll_quantile_calculator_impl.hpp M be/src/thirdparty/datasketches/kll_sketch.hpp M be/src/thirdparty/datasketches/kll_sketch_impl.hpp M be/src/thirdparty/datasketches/memory_operations.hpp M be/src/thirdparty/datasketches/theta_a_not_b.hpp M be/src/thirdparty/datasketches/theta_a_not_b_impl.hpp A be/src/thirdparty/datasketches/theta_comparators.hpp A be/src/thirdparty/datasketches/theta_constants.hpp A be/src/thirdparty/datasketches/theta_helpers.hpp M be/src/thirdparty/datasketches/theta_intersection.hpp A be/src/thirdparty/datasketches/theta_intersection_base.hpp A be/src/thirdparty/datasketches/theta_intersection_base_impl.hpp M be/src/thirdparty/datasketches/theta_intersection_impl.hpp A be/src/thirdparty/datasketches/theta_jaccard_similarity.hpp A be/src/thirdparty/datasketches/theta_jaccard_similarity_base.hpp A be/src/thirdparty/datasketches/theta_set_difference_base.hpp A be/src/thirdparty/datasketches/theta_set_difference_base_impl.hpp M be/src/thirdparty/datasketches/theta_sketch.hpp M be/src/thirdparty/datasketches/theta_sketch_impl.hpp M be/src/thirdparty/datasketches/theta_union.hpp A be/src/thirdparty/datasketches/theta_union_base.hpp A be/src/thirdparty/datasketches/theta_union_base_impl.hpp M be/src/thirdparty/datasketches/theta_union_impl.hpp A be/src/thirdparty/datasketches/theta_update_sketch_base.hpp A be/src/thirdparty/datasketches/theta_update_sketch_base_impl.hpp M be/src/thirdparty/datasketches/u32_table.hpp M be/src/thirdparty/datasketches/u32_table_impl.hpp 66 files changed, 2,646 insertions(+), 1,873 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/17294/3 -- To view, visit http://gerrit.cloudera.org:8080/17294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9 Gerrit-Change-Number: 17294 Gerrit-PatchSet: 3 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[native-toolchain-CR] IMPALA-10488: Add jwt-cpp 0.5.0 to the toolchain
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17118 ) Change subject: IMPALA-10488: Add jwt-cpp 0.5.0 to the toolchain .. Patch Set 3: Code-Review+2 >here are some CMake extensions that can go fetch a project as of a particular >revision and build it I agree, that would be the best. It adds one more step that can fail, but if github doesn't work, then we wouldn't even get to that step. -- To view, visit http://gerrit.cloudera.org:8080/17118 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77aa3b36b45e8ef3c2d7873327948197c2c65d11 Gerrit-Change-Number: 17118 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Mon, 12 Apr 2021 10:17:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17289 ) Change subject: IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/17289/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17289/2//COMMIT_MSG@10 PS2, Line 10: due to the query plan change (from 3-node scan to : 2-node scan) which splits the row groups among scan nodes differently. It's not clear to me why no pages are filtered by the min-max filter. Could you help to add more details? BTW, I left a comment in the JIRA about how to reproduce this issue locally: https://issues.apache.org/jira/browse/IMPALA-10532?focusedCommentId=17319265=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17319265 -- To view, visit http://gerrit.cloudera.org:8080/17289 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f Gerrit-Change-Number: 17289 Gerrit-PatchSet: 2 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 12 Apr 2021 10:09:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10619: Minor refactoring of analytic function methods
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17237 ) Change subject: IMPALA-10619: Minor refactoring of analytic function methods .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7063/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17237 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I39e4268c0c5500f09acf98357a80763c28f615c2 Gerrit-Change-Number: 17237 Gerrit-PatchSet: 4 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 12 Apr 2021 07:22:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10619: Minor refactoring of analytic function methods
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17237 ) Change subject: IMPALA-10619: Minor refactoring of analytic function methods .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17237 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I39e4268c0c5500f09acf98357a80763c28f615c2 Gerrit-Change-Number: 17237 Gerrit-PatchSet: 4 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 12 Apr 2021 07:22:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10619: Minor refactoring of analytic function methods
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17237 ) Change subject: IMPALA-10619: Minor refactoring of analytic function methods .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/17237/1/fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java File fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java: http://gerrit.cloudera.org:8080/#/c/17237/1/fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java@833 PS1, Line 833: protected FunctionCallExpr createRewrittenFunction(FunctionName funcName, > Sure. Don't want to block this. I'm ok with this since I think the external Yes, that's indeed how it is being checked in the derived class ..through a Preconditions.checkArgument on the instance of getFnCall(). -- To view, visit http://gerrit.cloudera.org:8080/17237 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I39e4268c0c5500f09acf98357a80763c28f615c2 Gerrit-Change-Number: 17237 Gerrit-PatchSet: 3 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 12 Apr 2021 07:20:49 + Gerrit-HasComments: Yes