Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17306 )
Change subject: IMPALA-10445: Adjust NDV's scale with query option ...................................................................... Patch Set 9: (6 comments) Looks good to me! http://gerrit.cloudera.org:8080/#/c/17306/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17306/9//COMMIT_MSG@12 PS9, Line 12: estimation by setting larger `scale`. That scale is decided by SQL in SQL function NDV(<expr>, <scale>) http://gerrit.cloudera.org:8080/#/c/17306/9//COMMIT_MSG@11 PS9, Line 11: Since IMPALA-2658, we can trade memory for more accurate : estimation by setting larger `scale`. That scale is decided by SQL : writers. However, it is a bumpy road for cluster admins to allow for : larger scales. Here lies 2 reasons: May rewrite the sentence as follows. Since IMPALA-2658, we can trade memory for more accurate estimation by setting larger NDV scale in SQL function NDV(<expr>, <scale>). However, the use of larger NDV scale requires the modification of the NDV function in queries which may not be practical in certain applications. http://gerrit.cloudera.org:8080/#/c/17306/9//COMMIT_MSG@24 PS9, Line 24: In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`. : During to the advantage of query option, Cluster admins can either tune : 1 desired query, or influence upcoming queries by placing a default : query option in a dynamic resource pool. This paragraph can be rewritten as follows. In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE` with the following semantics. 1. The allowed value is in the range [2..10]; 2. The value of the query option is the default NDV scale for the SQL function of form NDV(<expr>). Previously, the NDV scale used in such functions is fixed at 2; 3. It does not influence the NDV scale for SQL function NDV(<expr>, <scale>) in which the NDV scale is provided by the 2nd argument <scale>. http://gerrit.cloudera.org:8080/#/c/17306/9/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/17306/9/common/thrift/ImpalaService.thrift@655 PS9, Line 655: , make it easier to change NDV's scale may change to " to SQL NDV function NDV(<expr>. http://gerrit.cloudera.org:8080/#/c/17306/9/fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java File fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java: http://gerrit.cloudera.org:8080/#/c/17306/9/fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java@58 PS9, Line 58: Do function name substitution Probably should say "Create the substituted form". http://gerrit.cloudera.org:8080/#/c/17306/9/fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java File fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java: http://gerrit.cloudera.org:8080/#/c/17306/9/fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java@38 PS9, Line 38: DefaultNdvScaleRule I wonder if this rule can be merged to the CountDistinctToNdvRule. That is, if APPX_COUNT_DISTINCT query option is set, we fire that rule. Within that rule, we check for query option DEFAULT_NDV_SCALE and form NDV() function in two ways: NDV(<expr>) if the value of DEFAULT_NDV_SCALE is 2, or NDV(<expr>, <scale>) otherwise. -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 9 Gerrit-Owner: fifteencai <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Comment-Date: Tue, 13 Apr 2021 17:40:26 +0000 Gerrit-HasComments: Yes
