[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 17: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 17
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 21 Apr 2021 08:45:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..

IMPALA-10445: Adjust NDV's scale with query option

This is a new way to control NDV's scale.

Since IMPALA-2658, we can trade memory for more accurate
estimation by setting larger `scale` in SQL function
NDV(, ). However the use of larger NDV scale requires
the modification of SQL queries which may not be practical in certain
applications:

- Firstly, SQL writers are reluctant to lower that scale. They prone
to fill up the scale, which will make the cluster unstable. Especially
when there are `group by`s with high cardinalities. So it is wiser to
let cluster admin other than sql writer choose appropriate scale.

- Secondly, In some application scenarios, queries are stored in DBs.
In a BI system, for example, rewriting thousands of SQLs is risky.

In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`
with the following semantics:

1. The allowed value is in the range [1..10];
2. Previously, the scale used in NDV() functions was fixed at 2.
Now the scale is provided by the newly added query options.
3. It does not influence the NDV scale for SQL function
NDV(, ) in which the NDV scale is provided by the 2nd
argument .

We also refactored method `Analyze` to make sure APPX_COUNT_DISTINCT
can work with this query option. After this, cluster admins can
substitute `count(distinct )` with `ndv(, scale)`.

Implementation details:

- The default value of DEFAULT_NDV_SCALE is 2, so we won't change
the default ndv behavior.
- We port `CountDistinctToNdv` transform logic from
`SelectStmt.analyze()` to `ExprRewriter`, making it compatible with
further rewrite rules.
- The newly added rewrite rule `DefaultNdvScaleRule` is applied
after `CountDistinctToNdvRule`.

Usage:

To set a default ndv scale:
```
SET DEFAULT_NDV_SCALE = 10;
```

To unset:
```
SET DEFAULT_NDV_SCALE = 2;
```

Here are test results of a typical workload (cardinality=40,090,650):
++
|   Metric| Count Distinct |NDV2|NDV5|NDV10  |
++
|  Memory(GB) |   3.83 |1.84|1.85| 1.89  |
| Duration(s) |  182.89|   30.22|29.72   | 29.24 |
|  ErrorRate  |0%  |1.8%|1.17%   | 0.06% |
++

Testing:
1) Added 3 unit test cases in `ExprRewriteRulesTest`.
2) Added 5 unit test cases in `ExprRewriterTest`.
3) Ran all front-end unit test, passed.
4) Added a new query-option test.

Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Reviewed-on: http://gerrit.cloudera.org:8080/17306
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
11 files changed, 314 insertions(+), 34 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 18
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 17: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 17
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 21 Apr 2021 03:01:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 17:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7088/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 17
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 21 Apr 2021 03:01:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-20 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 16: Code-Review+2

> Hi, Can we merge this patch now?

Yes. Thanks for the feature! Looking forward to more contributions from you!


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 16
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 21 Apr 2021 03:00:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-20 Thread fifteencai (Code Review)
fifteencai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 16:

Hi, Can we merge this patch now?


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 16
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 21 Apr 2021 01:03:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-18 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 16: Code-Review+1

LGTM. Give +1 first in case others want to have a look.


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 16
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Mon, 19 Apr 2021 03:53:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 16:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8589/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 16
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Thu, 15 Apr 2021 14:46:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-15 Thread fifteencai (Code Review)
fifteencai has uploaded a new patch set (#16). ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..

IMPALA-10445: Adjust NDV's scale with query option

This is a new way to control NDV's scale.

Since IMPALA-2658, we can trade memory for more accurate
estimation by setting larger `scale` in SQL function
NDV(, ). However the use of larger NDV scale requires
the modification of SQL queries which may not be practical in certain
applications:

- Firstly, SQL writers are reluctant to lower that scale. They prone
to fill up the scale, which will make the cluster unstable. Especially
when there are `group by`s with high cardinalities. So it is wiser to
let cluster admin other than sql writer choose appropriate scale.

- Secondly, In some application scenarios, queries are stored in DBs.
In a BI system, for example, rewriting thousands of SQLs is risky.

In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`
with the following semantics:

1. The allowed value is in the range [1..10];
2. Previously, the scale used in NDV() functions was fixed at 2.
Now the scale is provided by the newly added query options.
3. It does not influence the NDV scale for SQL function
NDV(, ) in which the NDV scale is provided by the 2nd
argument .

We also refactored method `Analyze` to make sure APPX_COUNT_DISTINCT
can work with this query option. After this, cluster admins can
substitute `count(distinct )` with `ndv(, scale)`.

Implementation details:

- The default value of DEFAULT_NDV_SCALE is 2, so we won't change
the default ndv behavior.
- We port `CountDistinctToNdv` transform logic from
`SelectStmt.analyze()` to `ExprRewriter`, making it compatible with
further rewrite rules.
- The newly added rewrite rule `DefaultNdvScaleRule` is applied
after `CountDistinctToNdvRule`.

Usage:

To set a default ndv scale:
```
SET DEFAULT_NDV_SCALE = 10;
```

To unset:
```
SET DEFAULT_NDV_SCALE = 2;
```

Here are test results of a typical workload (cardinality=40,090,650):
++
|   Metric| Count Distinct |NDV2|NDV5|NDV10  |
++
|  Memory(GB) |   3.83 |1.84|1.85| 1.89  |
| Duration(s) |  182.89|   30.22|29.72   | 29.24 |
|  ErrorRate  |0%  |1.8%|1.17%   | 0.06% |
++

Testing:
1) Added 3 unit test cases in `ExprRewriteRulesTest`.
2) Added 5 unit test cases in `ExprRewriterTest`.
3) Ran all front-end unit test, passed.
4) Added a new query-option test.

Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
11 files changed, 314 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/16
--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 16
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-15 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 15: Code-Review+1

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17306/15/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

http://gerrit.cloudera.org:8080/#/c/17306/15/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@181
PS15, Line 181:
nit: redudant space


http://gerrit.cloudera.org:8080/#/c/17306/15/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@184
PS15, Line 184: should
nit: "callers should"


http://gerrit.cloudera.org:8080/#/c/17306/15/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@188
PS15, Line 188:* expStr: origin expr
  :* rules: list of rewrite rules
  :* expectedExprStr: expected expr
  :* return: Expr
nit: our code style use param comments like these:

  /**
   * 
   *
   * @param exprStr: original expr
   * @param rules: list of rewrite rules
   * @param expectedExprStr: expected expr
   * @return the rewritten Expr
   */


http://gerrit.cloudera.org:8080/#/c/17306/15/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@794
PS15, Line 794: session.options().setAppx_count_distinct(true);
  : RewritesOk("count(distinct bool_col)", rules, 
"ndv(bool_col)");
  :   }
  :
  :   @Test
  :   public void testDefaultNdvScaleRule() throws ImpalaException {
  : List rules = Lists.newArrayList(
  : 
org.apache.impala.rewrite.DefaultNdvScaleRule.INSTANCE
  : );
  : session.options().setDefault_ndv_scale(10);
  : RewritesOk("ndv(bool_col)", rules, "ndv(bool_col, 10)");
  :   }
  :
  :   @Test
  :   public void testDefaultNdvScaleRuleNotSet() throws 
ImpalaException {
  : List rules = Lists.newArrayList(
  : 
org.apache.impala.rewrite.DefaultNdvScaleRule.INSTANCE
  : );
  : RewritesOk("ndv(bool_col)", rules, null);
  :   }
  :
  :   @Test
  :   public void testDefaultNdvScaleRuleSetDefault() throws 
ImpalaException {
  : List rules = Lists.newArrayList(
  : 
org.apache.impala.rewrite.DefaultNdvScaleRule.INSTANCE
  : );
  : session.options().setDefault_ndv_scale(2);
  : RewritesOk("ndv(bool_col)", rules, null);
nit: Just like other tests in this file, you can merge these into one test 
method since they share the same rule.


http://gerrit.cloudera.org:8080/#/c/17306/15/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@822
PS15, Line 822:
nit: redundant blank line



--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 15
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Thu, 15 Apr 2021 13:08:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-14 Thread fifteencai (Code Review)
fifteencai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 15:

Resolved assert problem in newly added unit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 15
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 14 Apr 2021 15:18:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 15:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8577/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 15
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 14 Apr 2021 14:50:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 14:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8576/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 14
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 14 Apr 2021 14:42:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-14 Thread fifteencai (Code Review)
fifteencai has uploaded a new patch set (#15). ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..

IMPALA-10445: Adjust NDV's scale with query option

This is a new way to control NDV's scale.

Since IMPALA-2658, we can trade memory for more accurate
estimation by setting larger `scale` in SQL function
NDV(, ). However the use of larger NDV scale requires
the modification of SQL queries which may not be practical in certain
applications:

- Firstly, SQL writers are reluctant to lower that scale. They prone
to fill up the scale, which will make the cluster unstable. Especially
when there are `group by`s with high cardinalities. So it is wiser to
let cluster admin other than sql writer choose appropriate scale.

- Secondly, In some application scenarios, queries are stored in DBs.
In a BI system, for example, rewriting thousands of SQLs is risky.

In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`
with the following semantics:

1. The allowed value is in the range [1..10];
2. Previously, the scale used in NDV() functions was fixed at 2.
Now the scale is provided by the newly added query options.
3. It does not influence the NDV scale for SQL function
NDV(, ) in which the NDV scale is provided by the 2nd
argument .

We also refactored method `Analyze` to make sure APPX_COUNT_DISTINCT
can work with this query option. After this, cluster admins can
substitute `count(distinct )` with `ndv(, scale)`.

Implementation details:

- The default value of DEFAULT_NDV_SCALE is 2, so we won't change
the default ndv behavior.
- We port `CountDistinctToNdv` transform logic from
`SelectStmt.analyze()` to `ExprRewriter`, making it compatible with
further rewrite rules.
- The newly added rewrite rule `DefaultNdvScaleRule` is applied
after `CountDistinctToNdvRule`.

Usage:

To set a default ndv scale:
```
SET DEFAULT_NDV_SCALE = 10;
```

To unset:
```
SET DEFAULT_NDV_SCALE = 2;
```

Here are test results of a typical workload (cardinality=40,090,650):
++
|   Metric| Count Distinct |NDV2|NDV5|NDV10  |
++
|  Memory(GB) |   3.83 |1.84|1.85| 1.89  |
| Duration(s) |  182.89|   30.22|29.72   | 29.24 |
|  ErrorRate  |0%  |1.8%|1.17%   | 0.06% |
++

Testing:
1) Added 3 unit test cases in `ExprRewriteRulesTest`.
2) Added 5 unit test cases in `ExprRewriterTest`.
3) Ran all front-end unit test, passed.
4) Added a new query-option test.

Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
11 files changed, 315 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/15
--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 15
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 14:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17306/14/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java:

http://gerrit.cloudera.org:8080/#/c/17306/14/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@541
PS14, Line 541: 
//--
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/17306/14/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@546
PS14, Line 546: // CASE 1: DEFAULT_NDV_SCALE=5(same as the value in 
original sql), APPX_COUNT_DISTINCT=True
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/17306/14/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@552
PS14, Line 552: 
//--
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/17306/14/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@553
PS14, Line 553: String sql1 = "SELECT ndv(id), ndv(id, 5), count(DISTINCT 
id) FROM functional.alltypes";
line too long (92 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 14
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 14 Apr 2021 14:23:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-14 Thread fifteencai (Code Review)
fifteencai has uploaded a new patch set (#14). ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..

IMPALA-10445: Adjust NDV's scale with query option

This is a new way to control NDV's scale.

Since IMPALA-2658, we can trade memory for more accurate
estimation by setting larger `scale` in SQL function
NDV(, ). However the use of larger NDV scale requires
the modification of SQL queries which may not be practical in certain
applications:

- Firstly, SQL writers are reluctant to lower that scale. They prone
to fill up the scale, which will make the cluster unstable. Especially
when there are `group by`s with high cardinalities. So it is wiser to
let cluster admin other than sql writer choose appropriate scale.

- Secondly, In some application scenarios, queries are stored in DBs.
In a BI system, for example, rewriting thousands of SQLs is risky.

In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`
with the following semantics:

1. The allowed value is in the range [1..10];
2. Previously, the scale used in NDV() functions was fixed at 2.
Now the scale is provided by the newly added query options.
3. It does not influence the NDV scale for SQL function
NDV(, ) in which the NDV scale is provided by the 2nd
argument .

We also refactored method `Analyze` to make sure APPX_COUNT_DISTINCT
can work with this query option. After this, cluster admins can
substitute `count(distinct )` with `ndv(, scale)`.

Implementation details:

- The default value of DEFAULT_NDV_SCALE is 2, so we won't change
the default ndv behavior.
- We port `CountDistinctToNdv` transform logic from
`SelectStmt.analyze()` to `ExprRewriter`, making it compatible with
further rewrite rules.
- The newly added rewrite rule `DefaultNdvScaleRule` is applied
after `CountDistinctToNdvRule`.

Usage:

To set a default ndv scale:
```
SET DEFAULT_NDV_SCALE = 10;
```

To unset:
```
SET DEFAULT_NDV_SCALE = 2;
```

Here are test results of a typical workload (cardinality=40,090,650):
++
|   Metric| Count Distinct |NDV2|NDV5|NDV10  |
++
|  Memory(GB) |   3.83 |1.84|1.85| 1.89  |
| Duration(s) |  182.89|   30.22|29.72   | 29.24 |
|  ErrorRate  |0%  |1.8%|1.17%   | 0.06% |
++

Testing:
1) Added 3 unit test cases in `ExprRewriteRulesTest`.
2) Added 5 unit test cases in `ExprRewriterTest`.
3) Ran all front-end unit test, passed.
4) Added a new query-option test.

Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
11 files changed, 312 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/14
--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 14
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8575/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 13
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 14 Apr 2021 14:20:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 13:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17306/13/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

http://gerrit.cloudera.org:8080/#/c/17306/13/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@180
PS13, Line 180:   // Given a list of `rule`, this function checks whether the 
rewritten  is as expected
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/17306/13/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@181
PS13, Line 181:   // Caution: if no rule in `rules` is expected to be fired, 
should set expectedExprStr to null
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/17306/13/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@182
PS13, Line 182:   // or "NULL". otherwise, this function would throw an 
exception like "Rule xxx didn't fire"
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/17306/13/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java:

http://gerrit.cloudera.org:8080/#/c/17306/13/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@541
PS13, Line 541: 
//--
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/17306/13/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@546
PS13, Line 546: // CASE 1: DEFAULT_NDV_SCALE=5(same as the value in 
original sql), APPX_COUNT_DISTINCT=True
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/17306/13/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@552
PS13, Line 552: 
//--
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/17306/13/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@553
PS13, Line 553: String sql1 = "SELECT ndv(id), ndv(id, 5), count(DISTINCT 
id) FROM functional.alltypes";
line too long (92 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 13
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 14 Apr 2021 14:01:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-14 Thread fifteencai (Code Review)
fifteencai has uploaded a new patch set (#13). ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..

IMPALA-10445: Adjust NDV's scale with query option

This is a new way to control NDV's scale.

Since IMPALA-2658, we can trade memory for more accurate
estimation by setting larger `scale` in SQL function
NDV(, ). However the use of larger NDV scale requires
the modification of SQL queries which may not be practical in certain
applications:

- Firstly, SQL writers are reluctant to lower that scale. They prone
to fill up the scale, which will make the cluster unstable. Especially
when there are `group by`s with high cardinalities. So it is wiser to
let cluster admin other than sql writer choose appropriate scale.

- Secondly, In some application scenarios, queries are stored in DBs.
In a BI system, for example, rewriting thousands of SQLs is risky.

In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`
with the following semantics:

1. The allowed value is in the range [1..10];
2. Previously, the scale used in NDV() functions was fixed at 2.
Now the scale is provided by the newly added query options.
3. It does not influence the NDV scale for SQL function
NDV(, ) in which the NDV scale is provided by the 2nd
argument .

We also refactored method `Analyze` to make sure APPX_COUNT_DISTINCT
can work with this query option. After this, cluster admins can
substitute `count(distinct )` with `ndv(, scale)`.

Implementation details:

- The default value of DEFAULT_NDV_SCALE is 2, so we won't change
the default ndv behavior.
- We port `CountDistinctToNdv` transform logic from
`SelectStmt.analyze()` to `ExprRewriter`, making it compatible with
further rewrite rules.
- The newly added rewrite rule `DefaultNdvScaleRule` is applied
after `CountDistinctToNdvRule`.

Usage:

To set a default ndv scale:
```
SET DEFAULT_NDV_SCALE = 10;
```

To unset:
```
SET DEFAULT_NDV_SCALE = 2;
```

Here are test results of a typical workload (cardinality=40,090,650):
++
|   Metric| Count Distinct |NDV2|NDV5|NDV10  |
++
|  Memory(GB) |   3.83 |1.84|1.85| 1.89  |
| Duration(s) |  182.89|   30.22|29.72   | 29.24 |
|  ErrorRate  |0%  |1.8%|1.17%   | 0.06% |
++

Testing:
1) Added 3 unit test cases in `ExprRewriteRulesTest`.
2) Added 5 unit test cases in `ExprRewriterTest`.
3) Ran all front-end unit test, passed.
4) Added a new query-option test.

Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
11 files changed, 302 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/13
--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 13
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-14 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 12: Code-Review+1

Looks good!


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 12
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 14 Apr 2021 13:01:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-14 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 12: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 12
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 14 Apr 2021 08:46:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 11:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/8570/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 11
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 14 Apr 2021 08:15:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-14 Thread fifteencai (Code Review)
fifteencai has uploaded a new patch set (#12). ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..

IMPALA-10445: Adjust NDV's scale with query option

This is a new way to control NDV's scale.

Since IMPALA-2658, we can trade memory for more accurate
estimation by setting larger `scale` in SQL function
NDV(, ). However the use of larger NDV scale requires
the modification of SQL queries which may not be practical in certain
applications:

- Firstly, SQL writers are reluctant to lower that scale. They prone
to fill up the scale, which will make the cluster unstable. Especially
when there are `group by`s with high cardinalities. So it is wiser to
let cluster admin other than sql writer choose appropriate scale.

- Secondly, In some application scenarios, queries are stored in DBs.
In a BI system, for example, rewriting thousands of SQLs is risky.

In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`
with the following semantics:

1. The allowed value is in the range [1..10];
2. Previously, the scale used in NDV() functions was fixed at 2.
Now the scale is provided by the newly added query options.
3. It does not influence the NDV scale for SQL function
NDV(, ) in which the NDV scale is provided by the 2nd
argument .

We also refactored method `Analyze` to make sure APPX_COUNT_DISTINCT
can work with this query option. After this, cluster admins can
substitute `count(distinct )` with `ndv(, scale)`.

Implementation details:

- The default value of DEFAULT_NDV_SCALE is 2, so we won't change
the default ndv behavior.
- We port `CountDistinctToNdv` transform logic from
`SelectStmt.analyze()` to `ExprRewriter`, making it compatible with
further rewrite rules.
- The newly added rewrite rule `DefaultNdvScaleRule` is applied
after `CountDistinctToNdvRule`.

Usage:

To set a default ndv scale:
```
SET DEFAULT_NDV_SCALE = 10;
```

To unset:
```
SET DEFAULT_NDV_SCALE = 2;
```

Here are test results of a typical workload (cardinality=40,090,650):
++
|   Metric| Count Distinct |NDV2|NDV5|NDV10  |
++
|  Memory(GB) |   3.83 |1.84|1.85| 1.89  |
| Duration(s) |  182.89|   30.22|29.72   | 29.24 |
|  ErrorRate  |0%  |1.8%|1.17%   | 0.06% |
++

Testing:
1) Added 3 unit test cases in `ExprRewriteRulesTest`.
2) Added 5 unit test cases in `ExprRewriterTest`.
3) Ran all front-end unit test, passed.
4) Added a new query-option test.

Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
11 files changed, 298 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/12
--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 12
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-13 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 11:

(2 comments)

Looks good!

http://gerrit.cloudera.org:8080/#/c/17306/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17306/11//COMMIT_MSG@14
PS11, Line 14: y
nit: queries


http://gerrit.cloudera.org:8080/#/c/17306/11/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java:

http://gerrit.cloudera.org:8080/#/c/17306/11/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@539
PS11, Line 539: }
The above tests look good!

Can we add several more to test the combination of the two new rules:


1. SELECT count(DISTINCT id), ndv(id) FROM functional.alltypes
2. setDefault_ndv_scale(9) and then SELECT count(DISTINCT id), ndv(id) FROM 
functional.alltypes
3. setDefault_ndv_scale(5) and then SELECT count(DISTINCT id), ndv(id), ndv(id, 
10) FROM functional.alltypes
3. setDefault_ndv_scale(5) and then SELECT  ndv(id, 10), count(DISTINCT id), 
ndv(id) FROM functional.alltypes
4. setAppx_count_distinct(true);setDefault_ndv_scale(9) and then SELECT 
count(DISTINCT id), ndv(id) FROM functional.alltypes



--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 11
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 14 Apr 2021 02:08:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-13 Thread fifteencai (Code Review)
fifteencai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 11:

Thanks a lot to Quanlong and Qifan !!! I have reworded the commit msg and added 
unit tests. Please take a look.


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 11
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 14 Apr 2021 00:56:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-13 Thread fifteencai (Code Review)
fifteencai has uploaded a new patch set (#11). ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..

IMPALA-10445: Adjust NDV's scale with query option

This is a new way to control NDV's scale.

Since IMPALA-2658, we can trade memory for more accurate
estimation by setting larger `scale` in SQL function
NDV(, ). However the use of larger NDV scale requires
the modification of SQL query which may not be practical in certain
applications:

- Firstly, SQL writers are reluctant to lower that scale. They prone
to fill up the scale, which will make the cluster unstable. Especially
when there are `group by`s with high cardinalities. So it is wiser to
let cluster admin other than sql writer choose appropriate scale.

- Secondly, In some application scenarios, queries are stored in DBs.
In a BI system, for example, rewriting thousands of SQLs is risky.

In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`
with the following semantics:

1. The allowed value is in the range [1..10];
2. Previously, the scale used in NDV() functions was fixed at 2.
Now the scale is provided by the newly added query options.
3. It does not influence the NDV scale for SQL function
NDV(, ) in which the NDV scale is provided by the 2nd
argument .

We also refactored method `Analyze` to make sure APPX_COUNT_DISTINCT
can work with this query option. After this, cluster admins can
substitute `count(distinct )` with `ndv(, scale)`.

Implementation details:

- The default value of DEFAULT_NDV_SCALE is 2, so we won't change
the default ndv behavior.
- We port `CountDistinctToNdv` transform logic from
`SelectStmt.analyze()` to `ExprRewriter`, making it compatible with
further rewrite rules.
- The newly added rewrite rule `DefaultNdvScaleRule` is applied
after `CountDistinctToNdvRule`.

Usage:

To set a default ndv scale:
```
SET DEFAULT_NDV_SCALE = 10;
```

To unset:
```
SET DEFAULT_NDV_SCALE = 2;
```

Here are test results of a typical workload (cardinality=40,090,650):
++
|   Metric| Count Distinct |NDV2|NDV5|NDV10  |
++
|  Memory(GB) |   3.83 |1.84|1.85| 1.89  |
| Duration(s) |  182.89|   30.22|29.72   | 29.24 |
|  ErrorRate  |0%  |1.8%|1.17%   | 0.06% |
++

Testing:
1) Added 3 unit test cases in `ExprRewriteRulesTest`.
2) Added 5 unit test cases in `ExprRewriterTest`.
3) Ran all front-end unit test, passed.
4) Added a new query-option test.

Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
11 files changed, 252 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/11
--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 11
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-13 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17306/9/fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
File fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java:

http://gerrit.cloudera.org:8080/#/c/17306/9/fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java@38
PS9, Line 38: DefaultNdvScaleRule
> I wonder if this rule can be merged to the CountDistinctToNdvRule. That is,
It seems this rule by itself is also important to transform NDV() to 
NDV(, ) under the table when transform count(distinct) to NDV() is 
not needed. I am OK with keeping this rule.



--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 9
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 13 Apr 2021 21:18:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-13 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 9:

(6 comments)

Looks good to me!

http://gerrit.cloudera.org:8080/#/c/17306/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17306/9//COMMIT_MSG@12
PS9, Line 12: estimation by setting larger `scale`. That scale is decided by SQL
in SQL function NDV(, )


http://gerrit.cloudera.org:8080/#/c/17306/9//COMMIT_MSG@11
PS9, Line 11: Since IMPALA-2658, we can trade memory for more accurate
: estimation by setting larger `scale`. That scale is decided by SQL
: writers. However, it is a bumpy road for cluster admins to allow 
for
: larger scales. Here lies 2 reasons:
May rewrite the sentence as follows.

Since IMPALA-2658, we can trade memory for more accurate
estimation by setting larger NDV scale in SQL function NDV(, ). 
However, the use of larger NDV scale requires the modification of the NDV 
function in queries which may not be practical in certain applications.


http://gerrit.cloudera.org:8080/#/c/17306/9//COMMIT_MSG@24
PS9, Line 24: In this commit, we introduced a new Query Option 
`DEFAULT_NDV_SCALE`.
: During to the advantage of query option, Cluster admins can 
either tune
: 1 desired query, or influence upcoming queries by placing a 
default
: query option in a dynamic resource pool.
This paragraph can be rewritten as follows.

In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE` with the 
following semantics.

1. The allowed value is in the range [2..10];
2. The value of the query option is the default NDV scale for the SQL function 
of form NDV(). Previously, the NDV scale used in such functions is fixed 
at 2;
3. It does not influence the NDV scale for SQL function NDV(, ) in 
which the NDV scale is provided by the 2nd argument .


http://gerrit.cloudera.org:8080/#/c/17306/9/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/17306/9/common/thrift/ImpalaService.thrift@655
PS9, Line 655: , make it easier to change NDV's scale
may change to " to SQL NDV function NDV(.


http://gerrit.cloudera.org:8080/#/c/17306/9/fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
File fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java:

http://gerrit.cloudera.org:8080/#/c/17306/9/fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java@58
PS9, Line 58: Do function name substitution
Probably should say "Create the substituted form".


http://gerrit.cloudera.org:8080/#/c/17306/9/fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
File fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java:

http://gerrit.cloudera.org:8080/#/c/17306/9/fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java@38
PS9, Line 38: DefaultNdvScaleRule
I wonder if this rule can be merged to the CountDistinctToNdvRule. That is, if 
APPX_COUNT_DISTINCT query option is set, we fire that rule. Within that rule, 
we check for query option DEFAULT_NDV_SCALE and form NDV() function in two 
ways: NDV() if the value of DEFAULT_NDV_SCALE is 2, or NDV(, 
) otherwise.



--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 9
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 13 Apr 2021 17:40:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8559/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 9
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 13 Apr 2021 10:22:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-13 Thread fifteencai (Code Review)
fifteencai has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..

IMPALA-10445: Adjust NDV's scale with query option

This is a new way to control NDV's scale.

Since IMPALA-2658, we can trade memory for more accurate
estimation by setting larger `scale`. That scale is decided by SQL
writers. However, it is a bumpy road for cluster admins to allow for
larger scales. Here lies 2 reasons:

- Firstly, SQL writers are reluctant to low the scale. They prone
to fill up the scale, which will make the cluster unstable, especially
when there are `group by`s with high cardinalities. So it is wiser to
let cluster admin instead of sql writer choose appropriate scale.

- Secondly, In some application scenarios, queries are stored in DBs.
In a BI system, for example, rewriting thousands of SQLs is risky.

In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`.
During to the advantage of query option, Cluster admins can either tune
1 desired query, or influence upcoming queries by placing a default
query option in a dynamic resource pool.

We also refactored method `Analyze` to make sure APPX_COUNT_DISTINCT
can work with this query option. After this, cluster admins can degrade
service level by transforming `count(distinct id)` to `ndv(id, scale)`.

Implementation details:

- The default value of DEFAULT_NDV_SCALE is 2, so we won't change
the default ndv behavior.
- We port `CountDistinctToNdv` transform logic from
`SelectStmt.analyze()` to `ExprRewriter`, making it compatible with
further rewrite rules.
- The newly added rewrite rule `DefaultNdvScaleRule` is applied
after `CountDistinctToNdvRule`.

Usage:

To set a default ndv scale:
```
SET DEFAULT_NDV_SCALE = 10;  -- ranges from 1 to 10, both inclusive.
```

To unset:
```
SET DEFAULT_NDV_SCALE = 2;
```

Here are test results of a typical workload (cardinality=40,090,650):
++
|   Metric| Count Distinct |NDV2|NDV5|NDV10  |
++
|  Memory(GB) |   3.83 |1.84|1.85| 1.89  |
| Duration(s) |  182.89|   30.22|29.72   | 29.24 |
|  ErrorRate  |0%  |1.8%|1.17%   | 0.06% |
++

Testing:
1) Added 3 unit test cases in `ExprRewriteRulesTest`.
2) Added 5 unit test cases in `ExprRewriterTest`.
3) Ran all front-end unit test, passed.
4) Added a new query-option test.

Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
11 files changed, 251 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/9
--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 9
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang