[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-13 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 11:

(2 comments)

Looks good!

http://gerrit.cloudera.org:8080/#/c/17306/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17306/11//COMMIT_MSG@14
PS11, Line 14: y
nit: queries


http://gerrit.cloudera.org:8080/#/c/17306/11/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java:

http://gerrit.cloudera.org:8080/#/c/17306/11/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@539
PS11, Line 539: }
The above tests look good!

Can we add several more to test the combination of the two new rules:


1. SELECT count(DISTINCT id), ndv(id) FROM functional.alltypes
2. setDefault_ndv_scale(9) and then SELECT count(DISTINCT id), ndv(id) FROM 
functional.alltypes
3. setDefault_ndv_scale(5) and then SELECT count(DISTINCT id), ndv(id), ndv(id, 
10) FROM functional.alltypes
3. setDefault_ndv_scale(5) and then SELECT  ndv(id, 10), count(DISTINCT id), 
ndv(id) FROM functional.alltypes
4. setAppx_count_distinct(true);setDefault_ndv_scale(9) and then SELECT 
count(DISTINCT id), ndv(id) FROM functional.alltypes



--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 11
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 14 Apr 2021 02:08:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR](asf-site) Add slack channel in the community page

2021-04-13 Thread Quanlong Huang (Code Review)
Quanlong Huang has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17311 )

Change subject: Add slack channel in the community page
..

Add slack channel in the community page

This patch adds links to join the apache-impala slack channel.

Change-Id: I26112aff63cabf6f4ca437f12d71928e80106687
Reviewed-on: http://gerrit.cloudera.org:8080/17311
Reviewed-by: Jim Apple 
Tested-by: Quanlong Huang 
---
M community.html
1 file changed, 1 insertion(+), 0 deletions(-)

Approvals:
  Jim Apple: Looks good to me, approved
  Quanlong Huang: Verified

--
To view, visit http://gerrit.cloudera.org:8080/17311
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: merged
Gerrit-Change-Id: I26112aff63cabf6f4ca437f12d71928e80106687
Gerrit-Change-Number: 17311
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR](asf-site) Add slack channel in the community page

2021-04-13 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17311 )

Change subject: Add slack channel in the community page
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17311
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: comment
Gerrit-Change-Id: I26112aff63cabf6f4ca437f12d71928e80106687
Gerrit-Change-Number: 17311
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 14 Apr 2021 00:58:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-13 Thread fifteencai (Code Review)
fifteencai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 11:

Thanks a lot to Quanlong and Qifan !!! I have reworded the commit msg and added 
unit tests. Please take a look.


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 11
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 14 Apr 2021 00:56:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-13 Thread fifteencai (Code Review)
fifteencai has uploaded a new patch set (#11). ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..

IMPALA-10445: Adjust NDV's scale with query option

This is a new way to control NDV's scale.

Since IMPALA-2658, we can trade memory for more accurate
estimation by setting larger `scale` in SQL function
NDV(, ). However the use of larger NDV scale requires
the modification of SQL query which may not be practical in certain
applications:

- Firstly, SQL writers are reluctant to lower that scale. They prone
to fill up the scale, which will make the cluster unstable. Especially
when there are `group by`s with high cardinalities. So it is wiser to
let cluster admin other than sql writer choose appropriate scale.

- Secondly, In some application scenarios, queries are stored in DBs.
In a BI system, for example, rewriting thousands of SQLs is risky.

In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`
with the following semantics:

1. The allowed value is in the range [1..10];
2. Previously, the scale used in NDV() functions was fixed at 2.
Now the scale is provided by the newly added query options.
3. It does not influence the NDV scale for SQL function
NDV(, ) in which the NDV scale is provided by the 2nd
argument .

We also refactored method `Analyze` to make sure APPX_COUNT_DISTINCT
can work with this query option. After this, cluster admins can
substitute `count(distinct )` with `ndv(, scale)`.

Implementation details:

- The default value of DEFAULT_NDV_SCALE is 2, so we won't change
the default ndv behavior.
- We port `CountDistinctToNdv` transform logic from
`SelectStmt.analyze()` to `ExprRewriter`, making it compatible with
further rewrite rules.
- The newly added rewrite rule `DefaultNdvScaleRule` is applied
after `CountDistinctToNdvRule`.

Usage:

To set a default ndv scale:
```
SET DEFAULT_NDV_SCALE = 10;
```

To unset:
```
SET DEFAULT_NDV_SCALE = 2;
```

Here are test results of a typical workload (cardinality=40,090,650):
++
|   Metric| Count Distinct |NDV2|NDV5|NDV10  |
++
|  Memory(GB) |   3.83 |1.84|1.85| 1.89  |
| Duration(s) |  182.89|   30.22|29.72   | 29.24 |
|  ErrorRate  |0%  |1.8%|1.17%   | 0.06% |
++

Testing:
1) Added 3 unit test cases in `ExprRewriteRulesTest`.
2) Added 5 unit test cases in `ExprRewriterTest`.
3) Ran all front-end unit test, passed.
4) Added a new query-option test.

Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
11 files changed, 252 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/11
--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 11
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] WIP: IMPALA-10656: Fire insert events before commit

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17313 )

Change subject: WIP: IMPALA-10656: Fire insert events before commit
..


Patch Set 3: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7066/


--
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 14 Apr 2021 00:45:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17289 )

Change subject: IMPALA-10532: 
TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7069/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17289
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f
Gerrit-Change-Number: 17289
Gerrit-PatchSet: 5
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 13 Apr 2021 23:37:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17289 )

Change subject: IMPALA-10532: 
TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17289
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f
Gerrit-Change-Number: 17289
Gerrit-PatchSet: 5
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 13 Apr 2021 23:37:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky

2021-04-13 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17289 )

Change subject: IMPALA-10532: 
TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17289
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f
Gerrit-Change-Number: 17289
Gerrit-PatchSet: 4
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 13 Apr 2021 23:36:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10657: Remove accidental usage of shaded imports

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17314 )

Change subject: IMPALA-10657: Remove accidental usage of shaded imports
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17314
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifa67aec96539db7861416f3e71cca83607e3d8c9
Gerrit-Change-Number: 17314
Gerrit-PatchSet: 2
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Tue, 13 Apr 2021 23:24:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10657: Remove accidental usage of shaded imports

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17314 )

Change subject: IMPALA-10657: Remove accidental usage of shaded imports
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7068/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17314
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifa67aec96539db7861416f3e71cca83607e3d8c9
Gerrit-Change-Number: 17314
Gerrit-PatchSet: 2
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Tue, 13 Apr 2021 23:24:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10657: Remove accidental usage of shaded imports

2021-04-13 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17314 )

Change subject: IMPALA-10657: Remove accidental usage of shaded imports
..


Patch Set 1: Code-Review+2

This makes sense, and looking through the history, I agree that these don't 
seem to have any purpose.


--
To view, visit http://gerrit.cloudera.org:8080/17314
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifa67aec96539db7861416f3e71cca83607e3d8c9
Gerrit-Change-Number: 17314
Gerrit-PatchSet: 1
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Tue, 13 Apr 2021 22:47:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10657: Remove accidental usage of shaded imports

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17314 )

Change subject: IMPALA-10657: Remove accidental usage of shaded imports
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8569/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17314
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifa67aec96539db7861416f3e71cca83607e3d8c9
Gerrit-Change-Number: 17314
Gerrit-PatchSet: 1
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Tue, 13 Apr 2021 21:42:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10657: Remove accidental usage of shaded imports

2021-04-13 Thread John Sherman (Code Review)
John Sherman has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17314


Change subject: IMPALA-10657: Remove accidental usage of shaded imports
..

IMPALA-10657: Remove accidental usage of shaded imports

- This changes seemingly accidental usages of shaded imports
  with the direct dependency
  - This is helpful to reduce confusion and possibly reduce the
  required jars for certain usages of the frontend jars

Change-Id: Ifa67aec96539db7861416f3e71cca83607e3d8c9
---
M fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java
M fe/src/main/java/org/apache/impala/catalog/Transaction.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
4 files changed, 6 insertions(+), 4 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/17314/1
--
To view, visit http://gerrit.cloudera.org:8080/17314
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ifa67aec96539db7861416f3e71cca83607e3d8c9
Gerrit-Change-Number: 17314
Gerrit-PatchSet: 1
Gerrit-Owner: John Sherman 


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-13 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17306/9/fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
File fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java:

http://gerrit.cloudera.org:8080/#/c/17306/9/fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java@38
PS9, Line 38: DefaultNdvScaleRule
> I wonder if this rule can be merged to the CountDistinctToNdvRule. That is,
It seems this rule by itself is also important to transform NDV() to 
NDV(, ) under the table when transform count(distinct) to NDV() is 
not needed. I am OK with keeping this rule.



--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 9
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 13 Apr 2021 21:18:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17195 )

Change subject: IMPALA-10584: Defer advancing read page when reservation is 
tight.
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8568/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 13 Apr 2021 20:59:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.

2021-04-13 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17195 )

Change subject: IMPALA-10584: Defer advancing read page when reservation is 
tight.
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17195/4/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

http://gerrit.cloudera.org:8080/#/c/17195/4/be/src/runtime/buffered-tuple-stream.cc@733
PS4, Line 733: dvancing_read_page = true;
> This DCHECK was part of my debugging when investigating the bug.
Done


http://gerrit.cloudera.org:8080/#/c/17195/4/be/src/runtime/buffered-tuple-stream.cc@750
PS4, Line 750:
> Ack
Done. Reword it a bit.



--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 13 Apr 2021 20:41:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.

2021-04-13 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Bikramjeet Vig, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17195

to look at the new patch set (#5).

Change subject: IMPALA-10584: Defer advancing read page when reservation is 
tight.
..

IMPALA-10584: Defer advancing read page when reservation is tight.

TestScratchLimit::test_with_unlimited_scratch_limit has been
intermittently crashing in ubuntu-16.04-dockerised-tests environment
after result spooling is enabled by default in IMPALA-9856. DCHECK
violation occurs in ReservationTracker::CheckConsistency() due to
BufferedTupleStream wrongly tries to reclaim memory reservation while
unpinning the stream.

For this bug to surface, all of the following needs to happen:
- Stream is in pinned mode.
- There are only 2 pages in the stream: 1 read and 1 write.
- Stream can not increase reservation anymore either due to memory
  pressure or low buffer/memory limit.
- The stream read page has been fully read and is attached to output
  RowBatch. But the output RowBatch has not cleaned up yet.
- BufferedTupleStream::UnpinStream is invoked.

The memory accounting bug happens because UnpinStream proceeds to
NextReadPage where the read page buffer was mistakenly assumed as
released. default_page_len_ bytes were added into
write_page_reservation_ and subsequently violates the total memory
reservation.

This patch fixes the bug by deferring advancement of the read iterator
in UnpinStream if the read page is attached to output RowBatch and there
is no more unused reservation. This is OK because after UnpinStream
finished, the stream is now in unpinned mode and has_read_write_page is
false. The next AddRow operation is then allowed to unpin the previous
write page first before reusing the reservation to allocate a new write
page.

Testing:
- Add be test DeferAdvancingReadPage.
- Loop the TestScratchLimit::test_with_unlimited_scratch_limit in my
  local dev machine and verify that each test passed without triggering
  the DCHECK violation.
- Reenable result spooling in TestScratchLimit that was disabled in
  IMPALA-10559.
- Pass core tests.

Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
---
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M tests/query_test/test_scratch_limit.py
4 files changed, 96 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17195/5
--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..


Patch Set 21:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7067/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 13 Apr 2021 20:13:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..


Patch Set 21: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 13 Apr 2021 20:13:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-13 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..


Patch Set 20: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 20
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 13 Apr 2021 20:12:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.

2021-04-13 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17195 )

Change subject: IMPALA-10584: Defer advancing read page when reservation is 
tight.
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17195/4/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

http://gerrit.cloudera.org:8080/#/c/17195/4/be/src/runtime/buffered-tuple-stream.cc@733
PS4, Line 733: read_it_.read_page_ == pages_.begin()
> I wonder if this assertion is always true, since in pinned mode, there can
This DCHECK was part of my debugging when investigating the bug.
The intent is to match assertion in NextReadPage (buffered-tuple-stream.cc:529).

Agree that this DCHECK most likely will always be true since we check earlier 
that attached_to_output_batch == true.
Will remove this DCHECK in next patch set.


http://gerrit.cloudera.org:8080/#/c/17195/4/be/src/runtime/buffered-tuple-stream.cc@750
PS4, Line 750: unpin this first page
> May add a comment on when this currently unpinned page is finally unpinned.
Ack



--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 13 Apr 2021 20:06:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10654: Fix precision loss in DecimalValue to double conversion.

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17303 )

Change subject: IMPALA-10654: Fix precision loss in DecimalValue to double 
conversion.
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8567/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17303
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56f0652cb8f81a491b87d9b108a94c00ae6c99a1
Gerrit-Change-Number: 17303
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 13 Apr 2021 19:58:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.

2021-04-13 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17195 )

Change subject: IMPALA-10584: Defer advancing read page when reservation is 
tight.
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17195/4/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

http://gerrit.cloudera.org:8080/#/c/17195/4/be/src/runtime/buffered-tuple-stream.cc@733
PS4, Line 733: read_it_.read_page_ == pages_.begin()
I wonder if this assertion is always true, since in pinned mode, there can be 
multiple pages in the stream in memory. The current read page (i.e. 
read_it_.read_page_) may be any of these pages?


http://gerrit.cloudera.org:8080/#/c/17195/4/be/src/runtime/buffered-tuple-stream.cc@750
PS4, Line 750: unpin this first page
May add a comment on when this currently unpinned page is finally unpinned.



--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 13 Apr 2021 19:53:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10654: Fix precision loss in DecimalValue to double conversion.

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17303 )

Change subject: IMPALA-10654: Fix precision loss in DecimalValue to double 
conversion.
..


Patch Set 5:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/thirdparty/fast_double_parser/fast_double_parser.h
File be/src/thirdparty/fast_double_parser/fast_double_parser.h:

http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/thirdparty/fast_double_parser/fast_double_parser.h@13
PS5, Line 13: #if (defined(sun) || defined(__sun))
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/thirdparty/fast_double_parser/fast_double_parser.h@17
PS5, Line 17: #if defined(__CYGWIN__) || defined(__MINGW32__) || 
defined(__MINGW64__)
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/thirdparty/fast_double_parser/fast_double_parser.h@22
PS5, Line 22:  * Determining whether we should import xlocale.h or not is
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/thirdparty/fast_double_parser/fast_double_parser.h@25
PS5, Line 25: #if defined(FAST_DOUBLE_PARSER_SOLARIS) || 
defined(FAST_DOUBLE_PARSER_CYGWIN)
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/thirdparty/fast_double_parser/fast_double_parser.h@67
PS5, Line 67: #endif //  defined(FAST_DOUBLE_PARSER_SOLARIS) || 
defined(FAST_DOUBLE_PARSER_CYGWIN)
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/thirdparty/fast_double_parser/fast_double_parser.h@84
PS5, Line 84:  * However, we have that
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/thirdparty/fast_double_parser/fast_double_parser.h@86
PS5, Line 86:  * Thus it is possible for a number of the form w * 10^-342 where
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/thirdparty/fast_double_parser/fast_double_parser.h@94
PS5, Line 94:  * Any number of form w * 10^309 where w>= 1 is going to be
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/thirdparty/fast_double_parser/fast_double_parser.h@153
PS5, Line 153: // credit: 
https://stackoverflow.com/questions/28868367/getting-the-high-part-of-64-bit-integer-multiplication
line too long (110 > 90)


http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/thirdparty/fast_double_parser/fast_double_parser.h@154
PS5, Line 154: really_inline uint64_t Emulate64x64to128(uint64_t& r_hi, const 
uint64_t x, const uint64_t y) {
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/thirdparty/fast_double_parser/fast_double_parser.h@159
PS5, Line 159: 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/thirdparty/fast_double_parser/fast_double_parser.h@224
PS5, Line 224:  */
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/thirdparty/fast_double_parser/fast_double_parser.h@958
PS5, Line 958:
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/thirdparty/fast_double_parser/fast_double_parser.h@960
PS5, Line 960:   // The exponent is 1024 + 63 + power
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/thirdparty/fast_double_parser/fast_double_parser.h@976
PS5, Line 976:   // The 65536 is (1<<16) and corresponds to
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/thirdparty/fast_double_parser/fast_double_parser.h@979
PS5, Line 979:   // ((152170 * power ) >> 16) is equal to
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/thirdparty/fast_double_parser/fast_double_parser.h@980
PS5, Line 980:   // floor(log(5**power)/log(2))
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/thirdparty/fast_double_parser/fast_double_parser.h@982
PS5, Line 982:   // Note that this is not magic: 152170/(1<<16) is
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/thirdparty/fast_double_parser/fast_double_parser.h@984
PS5, Line 984:   // The 1<<16 value is a power of two; we could use a
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/thirdparty/fast_double_parser/fast_double_parser.h@1097
PS5, Line 1097: #if defined(FAST_DOUBLE_PARSER_SOLARIS) || 
defined(FAST_DOUBLE_PARSER_CYGWIN)
line has trailing whitespace



--
To view, visit http://gerrit.cloudera.org:8080/17303
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56f0652cb8f81a491b87d9b108a94c00ae6c99a1
Gerrit-Change-Number: 17303
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 13 Apr 2021 19:38:33 +

[Impala-ASF-CR] IMPALA-10654: Fix precision loss in DecimalValue to double conversion.

2021-04-13 Thread Amogh Margoor (Code Review)
Amogh Margoor has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/17303 )

Change subject: IMPALA-10654: Fix precision loss in DecimalValue to double 
conversion.
..

IMPALA-10654: Fix precision loss in DecimalValue to double conversion.

Original approach to convert DecimalValue(internal representation
of decimals) to double was not accurate.
It was:
   static_cast(value_) / pow(10.0, scale).
However only integers from −2^53 to 2^53 can be represented
accurately by double precision without any loss.
Hence, it would not work for numbers like -0.43149576573887316.
For DecimalValue representing -0.43149576573887316, value_ would be
-43149576573887316 and scale would be 17. As value_ < -2^53,
 result would not be accurate. In newer approach we are using third
party library https://github.com/lemire/fast_double_parser, which
handles above scenario in a performant manner.

Testing:
1. Added End to End Tests covering following scenarios:
a. Test to show precision limitation of 16 in the write path
b. DecimalValue's value_ between -2^53 and 2^53.
b. value_ outside above range but abs(value_) < UINT64_MAX
c. abs(value_) > UINT64_MAX -covers DecimalValue<__int128_t>
2. Ran existing  backend and end-to-end tests completely

Change-Id: I56f0652cb8f81a491b87d9b108a94c00ae6c99a1
---
M be/src/runtime/decimal-value.inline.h
A be/src/thirdparty/fast_double_parser/LICENSE
A be/src/thirdparty/fast_double_parser/LICENSE.BSL
A be/src/thirdparty/fast_double_parser/README.md
A be/src/thirdparty/fast_double_parser/fast_double_parser.h
M bin/rat_exclude_files.txt
M testdata/workloads/functional-query/queries/QueryTest/values.test
7 files changed, 1,568 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/17303/5
-- 
To view, visit http://gerrit.cloudera.org:8080/17303
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56f0652cb8f81a491b87d9b108a94c00ae6c99a1
Gerrit-Change-Number: 17303
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17195 )

Change subject: IMPALA-10584: Defer advancing read page when reservation is 
tight.
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8566/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 13 Apr 2021 19:22:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP: IMPALA-10656: Fire insert events before commit

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17313 )

Change subject: WIP: IMPALA-10656: Fire insert events before commit
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8565/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 13 Apr 2021 19:17:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.

2021-04-13 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17195 )

Change subject: IMPALA-10584: Defer advancing read page when reservation is 
tight.
..


Patch Set 4:

(2 comments)

Patch set 4 submitted.

http://gerrit.cloudera.org:8080/#/c/17195/3/be/src/runtime/buffered-tuple-stream-test.cc
File be/src/runtime/buffered-tuple-stream-test.cc:

http://gerrit.cloudera.org:8080/#/c/17195/3/be/src/runtime/buffered-tuple-stream-test.cc@1237
PS3, Line 1237: TEST_F(SimpleTupleStreamTest, UnpinReadPage) {
> It is possible to reproduce the bug with some changes in this test?
Added DeferAdvancingReadPage to test this corner case.


http://gerrit.cloudera.org:8080/#/c/17195/3/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

http://gerrit.cloudera.org:8080/#/c/17195/3/be/src/runtime/buffered-tuple-stream.cc@723
PS3, Line 723:
> Does this assume the size of the read page is default_page_len_? Should we
That is a good point, thanks!
However, replacing it with read_it_.read_page_->len() caught this DCHECK 
failure: "buffer-pool.cc:93] Check failed: is_open()".
I think this because the the page handle has been closed in 
AttachBufferToBatch().
I made a work around by memorizing the page len in last_attached_page_len_ 
variable.



--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 13 Apr 2021 19:07:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.

2021-04-13 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17195

to look at the new patch set (#4).

Change subject: IMPALA-10584: Defer advancing read page when reservation is 
tight.
..

IMPALA-10584: Defer advancing read page when reservation is tight.

TestScratchLimit::test_with_unlimited_scratch_limit has been
intermittently crashing in ubuntu-16.04-dockerised-tests environment
after result spooling is enabled by default in IMPALA-9856. DCHECK
violation occurs in ReservationTracker::CheckConsistency() due to
BufferedTupleStream wrongly tries to reclaim memory reservation while
unpinning the stream.

For this bug to surface, all of the following needs to happen:
- Stream is in pinned mode.
- There are only 2 pages in the stream: 1 read and 1 write.
- Stream can not increase reservation anymore either due to memory
  pressure or low buffer/memory limit.
- The stream read page has been fully read and is attached to output
  RowBatch. But the output RowBatch has not cleaned up yet.
- BufferedTupleStream::UnpinStream is invoked.

The memory accounting bug happens because UnpinStream proceeds to
NextReadPage where the read page buffer was mistakenly assumed as
released. default_page_len_ bytes were added into
write_page_reservation_ and subsequently violates the total memory
reservation.

This patch fixes the bug by deferring advancement of the read iterator
in UnpinStream if the read page is attached to output RowBatch and there
is no more unused reservation. This is OK because after UnpinStream
finished, the stream is now in unpinned mode and has_read_write_page is
false. The next AddRow operation is then allowed to unpin the previous
write page first before reusing the reservation to allocate a new write
page.

Testing:
- Add be test DeferAdvancingReadPage.
- Loop the TestScratchLimit::test_with_unlimited_scratch_limit in my
  local dev machine and verify that each test passed without triggering
  the DCHECK violation.
- Reenable result spooling in TestScratchLimit that was disabled in
  IMPALA-10559.
- Pass core tests.

Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
---
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M tests/query_test/test_scratch_limit.py
4 files changed, 97 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17195/4
--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] WIP: IMPALA-10656: Fire insert events before commit

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17313 )

Change subject: WIP: IMPALA-10656: Fire insert events before commit
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17313/3/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/17313/3/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@879
PS3, Line 879: unpartTable.getFileSystem(), new 
Path(unpartTable.getHdfsBaseDir()), overwrite,
line too long (91 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 13 Apr 2021 18:57:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP: IMPALA-10656: Fire insert events before commit

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17313 )

Change subject: WIP: IMPALA-10656: Fire insert events before commit
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7066/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 13 Apr 2021 18:57:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP: IMPALA-10656: Fire insert events before commit

2021-04-13 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17313


Change subject: WIP: IMPALA-10656: Fire insert events before commit
..

WIP: IMPALA-10656: Fire insert events before commit

Before this fix Impala committed an insert first, then reloaded the table
from HMS, and generated the insert events based on the difference between
the two snapshots. (e.g. which file was not present in the old snapshot
but are there in the new one).

Hive replication expects the insert events before the commit, so this may
potentially lead to issues there.

The solution is to collect the new files during the insert in the backend,
and send the insert events based on this file set. This wasn't very hard
to do as we were alrady collecting the files in some cases:
- to move them from staging dir to their final location in case of
  non-partitioned tables
- to write the file list to snapshot files in case of Icebert tables
This patch unifies the paths above and collects all information about
the created files regardless of the table type.

Testing:
- no new tests
- I wasn't able to run EE tests yet, this is the reason behind the poc
  state
- done some basic manual testing

Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
---
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/output-partition.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/service/client-request-state.cc
M common/protobuf/control_service.proto
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
12 files changed, 225 insertions(+), 216 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/17313/3
--
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9732: Improve exceptions of unsupported HdfsTableSink formats

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17300 )

Change subject: IMPALA-9732: Improve exceptions of unsupported HdfsTableSink 
formats
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7fa2f949336a422acb4d01c9347b9b2e808e4aec
Gerrit-Change-Number: 17300
Gerrit-PatchSet: 3
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 13 Apr 2021 17:59:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9732: Improve exceptions of unsupported HdfsTableSink formats

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17300 )

Change subject: IMPALA-9732: Improve exceptions of unsupported HdfsTableSink 
formats
..

IMPALA-9732: Improve exceptions of unsupported HdfsTableSink formats

This change updates the exception that is thrown when the user tries to
insert into a partition which has unsupported format. The information to
make this decision is available during analysis, therefore this commit
also moves the check from the planner to the analyzer to have an
earlier result.

In the analyzer only the FeFsTables have to be checked therefore Kudu
tables are not related. Also, there is a difference between static and
dynamic partition clauses, for static partition clauses the partition
format is available during compile, for dynaminc partition clauses it is
only avaialble during runtime.

Testing:
 - Added unit tests
 - Ran exhaustive tests successfully

Change-Id: I7fa2f949336a422acb4d01c9347b9b2e808e4aec
Reviewed-on: http://gerrit.cloudera.org:8080/17300
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/datasets/functional/functional_schema_template.sql
4 files changed, 78 insertions(+), 21 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/17300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I7fa2f949336a422acb4d01c9347b9b2e808e4aec
Gerrit-Change-Number: 17300
Gerrit-PatchSet: 4
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-13 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..


Patch Set 20: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 20
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 13 Apr 2021 17:45:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-13 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 9:

(6 comments)

Looks good to me!

http://gerrit.cloudera.org:8080/#/c/17306/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17306/9//COMMIT_MSG@12
PS9, Line 12: estimation by setting larger `scale`. That scale is decided by SQL
in SQL function NDV(, )


http://gerrit.cloudera.org:8080/#/c/17306/9//COMMIT_MSG@11
PS9, Line 11: Since IMPALA-2658, we can trade memory for more accurate
: estimation by setting larger `scale`. That scale is decided by SQL
: writers. However, it is a bumpy road for cluster admins to allow 
for
: larger scales. Here lies 2 reasons:
May rewrite the sentence as follows.

Since IMPALA-2658, we can trade memory for more accurate
estimation by setting larger NDV scale in SQL function NDV(, ). 
However, the use of larger NDV scale requires the modification of the NDV 
function in queries which may not be practical in certain applications.


http://gerrit.cloudera.org:8080/#/c/17306/9//COMMIT_MSG@24
PS9, Line 24: In this commit, we introduced a new Query Option 
`DEFAULT_NDV_SCALE`.
: During to the advantage of query option, Cluster admins can 
either tune
: 1 desired query, or influence upcoming queries by placing a 
default
: query option in a dynamic resource pool.
This paragraph can be rewritten as follows.

In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE` with the 
following semantics.

1. The allowed value is in the range [2..10];
2. The value of the query option is the default NDV scale for the SQL function 
of form NDV(). Previously, the NDV scale used in such functions is fixed 
at 2;
3. It does not influence the NDV scale for SQL function NDV(, ) in 
which the NDV scale is provided by the 2nd argument .


http://gerrit.cloudera.org:8080/#/c/17306/9/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/17306/9/common/thrift/ImpalaService.thrift@655
PS9, Line 655: , make it easier to change NDV's scale
may change to " to SQL NDV function NDV(.


http://gerrit.cloudera.org:8080/#/c/17306/9/fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
File fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java:

http://gerrit.cloudera.org:8080/#/c/17306/9/fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java@58
PS9, Line 58: Do function name substitution
Probably should say "Create the substituted form".


http://gerrit.cloudera.org:8080/#/c/17306/9/fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
File fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java:

http://gerrit.cloudera.org:8080/#/c/17306/9/fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java@38
PS9, Line 38: DefaultNdvScaleRule
I wonder if this rule can be merged to the CountDistinctToNdvRule. That is, if 
APPX_COUNT_DISTINCT query option is set, we fire that rule. Within that rule, 
we check for query option DEFAULT_NDV_SCALE and form NDV() function in two 
ways: NDV() if the value of DEFAULT_NDV_SCALE is 2, or NDV(, 
) otherwise.



--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 9
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 13 Apr 2021 17:40:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky

2021-04-13 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17289 )

Change subject: IMPALA-10532: 
TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
..


Patch Set 4: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/17289
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f
Gerrit-Change-Number: 17289
Gerrit-PatchSet: 4
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 13 Apr 2021 17:28:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-13 Thread Fang-Yu Rao (Code Review)
Fang-Yu Rao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..


Patch Set 20: Code-Review+1

(4 comments)

Thanks Qifan for addressing my previous comments! I do not have any addition 
suggestion.

http://gerrit.cloudera.org:8080/#/c/17252/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17252/15//COMMIT_MSG@23
PS15, Line 23: receive
> Please see my comment in scan-node.cc.
Done


http://gerrit.cloudera.org:8080/#/c/17252/15//COMMIT_MSG@27
PS15, Line 27: lling
> Done
Done


http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/exec/scan-node.cc
File be/src/exec/scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/exec/scan-node.cc@239
PS15, Line 239: end
> Good question.
Thanks Qifan for the detailed explanation! I do not have any more comment.


http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/runtime/coordinator.cc@652
PS15, Line 652:   // Also add the min/max value for the accumulated filter 
as follows.
  :   //  'PartialUpdates' - The min and the max are partial
> Reword the comment as follows and try to avoid describe how the accumulated
Thanks Qifan for addressing my comment. I do not have any addition suggestion.



--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 20
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 13 Apr 2021 17:17:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats

2021-04-13 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17299 )

Change subject: IMPALA-10652: Optimize the checking of the size of incremental 
stats
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17299/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/17299/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@448
PS2, Line 448: incPartitionSize
May rename the variable to numOfAllIncStatsPartitions


http://gerrit.cloudera.org:8080/#/c/17299/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@452
PS2, Line 452: if (partitionSet_ == null) {
 :   incPartitionSize = allPartitions.size();
We may not need to verify the size limit when the partition set for incremental 
stats update is empty. The reason is that any incremental stats update done 
before has already passed the size test.



--
To view, visit http://gerrit.cloudera.org:8080/17299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee
Gerrit-Change-Number: 17299
Gerrit-PatchSet: 2
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 13 Apr 2021 16:49:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-13 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..


Patch Set 20: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 20
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 13 Apr 2021 16:01:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17289 )

Change subject: IMPALA-10532: 
TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8564/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17289
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f
Gerrit-Change-Number: 17289
Gerrit-PatchSet: 4
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 13 Apr 2021 15:57:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky

2021-04-13 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/17289 )

Change subject: IMPALA-10532: 
TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
..

IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky

This change disables the overlap min/max filter test for hdfs in
erasure coding, due to the query plan change (from 3-node scan to
2-node scan) which splits the row groups among scan nodes differently.

The SkipIfEC class in test harness skip.py is enhanced with a new
skip reason 'different_scan_split' to facilitate this action.

Testing:
  1. Ran unit tests;
  2. Ran core tests.

Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f
---
M tests/common/skip.py
M tests/query_test/test_runtime_filters.py
2 files changed, 4 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/17289/4
--
To view, visit http://gerrit.cloudera.org:8080/17289
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f
Gerrit-Change-Number: 17289
Gerrit-PatchSet: 4
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-10631: Upgrade DataSketches to version 3.0.0

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17294 )

Change subject: IMPALA-10631: Upgrade DataSketches to version 3.0.0
..


Patch Set 4: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7064/


--
To view, visit http://gerrit.cloudera.org:8080/17294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9
Gerrit-Change-Number: 17294
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 13 Apr 2021 13:42:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10655: Add ImpalaServer interface to Initialize TQueryCtx for external frontends

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17312 )

Change subject: IMPALA-10655: Add ImpalaServer interface to Initialize 
TQueryCtx for external frontends
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8563/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I59cebcf087e703a4ab49fb44f6f5ba1044f26546
Gerrit-Change-Number: 17312
Gerrit-PatchSet: 2
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 13 Apr 2021 13:41:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10655: Add ImpalaServer interface to Initialize TQueryCtx for external frontends

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17312 )

Change subject: IMPALA-10655: Add ImpalaServer interface to Initialize 
TQueryCtx for external frontends
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8562/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I59cebcf087e703a4ab49fb44f6f5ba1044f26546
Gerrit-Change-Number: 17312
Gerrit-PatchSet: 1
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 13 Apr 2021 13:34:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..


Patch Set 20:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8561/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 20
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 13 Apr 2021 13:25:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..


Patch Set 19:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8560/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 19
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 13 Apr 2021 13:23:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10655: Add ImpalaServer interface to Initialize TQueryCtx for external frontends

2021-04-13 Thread Kurt Deschler (Code Review)
Hello Aman Sinha, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17312

to look at the new patch set (#2).

Change subject: IMPALA-10655: Add ImpalaServer interface to Initialize 
TQueryCtx for external frontends
..

IMPALA-10655: Add ImpalaServer interface to Initialize TQueryCtx for external 
frontends

This patch adds a new interface that returns an initialized TQueryCtx
for use in requests submitted by external frontends. This is necessary
to ensure that the query context in an externally generated TExecRequest
has appropriate coordinator metadata.

Testing: External frontend regression tests

Change-Id: I59cebcf087e703a4ab49fb44f6f5ba1044f26546
Reviewed-by: Aman Sinha 
---
M be/src/rpc/hs2-http-test.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-server.h
M common/thrift/ImpalaService.thrift
M tests/hs2/test_hs2.py
M tests/run-tests.py
6 files changed, 35 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/17312/2
--
To view, visit http://gerrit.cloudera.org:8080/17312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I59cebcf087e703a4ab49fb44f6f5ba1044f26546
Gerrit-Change-Number: 17312
Gerrit-PatchSet: 2
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10655: Add ImpalaServer interface to Initialize TQueryCtx for external frontends

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17312 )

Change subject: IMPALA-10655: Add ImpalaServer interface to Initialize 
TQueryCtx for external frontends
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17312/1/be/src/service/impala-hs2-server.cc
File be/src/service/impala-hs2-server.cc:

http://gerrit.cloudera.org:8080/#/c/17312/1/be/src/service/impala-hs2-server.cc@575
PS1, Line 575:   const ThriftServer::ConnectionContext* connection_context = 
ThriftServer::GetThreadConnectionContext();
line too long (105 > 90)


http://gerrit.cloudera.org:8080/#/c/17312/1/be/src/service/impala-hs2-server.cc@577
PS1, Line 577: HS2_RETURN_ERROR(return_val, "Unsupported operation", 
SQLSTATE_OPTIONAL_FEATURE_NOT_IMPLEMENTED);
line too long (101 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I59cebcf087e703a4ab49fb44f6f5ba1044f26546
Gerrit-Change-Number: 17312
Gerrit-PatchSet: 1
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 13 Apr 2021 13:18:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10655: Add ImpalaServer interface to Initialize TQueryCtx for external frontends

2021-04-13 Thread Kurt Deschler (Code Review)
Hello Aman Sinha,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/17312

to review the following change.


Change subject: IMPALA-10655: Add ImpalaServer interface to Initialize 
TQueryCtx for external frontends
..

IMPALA-10655: Add ImpalaServer interface to Initialize TQueryCtx for external 
frontends

This patch adds a new interface that returns an initialized TQueryCtx
for use in requests submitted by external frontends. This is necessary
to ensure that the query context in an externally generated TExecRequest
has appropriate coordinator metadata.

Testing: External frontend regression tests

Change-Id: I59cebcf087e703a4ab49fb44f6f5ba1044f26546
Reviewed-by: Aman Sinha 
---
M be/src/rpc/hs2-http-test.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-server.h
M common/thrift/ImpalaService.thrift
M tests/hs2/test_hs2.py
M tests/run-tests.py
6 files changed, 33 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/17312/1
--
To view, visit http://gerrit.cloudera.org:8080/17312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I59cebcf087e703a4ab49fb44f6f5ba1044f26546
Gerrit-Change-Number: 17312
Gerrit-PatchSet: 1
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Aman Sinha 


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-13 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#20). ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..

IMPALA-10647 Improve always-true min/max filter handling in coordinator

The change improves how a coordinator behaves when a just
arriving min/max filter is always true. A new member
'always_true_filter_received_' is introduced to record such a
fact. Similarily, the new member always_false_flipped_to_false_
is added to indicate that the always false flag is flipped from
'true' to 'false'. These two members only influence how the min
and max columns in "Filter routing table" and "Final filter
table" in profile are displayed as follows.

  1. 'PartialUpdates' - The min and the max are partially updated;
  2. 'AlwaysTrue' - One received filter is AlwaysTrue;
  3. 'AlwaysFalse'- No filter is received or all received
filters are empty;
  4. 'Real values'- The final accumulated min/max from all
received filters.

A second change introduced is to record, in scan node, the
arrival time of min/max filters (as a timestamp since the system
is rebooted, obtained by calling MonotonicMillis()). A timestamp
of similar nature is recorded for hdfs parquet scanners when a
row group is processed. By comparing these two timestamps, one
can easily diagnose issues related to late arrival of min/max
filters.

Testing:
  1. Added three new tests in overlap_min_max_filters.test to
 verify that the min/max are displayed correctly when the
 min/max filter in hash join builder is set to always true,
 always false, or a pair of meaningful min and max values.
  2. Ran unit tests;
  3. Ran core tests successfully.

Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M 
testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
8 files changed, 221 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/17252/20
--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 20
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-13 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#19). ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..

IMPALA-10647 Improve always-true min/max filter handling in coordinator

The change improves how a coordinator behaves when a just
arriving min/max filter is always true. A new member
'always_true_filter_received_' is introduced to record such a
fact. Similarily, the new member always_false_flipped_to_false_
is added to indicate that the always false flag is flipped from
'true' to 'false'. These two members only influence how the min
and max columns in "Filter routing table" and "Final filter
table" in profile are displayed as follows.

  1. 'PartialUpdates' - The min and the max are partially updated;
  2. 'AlwaysTrue' - One received filter is AlwaysTrue;
  3. 'AlwaysFalse'- No filter is received or all received
filters are empty;
  4. 'Real values'- The final accumulated min/max from all
received filters.

A second change introduced is to record, in scan node, the
arrival time of min/max filters (as a timestamp since the system
is rebooted, obtained by calling MonotonicMillis()). A timestamp
of similar nature is recorded for hdfs parquet scanners when a
row group is processed. By comparing these two timestamps, one
can easily diagnose issues related to late arrival of min/max
filters.

Testing:
  1. Added three new tests in overlap_min_max_filters.test to
 verify that the min/max are displayed corrected when the
 min/max filter in hash join builder is set to always true,
 always false, or a pair of meaningful min and max values.
  2. Ran unit tests;
  3. Ran core tests.

Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M 
testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
8 files changed, 221 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/17252/19
--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 19
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.

2021-04-13 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17195 )

Change subject: IMPALA-10584: Defer advancing read page when reservation is 
tight.
..


Patch Set 3:

(2 comments)

Thanks for the investigation! Still digesting the context. Left some questions 
first.

http://gerrit.cloudera.org:8080/#/c/17195/3/be/src/runtime/buffered-tuple-stream-test.cc
File be/src/runtime/buffered-tuple-stream-test.cc:

http://gerrit.cloudera.org:8080/#/c/17195/3/be/src/runtime/buffered-tuple-stream-test.cc@1237
PS3, Line 1237: TEST_F(SimpleTupleStreamTest, UnpinReadPage) {
It is possible to reproduce the bug with some changes in this test?


http://gerrit.cloudera.org:8080/#/c/17195/3/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

http://gerrit.cloudera.org:8080/#/c/17195/3/be/src/runtime/buffered-tuple-stream.cc@723
PS3, Line 723: default_page_len_
Does this assume the size of the read page is default_page_len_? Should we 
handle large read page (size=max_page_len_) here and use 
read_it_.read_page_->len() instead?



--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 13 Apr 2021 12:57:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9732: Improve exceptions of unsupported HdfsTableSink formats

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17300 )

Change subject: IMPALA-9732: Improve exceptions of unsupported HdfsTableSink 
formats
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7fa2f949336a422acb4d01c9347b9b2e808e4aec
Gerrit-Change-Number: 17300
Gerrit-PatchSet: 3
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 13 Apr 2021 12:12:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9732: Improve exceptions of unsupported HdfsTableSink formats

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17300 )

Change subject: IMPALA-9732: Improve exceptions of unsupported HdfsTableSink 
formats
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7065/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7fa2f949336a422acb4d01c9347b9b2e808e4aec
Gerrit-Change-Number: 17300
Gerrit-PatchSet: 3
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 13 Apr 2021 12:12:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9732: Improve exceptions of unsupported HdfsTableSink formats

2021-04-13 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17300 )

Change subject: IMPALA-9732: Improve exceptions of unsupported HdfsTableSink 
formats
..


Patch Set 2: Code-Review+2

Thanks for the changes!


--
To view, visit http://gerrit.cloudera.org:8080/17300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7fa2f949336a422acb4d01c9347b9b2e808e4aec
Gerrit-Change-Number: 17300
Gerrit-PatchSet: 2
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 13 Apr 2021 12:11:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8559/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 9
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 13 Apr 2021 10:22:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-13 Thread fifteencai (Code Review)
fifteencai has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..

IMPALA-10445: Adjust NDV's scale with query option

This is a new way to control NDV's scale.

Since IMPALA-2658, we can trade memory for more accurate
estimation by setting larger `scale`. That scale is decided by SQL
writers. However, it is a bumpy road for cluster admins to allow for
larger scales. Here lies 2 reasons:

- Firstly, SQL writers are reluctant to low the scale. They prone
to fill up the scale, which will make the cluster unstable, especially
when there are `group by`s with high cardinalities. So it is wiser to
let cluster admin instead of sql writer choose appropriate scale.

- Secondly, In some application scenarios, queries are stored in DBs.
In a BI system, for example, rewriting thousands of SQLs is risky.

In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`.
During to the advantage of query option, Cluster admins can either tune
1 desired query, or influence upcoming queries by placing a default
query option in a dynamic resource pool.

We also refactored method `Analyze` to make sure APPX_COUNT_DISTINCT
can work with this query option. After this, cluster admins can degrade
service level by transforming `count(distinct id)` to `ndv(id, scale)`.

Implementation details:

- The default value of DEFAULT_NDV_SCALE is 2, so we won't change
the default ndv behavior.
- We port `CountDistinctToNdv` transform logic from
`SelectStmt.analyze()` to `ExprRewriter`, making it compatible with
further rewrite rules.
- The newly added rewrite rule `DefaultNdvScaleRule` is applied
after `CountDistinctToNdvRule`.

Usage:

To set a default ndv scale:
```
SET DEFAULT_NDV_SCALE = 10;  -- ranges from 1 to 10, both inclusive.
```

To unset:
```
SET DEFAULT_NDV_SCALE = 2;
```

Here are test results of a typical workload (cardinality=40,090,650):
++
|   Metric| Count Distinct |NDV2|NDV5|NDV10  |
++
|  Memory(GB) |   3.83 |1.84|1.85| 1.89  |
| Duration(s) |  182.89|   30.22|29.72   | 29.24 |
|  ErrorRate  |0%  |1.8%|1.17%   | 0.06% |
++

Testing:
1) Added 3 unit test cases in `ExprRewriteRulesTest`.
2) Added 5 unit test cases in `ExprRewriterTest`.
3) Ran all front-end unit test, passed.
4) Added a new query-option test.

Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
11 files changed, 251 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/9
--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 9
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10631: Upgrade DataSketches to version 3.0.0

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17294 )

Change subject: IMPALA-10631: Upgrade DataSketches to version 3.0.0
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7064/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9
Gerrit-Change-Number: 17294
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 13 Apr 2021 07:59:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10631: Upgrade DataSketches to version 3.0.0

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17294 )

Change subject: IMPALA-10631: Upgrade DataSketches to version 3.0.0
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9
Gerrit-Change-Number: 17294
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 13 Apr 2021 07:59:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's precision/scale by query option

2021-04-13 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's precision/scale by query option
..


Patch Set 8: Code-Review+1

(8 comments)

Thanks for contributing this feature! The solution LGTM. I only have comments 
on code style.

+Aman and Qifan if they want to have a look.

http://gerrit.cloudera.org:8080/#/c/17306/8/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/17306/8/be/src/service/query-options.cc@1074
PS8, Line 1074: value
nit: interger value


http://gerrit.cloudera.org:8080/#/c/17306/8/fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
File fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java:

http://gerrit.cloudera.org:8080/#/c/17306/8/fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java@18
PS8, Line 18: package org.apache.impala.rewrite;
nit: need a blank line after this


http://gerrit.cloudera.org:8080/#/c/17306/8/fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java@22
PS8, Line 22: import org.apache.impala.analysis.Analyzer;
nit: need a blank line after this


http://gerrit.cloudera.org:8080/#/c/17306/8/fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java@25
PS8, Line 25:  *
:  * */
nit: correct this to */


http://gerrit.cloudera.org:8080/#/c/17306/8/fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java@27
PS8, Line 27: {
nit: need a whitespace here


http://gerrit.cloudera.org:8080/#/c/17306/8/fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java@41
PS8, Line 41: if 
(!analyzer.getQueryCtx().client_request.query_options.appx_count_distinct)
: return expr;
nit: multiline if-statement needs brackets


http://gerrit.cloudera.org:8080/#/c/17306/8/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

http://gerrit.cloudera.org:8080/#/c/17306/8/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@790
PS8, Line 790: session.options().setDefault_ndv_scale(10);
For test coverage, could you add a test without this option? i.e. add the 
following test before this line

 RewritesOk("ndv(bool_col)", rules, "ndv(bool_col)");


http://gerrit.cloudera.org:8080/#/c/17306/8/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java:

http://gerrit.cloudera.org:8080/#/c/17306/8/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@539
PS8, Line 539:
nit: remove the blank line at the end



--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 8
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 13 Apr 2021 07:59:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10631: Upgrade DataSketches to version 3.0.0

2021-04-13 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17294 )

Change subject: IMPALA-10631: Upgrade DataSketches to version 3.0.0
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9
Gerrit-Change-Number: 17294
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 13 Apr 2021 07:57:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9732: Improve exceptions of unsupported HdfsTableSink formats

2021-04-13 Thread Tamas Mate (Code Review)
Tamas Mate has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17300 )

Change subject: IMPALA-9732: Improve exceptions of unsupported HdfsTableSink 
formats
..


Patch Set 2:

(3 comments)

Hi Zoltan, thank you for the review. the change has been updated.

http://gerrit.cloudera.org:8080/#/c/17300/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17300/1//COMMIT_MSG@13
PS1, Line 13: earlier
> earlier
Done


http://gerrit.cloudera.org:8080/#/c/17300/1//COMMIT_MSG@15
PS1, Line 15: Kudu
> Actually Iceberg tables are related because they might use Parquet or ORC u
Done


http://gerrit.cloudera.org:8080/#/c/17300/1//COMMIT_MSG@16
PS1, Line 16: related
> related
Done



--
To view, visit http://gerrit.cloudera.org:8080/17300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7fa2f949336a422acb4d01c9347b9b2e808e4aec
Gerrit-Change-Number: 17300
Gerrit-PatchSet: 2
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 13 Apr 2021 07:50:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9732: Improve exceptions of unsupported HdfsTableSink formats

2021-04-13 Thread Tamas Mate (Code Review)
Hello Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17300

to look at the new patch set (#2).

Change subject: IMPALA-9732: Improve exceptions of unsupported HdfsTableSink 
formats
..

IMPALA-9732: Improve exceptions of unsupported HdfsTableSink formats

This change updates the exception that is thrown when the user tries to
insert into a partition which has unsupported format. The information to
make this decision is available during analysis, therefore this commit
also moves the check from the planner to the analyzer to have an
earlier result.

In the analyzer only the FeFsTables have to be checked therefore Kudu
tables are not related. Also, there is a difference between static and
dynamic partition clauses, for static partition clauses the partition
format is available during compile, for dynaminc partition clauses it is
only avaialble during runtime.

Testing:
 - Added unit tests
 - Ran exhaustive tests successfully

Change-Id: I7fa2f949336a422acb4d01c9347b9b2e808e4aec
---
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/datasets/functional/functional_schema_template.sql
4 files changed, 78 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/17300/2
--
To view, visit http://gerrit.cloudera.org:8080/17300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7fa2f949336a422acb4d01c9347b9b2e808e4aec
Gerrit-Change-Number: 17300
Gerrit-PatchSet: 2
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10282: Implement ds cpc sketch() and ds cpc estimate() functions

2021-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16656 )

Change subject: IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() 
functions
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8558/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16656
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I731e66fbadc74bc339c973f4d9337db9b7dd715a
Gerrit-Change-Number: 16656
Gerrit-PatchSet: 8
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 13 Apr 2021 06:43:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky

2021-04-13 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17289 )

Change subject: IMPALA-10532: 
TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17289/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17289/2//COMMIT_MSG@7
PS2, Line 7: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
nit: maybe this is more accurate:

 Skip TestOverlapMinMaxFilters.test_overlap_min_max_filters when EC is enabled



--
To view, visit http://gerrit.cloudera.org:8080/17289
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f
Gerrit-Change-Number: 17289
Gerrit-PatchSet: 2
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 13 Apr 2021 06:36:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10282: Implement ds cpc sketch() and ds cpc estimate() functions

2021-04-13 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/16656 )

Change subject: IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() 
functions
..

IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions

These functions can be used to get cardinality estimates of data
using CPC algorithm from Apache DataSketches. ds_cpc_sketch()
receives a dataset, e.g. a column from a table, and returns a
serialized CPC sketch in string format. This can be written to a
table or be fed directly to ds_cpc_estimate() that returns the
cardinality estimate for that sketch.

Similar to the HLL sketch, the primary use-case for the CPC sketch
is for counting distinct values as a stream, and then merging
multiple sketches together for a total distinct count.

For more details about Apache DataSketches' CPC see:
http://datasketches.apache.org/docs/CPC/CPC.html
Figures-of-Merit Comparison of the HLL and CPC Sketches see:
https://datasketches.apache.org/docs/DistinctCountMeritComparisons.html

Testing:
 - Added some tests running estimates for small datasets where the
   amount of data is small enough to get the correct results.
 - Ran manual tests on tpch_parquet.lineitem to compare perfomance
   with ndv(). Depending on data characteristics ndv() appears 2x-3x
   faster. CPC gives closer estimate than current ndv(). CPC is more
   accurate than HLL in some cases

Change-Id: I731e66fbadc74bc339c973f4d9337db9b7dd715a
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-common.h
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/cpc_sketches_from_hive.parquet
A testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test
M tests/query_test/test_datasketches.py
12 files changed, 398 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/16656/8
--
To view, visit http://gerrit.cloudera.org:8080/16656
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I731e66fbadc74bc339c973f4d9337db9b7dd715a
Gerrit-Change-Number: 16656
Gerrit-PatchSet: 8
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins