[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17289 )

Change subject: IMPALA-10532: 
TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
..


Patch Set 6: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17289
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f
Gerrit-Change-Number: 17289
Gerrit-PatchSet: 6
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 15 Apr 2021 04:02:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17289 )

Change subject: IMPALA-10532: 
TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7077/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17289
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f
Gerrit-Change-Number: 17289
Gerrit-PatchSet: 6
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 15 Apr 2021 04:02:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17299 )

Change subject: IMPALA-10652: Optimize the checking of the size of incremental 
stats
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8586/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee
Gerrit-Change-Number: 17299
Gerrit-PatchSet: 4
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Thu, 15 Apr 2021 02:46:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky

2021-04-14 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17289 )

Change subject: IMPALA-10532: 
TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
..


Patch Set 5:

retest


--
To view, visit http://gerrit.cloudera.org:8080/17289
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f
Gerrit-Change-Number: 17289
Gerrit-PatchSet: 5
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 15 Apr 2021 02:33:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-14 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..


Patch Set 21:

retest


--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 15 Apr 2021 02:33:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats

2021-04-14 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17299 )

Change subject: IMPALA-10652: Optimize the checking of the size of incremental 
stats
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17299/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/17299/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1784
PS3, Line 1784: fu
> nit. table alltypes?
Done



--
To view, visit http://gerrit.cloudera.org:8080/17299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee
Gerrit-Change-Number: 17299
Gerrit-PatchSet: 4
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Thu, 15 Apr 2021 02:27:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats

2021-04-14 Thread liuyao (Code Review)
Hello Aman Sinha, Qifan Chen, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17299

to look at the new patch set (#4).

Change subject: IMPALA-10652: Optimize the checking of the size of incremental 
stats
..

IMPALA-10652: Optimize the checking of the size of incremental stats

Modify the estimation method of incremental statistics size:
incremental statistics size = Existing partition statistics
+ This time calculation partition stats
- Repeated calculation partition stats

Testing:
All partitions of a table have no incremental stats.
--Calculate the incremental stats of all partitions,
  the incremental stats size exceeds the threshold,
  an error is reported.
--Calculate the incremental stats of one partition,
  no error is reported.

Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
2 files changed, 45 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/17299/4
--
To view, visit http://gerrit.cloudera.org:8080/17299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee
Gerrit-Change-Number: 17299
Gerrit-PatchSet: 4
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-10631: Upgrade DataSketches to version 3.0.0

2021-04-14 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17294 )

Change subject: IMPALA-10631: Upgrade DataSketches to version 3.0.0
..


Patch Set 4:

All tests run with the pre-review-test job passed, see: 
https://jenkins.impala.io/job/pre-review-test/909/. The failed test in the 
gerrit-verify-dryrun job (query_test/test_fetch.py, from: 
https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/13658/) did not 
reappear.
How to deal with this situation, re-run the gerrit-verify-dryrun job?


--
To view, visit http://gerrit.cloudera.org:8080/17294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9
Gerrit-Change-Number: 17294
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 15 Apr 2021 01:57:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10645: Log catalogd HMS API metrics

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17284 )

Change subject: IMPALA-10645: Log catalogd HMS API metrics
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8585/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17284
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id41afe89bbe3395c158919bddd09f302c6752287
Gerrit-Change-Number: 17284
Gerrit-PatchSet: 7
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 15 Apr 2021 01:10:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10645: Log catalogd HMS API metrics

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17284 )

Change subject: IMPALA-10645: Log catalogd HMS API metrics
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7076/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17284
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id41afe89bbe3395c158919bddd09f302c6752287
Gerrit-Change-Number: 17284
Gerrit-PatchSet: 7
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 15 Apr 2021 00:50:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10645: Log catalogd HMS API metrics

2021-04-14 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/17284 )

Change subject: IMPALA-10645: Log catalogd HMS API metrics
..

IMPALA-10645: Log catalogd HMS API metrics

Expose rpc duration, cache hit ratio, etc for Catalogd HMS APIs.
The metrics currently are only logged at debug level
when the catalogd starts a HMS endpoint. A followup
will be done separately to expose them to the debug UI.

This patch was originally contributed by Kishen Das.

Testing:
1. Deployed the catalogd's metastore server and made sure that
the metrics are logged in the catalogd.INFO logs.

Change-Id: Id41afe89bbe3395c158919bddd09f302c6752287
---
M common/thrift/JniCatalog.thrift
M common/thrift/metrics.json
M fe/src/main/java/org/apache/impala/catalog/CatalogHmsAPIHelper.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M 
fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java
M 
fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java
A fe/src/main/java/org/apache/impala/catalog/metastore/HmsApiNameEnum.java
M 
fe/src/main/java/org/apache/impala/catalog/metastore/ICatalogMetastoreServer.java
M 
fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M 
fe/src/main/java/org/apache/impala/catalog/metastore/NoOpCatalogMetastoreServer.java
M fe/src/main/java/org/apache/impala/catalog/monitor/CatalogMonitor.java
11 files changed, 455 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/17284/7
--
To view, visit http://gerrit.cloudera.org:8080/17284
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id41afe89bbe3395c158919bddd09f302c6752287
Gerrit-Change-Number: 17284
Gerrit-PatchSet: 7
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-10645: Log catalogd HMS API metrics

2021-04-14 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17284 )

Change subject: IMPALA-10645: Log catalogd HMS API metrics
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17284/4/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java
File 
fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java:

http://gerrit.cloudera.org:8080/#/c/17284/4/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@96
PS4, Line 96:   public static final String CATALOGD_CACHE_HIT_RATIO_METRIC = 
"catalogd-hms-cache.cache-hit-ratio";
> line too long (100 > 90)
Removed this since it was unused.



--
To view, visit http://gerrit.cloudera.org:8080/17284
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id41afe89bbe3395c158919bddd09f302c6752287
Gerrit-Change-Number: 17284
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 15 Apr 2021 00:49:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions.

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table cache for hms ddl apis 
which modify tables and partitions.
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8584/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 5
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 15 Apr 2021 00:40:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions.

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table cache for hms ddl apis 
which modify tables and partitions.
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17298/5/tests/custom_cluster/test_metastore_service.py
File tests/custom_cluster/test_metastore_service.py:

http://gerrit.cloudera.org:8080/#/c/17298/5/tests/custom_cluster/test_metastore_service.py@565
PS5, Line 565: q
flake8: E501 line too long (96 > 90 characters)



--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 5
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 15 Apr 2021 00:21:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions.

2021-04-14 Thread Sourabh Goyal (Code Review)
Hello Quanlong Huang, Vihang Karajgaonkar, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17298

to look at the new patch set (#5).

Change subject: IMPALA-10648: Invalidate catalogd table cache for hms ddl apis 
which modify tables and partitions.
..

IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify 
tables and partitions.

For non transactional tables, invalidate the table from cache if HMS DDL apis 
are accessed
from catalogd's metastore server. Any subsequent get table request fetches the 
table from
HMS and loads it in cache. This ensures that any get_table/get_partition 
requests after DDL
operations on the same table return updated table. This behavior has a 
performance penalty
(since table loading in cache takes time) but ensures consistency. This change 
is behind catalogd
server's flag: invalidate_hms_cache_on_ddls which is enabled by default. The 
flag needs to be
turned off if this change becomes a performance bottleneck.

Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M 
fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M tests/custom_cluster/test_metastore_service.py
6 files changed, 508 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/17298/5
--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 5
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions.

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table cache for hms ddl apis 
which modify tables and partitions.
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8583/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 4
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 14 Apr 2021 23:38:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-7825: Upgrade Thrift version to 0.11.0

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17170 )

Change subject: WIP IMPALA-7825: Upgrade Thrift version to 0.11.0
..


Patch Set 5: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7075/


--
To view, visit http://gerrit.cloudera.org:8080/17170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Gerrit-Change-Number: 17170
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 14 Apr 2021 23:27:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions.

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table cache for hms ddl apis 
which modify tables and partitions.
..


Patch Set 4:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/17298/4/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
File 
fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java:

http://gerrit.cloudera.org:8080/#/c/17298/4/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@1067
PS4, Line 1067:   
client.getHiveClient().getThriftClient().drop_partitions_req(dropPartitionsRequest);
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py
File tests/custom_cluster/test_metastore_service.py:

http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@470
PS4, Line 470: .
flake8: E131 continuation line unaligned for hanging indent


http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@476
PS4, Line 476: e
flake8: E501 line too long (92 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@489
PS4, Line 489: #
flake8: E265 block comment should start with '# '


http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@490
PS4, Line 490: e
flake8: E501 line too long (92 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@496
PS4, Line 496:
flake8: E501 line too long (96 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@499
PS4, Line 499: t
flake8: E501 line too long (92 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@526
PS4, Line 526: #
flake8: E303 too many blank lines (2)


http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@531
PS4, Line 531:
flake8: E501 line too long (96 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@533
PS4, Line 533: t
flake8: E501 line too long (92 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@550
PS4, Line 550: #
flake8: E303 too many blank lines (2)


http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@554
PS4, Line 554:
flake8: E501 line too long (96 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@561
PS4, Line 561: e
flake8: E501 line too long (100 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@568
PS4, Line 568: q
flake8: E501 line too long (96 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@577
PS4, Line 577:
flake8: E251 unexpected spaces around keyword / parameter equals


http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@577
PS4, Line 577:
flake8: E251 unexpected spaces around keyword / parameter equals


http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@578
PS4, Line 578:
flake8: E251 unexpected spaces around keyword / parameter equals


http://gerrit.cloudera.org:8080/#/c/17298/4/tests/custom_cluster/test_metastore_service.py@578
PS4, Line 578:
flake8: E251 unexpected spaces around keyword / parameter equals



--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 4
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Comment-Date: Wed, 14 Apr 2021 23:16:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions.

2021-04-14 Thread Sourabh Goyal (Code Review)
Sourabh Goyal has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table cache for hms ddl apis 
which modify tables and partitions.
..


Patch Set 4:

(26 comments)

http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
File 
fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java:

http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@141
PS3, Line 141: import org.apache.hadoop.hive.metastore.api.LockRequest;
> Is this used somewhere?
No. Will remove it. Thanks for pointing it out.


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@222
PS3, Line 222: UnlockRequest;
> Is this used somewhere?
Will remove it.


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@347
PS3, Line 347: "is
> nit: We use 4 spaces indention for multi-line statement. However, many chan
Ack


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@348
PS3, Line 348:   }
> line too long (91 > 90)
Ack


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@349
PS3, Line 349:
> nit: 4 spaces indention
Ack


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@890
PS3, Line 890:   
client.getHiveClient().getThriftClient().add_partition(partition);
> line too long (99 > 90)
Ack


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@891
PS3, Line 891:   
invalidateNonTransactionalTableIfExists(partition.getDbName(),
> line too long (94 > 90)
Ack


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@904
PS3, Line 904:   
invalidateNonTransactionalTableIfExists(partition.getDbName(),
> line too long (94 > 90)
Ack


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@914
PS3, Line 914: try (MetaStoreClient client = catalog_.getMetaStoreClient()) 
{
> line too long (102 > 90)
Ack


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@915
PS3, Line 915:
> Is it possible that these partitions belong to different tables?
Very likely the partitions belong to the same table. I will confirm and make 
the change accordingly.
Thanks for pointing it out.


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@929
PS3, Line 929: int numPartitionsAdded =  client.getHiveCl
> Same question as above
Will check this one too.


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@930
PS3, Line 930:   .getThriftClient().add_partitions_pspec(list);
> line too long (104 > 90)
Ack


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@955
PS3, Line 955:   AddPartitionsResult result =  
client.getHiveClient().getThriftClient()
> line too long (114 > 90)
Ack


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@968
PS3, Line 968: try (MetaStoreClient client = catalog_.getMetaStoreClient()) 
{
> line too long (104 > 90)
Ack


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@994
PS3, Line 994:   throws InvalidObjectException, AlreadyExistsException, 
MetaException, TException {
> line too long (91 > 90)
Ack


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@1021
PS3, Line 1021:   throws NoSuchObjectException, MetaException, TException {
> line too long (94 > 90)
Ack


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@1034
PS3, Line 1034:   boolean deleteData)
> line too long (106 > 90)
Ack


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@1049
PS3, Line 1049:   String partName, boolean deleteData, EnvironmentContext 
envContext)
> line too long (91 > 90)
Ack



[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions.

2021-04-14 Thread Sourabh Goyal (Code Review)
Hello Quanlong Huang, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17298

to look at the new patch set (#4).

Change subject: IMPALA-10648: Invalidate catalogd table cache for hms ddl apis 
which modify tables and partitions.
..

IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify 
tables and partitions.

For non transactional tables, invalidate the table from cache if HMS DDL apis 
are accessed
from catalogd's metastore server. Any subsequent get table request fetches the 
table from
HMS and loads it in cache. This ensures that any get_table/get_partition 
requests after DDL
operations on the same table return updated table. This behavior has a 
performance penalty
(since table loading in cache takes time) but ensures consistency. This change 
is behind catalogd
server's flag: invalidate_hms_cache_on_ddls which is enabled by default. The 
flag needs to be
turned off if this change becomes a performance bottleneck.

Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M 
fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M tests/custom_cluster/test_metastore_service.py
6 files changed, 510 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/17298/4
--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 4
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page if stream only has 2 pages.

2021-04-14 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17195 )

Change subject: IMPALA-10584: Defer advancing read page if stream only has 2 
pages.
..


Patch Set 6:

> Patch Set 6:
>
> (8 comments)
>
> Patch set 5 change the condition on when to NOT advance the read page.
> Read page will not be advanced in UnpinStream if read page is attached to 
> output RowBatch and there are only 2 pages in the stream.

Sorry, I should mention patch set 6 instead of 5.


--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 14 Apr 2021 22:43:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page if stream only has 2 pages.

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17195 )

Change subject: IMPALA-10584: Defer advancing read page if stream only has 2 
pages.
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8582/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 14 Apr 2021 22:36:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page if stream only has 2 pages.

2021-04-14 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17195 )

Change subject: IMPALA-10584: Defer advancing read page if stream only has 2 
pages.
..


Patch Set 6:

(8 comments)

Patch set 5 change the condition on when to NOT advance the read page.
Read page will not be advanced in UnpinStream if read page is attached to 
output RowBatch and there are only 2 pages in the stream.

http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc
File be/src/runtime/buffered-tuple-stream-test.cc:

http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1271
PS5, Line 1271: 
ASSERT_OK(stream.UnpinStream(BufferedTupleStream::UNPIN_ALL_EXCEPT_CURRENT));
> Not related to this patch, but I'm confused how do we verify we have advanc
I think this verify the read page has been advanced by not hitting DCHECK at 
ExpectedPinCount().


http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1300
PS5, Line 1300: // Test that UnpinStream defer advancing the read page when all 
rows from the read page
> Could you add some comments above this? e.g.
Done


http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1333
PS5, Line 1333: ASSERT_TRUE(read_batch.num_rows() < num_rows);
> Could you add a comment here that we are adding rows without releasing the
Done


http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1334
PS5, Line 1334: ASSERT_TRUE(!eos);
> Could you leave a comment about why do we run this twice? I guess we want t
Done


http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1354
PS5, Line 1354:   // Retry inserting this row by decreasing the index.
> Could you add ASSERT_FALSE(stream.is_pinned()) at the end?
Done


http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1356
PS5, Line 1356:   // even if we're not immediately cleaning up the 
read_batch.
> Do we need read_batch.Reset() as well?
Done


http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc@728
PS5, Line 728:   // defer advancing the read page until the next 
GetNext() call by the reader
 :   // (see IMPALA-10584).
 :   defer_advancing_read_page = true;
 : }
 :   }
 :
 :   if 
> Ok, after some investigation, I'm not confident I can implement solution 2)
Marking this as Done. Lets continue this discussion in patch set 5.


http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc@747
PS5, Line 747:
> May DCHECK() on this assertion here.
Done



--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 14 Apr 2021 22:22:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10645: Log catalogd HMS API metrics

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17284 )

Change subject: IMPALA-10645: Log catalogd HMS API metrics
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8581/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17284
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id41afe89bbe3395c158919bddd09f302c6752287
Gerrit-Change-Number: 17284
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 14 Apr 2021 22:20:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page if stream only has 2 pages.

2021-04-14 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Bikramjeet Vig, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17195

to look at the new patch set (#6).

Change subject: IMPALA-10584: Defer advancing read page if stream only has 2 
pages.
..

IMPALA-10584: Defer advancing read page if stream only has 2 pages.

TestScratchLimit::test_with_unlimited_scratch_limit has been
intermittently crashing in ubuntu-16.04-dockerised-tests environment
after result spooling is enabled by default in IMPALA-9856. DCHECK
violation occurs in ReservationTracker::CheckConsistency() due to
BufferedTupleStream wrongly tries to reclaim memory reservation while
unpinning the stream.

For this bug to surface, all of the following needs to happen:
- Stream is in pinned mode.
- There are only 2 pages in the stream: 1 read and 1 write.
- Stream can not increase reservation anymore either due to memory
  pressure or low buffer/memory limit.
- The stream read page has been fully read and is attached to output
  RowBatch. But the output RowBatch has not cleaned up yet.
- BufferedTupleStream::UnpinStream is invoked.

The memory accounting bug happens because UnpinStream proceeds to
NextReadPage where the read page buffer was mistakenly assumed as
released. default_page_len_ bytes were added into
write_page_reservation_ and subsequently violates the total memory
reservation.

This patch fixes the bug by deferring advancement of the read iterator
in UnpinStream if the read page is attached to output RowBatch and there
are only 2 pages in the stream. This is OK because after UnpinStream
finished, the stream is now in unpinned mode and has_read_write_page is
false. The next AddRow operation is then allowed to unpin the previous
write page first before reusing the reservation to allocate a new write
page. The next GetNext call will be responsible to advance the read
page.

Testing:
- Add be test DeferAdvancingReadPage.
- Loop the TestScratchLimit::test_with_unlimited_scratch_limit in my
  local dev machine and verify that each test passed without triggering
  the DCHECK violation.
- Reenable result spooling in TestScratchLimit that was disabled in
  IMPALA-10559.
- Pass core tests.

Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
---
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M tests/query_test/test_scratch_limit.py
4 files changed, 100 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17195/6
--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-9287: Add support for embedded metastore mode

2021-04-14 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has abandoned this change. ( 
http://gerrit.cloudera.org:8080/15223 )

Change subject: IMPALA-9287: Add support for embedded metastore mode
..


Abandoned

I am not actively working on this patch anymore.
--
To view, visit http://gerrit.cloudera.org:8080/15223
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: Iab7de6a7f11ef0bff85c0fb03f004756c3cde784
Gerrit-Change-Number: 15223
Gerrit-PatchSet: 8
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-10645: Log catalogd HMS API metrics

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17284 )

Change subject: IMPALA-10645: Log catalogd HMS API metrics
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17284/4/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java
File 
fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java:

http://gerrit.cloudera.org:8080/#/c/17284/4/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@96
PS4, Line 96:   public static final String CATALOGD_CACHE_HIT_RATIO_METRIC = 
"catalogd-hms-cache.cache-hit-ratio";
line too long (100 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17284
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id41afe89bbe3395c158919bddd09f302c6752287
Gerrit-Change-Number: 17284
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 14 Apr 2021 22:02:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8795 : Enable event polling by default in tests

2021-04-14 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has abandoned this change. ( 
http://gerrit.cloudera.org:8080/13922 )

Change subject: IMPALA-8795 : Enable event polling by default in tests
..


Abandoned
--
To view, visit http://gerrit.cloudera.org:8080/13922
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I7279349d4900e24fbcf558f290549496844ce138
Gerrit-Change-Number: 13922
Gerrit-PatchSet: 10
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-8795 : Enable event polling by default in dockerized tests.

2021-04-14 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has abandoned this change. ( 
http://gerrit.cloudera.org:8080/14272 )

Change subject: IMPALA-8795 : Enable event polling by default in dockerized 
tests.
..


Abandoned

I don't think this patch is actively being worked upon anymore.
--
To view, visit http://gerrit.cloudera.org:8080/14272
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I222b64236060b3c4c2d554e2f10e129984ebe972
Gerrit-Change-Number: 14272
Gerrit-PatchSet: 18
Gerrit-Owner: Anurag Mantripragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-10645: Log catalogd HMS API metrics

2021-04-14 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/17284 )

Change subject: IMPALA-10645: Log catalogd HMS API metrics
..

IMPALA-10645: Log catalogd HMS API metrics

Expose rpc duration, cache hit ratio, etc for Catalogd HMS APIs.
The metrics currently are only logged at trace level. A followup
will be done separately to expose them to the debug UI.

This patch was originally contributed by Kishen Das.

Testing:
1. Deployed the catalogd's metastore server and made sure that
the metrics are logged in the catalogd.INFO logs.

Change-Id: Id41afe89bbe3395c158919bddd09f302c6752287
---
M common/thrift/JniCatalog.thrift
M common/thrift/metrics.json
M fe/src/main/java/org/apache/impala/catalog/CatalogHmsAPIHelper.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M 
fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java
M 
fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java
A fe/src/main/java/org/apache/impala/catalog/metastore/HmsApiNameEnum.java
M 
fe/src/main/java/org/apache/impala/catalog/metastore/ICatalogMetastoreServer.java
M 
fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M 
fe/src/main/java/org/apache/impala/catalog/metastore/NoOpCatalogMetastoreServer.java
M fe/src/main/java/org/apache/impala/catalog/monitor/CatalogMonitor.java
12 files changed, 470 insertions(+), 40 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/17284/4
--
To view, visit http://gerrit.cloudera.org:8080/17284
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id41afe89bbe3395c158919bddd09f302c6752287
Gerrit-Change-Number: 17284
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-10645: Log catalogd HMS API metrics

2021-04-14 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17284 )

Change subject: IMPALA-10645: Log catalogd HMS API metrics
..


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17284/2/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

http://gerrit.cloudera.org:8080/#/c/17284/2/common/thrift/JniCatalog.thrift@849
PS2, Line 849:
> nit: redundant blank line
Done


http://gerrit.cloudera.org:8080/#/c/17284/2/common/thrift/JniCatalog.thrift@884
PS2, Line 884:
> nit: redundant blank line
Done


http://gerrit.cloudera.org:8080/#/c/17284/2/common/thrift/JniCatalog.thrift@909
PS2, Line 909:
> nit: redundant blank line
Done


http://gerrit.cloudera.org:8080/#/c/17284/2/common/thrift/JniCatalog.thrift@934
PS2, Line 934:
> nit: redundant blank lines
Done


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/HmsApiNameEnum.java
File fe/src/main/java/org/apache/impala/catalog/metastore/HmsApiNameEnum.java:

http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/HmsApiNameEnum.java@23
PS2, Line 23:  *
> nit: remove this line
Done



--
To view, visit http://gerrit.cloudera.org:8080/17284
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id41afe89bbe3395c158919bddd09f302c6752287
Gerrit-Change-Number: 17284
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 14 Apr 2021 22:01:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10645: Log catalogd HMS API metrics

2021-04-14 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17284 )

Change subject: IMPALA-10645: Log catalogd HMS API metrics
..


Patch Set 2:

(26 comments)

Sorry for the formatting errors. Hopefully they are all gone now.

http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogHmsAPIHelper.java
File fe/src/main/java/org/apache/impala/catalog/CatalogHmsAPIHelper.java:

http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogHmsAPIHelper.java@142
PS2, Line 142: catalog, reqBuilder.build(), dbName, tblName, 
HmsApiNameEnum.GET_TABLE_REQ.apiName());
> line too long (94 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogHmsAPIHelper.java@245
PS2, Line 245: catalogReq.build(), dbName, tblName, 
HmsApiNameEnum.GET_PARTITION_BY_EXPR.apiName());
> line too long (93 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogHmsAPIHelper.java@367
PS2, Line 367: requestBuilder.build(), dbName, tblName, 
HmsApiNameEnum.GET_PARTITION_BY_NAMES.apiName());
> line too long (98 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2221
PS2, Line 2221:
> nit: I think we use 4 spaces idention here
Done


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2227
PS2, Line 2227:
> nit: 4 spaces idention
Yeah, looks like the original patch from Kishen used wrong code-style template. 
Done.


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2230
PS2, Line 2230: if
> nit: need one space after "if"
Done


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2231
PS2, Line 2231: // Update the cache miss metric, as the valid write id 
list did not match and we have to reload the table.
> line too long (114 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2233
PS2, Line 2233:
> nit: 4 spaces idention
Done


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2067
PS2, Line 2067: Counter misses = 
CatalogMonitor.INSTANCE.getCatalogdHmsCacheMetrics().getCounter(FILEMETADATA_CACHE_MISS_METRIC);
> Previously we count the metrics per table, this changes it to use a global
Yes, I think changing this from table level to global level doesn't make a lot 
of sense. The metric value is not very useful if it is at global level. Let me 
see how can I change it to table level back.


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2068
PS2, Line 2068: Counter hits = 
CatalogMonitor.INSTANCE.getCatalogdHmsCacheMetrics().getCounter(FILEMETADATA_CACHE_HIT_METRIC);
> line too long (114 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2131
PS2, Line 2131: getFileMetadataCacheHitRate
> After this patch, the hit rate will be a global hit rate counted on all tab
I think we should keep it at table level.


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java
File 
fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java:

http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@77
PS2, Line 77:   private static final Logger LOG =
:   LoggerFactory.getLogger(CatalogMetastoreServer.class);
> nit: don't need changes since the original line fits in 90 chars.
Done


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@94
PS2, Line 94:   private static final String ACTIVE_CONNECTIONS_METRIC
:   = "metastore.active.connections";
> nit: don't need changes as well.
Done


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@99
PS2, Line 99: =
> nit: I think our code style tend to put this in the above line
Done


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@143
PS2, Line 143:
> nit: 

[Impala-ASF-CR] IMPALA-10502: Handle CREATE/DROP events correctly

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17308 )

Change subject: IMPALA-10502: Handle CREATE/DROP events correctly
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8580/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17308
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2c5e96b48abac015240f20295b3ec3b1d71f24a
Gerrit-Change-Number: 17308
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 14 Apr 2021 21:49:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10502: Handle CREATE/DROP events correctly

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17308 )

Change subject: IMPALA-10502: Handle CREATE/DROP events correctly
..


Patch Set 3:

(33 comments)

http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1526
PS3, Line 1526:   .removePartitionsIfNotAddedLater(eventId_, 
dbName_, tblName_, droppedPartitions_,
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
File 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:

http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@257
PS3, Line 257:   
String.format(CatalogOpExecutor.HMS_RPC_ERROR_FORMAT_STR, 
"getNextNotification"));
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@737
PS3, Line 737:   CatalogOpExecutor catalogOpExecutor, long startSyncFromId, 
long eventPollingInterval)
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@744
PS3, Line 744: new MetastoreEventsProcessor(catalogOpExecutor, 
startSyncFromId, eventPollingInterval);
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1304
PS3, Line 1304:   private Table addHdfsPartitions(MetaStoreClient msClient, 
Table tbl, List partitions)
line too long (98 > 90)


http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1704
PS3, Line 1704:   
CreateDatabaseEvent.CREATE_DATABASE_EVENT_TYPE.equals(notificationEvent.getEventType())
line too long (105 > 90)


http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1715
PS3, Line 1715: // Due to HIVE-24899 we cannot rely on the database 
object present in the event
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1918
PS3, Line 1918:   eventIdToPartVals.computeIfAbsent(eventId, l -> new 
ArrayList<>()).add(partVals);
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2544
PS3, Line 2544: events = 
MetastoreEventsProcessor.getNextMetastoreEvents(catalog_, eventId, new 
NotificationFilter() {
line too long (110 > 90)


http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2547
PS3, Line 2547: return 
DropTableEvent.DROP_TABLE_EVENT_TYPE.equals(notificationEvent.getEventType()) 
&& finalMsTbl
line too long (110 > 90)


http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3119
PS3, Line 3119:   Pair eventTblPair = getTableFromEvents(
line too long (99 > 90)


http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3188
PS3, Line 3188: 
CreateTableEvent.CREATE_TABLE_EVENT_TYPE.equals(notificationEvent.getEventType())
line too long (97 > 90)


http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3385
PS3, Line 3385: return 
CreateTableEvent.CREATE_TABLE_EVENT_TYPE.equals(notificationEvent.getEventType())
line too long (104 > 90)


http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3386
PS3, Line 3386: && 
newTable.getDbName().equalsIgnoreCase(notificationEvent.getDbName())
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3387
PS3, Line 3387: && 
newTable.getTableName().equalsIgnoreCase(notificationEvent.getTableName());
line too long (98 > 90)


http://gerrit.cloudera.org:8080/#/c/17308/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3780
PS3, Line 3780:   "EventId: {} Not adding partitions since the database 
{} does not exist anymore.",
line too long (92 > 90)



[Impala-ASF-CR] IMPALA-10502: Handle CREATE/DROP events correctly

2021-04-14 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17308


Change subject: IMPALA-10502: Handle CREATE/DROP events correctly
..

IMPALA-10502: Handle CREATE/DROP events correctly

The current way to detect self-events in case of CREATE/DROP events on
database, table and partition is problematic when the same object is
created and dropped repeatedly in quick succession. This happens mainly
due to couple of reasons. For example if we have a sequence of
CREATE_TABLE, DROP_TABLE, CREATE_TABLE ... on the same table, it is
possible that when the create table event is being processed, the table
is not present in catalog because it was dropped recently. In such a
case, events processor does not have enough state information in
catalogd to determine that this table has been dropped from the
catalogd and the event should be ignored. Similarly, if a drop event
is being processed, it is possible that the table has been recreated
with the same name when the drop event is received. In such a case,
events processor removes the table from the catalogd.

This can cause problems for queries which expect the table to exist or
not exist. E.g create table query fails with a table already exists or
a drop table query fails with table does not exist error.

In order to fix this issue, catalogd now keeps track of dropped objects
in a deleteLog which are garbage collected as the events come in. Every
time a database, table or parition is dropped, the deleteLog is
populated with the the drop event id generated due to the drop
operation. This deleteLog is looked up when the event is received to
determine if the event can be ignored.

Testing:
1. Added a new test which loops to create create/drop events for
database, table and partitions.
2. Added new metrics which the test verifies to ensure that events
don't create or drop the object.

Change-Id: Ia2c5e96b48abac015240f20295b3ec3b1d71f24a
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
A fe/src/main/java/org/apache/impala/catalog/events/DeleteEventLog.java
A fe/src/main/java/org/apache/impala/catalog/events/EventFactory.java
M fe/src/main/java/org/apache/impala/catalog/events/ExternalEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/events/InFlightEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/events/NoOpEventProcessor.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M 
fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M 
fe/src/test/java/org/apache/impala/catalog/events/SynchronousHMSEventProcessorForTests.java
M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java
M tests/custom_cluster/test_event_processing.py
21 files changed, 2,158 insertions(+), 834 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/17308/3
--
To view, visit http://gerrit.cloudera.org:8080/17308
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia2c5e96b48abac015240f20295b3ec3b1d71f24a
Gerrit-Change-Number: 17308
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.

2021-04-14 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17195 )

Change subject: IMPALA-10584: Defer advancing read page when reservation is 
tight.
..


Patch Set 5:

(1 comment)

s

http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc@728
PS5, Line 728: if (buffer_pool_client_->GetUnusedReservation() < 
last_attached_page_len_) {
 :   // Since attached_to_output_batch is true and 
GetUnusedReservation() is less
 :   // than last_attached_page_len_, the reader is most 
likely has not clean up the
 :   // RowBatch where the read page is attached to. We 
defer advancing the read page
 :   // until the next GetNext() call by the reader (see 
IMPALA-10584).
 :   defer_advancing_read_page = true;
 : }
> Your understanding is correct. Ideally, BufferedTupleStream should be 100%
Ok, after some investigation, I'm not confident I can implement solution 2) 
safely. The refactoring will touch several code that I don't fully understand 
yet such as grouping-aggregator, partitioned-hash-join-builder, and 
analytic-eval-node.

What if I change the logic into this instead:

  defer_advancing_read_page = num_pages_ <= 2;

Basically, we avoid getting into situation where we ended up with only 1 
read-write page while unpinning the stream. I think this is also a valid reason 
not to advance the read page (the other reason is that the reader has not freed 
the attached buffer). The "stealing" still won't happen and the DCHECK still 
won't hit.



--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 14 Apr 2021 21:27:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17316 )

Change subject: IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and 
Azure ABFS
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17316
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id807e8a200b83283a09d3a917185cabab930017d
Gerrit-Change-Number: 17316
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Wed, 14 Apr 2021 18:29:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.

2021-04-14 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17195 )

Change subject: IMPALA-10584: Defer advancing read page when reservation is 
tight.
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc@728
PS5, Line 728: if (buffer_pool_client_->GetUnusedReservation() < 
last_attached_page_len_) {
 :   // Since attached_to_output_batch is true and 
GetUnusedReservation() is less
 :   // than last_attached_page_len_, the reader is most 
likely has not clean up the
 :   // RowBatch where the read page is attached to. We 
defer advancing the read page
 :   // until the next GetNext() call by the reader (see 
IMPALA-10584).
 :   defer_advancing_read_page = true;
 : }
> Maybe I'm misunderstanding something. This still seems tricky for me. I agr
Your understanding is correct. Ideally, BufferedTupleStream should be 100% 
aware whether its client has freed the attached buffer or not. It raise a 
question though on how to do that, because the stream lost reference to the 
buffer once it attach the buffer to output row batch.
In relation of BufferedPlanRootSink and SpillableRowBatchQueue, I can think of 
2 solution:

1). Track the amount of  both used and unused reservation at the end of 
GetNext() call. We can then check the reservation amount again in UnpinStream() 
to determine whether client has freed the buffer or not. However, this can be 
tricky and lead to more corner cases such as what if we increase/reduce the 
total reservation in between?

2). Add an explicit method in BufferedTupleStream to signal the free. 
SpillableRowBatchQueue then need similar method as well to chain the signal 
down from BufferedPlanRootSink to BufferedTupleStream. The mechanics might 
becomes awkward, but it maintain correctness.

I'll try to implement solution 2) first.



--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 14 Apr 2021 18:13:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-7825: Upgrade Thrift version to 0.11.0

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17170 )

Change subject: WIP IMPALA-7825: Upgrade Thrift version to 0.11.0
..


Patch Set 4:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/8579/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/17170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Gerrit-Change-Number: 17170
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 14 Apr 2021 18:04:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..


Patch Set 21: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7072/


--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 14 Apr 2021 17:48:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-7825: Upgrade Thrift version to 0.11.0

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17170 )

Change subject: WIP IMPALA-7825: Upgrade Thrift version to 0.11.0
..


Patch Set 3:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/8578/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/17170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Gerrit-Change-Number: 17170
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 14 Apr 2021 17:48:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17289 )

Change subject: IMPALA-10532: 
TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
..


Patch Set 5: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7070/


-- 
To view, visit http://gerrit.cloudera.org:8080/17289
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f
Gerrit-Change-Number: 17289
Gerrit-PatchSet: 5
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 14 Apr 2021 17:46:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-7825: Upgrade Thrift version to 0.11.0

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17170 )

Change subject: WIP IMPALA-7825: Upgrade Thrift version to 0.11.0
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7075/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Gerrit-Change-Number: 17170
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 14 Apr 2021 17:42:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-7825: Upgrade Thrift version to 0.11.0

2021-04-14 Thread Csaba Ringhofer (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17170

to look at the new patch set (#4).

Change subject: WIP IMPALA-7825: Upgrade Thrift version to 0.11.0
..

WIP IMPALA-7825: Upgrade Thrift version to 0.11.0

Before this patch Impala mainly used Thrift 0.9.3, but it was
possible to compile Impala shell with Thrift 0.11.0, so the 0.11.0
Thrift lib was already included in toolchain.

Most of the changes are related to replacing boost:: with std::
shared_ptr-s in cpp code (this is a continuation of patch by Vihang).

The Thrift upgrade also needs an Impyla relase with Thrift 0.11.0, as
Impala's test framework relies on Impyla. A thrift_sasl release is also
needed, because it currently pins Thrift version to 0.9.3 for Python 2.

The current patch uses alpha releases from Impyla and thrift_sasl that
use thrift 0.11.0.

Testing:
- ran Impyla's test suite with Python 2 and 3

Other TODOs:
- remove preexisting extra logic needed to use 0.11.0 for python
- the thrift compilation was changed to generated template code -
  this was an easy solution to avoid a compilation issue after
  merging IMPALA-10600, but I should check its effect on compilation
  time

Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
---
M be/src/benchmarks/network-perf-benchmark.cc
M be/src/catalog/catalog-server.h
M be/src/catalog/catalog-service-client-wrapper.h
M be/src/catalog/catalogd-main.cc
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/rpc/TAcceptQueueServer.h
M be/src/rpc/auth-provider.h
M be/src/rpc/authentication.cc
M be/src/rpc/hs2-http-test.cc
M be/src/rpc/thrift-client.h
M be/src/rpc/thrift-server-test.cc
M be/src/rpc/thrift-server.cc
M be/src/rpc/thrift-server.h
M be/src/rpc/thrift-thread.cc
M be/src/rpc/thrift-thread.h
M be/src/rpc/thrift-util.cc
M be/src/rpc/thrift-util.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/impalad-main.cc
M be/src/statestore/statestore-service-client-wrapper.h
M be/src/statestore/statestore-subscriber-client-wrapper.h
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M be/src/testutil/in-process-servers.h
M be/src/transport/THttpServer.cpp
M be/src/transport/THttpServer.h
M be/src/transport/THttpTransport.cpp
M be/src/transport/THttpTransport.h
M be/src/transport/TSaslClientTransport.cpp
M be/src/transport/TSaslClientTransport.h
M be/src/transport/TSaslServerTransport.cpp
M be/src/transport/TSaslServerTransport.h
M be/src/transport/TSaslTransport.cpp
M be/src/transport/TSaslTransport.h
M be/src/util/parquet-reader.cc
M bin/impala-config.sh
M common/thrift/CMakeLists.txt
M infra/python/deps/requirements.txt
M java/pom.xml
M shell/ext-py/thrift_sasl-0.4.2/setup.py
43 files changed, 189 insertions(+), 189 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/17170/4
--
To view, visit http://gerrit.cloudera.org:8080/17170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Gerrit-Change-Number: 17170
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10657: Remove accidental usage of shaded imports

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17314 )

Change subject: IMPALA-10657: Remove accidental usage of shaded imports
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17314
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifa67aec96539db7861416f3e71cca83607e3d8c9
Gerrit-Change-Number: 17314
Gerrit-PatchSet: 2
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Wed, 14 Apr 2021 17:40:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10657: Remove accidental usage of shaded imports

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17314 )

Change subject: IMPALA-10657: Remove accidental usage of shaded imports
..

IMPALA-10657: Remove accidental usage of shaded imports

- This changes seemingly accidental usages of shaded imports
  with the direct dependency
  - This is helpful to reduce confusion and possibly reduce the
  required jars for certain usages of the frontend jars

Change-Id: Ifa67aec96539db7861416f3e71cca83607e3d8c9
Reviewed-on: http://gerrit.cloudera.org:8080/17314
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java
M fe/src/main/java/org/apache/impala/catalog/Transaction.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
4 files changed, 6 insertions(+), 4 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/17314
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ifa67aec96539db7861416f3e71cca83607e3d8c9
Gerrit-Change-Number: 17314
Gerrit-PatchSet: 3
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 


[Impala-ASF-CR] WIP IMPALA-7825: Upgrade Thrift version to 0.11.0

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17170 )

Change subject: WIP IMPALA-7825: Upgrade Thrift version to 0.11.0
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7074/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Gerrit-Change-Number: 17170
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 14 Apr 2021 17:28:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-7825: Upgrade Thrift version to 0.11.0

2021-04-14 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/17170 )

Change subject: WIP IMPALA-7825: Upgrade Thrift version to 0.11.0
..

WIP IMPALA-7825: Upgrade Thrift version to 0.11.0

Before this patch Impala mainly used Thrift 0.9.3, but it was
possible to compile Impala shell with Thrift 0.11.0, so the 0.11.0
Thrift lib was already included in toolchain.

Most of the changes are related to replacing boost:: with std::
shared_ptr-s in cpp code (this is a continuation of patch by Vihang).

The Thrift upgrade also needs an Impyla relase with Thrift 0.11.0, as
Impala's test framework relies on Impyla. A thrift_sasl release is also
needed, because it currently pins Thrift version to 0.9.3 for Python 2.

The current patch uses alpha releases from Impyla and thrift_sasl that
use thrift 0.11.0.

Testing:
- ran Impyla's test suite with Python 2 and 3

Other TODOs:
- remove preexisting extra logic needed to use 0.11.0 for python
- the thrift compilation was changed to generated template code -
  this was an easy solution to avoid a compilation issue after
  merging IMPALA-10600, but I should check its effect on compilation
  time

Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
---
M be/src/benchmarks/network-perf-benchmark.cc
M be/src/catalog/catalog-server.h
M be/src/catalog/catalog-service-client-wrapper.h
M be/src/catalog/catalogd-main.cc
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/rpc/TAcceptQueueServer.h
M be/src/rpc/auth-provider.h
M be/src/rpc/authentication.cc
M be/src/rpc/hs2-http-test.cc
M be/src/rpc/thrift-client.h
M be/src/rpc/thrift-server-test.cc
M be/src/rpc/thrift-server.cc
M be/src/rpc/thrift-server.h
M be/src/rpc/thrift-thread.cc
M be/src/rpc/thrift-thread.h
M be/src/rpc/thrift-util.cc
M be/src/rpc/thrift-util.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/impalad-main.cc
M be/src/statestore/statestore-service-client-wrapper.h
M be/src/statestore/statestore-subscriber-client-wrapper.h
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M be/src/testutil/in-process-servers.h
M be/src/transport/THttpServer.cpp
M be/src/transport/THttpServer.h
M be/src/transport/THttpTransport.cpp
M be/src/transport/THttpTransport.h
M be/src/transport/TSaslClientTransport.cpp
M be/src/transport/TSaslClientTransport.h
M be/src/transport/TSaslServerTransport.cpp
M be/src/transport/TSaslServerTransport.h
M be/src/transport/TSaslTransport.cpp
M be/src/transport/TSaslTransport.h
M be/src/util/parquet-reader.cc
M bin/impala-config.sh
M common/thrift/CMakeLists.txt
M infra/python/deps/requirements.txt
M java/pom.xml
M shell/ext-py/thrift_sasl-0.4.2/setup.py
43 files changed, 189 insertions(+), 190 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/17170/3
--
To view, visit http://gerrit.cloudera.org:8080/17170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Gerrit-Change-Number: 17170
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 


[Impala-ASF-CR] IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS

2021-04-14 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17316 )

Change subject: IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and 
Azure ABFS
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17316/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/17316/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@588
PS1, Line 588:   return fs.exists(path);
If the qualified paths aren't the same and fs.exists(path) returns true, do we 
know they are on the same filesystem?

I'm wondering whether this should just return false.



--
To view, visit http://gerrit.cloudera.org:8080/17316
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id807e8a200b83283a09d3a917185cabab930017d
Gerrit-Change-Number: 17316
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Wed, 14 Apr 2021 16:52:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10654: Fix precision loss in DecimalValue to double conversion.

2021-04-14 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17303 )

Change subject: IMPALA-10654: Fix precision loss in DecimalValue to double 
conversion.
..


Patch Set 5:

(4 comments)

The change looks great. A test with Parquet would be nice, other than that I 
only found nitpicks.

When you upload a new PS please reply to the comments. Most of the time 
clicking on "Done" is enough. This way we'll know we won't left anything open.

http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/runtime/decimal-value.inline.h@753
PS5, Line 753: (
nit: parentheses not needed


http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/runtime/decimal-value.inline.h@754
PS5, Line 754:
nit: we indent with 4 spaces if the line belongs to the same statement as the 
previous line.


http://gerrit.cloudera.org:8080/#/c/17303/5/be/src/runtime/decimal-value.inline.h@755
PS5, Line 755: is_negative, 
nit: I think we should either put each argument to a separate line, or put all 
of them to the previous line. In this case I think the latter makes more sense.


http://gerrit.cloudera.org:8080/#/c/17303/5/testdata/workloads/functional-query/queries/QueryTest/values.test
File testdata/workloads/functional-query/queries/QueryTest/values.test:

http://gerrit.cloudera.org:8080/#/c/17303/5/testdata/workloads/functional-query/queries/QueryTest/values.test@104
PS5, Line 104: For write path, the default precision of 16 is a limitation yet 
to be
That's true for text tables, but could you please add a test with a Parquet 
table? I think we should getter better precision there.



--
To view, visit http://gerrit.cloudera.org:8080/17303
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56f0652cb8f81a491b87d9b108a94c00ae6c99a1
Gerrit-Change-Number: 17303
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 14 Apr 2021 16:31:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

2021-04-14 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
..


Patch Set 6: Code-Review+1

(4 comments)

Looked at the backend, the change looks great!

http://gerrit.cloudera.org:8080/#/c/17313/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17313/6//COMMIT_MSG@17
PS6, Line 17: d,
nit: some lines are longer than 72 chars


http://gerrit.cloudera.org:8080/#/c/17313/6/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17313/6/be/src/exec/hdfs-table-sink.cc@484
PS6, Line 484: current_file
+1 for the rename :)


http://gerrit.cloudera.org:8080/#/c/17313/6/be/src/runtime/dml-exec-state.h
File be/src/runtime/dml-exec-state.h:

http://gerrit.cloudera.org:8080/#/c/17313/6/be/src/runtime/dml-exec-state.h@63
PS6, Line 63: staging_dir_to_clean_up
nit: could you please mention this parameter in the comment?


http://gerrit.cloudera.org:8080/#/c/17313/6/be/src/runtime/dml-exec-state.cc
File be/src/runtime/dml-exec-state.cc:

http://gerrit.cloudera.org:8080/#/c/17313/6/be/src/runtime/dml-exec-state.cc@496
PS6, Line 496: }
nit: add extra line



--
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 14 Apr 2021 16:12:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-14 Thread fifteencai (Code Review)
fifteencai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 15:

Resolved assert problem in newly added unit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 15
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 14 Apr 2021 15:18:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 15:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8577/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 15
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 14 Apr 2021 14:50:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 14:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8576/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 14
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 14 Apr 2021 14:42:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-14 Thread fifteencai (Code Review)
fifteencai has uploaded a new patch set (#15). ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..

IMPALA-10445: Adjust NDV's scale with query option

This is a new way to control NDV's scale.

Since IMPALA-2658, we can trade memory for more accurate
estimation by setting larger `scale` in SQL function
NDV(, ). However the use of larger NDV scale requires
the modification of SQL queries which may not be practical in certain
applications:

- Firstly, SQL writers are reluctant to lower that scale. They prone
to fill up the scale, which will make the cluster unstable. Especially
when there are `group by`s with high cardinalities. So it is wiser to
let cluster admin other than sql writer choose appropriate scale.

- Secondly, In some application scenarios, queries are stored in DBs.
In a BI system, for example, rewriting thousands of SQLs is risky.

In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`
with the following semantics:

1. The allowed value is in the range [1..10];
2. Previously, the scale used in NDV() functions was fixed at 2.
Now the scale is provided by the newly added query options.
3. It does not influence the NDV scale for SQL function
NDV(, ) in which the NDV scale is provided by the 2nd
argument .

We also refactored method `Analyze` to make sure APPX_COUNT_DISTINCT
can work with this query option. After this, cluster admins can
substitute `count(distinct )` with `ndv(, scale)`.

Implementation details:

- The default value of DEFAULT_NDV_SCALE is 2, so we won't change
the default ndv behavior.
- We port `CountDistinctToNdv` transform logic from
`SelectStmt.analyze()` to `ExprRewriter`, making it compatible with
further rewrite rules.
- The newly added rewrite rule `DefaultNdvScaleRule` is applied
after `CountDistinctToNdvRule`.

Usage:

To set a default ndv scale:
```
SET DEFAULT_NDV_SCALE = 10;
```

To unset:
```
SET DEFAULT_NDV_SCALE = 2;
```

Here are test results of a typical workload (cardinality=40,090,650):
++
|   Metric| Count Distinct |NDV2|NDV5|NDV10  |
++
|  Memory(GB) |   3.83 |1.84|1.85| 1.89  |
| Duration(s) |  182.89|   30.22|29.72   | 29.24 |
|  ErrorRate  |0%  |1.8%|1.17%   | 0.06% |
++

Testing:
1) Added 3 unit test cases in `ExprRewriteRulesTest`.
2) Added 5 unit test cases in `ExprRewriterTest`.
3) Ran all front-end unit test, passed.
4) Added a new query-option test.

Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
11 files changed, 315 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/15
--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 15
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 14:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17306/14/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java:

http://gerrit.cloudera.org:8080/#/c/17306/14/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@541
PS14, Line 541: 
//--
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/17306/14/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@546
PS14, Line 546: // CASE 1: DEFAULT_NDV_SCALE=5(same as the value in 
original sql), APPX_COUNT_DISTINCT=True
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/17306/14/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@552
PS14, Line 552: 
//--
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/17306/14/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@553
PS14, Line 553: String sql1 = "SELECT ndv(id), ndv(id, 5), count(DISTINCT 
id) FROM functional.alltypes";
line too long (92 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 14
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 14 Apr 2021 14:23:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-14 Thread fifteencai (Code Review)
fifteencai has uploaded a new patch set (#14). ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..

IMPALA-10445: Adjust NDV's scale with query option

This is a new way to control NDV's scale.

Since IMPALA-2658, we can trade memory for more accurate
estimation by setting larger `scale` in SQL function
NDV(, ). However the use of larger NDV scale requires
the modification of SQL queries which may not be practical in certain
applications:

- Firstly, SQL writers are reluctant to lower that scale. They prone
to fill up the scale, which will make the cluster unstable. Especially
when there are `group by`s with high cardinalities. So it is wiser to
let cluster admin other than sql writer choose appropriate scale.

- Secondly, In some application scenarios, queries are stored in DBs.
In a BI system, for example, rewriting thousands of SQLs is risky.

In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`
with the following semantics:

1. The allowed value is in the range [1..10];
2. Previously, the scale used in NDV() functions was fixed at 2.
Now the scale is provided by the newly added query options.
3. It does not influence the NDV scale for SQL function
NDV(, ) in which the NDV scale is provided by the 2nd
argument .

We also refactored method `Analyze` to make sure APPX_COUNT_DISTINCT
can work with this query option. After this, cluster admins can
substitute `count(distinct )` with `ndv(, scale)`.

Implementation details:

- The default value of DEFAULT_NDV_SCALE is 2, so we won't change
the default ndv behavior.
- We port `CountDistinctToNdv` transform logic from
`SelectStmt.analyze()` to `ExprRewriter`, making it compatible with
further rewrite rules.
- The newly added rewrite rule `DefaultNdvScaleRule` is applied
after `CountDistinctToNdvRule`.

Usage:

To set a default ndv scale:
```
SET DEFAULT_NDV_SCALE = 10;
```

To unset:
```
SET DEFAULT_NDV_SCALE = 2;
```

Here are test results of a typical workload (cardinality=40,090,650):
++
|   Metric| Count Distinct |NDV2|NDV5|NDV10  |
++
|  Memory(GB) |   3.83 |1.84|1.85| 1.89  |
| Duration(s) |  182.89|   30.22|29.72   | 29.24 |
|  ErrorRate  |0%  |1.8%|1.17%   | 0.06% |
++

Testing:
1) Added 3 unit test cases in `ExprRewriteRulesTest`.
2) Added 5 unit test cases in `ExprRewriterTest`.
3) Ran all front-end unit test, passed.
4) Added a new query-option test.

Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
11 files changed, 312 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/14
--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 14
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8575/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 13
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 14 Apr 2021 14:20:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.

2021-04-14 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17195 )

Change subject: IMPALA-10584: Defer advancing read page when reservation is 
tight.
..


Patch Set 5:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc
File be/src/runtime/buffered-tuple-stream-test.cc:

http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1271
PS5, Line 1271: 
ASSERT_OK(stream.UnpinStream(BufferedTupleStream::UNPIN_ALL_EXCEPT_CURRENT));
Not related to this patch, but I'm confused how do we verify we have advanced 
the read page here. I think this just verify that UnpinStream won't hit any 
DCHECKs. Do you have any ideas on this?


http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1300
PS5, Line 1300: TEST_F(SimpleTupleStreamTest, DeferAdvancingReadPage) {
Could you add some comments above this? e.g.
Test that UnpinStream defer advancing the read page when all rows from the read 
page are
attached to a returned RowBatch but got not enough reservation.


http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1333
PS5, Line 1333:
Could you add a comment here that we are adding rows without releasing the 
read_batch (i.e. read_batch.Reset())? So we will hit not enough reservation and 
need unpinning the stream to continue.


http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1334
PS5, Line 1334: for (int j = 0; j < 2; ++j) {
Could you leave a comment about why do we run this twice? I guess we want to 
test it when 'stream' is pinned and unpinned.

BTW, could you add ASSERT_TRUE(stream.is_pinned()) before the loop?


http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1354
PS5, Line 1354: }
Could you add ASSERT_FALSE(stream.is_pinned()) at the end?


http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1356
PS5, Line 1356:   }
Do we need read_batch.Reset() as well?


http://gerrit.cloudera.org:8080/#/c/17195/3/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

http://gerrit.cloudera.org:8080/#/c/17195/3/be/src/runtime/buffered-tuple-stream.cc@723
PS3, Line 723:
> That is a good point, thanks!
Ah yeah, you are right. The page handle is destroyed in AttachBufferToBatch(). 
Thanks for catching this!


http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc@728
PS5, Line 728: if (buffer_pool_client_->GetUnusedReservation() < 
last_attached_page_len_) {
 :   // Since attached_to_output_batch is true and 
GetUnusedReservation() is less
 :   // than last_attached_page_len_, the reader is most 
likely has not clean up the
 :   // RowBatch where the read page is attached to. We 
defer advancing the read page
 :   // until the next GetNext() call by the reader (see 
IMPALA-10584).
 :   defer_advancing_read_page = true;
 : }
Maybe I'm misunderstanding something. This still seems tricky for me. I agree 
that this can avoid hitting the DCHECK in this JIRA, because we now won't 
advance to get a read/write page so won't check consistency (i.e. won't find 
that unused reservation become negative).

But in the case when the unused reservation >= last_attached_page_len_, we 
always advance the read page regardless whether the reader has freed the row 
batch buffers. The accounting seems wrong in the period after we advance the 
read page and before the reader frees the attached buffer:

If we advance to get a read/write page in NextReadPage() and save 
default_page_len_ of reservation for a later write page, it seems like 
"stealing" default_page_len_ worth of space from the unused reservation, and 
then after the reader frees the attached buffer, we returns the stealed space 
back to the unused reservation. Although the unused reservation won't be 
negative so won't hit any DCHECKs in this case, it might be possible that we 
run out of unused reservation earlier than it should (if somehow the reader 
doesn't free the buffer for a long time).

I wonder if we should detect whether the reader has freed the attached buffer 
and make the decision by that.

Please correct me if I'm misunderstanding anything.



--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: 

[Impala-ASF-CR] WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types

2021-04-14 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17262 )

Change subject: WIP - IMPALA-10642: Write support for Parquet Bloom filters - 
most common types
..


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17262/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17262/3//COMMIT_MSG@16
PS3, Line 16: TBL_PROPS
TBL_PROPS will configure bloom filters per column, right?


http://gerrit.cloudera.org:8080/#/c/17262/3//COMMIT_MSG@16
PS3, Line 16:   TBL_PROPS  - write Parquet Bloom filters as set in table 
properties
:   IF_NO_DICT - write Parquet Bloom filters if the row group is not
:fully dictionary encoded
What is the relation of TBL_PROPS and IF_NO_DICT? E.g. with TBL_PROPS will you 
write bloom filter even if the column dictionary encoded?


http://gerrit.cloudera.org:8080/#/c/17262/1/be/src/exec/parquet/hdfs-parquet-table-writer.cc
File be/src/exec/parquet/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/17262/1/be/src/exec/parquet/hdfs-parquet-table-writer.cc@605
PS1, Line 605:   // The ParquetBloomFilter object if one is being written. If
 :   // 'ShouldInitParquetBloomFilter()' is false, the combination 
of the impala type and the
 :   // parquet type is not supported or some error occurs during 
the initialisation of the
 :   // ParquetBloomFilter object, it is set to NULL.
 :   unique_ptr I find using dst_ptr a bit risky:
I like the solution, but I disagree here:
"in case of int8 and int16 we need to pad them to int32 because parquet doesn't 
support smaller ints, so in these cases we cannot use the buffer, adding even 
more special cases."
Why can't we use the buffer? dst_ptr points to the plain encoded page buffer, 
where the i16/i8 are already converted to i32


http://gerrit.cloudera.org:8080/#/c/17262/3/be/src/exec/parquet/hdfs-parquet-table-writer.cc
File be/src/exec/parquet/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/17262/3/be/src/exec/parquet/hdfs-parquet-table-writer.cc@493
PS3, Line 493: if (parent_
I would prefer to move this to a separate function like 
FlushDictionaryToBloomFilterIfNeeded()

ProcessValue() is a critical function when trying to understand how Impala 
writes Parquet files, so I think that we should try to minimize noise.


http://gerrit.cloudera.org:8080/#/c/17262/3/be/src/exec/parquet/hdfs-parquet-table-writer.cc@542
PS3, Line 542: if (ShouldUpdateParquetBloomFilter()) {
I would prefer to move this to a separate function like 
UpdateBloomFilterIfNeeded()


http://gerrit.cloudera.org:8080/#/c/17262/3/be/src/exec/parquet/hdfs-parquet-table-writer.cc@597
PS3, Line 597: whwn
typo


http://gerrit.cloudera.org:8080/#/c/17262/3/be/src/exec/parquet/parquet-bloom-filter-util.cc
File be/src/exec/parquet/parquet-bloom-filter-util.cc:

http://gerrit.cloudera.org:8080/#/c/17262/3/be/src/exec/parquet/parquet-bloom-filter-util.cc@220
PS3, Line 220: {
nit: brace placement



--
To view, visit http://gerrit.cloudera.org:8080/17262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792
Gerrit-Change-Number: 17262
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 14 Apr 2021 14:03:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 13:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17306/13/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

http://gerrit.cloudera.org:8080/#/c/17306/13/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@180
PS13, Line 180:   // Given a list of `rule`, this function checks whether the 
rewritten  is as expected
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/17306/13/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@181
PS13, Line 181:   // Caution: if no rule in `rules` is expected to be fired, 
should set expectedExprStr to null
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/17306/13/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@182
PS13, Line 182:   // or "NULL". otherwise, this function would throw an 
exception like "Rule xxx didn't fire"
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/17306/13/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java:

http://gerrit.cloudera.org:8080/#/c/17306/13/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@541
PS13, Line 541: 
//--
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/17306/13/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@546
PS13, Line 546: // CASE 1: DEFAULT_NDV_SCALE=5(same as the value in 
original sql), APPX_COUNT_DISTINCT=True
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/17306/13/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@552
PS13, Line 552: 
//--
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/17306/13/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@553
PS13, Line 553: String sql1 = "SELECT ndv(id), ndv(id, 5), count(DISTINCT 
id) FROM functional.alltypes";
line too long (92 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 13
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 14 Apr 2021 14:01:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-14 Thread fifteencai (Code Review)
fifteencai has uploaded a new patch set (#13). ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..

IMPALA-10445: Adjust NDV's scale with query option

This is a new way to control NDV's scale.

Since IMPALA-2658, we can trade memory for more accurate
estimation by setting larger `scale` in SQL function
NDV(, ). However the use of larger NDV scale requires
the modification of SQL queries which may not be practical in certain
applications:

- Firstly, SQL writers are reluctant to lower that scale. They prone
to fill up the scale, which will make the cluster unstable. Especially
when there are `group by`s with high cardinalities. So it is wiser to
let cluster admin other than sql writer choose appropriate scale.

- Secondly, In some application scenarios, queries are stored in DBs.
In a BI system, for example, rewriting thousands of SQLs is risky.

In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`
with the following semantics:

1. The allowed value is in the range [1..10];
2. Previously, the scale used in NDV() functions was fixed at 2.
Now the scale is provided by the newly added query options.
3. It does not influence the NDV scale for SQL function
NDV(, ) in which the NDV scale is provided by the 2nd
argument .

We also refactored method `Analyze` to make sure APPX_COUNT_DISTINCT
can work with this query option. After this, cluster admins can
substitute `count(distinct )` with `ndv(, scale)`.

Implementation details:

- The default value of DEFAULT_NDV_SCALE is 2, so we won't change
the default ndv behavior.
- We port `CountDistinctToNdv` transform logic from
`SelectStmt.analyze()` to `ExprRewriter`, making it compatible with
further rewrite rules.
- The newly added rewrite rule `DefaultNdvScaleRule` is applied
after `CountDistinctToNdvRule`.

Usage:

To set a default ndv scale:
```
SET DEFAULT_NDV_SCALE = 10;
```

To unset:
```
SET DEFAULT_NDV_SCALE = 2;
```

Here are test results of a typical workload (cardinality=40,090,650):
++
|   Metric| Count Distinct |NDV2|NDV5|NDV10  |
++
|  Memory(GB) |   3.83 |1.84|1.85| 1.89  |
| Duration(s) |  182.89|   30.22|29.72   | 29.24 |
|  ErrorRate  |0%  |1.8%|1.17%   | 0.06% |
++

Testing:
1) Added 3 unit test cases in `ExprRewriteRulesTest`.
2) Added 5 unit test cases in `ExprRewriterTest`.
3) Ran all front-end unit test, passed.
4) Added a new query-option test.

Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
11 files changed, 302 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/13
--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 13
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 


[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8574/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 14 Apr 2021 13:20:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats

2021-04-14 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17299 )

Change subject: IMPALA-10652: Optimize the checking of the size of incremental 
stats
..


Patch Set 3: Code-Review+1

(2 comments)

Looks good!

http://gerrit.cloudera.org:8080/#/c/17299/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/17299/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@452
PS2, Line 452: if (partitionSet_ == null) {
 :   numOfAllIncStatsPartitions = allPartitio
> It is a pre-check on the size of incremental stats to prevent the increment
Okay.

Done.


http://gerrit.cloudera.org:8080/#/c/17299/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/17299/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1784
PS3, Line 1784: aa
nit. table alltypes?



--
To view, visit http://gerrit.cloudera.org:8080/17299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee
Gerrit-Change-Number: 17299
Gerrit-PatchSet: 3
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Wed, 14 Apr 2021 13:04:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17313/6/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/17313/6/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@879
PS6, Line 879: unpartTable.getFileSystem(), new 
Path(unpartTable.getHdfsBaseDir()), overwrite,
line too long (91 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 14 Apr 2021 13:03:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

2021-04-14 Thread Csaba Ringhofer (Code Review)
Hello Vihang Karajgaonkar, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17313

to look at the new patch set (#6).

Change subject: IMPALA-10656: Fire insert events before commit
..

IMPALA-10656: Fire insert events before commit

Before this fix Impala committed an insert first, then reloaded the table
from HMS, and generated the insert events based on the difference between
the two snapshots. (e.g. which file was not present in the old snapshot
but are there in the new one).

Hive replication expects the insert events before the commit, so this may
potentially lead to issues there.

The solution is to collect the new files during the insert in the backend,
and send the insert events based on this file set. This wasn't very hard
to do as we were already collecting the files in some cases:
- to move them from staging dir to their final location in case of
  non-partitioned tables
- to write the file list to snapshot files in case of Iceberg tables
This patch unifies the paths above and collects all information about
the created files regardless of the table type.

Testing:
- no new tests, insert events were already covered in
  test_event_processing.py and MetastoreEventsProcessorTest.java
- ran core tests

Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
---
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/output-partition.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/service/client-request-state.cc
M common/protobuf/control_service.proto
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
12 files changed, 219 insertions(+), 215 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/17313/6
--
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17316 )

Change subject: IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and 
Azure ABFS
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8573/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17316
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id807e8a200b83283a09d3a917185cabab930017d
Gerrit-Change-Number: 17316
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Wed, 14 Apr 2021 13:02:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-14 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 12: Code-Review+1

Looks good!


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 12
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 14 Apr 2021 13:01:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.

2021-04-14 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17195 )

Change subject: IMPALA-10584: Defer advancing read page when reservation is 
tight.
..


Patch Set 5: Code-Review+1

(1 comment)

Looks good to me!

http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc@747
PS5, Line 747: the first page must be a read page
May DCHECK() on this assertion here.



--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 14 Apr 2021 12:58:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17316 )

Change subject: IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and 
Azure ABFS
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7073/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17316
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id807e8a200b83283a09d3a917185cabab930017d
Gerrit-Change-Number: 17316
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 14 Apr 2021 12:42:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS

2021-04-14 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17316


Change subject: IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and 
Azure ABFS
..

IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS

LOAD DATA INPATH silently fails when Impala tries to move files from
HDFS to ABFS. The problem is that we use FileSystem.makeQualified(Path)
to decide if path is on a given filesystem. We expect to get an
IllegalArgumentException if path is on a different filesystem. However,
the Azure FileSystem implementation doesn't throw this exception.

Because of that Impala thinks that an 'hdfs://' path and an 'abfs://'
path is on the same filesystem, so it tries to move files with
FileSystem.rename(). In case of errors rename() might throw an
Exception, or return false. Impala doesn't check the return value,
therefore if rename() returns false then the error remains silent.

This patch fixes Impala's isPathOnFileSystem() and also adds a check
for the return value of rename().

Testing:
 * tested manually between HDFS and Azure ABFS.

Change-Id: Id807e8a200b83283a09d3a917185cabab930017d
---
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
1 file changed, 8 insertions(+), 6 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/17316/1
--
To view, visit http://gerrit.cloudera.org:8080/17316
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id807e8a200b83283a09d3a917185cabab930017d
Gerrit-Change-Number: 17316
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..


Patch Set 21:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7072/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 14 Apr 2021 12:03:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10657: Remove accidental usage of shaded imports

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17314 )

Change subject: IMPALA-10657: Remove accidental usage of shaded imports
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7071/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17314
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifa67aec96539db7861416f3e71cca83607e3d8c9
Gerrit-Change-Number: 17314
Gerrit-PatchSet: 2
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Wed, 14 Apr 2021 12:01:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17289 )

Change subject: IMPALA-10532: 
TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7070/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17289
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f
Gerrit-Change-Number: 17289
Gerrit-PatchSet: 5
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 14 Apr 2021 11:57:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-14 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..


Patch Set 21:

> Patch Set 21:
>
> Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7067/ 
> DRY_RUN=false

I think we need to rerun the gerrit-verify-dryrun since jenkins.impala.io down 
yesterday.


--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 14 Apr 2021 11:39:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8572/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 14 Apr 2021 10:43:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8571/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 14 Apr 2021 10:40:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17313/5/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/17313/5/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@879
PS5, Line 879: unpartTable.getFileSystem(), new 
Path(unpartTable.getHdfsBaseDir()), overwrite,
line too long (91 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 14 Apr 2021 10:24:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

2021-04-14 Thread Csaba Ringhofer (Code Review)
Hello Vihang Karajgaonkar, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17313

to look at the new patch set (#5).

Change subject: IMPALA-10656: Fire insert events before commit
..

IMPALA-10656: Fire insert events before commit

Before this fix Impala committed an insert first, then reloaded the table
from HMS, and generated the insert events based on the difference between
the two snapshots. (e.g. which file was not present in the old snapshot
but are there in the new one).

Hive replication expects the insert events before the commit, so this may
potentially lead to issues there.

The solution is to collect the new files during the insert in the backend,
and send the insert events based on this file set. This wasn't very hard
to do as we were already collecting the files in some cases:
- to move them from staging dir to their final location in case of
  non-partitioned tables
- to write the file list to snapshot files in case of Iceberg tables
This patch unifies the paths above and collects all information about
the created files regardless of the table type.

Testing:
- no new tests, insert events were already covered in
  test_event_processing.py and MetastoreEventsProcessorTest.java
- ran core tests

Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
---
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/output-partition.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/service/client-request-state.cc
M common/protobuf/control_service.proto
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
12 files changed, 220 insertions(+), 216 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/17313/5
--
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17313/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/17313/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@879
PS4, Line 879: unpartTable.getFileSystem(), new 
Path(unpartTable.getHdfsBaseDir()), overwrite,
line too long (91 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 14 Apr 2021 10:22:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

2021-04-14 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
..


Patch Set 4:

PS 4 fixes the issue found during the jenkins tests.


--
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 14 Apr 2021 10:21:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

2021-04-14 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
..

IMPALA-10656: Fire insert events before commit

Before this fix Impala committed an insert first, then reloaded the table
from HMS, and generated the insert events based on the difference between
the two snapshots. (e.g. which file was not present in the old snapshot
but are there in the new one).

Hive replication expects the insert events before the commit, so this may
potentially lead to issues there.

The solution is to collect the new files during the insert in the backend,
and send the insert events based on this file set. This wasn't very hard
to do as we were already collecting the files in some cases:
- to move them from staging dir to their final location in case of
  non-partitioned tables
- to write the file list to snapshot files in case of Iceberg tables
This patch unifies the paths above and collects all information about
the created files regardless of the table type.

Testing:
- no new tests, insert events were already covered in
  test_event_processing.py and MetastoreEventsProcessorTest.java
- ran core tests

Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
---
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/output-partition.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/service/client-request-state.cc
M common/protobuf/control_service.proto
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
12 files changed, 225 insertions(+), 216 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/17313/4
--
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types

2021-04-14 Thread Tamas Mate (Code Review)
Tamas Mate has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most 
common types
..


Patch Set 25: Code-Review+1

Thanks everyone, there are a few styling nits and a missing include guard 
(comment on patch set 23) that are still there.
LGTM!


--
To view, visit http://gerrit.cloudera.org:8080/17026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 25
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 14 Apr 2021 08:54:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-14 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 12: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 12
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 14 Apr 2021 08:46:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 11:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/8570/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 11
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 14 Apr 2021 08:15:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10645: Log catalogd HMS API metrics

2021-04-14 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17284 )

Change subject: IMPALA-10645: Log catalogd HMS API metrics
..


Patch Set 2:

(25 comments)

http://gerrit.cloudera.org:8080/#/c/17284/2/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

http://gerrit.cloudera.org:8080/#/c/17284/2/common/thrift/JniCatalog.thrift@849
PS2, Line 849:
nit: redundant blank line


http://gerrit.cloudera.org:8080/#/c/17284/2/common/thrift/JniCatalog.thrift@884
PS2, Line 884:
nit: redundant blank line


http://gerrit.cloudera.org:8080/#/c/17284/2/common/thrift/JniCatalog.thrift@909
PS2, Line 909:
nit: redundant blank line


http://gerrit.cloudera.org:8080/#/c/17284/2/common/thrift/JniCatalog.thrift@934
PS2, Line 934:
nit: redundant blank lines


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2219
PS2, Line 2219: if
nit: need one space after "if"


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2221
PS2, Line 2221:
nit: I think we use 4 spaces idention here


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2227
PS2, Line 2227:
nit: 4 spaces idention


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2230
PS2, Line 2230: if
nit: need one space after "if"


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2233
PS2, Line 2233:
nit: 4 spaces idention


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2067
PS2, Line 2067: Counter misses = 
CatalogMonitor.INSTANCE.getCatalogdHmsCacheMetrics().getCounter(FILEMETADATA_CACHE_MISS_METRIC);
Previously we count the metrics per table, this changes it to use a global 
counter. Is it intended?


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2131
PS2, Line 2131: getFileMetadataCacheHitRate
After this patch, the hit rate will be a global hit rate counted on all tables. 
Is it intended?


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java
File 
fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java:

http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@77
PS2, Line 77:   private static final Logger LOG =
:   LoggerFactory.getLogger(CatalogMetastoreServer.class);
nit: don't need changes since the original line fits in 90 chars.


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@94
PS2, Line 94:   private static final String ACTIVE_CONNECTIONS_METRIC
:   = "metastore.active.connections";
nit: don't need changes as well.


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@99
PS2, Line 99: =
nit: I think our code style tend to put this in the above line


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@143
PS2, Line 143:
nit: redundant blank line


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@200
PS2, Line 200: throws Throwable {
nit: move to the above line


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@201
PS2, Line 201:
nit: redundant blank line


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@202
PS2, Line 202: contains
nit: Unnecessary 'contains' check. We can always call add() directly.


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@256
PS2, Line 256:   /**
 :*
 :*/
Do you plan to add something here?


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@486
PS2, Line 486:
nit: redundant blank line


http://gerrit.cloudera.org:8080/#/c/17284/2/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java@488
PS2, Line 

[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions.

2021-04-14 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table cache for hms ddl apis 
which modify tables and partitions.
..


Patch Set 3:

(13 comments)

The solution looks good to me. Please adjust the code style to match existing 
codes. Thanks!

http://gerrit.cloudera.org:8080/#/c/17298/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17298/3//COMMIT_MSG@9
PS3, Line 9: For non transactional tables
Could you explain why we don't do this for transactional tables in the commit 
message?


http://gerrit.cloudera.org:8080/#/c/17298/3//COMMIT_MSG@13
PS3, Line 13: (since table loading in cache takes time) but ensures 
consistency. This change is behind catalogd
nit: please adjust the commit message body to fit into at-most 72 chars per 
line.


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
File 
fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java:

http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@141
PS3, Line 141: import 
org.apache.hadoop.hive.metastore.api.InvalidPartitionException;
Is this used somewhere?


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@222
PS3, Line 222: UnknownPartitionException
Is this used somewhere?


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@347
PS3, Line 347:
nit: We use 4 spaces indention for multi-line statement. However, many changes 
in this patch use 8 spaces. Could you configure your IDE to adjust this?


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@349
PS3, Line 349:
nit: 4 spaces indention


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@915
PS3, Line 915: partitionList
Is it possible that these partitions belong to different tables?
* If no, let's call invalidateNonTransactionalTableIfExists() once
* If yes, let's de-duplicate the table names first and then call 
invalidateNonTransactionalTableIfExists() for each table.


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@929
PS3, Line 929: for (PartitionSpec partitionSpec : list) {
Same question as above


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@1086
PS3, Line 1086:
nit: 4 spaces indention


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@1087
PS3, Line 1087: ,
nit: 8 spaces indention and move the comma above


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@2849
PS3, Line 2849: if
nit: In our code style, we need one space after "if". Many changes in this 
patch don't have it. Please adjust them. Thanks!


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@2903
PS3, Line 2903: // return immediately if flag invalidateCacheOnDDLs_ is 
false
  : if(!invalidateCacheOnDDLs_) {
  :   LOG.debug("Not removing table {}.{} from catalogd cache 
because " +
  :   "invalidateCacheOnDDLs_ flag is set to {} 
", dbNameWithCatalog,
  :   tableName, invalidateCacheOnDDLs_);
  :   return;
  : }
  : // Parse db name. Throw error if parsing fails.
  : String dbName = dbNameWithCatalog;
  : try {
  :   dbName = MetaStoreUtils.parseDbName(dbNameWithCatalog, 
serverConf_)[1];
  : } catch (MetaException ex) {
  :   LOG.error("Successfully executed HMS api: {} but 
encountered error " +
  :   "when parsing dbName {} to 
invalidate/remove table from cache with error message: {}",
  :   apiName, dbNameWithCatalog, ex.getMessage());
  :   throw ex;
  : }
  : Db db = catalog_.getDb(dbName);
  : if(db == null) {
  :   LOG.debug("Not removing table {}.{} because db {} does 
not exist in catalogd cache",
  :   dbName, tableName, dbName);
  :   return;
  : }
  : if(!db.containsTable(tableName)) {
  :   LOG.debug("Not removing table {}.{} 

[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats

2021-04-14 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17299 )

Change subject: IMPALA-10652: Optimize the checking of the size of incremental 
stats
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17299/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/17299/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@448
PS2, Line 448: numOfAllIncStats
> May rename the variable to numOfAllIncStatsPartitions
Done


http://gerrit.cloudera.org:8080/#/c/17299/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@452
PS2, Line 452: if (partitionSet_ == null) {
 :   numOfAllIncStatsPartitions = allPartitio
> We may not need to verify the size limit when the partition set for increme
It is a pre-check on the size of incremental stats to prevent the incremental 
stats from occupying too much memory after calculation. If no partition is 
specified, all partitions are calculated. Need to check whether the incremental 
stats of all partitions exceeds the threshold after calculation.



--
To view, visit http://gerrit.cloudera.org:8080/17299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee
Gerrit-Change-Number: 17299
Gerrit-PatchSet: 3
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Wed, 14 Apr 2021 06:18:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats

2021-04-14 Thread liuyao (Code Review)
Hello Aman Sinha, Qifan Chen, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17299

to look at the new patch set (#3).

Change subject: IMPALA-10652: Optimize the checking of the size of incremental 
stats
..

IMPALA-10652: Optimize the checking of the size of incremental stats

Modify the estimation method of incremental statistics size:
incremental statistics size = Existing partition statistics
+ This time calculation partition stats
- Repeated calculation partition stats

Testing:
All partitions of a table have no incremental stats.
--Calculate the incremental stats of all partitions,
  the incremental stats size exceeds the threshold,
  an error is reported.
--Calculate the incremental stats of one partition,
  no error is reported.

Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
2 files changed, 45 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/17299/3
--
To view, visit http://gerrit.cloudera.org:8080/17299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee
Gerrit-Change-Number: 17299
Gerrit-PatchSet: 3
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-14 Thread fifteencai (Code Review)
fifteencai has uploaded a new patch set (#12). ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..

IMPALA-10445: Adjust NDV's scale with query option

This is a new way to control NDV's scale.

Since IMPALA-2658, we can trade memory for more accurate
estimation by setting larger `scale` in SQL function
NDV(, ). However the use of larger NDV scale requires
the modification of SQL queries which may not be practical in certain
applications:

- Firstly, SQL writers are reluctant to lower that scale. They prone
to fill up the scale, which will make the cluster unstable. Especially
when there are `group by`s with high cardinalities. So it is wiser to
let cluster admin other than sql writer choose appropriate scale.

- Secondly, In some application scenarios, queries are stored in DBs.
In a BI system, for example, rewriting thousands of SQLs is risky.

In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`
with the following semantics:

1. The allowed value is in the range [1..10];
2. Previously, the scale used in NDV() functions was fixed at 2.
Now the scale is provided by the newly added query options.
3. It does not influence the NDV scale for SQL function
NDV(, ) in which the NDV scale is provided by the 2nd
argument .

We also refactored method `Analyze` to make sure APPX_COUNT_DISTINCT
can work with this query option. After this, cluster admins can
substitute `count(distinct )` with `ndv(, scale)`.

Implementation details:

- The default value of DEFAULT_NDV_SCALE is 2, so we won't change
the default ndv behavior.
- We port `CountDistinctToNdv` transform logic from
`SelectStmt.analyze()` to `ExprRewriter`, making it compatible with
further rewrite rules.
- The newly added rewrite rule `DefaultNdvScaleRule` is applied
after `CountDistinctToNdvRule`.

Usage:

To set a default ndv scale:
```
SET DEFAULT_NDV_SCALE = 10;
```

To unset:
```
SET DEFAULT_NDV_SCALE = 2;
```

Here are test results of a typical workload (cardinality=40,090,650):
++
|   Metric| Count Distinct |NDV2|NDV5|NDV10  |
++
|  Memory(GB) |   3.83 |1.84|1.85| 1.89  |
| Duration(s) |  182.89|   30.22|29.72   | 29.24 |
|  ErrorRate  |0%  |1.8%|1.17%   | 0.06% |
++

Testing:
1) Added 3 unit test cases in `ExprRewriteRulesTest`.
2) Added 5 unit test cases in `ExprRewriterTest`.
3) Ran all front-end unit test, passed.
4) Added a new query-option test.

Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
11 files changed, 298 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/12
--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 12
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai