[Impala-ASF-CR] IMPALA-10677: Set selectivity of Not-equal

2021-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17344 )

Change subject: IMPALA-10677: Set selectivity of Not-equal
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8644/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a
Gerrit-Change-Number: 17344
Gerrit-PatchSet: 1
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 27 Apr 2021 04:25:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats

2021-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17299 )

Change subject: IMPALA-10652: Optimize the checking of the size of incremental 
stats
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7103/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee
Gerrit-Change-Number: 17299
Gerrit-PatchSet: 4
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Tue, 27 Apr 2021 04:07:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10677: Set selectivity of Not-equal

2021-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17344 )

Change subject: IMPALA-10677: Set selectivity of Not-equal
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7102/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a
Gerrit-Change-Number: 17344
Gerrit-PatchSet: 1
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 27 Apr 2021 04:06:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10677: Set selectivity of Not-equal

2021-04-26 Thread liuyao (Code Review)
liuyao has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17344


Change subject: IMPALA-10677: Set selectivity of Not-equal
..

IMPALA-10677: Set selectivity of Not-equal

Calculate binary predicate selectivity if one of the children is
a slotref and the other children are all constant,
e.g. something like "col != 5", but not "2 * col != 10".

selectivity = 1 - 1/ndv
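
As a rough illustration of the formula above (a sketch only, not the code in
BinaryPredicate.java): the helper name ne_selectivity and the convention of
returning -1 when the NDV is unknown are assumptions made for this example.

```python
def ne_selectivity(ndv):
    """Estimate the fraction of rows matching `col != constant`.

    Hypothetical helper: `ndv` is the column's number of distinct values.
    Returns -1.0 when the NDV is unknown (an assumed convention).
    """
    if ndv is None or ndv <= 0:
        return -1.0
    # An equality predicate is assumed to match 1/ndv of the rows,
    # so the not-equal predicate matches the remainder.
    return 1.0 - 1.0 / ndv


print(ne_selectivity(4))  # 0.75
```

With ndv = 1 the estimate is 0, since every row would hold the single value
that the predicate excludes.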

Testing:
Modified testNeSelectivity() in ExprCardinalityTest.java, changing -1 to
the correct value.

Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a
---
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/card-scan.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
12 files changed, 60 insertions(+), 57 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17344/1
--
To view, visit http://gerrit.cloudera.org:8080/17344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a
Gerrit-Change-Number: 17344
Gerrit-PatchSet: 1
Gerrit-Owner: liuyao 


[Impala-ASF-CR] IMPALA-7825: Upgrade Thrift version to 0.11.0

2021-04-26 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17170 )

Change subject: IMPALA-7825: Upgrade Thrift version to 0.11.0
..


Patch Set 23: Code-Review+2

> Patch Set 23:
>
> (7 comments)
>
> Thanks a lot Quanlong for the detailed analysis!
>
> I added more conversions, and now test_shell_interactive.py passes with the 
> non-accelerated protocol.
>
> I like the code less and less though and have become unsure about the 
> no_utf8strings option. When reading thrift structures, it makes sense, as we 
> can avoid unnecessary decode + encode pairs if we expect the result in utf8. 
> But when writing, it would be better to convert every 'unicode' to utf8; it is 
> too much hassle to do this in the caller.
>
> I think that ideally Thrift would always encode when writing but return 
> string during read based on some option from the protocol, and do this 
> consistently in both accelerated and normal protocol.

Yeah, I think the hassle comes from http://gerrit.cloudera.org:8080/15524 
(IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3). Starting 
from that patch, we changed our internal string type from 'str' to 'unicode' in 
python2:

 from __future__ import unicode_literals

At that point we expected to get 'unicode' from thrift. Now we switch the thrift 
py module to be compiled with no_utf8strings, so we are getting 'str' from 
thrift. This breaks the code that expects 'unicode' values and requires 
additional conversion code.

To finish the python3 compatibility work in impala-shell, I think we still need 
to insist on importing unicode_literals. I have some thoughts on future items 
(need further discussion).
* Use the thrift py module without no_utf8strings in Impyla; then Impyla may be 
able to remove the dependency on thriftpy2 in Python3.
* Impyla can provide an option on whether to return 'str' or 'unicode' values 
in python2, and then do the necessary conversion at the boundary. In our tests, 
we'd like Impyla to return 'str' values.
* Finally we can get rid of the no_utf8strings option in impala-shell and drop 
the conversion code added in this patch.
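
To make "conversion at the boundary" concrete, here is a minimal sketch. It is 
written for Python 3, where bytes and str play the roles that 'str' and 
'unicode' play in python2. The helper to_text is hypothetical and is not part of 
Impyla, impala-shell or thrift.

```python
def to_text(value, encoding="utf-8"):
    """Recursively decode byte strings to text; leave other values alone."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, (list, tuple)):
        # Rebuild the same container type with decoded elements.
        return type(value)(to_text(v, encoding) for v in value)
    if isinstance(value, dict):
        return {to_text(k, encoding): to_text(v, encoding)
                for k, v in value.items()}
    return value


# Decode once, right where values leave the transport layer.
row = to_text([b"caf\xc3\xa9", 42])
```

Doing the conversion in one place like this is what would let the callers keep 
working with a single string type.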

The current patch set LGTM. Thanks for addressing the comments!


--
To view, visit http://gerrit.cloudera.org:8080/17170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Gerrit-Change-Number: 17170
Gerrit-PatchSet: 23
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 27 Apr 2021 02:14:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

2021-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
..

IMPALA-10656: Fire insert events before commit

Before this fix Impala committed an insert first, then reloaded the
table from HMS, and generated the insert events based on the difference
between the two snapshots (i.e. which files were not present in the old
snapshot but are there in the new one).

Hive replication expects the insert events before the commit, so this
may potentially lead to issues there.

The solution is to collect the new files during the insert in the
backend, and send the insert events based on this file set. This wasn't
very hard to do as we were already collecting the files in some cases:
- to move them from staging dir to their final location in case of
  non-partitioned tables
- to write the file list to snapshot files in case of Iceberg tables
This patch unifies the paths above and collects all information about
the created files regardless of the table type.
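
The difference between the two approaches can be sketched roughly as follows.
This is an illustration only: the function and class names are hypothetical
and do not correspond to the backend or catalog code touched by this patch.

```python
def files_from_snapshot_diff(old_snapshot, new_snapshot):
    """Old approach: after the commit, diff two file listings of the table."""
    return sorted(set(new_snapshot) - set(old_snapshot))


class InsertFileCollector:
    """New approach: remember every file while the insert is writing it."""

    def __init__(self):
        self._files_by_partition = {}

    def record(self, partition, path):
        self._files_by_partition.setdefault(partition, []).append(path)

    def files_for_events(self):
        # Known before the commit, without reloading the table.
        return {p: list(fs) for p, fs in self._files_by_partition.items()}


collector = InsertFileCollector()
collector.record("p=1", "/warehouse/t/p=1/f2")
events_input = collector.files_for_events()
```

Because the collector already holds the file set when the write finishes, the
insert events can be generated from it before the commit.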

Testing:
- no new tests, insert events were already covered in
  test_event_processing.py and MetastoreEventsProcessorTest.java
- ran core tests

Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Reviewed-on: http://gerrit.cloudera.org:8080/17313
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/output-partition.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/service/client-request-state.cc
M common/protobuf/control_service.proto
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
12 files changed, 247 insertions(+), 226 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 15
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

2021-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
..


Patch Set 14: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 14
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 27 Apr 2021 00:41:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10644: RangerAuthorizationFactory cannot be instantiated

2021-04-26 Thread Joe McDonnell (Code Review)
Joe McDonnell has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17282 )

Change subject: IMPALA-10644: RangerAuthorizationFactory cannot be instantiated
..

IMPALA-10644: RangerAuthorizationFactory cannot be instantiated

Earlier, when the GBN was bumped up to 11920537 in commit
1ab1143, some of the solr dependencies were excluded. This causes
RangerAuthorizationFactory initialization errors.

This patch reverts the dependency exclusion to fix the problem.

Testing:
 - Passes core job

Change-Id: I1b6953b84fd28bb75f97516a3b7f40cd0a12af41
Reviewed-on: http://gerrit.cloudera.org:8080/17282
Tested-by: Impala Public Jenkins 
Reviewed-by: Joe McDonnell 
---
M fe/pom.xml
1 file changed, 6 insertions(+), 8 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Joe McDonnell: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/17282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I1b6953b84fd28bb75f97516a3b7f40cd0a12af41
Gerrit-Change-Number: 17282
Gerrit-PatchSet: 5
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-10644: RangerAuthorizationFactory cannot be instantiated

2021-04-26 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17282 )

Change subject: IMPALA-10644: RangerAuthorizationFactory cannot be instantiated
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1b6953b84fd28bb75f97516a3b7f40cd0a12af41
Gerrit-Change-Number: 17282
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Mon, 26 Apr 2021 22:01:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10644: RangerAuthorizationFactory cannot be instantiated

2021-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17282 )

Change subject: IMPALA-10644: RangerAuthorizationFactory cannot be instantiated
..


Patch Set 4: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1b6953b84fd28bb75f97516a3b7f40cd0a12af41
Gerrit-Change-Number: 17282
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Mon, 26 Apr 2021 21:56:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

2021-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS 
DDL apis
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8643/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 8
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Mon, 26 Apr 2021 20:03:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

2021-04-26 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS 
DDL apis
..


Patch Set 8:

Thanks. The changes look good to me.


--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 8
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Mon, 26 Apr 2021 20:02:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

2021-04-26 Thread Sourabh Goyal (Code Review)
Sourabh Goyal has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS 
DDL apis
..


Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17298/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17298/5//COMMIT_MSG@9
PS5, Line 9: For transactional tables, catalogd already guarantees consitent 
table
   : metadata reads
> nit, Can you please reformat this commit msg to 72 line width as per the co
Ack


http://gerrit.cloudera.org:8080/#/c/17298/7/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
File 
fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java:

http://gerrit.cloudera.org:8080/#/c/17298/7/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@2966
PS7, Line 2966: n
> I found that removing the table directly from catalog_ doesn't take the met
Sure.


http://gerrit.cloudera.org:8080/#/c/17298/7/tests/custom_cluster/test_metastore_service.py
File tests/custom_cluster/test_metastore_service.py:

http://gerrit.cloudera.org:8080/#/c/17298/7/tests/custom_cluster/test_metastore_service.py@439
PS7, Line 439: invalid
> nit, s/removed/invalidated
Ack


http://gerrit.cloudera.org:8080/#/c/17298/7/tests/custom_cluster/test_metastore_service.py@442
PS7, Line 442: removed
> nit, s/removed/invalidated
For drop case, we remove (and not invalidate) from the cache.



--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 8
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Mon, 26 Apr 2021 19:43:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

2021-04-26 Thread Sourabh Goyal (Code Review)
Hello Quanlong Huang, Vihang Karajgaonkar, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17298

to look at the new patch set (#8).

Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS 
DDL apis
..

IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

For transactional tables, catalogd already guarantees consistent table
metadata reads based on the writeIdList passed in the request. For
non-transactional tables, the reads are eventually consistent, as the
event processor thread in the background processes HMS events for the
table and updates its metadata.
In this patch, to ensure strong consistency guarantees for external
tables, we invalidate the table metadata from the cache if HMS DDL apis
like alter/drop table/partition are accessed from catalogd's metastore
server. As a result, any subsequent get table request fetches
the table from HMS and loads it in the cache. This ensures that any
get_table/get_partition requests after DDL operations on the same table
return updated table metadata. This behavior has a performance penalty
since metadata loading in the cache takes time, especially for large tables.
The change is behind the catalogd server flag
invalidate_hms_cache_on_ddls, which is enabled by default. The flag
needs to be turned off in case of a performance bottleneck.
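
The invalidate-on-DDL behavior can be pictured with a small sketch. It assumes
a plain in-memory dict as the cache and a caller-supplied loader standing in
for HMS. The class and method names are hypothetical and are not taken from
MetastoreServiceHandler.java.

```python
class TableMetadataCache:
    """Toy read-through cache that drops an entry when a DDL touches it."""

    def __init__(self, load_from_hms, invalidate_on_ddl=True):
        self._load = load_from_hms            # stand-in for an HMS fetch
        self._invalidate_on_ddl = invalidate_on_ddl
        self._tables = {}

    def get_table(self, name):
        if name not in self._tables:
            # Cache miss: pay the (potentially slow) load cost.
            self._tables[name] = self._load(name)
        return self._tables[name]

    def on_ddl(self, name):
        # With the flag off, readers keep seeing the cached copy until
        # something else refreshes it.
        if self._invalidate_on_ddl:
            self._tables.pop(name, None)
```

The trade-off described above shows up in get_table(): every DDL forces the
next read to reload the table, which is why the behavior sits behind a flag.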

Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M tests/custom_cluster/test_metastore_service.py
6 files changed, 517 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/17298/8
--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 8
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 


[native-toolchain-CR] IMPALA-10674: Update toolchain ORC library for better Iceberg support

2021-04-26 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17342 )

Change subject: IMPALA-10674: Update toolchain ORC library for better Iceberg 
support
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17342
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I72625f4bd6ff3e83ffaaa2c83d31b8ee29c0c35a
Gerrit-Change-Number: 17342
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Mon, 26 Apr 2021 19:31:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10676: Improve start/stop scripts for Hiveserver and Metastore

2021-04-26 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17340 )

Change subject: IMPALA-10676: Improve start/stop scripts for Hiveserver and 
Metastore
..


Patch Set 1: Code-Review+2

(1 comment)

Left a non-blocking comment below.

http://gerrit.cloudera.org:8080/#/c/17340/1/testdata/bin/run-hive-server.sh
File testdata/bin/run-hive-server.sh:

http://gerrit.cloudera.org:8080/#/c/17340/1/testdata/bin/run-hive-server.sh@145
PS1, Line 145: 30020
nit: It would be good to mention in the commit message that we 
are now exposing the 30020 debug port for HS2.



--
To view, visit http://gerrit.cloudera.org:8080/17340
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9208efdf49f383c5cfb10cd9881272847405a05
Gerrit-Change-Number: 17340
Gerrit-PatchSet: 1
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Mon, 26 Apr 2021 19:31:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

2021-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
..


Patch Set 14:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7101/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 14
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 26 Apr 2021 18:57:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

2021-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
..


Patch Set 14: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 14
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 26 Apr 2021 18:57:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early

2021-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17295 )

Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early
..


Patch Set 14:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8642/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183
Gerrit-Change-Number: 17295
Gerrit-PatchSet: 14
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 26 Apr 2021 18:26:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early

2021-04-26 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#14). ( 
http://gerrit.cloudera.org:8080/17295 )

Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early
..

IMPALA-10650: Bailout min/max filters in hash join builder early

This change set addresses the weakness in populating min/max filters
in the hash join builder by periodically measuring the usefulness of
each such filter and setting the 'always_true_' flag to true when the
filter is no longer useful. Once the flag is set, insertion into such
a filter completely skips the steps from the evaluation of the value
from a row to the verification of the value in the min/max range.
This optimization is LLVM-codegen'd.

In addition, a new flag 'is_min_max_value_present' is added to
TRuntimeFilterTargetDesc to indicate whether the min/max column stats
are present in the query plan. The flag eliminates the need to check
the presence of min/max stats for every row at runtime.
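
A simplified sketch of the early-bailout idea follows. The usefulness measure
(the fraction of the column's min/max range that the filter already covers),
the threshold and the check interval are all assumptions made for this
illustration. The real logic lives in the C++ backend and may differ.

```python
class MinMaxFilter:
    """Toy min/max filter that stops tracking values once it is useless."""

    def __init__(self, col_min, col_max, threshold=0.5, check_every=4):
        self.col_min, self.col_max = col_min, col_max  # column stats
        self.threshold = threshold      # hypothetical usefulness cutoff
        self.check_every = check_every  # hypothetical check interval
        self.min = None
        self.max = None
        self.always_true = False
        self._inserted = 0

    def insert(self, value):
        if self.always_true:
            return  # bailed out: skip the range update entirely
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)
        self._inserted += 1
        if self._inserted % self.check_every == 0:
            self._maybe_bail_out()

    def _maybe_bail_out(self):
        col_range = self.col_max - self.col_min
        if col_range <= 0:
            return
        covered = (self.max - self.min) / col_range
        if covered > self.threshold:
            # The filter would reject too little to be worth maintaining.
            self.always_true = True
```

Once always_true is set, insert() returns immediately, which mirrors the
skipped per-row work described above.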

The Insert() methods are optimized with branch prediction compiler
hints, which yield a 4% to 7% improvement for common SQL integer types.

Testing:
  1. Ran core test;
  2. Ran performance test (TBD).

Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/util/TColumnValueUtil.java
12 files changed, 370 insertions(+), 150 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17295/14
--
To view, visit http://gerrit.cloudera.org:8080/17295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183
Gerrit-Change-Number: 17295
Gerrit-PatchSet: 14
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types

2021-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17262 )

Change subject: WIP - IMPALA-10642: Write support for Parquet Bloom filters - 
most common types
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8641/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792
Gerrit-Change-Number: 17262
Gerrit-PatchSet: 7
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 26 Apr 2021 18:03:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

2021-04-26 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
..


Patch Set 13: Code-Review+2

+1 and carrying forward Zoltan's +1 from earlier.


--
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 13
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 26 Apr 2021 17:49:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

2021-04-26 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17313/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17313/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4876
PS12, Line 4876: These ACID_WRITE events are collected by HMS and become
   :* visible during commit
> Removed this sentence. I definitely don't want to become a source of false
Thanks.



--
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 12
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 26 Apr 2021 17:49:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types

2021-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17262 )

Change subject: WIP - IMPALA-10642: Write support for Parquet Bloom filters - 
most common types
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17262/7/be/src/exec/parquet/hdfs-parquet-table-writer.cc
File be/src/exec/parquet/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/17262/7/be/src/exec/parquet/hdfs-parquet-table-writer.cc@453
PS7, Line 453:   parquet_bloom_filter_bytes_ = 
parent->parquet_bloom_filter_col_sizes_[column_name()];
line too long (91 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792
Gerrit-Change-Number: 17262
Gerrit-PatchSet: 7
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 26 Apr 2021 17:45:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types

2021-04-26 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17262 )

Change subject: WIP - IMPALA-10642: Write support for Parquet Bloom filters - 
most common types
..


Patch Set 7:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17262/6/tests/query_test/test_parquet_bloom_filter.py
File tests/query_test/test_parquet_bloom_filter.py:

http://gerrit.cloudera.org:8080/#/c/17262/6/tests/query_test/test_parquet_bloom_filter.py@28
PS6, Line 28: p
> flake8: E126 continuation line over-indented for hanging indent
Done


http://gerrit.cloudera.org:8080/#/c/17262/6/tests/query_test/test_parquet_bloom_filter.py@126
PS6, Line 126: s
> flake8: E226 missing whitespace around arithmetic operator
Done


http://gerrit.cloudera.org:8080/#/c/17262/6/tests/query_test/test_parquet_bloom_filter.py@145
PS6, Line 145: w
> flake8: F841 local variable 'bloom_filter_header' is assigned to but never
Done



--
To view, visit http://gerrit.cloudera.org:8080/17262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792
Gerrit-Change-Number: 17262
Gerrit-PatchSet: 7
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 26 Apr 2021 17:44:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types

2021-04-26 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/17262 )

Change subject: WIP - IMPALA-10642: Write support for Parquet Bloom filters - 
most common types
..

WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types

This change adds support for writing Parquet Bloom filters for the types
for which read support was added in IMPALA-10640.

Writing of Parquet Bloom filters can be controlled by the
'parquet_bloom_filter_write' query option which has the following
possible values:
  NEVER  - never write Parquet Bloom filters
  IF_NO_DICT - write Parquet Bloom filters if specified in the table
   properties AND if the row group is not fully
   dictionary encoded
  ALWAYS - always write Parquet Bloom filters if specified in the
   table properties, even if the row group is fully
   dictionary encoded
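
A minimal Python sketch of how the three option values above might combine
with the table property and the row group's encoding. The function name and
its arguments are hypothetical, chosen for illustration only; they are not
taken from the patch.

```python
def should_write_bloom_filter(option, column_in_table_props, fully_dict_encoded):
    """Hypothetical helper: decide whether to write a Bloom filter for a column."""
    if option == "NEVER":
        return False
    if not column_in_table_props:
        # IF_NO_DICT and ALWAYS both require the column to be listed
        # in the table properties.
        return False
    if option == "IF_NO_DICT":
        return not fully_dict_encoded
    if option == "ALWAYS":
        return True
    raise ValueError("unknown parquet_bloom_filter_write value: %r" % (option,))
```

Under these assumptions, ALWAYS and IF_NO_DICT differ only for row groups
that are fully dictionary encoded.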

Introduced the 'parquet.bloom.filter.columns' table property. It is a
comma-separated list of 'col_name:bytes' pairs. The 'bytes' part means
the size of the bitset of the Bloom filter, and is optional. If the size
is not given, it will be the maximal Bloom filter size
(ParquetBloomFilter::MAX_BYTES).
Example: "col1:1024,col2,col4:100".
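
The property format described above could be parsed roughly as follows. This
is a Python sketch for illustration, not the parser from the patch: the
function name is hypothetical, and ASSUMED_MAX_BYTES is a placeholder because
the actual value of ParquetBloomFilter::MAX_BYTES is not stated here.

```python
# Placeholder only: stands in for ParquetBloomFilter::MAX_BYTES, whose
# real value is not given in this message.
ASSUMED_MAX_BYTES = 4096


def parse_bloom_filter_columns(prop, max_bytes=ASSUMED_MAX_BYTES):
    """Hypothetical parser for a 'col_name:bytes' comma-separated list."""
    sizes = {}
    for item in prop.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, size = item.partition(":")
        # A missing ':bytes' part falls back to the maximal size.
        sizes[name.strip()] = int(size) if sep else max_bytes
    return sizes


print(parse_bloom_filter_columns("col1:1024,col2,col4:100"))
# prints {'col1': 1024, 'col2': 4096, 'col4': 100}
```

A malformed size such as "col1:abc" raises ValueError in this sketch; how the
real implementation reports such input is not covered here.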

Testing:
  - Added a test in tests/query_test/test_parquet_bloom_filter.py that
uses Impala to write the same table as in the test file
'testdata/data/parquet-bloom-filtering.parquet' and checks whether the
Parquet Bloom filter header and bitset are identical.
  - TODO: Test falling back from dict encoding to plain and using Bloom
filters.

Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.h
M be/src/exec/parquet/parquet-bloom-filter-util.cc
M be/src/exec/parquet/parquet-bloom-filter-util.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/dict-encoding.h
M be/src/util/parquet-bloom-filter-test.cc
M be/src/util/parquet-bloom-filter.cc
M be/src/util/parquet-bloom-filter.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M tests/query_test/test_parquet_bloom_filter.py
20 files changed, 584 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/17262/7
--
To view, visit http://gerrit.cloudera.org:8080/17262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792
Gerrit-Change-Number: 17262
Gerrit-PatchSet: 7
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-7825: Upgrade Thrift version to 0.11.0

2021-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17170 )

Change subject: IMPALA-7825: Upgrade Thrift version to 0.11.0
..


Patch Set 23:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8640/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Gerrit-Change-Number: 17170
Gerrit-PatchSet: 23
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 26 Apr 2021 16:45:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7825: Upgrade Thrift version to 0.11.0

2021-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17170 )

Change subject: IMPALA-7825: Upgrade Thrift version to 0.11.0
..


Patch Set 22:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8639/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Gerrit-Change-Number: 17170
Gerrit-PatchSet: 22
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 26 Apr 2021 16:39:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7825: Upgrade Thrift version to 0.11.0

2021-04-26 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17170 )

Change subject: IMPALA-7825: Upgrade Thrift version to 0.11.0
..


Patch Set 23:

(7 comments)

Thanks a lot Quanlong for the detailed analysis!

I added more conversions, and now test_shell_interactive.py passes with the 
non-accelerated protocol.

I like the code less and less though, and I have become unsure about the 
no_utf8strings option. When reading thrift structures, it makes sense, as we 
can avoid unnecessary decode + encode pairs if we expect the result in utf8. 
But when writing, it would be better to convert every 'unicode' to utf8, as it 
is too much hassle to do this in the caller.

I think that ideally Thrift would always encode when writing but return string 
during read based on some option from the protocol, and do this consistently in 
both accelerated and normal protocol.
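
For readers following the discussion, here is a minimal sketch of what such 
conversion helpers could look like. The function names match the ones flagged 
in shell/impala_client.py in this thread, but the bodies are an assumption 
written for illustration and are not copied from the patch.

```python
import sys

PY2 = sys.version_info[0] == 2


def utf8_decode_if_needed(val):
    """Python 2: decode a utf8-encoded 'str' to 'unicode'. Python 3: no-op."""
    if PY2 and isinstance(val, str):
        return val.decode("utf-8")
    return val


def utf8_encode_if_needed(val):
    """Python 2: encode 'unicode' to a utf8 'str'. Python 3: no-op."""
    # 'unicode' only exists in Python 2; the PY2 check short-circuits first.
    if PY2 and isinstance(val, unicode):  # noqa: F821
        return val.encode("utf-8")
    return val
```

With no_utf8strings, the caller would apply the decode helper to strings read 
from Thrift and the encode helper to strings passed to Thrift; values of other 
types pass through unchanged.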

http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala-shell
File shell/impala-shell:

http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala-shell@29
PS21, Line 29: 0.1
> stale comment
Done


http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala_client.py
File shell/impala_client.py:

http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala_client.py@85
PS21, Line 85: # Helper to decode utf8 encoded str to unicode type in Python 2. 
NOOP in Python 3.
> While calling this on all string fields from thrift, I think we also need t
Done


http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala_client.py@735
PS21, Line 735:
> I think we need to encode this into 'str' when it's 'unicode' in python2. T
Done


http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala_client.py@736
PS21, Line 736: ngImpalaHS2Service rpc is ide
> This also contains unicodes, which could lead to an error in ImpalaHttpClie
Done


http://gerrit.cloudera.org:8080/#/c/17170/21/shell/impala_client.py@1120
PS21, Line 1120: _service(
> I think we need to encode this too, if it's unicode in python2.
Done


http://gerrit.cloudera.org:8080/#/c/17170/22/shell/impala_client.py
File shell/impala_client.py:

http://gerrit.cloudera.org:8080/#/c/17170/22/shell/impala_client.py@85
PS22, Line 85: # Helper to decode utf8 encoded str to unicode type in Python 2. 
NOOP in Python 3.
> flake8: E302 expected 2 blank lines, found 1
Done


http://gerrit.cloudera.org:8080/#/c/17170/22/shell/impala_client.py@91
PS22, Line 91:
> flake8: E302 expected 2 blank lines, found 1
Done



--
To view, visit http://gerrit.cloudera.org:8080/17170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Gerrit-Change-Number: 17170
Gerrit-PatchSet: 23
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 26 Apr 2021 16:34:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7825: Upgrade Thrift version to 0.11.0

2021-04-26 Thread Csaba Ringhofer (Code Review)
Hello Quanlong Huang, Tamas Mate, Qifan Chen, Zoltan Borok-Nagy, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17170

to look at the new patch set (#23).

Change subject: IMPALA-7825: Upgrade Thrift version to 0.11.0
..

IMPALA-7825: Upgrade Thrift version to 0.11.0

Before this patch Impala mainly used Thrift 0.9.3, but it was
possible to compile Impala shell with Thrift 0.11.0, so the 0.11.0
Thrift lib was already included in the toolchain.

Most of the changes are related to replacing boost:: with std::
shared_ptr-s in cpp code (this is a continuation of a patch by Sahil).

The Thrift upgrade also needs an Impyla release with Thrift 0.11.0, as
Impala's test framework relies on Impyla. A thrift_sasl release is also
needed, because it currently pins Thrift version to 0.9.3 for Python 2.

The current patch uses alpha releases from Impyla and thrift_sasl that
use thrift 0.11.0.

Notable side effects:
- old logic to compile thrift for impala-shell with 0.11.0 was removed
- impala_shell's utf8 handling had to be updated as the new 0.11.0
  compilation happens with no_utf8strings. This also made things a
  bit faster, e.g. the following takes ~0.22s instead of ~0.25s:
  shell/impala_shell.py \
-B -q "select * from functional_parquet.alltypes;" > /dev/null
- THRIFT-3921 changed the stream operators to print an enum's name
  instead of its number, leading to slightly different messages
  in some cases.
- "templates" was added to the thrift generator's parameters to avoid
  a compilation issue (related to IMPALA-10600). I didn't notice any
  change in compilation time. This option generates .tcc files with
  templatized readers/writers for Thrift types. Currently we don't
  use these, but they could potentially speed up (de)serialization.

Testing:
- ran Impyla's test suite with Python 2 and 3
- ran core tests

Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
---
M CMakeLists.txt
M be/src/benchmarks/network-perf-benchmark.cc
M be/src/catalog/catalog-server.h
M be/src/catalog/catalog-service-client-wrapper.h
M be/src/catalog/catalog-util.cc
M be/src/catalog/catalogd-main.cc
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/rpc/TAcceptQueueServer.h
M be/src/rpc/auth-provider.h
M be/src/rpc/authentication.cc
M be/src/rpc/hs2-http-test.cc
M be/src/rpc/thrift-client.h
M be/src/rpc/thrift-server-test.cc
M be/src/rpc/thrift-server.cc
M be/src/rpc/thrift-server.h
M be/src/rpc/thrift-thread.cc
M be/src/rpc/thrift-thread.h
M be/src/rpc/thrift-util.cc
M be/src/rpc/thrift-util.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/impalad-main.cc
M be/src/statestore/statestore-service-client-wrapper.h
M be/src/statestore/statestore-subscriber-client-wrapper.h
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M be/src/testutil/in-process-servers.h
M be/src/transport/THttpServer.cpp
M be/src/transport/THttpServer.h
M be/src/transport/THttpTransport.cpp
M be/src/transport/THttpTransport.h
M be/src/transport/TSaslClientTransport.cpp
M be/src/transport/TSaslClientTransport.h
M be/src/transport/TSaslServerTransport.cpp
M be/src/transport/TSaslServerTransport.h
M be/src/transport/TSaslTransport.cpp
M be/src/transport/TSaslTransport.h
M be/src/util/parquet-reader.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/impala-shell.sh
M bin/set-pythonpath.sh
M common/thrift/CMakeLists.txt
M infra/python/deps/requirements.txt
M java/pom.xml
M shell/ext-py/thrift_sasl-0.4.2/setup.py
M shell/impala-shell
M shell/impala_client.py
M shell/impala_shell.py
M shell/make_shell_tarball.sh
M shell/packaging/make_python_package.sh
M shell/shell_output.py
M tests/beeswax/impala_beeswax.py
M tests/conftest.py
M tests/query_test/test_observability.py
M tests/shell/util.py
58 files changed, 258 insertions(+), 310 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/17170/23
--
To view, visit http://gerrit.cloudera.org:8080/17170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Gerrit-Change-Number: 17170
Gerrit-PatchSet: 23
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-7825: Upgrade Thrift version to 0.11.0

2021-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17170 )

Change subject: IMPALA-7825: Upgrade Thrift version to 0.11.0
..


Patch Set 22:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17170/22/shell/impala_client.py
File shell/impala_client.py:

http://gerrit.cloudera.org:8080/#/c/17170/22/shell/impala_client.py@85
PS22, Line 85: def utf8_decode_if_needed(val):
flake8: E302 expected 2 blank lines, found 1


http://gerrit.cloudera.org:8080/#/c/17170/22/shell/impala_client.py@91
PS22, Line 91: def utf8_encode_if_needed(val):
flake8: E302 expected 2 blank lines, found 1



--
To view, visit http://gerrit.cloudera.org:8080/17170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Gerrit-Change-Number: 17170
Gerrit-PatchSet: 22
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 26 Apr 2021 16:21:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7825: Upgrade Thrift version to 0.11.0

2021-04-26 Thread Csaba Ringhofer (Code Review)
Hello Quanlong Huang, Tamas Mate, Qifan Chen, Zoltan Borok-Nagy, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17170

to look at the new patch set (#22).

Change subject: IMPALA-7825: Upgrade Thrift version to 0.11.0
..

IMPALA-7825: Upgrade Thrift version to 0.11.0

Before this patch Impala mainly used Thrift 0.9.3, but it was
possible to compile Impala shell with Thrift 0.11.0, so the 0.11.0
Thrift lib was already included in the toolchain.

Most of the changes are related to replacing boost:: with std::
shared_ptr-s in cpp code (this is a continuation of a patch by Sahil).

The Thrift upgrade also needs an Impyla release with Thrift 0.11.0, as
Impala's test framework relies on Impyla. A thrift_sasl release is also
needed, because it currently pins Thrift version to 0.9.3 for Python 2.

The current patch uses alpha releases from Impyla and thrift_sasl that
use thrift 0.11.0.

Notable side effects:
- old logic to compile thrift for impala-shell with 0.11.0 was removed
- impala_shell's utf8 handling had to be updated as the new 0.11.0
  compilation happens with no_utf8strings. This also made things a
  bit faster, e.g. the following takes ~0.22s instead of ~0.25s:
  shell/impala_shell.py \
-B -q "select * from functional_parquet.alltypes;" > /dev/null
- THRIFT-3921 changed the stream operators to print an enum's name
  instead of its number, leading to slightly different messages
  in some cases.
- "templates" was added to the thrift generator's parameters to avoid
  a compilation issue (related to IMPALA-10600). I didn't notice any
  change in compilation time. This option generates .tcc files with
  templatized readers/writers for Thrift types. Currently we don't
  use these, but they could potentially speed up (de)serialization.

Testing:
- ran Impyla's test suite with Python 2 and 3
- ran core tests

Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
---
M CMakeLists.txt
M be/src/benchmarks/network-perf-benchmark.cc
M be/src/catalog/catalog-server.h
M be/src/catalog/catalog-service-client-wrapper.h
M be/src/catalog/catalog-util.cc
M be/src/catalog/catalogd-main.cc
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/rpc/TAcceptQueueServer.h
M be/src/rpc/auth-provider.h
M be/src/rpc/authentication.cc
M be/src/rpc/hs2-http-test.cc
M be/src/rpc/thrift-client.h
M be/src/rpc/thrift-server-test.cc
M be/src/rpc/thrift-server.cc
M be/src/rpc/thrift-server.h
M be/src/rpc/thrift-thread.cc
M be/src/rpc/thrift-thread.h
M be/src/rpc/thrift-util.cc
M be/src/rpc/thrift-util.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/impalad-main.cc
M be/src/statestore/statestore-service-client-wrapper.h
M be/src/statestore/statestore-subscriber-client-wrapper.h
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M be/src/testutil/in-process-servers.h
M be/src/transport/THttpServer.cpp
M be/src/transport/THttpServer.h
M be/src/transport/THttpTransport.cpp
M be/src/transport/THttpTransport.h
M be/src/transport/TSaslClientTransport.cpp
M be/src/transport/TSaslClientTransport.h
M be/src/transport/TSaslServerTransport.cpp
M be/src/transport/TSaslServerTransport.h
M be/src/transport/TSaslTransport.cpp
M be/src/transport/TSaslTransport.h
M be/src/util/parquet-reader.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/impala-shell.sh
M bin/set-pythonpath.sh
M common/thrift/CMakeLists.txt
M infra/python/deps/requirements.txt
M java/pom.xml
M shell/ext-py/thrift_sasl-0.4.2/setup.py
M shell/impala-shell
M shell/impala_client.py
M shell/impala_shell.py
M shell/make_shell_tarball.sh
M shell/packaging/make_python_package.sh
M shell/shell_output.py
M tests/beeswax/impala_beeswax.py
M tests/conftest.py
M tests/query_test/test_observability.py
M tests/shell/util.py
58 files changed, 256 insertions(+), 310 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/17170/22
--
To view, visit http://gerrit.cloudera.org:8080/17170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Gerrit-Change-Number: 17170
Gerrit-PatchSet: 22
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10644: RangerAuthorizationFactory cannot be instantiated

2021-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17282 )

Change subject: IMPALA-10644: RangerAuthorizationFactory cannot be instantiated
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8638/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1b6953b84fd28bb75f97516a3b7f40cd0a12af41
Gerrit-Change-Number: 17282
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Mon, 26 Apr 2021 16:08:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10644: RangerAuthorizationFactory cannot be instantiated

2021-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17282 )

Change subject: IMPALA-10644: RangerAuthorizationFactory cannot be instantiated
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7100/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1b6953b84fd28bb75f97516a3b7f40cd0a12af41
Gerrit-Change-Number: 17282
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Mon, 26 Apr 2021 15:58:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10644: RangerAuthorizationFactory cannot be instantiated

2021-04-26 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded a new patch set (#4) to the change originally 
created by Vihang Karajgaonkar. ( http://gerrit.cloudera.org:8080/17282 )

Change subject: IMPALA-10644: RangerAuthorizationFactory cannot be instantiated
..

IMPALA-10644: RangerAuthorizationFactory cannot be instantiated

Earlier when the GBN was bumped up to 11920537 in commit
1ab1143 some of the solr dependencies were excluded. This causes
RangerAuthorizationFactory to fail with initialization errors.

This patch reverts the dependency exclusion to fix the problem.

Testing:
 - Passes core job

Change-Id: I1b6953b84fd28bb75f97516a3b7f40cd0a12af41
---
M fe/pom.xml
1 file changed, 6 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/17282/4
--
To view, visit http://gerrit.cloudera.org:8080/17282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1b6953b84fd28bb75f97516a3b7f40cd0a12af41
Gerrit-Change-Number: 17282
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Vihang Karajgaonkar 


[native-toolchain-CR] IMPALA-10674: Update toolchain ORC library for better Iceberg support

2021-04-26 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17342 )


Change subject: IMPALA-10674: Update toolchain ORC library for better Iceberg 
support
..

IMPALA-10674: Update toolchain ORC library for better Iceberg support

We need the following fixes/features from the ORC library:

* ORC-763: Fix timestamp inconsistencies with Java
* ORC-784: Support setting timezone to timestamp column
* ORC-666: Support timestamp with local timezone (this corresponds
   to the Iceberg TIMESTAMPTZ type)
* ORC-781: Make type annotations available from C++ (this is
   needed for Iceberg column resolution via field ids)

This commit adds the above via formatted patches.

Testing:
 * executed the tests of the ORC library

Change-Id: I72625f4bd6ff3e83ffaaa2c83d31b8ee29c0c35a
---
M buildall.sh
A 
source/orc/orc-1.6.2-patches/0008-ORC-763-C-Fix-ORC-timestamp-inconsistencies-with-Jav.patch
A 
source/orc/orc-1.6.2-patches/0009-ORC-784-C-Support-setting-timezone-to-timestamp-colu.patch
A 
source/orc/orc-1.6.2-patches/0010-ORC-666-C-Support-timestamp-with-local-timezone.patch
A 
source/orc/orc-1.6.2-patches/0011-ORC-781-C-Make-type-annotations-available-from-C.patch
5 files changed, 1,516 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/native-toolchain 
refs/changes/42/17342/1
--
To view, visit http://gerrit.cloudera.org:8080/17342
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I72625f4bd6ff3e83ffaaa2c83d31b8ee29c0c35a
Gerrit-Change-Number: 17342
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

2021-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8637/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 13
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 26 Apr 2021 14:19:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

2021-04-26 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
..


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17313/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17313/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4876
PS12, Line 4876:
   :* 2. If the table is no
> I see that you added this. But I am not sure if this is correct. HMS metada
Removed this sentence. I definitely don't want to become a source of false 
information and I agree that these are implementation details.

It would be best if there were a public document somewhere that describes 
these concepts, so that we could link to it in situations like this.



--
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 13
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 26 Apr 2021 14:02:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

2021-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
..


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17313/13/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17313/13/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4880
PS13, Line 4880:* 
https://github.com/apache/hive/blob/25892ea409/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L3251
line too long (114 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 13
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 26 Apr 2021 14:01:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

2021-04-26 Thread Csaba Ringhofer (Code Review)
Hello Vihang Karajgaonkar, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17313

to look at the new patch set (#13).

Change subject: IMPALA-10656: Fire insert events before commit
..

IMPALA-10656: Fire insert events before commit

Before this fix Impala committed an insert first, then reloaded the
table from HMS, and generated the insert events based on the difference
between the two snapshots (e.g. which files were not present in the old
snapshot but are there in the new one).

Hive replication expects the insert events before the commit, so this
may potentially lead to issues there.

The solution is to collect the new files during the insert in the
backend, and send the insert events based on this file set. This wasn't
very hard to do as we were already collecting the files in some cases:
- to move them from staging dir to their final location in case of
  non-partitioned tables
- to write the file list to snapshot files in case of Iceberg tables
This patch unifies the paths above and collects all information about
the created files regardless of the table type.
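
The difference between the two flows can be sketched as follows. This is an illustration only: events_from_snapshot_diff, InsertFileCollector, run_insert and the callback parameters are hypothetical names, not Impala code. The first function mimics the old snapshot diff, which can only run after the commit. The collector mimics gathering the file list during the insert so that events can be fired before the commit.

```python
# Illustrative sketch only; all names here are hypothetical, not Impala's.

def events_from_snapshot_diff(files_before, files_after):
    """Old flow: derive the inserted files by diffing two table snapshots."""
    return sorted(set(files_after) - set(files_before))


class InsertFileCollector:
    """New flow: record every file as it is created during the insert."""

    def __init__(self):
        self.created_files = []

    def on_file_written(self, path):
        self.created_files.append(path)


def run_insert(collector, new_paths, fire_event, commit):
    """Write files, fire one event per collected file, then commit."""
    for path in new_paths:
        collector.on_file_written(path)
    # Events come from the collected file set, so no reload is needed
    # and they can be sent before the commit.
    for path in collector.created_files:
        fire_event(path)
    commit()


if __name__ == "__main__":
    log = []
    run_insert(InsertFileCollector(), ["part=1/f1", "part=1/f2"],
               lambda p: log.append(("event", p)),
               lambda: log.append(("commit", None)))
    print(log)
```

In the sketch the commit is always the last entry in the log, which is the ordering Hive replication is described as expecting.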

Testing:
- no new tests, insert events were already covered in
  test_event_processing.py and MetastoreEventsProcessorTest.java
- ran core tests

Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
---
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/output-partition.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/service/client-request-state.cc
M common/protobuf/control_service.proto
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
12 files changed, 247 insertions(+), 226 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/17313/13
--
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 13
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10631: Upgrade DataSketches to version 3.0.0

2021-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17294 )

Change subject: IMPALA-10631: Upgrade DataSketches to version 3.0.0
..


Patch Set 5: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9
Gerrit-Change-Number: 17294
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 26 Apr 2021 13:35:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10631: Upgrade DataSketches to version 3.0.0

2021-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17294 )

Change subject: IMPALA-10631: Upgrade DataSketches to version 3.0.0
..

IMPALA-10631: Upgrade DataSketches to version 3.0.0

Upgrade the external DataSketches files CPC/HLL/KLL/Theta to version
3.0.0

tests:
 -Ran the tests from tests/query_test/test_datasketches.py

Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9
Reviewed-on: http://gerrit.cloudera.org:8080/17294
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/exprs/datasketches-test.cc
M be/src/thirdparty/datasketches/AuxHashMap-internal.hpp
M be/src/thirdparty/datasketches/AuxHashMap.hpp
M be/src/thirdparty/datasketches/CompositeInterpolationXTable.hpp
M be/src/thirdparty/datasketches/CouponHashSet-internal.hpp
M be/src/thirdparty/datasketches/CouponHashSet.hpp
M be/src/thirdparty/datasketches/CouponList-internal.hpp
M be/src/thirdparty/datasketches/CouponList.hpp
M be/src/thirdparty/datasketches/CubicInterpolation.hpp
M be/src/thirdparty/datasketches/HarmonicNumbers.hpp
M be/src/thirdparty/datasketches/Hll4Array-internal.hpp
M be/src/thirdparty/datasketches/Hll4Array.hpp
M be/src/thirdparty/datasketches/Hll6Array-internal.hpp
M be/src/thirdparty/datasketches/Hll6Array.hpp
M be/src/thirdparty/datasketches/Hll8Array-internal.hpp
M be/src/thirdparty/datasketches/Hll8Array.hpp
M be/src/thirdparty/datasketches/HllArray-internal.hpp
M be/src/thirdparty/datasketches/HllArray.hpp
M be/src/thirdparty/datasketches/HllSketch-internal.hpp
M be/src/thirdparty/datasketches/HllSketchImpl.hpp
M be/src/thirdparty/datasketches/HllSketchImplFactory.hpp
M be/src/thirdparty/datasketches/HllUnion-internal.hpp
M be/src/thirdparty/datasketches/HllUtil.hpp
M be/src/thirdparty/datasketches/MurmurHash3.h
M be/src/thirdparty/datasketches/README.md
M be/src/thirdparty/datasketches/RelativeErrorTables.hpp
A be/src/thirdparty/datasketches/bounds_on_ratios_in_sampled_sets.hpp
A be/src/thirdparty/datasketches/bounds_on_ratios_in_theta_sketched_sets.hpp
M be/src/thirdparty/datasketches/cpc_common.hpp
M be/src/thirdparty/datasketches/cpc_compressor.hpp
M be/src/thirdparty/datasketches/cpc_compressor_impl.hpp
M be/src/thirdparty/datasketches/cpc_sketch.hpp
M be/src/thirdparty/datasketches/cpc_sketch_impl.hpp
M be/src/thirdparty/datasketches/cpc_union.hpp
M be/src/thirdparty/datasketches/cpc_union_impl.hpp
M be/src/thirdparty/datasketches/cpc_util.hpp
M be/src/thirdparty/datasketches/hll.hpp
M be/src/thirdparty/datasketches/icon_estimator.hpp
M be/src/thirdparty/datasketches/kll_quantile_calculator.hpp
M be/src/thirdparty/datasketches/kll_quantile_calculator_impl.hpp
M be/src/thirdparty/datasketches/kll_sketch.hpp
M be/src/thirdparty/datasketches/kll_sketch_impl.hpp
M be/src/thirdparty/datasketches/memory_operations.hpp
M be/src/thirdparty/datasketches/theta_a_not_b.hpp
M be/src/thirdparty/datasketches/theta_a_not_b_impl.hpp
A be/src/thirdparty/datasketches/theta_comparators.hpp
A be/src/thirdparty/datasketches/theta_constants.hpp
A be/src/thirdparty/datasketches/theta_helpers.hpp
M be/src/thirdparty/datasketches/theta_intersection.hpp
A be/src/thirdparty/datasketches/theta_intersection_base.hpp
A be/src/thirdparty/datasketches/theta_intersection_base_impl.hpp
M be/src/thirdparty/datasketches/theta_intersection_impl.hpp
A be/src/thirdparty/datasketches/theta_jaccard_similarity.hpp
A be/src/thirdparty/datasketches/theta_jaccard_similarity_base.hpp
A be/src/thirdparty/datasketches/theta_set_difference_base.hpp
A be/src/thirdparty/datasketches/theta_set_difference_base_impl.hpp
M be/src/thirdparty/datasketches/theta_sketch.hpp
M be/src/thirdparty/datasketches/theta_sketch_impl.hpp
M be/src/thirdparty/datasketches/theta_union.hpp
A be/src/thirdparty/datasketches/theta_union_base.hpp
A be/src/thirdparty/datasketches/theta_union_base_impl.hpp
M be/src/thirdparty/datasketches/theta_union_impl.hpp
A be/src/thirdparty/datasketches/theta_update_sketch_base.hpp
A be/src/thirdparty/datasketches/theta_update_sketch_base_impl.hpp
M be/src/thirdparty/datasketches/u32_table.hpp
M be/src/thirdparty/datasketches/u32_table_impl.hpp
66 files changed, 2,646 insertions(+), 1,873 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/17294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9
Gerrit-Change-Number: 17294
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10654: Fix precision loss in DecimalValue to double conversion.

2021-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17303 )

Change subject: IMPALA-10654: Fix precision loss in DecimalValue to double 
conversion.
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8636/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17303
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56f0652cb8f81a491b87d9b108a94c00ae6c99a1
Gerrit-Change-Number: 17303
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 26 Apr 2021 12:50:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10654: Fix precision loss in DecimalValue to double conversion.

2021-04-26 Thread Amogh Margoor (Code Review)
Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17303 )

Change subject: IMPALA-10654: Fix precision loss in DecimalValue to double 
conversion.
..


Patch Set 5:

> (4 comments)
 >
 > The change looks great. A test with Parquet would be nice, other
 > than that I only found nitpicks.
 >
 > When you upload a new PS please reply to the comments. Most of the
 > time clicking on "Done" is enough. This way we'll know we haven't
 > left anything open.

I did reply to the comments but didn't know that replies get saved as drafts 
and need to be posted explicitly later. Sorry about that. I have figured that 
out now, so you might get some old replies too.


--
To view, visit http://gerrit.cloudera.org:8080/17303
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56f0652cb8f81a491b87d9b108a94c00ae6c99a1
Gerrit-Change-Number: 17303
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 26 Apr 2021 12:36:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10654: Fix precision loss in DecimalValue to double conversion.

2021-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17303 )

Change subject: IMPALA-10654: Fix precision loss in DecimalValue to double 
conversion.
..


Patch Set 6:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h
File be/src/thirdparty/fast_double_parser/fast_double_parser.h:

http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@13
PS6, Line 13: #if (defined(sun) || defined(__sun))
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@17
PS6, Line 17: #if defined(__CYGWIN__) || defined(__MINGW32__) || 
defined(__MINGW64__)
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@22
PS6, Line 22:  * Determining whether we should import xlocale.h or not is
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@25
PS6, Line 25: #if defined(FAST_DOUBLE_PARSER_SOLARIS) || 
defined(FAST_DOUBLE_PARSER_CYGWIN)
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@67
PS6, Line 67: #endif //  defined(FAST_DOUBLE_PARSER_SOLARIS) || 
defined(FAST_DOUBLE_PARSER_CYGWIN)
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@84
PS6, Line 84:  * However, we have that
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@86
PS6, Line 86:  * Thus it is possible for a number of the form w * 10^-342 where
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@94
PS6, Line 94:  * Any number of form w * 10^309 where w>= 1 is going to be
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@153
PS6, Line 153: // credit: 
https://stackoverflow.com/questions/28868367/getting-the-high-part-of-64-bit-integer-multiplication
line too long (110 > 90)


http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@154
PS6, Line 154: really_inline uint64_t Emulate64x64to128(uint64_t& r_hi, const 
uint64_t x, const uint64_t y) {
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@159
PS6, Line 159: 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@224
PS6, Line 224:  */
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@958
PS6, Line 958:
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@960
PS6, Line 960:   // The exponent is 1024 + 63 + power
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@976
PS6, Line 976:   // The 65536 is (1<<16) and corresponds to
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@979
PS6, Line 979:   // ((152170 * power ) >> 16) is equal to
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@980
PS6, Line 980:   // floor(log(5**power)/log(2))
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@982
PS6, Line 982:   // Note that this is not magic: 152170/(1<<16) is
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@984
PS6, Line 984:   // The 1<<16 value is a power of two; we could use a
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/6/be/src/thirdparty/fast_double_parser/fast_double_parser.h@1097
PS6, Line 1097: #if defined(FAST_DOUBLE_PARSER_SOLARIS) || 
defined(FAST_DOUBLE_PARSER_CYGWIN)
line has trailing whitespace



--
To view, visit http://gerrit.cloudera.org:8080/17303
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56f0652cb8f81a491b87d9b108a94c00ae6c99a1
Gerrit-Change-Number: 17303
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 

[Impala-ASF-CR] IMPALA-10654: Fix precision loss in DecimalValue to double conversion.

2021-04-26 Thread Amogh Margoor (Code Review)
Amogh Margoor has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/17303 )

Change subject: IMPALA-10654: Fix precision loss in DecimalValue to double 
conversion.
..

IMPALA-10654: Fix precision loss in DecimalValue to double conversion.

The original approach to converting DecimalValue (the internal
representation of decimals) to double was not accurate.
It was:
   static_cast<double>(value_) / pow(10.0, scale).
However, a double can represent every integer exactly only in the
range −2^53 to 2^53.
Hence, it would not work for numbers like -0.43149576573887316.
For a DecimalValue representing -0.43149576573887316, value_ would be
-43149576573887316 and scale would be 17. As value_ < -2^53, the
result would not be accurate. The new approach uses the third-party
library https://github.com/lemire/fast_double_parser, which
handles the above scenario in a performant manner.
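
The rounding issue can be reproduced outside Impala. The sketch below is an illustration under stated assumptions: it uses Python, whose float is an IEEE 754 double, and Python's built-in float(str) parser as a stand-in for a correctly rounded decimal parser. It does not call fast_double_parser, and the helper names are hypothetical. It compares the cast-and-divide formula with parsing the decimal text, measuring both against the exact rational value.

```python
from fractions import Fraction


def naive_to_double(value, scale):
    """Mirror static_cast<double>(value_) / pow(10.0, scale)."""
    # float(value) already rounds when abs(value) > 2**53.
    return float(value) / float(10 ** scale)


def parsed_to_double(value, scale):
    """Render the decimal as text and let a correctly rounded parser convert it."""
    sign = "-" if value < 0 else ""
    digits = str(abs(value)).rjust(scale + 1, "0")
    if scale > 0:
        text = sign + digits[:-scale] + "." + digits[-scale:]
    else:
        text = sign + digits
    return float(text)


def abs_error(result, value, scale):
    """Exact distance between a double and the true decimal value."""
    return abs(Fraction(result) - Fraction(value, 10 ** scale))


if __name__ == "__main__":
    value, scale = -43149576573887316, 17
    print("value_ survives the cast:", int(float(value)) == value)
    print("naive error :", float(abs_error(naive_to_double(value, scale), value, scale)))
    print("parsed error:", float(abs_error(parsed_to_double(value, scale), value, scale)))
```

The parsed result is the double nearest to the true value, so its error is never larger than that of the cast-and-divide result, which first rounds value_ and then rounds the quotient.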

Testing:
1. Added end-to-end tests covering the following scenarios:
a. Test to show the precision limitation of 16 in the write path
b. DecimalValue's value_ between -2^53 and 2^53.
c. value_ outside the above range but abs(value_) < UINT64_MAX
d. abs(value_) > UINT64_MAX - covers DecimalValue<__int128_t>
2. Ran the existing backend and end-to-end tests completely

Change-Id: I56f0652cb8f81a491b87d9b108a94c00ae6c99a1
---
M be/src/runtime/decimal-value.inline.h
A be/src/thirdparty/fast_double_parser/LICENSE
A be/src/thirdparty/fast_double_parser/LICENSE.BSL
A be/src/thirdparty/fast_double_parser/README.md
A be/src/thirdparty/fast_double_parser/fast_double_parser.h
M bin/rat_exclude_files.txt
M testdata/workloads/functional-query/queries/QueryTest/values.test
M tests/query_test/test_insert_parquet.py
8 files changed, 1,579 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/17303/6
--
To view, visit http://gerrit.cloudera.org:8080/17303
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56f0652cb8f81a491b87d9b108a94c00ae6c99a1
Gerrit-Change-Number: 17303
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types

2021-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most 
common types
..


Patch Set 29:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8635/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 29
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 26 Apr 2021 12:13:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types

2021-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most 
common types
..


Patch Set 29:

(182 comments)

http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h
File be/src/thirdparty/xxhash/xxhash.h:

http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@70
PS29, Line 70: 
https://fastcompression.blogspot.com/2019/03/presenting-xxh3.html?showComment=1552696407071#c3490092340461170735
line too long (112 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@92
PS29, Line 92:  *  
https://fastcompression.blogspot.com/2018/03/xxhash-for-small-keys-impressive-power.html
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@113
PS29, Line 113: #  elif defined (__cplusplus) || (defined (__STDC_VERSION__) && 
(__STDC_VERSION__ >= 199901L) /* C99 */)
line too long (104 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@243
PS29, Line 243: #  define XXH3_64bits_reset_withSecret XXH_NAME2(XXH_NAMESPACE, 
XXH3_64bits_reset_withSecret)
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@253
PS29, Line 253: #  define XXH3_128bits_reset_withSeed XXH_NAME2(XXH_NAMESPACE, 
XXH3_128bits_reset_withSeed)
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@254
PS29, Line 254: #  define XXH3_128bits_reset_withSecret 
XXH_NAME2(XXH_NAMESPACE, XXH3_128bits_reset_withSecret)
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@270
PS29, Line 270: #define XXH_VERSION_NUMBER  (XXH_VERSION_MAJOR *100*100 + 
XXH_VERSION_MINOR *100 + XXH_VERSION_RELEASE)
line too long (103 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@429
PS29, Line 429:  * @param statePtr A pointer to an @ref XXH32_state_t allocated 
with @ref XXH32_createState().
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@441
PS29, Line 441: XXH_PUBLIC_API void XXH32_copyState(XXH32_state_t* dst_state, 
const XXH32_state_t* src_state);
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@476
PS29, Line 476: XXH_PUBLIC_API XXH_errorcode XXH32_update (XXH32_state_t* 
statePtr, const void* input, size_t length);
line too long (102 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@628
PS29, Line 628: XXH_PUBLIC_API void XXH64_copyState(XXH64_state_t* dst_state, 
const XXH64_state_t* src_state);
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@631
PS29, Line 631: XXH_PUBLIC_API XXH_errorcode XXH64_update (XXH64_state_t* 
statePtr, const void* input, size_t length);
line too long (102 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@700
PS29, Line 700: XXH_PUBLIC_API XXH64_hash_t XXH3_64bits_withSeed(const void* 
data, size_t len, XXH64_hash_t seed);
line too long (98 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@724
PS29, Line 724: XXH_PUBLIC_API XXH64_hash_t XXH3_64bits_withSecret(const void* 
data, size_t len, const void* secret, size_t secretSize);
line too long (120 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@743
PS29, Line 743: XXH_PUBLIC_API void XXH3_copyState(XXH3_state_t* dst_state, 
const XXH3_state_t* src_state);
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@756
PS29, Line 756: XXH_PUBLIC_API XXH_errorcode 
XXH3_64bits_reset_withSeed(XXH3_state_t* statePtr, XXH64_hash_t seed);
line too long (99 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@766
PS29, Line 766: XXH_PUBLIC_API XXH_errorcode 
XXH3_64bits_reset_withSecret(XXH3_state_t* statePtr, const void* secret, size_t 
secretSize);
line too long (121 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@768
PS29, Line 768: XXH_PUBLIC_API XXH_errorcode XXH3_64bits_update (XXH3_state_t* 
statePtr, const void* input, size_t length);
line too long (107 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@791
PS29, Line 791: XXH_PUBLIC_API XXH128_hash_t XXH3_128bits_withSeed(const void* 
data, size_t len, XXH64_hash_t seed);
line too long (100 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/29/be/src/thirdparty/xxhash/xxhash.h@792
PS29, Line 792: XXH_PUBLIC_API XXH128_hash_t XXH3_128bits_withSecret(const 
void* data, size_t len, const void* secret, size_t secretSize);
line too long (122 > 90)


[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types

2021-04-26 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#29). ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most 
common types
..

IMPALA-10640: Support reading Parquet Bloom filters - most common types

This change adds read support for Parquet Bloom filters for types that
can reasonably be supported in Impala. Other types, such as CHAR(N),
would be very difficult to support because the length may be different
in Parquet and in Impala which results in truncation or padding, and
that changes the hash which makes using the Bloom filter impossible.
Write support will be added in a later change.
The supported Parquet type - Impala type pairs are the following:

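 ---------------------------------------
|Parquet type |  Impala type            |
|---------------------------------------|
|INT32        |  TINYINT, SMALLINT, INT |
|INT64        |  BIGINT                 |
|FLOAT        |  FLOAT                  |
|DOUBLE       |  DOUBLE                 |
|BYTE_ARRAY   |  STRING                 |
 ---------------------------------------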

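The following types are not supported for the given reasons:

 ----------------------------------------------------------------
|Impala type |  Problem                                          |
|----------------------------------------------------------------|
|VARCHAR(N)  | truncation can change hash                        |
|CHAR(N)     | padding / truncation can change hash              |
|DECIMAL     | multiple encodings supported                      |
|TIMESTAMP   | multiple encodings supported, timezone conversion |
|DATE        | not considered yet                                |
 ----------------------------------------------------------------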

Support may be added for these types later, see IMPALA-10641.
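
The type rules above can be sketched as a lookup plus a small demonstration of why padding is a problem. This is a hedged illustration, not the code of this change: the names SUPPORTED, bloom_filter_usable and stand_in_hash are hypothetical, and SHA-256 from the Python standard library stands in for the real hash function only to show that padded and unpadded bytes hash differently.

```python
import hashlib

# Parquet type -> Impala types, copied from the supported-types table.
SUPPORTED = {
    "INT32": {"TINYINT", "SMALLINT", "INT"},
    "INT64": {"BIGINT"},
    "FLOAT": {"FLOAT"},
    "DOUBLE": {"DOUBLE"},
    "BYTE_ARRAY": {"STRING"},
}


def bloom_filter_usable(parquet_type, impala_type):
    """True only for the type pairs listed as supported."""
    return impala_type in SUPPORTED.get(parquet_type, set())


def stand_in_hash(data):
    """Stand-in hash (assumption): any hash changes when the bytes change."""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    print(bloom_filter_usable("INT32", "SMALLINT"))
    print(bloom_filter_usable("BYTE_ARRAY", "VARCHAR(N)"))
    # A value padded to a fixed length no longer matches the hash that
    # was inserted into the filter for the unpadded value.
    print(stand_in_hash(b"abc") == stand_in_hash(b"abc  "))
```

Because a Bloom filter lookup compares hashes, not values, any padding or truncation between the stored and the queried representation makes the lookup meaningless.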

If a Bloom filter is available for a column that is fully dictionary
encoded, the Bloom filter is not used as the dictionary can give exact
results in filtering.
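
The reasoning can be illustrated with a toy filter. The sketch below is an assumption-laden illustration: TinyBloomFilter and row_group_may_match are hypothetical names, and the filter layout and hashing are not those of the Parquet format. A Bloom filter can only answer "possibly present" or "definitely absent", while a dictionary that covers the whole column answers exactly, so the dictionary is consulted first when it is available.

```python
import hashlib


class TinyBloomFilter:
    """Toy Bloom filter; layout and hashing are illustrative assumptions."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, value):
        for i in range(self.num_hashes):
            data = "{0}:{1}".format(i, value).encode("utf-8")
            digest = hashlib.sha256(data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means definitely absent; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def row_group_may_match(value, dictionary=None, bloom=None):
    """Prefer the exact dictionary check; fall back to the Bloom filter."""
    if dictionary is not None:
        return value in dictionary
    if bloom is not None:
        return bloom.might_contain(value)
    return True  # no information, so the row group cannot be skipped


if __name__ == "__main__":
    bloom = TinyBloomFilter()
    for v in ["a", "b", "c"]:
        bloom.add(v)
    print(row_group_may_match("a", bloom=bloom))
    print(row_group_may_match("x", dictionary={"a", "b", "c"}, bloom=bloom))
```

With the dictionary present, the answer for an absent value is always negative, whereas the Bloom filter alone might occasionally report a match that is not there.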

Testing:
  - Added tests/query_test/test_parquet_bloom_filter.py that tests
    whether Parquet Bloom filtering works for the supported types and
    that we do not incorrectly discard row groups for the unsupported
    type VARCHAR. The Parquet file used in the test was generated with
    an external tool.
  - Added unit tests for ParquetBloomFilter in file
    be/src/util/parquet-bloom-filter-test.cc
  - A minor, unrelated change was done in
    be/src/util/bloom-filter-test.cc: the MakeRandom() function had
    return type uint64_t, the documentation claimed it returned a 64 bit
    random number, but the actual number of random bits is 32, which is
    what is intended in the tests. The return type and documentation
    have been corrected to use 32 bits.

Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
---
M LICENSE.txt
M be/src/exec/parquet/CMakeLists.txt
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
A be/src/exec/parquet/parquet-bloom-filter-util.cc
A be/src/exec/parquet/parquet-bloom-filter-util.h
M be/src/exprs/expr-value.h
M be/src/exprs/literal.cc
M be/src/exprs/literal.h
M be/src/runtime/bufferpool/buffer-pool-internal.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
A be/src/thirdparty/xxhash/README.md
A be/src/thirdparty/xxhash/xxhash.h
M be/src/util/CMakeLists.txt
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
A be/src/util/impala-bloom-filter-buffer-allocator.cc
A be/src/util/impala-bloom-filter-buffer-allocator.h
A be/src/util/parquet-bloom-filter-avx2.cc
A be/src/util/parquet-bloom-filter-test.cc
A be/src/util/parquet-bloom-filter.cc
A be/src/util/parquet-bloom-filter.h
M bin/jenkins/critique-gerrit-review.py
M bin/rat_exclude_files.txt
M bin/run_clang_tidy.sh
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M common/thrift/parquet.thrift
M testdata/data/README
A testdata/data/parquet-bloom-filtering.parquet
A testdata/workloads/functional-query/queries/QueryTest/parquet-bloom-filter-disabled.test
A testdata/workloads/functional-query/queries/QueryTest/parquet-bloom-filter.test
A tests/query_test/test_parquet_bloom_filter.py
37 files changed, 7,410 insertions(+), 127 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/17026/29
--
To view, visit http://gerrit.cloudera.org:8080/17026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 29
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan 

[Impala-ASF-CR] IMPALA-10631: Upgrade DataSketches to version 3.0.0

2021-04-26 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17294 )

Change subject: IMPALA-10631: Upgrade DataSketches to version 3.0.0
..


Patch Set 4:

Let's give it a try to re-run the job. We'll see.


--
To view, visit http://gerrit.cloudera.org:8080/17294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9
Gerrit-Change-Number: 17294
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 26 Apr 2021 07:45:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10631: Upgrade DataSketches to version 3.0.0

2021-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17294 )

Change subject: IMPALA-10631: Upgrade DataSketches to version 3.0.0
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7099/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9
Gerrit-Change-Number: 17294
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 26 Apr 2021 07:45:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10631: Upgrade DataSketches to version 3.0.0

2021-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17294 )

Change subject: IMPALA-10631: Upgrade DataSketches to version 3.0.0
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9
Gerrit-Change-Number: 17294
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 26 Apr 2021 07:45:25 +
Gerrit-HasComments: No