[Impala-ASF-CR] IMPALA-10489: Implement JWT support

2021-05-21 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/17435 )

Change subject: IMPALA-10489: Implement JWT support
..

IMPALA-10489: Implement JWT support

This patch added JWT support with following functionality:
 * Load and parse JWKS from pre-installed JSON file.
 * Read the JWT token from the HTTP Header.
 * Verify the JWT's signature with puclic key in JWKS.
 * Get the username out of the payload of JWT token.

We use third party library jwt-cpp to verify JWT token. jwt-cpp is a
headers only C++ library. It was added to native-toolchain.
This patch modified bootstrap_toolchian.py to download jwt-cpp from
toolchain s3 bucket, and modified makefiles to add jwt-cpp/include
in the include path.

Added BE unit-tests for loading JWKS file and verifying JWT token.
Also added FE custom cluster test for JWT authentication.

Testing:
 - Passed core run.

Change-Id: I6b71fa854c9ddc8ca882878853395e1eb866143c
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/rpc/authentication.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/transport/THttpServer.cpp
M be/src/transport/THttpServer.h
M be/src/util/CMakeLists.txt
A be/src/util/jwt-util-test.cc
A be/src/util/jwt-util.cc
A be/src/util/jwt-util.h
M be/src/util/webserver.cc
M be/src/util/webserver.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/rat_exclude_files.txt
A cmake_modules/FindJwtCpp.cmake
M common/thrift/generate_error_codes.py
M common/thrift/metrics.json
M fe/src/test/java/org/apache/impala/customcluster/LdapHS2Test.java
M fe/src/test/java/org/apache/impala/customcluster/LdapWebserverTest.java
A testdata/jwt/jwks_rs256.json
22 files changed, 1,502 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/17435/9
--
To view, visit http://gerrit.cloudera.org:8080/17435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6b71fa854c9ddc8ca882878853395e1eb866143c
Gerrit-Change-Number: 17435
Gerrit-PatchSet: 9
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-10489: Implement JWT support

2021-05-21 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17435 )

Change subject: IMPALA-10489: Implement JWT support
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17435/6/be/src/transport/THttpServer.cpp
File be/src/transport/THttpServer.cpp:

http://gerrit.cloudera.org:8080/#/c/17435/6/be/src/transport/THttpServer.cpp@286
PS6, Line 286:   resetAuthState();
 :   returnUnauthorized();
 :   throw TTransportException("HTTP auth failed.");
 : }
 :   }
 :
 :   if (!authorized && use_jwt_token_ && !auth_value_.empty()
 :   && auth_value_.find('.') != string::npos) {
 : // Check Authorization header with the Bearer authentication 
scheme as:
 : // Authorization: Bearer 
 : // A well-formed JWT consists of three concatenated 
Base64url-encoded strings,
 : // se
> Thanks for capturing this case. SAML2 token is base64 encoded XML and shoul
Discussed with Vihang. SAML Bearer token is generated by Impala code after the 
SAML auth flow is completed. Technically we could generate a JWT instead of our 
current implementation of the token at the end of SAML flow. Since Impala 
supports multiple auth mechanism in parallel, in theory we can have SAML and 
JWT configured simultaneously. So it's better to fall back to JWT verification 
after SAML verification fails.



--
To view, visit http://gerrit.cloudera.org:8080/17435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b71fa854c9ddc8ca882878853395e1eb866143c
Gerrit-Change-Number: 17435
Gerrit-PatchSet: 8
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Sat, 22 May 2021 01:01:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10613: (addendum) Fix test on S3 builds

2021-05-21 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17483 )

Change subject: IMPALA-10613: (addendum) Fix test on S3 builds
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6ac291529dc0661abdfc2d4f48924a2c4b807c40
Gerrit-Change-Number: 17483
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 22 May 2021 00:59:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10681: Improve inner join cardinality estimates

2021-05-21 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17387 )

Change subject: IMPALA-10681: Improve inner join cardinality estimates
..


Patch Set 7: Code-Review+1

Carry Zoltan's +1


--
To view, visit http://gerrit.cloudera.org:8080/17387
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc
Gerrit-Change-Number: 17387
Gerrit-PatchSet: 7
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Sat, 22 May 2021 00:57:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10681: Improve inner join cardinality estimates

2021-05-21 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17387 )

Change subject: IMPALA-10681: Improve inner join cardinality estimates
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@430
PS3, Line 430: he NDVs on both sides t
> I will try to refactor these two methods such that a common internal method
I refactored both getGenericJoinCardinality() and getOtherJoinCardinality() by 
having them call a common internal utility method that computes the generic 
join cardinality. The difference between the 2 callers is the manner in which 
the parameters are supplied.



--
To view, visit http://gerrit.cloudera.org:8080/17387
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc
Gerrit-Change-Number: 17387
Gerrit-PatchSet: 7
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Sat, 22 May 2021 00:56:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10681: Improve inner join cardinality estimates

2021-05-21 Thread Aman Sinha (Code Review)
Hello Qifan Chen, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17387

to look at the new patch set (#7).

Change subject: IMPALA-10681: Improve inner join cardinality estimates
..

IMPALA-10681: Improve inner join cardinality estimates

During cardinality estimation for inner joins, if the join
conjunct involves a scan slot on left side and a function
(e.g MAX) on the right, currently we determine that the NDV
stats of either side is not useful and return the left side's
cardinality even though it may be a significant over-estimate.

In this patch, we handle join conjuncts of such types by
keeping them in an 'other' eligible conjuncts list as long as
the NDV for expressions on both sides of the join and the
input row count is available. For example, in the following
cases the NDV is available but was not being used for inner
joins since the previous logic was only looking for scan
slots: (a) int_col = MAX(int_col) and the right input does
not have a group-by, so right NDV = 1 can be used. (b) if it
has a group-by and the group-by columns already have
associated NDV, the combined NDV is also available.
Other such examples exist. An auxiliary struct is introduced
to keep track of the ndv and row count.

Once these 'other' eligible conjuncts are populated, we do the
join cardinality estimation in a manner similar to the normal
join conjuncts by fetching the stats from the auxiliary struct.

Testing:
 - Added new planner tests for inner join cardinality
 - Modified expected plans for certains tests including
   TPC-DS queries and ran end-to-end TPC-DS queries
 - Since TPC-DS plans are complex, I did a check of the cardinality
   changes for some of the hash joins but not the changes in the
   shape of a plan (e.g whether the join order changed).
 - Preliminary tests with a TPC-DS 10 GB scale factor on a single
   node showed between 5-15% performance improvements for 4 of the
   6 queries whose plans changed.

Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc
---
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/card-inner-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans-default.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/views.test
15 files changed, 3,719 insertions(+), 3,349 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/17387/7
--
To view, visit http://gerrit.cloudera.org:8080/17387
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc
Gerrit-Change-Number: 17387
Gerrit-PatchSet: 7
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] [PROTOTYPE] IMPALA-6784: Upgrade gperftools to 2.8.1

2021-05-21 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17485


Change subject: [PROTOTYPE] IMPALA-6784: Upgrade gperftools to 2.8.1
..

[PROTOTYPE] IMPALA-6784: Upgrade gperftools to 2.8.1

This upgrades gperftools to 2.8.1 to get various improvements
over gperftools 2.5. One improvement is that PageHeap::AllocLarge()
has been improved from O(n) complexity to O(log n).

Testing:
 - Core job

Change-Id: Ib0974b276db908c1ef733114656ba5276c302fdc
---
M bin/impala-config.sh
1 file changed, 2 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/17485/1
--
To view, visit http://gerrit.cloudera.org:8080/17485
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib0974b276db908c1ef733114656ba5276c302fdc
Gerrit-Change-Number: 17485
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 


[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins into sorted columns in Parquet tables

2021-05-21 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17478 )

Change subject: IMPALA-10709: Min/max filters should be enabled for joins into 
sorted columns in Parquet tables
..


Patch Set 7:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17478/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17478/7//COMMIT_MSG@7
PS7, Line 7: enabled for joins into sorted
nit: maybe say 'enabled only for joins on sorted..'


http://gerrit.cloudera.org:8080/#/c/17478/7//COMMIT_MSG@10
PS7, Line 10: addvantage
nit: spelling


http://gerrit.cloudera.org:8080/#/c/17478/7//COMMIT_MSG@10
PS7, Line 10: the
nit: extra 'the'


http://gerrit.cloudera.org:8080/#/c/17478/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17478/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2776
PS7, Line 2776:   sort_by_columns_ =
For non-parquet tables or parquet tables that have not been created with the 
SORT BY clause, the getParameters().get("sort.columns") would return NULL, so 
suggest adding a null check for that.  Also, getParameters() itself may return 
NULL so both checks would be needed.


http://gerrit.cloudera.org:8080/#/c/17478/7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17478/7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@750
PS7, Line 750: if (table != null && table instanceof HdfsTable) {
I think this should also check for ((HdfsTable) table).isParquetTable() .

A more general thought on this is when we talk about the sort by column, here 
we are limiting it to the case when the table itself was created with the SORT 
BY clause.  As opposed to the ColumnOrder property contained in the Parquet 
footer:
   optional list column_orders; <-- from the FileMetadata 
portion of the footer

So if an external software (e.g Spark) populated the above in the footer, we 
won't end up using it. When we talk about min-max filters for Parquet, we 
always think about the Parquet footer so this would be worth clarifying in the 
commit message. In the future we could be smarter about this e.g by looking at 
the footers of a small sample of files.



--
To view, visit http://gerrit.cloudera.org:8080/17478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
Gerrit-Change-Number: 17478
Gerrit-PatchSet: 7
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Fri, 21 May 2021 23:43:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8737: Switch to gperftool with O(log n) PageHeap::AllocLarge()

2021-05-21 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17484


Change subject: IMPALA-8737: Switch to gperftool with O(log n) 
PageHeap::AllocLarge()
..

IMPALA-8737: Switch to gperftool with O(log n) PageHeap::AllocLarge()

In gperftools 2.5, PageHeap::AllocLarge() has O(n) behavior.
This switches to a patched version of gperftools 2.5 that
changes the behavior to O(log n). This corresponds to
these commits:

https://github.com/gperftools/gperftools/commit/06c9414ec423ffe442c047b2560555f9d5847b1d
https://github.com/gperftools/gperftools/commit/f1d3fe4a21e339a3fd6e4592ee784a7b92dc

Testing:
 - Core job

Change-Id: I6377f7087111cf10ae35ce102beabcafb505579f
---
M bin/impala-config.sh
1 file changed, 2 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/17484/1
--
To view, visit http://gerrit.cloudera.org:8080/17484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6377f7087111cf10ae35ce102beabcafb505579f
Gerrit-Change-Number: 17484
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 


[Impala-ASF-CR] IMPALA-10613: (addendum) Fix test on S3 builds

2021-05-21 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17483


Change subject: IMPALA-10613: (addendum) Fix test on S3 builds
..

IMPALA-10613: (addendum) Fix test on S3 builds

The test_metastore_service.py fails on S3 builds because it expects
the filemetadata's object dictionary to be present. However, if the
table is located on S3 then there are no file-blocks in the returned
file-metadata and hence the length of obj_dict will be 0.

This patch fixes the test in S3 builds by not asserting the check
on S3, ADLS, GCS builds.

Testing Done:
1. Ran the test locally to confirm it is working.
2. [WIP] Ran the test on a S3 environment where tables are created
on S3.

Change-Id: I6ac291529dc0661abdfc2d4f48924a2c4b807c40
---
M tests/custom_cluster/test_metastore_service.py
1 file changed, 5 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/17483/3
--
To view, visit http://gerrit.cloudera.org:8080/17483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6ac291529dc0661abdfc2d4f48924a2c4b807c40
Gerrit-Change-Number: 17483
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar 


[native-toolchain-CR] IMPALA-8737: Patch gperftools to implement O(log n) searching for PageHeap::AllocLarge()

2021-05-21 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17482


Change subject: IMPALA-8737: Patch gperftools to implement O(log n) searching 
for PageHeap::AllocLarge()
..

IMPALA-8737: Patch gperftools to implement O(log n) searching for 
PageHeap::AllocLarge()

In gperftools 2.5, PageHeap::AllocLarge() has O(n) behavior.
This patches gperftool 2.5 to incorporate the patches that
change PageHeap::AllocLarge() to have O(log n) searching:

https://github.com/gperftools/gperftools/commit/06c9414ec423ffe442c047b2560555f9d5847b1d
https://github.com/gperftools/gperftools/commit/f1d3fe4a21e339a3fd6e4592ee784a7b92dc

This also adds a build for gperftools 2.8.1 to allow future
development/testing for actually upgrading gperftools (IMPALA-6784).

Testing:
 - Ran a toolchain build with this
 - Ran Impala locally with the resulting gperftools-2.5-p4

Change-Id: I7f41f2d296e26e160fd9bcc8be29085d1e9a8bc2
---
M buildall.sh
A 
source/gperftools/gperftools-2.5-patches/0003-Implemented-O-log-n-searching-among-large-spans.patch
A 
source/gperftools/gperftools-2.5-patches/0004-refactored-handling-of-reverse-span-set-iterator-for.patch
A 
source/gperftools/gperftools-2.8.1-patches/0001-Port-gperftools-to-adapt-aarch64.patch
4 files changed, 873 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/native-toolchain 
refs/changes/82/17482/1
--
To view, visit http://gerrit.cloudera.org:8080/17482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7f41f2d296e26e160fd9bcc8be29085d1e9a8bc2
Gerrit-Change-Number: 17482
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-21 Thread Amogh Margoor (Code Review)
Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@545
PS6, Line 545: std::string res;
> Original input string in StringToFloatInternal() is not null-terminated str
Results:

3 Runs with 20 length string:


With std::string
Fetched 110 row(s) in 30.54s
Fetched 110 row(s) in 30.68s
Fetched 110 row(s) in 30.62s


With VLA:
Fetched 110 row(s) in 30.60s
Fetched 110 row(s) in 30.64s
Fetched 110 row(s) in 30.56s



3 Runs with 30 length string:
-

With std::string
Fetched 110 row(s) in 33.85s
Fetched 110 row(s) in 33.89s
Fetched 110 row(s) in 34.19s

With VLA:
Fetched 110 row(s) in 33.78s
Fetched 110 row(s) in 33.71s
Fetched 110 row(s) in 33.57s

Summary:
I think we are fine with performance. In real system hopefully we don't see too 
many tcmalloc contention due to this. Couple of things will help us avoid 
malloc in current scenario:
1. Strings less than 16 length due to Small string optimisation.
2. Based on Qifan's suggestion, we can avoid creating copy of string for some 
cases where we have non-integer terminated string as original input.

Other alternative (also suggested by Zoltan) is to do VLA for string of less 
than 256 bytes and above that we can use strtod.

I am ok with either approach.



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 21 May 2021 21:37:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins into sorted columns in Parquet tables

2021-05-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17478 )

Change subject: IMPALA-10709: Min/max filters should be enabled for joins into 
sorted columns in Parquet tables
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8771/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
Gerrit-Change-Number: 17478
Gerrit-PatchSet: 7
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Fri, 21 May 2021 21:07:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins into sorted columns in Parquet tables

2021-05-21 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/17478 )

Change subject: IMPALA-10709: Min/max filters should be enabled for joins into 
sorted columns in Parquet tables
..

IMPALA-10709: Min/max filters should be enabled for joins into sorted columns 
in Parquet tables

This patch enables min/max filters for equi-joins into sort by
columns in a Parquet table by default. This is to take the addvantage
of the min/max values being fully sorted in each data file for the
table. When there are multiple sort by columns in the table, only the
leading column will be assigned a min/max filter. The control knob
is minmax_filter_sorted_columns, default to true.

When query option minmax_filter_sorted_columns is true and query
option minmax_filter_threshold is 0, the patch automatically assigns
a reasonable value for the threshhold, and selects PAGE to be the
filtering level as normally specified via option
minmax_filtering_level. When minmax_filter_threshold is greater than 0,
then no adjustment will be made to both options
(minmax_filter_threshold and minmax_filtering_level).

When minmax_filter_sorted_columns is set to false, no min/max filters
will be specifically assigned to the leading sort by columns.

Testing:
  1). Added two new tests in overlap_min_max_filters.test to verify
  a) min/max filters are only created for leading sort by column;
  b) query option minmax_filter_sorted_columns works.
  2). Core [TBD]
  3). Performance [TBD]

Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M 
testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
10 files changed, 125 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/17478/7
--
To view, visit http://gerrit.cloudera.org:8080/17478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
Gerrit-Change-Number: 17478
Gerrit-PatchSet: 7
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 


[Impala-ASF-CR] IMPALA-10489: Implement JWT support

2021-05-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17435 )

Change subject: IMPALA-10489: Implement JWT support
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8770/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b71fa854c9ddc8ca882878853395e1eb866143c
Gerrit-Change-Number: 17435
Gerrit-PatchSet: 8
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 21 May 2021 20:03:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10489: Implement JWT support

2021-05-21 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/17435 )

Change subject: IMPALA-10489: Implement JWT support
..

IMPALA-10489: Implement JWT support

This patch added JWT support with following functionality:
 * Load and parse JWKS from pre-installed JSON file.
 * Read the JWT token from the HTTP Header.
 * Verify the JWT's signature with puclic key in JWKS.
 * Get the username out of the payload of JWT token.

We use third party library jwt-cpp to verify JWT token. jwt-cpp is a
headers only C++ library. It was added to native-toolchain.
This patch modified bootstrap_toolchian.py to download jwt-cpp from
toolchain s3 bucket, and modified makefiles to add jwt-cpp/include
in the include path.

Added BE unit-tests for loading JWKS file and verifying JWT token.
Also added FE custom cluster test for JWT authentication.

Testing:
 - Passed core run.

Change-Id: I6b71fa854c9ddc8ca882878853395e1eb866143c
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/rpc/authentication.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/transport/THttpServer.cpp
M be/src/transport/THttpServer.h
M be/src/util/CMakeLists.txt
A be/src/util/jwt-util-test.cc
A be/src/util/jwt-util.cc
A be/src/util/jwt-util.h
M be/src/util/webserver.cc
M be/src/util/webserver.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/rat_exclude_files.txt
A cmake_modules/FindJwtCpp.cmake
M common/thrift/generate_error_codes.py
M common/thrift/metrics.json
M fe/src/test/java/org/apache/impala/customcluster/LdapHS2Test.java
M fe/src/test/java/org/apache/impala/customcluster/LdapWebserverTest.java
A testdata/jwt/jwks_rs256.json
22 files changed, 1,494 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/17435/8
--
To view, visit http://gerrit.cloudera.org:8080/17435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6b71fa854c9ddc8ca882878853395e1eb866143c
Gerrit-Change-Number: 17435
Gerrit-PatchSet: 8
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-10489: Implement JWT support

2021-05-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17435 )

Change subject: IMPALA-10489: Implement JWT support
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8769/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b71fa854c9ddc8ca882878853395e1eb866143c
Gerrit-Change-Number: 17435
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 21 May 2021 19:22:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10489: Implement JWT support

2021-05-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17435 )

Change subject: IMPALA-10489: Implement JWT support
..


Patch Set 7:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17435/7/fe/src/test/java/org/apache/impala/customcluster/LdapHS2Test.java
File fe/src/test/java/org/apache/impala/customcluster/LdapHS2Test.java:

http://gerrit.cloudera.org:8080/#/c/17435/7/fe/src/test/java/org/apache/impala/customcluster/LdapHS2Test.java@433
PS7, Line 433: "--jwt_token_auth=true --jwt_validate_signature=true 
--jwks_file_path=%s --jwt_allow_without_tls=true",
line too long (111 > 90)


http://gerrit.cloudera.org:8080/#/c/17435/7/fe/src/test/java/org/apache/impala/customcluster/LdapHS2Test.java@494
PS7, Line 494: "--jwt_token_auth=true --jwt_validate_signature=false 
--jwt_allow_without_tls=true");
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/17435/7/fe/src/test/java/org/apache/impala/customcluster/LdapWebserverTest.java
File fe/src/test/java/org/apache/impala/customcluster/LdapWebserverTest.java:

http://gerrit.cloudera.org:8080/#/c/17435/7/fe/src/test/java/org/apache/impala/customcluster/LdapWebserverTest.java@289
PS7, Line 289: "--jwt_token_auth=true --jwt_validate_signature=true 
--jwks_file_path=%s --jwt_allow_without_tls=true",
line too long (115 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b71fa854c9ddc8ca882878853395e1eb866143c
Gerrit-Change-Number: 17435
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 21 May 2021 19:02:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10489: Implement JWT support

2021-05-21 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/17435 )

Change subject: IMPALA-10489: Implement JWT support
..

IMPALA-10489: Implement JWT support

This patch added JWT support with following functionality:
 * Load and parse JWKS from pre-installed JSON file.
 * Read the JWT token from the HTTP Header.
 * Verify the JWT's signature with puclic key in JWKS.
 * Get the username out of the payload of JWT token.

We use third party library jwt-cpp to verify JWT token. jwt-cpp is a
headers only C++ library. It was added to native-toolchain.
This patch modified bootstrap_toolchian.py to download jwt-cpp from
toolchain s3 bucket, and modified makefiles to add jwt-cpp/include
in the include path.

Added BE unit-tests for loading JWKS file and verifying JWT token.
Also added FE custom cluster test for JWT authentication.

Testing:
 - Passed core run.

Change-Id: I6b71fa854c9ddc8ca882878853395e1eb866143c
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/rpc/authentication.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/transport/THttpServer.cpp
M be/src/transport/THttpServer.h
M be/src/util/CMakeLists.txt
A be/src/util/jwt-util-test.cc
A be/src/util/jwt-util.cc
A be/src/util/jwt-util.h
M be/src/util/webserver.cc
M be/src/util/webserver.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/rat_exclude_files.txt
A cmake_modules/FindJwtCpp.cmake
M common/thrift/generate_error_codes.py
M common/thrift/metrics.json
M fe/src/test/java/org/apache/impala/customcluster/LdapHS2Test.java
M fe/src/test/java/org/apache/impala/customcluster/LdapWebserverTest.java
A testdata/jwt/jwks_rs256.json
22 files changed, 1,493 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/17435/7
--
To view, visit http://gerrit.cloudera.org:8080/17435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6b71fa854c9ddc8ca882878853395e1eb866143c
Gerrit-Change-Number: 17435
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early

2021-05-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17295 )

Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early
..


Patch Set 32:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8768/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183
Gerrit-Change-Number: 17295
Gerrit-PatchSet: 32
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 21 May 2021 18:55:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early

2021-05-21 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17295 )

Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early
..


Patch Set 32:

Here is the clean sql execution after the fix.

Server version: impalad version 4.0.0-SNAPSHOT DEBUG (build 
603d1a8fd0946b32e76807bb4c505171f87f7881)
Query: drop table if exists min_max_filter_large_strings1
+-+
| summary |
+-+
| Table has been dropped. |
+-+
Fetched 1 row(s) in 4.94s
Query: drop table if exists min_max_filter_large_strings2
+-+
| summary |
+-+
| Table has been dropped. |
+-+
Fetched 1 row(s) in 3.99s
Query: create table min_max_filter_large_strings1 (string_col string primary 
key) stored as kudu
+-+
| summary |
+-+
| Table has been created. |
+-+
WARNINGS: Unpartitioned Kudu tables are inefficient for large data sizes.

Fetched 1 row(s) in 0.56s
Query: insert into min_max_filter_large_strings1 values 
(''),
 
('bbbc'),
 
(''),
 

[Impala-ASF-CR] IMPALA-10489: Implement JWT support

2021-05-21 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17435 )

Change subject: IMPALA-10489: Implement JWT support
..


Patch Set 6:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17435/6/be/src/rpc/authentication.cc
File be/src/rpc/authentication.cc:

http://gerrit.cloudera.org:8080/#/c/17435/6/be/src/rpc/authentication.cc@135
PS6, Line 135: DEFINE_bool_hidden(saml2_allow_without_tls_debug_only, false,
 : "When this configuration is set to true, Impala allows SAML2 
authentication "
 : "on unsecure channel. This should be only enabled for 
development / testing.");
> I think we should have a similar parameter for JWTs, because JWTs are only
Done


http://gerrit.cloudera.org:8080/#/c/17435/6/be/src/rpc/authentication.cc@155
PS6, Line 155: // Authorization type for JWT token based authorization in HTTP 
header, for example
 : // "Authorization: Bearer ", or “Authorization: 
Bearer;token: ”.
 : DEFINE_string(
 : jwt_auth_type, "Bearer", "Authorization type for JWT token 
based authorization");
 : // JWT token identifier in HTTP header. for example 
“token=”.
 : DEFINE_string(jwt_token_identifier, "token", "JWT token 
identifier");
> We definitely want to support this standard authorization header:
I saw following statement in design doc "SSO Token based authentication in 
CDW": "This token is then used in the JDBC URL with authentication type set to 
“auth=token” and “token=” (exact semantics are TBD Action: for 
hive/impala teams)". I was thinking we got to support different formats. Did a 
search, it's standard to set JWT auth as "Authorization: Bearer ".
Removed jwt_auth_type and jwt_token_identifier.


http://gerrit.cloudera.org:8080/#/c/17435/6/be/src/rpc/authentication.cc@568
PS6, Line 568:   // Create a cookie to return.
 :   connection_context->return_headers.push_back(
 :   Substitute("Set-Cookie: $0", GenerateCookie(username, 
hash)));
> I'm wondering if we really need to exchange the JWT for a cookie. We defini
Agree, it's common to JWT with each request.
When discussed with Naveen last time, it's not clear if JWT will accompany each 
request. I will confirm with him.


http://gerrit.cloudera.org:8080/#/c/17435/6/be/src/transport/THttpServer.cpp
File be/src/transport/THttpServer.cpp:

http://gerrit.cloudera.org:8080/#/c/17435/6/be/src/transport/THttpServer.cpp@170
PS6, Line 170: #if 0
 :   } else if (use_jwt_token_
 :   && THRIFT_strncasecmp(header, 
FLAGS_jwt_token_identifier.c_str(), sz) == 0
 :   && THRIFT_strncasecmp(
 :  auth_value_.c_str(), FLAGS_jwt_auth_type.c_str(), 
auth_value_.length())
 :   == 0) {
 : // JWT token in HTTP header as 
“Authorization=Bearer;token=”
 : jwt_token_ = string(value);
 : #endif
> If this code isn't needed, let's remove it.
Done


http://gerrit.cloudera.org:8080/#/c/17435/6/be/src/transport/THttpServer.cpp@286
PS6, Line 286:   if (got_bearer_auth) {
 : fallback_to_other_auths = false;
 : // Final SAML message in browser mode, check bearer and 
replace it with a cookie.
 : DCHECK(wrapped_request_ != nullptr);
 : wrapped_request_->headers["Authorization"] = auth_value_;
 : if (callbacks_.validate_saml2_bearer_fn()) {
 :   // During EE tests it makes things easier to return 
401-Unauthorized here.
 :   // This hack can be removed once there is a Python 
client that
 :   // supports SAML (IMPALA-10496).
 :   if (!FLAGS_saml2_ee_test_mode) authorized = true;
 :   if (metrics_enabled_) 
http_metrics_->total_saml_auth_success_->Increment(1);
 : }
> One complication is that both SAML and JWT use the "Authorization Bearer http://gerrit.cloudera.org:8080/17435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b71fa854c9ddc8ca882878853395e1eb866143c
Gerrit-Change-Number: 17435
Gerrit-PatchSet: 6
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 21 May 2021 18:38:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early

2021-05-21 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#32). ( 
http://gerrit.cloudera.org:8080/17295 )

Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early
..

IMPALA-10650: Bailout min/max filters in hash join builder early

This change set addresses the weakness in population min/max filters
in the hash join builder by periodically measuring the usefulness of
each filter and set the 'always_true_' flag accordingly. Once set to
true, the insertion to such a filter completely skips the steps from
the evaluation of the value from a row to the verification of the
value in the min/max range. This optimization is LLVM-enabled.

In addition, a new flag 'is_min_max_value_present' is added to
TRuntimeFilterTargetDesc to indicate whether the min/max column stats
is present in the query plan. The flag eliminates the need to check
the presence of min/max stats for every row in back-end.

Early bail out improves the HJ builder step in general. For example,
the step for join node #11 in TPCDS Q8 improves 13%, and the step
for join node #8 in TPCDS Q16 improves 3.2%.

The Insert() methods are optimized with branch prediction compiler
hints which yield the following improvement when tested with the
insertion of 1 randomly generated items.

  Small Integers: 7.0%
  Integers:   4.1%
  Big Integers:   4.3%
  Strings:5.6%
  Dates:  4.4%
  Timestamps:10.7%
  Decimals(4):   10.4%
  Decimals(8):9.1%

In addition, the min/max stats for pages are read in batches with a
fast track version for column types of int32_t,  int64_t, float,
double and date that have identical storage format as Parquet. For a
row group, the page locations are read only once, instead of once for
every page skipped, resulting in 100x speedup when a subset of 199
pages are skipped.

Testing:
  1. Ran core test successfully;
  2. Ran TPCDS performance tests.

Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-column-stats.h
M be/src/exec/parquet/parquet-column-stats.inline.h
M be/src/exec/parquet/parquet-common.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/util/TColumnValueUtil.java
17 files changed, 984 insertions(+), 297 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17295/32
--
To view, visit http://gerrit.cloudera.org:8080/17295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183
Gerrit-Change-Number: 17295
Gerrit-PatchSet: 32
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10626: Add support for Iceberg's Catalogs API

2021-05-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17466 )

Change subject: IMPALA-10626: Add support for Iceberg's Catalogs API
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8767/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17466
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5dfa150986117fc55b28034c4eda38a736460ead
Gerrit-Change-Number: 17466
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 21 May 2021 17:03:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10626: Add support for Iceberg's Catalogs API

2021-05-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17466 )

Change subject: IMPALA-10626: Add support for Iceberg's Catalogs API
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8766/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17466
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5dfa150986117fc55b28034c4eda38a736460ead
Gerrit-Change-Number: 17466
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 21 May 2021 16:53:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10626: Add support for Iceberg's Catalogs API

2021-05-21 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17466 )

Change subject: IMPALA-10626: Add support for Iceberg's Catalogs API
..


Patch Set 4:

PS4 is a rebase.


--
To view, visit http://gerrit.cloudera.org:8080/17466
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5dfa150986117fc55b28034c4eda38a736460ead
Gerrit-Change-Number: 17466
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 21 May 2021 16:36:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10626: Add support for Iceberg's Catalogs API

2021-05-21 Thread Zoltan Borok-Nagy (Code Review)
Hello wangsheng, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17466

to look at the new patch set (#4).

Change subject: IMPALA-10626: Add support for Iceberg's Catalogs API
..

IMPALA-10626: Add support for Iceberg's Catalogs API

Iceberg recently switched to use its Catalogs class to define
catalog and table properties. Catalog information is stored in
a configuration file such as hive-site.xml. And the table properties
contain information about which catalog is being used and what is
the Iceberg table id.

E.g. in the Hive conf we can have the following properties to define
catalogs:

 iceberg.catalog..type = hadoop
 iceberg.catalog..warehouse = somelocation

 or

 iceberg.catalog..type = hive

And at the table level we can have the following:

iceberg.catalog = 
name = 

Table property 'iceberg.catalog' refers to a Catalog defined in the
configuration file. This is in contradiction with Impala's current
behavior where we are already using 'iceberg.catalog', and it can
have the following values:

 * hive.catalog for HiveCatalog
 * hadoop.catalog for HadoopCatalog
 * hadoop.tables for HadoopTables

To be backward-compatible and also support the new Catalogs properties
Impala still recognizes the above special values. But, from now Impala
doesn't define 'iceberg.catalog' by default. 'iceberg.catalog' being
NULL means HiveCatalog for both Impala and Iceberg's Catalogs API,
hence for Hive and Spark as well.

If 'iceberg.catalog' has a different value than the special values it
indicates that Iceberg's Catalogs API is being used, so Impala will
try to look up the catalog configuration from the Hive config file.

Testing:
 * added SHOW CREATE TABLE tests
 * added e2e tests that create/insert/drop Iceberg tables with Catalogs
 * manually tested interop behavior with Hive

Change-Id: I5dfa150986117fc55b28034c4eda38a736460ead
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalog.java
A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHadoopCatalog.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHadoopTables.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/resources/hive-site.xml.py
A testdata/workloads/functional-query/queries/QueryTest/iceberg-catalogs.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
M tests/query_test/test_iceberg.py
15 files changed, 468 insertions(+), 35 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17466/4
--
To view, visit http://gerrit.cloudera.org:8080/17466
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5dfa150986117fc55b28034c4eda38a736460ead
Gerrit-Change-Number: 17466
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 


[Impala-ASF-CR] IMPALA-10626: Add support for Iceberg's Catalogs API

2021-05-21 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17466 )

Change subject: IMPALA-10626: Add support for Iceberg's Catalogs API
..


Patch Set 2:

(4 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/17466/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17466/2//COMMIT_MSG@12
PS2, Line 12: contiain
> contain?
Done


http://gerrit.cloudera.org:8080/#/c/17466/2/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

http://gerrit.cloudera.org:8080/#/c/17466/2/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@677
PS2, Line 677: throws AnalysisException
> Exception never thrown in this method.
Done


http://gerrit.cloudera.org:8080/#/c/17466/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalog.java
File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17466/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalog.java@58
PS2, Line 58: properties
> Maybe we should alse add some comments for this new parameter.
Done


http://gerrit.cloudera.org:8080/#/c/17466/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java
File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java:

http://gerrit.cloudera.org:8080/#/c/17466/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java@114
PS2, Line 114: properties
> Unused variable.
Done



--
To view, visit http://gerrit.cloudera.org:8080/17466
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5dfa150986117fc55b28034c4eda38a736460ead
Gerrit-Change-Number: 17466
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 21 May 2021 16:33:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10626: Add support for Iceberg's Catalogs API

2021-05-21 Thread Zoltan Borok-Nagy (Code Review)
Hello wangsheng, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17466

to look at the new patch set (#3).

Change subject: IMPALA-10626: Add support for Iceberg's Catalogs API
..

IMPALA-10626: Add support for Iceberg's Catalogs API

Iceberg recently switched to use its Catalogs class to define
catalog and table properties. Catalog information is stored in
a configuration file such as hive-site.xml. And the table properties
contain information about which catalog is being used and what is
the Iceberg table id.

E.g. in the Hive conf we can have the following properties to define
catalogs:

 iceberg.catalog..type = hadoop
 iceberg.catalog..warehouse = somelocation

 or

 iceberg.catalog..type = hive

And at the table level we can have the following:

iceberg.catalog = 
name = 

Table property 'iceberg.catalog' refers to a Catalog defined in the
configuration file. This is in contradiction with Impala's current
behavior where we are already using 'iceberg.catalog', and it can
have the following values:

 * hive.catalog for HiveCatalog
 * hadoop.catalog for HadoopCatalog
 * hadoop.tables for HadoopTables

To be backward-compatible and also support the new Catalogs properties
Impala still recognizes the above special values. But, from now Impala
doesn't define 'iceberg.catalog' by default. 'iceberg.catalog' being
NULL means HiveCatalog for both Impala and Iceberg's Catalogs API,
hence for Hive and Spark as well.

If 'iceberg.catalog' has a different value than the special values it
indicates that Iceberg's Catalogs API is being used, so Impala will
try to look up the catalog configuration from the Hive config file.

Testing:
 * added SHOW CREATE TABLE tests
 * added e2e tests that create/insert/drop Iceberg tables with Catalogs
 * manually tested interop behavior with Hive

Change-Id: I5dfa150986117fc55b28034c4eda38a736460ead
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalog.java
A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHadoopCatalog.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHadoopTables.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/resources/hive-site.xml.py
A testdata/workloads/functional-query/queries/QueryTest/iceberg-catalogs.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
M tests/query_test/test_iceberg.py
15 files changed, 468 insertions(+), 35 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17466/3
--
To view, visit http://gerrit.cloudera.org:8080/17466
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5dfa150986117fc55b28034c4eda38a736460ead
Gerrit-Change-Number: 17466
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: wangsheng 


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-21 Thread Amogh Margoor (Code Review)
Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@545
PS6, Line 545: std::string res;
> Stack overflow is sensitive to the depth of the stack and the size of the c
Original input string in StringToFloatInternal() is not null-terminated string 
and was figured out during testing. That is also one of the reason for the 
copying things into null-terminated string.
I am checking perf penalty with a million casts requiring heap allocation. if 
we see perf issues, we can use VLA for shorter strings and heap for longer ones.



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 21 May 2021 16:02:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-21 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@545
PS6, Line 545: std::string res;
> Yeah, I think it'd be good to see measurements for longer float literals. W
Stack overflow is sensitive to the depth of the stack and the size of the 
current frame.

Assume in most cases, the input string conforms to the format that the fast 
double parser library can handle properly, then we may not need to build the 
string object at all.

Thus, we can use the following form of signature:

inline bool stringToFloatPreprocess(char*, int offset, int len, string* 
preprocessed_result);

We return string (in preprocessed_result) only when the input does not conform.

In good case, we do not construct a string object and we can feed the original 
input string in StringToFloatInternal() to the fast parser library, if the 
input string is null-terminated. In seems this is the case as the only use of 
the StringToFloatInternal() is called indirectly here throughout Impala BE.

343 static bool ParseProbability(const string& prob_str, bool* should_execute) {
344   StringParser::ParseResult parse_result;  
345   double probability = StringParser::StringToFloat(
346   prob_str.c_str(), prob_str.size(), _result);
347   if (parse_result != StringParser::PARSE_SUCCESS ||   
348   probability < 0.0 || probability > 1.0) {   
349 return false;
350   } 
   
351   // +1L ensures probability of 0.0 and 1.0 work as expected.
352   *should_execute = rand() < probability * (RAND_MAX + 1L);
353   return true;  
   
354 }



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 21 May 2021 15:27:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-21 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@545
PS6, Line 545: std::string res;
> I had checked Small String Optimization and had read till 24 bytes it would
Yeah, I think it'd be good to see measurements for longer float literals. We 
are using TCMalloc in Impala which should be fairly performant so it's possible 
that the perf is not too bad.

We can also have an 'if' in the code. If strlen > 1 KiB use strtod(), otherwise 
use fast_double_parser with a Variable-length array, or a fixed-sized 1 KiB 
array?



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 21 May 2021 11:24:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-21 Thread Amogh Margoor (Code Review)
Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@545
PS6, Line 545: std::string res;
> I wonder if the performance numbers are still the same, given that now we n
I had checked Small String Optimization and had read till 24 bytes it would not 
do allocation which appeared reasonable. But i rechecked after your comment and 
24 bytes seem to be only for clang. I wrote a small program to check for our 
g++ and it avoids allocation till 15 bytes only. Program:

#include 
#include 
#include 

// replace operator new and delete to log allocations
void* operator new(std::size_t n) {
std::cout << "[Allocating " << n << " bytes]";
return malloc(n);
}
void operator delete(void* p) throw() {
free(p);
}

int main() {
for (size_t i = 0; i < 30; ++i) {
std::cout << i << ": " << std::string(i, '=') << std::endl;
}
}

is 15 good enough length ? otherwise dynamic allocation will make things slow. 
My assumption is input won't be long enough to cause stack overflow. I am not 
sure if we have some limitation on the string being passed to StringToFloat.



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 21 May 2021 10:49:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-21 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 6:

(2 comments)

Thanks for the changes!

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@545
PS6, Line 545: std::string res;
I wonder if the performance numbers are still the same, given that now we need 
to dynamically allocate memory from the heap. (Though for short strings the 
standard library implementation being used avoids allocation by using Small 
String Optimization)


http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@552
PS6, Line 552: // string has `move` defined, which would be used instead of 
copy.
nit: I don't think we need this comment. And I think the situation is even 
better as in this case Named Return Value Optimization kicks in which even 
avoids moving:
https://en.cppreference.com/w/cpp/language/copy_elision



-- 
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 21 May 2021 09:29:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early

2021-05-21 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17295 )

Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early
..


Patch Set 31:

Seems like TestMinMaxFilters.test_large_strings timed out:

https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/13947/testReport/
https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4238/testReport/


--
To view, visit http://gerrit.cloudera.org:8080/17295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183
Gerrit-Change-Number: 17295
Gerrit-PatchSet: 31
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 21 May 2021 08:59:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10626: Add support for Iceberg's Catalogs API

2021-05-21 Thread wangsheng (Code Review)
wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17466 )

Change subject: IMPALA-10626: Add support for Iceberg's Catalogs API
..


Patch Set 2:

(4 comments)

Thanks for new feature!

http://gerrit.cloudera.org:8080/#/c/17466/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17466/2//COMMIT_MSG@12
PS2, Line 12: contiain
contain?


http://gerrit.cloudera.org:8080/#/c/17466/2/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

http://gerrit.cloudera.org:8080/#/c/17466/2/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@677
PS2, Line 677: throws AnalysisException
Exception never thrown in this method.


http://gerrit.cloudera.org:8080/#/c/17466/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalog.java
File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17466/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalog.java@58
PS2, Line 58: properties
Maybe we should alse add some comments for this new parameter.


http://gerrit.cloudera.org:8080/#/c/17466/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java
File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java:

http://gerrit.cloudera.org:8080/#/c/17466/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java@114
PS2, Line 114: properties
Unused variable.



--
To view, visit http://gerrit.cloudera.org:8080/17466
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5dfa150986117fc55b28034c4eda38a736460ead
Gerrit-Change-Number: 17466
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 21 May 2021 07:35:21 +
Gerrit-HasComments: Yes