[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early

2021-04-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17295 )

Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8619/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183
Gerrit-Change-Number: 17295
Gerrit-PatchSet: 9
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 22 Apr 2021 00:05:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS

2021-04-21 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17316 )

Change subject: IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and 
Azure ABFS
..


Patch Set 3: Code-Review+1

This makes sense to me. I'm ok to go to +2 if other reviewers don't have 
comments.


--
To view, visit http://gerrit.cloudera.org:8080/17316
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id807e8a200b83283a09d3a917185cabab930017d
Gerrit-Change-Number: 17316
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 21 Apr 2021 23:50:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early

2021-04-21 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/17295 )

Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early
..

IMPALA-10650: Bailout min/max filters in hash join builder early

This change set addresses the weakness in population min/max filters
in the hash join builder by periodically measuring the usefulness of
each such filter and set the 'always_true_' flag to true. Inserting
into a filter with a true 'always_true_' flag, the steps from the
evaluation of the value from a row to the verification of the value
in the min/max range are skipped completely.

The above optimization is also LLVM-codeded.

Testing:
  1. Ran core test;
  2. Ran performance test (TBD).

Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
9 files changed, 179 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17295/9
--
To view, visit http://gerrit.cloudera.org:8080/17295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183
Gerrit-Change-Number: 17295
Gerrit-PatchSet: 9
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-10669: Limit memory during dataload in Docker-based tests

2021-04-21 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17331 )

Change subject: IMPALA-10669: Limit memory during dataload in Docker-based tests
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17331/1/docker/test-with-docker.py
File docker/test-with-docker.py:

http://gerrit.cloudera.org:8080/#/c/17331/1/docker/test-with-docker.py@648
PS1, Line 648: # Tez can get confused when creating ORC files with 
nested structures
 : # on machines with lots of RAM, e.g. m5.12xlarge. 
Limiting the available memory
 : # seems to avoid this problem, so limit container memory 
just for the build
 : # container. 128GB is still much more than dataload 
needs, but does the trick.
 : extras=["-m", "128g"],
Should we make this conditional on the machine having more than 128g of memory?



--
To view, visit http://gerrit.cloudera.org:8080/17331
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5100589ffc54184b6ca13db139cb983175c934eb
Gerrit-Change-Number: 17331
Gerrit-PatchSet: 1
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Comment-Date: Wed, 21 Apr 2021 23:30:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page if stream only has 2 pages.

2021-04-21 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17195 )

Change subject: IMPALA-10584: Defer advancing read page if stream only has 2 
pages.
..


Patch Set 6: Code-Review+1

Looks good!


--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 21 Apr 2021 21:23:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table cache for HMS DDL apis which modify tables and partitions.

2021-04-21 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table cache for HMS DDL apis 
which modify tables and partitions.
..


Patch Set 7:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17298/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17298/5//COMMIT_MSG@7
PS5, Line 7:
nit, Can you please remove this. It is redundant and makes the title 
unnecessary long.


http://gerrit.cloudera.org:8080/#/c/17298/5//COMMIT_MSG@9
PS5, Line 9:
   : For transactio
nit, Can you please reformat this commit msg to 72 line width as per the 
convention.


http://gerrit.cloudera.org:8080/#/c/17298/7/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
File 
fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java:

http://gerrit.cloudera.org:8080/#/c/17298/7/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@2966
PS7, Line 2966: removeTableIfExists
I found that removing the table directly from catalog_ doesn't take the 
metastoreDdlLock_ which can introduce subtle bugs. I had removed this method 
from CatalogServiceCatalog and added it to CatalogOpExecutor in my patch 
https://gerrit.cloudera.org/#/c/17308/. Depending on whether we merge 
https://gerrit.cloudera.org/#/c/17308/ or the current patch first, I think we 
will have to resolve that conflict one or the other way. I suggest for now add 
a //TODO here mentioning that this method should be moved to CatalogOpExecutor 
as suggested in IMPALA-10502. If we merge IMPALA-10502 first in case it takes 
us long time to unrevert https://gerrit.cloudera.org/#/c/17244/, it should be 
handled automatically as we resolve the git conflict.


http://gerrit.cloudera.org:8080/#/c/17298/7/tests/custom_cluster/test_metastore_service.py
File tests/custom_cluster/test_metastore_service.py:

http://gerrit.cloudera.org:8080/#/c/17298/7/tests/custom_cluster/test_metastore_service.py@439
PS7, Line 439: removed
nit, s/removed/invalidated


http://gerrit.cloudera.org:8080/#/c/17298/7/tests/custom_cluster/test_metastore_service.py@442
PS7, Line 442: removed
nit, s/removed/invalidated



--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 7
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 21 Apr 2021 19:55:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10669: Limit memory during dataload in Docker-based tests

2021-04-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17331 )

Change subject: IMPALA-10669: Limit memory during dataload in Docker-based tests
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8618/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17331
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5100589ffc54184b6ca13db139cb983175c934eb
Gerrit-Change-Number: 17331
Gerrit-PatchSet: 1
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Comment-Date: Wed, 21 Apr 2021 19:31:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10669: Limit memory during dataload in Docker-based tests

2021-04-21 Thread Laszlo Gaal (Code Review)
Laszlo Gaal has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17331 )

Change subject: IMPALA-10669: Limit memory during dataload in Docker-based tests
..


Patch Set 1:

Ran this on an m5.12xlarge with three different OSs: Ubuntu 16.04, Ubuntu 18.04 
and Centos 7, running just FE_TESTs and JDBC_TEST. All passed, so the patch 
does not trigger the planner test discrepancy seen earlier.


--
To view, visit http://gerrit.cloudera.org:8080/17331
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5100589ffc54184b6ca13db139cb983175c934eb
Gerrit-Change-Number: 17331
Gerrit-PatchSet: 1
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Comment-Date: Wed, 21 Apr 2021 19:13:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10669: Limit memory during dataload in Docker-based tests

2021-04-21 Thread Laszlo Gaal (Code Review)
Laszlo Gaal has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17331


Change subject: IMPALA-10669: Limit memory during dataload in Docker-based tests
..

IMPALA-10669: Limit memory during dataload in Docker-based tests

When Docker-based tests are run on machines with lots of RAM (e.g
AWS m5.12xlarge with 192GB of RAM), creating ORC files with complex
types often caused Tez containers to attempt to exceed their memory
limits. This resulted in Yarn terminating such containers, making the
dataload phase and the shole build fail.

Since this type of failure was only observed on hosts with very high
amounts of memory, the workaround is to put an artificial constrain on
the memory available for the build container (this is the Docker
container in which Impala is built and the initial dataload is
performed). Memory for this container is limited to 128GB, which is
still much more than dataload requires. It also matches the native
memory size of an AWS r5.4xlarge instance, which routinely runs Impala
builds in our downstream environment.

Change-Id: I5100589ffc54184b6ca13db139cb983175c934eb
---
M docker/test-with-docker.py
1 file changed, 5 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/17331/1
--
To view, visit http://gerrit.cloudera.org:8080/17331
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I5100589ffc54184b6ca13db139cb983175c934eb
Gerrit-Change-Number: 17331
Gerrit-PatchSet: 1
Gerrit-Owner: Laszlo Gaal 


[Impala-ASF-CR] IMPALA-10613: (Addendum) Change isSetGetFileMetadata to isGetFileMetadata

2021-04-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17330 )

Change subject: IMPALA-10613: (Addendum) Change isSetGetFileMetadata to 
isGetFileMetadata
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7089/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17330
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib96985e9865caaf5990cb9ef82787d076728eeff
Gerrit-Change-Number: 17330
Gerrit-PatchSet: 1
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Anonymous Coward (646)
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 21 Apr 2021 18:29:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10613: (Addendum) Change isSetGetFileMetadata to isGetFileMetadata

2021-04-21 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17330 )

Change subject: IMPALA-10613: (Addendum) Change isSetGetFileMetadata to 
isGetFileMetadata
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17330
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib96985e9865caaf5990cb9ef82787d076728eeff
Gerrit-Change-Number: 17330
Gerrit-PatchSet: 1
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Anonymous Coward (646)
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 21 Apr 2021 18:26:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10613: (Addendum) Change isSetGetFileMetadata to isGetFileMetadata

2021-04-21 Thread Anonymous Coward (Code Review)
Anonymous Coward (646) has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17330 )

Change subject: IMPALA-10613: (Addendum) Change isSetGetFileMetadata to 
isGetFileMetadata
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/17330
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib96985e9865caaf5990cb9ef82787d076728eeff
Gerrit-Change-Number: 17330
Gerrit-PatchSet: 1
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Anonymous Coward (646)
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 21 Apr 2021 18:06:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10613: (Addendum) Change isSetGetFileMetadata to isGetFileMetadata

2021-04-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17330 )

Change subject: IMPALA-10613: (Addendum) Change isSetGetFileMetadata to 
isGetFileMetadata
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8617/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17330
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib96985e9865caaf5990cb9ef82787d076728eeff
Gerrit-Change-Number: 17330
Gerrit-PatchSet: 1
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 21 Apr 2021 17:47:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10613: (Addendum) Change isSetGetFileMetadata to isGetFileMetadata

2021-04-21 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17330


Change subject: IMPALA-10613: (Addendum) Change isSetGetFileMetadata to 
isGetFileMetadata
..

IMPALA-10613: (Addendum) Change isSetGetFileMetadata to isGetFileMetadata

CatalogHMSAPIHelper uses isSetGetFileMetadata which checks if the
flag is set to either true or false. However, we should check if it
is explicitly set to false instead.

Change-Id: Ib96985e9865caaf5990cb9ef82787d076728eeff
---
M fe/src/main/java/org/apache/impala/catalog/CatalogHmsAPIHelper.java
1 file changed, 1 insertion(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/17330/1
--
To view, visit http://gerrit.cloudera.org:8080/17330
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib96985e9865caaf5990cb9ef82787d076728eeff
Gerrit-Change-Number: 17330
Gerrit-PatchSet: 1
Gerrit-Owner: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types

2021-04-21 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most 
common types
..


Patch Set 27:

The problem seems to be with this test: 
tests/failure/test_failpoints.py::TestFailpoints::test_failpoints, and this 
query in particular:

SET mt_dop=4;
SET batch_size=0;
SET num_nodes=0;
SET disable_codegen_rows_threshold=0;
SET disable_codegen=False;
SET abort_on_error=1;
SET exec_single_node_rows_threshold=0;
SET 
debug_action=0:PREPARE_SCANNER:MEM_LIMIT_EXCEEDED|COORD_BEFORE_EXEC_RPC:JITTER@100@0.3;
SELECT STRAIGHT_JOIN *
FROM functional_kudu.alltypes t1
JOIN /*+broadcast*/ functional_kudu.alltypesagg t2 ON t1.id = t2.id
WHERE t2.int_col < 1000;

It seems that when reading a Kudu table, the code for kudu::BlockBloomFilter 
may come from different places
- from the Kudu (binary?) dependency and
- be/src/kudu/util/block_bloom_filter.h

The crash happens because an assert in Kudu as 
kudu::BlockBloomFilter::hash_algorithm_ has an invalid value.

I think it is caused by binary incompatibility between our modified 
kudu::BlockBloomFilter and the original one, used in Kudu. I have added two 
fields just before kudu::BlockBloomFilter::hash_algorithm_ and probably that 
data is treated as if it were the hash_algorithm_ field by the original Kudu 
code. When I put the new fields to the end of the class, the crash doesn't 
happen.


--
To view, visit http://gerrit.cloudera.org:8080/17026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 27
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 21 Apr 2021 15:50:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7825: Upgrade Thrift version to 0.11.0

2021-04-21 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17170 )

Change subject: IMPALA-7825: Upgrade Thrift version to 0.11.0
..


Patch Set 13: Code-Review+1

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17170/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17170/10//COMMIT_MSG@13
PS10, Line 13: boost:: with std::
 : shared_ptr-s
> It is simply related to Thrift using std:: instead of boost:: shared_ptr's
Done


http://gerrit.cloudera.org:8080/#/c/17170/10//COMMIT_MSG@27
PS10, Line 27: n some ca
> it is the exact name of a command line option, so I think that it is correc
Done


http://gerrit.cloudera.org:8080/#/c/17170/10/be/src/rpc/thrift-thread.h
File be/src/rpc/thrift-thread.h:

http://gerrit.cloudera.org:8080/#/c/17170/10/be/src/rpc/thrift-thread.h@39
PS10, Line 39: /*det
> True for "detached" would mean that newThread() should return with already
Done


http://gerrit.cloudera.org:8080/#/c/17170/10/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/17170/10/be/src/service/impala-server.cc@1536
PS10, Line 1536:   ScopedShardedMapRef> 
map_ref(query_id, _driver_map_);
> specifically in this file shared_ptr is defined and we are using it without
Done



--
To view, visit http://gerrit.cloudera.org:8080/17170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Gerrit-Change-Number: 17170
Gerrit-PatchSet: 13
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Comment-Date: Wed, 21 Apr 2021 14:32:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS

2021-04-21 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17316 )

Change subject: IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and 
Azure ABFS
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17316/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/17316/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@588
PS1, Line 588:   String.format("Path '
> This would be similar to the 'base' version. It wouldn't work because Azure
Thanks for the explanation. Sounds like the new code should work.



--
To view, visit http://gerrit.cloudera.org:8080/17316
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id807e8a200b83283a09d3a917185cabab930017d
Gerrit-Change-Number: 17316
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 21 Apr 2021 13:45:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10611: Fix flakiness in test wide row

2021-04-21 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17324 )

Change subject: IMPALA-10611: Fix flakiness in test_wide_row
..


Patch Set 1: Code-Review+2

I'm OK with this fix but maybe we should also re-open IMPALA-681.


--
To view, visit http://gerrit.cloudera.org:8080/17324
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie1f0b7d4d6b3a875d9b408f057d46fdbdbdf2a34
Gerrit-Change-Number: 17324
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 21 Apr 2021 12:50:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS

2021-04-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17316 )

Change subject: IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and 
Azure ABFS
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8616/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17316
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id807e8a200b83283a09d3a917185cabab930017d
Gerrit-Change-Number: 17316
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 21 Apr 2021 10:46:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS

2021-04-21 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17316 )

Change subject: IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and 
Azure ABFS
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17316/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/17316/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@588
PS1, Line 588:   String.format("Path '
> I like the idea of requiring a qualified path as input. I was about to comm
In the new PS I'm requiring a qualified path and use

 try {
   Path qp = fs.makeQualified(path);
   return path.equals(qp);
 }



--
To view, visit http://gerrit.cloudera.org:8080/17316
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id807e8a200b83283a09d3a917185cabab930017d
Gerrit-Change-Number: 17316
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 21 Apr 2021 10:29:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS

2021-04-21 Thread Zoltan Borok-Nagy (Code Review)
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17316

to look at the new patch set (#3).

Change subject: IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and 
Azure ABFS
..

IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS

LOAD DATA INPATH silently fails when Impala tries to move files from
HDFS to ABFS. The problem is that we use FileSystem.makeQualified(Path)
to decide if path is on a given filesystem. We expect to get an
IllegalArgumentException if path is on a different filesystem. However,
the Azure FileSystem implementation doesn't throw this exception.

Because of that Impala thinks that an 'hdfs://' path and an 'abfs://'
path is on the same filesystem, so it tries to move files with
FileSystem.rename(). In case of errors rename() might throw an
Exception, or return false. Impala doesn't check the return value,
therefore if rename() returns false then the error remains silent.

This patch fixes Impala's isPathOnFileSystem() and also adds a check
for the return value of rename().

Testing:
 * tested manually between HDFS and Azure ABFS.
 * added JUnit test to FileSystemUtilTest

Change-Id: Id807e8a200b83283a09d3a917185cabab930017d
---
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java
2 files changed, 50 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/17316/3
--
To view, visit http://gerrit.cloudera.org:8080/17316
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id807e8a200b83283a09d3a917185cabab930017d
Gerrit-Change-Number: 17316
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 17: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 17
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Wed, 21 Apr 2021 08:45:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..

IMPALA-10445: Adjust NDV's scale with query option

This is a new way to control NDV's scale.

Since IMPALA-2658, we can trade memory for more accurate
estimation by setting larger `scale` in SQL function
NDV(, ). However the use of larger NDV scale requires
the modification of SQL queries which may not be practical in certain
applications:

- Firstly, SQL writers are reluctant to lower that scale. They prone
to fill up the scale, which will make the cluster unstable. Especially
when there are `group by`s with high cardinalities. So it is wiser to
let cluster admin other than sql writer choose appropriate scale.

- Secondly, In some application scenarios, queries are stored in DBs.
In a BI system, for example, rewriting thousands of SQLs is risky.

In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`
with the following semantics:

1. The allowed value is in the range [1..10];
2. Previously, the scale used in NDV() functions was fixed at 2.
Now the scale is provided by the newly added query options.
3. It does not influence the NDV scale for SQL function
NDV(, ) in which the NDV scale is provided by the 2nd
argument .

We also refactored method `Analyze` to make sure APPX_COUNT_DISTINCT
can work with this query option. After this, cluster admins can
substitute `count(distinct )` with `ndv(, scale)`.

Implementation details:

- The default value of DEFAULT_NDV_SCALE is 2, so we won't change
the default ndv behavior.
- We port `CountDistinctToNdv` transform logic from
`SelectStmt.analyze()` to `ExprRewriter`, making it compatible with
further rewrite rules.
- The newly added rewrite rule `DefaultNdvScaleRule` is applied
after `CountDistinctToNdvRule`.

Usage:

To set a default ndv scale:
```
SET DEFAULT_NDV_SCALE = 10;
```

To unset:
```
SET DEFAULT_NDV_SCALE = 2;
```

Here are test results of a typical workload (cardinality=40,090,650):
++
|   Metric| Count Distinct |NDV2|NDV5|NDV10  |
++
|  Memory(GB) |   3.83 |1.84|1.85| 1.89  |
| Duration(s) |  182.89|   30.22|29.72   | 29.24 |
|  ErrorRate  |0%  |1.8%|1.17%   | 0.06% |
++

Testing:
1) Added 3 unit test cases in `ExprRewriteRulesTest`.
2) Added 5 unit test cases in `ExprRewriterTest`.
3) Ran all front-end unit test, passed.
4) Added a new query-option test.

Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Reviewed-on: http://gerrit.cloudera.org:8080/17306
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
11 files changed, 314 insertions(+), 34 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 18
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai