[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8619/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 9 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 22 Apr 2021 00:05:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17316 ) Change subject: IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS .. Patch Set 3: Code-Review+1 This makes sense to me. I'm ok to go to +2 if other reviewers don't have comments. -- To view, visit http://gerrit.cloudera.org:8080/17316 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id807e8a200b83283a09d3a917185cabab930017d Gerrit-Change-Number: 17316 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 21 Apr 2021 23:50:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Qifan Chen has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. IMPALA-10650: Bailout min/max filters in hash join builder early This change set addresses the weakness in population min/max filters in the hash join builder by periodically measuring the usefulness of each such filter and set the 'always_true_' flag to true. Inserting into a filter with a true 'always_true_' flag, the steps from the evaluation of the value from a row to the verification of the value in the min/max range are skipped completely. The above optimization is also LLVM-codeded. Testing: 1. Ran core test; 2. Ran performance test (TBD). Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/runtime-filter-ir.cc M be/src/util/min-max-filter-ir.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h 9 files changed, 179 insertions(+), 42 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17295/9 -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 9 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-10669: Limit memory during dataload in Docker-based tests
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17331 ) Change subject: IMPALA-10669: Limit memory during dataload in Docker-based tests .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17331/1/docker/test-with-docker.py File docker/test-with-docker.py: http://gerrit.cloudera.org:8080/#/c/17331/1/docker/test-with-docker.py@648 PS1, Line 648: # Tez can get confused when creating ORC files with nested structures : # on machines with lots of RAM, e.g. m5.12xlarge. Limiting the available memory : # seems to avoid this problem, so limit container memory just for the build : # container. 128GB is still much more than dataload needs, but does the trick. : extras=["-m", "128g"], Should we make this conditional on the machine having more than 128g of memory? -- To view, visit http://gerrit.cloudera.org:8080/17331 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5100589ffc54184b6ca13db139cb983175c934eb Gerrit-Change-Number: 17331 Gerrit-PatchSet: 1 Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Comment-Date: Wed, 21 Apr 2021 23:30:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10584: Defer advancing read page if stream only has 2 pages.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17195 ) Change subject: IMPALA-10584: Defer advancing read page if stream only has 2 pages. .. Patch Set 6: Code-Review+1 Looks good! -- To view, visit http://gerrit.cloudera.org:8080/17195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Gerrit-Change-Number: 17195 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 21 Apr 2021 21:23:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table cache for HMS DDL apis which modify tables and partitions.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17298 ) Change subject: IMPALA-10648: Invalidate catalogd table cache for HMS DDL apis which modify tables and partitions. .. Patch Set 7: (5 comments) http://gerrit.cloudera.org:8080/#/c/17298/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17298/5//COMMIT_MSG@7 PS5, Line 7: nit, Can you please remove this. It is redundant and makes the title unnecessary long. http://gerrit.cloudera.org:8080/#/c/17298/5//COMMIT_MSG@9 PS5, Line 9: : For transactio nit, Can you please reformat this commit msg to 72 line width as per the convention. http://gerrit.cloudera.org:8080/#/c/17298/7/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java File fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java: http://gerrit.cloudera.org:8080/#/c/17298/7/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@2966 PS7, Line 2966: removeTableIfExists I found that removing the table directly from catalog_ doesn't take the metastoreDdlLock_ which can introduce subtle bugs. I had removed this method from CatalogServiceCatalog and added it to CatalogOpExecutor in my patch https://gerrit.cloudera.org/#/c/17308/. Depending on whether we merge https://gerrit.cloudera.org/#/c/17308/ or the current patch first, I think we will have to resolve that conflict one or the other way. I suggest for now add a //TODO here mentioning that this method should be moved to CatalogOpExecutor as suggested in IMPALA-10502. If we merge IMPALA-10502 first in case it takes us long time to unrevert https://gerrit.cloudera.org/#/c/17244/, it should be handled automatically as we resolve the git conflict. http://gerrit.cloudera.org:8080/#/c/17298/7/tests/custom_cluster/test_metastore_service.py File tests/custom_cluster/test_metastore_service.py: http://gerrit.cloudera.org:8080/#/c/17298/7/tests/custom_cluster/test_metastore_service.py@439 PS7, Line 439: removed nit, s/removed/invalidated http://gerrit.cloudera.org:8080/#/c/17298/7/tests/custom_cluster/test_metastore_service.py@442 PS7, Line 442: removed nit, s/removed/invalidated -- To view, visit http://gerrit.cloudera.org:8080/17298 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98 Gerrit-Change-Number: 17298 Gerrit-PatchSet: 7 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 21 Apr 2021 19:55:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10669: Limit memory during dataload in Docker-based tests
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17331 ) Change subject: IMPALA-10669: Limit memory during dataload in Docker-based tests .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8618/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17331 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5100589ffc54184b6ca13db139cb983175c934eb Gerrit-Change-Number: 17331 Gerrit-PatchSet: 1 Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Comment-Date: Wed, 21 Apr 2021 19:31:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10669: Limit memory during dataload in Docker-based tests
Laszlo Gaal has posted comments on this change. ( http://gerrit.cloudera.org:8080/17331 ) Change subject: IMPALA-10669: Limit memory during dataload in Docker-based tests .. Patch Set 1: Ran this on an m5.12xlarge with three different OSs: Ubuntu 16.04, Ubuntu 18.04 and Centos 7, running just FE_TESTs and JDBC_TEST. All passed, so the patch does not trigger the planner test discrepancy seen earlier. -- To view, visit http://gerrit.cloudera.org:8080/17331 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5100589ffc54184b6ca13db139cb983175c934eb Gerrit-Change-Number: 17331 Gerrit-PatchSet: 1 Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Comment-Date: Wed, 21 Apr 2021 19:13:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10669: Limit memory during dataload in Docker-based tests
Laszlo Gaal has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17331 Change subject: IMPALA-10669: Limit memory during dataload in Docker-based tests .. IMPALA-10669: Limit memory during dataload in Docker-based tests When Docker-based tests are run on machines with lots of RAM (e.g AWS m5.12xlarge with 192GB of RAM), creating ORC files with complex types often caused Tez containers to attempt to exceed their memory limits. This resulted in Yarn terminating such containers, making the dataload phase and the shole build fail. Since this type of failure was only observed on hosts with very high amounts of memory, the workaround is to put an artificial constrain on the memory available for the build container (this is the Docker container in which Impala is built and the initial dataload is performed). Memory for this container is limited to 128GB, which is still much more than dataload requires. It also matches the native memory size of an AWS r5.4xlarge instance, which routinely runs Impala builds in our downstream environment. Change-Id: I5100589ffc54184b6ca13db139cb983175c934eb --- M docker/test-with-docker.py 1 file changed, 5 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/17331/1 -- To view, visit http://gerrit.cloudera.org:8080/17331 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I5100589ffc54184b6ca13db139cb983175c934eb Gerrit-Change-Number: 17331 Gerrit-PatchSet: 1 Gerrit-Owner: Laszlo Gaal
[Impala-ASF-CR] IMPALA-10613: (Addendum) Change isSetGetFileMetadata to isGetFileMetadata
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17330 ) Change subject: IMPALA-10613: (Addendum) Change isSetGetFileMetadata to isGetFileMetadata .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7089/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17330 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib96985e9865caaf5990cb9ef82787d076728eeff Gerrit-Change-Number: 17330 Gerrit-PatchSet: 1 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Anonymous Coward (646) Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 21 Apr 2021 18:29:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10613: (Addendum) Change isSetGetFileMetadata to isGetFileMetadata
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17330 ) Change subject: IMPALA-10613: (Addendum) Change isSetGetFileMetadata to isGetFileMetadata .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17330 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib96985e9865caaf5990cb9ef82787d076728eeff Gerrit-Change-Number: 17330 Gerrit-PatchSet: 1 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Anonymous Coward (646) Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 21 Apr 2021 18:26:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10613: (Addendum) Change isSetGetFileMetadata to isGetFileMetadata
Anonymous Coward (646) has posted comments on this change. ( http://gerrit.cloudera.org:8080/17330 ) Change subject: IMPALA-10613: (Addendum) Change isSetGetFileMetadata to isGetFileMetadata .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/17330 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib96985e9865caaf5990cb9ef82787d076728eeff Gerrit-Change-Number: 17330 Gerrit-PatchSet: 1 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Anonymous Coward (646) Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 21 Apr 2021 18:06:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10613: (Addendum) Change isSetGetFileMetadata to isGetFileMetadata
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17330 ) Change subject: IMPALA-10613: (Addendum) Change isSetGetFileMetadata to isGetFileMetadata .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8617/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17330 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib96985e9865caaf5990cb9ef82787d076728eeff Gerrit-Change-Number: 17330 Gerrit-PatchSet: 1 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 21 Apr 2021 17:47:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10613: (Addendum) Change isSetGetFileMetadata to isGetFileMetadata
Vihang Karajgaonkar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17330 Change subject: IMPALA-10613: (Addendum) Change isSetGetFileMetadata to isGetFileMetadata .. IMPALA-10613: (Addendum) Change isSetGetFileMetadata to isGetFileMetadata CatalogHMSAPIHelper uses isSetGetFileMetadata which checks if the flag is set to either true or false. However, we should check if it is explicitly set to false instead. Change-Id: Ib96985e9865caaf5990cb9ef82787d076728eeff --- M fe/src/main/java/org/apache/impala/catalog/CatalogHmsAPIHelper.java 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/17330/1 -- To view, visit http://gerrit.cloudera.org:8080/17330 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ib96985e9865caaf5990cb9ef82787d076728eeff Gerrit-Change-Number: 17330 Gerrit-PatchSet: 1 Gerrit-Owner: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/17026 ) Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most common types .. Patch Set 27: The problem seems to be with this test: tests/failure/test_failpoints.py::TestFailpoints::test_failpoints, and this query in particular: SET mt_dop=4; SET batch_size=0; SET num_nodes=0; SET disable_codegen_rows_threshold=0; SET disable_codegen=False; SET abort_on_error=1; SET exec_single_node_rows_threshold=0; SET debug_action=0:PREPARE_SCANNER:MEM_LIMIT_EXCEEDED|COORD_BEFORE_EXEC_RPC:JITTER@100@0.3; SELECT STRAIGHT_JOIN * FROM functional_kudu.alltypes t1 JOIN /*+broadcast*/ functional_kudu.alltypesagg t2 ON t1.id = t2.id WHERE t2.int_col < 1000; It seems that when reading a Kudu table, the code for kudu::BlockBloomFilter may come from different places - from the Kudu (binary?) dependency and - be/src/kudu/util/block_bloom_filter.h The crash happens because an assert in Kudu as kudu::BlockBloomFilter::hash_algorithm_ has an invalid value. I think it is caused by binary incompatibility between our modified kudu::BlockBloomFilter and the original one, used in Kudu. I have added two fields just before kudu::BlockBloomFilter::hash_algorithm_ and probably that data is treated as if it were the hash_algorithm_ field by the original Kudu code. When I put the new fields to the end of the class, the crash doesn't happen. -- To view, visit http://gerrit.cloudera.org:8080/17026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287 Gerrit-Change-Number: 17026 Gerrit-PatchSet: 27 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 21 Apr 2021 15:50:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7825: Upgrade Thrift version to 0.11.0
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17170 ) Change subject: IMPALA-7825: Upgrade Thrift version to 0.11.0 .. Patch Set 13: Code-Review+1 (4 comments) http://gerrit.cloudera.org:8080/#/c/17170/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17170/10//COMMIT_MSG@13 PS10, Line 13: boost:: with std:: : shared_ptr-s > It is simply related to Thrift using std:: instead of boost:: shared_ptr's Done http://gerrit.cloudera.org:8080/#/c/17170/10//COMMIT_MSG@27 PS10, Line 27: n some ca > it is the exact name of a command line option, so I think that it is correc Done http://gerrit.cloudera.org:8080/#/c/17170/10/be/src/rpc/thrift-thread.h File be/src/rpc/thrift-thread.h: http://gerrit.cloudera.org:8080/#/c/17170/10/be/src/rpc/thrift-thread.h@39 PS10, Line 39: /*det > True for "detached" would mean that newThread() should return with already Done http://gerrit.cloudera.org:8080/#/c/17170/10/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/17170/10/be/src/service/impala-server.cc@1536 PS10, Line 1536: ScopedShardedMapRef> map_ref(query_id, _driver_map_); > specifically in this file shared_ptr is defined and we are using it without Done -- To view, visit http://gerrit.cloudera.org:8080/17170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 Gerrit-Change-Number: 17170 Gerrit-PatchSet: 13 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tamas Mate Gerrit-Comment-Date: Wed, 21 Apr 2021 14:32:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17316 ) Change subject: IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/17316/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: http://gerrit.cloudera.org:8080/#/c/17316/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@588 PS1, Line 588: String.format("Path ' > This would be similar to the 'base' version. It wouldn't work because Azure Thanks for the explanation. Sounds like the new code should work. -- To view, visit http://gerrit.cloudera.org:8080/17316 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id807e8a200b83283a09d3a917185cabab930017d Gerrit-Change-Number: 17316 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 21 Apr 2021 13:45:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10611: Fix flakiness in test wide row
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17324 ) Change subject: IMPALA-10611: Fix flakiness in test_wide_row .. Patch Set 1: Code-Review+2 I'm OK with this fix but maybe we should also re-open IMPALA-681. -- To view, visit http://gerrit.cloudera.org:8080/17324 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie1f0b7d4d6b3a875d9b408f057d46fdbdbdf2a34 Gerrit-Change-Number: 17324 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 21 Apr 2021 12:50:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17316 ) Change subject: IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8616/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17316 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id807e8a200b83283a09d3a917185cabab930017d Gerrit-Change-Number: 17316 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 21 Apr 2021 10:46:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17316 ) Change subject: IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/17316/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: http://gerrit.cloudera.org:8080/#/c/17316/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@588 PS1, Line 588: String.format("Path ' > I like the idea of requiring a qualified path as input. I was about to comm In the new PS I'm requiring a qualified path and use try { Path qp = fs.makeQualified(path); return path.equals(qp); } -- To view, visit http://gerrit.cloudera.org:8080/17316 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id807e8a200b83283a09d3a917185cabab930017d Gerrit-Change-Number: 17316 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 21 Apr 2021 10:29:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17316 to look at the new patch set (#3). Change subject: IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS .. IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS LOAD DATA INPATH silently fails when Impala tries to move files from HDFS to ABFS. The problem is that we use FileSystem.makeQualified(Path) to decide if path is on a given filesystem. We expect to get an IllegalArgumentException if path is on a different filesystem. However, the Azure FileSystem implementation doesn't throw this exception. Because of that Impala thinks that an 'hdfs://' path and an 'abfs://' path is on the same filesystem, so it tries to move files with FileSystem.rename(). In case of errors rename() might throw an Exception, or return false. Impala doesn't check the return value, therefore if rename() returns false then the error remains silent. This patch fixes Impala's isPathOnFileSystem() and also adds a check for the return value of rename(). Testing: * tested manually between HDFS and Azure ABFS. * added JUnit test to FileSystemUtilTest Change-Id: Id807e8a200b83283a09d3a917185cabab930017d --- M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java M fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java 2 files changed, 50 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/17316/3 -- To view, visit http://gerrit.cloudera.org:8080/17316 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id807e8a200b83283a09d3a917185cabab930017d Gerrit-Change-Number: 17316 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's scale with query option .. Patch Set 17: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 17 Gerrit-Owner: fifteencai Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: fifteencai Gerrit-Comment-Date: Wed, 21 Apr 2021 08:45:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's scale with query option .. IMPALA-10445: Adjust NDV's scale with query option This is a new way to control NDV's scale. Since IMPALA-2658, we can trade memory for more accurate estimation by setting larger `scale` in SQL function NDV(, ). However the use of larger NDV scale requires the modification of SQL queries which may not be practical in certain applications: - Firstly, SQL writers are reluctant to lower that scale. They prone to fill up the scale, which will make the cluster unstable. Especially when there are `group by`s with high cardinalities. So it is wiser to let cluster admin other than sql writer choose appropriate scale. - Secondly, In some application scenarios, queries are stored in DBs. In a BI system, for example, rewriting thousands of SQLs is risky. In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE` with the following semantics: 1. The allowed value is in the range [1..10]; 2. Previously, the scale used in NDV() functions was fixed at 2. Now the scale is provided by the newly added query options. 3. It does not influence the NDV scale for SQL function NDV(, ) in which the NDV scale is provided by the 2nd argument . We also refactored method `Analyze` to make sure APPX_COUNT_DISTINCT can work with this query option. After this, cluster admins can substitute `count(distinct )` with `ndv(, scale)`. Implementation details: - The default value of DEFAULT_NDV_SCALE is 2, so we won't change the default ndv behavior. - We port `CountDistinctToNdv` transform logic from `SelectStmt.analyze()` to `ExprRewriter`, making it compatible with further rewrite rules. - The newly added rewrite rule `DefaultNdvScaleRule` is applied after `CountDistinctToNdvRule`. Usage: To set a default ndv scale: ``` SET DEFAULT_NDV_SCALE = 10; ``` To unset: ``` SET DEFAULT_NDV_SCALE = 2; ``` Here are test results of a typical workload (cardinality=40,090,650): ++ | Metric| Count Distinct |NDV2|NDV5|NDV10 | ++ | Memory(GB) | 3.83 |1.84|1.85| 1.89 | | Duration(s) | 182.89| 30.22|29.72 | 29.24 | | ErrorRate |0% |1.8%|1.17% | 0.06% | ++ Testing: 1) Added 3 unit test cases in `ExprRewriteRulesTest`. 2) Added 5 unit test cases in `ExprRewriterTest`. 3) Ran all front-end unit test, passed. 4) Added a new query-option test. Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Reviewed-on: http://gerrit.cloudera.org:8080/17306 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java 11 files changed, 314 insertions(+), 34 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 18 Gerrit-Owner: fifteencai Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: fifteencai