[Impala-ASF-CR] IMPALA-10497: Fix flakiness in test no fd caching on cached data.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17054 ) Change subject: IMPALA-10497: Fix flakiness in test_no_fd_caching_on_cached_data. .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8113/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17054 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I774f9dfea7dcc107c3c7f2b76db3aaf4b2dd7952 Gerrit-Change-Number: 17054 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 10 Feb 2021 07:53:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9979: part 2: partitioned top-n
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16242 ) Change subject: IMPALA-9979: part 2: partitioned top-n .. Patch Set 34: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16242 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic638af9495981d889a4cb7455a71e8be0eb1a8e5 Gerrit-Change-Number: 16242 Gerrit-PatchSet: 34 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 10 Feb 2021 07:49:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9979: part 2: partitioned top-n
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16242 ) Change subject: IMPALA-9979: part 2: partitioned top-n .. Patch Set 34: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8112/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16242 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic638af9495981d889a4cb7455a71e8be0eb1a8e5 Gerrit-Change-Number: 16242 Gerrit-PatchSet: 34 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 10 Feb 2021 07:40:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10497: Fix flakiness in test no fd caching on cached data.
Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17054 ) Change subject: IMPALA-10497: Fix flakiness in test_no_fd_caching_on_cached_data. ..

IMPALA-10497: Fix flakiness in test_no_fd_caching_on_cached_data.

test_no_fd_caching_on_cached_data has been flaky because not all of the data was fully cached during the warm-up phase. This patch fixes the test by:
1. Reducing the number of rows written to table cachefd.simple.
2. Repeating the warm-up query 5 times.
3. Lowering the cluster size to 1.

Testing:
- Looped the test manually 100 times and saw no more failures.

Change-Id: I774f9dfea7dcc107c3c7f2b76db3aaf4b2dd7952
---
M tests/custom_cluster/test_hdfs_fd_caching.py
1 file changed, 31 insertions(+), 20 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/17054/1

-- To view, visit http://gerrit.cloudera.org:8080/17054 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I774f9dfea7dcc107c3c7f2b76db3aaf4b2dd7952 Gerrit-Change-Number: 17054 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto
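The warm-up approach in this change can be illustrated with a rough sketch. This is not the code from tests/custom_cluster/test_hdfs_fd_caching.py: the helper name and the `run_query` and `cached_fd_count` callables are hypothetical stand-ins, and the check itself (the file handle count stays unchanged once the data is cached) is an assumption about what such a test verifies.

```python
def check_no_fd_caching(run_query, cached_fd_count, query, warmups=5):
    """Warm up `warmups` times, then verify one more run caches no new handles.

    `run_query` and `cached_fd_count` are hypothetical callables supplied by
    the caller; they are not Impala test-framework APIs.
    """
    # Warm-up phase: repeating the query gives the data several chances
    # to become fully cached before the real measurement starts.
    for _ in range(warmups):
        run_query(query)

    before = cached_fd_count()
    run_query(query)
    after = cached_fd_count()
    # If the data is served from cache, no new file handles should appear.
    return after == before
```

With a single warm-up run the measurement can still observe uncached reads, which is the kind of flakiness the change describes; more warm-up runs make that less likely.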
[Impala-ASF-CR] IMPALA-9979: part 2: partitioned top-n
Hello Aman Sinha, Qifan Chen, Thomas Tauber-Marshall, Shant Hovsepian, David Rorke, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16242 to look at the new patch set (#34). Change subject: IMPALA-9979: part 2: partitioned top-n ..

IMPALA-9979: part 2: partitioned top-n

Planner changes:
---
The planner now identifies predicates that can be converted into limits in a partitioned or unpartitioned top-n with the following method:
* Push down predicates that reference the analytic tuple into the inline view. These will be evaluated after the analytic plan for the inline SelectStmt is generated.
* Identify predicates that reference the analytic tuple and could be converted to limits.
* If they can be applied to the last sort group of the analytic plan, and the windows are all compatible, then the lowest limit gets converted into a limit in the top N.
* Otherwise generate a select node with the conjuncts. We add logic to merge SELECT nodes to avoid generating duplicates from inside and outside the inline view.
* The pushed predicate is still added to the SELECT node because it is necessary for correctness for predicates like '=' to filter additional rows and also the limit pushdown optimization looks for analytic predicates there, so retaining all predicates simplifies that. The selectivity of the predicate is adjusted so that cardinality estimates remain accurate.

The optimization can be disabled by setting ANALYTIC_RANK_PUSHDOWN_THRESHOLD=0. By default it is only enabled for limits of 1000 or less, because the in-memory Top-N may perform significantly worse than a full sort for large heaps (since updating the heap for every input row ends up being more expensive than doing a traditional sort). We could probably optimize this more with better tuning so that it can gracefully fall back to doing the full sort at runtime.

rank() and row_number() are handled. rank() needs support in the TopN node to include ties for the last place, which is also added in this patch.

If predicates are trivially false, we generate empty nodes.

This interacts with the limit pushdown optimization. The limit pushdown optimization is applied after the partitioned top-n is generated, and can sometimes result in more optimal plans, so it is generalized to handle pushing into partitioned top-n nodes.

Backend changes:
---
The top-n node in the backend is augmented to handle the partitioned case, for which we use a std::map and a comparator based on the partition exprs. The partitioned top-n node has a soft limit of 64MB on the size of the in-memory heaps and can spill with use of an embedded Sorter. The current implementation tries to evict heaps that are less effective at filtering rows.

Limitations:
---
There are several possible extensions to this that we did not do:
* dense_rank() is not supported because it would require additional backend support - IMPALA-10014.
* ntile() is not supported because it would require additional backend support - IMPALA-10174.
* Only one predicate per analytic is pushed.
* Redundant rank()/row_number() predicates are not merged, only the lowest is chosen.
* Lower bounds are not converted into OFFSET.
* The analytic operator cannot be eliminated even if the analytic expression was only used in the predicate.
* This doesn't push predicates into UNION - IMPALA-10013
* Always false predicates don't result in empty plan - IMPALA-10015

Tests:
---
* Planner tests - added tests that exercise the interesting code paths added in planning.
  - Predicate ordering in SELECT nodes changed in a couple of cases because some predicates were pushed into the inline views.
* Modified SORT targeted perf tests to avoid conversion to Top-N
* Added targeted perf test for partitioned top-n.
* End-to-end tests
  - Unpartitioned Top-N end-to-end tests
  - Basic partitioning and duplicate handling tests on functional
  - Similar basic tests on larger inputs from TPC-DS and with larger partition counts.
  - I inspected the results and also ran the same tests with analytic_rank_pushdown_threshold=0 to confirm that the results were the same as with the full sort.
  - Fallback to spilling sort.

Perf:
---
Added a targeted benchmark that goes from ~2s to ~1s with mt_dop=8 on TPC-H 30 on my desktop.

Change-Id: Ic638af9495981d889a4cb7455a71e8be0eb1a8e5
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/exec-node.cc
M be/src/exec/topn-node-ir.cc
M be/src/exec/topn-node.cc
M be/src/exec/topn-node.h
M be/src/exprs/slot-ref.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/priority-queue.h
M be/src/util/tuple-row-compare.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M
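The idea of a partitioned top-n (one bounded heap per partition, replacing a full sort followed by a row_number() filter) can be sketched as follows. This is only an illustration under simplifying assumptions: numeric sort keys, row_number()-style semantics without rank() ties, and no memory limit, spilling or heap eviction. The function and parameter names are made up for the example and do not come from the Impala backend, which the message says uses a std::map keyed by the partition exprs.

```python
import heapq
from collections import defaultdict


def partitioned_top_n(rows, partition_key, sort_key, limit):
    """Return, per partition, the `limit` rows with the smallest sort key.

    Assumes numeric sort keys so that a max-heap can be emulated by negation.
    """
    heaps = defaultdict(list)  # partition value -> bounded heap of entries
    for seq, row in enumerate(rows):
        heap = heaps[partition_key(row)]
        # Negating key and sequence number puts the worst retained row
        # (largest key, latest arrival among equal keys) at heap[0].
        entry = (-sort_key(row), -seq, row)
        if len(heap) < limit:
            heapq.heappush(heap, entry)
        else:
            # Push the new row, then drop whichever row is now the worst.
            heapq.heappushpop(heap, entry)
    # Emit each partition's rows in ascending sort-key order.
    return {
        part: [row for _, _, row in sorted(heap, reverse=True)]
        for part, heap in heaps.items()
    }


rows = [("a", 3), ("b", 1), ("a", 1), ("a", 2), ("b", 5), ("a", 0)]
top2 = partitioned_top_n(rows, lambda r: r[0], lambda r: r[1], limit=2)
```

Each input row costs one heap operation of O(log limit), which is consistent with the message's remark that very large limits can make the heap slower than a traditional sort.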
[Impala-ASF-CR] IMPALA-9979: part 2: partitioned top-n
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16242 ) Change subject: IMPALA-9979: part 2: partitioned top-n .. Patch Set 33: (2 comments) http://gerrit.cloudera.org:8080/#/c/16242/30/be/src/exec/topn-node.cc File be/src/exec/topn-node.cc: http://gerrit.cloudera.org:8080/#/c/16242/30/be/src/exec/topn-node.cc@559 PS30, Line 559: // We evict heaps starting with the heaps that were least effective at filtering > I think I was partly thinking of the computation within the AnalyticEval no Yup http://gerrit.cloudera.org:8080/#/c/16242/33/testdata/workloads/functional-planner/queries/PlannerTest/analytic-rank-pushdown.test File testdata/workloads/functional-planner/queries/PlannerTest/analytic-rank-pushdown.test: http://gerrit.cloudera.org:8080/#/c/16242/33/testdata/workloads/functional-planner/queries/PlannerTest/analytic-rank-pushdown.test@22 PS33, Line 22: limit predicate > When browsing through the plans one more time, it occurred to me that some I think that makes sense, it is introducing a very specific new term that might be confusing. I replaced it. -- To view, visit http://gerrit.cloudera.org:8080/16242 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic638af9495981d889a4cb7455a71e8be0eb1a8e5 Gerrit-Change-Number: 16242 Gerrit-PatchSet: 33 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 10 Feb 2021 07:21:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10469: push quickstart to apache repo
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17030 ) Change subject: IMPALA-10469: push quickstart to apache repo ..

IMPALA-10469: push quickstart to apache repo

This adds a script, docker/publish_images_to_apache.sh, that allows uploading images to the apache/impala docker hub repo, prefixed with a version string. E.g. with the following commands:

  ninja docker_images quickstart_docker_images
  ./docker/publish_images_to_apache.sh -v 81d5377c2

The uploaded images can then be used for the quickstart cluster, as documented in docker/README.

Updated docs for quickstart to use a prefix from apache/impala.

Remove IMPALA_QUICKSTART_VERSION, which doesn't interact well with the tagging since the image name and version are now encoded in the tag.

Fix an incorrect image name added to docker-images.txt: impala_profile_tool_image.

Testing:
Ran Impala quickstart with data loading using instructions in README.

  export IMPALA_QUICKSTART_IMAGE_PREFIX="apache/impala:81d5377c2-"
  docker network create -d bridge quickstart-network
  export QUICKSTART_IP=$(docker network inspect quickstart-network -f '{{(index .IPAM.Config 0).Gateway}}')
  export QUICKSTART_LISTEN_ADDR=$QUICKSTART_IP
  docker-compose -f docker/quickstart.yml \
    -f docker/quickstart-kudu-minimal.yml \
    -f docker/quickstart-load-data.yml up -d
  docker run --network=quickstart-network -it \
    ${IMPALA_QUICKSTART_IMAGE_PREFIX}impala_quickstart_client impala-shell

Change-Id: I535d77e565b73d732ae511d7525193467086c76a
Reviewed-on: http://gerrit.cloudera.org:8080/17030
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
M docker/CMakeLists.txt
M docker/README.md
A docker/publish_images_to_apache.sh
M docker/quickstart-load-data.yml
M docker/quickstart.yml
5 files changed, 115 insertions(+), 14 deletions(-)

Approvals: Impala Public Jenkins: Looks good to me, approved; Verified

-- To view, visit http://gerrit.cloudera.org:8080/17030 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I535d77e565b73d732ae511d7525193467086c76a Gerrit-Change-Number: 17030 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10469: push quickstart to apache repo
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17030 ) Change subject: IMPALA-10469: push quickstart to apache repo .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17030 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I535d77e565b73d732ae511d7525193467086c76a Gerrit-Change-Number: 17030 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 10 Feb 2021 06:56:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10397 : Reduce flakiness in test single workload
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17028 ) Change subject: IMPALA-10397 : Reduce flakiness in test_single_workload .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17028 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I73ea5eb663db6d03832b19ed323670590946f514 Gerrit-Change-Number: 17028 Gerrit-PatchSet: 3 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 10 Feb 2021 06:05:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10397 : Reduce flakiness in test single workload
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17028 ) Change subject: IMPALA-10397 : Reduce flakiness in test_single_workload ..

IMPALA-10397 : Reduce flakiness in test_single_workload

This test failed recently due to a timeout waiting for executors to come up. The logs showed that the executors came up on time but were not recognized by the coordinator. This patch attempts to reduce flakiness by increasing the timeout and adding more logging in case this happens in the future.

Testing:
Ran in a loop on my local machine for a few hours.

Change-Id: I73ea5eb663db6d03832b19ed323670590946f514
Reviewed-on: http://gerrit.cloudera.org:8080/17028
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
M tests/custom_cluster/test_auto_scaling.py
1 file changed, 16 insertions(+), 9 deletions(-)

Approvals: Impala Public Jenkins: Looks good to me, approved; Verified

-- To view, visit http://gerrit.cloudera.org:8080/17028 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I73ea5eb663db6d03832b19ed323670590946f514 Gerrit-Change-Number: 17028 Gerrit-PatchSet: 4 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins
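The general pattern described here (poll a metric until it reaches an expected value, with a generous timeout and logging of what was last observed) might look roughly like the sketch below. It is not the code from tests/custom_cluster/test_auto_scaling.py; `wait_for_value`, its parameters and the logger name are hypothetical, and `get_value` stands in for whatever reads the backend-count metric.

```python
import logging
import time

LOG = logging.getLogger("auto_scaling_sketch")  # hypothetical logger name


def wait_for_value(get_value, expected, timeout_s, interval_s=0.5,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll `get_value()` until it equals `expected` or `timeout_s` elapses.

    Returns True on success and False on timeout. `clock` and `sleep` are
    injectable so the helper can be exercised without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        last = get_value()
        if last == expected:
            return True
        if clock() >= deadline:
            # Log the last observed value so a failure can be diagnosed later.
            LOG.warning("Timed out after %ss: expected %r, last saw %r",
                        timeout_s, expected, last)
            return False
        LOG.info("Still waiting: expected %r, currently %r", expected, last)
        sleep(interval_s)
```

Logging the last observed value on timeout helps distinguish "the executors never came up" from "they came up but the coordinator did not report them", which is the distinction the message draws.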
[Impala-ASF-CR] IMPALA-10496: SAML implementation in Impala
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16833 ) Change subject: IMPALA-10496: SAML implementation in Impala .. Patch Set 21: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6878/ -- To view, visit http://gerrit.cloudera.org:8080/16833 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia0c026cba1b90e7ff6ec5ae49be78b0d1edd8dfa Gerrit-Change-Number: 16833 Gerrit-PatchSet: 21 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 10 Feb 2021 03:53:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10467: Implement ds theta union() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17048 ) Change subject: IMPALA-10467: Implement ds_theta_union() function .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8111/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17048 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2 Gerrit-Change-Number: 17048 Gerrit-PatchSet: 1 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 10 Feb 2021 01:52:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10467: Implement ds theta union() function
Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17048 ) Change subject: IMPALA-10467: Implement ds_theta_union() function ..

IMPALA-10467: Implement ds_theta_union() function

This function receives a set of serialized Apache DataSketches Theta sketches produced by ds_theta_sketch() and merges them into a single sketch.

An example usage is to create a sketch for each partition of a table and write these sketches to a separate table. Depending on which partitions the user is interested in, the relevant sketches can then be unioned together to get an estimate. E.g.:
  SELECT ds_theta_estimate(ds_theta_union(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Testing:
- Apart from the automated tests I added to this patch, I also tested ds_theta_union() on a bigger dataset to check that the serialization, deserialization and merging steps work well. I took TPCH25.lineitem, created a number of sketches with grouping by l_shipdate and called ds_theta_union() on those sketches.

Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/theta_sketches_from_impala.parquet
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
M tests/query_test/test_datasketches.py
7 files changed, 162 insertions(+), 0 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17048/1

-- To view, visit http://gerrit.cloudera.org:8080/17048 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I91baf58c76eb43748acd5245047edac8c66761b2 Gerrit-Change-Number: 17048 Gerrit-PatchSet: 1 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
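To give a feel for why per-partition sketches can be merged into one estimate, here is a toy theta-style sketch. It is deliberately not the Apache DataSketches implementation and says nothing about Impala's serialized format: the class, the hash function and the default `k` are assumptions chosen for illustration. Each sketch keeps the `k` smallest hash values below a threshold theta, and a union keeps only hashes below the smallest theta of its inputs.

```python
import hashlib

_MAX_HASH = float(2 ** 64)


def _hash01(value):
    """Map a value to a pseudo-random float in [0, 1) via SHA-256."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / _MAX_HASH


class ToyThetaSketch:
    """Toy distinct-count sketch: retains at most k hashes below theta."""

    def __init__(self, k=1024):
        self.k = k
        self.theta = 1.0
        self.hashes = set()

    def update(self, value):
        h = _hash01(value)
        if h < self.theta:
            self.hashes.add(h)
            self._trim()

    def _trim(self):
        if len(self.hashes) > self.k:
            ordered = sorted(self.hashes)
            # The (k+1)-th smallest hash becomes the new threshold.
            self.theta = ordered[self.k]
            self.hashes = set(ordered[:self.k])

    def estimate(self):
        # Retained hashes are a uniform sample of the region below theta.
        return len(self.hashes) / self.theta


def union(sketches, k=1024):
    """Merge sketches: take the smallest theta, keep the hashes below it."""
    result = ToyThetaSketch(k)
    result.theta = min(s.theta for s in sketches)
    for s in sketches:
        result.hashes.update(h for h in s.hashes if h < result.theta)
    result._trim()
    return result
```

Because identical values hash identically, values that appear in several partitions are counted once in the union, which is what makes merging pre-built per-partition sketches meaningful.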
[Impala-ASF-CR] IMPALA-10469: push quickstart to apache repo
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17030 ) Change subject: IMPALA-10469: push quickstart to apache repo .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6880/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17030 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I535d77e565b73d732ae511d7525193467086c76a Gerrit-Change-Number: 17030 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 10 Feb 2021 01:22:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10469: push quickstart to apache repo
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17030 ) Change subject: IMPALA-10469: push quickstart to apache repo .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17030 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I535d77e565b73d732ae511d7525193467086c76a Gerrit-Change-Number: 17030 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 10 Feb 2021 01:22:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10397 : Reduce flakiness in test single workload
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17028 ) Change subject: IMPALA-10397 : Reduce flakiness in test_single_workload .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8110/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17028 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I73ea5eb663db6d03832b19ed323670590946f514 Gerrit-Change-Number: 17028 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 10 Feb 2021 00:43:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5675: Support UTF-8 Varchar and Char
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16909 ) Change subject: IMPALA-5675: Support UTF-8 Varchar and Char .. Patch Set 12: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8109/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16909 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I62efa3042c64d1d005a2cf4fd1d31e992543963f Gerrit-Change-Number: 16909 Gerrit-PatchSet: 12 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 10 Feb 2021 00:35:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10397 : Reduce flakiness in test single workload
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17028 ) Change subject: IMPALA-10397 : Reduce flakiness in test_single_workload .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17028 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I73ea5eb663db6d03832b19ed323670590946f514 Gerrit-Change-Number: 17028 Gerrit-PatchSet: 3 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 10 Feb 2021 00:24:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10397 : Reduce flakiness in test single workload
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17028 ) Change subject: IMPALA-10397 : Reduce flakiness in test_single_workload .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6879/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17028 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I73ea5eb663db6d03832b19ed323670590946f514 Gerrit-Change-Number: 17028 Gerrit-PatchSet: 3 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 10 Feb 2021 00:24:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10397 : Reduce flakiness in test single workload
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/17028 ) Change subject: IMPALA-10397 : Reduce flakiness in test_single_workload .. Patch Set 2: Code-Review+2 (2 comments) Carrying over +2 http://gerrit.cloudera.org:8080/#/c/17028/1/tests/custom_cluster/test_auto_scaling.py File tests/custom_cluster/test_auto_scaling.py: http://gerrit.cloudera.org:8080/#/c/17028/1/tests/custom_cluster/test_auto_scaling.py@54 PS1, Line 54: > flake8: E501 line too long (91 > 90 characters) Done http://gerrit.cloudera.org:8080/#/c/17028/1/tests/custom_cluster/test_auto_scaling.py@60 PS1, Line 60: metric_val = self.impalad_test_service.get_metric_value(TOTAL_BACKENDS_METRIC_NAME) > Nit: tidier to use a variable for "cluster-membership.backends.total" rathe Done -- To view, visit http://gerrit.cloudera.org:8080/17028 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I73ea5eb663db6d03832b19ed323670590946f514 Gerrit-Change-Number: 17028 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 10 Feb 2021 00:24:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10397 : Reduce flakiness in test single workload
Hello Andrew Sherman, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17028 to look at the new patch set (#2). Change subject: IMPALA-10397 : Reduce flakiness in test_single_workload ..

IMPALA-10397 : Reduce flakiness in test_single_workload

This test failed recently due to a timeout waiting for executors to come up. The logs showed that the executors came up on time but were not recognized by the coordinator. This patch attempts to reduce flakiness by increasing the timeout and adding more logging in case this happens in the future.

Testing:
Ran in a loop on my local machine for a few hours.

Change-Id: I73ea5eb663db6d03832b19ed323670590946f514
---
M tests/custom_cluster/test_auto_scaling.py
1 file changed, 16 insertions(+), 9 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/17028/2

-- To view, visit http://gerrit.cloudera.org:8080/17028 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I73ea5eb663db6d03832b19ed323670590946f514 Gerrit-Change-Number: 17028 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-5675: Support UTF-8 Varchar and Char
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16909 to look at the new patch set (#12). Change subject: IMPALA-5675: Support UTF-8 Varchar and Char ..

IMPALA-5675: Support UTF-8 Varchar and Char

This patch adds support for UTF-8 aware varchar and char types. In UTF-8 mode, when truncating UTF-8 varchar(N) and char(N) strings, lengths will be counted by UTF-8 characters instead of bytes. So the result string will have up to N UTF-8 characters.

The UTF8_MODE query option is first detected in FE when analyzing the query. An 'is_utf8' label is added in Exprs and SlotDescriptors. They are used in generating thrift objects and computing the tuple layouts. A char(N) slot will occupy 4 * N bytes if it is in UTF-8 mode, because a UTF-8 character can be encoded into 1 to 4 bytes. The slot will store up to N UTF-8 characters.

There is a gotcha that we should not add the label in Type.java, because Type instances are shared across the FE. Query compilation reuses the Type instances from the metadata. If we modify Type instances during compilation, other queries in non-UTF8 mode will be affected.

However, in BE, we need the type related classes (e.g. ColumnType, TypeDesc) to carry the UTF-8 markers. It's impractical to check the UTF8_MODE query option everywhere it needs to be. E.g. in AnyValUtil::SetAnyVal we can't access the query options. So we add the 'is_utf8' marker in TScalarType, ColumnType, TypeDesc to conveniently distinguish char(N) and varchar(N) types in UTF-8 mode. When generating thrift objects in FE, Exprs and SlotDescriptors deliver 'is_utf8' markers to TScalarTypes. These finally land in ColumnType and TypeDesc instances.

With the UTF-8 mode correctly determined, we just need to truncate/pad the char/varchar strings with their length counted by UTF-8 characters. Since char(N) slots always occupy 4N bytes, when converting char(N) to other string types, we need to re-calculate the actual length corresponding to N UTF-8 characters. We can optimize this in later patches, e.g. store the UTF-8 length in the slot, or deal with UTF-8 char(N) in the same way as varchar(N), i.e. reallocate the string space and just store the pointer and length in the slot.

Tests:
- Add tests for reading char(N) and varchar(N) columns in UTF8_MODE.
- Add truncating/padding tests
- Kudu only supports Varchar currently. Add special tests for Kudu.
- Add tests for writing CHAR(N)/VARCHAR(N) in UTF-8 mode.

Change-Id: I62efa3042c64d1d005a2cf4fd1d31e992543963f
---
M be/src/codegen/codegen-anyval.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/llvm-codegen.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner-test.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/parquet-column-readers.cc
M be/src/exec/parquet/parquet-column-stats.inline.h
M be/src/exec/parquet/parquet-common.h
M be/src/exec/parquet/parquet-plain-test.cc
M be/src/exec/text-converter.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/agg-fn-evaluator.cc
M be/src/exprs/anyval-util.cc
M be/src/exprs/anyval-util.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/slot-ref.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.inline.h
M be/src/runtime/tuple.cc
M be/src/runtime/types.cc
M be/src/runtime/types.h
M be/src/service/fe-support.cc
M be/src/service/hs2-util.cc
M be/src/udf/udf-internal.h
M be/src/udf/udf.cc
M be/src/udf/udf.h
M be/src/util/CMakeLists.txt
M be/src/util/bit-util.h
M be/src/util/dict-encoding.h
M be/src/util/string-util-test.cc
M be/src/util/string-util.cc
M be/src/util/string-util.h
M be/src/util/tuple-row-compare.cc
M common/thrift/Types.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
A testdata/workloads/functional-query/queries/QueryTest/utf8-chars-casting.test
A
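The difference between counting bytes and counting UTF-8 characters when truncating to varchar(N) or char(N) can be sketched as below. This is a standalone illustration, not the logic in be/src/util/string-util.cc: the function names are hypothetical, "character" is taken to mean a Unicode code point, and the input is assumed to be valid UTF-8.

```python
def truncate_utf8(data: bytes, max_chars: int) -> bytes:
    """Return the longest prefix of `data` with at most `max_chars` characters.

    Assumes `data` is valid UTF-8 and counts code points, not bytes.
    """
    chars = 0
    for i, byte in enumerate(data):
        # Bytes of the form 10xxxxxx continue a character; any other byte
        # starts a new one.
        if (byte & 0xC0) != 0x80:
            if chars == max_chars:
                return data[:i]
            chars += 1
    return data


def char_slot_size(n: int, utf8_mode: bool) -> int:
    """Bytes reserved for a char(N) slot: 4 * N in UTF-8 mode, else N."""
    return 4 * n if utf8_mode else n


text = "你好世界".encode("utf-8")   # 4 characters, 12 bytes
by_chars = truncate_utf8(text, 2)   # keeps 2 whole characters (6 bytes)
by_bytes = text[:2]                 # cuts the first character in half
```

Slicing by bytes can split a multi-byte character, while the character-aware version always ends on a character boundary. The 4 * N slot size follows from the message's observation that one UTF-8 character takes at most 4 bytes.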
[Impala-ASF-CR] IMPALA-10469: push quickstart to apache repo
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17030 ) Change subject: IMPALA-10469: push quickstart to apache repo .. Patch Set 3: Code-Review+2 This makes sense to me -- To view, visit http://gerrit.cloudera.org:8080/17030 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I535d77e565b73d732ae511d7525193467086c76a Gerrit-Change-Number: 17030 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 09 Feb 2021 23:48:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10161: User LDAP Search bind support
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/17047 ) Change subject: IMPALA-10161: User LDAP Search bind support .. Patch Set 1: (3 comments) A few remaining nits, but otherwise it looks good to me. I'll give Csaba an opportunity to take another look if he wants before +2ing it. http://gerrit.cloudera.org:8080/#/c/17047/1/be/src/util/webserver.cc File be/src/util/webserver.cc: http://gerrit.cloudera.org:8080/#/c/17047/1/be/src/util/webserver.cc@128 PS1, Line 128: "Used as filter for both simple and " : "search nit: formatting (i.e. have the string start on its own line like it is here, but wrap it such that it's as long as possible on that line), here and elsewhere. Unfortunately, this is one of the things that clang-format gets wrong (since it won't automatically combine strings for you). http://gerrit.cloudera.org:8080/#/c/17047/1/fe/src/test/java/org/apache/impala/customcluster/LdapSearchBindImpalaShellTest.java File fe/src/test/java/org/apache/impala/customcluster/LdapSearchBindImpalaShellTest.java: http://gerrit.cloudera.org:8080/#/c/17047/1/fe/src/test/java/org/apache/impala/customcluster/LdapSearchBindImpalaShellTest.java@104 PS1, Line 104: n typo http://gerrit.cloudera.org:8080/#/c/17047/1/fe/src/test/java/org/apache/impala/customcluster/LdapSimpleBindImpalaShellTest.java File fe/src/test/java/org/apache/impala/customcluster/LdapSimpleBindImpalaShellTest.java: http://gerrit.cloudera.org:8080/#/c/17047/1/fe/src/test/java/org/apache/impala/customcluster/LdapSimpleBindImpalaShellTest.java@42 PS1, Line 42: @Test I think you missed some instances of @Override here and below, though it might be more straightforward to just avoid overriding things by naming the functions in the base class something like test...Impl() or similar. Not a big deal, though. 
-- To view, visit http://gerrit.cloudera.org:8080/17047 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I978744ad05d9ef408328d1e4dd2d18c329f4d3b7 Gerrit-Change-Number: 17047 Gerrit-PatchSet: 1 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 09 Feb 2021 22:40:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10496: SAML implementation in Impala
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16833 ) Change subject: IMPALA-10496: SAML implementation in Impala .. Patch Set 20: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8108/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16833 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia0c026cba1b90e7ff6ec5ae49be78b0d1edd8dfa Gerrit-Change-Number: 16833 Gerrit-PatchSet: 20 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 09 Feb 2021 22:31:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10496: SAML implementation in Impala
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16833 ) Change subject: IMPALA-10496: SAML implementation in Impala .. Patch Set 21: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6878/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16833 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia0c026cba1b90e7ff6ec5ae49be78b0d1edd8dfa Gerrit-Change-Number: 16833 Gerrit-PatchSet: 21 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 09 Feb 2021 22:16:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10496: SAML implementation in Impala
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16833 ) Change subject: IMPALA-10496: SAML implementation in Impala .. Patch Set 20: (37 comments) http://gerrit.cloudera.org:8080/#/c/16833/20/fe/src/main/java/org/apache/impala/authentication/saml/AuthTokenGenerator.java File fe/src/main/java/org/apache/impala/authentication/saml/AuthTokenGenerator.java: http://gerrit.cloudera.org:8080/#/c/16833/20/fe/src/main/java/org/apache/impala/authentication/saml/AuthTokenGenerator.java@20 PS20, Line 20: // copy of https://github.com/vihangk1/hive/blob/master_saml/service/src/java/org/apache/hive/service/auth/saml/AuthTokenGenerator.java line too long (135 > 90) http://gerrit.cloudera.org:8080/#/c/16833/20/fe/src/main/java/org/apache/impala/authentication/saml/HiveSamlAuthTokenGenerator.java File fe/src/main/java/org/apache/impala/authentication/saml/HiveSamlAuthTokenGenerator.java: http://gerrit.cloudera.org:8080/#/c/16833/20/fe/src/main/java/org/apache/impala/authentication/saml/HiveSamlAuthTokenGenerator.java@32 PS20, Line 32: // https://github.com/vihangk1/hive/blob/master_saml/service/src/java/org/apache/hive/service/auth/saml/HiveSamlAuthTokenGenerator.java line too long (135 > 90) http://gerrit.cloudera.org:8080/#/c/16833/20/fe/src/main/java/org/apache/impala/authentication/saml/HiveSamlAuthTokenGenerator.java@52 PS20, Line 52: private static final Logger LOG = LoggerFactory.getLogger(HiveSamlAuthTokenGenerator.class); line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/16833/20/fe/src/main/java/org/apache/impala/authentication/saml/HiveSamlGroupNameFilter.java File fe/src/main/java/org/apache/impala/authentication/saml/HiveSamlGroupNameFilter.java: http://gerrit.cloudera.org:8080/#/c/16833/20/fe/src/main/java/org/apache/impala/authentication/saml/HiveSamlGroupNameFilter.java@31 PS20, Line 31: // 
https://github.com/vihangk1/hive/blob/master_saml/service/src/java/org/apache/hive/service/auth/saml/HiveSamlGroupNameFilter.java line too long (132 > 90) http://gerrit.cloudera.org:8080/#/c/16833/20/fe/src/main/java/org/apache/impala/authentication/saml/HiveSamlRelayStateInfo.java File fe/src/main/java/org/apache/impala/authentication/saml/HiveSamlRelayStateInfo.java: http://gerrit.cloudera.org:8080/#/c/16833/20/fe/src/main/java/org/apache/impala/authentication/saml/HiveSamlRelayStateInfo.java@20 PS20, Line 20: // copy of https://github.com/vihangk1/hive/blob/master_saml/service/src/java/org/apache/hive/service/auth/saml/HiveSamlRelayStateInfo.java line too long (139 > 90) http://gerrit.cloudera.org:8080/#/c/16833/20/fe/src/main/java/org/apache/impala/authentication/saml/HiveSamlRelayStateStore.java File fe/src/main/java/org/apache/impala/authentication/saml/HiveSamlRelayStateStore.java: http://gerrit.cloudera.org:8080/#/c/16833/20/fe/src/main/java/org/apache/impala/authentication/saml/HiveSamlRelayStateStore.java@32 PS20, Line 32: // slightly modified copy of https://github.com/vihangk1/hive/blob/master_saml/service/src/java/org/apache/hive/service/auth/saml/HiveSamlRelayStateInfo.java line too long (157 > 90) http://gerrit.cloudera.org:8080/#/c/16833/20/fe/src/main/java/org/apache/impala/authentication/saml/ImpalaSamlClient.java File fe/src/main/java/org/apache/impala/authentication/saml/ImpalaSamlClient.java: http://gerrit.cloudera.org:8080/#/c/16833/20/fe/src/main/java/org/apache/impala/authentication/saml/ImpalaSamlClient.java@43 PS20, Line 43: // modified version of https://github.com/vihangk1/hive/blob/master_saml/service/src/java/org/apache/hive/service/auth/saml/HiveSaml2Client.java line too long (144 > 90) http://gerrit.cloudera.org:8080/#/c/16833/20/fe/src/main/java/org/apache/impala/authentication/saml/ImpalaSamlClient.java@69 PS20, Line 69: //TODO handle the replayCache as described in http://www.pac4j.org/docs/clients/saml.html line too long (93 > 
90) http://gerrit.cloudera.org:8080/#/c/16833/20/fe/src/main/java/org/apache/impala/authentication/saml/ImpalaSamlClient.java@148 PS20, Line 148: // This is done to keep original structure by Vihang + keep ImpalaSamlClient as the only line too long (92 > 90) http://gerrit.cloudera.org:8080/#/c/16833/20/fe/src/main/java/org/apache/impala/authentication/saml/ImpalaSamlClient.java@188 PS20, Line 188: // https://github.com/vihangk1/hive/blob/master_saml/service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpServlet.java line too long (129 > 90) http://gerrit.cloudera.org:8080/#/c/16833/20/fe/src/main/java/org/apache/impala/authentication/saml/ImpalaSamlClient.java@189 PS20, Line 189: private String doSamlAuth(WrappedWebContext webContext) throws HttpSamlAuthenticationException { line too long (98 > 90) http://gerrit.cloudera.org:8080/#/c/16833/20/fe/src/main/java/org/apache/impala/service/BackendConfig.java File fe/src/main/java/org/apache/impala/service/BackendConfig.java:
[Impala-ASF-CR] IMPALA-10496: SAML implementation in Impala
Csaba Ringhofer has uploaded a new patch set (#20). ( http://gerrit.cloudera.org:8080/16833 ) Change subject: IMPALA-10496: SAML implementation in Impala .. IMPALA-10496: SAML implementation in Impala The bulk of the SAML2 related code is done on the Java side because: - There is already a POC in Hive that could be reused. - The only SAML lib for c++ seems to be OpenSaml, which seemed quite hard to use and is a heavy dependency. Doing authentication in Java needed some plumbing, as the hs2-http port is listened to in c++ and http related processing happens in THttpServer/THttpTransport, which is not a "real" web server, just a simple http implementation that processes the headers and passes content to the thrift service. - Http headers (and in one case body) are inspected and if they are SAML related, the http request is wrapped in TWrappedHttpRequest and sent to the Frontend. The Frontend processes it and returns a TWrappedHttpResponse with the info to return to the client. - After the last SAML message (with the bearer token) we generate an auth cookie in c++ (which can be validated in c++), so later requests in the session don't need to call into Java. State of implementation: - The java side is more or less ok, will be updated when the Hive implementation changes. I would do a proper cleanup / documentation once the Hive code is more final. - Compatibility with other auth mechanisms should be decided: - Whether other clients should be able to auth with ldap/kerberos is not clear yet. Testing: - Added EE tests that use Python's urllib2 to send SAML requests to Impala. Impala works slightly differently during tests (saml2_ee_test_mode=true). 
Change-Id: Ia0c026cba1b90e7ff6ec5ae49be78b0d1edd8dfa --- M be/src/common/global-flags.cc M be/src/rpc/auth-provider.h M be/src/rpc/authentication-test.cc M be/src/rpc/authentication.cc M be/src/rpc/authentication.h M be/src/rpc/hs2-http-test.cc M be/src/rpc/thrift-server.h M be/src/service/frontend.cc M be/src/service/frontend.h M be/src/service/impala-server.cc M be/src/transport/THttpServer.cpp M be/src/transport/THttpServer.h M be/src/transport/THttpTransport.cpp M be/src/transport/THttpTransport.h M be/src/util/backend-gflag-util.cc M bin/rat_exclude_files.txt M common/thrift/BackendGflags.thrift M common/thrift/Frontend.thrift M common/thrift/metrics.json M fe/pom.xml A fe/src/main/java/org/apache/impala/authentication/saml/AuthTokenGenerator.java A fe/src/main/java/org/apache/impala/authentication/saml/HiveSamlAuthTokenGenerator.java A fe/src/main/java/org/apache/impala/authentication/saml/HiveSamlGroupNameFilter.java A fe/src/main/java/org/apache/impala/authentication/saml/HiveSamlHttpServlet.java A fe/src/main/java/org/apache/impala/authentication/saml/HiveSamlRelayStateInfo.java A fe/src/main/java/org/apache/impala/authentication/saml/HiveSamlRelayStateStore.java A fe/src/main/java/org/apache/impala/authentication/saml/HiveSamlUtils.java A fe/src/main/java/org/apache/impala/authentication/saml/HttpSamlAuthenticationException.java A fe/src/main/java/org/apache/impala/authentication/saml/HttpSamlNoGroupsMatchedException.java A fe/src/main/java/org/apache/impala/authentication/saml/ImpalaSamlClient.java A fe/src/main/java/org/apache/impala/authentication/saml/NullSessionStore.java A fe/src/main/java/org/apache/impala/authentication/saml/WrappedWebContext.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java M java/pom.xml A testdata/authentication/saml2_sso.jks A testdata/authentication/saml2_sso_metadata.xml A 
tests/custom_cluster/test_saml2_sso.py 39 files changed, 2,142 insertions(+), 50 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/16833/20 -- To view, visit http://gerrit.cloudera.org:8080/16833 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia0c026cba1b90e7ff6ec5ae49be78b0d1edd8dfa Gerrit-Change-Number: 16833 Gerrit-PatchSet: 20 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Vihang Karajgaonkar
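The auth cookie step mentioned in the commit message above (issue a cookie after the last SAML message, then validate it without calling into Java) can be sketched as a signed token. This is only an illustration under stated assumptions: the cookie layout, the use of HMAC-SHA256, and the function names `make_cookie` and `validate_cookie` are hypothetical and are not taken from the patch.

```python
import base64
import hashlib
import hmac
from typing import Optional


def make_cookie(secret: bytes, username: str) -> str:
    """Build a cookie value of the (assumed) form base64(signature)&base64(payload)."""
    payload = username.encode("utf-8")
    sig = hmac.new(secret, payload, hashlib.sha256).digest()
    return base64.b64encode(sig).decode("ascii") + "&" + base64.b64encode(payload).decode("ascii")


def validate_cookie(secret: bytes, cookie: str) -> Optional[str]:
    """Return the username if the signature matches, otherwise None."""
    try:
        sig_b64, payload_b64 = cookie.split("&", 1)
        sig = base64.b64decode(sig_b64, validate=True)
        payload = base64.b64decode(payload_b64, validate=True)
    except ValueError:
        # Missing separator or invalid base64: treat as an unauthenticated request.
        return None
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(sig, expected):
        return None
    return payload.decode("utf-8")


secret = b"per-process random key (hypothetical)"
cookie = make_cookie(secret, "alice")
user = validate_cookie(secret, cookie)
```

Because validation needs only the server-side secret, later requests in the session can be checked locally, which matches the motivation given in the message.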
[Impala-ASF-CR] IMPALA-8721: re-enable test hive impala interop
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17042 ) Change subject: IMPALA-8721: re-enable test_hive_impala_interop .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17042 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7e2beabd7082a45a0fc3b60d318cf698079768ff Gerrit-Change-Number: 17042 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 09 Feb 2021 21:44:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8721: re-enable test hive impala interop
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17042 ) Change subject: IMPALA-8721: re-enable test_hive_impala_interop .. IMPALA-8721: re-enable test_hive_impala_interop The test now passes because HIVE-21290 was fixed. Revert "IMPALA-8689: test_hive_impala_interop failing with "Timeout >7200s"" This reverts commit 5d8c99ce74c45a7d04f11e1f252b346d654f02bf. Change-Id: I7e2beabd7082a45a0fc3b60d318cf698079768ff Reviewed-on: http://gerrit.cloudera.org:8080/17042 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M tests/custom_cluster/test_hive_parquet_codec_interop.py 1 file changed, 2 insertions(+), 4 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/17042 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I7e2beabd7082a45a0fc3b60d318cf698079768ff Gerrit-Change-Number: 17042 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10469: push quickstart to apache repo
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17030 ) Change subject: IMPALA-10469: push quickstart to apache repo .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8107/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17030 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I535d77e565b73d732ae511d7525193467086c76a Gerrit-Change-Number: 17030 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 09 Feb 2021 20:45:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10469: push quickstart to apache repo
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17030 ) Change subject: IMPALA-10469: push quickstart to apache repo .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/17030/2/docker/publish_images_to_apache.sh File docker/publish_images_to_apache.sh: http://gerrit.cloudera.org:8080/#/c/17030/2/docker/publish_images_to_apache.sh@62 PS2, Line 62: IMAGES+=" impala_quickstart_client impala_quickstart_hms" > Ok, I think a coherent first step is that docker-images.txt is a list of al Good point. I did this and re-pushed the images. Also removed the -d flag cause I don't think that was particularly useful and it got more complicated when we had images that didn't have debug values. -- To view, visit http://gerrit.cloudera.org:8080/17030 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I535d77e565b73d732ae511d7525193467086c76a Gerrit-Change-Number: 17030 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 09 Feb 2021 20:27:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10469: push quickstart to apache repo
Hello Grant Henke, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17030 to look at the new patch set (#3). Change subject: IMPALA-10469: push quickstart to apache repo .. IMPALA-10469: push quickstart to apache repo This adds a script, docker/publish_images_to_apache.sh, that allows uploading images to the apache/impala docker hub repo, prefixed with a version string. E.g. with the following commands: ninja docker_images quickstart_docker_images ./docker/publish_images_to_apache.sh -v 81d5377c2 The uploaded images can then be used for the quickstart cluster, as documented in docker/README. Updated docs for quickstart to use a prefix from apache/impala. Remove IMPALA_QUICKSTART_VERSION, which doesn't interact well with the tagging since the image name and version are now encoded in the tag. Fix an incorrect image name added to docker-images.txt: impala_profile_tool_image. Testing: Ran Impala quickstart with data loading using instructions in README. 
export IMPALA_QUICKSTART_IMAGE_PREFIX="apache/impala:81d5377c2-" docker network create -d bridge quickstart-network export QUICKSTART_IP=$(docker network inspect quickstart-network -f '{{(index .IPAM.Config 0).Gateway}}') export QUICKSTART_LISTEN_ADDR=$QUICKSTART_IP docker-compose -f docker/quickstart.yml \ -f docker/quickstart-kudu-minimal.yml \ -f docker/quickstart-load-data.yml up -d docker run --network=quickstart-network -it \ ${IMPALA_QUICKSTART_IMAGE_PREFIX}impala_quickstart_client impala-shell Change-Id: I535d77e565b73d732ae511d7525193467086c76a --- M docker/CMakeLists.txt M docker/README.md A docker/publish_images_to_apache.sh M docker/quickstart-load-data.yml M docker/quickstart.yml 5 files changed, 115 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/17030/3 -- To view, visit http://gerrit.cloudera.org:8080/17030 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I535d77e565b73d732ae511d7525193467086c76a Gerrit-Change-Number: 17030 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong
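The tagging scheme in the message above, where the version and image name are both encoded in the tag of a single repo, can be sketched as follows. The function name `published_tag` is hypothetical; the repo name, version string, and image names are the ones quoted in the message.

```python
def published_tag(version: str, image: str, repo: str = "apache/impala") -> str:
    """Return the full image reference for one image under the shared repo.

    The tag has the form <version>-<image>, so IMPALA_QUICKSTART_IMAGE_PREFIX
    ("<repo>:<version>-") followed by the image name gives the same reference.
    """
    return f"{repo}:{version}-{image}"


prefix = "apache/impala:81d5377c2-"
for image in ["impala_quickstart_client", "impala_quickstart_hms"]:
    # Both ways of building the reference agree.
    assert published_tag("81d5377c2", image) == prefix + image
```

Encoding the version in the tag is also why a separate IMPALA_QUICKSTART_VERSION variable is no longer needed.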
[Impala-ASF-CR] IMPALA-10469: push quickstart to apache repo
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17030 ) Change subject: IMPALA-10469: push quickstart to apache repo .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/17030/2/docker/publish_images_to_apache.sh File docker/publish_images_to_apache.sh: http://gerrit.cloudera.org:8080/#/c/17030/2/docker/publish_images_to_apache.sh@62 PS2, Line 62: IMAGES+=" impala_quickstart_client impala_quickstart_hms" > I've seen some build tools (internal to Cloudera) that basically intersect Ok, I think a coherent first step is that docker-images.txt is a list of all non-intermediary docker images. For example, it won't include the impala_base docker image, but it would include quickstart images. This will change over time, but it seems fine. -- To view, visit http://gerrit.cloudera.org:8080/17030 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I535d77e565b73d732ae511d7525193467086c76a Gerrit-Change-Number: 17030 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 09 Feb 2021 17:55:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9382: part 3/3 clean up runtime profile v2 text output
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17050 ) Change subject: IMPALA-9382: part 3/3 clean up runtime profile v2 text output .. Patch Set 1: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/8106/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17050 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I277a0da749bcda4ecca574257b5aaacbcf222491 Gerrit-Change-Number: 17050 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Tue, 09 Feb 2021 17:34:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9382: part 3/3 clean up runtime profile v2 text output
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17050 ) Change subject: IMPALA-9382: part 3/3 clean up runtime profile v2 text output .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/17050/1/tests/observability/test_profile_tool.py File tests/observability/test_profile_tool.py: http://gerrit.cloudera.org:8080/#/c/17050/1/tests/observability/test_profile_tool.py@50 PS1, Line 50: ) flake8: E501 line too long (91 > 90 characters) http://gerrit.cloudera.org:8080/#/c/17050/1/tests/observability/test_profile_tool.py@53 PS1, Line 53: ) flake8: E501 line too long (92 > 90 characters) -- To view, visit http://gerrit.cloudera.org:8080/17050 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I277a0da749bcda4ecca574257b5aaacbcf222491 Gerrit-Change-Number: 17050 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Tue, 09 Feb 2021 17:15:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9586: update query option docs for mt dop
Tim Armstrong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17043 ) Change subject: IMPALA-9586: update query option docs for mt_dop .. IMPALA-9586: update query option docs for mt_dop There are interactions between mt_dop and num_nodes and num_scanner_threads. Mention these in the docs. Change-Id: I3d9a6f56ffaf211d7d3ca1fad506ee83d516ccbd Reviewed-on: http://gerrit.cloudera.org:8080/17043 Tested-by: Impala Public Jenkins Reviewed-by: Joe McDonnell --- M docs/topics/impala_num_nodes.xml M docs/topics/impala_num_scanner_threads.xml 2 files changed, 9 insertions(+), 0 deletions(-) Approvals: Impala Public Jenkins: Verified Joe McDonnell: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/17043 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I3d9a6f56ffaf211d7d3ca1fad506ee83d516ccbd Gerrit-Change-Number: 17043 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9382: part 3/3 clean up runtime profile v2 text output
Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17050 ) Change subject: IMPALA-9382: part 3/3 clean up runtime profile v2 text output .. IMPALA-9382: part 3/3 clean up runtime profile v2 text output Eliminated some of the noisy per-instance counters from DEFAULT verbosity. Testing: * Updated impala-profile-tool test with new output * Added new impala-profile-tool test for v2 profile. Change-Id: I277a0da749bcda4ecca574257b5aaacbcf222491 --- M be/src/util/runtime-profile.cc M testdata/impala-profiles/README M testdata/impala-profiles/impala_profile_log_tpcds_compute_stats_default.expected.txt A testdata/impala-profiles/impala_profile_log_tpcds_compute_stats_v2 A testdata/impala-profiles/impala_profile_log_tpcds_compute_stats_v2_default.expected.txt A testdata/impala-profiles/impala_profile_log_tpcds_compute_stats_v2_extended.expected.txt M tests/observability/test_profile_tool.py 7 files changed, 5,755 insertions(+), 516 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/17050/1 -- To view, visit http://gerrit.cloudera.org:8080/17050 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I277a0da749bcda4ecca574257b5aaacbcf222491 Gerrit-Change-Number: 17050 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-9586: update query option docs for mt dop
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17043 ) Change subject: IMPALA-9586: update query option docs for mt_dop .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17043 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9a6f56ffaf211d7d3ca1fad506ee83d516ccbd Gerrit-Change-Number: 17043 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Tue, 09 Feb 2021 16:13:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8721: re-enable test hive impala interop
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17042 ) Change subject: IMPALA-8721: re-enable test_hive_impala_interop .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6877/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17042 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7e2beabd7082a45a0fc3b60d318cf698079768ff Gerrit-Change-Number: 17042 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 09 Feb 2021 16:08:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8721: re-enable test hive impala interop
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17042 ) Change subject: IMPALA-8721: re-enable test_hive_impala_interop .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17042 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7e2beabd7082a45a0fc3b60d318cf698079768ff Gerrit-Change-Number: 17042 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 09 Feb 2021 16:08:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10482, IMPALA-10493: Fix bugs in full ACID collection query rewrites
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17038 ) Change subject: IMPALA-10482, IMPALA-10493: Fix bugs in full ACID collection query rewrites .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8105/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8fc758d3c1e75c7066936d590aec8bff8d2b00b0 Gerrit-Change-Number: 17038 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 09 Feb 2021 15:43:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10482, IMPALA-10493: Fix bugs in full ACID collection query rewrites
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17038 ) Change subject: IMPALA-10482, IMPALA-10493: Fix bugs in full ACID collection query rewrites .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17038/1/testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test File testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test: http://gerrit.cloudera.org:8080/#/c/17038/1/testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test@317 PS1, Line 317: > This query actually revealed another bug, opened IMPALA-10493 for it. Uploaded a fix for IMPALA-10493 in the context of this patch since the bug resides in the same method. Though I can split this patch if you feel necessary. -- To view, visit http://gerrit.cloudera.org:8080/17038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8fc758d3c1e75c7066936d590aec8bff8d2b00b0 Gerrit-Change-Number: 17038 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 09 Feb 2021 15:28:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10482, IMPALA-10493: Fix bugs in full ACID collection query rewrites
Hello Quanlong Huang, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17038 to look at the new patch set (#3). Change subject: IMPALA-10482, IMPALA-10493: Fix bugs in full ACID collection query rewrites .. IMPALA-10482, IMPALA-10493: Fix bugs in full ACID collection query rewrites IMPALA-10482: A SELECT * query on an unrelative collection column of a transactional ORC table hits an IllegalStateException. The AcidRewriter rewrites queries like "select item from my_complex_orc.int_array" to "select item from my_complex_orc t, t.int_array". This causes trouble in star expansion, because the original query "select * from my_complex_orc.int_array" is analyzed as "select item from my_complex_orc.int_array", but the rewritten query "select * from my_complex_orc t, t.int_array" is analyzed as "select id, item from my_complex_orc t, t.int_array". Hidden table refs can also cause issues during regular column resolution, e.g. when the table has top-level 'pos'/'item'/'key'/'value' columns. The workaround is to keep track of the automatically added table refs during query rewrite, so that when we analyze the rewritten query we can ignore these auxiliary table refs. IMPALA-10493: Using JOIN ON syntax to join two full ACID collections produces wrong results. When AcidRewriter.splitCollectionRef() creates a new collection ref, it doesn't copy all the information needed to correctly execute the query. E.g. it dropped the ON clause, turning INNER joins into CROSS joins.
Testing: * added e2e tests Change-Id: I8fc758d3c1e75c7066936d590aec8bff8d2b00b0 --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/TableRef.java M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test 7 files changed, 231 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/17038/3 -- To view, visit http://gerrit.cloudera.org:8080/17038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8fc758d3c1e75c7066936d590aec8bff8d2b00b0 Gerrit-Change-Number: 17038 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy
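The star-expansion workaround described in the commit message above can be illustrated with a minimal sketch. This is not Impala's Java code: the TableRef class, its hidden field, and expand_star() are hypothetical stand-ins, and the column lists are taken from the example queries in the commit message.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableRef:
    """Hypothetical stand-in for an analyzed table reference."""
    alias: str
    columns: List[str]
    hidden: bool = False  # True for refs added automatically by a query rewrite


def expand_star(table_refs: List[TableRef]) -> List[str]:
    """Expand 'SELECT *' over the visible table refs only."""
    result = []
    for ref in table_refs:
        if ref.hidden:
            # Auxiliary ref introduced by the rewrite: skip it so the
            # rewritten query expands to the same columns as the original.
            continue
        result.extend(ref.columns)
    return result


# Original query: select * from my_complex_orc.int_array
original = [TableRef("int_array", ["item"])]

# Rewritten query: select * from my_complex_orc t, t.int_array
# The base table ref 't' was added by the rewrite, so it is marked hidden.
rewritten = [
    TableRef("t", ["id"], hidden=True),
    TableRef("t.int_array", ["item"]),
]

print(expand_star(original))
print(expand_star(rewritten))
```

Without the hidden flag, the rewritten query would expand to both id and item, which is the mismatch the commit message describes.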
[Impala-ASF-CR] IMPALA-10463: Implement ds theta sketch() and ds theat estimate() functions
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/17008 ) Change subject: IMPALA-10463: Implement ds_theta_sketch() and ds_theat_estimate() functions .. Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/17008/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17008/2//COMMIT_MSG@7 PS2, Line 7: ds_theat_estimate nit: typo http://gerrit.cloudera.org:8080/#/c/17008/2//COMMIT_MSG@13 PS2, Line 13: ds_theat_estimate nit: same typo http://gerrit.cloudera.org:8080/#/c/17008/2//COMMIT_MSG@28 PS2, Line 28:see IMPALA-10464. I'd also include some highlights from that perf measurement doc in the commit msg. Probably an additional section would be great for this. http://gerrit.cloudera.org:8080/#/c/17008/2/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/17008/2/be/src/exprs/aggregate-functions-ir.cc@1646 PS2, Line 1646: SerializeCompactDsThetaSketch In contrast with HLL, as I see it, Theta doesn't compact the sketch, it just serializes it, so this function name doesn't reflect well what actually happens inside the function. Please rename it to SerializeDsThetaSketch() http://gerrit.cloudera.org:8080/#/c/17008/2/be/src/exprs/aggregate-functions-ir.cc@1899 PS2, Line 1899: datasketches::compact_theta_sketch* sketch_ptr = I'm a bit lost here. Could you help me understand why it is needed to convert the union_sketch to a compact_theta_sketch? Can't you return the union_sketch? http://gerrit.cloudera.org:8080/#/c/17008/2/be/src/exprs/datasketches-functions-ir.cc File be/src/exprs/datasketches-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/17008/2/be/src/exprs/datasketches-functions-ir.cc@110 PS2, Line 110: return 0; HLL returns a null here. Have you checked the behaviour in Hive to keep the 2 systems in sync?
http://gerrit.cloudera.org:8080/#/c/17008/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test File testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test: http://gerrit.cloudera.org:8080/#/c/17008/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@138 PS2, Line 138: # Check that ds_theta_estimate returns error for strings that are not serialized sketches. Please add a test when ds_theta_estimate() is used on an HLL sketch. I guess we expect an error there. -- To view, visit http://gerrit.cloudera.org:8080/17008 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14f24c16b815eec75cf90bb92c8b8b0363dcbfbc Gerrit-Change-Number: 17008 Gerrit-PatchSet: 2 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 09 Feb 2021 15:13:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10282: Implement ds cpc sketch() and ds cpc estimate() functions
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16656 ) Change subject: IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions .. Patch Set 4: Hey, This review has been open for a while now. Do you have any updates on my comment/question? I see you pushed another review to include Theta sketch for cardinality estimates. With that we'll have 3 different algorithms for DataSketches for the same purpose and it makes me wonder which one is better for which purpose. -- To view, visit http://gerrit.cloudera.org:8080/16656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I731e66fbadc74bc339c973f4d9337db9b7dd715a Gerrit-Change-Number: 16656 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 09 Feb 2021 13:52:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10482: Select-star query on unrelative collection column of transactional table hits IllegalStateException
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17038 ) Change subject: IMPALA-10482: Select-star query on unrelative collection column of transactional table hits IllegalStateException .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8104/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8fc758d3c1e75c7066936d590aec8bff8d2b00b0 Gerrit-Change-Number: 17038 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 09 Feb 2021 13:05:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10161: User LDAP Search bind support
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17047 ) Change subject: IMPALA-10161: User LDAP Search bind support .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8103/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17047 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I978744ad05d9ef408328d1e4dd2d18c329f4d3b7 Gerrit-Change-Number: 17047 Gerrit-PatchSet: 1 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 09 Feb 2021 12:59:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10482: Select-star query on unrelative collection column of transactional table hits IllegalStateException
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17038 ) Change subject: IMPALA-10482: Select-star query on unrelative collection column of transactional table hits IllegalStateException .. Patch Set 2: (7 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/17038/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17038/1//COMMIT_MSG@7 PS1, Line 7: IMPAL > nit: IMPALA Oops, thanks. Done. http://gerrit.cloudera.org:8080/#/c/17038/1//COMMIT_MSG@25 PS1, Line 25: when doing star expansion > I think we also need to ignore them when resolving slot refs. These cases a Good catch. Added fix and tests in PS2. http://gerrit.cloudera.org:8080/#/c/17038/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/17038/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@130 PS1, Line 130: SelectStmt(SelectList selectList, : FromClause fromClause, : Expr wherePred > Maybe the current name is ok. We may refactor the AcidRewriter and reuse it Made this a property of TableRef, the name is 'isHidden()'. http://gerrit.cloudera.org:8080/#/c/17038/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@130 PS1, Line 130: SelectStmt(SelectList selectList, : FromClause fromClause, : Expr wherePred > I am not too familiar with the full ACID rewrites, but my guess is that the You are right, but since then Quanlong mentioned that we will get hidden table refs for other reasons as well. 
http://gerrit.cloudera.org:8080/#/c/17038/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@133 PS1, Line 133: ngPredicate, Lis > optional: I think that it would be slightly better to store this informatio Done http://gerrit.cloudera.org:8080/#/c/17038/1/testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test File testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test: http://gerrit.cloudera.org:8080/#/c/17038/1/testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test@317 PS1, Line 317: > +1 Sure. http://gerrit.cloudera.org:8080/#/c/17038/1/testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test@317 PS1, Line 317: > Can you add a bit more complex query? The idea is to include the same hidde This query actually revealed another bug, opened IMPALA-10493 for it. But if we put the JOIN condition in the WHERE clause instead of the ON clause, the query works fine. -- To view, visit http://gerrit.cloudera.org:8080/17038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8fc758d3c1e75c7066936d590aec8bff8d2b00b0 Gerrit-Change-Number: 17038 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 09 Feb 2021 12:51:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10482: Select-star query on unrelative collection column of transactional table hits IllegalStateException
Hello Quanlong Huang, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17038 to look at the new patch set (#2). Change subject: IMPALA-10482: Select-star query on unrelative collection column of transactional table hits IllegalStateException .. IMPALA-10482: Select-star query on unrelative collection column of transactional table hits IllegalStateException A SELECT * query on an unrelative collection column of a transactional ORC table hits an IllegalStateException. The AcidRewriter rewrites queries like "select item from my_complex_orc.int_array" to "select item from my_complex_orc t, t.int_array". This causes trouble in star expansion, because the original query "select * from my_complex_orc.int_array" is analyzed as "select item from my_complex_orc.int_array", but the rewritten query "select * from my_complex_orc t, t.int_array" is analyzed as "select id, item from my_complex_orc t, t.int_array". The workaround is to keep track of the automatically added table refs during query rewrite, so that when we analyze the rewritten query we can ignore these auxiliary table refs when doing star expansion.
Testing: * added e2e tests Change-Id: I8fc758d3c1e75c7066936d590aec8bff8d2b00b0 --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/TableRef.java M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-basic.test 7 files changed, 191 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/17038/2 -- To view, visit http://gerrit.cloudera.org:8080/17038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8fc758d3c1e75c7066936d590aec8bff8d2b00b0 Gerrit-Change-Number: 17038 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-10161: User LDAP Search bind support
Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/17047 ) Change subject: IMPALA-10161: User LDAP Search bind support .. Patch Set 1: Hi Csaba, Thomas, thank you for the reviews, Apologies, I had to re-submit the change under a new change id. Compared to the previous review, this change contains: 1) A factory method that creates the LDAP instance based on the configuration 2) Updated the incorrect flag name in the commit message 3) Cleaned the headers in the webserver.h 4) Ran clang from cli, looks like my IDE was acting up -- To view, visit http://gerrit.cloudera.org:8080/17047 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I978744ad05d9ef408328d1e4dd2d18c329f4d3b7 Gerrit-Change-Number: 17047 Gerrit-PatchSet: 1 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 09 Feb 2021 12:39:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10161: User LDAP Search bind support
Tamas Mate has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17047 ) Change subject: IMPALA-10161: User LDAP Search bind support .. IMPALA-10161: User LDAP Search bind support This change adds user search bind support next to simple bind that can be configured with LDAP filters. The group check was done with LDAP search earlier, this change adds the possibility to configure it with Hadoop library like options, which is the LDAP filter with optional patterns. The '{0}' will be replaced with the user name while the '{1}' pattern will be replaced with the user dn. The following new flags have been added: --ldap_search_bind_authentication: a flag to change between simple and search bind --ldap_user_search_basedn: the base dn for the LDAP subtree to search --ldap_group_search_basedn: the base dn for the LDAP subtree to search Tested: - Custom cluster tests have been added Change-Id: I978744ad05d9ef408328d1e4dd2d18c329f4d3b7 --- M be/src/rpc/authentication.cc M be/src/util/CMakeLists.txt A be/src/util/ldap-search-bind.cc A be/src/util/ldap-search-bind.h A be/src/util/ldap-simple-bind.cc A be/src/util/ldap-simple-bind.h M be/src/util/ldap-util.cc M be/src/util/ldap-util.h M be/src/util/webserver.cc M fe/src/test/java/org/apache/impala/customcluster/LdapImpalaShellTest.java A fe/src/test/java/org/apache/impala/customcluster/LdapSearchBindImpalaShellTest.java A fe/src/test/java/org/apache/impala/customcluster/LdapSimpleBindImpalaShellTest.java M fe/src/test/java/org/apache/impala/testutil/LdapUtil.java M fe/src/test/resources/users.ldif 14 files changed, 904 insertions(+), 330 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/17047/1 -- To view, visit http://gerrit.cloudera.org:8080/17047 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I978744ad05d9ef408328d1e4dd2d18c329f4d3b7 Gerrit-Change-Number: 17047
Gerrit-PatchSet: 1 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Thomas Tauber-Marshall
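The '{0}' and '{1}' filter patterns mentioned in the commit message above can be sketched as follows. This is only an illustration, not the code in ldap-search-bind.cc: render_filter() and escape_filter_value() are hypothetical helpers, the filter templates and names are made up, and the RFC 4515 escaping is an added precaution that the commit message does not describe.

```python
import re


def escape_filter_value(value: str) -> str:
    """Escape characters that are special in LDAP search filters (RFC 4515)."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def render_filter(template: str, user_name: str, user_dn: str) -> str:
    """Replace '{0}' with the user name and '{1}' with the user dn."""
    values = [escape_filter_value(user_name), escape_filter_value(user_dn)]
    # A single regex pass, so text inserted for one pattern is never
    # re-scanned and mistaken for the other pattern.
    return re.sub(r"\{([01])\}", lambda m: values[int(m.group(1))], template)


# Hypothetical user filter and group filter, for illustration only.
print(render_filter("(&(objectClass=person)(uid={0}))",
                    "alice", "uid=alice,ou=people,dc=example,dc=com"))
print(render_filter("(&(objectClass=group)(member={1}))",
                    "alice", "uid=alice,ou=people,dc=example,dc=com"))
```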
[Impala-ASF-CR] IMPALA-10161: User LDAP Search bind support
Tamas Mate has abandoned this change. ( http://gerrit.cloudera.org:8080/16717 ) Change subject: IMPALA-10161: User LDAP Search bind support .. Abandoned Abandoning this change for now. -- To view, visit http://gerrit.cloudera.org:8080/16717 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I978744ad05d9ef408328d1e4dd2d18c329f4d3b7 Gerrit-Change-Number: 16717 Gerrit-PatchSet: 12 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-10463: Implement ds theta sketch() and ds theat estimate() functions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17008 ) Change subject: IMPALA-10463: Implement ds_theta_sketch() and ds_theat_estimate() functions .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8102/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17008 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14f24c16b815eec75cf90bb92c8b8b0363dcbfbc Gerrit-Change-Number: 17008 Gerrit-PatchSet: 2 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 09 Feb 2021 10:40:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10463: Implement ds theta sketch() and ds theat estimate() functions
Fucun Chu has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/17008 ) Change subject: IMPALA-10463: Implement ds_theta_sketch() and ds_theat_estimate() functions .. IMPALA-10463: Implement ds_theta_sketch() and ds_theat_estimate() functions These functions can be used to get cardinality estimates of data using Theta algorithm from Apache DataSketches. ds_theta_sketch() receives a dataset, e.g. a column from a table, and returns a serialized Theta sketch in string format. This can be written to a table or be fed directly to ds_theat_estimate() that returns the cardinality estimate for that sketch. Similar to the HLL sketch, the primary use-case for the Theta sketch is for counting distinct values as a stream, and then merging multiple sketches together for a total distinct count. For more details about Apache DataSketches' Theta see: https://datasketches.apache.org/docs/Theta/ThetaSketchFramework.html Testing: - Added some tests running estimates for small datasets where the amount of data is small enough to get the correct results. - Ran manual tests on tpch25_parquet.lineitem to compare performance with ds_hll_*. HLL and Theta give close estimates except for string, see IMPALA-10464.
Change-Id: I14f24c16b815eec75cf90bb92c8b8b0363dcbfbc --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions-test.cc M be/src/exprs/aggregate-functions.h M be/src/exprs/datasketches-functions-ir.cc M be/src/exprs/datasketches-functions.h M common/function-registry/impala_functions.py M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/data/README A testdata/data/theta_sketches_from_hive.parquet A testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test M tests/query_test/test_datasketches.py 11 files changed, 399 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/17008/2 -- To view, visit http://gerrit.cloudera.org:8080/17008 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I14f24c16b815eec75cf90bb92c8b8b0363dcbfbc Gerrit-Change-Number: 17008 Gerrit-PatchSet: 2 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
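The idea of counting distinct values in a stream and then merging sketches can be illustrated with a toy "k minimum values" sketch. This is a simplified relative of the Theta sketch, not the Apache DataSketches implementation and not what ds_theta_sketch() does internally; the class name, the use of SHA-1, and the default k are all assumptions made for the example.

```python
import hashlib


class ToyThetaSketch:
    """Keeps the k smallest hash values seen (a toy, not DataSketches)."""

    def __init__(self, k: int = 1024):
        self.k = k
        self.hashes = set()  # retained hash values, each mapped into [0, 1)

    @staticmethod
    def _hash(value) -> float:
        digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2.0 ** 64

    def _trim(self) -> None:
        # Retain only the k smallest hashes once the sketch is over capacity.
        if len(self.hashes) > self.k:
            self.hashes = set(sorted(self.hashes)[: self.k])

    def update(self, value) -> None:
        self.hashes.add(self._hash(value))
        self._trim()

    def merge(self, other: "ToyThetaSketch") -> "ToyThetaSketch":
        merged = ToyThetaSketch(min(self.k, other.k))
        merged.hashes = self.hashes | other.hashes
        merged._trim()
        return merged

    def estimate(self) -> float:
        if len(self.hashes) < self.k:
            # Fewer than k distinct hashes retained: the count is exact.
            return float(len(self.hashes))
        theta = max(self.hashes)  # the k-th smallest hash value
        return (self.k - 1) / theta


a, b = ToyThetaSketch(), ToyThetaSketch()
for i in range(60):
    a.update(i)
    a.update(i)  # duplicates do not change the sketch
for i in range(40, 100):
    b.update(i)
print(a.merge(b).estimate())
```

With fewer distinct values than k, the sketch is exact, which mirrors the commit message's note that the tests use datasets small enough to get the correct results.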
[Impala-ASF-CR] IMPALA-10161: User LDAP Search bind support
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16717 ) Change subject: IMPALA-10161: User LDAP Search bind support .. Patch Set 12: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8100/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16717 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I978744ad05d9ef408328d1e4dd2d18c329f4d3b7 Gerrit-Change-Number: 16717 Gerrit-PatchSet: 12 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 09 Feb 2021 09:37:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5675: Support UTF-8 Varchar and Char
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16909 ) Change subject: IMPALA-5675: Support UTF-8 Varchar and Char .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8101/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16909 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I62efa3042c64d1d005a2cf4fd1d31e992543963f Gerrit-Change-Number: 16909 Gerrit-PatchSet: 11 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 09 Feb 2021 09:34:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8721: re-enable test hive impala interop
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17042 ) Change subject: IMPALA-8721: re-enable test_hive_impala_interop .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17042 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7e2beabd7082a45a0fc3b60d318cf698079768ff Gerrit-Change-Number: 17042 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 09 Feb 2021 09:16:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5675: Support UTF-8 Varchar and Char
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16909 to look at the new patch set (#11). Change subject: IMPALA-5675: Support UTF-8 Varchar and Char .. IMPALA-5675: Support UTF-8 Varchar and Char This patch adds support for UTF-8 aware varchar and char types. In UTF-8 mode, when truncating UTF-8 varchar(N) and char(N) strings, lengths will be counted by UTF-8 characters instead of bytes. So the result string will have up to N UTF-8 characters. The UTF8_MODE query option is first detected in FE when analyzing the query. An 'is_utf8' label is added in Exprs and SlotDescriptors. They are used in generating thrift objects and computing the tuple layouts. A char(N) slot will occupy 4 * N bytes if it's a UTF-8 type, because a UTF-8 character can be encoded into 1~4 bytes. The slot will store up to N UTF-8 characters. There is a gotcha that we should not add the label in Type.java, because Type instances are shared across the FE. Query compilation reuses the Type instances from the metadata. If we modify Type instances during compilation, other queries in non-UTF8 mode will be affected. However, in BE, we need the type related classes (e.g. ColumnType, TypeDesc) to carry the utf8 markers. It's impractical to check the UTF8_MODE query option everywhere it needs to be. E.g. in AnyValUtil::SetAnyVal we can't access the query options. So we add the 'is_utf8' marker in TScalarType, ColumnType, TypeDesc to conveniently distinguish char(N) and varchar(N) types in UTF-8 mode. When generating thrift objects in FE, Exprs and SlotDescriptors deliver 'is_utf8' markers to TScalarTypes. They finally land in ColumnType and TypeDesc instances. Given the correct UTF-8 mode, we just need to truncate/pad the char/varchar strings with their length counted by UTF-8 characters.
Since char(N) slots always occupy 4N bytes, when converting char(N) to other string types, we need to re-calculate the actual length corresponding to N UTF-8 characters. We can optimize this in later patches, e.g. store the UTF-8 length in the slot, or deal with UTF-8 char(N) in the same way as varchar(N), i.e. reallocate the string space and just store the pointer and length in the slot. Tests: - Add tests for reading char(N) and varchar(N) columns in UTF8_MODE. - Add truncating/padding tests - Kudu only supports Varchar currently. Add special tests for Kudu. - Add tests for writing CHAR(N)/VARCHAR(N) in UTF-8 mode. Change-Id: I62efa3042c64d1d005a2cf4fd1d31e992543963f --- M be/src/codegen/codegen-anyval.cc M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/grouping-aggregator.cc M be/src/exec/hdfs-avro-scanner-ir.cc M be/src/exec/hdfs-avro-scanner-test.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-text-table-writer.cc M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exec/orc-column-readers.cc M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/exec/parquet/parquet-column-readers.cc M be/src/exec/parquet/parquet-column-stats.inline.h M be/src/exec/parquet/parquet-common.h M be/src/exec/parquet/parquet-plain-test.cc M be/src/exec/text-converter.cc M be/src/exec/text-converter.inline.h M be/src/exprs/agg-fn-evaluator.cc M be/src/exprs/anyval-util.cc M be/src/exprs/anyval-util.h M be/src/exprs/cast-functions-ir.cc M be/src/exprs/scalar-expr-evaluator.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/slot-ref.cc M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.cc M be/src/runtime/raw-value.inline.h M be/src/runtime/tuple.cc M be/src/runtime/types.cc M be/src/runtime/types.h M be/src/service/fe-support.cc M be/src/service/hs2-util.cc M
be/src/udf/udf-internal.h M be/src/udf/udf.cc M be/src/udf/udf.h M be/src/util/CMakeLists.txt M be/src/util/bit-util.h M be/src/util/dict-encoding.h M be/src/util/string-util-test.cc M be/src/util/string-util.cc M be/src/util/string-util.h M be/src/util/tuple-row-compare.cc M common/thrift/Types.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/CastExpr.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/catalog/Type.java M fe/src/main/java/org/apache/impala/service/Frontend.java M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-query/queries/QueryTest/kudu_create.test A testdata/workloads/functional-query/queries/QueryTest/utf8-chars-casting.test A
[Impala-ASF-CR] IMPALA-10161: User LDAP Search bind support
Hello Thomas Tauber-Marshall, Attila Jeges, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16717 to look at the new patch set (#12). Change subject: IMPALA-10161: User LDAP Search bind support .. IMPALA-10161: User LDAP Search bind support This change adds user search bind support next to simple bind that can be configured with LDAP filters. The group check was done with LDAP search earlier, this change adds the possibility to configure it with Hadoop library like options, which is the LDAP filter with optional patterns. The '{0}' will be replaced with the user name while the '{1}' pattern will be replaced with the user dn. The following new flags have been added: --ldap_search_bind_authentication: a flag to change between simple and search bind --ldap_user_search_basedn: the base dn for the LDAP subtree to search --ldap_group_search_basedn: the base dn for the LDAP subtree to search Tested: - Custom cluster tests have been added Change-Id: I978744ad05d9ef408328d1e4dd2d18c329f4d3b7 --- M be/src/rpc/authentication.cc M be/src/util/CMakeLists.txt A be/src/util/ldap-search-bind.cc A be/src/util/ldap-search-bind.h A be/src/util/ldap-simple-bind.cc A be/src/util/ldap-simple-bind.h M be/src/util/ldap-util.cc M be/src/util/ldap-util.h M be/src/util/webserver.cc M fe/src/test/java/org/apache/impala/customcluster/LdapImpalaShellTest.java A fe/src/test/java/org/apache/impala/customcluster/LdapSearchBindImpalaShellTest.java A fe/src/test/java/org/apache/impala/customcluster/LdapSimpleBindImpalaShellTest.java M fe/src/test/java/org/apache/impala/testutil/LdapUtil.java M fe/src/test/resources/users.ldif 14 files changed, 904 insertions(+), 330 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/16717/12 -- To view, visit http://gerrit.cloudera.org:8080/16717 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master 
Gerrit-MessageType: newpatchset Gerrit-Change-Id: I978744ad05d9ef408328d1e4dd2d18c329f4d3b7 Gerrit-Change-Number: 16717 Gerrit-PatchSet: 12 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-10161: User LDAP Search bind support
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16717 ) Change subject: IMPALA-10161: User LDAP Search bind support .. Patch Set 11: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/8099/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/16717 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I978744ad05d9ef408328d1e4dd2d18c329f4d3b7 Gerrit-Change-Number: 16717 Gerrit-PatchSet: 11 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 09 Feb 2021 08:21:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10161: User LDAP Search bind support
Tamas Mate has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/16717 ) Change subject: IMPALA-10161: User LDAP Search bind support .. IMPALA-10161: User LDAP Search bind support This change adds user search bind support, configurable with LDAP filters, alongside simple bind. The group check was already done with LDAP search; this change makes it configurable with Hadoop-library-like options, that is, an LDAP filter with optional patterns. The '{0}' pattern will be replaced with the user name, while the '{1}' pattern will be replaced with the user DN. The following new flags have been added: --ldap_search_bind_authentication: a flag to switch between simple and search bind --ldap_user_search_basedn: the base DN of the LDAP subtree to search for users --ldap_group_search_basedn: the base DN of the LDAP subtree to search for groups Tested: - Custom cluster tests have been added Change-Id: I978744ad05d9ef408328d1e4dd2d18c329f4d3b7 --- M be/src/rpc/authentication.cc M be/src/util/CMakeLists.txt A be/src/util/ldap-search-bind.cc A be/src/util/ldap-search-bind.h A be/src/util/ldap-simple-bind.cc A be/src/util/ldap-simple-bind.h M be/src/util/ldap-util.cc M be/src/util/ldap-util.h M be/src/util/webserver.cc M fe/src/test/java/org/apache/impala/customcluster/LdapImpalaShellTest.java A fe/src/test/java/org/apache/impala/customcluster/LdapSearchBindImpalaShellTest.java A fe/src/test/java/org/apache/impala/customcluster/LdapSimpleBindImpalaShellTest.java M fe/src/test/java/org/apache/impala/testutil/LdapUtil.java M fe/src/test/resources/users.ldif A ldap_flags 15 files changed, 912 insertions(+), 330 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/16717/11 -- To view, visit http://gerrit.cloudera.org:8080/16717 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I978744ad05d9ef408328d1e4dd2d18c329f4d3b7 Gerrit-Change-Number: 16717 Gerrit-PatchSet: 11 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Thomas Tauber-Marshall