[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7450/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 6 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 01 Sep 2021 05:38:24 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 5: Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7449/ -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 5 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 01 Sep 2021 04:10:36 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 6: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/9414/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 6 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 01 Sep 2021 04:04:57 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#6).

Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some aggregations ..

WIP IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch contains 2 parts:
1. In some cases, push the limit down to the pre-aggregation:
   a) the aggregation node has no aggregate function
   b) the aggregation node has no predicate
2. Finish aggregation when the number of unique keys in the hash table has exceeded the limit.

Queries like
  SELECT DISTINCT f FROM t LIMIT n
can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low.

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
16 files changed, 127 insertions(+), 23 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/6
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 6
Gerrit-Owner: liuyao
Gerrit-Reviewer: Impala Public Jenkins
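The second part of the patch description above (finishing aggregation once the hash table holds enough unique keys) can be sketched as follows. This is only an illustration, not code from the patch: `DistinctWithLimit` and `ConsumeUntilLimit` are hypothetical names, a `std::unordered_set` stands in for Impala's hash table, and stopping as soon as the key count reaches the limit is an assumption about the threshold.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a grouping aggregation with no aggregate
// functions (i.e. SELECT DISTINCT). Not Impala's GroupingAggregator.
class DistinctWithLimit {
 public:
  explicit DistinctWithLimit(std::size_t limit) : limit_(limit) {
    assert(limit > 0);
  }

  // Adds one input row. Returns true once enough distinct keys are held,
  // telling the caller that it can stop reading input.
  bool AddRow(const std::string& key) {
    if (!ReachedLimit()) keys_.insert(key);
    return ReachedLimit();
  }

  bool ReachedLimit() const { return keys_.size() >= limit_; }
  std::size_t NumKeys() const { return keys_.size(); }

 private:
  std::size_t limit_;
  std::unordered_set<std::string> keys_;
};

// Feeds rows to the aggregation until it reports that the limit is reached.
// Returns the number of rows that had to be read.
std::size_t ConsumeUntilLimit(const std::vector<std::string>& rows,
                              DistinctWithLimit* agg) {
  std::size_t rows_read = 0;
  for (const std::string& row : rows) {
    ++rows_read;
    if (agg->AddRow(row)) break;
  }
  return rows_read;
}
```

With a low limit, the loop exits after a handful of rows regardless of how large the input is, which is where the speedup described in the commit message would come from.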
[Impala-ASF-CR] IMPALA-10901 cleaner and faster operations with datasketches
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17818 )

Change subject: IMPALA-10901 cleaner and faster operations with datasketches ..

Patch Set 2:

The following EE test files need to be modified:

testdata/workloads/functional-query/queries/QueryTest/
  datasketches-cpc.test
  datasketches-hll.test
  datasketches-kll.test
  datasketches-theta.test

The expected "UDF ERROR: Unable to deserialize sketch" messages need the e.what() information added.

Run the above test files using the following commands:
  cd tests
  impala-py.test query_test/test_datasketches.py

Or use pre-review-test (https://jenkins.impala.io/job/pre-review-test/build?delay=0sec) to run the tests.
--
To view, visit http://gerrit.cloudera.org:8080/17818
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I306a2489dac0f4d2d475e8f9987cd58bf95474bb
Gerrit-Change-Number: 17818
Gerrit-PatchSet: 2
Gerrit-Owner: Alexander Saydakov
Gerrit-Reviewer: Fucun Chu
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Wed, 01 Sep 2021 03:05:58 +
Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7449/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 5 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 01 Sep 2021 02:40:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10901 cleaner and faster operations with datasketches
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17818 ) Change subject: IMPALA-10901 cleaner and faster operations with datasketches .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9413/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17818 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I306a2489dac0f4d2d475e8f9987cd58bf95474bb Gerrit-Change-Number: 17818 Gerrit-PatchSet: 2 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 31 Aug 2021 23:53:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10901 cleaner and faster operations with datasketches
Hello Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17818 to look at the new patch set (#2).

Change subject: IMPALA-10901 cleaner and faster operations with datasketches ..

IMPALA-10901 cleaner and faster operations with datasketches

- serialize using bytes instead of stream
- avoid unnecessary constructor during deserialization
- simplified code slightly
- added original exception message to re-thrown generic message

Change-Id: I306a2489dac0f4d2d475e8f9987cd58bf95474bb
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-common.h
M be/src/exprs/datasketches-functions-ir.cc
4 files changed, 233 insertions(+), 342 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/17818/2
--
To view, visit http://gerrit.cloudera.org:8080/17818
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I306a2489dac0f4d2d475e8f9987cd58bf95474bb
Gerrit-Change-Number: 17818
Gerrit-PatchSet: 2
Gerrit-Owner: Alexander Saydakov
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
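The last bullet of the commit message above (keeping the original exception text when reporting the generic error) might look roughly like the sketch below. Everything here is an assumption made for illustration: `DeserializeFromBytes` is a hypothetical stand-in rather than a DataSketches call, and the exact format of the combined message in the patch is not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical deserializer standing in for a sketch library call.
// It throws on an empty buffer and otherwise returns the first byte.
int DeserializeFromBytes(const std::vector<std::uint8_t>& bytes) {
  if (bytes.empty()) throw std::invalid_argument("buffer is empty");
  return static_cast<int>(bytes[0]);
}

// Wraps the call. On failure, returns the generic message with the original
// exception text appended; an empty string means success.
std::string TryDeserialize(const std::vector<std::uint8_t>& bytes, int* out) {
  assert(out != nullptr);
  try {
    *out = DeserializeFromBytes(bytes);
    return "";
  } catch (const std::exception& e) {
    // Assumed format: generic prefix, then the underlying cause.
    return std::string("Unable to deserialize sketch: ") + e.what();
  }
}
```

Appending `e.what()` changes the error text that callers see, which is consistent with the review comment that expected-error strings in the datasketches test files need updating.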
[Impala-ASF-CR] IMPALA-9976 IMPALA-10866: Add recovery mechanism to admission service and fix consistency between coord failure detection and registration
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17332 ) Change subject: IMPALA-9976 IMPALA-10866: Add recovery mechanism to admission service and fix consistency between coord failure detection and registration .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9412/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17332 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8ad3ef9b9e2496c484833d6326ce914c851e02fd Gerrit-Change-Number: 17332 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Tue, 31 Aug 2021 22:09:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9976 IMPALA-10866: Add recovery mechanism to admission service and fix consistency between coord failure detection and registration
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17332 ) Change subject: IMPALA-9976 IMPALA-10866: Add recovery mechanism to admission service and fix consistency between coord failure detection and registration .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/17332/2/tests/custom_cluster/test_admission_controller.py File tests/custom_cluster/test_admission_controller.py: http://gerrit.cloudera.org:8080/#/c/17332/2/tests/custom_cluster/test_admission_controller.py@1615 PS2, Line 1615: " flake8: E126 continuation line over-indented for hanging indent http://gerrit.cloudera.org:8080/#/c/17332/2/tests/custom_cluster/test_admission_controller.py@1628 PS2, Line 1628: " flake8: E126 continuation line over-indented for hanging indent -- To view, visit http://gerrit.cloudera.org:8080/17332 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8ad3ef9b9e2496c484833d6326ce914c851e02fd Gerrit-Change-Number: 17332 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Tue, 31 Aug 2021 21:48:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9976 IMPALA-10866: Add recovery mechanism to admission service and fix consistency between coord failure detection and registration
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/17332 )

Change subject: IMPALA-9976 IMPALA-10866: Add recovery mechanism to admission service and fix consistency between coord failure detection and registration ..

Patch Set 2:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/17332/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17332/1//COMMIT_MSG@9
PS1, Line 9:
> I wrote my own version of the commit message to see that I understand the c
Pretty much. The only thing I would add is that if a backend is marked as down by the statestore, it will also have to register (send its full admission state) again with the admissiond to be able to be serviced.

http://gerrit.cloudera.org:8080/#/c/17332/1//COMMIT_MSG@12
PS1, Line 12: - Leverages the admission heartbeat mechanism to signal the
> Nit" should this be "No RPCs are serviced from a coordinator until it has s
Done

http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/admission-control-client.cc
File be/src/scheduling/admission-control-client.cc:

http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/admission-control-client.cc@32
PS1, Line 32: "Re-submit for admission due to a possible admission service restart or network "
> Is there a reason it is only a "possible" restart?
This can also be due to a generic network error that prevents the Admit RPC from going through. Added this to the error msg. Open to suggestions for a better message text.

http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/admission-control-service.cc
File be/src/scheduling/admission-control-service.cc:

http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/admission-control-service.cc@304
PS1, Line 304: LOG(INFO) << "Received heartbeat from unrecognized coord_id=" << req->coord_id();
> Maybe WARNING is too high, we do expect to see this after a restart, maybe
Just a log line at the INFO level should suffice in that case.

http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/admission-control-service.cc@332
PS1, Line 332: bool registered_on_ac_service =
> Is it possible that the admissionstate we have just received is more up-to-
I have made both RebuildAdmissionState and CancelQueriesOnFailedCoordinators atomic operations, so there should be no inconsistency anymore. Also, if the coord is already registered it would return an OK status. In that case there should be no inconsistency either, as all RPCs after the first successful call to RebuildAdmissionState would be serviced, and ideally any subsequent calls to RebuildAdmissionState should just be previous retries.

http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/admission-control-service.cc@449
PS1, Line 449: lock_guard l(admission_state->lock);
> The size of known_coord_ids_ is interesting, maybe add it to the message.
Done

http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/admission-controller.cc@2103
PS1, Line 2103: DCHECK(
> line too long (93 > 90)
Done

http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/admission-controller.cc@2104
PS1, Line 2104: num_backends_to_release_.find(state->query_id()) == num_backends_to_release_.end());
> line too long (92 > 90)
Done

http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/remote-admission-control-client.h
File be/src/scheduling/remote-admission-control-client.h:

http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/remote-admission-control-client.h@54
PS1, Line 54: /// TODO: add info on what this does? here or in the class comments
> Yes it seems like these methods don't have descriptions
Will add these in the next update.

http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/remote-admission-control-client.cc
File be/src/scheduling/remote-admission-control-client.cc:

http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/remote-admission-control-client.cc@131
PS1, Line 131: // retry_admission can be false if AttemptAdmissionAndWait succeeded but a
> I think this was already set to true?
This might not be set to true if the RPC succeeded but the impala-server initiated a ResetPendingAdmit. This can happen in the following case:
- The RPCs (both admit and getQueryStatus) succeeded on a previous instance of the admissiond.
- The coordinator re-registered with an ongoing (old or newly restarted) instance of the admissiond.

http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/remote-admission-control-client.cc@158
PS1, Line 158: while (admit_rpc_status.IsNetworkError()
> Nit: "admissiond"
Done

http://gerrit.cloudera.org:8080/#/c/17332/1/common/protobuf/admission_control_service.proto
File common/protobuf/admission_control_service.proto:
[Impala-ASF-CR] IMPALA-9976 IMPALA-10866: Add recovery mechanism to admission service and fix consistency between coord failure detection and registration
Hello Andrew Sherman, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17332 to look at the new patch set (#2).

Change subject: IMPALA-9976 IMPALA-10866: Add recovery mechanism to admission service and fix consistency between coord failure detection and registration ..

IMPALA-9976 IMPALA-10866: Add recovery mechanism to admission service and fix consistency between coord failure detection and registration

Major changes:

IMPALA-9976:
- Leverages the admission heartbeat mechanism to signal the coordinator to send its complete admission state.
- No RPCs are serviced from a coordinator until it has sent its complete admission state. This is to prevent making admission decisions until the admission service has built its view of the cluster.
- The complete admission state consists of the states of all queries that have successfully been admitted, that is, queries that have received a valid schedule from the admission controller and have marked their admission as complete (for remote admission, this means the pending admit status has transitioned from true to false).
- This helps prevent sending incomplete/inconsistent state to the admission controller.
- Queries that have not started admission get a chance to send their request to the new service.
- Queries that are queued restart the admission process by sending the request again. This retry is now also marked in the query profile.
- Other RPCs like ReleaseBackend, ReleaseQuery and CancelQuery that don't get serviced (until the initial admission state is sent) can result in inconsistent state. This state will be rectified in the admission heartbeats.
- AdmitQuery and GetQueryStatus simply retry if they notice a network failure (assuming admissiond might be down/restarting) or receive the error message that they cannot be serviced yet (admissiond is waiting on initial state from this coordinator).

IMPALA-10866:
- Made sure that admission state removal on failure detection and admission state rebuilding on coordinator registration are atomic operations.
- Leverages the statestore's membership view to detect failure and allow coordinator registration.

Limitations:
- Rebuilding the state cannot ensure that queued queries will maintain their spot in the queue.
- Queries can be admitted before all coordinators get a chance to send their state. This can result in a brief period of over-admission. We cannot rely completely on the statestore membership update and wait for all coordinators there to send admission state, because that membership is also dynamic, which makes it difficult to decide when to assume that the admission state is complete.
- The functionalities for coordinator failure detection and registration rely completely on the statestore.

Testing:
- Added end-to-end tests

Change-Id: I8ad3ef9b9e2496c484833d6326ce914c851e02fd
---
M be/src/runtime/coordinator-backend-resource-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/scheduling/admission-control-client.cc
M be/src/scheduling/admission-control-client.h
M be/src/scheduling/admission-control-service.cc
M be/src/scheduling/admission-control-service.h
M be/src/scheduling/admission-controller-test.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/admissiond-env.cc
M be/src/scheduling/local-admission-control-client.cc
M be/src/scheduling/local-admission-control-client.h
M be/src/scheduling/remote-admission-control-client.cc
M be/src/scheduling/remote-admission-control-client.h
M be/src/scheduling/schedule-state.cc
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M common/protobuf/admission_control_service.proto
M common/thrift/generate_error_codes.py
M tests/custom_cluster/test_admission_controller.py
24 files changed, 756 insertions(+), 93 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/17332/2
--
To view, visit http://gerrit.cloudera.org:8080/17332
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8ad3ef9b9e2496c484833d6326ce914c851e02fd
Gerrit-Change-Number: 17332
Gerrit-PatchSet: 2
Gerrit-Owner: Bikramjeet Vig
Gerrit-Reviewer: Andrew Sherman
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
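The recovery flow in the commit message above (requests are refused until a coordinator has sent its full admission state, a heartbeat tells an unrecognized coordinator to send it, and the client retries) can be modelled with a toy sketch. This is a deliberately simplified, single-threaded illustration under stated assumptions: `AdmissionServiceSketch`, `RpcResult` and `AdmitWithRetry` are hypothetical names, the heartbeat is modelled as a direct call, and nothing here reflects the real RPC interfaces or locking in the patch.

```cpp
#include <cassert>
#include <set>
#include <string>

enum class RpcResult { kOk, kNotRegistered };

// Hypothetical, simplified model of the admission service side.
class AdmissionServiceSketch {
 public:
  // Heartbeat: returns true if the service wants the coordinator's full
  // admission state, i.e. it does not recognize the coordinator (for example
  // after a service restart).
  bool Heartbeat(const std::string& coord_id) const {
    return registered_.count(coord_id) == 0;
  }

  // The coordinator sends its complete admission state. Idempotent, so
  // repeated calls from retries are harmless.
  void RebuildAdmissionState(const std::string& coord_id) {
    registered_.insert(coord_id);
  }

  // Admission requests are refused until the coordinator has registered.
  RpcResult AdmitQuery(const std::string& coord_id) {
    if (registered_.count(coord_id) == 0) return RpcResult::kNotRegistered;
    ++admitted_;
    return RpcResult::kOk;
  }

  // Models a restart: all in-memory state is lost.
  void Restart() {
    registered_.clear();
    admitted_ = 0;
  }

  int admitted() const { return admitted_; }

 private:
  std::set<std::string> registered_;
  int admitted_ = 0;
};

// Coordinator-side retry: on rejection, respond to the heartbeat's request
// for full state and try again. Returns the attempt number that succeeded,
// or -1 if max_attempts was exhausted.
int AdmitWithRetry(AdmissionServiceSketch* svc, const std::string& coord_id,
                   int max_attempts) {
  assert(svc != nullptr && max_attempts > 0);
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    if (svc->AdmitQuery(coord_id) == RpcResult::kOk) return attempt;
    if (svc->Heartbeat(coord_id)) svc->RebuildAdmissionState(coord_id);
  }
  return -1;
}
```

In this model a query submitted right after a restart fails its first attempt and succeeds on the next one, once the coordinator has re-registered; the limitation about over-admission noted in the commit message is not modelled.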
[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader ..

Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc@1046
PS2, Line 1046: for (int i = 1; i < eval->root().children().size(); ++i) {
:   // ORC reader only supports pushing down predicates that constant parts are literal.
:   // We could get non-literal expr if expr rewrites are disabled.
:   if (!eval->root().GetChild(i)->IsLiteral()) return false;
:   in_list.emplace_back(GetLiteralSearchArguments(
:       eval, i, slot_desc->type(), _type));
: }
> Yes, here is the rule: https://github.com/apache/impala/blob/beb8019f5300bb
Okay. That fits my understanding of constant folding. Thanks for the URLs.

So if we have tested the presence of literals in buildOrcInListStatsPredicate(), can we assume these literals will be saved in the plan and available in BE to build the In-list predicates with (i.e., to remove line 1049)?
--
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Qifan Chen
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Tue, 31 Aug 2021 16:39:53 +
Gerrit-HasComments: Yes
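For readers following the thread, the guard under discussion (give up on push-down as soon as one IN-list item is not a literal) can be shown in isolation. The sketch below is a hedged illustration only: `ExprSketch` and `BuildInList` are hypothetical, values are plain `long`s, and none of it uses Impala's expression classes or the ORC search-argument API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical expression node: either a literal with a value, or something
// that would still need evaluation (e.g. an unfolded "1 + 2").
struct ExprSketch {
  bool is_literal;
  long value;  // Only meaningful when is_literal is true.
};

// children[0] is the column being tested; children[1..] are the IN-list
// items. Fills in_list and returns true only if every item is a literal.
// Otherwise clears in_list and returns false, meaning "do not push down".
bool BuildInList(const std::vector<ExprSketch>& children,
                 std::vector<long>* in_list) {
  assert(in_list != nullptr && !children.empty());
  in_list->clear();
  for (std::size_t i = 1; i < children.size(); ++i) {
    if (!children[i].is_literal) {
      in_list->clear();
      return false;
    }
    in_list->push_back(children[i].value);
  }
  return true;
}
```

Whether the backend check is redundant depends on whether every plan that reaches it has already had its constants folded in the frontend, which is the question raised in the comment above.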
[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15370 ) Change subject: WIP IMPALA-6636: Use async IO in ORC scanner .. Patch Set 4: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7448/ -- To view, visit http://gerrit.cloudera.org:8080/15370 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074 Gerrit-Change-Number: 15370 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 31 Aug 2021 16:15:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10884: Improve pretty-printing of fragment instance name
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17816 )

Change subject: IMPALA-10884: Improve pretty-printing of fragment instance name ..

Patch Set 3:

> Patch Set 2: Code-Review+2
>
> This looks good to me

Thank you for the review, Joe!
--
To view, visit http://gerrit.cloudera.org:8080/17816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03908ed2b29e43e133bff92c0d6480f8c5342f31
Gerrit-Change-Number: 17816
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto
Gerrit-Reviewer: Bikramjeet Vig
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Riza Suminto
Gerrit-Comment-Date: Tue, 31 Aug 2021 15:27:07 +
Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15370 ) Change subject: WIP IMPALA-6636: Use async IO in ORC scanner .. Patch Set 3: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/9411/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/15370 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074 Gerrit-Change-Number: 15370 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 31 Aug 2021 14:41:56 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15370 ) Change subject: WIP IMPALA-6636: Use async IO in ORC scanner .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7448/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/15370 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074 Gerrit-Change-Number: 15370 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 31 Aug 2021 14:30:17 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15370 ) Change subject: WIP IMPALA-6636: Use async IO in ORC scanner .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7447/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/15370 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074 Gerrit-Change-Number: 15370 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 31 Aug 2021 14:30:08 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner
Hello Quanlong Huang, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15370 to look at the new patch set (#3). Change subject: WIP IMPALA-6636: Use async IO in ORC scanner .. WIP IMPALA-6636: Use async IO in ORC scanner Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074 --- M be/src/exec/hdfs-columnar-scanner.cc M be/src/exec/hdfs-columnar-scanner.h M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/hdfs-orc-scanner.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-page-reader.cc M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h M be/src/runtime/io/disk-io-mgr.h M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java 11 files changed, 395 insertions(+), 181 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/15370/3 -- To view, visit http://gerrit.cloudera.org:8080/15370 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074 Gerrit-Change-Number: 15370 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader ..

Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc@1046
PS2, Line 1046: for (int i = 1; i < eval->root().children().size(); ++i) {
:   // ORC reader only supports pushing down predicates that constant parts are literal.
:   // We could get non-literal expr if expr rewrites are disabled.
:   if (!eval->root().GetChild(i)->IsLiteral()) return false;
:   in_list.emplace_back(GetLiteralSearchArguments(
:       eval, i, slot_desc->type(), _type));
: }
> Does the constant-folding happen in FE?
Yes, here is the rule: https://github.com/apache/impala/blob/beb8019f5300bb163424e7fdfec50b8e4b796e26/fe/src/main/java/org/apache/impala/rewrite/FoldConstantsRule.java#L41

Here is the entry point for expr rewrite: https://github.com/apache/impala/blob/beb8019f5300bb163424e7fdfec50b8e4b796e26/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java#L521
--
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Qifan Chen
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Tue, 31 Aug 2021 14:14:10 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader ..

Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc@1046
PS2, Line 1046: for (int i = 1; i < eval->root().children().size(); ++i) {
:   // ORC reader only supports pushing down predicates that constant parts are literal.
:   // We could get non-literal expr if expr rewrites are disabled.
:   if (!eval->root().GetChild(i)->IsLiteral()) return false;
:   in_list.emplace_back(GetLiteralSearchArguments(
:       eval, i, slot_desc->type(), _type));
: }
> This loop is for generating 'in_list', the vector. The check
Does the constant-folding happen in FE?

http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@603
PS2, Line 603: EQ
> Sorry, I mean EQUALS predicate. We have a check at line 595. I see. Yeah, push directly is nice. Done.
--
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Qifan Chen
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Tue, 31 Aug 2021 13:06:33 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 5: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7445/ -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 5 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 31 Aug 2021 10:51:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9495: Support struct in select list for ORC tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17638 ) Change subject: IMPALA-9495: Support struct in select list for ORC tables .. Patch Set 15: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9410/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17638 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0fbe56bdcd372b72e99c0195d87a818e7fa4bc3a Gerrit-Change-Number: 17638 Gerrit-PatchSet: 15 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 31 Aug 2021 09:21:08 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7445/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 5 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 31 Aug 2021 09:18:34 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 ) Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc@1046 PS2, Line 1046: for (int i = 1; i < eval->root().children().size(); ++i) { : // ORC reader only supports pushing down predicates that constant parts are literal. : // We could get non-literal expr if expr rewrites are disabled. : if (!eval->root().GetChild(i)->IsLiteral()) return false; : in_list.emplace_back(GetLiteralSearchArguments( : eval, i, slot_desc->type(), _type)); : } > Since we have checked in FE on literals already, looks this loop can be rem This loop is for generating 'in_list', the vector. The check inside it is also needed since we could get non-literal expr if expr rewrites are disabled (thus constant-folding is disabled). http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@603 PS2, Line 603: EQ > nit. you mean binary? Sorry, I mean EQUALS predicate. We have a check at line 595. -- To view, visit http://gerrit.cloudera.org:8080/17815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225 Gerrit-Change-Number: 17815 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 31 Aug 2021 09:15:00 + Gerrit-HasComments: Yes
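The point made in this reply (the loop both builds 'in_list' and rejects non-literal items) can be sketched as follows. This is only an illustration under stated assumptions: `Expr` and `BuildInList` are hypothetical stand-ins, not Impala's ScalarExpr API or the ORC SearchArgument builder, and literal values are modeled as plain strings.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an expression node (assumption, not Impala's ScalarExpr).
struct Expr {
  bool is_literal;
  std::string value;  // only meaningful when is_literal is true
};

// Fills 'in_list' with the IN-list items and returns true, or returns false as soon
// as a non-literal item is found (e.g. "1 + 1" left unfolded because expr rewrites
// are disabled). children[0] is assumed to be the column reference; items start at 1.
bool BuildInList(const std::vector<Expr>& children, std::vector<std::string>* in_list) {
  in_list->clear();
  for (size_t i = 1; i < children.size(); ++i) {
    if (!children[i].is_literal) return false;  // cannot push this predicate down
    in_list->push_back(children[i].value);
  }
  return true;
}
```

A caller would skip the pushdown and evaluate the predicate normally whenever `BuildInList` returns false.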
[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 2: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7444/ -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 2 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 31 Aug 2021 09:12:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9495: Support struct in select list for ORC tables
Hello Quanlong Huang, Qifan Chen, Daniel Becker, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17638 to look at the new patch set (#15). Change subject: IMPALA-9495: Support struct in select list for ORC tables .. IMPALA-9495: Support struct in select list for ORC tables This patch implements the functionality to allow structs in the select list of inline views and topmost blocks. When displaying the value of a struct it is formatted into a JSON value and returned as a string. An example of such a value: SELECT struct_col FROM some_table; '{"int_struct_member":12,"string_struct_member":"string value"}' Another example where we query a nested struct: SELECT outer_struct_col FROM some_table; '{"inner_struct":{"string_member":"string value","int_member":12}}' Note, the conversion from struct to JSON happens on the server side before sending out the value in HS2 to the client. However, HS2 is capable of handling struct values as well so in a later change we might want to add functionality to send the struct in thrift to the client so that the client can use the struct directly. -- Internal representation of a struct: When scanning a struct the row batch will hold the values of the struct's children as if they were queried one by one directly in the select list. E.g. Taking the following table: CREATE TABLE tbl (id int, s struct<a:int,b:string>) STORED AS ORC And running the following query: SELECT id, s FROM tbl; After scanning, a row in the row batch will hold the following values: (note the biggest size comes first) 1: The pointer for the string in s.b 2: The length for the string in s.b 3: The int value for s.a 4: The int value of id 5: A single null byte for all the slots: id, s, s.a, s.b The size of a struct has an effect on the order of the memory layout of a row batch. 
The struct size is calculated by summing the size of its fields and then the struct gets a place in the row batch to precede all smaller slots by size. Note, all the fields of a struct are consecutive to each other in the row batch. Inside a struct the order of the fields is also based on their size, as in the regular case for primitives. When evaluating a struct as a SlotRef a newly introduced StructVal will be used to refer to the actual values of a struct in the row batch. This StructVal holds a vector of pointers where each pointer represents a member of the struct. Following the above example the StructVal would keep two pointers, one to point to an IntVal and one to point to a StringVal. -- Changes related to tuple and slot descriptors: When providing a struct in the select list there is going to be a SlotDescriptor for the struct slot in the topmost TupleDescriptor. Additionally, another TupleDescriptor is created to hold SlotDescriptors for each of the struct's children. The struct SlotDescriptor points to the newly introduced TupleDescriptor using 'itemTupleId'. The offsets for the children of the struct are calculated from the beginning of the topmost TupleDescriptor and not from the TupleDescriptor that directly holds the struct's children. The null indicator bytes are also stored on the level of the topmost TupleDescriptor. -- Changes related to scalar expressions: A struct in the select list is translated into an expression tree where the top of this tree is a SlotRef for the struct itself and its children in the tree are SlotRefs for the members of the struct. When evaluating a struct SlotRef after the null checks the evaluation is delegated to the children SlotRefs. -- Restrictions: - Codegen support is not included in this patch. - Only ORC file format is supported by this patch. - Only HS2 client supports returning structs. Beeswax support is not implemented as it is going to be deprecated anyway. 
Currently we receive an error when trying to query a struct through Beeswax. -- Tests added: - The ORC and Parquet functional databases are extended with 2 new tables: A table with one level structs, holding different kinds of primitive types as members and another table with 2 and 3 level nested structs. - struct-in-select-list.test and nested-struct-in-select-list.test use these new tables to query structs directly or through an inline view. Change-Id: I0fbe56bdcd372b72e99c0195d87a818e7fa4bc3a --- M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/orc-column-readers.cc M be/src/exec/orc-column-readers.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/parquet-collection-column-reader.cc M be/src/exprs/anyval-util.cc M be/src/exprs/expr-value.h M be/src/exprs/scalar-expr-evaluator.cc M be/src/exprs/scalar-expr-evaluator.h M be/src/exprs/scalar-expr.cc M be/src/exprs/scalar-expr.h M be/src/exprs/scalar-expr.inline.h M be/src/exprs/slot-ref.cc M
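The size-based ordering described in the commit message above (a struct is placed by the sum of its field sizes, its fields stay consecutive, and fields are again ordered by size inside the struct) could be sketched roughly like this. `Slot`, `TotalSize` and `Layout` are hypothetical names for illustration, not Impala's descriptor classes; the byte sizes used in the usage note (4 for an int, 12 for a string slot) are assumptions, and the null indicator byte is not modeled.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical slot model (assumption, not Impala's SlotDescriptor).
struct Slot {
  std::string name;
  size_t size;                 // byte size of a primitive; ignored for structs
  std::vector<Slot> children;  // non-empty only for structs
};

// A struct's size is the sum of the sizes of its fields.
size_t TotalSize(const Slot& s) {
  if (s.children.empty()) return s.size;
  size_t total = 0;
  for (const Slot& c : s.children) total += TotalSize(c);
  return total;
}

// Appends primitive slot names to 'order', largest first. A struct's fields are
// emitted consecutively and ordered by size among themselves.
void Layout(std::vector<Slot> slots, std::vector<std::string>* order) {
  std::stable_sort(slots.begin(), slots.end(),
      [](const Slot& a, const Slot& b) { return TotalSize(a) > TotalSize(b); });
  for (const Slot& s : slots) {
    if (s.children.empty()) {
      order->push_back(s.name);
    } else {
      Layout(s.children, order);
    }
  }
}
```

With slots `id` (4 bytes) and `s` holding `s.a` (4 bytes) and `s.b` (12 bytes), this yields the order s.b, s.a, id, which matches the example in the commit message.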
[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#5). Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some aggregations .. WIP IMPALA-2581: LIMIT can be propagated down into some aggregations This patch contains 2 parts: 1. In some cases, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Queries like SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 17 files changed, 128 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/5 -- To view, visit http://gerrit.cloudera.org:8080/17821 To 
unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 5 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins
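The second part of the patch description (finishing the aggregation once enough unique keys have been seen) can be illustrated with a minimal sketch for a query shaped like SELECT DISTINCT f FROM t LIMIT n. `DistinctWithLimit` is a hypothetical helper, not code from the patch; it stops as soon as the limit is reached, which is an assumption about the exact stopping condition, and it ignores spilling and partitioning entirely.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Returns up to 'limit' distinct values from 'rows'; a negative limit means no limit.
// 'rows_consumed' reports how many input rows were read before stopping, to show
// that the rest of the input is never touched.
std::vector<std::string> DistinctWithLimit(const std::vector<std::string>& rows,
    int64_t limit, size_t* rows_consumed) {
  std::unordered_set<std::string> seen;
  std::vector<std::string> out;
  *rows_consumed = 0;
  for (const std::string& row : rows) {
    // Enough unique keys collected: later rows cannot change the result.
    if (limit >= 0 && static_cast<int64_t>(out.size()) >= limit) break;
    ++*rows_consumed;
    if (seen.insert(row).second) out.push_back(row);
  }
  return out;
}
```

This shortcut is only valid under the two conditions listed in the patch description: with an aggregate function, later rows would still change the output values, and with a predicate on the aggregation node, some of the collected keys might be filtered out afterwards.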
[Impala-ASF-CR] [WIP]IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: [WIP]IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 3: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/9409/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 3 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 31 Aug 2021 07:53:47 + Gerrit-HasComments: No
[Impala-ASF-CR] [WIP]IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: [WIP]IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 2: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/9408/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 2 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 31 Aug 2021 07:46:24 + Gerrit-HasComments: No
[Impala-ASF-CR] [WIP]IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#4). Change subject: [WIP]IMPALA-2581: LIMIT can be propagated down into some aggregations .. [WIP]IMPALA-2581: LIMIT can be propagated down into some aggregations This patch does two things: 1. In some cases, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Queries like SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 17 files changed, 128 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/4 -- To view, visit http://gerrit.cloudera.org:8080/17821 To 
unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 4 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] cleaner and faster operations wtih datasketches
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/17818 ) Change subject: cleaner and faster operations wtih datasketches .. Patch Set 1: (4 comments) Thanks Alexander for taking care of these changes! Apart from the automated format checks I left some comments, nothing serious. http://gerrit.cloudera.org:8080/#/c/17818/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17818/1//COMMIT_MSG@7 PS1, Line 7: cleaner I've created a jira ticket for this patch. Could you please add the Jira Id to the beginning of the commit msg (similarly to other patches)? https://issues.apache.org/jira/browse/IMPALA-10901 http://gerrit.cloudera.org:8080/#/c/17818/1//COMMIT_MSG@7 PS1, Line 7: wtih typo http://gerrit.cloudera.org:8080/#/c/17818/1//COMMIT_MSG@8 PS1, Line 8: Could you please write a sentence or two here to sum up the changes in this patch? http://gerrit.cloudera.org:8080/#/c/17818/1/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/17818/1/be/src/exprs/aggregate-functions-ir.cc@1624 PS1, Line 1624: StringVal SerializeCompactDsHllSketch(FunctionContext* ctx, With this simplification we ended up with a number of functions that have actually the same body, and a difference between them is a function parameter. I wonder if these could be simplified further to have a single template function that covers all of them. -- To view, visit http://gerrit.cloudera.org:8080/17818 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I306a2489dac0f4d2d475e8f9987cd58bf95474bb Gerrit-Change-Number: 17818 Gerrit-PatchSet: 1 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 31 Aug 2021 07:33:07 + Gerrit-HasComments: Yes
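The last review comment (collapsing several functions with identical bodies into one template) might look roughly like the sketch below. All names here are hypothetical: `HllSketchStub`, `KllSketchStub` and `SerializeSketch` are illustrative stand-ins and do not reflect the DataSketches classes or the signatures in aggregate-functions-ir.cc.

```cpp
#include <string>

// Hypothetical sketch types (assumptions), each exposing the same Serialize() shape.
struct HllSketchStub {
  std::string Serialize() const { return "hll"; }
};
struct KllSketchStub {
  std::string Serialize() const { return "kll-bytes"; }
};

// One template replaces a family of per-type functions whose bodies were identical.
// The length prefix stands in for whatever shared post-processing the real
// functions perform.
template <typename SketchT>
std::string SerializeSketch(const SketchT& sketch) {
  std::string bytes = sketch.Serialize();
  return std::to_string(bytes.size()) + ":" + bytes;
}
```

Each former per-type function then becomes a one-line call such as `SerializeSketch(hll)`, or disappears if callers can use the template directly.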
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#3). Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. IMPALA-2581: LIMIT can be propagated down into some aggregations This patch does two things: 1. In some cases, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Queries like SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 17 files changed, 128 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/3 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit 
http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 3 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/17821/2/be/src/exec/aggregation-node-base.cc File be/src/exec/aggregation-node-base.cc: http://gerrit.cloudera.org:8080/#/c/17821/2/be/src/exec/aggregation-node-base.cc@91 PS2, Line 91: // node has exceeded the limit we can complete the query without calculating all the data line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/17821/2/be/src/exec/grouping-aggregator.h File be/src/exec/grouping-aggregator.h: http://gerrit.cloudera.org:8080/#/c/17821/2/be/src/exec/grouping-aggregator.h@217 PS2, Line 217: void UnsetLimit() { limit_ = -1; } line has trailing whitespace -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 2 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 31 Aug 2021 07:26:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7444/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 2 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 31 Aug 2021 07:25:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
liuyao has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17821 Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. IMPALA-2581: LIMIT can be propagated down into some aggregations This patch does two things: 1. In some cases, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Queries like SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 17 files changed, 127 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/2 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF 
Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 2 Gerrit-Owner: liuyao
[Impala-ASF-CR] IMPALA-8680: Docker-based tests fail to archive the minicluster component logs
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15898 ) Change subject: IMPALA-8680: Docker-based tests fail to archive the minicluster component logs .. IMPALA-8680: Docker-based tests fail to archive the minicluster component logs Inside docker container copy logs of cluster components hdfs, yarn, kudu from folder testdata/cluster/cdh/node-/var/log/ to folder logs/cluster/ Testing: - running docker-based tests and checked that minicluster logs are preserved and archived - test if minicluster logs get copied also in case when something gets wrong during build Change-Id: I23e25d42992cec47c593dc388bcf0bcef828c05e Reviewed-on: http://gerrit.cloudera.org:8080/15898 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M docker/entrypoint.sh 1 file changed, 66 insertions(+), 21 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/15898 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I23e25d42992cec47c593dc388bcf0bcef828c05e Gerrit-Change-Number: 15898 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Garaguly Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Zoltan Garaguly
[Impala-ASF-CR] IMPALA-8680: Docker-based tests fail to archive the minicluster component logs
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15898 ) Change subject: IMPALA-8680: Docker-based tests fail to archive the minicluster component logs .. Patch Set 8: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/15898 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I23e25d42992cec47c593dc388bcf0bcef828c05e Gerrit-Change-Number: 15898 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Garaguly Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Zoltan Garaguly Gerrit-Comment-Date: Tue, 31 Aug 2021 06:58:33 + Gerrit-HasComments: No