[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some 
aggregations
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7450/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 6
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 01 Sep 2021 05:38:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some 
aggregations
..


Patch Set 5:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7449/


--
To view, visit http://gerrit.cloudera.org:8080/17821

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 5
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 01 Sep 2021 04:10:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some 
aggregations
..


Patch Set 6:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/9414/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/17821

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 6
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 01 Sep 2021 04:04:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-08-31 Thread liuyao (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#6).

Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some 
aggregations
..

WIP IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch contains 2 parts:
1. In some cases, push down the limit to the pre-aggregation:
 a) aggregation node has no aggregate function
 b) aggregation node has no predicate
2. Finish aggregation when the number of unique keys in the hash table has
exceeded the limit.

Queries like

SELECT DISTINCT f FROM t LIMIT n

can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.
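
The early-termination idea in part 2 can be illustrated with a minimal,
standalone sketch. This is not the Impala aggregator code: DistinctWithLimit
and DistinctLimitResult are hypothetical names, and the sketch stops as soon
as the number of unique keys reaches the limit, which is an assumption about
the exact stopping condition.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical result type (not an Impala class): the distinct values found
// and the number of input rows that had to be read to find them.
struct DistinctLimitResult {
  std::vector<std::string> values;  // distinct values in first-seen order
  std::size_t rows_consumed = 0;    // input rows read before stopping
};

// Hypothetical helper modelling SELECT DISTINCT f FROM t LIMIT n:
// stop reading input once `limit` unique keys are in the hash set.
DistinctLimitResult DistinctWithLimit(const std::vector<std::string>& rows,
                                      std::size_t limit) {
  DistinctLimitResult result;
  std::unordered_set<std::string> seen;
  for (const std::string& row : rows) {
    if (seen.size() >= limit) break;  // enough unique keys: finish early
    ++result.rows_consumed;
    if (seen.insert(row).second) result.values.push_back(row);
  }
  return result;
}
```

With the input {"a", "a", "b", "c", "a", "d"} and a limit of 2, the sketch
reads only the first three rows and never touches the rest, which is where
the speedup on large tables would come from.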

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
16 files changed, 127 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/6
--
To view, visit http://gerrit.cloudera.org:8080/17821

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 6
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10901 cleaner and faster operations with datasketches

2021-08-31 Thread Fucun Chu (Code Review)
Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17818 )

Change subject: IMPALA-10901 cleaner and faster operations with datasketches
..


Patch Set 2:

The following EE test files need to be modified:
testdata/workloads/functional-query/queries/QueryTest/
datasketches-cpc.test
datasketches-hll.test
datasketches-kll.test
datasketches-theta.test
The expected "UDF ERROR: Unable to deserialize sketch" message needs to include the e.what() information.

Run the test files above with the following commands:
cd tests
impala-py.test query_test/test_datasketches.py

Or use pre-review-test 
(https://jenkins.impala.io/job/pre-review-test/build?delay=0sec) to run the tests.


--
To view, visit http://gerrit.cloudera.org:8080/17818

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I306a2489dac0f4d2d475e8f9987cd58bf95474bb
Gerrit-Change-Number: 17818
Gerrit-PatchSet: 2
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 01 Sep 2021 03:05:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some 
aggregations
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7449/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17821

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 5
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 01 Sep 2021 02:40:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10901 cleaner and faster operations with datasketches

2021-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17818 )

Change subject: IMPALA-10901 cleaner and faster operations with datasketches
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9413/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17818

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I306a2489dac0f4d2d475e8f9987cd58bf95474bb
Gerrit-Change-Number: 17818
Gerrit-PatchSet: 2
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 31 Aug 2021 23:53:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10901 cleaner and faster operations with datasketches

2021-08-31 Thread Alexander Saydakov (Code Review)
Hello Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17818

to look at the new patch set (#2).

Change subject: IMPALA-10901 cleaner and faster operations with datasketches
..

IMPALA-10901 cleaner and faster operations with datasketches

- serialize using bytes instead of stream
- avoid unnecessary constructor during deserialization
- simplified code slightly
- added original exception message to re-thrown generic message
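
The last bullet, keeping the original exception text when a generic error is
reported, could look roughly like the sketch below. It does not use the
DataSketches library: DeserializeOrThrow and TryDeserialize are hypothetical
stand-ins, and the ": " separator between the generic message and e.what() is
an assumption, not the format used by the patch.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a sketch deserializer: rejects empty input.
void DeserializeOrThrow(const std::string& bytes) {
  if (bytes.empty()) throw std::invalid_argument("input is empty");
}

// Returns an empty string on success. On failure, returns the generic
// message with the original exception text appended, so the root cause
// is not lost when the error is reported.
std::string TryDeserialize(const std::string& bytes) {
  try {
    DeserializeOrThrow(bytes);
    return "";
  } catch (const std::exception& e) {
    return std::string("Unable to deserialize sketch: ") + e.what();
  }
}
```

Any test that matches the generic message exactly would need its expected
text updated once the e.what() detail is appended.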

Change-Id: I306a2489dac0f4d2d475e8f9987cd58bf95474bb
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-common.h
M be/src/exprs/datasketches-functions-ir.cc
4 files changed, 233 insertions(+), 342 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/17818/2
--
To view, visit http://gerrit.cloudera.org:8080/17818

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I306a2489dac0f4d2d475e8f9987cd58bf95474bb
Gerrit-Change-Number: 17818
Gerrit-PatchSet: 2
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9976 IMPALA-10866: Add recovery mechanism to admission service and fix consistency between coord failure detection and registration

2021-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17332 )

Change subject: IMPALA-9976 IMPALA-10866: Add recovery mechanism to admission 
service and fix consistency between coord failure detection and registration
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9412/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17332

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ad3ef9b9e2496c484833d6326ce914c851e02fd
Gerrit-Change-Number: 17332
Gerrit-PatchSet: 2
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Tue, 31 Aug 2021 22:09:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9976 IMPALA-10866: Add recovery mechanism to admission service and fix consistency between coord failure detection and registration

2021-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17332 )

Change subject: IMPALA-9976 IMPALA-10866: Add recovery mechanism to admission 
service and fix consistency between coord failure detection and registration
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17332/2/tests/custom_cluster/test_admission_controller.py
File tests/custom_cluster/test_admission_controller.py:

http://gerrit.cloudera.org:8080/#/c/17332/2/tests/custom_cluster/test_admission_controller.py@1615
PS2, Line 1615: "
flake8: E126 continuation line over-indented for hanging indent


http://gerrit.cloudera.org:8080/#/c/17332/2/tests/custom_cluster/test_admission_controller.py@1628
PS2, Line 1628: "
flake8: E126 continuation line over-indented for hanging indent



--
To view, visit http://gerrit.cloudera.org:8080/17332

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ad3ef9b9e2496c484833d6326ce914c851e02fd
Gerrit-Change-Number: 17332
Gerrit-PatchSet: 2
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Tue, 31 Aug 2021 21:48:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9976 IMPALA-10866: Add recovery mechanism to admission service and fix consistency between coord failure detection and registration

2021-08-31 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17332 )

Change subject: IMPALA-9976 IMPALA-10866: Add recovery mechanism to admission 
service and fix consistency between coord failure detection and registration
..


Patch Set 2:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/17332/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17332/1//COMMIT_MSG@9
PS1, Line 9:
> I wrote my own version of the commit message to see that I understand the c
Pretty much. The only thing I would add is that if a backend is marked as down 
by the statestore, it will also have to register (send its full admission state) 
again with the admissiond before it can be serviced.


http://gerrit.cloudera.org:8080/#/c/17332/1//COMMIT_MSG@12
PS1, Line 12: - Leverages the admission heartbeat mechanism to signal the
> Nit" should this be "No RPCs are serviced from a coordinator until it has s
Done


http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/admission-control-client.cc
File be/src/scheduling/admission-control-client.cc:

http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/admission-control-client.cc@32
PS1, Line 32: "Re-submit for admission due to a possible admission service restart or network "
> Is there a reason it is only a "possible" restart?
This can also be due to a generic network error that prevents the Admit RPC 
from going through.
Added this to the error msg. Open to suggestions for better message text.


http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/admission-control-service.cc
File be/src/scheduling/admission-control-service.cc:

http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/admission-control-service.cc@304
PS1, Line 304: LOG(INFO) << "Received heartbeat from unrecognized coord_id=" << req->coord_id();
> Maybe WARNING is too high, we do expect to see this after a restart, maybe
Just a log line at the INFO level should suffice in that case


http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/admission-control-service.cc@332
PS1, Line 332:   bool registered_on_ac_service =
> Is it possible that the admissionstate we have just received is more up-to-
I have made both RebuildAdmissionState and CancelQueriesOnFailedCoordinators 
atomic operations, so there should be no inconsistency anymore. Also, if the 
coord is already registered, it returns an OK status. In that case there 
should be no inconsistency either, as all RPCs after the first successful 
call to RebuildAdmissionState would be serviced, and ideally any subsequent 
calls to RebuildAdmissionState should just be retries of earlier calls.


http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/admission-control-service.cc@449
PS1, Line 449: lock_guard l(admission_state->lock);
> The size of known_coord_ids_ is interesting, maybe add it to the message.
Done


http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/admission-controller.cc@2103
PS1, Line 2103:   DCHECK(
> line too long (93 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/admission-controller.cc@2104
PS1, Line 2104:   num_backends_to_release_.find(state->query_id()) == num_backends_to_release_.end());
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/remote-admission-control-client.h
File be/src/scheduling/remote-admission-control-client.h:

http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/remote-admission-control-client.h@54
PS1, Line 54:   /// TODO: add info on what this does? here or in the class comments
> Yes it seems like these methods don't have descriptions
will add these in the next update


http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/remote-admission-control-client.cc
File be/src/scheduling/remote-admission-control-client.cc:

http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/remote-admission-control-client.cc@131
PS1, Line 131: // retry_admission can be false if AttemptAdmissionAndWait succeeded but a
> I think this was already set to true?
This might not be set to true if the RPC succeeded but the impala-server 
initiated a ResetPendingAdmit. This can happen in the following case:
- the RPCs (both admit and getQueryStatus) succeeded on a previous instance of 
the admissiond
- The coordinator re-registered with an ongoing (old or a new restarted) 
instance of the admissiond.


http://gerrit.cloudera.org:8080/#/c/17332/1/be/src/scheduling/remote-admission-control-client.cc@158
PS1, Line 158:   while (admit_rpc_status.IsNetworkError()
> Nit: "admissiond"
Done


http://gerrit.cloudera.org:8080/#/c/17332/1/common/protobuf/admission_control_service.proto
File common/protobuf/admission_control_service.proto:


[Impala-ASF-CR] IMPALA-9976 IMPALA-10866: Add recovery mechanism to admission service and fix consistency between coord failure detection and registration

2021-08-31 Thread Bikramjeet Vig (Code Review)
Hello Andrew Sherman, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17332

to look at the new patch set (#2).

Change subject: IMPALA-9976 IMPALA-10866: Add recovery mechanism to admission 
service and fix consistency between coord failure detection and registration
..

IMPALA-9976 IMPALA-10866: Add recovery mechanism to admission service
and fix consistency between coord failure detection and registration

Major changes:
IMPALA-9976:
- Leverages the admission heartbeat mechanism to signal the
coordinator to send its complete admission state
- No RPCs are serviced from a coordinator until it has sent its complete
admission state. This is to prevent making admission decisions till
admission service has built its view of the cluster
- The complete admission state consists of the states of all queries
that have successfully been admitted, that is, received a valid
schedule from the admission controller and have marked its admission
as complete (for remote admission it means its pending admit status
has transitioned from true to false)
- This helps prevent sending incomplete/inconsistent state to the
admission controller
- Queries that have not started admission get a chance to send their
request to the new service
- Queries that are queued restart the admission process by sending
the request again. This re-try is now also marked in the query profile
- Other RPCs like ReleaseBackend, ReleaseQuery, CancelQuery that
don't get serviced (till initial admission state is sent) can result
in inconsistent state. This state will be rectified in the admission
heartbeats
- AdmitQuery and GetQueryStatus just retry if they notice a
network failure (assuming admissiond might be down/restarting) or
receive the error message that they cannot be serviced yet
(admissiond is waiting on initial state from this coordinator)
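
The gating and retry behaviour described in the bullets above can be modelled
with a small single-threaded sketch. This is not the Impala implementation:
ToyAdmissionService, HeartbeatNeedsFullState, AdmitWithRetry and RpcResult are
hypothetical names, only RebuildAdmissionState and AdmitQuery borrow names
used in this review, and locking is left out for brevity.

```cpp
#include <string>
#include <unordered_set>

enum class RpcResult { kOk, kNotRegistered };

// Toy model of a restarted admission service: it starts with no view of
// the cluster and refuses RPCs from coordinators that have not yet sent
// their complete admission state.
class ToyAdmissionService {
 public:
  // The coordinator sends its complete admission state and is registered.
  void RebuildAdmissionState(const std::string& coord_id) {
    registered_.insert(coord_id);
  }

  // The heartbeat reply tells the coordinator whether full state is needed.
  bool HeartbeatNeedsFullState(const std::string& coord_id) const {
    return registered_.count(coord_id) == 0;
  }

  // Admission is only serviced for registered coordinators.
  RpcResult AdmitQuery(const std::string& coord_id) const {
    return registered_.count(coord_id) ? RpcResult::kOk
                                       : RpcResult::kNotRegistered;
  }

 private:
  std::unordered_set<std::string> registered_;
};

// Coordinator side: retry admission, sending full state when the heartbeat
// asks for it. Returns the attempt that succeeded, or -1 if none did.
int AdmitWithRetry(ToyAdmissionService& svc, const std::string& coord_id,
                   int max_attempts) {
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    if (svc.AdmitQuery(coord_id) == RpcResult::kOk) return attempt;
    if (svc.HeartbeatNeedsFullState(coord_id)) {
      svc.RebuildAdmissionState(coord_id);
    }
  }
  return -1;
}
```

In this model an unregistered coordinator fails its first attempt, registers
in response to the heartbeat, and is admitted on the second attempt.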

IMPALA-10866:
- Made sure that admission state removal on failure detection and
admission state rebuilding on coordinator registration are atomic
operations.
- Leverage statestore's membership view to detect failure and
allow coordinator registration.

Limitations:
- Rebuilding the state can not ensure that queued queries will
maintain their spot in the queue.
- Queries can be admitted before all coordinators get a chance to
send their state. This can result in a brief period of over-admission.
We cannot rely completely on the statestore membership update and
wait for all coordinators there to send admission state because
that membership is also dynamic which makes it difficult to decide
when to assume that the admission state is complete.
- The functionalities for coordinator failure detection and
registration rely completely on the statestore.

Testing:
- Added end to end tests

Change-Id: I8ad3ef9b9e2496c484833d6326ce914c851e02fd
---
M be/src/runtime/coordinator-backend-resource-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/scheduling/admission-control-client.cc
M be/src/scheduling/admission-control-client.h
M be/src/scheduling/admission-control-service.cc
M be/src/scheduling/admission-control-service.h
M be/src/scheduling/admission-controller-test.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/admissiond-env.cc
M be/src/scheduling/local-admission-control-client.cc
M be/src/scheduling/local-admission-control-client.h
M be/src/scheduling/remote-admission-control-client.cc
M be/src/scheduling/remote-admission-control-client.h
M be/src/scheduling/schedule-state.cc
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M common/protobuf/admission_control_service.proto
M common/thrift/generate_error_codes.py
M tests/custom_cluster/test_admission_controller.py
24 files changed, 756 insertions(+), 93 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/17332/2
--
To view, visit http://gerrit.cloudera.org:8080/17332
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8ad3ef9b9e2496c484833d6326ce914c851e02fd
Gerrit-Change-Number: 17332
Gerrit-PatchSet: 2
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 


[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

2021-08-31 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list 
predicate to ORC reader
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc@1046
PS2, Line 1046: for (int i = 1; i < eval->root().children().size(); ++i) {
  : // ORC reader only supports pushing down predicates that constant parts are literal.
  : // We could get non-literal expr if expr rewrites are disabled.
  : if (!eval->root().GetChild(i)->IsLiteral()) return false;
  : in_list.emplace_back(GetLiteralSearchArguments(
  : eval, i, slot_desc->type(), _type));
  :   }
> Yes, here is the rule: https://github.com/apache/impala/blob/beb8019f5300bb
Okay. That fits my understanding of constant folding. Thanks for the URLs.

So if we have tested the presence of literals in 
buildOrcInListStatsPredicate(), can we assume these literals will be saved in 
the plan and available in BE to build the In-list predicates with (i.e., to 
remove line 1049)?
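
For readers following the question, the guard under discussion can be reduced
to the sketch below. It is a simplified model, not the scanner code: ToyExpr
and BuildInList are hypothetical, and child 0 is assumed to stand for the
column reference, mirroring the loop that starts at index 1.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified expression node (not Impala's expression class).
struct ToyExpr {
  bool is_literal;
  std::string value;
};

// Collects the IN-list values from children[1..]. Returns false and leaves
// *in_list untouched if any of them is not a literal, which is the case the
// quoted code guards against when expr rewrites are disabled.
bool BuildInList(const std::vector<ToyExpr>& children,
                 std::vector<std::string>* in_list) {
  std::vector<std::string> values;
  for (std::size_t i = 1; i < children.size(); ++i) {
    if (!children[i].is_literal) return false;  // cannot push down
    values.push_back(children[i].value);
  }
  in_list->swap(values);
  return true;
}
```

If the frontend already guarantees that every child is a literal, the check
inside the loop never fires; whether that guarantee holds in every
configuration is exactly what the question above asks.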



--
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 31 Aug 2021 16:39:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 4: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7448/


--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 31 Aug 2021 16:15:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10884: Improve pretty-printing of fragment instance name

2021-08-31 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17816 )

Change subject: IMPALA-10884: Improve pretty-printing of fragment instance name
..


Patch Set 3:

> Patch Set 2: Code-Review+2
>
> This looks good to me

Thank you for the review, Joe!


--
To view, visit http://gerrit.cloudera.org:8080/17816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03908ed2b29e43e133bff92c0d6480f8c5342f31
Gerrit-Change-Number: 17816
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 31 Aug 2021 15:27:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 3:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/9411/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 31 Aug 2021 14:41:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7448/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 31 Aug 2021 14:30:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7447/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 31 Aug 2021 14:30:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-08-31 Thread Csaba Ringhofer (Code Review)
Hello Quanlong Huang, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15370

to look at the new patch set (#3).

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..

WIP IMPALA-6636: Use async IO in ORC scanner

Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
---
M be/src/exec/hdfs-columnar-scanner.cc
M be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-page-reader.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/io/disk-io-mgr.h
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
11 files changed, 395 insertions(+), 181 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/15370/3
--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

2021-08-31 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list 
predicate to ORC reader
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc@1046
PS2, Line 1046: for (int i = 1; i < eval->root().children().size(); ++i) {
  : // ORC reader only supports pushing down predicates that constant parts are literal.
  : // We could get non-literal expr if expr rewrites are disabled.
  : if (!eval->root().GetChild(i)->IsLiteral()) return false;
  : in_list.emplace_back(GetLiteralSearchArguments(
  : eval, i, slot_desc->type(), _type));
  :   }
> Does the constant-folding happen in FE?
Yes, here is the rule: 
https://github.com/apache/impala/blob/beb8019f5300bb163424e7fdfec50b8e4b796e26/fe/src/main/java/org/apache/impala/rewrite/FoldConstantsRule.java#L41

Here is the entry point for expr rewrite: 
https://github.com/apache/impala/blob/beb8019f5300bb163424e7fdfec50b8e4b796e26/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java#L521



--
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 31 Aug 2021 14:14:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

2021-08-31 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list 
predicate to ORC reader
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc@1046
PS2, Line 1046: for (int i = 1; i < eval->root().children().size(); ++i) {
              :   // ORC reader only supports pushing down predicates that constant parts are literal.
              :   // We could get non-literal expr if expr rewrites are disabled.
              :   if (!eval->root().GetChild(i)->IsLiteral()) return false;
              :   in_list.emplace_back(GetLiteralSearchArguments(
              :       eval, i, slot_desc->type(), _type));
              : }
> This loop is for generating 'in_list', the vector. The check
Does the constant-folding happen in FE?


http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@603
PS2, Line 603: EQ
> Sorry, I mean EQUALS predicate. We have a check at line 595.
I see. Yeah, push directly is nice.

Done.



--
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 31 Aug 2021 13:06:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some 
aggregations
..


Patch Set 5: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7445/


--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 5
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 31 Aug 2021 10:51:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9495: Support struct in select list for ORC tables

2021-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17638 )

Change subject: IMPALA-9495: Support struct in select list for ORC tables
..


Patch Set 15:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9410/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17638
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0fbe56bdcd372b72e99c0195d87a818e7fa4bc3a
Gerrit-Change-Number: 17638
Gerrit-PatchSet: 15
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 31 Aug 2021 09:21:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some 
aggregations
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7445/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 5
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 31 Aug 2021 09:18:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

2021-08-31 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list 
predicate to ORC reader
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc@1046
PS2, Line 1046: for (int i = 1; i < eval->root().children().size(); ++i) {
              :   // ORC reader only supports pushing down predicates that constant parts are literal.
              :   // We could get non-literal expr if expr rewrites are disabled.
              :   if (!eval->root().GetChild(i)->IsLiteral()) return false;
              :   in_list.emplace_back(GetLiteralSearchArguments(
              :       eval, i, slot_desc->type(), _type));
              : }
> Since we have checked in FE on literals already, looks this loop can be rem
This loop is for generating 'in_list', the vector. The check 
inside it is also needed since we could get non-literal expr if expr rewrites 
are disabled (thus constant-folding is disabled).
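
The pattern described in this reply can be sketched as follows. This is only an illustration, not the code in hdfs-orc-scanner.cc: Expr, BuildInList and the int payload are hypothetical stand-ins, and treating child 0 as the column being tested is an assumption inferred from the quoted loop starting at i = 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an expression node: either a literal carrying a
// value, or something that still needs evaluation (e.g. "1 + 1" when
// constant folding did not run).
struct Expr {
  bool is_literal;
  int value;
};

// Builds the IN-list from children[1..]; children[0] is assumed to be the
// column reference. Returns false (and leaves *in_list untouched) as soon as
// a non-literal child is found, so the caller can skip the push-down.
bool BuildInList(const std::vector<Expr>& children, std::vector<int>* in_list) {
  assert(in_list != nullptr);
  std::vector<int> result;
  for (std::size_t i = 1; i < children.size(); ++i) {
    if (!children[i].is_literal) return false;
    result.push_back(children[i].value);
  }
  in_list->swap(result);
  return true;
}
```

The check and the list construction share one loop, which is why the loop cannot simply be dropped even when the frontend usually guarantees literals.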


http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@603
PS2, Line 603: EQ
> nit. you mean binary?
Sorry, I mean EQUALS predicate. We have a check at line 595.



--
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 31 Aug 2021 09:15:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some 
aggregations
..


Patch Set 2: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7444/


--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 2
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 31 Aug 2021 09:12:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9495: Support struct in select list for ORC tables

2021-08-31 Thread Gabor Kaszab (Code Review)
Hello Quanlong Huang, Qifan Chen, Daniel Becker, Csaba Ringhofer, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17638

to look at the new patch set (#15).

Change subject: IMPALA-9495: Support struct in select list for ORC tables
..

IMPALA-9495: Support struct in select list for ORC tables

This patch implements the functionality to allow structs in the select
list of inline views, topmost blocks. When displaying the value of a
struct it is formatted into a JSON value and returned as a string. An
example of such a value:

SELECT struct_col FROM some_table;
'{"int_struct_member":12,"string_struct_member":"string value"}'

Another example where we query a nested struct:
SELECT outer_struct_col FROM some_table;
'{"inner_struct":{"string_member":"string value","int_member":12}}'
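
As a rough illustration of the formatting shown in the two examples above, here is a minimal sketch. It is not the patch's implementation: Value, Int, Str, Quote and ToJson are hypothetical names, only int and string members are modelled, and the escaping covers just quotes and backslashes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical value model: an int, a string, or a struct with named members.
struct Value {
  enum Kind { kInt, kString, kStruct } kind;
  long long int_val;
  std::string str_val;
  std::vector<std::string> names;  // member names, used when kind == kStruct
  std::vector<Value> children;     // member values, parallel to 'names'
};

Value Int(long long v) { return Value{Value::kInt, v, "", {}, {}}; }
Value Str(const std::string& s) { return Value{Value::kString, 0, s, {}, {}}; }

// Wraps 's' in double quotes, escaping embedded quotes and backslashes only.
std::string Quote(const std::string& s) {
  std::string out = "\"";
  for (char c : s) {
    if (c == '"' || c == '\\') out += '\\';
    out += c;
  }
  out += '"';
  return out;
}

// Recursively renders a value; structs become JSON objects in member order.
std::string ToJson(const Value& v) {
  if (v.kind == Value::kInt) return std::to_string(v.int_val);
  if (v.kind == Value::kString) return Quote(v.str_val);
  assert(v.names.size() == v.children.size());
  std::string out = "{";
  for (std::size_t i = 0; i < v.names.size(); ++i) {
    if (i > 0) out += ',';
    out += Quote(v.names[i]) + ":" + ToJson(v.children[i]);
  }
  out += '}';
  return out;
}
```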

Note, the conversion from struct to JSON happens on the server side
before sending out the value in HS2 to the client. However, HS2 is
capable of handling struct values as well, so in a later change we might
want to add functionality to send the struct in thrift to the client
so that the client can use the struct directly.

-- Internal representation of a struct:
When scanning a struct the rowbatch will hold the values of the
struct's children as if they were queried one by one directly in the
select list.

E.g. Taking the following table:
CREATE TABLE tbl (id int, s struct<a:int,b:string>) STORED AS ORC

And running the following query:
SELECT id, s FROM tbl;

After scanning, a row in the row batch will hold the following values:
(note that the biggest size comes first)
 1: The pointer for the string in s.b
 2: The length for the string in s.b
 3: The int value for s.a
 4: The int value of id
 5: A single null byte for all the slots: id, s, s.a, s.b

The size of a struct has an effect on the order of the memory layout of
a row batch. The struct size is calculated by summing the size of its
fields and then the struct gets a place in the row batch to precede all
smaller slots by size. Note, all the fields of a struct are consecutive
to each other in the row batch. Inside a struct the order of the fields
is also based on their size, as in the regular case for primitives.
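
The ordering rule in this paragraph can be sketched as below. This is a simplified model rather than the planner's code: Slot and LayoutOrder are hypothetical names, and the byte sizes used in the usage note (4 for an int, 12 for a string slot) are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical slot model: a primitive with a fixed byte size, or a struct
// whose size is the sum of its children's sizes.
struct Slot {
  std::string name;
  int primitive_size;          // used only when 'children' is empty
  std::vector<Slot> children;  // non-empty for structs

  int Size() const {
    if (children.empty()) return primitive_size;
    int total = 0;
    for (const Slot& c : children) total += c.Size();
    return total;
  }
};

// Appends leaf slot paths to *out in layout order: larger slots first, with a
// struct's members kept contiguous and sorted by size among themselves.
void LayoutOrder(std::vector<Slot> slots, const std::string& prefix,
                 std::vector<std::string>* out) {
  assert(out != nullptr);
  std::stable_sort(slots.begin(), slots.end(),
                   [](const Slot& a, const Slot& b) { return a.Size() > b.Size(); });
  for (const Slot& s : slots) {
    const std::string path = prefix.empty() ? s.name : prefix + "." + s.name;
    if (s.children.empty()) {
      out->push_back(path);
    } else {
      LayoutOrder(s.children, path, out);
    }
  }
}
```

With id as a 4-byte int and s holding a (4 bytes) and b (12 bytes), the struct totals 16 bytes and is placed first, giving the order s.b, s.a, id, which matches the listing in the example above.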

When evaluating a struct as a SlotRef a newly introduced StructVal will
be used to refer to the actual values of a struct in the row batch.
This StructVal holds a vector of pointers where each pointer represents
a member of the struct. Following the above example the StructVal would
keep two pointers, one to point to an IntVal and one to point to a
StringVal.

-- Changes related to tuple and slot descriptors:
When providing a struct in the select list there is going to be a
SlotDescriptor for the struct slot in the topmost TupleDescriptor.
Additionally, another TupleDescriptor is created to hold SlotDescriptors
for each of the struct's children. The struct SlotDescriptor points to
the newly introduced TupleDescriptor using 'itemTupleId'.
The offsets for the children of the struct are calculated from the
beginning of the topmost TupleDescriptor and not from the
TupleDescriptor that directly holds the struct's children. The null
indicator bytes are also stored on the level of the topmost
TupleDescriptor.

-- Changes related to scalar expressions:
A struct in the select list is translated into an expression tree where
the top of this tree is a SlotRef for the struct itself and its
children in the tree are SlotRefs for the members of the struct. When
evaluating a struct SlotRef after the null checks the evaluation is
delegated to the children SlotRefs.

-- Restrictions:
  - Codegen support is not included in this patch.
  - Only ORC file format is supported by this patch.
  - Only HS2 client supports returning structs. Beeswax support is not
implemented as it is going to be deprecated anyway. Currently we
receive an error when trying to query a struct through Beeswax.

-- Tests added:
  - The ORC and Parquet functional database is extended with 2 new
tables: A table with one level structs, holding different kinds of
primitive types as members and another table with 2 and 3 level
nested structs.
  - struct-in-select-list.test and nested-struct-in-select-list.test
use these new tables to query structs directly or through an
inline view.

Change-Id: I0fbe56bdcd372b72e99c0195d87a818e7fa4bc3a
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/parquet-collection-column-reader.cc
M be/src/exprs/anyval-util.cc
M be/src/exprs/expr-value.h
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/scalar-expr.inline.h
M be/src/exprs/slot-ref.cc
M 

[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-08-31 Thread liuyao (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#5).

Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some 
aggregations
..

WIP IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch contains 2 parts:
1. In some cases, push down the limit to the pre-aggregation:
 a) the aggregation node has no aggregate function
 b) the aggregation node has no predicate
2. Finish the aggregation when the number of unique keys in the hash
table has exceeded the limit.

Queries like

SELECT DISTINCT f FROM t LIMIT n

can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.
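
The early-exit idea behind part 2 can be illustrated with a small sketch. It is not the aggregator code from this change: DistinctWithLimit and its parameters are hypothetical, the keys are plain ints, and treating a negative limit as "no limit" is an assumption. The sketch stops once the number of unique keys reaches the limit. Stopping early is only safe under conditions a) and b): an aggregate function such as COUNT would need every row of each group, and a predicate could later discard groups that were already counted toward the limit.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Returns up to 'limit' distinct values from 'input' and reports how many
// input rows were consumed. Stops reading as soon as enough unique keys have
// been seen. A negative 'limit' is treated as "no limit" (an assumption).
std::vector<int> DistinctWithLimit(const std::vector<int>& input, long limit,
                                   std::size_t* rows_consumed) {
  assert(rows_consumed != nullptr);
  std::unordered_set<int> seen;
  std::vector<int> result;
  *rows_consumed = 0;
  if (limit == 0) return result;
  for (int v : input) {
    ++*rows_consumed;
    if (seen.insert(v).second) {
      result.push_back(v);
      if (limit >= 0 && result.size() >= static_cast<std::size_t>(limit)) break;
    }
  }
  return result;
}
```

On the input 1, 1, 2, 2, 3, 3, 4 with a limit of 2, only the first three rows are read; without a limit all seven are read.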

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
17 files changed, 128 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/5
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 5
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] [WIP]IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: [WIP]IMPALA-2581: LIMIT can be propagated down into some 
aggregations
..


Patch Set 3:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/9409/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 3
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 31 Aug 2021 07:53:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] [WIP]IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: [WIP]IMPALA-2581: LIMIT can be propagated down into some 
aggregations
..


Patch Set 2:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/9408/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 2
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 31 Aug 2021 07:46:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] [WIP]IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-08-31 Thread liuyao (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#4).

Change subject: [WIP]IMPALA-2581: LIMIT can be propagated down into some 
aggregations
..

[WIP]IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch does two things:
1. In some cases, push down the limit to the pre-aggregation:
 a) the aggregation node has no aggregate function
 b) the aggregation node has no predicate
2. Finish the aggregation when the number of unique keys in the hash
table has exceeded the limit.

Queries like

SELECT DISTINCT f FROM t LIMIT n

can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
17 files changed, 128 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/4
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 4
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] cleaner and faster operations wtih datasketches

2021-08-31 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17818 )

Change subject: cleaner and faster operations wtih datasketches
..


Patch Set 1:

(4 comments)

Thanks Alexander for taking care of these changes!
Apart from the automated format checks I left some comments, nothing serious.

http://gerrit.cloudera.org:8080/#/c/17818/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17818/1//COMMIT_MSG@7
PS1, Line 7: cleaner
I've created a jira ticket for this patch. Could you please add the Jira Id to 
the beginning of the commit msg (similarly to other patches)?
https://issues.apache.org/jira/browse/IMPALA-10901


http://gerrit.cloudera.org:8080/#/c/17818/1//COMMIT_MSG@7
PS1, Line 7: wtih
typo


http://gerrit.cloudera.org:8080/#/c/17818/1//COMMIT_MSG@8
PS1, Line 8:
Could you please write a sentence or two here to sum up the changes in this 
patch?


http://gerrit.cloudera.org:8080/#/c/17818/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17818/1/be/src/exprs/aggregate-functions-ir.cc@1624
PS1, Line 1624: StringVal SerializeCompactDsHllSketch(FunctionContext* ctx,
With this simplification we ended up with a number of functions that 
actually have the same body and differ only in a function parameter. 
I wonder if these could be simplified further to have a single template 
function that covers all of them.
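
A sketch of the kind of consolidation suggested here, under stated assumptions: FakeHllSketch, FakeCpcSketch, ToBytes and SerializeSketch are hypothetical stand-ins and do not reflect the DataSketches or Impala APIs. The only point is that functions with identical bodies and different parameter types can collapse into one template.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch types that expose the same serialization method.
struct FakeHllSketch {
  std::vector<unsigned char> ToBytes() const { return {'H', 'L', 'L'}; }
};
struct FakeCpcSketch {
  std::vector<unsigned char> ToBytes() const { return {'C', 'P', 'C'}; }
};

// One template replaces a per-type family of functions with the same body.
// Any type with a compatible ToBytes() member can be passed in.
template <typename Sketch>
std::string SerializeSketch(const Sketch& sketch) {
  const std::vector<unsigned char> bytes = sketch.ToBytes();
  assert(!bytes.empty());  // invariant of the stand-in types only
  return std::string(bytes.begin(), bytes.end());
}
```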



--
To view, visit http://gerrit.cloudera.org:8080/17818
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I306a2489dac0f4d2d475e8f9987cd58bf95474bb
Gerrit-Change-Number: 17818
Gerrit-PatchSet: 1
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 31 Aug 2021 07:33:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-08-31 Thread liuyao (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#3).

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..

IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch does two things:
1. In some cases, push down the limit to the pre-aggregation:
 a) the aggregation node has no aggregate function
 b) the aggregation node has no predicate
2. Finish the aggregation when the number of unique keys in the hash
table has exceeded the limit.

Queries like

SELECT DISTINCT f FROM t LIMIT n

can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
17 files changed, 128 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/3
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 3
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17821/2/be/src/exec/aggregation-node-base.cc
File be/src/exec/aggregation-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/17821/2/be/src/exec/aggregation-node-base.cc@91
PS2, Line 91:   // node has exceeded the limit we can complete the query without calculating all the data
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17821/2/be/src/exec/grouping-aggregator.h
File be/src/exec/grouping-aggregator.h:

http://gerrit.cloudera.org:8080/#/c/17821/2/be/src/exec/grouping-aggregator.h@217
PS2, Line 217:   void UnsetLimit() { limit_ = -1; }
line has trailing whitespace



--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 2
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 31 Aug 2021 07:26:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7444/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 2
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 31 Aug 2021 07:25:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-08-31 Thread liuyao (Code Review)
liuyao has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17821 )


Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..

IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch does two things:
1. In some cases, push down the limit to the pre-aggregation:
 a) the aggregation node has no aggregate function
 b) the aggregation node has no predicate
2. Finish the aggregation when the number of unique keys in the hash
table has exceeded the limit.

Queries like

SELECT DISTINCT f FROM t LIMIT n

can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
17 files changed, 127 insertions(+), 24 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/2
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 2
Gerrit-Owner: liuyao 


[Impala-ASF-CR] IMPALA-8680: Docker-based tests fail to archive the minicluster component logs

2021-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/15898 )

Change subject: IMPALA-8680: Docker-based tests fail to archive the minicluster 
component logs
..

IMPALA-8680: Docker-based tests fail to archive the minicluster component logs

Inside the docker container, copy the logs of the cluster components (hdfs,
yarn, kudu) from folder testdata/cluster/cdh/node-/var/log/
to folder logs/cluster/

Testing:
 - ran docker-based tests and checked that minicluster logs are preserved 
and archived
 - tested that minicluster logs also get copied when something goes wrong 
during the build

Change-Id: I23e25d42992cec47c593dc388bcf0bcef828c05e
Reviewed-on: http://gerrit.cloudera.org:8080/15898
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M docker/entrypoint.sh
1 file changed, 66 insertions(+), 21 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/15898
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I23e25d42992cec47c593dc388bcf0bcef828c05e
Gerrit-Change-Number: 15898
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Garaguly 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Zoltan Garaguly 


[Impala-ASF-CR] IMPALA-8680: Docker-based tests fail to archive the minicluster component logs

2021-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15898 )

Change subject: IMPALA-8680: Docker-based tests fail to archive the minicluster 
component logs
..


Patch Set 8: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/15898
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I23e25d42992cec47c593dc388bcf0bcef828c05e
Gerrit-Change-Number: 15898
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Garaguly 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Zoltan Garaguly 
Gerrit-Comment-Date: Tue, 31 Aug 2021 06:58:33 +
Gerrit-HasComments: No