Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11852 )

Change subject: IMPALA-3652: Fix resource transfer in subplans with limits
......................................................................


Patch Set 1:

(7 comments)

I think this makes sense. Had some questions about cases where we might need to 
transfer memory and aren't et.

http://gerrit.cloudera.org:8080/#/c/11852/1/be/src/exec/aggregation-node-base.cc
File be/src/exec/aggregation-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/11852/1/be/src/exec/aggregation-node-base.cc@77
PS1, Line 77:   for (auto& agg : aggs_) RETURN_IF_ERROR(agg->Reset(state));
I think there are cases where previous batches reference memory owned by the 
aggregators. E.g. if we returned half of one partition, the partition data may 
still be owned by the aggregator.


http://gerrit.cloudera.org:8080/#/c/11852/1/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

http://gerrit.cloudera.org:8080/#/c/11852/1/be/src/exec/analytic-eval-node.cc@812
PS1, Line 812: Status AnalyticEvalNode::Reset(RuntimeState* state, RowBatch* 
row_batch) {
I feel like we need some of the logic from the eos branch on line 783 here - 
otherwise could we have returned some rows that are backed by the various tuple 
pools?


http://gerrit.cloudera.org:8080/#/c/11852/1/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

http://gerrit.cloudera.org:8080/#/c/11852/1/be/src/exec/partitioned-hash-join-node.cc@200
PS1, Line 200: Status PartitionedHashJoinNode::Reset(RuntimeState* state, 
RowBatch* row_batch) {
I think it's possible that returned rows could reference memory from in-memory 
build partitions. CleanUpHashPartitions() handles this in the normal case by 
attaching that memory to the output batch.


http://gerrit.cloudera.org:8080/#/c/11852/1/be/src/exec/partitioned-hash-join-node.cc@211
PS1, Line 211:   DCHECK(output_unmatched_batch_ == nullptr || 
output_unmatched_batch_->num_rows() == 0);
I don't understand why this is true - couldn't the previous GetNext() call have 
gone down the OutputUnmatchedBuild() code path and be partway through 
outputting rows from the buidl side?


http://gerrit.cloudera.org:8080/#/c/11852/1/be/src/exec/subplan-node.cc
File be/src/exec/subplan-node.cc:

http://gerrit.cloudera.org:8080/#/c/11852/1/be/src/exec/subplan-node.cc@91
PS1, Line 91: transfered
nit: transferred


http://gerrit.cloudera.org:8080/#/c/11852/1/be/src/exec/topn-node.cc
File be/src/exec/topn-node.cc:

http://gerrit.cloudera.org:8080/#/c/11852/1/be/src/exec/topn-node.cc@219
PS1, Line 219:   // of resources in the future.
I think we'd need some very weird query shape with limits on subqueries at 
multiple levels, but isn't it possible that returned tuples could reference 
data in tuple_pool_ here? In that case they might get clobbered in 
ReclaimTuplePool().


http://gerrit.cloudera.org:8080/#/c/11852/1/testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch.test
File 
testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch.test:

http://gerrit.cloudera.org:8080/#/c/11852/1/testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch.test@195
PS1, Line 195: set batch_size=10;
Do these options take effect? If batch_size is part of the test vector that 
will take precedence..



--
To view, visit http://gerrit.cloudera.org:8080/11852
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3968a379fcbb5d30fcec304995d3e44933dbbc77
Gerrit-Change-Number: 11852
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Marshall <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Thu, 01 Nov 2018 22:36:33 +0000
Gerrit-HasComments: Yes

Reply via email to