Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue interface and 
BufferedPRS imp
......................................................................


Patch Set 7:

(2 comments)

Didn't do a full review, but wanted to point out a memory-safety issue because 
of the arcane memory transfer rules around RowBatches.

http://gerrit.cloudera.org:8080/#/c/13883/7/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13883/7/be/src/exec/buffered-plan-root-sink.cc@38
PS7, Line 38:   output_batch->AcquireState(batch);
AcquireState isn't safe to use here - it doesn't copy into fresh memory, just 
does a shallow move of the batches state. You don't know what the batch might 
have pointers to. There are two potential problems
* If the FLUSH_RESOURCES flag is set, it's necessary for correctness to copy 
this batch and all preceding batches, since the memory they reference may be 
going away
* The batches may be keeping alive a lot more memory than they're actually 
references, e.g. if a lot of rows got filtered out by predicates

NljBuilder correctly handled both of these problems, but I don't think it's 
worth the complexity of trying to do that (you'd, at minimum, need to iterate 
through the queue and deep copy all the batches).

This is all a consequence of Impala very aggressively trying to avoid copying 
memory.

I'd recommend using DeepCopyTo() like here: 
https://github.com/apache/impala/blob/master/be/src/exec/nested-loop-join-builder.cc#L97

There are probably relatively few common query patterns that would exercise 
this since most common plans have an exchange feeding into PlanRootSink. A 
grouping aggregation feeding directly into PlanRootSink might be able to 
reproduce it (you'd have to set num_nodes=1 to get a non-distributed plan).


http://gerrit.cloudera.org:8080/#/c/13883/7/be/src/exec/plan-root-sink.cc
File be/src/exec/plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13883/7/be/src/exec/plan-root-sink.cc@64
PS7, Line 64: Status PlanRootSink::Send(RuntimeState* state, RowBatch* batch) {
I'd find it clearer if this was a helper function instead of the default 
implementation of Send() - that would make it clear that it wasn't a (possibly) 
complete implementation of Send(), which is what I'd generally expect for a 
default implementation.



--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 7
Gerrit-Owner: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Wed, 24 Jul 2019 17:30:23 +0000
Gerrit-HasComments: Yes

Reply via email to