Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 )
Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue interface and BufferedPRS imp ...................................................................... Patch Set 7: (2 comments) Didn't do a full review, but wanted to point out a memory-safety issue because of the arcane memory transfer rules around RowBatches. http://gerrit.cloudera.org:8080/#/c/13883/7/be/src/exec/buffered-plan-root-sink.cc File be/src/exec/buffered-plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/13883/7/be/src/exec/buffered-plan-root-sink.cc@38 PS7, Line 38: output_batch->AcquireState(batch); AcquireState isn't safe to use here - it doesn't copy into fresh memory, just does a shallow move of the batches state. You don't know what the batch might have pointers to. There are two potential problems * If the FLUSH_RESOURCES flag is set, it's necessary for correctness to copy this batch and all preceding batches, since the memory they reference may be going away * The batches may be keeping alive a lot more memory than they're actually references, e.g. if a lot of rows got filtered out by predicates NljBuilder correctly handled both of these problems, but I don't think it's worth the complexity of trying to do that (you'd, at minimum, need to iterate through the queue and deep copy all the batches). This is all a consequence of Impala very aggressively trying to avoid copying memory. I'd recommend using DeepCopyTo() like here: https://github.com/apache/impala/blob/master/be/src/exec/nested-loop-join-builder.cc#L97 There are probably relatively few common query patterns that would exercise this since most common plans have an exchange feeding into PlanRootSink. A grouping aggregation feeding directly into PlanRootSink might be able to reproduce it (you'd have to set num_nodes=1 to get a non-distributed plan). http://gerrit.cloudera.org:8080/#/c/13883/7/be/src/exec/plan-root-sink.cc File be/src/exec/plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/13883/7/be/src/exec/plan-root-sink.cc@64 PS7, Line 64: Status PlanRootSink::Send(RuntimeState* state, RowBatch* batch) { I'd find it clearer if this was a helper function instead of the default implementation of Send() - that would make it clear that it wasn't a (possibly) complete implementation of Send(), which is what I'd generally expect for a default implementation. -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 7 Gerrit-Owner: Sahil Takiar <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Wed, 24 Jul 2019 17:30:23 +0000 Gerrit-HasComments: Yes
