Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4023: don't attach buffered tuple streams to batches
......................................................................


Patch Set 2:

I agree that it could make a difference but I don't think we want to preserve 
the old behaviour. I think we should be consistent between Blocks attached by 
the sorter and Blocks attached by Join/Agg nodes. I looked at the history and 
the motivation seems to be in the join node:     

    // We want to return as soon as we have attached a tuple stream to the 
out_batch
    // (before preparing a new partition). The attached tuple stream will be 
recycled
    // by the caller, freeing up more memory when we prepare the next partition.
    if (out_batch->AtCapacity()) break;

So it seems to be a heuristic to break out of the loop in the join node. I 
think if we want to preserve that behaviour we should change the PHJ node to 
explicitly check whether there were blocks attached to the batch, since that 
seems to be the actual intention. It's unclear if it's worth adding that 
special-case to avoid accumulating a littl extra memory before we flush out the 
batch.

It's somewhat broken anyway since there's nothing to prevent the other exec 
node from holding onto the memory (e.g. if it's a NLJ node).

-- 
To view, visit http://gerrit.cloudera.org:8080/4448
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6471422f86ce71e4c6ab277a276000051bc2e8ff
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: No

Reply via email to