Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8196 )
Change subject: IMPALA-4236: Codegen CopyRows() for select nodes ...................................................................... Patch Set 6: (2 comments) Your code looks good but I'd really like to simplify this code further so that it's easier to understand in future. http://gerrit.cloudera.org:8080/#/c/8196/6/be/src/exec/select-node.cc File be/src/exec/select-node.cc: http://gerrit.cloudera.org:8080/#/c/8196/6/be/src/exec/select-node.cc@91 PS6, Line 91: if (ReachedLimit() || (child_row_idx_ == child_row_batch_->num_rows() && child_eos_)) { This is still a bit wonky (not your change but we should try to leave this in a good state) - it's not clear when this will actually be taken. The invariant around GetNext() is that callers shouldn't call GetNext() after 'eos' is true. Given that invariant, I don't think it's possible that this is taken. * ReachedLimit() should never be true here. limit_ == 0 should be caught by the frontend and converted into an EmptySetNode. If limit_ > 0, then we should have set *eos after copying the rows below. * If we hit the end of the input below on line 120, we should already have set *eos. http://gerrit.cloudera.org:8080/#/c/8196/6/be/src/exec/select-node.cc@107 PS6, Line 107: child_row_batch_->TransferResourceOwnership(row_batch); Sorry for the additional churn but I think the resource transfer is overly complicated. This actually will often attach the resources to the batch after the last batch that references them. Instead it should attach resources to the batch as soon as the last row from the input batch is copied to the output. I'm thinking removing the transfer and AtCapacity() check here and changing the logic below to something like: if (*eos || child_row_idx_ == child_row_batch_->num_rows()) { child_row_batch_->TransferResourceOwnership(row_batch); child_row_batch_->Reset(); } if (*eos || row_batch->AtCapacity()) return Status::OK(); Or maybe it would then be clearer as a do/while() loop. This would require some further changes, since it would reset child_row_batch_->num_rows() to 0 - but maybe child_row_batch_->num_rows() == 0 should actually just signal that the batch has been consumed. -- To view, visit http://gerrit.cloudera.org:8080/8196 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e Gerrit-Change-Number: 8196 Gerrit-PatchSet: 6 Gerrit-Owner: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Mon, 09 Oct 2017 18:56:37 +0000 Gerrit-HasComments: Yes
