Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/9344 )
Change subject: IMPALA-5518: Allocate KrpcDataStreamRecvr RowBatch tuples from BufferPool ...................................................................... Patch Set 4: (7 comments) http://gerrit.cloudera.org:8080/#/c/9344/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9344/3//COMMIT_MSG@29 PS3, Line 29: Note that the proper reservation mechanism of the exchange node > I was thinking through the implications of allocating this memory through t Good point. As a comparison, suppose we allocate the memory from non-bufferpool sources, does it mean the chance of allocation failure will be lower when we are that close to capacity ? If so, will bumping it to 85% get us to parity ? http://gerrit.cloudera.org:8080/#/c/9344/1/be/src/exec/blocking-join-node.cc File be/src/exec/blocking-join-node.cc: http://gerrit.cloudera.org:8080/#/c/9344/1/be/src/exec/blocking-join-node.cc@163 PS1, Line 163: // Release resources in 'build_batch_' before closing the children as some of the > I wonder if this was the root cause of IMPALA-5715. Could be. http://gerrit.cloudera.org:8080/#/c/9344/1/be/src/exec/exchange-node.cc File be/src/exec/exchange-node.cc: http://gerrit.cloudera.org:8080/#/c/9344/1/be/src/exec/exchange-node.cc@82 PS1, Line 82: ExecEnv::GetInsta > Use ExecEnv::GetInstance() for consistency? Done http://gerrit.cloudera.org:8080/#/c/9344/1/be/src/exec/exchange-node.cc@138 PS1, Line 138: ExecEnv::GetInstance()->buffer_pool()->DeregisterClient(&recvr_buffer_pool_client_); > This check isn't necessary, deregistering an unregistered client is a no-op Done http://gerrit.cloudera.org:8080/#/c/9344/1/be/src/runtime/row-batch.cc File be/src/runtime/row-batch.cc: http://gerrit.cloudera.org:8080/#/c/9344/1/be/src/runtime/row-batch.cc@123 PS1, Line 123: > Why can't we allocate one buffer large enough for the tuple pointers and th On the other hand, that also means holding onto the memory of tuple_ptrs_ longer than necessary as row batches are accumulated by the caller of ExchangeNode::GetNext(). The hope is that this tuple_ptrs_ will mostly be satisfied with min_buffer_size so they will be recycled quickly across row batches. While I agree that we don't necessarily need to preserve tuple_ptrs_ across Reset() for batch generated from KrpcDataStreamRecvr, it seems to complicate the RowBatch logic even further. In other words, one cannot easily tell whether tuple_ptrs_ will remain valid after Reset() if we follow this pattern, making the code slightly harder to reason about. So, in light of that, I try to keep the logic of Reset() similar across row batches created from the 3 different ctors. http://gerrit.cloudera.org:8080/#/c/9344/1/be/src/runtime/row-batch.cc@199 PS1, Line 199: const int64_t uncompressed_size = header.uncompressed_size(); > nit: don't need the std:: prefixes in .cc generally Done http://gerrit.cloudera.org:8080/#/c/9344/1/be/src/runtime/row-batch.cc@484 PS1, Line 484: // tuple_ptrs_ were allocated with malloc so can be swapped between batches. > We don't need to support AcquireState() for these batches do we? I don't th Good point. Replaced with a DCHECK. -- To view, visit http://gerrit.cloudera.org:8080/9344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If4b1a45f68b9df0d3b539511e15aff15700246f2 Gerrit-Change-Number: 9344 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Fri, 16 Feb 2018 09:06:30 +0000 Gerrit-HasComments: Yes