Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9344 )

Change subject: IMPALA-5518: Allocate KrpcDataStreamRecvr RowBatch tuples from 
BufferPool
......................................................................


Patch Set 4:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/9344/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9344/3//COMMIT_MSG@29
PS3, Line 29: Note that the proper reservation mechanism of the exchange node
> I was thinking through the implications of allocating this memory through t
Good point. As a comparison, suppose we allocate the memory from non-bufferpool 
sources, does it mean the chance of allocation failure will be lower when we 
are that close to capacity ? If so, will bumping it to 85% get us to parity ?


http://gerrit.cloudera.org:8080/#/c/9344/1/be/src/exec/blocking-join-node.cc
File be/src/exec/blocking-join-node.cc:

http://gerrit.cloudera.org:8080/#/c/9344/1/be/src/exec/blocking-join-node.cc@163
PS1, Line 163:     // Release resources in 'build_batch_' before closing the 
children as some of the
> I wonder if this was the root cause of IMPALA-5715.
Could be.


http://gerrit.cloudera.org:8080/#/c/9344/1/be/src/exec/exchange-node.cc
File be/src/exec/exchange-node.cc:

http://gerrit.cloudera.org:8080/#/c/9344/1/be/src/exec/exchange-node.cc@82
PS1, Line 82: ExecEnv::GetInsta
> Use ExecEnv::GetInstance() for consistency?
Done


http://gerrit.cloudera.org:8080/#/c/9344/1/be/src/exec/exchange-node.cc@138
PS1, Line 138:   
ExecEnv::GetInstance()->buffer_pool()->DeregisterClient(&recvr_buffer_pool_client_);
> This check isn't necessary, deregistering an unregistered client is a no-op
Done


http://gerrit.cloudera.org:8080/#/c/9344/1/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

http://gerrit.cloudera.org:8080/#/c/9344/1/be/src/runtime/row-batch.cc@123
PS1, Line 123:
> Why can't we allocate one buffer large enough for the tuple pointers and th
On the other hand, that also means holding onto the memory of tuple_ptrs_ 
longer than necessary as row batches are accumulated by the caller of 
ExchangeNode::GetNext(). The hope is that this tuple_ptrs_ will mostly be 
satisfied with min_buffer_size so they will be recycled quickly across row 
batches.

While I agree that we don't necessarily need to preserve tuple_ptrs_ across 
Reset() for batch generated from KrpcDataStreamRecvr, it seems to complicate 
the RowBatch logic even further. In other words, one cannot easily tell whether 
tuple_ptrs_ will remain valid after Reset() if we follow this pattern, making 
the code slightly harder to reason about. So, in light of that, I try to keep 
the logic of Reset() similar across row batches created from the 3 different 
ctors.


http://gerrit.cloudera.org:8080/#/c/9344/1/be/src/runtime/row-batch.cc@199
PS1, Line 199:   const int64_t uncompressed_size = header.uncompressed_size();
> nit: don't need the std:: prefixes in .cc generally
Done


http://gerrit.cloudera.org:8080/#/c/9344/1/be/src/runtime/row-batch.cc@484
PS1, Line 484:   // tuple_ptrs_ were allocated with malloc so can be swapped 
between batches.
> We don't need to support AcquireState() for these batches do we? I don't th
Good point. Replaced with a DCHECK.



--
To view, visit http://gerrit.cloudera.org:8080/9344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4b1a45f68b9df0d3b539511e15aff15700246f2
Gerrit-Change-Number: 9344
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Fri, 16 Feb 2018 09:06:30 +0000
Gerrit-HasComments: Yes

Reply via email to