Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/9527 )
Change subject: IMPALA-6609: Fix ownership of class members in KrpcDataStreamRecvr ...................................................................... Patch Set 1: (9 comments) http://gerrit.cloudera.org:8080/#/c/9527/1/be/src/runtime/krpc-data-stream-recvr.h File be/src/runtime/krpc-data-stream-recvr.h: http://gerrit.cloudera.org:8080/#/c/9527/1/be/src/runtime/krpc-data-stream-recvr.h@163 PS1, Line 163: /// Row schema. > Add "Not owned." Done http://gerrit.cloudera.org:8080/#/c/9527/1/be/src/runtime/krpc-data-stream-recvr.h@182 PS1, Line 182: client_ > Not your change, but I think we should we rename this to buffer_pool_client Renamed to buffer_pool_client_. http://gerrit.cloudera.org:8080/#/c/9527/1/be/src/runtime/krpc-data-stream-recvr.h@201 PS1, Line 201: own most of the : /// counters below > if the only exception is the one you marked "Not owned", how about saying " Done http://gerrit.cloudera.org:8080/#/c/9527/1/be/src/runtime/krpc-data-stream-recvr.h@201 PS1, Line 201: own most of the : /// counters below > Add "These profiles are guaranteed to live as long as their parent 'profile Done http://gerrit.cloudera.org:8080/#/c/9527/1/be/src/runtime/krpc-data-stream-recvr.cc File be/src/runtime/krpc-data-stream-recvr.cc: http://gerrit.cloudera.org:8080/#/c/9527/1/be/src/runtime/krpc-data-stream-recvr.cc@a636 PS1, Line 636: > Maybe I'm missing something, but I'm not able to understand why this was re TakeOverEarlySender() may access 'mgr_'. I was thinking that there may be a race between Close() and TakeOverEarlySender(). However, the race should be impossible given both TakeOverEarlySenders() and Close() are supposed to be called from the fragment execution thread context. Reverted this line in PS2. Added a DCHECK() to validate the assumption. http://gerrit.cloudera.org:8080/#/c/9527/1/be/src/runtime/krpc-data-stream-recvr.cc@411 PS1, Line 411: recvr_->deferred_rpc_tracker() > We access the deferred_rpc_tracker() here too. Should we check for cancella Another subtle point which warrants a comment. See line 399 for the DCHECK. The contract is that the queue 'deferred_rpcs_' is drained during cancellation and no thread should add to 'deferred_rpcs_' after Cancel() is called. So, line 398 implicitly handles the case of cancellation. Note the DCHECK() at line 399. Comments needed there. http://gerrit.cloudera.org:8080/#/c/9527/1/be/src/runtime/krpc-data-stream-recvr.cc@451 PS1, Line 451: Only update the 'deferred_rpc_tracker_' if the queue is still open. > I think a "why" rather than "what" would be more useful here. Something lik It's true that a queue may be cancelled but not yet closed if a fragment instance is cancelled but not yet closed. We want to avoid consuming from 'deferred_rpc_trackers_' as all deferred RPCs are drained in Cancel() so deferred_rpc_trackers_'s consumption should go to zero after Cancel() has been called on all sender queues. There is a window in KrpcDataStreamMgr::CreateRecvr() between FindRecvr() returns and when TakeOverEarlySender is called. In that window, the fragment instance can be cancelled by the coordinator. That's how a thread may end up calling TakeOverEarlySender() after it has been cancelled. This receiver shouldn't be closed at this point as this function is called in the context of the fragment execution thread and KrpcDataStreamReceiver::Close() is also supposed to be called from fragment execution thread only. Added some explanation and DCHECK for it. http://gerrit.cloudera.org:8080/#/c/9527/1/be/src/runtime/krpc-data-stream-recvr.cc@568 PS1, Line 568: Note that the parent profile doesn't hold any ownership to these : // profiles. > I think this will be hard to decifer, is there a more direct thing we can s This is trying to point out that 'profile_' won't actually free these two profiles so they are safe to access even after 'profile_' is deleted. Added the comments you suggested. http://gerrit.cloudera.org:8080/#/c/9527/1/be/src/runtime/krpc-data-stream-recvr.cc@644 PS1, Line 644: deferred_rpc_tracker_->Close(); > This seems a little brittle, although it would work for us. In the header, Yes, I believe a more clear explanation about the contract is warranted. In essence, it's assuming that all threads which try to consume from / insert into sender queues will hold the sender queue's lock and check for 'is_cancelled_' flag which acts as a surrogate on whether the queue is still open for business. If 'is_cancelled_' is true, it may mean Close() has been called on it already. So past line 642 above, no threads should really be consuming from deferred_rpc_tracker_ so the main thread here can safely close it without holding the lock. -- To view, visit http://gerrit.cloudera.org:8080/9527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3378496e2201b16c662b9daa634333480705c61a Gerrit-Change-Number: 9527 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Thu, 08 Mar 2018 04:13:54 +0000 Gerrit-HasComments: Yes