Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9446 )

Change subject: IMPALA-6554: Fix a race in DequeueDeferredRpc()
......................................................................


Patch Set 1:

(2 comments)

Responding to questions first. Will update the patch next.

http://gerrit.cloudera.org:8080/#/c/9446/1/be/src/runtime/krpc-data-stream-recvr.cc
File be/src/runtime/krpc-data-stream-recvr.cc:

http://gerrit.cloudera.org:8080/#/c/9446/1/be/src/runtime/krpc-data-stream-recvr.cc@337
PS1, Line 337: RespondAndReleaseRpc
> so in order for this to be correct w.r.t. Close(), the lock must be held wh
Good idea. Will update the patch.


http://gerrit.cloudera.org:8080/#/c/9446/1/be/src/runtime/krpc-data-stream-recvr.cc@438
PS1, Line 438:     
recvr_->mem_tracker()->Release(ctx->rpc_context->GetTransferSize());
             :   }
             :
             :   // Responds to the sender to ack the insertion of the row 
batches.
             :   status.ToProto(ctx->response->mutable_status());
             :   ctx->rpc_context->RespondSuccess();
> given that the other callsites are holding the lock, why is it not okay to
Not really. RespondSuccess() only enqueues a task for the reactor thread to 
respond to an RPC. The Inbound call which owns the actual InboundTransfer 
buffer is freed only after the reactor thread finishes sending the response 
(https://github.com/apache/impala/blob/master/be/src/kudu/rpc/connection.cc#L404-L432).

InboundTransfer at certain stages of an RPC is currently untracked memory. For 
instance, the ReadHandler() called by the reactor thread is responsible for 
allocating the buffer for holding the entire RPC payload. We currently don't 
have the hook into the KRPC stack to count this towards a particular query. 
This definitely needs to be addressed once we get past KRPC milestone 1.

I avoid holding the lock to while the response buffer is being serialized. We 
call RespondAndReleaseRpc() with lock held but they are mostly for error 
conditions (e.g. cancelled stream, failure to unpack the RPC sidecars etc). 
Note that in the fast path AddRow(), we also serialize the response without 
holding the lock. Not that I have solid figure to prove that it matters a lot 
but this seems to take long enough that the race this patch is fixing can 
actually manifest with concurrent queries.



--
To view, visit http://gerrit.cloudera.org:8080/9446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a3b0789633c7b8bc898381d509e2af769f0e069
Gerrit-Change-Number: 9446
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Comment-Date: Mon, 26 Feb 2018 20:27:35 +0000
Gerrit-HasComments: Yes

Reply via email to