Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/9446 )
Change subject: IMPALA-6554: Fix a race in DequeueDeferredRpc() ...................................................................... Patch Set 1: (2 comments) Responding to questions first. Will update the patch next. http://gerrit.cloudera.org:8080/#/c/9446/1/be/src/runtime/krpc-data-stream-recvr.cc File be/src/runtime/krpc-data-stream-recvr.cc: http://gerrit.cloudera.org:8080/#/c/9446/1/be/src/runtime/krpc-data-stream-recvr.cc@337 PS1, Line 337: RespondAndReleaseRpc > so in order for this to be correct w.r.t. Close(), the lock must be held wh Good idea. Will update the patch. http://gerrit.cloudera.org:8080/#/c/9446/1/be/src/runtime/krpc-data-stream-recvr.cc@438 PS1, Line 438: recvr_->mem_tracker()->Release(ctx->rpc_context->GetTransferSize()); : } : : // Responds to the sender to ack the insertion of the row batches. : status.ToProto(ctx->response->mutable_status()); : ctx->rpc_context->RespondSuccess(); > given that the other callsites are holding the lock, why is it not okay to Not really. RespondSuccess() only enqueues a task for the reactor thread to respond to an RPC. The Inbound call which owns the actual InboundTransfer buffer is freed only after the reactor thread finishes sending the response (https://github.com/apache/impala/blob/master/be/src/kudu/rpc/connection.cc#L404-L432). InboundTransfer at certain stages of an RPC is currently untracked memory. For instance, the ReadHandler() called by the reactor thread is responsible for allocating the buffer for holding the entire RPC payload. We currently don't have the hook into the KRPC stack to count this towards a particular query. This definitely needs to be addressed once we get past KRPC milestone 1. I avoid holding the lock to while the response buffer is being serialized. We call RespondAndReleaseRpc() with lock held but they are mostly for error conditions (e.g. cancelled stream, failure to unpack the RPC sidecars etc). Note that in the fast path AddRow(), we also serialize the response without holding the lock. Not that I have solid figure to prove that it matters a lot but this seems to take long enough that the race this patch is fixing can actually manifest with concurrent queries. -- To view, visit http://gerrit.cloudera.org:8080/9446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4a3b0789633c7b8bc898381d509e2af769f0e069 Gerrit-Change-Number: 9446 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Comment-Date: Mon, 26 Feb 2018 20:27:35 +0000 Gerrit-HasComments: Yes
