Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9690 )

Change subject: IMPALA-6685: Improve profiles in KrpcDataStreamRecvr and 
KrpcDataStreamSender
......................................................................


Patch Set 4:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-mgr.cc
File be/src/runtime/krpc-data-stream-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-mgr.cc@201
PS3, Line 201:   TUniqueId finst_id;
> its impossible to know what this is all about until you read the other code
Done


http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-recvr.h
File be/src/runtime/krpc-data-stream-recvr.h:

http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-recvr.h@171
PS3, Line 171: num_deferred_rpcs()
> should this be num_deferred_rpcs? We've been referring to these as "deferre
Done


http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-recvr.h@245
PS3, Line 245:  Following counters belong to 'dequeue_profile
> is that accurate? what does "deferred row batches" have to do with sampling
Oops.. copied and pasted from elsewhere.


http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-recvr.h@254
PS3, Line 254:   RuntimeProfile::Counter* queue_get_batch_time_;
> then it should be in this section, right?
Done


http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-recvr.h@255
PS3, Line 255:
> then why have data_wait_time_?
Inactive time is not shown in the profile. It's only used for computing the 
"non-child" time of the exchange node. I find "non-child" time inaccurate but I 
will refrain from fixing it in this change.


http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-recvr.h@262
PS3, Line 262:   /// 
------------------------------------------------------------------------------------
             :   /// Following counters belong to 'enqueue_pro
> why is this in the "dequeue" profile?  seems more enqueue side, no?
Done


http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-recvr.h@294
PS3, Line 294: RuntimeProfile::Counter* total_deferred_rpcs_time_;
This is converted to tracking the cumulative time in which 'deferred_rpcs_' is 
not empty in the new patch as the average tends to be skewed outlier (e.g. the 
amount of time between when a RPC arrives and when the parent of exchange node 
actually calls GetBatch()).


http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-recvr.cc
File be/src/runtime/krpc-data-stream-recvr.cc:

http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-recvr.cc@212
PS3, Line 212: parent_recvr), num_remaining_senders_(num_senders) {
> nit: would be nice to order these in the same order of the comment
Done


http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-recvr.cc@457
PS3, Line 457:       DataStreamService::RespondAndReleaseRpc(status, 
ctx->response, ctx->rpc_context,
             :           recvr_->deferred_rpc_tracker());
             :       DequeueDeferredRpc();
             :       return;
             :     }
             :
             :     // Stops if inserting the batch causes us to go over the 
limit.
             :     // Put 'ctx' back on the queue.
             :     if (!CanEnqueue(batch_size)) {
             :       ctx.swap(deferred_rpcs_.front());
             :       DCHECK(deferred_rpcs_.front().get() != nullptr);
             :       return;
             :     }
             :
> why not use UnpackRequest() for this?
Convert to a static function. UnpackRequest() was returning deserialized size 
while this returns the serialized size.


http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-sender.cc@388
PS3, Line 388: _in_flight_batch_)
> but that doesn't include time spent in the krpc service queue, right? so ho
Updated the patch to also include the time in service queue. There is currently 
no easy way to track time spent in the outbound transfer queue.


http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-sender.cc@389
PS3, Line 389:     int64_t network_time = total_time - 
resp_.receiver_latency_ns();
> why is that? why does total time not include time spent waiting for the rep
In the Thrift based implementation, we are mostly tracking the time to cache 
the send buffer and the serialization time.

PS4 fixes this up by counting the total time as time spent in Send(), 
FlushFinal(), Close(), Open() and Prepare(). Also counted the time spent in 
WaitForRpc() as "inactive time" so as to show the actual overhead of DSS.


http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/runtime-state.h@a356
PS3, Line 356:
> is the "across all threads" part true? maybe include that in the new commen
"across all threads" is not valid anymore.There is only one thread: the 
fragment instance execution threads. I believe it used to be true with Thrift 
when there is one thread per channel.


http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/service/data-stream-service.cc
File be/src/service/data-stream-service.cc:

http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/service/data-stream-service.cc@80
PS3, Line 80: void DataStreamService::TransmitData(const TransmitDataRequestPB* 
request,
> I think it would be clearer to set the initial value of set_queue_time() he
With the new patch, we will now use the time a RPC is received recorded in an 
inbound call.


http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/service/data-stream-service.cc@88
PS3, Line 88: void DataStreamService::RespondRpc(const Status& status,
            :     ResponsePBType* response, kudu::rpc::RpcContext* ctx) {
            :   MonoDelta duration(Mon
> could just call RespondRpc(), no?
Done


http://gerrit.cloudera.org:8080/#/c/9690/3/common/protobuf/data_stream_service.proto
File common/protobuf/data_stream_service.proto:

http://gerrit.cloudera.org:8080/#/c/9690/3/common/protobuf/data_stream_service.proto@53
PS3, Line 53:   // Latency for response in the receiver in nanoseconds.
> what is "queue time"?
Done


http://gerrit.cloudera.org:8080/#/c/9690/3/common/protobuf/data_stream_service.proto@53
PS3, Line 53:   // Latency for response in the receiver in nanoseconds.
> After reading through the code, "queue time" seems confusing because there
Done



--
To view, visit http://gerrit.cloudera.org:8080/9690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ba405921b3df920c1e85b940ce9c8d02fc647cd
Gerrit-Change-Number: 9690
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Comment-Date: Sat, 24 Mar 2018 22:54:39 +0000
Gerrit-HasComments: Yes

Reply via email to