Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/9690 )
Change subject: IMPALA-6685: Improve profiles in KrpcDataStreamRecvr and KrpcDataStreamSender ...................................................................... Patch Set 3: (14 comments) http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-mgr.cc File be/src/runtime/krpc-data-stream-mgr.cc: http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-mgr.cc@201 PS3, Line 201: response->set_queue_time(MonotonicNanos()); its impossible to know what this is all about until you read the other code. Can we move this to the service level so the entire bookkeeping is done there? http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-recvr.h File be/src/runtime/krpc-data-stream-recvr.h: http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-recvr.h@171 PS3, Line 171: num_deferred_batches should this be num_deferred_rpcs? We've been referring to these as "deferred RPCs" elsewhere, right? If you want to leave these as "batches", let's at least indicate in the comment that it was the RPC that was deferred. http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-recvr.h@245 PS3, Line 245: Time series of number of deferred row batches is that accurate? what does "deferred row batches" have to do with sampling bytes_dequeued_counter_? http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-recvr.h@254 PS3, Line 254: /// Pointer to profile's inactive timer. Not owned. then it should be in this section, right? http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-recvr.h@255 PS3, Line 255: /// Measure the same duration as 'data_arrival_time_' above. then why have data_wait_time_? http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-recvr.h@262 PS3, Line 262: /// Total number of EOS received. : RuntimeProfile::Counter* total_eos_received_; why is this in the "dequeue" profile? seems more enqueue side, no? http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-recvr.cc File be/src/runtime/krpc-data-stream-recvr.cc: http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-recvr.cc@212 PS3, Line 212: num_pending_enqueue_ + num_deserialize_tasks_pending_ nit: would be nice to order these in the same order of the comment http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-recvr.cc@457 PS3, Line 457: static int64_t GetSerializedBatchSize( : const TransmitDataRequestPB* request, RpcContext* rpc_context) { : kudu::Slice tuple_offsets; : if (UNLIKELY(!rpc_context->GetInboundSidecar( : request->tuple_offsets_sidecar_idx(), &tuple_offsets).ok())) { : return 0; : } : kudu::Slice tuple_data; : if (UNLIKELY(!rpc_context->GetInboundSidecar( : request->tuple_data_sidecar_idx(), &tuple_data).ok())) { : return 0; : } : return tuple_data.size() + tuple_offsets.size(); : } why not use UnpackRequest() for this? If we can't use that for some reason, let's at least be consistent. UnpackRequest() is a static method of the class while this is a static function. Let's be consistent one way or another if we have both. http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-sender.cc File be/src/runtime/krpc-data-stream-sender.cc: http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-sender.cc@388 PS3, Line 388: resp_.queue_time() but that doesn't include time spent in the krpc service queue, right? so how accurate is this "network time" calculation? http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/krpc-data-stream-sender.cc@389 PS3, Line 389: COUNTER_ADD(parent_->profile()->total_time_counter(), network_time); why is that? why does total time not include time spent waiting for the reply due to server side queuing? do we document somewhere was total time counter for the data-stream-sender should mean? http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/runtime-state.h File be/src/runtime/runtime-state.h: http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/runtime/runtime-state.h@a356 PS3, Line 356: is the "across all threads" part true? maybe include that in the new comment, otherwise it's not clear whether the time is wallclock time when any thread is waiting, or what. http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/service/data-stream-service.cc File be/src/service/data-stream-service.cc: http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/service/data-stream-service.cc@80 PS3, Line 80: // AddData() is guaranteed to eventually respond to this RPC so we don't do it here. I think it would be clearer to set the initial value of set_queue_time() here, rather than in the data-stream-mgr. That way, the logic for that computation is all in this class. http://gerrit.cloudera.org:8080/#/c/9690/3/be/src/service/data-stream-service.cc@88 PS3, Line 88: status.ToProto(response->mutable_status()); : response->set_queue_time(MonotonicNanos() - response->queue_time()); : ctx->RespondSuccess(); could just call RespondRpc(), no? http://gerrit.cloudera.org:8080/#/c/9690/3/common/protobuf/data_stream_service.proto File common/protobuf/data_stream_service.proto: http://gerrit.cloudera.org:8080/#/c/9690/3/common/protobuf/data_stream_service.proto@53 PS3, Line 53: // Queue time in the receiver. what is "queue time"? Also, indicate the units at least in the comment (and perhaps in the field name). -- To view, visit http://gerrit.cloudera.org:8080/9690 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8ba405921b3df920c1e85b940ce9c8d02fc647cd Gerrit-Change-Number: 9690 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com> Gerrit-Comment-Date: Thu, 22 Mar 2018 21:40:50 +0000 Gerrit-HasComments: Yes