Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14662 )

Change subject: IMPALA-9128: improved diagnostics for slow data stream RPCs
......................................................................


Patch Set 1:

(7 comments)

Address most of the comments, just had another question about the receiver 
timing.

http://gerrit.cloudera.org:8080/#/c/14662/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14662/1//COMMIT_MSG@16
PS1, Line 16: a
> typo
Done


http://gerrit.cloudera.org:8080/#/c/14662/1/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/14662/1/be/src/common/global-flags.cc@291
PS1, Line 291: DEFINE_int64(impala_slow_rpc_threshold_ms, 2 * 60 * 1000,
> (Advanced)?
Done. Did both.


http://gerrit.cloudera.org:8080/#/c/14662/1/be/src/runtime/krpc-data-stream-sender.h
File be/src/runtime/krpc-data-stream-sender.h:

http://gerrit.cloudera.org:8080/#/c/14662/1/be/src/runtime/krpc-data-stream-sender.h@225
PS1, Line 225:   RuntimeProfile::SummaryStatsCounter* recvr_time_stats_ = 
nullptr;
> Does this provide us with enough info to determine if slow rpcs from the re
I think you're right on that one - it looks like the timer starts when the 
incoming RPC is queued, not when Impala's handler is called. It's asymmetrical 
though in the the timer stops before the output RPC is queued, so it includes 
the KRPC overhead in one direction only (at least that's what it looks like 
based on my limited understanding).

I think ideally we'd use tracing to get more granular breakdown of where the 
time went (I'm also looking into that). I'm a little reluctant to add another 
timer that's very similar to an existing one, just cause of the confusion 
factor.

I wonder if it would make more sense to tweak receiver_time_ns so that it only 
includes the Impala RPC handling time. I can't really tell if the current 
definition of receiver time is that way for a good reason or just arbitrarily.


http://gerrit.cloudera.org:8080/#/c/14662/1/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/14662/1/be/src/runtime/krpc-data-stream-sender.cc@342
PS1, Line 342:   LOG(INFO) << "slow " << rpc_name << " RPC to " << 
TNetworkAddressToString(address_)
> nit: Slow
Done


http://gerrit.cloudera.org:8080/#/c/14662/1/be/src/runtime/krpc-data-stream-sender.cc@361
PS1, Line 361: int64_t elapsed_time_ms = elapsed_time_ns / NANOS_PER_MICRO / 
MICROS_PER_MILLI;
             :     if (elapsed_time_ms > FLAGS_impala_slow_rpc_threshold_ms) {
> IsSlowRpc
Done


http://gerrit.cloudera.org:8080/#/c/14662/1/be/src/runtime/krpc-data-stream-sender.cc@453
PS1, Line 453:     DoRpcFn rpc_fn =
> Do we want to collect these timing stats for failed rpcs too?
I added the slow RPC logging for failed RPCs. The stats seemed a bit confusing 
because we don't have the receiver time, so can't calculate the same values.  I 
decided not to tackle that problem


http://gerrit.cloudera.org:8080/#/c/14662/1/be/src/runtime/krpc-data-stream-sender.cc@565
PS1, Line 565:     DoRpcFn rpc_fn =
> Same as above: do we want these stats for failed rpcs?
See above - I added the logging only.



--
To view, visit http://gerrit.cloudera.org:8080/14662
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I258ac91b9fbbdbc86d0e8091c34f511f8957c4cd
Gerrit-Change-Number: 14662
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Fri, 08 Nov 2019 14:28:40 +0000
Gerrit-HasComments: Yes

Reply via email to