Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14662 )
Change subject: IMPALA-9128: improved diagnostics for slow data stream RPCs ...................................................................... Patch Set 1: (7 comments) Address most of the comments, just had another question about the receiver timing. http://gerrit.cloudera.org:8080/#/c/14662/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14662/1//COMMIT_MSG@16 PS1, Line 16: a > typo Done http://gerrit.cloudera.org:8080/#/c/14662/1/be/src/common/global-flags.cc File be/src/common/global-flags.cc: http://gerrit.cloudera.org:8080/#/c/14662/1/be/src/common/global-flags.cc@291 PS1, Line 291: DEFINE_int64(impala_slow_rpc_threshold_ms, 2 * 60 * 1000, > (Advanced)? Done. Did both. http://gerrit.cloudera.org:8080/#/c/14662/1/be/src/runtime/krpc-data-stream-sender.h File be/src/runtime/krpc-data-stream-sender.h: http://gerrit.cloudera.org:8080/#/c/14662/1/be/src/runtime/krpc-data-stream-sender.h@225 PS1, Line 225: RuntimeProfile::SummaryStatsCounter* recvr_time_stats_ = nullptr; > Does this provide us with enough info to determine if slow rpcs from the re I think you're right on that one - it looks like the timer starts when the incoming RPC is queued, not when Impala's handler is called. It's asymmetrical though in the the timer stops before the output RPC is queued, so it includes the KRPC overhead in one direction only (at least that's what it looks like based on my limited understanding). I think ideally we'd use tracing to get more granular breakdown of where the time went (I'm also looking into that). I'm a little reluctant to add another timer that's very similar to an existing one, just cause of the confusion factor. I wonder if it would make more sense to tweak receiver_time_ns so that it only includes the Impala RPC handling time. I can't really tell if the current definition of receiver time is that way for a good reason or just arbitrarily. http://gerrit.cloudera.org:8080/#/c/14662/1/be/src/runtime/krpc-data-stream-sender.cc File be/src/runtime/krpc-data-stream-sender.cc: http://gerrit.cloudera.org:8080/#/c/14662/1/be/src/runtime/krpc-data-stream-sender.cc@342 PS1, Line 342: LOG(INFO) << "slow " << rpc_name << " RPC to " << TNetworkAddressToString(address_) > nit: Slow Done http://gerrit.cloudera.org:8080/#/c/14662/1/be/src/runtime/krpc-data-stream-sender.cc@361 PS1, Line 361: int64_t elapsed_time_ms = elapsed_time_ns / NANOS_PER_MICRO / MICROS_PER_MILLI; : if (elapsed_time_ms > FLAGS_impala_slow_rpc_threshold_ms) { > IsSlowRpc Done http://gerrit.cloudera.org:8080/#/c/14662/1/be/src/runtime/krpc-data-stream-sender.cc@453 PS1, Line 453: DoRpcFn rpc_fn = > Do we want to collect these timing stats for failed rpcs too? I added the slow RPC logging for failed RPCs. The stats seemed a bit confusing because we don't have the receiver time, so can't calculate the same values. I decided not to tackle that problem http://gerrit.cloudera.org:8080/#/c/14662/1/be/src/runtime/krpc-data-stream-sender.cc@565 PS1, Line 565: DoRpcFn rpc_fn = > Same as above: do we want these stats for failed rpcs? See above - I added the logging only. -- To view, visit http://gerrit.cloudera.org:8080/14662 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I258ac91b9fbbdbc86d0e8091c34f511f8957c4cd Gerrit-Change-Number: 14662 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Comment-Date: Fri, 08 Nov 2019 14:28:40 +0000 Gerrit-HasComments: Yes
