[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms
Hello Juan Yu, Matthew Jacobs, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4516 to look at the new patch set (#4). Change subject: IMPALA-4187: Switch RPC latency metrics to histograms .. IMPALA-4187: Switch RPC latency metrics to histograms It's usually better to measure latency distributions with histograms, not avg / min / max. This change switches out the metrics that measure the latency of RPC processing time to histograms. Add HistogramMetric::Reset() to allow histogram to remove all entries. Added spinlock around histogram access. On a 8-core i7 @ 3.4GHz, the following throughputs were observed: 1 thread -> 25M updates/sec 4 threads -> 7M updates/sec 16 threads -> 5M updates/sec Each histogram takes ~108KB of storage for its buckets. This can be reduced by reducing the maximum value, currently 60 minutes. The new metrics have the following text representation: Count: 148, 25th %-ile: 0, 50th %-ile: 0, 75th %-ile: 0, 90th %-ile: 0, 95th %-ile: 0, 99.9th %-ile: 1ms Which changes from the old: count: 345, last: 6ms, min: 0, max: 12ms, mean: 1ms, stddev: 1ms Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34 --- M be/src/rpc/rpc-trace.cc M be/src/rpc/rpc-trace.h M be/src/util/histogram-metric.h M common/thrift/metrics.json 4 files changed, 64 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/4516/4 -- To view, visit http://gerrit.cloudera.org:8080/4516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms
Dan Hecht has posted comments on this change. Change subject: IMPALA-4187: Switch RPC latency metrics to histograms .. Patch Set 3: Code-Review+2 (1 comment) Can you give an example of what the old and new metries look like? Does CM need to be updated at all? http://gerrit.cloudera.org:8080/#/c/4516/3/be/src/util/histogram-metric.h File be/src/util/histogram-metric.h: Line 107: // Protects histogram_ how about: Protects histogram_ pointer. to make it clear that it's only the pointer, not the implementation that's being protected. -- To view, visit http://gerrit.cloudera.org:8080/4516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms
Henry Robinson has posted comments on this change. Change subject: IMPALA-4187: Switch RPC latency metrics to histograms .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/4516/3/be/src/rpc/rpc-trace.cc File be/src/rpc/rpc-trace.cc: PS3, Line 179: constexpr int32_t SIXTY_MINUTES_IN_MS = 60 * 1000 * 60; > We'll see users with long running queries go over this (not uncommon) for h I checked - any value > 60 minutes will be clipped to 60 minutes. That seems ok - beyond 60 minutes I'm not sure there's much use in knowing the distribution. 60 minutes might be too much since most RPCs will never get anywhere close, but the total data structure cost is 100KB and there are only a handful of them for all the RPCs we support. -- To view, visit http://gerrit.cloudera.org:8080/4516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4187: Switch RPC latency metrics to histograms .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/4516/3/be/src/rpc/rpc-trace.cc File be/src/rpc/rpc-trace.cc: PS3, Line 179: constexpr int32_t SIXTY_MINUTES_IN_MS = 60 * 1000 * 60; We'll see users with long running queries go over this (not uncommon) for hs2::ExecuteStatement/beeswax::query I'm not sure it's worth doing anything about, but do you know how it will be handled? -- To view, visit http://gerrit.cloudera.org:8080/4516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms
Juan Yu has posted comments on this change. Change subject: IMPALA-4187: Switch RPC latency metrics to histograms .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/4516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms
Juan Yu has posted comments on this change. Change subject: IMPALA-4187: Switch RPC latency metrics to histograms .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/4516/2/be/src/rpc/rpc-trace.cc File be/src/rpc/rpc-trace.cc: PS2, Line 181: SIXTY_MINUTES_IN_MS If I understand correctly, this means the longest RPC call could be 60min. Isn't it too long? most of RPC should be in ms range or even shorter, 1 min is already abnormal, right? http://gerrit.cloudera.org:8080/#/c/4516/2/be/src/util/histogram-metric.h File be/src/util/histogram-metric.h: PS2, Line 37: highest_trackable_value, num_significant_digits would it be worth to store those in HistogramMetric? -- To view, visit http://gerrit.cloudera.org:8080/4516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Juan Yu Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms
Henry Robinson has posted comments on this change. Change subject: IMPALA-4187: Switch RPC latency metrics to histograms .. Patch Set 2: The core change to HistogramMetric (adding a Reset() method) is needed for our proposed solution to https://gerrit.cloudera.org/#/c/4500/. Since I had this patch all ready to go, might as well get the RPC metric changes reviewed at the same time. -- To view, visit http://gerrit.cloudera.org:8080/4516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms
Henry Robinson has uploaded a new change for review. http://gerrit.cloudera.org:8080/4516 Change subject: IMPALA-4187: Switch RPC latency metrics to histograms .. IMPALA-4187: Switch RPC latency metrics to histograms It's usually better to measure latency distributions with histograms, not avg / min / max. This change switches out the metrics that measure the latency of RPC processing time to histograms. Add HistogramMetric::Reset() to allow histogram to remove all entries. Added spinlock around histogram access. Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34 --- M be/src/rpc/rpc-trace.cc M be/src/rpc/rpc-trace.h M be/src/util/histogram-metric.h 3 files changed, 58 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/4516/2 -- To view, visit http://gerrit.cloudera.org:8080/4516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson