Grant Henke has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15254 )
Change subject: KUDU-3056: Reduce HdrHistogramAccumulator overhead
......................................................................

KUDU-3056: Reduce HdrHistogramAccumulator overhead

This patch makes a few changes to reduce the overhead of the
HdrHistogramAccumulator.

It changes from using `SynchronizedHistogram` (value type long) to using
`IntCountsHistogram` (value type int). This significantly reduces the data
footprint of the histogram and is safe given write durations will never
exceed `Integer.MAX_VALUE`. Because thread safety is still important, we
synchronize all access to `IntCountsHistogram` in `HistogramWrapper`.

It also adjusts the `HistogramWrapper` to lazily instantiate an
`IntCountsHistogram`. This means that if no values are recorded, the
overhead of the `HdrHistogramAccumulator` should be almost zero.

Lastly, it reduces the `numberOfSignificantValueDigits` tracked in the
histogram from 3 to 2. The result is relatively similar output in the
Spark accumulator with a significantly smaller histogram.

I tested each variant using `getEstimatedFootprintInBytes()`. The new
implementation is roughly 90% smaller (49664 bytes down to 5120 bytes)
when the HdrHistogramAccumulator is used, and 100% smaller when no values
are stored. Estimated footprint in bytes for each variant:

  long w/ precision 3 & max 30000ms: 49664 (current)
  long w/ precision 2 & max 30000ms: 9728
  long w/ precision 1 & max 30000ms: 2048
  int w/ precision 3 & max 30000ms: 25088
  int w/ precision 2 & max 30000ms: 5120 (new)
  int w/ precision 1 & max 30000ms: 1280

Note: I used a max of 30000ms in these calculations because that is the
default operation timeout.

Below is sample string output from before and after this patch, generated
with 1000 random values between 0ms and 500ms.

Before:
0.2%: 0ms, 50.3%: 265ms, 75.1%: 376ms, 87.5%: 437ms, 93.8%: 470ms, 96.9%: 484ms, 98.6%: 493ms, 99.5%: 496ms, 99.8%: 498ms, 100.0%: 499ms, 100.0%: 499ms

After:
0.2%: 0ms, 50.3%: 265ms, 75.4%: 377ms, 87.5%: 437ms, 93.9%: 471ms, 97.3%: 485ms, 98.6%: 493ms, 99.5%: 497ms, 100.0%: 499ms, 100.0%: 499ms

Note: I used the same seed to generate the same values for both strings.

Change-Id: Ic7c2a33bc61a2baa38703ea3340a07e06ab39db3
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/HdrHistogramAccumulator.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
2 files changed, 68 insertions(+), 29 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/15254/1
--
To view, visit http://gerrit.cloudera.org:8080/15254
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic7c2a33bc61a2baa38703ea3340a07e06ab39db3
Gerrit-Change-Number: 15254
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke <[email protected]>
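
The lazy instantiation and synchronized access described in the commit message can be sketched roughly as follows. This is a minimal illustration and not the code in the patch: `IntCounts` is a hypothetical stand-in for HdrHistogram's `IntCountsHistogram` so that the example runs on the JDK alone, and `LazyWrapper` with its method names is an assumed shape, not the actual `HistogramWrapper` API.

```java
final class LazyHistogramSketch {

    // Hypothetical stand-in for an int-backed histogram (NOT the HdrHistogram API).
    static final class IntCounts {
        private final int[] counts;
        private long total;

        IntCounts(int maxValue) {
            counts = new int[maxValue + 1]; // one int bucket per possible value
        }

        void record(int value) {
            counts[value]++;
            total++;
        }

        long totalCount() {
            return total;
        }
    }

    // Assumed wrapper shape: allocate on first write, lock on every access.
    static final class LazyWrapper {
        private final int maxValue;
        private IntCounts inner; // stays null until a value is recorded

        LazyWrapper(int maxValue) {
            this.maxValue = maxValue;
        }

        synchronized void record(int value) {
            if (inner == null) {
                inner = new IntCounts(maxValue);
            }
            inner.record(value);
        }

        synchronized long totalCount() {
            return inner == null ? 0L : inner.totalCount();
        }

        synchronized boolean isAllocated() {
            return inner != null;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LazyWrapper wrapper = new LazyWrapper(30_000);
        System.out.println("allocated before writes: " + wrapper.isAllocated());

        // Several writers share one wrapper; synchronized methods keep counts consistent.
        Thread[] writers = new Thread[4];
        for (int t = 0; t < writers.length; t++) {
            writers[t] = new Thread(() -> {
                for (int i = 0; i < 1_000; i++) {
                    wrapper.record(i % 500);
                }
            });
            writers[t].start();
        }
        for (Thread writer : writers) {
            writer.join();
        }
        System.out.println("allocated after writes: " + wrapper.isAllocated());
        System.out.println("total recorded: " + wrapper.totalCount());
    }
}
```

In this sketch a wrapper that never receives a value holds only a null reference and an int, which mirrors the "almost zero" overhead the commit message aims for. Making every accessor `synchronized` is the simplest way to protect both the lazy allocation and the counts; the real patch may structure its locking differently.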
