Hello Kudu Jenkins, Andrew Wong, Adar Dembo, Hao Hao,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15254

to look at the new patch set (#2).

Change subject: KUDU-3056: Reduce HdrHistogramAccumulator overhead
......................................................................

KUDU-3056: Reduce HdrHistogramAccumulator overhead

This patch makes a few changes to reduce the overhead of the
HdrHistogramAccumulator.

It changes from using `SynchronizedHistogram` (value type
long) to using `IntCountsHistogram` (value type int).
This significantly reduces the data footprint of the histogram and is
safe given write durations will never exceed `Integer.MAX_VALUE`.
Because thread safety is still important, we synchronize all access
to `IntCountsHistogram` in `HistogramWrapper`.

It also adjusts the `HistogramWrapper` to lazily instantiate an
`IntCountsHistogram`. This means that if no values are recorded,
the overhead of the `HdrHistogramAccumulator` should be almost
zero.
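The synchronization and lazy instantiation described above can be sketched as follows. This is a minimal sketch, not the patch's `HistogramWrapper`. `LazyHistogramWrapper` and `IntCounts` are hypothetical names. `IntCounts` is a simplified, non-thread-safe stand-in for `IntCountsHistogram` so the example compiles without the HdrHistogram dependency, and it does not model HdrHistogram's bucketing.

```java
/**
 * Hypothetical sketch: a wrapper that synchronizes every access to a
 * non-thread-safe histogram and allocates it only on the first record.
 */
public class LazyHistogramWrapper {

  /**
   * Hypothetical, non-thread-safe stand-in for IntCountsHistogram:
   * one int count per value, with values clamped to [0, maxValue].
   */
  static final class IntCounts {
    private final int[] counts;
    private long totalCount = 0;

    IntCounts(int maxValue) {
      counts = new int[maxValue + 1];
    }

    void recordValue(int value) {
      int index = Math.max(0, Math.min(value, counts.length - 1));
      counts[index]++;
      totalCount++;
    }

    long getTotalCount() {
      return totalCount;
    }
  }

  private final int maxValue;
  // Null until the first value is recorded, so an unused wrapper
  // holds no counts array at all.
  private IntCounts histogram = null;

  public LazyHistogramWrapper(int maxValue) {
    this.maxValue = maxValue;
  }

  // Every method that touches `histogram` is synchronized on this wrapper,
  // which also makes the lazy allocation race-free.
  public synchronized void recordValue(int value) {
    if (histogram == null) {
      histogram = new IntCounts(maxValue);
    }
    histogram.recordValue(value);
  }

  public synchronized long getTotalCount() {
    return histogram == null ? 0 : histogram.getTotalCount();
  }

  public synchronized boolean isAllocated() {
    return histogram != null;
  }

  public static void main(String[] args) throws InterruptedException {
    LazyHistogramWrapper wrapper = new LazyHistogramWrapper(30000);
    System.out.println(wrapper.isAllocated()); // false

    // Four threads record concurrently; no updates are lost.
    Thread[] threads = new Thread[4];
    for (int t = 0; t < threads.length; t++) {
      threads[t] = new Thread(() -> {
        for (int i = 0; i < 10_000; i++) {
          wrapper.recordValue(i % 500);
        }
      });
      threads[t].start();
    }
    for (Thread thread : threads) {
      thread.join();
    }
    System.out.println(wrapper.isAllocated() + " " + wrapper.getTotalCount()); // true 40000
  }
}
```

Taking one lock on the wrapper, instead of relying on an internally synchronized histogram, covers both the null check and the update. Without that, two threads could each allocate their own histogram.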

Lastly, it reduces the `numberOfSignificantValueDigits` tracked
in the histogram from 3 to 2. This produces nearly identical
output in the Spark accumulator with a much smaller histogram.
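The following sketch shows why dropping from 3 to 2 significant digits changes the output so little. It computes how many adjacent integer values share one count slot, using an HdrHistogram-style layout with a lowest discernible value of 1. The formula is based on my reading of HdrHistogram's bucketing and is not the library's code. `EquivalentRange` and its methods are hypothetical names.

```java
/**
 * Sketch: width of the value range that shares one count slot, for an
 * HdrHistogram-style layout with a lowest discernible value of 1.
 * Based on a reading of HdrHistogram's bucketing, not the library's code.
 */
public class EquivalentRange {

  /** ceil(log2(2 * 10^digits)): log2 of the number of sub-buckets per bucket. */
  static int subBucketCountMagnitude(int significantDigits) {
    long largestSingleUnitValue = 2;
    for (int i = 0; i < significantDigits; i++) {
      largestSingleUnitValue *= 10;
    }
    return 64 - Long.numberOfLeadingZeros(largestSingleUnitValue - 1);
  }

  /** Number of distinct values that are counted together with `value`. */
  static long equivalentRangeWidth(long value, int significantDigits) {
    int magnitude = subBucketCountMagnitude(significantDigits);
    long subBucketCount = 1L << magnitude;
    if (value < subBucketCount) {
      return 1; // the first bucket tracks every integer exactly
    }
    // Each doubling of the value above the first bucket doubles the slot width.
    int valueMagnitude = 63 - Long.numberOfLeadingZeros(value); // floor(log2(value))
    return 1L << (valueMagnitude - magnitude + 1);
  }

  public static void main(String[] args) {
    for (long valueMs : new long[] {265, 499, 30000}) {
      System.out.printf("%5dms: precision 3 -> %3d, precision 2 -> %3d%n",
          valueMs, equivalentRangeWidth(valueMs, 3), equivalentRangeWidth(valueMs, 2));
    }
  }
}
```

Under these assumptions, values around 265ms and 499ms fall into 1ms-wide slots at precision 3 and 2ms-wide slots at precision 2. At the 30000ms maximum, the slots are 16ms and 128ms wide. At precision 2 the slot width stays below 2/256 (under 1%) of the value.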

I measured each variant using `getEstimatedFootprintInBytes()`.
When no values are recorded, the new implementation is effectively
100% smaller, because no histogram is allocated. When values are
recorded, it is about 90% smaller (5120 vs. 49664 bytes):

long w/ precision 3 & max 30000ms: 49664 bytes (current)
long w/ precision 2 & max 30000ms:  9728 bytes
long w/ precision 1 & max 30000ms:  2048 bytes
int  w/ precision 3 & max 30000ms: 25088 bytes
int  w/ precision 2 & max 30000ms:  5120 bytes (new)
int  w/ precision 1 & max 30000ms:  1280 bytes

Note: I used a max of 30000ms in these calculations because that
is the default operation timeout.
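The footprint numbers in the table can be reproduced with the arithmetic sketched below. It makes two assumptions about HdrHistogram with a lowest discernible value of 1. First, it assumes how the library sizes its counts array. Second, it assumes the footprint estimate is 512 bytes of fixed overhead plus one slot per counts entry: 8 bytes for long counts and 4 bytes for int counts. `FootprintEstimate` and its methods are hypothetical names, not HdrHistogram API.

```java
/**
 * Sketch of the footprint arithmetic, under assumed HdrHistogram sizing:
 * 512 bytes of fixed overhead plus bytesPerCount per counts-array entry.
 */
public class FootprintEstimate {

  /** ceil(log2(2 * 10^digits)): log2 of the number of sub-buckets per bucket. */
  static int subBucketCountMagnitude(int significantDigits) {
    long largestSingleUnitValue = 2;
    for (int i = 0; i < significantDigits; i++) {
      largestSingleUnitValue *= 10;
    }
    return 64 - Long.numberOfLeadingZeros(largestSingleUnitValue - 1);
  }

  /** Entries in the counts array needed to track values up to the max. */
  static int countsArrayLength(long highestTrackableValue, int significantDigits) {
    if (highestTrackableValue >= (1L << 62)) {
      throw new IllegalArgumentException("max value too large for this sketch");
    }
    long subBucketCount = 1L << subBucketCountMagnitude(significantDigits);
    // Each extra bucket doubles the range of values covered.
    long smallestUntrackableValue = subBucketCount;
    int bucketsNeeded = 1;
    while (smallestUntrackableValue <= highestTrackableValue) {
      smallestUntrackableValue <<= 1;
      bucketsNeeded++;
    }
    // The first bucket uses all sub-buckets; each later bucket adds only its upper half.
    return (int) ((bucketsNeeded + 1) * (subBucketCount / 2));
  }

  static long footprintBytes(long highestTrackableValue, int significantDigits, int bytesPerCount) {
    return 512L + (long) bytesPerCount * countsArrayLength(highestTrackableValue, significantDigits);
  }

  public static void main(String[] args) {
    for (int bytesPerCount : new int[] {8, 4}) {
      String type = bytesPerCount == 8 ? "long" : "int ";
      for (int digits = 3; digits >= 1; digits--) {
        System.out.printf("%s w/ precision %d & max 30000ms: %d%n",
            type, digits, footprintBytes(30000, digits, bytesPerCount));
      }
    }
  }
}
```

Running `main` prints the six rows of the table, without the "current" and "new" labels. The 90% figure follows from 5120 / 49664 ≈ 0.103. Most of that saving comes from the precision change (the counts array shrinks from 6144 to 1152 entries). The rest comes from halving the slot size.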

Below is sample string output from before and after this patch,
generated from 1000 random values between 0ms and 500ms.

Before:
0.2%: 0ms, 50.3%: 265ms, 75.1%: 376ms, 87.5%: 437ms, 93.8%: 470ms,
96.9%: 484ms, 98.6%: 493ms, 99.5%: 496ms, 99.8%: 498ms, 100.0%: 499ms, 100.0%: 499ms

After:
0.2%: 0ms, 50.3%: 265ms, 75.4%: 377ms, 87.5%: 437ms, 93.9%: 471ms,
97.3%: 485ms, 98.6%: 493ms, 99.5%: 497ms, 100.0%: 499ms, 100.0%: 499ms

Note: I used the same seed so as to generate the same values for both strings.

Change-Id: Ic7c2a33bc61a2baa38703ea3340a07e06ab39db3
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/HdrHistogramAccumulator.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
2 files changed, 72 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/15254/2
--
To view, visit http://gerrit.cloudera.org:8080/15254

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic7c2a33bc61a2baa38703ea3340a07e06ab39db3
Gerrit-Change-Number: 15254
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Hao Hao <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
