Re: Review Request 53965: track percentile values for timed sliding stats

Zameer Manji Tue, 22 Nov 2016 10:27:17 -0800


> On Nov. 22, 2016, 10:07 a.m., David McLaughlin wrote:
> > Thanks for the patch! I think percentiles are crucial for analyzing 
> > scheduling performance and I'm pretty excited about having them for our 
> > latencies. However, I'm not sure if it makes sense to patch our local fork 
> > of twitter.commons. It might be better to either:
> > 
> > a) Upstream these changes to twitter.commons (where we might get better 
> > feedback on the approach here) where the tests for the code actually runs, 
> > and then copy them over into our fork 
> > 
> > OR
> > 
> > b) Move off twitter-commons stats and use something like util-stats 
> > (https://github.com/twitter/util/tree/develop/util-stats) instead. 
> > 
> > 
> > The key advantage of (b) is that we'd be using a library that is very 
> > lightweight but has both percentiles and can support multiple granularities 
> > (e.g. for per second metrics for graphview, but minutely metrics for long 
> > term storage) and has a migration strategy (I've done this migration 
> > internally at Twitter myself). 
> > 
> > I'm not sure what the implications of using (b) would be in terms of our 
> > dependency graph, though. AFAIK we forked twitter-commons originally 
> > because of that.
> 
> Joshua Cohen wrote:
>     I know Zameer had some intentions of modernizing our stats suite and had 
> been doing some early work that was laying the groundwork for replacing 
> Twitter Commons[1][2]. I'm definitely in favor of reducing our usage of 
> Twitter Commons, replacing it with something that's actively maintained 
> wherever possible (e.g. the recent Curator work to replace our ZK singleton 
> service). So given that, I definitely don't want to contribute this patch 
> back upstream to Twitter Commons; I think that ship sailed when we forked it. 
> I'm ok with a stop-gap approach where we update Percentile in our fork with a 
> long term goal of moving to some other stats library, whatever that may be.
>     
>     (And for what it's worth, the tests for Twitter Commons definitely run in 
> our fork, I verified this locally by adding a `fail()` to `PercentileTest`, 
> running `gradlew build` then failed as expected).
>     
>     
>     [1] 
> https://github.com/apache/aurora/commit/059b08621ae4892d98954a6cd0e88f274cdd1bef
>     [2] 
> https://github.com/apache/aurora/commit/9b34a4036ab84f560ca80dfce9bdcd831efdeb30


I was hoping for us to migrate to http://metrics.dropwizard.io/3.1.0/ which is 
a very widely used and popular metrics library. On par on how Curator is the ZK 
library for Java.

I believe it is possible for us to hide metrics behind the `StatsProvider` 
interface.


- Zameer


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53965/#review156595
-----------------------------------------------------------


On Nov. 22, 2016, 8:35 a.m., Reza Motamedi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53965/
> -----------------------------------------------------------
> 
> (Updated Nov. 22, 2016, 8:35 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, Mehrdad 
> Nurolahzade, and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> The timing of many methods are sampled with @Timed annotation. Currently 
> SlidingSatas is being used to track total-msec-in-method-x and 
> times-invoked-of-method-x and then compute the 
> mean-msec-in-method-x-per-invocation. However mean values are known not to 
> represent distrubtions with high skewness. 
> 
> This is request for review feedback for changes on computing stats, esp. 
> timing stats by adding percentile.
> 
> This patch introduces `SlidingTimedStats`, which uses `SlidingStats` to track 
> the mean mean-msec-in-method-x-per-invocation for events in the last K 
> sampling windows and `Percentiles` to track the p-values for msec-in-method-x 
> for all invocations in the last K windows.
> 
> __changes overview__:
> - Introduced `SlidingTimedStats` to measure percentiles on all @Timed method 
> calls
> - Comments added for a better way to calculate ratio based on two rates (num 
> and denom) in `SlidingStats`.
> - Tests added
> 
> _sample new stat:_
> attribute_store_fetch_one_10_0_percentile 0.0
> attribute_store_fetch_one_50_0_percentile 0.0
> attribute_store_fetch_one_90_0_percentile 0.0
> attribute_store_fetch_one_99_0_percentile 0.0
> attribute_store_fetch_one_99_9_percentile 0.0
> attribute_store_fetch_one_99_99_percentile 0.0
> 
> 
> Diffs
> -----
> 
>   commons/src/main/java/org/apache/aurora/common/inject/TimedInterceptor.java 
> a4c4240b67cafb108ca587ad86d9fb9d9bb1553d 
>   commons/src/main/java/org/apache/aurora/common/stats/SlidingStats.java 
> f7a5ae41e307627fc55157758e9b7cdd861c3268 
>   commons/src/main/java/org/apache/aurora/common/stats/SlidingTimedStats.java 
> PRE-CREATION 
>   
> commons/src/test/java/org/apache/aurora/common/stats/SlidingTimedStatsTest.java
>  PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/53965/diff/
> 
> 
> Testing
> -------
> 
> SlidingTimedStatsTest
> > passed
> 
> Seems like gradle is not set to run tests on `common`. Tests passed in IDEA.
> 
> 
> Thanks,
> 
> Reza Motamedi
> 
>

Re: Review Request 53965: track percentile values for timed sliding stats

Reply via email to