Re: Review Request 53965: track percentile values for timed sliding stats

Joshua Cohen Tue, 22 Nov 2016 10:20:45 -0800


> On Nov. 22, 2016, 6:07 p.m., David McLaughlin wrote:
> > Thanks for the patch! I think percentiles are crucial for analyzing 
> > scheduling performance and I'm pretty excited about having them for our 
> > latencies. However, I'm not sure if it makes sense to patch our local fork 
> > of twitter.commons. It might be better to either:
> > 
> > a) Upstream these changes to twitter.commons (where we might get better 
> > feedback on the approach here) where the tests for the code actually runs, 
> > and then copy them over into our fork 
> > 
> > OR
> > 
> > b) Move off twitter-commons stats and use something like util-stats 
> > (https://github.com/twitter/util/tree/develop/util-stats) instead. 
> > 
> > 
> > The key advantage of (b) is that we'd be using a library that is very 
> > lightweight but has both percentiles and can support multiple granularities 
> > (e.g. for per second metrics for graphview, but minutely metrics for long 
> > term storage) and has a migration strategy (I've done this migration 
> > internally at Twitter myself). 
> > 
> > I'm not sure what the implications of using (b) would be in terms of our 
> > dependency graph, though. AFAIK we forked twitter-commons originally 
> > because of that.


I know Zameer had some intentions of modernizing our stats suite and had been 
doing some early work that was laying the groundwork for replacing Twitter 
Commons[1][2]. I'm definitely in favor of reducing our usage of Twitter 
Commons, replacing it with something that's actively maintained wherever 
possible (e.g. the recent Curator work to replace our ZK singleton service). So 
given that, I definitely don't want to contribute this patch back upstream to 
Twitter Commons; I think that ship sailed when we forked it. I'm ok with a 
stop-gap approach where we update Percentile in our fork with a long term goal 
of moving to some other stats library, whatever that may be.

(And for what it's worth, the tests for Twitter Commons definitely run in our 
fork, I verified this locally by adding a `fail()` to `PercentileTest`, running 
`gradlew build` then failed as expected).


[1] 
https://github.com/apache/aurora/commit/059b08621ae4892d98954a6cd0e88f274cdd1bef
[2] 
https://github.com/apache/aurora/commit/9b34a4036ab84f560ca80dfce9bdcd831efdeb30


- Joshua


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53965/#review156595
-----------------------------------------------------------


On Nov. 22, 2016, 4:35 p.m., Reza Motamedi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53965/
> -----------------------------------------------------------
> 
> (Updated Nov. 22, 2016, 4:35 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, Mehrdad 
> Nurolahzade, and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> The timing of many methods are sampled with @Timed annotation. Currently 
> SlidingSatas is being used to track total-msec-in-method-x and 
> times-invoked-of-method-x and then compute the 
> mean-msec-in-method-x-per-invocation. However mean values are known not to 
> represent distrubtions with high skewness. 
> 
> This is request for review feedback for changes on computing stats, esp. 
> timing stats by adding percentile.
> 
> This patch introduces `SlidingTimedStats`, which uses `SlidingStats` to track 
> the mean mean-msec-in-method-x-per-invocation for events in the last K 
> sampling windows and `Percentiles` to track the p-values for msec-in-method-x 
> for all invocations in the last K windows.
> 
> __changes overview__:
> - Introduced `SlidingTimedStats` to measure percentiles on all @Timed method 
> calls
> - Comments added for a better way to calculate ratio based on two rates (num 
> and denom) in `SlidingStats`.
> - Tests added
> 
> _sample new stat:_
> attribute_store_fetch_one_10_0_percentile 0.0
> attribute_store_fetch_one_50_0_percentile 0.0
> attribute_store_fetch_one_90_0_percentile 0.0
> attribute_store_fetch_one_99_0_percentile 0.0
> attribute_store_fetch_one_99_9_percentile 0.0
> attribute_store_fetch_one_99_99_percentile 0.0
> 
> 
> Diffs
> -----
> 
>   commons/src/main/java/org/apache/aurora/common/inject/TimedInterceptor.java 
> a4c4240b67cafb108ca587ad86d9fb9d9bb1553d 
>   commons/src/main/java/org/apache/aurora/common/stats/SlidingStats.java 
> f7a5ae41e307627fc55157758e9b7cdd861c3268 
>   commons/src/main/java/org/apache/aurora/common/stats/SlidingTimedStats.java 
> PRE-CREATION 
>   
> commons/src/test/java/org/apache/aurora/common/stats/SlidingTimedStatsTest.java
>  PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/53965/diff/
> 
> 
> Testing
> -------
> 
> SlidingTimedStatsTest
> > passed
> 
> Seems like gradle is not set to run tests on `common`. Tests passed in IDEA.
> 
> 
> Thanks,
> 
> Reza Motamedi
> 
>

Re: Review Request 53965: track percentile values for timed sliding stats

Reply via email to