> On Jun 12, 2017, at 4:54 PM, Pavol Vaskovic <p...@pali.sk> wrote:
> 
> 
> 
> On Mon, Jun 12, 2017 at 11:55 PM, Michael Gottesman <mgottes...@apple.com> wrote:
> 
> The current design assumes that in such cases, the workload will be increased 
> so that is not an issue.
> 
> I understand. But clearly some part of our process is failing, because there 
> have been multiple benchmarks in the 10 ms range in the tree for months 
> without anyone fixing this.

I think that is just inertia and being busy. Patch? I'll review = ).

>  
> The reason why we use the min is that statistically we are not interested in 
> estimating the "mean" or "center" of the distribution. Rather, we are actually 
> interested in the "speed of light" of the computation, implying that we are 
> looking for the min.
> 
> I understand that. But all measurements have a certain degree of error 
> associated with them. Our issue is two-fold: we need to differentiate between 
> normal variation between measured samples under "perfect" conditions and 
> samples that are worse because of interference from other background 
> processes.

I disagree. CPUs are inherently messy but disruptions tend to be due to 
temporary spikes most of the time once you have quieted down your system by 
unloading a few processes.
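
To illustrate the point about temporary spikes, here is a minimal sketch in Python. The function name `summarize` and the sample values are made up for illustration and are not taken from the benchmark suite. It shows that a single spike moves the mean but leaves the min untouched.

```python
import statistics

def summarize(samples):
    """Return (min, mean) of a list of runtimes. Hypothetical helper."""
    return min(samples), statistics.mean(samples)

# Made-up runtimes in milliseconds on a quiet machine.
quiet = [10.1, 10.0, 10.2, 10.1]
# The same run with one sample disturbed by a background process.
spiked = quiet + [25.0]

print(summarize(quiet))
print(summarize(spiked))
```

The min is the same for both lists, while the mean of the spiked list is pulled upward by the one disturbed sample.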

>  
> What do you mean by anomalous results?
> 
> I mean results that significantly stand out from the measured sample 
> population.

What that could mean is that we need to run a couple of extra iterations to 
warm up the CPU, caches, etc. before we start gathering samples.
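
A rough sketch of that warm-up idea in Python follows. The name `sample_runtimes` and the default counts are assumptions for illustration; this is not how Benchmark_Driver or DriverUtil.swift currently work.

```python
import time

def sample_runtimes(fn, warmup=3, samples=10):
    """Call fn `warmup` times untimed, then return `samples` timed runs (seconds).

    Hypothetical helper; the default counts are arbitrary.
    """
    # Untimed iterations to warm up caches, branch predictors, and so on.
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return times
```

For example, `min(sample_runtimes(lambda: sum(range(100000))))` would report the fastest of ten timed runs after three discarded ones.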

> 
>> Currently I'm working on an improved sample filtering algorithm. Stay tuned for 
>> a demonstration in Benchmark_Driver (Python); if it pans out, it might be time 
>> to change adaptive sampling in DriverUtil.swift.
> 
> Have you looked at using the Mann-Whitney U algorithm? (I am not sure if we 
> are using it or not)
> 
> I don't know what that is.

Check it out: https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test. It is a 
non-parametric test of whether two sets of samples come from the same 
distribution. As a bonus, it does not assume that our data is from a normal 
distribution (a problem with using mean/standard deviation, which assumes a 
normal distribution).

We have been using Mann-Whitney internally for a while successfully to reduce 
the noise.
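
For reference, a minimal pure-Python sketch of the test. It is not the internal implementation mentioned above. It assumes both sample lists are non-empty, uses the normal approximation for the p-value, and applies no tie correction to the variance, so it is only rough for small or heavily tied samples. The function name `mann_whitney_u` is made up.

```python
import math

def mann_whitney_u(a, b):
    """Return (U, two-sided p-value) for two non-empty sample lists.

    Sketch only: normal approximation, no tie correction.
    """
    n1, n2 = len(a), len(b)
    # Tag each value with its group (0 for a, 1 for b) and sort by value.
    combined = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        # Find the run of tied values starting at i; give them the average rank.
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    # Rank sum of the first group gives U1; U2 follows from U1 + U2 = n1 * n2.
    r1 = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2.0
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    # Normal approximation of U under the null hypothesis.
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p
```

A small p-value suggests the two sets of samples (say, master and branch) differ; a large one means the difference is indistinguishable from noise at this sample size.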

> Here's what I've been doing:
> 
> Depending on the "weather" on the test machine, you sometimes measure 
> anomalies. So I'm tracking the coefficient of variation of the sample 
> population and purging anomalous results (1 sigma from max) when it exceeds 
> 5%. This results in a quite solid sample population where standard deviation is 
> a meaningful value that can be used in judging the significance of a change 
> between master and branch.
> 
> --Pavol
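
One possible reading of the filtering described above, sketched in Python. This is an interpretation, not Pavol's actual code: "1 sigma from max" is assumed to mean dropping every sample within one standard deviation of the current maximum, repeated until the coefficient of variation is at most 5%. The name `purge_anomalies` and its parameter are hypothetical.

```python
import statistics

def purge_anomalies(samples, max_cv=0.05):
    """Drop high outliers until stdev / mean <= max_cv. Hypothetical sketch."""
    kept = list(samples)
    while len(kept) > 2:
        mean = statistics.mean(kept)
        sd = statistics.stdev(kept)
        if mean == 0 or sd / mean <= max_cv:
            break
        # Assumed meaning of "1 sigma from max": discard anything above max - sd.
        cutoff = max(kept) - sd
        trimmed = [s for s in kept if s <= cutoff]
        # Stop rather than filter down to fewer than two samples.
        if len(trimmed) < 2 or len(trimmed) == len(kept):
            break
        kept = trimmed
    return kept
```

For instance, with the made-up samples `[100, 101, 99, 100, 150]` this drops only the 150 and keeps the other four.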

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev
