Hi Samuel,
My name is Peter and I work on the performance team. I also read the post and 
found it interesting. Our performance metrics are available in Grafana; a good 
starting point is the Performance Summary dashboard: 
https://grafana.wikimedia.org/d/cZgMg49Wz/performance-summary. We have many 
dashboards but lack documentation, so please ask and I can guide you.

We collect performance metrics directly from our users. We also run synthetic 
browser tests every X hours, where we record a video of the browser screen and 
collect visual metrics, and we run some tests on commits.

The largest piece of research we've done on this is Gilles's study of the 
correlation between what users perceive and browser metrics: 
https://techblog.wikimedia.org/2019/06/17/performance-perception-correlation-to-rum-metrics/
along with the paper: https://nonsns.github.io/paper/rossi19www.pdf.

For regressions, I've gone down the same path as the people at Netflix, trying 
different numbers of runs and taking the median/fastest/slowest run etc. to find 
more "stable" metrics. We don't use memory usage as a proxy for performance; we 
focus more on visual metrics for the users, and for us three runs are not 
enough. We do 5-11 runs depending on what we test. I haven't blogged about that 
work, but it should be in some Phabricator tasks; I can look them up if you are 
interested. What is also interesting is how small a regression you can actually 
detect in practice. In our best-tuned systems I think we can find performance 
regressions slightly over 2%, but in other parts a regression needs to be 
10-20% before we get alerts.
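To make the run-count and threshold discussion above concrete, here is a minimal 
sketch of that kind of check: take the median of several runs for a baseline and 
a candidate, compute the relative change, and alert when it exceeds a static 
threshold. The function names, the 2% default and the timings are hypothetical 
illustrations, not our actual tooling, and it assumes a metric where lower is 
better (e.g. a visual metric in milliseconds).

```python
from statistics import median
from typing import Sequence


def relative_change(baseline_runs: Sequence[float], new_runs: Sequence[float]) -> float:
    """Relative change of the median of new_runs vs the median of baseline_runs."""
    base = median(baseline_runs)
    new = median(new_runs)
    return (new - base) / base


def is_regression(baseline_runs: Sequence[float], new_runs: Sequence[float],
                  threshold: float = 0.02) -> bool:
    """Hypothetical static-threshold check: flag a regression when the median
    got slower by more than `threshold` (0.02 == 2%). Assumes lower is better."""
    return relative_change(baseline_runs, new_runs) > threshold


# Hypothetical timings (ms) from 7 runs each, within the 5-11 run range.
baseline = [1210, 1195, 1230, 1202, 1188, 1250, 1199]
candidate = [1240, 1236, 1262, 1229, 1245, 1301, 1233]

print(f"{relative_change(baseline, candidate):.1%}", is_regression(baseline, candidate))
# prints: 3.2% True
```

The median is used because a single noisy run (like the 1301 ms one above) 
barely moves it, which is why more runs give a more stable number; with a 10% 
threshold the same data would not trigger an alert.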

I wrote a blog post a couple of years ago about one regression: 
https://techblog.wikimedia.org/2018/10/03/best-friends-forever/

I like the use of anomaly detection; we discussed it in the team some time ago 
but haven't tried it out. Today we mostly use static thresholds in some form. I 
think an anomaly detection tool would be something many teams could use.

I really like that they keep statistics on false alerts etc. We don't have that 
today, but we should. I started tracking them manually, but, well, I failed :)

Best
Peter Hedenskog
_______________________________________________
Wiki-research-l mailing list -- [email protected]
To unsubscribe send an email to [email protected]
