Asher, do we know what our numbers are now? That's probably a pretty good baseline to start the discussion from.
> - p99 banner request latency of 80ms

Fundraising banners? Measured from the start of page load, or is this specifically how fast our API requests run?

On the topic of APIs: we should set similar perf goals for requests to the API and for jobs. This gets very subjective though, because now we're talking about CPU time, memory usage, disk usage, cache key space usage -- are these in your scope, or are we simply starting the discussion with response times?

Further down the road, consistency is going to be important (my box will profile differently than someone else's), so this seems like a good candidate for 'yet another' continuous integration test. I can easily see us getting an initial feel for response times in the CI environment (a rough sketch of what such a check might look like is at the bottom of this mail). Or maybe we should just continuously hammer the alpha/beta servers...

On deployment though: currently the only way I know of to see whether something is performing well is to look directly at graphite. Can icinga (or something) alert us, presumably via email? Ideally we would be able to set up new metrics as we go (obviously starting with global page loads, but maybe I want to keep an eye on banner render time). I would love to get an email when something I've deployed is under-performing -- also sketched at the bottom.

~Matt Walker
Wikimedia Foundation
Fundraising Technology Team

On Thu, Mar 21, 2013 at 6:40 PM, Asher Feldman <afeld...@wikimedia.org> wrote:

> I'd like to push for a codified set of minimum performance standards that
> new mediawiki features must meet before they can be deployed to larger
> wikimedia sites such as English Wikipedia, or be considered complete.
>
> These would look like (numbers pulled out of a hat, not actual
> suggestions):
>
> - p999 (long tail) full page request latency of 2000ms
> - p99 page request latency of 800ms
> - p90 page request latency of 150ms
> - p99 banner request latency of 80ms
> - p90 banner request latency of 40ms
> - p99 db query latency of 250ms
> - p90 db query latency of 50ms
> - 1000 write requests/sec (if applicable; write operations must be free
>   from concurrency issues)
> - guidelines about degrading gracefully
> - specific limits on total resource consumption across the stack per
>   request
> - etc.
>
> Right now, varying amounts of effort are made to highlight potential
> performance bottlenecks in code review, and engineers are encouraged to
> profile and optimize their own code. But beyond "is the site still up for
> everyone / are users complaining on the village pump / am I ranting in
> irc", we've offered no guidelines as to what sort of request latency is
> reasonable or acceptable. If a new feature (like aftv5, or flow) turns out
> not to meet perf standards after deployment, that would be a high-priority
> bug, and the feature may be disabled depending on the impact or if it is
> not addressed in a reasonable time frame. Obviously standards like this
> can't be applied to certain existing parts of mediawiki, but systems other
> than the parser or preprocessor that don't meet the new standards should
> at least be prioritized for improvement.
>
> Thoughts?
>
> Asher
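For the CI thought above, here's a very rough sketch of what a response-time check could look like -- the URL, sample count, and thresholds are made up, purely to illustrate computing p90/p99 and failing the build when a target is exceeded (Python, stdlib only):

#!/usr/bin/env python
# Rough sketch of a CI response-time check; URL and limits are hypothetical.
import math
import sys
import time
import urllib.request

URL = "http://localhost/wiki/Main_Page"   # hypothetical test target
SAMPLES = 200
P90_LIMIT_MS = 150.0
P99_LIMIT_MS = 800.0

def percentile(sorted_ms, p):
    # Nearest-rank percentile over an already-sorted list of samples.
    k = max(0, int(math.ceil(p / 100.0 * len(sorted_ms))) - 1)
    return sorted_ms[k]

def main():
    latencies_ms = []
    for _ in range(SAMPLES):
        start = time.monotonic()
        urllib.request.urlopen(URL).read()
        latencies_ms.append((time.monotonic() - start) * 1000.0)
    latencies_ms.sort()
    p90 = percentile(latencies_ms, 90)
    p99 = percentile(latencies_ms, 99)
    print("p90=%.1fms p99=%.1fms over %d requests" % (p90, p99, SAMPLES))
    # A non-zero exit fails the CI job.
    sys.exit(0 if p90 <= P90_LIMIT_MS and p99 <= P99_LIMIT_MS else 1)

if __name__ == "__main__":
    main()

The numbers would only mean something if the CI boxes profile consistently run to run -- which is the consistency concern above.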
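And for alerting after deployment: icinga can run arbitrary check plugins, so something along these lines could poll graphite's render API and flag a regression -- the graphite host, metric name, and limit below are invented for illustration, but format=json is the real render API output:

#!/usr/bin/env python
# Sketch of a check script icinga (or cron + mail) could run against graphite.
# Host, metric, and limit are hypothetical; exit codes follow the usual
# nagios/icinga convention (0=OK, 2=CRITICAL, 3=UNKNOWN).
import json
import sys
import urllib.request

GRAPHITE = "http://graphite.example.org"        # hypothetical host
METRIC = "frontend.banner.render_time.p99"      # hypothetical metric name
LIMIT_MS = 80.0

def latest_value(metric):
    url = "%s/render?target=%s&from=-15min&format=json" % (GRAPHITE, metric)
    series = json.loads(urllib.request.urlopen(url).read().decode("utf-8"))
    # Each series has "datapoints": [[value, timestamp], ...]; value may be null.
    points = [v for v, _ in series[0]["datapoints"] if v is not None]
    return points[-1] if points else None

def main():
    value = latest_value(METRIC)
    if value is None:
        print("UNKNOWN - no recent datapoints for %s" % METRIC)
        sys.exit(3)
    if value > LIMIT_MS:
        print("CRITICAL - %s = %.1fms (limit %.1fms)" % (METRIC, value, LIMIT_MS))
        sys.exit(2)
    print("OK - %s = %.1fms" % (METRIC, value))
    sys.exit(0)

if __name__ == "__main__":
    main()

Adding a new metric to watch (say, banner render time) would then just be another target and threshold.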