Asher,

Do we know what our numbers are now? That's probably a good baseline to
start the discussion from.

p99 banner request latency of 80ms

Fundraising banners? Is that measured from the start of page load, or is it
specifically how fast our API requests run?
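
For concreteness, here's a minimal sketch of how a p90/p99 could be computed
from per-request timings, whichever measurement window we settle on. The
timing samples below are made up purely for illustration:

# Minimal sketch: computing p90/p99 from per-request timings in milliseconds.
# Assumes we already collect samples somewhere (Graphite, a log file, etc.);
# the numbers below are generated just to show the calculation.
import math
import random

def percentile(samples, p):
    """Return the p-th percentile (0-100) of a list of samples, nearest-rank."""
    ordered = sorted(samples)
    rank = max(0, int(math.ceil(p / 100.0 * len(ordered))) - 1)
    return ordered[rank]

# Fake banner request timings, in ms.
timings = [random.lognormvariate(3.5, 0.5) for _ in range(10000)]

print("p90: %.1fms" % percentile(timings, 90))
print("p99: %.1fms" % percentile(timings, 99))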

On the topic of APIs: we should set similar perf goals for requests to the
API and for jobs. This gets much more subjective, though, because now we're
also talking about CPU time, memory usage, disk usage, and cache key space
usage -- are those in your scope, or are we simply starting the discussion
with response times?
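
Purely as an illustration of what measuring a per-request resource budget
could look like in a profiling or test harness (not a real MediaWiki hook;
the "request" below is a stand-in):

# Illustrative only: wrap one "request" and report wall time, CPU time, and
# peak RSS. Unix-only, since it relies on the resource module.
import resource
import time

def measure(func, *args, **kwargs):
    """Run func and report wall time, CPU time, and peak RSS."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    start = time.time()
    result = func(*args, **kwargs)
    wall = time.time() - start
    after = resource.getrusage(resource.RUSAGE_SELF)
    cpu = (after.ru_utime + after.ru_stime) - (before.ru_utime + before.ru_stime)
    # ru_maxrss is kilobytes on Linux; it's peak usage, not a precise delta.
    return result, {"wall_s": wall, "cpu_s": cpu, "peak_rss_kb": after.ru_maxrss}

def fake_request():
    # Stand-in for "handle one API request".
    return sum(i * i for i in range(100000))

_, stats = measure(fake_request)
print(stats)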

Further down the road, consistency is going to be important (my box will
profile differently than someone else's), so this seems like a good
candidate for 'yet another' continuous integration test. I can easily see
us getting an initial feel for response times in the CI environment. Or
maybe we should just continuously hammer the alpha/beta servers...
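
Something like the following rough sketch could act as a CI latency gate;
the endpoint, sample count, and p99 budget are placeholders, not proposed
numbers:

#!/usr/bin/env python
# Rough sketch of a CI-style latency gate: hammer one endpoint, fail the
# build if p99 exceeds a budget. URL, sample count, and threshold are
# placeholders. Python 2 (urllib2); use urllib.request on Python 3.
import sys
import time
import urllib2

URL = "http://beta.example.org/wiki/Main_Page"  # hypothetical beta endpoint
SAMPLES = 200
P99_BUDGET_MS = 800.0

def timed_get(url):
    start = time.time()
    urllib2.urlopen(url).read()
    return (time.time() - start) * 1000.0

timings = sorted(timed_get(URL) for _ in range(SAMPLES))
p99 = timings[int(len(timings) * 0.99) - 1]

print("p99 over %d requests: %.1fms (budget %.1fms)" % (SAMPLES, p99, P99_BUDGET_MS))
sys.exit(0 if p99 <= P99_BUDGET_MS else 1)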

On deployment, though: currently the only way I know of to see whether
something is performing well is to look directly at Graphite. Could Icinga
(or something else) alert us, presumably via email? Ideally we would be able
to set up new metrics as we go (obviously starting with global page loads,
but maybe I want to keep an eye on banner render time). I would love to get
an email when something I've deployed is under-performing.
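
A check in the Icinga/Nagios style could be as simple as the sketch below,
which polls Graphite's render API and exits CRITICAL when a metric crosses a
threshold; the Graphite host and metric name here are made up:

#!/usr/bin/env python
# Sketch of an Icinga/Nagios-style check: query Graphite's render API for a
# metric and exit CRITICAL if the recent average is over a threshold.
# Icinga would then turn the CRITICAL state into an email notification.
import json
import sys
import urllib2  # Python 2; urllib.request on Python 3

GRAPHITE = "http://graphite.example.org"   # placeholder host
METRIC = "banners.render_time.p99"         # hypothetical metric name
THRESHOLD_MS = 80.0

url = "%s/render?target=%s&from=-15min&format=json" % (GRAPHITE, METRIC)
series = json.loads(urllib2.urlopen(url).read())

values = [v for v, ts in series[0]["datapoints"] if v is not None]
if not values:
    print("UNKNOWN: no datapoints for %s" % METRIC)
    sys.exit(3)

avg = sum(values) / len(values)
if avg > THRESHOLD_MS:
    print("CRITICAL: %s avg %.1fms > %.1fms" % (METRIC, avg, THRESHOLD_MS))
    sys.exit(2)
print("OK: %s avg %.1fms" % (METRIC, avg))
sys.exit(0)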

~Matt Walker
Wikimedia Foundation
Fundraising Technology Team


On Thu, Mar 21, 2013 at 6:40 PM, Asher Feldman <afeld...@wikimedia.org> wrote:

> I'd like to push for a codified set of minimum performance standards that
> new mediawiki features must meet before they can be deployed to larger
> wikimedia sites such as English Wikipedia, or be considered complete.
>
> These would look like (numbers pulled out of a hat, not actual
> suggestions):
>
> - p999 (long tail) full page request latency of 2000ms
> - p99 page request latency of 800ms
> - p90 page request latency of 150ms
> - p99 banner request latency of 80ms
> - p90 banner request latency of 40ms
> - p99 db query latency of 250ms
> - p90 db query latency of 50ms
> - 1000 write requests/sec (if applicable; write operations must be free
> from concurrency issues)
> - guidelines about degrading gracefully
> - specific limits on total resource consumption across the stack per
> request
> - etc..
>
> Right now, varying amounts of effort are made to highlight potential
> performance bottlenecks in code review, and engineers are encouraged to
> profile and optimize their own code.  But beyond "is the site still up for
> everyone / are users complaining on the village pump / am I ranting in
> irc", we've offered no guidelines as to what sort of request latency is
> reasonable or acceptable.  If a new feature (like aftv5, or flow) turns out
> not to meet perf standards after deployment, that would be a high priority
> bug and the feature may be disabled depending on the impact, or if not
> addressed in a reasonable time frame.  Obviously standards like this can't
> be applied to certain existing parts of mediawiki, but systems other than
> the parser or preprocessor that don't meet new standards should at least be
> prioritized for improvement.
>
> Thoughts?
>
> Asher
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
