BBlack added a comment.

The fact that the graph data has fallen off the end of its retention window is a sad commentary on how long this has remained unresolved :(

To recap, some salient points from earlier in this ticket:

Continuing with some things I was saying in IRC the other day: at the "new normal", we're seeing roughly 400 articles/s purged (which is then commonly multiplied out for ?action= URL variants and mobile, ending up at more like ~1600/s actual HTCP packets), whereas the edit rate across all projects is something like 10/s. That 400/s number used to be somewhere south of 100/s before December.
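For clarity, the arithmetic behind those numbers as a trivial sketch (the 4x variant multiplier here is my assumption to match the ~1600/s figure, not a measured constant):

```
<?php
// Back-of-the-envelope fan-out from purged articles to HTCP packets.
$articlePurgeRate   = 400; // articles/s purged at the "new normal"
$variantsPerArticle = 4;   // assumed multiplier: desktop, ?action= variants, mobile
$htcpRate = $articlePurgeRate * $variantsPerArticle; // ~1600 HTCP packets/s
$editRate = 10;            // actual edits/s across all projects
printf( "~%d HTCP/s vs %d edits/s: %.0fx amplification\n",
	$htcpRate, $editRate, $htcpRate / $editRate );
```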

Regardless, the average HTCP rate these days is normally flat-ish (a few scary spikes aside), and is mostly throttled by the jobqueue. The question still remains: what caused the permanent, large bumps in the jobqueue htmlCacheUpdate insertion rate around Dec 4, Dec 11, and Jan 20?

Re: the outstanding patch that's been getting some bumps ( https://gerrit.wikimedia.org/r/#/c/295027/ ) - the situation has evolved since that patch was first uploaded. Our current maximum TTLs are capped at a single day in all cache layers. However, TTLs can still add up across layers if the race to refresh content plays out just right, with the worst theoretical edge case being 4 total days (fetching from ulsfo when eqiad is the primary).
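To illustrate that worst case (a sketch under my assumptions about the layer count; the exact path isn't spelled out above):

```
<?php
// Worst-case TTL stacking, when each layer refreshes just before expiry.
// Assumed path: ulsfo frontend -> ulsfo backend -> codfw backend -> eqiad backend.
$ttlCapPerLayer = 86400; // 1-day cap at every cache layer
$layers = 4;             // cache layers between a ulsfo client and the eqiad applayer
echo ( $ttlCapPerLayer * $layers ) / 86400 . " days worst-case staleness\n"; // 4
```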

Those edge cases are also bounded by the actual Cache-Control (s-)max-age, but since that's still at two weeks (I believe), that bound doesn't really come into play. We should probably look at lowering the mediawiki-config wgSquidMaxAge (and similar settings) to somewhere around 5-7 days, so that it better reflects the reality of the situation on the caches.
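In mediawiki-config terms, that would be something like the following (a sketch only; the setting in MediaWiki itself is $wgSquidMaxage, and the exact value would need discussion):

```
<?php
// Hypothetical wmf-config change: cap the CDN-facing max-age at ~7 days
// instead of the current two weeks.
$wgSquidMaxage = 7 * 86400; // 604800s; use 5 * 86400 for the lower end
```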

We'll eventually get to the point where we've eliminated the corner-case refreshes and can definitively say that the whole cache infrastructure has a hard cap of one full day, but there's more work to do first in T124954 + T50835 (Surrogate-Control).

I think even now, and especially once we reach that point, purging Varnish for mass invalidations like refreshLinks and template changes is starting to not make sense. Those are spooled out over a fairly long asynchronous period anyway; they could simply be updated as the now-short TTLs expire, reserving immediate HTCP invalidation for actual content edits of specific articles (see the sketch below). These kinds of ideas may need to be a separate discussion in another ticket?
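To make the idea concrete, something like this hypothetical gate (function and job-type names are mine for illustration, not MediaWiki's actual API):

```
<?php
// Illustrative-only: send immediate HTCP purges for direct edits to a
// specific article; let mass invalidations age out via the short TTLs.
function shouldSendHtcpPurge( string $jobType ): bool {
	$immediate = [ 'articleEdit', 'articleDelete', 'articleMove' ];
	// htmlCacheUpdate fan-out from template edits, refreshLinks, etc.
	// would instead just wait for the ~1-day TTLs to expire.
	return in_array( $jobType, $immediate, true );
}
```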


TASK DETAIL
https://phabricator.wikimedia.org/T124418
