EBernhardson added a comment.

It seems there are a couple of options here. My thoughts:

Reduce the default refresh interval

By default we use a 30 second refresh interval for all wikis. This means that 30 seconds' worth of updates get bundled together into a single update, and updates are not searchable until they have been refreshed. While it may not be particularly important at the level of an individual wiki, across 9k shards in the cluster this saves us considerable IO. We could potentially reduce the refresh interval for wikidata only to 5 seconds, or maybe even 1 second (the elasticsearch default). Estimating the effect this has on the cluster is difficult, but my gut feeling is that a 5 second refresh for only the 21 wikidatawiki_content shards would probably go unnoticed.

It looks like our data collection around refresh is busted; graphite has intermittent data for some reason. Our indexing rate is fairly constant through the day though, so I pulled some numbers directly for 2 minutes' worth of activity (at ~23:20 UTC) and saw 19 refreshes/second (1157/minute) across the full eqiad cluster. Worst case, moving wikidatawiki_content with its 21 shards from 30s to 5s would take it from the current 0.7/sec (42/minute) to 4.2/sec (252/minute), an extra 3.5/sec or about an 18% increase in the cluster-wide refresh rate. Likely not every shard is refreshed at every opportunity.
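The worst-case arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes every shard refreshes once per interval, which overstates the real rate.

```python
# Figures taken from the comment above; the helper names are hypothetical.
CLUSTER_REFRESHES_PER_SEC = 1157 / 60   # observed across eqiad, ~19.3/sec
WIKIDATA_SHARDS = 21                    # wikidatawiki_content shards
CURRENT_INTERVAL_S = 30


def refreshes_per_second(shards: int, interval_s: float) -> float:
    """Upper bound: every shard refreshes exactly once per interval."""
    return shards / interval_s


def cluster_increase_pct(new_interval_s: float) -> float:
    """Worst-case growth of the cluster-wide refresh rate, in percent."""
    current = refreshes_per_second(WIKIDATA_SHARDS, CURRENT_INTERVAL_S)
    proposed = refreshes_per_second(WIKIDATA_SHARDS, new_interval_s)
    return (proposed - current) / CLUSTER_REFRESHES_PER_SEC * 100


for interval in (5, 1):
    rate = refreshes_per_second(WIKIDATA_SHARDS, interval)
    pct = cluster_increase_pct(interval)
    print(f"{interval}s interval: {rate:.1f} refreshes/sec, +{pct:.0f}% cluster-wide")
# 5s interval: 4.2 refreshes/sec, +18% cluster-wide
# 1s interval: 21.0 refreshes/sec, +105% cluster-wide
```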

Force refreshes from the cirrus codebase

Rather than increasing the default refresh rate, we could explicitly issue refreshes in the limited cases where we know it's important. Conceptually (I may simply not have thought about it enough), de-bouncing these to keep from issuing 100 refreshes in the same second seems non-trivial. We could certainly throttle the refreshes, but ensuring one still happens after the throttle time runs out might not be so easy.
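For illustration, here is roughly what a throttle with a guaranteed trailing refresh looks like inside a single long-lived process. This is a Python sketch, not cirrus code: `RefreshCoalescer` and `do_refresh` are hypothetical names, and it relies on shared in-memory state. It does not address the harder case where the requests come from many independent processes with nothing shared between them.

```python
import threading
import time
from typing import Callable, Optional


class RefreshCoalescer:
    """Collapse bursts of refresh requests into at most one call per interval.

    Hypothetical single-process sketch; `do_refresh` stands in for whatever
    would actually issue the refresh.
    """

    def __init__(self, do_refresh: Callable[[], None], min_interval_s: float) -> None:
        self._do_refresh = do_refresh
        self._min_interval_s = min_interval_s
        self._lock = threading.Lock()
        self._timer: Optional[threading.Timer] = None
        self._last_run = float("-inf")  # no refresh has run yet

    def request(self) -> None:
        """Ask for a refresh; an already pending one covers this request."""
        with self._lock:
            if self._timer is not None:
                return  # a refresh is already scheduled
            # Run immediately if the throttle window has passed, otherwise
            # schedule the refresh for the moment the window expires.
            wait = max(0.0, self._last_run + self._min_interval_s - time.monotonic())
            self._timer = threading.Timer(wait, self._fire)
            self._timer.daemon = True
            self._timer.start()

    def _fire(self) -> None:
        with self._lock:
            # Clear the pending marker before refreshing, so a request that
            # arrives during the refresh schedules a follow-up one.
            self._timer = None
            self._last_run = time.monotonic()
        self._do_refresh()
```

Calling `request()` 100 times in the same second results in one or two calls to `do_refresh`, and a request that arrives inside the throttle window is still served once the window expires.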

Best option?

In general I think I'm in favor of the less complicated, and likely more robust, solution: adjusting the refresh interval for the wikidatawiki_content index down to 5s. Based on the current rates I think this will be reasonable. We can test on the codfw cluster first, which receives all the same updates as eqiad. If 5s isn't fast enough I would have to rethink the forced refreshes, as the worst case of a 1s refresh (21 refreshes/sec from wikidatawiki_content alone) would roughly double the refresh rate across the cluster. That might also be acceptable, but it's a little harder to guesstimate.
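As a rough sketch of what the per-index change amounts to at the Elasticsearch level, the snippet below builds (but does not send) an update-index-settings request that sets `index.refresh_interval`. The host, the helper name and the use of `wikidatawiki_content` as the literal index name are assumptions for illustration; in practice the setting would presumably be managed through the search configuration rather than a hand-issued request.

```python
import json
import urllib.request


def build_refresh_interval_request(
    base_url: str, index: str, interval: str
) -> urllib.request.Request:
    """Build a PUT /<index>/_settings request; nothing is sent here."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed host and index name, for illustration only.
req = build_refresh_interval_request(
    "http://localhost:9200", "wikidatawiki_content", "5s"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# PUT http://localhost:9200/wikidatawiki_content/_settings
# {"index": {"refresh_interval": "5s"}}
```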


TASK DETAIL
https://phabricator.wikimedia.org/T183053
