hoo added a comment.

In https://phabricator.wikimedia.org/T123867#1961241, @jcrespo wrote:

> I suppose it is possible, there is a spike of 26 seconds on lag on db1045, 
> but probably only for a few seconds. But it is not s5 going read only- it is 
> the API going read only because "too much lag" on db1045. It is a protection 
> measure. It means it is working as intended. We can minimize this happening 
> (we mentioned having a second API server, when we get the hardware), avoid 
> lag by fixing mediawiki's queries, but this is not an error that should not 
> happen- it is meant to force users to retry and not saturate the servers.


The API only goes read-only if more than half of the servers are lagged by 
more than 5s. That really should not happen unless there actually are too many 
writes. I agree that it's OK for that to happen sometimes, but it shouldn't 
happen often.
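
A minimal sketch of the rule as described above, in Python rather than 
MediaWiki's PHP. The function name, the default threshold argument and the 
example lag values are assumptions for illustration, not the actual 
implementation:

```python
def api_should_be_read_only(lags_seconds, max_lag_threshold=5.0):
    """Return True if more than half of the servers exceed the lag threshold.

    lags_seconds: one replication lag value (in seconds) per server.
    max_lag_threshold: hypothetical stand-in for the 5s limit mentioned above.
    """
    if not lags_seconds:
        # No servers reported: nothing to base a read-only decision on.
        return False
    lagged = sum(1 for lag in lags_seconds if lag > max_lag_threshold)
    # "More than half" is a strict majority, so exactly half does not count.
    return lagged > len(lags_seconds) / 2


# Hypothetical lag samples: one lagged server out of three, then two out of three.
print(api_should_be_read_only([26, 0, 0]))
print(api_should_be_read_only([26, 6, 0]))
```

Under this reading, a single server lagging by 26s would not trip the check 
on its own if the other two servers were below the threshold.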

To actually know how often this happens, I would like to get 
https://gerrit.wikimedia.org/r/264595 merged and deployed. Please review.

Maybe we should increase `APIMaxLagThreshold` to 7s or even 10s? To be able 
to make such choices on an informed basis, we might also want to log the lag 
times whenever the API goes read-only.
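
A rough sketch of what such logging could make possible, again in Python and 
not based on the actual MediaWiki code. The logger name, function names and 
server names are all made up for illustration, and the "more than half" rule 
is assumed as described earlier:

```python
import logging

logger = logging.getLogger("api_lag_sketch")  # hypothetical logger name


def would_go_read_only(lags_by_server, threshold):
    """True if more than half of the servers lag by more than `threshold` seconds."""
    lagged = [lag for lag in lags_by_server.values() if lag > threshold]
    return len(lagged) > len(lags_by_server) / 2


def log_if_read_only(lags_by_server, threshold=5):
    """Log every server's lag when the rule fires, so thresholds can be compared later."""
    fired = would_go_read_only(lags_by_server, threshold)
    if fired:
        logger.warning("API read-only (threshold=%ss): lags=%s", threshold, lags_by_server)
    return fired


def thresholds_that_fire(lags_by_server, candidates=(5, 7, 10)):
    """Given one logged sample, list the candidate thresholds that would still fire."""
    return [t for t in candidates if would_go_read_only(lags_by_server, t)]


# Hypothetical logged sample: two of three servers slightly over 5s.
sample = {"db-a": 8, "db-b": 6, "db-c": 0}
log_if_read_only(sample)
print(thresholds_that_fire(sample))
```

With lag values logged per event like this, one could check after the fact 
how many read-only events a 7s or 10s threshold would have avoided.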


TASK DETAIL
  https://phabricator.wikimedia.org/T123867

To: hoo
Cc: Aklapper, StudiesWorld, aaron, daniel, aude, Lydia_Pintscher, Multichill, 
jcrespo, hoo, Wikidata-bugs, Mbch331, Krenair


