EBernhardson added a comment.
Usually the first stop for this kind of error is the `ATS Backends <-> Origin Servers Overview` dashboard, which suggests a low rate of 5xxs: typically 1-5% of requests fail. In a quick review of the last few requests that returned a 500 on one of the servers, all of them were malformed queries. We may need to look into more specific timespans rather than the generic 500 errors.

I modified one of the dashboard queries[1] to return the success rate per 15 minutes, ran it against thanos to cover all DCs, and looked for periods of low success. The following time periods should be reviewed:

2022-04-16T17:30-18:10
2022-04-17T08:30-10:00
2022-04-22T16:26-17:12
2022-04-22T19:09-19:36
2022-04-26T16:20-17:37
2022-05-04T19:50-21:42

If this turns up the problem, we could consider how to turn it into an alert.

[1]
  sum(increase(trafficserver_backend_requests_seconds_count{status=~"2[0-9][0-9]", cluster=~"cache_text", backend=~"wcqs\\.discovery\\.wmnet"}[15m])) by (backend)
  /
  sum(increase(trafficserver_backend_requests_seconds_count{status=~"[25][0-9][0-9]", cluster=~"cache_text", backend=~"wcqs\\.discovery\\.wmnet"}[15m])) by (backend)

TASK DETAIL
https://phabricator.wikimedia.org/T306899
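The scan for periods of low success described in the comment could be sketched roughly as follows. This is only an illustration, not the procedure actually used: it assumes the result of query [1] has been fetched as a range query and parsed from JSON in the matrix shape returned by the Prometheus HTTP API (which Thanos Query also serves). The function names and the 0.9 threshold are hypothetical.

```python
from datetime import datetime, timezone

STEP_SECONDS = 15 * 60  # assumed query step, matching the [15m] window in [1]
THRESHOLD = 0.9         # hypothetical cut-off; the comment does not name one


def low_success_windows(response, threshold=THRESHOLD, step=STEP_SECONDS):
    """Return (backend, first_ts, last_ts) for runs of low-success samples.

    `response` is a parsed range-query result with resultType "matrix".
    first_ts and last_ts are the timestamps of the first and last sample
    in each run of consecutive samples below `threshold`.
    """
    windows = []
    for series in response["data"]["result"]:
        backend = series["metric"].get("backend", "")
        run_start = None  # timestamp where the current low run began
        prev_low = None   # timestamp of the previous low sample, if any
        for ts, value in series["values"]:
            ratio = float(value)     # sample values arrive as strings
            low = ratio < threshold  # NaN compares False, so it is not low
            contiguous = prev_low is not None and ts - prev_low <= step
            # Close the current run on a healthy sample or a gap in the data.
            if run_start is not None and (not low or not contiguous):
                windows.append((backend, run_start, prev_low))
                run_start = None
            if low and run_start is None:
                run_start = ts
            prev_low = ts if low else None
        if run_start is not None:
            windows.append((backend, run_start, prev_low))
    return windows


def fmt(ts):
    """Format a Unix timestamp as a UTC minute, e.g. 2022-04-16T17:30."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M")
```

Each returned tuple can be printed with `fmt(first_ts)` and `fmt(last_ts)` to get candidate time periods in the same style as the list in the comment. A real alert would more likely be a Prometheus alerting rule on the same ratio than a script like this.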