Joe added a comment.

After some back and forth, we're pretty sure the outage was caused by the changes to the jobchron service on the jobrunners that were released on Saturday via https://gerrit.wikimedia.org/r/#/c/208408/. Once I reverted that change and correctly restarted the jobchron service (which is not mentioned at all on the jobrunners deploy page on wikitech), the contention on redis disappeared.

Earlier, we had tracked the problem to redis maxing out one CPU while blocked in the Lua interpreter, which is probably single-threaded.

TASK DETAIL
  https://phabricator.wikimedia.org/T97930

To: Joe
Cc: Stryn, Joe, Krenair, Steinsplitter, Jianhui67, Lydia_Pintscher, Sjoerddebruin, Romaine, Aklapper, Multichill, Wikidata-bugs, RobH, aude, GWicke, mark, faidon, fgiunchedi, Dzahn, chasemp
