https://bugzilla.wikimedia.org/show_bug.cgi?id=56882
Tim Starling <tstarl...@wikimedia.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|High                        |Unprioritized

--- Comment #5 from Tim Starling <tstarl...@wikimedia.org> ---
Here's an aggregation of error messages in the current memcached-serious.log:

      1 ITEM TOO BIG
   2081 CONNECTION FAILURE
8199483 SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY
      9 SERVER ERROR
    101 A TIMEOUT OCCURRED
      3 A BAD KEY WAS PROVIDED/CHARACTERS OUT OF RANGE

If you exclude TIMED RETRY messages, you get only 27 different appservers,
whereas there are 387 appservers in the log overall. CONNECTION FAILURE and
A TIMEOUT OCCURRED can lead to a flood of TIMED RETRY messages afterwards, but
they obviously can't account for most of the TIMED RETRY messages. In fact, all
2081 CONNECTION FAILURE messages in this log came from snapshot3.

So we need to look for unlogged causes of servers being marked for timed retry.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
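The two numbers in comment #5 (per-message counts, and the number of distinct appservers once TIMED RETRY messages are excluded) come from a simple aggregation over the log. The following is a minimal Python sketch of that kind of aggregation. It assumes a hypothetical tab-separated line format `<timestamp>\t<appserver>\t<message>`; the real layout of memcached-serious.log is not shown in this bug, so `parse_line` would need adapting. The helper names (`parse_line`, `aggregate`) and the sample host names are illustrative only.

```python
from collections import Counter

# The message that floods the log; excluded when counting "other" appservers.
TIMED_RETRY = "SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY"


def parse_line(line):
    """Return (appserver, message) for one log line, or None if it doesn't match.

    ASSUMPTION: hypothetical format "<timestamp>\t<appserver>\t<message>".
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        return None
    _timestamp, appserver, message = fields
    return appserver, message.strip()


def aggregate(lines):
    """Count messages by type and collect distinct appservers.

    Returns (counts, all_hosts, non_retry_hosts), where non_retry_hosts only
    contains appservers that logged at least one non-TIMED-RETRY message.
    """
    counts = Counter()
    all_hosts = set()
    non_retry_hosts = set()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines that don't fit the assumed format
        host, message = parsed
        counts[message] += 1
        all_hosts.add(host)
        if message != TIMED_RETRY:
            non_retry_hosts.add(host)
    return counts, all_hosts, non_retry_hosts


if __name__ == "__main__":
    # Made-up sample lines in the assumed format, for demonstration only.
    sample = [
        "2013-11-10T12:00:00\tapp1\t" + TIMED_RETRY + "\n",
        "2013-11-10T12:00:01\tapp2\tCONNECTION FAILURE\n",
        "2013-11-10T12:00:02\tapp3\t" + TIMED_RETRY + "\n",
    ]
    counts, all_hosts, non_retry_hosts = aggregate(sample)
    # Print in the same "count message" shape as the table in comment #5.
    for message, n in counts.most_common():
        print(f"{n:>7} {message}")
    print(len(non_retry_hosts), "appservers excluding TIMED RETRY,",
          len(all_hosts), "overall")
```

Because `aggregate` accepts any iterable of lines, a real log file object (opened with `with open(path) as f:`) can be passed directly without loading the whole file into memory, which matters for a log with millions of TIMED RETRY lines.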