https://bugzilla.wikimedia.org/show_bug.cgi?id=56882

Tim Starling <tstarl...@wikimedia.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|High                        |Unprioritized

--- Comment #5 from Tim Starling <tstarl...@wikimedia.org> ---
Here's an aggregation of error messages in the current memcached-serious.log:

      1  ITEM TOO BIG
   2081  CONNECTION FAILURE
8199483  SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY
      9  SERVER ERROR
    101  A TIMEOUT OCCURRED
      3  A BAD KEY WAS PROVIDED/CHARACTERS OUT OF RANGE
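
A tally like the one above could be produced along these lines. This is only a
sketch: it assumes each log line ends with ": " followed by the error message,
and the sample lines, host names and keys are made up for illustration. The
real layout of memcached-serious.log may differ.

```python
from collections import Counter


def count_messages(lines):
    """Count log lines per error message.

    Assumption: the error message is whatever follows the last ": "
    on the line. Lines without that separator are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        _, sep, message = line.rpartition(": ")
        if sep and message:
            counts[message] += 1
    return counts


# Hypothetical sample lines, not taken from the real log.
sample_lines = [
    '2013-11-12T00:00:01 mw1001 enwiki: Memcached error for key "a" on server "10.0.0.1:11211": A TIMEOUT OCCURRED',
    '2013-11-12T00:00:02 mw1001 enwiki: Memcached error for key "b" on server "10.0.0.1:11211": SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY',
    '2013-11-12T00:00:03 mw1002 dewiki: Memcached error for key "c" on server "10.0.0.1:11211": SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY',
    '2013-11-12T00:00:04 snapshot3 enwiki: Memcached error for key "d" on server "10.0.0.2:11211": CONNECTION FAILURE',
]

counts = count_messages(sample_lines)
for message, n in counts.most_common():
    print(f"{n:>8}  {message}")
```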

Excluding the TIMED RETRY messages, only 27 distinct appservers appear in the
log, whereas 387 appservers appear in it overall, so the other 360 logged
nothing but TIMED RETRY. CONNECTION FAILURE and A TIMEOUT OCCURRED can each
lead to a flood of TIMED RETRY messages afterwards, but they can't account for
most of the TIMED RETRY messages. In fact, all 2081 CONNECTION FAILURE messages
in this log came from snapshot3. So we need to look for unlogged causes of
servers being marked for timed retry.
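
The per-host comparison can be sketched as follows. Again this rests on
assumptions: the second whitespace-separated field is taken to be the
appserver name and the message is taken to follow the last ": ". The helper
names and the sample data are hypothetical.

```python
from collections import defaultdict

TIMED_RETRY = "SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY"


def hosts_by_message(lines):
    """Map each error message to the set of hosts that logged it.

    Assumptions: field 2 of the line is the host name, and the error
    message is the text after the last ": ".
    """
    hosts = defaultdict(set)
    for line in lines:
        fields = line.strip().split(None, 2)
        if len(fields) < 3:
            continue
        host = fields[1]
        _, sep, message = fields[2].rpartition(": ")
        if sep and message:
            hosts[message].add(host)
    return hosts


def hosts_excluding(hosts, excluded):
    """Hosts that logged at least one message other than `excluded`."""
    result = set()
    for message, names in hosts.items():
        if message != excluded:
            result |= names
    return result


# Hypothetical sample lines, not taken from the real log.
sample = [
    '2013-11-12T00:00:01 mw1001 enwiki: Memcached error for key "a" on server "10.0.0.1:11211": A TIMEOUT OCCURRED',
    '2013-11-12T00:00:02 mw1001 enwiki: Memcached error for key "b" on server "10.0.0.1:11211": ' + TIMED_RETRY,
    '2013-11-12T00:00:03 mw1002 dewiki: Memcached error for key "c" on server "10.0.0.1:11211": ' + TIMED_RETRY,
    '2013-11-12T00:00:04 mw1003 frwiki: Memcached error for key "d" on server "10.0.0.1:11211": ' + TIMED_RETRY,
    '2013-11-12T00:00:05 snapshot3 enwiki: Memcached error for key "e" on server "10.0.0.2:11211": CONNECTION FAILURE',
]

hosts = hosts_by_message(sample)
all_hosts = set().union(*hosts.values())
non_retry_hosts = hosts_excluding(hosts, TIMED_RETRY)
# Hosts whose only logged message is TIMED RETRY: the unexplained cases.
retry_only_hosts = all_hosts - non_retry_hosts

print(len(all_hosts), "hosts overall")
print(len(non_retry_hosts), "hosts excluding TIMED RETRY")
print(sorted(retry_only_hosts), "logged only TIMED RETRY")
```

Hosts that end up in retry_only_hosts never logged a triggering error
themselves, which is the pattern that points at an unlogged cause.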
