https://bugzilla.wikimedia.org/show_bug.cgi?id=56882

--- Comment #4 from Tim Starling ---
The error message indicates that a libmemcached memcached_connect() call
returned MEMCACHED_SERVER_TEMPORARILY_DISABLED. This can happen if
memcached_mark_server_for_timeout() was called on the server, which in turn
happens when memcached_quit_server() is called with io_death=true. That occurs
in many different cases in io.cc and in one case in response.cc, so more
investigation is needed to determine exactly which case is causing it.

Unfortunately, MEMCACHED_BEHAVIOR_RETRY_TIMEOUT (i.e. retry_timeout in our
config) has a minimum of one whole second. In the case of PHP talking to
twemproxy, immediate reconnection would probably be a better policy. The
one-second timeout is why we see floods of messages in bursts that last
approximately one second each.
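To illustrate why the errors arrive in one-second bursts, here is a simplified, hypothetical Python model of the mechanism described above. It is not libmemcached's code: ServerState, mark_for_timeout(), connect() and failures_after_io_error() are invented stand-ins. The model assumes that once a server is marked for timeout, every connect attempt before the retry deadline is refused, so a single I/O error causes every request for the next retry_timeout seconds to fail. A retry_timeout of 0 models the "immediate reconnection" policy suggested above, which libmemcached itself does not allow.

```python
# Simplified, hypothetical model of libmemcached's retry_timeout behaviour.
# Names are illustrative stand-ins, not the real libmemcached API.
from dataclasses import dataclass


@dataclass
class ServerState:
    retry_timeout: float      # seconds; real libmemcached enforces >= 1
    next_retry: float = 0.0   # connects are refused before this time

    def mark_for_timeout(self, now: float) -> None:
        # Stand-in for memcached_mark_server_for_timeout(): disable the
        # server until retry_timeout seconds have passed.
        self.next_retry = now + self.retry_timeout

    def connect(self, now: float) -> str:
        # Stand-in for memcached_connect(): refuse while disabled.
        if now < self.next_retry:
            return "MEMCACHED_SERVER_TEMPORARILY_DISABLED"
        return "MEMCACHED_SUCCESS"


def failures_after_io_error(retry_timeout: float, request_times: list) -> int:
    """Count connects rejected after a single I/O error at t=0."""
    server = ServerState(retry_timeout)
    server.mark_for_timeout(now=0.0)
    return sum(
        server.connect(t) == "MEMCACHED_SERVER_TEMPORARILY_DISABLED"
        for t in request_times
    )


# Hypothetical load: one request every 10 ms for two seconds.
times = [i / 100 for i in range(200)]
print(failures_after_io_error(1.0, times))  # 100 (every request in the first second)
print(failures_after_io_error(0.0, times))  # 0 (immediate reconnection)
```

With the one-second minimum, a single transient error turns into a full second of rejected requests, matching the burst pattern in the logs. The 0-second case shows what immediate reconnection would look like.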

The fact that pmtpa Apaches responding to PyBal monitoring requests are heavily
represented in the logs may be explained by transient packet loss on the pmtpa
to eqiad link, which would increase the rate of connection timeouts.

_______________________________________________
Wikibugs-l mailing list
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
