https://bugzilla.wikimedia.org/show_bug.cgi?id=29233

--- Comment #6 from Tim Starling <[email protected]> 2011-06-17 00:44:24 UTC ---
(In reply to comment #3)
> Apache/PHP doesn't select databases; MediaWiki's LoadBalancer class does. It
> should already be failing over to the next available server in the case of a
> connection error (MySQL error 2003 is "can't connect").

Yes, LoadBalancer has failover code. During the downtime on May 24, the
appropriate sort of connection error was logged:

Tue May 24 13:41:01 UTC 2011    srv191    rowiki    Connection error: No
working slave server: No working slave server: Unknown error ()

This error indicates that the fallback sequence was exhausted, so whatever is
going on, it's clear that we're not just letting connection error exceptions
leak out of LoadBalancer. There were 4775 instances on that day.
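
For anyone following along, the general shape of "try each server in turn, and only report failure once the whole list is exhausted" can be sketched as below. This is an illustration in Python, not MediaWiki's actual LoadBalancer code; the function, the exception classes and the server names are all hypothetical.

```python
class ConnectionFailed(Exception):
    """Raised by a single connection attempt (hypothetical)."""


class NoWorkingServer(Exception):
    """Raised only after every candidate server has been tried (hypothetical)."""

    def __init__(self, message, attempts):
        super().__init__(message)
        self.attempts = attempts  # list of (server, error text) pairs


def connect_with_failover(servers, connect):
    """Return the first connection that succeeds, trying servers in order.

    `connect` is any callable that takes a server name and either returns
    a connection or raises ConnectionFailed.
    """
    attempts = []
    for server in servers:
        try:
            return connect(server)
        except ConnectionFailed as exc:
            # Remember the failure and fall through to the next server.
            attempts.append((server, str(exc)))
    # Reaching this point means the fallback sequence was exhausted.
    raise NoWorkingServer("No working slave server", attempts)
```

A log entry like the one above corresponds to the final `raise`: every candidate failed, as opposed to a single connection error escaping on the first attempt.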

If we get an error in Database::query(), we neither close the connection nor
switch over to another database. I'm not sure if that's what CT is asking for.
We don't appear to properly log "MySQL server has gone away" or "Lost
connection to MySQL server during query" errors. They are dealt with by
automatically reconnecting, and then if the reconnection fails, a
DBConnectionError would probably be thrown.
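
To make the reconnect behaviour concrete, here is a rough Python sketch of "reconnect once on a lost connection, log it, and raise a connection error if the reconnect fails". It is not the real Database::query() code. The connection object with `query()` and `reconnect()` methods, the `QueryError` class and the logger name are assumptions made for the example; the error numbers 2006 and 2013 are the standard MySQL client codes for the two messages quoted above.

```python
import logging

GONE_AWAY = 2006        # "MySQL server has gone away"
LOST_CONNECTION = 2013  # "Lost connection to MySQL server during query"
RECONNECTABLE = {GONE_AWAY, LOST_CONNECTION}


class QueryError(Exception):
    """Query failure carrying a MySQL error number (hypothetical class)."""

    def __init__(self, errno, message):
        super().__init__(message)
        self.errno = errno


class DBConnectionError(Exception):
    """Stand-in for the exception thrown when reconnecting fails."""


def query_with_reconnect(conn, sql, log=logging.getLogger("db")):
    """Run a query, reconnecting once if the connection was lost.

    `conn` is assumed to expose query(sql) and reconnect() -> bool.
    """
    try:
        return conn.query(sql)
    except QueryError as exc:
        if exc.errno not in RECONNECTABLE:
            raise  # ordinary query errors are not retried
        # Log the lost connection before hiding it behind a reconnect.
        log.warning("lost connection (errno %d): %s", exc.errno, exc)
        if not conn.reconnect():
            raise DBConnectionError("reconnect failed after: %s" % exc) from exc
        return conn.query(sql)
```

The `log.warning` call is the part that appears to be missing in our case: without it, a successful reconnect leaves no trace in the logs.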

There were 162,973 instances of "LB failure with no last connection" on May 24,
which is somewhat concerning. But it's hard to know if that's the problem of
interest. 

What we really need to know is: when exactly was this "site failure", and what
were the observed symptoms of it?
