https://bugzilla.wikimedia.org/show_bug.cgi?id=29233

--- Comment #5 from Asher Feldman <[email protected]> 2011-06-17 00:28:30 UTC ---
During an 8-minute period on May 29, when MySQL was down on db32 after being
killed by the kernel due to an OOM condition (the server itself was otherwise
up), en.wikipedia.org was observed to be down hard, and 1209654 messages of the
following types were logged:

Sun May 29 2:05:32 UTC 2011     srv230  enwiki  Error connecting to 10.0.6.42: 
Lost connection to MySQL server at 'reading initial communication packet',
system error: 111
Sun May 29 2:05:32 UTC 2011     srv170  enwiki  Error connecting to 10.0.6.42: 
Can't connect to MySQL server on '10.0.6.42' (115)
Sun May 29 2:05:32 UTC 2011     srv175  enwiki  Error connecting to 10.0.6.42:
Lost connection to MySQL server at 'reading initial communication packet',
system error: 111 (10.0.6.42)

From the message formatting, it appears that all of these messages were logged
within DatabaseMysql::open(). 

The connection failures should have occurred within milliseconds and it does
appear that the LoadBalancer class should handle such an occurrence with
minimal impact. However, LoadBalancer::reportConnectionError only calls
wfLogDBError() in the following two ways:


                        wfLogDBError( "LB failure with no last connection\n" );

                        wfLogDBError( "Connection error: {$this->mLastError} ({$server})\n" );

Neither of these messages appears in the dberror log during the enwiki / db32
outage.
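
For anyone who wants to repeat this check against a copy of the dberror log,
here is a minimal sketch in Python. It only counts the message texts quoted
above; the category names, the function names and the SAMPLE string are
hypothetical and exist purely for illustration, and the real log layout may
differ from the excerpts in this comment.

```python
import re
from collections import Counter

# Message texts quoted in this comment. The keys are hypothetical labels
# chosen for this sketch, not names used anywhere in MediaWiki.
PATTERNS = {
    "open_connect_error": re.compile(r"Error connecting to \d+\.\d+\.\d+\.\d+"),
    "lb_no_last_connection": re.compile(r"LB failure with no last connection"),
    "lb_connection_error": re.compile(r"Connection error: "),
}


def classify_dberror_log(text):
    """Count how often each known message type occurs in the log text."""
    counts = Counter()
    for name, pattern in PATTERNS.items():
        counts[name] = len(pattern.findall(text))
    return counts


def messages_per_second(total, minutes):
    """Average logging rate over an outage window given in minutes."""
    return total / (minutes * 60)


# The three excerpts from this comment, used as stand-in input.
SAMPLE = """\
Sun May 29 2:05:32 UTC 2011     srv230  enwiki  Error connecting to 10.0.6.42:
Lost connection to MySQL server at 'reading initial communication packet',
system error: 111
Sun May 29 2:05:32 UTC 2011     srv170  enwiki  Error connecting to 10.0.6.42:
Can't connect to MySQL server on '10.0.6.42' (115)
Sun May 29 2:05:32 UTC 2011     srv175  enwiki  Error connecting to 10.0.6.42:
Lost connection to MySQL server at 'reading initial communication packet',
system error: 111 (10.0.6.42)
"""

if __name__ == "__main__":
    for name, count in classify_dberror_log(SAMPLE).items():
        print(name, count)
    print(round(messages_per_second(1209654, 8)), "messages/second on average")
```

On the excerpts above, only the "Error connecting to" category is non-zero,
which matches the observation that no LoadBalancer::reportConnectionError
messages were logged. The 1209654 messages over 8 minutes work out to an
average of roughly 2520 per second.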
