https://bugzilla.wikimedia.org/show_bug.cgi?id=29233
--- Comment #5 from Asher Feldman <[email protected]> 2011-06-17 00:28:30 UTC ---

During an 8-minute period on May 29, mysql was down on db32 after being killed by the kernel due to an OOM condition, although the server was otherwise up. During that period en.wikipedia.org was observed to be down hard, and 1209654 messages of the following type were logged:

Sun May 29 2:05:32 UTC 2011 srv230 enwiki Error connecting to 10.0.6.42: Lost connection to MySQL server at 'reading initial communication packet', system error: 111
Sun May 29 2:05:32 UTC 2011 srv170 enwiki Error connecting to 10.0.6.42: Can't connect to MySQL server on '10.0.6.42' (115)
Sun May 29 2:05:32 UTC 2011 srv175 enwiki Error connecting to 10.0.6.42: Lost connection to MySQL server at 'reading initial communication packet', system error: 111 (10.0.6.42)

From the message formatting, it appears that all of these messages were logged within DatabaseMysql::open().

The connection failures should have occurred within milliseconds, and it does appear that the LoadBalancer class should handle such an occurrence with minimal impact. However, LoadBalancer::reportConnectionError only calls wfLogDBError() in the following two ways:

    wfLogDBError( "LB failure with no last connection\n" );
    wfLogDBError( "Connection error: {$this->mLastError} ({$server})\n" );

Neither of these messages appears in the dberror log during the enwiki / db32 outage.
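
One way to double-check the claim that neither LoadBalancer message shows up is to tally the dberror log by message prefix. The following is only a rough sketch, not part of MediaWiki: the line layout is inferred from the three sample lines quoted in this comment, and the names LINE_RE, PREFIXES, classify and tally are hypothetical helpers introduced for illustration.

```python
import re
from collections import Counter

# Assumed line layout, inferred from the sample lines quoted in the comment:
#   <weekday> <month> <day> <h:mm:ss> UTC <year> <host> <wiki> <message>
LINE_RE = re.compile(
    r"^\w{3} \w{3} +\d{1,2} +\d{1,2}:\d{2}:\d{2} UTC \d{4} "
    r"(?P<host>\S+) (?P<wiki>\S+) (?P<msg>.*)$"
)

# Message prefixes of interest. "open" is the message seen during the outage;
# the two "lb_*" entries are the strings passed to wfLogDBError() in
# LoadBalancer::reportConnectionError. The labels themselves are made up here.
PREFIXES = {
    "open": "Error connecting to ",
    "lb_no_conn": "LB failure with no last connection",
    "lb_conn_error": "Connection error: ",
}


def classify(message):
    """Return the label of the first matching prefix, or "other"."""
    for label, prefix in PREFIXES.items():
        if message.startswith(prefix):
            return label
    return "other"


def tally(lines):
    """Count log lines per message label; lines that do not parse are 'unparsed'."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            counts["unparsed"] += 1
            continue
        counts[classify(match.group("msg"))] += 1
    return counts
```

Passing an open file handle for the dberror log (path left to the reader) to tally() would give a count per label. If the observation above holds, "lb_no_conn" and "lb_conn_error" should both be zero for the outage window while "open" accounts for nearly all lines.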
