Hi, one of our database servers, db22, had a disk failure a little while ago, and while the failed disk was being replaced, another RAID problem appeared. This took db22 down, and users started reporting problems at around 7 pm:
19:09 < malafaya> so, what's wrong?

That was even before monitoring reported the outage:

19:19 <+nagios-wm> PROBLEM - Host db22 is DOWN: PING CRITICAL - Packet loss = 100%

Since this affected CentralAuth, users kept getting error messages like:

[db22: s4] 10.0.6.32

Database ops immediately started promoting a database slave to be the new master, while the hardware issue on db22 is still being investigated. The current effect is that Commons is read-only. At the time of writing, the expected downtime was about 10 minutes.

--
Daniel Zahn <[email protected]>

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
