https://bugzilla.wikimedia.org/show_bug.cgi?id=57642
--- Comment #8 from Sean Pringle <[email protected]> ---
The issue is on labsdb1002 and is a flow-on effect from this incident:
http://lists.wikimedia.org/pipermail/labs-l/2013-November/001883.html

The labsdb1002:3308 dewiki.revision table is still being synced from upstream
by pt-table-sync, and this is affecting replication.

Context:

- Originally, replication was stopped completely and a full dump/restore from
  upstream dewiki was done. However, labsdb1002:3308 mysqld crashed in the
  process (see below), so the revision table was only partially restored.

- To avoid blatting labs user data with a full rebuild affecting all wikis, I
  switched to using pt-table-sync with replication on the weekend to bring
  revision back up to full row count.

However, labsdb1002 has since crashed again, with the kernel OOM killer sniping
mysqld:3308. The sync process is batched and low footprint (where the dump
method was not), but other labsdb transactions must still be slowed down enough
to add up to an infrequent memory usage spike.

Therefore, yesterday I reduced the InnoDB buffer pool size for all three
labsdb1002 mysqld instances by 25%. The OOM killer has not struck since and,
based on row counts, the dewiki.revision sync process should resolve within
the next 12h.
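
For illustration only, the 25% buffer pool reduction mentioned above works out
as in the Python sketch below. The starting sizes and the ports other than 3308
are hypothetical, since the actual labsdb1002 settings are not stated in this
comment; the helper name is likewise made up for the example.

```python
GIB = 1024 ** 3


def reduced_pool_bytes(current_bytes: int, reduction_pct: int = 25) -> int:
    """Return a buffer pool size cut by reduction_pct percent (integer math)."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in the range 0..99")
    return current_bytes * (100 - reduction_pct) // 100


# Hypothetical per-instance sizes keyed by port; only 3308 is named in the
# comment, and none of the real values are given there.
instances = {3306: 16 * GIB, 3307: 16 * GIB, 3308: 16 * GIB}

new_sizes = {port: reduced_pool_bytes(size) for port, size in instances.items()}

# Memory freed across all three instances by the reduction.
freed_bytes = sum(instances.values()) - sum(new_sizes.values())

for port, size in sorted(new_sizes.items()):
    print(f"mysqld:{port} innodb_buffer_pool_size = {size}")
print(f"freed in total: {freed_bytes // GIB} GiB")
```

With these assumed figures each instance drops from 16 GiB to 12 GiB. The point
is only that the cut applies to every instance on the host, so the headroom
gained is three times the per-instance saving.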
