https://bugzilla.wikimedia.org/show_bug.cgi?id=57642

--- Comment #8 from Sean Pringle <[email protected]> ---
The issue is on labsdb1002 and is a flow-on effect of this incident:
http://lists.wikimedia.org/pipermail/labs-l/2013-November/001883.html
The labsdb1002:3308 dewiki.revision table is still being synced from upstream
by pt-table-sync, and that is affecting replication. Context:

- Originally, replication was stopped completely and a full dump/restore from
upstream dewiki was done; however, the labsdb1002:3308 mysqld crashed in the
process (see below). The revision table was only partially restored.

- To avoid overwriting labs user data with a full rebuild affecting all wikis,
I switched over the weekend to using pt-table-sync with replication to bring
revision back up to its full row count.

However, labsdb1002 has since crashed again, with the kernel OOM killer taking
out mysqld:3308. The sync process is batched and low-footprint (which the dump
method was not), but it must still be slowing other labsdb transactions enough
that they add up to an occasional memory usage spike.

Therefore, yesterday I reduced the InnoDB buffer pool size for all three
labsdb1002 mysqld instances by 25%. The OOM killer has not struck since, and
based on row counts the dewiki.revision sync process should finish within the
next 12 hours.
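
For anyone wanting to reproduce the arithmetic, here is a rough Python sketch
of the two calculations mentioned above: a 25% cut to a buffer pool size, and
an ETA extrapolated from row counts. The function names, the sampling approach
and all numbers are hypothetical illustrations, not the actual labsdb1002
values or the method used for the estimate above.

```python
def reduced_pool_bytes(current_bytes: int, cut: float = 0.25) -> int:
    """Return a buffer pool size cut by `cut`, rounded down to a whole MiB."""
    mib = 1024 * 1024
    return int(current_bytes * (1 - cut)) // mib * mib


def sync_eta_hours(upstream_rows: int, samples: list[tuple[float, int]]) -> float:
    """Estimate hours until the local row count reaches upstream_rows.

    samples: (unix_timestamp, local_row_count) pairs, oldest first.
    Assumes the sync keeps adding rows at the average rate seen so far.
    """
    (t0, n0), (t1, n1) = samples[0], samples[-1]
    rate = (n1 - n0) / (t1 - t0)  # rows per second
    if rate <= 0:
        raise ValueError("row count is not growing")
    return max(upstream_rows - n1, 0) / rate / 3600


# Hypothetical example: a 16 GiB pool cut by 25% gives 12 GiB.
new_size = reduced_pool_bytes(16 * 1024**3)

# Hypothetical example: two row-count samples taken one hour apart.
eta = sync_eta_hours(1000, [(0.0, 400), (3600.0, 700)])
```

The resulting byte count is the kind of value that would go into
innodb_buffer_pool_size for each instance; the ETA is only as good as the
assumption that the sync rate stays constant.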
