https://bugzilla.wikimedia.org/show_bug.cgi?id=56577

--- Comment #6 from Sean Pringle <[email protected]> ---
Slave drift is not a particular bug with MariaDB, but a series of possible
problems mostly related to statements that are non-deterministic plus a few
fringe cases.

http://dev.mysql.com/doc/refman/5.5/en/replication-rbr-safe-unsafe.html

Since we upgraded to MariaDB 5.5 our mysqld error logs have been steadily
growing with warnings related to unsafe binary log statements that could cause
a slave to drift out of sync with its master. An example (not specifically
related to this bug):

131028  5:58:22 [Warning] Unsafe statement written to the binary log using
statement format since BINLOG_FORMAT = STATEMENT. INSERT... ON DUPLICATE KEY
UPDATE

These warning messages didn't appear as frequently in MySQL 5.1 error logs, not
because it was safer then, but because they're additional warnings added to
MariaDB 5.5 to identify possible problems exposed in bug reports filed *since*
5.1 went GA. They indicate that our slave datasets have been slowly drifting
for some time.
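
To make the drift mechanism concrete, here is a toy Python model (not MariaDB
code) of one commonly cited reason INSERT... ON DUPLICATE KEY UPDATE is flagged
as unsafe under statement-based replication: on a table with more than one
unique key, the row that gets updated can depend on the order in which the keys
are checked. The table layout, the `hits` column and the `upsert` helper are
all hypothetical and exist only for illustration.

```python
def upsert(rows, new_row, key_order):
    """Toy INSERT ... ON DUPLICATE KEY UPDATE hits = hits + 1.

    Updates the first existing row that conflicts on a unique key, checking
    the keys in `key_order`; inserts the row if nothing conflicts.
    """
    for key in key_order:
        for row in rows:
            if row[key] == new_row[key]:
                row["hits"] += 1
                return
    rows.append(dict(new_row))


def make_table():
    # Two unique keys: `id` and `email` (hypothetical schema).
    return [
        {"id": 1, "email": "a@example.org", "hits": 0},
        {"id": 2, "email": "b@example.org", "hits": 0},
    ]


# The same logical statement is replayed on both servers. It conflicts with
# row 1 on `id` and with row 2 on `email`.
statement = {"id": 1, "email": "b@example.org", "hits": 0}

master = make_table()
slave = make_table()
upsert(master, statement, key_order=("id", "email"))  # master checks id first
upsert(slave, statement, key_order=("email", "id"))   # slave checks email first

drifted = master != slave
print("datasets differ:", drifted)
```

Both servers executed the identical statement, yet a different row was
modified on each. With row-based replication the changed row itself would be
shipped, so this particular ambiguity would not arise.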

I've been investigating and correcting the problem using the pt-table-sync tool
[1]. Unfortunately that tool requires a primary or unique key on each table to
operate efficiently, which `archive` did not have, hence it may have been
further out of sync on slaves than other tables. This is also the reason I
wanted to find a way to do the schema change on the master rather than rotating
a slave.

In this case the `archive` schema was modified by:

- Delaying a slave as fallback
- Creating a copy of `archive` as `_archive_new`
- Altering `_archive_new` to add ar_id PK
- Altering `_archive_new` to add ar_hash (an md5 hash of the row, for sanity
checks)
- Adding INSERT/UPDATE/DELETE triggers to `archive` to keep `_archive_new` in
sync
- Batch inserting data from `archive` to `_archive_new`
- Cross checking the two tables for differences
- Removing ar_hash
- Switching the two tables
- Keeping old table around as `archive_save` for a time
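
The steps above can be sketched end to end in Python against an in-memory
SQLite database. This is only an illustration under loud assumptions, not the
procedure that was run in production: SQLite stands in for MariaDB, the
two-column `archive` table is a hypothetical simplification, a UNIQUE
constraint over the columns stands in for the temporary ar_hash key (and is not
removed afterwards), the copy is a single statement rather than batched, and
the helper names `prepare`, `copy_rows`, `tables_match` and `swap` are invented
for this example.

```python
import hashlib
import sqlite3


def row_hash(row):
    """md5 fingerprint of a row's column values, used only for cross-checking."""
    joined = "\x1f".join("" if v is None else str(v) for v in row)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def prepare(conn):
    """Create the copy with an ar_id primary key and the sync triggers."""
    conn.executescript("""
        CREATE TABLE _archive_new (
            ar_id INTEGER PRIMARY KEY,
            ar_title TEXT,
            ar_text TEXT,
            UNIQUE (ar_title, ar_text)  -- stand-in for the ar_hash key
        );
        CREATE TRIGGER archive_ins AFTER INSERT ON archive BEGIN
            INSERT OR IGNORE INTO _archive_new (ar_title, ar_text)
            VALUES (NEW.ar_title, NEW.ar_text);
        END;
        CREATE TRIGGER archive_upd AFTER UPDATE ON archive BEGIN
            UPDATE _archive_new
            SET ar_title = NEW.ar_title, ar_text = NEW.ar_text
            WHERE ar_title = OLD.ar_title AND ar_text = OLD.ar_text;
        END;
        CREATE TRIGGER archive_del AFTER DELETE ON archive BEGIN
            DELETE FROM _archive_new
            WHERE ar_title = OLD.ar_title AND ar_text = OLD.ar_text;
        END;
    """)


def copy_rows(conn):
    """Copy existing rows; OR IGNORE skips rows the triggers already added."""
    conn.execute("""INSERT OR IGNORE INTO _archive_new (ar_title, ar_text)
                    SELECT ar_title, ar_text FROM archive""")
    conn.commit()


def tables_match(conn):
    """Cross-check both tables by comparing sorted per-row md5 fingerprints."""
    old = sorted(row_hash(r) for r in
                 conn.execute("SELECT ar_title, ar_text FROM archive"))
    new = sorted(row_hash(r) for r in
                 conn.execute("SELECT ar_title, ar_text FROM _archive_new"))
    return old == new


def swap(conn):
    """Switch the tables, keeping the old one around as archive_save."""
    # Unlike MySQL's atomic multi-table RENAME TABLE, this is not one atomic step.
    conn.executescript("""
        DROP TRIGGER archive_ins;
        DROP TRIGGER archive_upd;
        DROP TRIGGER archive_del;
        ALTER TABLE archive RENAME TO archive_save;
        ALTER TABLE _archive_new RENAME TO archive;
    """)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE archive (ar_title TEXT, ar_text TEXT)")  # no PK
conn.executemany("INSERT INTO archive VALUES (?, ?)",
                 [("A", "one"), ("B", "two"), ("C", "three"),
                  ("D", "four"), ("E", "five")])

prepare(conn)
# Writes arriving while the migration is in progress go through the triggers.
conn.execute("INSERT INTO archive VALUES ('F', 'six')")
conn.execute("UPDATE archive SET ar_text = 'ONE' WHERE ar_title = 'A'")
conn.execute("DELETE FROM archive WHERE ar_title = 'B'")
copy_rows(conn)

matched = tables_match(conn)
if matched:
    swap(conn)
```

The cross-check runs before the swap, so a mismatch leaves the original table
untouched.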

The batch transfer step was done with pt-online-schema-change using the
temporary ar_hash as key (normally it would use the table primary key, but of
course `archive` didn't have one). 
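
How pt-online-schema-change used ar_hash internally is not described here, so
the following is not a reproduction of the tool. It is only a minimal Python
and SQLite sketch of the general idea of walking a table in bounded chunks,
ordered by a unique key, when no primary key exists. The `src` and `dst`
tables, the presence of ar_hash on the source side, the hash being taken over
the title alone, and the `copy_chunks` helper are all assumptions made for
illustration.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical source table: no primary key, but a unique hash column.
conn.execute("CREATE TABLE src (ar_hash TEXT UNIQUE, ar_title TEXT)")
conn.execute("""CREATE TABLE dst (
    ar_id INTEGER PRIMARY KEY,
    ar_hash TEXT UNIQUE,
    ar_title TEXT)""")

titles = ["Page_%d" % i for i in range(10)]
conn.executemany(
    "INSERT INTO src VALUES (?, ?)",
    [(hashlib.md5(t.encode("utf-8")).hexdigest(), t) for t in titles],
)
conn.commit()


def copy_chunks(conn, chunk_size=3):
    """Copy src to dst in key order, one small transaction per chunk."""
    last = ""  # the empty string sorts before any hex digest
    chunks = 0
    while True:
        rows = conn.execute(
            "SELECT ar_hash, ar_title FROM src "
            "WHERE ar_hash > ? ORDER BY ar_hash LIMIT ?",
            (last, chunk_size),
        ).fetchall()
        if not rows:
            break
        # OR IGNORE makes a retried chunk harmless: duplicates are skipped.
        conn.executemany(
            "INSERT OR IGNORE INTO dst (ar_hash, ar_title) VALUES (?, ?)",
            rows,
        )
        conn.commit()
        last = rows[-1][0]  # resume after the last key seen
        chunks += 1
    return chunks


chunks = copy_chunks(conn)
copied = conn.execute("SELECT COUNT(*) FROM dst").fetchone()[0]
print(chunks, "chunks,", copied, "rows copied")
```

Keeping each chunk small bounds how long any single transaction holds locks,
which is the main reason to batch a copy on a busy master.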

[1] As there may yet be more unforeseen bugs with SBR, the only true fix is to
switch to RBR.
