https://bugzilla.wikimedia.org/show_bug.cgi?id=51254

--- Comment #5 from Sean Pringle <[email protected]> ---
Firstly, we've determined this problem occurred due to an (apparent) bug in
pt-online-schema-change when using a combination of:

- A table without primary key
- A table with unique indexes that all include nullable columns
- An unfortunately timed REPLACE statement in normal db traffic
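
As an illustration only, the first two conditions above could be checked before a migration with something like the sketch below. The Index record and the is_risky_for_online_alter function are hypothetical names made up for this example. They are not part of pt-online-schema-change, and the index metadata would have to be gathered separately (for example from information_schema).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Index:
    """Hypothetical summary of one index on a table (not a real tool's type)."""
    name: str
    primary: bool            # True for the PRIMARY KEY
    unique: bool             # True for PRIMARY KEY and UNIQUE indexes
    nullable_columns: bool   # True if any indexed column allows NULL


def is_risky_for_online_alter(indexes: List[Index]) -> bool:
    """Flag tables with no primary key whose unique indexes all allow NULLs."""
    if any(ix.primary for ix in indexes):
        return False
    # Safe only if some unique index is built entirely on NOT NULL columns.
    has_safe_unique = any(ix.unique and not ix.nullable_columns for ix in indexes)
    return not has_safe_unique


# Example: a table shaped like the one described above.
tag_summary_like = [
    Index("ts_log_id", primary=False, unique=True, nullable_columns=True),
]
print(is_risky_for_online_alter(tag_summary_like))
```

A table flagged this way would need a primary key, or a unique index on NOT NULL columns, before being altered online.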

pt-online-schema-change (posc below) alters a table online by:

- Creating a copy of the table with altered schema
- Setting triggers on the original table to keep the copy updated
- Copying data across using a batch process

In this case, posc set a DELETE trigger on tag_summary using a poor UNIQUE
index (ts_log_id) with low cardinality and a nullable field. Then, during the
batch copy, an external REPLACE statement with ts_log_id=NULL caused far too
many rows to be deleted from the temporary copy of the table being altered.
Because many rows in tag_summary have ts_log_id=NULL, the table was massively
reduced in size.
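
The failure mode can be reproduced in miniature with the sketch below. It is a simulation under assumptions, not the actual trigger text posc generated. It assumes the trigger matched rows in the copy with a NULL-safe comparison on ts_log_id alone, and it uses SQLite (where IS is NULL-safe) in place of MySQL. The table name tag_summary_copy, the columns other than ts_log_id, and the sample rows are illustrative.

```python
import sqlite3


def simulate_null_keyed_delete():
    """Show how a delete keyed only on a nullable column removes unrelated rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tag_summary_copy ("
        " ts_rc_id INTEGER, ts_log_id INTEGER, ts_rev_id INTEGER, ts_tags TEXT)"
    )
    # Three unrelated rows share ts_log_id = NULL; one row has a real log id.
    rows = [
        (1, None, 101, "a"),
        (2, None, 102, "b"),
        (3, None, 103, "c"),
        (None, 7, None, "d"),
    ]
    conn.executemany("INSERT INTO tag_summary_copy VALUES (?, ?, ?, ?)", rows)
    before = conn.execute("SELECT COUNT(*) FROM tag_summary_copy").fetchone()[0]

    # Stand-in for the trigger body: the replaced row had ts_log_id = NULL,
    # and the match is NULL-safe, so every NULL row in the copy qualifies.
    old_ts_log_id = None
    cur = conn.execute(
        "DELETE FROM tag_summary_copy WHERE ts_log_id IS ?", (old_ts_log_id,)
    )
    deleted = cur.rowcount

    after = conn.execute("SELECT COUNT(*) FROM tag_summary_copy").fetchone()[0]
    conn.close()
    return before, deleted, after


print(simulate_null_keyed_delete())
```

Replacing a single row was meant to remove one row from the copy. Under these assumptions, it instead removes every row whose ts_log_id is NULL.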

Now to the fix:

We've checked the other wikis and found no problems; only enwiki was affected.

Furthermore, only enwiki.tag_summary was affected. We've verified that
enwiki.change_tag is complete and did not suffer the same problem. This
conclusion is based on:

- Index cardinality and table size information collected before running the
schema migration
- An investigation of the events in the binary log surrounding the migration
period

We are currently rebuilding tag_summary from the change_tag data. The rebuild
should complete within 30 minutes of this comment being posted.

