Marostegui added a subscriber: mark.
Marostegui added a comment.

What we know so far is that there was a gap in replication between codfw and eqiad on 13 September (the day after the failover). Whatever was inserted on the codfw master during that window never reached eqiad (for all s8 tables). We don't know why yet.
We believe the gap lasted between 40 minutes and 1 hour, but we don't know that for sure yet.

We have discussed this with @mark, and we are going to address it along two different paths.

To mitigate the user impact as quickly as possible, both approaches will be carried out in parallel:

  1. We are going to fill the data gap by inserting the missing data into the eqiad replicas from the codfw binlogs for the revision, text and page tables (and possibly also user).
  2. We are going to rebuild all the eqiad hosts from the codfw hosts, so the data will be identical for all tables.

Apart from that, to help us understand what happened, we have:

  • Saved the previous s8 backups so they don't get rotated out
  • Saved the binlogs from the codfw s8 master
  • Increased the binlog expiration on the codfw s8 master to 60 days

We are fairly sure only s8 was affected, because the only affected host was db1071.
@Addshore has kindly helped us to check s5 and s6 and both were clean.

Once this is all done, we will look at ways to prevent this from happening again. That will be hard, though, because we still don't know how it happened (maybe a bug in GTID, MariaDB or replication) or why it didn't break replication. Breaking replication is what usually happens when transactions are skipped, especially with servers that use row-based replication (the sanitariums).


TASK DETAIL
https://phabricator.wikimedia.org/T206743

To: Marostegui
Cc: mark, Stashbot, Nikki, Marostegui, daniel, TerraCodes, Liuxinyu970226, Addshore, Ladsgroup, Lea_Lacroix_WMDE, Lexicographical data, KaMan, Nandana, Banyek, jijiki, Mringgaard, AndyTan, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, Minhnv-2809, Volans, Jonas, Luke081515, Wikidata-bugs, aude, Lydia_Pintscher, Darkdadaah, Mbch331, Jay8g, Krenair, akosiaris
_______________________________________________
Wikidata-bugs mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
