[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853 Sean Pringle sprin...@wikimedia.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED -- You are receiving this mail because: You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853 --- Comment #23 from Sean Pringle sprin...@wikimedia.org --- Right, sorry, s/page_latest/page_random/. Coffee was wearing off. http://www.percona.com/doc/percona-toolkit/2.2/pt-table-sync.html#cmdoption-pt-table-sync--float-precision
[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853 --- Comment #24 from MZMcBride b...@mzmcbride.com --- It seems like the cause of this issue has been identified (comment 11) and the issue has been resolved (no further reports of the issue and the relevant DB has been re-synced and re-pooled). I'm not sure there's much else to be done here. Can this bug report be marked resolved/fixed?
[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853 --- Comment #19 from Sean Pringle sprin...@wikimedia.org --- db1009 has been synced with the s2 master and returned to the pool. The specific errors identified here (the missing rev ids, page_latest, etc) have been spot-checked and are intact. If anyone spots a recurrence, please shout. One small observation: At least some of the original page_latest differences I observed were due to a difference in FLOAT/DOUBLE precision which threw off the checksum utility. Rounding to a fixed precision reduced the mismatches considerably, but did not eliminate them. Therefore I can't say definitively whether the symptom in comment 17 (wrong version) is due to this bug; just keep it in mind...
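The precision effect described in comment 19 can be illustrated with a small sketch. This shows only one hypothetical way such a mismatch could arise (the same value held at 32-bit precision on one host and 64-bit on the other, then checksummed as text). The thread does not establish that this is what happened on db1009. The sample value, the helper names, and the choice of five decimal places are assumptions made for illustration.

```python
import hashlib
import struct


def as_float32(x: float) -> float:
    """Round-trip a 64-bit Python float through 32-bit storage."""
    return struct.unpack("f", struct.pack("f", x))[0]


def value_checksum(value: float, precision=None) -> str:
    """Checksum the textual form of a value, optionally rounded first."""
    if precision is None:
        text = repr(value)
    else:
        text = f"{value:.{precision}f}"
    return hashlib.sha1(text.encode("ascii")).hexdigest()


# Hypothetical page_random value, as it might look on two hosts.
page_random = 0.123456789012
stored_double = page_random              # kept at 64-bit precision
stored_single = as_float32(page_random)  # truncated to 32-bit precision

raw_match = value_checksum(stored_double) == value_checksum(stored_single)
rounded_match = (
    value_checksum(stored_double, 5) == value_checksum(stored_single, 5)
)
print(raw_match, rounded_match)
```

Rounding can still disagree when a value sits very close to a rounding boundary, which is consistent with the report that fixed precision reduced the mismatches without eliminating them.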
[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853 --- Comment #20 from Bawolff (Brian Wolff) bawolff...@gmail.com ---
(In reply to comment #19)
> One small observation: At least some of the original page_latest differences I observed were due to a difference in FLOAT/DOUBLE precision which threw off the checksum utility. Rounding to a fixed precision reduced the mismatches considerably, but did not eliminate them. Therefore I can't say definitively whether the symptom in comment 17 (wrong version) is due to this bug; just keep it in mind...
What? Page_latest should be an integer. It's a foreign key corresponding to rev_id.
[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853 --- Comment #21 from Ariel T. Glenn ar...@wikimedia.org --- page_random was different in some of the rows; I guess that's the float vs double difference mentioned above. Separately, page_latest was different in some rows as well, although not nearly as many.
[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853 Platonides platoni...@gmail.com changed: What|Removed |Added CC||platoni...@gmail.com --- Comment #22 from Platonides platoni...@gmail.com --- page_random is only set on page insertion. I also find it strange that the tool noticed differences due to float/double. I would expect it to consistently read the same data (no matter if float or double) on both systems...
[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853 --- Comment #12 from Gerrit Notification Bot gerritad...@wikimedia.org --- Change 79178 had a related patch set uploaded by Springle: remove db1009 from action while investigating bug 52853 https://gerrit.wikimedia.org/r/79178
[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853 --- Comment #13 from Gerrit Notification Bot gerritad...@wikimedia.org --- Change 79178 merged by Springle: remove db1009 from action while investigating bug 52853 https://gerrit.wikimedia.org/r/79178
[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853 p858snake p858sn...@gmail.com changed: What|Removed |Added Status|PATCH_TO_REVIEW |NEW CC||p858sn...@gmail.com --- Comment #14 from p858snake p858sn...@gmail.com --- (Back to new, Patch is just for removing a db-box from the rotation)
[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853 Sean Pringle sprin...@wikimedia.org changed: What|Removed |Added Status|NEW |ASSIGNED --- Comment #15 from Sean Pringle sprin...@wikimedia.org --- db1009 is depooled for investigation and, eventually, resync. Please shout if anyone still sees the issue.
[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853 --- Comment #16 from Sean Pringle sprin...@wikimedia.org --- db1009 `revision` is missing rev_ids 38674172 - 38674177. Several other tables checksummed (recentchanges, logging, page) also show small numbers of missing rows and/or missing updates. Therefore db1009 is now doing a full checksum using the Percona tools. Based on the missing rows' timestamps, the powercycle event seems the probable cause. However, I don't know yet how replication managed to *avoid* dying unless some funky disk syncing happened. Unfortunately relay logs don't go back that far (precisely because the SQL thread thought all was fine and deleted them after processing).
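As a rough illustration of why a table checksum can localize a handful of missing rows, here is a minimal sketch of chunk-wise checksumming. It is a simplified toy, not the Percona tools' implementation. The function names, the chunk size, and the in-memory dictionaries standing in for the master and replica tables are all assumptions, and the row contents are made up.

```python
import hashlib


def chunk_digests(rows, chunk_size):
    """Map the start of each primary-key chunk to a digest of its rows."""
    hashers = {}
    for key in sorted(rows):
        start = key - key % chunk_size
        hasher = hashers.setdefault(start, hashlib.sha1())
        hasher.update(f"{key}:{rows[key]}\n".encode("utf-8"))
    return {start: h.hexdigest() for start, h in hashers.items()}


def differing_chunks(master, replica, chunk_size=10):
    """Return the chunk starts whose digests differ between two hosts."""
    m = chunk_digests(master, chunk_size)
    r = chunk_digests(replica, chunk_size)
    return sorted(s for s in m.keys() | r.keys() if m.get(s) != r.get(s))


# Toy data: the replica lacks the rev_id range reported in comment 16.
master = {rev_id: f"row {rev_id}" for rev_id in range(38674160, 38674190)}
replica = {k: v for k, v in master.items() if not 38674172 <= k <= 38674177}

print(differing_chunks(master, replica))
```

Only the chunk containing the gap is flagged, so a follow-up row-by-row comparison needs to look at one key range rather than the whole table.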
[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853 --- Comment #17 from Bawolff (Brian Wolff) bawolff...@gmail.com ---
(In reply to comment #16)
> db1009 `revision` is missing rev_ids 38674172 - 38674177. Several other tables checksummed (recentchanges, logging, page) also show small numbers of missing rows and/or missing updates. Therefore db1009 is now doing a full checksum using the Percona tools.
If page_latest for the vote page is wrong for that slave, it would explain the other symptom people are experiencing (wrong version coming up).
[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853 --- Comment #18 from Sean Pringle sprin...@wikimedia.org --- Can confirm some db1009 `page` records have wrong page_latest.
[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853 Bawolff (Brian Wolff) bawolff...@gmail.com changed:
What|Removed|Added
Summary|On a voting page on nl-wiki users get shown an older page and/or edit in an older page|nlwiki logged in users shown wrong version of page, possible db corruption
[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853 MZMcBride b...@mzmcbride.com changed: What|Removed |Added Priority|Unprioritized |High
[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853 Greg Grossmeier g...@wikimedia.org changed: What|Removed |Added CC||afeld...@wikimedia.org, sprin...@wikimedia.org --- Comment #9 from Greg Grossmeier g...@wikimedia.org --- cc'ing Sean and Asher based on Brian's db slave corruption theory. This is a pretty unfun looking issue. Also, I can confirm the flipping between showing Dqfn13 and not on the last url in comment 8.
[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853 Ariel T. Glenn ar...@wikimedia.org changed: What|Removed |Added CC||ar...@wikimedia.org --- Comment #10 from Ariel T. Glenn ar...@wikimedia.org ---
select * from revision where rev_page = 8838 order by rev_id desc limit 140;
on db1002, the last five revision ids are, in order:
    38674247
    38674177
    38674166
    38674128
    38674117
on db1009, the last five revision ids are:
    38674247
    38674166
    38674128
    38674117
    38674097
and that is because:
    mysql:wikiadmin@db1009 [nlwiki] select * from revision where rev_id = 38674177;
    Empty set (0.01 sec)
whereas on all other slaves (that show up in the noc dbtree listing) the record is present. Not sure how to investigate the cause of the corruption though.
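The comparison in comment 10 can be reproduced mechanically. The sketch below takes the two five-row result lists quoted above and reports ids that one host has and the other lacks, looking only at the id range both lists cover (otherwise the oldest row on db1009 would be reported as missing from db1002 merely because of the five-row window). The function name is hypothetical.

```python
def missing_ids(reference, candidate):
    """Ids in `reference` absent from `candidate`, within the shared range."""
    floor = max(min(reference), min(candidate))
    ref = {i for i in reference if i >= floor}
    cand = {i for i in candidate if i >= floor}
    return sorted(ref - cand)


# Last five rev_ids for rev_page = 8838, as quoted in comment 10.
db1002 = [38674247, 38674177, 38674166, 38674128, 38674117]
db1009 = [38674247, 38674166, 38674128, 38674117, 38674097]

print(missing_ids(db1002, db1009))  # [38674177]
print(missing_ids(db1009, db1002))  # []
```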
[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853 --- Comment #11 from Ariel T. Glenn ar...@wikimedia.org --- Note that according to https://wikitech.wikimedia.org/wiki/Server_Admin_Log we had "13:59 paravoid: powercycling db1009, down for 20h, kernel prints BUG: soft lockup" on Aug 11; maybe something didn't quite recover right from that. (Timestamp of the revision in question is 20130810173157)