[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption

2013-09-09 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853

Sean Pringle sprin...@wikimedia.org changed:

           What|Removed                     |Added

         Status|ASSIGNED                    |RESOLVED
     Resolution|---                         |FIXED

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption

2013-08-18 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853

--- Comment #23 from Sean Pringle sprin...@wikimedia.org ---
Right, sorry, s/page_latest/page_random/. Coffee was wearing off.

http://www.percona.com/doc/percona-toolkit/2.2/pt-table-sync.html#cmdoption-pt-table-sync--float-precision



[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption

2013-08-18 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853

--- Comment #24 from MZMcBride b...@mzmcbride.com ---
It seems like the cause of this issue has been identified (comment 11) and the
issue has been resolved (no further reports of the issue and the relevant DB
has been re-synced and re-pooled).

I'm not sure there's much else to be done here. Can this bug report be marked
resolved/fixed?



[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption

2013-08-17 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853

--- Comment #19 from Sean Pringle sprin...@wikimedia.org ---
db1009 has been synced with the s2 master and returned to the pool.

The specific errors identified here (the missing rev ids, page_latest, etc)
have been spot-checked and are intact. If anyone spots a recurrence, please
shout.

One small observation: At least some of the original page_latest differences I
observed were due to a difference in FLOAT/DOUBLE precision which threw off the
checksum utility. Rounding to a fixed precision reduced the mismatches
considerably, but did not eliminate them. Therefore I can't say definitively
whether the symptom in comment 17 (wrong version) is due to this bug; just keep
it in mind...
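
One way a FLOAT/DOUBLE difference could upset a checksum is sketched below. This is a hedged illustration, not the behaviour of the actual checksum utility: the sample value, the text rendering of the row, and the helper names `as_float32`, `render_row` and `row_checksum` are all assumptions made for the example.

```python
import struct
import zlib


def as_float32(x):
    """Round-trip a Python float (a double) through 32-bit storage."""
    return struct.unpack("f", struct.pack("f", x))[0]


def render_row(page_id, page_random, precision=None):
    """Render a row as text, optionally rounding the float column first."""
    if precision is not None:
        page_random = round(page_random, precision)
    return f"{page_id}|{page_random!r}"


def row_checksum(page_id, page_random, precision=None):
    """CRC32 over the rendered row text."""
    return zlib.crc32(render_row(page_id, page_random, precision).encode())


value = 0.123456789012          # hypothetical page_random value
on_master = value               # held at double precision
on_replica = as_float32(value)  # same value after losing precision

# The raw renderings differ, so a checksum over them would normally differ too.
raw_text_equal = render_row(8838, on_master) == render_row(8838, on_replica)

# Rounding both sides to a fixed precision makes the rows compare equal again.
rounded_match = (row_checksum(8838, on_master, precision=5)
                 == row_checksum(8838, on_replica, precision=5))
print(raw_text_equal, rounded_match)
```

Rounding only hides differences smaller than the chosen precision, which fits the observation that it reduced the mismatches without eliminating the genuine ones.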



[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption

2013-08-17 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853

--- Comment #20 from Bawolff (Brian Wolff) bawolff...@gmail.com ---
(In reply to comment #19)

> One small observation: At least some of the original page_latest differences
> I observed were due to a difference in FLOAT/DOUBLE precision which threw off
> the checksum utility. Rounding to a fixed precision reduced the mismatches
> considerably, but did not eliminate them. Therefore I can't say definitively
> whether the symptom in comment 17 (wrong version) is due to this bug; just
> keep it in mind...

What? page_latest should be an integer. It's a foreign key corresponding to
rev_id.



[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption

2013-08-17 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853

--- Comment #21 from Ariel T. Glenn ar...@wikimedia.org ---
page_random was different in some of the rows; I guess that's the float vs.
double issue mentioned above. Separately, page_latest was different in some
rows as well, although not nearly as many.



[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption

2013-08-17 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853

Platonides platoni...@gmail.com changed:

           What|Removed                     |Added

             CC|                            |platoni...@gmail.com

--- Comment #22 from Platonides platoni...@gmail.com ---
page_random is only set on page insertion.

I also find it strange that the tool noticed differences due to float/double.
I would expect it to consistently read the same data (no matter if float or
double) on both systems...



[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption

2013-08-15 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853

--- Comment #12 from Gerrit Notification Bot gerritad...@wikimedia.org ---
Change 79178 had a related patch set uploaded by Springle:
remove db1009 from action while investigating bug 52853

https://gerrit.wikimedia.org/r/79178



[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption

2013-08-15 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853

--- Comment #13 from Gerrit Notification Bot gerritad...@wikimedia.org ---
Change 79178 merged by Springle:
remove db1009 from action while investigating bug 52853

https://gerrit.wikimedia.org/r/79178



[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption

2013-08-15 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853

p858snake p858sn...@gmail.com changed:

           What|Removed                     |Added

         Status|PATCH_TO_REVIEW             |NEW
             CC|                            |p858sn...@gmail.com

--- Comment #14 from p858snake p858sn...@gmail.com ---
(Back to NEW; the patch only removes a db box from the rotation.)



[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption

2013-08-15 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853

Sean Pringle sprin...@wikimedia.org changed:

           What|Removed                     |Added

         Status|NEW                         |ASSIGNED

--- Comment #15 from Sean Pringle sprin...@wikimedia.org ---
db1009 is depooled for investigation and, eventually, resync. Please shout if
anyone still sees the issue.



[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption

2013-08-15 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853

--- Comment #16 from Sean Pringle sprin...@wikimedia.org ---
db1009 `revision` is missing rev_ids 38674172 - 38674177. Several other
tables checksummed (recentchanges, logging, page) also show small numbers of
missing rows and/or missing updates.

Therefore db1009 is now doing a full checksum using the percona tools. 

Based on missing row timestamps, the powercycle event seems the probable cause. 

However, I don't know yet how replication managed to *avoid* dying unless
some funky disk syncing happened. Unfortunately relay logs don't go back that
far (precisely because the SQL thread thought all was fine and deleted them
after processing).
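
The idea of locating drift by checksumming chunks of rows can be sketched as follows. This is a conceptual illustration only and does not reproduce how the percona tools are implemented; the row payloads, the chunk size, and the function names `chunk_checksums` and `differing_chunks` are assumptions, while the missing rev_id range is the one reported above.

```python
import zlib


def chunk_checksums(rows, chunk_size):
    """Map each rev_id bucket to a (row count, CRC32) pair.

    `rows` is an iterable of (rev_id, payload) tuples. Buckets are fixed-width
    rev_id ranges, so both servers group rows identically.
    """
    sums = {}
    for rev_id, payload in sorted(rows):
        bucket = rev_id - rev_id % chunk_size
        count, crc = sums.get(bucket, (0, 0))
        crc = zlib.crc32(f"{rev_id}|{payload}".encode(), crc)
        sums[bucket] = (count + 1, crc)
    return sums


def differing_chunks(master_rows, replica_rows, chunk_size=10):
    """Return the bucket starts whose count or checksum differs."""
    a = chunk_checksums(master_rows, chunk_size)
    b = chunk_checksums(replica_rows, chunk_size)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


# Hypothetical data: the master holds rev_ids 38674160..38674189 and the
# replica lacks 38674172..38674177 (the range reported missing on db1009).
master = [(i, "text") for i in range(38674160, 38674190)]
replica = [r for r in master if not 38674172 <= r[0] <= 38674177]

print(differing_chunks(master, replica))
```

Only the chunk containing the gap is flagged, so a follow-up row-by-row comparison can be limited to that range instead of the whole table.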



[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption

2013-08-15 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853

--- Comment #17 from Bawolff (Brian Wolff) bawolff...@gmail.com ---
(In reply to comment #16)
> db1009 `revision` is missing rev_ids 38674172 - 38674177. Several other
> tables checksummed (recentchanges, logging, page) also show small numbers of
> missing rows and/or missing updates.
>
> Therefore db1009 is now doing a full checksum using the percona tools.

If page_latest for the vote page is wrong on that slave, it would explain the
other symptom people are experiencing (wrong version coming up).



[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption

2013-08-15 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853

--- Comment #18 from Sean Pringle sprin...@wikimedia.org ---
Can confirm some db1009 `page` records have wrong page_latest.
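
A rough consistency check for stale page_latest values might look like the sketch below. It uses an in-memory SQLite database with a two-column simplification of the `page` and `revision` tables and hypothetical rows; it also assumes page_latest should equal the highest rev_id for the page, which is usually but not necessarily true, so hits are candidates for inspection rather than proven corruption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_latest INTEGER);
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER);
    -- Hypothetical rows: page 8838 is consistent, page 1 is stale.
    INSERT INTO page VALUES (8838, 38674247), (1, 100);
    INSERT INTO revision VALUES (38674247, 8838), (38674166, 8838),
                                (100, 1), (101, 1);
""")

# Pages whose page_latest is not the newest revision recorded for them.
suspect = conn.execute("""
    SELECT p.page_id, p.page_latest, MAX(r.rev_id) AS newest_rev
    FROM page p JOIN revision r ON r.rev_page = p.page_id
    GROUP BY p.page_id, p.page_latest
    HAVING p.page_latest <> MAX(r.rev_id)
    ORDER BY p.page_id
""").fetchall()
print(suspect)
```

Running the same query on a known-good server and comparing the results would separate pages that are legitimately unusual from those that differ only on the suspect server.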



[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption

2013-08-14 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853

Bawolff (Brian Wolff) bawolff...@gmail.com changed:

           What|Removed                     |Added

        Summary|On a voting page on nl-wiki |nlwiki logged in users
               |users get shown an older    |shown wrong version of
               |page and/or edit in an      |page, possible db
               |older page                  |corruption



[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption

2013-08-14 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853

MZMcBride b...@mzmcbride.com changed:

           What|Removed                     |Added

       Priority|Unprioritized               |High



[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption

2013-08-14 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853

Greg Grossmeier g...@wikimedia.org changed:

           What|Removed                     |Added

             CC|                            |afeld...@wikimedia.org,
               |                            |sprin...@wikimedia.org

--- Comment #9 from Greg Grossmeier g...@wikimedia.org ---
cc'ing Sean and Asher based on Brian's db slave corruption theory. This is a
pretty unfun looking issue.

Also, I can confirm the flipping between showing Dqfn13 and not on the last url
in comment 8.



[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption

2013-08-14 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853

Ariel T. Glenn ar...@wikimedia.org changed:

           What|Removed                     |Added

             CC|                            |ar...@wikimedia.org

--- Comment #10 from Ariel T. Glenn ar...@wikimedia.org ---
select * from revision where rev_page = 8838 order by rev_id desc limit 140;

on db1002, the last five revision ids are, in order: 
38674247  38674177  38674166  38674128  38674117

on db1009, the last five revision ids are:
38674247  38674166  38674128  38674117  38674097

and that is because:
mysql:wikiadmin@db1009 [nlwiki] select * from revision where rev_id = 38674177;
Empty set (0.01 sec)

whereas on all other slaves (that show up in the noc dbtree listing) the record
is present.

Not sure how to investigate the cause of the corruption though.
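
The comparison above can be reproduced mechanically. The sketch below uses only the revision ids quoted in this comment; the helper name `missing_on_replica` is made up for the example, and the comparison is limited to the id range both result sets cover so that an id which merely fell outside one server's window is not reported.

```python
# Revision ids for rev_page = 8838, newest first, as quoted above.
db1002_ids = [38674247, 38674177, 38674166, 38674128, 38674117]
db1009_ids = [38674247, 38674166, 38674128, 38674117, 38674097]


def missing_on_replica(reference, replica):
    """Return ids present in `reference` but absent from `replica`.

    Only ids at or above the oldest id seen in BOTH lists are compared.
    """
    floor = max(min(reference), min(replica))
    replica_set = set(replica)
    return sorted(i for i in reference if i >= floor and i not in replica_set)


missing = missing_on_replica(db1002_ids, db1009_ids)
print(missing)
```

With these inputs the only id reported is 38674177, the row that returns an empty set on db1009.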



[Bug 52853] nlwiki logged in users shown wrong version of page, possible db corruption

2013-08-14 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=52853

--- Comment #11 from Ariel T. Glenn ar...@wikimedia.org ---
Note that according to https://wikitech.wikimedia.org/wiki/Server_Admin_Log we
had

 13:59 paravoid: powercycling db1009, down for 20h, kernel prints BUG: soft lockup

on Aug 11; maybe something didn't quite recover right from that. (The timestamp
of the revision in question is 20130810173157.)
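
To see how the revision timestamp relates to the outage, the two times can be compared directly. This is a rough sketch: it assumes both the log entry and the revision timestamp are in UTC, takes the year from the date of this thread, and treats "down for 20h" as exactly 20 hours, which the log does not guarantee.

```python
from datetime import datetime, timedelta

# MediaWiki stores timestamps as YYYYMMDDHHMMSS strings.
rev_time = datetime.strptime("20130810173157", "%Y%m%d%H%M%S")

# From the Server Admin Log entry; the exact 20h figure is an assumption.
powercycle = datetime(2013, 8, 11, 13, 59)
approx_down_since = powercycle - timedelta(hours=20)

# Positive gap means the revision was written before the estimated hang.
gap = approx_down_since - rev_time
print(rev_time, approx_down_since, gap)
```

Under those assumptions the revision was written roughly 27 minutes before the estimated start of the outage, which is at least consistent with the theory that recent writes were lost around the soft lockup.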
