[Bug 51254] tag_summary missing records

2014-02-03 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51254

Sam Reed (reedy) s...@reedyboy.net changed:

   What|Removed |Added

 Blocks||40867

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 51254] tag_summary missing records

2013-10-18 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51254

Bartosz Dziewoński matma@gmail.com changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED
   Assignee|jforres...@wikimedia.org|sprin...@wikimedia.org

--- Comment #20 from Bartosz Dziewoński matma@gmail.com ---
That was bug 52077. Closing this.


[Bug 51254] tag_summary missing records

2013-07-25 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51254

--- Comment #18 from Steven Walling swall...@wikimedia.org ---
(In reply to comment #16)
> Much better, but I'm still seeing some issues:
>
> Looking for 500 blanking tags gives 498 blanking plus 2 labeled as just
> mobile edit.
>
> http://en.wikipedia.org/w/api.php?action=query&list=recentchanges&rctag=blanking&rclimit=500&rcprop=user%7Ccomment%7Ctitle%7Ctags%7Ctimestamp

There are other strange things going on with tags... 

http://en.wikipedia.org/wiki/Wikipedia_talk:Tags#Incorrect_tagging

Not sure if it's related or if we should file a separate bug for incorrect
tagging. I think mobile is also suffering from this issue (or was as of
yesterday).


[Bug 51254] tag_summary missing records

2013-07-25 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51254

--- Comment #19 from Bartosz Dziewoński matma@gmail.com ---
Whatever is causing that (maybe just a misconfigured local filter?), it's most
likely not related to this bug.


[Bug 51254] tag_summary missing records

2013-07-17 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51254

--- Comment #17 from Robert Rohde ro...@robertrohde.com ---
As a follow up, the two problematic tags I note in Comment 16 are both recent. 
It is possible they have a different underlying cause than the previous
corruption.  For example, this might represent a logic error in how the mobile
edit tag is being recorded.


[Bug 51254] tag_summary missing records

2013-07-16 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51254

--- Comment #12 from Sean Pringle sprin...@wikimedia.org ---
As Robert suggested in comment 8, the rebuild process missed some rows where
revisions had multiple tags.

The script has been fixed and will run in batches on enwiki today. More info
shortly...


[Bug 51254] tag_summary missing records

2013-07-16 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51254

--- Comment #13 from Sean Pringle sprin...@wikimedia.org ---
Btw, change_tag still looks complete to me; the binlog shows no problems there.
Should just be the tag_summary rebuild logic at fault.


[Bug 51254] tag_summary missing records

2013-07-16 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51254

--- Comment #14 from Sean Pringle sprin...@wikimedia.org ---
Rebuild #2 of tag_summary has completed and the reports in comment 8 look
better (to me). Anyone care to verify...


[Bug 51254] tag_summary missing records

2013-07-16 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51254

--- Comment #15 from James Forrester jforres...@wikimedia.org ---
(In reply to comment #14)
> Rebuild #2 of tag_summary has completed and the reports in comment 8 look
> better (to me). Anyone care to verify...

Appears to work for me, yes. Might be worth waiting for others to weigh in, but
from my POV this is fixed.


[Bug 51254] tag_summary missing records

2013-07-16 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51254

--- Comment #16 from Robert Rohde ro...@robertrohde.com ---
Much better, but I'm still seeing some issues:

Looking for 500 blanking tags gives 498 blanking plus 2 labeled as just
mobile edit.

http://en.wikipedia.org/w/api.php?action=query&list=recentchanges&rctag=blanking&rclimit=500&rcprop=user%7Ccomment%7Ctitle%7Ctags%7Ctimestamp


[Bug 51254] tag_summary missing records

2013-07-15 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51254

--- Comment #9 from Andre Klapper aklap...@wikimedia.org ---
(In reply to comment #8)
> An API query for 200 revisions tagged as blanking:
> While this query returns 200 entries, we find that only 188 of them report as
> actually having the blanking tag.

That's still the case today.


[Bug 51254] tag_summary missing records

2013-07-15 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51254

Greg Grossmeier g...@wikimedia.org changed:

   What|Removed |Added

   Priority|Highest |High
   Assignee|sprin...@wikimedia.org  |jforres...@wikimedia.org

--- Comment #10 from Greg Grossmeier g...@wikimedia.org ---
Lowering priority a bit since I don't think there is data loss here (the table
that was used to recreate the data still exists).

James: Assigning to you to determine the priority for getting around to fixing
this data (since it affects VE related data, and you know what metrics are
being tracked).


[Bug 51254] tag_summary missing records

2013-07-15 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51254

--- Comment #11 from Sean Pringle sprin...@wikimedia.org ---
Am investigating whether the tag_summary rebuild was conceptually flawed with
regard to revisions with multiple tags, or not.

Also dumping enwiki binlogs on a slave (we have a month's worth) and pulling
out all change_tag queries. Will reload them offline and join against a copy of
change_tag to prove whether it is, in fact, completely intact.
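
The offline check described here amounts to an anti-join: replay the change_tag
writes recovered from the binlogs and look for any that have no counterpart in
the copy of change_tag. Below is a minimal sketch using Python's sqlite3 as a
stand-in for MySQL; the table names `replayed` and `live`, the function name,
and the reduction of each row to (ct_rc_id, ct_tag) are hypothetical
simplifications, and parsing the binlogs is not shown.

```python
import sqlite3


def missing_change_tag_rows(replayed, live):
    """Return replayed (ct_rc_id, ct_tag) rows that are absent from live.

    'replayed' stands in for rows rebuilt from binlog INSERTs and 'live' for
    a copy of change_tag (hypothetical, simplified two-column rows).
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE replayed (ct_rc_id INTEGER, ct_tag TEXT)")
    db.execute("CREATE TABLE live (ct_rc_id INTEGER, ct_tag TEXT)")
    db.executemany("INSERT INTO replayed VALUES (?, ?)", replayed)
    db.executemany("INSERT INTO live VALUES (?, ?)", live)
    # Anti-join: keep replayed rows that have no matching live row.
    rows = db.execute("""
        SELECT r.ct_rc_id, r.ct_tag
        FROM replayed AS r
        LEFT JOIN live AS l
          ON l.ct_rc_id = r.ct_rc_id AND l.ct_tag = r.ct_tag
        WHERE l.ct_rc_id IS NULL
        ORDER BY r.ct_rc_id, r.ct_tag
    """).fetchall()
    db.close()
    return rows
```

An empty result only speaks for the window the binlogs cover (a month here),
not for older rows.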


[Bug 51254] tag_summary missing records

2013-07-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51254

--- Comment #1 from Bartosz Dziewoński matma@gmail.com ---
Only seems to affect en.wp right now (works correctly on pl.wp and mw.org, for
example).


[Bug 51254] tag_summary missing records

2013-07-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51254

Greg Grossmeier g...@wikimedia.org changed:

   What|Removed |Added

   Priority|Unprioritized   |Highest
 CC||g...@wikimedia.org
   Severity|normal  |major


[Bug 51254] tag_summary missing records

2013-07-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51254

Rob Lanphier ro...@wikimedia.org changed:

   What|Removed |Added

 CC||ro...@wikimedia.org

--- Comment #2 from Rob Lanphier ro...@wikimedia.org ---
Sean and Asher narrowed this down to a problem with the schema change tool that
we use, and are working on a strategy to fix the data.  This looks like it's
strictly a db-related problem that, once fixed, should stay fixed (assuming we
don't try another similar schema migration before an upstream fix is made to
the migration tool).

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 51254] tag_summary missing records

2013-07-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51254

Rob Lanphier ro...@wikimedia.org changed:

   What|Removed |Added

   Assignee|wikibugs-l@lists.wikimedia.org |sprin...@wikimedia.org


[Bug 51254] tag_summary missing records

2013-07-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51254

--- Comment #3 from Bartosz Dziewoński matma@gmail.com ---
Was it determined if any other databases apart from en.wp's one were affected?


[Bug 51254] tag_summary missing records

2013-07-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51254

--- Comment #4 from Sam Reed (reedy) s...@reedyboy.net ---
(In reply to comment #3)
> Was it determined if any other databases apart from en.wp's one were
> affected?

Not sure. The wikis that potentially may have this issue are:

+   'arwiki' => true,
+   'commonswiki' => true,
+   'cswiki' => true,
+   'dewiki' => true,
+   'elwiki' => true,
+   'enwiki' => true,
+   'enwikisource' => true,
+   'enwiktionary' => true,
+   'eswiki' => true,
+   'etwiki' => true,
+   'fawiki' => true,
+   'fiwiki' => true,
+   'frwiki' => true,
+   'hewiki' => true,
+   'huwiki' => true,
+   'idwiki' => true,
+   'itwiki' => true,
+   'jawiki' => true,
+   'ltwiki' => true,
+   'mrwiki' => true,
+   'nlwiki' => true,
+   'plwiki' => true,
+   'ptwiki' => true,
+   'rowiki' => true,
+   'ruwiki' => true,
+   'simplewiki' => true,
+   'svwiki' => true,
+   'trwiki' => true,
+   'ukwiki' => true,
+   'zhwiki' => true,

Cf. bug 40867#c6.


[Bug 51254] tag_summary missing records

2013-07-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51254

--- Comment #5 from Sean Pringle sprin...@wikimedia.org ---
Firstly, we've determined this problem occurred due to an (apparent) bug in
pt-online-schema-change (posc) when using a combination of:

- A table without primary key
- A table with unique indexes that all include nullable columns
- An unfortunately timed REPLACE statement in normal db traffic

Posc does online table alteration by:

- Creating a copy of the table with altered schema
- Setting triggers on the original table to keep the copy updated
- Copying data across using a batch process

In this case, posc set a DELETE trigger on tag_summary using a poor UNIQUE
index (ts_log_id) with low cardinality and a nullable field. Then during the
batching process, an external REPLACE statement with ts_log_id=NULL caused far
too many rows to be deleted in the temporary table being altered. Given that
many rows in tag_summary have ts_log_id=NULL, the table was massively reduced
in size.
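
A minimal sketch of this failure mode, using Python's sqlite3 in place of
MySQL. It assumes, as the comment implies, that the DELETE trigger matched rows
in the copy on ts_log_id alone using a null-safe comparison (MySQL's <=>,
spelled IS in SQLite). The trigger names, the single-batch copy and the
trimmed-down schema are hypothetical; this is not the SQL that
pt-online-schema-change actually generates.

```python
import sqlite3


def rows_after_replace(n_rows):
    """Return (rows in tag_summary, rows in the copy) after one REPLACE."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE tag_summary (
            ts_rc_id  INTEGER,
            ts_log_id INTEGER,
            ts_rev_id INTEGER,
            ts_tags   TEXT
        );
        -- Unique but nullable: like MySQL, SQLite allows many NULLs here.
        CREATE UNIQUE INDEX ts_log_id ON tag_summary (ts_log_id);
    """)
    # Ordinary edits: an rc/rev id but no log id (hypothetical data).
    db.executemany(
        "INSERT INTO tag_summary VALUES (?, NULL, ?, 'mobile edit')",
        [(i, i) for i in range(1, n_rows + 1)],
    )
    db.executescript("""
        -- 1. Copy of the table (the actual ALTER is omitted).
        CREATE TABLE _tag_summary_new AS SELECT * FROM tag_summary WHERE 0;
        -- 2. Triggers keep the copy in sync. The DELETE trigger matches on
        --    ts_log_id with null-safe IS, so a deleted row whose log id is
        --    NULL matches every NULL row in the copy.
        CREATE TRIGGER osc_del AFTER DELETE ON tag_summary BEGIN
            DELETE FROM _tag_summary_new WHERE ts_log_id IS OLD.ts_log_id;
        END;
        CREATE TRIGGER osc_ins AFTER INSERT ON tag_summary BEGIN
            INSERT INTO _tag_summary_new
            VALUES (NEW.ts_rc_id, NEW.ts_log_id, NEW.ts_rev_id, NEW.ts_tags);
        END;
        -- 3. Batch copy (a single batch here).
        INSERT INTO _tag_summary_new SELECT * FROM tag_summary;
    """)
    # A REPLACE acts as delete plus insert; spelled out explicitly because
    # SQLite fires DELETE triggers on REPLACE only with recursive_triggers on.
    with db:
        db.execute("DELETE FROM tag_summary WHERE ts_rc_id = 1")
        db.execute(
            "INSERT INTO tag_summary VALUES (1, NULL, 1, 'mobile edit,blanking')"
        )
    original = db.execute("SELECT COUNT(*) FROM tag_summary").fetchone()[0]
    copy = db.execute("SELECT COUNT(*) FROM _tag_summary_new").fetchone()[0]
    db.close()
    return original, copy
```

With 1,000 ordinary edits, the original still holds 1,000 rows while the copy
holds only the re-inserted one; swapping the copy in at the end of the
migration would lose nearly every row with a NULL ts_log_id, which matches the
shrinkage described above.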

Now to the fix:

We've checked the other wikis and found no problems; only enwiki was affected.

Furthermore, only enwiki.tag_summary was affected. We've verified that
enwiki.change_tag is complete and did not suffer the same problem. This was
based on:

- Index cardinality and table size information collected before running the
schema migration
- An investigation of the events in the binary log surrounding the migration
period

Currently we are rebuilding tag_summary based on change_tag data. That will
complete within 30 mins at the time of writing this comment.


[Bug 51254] tag_summary missing records

2013-07-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51254

--- Comment #6 from Sean Pringle sprin...@wikimedia.org ---
enwiki.tag_summary rebuild is complete.


[Bug 51254] tag_summary missing records

2013-07-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51254

Steven Walling swall...@wikimedia.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Steven Walling swall...@wikimedia.org ---
Just checked this on-wiki as well. Seems fixed.


[Bug 51254] tag_summary missing records

2013-07-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51254

Robert Rohde ro...@robertrohde.com changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 CC||ro...@robertrohde.com
 Resolution|FIXED   |---

--- Comment #8 from Robert Rohde ro...@robertrohde.com ---
Sorry to add to what I'm sure was a bit of a hectic day for someone, but I'm
still seeing lingering bits of corruption.  Perhaps some sort of edge case that
wasn't handled correctly by the rebuild?  99.9% of tags may be okay at this
point, but here are some examples that still seem to be errors.

An API query for 200 revisions tagged as blanking:

http://en.wikipedia.org/w/api.php?action=query&list=recentchanges&rctag=blanking&rclimit=200&rcprop=user%7Ccomment%7Ctitle%7Ctags%7Ctimestamp|ids&rccontinue=2013-07-12T22:20:40Z|589061595

While this query returns 200 entries, we find that only 188 of them report as
actually having the blanking tag.

The remainder are things like:
  rcid=590123889 timestamp=2013-07-12T14:30:16Z
  <tag>visualeditor</tag>

  rcid=590032703 timestamp=2013-07-12T00:33:31Z
  <tag>mobile edit</tag>

Where some other tag is reported but the expected blanking tag is not
reported.
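
The discrepancy described above (a tag-filtered query returning entries that do
not themselves report that tag) can be checked mechanically. A minimal sketch,
assuming the API's format=json output, where each entry under
query.recentchanges carries its tags as a list under "tags"; the function names
are hypothetical and the HTTP request itself is left out.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def recentchanges_url(tag, limit):
    """Build a list=recentchanges query filtered to one tag, JSON output."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rctag": tag,
        "rclimit": limit,
        # urlencode turns each "|" into %7C, as in the URLs quoted above.
        "rcprop": "user|comment|title|tags|timestamp|ids",
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def entries_missing_tag(response, tag):
    """Return rcids the tag filter returned whose own tags omit that tag."""
    changes = response["query"]["recentchanges"]
    return [rc["rcid"] for rc in changes if tag not in rc.get("tags", [])]
```

Building the URL with urlencode also avoids the "&" separators being lost, as
happened to the links in this thread.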

For another example of this issue see the API query for the
visualeditor-needcheck tag:

http://en.wikipedia.org/w/api.php?action=query&list=recentchanges&rctag=visualeditor-needcheck&rclimit=200&rcprop=user%7Ccomment%7Ctitle%7Ctags%7Ctimestamp|ids

This tag should only be applied if the visualeditor tag is also present, but
we observe that most of the results have either visualeditor or
visualeditor-needcheck but not both.  A few entries even have other tags
entirely.
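
The visualeditor-needcheck rule stated above is an invariant that can be tested
per entry: every result of an rctag=visualeditor-needcheck query should carry
both tags. A short sketch over an assumed entry shape (dicts with "rcid" and a
"tags" list, as in the API's JSON output); the function and constant names are
hypothetical.

```python
# Tags every entry of an rctag=visualeditor-needcheck result should carry.
REQUIRED_TAGS = {"visualeditor-needcheck", "visualeditor"}


def needcheck_violations(changes):
    """Return rcids that lack either of the two required tags."""
    return [
        rc["rcid"]
        for rc in changes
        if not REQUIRED_TAGS <= set(rc.get("tags", []))
    ]
```

The subset test catches all three symptoms described: only visualeditor, only
visualeditor-needcheck, or some other tag entirely.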


What appears to have happened is that the rebuild didn't correctly handle cases
where a single revision was subject to multiple tags.  Instead it looks as
though the rebuilt table applies at most one tag to each of the historical
revisions.  Most of the time that's okay since few revisions actually have
multiple tags, but it still leaves a bit of corruption and missing data on the
rare cases when a revision is expected to have multiple tags.
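
The suspected flaw can be shown in miniature: a rebuild that writes one tag per
change keeps only whichever tag it saw last, while a correct rebuild gathers
all of a change's tags. This sketch assumes ts_tags holds a comma-separated
list of tags; the (rc_id, tag) input shape and the function names are
hypothetical, and this is not the actual rebuild script used here.

```python
from collections import defaultdict


def rebuild_one_tag(change_tag):
    """Flawed shape: one summary value per change; later tags overwrite."""
    summary = {}
    for rc_id, tag in change_tag:
        summary[rc_id] = tag
    return summary


def rebuild_all_tags(change_tag):
    """Fixed shape: gather every tag per change, store comma-separated."""
    tags = defaultdict(set)
    for rc_id, tag in change_tag:
        tags[rc_id].add(tag)
    return {rc_id: ",".join(sorted(ts)) for rc_id, ts in tags.items()}


# Hypothetical change_tag rows: (ct_rc_id, ct_tag).
SAMPLE = [
    (1, "blanking"), (1, "mobile edit"),
    (2, "visualeditor"), (2, "visualeditor-needcheck"),
]
```

On SAMPLE, the flawed shape yields {1: "mobile edit", 2:
"visualeditor-needcheck"}, reproducing both symptoms above (blanking lost,
needcheck without visualeditor). In SQL terms the fixed shape corresponds to
grouping change_tag by the change id and aggregating with GROUP_CONCAT(ct_tag).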
