https://bugzilla.wikimedia.org/show_bug.cgi?id=39011
Web browser: ---
Bug #: 39011
Summary: redirect, pagelinks and page.page_is_redirect table
information out of sync
Product: Wikimedia
Version: unspecified
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: General/Unknown
AssignedTo: [email protected]
ReportedBy: [email protected]
Classification: Unclassified
Mobile Platform: ---
I imported the page, pagelinks and redirect tables from the 20120702 dump in
order to analyze the pagelink structure. For this i need to resolve redirects
and found some weirdness in the process:
There are some pages in the redirect table which don't have the
page.page_is_redirect flag set:
select * from redirect join page on rd_from=page_id where page_is_redirect !=
1 limit 100;
Results in 23 rows: http://p.defau.lt/?smGJYbXXlqwLPhreqbe8VA
Another weirdness: there are pages marked as page.page_is_redirect but don't
appear in the redirect table, even though that might be explained here:
http://www.mediawiki.org/wiki/Manual:Redirect_table (see the NOTE as of Aug
2007)
select * from page left join redirect on page_id=rd_from where
page_is_redirect != 0 and rd_from is null limit 100;
Results in 5 rows: http://p.defau.lt/?pwJZMIVtTGGZYb69o8YWxg
Matthew_ from #wikimedia-tech ran these queries on the toolserver and seems the
same problems exist in production.
Yet another weirdness can be found in the pagelinks table: there are ~25000
pagelinks for redirect pages which don't just link to the redirected page but
to other pages alone in the article namespace:
select count(1) from page inner join redirect on page_id = rd_from inner join
pagelinks on rd_from=pl_from and rd_namespace=pl_namespace where pl_namespace =
0 and rd_title != pl_title;
+----------+
| count(1) |
+----------+
| 25168 |
+----------+
1 row in set (1 hour 44 min 35.16 sec)
Would be cool if someone could have a look.
--
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l