https://bugzilla.wikimedia.org/show_bug.cgi?id=56840
Web browser: ---
Bug ID: 56840
Summary: Special:Allpages is too slow!
Product: MediaWiki
Version: unspecified
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: Special pages
Assignee: [email protected]
Reporter: [email protected]
Classification: Unclassified
Mobile Platform: ---
The following query was pulled from ishmael:
/* SpecialAllpages::showToplevel */ select page_title from `page` where
page_namespace = ? and page_is_redirect = ? and (page_title >= ?) and
(page_title >= ?) order by page_title limit ?
Real sample:
SELECT /* SpecialAllpages::showToplevel 108.35.125.176 */ page_title FROM
`page` WHERE page_namespace = '0' AND page_is_redirect = '0' AND (page_title
>= '1887') AND (page_title >= 'Centennial_Trail_State_Park') ORDER BY
page_title ASC LIMIT 86786,2
Hundreds of concurrent instances of this query have been seen. Each touched ~9M
rows, took over 100s, and had problematic clauses like that amazing LIMIT. They
are not duplicates: the page_title values and limit offsets differ (though the
limit count is always 2).
Furthermore, batches of this query have been increasing in frequency.
The query itself comes from:
http://git.wikimedia.org/blob/mediawiki%2Fcore.git/ceba5987b0d36f134ffcd9e4f704d1608fe88b79/includes%2Fspecials%2FSpecialAllpages.php#L225
including that wonderful 'limit ...,2'.
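To make the cost concrete, here is a minimal sketch (using sqlite3 and a
simplified, hypothetical `page` table, not the real MediaWiki schema) of the
difference between `LIMIT offset,2` paging and keyset ("seek") pagination,
which resumes from the last title seen instead of making the engine walk past
every skipped row:

```python
import sqlite3

# Hypothetical simplified schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (page_title TEXT PRIMARY KEY,"
             " page_namespace INTEGER, page_is_redirect INTEGER)")
conn.executemany("INSERT INTO page VALUES (?, 0, 0)",
                 [(f"Title_{i:05d}",) for i in range(1000)])

# OFFSET-based paging: the engine must step over `offset` rows on every request,
# so cost grows with how far into the list the client is.
offset_page = conn.execute(
    "SELECT page_title FROM page WHERE page_namespace = 0 "
    "ORDER BY page_title LIMIT 2 OFFSET 500").fetchall()

# Keyset paging: seek directly past the last title already seen; only the next
# two index entries are read, regardless of position in the list.
last_seen = "Title_00499"
keyset_page = conn.execute(
    "SELECT page_title FROM page WHERE page_namespace = 0 "
    "AND page_title > ? ORDER BY page_title LIMIT 2", (last_seen,)).fetchall()

assert offset_page == keyset_page  # same rows, very different cost at scale
```

Both queries return the same two rows; only the OFFSET variant pays for the
86,786 rows it throws away.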
It has this wonderful comment (at
http://git.wikimedia.org/blob/mediawiki%2Fcore.git/ceba5987b0d36f134ffcd9e4f704d1608fe88b79/includes%2Fspecials%2FSpecialAllpages.php#L175):
# TODO: Either make this *much* faster or cache the title index points
# in the querycache table.
...so that should probably be done.
My best guess is that some third-party crawler is trying to obtain the titles
of all articles by paging through Special:Allpages --- and, as we found out the
hard way, generating the list that way is O(N^3), where N is how many pages
into the list you are.
If we "cach[e] the title index points", we can get the complexity down to
O(N^2), which is still bad, but an improvement.
To make each page request amortized O(1), we need to generate the index points
for the entire list of articles in a single traversal (possibly done in chunks,
possibly done by a background task).
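The single-traversal idea can be sketched as follows. This is a hypothetical
illustration, not the querycache implementation: walk the titles once in sorted
order and record the first title of each chunk, so a later request for chunk k
is just a cache lookup (the chunk size of 345 here is an assumption):

```python
def build_index_points(titles, chunk=345):
    """One ordered pass over all titles; cache the title that starts each
    chunk so serving a page becomes a lookup instead of an OFFSET scan.

    titles: an iterable of page titles already sorted by page_title.
    chunk:  assumed page size; purely illustrative.
    """
    points = []
    for i, title in enumerate(titles):
        if i % chunk == 0:  # first title of each chunk is an index point
            points.append(title)
    return points

# Usage: a request for chunk k only needs points[k] (and points[k+1] as the
# exclusive upper bound for its range query).
points = build_index_points((f"Page_{i:04d}" for i in range(10)), chunk=4)
# points == ["Page_0000", "Page_0004", "Page_0008"]
```

Running the traversal in chunks or from a background task only changes how the
pass is scheduled, not the single-scan cost.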
--
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l