I think that the problem, at least with this specific query, is with
the query structure. page_namespace and rev_user are both already
indexed in the page and revision tables, but the query planner doesn't
optimize the join "correctly" (maybe coz the page_namespace index is
long). Nevertheless, the query is slow because there isn't a user_page
table and you have to dig in the huge revision table, so if you
explicitly dump all the rev_page data into a derived table that might
help your query speed.
For example, try this alternate version:
SELECT COUNT(*), p.page_namespace
FROM page p
JOIN (SELECT DISTINCT rev_page FROM revision WHERE rev_user = XXXXX) r
ON (r.rev_page = p.page_id)
GROUP BY p.page_namespace
ORDER BY p.page_namespace
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: <derived2>
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 14202
Extra: Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: page
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: r.rev_page
rows: 1
Extra:
*************************** 3. row ***************************
id: 2
select_type: DERIVED
table: revision
type: ref
possible_keys: user_timestamp
key: user_timestamp
key_len: 4
ref:
rows: 79640
Extra: Using where; Using index; Using temporary
-darren
On 15 September 2008, at 2:18 PM, Aryeh Gregor wrote:
> On Mon, Sep 15, 2008 at 1:48 PM, Victor Vasiliev <[EMAIL PROTECTED]>
> wrote:
>> Is it possible to locally create some indexes specific for
>> toolserver?
>> As for now, it will help to seriously speed up most tools. E.g. index
>> (rev_user,rev_page) will speed up all editcounters, which usually
>> uses
>> query like
>> SELECT COUNT(*), page_namespace FROM page JOIN revision ON page_id =
>> rev_page WHERE rev_user = 37880 GROUP BY page_namespace ORDER BY
>> page_namespace;
>
> Why would that index speed up the access noticeably? There's already
> an index on (rev_user, rev_timestamp) (assuming that the toolserver
> has stock indexes), and rev_page is in the leaf nodes.
>
> If optimization is needed here, a root should install the microslow
> patch and do an analysis of what's taking up the most time, and see if
> those can be indexed or otherwise improved. At any rate, an ALTER
> TABLE would require considerable downtime, since as far as I know
> there's only one toolserver database server, so you can't just throw
> all the load onto the other slaves while you take one out of rotation.
> (Or are there multiples these days?)
>
> _______________________________________________
> Toolserver-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/toolserver-l
explain
SELECT COUNT(*), p.page_namespace
FROM page p
JOIN revision r ON (p.page_id = r.rev_page)
WHERE r.rev_user = 37880
GROUP BY p.page_namespace
ORDER BY p.page_namespace
\G
[EMAIL PROTECTED]:~$ sql enwiki_p < test.sql
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: revision
type: ref
possible_keys: PRIMARY,page_timestamp,user_timestamp
key: user_timestamp
key_len: 4
ref: const
rows: 1
Extra: Using index; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: page
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: enwiki.revision.rev_page
rows: 1
Extra:
_______________________________________________
Toolserver-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/toolserver-l