I think that the problem, at least with this specific query, is with  
the query structure. page_namespace and rev_user are both already  
indexed in the page and revision tables, but the query planner doesn't  
optimize the join "correctly" (maybe coz the page_namespace index is  
long). Nevertheless, the query is slow because there isn't a user_page  
table and you have to dig in the huge revision table, so if you  
explicitly dump all the rev_page data into a derived table that might  
help your query speed.

For example, try this alternate version:

SELECT  COUNT(*), p.page_namespace
FROM    page p
JOIN (SELECT DISTINCT rev_page FROM revision WHERE rev_user = XXXXX) r  
ON (r.rev_page = p.page_id)
GROUP BY p.page_namespace
ORDER BY p.page_namespace

*************************** 1. row ***************************
            id: 1
   select_type: PRIMARY
         table: <derived2>
          type: ALL
possible_keys: NULL
           key: NULL
       key_len: NULL
           ref: NULL
          rows: 14202
         Extra: Using temporary; Using filesort
*************************** 2. row ***************************
            id: 1
   select_type: PRIMARY
         table: page
          type: eq_ref
possible_keys: PRIMARY
           key: PRIMARY
       key_len: 4
           ref: r.rev_page
          rows: 1
         Extra:
*************************** 3. row ***************************
            id: 2
   select_type: DERIVED
         table: revision
          type: ref
possible_keys: user_timestamp
           key: user_timestamp
       key_len: 4
           ref:
          rows: 79640
         Extra: Using where; Using index; Using temporary

-darren


On 15 September 2008, at 2:18 PM, Aryeh Gregor wrote:

> On Mon, Sep 15, 2008 at 1:48 PM, Victor Vasiliev <[EMAIL PROTECTED]>  
> wrote:
>> Is it possible to locally create some indexes specific for  
>> toolserver?
>> As for now, it will help to seriously speed up most tools. E.g. index
>> (rev_user,rev_page) will speed up all editcounters, which usually  
>> uses
>> query like
>> SELECT COUNT(*), page_namespace FROM page JOIN revision ON page_id =
>> rev_page WHERE rev_user = 37880 GROUP BY page_namespace ORDER BY
>> page_namespace;
>
> Why would that index speed up the access noticeably?  There's already
> an index on (rev_user, rev_timestamp) (assuming that the toolserver
> has stock indexes), and rev_page is in the leaf nodes.
>
> If optimization is needed here, a root should install the microslow
> patch and do an analysis of what's taking up the most time, and see if
> those can be indexed or otherwise improved.  At any rate, an ALTER
> TABLE would require considerable downtime, since as far as I know
> there's only one toolserver database server, so you can't just throw
> all the load onto the other slaves while you take one out of rotation.
> (Or are there multiples these days?)
>
> _______________________________________________
> Toolserver-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/toolserver-l

explain
SELECT  COUNT(*), p.page_namespace
FROM    page p
        JOIN revision r ON (p.page_id = r.rev_page)
WHERE   r.rev_user = 37880
GROUP BY p.page_namespace
ORDER BY p.page_namespace
\G
[EMAIL PROTECTED]:~$ sql enwiki_p < test.sql
*************************** 1. row ***************************
            id: 1
   select_type: SIMPLE
         table: revision
          type: ref
possible_keys: PRIMARY,page_timestamp,user_timestamp
           key: user_timestamp
       key_len: 4
           ref: const
          rows: 1
         Extra: Using index; Using temporary; Using filesort
*************************** 2. row ***************************
            id: 1
   select_type: SIMPLE
         table: page
          type: eq_ref
possible_keys: PRIMARY
           key: PRIMARY
       key_len: 4
           ref: enwiki.revision.rev_page
          rows: 1
         Extra:


_______________________________________________
Toolserver-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/toolserver-l

Reply via email to