On Mon, Sep 15, 2008 at 8:54 PM, Darren Hardy <[EMAIL PROTECTED]> wrote:
> I think that the problem, at least with this specific query, is with
> the query structure. page_namespace and rev_user are both already
> indexed in the page and revision tables, but the query planner doesn't
> optimize the join "correctly" (maybe coz the page_namespace index is
> long). Nevertheless, the query is slow because there isn't a user_page
> table and you have to dig in the huge revision table, so if you
> explicitly dump all the rev_page data into a derived table that might
> help your query speed.

Well, to begin with, who says this query is slow?

mysql> SELECT COUNT(*), page_namespace FROM page JOIN revision ON
page_id = rev_page WHERE rev_user = 158371 GROUP BY page_namespace
ORDER BY page_namespace;
+----------+----------------+
| COUNT(*) | page_namespace |
+----------+----------------+
|     2893 |              0 |
|      478 |              1 |
|      578 |              2 |
|      628 |              3 |
|     1925 |              4 |
|      699 |              5 |
|       42 |              6 |
|       14 |              7 |
|       85 |              9 |
|      275 |             10 |
|      116 |             11 |
|        1 |             12 |
|      104 |             14 |
|       17 |             15 |
|        5 |            100 |
+----------+----------------+
15 rows in set (0.17 sec)

That's certainly fast enough for an occasional statistics-gathering query.

I don't see why you think it's not optimizing the join correctly.  The
execution procedure is to scan through the appropriate range in the
user_timestamp index on the revision table, and join each retrieved
row to the primary key of the page table, maintaining in-memory counts
(in a temporary table) of how many edits there are for each namespace.
 I don't see what a superior execution strategy would be here.

What relevance could an index on page_namespace have?  Are you
suggesting that the query should try selecting from the page table
first, and joining to the revision table?

> For example, try this alternate version:
>
> SELECT  COUNT(*), p.page_namespace
> FROM    page p
> JOIN (SELECT DISTINCT rev_page FROM revision WHERE rev_user = XXXXX) r
> ON (r.rev_page = p.page_id)
> GROUP BY p.page_namespace
> ORDER BY p.page_namespace

It seems to be almost identical in speed, testing on enwiki_p with
user id 1385729 (BetacommandBot, 700k edits).  It's doing exactly the
same thing, as far as I can tell, except that it's storing the result
of the revision query in a temporary table first -- which could cause
trouble if the table gets too large.  Both take about 2.5-3 s on a hot
cache.

_______________________________________________
Toolserver-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/toolserver-l

Reply via email to