On Sat, Nov 21, 2009 at 8:45 PM, Aryeh Gregor
<[email protected]> wrote:
> Yeah, pretty much.  If you had an index on (page_is_redirect,
> page_namespace, page_title) or such, that would speed it up
> significantly (at least in MySQL).  It would have to scan through the
> whole index, but that only contains three columns plus a row id of
> some kind, so it should be quite a lot faster than scanning the whole
> table.

Quite a lot?  The theoretical max would be about twice as fast, as
(page_is_redirect, page_namespace, page_title) is going to take up at
least half as much space as the whole page table.  But I'm not sure
even that theoretical max could be reached by MySQL.
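For illustration, here's the covering-index idea in miniature. This uses SQLite rather than MySQL, and a made-up subset of the page table's columns, so it's a sketch of the technique, not a benchmark of either engine:

```python
import sqlite3

# In-memory toy version of the page table (hypothetical subset of columns;
# page_text stands in for the wide columns that make a full table scan slow).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE page (
    page_id INTEGER PRIMARY KEY,
    page_namespace INTEGER,
    page_title TEXT,
    page_is_redirect INTEGER,
    page_text TEXT)""")
conn.executemany(
    "INSERT INTO page (page_namespace, page_title, page_is_redirect, page_text)"
    " VALUES (?,?,?,?)",
    [(0, "Foo", 0, "x" * 100), (0, "Bar", 1, "y" * 100), (1, "Baz", 0, "z" * 100)])

# Covering index: every column the query touches is in the index,
# so the engine never has to visit the table rows at all.
conn.execute(
    "CREATE INDEX idx_redirect ON page (page_is_redirect, page_namespace, page_title)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT page_namespace, page_title"
    " FROM page WHERE page_is_redirect = 1").fetchall()
print(plan)  # SQLite's plan mentions a COVERING INDEX: no row lookups needed

rows = conn.execute(
    "SELECT page_namespace, page_title FROM page WHERE page_is_redirect = 1").fetchall()
print(rows)
```

The point is the same as above: the index holds only the three narrow columns, so scanning it touches far fewer pages than scanning the table.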

I know it wouldn't be reached by PostgreSQL, which would still do a
sequential scan through the table.  If you clustered on
page_is_redirect you'd save yourself from having to go through the
parts of the table which were redirects, but you're still stuck with a
sequential scan.

In either database, if you really wanted the absolute fastest
solution, you'd create a materialized view for exactly that query.
But as you said, "this isn't a likely query for optimization."
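Neither MySQL nor Postgres has native materialized views (as of this writing), so "materialized view" here means a precomputed table you refresh by hand. A minimal sketch, again in SQLite with hypothetical names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page"
             " (page_namespace INTEGER, page_title TEXT, page_is_redirect INTEGER)")
conn.executemany("INSERT INTO page VALUES (?,?,?)",
                 [(0, "Foo", 0), (0, "Bar", 1), (1, "Baz", 0)])

# Hand-rolled "materialized view": run the expensive query once and store
# the result; readers hit the small precomputed table instead of page.
conn.execute("""CREATE TABLE non_redirect_titles AS
    SELECT page_namespace, page_title FROM page WHERE page_is_redirect = 0""")

# The full scan is paid at refresh time, not at query time.
titles = conn.execute(
    "SELECT page_title FROM non_redirect_titles ORDER BY page_title").fetchall()
print(titles)
```

The catch, of course, is keeping the precomputed table in sync with page, which is why it only makes sense for queries you actually care about optimizing.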

>> Yes, it can be very hard to switch your DBMS, and that's a very good
>> thing for MySQL.  :)
>
> Let's not have a DBMS flame war here, please.

Aww, c'mon, just a little light ribbing...  I couldn't resist, you set
me right up for it.

I'm actually in the process of trying to import enwiki into a
PostgreSQL database right now.  Attempt 1 was to import everything
into a MySQL database (with no indexes), export it as a TSV file, then
import from the TSV file into PostgreSQL.  I hit a snag with some data
that Postgres says isn't valid UTF-8, which is probably due to
something I did wrong with the import, but I can't figure out what it
is.
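For what it's worth, a quick way to find the offending bytes before Postgres rejects the whole file is to decode it line by line and report where decoding fails. A sketch (the demo file and its contents are made up):

```python
import os
import tempfile

def find_invalid_utf8(path):
    """Return (line_number, byte_offset, bad_bytes) for each non-UTF-8 line."""
    bad = []
    with open(path, "rb") as f:
        for lineno, line in enumerate(f, start=1):
            try:
                line.decode("utf-8")
            except UnicodeDecodeError as e:
                # e.start is the byte offset of the first invalid byte.
                bad.append((lineno, e.start, line[e.start:e.start + 4]))
    return bad

# Demo on a small file: line 2 contains a lone 0xFF byte, which is never
# valid in UTF-8; line 3 is a valid two-byte sequence (é).
with tempfile.NamedTemporaryFile(delete=False, suffix=".tsv") as f:
    f.write(b"good line\n" + b"bad \xff line\n" + b"ok \xc3\xa9\n")
    path = f.name

problems = find_invalid_utf8(path)
print(problems)
os.unlink(path)
```

Running something like this over the TSV export should at least tell you which rows to look at, and whether the bad bytes look like latin-1 leaking through or genuine corruption.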

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
