On Sat, Nov 21, 2009 at 8:45 PM, Aryeh Gregor <[email protected]> wrote:
> Yeah, pretty much.  If you had an index on (page_is_redirect,
> page_namespace, page_title) or such, that would speed it up
> significantly (at least in MySQL).  It would have to scan through the
> whole index, but that only contains three columns plus a row id of
> some kind, so it should be quite a lot faster than scanning the whole
> table.
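(For anyone following along: the suggestion above is an index-only, or "covering index", scan. A quick sketch of the idea using SQLite, purely because it's in the Python stdlib — the columns follow MediaWiki's page table, but the index name and the stand-in extra column are made up:)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE page (
        page_id INTEGER PRIMARY KEY,
        page_namespace INTEGER,
        page_title TEXT,
        page_is_redirect INTEGER,
        page_touched TEXT   -- stands in for the many other, wider columns
    )
""")
conn.execute("""
    CREATE INDEX page_redirect_ns_title
        ON page (page_is_redirect, page_namespace, page_title)
""")

# The query references only indexed columns, so the planner can answer
# it from the index alone, never touching the (much wider) table rows.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT COUNT(*) FROM page WHERE page_is_redirect = 1
""").fetchall()
print(plan[0][3])
```

The plan line reports a scan "USING COVERING INDEX", which is the whole point: the cost is proportional to the size of the three-column index, not the full table.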
Quite a lot? The theoretical max would be about twice as fast, as
(page_is_redirect, page_namespace, page_title) is going to take up at
least half as much space as the whole page table. But I'm not sure even
that theoretical max could be reached by MySQL. I know it wouldn't be
reached by PostgreSQL, which would still do a sequential scan through
the table. If you clustered on page_is_redirect you'd save yourself from
having to go through the parts of the table which were redirects, but
you're still stuck with a sequential scan.

In either database, if you really wanted the absolute fastest solution,
you'd create a materialized view for exactly that query. But as you
said, "this isn't a likely query for optimization."

>> Yes, it can be very hard to switch your DBMS, and that's a very good
>> thing for MySQL. :)
>
> Let's not have a DBMS flame war here, please.

Aww, c'mon, just a little light ribbing... I couldn't resist, you set me
right up for it.

I'm actually in the process of trying to import enwiki into a
PostgreSQL database right now. Attempt 1 was to import everything into
a MySQL database (with no indexes), export it as a TSV file, then
import from the TSV file into PostgreSQL. I hit a snag with some data
that Postgres says isn't valid UTF-8, which is probably due to
something I did wrong with the import, but I can't figure out what it
is.

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
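P.S. In case it helps anyone hitting the same snag: a minimal sketch in Python for locating exactly which lines of a TSV dump fail to decode as UTF-8, so they can be inspected before Postgres rejects the whole COPY. (The function name is mine; point it at whatever your export file is called.)

```python
def bad_utf8_lines(path):
    """Yield (line_number, raw_bytes) for every line that isn't valid UTF-8."""
    with open(path, "rb") as f:  # read raw bytes; decoding is the test itself
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                yield lineno, raw

# usage (file name is hypothetical):
#   for lineno, raw in bad_utf8_lines("enwiki-page.tsv"):
#       print(lineno, raw[:80])
```

Stray latin1 bytes from a mis-configured mysqldump connection are a common source of these; seeing the actual offending bytes usually makes the cause obvious.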
