On Sat, Nov 21, 2009 at 6:39 PM, Aryeh Gregor <[email protected]> wrote:
> Selecting a list of all titles that are not redirects will take a
> long time on any database, unless you have everything in memory,
> because it requires a table scan -- there's no index that covers the
> relevant columns (IIRC).
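For concreteness, the query under discussion can be reproduced in miniature with SQLite. This is only a sketch: the `page` table and its `page_title`/`page_is_redirect` columns follow the MediaWiki schema shipped in the dump, but the sample rows and the index name `page_redirect` are made up for illustration.

```python
# Toy reproduction of the point above: a minimal "page" table modeled
# on the MediaWiki schema, the "all non-redirect titles" query, and
# the plan chosen before and after adding an index on page_is_redirect.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE page (
        page_id INTEGER PRIMARY KEY,
        page_title TEXT NOT NULL,
        page_is_redirect INTEGER NOT NULL DEFAULT 0
    )
""")
con.executemany(
    "INSERT INTO page (page_title, page_is_redirect) VALUES (?, ?)",
    [("Foo", 0), ("Bar", 0), ("Foo_redirect", 1), ("Baz", 0)],
)

query = "SELECT page_title FROM page WHERE page_is_redirect = 0"

# Without any index on page_is_redirect, the only possible plan is a
# full table scan (the detail column mentions a SCAN of page).
plan_before = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(plan_before)

# Adding the index makes an index plan *available*, but on a
# low-selectivity column -- where almost every row matches -- a
# sequential scan is usually cheaper, so the planner may well
# ignore the index, as the reply below argues for PostgreSQL.
con.execute("CREATE INDEX page_redirect ON page (page_is_redirect)")
plan_after = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(plan_after)

titles = [t for (t,) in con.execute(query)]
print(titles)  # ['Foo', 'Bar', 'Baz']
```

Either way the result set contains nearly every row of the table, which is why no index can make the query cheap.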
You could build an index on page_is_redirect in the "page" table (see
enwiki-*-page.sql.gz). But I'm pretty sure PostgreSQL wouldn't use it
and would do a sequential scan anyway, since nearly all the pages have
to be accessed regardless. Five or ten minutes sounds about right.

I can't imagine this is a query you want to run over and over again.
If it is, you'd probably want to use partitioning
(http://www.postgresql.org/docs/8.4/interactive/ddl-partitioning.html),
but you're still not going to cut the query time down very much, since
the query is going to return millions of rows either way.

> It's also worth pointing out that Wikipedia uses a version of MySQL
> with substantial modifications, and Wikimedia sysadmins are very
> familiar with its behavior. Switching to a new technology might
> theoretically be better in the long term (although I wouldn't take
> that for granted in this case), but the transition cost would be
> substantial. Heck, Wikipedia hasn't even upgraded to MySQL 4.1, let
> alone a whole different DBMS.

Yes, it can be very hard to switch your DBMS, and that's a very good
thing for MySQL. :)

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
