On Fri, May 15, 2009 at 4:22 AM, Tisza Gergő <[email protected]> wrote:
> Would it be very expensive to have a separate (namespace, title, sortkey)
> table, and join on that for queries that need sorting?
You would have to scan the *entire* table you're joining from (which may be
hundreds of millions of rows). Not a possibility.

On Fri, May 15, 2009 at 5:47 AM, Tisza Gergő <[email protected]> wrote:
> Coding the first or second type of collation rule seems relatively simple,
> and already a huge gain. (Also, RFC 3454 might be worth checking out as it
> has language-independent rules for more than diacritics.)

I agree.

> You can have a separate raw_sortkey column if that's a large concern.

That would still mean an UPDATE of many millions of rows. Plus you'd add
another column to a table that's already very large: categorylinks is
~40,000,000 rows on enwiki, and that's an extra 40m varchar(255)s clogging up
the buffer pool even though they're never going to be used except for the
occasional update.

> Anyway, this is the same for any solution that does not rely on MySQL
> collation: when the rules change, you need to update the relevant column in
> the database.

Correct. In fact, when MySQL's rules change you also have to rebuild the
index, AFAIK.

> What are the chances that we get decent MySQL collation in the close future
> (say, next few years)?

If we don't upgrade, I'd say about 0%. :) Even if we do, there are still the
uniqueness problems, and the non-BMP problem (characters outside the Basic
Multilingual Plane). So not very good, I'd say, for our purposes. (That's not
to say MySQL collation isn't decent for other purposes.)

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
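For illustration, here is a minimal sketch of the kind of simple, language-independent collation rule discussed above, computed in application code instead of relying on MySQL collation: decompose the title with Unicode NFKD, drop the combining marks (diacritics), then case-fold. The function name make_sortkey is hypothetical. This is not MediaWiki's code and not the stringprep profile from RFC 3454; it only shows the general shape of such a rule, under those assumptions.

```python
import unicodedata


def make_sortkey(title: str) -> str:
    """Build a simple, language-independent sort key for a page title.

    Hypothetical rules, for illustration only:
      1. NFKD-decompose, so accented letters become base letter plus
         combining mark(s) and compatibility forms (e.g. ligatures)
         are expanded.
      2. Drop the combining marks, i.e. ignore diacritics.
      3. Case-fold, so comparisons are case-insensitive.
    """
    decomposed = unicodedata.normalize("NFKD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


if __name__ == "__main__":
    titles = ["Zebra", "école", "Eagle", "Ésope", "apple"]
    # Plain code-point order puts every uppercase ASCII letter before every
    # lowercase one, and sorts accented letters after "z".
    print(sorted(titles))
    # Sorting by the derived key interleaves them alphabetically.
    print(sorted(titles, key=make_sortkey))
```

Two things to note: if make_sortkey() ever changes, every key already stored in the database is stale and has to be rewritten with an UPDATE, and distinct titles can share a key ("Apple" and "apple" both map to "apple"), so a key like this cannot be assumed unique on its own.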
