On 5/12/09 1:49 PM, Aryeh Gregor wrote:
> On Tue, May 12, 2009 at 4:38 PM, Brion Vibber <[email protected]> wrote:
>> * Collation use for sorting needs to be double-checked to confirm it
>> wouldn't interfere with present uniqueness constraints
>
> Since cl_sortkey isn't part of any unique key, this appears not to be
> an issue for this use. Of course, it's an issue for every other
> sorted list of titles, but those can't have custom sort keys specified
> to begin with and don't seem to be included in this proposal. Perhaps
> they should be, though. In that case we'd probably end up needing an
> extra column in every single table that includes the page title, just
> for sorting (but we'd be able to use flexible algorithms to generate
> the sort key, rather than being stuck with MySQL's).
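To make the idea of generating our own sort key a bit more concrete, here is a rough Python sketch. Everything in it is hypothetical: `make_sort_key` and `sorted_listing` are illustrative names, and the folding rules (case-folding plus stripping diacritics for a primary level, with the folded decomposed title as a secondary tie-breaker) are a crude stand-in for a real, language-tailored UCA, not what MediaWiki or MySQL actually does. The key is raw bytes with a 0x00 level separator, so it isn't displayable text, and case variants of a title still produce identical keys, which is why the listing also orders by page ID.

```python
import unicodedata


def make_sort_key(title: str) -> bytes:
    """Hypothetical two-level sort key (a toy, NOT the real UCA).

    Primary level: case-folded title with combining marks removed.
    Secondary level: case-folded, decomposed title, so accented forms
    sort after unaccented ones with the same primary weight.
    """
    folded = unicodedata.normalize("NFD", title.casefold())
    base = "".join(ch for ch in folded if not unicodedata.combining(ch))
    primary = base.encode("utf-8")
    secondary = folded.encode("utf-8")
    # 0x00 sorts below any byte in a UTF-8 title, so comparing whole keys
    # bytewise compares the primary level first, then the secondary level.
    return primary + b"\x00" + secondary


def sorted_listing(rows):
    """Order (page_id, title) rows by (sort key, page_id).

    Different titles can share a key (e.g. "Apple" / "apple"), so the
    page ID is used as a tie-breaker to keep the order total.
    """
    return sorted(rows, key=lambda row: (make_sort_key(row[1]), row[0]))


rows = [(3, "apple"), (1, "Apple"), (2, "Äpfel"), (4, "Banana")]
for page_id, title in sorted_listing(rows):
    print(page_id, title)
```

Python compares `bytes` lexicographically, much as a binary column would; in a real deployment the key would presumably live in cl_sortkey or an extra indexed binary column.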
As a general issue we also need to consider managing paging through
collation-sorted lists, since different inputs may produce the same sort
key. At the moment I think category lists are paged by offset (bad!),
but we should make sure this is planned for.

>> * Multilingual sites possibly not well served by table-wide
>> language-specific coding
>
> utf8 sorting would be a lot better than binary sorting for any site,
> I'm pretty sure. (I assume utf8 sorts sanely and not according to
> codepoint.)

Well, "utf8" doesn't tell you anything specific there... :) There's a
"general" collation (utf8_general_ci) as well as a "binary" one
(utf8_bin); the latter would be the same as what we do now (except for
not supporting 4-byte characters AT ALL):

http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html

For a multilingual site we'd probably end up using utf8_unicode_ci,
which at least partially implements the Unicode Collation Algorithm
(UCA). That's a bit confusing, since even a glance at
http://www.unicode.org/reports/tr10/ makes it quite explicit that
collation properties are language-dependent... presumably that's an
untailored version which won't have most language-specific properties.

>> Doing our own localized sort key encoding and adding another indexed
>> column to sort on would avoid some dependency issues but has its own
>> deployment and maintenance difficulties.
>
> You don't need another column for categorylinks, you can use the
> existing cl_sortkey, so that should be relatively easy to deploy. It
> doesn't help with non-category use cases, of course.

You would if you need to store a processed sort key that's not in the
form of displayable characters (e.g., the output of the UCA).

>> It would also be possible to use a separate column for the collated
>> sorting while using MySQL 4.1+'s native collations, if the uniqueness
>> constraints are a problem, but this is still dependent on rolling out an
>> upgrade from 4.0.
>
> In that case we may as well make it like cl_sortkey and populate it
> ourselves, surely.

For the special case of categorylinks, yes. For everything else, there's
no existing column to reuse, so we'd have to add one.

-- brion
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
