https://bugzilla.wikimedia.org/show_bug.cgi?id=164
--- Comment #200 from Philippe Verdy <verd...@wanadoo.fr> 2010-07-26 22:44:37 UTC ---

Note that if your collator stub does not really compute true sort keys (compacted in binary format and representing collation weights), it may return spaces (U+0020), which are represented by the byte 0x20 in the UTF-8 encoding. The separator used must be lower than this encoded character, so the VARCHAR(N) database field has to accept that separator character (a value below 0x20).

If it does not, you may offset all the UTF-8 bytes returned by your stub by 1 (this cannot overflow, because UTF-8 never uses the byte value 0xFF), so that you'll be able to use the byte value 0x20 for the separator (the UTF-8 encoded SPACE will be stored in the sortkey field as 0x21 after it has been offset). This does not matter because sortkeys are opaque, so this is simple to do in the stub.

The stored sort key field DOES NOT have to use the UTF-8 encoding explicitly; it must just accept binary encoded bytes that can fit a non-Unicode character. If the database wants you to specify a charset for this binary sortable field, use ISO-8859-1, as long as the database will not alter the binary order of ISO-8859-1 when computing ORDER BY clauses.

I really hope that we will always have the possibility of using VARBINARY(N) fields allowing compact random byte values, because it will be much more efficient for our use of opaque binary sortable sort keys (which are just streams of bytes).

Also be careful about the sign of bytes (i.e. their value range), otherwise bytes 0x80..0xFF will sort before 0x00..0x7F in the ORDER BY clause. Bytes should be ordered as unsigned values if the field is declared to use the ISO 8859-1 encoding with its binary order.

Which SQL backends (and dialects) do you support in MediaWiki? This may help me understand some development constraints. I know you support MySQL, but are there simpler backends like BerkeleyDB, dBase files, or flat text files for small projects, supported only via some ODBC driver with a PHP interface?
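To illustrate the byte-offset idea, here is a minimal sketch in Python (not MediaWiki code). The names `stub_sort_key`, `offset_key` and `stored_sort_key` are hypothetical, and the stub simply returns raw UTF-8 bytes instead of true collation weights, which is the situation described above.

```python
SEPARATOR = 0x20  # usable as a separator once every key byte has been shifted up by 1


def stub_sort_key(text: str) -> bytes:
    """Hypothetical collator stub: raw UTF-8 bytes, not real collation weights."""
    return text.encode("utf-8")


def offset_key(key: bytes) -> bytes:
    """Shift every byte up by 1 so that no key byte is <= 0x20 for a SPACE."""
    if 0xFF in key:
        # Cannot happen for valid UTF-8, which never uses the byte 0xFF.
        raise ValueError("byte 0xFF cannot be offset without overflow")
    return bytes(b + 1 for b in key)


def stored_sort_key(parts: list[str]) -> bytes:
    """Join the offset keys of each part with the 0x20 separator."""
    return bytes([SEPARATOR]).join(offset_key(stub_sort_key(p)) for p in parts)


if __name__ == "__main__":
    # SPACE (0x20) inside a part is stored as 0x21, above the separator.
    print(stored_sort_key(["a b", "c"]).hex(" "))
    # Python compares bytes as unsigned values, like a binary ORDER BY should.
    print(stored_sort_key(["a", "z"]) < stored_sort_key(["a b", "a"]))
```

With the offset applied, the part "a" sorts before "a b" whatever follows the separator, because the separator 0x20 is lower than the stored SPACE 0x21. Without the offset, the two bytes would be equal and the comparison would fall through to the next part.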