https://bugzilla.wikimedia.org/show_bug.cgi?id=164

--- Comment #200 from Philippe Verdy <verd...@wanadoo.fr> 2010-07-26 22:44:37 UTC ---
Note that if your collator stub does not really compute true sort keys
(compacted in a binary format and representing collation weights), it may
return spaces (U+0020), which are represented by the byte 0x20 in the UTF-8
encoding. The separator must sort lower than this encoded character, so the
VARCHAR(N) database field has to accept the separator character.

If it does not, you may offset all the UTF-8 bytes returned by your stub by 1
(this is safe because UTF-8 never uses the byte value 0xFF), so that you will
be able to use the byte value 0x20 for the separator (the UTF-8 encoded SPACE
will be stored in the sortkey field as 0x21 after it has been offset).
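
The offset described above can be sketched in a few lines of Python. This is
only a minimal illustration, not MediaWiki code: the names shift_key and
join_fields are hypothetical, and joining several fields with the separator is
an assumption made to show the ordering. It also assumes the stub never
returns control characters below U+0020, since the byte 0x1F would become 0x20
after the shift and collide with the separator.

```python
SEPARATOR = 0x20  # free for use once every key byte has been shifted up by 1


def shift_key(text):
    """Encode text as UTF-8 and add 1 to every byte (0x20 becomes 0x21)."""
    raw = text.encode("utf-8")
    if any(b < 0x20 for b in raw):
        # Assumption: the stub never emits control characters, because
        # 0x1F + 1 would equal the separator byte.
        raise ValueError("control characters would collide with the separator")
    # UTF-8 never produces 0xFF, so b + 1 always fits in one byte.
    return bytes(b + 1 for b in raw)


def join_fields(*fields):
    """Join the shifted fields with the separator byte (hypothetical layout)."""
    return bytes([SEPARATOR]).join(shift_key(f) for f in fields)


titles = [("a b", "x"), ("a", "z"), ("ab", "y")]
keys = sorted(join_fields(*t) for t in titles)
```

With these inputs the key for ("a", "z") sorts first, because the separator
0x20 is lower than the shifted space 0x21 that follows "a" in ("a b", "x").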

This does not matter because sort keys are opaque, so this is simple to do in
the stub. The stored sort key field DOES NOT have to use the UTF-8 encoding
explicitly; it just has to accept binary-encoded bytes, each of which fits in
a non-Unicode character. If the database wants you to specify a charset for
this binary sortable field, use ISO-8859-1, as long as the database will not
alter the binary order of ISO-8859-1 when computing ORDER BY clauses.
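
The reason ISO-8859-1 is a workable fallback can be checked with a short
Python sketch: every byte value 0x00..0xFF maps to exactly one character, and
the code point order equals the byte order. This only demonstrates Python's
"latin-1" codec; whether a given database keeps that binary order for ORDER BY
depends on the collation it applies and is not shown here. The sample keys are
made up for illustration.

```python
# Every possible byte value survives a round trip through ISO-8859-1.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("latin-1")
round_trip = as_text.encode("latin-1")

# Sorting the decoded strings by code point gives the same order as
# sorting the raw bytes, since each byte maps to the same code point.
keys = [b"\x80", b"\x21\xff", b"\x21", b"\x7f"]
by_bytes = sorted(keys)
by_text = [s.encode("latin-1") for s in sorted(k.decode("latin-1") for k in keys)]
```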

I really hope that we will always have the possibility of using VARBINARY(N)
fields, which allow compact arbitrary byte values, because that will be much
more efficient for our use of opaque binary sortable sort keys (which are just
streams of bytes).

Also be careful about the signedness of bytes (i.e. their value range);
otherwise bytes 0x80..0xFF will sort before 0x00..0x7F in the ORDER BY clause.
Bytes should be compared as unsigned values if the field is declared to use
the ISO 8859-1 encoding with its binary order.
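
The effect of byte signedness on ordering can be reproduced with a small
Python sketch. The helper as_signed is hypothetical and only mimics a backend
that compares bytes as signed 8-bit integers; Python itself always compares
bytes as unsigned values.

```python
def as_signed(key):
    """Reinterpret each byte as a signed 8-bit integer (0x80 becomes -128)."""
    return [b - 256 if b >= 0x80 else b for b in key]


keys = [b"\x7f", b"\x80", b"\x00", b"\xff"]

# Unsigned comparison: the order wanted for opaque sort keys.
unsigned_order = sorted(keys)

# Signed comparison: 0x80..0xFF incorrectly sorts before 0x00..0x7F.
signed_order = sorted(keys, key=as_signed)
```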

Which SQL backends (and dialects) do you support in MediaWiki? This may help
me understand some development constraints. I know you support MySQL, but are
there simpler backends such as BerkeleyDB, dBase files, or flat text files for
small projects, supported only via some ODBC driver with a PHP interface?

