On Mon, Apr 16, 2007 at 10:04:55AM +0200, Petko Yotov wrote:
> On Sunday 15 April 2007 23:16, Patrick R. Michaud wrote:
> > Searches on sites using utf-8 are now performed case-insensitively
> > for accented characters.
>
> Hello Patrick.
>
> There is a problem with the $CaseConversions array, line 214:
>
>   "\xc9\xbd" => "\x171\xa4",

You're correct.  It's now fixed for the next release.

> I tried to find what is written in the source [1] but could not find the
> sequence C9BD.

"\xc9\xbd" is a UTF-8 sequence, while the UnicodeData.txt file gives
codepoint values (not UTF-8).  So, \xc9\xbd (UTF-8) is the encoding
for U+027D.

The uppercase conversion of U+027D is U+2C64.  The codepoint U+2C64
requires a 3-byte UTF-8 encoding, but the program I wrote to translate
codepoints to UTF-8 was set up to only handle 1-byte and 2-byte
conversions, so it mis-encoded the sequence as \x171.

So, the uppercase conversion of U+027D (\xc9\xbd in UTF-8) is
U+2C64 (\xe2\xb1\xa4 in UTF-8).

I had already caught the 3-byte instances elsewhere in the tables,
but apparently missed this one.

Thanks for catching it!

Pm
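
For anyone who wants to check table entries like this by hand, here is
a minimal sketch of the codepoint-to-UTF-8 translation described above.
The original translation program is not shown in this thread, so the
function names are hypothetical, and naive_two_byte_escape is only one
guess at how a 2-byte-only encoder could produce "\x171\xa4". It is
not a reconstruction of the actual script.

```python
def utf8_escape(codepoint: int) -> str:
    """Encode a Unicode codepoint as UTF-8, written as "\\xNN" escapes."""
    if codepoint < 0x80:            # 1 byte:  0xxxxxxx
        encoded = [codepoint]
    elif codepoint < 0x800:         # 2 bytes: 110xxxxx 10xxxxxx
        encoded = [0xC0 | (codepoint >> 6),
                   0x80 | (codepoint & 0x3F)]
    elif codepoint < 0x10000:       # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        encoded = [0xE0 | (codepoint >> 12),
                   0x80 | ((codepoint >> 6) & 0x3F),
                   0x80 | (codepoint & 0x3F)]
    else:                           # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        encoded = [0xF0 | (codepoint >> 18),
                   0x80 | ((codepoint >> 12) & 0x3F),
                   0x80 | ((codepoint >> 6) & 0x3F),
                   0x80 | (codepoint & 0x3F)]
    return "".join(f"\\x{byte:02x}" for byte in encoded)


def naive_two_byte_escape(codepoint: int) -> str:
    """Hypothetical 2-byte-only encoder (an assumption, for illustration).

    It applies the 2-byte pattern with no range check, so for codepoints
    at or above U+0800 the lead value no longer fits in a single byte.
    """
    lead = 0xC0 + (codepoint >> 6)
    trail = 0x80 + (codepoint & 0x3F)
    return f"\\x{lead:x}\\x{trail:x}"


if __name__ == "__main__":
    print(utf8_escape(0x027D))            # lowercase key in the table
    print(utf8_escape(0x2C64))            # correct uppercase value
    print(naive_two_byte_escape(0x2C64))  # what a 2-byte-only encoder yields
```

Under that assumption the lead value works out to 0xC0 + 0xB1 = 0x171,
which does not fit in one byte, and the trailing byte is 0xa4, matching
the bad table entry quoted above.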
