On Mon, Apr 16, 2007 at 10:04:55AM +0200, Petko Yotov wrote:
> On Sunday 15 April 2007 23:16, Patrick R. Michaud wrote:
> > Searches on sites using utf-8 are now performed case-insensitively
> > for accented characters.
> 
> Hello Patrick.
> 
> There is a problem with the $CaseConversions array, line 214:
> 
>    "\xc9\xbd" => "\x171\xa4",

You're correct.  It's now fixed for the next release.

> I tried to find what is written in the source [1] but could not find the 
> sequence C9BD.

"\xc9\xbd" is a UTF-8 byte sequence, while the UnicodeData.txt file
lists codepoint values (not UTF-8 bytes).  So "\xc9\xbd" is the UTF-8
encoding of U+027D, which appears in that file as 027D, not C9BD.
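
The relationship between the two bytes and the codepoint can be checked
directly. A minimal sketch in Python (used here only for illustration; it
is not PmWiki's PHP code, and the variable names are made up):

```python
# Decode the UTF-8 bytes C9 BD and recover the codepoint they encode.
raw = b"\xc9\xbd"
codepoint = ord(raw.decode("utf-8"))

# The same result by hand: a 2-byte sequence has the bit layout
# 110xxxxx 10yyyyyy, and the codepoint is the bits xxxxx yyyyyy.
manual = ((raw[0] & 0x1F) << 6) | (raw[1] & 0x3F)

print(f"U+{codepoint:04X}")  # the value to look up in UnicodeData.txt
print(f"U+{manual:04X}")
```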

The uppercase conversion of U+027D is U+2C64.  The codepoint U+2C64
requires a 3-byte UTF-8 encoding, but the program I wrote to translate
codepoints to UTF-8 was set up to handle only 1-byte and 2-byte
conversions, so it mis-encoded U+2C64 as "\x171\xa4".

So, the uppercase conversion of U+027D (\xc9\xbd in UTF-8) is 
U+2C64 (\xe2\xb1\xa4 in UTF-8).
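
The encoding can be sketched as follows, again in Python for illustration.
The function names are hypothetical, and two_byte_only() is only a guess at
how a 2-byte-only formula could produce the digits "\x171"; the original
translation program is not shown in this thread.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode codepoint as UTF-8 (1 to 4 bytes)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable codepoint: {cp:#x}")
    if cp < 0x80:        # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),   # 4 bytes: 11110xxx and three 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def php_escape(data: bytes) -> str:
    """Format bytes as \\xNN escapes, as used in a PHP double-quoted string."""
    return "".join(f"\\x{b:02x}" for b in data)


def two_byte_only(cp: int) -> str:
    """Hypothetical reconstruction: apply the 2-byte formula to any codepoint.

    For codepoints of 0x800 and above the lead value no longer fits in
    one byte, so the printed escape has three hex digits.
    """
    lead = 0xC0 + (cp >> 6)
    trail = 0x80 | (cp & 0x3F)
    return f"\\x{lead:x}\\x{trail:x}"


print(php_escape(utf8_encode(0x027D)))  # lowercase key
print(php_escape(utf8_encode(0x2C64)))  # correct uppercase value
print(two_byte_only(0x2C64))            # overflowed 2-byte attempt
```

Under that assumption, 0xC0 + (0x2C64 >> 6) = 0xC0 + 0xB1 = 0x171, and the
trailing byte 0x80 | (0x2C64 & 0x3F) = 0xA4 is the same in both the faulty
and the correct encodings.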

I had already caught the 3-byte instances elsewhere in the
tables, but apparently missed this one.  Thanks for catching it!

Pm

