Re: Improper UTF-8 combining character handling

2007-06-12 Thread Sean Burke
I've retried with 3.2-17 with the same results. Notably, the issue isn't (and has not been) that all multibyte characters are handled improperly. Instead, sequences that contain combining characters seem to be treated inconsistently. For example, the character that represents D WITH DOT

Re: Improper UTF-8 combining character handling

2007-06-10 Thread Benno Schulenberg
Sean Burke wrote: The Unicode normalization test data at http://www.unicode.org/Public/UNIDATA/NormalizationTest.txt contains many sequences of this sort. The first character sequence, LATIN CAPITAL LETTER D WITH DOT ABOVE, does produce this problem. Paste it into the