Good morning!

I frequently consult the Unihan database for detailed information about Japanese and Chinese characters, and I have noticed that at least some pages mix two encodings. The main encoding is UTF-8 (as one would expect on the Unihan site), but certain characters on those pages are encoded in ISO-8859-1, which makes them unreadable until one forces a change of encoding.

I just looked at these pages:
http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=58b3 (character: 墳)
http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=5893 (character: 墓)

The wrongly encoded characters appear in the Hanyu Pinyin column: the accented letters are encoded in ISO-8859-1 rather than UTF-8, and they only become legible if one switches the encoding setting to ISO-8859-1 (which, of course, renders much of the rest of the page unusable):

kHanyuPinyin 10485.060:fén,fèn
kHanyuPinyin 10470.090:mù
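
To illustrate what I think is happening, here is a minimal Python sketch. The sample bytes are my own reconstruction, not captured from the live pages, and the names `latin1_fallback` and `decode_mixed` are hypothetical helpers. It assumes that any byte that is not valid UTF-8 was meant as ISO-8859-1, which may not hold for every page.

```python
import codecs


def latin1_fallback(err):
    """Error handler: reinterpret bytes that are invalid UTF-8 as ISO-8859-1."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    return bad.decode("iso-8859-1"), err.end


codecs.register_error("latin1_fallback", latin1_fallback)


def decode_mixed(raw: bytes) -> str:
    """Decode as UTF-8, falling back to ISO-8859-1 for offending bytes only."""
    return raw.decode("utf-8", errors="latin1_fallback")


# Reconstructed sample (an assumption): the Han character is UTF-8,
# while the pinyin readings are ISO-8859-1, as on the affected pages.
sample = "墳 kHanyuPinyin 10485.060:".encode("utf-8") + "fén,fèn".encode("iso-8859-1")

# Strict UTF-8 decoding rejects the ISO-8859-1 accented letters.
try:
    sample.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Treating everything as ISO-8859-1 fixes the pinyin but garbles the Han character.
as_latin1 = sample.decode("iso-8859-1")

print(strict_ok)
print(decode_mixed(sample))
```

This only works because the ISO-8859-1 accented letters here happen not to form valid UTF-8 sequences; the proper fix is of course for the server to emit the pinyin in UTF-8 to begin with.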

I suspect that the providers of this information would like all of it to be encoded in UTF-8 and that the current encoding scheme is just an accident. :-)

Thank you for your time!



