> > How do we mishandle shift-jis? We just use the OS's multibyte functions,
> > and we assume that a CJK-OS traps the OS's multibyte char access functions.
>
> in paragraph.c, line 1564, there is some conversion going on (seems like
> parser puts raw SHIFT_JIS characters into the file, but claiming they
> are UCS2 - or is it vice versa?). This will IMHO mess up with unicode
> values, if plucker is compiled with HAVE_IMODE.
Yup, that's intentional. :-)

According to NTT Docomo, "S-JIS character encoding must be used" on i-mode sites:

http://www.nttdocomo.co.jp/english/i/tag/index.html

However, when testing i-mode I found that this was not adhered to in practice. Sites claim a charset of SHIFT_JIS but use Unicode entities for the i-mode icons, or vice versa. I opted to accept either range of values as i-mode icons, regardless of the document's charset.

As per the above spec, all i-mode sites should support JIS entities as icons, so this strategy should stay as is for sites using Unicode character sets. The only problem I can see is for sites using SHIFT_JIS: does the Unicode range overlap what should be valid JIS characters? I don't think so, but I don't read Japanese. If anyone can check and finds this to be the case, let me know and I will change it.

What I meant by my (badly worded) comment on paragraph.c, line 1570 was that the Plucker format assumes that any 16- or 32-bit value (functions 0x83 and 0x85) is the "Unicode character code for the character." When converting a SHIFT_JIS document, this is not the case: the 0x83/0x85 functions *do not* hold the "Unicode character code for the character", but rather the *SHIFT_JIS* character code for it. The wording of the Plucker Document format on this point is misleading.

Dave.

_______________________________________________
plucker-dev mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-dev
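A minimal C sketch of the "accept either range of values as i-mode icons" strategy described in the reply, to make the check concrete. The function name `is_imode_icon` and the four range bounds are hypothetical: the bounds are assumed, simplified values (the Shift_JIS icon codes treated as one contiguous block starting at 0xF89F, and a private-use Unicode block starting at U+E63E), not constants taken from paragraph.c, and any gaps in the real tables are ignored. What the sketch illustrates is that the check never consults the document charset, so any code that falls in either range is treated as an icon.

```c
#include <stdint.h>

/* Assumed, simplified code ranges for i-mode icons. These bounds are
 * illustrative assumptions, not values taken from paragraph.c; gaps in
 * the real icon tables are ignored here. */
#define SJIS_IMODE_FIRST    0xF89Fu  /* assumed first Shift_JIS icon code */
#define SJIS_IMODE_LAST     0xF9FCu  /* assumed last Shift_JIS icon code */
#define UNICODE_IMODE_FIRST 0xE63Eu  /* assumed first private-use icon code */
#define UNICODE_IMODE_LAST  0xE757u  /* assumed last private-use icon code */

/* Hypothetical helper: returns 1 if `code` falls in either assumed icon
 * range, 0 otherwise. The document charset is deliberately not an input,
 * mirroring the "accept either range" strategy. */
static int is_imode_icon(uint32_t code)
{
    int in_sjis = code >= SJIS_IMODE_FIRST && code <= SJIS_IMODE_LAST;
    int in_unicode = code >= UNICODE_IMODE_FIRST && code <= UNICODE_IMODE_LAST;
    return in_sjis || in_unicode;
}
```

For example, `is_imode_icon(0xE63E)` returns 1 whether the page declared SHIFT_JIS or a Unicode charset, which is exactly the case where the overlap question in the reply matters: a 16-bit value in the Unicode icon range could, in a SHIFT_JIS document, be meant as an ordinary character.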
