> > How do we mishandle shift-jis?  We just use the OS's multibyte functions,
> > and we assume that a CJK-OS traps the OS's multibyte char access functions.
> >
>
> in paragraph.c, line 1564, there is some conversion going on (seems like
> parser puts raw SHIFT_JIS characters into the file, but claiming they
> are UCS2 - or is it vice versa?). This will IMHO mess up with unicode
> values, if plucker is compiled with HAVE_IMODE.

Yup, that's intentional. :-)  According to NTT Docomo "S-JIS character
encoding must be used" on i-mode sites.
http://www.nttdocomo.co.jp/english/i/tag/index.html
However, when testing i-mode, I found that this was not adhered to in
practice.  Sites claim a charset of SHIFT_JIS but use Unicode entities for
the i-mode icons, or vice versa.  I opted to accept either range of values
as i-mode icons, regardless of the charset of the document.  As per the
above spec, all i-mode sites should support JIS entities as icons, so this
strategy should stay as is for sites using Unicode character sets.  The only
problem I can see is for sites using SHIFT_JIS: does the Unicode icon range
overlap what should be valid JIS characters?  I don't think so, but I
don't read Japanese.  If anyone does and finds this to be the case, let me
know and I will change it.
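
To make the "accept either range" strategy and the overlap question
concrete, here is a minimal sketch in C.  It is not the code in
paragraph.c.  The function names are hypothetical, and the icon ranges
(Shift_JIS 0xF89F-0xF9FC, Unicode private use 0xE63E-0xE757) are assumptions
that should be checked against the DoCoMo tables.  The Shift_JIS lead and
trail byte ranges are the standard ones for double-byte codes.

```c
#include <stdint.h>

/* Assumed i-mode icon ranges; verify against the DoCoMo pictogram tables. */
#define IMODE_SJIS_FIRST    0xF89Fu
#define IMODE_SJIS_LAST     0xF9FCu
#define IMODE_UNICODE_FIRST 0xE63Eu
#define IMODE_UNICODE_LAST  0xE757u

/* Accept a value from either range as an icon, whatever the charset. */
static int is_imode_icon(uint16_t v)
{
    return (v >= IMODE_SJIS_FIRST && v <= IMODE_SJIS_LAST)
        || (v >= IMODE_UNICODE_FIRST && v <= IMODE_UNICODE_LAST);
}

/*
 * Structural test for a Shift_JIS double-byte code:
 * lead byte 0x81-0x9F or 0xE0-0xFC, trail byte 0x40-0x7E or 0x80-0xFC.
 * This does not check that the code is an assigned character.
 */
static int is_sjis_double_byte(uint16_t v)
{
    uint8_t lead  = (uint8_t)(v >> 8);
    uint8_t trail = (uint8_t)(v & 0xFF);
    int lead_ok  = (lead >= 0x81 && lead <= 0x9F)
                || (lead >= 0xE0 && lead <= 0xFC);
    int trail_ok = (trail >= 0x40 && trail <= 0x7E)
                || (trail >= 0x80 && trail <= 0xFC);
    return lead_ok && trail_ok;
}

/* Count values in the assumed Unicode icon range that could also be
 * read as a well-formed Shift_JIS double-byte code. */
static int count_ambiguous(void)
{
    int n = 0;
    uint32_t v;
    for (v = IMODE_UNICODE_FIRST; v <= IMODE_UNICODE_LAST; v++)
        if (is_sjis_double_byte((uint16_t)v))
            n++;
    return n;
}
```

Under these assumed ranges, count_ambiguous() returns 212 out of 282
values, so an overlap in a SHIFT_JIS document is at least structurally
possible.  Whether each of those codes is an assigned character still
needs checking against the JIS X 0208 tables.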


What I meant with my (badly worded) comment on paragraph.c, line 1570 was
that the Plucker format assumes that any 16 or 32 bit value (functions 0x83
& 0x85) is the "Unicode character code for the character."  When converting
a SHIFT_JIS document, this is not the case: the 0x83/0x85 functions *do not*
carry the "Unicode character code for the character", but rather the
*SHIFT_JIS* character code for it.  The wording of the Plucker Document
format on this issue is misleading.
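
As an illustration of why this matters: 0x82A0 is the Shift_JIS code for
HIRAGANA LETTER A (U+3042), while U+82A0 is an unrelated CJK ideograph.  A
viewer therefore has to know the charset of the source document before it
can interpret the value.  The sketch below only labels the value.  The enum
and function name are hypothetical and are not part of Plucker.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical tag for the charset of the converted document. */
typedef enum { DOC_UNICODE, DOC_SHIFT_JIS } doc_charset;

/*
 * Describe how a 16- or 32-bit character value should be read.
 * Writes e.g. "U+82A0" or "SJIS 0x82A0" into buf and returns buf.
 */
static const char *describe_char_value(uint32_t value, doc_charset charset,
                                       char *buf, size_t len)
{
    if (charset == DOC_SHIFT_JIS)
        snprintf(buf, len, "SJIS 0x%04lX", (unsigned long)value);
    else
        snprintf(buf, len, "U+%04lX", (unsigned long)value);
    return buf;
}
```

Calling describe_char_value(0x82A0, DOC_SHIFT_JIS, buf, sizeof buf) yields
"SJIS 0x82A0", and the same value with DOC_UNICODE yields "U+82A0".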

Dave.
