On Tue, Jul 3, 2012 at 10:17 AM, Tatsuo Ishii <is...@postgresql.org> wrote:

> > OK.  So, in that case, I suggest that if the leading byte is non-zero,
> > we emit 0x9d followed by the three available bytes, instead of first
> > testing whether the first byte is >= 0xf0.  That test seems to serve
> > no purpose but to confuse the issue.
>
> Probably the code should look like this (see the comment below):
>
>                 else if (lb >= 0xf0 && lb <= 0xfe)
>                 {
>                     if (lb <= 0xf4)
>                         *to++ = 0x9c;
>                     else
>                         *to++ = 0x9d;
>                     *to++ = lb;
>                     *to++ = (*from >> 8) & 0xff;
>                     *to++ = *from & 0xff;
>                     cnt += 4;
>                 }


We probably also need to give names to all these magic numbers
(0xf0, 0xf4, 0xfe, 0x9c, 0x9d), but I find it hard to come up with good ones.
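
For illustration only, here is one possible naming scheme applied to the branch quoted above. Every macro and function name in this sketch (MULE_PRIV2_LB_MIN, mule_encode_lcprv2, and so on) is hypothetical, invented here rather than taken from the PostgreSQL sources; only the numeric values and the 0x9c/0x9d split come from the thread.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical names for the magic numbers in the LCPRV2 branch.
 * Illustrative only: not taken from the PostgreSQL sources.
 */
#define MULE_PRIV2_LB_MIN    0xf0 /* lowest private leading byte */
#define MULE_PRIV2_A_LB_MAX  0xf4 /* last leading byte prefixed by 0x9c */
#define MULE_PRIV2_LB_MAX    0xfe /* highest private leading byte */
#define MULE_LCPRV2_A        0x9c /* prefix for leading bytes 0xf0..0xf4 */
#define MULE_LCPRV2_B        0x9d /* prefix for leading bytes 0xf5..0xfe */

/*
 * Encode one wchar whose second byte is a private leading byte into
 * 4 bytes of mule internal code. Returns the number of bytes written,
 * or 0 if the second byte is outside 0xf0..0xfe.
 */
static size_t
mule_encode_lcprv2(uint32_t wc, unsigned char *to)
{
    unsigned char lb = (wc >> 16) & 0xff;

    if (lb < MULE_PRIV2_LB_MIN || lb > MULE_PRIV2_LB_MAX)
        return 0;

    /* The prefix byte is chosen from the range the leading byte falls in. */
    to[0] = (lb <= MULE_PRIV2_A_LB_MAX) ? MULE_LCPRV2_A : MULE_LCPRV2_B;
    to[1] = lb;
    to[2] = (wc >> 8) & 0xff;
    to[3] = wc & 0xff;
    return 4;
}
```

Naming the values by role (range bounds versus prefix bytes) keeps the 0xf4 split point, which decides between 0x9c and 0x9d, visible at a glance.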


> > I further suggest that we improve the comments on the mule functions
> > for both wchar->mb and mb->wchar to make all this more clear.
>
> I have added comments about the mule internal encoding, based on my
> memory and on an old document found on the web
> (http://mibai.tec.u-ryukyu.ac.jp/cgi-bin/info2www?%28mule%29Buffer%20and%20string).
>
> Please take a look.  BTW, it seems that conversion between multibyte
> and wchar can round-trip when the leading character is LCPRV2:
>
> If the second byte of the wchar (of its 4 bytes; the first byte is
> always 0x00) is in the range 0xf0 to 0xf4, then the first byte of the
> multibyte sequence must be 0x9c.  If the second byte of the wchar is
> in the range 0xf5 to 0xfe, then the first byte must be 0x9d.
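
The round-trip rule above can be checked with a small sketch. It assumes, consistent with the wchar's first byte always being 0x00, that mule-to-wchar conversion drops the 0x9c/0x9d prefix and keeps the three following bytes; the encoder then has to recover the prefix from the wchar's second byte. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch of the LCPRV2 round trip. Assumes the decoder drops the
 * 0x9c/0x9d prefix, so the wchar's top byte is 0x00 and its second
 * byte is the private leading byte (0xf0..0xfe). Names are hypothetical.
 */

/* mule (4 bytes) -> wchar: keep the three bytes after the prefix. */
static uint32_t
mule_lcprv2_to_wchar(const unsigned char *from)
{
    return ((uint32_t) from[1] << 16) | ((uint32_t) from[2] << 8) | from[3];
}

/* wchar -> mule (4 bytes): recover the prefix from the second byte. */
static void
wchar_to_mule_lcprv2(uint32_t wc, unsigned char *to)
{
    unsigned char lb = (wc >> 16) & 0xff;

    to[0] = (lb <= 0xf4) ? 0x9c : 0x9d; /* 0xf0..0xf4 -> 0x9c, 0xf5..0xfe -> 0x9d */
    to[1] = lb;
    to[2] = (wc >> 8) & 0xff;
    to[3] = wc & 0xff;
}

/* Returns 1 if the 4-byte sequence comes back unchanged after mule -> wchar -> mule. */
static int
lcprv2_roundtrips(const unsigned char *mb)
{
    unsigned char out[4];
    uint32_t wc = mule_lcprv2_to_wchar(mb);

    wchar_to_mule_lcprv2(wc, out);
    return out[0] == mb[0] && out[1] == mb[1] &&
           out[2] == mb[2] && out[3] == mb[3];
}
```

A sequence whose prefix disagrees with its second byte (for example 0x9d followed by 0xf2) does not survive the trip, which is why the prefix must follow the rule above.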


Should I integrate these code changes into my patch, or would you like to
commit them first?

------
With best regards,
Alexander Korotkov.
