> However, running the script with that doesn't produce exactly what we
> have in utf8_to_sjis.map, either. It's otherwise same, but we have
> some extra mappings:
> -  {0xc2a5, 0x5c},

0xc2a5 is U+00a5. The glyph is "YEN SIGN", which corresponds to
0x5c in SJIS. So this is a valid mapping.

Meanwhile, Microsoft wants to map U+005c to 0x5c in CP932.  The
glyph of U+005c is "REVERSE SOLIDUS" (backslash). So MS
decided that the glyph of U+005c is "YEN SIGN" in CP932!

In summary, we need to keep both mappings:

U+00a5 (utf 0xc2a5) -> 0x5c and U+005c -> 0x5c.

Obviously this breaks the round trip conversion between UTF8 and SJIS
in this case though, because 0x5c can be converted back to only one of
the two.
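
To make the round trip problem concrete, here is a minimal sketch in
Python. The two dictionaries are hypothetical miniatures holding only
the entries discussed here, not the real map files, and I am assuming
the reverse direction sends 0x5c back to U+005c.

```python
# Hypothetical miniature of the UTF8 -> SJIS direction: two different
# UTF-8 codes share the same SJIS code 0x5c.
utf8_to_sjis = {
    0x5c: 0x5c,    # U+005c REVERSE SOLIDUS
    0xc2a5: 0x5c,  # U+00a5 YEN SIGN
}

# The reverse direction can hold only one target per SJIS code.
# Assumption: 0x5c goes back to U+005c.
sjis_to_utf8 = {0x5c: 0x5c}


def round_trips(utf8_code):
    """Return True if UTF8 -> SJIS -> UTF8 gives back the same code."""
    return sjis_to_utf8[utf8_to_sjis[utf8_code]] == utf8_code


for code in (0x5c, 0xc2a5):
    status = "ok" if round_trips(code) else "round trip broken"
    print(f"0x{code:x}: {status}")
```

Whichever target the reverse map picks, one of the two characters
cannot come back unchanged.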

> -  {0xc2ac, 0x81ca},
U+00ac (NOT SIGN). Exists in SJIS.

> -  {0xe28096, 0x8161},


> -  {0xe280be, 0x7e},

U+203e (OVERLINE). Mapped to ASCII 0x7e, which is "half width tilde".

> -  {0xe28892, 0x817c},

U+2212 (MINUS SIGN). Mapped to "double width minus sign" in SJIS.

> -  {0xe3809c, 0x8160},

U+301c (WAVE DASH). Mapped to "double width wave dash" in SJIS.
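
The code points above can be cross-checked by decoding the map keys
themselves. The following is only a sketch: map_key_to_char is a
hypothetical helper name, and it assumes each key is simply the UTF-8
byte sequence packed into an integer, as the entries quoted above
suggest.

```python
import unicodedata


def map_key_to_char(key):
    """Decode a map key such as 0xe280be (UTF-8 bytes packed in an int)."""
    nbytes = (key.bit_length() + 7) // 8
    return key.to_bytes(nbytes, "big").decode("utf-8")


# The six extra keys quoted from utf8_to_sjis.map in this thread.
for key in (0xc2a5, 0xc2ac, 0xe28096, 0xe280be, 0xe28892, 0xe3809c):
    ch = map_key_to_char(key)
    print(f"0x{key:x} -> U+{ord(ch):04x} {unicodedata.name(ch)}")
```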

> Those mappings were added in commit
> a8bd7e1c6e026678019b2f25cffc0a94ce62b24b, back in 2002. The bogus
> mapping for the invalid 0xc19c UTF-8 byte sequence was also added by
> that commit, as well a few valid mappings that UCS_to_SJIS.pl also
> produces.
> I can't judge if those mappings make sense. If we can't find an
> authoritative source for them, I suggest that we leave them as they
> are, but also hard-code them to UCS_to_SJIS.pl, so that running that
> script produces those mappings in utf8_to_sjis.map, even though they
> are not present in the CP932.TXT source file.

Sounds acceptable.
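
Just to illustrate the idea (in Python rather than Perl, and not the
actual UCS_to_SJIS.pl code), hard-coding could look roughly like the
sketch below. EXTRA_MAPPINGS, add_extras and format_entry are
hypothetical names, and the "generated" input stands in for whatever
the script derives from CP932.TXT.

```python
# The six extra UTF8 -> SJIS pairs discussed in this thread.
EXTRA_MAPPINGS = {
    0xc2a5: 0x5c,
    0xc2ac: 0x81ca,
    0xe28096: 0x8161,
    0xe280be: 0x7e,
    0xe28892: 0x817c,
    0xe3809c: 0x8160,
}


def add_extras(generated):
    """Return a copy of the generated map with the extras added.

    Entries already produced from the source file are left untouched.
    """
    merged = dict(generated)
    for utf8, sjis in EXTRA_MAPPINGS.items():
        merged.setdefault(utf8, sjis)
    return merged


def format_entry(utf8, sjis):
    """Format one pair in the style of the .map entries quoted above."""
    return f"  {{0x{utf8:x}, 0x{sjis:x}}},"


# Stand-in for the script's normal output (a single CP932 entry).
generated = {0xefbd9e: 0x8160}
for utf8, sjis in sorted(add_extras(generated).items()):
    print(format_entry(utf8, sjis))
```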

In summary, the current PostgreSQL UTF8 <--> SJIS mapping is somewhat of
a mixture of SJIS (Shift_JIS) and MS932. There's no clean way to
get out of this situation. I think we need to live with it.

Best regards,
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
