At Fri, 7 Oct 2016 23:58:45 +0300, Heikki Linnakangas <hlinn...@iki.fi> wrote 
in <9c544547-7214-aebe-9b04-57624aedd...@iki.fi>
> > So, I wonder how the mappings related to SJIS (and/or EUC-JP) are
> > maintained. If no authoritative information is available, the
> > generating script is no longer usable. If any other authority is
> > chosen, it is to be modified according to whatever the new
> > source format is.
> The script is clearly intended to read CP932.TXT, rather than
> SHIFTJIS.TXT, despite the comments in it. CP932.TXT can be found at
> ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
> However, running the script with that doesn't produce exactly what we
> have in utf8_to_sjis.map, either. It's otherwise the same, but we have
> some extra mappings:
> -  {0xc2a5, 0x5c},
> -  {0xc2ac, 0x81ca},
> -  {0xe28096, 0x8161},
> -  {0xe280be, 0x7e},
> -  {0xe28892, 0x817c},
> -  {0xe3809c, 0x8160},
> Those mappings were added in commit
> a8bd7e1c6e026678019b2f25cffc0a94ce62b24b, back in 2002. The bogus
> mapping for the invalid 0xc19c UTF-8 byte sequence was also added by
> that commit, as well as a few valid mappings that UCS_to_SJIS.pl also
> produces.
> I can't judge if those mappings make sense. If we can't find an
> authoritative source for them, I suggest that we leave them as they

These mappings exist for historical reasons: they stem from
differences between the Unicode definition and the Oracle and
Microsoft implementations, and from the evolution of the Unicode
specification itself. As a result, several SJIS (and EUC-JP)
characters have two or more mappings to Unicode, and there are
also several variations of the opposite mapping. None of them is
authoritative, and which one to adopt depends on the requirements
of the system. The only requirement that PostgreSQL needs to
maintain seems to be round-trip consistency starting from SJIS
input.
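
To illustrate what "round-trip consistency starting from SJIS input"
means, here is a minimal Python sketch. It is not part of the
PostgreSQL tree. The SJIS-to-Unicode entries are a small illustrative
subset that I assume matches CP932.TXT, and the names utf8_key and
round_trip_failures are made up for this example. The extra one-way
entries are the six quoted above.

```python
def utf8_key(codepoint: int) -> int:
    """Pack the UTF-8 bytes of a code point into one integer,
    the same way keys appear in utf8_to_sjis.map."""
    return int.from_bytes(chr(codepoint).encode("utf-8"), "big")


# Assumed subset of CP932.TXT: SJIS code -> Unicode code point.
SJIS_TO_UCS = {
    0x5C: 0x005C,
    0x7E: 0x007E,
    0x8160: 0xFF5E,
    0x8161: 0x2225,
    0x817C: 0xFF0D,
    0x81CA: 0xFFE2,
}

# Forward table (SJIS -> UTF-8) and its exact inverse (UTF-8 -> SJIS).
sjis_to_utf8 = {s: utf8_key(u) for s, u in SJIS_TO_UCS.items()}
utf8_to_sjis = {u: s for s, u in sjis_to_utf8.items()}

# The extra UTF-8 -> SJIS entries from the quoted list. They only add
# new UTF-8 keys, so they cannot change where an SJIS input lands.
EXTRA_UTF8_TO_SJIS = {
    0xC2A5: 0x5C,
    0xC2AC: 0x81CA,
    0xE28096: 0x8161,
    0xE280BE: 0x7E,
    0xE28892: 0x817C,
    0xE3809C: 0x8160,
}
utf8_to_sjis.update(EXTRA_UTF8_TO_SJIS)


def round_trip_failures(s2u: dict, u2s: dict) -> list:
    """Return SJIS codes that do not survive SJIS -> UTF-8 -> SJIS."""
    return [s for s, u in sorted(s2u.items()) if u2s.get(u) != s]


if __name__ == "__main__":
    print(round_trip_failures(sjis_to_utf8, utf8_to_sjis))
```

The extra entries make the UTF-8-to-SJIS direction many-to-one, so a
round trip starting from UTF-8 is not preserved for those characters,
but one starting from SJIS still is.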

> are, but also hard-code them to UCS_to_SJIS.pl, so that running that
> script produces those mappings in utf8_to_sjis.map, even though they
> are not present in the CP932.TXT source file.

Agreed. I will do that, at least for the Japanese charsets.
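
As a rough sketch of the idea (in Python for brevity; the real
generator is the Perl script UCS_to_SJIS.pl, and the helper names
merge_extra and format_map are hypothetical), the hard-coded entries
could be merged into the table derived from CP932.TXT before the map
file is written:

```python
# Extra UTF-8 -> SJIS entries that are not in CP932.TXT (from the
# list quoted earlier in this thread).
EXTRA_UTF8_TO_SJIS = {
    0xC2A5: 0x5C,
    0xC2AC: 0x81CA,
    0xE28096: 0x8161,
    0xE280BE: 0x7E,
    0xE28892: 0x817C,
    0xE3809C: 0x8160,
}


def merge_extra(generated: dict, extra: dict) -> dict:
    """Add hard-coded entries to the generated table.

    Refuses to silently override an entry that the source file
    already defines with a different target.
    """
    merged = dict(generated)
    for utf8, sjis in extra.items():
        if utf8 in merged and merged[utf8] != sjis:
            raise ValueError(
                "conflict for 0x%x: 0x%x vs 0x%x" % (utf8, merged[utf8], sjis)
            )
        merged[utf8] = sjis
    return merged


def format_map(mapping: dict) -> list:
    """Render entries sorted by UTF-8 key, in the {0x..., 0x...} style."""
    return ["  {0x%x, 0x%x}," % (u, s) for u, s in sorted(mapping.items())]


if __name__ == "__main__":
    # Stand-in for the table parsed from CP932.TXT (assumed subset).
    generated = {0x5C: 0x5C, 0x7E: 0x7E}
    for line in format_map(merge_extra(generated, EXTRA_UTF8_TO_SJIS)):
        print(line)
```

Raising on a conflict is a design choice for this sketch: it would
flag the case where a future CP932.TXT starts defining one of these
keys differently, instead of quietly preferring either side.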


Kyotaro Horiguchi
NTT Open Source Software Center

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)