Hi, I found a useless entry in utf8_to_sjis.map
> {0xc19c, 0x815f},

which is apparently illegal as UTF-8, and which PostgreSQL deliberately
refuses. So it should be removed, and the attached patch does that.
0x815f (SJIS) is also mapped from 0xefbcbc (U+FF3C FULLWIDTH REVERSE
SOLIDUS), and that is the right mapping.

By the way, the file comment at the beginning of UCS_to_SJIS.pl is the
following.

# Generate UTF-8 <--> SJIS code conversion tables from
# map files provided by Unicode organization.
# Unfortunately it is prohibited by the organization
# to distribute the map files. So if you try to use this script,
# you have to obtain SHIFTJIS.TXT from
# the organization's ftp site.

The file was found at the following place, thanks to Google.

ftp://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/

As the URL shows, and as written in Public/MAPPINGS/EASTASIA/ReadMe.txt,
it is already obsolete, and the *live* definition *may* be found in the
Unicode Character Database. But I haven't found SJIS-related information
there. Unless I'm missing something, the only available authority would
be JIS X 0208/0213, but what should be implemented seems to be a
possibly modified MS932, for which I don't know the authority.

Anyway, I ran UCS_to_SJIS.pl with the SHIFTJIS.TXT above and got mapping
files quite different from the current ones.

So I wonder how the mappings related to SJIS (and/or EUC-JP) are
maintained. If no authoritative information is available, the generating
script is no longer usable. If some other authority is chosen, the
script has to be modified to match the new source format.

Any suggestions? Or opinions?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/utils/mb/Unicode/utf8_to_sjis.map b/src/backend/utils/mb/Unicode/utf8_to_sjis.map
index bcb76c9..47f5fdf 100644
--- a/src/backend/utils/mb/Unicode/utf8_to_sjis.map
+++ b/src/backend/utils/mb/Unicode/utf8_to_sjis.map
@@ -1,5 +1,4 @@
-static const pg_utf_to_local ULmapSJIS[ 7398 ] = {
-  {0xc19c, 0x815f},
+static const pg_utf_to_local ULmapSJIS[ 7397 ] = {
   {0xc2a2, 0x8191},
   {0xc2a3, 0x8192},
   {0xc2a5, 0x5c},
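
As background on why 0xc19c is not legal UTF-8: the lead bytes 0xC0 and
0xC1 can only produce code points below U+0080, which must be written in
the one-byte form, so any sequence starting with them is an "overlong"
encoding. Decoded without checks, 0xC1 0x9C would yield U+005C (REVERSE
SOLIDUS), which may be how it came to be paired with 0x815f. The sketch
below only illustrates that arithmetic. Both helper names are
hypothetical and this is not PostgreSQL's validation code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not PostgreSQL code: decode a two-byte UTF-8
 * sequence without any validity checks, to show which code point the
 * bytes would stand for if overlong forms were accepted.
 */
static uint32_t
decode_2byte_unchecked(uint8_t b0, uint8_t b1)
{
    return ((uint32_t) (b0 & 0x1F) << 6) | (uint32_t) (b1 & 0x3F);
}

/*
 * Hypothetical helper: true if (b0, b1) is a well-formed two-byte
 * UTF-8 sequence.  Lead bytes 0xC0 and 0xC1 can only encode code points
 * below U+0080, which must use the one-byte form, so they are rejected.
 */
static bool
is_valid_2byte_utf8(uint8_t b0, uint8_t b1)
{
    if (b0 < 0xC2 || b0 > 0xDF)
        return false;           /* not a legal two-byte lead byte */
    if ((b1 & 0xC0) != 0x80)
        return false;           /* not a continuation byte */
    return true;
}
```

With these helpers, is_valid_2byte_utf8(0xC1, 0x9C) is false, while the
next table entry's key, 0xC2 0xA2 (U+00A2), passes the check.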