> Hi All,
>
> Today while working on some other task related to database encoding, I
> noticed that the MINUS SIGN (with byte sequence a1-dd) in EUC-JP is
> mapped to FULLWIDTH HYPHEN-MINUS (with byte sequence ef-bc-8d) in
> UTF-8. See below:
>
> postgres=# select convert('\xa1dd', 'euc_jp', 'utf8');
>  convert
> ----------
>  \xefbc8d
> (1 row)
>
> Isn't this a bug? Shouldn't this have been converted to the MINUS SIGN
> (with byte sequence e2-88-92) in UTF-8 instead of FULLWIDTH
> HYPHEN-MINUS SIGN.
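
To make the two UTF-8 byte sequences above concrete, here is a small Python sketch that decodes each one and prints its Unicode code point and character name. It is only an illustration outside PostgreSQL: it does not touch the EUC_JP conversion maps, and the names SEQUENCES and describe are made up for this example.

```python
import unicodedata

# UTF-8 byte sequences quoted in this thread (labels are illustrative only).
SEQUENCES = {
    "expected": bytes.fromhex("e28892"),
    "returned": bytes.fromhex("efbc8d"),
}


def describe(raw: bytes) -> str:
    """Decode one UTF-8 encoded character and return 'U+XXXX NAME'."""
    ch = raw.decode("utf-8")
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for label, raw in SEQUENCES.items():
    print(f"{label}: {raw.hex()} -> {describe(raw)}")
# expected: e28892 -> U+2212 MINUS SIGN
# returned: efbc8d -> U+FF0D FULLWIDTH HYPHEN-MINUS
```

So the result is valid UTF-8, but it encodes a different character from the one the original mapping produced.
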
Yeah. Originally EUC_JP 0xa1dd was converted to UTF8 0xe28892. At some
point, someone changed the mapping, which is why you now see 0xefbc8d.

> When the MINUS SIGN (with byte sequence e2-88-92) in UTF-8 is
> converted to EUC-JP, the convert functions fails with an error saying:
> "character with byte sequence 0xe2 0x88 0x92 in encoding UTF8 has no
> equivalent in encoding EUC_JP". See below:
>
> postgres=# select convert('\xe28892', 'utf-8', 'euc_jp');
> ERROR: character with byte sequence 0xe2 0x88 0x92 in encoding "UTF8"
> has no equivalent in encoding "EUC_JP"

Again, originally UTF8 0xe28892 was converted to EUC_JP 0xa1dd. At some
point, someone changed the mapping.

> However, when the same MINUS SIGN in UTF-8 is converted to SJIS
> encoding, the convert function returns the correct result. See below:
>
> postgres=# select convert('\xe28892', 'utf-8', 'sjis');
>  convert
> ---------
>  \x817c
> (1 row)
>
> Please note that the byte sequence (81-7c) in SJIS represents MINUS
> SIGN in SJIS which means the MINUS SIGN in UTF8 got converted to the
> MINUS SIGN in SJIS and that is what we expect. Isn't it?

Agreed.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp