Hi All,

Today, while working on another task related to database encoding, I noticed that the MINUS SIGN (byte sequence a1-dd) in EUC-JP is mapped to the FULLWIDTH HYPHEN-MINUS (byte sequence ef-bc-8d) in UTF-8. See below:
postgres=# select convert('\xa1dd', 'euc_jp', 'utf8');
 convert
----------
 \xefbc8d
(1 row)

Isn't this a bug? Shouldn't this have been converted to the MINUS SIGN (byte sequence e2-88-92) in UTF-8 instead of the FULLWIDTH HYPHEN-MINUS?

Also, when the MINUS SIGN (byte sequence e2-88-92) in UTF-8 is converted to EUC-JP, the convert function fails with an error saying that the character has no equivalent in EUC_JP. See below:

postgres=# select convert('\xe28892', 'utf-8', 'euc_jp');
ERROR: character with byte sequence 0xe2 0x88 0x92 in encoding "UTF8" has no equivalent in encoding "EUC_JP"

However, when the same MINUS SIGN in UTF-8 is converted to SJIS, the convert function returns the correct result. See below:

postgres=# select convert('\xe28892', 'utf-8', 'sjis');
 convert
---------
 \x817c
(1 row)

Please note that the byte sequence 81-7c in SJIS represents the MINUS SIGN. This means the MINUS SIGN in UTF-8 was converted to the MINUS SIGN in SJIS, which is what we expect, isn't it?

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
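P.S. To double-check which characters the UTF-8 byte sequences above actually denote, here is a small Python sketch. It is not PostgreSQL code and says nothing about how PostgreSQL builds its conversion maps. It only decodes the hex strings from the psql output as UTF-8 and looks up the Unicode code point and name with the standard unicodedata module. The helper name `describe` is hypothetical, chosen just for this illustration.

```python
import unicodedata


def describe(hex_bytes: str) -> tuple[int, str]:
    """Decode a hex string of UTF-8 bytes (as printed by psql, without the
    leading \\x) and return the single character's code point and name.
    `describe` is a hypothetical helper for illustration only."""
    ch = bytes.fromhex(hex_bytes).decode("utf-8")
    return ord(ch), unicodedata.name(ch)


# UTF-8 result of converting EUC-JP a1-dd, and the UTF-8 MINUS SIGN.
for label, hex_bytes in [
    ("result of convert('\\xa1dd', 'euc_jp', 'utf8')", "efbc8d"),
    ("UTF-8 MINUS SIGN", "e28892"),
]:
    code_point, name = describe(hex_bytes)
    print(f"{label}: U+{code_point:04X} {name}")
```

Run as-is, it should report U+FF0D FULLWIDTH HYPHEN-MINUS for ef-bc-8d and U+2212 MINUS SIGN for e2-88-92, i.e. two distinct Unicode characters.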