Re: MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8

2020-10-30 Thread Kyotaro Horiguchi
At Fri, 30 Oct 2020 16:33:01 +0900 (JST), Kyotaro Horiguchi wrote in > At Fri, 30 Oct 2020 14:38:30 +0900, Amit Langote > wrote in > I'm not sure how we should construct our own mapping, but the > difference made by we simply moved to JIS0208.TXT based as Ishii-san > suggested the differences

Re: MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8

2020-10-30 Thread Kyotaro Horiguchi
At Fri, 30 Oct 2020 14:38:30 +0900, Amit Langote wrote in > On Fri, Oct 30, 2020 at 12:20 PM Kyotaro Horiguchi > wrote: > > So ping-pong between Unicode and SJIS behaves like this: > > > > U+2212 => 0x817c@sjis => U+ff0d => 0x817c@sjis ... > > Is it the following piece of code in UCS_TO_SJIS.p
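The ping-pong described above can be sketched in a few lines. This is only an illustration: the two dictionaries are hypothetical two-entry excerpts built from the code points quoted in this message, not the real PostgreSQL map files, and the names `UCS_TO_SJIS`, `SJIS_TO_UCS` and `round_trip` are made up for the example.

```python
# Hypothetical excerpts of the conversion maps, using only the code points
# quoted in the thread. The real maps are generated files and far larger.
UCS_TO_SJIS = {
    0x2212: 0x817C,  # MINUS SIGN -> 0x817c@sjis
    0xFF0D: 0x817C,  # FULLWIDTH HYPHEN-MINUS -> 0x817c@sjis
}
SJIS_TO_UCS = {
    0x817C: 0xFF0D,  # 0x817c@sjis -> FULLWIDTH HYPHEN-MINUS
}


def round_trip(ucs: int) -> int:
    """Convert a Unicode code point to SJIS and back again."""
    return SJIS_TO_UCS[UCS_TO_SJIS[ucs]]


def is_round_trip_safe(ucs: int) -> bool:
    """True if the code point survives Unicode -> SJIS -> Unicode unchanged."""
    return round_trip(ucs) == ucs


if __name__ == "__main__":
    for cp in (0x2212, 0xFF0D):
        print(f"U+{cp:04X} -> U+{round_trip(cp):04X}")
```

Because two Unicode code points share a single SJIS code, the reverse map can restore only one of them, so U+2212 does not survive the round trip while U+FF0D does.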

Re: MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8

2020-10-29 Thread Amit Langote
On Fri, Oct 30, 2020 at 12:20 PM Kyotaro Horiguchi wrote: > At Fri, 30 Oct 2020 06:13:53 +0530, Ashutosh Sharma > wrote in > > However, when the same MINUS SIGN in UTF-8 is converted to SJIS > > encoding, the convert function returns the correct result. See below: > > > > postgres=# select conve

Re: MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8

2020-10-29 Thread Kyotaro Horiguchi
At Fri, 30 Oct 2020 13:17:08 +0900 (JST), Tatsuo Ishii wrote in > > The mapping is generated from CP932.TXT and JIS0212.TXT by > > UCS_to_EUC_JP.pl. > > I still don't understand why this change has been made. Originally the > conversion was based on JIS0208.txt, JIS0212.txt and JIS0201.txt, > w

Re: MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8

2020-10-29 Thread Ashutosh Sharma
On Fri, Oct 30, 2020 at 8:49 AM Kyotaro Horiguchi wrote: > > Hello. > > At Fri, 30 Oct 2020 06:13:53 +0530, Ashutosh Sharma > wrote in > > Hi All, > > > > Today while working on some other task related to database encoding, I > > noticed that the MINUS SIGN (with byte sequence a1-dd) in EUC-JP i

Re: MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8

2020-10-29 Thread Tatsuo Ishii
> The mapping is generated from CP932.TXT and JIS0212.TXT by > UCS_to_EUC_JP.pl. I still don't understand why this change has been made. Originally the conversion was based on JIS0208.txt, JIS0212.txt and JIS0201.txt, which is the exact definition of EUC-JP. CP932.txt is defined by Microsoft for t

Re: MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8

2020-10-29 Thread Tatsuo Ishii
> Hi All, > > Today while working on some other task related to database encoding, I > noticed that the MINUS SIGN (with byte sequence a1-dd) in EUC-JP is > mapped to FULLWIDTH HYPHEN-MINUS (with byte sequence ef-bc-8d) in > UTF-8. See below: > > postgres=# select convert('\xa1dd', 'euc_jp', 'utf

Re: MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8

2020-10-29 Thread Kyotaro Horiguchi
At Fri, 30 Oct 2020 12:08:51 +0900, Amit Langote wrote in > I noticed that the commit a8bd7e1c6e02 from ages ago removed > conversions from and to utf-8's e28892, in favor of efbc8d, and that > change has stuck. (Note though that these maps looked pretty > different back then.) > > --- a/src/b

Re: MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8

2020-10-29 Thread Tom Lane
Amit Langote writes: > On Fri, Oct 30, 2020 at 9:44 AM Ashutosh Sharma wrote: >> Today while working on some other task related to database encoding, I >> noticed that the MINUS SIGN (with byte sequence a1-dd) in EUC-JP is >> mapped to FULLWIDTH HYPHEN-MINUS (with byte sequence ef-bc-8d) in >> UT

Re: MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8

2020-10-29 Thread Kyotaro Horiguchi
Hello. At Fri, 30 Oct 2020 06:13:53 +0530, Ashutosh Sharma wrote in > Hi All, > > Today while working on some other task related to database encoding, I > noticed that the MINUS SIGN (with byte sequence a1-dd) in EUC-JP is > mapped to FULLWIDTH HYPHEN-MINUS (with byte sequence ef-bc-8d) in > U

Re: MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8

2020-10-29 Thread Amit Langote
On Fri, Oct 30, 2020 at 9:44 AM Ashutosh Sharma wrote: > > Hi All, > > Today while working on some other task related to database encoding, I > noticed that the MINUS SIGN (with byte sequence a1-dd) in EUC-JP is > mapped to FULLWIDTH HYPHEN-MINUS (with byte sequence ef-bc-8d) in > UTF-8. See below

MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8

2020-10-29 Thread Ashutosh Sharma
Hi All, Today while working on some other task related to database encoding, I noticed that the MINUS SIGN (with byte sequence a1-dd) in EUC-JP is mapped to FULLWIDTH HYPHEN-MINUS (with byte sequence ef-bc-8d) in UTF-8. See below: postgres=# select convert('\xa1dd', 'euc_jp', 'utf8'); convert --
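For readers who want to check the UTF-8 byte sequences named in this report outside the database, here is a minimal Python sketch. It uses only the standard library; the helper name `utf8_hex` is made up for the example, and it does not touch PostgreSQL or any EUC-JP conversion table.

```python
import unicodedata

MINUS_SIGN = "\u2212"
FULLWIDTH_HYPHEN_MINUS = "\uff0d"


def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of ch as dash-separated hex bytes."""
    return ch.encode("utf-8").hex("-")


if __name__ == "__main__":
    for ch in (MINUS_SIGN, FULLWIDTH_HYPHEN_MINUS):
        # Prints the code point, its Unicode name, and its UTF-8 bytes.
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {utf8_hex(ch)}")
```

The second sequence, ef-bc-8d, is the one the report says `convert` returns for the EUC-JP bytes a1-dd; the first, e2-88-92, is the UTF-8 encoding of MINUS SIGN itself.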