Re: [HACKERS] Errors in our encoding conversion tables

2015-12-02 Thread Tom Lane
Robert Haas writes: > On Fri, Nov 27, 2015 at 8:54 PM, Tatsuo Ishii wrote: >> In short, there are a number of reasons we cannot simply import the >> consortium's mapping regarding SJIS (and EUC_JP). > I haven't seen a response to this point, but it seems important. I'll defer to Tatsuo-san concer

Re: [HACKERS] Errors in our encoding conversion tables

2015-12-02 Thread Robert Haas
On Fri, Nov 27, 2015 at 8:54 PM, Tatsuo Ishii wrote: > I will explain why the manual editing is necessary. > > One of the most famous problems with Unicode is "wave dash" > (U+301C). According to the Unicode Consortium's Unicode/SJIS map, it > corresponds to 0x8160 of Shift_JIS. Unfortunately this was a m
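
The excerpt above is cut off, so the following is not a reconstruction of Tatsuo's argument. It is only a minimal sketch, using Python's built-in codecs rather than PostgreSQL's conversion tables, of the fact that two widely used mappings disagree about the Shift_JIS byte pair 0x8160. The helper name decode_code_point is hypothetical and exists only for this illustration.

```python
# Illustration only: Python's standard codecs, NOT PostgreSQL's
# utils/mb/Unicode tables. "shift_jis" and "cp932" are Python codec names.

SJIS_WAVE_DASH = b"\x81\x60"  # the byte pair discussed in the message above


def decode_code_point(raw: bytes, codec: str) -> str:
    """Decode a single character and return its code point as 'U+XXXX'."""
    ch = raw.decode(codec)
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return f"U+{ord(ch):04X}"


if __name__ == "__main__":
    # Print what each codec makes of the same two bytes.
    for codec in ("shift_jis", "cp932"):
        print(codec, decode_code_point(SJIS_WAVE_DASH, codec))
```

If the two codecs report different code points for the same bytes, a round trip through Unicode between systems that use different tables will not be lossless, which appears to be the kind of problem behind the manual editing mentioned above.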

Re: [HACKERS] Errors in our encoding conversion tables

2015-11-28 Thread Tom Lane
I wrote: > There's a discussion over at > http://www.postgresql.org/message-id/flat/2sa.dhu5.1hk1yrptnfy.1ml...@seznam.cz > of an apparent error in our WIN1250 -> LATIN2 conversion. Attached is an updated patch (against today's HEAD) showing proposed changes to bring cyrillic_and_mic.c and latin2_

Re: [HACKERS] Errors in our encoding conversion tables

2015-11-27 Thread Tatsuo Ishii
> I wrote: >> I have not attempted to reverify the files in utils/mb/Unicode against the >> original Unicode Consortium data, but maybe we ought to do that before >> taking any further steps here. > > I downloaded the mapping files from unicode.org and attempted to verify > that the Unicode/*.map

Re: [HACKERS] Errors in our encoding conversion tables

2015-11-27 Thread Tom Lane
I wrote: > gb18030_to_utf8.map utf8_to_gb18030.map > Could not find the reference file gb-18030-2000.xml, whose origin is > unstated anyway. Ah, scratch that complaint; digging in our git history turned up the origin of that file, so I double-checked it and then updated the script with a comment

Re: [HACKERS] Errors in our encoding conversion tables

2015-11-27 Thread Tom Lane
I wrote: > I have not attempted to reverify the files in utils/mb/Unicode against the > original Unicode Consortium data, but maybe we ought to do that before > taking any further steps here. I downloaded the mapping files from unicode.org and attempted to verify that the Unicode/*.map files could

Re: [HACKERS] Errors in our encoding conversion tables

2015-11-27 Thread Tom Lane
Albe Laurenz writes: > I agree with your proposed fix, the only thing that makes me feel > uncomfortable > is that you get error messages like: > ERROR: character with byte sequence 0x96 in encoding "WIN1250" has no > equivalent in encoding "MULE_INTERNAL" Hm, yeah. It's pretty silly that t
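
As a rough illustration of why byte 0x96 in WIN1250 has no LATIN2 counterpart, the sketch below converts through Unicode using Python's built-in codecs. This is an assumption-laden stand-in: it does not use PostgreSQL's conversion routines or MULE_INTERNAL, the codec names "cp1250" and "iso8859_2" are Python's, and convert_via_unicode is a hypothetical helper.

```python
# Illustration only: Python's standard codecs, NOT PostgreSQL's
# cyrillic_and_mic.c / latin2 conversion code.
from typing import Optional


def convert_via_unicode(raw: bytes, src: str, dst: str) -> Optional[bytes]:
    """Convert raw from codec src to codec dst by way of Unicode.

    Returns None when a character has no equivalent in dst,
    instead of silently substituting some other byte.
    """
    text = raw.decode(src)
    try:
        return text.encode(dst)
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    # 0x96 is an en dash (U+2013) in Windows-1250; ISO-8859-2 has no en dash.
    print(convert_via_unicode(b"\x96", "cp1250", "iso8859_2"))
    # 0xE1 is a-acute in both encodings, so it converts cleanly.
    print(convert_via_unicode(b"\xe1", "cp1250", "iso8859_2"))
```

Reporting failure for unmappable characters, rather than passing the byte through unchanged, matches the behavior the error message quoted above describes.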

Re: [HACKERS] Errors in our encoding conversion tables

2015-11-27 Thread Albe Laurenz
Tom Lane wrote: > There's a discussion over at > http://www.postgresql.org/message-id/flat/2sa.dhu5.1hk1yrptnfy.1ml...@seznam.cz > of an apparent error in our WIN1250 -> LATIN2 conversion. I looked into this > and found that indeed, the code will happily translate certain characters > for which th

Re: [HACKERS] Errors in our encoding conversion tables

2015-11-26 Thread Tom Lane
Tatsuo Ishii writes: > I have started looking into it. I wonder how you created this part > of your patch: The code I used is below. > In the above you seem to disable the conversion from 0x96 of win1250 > to ISO-8859-2 by using the Unicode mapping files in > src/backend/utils/mb/Unicode. Bu

Re: [HACKERS] Errors in our encoding conversion tables

2015-11-26 Thread Tatsuo Ishii
> There's a discussion over at > http://www.postgresql.org/message-id/flat/2sa.dhu5.1hk1yrptnfy.1ml...@seznam.cz > of an apparent error in our WIN1250 -> LATIN2 conversion. I looked into this > and found that indeed, the code will happily translate certain characters > for which there seems to be

[HACKERS] Errors in our encoding conversion tables

2015-11-26 Thread Tom Lane
There's a discussion over at http://www.postgresql.org/message-id/flat/2sa.dhu5.1hk1yrptnfy.1ml...@seznam.cz of an apparent error in our WIN1250 -> LATIN2 conversion. I looked into this and found that indeed, the code will happily translate certain characters for which there seems to be no justifi