Re: [HACKERS] again: Bug #943: Server-Encoding from EUC_TW
Tatsuo Ishii wrote:

I reported bug #943 (I found in 7.3.2) and you checked in some change against integer overflow. Now I upgraded to 7.3.3 and I'm not happy with this. The exact error as I described is fixed, but I found new errors in conversion UTF-8 - EUC_TW and BIG5:

Copy to table (DB has UTF-8 encoding) from file:

for PGCLIENTENCODING=BIG5:
WARNING: copy: line 1, LocalToUtf: could not convert (0xf9d6) BIG5 to UTF-8. Ignored
WARNING: copy: line 2, LocalToUtf: could not convert (0xf9d7) BIG5 to UTF-8. Ignored
WARNING: copy: line 3, LocalToUtf: could not convert (0xf9d8) BIG5 to UTF-8. Ignored
WARNING: copy: line 4, LocalToUtf: could not convert (0xf9db) BIG5 to UTF-8. Ignored

I see no problem here. The only standard conversion map I could find online so far (see URL below) does not include entries 0xf9d6 or above.

http://www.unicode.org/Public/UNIDATA/Unihan.txt

I found in this file:
U+F9D7 in line 604519
U+F9D8 in line 219540
U+F9D6...U+F9DB in lines 730707...730766.

No. U+F9D6 means *Unicode* code point, not BIG5 code point.

Ok. I have looked into my Linux box and found this in /usr/share/i18n/charmaps/BIG5.gz:

% Chinese charmap for BIG5 (CP950)
% version: 0.92
% Contact: Tung-Han Hsieh [EMAIL PROTECTED]
%          Yuan-Chung Cheng [EMAIL PROTECTED]
% Distribution and use is free, even for comercial purpose.
%
% This charmap is converted from:
% ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT
% ...

My characters are in there.

Don't you agree that it is strange that I can (for EUC_TW) copy to file without error but I cannot copy from file without error?

Michael

for EUC_TW
WARNING: copy: line 1, LocalToUtf: could not convert (0x8ea3c3b7) EUC_TW to UTF-8. Ignored
WARNING: copy: line 2, LocalToUtf: could not convert (0x8ea3cfd0) EUC_TW to UTF-8. Ignored
WARNING: copy: line 3, LocalToUtf: could not convert (0x8ea3c4ce) EUC_TW to UTF-8. Ignored
WARNING: copy: line 4, LocalToUtf: could not convert (0x8ea3bdfe) EUC_TW to UTF-8. Ignored

Hum. These seem to be CNS 11643-1993, plane 3. Currently PostgreSQL supports only:

CNS 11643-1993, plane 0
CNS 11643-1993, plane 1
CNS 11643-1993, plane 2
CNS 11643-1993, plane 15

Would you like to have support for the rest of the CNS 11643-1993 planes:

CNS 11643-1993, plane 3
CNS 11643-1993, plane 4
CNS 11643-1993, plane 5
CNS 11643-1993, plane 6
CNS 11643-1993, plane 7

in the upcoming 7.4?

Copy out to file from table (UTF-8 data):

to BIG5
WARNING: UtfToLocal: could not convert UTF-8 (0xe7a281). Ignored
WARNING: UtfToLocal: could not convert UTF-8 (0xe98ab9). Ignored
WARNING: UtfToLocal: could not convert UTF-8 (0xe8a38f). Ignored
WARNING: UtfToLocal: could not convert UTF-8 (0xe7b2a7). Ignored

to EUC_TW is ok!

BIG5 and EUC_TW have different code points. So this is not very strange.

But it is very strange that I can (for EUC_TW) copy to file without error but I cannot copy from file without error.

Michael

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
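As a rough illustration of why the EUC_TW sequences in the warnings above point at plane 3: in EUC-TW, a four-byte character starts with the single-shift byte 0x8E, followed by a plane byte (0xA1 for plane 1, 0xA2 for plane 2, and so on), while a plain two-byte character belongs to plane 1. The Python sketch below only inspects those bytes. The function name euc_tw_plane is made up for this example, and this is not how PostgreSQL's converter is written.

```python
SS2 = 0x8E  # single-shift 2: introduces a four-byte EUC-TW character


def euc_tw_plane(seq: bytes) -> int:
    """Return the CNS 11643 plane number encoded in one EUC-TW character.

    Hypothetical helper for illustration only.
    """
    def in_gr(b: int) -> bool:
        # Row and column bytes live in the range 0xA1..0xFE.
        return 0xA1 <= b <= 0xFE

    if len(seq) == 4 and seq[0] == SS2 and 0xA1 <= seq[1] <= 0xB0:
        if in_gr(seq[2]) and in_gr(seq[3]):
            # Plane byte 0xA1 means plane 1, 0xA2 plane 2, ...
            return seq[1] - 0xA0
    if len(seq) == 2 and in_gr(seq[0]) and in_gr(seq[1]):
        # Two-byte form is always plane 1.
        return 1
    raise ValueError(f"not a single EUC-TW character: {seq.hex()}")


# Byte sequences taken from the warnings in this thread.
for hex_seq in ("8ea3c3b7", "8ea3cfd0", "8ea3c4ce", "8ea3bdfe"):
    print(hex_seq, "-> plane", euc_tw_plane(bytes.fromhex(hex_seq)))
```

All four sequences carry the plane byte 0xA3, which is consistent with the "plane 3" reading given in the reply.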
Re: [HACKERS] again: Bug #943: Server-Encoding from EUC_TW to UTF-8
Tatsuo Ishii wrote:

Hello,

I reported bug #943 (I found in 7.3.2) and you checked in some change against integer overflow. Now I upgraded to 7.3.3 and I'm not happy with this. The exact error as I described is fixed, but I found new errors in conversion UTF-8 - EUC_TW and BIG5:

Copy to table (DB has UTF-8 encoding) from file:

for PGCLIENTENCODING=BIG5:
WARNING: copy: line 1, LocalToUtf: could not convert (0xf9d6) BIG5 to UTF-8. Ignored
WARNING: copy: line 2, LocalToUtf: could not convert (0xf9d7) BIG5 to UTF-8. Ignored
WARNING: copy: line 3, LocalToUtf: could not convert (0xf9d8) BIG5 to UTF-8. Ignored
WARNING: copy: line 4, LocalToUtf: could not convert (0xf9db) BIG5 to UTF-8. Ignored

I see no problem here. The only standard conversion map I could find online so far (see URL below) does not include entries 0xf9d6 or above.

Sorry, I do not know anything about conversion maps and CNS 11643-1993 planes. I only got a file in BIG5 encoding from Taiwan and found that it is not possible to load all of the text into PostgreSQL 7.3.3. But it is possible to convert it to UTF-8 with the iconv tool from glibc (Linux). It would be good if the next release supported today's BIG5.

Michael

http://www.unicode.org/Public/UNIDATA/Unihan.txt

for EUC_TW
WARNING: copy: line 1, LocalToUtf: could not convert (0x8ea3c3b7) EUC_TW to UTF-8. Ignored
WARNING: copy: line 2, LocalToUtf: could not convert (0x8ea3cfd0) EUC_TW to UTF-8. Ignored
WARNING: copy: line 3, LocalToUtf: could not convert (0x8ea3c4ce) EUC_TW to UTF-8. Ignored
WARNING: copy: line 4, LocalToUtf: could not convert (0x8ea3bdfe) EUC_TW to UTF-8. Ignored

Hum. These seem to be CNS 11643-1993, plane 3. Currently PostgreSQL supports only:

CNS 11643-1993, plane 0
CNS 11643-1993, plane 1
CNS 11643-1993, plane 2
CNS 11643-1993, plane 15

Would you like to have support for the rest of the CNS 11643-1993 planes:

CNS 11643-1993, plane 3
CNS 11643-1993, plane 4
CNS 11643-1993, plane 5
CNS 11643-1993, plane 6
CNS 11643-1993, plane 7

in the upcoming 7.4?

Copy out to file from table (UTF-8 data):

to BIG5
WARNING: UtfToLocal: could not convert UTF-8 (0xe7a281). Ignored
WARNING: UtfToLocal: could not convert UTF-8 (0xe98ab9). Ignored
WARNING: UtfToLocal: could not convert UTF-8 (0xe8a38f). Ignored
WARNING: UtfToLocal: could not convert UTF-8 (0xe7b2a7). Ignored

to EUC_TW is ok!

BIG5 and EUC_TW have different code points. So this is not very strange.

--
Tatsuo Ishii
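The UTF-8 byte sequences in the copy-out warnings can be turned back into Unicode code points with a few lines of Python, which makes them easier to look up in a mapping table. This is only a lookup aid; the helper name utf8_hex_to_codepoint is hypothetical and has nothing to do with PostgreSQL's UtfToLocal routine.

```python
def utf8_hex_to_codepoint(hex_bytes: str) -> str:
    """Decode a hex string holding one UTF-8 character to 'U+XXXX' notation.

    Hypothetical helper for illustration only.
    """
    text = bytes.fromhex(hex_bytes).decode("utf-8")
    if len(text) != 1:
        raise ValueError(f"expected exactly one character, got {len(text)}")
    return f"U+{ord(text):04X}"


# Byte sequences taken from the UtfToLocal warnings in this thread.
for hex_bytes in ("e7a281", "e98ab9", "e8a38f", "e7b2a7"):
    print(hex_bytes, "->", utf8_hex_to_codepoint(hex_bytes))
# e7a281 -> U+7881
# e98ab9 -> U+92B9
# e8a38f -> U+88CF
# e7b2a7 -> U+7CA7
```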
Re: [HACKERS] again: Bug #943: Server-Encoding from EUC_TW to UTF-8
I reported bug #943 (I found in 7.3.2) and you checked in some change against integer overflow. Now I upgraded to 7.3.3 and I'm not happy with this. The exact error as I described is fixed, but I found new errors in conversion UTF-8 - EUC_TW and BIG5:

Copy to table (DB has UTF-8 encoding) from file:

for PGCLIENTENCODING=BIG5:
WARNING: copy: line 1, LocalToUtf: could not convert (0xf9d6) BIG5 to UTF-8. Ignored
WARNING: copy: line 2, LocalToUtf: could not convert (0xf9d7) BIG5 to UTF-8. Ignored
WARNING: copy: line 3, LocalToUtf: could not convert (0xf9d8) BIG5 to UTF-8. Ignored
WARNING: copy: line 4, LocalToUtf: could not convert (0xf9db) BIG5 to UTF-8. Ignored

I see no problem here. The only standard conversion map I could find online so far (see URL below) does not include entries 0xf9d6 or above.

http://www.unicode.org/Public/UNIDATA/Unihan.txt

I found in this file:
U+F9D7 in line 604519
U+F9D8 in line 219540
U+F9D6...U+F9DB in lines 730707...730766.

No. U+F9D6 means *Unicode* code point, not BIG5 code point.

for EUC_TW
WARNING: copy: line 1, LocalToUtf: could not convert (0x8ea3c3b7) EUC_TW to UTF-8. Ignored
WARNING: copy: line 2, LocalToUtf: could not convert (0x8ea3cfd0) EUC_TW to UTF-8. Ignored
WARNING: copy: line 3, LocalToUtf: could not convert (0x8ea3c4ce) EUC_TW to UTF-8. Ignored
WARNING: copy: line 4, LocalToUtf: could not convert (0x8ea3bdfe) EUC_TW to UTF-8. Ignored

Hum. These seem to be CNS 11643-1993, plane 3. Currently PostgreSQL supports only:

CNS 11643-1993, plane 0
CNS 11643-1993, plane 1
CNS 11643-1993, plane 2
CNS 11643-1993, plane 15

Would you like to have support for the rest of the CNS 11643-1993 planes:

CNS 11643-1993, plane 3
CNS 11643-1993, plane 4
CNS 11643-1993, plane 5
CNS 11643-1993, plane 6
CNS 11643-1993, plane 7

in the upcoming 7.4?

Copy out to file from table (UTF-8 data):

to BIG5
WARNING: UtfToLocal: could not convert UTF-8 (0xe7a281). Ignored
WARNING: UtfToLocal: could not convert UTF-8 (0xe98ab9). Ignored
WARNING: UtfToLocal: could not convert UTF-8 (0xe8a38f). Ignored
WARNING: UtfToLocal: could not convert UTF-8 (0xe7b2a7). Ignored

to EUC_TW is ok!

BIG5 and EUC_TW have different code points. So this is not very strange.

But it is very strange that I can (for EUC_TW) copy to file without error but I cannot copy from file without error.

Michael
Re: [HACKERS] again: Bug #943: Server-Encoding from EUC_TW to UTF-8
Copy to table (DB has UTF-8 encoding) from file:

for PGCLIENTENCODING=BIG5:
WARNING: copy: line 1, LocalToUtf: could not convert (0xf9d6) BIG5 to UTF-8. Ignored
WARNING: copy: line 2, LocalToUtf: could not convert (0xf9d7) BIG5 to UTF-8. Ignored
WARNING: copy: line 3, LocalToUtf: could not convert (0xf9d8) BIG5 to UTF-8. Ignored
WARNING: copy: line 4, LocalToUtf: could not convert (0xf9db) BIG5 to UTF-8. Ignored

I see no problem here. The only standard conversion map I could find online so far (see URL below) does not include entries 0xf9d6 or above.

Sorry, I do not know anything about conversion maps and CNS 11643-1993 planes. I only got a file in BIG5 encoding from Taiwan and found that it is not possible to load all of the text into PostgreSQL 7.3.3. But it is possible to convert it to UTF-8 with the iconv tool from glibc (Linux). It would be good if the next release supported today's BIG5.

I'm not looking forward to adding any conversion entries confirmed by standards. Can someone explain to me the current status of the conversion maps between BIG5 and Unicode? The only info I could find so far is at www.unicode.org.

--
Tatsuo Ishii
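One way to see how two mapping tables differ on the BIG5 codes from the warnings is to try decoding them with more than one codec and compare. The sketch below uses Python's built-in "big5" and "cp950" codecs purely as stand-ins. Their tables are not PostgreSQL's, the helper names try_decode and probe are made up for this example, and the result depends on the tables shipped with the Python build, so no output is shown.

```python
from typing import Dict, Iterable, Optional


def try_decode(raw: bytes, codec: str) -> Optional[str]:
    """Return the decoded text, or None if the codec has no mapping."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return None


def probe(hex_codes: Iterable[str],
          codecs: Iterable[str] = ("big5", "cp950")) -> Dict[str, Dict[str, Optional[str]]]:
    """Decode each hex-encoded byte sequence with every codec listed."""
    codecs = tuple(codecs)
    report = {}
    for hex_code in hex_codes:
        raw = bytes.fromhex(hex_code)
        report[hex_code] = {codec: try_decode(raw, codec) for codec in codecs}
    return report


# BIG5 codes taken from the LocalToUtf warnings in this thread.
for hex_code, results in probe(["f9d6", "f9d7", "f9d8", "f9db"]).items():
    for codec, text in results.items():
        shown = "no mapping" if text is None else f"U+{ord(text):04X}"
        print(f"0x{hex_code} via {codec}: {shown}")
```

A code that decodes under one table but not the other falls in exactly the kind of gap between a vendor mapping and a narrower standard mapping that this thread is arguing about.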
Re: [HACKERS] again: Bug #943: Server-Encoding from EUC_TW to
Hello,

I reported bug #943 (I found in 7.3.2) and you checked in some change against integer overflow. Now I upgraded to 7.3.3 and I'm not happy with this. The exact error as I described is fixed, but I found new errors in conversion UTF-8 - EUC_TW and BIG5:

Copy to table (DB has UTF-8 encoding) from file:

for PGCLIENTENCODING=BIG5:
WARNING: copy: line 1, LocalToUtf: could not convert (0xf9d6) BIG5 to UTF-8. Ignored
WARNING: copy: line 2, LocalToUtf: could not convert (0xf9d7) BIG5 to UTF-8. Ignored
WARNING: copy: line 3, LocalToUtf: could not convert (0xf9d8) BIG5 to UTF-8. Ignored
WARNING: copy: line 4, LocalToUtf: could not convert (0xf9db) BIG5 to UTF-8. Ignored

I see no problem here. The only standard conversion map I could find online so far (see URL below) does not include entries 0xf9d6 or above.

http://www.unicode.org/Public/UNIDATA/Unihan.txt

for EUC_TW
WARNING: copy: line 1, LocalToUtf: could not convert (0x8ea3c3b7) EUC_TW to UTF-8. Ignored
WARNING: copy: line 2, LocalToUtf: could not convert (0x8ea3cfd0) EUC_TW to UTF-8. Ignored
WARNING: copy: line 3, LocalToUtf: could not convert (0x8ea3c4ce) EUC_TW to UTF-8. Ignored
WARNING: copy: line 4, LocalToUtf: could not convert (0x8ea3bdfe) EUC_TW to UTF-8. Ignored

Hum. These seem to be CNS 11643-1993, plane 3. Currently PostgreSQL supports only:

CNS 11643-1993, plane 0
CNS 11643-1993, plane 1
CNS 11643-1993, plane 2
CNS 11643-1993, plane 15

Would you like to have support for the rest of the CNS 11643-1993 planes:

CNS 11643-1993, plane 3
CNS 11643-1993, plane 4
CNS 11643-1993, plane 5
CNS 11643-1993, plane 6
CNS 11643-1993, plane 7

in the upcoming 7.4?

Copy out to file from table (UTF-8 data):

to BIG5
WARNING: UtfToLocal: could not convert UTF-8 (0xe7a281). Ignored
WARNING: UtfToLocal: could not convert UTF-8 (0xe98ab9). Ignored
WARNING: UtfToLocal: could not convert UTF-8 (0xe8a38f). Ignored
WARNING: UtfToLocal: could not convert UTF-8 (0xe7b2a7). Ignored

to EUC_TW is ok!

BIG5 and EUC_TW have different code points. So this is not very strange.

--
Tatsuo Ishii