Re: [PATCHES] prevent encoding conversion recursive error
On Thu, Sep 01, 2005 at 05:48:55PM +0200, Peter Eisentraut wrote:
> On Sunday, 21 August 2005 01:48, Tom Lane wrote:
> > One thing that occurred to me is that we might be able to simplify the
> > problem by adopting a project standard that all NLS message files shall
> > be in UTF8, period. Then we only have one encoding name to figure out
> > rather than N. Maybe this doesn't help much ...
>
> I suppose this would then break NLS on Windows, no?

We now have a patch to handle UTF-8 on Windows, via recoding to UTF-16
and back, so I guess not (not sure though).

It would manage to annoy me as a translator, but nothing too serious
really.

--
Alvaro Herrera -- Valdivia, Chile
Architect, www.EnterpriseDB.com
"Listen and you will forget; see and you will remember; do and you will
understand" (Confucius)
Re: [PATCHES] prevent encoding conversion recursive error
On Sunday, 21 August 2005 01:48, Tom Lane wrote:
> One thing that occurred to me is that we might be able to simplify the
> problem by adopting a project standard that all NLS message files shall
> be in UTF8, period. Then we only have one encoding name to figure out
> rather than N. Maybe this doesn't help much ...

I suppose this would then break NLS on Windows, no?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Re: [PATCHES] prevent encoding conversion recursive error
Bruce Momjian writes:
> Is there a TODO here?

Yeah:

* Fix problems with wrong runtime encoding conversion for NLS message files

One thing that occurred to me is that we might be able to simplify the
problem by adopting a project standard that all NLS message files shall
be in UTF8, period. Then we only have one encoding name to figure out
rather than N. Maybe this doesn't help much ...

regards, tom lane
Re: [PATCHES] prevent encoding conversion recursive error
Peter Eisentraut wrote:
> On Sunday, 14 August 2005 23:48, Tom Lane wrote:
> > Yeah, but don't we already have some code for that (or, actually, the
> > reverse direction) in initdb? It's probably not perfect, but it'd be
> > a lot better than crashing.
>
> The reverse direction is a lot simpler because we know the set of possible
> output values. I'm not sure how to do the mapping in the direction of the
> OS.

Is there a TODO here?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Re: [PATCHES] prevent encoding conversion recursive error
On Sunday, 14 August 2005 23:48, Tom Lane wrote:
> Yeah, but don't we already have some code for that (or, actually, the
> reverse direction) in initdb? It's probably not perfect, but it'd be
> a lot better than crashing.

The reverse direction is a lot simpler because we know the set of possible
output values. I'm not sure how to do the mapping in the direction of the
OS.
Re: [PATCHES] prevent encoding conversion recursive error
Peter Eisentraut <[EMAIL PROTECTED]> writes:
> On Tuesday, 9 August 2005 04:51, Tom Lane wrote:
>> which leads to the question "why aren't we using
>> bind_textdomain_codeset() to tell gettext what character set it should
>> produce"?

> That would probably require us to solve the question on how to translate
> PostgreSQL encoding names to OS encoding names.

Yeah, but don't we already have some code for that (or, actually, the
reverse direction) in initdb? It's probably not perfect, but it'd be
a lot better than crashing.

regards, tom lane
Re: [PATCHES] prevent encoding conversion recursive error
On Tuesday, 9 August 2005 04:51, Tom Lane wrote:
> which leads to the question "why aren't we using
> bind_textdomain_codeset() to tell gettext what character set it should
> produce"?

That would probably require us to solve the question on how to translate
PostgreSQL encoding names to OS encoding names.
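
[Editorial sketch] The name-translation step mentioned above could look
roughly like the following. This is only an illustration under stated
assumptions, not code from PostgreSQL: the table PG_TO_OS_CODESET, the
helper names, and the "postgres" text domain in the usage note are all
hypothetical, and the table is deliberately incomplete. The only library
call used is Python's locale.bind_textdomain_codeset(), which exists
solely on platforms with libintl, so the sketch checks for it first.

```python
import locale

# Hypothetical, deliberately incomplete table:
# PostgreSQL encoding name -> iconv-style codeset name understood by the OS.
PG_TO_OS_CODESET = {
    "UTF8": "UTF-8",
    "EUC_CN": "EUC-CN",
    "EUC_JP": "EUC-JP",
    "EUC_KR": "EUC-KR",
    "LATIN1": "ISO-8859-1",
    "LATIN2": "ISO-8859-2",
}


def pg_to_os_codeset(pg_encoding):
    """Return the OS codeset name, or None when no mapping is known."""
    return PG_TO_OS_CODESET.get(pg_encoding.upper())


def bind_message_codeset(domain, pg_encoding):
    """Ask gettext to emit `domain` messages in the database encoding.

    Returns the codeset that was bound, or None when the encoding has no
    known OS name or the platform lacks bind_textdomain_codeset().
    """
    codeset = pg_to_os_codeset(pg_encoding)
    binder = getattr(locale, "bind_textdomain_codeset", None)
    if codeset is None or binder is None:
        return None  # leave gettext's default behaviour untouched
    binder(domain, codeset)
    return codeset
```

A caller would do something like bind_message_codeset("postgres", "UTF8")
once the database encoding is known. Returning None for an unmapped name
such as SQL_ASCII keeps the failure mode at "messages in the default
codeset" rather than an error.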
Re: [PATCHES] prevent encoding conversion recursive error
I wrote:
> This does not look real easy to fix. Who's up for reimplementing
> gettext and a few other pieces from scratch?

However, I did find
http://gnu.miscellaneousmirror.org/software/libc/manual/html_node/Charset-conversion-in-gettext.html#Charset-conversion-in-gettext
which leads to the question "why aren't we using
bind_textdomain_codeset() to tell gettext what character set it should
produce"?

regards, tom lane
Re: [PATCHES] prevent encoding conversion recursive error
I wrote:
> ...real problem is that gettext() has not been told the correct character
> set to convert messages to.

> ISTM we've seen this issue before and Peter had an idea how to fix it,
> but I forget the details. Peter?

A little bit of digging in the list archives located
http://archives.postgresql.org/pgsql-hackers/2003-11/msg01299.php
in which Peter opines

: - lc_collate and lc_ctype need to be held fixed in the entire cluster.
:
: - Gettext relies on iconv character set conversion, which relies on
:   lc_ctype, which leads to a complete screw-up in the server because of
:   the previous item.

which seems to fit with my observation: the message texts are being
converted to the cluster's original encoding rather than the encoding
that's active in the current database.

This does not look real easy to fix. Who's up for reimplementing
gettext and a few other pieces from scratch?

There is a separate line of thought here, which is that we are unlikely
ever to get this completely perfect, and so it'd be good if errors
during error processing didn't lead to recursion and PANIC. I don't
have an idea how to solve that one either.

regards, tom lane
Re: [PATCHES] prevent encoding conversion recursive error
"Qingqing Zhou" <[EMAIL PROTECTED]> writes: > Yeah, it is not a very clean solution. Do you mean the general problem is > "prevent recursive error reporting because of the error in transalting error > message"? > I put the image of the reporting email here: > http://www.cs.toronto.edu/~zhouqq/encode.jpg Actually, I believe the general problem is that the gettext software is doing the wrong internal character-set conversion for translated message texts. I can get this same crash on a Linux machine if I have server encoding = utf8 and client encoding = gb18030 and I set lc_messages = zh_TW ... but if I instead make lc_messages = zh_CN, no problem. The backend zh_TW.po file contains msgid "ignoring unconvertible UTF-8 character 0x%04x" msgstr "忽ç¥ç¡æ³è½æçUTF-8åå 0x%04x" and if I read the header correctly, this is claimed to be in UTF8 encoding. So it ought to be delivered as-is when in a UTF8 database. But tracing through the failure with gdb, I see that what is actually delivered back from gettext() is (gdb) p str $1 = 0x82e8a74 "ºöÂÔ?·¨??µÄUTF-8×ÖÔª0xd4da" (gdb) x/32cx str 0x82e8a74: 0xba0xf60xc20xd40x3f0xb70xa80x3f 0x82e8a7c: 0x3f0xb50xc40x550x540x460x2d0x38 0x82e8a84: 0xd70xd60xd40xaa0x300x780x640x34 0x82e8a8c: 0x640x610x000x7e0x7f0x7f0x7f0x7f (gdb) so some sort of conversion has taken place. I had initially initialized the database with initdb --locale=zh_CN, which was interpreted by Postgres as requesting EUC_CN encoding. I suspect the above is the EUC_CN equivalent of the message text from the .po file, and that the real problem is that gettext() has not been told the correct character set to convert messages to. ISTM we've seen this issue before and Peter had an idea how to fix it, but I forget the details. Peter? regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] prevent encoding conversion recursive error
"Tom Lane" <[EMAIL PROTECTED]> writes > > This is a really ugly solution ... and I don't think it solves the > general problem anyway, since this isn't the only possible error message. > Yeah, it is not a very clean solution. Do you mean the general problem is "prevent recursive error reporting because of the error in transalting error message"? I put the image of the reporting email here: http://www.cs.toronto.edu/~zhouqq/encode.jpg Regards, Qingqing ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] prevent encoding conversion recursive error
"Qingqing Zhou" <[EMAIL PROTECTED]> writes: > As this thread reports (some sever messages are in Chinese): > http://archives.postgresql.org/pgsql-bugs/2005-07/msg00247.php > a SQL grammar error could crash the error stack. Hmm, I thought we had fixed that years ago. > To fix this, we just change errmsg() to errmsg_internal() to avoid > tranlation which could stop the recursion at step 2. This is a really ugly solution ... and I don't think it solves the general problem anyway, since this isn't the only possible error message. I don't seem to have gotten the original problem report, and the archive page is pretty useless because all the non-ASCII characters have gotten changed to "?". Could you post a self-contained example case? Might be best to wrap it as a compressed attachment so it doesn't get munged in transmission. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match