Re: [BUGS] BUG #2685: Wrong charset of server messages on client

Sergiy Vyshnevetskiy Thu, 12 Oct 2006 12:51:23 -0700

On Tue, 10 Oct 2006, Tom Lane wrote:

> Sergiy Vyshnevetskiy <[EMAIL PROTECTED]> writes:
>
>> Here is a new and improved patch, that closes security hole as well.
>
> We really can't consider a patch like this, because not only does it
> ignore the problem of multiple spellings of encoding names, but it
> actually breaks existing functionality on platforms with a variant
> spelling of the name.  I think a minimum requirement ought to be that
> it work with any of the spellings recognized by initdb.

Alright, that was too strict. But when the server emits messages in the LC_CTYPE encoding alongside data in the database encoding and pushes this mix through the database-to-client charset conversion, that's a bug. A PostgreSQL bug. And the "UTF-8 panic" is its direct result.
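
To make the mix concrete, here is a minimal sketch, assuming a UTF-8 database and a server whose LC_CTYPE is KOI8-R. The function name and the sample word are illustrative only and appear nowhere in PostgreSQL; the check is a simplified structural test, not the validation the server actually performs.

```c
#include <stdbool.h>
#include <stddef.h>

/* The word "ошибка" as KOI8-R bytes (illustrative sample). */
const unsigned char sample_koi8r[] = {0xCF, 0xDB, 0xC9, 0xC2, 0xCB, 0xC1};

/* The first two letters of the same word, "ош", as UTF-8 bytes. */
const unsigned char sample_utf8[] = {0xD0, 0xBE, 0xD1, 0x88};

/*
 * Simplified structural UTF-8 check: every lead byte must be followed by
 * the right number of continuation bytes (10xxxxxx).  It does not reject
 * overlong forms or surrogates, so it is weaker than a real validator.
 */
bool
looks_like_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len)
    {
        unsigned char c = s[i];
        size_t extra;

        if (c < 0x80)
            extra = 0;              /* plain ASCII */
        else if ((c & 0xE0) == 0xC0)
            extra = 1;              /* 2-byte sequence */
        else if ((c & 0xF0) == 0xE0)
            extra = 2;              /* 3-byte sequence */
        else if ((c & 0xF8) == 0xF0)
            extra = 3;              /* 4-byte sequence */
        else
            return false;           /* stray continuation or invalid lead */

        if (extra > len - i - 1)
            return false;           /* sequence truncated */

        for (size_t k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return false;       /* not a continuation byte */

        i += extra + 1;
    }
    return true;
}
```

The KOI8-R bytes fail the check at the second byte, because 0xCF announces a two-byte sequence and 0xDB is not a continuation byte. A converter that expects database-encoded input would stumble at that same point.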

As a stop-gap I have included a version of the patch that breaks nothing. But it will fix the "wrong encoding" bug and the "UTF-8 panic" only on those OSes that recognize the supplied spelling. Linux and FreeBSD are among them.

Cycling through possible spellings in SetDatabaseEncoding() is suboptimal. The time and place to do it is somewhere in the configure script. There we can fill pg_enc2localname_tbl with the results of testing possible charset names.
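
A rough sketch of what such a test could look like, assuming the platform's gettext accepts the same charset names as its iconv (I believe this holds for glibc; elsewhere it is an assumption). The helper name first_known_charset is hypothetical, and using "UTF-8" as the source side of the probe is just a convenient choice.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Return the first spelling in a NULL-terminated candidate list that the
 * local iconv accepts as a conversion target, or NULL if none is accepted.
 * Hypothetical helper, not part of the patch.
 */
const char *
first_known_charset(const char *const *candidates)
{
    assert(candidates != NULL);

    for (; *candidates != NULL; candidates++)
    {
        /* iconv_open(tocode, fromcode) fails with (iconv_t) -1 on unknown names */
        iconv_t cd = iconv_open(*candidates, "UTF-8");

        if (cd != (iconv_t) -1)
        {
            iconv_close(cd);
            return *candidates;
        }
    }
    return NULL;
}
```

A configure-time test program could call this once per encoding with a list of known variants and write the winning name into the generated table, leaving the entry empty when nothing matches.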

We can also just leave the patch as it is, because more and more OSes learn more and more charset name spellings with every new version. Why waste too much power chasing a horse that runs _to_you_? :)

--- src/backend/utils/mb/mbutils.c.orig Sun May 21 23:05:48 2006
+++ src/backend/utils/mb/mbutils.c      Thu Oct 12 21:31:15 2006
@@ -17,6 +17,114 @@
 #include "catalog/namespace.h"
 
 /*
+ * Try to map most internal character encodings to the proper and
+ * preferred IANA string. Use this in mbutils.c to feed gettext info about
+ * the database's character encoding.
+ *
+ * Palle Girgensohn, 2005
+ */
+
+pg_enc2name pg_enc2localname_tbl[] =
+{
+       {
+               "US-ASCII", PG_SQL_ASCII
+       },
+       {
+               "EUC-JP", PG_EUC_JP
+       },
+       {
+               "GB2312", PG_EUC_CN
+       },
+       {
+               "EUC-KR", PG_EUC_KR
+       },
+       {
+               "ISO-2022-CN", PG_EUC_TW
+       },
+       {
+               "KS_C_5601-1987", PG_JOHAB  /* either KS_C_5601-1987 or ISO-2022-KR ??? */
+       },
+       {
+               "UTF-8", PG_UTF8
+       },
+       {
+               "MULE_INTERNAL", PG_MULE_INTERNAL  /* is not for real */
+       },
+       {
+               "ISO-8859-1", PG_LATIN1
+       },
+       {
+               "ISO-8859-2", PG_LATIN2
+       },
+       {
+               "ISO-8859-3", PG_LATIN3
+       },
+       {
+               "ISO-8859-4", PG_LATIN4
+       },
+       {
+               "ISO-8859-9", PG_LATIN5
+       },
+       {
+               "ISO-8859-10", PG_LATIN6
+       },
+       {
+               "ISO-8859-13", PG_LATIN7
+       },
+       {
+               "ISO-8859-14", PG_LATIN8
+       },
+       {
+               "ISO-8859-15", PG_LATIN9
+       },
+       {
+               "ISO-8859-16", PG_LATIN10
+       },
+       {
+               "windows-1256", PG_WIN1256
+       },
+       {
+               "windows-874", PG_WIN874
+       },
+       {
+               "KOI8-R", PG_KOI8R
+       },
+       {
+               "windows-1251", PG_WIN1251
+       },
+       {
+               "ISO-8859-5", PG_ISO_8859_5
+       },
+       {
+               "ISO-8859-6", PG_ISO_8859_6
+       },
+       {
+               "ISO-8859-7", PG_ISO_8859_7
+       },
+       {
+               "ISO-8859-8", PG_ISO_8859_8
+       },
+       {
+               "windows-1250", PG_WIN1250
+       },
+       {
+               "Shift_JIS", PG_SJIS
+       },
+       {
+               "Big5", PG_BIG5
+       },
+       {
+               "GBK", PG_GBK
+       },
+       {
+               "cp949", PG_UHC
+       },
+       {
+               "GB18030", PG_GB18030
+       }
+};
+
+/*
  * We handle for actual FE and BE encoding setting encoding-identificator
  * and encoding-name too. It prevent searching and conversion from encoding
  * to encoding name in getdatabaseencoding() and other routines.
@@ -611,6 +719,14 @@
 
        DatabaseEncoding = &pg_enc2name_tbl[encoding];
        Assert(DatabaseEncoding->encoding == encoding);
+       
+       /* 
+        * Try to set charset for messages the same as database charset. 
+        * If OS doesn't recognize charset name - do nothing.
+        */
+       
+       bind_textdomain_codeset("postgres",
+               (&pg_enc2localname_tbl[encoding])->name);
 }
 
 void
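
One caveat about the second hunk: it indexes pg_enc2localname_tbl by position, which is only correct while the table order matches the encoding enum exactly. A minimal sketch of a lookup by encoding ID instead, using trimmed stand-in types (demo_enc, demo_enc2name and demo_local_charset_name are hypothetical names, not PostgreSQL's):

```c
#include <stddef.h>

/* Stand-ins for PostgreSQL's pg_enc and pg_enc2name, trimmed for illustration. */
typedef enum
{
    DEMO_SQL_ASCII,
    DEMO_EUC_JP,
    DEMO_UTF8,
    DEMO_LATIN1
} demo_enc;

typedef struct
{
    const char *name;
    demo_enc    encoding;
} demo_enc2name;

/* Deliberately out of enum order and missing DEMO_EUC_JP. */
static const demo_enc2name demo_localname_tbl[] = {
    {"ISO-8859-1", DEMO_LATIN1},
    {"US-ASCII", DEMO_SQL_ASCII},
    {"UTF-8", DEMO_UTF8}
};

/*
 * Find the local charset name by encoding ID rather than by array position.
 * Returns NULL when the table has no entry for the encoding.
 */
const char *
demo_local_charset_name(demo_enc encoding)
{
    size_t n = sizeof(demo_localname_tbl) / sizeof(demo_localname_tbl[0]);

    for (size_t i = 0; i < n; i++)
    {
        if (demo_localname_tbl[i].encoding == encoding)
            return demo_localname_tbl[i].name;
    }
    return NULL;
}
```

With a lookup like this the caller can skip bind_textdomain_codeset() when NULL comes back, and a reordered or incomplete table cannot silently select the wrong charset.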

