On Tue, 10 Oct 2006, Tom Lane wrote:

> Sergiy Vyshnevetskiy <[EMAIL PROTECTED]> writes:
>> Here is a new and improved patch, that closes security hole as well.
>
> We really can't consider a patch like this, because not only does it
> ignore the problem of multiple spellings of encoding names, but it
> actually breaks existing functionality on platforms with a variant
> spelling of the name.  I think a minimum requirement ought to be that
> it work with any of the spellings recognized by initdb.

Alright, that was too strict. But when the server emits messages in the LC_CTYPE encoding alongside data in the database encoding, and then pushes this mix through the database-to-client charset conversion, that's a bug. A PostgreSQL bug. And the "UTF-8 panic" is its direct result.
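
To make the failure mode concrete, here is a minimal sketch in C. It assumes a UTF-8 database and a KOI8-R LC_CTYPE, and uses a simplified structural UTF-8 check that is an illustration only, not PostgreSQL's own verifier. The function and array names are made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* "ошибка" as gettext would return it under a KOI8-R LC_CTYPE (assumed locale) */
static const unsigned char koi8r_msg[] = {0xCF, 0xDB, 0xC9, 0xC2, 0xCB, 0xC1};

/* The same word encoded in UTF-8, i.e. in the database encoding */
static const unsigned char utf8_msg[] = {
    0xD0, 0xBE, 0xD1, 0x88, 0xD0, 0xB8, 0xD0, 0xB1, 0xD0, 0xBA, 0xD0, 0xB0
};

/*
 * Simplified check: verifies lead/continuation byte structure only.
 * It does not reject overlong forms or surrogates.
 */
static bool
is_utf8_structurally_valid(const unsigned char *buf, size_t len)
{
    size_t      i = 0;

    while (i < len)
    {
        unsigned char c = buf[i];
        size_t      need;

        if (c < 0x80)
            need = 0;               /* plain ASCII */
        else if ((c & 0xE0) == 0xC0)
            need = 1;               /* lead byte of a 2-byte sequence */
        else if ((c & 0xF0) == 0xE0)
            need = 2;               /* lead byte of a 3-byte sequence */
        else if ((c & 0xF8) == 0xF0)
            need = 3;               /* lead byte of a 4-byte sequence */
        else
            return false;           /* stray continuation or invalid lead */

        if (need > len - i - 1)
            return false;           /* sequence truncated at end of buffer */

        for (size_t k = 1; k <= need; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return false;       /* expected a continuation byte */

        i += need + 1;
    }
    return true;
}
```

The KOI8-R bytes fail the check at the second byte, which is roughly what the conversion routine trips over when it is handed a message that is not in the database encoding.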

As a stop-gap I have included a version of the patch that breaks nothing. But it fixes the "wrong encoding" bug and the "UTF-8 panic" only on those operating systems that recognize the supplied spelling. Linux and FreeBSD are among them.

Cycling through possible spellings in SetDatabaseEncoding() is suboptimal. The right time and place to do it is the configure script, where we can fill pg_enc2localname_tbl with the results of testing the candidate charset names.
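
A rough sketch of what such a probe could look like, written in C so that a configure test program could run it. It assumes that a name accepted by the system's iconv_open() is also usable as a gettext codeset, and that "UTF-8" is itself an accepted target name; neither is guaranteed on every platform. The function name is hypothetical.

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Return the first name in a NULL-terminated candidate list that this
 * system's iconv accepts, or NULL if none of them is recognized.
 */
static const char *
first_known_charset_name(const char *const *candidates)
{
    for (size_t i = 0; candidates[i] != NULL; i++)
    {
        /* Only the success of the open matters; no data is converted. */
        iconv_t     cd = iconv_open("UTF-8", candidates[i]);

        if (cd != (iconv_t) -1)
        {
            iconv_close(cd);
            return candidates[i];
        }
    }
    return NULL;
}
```

The winning spelling for each encoding could then be written into the table at build time, so nothing has to be probed in SetDatabaseEncoding(). On systems where iconv is not part of libc the test program would need to link against libiconv.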

We can also just leave the patch as it is, because operating systems learn more and more charset name spellings with every new version. Why waste too much effort chasing a horse that runs _to_you_? :)

--- src/backend/utils/mb/mbutils.c.orig Sun May 21 23:05:48 2006
+++ src/backend/utils/mb/mbutils.c      Thu Oct 12 21:31:15 2006
@@ -17,6 +17,114 @@
 #include "catalog/namespace.h"
 
 /*
+ * Try to map most internal character encodings to the proper and
+ * preferred IANA string. Use this in mbutils.c to feed gettext info about
+ * the database's character encoding.
+ *
+ * Palle Girgensohn, 2005
+ */
+
+pg_enc2name pg_enc2localname_tbl[] =
+{
+       {
+               "US-ASCII", PG_SQL_ASCII
+       },
+       {
+               "EUC-JP", PG_EUC_JP
+       },
+       {
+               "GB2312", PG_EUC_CN
+       },
+       {
+               "EUC-KR", PG_EUC_KR
+       },
+       {
+               "ISO-2022-CN", PG_EUC_TW
+       },
+       {
+               "KS_C_5601-1987", PG_JOHAB  /* either KS_C_5601-1987 or ISO-2022-KR ??? */
+       },
+       {
+               "UTF-8", PG_UTF8
+       },
+       {
+               "MULE_INTERNAL", PG_MULE_INTERNAL  /* is not for real */
+       },
+       {
+               "ISO-8859-1", PG_LATIN1
+       },
+       {
+               "ISO-8859-2", PG_LATIN2
+       },
+       {
+               "ISO-8859-3", PG_LATIN3
+       },
+       {
+               "ISO-8859-4", PG_LATIN4
+       },
+       {
+               "ISO-8859-9", PG_LATIN5
+       },
+       {
+               "ISO-8859-10", PG_LATIN6
+       },
+       {
+               "ISO-8859-13", PG_LATIN7
+       },
+       {
+               "ISO-8859-14", PG_LATIN8
+       },
+       {
+               "ISO-8859-15", PG_LATIN9
+       },
+       {
+               "ISO-8859-16", PG_LATIN10
+       },
+       {
+               "windows-1256", PG_WIN1256
+       },
+       {
+               "windows-874", PG_WIN874
+       },
+       {
+               "KOI8-R", PG_KOI8R
+       },
+       {
+               "windows-1251", PG_WIN1251
+       },
+       {
+               "ISO-8859-5", PG_ISO_8859_5
+       },
+       {
+               "ISO-8859-6", PG_ISO_8859_6
+       },
+       {
+               "ISO-8859-7", PG_ISO_8859_7
+       },
+       {
+               "ISO-8859-8", PG_ISO_8859_8
+       },
+       {
+               "windows-1250", PG_WIN1250
+       },
+       {
+               "Shift_JIS", PG_SJIS
+       },
+       {
+               "Big5", PG_BIG5
+       },
+       {
+               "GBK", PG_GBK
+       },
+       {
+               "cp949", PG_UHC
+       },
+       {
+               "GB18030", PG_GB18030
+       }
+};
+
+/*
  * We handle for actual FE and BE encoding setting encoding-identificator
  * and encoding-name too. It prevent searching and conversion from encoding
  * to encoding name in getdatabaseencoding() and other routines.
@@ -611,6 +719,14 @@
 
        DatabaseEncoding = &pg_enc2name_tbl[encoding];
        Assert(DatabaseEncoding->encoding == encoding);
+       
+       /* 
+        * Try to set charset for messages the same as database charset. 
+        * If OS doesn't recognize charset name - do nothing.
+        */
+       
+       bind_textdomain_codeset("postgres",
+               (&pg_enc2localname_tbl[encoding])->name);
 }
 
 void
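
One caveat about the lookup above: indexing pg_enc2localname_tbl by position is only correct while the table lists every encoding in exactly the order of the enum. A keyed lookup avoids that dependency. The sketch below uses made-up stand-in types (demo_enc, demo_enc2name) rather than the real pg_enc and pg_enc2name definitions, and a deliberately sparse, unordered table.

```c
#include <stddef.h>

/* Hypothetical stand-ins for pg_enc and pg_enc2name; not the real definitions */
typedef enum
{
    DEMO_SQL_ASCII,
    DEMO_EUC_JP,
    DEMO_UTF8,
    DEMO_KOI8R
} demo_enc;

typedef struct
{
    const char *name;
    demo_enc    encoding;
} demo_enc2name;

/* Deliberately sparse and not in enum order */
static const demo_enc2name demo_localname_tbl[] = {
    {"UTF-8", DEMO_UTF8},
    {"US-ASCII", DEMO_SQL_ASCII},
    {"KOI8-R", DEMO_KOI8R},
};

/* Search by encoding id; return NULL when the table has no entry for it. */
static const char *
demo_local_charset_name(demo_enc encoding)
{
    size_t      n = sizeof(demo_localname_tbl) / sizeof(demo_localname_tbl[0]);

    for (size_t i = 0; i < n; i++)
        if (demo_localname_tbl[i].encoding == encoding)
            return demo_localname_tbl[i].name;

    return NULL;
}
```

The caller would then invoke bind_textdomain_codeset() only when the lookup returns a non-NULL name, and leave the message charset alone otherwise.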