On Tue, 10 Oct 2006, Tom Lane wrote:
> Sergiy Vyshnevetskiy <[EMAIL PROTECTED]> writes:
>> Here is a new and improved patch, that closes security hole as well.
>
> We really can't consider a patch like this, because not only does it
> ignore the problem of multiple spellings of encoding names, but it
> actually breaks existing functionality on platforms with a variant
> spelling of the name. I think a minimum requirement ought to be that
> it work with any of the spellings recognized by initdb.

Alright, that was too strict. But when the server takes messages in the
LC_CTYPE encoding, mixes them with data in the database encoding, and pushes
that mix through database-to-client charset conversion, that's a bug. A
PostgreSQL bug. And the "UTF-8 panic" is its direct result.
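
To make the failure concrete: a message translated into a single-byte LC_CTYPE encoding such as KOI8-R is generally not well-formed UTF-8, so a converter that expects UTF-8 input rejects it. The following is a minimal sketch, not PostgreSQL code. `is_valid_utf8` is a hypothetical helper that checks only the lead/continuation byte structure (no overlong or surrogate checks).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: structural UTF-8 check only.
 * Returns 1 if every lead byte is followed by the right number of
 * continuation bytes (10xxxxxx), 0 otherwise.
 */
static int
is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL);
    while (i < len)
    {
        unsigned char c = s[i];
        size_t need;            /* continuation bytes expected */
        size_t k;

        if (c < 0x80)
            need = 0;           /* plain ASCII */
        else if ((c & 0xE0) == 0xC0)
            need = 1;
        else if ((c & 0xF0) == 0xE0)
            need = 2;
        else if ((c & 0xF8) == 0xF0)
            need = 3;
        else
            return 0;           /* stray continuation or invalid lead byte */

        if (need > len - i - 1)
            return 0;           /* sequence truncated at end of buffer */

        for (k = 1; k <= need; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;       /* not a continuation byte */

        i += need + 1;
    }
    return 1;
}
```

For example, the Russian word for "error" is the bytes CF DB C9 C2 CB C1 in KOI8-R. The first byte looks like a two-byte UTF-8 lead, but the second is not a continuation byte, so the check fails. The same word encoded as UTF-8 (D0 BE D1 88 D0 B8 D0 B1 D0 BA D0 B0) passes.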

As a stop-gap I have included a version of the patch that breaks nothing. But
it will fix the "wrong encoding" bug and the "UTF-8 panic" only on those OSes
that recognize the supplied spelling. Linux and FreeBSD are among them.

Cycling through possible spellings in SetDatabaseEncoding() is suboptimal.
The time and place to do it is somewhere in the configure script. There we
can fill pg_enc2localname_tbl with the results of testing the possible charset
names.
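
The spelling probe itself is simple wherever it ends up running. Below is a minimal sketch of "try each spelling, keep the first one the platform accepts". All names (`charset_probe`, `first_recognized_spelling`, `fake_probe`) are hypothetical and not part of the patch. The stand-in probe just pretends the platform knows only "KOI8-R". A real probe might, for instance, call iconv_open() with the candidate name, but that is an assumption and not something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical probe type: returns nonzero if the platform accepts the name. */
typedef int (*charset_probe) (const char *name);

/*
 * Return the first spelling in the NULL-terminated candidates array that
 * the probe accepts, or NULL if none is accepted.
 */
static const char *
first_recognized_spelling(const char *const *candidates, charset_probe probe)
{
    assert(candidates != NULL && probe != NULL);

    for (; *candidates != NULL; candidates++)
        if (probe(*candidates))
            return *candidates;

    return NULL;
}

/* Stand-in probe for illustration: this "platform" only knows "KOI8-R". */
static int
fake_probe(const char *name)
{
    return strcmp(name, "KOI8-R") == 0;
}
```

Given the candidates "koi8r", "KOI8R" and "KOI8-R", the function returns "KOI8-R" with this probe. When nothing matches it returns NULL, which corresponds to the "do nothing" case.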

We can also just leave the patch as it is, because OSes learn more and more
charset name spellings with every new version. Why waste too much effort
chasing a horse that runs _to_you_? :)

--- src/backend/utils/mb/mbutils.c.orig Sun May 21 23:05:48 2006
+++ src/backend/utils/mb/mbutils.c Thu Oct 12 21:31:15 2006
@@ -17,6 +17,114 @@
#include "catalog/namespace.h"
/*
+ * Try to map most internal character encodings to the proper and
+ * preferred IANA string. Use this in mbutils.c to feed gettext info about
+ * the database's character encoding.
+ *
+ * Palle Girgensohn, 2005
+ */
+
+pg_enc2name pg_enc2localname_tbl[] =
+{
+ {
+ "US-ASCII", PG_SQL_ASCII
+ },
+ {
+ "EUC-JP", PG_EUC_JP
+ },
+ {
+ "GB2312", PG_EUC_CN
+ },
+ {
+ "EUC-KR", PG_EUC_KR
+ },
+ {
+ "ISO-2022-CN", PG_EUC_TW
+ },
+ {
+ "KS_C_5601-1987", PG_JOHAB /* either KS_C_5601-1987 or ISO-2022-KR ??? */
+ },
+ {
+ "UTF-8", PG_UTF8
+ },
+ {
+ "MULE_INTERNAL", PG_MULE_INTERNAL /* is not for real */
+ },
+ {
+ "ISO-8859-1", PG_LATIN1
+ },
+ {
+ "ISO-8859-2", PG_LATIN2
+ },
+ {
+ "ISO-8859-3", PG_LATIN3
+ },
+ {
+ "ISO-8859-4", PG_LATIN4
+ },
+ {
+ "ISO-8859-9", PG_LATIN5
+ },
+ {
+ "ISO-8859-10", PG_LATIN6
+ },
+ {
+ "ISO-8859-13", PG_LATIN7
+ },
+ {
+ "ISO-8859-14", PG_LATIN8
+ },
+ {
+ "ISO-8859-15", PG_LATIN9
+ },
+ {
+ "ISO-8859-16", PG_LATIN10
+ },
+ {
+ "windows-1256", PG_WIN1256
+ },
+ {
+ "windows-874", PG_WIN874
+ },
+ {
+ "KOI8-R", PG_KOI8R
+ },
+ {
+ "windows-1251", PG_WIN1251
+ },
+ {
+ "ISO-8859-5", PG_ISO_8859_5
+ },
+ {
+ "ISO-8859-6", PG_ISO_8859_6
+ },
+ {
+ "ISO-8859-7", PG_ISO_8859_7
+ },
+ {
+ "ISO-8859-8", PG_ISO_8859_8
+ },
+ {
+ "windows-1250", PG_WIN1250
+ },
+ {
+ "Shift_JIS", PG_SJIS
+ },
+ {
+ "Big5", PG_BIG5
+ },
+ {
+ "GBK", PG_GBK
+ },
+ {
+ "cp949", PG_UHC
+ },
+ {
+ "GB18030", PG_GB18030
+ }
+};
+
+/*
* We handle for actual FE and BE encoding setting encoding-identificator
* and encoding-name too. It prevent searching and conversion from encoding
* to encoding name in getdatabaseencoding() and other routines.
@@ -611,6 +719,14 @@
DatabaseEncoding = &pg_enc2name_tbl[encoding];
Assert(DatabaseEncoding->encoding == encoding);
+
+ /*
+ * Try to set charset for messages the same as database charset.
+ * If OS doesn't recognize charset name - do nothing.
+ */
+
+ bind_textdomain_codeset("postgres",
+ (&pg_enc2localname_tbl[encoding])->name);
}
void
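
The hunk above indexes pg_enc2localname_tbl directly by the encoding ID, which only works while the table order matches the encoding enum. A minimal sketch of a guarded lookup follows. The `demo_` types and the three-entry table are hypothetical stand-ins, not the real pg_enc or pg_enc2name definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for pg_enc and pg_enc2name; not the real definitions. */
typedef enum
{
    DEMO_SQL_ASCII,
    DEMO_EUC_JP,
    DEMO_UTF8,
    DEMO_ENC_COUNT
} demo_enc;

typedef struct
{
    const char *name;
    demo_enc    encoding;
} demo_enc2name;

static const demo_enc2name demo_localname_tbl[] = {
    {"US-ASCII", DEMO_SQL_ASCII},
    {"EUC-JP", DEMO_EUC_JP},
    {"UTF-8", DEMO_UTF8}
};

/*
 * Return the local charset name for enc, or NULL if enc is out of range
 * or the table entry at that position belongs to a different encoding.
 */
static const char *
demo_local_charset_name(int enc)
{
    size_t n = sizeof(demo_localname_tbl) / sizeof(demo_localname_tbl[0]);

    /* the table is expected to cover every encoding exactly once */
    assert(n == DEMO_ENC_COUNT);

    if (enc < 0 || (size_t) enc >= n)
        return NULL;
    if (demo_localname_tbl[enc].encoding != (demo_enc) enc)
        return NULL;

    return demo_localname_tbl[enc].name;
}
```

A caller would skip the codeset binding when the lookup returns NULL, in the same spirit as the Assert() on pg_enc2name_tbl in the existing code.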