Re: [PATCHES] [BUGS] BUG #2120: Crash when doing UTF8-ISO_8859_8 encoding

2005-12-22 Thread Tatsuo Ishii
 Tom Lane wrote:
  Tatsuo Ishii [EMAIL PROTECTED] writes:
   It looks like somebody rearranged the pg_enc enum without bothering to
   fix the tables that are affected by this.
  
   I will look into this.
  
  Thank you.  It might be worth adding a comment to pg_wchar.h listing all
  the places that need to be fixed when enum pg_enc changes.
  
 
 I have developed the following patch against CVS.  Tatsuo, you can use
 it as a starting point.  It adds a comment to encnames.c and reorders
 utf8_and_iso8859.c to match the existing order.  I also added the
 missing entries at the bottom.  I checked for pg_conv_map in the source
 code and only utf8_and_iso8859.c has that structure, so I assume it is
 the only one that also depends on the encnames.c ordering.

I think the current implementaion in utf8_and_iso8859.c is fast but
too fragile against rearranging of encoding id. I modify those functions
in utf8_and_iso8859.c to do a linear search with encoding id. With
this change developers feel free to rearrange encoding id, and this
kind of problems will be gone forever. The only penalty is the time of
searching 13 entries in the encoding map. We can do a quick sort but
it will need sorted entry by encoding id and may cause similar problem
in the future. So I'm not sure it's worth doing the quick sort.

Propsed patch attached.

 Looking at 8.0.X, it has the matching order, so we are OK there, but it
 doesn't have the trailing entries.  Tatsuo, are those needed?

I think it's OK, since the last missing entry will never be visited.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
Index: src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c
===
RCS file: 
/cvsroot/pgsql/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c,v
retrieving revision 1.16
diff -u -r1.16 utf8_and_iso8859.c
--- src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c   
22 Nov 2005 18:17:26 -  1.16
+++ src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c   
23 Dec 2005 01:43:38 -
@@ -68,15 +68,6 @@
 } pg_conv_map;
 
 static pg_conv_map maps[] = {
-   {PG_SQL_ASCII}, /* SQL/ASCII */
-   {PG_EUC_JP},/* EUC for Japanese */
-   {PG_EUC_CN},/* EUC for Chinese */
-   {PG_EUC_KR},/* EUC for Korean */
-   {PG_EUC_TW},/* EUC for Taiwan */
-   {PG_JOHAB}, /* EUC for Korean JOHAB 
*/
-   {PG_UTF8},  /* Unicode UTF8 */
-   {PG_MULE_INTERNAL}, /* Mule internal code */
-   {PG_LATIN1},/* ISO-8859-1 Latin 1 */
{PG_LATIN2, LUmapISO8859_2, ULmapISO8859_2,
sizeof(LUmapISO8859_2) / sizeof(pg_local_to_utf),
sizeof(ULmapISO8859_2) / sizeof(pg_utf_to_local)},  /* ISO-8859-2 
Latin 2 */
@@ -104,12 +95,6 @@
{PG_LATIN10, LUmapISO8859_16, ULmapISO8859_16,
sizeof(LUmapISO8859_16) / sizeof(pg_local_to_utf),
sizeof(ULmapISO8859_16) / sizeof(pg_utf_to_local)}, /* ISO-8859-16 
Latin 10 */
-   {PG_WIN1256},   /* windows-1256 */
-   {PG_WIN1258},   /* Windows-1258 */
-   {PG_WIN874},/* windows-874 */
-   {PG_KOI8R}, /* KOI8-R */
-   {PG_WIN1251},   /* windows-1251 */
-   {PG_WIN866},/* (MS-DOS CP866) */
{PG_ISO_8859_5, LUmapISO8859_5, ULmapISO8859_5,
sizeof(LUmapISO8859_5) / sizeof(pg_local_to_utf),
sizeof(ULmapISO8859_5) / sizeof(pg_utf_to_local)},  /* ISO-8859-5 */
@@ -131,11 +116,23 @@
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+   int i;
 
Assert(PG_GETARG_INT32(1) == PG_UTF8);
Assert(len = 0);
 
-   LocalToUtf(src, dest, maps[encoding].map1, maps[encoding].size1, 
encoding, len);
+   for (i=0;isizeof(maps)/sizeof(pg_conv_map);i++)
+   {
+   if (encoding == maps[i].encoding)
+   {
+   LocalToUtf(src, dest, maps[i].map1, maps[i].size1, 
encoding, len);
+   PG_RETURN_VOID();
+   }
+   }
+
+   ereport(ERROR,
+   (errcode(ERRCODE_INTERNAL_ERROR),
+errmsg(unexpected encoding id %d for ISO-8859 
charsets, encoding)));
 
PG_RETURN_VOID();
 }
@@ -147,11 +144,23 @@
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = 

Re: [PATCHES] [BUGS] BUG #2120: Crash when doing UTF8-ISO_8859_8 encoding conversion

2005-12-22 Thread Bruce Momjian

That is a nice solution --- instead of listing all the encodings, you
listed just the ones that need to be used.  The list is shorter and
clearer.  It seems like the right approach.  Thanks.

---

Tatsuo Ishii wrote:
  Tom Lane wrote:
   Tatsuo Ishii [EMAIL PROTECTED] writes:
It looks like somebody rearranged the pg_enc enum without bothering to
fix the tables that are affected by this.
   
I will look into this.
   
   Thank you.  It might be worth adding a comment to pg_wchar.h listing all
   the places that need to be fixed when enum pg_enc changes.
   
  
  I have developed the following patch against CVS.  Tatsuo, you can use
  it as a starting point.  It adds a comment to encnames.c and reorders
  utf8_and_iso8859.c to match the existing order.  I also added the
  missing entries at the bottom.  I checked for pg_conv_map in the source
  code and only utf8_and_iso8859.c has that structure, so I assume it is
  the only one that also depends on the encnames.c ordering.
 
 I think the current implementaion in utf8_and_iso8859.c is fast but
 too fragile against rearranging of encoding id. I modify those functions
 in utf8_and_iso8859.c to do a linear search with encoding id. With
 this change developers feel free to rearrange encoding id, and this
 kind of problems will be gone forever. The only penalty is the time of
 searching 13 entries in the encoding map. We can do a quick sort but
 it will need sorted entry by encoding id and may cause similar problem
 in the future. So I'm not sure it's worth doing the quick sort.
 
 Propsed patch attached.
 
  Looking at 8.0.X, it has the matching order, so we are OK there, but it
  doesn't have the trailing entries.  Tatsuo, are those needed?
 
 I think it's OK, since the last missing entry will never be visited.
 --
 Tatsuo Ishii
 SRA OSS, Inc. Japan

 Index: 
 src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c
 ===
 RCS file: 
 /cvsroot/pgsql/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c,v
 retrieving revision 1.16
 diff -u -r1.16 utf8_and_iso8859.c
 --- src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c 
 22 Nov 2005 18:17:26 -  1.16
 +++ src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c 
 23 Dec 2005 01:43:38 -
 @@ -68,15 +68,6 @@
  } pg_conv_map;
  
  static pg_conv_map maps[] = {
 - {PG_SQL_ASCII}, /* SQL/ASCII */
 - {PG_EUC_JP},/* EUC for Japanese */
 - {PG_EUC_CN},/* EUC for Chinese */
 - {PG_EUC_KR},/* EUC for Korean */
 - {PG_EUC_TW},/* EUC for Taiwan */
 - {PG_JOHAB}, /* EUC for Korean JOHAB 
 */
 - {PG_UTF8},  /* Unicode UTF8 */
 - {PG_MULE_INTERNAL}, /* Mule internal code */
 - {PG_LATIN1},/* ISO-8859-1 Latin 1 */
   {PG_LATIN2, LUmapISO8859_2, ULmapISO8859_2,
   sizeof(LUmapISO8859_2) / sizeof(pg_local_to_utf),
   sizeof(ULmapISO8859_2) / sizeof(pg_utf_to_local)},  /* ISO-8859-2 
 Latin 2 */
 @@ -104,12 +95,6 @@
   {PG_LATIN10, LUmapISO8859_16, ULmapISO8859_16,
   sizeof(LUmapISO8859_16) / sizeof(pg_local_to_utf),
   sizeof(ULmapISO8859_16) / sizeof(pg_utf_to_local)}, /* ISO-8859-16 
 Latin 10 */
 - {PG_WIN1256},   /* windows-1256 */
 - {PG_WIN1258},   /* Windows-1258 */
 - {PG_WIN874},/* windows-874 */
 - {PG_KOI8R}, /* KOI8-R */
 - {PG_WIN1251},   /* windows-1251 */
 - {PG_WIN866},/* (MS-DOS CP866) */
   {PG_ISO_8859_5, LUmapISO8859_5, ULmapISO8859_5,
   sizeof(LUmapISO8859_5) / sizeof(pg_local_to_utf),
   sizeof(ULmapISO8859_5) / sizeof(pg_utf_to_local)},  /* ISO-8859-5 */
 @@ -131,11 +116,23 @@
   unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
   unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
   int len = PG_GETARG_INT32(4);
 + int i;
  
   Assert(PG_GETARG_INT32(1) == PG_UTF8);
   Assert(len = 0);
  
 - LocalToUtf(src, dest, maps[encoding].map1, maps[encoding].size1, 
 encoding, len);
 + for (i=0;isizeof(maps)/sizeof(pg_conv_map);i++)
 + {
 + if (encoding == maps[i].encoding)
 + {
 + LocalToUtf(src, dest, maps[i].map1, maps[i].size1, 
 encoding, len);
 + PG_RETURN_VOID();
 + }
 + }
 +
 + ereport(ERROR,
 + (errcode(ERRCODE_INTERNAL_ERROR),
 +  

Re: [PATCHES] [BUGS] BUG #2120: Crash when doing UTF8-ISO_8859_8 encoding

2005-12-22 Thread Tom Lane
Tatsuo Ishii [EMAIL PROTECTED] writes:
 I think the current implementaion in utf8_and_iso8859.c is fast but
 too fragile against rearranging of encoding id. I modify those functions
 in utf8_and_iso8859.c to do a linear search with encoding id.

That's not unreasonable, but I was wondering whether we could add some
Assert() tests that would catch problems without imposing any extra cost
in normal non-Assert builds.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] [BUGS] BUG #2120: Crash when doing UTF8-ISO_8859_8 encoding conversion

2005-12-21 Thread Bruce Momjian
Tom Lane wrote:
 Tatsuo Ishii [EMAIL PROTECTED] writes:
  It looks like somebody rearranged the pg_enc enum without bothering to
  fix the tables that are affected by this.
 
  I will look into this.
 
 Thank you.  It might be worth adding a comment to pg_wchar.h listing all
 the places that need to be fixed when enum pg_enc changes.
 

I have developed the following patch against CVS.  Tatsuo, you can use
it as a starting point.  It adds a comment to encnames.c and reorders
utf8_and_iso8859.c to match the existing order.  I also added the
missing entries at the bottom.  I checked for pg_conv_map in the source
code and only utf8_and_iso8859.c has that structure, so I assume it is
the only one that also depends on the encnames.c ordering.

Looking at 8.0.X, it has the matching order, so we are OK there, but it
doesn't have the trailing entries.  Tatsuo, are those needed?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073
Index: src/backend/utils/mb/encnames.c
===
RCS file: /cvsroot/pgsql/src/backend/utils/mb/encnames.c,v
retrieving revision 1.26
diff -c -c -u -r1.26 encnames.c
--- src/backend/utils/mb/encnames.c 15 Oct 2005 02:49:33 -  1.26
+++ src/backend/utils/mb/encnames.c 22 Dec 2005 02:45:02 -
@@ -264,6 +264,7 @@
 /* --
  * These are official encoding names.
  * XXX must be sorted by the same order as pg_enc type (see mb/pg_wchar.h)
+ * and pg_conv_map in mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c.
  * --
  */
 pg_enc2name pg_enc2name_tbl[] =
Index: src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c
===
RCS file: 
/cvsroot/pgsql/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c,v
retrieving revision 1.16
diff -c -c -u -r1.16 utf8_and_iso8859.c
--- src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c   
22 Nov 2005 18:17:26 -  1.16
+++ src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c   
22 Dec 2005 02:45:03 -
@@ -106,10 +106,11 @@
sizeof(ULmapISO8859_16) / sizeof(pg_utf_to_local)}, /* ISO-8859-16 
Latin 10 */
{PG_WIN1256},   /* windows-1256 */
{PG_WIN1258},   /* Windows-1258 */
+   {PG_WIN866},/* (MS-DOS CP866) */
{PG_WIN874},/* windows-874 */
{PG_KOI8R}, /* KOI8-R */
{PG_WIN1251},   /* windows-1251 */
-   {PG_WIN866},/* (MS-DOS CP866) */
+   {PG_WIN1252},   /* windows-1252 */
{PG_ISO_8859_5, LUmapISO8859_5, ULmapISO8859_5,
sizeof(LUmapISO8859_5) / sizeof(pg_local_to_utf),
sizeof(ULmapISO8859_5) / sizeof(pg_utf_to_local)},  /* ISO-8859-5 */
@@ -122,6 +123,12 @@
{PG_ISO_8859_8, LUmapISO8859_8, ULmapISO8859_8,
sizeof(LUmapISO8859_8) / sizeof(pg_local_to_utf),
sizeof(ULmapISO8859_8) / sizeof(pg_utf_to_local)},  /* ISO-8859-8 */
+   {PG_WIN1250},   /* windows-1250 */
+   {PG_SJIS},  /* SJIS */
+   {PG_BIG5},  /* BIG5 */
+   {PG_GBK},   /* GBK */
+   {PG_UHC},   /* UHC */
+   {PG_GB18030}/* GB18030 */
 };
 
 Datum

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match