Ok, I will commit the patches. BTW, the example code sequence (ISO-8859-8) Sagi posted seems to have wrong one.
select '????????'; WARNING: ignoring unconvertible ISO_8859_8 character 0x00d7 : : 0x00d7(\327) is not listed in our ISO-8858-8/UTF-8 conversion map. Is this OK or do we need to add the conversion for the code? What do you think, Sega? -- Tatsuo Ishii SRA OSS, Inc. Japan > That is a nice solution --- instead of listing all the encodings, you > listed just the ones that need to be used. The list is shorter and > clearer. It seems like the right approach. Thanks. > > --------------------------------------------------------------------------- > > Tatsuo Ishii wrote: > > > Tom Lane wrote: > > > > Tatsuo Ishii <[EMAIL PROTECTED]> writes: > > > > >> It looks like somebody rearranged the pg_enc enum without bothering > > > > >> to > > > > >> fix the tables that are affected by this. > > > > > > > > > I will look into this. > > > > > > > > Thank you. It might be worth adding a comment to pg_wchar.h listing all > > > > the places that need to be fixed when enum pg_enc changes. > > > > > > > > > > I have developed the following patch against CVS. Tatsuo, you can use > > > it as a starting point. It adds a comment to encnames.c and reorders > > > utf8_and_iso8859.c to match the existing order. I also added the > > > missing entries at the bottom. I checked for pg_conv_map in the source > > > code and only utf8_and_iso8859.c has that structure, so I assume it is > > > the only one that also depends on the encnames.c ordering. > > > > I think the current implementaion in utf8_and_iso8859.c is fast but > > too fragile against rearranging of encoding id. I modify those functions > > in utf8_and_iso8859.c to do a linear search with encoding id. With > > this change developers feel free to rearrange encoding id, and this > > kind of problems will be gone forever. The only penalty is the time of > > searching 13 entries in the encoding map. We can do a quick sort but > > it will need sorted entry by encoding id and may cause similar problem > > in the future. So I'm not sure it's worth doing the quick sort. > > > > Propsed patch attached. > > > > > Looking at 8.0.X, it has the matching order, so we are OK there, but it > > > doesn't have the trailing entries. Tatsuo, are those needed? > > > > I think it's OK, since the last missing entry will never be visited. > > -- > > Tatsuo Ishii > > SRA OSS, Inc. Japan > > > Index: > > src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c > > =================================================================== > > RCS file: > > /cvsroot/pgsql/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c,v > > retrieving revision 1.16 > > diff -u -r1.16 utf8_and_iso8859.c > > --- > > src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c > > 22 Nov 2005 18:17:26 -0000 1.16 > > +++ > > src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c > > 23 Dec 2005 01:43:38 -0000 > > @@ -68,15 +68,6 @@ > > } pg_conv_map; > > > > static pg_conv_map maps[] = { > > - {PG_SQL_ASCII}, /* SQL/ASCII */ > > - {PG_EUC_JP}, /* EUC for Japanese */ > > - {PG_EUC_CN}, /* EUC for Chinese */ > > - {PG_EUC_KR}, /* EUC for Korean */ > > - {PG_EUC_TW}, /* EUC for Taiwan */ > > - {PG_JOHAB}, /* EUC for Korean JOHAB > > */ > > - {PG_UTF8}, /* Unicode UTF8 */ > > - {PG_MULE_INTERNAL}, /* Mule internal code */ > > - {PG_LATIN1}, /* ISO-8859-1 Latin 1 */ > > {PG_LATIN2, LUmapISO8859_2, ULmapISO8859_2, > > sizeof(LUmapISO8859_2) / sizeof(pg_local_to_utf), > > sizeof(ULmapISO8859_2) / sizeof(pg_utf_to_local)}, /* ISO-8859-2 > > Latin 2 */ > > @@ -104,12 +95,6 @@ > > {PG_LATIN10, LUmapISO8859_16, ULmapISO8859_16, > > sizeof(LUmapISO8859_16) / sizeof(pg_local_to_utf), > > sizeof(ULmapISO8859_16) / sizeof(pg_utf_to_local)}, /* ISO-8859-16 > > Latin 10 */ > > - {PG_WIN1256}, /* windows-1256 */ > > - {PG_WIN1258}, /* Windows-1258 */ > > - {PG_WIN874}, /* windows-874 */ > > - {PG_KOI8R}, /* KOI8-R */ > > - {PG_WIN1251}, /* windows-1251 */ > > - {PG_WIN866}, /* (MS-DOS CP866) */ > > {PG_ISO_8859_5, LUmapISO8859_5, ULmapISO8859_5, > > sizeof(LUmapISO8859_5) / sizeof(pg_local_to_utf), > > sizeof(ULmapISO8859_5) / sizeof(pg_utf_to_local)}, /* ISO-8859-5 */ > > @@ -131,11 +116,23 @@ > > unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2); > > unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3); > > int len = PG_GETARG_INT32(4); > > + int i; > > > > Assert(PG_GETARG_INT32(1) == PG_UTF8); > > Assert(len >= 0); > > > > - LocalToUtf(src, dest, maps[encoding].map1, maps[encoding].size1, > > encoding, len); > > + for (i=0;i<sizeof(maps)/sizeof(pg_conv_map);i++) > > + { > > + if (encoding == maps[i].encoding) > > + { > > + LocalToUtf(src, dest, maps[i].map1, maps[i].size1, > > encoding, len); > > + PG_RETURN_VOID(); > > + } > > + } > > + > > + ereport(ERROR, > > + (errcode(ERRCODE_INTERNAL_ERROR), > > + errmsg("unexpected encoding id %d for ISO-8859 > > charsets", encoding))); > > > > PG_RETURN_VOID(); > > } > > @@ -147,11 +144,23 @@ > > unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2); > > unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3); > > int len = PG_GETARG_INT32(4); > > + int i; > > > > Assert(PG_GETARG_INT32(0) == PG_UTF8); > > Assert(len >= 0); > > > > - UtfToLocal(src, dest, maps[encoding].map2, maps[encoding].size2, len); > > + for (i=0;i<sizeof(maps)/sizeof(pg_conv_map);i++) > > + { > > + if (encoding == maps[i].encoding) > > + { > > + UtfToLocal(src, dest, maps[i].map2, maps[i].size2, len); > > + PG_RETURN_VOID(); > > + } > > + } > > + > > + ereport(ERROR, > > + (errcode(ERRCODE_INTERNAL_ERROR), > > + errmsg("unexpected encoding id %d for ISO-8859 > > charsets", encoding))); > > > > PG_RETURN_VOID(); > > } > > -- > Bruce Momjian | http://candle.pha.pa.us > pgman@candle.pha.pa.us | (610) 359-1001 > + If your life is a hard drive, | 13 Roberts Road > + Christ can be your backup. | Newtown Square, Pennsylvania 19073 > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match > ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match