Hannu Krosing <[EMAIL PROTECTED]> wrote:

> > > We've had an optimization for single-byte encodings using
> > > pg_database_encoding_max_length() == 1 test. I'll propose to extend it
> > > in UTF-8 with locale-C case.
> >
> > If this works for UTF8, won't it work for all the backend-legal
> > encodings?
>
> I guess it works well for % but not for _ , the latter has to know, how
> many bytes the current (multibyte) character covers.

Yes, % is not used as a trailing byte in any of the encodings, but _ is used
in some of them. I think we can use the optimization for all of the server
encodings except JOHAB.

Also, I noticed that locale settings are not used in LIKE matching, so the
following is enough to check whether the byte-wise matching functions can be
used. Or am I missing something?

#define sb_match_available()    (GetDatabaseEncoding() != PG_JOHAB)

Multi-byte encodings supported as a server encoding.

              | % 0x25 | _ 0x5f | \ 0x5c |
--------------+--------+--------+--------+-
EUC_JP        | unused | unused | unused |
EUC_CN        | unused | unused | unused |
EUC_KR        | unused | unused | unused |
EUC_TW        | unused | unused | unused |
JOHAB         | unused | *used* | *used* |
UTF8          | unused | unused | unused |
MULE_INTERNAL | unused | unused | unused |

Just for reference, encodings only supported as a client encoding.

              | % 0x25 | _ 0x5f | \ 0x5c |
--------------+--------+--------+--------+-
SJIS          | unused | *used* | *used* |
BIG5          | unused | *used* | *used* |
GBK           | unused | *used* | *used* |
UHC           | unused | unused | unused |
GB18030       | unused | *used* | *used* |

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
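
For reference, a minimal sketch of how the "used / unused" columns above could be checked mechanically. It is not backend code: the sk_encoding enum and both functions are hypothetical names made up for this example, and the trailing-byte ranges are assumptions taken from the usual descriptions of each encoding layout, so they should be verified against the conversion tables before anyone relies on them.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for illustration only; these are NOT the
 * backend's encoding identifiers.
 */
typedef enum
{
    SK_EUC,             /* EUC_JP, EUC_CN, EUC_KR, EUC_TW */
    SK_UTF8,
    SK_MULE_INTERNAL,
    SK_JOHAB,
    SK_SJIS
} sk_encoding;

/*
 * Can byte b appear as a non-leading byte of a multi-byte character?
 * The ranges below are assumptions, not values read from the backend.
 */
static bool
sk_is_trail_byte(sk_encoding enc, unsigned char b)
{
    switch (enc)
    {
        case SK_EUC:
        case SK_MULE_INTERNAL:
            /* assumed: every byte of a multi-byte character has the high bit set */
            return b >= 0x80;
        case SK_UTF8:
            /* continuation bytes are 10xxxxxx */
            return b >= 0x80 && b <= 0xBF;
        case SK_JOHAB:
            /* assumed trailing ranges: 0x31-0x7E and 0x81-0xFE */
            return (b >= 0x31 && b <= 0x7E) || (b >= 0x81 && b <= 0xFE);
        case SK_SJIS:
            /* assumed trailing ranges: 0x40-0x7E and 0x80-0xFC */
            return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
    }
    return true;                /* unknown encoding: be conservative */
}

/*
 * True if none of the LIKE pattern bytes %, _ and \ can show up inside
 * a multi-byte character, i.e. the whole table row reads "unused".
 */
static bool
sk_pattern_bytes_safe(sk_encoding enc)
{
    return !sk_is_trail_byte(enc, 0x25) &&  /* % */
           !sk_is_trail_byte(enc, 0x5f) &&  /* _ */
           !sk_is_trail_byte(enc, 0x5c);    /* \ */
}
```

Under these assumed ranges, sk_pattern_bytes_safe() is false for SK_JOHAB and SK_SJIS and true for the others, which matches the rows above. Note that it only answers whether a pattern byte can be mistaken for part of a character; it does not address Hannu's point that _ still has to consume a whole multi-byte character.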