Andrew - Supernews <[EMAIL PROTECTED]> wrote:
> Actually, I think your proposal is fundamentally correct, merely incomplete.
Yeah, I fixed the patch to handle '_' correctly.

> Doing octet-based rather than character-based matching of strings is a
> _design goal_ of UTF8.

I think all "safe ASCII-superset" encodings can be compared byte by byte,
not only UTF-8. All of their multibyte characters consist only of bytes
greater than 127.

I updated the patch on this assumption. It normally uses octet-based
matching and switches to character-based matching only at '_'. There was a
performance win of roughly 30% or more for selects using LIKE '%foo%' under
multibyte encodings.

 encoding  |  HEAD   | patched (% of HEAD)
-----------+---------+---------------------
 SQL_ASCII |  7094ms |  7062ms
 LATIN1    |  7083ms |  7078ms
 UTF8      | 17974ms | 11635ms (64.7%)
 EUC_JP    | 17032ms | 12109ms (71.1%)

If this patch is acceptable, please drop the JOHAB encoding from the server
encodings before it is applied. Trailing bytes of JOHAB can be less than 128.
http://archives.postgresql.org/pgsql-hackers/2007-03/msg01475.php

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
mbtextmatch.patch
Description: Binary data
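
The attachment above is binary, so for illustration only (this is not the
contents of mbtextmatch.patch), here is a minimal sketch in C of the matching
strategy described in the message: literal pattern bytes are compared octet by
octet, and '_' skips one whole character. The names utf8_char_len, like_match
and like_utf8 are hypothetical. The sketch assumes valid UTF-8 input, ignores
escape characters and collation, and advances the text one character at a time
after '%' so that a following '_' always starts on a character boundary; that
last choice is an assumption of this sketch, not something stated in the post.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte length of the UTF-8 character starting at s.
 * Assumes valid UTF-8; an unexpected lead byte is treated as one byte. */
static size_t utf8_char_len(const unsigned char *s)
{
    if (*s < 0x80) return 1;
    if ((*s & 0xE0) == 0xC0) return 2;
    if ((*s & 0xF0) == 0xE0) return 3;
    if ((*s & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Hypothetical matcher: returns 1 if text t matches LIKE pattern p, else 0.
 * No escape-character support. */
static int like_match(const unsigned char *t, size_t tlen,
                      const unsigned char *p, size_t plen)
{
    size_t n;

    while (plen > 0) {
        if (*p == '%') {
            /* Collapse a run of '%' into one. */
            while (plen > 0 && *p == '%') { p++; plen--; }
            if (plen == 0)
                return 1;           /* trailing '%' matches the rest */
            /* Try the rest of the pattern at every character boundary. */
            for (;;) {
                if (like_match(t, tlen, p, plen))
                    return 1;
                if (tlen == 0)
                    return 0;
                n = utf8_char_len(t);
                if (n > tlen) n = tlen;
                t += n; tlen -= n;
            }
        } else if (*p == '_') {
            /* Character-based step: skip one whole character of text. */
            if (tlen == 0)
                return 0;
            n = utf8_char_len(t);
            if (n > tlen) n = tlen;
            t += n; tlen -= n;
            p++; plen--;
        } else {
            /* Octet-based step: compare a single byte. */
            if (tlen == 0 || *t != *p)
                return 0;
            t++; tlen--;
            p++; plen--;
        }
    }
    return tlen == 0;
}

/* Convenience wrapper for NUL-terminated strings. */
static int like_utf8(const char *text, const char *pattern)
{
    return like_match((const unsigned char *) text, strlen(text),
                      (const unsigned char *) pattern, strlen(pattern));
}
```

With this sketch, like_utf8("日本語", "日_語") matches because '_' consumes
all three bytes of "本", while like_utf8("日本語", "_") does not, since one
character cannot cover the whole string.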