Re: [HACKERS] LIKE optimization in UTF-8 and locale-C

2007-03-25 Thread Hannu Krosing
On a fine day, Fri, 2007-03-23 at 06:10, Andrew - Supernews wrote: On 2007-03-23, ITAGAKI Takahiro [EMAIL PROTECTED] wrote: Thanks, it all made sense to me. My proposal was completely wrong. Actually, I think your proposal is fundamentally correct, merely incomplete. Doing

Re: [HACKERS] LIKE optimization in UTF-8 and locale-C

2007-03-22 Thread Tom Lane
ITAGAKI Takahiro [EMAIL PROTECTED] writes: I found LIKE operators are slower on multi-byte encoding databases than single-byte encoding ones. It comes from the difference between MatchText() and MBMatchText(). We've had an optimization for single-byte encodings using

Re: [HACKERS] LIKE optimization in UTF-8 and locale-C

2007-03-22 Thread Hannu Krosing
On a fine day, Thu, 2007-03-22 at 11:08, Tom Lane wrote: ITAGAKI Takahiro [EMAIL PROTECTED] writes: I found LIKE operators are slower on multi-byte encoding databases than single-byte encoding ones. It comes from the difference between MatchText() and MBMatchText(). We've had an

Re: [HACKERS] LIKE optimization in UTF-8 and locale-C

2007-03-22 Thread ITAGAKI Takahiro
Hannu Krosing [EMAIL PROTECTED] wrote: We've had an optimization for single-byte encodings using the pg_database_encoding_max_length() == 1 test. I'll propose to extend it to the UTF-8 with locale-C case. If this works for UTF8, won't it work for all the backend-legal encodings? I

Re: [HACKERS] LIKE optimization in UTF-8 and locale-C

2007-03-22 Thread ITAGAKI Takahiro
Dennis Bjorklund [EMAIL PROTECTED] wrote: The problem with the LIKE pattern _ is that it has to know how long the single character is that it should pass over. Say you have a UTF-8 string with 2 characters encoded in 3 bytes ('ÖA'), where the first character is 2 bytes: 0xC3 0x96 'A'
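To illustrate the point about _ in the message above, here is a minimal sketch, assuming well-formed UTF-8 input, of how a byte-oriented matcher could decide how many bytes to pass over for one character. The helper name utf8_char_len is hypothetical and is not a PostgreSQL function; it only reads the lead byte, as the UTF-8 encoding allows.

```python
def utf8_char_len(lead: int) -> int:
    """Return the byte length of a UTF-8 character from its lead byte.

    Hypothetical helper for illustration only; assumes valid UTF-8.
    """
    if lead < 0x80:            # 0xxxxxxx: single-byte (ASCII)
        return 1
    if lead >> 5 == 0b110:     # 110xxxxx: two-byte sequence
        return 2
    if lead >> 4 == 0b1110:    # 1110xxxx: three-byte sequence
        return 3
    if lead >> 3 == 0b11110:   # 11110xxx: four-byte sequence
        return 4
    raise ValueError("not a UTF-8 lead byte")


# The example string from the message: 'ÖA' is 2 characters in 3 bytes.
data = "\u00d6A".encode("utf-8")   # bytes 0xC3 0x96 'A'

# A LIKE '_' must pass over the whole first character, not one byte.
skip = utf8_char_len(data[0])
rest = data[skip:]

# Skipping a single byte instead would land on 0x96, a continuation byte.
wrong_rest = data[1:]
```

Passing over one byte would leave the matcher in the middle of 'Ö', which is why _ needs the character length while % can be handled byte by byte.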

Re: [HACKERS] LIKE optimization in UTF-8 and locale-C

2007-03-22 Thread Dennis Bjorklund
ITAGAKI Takahiro wrote: I guess it works well for % but not for _; the latter has to know how many bytes the current (multibyte) character covers. Yes, % is not used in trailing bytes for all encodings, but _ is used in some of them. I think we can use the optimization for all of the server

Re: [HACKERS] LIKE optimization in UTF-8 and locale-C

2007-03-22 Thread Andrew - Supernews
On 2007-03-22, Tom Lane [EMAIL PROTECTED] wrote: ITAGAKI Takahiro [EMAIL PROTECTED] writes: I found LIKE operators are slower on multi-byte encoding databases than single-byte encoding ones. It comes from the difference between MatchText() and MBMatchText(). We've had an optimization for

Re: [HACKERS] LIKE optimization in UTF-8 and locale-C

2007-03-22 Thread Andrew - Supernews
On 2007-03-23, ITAGAKI Takahiro [EMAIL PROTECTED] wrote: Thanks, it all made sense to me. My proposal was completely wrong. Actually, I think your proposal is fundamentally correct, merely incomplete. Doing octet-based rather than character-based matching of strings is a _design goal_ of UTF8.

[HACKERS] LIKE optimization in UTF-8 and locale-C

2007-03-21 Thread ITAGAKI Takahiro
Hello, I found LIKE operators are slower on multi-byte encoding databases than single-byte encoding ones. It comes from the difference between MatchText() and MBMatchText(). We've had an optimization for single-byte encodings using the pg_database_encoding_max_length() == 1 test. I'll propose to extend