Re: [HACKERS] LIKE optimization in UTF-8 and locale-C

ITAGAKI Takahiro Thu, 22 Mar 2007 21:49:45 -0800

Dennis Bjorklund <[EMAIL PROTECTED]> wrote:

> The problem with the LIKE pattern _ is that it has to know how long the
> single character is that it should pass over. Say you have a UTF-8 string
> with 2 characters encoded in 3 bytes ('ÖA'), where the first character
> is 2 bytes:
> 
> 0xC3 0x96 'A'
> 
> and now you want to match that with the LIKE pattern:
> 
> '_A'


Thanks, it all made sense to me. My proposal was completely wrong.
The optimization of MBMatchText() seems to be the right way...
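
To make the quoted '_A' example concrete, here is a minimal sketch in Python.
It is not the MBMatchText() code; utf8_char_len() and like_match() are
hypothetical names used only for illustration. The sketch handles only '_'
and literal characters (no '%', no escapes) and assumes valid UTF-8 input.

```python
def utf8_char_len(lead: int) -> int:
    """Byte length of the UTF-8 sequence that starts with this lead byte."""
    if lead < 0x80:
        return 1          # 0xxxxxxx: ASCII
    if lead >> 5 == 0b110:
        return 2          # 110xxxxx
    if lead >> 4 == 0b1110:
        return 3          # 1110xxxx
    if lead >> 3 == 0b11110:
        return 4          # 11110xxx
    raise ValueError("not a UTF-8 lead byte")


def like_match(text: bytes, pattern: bytes, char_len=utf8_char_len) -> bool:
    """Match text against a pattern made of '_' and literal characters.

    char_len decides how many bytes one character occupies; passing
    lambda b: 1 gives a naive byte-wise matcher for comparison.
    """
    t = p = 0
    while p < len(pattern):
        if t >= len(text):
            return False
        n = char_len(text[t])
        if pattern[p] == ord("_"):
            # '_' must pass over one whole character, not one byte.
            t += n
            p += 1
        else:
            m = char_len(pattern[p])
            if text[t:t + n] != pattern[p:p + m]:
                return False
            t += n
            p += m
    return t == len(text)


text = "ÖA".encode("utf-8")  # b'\xc3\x96A': 2 characters, 3 bytes

utf8_result = like_match(text, b"_A")
bytewise_result = like_match(text, b"_A", char_len=lambda b: 1)
bytewise_wrong_hit = like_match(text, b"__A", char_len=lambda b: 1)
```

With the UTF-8 aware length function, '_A' matches 'ÖA'. With the byte-wise
one it does not, because '_' skips only 0xC3 and 'A' is then compared against
0x96; worse, '__A' matches even though the string has only two characters.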

> Maybe one should simply write a special version of LIKE for the UTF-8
> encoding since it's probably the most used encoding today. But I don't
> think you can use the C locale and have it work for UTF-8.

But then, the present LIKE matching is not locale-aware. We treat multi-byte
characters properly, but always perform a char-by-char comparison.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



