Re: [HACKERS] LIKE optimization in UTF-8 and locale-C

Dennis Bjorklund Thu, 22 Mar 2007 21:56:40 -0800

ITAGAKI Takahiro wrote:

I guess it works well for % but not for _ , the latter has to know, how
many bytes the current (multibyte) character covers.


Yes, % is not used in trailing bytes for all encodings, but _ is
used in some of them. I think we can use the optimization for all
of the server encodings except JOHAB.

The problem with the LIKE pattern _ is that it has to know how long the single character is that it should pass over. Say you have a UTF-8 string with 2 characters encoded in 3 bytes ('ÖA'), where the first character is 2 bytes:


0xC3 0x96 'A'

and now you want to match that with the LIKE pattern:

'_A'

How would that work in the C locale?
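
To make the question concrete, here is a minimal sketch in Python (not PostgreSQL's actual matching code). The names utf8_char_len and like_underscore are hypothetical, and the matcher handles only _ and literal bytes. It compares a byte-wise _, which is what I assume a single-byte C-locale matcher would do, with a _ that skips a whole UTF-8 character:

```python
def utf8_char_len(lead: int) -> int:
    """Byte length of a UTF-8 character, judged from its lead byte."""
    if lead < 0x80:
        return 1            # 0xxxxxxx: ASCII
    if lead >> 5 == 0b110:
        return 2            # 110xxxxx
    if lead >> 4 == 0b1110:
        return 3            # 1110xxxx
    if lead >> 3 == 0b11110:
        return 4            # 11110xxx
    return 1                # invalid lead byte: step over one byte


def like_underscore(text: bytes, pattern: bytes, utf8_aware: bool) -> bool:
    """Hypothetical matcher: supports only '_' and literal bytes (no '%')."""
    i = 0
    for p in pattern:
        if i >= len(text):
            return False
        if p == ord("_"):
            # Assumption: a byte-wise matcher steps over exactly one byte,
            # while a UTF-8-aware one steps over one whole character.
            i += utf8_char_len(text[i]) if utf8_aware else 1
        elif text[i] == p:
            i += 1
        else:
            return False
    return i == len(text)


text = "ÖA".encode("utf-8")          # the bytes 0xC3 0x96 'A'
print(like_underscore(text, b"_A", utf8_aware=False))  # False
print(like_underscore(text, b"_A", utf8_aware=True))   # True
```

In the byte-wise case _ consumes only 0xC3, so the literal 'A' gets compared with 0x96 and the match fails.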

Maybe one should simply write a special version of LIKE for the UTF-8 encoding, since it's probably the most used encoding today. But I don't think you can use the C locale and expect it to work for UTF-8.


/Dennis

