Re: [HACKERS] Latest on CITEXT 2.0

Marko Kreen Tue, 01 Jul 2008 08:16:21 -0700

On 7/1/08, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Marko Kreen" <[EMAIL PROTECTED]> writes:
>  > On 6/26/08, Tom Lane <[EMAIL PROTECTED]> wrote:
>
>  >> BTW, I don't think you can use that same-length optimization for
>  >> citext.  There's no reason to think that upper/lowercase pairs will
>  >> have the same length all the time in multibyte encodings.
>
>  > What about this code in current str_tolower():
>
>  >         /* Output workspace cannot have more codes than input bytes */
>  >         workspace = (wchar_t *) palloc((nbytes + 1) * sizeof(wchar_t));
>
>
> That's working with wchars, not bytes.


Ah, I missed the point of the char2wchar() line.

I'm rather unfamiliar with the various multibyte APIs, sorry.
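
To make the bytes-versus-wchars distinction concrete, here is a minimal
sketch. It is not PostgreSQL code: utf8_len() and the two constants are
hypothetical names made up for illustration, and UTF-8 is assumed as the
encoding. U+023A (capital A with stroke) lowercases to U+2C65, and the
two differ in encoded byte length even though each is one code point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes UTF-8 needs to encode one code point. */
size_t
utf8_len(uint32_t cp)
{
    assert(cp <= 0x10FFFF);     /* must be inside the Unicode range */
    if (cp < 0x80)
        return 1;
    if (cp < 0x800)
        return 2;
    if (cp < 0x10000)
        return 3;
    return 4;
}

/* An upper/lowercase pair whose UTF-8 lengths differ. */
enum
{
    UPPER_A_STROKE = 0x023A,    /* 2 bytes in UTF-8 */
    LOWER_A_STROKE = 0x2C65     /* 3 bytes in UTF-8 */
};
```

So a same-length shortcut on byte counts can fail for such pairs. The
workspace sizing is a separate matter: every encoded character takes at
least one byte, so an input of nbytes can never decode to more than
nbytes wide characters.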

There's another thing I'm probably missing: does the current code handle
multi-wchar codepoints?  Or is it guaranteed they don't happen?
(Wasn't wchar_t usually a 16-bit value?)
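
For reference, a small sketch of what a "multi-wchar codepoint" looks
like when wide characters are 16-bit UTF-16 units. The width of wchar_t
is platform-dependent (16 bits on Windows, 32 bits with glibc on Linux),
so this only matters on some platforms. utf16_encode() is a hypothetical
helper written for illustration; it says nothing about what
str_tolower() actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode one code point as UTF-16.
 * Returns the number of 16-bit units written to out (1 or 2),
 * or 0 if cp is not a valid Unicode scalar value.
 */
size_t
utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;               /* out of range or a lone surrogate */
    if (cp < 0x10000)
    {
        out[0] = (uint16_t) cp; /* BMP: a single unit */
        return 1;
    }
    cp -= 0x10000;              /* 20 bits remain, split 10/10 */
    out[0] = (uint16_t) (0xD800 + (cp >> 10));      /* high surrogate */
    out[1] = (uint16_t) (0xDC00 + (cp & 0x3FF));    /* low surrogate */
    return 2;
}
```

Anything above U+FFFF, for example U+10400, needs two such units, so a
per-unit case conversion would see two surrogates rather than one
letter.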

-- 
marko

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
