Re: [HACKERS] Unicode upper() bug still present

Hannu Krosing Mon, 20 Oct 2003 00:27:09 -0700

Tom Lane wrote on Mon, 20.10.2003 at 03:35:
> Oliver Elphick <[EMAIL PROTECTED]> writes:
> > There is a bug in Unicode upper() which has been present since 7.2:
> 
> We don't support upper/lower in multibyte character sets, and can't as
> long as the functionality is dependent on <ctype.h>'s toupper()/tolower().
> It's been suggested that we could use <wctype.h> where available.
> However there are a bunch of issues that would have to be solved to make
> that happen.  (How do we convert between the database character encoding 
> and the wctype representation?


How do we do it for sorting?

> How do we even find out what
> representation the current locale setting expects to use?)

Why not use the same locale settings as for sorting (i.e. the database
encoding) until we have proper multi-locale support in the backend?

It seems inconsistent that we do use locale-aware sorts but not
upper/lower.

This is for a UNICODE database using locale et_EE.UTF-8:

ucdb=# select t, upper(t) from tt order by 1;
 t | upper
---+-------
 a | A
 s | S
 Š | Š
 š | š
 Õ | Õ
 õ | õ
 Ä | Ä
 ä | ä
(8 rows)

As you can see, the sort order is right, but some characters are
converted and some are not, so the result is a complete mess ;(
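
The pattern in the output above can be reproduced outside the database.
What follows is only a rough sketch in Python, not the backend's actual
code: the function names upper_bytewise and upper_decoded are made up for
illustration, bytes.upper() stands in for a per-byte toupper() call, and
Python's str.upper() applies the default Unicode case mapping rather than
anything specific to et_EE.

```python
# Simplified model of the behaviour discussed in this thread.
# Both function names are hypothetical; this is not PostgreSQL code.

def upper_bytewise(raw: bytes) -> bytes:
    """Map each byte on its own.

    bytes.upper() changes only ASCII a-z, so the bytes that make up a
    multibyte UTF-8 character pass through untouched.
    """
    return raw.upper()


def upper_decoded(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Decode to characters, map case there, then encode again."""
    chars = raw.decode(encoding)           # encoding -> characters
    return chars.upper().encode(encoding)  # characters -> encoding


if __name__ == "__main__":
    for ch in ["a", "s", "š", "õ", "ä"]:
        raw = ch.encode("utf-8")
        print(ch,
              upper_bytewise(raw).decode("utf-8"),
              upper_decoded(raw).decode("utf-8"))
```

In this model the byte-wise version converts a and s but leaves š, õ and ä
unchanged, which is the same mix seen in the query result, while the
version that decodes first converts all five.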

-------------------
Hannu

