[PHP-DEV] Re: #19257 [Bgs]: strtolower & strtoupper does not work for UTF-8 strings

Wez Furlong Fri, 13 Sep 2002 03:46:43 -0700

Hey Stig,

The person behind that report suggested that towupper and towlower
could be used, although we would need to convert our PHP string to
wide chars first.  This is problematic since we don't know (without
being told) which encoding the string uses in the first place.
And even if we do know it, there is no guarantee that a library is
available to make the conversion, or even that the system has those
wide char functions.
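
To make the encoding problem concrete, here is a minimal C sketch (not
PHP source; both helper names are made up for this illustration). It
decodes the same two bytes under two different assumptions: read as
ISO-8859-1 they are two characters, read as UTF-8 they are one. A
conversion to wide chars can't be right unless it is told which applies.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: treat every byte as one ISO-8859-1 character.
 * Returns the number of code points written to out. */
size_t decode_latin1(const unsigned char *s, size_t len, uint32_t *out)
{
    for (size_t i = 0; i < len; i++)
        out[i] = s[i];   /* Latin-1 bytes map 1:1 onto U+0000..U+00FF */
    return len;
}

/* Hypothetical helper: decode 1- and 2-byte UTF-8 sequences only.
 * That is enough for this illustration; longer or malformed sequences
 * are simply copied through byte by byte.
 * Returns the number of code points written to out. */
size_t decode_utf8_short(const unsigned char *s, size_t len, uint32_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if ((s[i] & 0xE0) == 0xC0 && i + 1 < len &&
            (s[i + 1] & 0xC0) == 0x80) {
            /* 110xxxxx 10yyyyyy -> xxxxxyyyyyy */
            out[n++] = ((uint32_t)(s[i] & 0x1F) << 6) | (s[i + 1] & 0x3F);
            i++;   /* the continuation byte has been consumed */
        } else {
            out[n++] = s[i];
        }
    }
    return n;
}
```

Given the bytes 0xC3 0xA9, the first helper yields U+00C3 and U+00A9,
while the second yields the single code point U+00E9.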

I think it makes sense to add a new function to the mbstring extension:

proto mb_change_case(string str, string mapping [,string encoding])

Where mapping is one of "upper", "lower" or "title" (since Unicode
knows about title case).  This function would then be able to
internally convert to Unicode, apply the appropriate transformation
and then convert back to the original encoding.
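
The decode / transform / encode pipeline could look roughly like the C
sketch below. This is only an illustration under loud assumptions, not
mbstring code: the input is taken to be UTF-8, the name
change_case_utf8 is invented here, the case table is a toy covering
ASCII and part of Latin-1 rather than the real Unicode data, and
"title" is approximated by upper-casing the first letter after a space.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum case_mode { CASE_UPPER, CASE_LOWER, CASE_TITLE };

/* Toy case tables: ASCII plus U+00C0..U+00DE / U+00E0..U+00FE.
 * Real code would consult the full Unicode case mapping data. */
static uint32_t map_upper(uint32_t cp)
{
    if (cp >= 'a' && cp <= 'z') return cp - 32;
    if (cp >= 0xE0 && cp <= 0xFE && cp != 0xF7) return cp - 32;
    return cp;
}

static uint32_t map_lower(uint32_t cp)
{
    if (cp >= 'A' && cp <= 'Z') return cp + 32;
    if (cp >= 0xC0 && cp <= 0xDE && cp != 0xD7) return cp + 32;
    return cp;
}

/* Change the case of a NUL-terminated UTF-8 buffer in place.
 * In-place editing is only safe because the toy tables never change
 * the encoded length of a character.
 * Returns 0 on success, -1 if mapping is not upper/lower/title. */
int change_case_utf8(char *str, const char *mapping)
{
    enum case_mode mode;
    unsigned char *p = (unsigned char *)str;
    int word_start = 1;

    if (strcmp(mapping, "upper") == 0)      mode = CASE_UPPER;
    else if (strcmp(mapping, "lower") == 0) mode = CASE_LOWER;
    else if (strcmp(mapping, "title") == 0) mode = CASE_TITLE;
    else return -1;

    while (*p) {
        uint32_t cp;
        size_t len;

        /* Step 1: decode one code point (1- or 2-byte forms only). */
        if (p[0] < 0x80) {
            cp = p[0];
            len = 1;
        } else if (p[0] >= 0xC2 && p[0] <= 0xDF &&
                   (p[1] & 0xC0) == 0x80) {
            cp = ((uint32_t)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
            len = 2;
        } else {
            /* Longer or malformed sequence: leave the byte untouched. */
            word_start = 0;
            p++;
            continue;
        }

        /* Step 2: apply the requested mapping to the code point. */
        if (mode == CASE_UPPER || (mode == CASE_TITLE && word_start))
            cp = map_upper(cp);
        else
            cp = map_lower(cp);
        word_start = (cp == ' ');

        /* Step 3: encode back into the original encoding. */
        if (len == 1) {
            p[0] = (unsigned char)cp;
        } else {
            p[0] = (unsigned char)(0xC0 | (cp >> 6));
            p[1] = (unsigned char)(0x80 | (cp & 0x3F));
        }
        p += len;
    }
    return 0;
}
```

Calling it with "upper" on the UTF-8 bytes for "héllo" turns the é
(0xC3 0xA9) into É (0xC3 0x89), which a byte-wise toupper cannot do.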

We could then add mb_toupper and mb_tolower that internally call
mb_change_case with the appropriate arguments, and these
functions would be good candidates for the function overloading
feature of mbstring (if you need it; I still feel happier with
explicit wide char calls).

Until we make the whole of PHP multi-byte aware, I think mbstring is
the best place for this functionality.

I'm tempted to volunteer for this, if you don't mind supplying that
Unicode manipulation code (I'm fairly familiar with the mbstring
internals).

--Wez.

On 09/13/02, "Stig Venaas" <[EMAIL PROTECTED]> wrote:
> I really don't think toupper() would work well with UTF-8. I can't
> imagine how it can be done correctly when only passing 6 bits of
> the character at a time. You would need the entire Unicode code
> point in one call. If there is interest, I have code that can do
> Unicode normalization, change cases etc, that could be included in
> PHP. Only problem is that there would be yet another library to
> link with.
> 
> Stig
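
A small C sketch of the point Stig makes (the function name is made up
for this example, and it assumes the default "C" locale, since no
setlocale() call is made): a toupper()-based loop only ever sees one
byte, so the two bytes that encode a non-ASCII character in UTF-8 pass
through unchanged while the ASCII letters around them are upper-cased.

```c
#include <ctype.h>

/* Upper-case a NUL-terminated buffer one byte at a time, the way a
 * toupper()-based strtoupper() would. In the "C" locale only the
 * bytes 'a'..'z' are mapped; UTF-8 lead and continuation bytes
 * (all >= 0x80) are left as they are. */
void bytewise_upper(char *s)
{
    for (; *s; s++)
        *s = (char)toupper((unsigned char)*s);
}
```

Applied to the UTF-8 bytes for "héllo", this gives "HéLLO": the é
(0xC3 0xA9) is never seen as a whole code point, so it is not changed.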

-- 
PHP Development Mailing List <http://www.php.net/>
To unsubscribe, visit: http://www.php.net/unsub.php
