Hello Wez,

IMO the current behaviour of mb_convert_case() with MB_CASE_TITLE looks a bit strange in light of the Unicode specification.
--snip-- (cited from http://www.unicode.org/unicode/reports/tr21/)

S3. toTitlecase(X)
    For each character C, find the preceding character B.
    ignore any intervening case-ignorable characters when finding B.
      If B exists, and is cased
        map C to UCD_lower(C)
      Otherwise,
        map C to UCD_title(C)
--snip--

The attached patch modifies the conversion routine so that it conforms to the document referred to above. I don't know what the expected result is, so I am refraining from committing it immediately.

Are there any problems with this?

Moriyoshi

Index: php_unicode.c
===================================================================
RCS file: /repository/php4/ext/mbstring/php_unicode.c,v
retrieving revision 1.2
diff -u -r1.2 php_unicode.c
--- php_unicode.c	1 Oct 2002 10:16:40 -0000	1.2
+++ php_unicode.c	23 Oct 2002 17:59:21 -0000
@@ -257,11 +257,26 @@
 			}
 			break;

-		case PHP_UNICODE_CASE_TITLE:
+		case PHP_UNICODE_CASE_TITLE: {
+			int mode = 0;
+
 			for (i = 0; i < unicode_len / sizeof(unsigned long); i++) {
-				unicode_ptr[i] = php_unicode_totitle(unicode_ptr[i]);
+				int res = php_unicode_is_prop(unicode_ptr[i],
+					UC_MN|UC_ME|UC_CF|UC_LM|UC_SK|UC_LU|UC_LL|UC_LT, 0);
+				if (mode) {
+					if (res) {
+						unicode_ptr[i] = php_unicode_tolower(unicode_ptr[i]);
+					} else {
+						mode = 0;
+					}
+				} else {
+					if (res) {
+						mode = 1;
+						unicode_ptr[i] = php_unicode_totitle(unicode_ptr[i]);
+					}
+				}
 			}
-			break;
+		} break;

 	}

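To make the expected result easier to discuss, here is a minimal standalone sketch of rule S3 restricted to ASCII. It is not the mbstring code: ascii_totitle(), is_cased() and is_case_ignorable() are hypothetical names made up for this illustration, and treating only the apostrophe as case-ignorable is a simplifying assumption (the real rule works on Unicode character properties).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: ASCII stand-in for the "cased" property. */
static int is_cased(unsigned char c)
{
	return isalpha(c) != 0;
}

/* Hypothetical helper: ASCII stand-in for "case-ignorable".
 * Assumption: only the apostrophe is treated as case-ignorable here. */
static int is_case_ignorable(unsigned char c)
{
	return c == '\'';
}

/* Applies rule S3 in place to a NUL-terminated ASCII string. */
static void ascii_totitle(char *s)
{
	int prev_cased = 0; /* nonzero when B exists and is cased */

	assert(s != NULL);

	for (; *s != '\0'; s++) {
		unsigned char c = (unsigned char)*s;

		if (is_case_ignorable(c)) {
			/* Skipped when looking for B; the character itself is unchanged. */
			continue;
		}
		if (is_cased(c)) {
			/* For ASCII, the titlecase form equals the uppercase form. */
			*s = (char)(prev_cased ? tolower(c) : toupper(c));
			prev_cased = 1;
		} else {
			/* An uncased character becomes B, so the next letter is titlecased. */
			prev_cased = 0;
		}
	}
}
```

Under these assumptions, calling ascii_totitle() on "hELLO wORLD, it's PHP" gives "Hello World, It's Php", whereas applying the titlecase mapping to every character on its own, as the loop removed by the patch does, would leave ASCII input entirely in upper case.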