Hello Wez,
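IMO the current behaviour of mb_convert_case() with MB_CASE_TITLE looks
a bit strange with respect to the Unicode specification: it maps every
character to its titlecase form, not only the first cased character of
each word.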
--snip-- (cited from http://www.unicode.org/unicode/reports/tr21/)
S3. toTitlecase(X)
    For each character C, find the preceding character B.
    Ignore any intervening case-ignorable characters when finding B.
    If B exists, and is cased
        map C to UCD_lower(C)
    Otherwise,
        map C to UCD_title(C)
--snip--
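To make the rule concrete, here is a minimal ASCII-only sketch of S3. It is
not the mbstring code. The helpers is_cased() and is_case_ignorable() are
hypothetical stand-ins: ASCII letters play the role of cased characters, the
apostrophe plays the role of a case-ignorable character, and toupper() /
tolower() stand in for UCD_title() / UCD_lower().

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Assumption: ASCII letters stand in for "cased" characters. */
static int is_cased(unsigned char c)
{
    return isalpha(c) != 0;
}

/* Assumption: the apostrophe stands in for "case-ignorable" characters. */
static int is_case_ignorable(unsigned char c)
{
    return c == '\'';
}

/* Title-case s in place, following rule S3 for this toy character set. */
static void to_titlecase(char *s)
{
    int prev_cased = 0; /* is B, the nearest preceding non-ignorable char, cased? */
    size_t i, n;

    assert(s != NULL);
    n = strlen(s);
    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)s[i];

        /* B exists and is cased: lower-case C; otherwise title-case C. */
        s[i] = (char)(prev_cased ? tolower(c) : toupper(c));

        /* Case-ignorable characters are skipped when looking for B. */
        if (!is_case_ignorable(c)) {
            prev_cased = is_cased(c);
        }
    }
}
```

With these stand-ins, calling to_titlecase() on "hELLO wORLD, don't 'quote"
gives "Hello World, Don't 'Quote": the apostrophe inside "don't" does not
start a new word, and the leading apostrophe before "quote" does not stop the
"q" from being title-cased.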
The attached patch modifies the conversion routine so that it conforms to
the document referred to above.
I'm not sure what result people expect here, so I'll refrain from
committing it immediately. Are there any problems with this?
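To make the question easier to check, the two-state loop used in the patch
can be tried outside PHP. The following is only a sketch under assumptions:
in_word() is a hypothetical ASCII stand-in for the php_unicode_is_prop() call
with the combined mask, where letters play the cased role and '^' and '`'
(both general category Sk) play the case-ignorable role.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for
   php_unicode_is_prop(c, UC_MN|UC_ME|UC_CF|UC_LM|UC_SK|UC_LU|UC_LL|UC_LT, 0). */
static int in_word(unsigned char c)
{
    return isalpha(c) || c == '^' || c == '`';
}

/* Same control flow as the patched PHP_UNICODE_CASE_TITLE branch. */
static void patch_titlecase(char *s)
{
    int mode = 0; /* 1 while inside a run of cased/case-ignorable chars */
    size_t i, n;

    assert(s != NULL);
    n = strlen(s);
    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)s[i];
        int res = in_word(c);

        if (mode) {
            if (res) {
                s[i] = (char)tolower(c); /* rest of the run: lower-case */
            } else {
                mode = 0;                /* run ended */
            }
        } else if (res) {
            mode = 1;                    /* run starts: title-case */
            s[i] = (char)toupper(c);
        }
    }
}
```

With this stand-in, "hELLO wORLD" becomes "Hello World". One case worth
checking is a word that begins with a case-ignorable character: "`abc" is
left as "`abc", because the backquote starts the run and the "a" is then
lower-cased, whereas rule S3 as I read it would give "`Abc".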
Moriyoshi
Index: php_unicode.c
===================================================================
RCS file: /repository/php4/ext/mbstring/php_unicode.c,v
retrieving revision 1.2
diff -u -r1.2 php_unicode.c
--- php_unicode.c 1 Oct 2002 10:16:40 -0000 1.2
+++ php_unicode.c 23 Oct 2002 17:59:21 -0000
@@ -257,11 +257,26 @@
             }
             break;
-        case PHP_UNICODE_CASE_TITLE:
+        case PHP_UNICODE_CASE_TITLE: {
+            int mode = 0;
+
             for (i = 0; i < unicode_len / sizeof(unsigned long); i++) {
-                unicode_ptr[i] = php_unicode_totitle(unicode_ptr[i]);
+                int res = php_unicode_is_prop(unicode_ptr[i],
+                    UC_MN|UC_ME|UC_CF|UC_LM|UC_SK|UC_LU|UC_LL|UC_LT, 0);
+                if (mode) {
+                    if (res) {
+                        unicode_ptr[i] = php_unicode_tolower(unicode_ptr[i]);
+                    } else {
+                        mode = 0;
+                    }
+                } else {
+                    if (res) {
+                        mode = 1;
+                        unicode_ptr[i] = php_unicode_totitle(unicode_ptr[i]);
+                    }
+                }
             }
-            break;
+        } break;
     }
--
PHP Development Mailing List <http://www.php.net/>