[PHP-DEV] [PATCH] Make mb_convert_case() conform to Unicode Spec

Moriyoshi Koizumi Wed, 23 Oct 2002 11:21:23 -0700

Hello Wez,

IMO the current behaviour of mb_convert_case() with MB_CASE_TITLE, which 
applies the title-case mapping to every character, looks a bit strange 
with respect to the Unicode specification.


--snip-- (cited from http://www.unicode.org/unicode/reports/tr21/)

S3. toTitlecase(X)
For each character C, find the preceding character B.
    Ignore any intervening case-ignorable characters when finding B.
If B exists, and is cased
    map C to UCD_lower(C)
Otherwise,
    map C to UCD_title(C)

--snip--
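
As a rough illustration of rule S3 (this is not the patch itself), here is 
a minimal sketch in plain C. It assumes ASCII input only: is_cased() and 
is_case_ignorable() are hypothetical stand-ins for the real UCD property 
lookups, toupper()/tolower() stand in for UCD_title()/UCD_lower(), and 
treating the apostrophe as the only case-ignorable character is an 
assumption made for the example.

```c
#include <ctype.h>

/* Hypothetical stand-ins for the UCD properties (ASCII only). */
static int is_cased(int c)          { return isalpha(c) != 0; }
static int is_case_ignorable(int c) { return c == '\''; }

/* Apply rule S3 in place to a NUL-terminated ASCII string. */
static void to_titlecase_ascii(char *s)
{
    /* Non-zero when B, the nearest preceding character that is not
     * case-ignorable, exists and is cased. */
    int prev_cased = 0;

    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;

        /* B cased: lower-case C. Otherwise: title-case C. */
        *s = (char)(prev_cased ? tolower(c) : toupper(c));

        /* Case-ignorable characters are skipped when looking for B,
         * so they leave the state untouched. */
        if (!is_case_ignorable(c)) {
            prev_cased = is_cased(c);
        }
    }
}
```

With these assumptions, calling to_titlecase_ascii() on a buffer holding 
"hELLO wORLD, don't PANIC" should leave "Hello World, Don't Panic" in it: 
the apostrophe is skipped when looking for B, so the "t" stays lower-case.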

The attached patch modifies the conversion routine so that it conforms to 
the document referred to above.

I am not sure what the intended behaviour is, so I will refrain from 
committing it right away. Are there any problems with this?

Moriyoshi

Index: php_unicode.c
===================================================================
RCS file: /repository/php4/ext/mbstring/php_unicode.c,v
retrieving revision 1.2
diff -u -r1.2 php_unicode.c
--- php_unicode.c       1 Oct 2002 10:16:40 -0000       1.2
+++ php_unicode.c       23 Oct 2002 17:59:21 -0000
@@ -257,11 +257,26 @@
                        }
                        break;
 
-               case PHP_UNICODE_CASE_TITLE:
+               case PHP_UNICODE_CASE_TITLE: {
+                       int mode = 0; 
+
                        for (i = 0; i < unicode_len / sizeof(unsigned long); i++) {
-                               unicode_ptr[i] = php_unicode_totitle(unicode_ptr[i]);
+                               int res = php_unicode_is_prop(unicode_ptr[i],
+                                       UC_MN|UC_ME|UC_CF|UC_LM|UC_SK|UC_LU|UC_LL|UC_LT, 0);
+                               if (mode) {
+                                       if (res) {
+                                               unicode_ptr[i] = php_unicode_tolower(unicode_ptr[i]);
+                                       } else {
+                                               mode = 0;
+                                       }       
+                               } else {
+                                       if (res) {
+                                               mode = 1;
+                                               unicode_ptr[i] = php_unicode_totitle(unicode_ptr[i]);
+                                       }
+                               }
                        }
-                       break;
+               } break;
 
        }

-- 
PHP Development Mailing List <http://www.php.net/>
To unsubscribe, visit: http://www.php.net/unsub.php
