 ID:               48322
 Updated by:       ras...@php.net
 Reported By:      netspy at me dot com
 Status:           Wont fix
 Bug Type:         *Unicode Issues
 Operating System: Mac OS X
 PHP Version:      5.2.9
 New Comment:

The code for this function is just:

RETURN_LONG(strcoll((const char *) Z_STRVAL_PP(s1), (const char *) Z_STRVAL_PP(s2)));

We use the underlying system strcoll function. There is nothing for us
to fix here. If your system's strcoll function is broken, you are out of
luck. OSX has a long history of buggy C99 functions and it wouldn't
surprise me if the strcoll function doesn't handle UTF8 locales
correctly. But that still isn't something we can fix short of doing an
OS-specific hack here, which we try to avoid.


Previous Comments:
------------------------------------------------------------------------

[2009-05-19 14:03:13] netspy at me dot com

What is your result on Linux? Did you save the test file with UTF-8
encoding?

Because strcoll is basically a C function, I can't see why this is a PHP
Unicode issue and why you closed the bug as Wont fix.

------------------------------------------------------------------------

[2009-05-19 12:58:18] j...@php.net

I get the wrong order on Linux. Did you mix the results there?
Anyways, this really is a problem in unicode support. To get _really_
working stuff, use the intl extension or wait for PHP 6. Wont fix.

------------------------------------------------------------------------

[2009-05-19 12:35:29] netspy at me dot com

On Linux strcoll works fine; I get a wrong order only on Mac OS X (BSD).

I also tested it with an ISO 8859-1 string and the locale
de_DE.ISO8859-1. The result is the same: correct on Linux, wrong on
Mac OS X. So I think it's not a Unicode issue!

Here is another test script:

$string_utf = "abcdefghijklmnopqrstuvwxyzäöüß";
$string_iso = utf8_decode($string_utf);
$array_utf = array();
$array_iso = array();

for ($i=0; $i<mb_strlen($string_utf, 'UTF-8'); $i++)
{
  $array_utf[]=mb_substr($string_utf, $i, 1, 'UTF-8');
  $array_iso[]=substr($string_iso, $i, 1);
}

print("\nLocale: " . setlocale(LC_COLLATE, 'de_DE.UTF-8'));
usort($array_utf, 'strcoll');
print("\n" . implode('', $array_utf) . "\n");

print("\nLocale: " . setlocale(LC_COLLATE, 'de_DE.ISO8859-1'));
usort($array_iso, 'strcoll');
print("\n" . utf8_encode(implode('', $array_iso)) . "\n");

The result on Mac OS X:

Locale: de_DE.UTF-8
abcdefghijklmnopqrstuvwxyzßäöü

Locale: de_DE.ISO8859-1
abcdefghijklmnopqrstuvwxyzßäöü

And the Linux result:

Locale: de_DE.UTF-8
aäbcdefghijklmnoöpqrsßtuüvwxyz

Locale: de_DE.ISO8859-1
aäbcdefghijklmnoöpqrsßtuüvwxyz

------------------------------------------------------------------------

[2009-05-19 10:50:59] j...@php.net

It doesn't work on any system below PHP 6. You can always use the intl
extension from PECL while waiting for proper unicode support:
http://pecl.php.net/intl

Using the collator (http://php.net/collator) you can achieve sorting
with any locale.

------------------------------------------------------------------------

[2009-05-18 22:37:22] netspy at me dot com

Description:
------------
strcoll() does not sort UTF-8 strings correctly on Mac OS X.

Reproduce code:
---------------
$locale = 'de_DE.UTF-8';
$string = "abcdefghijklmnopqrstuvwxyzäöüß";
$array = array();

for ($i=0; $i<mb_strlen($string, 'UTF-8'); $i++)
{
  $array[]=mb_substr($string, $i, 1, 'UTF-8');
}

$oldLocale = setlocale(LC_COLLATE, "0");
print("\nOld: $oldLocale New: ");
print(setlocale(LC_COLLATE, $locale));
usort($array, 'strcoll');
setlocale(LC_COLLATE, $oldLocale);
print("\n" . implode('', $array) . "\n");

Expected result:
----------------
Old: C New: de_DE.UTF-8
aäbcdefghijklmnoöpqrsßtuüvwxyz

Actual result:
--------------
Old: C New: de_DE.UTF-8
abcdefghijklmnopqrstuvwxyzßäöü

------------------------------------------------------------------------
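
Since the New Comment attributes the ordering to the system's strcoll
rather than to PHP, one way to check that claim is to run the same sort
directly against the C library. The following is only a minimal sketch,
assuming a hosted C environment with the standard <locale.h> functions;
the helper names cmp_strcoll and sort_with_locale are made up for
illustration and are not part of PHP or of any libc.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator that defers to the C library's strcoll(), so the
 * resulting order depends on the current LC_COLLATE locale. */
static int cmp_strcoll(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcoll(*sa, *sb);
}

/* Hypothetical helper: sort an array of strings under the given
 * collation locale, then restore the previous LC_COLLATE setting.
 * Returns 0 on success, -1 if the locale could not be selected. */
static int sort_with_locale(const char **items, size_t n, const char *locale)
{
    assert(items != NULL || n == 0);

    /* setlocale() may reuse its return buffer, so copy the old name. */
    const char *current = setlocale(LC_COLLATE, NULL);
    char *saved = NULL;
    if (current != NULL) {
        saved = malloc(strlen(current) + 1);
        if (saved != NULL)
            strcpy(saved, current);
    }

    if (setlocale(LC_COLLATE, locale) == NULL) {
        free(saved);
        return -1;
    }

    qsort(items, n, sizeof *items, cmp_strcoll);

    if (saved != NULL) {
        setlocale(LC_COLLATE, saved);
        free(saved);
    }
    return 0;
}
```

Calling sort_with_locale(items, n, "de_DE.UTF-8") on the single-character
strings from the report and printing the array on both Linux and Mac OS X
should show whether the difference already exists at the libc level. A
return value of -1 means the C library rejected the locale name, which
usually indicates that the locale is not installed.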