ID:               48322
 User updated by:  netspy at me dot com
 Reported By:      netspy at me dot com
-Status:           Closed
+Status:           Open
 Bug Type:         *Unicode Issues
 Operating System: Mac OS X
 PHP Version:      5.2.9
 New Comment:

On Linux strcoll() works fine; I get the wrong order only on Mac OS X
(BSD). I also tested it with an ISO 8859-1 string and the locale
de_DE.ISO8859-1, with the same result: correct on Linux, wrong on
Mac OS X.

So I think it's not a Unicode issue!

Here is another test script:

$string_utf = "abcdefghijklmnopqrstuvwxyzäöüß";
$string_iso = utf8_decode($string_utf);

$array_utf = array(); $array_iso = array();

for ($i=0; $i<mb_strlen($string_utf, 'UTF-8'); $i++) {
    $array_utf[]=mb_substr($string_utf, $i, 1, 'UTF-8');
    $array_iso[]=substr($string_iso, $i, 1);
}

print("\nLocale: " . setlocale(LC_COLLATE, 'de_DE.UTF-8'));
usort($array_utf, 'strcoll');
print("\n" . implode('', $array_utf) . "\n");

print("\nLocale: " . setlocale(LC_COLLATE, 'de_DE.ISO8859-1'));
usort($array_iso, 'strcoll');
print("\n" . utf8_encode(implode('', $array_iso)) . "\n");


The result on Mac OS X:

Locale: de_DE.UTF-8
abcdefghijklmnopqrstuvwxyzßäöü

Locale: de_DE.ISO8859-1
abcdefghijklmnopqrstuvwxyzßäöü

And the Linux result:

Locale: de_DE.UTF-8
aäbcdefghijklmnoöpqrsßtuüvwxyz

Locale: de_DE.ISO8859-1
aäbcdefghijklmnoöpqrsßtuüvwxyz
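
A small check that is not part of the original test may help interpret
the Mac OS X output. It sorts the same characters with strcmp(), which
compares raw bytes and ignores LC_COLLATE. The variable names are
illustrative only, and the file is assumed to be saved as UTF-8.

```php
<?php
// Split the UTF-8 string into single characters; the /u flag makes
// PCRE treat the input as UTF-8, so mbstring is not needed here.
$chars = preg_split('//u', "abcdefghijklmnopqrstuvwxyzäöüß", -1, PREG_SPLIT_NO_EMPTY);

// strcmp() is a plain byte comparison and does not consult the locale.
usort($chars, 'strcmp');
$byte_order = implode('', $chars);

print($byte_order . "\n");
```

In UTF-8 the four non-ASCII letters all start with byte 0xC3 and differ
in the second byte (ß 0x9F, ä 0xA4, ö 0xB6, ü 0xBC), so this prints
abcdefghijklmnopqrstuvwxyzßäöü, the same order as the Mac OS X result
above. That suggests strcoll() is falling back to byte order there,
though the check alone does not prove why.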


Previous Comments:
------------------------------------------------------------------------

[2009-05-19 10:50:59] j...@php.net

It doesn't work on any system below PHP 6. You can always use the intl
extension from PECL while waiting for proper unicode support:
http://pecl.php.net/intl 

Using the Collator class (http://php.net/collator) you can sort
strings according to any locale.
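
A minimal sketch of that suggestion, assuming the intl extension is
installed. The helper name sort_chars_with_collator() is hypothetical
and only wraps the steps of the reproduce code; Collator and its
sort() method come from intl.

```php
<?php
// Hypothetical helper (not from the report): sort the characters of a
// UTF-8 string with ICU collation rules for the given locale.
function sort_chars_with_collator($string, $locale)
{
    // Split into single characters; the /u flag makes PCRE treat the
    // input as UTF-8.
    $chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);

    // Collator uses ICU's own collation data instead of the C
    // library's strcoll(), so it does not depend on setlocale().
    $collator = new Collator($locale);
    $collator->sort($chars); // sorts the array in place

    return implode('', $chars);
}

print(sort_chars_with_collator("abcdefghijklmnopqrstuvwxyzäöüß", 'de_DE') . "\n");
```

Because the ordering comes from ICU rather than from the operating
system's locale database, the result should not differ between Linux
and Mac OS X in the way the strcoll() output does.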

------------------------------------------------------------------------

[2009-05-18 22:37:22] netspy at me dot com

Description:
------------
strcoll() does not sort UTF-8 strings correctly on Mac OS X.

Reproduce code:
---------------
$locale = 'de_DE.UTF-8'; 
$string = "abcdefghijklmnopqrstuvwxyzäöüß"; 

$array = array(); 

for ($i=0; $i<mb_strlen($string, 'UTF-8'); $i++) { 
    $array[]=mb_substr($string, $i, 1, 'UTF-8'); 
} 

$oldLocale = setlocale(LC_COLLATE, "0"); 

print("\nOld: $oldLocale New: "); 
print(setlocale(LC_COLLATE, $locale)); 
usort($array, 'strcoll'); 
setlocale(LC_COLLATE, $oldLocale); 
print("\n" . implode('', $array) . "\n"); 

Expected result:
----------------
Old: C New: de_DE.UTF-8
aäbcdefghijklmnoöpqrsßtuüvwxyz

Actual result:
--------------
Old: C New: de_DE.UTF-8
abcdefghijklmnopqrstuvwxyzßäöü


------------------------------------------------------------------------

