ID:               48322
 Updated by:       ras...@php.net
 Reported By:      netspy at me dot com
 Status:           Wont fix
 Bug Type:         *Unicode Issues
 Operating System: Mac OS X
 PHP Version:      5.2.9
 New Comment:

The code for this function is just:

    RETURN_LONG(strcoll((const char *) Z_STRVAL_PP(s1),
                        (const char *) Z_STRVAL_PP(s2)));

We use the underlying system strcoll function.  There is nothing for us
to fix here.  If your system's strcoll function is broken, you are out
of luck.  OS X has a long history of buggy C99 functions, and it
wouldn't surprise me if its strcoll function doesn't handle UTF-8
locales correctly.  But that still isn't something we can fix short of
adding an OS-specific hack here, which we try to avoid.
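
Since PHP only forwards to the C library here, a quick way to see what
the platform's strcoll does is to compare two characters under the
locale in question. The following is a minimal sketch, not part of PHP:
the helper name strcoll_honours_locale is hypothetical, and the file is
assumed to be saved as UTF-8. The result depends entirely on the
system's C library and installed locales.

```php
<?php
// Hypothetical helper: reports whether strcoll() under $locale orders
// "ä" before "b", as German dictionary collation expects.
// Returns null when the locale cannot be set on this system.
function strcoll_honours_locale($locale)
{
    // "0" asks setlocale() for the current setting without changing it.
    $previous = setlocale(LC_COLLATE, "0");

    if (setlocale(LC_COLLATE, $locale) === false) {
        return null; // locale not installed
    }

    // In a plain byte-wise comparison the UTF-8 bytes of "ä" (0xC3 0xA4)
    // sort after "b" (0x62), so a negative result means the locale's
    // collation rules were actually applied.
    $ok = strcoll("ä", "b") < 0;

    // Restore the collation locale that was active before the check.
    setlocale(LC_COLLATE, $previous);

    return $ok;
}

var_dump(strcoll_honours_locale('de_DE.UTF-8'));
```

A false result for a locale that was set successfully suggests the
system's strcoll is falling back to byte order for that locale.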


Previous Comments:
------------------------------------------------------------------------

[2009-05-19 14:03:13] netspy at me dot com

What is your result on Linux? Did you save the test file with UTF-8
encoding?

Because strcoll is basically a C function, I can't see why this is a
PHP Unicode issue or why you closed the bug as Wont fix.

------------------------------------------------------------------------

[2009-05-19 12:58:18] j...@php.net

I get the wrong order on Linux. Did you mix up the results? Anyway,
this really is a problem with Unicode support. For collation that
_really_ works, use the intl extension or wait for PHP 6. Wont fix.

------------------------------------------------------------------------

[2009-05-19 12:35:29] netspy at me dot com

On Linux, strcoll works fine; I only get the wrong order on Mac OS X
(BSD). I also tested it with an ISO 8859-1 string and the locale
de_DE.ISO8859-1. The result is the same: correct on Linux, wrong on
Mac OS X.

So I think it's not a Unicode issue!

Here is another test script:

$string_utf = "abcdefghijklmnopqrstuvwxyzäöüß";
$string_iso = utf8_decode($string_utf);

$array_utf = array(); $array_iso = array();

for ($i=0; $i<mb_strlen($string_utf, 'UTF-8'); $i++) {
    $array_utf[]=mb_substr($string_utf, $i, 1, 'UTF-8');
    $array_iso[]=substr($string_iso, $i, 1);
}

print("\nLocale: " . setlocale(LC_COLLATE, 'de_DE.UTF-8'));
usort($array_utf, 'strcoll');
print("\n" . implode('', $array_utf) . "\n");

print("\nLocale: " . setlocale(LC_COLLATE, 'de_DE.ISO8859-1'));
usort($array_iso, 'strcoll');
print("\n" . utf8_encode(implode('', $array_iso)) . "\n");


The result on Mac OS X:

Locale: de_DE.UTF-8
abcdefghijklmnopqrstuvwxyzßäöü

Locale: de_DE.ISO8859-1
abcdefghijklmnopqrstuvwxyzßäöü

And the Linux result:

Locale: de_DE.UTF-8
aäbcdefghijklmnoöpqrsßtuüvwxyz

Locale: de_DE.ISO8859-1
aäbcdefghijklmnoöpqrsßtuüvwxyz

------------------------------------------------------------------------

[2009-05-19 10:50:59] j...@php.net

This doesn't work on any system before PHP 6. You can always use the
intl extension from PECL while waiting for proper Unicode support:
http://pecl.php.net/intl

Using the Collator class (http://php.net/collator), you can sort
according to any locale.
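
As a rough illustration of the Collator route, the sketch below sorts
UTF-8 strings with German rules. It assumes the intl extension is
loaded and the file is saved as UTF-8; the function name sort_german is
hypothetical and only used for this example.

```php
<?php
// Hypothetical helper: returns a copy of $strings sorted with German
// collation rules. Requires the intl extension (Collator class).
function sort_german(array $strings)
{
    // ICU locale IDs carry no ".UTF-8" suffix; the input strings
    // themselves are expected to be UTF-8 encoded.
    $collator = new Collator('de_DE');

    // Collator::sort() sorts the array in place.
    $collator->sort($strings);

    return $strings;
}

$letters = array('b', 'ä', 'z', 'a');
print(implode('', sort_german($letters)) . "\n");
```

Because the comparison is done by ICU rather than by the C library,
the ordering should not depend on the operating system's strcoll.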

------------------------------------------------------------------------

[2009-05-18 22:37:22] netspy at me dot com

Description:
------------
strcoll() does not sort UTF-8 strings correctly on Mac OS X.

Reproduce code:
---------------
$locale = 'de_DE.UTF-8'; 
$string = "abcdefghijklmnopqrstuvwxyzäöüß"; 

$array = array(); 

for ($i=0; $i<mb_strlen($string, 'UTF-8'); $i++) { 
    $array[]=mb_substr($string, $i, 1, 'UTF-8'); 
} 

$oldLocale = setlocale(LC_COLLATE, "0"); 

print("\nOld: $oldLocale New: "); 
print(setlocale(LC_COLLATE, $locale)); 
usort($array, 'strcoll'); 
setlocale(LC_COLLATE, $oldLocale); 
print("\n" . implode('', $array) . "\n"); 

Expected result:
----------------
Old: C New: de_DE.UTF-8
aäbcdefghijklmnoöpqrsßtuüvwxyz

Actual result:
--------------
Old: C New: de_DE.UTF-8
abcdefghijklmnopqrstuvwxyzßäöü


------------------------------------------------------------------------


-- 
Edit this bug report at http://bugs.php.net/?id=48322&edit=1
