ID:               46165
 Updated by:       [EMAIL PROTECTED]
 Reported By:      gehrig at ishd dot de
 Status:           Assigned
 Bug Type:         Strings related
 Operating System: win32 only
 PHP Version:      5.2.6
 Assigned To:      pajoye
 New Comment:

I'm not sure it is worth fixing, for many reasons. The first is that
getting it to work portably on every system is a real pain (and, as
Derick pointed out, not only on Windows).

My opinion is that all these functions should be deprecated (or that we
should strongly recommend against using them for anything but ASCII) in
favour of the new Unicode APIs (6.x, or partially with intl).
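
For illustration, locale-aware sorting through intl could look roughly
like the sketch below. It assumes the intl extension (ICU) is loaded;
the helper name sortWithCollator() and the sample letters are made up
for this example and are not taken from the report.

```php
<?php
// Sketch only: relies on the intl extension's Collator class (ICU),
// which compares UTF-8 strings independently of the C runtime locale.
// sortWithCollator() is a hypothetical helper name.
function sortWithCollator(array $strings, $locale = 'de_DE')
{
    $collator = new Collator($locale);

    // Collator::sort() sorts the array in place and returns false on failure.
    if (!$collator->sort($strings)) {
        throw new RuntimeException($collator->getErrorMessage());
    }

    // Re-index so the caller always gets a plain 0..n-1 list.
    return array_values($strings);
}

// The source file must be saved as UTF-8, as in the reproduce code.
$letters = array('b', 'ä', 'a', 'Z', 'ß');
print_r(sortWithCollator($letters));
```

Because the comparison is done by ICU rather than by the platform's
strcoll(), the same script should behave the same way on Windows and
on Unix-like systems.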


Previous Comments:
------------------------------------------------------------------------

[2008-10-29 10:05:26] [EMAIL PROTECTED]

POSIX locales don't really deal well with multibyte strings such as
UTF-8, not even on Unices.

------------------------------------------------------------------------

[2008-10-27 14:10:00] [EMAIL PROTECTED]

As a Windows-only bug, assigned to the Windows port maintainer.

------------------------------------------------------------------------

[2008-09-24 07:31:52] gehrig at ishd dot de

Description:
------------
The strcoll() function, which compares strings in a locale-aware
manner, does not seem to work with UTF-8 encoded strings despite using
the correct Windows locale with the UTF-8 codepage (65001). strcoll()
always returns 2147483647, which makes, for example, array sorting of
such strings more or less random.
Running the same snippet with Windows-1252 (ISO-8859-1) encoded strings,
or on a Linux machine, does in fact work as expected.

Please note: for running the following reproduce code, the PHP file
must be UTF-8 encoded!

Reproduce code:
---------------
<?php
function traceStrColl($a, $b) {
    $outValue=strcoll($a, $b);
    echo "$a $b $outValue\r\n";
    return $outValue;
}

$locale=(defined('PHP_OS') && stristr(PHP_OS, 'win')) ?
'German_Germany.65001' : 'de_DE.utf8';

$string="ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜabcdefghijklmnopqrstuvwxyzäöüß";
$array=array();
for ($i=0; $i<mb_strlen($string, 'UTF-8'); $i++) {
    $array[]=mb_substr($string, $i, 1, 'UTF-8');
}
$oldLocale=setlocale(LC_COLLATE, "0");
var_dump(setlocale(LC_COLLATE, $locale));
usort($array, 'traceStrColl');
setlocale(LC_COLLATE, $oldLocale);
var_dump($array);

Expected result:
----------------
string(20) "German_Germany.65001"
a B -1
[...]
array(59) {
  [0]=>
  string(1) "a"
  [1]=>
  string(1) "A"
  [2]=>
  string(2) "ä"
  [3]=>
  string(2) "Ä"
  [4]=>
  string(1) "b"
  [5]=>
  string(1) "B"
  [6]=>
  string(1) "c"
  [7]=>
  string(1) "C"
  [8]=>
  string(1) "d"
  [9]=>
  string(1) "D"
  [10]=>
  string(1) "e"
  [11]=>
  string(1) "E"
  [12]=>
  string(1) "f"
  [13]=>
  string(1) "F"
  [14]=>
  string(1) "g"
  [15]=>
  string(1) "G"
  [16]=>
  string(1) "h"
  [17]=>
  string(1) "H"
  [18]=>
  string(1) "i"
  [19]=>
  string(1) "I"
  [20]=>
  string(1) "j"
  [21]=>
  string(1) "J"
  [22]=>
  string(1) "k"
  [23]=>
  string(1) "K"
  [24]=>
  string(1) "l"
  [25]=>
  string(1) "L"
  [26]=>
  string(1) "m"
  [27]=>
  string(1) "M"
  [28]=>
  string(1) "n"
  [29]=>
  string(1) "N"
  [30]=>
  string(1) "o"
  [31]=>
  string(1) "O"
  [32]=>
  string(2) "ö"
  [33]=>
  string(2) "Ö"
  [34]=>
  string(1) "p"
  [35]=>
  string(1) "P"
  [36]=>
  string(1) "q"
  [37]=>
  string(1) "Q"
  [38]=>
  string(1) "r"
  [39]=>
  string(1) "R"
  [40]=>
  string(1) "s"
  [41]=>
  string(1) "S"
  [42]=>
  string(2) "ß"
  [43]=>
  string(1) "t"
  [44]=>
  string(1) "T"
  [45]=>
  string(1) "u"
  [46]=>
  string(1) "U"
  [47]=>
  string(2) "ü"
  [48]=>
  string(2) "Ü"
  [49]=>
  string(1) "v"
  [50]=>
  string(1) "V"
  [51]=>
  string(1) "w"
  [52]=>
  string(1) "W"
  [53]=>
  string(1) "x"
  [54]=>
  string(1) "X"
  [55]=>
  string(1) "y"
  [56]=>
  string(1) "Y"
  [57]=>
  string(1) "z"
  [58]=>
  string(1) "Z"
}

Actual result:
--------------
string(20) "German_Germany.65001"
a B 2147483647
[...]
array(59) {
  [0]=>
  string(1) "c"
  [1]=>
  string(1) "B"
  [2]=>
  string(1) "s"
  [3]=>
  string(1) "C"
  [4]=>
  string(1) "k"
  [5]=>
  string(1) "D"
  [6]=>
  string(2) "ä"
  [7]=>
  string(1) "E"
  [8]=>
  string(1) "g"
  [9]=>
  string(1) "F"
  [10]=>
  string(1) "o"
  [11]=>
  string(1) "G"
  [12]=>
  string(1) "w"
  [13]=>
  string(1) "H"
  [14]=>
  string(1) "A"
  [15]=>
  string(1) "I"
  [16]=>
  string(1) "e"
  [17]=>
  string(1) "J"
  [18]=>
  string(1) "i"
  [19]=>
  string(1) "K"
  [20]=>
  string(1) "m"
  [21]=>
  string(1) "L"
  [22]=>
  string(1) "q"
  [23]=>
  string(1) "M"
  [24]=>
  string(1) "u"
  [25]=>
  string(1) "N"
  [26]=>
  string(1) "y"
  [27]=>
  string(1) "O"
  [28]=>
  string(2) "ü"
  [29]=>
  string(1) "P"
  [30]=>
  string(1) "b"
  [31]=>
  string(1) "Q"
  [32]=>
  string(1) "d"
  [33]=>
  string(1) "R"
  [34]=>
  string(1) "f"
  [35]=>
  string(1) "S"
  [36]=>
  string(1) "h"
  [37]=>
  string(1) "T"
  [38]=>
  string(1) "j"
  [39]=>
  string(1) "U"
  [40]=>
  string(1) "l"
  [41]=>
  string(1) "V"
  [42]=>
  string(1) "n"
  [43]=>
  string(1) "W"
  [44]=>
  string(1) "p"
  [45]=>
  string(1) "X"
  [46]=>
  string(1) "r"
  [47]=>
  string(1) "Y"
  [48]=>
  string(1) "t"
  [49]=>
  string(1) "Z"
  [50]=>
  string(1) "v"
  [51]=>
  string(2) "Ä"
  [52]=>
  string(1) "x"
  [53]=>
  string(2) "Ö"
  [54]=>
  string(1) "z"
  [55]=>
  string(2) "Ü"
  [56]=>
  string(2) "ö"
  [57]=>
  string(1) "a"
  [58]=>
  string(2) "ß"
}


------------------------------------------------------------------------

