Edit report at http://bugs.php.net/bug.php?id=54028&edit=1
ID: 54028
User updated by: schmale at froglogic dot com
Reported by: schmale at froglogic dot com
Summary: Directory::read() cannot handle non-unicode chars properly
Status: Bogus
Type: Bug
Package: Directory function related
Operating System: Windows 7
PHP Version: 5.3.5
Block user comment: N
Private report: N
New Comment:
Well, I don't know what encoding Windows uses, but I do know that it
works properly with the Windows CGI version. The point is that a
directory called 'Startmenü' is returned as 'Startmenü' with Linux/CGI,
Linux/CLI and Windows/CGI, but NOT with Windows/CLI, which returns
'Startmenñæ' (or something similar). In other words, the behaviour of
Windows/CLI is broken, whereas the other versions return the exact name
of the directory, as expected.
So I think this has little or nothing to do with Unicode file system
support or the encoding Windows uses, and more to do with differences
between CGI and CLI.
Previous Comments:
------------------------------------------------------------------------
[2011-02-15 16:54:17] [email protected]
There is already a feature request for Unicode file system support.
By the way, Windows does not use UTF-8 as its file name encoding.
------------------------------------------------------------------------
[2011-02-15 16:51:20] schmale at froglogic dot com
Description:
------------
Note: This problem affects ONLY the CLI interpreter, NOT the CGI.
When using dir('path/to/dir'), the read() method does not return UTF-8
if the directory contains entries whose names include, e.g., umlauts
(ä, ö, ü). I tested this on Linux and Windows, with both CGI and CLI,
and the problem occurs only with Windows/CLI.
Test script:
---------------
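<?php
// Requires the mbstring extension for mb_check_encoding().
$path = 'path/to/directory/which/contains/umlauts';
$directory = dir($path);
while (false !== ($content = $directory->read())) {
    // Report every directory entry whose name is not valid UTF-8.
    if (mb_check_encoding($content, 'UTF-8') === false) {
        fprintf(STDERR, "Returned non-UTF-8 (%s)\n", $content);
    }
}
$directory->close();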
Expected result:
----------------
The expected result, of course, was that the return value of read() is
always UTF-8 encoded, i.e. no messages are printed when the script
runs.
Actual result:
--------------
If a subdirectory name contains umlauts (or, I would guess, any
non-ASCII character), a message is printed, i.e. the return value is
not UTF-8 encoded.
------------------------------------------------------------------------