Edit report at https://bugs.php.net/bug.php?id=33350&edit=1

 ID:                 33350
 Comment by:         bvibber at wikimedia dot org
 Reported by:        php-bug dot scyt at spamgourmet dot com
 Summary:            php can't handle utf-8 in path names
 Status:             Open
 Type:               Feature/Change Request
 Package:            Feature/Change Request
 Operating System:   Windows XP
 PHP Version:        5.0.4
 Block user comment: N
 Private report:     N

 New Comment:

So, this is still an issue that affects programs like MediaWiki running
on Windows servers:
https://bugzilla.wikimedia.org/show_bug.cgi?id=1780

Windows NT/XP/Vista/7/8/etc. use Unicode for file names, but the POSIX
filesystem APIs expose only the "ANSI" interfaces, which use 8-bit or
DBCS locale-specific encodings.

Since PHP's filesystem APIs seem to use the POSIX ones, you can't use
UTF-8 for filenames as one expects on Linux, Mac OS X, etc. Not only
that, but there's no general way to simply switch encodings -- you'll be
limited to the 8-bit encoding's character set, so you can't, for
instance, upload a Chinese-named file to a Russian-configured server.

Some other cross-platform toolkits like glib use Win32's Unicode
filesystem APIs internally, which allow using the full Unicode character
set for filenames, and expose the Unicode strings as UTF-8 for
compatibility with C null-terminated strings.


Previous Comments:
------------------------------------------------------------------------
[2005-06-15 14:42:12] php-bug dot scyt at spamgourmet dot com

Nice try. :)

It should do the same thing Apache does: convert the UTF-8 string to the
system's character encoding. That is either UTF-16 or the 8-bit
character encoding Windows is currently set to.

You could do the conversion to the 8-bit character encoding. That would
save some code changes and would work in most cases. But there are
systems where this would still fail, e.g. having directory names with
German umlauts while the 8-bit character encoding is set to one without
these umlauts (e.g. Russian).

The UTF-16 way would work in this scenario and is therefore preferable.
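
The difference between the two conversion targets in the comment above can be sketched roughly as follows. This is a Python illustration of the encoding behaviour only, not of PHP or Apache internals. The code pages cp1252 (Western European) and cp1251 (Cyrillic) are assumed stand-ins for "the 8-bit character encoding Windows is currently set to", and the helper name `encodable` is hypothetical.

```python
def encodable(name: str, encoding: str) -> bool:
    """Return True if every character of `name` exists in `encoding`."""
    try:
        name.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


dir_name = "test\u00c4"  # "testÄ", the directory name from the report

# Assumed 8-bit code pages: cp1252 has the umlaut, cp1251 does not.
western = encodable(dir_name, "cp1252")
cyrillic = encodable(dir_name, "cp1251")

# UTF-16 can represent any Unicode string, so this conversion cannot fail
# for lack of a character.
utf16 = encodable(dir_name, "utf-16-le")

print(western, cyrillic, utf16)
```

Under these assumptions, the 8-bit route works only when the server's code page happens to contain every character of the name, which is the failure case the comment describes.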
------------------------------------------------------------------------
[2005-06-15 13:44:36] ed...@php.net

The underlying filesystem is not UTF-8 based so there is very little PHP
can do here.

------------------------------------------------------------------------
[2005-06-15 11:59:56] php-bug dot scyt at spamgourmet dot com

Description:
------------
Apache2 on Windows NT based operating systems uses UTF-8 to encode
pathnames. PHP fails with "file not found" error messages if it
encounters such paths. Both mod_php and php-cgi.exe are affected.

Reproduce code:
---------------
Rename a directory that contains a PHP script to "testÄ". That is a
capital A with two dots above it (code point 196 in Latin-1). Then try
to access this PHP script through the web server.

------------------------------------------------------------------------
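
One plausible reading of the "file not found" result in the reproduction steps is that the UTF-8 bytes of the path are interpreted in the server's 8-bit code page. The Python sketch below only illustrates that byte-level mismatch. It assumes a cp1252 (Western European) locale and does not call any Windows or PHP API.

```python
dir_name = "test\u00c4"  # "testÄ": capital A with diaeresis, code point 196

# The path as UTF-8 bytes, which the report says Apache2 uses on Windows.
utf8_bytes = dir_name.encode("utf-8")

# The same bytes read as an 8-bit "ANSI" string (cp1252 is an assumption).
misread = utf8_bytes.decode("cp1252")

print(utf8_bytes)           # the two-byte UTF-8 sequence for the umlaut
print(misread)              # a different name from the one on disk
print(misread == dir_name)
```

Because the misread name differs from the real directory name, a lookup through an 8-bit interface would not find it, which is consistent with the behaviour described in the report.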