Edit report at https://bugs.php.net/bug.php?id=33350&edit=1

 ID:                 33350
 Comment by:         bvibber at wikimedia dot org
 Reported by:        php-bug dot scyt at spamgourmet dot com
 Summary:            php can't handle utf-8 in path names
 Status:             Open
 Type:               Feature/Change Request
 Package:            Feature/Change Request
 Operating System:   Windows XP
 PHP Version:        5.0.4
 Block user comment: N
 Private report:     N

 New Comment:

So, this is still an issue that affects programs like MediaWiki running on 
Windows servers: https://bugzilla.wikimedia.org/show_bug.cgi?id=1780

Windows NT/XP/Vista/7/8/etc. store file names as Unicode (UTF-16), but the
POSIX filesystem APIs expose only the "ANSI" interfaces, which use an 8-bit or
DBCS (double-byte character set) encoding that depends on the system locale.
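
To make the mismatch concrete, here is a small illustrative sketch (not from the
original report) of the bytes the same name turns into under each convention.
cp1252 is assumed here as a Western-European "ANSI" code page; the real one
depends on the system locale.

```python
# Sketch: one file name, three byte representations.
# cp1252 is an assumed stand-in for the system's "ANSI" code page.
name = "testÄ"

utf8_bytes = name.encode("utf-8")      # what Linux/Mac OS X programs typically pass around
ansi_bytes = name.encode("cp1252")     # what the "ANSI" APIs expect on a Western system
wide_bytes = name.encode("utf-16-le")  # the UTF-16 code units Windows stores for the name

for label, data in [("UTF-8", utf8_bytes), ("cp1252", ansi_bytes), ("UTF-16LE", wide_bytes)]:
    print(f"{label:8} {data.hex(' ')}")
```

Only the last form is the name as the filesystem actually holds it; the first
two are encodings that have to be converted before reaching it.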

Since PHP's filesystem APIs seem to use the POSIX ones, you can't use UTF-8 for
filenames the way one would on Linux, Mac OS X, etc. Worse, there's no general
way to simply switch encodings: you're limited to the character set of the
8-bit encoding, so you can't, for instance, upload a Chinese-named file to a
Russian-configured server.
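
A minimal sketch of why switching encodings doesn't help, assuming cp1251
(Cyrillic) as the "Russian-configured" server's ANSI code page; the helper
`to_code_page` and the file name are hypothetical.

```python
from typing import Optional

# Hypothetical helper: encode a name in a given code page, or report failure.
def to_code_page(name: str, code_page: str) -> Optional[bytes]:
    """Return name encoded in code_page, or None if it cannot be represented."""
    try:
        return name.encode(code_page)
    except UnicodeEncodeError:
        return None

chinese_name = "文件.txt"

print(to_code_page(chinese_name, "cp1251"))      # None: cp1251 has no Chinese characters

# A lossy conversion silently turns the name into a different one.
lossy = chinese_name.encode("cp1251", errors="replace")
print(lossy)                                     # b'??.txt'
print(lossy.decode("cp1251") == chinese_name)    # False
```

Note that "?" is a reserved character in Windows file names, so the "replaced"
name cannot even be created.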

Other cross-platform toolkits, such as GLib, call Win32's Unicode ("wide")
filesystem APIs internally, which allows the full Unicode character set in
filenames, and expose those names to callers as UTF-8 so they stay compatible
with C null-terminated strings.
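
As an illustration of the conversion such a toolkit performs at the Win32
boundary (UTF-8 in from the caller, NUL-terminated UTF-16 out to the "W"
functions), here is a rough sketch. The function names are hypothetical and
this is not GLib's actual code.

```python
import struct
from typing import List

# Hypothetical helpers sketching a UTF-8 <-> UTF-16 path boundary.

def utf8_to_wide(path_utf8: bytes) -> List[int]:
    """Decode a UTF-8 path and return NUL-terminated UTF-16 code units."""
    data = path_utf8.decode("utf-8").encode("utf-16-le")
    units = list(struct.unpack(f"<{len(data) // 2}H", data))
    units.append(0)  # terminating NUL, as the wide APIs expect
    return units

def wide_to_utf8(units: List[int]) -> bytes:
    """Convert UTF-16 code units (up to the first NUL) back to UTF-8 bytes."""
    if 0 in units:
        units = units[: units.index(0)]
    data = struct.pack(f"<{len(units)}H", *units)
    return data.decode("utf-16-le").encode("utf-8")
```

Characters outside the Basic Multilingual Plane (e.g. emoji) become surrogate
pairs in UTF-16, which is why the result is a list of code units rather than
one unit per character.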


Previous Comments:
------------------------------------------------------------------------
[2005-06-15 14:42:12] php-bug dot scyt at spamgourmet dot com

Nice try. :)

It should do the same thing Apache does: convert the UTF-8 string to the
system's character encoding, which is either UTF-16 or the 8-bit character
encoding Windows is currently set to.

You could convert to the 8-bit character encoding. That would save some code
changes and would work in most cases, but there are systems where it would
still fail, e.g. directory names containing German umlauts on a system whose
8-bit character encoding lacks those umlauts (such as Russian). Converting to
UTF-16 works in this scenario and is therefore preferable.
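
A minimal sketch comparing the two strategies for a path that arrives as UTF-8.
The function `convert_path` is hypothetical; cp1251 stands in for a Russian
system and cp1252 for a German/Western one.

```python
from typing import Optional, Tuple

# Hypothetical sketch: try both conversions for a UTF-8 path.
def convert_path(path_utf8: bytes, ansi_code_page: str) -> Tuple[Optional[bytes], bytes]:
    """Return (8-bit encoding or None if unrepresentable, UTF-16LE encoding)."""
    text = path_utf8.decode("utf-8")
    try:
        ansi = text.encode(ansi_code_page)
    except UnicodeEncodeError:
        ansi = None  # the 8-bit strategy cannot represent this path
    wide = text.encode("utf-16-le")  # always succeeds for valid UTF-8 input
    return ansi, wide

ansi_ru, wide_ru = convert_path("testÄ".encode("utf-8"), "cp1251")
ansi_de, _ = convert_path("testÄ".encode("utf-8"), "cp1252")
print(ansi_ru, ansi_de)  # None b'test\xc4'
```

The 8-bit result depends on the configured code page, while the UTF-16 result
is the same on every system.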

------------------------------------------------------------------------
[2005-06-15 13:44:36] ed...@php.net

The underlying filesystem is not UTF-8-based, so there is very little PHP can do
here.

------------------------------------------------------------------------
[2005-06-15 11:59:56] php-bug dot scyt at spamgourmet dot com

Description:
------------
Apache 2 on Windows NT-based operating systems uses UTF-8 to encode path names.

PHP fails with "file not found" errors when it encounters such paths. Both
mod_php and php-cgi.exe are affected.
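
A sketch of the likely failure mode, assuming PHP passes the UTF-8 bytes it
receives from Apache straight to "ANSI" file functions on a system whose code
page is cp1252:

```python
# Assumption: the ANSI code page is cp1252 and the bytes are passed through unchanged.
requested = "testÄ"
utf8_bytes = requested.encode("utf-8")          # b'test\xc3\x84'
seen_by_ansi_api = utf8_bytes.decode("cp1252")  # how the ANSI API interprets those bytes

print(repr(seen_by_ansi_api))         # 'testÃ„' (not the directory on disk)
print(seen_by_ansi_api == requested)  # False, hence "file not found"
```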

Reproduce code:
---------------
Rename a directory that contains a PHP script to "testÄ" (a capital A with two
dots above it, i.e. an umlaut; code point 196 in Latin-1). Then try to access
that PHP script through the web server.
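
For reference, this sketch shows how the request path for this reproduction
typically looks on the wire, assuming a modern browser that percent-encodes the
UTF-8 bytes of the path; "index.php" is a hypothetical script name.

```python
from urllib.parse import quote, unquote_to_bytes

# "index.php" is a hypothetical script inside the renamed directory.
path = "/testÄ/index.php"
encoded = quote(path)              # percent-encodes the UTF-8 bytes, keeps "/"
print(encoded)                     # /test%C3%84/index.php

raw = unquote_to_bytes(encoded)    # the UTF-8 byte string the server decodes
print(raw)                         # b'/test\xc3\x84/index.php'
```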



------------------------------------------------------------------------


