Edit report at https://bugs.php.net/bug.php?id=54569&edit=1
ID: 54569
Comment by: tomek at dma dot net dot pl
Reported by: vmiszczak at ankama dot com
Summary: echo,proc_open,... issues with UTF-8 terminal
Status: Bogus
Type: Bug
Package: *General Issues
Operating System: Windows
PHP Version: 5.3.6
Block user comment: N
Private report: N

New Comment:

<?php
$data="Ä ÄÄÅÅ";
fwrite(STDOUT,$data."\n");
?>

This does not work!

Previous Comments:
------------------------------------------------------------------------
[2011-04-19 23:24:01] vmiszczak at ankama dot com

OK, I think I understand what you mean. Windows does not handle strings as UTF-8 internally. I'm OK with that; the same goes for Linux. It really does not matter as long as you use the correct encoding/decoding.

Under Linux, I can name a file eï.php and call "php eï.php". It will execute that file (assuming the locales are UTF-8). Here is an illustration:

# echo "<?php echo \"ï\n\";?>" > eï.php
# php eï.php
ï
# locale
LANG=fr_FR.UTF-8
LC_CTYPE="fr_FR.UTF-8"
LC_NUMERIC="fr_FR.UTF-8"
LC_TIME="fr_FR.UTF-8"
LC_COLLATE="fr_FR.UTF-8"
LC_MONETARY="fr_FR.UTF-8"
LC_MESSAGES="fr_FR.UTF-8"
LC_PAPER="fr_FR.UTF-8"
LC_NAME="fr_FR.UTF-8"
LC_ADDRESS="fr_FR.UTF-8"
LC_TELEPHONE="fr_FR.UTF-8"
LC_MEASUREMENT="fr_FR.UTF-8"
LC_IDENTIFICATION="fr_FR.UTF-8"
LC_ALL=

Doing the same under Windows (with cmd.exe set to chcp 65001):

c:\>chcp 65001
Active code page: 65001

c:\>type eï.php
<?php echo "ï\n";?>

c:\>php eï.php
��

The "type" command is able to decode the characters. The Windows terminal handles UTF-8. You just have to activate it, just like on Linux.

If I change the script to:

c:\>type eï.php
<?php echo "A ï in a string\n";?>

c:\>php eï.php
A ï in a string

So it seems to fail only under certain circumstances.

This is a simple example. My goal is to call (using proc_open() or similar) a program that takes Unicode parameters. I cannot convert those parameters because they are file paths that contain multilingual characters: a mix of several languages in one string, which makes it impossible to encode each character on 1 byte.
If I write a UTF-8 batch file and call it, there is no problem: the program handles Unicode and does its job. Using proc_open() or similar, the characters are decoded the wrong way. Generating the script to a file and launching it from the filesystem is not acceptable because the PHP program I'm working on needs performance.

I'm reading interesting articles like http://stackoverflow.com/questions/2706097/how-to-do-proper-unicode-and-ansi-output-redirection-on-cmd-exe. It could be a Windows bug.

I'd really like to work on Linux, but the project launches Windows tools :/

------------------------------------------------------------------------
[2011-04-19 18:41:59] paj...@php.net

The operating system runtime has no idea about UTF-8, in its shell, its file name encoding, etc. Its internal APIs obviously provide UTF-8 conversion functions (or others).

Secondly, as I said earlier, you have to do the conversion manually, as PHP does not use the WideChar APIs.

------------------------------------------------------------------------
[2011-04-19 18:32:26] vmiszczak at ankama dot com

"Windows has no idea about UTF-8"

Are you serious?!!

The C function WideCharToMultiByte(CP_UTF8,NULL,native,-1,encoded,sizeof(encoded),NULL,NULL) does encode as UTF-8. printf("%s",encoded) does output the correct thing in a properly configured terminal, just as it does under Linux. PHP does not.

------------------------------------------------------------------------
[2011-04-19 18:10:50] paj...@php.net

Windows has no idea about UTF-8, especially not in console mode. But that does not mean the data you are echo'ing is not correct UTF-8.

------------------------------------------------------------------------
[2011-04-19 17:56:04] vmiszczak at ankama dot com

Description:
------------
Those functions (at least) do strange things when using a UTF-8 terminal under Windows. For instance, doing a PHP 'echo' on a UTF-8 string does not output the string! Printing the same string with fwrite(STDOUT,$string) works.
More problematic: parsing UTF-8 data and then executing a program (I've tried popen, exec, proc_open, passthru) with that data as program arguments leaves the program unable to tell which decoding should be used. Writing the script output to a batch file and executing the batch file works.

Test script:
---------------
Launch cmd.exe. Execute chcp 65001 to make the terminal use UTF-8 (make sure your font supports this; use Lucida Console, for instance). Launch this script:

<?php
$data="ï";
fwrite(STDOUT,$data."\n");
echo $data."\n";
?>

Expected result:
----------------
I'm expecting UTF-8 data to be shown correctly, and the command string passed to exec() and similar functions to be decoded correctly.

Actual result:
--------------
ï
��

------------------------------------------------------------------------
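
The byte-level situation discussed in this thread can be checked with a short script. This is only a minimal sketch: it assumes the iconv extension is available, and the CP1252 target is an assumption chosen for illustration, since the code page actually in use on a given console may differ. The test character is written as explicit bytes so the result does not depend on how the script file is saved.

```php
<?php
// "ï" (U+00EF) written as its two UTF-8 bytes, so the file's own
// encoding does not matter.
$data = "\xC3\xAF";

// In UTF-8 this character occupies two bytes, which is consistent with
// the two replacement characters shown under "Actual result".
$hex = bin2hex($data);        // "c3af"
$byteCount = strlen($data);   // 2
fwrite(STDOUT, "UTF-8 bytes: $hex ($byteCount)\n");

// Manual conversion, as suggested in the thread: re-encode to a
// single-byte code page before handing the string to the console or to
// an external program. CP1252 is an assumption, not taken from the report.
$converted = iconv('UTF-8', 'CP1252', $data);
fwrite(STDOUT, "CP1252 bytes: " . bin2hex($converted) . "\n");   // "ef"
```

A conversion like this only works when every character exists in the target code page, which is exactly the limitation the reporter raises for file paths that mix several languages.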