 ID:               22108
 Updated by:       [EMAIL PROTECTED]
 Reported By:      bugzilla at jellycan dot com
-Status:           Wont fix
+Status:           Assigned
 Bug Type:         Feature/Change Request
 Operating System: Any
 PHP Version:      All (as of the current implementation)
 Assigned To:      moriyoshi
 New Comment:
Derick,

Please do not change the status of a bug that is already assigned to
someone.

There's no point in saying that PHP can only handle ASCII documents,
because if you want to use German in PHP, for example, you have to use
at least ISO-8859-1 or ISO-8859-15, neither of which is part of ASCII.


Previous Comments:
------------------------------------------------------------------------

[2003-06-03 14:17:22] [EMAIL PROTECTED]

Feel free to rewrite the parser, but that's just not going to happen.
We want ascii import, not unicode.

------------------------------------------------------------------------

[2003-06-03 14:07:16] gump at hotmail dot com

> [8 Feb 4:24am CST] [EMAIL PROTECTED]
> PHP doesn't want UNICODE scripts, but just ASCII ones. Not
> a bug -> bogus.

Not bogus. PHP is embedded in HTML, and the surrounding document
determines the encoding. You can't just specify this problem out of
existence.

------------------------------------------------------------------------

[2003-05-05 03:40:23] tokiee at sayclub dot com

for those who are not familiar with UTF-8:

UTF-8 (UCS Transformation Format 8) is not different from ASCII. it's
compatible with ASCII: if you write your text in english with UTF-8,
you don't see any difference from the same text in ASCII in any byte
(and the UTF-8 BOM is optional).

it's not quite an exact explanation of UTF-8, but: UTF-8 expands ASCII
to support full UNICODE characters without disturbing any existing
alphabet order or something. so basically UTF-8 is ASCII, and you
don't have to imagine it as a totally new freak.

actually, when a modern Unicode-supported OS reads this UTF-8, the OS
needs to CONVERT it to real UNICODE internally. so UTF-8 is rather
similar to URL encoding.

in the ASCII world, each byte corresponds to a character, up to 255
characters. in UNICODE, two bytes correspond to a character, up to
65535 characters, and it's totally a new system, as you think. in
UTF-8, it's interesting: a character can be one byte, or two bytes, or
even 3 or 4 bytes!
why is that so complicated? but the rule is simple, and actually you
don't have to handle this: the OS will do it for you. even if you have
software which does not understand utf-8, it's totally okay because
it's ASCII transparent. so it "can be used with normal string
comparison functions for sorting and such." (quoted from the PHP.NET
reference for utf8_encode())

------------------------------------------------------------------------

[2003-04-14 12:17:37] [EMAIL PROTECTED]

As a short-term workaround (yes, I know it's not a solution), can you
try using output buffering? That should at least solve the problem of
sneaking the headers in prior to the BOM, even if it doesn't solve the
underlying problem of recognizing document encodings properly.

------------------------------------------------------------------------

[2003-04-06 00:53:04] tronxoe at hotpop dot com

The BOM is still fine when the php file does not include another
Unicode file (by using @include()).

Another problem: if a php file is saved in unicode, sessions and
cookies cannot be used because of "headers already sent ...". I think
the first 3 bytes have been sent in this case.

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
    http://bugs.php.net/22108
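
For readers following the thread, here is a minimal sketch of the two byte-level points made in the comments above: that plain ASCII text is unchanged when saved as UTF-8, and that a UTF-8 BOM puts three bytes in front of the opening `<?php` tag, which is what would trigger "headers already sent". It is written in Python purely for illustration and is not PHP's parser or any proposed fix; the function names (`has_utf8_bom`, `strip_utf8_bom`, `utf8_byte_lengths`) are hypothetical helpers invented for this example.

```python
import codecs

# The UTF-8 byte order mark: the three bytes EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM."""
    return data.startswith(UTF8_BOM)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present."""
    return data[len(UTF8_BOM):] if has_utf8_bom(data) else data


def utf8_byte_lengths(text: str) -> list:
    """Return how many bytes each character of text occupies in UTF-8."""
    return [len(ch.encode("utf-8")) for ch in text]


if __name__ == "__main__":
    # Pure ASCII text is byte-for-byte identical in ASCII and UTF-8.
    assert "header".encode("ascii") == "header".encode("utf-8")

    # Characters outside ASCII take two, three, or four bytes each.
    print(utf8_byte_lengths("a\u00df\u20ac\U0001F600"))

    # A script saved with a BOM has three extra bytes before "<?php".
    # This sample script content is made up for the example.
    script = UTF8_BOM + b"<?php session_start(); ?>"
    print(has_utf8_bom(script))
    print(strip_utf8_bom(script))
```

Saving scripts without a BOM, or stripping it as above before deployment, avoids the stray leading bytes altogether. The output-buffering workaround suggested in the thread instead holds those bytes back until the headers have been sent.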