ID:               41554
User updated by:  victorepand at gmail dot com
Reported By:      victorepand at gmail dot com
-Status:          Feedback
+Status:          Open
Bug Type:         Strings related
Operating System: Linux
PHP Version:      4.4.7
New Comment:

Here are 2 short test scripts that demonstrate the problem:

<?php
$testhtml="<html>\n<head>\n<META http-equiv=Content-Type content=\"text/html; charset=UTF-8\">\n</head>\n<body>\nSpecial Characters: ©,’,—,“,”,®,™,… </body>\n</html>";
print $testhtml;
?>

The sample output is shown here:
http://www.vacuumfoodsealer.info/utftest2.php

Special Characters: �,�,�,�,�,�,�,�

The result is garbled, which is correct in this case, because the
content-type of the page is UTF-8 and the characters are not encoded.

However, the second test script:

<?php
$testhtml="<html>\n<head>\n<META http-equiv=Content-Type content=\"text/html; charset=UTF-8\">\n</head>\n<body>\nSpecial Characters: ©,’,—,“,”,®,™,… </body>\n</html>";
print utf8_encode($testhtml);
?>

Produces this output here:
http://www.vacuumfoodsealer.info/utftest.php

Special Characters: ©,’,—,“,”,®,™,…

This time the characters have been encoded into UTF-8. Since the
content-type of the page is UTF-8 and the characters have been encoded
into UTF-8, why should they appear garbled? And if it is not a bug
in utf8_encode, what method should I use to correctly display these
characters in UTF-8? I don't know of any function that will convert
these characters!


Previous Comments:
------------------------------------------------------------------------

[2007-06-04 17:27:33] [EMAIL PROTECTED]

Thank you for this bug report. To properly diagnose the problem, we
need a short but complete example script to be able to reproduce
this bug ourselves.

A proper reproducing script starts with <?php and ends with ?>,
is max. 10-20 lines long and does not require any external
resources such as databases, etc. If the script requires a
database to demonstrate the issue, please make sure it creates
all necessary tables, stored procedures etc.

Please avoid embedding huge scripts into the report.

------------------------------------------------------------------------

[2007-06-01 01:32:55] [EMAIL PROTECTED]

My gut reaction to your problem is to mention that you've probably mixed
up ISO 8859-1 and Windows-1252: the two are commonly confused with each
other, and the Windows encoding contains several more characters.

However, said behavior does not precisely match up with your
predicament, as © and ® are part of ISO 8859-1. Furthermore, the URL you
supplied is already encoded in UTF-8. Perhaps you are double encoding?

Either way, this is not a problem with the documentation, except
possibly the fact that the user notes are waaaaay too long on
utf8_encode and some of the info needs to be integrated into the main
docs.

------------------------------------------------------------------------

[2007-06-01 00:57:31] victorepand at gmail dot com

Description:
------------
I have used the function utf8_encode to encode iso-8859-1 pages into
UTF-8 and displayed them on my site, but strange and funny characters
are appearing such as "" and "Â".

It turns out that the iso-8859-1 page contains characters such as
these: ©,’,—,“,”,®,™,…

These characters display fine in my browser on the iso-8859-1 page, but
when I use the utf8_encode function and display the result on my utf-8
page, it is garbled.

The only solution I have found is to manually convert all of the
characters above before calling utf8_encode. That solves the problem
crudely, but it is not a perfect solution. What if I have missed any
characters? Isn't there a cleaner method, a PHP function, that will do
all this conversion without worry and without missing any characters?

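One possible answer to the question above, offered as a sketch and not as a confirmed fix for this report: utf8_encode treats its input as ISO-8859-1, while characters such as ’ and ™ exist in Windows-1252 but not in ISO-8859-1. If the source page really is Windows-1252 (an assumption; the thread does not confirm it), converting with iconv and an explicit source charset avoids the manual replacement table. The helper name `cp1252_to_utf8` is hypothetical, and the code assumes PHP 7 or later with the iconv extension available.

```php
<?php
// Hypothetical helper (not from the report): decode the input as
// Windows-1252 instead of ISO-8859-1, then re-encode it as UTF-8.
function cp1252_to_utf8(string $bytes): string
{
    $utf8 = iconv('WINDOWS-1252', 'UTF-8', $bytes);
    if ($utf8 === false) {
        // iconv() returns false when it cannot convert the input,
        // for example on a byte that Windows-1252 leaves undefined.
        throw new RuntimeException('Input is not valid Windows-1252');
    }
    return $utf8;
}

// Bytes as they would appear in a Windows-1252 page: © ’ ® ™
$input = "\xA9,\x92,\xAE,\x99";

// Print the result as hex so the output does not depend on the terminal charset.
echo bin2hex(cp1252_to_utf8($input)), "\n";
```

Printing hex keeps the check independent of how the console or browser interprets the bytes; on a real page the converted string would simply be printed under a UTF-8 content type.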
Reproduce code:
---------------
Here is an example of an iso-8859-1 page which displays fine in my
browser, but contains characters such as the ©,’,—,“,”,®,™,… mentioned
above:

http://www.jardenstore.com/product.aspx?bid=18&pid=1251

Expected result:
----------------
After using the utf8_encode function, I expected to see the page
displaying correctly again on my UTF-8 page with these characters
intact: ©,’,—,“,”,®,™,…

Actual result:
--------------
Instead, the result was garbled like this:
â,â,â,â,Â,ââ¢,ââ¢,â,é,ð,â¢,,,è,Ž,
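The "Â" and "â" in the actual result are consistent with the double encoding suggested in the earlier comment: text that is already UTF-8 gets encoded a second time as if it were ISO-8859-1. The sketch below reproduces that effect on a single character and shows one way to guard against it. The helper name `to_utf8_once` and the default source charset are assumptions for illustration, not anything taken from the report. The validity check is a heuristic, because a short Windows-1252 string can happen to be valid UTF-8 as well.

```php
<?php
// Hypothetical helper (not from the report): convert to UTF-8 only when the
// input is not already well-formed UTF-8, so text is never encoded twice.
function to_utf8_once(string $bytes, string $assumedSource = 'WINDOWS-1252'): string
{
    // preg_match() with the /u modifier fails on subjects that are not
    // valid UTF-8, so a result of 1 means the bytes are well-formed.
    if (preg_match('//u', $bytes) === 1) {
        return $bytes;
    }
    $utf8 = iconv($assumedSource, 'UTF-8', $bytes);
    if ($utf8 === false) {
        throw new RuntimeException("Input is not valid $assumedSource");
    }
    return $utf8;
}

// "©" already encoded as UTF-8 (bytes C2 A9).
$alreadyUtf8 = "\xC2\xA9";

// Encoding it again as if it were ISO-8859-1 turns each byte into its own
// character, which a UTF-8 page displays as "Â©".
$doubleEncoded = iconv('ISO-8859-1', 'UTF-8', $alreadyUtf8);
echo bin2hex($doubleEncoded), "\n";             // c382c2a9

// The guarded helper leaves valid UTF-8 untouched.
echo bin2hex(to_utf8_once($alreadyUtf8)), "\n"; // c2a9
```

When the charset of the fetched page is known, for example from its Content-Type header, passing it explicitly is more reliable than guessing from the bytes.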
