Edit report at https://bugs.php.net/bug.php?id=60714&edit=1
ID:                 60714
Updated by:         [email protected]
Reported by:        mahatma at bspu dot unibel dot by
Summary:            htmlspecialchars() ignore default_charset value
Status:             Open
Type:               Bug
Package:            *Languages/Translation
Operating System:   linux
PHP Version:        5.4.0RC5
Block user comment: N
Private report:     N

New Comment:

The problem is the change of the default charset from ISO-8859-1 to UTF-8.
The ISO-8859-1 default worked for CP 1251 because the characters that
htmlspecialchars encodes are represented the same way in both charsets
(using htmlentities, however, would be a problem).

The break in 5.4 is that CP 1251 byte streams are, in general, not valid
UTF-8 byte streams, even though those characters are still represented the
same way. This causes htmlspecialchars to error out and return an empty
string.

Previous Comments:
------------------------------------------------------------------------
[2012-01-12 08:47:08] [email protected]

There was discussion. Simply changing the not-specified case to use the
default_charset setting would break a lot of code. However, since it is a
useful feature, it is supported through the empty string case as documented
at http://php.net/htmlspecialchars

------------------------------------------------------------------------
[2012-01-11 22:50:11] [email protected]

Yes, there's a BC break in 5.4 in this respect. You can make
htmlspecialchars use default_charset, but you'd still have to change all
the calls to use the empty string as charset.

I'm leaving this open for now, as this change was made without any
discussion I remember.

------------------------------------------------------------------------
[2012-01-11 15:38:30] mahatma at bspu dot unibel dot by

Description:
------------
Since the default charset changed, I have a compatibility problem:
htmlspecialchars() now strips cp1251 (or any other non-Unicode) national
symbols, and there is no way to set a different default charset.

But it looks like default_charset is provided for similar goals (and,
ideally, for charset= HTML detection too). I suggest simply taking the
default charset for htmlspecialchars() (and IMHO for htmlentities()) from
default_charset.

------------------------------------------------------------------------
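
For illustration, the behaviour described in the comments above can be reproduced with a short script. This is only a sketch: the sample string and the wrapper name `h()` are made up for the example, and the charset and flags are passed explicitly so that the result does not depend on the PHP version or ini settings in use.

```php
<?php
// "<b>Привет</b>" as CP1251 bytes. The byte 0xCF followed by 0xF0 is not a
// valid UTF-8 sequence, so this string is not valid UTF-8.
$cp1251 = "<b>\xCF\xF0\xE8\xE2\xE5\xF2</b>";

// Treating the input as UTF-8 (the 5.4 default): the invalid byte sequence
// makes htmlspecialchars() return an empty string.
$asUtf8 = htmlspecialchars($cp1251, ENT_COMPAT, 'UTF-8');

// Naming the real charset: the national characters pass through untouched
// and only the markup characters are escaped.
$asCp1251 = htmlspecialchars($cp1251, ENT_COMPAT, 'cp1251');

// Hypothetical helper (not part of PHP): one place to pin the charset so
// that individual call sites do not each need a third argument.
function h(string $text, string $charset = 'cp1251'): string
{
    return htmlspecialchars($text, ENT_COMPAT, $charset);
}

var_dump($asUtf8 === '');               // bool(true)
var_dump($asCp1251 === h($cp1251));     // bool(true)
```

A wrapper like this is one possible way to contain the BC break in existing code; it still requires touching every call, as noted in the comments above.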
