Edit report at https://bugs.php.net/bug.php?id=47494&edit=1
ID: 47494
Updated by: ras...@php.net
Reported by: philipp dot feigl at gmail dot com
Summary: htmlspecialchars does not throw E_WARNING on multibyte problems
Status: Not a bug
Type: Feature/Change Request
Package: Strings related
Operating System: CentOS5
PHP Version: 5.2.8
Block user comment: N
Private report: N

New Comment:

You assume ASCII7 compatibility for all encodings, which is a bad assumption.

Previous Comments:
------------------------------------------------------------------------
[2012-09-06 11:39:19] lzsiga at freemail dot c3 dot hu

IMHO, htmlspecialchars should not check for multibyte validity at all. It only deals with a few characters, all of which are in ASCII7, so it could safely ignore every byte between 0x80 and 0xFF. The third parameter could simply be ignored (as if it were 'ISO-8859-1').

------------------------------------------------------------------------
[2012-08-30 19:21:49] ni...@php.net

@the disappointed user: PHP 5.4 no longer throws that warning (it was just confusing). Instead, there are several new options for dealing with incorrect encoding. Of particular interest is ENT_SUBSTITUTE, which replaces invalid code unit sequences with the Unicode Replacement Character (instead of returning a rather unhelpful empty string). This way you can easily spot where the string is incorrectly encoded. The option also degrades more gracefully: it only affects the incorrectly encoded bytes rather than discarding the whole string.

Hope this helps you. More info in the docs: http://de2.php.net/htmlspecialchars

------------------------------------------------------------------------
[2012-08-30 19:01:22] another_disappointed_php_programmer at exam

This is very sad. This is a bug, and it's sad that the PHP core developers said it's a feature and won't be fixed. I'm disappointed.

------------------------------------------------------------------------
[2012-07-01 15:34:03] ras...@php.net

This really isn't a bug.
I do agree that the approach isn't ideal, but we shouldn't throw warnings on bad input here, because htmlspecialchars() is explicitly designed to clean up bad input and is run directly on user data most of the time. To avoid the warning, users would first have to call something like iconv('utf-8','utf-8') to clean up the input data, which doesn't make much sense since htmlspecialchars() essentially does that already. However, to help with debugging there should be some way to see why an htmlspecialchars() call failed, so a last_error() function, similar to how errors are reported for JSON decoding, would make sense.

------------------------------------------------------------------------
[2012-07-01 15:12:31] chris at cbsinteractive dot com

Happening on our production servers; can replicate. PHP 5.3.10, CentOS 5.6

------------------------------------------------------------------------

The remaining comments for this report are too long to include here. To view them, please see the bug report online at https://bugs.php.net/bug.php?id=47494
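The following PHP sketch illustrates two technical points from the thread: nikic's description of ENT_SUBSTITUTE (empty string versus replacement character on invalid UTF-8), and the disagreement between lzsiga and Rasmus over whether only ASCII bytes matter. It assumes PHP 5.4 or later. The sample strings, the $asciiMap table and the byte-wise strtr() escaper are illustrative inventions, not code proposed in the report.

```php
<?php
// Minimal sketch, assuming PHP >= 5.4 (ENT_SUBSTITUTE was added in 5.4).
// All sample strings below are hypothetical illustrations.

// "café <b> ", then a lone 0xFF byte (never valid in UTF-8), then " end".
$input = "caf\xC3\xA9 <b> \xFF end";

// Flags are passed explicitly. Without ENT_SUBSTITUTE (or ENT_IGNORE),
// invalid UTF-8 makes htmlspecialchars() return an empty string.
$plain = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// With ENT_SUBSTITUTE, the invalid byte becomes U+FFFD (bytes EF BF BD)
// and the rest of the string is escaped as usual.
$substituted = htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($plain); // string(0) ""

// Whether "only ASCII bytes matter" depends on the encoding.
// U+263C is E2 98 BC in UTF-8 (no ASCII bytes at all), but 26 3C in
// UTF-16BE, which are the same bytes as "&<". A hypothetical byte-wise
// ASCII escaper leaves the UTF-8 form alone but corrupts the UTF-16BE form.
$asciiMap = array(
    '&'  => '&amp;',
    '<'  => '&lt;',
    '>'  => '&gt;',
    '"'  => '&quot;',
    "'"  => '&#039;',
);
$utf8Sun  = strtr("\xE2\x98\xBC", $asciiMap); // unchanged
$utf16Sun = strtr("\x26\x3C", $asciiMap);     // "&amp;&lt;", character destroyed
```

The UTF-16 case demonstrates the general point about encodings that are not ASCII-compatible; htmlspecialchars() itself only accepts a fixed list of supported character sets, so check the manual for which ones apply.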