Edit report at https://bugs.php.net/bug.php?id=47494&edit=1

 ID:                 47494
 Updated by:         ras...@php.net
 Reported by:        philipp dot feigl at gmail dot com
 Summary:            htmlspecialchars does not throw E_WARNING on multibyte problems
 Status:             Not a bug
 Type:               Feature/Change Request
 Package:            Strings related
 Operating System:   CentOS5
 PHP Version:        5.2.8
 Block user comment: N
 Private report:     N

 New Comment:

By simple I assume you mean an htmlspecialchars() function that doesn't
check the validity of the characters. The problem is that we have to do
that. We can't encode characters without understanding which charset we
are dealing with, and we need to make sure that the character we are
looking at is a valid one. The world has moved beyond 7-bit ASCII, sorry.

Previous Comments:
------------------------------------------------------------------------
[2012-09-13 17:07:47] lzsiga at freemail dot c3 dot hu

If the name of the function were
'check_for_multibyte_validity_and_htmlspecialchars' then you'd be right,
but even then I'd lobby for a simple 'htmlspecialchars' function...

Doing something (i.e. a multibyte validity check) that the user (the PHP
programmer in this case) didn't specifically ask for doesn't seem to me
to be a good idea (see magic_quotes for another example).

PS: Of course I wouldn't be complaining (or even know about the whole
question) if the default value hadn't been changed to 'UTF-8' in 5.4.

------------------------------------------------------------------------
[2012-09-06 15:33:13] ras...@php.net

Also note that many, if not most, apps use this as their only validity
filter, and if you output invalid UTF-8, for example, it can lead to
security problems like the well-known IE 0xE0 XSS exploit. So at some
point along the line you have to do a multi-byte check, and it may as
well be here since we need to do it anyway.

------------------------------------------------------------------------
[2012-09-06 15:29:07] ras...@php.net

You assume ASCII7 compatibility for all encodings, which is a bad
assumption.
------------------------------------------------------------------------
[2012-09-06 11:39:19] lzsiga at freemail dot c3 dot hu

IMHO htmlspecialchars should not check for multi-byte validity at all,
because it only deals with a few characters that are all in ASCII7, so
it could safely ignore every byte between 0x80 and 0xFF. The third
parameter could simply be ignored (as if it were 'ISO-8859-1').

------------------------------------------------------------------------
[2012-08-30 19:21:49] ni...@php.net

@the disappointed user: PHP 5.4 no longer throws said warning (it was
just confusing). Instead there are several new options for dealing with
incorrect encoding.

Of particular interest is ENT_SUBSTITUTE, which will replace invalid
code unit sequences with the Unicode Replacement Character (instead of
returning a rather unhelpful empty string). This way you can easily spot
where the string is incorrectly encoded. Furthermore, this option has
the additional advantage of being more graceful: it replaces only the
individual incorrectly encoded bytes rather than discarding the whole
string.

Hope this helps you. More info in the docs:
http://de2.php.net/htmlspecialchars

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

    https://bugs.php.net/bug.php?id=47494
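
The behaviour described in the 2012-08-30 comment can be sketched as
follows. This is a minimal illustration and not code from the report:
the sample string and the variable names are assumptions chosen for the
example. The byte 0xE9 is "é" in ISO-8859-1 but an incomplete sequence
in UTF-8, which is the kind of input that triggers the problem discussed
in this thread. Flags are passed explicitly so the result does not
depend on the defaults of a particular PHP version.

```php
<?php
// Hypothetical sample input: Latin-1 bytes handed to a function that
// is told to expect UTF-8. "\xE9" followed by a space is invalid UTF-8.
$input = "caf\xE9 <b>";

// Without ENT_SUBSTITUTE, invalid input makes the whole result an
// empty string.
$strict = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// With ENT_SUBSTITUTE, the invalid sequence is replaced by U+FFFD
// (bytes EF BF BD in UTF-8) and the rest of the string is escaped
// as usual.
$substituted = htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// If the data really is ISO-8859-1, declaring that charset avoids the
// problem: every byte is valid there, so nothing is dropped or replaced.
$latin1 = htmlspecialchars($input, ENT_QUOTES, 'ISO-8859-1');

var_dump($strict === '');
var_dump(bin2hex($substituted));
var_dump($latin1);
```

Comparing the hex dump of $substituted with the input makes it easy to
see where the replacement character was inserted, which is the
"spot where the string is incorrectly encoded" use mentioned above.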