 ID:               43549
 Updated by:       [EMAIL PROTECTED]
 Reported By:      mariusads at helpedia dot com
-Status:           Assigned
+Status:           Wont fix
 Bug Type:         Strings related
 Operating System: Redhat?, Linux
 PHP Version:      5.2.5
 Assigned To:      stas
 New Comment:

As function seems to work as intended and there's other way for sanitizing utf-8, I'm marking it as wontfix for now, unless any new info arrives.


Previous Comments:
------------------------------------------------------------------------

[2008-01-29 21:13:16] [EMAIL PROTECTED]

As I commented in that bug, assuming you are passing in that character properly encoded, it will work. Nothing in that bug report shows an actual problem as you don't show the exact byte sequence you are passing in.

------------------------------------------------------------------------

[2008-01-29 14:31:46] tallyce at gmail dot com

Thanks, but see http://bugs.php.net/43294 which shows that the dagger character (and others) results in the whole string disappearing, on some installations at least.

I thought the dagger character was a valid UTF8 string, or would a submission of that character be considered "invalid input"?

------------------------------------------------------------------------

[2008-01-28 23:32:01] [EMAIL PROTECTED]

It comes down to what to do with invalid input. We can't let invalid UTF-8 through, because if you do, your site will be insecure. Before this fix, your site was actually open to XSS exploits since you were spitting invalid UTF-8 chars out on a page marked as UTF-8 and that confuses IE.

I suppose we could change htmlentities to just strip the invalid chars, but from a security perspective that is typically not the right approach. You can strip the invalid utf-8 chars yourself with:

$str = iconv('utf-8','utf-8',$str);

------------------------------------------------------------------------

[2008-01-24 20:54:10] tallyce at gmail dot com

See also bugs 43294 and 43896 which seem to be the same thing.

This is really starting to bite now. Please can this be fixed, or suggest how we can reliably process incoming user data in UTF8 given this behaviour change!

I concur this seems to be installation specific and earlier than 5.2.5 as shown in bug 43294.

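
The "invalid input" question raised in the comments above can be checked directly before calling htmlentities(). The following is only a sketch, not something posted in this report: the helper name escape_utf8_or_null is hypothetical, and it assumes the PCRE extension is available so that the /u modifier can serve as a UTF-8 validity check. It uses the test string from this bug (aou_äöü), once as UTF-8 bytes and once as ISO-8859-1 bytes.

```php
<?php
// Hypothetical helper (not part of PHP or of this bug report): returns the
// escaped string, or null when the input is not well-formed UTF-8, so the
// caller can decide whether to reject, convert, or strip it.
function escape_utf8_or_null($str)
{
    // With the /u modifier, preg_match() does not return 1 when the
    // subject is malformed UTF-8, which makes it a cheap validity check.
    if (preg_match('//u', $str) !== 1) {
        return null;
    }
    return htmlentities($str, ENT_QUOTES, 'UTF-8');
}

$valid   = "aou_\xC3\xA4\xC3\xB6\xC3\xBC"; // "aou_äöü" encoded as UTF-8
$invalid = "aou_\xE4\xF6\xFC";             // same text as ISO-8859-1 bytes

var_dump(escape_utf8_or_null($valid));   // the string aou_&auml;&ouml;&uuml;
var_dump(escape_utf8_or_null($invalid)); // NULL
```

If the second case is what an installation actually sees, the source file or the incoming data is probably not UTF-8 in the first place, and converting it from its real encoding may be a better fit than stripping bytes. What the iconv() call quoted above does with invalid bytes may vary with the PHP version and the underlying iconv library, so its return value is worth checking as well.
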
------------------------------------------------------------------------

[2008-01-14 08:36:21] s-beutel at gmx dot de

Hi,

I confirm the very same issue for PHP 5.2.1/Apache2/RedHat.

- has nothing to do with the browser encoding or GET'ed/POST'ed variables, since I simply convert a static string
- seems to be installation specific, since it runs perfectly on my Windows box (PHP 5.2.0)
- I have the idea - but no evidence yet - that it's an older issue: for almost one year I tried to fix an issue with a tiny webshop which is an outcome of this, and which some users have been complaining about every now and then (obviously, without debugging or narrower information)

Example script:
http://sbeutel.sb.ohost.de/trans.php

Plain text code:
http://sbeutel.sb.ohost.de/trans.txt

It simply encodes the string aou_äöü with various settings, and htmlentities($str,ENT_QUOTES,'utf-8'); outputs nothing at all as soon as the string contains non-ASCII characters (German umlauts, in this case).

Hope this helps. Contact me if I may provide more information.

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view the rest of the comments, please view the bug report online at http://bugs.php.net/43549
