Edit report at https://bugs.php.net/bug.php?id=62861&edit=1
 ID:                 62861
 User updated by:    soapergem at gmail dot com
 Reported by:        soapergem at gmail dot com
 Summary:            htmlentities returns empty string when it shouldn't
 Status:             Not a bug
 Type:               Bug
 Package:            *General Issues
 Operating System:   Windows
 PHP Version:        5.4.6
 Block user comment: N
 Private report:     N

 New Comment:

I am aware that Notepad is not a suitable editor for development. It is just the de facto "basic" editor in Windows. If something doesn't work in Notepad, you're usually in trouble. I use an editor called EditPlus, which is a very good editor. The older version which I have used does not have support for removing the BOM, but I see the newer version does, so I will have to upgrade.

But I would really appreciate it if you could address my suggestion about using the default_charset defined in php.ini automatically. Right now having to call htmlentities($string, ENT_COMPAT | ENT_HTML401, "") seems very counter-intuitive to invoke what should be the default.

Previous Comments:
------------------------------------------------------------------------
[2012-08-19 14:27:31] ras...@php.net

Every real editor can do that. Windows Notepad is not a real editor. Notepad++ (which is free and much much better than Notepad), Notepad2, Textmate, Vim, Jedit, Ultraedit, Emacs, SourceEdit can all do this.

------------------------------------------------------------------------
[2012-08-19 14:27:07] ni...@php.net

Windows Notepad does not support this because Notepad is not a suitable editor for development. All development-oriented texteditors and IDEs support saving files without BOM. One commonly used text editor for Windows is Notepad++ (in case you don't want to use a full-blown IDE).

------------------------------------------------------------------------
[2012-08-19 14:11:43] soapergem at gmail dot com

There is no option to save without the BOM in Windows Notepad. Nor is there an option to save with/without the BOM in many other Windows editors.
It is automatically added to the file and there is nothing I can do about that -- short of writing a script to programmatically go through all my other scripts with fopen(), remove the first three characters, and then re-save. That is NOT a practical option. PHP should be handling this.

As it stands, PHP 5.4 is completely unusable. Until you guys fix this, I need to stick with 5.3, because 5.4 will break all of my scripts -- and all the scripts of ANYONE who uses htmlentities() on a Windows server. Please take my suggestion about using the default_charset to heart. That would finally resolve this issue.

------------------------------------------------------------------------
[2012-08-19 13:59:09] ni...@php.net

Save your document as UTF-8 *without* BOM. The ï»¿ is just what the UTF-8 Byte Order Mark (BOM) looks like when it is output (which is probably something you don't want, so save the file without it).

------------------------------------------------------------------------
[2012-08-19 13:49:39] ras...@php.net

From my command line:

php > echo htmlentities('©', ENT_COMPAT | ENT_HTML401, 'UTF-8');
&copy;

it works fine. If you are actually providing the correct UTF-8 char it will work fine. You can verify that by doing this:

php > $a = chr(0xC2).chr(0xA9);
php > echo htmlentities($a, ENT_COMPAT | ENT_HTML401, 'UTF-8');
&copy;

Here I am explicitly passing C2A9 in and I get &copy; back out. So I have no idea what your Windows Notepad is doing. Look at the output with a hex editor and see what it is converting that copyright character to.

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view the rest of the comments, please view the bug report online at

    https://bugs.php.net/bug.php?id=62861

-- 
Edit this bug report at https://bugs.php.net/bug.php?id=62861&edit=1
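
For reference, the BOM clean-up script described in the 14:11:43 comment might look roughly like the sketch below. It is only an illustration: the function names strip_utf8_bom() and strip_bom_from_file() are hypothetical helpers, not PHP built-ins, and it uses file_get_contents()/file_put_contents() rather than the fopen() approach mentioned in the comment. It assumes the BOM is the three bytes EF BB BF at the very start of the file.

```php
<?php
// Hypothetical helper, not a PHP built-in: drops a leading UTF-8 BOM
// (bytes EF BB BF) from a string and returns everything else unchanged.
function strip_utf8_bom($text)
{
    if (strncmp($text, "\xEF\xBB\xBF", 3) === 0) {
        // The cast guards against substr() returning false on older PHP
        // versions when the string contains nothing but the BOM.
        return (string) substr($text, 3);
    }
    return $text;
}

// Hypothetical helper: rewrites a file in place only if it starts with a BOM.
// Returns true when the file was changed, false otherwise.
function strip_bom_from_file($path)
{
    $original = file_get_contents($path);
    if ($original === false) {
        return false; // unreadable file: leave it alone
    }
    $cleaned = strip_utf8_bom($original);
    if ($cleaned === $original) {
        return false; // no BOM present, nothing to do
    }
    return file_put_contents($path, $cleaned) !== false;
}
```

One possible use would be to loop over glob('*.php') in a project directory and call strip_bom_from_file() on each path, after taking a backup, since the files are rewritten in place.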