Edit report at https://bugs.php.net/bug.php?id=62861&edit=1

 ID:                 62861
 Updated by:         ras...@php.net
 Reported by:        soapergem at gmail dot com
 Summary:            htmlentities returns empty string when it shouldn't
-Status:             Open
+Status:             Not a bug
 Type:               Bug
 Package:            *General Issues
 Operating System:   Windows
 PHP Version:        5.4.6
 Block user comment: N
 Private report:     N

 New Comment:

UTF-8 is only compatible with low ASCII (bytes 00 through 7F), not with the 
high range. The copyright symbol in ISO-8859-1 is the single byte (in hex) 
<A9>. In UTF-8 the copyright symbol is represented by two bytes, <C2><A9>. A 
lone <A9> is not a valid UTF-8 sequence, and htmlentities() returns an empty 
string for invalid input unless ENT_IGNORE or ENT_SUBSTITUTE is set. The world 
has gone UTF-8. If your editor is in UTF-8 mode and you enter/paste a copyright 
symbol and pass it to htmlentities() you will get "&copy;" back. So rather than 
change the code to hardcode ISO-8859-1 you should convert your data sources to 
UTF-8. Most of them are probably already UTF-8, which means that your current 
code was likely not handling them correctly, since it assumed ISO-8859-1 before.
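
A minimal sketch of the difference described above. It writes the copyright 
symbol as explicit byte escapes so the result does not depend on the encoding 
the script file was saved in. The variable names are illustrative, and the use 
of iconv() for the conversion is an assumption (any ISO-8859-1 to UTF-8 
converter would do); it is not something the report itself prescribes.

```php
<?php
// The copyright symbol as raw bytes, independent of the editor's encoding.
$latin1 = "\xA9";      // ISO-8859-1: a single byte
$utf8   = "\xC2\xA9";  // UTF-8: two bytes

$flags = ENT_COMPAT | ENT_HTML401;

// A lone 0xA9 byte is not valid UTF-8, so htmlentities() gives up
// and returns an empty string.
$bad = htmlentities($latin1, $flags, 'UTF-8');

// The same symbol as valid UTF-8 is encoded as expected.
$good = htmlentities($utf8, $flags, 'UTF-8');

// Converting the ISO-8859-1 data to UTF-8 first (assumes the iconv
// extension is available) yields the same result.
$converted = htmlentities(iconv('ISO-8859-1', 'UTF-8', $latin1), $flags, 'UTF-8');

var_dump($bad, $good, $converted);
```

Converting once, where the data enters the application, avoids having to pass 
an explicit encoding at every htmlentities() call site.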

For some perspective: 
http://w3techs.com/technologies/overview/character_encoding/all
which shows that 72% of the top-million sites on the Web are using UTF-8. And 
this number is growing.


Previous Comments:
------------------------------------------------------------------------
[2012-08-19 04:14:03] soapergem at gmail dot com

Description:
------------
Doesn't UTF-8 include basic ASCII characters, too? Right now when I try to 
encode the copyright symbol (©) using htmlentities (it should encode to 
&copy;), it doesn't work. I discovered this since the default encoding for 
htmlentities() was switched from ISO-8859-1 to UTF-8 in version 5.4.

I have plenty of places where I rely on basic symbols, such as the copyright 
symbol, being encoded properly with htmlentities(). Having to go in and change 
all the instances of htmlentities($string) to htmlentities($string, ENT_COMPAT 
| ENT_HTML401, 'ISO-8859-1') is not practical (there are MANY). And with the 
whole output of the function being blank, it just makes my scripts completely 
unusable now.

Help!

Test script:
---------------
<?php

echo htmlentities('©', ENT_COMPAT | ENT_HTML401, 'UTF-8');

?>

Expected result:
----------------
&copy;

Actual result:
--------------
(Nothing - an empty string)
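
One way to check what the test script is really passing to htmlentities() is to 
look at the raw bytes of the string. The sketch below is only an illustration: 
describe_bytes() is a hypothetical helper, not a PHP function, and it relies on 
preg_match() returning false for malformed UTF-8 when the /u modifier is used.

```php
<?php
// Hypothetical helper (not part of PHP): reports the raw bytes of a
// string in hex and whether they form a valid UTF-8 sequence.
function describe_bytes($s)
{
    // With the /u modifier, preg_match() returns false on malformed UTF-8.
    $valid = preg_match('//u', $s) === 1;
    return bin2hex($s) . ($valid ? ' (valid UTF-8)' : ' (not valid UTF-8)');
}

echo describe_bytes("\xA9"), "\n";      // the symbol saved as ISO-8859-1
echo describe_bytes("\xC2\xA9"), "\n";  // the symbol saved as UTF-8
```

Calling describe_bytes('©') inside the script itself would show which of the 
two forms the editor actually saved.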


------------------------------------------------------------------------


