Edit report at https://bugs.php.net/bug.php?id=60884&edit=1
ID: 60884
User updated by: t dot nickl at exse dot de
Reported by: t dot nickl at exse dot de
Summary: htmlentities() behaves differently and thus breaks existing code
Status: Bogus
Type: Bug
Package: *General Issues
Operating System: CentOS 4.4
PHP Version: 5.4.0RC6
Block user comment: N
Private report: N
New Comment:
@[email protected]:
Setting default_charset to latin1 does not work. An empty string is still
output when calling htmlentities() with only one argument.
Your copy-and-paste preamble does not help. Changing the meaning of existing
code is a bug, don't worry.
@[email protected]:
Thank you. Sadly, I will change every htmlentities($a) to
htmlentities($a,NULL,'') before deploying PHP 5.4.
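Where changing every call site by hand is impractical, one possible alternative is a small wrapper that chooses the charset per string. The sketch below is not from this thread: entities_compat() is a hypothetical helper name, and treating every string that is not valid UTF-8 as ISO-8859-1 is an assumption that only suits legacy Latin-1 data.

```php
<?php
// Hypothetical helper (not part of PHP and not proposed in this report).
// Assumption: input is either valid UTF-8 or legacy ISO-8859-1.
function entities_compat($s)
{
    // preg_match() with the /u modifier does not return 1 for invalid
    // UTF-8 subjects, so it can serve as a validity check.
    $isUtf8 = (@preg_match('//u', $s) === 1);
    $charset = $isUtf8 ? 'UTF-8' : 'ISO-8859-1';

    // Flags are spelled out explicitly instead of passing NULL.
    return htmlentities($s, ENT_COMPAT | ENT_HTML401, $charset);
}

// "\xE4" is 'ä' in ISO-8859-1; "\xC3\xA4" is 'ä' in UTF-8.
$fromLatin1 = entities_compat("Rechnungsadresse \xE4ndern");
$fromUtf8   = entities_compat("Rechnungsadresse \xC3\xA4ndern");

var_dump($fromLatin1);
var_dump($fromUtf8);
```

This is a heuristic only: a Latin-1 string can occasionally also be a valid UTF-8 byte sequence, in which case it would be read as UTF-8.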
Previous Comments:
------------------------------------------------------------------------
[2012-01-25 22:52:52] [email protected]
I know it hurts, but we really need to move away from ISO-8859-1 and towards
UTF-8 as the default charset of the Web. We have chosen to take the hit in 5.4.
The documentation has carried a warning about this impending change for quite a
while urging people to specify a charset.
For PHP 5.4 compatibility Typo3 should either hardcode iso-8859-1 or they
should change their calls to:
htmlentities($a,NULL,'')
to pick up the default script-encoding charset.
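The first option, hardcoding the charset, might look like the sketch below. latin1_entities() is a hypothetical name used only for illustration. The flags are written out as ENT_COMPAT | ENT_HTML401 (the values used in the original report) rather than NULL, because a flags value of 0 equals ENT_NOQUOTES and so leaves double quotes unconverted.

```php
<?php
// Hypothetical wrapper: pins the charset so the result does not depend
// on which default a given PHP version uses.
function latin1_entities($s)
{
    return htmlentities($s, ENT_COMPAT | ENT_HTML401, 'ISO-8859-1');
}

// "\xE4" is the single ISO-8859-1 byte for 'ä'.
$encoded = latin1_entities("Rechnungsadresse \xE4ndern");
var_dump($encoded);

// Flags 0 is ENT_NOQUOTES: double quotes are left alone.
$noQuotes = htmlentities('"a"', 0, 'ISO-8859-1');
// ENT_COMPAT converts double quotes to &quot;.
$compat = htmlentities('"a"', ENT_COMPAT, 'ISO-8859-1');
var_dump($noQuotes, $compat);
```

Whether the difference in quote handling matters depends on where the output is used; inside HTML attribute values it usually does.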
------------------------------------------------------------------------
[2012-01-25 18:01:23] [email protected]
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php
In PHP 5.4 the default_charset php.ini option was set to utf-8. You can
override this in php.ini or .htaccess or such.
------------------------------------------------------------------------
[2012-01-25 15:29:09] t dot nickl at exse dot de
Description:
------------
// This code must be run via the web.
// This is a string, e.g. from some database, containing a German umlaut 'ä'.
// Note that the encoding really is ISO-8859-1; it is just assigned here
// literally to be concise.
$a = "Rechnungsadresse ändern";
// This works (an empty string activates some autodetection):
var_dump(htmlentities($a, ENT_COMPAT | ENT_HTML401, ''));
// This works too (the same output is generated):
var_dump(htmlentities($a, ENT_COMPAT | ENT_HTML401, 'ISO-8859-1'));
// This does NOT work (outputs an empty string):
var_dump(htmlentities($a));
// Reason: PHP changed the charset htmlentities() uses when you do NOT pass
// one (as in 90% of the code out there):
//determine_charset() :
///////////////////////////////////////////////////////
// php-5.2.1/ext/standard/html.c :
// /* Guarantee default behaviour for backwards compatibility */
// if (charset_hint == NULL)
// return cs_8859_1;
/////////////////////////////////////////////////////
// php-5.4.0RC4/ext/standard/html.c :
// /* Default is now UTF-8 */
// if (charset_hint == NULL)
// return cs_utf_8;
// This breaks the meaning of existing German code. For example, Typo3 outputs
// an empty string if end users used German umlauts in the rich text editor in
// the backend.
// Please change determine_charset() back to using cs_8859_1 if the third
// parameter of htmlentities() is omitted.
Test script:
---------------
See description.
Expected result:
----------------
See description.
Actual result:
--------------
See description.
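The behaviour in the description can be reproduced independently of the script file's own encoding by writing the umlaut as a byte escape. This is a minimal sketch, not the reporter's exact script: the charset is passed explicitly in both calls, so the 'UTF-8' call stands in for what the report says happens when the third argument is omitted on PHP 5.4.

```php
<?php
// "\xE4" is 'ä' encoded as ISO-8859-1, written as a byte escape so the
// example does not depend on how this file is saved.
$a = "Rechnungsadresse \xE4ndern";

// Explicit ISO-8859-1: the umlaut becomes a named entity.
$latin1 = htmlentities($a, ENT_COMPAT | ENT_HTML401, 'ISO-8859-1');

// Explicit UTF-8: the byte "\xE4" followed by "n" is not a valid UTF-8
// sequence, so htmlentities() returns an empty string.
$utf8 = htmlentities($a, ENT_COMPAT | ENT_HTML401, 'UTF-8');

var_dump($latin1);
var_dump($utf8);
```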
------------------------------------------------------------------------