#25670 [WFx]: cannot yet handle MBCS in html_entity_decode()

nospam at unclassified dot de Sat, 27 Sep 2003 03:14:02 -0700

 ID:               25670
 User updated by:  nospam at unclassified dot de
 Reported By:      nospam at unclassified dot de
 Status:           Wont fix
 Bug Type:         *General Issues
 Operating System: Windows, Linux
 PHP Version:      4.3.2
 New Comment:


OK, I found a workaround for my issue anyway:

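<?php
// Returns the UTF-8 string corresponding to the Unicode code point $num
// (from php.net, courtesy - [EMAIL PROTECTED])
function code2utf($num)
{
        if ($num < 128) return chr($num);
        if ($num < 2048) return chr(($num >> 6) + 192) . chr(($num & 63) + 128);
        if ($num < 65536) return chr(($num >> 12) + 224) . chr((($num >> 6) & 63) + 128)
                                 . chr(($num & 63) + 128);
        if ($num < 2097152) return chr(($num >> 18) + 240) . chr((($num >> 12) & 63) + 128)
                                   . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
        return '';
}

// Callback for preg_replace_callback(): turns one "&#NNN;" match into UTF-8
function code2utf_callback($matches)
{
        return code2utf((int) $matches[1]);
}

// Converts a Latin-1 string to UTF-8 (utf8_encode() is deprecated as of
// PHP 8.2), then replaces decimal numeric entities (&#NNN;) with the
// corresponding UTF-8 characters. Named entities such as &uuml; are not
// handled. Uses preg_replace_callback() because the /e modifier was
// deprecated in PHP 5.5 and removed in PHP 7.0.
function encode($str)
{
        return preg_replace_callback('/&#(\d+);/', 'code2utf_callback', utf8_encode($str));
}
?>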
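As a worked example of the three-byte branch of code2utf(), here is the arithmetic for U+2248 ("almost equal to", the character used in one of the comments below). This is an illustrative sketch only. $num, $b1, $b2 and $b3 are hypothetical names and are not part of the workaround above.

```php
<?php
// Hypothetical worked example: code2utf()'s three-byte branch for U+2248.
$num = 0x2248;                        // 8776 decimal, below 65536
$b1 = chr(($num >> 12) + 224);        // 0x02 + 0xE0 = 0xE2 (lead byte 1110xxxx)
$b2 = chr((($num >> 6) & 63) + 128);  // 0x09 + 0x80 = 0x89 (continuation 10xxxxxx)
$b3 = chr(($num & 63) + 128);         // 0x08 + 0x80 = 0x88 (continuation 10xxxxxx)
echo bin2hex($b1 . $b2 . $b3), "\n";  // e28988
```

Each continuation byte carries 6 bits of the code point. That is why the code masks with 63 (0x3F) and adds 128 (0x80).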


Previous Comments:
------------------------------------------------------------------------

[2003-09-27 00:07:57] [EMAIL PROTECTED]

This issue has already been addressed, and the fix is ready for PHP 5.
However, we won't introduce this feature into the current stable
version (4.3.x).

See:
http://cvs.php.net/diff.php/php-src/NEWS?r1=1.1403&r2=1.1404



------------------------------------------------------------------------

[2003-09-26 09:30:48] nospam at unclassified dot de

A small correction: it doesn't fall back to Latin-1; it simply does
nothing. I tested with a Unicode character (&#x2248;, "almost equal to")
and it returned the entity text unchanged.
This can be checked by passing the output of html_entity_decode() to
htmlspecialchars() and echoing the result.
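The check described above can be sketched as follows. This is a minimal sketch assuming PHP 5.4 or later, where html_entity_decode() handles UTF-8 and numeric entities. On the PHP 4.3.2 build from this report, the reporter saw the entity come back undecoded instead. The variable names are illustrative.

```php
<?php
// Decode a hexadecimal numeric entity for U+2248 ("almost equal to") into UTF-8.
$decoded = html_entity_decode("&#x2248;", ENT_QUOTES, "UTF-8");

// htmlspecialchars() escapes any leftover '&', so an undecoded entity would
// show up literally in the browser as "&#x2248;" rather than as the glyph.
$visible = htmlspecialchars($decoded, ENT_QUOTES, "UTF-8");
echo $visible, "\n";

// Byte-level view: a successful decode yields the UTF-8 bytes E2 89 88.
echo bin2hex($decoded), "\n";
```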

------------------------------------------------------------------------

[2003-09-26 09:24:52] nospam at unclassified dot de

Description:
------------
Trying to decode HTML entities into UTF-8 results in the following
error message:

Warning: cannot yet handle MBCS in html_entity_decode()!

The warning is repeated about 200 times, and then html_entity_decode()
falls back to the ISO-8859-1 charset.

Reproduce code:
---------------
echo html_entity_decode("&uuml;", ENT_QUOTES, "UTF-8");

Expected result:
----------------
the UTF-8 encoding of 'ü'

Actual result:
--------------
the error messages shown above, then the Latin-1 encoding of 'ü'
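To tell the expected and actual results apart, compare the raw bytes rather than what the terminal displays. This is a minimal sketch assuming PHP 5.4 or later, where the fix referenced in this thread has long since shipped. It is not the reporter's code, and the variable names are illustrative.

```php
<?php
// The reproduce call from the report: decode &uuml; into UTF-8.
$utf8 = html_entity_decode("&uuml;", ENT_QUOTES, "UTF-8");

// The same entity decoded into ISO-8859-1 (Latin-1), the reported fallback.
$latin1 = html_entity_decode("&uuml;", ENT_QUOTES, "ISO-8859-1");

echo bin2hex($utf8), "\n";   // c3bc: two bytes in UTF-8
echo bin2hex($latin1), "\n"; // fc: a single byte in Latin-1
```

A lone 0xFC byte is not valid UTF-8, which is why the Latin-1 result shows up as a replacement character on a UTF-8 page.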


------------------------------------------------------------------------

