ID: 28654
Updated by: [EMAIL PROTECTED]
Reported By: [EMAIL PROTECTED]
-Status: Open
+Status: Analyzed
-Bug Type: *XML functions
+Bug Type: *Languages/Translation
Operating System: WinXP
PHP Version: 4.3.4
New Comment:
utf8_encode() only handles ISO-8859-1, which does not define
characters in the range 128 to 159. It should probably just replace
those characters with a question mark, though, as that is how invalid
characters are usually converted.
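To show why the result looks empty, here is a minimal sketch of the byte mapping utf8_encode() performs. It assumes, as the comment above says, that every input byte is an ISO-8859-1 code, so byte b simply becomes code point U+00bb. The helper name latin1_to_utf8 is hypothetical. Byte 147 (0x93) therefore becomes U+0093, an invisible C1 control character encoded as C2 93, not the quotation mark U+201C.

```php
<?php
// Hypothetical helper reproducing utf8_encode()'s mapping:
// each byte is treated as an ISO-8859-1 code, i.e. byte b -> U+00bb.
function latin1_to_utf8($str)
{
    $out = '';
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($str[$i]);
        if ($b < 0x80) {
            // ASCII passes through unchanged
            $out .= $str[$i];
        } else {
            // U+0080..U+00FF need two UTF-8 bytes: 110000xx 10xxxxxx
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

// Byte 147 (0x93) becomes U+0093, a C1 control character, not U+201C
echo bin2hex(latin1_to_utf8("\x93")), "\n"; // c293
```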
Previous Comments:
------------------------------------------------------------------------
[2004-06-06 22:55:32] [EMAIL PROTECTED]
Description:
------------
Hi!
I'm currently developing a nice script that generates OpenOffice SXW
files by filling the content.xml (which is UTF-8 encoded) with database
content. While doing this I found that utf8_encode('“') (character code
147) returns what appears to be an empty string ''. When I checked the
whole result in OpenOffice, the character is displayed as a square
(unknown character?!). So I ran some tests with UTF-8 conversion
(including the mb_* functions) and noticed that the characters between
128 and 159 returned by utf8_encode() don't seem to match the standard.
As mentioned above, character 147 comes back as '' but should be “
(U+201C, encoded in UTF-8 as the bytes E2 80 9C), which is what
UltraEdit produces when converting.
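A quick way to check whether a string is affected is to look for bytes 128 to 159. ISO-8859-1 has no printable characters there (utf8_encode() turns them into the C1 control characters U+0080 to U+009F), so their presence in ordinary text suggests it was written in a different 8-bit encoding; on Windows that is most likely Windows-1252, whose byte 147 is “. A minimal sketch; the function name has_c1_bytes is hypothetical.

```php
<?php
// Hypothetical helper: true if $str contains any byte in 0x80..0x9F,
// i.e. bytes that utf8_encode() would turn into C1 control characters.
function has_c1_bytes($str)
{
    // Without the /u modifier PCRE matches raw bytes, not UTF-8 characters
    return preg_match('/[\x80-\x9F]/', $str) === 1;
}

var_dump(has_c1_bytes("plain ASCII"));    // bool(false)
var_dump(has_c1_bytes("\x93quoted\x94")); // bool(true)
```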
Can anyone explain this?
I'm not familiar with this UTF-8 / bit-conversion stuff, but I don't
think PHP does what it's supposed to do here. As a first workaround I
simply wrote a custom_utf8_encode() that uses its own character map to
override this misbehaviour (see below). Can someone help me out with
this strange bug?!
Regards
Bjoern Kraus
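function custom_utf8_encode($str)
{
    // Bytes 128-159 mapped to the characters they represent in
    // Windows-1252; unassigned codes (129, 141, 143, 144, 157) map to ''.
    // Save this file as UTF-8 so the map values are UTF-8 strings.
    $chrMap = array(128 => '€', 129 => '',  130 => '‚', 131 => 'ƒ',
                    132 => '„', 133 => '…', 134 => '†', 135 => '‡',
                    136 => 'ˆ', 137 => '‰', 138 => 'Š', 139 => '‹',
                    140 => 'Œ', 141 => '',  142 => 'Ž', 143 => '',
                    144 => '',  145 => '‘', 146 => '’', 147 => '“',
                    148 => '”', 149 => '•', 150 => '–', 151 => '—',
                    152 => '˜', 153 => '™', 154 => 'š', 155 => '›',
                    156 => 'œ', 157 => '',  158 => 'ž', 159 => 'Ÿ');
    $newStr = '';
    for ($i = 0; $i < strlen($str); $i++) {
        $chrVal = ord($str[$i]);
        if ($chrVal > 127 && $chrVal < 160) {
            // Use the custom map instead of utf8_encode() for 128-159
            $newStr .= $chrMap[$chrVal];
        } else {
            // 0-127 and 160-255 are the same in ISO-8859-1, so
            // utf8_encode() (deprecated as of PHP 8.2) is correct here
            $newStr .= utf8_encode($str[$i]);
        }
    }
    return $newStr;
}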
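The character map above matches the Windows-1252 code page in the 128 to 159 range. A variant that stores Unicode code points instead of literal characters does not depend on the encoding the PHP file itself is saved in. This is a sketch under that assumption: the function names cp1252_to_utf8 and codepoint_to_utf8 are hypothetical, and unassigned bytes (129, 141, 143, 144, 157) become '?' as suggested in the comment above, rather than ''.

```php
<?php
// Hypothetical helper: encode a BMP code point (U+0000..U+FFFF) as UTF-8.
function codepoint_to_utf8($cp)
{
    if ($cp < 0x80) {
        return chr($cp);                              // 1 byte
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))                 // 2 bytes
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xE0 | ($cp >> 12))                    // 3 bytes
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: Windows-1252 bytes -> UTF-8.
// Bytes 128-159 use the table below; every other byte b is U+00bb.
function cp1252_to_utf8($str)
{
    static $map = array(
        128 => 0x20AC, 130 => 0x201A, 131 => 0x0192, 132 => 0x201E,
        133 => 0x2026, 134 => 0x2020, 135 => 0x2021, 136 => 0x02C6,
        137 => 0x2030, 138 => 0x0160, 139 => 0x2039, 140 => 0x0152,
        142 => 0x017D, 145 => 0x2018, 146 => 0x2019, 147 => 0x201C,
        148 => 0x201D, 149 => 0x2022, 150 => 0x2013, 151 => 0x2014,
        152 => 0x02DC, 153 => 0x2122, 154 => 0x0161, 155 => 0x203A,
        156 => 0x0153, 158 => 0x017E, 159 => 0x0178,
    );
    $out = '';
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($str[$i]);
        if ($b >= 128 && $b <= 159) {
            // Unassigned bytes (129, 141, 143, 144, 157) become '?'
            $out .= isset($map[$b]) ? codepoint_to_utf8($map[$b]) : '?';
        } else {
            $out .= codepoint_to_utf8($b);
        }
    }
    return $out;
}

echo bin2hex(cp1252_to_utf8("\x93")), "\n"; // e2809c
```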
------------------------------------------------------------------------