 ID:               28654
 Updated by:       [EMAIL PROTECTED]
 Reported By:      [EMAIL PROTECTED]
-Status:           Analyzed
+Status:           Assigned
 Bug Type:         *Languages/Translation
 Operating System: WinXP
 PHP Version:      4.3.4
-Assigned To:
+Assigned To:      moriyoshi

New Comment:

Moriyoshi: Was your last comment saying that this is a bug in PHP? Is this a verified bug, and if so, can you fix it? (If it's not a bug, mark it bogus.)


Previous Comments:
------------------------------------------------------------------------

[2004-06-14 21:00:29] [EMAIL PROTECTED]

It looks like you are trying to convert between code page 1252 and UTF-8.

http://www.microsoft.com/globaldev/reference/sbcs/1252.htm

Not only mbstring but also most iconv() implementations support CP1252 (a.k.a. IBM1252).

HTH

------------------------------------------------------------------------

[2004-06-10 00:11:14] [EMAIL PROTECTED]

Hm, which ISO standard do I use (German, Win32) when I paste Word text into a textarea and post it to a PHP script? Is it possible to solve my problem by converting my character encoding to ISO-8859-1 with the mb functions?

------------------------------------------------------------------------

[2004-06-08 09:38:23] [EMAIL PROTECTED]

utf8_encode only deals with ISO-8859-1, which does not define printable characters in the range from 128 to 159. It should probably just replace those characters with a question mark, though, as that's how invalid characters are usually converted.

------------------------------------------------------------------------

[2004-06-06 22:55:32] [EMAIL PROTECTED]

Description:
------------
Hi!

I'm currently developing a script that generates OpenOffice SXW files by filling content.xml (which is UTF-8 encoded) with database content.

While doing this I found out that utf8_encode('“') (charcode 147) returns ''. When I checked the whole result in OpenOffice, '“' was displayed as a square (character unknown?!).

So I ran some tests with UTF-8 conversion (including the mb_* functions) and noticed that the characters between 128 and 160 returned by utf8_encode() don't seem to match the standard. As mentioned above, '“' is returned as '' but should be the proper UTF-8 sequence for '“' (as you will get when using UltraEdit for the conversion).

Can anyone give me an explanation here? I'm not familiar with this UTF-8 / bit-conversion stuff, but I don't think PHP does what it's supposed to do here.

As a first workaround I simply coded a custom_utf8_encode() that uses its own character map to override this misbehaviour (see below).

Can someone help me out with this strange bug?!

Regards
Bjoern Kraus


function custom_utf8_encode($str)
{
    // Code page 1252 characters for bytes 128-159. Bytes that CP1252
    // leaves undefined (129, 141, 143, 144, 157) map to an empty string.
    // This file must be saved as UTF-8 so the literals are UTF-8 sequences.
    $chrMap = array(
        128 => '€', 129 => '',  130 => '‚', 131 => 'ƒ',
        132 => '„', 133 => '…', 134 => '†', 135 => '‡',
        136 => 'ˆ', 137 => '‰', 138 => 'Š', 139 => '‹',
        140 => 'Œ', 141 => '',  142 => 'Ž', 143 => '',
        144 => '',  145 => '‘', 146 => '’', 147 => '“',
        148 => '”', 149 => '•', 150 => '–', 151 => '—',
        152 => '˜', 153 => '™', 154 => 'š', 155 => '›',
        156 => 'œ', 157 => '',  158 => 'ž', 159 => 'Ÿ');

    $newStr = '';
    $len = strlen($str);

    for ($i = 0; $i < $len; $i++) {
        $chrVal = ord($str[$i]);

        if ($chrVal > 127 && $chrVal < 160) {
            // Byte is in the CP1252-only range: use the map above.
            $newStr .= $chrMap[$chrVal];
        } else {
            // Everything else is identical in ISO-8859-1 and CP1252.
            $newStr .= utf8_encode($str[$i]);
        }
    }

    return $newStr;
}

------------------------------------------------------------------------
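
The suggestion in the 2004-06-14 comment (treat the input as code page 1252 and let mbstring or iconv do the conversion) could look roughly like the sketch below. The helper name cp1252_to_utf8() is hypothetical and not part of PHP, and the sketch assumes that at least one of the mbstring or iconv extensions is loaded. How bytes that CP1252 leaves undefined (129, 141, 143, 144, 157) are handled depends on the extension and its version, so that case is not covered here.

```php
<?php
// Minimal sketch, not a definitive fix: convert a CP1252 (Windows-1252)
// byte string to UTF-8 instead of assuming ISO-8859-1 as utf8_encode() does.
// cp1252_to_utf8() is a hypothetical helper name used for illustration.
function cp1252_to_utf8($str)
{
    if (function_exists('mb_convert_encoding')) {
        // mbstring: arguments are (string, target encoding, source encoding).
        return mb_convert_encoding($str, 'UTF-8', 'Windows-1252');
    }
    if (function_exists('iconv')) {
        // iconv: arguments are (source encoding, target encoding, string).
        return iconv('CP1252', 'UTF-8', $str);
    }
    // Neither extension is available; signal failure to the caller.
    return false;
}

// Byte 147 (0x93) is the left double quotation mark in CP1252.
echo bin2hex(cp1252_to_utf8("\x93")), "\n"; // prints "e2809c"
```

With this approach the hand-written character map is no longer needed for the defined CP1252 characters, since the extension carries the mapping table itself.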
