ID:               28654
 Updated by:       [EMAIL PROTECTED]
 Reported By:      [EMAIL PROTECTED]
-Status:           Analyzed
+Status:           Assigned
 Bug Type:         *Languages/Translation
 Operating System: WinXP
 PHP Version:      4.3.4
-Assigned To:      
+Assigned To:      moriyoshi
 New Comment:

Moriyoshi: Was that last comment saying that this is a bug in PHP,
or what? Is this a verified bug? Can you fix it if it is? (If it's not
a bug -> bogus.)



Previous Comments:
------------------------------------------------------------------------

[2004-06-14 21:00:29] [EMAIL PROTECTED]

Looks like you are trying to convert between code page 1252
and UTF-8.

http://www.microsoft.com/globaldev/reference/sbcs/1252.htm

Setting mbstring aside, most iconv() implementations
support CP1252 (a.k.a. IBM1252).
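A minimal sketch of the iconv() route, assuming PHP's iconv extension is loaded and that the underlying iconv library recognizes the charset name 'CP1252' (glibc and GNU libiconv do). The helper name cp1252_to_utf8() is hypothetical. It contrasts reading byte 147 as ISO-8859-1 (what utf8_encode() assumes) with reading it as CP1252.

```php
<?php
// Hypothetical helper: convert Windows-1252 (CP1252) text to UTF-8 via iconv().
// Assumes the iconv extension is available and knows the "CP1252" charset name.
function cp1252_to_utf8($str)
{
    // iconv() typically returns false (with a notice) for bytes left
    // undefined in CP1252, such as 0x81 or 0x8D.
    return iconv('CP1252', 'UTF-8', $str);
}

$quote = chr(147); // byte 0x93: LEFT DOUBLE QUOTATION MARK in CP1252

// Read as ISO-8859-1 (like utf8_encode()): U+0093, a C1 control character.
echo bin2hex(iconv('ISO-8859-1', 'UTF-8', $quote)), "\n"; // c293

// Read as CP1252: U+201C, the curly quote Word actually produced.
echo bin2hex(cp1252_to_utf8($quote)), "\n"; // e2809c
```

The only difference between the two calls is the source charset name; the bytes 0-127 and 160-255 come out identical either way.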

HTH


------------------------------------------------------------------------

[2004-06-10 00:11:14] [EMAIL PROTECTED]

Hm, which ISO standard (encoding) am I using (German, Win32) when I
copy & paste Word text into a textarea and post it to a PHP script?
Is it possible to solve my problem by converting my text to
ISO-8859-1 with the mb_* functions?
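A rough heuristic sketch, assuming the posted string is known to be in a single-byte encoding (not already UTF-8): any byte in 0x80-0x9F means it cannot be printable ISO-8859-1 text and is most likely CP1252, as with curly quotes pasted from Word. The function name looks_like_cp1252() is hypothetical.

```php
<?php
// Hypothetical heuristic: does a single-byte string use the 0x80-0x9F range?
// ISO-8859-1 has no printable characters there; CP1252 does (e.g. 0x93).
// Not meaningful for UTF-8 input, whose continuation bytes also use 0x80-0xBF.
function looks_like_cp1252($str)
{
    return preg_match('/[\x80-\x9F]/', $str) === 1;
}

var_dump(looks_like_cp1252("Gr\xFC\xDFe"));   // bool(false): "Grüße" in ISO-8859-1
var_dump(looks_like_cp1252("\x93Hallo\x94")); // bool(true): Word-style curly quotes
```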

------------------------------------------------------------------------

[2004-06-08 09:38:23] [EMAIL PROTECTED]

utf8_encode() only deals with ISO-8859-1, which does not define
characters in the range 128 to 159. That said, it should probably just
replace those characters with a question mark, as that's how invalid
characters are usually converted.
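To illustrate why byte 147 comes out as "\xC2\x93", here is a sketch of a Latin-1-to-UTF-8 encoder that should be equivalent in effect to utf8_encode(); it is not PHP's actual C implementation. The function name latin1_to_utf8() and its $replaceC1 flag are hypothetical; the flag implements the question-mark replacement suggested above.

```php
<?php
// Hypothetical re-implementation of utf8_encode()'s behavior: each byte is
// taken as the code point U+0000..U+00FF and encoded as UTF-8.
function latin1_to_utf8($str, $replaceC1 = false)
{
    $out = '';
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($str[$i]);
        if ($b < 0x80) {
            $out .= chr($b);              // ASCII: one byte, unchanged
        } elseif ($replaceC1 && $b < 0xA0) {
            $out .= '?';                  // 128-159: no ISO-8859-1 character
        } else {
            // Two-byte UTF-8 form: 110xxxxx 10xxxxxx
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

echo bin2hex(latin1_to_utf8(chr(147))), "\n";  // c293
echo latin1_to_utf8(chr(147), true), "\n";     // ?
echo bin2hex(latin1_to_utf8(chr(0xE9))), "\n"; // c3a9 (é)
```

For 147 (0x93), 147 >> 6 is 2, giving 0xC2, and 147 & 0x3F is 0x13, giving 0x93: a valid UTF-8 sequence, but for a control character rather than a quotation mark.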

------------------------------------------------------------------------

[2004-06-06 22:55:32] [EMAIL PROTECTED]

Description:
------------
Hi!

I'm currently developing a script that generates OpenOffice SXW
files by filling content.xml (which is UTF-8 encoded) with database
content. While doing this I found out that utf8_encode('“')
(character code 147) returns "\xC2\x93". But when I checked the whole
result in OpenOffice, the '“' was displayed as a square (unknown
character?!). So I ran some tests with UTF-8 conversion (even with the
mb_* functions) and noticed that for characters 128 to 159 the output
of utf8_encode() doesn't seem to match the standard. As mentioned
above, '“' is returned as "\xC2\x93" but should be "\xE2\x80\x9C"
(which is what you get when converting with UltraEdit).

Can anyone give me an explanation here?

I'm not familiar with this UTF-8 / bit-conversion stuff, but I don't
think PHP does what it's supposed to do here. As a first workaround I
simply wrote a custom_utf8_encode() that uses its own character map to
override this misbehaviour (see below). Can someone help me out with
this strange bug?!

Regards
Bjoern Kraus


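// Convert a Windows-1252 (CP1252) string to UTF-8.
// This file must itself be saved as UTF-8 so that the replacement
// strings in $chrMap are UTF-8 byte sequences.
function custom_utf8_encode($str)
{
    // Bytes 128-159: CP1252 characters that ISO-8859-1 (and therefore
    // utf8_encode()) does not cover. 129, 141, 143, 144 and 157 are
    // undefined in CP1252 and are dropped.
    $chrMap = array(128 => '€', 129 => '',  130 => '‚', 131 => 'ƒ',
                    132 => '„', 133 => '…', 134 => '†', 135 => '‡',
                    136 => 'ˆ', 137 => '‰', 138 => 'Š', 139 => '‹',
                    140 => 'Œ', 141 => '',  142 => 'Ž', 143 => '',
                    144 => '',  145 => '‘', 146 => '’', 147 => '“',
                    148 => '”', 149 => '•', 150 => '–', 151 => '—',
                    152 => '˜', 153 => '™', 154 => 'š', 155 => '›',
                    156 => 'œ', 157 => '',  158 => 'ž', 159 => 'Ÿ');

    $newStr = '';
    $len = strlen($str);

    for ($i = 0; $i < $len; $i++) {
        $chrVal = ord($str[$i]);
        if ($chrVal > 127 && $chrVal < 160) {
            // CP1252-specific character: use the lookup table.
            $newStr .= $chrMap[$chrVal];
        } else {
            // 0-127 and 160-255 are identical in CP1252 and ISO-8859-1.
            $newStr .= utf8_encode($str[$i]);
        }
    }

    return $newStr;
}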




------------------------------------------------------------------------

