From:             [EMAIL PROTECTED]
Operating system: WinXP
PHP version:      4.3.4
PHP Bug Type:     *XML functions
Bug description:  Possible bug in utf8_encode (bit operations)

Description:
------------
Hi!

I'm currently developing a nice script that generates OpenOffice SXW files
by filling the content.xml (which is UTF-8 encoded) with database content.
While trying to do this I found out that utf8_encode('“') (charcode 147)
returns the bytes 0xC2 0x93. But when I checked the whole result in OpenOffice,
'“' is displayed as a square (unknown character?!). So I ran some tests with
UTF-8 conversion (even the mb_* functions) and noticed that the characters
between 128 and 160 returned by utf8_encode() don't seem to match the standard.
As mentioned above, '“' is returned as 0xC2 0x93 but should be 0xE2 0x80 0x9C
(which is what you get when using UltraEdit for the conversion).

Can anyone give me an explanation here?

I'm not familiar with this UTF-8 / bit-conversion stuff, but I don't think
PHP does what it's supposed to do here. As a first workaround I simply
coded a custom_utf8_encode() that uses its own character map to override this
misbehaviour (see below). Can someone help me out with this strange bug?!

Regards
Bjoern Kraus


function custom_utf8_encode($str)
{
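    // Windows-1252 characters 128-159. This file must be saved as UTF-8 so
    // that the literals below are already UTF-8 encoded. Codes 129, 141, 143,
    // 144 and 157 are unassigned in Windows-1252 and map to an empty string.
    $chrMap = array(128 => '€', 129 => '',  130 => '‚', 131 => 'ƒ',
                    132 => '„', 133 => '…', 134 => '†', 135 => '‡',
                    136 => 'ˆ', 137 => '‰', 138 => 'Š', 139 => '‹',
                    140 => 'Œ', 141 => '',  142 => 'Ž', 143 => '',
                    144 => '',  145 => '‘', 146 => '’', 147 => '“',
                    148 => '”', 149 => '•', 150 => '–', 151 => '—',
                    152 => '˜', 153 => '™', 154 => 'š', 155 => '›',
                    156 => 'œ', 157 => '',  158 => 'ž', 159 => 'Ÿ');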
                    
    $newStr = '';

    for ($i = 0; $i < strlen($str); $i++) {
        $chrVal = ord($str[$i]);
        if ($chrVal > 127 && $chrVal < 160) {
            $newStr .= $chrMap[$chrVal];
        }
        else {
            $newStr .= utf8_encode($str[$i]);
        }
    }
    
    return $newStr;
}
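
For comparison, here is a minimal sketch of the same conversion without a hand-written map. It assumes the database bytes are really Windows-1252 (CP1252) rather than ISO-8859-1, and that the iconv extension is available. The helper name cp1252_to_utf8() is made up for illustration. Treat it as a rough idea, not a tested replacement.

```php
<?php
// Sketch only: assumes the input is Windows-1252 (CP1252), not ISO-8859-1,
// and that the iconv extension is loaded. The function name is hypothetical.
function cp1252_to_utf8($str)
{
    // Let iconv do the code-page lookup instead of a hand-written map.
    return iconv('CP1252', 'UTF-8', $str);
}

$quote = chr(147); // left double quotation mark in CP1252

// Interpreted as CP1252, byte 147 becomes U+201C.
echo bin2hex(cp1252_to_utf8($quote)), "\n";               // e2809c

// Interpreted as ISO-8859-1, the same byte becomes the control code U+0093.
echo bin2hex(iconv('ISO-8859-1', 'UTF-8', $quote)), "\n"; // c293
```

The second line of output is the two-byte sequence reported above, which would suggest the difference comes from which source character set is assumed for bytes 128 to 159.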



-- 
Edit bug report at http://bugs.php.net/?id=28654&edit=1
