From:             [EMAIL PROTECTED]
Operating system: WinXP
PHP version:      4.3.4
PHP Bug Type:     *XML functions
Bug description:  Possible bug in utf8_encode (bit operations)

Description:
------------
Hi!

I'm currently developing a script that generates OpenOffice SXW files
by filling content.xml (which is UTF-8 encoded) with database content.
While doing this I found that utf8_encode('“') (character code 147)
returns 'Â“'. When I checked the whole result in OpenOffice, the '“' was
displayed as a square (unknown character?!). So I ran some tests with UTF-8
conversion (including the mb_* functions) and found that the characters
between 128 and 160 returned by utf8_encode() don't seem to match the
standard. As mentioned above, '“' is returned as 'Â“' but should be 'â€œ'
(which is what you get when using UltraEdit for the conversion).
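
The mismatch can be reproduced at the byte level. The following is a
minimal sketch, assuming that byte 147 is meant as the Windows-1252 left
double quotation mark (U+201C); the variable names are illustrative only.

```php
<?php
// Byte 147 (0x93): a left double quotation mark in Windows-1252,
// but a C1 control code in ISO-8859-1.
$input = chr(147);

// utf8_encode() is documented as converting ISO-8859-1 to UTF-8.
// (It is deprecated as of PHP 8.2, so newer versions emit a notice here.)
$actual = utf8_encode($input);

// Assumed expectation: the UTF-8 encoding of U+201C.
$expected = "\xE2\x80\x9C";

echo bin2hex($actual), "\n";   // c293
echo bin2hex($expected), "\n"; // e2809c
```

The two results differ because utf8_encode() reads the byte as the
ISO-8859-1 code point U+0093 rather than as a Windows-1252 character.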

Can anyone give me an explanation for this?

I'm not familiar with this UTF-8 / bit-conversion stuff, but I don't think
PHP does what it's supposed to do here. As a first workaround I simply
coded a custom_utf8_encode() that uses its own character map to override
this misbehaviour (see below). Can someone help me out with this strange bug?!

Regards
Bjoern Kraus


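function custom_utf8_encode($str)
{
    // Windows-1252 bytes 128-159 mapped to the characters they stand for.
    // This source file must itself be saved as UTF-8. Bytes that
    // Windows-1252 leaves undefined (129, 141, 143, 144, 157) map to ''.
    $chrMap = array(128 => '€', 129 => '',  130 => '‚', 131 => 'ƒ',
                    132 => '„', 133 => '…', 134 => '†', 135 => '‡',
                    136 => 'ˆ', 137 => '‰', 138 => 'Š', 139 => '‹',
                    140 => 'Œ', 141 => '',  142 => 'Ž', 143 => '',
                    144 => '',  145 => '‘', 146 => '’', 147 => '“',
                    148 => '”', 149 => '•', 150 => '–', 151 => '—',
                    152 => '˜', 153 => '™', 154 => 'š', 155 => '›',
                    156 => 'œ', 157 => '',  158 => 'ž', 159 => 'Ÿ');

    $newStr = '';
    $len = strlen($str);

    for ($i = 0; $i < $len; $i++) {
        $chrVal = ord($str[$i]);
        if ($chrVal > 127 && $chrVal < 160) {
            // Bytes 128-159: use the custom map instead of utf8_encode().
            $newStr .= $chrMap[$chrVal];
        }
        else {
            // All other bytes: utf8_encode() already gives the right result.
            $newStr .= utf8_encode($str[$i]);
        }
    }

    return $newStr;
}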
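
As an alternative to a hand-written map, the conversion could probably be
left to the mbstring extension. This is only a sketch under two
assumptions: that the database content is really Windows-1252, and that
the installed mbstring build recognizes the 'Windows-1252' label. The
helper name cp1252_to_utf8() is hypothetical.

```php
<?php
// Hypothetical helper: convert a Windows-1252 byte string to UTF-8.
// Assumes the mbstring extension is loaded and knows 'Windows-1252'.
function cp1252_to_utf8($str)
{
    return mb_convert_encoding($str, 'UTF-8', 'Windows-1252');
}

// Byte 147 becomes the UTF-8 encoding of U+201C.
echo bin2hex(cp1252_to_utf8(chr(147))), "\n";        // e2809c

// Bytes outside 128-159 behave as they do with ISO-8859-1.
echo bin2hex(cp1252_to_utf8("caf" . chr(233))), "\n"; // 636166c3a9
```

How bytes that Windows-1252 leaves undefined (such as 129) are handled may
depend on the PHP version, so that case is worth testing separately.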



-- 
Edit bug report at http://bugs.php.net/?id=28654&edit=1
