ID: 30549
User updated by: david at davidheath dot org
Reported By: david at davidheath dot org
-Status: Feedback
+Status: Open
Bug Type: mbstring related
Operating System: linux
PHP Version: 4.3.9
New Comment:
Hi Derick,
OK. I included the charset map parsing code so that you could see that
I was deriving the mappings directly from the Unicode mapping files.
Anyway, here is a lean-and-mean version:
<?php
testMapping('ISO-8859-7',
    array(
        0xa4 => 0x20ac,
        0xa5 => 0x20af,
        0xaa => 0x37a
    )
);
testMapping('ISO-8859-8',
    array(
        0xaf => 0xaf,
        0xfd => 0x200e,
        0xfe => 0x200f
    )
);
testMapping('ISO-8859-10',
    array(
        0xa4 => 0x12a
    )
);
function testMapping($targetEncoding, $map) {
    print "Encoding: $targetEncoding\n";
    foreach ($map as $fromChar => $expectChar) {
        // convert to UCS-4, which represents every possible unicode
        // char as a single fixed width 32bit value
        $unicodeChar = mb_convert_encoding(chr($fromChar), 'UCS-4LE',
            $targetEncoding);
        // 'V' = unsigned 32bit little-endian, matching UCS-4LE on any
        // machine; the named key avoids relying on unpack()'s default key
        $unpacked = unpack('Vcode', $unicodeChar);
        $gotChar = $unpacked['code'];
        // every expected value in this short test is non-zero, so a
        // plain comparison is enough
        if ($expectChar != $gotChar) {
            printf(" incorrect mapping of char 0x%x: got 0x%x, expected 0x%x\n",
                $fromChar, $gotChar, $expectChar);
        }
    }
}
?>
Previous Comments:
------------------------------------------------------------------------
[2004-10-25 10:33:42] [EMAIL PROTECTED]
Hello David,
can you please make a *short* script that shows that the warnings are
wrong? It takes quite some time to figure out what exactly your script
is doing.
regards,
Derick
------------------------------------------------------------------------
[2004-10-25 09:53:55] david at davidheath dot org
Description:
------------
mbstring appears to map some characters incorrectly in the following
ISO-8859 charsets:
Encoding: ISO-8859-7
incorrect mapping of char 0xa4: got 0x3f, expected 0x20ac
incorrect mapping of char 0xa5: got 0x3f, expected 0x20af
incorrect mapping of char 0xaa: got 0x3f, expected 0x37a
Encoding: ISO-8859-8
incorrect mapping of char 0xaf: got 0x203e, expected 0xaf
incorrect mapping of char 0xfd: got 0x3f, expected 0x200e
incorrect mapping of char 0xfe: got 0x3f, expected 0x200f
Encoding: ISO-8859-10
incorrect mapping of char 0xa4: got 0x124, expected 0x12a
This is based on the mappings provided at
ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/ on 25th Oct 2004.
Note, there are undated comments in the "Version history" for the above
files, as follows:
8859-7:
# 2.0 version updates 1.0 version by adding mappings for the
# three newly added characters 0xA4, 0xA5, 0xAA.
8859-8:
# 1.1 version updates to the published 8859-8:1999, correcting
# the mapping of 0xAF and adding mappings for LRM and RLM.
8859-10:
# 1.1 corrected mistake in mapping of 0xA4
So I guess these mappings have changed since mbstring was first
written. I'm not sure whether changing the mappings would cause a
backward-compatibility problem.
Thanks
Dave
Reproduce code:
---------------
Code for this test is available at:
http://davidheath.org/mbstring/mbstring_test.tar.bz2
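For reference, reading the expected values out of the Unicode mapping
files could look roughly like the sketch below. The helper name
parseCharsetMap and the hand-typed sample lines are illustrative
assumptions; this is not the code from the tarball above. It assumes
each data line has the form "0xXX<TAB>0xXXXX<TAB># NAME".

```php
<?php
// Hypothetical helper (not from the tarball): build a byte => code point
// map from text in the Unicode ISO8859 mapping-file format.
function parseCharsetMap($text) {
    $map = array();
    foreach (explode("\n", $text) as $line) {
        // A data line starts with two hex fields; comment lines and
        // bytes with no mapping do not match and are skipped.
        if (preg_match('/^0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)/', $line, $m)) {
            $map[hexdec($m[1])] = hexdec($m[2]);
        }
    }
    return $map;
}

// Sample lines typed in by hand in the mapping-file format.
$sample = "# comment line\n"
        . "0xA4\t0x20AC\t# EURO SIGN\n"
        . "0xA5\t0x20AF\t# DRACHMA SIGN\n"
        . "0xAA\t0x037A\t# GREEK YPOGEGRAMMENI\n";
$map = parseCharsetMap($sample);
printf("0x%x => 0x%x\n", 0xa4, $map[0xa4]);
```

The resulting array has the same shape as the second argument of
testMapping() in the short script, so it could be passed to it directly.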
Expected result:
----------------
Mappings as stated "expected xxx" above.
Actual result:
--------------
Mappings as stated "got xxx" above.
------------------------------------------------------------------------