ID:               30549
User updated by:  david at davidheath dot org
Reported By:      david at davidheath dot org
Status:           Open
Bug Type:         mbstring related
Operating System: linux
PHP Version:      4.3.9
New Comment:

oops, minor bug in that script. Line 35 should read:

    printf(" incorrect mapping of char 0x%x: got 0x%x, expected 0x%x\n", $fromChar, $unicodeCharNumber[''], $expectChar);

Corrected version of script for your cut+paste convenience:

<?php
testMapping('ISO-8859-7', array( 0xa4=>0x20ac, 0xa5=>0x20af, 0xaa=>0x37a) );
testMapping('ISO-8859-8', array( 0xaf=>0xaf, 0xfd=>0x200e, 0xfe=>0x200f) );
testMapping('ISO-8859-10', array( 0xa4=>0x12a ) );

function testMapping($targetEncoding, $map) {
  print "Encoding: $targetEncoding\n";
  foreach($map as $fromChar=>$toChar) {
    $expectChar = $toChar;
    // convert to UCS-4, which represents every possible unicode
    // char as a single fixed width 32bit value
    $unicodeChar=mb_convert_encoding(chr($fromChar), 'UCS-4LE', $targetEncoding);
    $unicodeCharNumber = unpack('L', $unicodeChar);
    if ($expectChar!=$unicodeCharNumber[''] and ($expectChar!=0 and $unicodeCharNumber!=0x3f)) {
      printf(" incorrect mapping of char 0x%x: got 0x%x, expected 0x%x\n", $fromChar, $unicodeCharNumber[''], $expectChar);
    }
  }
}
?>

Previous Comments:
------------------------------------------------------------------------

[2004-10-25 13:25:30] david at davidheath dot org

Hi Derick,

ok, I included the charset map parsing code so that you could see that I was deriving the mappings directly from the unicode mapping files.
Anyway, here is a lean-and-mean version:

<?php
testMapping('ISO-8859-7', array( 0xa4=>0x20ac, 0xa5=>0x20af, 0xaa=>0x37a) );
testMapping('ISO-8859-8', array( 0xaf=>0xaf, 0xfd=>0x200e, 0xfe=>0x200f) );
testMapping('ISO-8859-10', array( 0xa4=>0x12a ) );

function testMapping($targetEncoding, $map) {
  print "Encoding: $targetEncoding\n";
  foreach($map as $fromChar=>$toChar) {
    $expectChar = $toChar;
    // convert to UCS-4, which represents every possible unicode
    // char as a single fixed width 32bit value
    $unicodeChar=mb_convert_encoding(chr($fromChar), 'UCS-4LE', $targetEncoding);
    $unicodeCharNumber = unpack('L', $unicodeChar);
    if ($expectChar!=$unicodeCharNumber[''] and ($expectChar!=0 and $unicodeCharNumber!=0x3f)) {
      printf(" incorrect mapping of char 0x%x: got 0x%x, expected 0x%x\n", $char, $unicodeCharNumber[''], $expectChar);
    }
  }
}
?>

------------------------------------------------------------------------

[2004-10-25 10:33:42] [EMAIL PROTECTED]

Hello David,

can you please make a *short* script that show that the warnings are wrong as it takes quite some time to figure out what your script is exactly doing.

regards,
Derick

------------------------------------------------------------------------

[2004-10-25 09:53:55] david at davidheath dot org

Description:
------------
MBstring appears to incorrectly map some characters for the following ISO-8859 charsets, as follows:

Encoding: ISO-8859-7
 incorrect mapping of char 0xa4: got 0x3f, expected 0x20ac
 incorrect mapping of char 0xa5: got 0x3f, expected 0x20af
 incorrect mapping of char 0xaa: got 0x3f, expected 0x37a
Encoding: ISO-8859-8
 incorrect mapping of char 0xaf: got 0x203e, expected 0xaf
 incorrect mapping of char 0xfd: got 0x3f, expected 0x200e
 incorrect mapping of char 0xfe: got 0x3f, expected 0x200f
Encoding: ISO-8859-10
 incorrect mapping of char 0xa4: got 0x124, expected 0x12a

This is based on the mappings provided at ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/ on 25th Oct 2004.

Note, there are undated comments in the "Version history" for the above files, as follows:

8859-7:
# 2.0 version updates 1.0 version by adding mappings for the
# three newly added characters 0xA4, 0xA5, 0xAA.

8859-8:
# 1.1 version updates to the published 8859-8:1999, correcting
# the mapping of 0xAF and adding mappings for LRM and RLM.

8859-10:
# 1.1 corrected mistake in mapping of 0xA4

So I guess these mappings have changed since mbstring was first written. I'm not sure if there would be a backward-compatibility problem if the mappings were changed.

Thanks

Dave

Reproduce code:
---------------
Code for this test is available at:
http://davidheath.org/mbstring/mbstring_test.tar.bz2

Expected result:
----------------
Mappings as stated "expected xxx" above.

Actual result:
--------------
Mappings as stated "got xxx" above.

------------------------------------------------------------------------

-- 
Edit this bug report at http://bugs.php.net/?id=30549&edit=1
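
For anyone re-running the check on a newer PHP, here is a minimal sketch of the decoding step the scripts above rely on (convert one byte to UCS-4LE, then read the 32-bit value as a code point). It is not the reporter's code: the helper names codePointFromUcs4le and mappedCodePoint are hypothetical, and it assumes PHP 7 or later with mbstring loaded for the second helper. It uses the 'V' format (always little-endian) rather than 'L' (machine byte order) so the result does not depend on the host, and it reads the value from index 1, which is where current PHP versions put unnamed unpack() elements.

```php
<?php
// Hypothetical helper (not from the original report): decode exactly one
// UCS-4LE character (4 bytes, little-endian) into its Unicode code point.
function codePointFromUcs4le(string $bytes): int
{
    if (strlen($bytes) !== 4) {
        throw new InvalidArgumentException('expected exactly 4 bytes');
    }
    // 'V' = unsigned 32-bit, always little-endian, matching 'UCS-4LE'.
    // Unnamed elements are returned under numeric keys starting at 1.
    $parts = unpack('V', $bytes);
    return $parts[1];
}

// Hypothetical wrapper: the code point mbstring maps a single byte to
// in the given source encoding. Requires the mbstring extension.
function mappedCodePoint(int $byte, string $encoding): int
{
    $ucs4 = mb_convert_encoding(chr($byte), 'UCS-4LE', $encoding);
    return codePointFromUcs4le($ucs4);
}

// U+20AC (EURO SIGN) in UCS-4LE is the byte sequence AC 20 00 00.
printf("0x%x\n", codePointFromUcs4le("\xAC\x20\x00\x00")); // prints 0x20ac
```

With mbstring available, a call such as mappedCodePoint(0xa4, 'ISO-8859-7') can then be compared against the expected value 0x20ac from the list above; what it returns depends on the mapping tables shipped with the PHP build being tested.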