ID:               17154
Updated by:       [EMAIL PROTECTED]
Reported By:      k dot joe at freemail dot hu
Status:           Bogus
Bug Type:         Recode related
Operating System: Linux2.2.19/Debian
PHP Version:      4.3.3-dev
New Comment:

This is definitely a bug in recode-3.6. Please see
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=156635 for a patch
against recode-3.6. Maybe we should check for this bug when configuring
PHP with --with-recode.

Debian maintainers have also renamed internal symbols that conflicted
with imap and mysql
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=131080), so it might
be wise to explicitly check against those symbols before denying
configuration with PHP 5 as well.


Previous Comments:
------------------------------------------------------------------------

[2003-07-07 07:52:59] [EMAIL PROTECTED]

I had a look at this, but it really looks correct from the PHP side. For
some reason the recode library returns a string that is too long, with
random chars behind it. It's not a bug in PHP; everything is done as the
documentation of recode says it should be done. I used recode 3.6 for my
tests and it definitely doesn't behave as it should.

------------------------------------------------------------------------

[2002-09-18 17:42:45] luka at mail dot ljudmila dot org

this bug is for real!

just stumbled into it while writing a mail script. recode _does_
stubbornly add somewhat random trailing garbage to strings on my system.
i made a test script to figure it out, so i might as well post it here.

my php is 4.2.3, system is Debian.
i also got some segfaults from my mail script, but this was rare and
might or might not be connected to the trailing garbage bug.

sample output first (wrong, clearly):

SNIP>
bash-2.05b$ php4 recodetest.php
X-Powered-By: PHP/4.2.3
Content-type: text/html

testing recode request ISO-8859-1..UTF-8

INPUT: "Some Hacker <[EMAIL PROTECTED]>"
OUTPUT:
"Some Hacker <[EMAIL PROTECTED]>"
"Some Hacker <[EMAIL PROTECTED]>"
"Some Hacker <[EMAIL PROTECTED]>&"
"Some Hacker <[EMAIL PROTECTED]>"
"Some Hacker <[EMAIL PROTECTED]>"
"Some Hacker <[EMAIL PROTECTED]>"
"Some Hacker <[EMAIL PROTECTED]>@"
"Some Hacker <[EMAIL PROTECTED]"
"Some Hacker <[EMAIL PROTECTED]>0u"

INPUT: "Some Hacker <[EMAIL PROTECTED] "
OUTPUT:
"Some Hacker <[EMAIL PROTECTED] 0u"

INPUT: "Some Hacker [EMAIL PROTECTED]>"
OUTPUT:
"Some Hacker [EMAIL PROTECTED]>0u"

INPUT: "Some Hacker <[EMAIL PROTECTED]>"
OUTPUT:
"Some Hacker <[EMAIL PROTECTED]>u"

INPUT: "Some Hacker <[EMAIL PROTECTED] "
OUTPUT:
"Some Hacker <[EMAIL PROTECTED] u"

INPUT: "Some Hacker [EMAIL PROTECTED]>"
OUTPUT:
"Some Hacker [EMAIL PROTECTED]>u"

INPUT: "Some Hacker <[EMAIL PROTECTED]> "
OUTPUT:
"Some Hacker <[EMAIL PROTECTED]> "

INPUT: "Some Hacker <[EMAIL PROTECTED] "
OUTPUT:
"Some Hacker <[EMAIL PROTECTED] "

INPUT: "Some Hacker [EMAIL PROTECTED]> "
OUTPUT:
"Some Hacker [EMAIL PROTECTED]> "

INPUT: "� B "
OUTPUT:
"�� B "

INPUT: "MAKE MONEY REALLY REALLY REALLY FAST"
OUTPUT:
"MAKE MONEY REALLY REALLY REALLY FASTY"
"MAKE MONEY REALLY REALLY REALLY FAST"


Tried 200 loops on 11 test(s).
<SNIP

and the code, so you can try too!

<?php
# try different encodings
$from='ISO-8859-1';
#$from='ascii';
$to='UTF-8';
#$to='HTML';
#$to='flat';

echo "testing recode request $from..$to\n";

$tests=array (
    'Some Hacker <[EMAIL PROTECTED]>',
    'Some Hacker <[EMAIL PROTECTED] ',
    'Some Hacker [EMAIL PROTECTED]>',
    'Some Hacker <[EMAIL PROTECTED]>',
    'Some Hacker <[EMAIL PROTECTED] ',
    'Some Hacker [EMAIL PROTECTED]>',
    'Some Hacker <[EMAIL PROTECTED]> ',
    'Some Hacker <[EMAIL PROTECTED] ',
    'Some Hacker [EMAIL PROTECTED]> ',
    "\xA0 \x10 \x42 \x00",
    'MAKE MONEY REALLY REALLY REALLY FAST',
);

$tries=200;

foreach ($tests as $t) {
    print "\nINPUT: \"$t\"\nOUTPUT:\n";
    # reset per test so the first result is always printed
    $old=null;
    for ($i=0;$i<$tries;$i++) {
        $output=recode("$from..$to",$t);
        # print a result only when it differs from the previous run
        if ($output!==$old) {
            print "\"$output\"\n";
            $old=$output;
        }
    }
}

echo "\n\nTried $tries loops on ".sizeof($tests)." test(s).\n";
?>

hopefully this will give someone a chance to test on latest sources, or
at least a clue about the cause of the bug

------------------------------------------------------------------------

[2002-06-24 12:10:28] [EMAIL PROTECTED]

not exactly true what i said:

4.3.0-dev does not always segfault (mostly with a string length of
96...) and otherwise it seems to behave like 4.2

chregu

------------------------------------------------------------------------

[2002-06-24 12:08:07] [EMAIL PROTECTED]

Same problem here. The same string lengths cause errors.

recode on the command line does it perfectly right.

php 4.2 did add trailing garbage
php 4.3-dev segfaults

chregu

------------------------------------------------------------------------

[2002-06-06 18:03:32] k dot joe at freemail dot hu

Tests with PHP 4.3.0-dev and PHP 4.2.1 give the same (wrong) result.
The recoded string's length and the original string's length are not
equal.
Simply recoding a 36-character string results in a 40-byte string, so
the return value contains 4 additional bytes of 0x00 garbage at the end:

recode ("utf8..latin2", "0123456789012345678901234567890123456");

The error is reproducible at several string lengths: 36-39, 96-99,
186-189, 321-324, 523-526, 826-829, 1281-1284, 1963-1966, 2986-2989,
4521-4524, 6823-6826, and so on...:) (operations on the result string
cause random crashes).

Please try the examples above and report whether it works correctly.
(sorry if the previous description was confusing ;)

Thx

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
    http://bugs.php.net/17154
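
The length ranges reported in the original comment (36-39, 96-99, ...) can be probed systematically. The following is only a rough sketch, assuming pure-ASCII input so that a correct utf8..latin2 conversion leaves the byte count unchanged. The name find_length_mismatches is a hypothetical helper, not part of PHP or recode, and the identity fallback is just a stand-in so the script still runs when the recode extension is not loaded.

```php
<?php
// Hypothetical helper (not part of PHP or recode): feeds ASCII digit
// strings of the given lengths to $convert and returns an array mapping
// each input length to the output length wherever the two differ.
function find_length_mismatches(callable $convert, array $lengths): array
{
    $mismatches = [];
    foreach ($lengths as $len) {
        // Build "01234567890123..." cut to exactly $len bytes.
        $input = substr(str_repeat('0123456789', intdiv($len, 10) + 1), 0, $len);
        $outLen = strlen((string) $convert($input));
        if ($outLen !== $len) {
            $mismatches[$len] = $outLen;
        }
    }
    return $mismatches;
}

// Use the real recode() if the extension is available; otherwise fall
// back to an identity function (assumption: stand-in only, reports nothing).
$convert = function_exists('recode')
    ? function (string $s) { return recode('utf8..latin2', $s); }
    : function (string $s) { return $s; };

foreach (find_length_mismatches($convert, range(30, 45)) as $len => $got) {
    echo "input length $len -> output length $got\n";
}
```

On a system showing the behaviour described in this report, one would expect lines for input lengths 36 to 39; on a fixed recode library the loop should print nothing. Widening the range() call covers the other reported length ranges.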
