ID:               17154
 Updated by:       [EMAIL PROTECTED]
 Reported By:      k dot joe at freemail dot hu
 Status:           Bogus
 Bug Type:         Recode related
 Operating System: Linux2.2.19/Debian
 PHP Version:      4.3.3-dev
 New Comment:

This is definitely a bug in recode-3.6. Please see
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=156635 for a patch
against recode-3.6.

Maybe we should check for this bug when configuring PHP --with-recode.
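
A configure-time check would have to be written in C/m4, but the same idea
can be sketched as a runtime probe in PHP. This is only an illustration:
recode_looks_broken() is a hypothetical helper, the default lengths are the
first failing range given by the original reporter, and the probe assumes
that pure ASCII input must come back unchanged from a utf8..latin2 request.

```php
<?php
// Hypothetical runtime probe (not part of PHP or recode).
// $convert is any callable taking (request, string), e.g. 'recode_string'.
function recode_looks_broken(callable $convert, array $lengths = [36, 37, 38, 39]): bool
{
    foreach ($lengths as $length) {
        $input = str_repeat('a', $length);
        // Assumption: ASCII is identical in UTF-8 and Latin-2, so any
        // difference (for example trailing garbage) indicates the bug.
        if ($convert('utf8..latin2', $input) !== $input) {
            return true;
        }
    }
    return false;
}

// Only call the real extension function when it is actually available.
if (function_exists('recode_string')) {
    var_dump(recode_looks_broken('recode_string'));
}
```

Passing the converter as a callable keeps the probe usable on builds
without the recode extension, where a stub can be substituted.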

Debian maintainers have also renamed internal symbols that conflicted
with imap and mysql
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=131080), so it might
be wise to explicitly check against those symbols before denying
configuration with PHP 5 as well.


Previous Comments:
------------------------------------------------------------------------

[2003-07-07 07:52:59] [EMAIL PROTECTED]

I had a look at this, but it really looks correct from the PHP side.
For some reason the recode library returns a string that is too long
with random chars behind it. It's not a bug in PHP, everything is done
as the documentation of recode tells it should be done. I used recode
3.6 for my tests and it definitely doesn't behave as it should.

------------------------------------------------------------------------

[2002-09-18 17:42:45] luka at mail dot ljudmila dot org

this bug is for real!
just stumbled into it while writing a mail script.
recode _does_ stubbornly add somewhat random trailing garbage to
strings on my system. i made a test script to figure it out, so i might
as well post it here. 

my php is 4.2.3, system is Debian. i also got some segfaults from my
mail script, but this was rare and might or might not be connected to
the trailing garbage bug



sample output first (wrong, clearly):

SNIP>
bash-2.05b$ php4 recodetest.php
X-Powered-By: PHP/4.2.3
Content-type: text/html

testing recode request ISO-8859-1..UTF-8

INPUT: "Some Hacker <[EMAIL PROTECTED]>"
OUTPUT:
"Some Hacker <[EMAIL PROTECTED]>"
"Some Hacker <[EMAIL PROTECTED]>"
"Some Hacker <[EMAIL PROTECTED]>&"
"Some Hacker <[EMAIL PROTECTED]>"
"Some Hacker <[EMAIL PROTECTED]>"
"Some Hacker <[EMAIL PROTECTED]>"
"Some Hacker <[EMAIL PROTECTED]>@"
"Some Hacker <[EMAIL PROTECTED]"
"Some Hacker <[EMAIL PROTECTED]>0u"

INPUT: "Some Hacker <[EMAIL PROTECTED] "
OUTPUT:
"Some Hacker <[EMAIL PROTECTED] 0u"

INPUT: "Some Hacker  [EMAIL PROTECTED]>"
OUTPUT:
"Some Hacker  [EMAIL PROTECTED]>0u"

INPUT: "Some Hacker  <[EMAIL PROTECTED]>"
OUTPUT:
"Some Hacker  <[EMAIL PROTECTED]>u"

INPUT: "Some Hacker  <[EMAIL PROTECTED] "
OUTPUT:
"Some Hacker  <[EMAIL PROTECTED] u"

INPUT: "Some Hacker   [EMAIL PROTECTED]>"
OUTPUT:
"Some Hacker   [EMAIL PROTECTED]>u"

INPUT: "Some Hacker <[EMAIL PROTECTED]>  "
OUTPUT:
"Some Hacker <[EMAIL PROTECTED]>  "

INPUT: "Some Hacker <[EMAIL PROTECTED]   "
OUTPUT:
"Some Hacker <[EMAIL PROTECTED]   "

INPUT: "Some Hacker  [EMAIL PROTECTED]>  "
OUTPUT:
"Some Hacker  [EMAIL PROTECTED]>  "

INPUT: "&#65533;  B "
OUTPUT:
"&#65533;&#65533;  B "

INPUT: "MAKE MONEY REALLY REALLY REALLY FAST"
OUTPUT:
"MAKE MONEY REALLY REALLY REALLY FASTY"
"MAKE MONEY REALLY REALLY REALLY FAST"


Tried 200 loops on 11 test(s).

<SNIP

and the code, so you can try too!

<?php

#try different encodings


$from='ISO-8859-1';
#$from='ascii';

$to='UTF-8'; 
#$to='HTML';
#$to='flat';

echo "testing recode request $from..$to\n";

$tests=array
(
 'Some Hacker <[EMAIL PROTECTED]>',
 'Some Hacker <[EMAIL PROTECTED] ',
 'Some Hacker  [EMAIL PROTECTED]>',

 'Some Hacker  <[EMAIL PROTECTED]>',
 'Some Hacker  <[EMAIL PROTECTED] ',
 'Some Hacker   [EMAIL PROTECTED]>',

 'Some Hacker <[EMAIL PROTECTED]>  ',
 'Some Hacker <[EMAIL PROTECTED]   ',
 'Some Hacker  [EMAIL PROTECTED]>  ',

 "\xA0 \x10 \x42 \x00",
 'MAKE MONEY REALLY REALLY REALLY FAST',
);


$tries=200;

foreach ($tests as $t) {

  print "\nINPUT: \"$t\"\nOUTPUT:\n";
  $old=null; # reset for each input so the first result is always printed
  for ($i=0;$i<$tries;$i++) {
    $output=recode("$from..$to",$t);
    if ($output!==$old) {
      print "\"$output\"\n";
      $old=$output;
    }
  }
}

echo "\n\nTried $tries loops on ".sizeof($tests)." test(s).\n";
?>

hopefully this will give someone a chance to test on latest sources, or
at least a clue about the cause of the bug
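
Until a fixed recode is installed, the garbage can be cut off for the
ISO-8859-1..UTF-8 request used in the test script, because the correct
output length is known in advance: every byte from 0x80 to 0xFF becomes two
UTF-8 bytes and every other byte stays one byte. The sketch below is only a
workaround under that assumption; both function names are hypothetical, and
it hides the symptom rather than fixing the library.

```php
<?php
// Hypothetical helper: expected UTF-8 byte length of an ISO-8859-1 string.
function latin1_to_utf8_length(string $latin1): int
{
    $extra = 0;
    for ($i = 0, $n = strlen($latin1); $i < $n; $i++) {
        // Bytes 0x80..0xFF are encoded as two bytes in UTF-8.
        if (ord($latin1[$i]) >= 0x80) {
            $extra++;
        }
    }
    return strlen($latin1) + $extra;
}

// Hypothetical helper: drop anything beyond the expected length.
// Only valid for ISO-8859-1..UTF-8; other requests need their own rule.
function trim_recode_garbage(string $input, string $output): string
{
    $expected = latin1_to_utf8_length($input);
    return strlen($output) > $expected ? substr($output, 0, $expected) : $output;
}
```

In the test script this would be applied as
trim_recode_garbage($t, recode("$from..$to", $t)).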

------------------------------------------------------------------------

[2002-06-24 12:10:28] [EMAIL PROTECTED]

not exactly true what i said:

4.3.0-dev does not always segfault (mostly with a string-length of
96...) and it seems to behave like 4.2

chregu 

------------------------------------------------------------------------

[2002-06-24 12:08:07] [EMAIL PROTECTED]

Same problem here: the same string lengths cause errors.

recode on the commandline does it perfectly right.
php 4.2 did add trailing garbage
php 4.3-dev segfaults

chregu

------------------------------------------------------------------------

[2002-06-06 18:03:32] k dot joe at freemail dot hu

Tests with PHP 4.3.0-dev and PHP 4.2.1 give the same (wrong) result. The
recoded string's length and the original string's length are not equal.
Recoding a 36-character string, for example, returns a 40-byte
string, so the return value contains 4 additional bytes of 0x00 garbage
at the end:

recode ("utf8..latin2", "0123456789012345678901234567890123456");

The error is reproducible at several string lengths: 36-39, 96-99,
186-189, 321-324, 523-526, 826-829, 1281-1284, 1963-1966, 2986-2989,
4521-4524, 6823-6826, and so on...:)
(operations on the result string cause random crashes).
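
These ranges are not arbitrary. Each one is the four lengths just below a
capacity that starts at 40 and grows to floor(n * 3 / 2) + 40, which
suggests the problem sits at a buffer-growth boundary. That growth rule is
inferred from the numbers above only, not taken from the recode source, and
suspect_length_ranges() is a hypothetical helper that regenerates the list
under that assumption:

```php
<?php
// Hypothetical helper: regenerate the failing length ranges, ASSUMING a
// buffer of 40 bytes that grows to floor(n * 3 / 2) + 40 each time.
function suspect_length_ranges(int $count): array
{
    $ranges = [];
    $capacity = 40; // assumed initial capacity
    for ($i = 0; $i < $count; $i++) {
        // The reported ranges are the 4 lengths just below each capacity.
        $ranges[] = [$capacity - 4, $capacity - 1];
        $capacity = intdiv($capacity * 3, 2) + 40; // assumed growth rule
    }
    return $ranges;
}

foreach (suspect_length_ranges(11) as [$lo, $hi]) {
    echo "$lo-$hi\n"; // first line: 36-39, last line: 6823-6826
}
```

If the assumption holds, the next affected range after 6823-6826 would be
10276-10279.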

Please try the examples above and report if it's working correctly.
(sorry, if the previous description was  confusing ;)

Thx

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
    http://bugs.php.net/17154

-- 
Edit this bug report at http://bugs.php.net/?id=17154&edit=1
