On Sat, 16 Jul 2005 22:05:13 +0900, SADAHIRO Tomoyuki <[EMAIL PROTECTED]>
wrote:

> > This is a bug report for perl from [EMAIL PROTECTED],
> > generated with the help of perlbug 1.35 running under perl v5.8.4.
> > 
> > I ran into this, and wondered if it is a bug.
> > 
> > I have tested on perl 5.8.4 with Encode.pm version 1.99_01 (from
> > Debian package) and 2.10 (from CPAN).
> 
> Thanks for the report.

Thanks for the fast patch. Applied as change #25158.

> utf8_to_uvchr((U8*)s, 0) used in do_chop() returns 0,
> not only if the octet sequence from *s is malformed,
> but also if *s == '\0'. The return value 0 should be
> for U+0000 (NUL) rather than malformedness.  Oops :-<
> 
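
For the archives, the ambiguity described above can be sketched in plain C
without any perl headers. This is only an illustration: toy_decode() and
toy_is_valid() are made-up names, they handle just 1- and 2-byte sequences,
and they are not the real utf8_to_uvchr() or is_utf8_string().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoder that signals "malformed" by returning 0.
 * The caller must provide at least two readable bytes at s. */
static unsigned long toy_decode(const unsigned char *s)
{
    assert(s != NULL);
    if (s[0] < 0x80)
        return s[0];                    /* ASCII; NUL legitimately yields 0 */
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80)
        return ((unsigned long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
    return 0;                           /* malformed input also yields 0 */
}

/* Hypothetical validity check: answers only "are these len bytes one
 * well-formed character?", so a NUL is not mistaken for an error. */
static int toy_is_valid(const unsigned char *s, size_t len)
{
    assert(s != NULL);
    if (len == 1)
        return s[0] < 0x80;
    if (len == 2)
        return s[0] >= 0xC2 && s[0] <= 0xDF && (s[1] & 0xC0) == 0x80;
    return 0;                           /* longer forms left out of the sketch */
}
```

With these, toy_decode() gives 0 both for a NUL byte and for a stray
continuation byte such as 0x80, while toy_is_valid() tells them apart,
which is the reason for testing validity instead of the decoded value.
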
> P.S. By the way, when a string in utf8 ends with malformed
> octet(s), what should chop() do?
> So far it has returned undef without modifying the string.

Seems reasonable, though just cutting off one byte of the string would maybe
be more of an expected behaviour.
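
To make the two options concrete, here is a rough sketch in plain C. It is
not the code in doop.c: toy_chop_len(), lead_len() and the "lenient" flag are
hypothetical, and the check only compares the lead byte with the number of
trailing continuation bytes (it ignores overlong forms, surrogates and perl's
extended utf8).

```c
#include <assert.h>
#include <stddef.h>

#define TOY_IS_CONTINUATION(c) (((c) & 0xC0) == 0x80)

/* Sequence length announced by a lead byte, or 0 if c cannot start one. */
static size_t lead_len(unsigned char c)
{
    if (c < 0x80)               return 1;
    if (c >= 0xC2 && c <= 0xDF) return 2;
    if (c >= 0xE0 && c <= 0xEF) return 3;
    if (c >= 0xF0 && c <= 0xF4) return 4;
    return 0;
}

/* How many bytes to remove from the end of buf[0..len).
 * lenient == 0: return 0 (leave the string alone) on a malformed tail.
 * lenient != 0: fall back to removing a single byte instead. */
static size_t toy_chop_len(const unsigned char *buf, size_t len, int lenient)
{
    size_t i, n;

    assert(buf != NULL);
    if (len == 0)
        return 0;
    i = len - 1;
    while (i > 0 && TOY_IS_CONTINUATION(buf[i]))
        i--;                            /* back up to a candidate lead byte */
    n = len - i;
    if (lead_len(buf[i]) == n)
        return n;                       /* tail looks like one character */
    return lenient ? 1 : 0;
}
```

The strict mode corresponds to "return undef without modification", the
lenient mode to "just cut off one byte". Which of the two is less surprising
is still open.
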

> SADAHIRO Tomoyuki
> 
> 
> diff -ur perl~/doop.c perl/doop.c
> --- perl~/doop.c      Mon Jul 11 04:49:52 2005
> +++ perl/doop.c       Sat Jul 16 21:53:44 2005
> @@ -977,7 +977,7 @@
>           s = send - 1;
>           while (s > start && UTF8_IS_CONTINUATION(*s))
>               s--;
> -         if (utf8_to_uvchr((U8*)s, 0)) {
> +         if (is_utf8_string((U8*)s, send - s)) {
>               sv_setpvn(astr, s, send - s);
>               *s = '\0';
>               SvCUR_set(sv, s - start);
> diff -ur perl~/t/op/chop.t perl/t/op/chop.t
> --- perl~/t/op/chop.t Fri Jan 23 23:19:45 2004
> +++ perl/t/op/chop.t  Sat Jul 16 20:59:16 2005
> @@ -6,7 +6,7 @@
>      require './test.pl';
>  }
>  
> -plan tests => 133;
> +plan tests => 137;
>  
>  $_ = 'abc';
>  $c = do foo();
> @@ -221,4 +221,14 @@
>      $a = "A$/";
>      $b = chomp $a;
>      is ($b, 2);
> +}
> +
> +{
> +    # [perl #36569] chop fails on decoded string with trailing nul
> +    my $asc = "perl\0";
> +    my $utf = "perl".pack('U',0); # marked as utf8
> +    is(chop($asc), "\0", "chopping ascii NUL");
> +    is(chop($utf), "\0", "chopping utf8 NUL");
> +    is($asc, "perl", "chopped ascii NUL");
> +    is($utf, "perl", "chopped utf8 NUL");
>  }
> END OF PATCH

-- 
H.Merijn Brand        Amsterdam Perl Mongers (http://amsterdam.pm.org/)
using Perl 5.6.2, 5.8.0, 5.8.5, & 5.9.2  on HP-UX 10.20, 11.00 & 11.11,
 AIX 4.3 & 5.2, SuSE 9.2 & 9.3, and Cygwin. http://www.cmve.net/~merijn
Smoking perl: http://www.test-smoke.org,    perl QA: http://qa.perl.org
 reports  to: [EMAIL PROTECTED],                perl-qa@perl.org
