Re: [perl #36569] chop fails on decoded string with trailing nul

Yitzchak Scott-Thoennes Mon, 18 Jul 2005 20:33:39 -0700

On Sat, Jul 16, 2005 at 03:48:10PM +0200, H.Merijn Brand wrote:
> On Sat, 16 Jul 2005 22:05:13 +0900, SADAHIRO Tomoyuki <[EMAIL PROTECTED]>
> wrote:
> 
> > > This is a bug report for perl from [EMAIL PROTECTED],
> > > generated with the help of perlbug 1.35 running under perl v5.8.4.
> > > 
> > > I ran into this, and wondered if it is a bug.
> > > 
> > > I have tested on perl 5.8.4 with Encode.pm version 1.99_01 (from
> > > Debian package) and 2.10 (from CPAN).
> > 
> > Thanks for the report.
> 
> Thanks for the fast patch. Applied as change #25158
> 
> > utf8_to_uvchr((U8*)s, 0) used in do_chop() returns 0,
> > not only if the octet sequence from *s is malformed,
> > but also if *s == '\0'. The return value 0 should be
> > for U+0000 (NUL) rather than malformedness.  Oops :-<
> > 
> > P.S. By the way, when the string in utf8 ends with malformed
> > octet(s), what should chop() do?
> > So far it has returned undef without modifying the string.
> 
> Seems reasonable, though just cutting off one byte of the string would maybe
> be more of an expected behaviour. Maybe


Was there more to that sentence?

I'd vote for removing and returning a malformed char, from the last
non-continuation byte on (or just the unexpected continuation bytes,
if the problem was too many of them).
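
A minimal sketch of that rule in C, to make the two cases concrete. It is
not taken from the perl source: chop_len() and utf8_expected_len() are
hypothetical names, only standard 1 to 4 byte sequences are assumed, and
nothing here checks for overlong or otherwise invalid encodings.

```c
#include <stddef.h>

/* Expected total length of a UTF-8 sequence, judged from its first byte.
   Hypothetical helper: bytes that cannot start a standard sequence are
   treated as a one-byte (malformed) char. */
static size_t utf8_expected_len(unsigned char lead)
{
    if (lead < 0x80) return 1;             /* ASCII, including NUL */
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;                              /* invalid lead byte */
}

/* Number of bytes a chop would remove from the end of buf[0..len). */
size_t chop_len(const unsigned char *buf, size_t len)
{
    size_t ncont = 0;                      /* trailing continuation bytes */

    while (ncont < len && (buf[len - 1 - ncont] & 0xC0) == 0x80)
        ncont++;

    if (ncont == len)
        return len;                        /* empty, or nothing but continuation bytes */

    size_t expected = utf8_expected_len(buf[len - 1 - ncont]);

    if (ncont + 1 > expected)
        return ncont - (expected - 1);     /* too many: only the unexpected ones */

    return ncont + 1;                      /* complete or truncated char, from its lead byte on */
}
```

Under these assumptions a trailing NUL is an ordinary one-byte char, a
truncated three-byte sequence such as E2 82 comes off as one malformed
char of two bytes, and for C3 A9 A9 only the extra A9 is removed.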

That way, the data error is propagated onto the return value (as IMO
it should be), and a full-buffer problem will result in at most one
bad char.  In fact, I could see it being advantageous for buffering
code (both XS and perl) to be able to rely on this:

   fill buffer with bytes
   chop char and set aside
   process buffer
   move chopped char to start of buffer
   repeat
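
The loop above could look roughly like this in C. This is a sketch under
stated assumptions, not XS code: stream_chunks(), tail_len(), count_chunk()
and run_demo() are hypothetical names, the input is an in-memory byte array
rather than a file handle, and the "chop" is a simplified rule that sets
aside everything from the last non-continuation byte on, capped at 4 bytes
so that the loop always makes progress.

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 8   /* tiny on purpose; must be larger than 4 */

/* Bytes in the trailing char: the last non-continuation byte plus up to
   three continuation bytes after it (so never more than 4). */
static size_t tail_len(const unsigned char *buf, size_t len)
{
    size_t n = 0;
    while (n < len && n < 3 && (buf[len - 1 - n] & 0xC0) == 0x80)
        n++;
    return n < len ? n + 1 : len;
}

typedef void (*process_fn)(const unsigned char *buf, size_t len, void *ctx);

void stream_chunks(const unsigned char *src, size_t src_len,
                   process_fn process, void *ctx)
{
    unsigned char buf[BUF_SIZE];
    size_t held = 0;   /* bytes of the chopped char at the start of buf */
    size_t pos = 0;    /* read position in src */

    while (pos < src_len) {
        /* fill buffer with bytes */
        size_t room = BUF_SIZE - held;
        size_t take = src_len - pos < room ? src_len - pos : room;
        memcpy(buf + held, src + pos, take);
        pos += take;
        size_t len = held + take;

        /* chop char and set aside (at end of input nothing more can
           arrive, so hold nothing back) */
        size_t tail = pos < src_len ? tail_len(buf, len) : 0;

        /* process buffer */
        process(buf, len - tail, ctx);

        /* move chopped char to start of buffer */
        memmove(buf, buf + len - tail, tail);
        held = tail;
    }
}

/* Hypothetical consumer: counts chunks that start in the middle of a char. */
struct stats { size_t bytes, chunks, split; };

void count_chunk(const unsigned char *buf, size_t len, void *ctx)
{
    struct stats *st = ctx;
    st->bytes += len;
    st->chunks++;
    if (len > 0 && (buf[0] & 0xC0) == 0x80)
        st->split++;
}

struct stats run_demo(const unsigned char *src, size_t src_len)
{
    struct stats st = {0, 0, 0};
    stream_chunks(src, src_len, count_chunk, &st);
    return st;
}
```

With an 8-byte buffer, an input whose eighth and ninth bytes are C3 A9
would be cut between the two by a plain fixed-size read. Here the C3 is
set aside and rejoined with A9 in the next round, so the consumer never
sees a chunk that starts with a continuation byte.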
