[perl.git] branch smoke-me/khw-smoke, created. v5.15.9-159-g762a88e

Karl Williamson Thu, 19 Apr 2012 12:16:10 -0700

In perl.git, the branch smoke-me/khw-smoke has been created

<http://perl5.git.perl.org/perl.git/commitdiff/762a88e1354db58815138deea62f4c5c7d68a2ae?hp=0000000000000000000000000000000000000000>


        at  762a88e1354db58815138deea62f4c5c7d68a2ae (commit)

- Log -----------------------------------------------------------------
commit 762a88e1354db58815138deea62f4c5c7d68a2ae
Author: Karl Williamson <[email protected]>
Date:   Thu Apr 19 13:13:49 2012 -0600

    uni/labels.t: Add feature unicode_eval
    
    This is temporary

M       t/uni/labels.t

commit edd0e82060e5a69a7fc73f9416574d06ab15d22f
Author: Karl Williamson <[email protected]>
Date:   Wed Apr 18 22:14:15 2012 -0600

    is_utf8_char_slow(): Avoid accepting overlongs
    
    There are possible overlong sequences that this function blindly
    accepts.  Instead of developing the code to figure this out, turn this
    function into a wrapper for utf8n_to_uvuni() which already has this
    check.

M       utf8.c
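The "check" that utf8n_to_uvuni() is said to already perform comes down to comparing a sequence's length with the length of the shortest encoding of the value it decodes to. Below is a minimal sketch of that comparison. It assumes standard UTF-8 (at most four bytes, up to U+10FFFF), not Perl's extended forms or UTF-EBCDIC, and the helper names are hypothetical rather than perl internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes needed by the shortest (non-overlong)
 * standard UTF-8 encoding of cp.  Assumes cp <= 0x10FFFF. */
static size_t shortest_utf8_len(uint32_t cp)
{
    if (cp < 0x80)    return 1;
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* Hypothetical check: a seq_len-byte sequence that decoded to cp is
 * overlong if cp could have been encoded in fewer bytes. */
static int is_overlong(uint32_t cp, size_t seq_len)
{
    assert(seq_len >= 1);
    return seq_len > shortest_utf8_len(cp);
}
```

For example, the two bytes C0 AF decode to U+002F ('/'), which needs only one byte, so that sequence is overlong.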

commit 98796faf1093ebdebea5dd3e345ce96744500c9f
Author: Karl Williamson <[email protected]>
Date:   Wed Apr 18 18:32:57 2012 -0600

    perlapi: Update for changes in utf8 decoding

M       utf8.c

commit 91eb706253d0741855918d9a717ddff118bdb848
Author: Karl Williamson <[email protected]>
Date:   Wed Apr 18 18:28:55 2012 -0600

    utf8n_to_uvuni(): Return REPLACEMENT not garbage
    
    Given malformed input with no warnings, this function used to return
    whatever it had computed so far.  But this is really invalid garbage.
    Return the REPLACEMENT CHARACTER instead.

M       Porting/perl5160delta.pod
M       utf8.c
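To show the difference between returning a partial value and returning U+FFFD, here is a deliberately simplified, hypothetical decoder that handles only two-byte sequences. It is a sketch, not utf8n_to_uvuni() itself. Whenever it detects a malformation, it returns the REPLACEMENT CHARACTER instead of the bits it has accumulated so far.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHARACTER 0xFFFDu

/* Hypothetical two-byte-only decoder.  On malformed input it returns
 * U+FFFD rather than the partially computed value. */
static uint32_t decode_two_byte(const uint8_t *s, size_t len)
{
    assert(s != NULL);
    if (len < 2)                         /* not enough input */
        return REPLACEMENT_CHARACTER;
    if (s[0] < 0xC2 || s[0] > 0xDF)      /* not a non-overlong 2-byte start */
        return REPLACEMENT_CHARACTER;
    uint32_t cp = s[0] & 0x1F;           /* partial value: garbage if we stop here */
    if ((s[1] & 0xC0) != 0x80)           /* second byte is not a continuation */
        return REPLACEMENT_CHARACTER;
    return (cp << 6) | (s[1] & 0x3F);
}
```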

commit 12e14d97f36f9bef09bcf568faf6a03a466a2c61
Author: Karl Williamson <[email protected]>
Date:   Wed Apr 18 17:36:01 2012 -0600

    utf8.c: refactor utf8n_to_uvuni()
    
    The prior version had a number of issues, some of which have been taken
    care of in previous commits, while others are left to do.
    
    The goal when presented with malformed input is to consume as few bytes
    as possible, positioning the input for the next try at the first byte
    that could possibly begin a character.  We don't want to consume too
    few bytes, or the next call will mistake the middle of a character for
    its beginning; nor do we want to consume too many, and so skip valid
    input characters.
    The previous code could do both of these in various circumstances.
    
    In some cases it trusted the first byte of a character to be correct,
    and skipped looking at the rest of the bytes in the sequence.  This is
    wrong when just that first byte is garbled.  We have to look at all
    bytes in the expected sequence to make sure it hasn't been prematurely
    terminated.
    
    Likewise when we get an overflow: we have to keep looking at each byte
    in the alleged sequence.  It may be that the initial byte was garbled to
    give us an apparently large number, but the actual sequence is shorter
    than expected, and there really wouldn't have been an overflow.  We
    want to position the pointer for the next call to be the beginning of
    the next potentially good character.
    
    This fixes a long-standing TODO from an externally supplied utf8 decode
    test suite.
    
    Another bug is that the code was careless about what happens when a
    malformation is allowed.  For example, a sequence should not start
    with a continuation byte.  If that malformation is allowed, the code
    pretends it is a start byte and extracts the length of the sequence from
    that.  But pretending it is a start byte is not the same thing as it
    being a start byte, and that extracted length is bogus.

M       t/op/utf8decode.t
M       utf8.c
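Below is a minimal sketch of the resynchronization rule described above. When input is malformed, look at every byte in the expected sequence and stop at the first one that is not a continuation byte. When the first byte cannot start a sequence at all, consume just that one byte and do not derive a length from it. The sketch assumes standard UTF-8 lead bytes, and the function names are hypothetical. It also omits the finer second-byte range checks (for example after 0xE0 or 0xF4).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int is_continuation(uint8_t b) { return (b & 0xC0) == 0x80; }

/* Sequence length implied by start byte b, or 0 if b cannot begin a
 * sequence (a continuation byte, 0xC0/0xC1, or 0xF5..0xFF). */
static size_t expected_len(uint8_t b)
{
    if (b < 0x80) return 1;
    if (b < 0xC2) return 0;   /* continuations and overlong-only 0xC0/0xC1 */
    if (b < 0xE0) return 2;
    if (b < 0xF0) return 3;
    if (b < 0xF5) return 4;
    return 0;
}

/* How many bytes to consume at s (len > 0 available) so that the next
 * call starts at the first byte that could begin a character. */
static size_t resync_len(const uint8_t *s, size_t len)
{
    assert(s != NULL && len > 0);
    size_t expect = expected_len(s[0]);
    if (expect == 0)
        return 1;             /* never trust a length taken from a non-start byte */
    size_t i = 1;
    while (i < expect && i < len && is_continuation(s[i]))
        i++;                  /* examine every byte of the alleged sequence */
    return i;
}
```

For E2 82 41, the start byte promises three bytes, but 0x41 ends the sequence early. The function therefore consumes two bytes and leaves 'A' for the next call.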

commit eca8b51a86eef094bf12398bb85076deb813278e
Author: Karl Williamson <[email protected]>
Date:   Wed Apr 18 17:19:31 2012 -0600

    utf8n_to_uvuni(): Move checking for >32 bit code points
    
    This just moves the check for whether the code point is non-portable
    to 32-bit machines to later in the function.  These code points aren't
    representable at all on EBCDIC platforms.
    
    As a result, on platforms where these code points aren't
    representable, the error returned will be overflow instead of the
    non-portable error.  But the move fixes the problem where the first
    byte is garbled and the input really isn't such a large code point.
    Prior to this patch, the first byte was treated as gospel and the
    intervening bytes weren't examined, leaving the pointer to the next
    input byte advanced too far.

M       utf8.c
M       utf8.h
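The ordering this commit describes can be sketched as follows: first verify that every expected continuation byte is present, and only then act on what the start byte claims about the code point's size. For illustration only, this hypothetical sketch assumes that start bytes 0xFE and 0xFF introduce the non-portable range. The expected length is a parameter, presumed to be computed elsewhere (for example from a skip table).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum decode_status { DECODE_OK, DECODE_TOO_SHORT, DECODE_NON_PORTABLE };

/* Hypothetical: expect is the length implied by s[0]; *consumed receives
 * the number of bytes examined. */
static enum decode_status check_sequence(const uint8_t *s, size_t len,
                                         size_t expect, size_t *consumed)
{
    assert(s != NULL && len > 0 && expect >= 1 && consumed != NULL);
    size_t i = 1;
    while (i < expect && i < len && (s[i] & 0xC0) == 0x80)
        i++;
    *consumed = i;
    if (i < expect)
        return DECODE_TOO_SHORT;      /* a garbled start byte shows up here first */
    if (s[0] >= 0xFE)                 /* assumed non-portable range; trusted only now */
        return DECODE_NON_PORTABLE;
    return DECODE_OK;
}
```

If 0xFE is followed immediately by an ASCII byte, the result is "too short" after consuming a single byte. It is not reported as an enormous code point, and the pointer is not pushed past valid input.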

commit ebf9139ad6c02435f9568183ea91df67045614e7
Author: Karl Williamson <[email protected]>
Date:   Wed Apr 18 16:48:29 2012 -0600

    utf8n_to_uvuni: Avoid reading outside of buffer
    
    Prior to this patch, if the first byte of a UTF-8 sequence indicated
    that the sequence occupied n bytes, but the input parameters indicated
    that fewer were available, the function still tried to read all n,
    reading past the end of the buffer.

M       utf8.c
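The fix amounts to bounding the read loop by the number of bytes actually available, not by the count the start byte claims. Below is a minimal, hypothetical sketch for standard UTF-8 sequences of two to four bytes. When the buffer ends before the sequence is complete, it reports failure instead of reading further.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: decode a sequence whose start byte claims expect bytes,
 * never reading beyond s[avail - 1].  Returns 1 and stores the value on
 * success, 0 on a truncated or interrupted sequence. */
static int decode_bounded(const uint8_t *s, size_t avail, size_t expect,
                          uint32_t *out)
{
    assert(s != NULL && out != NULL && avail > 0 && expect >= 2 && expect <= 4);
    uint32_t cp = s[0] & (0x7F >> expect);        /* payload bits of the start byte */
    size_t limit = expect < avail ? expect : avail;
    for (size_t i = 1; i < limit; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                             /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (limit < expect)
        return 0;                                 /* buffer ended early */
    *out = cp;
    return 1;
}
```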

commit 88d1cd1162b257a09b8cc8977df82e33630883b5
Author: Karl Williamson <[email protected]>
Date:   Wed Apr 18 16:35:39 2012 -0600

    utf8.c: Clarify pod

M       utf8.c

commit 84fa68d2b6e66b78b99bc15e817e7bca52cf0f9f
Author: Karl Williamson <[email protected]>
Date:   Wed Apr 18 16:20:22 2012 -0600

    utf8.c: Use macros instead of if..else.. sequence
    
    There are two existing macros that do the job that this longish sequence
    does.  One, UTF8SKIP(), does an array lookup and is very likely to be in
    the machine's cache as it is used ubiquitously when processing UTF-8.
    The other is a simple test and shift.  These simplify the code and
    should speed things up as well.

M       utf8.c
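Below is a sketch of the two techniques mentioned: a 256-entry lookup table that maps a start byte to its sequence length (the idea behind UTF8SKIP()), and a start-byte mask computed with one test and one shift. The table, function, and macro names are hypothetical. For brevity the table is built once at startup, and it covers only standard UTF-8 lengths, whereas perl's own table also handles its extended forms.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical skip table: start byte -> sequence length. */
static unsigned char utf8_skip[256];

static void init_utf8_skip(void)
{
    for (int b = 0; b < 256; b++) {
        if (b < 0xC0)      utf8_skip[b] = 1;  /* ASCII; stray continuations skip 1 */
        else if (b < 0xE0) utf8_skip[b] = 2;
        else if (b < 0xF0) utf8_skip[b] = 3;
        else if (b < 0xF8) utf8_skip[b] = 4;
        else               utf8_skip[b] = 1;  /* not valid in standard UTF-8 */
    }
}

/* Sequence length with a single array lookup instead of an if..else chain. */
static size_t seq_len(const unsigned char *s)
{
    assert(s != NULL);
    return utf8_skip[*s];
}

/* Payload-bit mask of the start byte of a len-byte sequence (len >= 2).
 * The test guards the shift for long extended sequences, whose start
 * byte carries no payload bits. */
#define START_MASK(len) ((len) >= 7 ? 0x00 : (0x1F >> ((len) - 2)))
```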

commit 625fd54e0370a6313105d6879baf80b30220e0a5
Author: Karl Williamson <[email protected]>
Date:   Wed Apr 18 15:25:28 2012 -0600

    utf8.h: Use correct definition of start byte
    
    The previous definition allowed for (illegal) overlongs.

M       utf8.h
M       utfebcdic.h

commit dd0a58cb73dd66e8154b2f274b287141887b401f
Author: Christian Hansen <[email protected]>
Date:   Wed Apr 18 14:32:16 2012 -0600

    utf8.h: Use correct UTF-8 downgradeable definition
    
    Previously, the macro changed by this commit would accept overlong
    sequences.
    
    The committer changed the original patch to swap a mask instead of a
    test, in keeping with the prior version of the code, and to include
    EBCDIC changes.

M       AUTHORS
M       t/op/print.t
M       utf8.h
M       utfebcdic.h

commit a7c4bfeea1d92ff97c09f58719491bb143ad1732
Author: Karl Williamson <[email protected]>
Date:   Wed Apr 18 14:07:33 2012 -0600

    test.pl: Add fresh_perl_unlike()

M       t/test.pl

commit 43a357c42662b613c86e4353af7e90241bf47e0e
Author: Karl Williamson <[email protected]>
Date:   Thu Mar 22 20:00:26 2012 -0600

    embed.fnc: Change formal param name to match docs
    
    This is purely so that perlapi will be accurate in this regard.

M       embed.fnc
M       proto.h

commit 1cca3103643a286b7a44a707698006a716051083
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 21 08:41:44 2012 -0600

    doio.c: Add some comments

M       doio.c
-----------------------------------------------------------------------

--
Perl5 Master Repository
