[perl.git] branch smoke-me/khw-smoke, created. v5.15.9-220-g2183a1b

Karl Williamson Mon, 23 Apr 2012 13:08:48 -0700

In perl.git, the branch smoke-me/khw-smoke has been created

<http://perl5.git.perl.org/perl.git/commitdiff/2183a1b76ab5b68cfed4d5680f6cefa346df2e49?hp=0000000000000000000000000000000000000000>


        at  2183a1b76ab5b68cfed4d5680f6cefa346df2e49 (commit)

- Log -----------------------------------------------------------------
commit 2183a1b76ab5b68cfed4d5680f6cefa346df2e49
Author: Karl Williamson <[email protected]>
Date:   Mon Apr 23 13:28:32 2012 -0600

    utf8.c: White-space only
    
    This outdents to account for the removal of a surrounding block.

M       utf8.c

commit cb0df87f6337508dbe548c66a0da75424397da8a
Author: Karl Williamson <[email protected]>
Date:   Wed Apr 18 22:14:15 2012 -0600

    is_utf8_char_slow(): Avoid accepting overlongs
    
    There are possible overlong sequences that this function blindly
    accepts.  Instead of developing the code to figure this out, turn this
    function into a wrapper for utf8n_to_uvuni() which already has this
    check.

M       utf8.c
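
For readers unfamiliar with the term, an overlong sequence encodes a code point in more bytes than the shortest form requires. The following is a minimal sketch of that test, not the code in utf8.c. The names min_utf8_len and is_overlong are hypothetical, and the sketch assumes standard UTF-8 with at most 4 bytes per character, whereas Perl accepts longer extended sequences.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the shortest standard UTF-8 length for cp. */
static size_t min_utf8_len(uint32_t cp)
{
    if (cp < 0x80)    return 1;
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* Hypothetical helper: nonzero if cp was decoded from more bytes
 * (used_len) than its shortest encoding needs. */
static int is_overlong(uint32_t cp, size_t used_len)
{
    return used_len > min_utf8_len(cp);
}
```

For instance, the byte pair C0 AF decodes to U+002F, which fits in one byte, so is_overlong(0x2F, 2) is nonzero.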

commit 858f90d599a40497816c7d1fba63652348925bba
Author: Karl Williamson <[email protected]>
Date:   Wed Apr 18 18:32:57 2012 -0600

    perlapi: Update for changes in utf8 decoding

M       utf8.c

commit 4b714b4aa5bdbbc0870d4929703fa97222678f0d
Author: Karl Williamson <[email protected]>
Date:   Wed Apr 18 17:36:01 2012 -0600

    utf8.c: refactor utf8n_to_uvuni()
    
    The prior version had a number of issues, some of which have been taken
    care of in previous commits; and some are left to do.
    
    The goal when presented with malformed input is to consume as few bytes
    as possible so as to position the input for the next try to the first
    possible byte that could be the beginning of a character.  We don't want
    to consume too few bytes, so that the next call has us thinking that
    what is the middle of a character is really the beginning; nor do we
    want to consume too many, so as to skip valid input characters.
    The previous code could do both of these in various circumstances.
    
    In some cases it believed that the first byte in a character is correct,
    and skipped looking at the rest of the bytes in the sequence.  This is
    wrong when just that first byte is garbled.  We have to look at all
    bytes in the expected sequence to make sure it hasn't been prematurely
    terminated.
    
    Likewise when we get an overflow: we have to keep looking at each byte
    in the alleged sequence.  It may be that the initial byte was garbled to
    give us an apparent large number, but the actual sequence is shorter
    than expected, and there really wouldn't have been an overflow.  We
    want to position the pointer for the next call to be the beginning of
    the next potentially good character.
    
    This fixes a long-standing TODO from an externally supplied utf8 decode
    test suite.
    
    It is unclear that the old algorithm for finding overflow catches all
    such cases.  This now uses an algorithm suggested by Hugo van der Sanden
    that should work in all instances.
    
    Another bug is that the code was careless about what happens when an
    allowed malformation happens. For example, a sequence should not start
    with a continuation byte.  If that malformation is allowed, the code
    pretends it is a start byte and extracts the length of the sequence from
    that.  But pretending it is a start byte is not the same thing as it
    being a start byte, and that extracted length is bogus.
    
    Yet another bug fixed is that the utf8 warning category had to have been
    turned on to get warnings that should have been raised when only the
    surrogate, non_unicode, or nonchar categories were on.
    
    And yet another change is that, given malformed input with no warnings,
    this function used to return whatever it had computed so far.  But this
    is really invalid garbage.  Return the REPLACEMENT CHARACTER instead.
    
    Thanks to Hugo van der Sanden for reviewing and finding problems with an
    earlier version of these commits.

M       Porting/perl5160delta.pod
M       t/op/utf8decode.t
M       utf8.c
M       utf8.h
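
The consumption policy described in this commit message can be illustrated with a small standalone decoder. This is a sketch under stated assumptions, not the refactored utf8n_to_uvuni(): the name decode_one and its signature are hypothetical, and it handles only standard UTF-8 up to 4 bytes. It does no overflow, surrogate, or non-character checking and raises no warnings. On malformed input it returns U+FFFD and stops before the first byte that could begin another character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHARACTER 0xFFFDu

/* Hypothetical sketch: decode one character from s[0..avail) and store
 * the number of bytes consumed in *consumed. */
static uint32_t decode_one(const uint8_t *s, size_t avail, size_t *consumed)
{
    /* Smallest code point that legitimately needs 2, 3, or 4 bytes. */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    uint8_t b0;
    size_t expected, i;
    uint32_t cp;

    assert(s != NULL && consumed != NULL && avail > 0);
    b0 = s[0];

    if (b0 < 0x80) { *consumed = 1; return b0; }
    if ((b0 & 0xE0) == 0xC0)      { expected = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { expected = 3; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { expected = 4; cp = b0 & 0x07; }
    else {
        /* Stray continuation or unsupported start byte: skip only it,
         * and do not trust any length derived from it. */
        *consumed = 1;
        return REPLACEMENT_CHARACTER;
    }

    /* Examine every byte of the expected sequence; the start byte alone
     * is not trusted. */
    for (i = 1; i < expected; i++) {
        if (i >= avail || (s[i] & 0xC0) != 0x80) {
            /* Ended early: s[i], if present, may begin the next
             * character, so leave it unconsumed. */
            *consumed = i;
            return REPLACEMENT_CHARACTER;
        }
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    *consumed = expected;
    return cp < min_cp[expected] ? REPLACEMENT_CHARACTER : cp;
}
```

Given the bytes E2 41 42, the first call consumes only E2 and returns U+FFFD, so the next call starts at 41 and decodes 'A' instead of skipping it.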

commit cda5e1cf156ad68813ce4801fe00a324b18f5a8e
Author: Karl Williamson <[email protected]>
Date:   Wed Apr 18 16:48:29 2012 -0600

    utf8n_to_uvuni: Avoid reading outside of buffer
    
    Prior to this patch, if the first byte of a UTF-8 sequence indicated
    that the sequence occupied n bytes, but the input parameters indicated
    that fewer were available, the function still attempted to read all n.

M       utf8.c
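
The kind of guard this fix implies can be sketched as follows: clamp the length announced by the first byte to the number of bytes the caller says are available. Both function names are hypothetical, and the sketch assumes standard UTF-8 with at most 4 bytes per character. It is not the code in utf8.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sequence length announced by the first byte,
 * or 0 if the byte cannot start a standard UTF-8 sequence. */
static size_t announced_len(uint8_t b0)
{
    if (b0 < 0x80)           return 1;
    if ((b0 & 0xE0) == 0xC0) return 2;
    if ((b0 & 0xF0) == 0xE0) return 3;
    if ((b0 & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Hypothetical helper: how many bytes of s[0..avail) may be examined
 * for the character starting at s[0] without leaving the buffer. */
static size_t bytes_to_examine(const uint8_t *s, size_t avail)
{
    size_t n;

    assert(s != NULL);
    if (avail == 0)
        return 0;
    n = announced_len(s[0]);
    if (n == 0)
        return 1;                 /* not a start byte: look at it alone */
    return n < avail ? n : avail; /* never past the end of the input */
}
```

With a 2-byte buffer holding F0 9F, the first byte announces 4 bytes but bytes_to_examine() returns 2.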

commit befea4e91fc36816b95a20d1bde4ccb494694dac
Author: Karl Williamson <[email protected]>
Date:   Wed Apr 18 16:35:39 2012 -0600

    utf8.c: Clarify and correct pod
    
    Some of these were spotted by Hugo van der Sanden

M       utf8.c

commit a32a9df0ee8ad375c4ab49412064c2579a7220cf
Author: Karl Williamson <[email protected]>
Date:   Wed Apr 18 16:20:22 2012 -0600

    utf8.c: Use macros instead of if..else.. sequence
    
    There are two existing macros that do the job that this longish sequence
    does.  One, UTF8SKIP(), does an array lookup and is very likely to be in
    the machine's cache as it is used ubiquitously when processing UTF-8.
    The other is a simple test and shift.  These simplify the code and
    should speed things up as well.

M       utf8.c
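
To illustrate why a table lookup can replace an if..else chain, here is a minimal sketch. It is not the definition of UTF8SKIP(): the name skip_len and the 16-entry table indexed by the high nibble are assumptions made for brevity, and the table covers standard UTF-8 only, treating every byte F0..FF as starting a 4-byte sequence.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table: sequence length by the high nibble of the first
 * byte. Continuation bytes (nibbles 8..B) map to 1 so a scanner always
 * advances. */
static const uint8_t len_by_high_nibble[16] = {
    1, 1, 1, 1, 1, 1, 1, 1,   /* 0x00..0x7F: ASCII */
    1, 1, 1, 1,               /* 0x80..0xBF: continuation bytes */
    2, 2,                     /* 0xC0..0xDF: 2-byte start */
    3,                        /* 0xE0..0xEF: 3-byte start */
    4                         /* 0xF0..0xFF: treated as 4-byte start */
};

/* Hypothetical helper: one shift and one array read, no branches. */
static size_t skip_len(uint8_t first_byte)
{
    return len_by_high_nibble[first_byte >> 4];
}
```

A small table that is read on nearly every character tends to stay in cache, which is the performance argument the commit message makes.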

commit 6b260a2ed60f875f9f143eca967477ee1f6474b3
Author: Karl Williamson <[email protected]>
Date:   Wed Apr 18 15:25:28 2012 -0600

    utf8.h: Use correct definition of start byte
    
    The previous definition allowed for (illegal) overlongs.  The uses of
    this macro in the core assume that it is accurate.  The inaccuracy can
    cause such code to fail.

M       utf8.h
M       utfebcdic.h
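
The difference between a loose and a strict start-byte test can be sketched as below. In standard UTF-8 the bytes C0 and C1 can only begin overlong 2-byte sequences, so a strict test rejects them. The function names are hypothetical and the code is an ASCII-platform illustration, not the macro in utf8.h or utfebcdic.h. The upper bound is left open on the assumption that the caller validates longer sequences separately.

```c
#include <stdint.h>

/* Hypothetical: accepts C0 and C1, which can only start overlongs. */
static int is_start_loose(uint8_t c)
{
    return c >= 0xC0;
}

/* Hypothetical: C2 is the first byte that starts a shortest-form
 * 2-byte sequence (U+0080 encodes as C2 80). */
static int is_start_strict(uint8_t c)
{
    return c >= 0xC2;
}
```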

commit 8612d47db9cfa128e6bdd587ffa456297bbd1d85
Author: Christian Hansen <[email protected]>
Date:   Wed Apr 18 14:32:16 2012 -0600

    utf8.h: Use correct UTF-8 downgradeable definition
    
    Previously, the macro changed by this commit would accept overlong
    sequences.
    
    This patch was changed by the committer to include EBCDIC changes;
    and in the non-EBCDIC case, to save a test, by using a mask instead, in
    keeping with the prior version of the code.

M       AUTHORS
M       t/op/print.t
M       utf8.h
M       utfebcdic.h
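
A mask-based test of the kind this commit message mentions might look like the sketch below. Code points U+0080 through U+00FF encode as a start byte of C2 or C3 followed by one continuation byte, so one mask and one compare identify a start byte whose character fits in a single non-UTF-8 byte. The function names are hypothetical, the code applies to ASCII platforms only, and it is not the macro in utf8.h.

```c
#include <stdint.h>

/* Hypothetical: true only for C2 and C3. A looser mask such as
 * (c & 0xFC) == 0xC0 would also accept the overlong starts C0 and C1. */
static int is_downgradeable_start(uint8_t c)
{
    return (c & 0xFE) == 0xC2;
}

/* Hypothetical: combine a downgradeable start byte and its continuation
 * byte into the single byte 0x80..0xFF. */
static uint8_t downgrade_pair(uint8_t start, uint8_t cont)
{
    return (uint8_t)(((start & 0x03) << 6) | (cont & 0x3F));
}
```

For example, the pair C3 A9 downgrades to the single byte E9.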

commit 837c169eab0939dd9157fcbcfe14a697a85d5170
Author: Brian Fraser <[email protected]>
Date:   Fri Apr 20 22:09:56 2012 -0300

    Make unicode label tests use unicode_eval.
    
    A recent change exposed a faulty test in t/uni/labels.t.
    Previously, a downgraded label passed to eval under 'use utf8;'
    would've been erroneously considered UTF-8 and the tests
    would pass. Now it's correctly reported as illegal UTF-8
    unless unicode_eval is in effect.

M       t/uni/labels.t

commit 3959739a2c75006a5c681895eec67aec34630749
Author: Karl Williamson <[email protected]>
Date:   Thu Mar 22 20:00:26 2012 -0600

    embed.fnc: Change formal param name to match docs
    
    This is purely so that perlapi will be accurate in this regard.

M       embed.fnc
M       proto.h

commit f7dc93f2998e581f36b6e8b523507737b615bc6b
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 21 08:41:44 2012 -0600

    doio.c: Add some comments

M       doio.c
-----------------------------------------------------------------------

--
Perl5 Master Repository
