[perl.git] branch smoke-me/khw-encode, created. v5.25.5-98-gaa167ec

Karl Williamson Tue, 11 Oct 2016 20:05:16 -0700

In perl.git, the branch smoke-me/khw-encode has been created

<http://perl5.git.perl.org/perl.git/commitdiff/aa167eca067c69d49627b386b51b8b1fbcd93bc2?hp=0000000000000000000000000000000000000000>


        at  aa167eca067c69d49627b386b51b8b1fbcd93bc2 (commit)

- Log -----------------------------------------------------------------
commit aa167eca067c69d49627b386b51b8b1fbcd93bc2
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 15 09:09:07 2016 -0600

    XXX incomplete: Add sv_utf8_decode_flags

M       embed.fnc
M       embed.h
M       proto.h
M       sv.c
M       sv.h

commit b26d64c9bbbc4e9226eded7f8a37f2b3c66da676
Author: Karl Williamson <[email protected]>
Date:   Wed Sep 14 22:40:23 2016 -0600

    customized

M       t/porting/customized.dat

commit cf22cf35d4ed60ce1e55a2c2343c801adf16ee60
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 1 12:20:52 2016 -0600

    Use core REPLACEMENT CHARACTER definition
    
    This allows the code to now work on EBCDIC as well.

M       cpan/Encode/Encode/encode.h

commit 43fd692352e8ebcdf02ecccce0c53a7200e8cf19
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 1 12:16:00 2016 -0600

    XXX commit msg: Encode.xs: Rmv unused function

M       cpan/Encode/Encode.xs

commit c045e38c18214c03f78fd190afe01d329c539415
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 1 12:12:39 2016 -0600

    Encode.xs: white-space only

M       cpan/Encode/Encode.xs

commit 7127e301537309f15fed5b5374a299960526ef33
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 1 12:12:06 2016 -0600

    XXX maybe more in commit msg: Speed up Encode UTF-8 validation checking
    
    This replaces the current scheme for checking UTF-8 validity by one
    in which normal processing doesn't require having to decode the UTF-8
    into code points.  The copying of characters individually from the input
    to the output is changed to be a single operation for each entire span
    of valid input at once.
    
    Thus in the normal case, what ends up happening is a tight loop to
    check the validity, and then a memmove of the entire input to the
    output, then return.
    
    If an error is found, it copies all the valid input before the error,
    then handles the character in error, then positions to the next input
    position, and repeats the whole process starting from there.
    
    It uses the functionality available from the Perl 5 core to look at
    just the bytes that comprise the UTF-8 to make the determination,
    converting to code points only those that are defective somehow, in
    order to display them in warnings and error messages.
    
    Thus, this does not need to know about the intricacies of UTF-8
    malformations, relying on the core to handle this.
    
    This cannot be pushed to CPAN until Devel::PPPort has been updated to
    implement all the functions now needed.

M       cpan/Encode/Encode.pm
M       cpan/Encode/Encode.xs
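
The validate-then-copy scheme described in the commit above can be
sketched in standalone C. This is only an illustration under stated
assumptions, not the Encode.xs code: valid_char_len() and
copy_valid_spans() are hypothetical names, the validator accepts only
strict standard UTF-8 (the Perl core is more permissive and is
configurable), and the error handling is reduced to writing U+FFFD and
skipping one byte.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte length of the well-formed UTF-8 character at s,
 * or 0 if the bytes there are malformed.  Only bytes are inspected; no code
 * point is computed.  Assumes strict standard UTF-8. */
static size_t valid_char_len(const unsigned char *s, const unsigned char *end)
{
    size_t avail = (size_t)(end - s);

    if (avail == 0) return 0;
    if (s[0] < 0x80) return 1;
    if (s[0] >= 0xC2 && s[0] <= 0xDF)
        return (avail >= 2 && (s[1] & 0xC0) == 0x80) ? 2 : 0;
    if (s[0] >= 0xE0 && s[0] <= 0xEF) {
        if (avail < 3 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
            return 0;
        if (s[0] == 0xE0 && s[1] < 0xA0) return 0;   /* overlong */
        if (s[0] == 0xED && s[1] > 0x9F) return 0;   /* surrogate */
        return 3;
    }
    if (s[0] >= 0xF0 && s[0] <= 0xF4) {
        if (avail < 4 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80
                      || (s[3] & 0xC0) != 0x80)
            return 0;
        if (s[0] == 0xF0 && s[1] < 0x90) return 0;   /* overlong */
        if (s[0] == 0xF4 && s[1] > 0x8F) return 0;   /* above U+10FFFF */
        return 4;
    }
    return 0;   /* stray continuation byte or invalid start byte */
}

/* Copy src to dst one valid span at a time.  Each offending byte is replaced
 * by U+FFFD (an assumed policy).  dst must have room for 3 * len bytes.
 * Returns the number of bytes written. */
static size_t copy_valid_spans(const unsigned char *src, size_t len,
                               unsigned char *dst)
{
    static const unsigned char repl[3] = { 0xEF, 0xBF, 0xBD };
    const unsigned char *s = src;
    const unsigned char *end = src + len;
    unsigned char *d = dst;

    while (s < end) {
        const unsigned char *span = s;
        size_t n;

        /* Tight loop: advance over valid characters without decoding. */
        while (s < end && (n = valid_char_len(s, end)) != 0)
            s += n;

        /* One copy for the whole valid span. */
        memmove(d, span, (size_t)(s - span));
        d += s - span;

        if (s < end) {              /* stopped at a malformation */
            memcpy(d, repl, sizeof repl);
            d += sizeof repl;
            s++;                    /* reposition and start over */
        }
    }
    return (size_t)(d - dst);
}
```

For fully valid input the outer loop runs once: one validation pass and
one memmove, which is the normal case the commit message describes.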

commit 70d359917524f476b6d07dbd5f8093ef564713c8
Author: Karl Williamson <[email protected]>
Date:   Mon Oct 10 21:18:37 2016 -0600

    XXX pod, delta: Add utf8n_to_uvchr_error
    
    This new function behaves like utf8n_to_uvchr(), but takes an extra
    parameter that points to a U32 which will be set to 0 if no errors are
    found; otherwise each error found will set a bit in it.  This can be
    used by the caller to figure out precisely what the error(s) is/are.
    Previously, one would have to capture and parse the warning/error
    messages raised.  The new parameter can be used, for example, to
    tailor the messages to the expected end-user's knowledge level.

M       embed.fnc
M       embed.h
M       ext/XS-APItest/APItest.xs
M       ext/XS-APItest/t/utf8.t
M       proto.h
M       utf8.c
M       utf8.h
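
The idea of reporting errors through a bit mask, as the commit above
describes, might look roughly like the following standalone C. It is a
sketch only: decode_report(), friendly_message() and the ERR_* names
and values are hypothetical, and it does not reproduce the signature or
the flag definitions of utf8n_to_uvchr_error(), which this message does
not show.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits, one per kind of error. */
#define ERR_EMPTY      (1u << 0)   /* no input bytes at all */
#define ERR_BAD_START  (1u << 1)   /* first byte cannot start a character */
#define ERR_NON_CONT   (1u << 2)   /* expected a continuation byte */
#define ERR_SHORT      (1u << 3)   /* input ends mid-character */
#define ERR_OVERLONG   (1u << 4)   /* longer than the shortest form */

/* Decode one character of 1 to 4 bytes.  *errors is set to 0 on success,
 * otherwise to the bits describing what went wrong, and 0 is returned.
 * Surrogates and above-Unicode code points are not checked here. */
static uint32_t decode_report(const unsigned char *s, size_t len,
                              size_t *used, uint32_t *errors)
{
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    uint32_t cp;
    size_t expect, i;

    *errors = 0;
    *used = 0;
    if (len == 0) { *errors |= ERR_EMPTY; return 0; }

    if (s[0] < 0x80) { *used = 1; return s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { expect = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { expect = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { expect = 4; cp = s[0] & 0x07; }
    else { *errors |= ERR_BAD_START; *used = 1; return 0; }

    for (i = 1; i < expect; i++) {
        if (i >= len) { *errors |= ERR_SHORT; break; }
        if ((s[i] & 0xC0) != 0x80) { *errors |= ERR_NON_CONT; break; }
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }
    *used = i;

    if (*errors == 0 && cp < min_cp[expect])
        *errors |= ERR_OVERLONG;
    return *errors ? 0 : cp;
}

/* A caller can pick its own wording from the bits instead of parsing
 * warning text.  Returns NULL when there is nothing to report. */
static const char *friendly_message(uint32_t errors)
{
    if (errors == 0) return NULL;
    if (errors & ERR_SHORT)
        return "The text seems to have been cut off in mid-character.";
    if (errors & ERR_OVERLONG)
        return "The text uses a non-standard encoding of a character.";
    return "The text contains bytes that are not valid UTF-8.";
}
```

The caller tests individual bits, so new error kinds can be added later
without changing how existing callers interpret the ones they know.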

commit d7c4eebd83ea92a6adfd914b5fd404ff53b020e9
Author: Karl Williamson <[email protected]>
Date:   Sat Oct 8 21:19:18 2016 -0600

    utf8n_to_uvchr():  Make a parameter const

M       embed.fnc
M       proto.h
M       utf8.c

commit 3ef3f0f6f5793be7f0ff9f3b8fbaf36bc4337efc
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 5 19:09:02 2016 -0600

    utf8n_to_uvchr(): Note multiple malformations
    
    Some UTF-8 sequences can have multiple malformations.  For example, a
    sequence can be the start of an overlong representation of a code point,
    and still be incomplete.  Until this commit what was generally done was
    to stop looking when the first malformation was found.  This was not
    correct behavior, as that malformation may be allowed, while another
    unallowed one went unnoticed.  This commit refactors the error handling
    of this function to set a flag and keep going if a malformation is found
    that doesn't preclude others.  Then each is handled in a loop at the end,
    warning if warranted.  The result is that there is a warning for each
    malformation for which warnings should be generated, and an error return
    is made if any one is disallowed.
    
    In the case of overflow, the code point is necessarily both non-Unicode
    and above 31 bits; these are not independent malformations, so only
    one warning is output: the most dire.
    
    This will speed up the normal case slightly, as the test for overflow is
    pulled out of the loop, allowing the UV to overflow.  Then a single test
    after the loop is done to see if there was overflow or not.

M       ext/XS-APItest/t/utf8.t
M       pod/perldiag.pod
M       t/op/utf8decode.t
M       utf8.c
M       utf8.h
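
The "set a flag and keep going, then handle each at the end" structure
from the commit above can be illustrated with a small standalone C
sketch. The names find_malformations(), report_malformations() and the
MALF_* bits are hypothetical, and the policy is simplified so that a
malformation warns exactly when it is not allowed; utf8.c has more
cases and finer-grained rules.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical malformation bits. */
#define MALF_BAD_START (1u << 0)
#define MALF_OVERLONG  (1u << 1)
#define MALF_NON_CONT  (1u << 2)
#define MALF_SHORT     (1u << 3)

/* Record every malformation that can be seen in the character at s,
 * rather than stopping at the first one. */
static uint32_t find_malformations(const unsigned char *s, size_t len)
{
    uint32_t found = 0;
    size_t expect, i;

    if (len == 0) return MALF_SHORT;
    if (s[0] < 0x80) return 0;
    if      ((s[0] & 0xE0) == 0xC0) expect = 2;
    else if ((s[0] & 0xF0) == 0xE0) expect = 3;
    else if ((s[0] & 0xF8) == 0xF0) expect = 4;
    else return MALF_BAD_START;     /* precludes any other check */

    /* An overlong can be recognized from the first one or two bytes.
     * Note it and keep looking. */
    if (s[0] == 0xC0 || s[0] == 0xC1) {
        found |= MALF_OVERLONG;
    }
    else if (len >= 2 && (s[1] & 0xC0) == 0x80) {
        if ((s[0] == 0xE0 && s[1] < 0xA0) || (s[0] == 0xF0 && s[1] < 0x90))
            found |= MALF_OVERLONG;
    }

    for (i = 1; i < expect; i++) {
        if (i >= len) { found |= MALF_SHORT; break; }
        if ((s[i] & 0xC0) != 0x80) { found |= MALF_NON_CONT; break; }
    }
    return found;
}

/* Handle each recorded malformation in one loop at the end.  Returns 1 if
 * every malformation found is in `allowed`, else 0; *warnings receives the
 * number of warnings printed. */
static int report_malformations(uint32_t found, uint32_t allowed,
                                int *warnings)
{
    static const struct { uint32_t bit; const char *text; } table[] = {
        { MALF_BAD_START, "unexpected start byte" },
        { MALF_OVERLONG,  "overlong" },
        { MALF_NON_CONT,  "unexpected non-continuation byte" },
        { MALF_SHORT,     "too short" },
    };
    int ok = 1;
    size_t i;

    *warnings = 0;
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (!(found & table[i].bit)) continue;
        if (!(allowed & table[i].bit)) {
            fprintf(stderr, "Malformed UTF-8 (%s)\n", table[i].text);
            (*warnings)++;
            ok = 0;
        }
    }
    return ok;
}
```

With the two bytes \xE0\x80 as input, both the overlong and the
too-short bits are set, so allowing overlongs alone no longer hides the
truncation.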

commit 6d8cec9c33b6d893402659c054da6a9161d3a062
Author: Karl Williamson <[email protected]>
Date:   Sat Oct 8 20:53:31 2016 -0600

    APItest/t/utf8.t: Fix improper tests
    
    These two tests are overlong malformations, besides being the ones
    purportedly being tested.  Make them not overlong, so that they test
    just one thing.

M       ext/XS-APItest/t/utf8.t

commit 260b5005ae04cff620c0298e83cef9bfd900caea
Author: Karl Williamson <[email protected]>
Date:   Fri Oct 7 15:07:57 2016 -0600

    APItest/t/utf8.t: Indent a bunch of code
    
    And reflow to fit in 80 columns.  This is in preparation for the next
    commit, which will enclose this new code with two more for loops.
    Several lines that were missing semi-colons have these added (they were
    at the end of nested blocks, so it wasn't an error).

M       ext/XS-APItest/t/utf8.t

commit ec63b1b8a65bb6a4b41803c75171da358abdc291
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 5 18:34:15 2016 -0600

    APItest/t/utf8.t: Add missing test
    
    Under some circumstances we weren't validating that the generated
    warnings are correct.  This required reordering some 'if' tests, and
    revised special casing of the overflow test.

M       ext/XS-APItest/t/utf8.t

commit ca56f72e31efd0b256ea22f3f2f1b1599c3cccb1
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 5 18:32:55 2016 -0600

    APItest/t/utf8.t: Rename test for clarity

M       ext/XS-APItest/t/utf8.t

commit c688737f379efb8bb9b0681d9d9c48845133f263
Author: Karl Williamson <[email protected]>
Date:   Sun Oct 2 21:50:10 2016 -0600

    utf8.c: Extract some code into 2 functions
    
    This is in preparation for each of them to be used in a new place in
    a future commit.

M       embed.fnc
M       embed.h
M       proto.h
M       utf8.c

commit 418e82b04119ba76e8e17909671065cc8f0c5ac7
Author: Karl Williamson <[email protected]>
Date:   Sun Oct 2 21:31:52 2016 -0600

    utf8.c: Rename a couple of macros for clarity
    
    These were recently added in 2b47960981adadbe81b9635d4ca7861c45ccdced.
    This also removes the #undefs of these in preparation for them to be
    used later in the file.

M       utf8.c

commit 76c6fb65b0ed753d9d5ccbd9f4104d3a6285a8b7
Author: Karl Williamson <[email protected]>
Date:   Sun Oct 2 21:09:27 2016 -0600

    utf8.h: Change some flag definition constants
    
    These #defines give flag bits in a U32.  This commit opens a gap that
    will be filled in a future commit.  A test file has to change to
    correspond, as it duplicates the defines.

M       ext/XS-APItest/t/utf8.t
M       utf8.h

commit dba8551fc0e8807438deb02b18452b2cd744e92f
Author: Karl Williamson <[email protected]>
Date:   Sun Oct 2 21:05:15 2016 -0600

    APItest/t/utf8.t: Extract code to common function
    
    There are many instances of this simple code to dump an array of trapped
    warning messages.  The problem is that they display better when joined
    by "" rather than by a comma.  Rather than change each instance to do
    that, I changed each instance to a sub call and changed it there.

M       ext/XS-APItest/t/utf8.t

commit 326860be79372e67974dd50fb4218d37e207fe73
Author: Karl Williamson <[email protected]>
Date:   Fri Sep 30 12:42:45 2016 -0600

    utf8.c: Add some UNLIKELY()s
    
    for branch prediction

M       utf8.c
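
For readers unfamiliar with the idiom, a branch-prediction hint of this
kind is commonly built on the GCC/Clang builtin __builtin_expect. The
sketch below is an assumption about the general technique, not a copy
of the core's definition: MY_UNLIKELY and count_non_ascii() are
hypothetical names chosen for the example.

```c
#include <stddef.h>

/* Hypothetical stand-in for an UNLIKELY()-style macro: tell the compiler
 * the condition is expected to be false, falling back to the plain
 * condition on compilers without the builtin. */
#if defined(__GNUC__) || defined(__clang__)
#  define MY_UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#  define MY_UNLIKELY(cond) (cond)
#endif

/* Count bytes >= 0x80, hinting that they are rare so the common
 * all-ASCII path can be laid out as the fall-through case. */
static size_t count_non_ascii(const unsigned char *s, size_t len)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++) {
        if (MY_UNLIKELY(s[i] >= 0x80))
            n++;
    }
    return n;
}
```

The hint changes only code layout and prediction, never the result.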

commit 8e39e2a47cd6653f778980d187e1528c04141a33
Author: Karl Williamson <[email protected]>
Date:   Wed Sep 28 15:05:17 2016 -0600

    Add details to UTF-8 malformation error messages
    
    I've long been unsatisfied with the information contained in the
    error/warning messages raised when some input is malformed UTF-8, but
    have been reluctant to change the text in case someone is relying on
    it.  One reason that someone might be parsing the messages is that there
    has been no convenient way to otherwise pin down what the exact
    malformation might be.  A couple of commits from now will add a facility
    to get the type of malformation unambiguously.  This will be a better
    mechanism to use for those rare modules that need to know what's the
    exact malformation.
    
    So, I will fix and issue pull requests for any module broken by this
    commit.
    
    The messages are changed by now dumping (in \xXY format) the bytes that
    make up the malformed character, and extra details are added in most
    cases.
    
    Messages about overlongs now display the code point they evaluate to and
    what the shortest UTF-8 sequence for generating that code point is.
    
    Messages about overflowing now just display that it overflows, since the
    entire byte sequence is now dumped.  The previous message displayed just
    the byte which was being processed where overflow was detected, but that
    is not helpful at all.

M       embed.fnc
M       embed.h
M       ext/XS-APItest/t/utf8.t
M       lib/utf8.t
M       proto.h
M       t/io/utf8.t
M       t/lib/warnings/utf8
M       t/op/pack.t
M       t/op/utf8decode.t
M       utf8.c

commit a2f450e53064d00973b99c1b4bd7f312d50818c2
Author: Karl Williamson <[email protected]>
Date:   Wed Sep 28 10:19:03 2016 -0600

    utf8.c: Consolidate duplicate error msg text
    
    This text is generated in 2 places; consolidate into one place.

M       embed.fnc
M       embed.h
M       proto.h
M       utf8.c
-----------------------------------------------------------------------

--
Perl5 Master Repository
