[perl.git] branch smoke-me/khw-encode, created. v5.25.5-95-g5831346

Karl Williamson Mon, 10 Oct 2016 22:05:56 -0700

In perl.git, the branch smoke-me/khw-encode has been created

<http://perl5.git.perl.org/perl.git/commitdiff/583134622b558a2aaa0ccd194c14bf1ae78e1a78?hp=0000000000000000000000000000000000000000>


        at  583134622b558a2aaa0ccd194c14bf1ae78e1a78 (commit)

- Log -----------------------------------------------------------------
commit 583134622b558a2aaa0ccd194c14bf1ae78e1a78
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 15 09:09:07 2016 -0600

    XXX incomplete: Add sv_utf8_decode_flags

M       embed.fnc
M       embed.h
M       proto.h
M       sv.c
M       sv.h

commit 341d064f4929671f78f53f392d40e8e29f4f4c9a
Author: Karl Williamson <[email protected]>
Date:   Wed Sep 14 22:40:23 2016 -0600

    customized

M       t/porting/customized.dat

commit 4068078f7af608a99f148c2052e1a6e9f25e1b05
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 1 12:20:52 2016 -0600

    Use core REPLACEMENT CHARACTER definition
    
    This allows the code to now work on EBCDIC as well.

M       cpan/Encode/Encode/encode.h

commit ee9b0e6e602dfdd8cfe133ba251dcb964b5b7e59
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 1 12:16:00 2016 -0600

    XXX commit msg: Encode.xs: Rmv unused function

M       cpan/Encode/Encode.xs

commit 9be566b07dbc37351360acba68a28bdeb68b28fd
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 1 12:12:39 2016 -0600

    Encode.xs: white-space only

M       cpan/Encode/Encode.xs

commit ab7c6894c26b478a7c48f775a2a6f517f6355e20
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 1 12:12:06 2016 -0600

    XXX maybe more in commit msg: Speed up Encode UTF-8 validation checking
    
    This replaces the current scheme for checking UTF-8 validity by one
    in which normal processing doesn't require having to decode the UTF-8
    into code points.  The copying of characters individually from the input
    to the output is changed to be a single operation for each entire span
    of valid input at once.
    
    Thus in the normal case, what ends up happening is a tight loop to
    check the validity, and then a memmove of the entire input to the
    output, then return.
    
    If an error is found, it copies all the valid input before the error,
    then handles the character in error, then positions to the next input
    position, and repeats the whole process starting from there.
    
    It uses the functionality available from the Perl 5 core to look at
    just the bytes that comprise the UTF-8 to make the determination,
    converting to code points only those that are defective somehow in
    order to display them in warnings and error messages.
    
    Thus, this does not need to know about the intricacies of UTF-8
    malformations, relying on the core to handle this.
    
    This cannot be pushed to CPAN until Devel::PPPort has been updated to
    implement all the functions now needed.

M       cpan/Encode/Encode.pm
M       cpan/Encode/Encode.xs
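
The scheme described in the commit above (scan for validity in a tight
loop, copy each valid span in one operation, handle the bad character,
resume at the next position) can be sketched in standalone C as follows.
This is only an illustration under stated assumptions, not the Encode.xs
code: toy_char_len() and toy_copy_valid() are hypothetical names, the
validity check is strict Unicode UTF-8 rather than Perl's extended form,
and every malformed byte is simply replaced by U+FFFD, whereas the real
module chooses its action from the caller's check flags.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte length of the well-formed UTF-8 character
 * at s, or 0 if it is malformed.  Only byte ranges are examined; no
 * code point is computed. */
static size_t toy_char_len(const unsigned char *s, const unsigned char *end)
{
    size_t need, i;
    unsigned char lo = 0x80, hi = 0xBF;   /* allowed range of 2nd byte */

    if (s >= end) return 0;
    if (s[0] < 0x80) return 1;
    if (s[0] >= 0xC2 && s[0] <= 0xDF) {
        need = 2;
    } else if (s[0] >= 0xE0 && s[0] <= 0xEF) {
        need = 3;
        if (s[0] == 0xE0) lo = 0xA0;      /* rules out overlongs */
        if (s[0] == 0xED) hi = 0x9F;      /* rules out surrogates */
    } else if (s[0] >= 0xF0 && s[0] <= 0xF4) {
        need = 4;
        if (s[0] == 0xF0) lo = 0x90;      /* rules out overlongs */
        if (s[0] == 0xF4) hi = 0x8F;      /* rules out > U+10FFFF */
    } else {
        return 0;
    }

    if ((size_t)(end - s) < need) return 0;
    if (s[1] < lo || s[1] > hi) return 0;
    for (i = 2; i < need; i++)
        if (s[i] < 0x80 || s[i] > 0xBF) return 0;
    return need;
}

/* Copy src to dst, replacing each malformed byte with U+FFFD.
 * dst must have room for 3 * len bytes.  Returns bytes written. */
static size_t toy_copy_valid(const unsigned char *src, size_t len,
                             unsigned char *dst)
{
    static const unsigned char repl[3] = { 0xEF, 0xBF, 0xBD };
    const unsigned char *s = src, *end = src + len;
    unsigned char *d = dst;

    assert(src != NULL && dst != NULL);

    while (s < end) {
        const unsigned char *span = s;
        size_t n;

        /* Tight loop: step over valid characters without copying. */
        while (s < end && (n = toy_char_len(s, end)) != 0)
            s += n;

        /* One copy for the entire valid span. */
        memcpy(d, span, (size_t)(s - span));
        d += s - span;

        if (s < end) {                    /* stopped on a malformation */
            memcpy(d, repl, sizeof repl);
            d += sizeof repl;
            s++;                          /* resume at next position */
        }
    }
    return (size_t)(d - dst);
}
```

For fully valid input the outer loop runs once, so the work is one scan
plus one copy.  The sketch uses memcpy because its buffers never overlap;
the commit message mentions memmove.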

commit 8fdf9723488670ca1775f9ff3faaac63f8ef62b0
Author: Karl Williamson <[email protected]>
Date:   Mon Oct 10 21:18:37 2016 -0600

    XXX pod, delta: Add utf8n_to_uvchr_error
    
    This new function behaves like utf8n_to_uvchr(), but takes an extra
    parameter that points to a U32 which will be set to 0 if no errors are
    found; otherwise each error found will set a bit in it.  This can be
    used by the caller to figure out precisely what the error(s) is/are.
    Previously, one would have to capture and parse the warning/error
    messages raised.   This can be used, for example, to customize the
    messages to the expected end-user's knowledge level.

M       embed.fnc
M       embed.h
M       ext/XS-APItest/APItest.xs
M       ext/XS-APItest/t/utf8.t
M       proto.h
M       utf8.c
M       utf8.h
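
The calling convention described above, an out-parameter that is zero
on success and otherwise has one bit set per error found, might look
roughly like the following standalone sketch.  It does not call or
reproduce utf8n_to_uvchr_error(); toy_decode_error() and the TOY_GOT_*
bits are hypothetical names chosen for illustration, and the decoder
handles only sequences of up to four bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error bits; one per kind of malformation. */
#define TOY_GOT_EMPTY            0x01u
#define TOY_GOT_BAD_START        0x02u  /* byte cannot start a character */
#define TOY_GOT_NON_CONTINUATION 0x04u
#define TOY_GOT_SHORT            0x08u
#define TOY_GOT_OVERLONG         0x10u

/* Decode one character from s (len bytes available).  *errors is set
 * to 0 if nothing is wrong, else to the OR of every problem found.
 * *retlen gets the number of bytes examined.  Returns the code point,
 * or 0 if any error bit is set. */
static uint32_t toy_decode_error(const unsigned char *s, size_t len,
                                 size_t *retlen, uint32_t *errors)
{
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t expect, avail, i, k;
    uint32_t cp, max_cp;

    assert(retlen != NULL && errors != NULL);
    *errors = 0;
    *retlen = 0;

    if (len == 0) { *errors |= TOY_GOT_EMPTY; return 0; }
    if (s[0] < 0x80) { *retlen = 1; return s[0]; }

    if ((s[0] & 0xE0) == 0xC0)      { expect = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { expect = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { expect = 4; cp = s[0] & 0x07; }
    else { *errors |= TOY_GOT_BAD_START; *retlen = 1; return 0; }

    avail = len < expect ? len : expect;
    for (i = 1; i < avail; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            *errors |= TOY_GOT_NON_CONTINUATION;
            break;
        }
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    *retlen = i;

    if (!(*errors & TOY_GOT_NON_CONTINUATION) && len < expect)
        *errors |= TOY_GOT_SHORT;

    /* Keep looking after the first problem: a truncated sequence can
     * also be overlong.  Fill the missing bytes with the largest
     * possible payload; if even that is too small, it is overlong. */
    max_cp = cp;
    for (k = i; k < expect; k++)
        max_cp = (max_cp << 6) | 0x3F;
    if (max_cp < min_cp[expect])
        *errors |= TOY_GOT_OVERLONG;

    return *errors ? 0 : cp;
}
```

A caller can then test individual bits, for instance
(errors & TOY_GOT_OVERLONG), instead of parsing warning text.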

commit 78d01b0f31637f805415baf35079fcc810f5749b
Author: Karl Williamson <[email protected]>
Date:   Sat Oct 8 21:19:18 2016 -0600

    utf8n_to_uvchr():  Make a parameter const

M       embed.fnc
M       proto.h
M       utf8.c

commit bf79738e76567fe04e6c89f05e1ebe03dccc0a17
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 5 19:09:02 2016 -0600

    utf8n_to_uvchr(): Note multiple malformations
    
    Some UTF-8 sequences can have multiple malformations.  For example, a
    sequence can be the start of an overlong representation of a code point,
    and still be incomplete.  Until this commit what was generally done was
    to stop looking when the first malformation was found.  This was not
    correct behavior, as that malformation may be allowed, while another
    unallowed one went unnoticed.  This commit refactors the error handling
    of this function to set a flag and keep going if a malformation is found
    that doesn't preclude others.  Then each is handled in a loop at the end,
    warning if warranted.  The result is that there is a warning for each
    malformation for which warnings should be generated, and an error return
    is made if any one is disallowed.
    
    A sequence that overflows is necessarily also for a non-Unicode code
    point and for one above 31 bits; these are not independent
    malformations, so only one warning is output--the most dire.
    
    This will speed up the normal case slightly, as the test for overflow is
    pulled out of the loop, allowing the UV to overflow.  Then a single test
    after the loop is done to see if there was overflow or not.

M       ext/XS-APItest/t/utf8.t
M       pod/perldiag.pod
M       t/op/utf8decode.t
M       utf8.c
M       utf8.h
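
The "set a flag and keep going, then handle each in a loop at the end"
structure from the commit above can be illustrated with a small
standalone sketch.  The names (toy_handle_malformations, TOY_SHORT and
so on) are hypothetical, and the policy shown, one warning for each
disallowed malformation when warnings are enabled, is an assumption of
the sketch rather than a statement of what utf8.c does in every case.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical malformation bits recorded while scanning. */
#define TOY_SHORT    0x1u
#define TOY_OVERLONG 0x2u
#define TOY_NON_CONT 0x4u

static const char *toy_name(uint32_t bit)
{
    switch (bit) {
    case TOY_SHORT:    return "too short";
    case TOY_OVERLONG: return "overlong";
    case TOY_NON_CONT: return "unexpected non-continuation byte";
    default:           return "unknown malformation";
    }
}

/* Process every malformation recorded in 'found'.  A malformation
 * whose bit is in 'allowed' is accepted silently; any other one makes
 * the overall result a failure and, if warnings_on, is reported.
 * Returns 1 if the input is acceptable, 0 otherwise; *n_warned
 * receives the number of messages printed. */
static int toy_handle_malformations(uint32_t found, uint32_t allowed,
                                    int warnings_on, int *n_warned)
{
    int ok = 1;

    assert(n_warned != NULL);
    *n_warned = 0;

    while (found) {
        uint32_t bit = found & (0u - found);  /* lowest set bit */
        found &= ~bit;

        if (allowed & bit)
            continue;                         /* permitted by caller */

        ok = 0;                               /* any disallowed one fails */
        if (warnings_on) {
            fprintf(stderr, "Malformed UTF-8 character (%s)\n",
                    toy_name(bit));
            (*n_warned)++;
        }
    }
    return ok;
}
```

With found = TOY_SHORT | TOY_OVERLONG and allowed = TOY_SHORT, the
allowed malformation no longer hides the disallowed one: the call
fails and warns once, about the overlong.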

commit 884bdbe3cc3353d353dee19293ac1b494aa516ba
Author: Karl Williamson <[email protected]>
Date:   Sat Oct 8 20:53:31 2016 -0600

    APItest/t/utf8.t: Fix improper tests
    
    These two tests contain overlong malformations in addition to the ones
    they purport to test.  Make them not overlong, so that each tests just
    one thing.

M       ext/XS-APItest/t/utf8.t

commit 4e4ac410df520b4be0c903b1cbb782e96742e963
Author: Karl Williamson <[email protected]>
Date:   Fri Oct 7 15:07:57 2016 -0600

    APItest/t/utf8.t: Indent a bunch of code
    
    And reflow to fit in 80 columns.  This is in preparation for the next
    commit which will enclose this new code in two more for loops.
    Several lines that were missing semi-colons have these added (they were
    at the end of nested blocks, so it wasn't an error)

M       ext/XS-APItest/t/utf8.t

commit 9edd73835be25396c6c21796184af257e3db8a4a
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 5 18:34:15 2016 -0600

    APItest/t/utf8.t: Add missing test
    
    Under some circumstances we weren't validating that the generated
    warnings are correct.  This required reordering some 'if' tests, and
    revised special casing of the overflow test.

M       ext/XS-APItest/t/utf8.t

commit 70de9a99a4b79d5527d8e3f3a1aa5a4d6b03d006
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 5 18:32:55 2016 -0600

    APItest/t/utf8.t: Rename test for clarity

M       ext/XS-APItest/t/utf8.t

commit 702eed37a7232b9b1ffe0619e533e98aa436129c
Author: Karl Williamson <[email protected]>
Date:   Sun Oct 2 21:50:10 2016 -0600

    utf8.c: Extract some code into 2 functions
    
    This is in preparation for each piece of functionality to also be used
    in a new place in a future commit

M       embed.fnc
M       embed.h
M       proto.h
M       utf8.c

commit d9132190f6fbcb3794d867e57c6fb4b0f13b07ae
Author: Karl Williamson <[email protected]>
Date:   Sun Oct 2 21:31:52 2016 -0600

    utf8.c: Rename a couple of macros for clarity
    
    These were recently added in 2b47960981adadbe81b9635d4ca7861c45ccdced.
    This also removes the #undefs of these in preparation for them to be
    used later in the file.

M       utf8.c

commit f1f8aa4d9f9546f67ecc8d2a9cd8bee0b0499aef
Author: Karl Williamson <[email protected]>
Date:   Sun Oct 2 21:09:27 2016 -0600

    utf8.h: Change some flag definition constants
    
    These #defines give flag bits in a U32.  This commit opens a gap that
    will be filled in a future commit.  A test file has to change to
    correspond, as it duplicates the defines.

M       ext/XS-APItest/t/utf8.t
M       utf8.h

commit 59fb8148d8e008ad496a84101922c7041c673835
Author: Karl Williamson <[email protected]>
Date:   Sun Oct 2 21:05:15 2016 -0600

    APItest/t/utf8.t: Extract code to common function
    
    There are many instances of this simple code to dump an array of trapped
    warning messages.  The problem is that they display better when joined
    by "" rather than by a comma.  Rather than change each instance to do
    that, I changed each instance to a sub call and changed it there.

M       ext/XS-APItest/t/utf8.t

commit 6d533da66a4afd8b6358db96647e221aef87f0b0
Author: Karl Williamson <[email protected]>
Date:   Fri Sep 30 12:42:45 2016 -0600

    utf8.c: Add some UNLIKELY()s
    
    for branch prediction

M       utf8.c
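
As an illustration of the kind of branch-prediction hint the commit
above refers to, here is a standalone sketch.  TOY_UNLIKELY is a
hypothetical stand-in defined locally; it is not the core's UNLIKELY()
macro, and it assumes a compiler that provides __builtin_expect (GCC
and Clang do), falling back to the plain condition elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a branch-prediction hint macro. */
#if defined(__GNUC__) || defined(__clang__)
#  define TOY_UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#  define TOY_UNLIKELY(cond) (cond)
#endif

/* Count bytes with the high bit set.  The hint tells the compiler
 * that mostly-ASCII input is the expected case; it changes only code
 * layout, never the result. */
static size_t toy_count_non_ascii(const unsigned char *s, size_t len)
{
    size_t i, n = 0;

    assert(s != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (TOY_UNLIKELY(s[i] >= 0x80))
            n++;
    }
    return n;
}
```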

commit a756f28e2ce0a6142b1eb16a13855ffd6cdaa34f
Author: Karl Williamson <[email protected]>
Date:   Wed Sep 28 15:05:17 2016 -0600

    Add details to UTF-8 malformation error messages
    
    I've long been unsatisfied with the information contained in the
    error/warning messages raised when some input is malformed UTF-8, but
    have been reluctant to change the text in case someone is relying on
    it.  One reason that someone might be parsing the messages is that there
    has been no convenient way to otherwise pin down what the exact
    malformation might be.  A couple of commits from now will add a facility
    to get the type of malformation unambiguously.  This will be a better
    mechanism to use for those rare modules that need to know what's the
    exact malformation.
    
    So, I will fix and issue pull requests for any module broken by this
    commit.
    
    The messages are changed by now dumping (in \xXY format) the bytes that
    make up the malformed character, and extra details are added in most
    cases.
    
    Messages about overlongs now display the code point they evaluate to and
    what the shortest UTF-8 sequence for generating that code point is.
    
    Messages about overflowing now just display that it overflows, since the
    entire byte sequence is now dumped.  The previous message displayed just
    the byte which was being processed where overflow was detected, but that
    is not helpful at all.

M       embed.fnc
M       embed.h
M       ext/XS-APItest/t/utf8.t
M       lib/utf8.t
M       proto.h
M       t/io/utf8.t
M       t/lib/warnings/utf8
M       t/op/pack.t
M       t/op/utf8decode.t
M       utf8.c
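
To make the new message contents concrete, the sketch below dumps a
byte sequence in \xXY form and reports how many bytes the shortest
encoding of a code point needs.  The function names and the exact
message wording are assumptions of this sketch, not the text that
utf8.c emits, and toy_shortest_len() covers only code points that fit
in four bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write the bytes of s into buf as "\xXY\xXY..." and return buf.
 * Output is truncated if buf is too small. */
static char *toy_dump_bytes(const unsigned char *s, size_t len,
                            char *buf, size_t cap)
{
    size_t i;

    assert(cap > 0);
    buf[0] = '\0';
    for (i = 0; i < len; i++) {
        size_t used = strlen(buf);
        if (used + 5 > cap)               /* 4 chars plus the NUL */
            break;
        snprintf(buf + used, cap - used, "\\x%02X", (unsigned)s[i]);
    }
    return buf;
}

/* Bytes needed by the shortest UTF-8 encoding of cp (sketch: 1 to 4). */
static size_t toy_shortest_len(unsigned long cp)
{
    if (cp < 0x80)    return 1;
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* Build an overlong message showing the offending bytes, the code
 * point they evaluate to, and the length of its shortest form. */
static void toy_overlong_message(const unsigned char *s, size_t len,
                                 unsigned long cp, char *out, size_t cap)
{
    char bytes[64];

    snprintf(out, cap,
             "Malformed UTF-8 character: %s "
             "(overlong; U+%04lX needs only %lu byte(s))",
             toy_dump_bytes(s, len, bytes, sizeof bytes),
             cp, (unsigned long)toy_shortest_len(cp));
}
```

Dumping the whole sequence is what lets the overflow message stay
short: the reader can see every byte involved rather than only the one
being processed when the problem was detected.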

commit 91ad2c4fb2e9b2a98b547994d70466bce78625c6
Author: Karl Williamson <[email protected]>
Date:   Wed Sep 28 10:19:03 2016 -0600

    utf8.c: Consolidate duplicate error msg text
    
    This text is generated in 2 places; consolidate into one place.

M       embed.fnc
M       embed.h
M       proto.h
M       utf8.c
-----------------------------------------------------------------------

--
Perl5 Master Repository
