In perl.git, the branch smoke-me/khw-fatal has been created

<http://perl5.git.perl.org/perl.git/commitdiff/8a7f49f6882cb47e4a1241bdc902e85bd2c920af?hp=0000000000000000000000000000000000000000>

        at  8a7f49f6882cb47e4a1241bdc902e85bd2c920af (commit)

- Log -----------------------------------------------------------------
commit 8a7f49f6882cb47e4a1241bdc902e85bd2c920af
Author: Karl Williamson <[email protected]>
Date:   Sat Jul 1 11:58:00 2017 -0600

    Forbid above IV_MAX code points
    
    This implements the restriction of code points to 0..IV_MAX in such a
    way that the process doesn't die when presented with input UTF-8 that
    evaluates to a larger one.  Instead, it is treated as overflow.
    
    The commit reinstates causing the offending process to die if trying to
    create a character somehow that is above IV_MAX (like
    chr(0xFFFFFFFFFFFFF)), or trying to do certain operations on one if
    somehow one did get created.
    
    The long term goal is to use code points above IV_MAX internally, as
    Perl6 does.  So code and tests are not removed, just commented out.

M       ext/XS-APItest/t/utf8_warn_base.pl
M       t/lib/warnings/utf8
M       t/op/index.t
M       t/op/utf8decode.t
M       t/re/pat_advanced.t
M       utf8.c
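
    As a rough illustration of treating a too-large value as overflow
    rather than dying, the following C sketch accumulates the payload bits
    of a UTF-8-style sequence and reports overflow as soon as the next
    shift would exceed the limit.  It is not the code in utf8.c; the names
    sketch_accumulate, sketch_status and SKETCH_IV_MAX are hypothetical,
    and a 64-bit IV is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for IV_MAX on a build with 64-bit IVs. */
#define SKETCH_IV_MAX ((uint64_t) INT64_MAX)

typedef enum { SKETCH_OK, SKETCH_MALFORMED, SKETCH_OVERFLOW } sketch_status;

/* Combine the start byte's payload bits with n continuation bytes.
 * Instead of aborting, report SKETCH_OVERFLOW when the result would
 * exceed SKETCH_IV_MAX. */
static sketch_status
sketch_accumulate(uint64_t start_payload, const uint8_t *cont, size_t n,
                  uint64_t *cp_out)
{
    uint64_t cp = start_payload;

    assert(cp_out != NULL);
    for (size_t i = 0; i < n; i++) {
        if ((cont[i] & 0xC0) != 0x80)       /* not of the form 10xxxxxx */
            return SKETCH_MALFORMED;
        if (cp > (SKETCH_IV_MAX >> 6))      /* shifting in 6 bits would exceed the limit */
            return SKETCH_OVERFLOW;
        cp = (cp << 6) | (uint64_t) (cont[i] & 0x3F);
    }
    *cp_out = cp;
    return SKETCH_OK;
}
```

    The check is made before each shift, so the accumulator itself never
    wraps around; the caller decides what to do with an overflow result.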

commit d2077fa10f04397cd96cf637ced8ea2a11777283
Author: Karl Williamson <[email protected]>
Date:   Sat Jul 1 11:23:49 2017 -0600

    utf8.c: Change 2 static fcns to handle overlongs
    
    This will be used in the following commit.
    
    One function is made more complicated, so we stop asking it to be
    inlined.

M       embed.fnc
M       proto.h
M       utf8.c

commit 55d8a9b58b092f89795ed50469f2c5ce52882aca
Author: Karl Williamson <[email protected]>
Date:   Sat Jul 1 11:17:28 2017 -0600

    utf8.c: Move and slightly change comment block
    
    This is so there are fewer real differences shown in the next commit

M       utf8.c

commit d415acfeee61570ee2fe3ace01e2258cf117fa0f
Author: Karl Williamson <[email protected]>
Date:   Sat Jul 1 07:45:40 2017 -0600

    utf8.c: Generalize static fcn return for indeterminate result
    
    This makes it harder for a maintainer to be deceived into thinking that
    0 means a definite FALSE.

M       embed.fnc
M       proto.h
M       utf8.c

commit 9adbaa6a96bb0067a6d55c7f4e8bd6b9d769c43e
Author: Karl Williamson <[email protected]>
Date:   Sat Jul 1 07:21:09 2017 -0600

    utf8.c: Generalize static fcn return for indeterminate result
    
    This makes it harder to think that 0 means a definite FALSE.

M       embed.fnc
M       proto.h
M       utf8.c

commit d531afe3bbcd548d97eef57d35f361df0106a931
Author: Karl Williamson <[email protected]>
Date:   Sat Jul 1 06:32:28 2017 -0600

    utf8.c: Move a fcn within the file
    
    This simply moves a function to later in the file.  The next commit will
    change it to needing a definition which, until this commit, preceded it
    in the file.

M       utf8.c

commit 33d57d588bea594da12ff5d87b5496d2e4137124
Author: Karl Williamson <[email protected]>
Date:   Sat Jul 1 06:43:34 2017 -0600

    utf8.c: Generalize static fcn return for indeterminate result
    
    This makes it harder to think that 0 means a definite FALSE.

M       embed.fnc
M       proto.h
M       utf8.c

commit d4a90dce9823fc572e1baa248361a439cbf157f5
Author: Karl Williamson <[email protected]>
Date:   Sat Jul 1 06:18:01 2017 -0600

    utf8.c: Generalize static fcn return for indeterminate result
    
    Prior to this commit, isFF_OVERLONG() returned a boolean, with 0 also
    indicating that there wasn't enough information to make a determination.
    I realized that I was forgetting that 0 wasn't necessarily definitive
    while coding.  By changing the API to return 3 values, forgetting that
    won't likely happen.
    
    This and the next several commits change several other functions that
    have the same predicament.

M       embed.fnc
M       proto.h
M       utf8.c
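
    A minimal sketch of the three-valued return idea, in C.  It assumes
    the ASCII-platform layout in which an FF start byte introduces a
    13-byte sequence, so that such a sequence is overlong exactly when its
    first six continuation bytes are all 0x80.  The names sketch_tri and
    sketch_is_ff_overlong, and the particular numeric values chosen for
    the three results, are hypothetical and not taken from utf8.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Three distinct answers, so that "can't tell yet" is never confused
 * with a definite "no". */
typedef enum {
    SKETCH_NEED_MORE = -1,
    SKETCH_NO        =  0,
    SKETCH_YES       =  1
} sketch_tri;

/* Assumed prefix shared by every overlong 13-byte sequence. */
static const uint8_t sketch_ff_prefix[] =
    { 0xFF, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80 };

static sketch_tri
sketch_is_ff_overlong(const uint8_t *s, size_t len)
{
    const size_t plen = sizeof sketch_ff_prefix;
    const size_t n = len < plen ? len : plen;

    assert(s != NULL);
    if (memcmp(s, sketch_ff_prefix, n) != 0)
        return SKETCH_NO;           /* a byte already differs from the prefix */
    return len >= plen ? SKETCH_YES : SKETCH_NEED_MORE;
}
```

    A caller that writes "if (sketch_is_ff_overlong(s, len))" would treat
    SKETCH_NEED_MORE as true, so the indeterminate case has to be handled
    explicitly.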

commit 115da1d3afbbc842237752ab93e67eea0221f486
Author: Karl Williamson <[email protected]>
Date:   Fri Jun 30 13:21:58 2017 -0600

    utf8.h: Comments only
    
    An earlier commit had split some comments up.  And this adds clarifying
    details.

M       utf8.h

commit 3796960c1647125f08988955fb95c9a27a77ae47
Author: Karl Williamson <[email protected]>
Date:   Fri Jun 30 13:19:10 2017 -0600

    utf8.c: Reorder two 'if' clauses
    
    This is purely to get the slightly differently spelled tests to line up
    vertically, which makes them easier to see

M       utf8.c

commit 83e738b916e43f3157dff158a1404d2d6a374f38
Author: Karl Williamson <[email protected]>
Date:   Fri Jun 30 11:19:59 2017 -0600

    utf8.c: Slightly simplify some code
    
    This just does a small refactor, which I think makes things easier to
    understand.

M       utf8.c

commit cb943b14621391088a45e516a46d8c8a037da608
Author: Karl Williamson <[email protected]>
Date:   Sat Jul 8 14:54:28 2017 -0600

    utf8n_to_uvchr(): Properly handle extremely high code points
    
    It turns out that it could incorrectly deem something to be overflowing
    or overlong.  This fixes that and changes the test to catch this
    possibility.  This fixes a bug, so now on 32-bit systems, it detects
    that if you have a start byte of FE, you need a continuation byte to
    determine if the result overflows.

M       ext/XS-APItest/t/utf8_warn_base.pl
M       t/op/utf8decode.t
M       utf8.c
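
    To see why an FE start byte alone is not enough on a 32-bit system,
    consider this C sketch.  It assumes the ASCII-platform layout in which
    FE introduces a 7-byte sequence carrying 36 payload bits (six
    continuation bytes of six bits each), and it assumes the limit is the
    largest 32-bit unsigned value.  The names are hypothetical, and
    overlongs are deliberately not handled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    SKETCH_FE_NEED_MORE = -1,
    SKETCH_FE_FITS      =  0,
    SKETCH_FE_OVERFLOWS =  1
} sketch_fe_result;

/* s[0] must be 0xFE.  The first continuation byte holds payload bits
 * 35..30, so the value fits in 32 bits only if that payload is <= 3. */
static sketch_fe_result
sketch_fe_overflows_32(const uint8_t *s, size_t len)
{
    assert(s != NULL && len >= 1 && s[0] == 0xFE);
    if (len < 2)
        return SKETCH_FE_NEED_MORE;     /* the start byte alone can't decide */
    if ((s[1] & 0xC0) != 0x80)
        return SKETCH_FE_FITS;          /* some other malformation, not overflow */
    return (s[1] & 0x3F) > 0x03 ? SKETCH_FE_OVERFLOWS : SKETCH_FE_FITS;
}
```

    For example, FE 83 followed by five BF bytes encodes 2**32-1, whereas
    FE 84 followed by five 80 bytes encodes 2**32.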

commit f2c89228cbdd678a297dbc63123f7253cff6ab77
Author: Karl Williamson <[email protected]>
Date:   Fri Jul 7 12:39:33 2017 -0600

    rm APItest/t/utf8_malformed.t
    
    This file no longer contains any tests.  All were either made redundant
    with utf8_warn_base.pl or have been moved to it.

M       MANIFEST
D       ext/XS-APItest/t/utf8_malformed.t

commit 454fcc903ec76fe3916a9dd061b631ae56b00785
Author: Karl Williamson <[email protected]>
Date:   Fri Jul 7 12:37:39 2017 -0600

    Move test to utf8_warn_base.pl
    
    This is the final test that was in utf8_malformed.t.  The next commit
    will remove the file.

M       ext/XS-APItest/t/utf8_malformed.t
M       ext/XS-APItest/t/utf8_warn_base.pl

commit 2bb7d44698a8607a0f40208de893636eb743f3e3
Author: Karl Williamson <[email protected]>
Date:   Wed Jul 5 10:27:25 2017 -0600

    APItest/t/utf8_malformed.t: Remove 2 redundant tests
    
    These tests, for the malformation where a UTF-8 sequence is interrupted
    by the beginning of another character, already get tested in
    utf8_warn_base.pl

M       ext/XS-APItest/t/utf8_malformed.t

commit 5a80dcb80fd615904b69545409988a42713ea54a
Author: Karl Williamson <[email protected]>
Date:   Fri Jul 7 15:20:44 2017 -0600

    APItest/t/utf8_warn_base.pl: White-space only
    
    This indents properly after the previous commit created a block around
    this code, and reflows to fit in 79 columns.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit 7cafe2646a26adc6baafc2b947eddd6efe9628af
Author: Karl Williamson <[email protected]>
Date:   Tue Jul 4 12:57:40 2017 -0600

    APItest/t/utf8_warn_base.pl: Add a test
    
    This verifies that we don't mistake an overlong for overflow

M       ext/XS-APItest/t/utf8_warn_base.pl

commit d5aa63179d1c0c30051088743f32084e3f8191e0
Author: Karl Williamson <[email protected]>
Date:   Tue Jul 4 16:04:26 2017 -0600

    APItest/t/utf8_malformed.t: move tests to utf8_warn_base.pl
    
    This adds infrastructure to utf8_warn_base.pl to handle the overlong
    tests that are now moved to it from utf8_malformed.t

M       ext/XS-APItest/t/utf8_malformed.t
M       ext/XS-APItest/t/utf8_warn_base.pl

commit d4505b5aabb4b408ccfb9938954c5c85c4284544
Author: Karl Williamson <[email protected]>
Date:   Tue Jul 4 12:22:29 2017 -0600

    APItest/t/utf8_malformed.t: move test to utf8_warn_base.pl
    
    Actually, this test was already in utf8_warn_base, but was executed only
    on 64 bit platforms.  It is reasonable to make sure it works on 32 bit
    ones, as it is an edge case there as well, in the sense that it is the
    first 13 byte code point.
    
    This is the first of a series of commits to remove all the tests in
    utf8_malformed, so the entire file can be removed.
    
    utf8_warn_base has been heavily cleaned up, and now has better
    infrastructure for more completely testing than utf8_malformed.  The
    two files have much the same logic, and rather than trying to maintain
    two versions, it's better to combine them.

M       ext/XS-APItest/t/utf8_malformed.t
M       ext/XS-APItest/t/utf8_warn_base.pl

commit d673d41893c8056c46c8eed566ea7eb6cf122405
Author: Karl Williamson <[email protected]>
Date:   Tue Jul 4 13:23:18 2017 -0600

    APItest/t/utf8_malformed.t: Remove redundant test
    
    This tests the too short malformation, which is already adequately
    tested in utf8_warn_base.pl

M       ext/XS-APItest/t/utf8_malformed.t

commit 186ca96d2d6fd6c0679a8e7e671cbaa76ccbd6a4
Author: Karl Williamson <[email protected]>
Date:   Tue Jul 4 13:19:33 2017 -0600

    APItest/t/utf8_malformed.t: Remove 2 redundant tests
    
    These test overflowing, which is already adequately tested in
    utf8_warn_base.pl

M       ext/XS-APItest/t/utf8_malformed.t

commit 907c56e03c45ab5d51514bd7c57b24153b1451ac
Author: Karl Williamson <[email protected]>
Date:   Tue Jul 4 10:06:37 2017 -0600

    APItest/t/utf8_malformed.t: Remove redundant test
    
    This test already is covered in utf8_warn_base.pl.  It tests an overlong
    for 2**32.

M       ext/XS-APItest/t/utf8_malformed.t

commit 7cd70612272c201f1c55cae693ca471326a53e5a
Author: Karl Williamson <[email protected]>
Date:   Fri Jul 7 10:56:23 2017 -0600

    APItest/t/utf8_warn_base.pl: Add tests
    
    This test file has various tests, and it intentionally perturbs them to
    create malformations to test that these get properly handled.  Prior to
    this commit, only the function utf8n_to_uvchr_error() was being tested
    with these perturbations.  Now, the functions whose names start with 'is'
    also get tested.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit 75c67571d2b9b34cefef5ee7e83c66795b0eb794
Author: Karl Williamson <[email protected]>
Date:   Wed Jul 5 14:58:43 2017 -0600

    APItest/t/utf8_warn_base.pl: Move some tests
    
    This just moves a block and indents and reflows it.  It is moved to
    within the loops that set up various malformations in the input.  The
    next commit will change these tests to actually use the perturbed
    inputs.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit bac9af56c011889ae35b263e03feef1aa88c44cd
Author: Karl Williamson <[email protected]>
Date:   Wed Jul 5 13:09:27 2017 -0600

    APItest/t/utf8_warn_base.pl: Move some setup code
    
    We don't need this code until we've determined we're actually going to
    go through with a test.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit a20f69f469f19658dc64008e4de736277c455d7c
Author: Karl Williamson <[email protected]>
Date:   Fri Jul 7 10:34:01 2017 -0600

    APItest/t/utf8_warn_base.pl: Clean up test name
    
    This name was confusing, as there are two types of things that can be
    (dis)allowed, and in the case of an overflow, the first type is not
    being tested but has the adjective (dis)allowed present.  Add the term
    only when appropriate.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit 7bca1484174bd18057d2892301b5c2a4e26d0c45
Author: Karl Williamson <[email protected]>
Date:   Wed Jul 5 13:00:03 2017 -0600

    APItest/t/utf8_warn_base.pl: Skip inappropriate tests
    
    If we don't have enough information for the test to be meaningful, don't
    bother doing it.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit 75920e41530468ee53f8ada2de5e4885664eeef5
Author: Karl Williamson <[email protected]>
Date:   Fri Jun 30 22:29:36 2017 -0600

    APItest/t/utf8_warn_base.pl: Use a default value
    
    This adds a default number of bytes needed to detect overflows, like
    previous commits have added defaults for other categories.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit 9a3e302f28434f2fb56b072fc906b99d90986bc8
Author: Karl Williamson <[email protected]>
Date:   Tue Jun 27 14:46:26 2017 -0600

    utf8n_to_uvchr() Properly test for extended UTF-8
    
    It somehow dawned on me that the code is incorrect for
    warning/disallowing very high code points.  What is really wanted in the
    API is to catch UTF-8 that is not necessarily portable.  There are
    several classes of this, but I'm referring here to just the code points
    that are above the Unicode-defined maximum of 0x10FFFF.  These can be
    considered non-portable, and there is a mechanism in the API to
    warn/disallow these.
    
    However an earlier standard defined UTF-8 to handle code points up to
    2**31-1.  Anything above that is using an extension to UTF-8 that has
    never been officially recognized.  Perl does use such an extension, and
    the API is supposed to have a different mechanism to warn/disallow on
    this.
    
    Thus there are two classes of warning/disallowing for above-Unicode code
    points.  One for things that have some non-Unicode official recognition,
    and the other for things that have never had official recognition.
    
    UTF-EBCDIC differs somewhat in this, and since Perl 5.24, we have had a
    Perl extension that allows it to handle any code point that fits in a
    64-bit word.  This kicks in at code points above 2**30-1, a different
    threshold from the one at which extended UTF-8 kicks in on ASCII
    platforms.
    
    Things are also complicated by the fact that the API has provisions for
    accepting the overlong UTF-8 malformation.  It is possible to use
    extended UTF-8 to represent code points smaller than 31-bit ones.
    
    Until this commit, the extended warning/disallowing was based on the
    resultant code point, and only when that code point did not fit into 31
    bits.
    
    But what is really wanted is to know whether extended UTF-8 was used to
    represent a code point, no matter how large the resultant code point
    is.  This differs from the previous definition, but only for EBCDIC
    platforms, or when the overlong malformation was also present.  So it
    does not affect very many real-world cases.
    
    This commit fixes that.  It turns out that it is easier to tell if
    something is using extended-UTF8.  One just looks at the first byte of a
    sequence.
    
    The trailing part of the warning message that gets raised is slightly
    changed to be clearer.  It's not significant enough to affect perldiag.

M       ext/XS-APItest/t/utf8_warn_base.pl
M       t/lib/warnings/utf8
M       utf8.c
M       utf8.h
M       utfebcdic.h
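
    The first-byte test can be sketched in C as follows, for ASCII
    platforms only.  The original UTF-8 definition stops at 6-byte
    sequences, whose start bytes are FC and FD, so a start byte of FE or
    FF can only come from an extension.  The function name is
    hypothetical, and UTF-EBCDIC, which has a different threshold, is not
    covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the sequence starting at s begins with a byte that the
 * original (up to 2**31-1) UTF-8 definition never uses as a start
 * byte.  Only the first byte is examined, so an overlong that uses
 * such a start byte is flagged however small its code point is. */
static bool
sketch_uses_extended_utf8(const uint8_t *s)
{
    assert(s != NULL);
    return s[0] >= 0xFE;
}
```

    Because the result depends only on the encoding and not on the decoded
    value, it gives the same answer whether or not the overlong
    malformation is also present.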

commit 4266413d17ad6781b86bc378406b6c77138acab7
Author: Karl Williamson <[email protected]>
Date:   Mon Jun 26 11:43:21 2017 -0600

    utf8.h: Add synonyms for flag names
    
    The next commit will fix the detection of using Perl's extended UTF-8 to
    be more accurate.  The current name for various flags in the API is
    somewhat misleading.  What we really want to know is whether extended
    UTF-8 was used, not the value of the resultant code point.
    
    This commit basically does
    
        s/ABOVE_31_BIT/PERL_EXTENDED/g
    
    It also similarly changes the name of a hash key in APItest/t/utf8.t.
    
    This intermediary step makes the next commit easier to read.

M       ext/XS-APItest/t/utf8.t
M       ext/XS-APItest/t/utf8_setup.pl
M       ext/XS-APItest/t/utf8_warn_base.pl
M       inline.h
M       utf8.c
M       utf8.h

commit cc7af58675f0c74d46276e2b1f72aac7c1e799c6
Author: Karl Williamson <[email protected]>
Date:   Mon Jun 26 22:22:32 2017 -0600

    APItest/t/utf8_warn_base.pl: Generate smaller overlongs
    
    This file generates overlongs for testing that that malformation is
    handled properly.  This commit changes it to avoid generating an
    overlong that uses Perl's extended UTF-8.  This will come in handy a
    couple of commits from now, when a bug dealing with that gets fixed.
    
    It also moves setting a variable to outside the loop

M       ext/XS-APItest/t/utf8_warn_base.pl

commit acfec7435addaedf7915a46b7e26dfd0c03cb291
Author: Karl Williamson <[email protected]>
Date:   Fri Jun 30 12:57:49 2017 -0600

    APItest/t/utf8_warn_base.pl: Data::Dumper isn't needed

M       ext/XS-APItest/t/utf8_warn_base.pl

commit efabceebd3ba521e29a311210c8254c54a19cb8c
Author: Karl Williamson <[email protected]>
Date:   Fri Jun 30 13:14:57 2017 -0600

    APItest/t/utf8_warn_base.pl: Move some tests from loop
    
    These test if any warnings are generated.  None are ever likely to be,
    given the way things work.  We can test after the loop that none of the
    iterations generated warnings, as any would accumulate.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit f037078009e7d1f94a53ee364fd3b03b5637c9a0
Author: Karl Williamson <[email protected]>
Date:   Sun Jun 25 21:35:05 2017 -0600

    APItest/t/utf8_warn_base.pl: Extract code into a fcn
    
    This uses a function to test for a common paradigm.  The next couple of
    commits will change that paradigm, and now the code will only have to
    change in one place.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit 7d64fdcc3b3143395a49cb65963edc7fa9d21a8f
Author: Karl Williamson <[email protected]>
Date:   Mon Jun 19 12:58:19 2017 -0600

    utf8.c: Fix bugs with overlongs combined with other malformations.
    
    The code handling the UTF-8 overlong malformation must come after
    handling all the other malformations.  This is because it may change the
    code point represented to the REPLACEMENT CHARACTER.  The other
    malformation code is expecting the code point to be the original one.
    This may cause failure to catch and report other malformations, or
    report the wrong value of the erroneous code point.
    
    What was needed was simply to move the 'if else' branch for overlongs to
    after the branches for the other malformations.

M       ext/XS-APItest/t/utf8_warn_base.pl
M       utf8.c

commit 96555668166fd8b50cb8b6b843bf7ffac5fbf2fe
Author: Karl Williamson <[email protected]>
Date:   Sat Jun 24 22:55:10 2017 -0600

    APItest/t/utf8_warn_base.pl: Add some tests
    
    This adds testing for having some malformations allowed.  These had not
    been checked for, and there were some bugs.  It's easiest to TODO all
    ones that might fail, creating many passing TODOs.  The TODO will be
    removed in the next commit.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit 15939cf13ceba5cbf98358b63e13d611d7d9c0e9
Author: Karl Williamson <[email protected]>
Date:   Sat Jun 24 22:42:25 2017 -0600

    APItest/t/utf8_warn_base.pl: Move things out of inner loop
    
    The most expensive stuff in this set of nested loops can actually be
    done several nests up (even higher for some things, but it's not worth
    the trouble).  Given that this test file has been running too long, I
    moved things to an outer loop context.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit a3d866eb06ee216b5b9a7508bb9d3432c555fb69
Author: Karl Williamson <[email protected]>
Date:   Sat Jun 24 21:32:41 2017 -0600

    APItest/t/utf8_warn_base.pl: Reorder loop nesting
    
    This is in preparation for the next commit.  It also changes some of the
    loop variables to 1 to indicate truth, rather than a string.  This will
    make some things easier later.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit eb2a9d4cd07fb797c0afd1401c87ccf659ad6cf0
Author: Karl Williamson <[email protected]>
Date:   Wed Jun 21 13:38:55 2017 -0600

    APItest/t/utf8_warn_base.pl: Revamp testing isFOO
    
    Several commits ago, the loop that handles testing the functions that
    convert from/to UTF-8 was revamped.  This commit does a similar thing
    for the portion of the code that handles the isFOO functions, and
    partial character recognition.
    
    It reorders the nesting of loops so that more tests can be done than
    previously in the outer loop.  Among these, it now doesn't skip overflow
    and deals with using Perl's extended UTF-8 better.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit 6026e1ecd4572f186ec6b17442a47c7289fe1e4f
Author: Karl Williamson <[email protected]>
Date:   Mon Jun 19 12:56:38 2017 -0600

    utf8n_to_uvchr: U+ should be for only Unicode code points
    
    For above-Unicode, we should use 0xDEADBEEF instead of U+DEADBEEF.
                                     ^^                    ^^
    This is because U+ only applies to Unicode.  This only affects a warning
    message for overlongs.

M       ext/XS-APItest/t/utf8_warn_base.pl
M       utf8.c

commit b316e9a0bcdea13b8ee869acee86c39e612a391d
Author: Karl Williamson <[email protected]>
Date:   Mon Jun 19 11:52:34 2017 -0600

    APItest/t/utf8_warn_base.pl: Add some tests
    
    This adds the edges between overflowing and not on 64-bit platforms

M       ext/XS-APItest/t/utf8_warn_base.pl

commit 45b841a2c5f1c63cff6d19558bc263b8dd43a382
Author: Karl Williamson <[email protected]>
Date:   Mon Jun 19 11:47:54 2017 -0600

    APItest/t/utf8_warn_base.pl: Do test on all platforms
    
    This modifies and moves a test so it gets done on all platforms, not
    just 32-bit ASCII.  It is an edge case on all platforms, but gives
    differing results, overflowing on 32-bit ones.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit bca379d5d8f63a07408faa867babe9c41372f2e7
Author: Karl Williamson <[email protected]>
Date:   Mon Jun 19 11:01:54 2017 -0600

    APItest/t/utf8_warn_base.pl: Rename and modify test
    
    This test is testing the first code point that requires 13 UTF-8 bytes
    to represent on ASCII platforms.  Change the name from its previous
    vague one to one that indicates this.  And don't test for it on EBCDIC
    platforms, as it isn't an edge case there.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit 8c3afa1ff13abd31dde84205ad35cb10fc4e7ff8
Author: Karl Williamson <[email protected]>
Date:   Sun Jun 18 22:55:38 2017 -0600

    APItest/t/utf8_warn_base.pl: Remove obsolete test
    
    This was an attempt to test the fact that very high code points are
    controlled both by regular above-Unicode warnings, and special,
    non-portable warnings.  This test is now done better in the loop in the
    file.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit 37bd3ae986027cd99895349f7b0e40197149aab7
Author: Karl Williamson <[email protected]>
Date:   Sun Jun 18 22:52:06 2017 -0600

    APItest/t/utf8_warn_base.pl: Rename a test
    
    The names are now more uniform.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit d392e748cb4c6cb3fbe351da57f403e9abfe42b7
Author: Karl Williamson <[email protected]>
Date:   Sun Jun 18 22:50:12 2017 -0600

    APItest/t/utf8_warn_base.pl: Move some tests in the file
    
    The order had been to mostly test in increasing code point order.  This
    sorts the two exceptions to comply.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit fe3a76b3908dea76232dd634622ddcfdeaaefc0b
Author: Karl Williamson <[email protected]>
Date:   Sun Jun 18 22:36:21 2017 -0600

    APItest/t/utf8_warn_base.pl: Split test into 64 vs 32 bit versions
    
    It's cleaner to have this test which differs on 32 vs 64 bit platforms
    in the appropriate sections that have other tests specific to their
    platforms.
    
    The tests for EBCDIC were arbitrary, just placeholders really, since
    these particular tests were added for situations found only on ASCII
    platforms.  Therefore, the EBCDIC tests were removed.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit b0e4569a29dea8f1682d2d8a4ba067c07fa1178a
Author: Karl Williamson <[email protected]>
Date:   Sun Jun 18 22:25:39 2017 -0600

    APItest/t/utf8_warn_base.pl: Create block for warnings control
    
    This adds a block that turns off warnings in the whole thing, so that
    tests can be more easily modified in future commits, and the interior
    warnings control statements can be removed.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit 5d56784e12d5ffea21e0e5fc13901517573c9844
Author: Karl Williamson <[email protected]>
Date:   Sat Jun 17 22:31:58 2017 -0600

    APItest/t/utf8_warn_base.pl: White-space, comments only
    
    This reflows things after the changes in the previous commits

M       ext/XS-APItest/t/utf8_warn_base.pl

commit c1ecca555e2c7bfc95a14b291565b13e452572e5
Author: Karl Williamson <[email protected]>
Date:   Sat Jun 17 18:58:54 2017 -0600

    APItest/t/utf8_warn_base.pl: Remove hash element
    
    The previous commit has enabled this one to remove another of the hash
    elements from the tests data structure.  The value can now be calculated
    from the code point.  The element gave the warnings category to use.
    But now we use the category based on the code point, with special
    handling of the ones that can be true for regular above-Unicode, and
    those that are so far above Unicode that they must use Perl's extended
    UTF-8.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit f4ef7ad14a4b832b84fb63ec239e752eb34dbd68
Author: Karl Williamson <[email protected]>
Date:   Sat Jun 17 06:43:03 2017 -0600

    APItest/t/utf8_warn_base.pl: Remove most tests
    
    In order to test that the various flags passed to utf8n_to_uvchr()
    work independently of each other, previously this file tried all
    possible combinations.  But, as explained in the comments added in this
    commit, by appropriate use of all the flags that don't apply to
    something being tested, we can verify that those flags are independent
    of that thing, and cut down the combinatorial complexity significantly.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit c576ab6001ede00bf5dd30070b67608434c6ad72
Author: Karl Williamson <[email protected]>
Date:   Thu Jun 15 12:06:57 2017 -0600

    utf8n_to_uvchr() Use correct warnings category
    
    The warning about too large a code point should be under the
    'non_unicode' warnings category.

M       ext/XS-APItest/t/utf8_warn_base.pl
M       t/lib/warnings/utf8
M       utf8.c

commit e50fc975548303e3e009c029ddcf23a86b6b84e0
Author: Karl Williamson <[email protected]>
Date:   Sun Jul 2 09:11:17 2017 -0600

    APItest/t/utf8_warn_base.pl: Revamp loop to/from utf8
    
    This test file had gotten kinda messy as new tasks were shoehorned into
    it.  This cleans it up, and positions it to be easier to maintain going
    forward.  I tried to minimize the number of changes shown per commit,
    but this is the minimum I could get, and since it is a revamp, there are
    lots of differences.
    
    Some combinatorial explosion has been removed.
    
    A new subroutine is created which compares the expected vs actually
    gotten warnings, and is called in two places, removing duplicated code.
    
    This exposed a bug in very large, hence rare, code points.  It will be
    fixed in the next commit.  It was far easier to just make all similar
    tests TODO here, removing that in the next commit.  This means this
    commit has many passing TODOs

M       ext/XS-APItest/t/utf8_warn_base.pl

commit edfc2251dde70f8f42603a16c5173eae9fc74609
Author: Karl Williamson <[email protected]>
Date:   Thu Jun 15 18:53:43 2017 -0600

    APItest/t/utf8_warn_base.pl: Tighten up tests
    
    This commit causes the tests to check that messages containing a code
    point have the correct exact wording, including the code point.  The
    tests are tightened up somewhat for other messages, but more is coming
    in a later commit.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit 69adde3be2d90a482b7f94b22124c890d351fd85
Author: Karl Williamson <[email protected]>
Date:   Thu Jun 15 18:27:54 2017 -0600

    APItest/t/utf8_warn_base.pl: Skip most tests
    
    This test file tests every end-of-Unicode-plane noncharacter, and a
    middling surrogate, and a nonchar in the interior of the consecutive
    range of them.  But, we don't really have to do more than basic testing
    for these middling cases.  We should test that they are detected as
    being in their respective categories, but testing all combinations of
    warning and disallowed flags and return flags shouldn't be necessary.
    It's sufficient to test for those for the real edge cases.
    
    This cuts the number of tests in this file to somewhat less than 1/3 of
    the original.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit 1210fb9553e84756df0959a0ed9d16eb17a7fabd
Author: Karl Williamson <[email protected]>
Date:   Sat Jun 17 06:27:59 2017 -0600

    APItest/t/utf8_warn_base.pl: Store warnings sans \n
    
    This will make the output that future commits create more legible

M       ext/XS-APItest/t/utf8_warn_base.pl

commit 8f0f9304172839d54711489633470912ebcae204
Author: Karl Williamson <[email protected]>
Date:   Thu Jun 15 17:35:54 2017 -0600

    APItest/t/utf8_warn_base.pl: Change some test names
    
    This omits distracting detail from subsidiary tests, indenting them from
    the major one.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit b02bbd4c2ec216b612856ef6ee70e79695fc7661
Author: Karl Williamson <[email protected]>
Date:   Thu Jun 15 16:13:12 2017 -0600

    APItest/t/utf8_warn_base.pl: Simplify some calculations
    
    This commit pulls some variable setting outside an inner loop.  It's
    easily settable there, instead of being calculated.  It allows for
    removal of another hash element.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit 3b0eebd6751eeabef27f1076f7ce84474b0837e2
Author: Karl Williamson <[email protected]>
Date:   Thu Jun 15 15:45:14 2017 -0600

    APItest/t/utf8_warn_base.pl: Do formatting outside loop
    
    To save extra effort

M       ext/XS-APItest/t/utf8_warn_base.pl

commit 6148a5a8f4f285263ec68eb16e9d151a5226ab88
Author: Karl Williamson <[email protected]>
Date:   Thu Jun 15 15:00:08 2017 -0600

    APItest/t/utf8_warn_base.pl: Improve some more diagnostics
    
    This changes the diagnostics when testing utf8n_to_uvchr() so they are
    more human readable, and aren't generated until failure.
    
    It also corrects things to display $@ on eval failure (previously it
    displayed $!)

M       ext/XS-APItest/t/utf8_warn_base.pl

commit 3fab67e26368d2c1fce74034cb98b4f47fab0814
Author: Karl Williamson <[email protected]>
Date:   Thu Jun 15 14:24:05 2017 -0600

    APItest/t/utf8_warn_base.pl: Improve some diagnostics
    
    This creates a function that will display in more human-readable form
    the eval string used for testing uvchr_to_utf8().  And it calls that
    function should there be a failure.  Thus the calculations aren't done
    unless necessary.
    
    It also corrects a diagnostic to show $@ after an eval failure instead
    of $!

M       ext/XS-APItest/t/utf8_warn_base.pl

commit 3c3845c4f6582988e7673e74f3f4f38b10fe5207
Author: Karl Williamson <[email protected]>
Date:   Thu Jun 15 12:49:10 2017 -0600

    APItest/t/utf8_warn_base.pl: Display mnemonics on error
    
    Part of the testing for this is that the returned flags for problematic
    conditions are correct.  This commit adds a routine that will convert
    numeric values of the flags into a mnemonic string like FOO|BAR|BAZ.
    This makes debugging easier.  The names are not computed unless there is
    an error.

M       ext/XS-APItest/t/utf8_warn_base.pl
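
    The routine added by this commit lives in the Perl test file; as a
    language-neutral illustration of the same idea, here is a small C
    sketch that turns a bit mask into a string like FOO|BAR|BAZ.  The flag
    names and values are the placeholders from the commit message, and
    sketch_flags_to_string is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sketch_flag { unsigned value; const char *name; };

/* Placeholder flags; real flag names and values would go here. */
static const struct sketch_flag sketch_flags[] = {
    { 0x1, "FOO" }, { 0x2, "BAR" }, { 0x4, "BAZ" },
};

/* Write the names of the bits set in 'flags' into buf, separated by '|'.
 * Stops early, leaving a shorter but NUL-terminated string, if buf is
 * too small for the next name. */
static void
sketch_flags_to_string(unsigned flags, char *buf, size_t size)
{
    size_t used = 0;

    assert(buf != NULL && size > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof sketch_flags / sizeof sketch_flags[0]; i++) {
        if (!(flags & sketch_flags[i].value))
            continue;
        const char *name = sketch_flags[i].name;
        size_t need = strlen(name) + (used ? 1 : 0);   /* name plus separator */
        if (need >= size - used)                        /* no room for it and the NUL */
            break;
        if (used)
            buf[used++] = '|';
        memcpy(buf + used, name, strlen(name) + 1);     /* copies the NUL too */
        used += strlen(name);
    }
}
```

    A test harness would call this only on failure, so that the string is
    not built for tests that pass.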

commit 4d9d4e0bfa58eda850604bb21758abbad83b37e8
Author: Karl Williamson <[email protected]>
Date:   Tue Jun 13 22:48:36 2017 -0600

    APItest/t/utf8_warn_base.pl: Rename some variables
    
    The new names more closely indicate the variables' purposes.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit 305dd7705b8871d9852518c5b65486d52d34f493
Author: Karl Williamson <[email protected]>
Date:   Fri Jun 30 11:55:18 2017 -0600

    APItest/t/utf8_warn_base.pl: Make hash element optional
    
    This element of the hash gives how many bytes are needed in an
    incomplete sequence in order to classify the full sequence.  In some
    cases every code point in the category has this be the same number, and
    it can be cleaner to not manually specify the number.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit d8acef1fe59fb5ddc9ef7b671e38997fd8f176cc
Author: Karl Williamson <[email protected]>
Date:   Thu May 25 21:16:29 2017 -0600

    APItest/t/utf8_warn_base.pl: Remove hash elements
    
    These two elements can be calculated from the others

M       ext/XS-APItest/t/utf8_warn_base.pl

commit f0a333e197c6aeafe7af493084e69a88bc67e77b
Author: Karl Williamson <[email protected]>
Date:   Thu May 25 21:04:09 2017 -0600

    APItest/t/utf8_warn_base.pl: Remove element from hash
    
    The warning message can be figured out from other elements.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit 1257c2f20f338ae516493f2ca7513f593219c031
Author: Karl Williamson <[email protected]>
Date:   Thu May 25 20:09:07 2017 -0600

    APItest/t/utf8_warn_base.pl: Eliminate hash element
    
    This is leftover from an earlier version of the tests, and can be
    calculated instead of having to manually specify it.

M       ext/XS-APItest/t/utf8_warn_base.pl

commit 37a13d7390010d23867bf1d7200e36b5cb44c25c
Author: Karl Williamson <[email protected]>
Date:   Wed Jun 14 15:24:29 2017 -0600

    APItest/t/utf8_warn_base.pl: Standardize overflow test detection
    
    There are two methods currently for detecting if a test is for overflow.
    This standardizes on the one where the expected code point is 0, and
    uses the already existing variable instead of qr//

M       ext/XS-APItest/t/utf8_warn_base.pl

commit aee980966c39e96f001c7e6505c64b3c5d9c85b6
Author: Karl Williamson <[email protected]>
Date:   Mon May 15 09:54:40 2017 -0600

    APItest/t/utf8.t: Don't test above IV_MAX
    
    For 32-bit platforms, this means moving the tests to the 64-bit only
    portion of the file.  And it comments out the tests that are above
    64-bit IV_MAX.
    
    This is in preparation for IV_MAX being the upper legal limit for code
    points.

M       ext/XS-APItest/t/utf8.t

commit 3c2fc3a9c730b046767bc06394391f3b4d44476f
Author: Karl Williamson <[email protected]>
Date:   Wed Jul 5 11:31:12 2017 -0600

    APItest/t/utf8.t: Add a test
    
    This test will be important when we convert to limiting code points to
    at most IV_MAX.

M       ext/XS-APItest/t/utf8.t

commit 090afa8de952e8a18a20b4c667e77296057c4dd5
Author: Karl Williamson <[email protected]>
Date:   Sat May 13 22:58:00 2017 -0600

    APItest/t/utf8.t: Comments, white-space only

M       ext/XS-APItest/t/utf8.t

commit 32c851f603bcffe58babc9ac4c22be99a2a4d707
Author: Karl Williamson <[email protected]>
Date:   Sat May 13 22:53:47 2017 -0600

    APItest/t/utf8.t: Better handle some platforms
    
    A future commit will cause some expected errors to not actually be
    errors on some platforms.  This detects and handles these.

M       ext/XS-APItest/t/utf8.t

commit 8b9659c9b459c2c8ff074af2505aa2ac80f936ed
Author: Karl Williamson <[email protected]>
Date:   Sat May 13 22:51:43 2017 -0600

    APItest/t/utf8.t: Remove unnecessary hash initializations

M       ext/XS-APItest/t/utf8.t

commit 41c592be50ead68c0cac88c4341ac41c833ace7a
Author: Karl Williamson <[email protected]>
Date:   Sat May 13 22:50:26 2017 -0600

    APItest/t/utf8.t: Fix some convoluted code
    
    This code got overly complex as time went by, and can be cleaned up.

M       ext/XS-APItest/t/utf8.t

commit 82970464f328cb3dbc4eabd6bc006b5e0af0e819
Author: Karl Williamson <[email protected]>
Date:   Mon May 8 09:47:41 2017 -0600

    APItest/t/utf8.t: Rmv useless line
    
    This entry is overwritten by the next line.

M       ext/XS-APItest/t/utf8.t

commit c8f5128fd964752d7421fe340aed9ecc3badae22
Author: Karl Williamson <[email protected]>
Date:   Mon Jun 26 22:27:23 2017 -0600

    APItest/t: Change some variable names
    
    One of these is used in multiple test files in this directory.
    
    The names are ambiguous for the contexts they occur in.  'first' can
    mean earliest in the string, but here it means the lowest ordinal value.

M       ext/XS-APItest/t/utf8.t
M       ext/XS-APItest/t/utf8_setup.pl
M       ext/XS-APItest/t/utf8_warn_base.pl

commit 4c0a845bd39680ae2dcb0c582c41e8beb581b5ba
Author: Karl Williamson <[email protected]>
Date:   Thu Jun 15 12:01:15 2017 -0600

    APItest/t/utf8_setup.pl: Make sure diagnostics are on separate lines
    
    This changes diagnostic output to guarantee each element of the array
    starts on a new line, for easier readability.  The array may or may not
    already have terminating \n characters in the elements.

M       ext/XS-APItest/t/utf8_setup.pl

commit 04adb9df5f4018ee5a32da7c4d2d7b0b0d6e66c0
Author: Karl Williamson <[email protected]>
Date:   Mon May 29 20:58:32 2017 -0600

    APItest/t/utf8_setup.pl: Split function into two
    
    This function outputs a byte string as hex bytes.  A future commit will
    want that output without surrounding quotes, so create a version that
    doesn't have them.
    
    This also corrects the number of bytes needed to discern that the
    overflow happens on 64-bit platforms from 2 to 3.  This error would
    be exposed by tests added in future commits.

M       ext/XS-APItest/t/utf8_setup.pl
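
The split described in the commit above (one function that emits bare
hex bytes, and a wrapper that adds the surrounding quotes) can be
sketched in C as follows.  This is only an illustration: the real
functions are Perl code in utf8_setup.pl, and the names bytes_to_hex()
and bytes_to_quoted_hex(), as well as the \xHH output format, are
assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format 'len' bytes as \xHH escapes, without surrounding quotes.
 * Stops early if 'buf' cannot hold another complete escape.
 * Returns 'buf'. */
static char *
bytes_to_hex(const unsigned char *bytes, size_t len, char *buf, size_t size)
{
    size_t used = 0;

    assert(buf != NULL && size > 0);
    buf[0] = '\0';

    /* Each escape needs 4 characters plus the trailing NUL. */
    for (size_t i = 0; i < len && size - used > 4; i++) {
        snprintf(buf + used, size - used, "\\x%02X", (unsigned) bytes[i]);
        used += 4;
    }
    return buf;
}

/* Same output as bytes_to_hex(), wrapped in double quotes. */
static char *
bytes_to_quoted_hex(const unsigned char *bytes, size_t len,
                    char *buf, size_t size)
{
    size_t used;

    assert(buf != NULL && size >= 3);
    buf[0] = '"';
    /* Reserve one byte for the opening and one for the closing quote. */
    bytes_to_hex(bytes, len, buf + 1, size - 2);
    used = strlen(buf);
    buf[used] = '"';
    buf[used + 1] = '\0';
    return buf;
}
```

Building the quoted form on top of the unquoted one keeps the hex
formatting in a single place.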

commit d45746330a2997f57c595b93097dd6365b7624b5
Author: Karl Williamson <[email protected]>
Date:   Mon Jun 26 22:08:01 2017 -0600

    utf8n_to_uvchr(): Avoid some work
    
    By adding a single mask, we can avoid some unnecessary work, as that
    work is not necessary if just the one bit is set.

M       utf8.c

commit 97885082418fdcffd9c31600fff0a2de99073201
Author: Karl Williamson <[email protected]>
Date:   Fri Jun 30 12:37:15 2017 -0600

    utf8.c: Comments, white-space only

M       utf8.c

commit f4a95a8ce72e00aba4f4f136d1fcb991058855ea
Author: Karl Williamson <[email protected]>
Date:   Fri Jun 30 12:35:53 2017 -0600

    utf8.c: Consolidate duplicated string constants
    
    This reduces maintenance costs if they have to be updated.

M       utf8.c

commit 624f311ee4aaa00540c427bdd5c2d44e9a14e85d
Author: Karl Williamson <[email protected]>
Date:   Tue May 9 20:16:13 2017 -0600

    utf8.c: Don't calc code point from overflowing UTF8
    
    This avoids calculating a code point from UTF-8 that is known to
    overflow.  This could give incorrect results (used only in warning
    messages), but is done only when there are 3 (or more) malformations:
    overflow, overlong, UTF-8 terminated early, so it's unlikely to actually
    happen in the field.
    
    I am not adding any tests, as I don't know of any existing failures, and
    soon there will be a commit that limits code points to be at most
    IV_MAX.  That commit will cause existing tests to fail without
    this fix, so that is good enough to test it.  I imagine a brute force
    generator of UTF-8 would find some string that showed this problem up
    absent the other coming changes, but it's not worth it.

M       utf8.c

commit 1bfc96440f2f059490f52bf49e907ff26d26c628
Author: Karl Williamson <[email protected]>
Date:   Mon Jul 3 18:59:50 2017 -0600

    t/uni/parser.t: Skip some tests on 32-bit platforms
    
    These tests require code points that are too large for 32-bit platforms,
    so skip there.

M       t/uni/parser.t

commit a5673894212f8b972102080c9f02b14840cb2c13
Author: Karl Williamson <[email protected]>
Date:   Tue May 9 20:27:40 2017 -0600

    Move test from t/opbasic to t/uni
    
    This test is really not very basic, so it doesn't belong in opbasic.  It
    is for having a string delimiter be a very large code point, well above
    the legal strict Unicode max.  The code point is 2**32 - 1, which is
    UV_MAX on 32-bit platforms.
    
    Use of UV_MAX for a delimiter is about to become illegal, and so this
    test needs to be skipped on those platforms.  Since this happens at
    compile time, there are a few complications in getting the script to
    compile on such systems, even though it is skipped at run time.
    
    The opbasic test file is so basic that it doesn't even use t/test.pl,
    whereas the one in t/uni does use that, and that has better
    infrastructure for handling this issue, including getting it to work on
    EBCDIC platforms.

M       t/opbasic/qq.t
M       t/uni/parser.t

commit 507cb9a9cee9e27a91a010d57aa42daa94599f5e
Author: Karl Williamson <[email protected]>
Date:   Mon Jul 3 11:30:52 2017 -0600

    t/comp/parser.t: Skip test on 32-bit builds
    
    This code point is no longer legal on such builds.  We need to use this
    code point to trigger the bug, so can't lower it to work on those
    builds.

M       t/comp/parser.t

commit 6b337a806a749a1f3e068b681a4f1332dfce393c
Author: Karl Williamson <[email protected]>
Date:   Mon Jul 3 13:52:31 2017 -0600

    t/op/index.t: Skip now illegal code points on 32 bit builds
    
    These tests use code points that are now illegal on 32-bit platforms, so
    skip them there.  The failures these tests were added for did not happen
    except on these now-illegal code points.

M       t/op/index.t

commit 0e91066aadebe2e814355c7bf4ecf28cd4d486b5
Author: Karl Williamson <[email protected]>
Date:   Mon Jul 3 09:33:09 2017 -0600

    t/op/chop.t: Don't use too large code points
    
    The bug this was testing for requires a code point that will no longer
    be legal on 32-bit machines.  So skip there, and revise to use chr() in
    the skipped code instead of "\x{}".  The latter would compile even if
    execution gets skipped, so would cause it to die.  The test also used
    what was the very highest legal code point on 64-bit machines, which is
    now illegal, so it now tests the new highest one.

M       t/op/chop.t

commit 7985103f6da4172811e971ee10ed81e40ccfb54e
Author: Karl Williamson <[email protected]>
Date:   Sun Jul 2 10:34:12 2017 -0600

    t/re/pat_advanced.t: Revise some tests
    
    These tests used the highest available code points, but those will soon
    be made illegal.  The tests don't need to be for these particular code
    points, but there do need to be tests of user-defined properties of high
    code points, so this commit changes to use the highest ones that will be
    legal after that change.

M       t/re/pat_advanced.t

commit 7ddf5620cd0dfa12f6bbda5e6392f1d90a78782a
Author: Karl Williamson <[email protected]>
Date:   Mon Jul 3 13:46:42 2017 -0600

    Restore a portion of reverted commits
    
    See the previous commit for details.

M       t/lib/warnings/utf8
M       t/op/ver.t

commit b10178f226eee6098e3431d4c1fc7859bf477889
Author: Karl Williamson <[email protected]>
Date:   Mon Jul 3 12:26:34 2017 -0600

    Revert: Restrict code points to <= IV_MAX
    
    This reverts the two related commits
    51099b64db323d0e1d871837f619d72bea8ca2f9  (partially)
    13f4dd346e6f3b61534a20f246de3a80b3feb743  (entirely)
    
    I was in the middle of a long branch dealing with this and related
    issues when these were pushed to blead.  It was far easier for me to
    revert these at the beginning of my branch than to try to rebase
    unreverted.  And there are changes needed to the approaches taken in the
    reverted commits.  A third related commit,
    113b8661ce6d987db4dd217e2f90cbb983ce5d00, doesn't cause problems so
    isn't reverted.
    
    I reverted the second commit, then the first one, and squashed them
    together into one.  This is to avoid problems when bisecting on a 32-bit
    machine.  If the bisect landed between the commits, it could show
    failures.  The portion of the first commit that wasn't reverted was the
    part that was rendered moot because of the changes in the meantime that
    forbid bitwise operations on strings containing code points above
    Latin1.
    
    The next commit in this series will restore portions of these commits.
    I reverted as much as possible here to make it easier to track down any
    issues that arise.
    
    The biggest problem with these commits is that some Perl applications
    are made vulnerable to Denial of Service attacks.  I do believe it is ok
    to croak when a program tries, for example, to do chr() of too large a
    number, which is what the reverted commit does (and what this branch
    will eventually reinstate doing).  But when parsing UTF-8, you can't
    just die if you find something too large.  That would be an easy DOS on
    any program, such as a web server, that gets its UTF-8 from the public.
    Perl already has a means to deal with too-large code points (i.e.  those
    that overflow the word size), and web servers should have already been
    written in such a way as to deal with these.  This branch just adapts
    the code so that anything above IV_MAX is considered to be overflowing.
    Web servers should not have to change as a result.
    
    A second issue is that one of the reasons we did the original
    deprecation is so that we can use the forbidden code points internally
    ourselves, as Perl 6 does to store Grapheme Normal Form.  The
    implementation should not burn bridges, but allow that use to easily
    happen when the time comes.  For that reason, some tests should not be
    deleted, but commented out, so they can be quickly adapted.
    
    While working on this branch, I found several unlikely-to-occur bugs in
    the existing code.  These should be fixed now in the code that handles
    up to UV_MAX code points, so that when we do allow internal use of such,
    the bugs are already gone.
    
    I also had researched the tests that fail as a result of the IV_MAX
    restriction.  Some of the test changes in these reverted commits were
    inappropriate.
    
    For example, some tests that got changed were for bugs that happen only
    on code points that are now illegal on 32-bit builds.  Lowering the code
    point in the test to a legal value, as was done in some instances, no
    longer tests for the original bug.  Instead, where I found this, I just
    skip the test on 32-bit platforms.
    
    Other tests were simply deleted, where a lower code point would have
    worked, and the test is useful with a lower code point.  I retain such
    tests, using a lower code point.
    
    And still other tests were from files that I extensively revamp, so I
    went with the revamp.
    
    The following few commits fix those as far as possible now.  This is so
    that the reversion of the tests and my changes are close together in the
    final commit series.  Some changes have to wait until later, such as
    those where the entire test files are revamped, or when the deprecation
    messages finally go away in the final commit of this series.

M       ext/XS-APItest/t/utf8.t
M       ext/XS-APItest/t/utf8_warn_base.pl
M       t/comp/parser.t
M       t/lib/warnings/utf8
M       t/op/chop.t
M       t/op/index.t
M       t/op/ver.t
M       t/opbasic/qq.t
M       t/re/pat_advanced.t
M       t/uni/parser.t
M       utf8.c
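
The revert message above argues that a UTF-8 decoder should report a
too-large value as overflow rather than die.  The following is a minimal
sketch of that idea in C, assuming a 64-bit IV.  It is not the code in
utf8.c: the names SKETCH_IV_MAX, decode_status and
accumulate_code_point() are hypothetical, and the sketch covers only the
accumulation of continuation bytes, not start-byte parsing, overlongs,
or warnings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for IV_MAX on a build with 64-bit IVs (an assumption). */
#define SKETCH_IV_MAX ((uint64_t) INT64_MAX)

typedef enum {
    DECODE_OK,
    DECODE_OVERFLOW,    /* result would exceed SKETCH_IV_MAX */
    DECODE_MALFORMED    /* a byte was not a continuation byte */
} decode_status;

/* Fold 'n' continuation bytes (6 payload bits each) into 'start', the
 * payload already extracted from the start byte.  On success, stores the
 * code point in '*out'.  Never aborts: a value above SKETCH_IV_MAX is
 * reported as DECODE_OVERFLOW, leaving the caller to decide what to do. */
static decode_status
accumulate_code_point(uint64_t start, const uint8_t *cont, size_t n,
                      uint64_t *out)
{
    uint64_t cp = start;

    assert(out != NULL);

    for (size_t i = 0; i < n; i++) {
        if ((cont[i] & 0xC0) != 0x80) {
            return DECODE_MALFORMED;
        }
        /* Check before shifting: if cp is above IV_MAX >> 6, then
         * shifting in 6 more bits would exceed IV_MAX. */
        if (cp > (SKETCH_IV_MAX >> 6)) {
            return DECODE_OVERFLOW;
        }
        cp = (cp << 6) | (uint64_t) (cont[i] & 0x3F);
    }

    *out = cp;
    return DECODE_OK;
}
```

Because the check happens before each shift, the accumulated value can
never wrap, and a program reading untrusted input gets a status code it
can handle instead of a fatal error.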
-----------------------------------------------------------------------

--
Perl5 Master Repository
