In perl.git, the branch smoke-me/khw-ebcdic has been created

<https://perl5.git.perl.org/perl.git/commitdiff/b68728aa1fa18100e00885c14faead9e0a84613d?hp=0000000000000000000000000000000000000000>

        at  b68728aa1fa18100e00885c14faead9e0a84613d (commit)

- Log -----------------------------------------------------------------
commit b68728aa1fa18100e00885c14faead9e0a84613d
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 22:34:37 2019 -0600

    intrpvar.h: Add variable for use in tr///
    
    This is part of this branch of changes.

commit 6baddda157b548e8ccafc9caa52fbe4284b6c6cc
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 15:29:05 2019 -0600

    Allow core to work with code points above IV_MAX
    
    Code points above IV_MAX have been reserved for core use, and a future
    commit will finally make use of them.

commit 7be70f378d1537352483db59ccc5610dcfbc0eb5
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 22:04:12 2019 -0600

    utfebcdic.h: Add comments

commit aa66debe59fcb1e3e724d416fa85b74f3066231b
Author: Karl Williamson <[email protected]>
Date:   Fri Sep 20 09:51:13 2019 -0600

    Move Perl_regnext to regexec.c
    
    This function is moved from regcomp.c, which uses it only during
    compilation, to regexec.c, which calls it constantly while matching.
    Experience has shown that compilation can be less efficient without
    affecting overall performance.
    
    Now the compiler has full knowledge of this function in the translation
    unit where performance is critical, and can hopefully perform better
    optimizations.

commit 279a8ecd6df5a5b9458c4cfdd16fbeb7bb1c6b4f
Author: Karl Williamson <[email protected]>
Date:   Fri Sep 20 09:45:29 2019 -0600

    regnext: Add some branch predictor hints

commit 8b2c5c4665fd4ae84b578c748f24a13b344cf3dc
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 19 22:18:02 2019 -0600

    Change data lookup from a macro to a function

commit 2edf274e5bdd94a33433fb171c95cc445bec7aac
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 19 21:54:03 2019 -0600

    regen/regcomp.pl: Enforce all long nodes being last

commit da0f852b1261fefaae0dc71ed2a9efb18337a971
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 19 20:34:17 2019 -0600

    regcomp.sym: Move regnodes to end that don't use next_off
    
    Most regnodes use the next_off field in a regnode structure, to link to
    the next one in the chain.  But some require more than the 16 bits it
    contains, so they use a different, 32 bit, field.
    
    Currently, there is a lookup array to distinguish between the types, but
    that becomes unnecessary if all of one sort are grouped before or after
    all of the other.
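
As an illustration of the grouping idea above, here is a minimal sketch. It
is not the code from regcomp.sym or regcomp.h: the enum values, the struct
layout, and the names node_type, FIRST_LONG_NODE and node_next_offset are all
hypothetical. It assumes only what the message says, namely that once every
node type using the 32-bit field sorts after every type using next_off, one
comparison against a boundary value can replace the lookup array.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical node types, ordered so that every type that needs the
 * 32-bit field comes after every type that uses the 16-bit next_off. */
enum node_type {
    NODE_EXACT,          /* these use the 16-bit next_off field */
    NODE_ANYOF,
    NODE_STAR,
    NODE_LONGJMP,        /* from here on, the 32-bit field is used */
    NODE_BRANCHJ,
    NODE_TYPE_COUNT
};

#define FIRST_LONG_NODE NODE_LONGJMP   /* boundary between the two groups */

/* Hypothetical node layout, loosely modelled on the description above */
struct node {
    uint8_t  type;
    uint16_t next_off;   /* offset to the next node, when it fits */
    uint32_t arg1;       /* wider offset for the "long" node types */
};

/* One comparison against the boundary replaces a per-type lookup array */
static uint32_t node_next_offset(const struct node *n)
{
    assert(n->type < NODE_TYPE_COUNT);
    return n->type >= FIRST_LONG_NODE ? n->arg1 : n->next_off;
}
```

The ordering has to be enforced wherever the node list is generated,
otherwise the comparison silently gives wrong answers for a misplaced type.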

commit b66fee5ca6a7ef65a91207b8ac53320f0f313fc7
Author: Karl Williamson <[email protected]>
Date:   Sat Sep 21 09:51:52 2019 -0600

    Add ANYOFRb regnode
    
    This is like the ANYOFR regnode added in the previous commit, but all
    code points in the range it matches are known to have the same first
    UTF-8 start byte.  That means it can't match UTF-8 invariant characters,
    like ASCII, because each of those has a different "start" byte, so it
    could only match a range of 1, and the compiler wouldn't generate this
    node for that; it would use an EXACT instead.
    
    Pattern matching can rule out most code points by looking at the first
    byte of their UTF-8 representation, before having to convert from
    UTF-8.
    
    On ASCII platforms, this simple comparison rules out all but 64 of the
    2-byte UTF-8 characters.  For 3-byte characters, up to 4096 remain, and
    for 4-byte ones, 2**18, so the test is less effective for higher code
    points.
    
    I believe that most UTF-8 patterns that otherwise would compile to
    ANYOFR will instead compile to this, as I can't envision real life
    applications wanting to match large single ranges.  Even the 2048
    surrogates all have the same first byte.
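
The first-byte filter described above can be sketched as follows. This is an
illustration under stated assumptions, not the regexec.c implementation: it
handles only standard UTF-8 on an ASCII platform, assumes well-formed input,
and the function names (utf8_start_byte, range_has_single_start_byte,
utf8_decode, match_with_prefilter) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* First byte of the standard UTF-8 encoding of cp */
static uint8_t utf8_start_byte(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80)    return (uint8_t) cp;
    if (cp < 0x800)   return (uint8_t) (0xC0 | (cp >> 6));
    if (cp < 0x10000) return (uint8_t) (0xE0 | (cp >> 12));
    return (uint8_t) (0xF0 | (cp >> 18));
}

/* A range qualifies when it lies above the invariants and both ends share
 * a start byte; everything in between then shares it as well. */
static int range_has_single_start_byte(uint32_t lo, uint32_t hi)
{
    return lo >= 0x80 && lo <= hi
        && utf8_start_byte(lo) == utf8_start_byte(hi);
}

/* Decode one well-formed UTF-8 sequence (no validation in this sketch) */
static uint32_t utf8_decode(const uint8_t *s)
{
    uint32_t cp;
    int extra, i;

    if (s[0] < 0x80) return s[0];
    if (s[0] < 0xE0)      { cp = s[0] & 0x1F; extra = 1; }
    else if (s[0] < 0xF0) { cp = s[0] & 0x0F; extra = 2; }
    else                  { cp = s[0] & 0x07; extra = 3; }
    for (i = 1; i <= extra; i++)
        cp = (cp << 6) | (s[i] & 0x3F);
    return cp;
}

/* Reject on the first byte alone; decode only the survivors */
static int match_with_prefilter(const uint8_t *s, uint8_t start,
                                uint32_t lo, uint32_t hi)
{
    if (s[0] != start)
        return 0;                       /* no conversion needed */
    return utf8_decode(s) - lo <= hi - lo;
}
```

With these definitions the surrogates U+D800 through U+DFFF all begin with
the byte 0xED, so that range qualifies, while U+0100 through U+017F spans
the start bytes 0xC4 and 0xC5 and does not.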

commit 951c76af412102ad78ce037727bec523bc9027d6
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 19 16:03:04 2019 -0600

    Add ANYOFR regnode
    
    This matches a single range of code points.  It is both faster and
    smaller than other ANYOF-type nodes, requiring, after set-up, a single
    subtraction and conditional branch.
    
    The vast majority of Unicode properties match a single range, though
    most of these are not likely to be used in real world applications.  But
    things like [ij] are a single range, and those are quite commonly
    encountered.  This matches them more efficiently than a bitmap would,
    and doesn't require the space for one either.
    
    The flags field is used to store the minimum matchable start byte for
    UTF-8 strings, and is ignored for non-UTF-8 targets.  This, like ANYOFH
    nodes which have the same mechanism, allows for quick weeding out of
    many possible matches without having to convert the UTF-8 to its
    corresponding code point.
    
    This regnode packs the 32 bit argument with 20 bits for the minimum code
    point the node matches, and 12 bits for the maximum range.  Values
    outside those simply won't compile to this regnode, instead going to one
    of the ANYOFH flavors.  This is sufficient to match all of Unicode
    except for the final (private use) 65K plane.
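
A minimal sketch of the packing and the match test described above follows.
The message gives only the bit widths; which field occupies the high bits is
an assumption made here, and the names range_fits, range_pack and
range_matches are hypothetical rather than taken from regcomp.c.

```c
#include <assert.h>
#include <stdint.h>

#define RANGE_BASE_BITS   20   /* lowest code point the node matches */
#define RANGE_DELTA_BITS  12   /* highest minus lowest */
#define RANGE_MAX_BASE    ((UINT32_C(1) << RANGE_BASE_BITS) - 1)
#define RANGE_MAX_DELTA   ((UINT32_C(1) << RANGE_DELTA_BITS) - 1)

/* True if [lo, hi] can be represented; otherwise another node type is
 * needed */
static int range_fits(uint32_t lo, uint32_t hi)
{
    return lo <= hi && lo <= RANGE_MAX_BASE && hi - lo <= RANGE_MAX_DELTA;
}

/* Assumed layout: base in the high 20 bits, delta in the low 12 bits */
static uint32_t range_pack(uint32_t lo, uint32_t hi)
{
    assert(range_fits(lo, hi));
    return (lo << RANGE_DELTA_BITS) | (hi - lo);
}

/* After unpacking, the test is one subtraction and one comparison.
 * Unsigned wraparound turns cp < lo into a huge value, which fails. */
static int range_matches(uint32_t packed, uint32_t cp)
{
    uint32_t lo    = packed >> RANGE_DELTA_BITS;
    uint32_t delta = packed & RANGE_MAX_DELTA;
    return cp - lo <= delta;
}
```

A 20-bit base tops out at 0xFFFFF, which is why a range starting in the
final plane (U+100000 and up) does not fit, and a 12-bit delta limits the
span to 4096 code points.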

commit 4a5972f17b0c63abf4bfdd1e44bcad264793360e
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 19 16:04:03 2019 -0600

    regexec.c: Rmv some unnecessary casts
    
    The called macro does the cast, and this makes it more legible

commit e1a04aef35ca838aee68ad26845bce0476a4ad90
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 19 15:47:51 2019 -0600

    regcomp.c: Use variables initialized to macro results
    
    instead of the macros.  This is in preparation for the next commit.

commit a73d5a16e9a5ccafeb44035a88ef5e67408c0a6f
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 19 14:20:59 2019 -0600

    regcomp.c: Add parameter to static function
    
    This further decouples this function from knowing details of the calling
    structure, by passing this detail in.

commit 80e45f2b92d27e80207b601530c5bbb96ce06f38
Author: Karl Williamson <[email protected]>
Date:   Wed Sep 18 13:20:42 2019 -0600

    t/re/anyof.t: Add a test
    
    This makes sure a non-folding above-Latin1 character is tested.

commit 567ce18e6521f1969f2436f703072a1197dc2cba
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 19 14:38:39 2019 -0600

    regcomp.c: Comments/white-space
    
    Included is outdenting code whose enclosing block was removed in the
    previous commit.

commit f0fe8dd77686b1db50609732d77cad87bbea183b
Author: Karl Williamson <[email protected]>
Date:   Wed Sep 18 13:12:51 2019 -0600

    XXX warning tests,Prefer EXACTish regnodes to ANYOFH nodes
    
    ANYOFH nodes (that match code points above 255) are smaller than regular
    ANYOF nodes because they don't have a 256-bit bitmap.  But the
    disadvantage of them over EXACT nodes is that the characters encountered
    must first be converted from UTF-8 to code point.  The difference is
    less clearcut with /i, because typically, currently, the UTF-8 must also
    be converted to code point in order to fold them.  But the EXACTFish
    node doesn't have an inversion list to do lookup in, and occupies
    less space, because it doesn't have inversion list data attached to it.
    
    Also there is a bug in using ANYOFH under /l, as wide character warnings
    should be emitted if the locale isn't a UTF-8 one.
    
    The reason this change hasn't been made before (by me anyway) is that
    the old way avoided upgrading the pattern to UTF-8.  But having thought
    about this for a long time, to match this node, the target string must
    be in UTF-8 anyway, and having a UTF8ness mismatch slows down pattern
    matching, as things have to be continually converted, and reconverted
    after backtracking.

commit 2fa05d80418a0aa3feb1310550be877ce6d629b3
Author: Karl Williamson <[email protected]>
Date:   Wed Sep 18 12:45:55 2019 -0600

    t/re/anyof.t: Fix highest range tests
    
    Previously we had infinity minus 1, but infinity should be beyond the
    range, and the highest isn't infinity - 1, but the highest legal code
    point.

commit 2f219c7edd72ddf9186bde7a8b389bf333975b1a
Author: Karl Williamson <[email protected]>
Date:   Wed Sep 18 12:41:41 2019 -0600

    t/re/anyof.t: Remove duplicate test
    
    These are covered by the single code point tests.

commit 561d56c097eb2a70d29ace331a35f0716e1677ef
Author: Karl Williamson <[email protected]>
Date:   Wed Sep 18 12:34:23 2019 -0600

    t/re/anyof.t: Remove invalid test
    
    One shouldn't be able to specify an infinite code point.  The tests have
    the conceit that one can specify a range's upper limit as infinity, but
    that is just shorthand for the range being unbounded.

commit 6ab283c902eaf797e130902ba2b3fe25770e61ab
Author: Karl Williamson <[email protected]>
Date:   Sat Sep 21 10:00:40 2019 -0600

    t/re/anyof.t: Revise test
    
    to make it correspond more with the test that precedes it

commit dc2747c0107f55c808879448ece90da38d33b7d4
Author: Karl Williamson <[email protected]>
Date:   Wed Sep 18 12:31:11 2019 -0600

    re/anyof.t: Clarify failing message
    
    When a test fails, an extra test is run to output debugging info; this
    will cause the planned number of tests to be wrong, which will output an
    extra, confusing message.  This adds an explanation that the number is
    expected to be wrong, hence not to worry.

commit 0f3e53356ae5da6fb343e5db7b90c92369f8ef10
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 12 20:19:07 2019 -0600

    Allow some optimizations of qr/(?[...])/
    
    Prior to this commit, this construct always returned an ANYOF node, even
    if it could be optimized into something else.

commit a0afc45f67e3c1a98850094df25850e882b66533
Author: Karl Williamson <[email protected]>
Date:   Thu May 30 20:57:27 2019 -0600

    regcomp.c: Add invlist_lowest()
    
    This function hides the invlist implementation from the calling code,
    and will be called in more than one place in the future.

commit 328893707f9b1a60d2d76ae446445c38a32bf2a6
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 12 21:06:45 2019 -0600

    regcomp.c: Code for qr/(?[...]) handle restart
    
    There is an existing mechanism for code to realize it needs to restart
    parsing from the beginning, say because it needs to upgrade to UTF-8.
    The code for /(?[...])/ did not participate in this.  Currently I don't
    know of any case where it needs to, though perhaps some very hard to
    reproduce case when branch instructions need to start needing to handle
    more than 16 bits, but I kind of doubt it.  Anyway, the next few commits
    introduce the possibility.

commit a6706a92a186e79e5ea446b62591644e24ff5c3d
Author: Karl Williamson <[email protected]>
Date:   Sat Sep 7 09:18:49 2019 -0600

    malloc.c: Use isDIGIT macro instead of hand-rolling it
    
    The macro is more efficient

commit 9d513ed30c437e6cbb4b0a6e999fd6cd38bf8108
Author: Karl Williamson <[email protected]>
Date:   Fri Sep 6 10:25:26 2019 -0600

    doio.c: Use inRANGE macro

commit a8e1c66fe956e639dfb8d691b3af821555ef97db
Author: Karl Williamson <[email protected]>
Date:   Tue Oct 1 22:34:25 2019 -0600

    util.c: Use inRANGE macro

commit cd82dbb8f9b43b80e3167f465308fbc7eff8885f
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 20:54:57 2019 -0600

    t/op/tr_latin1.t: Skip ASCII-centric tests on EBCDIC

commit 69d38f5936a78bfec97019ad78e1e723dd285e7f
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 20:54:12 2019 -0600

    dist/Data-Dumper/t/dumper.t: Skip ASCII-centric tests on EBCDIC

commit d1d511cf1f6a544a06f7f260cd6daa69dd804e93
Author: Karl Williamson <[email protected]>
Date:   Fri Sep 6 10:23:26 2019 -0600

    t/re/regexp.t: Only convert to EBCDIC once
    
    Some tests get added as we go along, and those added tests have already
    been converted to EBCDIC if necessary.  Don't reconvert, which messes
    things up.

commit bbfbb659b59b21d4674ebf0a1d6635cb724dd14b
Author: Karl Williamson <[email protected]>
Date:   Fri Sep 6 09:49:41 2019 -0600

    re/regexp.t: Change variable name to be more meaningful

commit 9a8c183cd121ab02b266dfd56066688da219530a
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 21:20:42 2019 -0600

    Configure kludge about none optimize

commit 81881f0431d001e61402fd25bf8585a91d3f6629
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 21:19:59 2019 -0600

    XXX regexec.c: debugging prints

commit 8299d9a65aa908797d995dcfc8c75cee303e3bde
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 21:19:22 2019 -0600

    regcomp.c: Use inRANGE macro
    
    This is faster and clearer
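
For readers unfamiliar with this kind of range check, the sketch below shows
one common way a single comparison can replace two. It is not the definition
of inRANGE from Perl's headers, which may differ in its details; the function
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The hand-rolled form: two comparisons */
static int in_range_two_compares(uint32_t c, uint32_t lo, uint32_t hi)
{
    return c >= lo && c <= hi;
}

/* One subtraction and one comparison.  If c < lo, the unsigned subtraction
 * wraps around to a very large value that cannot be <= hi - lo. */
static int in_range(uint32_t c, uint32_t lo, uint32_t hi)
{
    assert(lo <= hi);
    return c - lo <= hi - lo;
}
```

Both functions return the same result whenever lo <= hi; the second form
also names the intent at the call site, which is the "clearer" half of the
claim.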

commit 41e5e8301f0e597790e8acd745d6d82b5b92b467
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 21:17:23 2019 -0600

    lib/ExtUtils/t/Embed.t: Skip on EBCDIC
    
    This is not currently implemented for EBCDIC

commit fb1b2d8ca83a56eda2fcb6c8951b87b79db73a71
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 21:16:42 2019 -0600

    XXX Pod-Simple

commit 8f3a4f2627a4313a8ab35638c82db70a7dce846d
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 21:16:14 2019 -0600

    XXX Encode

commit aabcdcfd42a3fe08f73749835f652e8af1d8596c
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 21:12:24 2019 -0600

    dist/Storable/t/regexp.t: Mark some tests as ASCII-only
    
    These tests are ASCII centric

commit ba7e5a52b4b5fcc0e6609d083373b2892e9a2d6b
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 21:11:14 2019 -0600

    ext/DynaLoader/dl_aix.xs: Use isDIGIT macro
    
    which is more efficient

commit 62c7270cb33948bf6b4f04106a275136a27dfbe0
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 21:10:36 2019 -0600

    pp_pack.c: Use inRANGE macro
    
    which is more efficient

commit ad7e6a0fcea9eb84eeff7d9409675433f08c2a03
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 21:09:24 2019 -0600

    t/op/die.t: 'use utf8'
    
    This file is encoded in UTF-8, even though it didn't say it was.

commit 9e3afc13ac13bff3c52b38393dc781daa6e38f27
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 21:08:46 2019 -0600

    t/op/qr.t: Don't use fancy apostrophe
    
    when the ASCII one will do.

commit 5cacf99d51ecb8ba6a2ebcc7a05cca517973dd6e
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 21:07:50 2019 -0600

    t/op/threads-dirh.t: Add ability to skip on memory constrained
    
    This ran out of memory on a very limited smoker; add a check for
    environment variable PERL_SKIP_BIG_MEM_TESTS being non-zero to skip
    it.

commit 5d340c0dff1c0ef03e2acb06d83e91a94b0849e2
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 21:06:50 2019 -0600

    t/re/bigfuzzy_not_utf8.t: Add ability to skip on memory constrained
    
    This test blew the memory on a very limited smoker; add a check
    for environment variable PERL_SKIP_BIG_MEM_TESTS being non-zero to skip
    these.

commit e59982955e95d22f70ad1119f8e072150165ea2f
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 21:05:02 2019 -0600

    t/re/pat.t: Add ability to skip on memory constrained
    
    A few tests were blowing the memory on a very limited smoker; add a check
    for environment variable PERL_SKIP_BIG_MEM_TESTS being non-zero to skip
    these.

commit 2ed9cd9019ab3c408b5b7e8dad64da76f619e1de
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 21:03:54 2019 -0600

    t/re/re_tests: Skip ASCII-centric test for EBCDIC
    
    Add a similar one for EBCDIC

commit cfda69717ede221d7a3df0974deac79553cffa40
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 21:02:44 2019 -0600

    win32/vdir.h: Use inRANGE macro
    
    which is more efficient.

commit 96c2cf42708b270b5a62cd18b474a782804cc89b
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 21:02:01 2019 -0600

    win32/win32.c: Use inRANGE macro
    
    which is more efficient.

commit 50541fc8a6adbf36f781171da7f292a7a3885fec
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 21:01:21 2019 -0600

    win32/win32io.c: Use inRANGE macro
    
    which is more efficient.

commit 15c7e4eceb39b412fe3d05157403a60f20dc165d
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 20:58:15 2019 -0600

    caretx.c: Use inRANGE()
    
    This is more efficient

commit ce1b88ff3b80f9807193931a38eb4b283a82d203
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 20:43:49 2019 -0600

    l1_char_class_tab.h: Remove some special EBCDIC cases
    
    These are no longer needed.

commit 692e610fac8138f8229c805aedf34527d48b24c3
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 20:42:19 2019 -0600

    utfebcdic.h: Move some #defines

commit 0cbd7e30d8496db35967e60b2a4b72b557e88e2f
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 20:08:24 2019 -0600

    Make defn of UTF_IS_CONTINUED common
    
    This can be derived from other values, removing an EBCDIC dependency

commit c142cb67e904426bd573736c1b5ae67d367cbc23
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 20:37:17 2019 -0600

    Make defn of UVCHR_IS_INVARIANT common
    
    This can be derived from other values, removing an EBCDIC dependency

commit f897e6ffc7abc9f7b9d0fd70b4c81be8be86f518
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 18:08:32 2019 -0600

    Make defn of OFFUNI_IS_INVARIANT common
    
    This can be derived from other values, removing an EBCDIC dependency

commit 9fee57bcbab0361d204b56075aeb8847a8cca1c3
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 18:03:26 2019 -0600

    Make defn of UTF8_IS_DOWNGRADEABLE_START common
    
    This can be derived from other values, removing an EBCDIC dependency

commit 3f0d59ea8b3b6a7feaa981db742c77f07bcccdd3
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 17:56:01 2019 -0600

    Make defn of UTF_IS_ABOVE_LATIN1 common
    
    This can be derived from other values, removing an EBCDIC dependency

commit f73ae121e048236a32b5eb31d60a31ba575fd2f8
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 17:52:34 2019 -0600

    Make defn of UTF8_IS_START common
    
    This can be derived from other values, removing an EBCDIC dependency

commit 76c9de5b3ee08e5562f26eaeeee87570ab9fa8b4
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 17:13:31 2019 -0600

    Make defn of UTF8_IS_CONTINUATION common
    
    This can be derived from other values, removing an EBCDIC dependency

commit 216873c6dbaf59a0baa8d877bd295745a6129a3b
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 17:07:50 2019 -0600

    Make defn of UTF_CONTINUATION_MARK common
    
    This can be derived from other values, removing an EBCDIC dependency

commit e7bfc74a64d3bfcab96ff66f245de63fc6344ba5
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 16:48:38 2019 -0600

    Make UTF_IS_CONTINUATION_MASK common
    
    This variable can be defined from the same base in both UTF-8 and
    UTF-EBCDIC, and doing so eliminates an EBCDIC dependency.
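
One way such a common definition could look is sketched below. It assumes
that the only value differing between the encodings is the number of payload
bits per continuation byte (6 in UTF-8, 5 in UTF-EBCDIC), and that
continuation bytes have the form 10xxxxxx and 101xxxxx respectively. The
function names and the 0xB0 constant are this sketch's choices and are not
claimed to be the definitions in utf8.h.

```c
#include <assert.h>
#include <stdint.h>

/* Bits of a byte that identify it as a continuation: everything above the
 * payload bits.  shift is the number of payload bits per continuation. */
static uint8_t continuation_mask(unsigned shift)
{
    assert(shift == 5 || shift == 6);
    return (uint8_t) (0xFF << shift);
}

/* Value those bits must have: the top bit set and the next one clear.
 * ANDing with 0xB0 (binary 1011 0000) clears the second-highest bit. */
static uint8_t continuation_mark(unsigned shift)
{
    return (uint8_t) (continuation_mask(shift) & 0xB0);
}

/* For UTF-EBCDIC this applies to the intermediate (I8) byte values, not to
 * the native bytes after the final permutation. */
static int is_continuation(uint8_t byte, unsigned shift)
{
    return (byte & continuation_mask(shift)) == continuation_mark(shift);
}
```

With a shift of 6 this yields a mask of 0xC0 and a mark of 0x80; with a
shift of 5 it yields 0xE0 and 0xA0.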

commit cf6e138a71378c5a1fdc931600ae14d33e1ff3a5
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 16:43:50 2019 -0600

    utf8.h: Add comment

commit f53aa87dcae4a0d1b4e054d5c9c4eeb60afa8eb2
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 16:40:44 2019 -0600

    utf8.h: Remove redundant cast
    
    The called macro does the cast already

commit 26dbade501a3fa222d3c17974c69826039360ee7
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 16:37:17 2019 -0600

    utf8.h: Make sure macros not called with a ptr
    
    By doing an '| 0' with a parameter in a macro expansion, a C compile
    error will be generated if the macro is called with a pointer.  This is
    free protection.
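
The guard can be illustrated with a small sketch. The macro IS_ASCII_BYTE
and the function count_ascii are hypothetical and do not come from utf8.h;
the point is only that bitwise OR is defined for integer operands and not
for pointers, so a mistaken call with a pointer fails at compile time while
correct calls cost nothing at run time.

```c
#include <assert.h>
#include <stddef.h>

/* "| 0" leaves an integer value unchanged but is invalid on a pointer, so
 * IS_ASCII_BYTE(s) with s a pointer is rejected by the compiler. */
#define IS_ASCII_BYTE(c)  ((((c) | 0) & ~0x7F) == 0)

/* Count the ASCII bytes in a buffer, calling the macro correctly with a
 * dereferenced byte rather than with the pointer itself. */
static size_t count_ascii(const unsigned char *s, size_t len)
{
    size_t i, n = 0;

    assert(s != NULL);
    for (i = 0; i < len; i++) {
        if (IS_ASCII_BYTE(s[i]))     /* IS_ASCII_BYTE(s) would not compile */
            n++;
    }
    return n;
}
```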

commit e81cf5797705f445ffd76676cb49a034c56e88f1
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 16:23:39 2019 -0600

    t/TEST: Test most of CPAN on EBCDIC
    
    CPAN was mostly skipped before because so many distros raised errors,
    but that is no longer true, so just skip about 10 that have big
    problems, and test the rest

commit acc6895ab412287d0440ebf2747655f6d30490c1
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 16:19:49 2019 -0600

    lib/charnames.t: Fix Named Sequence test for EBCDIC
    
    The file from Unicode needs to be translated to native

commit 8534deafabafbbab7dbaf76afdea645483c4bc56
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 2 16:13:31 2019 -0600

    mktables: Fix Named Sequences for EBCDIC
    
    This table wasn't being translated into native code points

commit 8225f4a0c60b41fdb1bd4617c0f0e385afef8a2a
Author: Karl Williamson <[email protected]>
Date:   Wed Jun 26 13:02:35 2019 -0600

    XXX Configure

commit 4ae00defd5122bfe845f6071a97d7adddfcdc95c
Author: Karl Williamson <[email protected]>
Date:   Fri Aug 30 10:31:51 2019 -0600

    ebcdic bridge alphas

-----------------------------------------------------------------------

-- 
Perl5 Master Repository
