In perl.git, the branch smoke-me/khw-anyof has been created

<https://perl5.git.perl.org/perl.git/commitdiff/abce1e48ab079baffb02ae6c44a73dd56ca00b7d?hp=0000000000000000000000000000000000000000>

        at  abce1e48ab079baffb02ae6c44a73dd56ca00b7d (commit)

- Log -----------------------------------------------------------------
commit abce1e48ab079baffb02ae6c44a73dd56ca00b7d
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Dec 6 17:05:50 2018 -0700

    f

commit 0883c204eb7c5f979db4ef57ecc3502317d8bb29
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Dec 6 17:05:20 2018 -0700

    f

commit 4d2b534b9bdf5352eb504a7301cc0a14fb0874b0
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Dec 6 16:57:17 2018 -0700

    regen/mk_invlists.pl: Add new table
    
    This table contains all the code points that are in any multi-character
    fold (not the folded-from character, but what that character folds to).
    
    It will be used in a future commit.
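regen/mk_invlists.pl generates inversion lists. An inversion list stores a set of code points as a sorted array of boundaries at which membership flips. Below is a minimal C sketch of a lookup in such a list. It uses a toy list ('f' through 'i', 's' through 't'), not the contents of the new table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if cp is in the set described by the inversion list.
 * Entries at even indices start a range that is in the set; entries at
 * odd indices start a range that is not. */
static bool invlist_contains(const uint32_t *list, size_t len, uint32_t cp)
{
    assert(list != NULL || len == 0);

    /* Binary search for the number of entries <= cp. */
    size_t lo = 0, hi = len;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo is now the count of entries <= cp; odd means "inside a range". */
    return (lo % 2) == 1;
}

/* Hypothetical toy list: 'f'..'i' and 's'..'t' (end bounds exclusive).
 * These are not the real multi-character-fold code points. */
static const uint32_t example_list[] = { 0x66, 0x6A, 0x73, 0x75 };
```

Binary search keeps each lookup logarithmic in the number of ranges, however many code points those ranges cover.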

commit f1b6826efbd73bffc5c521ceb595c1da4cbf13b2
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Dec 6 16:53:23 2018 -0700

    regen/mk_invlists.pl: Rmv no longer used array

commit 2530835317281049140c0ee7cdcde6e3312117d5
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Nov 26 20:16:09 2018 -0700

    XXX need to do process; figure name Configure Fix alignment needed probe

commit b2e4f00ee2ccc0db0570a9d3b89c042be4bcb7ae
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Dec 2 13:53:20 2018 -0700

    regcomp.c: Remove no longer used static function

commit deff9afede8e5d9f0cd20482df155856b5511224
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Dec 6 13:25:13 2018 -0700

    f later

commit f3f885228835ec8cebfb781cdc0d62ff869a4f97
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Dec 6 09:28:42 2018 -0700

    change engine size

commit 760f2beeb0d46964d30041c62e5426d899612f55
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Nov 28 08:50:06 2018 -0700

    XXX tests: Revamp compile optimizations of /[bar]/

commit d37cbc3506fc7249c9cdb92ed42d22c0008aa187
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Nov 28 08:40:29 2018 -0700

    regcomp.c: White-space, comments only

commit 364ecd24b6149eb96ed12bd6efb5e1addb1bc55b
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Nov 28 08:13:31 2018 -0700

    regcomp.c: Add variable that is an OR of several
    
    This makes the code easier to read, as it summarizes the purposes of the
    three

commit 3be5bee79248f9c0932eba14c472dbaf9f7e24a0
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Nov 27 12:15:56 2018 -0700

    regcomp.c: White space only
    
    Indent the code that the previous commit placed inside a new outer loop.

commit 45e229cc58918a54fce7056693af52816b608b9a
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Nov 27 11:59:03 2018 -0700

    regcomp.c: Refactor looking for POSIX optimizations
    
    Instead of repeating slightly modified copies of the code, this uses a
    loop.  This is in preparation for a future commit where a third instance
    would have been required.
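The following hypothetical sketch illustrates the loop-instead-of-copies pattern. It tries a table of candidate classes in order and reports the first one whose membership over ASCII equals a given bracketed class. The `<ctype.h>` tests stand in for regcomp.c's real POSIX checks, which this sketch does not reproduce.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*posix_test)(int);

/* Returns the index of the first candidate whose membership over the
 * ASCII range exactly equals `set`, or -1 if none does.  One loop over
 * the table replaces n near-identical copies of the comparison code. */
static int find_equivalent_class(const bool set[128],
                                 const posix_test *candidates, size_t n)
{
    assert(candidates != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        bool same = true;
        for (int c = 0; c < 128 && same; c++)
            same = (candidates[i](c) != 0) == set[c];
        if (same)
            return (int) i;
    }
    return -1;
}
```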

commit 1ba2862a7eb424fe835e70d2da7dcb868dfaf012
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Nov 27 11:20:56 2018 -0700

    regcomp.c: Rename a variable
    
    The new name more accurately expresses the usage, as what gets generated
    may not actually be an ANYOFD.

commit 48cf5a25d86bd5d146782c4e61519e8df1633d1c
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Nov 27 11:12:15 2018 -0700

    regcomp.c: Consolidate common code
    
    These flags can be set in one place, rather than in multiple ones.

commit e2d735e014bfa07bf4852e3284f88400b644aaa6
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Nov 27 11:05:34 2018 -0700

    regcomp.c: Simplify ANYOFM node generation
    
    This refactors the code somewhat.  When we discover a deal-breaker code
    point we can just break out of the loop (using a goto) instead of
    setting a flag, continuing, and later testing it.
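Here is a minimal sketch of the two control-flow shapes side by side. `is_deal_breaker()` and its rule (reject any non-ASCII byte) are hypothetical stand-ins for whatever disqualifies an ANYOFM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: a byte the optimization cannot handle. */
static bool is_deal_breaker(unsigned char c) { return c >= 0x80; }

/* Flag style: keep looping, then test the flag afterwards. */
static bool can_optimize_flag(const unsigned char *s, size_t len)
{
    bool bad = false;
    for (size_t i = 0; i < len; i++) {
        if (is_deal_breaker(s[i]))
            bad = true;              /* keeps scanning needlessly */
    }
    return !bad;
}

/* goto style: leave the loop as soon as a deal-breaker is seen. */
static bool can_optimize_goto(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (is_deal_breaker(s[i]))
            goto cant_optimize;
    }
    return true;

  cant_optimize:
    return false;
}
```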

commit e16fed43284022b984fa0f1f33f848f210ddd9e8
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Nov 27 10:51:46 2018 -0700

    regcomp.c: Don't zap larger scope variables
    
    It doesn't matter currently, but it's best to declare limited-scope
    variables for limited-scope work, rather than reusing the more global
    variable.  Someday that variable might be wanted later, outside the
    block that zapped it, and its changed value would be a surprise.
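The following small illustration uses hypothetical names. The inner block gets its own loop variable, so the function-wide `i` still holds its value when it is returned.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the length of the leading run of uppercase ASCII letters and
 * stores the number of ASCII digits in *n_digits. */
static size_t scan(const char *s, size_t len, size_t *n_digits)
{
    assert(n_digits != NULL);

    size_t i = 0;                       /* wider-scope variable */
    while (i < len && s[i] >= 'A' && s[i] <= 'Z')
        i++;

    {
        /* Limited-scope work uses its own loop variable; reusing `i`
         * here would silently destroy the value returned below. */
        size_t digits = 0;
        for (size_t j = 0; j < len; j++)
            if (s[j] >= '0' && s[j] <= '9')
                digits++;
        *n_digits = digits;
    }

    return i;
}
```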

commit 9316900986eda67f57f831e6bf72e5d88919044d
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Nov 17 12:45:24 2018 -0700

    Remove ASCII/NASCII regnodes
    
    The ANYOFM/NANYOFM regnodes are generalizations of these.  They have
    more masks and shifts than the removed nodes, but not more branches, so
    they are effectively the same speed.  Remove the ASCII/NASCII nodes in
    favor of having less code to maintain.

commit 9132356d52e6368d0d3597429cb629edf0bb9813
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Nov 20 22:22:56 2018 -0700

    regcomp.c: Prefer ANYOFM/NANYOFM regnodes
    
    These two regnodes are faster than regular /[[:posix:]]/ ones, and some
    of the latter are equivalent to some of the former.  So try the faster
    optimizations first.
    
    This commit just swaps the two blocks of code, and outdents
    appropriately.

commit 09f7773dc48ca02dfac98669155f4eb36e6d8874
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Dec 4 09:58:13 2018 -0700

    regcomp.c: Allow more EXACTFish nodes to be trieable
    
    The previous two commits fixed bugs where it would be possible during
    optimization to join two EXACTFish nodes together, and the result would
    not work properly with LATIN SMALL LETTER SHARP S.  But in doing so,
    the commits prevented all non-UTF-8 EXACTFU nodes that begin or end with
    [Ss] from being trieable.
    
    This commit changes things so that the only non-trieable ones are those
    that, when joined, contain the sequence [Ss][Ss].  To do so, I created
    three new node types that indicate whether the node begins with [Ss],
    ends with it, or both.  These avoid having to examine the node contents
    at joining time to determine this.  And since there are plenty of node
    types available, this seemed the best choice.  But other options would
    be available should we run out of nodes.  Examining the first and final
    characters of a node is not expensive, for example.
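This sketch shows why begin/end information per node is enough. It uses a hypothetical bit-flag encoding in place of the three new node types. The flags are computed once when a node is built, and the join test never looks at node contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits recording where an EXACTFish node has [Ss]. */
enum {
    S_AT_START = 1 << 0,
    S_AT_END   = 1 << 1
};

static bool is_s(char c) { return c == 's' || c == 'S'; }

/* Computed once, when the node is built. */
static unsigned node_s_flags(const char *s, size_t len)
{
    unsigned flags = 0;
    if (len > 0 && is_s(s[0]))
        flags |= S_AT_START;
    if (len > 0 && is_s(s[len - 1]))
        flags |= S_AT_END;
    return flags;
}

/* At join time only the flags are consulted: joining creates a new
 * [Ss][Ss] exactly when the first node ends with [Ss] and the second
 * begins with [Ss]. */
static bool join_creates_ss(unsigned first_flags, unsigned second_flags)
{
    return (first_flags & S_AT_END) && (second_flags & S_AT_START);
}
```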

commit a11df35ea2a3530611a9ddaa088ab26e56015716
Author: Karl Williamson <k...@cpan.org>
Date:   Fri Nov 30 09:31:46 2018 -0700

    regcomp.c: Make sure /di nodes beginning in 's' are EXACTF
    
    This is defensive coding.  The previous commit changed things so under
    /di a node ending in [Ss] doesn't get made an EXACTFU.  This commit does
    the same for nodes that begin with [Ss].  This isn't actually necessary,
    as two EXACTFU nodes in a row are needed for the problem to occur, and
    the previous commit appears to remove the possibility of the first node
    being an EXACTFU.  But I'm leery of relying on this.  So this commit
    makes sure that a node beginning with 'S' or 's' under /di remains
    EXACTF.

commit d55a68e43bd28f297c456c915a73d6cbc9caa28e
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Nov 29 20:43:32 2018 -0700

    regcomp.c: Make sure /di nodes ending in 's' are EXACTF
    
    Prior to this commit, only nodes that were completely filled were
    guaranteed not to be upgraded to EXACTFU.
    
    EXACTF nodes are used when /u rules aren't to be in effect unless the
    string being matched against is in UTF-8.  EXACTFU nodes are used when
    the /u rules are always to be used.  The regex compiler keeps track of
    what's in an EXACTFish node, and if possible uses EXACTFU even under /d.
    It does this because EXACTFU nodes are trieable, and are faster at
    runtime: they don't have to check the UTF-8ness of the target string,
    and the pattern is folded at compile time, avoiding that step at
    runtime.  If what a node matches is precisely the same under /d and /u,
    whether the node is EXACTF or EXACTFU is irrelevant, so it is changed to
    the more desirable EXACTFU.
    
    The sequences 'ss', 'SS', 'Ss', 'sS' are very tricky, for several
    reasons.  For this commit, the trickiness lies in that they are the
    only sequences that can match a target string differently under /ui
    rules than under /di, with neither the pattern nor the target encoded in
    UTF-8.  In all other cases, one or the other must be UTF-8 for there to
    be a difference.
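Here is a toy C matcher that shows the /di versus /ui difference for a non-UTF-8 target. It handles only the case-insensitive pattern 'ss' and is not how the regex engine is structured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy check: does the case-insensitive pattern "ss" match the whole
 * non-UTF-8 (Latin-1) target?  Only this one pattern is handled. */
static bool ss_matches(const unsigned char *t, size_t len, bool unicode_rules)
{
    assert(t != NULL || len == 0);

    /* Any of "ss", "sS", "Ss", "SS" matches under both /di and /ui. */
    if (len == 2 && (t[0] == 's' || t[0] == 'S')
                 && (t[1] == 's' || t[1] == 'S'))
        return true;

    /* LATIN SMALL LETTER SHARP S is 0xDF in Latin-1.  It folds to "ss"
     * only under /u rules; under /d rules a non-UTF-8 target gets no
     * Unicode folding, so it does not match. */
    if (len == 1 && t[0] == 0xDF)
        return unicode_rules;

    return false;
}
```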
    
    The code has long taken special care for these sequences, but overlooked
    two fairly obscure cases where it matters.  This commit addresses one of
    those cases; the next commit, the other.
    
    Because these are sequences, one could be split across two EXACTFish
    nodes, so that the first 's' or 'S' is the last thing in the first
    node, and the second 's' or 'S' is the first thing in the second.
    Should these nodes get joined during optimization, the sequence becomes
    obvious.  The code has long recognized this possibility when the first
    node fills up, necessitating a split, and it doesn't make the first node
    EXACTFU if it ends in 's' or 'S'.  But we don't fill those nodes
    completely, and optimization can join two together.  (Packing them fully
    would require some extra work; it is possible, but has never been done.
    They are more fully packed than they used to be.)
    
    But future commits in the pipeline will join nodes in more cases during
    optimization, so we must never create an EXACTFU for a node with a
    trailing 's' or 'S', not just when the first node fills up.  This commit
    moves the code that accomplishes this so that it always gets executed at
    the end of a /di node.
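Below is a minimal sketch of the end-of-node decision this commit describes. The enum names are illustrative. The `same_under_d_and_u` input is assumed to be computed elsewhere while the node is filled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum node_type { EXACTF_NODE, EXACTFU_NODE };   /* illustrative names */

/* Decides the type of a finished /di node. */
static enum node_type finish_di_node(const char *s, size_t len,
                                     bool same_under_d_and_u)
{
    assert(s != NULL || len == 0);
    bool ends_in_s = len > 0 && (s[len - 1] == 's' || s[len - 1] == 'S');

    /* Applied at every node end, not only when the node filled up: a
     * later optimization might join this node to one beginning with [Ss]. */
    if (same_under_d_and_u && !ends_in_s)
        return EXACTFU_NODE;
    return EXACTF_NODE;
}
```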

commit d408e6259242a82865faaa3dbde383183dda6be8
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Dec 2 12:27:46 2018 -0700

    regcomp.c: Add assertion

commit 9be035a8b11b2250cb6604216761e74bf3b6221e
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Dec 2 17:39:05 2018 -0700

    regcomp.c: Simplify a bit of code
    
    By using a macro with a slightly different API, we don't have to mess
    with the parse pointer.

commit 05a8b02e6855967844bcba239521c76df805b0f0
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Dec 2 12:38:27 2018 -0700

    Remove one use of static function
    
    Previous commits in 5.29 have removed all but two calls to this function,
    and the remaining ones take radically different paths in it, with very
    little common code.  It simplifies things if we expand each call to the
    code that gets evaluated.  This commit does one; the next commit, the
    other.
    
    The need for an #ifdef is removed by adding a flag and setting it in an
    existing #ifdef.
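Here is the flag technique in miniature. `HAVE_EXAMPLE_FEATURE` and `classify()` are hypothetical. They only show how a test that once needed its own `#ifdef` becomes ordinary C.

```c
#include <assert.h>
#include <stdbool.h>

/* HAVE_EXAMPLE_FEATURE is a hypothetical configuration macro. */
static int classify(int c)
{
    bool feature_on = false;   /* flag set inside the existing #ifdef */

#ifdef HAVE_EXAMPLE_FEATURE
    /* ... setup that was already conditional ... */
    feature_on = true;
#endif

    /* This test would otherwise need an #ifdef of its own. */
    if (feature_on)
        return 2;
    return c > 127 ? 1 : 0;
}
```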

commit 408ca8acc70549ddeeee3384f5eb5c5d02dc195d
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Dec 2 13:05:47 2018 -0700

    regcomp.c: Use simpler variable name as long as possible
    
    This just extends the use of a variable name a little longer, as it's
    easier to read than the nested macro calls that eventually have to be
    used.

commit de083f8cc63f528e8fa39427b19949cacce9f479
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Dec 2 12:22:39 2018 -0700

    regcomp.c: Prefer one of similarly named vars
    
    These two variables are similarly named, but have slightly different
    purposes.  Comment out one of them, and convert to using the other
    consistently.

commit 7f0f8d404d73e61160a688348f5ddffff2e9705a
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Dec 2 11:46:07 2018 -0700

    t/re/anyof.t: Remove duplicate test case

commit 99620b744afc9a04dd8efe25fc76f27db5b570f9
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Dec 2 11:17:26 2018 -0700

    Use consistent spelling in qr// dumping
    
    Under -Dr (or use re 'Debug') the compiled regex engine program is
    displayed.  I noticed that it used two different spellings for
    'infinity'.  This commit changes things so that only one is used, the
    one that has been in the field the longest.

commit 5ed704648680590f57b5e6278a2750427c901ce4
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Dec 3 11:33:49 2018 -0700

    regcomp.c: Can join certain EXACTish node types
    
    The optimization phase of regular expression pattern compilation looks
    for adjacent EXACTish nodes and joins them if they are the same flavor
    of EXACT.  Commits a9f8c7ac75c364c3e05305718f38c5f8ccd935d8 and
    f6b4b99d2e584fbcd85eeed475eea10b87858e54 introduced two new nodes
    that are so close to existing flavors that they are joinable with their
    respective flavor.  This commit does that.
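This sketch shows joining nodes of compatible flavors. The flavor names are placeholders for the two new node types and their existing counterparts, which are not named here. The sketch keeps the first node's flavor after a join. The commit message does not say which type the real joined node gets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Placeholder flavors: FLAVOR_A_VARIANT stands for a newer node type
 * close enough to FLAVOR_A to be joined with it. */
enum flavor { FLAVOR_A, FLAVOR_A_VARIANT, FLAVOR_B };

struct exact_node {
    enum flavor flavor;
    char        text[64];
    size_t      len;
};

/* Maps each flavor to the flavor it can be joined with. */
static enum flavor join_class(enum flavor f)
{
    return f == FLAVOR_A_VARIANT ? FLAVOR_A : f;
}

/* Appends b's text to a if the flavors are compatible and the text fits.
 * a keeps its own flavor in this sketch. */
static bool try_join(struct exact_node *a, const struct exact_node *b)
{
    assert(a != NULL && b != NULL);
    if (join_class(a->flavor) != join_class(b->flavor))
        return false;
    if (a->len + b->len > sizeof a->text)
        return false;
    memcpy(a->text + a->len, b->text, b->len);
    a->len += b->len;
    return true;
}
```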

commit 22bd677ff47c7a032784728e90dd41b283a22d9f
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Dec 2 17:45:06 2018 -0700

    regcomp.c: Move clause of while() conditional into loop
    
    This is in preparation for making the conditional more complicated than
    can be easily done in the condition.

commit 3932608dbb1b96ec4ded560508a23bb714e8101a
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Dec 4 18:22:05 2018 -0700

    regcomp.c: Clarify comment

commit 7b102b2b79dbdf17c0bc18e8206ed2b3514191a9
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Dec 26 18:09:08 2017 -0700

    XXX don't push, khw customization for bench.pl

-----------------------------------------------------------------------

-- 
Perl5 Master Repository
