In perl.git, the branch smoke-me/khw-masked has been created

<https://perl5.git.perl.org/perl.git/commitdiff/b8d3d6be94068ca0adf73db3e909ab2d6c1ca694?hp=0000000000000000000000000000000000000000>

        at  b8d3d6be94068ca0adf73db3e909ab2d6c1ca694 (commit)

- Log -----------------------------------------------------------------
commit b8d3d6be94068ca0adf73db3e909ab2d6c1ca694
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Feb 4 19:43:00 2018 -0700

    regcomp.c: Under /i, segregate folding vs non-folding characters
    
    For matching sequences of characters, the regex compiler generates
    various flavors of EXACT-type nodes.  The optimizer uses those nodes to
    look for sequences that must be in the matched string.  In this way, the
    pattern matching engine may be able to quickly rule out any possible
    match altogether, or to narrow down the places in the target string that
    might match.
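
    The following is a minimal C sketch of the underlying idea, for
    illustration only.  If the optimizer knows a literal that every match
    must contain, one substring search can either rule the target out or
    say where to start looking.  The function find_required_literal is
    hypothetical; Perl's real optimizer is considerably more elaborate.

```c
#include <string.h>

/* Hypothetical prefilter (not Perl's code): return the offset at which
 * the required literal first occurs in the target, or -1 if it never
 * occurs, in which case the full pattern cannot match at all. */
static long find_required_literal(const char *target, const char *literal)
{
    const char *hit = strstr(target, literal);
    return hit ? (long)(hit - target) : -1;
}
```

    For example, a pattern that must contain the literal ", " cannot match
    a target that has no comma followed by a space.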
    
    Under /i matching, this generally has not been possible, because there
    is no fixed string that the optimizer can grab onto: a character may
    match, say, either 'A' or 'a'.  However, many patterns that contain
    text also contain characters that are fixed even under /i, such as
    tabs, spaces, and punctuation.
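
    To make the distinction concrete, here is an ASCII-only C sketch (an
    assumption for illustration, not Perl's code) that classifies a
    character by whether it has a case variant.  The helper name
    folds_under_i is hypothetical, and real /i matching uses full Unicode
    case folding, which reaches further than this test does.

```c
#include <ctype.h>
#include <stdbool.h>

/* ASCII-only approximation: a character "folds" under /i if it has a
 * different-case counterpart.  In the default "C" locale only letters
 * qualify; tabs, spaces, digits, and punctuation match only themselves.
 * Unicode folding is wider: for example, 'k' also matches U+212A KELVIN
 * SIGN, and 's' also matches U+017F LATIN SMALL LETTER LONG S. */
static bool folds_under_i(unsigned char c)
{
    return tolower(c) != toupper(c);
}
```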
    
    This commit segregates such folding vs non-folding characters into
    separate nodes.  I proposed this 7 months ago:
    
    http://nntp.perl.org/group/perl.perl5.porters/245342
    
    and, after talking with Yves recently, I decided to go ahead with it.
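
    The segregation can be pictured as splitting a literal into maximal
    runs of folding and non-folding characters.  This is a hypothetical C
    illustration under the same ASCII-only assumption; the names
    segment_literal, SEG_FIXED, and SEG_FOLD are invented here and do not
    come from regcomp.c.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical segment kinds for this sketch only. */
enum seg_kind { SEG_FIXED, SEG_FOLD };

struct segment {
    enum seg_kind kind;
    size_t start;   /* offset of the run within the literal */
    size_t len;     /* number of bytes in the run */
};

/* ASCII-only: letters fold under /i; everything else is fixed. */
static enum seg_kind kind_of(char c)
{
    unsigned char u = (unsigned char)c;
    return tolower(u) != toupper(u) ? SEG_FOLD : SEG_FIXED;
}

/* Split `lit` into maximal runs of folding vs non-folding characters.
 * Writes at most `max` segments into `out` and returns the total number
 * of runs found (which may exceed `max`). */
static size_t segment_literal(const char *lit, struct segment *out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; lit[i] != '\0'; ) {
        enum seg_kind k = kind_of(lit[i]);
        size_t start = i;
        while (lit[i] != '\0' && kind_of(lit[i]) == k)
            i++;                       /* extend the run while kind matches */
        if (n < max)
            out[n] = (struct segment){ k, start, i - start };
        n++;
    }
    return n;
}
```

    Under this sketch, the text of /foo, bar/i splits into "foo", ", ",
    and "bar", and the middle piece is a fixed string the optimizer could
    use.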
    
    In the July proposal, I suggested using a new node type to mark those
    nodes that are under /i but contain only characters that match nothing
    other than themselves.  These nodes could be joined with either a plain
    EXACT node or an EXACTFish one to create longer nodes.
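
    The joining rules just described might be expressed as follows.  This
    C sketch is hypothetical: N_EXACT, N_FOLD_FIXED, and N_FOLD stand in
    for plain EXACT nodes, the proposed fixed-under-/i nodes, and EXACTFish
    nodes respectively, and are not Perl's actual regnode types.  Plain
    EXACT text is kept apart from folding text, because merging the two
    would make the EXACT part case-insensitive.

```c
#include <stdbool.h>

/* Hypothetical node kinds for this sketch only. */
enum node_kind {
    N_EXACT,       /* not under /i: compared byte for byte               */
    N_FOLD_FIXED,  /* under /i, but every character matches only itself  */
    N_FOLD         /* under /i, with characters that have case variants  */
};

/* Try to join adjacent nodes a and b.  Returns false if they must stay
 * separate; otherwise stores the kind of the joined node in *out. */
static bool join_kind(enum node_kind a, enum node_kind b, enum node_kind *out)
{
    if (a == b) {
        *out = a;
        return true;
    }
    if (a == N_FOLD_FIXED || b == N_FOLD_FIXED) {
        /* Fixed-under-/i text matches the same way in either kind of
         * node, so the joined node simply takes the neighbour's kind. */
        *out = (a == N_FOLD_FIXED) ? b : a;
        return true;
    }
    return false;  /* N_EXACT next to N_FOLD: keep them apart */
}
```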
    
    The reason for joining is that the engine incurs overhead whenever it
    switches to the next node.  The reason for not joining is that /i nodes
    are slower to match than nodes that can be compared with a memEQ.
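
    The cost difference can be illustrated with two hypothetical C
    matchers.  A node whose characters match only themselves needs a
    single memcmp, which is essentially what Perl's memEQ macro expands to,
    while a folding node needs per-character work.  The folding matcher
    here is ASCII-only, an assumption made to keep the sketch short.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Fixed node: one block comparison of `len` bytes. */
static bool match_fixed(const char *s, const char *node, size_t len)
{
    return memcmp(s, node, len) == 0;
}

/* Folding node: compare lowercase forms one character at a time.
 * Real Unicode folding is costlier still, since one character can
 * fold to several (for example, U+00DF folds to "ss"). */
static bool match_fold_ascii(const char *s, const char *node, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (tolower((unsigned char)s[i]) != tolower((unsigned char)node[i]))
            return false;
    return true;
}
```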
    
    I suppose we could join short nodes, and leave longer ones separate, but
    that decision can be deferred based on real-world experience.
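
    One way the "join short nodes" idea could look is sketched below as a
    hypothetical C illustration.  The cutoff JOIN_MAX_FIXED_LEN and the
    names struct run and count_nodes are invented placeholders; the commit
    explicitly defers any real choice to experience.

```c
#include <stdbool.h>
#include <stddef.h>

#define JOIN_MAX_FIXED_LEN 4  /* hypothetical cutoff, not from the commit */

struct run {
    bool fixed;   /* true if every character matches only itself */
    size_t len;   /* length of the run in bytes */
};

/* Count the nodes left after absorbing every fixed run of at most
 * JOIN_MAX_FIXED_LEN characters into the text around it.  Longer fixed
 * runs stay as their own nodes, so they can still be compared with a
 * single memcmp. */
static size_t count_nodes(const struct run *runs, size_t n)
{
    size_t nodes = 0;
    bool in_joined_node = false;  /* currently extending a joined node? */
    for (size_t i = 0; i < n; i++) {
        bool keep_separate = runs[i].fixed && runs[i].len > JOIN_MAX_FIXED_LEN;
        if (keep_separate) {
            nodes++;              /* long fixed run gets its own node */
            in_joined_node = false;
        } else if (!in_joined_node) {
            nodes++;              /* start a new joined node */
            in_joined_node = true;
        }                         /* else: absorbed into the current node */
    }
    return nodes;
}
```

    With this cutoff, the ", " in /foo, bar/i would be absorbed into a
    single node, while a long run of punctuation would remain separate.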
    
    This patch also consolidates the handling of the Latin Sharp S into one
    place, to avoid extra #ifdefs and to make the logic read linearly.

-----------------------------------------------------------------------

-- 
Perl5 Master Repository
