In perl.git, the branch smoke-me/khw-foldbug has been created

<http://perl5.git.perl.org/perl.git/commitdiff/2ab171cf81a5b3a36eeb116587a7b6d28135f0f7?hp=0000000000000000000000000000000000000000>

        at  2ab171cf81a5b3a36eeb116587a7b6d28135f0f7 (commit)

- Log -----------------------------------------------------------------
commit 2ab171cf81a5b3a36eeb116587a7b6d28135f0f7
Author: Karl Williamson <[email protected]>
Date:   Tue Oct 16 12:09:04 2012 -0600

    regex: \R can match either 1 or 2 chars
    
    Therefore it is not "simple", and should not be compiled as such;
    doing so caused things like the test added herein to fail.

M       regcomp.c
M       regexec.c
M       t/re/re_tests
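
Editorial sketch, not part of the commit: the point that \R can consume either one or two characters can be illustrated with a small helper. The function name `linebreak_len` and its signature are hypothetical, and the sketch assumes a Latin-1 byte string, so the multi-byte linebreaks U+2028 and U+2029 are out of scope.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from regexec.c: return how many
 * bytes a \R-style linebreak occupies at the start of s, or 0 if s
 * does not start with one.  Only single-byte linebreaks of a Latin-1
 * string are covered. */
size_t linebreak_len(const unsigned char *s, size_t len)
{
    assert(s != NULL);
    if (len == 0)
        return 0;
    if (s[0] == '\r' && len > 1 && s[1] == '\n')
        return 2;                   /* CR LF counts as one linebreak */
    if ((s[0] >= 0x0A && s[0] <= 0x0D) || s[0] == 0x85)
        return 1;                   /* LF, VT, FF, CR or NEL */
    return 0;
}
```

Because the result can be 1 or 2, a repeat loop that advances by a fixed width per match cannot be used for such a construct, which is why it does not qualify as "simple".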

commit 90a60a02304506845f078e60e55bc1d337161a79
Author: Karl Williamson <[email protected]>
Date:   Tue Oct 16 11:12:22 2012 -0600

    regcomp.c: Pass NULL instead of &dummy to function
    
    This saves the function from setting a throw-away value

M       regcomp.c

commit 5ae6aaffdfaa94b4ac1163cdd5082bd47d847569
Author: Karl Williamson <[email protected]>
Date:   Tue Oct 16 11:11:11 2012 -0600

    regcomp.c, regexec.c: Comments only; no code changes

M       regcomp.c
M       regexec.c

commit edc7efa02154c13b6f9fa4577666d22a64bd52ef
Author: Karl Williamson <[email protected]>
Date:   Tue Oct 16 11:09:52 2012 -0600

    regexec.c: White-space only; no code changes
    
    This indents a newly-formed block correctly

M       regexec.c

commit 6d51eb4434dcd640fadef635e7855bf3ac4ce48c
Author: Karl Williamson <[email protected]>
Date:   Tue Oct 16 10:56:28 2012 -0600

    regexec.c: Tighten loops in regrepeat()
    
    regrepeat() is used to match some simple thing repeatedly in a row.  In
    the case of EXACTFish nodes, it will repeat a single character (and its
    fold).  Prior to this commit, it was using the full generality of
    foldEQ_utf8() whenever the target was encoded in UTF-8.  This full
    generality requires quite a bit of processing.  However, most
    Unicode folds are of the simple variety containing just a character and
    its upper- or lower-cased equivalent, and so the full generality of
    foldEQ_utf8() is needed only comparatively infrequently.
    
    This commit takes advantage of the newly added and enhanced
    S_setup_EXACTISH_ST_c1_c2() to look at the character being repeated and
    decide what level of generality is needed.  regrepeat() then uses a loop
    that is only as complicated as needed.
    
    This also adds some asserts that the nodes contain exactly 1 character.

M       regexec.c
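
Editorial sketch, not part of the commit: the idea of choosing a loop that is only as complicated as the repeated character requires might look roughly like the following. The enum, the function name `repeat_count` and the single-byte simplification are all assumptions made for illustration; the real code works on regex nodes and UTF-8 input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of the repeated character; the names
 * are illustrative, not those used in regexec.c. */
enum fold_kind { FOLD_NONE, FOLD_SIMPLE, FOLD_FULL };

/* Count how many times the single byte c1 (or its simple fold c2)
 * repeats at the start of s, stopping after max matches. */
size_t repeat_count(const char *s, size_t len, size_t max,
                    enum fold_kind kind, char c1, char c2)
{
    size_t n = 0;

    assert(s != NULL);
    if (max > len)
        max = len;

    switch (kind) {
    case FOLD_NONE:                 /* only one byte can match */
        while (n < max && s[n] == c1)
            n++;
        break;
    case FOLD_SIMPLE:               /* exactly two alternatives */
        while (n < max && (s[n] == c1 || s[n] == c2))
            n++;
        break;
    case FOLD_FULL:
        /* A multi-character fold needs the general comparison
         * (foldEQ_utf8() in Perl); it is out of scope here. */
        assert(!"full folding is not handled in this sketch");
        break;
    }
    return n;
}
```

The classification is done once, before the loop, so the common cases pay only for one or two byte comparisons per iteration.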

commit af06cc7218e913f3db41fe417fa4adaa06c6a98d
Author: Karl Williamson <[email protected]>
Date:   Tue Oct 16 10:17:01 2012 -0600

    regexec: Do less work on quantified UTF-8
    
    Consider the regexes /A*B/ and /A*?B/ where A and B are arbitrary,
    except that B begins with an EXACTish node.  Prior to this patch, as a
    shortcut, the loop for accumulating A* would look for the first character
    of B to help it decide if B is a possibility for the next thing.  It did
    not test for all of B unless testing showed that the next thing could be
    the beginning of B.  If the target string was UTF-8, it converted each
    new sequence of bytes to the code point they represented, and then did
    the comparison.  This is a relatively expensive process.
    
    This commit avoids that conversion by just doing a memEQ at the current
    input position.  To do this, it revamps S_setup_EXACTISH_ST_c1_c2() to
    output the UTF-8 sequences to compare against.  The function also has
    been tightened up so that there are fewer false positives.

M       regexec.c
M       regexp.h
M       utf8.c
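
Editorial sketch, not part of the commit: comparing raw bytes against precomputed UTF-8 sequences, instead of decoding each character to a code point first, could be sketched as below. The function name `could_start_B` and its parameters are hypothetical, and standard `memcmp()` stands in for Perl's `memEQ` macro.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the bytes at s begin with either precomputed UTF-8
 * sequence, 0 otherwise.  c1_utf8 and c2_utf8 stand in for the
 * sequences a setup step would produce for the first character of B
 * and its fold; the names and layout are assumptions. */
int could_start_B(const char *s, size_t avail,
                  const char *c1_utf8, size_t c1_len,
                  const char *c2_utf8, size_t c2_len)
{
    assert(c1_len > 0 && c2_len > 0);
    if (avail >= c1_len && memcmp(s, c1_utf8, c1_len) == 0)
        return 1;
    if (avail >= c2_len && memcmp(s, c2_utf8, c2_len) == 0)
        return 1;
    return 0;
}
```

For example, with U+00E9 encoded as the bytes C3 A9 and its upper-case form U+00C9 as C3 89, the check accepts input that starts with either byte pair without ever computing a code point.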

commit 04694875bc1091186f030f7f269cd4f943f12ef6
Author: Karl Williamson <[email protected]>
Date:   Tue Oct 16 09:58:24 2012 -0600

    utf8.h: Add guard against recursive #include
    
    A future commit will #include this from another header

M       utf8.h

commit 938a71825d437fa4a30595cd32db6091ef3ffa1c
Author: Karl Williamson <[email protected]>
Date:   Tue Oct 16 10:45:44 2012 -0600

    regen/regcharclass.pl: Change name of generated macro
    
    This changes the macro isMULTI_CHAR_FOLD() (non-utf8 version) from just
    generating ascii-range code points to generating the full Latin1 range.
    However there are no such non-ASCII values, so the macro expansion is
    unchanged.  By changing the name, it becomes clearer in future commits
    that we aren't excluding things that we should be considering.

M       regcharclass.h
M       regcomp.c
M       regen/regcharclass.pl
M       regen/regcharclass_multi_char_folds.pl

commit 0621ef8b608392ff3d0cc05d26c835bb8d4a9e09
Author: Karl Williamson <[email protected]>
Date:   Tue Oct 9 13:34:08 2012 -0600

    regexec.c: Change variable name
    
    This actually is a pointer to the pattern string, not to a byte.

M       regexec.c

commit b52a2fb28f70476b51463c46e7a5c6554085332d
Author: Karl Williamson <[email protected]>
Date:   Tue Oct 9 13:32:12 2012 -0600

    regexp.h: Update comments
    
    These comments should have been changed in commit
    c74f6de970ef0f0eb8ba43b1840fde0cf5a45497, but were mistakenly omitted.

M       regexp.h
-----------------------------------------------------------------------

--
Perl5 Master Repository
