In perl.git, the branch smoke-me/khw-foldbug has been created
<http://perl5.git.perl.org/perl.git/commitdiff/2ab171cf81a5b3a36eeb116587a7b6d28135f0f7?hp=0000000000000000000000000000000000000000>
at 2ab171cf81a5b3a36eeb116587a7b6d28135f0f7 (commit)
- Log -----------------------------------------------------------------
commit 2ab171cf81a5b3a36eeb116587a7b6d28135f0f7
Author: Karl Williamson <[email protected]>
Date: Tue Oct 16 12:09:04 2012 -0600
regex: \R can match either 1 or 2 chars
Therefore it is not "simple" and should not be compiled as such.
Compiling it as simple caused failures such as the one the test added herein exercises.
M regcomp.c
M regexec.c
M t/re/re_tests
commit 90a60a02304506845f078e60e55bc1d337161a79
Author: Karl Williamson <[email protected]>
Date: Tue Oct 16 11:12:22 2012 -0600
regcomp.c: Pass NULL instead of &dummy to function
This saves the function from setting a throw-away value.
M regcomp.c
commit 5ae6aaffdfaa94b4ac1163cdd5082bd47d847569
Author: Karl Williamson <[email protected]>
Date: Tue Oct 16 11:11:11 2012 -0600
regcomp.c, regexec.c: Comments only; no code changes
M regcomp.c
M regexec.c
commit edc7efa02154c13b6f9fa4577666d22a64bd52ef
Author: Karl Williamson <[email protected]>
Date: Tue Oct 16 11:09:52 2012 -0600
regexec.c: White-space only; no code changes
This indents a newly-formed block correctly.
M regexec.c
commit 6d51eb4434dcd640fadef635e7855bf3ac4ce48c
Author: Karl Williamson <[email protected]>
Date: Tue Oct 16 10:56:28 2012 -0600
regexec.c: Tighten loops in regrepeat()
regrepeat() is used to match some simple thing repeatedly. In
the case of EXACTFish nodes, it will repeat a single character (and its
fold). Prior to this commit, it was using the full generality of
foldEQ_utf8() whenever the target was encoded in UTF-8. This full
generality requires quite a bit of processing. However, most
Unicode folds are of the simple variety containing just a character and
its upper- or lower-cased equivalent, and so the full generality of
foldEQ_utf8() is needed only comparatively infrequently.
This commit takes advantage of the newly added and enhanced
S_setup_EXACTISH_ST_c1_c2() to look at the character being repeated and
decide what level of generality is needed. regrepeat() then uses a loop
that is only as complicated as needed.
This also adds some asserts that the nodes contain exactly 1 character.
M regexec.c
commit af06cc7218e913f3db41fe417fa4adaa06c6a98d
Author: Karl Williamson <[email protected]>
Date: Tue Oct 16 10:17:01 2012 -0600
regexec: Do less work on quantified UTF-8
Consider the regexes /A*B/ and /A*?B/ where A and B are arbitrary,
except that B begins with an EXACTish node. Prior to this patch, as a
shortcut, the loop for accumulating A* would look for the first character
of B to help it decide if B is a possibility for the next thing. It did
not test for all of B unless testing showed that the next thing could be
the beginning of B. If the target string was UTF-8, it converted each
new sequence of bytes to the code point they represented, and then did
the comparison. This is a relatively expensive process.
This commit avoids that conversion by just doing a memEQ at the current
input position. To do this, it revamps S_setup_EXACTISH_ST_c1_c2() to
output the UTF-8 sequences to compare against. The function has also
been tightened up so that it produces fewer false positives.
M regexec.c
M regexp.h
M utf8.c
commit 04694875bc1091186f030f7f269cd4f943f12ef6
Author: Karl Williamson <[email protected]>
Date: Tue Oct 16 09:58:24 2012 -0600
utf8.h: Add guard against recursive #include
A future commit will #include this from another header.
M utf8.h
commit 938a71825d437fa4a30595cd32db6091ef3ffa1c
Author: Karl Williamson <[email protected]>
Date: Tue Oct 16 10:45:44 2012 -0600
regen/regcharclass.pl: Change name of generated macro
This changes the macro isMULTI_CHAR_FOLD() (non-utf8 version) from just
generating ASCII-range code points to generating the full Latin1 range.
However, there are no such non-ASCII values, so the macro expansion is
unchanged. By changing the name, it becomes clearer in future commits
that we aren't excluding things that we should be considering.
M regcharclass.h
M regcomp.c
M regen/regcharclass.pl
M regen/regcharclass_multi_char_folds.pl
commit 0621ef8b608392ff3d0cc05d26c835bb8d4a9e09
Author: Karl Williamson <[email protected]>
Date: Tue Oct 9 13:34:08 2012 -0600
regexec.c: Change variable name
This actually is a pointer to the pattern string, not to a byte.
M regexec.c
commit b52a2fb28f70476b51463c46e7a5c6554085332d
Author: Karl Williamson <[email protected]>
Date: Tue Oct 9 13:32:12 2012 -0600
regexp.h: Update comments
These comments should have been changed in commit
c74f6de970ef0f0eb8ba43b1840fde0cf5a45497, but were mistakenly omitted.
M regexp.h
-----------------------------------------------------------------------
--
Perl5 Master Repository