In perl.git, the branch smoke-me/khw-regex has been created
<http://perl5.git.perl.org/perl.git/commitdiff/fb484ce57776c883d329ed2d85e22b77e922290e?hp=0000000000000000000000000000000000000000>
at fb484ce57776c883d329ed2d85e22b77e922290e (commit)
- Log -----------------------------------------------------------------
commit fb484ce57776c883d329ed2d85e22b77e922290e
Author: Karl Williamson <[email protected]>
Date: Sat Oct 13 18:17:11 2012 -0600
regcomp.c: Don't set /i in start class unless /l
There is a deficiency in the optimizer: it doesn't clear some flags that
it should. One of these is whether to match under /i or not.
Currently it always (or nearly always; I'm not certain) assumes that it
should match under /i, which yields false positives and slows things down.
A recent commit changed the flag that requests this, so that it is now
set only if /l is also specified. There is already existing code to
work around the optimizer deficiency for /l. This commit moves the
/i flag handling into that existing code, so that it is invoked only when
/l is specified.
M regcomp.c
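The C sketch below illustrates why a stray run-time /i flag on the optimizer's start class causes false positives. It assumes a simplified start class made of a 256-bit byte bitmap plus two flags. It also assumes, as the commit message implies, that outside /l any folds are already resolved into the bitmap at compile time. All names (struct start_class, SC_FOLD, SC_LOCALE, sc_set_flags, sc_may_start) are hypothetical and are not regcomp.c's actual data structures.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical flags; NOT regcomp.c's real ANYOF flags. */
#define SC_FOLD   0x01  /* also test the case-folded byte at run time */
#define SC_LOCALE 0x02  /* folding depends on the run-time locale (/l) */

/* Simplified start class: which bytes may begin a match, plus flags. */
struct start_class {
    unsigned char flags;
    unsigned char bitmap[256 / 8];
};

static void sc_init(struct start_class *sc) {
    memset(sc, 0, sizeof *sc);
}

static void sc_add_byte(struct start_class *sc, unsigned char c) {
    sc->bitmap[c >> 3] |= (unsigned char)(1u << (c & 7));
}

static int sc_has_byte(const struct start_class *sc, unsigned char c) {
    return (sc->bitmap[c >> 3] >> (c & 7)) & 1;
}

/* The rule from the commit: request run-time folding only under /l.
   Without /l, folds are assumed to be in the bitmap already. */
static void sc_set_flags(struct start_class *sc, int is_fold, int is_locale) {
    if (!is_locale)
        return;
    sc->flags |= SC_LOCALE;
    if (is_fold)
        sc->flags |= SC_FOLD;
}

/* Could a match start at byte c?  With SC_FOLD set, a byte that is not
   in the bitmap can still pass via its other case; that is a false
   positive whenever the bitmap was built without /i. */
static int sc_may_start(const struct start_class *sc, unsigned char c) {
    assert(sc != NULL);
    if (sc_has_byte(sc, c))
        return 1;
    if (sc->flags & SC_FOLD)  /* tolower/toupper consult the C locale */
        return sc_has_byte(sc, (unsigned char)tolower(c))
            || sc_has_byte(sc, (unsigned char)toupper(c));
    return 0;
}
```

For example, if the bitmap holds only 'a' (contributed by a non-/i alternative), then calling sc_set_flags(&sc, 1, 0) leaves SC_FOLD clear, so 'A' is rejected. Calling sc_set_flags(&sc, 1, 1) sets SC_FOLD, so 'A' passes.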
commit 152f1dc3abb20ad2c5a5460274d66f34d57d3aa6
Author: Karl Williamson <[email protected]>
Date: Sat Oct 13 10:00:18 2012 -0600
regexp.t: Add 'no warnings "utf8"'
This .t works fine unless it has failures to output and the output
handle hasn't been opened with utf8. Because we aren't sure whether
opening it that way works here, just turn off the warnings.
M t/re/regexp.t
commit 9acf8664c559ac0278089e7ef5735f69dc83d6b9
Author: Karl Williamson <[email protected]>
Date: Sat Oct 13 09:52:42 2012 -0600
utf8.h: Correct some values for EBCDIC
It occurred to me that EBCDIC has different maximums for the number of
bytes a character can occupy. This moves the definition in utf8.h to
within an #ifndef EBCDIC, and adds the correct values to utfebcdic.h.
M utf8.h
M utfebcdic.h
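The C sketch below shows why the two encodings need different maximums. It counts the bytes needed for a Unicode code point (at most U+10FFFF) in standard UTF-8 and in UTF-EBCDIC. The UTF-EBCDIC boundaries assume the UTR #16 layout, in which continuation bytes carry 5 payload bits and 0x00-0x9F encode as a single byte. Perl's own macros also cover its above-Unicode extensions, so their actual values in utf8.h and utfebcdic.h differ from the 4 and 5 computed here. The function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Standard UTF-8: continuation bytes carry 6 payload bits. */
static size_t utf8_len(unsigned long cp) {
    assert(cp <= 0x10FFFFUL);
    if (cp < 0x80UL)    return 1;   /* 7 bits */
    if (cp < 0x800UL)   return 2;   /* 5 + 6 bits */
    if (cp < 0x10000UL) return 3;   /* 4 + 6 + 6 bits */
    return 4;                       /* 3 + 3*6 = 21 bits */
}

/* UTF-EBCDIC (assumed UTR #16 layout): continuation bytes carry only
   5 payload bits, so long sequences run one byte longer. */
static size_t utf_ebcdic_len(unsigned long cp) {
    assert(cp <= 0x10FFFFUL);
    if (cp < 0xA0UL)    return 1;   /* ASCII, C0 and C1 controls */
    if (cp < 0x400UL)   return 2;   /* 5 + 5 bits */
    if (cp < 0x4000UL)  return 3;   /* 4 + 5 + 5 bits */
    if (cp < 0x40000UL) return 4;   /* 3 + 3*5 = 18 bits */
    return 5;                       /* 2 + 4*5 = 22 bits */
}
```

U+10FFFF therefore needs 4 bytes in UTF-8 but 5 in UTF-EBCDIC. A single maximum shared by both platforms would undersize buffers on one of them.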
commit 01291d4b6228f961f316413776bf5e3b2771d0a3
Author: Karl Williamson <[email protected]>
Date: Sat Oct 13 09:20:11 2012 -0600
regex: White-space, comment only; no code changes
This outdents code whose containing block was just removed, reflows
its comments to fill 79 columns, makes some other white-space
adjustments, and fixes a typo in a comment.
M regcomp.c
M regexec.c
M sv.c
commit 75622c5754c9604aa4015d822eb20cfcde91e244
Author: Karl Williamson <[email protected]>
Date: Sat Oct 13 09:15:37 2012 -0600
regex: Rename macro to reflect its narrowed use
This macro is now used only under locale; its other use has been
removed. Rename it to reflect its only remaining use.
M regcomp.c
M regcomp.h
M regexec.c
commit 2c4c6afadc3a5098363b1f1f3e68c15374d662b3
Author: Karl Williamson <[email protected]>
Date: Sat Oct 13 09:07:05 2012 -0600
regex: Splice out no longer used array element
A recent commit removed all uses of an array element in the middle of an
array. This moves up the elements that followed it.
M regcomp.c
M regexec.c
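The log doesn't show which array the commit above changes or what its element type is. The C sketch below illustrates the general operation on a plain int array: remove an element from the middle and move the later elements up one slot, using memmove because the source and destination ranges overlap. The function name remove_at is hypothetical. In the commit the change is made in the source, so any code that indexes the moved elements must also use their new, smaller indices.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Remove a[idx] from an array holding *n elements: shift every later
   element down by one slot and decrement *n. */
static void remove_at(int *a, size_t *n, size_t idx) {
    assert(idx < *n);
    /* Overlapping ranges, so memmove rather than memcpy. */
    memmove(&a[idx], &a[idx + 1], (*n - idx - 1) * sizeof a[0]);
    (*n)--;
}
```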
commit ae5247c09a68157ae08d338c41cd02c3b3d38d5d
Author: Karl Williamson <[email protected]>
Date: Sat Oct 13 08:49:26 2012 -0600
regex: Remove old code that tried to handle multi-char folds
A recent commit has changed the algorithm used to handle multi-character
folding in bracketed character classes. The old code is no longer
needed.
M embed.fnc
M embed.h
M proto.h
M regcomp.c
M regcomp.sym
M regexec.c
M regnodes.h
commit a65fd577e041093dd8e91d91c51f78b93402d85a
Author: Karl Williamson <[email protected]>
Date: Fri Oct 12 11:42:38 2012 -0600
regcomp.c: Fix up indentation; no code changes
Indent a newly formed block.
M regcomp.c
commit 03222e4baeb60fcc48d3d5519fc412d6ca319d3a
Author: Karl Williamson <[email protected]>
Date: Thu Oct 11 21:49:31 2012 -0600
PATCH: [perl #89774] multi-char fold + its fold in char class
The design for handling characters that fold to multiple characters,
when they appear in a bracketed character class, is defective.
The ticket reads, "If a bracketed character class includes a character
that has a multi-char fold, and it also includes the first character of
that fold, the multi-char fold will never be matched; just the first
character of the fold." Thus, in the class /[\0-\xff]/i, the fold of
\xDF, 'ss', will never be matched, because the first character of that
fold, 's', is also in the class.
The reason the design is defective is that it doesn't allow for
backtracking and trying the other options.
This commit solves this by effectively rewriting the above as
/ (?: \xdf | [\0-\xde\xe0-\xff] ) /xi, so the regex engine handles the
backtracking automatically.
M embedvar.h
M intrpvar.h
M pod/perldelta.pod
M pod/perlre.pod
M pod/perlrecharclass.pod
M regcomp.c
M sv.c
M t/re/re_tests
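The toy C matcher below contrasts the two designs on the anchored pattern /^[\0-\xff]$/i. It models only the \xDF to "ss" fold, treats subjects as Latin-1 byte strings with an explicit length, and uses illustrative function names. It is not Perl's regex engine. The old design lets the class consume a single character and offers no alternative, so "ss" fails at the $ anchor. The rewritten form (?: \xdf | [\0-\xde\xe0-\xff] ) tries each branch in turn and falls through to the next one when the rest of the pattern fails.

```c
#include <assert.h>
#include <stddef.h>

/* Case-insensitive test for 's' (ASCII only, enough for this toy). */
static int is_s(unsigned char c) { return c == 's' || c == 'S'; }

/* The rest of the pattern after the class is just the $ anchor. */
static int rest_matches(size_t pos, size_t len) { return pos == len; }

/* Old design: every byte is in [\0-\xff], so the class always consumes
   exactly one character; there is nothing to backtrack into. */
static int old_match(const unsigned char *s, size_t len) {
    (void)s;
    if (len < 1)
        return 0;
    return rest_matches(1, len);
}

/* New design: (?: \xDF | [\0-\xDE\xE0-\xFF] ) under /i, with each
   branch abandoned (backtracked) if $ fails after it. */
static int new_match(const unsigned char *s, size_t len) {
    assert(s != NULL || len == 0);
    /* Branch 1a: \xDF matching its multi-char fold "ss". */
    if (len >= 2 && is_s(s[0]) && is_s(s[1]) && rest_matches(2, len))
        return 1;
    /* Branch 1b: \xDF matching itself. */
    if (len >= 1 && s[0] == 0xDF && rest_matches(1, len))
        return 1;
    /* Branch 2: one byte from the rest of the class. */
    if (len >= 1 && s[0] != 0xDF && rest_matches(1, len))
        return 1;
    return 0;
}
```

With "ss" as the subject, old_match fails while new_match succeeds through branch 1a. With "s", branch 1a is skipped and branch 2 matches, so ordinary single-character matches are unaffected.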
commit 7f62429d27ea2645a9d3f340a322da39bf200309
Author: Karl Williamson <[email protected]>
Date: Fri Oct 12 11:24:34 2012 -0600
regen/mk_invlists.pl: Make list for multi-fold chars
This causes charclass_invlists.h to have a new list of all the
characters whose fold is a sequence of more than one character.
M charclass_invlists.h
M regen/mk_invlists.pl
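Perl stores such character sets as inversion lists. The C sketch below assumes the usual formulation: a sorted array of code points at which membership flips, so a code point is in the set if and only if the number of entries less than or equal to it is odd. The log doesn't show the exact layout of the generated charclass_invlists.h, which may carry extra header fields. The array here is an illustrative subset only, not the generated list. It holds U+00DF (folds to "ss") and U+0130 (folds to "i" followed by U+0307). The names invlist_contains and example_multi_fold are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Is cp in the set described by the inversion list?  Entries alternate
   between "start of an in-set range" and "first code point after it". */
static int invlist_contains(const unsigned long *list, size_t len,
                            unsigned long cp) {
    size_t lo = 0, hi = len;
    assert(list != NULL || len == 0);
    /* Binary search: afterwards lo == number of entries <= cp. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo & 1) != 0;  /* odd count means inside a range */
}

/* Illustrative subset, NOT the real list: {U+00DF} and {U+0130}. */
static const unsigned long example_multi_fold[] = { 0xDF, 0xE0, 0x130, 0x131 };
```

The lookup is a binary search, so its cost grows with the logarithm of the number of ranges rather than with the number of characters in the set.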
commit b6546165754863fd8eb3bd2363c69047fd24e059
Author: Karl Williamson <[email protected]>
Date: Fri Oct 12 09:10:10 2012 -0600
mktables: Add table for chars with multi-char fold
This will be used in a later commit.
M lib/unicore/mktables
commit ebefcf635ddccaae8224cd44688daefae51165a0
Author: Karl Williamson <[email protected]>
Date: Sat Oct 13 08:31:29 2012 -0600
regcomp.c: Rename a macro, fix-up comments
This very recently introduced macro's name could be clearer. The macro
can also be used in another place, and the comment concerning that is
slightly inaccurate.
M regcomp.c
-----------------------------------------------------------------------
--
Perl5 Master Repository