In perl.git, the branch smoke-me/khw-regcomp has been created
<http://perl5.git.perl.org/perl.git/commitdiff/a6419ab3c527766b5d73ced0383af2ad02da2d29?hp=0000000000000000000000000000000000000000>
at a6419ab3c527766b5d73ced0383af2ad02da2d29 (commit)
- Log -----------------------------------------------------------------
commit a6419ab3c527766b5d73ced0383af2ad02da2d29
Author: Karl Williamson <[email protected]>
Date: Thu Dec 17 10:22:44 2015 -0700
later
M embed.fnc
M embed.h
M proto.h
M regcomp.c
commit 80c275e1da303ef981053d91e917f840641ffe35
Author: Karl Williamson <[email protected]>
Date: Wed Dec 16 13:24:45 2015 -0700
regcomp.h: Add comments
M regcomp.h
commit d9556cf12996a5e089c1936269807623ba4a7c14
Author: Karl Williamson <[email protected]>
Date: Wed Dec 16 12:06:46 2015 -0700
regex matching: Don't do unnecessary work
This commit sets a flag at pattern compilation time to indicate if
a rare case is present that requires special handling, so that that
handling can be avoided unless necessary.
M regcomp.c
M regcomp.h
M regexec.c
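
The idea in the commit above, deciding once at compile time whether the matcher must take an expensive path, can be sketched roughly as follows. This is a toy model in Python, not the code in regcomp.c or regexec.c: the flag name `RARE_CASE`, the dict-based node, and the choice of "code points above 0xFF" as the rare case are all assumptions made for illustration.

```python
RARE_CASE = 0x01  # hypothetical flag bit, set only at compile time


def compile_class(chars):
    """Build a toy character-class node (all field names are hypothetical)."""
    bitmap = {c for c in chars if ord(c) <= 0xFF}
    extra = {c for c in chars if ord(c) > 0xFF}
    flags = 0
    if extra:
        # Assumed rare case: record it once so the matcher need not rediscover it.
        flags |= RARE_CASE
    return {"bitmap": bitmap, "extra": extra, "flags": flags}


def class_matches(node, ch):
    """Return True if ch is in the class, skipping the slow path when possible."""
    if ch in node["bitmap"]:
        return True
    if node["flags"] & RARE_CASE:
        # Special handling, entered only when the compile-time flag is set.
        return ch in node["extra"]
    return False


plain = compile_class("abc")
wide = compile_class("a\u0100")
```

In this sketch, `plain` never reaches the special handling because its flag is clear, while `wide` does only after the bitmap lookup fails.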
commit 9903b1324a1ccfb8b2875a482c06eab503c81593
Author: Karl Williamson <[email protected]>
Date: Wed Dec 16 11:40:18 2015 -0700
regcomp.h: Renumber 2 flag bits
This changes the spare bit to be adjacent to the LOC_FOLD bit, in
preparation for the next commit, which will use that bit for a
LOC_FOLD-related use.
M regcomp.h
commit d379953155efd6a7e716f55d832a910e1e7d8726
Author: Karl Williamson <[email protected]>
Date: Wed Dec 16 11:05:17 2015 -0700
regex: Free an ANYOF node bit
This is done by combining 2 mutually exclusive bits into one. I hadn't
seen this possibility before because the name of one of them misled me.
It also misled me into turning on that flag unnecessarily, and into
missing opportunities to avoid creating a swash at runtime. This
commit corrects those things as well.
M regcomp.c
M regcomp.h
M regexec.c
commit 3a504054f77ef8ee553e71740282d99785919a81
Author: Karl Williamson <[email protected]>
Date: Tue Dec 15 22:42:18 2015 -0700
regcomp.c: Move comments adjacent to their object
M regcomp.c
commit 4e3a2cb6de4a50ff16152075540809b96a592826
Author: Karl Williamson <[email protected]>
Date: Tue Dec 15 22:20:20 2015 -0700
regcomp.c: Try simplifications in some qr/[...]/d
Characters in a bracketed character class can come from a bunch of
sources, all bundled together. Some things under /d match only when the
target string is UTF-8; some match only when it isn't UTF-8. Other
sources may introduce characters that match regardless. It may be that some
things are specified as conditionally matching from one source, and as
unconditionally matching from another. We can subtract the
unconditionals from the conditionals, leaving a simpler set of things
that must be conditionally matched. In some cases, the conditional set
may go to zero, allowing other optimizations to happen that otherwise
couldn't. An example is
qr/[\W\xAB]/
which before this commit compiled to:
ANYOFD[^0-9A-Z_a-z\x{80}-\x{AA}\x{AC}-\x{FF}][{non-utf8-latin1-all}
{utf8}0080-00A9 00AC-00B4 00B6-00B9 00BB-00BF 00D7 00F7
02C2-02C5...] (12)
and after it, compiles to
ANYOFD[^0-9A-Z_a-z\x{AA}\x{B5}\x{BA}\x{C0}-\x{D6}\x{D8}-\x{F6}
\x{F8}-\x{FF}][{non-utf8-latin1-all}{utf8}02C2-02C5...] (12)
Notice that the {utf8} component has been stripped of everything below
256. That means no swash has to be created at runtime when matching
code points below 256, unlike the case before this commit.
A starker example, though unlikely in real life except in
machine-generated code, is
qr/[\w\W]/
Before this commit, it would generate:
ANYOFD[\x{00}-\x{7F}][{non-utf8-latin1-all}{above_bitmap_all}
{utf8}0080-00FF]
and afterwards, simply:
SANY
M regcomp.c
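
The subtraction described in the commit above can be sketched with ordinary sets. This is only an illustration in Python, not the code in regcomp.c; the function name `simplify_conditionals`, the three-set representation, and the toy code points are assumptions made for the example.

```python
def simplify_conditionals(always, only_utf8, only_non_utf8):
    """Remove unconditionally matched code points from the conditional sets.

    Each argument is a set of integer code points. The names and the
    three-set model are hypothetical; they only illustrate the idea.
    """
    always = set(always)
    # Anything that matches regardless need not also be listed conditionally.
    only_utf8 = set(only_utf8) - always
    only_non_utf8 = set(only_non_utf8) - always
    return always, only_utf8, only_non_utf8


# Toy input: 0xAB is supplied unconditionally by one source and
# conditionally (UTF-8 targets only) by another.
always, utf8_only, non_utf8_only = simplify_conditionals(
    always={0xAB},
    only_utf8={0xAB, 0xB5, 0x2C2},
    only_non_utf8={0xAA},
)
```

If both conditional sets come back empty, the class no longer depends on whether the target string is UTF-8, which is the situation the commit says allows further optimizations.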
commit f13a2edfb6181ac5381bdc14622a7c5289f4505c
Author: Karl Williamson <[email protected]>
Date: Tue Dec 15 21:46:42 2015 -0700
regcomp.c: Change variable name to be clearer
This name confused me, and led to suboptimal code. The new name is more
cumbersome, but won't confuse.
M regcomp.c
commit f6fab5abf63bf1fe0198696ec8e9d049e115b515
Author: Karl Williamson <[email protected]>
Date: Thu Nov 19 21:39:54 2015 -0700
regcomp.c: Make sure parse ptr positioned for err msgs
This modifies some macros to make sure that the <--HERE pointer is at a
character boundary, not beyond the input string.
M regcomp.c
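
One way to keep an error-position marker inside the input and on a character boundary is sketched below. It is written in Python over UTF-8 bytes and is not the macro change made in regcomp.c; the function name `clamp_to_char_boundary` is hypothetical, and the input is assumed to be well-formed UTF-8.

```python
def clamp_to_char_boundary(buf: bytes, pos: int) -> int:
    """Return the nearest offset <= pos that lies within buf and starts a character.

    buf is assumed to hold well-formed UTF-8. The function name is
    hypothetical and is not taken from regcomp.c.
    """
    # Never point beyond the end of the input (or before its start).
    pos = max(0, min(pos, len(buf)))
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx; back up
    # over them until the first byte of a character is reached.
    while 0 < pos < len(buf) and (buf[pos] & 0xC0) == 0x80:
        pos -= 1
    return pos


pattern = "a\u00e9b".encode("utf-8")  # four bytes: 'a', two bytes for e-acute, 'b'
```

Here an offset that falls on the second byte of the two-byte character is moved back to its first byte, and an offset past the end is pulled back to `len(pattern)`.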
commit b087be5291d6e1c25aa408277b5810124be116c1
Author: Karl Williamson <[email protected]>
Date: Thu Nov 19 20:51:04 2015 -0700
regcomp.c: Add 2 basic assertions
These should be true because an SV* should always have a trailing NUL,
but a lot of things in this code depend on it. It's worthwhile to point
that out; I wasn't sure it was true until I investigated. And an
assert() makes sure it is really true.
M regcomp.c
commit 96342bcf260c454e960a5dd5c028456c432df7ca
Author: Karl Williamson <[email protected]>
Date: Tue Oct 20 22:23:00 2015 -0600
pp_hot.c: Add assertion
This will make the cause of any future failures more clear.
M pp_hot.c
commit d8e754cec070335bb1cff38368863fa425a01414
Author: Karl Williamson <[email protected]>
Date: Tue Oct 20 22:21:42 2015 -0600
perlapi: Clarify 'string' vs. buffer
Strictly speaking, a string is NUL-terminated, but our terminology is lax.
M autodoc.pl
M handy.h
commit 967b25b1c64fdc95a526ab80495bf2b908fe2a4a
Author: Karl Williamson <[email protected]>
Date: Tue Oct 20 22:08:59 2015 -0600
utf8.h: Add 2 assertions
This makes sure in DEBUGGING builds that the macro is called correctly.
M utf8.h
commit 3c34a4a47fd61cb8c278155bb0fc4e8c5b2d26ba
Author: Karl Williamson <[email protected]>
Date: Fri Sep 11 12:37:27 2015 -0600
test.pl
M t/test.pl
-----------------------------------------------------------------------
--
Perl5 Master Repository