In perl.git, the branch smoke-me/khw-optimizer has been created
<http://perl5.git.perl.org/perl.git/commitdiff/596608ffd648c76a783af092d349586436e55010?hp=0000000000000000000000000000000000000000>
at 596608ffd648c76a783af092d349586436e55010 (commit)
- Log -----------------------------------------------------------------
commit 596608ffd648c76a783af092d349586436e55010
Author: Karl Williamson <[email protected]>
Date: Thu Sep 5 22:40:54 2013 -0600
Enlarge dummy regex pass1 compilation node
In pass 1 of compiling regular expressions, the needed size is
calculated. There is space allocated for a scratch node that can be
used for the things that the real one will hold in pass 2. It is valid
only while working on the current node, and gets overwritten in the next
node.
Until this commit, this scratch space was sized only for the smallest
node type, meaning that larger types could not use it for scratch. Now
it is sized to be the largest non-EXACTish node.
We could make it an array of 256 + overhead bytes instead to be able to
hold the EXACTish nodes, but I don't see a need for that now.
M regcomp.c
M regcomp.h
commit 22841c083d4b5fd8459a36626b90c15527cbb006
Author: Karl Williamson <[email protected]>
Date: Thu Aug 15 15:27:08 2013 -0600
regcomp.c: Use STR_WITH_LEN to avoid bookkeeping
By changing the order of the parameters to the static function
S_add_data, we can call it with STR_WITH_LEN and avoid a human having to
count characters.
M embed.fnc
M proto.h
M regcomp.c
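As a rough illustration of the STR_WITH_LEN commit above: once a function takes the string immediately followed by its length, a macro can supply both arguments from one literal. This is only a sketch. MY_STR_WITH_LEN is a local stand-in modeled on the idea behind perl's STR_WITH_LEN, and record_data() is a hypothetical callee, not S_add_data.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-in (an assumption for this sketch): expands to TWO
 * arguments, the string literal and its length without the trailing NUL.
 * The "" s "" concatenation makes non-literal arguments a compile error. */
#define MY_STR_WITH_LEN(s)  ("" s ""), (sizeof(s) - 1)

/* Hypothetical callee that takes the string first and the count second,
 * so the macro's two-argument expansion lines up with the parameters.
 * Returns n if the count agrees with the string, otherwise 0. */
static size_t record_data(const char *s, size_t n)
{
    return strlen(s) == n ? n : 0;
}

/* A human counts the characters, and must recount if the literal changes. */
static size_t demo_hand_counted(void)
{
    return record_data("nul", 3);
}

/* The compiler counts the characters. */
static size_t demo_macro(void)
{
    return record_data(MY_STR_WITH_LEN("nul"));
}
```

Both demo functions pass the same pair of arguments; only the second stays correct automatically if the literal is edited.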
commit adce5e2e5a87c7531477f7945d45a64e22718494
Author: Karl Williamson <[email protected]>
Date: Thu Aug 15 15:07:44 2013 -0600
Rename regex flag bit for clarity
ANYOF_UNICODE_ALL doesn't mean every Unicode code point. It means those
above the Latin1 range. Rename it, while retaining the old name for
backward compatibility.
M regcomp.c
M regcomp.h
M regexec.c
commit c647635de8d658a09874eb9d381570e4c0def382
Author: Karl Williamson <[email protected]>
Date: Thu Aug 15 14:55:16 2013 -0600
regcomp.c: Better DEBUGGING builds error detection
The code had a default: catch-all in the switch statement, but the
comments indicated that it was uncertain what all was being caught.
This commit makes the catch-all panic, but only in DEBUGGING builds, so
that we can find out if there are indeed other possibilities that we
haven't handled, and which could use better handling than the default of
matching everything.
The two known possibilities are given separate case: statements in
preparation for handling them differently.
M regcomp.c
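The pattern in the DEBUGGING commit above might look roughly like the following. This is a sketch under assumptions, not the code in regcomp.c: the opcode names, the result enum, and the SKETCH_DEBUGGING symbol are all invented for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical opcodes and results, for illustration only. */
enum sketch_op     { OP_LITERAL, OP_KNOWN_A, OP_KNOWN_B };
enum sketch_result { HANDLED, MATCH_EVERYTHING };

static enum sketch_result classify(int op)
{
    switch (op) {
    case OP_LITERAL:
        return HANDLED;

    /* The known fall-back cases get their own labels, so they can be
     * given more specific handling later. */
    case OP_KNOWN_A:
    case OP_KNOWN_B:
        return MATCH_EVERYTHING;

    default:
#ifdef SKETCH_DEBUGGING
        /* Debugging builds fail loudly so unknown cases get noticed. */
        fprintf(stderr, "panic: unexpected op %d\n", op);
        abort();
#else
        /* Other builds keep the safe fall-back: match everything. */
        return MATCH_EVERYTHING;
#endif
    }
}
```

Compiled without SKETCH_DEBUGGING, an unexpected opcode quietly yields MATCH_EVERYTHING; compiled with it, the same call aborts.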
commit e12338af9827ba0bb9af49d827699672e1063563
Author: Karl Williamson <[email protected]>
Date: Thu Aug 15 14:49:37 2013 -0600
regcomp.c: Change some static parameters to const
I found I needed const in a planned future commit.
M embed.fnc
M proto.h
M regcomp.c
commit 8b1e55b7ad70633625975e188b1d72d107ca7e96
Author: Karl Williamson <[email protected]>
Date: Thu Aug 15 14:27:53 2013 -0600
Retain an inversion list's mortality in its replacement
A couple of inversion list handling functions end up sometimes creating
a new inversion list, replacing the old one instead of modifying it.
This commit causes the replacement list to have the same mortality (or
lack of it) as the old one. That is, mortality is now preserved across
these operations.
M regcomp.c
commit bd13e52626d5e1a6dd079bfe8a81504b52085558
Author: Karl Williamson <[email protected]>
Date: Thu Aug 15 14:04:43 2013 -0600
perl.c: Clean up some SV*s at termination
These were not being cleaned up when PERL_DESTRUCT_LEVEL is non-zero.
M perl.c
commit aae90a9c3204f65ce0ec49239abb60cfb1b584fb
Author: Karl Williamson <[email protected]>
Date: Thu Aug 15 11:19:02 2013 -0600
regcomp.c: Add parameter to static function
This parameter will be used in future commits. This commit is really
only to make the difference listing smaller in those, by committing
separately just the book-keeping parts. This parameter requires also
passing the aTHX_ thread parameter.
M embed.fnc
M embed.h
M proto.h
M regcomp.c
commit 2e56a64eaf0804a17cedc29b5886a92990e20e77
Author: Karl Williamson <[email protected]>
Date: Thu Aug 15 10:59:01 2013 -0600
Remove PL_ASCII; use existing array slots for it
PL_ASCII contains an inversion list to match the ASCII-range code
points. It is unusable outside the core regular expression code because
all the functions that manipulate inversion lists are defined only
within a few core files. Therefore no outside code should be depending
on it.
It turns out that there are arrays of similar inversion lists, and these
all have slots which should have this inversion list in them. This
commit fills them, instead of using PL_ASCII.
M embedvar.h
M intrpvar.h
M regcomp.c
M sv.c
commit 1a53a6857217f98bf8777dc20f512531d1df8a97
Author: Karl Williamson <[email protected]>
Date: Thu Aug 15 10:51:24 2013 -0600
regcomp.c: Typos in comments; Fix another comment
The non-typo fix is the result of allowing a parameter to the function
to be NULL, and not updating the comments to reflect that.
M regcomp.c
commit 2f99c5022cafb68ac83a5b074573e10e104235dd
Author: Karl Williamson <[email protected]>
Date: Thu Aug 15 10:39:14 2013 -0600
regcomp.c: Fix syntax error in #ifdef'd out code
This line is currently not compiled, but would fail if the #ifdef is
changed.
M regcomp.c
commit 986bebd9a1b97fc2eef8350630536b299af1594e
Author: Karl Williamson <[email protected]>
Date: Thu Aug 15 10:36:29 2013 -0600
perl.h: Don't pollute global namespace
These structures are used internally in the regular expression files,
and are declared here only because of #include ordering issues. Wrap
them in an #ifdef so they are visible only to the correct files.
M perl.h
commit 7bb309bbde214055eb8abe7d5cd2b4998a8cc954
Author: Karl Williamson <[email protected]>
Date: Wed Aug 14 21:13:52 2013 -0600
Make typedef fully typedef
The regcomp.c struct RExC_state_t has not been usable fully as a
typedef, requiring the 'struct' at times. This has caused me, and I
presume others, wasted time when we forget to use it under those
circumstances when it should be used, but it's never been a big enough
issue to cause me to spend tuits on it. But, working on something else,
I finally came to the realization of what the problem is. It is because
proto.h is #included before regcomp.h is, and so functions that are
declared in proto.h that have something that is a RExC_state_t as a
parameter don't know that it is a typedef because that is defined in
regcomp.h. A way around this is already used for other similar
structures, and that is to declare them in perl.h which is always read
in before proto.h, leaving the definitions to regcomp.h. Thus proto.h
knows enough to compile.
The structure was already declared in perl.h; just not typedef'd.
Otherwise proto.h would not know about it at all. This patch moves two
regcomp.c related declarations in perl.h to the same section as the
others, and changes the one for RExC_state_t to be a typedef. All the
'struct' uses are removed.
M embed.fnc
M embed.h
M perl.h
M proto.h
M regcomp.c
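The header-ordering point in the typedef commit above can be sketched in a single file. The sections below only play the roles of perl.h, proto.h and regcomp.h; state_t, struct state and state_depth() are hypothetical names, not the real declarations.

```c
/* --- plays the role of perl.h: read before any prototypes ------------
 * The struct is still incomplete here, but the typedef name now exists. */
typedef struct state state_t;

/* --- plays the role of proto.h ---------------------------------------
 * A prototype can use the bare typedef name, with no 'struct' keyword,
 * because only a pointer to the incomplete type is needed. */
static int state_depth(const state_t *st);

/* --- plays the role of regcomp.h: the full definition arrives later -- */
struct state {
    int depth;
};

/* --- plays the role of regcomp.c ------------------------------------- */
static int state_depth(const state_t *st)
{
    return st->depth;
}

static int demo_depth(void)
{
    state_t s = { 3 };
    return state_depth(&s);
}
```

If the typedef line were moved below the prototype, the prototype would have to be written with 'struct state' instead, which is the inconvenience the commit message describes.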
commit a5599705c3f13f07e5fb6d4f815a265bb88cbde0
Author: Karl Williamson <[email protected]>
Date: Wed Aug 14 11:39:38 2013 -0600
regcomp.h: Create new typedef synonym for clarity
This commit finishes (at least for now) removing some of the overloading
of the term class. A 'regnode_charclass_class' node contains space for
storing the posix classes it matches that are never defined until the
moment of matching because they are subject to the current run-time
locale. This commit creates a typedef 'regnode_charclass_posixl'
synonym that doesn't re-use the term 'class' for two different purposes.
M perl.h
M regcomp.h
commit 569ebcabc9183845c56e5be9c4761f255c0d8610
Author: Karl Williamson <[email protected]>
Date: Fri Aug 9 12:21:53 2013 -0600
regcomp.h: Parenthesize macro formal parameter
Not doing so can cause problems, so it is standard procedure to
parenthesize all parameters within a macro definition.
M regcomp.h
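A small example of the kind of problem the parenthesization commit above guards against. The macros here are made up for illustration and are not the one changed in regcomp.h.

```c
/* Hypothetical macros, for illustration only. */
#define DOUBLE_UNSAFE(x)  (x * 2)     /* formal parameter not parenthesized */
#define DOUBLE_SAFE(x)    ((x) * 2)   /* formal parameter parenthesized */

/* Expands to (1 + 2 * 2): operator precedence applies the multiplication
 * to the 2 alone, not to the whole argument. */
static int double_unsafe_of_sum(void)
{
    return DOUBLE_UNSAFE(1 + 2);
}

/* Expands to ((1 + 2) * 2): the argument is kept together as a unit. */
static int double_safe_of_sum(void)
{
    return DOUBLE_SAFE(1 + 2);
}
```

The unsafe form only misbehaves when the caller passes an expression containing a lower-precedence operator, which is why such bugs can stay hidden for a long time.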
commit 3703cd2e303cdc4e246ceb9878eff7738a643a54
Author: Karl Williamson <[email protected]>
Date: Fri Aug 9 11:51:09 2013 -0600
regcomp.h: Add better named synonyms
This continues the process started two commits ago of removing some of
the overloading of the term 'class'.
In this case, this commit adds some #defines referring to the portions
of the regnode associated with bracketed character classes, the ANYOF
node. Specifically, those portions that deal with the POSIX character
classes, like \w and [:punct:], under /l (locale) matching are renamed,
substituting POSIXL for CLASS. POSIXL is already used for POSIX-related
things under /l. I remember being terribly confused about this when I
started reading this code. One had a class within a class. This
should clarify things somewhat.
The old names are retained in case files outside the core #include and
use them (there are a few such in cpan).
M regcomp.c
M regcomp.h
M regexec.c
commit b9cc34f62b9a7f0ae29fd43277a700f95438b484
Author: Karl Williamson <[email protected]>
Date: Tue Aug 6 21:41:53 2013 -0600
regcomp.h: Move #define
This moves it to be adjacent to similar #defines
M regcomp.h
commit 0959196ae78a56b19da0030c9ca52503e19527df
Author: Karl Williamson <[email protected]>
Date: Wed Aug 14 11:19:18 2013 -0600
regcomp.c: Change names of some static functions
The term 'class' is very overloaded in regex code and documentation.
perlrecharclass.pod calls the dot (matching any char) a class, and
calls the [] form "bracketed character classes". There are other
meanings as well. This is the first commit in a short series that
removes some of those overloadings.
One instance of class is the "synthetic start class", generated by the
regex optimizer to be a list of all the code points a successful match
could possibly start with. This is useful in more quickly finding where
to start looking in matching against a target string. Prior to this
commit, the routines that referred to this began with 'cl_', and the
formal parameters were 'cl', which could mean any class. This commit
changes those instances of 'cl' to 'ssc' to indicate this is the only
type of class that is being handled.
M embed.fnc
M embed.h
M proto.h
M regcomp.c
commit fc58215136c278baef8a7147ba97adf68db907eb
Author: Karl Williamson <[email protected]>
Date: Wed Aug 14 10:01:53 2013 -0600
regcomp.c: Rework static function call; comments
The previous commit just extracted out code into a function. This
commit renames a parameter for clarity, combines two parameters to make
the interface cleaner, and adds and moves comments around.
M embed.fnc
M embed.h
M proto.h
M regcomp.c
commit ddd7bb8b9cbf24c497f3c875e4524029afe672d6
Author: Karl Williamson <[email protected]>
Date: Wed Aug 14 11:09:58 2013 -0600
regcomp.c: Extract code into separate function
A future commit will use this functionality from another place. For
now, just cut and paste, and do the minimal ancillary work to get it to
compile and pass.
M embed.fnc
M embed.h
M proto.h
M regcomp.c
commit 91187f2c3f43f73fe189fec7f8a053021edde6b5
Author: Karl Williamson <[email protected]>
Date: Fri Aug 2 12:33:07 2013 -0600
regcomp.c: Use PL_sv_undef instead of NULL in an AV
The NULL gets turned into an SVt_NULL anyway. This array is read only
by S_core_regclass_swash() in regexec.c. That uses an SvROK, so it
doesn't have to change.
This commit also beefs up the comments around this operation.
M regcomp.c
commit a8721538ca5df4589e4cec7921f7162ffa6476d9
Author: Karl Williamson <[email protected]>
Date: Thu Aug 1 14:49:29 2013 -0600
Add regnode struct for synthetic start class
As part of extending the regular expression optimizer to properly handle
above Latin1 code points, I need an inversion list to contain which code
points the synthetic start class (ssc) matches.
The ssc currently is the same as a locale-aware ANYOF node, which uses
the struct of a regular ANYOF node, plus some extra fields at the end.
This commit creates a new typedef for ssc use, which is the locale-aware
ANYOF node, plus an extra SV* at the end to hold the inversion list.
M embed.fnc
M embed.h
M perl.h
M proto.h
M regcomp.c
M regcomp.h
commit bbaaa96b1ec81c8c2c5b4d01e71a93f85acc13d7
Author: Karl Williamson <[email protected]>
Date: Wed Jul 24 19:56:24 2013 -0600
regcomp.c: Move a #define, add a similar one
Future commits will use this #define (and the new one) earlier in the
file than currently defined.
M regcomp.c
commit d163fa5e952b4844e9e35ce6a7bece3aa38a9030
Author: Karl Williamson <[email protected]>
Date: Tue Jul 23 10:01:29 2013 -0600
Add inversion list for U+80 - U+FF
This is the upper half of the Latin1 range. This simplifies some code
very slightly, but will be of use in future commits.
M charclass_invlists.h
M embedvar.h
M intrpvar.h
M regcomp.c
M regen/mk_invlists.pl
M sv.c
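For readers unfamiliar with inversion lists, here is a minimal sketch of the idea behind the U+80 - U+FF commit above. An inversion list is a sorted array of code points in which each element toggles membership: even-indexed elements start a range that is in the set, odd-indexed elements start a range that is not. The function name and types are assumptions made for this sketch; perl's own implementation stores the list in an SV and differs in detail.

```c
#include <stddef.h>

/* Returns 1 if cp is in the set described by the inversion list, else 0.
 * A binary search counts how many elements are <= cp; an odd count
 * means cp lies inside a range that is in the set. */
static int invlist_contains(const unsigned long *list, size_t len,
                            unsigned long cp)
{
    size_t lo = 0, hi = len;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (int)(lo & 1);
}

/* The upper half of the Latin1 range: starts matching at U+80 and
 * stops matching at U+100, so it covers U+80 through U+FF. */
static const unsigned long upper_latin1[] = { 0x80, 0x100 };
```

With this representation the whole U+80 - U+FF set costs two array elements, and membership tests take logarithmic time in the number of range boundaries.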
commit 3f9b1a8ecbff94970595af1b586e4f1c41fdccc6
Author: Karl Williamson <[email protected]>
Date: Sun Jul 21 21:13:38 2013 -0600
regcomp.c: Extract code into separate function
This is in preparation for it to be called from more than one place, in
a future commit.
M embed.fnc
M embed.h
M proto.h
M regcomp.c
commit 942d2c65f6279ee529fd2fea9d37ad8dad8cbfad
Author: Karl Williamson <[email protected]>
Date: Sun Jul 21 10:10:56 2013 -0600
regcomp.c: Remove redundant matching possibilities
The flag ANYOF_UNICODE_ALL is for performance. It is set when the
inversion list for the ANYOF node includes every code point above
Latin1, and avoids runtime searching through the list. We don't need
both, as the flag being set short-circuits even looking at the other
list. By removing the code points from the list, we perhaps will get
rid of the list entirely, thus saving some operations, or will shorten
it so that later binary searches run faster.
M regcomp.c
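The reasoning in the redundancy commit above can be sketched as follows, assuming a plain array as the inversion list (sorted code points, each one toggling membership) and a separate boolean standing in for the flag. All names are hypothetical, and the real node layout in regcomp.c is different.

```c
#include <stddef.h>

#define LATIN1_END 0x100UL   /* first code point above Latin1 */

/* Counts the elements <= cp; an odd count means cp is in the set. */
static int invlist_contains(const unsigned long *list, size_t len,
                            unsigned long cp)
{
    size_t lo = 0, hi = len;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (int)(lo & 1);
}

/* Copies into out only the part of the list below LATIN1_END.
 * out must have room for len + 1 elements. Returns the new length. */
static size_t trim_above_latin1(const unsigned long *in, size_t len,
                                unsigned long *out)
{
    size_t n = 0;

    while (n < len && in[n] < LATIN1_END) {
        out[n] = in[n];
        n++;
    }
    if (n & 1)                  /* a range was still open at the boundary */
        out[n++] = LATIN1_END;  /* so close it there */
    return n;
}

/* When the flag is set, code points above Latin1 match without the
 * list being consulted at all. */
static int node_matches(int all_above_latin1, const unsigned long *list,
                        size_t len, unsigned long cp)
{
    if (all_above_latin1 && cp >= LATIN1_END)
        return 1;
    return invlist_contains(list, len, cp);
}
```

For a list such as { 0x41, 0x5B, 0x100 } (A-Z plus everything from U+100 up) with the flag set, trimming leaves { 0x41, 0x5B }. Matching results are unchanged, but the list is shorter, and a list that covered only the above-Latin1 range would become empty.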
commit ba34c43bc53980b70d31bc1efbb35b6c0a27e9b8
Author: Karl Williamson <[email protected]>
Date: Sun Jul 21 08:21:34 2013 -0600
regcomp.c: Centralize assignment
It's better to do something in one common place than two. This properly
initializes the regex opcode for the synthetic start class when it is
created, rather than at the end where the code has to be repeated to get
all instances.
M regcomp.c
-----------------------------------------------------------------------
--
Perl5 Master Repository