In perl.git, the branch smoke-me/khw-readonly_invlists has been created
<http://perl5.git.perl.org/perl.git/commitdiff/89dc9015ae90b224aab8ca73016734b832b88c57?hp=0000000000000000000000000000000000000000>
at 89dc9015ae90b224aab8ca73016734b832b88c57 (commit)
- Log -----------------------------------------------------------------
commit 89dc9015ae90b224aab8ca73016734b832b88c57
Author: Karl Williamson <[email protected]>
Date: Tue Jan 7 10:41:55 2014 -0700
regexec.c: White-space only
Align a macro continuation backslash
M regexec.c
commit 2bcc9ac31f2ec8a941189d855fa0d8aaf83afb21
Author: Karl Williamson <[email protected]>
Date: Tue Jan 7 10:39:19 2014 -0700
regexec.c: Use compiled-in POSIX definitions
This changes the regex engine to not go out to the disk when needing to
find the definitions of POSIX classes, but to use the compiled-in ones
introduced earlier in this commit series.
M regexec.c
commit e980a809daed6866ccbfa349a33d4dd616a86aad
Author: Karl Williamson <[email protected]>
Date: Tue Jan 7 10:32:20 2014 -0700
lib/B/Deparse.t: Move order-dependent test
This marks a test as TODO and moves it earlier in the file, where it fails,
and adds a copy just after the test that causes it to succeed.
M lib/B/Deparse.t
commit 5f4894ab1c8d377ed69e7cf344ba5b7806f04f58
Author: Karl Williamson <[email protected]>
Date: Mon Jan 6 13:41:46 2014 -0700
IDStart and IDCont no longer go out to disk
These are the base names for various macros used in parsing identifiers.
Prior to this patch, parsing a code point above Latin1 caused loading
disk files. This patch causes all the information to be compiled into
the Perl binary.
M charclass_invlists.h
M embed.fnc
M embed.h
M proto.h
M regen/mk_invlists.pl
M utf8.c
commit 76a72fbf03d11931401fd0b14c69c38db73d3247
Author: Karl Williamson <[email protected]>
Date: Mon Jan 6 13:38:58 2014 -0700
utf8.c: Add comment
M utf8.c
commit 7c86bb4046e927bf7b7cd6834ad1248f3ae4f556
Author: Karl Williamson <[email protected]>
Date: Mon Jan 6 12:22:02 2014 -0700
isWORDCHAR_uni(), isDIGIT_utf8() etc no longer go out to disk
Previous commits in this series have caused all the POSIX classes to be
completely specified at C compile time. This allows us to revise the
base function used by all these macros to use these definitions,
avoiding reading them in from disk.
M embed.fnc
M embed.h
M proto.h
M utf8.c
commit 44fde04f672af4f1d1ca0dea613bf4580f3c386a
Author: Karl Williamson <[email protected]>
Date: Mon Jan 6 12:14:31 2014 -0700
Move initialization of PL_XPosix_ptrs[] to perl.c
This was performed unconditionally in regcomp.c. However, future
commits will use this from other code. Almost all (but not completely
all) Perl code uses regular expressions, so only rarely will this small
amount of initialization be performed when it currently isn't.
M embed.fnc
M embed.h
M perl.c
M proto.h
M regcomp.c
commit 5032cd195ab1446a55bef844fecadd91954babc8
Author: Karl Williamson <[email protected]>
Date: Mon Jan 6 11:57:53 2014 -0700
regen/mk_invlists.pl: White-space only
This outdents a block to be in line with adjacent lines.
M regen/mk_invlists.pl
commit 44f1058adb319f793415a36a146fb32f996e6ed6
Author: Karl Williamson <[email protected]>
Date: Mon Jan 6 11:55:17 2014 -0700
regcomp.c: Reword expression for clarity
I believe the new version is clearer as to what is meant, and it brings
it in line with the same expression in nearby uses.
M regcomp.c
commit f82ff82311a0199ed98ea21404e7775a3aed7b8a
Author: Karl Williamson <[email protected]>
Date: Mon Jan 6 11:52:21 2014 -0700
Rmv PL_Posix_ptrs
Previous commits in this series have removed all uses of this global
array. This completely removes it.
Since it is a global, consideration need be given to possible uses of it
outside the core. It has never been externally documented, and is an
opaque structure whose internals have changed with every release. The
functions used to access it are almost all static to regcomp.c; those
few that aren't have been hidden from all but the few .c files that need
to have access to them, via #if's.
M charclass_invlists.h
M perl.c
M regcomp.c
M regen/mk_invlists.pl
M sv.c
commit b4cbe70cbf600391366ae83e67972bd387846844
Author: Karl Williamson <[email protected]>
Date: Mon Jan 6 11:41:53 2014 -0700
regcomp.c: Rmv remaining uses of PL_Posix_ptrs
Previous commits have removed all but a few uses of PL_Posix_ptrs. This
removes the rest. ASCII is the same whether over all code points, or
just the ASCII range, so we can substitute the version for all code
points. There is an extra intersection introduced by this commit during
the construction of a synthetic start class under /a and /aa, but the
performance hit should be negligible.
M regcomp.c
commit e995f619f18dd88e65ddedcb22c70a47023c39d5
Author: Karl Williamson <[email protected]>
Date: Mon Jan 6 11:36:49 2014 -0700
regcomp.c: Collapse two branches.
Previous commits in this series have removed the need to special case
[:ascii:]. This commit removes the special casing. There is a slight
performance penalty if this is the only POSIX class in the bracketed
class, and is being compiled under /a or /aa: An extra intersection will
be performed. Since this is regex compilation, this should be
unnoticeable.
M regcomp.c
commit bc39bc2fb68376cca4874c0b2d435b61bc4a2178
Author: Karl Williamson <[email protected]>
Date: Sat Jan 4 23:13:56 2014 -0700
XXX finish regcomp.c: Trade a little time for simplicity
Perl has traditionally favored speed when there is a time/space tradeoff.
This commit goes the other way, but the reason isn't space; it is to
make the code simpler.
Perl currently has two sets of inversion lists for the POSIX classes
built in. One set is for the entire Unicode range; the other for just
the ASCII range. This latter set could be derived from the larger one
at run time by doing an intersection with ASCII.
M regcomp.c
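The derivation described above (getting the ASCII-range set by intersecting the full-range set with ASCII) can be sketched as follows. This is a toy illustration under stated assumptions, not Perl's actual inversion-list code: the names toy_word, toy_ascii and toy_invlist_intersect are hypothetical, and the data is a hand-made stand-in rather than the real \w definition. An inversion list here is a sorted array in which even-indexed elements start a range that is in the set and odd-indexed elements start a range that is not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy data, NOT Perl's real \w list: digits, A-Z, '_', a-z,
 * plus one above-ASCII code point (U+00AA). */
static const uint32_t toy_word[]  = { 0x30, 0x3A, 0x41, 0x5B, 0x5F, 0x60,
                                      0x61, 0x7B, 0xAA, 0xAB };
/* The ASCII range [0x00, 0x80) as an inversion list. */
static const uint32_t toy_ascii[] = { 0x00, 0x80 };

/* Intersect two inversion lists by merging their boundaries.
 * 'out' must have room for alen + blen elements.
 * Returns the number of elements written to 'out'. */
static size_t toy_invlist_intersect(const uint32_t *a, size_t alen,
                                    const uint32_t *b, size_t blen,
                                    uint32_t *out)
{
    size_t i = 0, j = 0, n = 0;
    int in_out = 0;             /* is the output currently inside a range? */

    assert(a != NULL && b != NULL && out != NULL);

    while (i < alen || j < blen) {
        uint32_t cp;
        int now;

        /* Pick the smallest boundary not yet consumed. */
        if (j >= blen || (i < alen && a[i] <= b[j]))
            cp = a[i];
        else
            cp = b[j];

        /* Consume that boundary from whichever list(s) contain it. */
        if (i < alen && a[i] == cp) i++;
        if (j < blen && b[j] == cp) j++;

        /* An odd number of consumed boundaries means "inside" that set. */
        now = (i & 1) && (j & 1);
        if (now != in_out) {
            out[n++] = cp;      /* membership in the result flips here */
            in_out = now;
        }
    }
    return n;
}
```

With the toy data, intersecting toy_word with toy_ascii drops the U+00AA range and leaves the four ASCII ranges. The cost is one linear pass over both lists, which matches the commit's point that the run-time derivation is cheap.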
commit 795a7f4b62166dfe8d0a32bfc0e23178c3076d89
Author: Karl Williamson <[email protected]>
Date: Sat Jan 4 21:12:07 2014 -0700
regcomp.c: Collapse two code branches
Previous commits have simplified things so these two if-then-else
branches can be collapsed into one.
M regcomp.c
commit 44fb1547fcf9db914bb9965b8d8bed68e92b55a1
Author: Karl Williamson <[email protected]>
Date: Sat Jan 4 15:31:36 2014 -0700
Keep temp separate list of posix classes and complements
In building up the list of code points that are matched by a bracketed
character class, there can be both posix classes and complemented posix
classes, like [\s\W]. By keeping each type in a separate list, we can
simplify code in later commits.
M regcomp.c
commit 4d872635824f0a8a622c569e2581ba4a343f2b69
Author: Karl Williamson <[email protected]>
Date: Sat Jan 4 14:45:54 2014 -0700
regcomp.c: White-space, comments only
This outdents code due to the removal of a block in the previous commit.
And it clarifies some comments about it.
M regcomp.c
commit a2bd4911e5e7e8710c1c9f12d90c0ac8c3f4a7fe
Author: Karl Williamson <[email protected]>
Date: Sat Jan 4 14:43:16 2014 -0700
regcomp.c: Collapse two code branches
The previous commit has enabled us to collapse the branches for dealing
with, e.g., \w and \W into the same branch, using a flag to indicate
whether to complement or not.
M regcomp.c
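The idea of a single branch with a complement flag can be sketched as below. This is only an illustration under assumptions, not the code in regcomp.c: toy_digit and toy_invlist_maybe_complement are hypothetical names, and the inversion-list layout (even-indexed elements start an included range, odd-indexed elements start an excluded one) is simplified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy list for ASCII digits: the single range [0x30, 0x3A). */
static const uint32_t toy_digit[] = { 0x30, 0x3A };

/* Copy 'in' to 'out', complementing it when 'invert' is non-zero.
 * Complementing an inversion list only changes its front: drop a leading 0
 * if there is one, otherwise prepend a 0.
 * 'out' must have room for len + 1 elements.
 * Returns the number of elements written. */
static size_t toy_invlist_maybe_complement(const uint32_t *in, size_t len,
                                           int invert, uint32_t *out)
{
    size_t i = 0, n = 0;

    assert(in != NULL && out != NULL);

    if (invert) {
        if (len > 0 && in[0] == 0)
            i = 1;              /* set started at 0: complement starts later */
        else
            out[n++] = 0;       /* set did not contain 0: complement does */
    }
    for (; i < len; i++)
        out[n++] = in[i];
    return n;
}
```

A caller handling both a class and its complement would pass invert = 0 for the former and invert = 1 for the latter, so the rest of the handling can be shared.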
commit 537c67d9171b64f4e27a1658d2ac747ffe56ad3a
Author: Karl Williamson <[email protected]>
Date: Sat Jan 4 14:05:44 2014 -0700
regcomp.c: Use unconditional match list for POSIX above 255
The POSIX classes, such as \w and [:blank:], always match the same non-Latin1 code
points regardless of locale, folding, etc. They can be added to the
unconditional match list, and not have to be dealt with further.
M regcomp.c
commit b6e39f5cdf949fe2b450eada1f050f53e7a8f973
Author: Karl Williamson <[email protected]>
Date: Sat Jan 4 13:17:22 2014 -0700
regcomp.c: White-space only
This properly indents code that is within a block newly formed by the
previous commit.
M regcomp.c
commit 0d111451723584f3963e8f25670300a7161a9079
Author: Karl Williamson <[email protected]>
Date: Sat Jan 4 13:08:42 2014 -0700
Keep temp separate list of foldable characters
When populating what a bracketed character class should match, it turns
out that if we keep a separate list of code points whose folds may have
to be added, we can simplify the code. This commit just adds the new
list. Future commits will do the simplification.
M regcomp.c
commit 2da49cc651b9f53cf750514ac23ba7c138c099c9
Author: Karl Williamson <[email protected]>
Date: Sat Jan 4 09:19:32 2014 -0700
regcomp.c: Move some code around.
We have a section of code already for handling POSIX classes under /l.
This moves all that handling to there, leading to simpler, easier to
read code, and modifies some comments. Further simplifications will be
in future commits.
M regcomp.c
commit 28bcd7bf44901018f56a130742a7ec6cdfefef84
Author: Karl Williamson <[email protected]>
Date: Mon Jan 6 09:40:42 2014 -0700
regcomp.c: Add some comments
M regcomp.c
commit f62045d514d62888c837673c8834c42011558194
Author: Karl Williamson <[email protected]>
Date: Thu Jan 2 16:44:39 2014 -0700
Remove PL_L1Posix_ptrs
This global array is no longer used, having been removed in previous
commits in this series.
Since it is a global, consideration need be given to possible uses of it
outside the core. It has never been externally documented, and is an
opaque structure whose internals have changed with every release. The
functions used to access it are almost all static to regcomp.c; those
few that aren't have been hidden from all but the few .c files that need
to have access to them, via #if's.
M charclass_invlists.h
M embedvar.h
M intrpvar.h
M perl.c
M regcomp.c
M regen/mk_invlists.pl
M sv.c
commit c41e5285997aa6594a9c12f81e1c25c0697a0cd5
Author: Karl Williamson <[email protected]>
Date: Thu Jan 2 09:50:16 2014 -0700
Rmv more code for delayed 'til runtime POSIX defns
Now that all the POSIX class definitions are known at compile time, we
no longer need to handle the case that some aren't known until runtime.
This removes some more code that dealt with that.
M regcomp.c
commit 9d41a11443c83377939e494753b48406e8d11067
Author: Karl Williamson <[email protected]>
Date: Thu Jan 2 09:48:51 2014 -0700
regcomp.c: White-space only
This outdents and reflows lines that were in a block removed in the
previous commit.
M regcomp.c
commit c6668322c9e58189e4af2fff2474dfa15abeaa1d
Author: Karl Williamson <[email protected]>
Date: Thu Jan 2 09:47:03 2014 -0700
Compile in list of foldable code points
When constructing what matches code points under /i, Perl uses an
inversion list of all the possible code points that participate in
folds. This number is relatively few compared to the possible universe
of code points, as most of the world's scripts aren't cased, and many
characters in the scripts that do fold aren't foldable (such as
punctuation). Prior to this commit, the list for the above-Latin1 code
points was read in from disk if and only if needed. This commit causes
the list to be added to read-only data in a C header, trading a little
space in Perl's text segment for speed at execution. This will enable
ripping out some code in this and future commits (offsetting the space
used by this one).
M charclass_invlists.h
M regcomp.c
M regen/mk_invlists.pl
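For readers unfamiliar with the data structure, here is a minimal sketch of how an inversion list kept in read-only data can be queried without touching disk. It is an illustration only: the array contents and the names toy_foldable and toy_invlist_contains are assumptions made for this example, not the contents of charclass_invlists.h or Perl's actual lookup code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy data, NOT the real list: the ranges [0x41, 0x5B) and
 * [0x61, 0x7B), i.e. the ASCII letters. Even-indexed elements start a range
 * that is in the set; odd-indexed elements start a range that is not. */
static const uint32_t toy_foldable[] = { 0x41, 0x5B, 0x61, 0x7B };

/* Return 1 if 'cp' is in the set described by the inversion list, else 0. */
static int toy_invlist_contains(const uint32_t *list, size_t len, uint32_t cp)
{
    size_t lo = 0, hi = len;

    assert(list != NULL);

    /* Binary search for the number of elements that are <= cp. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* An odd count means the last boundary passed opened an included range. */
    return (int)(lo & 1);
}
```

Because the array is const and static, a compiler can place it in read-only storage, which is the space-for-speed trade the commit message describes.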
commit 2ab61a8e62c9d33ace02be334b77c96f81c193f8
Author: Karl Williamson <[email protected]>
Date: Wed Jan 1 22:10:12 2014 -0700
regcomp.c: Reword comment to avoid ambiguity
M regcomp.c
commit 5e0415d11e7805df64340b4531216466d88181c1
Author: Karl Williamson <[email protected]>
Date: Thu Jan 2 07:43:35 2014 -0700
regcomp.c: Rmv code for delayed 'til runtime POSIX defns
The previous commit made compile-time inversion lists available for all
POSIX classes, not just some. Therefore the code that deals with not
having them available until runtime can be removed. This commit does
the largest chunk of this code, used when a POSIX class is used within a
bracketed character class.
M regcomp.c
commit 4093c33d99b626ba442460354eca018a4796b0e2
Author: Karl Williamson <[email protected]>
Date: Wed Jan 1 20:25:00 2014 -0700
Compile in all POSIX class inversion lists
This changes charclass_invlists.h to have the complete definitions for
all the POSIX classes, like \w and [:alpha:]. Thus these won't have to
be loaded off disk at run-time.
Taking advantage of this will be done in stages in future commits.
M charclass_invlists.h
M regen/mk_invlists.pl
-----------------------------------------------------------------------
--
Perl5 Master Repository