In perl.git, the branch smoke-me/khw-regex has been created
<http://perl5.git.perl.org/perl.git/commitdiff/4cb9e33486d4738ef7443d971ff2853e89a04f95?hp=0000000000000000000000000000000000000000>
at 4cb9e33486d4738ef7443d971ff2853e89a04f95 (commit)
- Log -----------------------------------------------------------------
commit 4cb9e33486d4738ef7443d971ff2853e89a04f95
Author: Karl Williamson <[email protected]>
Date: Wed Jul 25 12:31:27 2012 -0600
for smoke
M embed.fnc
M embed.h
M lib/unicore/mktables
M pod/perlebcdic.pod
M proto.h
M regcomp.c
M regexec.c
M t/re/pat_advanced.t
M t/test.pl
M utf8.h
commit c25ebd17d87927454dfb2e45b58f0b86ccca99e7
Author: Karl Williamson <[email protected]>
Date: Mon Jun 18 13:09:38 2012 -0600
XXX squash_with_next
M embedvar.h
M handy.h
M intrpvar.h
M regcomp.c
M sv.c
commit 35eda35c5828ea5a5456394e05ed39742cc2908a
Author: Karl Williamson <[email protected]>
Date: Mon Jun 18 12:55:42 2012 -0600
Generate tables for chars that aren't in final fold pos
This starts with the existing table that mktables generates that lists
all the characters in Unicode that occur in multi-character folds, and
aren't in the final positions of any such fold.
It generates data structures with this information to make it quickly
available to code that wants to use it. Future commits will use these
tables.
M charclass_invlists.h
M handy.h
M l1_char_class_tab.h
M regen/mk_PL_charclass.pl
M regen/mk_invlists.pl
commit 263721dc09ffeede8fd1e535037f19f16f8cf403
Author: Karl Williamson <[email protected]>
Date: Mon Jun 18 12:44:55 2012 -0600
regen/mk_invlists: Add mode to generate above-Latin1 only
This change adds the ability to specify that an output inversion list is
to contain only those code points that are above Latin-1. Typically,
the Latin-1 ones will be accessed from some other means.
M regen/mk_invlists.pl
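
For readers unfamiliar with inversion lists, the idea behind an above-Latin-1-only list can be sketched as follows. This is an illustration only, not the code in regen/mk_invlists.pl: the function name invlist_above_latin1 and the constant FIRST_ABOVE_LATIN1 are hypothetical. It assumes the usual inversion list layout, a sorted array in which even-indexed elements start a range that is in the set and odd-indexed elements start a range that is not.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: first code point above Latin-1. */
#define FIRST_ABOVE_LATIN1 0x100u

/* Copy into 'out' the part of inversion list 'in' (n elements) that lies
 * above Latin-1.  'out' must have room for n elements.  Returns the
 * number of elements written.  Illustrative sketch only. */
size_t invlist_above_latin1(const uint32_t *in, size_t n, uint32_t *out)
{
    size_t i = 0;
    size_t len = 0;

    /* Skip every boundary that falls within Latin-1. */
    while (i < n && in[i] < FIRST_ABOVE_LATIN1)
        i++;

    if (i % 2 == 1) {
        /* An in-set range that began in Latin-1 is still open here. */
        if (i < n && in[i] == FIRST_ABOVE_LATIN1)
            i++;                              /* it ends exactly at 0xFF */
        else
            out[len++] = FIRST_ABOVE_LATIN1;  /* restart it at 0x100 */
    }

    /* Everything that remains is already above Latin-1. */
    while (i < n)
        out[len++] = in[i++];

    return len;
}
```

For example, the list {0x41, 0x5B, 0xC0, 0x17F} (A-Z plus 0xC0 through 0x17E) reduces to {0x100, 0x17F}, since only 0x100 through 0x17E lies above Latin-1.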
commit 8fd1e4fb6c2c8959d3844a0b00f3490298718246
Author: Karl Williamson <[email protected]>
Date: Mon Jun 18 12:38:41 2012 -0600
Unicode::UCD::prop_invlist(): Allow returning an internal property
This creates an optional undocumented parameter to this function to
allow it to return the inversion list of an internal-only Perl property.
This will be used by other functions in Perl, but should not be
documented, as we don't want to encourage the use of internal-only
properties, which are subject to change or removal without notice.
M lib/Unicode/UCD.pm
commit cc142cd3c2c2eefb032fbb2ef233b6eb2a2640cf
Author: Karl Williamson <[email protected]>
Date: Mon Jun 18 12:37:52 2012 -0600
mktables: Add comment to gen'd data file
M lib/unicore/mktables
commit 6da9b8f4bb85be8c197d9668bf8b6bf87cd0a1f9
Author: Karl Williamson <[email protected]>
Date: Mon Jun 18 12:22:41 2012 -0600
mktables: grammar in comments
M lib/unicore/mktables
commit 6b1b9f9bdb6072c24bac03fc4416891ccadfadeb
Author: Karl Williamson <[email protected]>
Date: Mon Jun 18 12:20:42 2012 -0600
regen/mk_PL_charclass.pl: Remove obsolete code
Octals are no longer checked via this mechanism.
M regen/mk_PL_charclass.pl
commit c03b40697da7610feb4ef9933052541d77d86501
Author: Karl Williamson <[email protected]>
Date: Mon Jun 18 11:51:43 2012 -0600
regcomp.c: Make invlist_search() usable from re_comp.c
This was a static function which I couldn't get to be callable from the
debugging version of regcomp.c. This makes it public, but known only
in the regcomp.c source file. It changes the name to begin with an
underscore so that if someone cheats by adding preprocessor #defines,
they still have to call it with the name that convention indicates is a
private function.
M embed.fnc
M embed.h
M proto.h
M regcomp.c
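
As background, a search over an inversion list is essentially a binary search for the last boundary at or below the code point. The sketch below is not the function from regcomp.c; the names sketch_invlist_search and sketch_invlist_contains are hypothetical, and the return convention (index of the containing range, or -1 if the code point precedes the first element) is an assumption made for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the last element of 'list' that is <= cp, or -1 if
 * cp is below the first element.  Illustrative sketch only. */
ptrdiff_t sketch_invlist_search(const uint32_t *list, size_t n, uint32_t cp)
{
    size_t low = 0;
    size_t high = n;

    /* Find the first element strictly greater than cp. */
    while (low < high) {
        size_t mid = low + (high - low) / 2;
        if (list[mid] <= cp)
            low = mid + 1;
        else
            high = mid;
    }
    return (ptrdiff_t)low - 1;
}

/* A code point is in the set when its range starts at an even index. */
int sketch_invlist_contains(const uint32_t *list, size_t n, uint32_t cp)
{
    ptrdiff_t idx = sketch_invlist_search(list, n, cp);
    return idx >= 0 && idx % 2 == 0;
}
```

With the list {0x41, 0x5B, 0x61, 0x7B} (A-Z and a-z), 'Z' falls in the range at index 0 and is in the set, while '[' falls in the range at index 1 and is not.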
commit a445b16824eff7203919f37267fcc3671a97d823
Author: Karl Williamson <[email protected]>
Date: Mon Jun 18 11:41:18 2012 -0600
perlop: clarify wording
M pod/perlop.pod
commit 07c391951500cc26e066d64c484398962d2db86b
Author: Karl Williamson <[email protected]>
Date: Sat Jun 16 20:02:07 2012 -0600
regcomp.c: Rename static fcn to better reflect its purpose
This function handles \N of any ilk, not just named sequences.
M embed.fnc
M embed.h
M proto.h
M regcomp.c
commit 5163624ca6288d8483ba37287628bf8fe6be0522
Author: Karl Williamson <[email protected]>
Date: Sat Jun 16 19:55:15 2012 -0600
regcomp.c: Make comment more accurate
M regcomp.c
commit 14c3c8525e0a80cb5b51f20487bdded3624dbf54
Author: Karl Williamson <[email protected]>
Date: Sat Jun 16 19:52:12 2012 -0600
regcomp.c: Can now do /u instead of forcing to utf8
Now that there is a /u modifier, a regex doesn't have to be in UTF-8 in
order to force Unicode semantics. Change this relict from the past.
M regcomp.c
commit 6178a596d1d4a8dcd3bcea284db85150d2d265ff
Author: Karl Williamson <[email protected]>
Date: Wed Jun 6 15:02:43 2012 -0600
regcomp.c: Comments update
This adds some comments and white-space lines, and updates other
comments to account for the fact that trie handling has changed since
they were written.
M regcomp.c
commit 3b005b02cf84b5d5dd46eb74ee2e754b4f74a932
Author: Karl Williamson <[email protected]>
Date: Mon May 28 10:49:37 2012 -0600
regcomp.c: Remove variable whose value needed just once
Previous commits have removed all but one instance of using this
variable, so just use the expression it equates to.
M regcomp.c
commit 06b8328217313d1f8b3aa5a6c988ea75cb204552
Author: Karl Williamson <[email protected]>
Date: Mon May 28 10:42:03 2012 -0600
regcomp.c: White-space only
This indents and outdents to compensate for newly formed and orphan
blocks, respectively, and reflows comments to fit in 80 columns.
M regcomp.c
commit e67833e34649e7ba56337f22d4aad708c726daf2
Author: Karl Williamson <[email protected]>
Date: Sun May 27 01:08:46 2012 -0600
regcomp.c: Trade stack space for time
Pass 1 of regular expression compilation merely calculates the size it
will need. (Note that Yves and I both think this is very suboptimal
behavior.) Nothing is written out during this pass; sizes are
merely incremented. The code in regcomp.c all knows this, and skips
writing things in pass 1. However, when folding, code in other files is
called which doesn't have this size-only mode, and always writes its
results out. Currently, regcomp handles this by passing to that code a
temporary buffer allocated for the purpose. In pass1, the result is
simply ignored; in pass2, the results are copied to the correct final
destination.
We can avoid that copy by making the temporary buffer large enough to
hold the whole node, and in pass1, use it instead of the node. The
non-regcomp code writes to the same relative spot in the buffer that it
will use for the real node. In pass2 the real destination is used, and
the fold gets written directly to the correct spot.
Note that this increases the size pushed onto the stack, but code is
ripped out as well.
However, the main reason I'm doing this is not this speed-up; it is
because it is needed by future commits to fix a bug.
M regcomp.c
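
The scheme described in this commit can be pictured roughly as below. This is a simplified sketch under stated assumptions, not regcomp.c: the node layout (NODE_HEADER_SIZE, MAX_NODE_STRING), the helper write_fold (a stand-in for code outside regcomp.c that always writes its result), and emit_folded_node are all hypothetical. A NULL program pointer stands for pass 1.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define NODE_HEADER_SIZE 4   /* hypothetical node header size */
#define MAX_NODE_STRING 16   /* hypothetical cap on string length */

/* Stand-in for fold code in another file: it has no size-only mode and
 * always writes its result to 'dest'.  Returns the bytes written. */
static size_t write_fold(unsigned char c, unsigned char *dest)
{
    *dest = (unsigned char)tolower(c);
    return 1;
}

/* Emit one folded literal node.  In pass 1 ('program' is NULL) the node
 * is built in a stack buffer big enough for the whole node and only its
 * size is used.  In pass 2 the fold is written straight into 'program',
 * so no copy is needed.  Returns the node size in bytes. */
size_t emit_folded_node(const char *src, size_t len, unsigned char *program)
{
    unsigned char scratch[NODE_HEADER_SIZE + MAX_NODE_STRING];
    unsigned char *node = program ? program : scratch;
    unsigned char *s = node + NODE_HEADER_SIZE;  /* same offset either way */
    size_t i;
    size_t size;

    if (len > MAX_NODE_STRING)
        len = MAX_NODE_STRING;

    for (i = 0; i < len; i++)
        s += write_fold((unsigned char)src[i], s);

    size = (size_t)(s - node);
    memset(node, 0, NODE_HEADER_SIZE);
    node[0] = (unsigned char)(size - NODE_HEADER_SIZE);  /* string length */
    return size;
}
```

A caller would invoke emit_folded_node(pattern, len, NULL) in pass 1 to learn the size, allocate the program, and then call it again with the real destination in pass 2.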
commit 5737363226e2a7826322f730d82b50f3cfb43fd0
Author: Karl Williamson <[email protected]>
Date: Sun May 27 01:04:39 2012 -0600
regcomp.c: Use mnemonic not numeric constant
Future commits will add other uses of this number.
M regcomp.c
commit a40335a726361a9e92d8e141efae42e789c3a095
Author: Karl Williamson <[email protected]>
Date: Sat May 26 22:19:22 2012 -0600
regcomp.c: Resolve EBCDIC inconsistency in favor of simpler code
This code has assumed that to_uni_fold() returns its folds in Unicode
(i.e. Latin1) rather than native EBCDIC. Other code in the core
assumes the opposite. One has to change. I'm changing this one, as the
issues should be dealt with at the lowest level possible, which is in
to_uni_fold(). Since we don't currently have an EBCDIC platform to test
on, making sure that it all hangs together will have to be deferred
until such time as we do.
By doing this we make this code simpler and faster. The fold has
already been calculated; we just need to copy it to the final place
(done in pass2).
M regcomp.c
commit b49ca2d363d72c4b0d8f145a120b83f98e375567
Author: Karl Williamson <[email protected]>
Date: Sat May 26 21:39:32 2012 -0600
regcomp.c: Use function instead of repeating its code
A new flag to to_uni_fold() causes it to do the same work that this code
does, so just call it.
M regcomp.c
commit 88ec19533fb0a74c36ab07676ef67758842e0789
Author: Karl Williamson <[email protected]>
Date: Sat May 26 14:19:18 2012 -0600
regcomp.c: Remove (almost) duplicate code
A previous commit opened the way to refactor this so that the two
fairly lengthy code blocks that are identical (except for changing the
variable <len>) can have one of them removed.
M regcomp.c
commit 57ac1bd7077929b803c91df635fde03bd3387227
Author: Karl Williamson <[email protected]>
Date: Thu May 24 22:14:04 2012 -0600
regcomp.c: Refactor so can remove duplicate code
This commit prepares the way for a later commit to remove a chunk of
essentially duplicate code. It does this at the cost of an extra
test of a boolean each time through the loop. But, it saves calculating
the fold unless necessary, a potentially expensive operation. When the
next input is a quantifier, that calculated fold is discarded, unused.
This commit avoids doing that calculation when the next input is a
quantifier.
M regcomp.c
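
The ordering change can be illustrated with a toy literal parser. This is a sketch only and does not mirror the actual loop in regcomp.c: is_quantifier, fold_literal_run, and the use of tolower() as the "fold" are hypothetical simplifications. The point is that the quantifier check comes first, so a character that will be left for the quantifier never has its fold computed.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Toy test for a quantifier character (simplified for illustration). */
static int is_quantifier(char c)
{
    return c != '\0' && strchr("*+?{", c) != NULL;
}

/* Fold a run of literal characters from 'pat' into 'out' (which needs
 * room for strlen(pat) bytes and is not NUL-terminated).  A character
 * that is followed by a quantifier is left for a later node, unless it
 * is the only character in the run.  'fold_calls' counts how many folds
 * were actually computed.  Returns the number of bytes written. */
size_t fold_literal_run(const char *pat, char *out, size_t *fold_calls)
{
    size_t n = 0;

    *fold_calls = 0;
    for (const char *p = pat; *p != '\0' && !is_quantifier(*p); p++) {
        /* Check for a following quantifier before doing any fold work. */
        if (n > 0 && is_quantifier(p[1]))
            break;
        out[n++] = (char)tolower((unsigned char)*p);
        (*fold_calls)++;
    }
    return n;
}
```

For the input "ABC*", the run is "ab" and only two folds are computed; the "C" is left untouched for the node that the quantifier applies to.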
commit 5cf91b4d07ad9c7831deccd2d5599320d52f83c9
Author: Karl Williamson <[email protected]>
Date: Thu May 24 21:39:58 2012 -0600
Revert "regcomp.c: Move duplicated code to inline function"
This reverts commit 1ceb3049131abe6184db5a55104a620ffea6958d.
M regcomp.c
commit 531b4d113fb48ee44d463e298be5e5d7138d15e7
Author: Karl Williamson <[email protected]>
Date: Sun May 6 08:10:33 2012 -0600
regcomp.c: Move duplicated code to inline function
This simply extracts the code to one function with only required
ancillary changes. Later commits will clean things up.
M regcomp.c
-----------------------------------------------------------------------
--
Perl5 Master Repository