In perl.git, the branch smoke-me/khw-regex has been created
<http://perl5.git.perl.org/perl.git/commitdiff/bccdb57e006e7c524ffd50d98a9b2ad67c6769d6?hp=0000000000000000000000000000000000000000>
at bccdb57e006e7c524ffd50d98a9b2ad67c6769d6 (commit)
- Log -----------------------------------------------------------------
commit bccdb57e006e7c524ffd50d98a9b2ad67c6769d6
Author: Karl Williamson <[email protected]>
Date: Sat Aug 11 14:56:55 2012 -0600
mktables: Rebuild if local Makefile has changed
Normally, mktables is called from the Makefile at the base level. But
during development, it may manually be called from the directory (and
hence that directory's Makefile). This patch causes it to rebuild if
that Makefile changes.
M lib/unicore/mktables
commit 1cf6472fdd8148ceb94c71e9cb4c9444eb6eaf13
Author: Karl Williamson <[email protected]>
Date: Sat Aug 11 14:30:02 2012 -0600
perlre: Nits
This fixes some grammar ("either" legally should refer to only a
dual-valued option set) and removes unnecessary distracting detail.
M pod/perlre.pod
commit 28656fb5f44f982a692b65e859e0c3525a5a7ee2
Author: Karl Williamson <[email protected]>
Date: Sat Aug 11 14:19:45 2012 -0600
regcomp.c: Optimization not valid for Latin Sharp S
The regex optimizer optimizes some quantifier expressions into simpler
versions. It turns out that these optimizations don't work on a
quantified, folded LATIN SMALL LETTER SHARP S under /d. This is due to
the size differential of the fold from the source.
This commit omits the optimization if this circumstance occurs anywhere
in the regex prior to the determination of whether to optimize or not.
I tried adding a parameter to study_chunk() to indicate more locally if
the optimization should be excluded or not; but my first attempt did not
fix the bug, and I chose to not pursue that line. This character is so
abnormal that it's probably best anyway to be overly cautious when
confronted with it.
M regcomp.c
M t/re/re_tests
commit 0defc69fbd1666434dd1df0a50fb0909ac20ad04
Author: Karl Williamson <[email protected]>
Date: Sat Aug 11 14:10:05 2012 -0600
regcomp.c: Extract duplicate code to common function
Comments warned about keeping the two code sections in sync; this commit
takes the portions that are identical and makes a common function out of
them, so the synchronization becomes automatic.
M regcomp.c
commit a249e965bed5ccb92f1431328566ba0166f49bfe
Author: Karl Williamson <[email protected]>
Date: Fri Aug 10 12:16:45 2012 -0600
regcomp.c: Make sure counter same in passes 1 and 2
The number of elements was not being incremented in pass 1, whereas that
number is needed later on in pass 1. This did not cause a
bug, as currently, in pass 1 we care only if the count is 1 or not, and
this occurred only in a case where it would get incremented properly to
more than 1 anyway. But this is a potential bug that should be
squelched before it happens.
M regcomp.c
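The hazard described above can be illustrated with a minimal sketch of a two-pass compiler, assuming a sizing pass followed by an emitting pass. The names `parse_state`, `add_element`, and `run_pass` are hypothetical and do not appear in regcomp.c; the point is only that the counter is incremented in both passes, so any decision based on it comes out the same each time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parser state; field names are illustrative only and
 * are not taken from regcomp.c. */
typedef struct {
    int    sizing_pass;   /* nonzero in pass 1 (sizing), zero in pass 2 */
    size_t element_count; /* elements seen so far in the construct */
    size_t emitted;       /* elements actually written (pass 2 only) */
} parse_state;

/* Record one element. The count is incremented in BOTH passes, so a
 * later test such as "is the count exactly 1?" gives the same answer
 * in pass 1 as in pass 2. */
static void add_element(parse_state *st)
{
    assert(st != NULL);
    st->element_count++;      /* must not be skipped in pass 1 */
    if (!st->sizing_pass) {
        st->emitted++;        /* only pass 2 produces output */
    }
}

/* Run one pass over n elements and return the resulting count. */
static size_t run_pass(int sizing_pass, size_t n)
{
    parse_state st = { sizing_pass, 0, 0 };
    for (size_t i = 0; i < n; i++) {
        add_element(&st);
    }
    return st.element_count;
}
```

With this shape, `run_pass(1, n)` and `run_pass(0, n)` return the same count for any `n`, which is the invariant the commit restores.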
commit e962c79e4d4c4f7dba77e9a8d63ff21d0c3f1154
Author: Karl Williamson <[email protected]>
Date: Fri Aug 10 11:58:49 2012 -0600
regcomp.c: Comments only
The diffs will show more than this, as a block of comments was moved and
revised.
M regcomp.c
commit 1e0a9dae5e8b5db5328f863c05e02ac57d25f070
Author: Karl Williamson <[email protected]>
Date: Fri Aug 10 11:53:20 2012 -0600
regcomp.c: Use old paradigm in dealing with flags recursively
In a recursive call to reg(), instead of passing our flags pointer, pass
a new one and, upon return, OR that result in with the existing one. I
can see why this should be done, as you don't want to lose what you
already have, as reg() will start by resetting it to 0. I don't know
why one ANDs it with the known flags, but I'm presuming there is a
reason, and so am copying the paradigm. I searched the commit messages
and didn't find anything. No tests failed, and I didn't figure out a
test that would fail.
M regcomp.c
commit ccc72993830d29f3a0b182d834687b0bbfdbba80
Author: Karl Williamson <[email protected]>
Date: Fri Aug 10 09:11:11 2012 -0600
regcomp.c: Create NOTHING node when would have been 0 length EXACT
It would take peculiar circumstances indeed to get to this point in the
code with an EXACT node to be created, but nothing to populate it with.
Perhaps it is impossible; I'm not sure. But commit
5f820f894e71b6970a5aa0fd763a84b647fd628a changed the behavior, which I
discovered in later re-reading the code. Probably the node would be
populated with a single NUL. Just in case it is possible to get here
under these peculiar circumstances, this commit adds code to handle the
case, with a NOTHING node instead of a 0 length EXACT.
M regcomp.c
commit 26b296380651d3272254ebab007f6493b2b72033
Author: Karl Williamson <[email protected]>
Date: Thu Aug 9 14:38:03 2012 -0600
regcomp.c: Set flags when optimizing a [char class]
A bracketed character class containing a single Latin1-range character
has long been optimized into an EXACT node. Also, flags are set to
include SIMPLE. However, EXACT nodes containing code points that are
different when encoded under UTF-8 versus not UTF-8 should not be marked
simple.
To fix this, the address of the flags parameter is now passed to
regclass(), the function that parses bracketed character classes, which
now sets it appropriately. The unconditional setting of SIMPLE that was
always done in the code after calling regclass() has been removed.
In addition, the setting of the flags for EXACT nodes has been pushed
into the common function that populates them.
regclass() will also now increment the naughtiness count if optimized to
a node that normally does that. I do not understand this heuristic
behavior very well, and could not come up with a test case for it;
experimentation revealed that there are no test cases in our test suite
for which naughtiness makes any difference at all.
M embed.fnc
M embed.h
M proto.h
M regcomp.c
M t/re/pat.t
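The rule about SIMPLE can be sketched as follows, assuming an ASCII platform, where code points 0x80 through 0xFF occupy one byte natively but two bytes in UTF-8. The flag values and the function names `len_native`, `len_utf8`, and `exact_node_flags` are hypothetical and are not the ones used in regcomp.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits; the real values in regcomp.c may differ. */
#define F_HASWIDTH 0x01
#define F_SIMPLE   0x02

/* Bytes needed for a Latin1-range code point in the native one-byte
 * encoding (always 1). */
static size_t len_native(unsigned cp)
{
    (void)cp;
    return 1;
}

/* Bytes needed for the same code point in UTF-8, assuming an ASCII
 * platform: 1 below 0x80, otherwise 2 for the range 0x80..0xFF. */
static size_t len_utf8(unsigned cp)
{
    return cp < 0x80 ? 1 : 2;
}

/* Flags for a single-character EXACT node: it always has width, but is
 * marked SIMPLE only if its representation is the same length whether
 * or not the pattern is encoded in UTF-8. */
static int exact_node_flags(unsigned cp)
{
    int flags = F_HASWIDTH;

    assert(cp <= 0xFF);
    if (len_native(cp) == len_utf8(cp)) {
        flags |= F_SIMPLE;
    }
    return flags;
}
```

Under these assumptions, `exact_node_flags('a')` includes `F_SIMPLE`, while `exact_node_flags(0xE9)` does not.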
commit b380cd2d9d45191a6659787888ae4c9f8acfb764
Author: Karl Williamson <[email protected]>
Date: Tue Aug 7 21:06:06 2012 -0600
regcomp.c: change pattern to utf8 if needed in \N{}
This patch is in preparation for future patches that will no longer
always make any pattern that contains \N{} be encoded in UTF-8. Thus
this patch doesn't actually change anything, but enables future ones.
M regcomp.c
commit 5074eb91a15bb7fec8137d0991202f13150d89a1
Author: Karl Williamson <[email protected]>
Date: Mon Aug 6 16:42:27 2012 -0600
re/re_tests: Correct Todo test
This test was not doing what it purported to test. It should show that
/[s\xDF]/i would not match 'ss' in the appropriate strings, because the
's' in the class is what is seen, not the \xDF (which matches 'ss' under
/i).
M t/re/re_tests
commit 3cbbeebfd8f9c69dcebecfe248ce3d845c2c763c
Author: Karl Williamson <[email protected]>
Date: Sat Aug 4 11:02:16 2012 -0600
re.pm: Nits in pod
This has clarifications, grammar changes, and reflowing to fit into 79
columns.
M ext/re/re.pm
M t/porting/known_pod_issues.dat
commit 22354e6e9d5866a13ca443c68d10b3ee20c2318c
Author: Karl Williamson <[email protected]>
Date: Thu Aug 2 10:50:00 2012 -0600
Add some tests for [\N{}]
M t/re/pat_advanced.t
-----------------------------------------------------------------------
--
Perl5 Master Repository