In perl.git, the branch khw/tricky has been created
<http://perl5.git.perl.org/perl.git/commitdiff/2373c1b5d1361ab8bbe954fc8234512fd554e7e7?hp=0000000000000000000000000000000000000000>
at 2373c1b5d1361ab8bbe954fc8234512fd554e7e7 (commit)
- Log -----------------------------------------------------------------
commit 2373c1b5d1361ab8bbe954fc8234512fd554e7e7
Author: Karl Williamson <[email protected]>
Date: Sun Dec 25 14:42:06 2011 -0700
re/reg_fold.t: Add and revise comments
M t/re/reg_fold.t
commit 316393f0b9d30be1862abb6ee1eed22bbaa8b55b
Author: Karl Williamson <[email protected]>
Date: Sun Dec 25 14:35:54 2011 -0700
reg_fold.t: Test bracketed character classes
These were removed when things were very broken, but now they work,
except for things like
"\N{LATIN SMALL LIGATURE FFI}" =~ /[a-z]{3}/i
where the multi-char fold crosses single bracketed character class
boundaries. These will probably never be fixed in Perl in the general
case (\F and fc() should be used instead), but I expect that
"\N{LATIN SMALL LIGATURE FFI}" =~ /[f][f][i]/i
will eventually be changed so that the brackets are optimized away, and
then it will work. At that point these TODOs will start passing.
M t/re/reg_fold.t
commit 7b7ef1799255a4a4654ab0d681144b1ed9f2d3f6
Author: Karl Williamson <[email protected]>
Date: Sun Dec 25 14:32:56 2011 -0700
re/reg_fold.t: Test more code points
A comment wrongly stated that all these code points are tested in
fold_grind.t. That file tests them all only when run with a particular
option; otherwise, for time reasons, it skips many code points.
reg_fold.t, on the other hand, does just basic sanity testing, and so
should always test every code point for that.
M t/re/reg_fold.t
commit 526c964465a4dc7ee07f4f5566d4137d0062decb
Author: Karl Williamson <[email protected]>
Date: Sun Dec 25 14:30:20 2011 -0700
re/reg_fold.t: Remove fixed TODOs
For the most part, these TODOs have not been tested for a while.
M t/re/reg_fold.t
commit 8d56eeaba1f0cb755e32bc94a96998e1e9b8a1e4
Author: Karl Williamson <[email protected]>
Date: Sun Dec 25 14:24:42 2011 -0700
re/reg_fold.t: Use /u rules for Unicode tests
These tests are for Unicode, so should have /u (instead of /d).
M t/re/reg_fold.t
commit 13b03be875f7cff6c86906f8145d1c1e304819be
Author: Karl Williamson <[email protected]>
Date: Sun Dec 25 14:20:42 2011 -0700
regcomp.c: Refactor join_exact() to eliminate extra passes
The strings in every EXACTFish node are examined for certain problematic
sequences and code points. Prior to this patch, this was done in
several passes; this patch refactors the routine to do it in a single
pass.
M embed.fnc
M embed.h
M proto.h
M regcomp.c
commit 8946931a8659c48464c570af7b3dffb5f68aa043
Author: Karl Williamson <[email protected]>
Date: Sun Dec 25 14:18:55 2011 -0700
regcomp.c: Modify some comments
M regcomp.c
commit d1973fb998d4bef100f19357dabf47dc71bb49f0
Author: Karl Williamson <[email protected]>
Date: Fri Dec 23 20:19:27 2011 -0700
regex: Remove FOLDCHAR regnode type
This node type hasn't been used since 5.14.0. Instead, an ANYOFV node
was generated wherever a FOLDCHAR node formerly would have been. ANYOFV
was used because it already existed and was up-to-date, whereas FOLDCHAR
would have needed some bug fixes to adapt it; but because FOLDCHAR would
execute faster than ANYOFV, its code was retained in case it was needed.
However, both these solutions were defective, and a previous commit has
changed things to an entirely different type of solution. Thus FOLDCHAR
is obsolete and can be removed, though its code was used as a base for
some of the new solutions.
M regcomp.c
M regcomp.sym
M regexec.c
M regnodes.h
commit 1ad0d7f33588302f27cc3eecf5a4d102762137b1
Author: Karl Williamson <[email protected]>
Date: Fri Dec 23 20:11:22 2011 -0700
regex: Fix some tricky fold problems
As described in the comments, this changes the design of handling the
Unicode tricky fold characters: instead of generating a node for each
possible sequence, it gets them to work within EXACTFish nodes.
The previous design(s) all used a separate node to handle these, which
has the drawback of precluding legitimate matches that would cross the
node boundary.
M regcomp.c
M regcomp.h
M t/re/re_tests
commit 3197feb2dbc98026252aed7c98a6cf46118a1358
Author: Karl Williamson <[email protected]>
Date: Fri Dec 23 19:46:10 2011 -0700
regcomp.c: Rework join_exact()
This reformats and refactors portions of join_exact() that look for the
tricky Greek fold sequences. I renamed various variables, etc., to help
me understand what was going on. It turns out that there were two
off-by-one bugs that prevented this from working properly.
The first bug made the loop quit one iteration too soon: the boundary
should be "<=", not strictly less-than. As a result, if the sequence was
the last thing in the string (or the only thing), it would not be found.
The second bug made the end-needle parameter one too short, so that this
would succeed with only the first 3 bytes of the sequence (now called
'tail'), thus matching many more things than it should (provided it got
the chance to match at all, given the first bug).
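A minimal sketch of the two boundary conditions, using a hypothetical
standalone search helper, find_seq(), rather than the actual
join_exact() code. For an n-byte needle in an slen-byte string, the last
valid starting offset is slen - n, so the loop must run while
i <= slen - n, and the comparison must cover all n bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the actual join_exact() code: return the
 * offset of the first occurrence of `needle` (nlen bytes) in `s`
 * (slen bytes), or -1 if it does not occur. */
static long find_seq(const char *s, size_t slen,
                     const char *needle, size_t nlen)
{
    assert(nlen > 0);
    if (nlen > slen)
        return -1;

    /* The last valid start is slen - nlen, so the bound must be "<=".
     * With "<", a sequence at the very end of the string (or one that
     * is the whole string) would never be found. */
    for (size_t i = 0; i <= slen - nlen; i++) {
        /* Compare all nlen bytes; comparing one byte fewer would also
         * accept strings that merely share a prefix with the needle. */
        if (memcmp(s + i, needle, nlen) == 0)
            return (long)i;
    }
    return -1;
}
```

For example, find_seq("xxabc", 5, "abc", 3) returns 2, a match that a
strict less-than bound would miss.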
M regcomp.c
commit 20483d45a6c99ac58d8c7e727e3aee01a3149273
Author: Karl Williamson <[email protected]>
Date: Fri Dec 23 19:37:36 2011 -0700
regex: Add new node type EXACTFU_NO_TRIE
This new node is like EXACTFU but is not currently trie'able. This adds
handling for it in regexec.c, but it is not currently generated; this
commit prepares for future commits.
M regcomp.c
M regcomp.sym
M regexec.c
M regnodes.h
commit 2a9220c17c855b18663a9b5c4c7acdfee98712a6
Author: Karl Williamson <[email protected]>
Date: Fri Dec 23 19:30:09 2011 -0700
regex: Add new node type EXACTFU_SS
This node will be used, when neither the pattern nor the target string
is UTF-8, to distinguish the case where the pattern can match strings of
different lengths. The only instance where this can happen is LATIN
SMALL LETTER SHARP S, which can match the sequences "ss", "Ss", "sS", or
"SS"; hence the name.
This node is not currently generated; this prepares for future commits.
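To illustrate why the match length varies, here is a hypothetical C
helper (not taken from regexec.c) that reports how many bytes of a
Latin-1 target string a sharp s in the pattern would consume under
Unicode case-insensitive rules: one for the byte 0xDF itself, two for
any case variant of "ss".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: how many bytes does LATIN SMALL LETTER SHARP S
 * (0xDF) consume at s[i] in a non-UTF-8 (Latin-1) target, under /iu
 * rules?  Returns 1 for the byte 0xDF, 2 for "ss", "sS", "Ss" or "SS",
 * and 0 for no match. */
static size_t sharp_s_match_len(const unsigned char *s, size_t len,
                                size_t i)
{
    assert(i <= len);
    if (i < len && s[i] == 0xDF)
        return 1;
    /* OR-ing in 0x20 maps ASCII 'S' to 's' and leaves 's' unchanged;
     * no other byte maps to 's' this way. */
    if (len - i >= 2 && (s[i] | 0x20) == 's' && (s[i + 1] | 0x20) == 's')
        return 2;
    return 0;
}
```

The two possible nonzero results (1 and 2) are the variable length that
a dedicated node type would need to flag.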
M regcomp.c
M regcomp.sym
M regexec.c
M regnodes.h
commit f8b1e2fade1b5046a8f1d6493876ac4023c6dc9c
Author: Karl Williamson <[email protected]>
Date: Fri Dec 23 19:13:24 2011 -0700
regcomp.c: Need to account for delta sizes
When a node can match strings of varying lengths, the delta variable in
the optimizer needs to change to account for that, and the node can no
longer be treated as matching a fixed-length string.
This code was adapted from the existing code for the FOLDCHAR node that
has to deal with the same problem.
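A sketch of the bookkeeping involved, using a hypothetical struct rather
than regcomp.c's actual scan data: each node adds its minimum length to
the running minimum, and the difference between its maximum and minimum
to the delta. A sharp s under /iu, for instance, can match 1 or 2
characters, so it would add 1 to each.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of what an optimizer tracks for a node sequence:
 * the minimum number of characters matched, and the "delta", i.e. the
 * extra characters beyond that minimum a match might consume. */
struct len_info {
    size_t min;
    size_t delta;
};

/* Account for one node that can match node_min..node_max characters. */
static void add_node(struct len_info *info, size_t node_min,
                     size_t node_max)
{
    assert(node_min <= node_max);
    info->min += node_min;
    info->delta += node_max - node_min;
}

/* Only a sequence with zero delta matches a fixed-length string. */
static bool is_fixed_length(const struct len_info *info)
{
    return info->delta == 0;
}
```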
M regcomp.c
commit a9d15093c69d925c66bb4067fc5eb3fa74a43344
Author: Karl Williamson <[email protected]>
Date: Fri Dec 23 18:51:45 2011 -0700
regcomp.c: Change param to join_exact()
This changes a parameter of this function so that, instead of updating
a running total, it returns the actual value computed by the function;
it also changes the calling code to compensate.
M embed.fnc
M proto.h
M regcomp.c
commit 8b516b3ea741601171eac6e8de45e0340c9e0ca0
Author: Karl Williamson <[email protected]>
Date: Fri Dec 23 16:58:31 2011 -0700
perlunicode: nit
M pod/perlunicode.pod
commit 5791593ce7a0466e027f8f0be7e9ab80c1fc22e4
Author: Karl Williamson <[email protected]>
Date: Fri Dec 23 12:24:09 2011 -0700
regcomp.c: regex start class for sharp s
Under most folding types, the optimizer start class should include all
of s, S, and the sharp s (\xdf) if it includes any of them. The code
was neglecting the latter. This is currently not relevant, as there is
special handling of the sharp s elsewhere in regcomp.c. But this is a
step to changing that special handling to fix some bugs.
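A hypothetical sketch (not regcomp.c's actual data structures) of a
256-bit start-class bitmap for non-UTF-8 targets, in which adding any
of s, S, or sharp s (0xDF) adds all three. Over-inclusion is safe, since
a start class only filters candidate starting positions; omitting 0xDF
would wrongly skip positions where a match could begin.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical start class: bit c is set if a match may begin with
 * byte c of a non-UTF-8 target. */
typedef struct { unsigned char bits[32]; } start_class;

static void sc_set(start_class *sc, unsigned char c)
{
    sc->bits[c >> 3] |= (unsigned char)(1u << (c & 7));
}

static bool sc_test(const start_class *sc, unsigned char c)
{
    return (sc->bits[c >> 3] >> (c & 7)) & 1u;
}

/* Add c under case-insensitive rules.  s, S and sharp s are treated as
 * one group, since 0xDF folds to "ss". */
static void sc_add_folded(start_class *sc, unsigned char c)
{
    if (c == 's' || c == 'S' || c == 0xDF) {
        sc_set(sc, 's');
        sc_set(sc, 'S');
        sc_set(sc, 0xDF);
    } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) {
        sc_set(sc, (unsigned char)(c | 0x20));   /* lowercase form */
        sc_set(sc, (unsigned char)(c & ~0x20));  /* uppercase form */
    } else {
        sc_set(sc, c);
    }
}
```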
M regcomp.c
commit 1e4ce0b4ef6f349abb2cc48006c4823839665d23
Author: Karl Williamson <[email protected]>
Date: Fri Dec 23 08:48:07 2011 -0700
regcomp.c: white-space only and comments only
M regcomp.c
commit 9d0bd308bcede234634a1749efe48171092bee4b
Author: Karl Williamson <[email protected]>
Date: Fri Dec 23 08:42:17 2011 -0700
regcomp.c: Save computed value in variable for later use
This will be used in future commits. Retrieving it via OP() doesn't
work in pass1 of the regex compiler.
M regcomp.c
commit 5ac5d9023ba9865ab255d9da42611d9f705eb9b8
Author: Karl Williamson <[email protected]>
Date: Thu Dec 22 20:09:11 2011 -0700
regcomp.c: Make sure trie can handle node passed to it
M regcomp.c
commit 7f0c80d29600ee862e626a53799cdb8745ee9688
Author: Karl Williamson <[email protected]>
Date: Thu Dec 22 20:03:55 2011 -0700
regexec.c: white space only
M regexec.c
commit dbd139cd57ea313ff632ca940d311a211e2895d4
Author: Karl Williamson <[email protected]>
Date: Thu Dec 22 19:51:37 2011 -0700
regexec.c: EXACTF nodes can never be UTF
By definition a regex pattern that is in UTF-8 uses Unicode matching
rules, and EXACTF is non-Unicode (unless the target string is UTF-8).
Therefore an EXACTF node will never be generated for a UTF-8 pattern,
and there is no need to test for it being so.
M regexec.c
commit aa157951fcf655cf0c8bc8cc644e4d01277d36cd
Author: Karl Williamson <[email protected]>
Date: Thu Dec 22 17:58:20 2011 -0700
regcomp.c: Silence valgrind warning
This happens only when producing debug output. Initialize the two
debugging variables involved.
M regcomp.c
commit 78dee5f2d1bd1184e5038416777b3681b8eadfe9
Author: Karl Williamson <[email protected]>
Date: Thu Dec 22 14:29:12 2011 -0700
regexp_noamp.t: Add comment
M t/re/regexp_noamp.t
commit e468997e35db3e4be53adfd0111bd95fea55787c
Author: Karl Williamson <[email protected]>
Date: Wed Dec 21 09:57:43 2011 -0700
t/re/re_tests: Add some tests
M t/re/re_tests
commit d69b77426ba803ae2194eec18b6d0821fa72a751
Author: Karl Williamson <[email protected]>
Date: Wed Dec 21 09:54:38 2011 -0700
t/re/re_tests: revise test
This is the wrong test for the cited ticket, which concerns bracketed
character classes.
M t/re/re_tests
commit cc3a798902d3116960c536f616194a668ee32f4e
Author: Karl Williamson <[email protected]>
Date: Wed Dec 21 09:53:41 2011 -0700
t/re/re_tests: Update comment
This reflects that, now that \N{} is autoloaded, such tests can go in
this file.
M t/re/re_tests
commit 1a071a476b71f6295442e624a5bd6f544a2f8f6b
Author: Karl Williamson <[email protected]>
Date: Tue Dec 20 09:28:47 2011 -0700
util.c: Add comment
M util.c
commit bc417572dc9080f3602a3cf498141ff3e70b44ed
Author: Karl Williamson <[email protected]>
Date: Sun Dec 18 13:27:06 2011 -0700
regcomp.c: Don't print incorrect debug info
The break out of the loop should be done before the debug statements
that indicate the things that happen only if the break isn't done.
M regcomp.c
commit 39be665482910c9448d2b8abd039492f9193efed
Author: Karl Williamson <[email protected]>
Date: Sun Dec 18 12:22:11 2011 -0700
regcomp.sym: Change comments
M regcomp.sym
M regnodes.h
-----------------------------------------------------------------------
--
Perl5 Master Repository