In perl.git, the branch smoke-me/khw-regcomp has been created
<http://perl5.git.perl.org/perl.git/commitdiff/f7e44c8b378454a00689de6778ffe7eeabe3143a?hp=0000000000000000000000000000000000000000>
at f7e44c8b378454a00689de6778ffe7eeabe3143a (commit)
- Log -----------------------------------------------------------------
commit f7e44c8b378454a00689de6778ffe7eeabe3143a
Author: Karl Williamson <[email protected]>
Date: Mon Feb 15 20:59:10 2016 -0700
perlapi: Hide the swash functions
These should be internal only, and we may want to get rid of them
someday. Hide their existence so that people who don't already know
about them won't be tempted to try to use them.
M embed.fnc
commit b63afe405a6a338988fe2f537c896cc86262e043
Author: Karl Williamson <[email protected]>
Date: Mon Feb 15 20:32:32 2016 -0700
regcomp.h: Not all ANYOF flags are in use.
So, it's better to not have a mask to include the unused ones.
M regcomp.h
commit 778a2478d061bf88af552f05230eceabd931a0c0
Author: Karl Williamson <[email protected]>
Date: Tue Dec 29 22:48:09 2015 -0700
regcomp.c: Extract code to a separate function
This is in preparation for the next commit, where it will be called from
a second place.
M embed.fnc
M embed.h
M proto.h
M regcomp.c
commit 7bba5830a1ba04f88e96a1836dcbac85cda89d6b
Author: Karl Williamson <[email protected]>
Date: Mon Feb 15 17:35:22 2016 -0700
regcomp.c: Rmv unnecessary tests
This tested some flag bits, but these are guaranteed to be set by the
first test in the 'if'.
M regcomp.c
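
The kind of redundancy removed in 7bba5830 can be sketched as follows.
This is only an illustration: the flag names and functions are
hypothetical and are not the ones tested in regcomp.c.

```c
#include <stdbool.h>

/* Hypothetical flag bits, for illustration only. */
#define FLAG_A    0x01u
#define FLAG_B    0x02u
#define FLAG_BOTH (FLAG_A | FLAG_B)

/* Before: once the first test passes, the other two cannot fail. */
bool check_verbose(unsigned flags)
{
    return (flags & FLAG_BOTH) == FLAG_BOTH
        && (flags & FLAG_A) != 0
        && (flags & FLAG_B) != 0;
}

/* After: the first test alone carries the same information. */
bool check_simplified(unsigned flags)
{
    return (flags & FLAG_BOTH) == FLAG_BOTH;
}
```

Both functions return the same result for every combination of the two
bits, so dropping the extra tests does not change behavior.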
commit 5964444fe879fa0a3647de42582e14e10ed85e4d
Author: Karl Williamson <[email protected]>
Date: Mon Feb 15 16:27:20 2016 -0700
regcomp.c: Save a branch test
This branch can only be true if the previous branch was also true, so
it can simply be moved inside that branch to avoid an unnecessary
test.
M regcomp.c
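
A minimal sketch of the restructuring described in 5964444f, using
made-up conditions (the real ones in regcomp.c are not shown in this
log). The second condition implies the first, so nesting it skips a
test whenever the first one fails.

```c
/* Hypothetical conditions: n > 10 can only hold when n > 0 holds. */

/* Before: the second test runs even when the first one failed. */
int classify_flat(int n)
{
    int result = 0;
    if (n > 0)
        result += 1;
    if (n > 10)
        result += 2;
    return result;
}

/* After: the second test runs only inside the first branch. */
int classify_nested(int n)
{
    int result = 0;
    if (n > 0) {
        result += 1;
        if (n > 10)
            result += 2;
    }
    return result;
}
```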
commit 68554e0af843a9d222b88bffdcc188a24dff621c
Author: Karl Williamson <[email protected]>
Date: Mon Feb 15 16:20:43 2016 -0700
regcomp.c: Clarify -Dr output under /l
It is now redundant to indicate that an ANYOF node is for locale, as the
regnode type ANYOFL now clearly indicates that. But also sometimes the
node is only valid if the runtime locale is a UTF-8 one. That was not
clearly indicated.
M regcomp.c
commit 3eeac7a7cb5c60f6d4a86333ac3e18fa1ca21240
Author: Karl Williamson <[email protected]>
Date: Mon Feb 15 16:13:29 2016 -0700
regcomp.c: Rmv unnecessary -Dr output
The previous commit removed all ambiguity as far as the 2nd [] in the
-Dr output of a bracketed character class, so we can remove the
clarification text, which is unnecessary, and clutters up the output.
The text must be kept, however, in the case where the expression is
applicable only when the target being matched against is UTF-8.
M regcomp.c
commit f4c28635c09635db6252fcf22abdee4b82b7249f
Author: Karl Williamson <[email protected]>
Date: Mon Feb 15 15:41:46 2016 -0700
regcomp.c: -Dr output move
This finishes the process, begun several commits ago, of moving the
output of what happens when the locale is UTF-8 into the first bracketed
class expression in -Dr output. This output is thus now accurate when
the class is marked as inverted.
M regcomp.c
commit 48810df50735caf5ba0fe87e4ea9a7bd7bd23d19
Author: Karl Williamson <[email protected]>
Date: Mon Feb 15 15:37:24 2016 -0700
regcomp.c: -Dr: Add a pipe symbol for clarity
This output of what gets compiled is the OR of the two [] bracketed
expressions. Add a '|' to indicate that. Otherwise, it would legally
mean one expression followed by the other.
M regcomp.c
commit 049c754731a1b50362434f2016fc57a22022ae90
Author: Karl Williamson <[email protected]>
Date: Mon Feb 15 15:06:49 2016 -0700
Explicitly show the chars in -Dr for which UTF-8-ness matters
Prior to this commit, when displaying what a pattern compiles to,
general text was used to indicate that the characters \x80 to \xFF all
match when the target being matched is not UTF-8, while some of them
match under UTF-8 as well. This commit makes the display explicit,
showing precisely for which ones UTF-8-ness matters.
M regcomp.c
commit 1c8498f13ccc407e6bc168c587a810cc7a33962a
Author: Karl Williamson <[email protected]>
Date: Sun Feb 14 10:33:31 2016 -0700
later
M regcomp.c
commit 657de4ab36366cba93eb2aa83d82ab8ee8e4e06a
Author: Karl Williamson <[email protected]>
Date: Sun Feb 14 10:26:46 2016 -0700
regcomp.c: Output XXX
M regcomp.c
commit 3d9c3c4bcb2323353f1c301d92c7c24fbf80f5ba
Author: Karl Williamson <[email protected]>
Date: Sat Feb 13 18:25:10 2016 -0700
regcomp.c: Move some -Dr output
Under -Dr compilation output, there can be multiple [...][...]
displayed. Some items are output to show the matches that would be
valid when the current locale is a UTF-8 one, and they currently aren't
displayed in the first [...]. But they should be, for the case where
the class is inverted. For example /[^aQ]/li should display as
[^aQ{utf8 locale}Aq]. Not having them in the first [ ] runs afoul of De
Morgan's laws and could be misleading.
This commit doesn't get them all the way there, but it is the first step
in doing so.
M regcomp.c
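
The De Morgan's laws point in 3d9c3c4b can be shown with a small model
of /[^aQ]/li. This is a simplified sketch, not the engine's code: the
function names are hypothetical, and the model ignores the case folding
a non-UTF-8 locale would itself apply at run time.

```c
#include <stdbool.h>
#include <string.h>

/* True if c is one of the characters in set (the NUL byte never is). */
bool in_set(const char *set, char c)
{
    return c != '\0' && strchr(set, c) != NULL;
}

/* Correct reading of [^aQ{utf8 locale}Aq]: invert the union of lists. */
bool matches_inverted(char c, bool utf8_locale)
{
    bool listed = in_set("aQ", c) || (utf8_locale && in_set("Aq", c));
    return !listed;
}

/* Misleading reading of [^aQ][Aq]: invert only the first list, then
 * OR in the second. */
bool matches_misread(char c, bool utf8_locale)
{
    return !in_set("aQ", c) || (utf8_locale && in_set("Aq", c));
}
```

Under a UTF-8 locale, 'A' is rejected by matches_inverted() but accepted
by matches_misread(), which is why the locale-dependent items belong
inside the first, inverted, bracket.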
commit b2a0e25758f677731e96f509ffa3918b7d5a6b74
Author: Karl Williamson <[email protected]>
Date: Sat Feb 13 18:07:32 2016 -0700
regcomp.c: No need to truncate some -Dr output
When displaying what a /i regex pattern compiled into, in the case of
some that are based on the current locale, certain matches are known to
occur when the locale is a UTF-8 one. These are listed separately from
the other ones in the display, and there has been code to truncate it if
it gets too big. However, it can't ever get too large, as the only
things in it are the alphabetics in the 0-FF range, as everything above
that doesn't vary by locale. So the worst case is not very large.
M regcomp.c
commit 94dd3f3f3ab531f1cd9d086f81ff4d0976de002c
Author: Karl Williamson <[email protected]>
Date: Sat Feb 13 18:00:36 2016 -0700
regcomp.c: Comments, white-space, add grouping () for clarity
M regcomp.c
commit 782ac453836e593b0572af5683dda77acfdfcf0e
Author: Karl Williamson <[email protected]>
Date: Mon Feb 15 11:02:07 2016 -0700
Cast correctly to U8, not char
U8 is what the function being called is expecting
M regcomp.c
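
A sketch of the kind of cast fixed in 782ac453. The callee here is
hypothetical, and U8 is redeclared locally as unsigned char (which is
how perl defines it) so the example stands alone.

```c
#include <stddef.h>

typedef unsigned char U8;   /* local stand-in for perl's U8 */

/* Hypothetical callee that expects unsigned bytes. */
size_t count_high_bytes(const U8 *s, size_t len)
{
    size_t i, n = 0;
    for (i = 0; i < len; i++) {
        if (s[i] >= 0x80)
            n++;
    }
    return n;
}

/* The caller holds a char buffer; the cast names the type the callee
 * actually takes, rather than char. */
size_t count_high_in_string(const char *str, size_t len)
{
    return count_high_bytes((const U8 *) str, len);
}
```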
commit 7d772c3f2a700b41d5322c35013ea789def283d8
Author: Karl Williamson <[email protected]>
Date: Sat Feb 13 17:21:28 2016 -0700
regcomp.c: Simplify a few lines of code
This code had been written before the isMNEMONIC_CNTRL() macro was
created. Using the macro simplifies things a little.
M regcomp.c
commit 82be73e79610a7f694fba85140c1fe2ec7773421
Author: Karl Williamson <[email protected]>
Date: Sat Feb 13 15:51:50 2016 -0700
regcomp.c: Clean up logic in function
This function uses some crude heuristics to decide whether to make a
synthetic start class or not. This commit removes some redundancies.
M regcomp.c
commit 4f208f2f6b366e3205ace32b485fa29e93b1ac70
Author: Karl Williamson <[email protected]>
Date: Sat Feb 13 15:03:51 2016 -0700
Add environment variable for -Dr: PERL_DUMP_RE_MAX_LEN
The regex engine when displaying debugging info, say under -Dr, will elide
data in order to keep the output from getting too long. For example,
the number of code points in all of Unicode matched by \w is quite
large, and so when displaying a pattern that matches this, only an
initial portion of them is printed, and the rest are truncated,
represented by "...".
Sometimes, one wants to see more than what the
compiled-into-the-engine-max shows. This commit creates code to read
this environment variable to override the default max lengths. This
changes the lengths for everything to the input number, even if they
have different compiled maximums in the absence of this variable.
I'm not currently documenting this variable, as I don't think it works
properly under threads, and we may want to alter the behavior in various
ways as a result of gaining experience with using it.
M embedvar.h
M intrpvar.h
M regcomp.c
M regcomp.h
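
One way an override like PERL_DUMP_RE_MAX_LEN could be read is sketched
below. This is not the code added in 4f208f2f: the function names are
hypothetical, and falling back to the compiled-in default for empty,
zero, or non-numeric values is an assumption of this sketch.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Returns the value to use as a display limit: the override when it is
 * a plain positive decimal number, otherwise the compiled-in default. */
size_t dump_max_len(const char *env_value, size_t compiled_default)
{
    char *end;
    unsigned long n;

    /* Unset, empty, signed, or non-numeric input keeps the default. */
    if (env_value == NULL || !isdigit((unsigned char) *env_value))
        return compiled_default;

    errno = 0;
    n = strtoul(env_value, &end, 10);
    if (errno != 0 || *end != '\0' || n == 0)
        return compiled_default;

    return (size_t) n;
}

/* Convenience wrapper that consults the environment. */
size_t dump_max_len_from_env(size_t compiled_default)
{
    return dump_max_len(getenv("PERL_DUMP_RE_MAX_LEN"), compiled_default);
}
```

Because the commit applies the input number to every limit, each call
site in such a scheme would pass its own compiled-in default and get
the same override back whenever the variable is set.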
commit 2c7be3e53a7f3611248e4d3946c2f9fdc4a8bc1c
Author: Karl Williamson <[email protected]>
Date: Thu Feb 11 10:25:04 2016 -0700
regcomp.c: -Dr \xZZ instead of \x{ZZ}
The brackets are unnecessary and clutter the output.
M regcomp.c
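
The output change in 2c7be3e5 amounts to choosing the shorter escape
when two hex digits suffice. The sketch below is hypothetical, not the
regcomp.c code; keeping braces for values above 0xFF and using
upper-case hex digits are assumptions.

```c
#include <stdio.h>

/* Writes an escape for cp into buf: \xZZ for values that fit in two
 * hex digits, \x{...} otherwise. Returns what snprintf returns. */
int format_hex_escape(char *buf, size_t size, unsigned cp)
{
    if (cp <= 0xFF)
        return snprintf(buf, size, "\\x%02X", cp);
    return snprintf(buf, size, "\\x{%X}", cp);
}
```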
commit cb67236ae27307f68463452ad9e337c8ea3c4a7a
Author: Karl Williamson <[email protected]>
Date: Thu Feb 11 10:12:57 2016 -0700
regcomp.c: Fix -Dr bug
It was using a wrong length calculation, which under some circumstances
caused the output to include extra bytes. Also I added comments, and
changed a variable name, so I don't have to figure this out again from
scratch.
M regcomp.c
commit 9c05439ce1b2b32caeb29ebbf97fde4983b96348
Author: Karl Williamson <[email protected]>
Date: Mon Feb 15 11:25:35 2016 -0700
XXX need tests, comments Fix /\p{User-defined}/i
M regcomp.c
M t/re/pat_advanced.t
commit bd6e8e74b959d807bb5ba4d158874829b3a1e02d
Author: Karl Williamson <[email protected]>
Date: Mon Feb 15 11:04:36 2016 -0700
regcomp.c: Use macro to hide complexity
There is an existing macro that does these three lines in one source
line.
M regcomp.c
commit 66b1baaef5a9a386fa5f9bed92866c9a2ca795e5
Author: Karl Williamson <[email protected]>
Date: Sat Feb 13 13:49:00 2016 -0700
Don't allow /\N{}/ under 're strict'
This is the one remaining empty {} that was accepted under the
experimental 'use re "strict"'.
M embed.fnc
M embed.h
M pod/perldelta.pod
M pod/perldiag.pod
M proto.h
M regcomp.c
M t/re/reg_mesg.t
commit d105faa657594560f4116a4da47f4729f2c12186
Author: Karl Williamson <[email protected]>
Date: Sat Feb 13 15:20:49 2016 -0700
perlrecharclass: Add some missing info
M pod/perlrecharclass.pod
commit 5dd4d5811bed63a0c533d875032638fc314985b7
Author: Karl Williamson <[email protected]>
Date: Sat Feb 13 15:35:11 2016 -0700
PATCH: [perl 127537] /\W/ regression with UTF-8
This bug is apparently uncommon in the field, as I was the one who
discovered it. It requires a complemented posix class, like \W or \S,
in an inverted character class, like [^\Wfoo] in a pattern that also has
a synthetic start class generated by the regex optimizer for it.
The fix is trivial.
M pod/perldelta.pod
M regcomp.c
M t/re/re_tests
commit 6d26347314b6c6a2fe517443924a5c3bd4e1ef25
Author: Karl Williamson <[email protected]>
Date: Sat Feb 13 11:53:50 2016 -0700
regcomp.c, toke.c: swap functions being inline static
grok_bslash_x() is so large that no compiler will inline it. Move it to
dquote.c from dq_inline.c. Conversely, move form_octal_warning() to
dq_inline.c. It is so tiny that the function call overhead is scarcely
smaller than the function body.
This also moves things in embed.fnc so that these functions are not
visible outside the few files they are supposed to be used in.
M dquote.c
M dquote_inline.h
M embed.fnc
M embed.h
M proto.h
commit faa731084701c5993d3a128fb5e50df3e6f00fcf
Author: Karl Williamson <[email protected]>
Date: Wed Feb 10 14:29:15 2016 -0700
XXX partial don't push regex: Add ASCII/NASCII regnodes
These are a little more efficient than using the POSIXA(:ascii:)
mechanism.
M pod/perldebguts.pod
M regcomp.sym
M regexec.c
M regnodes.h
commit f0d98c7955eb28400675c2442ae713e58d1a6b62
Author: Karl Williamson <[email protected]>
Date: Wed Feb 3 13:41:11 2016 -0700
constant.pm lower memory use
M dist/constant/lib/constant.pm
-----------------------------------------------------------------------
--
Perl5 Master Repository