In perl.git, the branch smoke-me/khw-regcomp has been created
<http://perl5.git.perl.org/perl.git/commitdiff/797ead50375ed771eb6db3431ba2f5efb3f4e43f?hp=0000000000000000000000000000000000000000>
at 797ead50375ed771eb6db3431ba2f5efb3f4e43f (commit)
- Log -----------------------------------------------------------------
commit 797ead50375ed771eb6db3431ba2f5efb3f4e43f
Author: Karl Williamson <[email protected]>
Date: Mon Feb 15 20:59:10 2016 -0700
perlapi: Hide the swash functions
These should be internal only, and we may want to get rid of them
someday. Hide their existence so that people who don't already know
about them won't be tempted to try to use them.
M embed.fnc
commit 333842d020017775b6f059fbede972f5c0fe5acc
Author: Karl Williamson <[email protected]>
Date: Mon Feb 15 20:32:32 2016 -0700
regcomp.h: Not all ANYOF flags are in use.
So, it's better to not have a mask to include the unused ones.
M regcomp.h
commit 1864b5847ebf813d95d82e0076e46d74126014d2
Author: Karl Williamson <[email protected]>
Date: Tue Dec 29 22:48:09 2015 -0700
regcomp.c: Extract code to a separate function
This is in preparation for the next commit, where it will be called from
a second place.
M embed.fnc
M embed.h
M proto.h
M regcomp.c
commit 2863b4c2e39544e40cbac880c2565f6eb3f7a221
Author: Karl Williamson <[email protected]>
Date: Mon Feb 15 17:35:22 2016 -0700
regcomp.c: Rmv unnecessary tests
This tested some flag bits, but these are guaranteed to be set by the
first test in the 'if'.
M regcomp.c
commit af40d9ead54b4e27c2ebe2e3a81833c6fee49d85
Author: Karl Williamson <[email protected]>
Date: Mon Feb 15 16:27:20 2016 -0700
regcomp.c: Save a branch test
This branch can only be true if the previous branch was also true, so
it can simply be moved inside that one to avoid an unnecessary
test.
M regcomp.c
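The restructuring described in commit af40d9e above can be illustrated with
a minimal C sketch. The conditions and the names classify_before() and
classify_after() are hypothetical stand-ins, not the actual tests in
regcomp.c; the only point is that a test implied by an earlier one can be
nested inside it without changing the result.

```c
#include <assert.h>

/* Hypothetical example, not code from regcomp.c.
 * Before: the second test is evaluated on every call, even though it
 * can only succeed when the first test has already succeeded. */
int classify_before(int n)
{
    int result = 0;

    if (n > 10) {
        result = 1;
    }
    if (n > 100) {          /* n > 100 implies n > 10 */
        result = 2;
    }
    return result;
}

/* After: the implied test is nested, so it is skipped entirely
 * whenever the outer test fails. */
int classify_after(int n)
{
    int result = 0;

    if (n > 10) {
        result = 1;
        if (n > 100) {
            result = 2;
        }
    }
    return result;
}
```

Both functions return the same value for every input; the second merely
avoids one comparison on the path where the outer condition is false.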
commit 3f8e6c0419be445f4230ce1b300f0e6eb6a7b69e
Author: Karl Williamson <[email protected]>
Date: Mon Feb 15 16:20:43 2016 -0700
regcomp.c: Clarify -Dr output under /l
It is now redundant to indicate that an ANYOF node is for locale, as the
regnode type ANYOFL now clearly indicates that. But also sometimes the
node is only valid if the runtime locale is a UTF-8 one. That was not
clearly indicated.
M regcomp.c
commit 3725f9f62b6d2001579fee83641efe1de12afa74
Author: Karl Williamson <[email protected]>
Date: Mon Feb 15 16:13:29 2016 -0700
regcomp.c: Rmv unnecessary -Dr output
The previous commit removed all ambiguity as far as the 2nd [] in the
-Dr output of a bracketed character class, so we can remove the
clarification text, which is unnecessary, and clutters up the output.
The text still has to be left in for the case where the expression is
applicable only when the target being matched against is UTF-8.
M regcomp.c
commit da6713ca169bc09575f111c4bc4a13afb0fc59a5
Author: Karl Williamson <[email protected]>
Date: Mon Feb 15 15:41:46 2016 -0700
regcomp.c: -Dr output move
This finishes the process, begun several commits ago, of moving the
output of what happens when the locale is UTF-8 into the first bracketed
class expression in -Dr output. The output is thus now accurate when the
class is marked as inverted.
M regcomp.c
commit f2dc6432795b903f4716b699f6824e88286b6de4
Author: Karl Williamson <[email protected]>
Date: Mon Feb 15 15:37:24 2016 -0700
regcomp.c: -Dr: Add a pipe symbol for clarity
This output of what gets compiled is the OR of the two [] bracketed
expressions. Add a '|' to indicate that. Otherwise, it would legally
mean one expression followed by the other.
M regcomp.c
commit 7651d3ab3cbe7909f7ae3232a33eacea2c09fade
Author: Karl Williamson <[email protected]>
Date: Mon Feb 15 15:06:49 2016 -0700
Explicitly show the chars in -Dr for which UTF-8-ness matters
Prior to this commit, when displaying what a pattern compiles to,
general text was used to indicate that the characters \x80 to \xFF all
matched when the target being matched was not UTF-8, while some of them
matched under UTF-8 as well. This commit changes the output to show
explicitly the ones for which UTF-8-ness matters.
M regcomp.c
commit 026a9f21a1d0eddf34ab7f1abc48337a5684fad6
Author: Karl Williamson <[email protected]>
Date: Sun Feb 14 10:33:31 2016 -0700
later
M regcomp.c
commit dd1f9c910b332c91262adfc9a399a6455693bd5d
Author: Karl Williamson <[email protected]>
Date: Sun Feb 14 10:26:46 2016 -0700
regcomp.c: Output XXX
M regcomp.c
commit 5e85ced4910c43e43575bfd0c975570292b71fec
Author: Karl Williamson <[email protected]>
Date: Sat Feb 13 18:25:10 2016 -0700
regcomp.c: Move some -Dr output
Under -Dr compilation output, there can be multiple [...][...]
displayed. Some items are output to show the matches that would be
valid when the current locale is a UTF-8 one, and they currently aren't
displayed in the first [...]. But they should be, for the case where
the class is inverted. For example /[^aQ]/li should display as
[^aQ{utf8 locale}Aq]. Not having them in the first [ ] runs afoul of De
Morgan's laws and could be misleading.
This commit doesn't get them all the way there, but it is the first step
in doing so.
M regcomp.c
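The De Morgan's laws point in commit 5e85ced above can be sketched in C. This
models only the /[^aQ]/li example under a UTF-8 locale; in_set() and the two
matcher functions are hypothetical names for illustration and are not part of
regcomp.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if c is one of the characters listed in set. */
bool in_set(const char *set, char c)
{
    assert(set != NULL);
    return c != '\0' && strchr(set, c) != NULL;
}

/* Intended meaning of [^aQ{utf8 locale}Aq]: the inversion applies to the
 * union of both lists, i.e. !(base || extra) == !base && !extra. */
bool matches_inverted_union(char c)
{
    return !(in_set("aQ", c) || in_set("Aq", c));
}

/* Misleading reading when the extra characters sit outside the inverted
 * bracket: "not in aQ, or else in Aq". */
bool matches_misread(char c)
{
    return !in_set("aQ", c) || in_set("Aq", c);
}
```

For the character 'A' the two readings disagree: the inverted union rejects
it, while the misreading accepts it. That disagreement is the reason for
moving the extra characters inside the first bracket.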
commit 7d6b45f05d2baa28b907f7952b830b8c7f5925fc
Author: Karl Williamson <[email protected]>
Date: Sat Feb 13 18:07:32 2016 -0700
regcomp.c: No need to truncate some -Dr output
When displaying what a /i regex pattern compiled into, in the case of
some that are based on the current locale, certain matches are known to
occur when the locale is a UTF-8 one. These are listed separately from
the other ones in the display, and there has been code to truncate it if
it gets too big. However, it can't ever get too large, as the only
things in it are the alphabetics in the 0-FF range, as everything above
that doesn't vary by locale. So the worst case is not very large.
M regcomp.c
commit eac3355fa1c8cdc77f9a5336e0d0a4d8f58e17c2
Author: Karl Williamson <[email protected]>
Date: Sat Feb 13 18:00:36 2016 -0700
regcomp.c: Comments, white-space, add grouping () for clarity
M regcomp.c
commit b3c87891505d22b5ba3bfc76857ca98c126e8a81
Author: Karl Williamson <[email protected]>
Date: Mon Feb 15 11:02:07 2016 -0700
Cast correctly to U8, not char
U8 is what the function being called is expecting
M regcomp.c
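One reason the choice of cast in commit b3c8789 above can matter is shown in
the C sketch below. The U8 typedef here is a local stand-in for Perl's
unsigned 8-bit type, and is_high_byte() is a hypothetical callee, not the
function the commit refers to. The sketch uses signed char explicitly,
because whether plain char is signed depends on the platform.

```c
#include <assert.h>

typedef unsigned char U8;   /* local stand-in for Perl's U8 */

/* Hypothetical callee that expects a byte in the range 0..255. */
int is_high_byte(U8 b)
{
    return b >= 0x80;
}

/* Going through a signed 8-bit type cannot preserve values above 0x7F;
 * the result of the conversion is implementation-defined. */
int value_via_signed_char(int code)
{
    return (signed char) code;
}

/* Going through U8 keeps every byte value 0..255 intact. */
int value_via_u8(int code)
{
    assert(code >= 0 && code <= 0xFF);
    return (U8) code;
}
```

With a byte such as 0xE9, the U8 path yields 233, whereas the signed path
cannot yield 233, so later comparisons against byte values behave
differently.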
commit 9b9e198cfc422244dfb01719dafc7ea9461cab48
Author: Karl Williamson <[email protected]>
Date: Sat Feb 13 17:21:28 2016 -0700
regcomp.c: Simplify a few lines of code
This code had been written before the isMNEMONIC_CNTRL() macro was
created. Using the macro simplifies things a little.
M regcomp.c
commit 51a07529d1b5c5a2f03f69fc819f8f32232594bb
Author: Karl Williamson <[email protected]>
Date: Sat Feb 13 15:51:50 2016 -0700
regcomp.c: Clean up logic in function
This function uses some crude heuristics to decide whether to make a
synthetic start class or not. This commit removes some redundancies.
M regcomp.c
commit 43b3a8e8591503fbd04a400e792ce6cd9b9fb388
Author: Karl Williamson <[email protected]>
Date: Sat Feb 13 15:03:51 2016 -0700
Add environment variable for -Dr: PERL_DUMP_RE_MAX_LEN
The regex engine when displaying debugging info, say under -Dr, will elide
data in order to keep the output from getting too long. For example,
the number of code points in all of Unicode matched by \w is quite
large, and so when displaying a pattern that matches this, only an
initial portion of them is printed, and the rest are truncated,
represented by "...".
Sometimes, one wants to see more than what the
compiled-into-the-engine-max shows. This commit creates code to read
this environment variable to override the default max lengths. This
changes the lengths for everything to the input number, even if they
have different compiled maximums in the absence of this variable.
I'm not currently documenting this variable, as I don't think it works
properly under threads, and we may want to alter the behavior in various
ways as a result of gaining experience with using it.
M embedvar.h
M intrpvar.h
M regcomp.c
M regcomp.h
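The override mechanism described in commit 43b3a8e above can be sketched in
C as follows. Only the variable name PERL_DUMP_RE_MAX_LEN comes from the
commit message; the function names, the decimal parsing, and the choice to
fall back to the compiled-in default on malformed input are assumptions made
for illustration, not the behavior of the actual code in regcomp.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse s as an unsigned decimal length.
 * Returns fallback when s is missing, empty, or malformed (assumption). */
unsigned long parse_max_len(const char *s, unsigned long fallback)
{
    char *end;
    unsigned long value;

    if (s == NULL || *s < '0' || *s > '9')
        return fallback;            /* unset, empty, sign or whitespace */

    errno = 0;
    value = strtoul(s, &end, 10);
    if (errno != 0 || *end != '\0')
        return fallback;            /* overflow or trailing garbage */

    return value;
}

/* Hypothetical accessor: one environment value overrides whatever
 * compiled-in maximum the caller would otherwise have used. */
unsigned long dump_max_len(unsigned long compiled_default)
{
    return parse_max_len(getenv("PERL_DUMP_RE_MAX_LEN"), compiled_default);
}
```

Because every caller passes its own compiled-in default but receives the
same environment value when one is set, the sketch reproduces the effect
noted in the commit message: all lengths become the input number, even where
the defaults differ.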
commit 2c7be3e53a7f3611248e4d3946c2f9fdc4a8bc1c
Author: Karl Williamson <[email protected]>
Date: Thu Feb 11 10:25:04 2016 -0700
regcomp.c: -Dr \xZZ instead of \x{ZZ}
The brackets are unnecessary and clutter the output.
M regcomp.c
commit cb67236ae27307f68463452ad9e337c8ea3c4a7a
Author: Karl Williamson <[email protected]>
Date: Thu Feb 11 10:12:57 2016 -0700
regcomp.c: Fix -Dr bug
It was using a wrong length calculation, which under some circumstances
caused the output to include extra bytes. Also I added comments, and
changed a variable name, so I don't have to figure this out again from
scratch.
M regcomp.c
commit 9c05439ce1b2b32caeb29ebbf97fde4983b96348
Author: Karl Williamson <[email protected]>
Date: Mon Feb 15 11:25:35 2016 -0700
XXX need tests, comments Fix /\p{User-defined}/i
M regcomp.c
M t/re/pat_advanced.t
commit bd6e8e74b959d807bb5ba4d158874829b3a1e02d
Author: Karl Williamson <[email protected]>
Date: Mon Feb 15 11:04:36 2016 -0700
regcomp.c: Use macro to hide complexity
There is an existing macro that does these three lines in one source
line.
M regcomp.c
commit 66b1baaef5a9a386fa5f9bed92866c9a2ca795e5
Author: Karl Williamson <[email protected]>
Date: Sat Feb 13 13:49:00 2016 -0700
Don't allow /\N{}/ under 're strict'
This is the one remaining empty {} that was accepted under the
experimental 'use re "strict"'.
M embed.fnc
M embed.h
M pod/perldelta.pod
M pod/perldiag.pod
M proto.h
M regcomp.c
M t/re/reg_mesg.t
commit d105faa657594560f4116a4da47f4729f2c12186
Author: Karl Williamson <[email protected]>
Date: Sat Feb 13 15:20:49 2016 -0700
perlrecharclass: Add some missing info
M pod/perlrecharclass.pod
commit 5dd4d5811bed63a0c533d875032638fc314985b7
Author: Karl Williamson <[email protected]>
Date: Sat Feb 13 15:35:11 2016 -0700
PATCH: [perl 127537] /\W/ regression with UTF-8
This bug is apparently uncommon in the field, as I was the one who
discovered it. It requires a complemented posix class, like \W or \S,
in an inverted character class, like [^\Wfoo] in a pattern that also has
a synthetic start class generated by the regex optimizer for it.
The fix is trivial.
M pod/perldelta.pod
M regcomp.c
M t/re/re_tests
commit 6d26347314b6c6a2fe517443924a5c3bd4e1ef25
Author: Karl Williamson <[email protected]>
Date: Sat Feb 13 11:53:50 2016 -0700
regcomp.c, toke.c: swap functions being inline static
grok_bslash_x() is so large that no compiler will inline it. Move it to
dquote.c from dquote_inline.h. Conversely, move form_octal_warning() to
dquote_inline.h. It is so tiny that the function call overhead is scarcely
smaller than the function body.
This also moves things in embed.fnc so that all these functions are not
visible outside the few files they are supposed to be used in.
M dquote.c
M dquote_inline.h
M embed.fnc
M embed.h
M proto.h
commit faa731084701c5993d3a128fb5e50df3e6f00fcf
Author: Karl Williamson <[email protected]>
Date: Wed Feb 10 14:29:15 2016 -0700
XXX partial don't push regex: Add ASCII/NASCII regnodes
These are a little more efficient than using the POSIXA(:ascii:)
mechanism.
M pod/perldebguts.pod
M regcomp.sym
M regexec.c
M regnodes.h
commit f0d98c7955eb28400675c2442ae713e58d1a6b62
Author: Karl Williamson <[email protected]>
Date: Wed Feb 3 13:41:11 2016 -0700
constant.pm lower memory use
M dist/constant/lib/constant.pm
-----------------------------------------------------------------------
--
Perl5 Master Repository