In perl.git, the branch smoke-me/khw-5.21 has been created
<http://perl5.git.perl.org/perl.git/commitdiff/3def420f9d8556a1bf6d6c3a6e2bfe2c0d02485d?hp=0000000000000000000000000000000000000000>
at 3def420f9d8556a1bf6d6c3a6e2bfe2c0d02485d (commit)
- Log -----------------------------------------------------------------
commit 3def420f9d8556a1bf6d6c3a6e2bfe2c0d02485d
Author: Karl Williamson <[email protected]>
Date: Fri Sep 5 10:15:09 2014 -0600
regcomp.c: Don't doubly do 'use encoding'
When reparsing, values have already been converted (if necessary) to
native, so don't do it again.
M regcomp.c
commit 8e5fb39429a576f5ea29a27b062be968bb251eb2
Author: Karl Williamson <[email protected]>
Date: Fri Sep 5 09:45:27 2014 -0600
regcomp.c: Remove extraneous tests
These two messages used to be warnings, but are now errors, so there is
no need to test which pass they are being output in.
M regcomp.c
commit 84b0697251c695cf1c59054be016f09622d58ea7
Author: Karl Williamson <[email protected]>
Date: Fri Sep 5 09:34:26 2014 -0600
numeric.c: Comment tweak
M numeric.c
commit 26c527e527f3bc0b3e3704cc10122cddbe78f4b8
Author: Karl Williamson <[email protected]>
Date: Fri Sep 5 09:09:28 2014 -0600
XXXdelta Allow \N{named seq} in qr/[...]/
This commit changes the regex handler to properly match in many
instances a \N{named sequence} in a bracketed character class.
A named sequence is one which consists of a string of multiple
characters but given one name. Unicode has hundreds of them, like LATIN
CAPITAL LETTER A WITH MACRON AND GRAVE. These are encoded by Unicode
when there is some user community that thinks of the conglomeration as a
single unit, but there was no prior standard that had it so, and it is
possible to encode it in Unicode using other means, typically a sequence
of a base character followed by some combining marks. (If there had not
been such a prior standard, 8859-1, things like LATIN CAPITAL LETTER A
WITH GRAVE would have been put into Unicode this way too.) If they did
not do it this way, they would run out of available code points much
sooner.
Not having these as single characters adds a burden to the programmer
having to deal with them. Hiding this detail as much as possible makes
it easier to program. This commit hides this in one more place than
previously.
It takes advantage of the infrastructure added some releases ago dealing
with the fact that the match of some single characters
case-insensitively can be 2 or even 3 characters.
"ss" =~ /[ß]/i;
is the most prominent example.
We earlier discovered that /[^ß]/ leads to unexpected behavior, and
using one of these sequences as an endpoint in a range is also unclear
as to what is meant. This commit leaves existing behavior for those
cases. That behavior is to use just the first code point in the
sequence for regular [...], and to generate a fatal syntax error for
(?[...]).
M lib/diagnostics.t
M pod/perldiag.pod
M regcomp.c
M t/re/pat_advanced.t
M t/re/reg_mesg.t
commit 7aa0f2412bebad5f444e14b4c04cc4d8052b4ba1
Author: Karl Williamson <[email protected]>
Date: Thu Sep 4 22:48:22 2014 -0600
regcomp.c: Extract out functionality into a function
This is in preparation for it being called from a 2nd place. The code
was merely moved and outdented, and comments moved within the function
and added to.
M embed.fnc
M embed.h
M proto.h
M regcomp.c
commit 06c25297b47888da4ed6511c62955a2d0652c71c
Author: Karl Williamson <[email protected]>
Date: Wed Sep 3 20:00:28 2014 -0600
regcomp.c: White-space only
Properly indent code in blocks newly formed by the previous commit.
M regcomp.c
commit 0420c35f7ccd170edfbcb96973b348c35d10d40d
Author: Karl Williamson <[email protected]>
Date: Wed Sep 3 19:52:05 2014 -0600
regcomp.c: Refactor func so caller handles anomalies
S_grok_bslash_N() is refactored to not know about the strictness level
required by the caller, and to return things instead so that the caller
can decide what action to take.
This is in preparation for some changes in the caller's behavior in
future commits.
As a side effect, some warning messages now show a different parse
position for where the problem occurs.
M embed.fnc
M embed.h
M proto.h
M regcomp.c
M t/re/reg_mesg.t
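Note on the commit above (regcomp.c: Refactor func so caller handles
anomalies): the general shape of such a refactoring can be sketched in C as
below. This is only an illustration of the pattern, not the code in
regcomp.c. Every name here (sk_n_status, sk_grok_braced, sk_accept) and the
particular anomalies reported are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes: the parser reports what it saw. */
typedef enum { SK_N_OK, SK_N_NO_BRACES, SK_N_EMPTY } sk_n_status;

/* Parse a "{...}" group at s.  Makes no policy decision and knows
 * nothing about how strict the caller wants to be. */
sk_n_status sk_grok_braced(const char *s, size_t *name_len)
{
    *name_len = 0;
    if (s[0] != '{')
        return SK_N_NO_BRACES;

    const char *close = strchr(s, '}');
    if (close == NULL)
        return SK_N_NO_BRACES;

    *name_len = (size_t) (close - (s + 1));
    return *name_len == 0 ? SK_N_EMPTY : SK_N_OK;
}

/* The caller owns the policy: in this sketch a strict caller rejects
 * an empty name, while a lax caller lets it through. */
bool sk_accept(const char *s, bool strict)
{
    size_t len;

    switch (sk_grok_braced(s, &len)) {
    case SK_N_OK:        return true;
    case SK_N_EMPTY:     return !strict;
    case SK_N_NO_BRACES: return false;
    }
    return false;
}
```

With this split, a second caller with different strictness rules can reuse
sk_grok_braced() unchanged and only supply its own switch.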
commit 959e974eed262e3cfa3bc84f522b46f5807dfeb3
Author: Karl Williamson <[email protected]>
Date: Wed Sep 3 18:28:25 2014 -0600
regcomp.c: Comment clarifications, nits
M regcomp.c
commit 557920b5c8860fea36911953ae975d99b37037d5
Author: Karl Williamson <[email protected]>
Date: Wed Sep 3 17:31:39 2014 -0600
regcomp.c: Refactor one area to use common subroutine
By using the inline function append_utf8_from_native_byte(), the details
of this conversion are hidden from here. Since that routine advances
the parsing pointer with each byte, this has to be slightly refactored.
M regcomp.c
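Note on the commit above (regcomp.c: Refactor one area to use common
subroutine): a rough C sketch of what a helper of this kind does. It assumes
an ASCII platform where "native" means Latin-1. The name
sk_append_utf8_from_latin1 is hypothetical and is not the Perl function
itself. The point is that the helper writes one or two bytes and advances
the pointer it is given, so the calling loop must not advance it again.

```c
#include <stdint.h>

/* Append the UTF-8 encoding of one Latin-1 byte at *dest and advance
 * *dest past what was written (1 byte for ASCII, otherwise 2). */
void sk_append_utf8_from_latin1(uint8_t byte, uint8_t **dest)
{
    if (byte < 0x80) {
        *(*dest)++ = byte;                               /* invariant */
    } else {
        *(*dest)++ = (uint8_t) (0xC0 | (byte >> 6));     /* lead byte */
        *(*dest)++ = (uint8_t) (0x80 | (byte & 0x3F));   /* continuation */
    }
}
```

For example, the byte 0xDF (LATIN SMALL LETTER SHARP S) becomes the two
bytes 0xC3 0x9F.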
commit 19f72904541878792cbabade6e5752304d1b8262
Author: Karl Williamson <[email protected]>
Date: Mon Sep 1 20:00:01 2014 -0600
XXXdelta PATCH: [perl #122671] Many warnings in regcomp.c can occur twice
This solves the problem by moving the warnings to be output only in
pass2 of compilation. The problem arises because almost all of pass1
can be repeated under certain circumstances described in the ticket and
the added comments of this patch.
M regcomp.c
M t/lib/warnings/regcomp
M t/re/reg_mesg.t
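Note on the commit above ([perl #122671]): the idea of deferring warnings to
the second pass can be sketched in C as follows. This is a toy model, not
the regcomp.c logic. The names (sk_state, sk_parse, sk_compile), the '?'
character standing in for a questionable construct, and the non-ASCII byte
standing in for the restart trigger are all assumptions made for the
example.

```c
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    int  pass;      /* 1 = sizing pass, 2 = code-generating pass */
    bool upgraded;  /* hypothetical mode switch that forces a restart */
    int  warnings;  /* how many warnings were actually emitted */
} sk_state;

/* Returns false when pass 1 has to be started over. */
bool sk_parse(sk_state *st, const char *pat)
{
    for (const char *p = pat; *p; p++) {
        if (*p == '?' && st->pass == 2) {
            /* Warn only in pass 2, which runs exactly once. */
            st->warnings++;
            fprintf(stderr, "warning at offset %ld\n", (long) (p - pat));
        }
        if ((unsigned char) *p >= 0x80 && !st->upgraded) {
            st->upgraded = true;   /* switch mode ... */
            return false;          /* ... and ask for a restart */
        }
    }
    return true;
}

int sk_compile(const char *pat)
{
    sk_state st = { 1, false, 0 };

    while (!sk_parse(&st, pat)) {
        /* pass 1 is repeated from the beginning */
    }
    st.pass = 2;
    sk_parse(&st, pat);
    return st.warnings;
}
```

If the warning were raised in pass 1 instead, a pattern that triggers the
restart after the questionable construct would report it twice.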
commit 78d3387c7f1ee049f0bee4934f2e8de4a7d55a9c
Author: Karl Williamson <[email protected]>
Date: Mon Sep 1 18:54:03 2014 -0600
regcomp.c: Don't output same warning twice
This warning also had no test.
M regcomp.c
M t/re/reg_mesg.t
commit d8c0765dabcfb0324542479cff69938b30d42f31
Author: Karl Williamson <[email protected]>
Date: Mon Sep 1 16:44:38 2014 -0600
regcomp.c: Vertically stack ternary
for legibility
M regcomp.c
commit 1a3ee223359ee18b52ccb2793734dd093488020c
Author: Karl Williamson <[email protected]>
Date: Mon Sep 1 14:57:49 2014 -0600
regcomp.c: Don't prematurely skip error checking
The assertion in the comment changed by this commit was true only for
pass1 of the regex compilation; not pass2. This makes it true in both
passes by moving it, and the code it describes, past some error checking.
This error checking was executed in pass1, but not pass2. It also
changes the warning to only be done in the second pass, part of
[perl #122671]. A future commit will fix the others.
M regcomp.c
commit ea76f303d704f4b8911b687a2c15954eeccceb5c
Author: Karl Williamson <[email protected]>
Date: Mon Sep 1 14:48:02 2014 -0600
regcomp.c: Move comment closer to code it applies to
M regcomp.c
commit 0a635ef8c73e4344b8c5e2f6871e26873d7d2443
Author: Karl Williamson <[email protected]>
Date: Thu Aug 28 21:12:39 2014 -0600
regcomp.c: Remove unnecessary cast
The macro does the appropriate cast, and this is slightly more legible.
M regcomp.c
commit 2987f1d7a8b8289a42f02ebc7d36d23760524b1f
Author: Karl Williamson <[email protected]>
Date: Tue Aug 26 15:34:25 2014 -0600
regcomp.c: Make macro a lookup
The recently introduced macro isMNEMONIC_CNTRL has a look-up and several
tests in it, which occupy time and space. Since it was only used for
debugging, that did not matter much, but future commits will use it in
more mainline code. This commit changes it to be a single look-up,
using up one of the spare bits available for that purpose in
PL_charclass. There are enough available bits that we aren't likely to
run out, really ever. (We can always add a 2nd word of bits if
necessary.)
M handy.h
M l1_char_class_tab.h
M regcomp.c
M regen/mk_PL_charclass.pl
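Note on the commit above (regcomp.c: Make macro a lookup): a minimal C
sketch of turning a multi-test predicate into one table lookup by spending
a spare bit in a per-character class table. The table, the bit names and
the set of characters marked are assumptions chosen for illustration. They
are not the contents of PL_charclass, which the commit's file list suggests
is generated by regen/mk_PL_charclass.pl.

```c
#include <stdint.h>

/* Hypothetical class bits; one bit per property. */
enum {
    SK_CC_SPACE          = 1 << 0,
    SK_CC_MNEMONIC_CNTRL = 1 << 1   /* the newly allocated spare bit */
};

/* Hypothetical 256-entry table, indexed by character code. */
const uint16_t sk_charclass[256] = {
    ['\a'] = SK_CC_MNEMONIC_CNTRL,
    ['\b'] = SK_CC_MNEMONIC_CNTRL,
    ['\t'] = SK_CC_MNEMONIC_CNTRL | SK_CC_SPACE,
    ['\n'] = SK_CC_MNEMONIC_CNTRL | SK_CC_SPACE,
    ['\f'] = SK_CC_MNEMONIC_CNTRL | SK_CC_SPACE,
    ['\r'] = SK_CC_MNEMONIC_CNTRL | SK_CC_SPACE,
    [' ']  = SK_CC_SPACE
};

/* One index and one mask, instead of a lookup plus several compares. */
#define sk_is_mnemonic_cntrl(c) \
    ((sk_charclass[(uint8_t) (c)] & SK_CC_MNEMONIC_CNTRL) != 0)
```

The cost is one bit per table entry; the benefit is that the predicate is
cheap enough to call from mainline code rather than only from debugging
output.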
commit f8031bf4eaf9713b719f3d5296f0344d02b1b7f8
Author: Karl Williamson <[email protected]>
Date: Tue Aug 26 17:29:31 2014 -0600
regcomp.c: Extract functionality into a static function
This is in preparation for it being used in more than one place in a
future commit.
M embed.fnc
M embed.h
M proto.h
M regcomp.c
commit 25f3b59d7c3a1a8f1c22fc0d983ebd242af0945b
Author: Karl Williamson <[email protected]>
Date: Thu Aug 28 13:59:01 2014 -0600
XXXcharbits
M regcomp.h
commit 8d33af32dbeeb9b623ade03854ad3bc7708f6154
Author: Karl Williamson <[email protected]>
Date: Tue Jun 17 18:49:53 2014 -0600
XXX partial perlapi text
M perlvars.h
commit 24fb14887a147fa59dc04f0a1527976bf1ba9548
Author: Karl Williamson <[email protected]>
Date: Sat May 17 19:37:06 2014 -0600
XXX Don't push. attempt to tell tries everything at compile time
But it appears this is thrown away; have to consult with Yves to see if
it is worth pursuing.
M embed.fnc
M embed.h
M perl.h
M proto.h
M regcomp.c
M regcomp.h
M regcomp.sym
commit 6654d3a7572c634e54d22b93b4f2a64043b60d4f
Author: Karl Williamson <[email protected]>
Date: Wed Sep 3 12:42:07 2014 -0600
regcomp.h: Comment nits
M regcomp.h
commit adf50a83f9c816310449d1277c15d31187dd8692
Author: Karl Williamson <[email protected]>
Date: Thu Aug 28 14:22:14 2014 -0600
Allow for changing size of bracketed regex char class
This commit allows Perl to be compiled with a bitmap size that is larger
than 256. This bitmap is used to directly look up whether a character
matches or not, without having to do a binary search or hash lookup. It
might improve the performance for some installations that have a lot of
use of scripts that are above the Latin1 range.
M embedvar.h
M intrpvar.h
M perl.c
M regcomp.c
M regcomp.h
M regexec.c
M sv.c
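Note on the commit above (Allow for changing size of bracketed regex char
class): the trade-off it describes can be sketched in C as below. The
compile-time knob SKETCH_BITMAP_BITS, the struct layout and the function
names are hypothetical, and the sorted array used for the slow path is only
a stand-in for whatever structure holds the code points that do not fit in
the bitmap.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical build-time setting; must be a multiple of 8. */
#ifndef SKETCH_BITMAP_BITS
#  define SKETCH_BITMAP_BITS 256
#endif

typedef struct {
    uint8_t bitmap[SKETCH_BITMAP_BITS / 8]; /* one bit per low code point */
    const uint32_t *above;   /* sorted code points >= SKETCH_BITMAP_BITS */
    size_t above_len;
} sketch_class;

void sketch_class_add_low(sketch_class *cc, uint32_t cp)
{
    if (cp < SKETCH_BITMAP_BITS)
        cc->bitmap[cp >> 3] |= (uint8_t) (1U << (cp & 7));
}

bool sketch_class_matches(const sketch_class *cc, uint32_t cp)
{
    /* Fast path: direct bit lookup, no searching. */
    if (cp < SKETCH_BITMAP_BITS)
        return (cc->bitmap[cp >> 3] & (1U << (cp & 7))) != 0;

    /* Slow path: binary search among the remaining code points. */
    size_t lo = 0, hi = cc->above_len;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cc->above[mid] == cp)
            return true;
        if (cc->above[mid] < cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return false;
}
```

Building with a larger SKETCH_BITMAP_BITS moves more code points onto the
fast path at the cost of a larger structure for every character class.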
commit 1a0c44798e1435fabf54e508df3c0096bf827109
Author: Karl Williamson <[email protected]>
Date: Thu Aug 28 20:07:30 2014 -0600
Fix -Dr output to work for larger ANYOF node size
This generalizes the code for -Dr output to work to dump the contents of
ANYOF nodes (bracketed character classes) which have bitmaps for more
than code points 0-255.
M embed.fnc
M embed.h
M proto.h
M regcomp.c
commit 3505086c17d27b9a979fa7b1abb9cbaec52d996d
Author: Karl Williamson <[email protected]>
Date: Tue Aug 26 08:36:31 2014 -0600
regcomp.c: Swap if/else clauses
This makes it slightly easier to understand as there is no explicit
complement, but is mostly for a future commit.
M regcomp.c
commit c9d230a5de110a5b87478fe985943e1020f70f96
Author: Karl Williamson <[email protected]>
Date: Thu Aug 28 14:05:40 2014 -0600
Rename some internal regex #defines
These are renamed to be more clear as to their actual meanings. I know
other people have been confused by their former names.
Some of the name changes will become more important as future commits
will allow the bitmap in a bracketed character class to be a different
size.
M regcomp.c
M regcomp.h
M regexec.c
commit 1aa174e3899fd41863344405b7fe73fc2d68d23f
Author: Karl Williamson <[email protected]>
Date: Thu Aug 28 18:19:56 2014 -0600
regcomp.h: Remove some no-longer used #defines
This is an internal header, so can change names within it.
M regcomp.h
commit c744fd52a09db2ad18bb1ed078e60bdcc8f77035
Author: Karl Williamson <[email protected]>
Date: Thu Aug 28 14:36:15 2014 -0600
regcomp.h: Use unsigned 1 in left shift
This prevents a signed result if this macro ever gets used in a U8.
The ANYOF_BITMAP_TEST macro must now be cast or it would generate warnings
when compiled with -DPERL_BOOL_AS_CHAR
M regcomp.h
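Note on the commit above (regcomp.h: Use unsigned 1 in left shift): in C,
1 << n has type int, so its result is signed, while 1U << n has type
unsigned int. A minimal sketch of a byte-wise bitmap built on an unsigned
shift follows. The SKETCH_* macros are hypothetical stand-ins, not the
definitions in regcomp.h, and the explicit cast in the test macro is only
in the spirit of the cast the commit message mentions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for a byte-wise bitmap. */
#define SKETCH_BIT(c)        (1U << ((c) & 7))        /* unsigned result */
#define SKETCH_BYTE(map, c)  ((map)[(c) >> 3])
#define SKETCH_SET(map, c)   (SKETCH_BYTE(map, c) |= (uint8_t) SKETCH_BIT(c))
/* Explicit cast so the result is a boolean, not a wider unsigned value. */
#define SKETCH_TEST(map, c)  ((bool) ((SKETCH_BYTE(map, c) & SKETCH_BIT(c)) != 0))

/* Sets one bit and checks that only that bit reads back as set. */
bool sketch_demo(void)
{
    uint8_t map[32] = {0};   /* 256 bits */

    SKETCH_SET(map, 0xDF);
    return SKETCH_TEST(map, 0xDF) && !SKETCH_TEST(map, 0xDE);
}
```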
commit 178a5215c0ec653f14e7a0e0a58c4bd6f4bb02fa
Author: Karl Williamson <[email protected]>
Date: Thu Aug 28 18:50:22 2014 -0600
regcomp.h: Fix comment that said the opposite of the truth
Too many negations led to this.
M regcomp.h
commit 8bedfb5a59a4e5893e8c96702b68894a88bb8fa1
Author: Karl Williamson <[email protected]>
Date: Thu Aug 28 18:13:47 2014 -0600
regcomp.c: Remove unnecessary test
The 'while' makes the 'if' unnecessary here.
M regcomp.c
commit 9aa604f57302e4cd5f6c48ff074c71b4bae9ea28
Author: Karl Williamson <[email protected]>
Date: Wed Aug 27 22:12:02 2014 -0600
regexec.c: Simplify a short code section
Two "if"s can be combined, leading to one fewer (unoptimized) test.
M regexec.c
-----------------------------------------------------------------------
--
Perl5 Master Repository