In perl.git, the branch sprout/regexp has been created
<http://perl5.git.perl.org/perl.git/commitdiff/36dac416e8fff5e04b6db0eab5d6620349661e7a?hp=0000000000000000000000000000000000000000>
at 36dac416e8fff5e04b6db0eab5d6620349661e7a (commit)
- Log -----------------------------------------------------------------
commit 36dac416e8fff5e04b6db0eab5d6620349661e7a
Author: Father Chrysostomos <[email protected]>
Date: Tue Jul 30 23:49:58 2013 -0700
Stop substr re optimisation from rejecting long strs
Using I32 for the fields that record information about the location of
a fixed string that must be found for a regular expression to match
can result in match failures, because I32 is not large enough to store
offsets >= 2**31.
SSize_t is appropriate, since it is 64 bits on 64-bit platforms and 32
bits on 32-bit platforms.
This commit changes enough instances of I32 to SSize_t to get the added
test passing and suppress compiler warnings. A later commit will
change many more.
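To illustrate the width problem described in this commit message, here is a minimal sketch. It is not taken from perl.git: the typedefs narrow_off and wide_off are hypothetical stand-ins for I32 and for SSize_t on a 64-bit platform, and the function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins, not perl.git definitions: int32_t plays the
 * role of I32, int64_t the role of SSize_t on a 64-bit platform. */
typedef int32_t narrow_off;
typedef int64_t wide_off;

/* Returns 1 if `offset` can be stored in the 32-bit field unchanged. */
int fits_narrow(wide_off offset)
{
    return offset >= INT32_MIN && offset <= INT32_MAX;
}

/* Store into the narrow field; only valid for offsets that fit. */
narrow_off store_narrow(wide_off offset)
{
    assert(fits_narrow(offset));   /* 2**31 and above would not survive */
    return (narrow_off)offset;
}
```

An offset of exactly 2**31 fails fits_narrow, which is the case the added test exercises; the wide type holds it without loss.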
M embed.fnc
M perl.h
M proto.h
M regcomp.c
M regexec.c
M regexp.h
M t/bigmem/regexp.t
commit 317d2acae720f943cda8f653a51ff9020a35d466
Author: Father Chrysostomos <[email protected]>
Date: Sat Jul 27 07:10:08 2013 -0700
regexec.c:regexec_flags: Remove unused var
This variable, end_shift, only ever holds the value 0. It is used in
only one place, where its value is assigned to another variable.
The history is interesting:
start_shift and end_shift were added in c277df42229, which added
substring optimisations. 146174a91a192 rearranged stuff a bit, causing
these variables to be unused. 155aba94f6 commented out the unused
start_shift to suppress a compiler warning. 1df70142a966e removed
the start_shift comment, leaving the comment describing end_shift
nonsensical.
M regexec.c
commit 29ab6e9e151f71edbce8d065ab67696fe5e6960e
Author: Father Chrysostomos <[email protected]>
Date: Fri Jul 26 06:25:36 2013 -0700
Remove null check from mg.c:magic_getvec
lsv can never be null here. This null check has been here since
vec's get-magic was added in ae389c8a or 6ff81951f7.
M mg.c
commit 604d47281585ff8dfc3dfa96cf2f8da0d5ec4bf6
Author: Father Chrysostomos <[email protected]>
Date: Fri Jul 26 01:26:54 2013 -0700
Fix assert fail when fetching pos clobbers ref with undef
pos($x) returns a special magical scalar that sets the match position
on $x. Calling pos($x) twice will provide two such scalars. If we
set one of them to a reference, set the other to undef, and then read
the first, all hail breaks loose, because of the use of SvOK_off.
SvOK_off is not sufficient if arbitrary values can be assigned by Perl
code. Globs, refs and regexps (among others) need special handling,
which sv_setsv knows how to do.
M mg.c
M t/op/pos.t
commit efbdd4f51c13e387c1560e8380f93c40fe7aced9
Author: Father Chrysostomos <[email protected]>
Date: Thu Jul 25 23:09:58 2013 -0700
Fix assertion failure with $#a=\1
If the array has been freed and a reference is then assigned to
the arylen scalar and then get-magic is called on that scalar,
Perl_magic_getarylen misbehaves. SvOK_off is not sufficient if
arbitrary values can be assigned by Perl code. Globs, refs and
regexps (among others) need special handling, which sv_setsv
knows how to do.
M mg.c
M t/op/array.t
commit 1bbb99ec8435cf81dbc4895988a73ca2e5ccf937
Author: Father Chrysostomos <[email protected]>
Date: Thu Jul 25 18:08:23 2013 -0700
Stop values from "sticking" to @- and @+ elems
These arrays are very similar to tied arrays, in that the elements are
created on the fly when looked up. So push @_, \$+[0], \$+[0], will
push references to two different scalars on to @_.
That they are created on the fly prevents this bug from showing up
in most code: If you reference the element you can observe that, on
FETCH, it gets set to the corresponding offset *if* the last match has
a set of capturing parentheses with the right number. Otherwise, the
value in the element is left as-is.
So, doing another pattern match with, say, 5 captures and then another
with fewer will leave $+[5] and $-[5] holding values from the first
match, if there is a FETCH in between the two matches:
$ perl -le '"  "=~/()()()()(..)/; $_ = \$+[5]; print $$_; ""=~ /()/; print $$_;'
2
2
And attempts at assignment will succeed, even though they croak:
$ perl -le 'for ($-[0]) { eval { $_ = *foo }; print $_ }'
*main::foo
The solution here is to make the magic "get" handler set the SV
no matter what, instead of just setting it when it refers to a
valid offset.
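The difference between the two handler behaviours can be modelled with a small sketch. Everything here is hypothetical and greatly simplified: struct match_state, struct elem, and both functions are invented for illustration and do not correspond to the real magic code in mg.c.

```c
#include <stddef.h>

/* Hypothetical model, not perl.git code: `ends` holds the end offsets of
 * the last successful match (index 0 is the whole match), and
 * `ncaptures` says how many capture groups that match had. */
struct match_state { size_t ncaptures; long ends[8]; };

/* A fetched element: `defined` is 0 when the element holds undef. */
struct elem { int defined; long value; };

/* Old shape: writes only when the group exists, so stale values stick. */
void get_sticky(const struct match_state *m, size_t idx, struct elem *out)
{
    if (idx <= m->ncaptures) {
        out->defined = 1;
        out->value = m->ends[idx];
    }
}

/* Fixed shape: always writes, resetting to undef when the group is absent. */
void get_always(const struct match_state *m, size_t idx, struct elem *out)
{
    if (idx <= m->ncaptures) {
        out->defined = 1;
        out->value = m->ends[idx];
    } else {
        out->defined = 0;
        out->value = 0;
    }
}
```

Fetching element 5 after a match with five groups and then again after a match with one group leaves the old value in place with get_sticky, while get_always resets it.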
M mg.c
M t/re/pat.t
commit 80015b6c7696731a12388637c057f572dbc904a3
Author: Father Chrysostomos <[email protected]>
Date: Thu Jul 25 16:52:59 2013 -0700
Make @- and @+ return correct offsets beyond 2**31
M mg.c
M t/bigmem/regexp.t
commit 7eae411efcf8191cd02cc50afe86fe4eb224e965
Author: Father Chrysostomos <[email protected]>
Date: Thu Jul 25 16:41:01 2013 -0700
Make $' work past the 2**31 threshold
M regcomp.c
M regexp.h
M t/bigmem/regexp.t
commit 24788a74c2c30a9fc25da235576703e9a9ad3931
Author: Father Chrysostomos <[email protected]>
Date: Thu Jul 25 00:41:07 2013 -0700
[perl #116907] Allow //g matching past 2**31 threshold
Change the internal fields for storing positions so that //g in scalar
context can move past the 2**31 character threshold. Before this
commit, the numbers would wrap, resulting in assertion failures.
The changes in this commit are only enough to get the added test
passing. Stay tuned for more.
M embed.fnc
M pp_hot.c
M proto.h
M regcomp.c
M regexec.c
M regexp.h
M t/bigmem/regexp.t
commit 3cef77f025431cbd4df8413a61b79b23170f1028
Author: Father Chrysostomos <[email protected]>
Date: Wed Jul 24 18:14:06 2013 -0700
pp_hot.c: Show lengths in -Dr output for minlen optimisation
M pp_hot.c
commit a26e09358e1cabbfce2ad099055165c2b1361362
Author: Father Chrysostomos <[email protected]>
Date: Wed Jul 24 14:23:54 2013 -0700
Stop minlen regexp optimisation from rejecting long strings
This fixes #112790 and part of #116907.
The length of the string is cast to I32, so it wraps and ends up less
than the minimum length.
For now, simply skip this optimisation if minlen itself wraps and
becomes negative.
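The wrap described above can be reproduced in isolation. This is a sketch under assumptions, not the code in pp_hot.c: wrap_to_i32, too_short_wrapping and too_short_guarded are hypothetical names, int32_t stands in for I32, and the guarded variant also compares in the unwrapped domain, which goes slightly beyond what the commit message states.

```c
#include <stdint.h>

/* Mimic a wrapping cast to a 32-bit signed value without relying on
 * implementation-defined conversion: reduce modulo 2**32, then map the
 * upper half of the range onto negative numbers. */
int32_t wrap_to_i32(uint64_t n)
{
    uint32_t low = (uint32_t)n;            /* well-defined: n mod 2**32 */
    return low <= INT32_MAX ? (int32_t)low
                            : (int32_t)((int64_t)low - 4294967296LL);
}

/* Wrap-prone check: returns 1 when the string is rejected as too short. */
int too_short_wrapping(uint64_t len, int32_t minlen)
{
    return wrap_to_i32(len) < minlen;
}

/* Guarded check: skip the optimisation when minlen is negative,
 * otherwise compare against the unwrapped length. */
int too_short_guarded(uint64_t len, int32_t minlen)
{
    if (minlen < 0)
        return 0;                          /* optimisation skipped */
    return len < (uint64_t)minlen;
}
```

With a length of 2**31 + 10 and a minimum length of 3, the wrapping check rejects the string even though it is far longer than required, while the guarded check lets it through.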
M MANIFEST
M pp_hot.c
A t/bigmem/regexp.t
commit 7ebce98bf139a89fd2db021cf837c59c869699ee
Author: Father Chrysostomos <[email protected]>
Date: Tue Jul 23 13:15:34 2013 -0700
Stop pos() from being confused by changing utf8ness
The value of pos() is stored as a byte offset. If it is stored on a
tied variable or a reference (or glob), then the stringification could
change, resulting in pos() now pointing to a different character
offset or pointing to the middle of a character:
$ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, a; print pos $x'
2
$ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, "\x{1000}"; print pos $x'
Malformed UTF-8 character (unexpected end of string) in match position at -e line 1.
0
So pos() should be stored as a character offset.
The regular expression engine expects byte offsets always, so allow it
to store bytes when possible (a pure non-magical string) but use
characters otherwise.
This does result in more complexity than I should like, but the
alternative (always storing a character offset) would slow down regular
expressions, which is a big no-no.
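The byte-versus-character distinction can be made concrete with a short sketch. These helpers are hypothetical and assume a well-formed UTF-8 buffer; they are not the conversion routines perl uses internally.

```c
#include <stddef.h>

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
static int is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Returns 1 if byte_off falls on a character boundary of the
 * well-formed UTF-8 buffer s of length len. */
int on_char_boundary(const unsigned char *s, size_t len, size_t byte_off)
{
    if (byte_off >= len)
        return byte_off == len;
    return !is_continuation(s[byte_off]);
}

/* Number of characters that start before byte_off. */
size_t bytes_to_chars(const unsigned char *s, size_t len, size_t byte_off)
{
    size_t chars = 0;
    for (size_t i = 0; i < byte_off && i < len; i++)
        if (!is_continuation(s[i]))
            chars++;
    return chars;
}

/* Byte offset at which the char_off-th character starts, or len if the
 * buffer has fewer characters than that. */
size_t chars_to_bytes(const unsigned char *s, size_t len, size_t char_off)
{
    size_t i = 0;
    while (i < len && char_off > 0) {
        i++;                                   /* step over the lead byte */
        while (i < len && is_continuation(s[i]))
            i++;                               /* and its continuation bytes */
        char_off--;
    }
    return i;
}
```

For the three-byte buffer C4 80 62 (U+0100 followed by "b"), byte offset 1 lands in the middle of the first character, whereas character offset 1 maps cleanly to byte offset 2. A stored character offset stays meaningful when the encoding of the string changes; a stored byte offset does not.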
M dump.c
M embed.fnc
M embed.h
M ext/Devel-Peek/t/Peek.t
M inline.h
M mg.c
M mg.h
M pp.c
M pp_ctl.c
M pp_hot.c
M proto.h
M regexec.c
M regexp.h
M sv.c
M sv.h
M t/op/pos.t
-----------------------------------------------------------------------
--
Perl5 Master Repository