In perl.git, the branch sprout/regexp has been created

<http://perl5.git.perl.org/perl.git/commitdiff/36dac416e8fff5e04b6db0eab5d6620349661e7a?hp=0000000000000000000000000000000000000000>

        at  36dac416e8fff5e04b6db0eab5d6620349661e7a (commit)

- Log -----------------------------------------------------------------
commit 36dac416e8fff5e04b6db0eab5d6620349661e7a
Author: Father Chrysostomos <[email protected]>
Date:   Tue Jul 30 23:49:58 2013 -0700

    Stop substr re optimisation from rejecting long strs
    
    Using I32 for the fields that record information about the location of
    a fixed string that must be found for a regular expression to match
    can result in match failures, because I32 is not large enough to store
    offsets >= 2**31.
    
    SSize_t is appropriate, since it is 64 bits on 64-bit platforms and 32
    bits on 32-bit platforms.
    
    This commit changes enough instances of I32 to SSize_t to get the added
    test passing and suppress compiler warnings.  A later commit will
    change many more.
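
The range problem described above can be sketched in a few lines of C. This is an illustration only, not code from the commit: `demo_i32` and `demo_ssize` are hypothetical stand-ins for perl's I32 and SSize_t as they would look on a 64-bit platform, and the helper names are invented for the example.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical stand-ins for I32 and for SSize_t on a 64-bit platform. */
typedef int32_t demo_i32;
typedef int64_t demo_ssize;

/* True if an offset can be stored in a 32-bit signed field unchanged. */
bool fits_in_i32(demo_ssize offset)
{
    return offset >= INT32_MIN && offset <= INT32_MAX;
}

/* 2**31: the smallest offset the commit message says I32 cannot hold. */
demo_ssize first_bad_offset(void)
{
    return (demo_ssize)1 << 31;
}
```

An offset of 2**31 - 1 still fits in the 32-bit field; 2**31 does not, which is why a fixed substring located that far into a string could not be recorded correctly.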

M       embed.fnc
M       perl.h
M       proto.h
M       regcomp.c
M       regexec.c
M       regexp.h
M       t/bigmem/regexp.t

commit 317d2acae720f943cda8f653a51ff9020a35d466
Author: Father Chrysostomos <[email protected]>
Date:   Sat Jul 27 07:10:08 2013 -0700

    regexec.c:regexec_flags: Remove unused var
    
    This variable, end_shift, only ever holds the value 0.  It is used in
    only one place, where its value is assigned to another variable.
    
    The history is interesting:
    
    start_shift and end_shift were added in c277df42229, which added sub-
    string optimisations.  146174a91a192 rearranged stuff a bit, causing
    these variables to be unused.  155aba94f6 commented out the unused
    start_shift to suppress a compiler warning.  1df70142a966e removed
    the start_shift comment, leaving the comment describing end_shift
    nonsensical.

M       regexec.c

commit 29ab6e9e151f71edbce8d065ab67696fe5e6960e
Author: Father Chrysostomos <[email protected]>
Date:   Fri Jul 26 06:25:36 2013 -0700

    Remove null check from mg.c:magic_getvec
    
    lsv can never be null here.  This null check has been here since
    vec’s get-magic was added in ae389c8a or 6ff81951f7.

M       mg.c

commit 604d47281585ff8dfc3dfa96cf2f8da0d5ec4bf6
Author: Father Chrysostomos <[email protected]>
Date:   Fri Jul 26 01:26:54 2013 -0700

    Fix assert fail when fetching pos clobbers ref with undef
    
    pos($x) returns a special magical scalar that sets the match position
    on $x.  Calling pos($x) twice will provide two such scalars.  If we
    set one of them to a reference, set the other to undef, and then read
    the first, all hail breaks loose, because of the use of SvOK_off.
    
    SvOK_off is not sufficient if arbitrary values can be assigned by Perl
    code.  Globs, refs and regexps (among others) need special handling,
    which sv_setsv knows how to do.

M       mg.c
M       t/op/pos.t

commit efbdd4f51c13e387c1560e8380f93c40fe7aced9
Author: Father Chrysostomos <[email protected]>
Date:   Thu Jul 25 23:09:58 2013 -0700

    Fix assertion failure with $#a=\1
    
    If the array has been freed and a reference is then assigned to
    the arylen scalar and then get-magic is called on that scalar,
    Perl_magic_getarylen misbehaves.  SvOK_off is not sufficient if
    arbitrary values can be assigned by Perl code.  Globs, refs and
    regexps (among others) need special handling, which sv_setsv
    knows how to do.

M       mg.c
M       t/op/array.t

commit 1bbb99ec8435cf81dbc4895988a73ca2e5ccf937
Author: Father Chrysostomos <[email protected]>
Date:   Thu Jul 25 18:08:23 2013 -0700

    Stop values from ‘sticking’ to @- and @+ elems
    
    These arrays are very similar to tied arrays, in that the elements are
    created on the fly when looked up.  So push @_, \$+[0], \$+[0], will
    push references to two different scalars on to @_.
    
    That they are created on the fly prevents this bug from showing up
    in most code:  If you reference the element you can observe that, on
    FETCH, it gets set to the corresponding offset *if* the last match has
    a set of capturing parentheses with the right number.  Otherwise, the
    value in the element is left as-is.
    
    So, doing another pattern match with, say, 5 captures and then another
    with fewer will leave $+[5] and $-[5] holding values from the first
    match, if there is a FETCH in between the two matches:
    
    $ perl -le '"  "=~/()()()()(..)/; $_ = \$+[5]; print $$_; ""=~ /()/; print $$_;'
    2
    2
    
    And attempts at assignment will succeed, even though they croak:
    
    $ perl -le 'for ($-[0]) { eval { $_ = *foo }; print $_ }'
    *main::foo
    
    The solution here is to make the magic ‘get’ handler set the SV
    no matter what, instead of just setting it when it refers to a
    valid offset.
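
The difference between the old and new 'get' behaviour can be modelled with a toy sketch. Nothing here is taken from mg.c: `demo_scalar`, `demo_match`, `fetch_sticky` and `fetch_always` are hypothetical names, and the real handler works on SVs and regexp structures rather than these simplified structs.

```c
#include <stdbool.h>

/* Toy model of a scalar: either undefined or holding an offset. */
typedef struct { bool defined; long value; } demo_scalar;

/* Toy model of the last match: end offsets for groups 0..ncaptures. */
typedef struct { int ncaptures; long end[8]; } demo_match;

/* Old behaviour: write only when the index is valid, so a stale
   value survives a later match that has fewer captures. */
void fetch_sticky(demo_scalar *sv, const demo_match *m, int i)
{
    if (i >= 0 && i <= m->ncaptures) {
        sv->defined = true;
        sv->value = m->end[i];
    }
}

/* New behaviour: set the scalar no matter what, clearing it when
   the index does not refer to a capture in the last match. */
void fetch_always(demo_scalar *sv, const demo_match *m, int i)
{
    if (i >= 0 && i <= m->ncaptures) {
        sv->defined = true;
        sv->value = m->end[i];
    } else {
        sv->defined = false;
        sv->value = 0;
    }
}
```

With a first match of five captures and a second of one, `fetch_sticky` leaves element 5 holding the old offset, while `fetch_always` resets it to undefined.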

M       mg.c
M       t/re/pat.t

commit 80015b6c7696731a12388637c057f572dbc904a3
Author: Father Chrysostomos <[email protected]>
Date:   Thu Jul 25 16:52:59 2013 -0700

    Make @- and @+ return correct offsets beyond 2**31

M       mg.c
M       t/bigmem/regexp.t

commit 7eae411efcf8191cd02cc50afe86fe4eb224e965
Author: Father Chrysostomos <[email protected]>
Date:   Thu Jul 25 16:41:01 2013 -0700

    Make $' work past the 2**31 threshold

M       regcomp.c
M       regexp.h
M       t/bigmem/regexp.t

commit 24788a74c2c30a9fc25da235576703e9a9ad3931
Author: Father Chrysostomos <[email protected]>
Date:   Thu Jul 25 00:41:07 2013 -0700

    [perl #116907] Allow //g matching past 2**31 threshold
    
    Change the internal fields for storing positions so that //g in scalar
    context can move past the 2**31 character threshold.  Before this com-
    mit, the numbers would wrap, resulting in assertion failures.
    
    The changes in this commit are only enough to get the added test pass-
    ing.  Stay tuned for more.

M       embed.fnc
M       pp_hot.c
M       proto.h
M       regcomp.c
M       regexec.c
M       regexp.h
M       t/bigmem/regexp.t

commit 3cef77f025431cbd4df8413a61b79b23170f1028
Author: Father Chrysostomos <[email protected]>
Date:   Wed Jul 24 18:14:06 2013 -0700

    pp_hot.c: Show lengths in -Dr output for minlen optimisation

M       pp_hot.c

commit a26e09358e1cabbfce2ad099055165c2b1361362
Author: Father Chrysostomos <[email protected]>
Date:   Wed Jul 24 14:23:54 2013 -0700

    Stop minlen regexp optimisation from rejecting long strings
    
    This fixes #112790 and part of #116907.
    
    The length of the string is cast to I32, so it wraps and ends up less
    than the minimum length.
    
    For now, simply skip this optimisation if minlen itself wraps and
    becomes negative.
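
A rough sketch of how the narrowed length defeats the check, assuming 32-bit two's-complement wrapping. The function names are hypothetical and this is not the code in pp_hot.c; the wrap is emulated explicitly so the example does not rely on implementation-defined conversions.

```c
#include <stdint.h>
#include <stdbool.h>

/* Emulate narrowing a length to a 32-bit signed value
   (two's-complement wrap), using only well-defined arithmetic. */
int64_t wrap_to_i32(uint64_t len)
{
    uint32_t low = (uint32_t)len;   /* reduction modulo 2**32 */
    return low < 0x80000000u ? (int64_t)low
                             : (int64_t)low - 4294967296LL;
}

/* Problematic check: compares the wrapped length against minlen. */
bool too_short_narrow(uint64_t len, int32_t minlen)
{
    return wrap_to_i32(len) < minlen;
}

/* Full-width check: the length is never narrowed. */
bool too_short_wide(uint64_t len, int32_t minlen)
{
    return minlen >= 0 && len < (uint64_t)minlen;
}
```

For a string of 2**31 + 5 bytes and a minimum length of 3, the narrowed comparison sees a negative length and rejects the string, while the full-width comparison does not.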

M       MANIFEST
M       pp_hot.c
A       t/bigmem/regexp.t

commit 7ebce98bf139a89fd2db021cf837c59c869699ee
Author: Father Chrysostomos <[email protected]>
Date:   Tue Jul 23 13:15:34 2013 -0700

    Stop pos() from being confused by changing utf8ness
    
    The value of pos() is stored as a byte offset.  If it is stored on a
    tied variable or a reference (or glob), then the stringification could
    change, resulting in pos() now pointing to a different character off-
    set or pointing to the middle of a character:
    
    $ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, a; print pos $x'
    2
    $ ./perl -Ilib -le '$x = bless [], chr 256; pos $x=1; bless $x, "\x{1000}"; print pos $x'
    Malformed UTF-8 character (unexpected end of string) in match position at -e line 1.
    0
    
    So pos() should be stored as a character offset.
    
    The regular expression engine expects byte offsets always, so allow it
    to store bytes when possible (a pure non-magical string) but use char-
    acters otherwise.
    
    This does result in more complexity than I should like, but the alter-
    native (always storing a character offset) would slow down regular
    expressions, which is a big no-no.
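
Why a stored byte offset goes stale when the stringification changes can be shown with a small UTF-8 sketch. The helpers below are hypothetical and assume well-formed UTF-8 input; they are not perl's own conversion routines.

```c
#include <stddef.h>
#include <stdbool.h>

/* A byte starts a character unless it is a UTF-8
   continuation byte (bit pattern 10xxxxxx). */
static bool starts_char(unsigned char b)
{
    return (b & 0xC0) != 0x80;
}

/* Byte offset of character number `chars` in well-formed UTF-8. */
size_t char_to_byte(const unsigned char *s, size_t len, size_t chars)
{
    size_t i = 0;
    while (i < len && chars > 0) {
        i++;                                  /* step past the lead byte */
        while (i < len && !starts_char(s[i]))
            i++;                              /* skip continuation bytes */
        chars--;
    }
    return i;
}

/* True if a byte offset falls on a character boundary. */
bool on_char_boundary(const unsigned char *s, size_t len, size_t off)
{
    return off == len || (off < len && starts_char(s[off]));
}
```

Character offset 1 is byte offset 2 when the string starts with U+0100 (two bytes), but byte offset 2 lands in the middle of U+1000 (three bytes), which matches the malformed-character warning in the second example above.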

M       dump.c
M       embed.fnc
M       embed.h
M       ext/Devel-Peek/t/Peek.t
M       inline.h
M       mg.c
M       mg.h
M       pp.c
M       pp_ctl.c
M       pp_hot.c
M       proto.h
M       regexec.c
M       regexp.h
M       sv.c
M       sv.h
M       t/op/pos.t
-----------------------------------------------------------------------

--
Perl5 Master Repository
