[perl.git] branch maint-5.20, updated. v5.20.1-75-gff0890f

Steve Hay Sat, 10 Jan 2015 06:13:43 -0800

In perl.git, the branch maint-5.20 has been updated

<http://perl5.git.perl.org/perl.git/commitdiff/ff0890fe397087cb96ed5cab0e5a62eaced63858?hp=48ef8337fed3e33960323ad60ce1b5d3396c56e8>


- Log -----------------------------------------------------------------
commit ff0890fe397087cb96ed5cab0e5a62eaced63858
Author: Steve Hay <[email protected]>
Date:   Sat Jan 10 13:18:24 2015 +0000

    Fix typo in previous commit
    
    Manually backported from 605eee6061b9fb79ab7be45ac13eaf417fd8db4f.

M       pod/perldiag.pod

commit b8271aef8d2cc621e8de9b6a5c90d4c203b1a4ae
Author: Karl Williamson <[email protected]>
Date:   Thu Nov 27 22:29:36 2014 -0700

    perldiag: Add missing entry
    
    (cherry picked from commit faad849dd7af72f2db0a94098b4c462d3e573f5f)

M       pod/perldiag.pod

commit c53c18b37582ea0d3309c034dc67c4c337838b53
Author: Father Chrysostomos <[email protected]>
Date:   Sun Dec 14 14:36:56 2014 -0800

    perldelta for 487e470d / #123410
    
    (cherry picked from commit 070733dfda1bb0d07d337128db5d1e78f14e6562)

M       pod/perldelta.pod

commit 6ecf4c9b3d3ceee25646d9ff9cce633e5a8acde1
Author: Father Chrysostomos <[email protected]>
Date:   Thu Dec 11 18:01:56 2014 -0800

    [perl #123410] sort CORE::fake bizarre behaviour
    
    This commit:
    
    commit 01b5ef509f2ebf466fd7de2c1e7406717bb14332
    Author: Father Chrysostomos <[email protected]>
    Date:   Fri Jun 7 20:16:23 2013 -0700
    
        [perl #24482] Fix sort and require to treat CORE:: as keyword
    
    caused sort CORE::lc "FOO" to be equivalent to sort +CORE::lc "FOO",
    the way it does if a keyword is not preceded by CORE::.  But it made
    the mistake of chopping off the last six characters if what follows
    CORE:: is not a keyword.
    
    So
    
        sort CORE::f @_
    
    became equivalent to
    
        sort C @_
    
    !
    
    This commit just reverts to the previous behaviour in such cases.
    
    (cherry picked from commit 487e470dbd7a885bb6a92a735b2783e1c6740066)

M       t/op/sort.t
M       toke.c
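
The length bug described in this commit message can be reproduced in isolation. What follows is a minimal C sketch, not the toke.c code: is_keyword, sub_name_len_buggy and sub_name_len_fixed are hypothetical names, and the three-entry keyword list is only a stand-in for Perl's real keyword table. Each function returns the length of the word that would be used as the sort sub name, or 0 when the word is a keyword.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for Perl's keyword lookup. */
static int is_keyword(const char *s, size_t len)
{
    static const char *const kw[] = { "lc", "uc", "reverse" };
    for (size_t i = 0; i < sizeof kw / sizeof kw[0]; i++)
        if (strlen(kw[i]) == len && memcmp(kw[i], s, len) == 0)
            return 1;
    return 0;
}

/* Buggy variant: the keyword check reuses and clobbers the word length. */
static size_t sub_name_len_buggy(const char *word)
{
    assert(word != NULL);
    size_t len = strlen(word);
    const char *s2 = word;
    if (len > 6 && strncmp(s2, "CORE::", 6) == 0)
        s2 += 6, len -= 6;      /* len no longer describes the whole word */
    if (is_keyword(s2, len))
        return 0;
    return len;                 /* six too short if the prefix was present */
}

/* Fixed variant: a separate length is used for the keyword check only. */
static size_t sub_name_len_fixed(const char *word)
{
    assert(word != NULL);
    size_t len = strlen(word);
    size_t len2 = len;
    const char *s2 = word;
    if (len > 6 && strncmp(s2, "CORE::", 6) == 0)
        s2 += 6, len2 -= 6;
    if (is_keyword(s2, len2))
        return 0;
    return len;                 /* the full word, prefix included */
}
```

With these definitions, sub_name_len_buggy("CORE::f") yields 1 (just "C"), whereas sub_name_len_fixed("CORE::f") yields 7.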

commit b58dfac208eabad2bba6ad7680218c4fee116e16
Author: Karl Williamson <[email protected]>
Date:   Sat Jan 10 13:17:03 2015 +0000

    PATCH: [perl #123539] regcomp.c node overrun/segfault
    
    This is a minimal patch suitable for a maintenance release.  It extracts
    the guts of reguni and REGC without the conditional they have.  The next
    commit will do some refactoring to avoid branching and to make things
    clearer.
    
    This bug is due to the current two pass structure of the Perl regular
    expression compiler.  The first pass attempts to do just enough work to
    figure out how much space to malloc for the compiled pattern; and the
    2nd pass actually fills in the details.  One problem with this design is
    that in many cases quite a bit of work is required to figure out the
    size, and this work is thrown away and redone in the second pass.
    Another problem is that it is easy to forget to do enough work in the
    sizing pass, and that is what happened with the blamed commit.  I
    understand that there are plans (God speed) to change the compiler
    design.
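
The two-pass structure described above can be illustrated with a toy compiler. This is a sketch under stated assumptions, not regcomp.c: toy_compile_pass, toy_compile and the one-length-byte node layout are all hypothetical. The point it shows is that the sizing pass and the emitting pass must walk exactly the same logic, or the buffer ends up the wrong size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NODE_MAX 255   /* node length is stored in one unsigned byte */

/* Lays out `pat` as a series of [length byte][bytes...] nodes.
 * When `out` is NULL this is the sizing pass: nothing is written and
 * only the required size is returned. */
static size_t toy_compile_pass(const char *pat, unsigned char *out)
{
    size_t remaining = strlen(pat);
    size_t size = 0;
    while (remaining > 0) {
        size_t n = remaining < NODE_MAX ? remaining : NODE_MAX;
        if (out) {
            out[size] = (unsigned char)n;
            memcpy(out + size + 1, pat, n);
        }
        size += 1 + n;
        pat += n;
        remaining -= n;
    }
    return size;
}

/* Pass 1 sizes the program, pass 2 fills it in.  Returns NULL if
 * allocation fails; the caller frees the result. */
static unsigned char *toy_compile(const char *pat, size_t *size_out)
{
    size_t size = toy_compile_pass(pat, NULL);        /* pass 1 */
    unsigned char *prog = malloc(size ? size : 1);
    if (!prog)
        return NULL;
    size_t written = toy_compile_pass(pat, prog);     /* pass 2 */
    assert(written == size);   /* the passes must agree on the size */
    *size_out = size;
    return prog;
}
```

If pass 1 took a shortcut that pass 2 did not, the assertion would fail; in a real compiler without such a check the result would be a buffer overrun.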
    
    When not under /i matching, the size of a node that will match a
    sequence of characters is just the number of bytes those characters take
    up.  We have an easy way to calculate the number of bytes any code point
    will occupy in UTF-8, and it's just 1 byte per code point for non-UTF-8.
    So in the sizing pass, we don't actually have to figure out the
    representation of the characters.  However under /i matching, we do.
    First of all, matching of UTF-8 strings is done by replacing each
    character of each string by its fold-case (function fc()) and then
    comparing.  This is required by the nature of full Unicode matching
    which is not 1-1.  If we do that replacement for the pattern at compile
    time, we avoid having to do it over-and-over as pattern matching
    backtracks at execution.  And because fc(x) may not occupy the same
    number of bytes as x, and there is no easy way to know that size without
    actually doing the fc(), we have to do the fold in the sizing pass.
    Now, there are relatively few folds where sizeof(fc(x)) != sizeof(x), so
    we could construct an exception table for those few cases where the
    sizes differ, and look the size up there.
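
To make the size argument concrete, here is a small C sketch. It assumes standard UTF-8 limited to U+10FFFF (Perl's UNISKIP also covers larger values), and utf8_len, fold_sample, samples and folded_len are hypothetical names. The fold data is a hand-picked handful of entries from Unicode's full case folding, not a complete table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to encode one code point in standard UTF-8. */
static size_t utf8_len(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80)    return 1;
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* One code point and its full case fold (up to three code points). */
struct fold_sample {
    uint32_t cp;
    uint32_t folded[3];
    size_t   n;
};

static const struct fold_sample samples[] = {
    { 0x212A, { 0x006B },               1 }, /* KELVIN SIGN -> k */
    { 0x023A, { 0x2C65 },               1 }, /* A WITH STROKE -> U+2C65 */
    { 0x0130, { 0x0069, 0x0307 },       2 }, /* I WITH DOT ABOVE -> i, U+0307 */
    { 0xFB03, { 0x0066, 0x0066, 0x0069 }, 3 }, /* LIGATURE FFI -> f f i */
};

/* Total UTF-8 bytes occupied by the folded form of a sample. */
static size_t folded_len(const struct fold_sample *f)
{
    size_t total = 0;
    for (size_t i = 0; i < f->n; i++)
        total += utf8_len(f->folded[i]);
    return total;
}
```

In these samples the fold shrinks U+212A from 3 bytes to 1, grows U+023A and U+0130 from 2 bytes to 3, and leaves U+FB03 at 3 bytes, so the unfolded size alone does not predict the folded size.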
    
    But there is another reason that we have to fold in the sizing pass.
    And that is because of the potential for multi-character folds being
    split across regnodes.  The regular expression compiler generates
    EXACTish regnodes for matching sequences of characters exactly or via
    /i.  The limit for how many bytes in a sequence such a node can match is
    255 because the length is stored in a U8.  If the pattern has a sequence
    longer than that, it is split into two or more EXACTish nodes in a row.
    (Actually, the compiler splits at a size much lower than that; I'm not
    sure why, but then two adjoining nodes whose total sum length is at most
    255 get joined later in the third, optimizing pass.)  Now consider
    matching the character U+FB03 LATIN SMALL LIGATURE FFI.  It matches the
    sequence of the three characters "f f i".  Because of the design of the
    regex pattern matching code, if these characters are such that the first
    one or two are at the end of one EXACTish node, and the final two or one
    are in another EXACTish node, then U+FB03 wrongly would not match them.
    Matches can't cross node boundaries.  If the pattern were tweaked so all
    three characters were in either the first or second node, then the match
    would succeed.  And that is what the compiler does.  When it reaches the
    node's size limit, and the final character is one that is a non-terminal
    character in a multi-char fold, what's in the node is backed-off until
    it ends with a character without this characteristic.  This has to be
    done in the sizing pass, as we are repacking the nodes, which can affect
    the size of the pattern, and we have to know what the folds are in order
    to determine all this.
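
The back-off rule described above can be sketched as follows. This is an ASCII-only illustration, not the regcomp.c logic: is_nonterminal and next_node_len are hypothetical names, and treating only 'f' and 's' as non-terminal characters is a simplification standing in for the real multi-char fold data. Splitting at the limit when every candidate character is non-terminal is this sketch's own choice.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified assumption: 'f' and 's' may be followed by more characters
 * of a multi-char fold such as "ff", "ffi" or "ss". */
static int is_nonterminal(char c)
{
    return c == 'f' || c == 's';
}

/* Returns how many of the `remaining` characters at `s` go into the
 * next node, given that a node holds at most `limit` characters. */
static size_t next_node_len(const char *s, size_t remaining, size_t limit)
{
    assert(limit > 0);
    if (remaining <= limit)
        return remaining;          /* everything fits; no split needed */

    size_t n = limit;
    /* Back off until the node no longer ends in a character that could
     * be continued by the start of the following node. */
    while (n > 0 && is_nonterminal(s[n - 1]))
        n--;

    return n > 0 ? n : limit;      /* nothing safe to end on: give up */
}
```

For the input "abcffi" with a limit of 4, the first node gets "abc" rather than "abcf", which keeps "ffi" together in the second node.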
    
    (We don't fold non-UTF-8 patterns.  This is for two reasons.  One is
    that one character, the U+00B5 MICRO SIGN, folds to above-Latin1, and if
    we folded it, we would have to change the pattern into UTF-8, and that
    would slow everything down.  I've thought about adding a regnode type
    for the much more common case of a sequence that doesn't have this
    character in it, and which could hence be folded at compile time.  But
    I've not been able to justify this because of the 2nd reason, which is
    folds in this range are simple enough to be handled by an array lookup,
    so folding is fast at runtime.)
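
As an illustration of the array lookup mentioned above, here is a minimal C sketch of a 256-entry fold table for code points 0 to 255. It is not Perl's own table: latin1_fold, init_latin1_fold and fold_latin1 are hypothetical names, and an ASCII-compatible character set is assumed. MICRO SIGN (0xB5) and SHARP S (0xDF) are left mapping to themselves here, since their full folds do not fit in a one-byte table.

```c
#include <assert.h>

static unsigned char latin1_fold[256];

/* Fills the table with the simple 1-1 folds in the range 0..255. */
static void init_latin1_fold(void)
{
    for (int i = 0; i < 256; i++)
        latin1_fold[i] = (unsigned char)i;
    for (int i = 'A'; i <= 'Z'; i++)
        latin1_fold[i] = (unsigned char)(i + 0x20);
    for (int i = 0xC0; i <= 0xDE; i++)
        if (i != 0xD7)             /* MULTIPLICATION SIGN has no case */
            latin1_fold[i] = (unsigned char)(i + 0x20);
    assert(latin1_fold[0xD7] == 0xD7);
}

/* Folding at match time is then a single array access. */
static unsigned char fold_latin1(unsigned char c)
{
    return latin1_fold[c];
}
```

After init_latin1_fold() has run once, fold_latin1('A') gives 'a' and fold_latin1(0xC9) gives 0xE9.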
    
    Then there is the complication of matching under locale rules.  This bug
    manifested itself only under /l matching.  We can't fold at pattern
    compile time, because the folding rules won't be known until runtime.
    This isn't a problem for non-UTF-8 locales, as all folds are 1-1, and so
    there never will be a multi-char fold.  But there could be such folds in
    a UTF-8 locale, so the regnodes have to be packed to work for that
    eventuality.  The blamed commit did not do that, and because this issue
    doesn't arise unless there is a string long enough to trigger the
    problem, this wasn't found until now.  What is needed, and what this
    commit does, is for the unfolded characters to be accumulated in both
    passes.  The code that looks for potential multi-char fold issues
    handles both folded and unfolded inputs, so will work.
    
    (cherry picked from commit 405dffcb17b9cc9d0e5d7b41835b998ca7f1d873)

M       regcomp.c
M       t/re/pat.t
-----------------------------------------------------------------------

Summary of changes:
 pod/perldelta.pod |  5 +++++
 pod/perldiag.pod  |  2 ++
 regcomp.c         | 16 +++++++++++++++-
 t/op/sort.t       |  4 +++-
 t/re/pat.t        |  7 ++++++-
 toke.c            |  5 +++--
 6 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/pod/perldelta.pod b/pod/perldelta.pod
index 34502e8..1273e67 100644
--- a/pod/perldelta.pod
+++ b/pod/perldelta.pod
@@ -410,6 +410,11 @@ Calling C<write> on a format with a C<^**> field could produce a panic
 in sv_chop() if there were insufficient arguments or if the variable
 used to fill the field was empty.  [perl #123245]
 
+In perl 5.20.0, C<sort CORE::fake> where 'fake' is anything other than a
+keyword started chopping of the last 6 characters and treating the result
+as a sort sub name.  The previous behaviour of treating "CORE::fake" as a
+sort sub name has been restored.  [perl #123410]
+
 =back
 
 =head1 Known Problems
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index e4b861b..d669168 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -3082,6 +3082,8 @@ can vary from one line to the next.
 
 (F) Missing right brace in C<\x{...}>, C<\p{...}>, C<\P{...}>, or C<\N{...}>.
 
+=item Missing right brace on \N{}
+
 =item Missing right brace on \N{} or unescaped left brace after \N
 
 (F) C<\N> has two meanings.
diff --git a/regcomp.c b/regcomp.c
index 98b69b0..573072a 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -12001,7 +12001,17 @@ tryagain:
                         && is_PROBLEMATIC_LOCALE_FOLD_cp(ender)))
                 {
                     if (UTF) {
-                        const STRLEN unilen = reguni(pRExC_state, ender, s);
+
+                        /* Normally, we don't need the representation of the
+                         * character in the sizing pass--just its size, but if
+                         * folding, we have to actually put the character out
+                         * even in the sizing pass, because the size could
+                         * change as we juggle things at the end of this loop
+                         * to avoid splitting a too-full node in the middle of
+                         * a potential multi-char fold [perl #123539] */
+                        const STRLEN unilen = (SIZE_ONLY && ! FOLD)
+                                               ? UNISKIP(ender)
+                                               : (uvchr_to_utf8((U8*)s, ender) - (U8*)s);
                         if (unilen > 0) {
                            s   += unilen;
                            len += unilen;
@@ -12014,6 +12024,10 @@ tryagain:
                          * cancel out the increment that follows */
                         len--;
                     }
+                    else if (FOLD) {
+                        /* See comment above for [perl #123539] */
+                        *(s++) = (char) ender;
+                    }
                     else {
                         REGC((char)ender, s++);
                     }
diff --git a/t/op/sort.t b/t/op/sort.t
index dd60f97..151e3ea 100644
--- a/t/op/sort.t
+++ b/t/op/sort.t
@@ -6,7 +6,7 @@ BEGIN {
     require 'test.pl';
 }
 use warnings;
-plan( tests => 182 );
+plan( tests => 183 );
 
 # these shouldn't hang
 {
@@ -122,6 +122,8 @@ cmp_ok("@b",'eq','1 2 3 4','reverse then sort');
 @b = sort CORE::reverse (4,1,3,2);
 cmp_ok("@b",'eq','1 2 3 4','CORE::reverse then sort');
 
+eval  { @b = sort CORE::revers (4,1,3,2); };
+like($@, qr/^Undefined sort subroutine "CORE::revers" called at /);
 
 
 sub twoface { no warnings 'redefine'; *twoface = sub { $a <=> $b }; &twoface }
diff --git a/t/re/pat.t b/t/re/pat.t
index 51838f9..a9cb739 100644
--- a/t/re/pat.t
+++ b/t/re/pat.t
@@ -20,7 +20,7 @@ BEGIN {
     require './test.pl';
 }
 
-plan tests => 722;  # Update this when adding/deleting tests.
+plan tests => 724;  # Update this when adding/deleting tests.
 
 run_tests() unless caller;
 
@@ -1588,6 +1588,11 @@ EOP
         like("X", qr/$x/, "UTF-8 of /[x]/i matches upper case");
     }
 
+    {  # [perl #123539]
+        
like("TffffffffffffTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT5TTTTTTTTTTTTTTTTTTTTTTTTT3TTgTTTTTTTTTTTTTTTTTTTTT2TTTTTTTTTTTTTTTTTTTTTTTHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHiH
 ... [640 chars truncated]
+        
like("TffffffffffffT\x{100}TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT5TTTTTTTTTTTTTTTTTTTTTTTTT3TTgTTTTTTTTTTTTTTTTTTTTT2TTTTTTTTTTTTTTTTTTTTTTTHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
 ... [652 chars truncated]
+    }
+
 } # End of sub run_tests
 
 1;
diff --git a/toke.c b/toke.c
index 0028b18..5af680d 100644
--- a/toke.c
+++ b/toke.c
@@ -2268,9 +2268,10 @@ S_force_word(pTHX_ char *start, int token, int check_keyword, int allow_pack)
        s = scan_word(s, PL_tokenbuf, sizeof PL_tokenbuf, allow_pack, &len);
        if (check_keyword) {
          char *s2 = PL_tokenbuf;
+         STRLEN len2 = len;
          if (allow_pack && len > 6 && strnEQ(s2, "CORE::", 6))
-           s2 += 6, len -= 6;
-         if (keyword(s2, len, 0))
+           s2 += 6, len2 -= 6;
+         if (keyword(s2, len2, 0))
            return start;
        }
        start_force(PL_curforce);

--
Perl5 Master Repository
