In perl.git, the branch blead has been updated

<http://perl5.git.perl.org/perl.git/commitdiff/7dcb3b25fc4113f0eeb68d0d3c47ccedd5ff3f2a?hp=cfbab81b96edaf7de871d0fa306f1723e15a56d7>

- Log -----------------------------------------------------------------
commit 7dcb3b25fc4113f0eeb68d0d3c47ccedd5ff3f2a
Author: Karl Williamson <k...@khw-desktop.(none)>
Date:   Tue Apr 13 21:25:36 2010 -0600

    PATCH: [perl #72998] regex looping
    
    If a character folds to multiple ones in case-insensitive matching,
    it should not match just one of those, or the regular expression can
    loop.  For example, \N{LATIN SMALL LIGATURE FF} folds to 'ff', and so
        "\N{LATIN SMALL LIGATURE FF}" =~ /f+/i
    should match.  Prior to this patch, this function returned that there is
    a match, but left the matching string  pointer at the beginning of the
    "\N{LATIN SMALL LIGATURE FF}" because it doesn't make sense to match
    just half a character, and at this level it doesn't know about the '+'.
    This leaves things in an inconsistent state, with the reporting of a
    match, but the input pointer unchanged, the result of which is a loop.
    
    I don't know how to fix this so that it correctly matches, and there are
    semantic issues with doing so.  For example, if
        "\N{LATIN SMALL LIGATURE FF}" =~ /ff/i
    matches, then one would think that so should
        "\N{LATIN SMALL LIGATURE FF}" =~ /(f)(f)/i
    But $1 and $2 don't really make sense here, since they both refer to the
    half of the same character.
    
    So this patch just returns failure if only a partial character is
    matched.  That leaves things consistent, and solves the problem of
    looping, so that Perl doesn't hang on such a construct, but leaves the
    ultimate solution for another day.
-----------------------------------------------------------------------

Summary of changes:
 t/re/re.t |   10 +++++++++-
 utf8.c    |    3 ++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/t/re/re.t b/t/re/re.t
index 87965f2..249c6dd 100644
--- a/t/re/re.t
+++ b/t/re/re.t
@@ -51,6 +51,14 @@ if ('1234'=~/(?:(?<A>\d)|(?<C>!))(?<B>\d)(?<A>\d)(?<B>\d)/){
     }
     is(regnames_count(),3);
 }
+
+    { # Keep this test last, as whole script will be interrupted if times out
+        # Bug #72998; this can loop 
+        watchdog(2);
+        eval '"\x{100}\x{FB00}" =~ /\x{100}\N{U+66}+/i';
+        pass("Didn't loop");
+    }
+
 # New tests above this line, don't forget to update the test count below!
-BEGIN { plan tests => 18 }
+BEGIN { plan tests => 19 }
 # No tests here!
diff --git a/utf8.c b/utf8.c
index 9ed0663..1a6077c 100644
--- a/utf8.c
+++ b/utf8.c
@@ -2609,7 +2609,8 @@ Perl_ibcmp_utf8(pTHX_ const char *s1, char **pe1, 
register UV l1, bool u1, const
 
      /* A match is defined by all the scans that specified
       * an explicit length reaching their final goals. */
-     match = (f1 == 0 || p1 == f1) && (f2 == 0 || p2 == f2);
+     match = (n1 == 0 && n2 == 0    /* Must not match partial char; Bug #72998 
*/
+            && (f1 == 0 || p1 == f1) && (f2 == 0 || p2 == f2));
 
      if (match) {
          if (pe1)

--
Perl5 Master Repository

Reply via email to