Title: [233690] trunk
Revision: 233690
Author: [email protected]
Date: 2018-07-10 10:34:34 -0700 (Tue, 10 Jul 2018)

Log Message

YARR: . doesn't match non-BMP Unicode characters in some cases
https://bugs.webkit.org/show_bug.cgi?id=187248

Reviewed by Geoffrey Garen.

JSTests:

New regression test.

* stress/regexp-with-nonBMP-any.js: Added.

Source/JavaScriptCore:

The safety check in optimizeAlternative() for moving character classes that consist only of BMP
characters did not take into account that the character class may be used inverted.  This is how
we represent '.': as "not a newline", using the newline character class with an inverted check.
Although the newline class itself has no non-BMP characters, its inverse clearly includes them.
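To illustrate, here is a small JavaScript check of how '.' treats a non-BMP character. It assumes any ES2015-or-later engine and is not part of the commit. A non-BMP character is stored as a surrogate pair of two UTF-16 code units. With the u flag (as far as we can tell, the case in which YARR decodes surrogate pairs), '.' consumes both units as one character.

```javascript
// U+1D123 lies outside the BMP, so it is stored as a surrogate pair:
// two UTF-16 code units.
const clef = "\u{1D123}";
console.log(clef.length);        // 2

// Without the u flag, '.' matches a single code unit, so one '.'
// cannot span the whole surrogate pair, but two can.
console.log(/^.$/.test(clef));   // false
console.log(/^..$/.test(clef));  // true

// With the u flag, '.' ("not a line terminator") matches the whole
// code point, including non-BMP characters.
console.log(/^.$/u.test(clef));  // true
```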

The fix is to check that the character class doesn't have non-BMP characters AND it isn't an
inverted use of that character class.
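The corrected condition reduces to a small boolean predicate. The sketch below is a hypothetical model, not JSC code. The function name canMoveAfterFixedChars and its parameter names are made up for illustration; they mirror m_decodeSurrogatePairs, m_hasNonBMPCharacters and m_invert from the diff. The sketch deliberately omits the other conditions in the real check: a fixed-count quantifier, and a next term that is a fixed-count pattern character.

```javascript
// Hypothetical model of the BMP-only part of the safety check in
// YarrGenerator::optimizeAlternative(); illustration only.
function canMoveAfterFixedChars({ decodeSurrogatePairs, hasNonBMPCharacters, invert }) {
    // Before the fix, only the class's own contents were consulted:
    //     return !decodeSurrogatePairs || !hasNonBMPCharacters;
    // After the fix, an inverted use of the class (such as '.', "not a
    // newline") is also rejected, since its inverse covers non-BMP characters.
    return !decodeSurrogatePairs || (!hasNonBMPCharacters && !invert);
}

// '.' in a u-flag regexp: the newline class has no non-BMP members,
// but it is used inverted, so it must not be moved.
const dotUnicode = { decodeSurrogatePairs: true, hasNonBMPCharacters: false, invert: true };
console.log(canMoveAfterFixedChars(dotUnicode)); // false

// The same '.' when surrogate pairs are not decoded: every match is a
// single code unit, so moving it remains allowed.
console.log(canMoveAfterFixedChars({ ...dotUnicode, decodeSurrogatePairs: false })); // true

// A BMP-only, non-inverted class (e.g. [a-z]) with surrogate decoding on.
console.log(canMoveAfterFixedChars({ decodeSurrogatePairs: true, hasNonBMPCharacters: false, invert: false })); // true
```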

* yarr/YarrJIT.cpp:
(JSC::Yarr::YarrGenerator::optimizeAlternative):

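Modified Paths

trunk/JSTests/ChangeLog
trunk/Source/JavaScriptCore/ChangeLog
trunk/Source/JavaScriptCore/yarr/YarrJIT.cpp

Added Paths

trunk/JSTests/stress/regexp-with-nonBMP-any.js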

Diff

Modified: trunk/JSTests/ChangeLog (233689 => 233690)


--- trunk/JSTests/ChangeLog	2018-07-10 17:22:34 UTC (rev 233689)
+++ trunk/JSTests/ChangeLog	2018-07-10 17:34:34 UTC (rev 233690)
@@ -1,3 +1,14 @@
+2018-07-10  Michael Saboff  <[email protected]>
+
+        YARR: . doesn't match non-BMP Unicode characters in some cases
+        https://bugs.webkit.org/show_bug.cgi?id=187248
+
+        Reviewed by Geoffrey Garen.
+
+        New regression test.
+
+        * stress/regexp-with-nonBMP-any.js: Added.
+
 2018-07-09  Michael Saboff  <[email protected]>
 
         REGRESSION (ICU-62100.0.1): JSC test mozilla-tests.yaml/ecma/String/15.5.4.12-3.js is failing

Added: trunk/JSTests/stress/regexp-with-nonBMP-any.js (0 => 233690)


--- trunk/JSTests/stress/regexp-with-nonBMP-any.js	                        (rev 0)
+++ trunk/JSTests/stress/regexp-with-nonBMP-any.js	2018-07-10 17:34:34 UTC (rev 233690)
@@ -0,0 +1,10 @@
+// This tests that . followed by fixed character terms works with non-BMP characters
+
+if (!/^.-clef/u.test("\u{1D123}-clef"))
+    throw "Should have matched string with leading non-BMP with BOL anchored . in RE";
+
+if (!/c.lef/u.test("c\u{1C345}lef"))
+    throw "Should have matched string with non-BMP with . in RE";
+
+
+

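The second regression test, /c.lef/u against "c\u{1C345}lef", shows why moving '.' is unsafe. The sketch below makes a simplifying assumption of ours, not the commit's: after reordering, the fixed characters are checked at offsets computed as though '.' consumed exactly one UTF-16 code unit. It uses only standard string operations.

```javascript
// Input from the second regression test: 'c', a non-BMP character
// (a surrogate pair, two code units), then "lef".
const input = "c\u{1C345}lef";
console.log(input.length); // 6

// If '.' were assumed to consume one code unit, "lef" would be expected
// at offset 1 + 1 = 2, which holds a lone low surrogate followed by "le".
console.log(input.slice(2, 5) === "lef"); // false

// Once '.' consumes the whole surrogate pair, "lef" starts at 1 + 2 = 3.
console.log(input.slice(3, 6) === "lef"); // true

// The regression test expects a conforming engine to match.
console.log(/c.lef/u.test(input)); // true
```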
Modified: trunk/Source/JavaScriptCore/ChangeLog (233689 => 233690)


--- trunk/Source/JavaScriptCore/ChangeLog	2018-07-10 17:22:34 UTC (rev 233689)
+++ trunk/Source/JavaScriptCore/ChangeLog	2018-07-10 17:34:34 UTC (rev 233690)
@@ -1,3 +1,21 @@
+2018-07-10  Michael Saboff  <[email protected]>
+
+        YARR: . doesn't match non-BMP Unicode characters in some cases
+        https://bugs.webkit.org/show_bug.cgi?id=187248
+
+        Reviewed by Geoffrey Garen.
+
+        The safety check in optimizeAlternative() for moving character classes that only consist of BMP
+        characters did not take into account that the character class is inverted.  In this case, we
+        represent '.' as "not a newline" using the newline character class with an inverted check.
+        Clearly that includes non-BMP characters.
+
+        The fix is to check that the character class doesn't have non-BMP characters AND it isn't an
+        inverted use of that character class.
+
+        * yarr/YarrJIT.cpp:
+        (JSC::Yarr::YarrGenerator::optimizeAlternative):
+
 2018-07-09  Mark Lam  <[email protected]>
 
         Add --traceLLIntExecution and --traceLLIntSlowPath options.

Modified: trunk/Source/JavaScriptCore/yarr/YarrJIT.cpp (233689 => 233690)


--- trunk/Source/JavaScriptCore/yarr/YarrJIT.cpp	2018-07-10 17:22:34 UTC (rev 233689)
+++ trunk/Source/JavaScriptCore/yarr/YarrJIT.cpp	2018-07-10 17:34:34 UTC (rev 233690)
@@ -321,7 +321,7 @@
             // We can move BMP only character classes after fixed character terms.
             if ((term.type == PatternTerm::TypeCharacterClass)
                 && (term.quantityType == QuantifierFixedCount)
-                && (!m_decodeSurrogatePairs || !term.characterClass->m_hasNonBMPCharacters)
+                && (!m_decodeSurrogatePairs || (!term.characterClass->m_hasNonBMPCharacters && !term.m_invert))
                 && (nextTerm.type == PatternTerm::TypePatternCharacter)
                 && (nextTerm.quantityType == QuantifierFixedCount)) {
                 PatternTerm termCopy = term;
_______________________________________________
webkit-changes mailing list
[email protected]
https://lists.webkit.org/mailman/listinfo/webkit-changes
