Title: [233690] trunk
Revision: 233690
Author: [email protected]
Date: 2018-07-10 10:34:34 -0700 (Tue, 10 Jul 2018)
Log Message
YARR: . doesn't match non-BMP Unicode characters in some cases
https://bugs.webkit.org/show_bug.cgi?id=187248
Reviewed by Geoffrey Garen.
JSTests:
New regression test.
* stress/regexp-with-nonBMP-any.js: Added.
Source/JavaScriptCore:
The safety check in optimizeAlternative() that allows moving character classes consisting only of
BMP characters did not take into account that the character class may be used inverted. In this
case, '.' is represented as "not a newline": the newline character class with an inverted check.
An inverted newline class clearly includes non-BMP characters.
The fix is to check that the character class has no non-BMP characters AND that it is not an
inverted use of that character class.
* yarr/YarrJIT.cpp:
(JSC::Yarr::YarrGenerator::optimizeAlternative):
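The corrected condition can be modeled outside the JIT as a small predicate. This is a sketch in JavaScript, not the YarrJIT.cpp code: the function name canMoveAfterFixedCharacter and the plain-object term shape are hypothetical, and only the surrogate-pair part of the check is modeled (the term-type and quantifier checks are omitted).

```javascript
// Hypothetical model of the surrogate-pair part of the check in optimizeAlternative().
// decodeSurrogatePairs stands in for m_decodeSurrogatePairs,
// hasNonBMPCharacters for characterClass->m_hasNonBMPCharacters,
// and invert for term.m_invert.
function canMoveAfterFixedCharacter(decodeSurrogatePairs, term) {
    // When surrogate pairs are not decoded, this part of the check always passes.
    if (!decodeSurrogatePairs)
        return true;
    // Otherwise the class must be BMP-only AND not inverted, because an
    // inverted BMP-only class (such as '.') still matches non-BMP characters.
    return !term.hasNonBMPCharacters && !term.invert;
}

// '.' is the newline class used with an inverted check.
const dot = { hasNonBMPCharacters: false, invert: true };
// A hypothetical non-inverted BMP-only class such as [abc].
const bmpOnly = { hasNonBMPCharacters: false, invert: false };

console.log(canMoveAfterFixedCharacter(true, dot));     // false
console.log(canMoveAfterFixedCharacter(true, bmpOnly)); // true
console.log(canMoveAfterFixedCharacter(false, dot));    // true
```

Before the fix, the modeled condition would have returned true for the first call, since it looked only at hasNonBMPCharacters.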
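Modified Paths
- trunk/JSTests/ChangeLog
- trunk/Source/JavaScriptCore/ChangeLog
- trunk/Source/JavaScriptCore/yarr/YarrJIT.cpp
Added Paths
- trunk/JSTests/stress/regexp-with-nonBMP-any.js
Diff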
Modified: trunk/JSTests/ChangeLog (233689 => 233690)
--- trunk/JSTests/ChangeLog 2018-07-10 17:22:34 UTC (rev 233689)
+++ trunk/JSTests/ChangeLog 2018-07-10 17:34:34 UTC (rev 233690)
@@ -1,3 +1,14 @@
+2018-07-10 Michael Saboff <[email protected]>
+
+ YARR: . doesn't match non-BMP Unicode characters in some cases
+ https://bugs.webkit.org/show_bug.cgi?id=187248
+
+ Reviewed by Geoffrey Garen.
+
+ New regression test.
+
+ * stress/regexp-with-nonBMP-any.js: Added.
+
2018-07-09 Michael Saboff <[email protected]>
REGRESSION (ICU-62100.0.1): JSC test mozilla-tests.yaml/ecma/String/15.5.4.12-3.js is failing
Added: trunk/JSTests/stress/regexp-with-nonBMP-any.js (0 => 233690)
--- trunk/JSTests/stress/regexp-with-nonBMP-any.js (rev 0)
+++ trunk/JSTests/stress/regexp-with-nonBMP-any.js 2018-07-10 17:34:34 UTC (rev 233690)
@@ -0,0 +1,10 @@
+// This tests that . followed by fixed character terms works with non-BMP characters
+
+if (!/^.-clef/u.test("\u{1D123}-clef"))
+    throw "Should have matched string with leading non-BMP with BOL anchored . in RE";
+
+if (!/c.lef/u.test("c\u{1C345}lef"))
+    throw "Should have matched string with non-BMP with . in RE";
+
+
+
Modified: trunk/Source/JavaScriptCore/ChangeLog (233689 => 233690)
--- trunk/Source/JavaScriptCore/ChangeLog 2018-07-10 17:22:34 UTC (rev 233689)
+++ trunk/Source/JavaScriptCore/ChangeLog 2018-07-10 17:34:34 UTC (rev 233690)
@@ -1,3 +1,21 @@
+2018-07-10 Michael Saboff <[email protected]>
+
+ YARR: . doesn't match non-BMP Unicode characters in some cases
+ https://bugs.webkit.org/show_bug.cgi?id=187248
+
+ Reviewed by Geoffrey Garen.
+
+ The safety check in optimizeAlternative() for moving character classes that only consist of BMP
+ characters did not take into account that the character class is inverted. In this case, we
+ represent '.' as "not a newline" using the newline character class with an inverted check.
+ Clearly that includes non-BMP characters.
+
+ The fix is to check that the character class doesn't have non-BMP characters AND it isn't an
+ inverted use of that character class.
+
+ * yarr/YarrJIT.cpp:
+ (JSC::Yarr::YarrGenerator::optimizeAlternative):
+
2018-07-09 Mark Lam <[email protected]>
Add --traceLLIntExecution and --traceLLIntSlowPath options.
Modified: trunk/Source/JavaScriptCore/yarr/YarrJIT.cpp (233689 => 233690)
--- trunk/Source/JavaScriptCore/yarr/YarrJIT.cpp 2018-07-10 17:22:34 UTC (rev 233689)
+++ trunk/Source/JavaScriptCore/yarr/YarrJIT.cpp 2018-07-10 17:34:34 UTC (rev 233690)
@@ -321,7 +321,7 @@
// We can move BMP only character classes after fixed character terms.
if ((term.type == PatternTerm::TypeCharacterClass)
&& (term.quantityType == QuantifierFixedCount)
- && (!m_decodeSurrogatePairs || !term.characterClass->m_hasNonBMPCharacters)
+ && (!m_decodeSurrogatePairs || (!term.characterClass->m_hasNonBMPCharacters && !term.m_invert))
&& (nextTerm.type == PatternTerm::TypePatternCharacter)
&& (nextTerm.quantityType == QuantifierFixedCount)) {
PatternTerm termCopy = term;