Title: [203206] trunk/Source/_javascript_Core
Revision
203206
Author
[email protected]
Date
2016-07-13 18:20:11 -0700 (Wed, 13 Jul 2016)

Log Message

YARR uses mixture of int and unsigned values to index into subject string
https://bugs.webkit.org/show_bug.cgi?id=159744

Reviewed by Benjamin Poulain.

In most cases, we refer to characters in subject strings using a negative index from the number of
"checked" characters in a subject string.  The required length is compared against the actual length
and then we use that required length as the checked amount.  For example, when matching the string of
4 digits in the RegExp /abc \d{4}/, we know that need 8 characters in the subject string.  We refer
to the digits part of the _expression_ from an already checked index of 8 and use negative offsets of
-4 through -1.  In many cases we used a signed int for the negative offsets.  There are other cases
where we used unsigned values as the amount of negative offset to use when accessing subject characters.

Changed all occurrances of character offsets to unsigned or Checked Arithmetic unsigned values.  Note
that the pre-existing Checked class is used in other places to check for under/overflow with arithmetic
operations.  Those unsigned offsets are always the number of characters before (negative) from the
current checked character offset.  Also added some asserts for cases where arithmetic is not protected
by other checks or with Checked<> wrapped values.

In the case of the JIT, subject characters are accessed using base with scaled index and offset
addressing.  The MacroAssembler provides this addressing using the BaseIndex struct.  The offset for
this struct is a 32 bit signed quantity.  Since we only care about negative offsets, we really only
have 31 bits.  Changed the generation of a BaseOffset address to handle the case when the offset and
scaled combination will exceed the 31 bits of negative offset.  This is done by moving the base value
into a temp register and biasing the temp base and offset to smaller values so that we can emit
instructions that can reference characters without exceeding the 31 bits of negative offset.

To abstract the character address generation, put the base with scaled index and offset into
one function and used that function everywhere the YARR JIT wants to access subject characters.
Also consilidated a few cases where we were generating inline what readCharacter() does.  Usually
this was due to using a different index register.

Added a new regression test.

* tests/stress/regress-159744.js: Added regression test.
(testRegExp):
* yarr/YarrInterpreter.cpp:
(JSC::Yarr::Interpreter::recordParenthesesMatch):
(JSC::Yarr::Interpreter::resetMatches):
(JSC::Yarr::Interpreter::matchParenthesesOnceEnd):
(JSC::Yarr::Interpreter::backtrackParenthesesOnceEnd):
(JSC::Yarr::ByteCompiler::closeBodyAlternative):
(JSC::Yarr::ByteCompiler::atomParenthesesSubpatternEnd):
(JSC::Yarr::ByteCompiler::atomParenthesesOnceEnd):
(JSC::Yarr::ByteCompiler::atomParenthesesTerminalEnd):
(JSC::Yarr::ByteCompiler::emitDisjunction):
* yarr/YarrInterpreter.h:
(JSC::Yarr::ByteTerm::ByteTerm):
(JSC::Yarr::ByteTerm::BOL):
(JSC::Yarr::ByteTerm::UncheckInput):
(JSC::Yarr::ByteTerm::EOL):
(JSC::Yarr::ByteTerm::WordBoundary):
(JSC::Yarr::ByteTerm::BackReference):
* yarr/YarrJIT.cpp:
(JSC::Yarr::YarrGenerator::notAtEndOfInput):
(JSC::Yarr::YarrGenerator::negativeOffsetIndexedAddress):
(JSC::Yarr::YarrGenerator::readCharacter):
(JSC::Yarr::YarrGenerator::jumpIfCharNotEquals):
(JSC::Yarr::YarrGenerator::storeToFrame):
(JSC::Yarr::YarrGenerator::generateAssertionBOL):
(JSC::Yarr::YarrGenerator::backtrackAssertionBOL):
(JSC::Yarr::YarrGenerator::generateAssertionEOL):
(JSC::Yarr::YarrGenerator::matchAssertionWordchar):
(JSC::Yarr::YarrGenerator::generateAssertionWordBoundary):
(JSC::Yarr::YarrGenerator::generatePatternCharacterOnce):
(JSC::Yarr::YarrGenerator::generatePatternCharacterFixed):
(JSC::Yarr::YarrGenerator::generatePatternCharacterGreedy):
(JSC::Yarr::YarrGenerator::backtrackPatternCharacterNonGreedy):
(JSC::Yarr::YarrGenerator::generateCharacterClassOnce):
(JSC::Yarr::YarrGenerator::generateCharacterClassFixed):
(JSC::Yarr::YarrGenerator::generateCharacterClassGreedy):
(JSC::Yarr::YarrGenerator::backtrackCharacterClassNonGreedy):
(JSC::Yarr::YarrGenerator::generate):
(JSC::Yarr::YarrGenerator::backtrack):
(JSC::Yarr::YarrGenerator::YarrGenerator):
* yarr/YarrPattern.h:
(JSC::Yarr::PatternTerm::PatternTerm):

Modified Paths

Added Paths

Diff

Modified: trunk/Source/_javascript_Core/ChangeLog (203205 => 203206)


--- trunk/Source/_javascript_Core/ChangeLog	2016-07-14 00:52:00 UTC (rev 203205)
+++ trunk/Source/_javascript_Core/ChangeLog	2016-07-14 01:20:11 UTC (rev 203206)
@@ -1,3 +1,83 @@
+2016-07-13  Michael Saboff  <[email protected]>
+
+        YARR uses mixture of int and unsigned values to index into subject string
+        https://bugs.webkit.org/show_bug.cgi?id=159744
+
+        Reviewed by Benjamin Poulain.
+
+        In most cases, we refer to characters in subject strings using a negative index from the number of
+        "checked" characters in a subject string.  The required length is compared against the actual length
+        and then we use that required length as the checked amount.  For example, when matching the string of
+        4 digits in the RegExp /abc \d{4}/, we know that need 8 characters in the subject string.  We refer
+        to the digits part of the _expression_ from an already checked index of 8 and use negative offsets of
+        -4 through -1.  In many cases we used a signed int for the negative offsets.  There are other cases
+        where we used unsigned values as the amount of negative offset to use when accessing subject characters.
+
+        Changed all occurrances of character offsets to unsigned or Checked Arithmetic unsigned values.  Note
+        that the pre-existing Checked class is used in other places to check for under/overflow with arithmetic
+        operations.  Those unsigned offsets are always the number of characters before (negative) from the
+        current checked character offset.  Also added some asserts for cases where arithmetic is not protected
+        by other checks or with Checked<> wrapped values.
+
+        In the case of the JIT, subject characters are accessed using base with scaled index and offset
+        addressing.  The MacroAssembler provides this addressing using the BaseIndex struct.  The offset for
+        this struct is a 32 bit signed quantity.  Since we only care about negative offsets, we really only
+        have 31 bits.  Changed the generation of a BaseOffset address to handle the case when the offset and
+        scaled combination will exceed the 31 bits of negative offset.  This is done by moving the base value
+        into a temp register and biasing the temp base and offset to smaller values so that we can emit
+        instructions that can reference characters without exceeding the 31 bits of negative offset.
+
+        To abstract the character address generation, put the base with scaled index and offset into
+        one function and used that function everywhere the YARR JIT wants to access subject characters.
+        Also consilidated a few cases where we were generating inline what readCharacter() does.  Usually
+        this was due to using a different index register.
+
+        Added a new regression test.
+
+        * tests/stress/regress-159744.js: Added regression test.
+        (testRegExp):
+        * yarr/YarrInterpreter.cpp:
+        (JSC::Yarr::Interpreter::recordParenthesesMatch):
+        (JSC::Yarr::Interpreter::resetMatches):
+        (JSC::Yarr::Interpreter::matchParenthesesOnceEnd):
+        (JSC::Yarr::Interpreter::backtrackParenthesesOnceEnd):
+        (JSC::Yarr::ByteCompiler::closeBodyAlternative):
+        (JSC::Yarr::ByteCompiler::atomParenthesesSubpatternEnd):
+        (JSC::Yarr::ByteCompiler::atomParenthesesOnceEnd):
+        (JSC::Yarr::ByteCompiler::atomParenthesesTerminalEnd):
+        (JSC::Yarr::ByteCompiler::emitDisjunction):
+        * yarr/YarrInterpreter.h:
+        (JSC::Yarr::ByteTerm::ByteTerm):
+        (JSC::Yarr::ByteTerm::BOL):
+        (JSC::Yarr::ByteTerm::UncheckInput):
+        (JSC::Yarr::ByteTerm::EOL):
+        (JSC::Yarr::ByteTerm::WordBoundary):
+        (JSC::Yarr::ByteTerm::BackReference):
+        * yarr/YarrJIT.cpp:
+        (JSC::Yarr::YarrGenerator::notAtEndOfInput):
+        (JSC::Yarr::YarrGenerator::negativeOffsetIndexedAddress):
+        (JSC::Yarr::YarrGenerator::readCharacter):
+        (JSC::Yarr::YarrGenerator::jumpIfCharNotEquals):
+        (JSC::Yarr::YarrGenerator::storeToFrame):
+        (JSC::Yarr::YarrGenerator::generateAssertionBOL):
+        (JSC::Yarr::YarrGenerator::backtrackAssertionBOL):
+        (JSC::Yarr::YarrGenerator::generateAssertionEOL):
+        (JSC::Yarr::YarrGenerator::matchAssertionWordchar):
+        (JSC::Yarr::YarrGenerator::generateAssertionWordBoundary):
+        (JSC::Yarr::YarrGenerator::generatePatternCharacterOnce):
+        (JSC::Yarr::YarrGenerator::generatePatternCharacterFixed):
+        (JSC::Yarr::YarrGenerator::generatePatternCharacterGreedy):
+        (JSC::Yarr::YarrGenerator::backtrackPatternCharacterNonGreedy):
+        (JSC::Yarr::YarrGenerator::generateCharacterClassOnce):
+        (JSC::Yarr::YarrGenerator::generateCharacterClassFixed):
+        (JSC::Yarr::YarrGenerator::generateCharacterClassGreedy):
+        (JSC::Yarr::YarrGenerator::backtrackCharacterClassNonGreedy):
+        (JSC::Yarr::YarrGenerator::generate):
+        (JSC::Yarr::YarrGenerator::backtrack):
+        (JSC::Yarr::YarrGenerator::YarrGenerator):
+        * yarr/YarrPattern.h:
+        (JSC::Yarr::PatternTerm::PatternTerm):
+
 2016-07-13  Keith Miller  <[email protected]>
 
         Crashes with detached ArrayBuffers

Added: trunk/Source/_javascript_Core/tests/stress/regress-159744.js (0 => 203206)


--- trunk/Source/_javascript_Core/tests/stress/regress-159744.js	                        (rev 0)
+++ trunk/Source/_javascript_Core/tests/stress/regress-159744.js	2016-07-14 01:20:11 UTC (rev 203206)
@@ -0,0 +1,15 @@
+// Regression test for 159744.  This test should not crash or throw an exception.
+
+function testRegExp(pattern, flags, string, result)
+{
+    let r = new RegExp(pattern, flags);
+    if (r.exec(string) !== result)
+        throw("Expected " + r + "exec(\"" + string + "\") to return " + result + ".");
+}
+
+testRegExp("((?=$))??(?:\\1){34359738368,}", "gm", "abc\nabc\nabc\nabc\n", null);
+testRegExp("(\\w+)(?:\\s(\\1)){1100000000,}", "i", "word Word WORD WoRd", null);
+testRegExp("\\d{4,}.{1073741825}", "", "1234567\u1234", null);
+testRegExp("(?:abcd){2148473648,}", "", "abcdabcdabcd", null);
+testRegExp("(?:abcd){2148473648,}", "y", "abcdabcdabcd", null);
+testRegExp("(ab){1073741825,}(xy){1073741825,}", "", "abxyabxy", null);

Modified: trunk/Source/_javascript_Core/yarr/YarrInterpreter.cpp (203205 => 203206)


--- trunk/Source/_javascript_Core/yarr/YarrInterpreter.cpp	2016-07-14 00:52:00 UTC (rev 203205)
+++ trunk/Source/_javascript_Core/yarr/YarrInterpreter.cpp	2016-07-14 01:20:11 UTC (rev 203206)
@@ -675,8 +675,8 @@
     {
         if (term.capture()) {
             unsigned subpatternId = term.atom.subpatternId;
-            output[(subpatternId << 1)] = context->getDisjunctionContext(term)->matchBegin + term.inputPosition;
-            output[(subpatternId << 1) + 1] = context->getDisjunctionContext(term)->matchEnd + term.inputPosition;
+            output[(subpatternId << 1)] = context->getDisjunctionContext(term)->matchBegin - term.inputPosition;
+            output[(subpatternId << 1) + 1] = context->getDisjunctionContext(term)->matchEnd - term.inputPosition;
         }
     }
     void resetMatches(ByteTerm& term, ParenthesesDisjunctionContext* context)
@@ -742,7 +742,7 @@
 
         if (term.capture()) {
             unsigned subpatternId = term.atom.subpatternId;
-            output[(subpatternId << 1) + 1] = input.getPos() + term.inputPosition;
+            output[(subpatternId << 1) + 1] = input.getPos() - term.inputPosition;
         }
 
         if (term.atom.quantityType == QuantifierFixedCount)
@@ -806,7 +806,7 @@
                     ASSERT((&term - term.atom.parenthesesWidth)->type == ByteTerm::TypeParenthesesSubpatternOnceBegin);
                     ASSERT((&term - term.atom.parenthesesWidth)->inputPosition == term.inputPosition);
                     unsigned subpatternId = term.atom.subpatternId;
-                    output[subpatternId << 1] = input.getPos() + term.inputPosition;
+                    output[subpatternId << 1] = input.getPos() - term.inputPosition;
                 }
                 context->term -= term.atom.parenthesesWidth;
                 return true;
@@ -1811,7 +1811,7 @@
         m_bodyDisjunction->terms[endIndex].frameLocation = frameLocation;
     }
 
-    void atomParenthesesSubpatternEnd(unsigned lastSubpatternId, int inputPosition, unsigned frameLocation, Checked<unsigned> quantityCount, QuantifierType quantityType, unsigned callFrameSize = 0)
+    void atomParenthesesSubpatternEnd(unsigned lastSubpatternId, unsigned inputPosition, unsigned frameLocation, Checked<unsigned> quantityCount, QuantifierType quantityType, unsigned callFrameSize = 0)
     {
         unsigned beginTerm = popParenthesesStack();
         closeAlternative(beginTerm + 1);
@@ -1845,7 +1845,7 @@
         m_bodyDisjunction->terms[beginTerm].frameLocation = frameLocation;
     }
 
-    void atomParenthesesOnceEnd(int inputPosition, unsigned frameLocation, Checked<unsigned> quantityCount, QuantifierType quantityType)
+    void atomParenthesesOnceEnd(unsigned inputPosition, unsigned frameLocation, Checked<unsigned> quantityCount, QuantifierType quantityType)
     {
         unsigned beginTerm = popParenthesesStack();
         closeAlternative(beginTerm + 1);
@@ -1867,7 +1867,7 @@
         m_bodyDisjunction->terms[endTerm].atom.quantityType = quantityType;
     }
 
-    void atomParenthesesTerminalEnd(int inputPosition, unsigned frameLocation, Checked<unsigned> quantityCount, QuantifierType quantityType)
+    void atomParenthesesTerminalEnd(unsigned inputPosition, unsigned frameLocation, Checked<unsigned> quantityCount, QuantifierType quantityType)
     {
         unsigned beginTerm = popParenthesesStack();
         closeAlternative(beginTerm + 1);
@@ -1983,18 +1983,21 @@
                             disjunctionAlreadyCheckedCount = term.parentheses.disjunction->m_minimumSize;
                         else
                             alternativeFrameLocation += YarrStackSpaceForBackTrackInfoParenthesesOnce;
-                        unsigned delegateEndInputOffset = term.inputPosition - currentCountAlreadyChecked;
-                        atomParenthesesOnceBegin(term.parentheses.subpatternId, term.capture(), disjunctionAlreadyCheckedCount - delegateEndInputOffset, term.frameLocation, alternativeFrameLocation);
+                        ASSERT(currentCountAlreadyChecked >= term.inputPosition);
+                        unsigned delegateEndInputOffset = currentCountAlreadyChecked - term.inputPosition;
+                        atomParenthesesOnceBegin(term.parentheses.subpatternId, term.capture(), disjunctionAlreadyCheckedCount + delegateEndInputOffset, term.frameLocation, alternativeFrameLocation);
                         emitDisjunction(term.parentheses.disjunction, currentCountAlreadyChecked, disjunctionAlreadyCheckedCount);
                         atomParenthesesOnceEnd(delegateEndInputOffset, term.frameLocation, term.quantityCount, term.quantityType);
                     } else if (term.parentheses.isTerminal) {
-                        unsigned delegateEndInputOffset = term.inputPosition - currentCountAlreadyChecked;
-                        atomParenthesesTerminalBegin(term.parentheses.subpatternId, term.capture(), disjunctionAlreadyCheckedCount - delegateEndInputOffset, term.frameLocation, term.frameLocation + YarrStackSpaceForBackTrackInfoParenthesesOnce);
+                        ASSERT(currentCountAlreadyChecked >= term.inputPosition);
+                        unsigned delegateEndInputOffset = currentCountAlreadyChecked - term.inputPosition;
+                        atomParenthesesTerminalBegin(term.parentheses.subpatternId, term.capture(), disjunctionAlreadyCheckedCount + delegateEndInputOffset, term.frameLocation, term.frameLocation + YarrStackSpaceForBackTrackInfoParenthesesOnce);
                         emitDisjunction(term.parentheses.disjunction, currentCountAlreadyChecked, disjunctionAlreadyCheckedCount);
                         atomParenthesesTerminalEnd(delegateEndInputOffset, term.frameLocation, term.quantityCount, term.quantityType);
                     } else {
-                        unsigned delegateEndInputOffset = term.inputPosition - currentCountAlreadyChecked;
-                        atomParenthesesSubpatternBegin(term.parentheses.subpatternId, term.capture(), disjunctionAlreadyCheckedCount - delegateEndInputOffset, term.frameLocation, 0);
+                        ASSERT(currentCountAlreadyChecked >= term.inputPosition);
+                        unsigned delegateEndInputOffset = currentCountAlreadyChecked - term.inputPosition;
+                        atomParenthesesSubpatternBegin(term.parentheses.subpatternId, term.capture(), disjunctionAlreadyCheckedCount + delegateEndInputOffset, term.frameLocation, 0);
                         emitDisjunction(term.parentheses.disjunction, currentCountAlreadyChecked, 0);
                         atomParenthesesSubpatternEnd(term.parentheses.lastSubpatternId, delegateEndInputOffset, term.frameLocation, term.quantityCount, term.quantityType, term.parentheses.disjunction->m_callFrameSize);
                     }
@@ -2004,8 +2007,8 @@
                 case PatternTerm::TypeParentheticalAssertion: {
                     unsigned alternativeFrameLocation = term.frameLocation + YarrStackSpaceForBackTrackInfoParentheticalAssertion;
 
-                    ASSERT(currentCountAlreadyChecked >= static_cast<unsigned>(term.inputPosition));
-                    unsigned positiveInputOffset = currentCountAlreadyChecked - static_cast<unsigned>(term.inputPosition);
+                    ASSERT(currentCountAlreadyChecked >= term.inputPosition);
+                    unsigned positiveInputOffset = currentCountAlreadyChecked - term.inputPosition;
                     unsigned uncheckAmount = 0;
                     if (positiveInputOffset > term.parentheses.disjunction->m_minimumSize) {
                         uncheckAmount = positiveInputOffset - term.parentheses.disjunction->m_minimumSize;

Modified: trunk/Source/_javascript_Core/yarr/YarrInterpreter.h (203205 => 203206)


--- trunk/Source/_javascript_Core/yarr/YarrInterpreter.h	2016-07-14 00:52:00 UTC (rev 203205)
+++ trunk/Source/_javascript_Core/yarr/YarrInterpreter.h	2016-07-14 01:20:11 UTC (rev 203206)
@@ -106,7 +106,7 @@
     bool m_invert : 1;
     unsigned inputPosition;
 
-    ByteTerm(UChar32 ch, int inputPos, unsigned frameLocation, Checked<unsigned> quantityCount, QuantifierType quantityType)
+    ByteTerm(UChar32 ch, unsigned inputPos, unsigned frameLocation, Checked<unsigned> quantityCount, QuantifierType quantityType)
         : frameLocation(frameLocation)
         , m_capture(false)
         , m_invert(false)
@@ -129,7 +129,7 @@
         inputPosition = inputPos;
     }
 
-    ByteTerm(UChar32 lo, UChar32 hi, int inputPos, unsigned frameLocation, Checked<unsigned> quantityCount, QuantifierType quantityType)
+    ByteTerm(UChar32 lo, UChar32 hi, unsigned inputPos, unsigned frameLocation, Checked<unsigned> quantityCount, QuantifierType quantityType)
         : frameLocation(frameLocation)
         , m_capture(false)
         , m_invert(false)
@@ -153,7 +153,7 @@
         inputPosition = inputPos;
     }
 
-    ByteTerm(CharacterClass* characterClass, bool invert, int inputPos)
+    ByteTerm(CharacterClass* characterClass, bool invert, unsigned inputPos)
         : type(ByteTerm::TypeCharacterClass)
         , m_capture(false)
         , m_invert(invert)
@@ -164,7 +164,7 @@
         inputPosition = inputPos;
     }
 
-    ByteTerm(Type type, unsigned subpatternId, ByteDisjunction* parenthesesInfo, bool capture, int inputPos)
+    ByteTerm(Type type, unsigned subpatternId, ByteDisjunction* parenthesesInfo, bool capture, unsigned inputPos)
         : type(type)
         , m_capture(capture)
         , m_invert(false)
@@ -185,7 +185,7 @@
         atom.quantityCount = 1;
     }
 
-    ByteTerm(Type type, unsigned subpatternId, bool capture, bool invert, int inputPos)
+    ByteTerm(Type type, unsigned subpatternId, bool capture, bool invert, unsigned inputPos)
         : type(type)
         , m_capture(capture)
         , m_invert(invert)
@@ -196,7 +196,7 @@
         inputPosition = inputPos;
     }
 
-    static ByteTerm BOL(int inputPos)
+    static ByteTerm BOL(unsigned inputPos)
     {
         ByteTerm term(TypeAssertionBOL);
         term.inputPosition = inputPos;
@@ -217,7 +217,7 @@
         return term;
     }
     
-    static ByteTerm EOL(int inputPos)
+    static ByteTerm EOL(unsigned inputPos)
     {
         ByteTerm term(TypeAssertionEOL);
         term.inputPosition = inputPos;
@@ -224,7 +224,7 @@
         return term;
     }
 
-    static ByteTerm WordBoundary(bool invert, int inputPos)
+    static ByteTerm WordBoundary(bool invert, unsigned inputPos)
     {
         ByteTerm term(TypeAssertionWordBoundary, invert);
         term.inputPosition = inputPos;
@@ -231,7 +231,7 @@
         return term;
     }
     
-    static ByteTerm BackReference(unsigned subpatternId, int inputPos)
+    static ByteTerm BackReference(unsigned subpatternId, unsigned inputPos)
     {
         return ByteTerm(TypeBackReference, subpatternId, false, false, inputPos);
     }

Modified: trunk/Source/_javascript_Core/yarr/YarrJIT.cpp (203205 => 203206)


--- trunk/Source/_javascript_Core/yarr/YarrJIT.cpp	2016-07-14 00:52:00 UTC (rev 203205)
+++ trunk/Source/_javascript_Core/yarr/YarrJIT.cpp	2016-07-14 01:20:11 UTC (rev 203206)
@@ -285,10 +285,51 @@
         return branch32(NotEqual, index, length);
     }
 
-    Jump jumpIfCharNotEquals(UChar32 ch, int inputPosition, RegisterID character)
+    BaseIndex negativeOffsetIndexedAddress(Checked<unsigned> negativeCharacterOffset, RegisterID tempReg, RegisterID indexReg = index)
     {
-        readCharacter(inputPosition, character);
+        RegisterID base = input;
 
+        // BaseIndex() addressing can take a int32_t offset. Given that we can have a regular
+        // _expression_ that has unsigned character offsets, BaseIndex's signed offset is insufficient
+        // for addressing in extreme cases where we might underflow. Therefore we check to see if
+        // negativeCharacterOffset will underflow directly or after converting for 16 bit characters.
+        // If so, we do our own address calculating by adjusting the base, using the result register
+        // as a temp address register.
+        unsigned maximumNegativeOffsetForCharacterSize = m_charSize == Char8 ? 0x7fffffff : 0x3fffffff;
+        unsigned offsetAdjustAmount = 0x40000000;
+        if (negativeCharacterOffset.unsafeGet() > maximumNegativeOffsetForCharacterSize) {
+            base = tempReg;
+            move(input, base);
+            while (negativeCharacterOffset.unsafeGet() > maximumNegativeOffsetForCharacterSize) {
+                subPtr(TrustedImm32(offsetAdjustAmount), base);
+                if (m_charSize != Char8)
+                    subPtr(TrustedImm32(offsetAdjustAmount), base);
+                negativeCharacterOffset -= offsetAdjustAmount;
+            }
+        }
+
+        Checked<int32_t> characterOffset(-static_cast<int32_t>(negativeCharacterOffset.unsafeGet()));
+
+        if (m_charSize == Char8)
+            return BaseIndex(input, indexReg, TimesOne, (characterOffset * static_cast<int32_t>(sizeof(char))).unsafeGet());
+
+        return BaseIndex(input, indexReg, TimesTwo, (characterOffset * static_cast<int32_t>(sizeof(UChar))).unsafeGet());
+    }
+
+    void readCharacter(Checked<unsigned> negativeCharacterOffset, RegisterID resultReg, RegisterID indexReg = index)
+    {
+        BaseIndex address = negativeOffsetIndexedAddress(negativeCharacterOffset, resultReg, indexReg);
+
+        if (m_charSize == Char8)
+            load8(address, resultReg);
+        else
+            load16Unaligned(address, resultReg);
+    }
+
+    Jump jumpIfCharNotEquals(UChar32 ch, Checked<unsigned> negativeCharacterOffset, RegisterID character)
+    {
+        readCharacter(negativeCharacterOffset, character);
+
         // For case-insesitive compares, non-ascii characters that have different
         // upper & lower case representations are converted to a character class.
         ASSERT(!m_pattern.ignoreCase() || isASCIIAlpha(ch) || isCanonicallyUnique(ch));
@@ -299,15 +340,7 @@
 
         return branch32(NotEqual, character, Imm32(ch));
     }
-
-    void readCharacter(int inputPosition, RegisterID reg)
-    {
-        if (m_charSize == Char8)
-            load8(BaseIndex(input, index, TimesOne, inputPosition * sizeof(char)), reg);
-        else
-            load16(BaseIndex(input, index, TimesTwo, inputPosition * sizeof(UChar)), reg);
-    }
-
+    
     void storeToFrame(RegisterID reg, unsigned frameLocation)
     {
         poke(reg, frameLocation);
@@ -493,9 +526,9 @@
         bool m_isDeadCode;
 
         // Currently used in the case of some of the more complex management of
-        // 'm_checked', to cache the offset used in this alternative, to avoid
+        // 'm_checkedOffset', to cache the offset used in this alternative, to avoid
         // recalculating it.
-        int m_checkAdjust;
+        Checked<unsigned> m_checkAdjust;
 
         // Used by OpNestedAlternativeNext/End to hold the pointer to the
         // value that will be pushed into the pattern's frame to return to,
@@ -645,9 +678,9 @@
 
             JumpList matchDest;
             if (!term->inputPosition)
-                matchDest.append(branch32(Equal, index, Imm32(m_checked)));
+                matchDest.append(branch32(Equal, index, Imm32(m_checkedOffset.unsafeGet())));
 
-            readCharacter((term->inputPosition - m_checked) - 1, character);
+            readCharacter(m_checkedOffset - term->inputPosition + 1, character);
             matchCharacterClass(character, matchDest, m_pattern.newlineCharacterClass());
             op.m_jumps.append(jump());
 
@@ -657,7 +690,7 @@
             if (term->inputPosition)
                 op.m_jumps.append(jump());
             else
-                op.m_jumps.append(branch32(NotEqual, index, Imm32(m_checked)));
+                op.m_jumps.append(branch32(NotEqual, index, Imm32(m_checkedOffset.unsafeGet())));
         }
     }
     void backtrackAssertionBOL(size_t opIndex)
@@ -674,16 +707,16 @@
             const RegisterID character = regT0;
 
             JumpList matchDest;
-            if (term->inputPosition == m_checked)
+            if (term->inputPosition == m_checkedOffset.unsafeGet())
                 matchDest.append(atEndOfInput());
 
-            readCharacter(term->inputPosition - m_checked, character);
+            readCharacter(m_checkedOffset - term->inputPosition, character);
             matchCharacterClass(character, matchDest, m_pattern.newlineCharacterClass());
             op.m_jumps.append(jump());
 
             matchDest.link(this);
         } else {
-            if (term->inputPosition == m_checked)
+            if (term->inputPosition == m_checkedOffset.unsafeGet())
                 op.m_jumps.append(notAtEndOfInput());
             // Erk, really should poison out these alternatives early. :-/
             else
@@ -703,10 +736,10 @@
 
         const RegisterID character = regT0;
 
-        if (term->inputPosition == m_checked)
+        if (term->inputPosition == m_checkedOffset.unsafeGet())
             nextIsNotWordChar.append(atEndOfInput());
 
-        readCharacter((term->inputPosition - m_checked), character);
+        readCharacter(m_checkedOffset - term->inputPosition, character);
         matchCharacterClass(character, nextIsWordChar, m_pattern.wordcharCharacterClass());
     }
 
@@ -720,8 +753,8 @@
         Jump atBegin;
         JumpList matchDest;
         if (!term->inputPosition)
-            atBegin = branch32(Equal, index, Imm32(m_checked));
-        readCharacter((term->inputPosition - m_checked) - 1, character);
+            atBegin = branch32(Equal, index, Imm32(m_checkedOffset.unsafeGet()));
+        readCharacter(m_checkedOffset - term->inputPosition + 1, character);
         matchCharacterClass(character, matchDest, m_pattern.wordcharCharacterClass());
         if (!term->inputPosition)
             atBegin.link(this);
@@ -782,7 +815,7 @@
         }
 
         const RegisterID character = regT0;
-        int maxCharactersAtOnce = m_charSize == Char8 ? 4 : 2;
+        unsigned maxCharactersAtOnce = m_charSize == Char8 ? 4 : 2;
         unsigned ignoreCaseMask = 0;
 #if CPU(BIG_ENDIAN)
         int allCharacters = ch << (m_charSize == Char8 ? 24 : 16);
@@ -789,8 +822,8 @@
 #else
         int allCharacters = ch;
 #endif
-        int numberCharacters;
-        int startTermPosition = term->inputPosition;
+        unsigned numberCharacters;
+        unsigned startTermPosition = term->inputPosition;
 
         // For case-insesitive compares, non-ascii characters that have different
         // upper & lower case representations are converted to a character class.
@@ -841,25 +874,22 @@
         if (m_charSize == Char8) {
             switch (numberCharacters) {
             case 1:
-                op.m_jumps.append(jumpIfCharNotEquals(ch, startTermPosition - m_checked, character));
+                op.m_jumps.append(jumpIfCharNotEquals(ch, m_checkedOffset - startTermPosition, character));
                 return;
             case 2: {
-                BaseIndex address(input, index, TimesOne, (startTermPosition - m_checked) * sizeof(LChar));
-                load16Unaligned(address, character);
+                load16Unaligned(negativeOffsetIndexedAddress(m_checkedOffset - startTermPosition, character), character);
                 break;
             }
             case 3: {
-                BaseIndex highAddress(input, index, TimesOne, (startTermPosition - m_checked) * sizeof(LChar));
-                load16Unaligned(highAddress, character);
+                load16Unaligned(negativeOffsetIndexedAddress(m_checkedOffset - startTermPosition, character), character);
                 if (ignoreCaseMask)
                     or32(Imm32(ignoreCaseMask), character);
                 op.m_jumps.append(branch32(NotEqual, character, Imm32((allCharacters & 0xffff) | ignoreCaseMask)));
-                op.m_jumps.append(jumpIfCharNotEquals(allCharacters >> 16, startTermPosition + 2 - m_checked, character));
+                op.m_jumps.append(jumpIfCharNotEquals(allCharacters >> 16, m_checkedOffset - startTermPosition - 2, character));
                 return;
             }
             case 4: {
-                BaseIndex address(input, index, TimesOne, (startTermPosition - m_checked) * sizeof(LChar));
-                load32WithUnalignedHalfWords(address, character);
+                load32WithUnalignedHalfWords(negativeOffsetIndexedAddress(m_checkedOffset- startTermPosition, character), character);
                 break;
             }
             }
@@ -866,11 +896,10 @@
         } else {
             switch (numberCharacters) {
             case 1:
-                op.m_jumps.append(jumpIfCharNotEquals(ch, term->inputPosition - m_checked, character));
+                op.m_jumps.append(jumpIfCharNotEquals(ch, m_checkedOffset - term->inputPosition, character));
                 return;
             case 2:
-                BaseIndex address(input, index, TimesTwo, (term->inputPosition - m_checked) * sizeof(UChar));
-                load32WithUnalignedHalfWords(address, character);
+                load32WithUnalignedHalfWords(negativeOffsetIndexedAddress(m_checkedOffset- term->inputPosition, character), character);
                 break;
             }
         }
@@ -898,13 +927,7 @@
         sub32(Imm32(term->quantityCount.unsafeGet()), countRegister);
 
         Label loop(this);
-        BaseIndex address(input, countRegister, m_charScale, (Checked<int>(term->inputPosition - m_checked + Checked<int64_t>(term->quantityCount)) * static_cast<int>(m_charSize == Char8 ? sizeof(char) : sizeof(UChar))).unsafeGet());
-
-        if (m_charSize == Char8)
-            load8(address, character);
-        else
-            load16(address, character);
-
+        readCharacter(m_checkedOffset - term->inputPosition - term->quantityCount, character, countRegister);
         // For case-insesitive compares, non-ascii characters that have different
         // upper & lower case representations are converted to a character class.
         ASSERT(!m_pattern.ignoreCase() || isASCIIAlpha(ch) || isCanonicallyUnique(ch));
@@ -938,7 +961,7 @@
             JumpList failures;
             Label loop(this);
             failures.append(atEndOfInput());
-            failures.append(jumpIfCharNotEquals(ch, term->inputPosition - m_checked, character));
+            failures.append(jumpIfCharNotEquals(ch, m_checkedOffset - term->inputPosition, character));
 
             add32(TrustedImm32(1), countRegister);
             add32(TrustedImm32(1), index);
@@ -999,7 +1022,7 @@
             nonGreedyFailures.append(atEndOfInput());
             if (term->quantityCount != quantifyInfinite)
                 nonGreedyFailures.append(branch32(Equal, countRegister, Imm32(term->quantityCount.unsafeGet())));
-            nonGreedyFailures.append(jumpIfCharNotEquals(ch, term->inputPosition - m_checked, character));
+            nonGreedyFailures.append(jumpIfCharNotEquals(ch, m_checkedOffset - term->inputPosition, character));
 
             add32(TrustedImm32(1), countRegister);
             add32(TrustedImm32(1), index);
@@ -1020,7 +1043,7 @@
         const RegisterID character = regT0;
 
         JumpList matchDest;
-        readCharacter(term->inputPosition - m_checked, character);
+        readCharacter(m_checkedOffset - term->inputPosition, character);
         matchCharacterClass(character, matchDest, term->characterClass);
 
         if (term->invert())
@@ -1048,10 +1071,7 @@
 
         Label loop(this);
         JumpList matchDest;
-        if (m_charSize == Char8)
-            load8(BaseIndex(input, countRegister, TimesOne, (Checked<int>(term->inputPosition - m_checked + Checked<int64_t>(term->quantityCount)) * static_cast<int>(sizeof(char))).unsafeGet()), character);
-        else
-            load16(BaseIndex(input, countRegister, TimesTwo, (Checked<int>(term->inputPosition - m_checked + Checked<int64_t>(term->quantityCount)) * static_cast<int>(sizeof(UChar))).unsafeGet()), character);
+        readCharacter(m_checkedOffset - term->inputPosition - term->quantityCount, character, countRegister);
         matchCharacterClass(character, matchDest, term->characterClass);
 
         if (term->invert())
@@ -1084,11 +1104,11 @@
         failures.append(atEndOfInput());
 
         if (term->invert()) {
-            readCharacter(term->inputPosition - m_checked, character);
+            readCharacter(m_checkedOffset - term->inputPosition, character);
             matchCharacterClass(character, failures, term->characterClass);
         } else {
             JumpList matchDest;
-            readCharacter(term->inputPosition - m_checked, character);
+            readCharacter(m_checkedOffset - term->inputPosition, character);
             matchCharacterClass(character, matchDest, term->characterClass);
             failures.append(jump());
             matchDest.link(this);
@@ -1152,7 +1172,7 @@
         nonGreedyFailures.append(branch32(Equal, countRegister, Imm32(term->quantityCount.unsafeGet())));
 
         JumpList matchDest;
-        readCharacter(term->inputPosition - m_checked, character);
+        readCharacter(m_checkedOffset - term->inputPosition, character);
         matchCharacterClass(character, matchDest, term->characterClass);
 
         if (term->invert())
@@ -1417,7 +1437,7 @@
                 // set as appropriate to this alternative.
                 op.m_reentry = label();
 
-                m_checked += alternative->m_minimumSize;
+                m_checkedOffset += alternative->m_minimumSize;
                 break;
             }
             case OpBodyAlternativeNext:
@@ -1470,8 +1490,8 @@
                 }
 
                 if (op.m_op == OpBodyAlternativeNext)
-                    m_checked += alternative->m_minimumSize;
-                m_checked -= priorAlternative->m_minimumSize;
+                    m_checkedOffset += alternative->m_minimumSize;
+                m_checkedOffset -= priorAlternative->m_minimumSize;
                 break;
             }
 
@@ -1498,13 +1518,13 @@
                 PatternDisjunction* disjunction = term->parentheses.disjunction;
 
                 // Calculate how much input we need to check for, and if non-zero check.
-                op.m_checkAdjust = alternative->m_minimumSize;
+                op.m_checkAdjust = Checked<unsigned>(alternative->m_minimumSize);
                 if ((term->quantityType == QuantifierFixedCount) && (term->type != PatternTerm::TypeParentheticalAssertion))
                     op.m_checkAdjust -= disjunction->m_minimumSize;
                 if (op.m_checkAdjust)
-                    op.m_jumps.append(jumpIfNoAvailableInput(op.m_checkAdjust));
+                    op.m_jumps.append(jumpIfNoAvailableInput(op.m_checkAdjust.unsafeGet()));
 
-                m_checked += op.m_checkAdjust;
+                m_checkedOffset += op.m_checkAdjust;
                 break;
             }
             case OpSimpleNestedAlternativeNext:
@@ -1552,11 +1572,11 @@
                 if ((term->quantityType == QuantifierFixedCount) && (term->type != PatternTerm::TypeParentheticalAssertion))
                     op.m_checkAdjust -= disjunction->m_minimumSize;
                 if (op.m_checkAdjust)
-                    op.m_jumps.append(jumpIfNoAvailableInput(op.m_checkAdjust));
+                    op.m_jumps.append(jumpIfNoAvailableInput(op.m_checkAdjust.unsafeGet()));
 
                 YarrOp& lastOp = m_ops[op.m_previousOp];
-                m_checked -= lastOp.m_checkAdjust;
-                m_checked += op.m_checkAdjust;
+                m_checkedOffset -= lastOp.m_checkAdjust;
+                m_checkedOffset += op.m_checkAdjust;
                 break;
             }
             case OpSimpleNestedAlternativeEnd:
@@ -1585,7 +1605,7 @@
                 op.m_jumps.clear();
 
                 YarrOp& lastOp = m_ops[op.m_previousOp];
-                m_checked -= lastOp.m_checkAdjust;
+                m_checkedOffset -= lastOp.m_checkAdjust;
                 break;
             }
 
@@ -1629,11 +1649,12 @@
                 // offsets only afterwards, at the point the results array is
                 // being accessed.
                 if (term->capture() && compileMode == IncludeSubpatterns) {
-                    int inputOffset = term->inputPosition - m_checked;
+                    unsigned inputOffset = (m_checkedOffset - term->inputPosition).unsafeGet();
                     if (term->quantityType == QuantifierFixedCount)
-                        inputOffset -= term->parentheses.disjunction->m_minimumSize;
+                        inputOffset += term->parentheses.disjunction->m_minimumSize;
                     if (inputOffset) {
-                        add32(Imm32(inputOffset), index, indexTemporary);
+                        move(index, indexTemporary);
+                        sub32(Imm32(inputOffset), indexTemporary);
                         setSubpatternStart(indexTemporary, term->parentheses.subpatternId);
                     } else
                         setSubpatternStart(index, term->parentheses.subpatternId);
@@ -1661,9 +1682,10 @@
                 // offsets only afterwards, at the point the results array is
                 // being accessed.
                 if (term->capture() && compileMode == IncludeSubpatterns) {
-                    int inputOffset = term->inputPosition - m_checked;
+                    unsigned inputOffset = (m_checkedOffset - term->inputPosition).unsafeGet();
                     if (inputOffset) {
-                        add32(Imm32(inputOffset), index, indexTemporary);
+                        move(index, indexTemporary);
+                        sub32(Imm32(inputOffset), indexTemporary);
                         setSubpatternEnd(indexTemporary, term->parentheses.subpatternId);
                     } else
                         setSubpatternEnd(index, term->parentheses.subpatternId);
@@ -1729,11 +1751,11 @@
                 storeToFrame(index, parenthesesFrameLocation);
 
                 // Check 
-                op.m_checkAdjust = m_checked - term->inputPosition;
+                op.m_checkAdjust = m_checkedOffset - term->inputPosition;
                 if (op.m_checkAdjust)
-                    sub32(Imm32(op.m_checkAdjust), index);
+                    sub32(Imm32(op.m_checkAdjust.unsafeGet()), index);
 
-                m_checked -= op.m_checkAdjust;
+                m_checkedOffset -= op.m_checkAdjust;
                 break;
             }
             case OpParentheticalAssertionEnd: {
@@ -1751,7 +1773,7 @@
                 }
 
                 YarrOp& lastOp = m_ops[op.m_previousOp];
-                m_checked += lastOp.m_checkAdjust;
+                m_checkedOffset += lastOp.m_checkAdjust;
                 break;
             }
 
@@ -1809,9 +1831,9 @@
 
                 if (op.m_op == OpBodyAlternativeNext) {
                     PatternAlternative* priorAlternative = m_ops[op.m_previousOp].m_alternative;
-                    m_checked += priorAlternative->m_minimumSize;
+                    m_checkedOffset += priorAlternative->m_minimumSize;
                 }
-                m_checked -= alternative->m_minimumSize;
+                m_checkedOffset -= alternative->m_minimumSize;
 
                 // Is this the last alternative? If not, then if we backtrack to this point we just
                 // need to jump to try to match the next alternative.
@@ -2014,7 +2036,7 @@
                 ASSERT(m_backtrackingState.isEmpty());
 
                 PatternAlternative* priorAlternative = m_ops[op.m_previousOp].m_alternative;
-                m_checked += priorAlternative->m_minimumSize;
+                m_checkedOffset += priorAlternative->m_minimumSize;
                 break;
             }
 
@@ -2065,7 +2087,7 @@
                 if (op.m_checkAdjust) {
                     // Handle the cases where we need to link the backtracks here.
                     m_backtrackingState.link(this);
-                    sub32(Imm32(op.m_checkAdjust), index);
+                    sub32(Imm32(op.m_checkAdjust.unsafeGet()), index);
                     if (!isLastAlternative) {
                         // An alternative that is not the last should jump to its successor.
                         jump(nextOp.m_reentry);
@@ -2115,9 +2137,9 @@
 
                 if (!isBegin) {
                     YarrOp& lastOp = m_ops[op.m_previousOp];
-                    m_checked += lastOp.m_checkAdjust;
+                    m_checkedOffset += lastOp.m_checkAdjust;
                 }
-                m_checked -= op.m_checkAdjust;
+                m_checkedOffset -= op.m_checkAdjust;
                 break;
             }
             case OpSimpleNestedAlternativeEnd:
@@ -2148,7 +2170,7 @@
                 }
 
                 YarrOp& lastOp = m_ops[op.m_previousOp];
-                m_checked += lastOp.m_checkAdjust;
+                m_checkedOffset += lastOp.m_checkAdjust;
                 break;
             }
 
@@ -2261,7 +2283,7 @@
                      m_backtrackingState.link(this);
 
                     if (op.m_checkAdjust)
-                        add32(Imm32(op.m_checkAdjust), index);
+                        add32(Imm32(op.m_checkAdjust.unsafeGet()), index);
 
                     // In an inverted assertion failure to match the subpattern
                     // is treated as a successful match - jump to the end of the
@@ -2278,7 +2300,7 @@
                 // added the failure caused by a successful match to this.
                 m_backtrackingState.append(endOp.m_jumps);
 
-                m_checked += op.m_checkAdjust;
+                m_checkedOffset += op.m_checkAdjust;
                 break;
             }
             case OpParentheticalAssertionEnd: {
@@ -2290,7 +2312,7 @@
                 m_backtrackingState.takeBacktracksToJumpList(op.m_jumps, this);
 
                 YarrOp& lastOp = m_ops[op.m_previousOp];
-                m_checked -= lastOp.m_checkAdjust;
+                m_checkedOffset -= lastOp.m_checkAdjust;
                 break;
             }
 
@@ -2634,9 +2656,7 @@
         : m_vm(vm)
         , m_pattern(pattern)
         , m_charSize(charSize)
-        , m_charScale(m_charSize == Char8 ? TimesOne: TimesTwo)
         , m_shouldFallBack(false)
-        , m_checked(0)
     {
     }
 
@@ -2697,8 +2717,6 @@
 
     YarrCharSize m_charSize;
 
-    Scale m_charScale;
-
     // Used to detect regular _expression_ constructs that are not currently
     // supported in the JIT; fall back to the interpreter when this is detected.
     bool m_shouldFallBack;
@@ -2716,7 +2734,7 @@
     // FIXME: This should go away. Rather than tracking this value throughout
     // code generation, we should gather this information up front & store it
     // on the YarrOp structure.
-    int m_checked;
+    Checked<unsigned> m_checkedOffset;
 
     // This class records state whilst generating the backtracking path of code.
     BacktrackingState m_backtrackingState;

Modified: trunk/Source/_javascript_Core/yarr/YarrPattern.h (203205 => 203206)


--- trunk/Source/_javascript_Core/yarr/YarrPattern.h	2016-07-14 00:52:00 UTC (rev 203205)
+++ trunk/Source/_javascript_Core/yarr/YarrPattern.h	2016-07-14 01:20:11 UTC (rev 203206)
@@ -111,7 +111,7 @@
     };
     QuantifierType quantityType;
     Checked<unsigned> quantityCount;
-    int inputPosition;
+    unsigned inputPosition;
     unsigned frameLocation;
 
     PatternTerm(UChar32 ch)
_______________________________________________
webkit-changes mailing list
[email protected]
https://lists.webkit.org/mailman/listinfo/webkit-changes

Reply via email to