Branch: refs/heads/main
Home: https://github.com/WebKit/WebKit
Commit: 5d4feeaef67a95a369beb8242abd28bf64bc4251
https://github.com/WebKit/WebKit/commit/5d4feeaef67a95a369beb8242abd28bf64bc4251
Author: Michael Saboff <[email protected]>
Date: 2025-02-24 (Mon, 24 Feb 2025)
Changed paths:
A JSTests/microbenchmarks/regexp-keyword-parsing.js
A JSTests/stress/regexp-parsing-tokens.js
M Source/JavaScriptCore/yarr/YarrJIT.cpp
M Source/JavaScriptCore/yarr/YarrPattern.cpp
M Source/JavaScriptCore/yarr/YarrPattern.h
Log Message:
-----------
[Yarr] Improve processing of an alternation of strings
https://bugs.webkit.org/show_bug.cgi?id=288102
rdar://145222010
Reviewed by Yusuke Suzuki.
Added the notion of a string list to a parsed RegExp that is in the form of
/^(?:break|case|which|do|for)/ with an optional trailing $.
Such a RegExp will not backtrack and therefore we can streamline the code we
emit for such a pattern.
This change involves recognizing beginning of string anchored alternations of
strings while parsing and
then treating the generation of JIT code differently for these patterns. This
includes changing how
conditional branching works, specifically that instead of the "fall through on
match" for each term,
to a "jump on match" for the whole alternation.
Fixed a bug in the original version where we weren't properly checking the
nested alternatives to see
if they only contain fixed single count PatternCharacter terms.
The current code generated for the "case" elternative is:
8:Term PatternCharacter checked-offset:(3) 'c'
<156> 0x11381430c: add w1, w1, #2
<160> 0x113814310: cmp w1, w2
<164> 0x113814314: b.hi 0x113814444 -> <468>
10:Term PatternCharacter checked-offset:(4) 'c'
<168> 0x113814318: sub x17, x0, #4
<172> 0x11381431c: ldr w17, [x17, x1]
<176> 0x113814320: movz w16, #0x6163
<180> 0x113814324: movk w16, #0x6573, lsl #16 ->
0x65736163
<184> 0x113814328: cmp w17, w16
<188> 0x11381432c: b.ne 0x113814444 -> <468>
11:Term PatternCharacter checked-offset:(4) 'a' already handled
12:Term PatternCharacter checked-offset:(4) 's' already handled
13:Term PatternCharacter checked-offset:(4) 'e' already handled
14:NestedAlternativeNext minimum-size:(5),checked-offset:(5)
<192> 0x113814330: movz x16, #0x4444
<196> 0x113814334: movk x16, #0x1381, lsl #16
<200> 0x113814338: movk x16, #0x8001, lsl #32
<204> 0x11381433c: movk x16, #0xc973, lsl #48 ->
0x113814444 JIT PC
<208> 0x113814340: stur x16, [sp, #8]
<212> 0x113814344: b 0x113814404 -> <404>
With some additional backtracking code:
9:NestedAlternativeNext minimum-size:(4),checked-offset:(4)
<468> 0x113814444: sub w1, w1, #2
<472> 0x113814448: b 0x113814348 -> <216>
With this change, the processing of "case" becomes:
9:StringListAlternativeNext minimum-size:(4),checked-offset:(4)
<132> 0x12a8285c4: sub w1, w1, #1
<136> 0x12a8285c8: cmp w1, w2
<140> 0x12a8285cc: b.hi 0x12a8285e8 -> <168>
10:Term PatternCharacter checked-offset:(4) 'c'
<144> 0x12a8285d0: sub x17, x0, #4
<148> 0x12a8285d4: ldr w17, [x17, x1]
<152> 0x12a8285d8: movz w16, #0x6163
<156> 0x12a8285dc: movk w16, #0x6573, lsl #16 ->
0x65736163
<160> 0x12a8285e0: cmp w17, w16
<164> 0x12a8285e4: b.eq 0x12a82866c -> <300>
11:Term PatternCharacter checked-offset:(4) 'a' already handled
12:Term PatternCharacter checked-offset:(4) 's' already handled
13:Term PatternCharacter checked-offset:(4) 'e' already handled
14:StringListAlternativeNext minimum-size:(5),checked-offset:(5)
With no backtracking code.
We are able to eliminate one branch and the saving of the continuation PC for
backtracking.
The code size to process these string list RegExp is reduces. For the example
RegExp above,
the prior version created 1940 bytes (485 instructions) of code while the code
created with this
1392 bytes (345 instructions) of code, a nearly 30% reduction in code.
This change is a ~18% progression on the new regexp-keyword-parsing
microbenchmark:
Baseline YarrStringList
regexp-keyword-parsing 136.7065+-0.9807 ^ 116.0161+-1.1791 ^
definitely 1.1783x faster
<geometric> 136.7065+-0.9807 ^ 116.0161+-1.1791 ^
definitely 1.1783x faster
* JSTests/microbenchmarks/regexp-keyword-parsing.js: Added.
(arrayToString):
(objectToString):
(dumpValue):
(compareArray):
(compareGroups):
(testRegExp):
(testRegExpSyntaxError):
(let.re.break.case.catch.continue.debugger.default.else.finally.if):
(let.re1.break.case.catch.continue.debugger.default.else.finally.if):
* JSTests/stress/regexp-parsing-tokens.js: Added.
(arrayToString):
(objectToString):
(dumpValue):
(compareArray):
(compareGroups):
(testRegExp):
(testRegExpSyntaxError):
* Source/JavaScriptCore/yarr/YarrJIT.cpp:
* Source/JavaScriptCore/yarr/YarrPattern.cpp:
(JSC::Yarr::YarrPatternConstructor::atomParenthesesEnd):
(JSC::Yarr::YarrPatternConstructor::checkForTerminalParentheses):
(JSC::Yarr::PatternAlternative::dump):
(JSC::Yarr::PatternTerm::dump):
* Source/JavaScriptCore/yarr/YarrPattern.h:
(JSC::Yarr::PatternTerm::PatternTerm):
(JSC::Yarr::PatternAlternative::PatternAlternative):
Canonical link: https://commits.webkit.org/290982@main
To unsubscribe from these emails, change your notification settings at
https://github.com/WebKit/WebKit/settings/notifications
_______________________________________________
webkit-changes mailing list
[email protected]
https://lists.webkit.org/mailman/listinfo/webkit-changes