Branch: refs/heads/main
  Home:   https://github.com/WebKit/WebKit
  Commit: 12c34ef5e3058990f7e9997bc07ea1883c2a9eab
      
https://github.com/WebKit/WebKit/commit/12c34ef5e3058990f7e9997bc07ea1883c2a9eab
  Author: Michael Saboff <[email protected]>
  Date:   2025-02-21 (Fri, 21 Feb 2025)

  Changed paths:
    A JSTests/microbenchmarks/regexp-keyword-parsing.js
    A JSTests/stress/regexp-parsing-tokens.js
    M Source/JavaScriptCore/yarr/YarrJIT.cpp
    M Source/JavaScriptCore/yarr/YarrPattern.cpp
    M Source/JavaScriptCore/yarr/YarrPattern.h

  Log Message:
  -----------
  [Yarr] Improve processing of an alternation of strings
https://bugs.webkit.org/show_bug.cgi?id=288102
rdar://145222010

Reviewed by Yusuke Suzuki.

Added the notion of a string list to a parsed RegExp that is in the form of
  /^(?:break|case|which|do|for)/ with an optional trailing $.
Such a RegExp will not backtrack and therefore we can streamline the code we 
emit for such a pattern.

This change involves recognizing beginning of string anchored alternations of 
strings while parsing and
then treating the generation of JIT code differently for these patterns.  This 
includes changing how
conditional branching works, specifically that instead of the "fall through on 
match" for each term,
to a "jump on match" for the whole alternation.

The current code generated for the "case" elternative is:
   8:Term PatternCharacter checked-offset:(3) 'c'
               <156> 0x11381430c:    add      w1, w1, #2
               <160> 0x113814310:    cmp      w1, w2
               <164> 0x113814314:    b.hi     0x113814444 -> <468>
  10:Term PatternCharacter checked-offset:(4) 'c'
               <168> 0x113814318:    sub      x17, x0, #4
               <172> 0x11381431c:    ldr      w17, [x17, x1]
               <176> 0x113814320:    movz     w16, #0x6163
               <180> 0x113814324:    movk     w16, #0x6573, lsl #16 -> 
0x65736163
               <184> 0x113814328:    cmp      w17, w16
               <188> 0x11381432c:    b.ne     0x113814444 -> <468>
  11:Term PatternCharacter checked-offset:(4) 'a' already handled
  12:Term PatternCharacter checked-offset:(4) 's' already handled
  13:Term PatternCharacter checked-offset:(4) 'e' already handled
  14:NestedAlternativeNext minimum-size:(5),checked-offset:(5)
               <192> 0x113814330:    movz     x16, #0x4444
               <196> 0x113814334:    movk     x16, #0x1381, lsl #16
               <200> 0x113814338:    movk     x16, #0x8001, lsl #32
               <204> 0x11381433c:    movk     x16, #0xc973, lsl #48 -> 
0x113814444 JIT PC
               <208> 0x113814340:    stur     x16, [sp, #8]
               <212> 0x113814344:    b        0x113814404 -> <404>
With some additional backtracking code:
   9:NestedAlternativeNext minimum-size:(4),checked-offset:(4)
               <468> 0x113814444:    sub      w1, w1, #2
               <472> 0x113814448:    b        0x113814348 -> <216>

With this change, the processing of "case" becomes:
   9:StringListAlternativeNext minimum-size:(4),checked-offset:(4)
               <132> 0x12a8285c4:    sub      w1, w1, #1
               <136> 0x12a8285c8:    cmp      w1, w2
               <140> 0x12a8285cc:    b.hi     0x12a8285e8 -> <168>
  10:Term PatternCharacter checked-offset:(4) 'c'
               <144> 0x12a8285d0:    sub      x17, x0, #4
               <148> 0x12a8285d4:    ldr      w17, [x17, x1]
               <152> 0x12a8285d8:    movz     w16, #0x6163
               <156> 0x12a8285dc:    movk     w16, #0x6573, lsl #16 -> 
0x65736163
               <160> 0x12a8285e0:    cmp      w17, w16
               <164> 0x12a8285e4:    b.eq     0x12a82866c -> <300>
  11:Term PatternCharacter checked-offset:(4) 'a' already handled
  12:Term PatternCharacter checked-offset:(4) 's' already handled
  13:Term PatternCharacter checked-offset:(4) 'e' already handled
  14:StringListAlternativeNext minimum-size:(5),checked-offset:(5)
With no backtracking code.

We are able to eliminate one branch and the saving of the continuation PC for 
backtracking.
The code size to process these string list RegExp is reduces.  For the example 
RegExp above,
the prior version created 1940 bytes (485 instructions) of code while the code 
created with this
1392 bytes (345 instructions) of code, a nearly 30% reduction in code.

This change is a ~18% progression on the new regexp-keyword-parsing 
microbenchmark:

                                 Baseline               YarrStringList

regexp-keyword-parsing      136.7065+-0.9807     ^    116.0161+-1.1791        ^ 
definitely 1.1783x faster

<geometric>                 136.7065+-0.9807     ^    116.0161+-1.1791        ^ 
definitely 1.1783x faster

* JSTests/microbenchmarks/regexp-keyword-parsing.js: Added.
(arrayToString):
(objectToString):
(dumpValue):
(compareArray):
(compareGroups):
(testRegExp):
(testRegExpSyntaxError):
(let.re.break.case.catch.continue.debugger.default.else.finally.if):
(let.re1.break.case.catch.continue.debugger.default.else.finally.if):
* JSTests/stress/regexp-parsing-tokens.js: Added.
(arrayToString):
(objectToString):
(dumpValue):
(compareArray):
(compareGroups):
(testRegExp):
(testRegExpSyntaxError):
* Source/JavaScriptCore/yarr/YarrJIT.cpp:
* Source/JavaScriptCore/yarr/YarrPattern.cpp:
(JSC::Yarr::YarrPatternConstructor::atomParenthesesEnd):
(JSC::Yarr::YarrPatternConstructor::checkForTerminalParentheses):
(JSC::Yarr::PatternAlternative::dump):
(JSC::Yarr::PatternTerm::dump):
* Source/JavaScriptCore/yarr/YarrPattern.h:
(JSC::Yarr::PatternTerm::PatternTerm):
(JSC::Yarr::PatternAlternative::PatternAlternative):

Canonical link: https://commits.webkit.org/290791@main



To unsubscribe from these emails, change your notification settings at 
https://github.com/WebKit/WebKit/settings/notifications
_______________________________________________
webkit-changes mailing list
[email protected]
https://lists.webkit.org/mailman/listinfo/webkit-changes

Reply via email to