Branch: refs/heads/main
  Home:   https://github.com/WebKit/WebKit
  Commit: e6ee2eb472c1b527eef8a439a7c6d227955271ac
      
https://github.com/WebKit/WebKit/commit/e6ee2eb472c1b527eef8a439a7c6d227955271ac
  Author: Michael Saboff <[email protected]>
  Date:   2024-11-12 (Tue, 12 Nov 2024)

  Changed paths:
    A JSTests/microbenchmarks/regexp-anychar-character-classes.js
    A JSTests/stress/regexp-character-class-coalescing.js
    M Source/JavaScriptCore/yarr/YarrJIT.cpp
    M Source/JavaScriptCore/yarr/YarrPattern.cpp
    M Source/JavaScriptCore/yarr/YarrPattern.h
    M Source/JavaScriptCore/yarr/create_regex_tables

  Log Message:
  -----------
  [Yarr] Improve processing of [\s\S] character classes
https://bugs.webkit.org/show_bug.cgi?id=283003
rdar://135409524

Reviewed by Yusuke Suzuki.

The character class [\s\S], which matches all white space and non-white space
characters, is commonly used in lieu of '.' (any character).  Many developers
use [\s\S] instead of '.' because '.' does not match line terminators unless
the dotAll 's' flag is added to the regex.  The character class [\s\S] matches
any character regardless of the flags added to the expression.
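
As a small illustration of the difference described above (a sketch, not part
of this change; the sample string and variable names are made up for the
example):

```javascript
// '.' does not match line terminators unless the dotAll ('s') flag is set,
// while [\s\S] matches any character regardless of flags.
const sample = "a\nb";

const dotDefault = /a.b/.test(sample);        // '.' without the 's' flag
const dotWithFlag = /a.b/s.test(sample);      // '.' with the 's' flag
const spaceNonSpace = /a[\s\S]b/.test(sample); // [\s\S], no flags needed

console.log({ dotDefault, dotWithFlag, spaceNonSpace });
```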

The processing of [\s\S] was suboptimal due to several issues.
 1. The code was not coalescing the combination of \s and \S into a single
    combined range of 0...max character.
 2. The JIT code generated for 8-bit strings still included comparisons to
    16-bit code points when the regex contained character classes with code
    points greater than 255.
 3. The code point U+180E, the Mongolian Vowel Separator, was changed from
    being white space to non-white space with ECMAScript 2016.  Our code
    generator for \S was not changed to include that code point.  When \S was
    used by itself, matching was done with the _spacesData table instead of
    using individual code points and ranges.  When combined with \s, or any
    other character class, we weren't matching U+180E.
 4. When using the 'v' flag, the character class processing for an "any
    character" character class that also contained strings did not process
    those strings.
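
The behavior expected for U+180E in item 3 can be checked with a short script.
This is only a sketch of the expected results on an engine that follows
ECMAScript 2016 or later; it is not taken from the added stress test, and the
variable names are illustrative:

```javascript
// U+180E (Mongolian Vowel Separator) is non-white space since ECMAScript 2016,
// so \s should reject it and every class containing \S should accept it.
const mvs = "\u180E";

const matchesSpace = /\s/.test(mvs);            // expected: false
const matchesNonSpace = /\S/.test(mvs);         // expected: true
const matchesCombined = /[\s\S]/.test(mvs);     // expected: true
const matchesNonSpaceInClass = /[\Sa]/.test(mvs); // \S with another class member

console.log({ matchesSpace, matchesNonSpace, matchesCombined, matchesNonSpaceInClass });
```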

Fixed these issues by fixing the coalescing code to reduce produced character
classes to the minimum set of individual code points and code point ranges to
check.  Added code to extract the 8-bit only part of a character class before
emitting the JIT code.  Added U+180E to the non-spaces built-in character
class.  Changed Yarr::matchCharacterClassTermInner() to handle strings with all
character class sets, including those that match all characters.
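
The idea behind coalescing and 8-bit extraction can be sketched in a few lines
of JavaScript.  This is not the Yarr implementation (which is C++ in
YarrPattern.cpp); the function names, the range representation, and the
partial white space set below are all assumptions made for illustration:

```javascript
const MAX_CODE_POINT = 0x10FFFF;

// Merge overlapping or adjacent [begin, end] ranges into a minimal sorted list.
function coalesceRanges(ranges) {
  const sorted = ranges.map(([b, e]) => [b, e]).sort((x, y) => x[0] - y[0]);
  const result = [];
  for (const [begin, end] of sorted) {
    const last = result[result.length - 1];
    if (last && begin <= last[1] + 1)
      last[1] = Math.max(last[1], end);   // extend the previous range
    else
      result.push([begin, end]);          // start a new range
  }
  return result;
}

// Complement of a range list within 0..MAX_CODE_POINT.
function invertRanges(ranges) {
  const result = [];
  let next = 0;
  for (const [begin, end] of coalesceRanges(ranges)) {
    if (begin > next) result.push([next, begin - 1]);
    next = end + 1;
  }
  if (next <= MAX_CODE_POINT) result.push([next, MAX_CODE_POINT]);
  return result;
}

// Keep only the part of each range that an 8-bit string can contain.
function only8Bit(ranges) {
  return ranges
    .filter(([begin]) => begin <= 0xFF)
    .map(([begin, end]) => [begin, Math.min(end, 0xFF)]);
}

// A partial, illustrative white space set (NOT the full \s definition).
const spaces = [[0x09, 0x0D], [0x20, 0x20], [0xA0, 0xA0], [0x2000, 0x200A]];
const nonSpaces = invertRanges(spaces);

// A set combined with its complement collapses to one range: 0..max.
const combined = coalesceRanges([...spaces, ...nonSpaces]);
const combined8Bit = only8Bit(combined);
const spaces8Bit = only8Bit(spaces);
console.log(combined, combined8Bit, spaces8Bit);
```

With a single 0..max range, a matcher needs no per-range comparisons at all,
and for 8-bit subjects the clipped list never mentions a code point above 255.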

After these changes, the ARM v8 code size for the test regexp
/([\s\S]+?)Abc123([\s\S]+)EOL/ went from 1460 bytes down to 356 bytes.
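
For reference, this is how the test regexp behaves on a multi-line subject.
The input string is invented for this example and is not taken from the added
tests or the microbenchmark:

```javascript
// The first group is lazy and stops at the first "Abc123"; the second group
// is greedy and backtracks to the last "EOL".  Both may span line terminators.
const testRegExp = /([\s\S]+?)Abc123([\s\S]+)EOL/;
const subject = "line one\nline two Abc123 tail\nmore EOL";

const match = testRegExp.exec(subject);
console.log(JSON.stringify(match[1]), JSON.stringify(match[2]));
```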

Added tests for these changes.  Also added a new microbenchmark to test the
performance improvements of [\s\S] in four similar regular expressions.  These
regular expressions differ in greediness and the minimum match size.  On an
M3-equipped MacBook Pro, that benchmark shows a more than 2x improvement.

                                     Baseline         FixCharacterClasses
regexp-anychar-character-classes:  112.3561+-2.4038  52.8811+-3.0929    ^ definitely 2.1247x faster

* JSTests/microbenchmarks/regexp-anychar-character-classes.js: Added.
* JSTests/stress/regexp-character-class-coalescing.js: Added.
(arrayToString):
(objectToString):
(dumpValue):
(compareArray):
(compareGroups):
(testRegExp):
(testRegExpSyntaxError):
* Source/JavaScriptCore/yarr/YarrJIT.cpp:
* Source/JavaScriptCore/yarr/YarrPattern.cpp:
(JSC::Yarr::CharacterClassConstructor::unicodeOpSorted):
(JSC::Yarr::CharacterClassConstructor::coalesceTables):
(JSC::Yarr::YarrPatternConstructor::atomCharacterClassEnd):
(JSC::Yarr::CharacterClass::copyOnly8BitCharacterData):
* Source/JavaScriptCore/yarr/YarrPattern.h:
* Source/JavaScriptCore/yarr/create_regex_tables:

Canonical link: https://commits.webkit.org/286509@main


