Branch: refs/heads/main
  Home:   https://github.com/WebKit/WebKit
  Commit: fb1eae0edd1479f4642007ffaa4005d476dfe234
      
https://github.com/WebKit/WebKit/commit/fb1eae0edd1479f4642007ffaa4005d476dfe234
  Author: Wenson Hsieh <[email protected]>
  Date:   2025-11-11 (Tue, 11 Nov 2025)

  Changed paths:
    M Source/WebKit/Shared/TextExtractionToStringConversion.cpp
    M Source/WebKit/Shared/TextExtractionToStringConversion.h
    M Source/WebKit/UIProcess/API/Cocoa/WKWebView.mm
    M Source/WebKit/UIProcess/API/Cocoa/_WKTextExtraction.h
    M Source/WebKit/UIProcess/API/Cocoa/_WKTextExtraction.mm
    M Source/WebKit/UIProcess/API/Cocoa/_WKTextExtractionInternal.h
    M Source/WebKit/UIProcess/Cocoa/TextExtraction/WKWebView+TextExtraction.swift
    M Tools/TestWebKitAPI/Tests/WebKitCocoa/TextExtractionTests.mm

  Log Message:
  -----------
  Refactor text extraction filtering logic to be more extensible
https://bugs.webkit.org/show_bug.cgi?id=302301
rdar://164442518

Reviewed by Abrar Rahman Protyasha and Megan Gardner.

Currently, the `TextExtractionOptions` passed into `convertToText` (i.e. converting extraction items
into text) only allow a single text filtering callback, which returns a `NativePromise` that is
resolved once all relevant text filtering steps have been performed. However, this makes it
somewhat tricky to implement more sophisticated logic for conditionally enabling filtering steps
at both build time and runtime:

1.  The `TextExtractionFilter` classifier is behind a compile-time flag, as well as a runtime flag.
    It's now also configurable by the WebKit client, via the new option flag.
2.  The text recognition filter is behind the same compile-time and runtime flags. It's also
    configurable by the client, independently of (1).
3.  The maximum word limit is configurable by the WebKit client.

To make this filtering callback mechanism more extensible, we convert the callback into a vector of
callbacks that represent a filtering pipeline, where any of the above steps (1-3) can be added as
needed (and any future steps can simply be appended to the list).
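As a rough illustration of how such a pipeline might be assembled, here is a minimal sketch. It appends a step only when that step is enabled at build time, at runtime, and by the client. Everything in it is hypothetical and not WebKit's code:

- `FilterCallback` is a synchronous stand-in for a `NativePromise`-returning callback, where `std::nullopt` plays the role of a rejection.
- `kFilteringCompiledIn` stands in for the compile-time flag.
- `filteringRuntimeFlag` stands in for the runtime flag.
- `FilterOption` loosely mimics the new client-facing enum.
- The classifier and text recognition steps are identity placeholders.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical synchronous stand-in for a promise-returning filter step:
// returning std::nullopt plays the role of a rejected promise.
using FilterCallback = std::function<std::optional<std::string>(const std::string&)>;

// Hypothetical stand-in for the compile-time flag that gates both filters.
constexpr bool kFilteringCompiledIn = true;

// Hypothetical client-facing option bits, loosely modeled on the new enum property.
enum FilterOption : uint8_t {
    FilterOptionClassifier = 1 << 0,
    FilterOptionTextRecognition = 1 << 1,
};

struct PipelineConfiguration {
    bool filteringRuntimeFlag { false }; // stands in for the runtime flag
    uint8_t clientOptions { 0 };         // bitwise OR of FilterOption values
    std::size_t maxWordCount { 0 };      // 0 means "no word limit"
};

// Appends only the steps that are enabled at build time, at runtime, and by the client.
std::vector<FilterCallback> makeFilterPipeline(const PipelineConfiguration& config)
{
    std::vector<FilterCallback> callbacks;

    if constexpr (kFilteringCompiledIn) {
        if (config.filteringRuntimeFlag) {
            if (config.clientOptions & FilterOptionClassifier) {
                callbacks.push_back([](const std::string& text) -> std::optional<std::string> {
                    return text; // placeholder for the classifier step (1)
                });
            }
            if (config.clientOptions & FilterOptionTextRecognition) {
                callbacks.push_back([](const std::string& text) -> std::optional<std::string> {
                    return text; // placeholder for the text recognition step (2)
                });
            }
        }
    }

    if (config.maxWordCount) {
        // Word-limit step (3): keeps the first `limit` whitespace-separated words,
        // joined with single spaces (a simplification made for this sketch).
        callbacks.push_back([limit = config.maxWordCount](const std::string& text) -> std::optional<std::string> {
            std::istringstream words(text);
            std::string word;
            std::string result;
            for (std::size_t count = 0; count < limit && words >> word; ++count) {
                if (!result.empty())
                    result += ' ';
                result += word;
            }
            return result;
        });
    }

    return callbacks;
}
```

A future step would be one more guarded `push_back`. The code that runs the pipeline does not need to change.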

See below for more details.

Test: TextExtractionTests.FilterOptions

Test: Tools/TestWebKitAPI/Tests/WebKitCocoa/TextExtractionTests.mm
* Source/WebKit/Shared/TextExtractionToStringConversion.cpp:
(WebKit::TextExtractionAggregator::filter const):
(WebKit::TextExtractionAggregator::filterRecursive const):

Turn the `filterCallback` into a `Vector` of `filterCallbacks`; the callbacks are invoked in order,
and each callback's output is fed into the next callback as input (unless the promise rejects, in
which case we stop early).
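The chaining described above can be sketched with completion handlers standing in for promises. This is only loosely analogous to `filterRecursive`; it is not WebKit's implementation. A few assumptions apply:

- `runFilters`, `runFiltersRecursive`, `Completion`, and `FilterCallback` are hypothetical names.
- A step resolves by calling its completion with text and rejects by passing `std::nullopt`.
- A rejection is reported to the caller as `std::nullopt`. The commit message only says the pipeline stops early.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical promise stand-in: call with text to resolve, std::nullopt to reject.
using Completion = std::function<void(std::optional<std::string>)>;
using FilterCallback = std::function<void(std::string, Completion)>;

// Runs callbacks[index] on `input`; on success, feeds its output into the next
// callback; on rejection, skips the remaining steps and reports std::nullopt.
void runFiltersRecursive(std::shared_ptr<const std::vector<FilterCallback>> callbacks,
    std::size_t index, std::string input, Completion completion)
{
    if (index == callbacks->size()) {
        completion(std::move(input)); // every step resolved
        return;
    }
    const auto& callback = (*callbacks)[index];
    // The continuation keeps `callbacks` alive, so steps may complete asynchronously.
    callback(std::move(input), [callbacks, index, completion = std::move(completion)](std::optional<std::string> result) mutable {
        if (!result) {
            completion(std::nullopt); // rejected: stop early
            return;
        }
        runFiltersRecursive(callbacks, index + 1, std::move(*result), std::move(completion));
    });
}

void runFilters(std::vector<FilterCallback> callbacks, std::string input, Completion completion)
{
    auto shared = std::make_shared<const std::vector<FilterCallback>>(std::move(callbacks));
    runFiltersRecursive(std::move(shared), 0, std::move(input), std::move(completion));
}
```

With an empty vector, the input is passed through unchanged. This matches the idea that filtering steps are purely opt-in.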

* Source/WebKit/Shared/TextExtractionToStringConversion.h:
(WebKit::TextExtractionOptions::TextExtractionOptions):
* Source/WebKit/UIProcess/API/Cocoa/WKWebView.mm:
(-[WKWebView _debugTextWithConfiguration:completionHandler:]):
(joinAndTruncateLinesToWordLimit): Deleted.
* Source/WebKit/UIProcess/API/Cocoa/_WKTextExtraction.h:

Add a new enum options property, which allows clients to opt in or out of the classifier and/or OCR
filter during extraction.

* Source/WebKit/UIProcess/API/Cocoa/_WKTextExtraction.mm:
(-[_WKTextExtractionConfiguration _initForOnlyVisibleText:]):
* Source/WebKit/UIProcess/API/Cocoa/_WKTextExtractionInternal.h:
* Source/WebKit/UIProcess/Cocoa/TextExtraction/WKWebView+TextExtraction.swift:
* Tools/TestWebKitAPI/Tests/WebKitCocoa/TextExtractionTests.mm:
(TestWebKitAPI::TEST(TextExtractionTests, FilterOptions)):

Canonical link: https://commits.webkit.org/302850@main