Branch: refs/heads/main
  Home:   https://github.com/WebKit/WebKit
  Commit: b137fef748b7711f22e63952f81b794968f6e504
      
https://github.com/WebKit/WebKit/commit/b137fef748b7711f22e63952f81b794968f6e504
  Author: Wenson Hsieh <[email protected]>
  Date:   2025-12-27 (Sat, 27 Dec 2025)

  Changed paths:
    M LayoutTests/fast/text-extraction/debug-text-extraction-shorten-urls-expected.txt
    M LayoutTests/fast/text-extraction/debug-text-extraction-shorten-urls.html
    M Source/WebKit/Shared/TextExtractionToStringConversion.cpp
    M Source/WebKit/Shared/TextExtractionToStringConversion.h
    A Source/WebKit/Shared/TextExtractionURLCache.cpp
    A Source/WebKit/Shared/TextExtractionURLCache.h
    M Source/WebKit/Sources.txt
    M Source/WebKit/UIProcess/API/Cocoa/WKWebView.mm
    M Source/WebKit/UIProcess/API/Cocoa/WKWebViewInternal.h
    M Source/WebKit/UIProcess/API/Cocoa/_WKTextExtraction.h
    M Source/WebKit/UIProcess/API/Cocoa/_WKTextExtraction.mm
    M Source/WebKit/UIProcess/API/Cocoa/_WKTextExtractionInternal.h
    M Source/WebKit/WebKit.xcodeproj/project.pbxproj

  Log Message:
  -----------
  [AutoFill Debugging] Part 2/2: Deduplicate shortened URLs and report replacements to the client
https://bugs.webkit.org/show_bug.cgi?id=304680
rdar://167149825

Reviewed by Richard Robinson.

Add a strategy to deduplicate different URLs that shorten to the same string when URL shortening is
enabled for text extraction, and combine it with a new property on `_WKTextExtractionResult` that
reports the mapping from each shortened URL back to its original URL to the WebKit text extraction
client.

See below for more details.

* LayoutTests/fast/text-extraction/debug-text-extraction-shorten-urls-expected.txt:
* LayoutTests/fast/text-extraction/debug-text-extraction-shorten-urls.html:

Augment this layout test to exercise the deduplication strategy.

* Source/WebKit/Shared/TextExtractionToStringConversion.cpp:
(WebKit::TextExtractionAggregator::~TextExtractionAggregator):
(WebKit::TextExtractionAggregator::stringForURL):

Add a helper method that uses `TextExtractionURLCache` to map each shortened URL string to a
deduplicated string.
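Before a shortened string reaches the cache, the URL has to be shortened in the first place. That step presumably comes from Part 1/2 and is not described in this message. Judging only from the examples further down, it drops the scheme, the query, and the fragment. The sketch below is a hypothetical illustration of that inferred rule. `shortenURLForExtraction` is not a WebKit function, and the real shortening logic may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the URL shortening step that feeds stringForURL.
// Inferred only from this commit's examples: drop any query or fragment,
// then drop the "scheme://" prefix.
std::string shortenURLForExtraction(const std::string& url)
{
    std::string result = url;

    // Strip the query string and fragment first, whichever comes first, so a
    // "://" inside a query value cannot be mistaken for the scheme separator.
    if (auto cut = result.find_first_of("?#"); cut != std::string::npos)
        result.erase(cut);

    // Strip "scheme://" if present.
    if (auto schemeEnd = result.find("://"); schemeEnd != std::string::npos)
        result.erase(0, schemeEnd + 3);

    assert(result.find_first_of("?#") == std::string::npos);
    return result;
}
```

Under this rule, `https://example.com/foo?bar=baz` and `https://example.com/foo?bar=garply` both become "example.com/foo". That collision is exactly what the cache has to resolve.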

(WebKit::addPartsForItem):
(WebKit::addTextRepresentationRecursive):
(WebKit::centerEllipsize): Deleted.
* Source/WebKit/Shared/TextExtractionToStringConversion.h:
(WebKit::TextExtractionOptions::TextExtractionOptions):
* Source/WebKit/Shared/TextExtractionURLCache.cpp: Added.

Add a new helper class that represents a cache of shortened URL strings and original URLs.

(WebKit::TextExtractionURLCache::clear):

Reset the cache (called during top-level navigation, or if the web process swaps or exits).

(WebKit::TextExtractionURLCache::add):

Main entry point for updating the cache: this takes the shortened URL string and the original URL as
arguments, and returns the shortened string saved to the cache, adding a numbered suffix if a
different URL has already claimed that string. For instance:

-   Suppose `https://example.com/foo?bar=baz` is shortened to "example.com/foo".

-   If `https://example.com/foo?bar=garply` appears later on, it will also get shortened to
    "example.com/foo" which is then deduplicated to "example.com/foo2".

-   If `https://example.com/foo.html` appears, it will shorten to "example.com/foo.html" because the
    shortened version doesn't conflict with the other shortened strings above.

-   If `https://example.com/foo.html?bar=baz` appears later on, it will also get shortened to
    "example.com/foo.html" which is then deduplicated to "example.com/foo2.html".

(WebKit::TextExtractionURLCache::urlForShortenedString const):
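The deduplication rule in the examples above can be sketched as follows. This is a minimal illustration, not the code in `TextExtractionURLCache.cpp`. `URLCacheSketch` is a hypothetical name, and it uses standard library types in place of `WTF::String`, `WTF::URL`, and `HashMap`. Two behaviors are assumptions inferred from the examples rather than stated in this message:

- re-adding an already-seen original URL returns its earlier string;
- the suffix goes before the extension of the last path component.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for WebKit::TextExtractionURLCache.
class URLCacheSketch {
public:
    // Records `originalURL` under `shortened`, adding a numeric suffix (2, 3, ...)
    // when a different URL already claimed that string. Returns the string to emit.
    std::string add(const std::string& shortened, const std::string& originalURL)
    {
        // Assumption: the same original URL keeps the string it was first given.
        if (auto it = m_originalToShortened.find(originalURL); it != m_originalToShortened.end())
            return it->second;

        std::string candidate = shortened;
        for (unsigned suffix = 2; m_shortenedToOriginal.count(candidate); ++suffix)
            candidate = withNumericSuffix(shortened, suffix);

        m_shortenedToOriginal.emplace(candidate, originalURL);
        m_originalToShortened.emplace(originalURL, candidate);
        return candidate;
    }

    // Maps a (possibly suffixed) shortened string back to its original URL.
    std::optional<std::string> urlForShortenedString(const std::string& shortened) const
    {
        if (auto it = m_shortenedToOriginal.find(shortened); it != m_shortenedToOriginal.end())
            return it->second;
        return std::nullopt;
    }

    // Forgets every mapping, e.g. after a top-level navigation.
    void clear()
    {
        m_shortenedToOriginal.clear();
        m_originalToShortened.clear();
    }

private:
    // "example.com/foo.html" + 2 -> "example.com/foo2.html";
    // "example.com/foo" + 2 -> "example.com/foo2" (no extension, so append).
    static std::string withNumericSuffix(const std::string& shortened, unsigned suffix)
    {
        assert(suffix >= 2);
        std::string number = std::to_string(suffix);
        auto lastSlash = shortened.rfind('/');
        auto lastDot = shortened.rfind('.');
        bool hasExtension = lastSlash != std::string::npos && lastDot != std::string::npos && lastDot > lastSlash;
        if (!hasExtension)
            return shortened + number;
        return shortened.substr(0, lastDot) + number + shortened.substr(lastDot);
    }

    std::unordered_map<std::string, std::string> m_shortenedToOriginal;
    std::unordered_map<std::string, std::string> m_originalToShortened;
};
```

Feeding it the four URLs from the list above yields "example.com/foo", "example.com/foo2", "example.com/foo.html", and "example.com/foo2.html", in that order. The reverse map from original URL to shortened string is a choice made for this sketch. It keeps repeated occurrences of the same link stable within one page.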
* Source/WebKit/Shared/TextExtractionURLCache.h: Added.
(WebKit::TextExtractionURLCache::create):
* Source/WebKit/Sources.txt:
* Source/WebKit/UIProcess/API/Cocoa/WKWebView.mm:
(createEmptyTextExtractionResult):
(-[WKWebView _extractDebugTextWithConfigurationWithoutUpdatingFilterRules:completionHandler:]):
(-[WKWebView _clearTextExtractionFilterCache]):
* Source/WebKit/UIProcess/API/Cocoa/WKWebViewInternal.h:

Store a `TextExtractionURLCache` on the web view, and pass it into the text conversion pipeline.
Clear the cache upon navigation.

* Source/WebKit/UIProcess/API/Cocoa/_WKTextExtraction.h:
* Source/WebKit/UIProcess/API/Cocoa/_WKTextExtraction.mm:
(-[_WKTextExtractionResult initWithTextContent:filteredOutAnyText:shortenedURLs:]):
(-[_WKTextExtractionResult shortenedURLs]):

Add support for a new property on `_WKTextExtractionResult` that contains a map of shortened
extracted URLs to their original URLs.
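On the client side, the `shortenedURLs` map is what lets a shortened string in the extracted text be turned back into a real link. The sketch below models the result in C++ for illustration only. The real API is the Objective-C `_WKTextExtractionResult`, whose initializer takes `textContent`, `filteredOutAnyText`, and `shortenedURLs`. `TextExtractionResultSketch` and `originalURLForToken` are hypothetical names.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical C++ mirror of _WKTextExtractionResult's properties.
struct TextExtractionResultSketch {
    std::string textContent;
    bool filteredOutAnyText { false };
    // Shortened (possibly suffixed) URL string -> original URL.
    std::map<std::string, std::string> shortenedURLs;
};

// Given a URL-like token copied out of textContent, return the original URL
// it stands for, or std::nullopt if the token is not a shortened URL.
std::optional<std::string> originalURLForToken(const TextExtractionResultSketch& result, const std::string& token)
{
    if (auto it = result.shortenedURLs.find(token); it != result.shortenedURLs.end())
        return it->second;
    return std::nullopt;
}
```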

(-[_WKTextExtractionResult initWithTextContent:filteredOutAnyText:]): Deleted.
* Source/WebKit/UIProcess/API/Cocoa/_WKTextExtractionInternal.h:
* Source/WebKit/WebKit.xcodeproj/project.pbxproj:

Canonical link: https://commits.webkit.org/304963@main
