Branch: refs/heads/main
  Home:   https://github.com/WebKit/WebKit
  Commit: 732ab0a8407576cbca271ee19d278a6088ace703
      
https://github.com/WebKit/WebKit/commit/732ab0a8407576cbca271ee19d278a6088ace703
  Author: Wenson Hsieh <[email protected]>
  Date:   2026-07-03 (Fri, 03 Jul 2026)

  Changed paths:
    M LayoutTests/fast/text-extraction/debug-text-extraction-shorten-urls-expected.txt
    M LayoutTests/fast/text-extraction/debug-text-extraction-shorten-urls.html
    M Source/WebCore/page/text-extraction/TextExtraction.cpp
    M Source/WebCore/page/text-extraction/TextExtractionTypes.h
    M Source/WebKit/Shared/TextExtractionToStringConversion.cpp
    M Source/WebKit/Shared/WebCoreArgumentCoders.serialization.in

  Log Message:
  -----------
  [Text Extraction] Include human-readable query parameters and fragments for links that navigate within the same page
https://bugs.webkit.org/show_bug.cgi?id=318582
rdar://181349810

Reviewed by Abrar Rahman Protyasha.

Adjust the URL shortening heuristic: for links that navigate to the same URL as the current page
(ignoring query and fragment), we currently omit the URL from the text extraction entirely. Instead,
add the query if it has exactly one human-readable (`isProbablyHumanReadable`) key/value pair;
otherwise, fall back to the fragment if it's human-readable.

This prevents us from losing information about links in cases where the rendered text alone doesn't
provide enough context.
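
As a rough illustration of the heuristic described above, here is a minimal standalone sketch. It is not the
code in `TextExtractionToStringConversion.cpp`. The names `looksHumanReadable`, `splitURL` and
`samePageLinkSuffix` are hypothetical, the readability check is a simplistic stand-in for
`isProbablyHumanReadable` (whose real logic is not shown here), and the sketch assumes the query qualifies
only when it consists of a single key/value pair whose key and value both pass that check. The `?` and `#`
prefixes on the result are also an assumption about the output format.

```cpp
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical stand-in for isProbablyHumanReadable (the real check is not shown in this commit):
// non-empty, at most 24 characters, and made only of letters, '-' or '_'.
// Digits are rejected so that opaque IDs and hashes do not pass.
static bool looksHumanReadable(std::string_view text)
{
    if (text.empty() || text.size() > 24)
        return false;
    for (unsigned char c : text) {
        if (!std::isalpha(c) && c != '-' && c != '_')
            return false;
    }
    return true;
}

struct SplitURL {
    std::string_view base;     // everything before '?' and '#'
    std::string_view query;    // text between '?' and '#', without the '?'
    std::string_view fragment; // text after '#', without the '#'
};

// Naive string split; a real implementation would use a proper URL parser.
static SplitURL splitURL(std::string_view url)
{
    SplitURL parts;
    auto hash = url.find('#');
    if (hash != std::string_view::npos) {
        parts.fragment = url.substr(hash + 1);
        url = url.substr(0, hash);
    }
    auto question = url.find('?');
    if (question != std::string_view::npos) {
        parts.query = url.substr(question + 1);
        url = url.substr(0, question);
    }
    parts.base = url;
    return parts;
}

// Returns the shortened text to keep for a link that targets the current page,
// or an empty string when neither the query nor the fragment is informative.
std::string samePageLinkSuffix(std::string_view pageURL, std::string_view linkURL)
{
    auto page = splitURL(pageURL);
    auto link = splitURL(linkURL);
    if (page.base != link.base)
        return {}; // Not a same-page link; out of scope for this sketch.

    // Prefer the query, but only when it is a single readable key=value pair (assumed reading).
    auto query = link.query;
    auto equals = query.find('=');
    bool isSinglePair = !query.empty()
        && query.find('&') == std::string_view::npos
        && equals != std::string_view::npos;
    if (isSinglePair
        && looksHumanReadable(query.substr(0, equals))
        && looksHumanReadable(query.substr(equals + 1)))
        return "?" + std::string(query);

    // Otherwise fall back to the fragment, if it is readable.
    if (looksHumanReadable(link.fragment))
        return "#" + std::string(link.fragment);

    return {};
}
```

Under these assumptions, a link to `https://example.com/docs?tab=reviews` from `https://example.com/docs`
would keep `?tab=reviews`, while a link whose query has several pairs would fall back to its fragment, or
to nothing if the fragment is not readable either.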

* LayoutTests/fast/text-extraction/debug-text-extraction-shorten-urls-expected.txt:
* LayoutTests/fast/text-extraction/debug-text-extraction-shorten-urls.html:
* Source/WebCore/page/text-extraction/TextExtraction.cpp:
(WebCore::TextExtraction::extractItemData):
* Source/WebCore/page/text-extraction/TextExtractionTypes.h:
* Source/WebKit/Shared/TextExtractionToStringConversion.cpp:
(WebKit::TextExtractionAggregator::shortenedURLStringForLink):
(WebKit::addPartsForItem):
(WebKit::addTextRepresentationRecursive):
* Source/WebKit/Shared/WebCoreArgumentCoders.serialization.in:

Canonical link: https://commits.webkit.org/316503@main


