Title: [261247] trunk
Revision: 261247
Author: [email protected]
Date: 2020-05-06 13:02:40 -0700 (Wed, 06 May 2020)

Log Message

Cut and paste from Google Doc to Notes in several (non-Latin) languages doesn't work
https://bugs.webkit.org/show_bug.cgi?id=211498
<rdar://problem/56675345>

Reviewed by Darin Adler.

Source/WebCore:

When copying text in Google Docs, the page uses `DataTransfer.setData` to write text/html data to the system
pasteboard. This markup string includes a meta tag with `charset="utf-8"`, indicating that the HTML string that
was copied should be interpreted as UTF-8 data.

However, before we write this data to the system pasteboard, we first sanitize it by loading it in a separate
page, and then build the final sanitized markup string to write by iterating over only visible content in the
main document of this page. Importantly, this last step skips over the meta element containing the charset.
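As a toy model of this step (the `ToyNode` type and `serializeVisibleContent` function are hypothetical, not WebKit's DOM or serializer), rebuilding markup from rendered content alone silently drops a `<meta charset>` element, because it contributes nothing visible:

```cpp
#include <string>
#include <vector>

// Hypothetical toy node: a tag name, its serialized markup, and whether it
// renders any visible content. Not WebKit's DOM representation.
struct ToyNode {
    std::string tag;
    std::string markup;
    bool rendersVisibleContent;
};

// Rebuild markup by keeping only nodes that render visible content.
// Non-rendering elements such as <meta> are skipped, and the charset
// declaration they carried is lost with them.
std::string serializeVisibleContent(const std::vector<ToyNode>& nodes)
{
    std::string result;
    for (const auto& node : nodes) {
        if (node.rendersVisibleContent)
            result += node.markup;
    }
    return result;
}
```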

Later, when pasting in Notes or TextEdit, both apps use `-[NSAttributedString initWithData:...:]` to convert the
HTML data on the pasteboard into an NSAttributedString. This takes the NSPasteboard's HTML data (a blob of
`NSData`) and synchronously loads it in a new legacy WebKit view by calling `-[WebFrame
loadData:MIMEType:textEncodingName:baseURL:]`, passing in `nil` as the text encoding name. Since WebKit is only
given a blob of data and no particular encoding, we fall back to the default Latin-1 encoding, which produces
gibberish for CJK text.
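To make the failure concrete: the copied text 我叫謝文昇 is 5 characters but 15 bytes of UTF-8, and a Latin-1 decoder maps every byte to exactly one character. The sketch below (the function name is hypothetical) decodes strictly as ISO-8859-1; browsers usually treat the Latin-1 label as windows-1252, which differs only for bytes 0x80 to 0x9F, and either way the result is 15 unrelated characters, starting with "æ" where "我" should be:

```cpp
#include <string>

// Decode bytes as ISO-8859-1: each byte becomes the code point with the same
// value. This is effectively what a reader does when no encoding is given and
// it falls back to Latin-1.
std::u32string decodeAsLatin1(const std::string& bytes)
{
    std::u32string result;
    result.reserve(bytes.size());
    for (unsigned char byte : bytes)
        result.push_back(static_cast<char32_t>(byte));
    return result;
}
```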

To fix this, we automatically insert a `<meta charset="utf-8">` tag when writing HTML to the pasteboard, if the
sanitized markup contains non-ASCII characters.
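A minimal sketch of that rule on UTF-8 bytes, assuming hypothetical helper names (the actual change operates on a WTF `String` via `isAllASCII()` inside `sanitizedMarkupForFragmentInDocument`, and also places the tag after the MSO `<html>` opener when one is emitted). Pure-ASCII markup decodes identically as UTF-8 or Latin-1, so it is left untouched:

```cpp
#include <string>

// True if every byte is below 0x80. For UTF-8 input this matches a
// character-level "all ASCII" check, since any non-ASCII character is
// encoded entirely with bytes >= 0x80.
bool isAllASCIISketch(const std::string& bytes)
{
    for (unsigned char byte : bytes) {
        if (byte >= 0x80)
            return false;
    }
    return true;
}

// Hypothetical helper: prefix a UTF-8 charset declaration only when the
// markup would decode differently under a Latin-1 fallback.
std::string prependMetaCharsetIfNeeded(const std::string& sanitizedMarkup)
{
    if (isAllASCIISketch(sanitizedMarkup))
        return sanitizedMarkup;
    return "<meta charset=\"UTF-8\">" + sanitizedMarkup;
}
```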

Test: CopyHTML.SanitizationPreservesCharacterSet

* Modules/async-clipboard/ClipboardItemBindingsDataSource.cpp:
(WebCore::ClipboardItemBindingsDataSource::ClipboardItemTypeLoader::sanitizeDataIfNeeded):

Pass in AddMetaCharsetIfNeeded::Yes.

* dom/DataTransfer.cpp:
(WebCore::DataTransfer::setDataFromItemList):

Pass in AddMetaCharsetIfNeeded::Yes here too.

* editing/cocoa/WebContentReaderCocoa.mm:
(WebCore::sanitizeMarkupWithArchive):
(WebCore::WebContentReader::readHTML):
(WebCore::WebContentMarkupReader::readHTML):
* editing/markup.cpp:
(WebCore::sanitizeMarkup):

Add a new enum so that we only add the extra meta tag when sanitizing content that is being written to the
system pasteboard through one of the clipboard DOM APIs.

(WebCore::sanitizedMarkupForFragmentInDocument):
* editing/markup.h:
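The enum mirrors the one added to markup.h. The sketch below restates which call sites in this entry opt in; the `ClipboardMarkupPath` enum and `metaCharsetPolicyFor` function are hypothetical and exist only to summarize the list above. A `bool`-backed `enum class` costs nothing at runtime, and unlike a bare `bool` parameter it cannot be passed `true` or `false` by accident:

```cpp
// Same shape as the enum added to markup.h.
enum class AddMetaCharsetIfNeeded : bool { No, Yes };

// Hypothetical labels for the callers touched by this patch.
enum class ClipboardMarkupPath {
    DataTransferSetData, // DataTransfer::setDataFromItemList
    ClipboardItemWrite,  // ClipboardItemTypeLoader::sanitizeDataIfNeeded
    PasteboardRead,      // WebContentReader / WebContentMarkupReader::readHTML
};

// Only markup written to the system pasteboard through the clipboard DOM
// APIs gets the extra meta tag; markup read back for pasting does not.
constexpr AddMetaCharsetIfNeeded metaCharsetPolicyFor(ClipboardMarkupPath path)
{
    switch (path) {
    case ClipboardMarkupPath::DataTransferSetData:
    case ClipboardMarkupPath::ClipboardItemWrite:
        return AddMetaCharsetIfNeeded::Yes;
    case ClipboardMarkupPath::PasteboardRead:
        return AddMetaCharsetIfNeeded::No;
    }
    return AddMetaCharsetIfNeeded::No;
}
```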

Source/WebKit:

Add a new header to allow Cocoa code to reason about UIColors and NSColors on iOS and macOS (respectively)
without requiring platform ifdefs. A follow-up patch will adopt this in several places in WebKit, where we
currently need ifdefs for iOS and macOS.
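The header's pattern (choose the concrete type once, behind a single alias) can be shown in plain C++. This is only an analogy: the stand-in structs and the `SKETCH_USE_APPKIT` macro are hypothetical substitutes for `NSColor`/`UIColor` and `USE(APPKIT)`. In the real header, `@class` forward declarations let the alias be used without importing AppKit or UIKit, which is how the new test can write `^(CocoaColor *color, ...)` on both platforms:

```cpp
#include <string>

// Hypothetical stand-ins for NSColor and UIColor.
struct AppKitColorStandIn {
    static std::string frameworkName() { return "AppKit"; }
};
struct UIKitColorStandIn {
    static std::string frameworkName() { return "UIKit"; }
};

// Pick the platform type once; SKETCH_USE_APPKIT stands in for USE(APPKIT).
#if defined(SKETCH_USE_APPKIT)
using CocoaColor = AppKitColorStandIn;
#else
using CocoaColor = UIKitColorStandIn;
#endif

// Shared code names CocoaColor and needs no platform #if of its own.
std::string colorFrameworkName()
{
    return CocoaColor::frameworkName();
}
```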

* Platform/cocoa/CocoaColor.h: Added.
* WebKit.xcodeproj/project.pbxproj:

Tools:

Add a test to verify that when markup written to the clipboard via a DOM API contains non-ASCII characters, the
pasteboard data can still be converted into an `NSAttributedString` containing the expected non-Latin text.
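Why the test expects correct text only once the meta tag is present can be seen with a toy model of the reader side. It assumes a deliberately simplified rule (decode as UTF-8 if `<meta charset="utf-8">` appears, case-insensitively, within the first 1024 bytes, otherwise fall back to Latin-1); the names are hypothetical, and real HTML encoding sniffing also handles byte order marks, `http-equiv` declarations, and many other encoding labels. HTML does require a charset declaration to appear within the first 1024 bytes, which is why that window is used here:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

enum class ChosenDecoding { UTF8, Latin1Fallback };

// Hypothetical, simplified reader: look for a UTF-8 meta charset
// declaration near the start of the data, ignoring case.
ChosenDecoding chooseDecoding(const std::string& htmlBytes)
{
    std::string head = htmlBytes.substr(0, 1024);
    std::transform(head.begin(), head.end(), head.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    const bool declaresUTF8 = head.find("<meta charset=\"utf-8\">") != std::string::npos;
    return declaresUTF8 ? ChosenDecoding::UTF8 : ChosenDecoding::Latin1Fallback;
}
```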

* TestWebKitAPI/Configurations/Base.xcconfig:

Adjust header search paths so that we can import CocoaColor.h in WebKit.

* TestWebKitAPI/Tests/WebKitCocoa/CopyHTML.mm:
(readHTMLDataFromPasteboard):
(readHTMLStringFromPasteboard):
(readHTMLFromPasteboard): Deleted.

Diff

Modified: trunk/Source/WebCore/ChangeLog (261246 => 261247)


--- trunk/Source/WebCore/ChangeLog	2020-05-06 19:57:34 UTC (rev 261246)
+++ trunk/Source/WebCore/ChangeLog	2020-05-06 20:02:40 UTC (rev 261247)
@@ -1,3 +1,54 @@
+2020-05-06  Wenson Hsieh  <[email protected]>
+
+        Cut and paste from Google Doc to Notes in several (non-Latin) languages doesn't work
+        https://bugs.webkit.org/show_bug.cgi?id=211498
+        <rdar://problem/56675345>
+
+        Reviewed by Darin Adler.
+
+        When copying text in Google Docs, the page uses `DataTransfer.setData` to write text/html data to the system
+        pasteboard. This markup string includes a meta tag with `charset="utf-8"`, indicating that the HTML string that
+        was copied should be interpreted as UTF-8 data.
+
+        However, before we write this data to the system pasteboard, we first sanitize it by loading it in a separate
+        page, and then build the final sanitized markup string to write by iterating over only visible content in the
+        main document of this page. Importantly, this last step skips over the meta element containing the charset.
+
+        Later, when pasting in Notes or TextEdit, both apps use `-[NSAttributedString initWithData:...:]` to convert the
+        HTML data on the pasteboard into an NSAttributedString. This takes the NSPasteboard's HTML data (a blob of
+        `NSData`) and synchronously loads it in a new legacy WebKit view by calling `-[WebFrame
+        loadData:MIMEType:textEncodingName:baseURL:]`, passing in `nil` as the text encoding name. Since WebKit is only
+        given a blob of data and no particular encoding, we fall back to default Latin-1 encoding, which produces
+        gibberish for CJK text.
+
+        To fix this, we automatically insert a `<meta charset="utf-8">` tag when writing HTML to the pasteboard, if the
+        sanitized markup contains non-ASCII characters.
+
+        Test: CopyHTML.SanitizationPreservesCharacterSet
+
+        * Modules/async-clipboard/ClipboardItemBindingsDataSource.cpp:
+        (WebCore::ClipboardItemBindingsDataSource::ClipboardItemTypeLoader::sanitizeDataIfNeeded):
+
+        Pass in AddMetaCharsetIfNeeded::Yes.
+
+        * dom/DataTransfer.cpp:
+        (WebCore::DataTransfer::setDataFromItemList):
+
+        Pass in AddMetaCharsetIfNeeded::Yes here too.
+
+        * editing/cocoa/WebContentReaderCocoa.mm:
+        (WebCore::sanitizeMarkupWithArchive):
+        (WebCore::WebContentReader::readHTML):
+        (WebCore::WebContentMarkupReader::readHTML):
+        * editing/markup.cpp:
+        (WebCore::sanitizeMarkup):
+
+        Add a new enum so that we only add the extra meta tag when sanitizing content that is being written to the
+        system pasteboard through one of the clipboard DOM APIs.
+
+        (WebCore::sanitizedMarkupForFragmentInDocument):
+        * editing/markup.h:
+
 2020-05-06  Tim Horton  <[email protected]>
 
         REGRESSION (r260753): Frequent crashes under TextIndicator's estimatedTextColorsForRange

Modified: trunk/Source/WebCore/Modules/async-clipboard/ClipboardItemBindingsDataSource.cpp (261246 => 261247)


--- trunk/Source/WebCore/Modules/async-clipboard/ClipboardItemBindingsDataSource.cpp	2020-05-06 19:57:34 UTC (rev 261246)
+++ trunk/Source/WebCore/Modules/async-clipboard/ClipboardItemBindingsDataSource.cpp	2020-05-06 20:02:40 UTC (rev 261247)
@@ -269,7 +269,7 @@
         if (markupToSanitize.isEmpty())
             return;
 
-        m_data = { sanitizeMarkup(markupToSanitize) };
+        m_data = { sanitizeMarkup(markupToSanitize, AddMetaCharsetIfNeeded::Yes) };
     }
 
     if (m_type == "image/png"_s) {

Modified: trunk/Source/WebCore/dom/DataTransfer.cpp (261246 => 261247)


--- trunk/Source/WebCore/dom/DataTransfer.cpp	2020-05-06 19:57:34 UTC (rev 261246)
+++ trunk/Source/WebCore/dom/DataTransfer.cpp	2020-05-06 20:02:40 UTC (rev 261247)
@@ -253,7 +253,7 @@
 
     String sanitizedData;
     if (type == "text/html")
-        sanitizedData = sanitizeMarkup(data);
+        sanitizedData = sanitizeMarkup(data, AddMetaCharsetIfNeeded::Yes);
     else if (type == "text/uri-list") {
         auto url = URL({ }, data);
         if (url.isValid())

Modified: trunk/Source/WebCore/editing/cocoa/WebContentReaderCocoa.mm (261246 => 261247)


--- trunk/Source/WebCore/editing/cocoa/WebContentReaderCocoa.mm	2020-05-06 19:57:34 UTC (rev 261246)
+++ trunk/Source/WebCore/editing/cocoa/WebContentReaderCocoa.mm	2020-05-06 20:02:40 UTC (rev 261247)
@@ -449,7 +449,7 @@
 
     if (shouldReplaceRichContentWithAttachments()) {
         replaceRichContentWithAttachments(frame, fragment, markupAndArchive.archive->subresources());
-        return sanitizedMarkupForFragmentInDocument(WTFMove(fragment), *stagingDocument, msoListQuirks, markupAndArchive.markup);
+        return sanitizedMarkupForFragmentInDocument(WTFMove(fragment), *stagingDocument, AddMetaCharsetIfNeeded::No, msoListQuirks, markupAndArchive.markup);
     }
 
     HashMap<AtomString, AtomString> blobURLMap;
@@ -492,7 +492,7 @@
 
     replaceSubresourceURLs(fragment.get(), WTFMove(blobURLMap));
 
-    return sanitizedMarkupForFragmentInDocument(WTFMove(fragment), *stagingDocument, msoListQuirks, markupAndArchive.markup);
+    return sanitizedMarkupForFragmentInDocument(WTFMove(fragment), *stagingDocument, AddMetaCharsetIfNeeded::No, msoListQuirks, markupAndArchive.markup);
 }
 
 bool WebContentReader::readWebArchive(SharedBuffer& buffer)
@@ -580,7 +580,7 @@
 
     String markup;
     if (RuntimeEnabledFeatures::sharedFeatures().customPasteboardDataEnabled() && shouldSanitize()) {
-        markup = sanitizeMarkup(stringOmittingMicrosoftPrefix, msoListQuirksForMarkup(), WTF::Function<void (DocumentFragment&)> { [] (DocumentFragment& fragment) {
+        markup = sanitizeMarkup(stringOmittingMicrosoftPrefix, AddMetaCharsetIfNeeded::No, msoListQuirksForMarkup(), WTF::Function<void (DocumentFragment&)> { [] (DocumentFragment& fragment) {
             removeSubresourceURLAttributes(fragment, [] (const URL& url) {
                 return shouldReplaceSubresourceURL(url);
             });
@@ -599,7 +599,7 @@
 
     String rawHTML = stripMicrosoftPrefix(string);
     if (shouldSanitize()) {
-        markup = sanitizeMarkup(rawHTML, msoListQuirksForMarkup(), WTF::Function<void (DocumentFragment&)> { [] (DocumentFragment& fragment) {
+        markup = sanitizeMarkup(rawHTML, AddMetaCharsetIfNeeded::No, msoListQuirksForMarkup(), WTF::Function<void (DocumentFragment&)> { [] (DocumentFragment& fragment) {
             removeSubresourceURLAttributes(fragment, [] (const URL& url) {
                 return shouldReplaceSubresourceURL(url);
             });

Modified: trunk/Source/WebCore/editing/markup.cpp (261246 => 261247)


--- trunk/Source/WebCore/editing/markup.cpp	2020-05-06 19:57:34 UTC (rev 261246)
+++ trunk/Source/WebCore/editing/markup.cpp	2020-05-06 20:02:40 UTC (rev 261247)
@@ -201,7 +201,7 @@
     return page;
 }
 
-String sanitizeMarkup(const String& rawHTML, MSOListQuirks msoListQuirks, Optional<WTF::Function<void(DocumentFragment&)>> fragmentSanitizer)
+String sanitizeMarkup(const String& rawHTML, AddMetaCharsetIfNeeded addMetaCharsetIfNeeded, MSOListQuirks msoListQuirks, Optional<WTF::Function<void(DocumentFragment&)>> fragmentSanitizer)
 {
     auto page = createPageForSanitizingWebContent();
     Document* stagingDocument = page->mainFrame().document();
@@ -212,7 +212,7 @@
     if (fragmentSanitizer)
         (*fragmentSanitizer)(fragment);
 
-    return sanitizedMarkupForFragmentInDocument(WTFMove(fragment), *stagingDocument, msoListQuirks, rawHTML);
+    return sanitizedMarkupForFragmentInDocument(WTFMove(fragment), *stagingDocument, addMetaCharsetIfNeeded, msoListQuirks, rawHTML);
 }
 
 enum class MSOListMode { Preserve, DoNotPreserve };
@@ -945,7 +945,7 @@
         && htmlTag.contains("xmlns:w=\"urn:schemas-microsoft-com:office:word\"");
 }
 
-String sanitizedMarkupForFragmentInDocument(Ref<DocumentFragment>&& fragment, Document& document, MSOListQuirks msoListQuirks, const String& originalMarkup)
+String sanitizedMarkupForFragmentInDocument(Ref<DocumentFragment>&& fragment, Document& document, AddMetaCharsetIfNeeded addMetaCharsetIfNeeded, MSOListQuirks msoListQuirks, const String& originalMarkup)
 {
     MSOListMode msoListMode = msoListQuirks == MSOListQuirks::CheckIfNeeded && shouldPreserveMSOLists(originalMarkup)
         ? MSOListMode::Preserve : MSOListMode::DoNotPreserve;
@@ -958,18 +958,31 @@
     auto result = serializePreservingVisualAppearanceInternal(firstPositionInNode(bodyElement.get()), lastPositionInNode(bodyElement.get()), nullptr,
         ResolveURLs::YesExcludingLocalFileURLsForPrivacy, SerializeComposedTree::No, AnnotateForInterchange::Yes, ConvertBlocksToInlines::No,  StandardFontFamilySerializationMode::Strip, msoListMode);
 
+    StringBuilder builder;
     if (msoListMode == MSOListMode::Preserve) {
-        StringBuilder builder;
         builder.appendLiteral("<html xmlns:o=\"urn:schemas-microsoft-com:office:office\"\n"
             "xmlns:w=\"urn:schemas-microsoft-com:office:word\"\n"
             "xmlns:m=\"http://schemas.microsoft.com/office/2004/12/omml\"\n"
             "xmlns=\"http://www.w3.org/TR/REC-html40\">");
-        builder.append(result);
-        builder.appendLiteral("</html>");
-        return builder.toString();
     }
 
-    return result;
+#if PLATFORM(COCOA)
+    if (addMetaCharsetIfNeeded == AddMetaCharsetIfNeeded::Yes && !result.isAllASCII()) {
+        // On Cocoa platforms, this markup is eventually persisted to the pasteboard and read back as UTF-8 data,
+        // so this meta tag is needed for clients that read this data in the future from the pasteboard and load it.
+        // This logic is used by both DataTransfer and Clipboard APIs to sanitize "text/html" from the page.
+        builder.appendLiteral("<meta charset=\"UTF-8\">");
+    }
+#else
+    UNUSED_PARAM(addMetaCharsetIfNeeded);
+#endif
+
+    builder.append(result);
+
+    if (msoListMode == MSOListMode::Preserve)
+        builder.appendLiteral("</html>");
+
+    return builder.toString();
 }
 
 static void restoreAttachmentElementsInFragment(DocumentFragment& fragment)

Modified: trunk/Source/WebCore/editing/markup.h (261246 => 261247)


--- trunk/Source/WebCore/editing/markup.h	2020-05-06 19:57:34 UTC (rev 261246)
+++ trunk/Source/WebCore/editing/markup.h	2020-05-06 20:02:40 UTC (rev 261247)
@@ -52,10 +52,11 @@
 void replaceSubresourceURLs(Ref<DocumentFragment>&&, HashMap<AtomString, AtomString>&&);
 void removeSubresourceURLAttributes(Ref<DocumentFragment>&&, WTF::Function<bool(const URL&)> shouldRemoveURL);
 
-enum class MSOListQuirks { CheckIfNeeded, Disabled };
 std::unique_ptr<Page> createPageForSanitizingWebContent();
-String sanitizeMarkup(const String&, MSOListQuirks = MSOListQuirks::Disabled, Optional<WTF::Function<void(DocumentFragment&)>> fragmentSanitizer = WTF::nullopt);
-String sanitizedMarkupForFragmentInDocument(Ref<DocumentFragment>&&, Document&, MSOListQuirks, const String& originalMarkup);
+enum class MSOListQuirks : bool { CheckIfNeeded, Disabled };
+enum class AddMetaCharsetIfNeeded : bool { No, Yes };
+String sanitizeMarkup(const String&, AddMetaCharsetIfNeeded = AddMetaCharsetIfNeeded::No, MSOListQuirks = MSOListQuirks::Disabled, Optional<WTF::Function<void(DocumentFragment&)>> fragmentSanitizer = WTF::nullopt);
+String sanitizedMarkupForFragmentInDocument(Ref<DocumentFragment>&&, Document&, AddMetaCharsetIfNeeded, MSOListQuirks, const String& originalMarkup);
 
 WEBCORE_EXPORT Ref<DocumentFragment> createFragmentFromText(Range& context, const String& text);
 WEBCORE_EXPORT Ref<DocumentFragment> createFragmentFromMarkup(Document&, const String& markup, const String& baseURL, ParserContentPolicy = AllowScriptingContent);

Modified: trunk/Source/WebKit/ChangeLog (261246 => 261247)


--- trunk/Source/WebKit/ChangeLog	2020-05-06 19:57:34 UTC (rev 261246)
+++ trunk/Source/WebKit/ChangeLog	2020-05-06 20:02:40 UTC (rev 261247)
@@ -1,3 +1,18 @@
+2020-05-06  Wenson Hsieh  <[email protected]>
+
+        Cut and paste from Google Doc to Notes in several (non-Latin) languages doesn't work
+        https://bugs.webkit.org/show_bug.cgi?id=211498
+        <rdar://problem/56675345>
+
+        Reviewed by Darin Adler.
+
+        Add a new header to allow Cocoa code to reason about UIColors and NSColors on iOS and macOS (respectively)
+        without requiring platform ifdefs. A followup patch will adopt this in several places in WebKit, where we
+        currently need ifdefs for iOS and macOS.
+
+        * Platform/cocoa/CocoaColor.h: Added.
+        * WebKit.xcodeproj/project.pbxproj:
+
 2020-05-06  Antoine Quint  <[email protected]>
 
         pointermove event sometimes has incorrect pointerType of 'mouse' on touch interactions

Added: trunk/Source/WebKit/Platform/cocoa/CocoaColor.h (0 => 261247)


--- trunk/Source/WebKit/Platform/cocoa/CocoaColor.h	                        (rev 0)
+++ trunk/Source/WebKit/Platform/cocoa/CocoaColor.h	2020-05-06 20:02:40 UTC (rev 261247)
@@ -0,0 +1,34 @@
+/*
+* Copyright (C) 2020 Apple Inc. All rights reserved.
+*
+* Redistribution and use in source and binary forms, with or without
+* modification, are permitted provided that the following conditions
+* are met:
+* 1. Redistributions of source code must retain the above copyright
+*    notice, this list of conditions and the following disclaimer.
+* 2. Redistributions in binary form must reproduce the above copyright
+*    notice, this list of conditions and the following disclaimer in the
+*    documentation and/or other materials provided with the distribution.
+*
+* THIS SOFTWARE IS PROVIDED BY APPLE INC. AND ITS CONTRIBUTORS ``AS IS''
+* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+* THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR ITS CONTRIBUTORS
+* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+* THE POSSIBILITY OF SUCH DAMAGE.
+*/
+
+#pragma once
+
+#if USE(APPKIT)
+@class NSColor;
+using CocoaColor = NSColor;
+#else
+@class UIColor;
+using CocoaColor = UIColor;
+#endif

Modified: trunk/Source/WebKit/WebKit.xcodeproj/project.pbxproj (261246 => 261247)


--- trunk/Source/WebKit/WebKit.xcodeproj/project.pbxproj	2020-05-06 19:57:34 UTC (rev 261246)
+++ trunk/Source/WebKit/WebKit.xcodeproj/project.pbxproj	2020-05-06 20:02:40 UTC (rev 261247)
@@ -1847,6 +1847,7 @@
 		F4D5F51F206087A10038BBA8 /* WKQuickboardListViewController.h in Headers */ = {isa = PBXBuildFile; fileRef = F4D5F51B206087A10038BBA8 /* WKQuickboardListViewController.h */; };
 		F4DB54E62319E733009E3155 /* WKHighlightLongPressGestureRecognizer.h in Headers */ = {isa = PBXBuildFile; fileRef = F4DB54E42319E733009E3155 /* WKHighlightLongPressGestureRecognizer.h */; };
 		F4EC94E32356CC57000BB614 /* ApplicationServicesSPI.h in Headers */ = {isa = PBXBuildFile; fileRef = 29D04E2821F7C73D0076741D /* ApplicationServicesSPI.h */; };
+		F4FE0A3B24632B60002631E1 /* CocoaColor.h in Headers */ = {isa = PBXBuildFile; fileRef = F4FE0A3A24632B10002631E1 /* CocoaColor.h */; };
 		F6113E25126CE1820057D0A7 /* APIUserContentURLPattern.h in Headers */ = {isa = PBXBuildFile; fileRef = F6113E24126CE1820057D0A7 /* APIUserContentURLPattern.h */; };
 		F6113E29126CE19B0057D0A7 /* WKUserContentURLPattern.h in Headers */ = {isa = PBXBuildFile; fileRef = F6113E27126CE19B0057D0A7 /* WKUserContentURLPattern.h */; settings = {ATTRIBUTES = (Private, ); }; };
 		F634445612A885C8000612D8 /* APISecurityOrigin.h in Headers */ = {isa = PBXBuildFile; fileRef = F634445512A885C8000612D8 /* APISecurityOrigin.h */; };
@@ -5372,6 +5373,7 @@
 		F4DB54E52319E733009E3155 /* WKHighlightLongPressGestureRecognizer.mm */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.objcpp; name = WKHighlightLongPressGestureRecognizer.mm; path = ios/WKHighlightLongPressGestureRecognizer.mm; sourceTree = "<group>"; };
 		F4F59AD32065A5C9006CAA46 /* WKSelectMenuListViewController.mm */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.objcpp; name = WKSelectMenuListViewController.mm; path = ios/forms/WKSelectMenuListViewController.mm; sourceTree = "<group>"; };
 		F4F59AD42065A5CA006CAA46 /* WKSelectMenuListViewController.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = WKSelectMenuListViewController.h; path = ios/forms/WKSelectMenuListViewController.h; sourceTree = "<group>"; };
+		F4FE0A3A24632B10002631E1 /* CocoaColor.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; path = CocoaColor.h; sourceTree = "<group>"; };
 		F6113E24126CE1820057D0A7 /* APIUserContentURLPattern.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = APIUserContentURLPattern.h; sourceTree = "<group>"; };
 		F6113E26126CE19B0057D0A7 /* WKUserContentURLPattern.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = WKUserContentURLPattern.cpp; sourceTree = "<group>"; };
 		F6113E27126CE19B0057D0A7 /* WKUserContentURLPattern.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = WKUserContentURLPattern.h; sourceTree = "<group>"; };
@@ -7684,6 +7686,7 @@
 		4450AEBE1DC3FAAC009943F2 /* cocoa */ = {
 			isa = PBXGroup;
 			children = (
+				F4FE0A3A24632B10002631E1 /* CocoaColor.h */,
 				4482734624528F6000A95493 /* CocoaImage.h */,
 				BCE0937614FB128B001138D9 /* LayerHostingContext.h */,
 				BCE0937514FB128B001138D9 /* LayerHostingContext.mm */,
@@ -10728,6 +10731,7 @@
 				41897EDA1F415D8A0016FA42 /* CacheStorageEngineConnection.h in Headers */,
 				1AA2E51D12E4C05E00BC4966 /* CGUtilities.h in Headers */,
 				57B4B46020B504AC00D4AD79 /* ClientCertificateAuthenticationXPCConstants.h in Headers */,
+				F4FE0A3B24632B60002631E1 /* CocoaColor.h in Headers */,
 				4482734724528F6000A95493 /* CocoaImage.h in Headers */,
 				CE11AD521CBC482F00681EE5 /* CodeSigning.h in Headers */,
 				37BEC4E119491486008B4286 /* CompletionHandlerCallChecker.h in Headers */,

Modified: trunk/Tools/ChangeLog (261246 => 261247)


--- trunk/Tools/ChangeLog	2020-05-06 19:57:34 UTC (rev 261246)
+++ trunk/Tools/ChangeLog	2020-05-06 20:02:40 UTC (rev 261247)
@@ -1,3 +1,23 @@
+2020-05-06  Wenson Hsieh  <[email protected]>
+
+        Cut and paste from Google Doc to Notes in several (non-Latin) languages doesn't work
+        https://bugs.webkit.org/show_bug.cgi?id=211498
+        <rdar://problem/56675345>
+
+        Reviewed by Darin Adler.
+
+        Add a test to verify that when writing markup to the clipboard via DOM API, if non-ASCII characters appear in
+        the written markup, they can still be converted to `NSAttributedString`s containing the expected non-Latin text.
+
+        * TestWebKitAPI/Configurations/Base.xcconfig:
+
+        Adjust header search paths so that we can import CocoaColor.h in WebKit.
+
+        * TestWebKitAPI/Tests/WebKitCocoa/CopyHTML.mm:
+        (readHTMLDataFromPasteboard):
+        (readHTMLStringFromPasteboard):
+        (readHTMLFromPasteboard): Deleted.
+
 2020-05-06  Ryan Haddad  <[email protected]>
 
         REGRESSION (r260278): TestWebKitAPI.Fullscreen.Delegate is timing out on macOS bots

Modified: trunk/Tools/TestWebKitAPI/Configurations/Base.xcconfig (261246 => 261247)


--- trunk/Tools/TestWebKitAPI/Configurations/Base.xcconfig	2020-05-06 19:57:34 UTC (rev 261246)
+++ trunk/Tools/TestWebKitAPI/Configurations/Base.xcconfig	2020-05-06 20:02:40 UTC (rev 261247)
@@ -33,7 +33,7 @@
 CLANG_CXX_LIBRARY = libc++;
 CLANG_ENABLE_OBJC_WEAK = YES;
 CLANG_WARN_CXX0X_EXTENSIONS = NO;
-HEADER_SEARCH_PATHS = ${BUILT_PRODUCTS_DIR}/usr/local/include $(WEBCORE_PRIVATE_HEADERS_DIR)/ForwardingHeaders $(BUILT_PRODUCTS_DIR)/WebCoreTestSupport ${SRCROOT};
+HEADER_SEARCH_PATHS = ${BUILT_PRODUCTS_DIR}/usr/local/include $(WEBCORE_PRIVATE_HEADERS_DIR)/ForwardingHeaders $(BUILT_PRODUCTS_DIR)/WebCoreTestSupport ${SRCROOT} $(SRCROOT)/../../Source/WebKit/Platform/cocoa;
 
 GCC_NO_COMMON_BLOCKS = YES;
 GCC_PREPROCESSOR_DEFINITIONS = $(DEBUG_DEFINES) $(FEATURE_DEFINES) U_DISABLE_RENAMING=1 U_SHOW_CPLUSPLUS_API=0 $(GCC_PREPROCESSOR_DEFINITIONS_$(PLATFORM_NAME));

Modified: trunk/Tools/TestWebKitAPI/Tests/WebKitCocoa/CopyHTML.mm (261246 => 261247)


--- trunk/Tools/TestWebKitAPI/Tests/WebKitCocoa/CopyHTML.mm	2020-05-06 19:57:34 UTC (rev 261246)
+++ trunk/Tools/TestWebKitAPI/Tests/WebKitCocoa/CopyHTML.mm	2020-05-06 20:02:40 UTC (rev 261247)
@@ -28,6 +28,7 @@
 
 #if PLATFORM(COCOA)
 
+#import "CocoaColor.h"
 #import "PlatformUtilities.h"
 #import "TestWKWebView.h"
 #import <WebKit/WKPreferencesPrivate.h>
@@ -44,13 +45,26 @@
 @end
 
 #if PLATFORM(MAC)
-NSString *readHTMLFromPasteboard()
+
+NSData *readHTMLDataFromPasteboard()
 {
+    return [[NSPasteboard generalPasteboard] dataForType:NSHTMLPboardType];
+}
+
+NSString *readHTMLStringFromPasteboard()
+{
     return [[NSPasteboard generalPasteboard] stringForType:NSHTMLPboardType];
 }
+
 #else
-NSString *readHTMLFromPasteboard()
+
+NSData *readHTMLDataFromPasteboard()
 {
+    return [[UIPasteboard generalPasteboard] dataForPasteboardType:(__bridge NSString *)kUTTypeHTML];
+}
+
+NSString *readHTMLStringFromPasteboard()
+{
     id value = [[UIPasteboard generalPasteboard] valueForPasteboardType:(__bridge NSString *)kUTTypeHTML];
     if ([value isKindOfClass:[NSData class]])
         value = [[[NSString alloc] initWithData:(NSData *)value encoding:NSUTF8StringEncoding] autorelease];
@@ -57,6 +71,7 @@
     ASSERT([value isKindOfClass:[NSString class]]);
     return (NSString *)value;
 }
+
 #endif
 
 static RetainPtr<TestWKWebView> createWebViewWithCustomPasteboardDataEnabled()
@@ -80,7 +95,7 @@
     EXPECT_TRUE([webView stringByEvaluatingJavaScript:@"didPaste"].boolValue);
     EXPECT_WK_STREQ("<meta content=\"secret\"><b onmouseover=\"dangerousCode()\">hello</b><!-- secret-->, world<script>dangerousCode()</script>",
         [webView stringByEvaluatingJavaScript:@"pastedHTML"]);
-    String htmlInNativePasteboard = readHTMLFromPasteboard();
+    String htmlInNativePasteboard = readHTMLStringFromPasteboard();
     EXPECT_TRUE(htmlInNativePasteboard.contains("hello"));
     EXPECT_TRUE(htmlInNativePasteboard.contains(", world"));
     EXPECT_FALSE(htmlInNativePasteboard.contains("secret"));
@@ -87,6 +102,56 @@
     EXPECT_FALSE(htmlInNativePasteboard.contains("dangerousCode"));
 }
 
+TEST(CopyHTML, SanitizationPreservesCharacterSet)
+{
+    Vector<std::pair<RetainPtr<NSString>, RetainPtr<NSData>>, 3> markupStringsAndData;
+    auto webView = createWebViewWithCustomPasteboardDataEnabled();
+    for (NSString *encodingName in @[ @"utf-8", @"windows-1252", @"bogus-encoding" ]) {
+        [webView synchronouslyLoadHTMLString:[NSString stringWithFormat:@"<!DOCTYPE html>"
+            "<body>"
+            "<meta charset='%@'>"
+            "<p id='copy'>Copy me</p>"
+            "<script>"
+            "copy.addEventListener('copy', e => {"
+            "    e.clipboardData.setData('text/html', `<span style='color: red;'>我叫謝文昇</span>`);"
+            "    e.preventDefault();"
+            "});"
+            "getSelection().selectAllChildren(copy);"
+            "</script>"
+            "</body>", encodingName]];
+        [webView copy:nil];
+        [webView waitForNextPresentationUpdate];
+
+        markupStringsAndData.append({ readHTMLStringFromPasteboard(), readHTMLDataFromPasteboard() });
+    }
+
+    for (auto& [copiedMarkup, copiedData] : markupStringsAndData) {
+        EXPECT_TRUE([copiedMarkup containsString:@"<span "]);
+        EXPECT_TRUE([copiedMarkup containsString:@"color: red;"]);
+        EXPECT_TRUE([copiedMarkup containsString:@"我叫謝文昇"]);
+        EXPECT_TRUE([copiedMarkup containsString:@"</span>"]);
+
+        NSError *attributedStringConversionError = nil;
+
+        auto attributedString = adoptNS([[NSAttributedString alloc] initWithData:copiedData.get() options:@{ NSDocumentTypeDocumentOption: NSHTMLTextDocumentType } documentAttributes:nil error:&attributedStringConversionError]);
+        EXPECT_WK_STREQ("我叫謝文昇", [attributedString string]);
+
+        __block BOOL foundColorAttribute = NO;
+        [attributedString enumerateAttribute:NSForegroundColorAttributeName inRange:NSMakeRange(0, 5) options:0 usingBlock:^(CocoaColor *color, NSRange range, BOOL *) {
+            CGFloat redComponent = 0;
+            CGFloat greenComponent = 0;
+            CGFloat blueComponent = 0;
+            [color getRed:&redComponent green:&greenComponent blue:&blueComponent alpha:nil];
+
+            EXPECT_EQ(1., redComponent);
+            EXPECT_EQ(0., greenComponent);
+            EXPECT_EQ(0., blueComponent);
+            foundColorAttribute = YES;
+        }];
+        EXPECT_TRUE(foundColorAttribute);
+    }
+}
+
 #if PLATFORM(MAC)
 
 TEST(CopyHTML, ItemTypesWhenCopyingWebContent)
_______________________________________________
webkit-changes mailing list
[email protected]
https://lists.webkit.org/mailman/listinfo/webkit-changes
