On Thu, Apr 23, 2015 at 1:16 AM Hallvord Reiar Michaelsen Steen
<hst...@mozilla.com> wrote:
We're exploring text/html paste behaviours in Mozilla bug 586587 [1] and
running into some tricky questions I'd like to discuss here.
Basically, on Windows, IE and other apps that write HTML to the clipboard
use the CF_HTML format. This format is simply described as
    headers (name:value metadata)
    <html><head></head>
    <body>
    <!--StartFragment-->HTML<!--EndFragment-->
    </body>
    </html>
where the StartFragment/EndFragment comment tags are inserted by
implementations writing HTML to the clipboard to mark where the actually
selected content starts and ends. Several very common implementations
(including, I believe, Microsoft Word) add tags like STYLE outside the
StartFragment/EndFragment tags, with rules that may be significant for
rendering the content of the fragment correctly. Also noteworthy: the
metadata may include a SourceURL property giving the URL of the page you
copied from.
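To illustrate the header/body split described above, here is a minimal TypeScript sketch of reading the name:value metadata (including SourceURL) off the front of a CF_HTML string. The helper name splitCfHtml is hypothetical, and it assumes the header block ends at the first "<" in the data; real CF_HTML also carries byte-offset headers (StartHTML, StartFragment and so on) that a more careful reader could use instead.

```typescript
// Hypothetical helper (not part of any browser API): splits a CF_HTML
// clipboard string into its "name:value" header block and the HTML after it.
// Assumption: the header block ends at the first "<" in the data.
interface CfHtmlParts {
  headers: Record<string, string>;
  html: string;
}

function splitCfHtml(data: string): CfHtmlParts {
  const headers: Record<string, string> = {};
  const htmlStart = data.indexOf("<");
  const headerText = htmlStart === -1 ? data : data.slice(0, htmlStart);
  const lines = headerText.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    // Split on the first colon only: SourceURL values contain colons too.
    const colon = lines[i].indexOf(":");
    if (colon <= 0) continue; // blank or malformed line
    headers[lines[i].slice(0, colon).trim()] = lines[i].slice(colon + 1).trim();
  }
  return { headers, html: htmlStart === -1 ? "" : data.slice(htmlStart) };
}
```

For example, given "Version:0.9\r\nSourceURL:https://example.com/page\r\n<html>...", headers["SourceURL"] is "https://example.com/page" and html starts at "<html>".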
So, because of the significance of the STYLE information and other stuff
outside Start/EndFragment, certain browsers return the full document,
including the Start/EndFragment comment tags, when a script calls
getData('text/html'). This is obviously very useful when there's important
stuff outside these tags. It still means scripts have to do extra work to
find those comments and extract the content between them to learn what data
the user actually intended to paste. It also adds a risk that scripts will be
tested only on Windows, written to require those comments, and fail on other
platforms where the comments aren't present.
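The extra work scripts have to do can be sketched as follows. extractFragment is a hypothetical helper; it assumes the markers appear exactly as <!--StartFragment--> and <!--EndFragment--> (no whitespace variants) and falls back to the whole input when they are missing, which avoids the Windows-only failure mode described above.

```typescript
const START_MARKER = "<!--StartFragment-->";
const END_MARKER = "<!--EndFragment-->";

// Hypothetical helper: returns what the user actually selected, or the whole
// input when the CF_HTML-style markers are absent (e.g. non-Windows clipboards).
function extractFragment(html: string): string {
  const start = html.indexOf(START_MARKER);
  // Search for the end marker only after the start marker.
  const end = html.indexOf(END_MARKER, start === -1 ? 0 : start);
  if (start === -1 || end === -1) {
    return html; // no markers: treat everything as the fragment
  }
  return html.slice(start + START_MARKER.length, end);
}
```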
When a page calls getData('text/html'), Chrome returns the literal HTML
data, but without the metadata header. However, when Chrome executes the
default paste action, we attempt to parse out the fragment and paste only
the fragment (though we currently, incorrectly, don't include the styles).
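One possible way to paste only the fragment while keeping its styles is to carry over any <style> blocks found outside the fragment. This is a hypothetical sketch, not Chrome's code: fragmentWithStyles is an illustrative name, it uses regular expressions rather than a real HTML parser, and prepending style blocks this way is exactly what invites collisions between the pasted rules and the page's own.

```typescript
// Hypothetical sketch: keep <style> blocks from the context around the
// fragment so that a fragment-only paste can still render with them.
// Regex-based, so it will miss unusual markup a real parser would handle.
function fragmentWithStyles(html: string): string {
  const startMarker = "<!--StartFragment-->";
  const endMarker = "<!--EndFragment-->";
  const start = html.indexOf(startMarker);
  const end = html.indexOf(endMarker, start === -1 ? 0 : start);
  if (start === -1 || end === -1) return html; // no markers: paste as-is

  const fragment = html.slice(start + startMarker.length, end);
  // Only look for styles outside the fragment; ones inside are already kept.
  const context = html.slice(0, start) + html.slice(end + endMarker.length);
  const styles = context.match(/<style\b[^>]*>[\s\S]*?<\/style>/gi) || [];
  return styles.join("") + fragment;
}
```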
Should we, then, standardise returning the full document including
Start/EndFragment comments (basically requiring or encouraging other
platform implementations to start using those comments when serializing
HTML for the OS clipboard), or should getData() return only what's inside
the Start/EndFragment tags? Are any other important platforms already using
CF_HTML conventions, or would their developers balk at being encouraged to
do so?
CF_HTML is not a format that any other app on any other platform would be
expecting, so you wouldn't be able to just start writing it to the
clipboard on Mac/Linux in place of the original HTML. That makes this a bit
of a chicken-and-egg problem.
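To show why a writer cannot simply drop CF_HTML in place of plain HTML, here is a sketch of producing a minimal CF_HTML payload. It follows the commonly documented Windows layout (a Version:0.9 line plus StartHTML/EndHTML/StartFragment/EndFragment headers holding UTF-8 byte offsets into the payload); zero-padding the offsets to ten digits is a convention assumed here so the header length is known up front, and toCfHtml, utf8Length and pad10 are hypothetical names. An app on Mac or Linux reading this as text/html would likely render the header lines as stray text.

```typescript
// UTF-8 byte length of a UTF-16 JS string (CF_HTML offsets count bytes, not chars).
function utf8Length(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) {
      bytes += 1;
    } else if (c < 0x800) {
      bytes += 2;
    } else if (c >= 0xd800 && c <= 0xdbff && i + 1 < s.length &&
               s.charCodeAt(i + 1) >= 0xdc00 && s.charCodeAt(i + 1) <= 0xdfff) {
      bytes += 4; // surrogate pair = one astral code point
      i++;
    } else {
      bytes += 3; // rest of the BMP (a lone surrogate encodes as U+FFFD)
    }
  }
  return bytes;
}

// Assumed convention: fixed-width offsets, so the header length doesn't depend on them.
function pad10(n: number): string {
  const s = String(n);
  return "0000000000".slice(s.length) + s;
}

// Hypothetical writer: wraps an HTML fragment in a minimal CF_HTML payload.
function toCfHtml(fragment: string, sourceUrl?: string): string {
  const prefix = "<html><body>\r\n<!--StartFragment-->";
  const suffix = "<!--EndFragment-->\r\n</body></html>";
  const header = (startHtml: number, endHtml: number,
                  startFrag: number, endFrag: number): string =>
    "Version:0.9\r\n" +
    "StartHTML:" + pad10(startHtml) + "\r\n" +
    "EndHTML:" + pad10(endHtml) + "\r\n" +
    "StartFragment:" + pad10(startFrag) + "\r\n" +
    "EndFragment:" + pad10(endFrag) + "\r\n" +
    (sourceUrl ? "SourceURL:" + sourceUrl + "\r\n" : "");

  // Header length is the same for any offsets because each is padded to 10 digits.
  const startHtml = utf8Length(header(0, 0, 0, 0));
  const startFrag = startHtml + utf8Length(prefix);
  const endFrag = startFrag + utf8Length(fragment);
  const endHtml = endFrag + utf8Length(suffix);
  return header(startHtml, endHtml, startFrag, endFrag) + prefix + fragment + suffix;
}
```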
I also can't say I love the CF_HTML format: the markup is a lot easier to
work with when the styles are inlined. Plus, pasting style blocks risks
collisions between the pasted style rules and the page's own.
On a related topic, I see SourceURL as useful (it could be used to
attribute citations automatically and such); it would be nice to
standardise DataTransfer.sourceURL or something like that, to be set when
available.
-Hallvord
(editor of https://w3c.github.io/clipboard-apis/ )
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=586587
You'd have to get all UAs to agree on a data property for transferring
this, since I don't think using CF_HTML on other platforms is currently
workable.