Re: [clipboard] Dilemma: getData('text/html') and useful CF_HTML quirks

2015-04-23 Thread Daniel Cheng
On Thu, Apr 23, 2015 at 1:16 AM Hallvord Reiar Michaelsen Steen 
hst...@mozilla.com wrote:

 We're exploring text/html paste behaviours in Mozilla bug 586587 [1] and
 running into some tricky questions I'd like to discuss here.

 Basically, on Windows, IE and other apps that write HTML to the clipboard
 use the CF_HTML format. This format is simply described as

  headers (name:value meta data)
 
  <html><head></head>
  <body>
  <!--StartFragment-->HTML<!--EndFragment-->
  </body>
  </html>

 where the StartFragment / EndFragment comment tags are inserted by
 implementations writing HTML to the clipboard to show where the actually
 selected content starts and ends. Several very common implementations
 (including I believe Microsoft Word's) will add tags like STYLE outside of
 the StartFragment/EndFragment tags and add rules that may be significant
 for rendering the content of the fragment correctly. Also noteworthy is
 that the meta data may include a SourceURL property showing the URL of the
 page you copied from.

 So, because of the significance of the STYLE information and other stuff
 outside Start/EndFragment, certain browsers return the full document
 including the Start/EndFragment comment tags when a script does
 getData('text/html'). This is obviously very useful when there's important
 stuff outside these tags. It still means scripts have to do extra work to
 find those comments and extract the content inside them to know what data a
 user actually intended to paste. This also adds a risk that scripts will be
 tested only on Windows and authored to require those comments and fail if
 they aren't there on other platforms.


When a page calls getData('text/html'), Chrome returns the literal HTML
data, but without the metadata header. When Chrome executes the default
paste action, however, we attempt to parse out the fragment and paste only
that (although we incorrectly don't include the styles).
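
To make the missing-styles point concrete, here is a minimal sketch of how a
script (or a paste implementation) might collect the STYLE blocks that sit
outside the fragment markers. The function name stylesOutsideFragment is
hypothetical, the regex-based scan is an assumption made so the example runs
without a DOM, and it is not how Chrome actually does it; a real
implementation would more likely use an HTML parser.

```typescript
// Hypothetical helper: returns the CSS text of every <style> block that lies
// outside the StartFragment/EndFragment comments. Regex scanning is a
// simplification; it will not cope with every HTML edge case.
function stylesOutsideFragment(html: string): string[] {
  const start = html.indexOf("<!--StartFragment-->");
  const end = html.lastIndexOf("<!--EndFragment-->");
  // Without both markers there is no "outside", so nothing is collected.
  const outside =
    start === -1 || end === -1 || end < start
      ? ""
      : html.slice(0, start) + html.slice(end);

  const styles: string[] = [];
  const re = /<style\b[^>]*>([\s\S]*?)<\/style\s*>/gi;
  let m: RegExpExecArray | null;
  while ((m = re.exec(outside)) !== null) {
    styles.push((m[1] ?? "").trim());
  }
  return styles;
}
```

A caller could then decide whether to inline these rules into the pasted
fragment or discard them, which is where the collision concern comes in.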


 Should we, then, standardise returning the full document including
 Start/EndFragment comments (basically requiring or encouraging other
 platform implementations to start using those comments when serializing
 HTML for the OS clipboard) - or should getData() return only what's inside
 the Start/EndFragment tags? Are any other important platforms already using
 CF_HTML conventions, or would their developers balk at being encouraged to
 do so?


CF_HTML is not a format that any other app on any other platform would be
expecting, so you wouldn't be able to just start writing it to the
clipboard on Mac/Linux in place of the original HTML. So there's a bit of a
chicken and egg problem here.

I also can't say I love the CF_HTML format: the markup is a lot easier to
work with when the styles are inlined, etc. Plus pasting style blocks
means there might be collisions in style rules, etc.



 On a related topic, I see SourceURL as useful (could be used to properly
 attribute citations automatically and such) - it would be nice to
 standardise DataTransfer.sourceURL or something like that, to be set when
 available.
 -Hallvord
 (editor of https://w3c.github.io/clipboard-apis/ )
 [1] https://bugzilla.mozilla.org/show_bug.cgi?id=586587


You'd have to get all UAs to agree on a data property to use to transfer
this since I don't think using CF_HTML on other platforms is currently
workable.


Re: [clipboard] Dilemma: getData('text/html') and useful CF_HTML quirks

2015-04-23 Thread Ted Mielczarek
On Thu, Apr 23, 2015 at 4:13 AM, Hallvord Reiar Michaelsen Steen 
hst...@mozilla.com wrote:

 We're exploring text/html paste behaviours in Mozilla bug 586587 [1] and
 running into some tricky questions I'd like to discuss here.

 Basically, on Windows, IE and other apps that write HTML to the clipboard
 use the CF_HTML format. This format is simply described as


There's some related discussion in bug 137450 [1]; I wrote a patch for that
quite a few years ago (it never landed). In the comments someone pointed
out that Microsoft has since documented CF_HTML [2], which is nice.

-Ted

1. https://bugzilla.mozilla.org/show_bug.cgi?id=137450#c33
2. https://msdn.microsoft.com/en-us/library/aa767917%28v=vs.85%29.aspx


[clipboard] Dilemma: getData('text/html') and useful CF_HTML quirks

2015-04-23 Thread Hallvord Reiar Michaelsen Steen
We're exploring text/html paste behaviours in Mozilla bug 586587 [1] and
running into some tricky questions I'd like to discuss here.

Basically, on Windows, IE and other apps that write HTML to the clipboard
use the CF_HTML format. This format is simply described as

 headers (name:value meta data)

 <html><head></head>
 <body>
 <!--StartFragment-->HTML<!--EndFragment-->
 </body>
 </html>

where the StartFragment / EndFragment comment tags are inserted by
implementations writing HTML to the clipboard to show where the actually
selected content starts and ends. Several very common implementations
(including I believe Microsoft Word's) will add tags like STYLE outside of
the StartFragment/EndFragment tags and add rules that may be significant
for rendering the content of the fragment correctly. Also noteworthy is
that the meta data may include a SourceURL property showing the URL of the
page you copied from.
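
As a rough illustration of the layout described above, the following sketch
splits a raw CF_HTML string into its name:value headers and the HTML that
follows. The names CfHtml and parseCfHtml are hypothetical, and the approach
is an assumption: it scans header lines one by one rather than using the
StartHTML/EndHTML byte offsets that the format also carries. The offset
values in the sample are placeholders, not computed values.

```typescript
// Hypothetical shape for a parsed CF_HTML payload.
interface CfHtml {
  headers: Map<string, string>;
  html: string;
}

// Reads "Name:Value" lines until the first line that is not a header,
// then treats the remainder as the HTML document.
function parseCfHtml(raw: string): CfHtml {
  const headers = new Map<string, string>();
  let pos = 0;
  while (pos < raw.length) {
    let eol = raw.indexOf("\n", pos);
    if (eol === -1) eol = raw.length;
    const line = raw.slice(pos, eol).replace(/\r$/, "");
    // Split on the first colon only, so URL values keep their own colons.
    const m = /^([A-Za-z]+):(.*)$/.exec(line);
    if (m === null) break;
    headers.set(m[1] ?? "", (m[2] ?? "").trim());
    pos = Math.min(eol + 1, raw.length);
  }
  return { headers, html: raw.slice(pos) };
}

// Sample payload; offsets are placeholders for illustration only.
const sampleCfHtml =
  "Version:0.9\r\n" +
  "StartHTML:0000000000\r\n" +
  "EndHTML:0000000000\r\n" +
  "StartFragment:0000000000\r\n" +
  "EndFragment:0000000000\r\n" +
  "SourceURL:https://example.com/page\r\n" +
  "<html><head></head>\r\n<body>\r\n" +
  "<!--StartFragment--><b>hi</b><!--EndFragment-->\r\n" +
  "</body>\r\n</html>";

const parsedSample = parseCfHtml(sampleCfHtml);
console.log(parsedSample.headers.get("SourceURL"));
```

The SourceURL lookup at the end is the piece a script would need in order to
attribute pasted content to the page it came from.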

So, because of the significance of the STYLE information and other stuff
outside Start/EndFragment, certain browsers return the full document
including the Start/EndFragment comment tags when a script does
getData('text/html'). This is obviously very useful when there's important
stuff outside these tags. It still means scripts have to do extra work to
find those comments and extract the content inside them to know what data a
user actually intended to paste. This also adds a risk that scripts will be
tested only on Windows and authored to require those comments and fail if
they aren't there on other platforms.
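
The extra work, and the cross-platform risk, can be sketched as follows. This
is a minimal example that assumes the markers appear exactly as
<!--StartFragment--> and <!--EndFragment-->; the name extractFragment is
hypothetical. The important part is the fallback: when the comments are
absent, the whole string is treated as the fragment instead of failing.

```typescript
const START_MARKER = "<!--StartFragment-->";
const END_MARKER = "<!--EndFragment-->";

// Returns the markup between the fragment comments, or the input unchanged
// when the comments are missing or out of order.
function extractFragment(html: string): string {
  const start = html.indexOf(START_MARKER);
  const end = html.lastIndexOf(END_MARKER);
  if (start === -1 || end === -1 || end < start + START_MARKER.length) {
    return html;
  }
  return html.slice(start + START_MARKER.length, end);
}
```

In a paste handler, a script would pass the string it got from
getData('text/html') through a function like this, so the same code path
works whether or not the platform wrote the comments.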

Should we, then, standardise returning the full document including
Start/EndFragment comments (basically requiring or encouraging other
platform implementations to start using those comments when serializing
HTML for the OS clipboard) - or should getData() return only what's inside
the Start/EndFragment tags? Are any other important platforms already using
CF_HTML conventions, or would their developers balk at being encouraged to
do so?

On a related topic, I see SourceURL as useful (could be used to properly
attribute citations automatically and such) - it would be nice to
standardise DataTransfer.sourceURL or something like that, to be set when
available.
-Hallvord
(editor of https://w3c.github.io/clipboard-apis/ )
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=586587