On Thu, Apr 23, 2015 at 1:16 AM Hallvord Reiar Michaelsen Steen <hst...@mozilla.com> wrote:
> We're exploring text/html paste behaviours in Mozilla bug 586587 and
> running into some tricky questions I'd like to discuss here.
>
> Basically, on Windows IE and other apps that write HTML to the clipboard
> use the CF_HTML format. This format is simply described as
>
> > headers (name:value meta data)
> >
> > <html><head></head>
> > <body>
> > <!--StartFragment-->HTML<!--EndFragment-->
> > </body>
> > </html>
>
> where the StartFragment / EndFragment comment tags are inserted by
> implementations writing HTML to the clipboard to show where the actually
> selected content starts and ends. Several very common implementations
> (including I believe Microsoft Word's) will add tags like STYLE outside of
> the StartFragment/EndFragment tags and add rules that may be significant
> for rendering the content of the fragment correctly. Also noteworthy is
> that the meta data may include a SourceURL property showing the URL of the
> page you copied from.
>
> So, because of the significance of the STYLE information and other stuff
> outside Start/EndFragment, certain browsers return the full document
> including the Start/EndFragment comment tags when a script does
> getData('text/html'). This is obviously very useful when there's important
> stuff outside these tags. It still means scripts have to do extra work to
> find those comments and extract the content inside them to know what data a
> user actually intended to paste. This also adds a risk that scripts will be
> tested only on Windows and authored to require those comments and fail if
> they aren't there on other platforms.

Chrome's behavior is to return the literal HTML data, but without the
metadata header, when a page calls getData('text/html'). However, if Chrome
is executing the default action of paste, we attempt to parse out the
fragment and only paste the fragment (however, we incorrectly don't include
styles).
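
To make the "extra work" concrete, here is a minimal sketch of what a page script might do with the string returned by getData('text/html'). The helper name extractFragment is hypothetical (it is not part of any browser or of the Clipboard API spec), and the sketch assumes the markers appear exactly as written in the format description above. If the markers are missing, as they would be on other platforms, it falls back to treating the whole string as the fragment.

```typescript
// Hypothetical helper, for illustration only: not a browser or spec API.
const START_MARKER = "<!--StartFragment-->";
const END_MARKER = "<!--EndFragment-->";

// Returns the markup between the fragment markers, or the input unchanged
// when the markers are absent (assumed here to mean a non-CF_HTML source).
function extractFragment(html: string): string {
  const start = html.indexOf(START_MARKER);
  if (start === -1) {
    return html;
  }
  const contentStart = start + START_MARKER.length;
  const end = html.indexOf(END_MARKER, contentStart);
  if (end === -1) {
    return html;
  }
  return html.slice(contentStart, end);
}
```

A paste handler could pass the result of getData('text/html') straight through this function. Note that it discards any STYLE rules outside the markers, which is exactly the information the full-document behaviour is meant to preserve.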
> Should we, then, standardise returning the full document including
> Start/EndFragment comments (basically requiring or encouraging other
> platform implementations to start using those comments when serializing
> HTML for the OS clipboard) - or should getData() return only what's inside
> the Start/EndFragment tags? Are any other important platforms already using
> CF_HTML conventions, or would their developers balk at being encouraged to
> do so?

CF_HTML is not a format that any other app on any other platform would be
expecting, so you wouldn't be able to just start writing it to the clipboard
on Mac/Linux in place of the original HTML. So there's a bit of a
chicken-and-egg problem here. I also can't say I love the CF_HTML format:
the markup is a lot easier to work with when the styles are inlined, etc.
Plus pasting <style> blocks means there might be collisions in style rules,
etc.

> On a related topic, I see SourceURL as useful (could be used to properly
> attribute citations automatically and such) - it would be nice to
> standardise DataTransfer.sourceURL or something like that, to be set when
> available.
> -Hallvord
> (editor of https://w3c.github.io/clipboard-apis/ )
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=586587

You'd have to get all UAs to agree on a data property to use to transfer
this since I don't think using CF_HTML on other platforms is currently
workable.
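
For reference, a rough sketch of how the name:value header could be separated from the markup to recover SourceURL on the Windows side. The function name parseCfHtml and its return shape are hypothetical, and DataTransfer.sourceURL itself is only the proposal discussed above, not an existing API. The sketch assumes the header ends at the first "<" and that no header value contains "<"; real CF_HTML data also carries offsets in its header, which this ignores.

```typescript
// Hypothetical shape for the parsed result; not defined by any spec.
interface ParsedCfHtml {
  headers: Record<string, string>; // e.g. headers["SourceURL"]
  html: string;                    // everything from the first "<" onward
}

// Splits raw CF_HTML text into its name:value header and its markup.
// Assumption: the header is everything before the first "<".
function parseCfHtml(raw: string): ParsedCfHtml {
  const headers: Record<string, string> = {};
  const markupStart = raw.indexOf("<");
  const headerText = markupStart === -1 ? raw : raw.slice(0, markupStart);
  for (const line of headerText.split(/\r?\n/)) {
    // Split at the first colon only, so URL values keep their own colons.
    const colon = line.indexOf(":");
    if (colon <= 0) {
      continue;
    }
    headers[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return { headers, html: markupStart === -1 ? "" : raw.slice(markupStart) };
}
```

Since Chrome strips the header before getData('text/html') returns, a value like headers["SourceURL"] would only be visible to the UA itself, which is why exposing it to pages would need an agreed-upon property.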