Re: Concerns regarding cross-origin copy/paste security
Adam Barth w...@adambarth.com wrote on Wed, 08 Feb 2012 00:05:54 +0100: FWIW, my main concern was the hidden data aspect because it can be abused for cross-site request forgery if a malicious site, by getting the user to copy and paste, gets access to form anti-CSRF tokens and such. That's certainly possible, but I don't think it's possible for us to protect against the long tail of risks here. In these sorts of cases, it can be better for security to not implement a half-correct solution and instead decide not to try to mitigate a particular risk. You are right here. Also, on considering the abuse potential of getData('text/html'), I've realised that we are not introducing much new threat surface here, since a simple paste into a rich text editing-enabled element already inserts markup so that the target page can see much of what I proposed removing. I've changed the spec from saying the implementation *must* apply the sanitization algorithm to saying the user agent *may* apply it, made it clear that it is merely a suggestion, removed some of the most draconian parts and marked it as informative. I think it still has some value as an informative section. http://dev.w3.org/cvsweb/~checkout~/2006/webapi/clipops/clipops-source.html?rev=1.15;content-type=text%2Fhtml Perhaps we should publish a new working draft now? -- Hallvord R. M. Steen Core tester, Opera Software
Re: Concerns regarding cross-origin copy/paste security
On Mon, May 16, 2011 at 8:41 PM, Hallvord R. M. Steen hallv...@opera.com wrote: On Thu, 05 May 2011 06:46:55 +0900, Daniel Cheng dch...@chromium.org wrote: There was a recent discussion involving directly exposing the HTML fragment in a paste to a page, since we're doing the parsing anyway for security reasons. I have some concerns regarding http://www.w3.org/TR/clipboard-apis/#cross-origin-copy-paste-of-source-code though. From my understanding, we are trying to protect against [1] hidden data being copied without a user's knowledge and [2] XSS via pasting hostile HTML. In my opinion, the algorithm as written is either going to remove too much information or not enough. If it removes too much, the HTML paste is effectively useless to a client app. If it doesn't remove enough, then the client app is going to have to sanitize the HTML itself anyway. FWIW, my main concern was the hidden data aspect because it can be abused for cross-site request forgery if a malicious site by getting the user to copy and paste gets access to form anti-CSRF tokens and such. That's certainly possible, but I don't think it's possible for us to protect against the long tail of risks here. In these sorts of cases, it can be better for security to not implement a half-correct solution and instead decide not to try to mitigate a particular risk. I *intend* to leave some processing of the HTML to the client application, for example the removal of third-party application-specific or browser-specific CSS properties. I see that Chrome applies different security policies depending on whether the content is read by a JavaScript (getData('text/html') - style) and inserted directly. You do some extra work to avoid XSS, such as removing on* event listener attributes and href=javascript: when content is inserted directly (you also remove some browser-specific elements and class names). 
This sort of clean up and processing on direct data insertion by the user-agent is not really in scope for the events spec IMO. That makes sense. The risk here is somewhat different from what you've articulated above. Rather than trying to prevent information leaks from the source of the copy to the target of the paste, these checks aim to prevent the source from injecting script into the target. However, for getData('text/html') it seems you do no clean-up at all, not for cross-origin paste either. Correct. The idea here is to have a secure default but still let a sophisticated web application handle the complicated cases if they want to. I just spoke with Ryosuke and Daniel, and we're considering tightening up the default behavior somewhat to prevent injections of style and other dangerous elements (probably by switching to a whitelist). Implementing the current spec would thus require that you tighten your existing security policy. Will you consider doing so, or would you rather argue for removal of any spec-mandated clean-up of cross-origin source code? IMHO, we shouldn't try to protect the source of the data, but we should aim to protect the target. My understanding of your message is that would cause us to remove the text in this spec. If we find a good whitelist for protecting the target, that's probably worth writing in a spec so that browsers can interoperate, but it doesn't have to be this spec if you feel that this behavior is out of scope. [2] is no different than using data from any other untrusted source, like dragging HTML or data from an XHR. It doesn't make sense to special-case HTML pastes. Using data is not the only threat model - limiting the damage potential when the page you paste into is malicious is harder. 
However, there is some overlap in the strategies we might use - for example event attributes are certainly hidden data, might contain secrets and might cause XSS attacks so you might argue for their removal based on both abuse scenarios though I think [2] is a more relevant threat. The problem is that the tail of where sensitive information might reside is long and thick, making these security measures only partially effective, at best. Adam
Re: Concerns regarding cross-origin copy/paste security
On 2/2/2012 10:48 PM, Ryosuke Niwa wrote: On Thu, Feb 2, 2012 at 10:43 PM, Charles Pritchard ch...@jumis.com wrote: On 2/2/12 10:27 PM, Ryosuke Niwa wrote: On Thu, Feb 2, 2012 at 10:20 PM, Charles Pritchard ch...@jumis.com wrote: Seems like a very minor risk for high security sites, e.g. banking, in identifying form elements. In the spirit of giving it some thought: But even for those websites, what could input / textarea elements reveal beyond what the user sees? Many sites use input hidden elements with what are essentially image maps for entering a PIN. But any element with display:none will be removed so input hidden should be removed. It's becoming more common that top level domains are being restricted or redirected to country codes. It seems plausible that domains may further be restricted to HTTPS (SSL) signatures. Going further, sites may be restricted to those which serve appropriate security headers against XSS attacks. Disabling the copy mechanism for any portion of a site does risk censorship. But, we are only examining high security portions of high security sites, such as input hidden and input type=password. input[type=password] is a good one. We should probably get rid of the value in that case? Yes, I think so. I'm working on an application in which I do a lot of copy and paste work. I'll let you know if I come across anything I think should change. -Charles
Re: Concerns regarding cross-origin copy/paste security
Sorry for the extremely slow reply. It slipped through hundreds of emails :( On Mon, May 16, 2011 at 8:41 PM, Hallvord R. M. Steen hallv...@opera.com wrote: To me, it doesn't make sense to remove the other elements: - OBJECT: Could be used for SVG as I understand. OBJECT is considered a form element, so it might have hidden data associated with it. It can also contain plugin content that could inject scripts and be used for XSS attacks. It may be too far-fetched or draconian to remove it though. (SVG is rich enough to be its own can of worms by the way..) Given the improved support for inline SVG and MathML, it's probably okay to strip it. However, we should add EMBED to the list since it's a plugin element. - INPUT (non-hidden, non-password): Content is already available via text/plain. An input's @name attribute is basically hidden data the user will not be aware of pasting. I'm not sure how much of a threat this is, but we should give it some thought. You mean input name=~? I don't think that'll expose much information. I'd prefer not removing these attributes as I've seen bugs filed against WebKit for form control editors; apparently some people would like to create form control editors using contenteditable. - Ryosuke
Re: Concerns regarding cross-origin copy/paste security
On 2/2/12 10:14 PM, Ryosuke Niwa wrote: Sorry for the extremely slow reply. It slipped through hundreds of emails :( On Mon, May 16, 2011 at 8:41 PM, Hallvord R. M. Steen hallv...@opera.com wrote: To me, it doesn't make sense to remove the other elements: - OBJECT: Could be used for SVG as I understand. OBJECT is considered a form element, so it might have hidden data associated with it. It can also contain plugin content that could inject scripts and be used for XSS attacks. It may be too far-fetched or draconian to remove it though. (SVG is rich enough to be its own can of worms by the way..) Given the improved support for inline SVG and MathML, it's probably okay to strip it. However, we should add EMBED to the list since it's a plugin element. - INPUT (non-hidden, non-password): Content is already available via text/plain. An input's @name attribute is basically hidden data the user will not be aware of pasting. I'm not sure how much of a threat this is, but we should give it some thought. You mean input name=~? I don't think that'll expose much information. I'd prefer not removing these attributes as I've seen bugs filed against WebKit for form control editors; apparently some people would like to create form control editors using contenteditable. Seems like a very minor risk for high security sites, e.g. banking, in identifying form elements. In the spirit of giving it some thought: There are various XSS headers that signal enhanced security for websites, even to browser extensions. Perhaps some of them ought to be used in the copy mechanism. That way the data never reaches the clipboard for paste. -Charles
Re: Concerns regarding cross-origin copy/paste security
On Thu, Feb 2, 2012 at 10:20 PM, Charles Pritchard ch...@jumis.com wrote: Seems like a very minor risk for high security sites, e.g. banking, in identifying form elements. In the spirit of giving it some thought: But even for those websites, what could input / textarea elements reveal beyond what the user sees? There are various XSS headers that signal enhanced security for websites, even to browser extensions. Perhaps some of them ought to be used in the copy mechanism. That way the data never reaches the clipboard for paste. That's also an option and may need to be spec'ed to some extent. - Ryosuke
Re: Concerns regarding cross-origin copy/paste security
On 2/2/12 10:27 PM, Ryosuke Niwa wrote: On Thu, Feb 2, 2012 at 10:20 PM, Charles Pritchard ch...@jumis.com wrote: Seems like a very minor risk for high security sites, e.g. banking, in identifying form elements. In the spirit of giving it some thought: But even for those websites, what could input / textarea elements reveal beyond what the user sees? Many sites use input hidden elements with what are essentially image maps for entering a PIN. In that case, a user does not see the PIN, though they do see an image map which has been obscured through various means. I doubt there are security risks in this area. There are various XSS headers that signal enhanced security for websites, even to browser extensions. Perhaps some of them ought to be used in the copy mechanism. That way the data never reaches the clipboard for paste. That's also an option and may need to be spec'ed to some extent. It's the best I have to offer, in hypothesizing how we may address the concern. High security sites use high security headers. If they've opted into those headers, we can do a lot to limit data exposure. There are many sorts of XSS attacks for sites that do not implement security headers. We can't help those, there are just too many leaks. So, I'd focus on specifying additional clipboard constraints for high security sites. I would put out one word of caution: such restrictions could be used for censorship. I don't think we have an option there. It's becoming more common that top level domains are being restricted or redirected to country codes. It seems plausible that domains may further be restricted to HTTPS (SSL) signatures. Going further, sites may be restricted to those which serve appropriate security headers against XSS attacks. Disabling the copy mechanism for any portion of a site does risk censorship. But, we are only examining high security portions of high security sites, such as input hidden and input type=password. 
We're examining those elements for the sake of consumer protection for users doing online banking and otherwise cooperating in a secure environment for private data. That's a good thing. -Charles
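The header opt-in idea above could look roughly like the following sketch. Everything specific in it is an assumption: the thread never names which headers would count, so `OPT_IN_HEADERS` is a stand-in, and `clipboard_policy` and `value_for_clipboard` are hypothetical helpers, not part of any spec or browser.

```python
# Assumption: the thread never names the headers; these two are stand-ins.
OPT_IN_HEADERS = frozenset({"content-security-policy", "x-frame-options"})


def clipboard_policy(response_headers):
    """Return "strict" if the copied page opted in via a security header."""
    names = {name.strip().lower() for name in response_headers}
    return "strict" if names & OPT_IN_HEADERS else "default"


def value_for_clipboard(input_type, value, policy):
    """Under the strict policy, hidden and password values never reach the clipboard."""
    if policy == "strict" and input_type.lower() in {"hidden", "password"}:
        return ""
    return value
```

Because the decision is made at copy time from the source page's own headers, sites that have not opted in keep today's behaviour, and only the two input types named in the discussion are affected.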
Re: Concerns regarding cross-origin copy/paste security
On Thu, Feb 2, 2012 at 10:43 PM, Charles Pritchard ch...@jumis.com wrote: On 2/2/12 10:27 PM, Ryosuke Niwa wrote: On Thu, Feb 2, 2012 at 10:20 PM, Charles Pritchard ch...@jumis.com wrote: Seems like a very minor risk for high security sites, e.g. banking, in identifying form elements. In the spirit of giving it some thought: But even for those websites, what could input / textarea elements reveal beyond what the user sees? Many sites use input hidden elements with what are essentially image maps for entering a PIN. But any element with display:none will be removed so input hidden should be removed. It's becoming more common that top level domains are being restricted or redirected to country codes. It seems plausible that domains may further be restricted to HTTPS (SSL) signatures. Going further, sites may be restricted to those which serve appropriate security headers against XSS attacks. Disabling the copy mechanism for any portion of a site does risk censorship. But, we are only examining high security portions of high security sites, such as input hidden and input type=password. input[type=password] is a good one. We should probably get rid of the value in that case? - Ryosuke
Re: Concerns regarding cross-origin copy/paste security
On Thu, 05 May 2011 06:46:55 +0900, Daniel Cheng dch...@chromium.org wrote: There was a recent discussion involving directly exposing the HTML fragment in a paste to a page, since we're doing the parsing anyway for security reasons. I have some concerns regarding http://www.w3.org/TR/clipboard-apis/#cross-origin-copy-paste-of-source-code though. From my understanding, we are trying to protect against [1] hidden data being copied without a user's knowledge and [2] XSS via pasting hostile HTML. In my opinion, the algorithm as written is either going to remove too much information or not enough. If it removes too much, the HTML paste is effectively useless to a client app. If it doesn't remove enough, then the client app is going to have to sanitize the HTML itself anyway. FWIW, my main concern was the hidden data aspect because it can be abused for cross-site request forgery if a malicious site by getting the user to copy and paste gets access to form anti-CSRF tokens and such. I *intend* to leave some processing of the HTML to the client application, for example the removal of third-party application-specific or browser-specific CSS properties. I see that Chrome applies different security policies depending on whether the content is read by a JavaScript (getData('text/html') - style) and inserted directly. You do some extra work to avoid XSS, such as removing on* event listener attributes and href=javascript: when content is inserted directly (you also remove some browser-specific elements and class names). This sort of clean up and processing on direct data insertion by the user-agent is not really in scope for the events spec IMO. However, for getData('text/html') it seems you do no clean-up at all, not for cross-origin paste either. Implementing the current spec would thus require that you tighten your existing security policy. Will you consider doing so, or would you rather argue for removal of any spec-mandated clean-up of cross-origin source code? 
I would argue that we should primarily be trying to prevent [1] and leave it up to web pages to prevent [2]. Chrome currently does neither for the getData() case - as far as I can tell. [2] is no different than using data from any other untrusted source, like dragging HTML or data from an XHR. It doesn't make sense to special-case HTML pastes. Using data is not the only threat model - limiting the damage potential when the page you paste into is malicious is harder. However, there is some overlap in the strategies we might use - for example event attributes are certainly hidden data, might contain secrets and might cause XSS attacks so you might argue for their removal based on both abuse scenarios though I think [2] is a more relevant threat. In order to achieve [1], the algorithm merely needs to be: - Remove HTML comments, script, input type=hidden, and all other elements that have no effect on layout (display: none). Possibly remove applet as well. - Remove event handlers, data- and form action attributes. - Blanking input type=password elements. So you still suggest removing event handlers even though this is primarily about your case [2]? To me, it doesn't make sense to remove the other elements: - OBJECT: Could be used for SVG as I understand. OBJECT is considered a form element, so it might have hidden data associated with it. It can also contain plugin content that could inject scripts and be used for XSS attacks. It may be too far-fetched or draconian to remove it though. (SVG is rich enough to be its own can of worms by the way..) - FORM: Essentially harmless once the action attribute is cleared. Agree. I've changed the spec to allow FORM but remove @action. - INPUT (non-hidden, non-password): Content is already available via text/plain. An input's @name attribute is basically hidden data the user will not be aware of pasting. I'm not sure how much of a threat this is, but we should give it some thought. - TEXTAREA: See above. 
Ditto :) - BUTTON, INPUT buttons: Most of the content is already available via text/plain. We can scrub the value attribute if there is concern about that. More about @name regarding the principle of hidden data. However, I can easily be convinced that violating user expectations as little as possible is more important than taking this principle to its extreme consequences ;-) Perhaps other people would like to chime in here? - SELECT/OPTION/OPTGROUP: See above. The draft also does not mention how EMBED elements should be handled. Any thoughts on this? Finally: If a script calls getData('text/html'), the implementation supports pasting HTML, and the data available on the clipboard is from a different origin, the implementation must sanitize the content by following these steps: Should this sanitization be done during a copy as well to prevent data a paste in a non-conforming browser
Re: Concerns regarding cross-origin copy/paste security
On Wed, May 4, 2011 at 2:46 PM, Daniel Cheng dch...@chromium.org wrote: From my understanding, we are trying to protect against [1] hidden data being copied without a user's knowledge and [2] XSS via pasting hostile HTML. In my opinion, the algorithm as written is either going to remove too much information or not enough. If it removes too much, the HTML paste is effectively useless to a client app. If it doesn't remove enough, then the client app is going to have to sanitize the HTML itself anyway. I would argue that we should primarily be trying to prevent [1] and leave it up to web pages to prevent [2]. [2] is no different than using data from any other untrusted source, like dragging HTML or data from an XHR. It doesn't make sense to special-case HTML pastes. However, the fragment parsing algorithm as spec'ed in HTML5 already prevents [2]. It removes event handlers, script elements, etc. To me, it doesn't make sense to remove the other elements: - OBJECT: Could be used for SVG as I understand. - FORM: Essentially harmless once the action attribute is cleared. - INPUT (non-hidden, non-password): Content is already available via text/plain. - TEXTAREA: See above. - BUTTON, INPUT buttons: Most of the content is already available via text/plain. We can scrub the value attribute if there is concern about that. - SELECT/OPTION/OPTGROUP: See above. I'm also curious as to why these elements are being removed. Hallvord? Should this sanitization be done during a copy as well to prevent data a paste in a non-conforming browser from pasting unexpected things? We already do some of this stuff in WebKit. For example, we avoid serializing non-rendered contents. - Ryosuke
Concerns regarding cross-origin copy/paste security
There was a recent discussion involving directly exposing the HTML fragment in a paste to a page, since we're doing the parsing anyway for security reasons. I have some concerns regarding http://www.w3.org/TR/clipboard-apis/#cross-origin-copy-paste-of-source-code though. From my understanding, we are trying to protect against [1] hidden data being copied without a user's knowledge and [2] XSS via pasting hostile HTML. In my opinion, the algorithm as written is either going to remove too much information or not enough. If it removes too much, the HTML paste is effectively useless to a client app. If it doesn't remove enough, then the client app is going to have to sanitize the HTML itself anyway. I would argue that we should primarily be trying to prevent [1] and leave it up to web pages to prevent [2]. [2] is no different than using data from any other untrusted source, like dragging HTML or data from an XHR. It doesn't make sense to special-case HTML pastes. In order to achieve [1], the algorithm merely needs to be: - Remove HTML comments, script, input type=hidden, and all other elements that have no effect on layout (display: none). Possibly remove applet as well. - Remove event handlers, data- and form action attributes. - Blanking input type=password elements. To me, it doesn't make sense to remove the other elements: - OBJECT: Could be used for SVG as I understand. - FORM: Essentially harmless once the action attribute is cleared. - INPUT (non-hidden, non-password): Content is already available via text/plain. - TEXTAREA: See above. - BUTTON, INPUT buttons: Most of the content is already available via text/plain. We can scrub the value attribute if there is concern about that. - SELECT/OPTION/OPTGROUP: See above. The draft also does not mention how EMBED elements should be handled. 
Finally: If a script calls getData('text/html'), the implementation supports pasting HTML, and the data available on the clipboard is from a different origin, the implementation must sanitize the content by following these steps: Should this sanitization be done during a copy as well to prevent data a paste in a non-conforming browser from pasting unexpected things? Daniel (resending from the right address, sorry for the spam Hallvord)
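The three-step algorithm proposed in this message for goal [1] (remove comments, script, hidden inputs and display:none elements; remove event handlers, data- and form action attributes; blank password inputs) can be sketched as follows. This is an illustration in Python using the standard `html.parser` module, not the spec text and not any browser's code. `HiddenDataStripper` and `strip_hidden_data` are hypothetical names, and two simplifications are assumed: "no effect on layout" is approximated by an inline `display:none` only, and subtree skipping counts tags instead of building a tree, so mis-nested markup is over-removed rather than leaked.

```python
from html import escape
from html.parser import HTMLParser

# Void elements have no end tag, so they never open a subtree.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
DROP_ELEMENTS = {"script", "applet"}


def _invisible(tag, attrs):
    """Hidden inputs, plus elements with an inline display:none (assumption)."""
    if tag == "input" and (attrs.get("type") or "").lower() == "hidden":
        return True
    style = (attrs.get("style") or "").replace(" ", "").lower()
    return "display:none" in style


class HiddenDataStripper(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped subtree

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.skip_depth:
            if tag not in VOID:
                self.skip_depth += 1
            return
        if tag in DROP_ELEMENTS or _invisible(tag, attrs):
            if tag not in VOID:
                self.skip_depth = 1
            return
        is_password = (tag == "input"
                       and (attrs.get("type") or "").lower() == "password")
        parts = [tag]
        for name, value in attrs.items():
            if name.startswith(("on", "data-")):
                continue  # event handlers and data-* attributes
            if tag == "form" and name == "action":
                continue  # keep FORM, drop where it submits to
            if is_password and name == "value":
                value = ""  # blank the password, keep the control
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.skip_depth:
            self.skip_depth -= 1
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    # Comments are dropped: the inherited handle_comment() does nothing.


def strip_hidden_data(html):
    parser = HiddenDataStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Note that OBJECT, TEXTAREA, BUTTON and SELECT pass through untouched here, matching the position argued in this message; whether EMBED should be dropped is left open, as it is in the draft.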