Re: [whatwg] Security restriction allows content thievery
On Thu, Sep 6, 2012 at 9:53 PM, Ian Hickson wrote:
> On Fri, 7 Sep 2012, Fred Andrews wrote:
>> I think the aim is to have the URL of the page that includes these data: URLs sent to the tracking server?
>
> Ah, I see. So say you have a page A, which itself contains a data: URL, and you load that data: URL as page B, and in B there is a link to another resource C, the argument here is that in the network request for C, the referrer information should be of A, rather than B?
>
> That's an interesting idea... Any browser vendors want to chip in on this?

We're unlikely to implement that in WebKit. We'd like to keep documents created by data URLs in a unique origin and avoid leaking privileges (including the privilege to send a certain Referer into the iframe).

Adam
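Adam's "unique origin" matches how the WHATWG URL Standard treats data: URLs today. Their origin is opaque and serializes as the string "null". The minimal sketch below uses the standard URL class, available in browsers and Node.js, to compare origins. The sample URLs are made up for illustration.

```javascript
// Compare the serialized origins of a few URLs using the WHATWG URL parser.
// The sample URLs are illustrative only.
const samples = [
  'https://example.com/article.html',
  'data:text/html,<p>copied page</p>',
];

function originOf(url) {
  // URL.prototype.origin returns "null" for URLs with an opaque origin,
  // which is what the URL Standard assigns to data: URLs.
  return new URL(url).origin;
}

for (const url of samples) {
  console.log(`${url} -> ${originOf(url)}`);
}
```

Every opaque origin serializes to the same "null" string. Comparing serialized origins is therefore not a valid same-origin check: two unrelated data: documents would wrongly appear to match.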
Re: [whatwg] Security restriction allows content thievery
On Fri, 7 Sep 2012, Fred Andrews wrote:
>
> I think the aim is to have the URL of the page that includes these data: URLs sent to the tracking server?

Ah, I see. So say you have a page A, which itself contains a data: URL, and you load that data: URL as page B, and in B there is a link to another resource C, the argument here is that in the network request for C, the referrer information should be of A, rather than B?

That's an interesting idea... Any browser vendors want to chip in on this?

Unless there is browser-vendor interest in implementing this, I don't intend to add it to the spec, since it seems a little esoteric and could leak referrers in cases where authors had previously assumed they'd be safe (e.g. if a Webmail app is opening e-mails in iframes using data: URLs to prevent the e-mail's images from including the user's webmail client's URL in the referrer information, or something).

--
Ian Hickson               U+1047E                )\._.,--,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
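The A/B/C scenario can be made concrete. The Referrer Policy specification has a step called "strip url for use as a referrer". It returns no referrer when the URL's scheme is a local scheme (about, blob, data) and otherwise removes credentials and the fragment. That is why a request for C made from the data: document B carries no Referer.

The sketch below models that step and, for contrast, the proposal described above, under which the request would carry A's URL instead. A few caveats:

- `proposedReferrer` and its `embedderUrl` parameter are hypothetical and only illustrate the idea.
- The sketch ignores referrer policies, which in practice can trim the referrer further, for example to just the origin.
- It is not how any browser actually computes referrers.

```javascript
// Local schemes as defined by the Fetch Standard.
const LOCAL_SCHEMES = new Set(['about:', 'blob:', 'data:']);

// Models "strip url for use as a referrer": local-scheme URLs yield no
// referrer; otherwise username, password and fragment are removed.
function stripForReferrer(urlString) {
  if (urlString == null) return null;
  const url = new URL(urlString);
  if (LOCAL_SCHEMES.has(url.protocol)) return null;
  url.username = '';
  url.password = '';
  url.hash = '';
  return url.href;
}

// Current behaviour: the referrer for C comes from B's own URL.
function currentReferrer(documentUrl) {
  return stripForReferrer(documentUrl);
}

// Hypothetical proposal: if B is a data: document, use the URL of the
// page A that contained it instead.
function proposedReferrer(documentUrl, embedderUrl) {
  const url = new URL(documentUrl);
  if (url.protocol === 'data:' && embedderUrl) {
    return stripForReferrer(embedderUrl);
  }
  return stripForReferrer(documentUrl);
}
```

Suppose A is a webmail inbox URL. Then `proposedReferrer` would hand that URL to whatever server hosts C. This is the leak described in the webmail example above.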
Re: [whatwg] Security restriction allows content thievery
> > I'm currently building an analysis system like Google Analytics, which gets embedded into a website via a small JavaScript snippet. When I analyzed the data, I came across a very interesting trick because I got a lot of requests (with the data from location.href) where the entire website was embedded into a data:text/html URI - except that all ads of the page were replaced. Fortunately, my tracking code has been left without modifications.
>
> Weird.

Perhaps the concern is that content has been copied into a data: URL in violation of copyright and used to obtain ad revenue. However, the content could very well be used with permission. Ads are dynamic and do change on otherwise static content pages, so this could well be an honest use of the technology. It would be interesting to know whether search engines actually look at content in data: URLs; if not, the 'copied' content would seem to bring little advantage. Or perhaps the concern is just that it thwarts efforts to track the referer.

> > But the scary thing is that this way you can monetize foreign content by simply embedding it somewhere you can direct traffic to. That's pretty clever, because the original site owner doesn't notice this abuse due to the fact that top.location.href isn't readable. Or even worse, he would never notice it at all when he doesn't sniff the URI with JavaScript, because image files would have no referrer.
> >
> > My final approach to convict the abuser is based on the fact that the JavaScript was dynamically loaded from my server and that I can write to location.href. So I added this piece of code:
> >
> > if (top.location.protocol === 'data:') {
> >   top.location.href = 'http://example.com/trap/';
> > }
> >
> > But even then the referrer will not be passed to the server. So my proposal is that the data: URI scheme gets an exception to this security behavior.
>
> I don't understand. What referrer are you trying to set? To what?

I think the aim is to have the URL of the page that includes these data: URLs sent to the tracking server?

I can't see any technical issues raised here? Some think trackers are 'scary' and consider user privacy and safety more important. They would prefer not to send a referer at all, and even to have such JavaScript sandboxed so that it can't leak private information.

cheers
Fred
Re: [whatwg] Security restriction allows content thievery
On Mon, 16 Jul 2012, Robert Eisele wrote:
>
> Browsers are very restrictive when one tries to access the contents of different domains (including the scheme), embedded via framesets. This is normally a good practice, but I'd suggest weakening this restriction for the data: URI scheme.

It already is. The origin of documents and images using data: URLs is essentially the origin of wherever you found the URL.

> I'm currently building an analysis system like Google Analytics, which gets embedded into a website via a small JavaScript snippet. When I analyzed the data, I came across a very interesting trick because I got a lot of requests (with the data from location.href) where the entire website was embedded into a data:text/html URI - except that all ads of the page were replaced. Fortunately, my tracking code has been left without modifications.

Weird.

> But the scary thing is that this way you can monetize foreign content by simply embedding it somewhere you can direct traffic to. That's pretty clever, because the original site owner doesn't notice this abuse due to the fact that top.location.href isn't readable. Or even worse, he would never notice it at all when he doesn't sniff the URI with JavaScript, because image files would have no referrer.
>
> My final approach to convict the abuser is based on the fact that the JavaScript was dynamically loaded from my server and that I can write to location.href. So I added this piece of code:
>
> if (top.location.protocol === 'data:') {
>   top.location.href = 'http://example.com/trap/';
> }
>
> But even then the referrer will not be passed to the server. So my proposal is that the data: URI scheme gets an exception to this security behavior.

I don't understand. What referrer are you trying to set? To what?

--
Ian Hickson
Re: [whatwg] Security restriction allows content thievery
On Sun, Jul 15, 2012 at 4:02 PM, Robert Eisele wrote:
> 2012/7/16 Tab Atkins Jr.
>
> > On Sun, Jul 15, 2012 at 3:22 PM, Robert Eisele wrote:
> > > Browsers are very restrictive when one tries to access the contents of different domains (including the scheme), embedded via framesets. This is normally a good practice, but I'd suggest weakening this restriction for the data: URI scheme.
> > >
> > > I'm currently building an analysis system like Google Analytics, which gets embedded into a website via a small JavaScript snippet. When I analyzed the data, I came across a very interesting trick because I got a lot of requests (with the data from location.href) where the entire website was embedded into a data:text/html URI - except that all ads of the page were replaced. Fortunately, my tracking code has been left without modifications.
> > >
> > > But the scary thing is that this way you can monetize foreign content by simply embedding it somewhere you can direct traffic to. That's pretty clever, because the original site owner doesn't notice this abuse due to the fact that top.location.href isn't readable. Or even worse, he would never notice it at all when he doesn't sniff the URI with JavaScript, because image files would have no referrer.
> > >
> > > My final approach to convict the abuser is based on the fact that the JavaScript was dynamically loaded from my server and that I can write to location.href. So I added this piece of code:
> > >
> > > if (top.location.protocol === 'data:') {
> > >   top.location.href = 'http://example.com/trap/';
> > > }
> > >
> > > But even then the referrer will not be passed to the server. So my proposal is that the data: URI scheme gets an exception to this security behavior.
> >
> > The problem you outline is not directly tied to the solution you present. You can scrape a site and display it as your own without any fancy tricks, just by downloading all the resources and hosting them yourself. This merely consumes a little more bandwidth for the attacker, since they're hosting the images/etc themselves.
> >
>
> But you would get a valid referrer if the tracking code wasn't removed. The data: URL protects the abuser in an unnecessary way. But you're absolutely right that the solution I present isn't entirely tied to the problem.
>

The embedder can easily remove the tracking code. Better yet, the embedder can host the content on his server and disallow access to all external resources to cripple your tracking code.

> > The correct solution to this kind of problem is legal - this is simple copyright violation.
>
> But if you don't have a chance to get information about the attacker, you can't sue him. I had the strange idea to use a prompt to ask the user for the original URL in his address bar. But as I said, that's strange.
>

That sounds like a problem we can't solve.

- Ryosuke
Re: [whatwg] Security restriction allows content thievery
2012/7/16 Tab Atkins Jr.
> On Sun, Jul 15, 2012 at 3:22 PM, Robert Eisele wrote:
> > Browsers are very restrictive when one tries to access the contents of different domains (including the scheme), embedded via framesets. This is normally a good practice, but I'd suggest weakening this restriction for the data: URI scheme.
> >
> > I'm currently building an analysis system like Google Analytics, which gets embedded into a website via a small JavaScript snippet. When I analyzed the data, I came across a very interesting trick because I got a lot of requests (with the data from location.href) where the entire website was embedded into a data:text/html URI - except that all ads of the page were replaced. Fortunately, my tracking code has been left without modifications.
> >
> > But the scary thing is that this way you can monetize foreign content by simply embedding it somewhere you can direct traffic to. That's pretty clever, because the original site owner doesn't notice this abuse due to the fact that top.location.href isn't readable. Or even worse, he would never notice it at all when he doesn't sniff the URI with JavaScript, because image files would have no referrer.
> >
> > My final approach to convict the abuser is based on the fact that the JavaScript was dynamically loaded from my server and that I can write to location.href. So I added this piece of code:
> >
> > if (top.location.protocol === 'data:') {
> >   top.location.href = 'http://example.com/trap/';
> > }
> >
> > But even then the referrer will not be passed to the server. So my proposal is that the data: URI scheme gets an exception to this security behavior.
>
> The problem you outline is not directly tied to the solution you present. You can scrape a site and display it as your own without any fancy tricks, just by downloading all the resources and hosting them yourself. This merely consumes a little more bandwidth for the attacker, since they're hosting the images/etc themselves.
>

But you would get a valid referrer if the tracking code wasn't removed. The data: URL protects the abuser in an unnecessary way. But you're absolutely right that the solution I present isn't entirely tied to the problem.

> The correct solution to this kind of problem is legal - this is simple copyright violation.
>

But if you don't have a chance to get information about the attacker, you can't sue him. I had the strange idea to use a prompt to ask the user for the original URL in his address bar. But as I said, that's strange.

>
> I'm not sure about the merits of your suggestion otherwise. It's reasonable to make data: pages same-origin with their parent when they're contained within something, but it seems dodgy to make them same-origin with their *contained* pages as well. If not done carefully, that could allow contained pages access to the data: page's parent as well, or other cross-origin pages that the data: page is containing.
>

Very intuitive thought. One could assume that data: pages are same-origin, or better, that embedded data: pages are part of the current page. That way, you wouldn't have the chance to escape the sandbox and access the parent. What would be a situation where same-origin could be dangerous?

>
> ~TJ
Re: [whatwg] Security restriction allows content thievery
On Sun, Jul 15, 2012 at 3:22 PM, Robert Eisele wrote:
> Browsers are very restrictive when one tries to access the contents of different domains (including the scheme), embedded via framesets. This is normally a good practice, but I'd suggest weakening this restriction for the data: URI scheme.
>
> I'm currently building an analysis system like Google Analytics, which gets embedded into a website via a small JavaScript snippet. When I analyzed the data, I came across a very interesting trick because I got a lot of requests (with the data from location.href) where the entire website was embedded into a data:text/html URI - except that all ads of the page were replaced. Fortunately, my tracking code has been left without modifications.
>
> But the scary thing is that this way you can monetize foreign content by simply embedding it somewhere you can direct traffic to. That's pretty clever, because the original site owner doesn't notice this abuse due to the fact that top.location.href isn't readable. Or even worse, he would never notice it at all when he doesn't sniff the URI with JavaScript, because image files would have no referrer.
>
> My final approach to convict the abuser is based on the fact that the JavaScript was dynamically loaded from my server and that I can write to location.href. So I added this piece of code:
>
> if (top.location.protocol === 'data:') {
>   top.location.href = 'http://example.com/trap/';
> }
>
> But even then the referrer will not be passed to the server. So my proposal is that the data: URI scheme gets an exception to this security behavior.

The problem you outline is not directly tied to the solution you present. You can scrape a site and display it as your own without any fancy tricks, just by downloading all the resources and hosting them yourself. This merely consumes a little more bandwidth for the attacker, since they're hosting the images/etc themselves.

The correct solution to this kind of problem is legal - this is simple copyright violation.

I'm not sure about the merits of your suggestion otherwise. It's reasonable to make data: pages same-origin with their parent when they're contained within something, but it seems dodgy to make them same-origin with their *contained* pages as well. If not done carefully, that could allow contained pages access to the data: page's parent as well, or other cross-origin pages that the data: page is containing.

~TJ
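The distinction between a data: page's parent and its contained pages can be modelled with a small frame tree. The sketch assumes a hypothetical rule:

- A data: frame with a parent adopts the parent's origin.
- A data: frame with no parent gets a fresh opaque origin.
- Every other frame keeps the origin of its own URL.

It then checks which frames count as same-origin. Origins are represented as objects so that opaque origins compare by identity rather than by their shared "null" serialization. All function names are illustrative, not any browser's API.

```javascript
// A tuple origin { scheme, host, port } derived from a URL.
function tupleOrigin(url) {
  const u = new URL(url);
  return { opaque: false, scheme: u.protocol, host: u.hostname, port: u.port };
}

// An opaque origin is a fresh object, so it is only equal to itself.
function opaqueOrigin() {
  return { opaque: true };
}

function sameOrigin(a, b) {
  if (a.opaque || b.opaque) return a === b; // opaque: identity only
  return a.scheme === b.scheme && a.host === b.host && a.port === b.port;
}

// Hypothetical rule: a data: frame inherits its parent's origin (or gets a
// fresh opaque origin at top level); other frames use their own URL's origin.
function assignOrigins(frame, parentOrigin = null) {
  if (new URL(frame.url).protocol === 'data:') {
    frame.origin = parentOrigin ?? opaqueOrigin();
  } else {
    frame.origin = tupleOrigin(frame.url);
  }
  for (const child of frame.children ?? []) {
    assignOrigins(child, frame.origin);
  }
  return frame;
}
```

Under this rule, a data: frame B inside page A shares A's origin, while a third-party page C inside B does not. If B were additionally treated as same-origin with C, C would effectively gain same-origin access to A through B, which is the "dodgy" direction.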
[whatwg] Security restriction allows content thievery
Browsers are very restrictive when one tries to access the contents of different domains (including the scheme), embedded via framesets. This is normally a good practice, but I'd suggest weakening this restriction for the data: URI scheme.

I'm currently building an analysis system like Google Analytics, which gets embedded into a website via a small JavaScript snippet. When I analyzed the data, I came across a very interesting trick because I got a lot of requests (with the data from location.href) where the entire website was embedded into a data:text/html URI - except that all ads of the page were replaced. Fortunately, my tracking code has been left without modifications.

But the scary thing is that this way you can monetize foreign content by simply embedding it somewhere you can direct traffic to. That's pretty clever, because the original site owner doesn't notice this abuse due to the fact that top.location.href isn't readable. Or even worse, he would never notice it at all when he doesn't sniff the URI with JavaScript, because image files would have no referrer.

My final approach to convict the abuser is based on the fact that the JavaScript was dynamically loaded from my server and that I can write to location.href. So I added this piece of code:

    if (top.location.protocol === 'data:') {
      top.location.href = 'http://example.com/trap/';
    }

But even then the referrer will not be passed to the server. So my proposal is that the data: URI scheme gets an exception to this security behavior.

Kind Regards
Robert Eisele
http://www.xarg.org/
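A snippet like the one above needs a guard. In current browsers a data: document has an opaque origin, and reading top.location.protocol from a frame whose top-level page is cross-origin throws a SecurityError.

The sketch below is a hypothetical tracker-side helper, not Robert's actual code. It:

- checks the snippet's own document (location.protocol) first;
- treats an unreadable top as "unknown" rather than failing;
- reports location.href, the only locator the tracker reliably has, since a data: document sends no Referer.

`describeContext` and its window-like parameter are assumptions, chosen so the logic can run outside a browser.

```javascript
// Hypothetical helper for a tracking snippet. `win` is any window-like
// object ({ location, top }); in a browser you would pass `window`.
function describeContext(win) {
  // Is the document running this snippet itself a data: document?
  const selfIsData = win.location.protocol === 'data:';

  let topProtocol = null; // null means "could not be read"
  try {
    topProtocol = win.top.location.protocol;
  } catch (err) {
    // Reading top.location properties across origins throws (a
    // SecurityError in browsers); record it as unknown instead of crashing.
    topProtocol = null;
  }

  return {
    selfIsData,
    framed: win.top !== win, // comparing window references is allowed cross-origin
    topProtocol,
    href: win.location.href,
  };
}
```

In a page, `describeContext(window)` would return a small object that the tracker could send to its own server, for example as a query string or POST body.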