Re: [whatwg] Security restriction allows content thievery

2012-09-07 Thread Adam Barth
On Thu, Sep 6, 2012 at 9:53 PM, Ian Hickson wrote:
> On Fri, 7 Sep 2012, Fred Andrews wrote:
>> I think the aim is to have the URL of the page that includes these data:
>> URLs sent to the tracking server?
>
> Ah, I see. So say you have a page A, which itself contains a data: URL,
> and you load that data: URL as page B, and in B there is a link to another
> resource C, the argument here is that in the network request for C, the
> referrer information should be of A, rather than B?
>
> That's an interesting idea... Any browser vendors want to chip in on this?

We're unlikely to implement that in WebKit.  We'd like to keep
documents created by data URLs in a unique origin and avoid leaking
privileges (including the privilege to send a certain Referer into the
iframe).

Adam


Re: [whatwg] Security restriction allows content thievery

2012-09-06 Thread Ian Hickson
On Fri, 7 Sep 2012, Fred Andrews wrote:
> 
> I think the aim is to have the URL of the page that includes these data: 
> URLs sent to the tracking server?

Ah, I see. So say you have a page A, which itself contains a data: URL, 
and you load that data: URL as page B, and in B there is a link to another 
resource C, the argument here is that in the network request for C, the 
referrer information should be of A, rather than B?

That's an interesting idea... Any browser vendors want to chip in on this?

Unless there is browser-vendor interest in implementing this, I don't 
intend to add it to the spec, since it seems a little esoteric and could 
leak referrers in cases where authors had previously assumed they'd be 
safe (e.g. if a Webmail app is opening e-mails in iframes using data: URLs 
to prevent the e-mail's images from including the user's webmail client's 
URL in the referrer information, or something).
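For illustration, a minimal JavaScript sketch of the two referrer rules under discussion. It follows the Referrer Policy rule that requests from documents with a local scheme (about:, blob:, data:) carry no referrer. As a simplifying assumption it ignores referrer policies and assumes one that sends the full URL (such as `unsafe-url`). The proposal is modeled with a hypothetical `embedderUrl` field standing in for page A; the function names and hosts are invented for this example.

```javascript
// Schemes the Referrer Policy spec treats as "local": documents loaded
// from them never send a referrer.
const LOCAL_SCHEMES = new Set(['about:', 'blob:', 'data:']);

// Sketch of "strip url for use as a referrer", assuming a referrer policy
// that sends the full URL (e.g. unsafe-url); other policies are ignored.
function stripForReferrer(urlString) {
  if (urlString == null) return null;
  const url = new URL(urlString);
  if (LOCAL_SCHEMES.has(url.protocol)) return null;
  url.username = '';
  url.password = '';
  url.hash = ''; // fragments are never sent
  return url.href;
}

// Current behavior: a request for C made from document B uses B's URL.
function currentReferrer(docB) {
  return stripForReferrer(docB.url);
}

// HYPOTHETICAL proposal from the thread (not implemented by browsers):
// if B came from a data: URL, use the URL of page A that contained it.
// `embedderUrl` is an invented field standing in for A's URL.
function proposedReferrer(docB) {
  const fromData = new URL(docB.url).protocol === 'data:';
  return stripForReferrer(fromData ? docB.embedderUrl : docB.url);
}

// The webmail case: an e-mail shown in a data: iframe so that its images
// do not reveal the webmail client's URL. Hosts are hypothetical.
const emailFrame = {
  url: 'data:text/html,<img src="https://sender.example/pixel.gif">',
  embedderUrl: 'https://mail.example/inbox?msg=42#open',
};

console.log(currentReferrer(emailFrame));  // null
console.log(proposedReferrer(emailFrame)); // https://mail.example/inbox?msg=42
```

Under the current default policy, `strict-origin-when-cross-origin`, the proposed variant would send only `https://mail.example/` for this cross-origin image, but the webmail client's origin would still leak, which is the risk described above.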

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Security restriction allows content thievery

2012-09-06 Thread Fred Andrews

> > I'm currently building an analysis system like Google Analytics, which 
> > gets embedded into a website via a small JavaScript snippet. When I 
> > analyzed the data, I came across a very interesting trick because I got 
> > a lot of requests (with the data from location.href) where the entire 
> > website was embedded into a data:text/html URI - except that all ads of 
> > the page were replaced. Fortunately, my tracking code has been left 
> > without modifications.
> 
> Weird.

Perhaps the concern is that content has been copied into a data: URL in
violation of copyright and used to obtain ad revenue. However, the content
may well be used with permission. Ads are dynamic and do change on
otherwise static content pages, so this could well be an honest use of the
technology. It would be interesting to know whether search engines actually
look at content in data: URLs; if not, the 'copied' content would seem to
bring little advantage.

Or perhaps the concern is just that it thwarts efforts to track the referrer.
 
> > But the scary thing is that this way you can monetize foreign content by 
> > simply embedding it somewhere you can direct traffic to. That's pretty 
> > clever, because the original site owner doesn't notice this abuse due to 
> > the fact that top.location.href isn't readable. Or even worse, he would 
> > never notice it at all if he doesn't sniff the URI with JavaScript,
> > because image files would have no referrer.
> >
> > My final approach to catch the abuser is based on the fact that the
> > JavaScript was dynamically loaded from my server and that I can write to
> > location.href. So I added this piece of code:
> >
> > if (top.location.protocol === 'data:') {
> >   top.location.href = 'http://example.com/trap/';
> > }
> >
> > But even then the referrer will not be passed to the server. So my
> > proposal is that the data URI scheme be exempted from this security
> > behavior.
> 
> I don't understand. What referrer are you trying to set? To what?

I think the aim is to have the URL of the page that includes these data: URLs 
sent to the tracking server?

I don't see any technical issue raised here.

Some think trackers are 'scary', consider user privacy and safety more
important, and would prefer not to send a referrer and even to have such
JavaScript sandboxed so that it can't leak private information.

cheers
Fred



Re: [whatwg] Security restriction allows content thievery

2012-09-06 Thread Ian Hickson
On Mon, 16 Jul 2012, Robert Eisele wrote:
>
> Browsers are very restrictive when one tries to access the contents of 
> different domains (including the scheme), embedded via framesets. This 
> is normally a good practice, but I'd suggest weakening this restriction
> for the data: URI scheme.

It already is. The origin of documents and images using data: URLs is 
essentially the origin of wherever you found the URL.


> I'm currently building an analysis system like Google Analytics, which 
> gets embedded into a website via a small JavaScript snippet. When I 
> analyzed the data, I came across a very interesting trick because I got 
> a lot of requests (with the data from location.href) where the entire 
> website was embedded into a data:text/html URI - except that all ads of 
> the page were replaced. Fortunately, my tracking code has been left 
> without modifications.

Weird.


> But the scary thing is that this way you can monetize foreign content by 
> simply embedding it somewhere you can direct traffic to. That's pretty 
> clever, because the original site owner doesn't notice this abuse due to 
> the fact that top.location.href isn't readable. Or even worse, he would 
> never notice it at all if he doesn't sniff the URI with JavaScript,
> because image files would have no referrer.
>
> My final approach to catch the abuser is based on the fact that the
> JavaScript was dynamically loaded from my server and that I can write to
> location.href. So I added this piece of code:
>
> if (top.location.protocol === 'data:') {
>   top.location.href = 'http://example.com/trap/';
> }
>
> But even then the referrer will not be passed to the server. So my
> proposal is that the data URI scheme be exempted from this security
> behavior.

I don't understand. What referrer are you trying to set? To what?



Re: [whatwg] Security restriction allows content thievery

2012-07-15 Thread Ryosuke Niwa
On Sun, Jul 15, 2012 at 4:02 PM, Robert Eisele wrote:

> 2012/7/16 Tab Atkins Jr. 
> > On Sun, Jul 15, 2012 at 3:22 PM, Robert Eisele wrote:
> > > Browsers are very restrictive when one tries to access the contents of
> > > different domains (including the scheme), embedded via framesets. This
> > > is normally a good practice, but I'd suggest weakening this restriction
> > > for the data: URI scheme.
> > >
> > > I'm currently building an analysis system like Google Analytics, which
> > > gets embedded into a website via a small JavaScript snippet. When I
> > > analyzed the data, I came across a very interesting trick because I got
> > > a lot of requests (with the data from location.href) where the entire
> > > website was embedded into a data:text/html URI - except that all ads of
> > > the page were replaced. Fortunately, my tracking code has been left
> > > without modifications.
> > >
> > > But the scary thing is that this way you can monetize foreign content
> > > by simply embedding it somewhere you can direct traffic to. That's
> > > pretty clever, because the original site owner doesn't notice this
> > > abuse due to the fact that top.location.href isn't readable. Or even
> > > worse, he would never notice it at all if he doesn't sniff the URI with
> > > JavaScript, because image files would have no referrer.
> > >
> > > My final approach to catch the abuser is based on the fact that the
> > > JavaScript was dynamically loaded from my server and that I can write
> > > to location.href. So I added this piece of code:
> > >
> > > if (top.location.protocol === 'data:') {
> > >   top.location.href = 'http://example.com/trap/';
> > > }
> > >
> > > But even then the referrer will not be passed to the server. So my
> > > proposal is that the data URI scheme be exempted from this security
> > > behavior.
> >
> > The problem you outline is not directly tied to the solution you
> > present.  You can scrape a site and display it as your own without any
> > fancy tricks, just by downloading all the resources and hosting them
> > yourself.  This merely consumes a little more bandwidth for the
> > attacker, since they're hosting the images/etc themselves.
> >
>
> But you would get a valid referrer if the tracking code weren't removed. The
> data: URL protects the abuser unnecessarily. But you're absolutely right
> that the solution I present isn't entirely tied to the problem.
>

The embedder can easily remove the tracking code. Better yet, the embedder
can host the content on his server and disallow access to all external
resources to cripple your tracking code.

> > The correct solution to this kind of problem is legal - this is simple
> > copyright violation.
>
> But if you don't have a chance to get information about the attacker, you
> can't sue him. I had the strange idea to use a prompt to ask the user for
> the original URL in his address bar. But as I said, that's strange.
>

That sounds like a problem we can't solve.

- Ryosuke


Re: [whatwg] Security restriction allows content thievery

2012-07-15 Thread Robert Eisele
2012/7/16 Tab Atkins Jr. 

> On Sun, Jul 15, 2012 at 3:22 PM, Robert Eisele wrote:
> > Browsers are very restrictive when one tries to access the contents of
> > different domains (including the scheme), embedded via framesets. This is
> > normally a good practice, but I'd suggest weakening this restriction for
> > the data: URI scheme.
> >
> > I'm currently building an analysis system like Google Analytics, which
> > gets embedded into a website via a small JavaScript snippet. When I
> > analyzed the data, I came across a very interesting trick because I got
> > a lot of requests (with the data from location.href) where the entire
> > website was embedded into a data:text/html URI - except that all ads of
> > the page were replaced. Fortunately, my tracking code has been left
> > without modifications.
> >
> > But the scary thing is that this way you can monetize foreign content by
> > simply embedding it somewhere you can direct traffic to. That's pretty
> > clever, because the original site owner doesn't notice this abuse due to
> > the fact that top.location.href isn't readable. Or even worse, he would
> > never notice it at all if he doesn't sniff the URI with JavaScript,
> > because image files would have no referrer.
> >
> > My final approach to catch the abuser is based on the fact that the
> > JavaScript was dynamically loaded from my server and that I can write to
> > location.href. So I added this piece of code:
> >
> > if (top.location.protocol === 'data:') {
> >   top.location.href = 'http://example.com/trap/';
> > }
> >
> > But even then the referrer will not be passed to the server. So my
> > proposal is that the data URI scheme be exempted from this security
> > behavior.
>
> The problem you outline is not directly tied to the solution you
> present.  You can scrape a site and display it as your own without any
> fancy tricks, just by downloading all the resources and hosting them
> yourself.  This merely consumes a little more bandwidth for the
> attacker, since they're hosting the images/etc themselves.
>

But you would get a valid referrer if the tracking code weren't removed. The
data: URL protects the abuser unnecessarily. But you're absolutely right
that the solution I present isn't entirely tied to the problem.


> The correct solution to this kind of problem is legal - this is simple
> copyright violation.
>

But if you don't have a chance to get information about the attacker, you
can't sue him. I had the strange idea to use a prompt to ask the user for
the original URL in his address bar. But as I said, that's strange.


>
> I'm not sure about the merits of your suggestion otherwise.  It's
> reasonable to make data: pages same-origin with their parent when
> they're contained within something, but it seems dodgy to make them
> same-origin with their *contained* pages as well.  If not done
> carefully, that could allow contained pages access to the data: page's
> parent as well, or other cross-origin pages that the data: page is
> containing.
>

A very intuitive thought: one could assume that data: pages are same-origin,
or rather that embedded data: pages are part of the current page. That way,
there would be no way to get out of the sandbox and access the parent. In
what situation could same-origin be dangerous?


>
> ~TJ
>


Re: [whatwg] Security restriction allows content thievery

2012-07-15 Thread Tab Atkins Jr.
On Sun, Jul 15, 2012 at 3:22 PM, Robert Eisele wrote:
> Browsers are very restrictive when one tries to access the contents of
> different domains (including the scheme), embedded via framesets. This is
> normally a good practice, but I'd suggest weakening this restriction for
> the data: URI scheme.
>
> I'm currently building an analysis system like Google Analytics, which gets
> embedded into a website via a small JavaScript snippet. When I analyzed the
> data, I came across a very interesting trick because I got a lot of
> requests (with the data from location.href) where the entire website was
> embedded into a data:text/html URI - except that all ads of the page were
> replaced. Fortunately, my tracking code has been left without
> modifications.
>
> But the scary thing is that this way you can monetize foreign content by
> simply embedding it somewhere you can direct traffic to. That's pretty
> clever, because the original site owner doesn't notice this abuse due to
> the fact that top.location.href isn't readable. Or even worse, he would
> never notice it at all if he doesn't sniff the URI with JavaScript,
> because image files would have no referrer.
>
> My final approach to catch the abuser is based on the fact that the
> JavaScript was dynamically loaded from my server and that I can write to
> location.href. So I added this piece of code:
>
> if (top.location.protocol === 'data:') {
>   top.location.href = 'http://example.com/trap/';
> }
>
> But even then the referrer will not be passed to the server. So my proposal
> is that the data URI scheme be exempted from this security behavior.

The problem you outline is not directly tied to the solution you
present.  You can scrape a site and display it as your own without any
fancy tricks, just by downloading all the resources and hosting them
yourself.  This merely consumes a little more bandwidth for the
attacker, since they're hosting the images/etc themselves.

The correct solution to this kind of problem is legal - this is simple
copyright violation.

I'm not sure about the merits of your suggestion otherwise.  It's
reasonable to make data: pages same-origin with their parent when
they're contained within something, but it seems dodgy to make them
same-origin with their *contained* pages as well.  If not done
carefully, that could allow contained pages access to the data: page's
parent as well, or other cross-origin pages that the data: page is
containing.
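A deliberately simplified JavaScript model of that concern, not how any engine tracks origins: an origin is either a (scheme, host, port) tuple or a fresh opaque value, and "can script" is treated as a relation that chains through frames. The frame labels P (parent), D (the data: page) and C (a contained cross-origin page) and their hosts are hypothetical.

```javascript
// Tuple origin: scheme, host and port (the URL parser drops default ports,
// so https://a.example and https://a.example:443 compare equal).
function tupleOrigin(urlString) {
  const u = new URL(urlString);
  return { scheme: u.protocol, host: u.hostname, port: u.port };
}

// Opaque origin: a fresh object that is only ever equal to itself.
function opaqueOrigin() {
  return { opaque: true };
}

function sameOrigin(a, b) {
  if (a.opaque || b.opaque) return a === b;
  return a.scheme === b.scheme && a.host === b.host && a.port === b.port;
}

// Frame tree: parent page P contains data: document D, which contains
// page C from another site.
const P = tupleOrigin('https://news.example/');
const C = tupleOrigin('https://other.example/');

const dToday = opaqueOrigin(); // D with a unique origin
const dInherit = P;            // variant: D inherits its parent's origin

console.log(sameOrigin(dToday, P));   // false
console.log(sameOrigin(dInherit, P)); // true
console.log(sameOrigin(dInherit, C)); // false

// "Can script" edges chain: if C can script D and D can script P, code
// from C can reach P through D. Breadth-first search over the edges.
function reachable(from, to, edges) {
  const seen = new Set([from]);
  const queue = [from];
  while (queue.length > 0) {
    const node = queue.shift();
    if (node === to) return true;
    for (const [a, b] of edges) {
      if (a === node && !seen.has(b)) {
        seen.add(b);
        queue.push(b);
      }
    }
  }
  return false;
}

// D same-origin with its parent only.
const parentOnly = [['D', 'P'], ['P', 'D']];
// D also same-origin with the pages it contains (the "dodgy" variant).
const parentAndContained = [...parentOnly, ['D', 'C'], ['C', 'D']];

console.log(reachable('C', 'P', parentOnly));         // false
console.log(reachable('C', 'P', parentAndContained)); // true
```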

~TJ


[whatwg] Security restriction allows content thievery

2012-07-15 Thread Robert Eisele
Browsers are very restrictive when one tries to access the contents of
different domains (including the scheme), embedded via framesets. This is
normally a good practice, but I'd suggest weakening this restriction for
the data: URI scheme.

I'm currently building an analysis system like Google Analytics, which gets
embedded into a website via a small JavaScript snippet. When I analyzed the
data, I came across a very interesting trick because I got a lot of
requests (with the data from location.href) where the entire website was
embedded into a data:text/html URI - except that all ads of the page were
replaced. Fortunately, my tracking code has been left without
modifications.

But the scary thing is that this way you can monetize foreign content by
simply embedding it somewhere you can direct traffic to. That's pretty
clever, because the original site owner doesn't notice this abuse due to
the fact that top.location.href isn't readable. Or even worse, he would
never notice it at all if he doesn't sniff the URI with JavaScript,
because image files would have no referrer.

My final approach to catch the abuser is based on the fact that the
JavaScript was dynamically loaded from my server and that I can write to
location.href. So I added this piece of code:

if (top.location.protocol === 'data:') {
  top.location.href = 'http://example.com/trap/';
}

But even then the referrer will not be passed to the server. So my proposal
is that the data URI scheme be exempted from this security behavior.
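A hedged sketch of a more defensive form of the check above, assuming the snippet may also run inside frames. Reading `top.location.protocol` when `top` is cross-origin throws a `SecurityError`, which would abort the rest of the tracking script; the hypothetical helper `topIsDataDocument` reports that case as `null`. The mock window objects exist only so the logic can run outside a browser. This does not change the problem described here: a navigation started from a data: document still reaches the server without a Referer header.

```javascript
// Returns true if the top-level document was loaded from a data: URL,
// false if it was not, and null if the top window is cross-origin and its
// location cannot be read (browsers throw a SecurityError there).
function topIsDataDocument(win) {
  try {
    return win.top.location.protocol === 'data:';
  } catch (e) {
    return null;
  }
}

// Browser usage (trap URL taken from the message):
//   if (topIsDataDocument(window) === true) {
//     window.top.location.href = 'http://example.com/trap/';
//   }

// Minimal stand-ins for window objects so the logic can run outside a
// browser; these mocks are illustrative only.
const dataTop = { location: { protocol: 'data:' } };
dataTop.top = dataTop;

const framedInForeignTop = {
  top: {
    location: {
      get protocol() {
        throw new Error('SecurityError: cross-origin access');
      },
    },
  },
};

console.log(topIsDataDocument(dataTop));            // true
console.log(topIsDataDocument(framedInForeignTop)); // null
```

Because a data: document has a unique origin, a script in a frame nested inside it also gets `null` rather than `true`; only a script running in the data: document itself can see the protocol.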



Kind Regards

Robert Eisele
http://www.xarg.org/