Re: [whatwg] Script preloading

2013-08-28 Thread Jonas Sicking
Hi Ryosuke,

Based on the feedback here, it doesn't sound like you are a huge fan
of the original proposal in this thread.

At this point, has any implementation come out in support of the
proposal in this thread as a preferred solution over
noexecute/execute()?

The strongest support I've seen in this thread, though I very well
could have missed some, is "it's better than status quo".

Is that the case?

/ Jonas

On Wed, Aug 28, 2013 at 7:43 PM, Ryosuke Niwa  wrote:
> On Jul 13, 2013, at 5:55 AM, Andy Davies  wrote:
>
>> On 12 July 2013 01:25, Bruno Racineux  wrote:
>>
>>> On browser preloading:
>>>
>>> There seems to be an inherent conflict between 'indiscriminate' Pre-parsers/
>>> PreloadScanner and "responsive design" for mobile. Responsive designs
>>> mostly imply that everything needed for a full-screen desktop layout is
>>> provided in markup to all devices.
>>>
>>>
>> The pre-loader is a tradeoff, it's aiming to increase network utilisation
>> by speculatively downloading resources it can discover.
>>
>> Some of the resources downloaded may not be used, but with good design
>> and mobile-first approaches this number can hopefully be minimised.
>>
>> Even if some unused resources get downloaded, how much does it matter?
>
> It matters a lot when you only have a GSM wireless connection and can barely 
> load anything at all.
>
>> By starting the downloads earlier, connections will be opened sooner, allowing
>> the TCP congestion window to grow sooner. Of course this has to be balanced
>> against visitors who might be paying to download those unused bytes, and
>> whether the unused resources are blocking something on the critical path
>> from being downloaded (I believe some preloaders can re-prioritise resources
>> if they need them before the preloader has downloaded them).
>
> Exactly.  I'd like to make sure whatever API we come up with gives enough 
> flexibility for the UAs to decide whether a given resource needs to be loaded 
> immediately.
>
>
>
> On Jul 12, 2013, at 11:56 AM, Kyle Simpson  wrote:
>
>> My scope (as it always has been) put simply: I want (for all the reasons 
>> here and before) to have a "silver bullet" in script loading, which lets me 
>> load any number of scripts in parallel, and to the extent that is 
>> reasonable, be fully in control of what order they run in, if at all, 
>> responding to conditions AS THE SCRIPTS EXECUTE, not merely as they might 
>> have existed at the time of initial request. I want such a facility because 
>> I want to continue to have LABjs be a best-in-class fully-capable script 
>> loader that sets the standard for best-practice on-demand script loading.
>
>
> Because of the different network conditions and constraints various devices 
> have, I'm wary of any solution that gives full control over when each 
> script is loaded.  While I'm sure large corporations with lots of resources 
> will get this right, I don't want to provide a preloading API that's hard to 
> use for ordinary Web developers.
>
>
> On Jul 15, 2013, at 7:55 AM, Kornel Lesiński  wrote:
>
>> There's a very high overlap between module dependencies and 

Re: [whatwg] Script preloading

2013-08-28 Thread Ryosuke Niwa
On Jul 13, 2013, at 5:55 AM, Andy Davies  wrote:

> On 12 July 2013 01:25, Bruno Racineux  wrote:
> 
>> On browser preloading:
>> 
>> There seems to be an inherent conflict between 'indiscriminate' Pre-parsers/
>> PreloadScanner and "responsive design" for mobile. Responsive designs
>> mostly imply that everything needed for a full-screen desktop layout is
>> provided in markup to all devices.
>> 
>> 
> The pre-loader is a tradeoff, it's aiming to increase network utilisation
> by speculatively downloading resources it can discover.
> 
> Some of the resources downloaded may not be used, but with good design
> and mobile-first approaches this number can hopefully be minimised.
> 
> Even if some unused resources get downloaded, how much does it matter?

It matters a lot when you only have a GSM wireless connection and can barely 
load anything at all.

> By starting the downloads earlier, connections will be opened sooner, allowing
> the TCP congestion window to grow sooner. Of course this has to be balanced
> against visitors who might be paying to download those unused bytes, and
> whether the unused resources are blocking something on the critical path
> from being downloaded (I believe some preloaders can re-prioritise resources
> if they need them before the preloader has downloaded them).

Exactly.  I'd like to make sure whatever API we come up with gives enough 
flexibility for the UAs to decide whether a given resource needs to be loaded 
immediately.



On Jul 12, 2013, at 11:56 AM, Kyle Simpson  wrote:

> My scope (as it always has been) put simply: I want (for all the reasons here 
> and before) to have a "silver bullet" in script loading, which lets me load 
> any number of scripts in parallel, and to the extent that is reasonable, be 
> fully in control of what order they run in, if at all, responding to 
> conditions AS THE SCRIPTS EXECUTE, not merely as they might have existed at 
> the time of initial request. I want such a facility because I want to 
> continue to have LABjs be a best-in-class fully-capable script loader that 
> sets the standard for best-practice on-demand script loading.


Because of the different network conditions and constraints various devices 
have, I'm wary of any solution that gives full control over when each 
script is loaded.  While I'm sure large corporations with lots of resources 
will get this right, I don't want to provide a preloading API that's hard to 
use for ordinary Web developers.


On Jul 15, 2013, at 7:55 AM, Kornel Lesiński  wrote:

> There's a very high overlap between module dependencies and 

Re: [whatwg] Zip archives as first-class citizens

2013-08-28 Thread Mark Nottingham
Hey Anne,

On 28/08/2013, at 11:32 PM, Anne van Kesteren  wrote:
> 
> * Fragments: fail to work well for URLs relative to a zip archive.
> 
> Fragments are conceptually the cleanest as the only part of a URL
> that's supposed to depend on the Content-Type is the fragment.
> However, if you want to link to an ID inside an HTML resource you'd
> have to do #path=test.html&id=test which would require adding
> knowledge to the HTML resource that it is contained in a zip archive
> and have special processing based on that. And not just HTML, same
> goes for CSS or JavaScript.


I'm sure you've thought about this more than I have, but can you humour me and 
dig in a bit here? 

If I wanted to link *within* the HTML, it could still be , 
correct?

Likewise, in the CSS if I wanted to define style for that id, it'd still be 
#test { ... }.

AIUI the case that's more of an issue is if I want to link from foo.html to 
bar.html#test, both inside the zip.

It seems to me that you need *some* idea of the structure of the zip inside 
there -- just as you need some idea of the structure of the Web site when 
linking between HTTP resources. The question to me is whether you can make it 
compatible with existing syntax to make it go down easier. 

E.g. if this would work: 

Couldn't that be done by saying that for URIs inside a ZIP file, the base URI 
is effectively an authority-less scheme? 

E.g., for "foo.html" the base URI would be "zip://foo.html". 

The zip URI scheme wouldn't be used in practice, just for rooting relative URIs 
inside of ZIP files. From the outside, the fragment identifier syntax for the 
zip format would dispatch appropriately, e.g.,

http://example.com/stuff.zip#path=foo.html&id=test

I *think* the end effect here would be that from the inside, HTML, CSS and JS 
wouldn't have to be changed to be zipped. From the outside, if you want to link 
*into* a zip file, you have to be aware of its structure, but that's really 
always going to be the case, isn't it?
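[Editorial note: for illustration, a fragment of this shape splits cleanly with existing query-string machinery. A rough sketch in Python; the `path` and `id` keys are just the ones floated in this thread, not anything standardized:]

```python
from urllib.parse import urlsplit, parse_qs

def parse_zip_fragment(url):
    """Split a URL like http://example.com/stuff.zip#path=foo.html&id=test
    into (archive URL, path inside the zip, fragment for the inner resource)."""
    parts = urlsplit(url)
    params = parse_qs(parts.fragment)
    archive = parts._replace(fragment="").geturl()
    return (archive,
            params.get("path", [None])[0],
            params.get("id", [None])[0])

print(parse_zip_fragment("http://example.com/stuff.zip#path=foo.html&id=test"))
# → ('http://example.com/stuff.zip', 'foo.html', 'test')
```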

Just a thought.

Cheers,


--
Mark Nottingham   http://www.mnot.net/





Re: [whatwg] Elements should be removed from the past names map once it's no longer associated with the form element

2013-08-28 Thread Ryosuke Niwa
Since Gecko has already implemented this behavior, I've gone ahead and changed 
WebKit's behavior:
http://trac.webkit.org/changeset/154761

- R. Niwa

On Aug 26, 2013, at 7:09 PM, Boris Zbarsky  wrote:

> On 8/26/13 9:51 PM, Ryosuke Niwa wrote:
>> That's good to hear.  So we're definitely in agreement with respect to this 
>> new behavior.
> 
> I filed https://www.w3.org/Bugs/Public/show_bug.cgi?id=23073
> 
> -Boris



Re: [whatwg] [blink-dev] Re: Intent to Update TextTrackCue and Add VTTCue

2013-08-28 Thread Ian Hickson
On Fri, 23 Aug 2013, Glenn Adams wrote:
> On Fri, Aug 23, 2013 at 4:16 PM, Ian Hickson  wrote:
> > On Fri, 23 Aug 2013, Glenn Adams wrote:
> > >
> > > As has been pointed out a number of times, there are already 
> > > implementations and JS client code using this technique.
> >
> > Where?
> 
> I think I've pointed this out to you at least four times before, but 
> I'll do so again:
> 
> http://www.cablelabs.com/specifications/CL-SP-HTML5-MAP-I02-120510.pdf
> 
> See section 5.2 Closed Captioning.

I see nothing in that section that is either an implementation or JS 
client code.

On Sat, 24 Aug 2013, Glenn Adams wrote:
> On Sat, Aug 24, 2013 at 9:48 AM, PhistucK  wrote:
> > 
> > But where is it used?
>
> This specification has been implemented by CableLabs in a reference 
> implementation of a DLNA defined TV/STB platform for remote user 
> interfaces. The "generic" usage implemented there is being used by 
> television service provider operators to access both MPEG-2 PSI and 
> CEA-608 data in JS client code.

Changing how the Web works wouldn't, as far as I can tell, have any impact 
on this. So this doesn't provide a reason to avoid changing the spec.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Canonical Image and Color

2013-08-28 Thread Brian Blakely
On Fri, Jul 12, 2013 at 1:32 PM, Ian Hickson  wrote:

> You are welcome to register these on the wiki and convince people to use
> them, sure. Seems like they already have solutions, though, as you show:


Would you kindly link me to the wiki?


> Sounds like this is already solved, then.
>

In a sense, but ultimately with caveats.  OpenGraph is very useful right
now, but Facebook can unilaterally change it or wipe it out entirely (both
have already happened to a degree).  Microsoft's color properties possess
mechanics that are extremely specific to how IE uses color in Windows.

> Why isn't  sufficient?


That should suffice, I agree.  Meta Image can serve as a fallback when an
icon is not available – specifically, as an alternative to using a
programmatic screenshot of the app.  The principal concept in this Meta
Image proposal is to specify a graphic that represents the page content.


Re: [whatwg] Zip archives as first-class citizens

2013-08-28 Thread Eric Uhrhane
On Wed, Aug 28, 2013 at 10:21 AM, Glenn Maynard  wrote:
> On Wed, Aug 28, 2013 at 12:07 PM, Eric Uhrhane  wrote:
>>
>> We've covered this several times.  The directory records in a zip can
>> be superseded by further directories later in the archive, so you
>> can't trust that you've got the right directory until you're done
>> downloading.
>
> Both the local headers and the central record can be wrong.  (As mentioned
> on IRC the other day, apparently EPUB files often have broken central
> records, so eBook readers probably prefer the local records.)  If they're
> out of sync, then they'll always be broken in some clients.
>
> We just have to make sure that the record that takes priority in any
> particular case is well-defined, so we have interop.  If some malformed
> archives won't work in some cases as a result, using a different format
> isn't an improvement: that just means *zero* existing archives would work.

Broken files don't work, and I'm OK with that.  I'm saying that legal
zips can have multiple directories, where the definitive one is last
in the file, so it's not a good format for streaming.  If you're
saying that you want to change the format to make an earlier directory
definitive, that's going to break compat for the existing archives
everywhere, and would be confusing enough that we should just go with
a different archive format that doesn't require changes.  Or at least
don't call it zip when you're done messing with the spec.

> This applies to various other aspects of the format: the maximum supported
> length of comments and handling of duplicate filenames, for example.  This
> would all need to be specified; the ZIP "AppNote" doesn't specify a parser
> or error handling in the way the web needs, it just describes the format.
>
> --
> Glenn Maynard
>


Re: [whatwg] Zip archives as first-class citizens

2013-08-28 Thread Glenn Maynard
On Wed, Aug 28, 2013 at 12:07 PM, Eric Uhrhane  wrote:

> We've covered this several times.  The directory records in a zip can
> be superseded by further directories later in the archive, so you
> can't trust that you've got the right directory until you're done
> downloading.
>

Both the local headers and the central record can be wrong.  (As mentioned
on IRC the other day, apparently EPUB files often have broken central
records, so eBook readers probably prefer the local records.)  If they're
out of sync, then they'll always be broken in some clients.

We just have to make sure that the record that takes priority in any
particular case is well-defined, so we have interop.  If some malformed
archives won't work in some cases as a result, using a different format
isn't an improvement: that just means *zero* existing archives would work.

This applies to various other aspects of the format: the maximum supported
length of comments and handling of duplicate filenames, for example.  This
would all need to be specified; the ZIP "AppNote" doesn't specify a parser
or error handling in the way the web needs, it just describes the format.
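[Editorial note: to make the duplicate-filename ambiguity concrete, here is a sketch using Python's `zipfile`. Its answer — the last central-directory entry wins — is just one implementation's choice, not something a web spec would necessarily mandate:]

```python
import io
import zipfile

# Build an archive containing two entries with the same name.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("a.txt", "first")
    z.writestr("a.txt", "second")  # legal per the format; tools differ on it

with zipfile.ZipFile(buf) as z:
    print(z.namelist())     # both entries survive in the central directory
    print(z.read("a.txt"))  # this implementation resolves the name to the
                            # *last* central-directory entry: b'second'
```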

-- 
Glenn Maynard


Re: [whatwg] Zip archives as first-class citizens

2013-08-28 Thread Matthew Kaufman

(resending)

On Aug 28, 2013, at 6:32 AM, Anne van Kesteren  wrote:


A couple of us have been toying around with the idea of making zip
archives first-class citizens on the web.


This sounds like a great opening for a discussion about the pros and cons of 
doing such a thing. But until such a discussion has happened, isn't it a little 
premature to worry about the URL details?

I'd start with things like "what is the fallback when using a browser behind an 
enterprise firewall that blocks all zip files?" and "what potential security 
vulnerabilities do we create by having the browser download a zip file and 
parse the contents?" and maybe "how does this influence the design of 
memory-constrained browsers?"

Matthew Kaufman

Sent from my iPad



Re: [whatwg] Zip archives as first-class citizens

2013-08-28 Thread Eric Uhrhane
On Wed, Aug 28, 2013 at 9:43 AM, Glenn Maynard  wrote:
> On Wed, Aug 28, 2013 at 4:54 PM, Eric Uhrhane  wrote:
>>
>> > Without commenting on the other parts of the proposal, let me just
>> > mention that every time .zip support comes up, we notice that it's not
>> > a great web archive format because it's not streamable.  That is, you
>> > can't actually use any of the contents until you've downloaded the
>> > whole file.
>
>
> ZIPs support both streaming and random access.  You can access files in a
> ZIP as the ZIP is downloaded, using the local file headers.  In this mode,
> they work like tars (except that you don't have to decompress unneeded data,
> like you do with a tar.gz).

Anne's quote snipped off an important piece of my message [which
apparently didn't get out due to the too-many-recipients problem]:

> [Before you respond that it's streamable, please look in the archives
> for the rebuttal.]

We've covered this several times.  The directory records in a zip can
be superseded by further directories later in the archive, so you
can't trust that you've got the right directory until you're done
downloading.

> This feature wouldn't want that, since you need to read the whole file up to
> the file you want.  Instead, it wants random access, which ZIPs also
> support.  You download the central directory record first, to find out where
> the file you want lies in the archive, then download just the slice of data
> you need.  You don't need to download the whole file.
>
> --
> Glenn Maynard
>


Re: [whatwg] Zip archives as first-class citizens

2013-08-28 Thread Anne van Kesteren
> On Wed, Aug 28, 2013 at 8:47 AM, Eric U  wrote:
> Without commenting on the other parts of the proposal, let me just
> mention that every time .zip support comes up, we notice that it's not
> a great web archive format because it's not streamable.  That is, you
> can't actually use any of the contents until you've downloaded the
> whole file.
>
> Perhaps some other archive format would be a better fit for the web?

My take on this is that zip archives are ubiquitous. That makes this
feature easy to deploy from the start. If zip archives turn out to be
a successful feature we can add support for an alternative format down
the line that handles that better. Adding zip archive support will
also make it easier to work with OOXML, EPUB, etc.


-- 
http://annevankesteren.nl/


Re: [whatwg] Zip archives as first-class citizens

2013-08-28 Thread Eric Uhrhane
Again from the right address...

On Wed, Aug 28, 2013 at 8:47 AM, Eric U  wrote:
> Without commenting on the other parts of the proposal, let me just
> mention that every time .zip support comes up, we notice that it's not
> a great web archive format because it's not streamable.  That is, you
> can't actually use any of the contents until you've downloaded the
> whole file.
>
> Perhaps some other archive format would be a better fit for the web?
>
> [Before you respond that it's streamable, please look in the archives
> for the rebuttal.]
>
>  Eric
>
>
> On Wed, Aug 28, 2013 at 6:32 AM, Anne van Kesteren  wrote:
>> A couple of us have been toying around with the idea of making zip
>> archives first-class citizens on the web. What we want to support:
>>
>> * Group a bunch of JavaScript files together in a single resource and
>> refer to them individually for upcoming JavaScript modules.
>> * Package a bunch of related resources together for a game or
>> applications (e.g. icons).
>> * Support self-contained packages, like Flash-ads or Flash-based games.
>>
>> Using zip archives for this makes sense as it has broad tooling
>> support. To lower adoption cost no special configuration should be
>> needed. Existing zip archives should be able to fit right in.
>>
>>
>> The above means we need URLs for zip archives. That is:
>>
>>   
>>
>> should work. As well as
>>
>>   
>>
>> and test.html should be able to contain URLs that reference other
>> resources inside the zip archive.
>>
>>
>> We have thought of three approaches for zip URL design thus far:
>>
>> * Using a sub-scheme (zip) with a zip-path (after !):
>> zip:http://www.example.org/zip!image.gif
>> * Introducing a zip-path (after %!): http://www.example.org/zip%!image.gif
>> * Using media fragments: http://www.example.org/zip#path=image.gif
>>
>> High-level drawbacks:
>>
>> * Sub-scheme: requires changing the URL syntax with both sub-scheme
>> and zip-path.
>> * Zip-path: requires changing the URL syntax.
>> * Fragments: fail to work well for URLs relative to a zip archive.
>>
>> Fragments are conceptually the cleanest as the only part of a URL
>> that's supposed to depend on the Content-Type is the fragment.
>> However, if you want to link to an ID inside an HTML resource you'd
>> have to do #path=test.html&id=test which would require adding
>> knowledge to the HTML resource that it is contained in a zip archive
>> and have special processing based on that. And not just HTML, same
>> goes for CSS or JavaScript.
>>
>> I'm not sure we need to consider sub-scheme if zip-path can work as
>> it's more complex and not very well thought out. E.g. imagine
>> view-source:zip:http://www.example.org/zip!test.html. (I hope we never
>> need to standardize view-source and that it can be restricted to the
>> address bar in browsers.)
>>
>> zip-path makes zip archive packaging by far the easiest. If we use %!
>> as separator that would cause a network error in some existing
>> browsers (due to an illegal %), which means it's extensible there,
>> though not backwards compatible.
>>
>> We'd adjust the URL parser to build a zip-path once %! is encountered.
>> And relative URLs would first look if there's a zip-path and work
>> against that, and use path otherwise.
>>
>> Fetching would always use the path. If there's a zip-path and the
>> returned resource is not a zip archive it would cause a network error.
>>
>>
>> As for nested zip archives. Andrea suggested we should support this,
>> but that would require zip-path to be a sequence of paths. I think we
>> never want to allow relative URLs to escape the top-most zip archive.
>> But I suppose we could support it in a way that
>>
>>   %!test.zip!test.html
>>
>> goes one level deeper. And "../image.gif" in test.html looks in the
>> enclosing zip. And "../../image.gif" in test.html looks in the
>> enclosing zip as well because it cannot ever be relative to the path,
>> only the zip-path.
>>
>>
>> --
>> http://annevankesteren.nl/
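[Editorial note: the `%!` split and the relative-URL rules described in the quoted proposal can be sketched roughly as follows. The separator and the "never escape the archive" clamping are this thread's proposal only, and `posixpath` merely stands in for real URL path resolution:]

```python
import posixpath

SEP = "%!"

def split_zip_url(url):
    """Split http://example.org/zip%!dir/test.html into
    (URL of the archive itself, zip-path), or (url, None) without a zip-path."""
    if SEP in url:
        archive, zip_path = url.split(SEP, 1)
        return archive, zip_path
    return url, None

def resolve(base_url, relative):
    """Resolve a relative reference against the zip-path, never letting it
    escape the archive (per the quoted proposal's '../' behaviour)."""
    archive, zip_path = split_zip_url(base_url)
    if zip_path is None:
        raise ValueError("not a zip URL; use normal URL resolution")
    base_dir = posixpath.dirname(zip_path)
    joined = posixpath.normpath(posixpath.join(base_dir, relative))
    # "../" past the archive root clamps to the root rather than escaping.
    while joined.startswith("../"):
        joined = joined[3:]
    return archive + SEP + joined

print(resolve("http://www.example.org/zip%!sub/test.html", "../image.gif"))
# → http://www.example.org/zip%!image.gif
```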


Re: [whatwg] Zip archives as first-class citizens

2013-08-28 Thread Thaddee Tyl
The idea of making zip content (and hopefully XZ content) available
feels right, but adding complexity doesn't.

On Wed, Aug 28, 2013 at 1:32 PM, Anne van Kesteren  wrote:
> We have thought of three approaches for zip URL design thus far:
>
> * Using a sub-scheme (zip) with a zip-path (after !):
> zip:http://www.example.org/zip!image.gif
> * Introducing a zip-path (after %!): http://www.example.org/zip%!image.gif
> * Using media fragments: http://www.example.org/zip#path=image.gif

W.r.t. the sub-scheme, KDE kioslaves have something highly similar
(available for instance in their file managers). The syntax is the following:

zip:<path to zip file>/<path inside the zip>

For instance,

zip:/home/tyl/vault.zip/js/simplex.js

Sure, a "real" directory can have a .zip extension, but spread across
all KDE users since kioslave's inception, more than 10 years ago, that
hasn't been raised as an issue (at least, I couldn't find one through
their bug tracker).

As a result, may I suggest this?

zip:http://www.example.org/js.zip/simplex.js

W.r.t. using fragments, which I agree is the cleanest approach,
can we change the URL parsing algorithm
to allow any number of fragments?
It would require adding # to the simple encode set, which
can have consequences I didn't think of.

http://example.org/assets.zip#html/frame.html#editor

(Is there a reason we should have a path=, then?)

That would also take care of nested zips.


Re: [whatwg] Zip archives as first-class citizens

2013-08-28 Thread Matthew Kaufman
On Aug 28, 2013, at 6:32 AM, Anne van Kesteren  wrote:

> A couple of us have been toying around with the idea of making zip
> archives first-class citizens on the web. 

This sounds like a great opening for a discussion about the pros and cons of 
doing such a thing. But until such a discussion has happened, isn't it a little 
premature to worry about the URL details?

I'd start with things like "what is the fallback when using a browser behind an 
enterprise firewall that blocks all zip files?" and "what potential security 
vulnerabilities do we create by having the browser download a zip file and 
parse the contents?" and maybe "how does this influence the design of 
memory-constrained browsers?"

Matthew Kaufman

Sent from my iPad

Re: [whatwg] Zip archives as first-class citizens

2013-08-28 Thread Boris Zbarsky

On 8/28/13 12:20 PM, Jonas Sicking wrote:

> * It makes it impossible to create a relative URL from inside the
> zip file to refer to something on the same server but outside of the
> zip file.


I think this comes back to use cases.

If the idea of having the zip is "here is stuff that should live in its 
own world", then we do not want easy ways to get out of it via relative 
URIs.


If the idea is to have "here is a fancy way of representing a directory" 
then relative URIs should Just Work across the zip boundary, like they 
would for any other directory.


Which model are we working with here?  Or some other one that doesn't 
match either of those two?


-Boris


Re: [whatwg] Zip archives as first-class citizens

2013-08-28 Thread Glenn Maynard
On Wed, Aug 28, 2013 at 4:54 PM, Eric Uhrhane  wrote:

> > Without commenting on the other parts of the proposal, let me just
> > mention that every time .zip support comes up, we notice that it's not
> > a great web archive format because it's not streamable.  That is, you
> > can't actually use any of the contents until you've downloaded the
> > whole file.
>

ZIPs support both streaming and random access.  You can access files in a
ZIP as the ZIP is downloaded, using the local file headers.  In this mode,
they work like tars (except that you don't have to decompress unneeded
data, like you do with a tar.gz).

This feature wouldn't want that, since you need to read the whole file up
to the file you want.  Instead, it wants random access, which ZIPs also
support.  You download the central directory record first, to find out
where the file you want lies in the archive, then download just the slice
of data you need.  You don't need to download the whole file.
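[Editorial note: the random-access mode described above can be sketched against the format directly. The 22-byte end-of-central-directory record (assuming no trailing archive comment) is located by scanning backwards for its signature, and it points at the central-directory slice. A rough Python sketch, using an in-memory archive to stand in for the remote resource:]

```python
import io
import struct
import zipfile

# Build a small archive to stand in for the remote resource.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("image.gif", b"GIF89a...")
    z.writestr("test.html", b"<!doctype html>")
data = buf.getvalue()

# Locate the end-of-central-directory record (signature PK\x05\x06).
# A real client would fetch only the tail of the file with a Range request.
eocd_off = data.rfind(b"PK\x05\x06")
(sig, disk, cd_disk, n_disk, n_total,
 cd_size, cd_offset, comment_len) = struct.unpack(
    "<4sHHHHIIH", data[eocd_off:eocd_off + 22])

# This slice is the only extra ranged download needed to list the archive;
# individual files can then be fetched by their recorded offsets.
central_directory = data[cd_offset:cd_offset + cd_size]
print(n_total, central_directory[:4])  # 2 entries; slice starts with PK\x01\x02
```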

-- 
Glenn Maynard


Re: [whatwg] Zip archives as first-class citizens

2013-08-28 Thread Boris Zbarsky

On 8/28/13 11:40 AM, Anne van Kesteren wrote:

> On Wed, Aug 28, 2013 at 4:04 PM, Boris Zbarsky  wrote:
>> What's the issue with that?  Gecko supports that (with jar:, not zip:),
>> fwiw.
>
> As far as the web platform is considered today, URL objects are just
> that. In Gecko you either have a URL object, or a linked list of URL
> objects.


In Gecko you always have a URL object.

A small number of operations (extracting the origin is the main one) 
need to know about the fact that a URL object may delegate the work to 
some other URL object.



> I'd likewise be interested to hear from other implementers.


Yes, this is the key part.



Re: [whatwg] Zip archives as first-class citizens

2013-08-28 Thread Boris Zbarsky

On 8/28/13 11:50 AM, Michal Zalewski wrote:

> 1) Both jar: and mhtml: (which work or worked in a very similar way)
> have caused problems in absence of strict Content-Type matching.


This is an issue for both versions of this proposal.  We'd need to do 
strict matching on the type.


-Boris


Re: [whatwg] Zip archives as first-class citizens

2013-08-28 Thread Anne van Kesteren
On Wed, Aug 28, 2013 at 4:50 PM, Michal Zalewski  wrote:
> 1) Both jar: and mhtml: (which work or worked in a very similar way)
> have caused problems in absence of strict Content-Type matching. In
> essence, it is relatively easy for something like a valid
> user-supplied text document or an image to be also a valid archive.
> Such archives may end up containing "files" that the owner of the
> website never intended to host in their origin.

This also seems like a problem for being able to navigate to a zip
archive's resources. E.g. if you have a hosting service for zip
archives someone could upload one with an HTML subresource that
executes some malicious script and trick users into navigating to
http://hosting.example/pinkpony%!look.html

I wonder if that is enough of a concern to not support navigating to
zip resources at all. Or is Gecko's jar support enough to not have to
care about this? (But we probably should do more than sniffing as you
point out.)


> 2) Both schemes also have a long history of breaking origin / host
> name parsing in various places in the browser and introducing security
> bugs.


-- 
http://annevankesteren.nl/


Re: [whatwg] Zip archives as first-class citizens

2013-08-28 Thread Jonas Sicking
On Wed, Aug 28, 2013 at 8:04 AM, Boris Zbarsky  wrote:
> On 8/28/13 9:32 AM, Anne van Kesteren wrote:
>>
>> I'm not sure we need to consider sub-scheme if zip-path can work as
>> it's more complex and not very well thought out. E.g. imagine
>> view-source:zip:http://www.example.org/zip!test.html.
>
>
> What's the issue with that?  Gecko supports that (with jar:, not zip:),
> fwiw.

I have two concerns with the scheme-based approach.

* It dramatically complicates origin handling. This is something we've
seen multiple times in gecko and something that I expect authors will
struggle with too.

* It makes it impossible to create a relative URL from inside the
zip file to refer to something on the same server but outside of the
zip file. Since anything outside of the zip file uses a different
scheme, it means that you have to use an absolute URL. Not even URLs
starting with "/" nor "//" can be used.

> 3)  We have implementation experience with the "sub-scheme" approach and we
> know it can work just fine (existence proof is jar: in Gecko).  The main
> difficulty it introduces is that computing the origin needs to be done via
> object accessors, not string-parsing...  Do we have any implementation
> experience with "zip-path"-like approaches?

I don't know about "can work just fine". Sure, if everyone does the
right thing, then it works. But we're having to strictly enforce that
no one does string parsing by hand and instead use URL objects and
Principal objects. Neither of which really are an option on the web
right now as all URL-related APIs use strings.

> I don't think relative URIs should ever escape a zip archive (though I do
> appreciate the way that would let someone replace directories with zipped-up
> versions of those directories).  The reason for that is that allowing it
> sometimes but not others seems really weird to me, and it seems like we
> don't want to allow it for toplevel zip archives.

Why not?

/ Jonas


Re: [whatwg] Zip archives as first-class citizens

2013-08-28 Thread Anne van Kesteren
Resending. I recommend that people replying trim the address list as
apparently "Too many recipients to the message" is a thing for this
mailing list.

On Wed, Aug 28, 2013 at 4:54 PM, Eric Uhrhane  wrote:
> Without commenting on the other parts of the proposal, let me just
> mention that every time .zip support comes up, we notice that it's not
> a great web archive format because it's not streamable.  That is, you
> can't actually use any of the contents until you've downloaded the
> whole file.
>
> Perhaps some other archive format would be a better fit for the web?

My take on this is that zip archives are ubiquitous. That makes this
feature easy to deploy from the start. If zip archives turn out to be
a successful feature we can add support for an alternative format down
the line that handles that better. Adding zip archive support will
also make it easier to work with OOXML, EPUB, etc.


-- 
http://annevankesteren.nl/


Re: [whatwg] Zip archives as first-class citizens

2013-08-28 Thread Michal Zalewski
Two implementation risks to keep in mind:

1) Both jar: and mhtml: (which work or worked in a very similar way)
have caused problems in absence of strict Content-Type matching. In
essence, it is relatively easy for something like a valid
user-supplied text document or an image to be also a valid archive.
Such archives may end up containing "files" that the owner of the
website never intended to host in their origin.

2) Both schemes also have a long history of breaking origin / host
name parsing in various places in the browser and introducing security
bugs.

/mz


Re: [whatwg] Zip archives as first-class citizens

2013-08-28 Thread Anne van Kesteren
On Wed, Aug 28, 2013 at 4:04 PM, Boris Zbarsky  wrote:
> What's the issue with that?  Gecko supports that (with jar:, not zip:),
> fwiw.

As far as the web platform is considered today, URL objects are just
that. In Gecko you either have a URL object, or a linked list of URL
objects. I guess the question is whether supporting a linked list of
URL objects in addition to plain URL objects is worth it just for zip
archive support. Model-wise it's quite a bit of added complexity.

I'd likewise be interested to hear from other implementers.


-- 
http://annevankesteren.nl/


Re: [whatwg] Zip archives as first-class citizens

2013-08-28 Thread Boris Zbarsky

On 8/28/13 9:32 AM, Anne van Kesteren wrote:

I'm not sure we need to consider sub-scheme if zip-path can work as
it's more complex and not very well thought out. E.g. imagine
view-source:zip:http://www.example.org/zip!test.html.


What's the issue with that?  Gecko supports that (with jar:, not zip:), 
fwiw.


My concerns with the zip-path approach are as follows:

1)  It requires doing the zip processing in a new layer on top of 
whatever pluggable architecture you have for schemes.  The zip: approach 
nicely encapsulates things so that the protocol handler for zip: 
delegates to the inner URI for the archive fetch and then knows how to 
process it.  It might be possible to do the zip processing by totally 
rewriting how browsers do fetch to interpose this zip-processing layer, 
but that seems like a nontrivial undertaking compared to having an 
orthogonal zip: handler that's invoked explicitly.  I would be 
interested in knowing what other implementors think about how 
implementable the two options are in their architectures.


2)  It changes semantics of existing URIs that happen to contain %!. 
I'm specifically worried about data: URIs, though Gordon points out that 
some http URIs may also be affected.


3)  We have implementation experience with the "sub-scheme" approach and 
we know it can work just fine (existence proof is jar: in Gecko).  The 
main difficulty it introduces is that computing the origin needs to be 
done via object accessors, not string-parsing...  Do we have any 
implementation experience with "zip-path"-like approaches?



As for nested zip archives: Andrea suggested we should support this,
but that would require zip-path to be a sequence of paths. I think we
never want to allow relative URLs to escape the top-most zip archive.
But I suppose we could support nesting in a way that

   %!test.zip!test.html

goes one level deeper. And "../image.gif" in test.html looks in the
enclosing zip.


I don't think relative URIs should ever escape a zip archive (though I 
do appreciate the way that would let someone replace directories with 
zipped-up versions of those directories).  The reason for that is that 
allowing it sometimes but not others seems really weird to me, and it 
seems like we don't want to allow it for toplevel zip archives.


-Boris


Re: [whatwg] Zip archives as first-class citizens

2013-08-28 Thread Gordon P. Hemsley

On 8/28/13 9:32 AM, Anne van Kesteren wrote:

We have thought of three approaches for zip URL design thus far:

* Using a sub-scheme (zip) with a zip-path (after !):
zip:http://www.example.org/zip!image.gif
* Introducing a zip-path (after %!): http://www.example.org/zip%!image.gif
* Using media fragments: http://www.example.org/zip#path=image.gif

High-level drawbacks:

* Sub-scheme: requires changing the URL syntax with both sub-scheme
and zip-path.
* Zip-path: requires changing the URL syntax.
* Fragments: fail to work well for URLs relative to a zip archive.

Fragments are conceptually the cleanest as the only part of a URL
that's supposed to depend on the Content-Type is the fragment.
However, if you want to link to an ID inside an HTML resource you'd
have to do #path=test.html&id=test which would require adding
knowledge to the HTML resource that it is contained in a zip archive
and have special processing based on that. And not just HTML, same
goes for CSS or JavaScript.

I'm not sure we need to consider sub-scheme if zip-path can work as
it's more complex and not very well thought out. E.g. imagine
view-source:zip:http://www.example.org/zip!test.html. (I hope we never
need to standardize view-source and that it can be restricted to the
address bar in browsers.)

The zip-path approach makes zip archive packaging by far the easiest.
If we use %! as the separator, it would cause a network error in some
existing browsers (due to an illegal %), which means the syntax is
extensible there, though not backwards compatible.

We'd adjust the URL parser to build a zip-path once %! is encountered.
And relative URLs would first look if there's a zip-path and work
against that, and use path otherwise.

Fetching would always use the path. If there's a zip-path and the
returned resource is not a zip archive it would cause a network error.

As for nested zip archives: Andrea suggested we should support this,
but that would require zip-path to be a sequence of paths. I think we
never want to allow relative URLs to escape the top-most zip archive.
But I suppose we could support nesting in a way that

   %!test.zip!test.html

goes one level deeper. And "../image.gif" in test.html looks in the
enclosing zip. And "../../image.gif" in test.html looks in the
enclosing zip as well because it cannot ever be relative to the path,
only the zip-path.



As the following URLs suggest, %! (or %-anything) will likely not 
work for ZIP files generated by a script using the query portion of the 
URL: the path information will be subsumed into the last query value 
without causing a network error:


http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1%!example.png
http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1%/example.png
http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1?example.png

(And feel free to use that script to try out any other combos.)
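Gordon's observation can be illustrated with Python's standard URL parser (a stand-in here for browser parsing, not a claim about any particular engine): a "%!" inside the query string is simply part of the last value, so there is no path for a zip-path rule to latch onto.

```python
from urllib.parse import urlsplit, parse_qs

url = ("http://whatwg.gphemsley.org/url_test.php"
       "?file=test.zip&spacer=1%!example.png")
parts = urlsplit(url)

# The path is untouched; "%!" is an invalid percent-escape and is
# left as-is inside the value of the last query parameter.
```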

However, since fragments (i.e. anything beginning with '#') are already 
not sent to the server, what if you modified the URL parser to use a 
special hash-prefix combo that indicates the path? Then you could avoid 
the problem of having to make documents aware of the fact that they're 
in a ZIP because the hash-prefix combo would come before the plain hash 
which holds the ID.


So, for example:

http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1#/example.html#middle

Then you could also take the opportunity to spec the #! prefix (and 
other hash-combo prefixes) that is used by a lot of sites nowadays.
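A sketch of Gordon's hash-prefix idea (the "#/" syntax is his proposal, not anything specified): a fragment beginning with "#/" names the entry inside the archive, and a second "#" carries the ordinary fragment ID, so the inner document needs no awareness of the ZIP.

```python
def split_hash_prefix(url):
    """Split a URL into (base, zip_entry, fragment_id).

    zip_entry is None when the fragment does not start with "/".
    """
    base, _, fragment = url.partition("#")
    if fragment.startswith("/"):
        entry, _, inner_id = fragment[1:].partition("#")
        return base, entry, inner_id or None
    return base, None, fragment or None

result = split_hash_prefix(
    "http://whatwg.gphemsley.org/url_test.php"
    "?file=test.zip&spacer=1#/example.html#middle")
```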


--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/


[whatwg] Zip archives as first-class citizens

2013-08-28 Thread Anne van Kesteren
A couple of us have been toying around with the idea of making zip
archives first-class citizens on the web. What we want to support:

* Group a bunch of JavaScript files together in a single resource and
refer to them individually for upcoming JavaScript modules.
* Package a bunch of related resources together for a game or
applications (e.g. icons).
* Support self-contained packages, like Flash-ads or Flash-based games.

Using zip archives for this makes sense, as they have broad tooling
support. To lower adoption costs, no special configuration should be
needed. Existing zip archives should be able to fit right in.


The above means we need URLs for zip archives. That is:

  

should work. As well as

  

and test.html should be able to contain URLs that reference other
resources inside the zip archive.


We have thought of three approaches for zip URL design thus far:

* Using a sub-scheme (zip) with a zip-path (after !):
zip:http://www.example.org/zip!image.gif
* Introducing a zip-path (after %!): http://www.example.org/zip%!image.gif
* Using media fragments: http://www.example.org/zip#path=image.gif

High-level drawbacks:

* Sub-scheme: requires changing the URL syntax with both sub-scheme
and zip-path.
* Zip-path: requires changing the URL syntax.
* Fragments: fail to work well for URLs relative to a zip archive.

Fragments are conceptually the cleanest as the only part of a URL
that's supposed to depend on the Content-Type is the fragment.
However, if you want to link to an ID inside an HTML resource you'd
have to do #path=test.html&id=test which would require adding
knowledge to the HTML resource that it is contained in a zip archive
and have special processing based on that. And not just HTML, same
goes for CSS or JavaScript.

I'm not sure we need to consider sub-scheme if zip-path can work as
it's more complex and not very well thought out. E.g. imagine
view-source:zip:http://www.example.org/zip!test.html. (I hope we never
need to standardize view-source and that it can be restricted to the
address bar in browsers.)

The zip-path approach makes zip archive packaging by far the easiest.
If we use %! as the separator, it would cause a network error in some
existing browsers (due to an illegal %), which means the syntax is
extensible there, though not backwards compatible.

We'd adjust the URL parser to build a zip-path once %! is encountered.
And relative URLs would first look if there's a zip-path and work
against that, and use path otherwise.

Fetching would always use the path. If there's a zip-path and the
returned resource is not a zip archive it would cause a network error.
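The parsing and resolution behavior described above can be sketched as follows. This is an illustration of the proposal, not the spec algorithm: split the URL at the proposed "%!" separator into a path and a zip-path, then resolve relative references against the zip-path when one is present, clamping "../" so that references never escape the top-most archive.

```python
import posixpath

SEP = "%!"

def split_zip_url(url):
    """Return (path, zip_path); zip_path is None when there is no %!."""
    if SEP in url:
        path, zip_path = url.split(SEP, 1)
        return path, zip_path
    return url, None

def resolve(base_url, relative):
    path, zip_path = split_zip_url(base_url)
    if zip_path is None:
        # No zip-path: ordinary relative resolution against the path.
        return posixpath.join(posixpath.dirname(path), relative)
    # Resolve against the zip-path, not the path.
    resolved = posixpath.normpath(
        posixpath.join(posixpath.dirname(zip_path), relative))
    while resolved.startswith("../"):
        resolved = resolved[3:]  # clamp: refs cannot escape the archive
    return path + SEP + resolved
```

Fetching would then always use the path component returned by split_zip_url, checking the response is a zip archive whenever zip_path is non-None.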


As for nested zip archives: Andrea suggested we should support this,
but that would require zip-path to be a sequence of paths. I think we
never want to allow relative URLs to escape the top-most zip archive.
But I suppose we could support nesting in a way that

  %!test.zip!test.html

goes one level deeper. And "../image.gif" in test.html looks in the
enclosing zip. And "../../image.gif" in test.html looks in the
enclosing zip as well because it cannot ever be relative to the path,
only the zip-path.
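The nesting behavior just described can be sketched by treating the zip-path as a "!"-separated sequence of levels, where each "../" pops one archive level but can never go past the top-most archive (illustrative only, matching the examples above):

```python
def resolve_nested(zip_path, relative):
    """Resolve a "../"-style reference within a nested zip-path."""
    levels = zip_path.split("!")  # e.g. ["test.zip", "test.html"]
    while relative.startswith("../"):
        if len(levels) > 1:
            levels.pop()          # step into the enclosing zip
        # else: clamp at the top-most archive; never touch the path
        relative = relative[3:]
    levels[-1] = relative         # replace the leaf entry
    return "!".join(levels)
```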


-- 
http://annevankesteren.nl/