Re: [whatwg] Zip archives as first-class citizens

Gordon P. Hemsley Wed, 28 Aug 2013 07:39:09 -0700

On 8/28/13 9:32 AM, Anne van Kesteren wrote:

We have thought of three approaches for zip URL design thus far:


* Using a sub-scheme (zip) with a zip-path (after !):
zip:http://www.example.org/zip!image.gif
* Introducing a zip-path (after %!): http://www.example.org/zip%!image.gif
* Using media fragments: http://www.example.org/zip#path=image.gif

High-level drawbacks:

* Sub-scheme: requires changing the URL syntax with both sub-scheme
and zip-path.
* Zip-path: requires changing the URL syntax.
* Fragments: fail to work well for URLs relative to a zip archive.

Fragments are conceptually the cleanest as the only part of a URL
that's supposed to depend on the Content-Type is the fragment.
However, if you want to link to an ID inside an HTML resource you'd
have to do #path=test.html&id=test which would require adding
knowledge to the HTML resource that it is contained in a zip archive
and have special processing based on that. And not just HTML, same
goes for CSS or JavaScript.

I'm not sure we need to consider sub-scheme if zip-path can work as
it's more complex and not very well thought out. E.g. imagine
view-source:zip:http://www.example.org/zip!test.html. (I hope we never
need to standardize view-source and that it can be restricted to the
address bar in browsers.)

zip-path makes zip archive packaging by far the easiest. If we use %!
as separator that would cause a network error in some existing
browsers (due to an illegal %), which means it's extensible there,
though not backwards compatible.

We'd adjust the URL parser to build a zip-path once %! is encountered.
And relative URLs would first look if there's a zip-path and work
against that, and use path otherwise.
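
To make the parsing and relative-URL rule above concrete, here is a minimal Python sketch. It is not a proposed algorithm for the URL spec. The function names `split_zip_url` and `resolve_relative` are made up for illustration, only relative path references are handled (no scheme, query, or fragment), and clamping ".." at the archive root is an assumption about how the rule would be specified.

```python
import posixpath
from urllib.parse import urljoin

SEPARATOR = "%!"  # the separator proposed in this thread


def split_zip_url(url):
    """Split a URL at the first %! into (outer URL, zip-path or None)."""
    outer, sep, zip_path = url.partition(SEPARATOR)
    return outer, (zip_path if sep else None)


def resolve_relative(base, relative):
    """Resolve a relative path reference against base (hypothetical helper).

    If base has a zip-path, only the zip-path changes; otherwise this
    falls back to ordinary resolution against the path.
    """
    outer, zip_path = split_zip_url(base)
    if zip_path is None:
        return urljoin(base, relative)
    base_dir = posixpath.dirname(zip_path)
    # Anchoring at "/" makes normpath drop any ".." that would climb
    # above the archive root (assumed behaviour).
    joined = posixpath.normpath(posixpath.join("/", base_dir, relative))
    return outer + SEPARATOR + joined.lstrip("/")


print(resolve_relative("http://www.example.org/zip%!css/site.css", "../image.gif"))
# http://www.example.org/zip%!image.gif
print(resolve_relative("http://www.example.org/dir/page.html", "image.gif"))
# http://www.example.org/dir/image.gif
```

Fetching would still use only the outer URL returned by `split_zip_url`; the zip-path never reaches the server in this model.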

Fetching would always use the path. If there's a zip-path and the
returned resource is not a zip archive it would cause a network error.

As for nested zip archives, Andrea suggested we should support this,
but that would require zip-path to be a sequence of paths. I think we
never want to allow relative URLs to escape the top-most zip archive.
But I suppose we could support it in a way that

   %!test.zip!test.html

goes one level deeper. And "../image.gif" in test.html looks in the
enclosing zip. And "../../image.gif" in test.html looks in the
enclosing zip as well because it cannot ever be relative to the path,
only the zip-path.
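
One way to read the nested case above is that an inner archive behaves like a directory for the purposes of "..", with "!" marking the archive boundary. The sketch below follows that reading; it is an interpretation, not something the thread has settled. `resolve_nested` is a hypothetical name, and it works on the zip-path alone (the part after %!).

```python
import re


def resolve_nested(zip_path, relative):
    """Resolve `relative` against a nested zip-path like 'test.zip!test.html'.

    Assumption: '!' separates nested archives and '..' from the root of an
    inner archive lands in the enclosing archive, never above the top-most.
    """
    # Tokens alternate: name, separator, name, separator, ..., name
    tokens = re.split(r"([!/])", zip_path)
    names, seps = tokens[0::2], tokens[1::2]
    # Pair every container with the separator that follows it; the final
    # name is the referencing file itself and is dropped.
    stack = list(zip(names[:-1], seps))
    for segment in relative.split("/"):
        if segment in ("", "."):
            continue
        if segment == "..":
            if stack:  # clamp at the top-most archive root
                stack.pop()
        else:
            stack.append((segment, "/"))
    joined = "".join(name + sep for name, sep in stack)
    return joined[:-1]  # drop the trailing separator


print(resolve_nested("test.zip!test.html", "image.gif"))        # test.zip!image.gif
print(resolve_nested("test.zip!test.html", "../image.gif"))     # image.gif
print(resolve_nested("test.zip!test.html", "../../image.gif"))  # image.gif
```

The last two calls give the same result, which matches the point that "../../image.gif" cannot ever become relative to the path.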

As the following URLs suggest, the %! (or %-anything) will likely not
work for ZIP files generated by a script using the query portion of the
URL, as the path information will be subsumed into the last value
without causing a network error:


http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1%!example.png
http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1%/example.png
http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1?example.png

(And feel free to use that script to try out any other combos.)
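
The same effect can be reproduced offline. This sketch uses Python's standard `urllib.parse` as a stand-in for whatever query parsing the server-side script does, so it only illustrates the general problem; the actual behaviour of url_test.php may differ in detail. `last_query_value` is a name made up for this example.

```python
from urllib.parse import parse_qs, urlsplit

URLS = [
    "http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1%!example.png",
    "http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1%/example.png",
    "http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1?example.png",
]


def last_query_value(url, name="spacer"):
    """Return the value of query parameter `name` as a lenient parser sees it."""
    query = urlsplit(url).query
    return parse_qs(query)[name][-1]


for url in URLS:
    # The intended zip-path ends up inside the value of the last parameter.
    print(urlsplit(url).path, "->", last_query_value(url))
# /url_test.php -> 1%!example.png
# /url_test.php -> 1%/example.png
# /url_test.php -> 1?example.png
```

In each case the separator and the intended zip-path are treated as ordinary query data, and nothing signals an error.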

However, since fragments (i.e. anything beginning with '#') are already
not sent to the server, what if you modified the URL parser to use a
special hash-prefix combo that indicates the path? Then you could avoid
the problem of having to make documents aware of the fact that they're
in a ZIP because the hash-prefix combo would come before the plain hash
which holds the ID.


So, for example:

http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1#/example.html#middle
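
A rough Python sketch of how a parser might take that URL apart, assuming "#/" is the hash-prefix combo and a second "#" introduces the ordinary ID. Both of those choices, and the name `split_hash_path`, are assumptions made for illustration only.

```python
from urllib.parse import urlsplit


def split_hash_path(url):
    """Split a URL into (request URL, zip-path or None, element ID or None).

    Assumes the fragment has the form '/<zip-path>#<id>' when it addresses
    a file inside a ZIP; any other fragment is treated as a plain ID.
    """
    parts = urlsplit(url)
    request_url = parts._replace(fragment="").geturl()
    fragment = parts.fragment
    if not fragment.startswith("/"):
        return request_url, None, fragment or None
    zip_path, _, element_id = fragment[1:].partition("#")
    return request_url, zip_path, element_id or None


url = ("http://whatwg.gphemsley.org/url_test.php"
       "?file=test.zip&spacer=1#/example.html#middle")
print(split_hash_path(url))
# ('http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1',
#  'example.html', 'middle')
```

The request URL that goes to the server is unchanged by the zip-path, so a script-generated ZIP still sees exactly the query it expects.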

Then you could also take the opportunity to spec the #! prefix (and
other hash-combo prefixes) that is used by a lot of sites nowadays.


--
Gordon P. Hemsley
[email protected]
http://gphemsley.org/
