On 8/28/13 9:32 AM, Anne van Kesteren wrote:
We have thought of three approaches for zip URL design thus far:

* Using a sub-scheme (zip) with a zip-path (after !):
zip:http://www.example.org/zip!image.gif
* Introducing a zip-path (after %!): http://www.example.org/zip%!image.gif
* Using media fragments: http://www.example.org/zip#path=image.gif

High-level drawbacks:

* Sub-scheme: requires changing the URL syntax with both sub-scheme
and zip-path.
* Zip-path: requires changing the URL syntax.
* Fragments: fail to work well for URLs relative to a zip archive.

Fragments are conceptually the cleanest as the only part of a URL
that's supposed to depend on the Content-Type is the fragment.
However, if you want to link to an ID inside an HTML resource you'd
have to do #path=test.html&id=test which would require adding
knowledge to the HTML resource that it is contained in a zip archive
and have special processing based on that. And not just HTML, same
goes for CSS or JavaScript.

I'm not sure we need to consider sub-scheme if zip-path can work as
it's more complex and not very well thought out. E.g. imagine
view-source:zip:http://www.example.org/zip!test.html. (I hope we never
need to standardize view-source and that it can be restricted to the
address bar in browsers.)

zip-path makes zip archive packaging by far the easiest. If we use %!
as separator that would cause a network error in some existing
browsers (due to an illegal %), which means it's extensible there,
though not backwards compatible.

We'd adjust the URL parser to build a zip-path once %! is encountered.
And relative URLs would first look if there's a zip-path and work
against that, and use path otherwise.

Fetching would always use the path. If there's a zip-path and the
returned resource is not a zip archive it would cause a network error.
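
To make the parsing and relative-URL rules above concrete, here is a minimal sketch in Python. It is not taken from any specification or browser: the function names are hypothetical, the separator is assumed to be the first literal "%!" in the URL, and query strings and fragments are ignored for simplicity.

```python
import posixpath

ZIP_SEPARATOR = "%!"  # separator proposed in the message; everything else here is assumed


def split_zip_path(url):
    """Return (fetch_url, zip_path); zip_path is None when no separator is present."""
    head, sep, tail = url.partition(ZIP_SEPARATOR)
    return (head, tail) if sep else (url, None)


def resolve_in_zip(base_url, relative):
    """Resolve a relative reference against the zip-path of base_url."""
    fetch_url, zip_path = split_zip_path(base_url)
    if zip_path is None:
        raise ValueError("base URL has no zip-path; resolve against the path instead")
    # Treat the archive as its own root so ".." cannot climb into the outer path.
    base_dir = posixpath.dirname("/" + zip_path)
    joined = posixpath.normpath(posixpath.join(base_dir, relative))
    return fetch_url + ZIP_SEPARATOR + joined.lstrip("/")


base = "http://www.example.org/zip%!docs/test.html"
print(split_zip_path(base)[0])            # URL that would actually be fetched
print(resolve_in_zip(base, "image.gif"))  # stays inside the archive
```

Only the part before "%!" is ever fetched in this sketch, which matches the rule that fetching always uses the path.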

As for nested zip archives: Andrea suggested we should support this,
but that would require zip-path to be a sequence of paths. I think we
never want to allow relative URLs to escape the top-most zip archive.
But I suppose we could support it in a way that

   %!test.zip!test.html

goes one level deeper. And "../image.gif" in test.html looks in the
enclosing zip. And "../../image.gif" in test.html looks in the
enclosing zip as well because it cannot ever be relative to the path,
only the zip-path.
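
The nested case can be sketched the same way, with the zip-path held as a sequence of paths, outermost archive first. This is only an illustration under assumptions the message does not settle: the function name is hypothetical, and leaving an inner archive is assumed to land in the directory of the enclosing archive that holds the inner archive file.

```python
def resolve_nested(zip_paths, relative):
    """Resolve `relative` against a zip-path sequence such as ["test.zip", "test.html"]."""
    # One list of path segments per archive level, outermost first.
    levels = [path.split("/") for path in zip_paths]
    levels[-1].pop()  # drop the file name of the current document
    for segment in relative.split("/"):
        if segment in ("", "."):
            continue
        if segment == "..":
            if levels[-1]:
                levels[-1].pop()   # go up inside the current archive
            elif len(levels) > 1:
                levels.pop()       # leave the inner archive
                levels[-1].pop()   # drop the inner archive's own file name
            # else: already at the root of the top-most archive, so stay there
        else:
            levels[-1].append(segment)
    return ["/".join(level) for level in levels]


# Base document: %!test.zip!test.html
print(resolve_nested(["test.zip", "test.html"], "../image.gif"))     # ['image.gif']
print(resolve_nested(["test.zip", "test.html"], "../../image.gif"))  # ['image.gif']
```

Both references end up in the enclosing zip, because the extra ".." has nowhere to go once the root of the top-most archive is reached.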


As the following URLs show, %! (or %-anything) will likely not work for ZIP files generated by a script that takes its parameters from the query portion of the URL: the zip-path is absorbed into the last query value instead of causing a network error:

http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1%!example.png
http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1%/example.png
http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1?example.png

(And feel free to use that script to try out any other combos.)
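
The same effect can be reproduced offline with Python's standard query-string parser. This is only an analogy: the script above is PHP and its internals are not shown here, so the assumption is simply that it parses the query in a comparable way. The helper name is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def last_query_value(url, key="spacer"):
    """Return what a server-side script would see as the value of `key`."""
    query = urlsplit(url).query
    return parse_qs(query)[key][-1]


base = "http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1"
for suffix in ("%!example.png", "%/example.png", "?example.png"):
    # Each candidate separator ends up inside the last value.
    print(last_query_value(base + suffix))
```

In each case the intended zip-path ends up as part of the "spacer" value, so nothing on the receiving side can tell it apart from ordinary query data.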

However, since fragments (i.e. anything beginning with '#') are already not sent to the server, what if you modified the URL parser to use a special hash-prefix combo that indicates the path? Then you could avoid the problem of having to make documents aware of the fact that they're in a ZIP because the hash-prefix combo would come before the plain hash which holds the ID.

So, for example:

http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1#/example.html#middle

Then you could also take the opportunity to spec the #! prefix (and other hash-combo prefixes) that is used by a lot of sites nowadays.
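
A rough sketch of how a parser might split such a URL, in Python. The "#/" prefix is taken from the example above; treating the second "#" as the start of the ordinary ID, and the function name itself, are assumptions for illustration only.

```python
def split_hash_path(url):
    """Return (fetch_url, zip_path, element_id) for a '#/path#id' style URL."""
    fetch_url, _, fragment = url.partition("#")
    if not fragment.startswith("/"):
        # Ordinary fragment: no path inside the archive was given.
        return fetch_url, None, fragment or None
    zip_path, _, element_id = fragment[1:].partition("#")
    return fetch_url, zip_path, element_id or None


url = "http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1#/example.html#middle"
fetch_url, zip_path, element_id = split_hash_path(url)
print(fetch_url)   # sent to the server, query string intact
print(zip_path)    # example.html
print(element_id)  # middle
```

Because everything after the first "#" stays on the client, the query string reaches the script unchanged, and example.html never needs to know it lives inside a ZIP.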

--
Gordon P. Hemsley
[email protected]
http://gphemsley.org/