On 11/1/14 5:29 AM, Anne van Kesteren wrote:
On Sat, Nov 1, 2014 at 1:01 AM, Sam Ruby <ru...@intertwingly.net> wrote:
Meanwhile, the IETF is actively working on an update:

https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg-04

They are meeting F2F in a little over a week.  URIs in general, and this
proposal in particular, will be discussed, so now would be a
good time to provide feedback.  I've only quickly scanned it, but it appears
sane to me in that it basically says that new schemes will not be viewed as
relative schemes.

It doesn't say that. (We should perhaps try to find some way to make
"{scheme}://" syntax work for schemes that are not problematic (e.g.
javascript would be problematic). Convincing implementers that it's
worth implementing might be trickier.)

How should it change?

1) Change the URL Goals to only obsolete RFC 3987, not RFC 3986 too.

See previous threads on the subject. The data models are incompatible,
at least around "%", likely also around other code points. It also
seems unacceptable to require two parsers for URLs.
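To make the "%" point concrete: RFC 3986 (section 2.1) only allows "%" as the start of a `pct-encoded` triplet ("%" HEXDIG HEXDIG). As I read the URL Standard, its parser instead passes a stray "%" through unchanged and only reports a validation error. The minimal Python sketch below flags the positions where a string breaks the RFC 3986 rule. The helper name `stray_percent_positions` is hypothetical and does not come from either spec.

```python
import re

# RFC 3986, section 2.1: pct-encoded = "%" HEXDIG HEXDIG
# A "%" that is NOT followed by two hex digits violates that rule.
_STRAY_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def stray_percent_positions(s: str) -> list[int]:
    """Return the index of every '%' in s that does not start a valid
    RFC 3986 percent-encoded triplet (hypothetical helper)."""
    return [m.start() for m in _STRAY_PERCENT.finditer(s)]


if __name__ == "__main__":
    for candidate in (
        "http://example.com/a%20b",  # valid triplet
        "http://example.com/a%zz",   # "%" followed by non-hex characters
        "http://example.com/100%",   # "%" at the very end
    ):
        bad = stray_percent_positions(candidate)
        print(candidate, "OK" if not bad else f"stray % at {bad}")
```

If that reading of the URL Standard is right, a string such as `http://example.com/a%zz` survives URL parsing and serialization but is not a valid RFC 3986 URI. That is one concrete place where the two data models diverge.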

Acknowledging that other parsers exist is quite a different statement than requiring two parsers. I'm only suggesting the former.

As a concrete statement, a compliant implementation of HTML would require a URL parser, but not a URI parser.

Also as a concrete statement, such a user agent will interact, primarily via the network, with other software that will interpret the canonicalized URLs as if they were URIs.

That may not be as we would wish it to be. But it would be a disservice to everyone to document how we would wish things to be rather than how they actually are (and, by all indications, are likely to remain for the foreseeable future).

3) Explicitly state that canonical URLs (i.e., the output of the URL parse
step) not only round trip but also are valid URIs.  If there are any RFC
3986 errata and/or willful violations necessary to make that a true
statement, so be it.

It might be interesting to figure out the delta. But there are major
differences between RFC 3986 and URL. Not obsoleting the former seems
like a disservice to anyone looking to implement a parser or find
information on URI/URL.
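One way to start on the delta is to run candidate serializer outputs through a coarse RFC 3986 check. The sketch below checks three things: scheme syntax, the RFC 3986 character repertoire (unreserved, reserved, and "%"), and percent-encoding triplets. It does not apply the full ABNF. For example, it does not check where "[" and "]" may appear. An empty result therefore means "no obvious violation", not "valid URI". The function name `rfc3986_problems` is hypothetical.

```python
import re
import string

# Character repertoire of RFC 3986, section 2: unreserved, reserved, and "%".
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
ALLOWED = UNRESERVED | GEN_DELIMS | SUB_DELIMS | {"%"}

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ":"
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")
# "%" not followed by two hex digits (RFC 3986, section 2.1)
STRAY_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def rfc3986_problems(s: str) -> list[str]:
    """Coarse, character-level comparison of s against RFC 3986.

    Hypothetical helper: it does not apply the full ABNF, so an empty
    list only means no obvious violation was found.
    """
    problems = []
    if not SCHEME.match(s):
        problems.append("missing or malformed scheme")
    bad_chars = sorted({c for c in s if c not in ALLOWED})
    if bad_chars:
        problems.append(f"characters outside RFC 3986: {bad_chars}")
    if STRAY_PERCENT.search(s):
        problems.append("'%' not followed by two hex digits")
    return problems


if __name__ == "__main__":
    for candidate in ("http://example.com/a%20b", "http://example.com/a b"):
        print(candidate, rfc3986_problems(candidate))
```

Any string the URL serializer can emit that produces a non-empty list here would be a candidate entry for the errata or willful-violation list proposed in item 3.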

I do plan to work with others to figure out the delta. As to the data models, at the present time -- and without having actually done the necessary analysis -- I am not aware of a single case where they would be different. Undoubtedly we will be able to quickly find some, but even so, I would assert that the following statements will remain true for the domain of canonicalized URLs, by which I mean the set of possible outputs of the URL serializer:

1) the overlap is substantial, and I would dare say overwhelming.

2) RFC 3986-compliant and URL-compliant parsers would interpret the same bytes in such outputs as delimiters, schemes, paths, fragments, etc.

3) as to data models, the URL Standard is silent as to how such bytes are to be interpreted. As to the meaning of '%', both the URL Standard and RFC 3986 recognize that encodings other than UTF-8 exist, and that such encodings will affect the interpretation of percent-encoded byte sequences.
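Points 2 and 3 can be illustrated with a short Python sketch. It splits a canonical-looking URL using the regular expression from RFC 3986, Appendix B. For this example, the resulting scheme, authority, path, query, and fragment match what a URL Standard parser would report. It then shows that the same percent-encoded bytes in the path decode to different text when read as UTF-8 versus ISO-8859-1. The function names `split_uri` and `decode_path` are hypothetical. This illustrates the claims; it does not prove them.

```python
import re
from urllib.parse import unquote_to_bytes

# Regular expression from RFC 3986, Appendix B: splits a URI reference
# into its components without validating it.
APPENDIX_B = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(s: str) -> dict:
    """Split s at the ':', '//', '?' and '#' delimiters as RFC 3986 does."""
    m = APPENDIX_B.match(s)  # always matches: every group is optional
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


def decode_path(path: str, encoding: str) -> str:
    """Turn %XX sequences into raw bytes, then read those bytes
    in the given character encoding."""
    return unquote_to_bytes(path).decode(encoding)


if __name__ == "__main__":
    parts = split_uri("http://example.com/caf%C3%A9?q=1#top")
    print(parts)
    # The same two bytes (C3 A9) mean different text in different encodings.
    print(decode_path(parts["path"], "utf-8"))    # /café
    print(decode_path(parts["path"], "latin-1"))  # /cafÃ©
```

The sketch decodes to bytes first and chooses a character encoding only afterwards. That order mirrors point 3: the delimiters are unambiguous, but the text those bytes represent depends on an encoding that neither document fixes.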

- Sam Ruby
