Re: [whatwg] [url] Feedback from TPAC

Sam Ruby Sat, 01 Nov 2014 04:39:08 -0700

On 11/1/14 5:29 AM, Anne van Kesteren wrote:

On Sat, Nov 1, 2014 at 1:01 AM, Sam Ruby <ru...@intertwingly.net> wrote:

Meanwhile, the IETF is actively working on an update:


https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg-04

They are meeting F2F in a little over a week.  URIs in general, and this
proposal in particular, will be discussed, and for that reason now would be a
good time to provide feedback.  I've only quickly scanned it, but it appears
sane to me in that it basically says that new schemes will not be viewed as
relative schemes.


It doesn't say that. (We should perhaps try to find some way to make
"{scheme}://" syntax work for schemes that are not problematic (e.g.
javascript would be problematic). Convincing implementers that it's
worth implementing might be trickier.)


How should it change?

1) Change the URL Goals to only obsolete RFC 3987, not RFC 3986 too.


See previous threads on the subject. The data models are incompatible,
at least around "%", likely also around other code points. It also
seems unacceptable to require two parsers for URLs.
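
To make the "%" point concrete, here is a minimal sketch of one commonly cited difference, which may or may not be the one meant above. RFC 3986 defines pct-encoded as "%" HEXDIG HEXDIG, so a "%" followed by anything else is not valid URI syntax. The sketch assumes that a URL Standard parser passes such a "%" through to its output unchanged; that assumption should be checked against the current text of the standard. The helper name has_bare_percent and the sample strings are hypothetical.

```python
import re

# RFC 3986 grammar: pct-encoded = "%" HEXDIG HEXDIG.
# This pattern finds a "%" that is NOT followed by two hex digits.
BARE_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def has_bare_percent(s):
    """Return True if s contains a "%" that RFC 3986 would not accept."""
    return BARE_PERCENT.search(s) is not None


# Hypothetical inputs, for illustration only.
well_formed = has_bare_percent("http://example.com/a%20b")
trailing = has_bare_percent("http://example.com/100%")
truncated = has_bare_percent("http://example.com/a%2")
```

Any string for which this returns True is a candidate entry for the delta between the two specifications.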

Acknowledging that other parsers exist is quite a different statement than requiring two parsers. I'm only suggesting the former.

As a concrete statement, a compliant implementation of HTML would require a URL parser, but not a URI parser.

Also as a concrete statement, such a user agent will interact, primarily via the network, with other software that will interpret the canonicalized URLs as if they were URIs.

That may not be as we would wish it to be. But it would be a disservice to everyone to document how we would wish things to be rather than how they actually are (and, by all indications, are likely to remain for the foreseeable future).

3) Explicitly state that canonical URLs (i.e., the output of the URL parse
step) not only round trip but also are valid URIs.  If there are any RFC
3986 errata and/or willful violations necessary to make that a true
statement, so be it.


It might be interesting to figure out the delta. But there are major
differences between RFC 3986 and URL. Not obsoleting the former seems
like a disservice to anyone looking to implement a parser or find
information on URI/URL.

I do plan to work with others to figure out the delta. As to the data models, at the present time -- and without having actually done the necessary analysis -- I am not aware of a single case where they would be different. Undoubtedly we will be able to quickly find some, but even so, I would assert that the following statements will remain true for the domain of canonicalized URLs, by which I mean the set of possible outputs of the URL serializer:


1) the overlap is substantial, and I would dare say overwhelming.

2) RFC 3986 and URL compliant parsers would interpret the same bytes in such outputs as delimiters, schemes, paths, fragments, etc.

3) as to data models, the URL Standard is silent as to how such bytes are to be interpreted. As to the meaning of '%', both the URL Standard and RFC 3986 recognize that encodings other than UTF-8 exist, and that such will affect the interpretation of percent encoded byte sequences.
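
As a rough illustration of statements 2 and 3, the sketch below splits a string using the regular expression given in RFC 3986, Appendix B, and then decodes the same percent-encoded byte under two different encodings. The sample URL and the helper name split_uri are hypothetical, and the URL is only assumed to be something a URL serializer could emit; this is not a test of any particular implementation.

```python
import re
from urllib.parse import unquote

# Component-splitting regular expression from RFC 3986, Appendix B.
RFC3986_SPLIT = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(uri):
    """Split a URI reference into the five RFC 3986 components."""
    m = RFC3986_SPLIT.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


# Hypothetical serializer output, for illustration only.
parts = split_uri("http://example.com/caf%E9?q=1#top")

# The delimiters are unambiguous, but the meaning of %E9 depends on
# which encoding the consumer assumes for the decoded bytes.
as_latin1 = unquote(parts["path"], encoding="latin-1")
as_utf8 = unquote(parts["path"], encoding="utf-8", errors="replace")
```

Here the byte 0xE9 decodes to "é" under Latin-1 but is not a valid UTF-8 sequence, so the two interpretations of the same path differ even though both consumers agree on where the path begins and ends.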


- Sam Ruby
