Re: [whatwg] [url] Feedback from TPAC

2014-11-04 Thread Sam Ruby

On 11/04/2014 11:25 AM, Domenic Denicola wrote:

From: whatwg [mailto:whatwg-boun...@lists.whatwg.org] On Behalf Of David Singer


(I don't have IE to hand at the moment).


I tried to test IE but unfortunately it looks like the "URL components from DOM 
properties" part of the demo page does not work in IE, I think because IE doesn't 
support document.baseURI.


Try experimenting with a base URL using a http scheme.

If you look closely at the source, you will see that function rebase 
will set both document.baseURI and the href element on the base element. 
 The latter is sufficient for non-IE browsers.  I had to add the former 
to get IE working.


But, as you undoubtedly have noted, unknown base schemes seem to cause 
IE to ignore the base URL entirely.


- Sam Ruby


Re: [whatwg] [url] Feedback from TPAC

2014-11-04 Thread Domenic Denicola
From: whatwg [mailto:whatwg-boun...@lists.whatwg.org] On Behalf Of David Singer

> (I don't have IE to hand at the moment).

I tried to test IE but unfortunately it looks like the "URL components from DOM 
properties" part of the demo page does not work in IE, I think because IE 
doesn't support document.baseURI.


Re: [whatwg] [url] Feedback from TPAC

2014-11-04 Thread Anne van Kesteren
On Tue, Nov 4, 2014 at 4:50 PM, David Singer  wrote:
> really?  Safari, Chrome and Opera all return what to me is eminently sensible
>
> stuff://www.app.com/a/b/banana

It does seem like they allow for some different behavior here, indeed!

We still need to special case schemes as e.g. "x" against
http:///test/ gives different results when parsed against x:///test/
(and we need the ignore extraneous slashes behavior for the former).
And Chromium is weird if you leave out a trailing slash as in x
against x://test. But it does seem like we could have something for
schemes that are not special cased, and are not javascript, data,
etc., that better matches RFC 3986.

I was thinking of introducing such a thing and that both WebKit and
Chromium exhibit such behavior to some extent makes it easier. Thanks.


-- 
https://annevankesteren.nl/


Re: [whatwg] [url] Feedback from TPAC

2014-11-04 Thread David Singer

On Nov 4, 2014, at 15:32 , Anne van Kesteren  wrote:

> On Tue, Nov 4, 2014 at 4:24 PM, David Singer  wrote:
>> at the moment I am more interested in understanding what the best behavior 
>> might be than majority voting
> 
> I don't think there is disagreement about what better behavior might
> be in this case, if we skip over the details for the moment.

really?  Safari, Chrome and Opera all return what to me is eminently sensible

stuff://www.app.com/a/b/banana

Only Firefox and your parser compose ‘banana’ against 
'stuff://www.app.com/a/b/' to make ‘banana’.  (I don’t have IE to hand at the 
moment).

Whether they do this because it’s sensible or because it’s the RFC behavior, I 
do not know, of course. But being future-resilient (we’ll never be fully future 
proof, I agree) seems pretty desirable.  Why is ‘banana’ the better answer 
here?  I assume it fixes some other issue we haven’t explicitly mentioned?

> However,
> how likely is it in your estimation that Apple changes the URL parser
> it ships in this regard?

I have no idea.  I am not in charge of the products we ship :-(, I just try to 
help the standards landscape include standards we could or would like to 
support. Clearly I would not yet be advocating for such a change (but I am 
asking questions in order to learn and tease out the issues, not oppose, right 
now).

David Singer
Manager, Software Standards, Apple Inc.
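
A note on the RFC behavior discussed in this message: the result
stuff://www.app.com/a/b/banana follows from the generic reference-resolution
steps in RFC 3986, section 5.2, which do not consult any list of known
schemes. What follows is a minimal Python sketch of those steps. It is not
Sam's reference implementation and not the URL Standard parser; the function
names (split_ref, remove_dot_segments, resolve) are made up for illustration,
and the sketch assumes well-formed input and does no validation.

```python
import re

# Component-splitting regex from RFC 3986, Appendix B.
_REF_RE = re.compile(
    r'^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$')


def split_ref(ref):
    """Split a URI reference into (scheme, authority, path, query, fragment).

    Components that are absent come back as None (path is always a string).
    """
    return _REF_RE.match(ref).groups()


def remove_dot_segments(path):
    """RFC 3986, section 5.2.4: interpret "." and ".." path segments."""
    out = []
    while path:
        if path.startswith('../'):
            path = path[3:]
        elif path.startswith('./'):
            path = path[2:]
        elif path.startswith('/./'):
            path = path[2:]            # "/./" becomes "/"
        elif path == '/.':
            path = '/'
        elif path.startswith('/../'):
            path = path[3:]            # "/../" becomes "/", drop last segment
            if out:
                out.pop()
        elif path == '/..':
            path = '/'
            if out:
                out.pop()
        elif path in ('.', '..'):
            path = ''
        else:
            # Move the first segment (with its leading "/", if any) to output.
            start = 1 if path.startswith('/') else 0
            end = path.find('/', start)
            if end == -1:
                end = len(path)
            out.append(path[:end])
            path = path[end:]
    return ''.join(out)


def resolve(base, ref):
    """RFC 3986, section 5.2.2: resolve ref against base (no scheme list)."""
    b_scheme, b_auth, b_path, b_query, _ = split_ref(base)
    r_scheme, r_auth, r_path, r_query, r_frag = split_ref(ref)

    if r_scheme is not None:
        scheme, auth, path, query = (
            r_scheme, r_auth, remove_dot_segments(r_path), r_query)
    elif r_auth is not None:
        scheme, auth, path, query = (
            b_scheme, r_auth, remove_dot_segments(r_path), r_query)
    elif r_path == '':
        scheme, auth, path = b_scheme, b_auth, b_path
        query = r_query if r_query is not None else b_query
    else:
        if r_path.startswith('/'):
            path = remove_dot_segments(r_path)
        elif b_auth is not None and b_path == '':
            path = remove_dot_segments('/' + r_path)
        else:
            # Merge: keep the base path up to and including its last "/".
            merged = b_path[:b_path.rfind('/') + 1] + r_path
            path = remove_dot_segments(merged)
        scheme, auth, query = b_scheme, b_auth, r_query

    # Recompose the target (RFC 3986, section 5.3).
    result = ''
    if scheme is not None:
        result += scheme + ':'
    if auth is not None:
        result += '//' + auth
    result += path
    if query is not None:
        result += '?' + query
    if r_frag is not None:
        result += '#' + r_frag
    return result
```

Under these assumptions, resolve('stuff://www.app.com/a/b/', 'banana')
returns 'stuff://www.app.com/a/b/banana', and the case Anne raises elsewhere
in the thread, resolve('foobar://test/x', '../test'), returns
'foobar://test/test'. The scheme name plays no part in either result.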



Re: [whatwg] [url] Feedback from TPAC

2014-11-04 Thread Anne van Kesteren
On Tue, Nov 4, 2014 at 4:24 PM, David Singer  wrote:
> at the moment I am more interested in understanding what the best behavior 
> might be than majority voting

I don't think there is disagreement about what better behavior might
be in this case, if we skip over the details for the moment. However,
how likely is it in your estimation that Apple changes the URL parser
it ships in this regard?


-- 
https://annevankesteren.nl/


Re: [whatwg] [url] Feedback from TPAC

2014-11-04 Thread David Singer

On Nov 4, 2014, at 15:09 , Sam Ruby  wrote:

> On 11/04/2014 09:55 AM, David Singer wrote:
>> I am pretty puzzled why the base URL  composed 
>> with the URL  results not in
>> 
>> stuff://www.app.com/a/b/banana
>> 
>> but
>> 
>> stuff:///banana
>> 
>> Is this a bug or feature of the spec., or a bug in this implementation?
> 
> Please refresh.  I've changed the implementation to match the spec. Spoiler 
> alert: the results returned now don't match either of the values you mention 
> above.

banana

really not good.

> Either of those results would be a bug in the implementation. Per the
> specification

proposal;  the existing specification is the RFC

> and dominant URL implementations

at the moment I am more interested in understanding what the best behavior 
might be than majority voting

> only a limited set of
> schemes support relative URLs.

Not good.  I mean, this means that we would have to change the base generic 
spec. whenever a new scheme comes along for which relative references make 
sense.  RFC 3986 is clear that relative references are a general feature.  I am 
obviously not understanding why it might be desirable to be so future-fragile 
here.  (Particularly since I am thinking of defining exactly such a scheme).


> 
> - Sam Ruby
> 
>> On Nov 4, 2014, at 14:32 , Anne van Kesteren  wrote:
>> 
>>> On Tue, Nov 4, 2014 at 3:28 PM, Sam Ruby  wrote:
>>>> To help foster discussion, I've made an alternate version of the live URL
>>>> parser page, one that enables setting of the base URL:
>>>> 
>>>> http://intertwingly.net/projects/pegurl/liveview2.html#foobar://test/x
>>>> 
>>>> Of course, if there are any bugs in the proposed reference implementation,
>>>> I'm interested in that too.
>>> 
>>> Per the URL Standard resolving "x" against "test:test" results in
>>> failure, not "test:///x".
>>> 
>>> 
>>> --
>>> https://annevankesteren.nl/
>> 
>> David Singer
>> Manager, Software Standards, Apple Inc.
>> 

David Singer
Manager, Software Standards, Apple Inc.



Re: [whatwg] [url] Feedback from TPAC

2014-11-04 Thread Sam Ruby

On 11/04/2014 09:55 AM, David Singer wrote:

I am pretty puzzled why the base URL  composed with the URL 
 results not in

stuff://www.app.com/a/b/banana

but

stuff:///banana

Is this a bug or feature of the spec., or a bug in this implementation?


Please refresh.  I've changed the implementation to match the spec. 
Spoiler alert: the results returned now don't match either of the values 
you mention above.


- Sam Ruby


On Nov 4, 2014, at 14:32 , Anne van Kesteren  wrote:


On Tue, Nov 4, 2014 at 3:28 PM, Sam Ruby  wrote:

To help foster discussion, I've made an alternate version of the live URL
parser page, one that enables setting of the base URL:

http://intertwingly.net/projects/pegurl/liveview2.html#foobar://test/x

Of course, if there are any bugs in the proposed reference implementation,
I'm interested in that too.


Per the URL Standard resolving "x" against "test:test" results in
failure, not "test:///x".


--
https://annevankesteren.nl/


David Singer
Manager, Software Standards, Apple Inc.



Re: [whatwg] [url] Feedback from TPAC

2014-11-04 Thread Anne van Kesteren
On Tue, Nov 4, 2014 at 3:55 PM, David Singer  wrote:
> I am pretty puzzled why the base URL  composed with 
> the URL  results not in
>
> stuff://www.app.com/a/b/banana
>
> but
>
> stuff:///banana
>
> Is this a bug or feature of the spec., or a bug in this implementation?

Either of those results would be a bug in the implementation. Per the
specification and dominant URL implementations only a limited set of
schemes support relative URLs.


-- 
https://annevankesteren.nl/
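
The restriction described in this message can be pictured as a gate placed
in front of relative resolution: a relative input is only resolved when the
base URL's scheme is one of a fixed set. The Python sketch below illustrates
that idea only. The helper names are hypothetical, and the scheme list is an
assumption (it is believed to match the "relative scheme" table in the URL
Standard draft of the time, https://url.spec.whatwg.org/#relative-scheme, and
should be checked against the spec before being relied on).

```python
import re

# Assumption: the "relative schemes" of the URL Standard draft discussed in
# this thread. Verify against the spec; this list is illustrative.
RELATIVE_SCHEMES = {"ftp", "file", "gopher", "http", "https", "ws", "wss"}

_SCHEME_RE = re.compile(r'^([A-Za-z][A-Za-z0-9+.-]*):')


def scheme_of(url):
    """Return the lowercased scheme of url, or None if it has none."""
    match = _SCHEME_RE.match(url)
    return match.group(1).lower() if match else None


def may_resolve(base, ref):
    """Hypothetical gate: can ref be parsed given this base?

    An input with its own scheme does not need the base at all. An input
    without a scheme is only accepted when the base scheme is in the list.
    """
    if scheme_of(ref) is not None:
        return True
    return scheme_of(base) in RELATIVE_SCHEMES
```

With this gate, may_resolve('http://www.app.com/a/b/', 'banana') is True,
while may_resolve('stuff://www.app.com/a/b/', 'banana') and
may_resolve('test:test', 'x') are both False, which corresponds to the
"failure" outcome mentioned for "x" against "test:test".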


Re: [whatwg] [url] Feedback from TPAC

2014-11-04 Thread David Singer
I am pretty puzzled why the base URL  composed with 
the URL  results not in 

stuff://www.app.com/a/b/banana

but

stuff:///banana 

Is this a bug or feature of the spec., or a bug in this implementation?

On Nov 4, 2014, at 14:32 , Anne van Kesteren  wrote:

> On Tue, Nov 4, 2014 at 3:28 PM, Sam Ruby  wrote:
>> To help foster discussion, I've made an alternate version of the live URL
>> parser page, one that enables setting of the base URL:
>> 
>> http://intertwingly.net/projects/pegurl/liveview2.html#foobar://test/x
>> 
>> Of course, if there are any bugs in the proposed reference implementation,
>> I'm interested in that too.
> 
> Per the URL Standard resolving "x" against "test:test" results in
> failure, not "test:///x".
> 
> 
> -- 
> https://annevankesteren.nl/

David Singer
Manager, Software Standards, Apple Inc.



Re: [whatwg] [url] Feedback from TPAC

2014-11-04 Thread Sam Ruby

On 11/04/2014 09:32 AM, Anne van Kesteren wrote:

On Tue, Nov 4, 2014 at 3:28 PM, Sam Ruby  wrote:

To help foster discussion, I've made an alternate version of the live URL
parser page, one that enables setting of the base URL:

http://intertwingly.net/projects/pegurl/liveview2.html#foobar://test/x

Of course, if there are any bugs in the proposed reference implementation,
I'm interested in that too.


Per the URL Standard resolving "x" against "test:test" results in
failure, not "test:///x".


Fixed.  Thanks!

Perhaps over time we could add this to urltestdata.txt[1]?  Meanwhile, 
I'll track such proposed additions here:


https://github.com/rubys/url/blob/peg.js/reference-implementation/test/moretestdata.txt

- Sam Ruby

[1] 
https://github.com/w3c/web-platform-tests/blob/master/url/urltestdata.txt


Re: [whatwg] [url] Feedback from TPAC

2014-11-04 Thread Anne van Kesteren
On Tue, Nov 4, 2014 at 3:28 PM, Sam Ruby  wrote:
> To help foster discussion, I've made an alternate version of the live URL
> parser page, one that enables setting of the base URL:
>
> http://intertwingly.net/projects/pegurl/liveview2.html#foobar://test/x
>
> Of course, if there are any bugs in the proposed reference implementation,
> I'm interested in that too.

Per the URL Standard resolving "x" against "test:test" results in
failure, not "test:///x".


-- 
https://annevankesteren.nl/


Re: [whatwg] [url] Feedback from TPAC

2014-11-04 Thread Sam Ruby

On 11/03/2014 10:32 AM, Anne van Kesteren wrote:

On Mon, Nov 3, 2014 at 4:19 PM, David Singer  wrote:

The readability is much better (I am not a fan of the current trend of writing 
specifications in pseudo-basic, which makes life easier for implementers and 
terrible for anyone else, including authors), and I also think that an approach 
that doesn’t obsolete RFC 3986 is attractive.


Is Apple interested in changing its URL infrastructure to not be
fundamentally incompatible with RFC 3986 then?

Other than slightly different eventual data models for URLs, which we
could maybe amend RFC 3986 for IETF gods willing, I think the main
problem is that a URL that goes through an RFC 3986 pipeline cannot go
through a URL pipeline. E.g. parsing "../test" against
"foobar://test/x" gives wildly different results. That is not a state
we want to be in, so something has to give.


I would hope that everybody involved would enter into this discussion 
being willing to give a bit.


To help foster discussion, I've made an alternate version of the live 
URL parser page, one that enables setting of the base URL:


http://intertwingly.net/projects/pegurl/liveview2.html#foobar://test/x

Of course, if there are any bugs in the proposed reference 
implementation, I'm interested in that too.


- Sam Ruby




Re: [whatwg] [url] Feedback from TPAC

2014-11-03 Thread David Singer

On Nov 3, 2014, at 15:32 , Anne van Kesteren  wrote:

> On Mon, Nov 3, 2014 at 4:19 PM, David Singer  wrote:
>> The readability is much better (I am not a fan of the current trend of 
>> writing specifications in pseudo-basic, which makes life easier for 
>> implementers and terrible for anyone else, including authors), and I also 
>> think that an approach that doesn’t obsolete RFC 3986 is attractive.
> 
> Is Apple interested in changing its URL infrastructure to not be
> fundamentally incompatible with RFC 3986 then?

I was expressing a personal opinion on readability, and on living in a larger 
community, not an Apple position.

> 
> Other than slightly different eventual data models for URLs, which we
> could maybe amend RFC 3986 for IETF gods willing, I think the main
> problem is that a URL that goes through an RFC 3986 pipeline cannot go
> through a URL pipeline. E.g. parsing "../test" against
> "foobar://test/x" gives wildly different results. That is not a state
> we want to be in, so something has to give.

Agreed, we have to work out the differences. 


David Singer
Manager, Software Standards, Apple Inc.



Re: [whatwg] [url] Feedback from TPAC

2014-11-03 Thread Anne van Kesteren
On Mon, Nov 3, 2014 at 4:19 PM, David Singer  wrote:
> The readability is much better (I am not a fan of the current trend of 
> writing specifications in pseudo-basic, which makes life easier for 
> implementers and terrible for anyone else, including authors), and I also 
> think that an approach that doesn’t obsolete RFC 3986 is attractive.

Is Apple interested in changing its URL infrastructure to not be
fundamentally incompatible with RFC 3986 then?

Other than slightly different eventual data models for URLs, which we
could maybe amend RFC 3986 for IETF gods willing, I think the main
problem is that a URL that goes through an RFC 3986 pipeline cannot go
through a URL pipeline. E.g. parsing "../test" against
"foobar://test/x" gives wildly different results. That is not a state
we want to be in, so something has to give.


-- 
https://annevankesteren.nl/


Re: [whatwg] [url] Feedback from TPAC

2014-11-03 Thread David Singer

On Nov 2, 2014, at 20:05 , Sam Ruby  wrote:

> Third, here's a completely different approach to defining URLs that produces 
> the same results (modulo one parse error that Anne agrees[2] should be changed 
> in the WHATWG spec):
> 
> http://intertwingly.net/projects/pegurl/url.html#url
> 

I rather like this.  The readability is much better (I am not a fan of the 
current trend of writing specifications in pseudo-basic, which makes life 
easier for implementers and terrible for anyone else, including authors), and I 
also think that an approach that doesn’t obsolete RFC 3986 is attractive.


David Singer
Manager, Software Standards, Apple Inc.



Re: [whatwg] [url] Feedback from TPAC

2014-11-02 Thread Sam Ruby

On 11/02/2014 02:32 PM, Graham Klyne wrote:

On 01/11/2014 00:01, Sam Ruby wrote:


3) Explicitly state that canonical URLs (i.e., the output of the URL
parse step)
not only round trip but also are valid URIs.  If there are any RFC
3986 errata
and/or willful violations necessary to make that a true statement, so
be it.


It's not clear to me what it is that might be "willfully violated".


Perhaps nothing.


Specifically, I find the notion of "relative scheme" in  [1] to be, at
best, confusing, and at worst something that could break a whole swathe
of existing URI processing.  I don't know which, as on a brief look I
don't understand what [1] is trying to say here, and I lack time (and
will) to dive into the arcane style used for specifying URLs.


First, I'm assuming that by [1], you mean 
https://url.spec.whatwg.org/#relative-scheme


Second, I have no idea how a specification that essentially says "here's 
what a set of browsers, languages, and libraries are converging on to 
convert URLs into URIs" can break URIs.


Third, here's a completely different approach to defining URLs that 
produces the same results (modulo one parse error that Anne agrees[2] 
should be changed in the WHATWG spec):


http://intertwingly.net/projects/pegurl/url.html#url

If for some reason you don't find that to be to your liking, I'll be 
glad to try to meet you half way.  I just need something more to go on 
than "arcane".



I think there may be a confusion here between syntax and
interpretation.  When the term "relative" is used in URI/URL context, I
immediately think of "relative reference" per RFC3986.   I suspect what
is being alluded to is that some URI schemes are not global in the
idealized sense of URIs as a global namespace - file:///foo dereferences
differently depending on where it is used - the relativity here being in
the relation between the URI/URL and the thing identified, with respect
to where the URI is actually processed.


If you find it confusing, perhaps others will too.  Concrete suggestions 
on what should be changed would be helpful.



To change the syntactic definition of "relative reference" to include
things like file: and ftp: URIs would cause all sorts of breakage, and
require significant updating of the resolution algorithm in RFC3986
(more than would be appropriate for a mere "erratum", IMO).  I'm hoping
this is not the kind of willful violation that is being contemplated here.


Note that in the reformulated grammar, file is no longer treated the same as 
other types of relative references.  I am not wedded to any of those 
terms, if you suggest better ones I'll accommodate.


If errata can be produced expeditiously for RFC3986, then there 
shouldn't be any need for willful violations.



#g
--


[2] 
http://lists.w3.org/Archives/Public/public-whatwg-archive/2014Oct/0267.html


Re: [whatwg] [url] Feedback from TPAC

2014-11-01 Thread Sam Ruby

On 11/01/2014 07:18 PM, Barry Leiba wrote:

Thanks, Sam, for this great summary -- I hadn't taken notes, and was
hoping that someone who was (or who has a better memory than I) would
post something.

One minor tweak, at the end:


More specifically, if something along the lines I describe above were
done, the IETF would be open to the idea of errata to RFC3987 and updating
specs to reference URLs.


Errata to 3986, that is, not 3987.  After this, 3987 will be
considered obsolete (the IESG might move to mark it "Historic", or
some such).


Thanks for the correction.  I did indeed mean errata to 3986.

- Sam Ruby


Barry, IETF Applications AD

On Fri, Oct 31, 2014 at 8:01 PM, Sam Ruby  wrote:

bcc: WebApps, IETF, TAG in the hopes that replies go to a single place.

- - -

I took the opportunity this week to meet with a number of parties interested
in the topic of URLs including not only a number of Working Groups, AC and
AB members, but also members of the TAG and members of the IETF.

Some of the feedback related to the proposal I am working on[1].  Some of
the feedback related to mechanics (example: employing Travis to do build
checks, something that makes more sense on the master copy of a given
specification than on a hopefully temporary branch).  These are not the
topics of this email.

The remaining items are more general, and are the subject of this note.  As
is often the case, they are intertwined.  I'll simply jump into the middle
and work outwards from there.

---

The nature of the world is that there will continue to be people who define
more schemes.  A current example is http://openjdk.java.net/jeps/220 (search
for "New URI scheme for naming stored modules, classes, and resources").
And people who are doing so will have a tendency to look to the IETF.

Meanwhile, the IETF is actively working on an update:

https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg-04

They are meeting F2F in a little over a week[2].  URIs in general, and this
proposal in specific will be discussed, and for that reason now would be a
good time to provide feedback.  I've only quickly scanned it, but it appears
sane to me in that it basically says that new schemes will not be viewed as
relative schemes[3].

The obvious disconnect is that this is a registry for URI schemes, not URLs.
It looks to me like making a few, small, surgical updates to the URL
Standard would stitch all this together.

1) Change the URL Goals to only obsolete RFC 3987, not RFC 3986 too.

2) Reference draft-ietf-appsawg-uri-scheme-reg in
https://url.spec.whatwg.org/#url-writing as the way to register schemes,
stating that the set of valid URI schemes is the set of valid URL schemes.

3) Explicitly state that canonical URLs (i.e., the output of the URL parse
step) not only round trip but also are valid URIs.  If there are any RFC
3986 errata and/or willful violations necessary to make that a true
statement, so be it.

That's it.  The rest of the URL specification can stand as is.

What this means operationally is that there are two terms, URIs and URLs.
URIs would be a legacy, academic topic that may be of relevance to some
(primarily back-end server) applications.  URLs are what most people, and most
applications, will be concerned with.  This includes all the specifications
which today reference IRIs (as an example, RFC 4287, namely, Atom).

My sense was that all of the people I talked to were generally OK with this,
and that we would be likely to see statements from both the IETF and the W3C
TAG along these lines mid November-ish, most likely just after IETF meeting
91.

More specifically, if something along the lines I describe above were
done, the IETF would be open to the idea of errata to RFC3987 and updating
specs to reference URLs.

- Sam Ruby

[1] http://intertwingly.net/projects/pegurl/url.html
[2] https://www.ietf.org/meeting/91/index.html
[3] https://url.spec.whatwg.org/#relative-scheme





Re: [whatwg] [url] Feedback from TPAC

2014-11-01 Thread Anne van Kesteren
On Sat, Nov 1, 2014 at 1:29 PM, Sam Ruby  wrote:
> https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg-04

I don't know how it should change. I just said that it doesn't say
that you cannot invent new hierarchical URIs (to use IETF terms). As
long as the IETF keeps building on top of RFC 3986, the mismatch will
continue.


>> I just gave you one, "%"... E.g. "http://example.org/?%" does not have
>> an RFC 3986 representation.
>
> Here's the output of a URI parser:
>
> $ ruby -r addressable/uri -e "p
> Addressable::URI.parse('http://example.org/?%').query"
> "%"

That's a bug in that parser, then. (Assuming it meant to conform to RFC 3986.)

Not sure how that helps.


-- 
https://annevankesteren.nl/
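
The point about "%" can be checked against the RFC 3986 grammar directly.
There, a query is a sequence of pchar, "/" and "?" characters, and a "%" is
only permitted as the start of a pct-encoded triplet ("%" followed by two hex
digits). The Python sketch below encodes that rule as a regular expression.
The helper name is hypothetical, and the sketch assumes the query string has
already been separated from the rest of the URL (without the leading "?").

```python
import re

# RFC 3986: query = *( pchar / "/" / "?" )
#           pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
_QUERY_RE = re.compile(
    r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*")


def is_rfc3986_query(query):
    """Return True if query matches the RFC 3986 'query' production."""
    return _QUERY_RE.fullmatch(query) is not None
```

Under this check, is_rfc3986_query('%') is False, whereas
is_rfc3986_query('%25') and is_rfc3986_query('a=b&c=d') are True. A parser
that hands back "%" as the query is therefore accepting input outside the
RFC 3986 grammar, which is the situation being debated here.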


Re: [whatwg] [url] Feedback from TPAC

2014-11-01 Thread Sam Ruby

On 11/1/14 7:56 AM, Anne van Kesteren wrote:

On Sat, Nov 1, 2014 at 12:38 PM, Sam Ruby  wrote:

On 11/1/14 5:29 AM, Anne van Kesteren wrote:

It doesn't say that. (We should perhaps try to find some way to make
"{scheme}://" syntax work for schemes that are not problematic (e.g.
javascript would be problematic). Convincing implementers that it's
worth implementing might be trickier.)


How should it change?


Not sure what you're referring to.


https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg-04


I just gave you one, "%"... E.g. "http://example.org/?%" does not have
an RFC 3986 representation.


Here's the output of a URL parser (the one I chose was Firefox):

new URL("http://example.com/?%").search
"?%"

Here's the output of a URI parser:

$ ruby -r addressable/uri -e "p Addressable::URI.parse('http://example.org/?%').query"

"%"

I also assert that such a URL round-trips a URL parse/serialize sequence.

- Sam Ruby


Re: [whatwg] [url] Feedback from TPAC

2014-11-01 Thread Anne van Kesteren
On Sat, Nov 1, 2014 at 12:38 PM, Sam Ruby  wrote:
> On 11/1/14 5:29 AM, Anne van Kesteren wrote:
>> It doesn't say that. (We should perhaps try to find some way to make
>> "{scheme}://" syntax work for schemes that are not problematic (e.g.
>> javascript would be problematic). Convincing implementers that it's
>> worth implementing might be trickier.)
>
> How should it change?

Not sure what you're referring to.


> Acknowledging that other parsers exist is quite a different statement than
> requiring two parsers.  I'm only suggesting the former.

We haven't done that for other formats. And it doesn't help with convergence.


> That may not be as we would wish it to be.  But it would be a disservice to
> everyone to document how we would wish things to be rather than how they
> actually are (and, by all indications, are likely to remain for the
> foreseeable future).

This contradicts most WHATWG work. WHATWG standards describe how
things should be, taking into account the realities of deployed
content.

That is not to say that documenting how things actually are is not
worthwhile, it's just not what we do. We describe something that
hopefully leads to convergence between implementations. That way
developers, five to ten years or so from now, no longer have to paper
over the differences.


> I do plan to work with others to figure out the delta.  As to the data
> models, at the present time -- and without having actually done the
> necessary analysis -- I am not aware of a single case where they would be
> different.

I just gave you one, "%"... E.g. "http://example.org/?%" does not have
an RFC 3986 representation.


-- 
https://annevankesteren.nl/


Re: [whatwg] [url] Feedback from TPAC

2014-11-01 Thread Sam Ruby

On 11/1/14 5:29 AM, Anne van Kesteren wrote:

On Sat, Nov 1, 2014 at 1:01 AM, Sam Ruby  wrote:

Meanwhile, the IETF is actively working on an update:

https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg-04

They are meeting F2F in a little over a week.  URIs in general, and this
proposal in specific will be discussed, and for that reason now would be a
good time to provide feedback.  I've only quickly scanned it, but it appears
sane to me in that it basically says that new schemes will not be viewed as
relative schemes.


It doesn't say that. (We should perhaps try to find some way to make
"{scheme}://" syntax work for schemes that are not problematic (e.g.
javascript would be problematic). Convincing implementers that it's
worth implementing might be trickier.)


How should it change?


1) Change the URL Goals to only obsolete RFC 3987, not RFC 3986 too.


See previous threads on the subject. The data models are incompatible,
at least around "%", likely also around other code points. It also
seems unacceptable to require two parsers for URLs.


Acknowledging that other parsers exist is quite a different statement 
than requiring two parsers.  I'm only suggesting the former.


As a concrete statement, a compliant implementation of HTML would 
require a URL parser, but not a URI parser.


Also as a concrete statement, such a user agent will interact, primarily 
via the network, with other software that will interpret the 
canonicalized URL's as if they were URIs.


That may not be as we would wish it to be.  But it would be a disservice 
to everyone to document how we would wish things to be rather than how 
they actually are (and, by all indications, are likely to remain for the 
foreseeable future).



3) Explicitly state that canonical URLs (i.e., the output of the URL parse
step) not only round trip but also are valid URIs.  If there are any RFC
3986 errata and/or willful violations necessary to make that a true
statement, so be it.


It might be interesting to figure out the delta. But there are major
differences between RFC 3986 and URL. Not obsoleting the former seems
like a disservice to anyone looking to implement a parser or find
information on URI/URL.


I do plan to work with others to figure out the delta.  As to the data 
models, at the present time -- and without having actually done the 
necessary analysis -- I am not aware of a single case where they would 
be different.  Undoubtedly we will be able to quickly find some, but 
even so, I would assert that the following statements will remain true 
for the domain of canonicalized URLs, by which I mean the set of 
possible outputs of the URL serializer:


1) the overlap is substantial, and I would dare say overwhelming.

2) RFC 3986 and URL compliant parsers would interpret the same bytes in 
such outputs as delimiters, schemes, paths, fragments, etc.


3) as to data models, the URL Standard is silent as to how such bytes are to be 
interpreted.  As to the meaning of '%', both the URL Standard and 
RFC3986 recognize that encodings other than utf-8 exist, and that such 
will affect the interpretation of percent encoded byte sequences.


- Sam Ruby


Re: [whatwg] [url] Feedback from TPAC

2014-11-01 Thread Anne van Kesteren
On Sat, Nov 1, 2014 at 1:01 AM, Sam Ruby  wrote:
> Meanwhile, the IETF is actively working on an update:
>
> https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg-04
>
> They are meeting F2F in a little over a week.  URIs in general, and this
> proposal in specific will be discussed, and for that reason now would be a
> good time to provide feedback.  I've only quickly scanned it, but it appears
> sane to me in that it basically says that new schemes will not be viewed as
> relative schemes.

It doesn't say that. (We should perhaps try to find some way to make
"{scheme}://" syntax work for schemes that are not problematic (e.g.
javascript would be problematic). Convincing implementers that it's
worth implementing might be trickier.)


> 1) Change the URL Goals to only obsolete RFC 3987, not RFC 3986 too.

See previous threads on the subject. The data models are incompatible,
at least around "%", likely also around other code points. It also
seems unacceptable to require two parsers for URLs.


> 3) Explicitly state that canonical URLs (i.e., the output of the URL parse
> step) not only round trip but also are valid URIs.  If there are any RFC
> 3986 errata and/or willful violations necessary to make that a true
> statement, so be it.

It might be interesting to figure out the delta. But there are major
differences between RFC 3986 and URL. Not obsoleting the former seems
like a disservice to anyone looking to implement a parser or find
information on URI/URL.


-- 
https://annevankesteren.nl/


[whatwg] [url] Feedback from TPAC

2014-10-31 Thread Sam Ruby

bcc: WebApps, IETF, TAG in the hopes that replies go to a single place.

- - -

I took the opportunity this week to meet with a number of parties 
interested in the topic of URLs including not only a number of Working 
Groups, AC and AB members, but also members of the TAG and members of 
the IETF.


Some of the feedback related to the proposal I am working on[1].  Some 
of the feedback related to mechanics (example: employing Travis to do 
build checks, something that makes more sense on the master copy of a 
given specification than on a hopefully temporary branch).  These are not 
the topics of this email.


The remaining items are more general, and are the subject of this note. 
 As is often the case, they are intertwined.  I'll simply jump into the 
middle and work outwards from there.


---

The nature of the world is that there will continue to be people who 
define more schemes.  A current example is 
http://openjdk.java.net/jeps/220 (search for "New URI scheme for naming 
stored modules, classes, and resources").  And people who are doing so 
will have a tendency to look to the IETF.


Meanwhile, the IETF is actively working on an update:

https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg-04

They are meeting F2F in a little over a week[2].  URIs in general, and 
this proposal in specific will be discussed, and for that reason now 
would be a good time to provide feedback.  I've only quickly scanned it, 
but it appears sane to me in that it basically says that new schemes 
will not be viewed as relative schemes[3].


The obvious disconnect is that this is a registry for URI schemes, not 
URLs.  It looks to me like making a few, small, surgical updates to the 
URL Standard would stitch all this together.


1) Change the URL Goals to only obsolete RFC 3987, not RFC 3986 too.

2) Reference draft-ietf-appsawg-uri-scheme-reg in 
https://url.spec.whatwg.org/#url-writing as the way to register schemes, 
stating that the set of valid URI schemes is the set of valid URL schemes.


3) Explicitly state that canonical URLs (i.e., the output of the URL 
parse step) not only round trip but also are valid URIs.  If there are 
any RFC 3986 errata and/or willful violations necessary to make that a 
true statement, so be it.


That's it.  The rest of the URL specification can stand as is.

What this means operationally is that there are two terms, URIs and 
URLs.  URIs would be a legacy, academic topic that may be of 
relevance to some (primarily back-end server) applications.  URLs are 
what most people, and most applications, will be concerned with.  This 
includes all the specifications which today reference IRIs (as an 
example, RFC 4287, namely, Atom).


My sense was that all of the people I talked to were generally OK with 
this, and that we would be likely to see statements from both the IETF 
and the W3C TAG along these lines mid November-ish, most likely just 
after IETF meeting 91.


More specifically, if something along the lines I describe above were 
done, the IETF would be open to the idea of errata to RFC3987 and 
updating specs to reference URLs.


- Sam Ruby

[1] http://intertwingly.net/projects/pegurl/url.html
[2] https://www.ietf.org/meeting/91/index.html
[3] https://url.spec.whatwg.org/#relative-scheme