Re: [whatwg] [url] Feedback from TPAC

2014-11-01 Thread Sam Ruby

On 11/01/2014 07:18 PM, Barry Leiba wrote:

Thanks, Sam, for this great summary -- I hadn't taken notes, and was
hoping that someone who was (or who has a better memory than I) would
post something.

One minor tweak, at the end:


More specifically, if something along the lines I describe above were
done, the IETF would be open to the idea of errata to RFC3987 and updating
specs to reference URLs.


Errata to 3986, that is, not 3987.  After this, 3987 will be
considered obsolete (the IESG might move to mark it "Historic", or
some such).


Thanks for the correction.  I did indeed mean errata to 3986.

- Sam Ruby


Barry, IETF Applications AD

On Fri, Oct 31, 2014 at 8:01 PM, Sam Ruby wrote:

bcc: WebApps, IETF, TAG in the hopes that replies go to a single place.

- - -

I took the opportunity this week to meet with a number of parties interested
in the topic of URLs including not only a number of Working Groups, AC and
AB members, but also members of the TAG and members of the IETF.

Some of the feedback related to the proposal I am working on[1].  Some of
the feedback related to mechanics (example: employing Travis to do build
checks, something that makes more sense on the master copy of a given
specification than on a hopefully temporary branch).  These are not the
topics of this email.

The remaining items are more general, and are the subject of this note.  As
is often the case, they are intertwined.  I'll simply jump into the middle
and work outwards from there.

---

The nature of the world is that there will continue to be people who define
more schemes.  A current example is http://openjdk.java.net/jeps/220 (search
for "New URI scheme for naming stored modules, classes, and resources").
And people who are doing so will have a tendency to look to the IETF.

Meanwhile, the IETF is actively working on an update:

https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg-04

They are meeting F2F in a little over a week[2].  URIs in general, and this
proposal in particular, will be discussed, and for that reason now would be a
good time to provide feedback.  I've only quickly scanned it, but it appears
sane to me in that it basically says that new schemes will not be viewed as
relative schemes[3].

The obvious disconnect is that this is a registry for URI schemes, not URLs.
It looks to me like making a few, small, surgical updates to the URL
Standard would stitch all this together.

1) Change the URL Goals to only obsolete RFC 3987, not RFC 3986 too.

2) Reference draft-ietf-appsawg-uri-scheme-reg in
https://url.spec.whatwg.org/#url-writing as the way to register schemes,
stating that the set of valid URI schemes is the set of valid URL schemes.

3) Explicitly state that canonical URLs (i.e., the output of the URL parse
step) not only round trip but also are valid URIs.  If there are any RFC
3986 errata and/or willful violations necessary to make that a true
statement, so be it.
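
A minimal sketch of the round-trip half of item 3. It assumes the WHATWG `URL` class as shipped in browsers and Node.js. `roundTrips` is a hypothetical helper name, not part of any specification. The helper parses an input, takes the serialized href as the canonical URL, and checks that a second parse/serialize pass leaves it unchanged. It does not check RFC 3986 validity, which is the other half of the proposal.

```javascript
// Hypothetical helper (not from any spec): parse input, optionally against
// a base, treat the serialized href as the canonical URL, then check that
// parsing and serializing that canonical form again yields the same string.
// new URL() throws a TypeError if the input cannot be parsed.
function roundTrips(input, base) {
  const canonical = new URL(input, base).href; // first parse + serialize
  const again = new URL(canonical).href;       // second parse + serialize
  return { canonical, stable: canonical === again };
}

// Scheme and host are lowercased and "a/.." is removed; the lone "%" stays.
const example = roundTrips("HTTP://Example.ORG/a/../b?%");
console.log(example.canonical, example.stable); // http://example.org/b?% true
```

Checking only the serialized string keeps the sketch independent of any particular URI library. The stronger claim, that every such output is also a valid RFC 3986 URI, needs a separate grammar check.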

That's it.  The rest of the URL specification can stand as is.

What this means operationally is that there are two terms, URIs and URLs.
URIs would be a legacy, academic topic that may be relevant to some
(primarily back-end server) applications.  URLs are what most people, and most
applications, will be concerned with.  This includes all the specifications
which today reference IRIs (for example, RFC 4287, Atom).

My sense was that all of the people I talked to were generally OK with this,
and that we would be likely to see statements from both the IETF and the W3C
TAG along these lines around mid-November, most likely just after IETF meeting
91.

More specifically, if something along the lines I describe above were
done, the IETF would be open to the idea of errata to RFC3987 and updating
specs to reference URLs.

- Sam Ruby

[1] http://intertwingly.net/projects/pegurl/url.html
[2] https://www.ietf.org/meeting/91/index.html
[3] https://url.spec.whatwg.org/#relative-scheme





Re: [whatwg] allow in body + DOM position as a rendering hint

2014-11-01 Thread Simon Pieters
On Sat, 01 Nov 2014 02:34:42 +0200, Ilya Grigorik wrote:


Before we get into the pros and cons of "scoped", I think it's important to
highlight that  in body is already a fact of life:
1) developers already put  tags in body, specs be damned.
2) all browsers support  tags in body because of #1.

Given the above conditions, the spec is out of sync with reality, and I
think it's worth considering updating the spec to reflect this? Doing so
would also allow the browsers to convert this case from an error condition
into an optimization - e.g. we can treat position as a hint to optimize
rendering.


I think this line of reasoning is missing one consideration, namely the
negative effect of using  or

Re: [whatwg] HTML has no definition / automated test suite

2014-11-01 Thread Tab Atkins Jr.
On Sat, Nov 1, 2014 at 7:18 AM, Stefan Reich wrote:
> Hi WhatWG and friends!
>
> I am currently making an AI to create HTML. In the process, I discovered a
> logical problem: HTML is not clearly defined. Not as far as I know anyway.
>
> A proper definition of HTML would include collections of sample HTML source
> plus IMAGES of how they look rendered.
>
> That, my AI could work with.
>
> Also, I think this is very important to have - I vividly remember all those
> years of fighting with browser inconsistencies and with the very undefinedness
> I am talking about, which still exists. (More on my blog at tinybrain.de).
>
> Q: Does such a test suite for HTML exist? If not, it is time to create that.
>
> Alternatively, what one could create is a virtualized browser. My AI could
> also learn from that, basically. But a virtualized browser is a complicated
> piece of software, and there is no proper infrastructure for virtual
> programs yet (another gap in IT today).
>
> So I assume a test suite of sources + images will be easier to make right
> now.
>
> Let's define HTML properly!

The behavior of HTML is well-defined; where it's not, it's a bug, and
reporting it would be appreciated.

You're talking about rendering, which is the domain of CSS.  CSS
should also be reasonably well-defined.

Examples, including example renderings, are often useful for
understanding, but they're never part of an actual definition. They
just help the reader visualize something quickly, rather than
requiring them to understand it all from the code.

~TJ


[whatwg] HTML has no definition / automated test suite

2014-11-01 Thread Stefan Reich
Hi WhatWG and friends!

I am currently making an AI to create HTML. In the process, I discovered a
logical problem: HTML is not clearly defined. Not as far as I know anyway.

A proper definition of HTML would include collections of sample HTML source
plus IMAGES of how they look rendered.

That, my AI could work with.

Also, I think this is very important to have - I vividly remember all those
years of fighting with browser inconsistencies and with the very undefinedness
I am talking about, which still exists. (More on my blog at tinybrain.de).

Q: Does such a test suite for HTML exist? If not, it is time to create that.

Alternatively, what one could create is a virtualized browser. My AI could
also learn from that, basically. But a virtualized browser is a complicated
piece of software, and there is no proper infrastructure for virtual
programs yet (another gap in IT today).

So I assume a test suite of sources + images will be easier to make right
now.

Let's define HTML properly!

Cheers from Hamburg,
Stefan


Re: [whatwg] [url] Feedback from TPAC

2014-11-01 Thread Anne van Kesteren
On Sat, Nov 1, 2014 at 1:29 PM, Sam Ruby wrote:
> https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg-04

I don't know how it should change. I just said that it doesn't say
that you cannot invent new hierarchical URIs (to use IETF terms). As
long as the IETF keeps building on top of RFC 3986, the mismatch will
continue.


>> I just gave you one, "%"... E.g. "http://example.org/?%" does not have
>> an RFC 3986 representation.
>
> Here's the output of a URI parser:
>
> $ ruby -r addressable/uri -e "p Addressable::URI.parse('http://example.org/?%').query"
> "%"

That's a bug in that parser, then. (Assuming it meant to conform to RFC 3986.)

Not sure how that helps.


-- 
https://annevankesteren.nl/


Re: [whatwg] [url] Feedback from TPAC

2014-11-01 Thread Sam Ruby

On 11/1/14 7:56 AM, Anne van Kesteren wrote:

On Sat, Nov 1, 2014 at 12:38 PM, Sam Ruby wrote:

On 11/1/14 5:29 AM, Anne van Kesteren wrote:

It doesn't say that. (We should perhaps try to find some way to make
"{scheme}://" syntax work for schemes that are not problematic (e.g.
javascript would be problematic). Convincing implementers that it's
worth implementing might be trickier.)


How should it change?


Not sure what you're referring to.


https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg-04


I just gave you one, "%"... E.g. "http://example.org/?%" does not have
an RFC 3986 representation.


Here's the output of a URL parser (the one I chose was Firefox):

new URL("http://example.com/?%").search
"?%"

Here's the output of a URI parser:

$ ruby -r addressable/uri -e "p Addressable::URI.parse('http://example.org/?%').query"
"%"

I also assert that such a URL round-trips a URL parse/serialize sequence.

- Sam Ruby


Re: [whatwg] [url] Feedback from TPAC

2014-11-01 Thread Anne van Kesteren
On Sat, Nov 1, 2014 at 12:38 PM, Sam Ruby wrote:
> On 11/1/14 5:29 AM, Anne van Kesteren wrote:
>> It doesn't say that. (We should perhaps try to find some way to make
>> "{scheme}://" syntax work for schemes that are not problematic (e.g.
>> javascript would be problematic). Convincing implementers that it's
>> worth implementing might be trickier.)
>
> How should it change?

Not sure what you're referring to.


> Acknowledging that other parsers exist is quite a different statement than
> requiring two parsers.  I'm only suggesting the former.

We haven't done that for other formats. And it doesn't help with convergence.


> That may not be as we would wish it to be.  But it would be a disservice to
> everyone to document how we would wish things to be rather than how they
> actually are (and, by all indications, are likely to remain for the
> foreseeable future).

This contradicts most WHATWG work. WHATWG standards describe how
things should be, taking into account the realities of deployed
content.

That is not to say that documenting how things actually are is not
worthwhile, it's just not what we do. We describe something that
hopefully leads to convergence between implementations. That way,
developers five to ten years or so from now no longer have to paper
over the differences.


> I do plan to work with others to figure out the delta.  As to the data
> models, at the present time -- and without having actually done the
> necessary analysis -- I am not aware of a single case where they would be
> different.

I just gave you one, "%"... E.g. "http://example.org/?%" does not have
an RFC 3986 representation.
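
A small sketch of the "%" case, assuming the WHATWG `URL` class. RFC 3986 (section 2.1) only allows "%" as the start of a pct-encoded triplet ("%" HEXDIG HEXDIG). `hasOnlyValidPercentTriplets` is a hypothetical helper that checks that single rule and nothing else in the RFC 3986 grammar.

```javascript
// RFC 3986 section 2.1: pct-encoded = "%" HEXDIG HEXDIG
// Hypothetical helper: true when every "%" in s starts such a triplet.
// It checks this one rule only, not the full RFC 3986 URI grammar.
function hasOnlyValidPercentTriplets(s) {
  // Search for a "%" that is NOT followed by two hexadecimal digits.
  return !/%(?![0-9A-Fa-f]{2})/.test(s);
}

// The URL parser leaves a lone "%" in the query untouched, so the
// serialized URL still contains it.
const href = new URL("http://example.org/?%").href;
console.log(href);                                                   // http://example.org/?%
console.log(hasOnlyValidPercentTriplets(href));                      // false
console.log(hasOnlyValidPercentTriplets("http://example.org/?%25")); // true
```

Passing this check is necessary for a string to be an RFC 3986 URI, but not sufficient.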


-- 
https://annevankesteren.nl/


Re: [whatwg] [url] Feedback from TPAC

2014-11-01 Thread Sam Ruby

On 11/1/14 5:29 AM, Anne van Kesteren wrote:

On Sat, Nov 1, 2014 at 1:01 AM, Sam Ruby wrote:

Meanwhile, the IETF is actively working on an update:

https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg-04

They are meeting F2F in a little over a week.  URIs in general, and this
proposal in particular, will be discussed, and for that reason now would be a
good time to provide feedback.  I've only quickly scanned it, but it appears
sane to me in that it basically says that new schemes will not be viewed as
relative schemes.


It doesn't say that. (We should perhaps try to find some way to make
"{scheme}://" syntax work for schemes that are not problematic (e.g.
javascript would be problematic). Convincing implementers that it's
worth implementing might be trickier.)


How should it change?


1) Change the URL Goals to only obsolete RFC 3987, not RFC 3986 too.


See previous threads on the subject. The data models are incompatible,
at least around "%", likely also around other code points. It also
seems unacceptable to require two parsers for URLs.


Acknowledging that other parsers exist is quite a different statement 
than requiring two parsers.  I'm only suggesting the former.


As a concrete statement, a compliant implementation of HTML would 
require a URL parser, but not a URI parser.


Also as a concrete statement, such a user agent will interact, primarily
via the network, with other software that will interpret the
canonicalized URLs as if they were URIs.


That may not be as we would wish it to be.  But it would be a disservice 
to everyone to document how we would wish things to be rather than how 
they actually are (and, by all indications, are likely to remain for the 
foreseeable future).



3) Explicitly state that canonical URLs (i.e., the output of the URL parse
step) not only round trip but also are valid URIs.  If there are any RFC
3986 errata and/or willful violations necessary to make that a true
statement, so be it.


It might be interesting to figure out the delta. But there are major
differences between RFC 3986 and URL. Not obsoleting the former seems
like a disservice to anyone looking to implement a parser or find
information on URI/URL.


I do plan to work with others to figure out the delta.  As to the data 
models, at the present time -- and without having actually done the 
necessary analysis -- I am not aware of a single case where they would 
be different.  Undoubtedly we will be able to quickly find some, but 
even so, I would assert that the following statements will remain true
for the domain of canonicalized URLs, by which I mean the set of 
possible outputs of the URL serializer:


1) the overlap is substantial, and I would dare say overwhelming.

2) RFC 3986-compliant and URL-compliant parsers would interpret the same
bytes in such outputs as delimiters, schemes, paths, fragments, etc.


3) as to data models, the URL Standard is silent as to how such bytes are
to be interpreted.  As to the meaning of '%', both the URL Standard and
RFC 3986 recognize that encodings other than UTF-8 exist, and that such
encodings will affect the interpretation of percent-encoded byte sequences.
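
Point 2 can be spot-checked roughly as follows. The sketch splits a URL serializer output with the regular expression from RFC 3986 Appendix B, then compares the pieces against the components reported by the WHATWG `URL` class. The helper names are hypothetical. The comparison ignores userinfo and only makes sense for URLs that have an authority, such as http: URLs. Agreement on a few sample inputs illustrates the claim; it does not prove it.

```javascript
// Regular expression from RFC 3986 Appendix B for breaking a well-formed
// URI reference into its components. It splits; it does not validate.
const RFC3986_SPLIT =
  /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical helper: groups 2, 4, 5, 7 and 9 are the scheme, authority,
// path, query and fragment. Absent components come back as undefined.
function splitRfc3986(ref) {
  const m = RFC3986_SPLIT.exec(ref); // always matches: every group is optional
  return { scheme: m[2], authority: m[4], path: m[5], query: m[7], fragment: m[9] };
}

// Hypothetical helper: the same components as the URL class reports them.
// Assumptions: no userinfo, and an authority is present (e.g. http:).
// The URL API returns "" for both a missing and an empty query/fragment,
// so a serialized URL ending in a bare "?" or "#" would be misreported.
function splitUrlStandard(href) {
  const u = new URL(href);
  return {
    scheme: u.protocol.slice(0, -1),                 // drop trailing ":"
    authority: u.host,                               // host plus non-default port
    path: u.pathname,
    query: u.search ? u.search.slice(1) : undefined, // drop leading "?"
    fragment: u.hash ? u.hash.slice(1) : undefined,  // drop leading "#"
  };
}

// Do both views put the delimiters in the same places?
function sameSplit(input) {
  const href = new URL(input).href; // restrict to URL serializer outputs
  const a = splitRfc3986(href);
  const b = splitUrlStandard(href);
  return Object.keys(a).every((key) => a[key] === b[key]);
}

console.log(sameSplit("http://example.org/a/b?x=%#top")); // true
console.log(splitRfc3986("http://example.org/?%").query); // %
```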


- Sam Ruby


Re: [whatwg] [url] Feedback from TPAC

2014-11-01 Thread Anne van Kesteren
On Sat, Nov 1, 2014 at 1:01 AM, Sam Ruby wrote:
> Meanwhile, the IETF is actively working on an update:
>
> https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg-04
>
> They are meeting F2F in a little over a week.  URIs in general, and this
> proposal in particular, will be discussed, and for that reason now would be a
> good time to provide feedback.  I've only quickly scanned it, but it appears
> sane to me in that it basically says that new schemes will not be viewed as
> relative schemes.

It doesn't say that. (We should perhaps try to find some way to make
"{scheme}://" syntax work for schemes that are not problematic (e.g.
javascript would be problematic). Convincing implementers that it's
worth implementing might be trickier.)


> 1) Change the URL Goals to only obsolete RFC 3987, not RFC 3986 too.

See previous threads on the subject. The data models are incompatible,
at least around "%", likely also around other code points. It also
seems unacceptable to require two parsers for URLs.


> 3) Explicitly state that canonical URLs (i.e., the output of the URL parse
> step) not only round trip but also are valid URIs.  If there are any RFC
> 3986 errata and/or willful violations necessary to make that a true
> statement, so be it.

It might be interesting to figure out the delta. But there are major
differences between RFC 3986 and URL. Not obsoleting the former seems
like a disservice to anyone looking to implement a parser or find
information on URI/URL.


-- 
https://annevankesteren.nl/