Re: [whatwg] [url] Feedback from TPAC
On 11/01/2014 07:18 PM, Barry Leiba wrote:
> Thanks, Sam, for this great summary -- I hadn't taken notes, and was hoping that someone who was (or who has a better memory than I) would post something.
>
> One minor tweak, at the end:
>
>> More specifically, if something along these lines I describe above were done, the IETF would be open to the idea of errata to RFC3987 and updating specs to reference URLs.
>
> Errata to 3986, that is, not 3987. After this, 3987 will be considered obsolete (the IESG might move to mark it "Historic", or some such).

Thanks for the correction. I did indeed mean errata to 3986.

- Sam Ruby

> Barry, IETF Applications AD
>
> On Fri, Oct 31, 2014 at 8:01 PM, Sam Ruby wrote:
>> bcc: WebApps, IETF, TAG in the hopes that replies go to a single place.
>>
>> - - -
>>
>> I took the opportunity this week to meet with a number of parties interested in the topic of URLs including not only a number of Working Groups, AC and AB members, but also members of the TAG and members of the IETF.
>>
>> Some of the feedback related to the proposal I am working on[1]. Some of the feedback related to mechanics (example: employing Travis to do build checks, something that makes more sense on the master copy of a given specification than on a hopefully temporary branch). These are not the topics of this email.
>>
>> The remaining items are more general, and are the subject of this note. As is often the case, they are intertwined. I'll simply jump into the middle and work outwards from there.
>>
>> ---
>>
>> The nature of the world is that there will continue to be people who define more schemes. A current example is http://openjdk.java.net/jeps/220 (search for "New URI scheme for naming stored modules, classes, and resources"). And people who are doing so will have a tendency to look to the IETF.
>>
>> Meanwhile, the IETF is actively working on an update:
>>
>> https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg-04
>>
>> They are meeting F2F in a little over a week[2]. URIs in general, and this proposal in particular, will be discussed, and for that reason now would be a good time to provide feedback. I've only quickly scanned it, but it appears sane to me in that it basically says that new schemes will not be viewed as relative schemes[3].
>>
>> The obvious disconnect is that this is a registry for URI schemes, not URLs. It looks to me like making a few, small, surgical updates to the URL Standard would stitch all this together.
>>
>> 1) Change the URL Goals to only obsolete RFC 3987, not RFC 3986 too.
>>
>> 2) Reference draft-ietf-appsawg-uri-scheme-reg in https://url.spec.whatwg.org/#url-writing as the way to register schemes, stating that the set of valid URI schemes is the set of valid URL schemes.
>>
>> 3) Explicitly state that canonical URLs (i.e., the output of the URL parse step) not only round trip but also are valid URIs. If there are any RFC 3986 errata and/or willful violations necessary to make that a true statement, so be it.
>>
>> That's it. The rest of the URL specification can stand as is.
>>
>> What this means operationally is that there are two terms, URIs and URLs. URIs would be a legacy, academic topic that may be of relevance to some (primarily back-end server) applications. URLs are what most people, and most applications, will be concerned with. This includes all the specifications which today reference IRIs (as an example, RFC 4287, namely, Atom).
>>
>> My sense was that all of the people I talked to were generally OK with this, and that we would be likely to see statements from both the IETF and the W3C TAG along these lines mid November-ish, most likely just after IETF meeting 91.
>>
>> More specifically, if something along these lines I describe above were done, the IETF would be open to the idea of errata to RFC3987 and updating specs to reference URLs.
>>
>> - Sam Ruby
>>
>> [1] http://intertwingly.net/projects/pegurl/url.html
>> [2] https://www.ietf.org/meeting/91/index.html
>> [3] https://url.spec.whatwg.org/#relative-scheme
Re: [whatwg] allow in body + DOM position as a rendering hint
On Sat, 01 Nov 2014 02:34:42 +0200, Ilya Grigorik wrote: Before we get into the pros and cons of "scoped", I think it's important to highlight that in body is already a fact of life: 1) developers already put tags in body, specs be damned. 2) all browsers support tags in body because of #1. Given the above conditions, the spec is out of sync with reality and I think it's worth considering updating the spec to reflect this? Doing so would also allow the browsers to convert this case from an error condition into an optimization - e.g. we can treat position as a hint to optimize rendering. I think this line of reasoning is missing one consideration, namely the negative effect of using or
Re: [whatwg] HTML has no definition / automated test suite
On Sat, Nov 1, 2014 at 7:18 AM, Stefan Reich wrote:
> Hi WhatWG and friends!
>
> I am currently making an AI to create HTML. In the process, I discovered a logical problem: HTML is not clearly defined. Not as far as I know anyway.
>
> A proper definition of HTML would include collections of sample HTML source plus IMAGES of how they look rendered.
>
> That, my AI could work with.
>
> Also, I think this is very important to have - I vividly remember all those years of fighting with browser inconsistencies and the very undefinedness I am talking about that still exists. (More on my blog at tinybrain.de).
>
> Q: Does such a test suite for HTML exist? If not, it is time to create that.
>
> Alternatively, what one could create is a virtualized browser. My AI could also learn from that, basically. But a virtualized browser is a complicated piece of software, and there is no proper infrastructure for virtual programs yet (another lack in IT today).
>
> So I assume a test suite of sources + images will be easier to make right now.
>
> Let's define HTML properly!

The behavior of HTML is well-defined; where it's not, it's a bug, and reporting it would be appreciated.

You're talking about rendering, which is the domain of CSS. CSS should also be reasonably well-defined.

Examples, including example renderings, are often useful for understanding, but they're never part of an actual definition. They just help the reader visualize something quickly, rather than requiring them to understand it all from the code.

~TJ
[whatwg] HTML has no definition / automated test suite
Hi WhatWG and friends! I am currently making an AI to create HTML. In the process, I discovered a logical problem: HTML is not clearly defined. Not as far as I know anyway. A proper definition of HTML would include collections of sample HTML source plus IMAGES of how they look rendered. That, my AI could work with. Also, I think this is very important to have - I vividly remember all those years of fighting with browser inconsistencies and the very undefinedness I am talking about that still exists. (More on my blog at tinybrain.de). Q: Does such a test suite for HTML exist? If not, it is time to create that. Alternatively, what one could create is a virtualized browser. My AI could also learn from that, basically. But a virtualized browser is a complicated piece of software, and there is no proper infrastructure for virtual programs yet (another lack in IT today). So I assume a test suite of sources + images will be easier to make right now. Let's define HTML properly! Cheers from Hamburg, Stefan
Re: [whatwg] [url] Feedback from TPAC
On Sat, Nov 1, 2014 at 1:29 PM, Sam Ruby wrote:
> https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg-04

I don't know how it should change. I just said that it doesn't say that you cannot invent new hierarchical URIs (to use IETF terms). As long as the IETF keeps building on top of RFC 3986, the mismatch will continue.

>> I just gave you one, "%"... E.g. "http://example.org/?%" does not have an RFC 3986 representation.
>
> Here's the output of a URI parser:
>
> $ ruby -r addressable/uri -e "p Addressable::URI.parse('http://example.org/?%').query"
> "%"

That's a bug in that parser, then. (Assuming it meant to conform to RFC 3986.) Not sure how that helps.

-- 
https://annevankesteren.nl/
Re: [whatwg] [url] Feedback from TPAC
On 11/1/14 7:56 AM, Anne van Kesteren wrote:
> On Sat, Nov 1, 2014 at 12:38 PM, Sam Ruby wrote:
>> On 11/1/14 5:29 AM, Anne van Kesteren wrote:
>>> It doesn't say that. (We should perhaps try to find some way to make "{scheme}://" syntax work for schemes that are not problematic (e.g. javascript would be problematic). Convincing implementers that it's worth implementing might be trickier.)
>>
>> How should it change?
>
> Not sure what you're referring to.

https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg-04

> I just gave you one, "%"... E.g. "http://example.org/?%" does not have an RFC 3986 representation.

Here's the output of a URL parser (the one I chose was Firefox):

new URL("http://example.com/?%").search
"?%"

Here's the output of a URI parser:

$ ruby -r addressable/uri -e "p Addressable::URI.parse('http://example.org/?%').query"
"%"

I also assert that such a URL round-trips a URL parse/serialize sequence.

- Sam Ruby
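
The round-trip claim above can be checked with a small sketch. It assumes a runtime that exposes the WHATWG `URL` API (a browser or Node.js); the helper name `roundTrips` is hypothetical and only serves the illustration.

```javascript
// Hypothetical helper: parse, serialize, parse again, and compare the two
// serializations. A URL "round-trips" if the second pass changes nothing.
function roundTrips(input) {
  const first = new URL(input).href;   // parse + serialize
  const second = new URL(first).href;  // parse + serialize the result again
  return first === second;
}

// The URL from the message: a query consisting of a lone "%".
const parsed = new URL("http://example.org/?%");

console.log(parsed.search);                        // "?%"
console.log(roundTrips("http://example.org/?%"));  // true
```

This only shows that the URL parser is stable on its own output; it says nothing about whether that output is a valid RFC 3986 URI.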
Re: [whatwg] [url] Feedback from TPAC
On Sat, Nov 1, 2014 at 12:38 PM, Sam Ruby wrote:
> On 11/1/14 5:29 AM, Anne van Kesteren wrote:
>> It doesn't say that. (We should perhaps try to find some way to make "{scheme}://" syntax work for schemes that are not problematic (e.g. javascript would be problematic). Convincing implementers that it's worth implementing might be trickier.)
>
> How should it change?

Not sure what you're referring to.

> Acknowledging that other parsers exist is quite a different statement than requiring two parsers. I'm only suggesting the former.

We haven't done that for other formats. And it doesn't help with convergence.

> That may not be as we would wish it to be. But it would be a disservice to everyone to document how we would wish things to be rather than how they actually are (and, by all indications, are likely to remain for the foreseeable future).

This contradicts most WHATWG work. WHATWG standards describe how things should be, taking into account the realities of deployed content. That is not to say that documenting how things actually are is not worthwhile, it's just not what we do. We describe something that hopefully leads to convergence between implementations. That way developers, five to ten years or so from now, no longer have to paper over the differences.

> I do plan to work with others to figure out the delta. As to the data models, at the present time -- and without having actually done the necessary analysis -- I am not aware of a single case where they would be different.

I just gave you one, "%"... E.g. "http://example.org/?%" does not have an RFC 3986 representation.

-- 
https://annevankesteren.nl/
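
To make the "%" case concrete: RFC 3986 section 2.1 defines a percent-encoded octet as "%" followed by two hex digits, so a lone "%" cannot appear in a URI component. The sketch below checks only that one rule, not the full RFC 3986 grammar; the helper name `hasOnlyValidPercentEscapes` is hypothetical, and the WHATWG `URL` API is assumed to be available.

```javascript
// Hypothetical helper: true when every "%" in the component starts a
// pct-encoded triplet ("%" HEXDIG HEXDIG, RFC 3986 section 2.1).
// This checks percent escapes only, not the rest of the URI grammar.
function hasOnlyValidPercentEscapes(component) {
  return !/%(?![0-9A-Fa-f]{2})/.test(component);
}

// Query of the URL from the message, as produced by a URL Standard parser.
const query = new URL("http://example.org/?%").search.slice(1);

console.log(query);                                   // "%"
console.log(hasOnlyValidPercentEscapes(query));       // false
console.log(hasOnlyValidPercentEscapes("q=%25"));     // true
```

Under these assumptions, the URL parser's canonical output for this input still fails the RFC 3986 percent-encoding rule, which is the mismatch being pointed out.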
Re: [whatwg] [url] Feedback from TPAC
On 11/1/14 5:29 AM, Anne van Kesteren wrote:
> On Sat, Nov 1, 2014 at 1:01 AM, Sam Ruby wrote:
>> Meanwhile, the IETF is actively working on an update:
>>
>> https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg-04
>>
>> They are meeting F2F in a little over a week. URIs in general, and this proposal in particular, will be discussed, and for that reason now would be a good time to provide feedback. I've only quickly scanned it, but it appears sane to me in that it basically says that new schemes will not be viewed as relative schemes.
>
> It doesn't say that. (We should perhaps try to find some way to make "{scheme}://" syntax work for schemes that are not problematic (e.g. javascript would be problematic). Convincing implementers that it's worth implementing might be trickier.)

How should it change?

>> 1) Change the URL Goals to only obsolete RFC 3987, not RFC 3986 too.
>
> See previous threads on the subject. The data models are incompatible, at least around "%", likely also around other code points. It also seems unacceptable to require two parsers for URLs.

Acknowledging that other parsers exist is quite a different statement than requiring two parsers. I'm only suggesting the former.

As a concrete statement, a compliant implementation of HTML would require a URL parser, but not a URI parser.

Also as a concrete statement, such a user agent will interact, primarily via the network, with other software that will interpret the canonicalized URLs as if they were URIs.

That may not be as we would wish it to be. But it would be a disservice to everyone to document how we would wish things to be rather than how they actually are (and, by all indications, are likely to remain for the foreseeable future).

>> 3) Explicitly state that canonical URLs (i.e., the output of the URL parse step) not only round trip but also are valid URIs. If there are any RFC 3986 errata and/or willful violations necessary to make that a true statement, so be it.
>
> It might be interesting to figure out the delta. But there are major differences between RFC 3986 and URL. Not obsoleting the former seems like a disservice to anyone looking to implement a parser or find information on URI/URL.

I do plan to work with others to figure out the delta. As to the data models, at the present time -- and without having actually done the necessary analysis -- I am not aware of a single case where they would be different. Undoubtedly we will be able to quickly find some, but even so, I would assert that the following statements will remain true for the domain of canonicalized URLs, by which I mean the set of possible outputs of the URL serializer:

1) the overlap is substantial, and I would dare say overwhelming.

2) RFC 3986 and URL compliant parsers would interpret the same bytes in such outputs as delimiters, schemes, paths, fragments, etc.

3) as to data models, the URL Standard is silent as to how such bytes are to be interpreted.

As to the meaning of '%', both the URL Standard and RFC3986 recognize that encodings other than utf-8 exist, and that such will affect the interpretation of percent encoded byte sequences.

- Sam Ruby
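
Statement 2 can be spot-checked for a single serialized URL with a rough sketch. It uses the component-splitting regular expression from RFC 3986 Appendix B on one side and the WHATWG `URL` API on the other; the helper names `splitLikeRfc3986` and `splitLikeUrlStandard` are hypothetical, and one sample proves nothing about the general case.

```javascript
// Regular expression from RFC 3986, Appendix B, for splitting a URI
// reference into its five components.
const RFC3986_APPENDIX_B =
  /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical helper: component split as RFC 3986 Appendix B describes it.
function splitLikeRfc3986(serialized) {
  const m = RFC3986_APPENDIX_B.exec(serialized);
  return { scheme: m[2], authority: m[4], path: m[5], query: m[7], fragment: m[9] };
}

// Hypothetical helper: the same five components read off the URL API.
// Assumption: no userinfo in the input, since `host` excludes credentials.
function splitLikeUrlStandard(serialized) {
  const u = new URL(serialized);
  return {
    scheme: u.protocol.slice(0, -1),                    // drop trailing ":"
    authority: u.host,
    path: u.pathname,
    query: u.search ? u.search.slice(1) : undefined,    // drop leading "?"
    fragment: u.hash ? u.hash.slice(1) : undefined,     // drop leading "#"
  };
}

// A canonicalized URL (serializer output) that includes the contested "%".
const sample = new URL("http://example.org/a/b?x=%#frag").href;

console.log(splitLikeRfc3986(sample));
console.log(splitLikeUrlStandard(sample));
```

For this sample both splits agree on where the delimiters fall, even though the query contains a "%" that RFC 3986 would not accept as valid; agreement on delimiters and agreement on validity are separate questions.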
Re: [whatwg] [url] Feedback from TPAC
On Sat, Nov 1, 2014 at 1:01 AM, Sam Ruby wrote:
> Meanwhile, the IETF is actively working on an update:
>
> https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg-04
>
> They are meeting F2F in a little over a week. URIs in general, and this proposal in particular, will be discussed, and for that reason now would be a good time to provide feedback. I've only quickly scanned it, but it appears sane to me in that it basically says that new schemes will not be viewed as relative schemes.

It doesn't say that. (We should perhaps try to find some way to make "{scheme}://" syntax work for schemes that are not problematic (e.g. javascript would be problematic). Convincing implementers that it's worth implementing might be trickier.)

> 1) Change the URL Goals to only obsolete RFC 3987, not RFC 3986 too.

See previous threads on the subject. The data models are incompatible, at least around "%", likely also around other code points. It also seems unacceptable to require two parsers for URLs.

> 3) Explicitly state that canonical URLs (i.e., the output of the URL parse step) not only round trip but also are valid URIs. If there are any RFC 3986 errata and/or willful violations necessary to make that a true statement, so be it.

It might be interesting to figure out the delta. But there are major differences between RFC 3986 and URL. Not obsoleting the former seems like a disservice to anyone looking to implement a parser or find information on URI/URL.

-- 
https://annevankesteren.nl/