Re: [whatwg] Unicode -> ASCII copy/paste fallback
On Sun, Feb 15, 2015 at 3:16 AM, Glenn Maynard gl...@zewt.org wrote:
On Sat, Feb 14, 2015 at 12:34 PM, David Sheets kosmo...@gmail.com wrote:
I am writing a documentation generation tool for a programming language with right arrows represented as -> but would like to render them as →. Programmers are used to writing in ASCII and reading typeset mathematics. If I present documentation to them via a purpose-built document browser, I should give them the option (at the generation/styling stage) of making those documents as pleasing as possible.
Programmers a decade or two ago, maybe, but not today. As a programmer, if I see → on a page, select it and copy it, I expect to copy →, just as I selected. This sounds like something browsers should actively discourage.
If you're reading documentation which includes types, it's nice to see implication arrows but copy valid syntax. Programming communities which use types or other formal methods commonly typeset their own documents with mathematical notation. For practical reasons, they define their language representations using ASCII. If you have nothing more useful to discuss beyond uninformed, opinionated naysaying, I'll be leaving this thread lie.
-- Glenn Maynard
Re: [whatwg] Unicode -> ASCII copy/paste fallback
On Fri, Feb 13, 2015 at 11:11 PM, Glenn Maynard gl...@zewt.org wrote:
On Fri, Feb 13, 2015 at 9:02 AM, Glenn Maynard gl...@zewt.org wrote:
Copying ASCII isn't desirable. It should copy the Unicode string "a → b". After all, that's what gets copied if you had done <span>a → b</span> in the first place.
(Oh, I missed the obvious--the -> from Firefox is coming from the HTML, of course.) I guess what you're after is being able to have separate text for display vs. copy. I'm sure you don't actually want to use a hacky custom font. What's the actual use case? In general I think browsers should always copy just what the user selected, and not let pages cause something other than that to be copied, since things like that are generally abused (eg. inserting linkback ads to copied text).
I am writing a documentation generation tool for a programming language with right arrows represented as -> but would like to render them as →. Programmers are used to writing in ASCII and reading typeset mathematics. If I present documentation to them via a purpose-built document browser, I should give them the option (at the generation/styling stage) of making those documents as pleasing as possible. A ligature font is the closest thing I've seen so far to being semantically accurate and degrades gracefully. It's not quite totally accurate as I already have a font and it already has all the glyphs I'd like to use. In this way, the font is a level too low at character rendering rather than glyph selection. Fortunately, the surrounding font doesn't really matter so the ligature font can be made to fit quite well. Unfortunately, I can imagine much simpler expressions of the behavior I'm after that don't involve talking about vectorized graphics and transmitting kilobytes of typeface description. I think the current behavior of CSS content not being copied makes some sense. It is *stylistic* content after all...
I'm a bit disappointed that vendors can't seem to agree on what content is hidden as Boris has said. -- Glenn Maynard
Re: [whatwg] Unicode -> ASCII copy/paste fallback
On Fri, Feb 13, 2015 at 10:23 PM, Ashley Gullen ash...@scirra.com wrote:
Why is it desirable to copy ASCII versions of unicode text? Doesn't most software now support unicode so the user can copy and paste what they see, rather than some ASCII-art equivalent?
I am writing a documentation generation tool for a programming language with right arrows represented as -> but would like to render them as →.
On 13 February 2015 at 15:45, Boris Zbarsky bzbar...@mit.edu wrote:
On 2/13/15 10:15 AM, David Sheets wrote:
I suppose currently Chrome is preventing copying hidden content but Firefox is not and neither picks up the CSS content.
Both prevent copying hidden content, but may not have identical definitions of hidden. Neither picks up CSS generated content, because both represent selections in terms of DOM ranges, and DOM ranges can't represent CSS generated content... -Boris
[whatwg] Unicode -> ASCII copy/paste fallback
Hello,
I have a page with a <span class="rarr"><span>-&gt;</span></span> b and style .rarr span { overflow: hidden; height: 0; width: 0; display: inline-block; } .rarr::after { content: "→"; } (That's RIGHTWARDS ARROW x2192.)
In Firefox 36, this copies and pastes like "a -> b" which is the desired behavior. In Chrome 40, this copies and pastes like "a b". Is my desired behavior (to show unicode but copy an ASCII representation) generally possible? Are there specs somewhere about copy/paste behavior? I looked in https://html.spec.whatwg.org/ but found nothing relevant. Is this the right venue for this question? Should I take it somewhere else?
Thanks, David Sheets
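For comparison only, the copy text asked for above can be written down as a plain string mapping. This is a minimal sketch and not a description of how Firefox or Chrome assemble clipboard contents; `ASCII_FALLBACKS` and `toAsciiFallback` are hypothetical names invented for illustration, and the table holds only the one arrow discussed in this thread.

```javascript
// Hypothetical fallback table: displayed character -> ASCII source spelling.
const ASCII_FALLBACKS = new Map([
  ["\u2192", "->"], // RIGHTWARDS ARROW
]);

// Replace every mapped character in `text` with its ASCII spelling.
// Characters without an entry in the table are passed through unchanged.
function toAsciiFallback(text) {
  let out = "";
  for (const ch of text) { // for...of iterates by code point
    out += ASCII_FALLBACKS.has(ch) ? ASCII_FALLBACKS.get(ch) : ch;
  }
  return out;
}

console.log(toAsciiFallback("a \u2192 b")); // prints: a -> b
```

Because the mapping is an explicit table rather than automatic transliteration, a document generator would decide per symbol which source spelling is copied.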
Re: [whatwg] Unicode -> ASCII copy/paste fallback
On Fri, Feb 13, 2015 at 1:08 PM, Nils Dagsson Moskopp n...@dieweltistgarnichtso.net wrote:
David Sheets kosmo...@gmail.com writes:
On Fri, Feb 13, 2015 at 12:18 PM, James M. Greene james.m.gre...@gmail.com wrote:
In this case, you can use Unicode escape values by preceding them with a slash: .rarr:after { content: "\2192"; } This is specified in the CSS 2.1 spec: http://www.w3.org/TR/CSS2/syndata.html#characters Personally, I probably would've just started on StackOverflow with this question (e.g. [1]) but no harm done.
Hi James! Sorry, I wasn't clear. The issue is not with putting Unicode values into CSS. The issue is that I would like unicode values to be copied and pasted as a specific ASCII fallback value. That is, I would like the equivalent of "a &rarr; b" to appear on a page but, upon copying, "a -> b" to show up in the clipboard. I have a solution that works in Firefox 36 (described in original mail). Chrome 40 does not behave similarly. I can see some arguments for Chrome's behavior along security lines. I certainly can understand the utility of Firefox's behavior because I am writing a documentation generation tool for a programming language with right arrows represented as -> but would like to render them as →.
I would suggest to use OpenType ligatures for that. You could reasonably create a ligature font that renders any occurrence of “->” as “→”.
This is a really brilliant solution that satisfies my use case perfectly. I created the following (horrible) font that works as expected.
data:application/font-woff;base64,d09GRk9UVE8AAAXQAA0ACAwAAQBDRkYgAAABMY4AAAGdFwHC50ZGVE0AAALAHBxuRRqXR0RFRgAAAtwqNABJADhHUE9TAAADCCAgRHZMdUdTVUIAAAMoQFCxBbQIT1MvMgAAA2gAAABHYFYhYsBjbWFwAAADsEMAAAFKAmIC1WhlYWQAAAP0MDYFFMPmaGhlYQAABCQdJAZsBDtobXR4AAAERBQUEegCzm1heHRYBgYABVAAbmFtZQAABGFZAAACi0q47Qlwb3N0AAAFvBMg/4MAM3icRY9BSxtRFIXvSzINhuGlCZlGwjSORGgL6cRKcSFmY0gQYtMoVupKDHnJhNYoydMacOdCcewu0EV2pf8gUOlm6EY63ZdCV8WV/gG9z7xAjF2kZ3XOtzjwEfD5gBASeFPnNf6elV8A8QCBJREE8ZiIuEdMeEXUF7/wfBmo3vjy54HqiwdgbG8tZdujovqRfu+N3ZqKDn+DOsBDHa5COoR1YodBuX8MQhTWoQTvrNaOxeob1Qbb5KxR2Nxir5g5k/k3y0apZZSbFmO8mUwmjQ81bhm57TrPbTeqzJgxp42nFuc7c6lUZUgr99RsVsw6489GBv9VhqEwQY7IMSiETM2+3u30Q7Yjig5xHHQdr/NImOJr33zg9A81UUS3X/RT/IYJgpNiUZOJ2wNMDMm56GqYk0wW5fJlCn/hmezgb4ViFylBKp5rtqRi/GQh7WYxg5nTH3IVm9kCKjKs0Bt0MUbwpShoH2VMuHb6yc8szuOK/Ue+xf18/lpGFNprR3ohzVYDmAgg/dRW1Ttgr6BKAAABAMw9os8A0QO3ygDRA7/9eJwlibENACAMwxzamSsY+f8+XKHI9hACbDlcvSiiY8u2i8wD+TwTFwCCAQoAHAAeAAFERkxUAAgABAD//wAAAHicY2BkYGDgYpBj0GFgdHHzCWHgYGABijD8/88AkmHMyUxPBIoxQHhAORYwzQHEQlCahYGZgQkIGUEQACkoBXB4nGNgYWFg/MLAysDANJPpDAMDQz+EZnzNYMzICRRlYGNmgAFGBiQQkOaawnCAQZfBjtn4vzFDDNOs/6dR1CgAIRMAc0gMkAB4nGNgYGBmgGAZBkYGEHAB8hjBfBYGDSDNBqQZGZgYdBns/v8H8sH0/yv/j0DVAwEjGwOCQwzAUMxEiu5BCQAKpAk1AHicY2BkYGAA4hCza8vj+W2+MnCzMIDAReYDvHC6isGUuZRpFpDLwcAEEgUABpUIx3icY2BkYGCaxWDKEMPCAALMpQyMDKiAFQAsSwGxAgQAAI0EAAEnBAAAegPoAKAAAFUAAHichZDNSsNAFIXP9A8KItInmI1QIU0nKd1kaaGI4NLuWzJpAjUpyZTSrYgrn8VXcO3atWufwJ0LT6ZjQQSbYe797uHOmTsBcIpnCOy/Szw6Fuji3XEDHXw6buJcXDluoSvuHbdxJn58OtTf2ClaXVYP9lTNAj28Om7gBB+Om7jGl+MWeiJ33IYUT4471F8wQQmNOQxjDIkFdowxKqRUNPUKnl0SW2SsU9IUBXJynUss2ScRwodi7rPDcK0RYciVuN7k0OvTM2HMrf8FMCn13OhYLnYyrlKtTeV5ntxmJpXTIjfTolxqGfpK9lNj1tFwmFBNatWvEj/Xhh639pJ6wJV9SkApN5lZ6Zh4Y7UMG9yx0HG2Yf7vFRH3X8u9HmCEATvrrViNafVrzEgeriYHo0E4CFUwPjbkjFrJf5PZuSS9a3ff5nomzHRZZUUulQp8pZQ8YvgNzG5wnHicY2BmAIP/DQzGDFgAACgUAbYA The browser inconsistency in the original case still stands, though. Is there a spec covering copy and paste? 
David -- Nils Dagsson Moskopp // erlehmann http://dieweltistgarnichtso.net
Re: [whatwg] Unicode -> ASCII copy/paste fallback
On Fri, Feb 13, 2015 at 3:02 PM, Glenn Maynard gl...@zewt.org wrote:
On Fri, Feb 13, 2015 at 5:45 AM, David Sheets kosmo...@gmail.com wrote:
Hello, I have a page with a <span class="rarr"><span>-&gt;</span></span> b and style .rarr span { overflow: hidden; height: 0; width: 0; display: inline-block; } .rarr::after { content: "→"; } (That's RIGHTWARDS ARROW x2192.) In Firefox 36, this copies and pastes like "a -> b" which is the desired behavior. In Chrome 40, this copies and pastes like "a b". Is my desired behavior (to show unicode but copy an ASCII representation) generally possible? Are there specs somewhere about copy/paste behavior? I looked in https://html.spec.whatwg.org/ but found nothing relevant.
Copying ASCII isn't desirable. It should copy the Unicode string "a → b". After all, that's what gets copied if you had done <span>a → b</span> in the first place. (Chrome's issue isn't related to Unicode. It just doesn't know how to select text that's inside CSS content, so it isn't included in the copy.)
The only relation this issue has to Unicode is a use case for alternate copy/paste behavior. Judging from the replies to my original inquiry, either Firefox or Chrome is doing something unexpected or both are behaving unexpectedly (and should put the unicode arrow on the clipboard). I'm not sure if all use cases for my original trick can be covered by using OpenType ligatures (thanks, Nils!) or if there are other 'alternative clipboard behavior' applications. Certainly, the most consistent behavior would be for both Chrome and Firefox (and other browsers that I haven't/don't care to test) to put the CSS content on the clipboard and ignore hidden content. I suppose currently Chrome is preventing copying hidden content but Firefox is not and neither picks up the CSS content.
David
Re: [whatwg] Unicode -> ASCII copy/paste fallback
On Fri, Feb 13, 2015 at 12:18 PM, James M. Greene james.m.gre...@gmail.com wrote:
In this case, you can use Unicode escape values by preceding them with a slash: .rarr:after { content: "\2192"; } This is specified in the CSS 2.1 spec: http://www.w3.org/TR/CSS2/syndata.html#characters Personally, I probably would've just started on StackOverflow with this question (e.g. [1]) but no harm done.
Hi James! Sorry, I wasn't clear. The issue is not with putting Unicode values into CSS. The issue is that I would like unicode values to be copied and pasted as a specific ASCII fallback value. That is, I would like the equivalent of "a &rarr; b" to appear on a page but, upon copying, "a -> b" to show up in the clipboard. I have a solution that works in Firefox 36 (described in original mail). Chrome 40 does not behave similarly. I can see some arguments for Chrome's behavior along security lines. I certainly can understand the utility of Firefox's behavior because I am writing a documentation generation tool for a programming language with right arrows represented as -> but would like to render them as →. This seems like a pretty straightforward document feature but I can't seem to get interoperable behavior (or even find where such behavior might be specified).
Thanks, David
[1]: http://stackoverflow.com/questions/10393462/placing-unicode-character-in-css-content-value
Sincerely, James Greene
On Fri, Feb 13, 2015 at 5:45 AM, David Sheets kosmo...@gmail.com wrote:
Hello, I have a page with a <span class="rarr"><span>-&gt;</span></span> b and style .rarr span { overflow: hidden; height: 0; width: 0; display: inline-block; } .rarr::after { content: "→"; } (That's RIGHTWARDS ARROW x2192.) In Firefox 36, this copies and pastes like "a -> b" which is the desired behavior. In Chrome 40, this copies and pastes like "a b". Is my desired behavior (to show unicode but copy an ASCII representation) generally possible? Are there specs somewhere about copy/paste behavior?
I looked in https://html.spec.whatwg.org/ but found nothing relevant. Is this the right venue for this question? Should I take it somewhere else? Thanks, David Sheets
Re: [whatwg] Unicode -> ASCII copy/paste fallback
On Fri, Feb 13, 2015 at 12:23 PM, Mathias Bynens mathi...@opera.com wrote:
On Fri, Feb 13, 2015 at 1:18 PM, James M. Greene james.m.gre...@gmail.com wrote:
In this case, you can use Unicode escape values by preceding them with a slash:
OP’s question wasn’t about how to escape non-ASCII characters, but rather about what the copy/paste behavior should be in browsers. @David, I don’t think it’s reasonable to expect non-ASCII characters to be transliterated to ASCII characters when copying them. That said, it would be nice to standardize on the behavior here: should generated content be included when copying or not?
Hi Mathias, Do you mean that it's not reasonable for transliteration to happen automatically? I agree. Do you mean that it's not reasonable to support specific replacements during copying? Firefox seems to support this currently (and perfectly). There are user trickery concerns here but, at least in my case, I think codepoint -> 2 byte replacement is probably safe...
Thanks, David
Re: [whatwg] New URL Standard
On Mon, Sep 24, 2012 at 9:18 PM, Ian Hickson i...@hixie.ch wrote:
This is Anne's spec, so I'll let him give more canonical answers, but:
On Mon, 24 Sep 2012, David Sheets wrote:
Your conforming WHATWG-URL syntax will have production rule alphabets which are supersets of the alphabets in RFC3986.
Not necessarily, but that's certainly possible. Personally I would recommend that we not change the definition of what is conforming from the current RFC3986/RFC3987 rules, except to the extent that the character encoding affects it (as per the HTML standard today). http://whatwg.org/html#valid-url
I believe the '#' character in the fragment identifier qualifies.
This is what I propose you define and it does not necessarily have to be in BNF (though a production rule language of some sort probably isn't a bad idea).
We should definitely define what is a conforming URL, yes (either directly, or by reference to the RFCs, as HTML does now). Whether prose or a structured language is the better way to go depends on what the conformance rules are -- HTML is a good example here: it has parts that are defined in terms of prose (e.g. the HTML syntax as a whole), and other parts that are defined in terms of BNF (e.g. constraints on the contents of script elements in certain situations). It's up to Anne.
HTML is far larger and more compositional than URI. I am confident that, no matter what is specified in the WHATWG New URL Standard, a formal language exists which can describe the structure of conforming identifiers. If no such formal language can be described, the syntax specification is likely to be incomplete or unsound.
How will WHATWG-URLs which use the syntax extended from RFC3986 map into RFC3986 URI references for systems that only support those?
The same way that those systems handle invalid URLs today, I would assume. Do you have any concrete systems in mind here? It would be good to add them to the list of systems that we test.
(For what it's worth, in practice, I've never found software that exactly followed RFC3986 and also rejected any non-conforming strings. There are just too many invalid URLs out there for that to be a viable implementation strategy.) It is not the rejection of incoming nonconforming reference identifiers that causes issues but rather the emission of strictly conforming identifiers by Postel's Law (Robustness Principle). I know of several URI implementations that, given a nonconforming reference identifier, will only output conforming identifiers. Indeed, the standard under discussion will behave in exactly this way. This leads to loss of information in chains of URI processors that can and will change the meaning of identifiers. I remember when I was testing this years ago, when doing the first pass on attempting to fix this, that I found that some less widely tested software, e.g. wget(1), did not handle URLs in the same manner as more widely tested software, e.g. IE, with the result being that Web pages were not handled interoperably between these two software classes. This is the kind of thing we want to stop, by providing a single way to parse all input strings, valid or invalid, as URLs. Was wget in violation of the RFC? Was IE more lenient? If every string, valid or invalid, is parseable as a URI reference, is there an algorithm to accurately extract URIs from plain text? -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
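The information-loss concern raised above can be observed with the `URL` class that is a global in current browsers and Node.js. This is a minimal sketch of present-day behavior, not of the 2012 draft under discussion; the helper name `normalize` and the example.com inputs are assumptions made for illustration.

```javascript
// Hypothetical helper: parse any accepted input and return the
// serialization the URL parser emits for it.
function normalize(input) {
  return new URL(input).href;
}

// A non-conforming input (raw space, upper-case scheme and host) ...
const fromLoose = normalize("HTTP://EXAMPLE.com/a b");
// ... and an input that was already percent-encoded.
const fromStrict = normalize("http://example.com/a%20b");

console.log(fromLoose);  // http://example.com/a%20b
console.log(fromStrict); // http://example.com/a%20b

// Both inputs serialize identically, so a later processor in the chain
// cannot tell which spelling the author originally wrote.
console.log(fromLoose === fromStrict); // true
```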
Re: [whatwg] New URL Standard
On Tue, Sep 25, 2012 at 8:03 AM, Anne van Kesteren ann...@annevk.nl wrote: On Tue, Sep 25, 2012 at 6:18 AM, Ian Hickson i...@hixie.ch wrote: Not necessarily, but that's certainly possible. Personally I would recommend that we not change the definition of what is conforming from the current RFC3986/RFC3987 rules, except to the extent that the character encoding affects it (as per the HTML standard today). http://whatwg.org/html#valid-url FWIW, given that browsers happily do requests to servers with characters in the URL that are invalid per the RFC (they are not URL escaped) and servers handle them fine I think we should make the syntax more lenient. E.g. allowing [ and ] in the path and query component is fine I think. I believe this would introduce ambiguity for parsing URI references. Is [::1] an authority reference or a path segment reference? As for the question about why not build this on top of RFC 3986. That does not handle non-ASCII code points. RFC 3987 does, but is not a suitable start either. As shown in http://url.spec.whatwg.org/ it is quite trivial to combine parsing, resolving, and canonicalizing into a single algorithm (and deal with URI/IRI, now URL, as one). Composition is often trivial but unenlightening. There is necessarily less information in a partially evaluated function composition than in the functions in isolation. Defining a formal language accurately and in a broadly understandable manner is nontrivial. Your task is nontrivial. Trying to somehow patch the language in RFC 3987 to deal with the encoding problems for the query component, to deal with parsing http:example.org when there is a base URL with the same scheme versus when there isn't, etc. is way more of a hassle I think, though I am happy to be proven wrong. I believe the encoding problems are handled by a normalization algorithm and parsing relative references is handled by the base scheme module. 
What is the acceptable trade-off between (y)our hassle and the time of technologists in the coming decades? Will you make it easier or harder for them to reconcile WHATWG-URL and Internet Standard 66 (RFC 3986)? -- http://annevankesteren.nl/
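The `http:example.org` case mentioned in this message can be tried against the `URL` class available today in browsers and Node.js. This is a sketch of current behavior under the assumption that the runtime follows the present URL Standard; the base.example addresses are made-up placeholders.

```javascript
// No base URL: the scheme is followed directly by what becomes the host.
const withoutBase = new URL("http:example.org").href;

// Base URL with the SAME scheme: the input is treated as a relative path.
const sameSchemeBase =
  new URL("http:example.org", "http://base.example/dir/file").href;

// Base URL with a DIFFERENT scheme: the base is not used for the path.
const otherSchemeBase =
  new URL("http:example.org", "https://base.example/dir/file").href;

console.log(withoutBase);     // http://example.org/
console.log(sameSchemeBase);  // http://base.example/dir/example.org
console.log(otherSchemeBase); // http://example.org/
```

The same input string therefore denotes a host in one context and a path segment in another, which is the context sensitivity the grammar-versus-algorithm discussion is about.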
Re: [whatwg] New URL Standard
On Tue, Sep 25, 2012 at 2:13 PM, Glenn Maynard gl...@zewt.org wrote:
On Mon, Sep 24, 2012 at 7:18 PM, David Sheets kosmo...@gmail.com wrote:
Always. The appropriate interface is (string * string?) list. Id est, an association list of keys and nullable values (null is key-without-value and empty string is empty-value). If you prefer to not use a nullable value and don't like tuple representations in JS, you could use type: string list list i.e. [[key_without_value],[],[key,value],[],[numbers,1,2,3,4],[,],[,,]]
This isn't an appropriate interface. It's terrible for 99.9% of use cases, where you really want dictionary-like access.
This is the direct representation of the query string key-value convention. Looking up keys is easy in an association list. Filtering the list retains ordering. Appending to the list is well-defined. Folding into a dictionary is trivial and key merging can be defined according to the author's URL convention.
The right approach is probably to expose the results in an object-like form, as Tab suggests, but to store the state internally in a list-like format, with modifications defined in terms of mutations to the list.
This sounds more complicated to implement while maintaining invariants. A dictionary with an associated total order is an association list.
That is, parsing "a=1&b=2&a=3" would result in an internal representation like [('a', '1'), ('b', '2'), ('a', '3')]. When viewed from script, you see {a: ['1', '3'], 'b': ['2']}. If you serialize it right back to a URL the internal representation is unchanged, so the original order is preserved. The mutation algorithms can then do their best to preserve the list as reasonably as they can (eg. assigning query.a = ['5', '6'] would remove all 'a' keys, then insert items at the location of the first removed item, or append if there were none).
Why hide the order?
Is this not already supported by creating a new URL which contains only a relative query part?
Like: query = new URL("?a=b&c=d"); query.query["a"] = "x"; query.toString() == "?a=x&c=d"; Why is a new interface necessary?
That won't work, since "?a=b&c=d" isn't a valid URL.
"?a=b&c=d" is a valid URI reference. @href="?a=b&c=d" is valid.
The invalid flag will be set, so the change to .query will be a no-op, and .href (presumably what toString will invoke) would return the original URL, "?a=b&c=d", not "?a=x&c=d". You'd need to do something like: var query = new URL("http://example.com?" + url.hash); query.query.a = "x"; url.hash = query.search.slice(1); // remove the leading "?" That's awkward, but maybe it's good enough.
This is a use case for parsing without composed relative resolution.
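The list-backed model with a dictionary-like view, as described in this message, can be sketched as follows. It is an illustration only, not the API that was specified: the function names `parseQuery`, `toView`, `assignKey` and `serializeQuery` are hypothetical, percent-decoding is ignored, and every field is assumed to contain an "=".

```javascript
// Parse "a=1&b=2&a=3" into an ordered list of [key, value] pairs.
function parseQuery(query) {
  if (query === "") return [];
  return query.split("&").map((field) => {
    const i = field.indexOf("=");
    return i === -1 ? [field, ""] : [field.slice(0, i), field.slice(i + 1)];
  });
}

// Dictionary-like view: key -> array of values, in list order.
function toView(pairs) {
  const view = {};
  for (const [key, value] of pairs) {
    if (!Object.prototype.hasOwnProperty.call(view, key)) view[key] = [];
    view[key].push(value);
  }
  return view;
}

// Assign `values` to `key`: remove every existing pair for `key`, then put
// the new pairs where the first removed pair was, or append if none existed.
function assignKey(pairs, key, values) {
  const first = pairs.findIndex(([k]) => k === key);
  const rest = pairs.filter(([k]) => k !== key);
  const fresh = values.map((v) => [key, v]);
  if (first === -1) return rest.concat(fresh);
  // Pairs before `first` all have other keys, so `first` is still the
  // right insertion index in `rest`.
  return rest.slice(0, first).concat(fresh, rest.slice(first));
}

// Serialize the list back to a query string, preserving order.
function serializeQuery(pairs) {
  return pairs.map(([k, v]) => k + "=" + v).join("&");
}

const pairs = parseQuery("a=1&b=2&a=3");
console.log(toView(pairs));                                    // { a: [ '1', '3' ], b: [ '2' ] }
console.log(serializeQuery(pairs));                            // a=1&b=2&a=3
console.log(serializeQuery(assignKey(pairs, "a", ["5", "6"]))); // a=5&a=6&b=2
```

An untouched list serializes back to the original string, and order is only disturbed for the key that is assigned.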
Re: [whatwg] New URL Standard
On Mon, Sep 24, 2012 at 2:34 AM, Anne van Kesteren ann...@annevk.nl wrote:
On Sat, Sep 22, 2012 at 9:10 AM, Alexandre Morgaut alexandre.morg...@4d.com wrote:
Shouldn't this document have references on some of the URL related RFCs:
The plan is to obsolete the RFCs. But yes, I will add some references in the Goals section most likely. Similar to what has been done in the DOM Standard.
Is there an issue with defining WHATWG-URL syntax as a grammar extension to the URI syntax in RFC3986? How about splitting the definition of the parsing algorithm into a canonicalization algorithm and a separate parser for the extended syntax? The type would be string -> string with the codomain as a valid, unique WHATWG-URL serialization. Implementations/IDL could provide only the composition of canonicalization and parsing but humans trying to understand the semantics of the present algorithm would be aided by having these phases explicitly defined. Will any means be provided to map WHATWG-URL to Internet Standard RFC3986-URI? Is interoperability with the deployed base of URL consumers a goal? How will those URLs in the extended syntax be mapped into standard URIs? Will they be unrepresentable?
Thanks, David Sheets
Re: [whatwg] New URL Standard
On Mon, Sep 24, 2012 at 4:07 PM, Glenn Maynard gl...@zewt.org wrote:
On Mon, Sep 24, 2012 at 12:30 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:
I suggest just making it a map from String->[String]. You probably want a little bit of magic - if the setter receives an array, replace the current value with it; anything else, stringify then wrap in an array and replace the current value. The getter should return an empty array for non-existing params. You should be able to set .query itself with an object, which empties out the map and then runs the setter over all the items. Bam, every single method is now obsolete.
When should this API guarantee that it round-trips URLs cleanly (aside from quoting differences)? For example, maintaining order in "a=1&b=2&a=1", and representing things like "a=1&b" (no '=') and "a&&b" (no key at all).
Always. The appropriate interface is (string * string?) list. Id est, an association list of keys and nullable values (null is key-without-value and empty string is empty-value). If you prefer to not use a nullable value and don't like tuple representations in JS, you could use type: string list list i.e. [[key_without_value],[],[key,value],[],[numbers,1,2,3,4],[,],[,,]] becomes ?key_without_valuekey=valuenumbers=1,2,3,4==, where I've assumed that values after the second are concatenated with commas (but it could be semicolons or some other separator). Unfortunately, JavaScript does not have any lightweight product types so a decision like this is necessary.
Not round-tripping URLs might have annoying side-effects, like trying to use history.replaceState to replace the path portion of the URL, and unexpectedly having the query part of the URL get shuffled around or changed in other ways.
That would be unacceptably broken.
Maybe it could guarantee that the query round-trips only if the value is never modified (only assigned via the ctor or assigning to href), but once you modify the query, the order becomes normalized and any other non-round-trip side effects happen.
Why can't as much information as possible be preserved? There exist many URI manipulation libraries that support maximal preservation.
By the way, it would also be nice for the query part of this API to be usable in isolation. I often put query-like strings in the hash, resulting in URLs like "http://example.com/server/side/path?server-side-query=1#client/side/path?client-side-query=1", and it would be nice to be able to work with both of these with the same interface. That is, query = new URLQuery("a=b&c=d"); query["a"] = "x"; query.toString() == "a=x&c=d";
Is this not already supported by creating a new URL which contains only a relative query part? Like: query = new URL("?a=b&c=d"); query.query["a"] = "x"; query.toString() == "?a=x&c=d"; Why is a new interface necessary?
-- Glenn Maynard
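The "(string * string?) list" interface proposed in this message can be sketched in JavaScript with two-element arrays standing in for tuples. This is one possible reading under stated assumptions, not a specified API: `parsePairs`, `serializePairs` and `lookupAll` are hypothetical names, fields are split on "&" only, and no percent-decoding is performed.

```javascript
// Each entry is [key, value]; value is null for a field with no '='
// (key-without-value) and "" for a field ending in '=' (empty-value).
function parsePairs(query) {
  if (query === "") return [];
  return query.split("&").map((field) => {
    const i = field.indexOf("=");
    return i === -1 ? [field, null] : [field.slice(0, i), field.slice(i + 1)];
  });
}

// Inverse of parsePairs: a null value emits the bare key, anything else
// emits key=value. Order and duplicates are kept as they are in the list.
function serializePairs(pairs) {
  return pairs.map(([k, v]) => (v === null ? k : k + "=" + v)).join("&");
}

// Looking up a key is a filter over the list and retains list order.
function lookupAll(pairs, key) {
  return pairs.filter(([k]) => k === key).map(([, v]) => v);
}

const input = "a=1&b&a=1&c=";
const list = parsePairs(input);
console.log(list);                           // [ [ 'a', '1' ], [ 'b', null ], [ 'a', '1' ], [ 'c', '' ] ]
console.log(lookupAll(list, "a"));           // [ '1', '1' ]
console.log(serializePairs(list) === input); // true
```

Because the null and empty-string cases stay distinct, an unmodified list serializes back to exactly the string it was parsed from.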
Re: [whatwg] New URL Standard
On Mon, Sep 24, 2012 at 5:23 PM, Ian Hickson i...@hixie.ch wrote: On Mon, 24 Sep 2012, David Sheets wrote: Is there an issue with defining WHATWG-URL syntax as a grammar extension to the URI syntax in RFC3986? In general, BNF isn't very useful for defining the parsing rules when you also need to handle non-conforming content in a correct manner. Really it is only useful for saying whether or not content is conforming. Your conforming WHATWG-URL syntax will have production rule alphabets which are supersets of the alphabets in RFC3986. This is what I propose you define and it does not necessarily have to be in BNF (though a production rule language of some sort probably isn't a bad idea). If you read my mail carefully, you will notice that I address the non-conforming identifier case in the initial canonicalization algorithm. This normalization step is separate from the syntax of conforming WHATWG-URLs and would define how non-conforming strings are interpreted as conforming strings. The parsing algorithm then provides a map from these strings into a data structure. Error recovery and extended syntax for conforming representations are orthogonal. How will WHATWG-URLs which use the syntax extended from RFC3986 map into RFC3986 URI references for systems that only support those?