Re: [whatwg] Unicode -> ASCII copy/paste fallback

2015-02-15 Thread David Sheets
On Sun, Feb 15, 2015 at 3:16 AM, Glenn Maynard gl...@zewt.org wrote:
 On Sat, Feb 14, 2015 at 12:34 PM, David Sheets kosmo...@gmail.com wrote:

 I am writing a documentation generation tool for a programming
 language with right arrows represented as -> but would like to render
 them as →. Programmers are used to writing in ASCII and reading
 typeset mathematics. If I present documentation to them via a
 purpose-built document browser, I should give them the option (at the
 generation/styling stage) of making those documents as pleasing as
 possible.


 Programmers a decade or two ago, maybe, but not today.

 As a programmer, if I see → on a page, select it and copy it, I expect to
 copy →, just as I selected.  This sounds like something browsers should
 actively discourage.

If you're reading documentation which includes types, it's nice to see
implication arrows but to copy valid syntax.

Programming communities which use types or other formal methods
commonly typeset their own documents with mathematical notation. For
practical reasons, they define their language representations using
ASCII.

If you have nothing more useful to discuss beyond uninformed,
opinionated naysaying, I'll let this thread lie.

 --
 Glenn Maynard



Re: [whatwg] Unicode -> ASCII copy/paste fallback

2015-02-14 Thread David Sheets
On Fri, Feb 13, 2015 at 11:11 PM, Glenn Maynard gl...@zewt.org wrote:
 On Fri, Feb 13, 2015 at 9:02 AM, Glenn Maynard gl...@zewt.org wrote:

 Copying ASCII isn't desirable.  It should copy the Unicode string "a → b".
 After all, that's what gets copied if you had done <span>a → b</span> in
 the first place.


 (Oh, I missed the obvious--the -> from Firefox is coming from the HTML, of
 course.)

 I guess what you're after is being able to have separate text for display
 vs. copy.  I'm sure you don't actually want to use a hacky custom font.
 What's the actual use case?  In general I think browsers should always copy
 just what the user selected, and not let pages cause something other than
 that to be copied, since things like that are generally abused (eg.
 inserting linkback ads to copied text).

I am writing a documentation generation tool for a programming
language with right arrows represented as -> but would like to render
them as →. Programmers are used to writing in ASCII and reading
typeset mathematics. If I present documentation to them via a
purpose-built document browser, I should give them the option (at the
generation/styling stage) of making those documents as pleasing as
possible.

A ligature font is the closest thing I've seen so far to being
semantically accurate, and it degrades gracefully.

It's not entirely accurate, though, as I already have a font and it
already has all the glyphs I'd like to use. In this sense, the ligature
font operates a level too low: at character rendering rather than glyph
selection. Fortunately, the surrounding font doesn't really matter, so
the ligature font can be made to fit quite well. Unfortunately, I can
imagine much simpler expressions of the behavior I'm after that don't
involve talking about vectorized graphics and transmitting kilobytes
of typeface description.
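
For illustration, the kind of simpler expression I have in mind could be
a small script that rewrites the selection on copy (just a sketch; it
assumes a browser that exposes clipboardData on the copy event, and the
arrow mapping is specific to my case):

document.addEventListener('copy', function (e) {
  var text = window.getSelection().toString();
  if (!text) return;
  // Swap the display arrow back to the ASCII source form before it
  // reaches the clipboard.
  e.clipboardData.setData('text/plain', text.replace(/\u2192/g, '->'));
  e.preventDefault(); // otherwise the browser writes the selected text itself
});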

I think the current behavior of CSS content not being copied makes
some sense. It is *stylistic* content, after all... I'm a bit
disappointed that vendors can't seem to agree on what content counts as
hidden, as Boris has said.

 --
 Glenn Maynard



Re: [whatwg] Unicode -> ASCII copy/paste fallback

2015-02-14 Thread David Sheets
On Fri, Feb 13, 2015 at 10:23 PM, Ashley Gullen ash...@scirra.com wrote:
 Why is it desirable to copy ASCII versions of unicode text? Doesn't most
 software now support unicode so the user can copy and paste what they see,
 rather than some ASCII-art equivalent?

I am writing a documentation generation tool for a programming language
with right arrows represented as -> but would like to render them as →.

 On 13 February 2015 at 15:45, Boris Zbarsky bzbar...@mit.edu wrote:

 On 2/13/15 10:15 AM, David Sheets wrote:

 I suppose currently Chrome is preventing copying hidden content but
 Firefox is not and neither picks up the CSS content.


 Both prevent copying hidden content, but may not have identical
 definitions of hidden.

 Neither picks up CSS generated content, because both represent selections
 in terms of DOM ranges, and DOM ranges can't represent CSS generated
 content...

 -Boris




[whatwg] Unicode -> ASCII copy/paste fallback

2015-02-13 Thread David Sheets
Hello,

I have a page with

a <span class="rarr"><span>-&gt;</span></span> b

and style

.rarr span { overflow: hidden; height: 0; width: 0; display: inline-block; }
.rarr::after { content: "→"; }

(That's RIGHTWARDS ARROW, U+2192.)

In Firefox 36, this copies and pastes like "a -> b" which is the
desired behavior. In Chrome 40, this copies and pastes like "a  b".

Is my desired behavior (to show unicode but copy an ASCII
representation) generally possible? Are there specs somewhere about
copy/paste behavior? I looked in https://html.spec.whatwg.org/ but
found nothing relevant.

Is this the right venue for this question? Should I take it somewhere else?

Thanks,

David Sheets


Re: [whatwg] Unicode -> ASCII copy/paste fallback

2015-02-13 Thread David Sheets
On Fri, Feb 13, 2015 at 1:08 PM, Nils Dagsson Moskopp
n...@dieweltistgarnichtso.net wrote:
 David Sheets kosmo...@gmail.com writes:

 On Fri, Feb 13, 2015 at 12:18 PM, James M. Greene
 james.m.gre...@gmail.com wrote:
 In this case, you can use Unicode escape values by preceding them with a
 backslash:

   .rarr:after { content: "\2192"; }


 This is specified in the CSS 2.1 spec:
 http://www.w3.org/TR/CSS2/syndata.html#characters

 Personally, I probably would've just started on StackOverflow with this
 question (e.g. [1]) but no harm done.

 Hi James!

 Sorry, I wasn't clear. The issue is not with putting Unicode values
 into CSS. The issue is that I would like unicode values to be copied
 and pasted as a specific ASCII fallback value.

 That is, I would like the equivalent of a &rarr; b to appear on a
 page but, upon copying, a -> b to show up in the clipboard.

 I have a solution that works in Firefox 36 (described in original
 mail). Chrome 40 does not behave similarly.

 I can see some arguments for Chrome's behavior along security lines. I
 certainly can understand the utility of Firefox's behavior because I
 am writing a documentation generation tool for a programming language
 with right arrows represented as -> but would like to render them as
 →.

 I would suggest using OpenType ligatures for that. You could reasonably
 create a ligature font that renders any occurrence of “->” as “→”.

This is a really brilliant solution that satisfies my use case
perfectly. I created the following (horrible) font that works as
expected.

data:application/font-woff;base64,d09GRk9UVE8AAAXQAA0ACAwAAQBDRkYgAAABMY4AAAGdFwHC50ZGVE0AAALAHBxuRRqXR0RFRgAAAtwqNABJADhHUE9TAAADCCAgRHZMdUdTVUIAAAMoQFCxBbQIT1MvMgAAA2gAAABHYFYhYsBjbWFwAAADsEMAAAFKAmIC1WhlYWQAAAP0MDYFFMPmaGhlYQAABCQdJAZsBDtobXR4AAAERBQUEegCzm1heHRYBgYABVAAbmFtZQAABGFZAAACi0q47Qlwb3N0AAAFvBMg/4MAM3icRY9BSxtRFIXvSzINhuGlCZlGwjSORGgL6cRKcSFmY0gQYtMoVupKDHnJhNYoydMacOdCcewu0EV2pf8gUOlm6EY63ZdCV8WV/gG9z7xAjF2kZ3XOtzjwEfD5gBASeFPnNf6elV8A8QCBJREE8ZiIuEdMeEXUF7/wfBmo3vjy54HqiwdgbG8tZdujovqRfu+N3ZqKDn+DOsBDHa5COoR1YodBuX8MQhTWoQTvrNaOxeob1Qbb5KxR2Nxir5g5k/k3y0apZZSbFmO8mUwmjQ81bhm57TrPbTeqzJgxp42nFuc7c6lUZUgr99RsVsw6489GBv9VhqEwQY7IMSiETM2+3u30Q7Yjig5xHHQdr/NImOJr33zg9A81UUS3X/RT/IYJgpNiUZOJ2wNMDMm56GqYk0wW5fJlCn/hmezgb4ViFylBKp5rtqRi/GQh7WYxg5nTH3IVm9kCKjKs0Bt0MUbwpShoH2VMuHb6yc8szuOK/Ue+xf18/lpGFNprR3ohzVYDmAgg/dRW1Ttgr6BKAAABAMw9os8A0QO3ygDRA7/9eJwlibENACAMwxzamSsY+f8+XKHI9hACbDlcvSiiY8u2i8wD+TwTFwCCAQoAHAAeAAFERkxUAAgABAD//wAAAHicY2BkYGDgYpBj0GFgdHHzCWHgYGABijD8/88AkmHMyUxPBIoxQHhAORYwzQHEQlCahYGZgQkIGUEQACkoBXB4nGNgYWFg/MLAysDANJPpDAMDQz+EZnzNYMzICRRlYGNmgAFGBiQQkOaawnCAQZfBjtn4vzFDDNOs/6dR1CgAIRMAc0gMkAB4nGNgYGBmgGAZBkYGEHAB8hjBfBYGDSDNBqQZGZgYdBns/v8H8sH0/yv/j0DVAwEjGwOCQwzAUMxEiu5BCQAKpAk1AHicY2BkYGAA4hCza8vj+W2+MnCzMIDAReYDvHC6isGUuZRpFpDLwcAEEgUABpUIx3icY2BkYGCaxWDKEMPCAALMpQyMDKiAFQAsSwGxAgQAAI0EAAEnBAAAegPoAKAAAFUAAHichZDNSsNAFIXP9A8KItInmI1QIU0nKd1kaaGI4NLuWzJpAjUpyZTSrYgrn8VXcO3atWufwJ0LT6ZjQQSbYe797uHOmTsBcIpnCOy/Szw6Fuji3XEDHXw6buJcXDluoSvuHbdxJn58OtTf2ClaXVYP9lTNAj28Om7gBB+Om7jGl+MWeiJ33IYUT4471F8wQQmNOQxjDIkFdowxKqRUNPUKnl0SW2SsU9IUBXJynUss2ScRwodi7rPDcK0RYciVuN7k0OvTM2HMrf8FMCn13OhYLnYyrlKtTeV5ntxmJpXTIjfTolxqGfpK9lNj1tFwmFBNatWvEj/Xhh639pJ6wJV9SkApN5lZ6Zh4Y7UMG9yx0HG2Yf7vFRH3X8u9HmCEATvrrViNafVrzEgeriYHo0E4CFUwPjbkjFrJf5PZuSS9a3ff5nomzHRZZUUulQp8pZQ8YvgNzG5wnHicY2BmAIP/DQzGDFgAACgUAbYA

The browser inconsistency in the original case still stands, though.
Is there a spec covering copy and paste?

David

 --
 Nils Dagsson Moskopp // erlehmann
 http://dieweltistgarnichtso.net


Re: [whatwg] Unicode -> ASCII copy/paste fallback

2015-02-13 Thread David Sheets
On Fri, Feb 13, 2015 at 3:02 PM, Glenn Maynard gl...@zewt.org wrote:
 On Fri, Feb 13, 2015 at 5:45 AM, David Sheets kosmo...@gmail.com wrote:

 Hello,

 I have a page with

 a <span class="rarr"><span>-&gt;</span></span> b

 and style

 .rarr span { overflow: hidden; height: 0; width: 0; display: inline-block; }
 .rarr::after { content: "→"; }

 (That's RIGHTWARDS ARROW, U+2192.)

 In Firefox 36, this copies and pastes like "a -> b" which is the
 desired behavior. In Chrome 40, this copies and pastes like "a  b".

 Is my desired behavior (to show unicode but copy an ASCII
 representation) generally possible? Are there specs somewhere about
 copy/paste behavior? I looked in https://html.spec.whatwg.org/ but
 found nothing relevant.


 Copying ASCII isn't desirable.  It should copy the Unicode string "a → b".
 After all, that's what gets copied if you had done <span>a → b</span> in
 the first place.

 (Chrome's issue isn't related to Unicode.  It just doesn't know how to
 select text that's inside CSS content, so it isn't included in the copy.)

The only relation this issue has to Unicode is a use case for
alternate copy/paste behavior.

Judging from the replies to my original inquiry, either Firefox or
Chrome is doing something unexpected or both are behaving unexpectedly
(and should put the unicode arrow on the clipboard).

I'm not sure if all use cases for my original trick can be covered by
using OpenType ligatures (thanks, Nils!) or if there are other
'alternative clipboard behavior' applications. Certainly, the most
consistent behavior would be for both Chrome and Firefox (and other
browsers that I haven't/don't care to test) to put the CSS content on
the clipboard and ignore hidden content.

I suppose currently Chrome is preventing copying hidden content but
Firefox is not and neither picks up the CSS content.

David


Re: [whatwg] Unicode -> ASCII copy/paste fallback

2015-02-13 Thread David Sheets
On Fri, Feb 13, 2015 at 12:18 PM, James M. Greene
james.m.gre...@gmail.com wrote:
 In this case, you can use Unicode escape values by preceding them with a
 backslash:

   .rarr:after { content: "\2192"; }


 This is specified in the CSS 2.1 spec:
 http://www.w3.org/TR/CSS2/syndata.html#characters

 Personally, I probably would've just started on StackOverflow with this
 question (e.g. [1]) but no harm done.

Hi James!

Sorry, I wasn't clear. The issue is not with putting Unicode values
into CSS. The issue is that I would like unicode values to be copied
and pasted as a specific ASCII fallback value.

That is, I would like the equivalent of a &rarr; b to appear on a
page but, upon copying, a -> b to show up in the clipboard.

I have a solution that works in Firefox 36 (described in original
mail). Chrome 40 does not behave similarly.

I can see some arguments for Chrome's behavior along security lines. I
certainly can understand the utility of Firefox's behavior because I
am writing a documentation generation tool for a programming language
with right arrows represented as -> but would like to render them as
→.

This seems like a pretty straightforward document feature but I can't
seem to get interoperable behavior (or even find where such behavior
might be specified).

Thanks,

David


 [1]:
 http://stackoverflow.com/questions/10393462/placing-unicode-character-in-css-content-value

 Sincerely,
 James Greene


 On Fri, Feb 13, 2015 at 5:45 AM, David Sheets kosmo...@gmail.com wrote:

 Hello,

 I have a page with

 a span class=rarrspan-gt;/span/span b

 and style

 .rarr span { overflow: hidden; height: 0; width: 0; display: inline-block;
 }
 .rarr::after { content: →; }

 (That's RIGHTWARDS ARROW x2192.)

 In Firefox 36, this copies and pastes like a - b which is the
 desired behavior. In Chrome 40, this copies and pastes like a  b.

 Is my desired behavior (to show unicode but copy an ASCII
 representation) generally possible? Are there specs somewhere about
 copy/paste behavior? I looked in https://html.spec.whatwg.org/ but
 found nothing relevant.

 Is this the right venue for this question? Should I take it somewhere
 else?

 Thanks,

 David Sheets




Re: [whatwg] Unicode -> ASCII copy/paste fallback

2015-02-13 Thread David Sheets
On Fri, Feb 13, 2015 at 12:23 PM, Mathias Bynens mathi...@opera.com wrote:
 On Fri, Feb 13, 2015 at 1:18 PM, James M. Greene
 james.m.gre...@gmail.com wrote:
 In this case, you can use Unicode escape values by preceding them with a
 slash:

 OP’s question wasn’t about how to escape non-ASCII characters, but
 rather about what the copy/paste behavior should be in browsers.

 @David, I don’t think it’s reasonable to expect non-ASCII characters
 to be transliterated to ASCII characters when copying them. That said, it
 would be nice to standardize on the behavior here: should generated
 content be included when copying or not?

Hi Mathias,

Do you mean that it's not reasonable for transliteration to happen
automatically? I agree.

Do you mean that it's not reasonable to support specific replacements
during copying? Firefox seems to support this currently (and
perfectly).

There are user trickery concerns here but, at least in my case, I
think codepoint -> 2-byte replacement is probably safe...

Thanks,

David


Re: [whatwg] New URL Standard

2012-09-25 Thread David Sheets
On Mon, Sep 24, 2012 at 9:18 PM, Ian Hickson i...@hixie.ch wrote:

 This is Anne's spec, so I'll let him give more canonical answers, but:

 On Mon, 24 Sep 2012, David Sheets wrote:

 Your conforming WHATWG-URL syntax will have production rule alphabets
 which are supersets of the alphabets in RFC3986.

 Not necessarily, but that's certainly possible. Personally I would
 recommend that we not change the definition of what is conforming from the
 current RFC3986/RFC3987 rules, except to the extent that the character
 encoding affects it (as per the HTML standard today).

http://whatwg.org/html#valid-url

I believe the '#' character in the fragment identifier qualifies.

 This is what I propose you define and it does not necessarily have to be
 in BNF (though a production rule language of some sort probably isn't a
 bad idea).

 We should definitely define what is a conforming URL, yes (either
 directly, or by reference to the RFCs, as HTML does now). Whether prose or
 a structured language is the better way to go depends on what the
 conformance rules are -- HTML is a good example here: it has parts that
 are defined in terms of prose (e.g. the HTML syntax as a whole), and other
 parts that are defined in terms of BNF (e.g. constraints on the contents
 of script elements in certain situations). It's up to Anne.

HTML is far larger and more compositional than URI. I am confident
that, no matter what is specified in the WHATWG New URL Standard, a
formal language exists which can describe the structure of conforming
identifiers. If no such formal language can be described, the syntax
specification is likely to be incomplete or unsound.

 How will WHATWG-URLs which use the syntax extended from RFC3986 map into
 RFC3986 URI references for systems that only support those?

 The same way that those systems handle invalid URLs today, I would assume.
 Do you have any concrete systems in mind here? It would be good to add
 them to the list of systems that we test. (For what it's worth, in
 practice, I've never found software that exactly followed RFC3986 and
 also rejected any non-conforming strings. There are just too many invalid
 URLs out there for that to be a viable implementation strategy.)

It is not the rejection of incoming nonconforming reference
identifiers that causes issues, but rather the emission of strictly
conforming identifiers under Postel's Law (the Robustness Principle). I know
of several URI implementations that, given a nonconforming reference
identifier, will only output conforming identifiers. Indeed, the
standard under discussion will behave in exactly this way.

This leads to loss of information in chains of URI processors that can
and will change the meaning of identifiers.
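
As a small illustration of that information loss (using the browser URL
object as a stand-in for such a processor; the input string is made up):

var u = new URL('HTTP://Example.COM:80/%7euser/a b?q=café');
// Only a conforming, normalized identifier comes back out: the scheme
// and host are lowercased, the default port is dropped, and the space
// and the non-ASCII character are percent-encoded. The original
// spelling is unrecoverable.
u.href; // "http://example.com/%7euser/a%20b?q=caf%C3%A9"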

 I remember when I was testing this years ago, when doing the first pass on
 attempting to fix this, that I found that some less widely tested
 software, e.g. wget(1), did not handle URLs in the same manner as more
 widely tested software, e.g. IE, with the result being that Web pages were
 not handled interoperably between these two software classes. This is the
 kind of thing we want to stop, by providing a single way to parse all
 input strings, valid or invalid, as URLs.

Was wget in violation of the RFC? Was IE more lenient?

If every string, valid or invalid, is parseable as a URI reference, is
there an algorithm to accurately extract URIs from plain text?

 --
 Ian Hickson   U+1047E)\._.,--,'``.fL
 http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
 Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] New URL Standard

2012-09-25 Thread David Sheets
On Tue, Sep 25, 2012 at 8:03 AM, Anne van Kesteren ann...@annevk.nl wrote:
 On Tue, Sep 25, 2012 at 6:18 AM, Ian Hickson i...@hixie.ch wrote:
 Not necessarily, but that's certainly possible. Personally I would
 recommend that we not change the definition of what is conforming from the
 current RFC3986/RFC3987 rules, except to the extent that the character
 encoding affects it (as per the HTML standard today).

http://whatwg.org/html#valid-url

 FWIW, given that browsers happily do requests to servers with
 characters in the URL that are invalid per the RFC (they are not URL
 escaped) and servers handle them fine I think we should make the
 syntax more lenient. E.g. allowing [ and ] in the path and query
 component is fine I think.

I believe this would introduce ambiguity for parsing URI references.
Is [::1] an authority reference or a path segment reference?

 As for the question about why not build this on top of RFC 3986. That
 does not handle non-ASCII code points. RFC 3987 does, but is not a
 suitable start either. As shown in http://url.spec.whatwg.org/ it is
 quite trivial to combine parsing, resolving, and canonicalizing into a
 single algorithm (and deal with URI/IRI, now URL, as one).

Composition is often trivial but unenlightening. There is necessarily
less information in a partially evaluated function composition than in
the functions in isolation.

Defining a formal language accurately and in a broadly understandable
manner is nontrivial. Your task is nontrivial.

 Trying to
 somehow patch the language in RFC 3987 to deal with the encoding
 problems for the query component, to deal with parsing
 http:example.org when there is a base URL with the same scheme versus
 when there isn't, etc. is way more of a hassle I think, though I am
 happy to be proven wrong.

I believe the encoding problems are handled by a normalization
algorithm and parsing relative references is handled by the base
scheme module.

What is the acceptable trade-off between (y)our hassle and the time of
technologists in the coming decades? Will you make it easier or harder
for them to reconcile WHATWG-URL and Internet Standard 66 (RFC 3986)?

 --
 http://annevankesteren.nl/


Re: [whatwg] New URL Standard

2012-09-25 Thread David Sheets
On Tue, Sep 25, 2012 at 2:13 PM, Glenn Maynard gl...@zewt.org wrote:
 On Mon, Sep 24, 2012 at 7:18 PM, David Sheets kosmo...@gmail.com wrote:

 Always. The appropriate interface is (string * string?) list. Id est,

 an association list of keys and nullable values (null is
 key-without-value and empty string is empty-value). If you prefer to
 not use a nullable value and don't like tuple representations in JS,
 you could use type: string list list

 i.e.


 [["key_without_value"],[],["key","value"],[],["numbers","1","2","3","4"],["",""],["","",""]]


 This isn't an appropriate interface.  It's terrible for 99.9% of use cases,
 where you really want dictionary-like access.

This is the direct representation of the query string key-value convention.

Looking up keys is easy in an association list. Filtering the list
retains ordering. Appending to the list is well-defined. Folding into
a dictionary is trivial and key merging can be defined according to
the author's URL convention.
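
A sketch of those operations over the (string * string?) list, in JS
(the helper names are made up):

// query as an association list: [key, value-or-null] pairs, in document order
var q = [['a', '1'], ['b', '2'], ['a', '3'], ['flag', null]];

// looking up a key: first bound value, or undefined
function lookup(q, key) {
  for (var i = 0; i < q.length; i++) if (q[i][0] === key) return q[i][1];
}

// folding into a dictionary; merge encodes the author's URL convention
function toDict(q, merge) {
  var d = {};
  q.forEach(function (p) {
    d[p[0]] = p[0] in d ? merge(d[p[0]], p[1]) : p[1];
  });
  return d;
}

lookup(q, 'a');                                          // "1"
toDict(q, function (a, b) { return [].concat(a, b); });  // { a: ["1","3"], b: "2", flag: null }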

 The right approach is probably to expose the results in an object-like form,
 as Tab suggests, but to store the state internally in a list-like format,
 with modifications defined in terms of mutations to the list.

This sounds more complicated to implement while maintaining
invariants. A dictionary with an associated total order is an
association list.

 That is, parsing "a=1&b=2&a=3" would result in an internal representation
 like [('a', '1'), ('b', '2'), ('a', '3')].  When viewed from script, you see
 {a: ['1', '3'], 'b': ['2']}.  If you serialize it right back to a URL the
 internal representation is unchanged, so the original order is preserved.
 The mutation algorithms can then do their best to preserve the list as
 reasonably as they can (eg. assigning query.a = ['5', '6'] would remove all
 'a' keys, then insert items at the location of the first removed item, or
 append if there were none).

Why hide the order?

 Is this not already supported by creating a new URL which contains
 only a relative query part?

 Like: query = new URL("?a=b&c=d"); query.query["a"] = "x";
 query.toString() == "?a=x&c=d";

 Why is a new interface necessary?


 That won't work, since "?a=b&c=d" isn't a valid URL.

"?a=b&c=d" is a valid URI reference. @href="?a=b&c=d" is valid.

 The invalid flag will
 be set, so the change to .query will be a no-op, and .href (presumably what
 toString will invoke) would return the original URL, "?a=b&c=d", not
 "?a=x&c=d".  You'd need to do something like:

 var query = new URL("http://example.com?" + url.hash);
 query.query.a = "x";
 url.hash = query.search.slice(1); // remove the leading ?

 That's awkward, but maybe it's good enough.

This is a use case for parsing without composed relative resolution.


Re: [whatwg] New URL Standard

2012-09-24 Thread David Sheets
On Mon, Sep 24, 2012 at 2:34 AM, Anne van Kesteren ann...@annevk.nl wrote:
 On Sat, Sep 22, 2012 at 9:10 AM, Alexandre Morgaut alexandre.morg...@4d.com 
 wrote:
 Shouldn't this document have references on some of the URL related RFCs:

 The plan is to obsolete the RFCs. But yes, I will add some references
 in the Goals section most likely. Similar to what has been done in the
 DOM Standard.

Is there an issue with defining WHATWG-URL syntax as a grammar
extension to the URI syntax in RFC3986?

How about splitting the definition of the parsing algorithm into a
canonicalization algorithm and a separate parser for the extended
syntax? The type would be string -> string with the codomain as a
valid, unique WHATWG-URL serialization. Implementations/IDL could
provide only the composition of canonicalization and parsing but
humans trying to understand the semantics of the present algorithm
would be aided by having these phases explicitly defined.
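
Concretely, the split could look something like this toy sketch (the
function names and the particular normalizations are placeholders, not
proposed rules):

// phase 1: canonicalization, string -> string
// maps any input, conforming or not, to a valid, unique serialization
function canonicalize(input) {
  return input.trim()
    .replace(/^([A-Za-z][A-Za-z0-9+.-]*):/,
             function (_, s) { return s.toLowerCase() + ':'; })
    .replace(/ /g, '%20');
}

// phase 2: a strict parser over the conforming (extended) syntax only
function parse(canonical) {
  var m = /^([a-z][a-z0-9+.-]*):([^?#]*)(\?[^#]*)?(#.*)?$/.exec(canonical);
  if (!m) throw new Error('not a canonical URL');
  return { scheme: m[1], hierPart: m[2], query: m[3], fragment: m[4] };
}

// implementations expose only the composition; readers can study each phase
function parseURL(input) { return parse(canonicalize(input)); }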

Will any means be provided to map WHATWG-URL to Internet Standard
RFC3986-URI? Is interoperability with the deployed base of URL
consumers a goal? How will those URLs in the extended syntax be mapped
into standard URIs? Will they be unrepresentable?

Thanks,

David Sheets


Re: [whatwg] New URL Standard

2012-09-24 Thread David Sheets
On Mon, Sep 24, 2012 at 4:07 PM, Glenn Maynard gl...@zewt.org wrote:
 On Mon, Sep 24, 2012 at 12:30 PM, Tab Atkins Jr. jackalm...@gmail.comwrote:

 I suggest just making it a map from String->[String].  You probably
 want a little bit of magic - if the setter receives an array, replace
 the current value with it; anything else, stringify then wrap in an
 array and replace the current value.  The getter should return an
 empty array for non-existing params.  You should be able to set .query
 itself with an object, which empties out the map and then runs the
 setter over all the items.  Bam, every single method is now obsolete.


 When should this API guarantee that it round-trips URLs cleanly (aside from
 quoting differences)?  For example, maintaining order in "a=1&b=2&a=1", and
 representing things like "a=1&b" (no '=') and "a&&b" (no key at all).

Always. The appropriate interface is (string * string?) list. Id est,
an association list of keys and nullable values (null is
key-without-value and empty string is empty-value). If you prefer to
not use a nullable value and don't like tuple representations in JS,
you could use type: string list list

i.e.

[["key_without_value"],[],["key","value"],[],["numbers","1","2","3","4"],["",""],["","",""]]

becomes

?key_without_value&&key=value&&numbers=1,2,3,4&=&=,

where I've assumed that multiple values for a key are concatenated with
commas (but it could be semicolons or some other separator).

Unfortunately, JavaScript does not have any lightweight product types
so a decision like this is necessary.
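
For concreteness, serializing that string list list back into a query
with the comma convention above is a single pass (a sketch; the function
name is made up):

function serializeQuery(pairs) {
  return '?' + pairs.map(function (p) {
    if (p.length === 0) return '';             // []             -> ''
    if (p.length === 1) return p[0];           // [k]            -> 'k'
    return p[0] + '=' + p.slice(1).join(',');  // [k, v1, v2...] -> 'k=v1,v2'
  }).join('&');
}

serializeQuery([['key_without_value'], [], ['key', 'value'], [],
                ['numbers', '1', '2', '3', '4'], ['', ''], ['', '', '']]);
// "?key_without_value&&key=value&&numbers=1,2,3,4&=&=,"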

 Not round-tripping URLs might have annoying side-effects, like trying to
 use history.replaceState to replace the path portion of the URL, and
 unexpectedly having the query part of the URL get shuffled around or
 changed in other ways.

That would be unacceptably broken.

 Maybe it could guarantee that the query round-trips only if the value is
 never modified (only assigned via the ctor or assigning to href), but once
 you modify the query, the order becomes normalized and any other
 non-round-trip side effects happen.

Why can't as much information as possible be preserved? There exist
many URI manipulation libraries that support maximal preservation.

 By the way, it would also be nice for the query part of this API to be
 usable in isolation.  I often put query-like strings in the hash, resulting
 in URLs like 
 http://example.com/server/side/path?server-side-query=1#client/side/path?client-side-query=1;,
 and it would be nice to be able to work with both of these with the same
 interface.  That is, query = new URLQuery("a=b&c=d"); query["a"] = "x";
 query.toString() == "a=x&c=d";

Is this not already supported by creating a new URL which contains
only a relative query part?

Like: query = new URL("?a=b&c=d"); query.query["a"] = "x";
query.toString() == "?a=x&c=d";

Why is a new interface necessary?

 --
 Glenn Maynard


Re: [whatwg] New URL Standard

2012-09-24 Thread David Sheets
On Mon, Sep 24, 2012 at 5:23 PM, Ian Hickson i...@hixie.ch wrote:
 On Mon, 24 Sep 2012, David Sheets wrote:

 Is there an issue with defining WHATWG-URL syntax as a grammar extension
 to the URI syntax in RFC3986?

 In general, BNF isn't very useful for defining the parsing rules when you
 also need to handle non-conforming content in a correct manner. Really it
 is only useful for saying whether or not content is conforming.

Your conforming WHATWG-URL syntax will have production rule alphabets
which are supersets of the alphabets in RFC3986. This is what I
propose you define and it does not necessarily have to be in BNF
(though a production rule language of some sort probably isn't a bad
idea).

If you read my mail carefully, you will notice that I address the
non-conforming identifier case in the initial canonicalization
algorithm. This normalization step is separate from the syntax of
conforming WHATWG-URLs and would define how non-conforming strings are
interpreted as conforming strings. The parsing algorithm then provides
a map from these strings into a data structure.

Error recovery and extended syntax for conforming representations are
orthogonal.

How will WHATWG-URLs which use the syntax extended from RFC3986 map
into RFC3986 URI references for systems that only support those?