Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On 15 February 2014 21:06, Allen Wirfs-Brock al...@wirfs-brock.com wrote: On Feb 15, 2014, at 11:47 AM, Brendan Eich wrote: C. Scott Ananian wrote: On Feb 15, 2014 9:13 AM, Brendan Eich bren...@mozilla.com wrote: Aside: ECMASpeak is neither accurate (we don't work for Ecma, it's JS not ES :-P), nor euphonious. I'm learning all sorts of things! I guess there are two names here; what's your preferred phrase for the language used to write algorithms in the ES6 spec (JS6?), and, if it differs, the language used by members of the TC39 committee among themselves when describing language primitives in a very precise way? When I'm in a bad mood, I call it VisualCobol. It's painfully low-level and verbose, yet hard to verify. Let's hope that the JSCert work will help, and Allen has been common'ing subroutines. Whatever we call it, the spec language ain't great. But remember, prior to ES5, it was closer to Cobolish machine language. No structured control, goto's targeting numeric step numbers, intermediate results referenced by step number (sorta SSA with numeric ids), etc. There has never been a complete redo, just incremental improvements and refactorings. But we've definitely advanced from the early 1950s to the late 1970s. Well, Algol-60 already was more structured a language than our spec-speak. Let alone how far the Algol-68 spec was ahead of us. :) /Andreas ___ es-discuss mailing list es-discuss@mozilla.org https://mail.mozilla.org/listinfo/es-discuss
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
Andreas Rossberg wrote: On 15 February 2014 20:47, Brendan Eich bren...@mozilla.com wrote: Using -Speak as a stem conjures Orwell. Not good. Ah, relax. Gilad Bracha even named his own language Newspeak. Yeah, but no ECMA -- the double-whammy. Self-mockery is good. I pay my dues (see wat played with commentary at Fluent 2012 and narrated with tech details at Strange Loop 2012). /be
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
Are recordings available? --scott On Feb 17, 2014 10:26 AM, Brendan Eich bren...@mozilla.com wrote: […] Self-mockery is good. I pay my dues (see wat played with commentary at Fluent 2012 and narrated with tech details at Strange Loop 2012). /be
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
C. Scott Ananian wrote: Are recordings available? http://www.infoq.com/presentations/State-JavaScript starting at 1:50. YouTube has more. /be
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
Aside: ECMASpeak is neither accurate (we don't work for Ecma, it's JS not ES :-P), nor euphonious. But here's a pointer: C. Scott Ananian wrote: "new string object." Make that "new string primitive", because "string object" (especially with `new` in front) suggests `new String('hi')`. /be
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On 14 Feb 2014, at 19:59, Allen Wirfs-Brock al...@wirfs-brock.com wrote: It's a really high bar to get over that closed gate. Unless the exclusion of a feature was a mistake […] I don't think we should be talking about adding it to ES6. It does feel like a mistake to me to introduce `String.prototype.codePointAt`, but no similar function that returns the symbol instead.
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On Feb 15, 2014 9:13 AM, Brendan Eich bren...@mozilla.com wrote: Aside: ECMASpeak is neither accurate (we don't work for Ecma, it's JS not ES :-P), nor euphonious. I'm learning all sorts of things! I guess there are two names here; what's your preferred phrase for the language used to write algorithms in the ES6 spec (JS6?), and, if it differs, the language used by members of the TC39 committee among themselves when describing language primitives in a very precise way? Brendan wrote: "new string object." Make that "new string primitive", because "string object" (especially with `new` in front) suggests `new String('hi')`. I wrestled with the phrasing there. I think what I really mean is "avoid allocating new backing storage", since new string primitives are returned regardless. If there's a better phrase for string backing storage, I'd be glad to add it to my dictionary. --scott
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
Allen mentioned that `String#at` might not make it to ES6 because nobody in TC39 is championing it. I’ve now asked Rick if he would be the champion for this, and he agreed. (Thanks again!) Looking over the ‘TC39 progress’ document at https://docs.google.com/a/chromium.org/document/d/1QbEE0BsO4lvl7NFTn5WXWeiEIBfaVUF7Dk0hpPpPDzU, it seems most of the work is already taken care of: the use case was discussed in this thread, the proposal has a complete spec text, and there’s an example implementation/polyfill with unit tests. See http://mths.be/at. Is there anything else I can do to help get this included as a non-TC39-member?
RE: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
This was the method that was only useful if you pass `0` to it? -----Original Message----- From: es-discuss [mailto:es-discuss-boun...@mozilla.org] On Behalf Of Mathias Bynens Sent: Friday, February 14, 2014 10:34 To: Rick Waldron; Allen Wirfs-Brock Cc: es-discuss@mozilla.org list Subject: Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`) Allen mentioned that `String#at` might not make it to ES6 because nobody in TC39 is championing it. […]
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
Note that `Array.from(str)` and `str[Symbol.iterator]` overlap significantly. In particular, it's somewhat awkward to iterate over code points using `String#symbolAt`; it's much easier to use `substr()` and then use the StringIterator. --scott P.S. I see that Domenic has said something similar. On Thu, Feb 13, 2014 at 11:34 PM, Mathias Bynens math...@qiwi.be wrote: Allen mentioned that `String#at` might not make it to ES6 because nobody in TC39 is championing it. […]
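A minimal sketch of the slice-then-iterate approach described above, assuming a hypothetical helper name `symbolsFrom`. It uses `slice` in place of the legacy `substr`, which behaves the same for these arguments. An offset that lands between the two halves of a surrogate pair still yields a lone trailing surrogate as the first element.

```javascript
// Return the code points (as strings) of `str` from code unit offset `start`
// onward: copy the tail with slice(), then let the string iterator split it.
// `symbolsFrom` is a hypothetical helper, not a proposed API.
function symbolsFrom(str, start) {
  return Array.from(str.slice(start));
}

const sample = "a\u{1D306}b"; // "a", U+1D306 (two code units), "b"
const fromOne = symbolsFrom(sample, 1); // ["\u{1D306}", "b"]
const fromTwo = symbolsFrom(sample, 2); // ["\uDF06", "b"]: offset 2 splits the pair
```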
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On 14 Feb 2014, at 11:11, Domenic Denicola dome...@domenicdenicola.com wrote: This was the method that was only useful if you pass `0` to it? I’ll just avoid the infinite loop here by pointing to earlier posts in this thread where this was discussed before: http://esdiscuss.org/topic/string-prototype-symbolat-improved-string-prototype-charat#content-34 and http://esdiscuss.org/topic/string-prototype-symbolat-improved-string-prototype-charat#content-40. This method is just as useful as `String.prototype.codePointAt`. If that method is included, so should `String.prototype.at` be. If `String.prototype.at` is found not to be useful, `String.prototype.codePointAt` should be removed too.
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On 14 Feb 2014, at 11:14, C. Scott Ananian ecmascr...@cscott.net wrote: Note that `Array.from(str)` and `str[Symbol.iterator]` overlap significantly. In particular, it's somewhat awkward to iterate over code points using `String#symbolAt`; it's much easier to use `substr()` and then use the StringIterator. `String#at` is not meant for iterating over code points – that’s what the `StringIterator` is for. `String#at` is exactly like `String#codePointAt` except it returns strings (containing the symbol) instead of numbers (representing the code point value). It can be used to get the symbol at a given code unit position in a string (similar to how `String#codePointAt` can be used to get the code point at a given code unit position in a string).
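To make the comparison with `codePointAt` concrete, here is a minimal sketch of the semantics described above. It relies on the equivalence `String.fromCodePoint(str.codePointAt(p))`. `symbolAt` is a hypothetical standalone function, and returning `undefined` for out-of-range indexes is an assumption of this sketch, not the proposal's specified behaviour. Note that the `String.prototype.at` that shipped much later (ES2022) is an unrelated, code-unit-based method, so the sketch deliberately uses a different name.

```javascript
// Hypothetical helper mirroring the proposed semantics: a code unit index
// goes in (same as codePointAt), and a string comes out instead of a number.
function symbolAt(str, index) {
  const cp = str.codePointAt(index);
  // codePointAt returns undefined when index is out of range.
  return cp === undefined ? undefined : String.fromCodePoint(cp);
}

const s = "a\u{1D306}b"; // "a", U+1D306 (two code units), "b"
symbolAt(s, 0); // "a"
symbolAt(s, 1); // "\u{1D306}" (both surrogates)
symbolAt(s, 2); // "\uDF06" (lone trailing surrogate: index is mid-pair)
symbolAt(s, 3); // "b"
```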
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
Yes, I know what `String#at` is supposed to do. I was pointing out that `String#at` makes it easy to do the wrong thing. If you do `Array.from(str)`, then you suddenly have a complete random-access data structure where you can find out the number of code points in the String, iterate it in reverse from the end to the start, slice it, find the midpoint, etc. `Array.from` looks like an O(n) operation, and it is, so it encourages developers to cache the value and reuse it. That said, I can see where a lexer might want to use `String#at`, being careful to do the correct index bump based on `result.length`. However, the fastest JS lexers don't create String objects; they operate directly on the code point (see http://marijnhaverbeke.nl/acorn/#section-58). So I'm -0, mostly because the name isn't great. But I have exactly zero say in the matter anyway. So I'll shut up now. --scott
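A short sketch of the `Array.from` pattern described above: convert once (an O(n) step), keep the array, and reuse it for length, random access, reversal, slicing, and the midpoint. The sample string and variable names are illustrative only.

```javascript
// Convert once to an array of code point strings, then reuse it.
const text = "x\u{1D306}y\u{1F600}"; // 4 code points, 6 code units
const symbols = Array.from(text);    // O(n), so cache the result

const count = symbols.length;                        // 4 (text.length is 6)
const middle = symbols[Math.floor(count / 2)];       // "y"
// Copy before reversing so the cached array is left untouched;
// surrogate pairs stay intact because each element is a whole code point.
const reversed = symbols.slice().reverse().join(""); // "\u{1F600}y\u{1D306}x"
const firstTwo = symbols.slice(0, 2).join("");       // "x\u{1D306}"
```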
RE: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
I think Mathias's point, that it is exactly as useful or useless as `codePointAt`, is a reasonable one. However, "This method is just as useful as `String.prototype.codePointAt`. If that method is included, so should `String.prototype.at` be. If `String.prototype.at` is found not to be useful, `String.prototype.codePointAt` should be removed too." This does not follow. The choice is not between adding two useless methods and adding zero. There is no reason to exclude the possibility of adding only one useless method. But anyway, as some people seem to think that both methods are in fact useful (including Rick, who has agreed to champion), I agree with Scott that after having said our piece it's time to exit the thread. -----Original Message----- From: es-discuss [mailto:es-discuss-boun...@mozilla.org] On Behalf Of C. Scott Ananian Sent: Friday, February 14, 2014 12:12 To: Mathias Bynens Cc: es-discuss@mozilla.org list Subject: Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`) Yes, I know what `String#at` is supposed to do. […]
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On Fri, Feb 14, 2014 at 1:34 AM, Mathias Bynens math...@qiwi.be wrote: Allen mentioned that `String#at` might not make it to ES6 because nobody in TC39 is championing it. I've now asked Rick if he would be the champion for this, and he agreed. (Thanks again!) Published to wiki here: http://wiki.ecmascript.org/doku.php?id=strawman:string_at Rick
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On Feb 14, 2014, at 1:34 AM, Mathias Bynens wrote: Allen mentioned that `String#at` might not make it to ES6 because nobody in TC39 is championing it. […] Is there anything else I can do to help get this included as a non-TC39-member? But just to be clear, the new feature gate for ES6 is officially closed. It's a really high bar to get over that closed gate. Unless the exclusion of a feature was a mistake, or the feature fixes a bug or is somehow essential to supporting something that is already in ES6, I don't think we should be talking about adding it to ES6. I don't think String.prototype.at fits any of those criteria. We've talked about it several times, including in the context of Norbert's original ES6 full Unicode support proposal, and never achieved consensus on including it. Personally, I think it should be there, but it's time to start talking about it for ES7, not ES6. Allen
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On Fri, Feb 14, 2014 at 10:59 AM, Allen Wirfs-Brock al...@wirfs-brock.com wrote: […] But just to be clear, the new feature gate for ES6 is officially closed. […] Personally, I think it should be there, but it's time to start talking about it for ES7, not ES6. Yes, I absolutely agree; apologies, as I realize that was not addressed in my previous message. Rick
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
I'm excited to start working on es7-shim once we get to that point! (String.prototype.at has a particularly simple shim, thankfully...) --scott
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On Fri, Feb 14, 2014 at 12:23 PM, C. Scott Ananian ecmascr...@cscott.net wrote: I'm excited to start working on es7-shim once we get to that point! (String.prototype.at has a particularly simple shim, thankfully...) Have you seen https://github.com/mathiasbynens/String.prototype.at ? Rick
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
Yes, of course. es6-shim is a large-ish collection of such shims. However, it would be much better to use an implementation of `String#at` that used `substr` and thus avoided creating and appending a new string object. --scott
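A minimal sketch of the substring-based approach suggested above. It detects a surrogate pair at the index and returns a one- or two-code-unit slice of the original string, rather than building a new string with `String.fromCodePoint`. It uses `slice` where the message mentions `substr` (equivalent for these arguments). `symbolAtBySlice` is a hypothetical name, and returning `undefined` out of range is an assumption. Whether an engine actually shares backing storage for such a short slice is implementation-dependent.

```javascript
// Sketch: return the symbol at a code unit index as a slice of `str`.
function symbolAtBySlice(str, index) {
  const size = str.length;
  if (index < 0 || index >= size) return undefined; // assumption of this sketch
  const first = str.charCodeAt(index);
  // A leading surrogate followed by a trailing surrogate forms one code point.
  if (first >= 0xD800 && first <= 0xDBFF && index + 1 < size) {
    const second = str.charCodeAt(index + 1);
    if (second >= 0xDC00 && second <= 0xDFFF) {
      return str.slice(index, index + 2);
    }
  }
  // BMP character, lone surrogate, or index in the middle of a pair.
  return str.slice(index, index + 1);
}
```

For every in-range index, this should agree with `String.fromCodePoint(str.codePointAt(index))`, including lone surrogates.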
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On 19 Oct 2013, at 12:54, Domenic Denicola dome...@domenicdenicola.com wrote: My proposed cowpaths: […]

Good stuff! To account for [lookalike symbols due to combining marks] [1], just add a call to `String.prototype.normalize`:

    Object.mixin(String.prototype, {
      get realLength() {
        let counter = 0;
        for (var c of this.normalize('NFC')) {
          ++counter;
        }
        return counter;
      }
    });

    assert('ma\xF1ana'.realLength == 'man\u0303ana'.realLength);

[1]: http://mathiasbynens.be/notes/javascript-unicode#accounting-for-lookalikes
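The same idea as a plain function rather than a `String.prototype` mixin, so it runs in current engines. `countSymbolsNFC` is a hypothetical name. NFC only merges sequences that have a precomposed form, so this is not a general count of user-perceived characters.

```javascript
// Count code points after NFC normalization, so that a precomposed "ñ"
// (U+00F1) and "n" followed by U+0303 COMBINING TILDE count the same.
function countSymbolsNFC(str) {
  let count = 0;
  for (const _ of str.normalize("NFC")) count++;
  return count;
}
```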
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
So it's a for/of with a break when it finds a code point? If that's the only use case, I'd like to see an example of how convenient it is. I am just wondering, not saying it's not useful (trying to understand when/where/why I'd want to use `.at()`). Thanks

On Fri, Oct 18, 2013 at 10:12 PM, Mathias Bynens math...@qiwi.be wrote: On 18 Oct 2013, at 17:51, Joshua Bell jsb...@google.com wrote: Given that you can only use the proposed `String.prototype.at()` properly for indexes > 0 if you know the index of a non-BMP character or lead surrogate by some other means, or if you will test the return value for a trailing surrogate, is it really an advantage over using `codePointAt` / `fromCodePoint`? The name `at` is so tempting that I'm imagining naive scripts of the form `for (i = 0; i < s.length; ++i) { r += s.at(i); }`, which will work fine until they get a non-BMP input, at which point they're suddenly duplicating the trailing surrogates. Pushing people towards for-of iteration and even Allen's `Array.from('𝌆𝌆𝌆')[1]` seems safer; users who need more subtle things have `codePointAt` / `fromCodePoint` available and hopefully the knowledge to use them.

Just because new features can be used incorrectly doesn’t mean the feature isn’t useful. `for…of` on strings and `String.prototype.at` are two very different things for two very different use cases. It’s a matter of using the right tool for the job, IMHO. In your example (iterating over all code points in a string), `for…of` should be used. `String.prototype.codePointAt` or `String.prototype.at` come in handy in case you only need to get the first code point or symbol in a string, for example.
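The naive loop from Joshua's message can be reproduced with a stand-in for the proposed method. In this sketch, `at` is a hypothetical local function using the `String.fromCodePoint(str.codePointAt(i))` equivalence, not a built-in. Stepping one code unit at a time visits the trailing surrogate separately and appends it a second time, while `for…of` steps one code point at a time.

```javascript
// Hypothetical stand-in for the proposed method (code unit index in,
// code point string out). Not the ES2022 String.prototype.at.
const at = (str, i) => String.fromCodePoint(str.codePointAt(i));

// The naive loop: advances one code unit per iteration.
function naiveCopy(s) {
  let r = "";
  for (let i = 0; i < s.length; ++i) r += at(s, i);
  return r;
}

// for-of: advances one code point per iteration.
function forOfCopy(s) {
  let r = "";
  for (const c of s) r += c;
  return r;
}

const input = "a\u{1D306}";
naiveCopy(input); // "a\u{1D306}\uDF06": the trailing surrogate is duplicated
forOfCopy(input); // "a\u{1D306}"
```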
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On Oct 18, 2013, at 10:53 PM, Domenic Denicola wrote: On 19 Oct 2013, at 01:12, Mathias Bynens math...@qiwi.be wrote: `String.prototype.codePointAt` or `String.prototype.at` come in handy in case you only need to get the first code point or symbol in a string, for example. Are they useful for anything else, though? For example, if I wanted to get the second symbol in a string, how would I do that?

We discussed the utility of 'codePointAt' in the context of Norbert's full Unicode support proposal. At that time we concluded that it was something we needed. I don't see any new evidence that suggests that we need to reopen that decision at this point in the process.

The utility of a hypothetical 'at' method is presumably exactly that of 'codePointAt'. `str.at(p)` would just be a convenience for expressing `String.fromCodePoint(str.codePointAt(p))`. So the real question is probably how common that use case is.

It's relatively easy to do a for loop over the characters of a string using 'at'. Something like: `let c = ''; for (let p = 0; p < str.length; p += c.length) { c = str.at(p); ... }` although a for-of would be better in most cases: `for (let c of str)`

The use case that we don't support well is any sort of backwards iteration over the characters of a string. We don't currently have an iterator specified to do it, nor do we have a one-stop way to test whether we are looking at the trailing surrogate of a surrogate pair.

Allen
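Both loop shapes can be sketched with `String.fromCodePoint(str.codePointAt(p))` standing in for the hypothetical `at`. The backwards case needs its own trailing-surrogate check. `forwardSymbols` and `reverseSymbols` are hypothetical helper names; this is a sketch, not a proposed API.

```javascript
// Forward loop: bump the index by the length of the symbol just read.
function forwardSymbols(str) {
  const out = [];
  let c = "";
  for (let p = 0; p < str.length; p += c.length) {
    c = String.fromCodePoint(str.codePointAt(p)); // stands in for str.at(p)
    out.push(c);
  }
  return out;
}

// Backward iteration: yield whole code points from the end of the string.
function* reverseSymbols(str) {
  let end = str.length;
  while (end > 0) {
    let start = end - 1;
    const unit = str.charCodeAt(start);
    // A trailing surrogate preceded by a leading surrogate is one code point,
    // so step back one more code unit to include the whole pair.
    if (unit >= 0xDC00 && unit <= 0xDFFF && start > 0) {
      const prev = str.charCodeAt(start - 1);
      if (prev >= 0xD800 && prev <= 0xDBFF) start--;
    }
    yield str.slice(start, end);
    end = start;
  }
}
```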
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On Oct 18, 2013, at 4:22 PM, André Bargull wrote: On Oct 18, 2013, at 4:01 PM, Allen Wirfs-Brock wrote: On Oct 18, 2013, at 1:29 PM, Allen Wirfs-Brock wrote: `Array.from('𝌆𝌆𝌆')[1]` maybe even better: `Uint32Array.from('𝌆𝌆𝌆')[1]` err...maybe not if you want a string value: `String.fromCodePoint(Uint32Array.from('𝌆𝌆𝌆')[1])` That does not seem to be too useful: `js> String.fromCodePoint(Uint32Array.from("\u{1d306}\u{1d306}\u{1d306}")[1])` \u

Right, it would need to be `String.fromCodePoint(Uint32Array.from('𝌆𝌆𝌆', s => s.codePointAt(0))[1])`

According to http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/index.html#String, String.prototype[@@iterator] does not return plain code points, but the String value for the code point.

Yes, that's correct and how I have it spec'ed in rev20.

Allen
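The difference André points out can be checked directly. Iterating a string yields code point strings, so a typed array needs a map function to store numeric code points. A small sketch (variable names are illustrative):

```javascript
const str = "\u{1D306}\u{1D306}\u{1D306}";

// Without a map function, each code point string goes through ToNumber,
// which gives NaN, and NaN is stored as 0 in a Uint32Array.
const unmapped = Uint32Array.from(str);

// With a map function, each code point string becomes its numeric value.
const codePoints = Uint32Array.from(str, (s) => s.codePointAt(0));
const second = String.fromCodePoint(codePoints[1]); // "\u{1D306}"
```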
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
* Allen Wirfs-Brock wrote: The utility of a hypothetical 'at' method is presumably exactly that of 'codePointAt'. str.at(p) would just be a convenience for expressing String.fromCodePoint(str.codePointAt(p)) So the real question is probably, how common is that use case. Certainly not common enough to warrant a two-character method on the native string type. Odds are people will use it incorrectly in an attempt to make their code look concise, not understanding that it'll retrieve a substring of .length 1 or 2, possibly consisting of a lone surrogate, based on a 16 bit index that might fall in the middle of a character; the problematic cases are fairly rare, so it's hard to notice improper use of `.at` in automated testing or in code review. -- Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On 19 Oct 2013, at 12:15, Bjoern Hoehrmann derhoe...@gmx.net wrote: Certainly not common enough to warrant a two-character method on the native string type. Odds are people will use it incorrectly in an attempt to make their code look concise […] Are you saying that changing the name to something that is longer than `at` would solve this problem? […] not understanding that it'll retrieve a substring of .length 1 or 2, possibly consisting of a lone surrogate, based on a 16 bit index that might fall in the middle of a character; the problematic cases are fairly rare, so it's hard to notice improper use of `.at` in automated testing or in code review. People are using `String.prototype.charAt()` incorrectly too, expecting it to return whole symbols instead of surrogate halves wherever possible. How would _not_ introducing a method that avoids this problem help?
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On 19 Oct 2013, at 00:53, Domenic Denicola dome...@domenicdenicola.com wrote: On 19 Oct 2013, at 01:12, Mathias Bynens math...@qiwi.be wrote: `String.prototype.codePointAt` or `String.prototype.at` come in handy in case you only need to get the first code point or symbol in a string, for example. Are they useful for anything else, though? For example, if I wanted to get the second symbol in a string, how would I do that? Yeah, that’s the problem with these methods. Additional user code is required to handle non-zero `position` arguments, unless you’re sure the `position` is actually the start of a code point (and not in the middle of a surrogate pair). I guess there are situations where that’s a certainty, for example when you’re dealing with a string in which the user selected some text. This brings us back to the earlier discussion of whether something like `String.prototype.codePoints` should be added: http://esdiscuss.org/topic/how-to-count-the-number-of-symbols-in-a-string It could be a getter or a generator… Or does `for…of` iteration handle this use case adequately?
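The "additional user code" for non-zero positions can be as small as a boundary check. A sketch, assuming a hypothetical helper `isCodePointBoundary` that returns `false` only when the index points at the trailing half of a well-formed surrogate pair:

```javascript
// Returns true if `index` is not in the middle of a surrogate pair, i.e.
// codePointAt(index) (or the proposed at) would return a whole symbol.
function isCodePointBoundary(str, index) {
  // The start and the end of the string are always boundaries.
  if (index <= 0 || index >= str.length) return true;
  const unit = str.charCodeAt(index);
  const prev = str.charCodeAt(index - 1);
  const isTrail = unit >= 0xDC00 && unit <= 0xDFFF;
  const prevIsLead = prev >= 0xD800 && prev <= 0xDBFF;
  return !(isTrail && prevIsLead);
}
```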
RE: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
From: Mathias Bynens [mailto:math...@qiwi.be] This brings us back to the earlier discussion of whether something like `String.prototype.codePoints` should be added: http://esdiscuss.org/topic/how-to-count-the-number-of-symbols-in-a-string It could be a getter or a generator… Or does `for…of` iteration handle this use case adequately?

It sounds like you are proposing a second name for `String.prototype[Symbol.iterator]`, which does not sound very useful. A property for the string's real length does seem somewhat useful, as does a method that does random access on real characters. Certainly more useful than the proposed symbolAt/at. But I suppose we can pave whatever cowpaths arise. My proposed cowpaths:

```js
Object.mixin(String.prototype, {
  realCharacterAt(i) {
    let index = 0;
    for (var c of this) {
      if (index++ === i) {
        return c;
      }
    }
  },
  get realLength() {
    let counter = 0;
    for (var c of this) {
      ++counter;
    }
    return counter;
  }
});
```

This would allow you to e.g. find the character in the real middle of a string with code like

```js
var middleIndex = Math.floor(theString.realLength / 2);
var middleRealCharacter = theString.realCharacterAt(middleIndex);
```
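The two cowpaths can also be written as plain functions, which avoids patching `String.prototype`. The function names reuse the message's hypothetical `realLength` / `realCharacterAt`; returning `undefined` for an out-of-range index mirrors the original's implicit return. Both still walk the string from the start on every call, so each is O(n).

```javascript
// Number of code points in `str` (what the message calls its "real length").
function realLength(str) {
  let counter = 0;
  for (const _ of str) ++counter;
  return counter;
}

// Code point string at code point index `i`, or undefined if out of range.
function realCharacterAt(str, i) {
  let index = 0;
  for (const c of str) {
    if (index++ === i) return c;
  }
  return undefined;
}

const theString = "ab\u{1D306}cd";
const middleIndex = Math.floor(realLength(theString) / 2);              // 2
const middleRealCharacter = realCharacterAt(theString, middleIndex);    // "\u{1D306}"
```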
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
AFAIK that's also what Allen said he didn't want to implement in core: an expensive operation on each invocation, due to a stateless loop over arbitrary indexes. However, strings are immutable in JS, so I'd implement that logic by creating a snapshot once and using it as if it were an Array ... something like the following:

```javascript
!function (dict) {
  function getOrCreate(str) {
    if (!(str in dict)) {
      dict[str] = {
        i: 0,
        l: 0,
        v: (Array.from || function () {
          // miserable callback
          return str.split('');
        })(str) // or the for/of loop
      };
    }
    // times it's used
    dict[str].i++;
    return dict[str].v;
  }

  setInterval(function () {
    var key, value;
    for (key in dict) {
      value = dict[key];
      value.l = value.i - value.l;
      // used only once or never used again
      if (value.l < 2) {
        // free all the RAM
        delete dict[key];
      }
    }
  }, 5000); // 5 seconds should be enough ?
  // incremental works better with
  // slower timeout though
  // 500 might be good too

  Object.defineProperties(
    String.prototype,
    {
      at: {
        configurable: true,
        writable: true,
        value: function at(i) {
          return getOrCreate(this)[i];
        }
      },
      // or any meaningful name
      size: {
        configurable: true,
        get: function () {
          return getOrCreate(this).length;
        }
      }
    }
  );
}(Object.create(null));

// @example
var str = 'abc';
alert([
  str.size, // 3
  str.at(1) // b
]);
```

Regards

On Sat, Oct 19, 2013 at 10:54 AM, Domenic Denicola dome...@domenicdenicola.com wrote: From: Mathias Bynens [mailto:math...@qiwi.be] This brings us back to the earlier discussion of whether something like `String.prototype.codePoints` should be added: […] It sounds like you are proposing a second name for `String.prototype[Symbol.iterator]`, which does not sound very useful. […]
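A rough sketch of the same snapshot idea with a different eviction strategy. It uses a `Map` capped at a fixed size that drops the oldest inserted entry, instead of the `setInterval` sweep. `MAX_ENTRIES`, `codePointsOf`, `cachedAt`, and `cachedSize` are hypothetical names, and the cap of 100 is arbitrary. It avoids patching `String.prototype`. Callers share the cached arrays, so they should not mutate them.

```javascript
// Bounded cache of code point arrays keyed by string value.
const MAX_ENTRIES = 100; // arbitrary cap for this sketch
const cache = new Map();

function codePointsOf(str) {
  let symbols = cache.get(str);
  if (symbols === undefined) {
    symbols = Array.from(str); // snapshot once; strings are immutable
    if (cache.size >= MAX_ENTRIES) {
      // Map iterates in insertion order, so this evicts the oldest insertion.
      cache.delete(cache.keys().next().value);
    }
    cache.set(str, symbols);
  }
  return symbols;
}

// Code point at code point index `i` (hypothetical name).
function cachedAt(str, i) {
  return codePointsOf(str)[i];
}

// Number of code points (hypothetical name).
function cachedSize(str) {
  return codePointsOf(str).length;
}
```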
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
example mroe readable and with some typo fixed in github: https://gist.github.com/WebReflection/7059536 license wtfpl v2 http://www.wtfpl.net/txt/copying/ Cheers On Sat, Oct 19, 2013 at 11:18 AM, Andrea Giammarchi andrea.giammar...@gmail.com wrote: AFAIK that's also what Allen said didn't want to implement in core. An expensive operation per each invocation due stateless loop over arbitrary indexes. Although, strings are immutable in JS so I'd implement that logic creating a snapshot once and use that as if it was an Array ... something like the following: ```javascript !function(dict){ function getOrCreate(str) { if (!(str in dict)) { dict[str] = { i: 0, l: 0, v: (Array.from || function(){ // miserable callback return str.split('') })(str) // or the for/of loop }; } // times it's used dict[str].i++; return dict[str].v; } setInterval(function () { var key, value; for(key in dict) { value = dict[key]; value.l = value.i - value.l; // used only once or never used again if (value.l 2) { // free all the RAM delete dict[key]; } } }, 5000); // 5 seconds should be enough ? // incremental works better with // slower timeout though // 500 might be good too Object.defineProperties( String.prototype, { at: { configurable: true, writable: true, value: function at(i) { return getOrCreate(this)[i]; } }, // or any meaningful name size: { configurable: true, get: function () { return getOrCreate(this).length; } } } ); }(Object.create(null)); // @example var str = 'abc'; alert([ str.size, // 3 str.at(1) // b ]); ``` Regards On Sat, Oct 19, 2013 at 10:54 AM, Domenic Denicola dome...@domenicdenicola.com wrote: From: Mathias Bynens [mailto:math...@qiwi.be] This brings us back to the earlier discussion of whether something like `String.prototype.codePoints` should be added: http://esdiscuss.org/topic/how-to-count-the-number-of-symbols-in-a-stringIt could be a getter or a generator… Or does `for…of` iteration handle this use case adequately? 
It sounds like you are proposing a second name for `String.prototype[Symbol.iterator]`, which does not sound very useful. A property for the string's real length does seem somewhat useful, as does a method that does random access on real characters. Certainly more useful than the proposed symbolAt/at. But I suppose we can pave whatever cowpaths arise. My proposed cowpaths:

```js
// Object.mixin existed only in ES6 drafts at the time; it was later removed.
Object.mixin(String.prototype, {
  realCharacterAt(i) {
    let index = 0;
    for (var c of this) {
      if (index++ === i) {
        return c;
      }
    }
  },
  get realLength() {
    let counter = 0;
    for (var c of this) {
      ++counter;
    }
    return counter;
  }
});
```

This would allow you to e.g. find the character in the real middle of a string with code like

```js
var middleIndex = Math.floor(theString.realLength / 2);
var middleRealCharacter = theString.realCharacterAt(middleIndex);
```
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
* Mathias Bynens wrote: On 19 Oct 2013, at 12:15, Bjoern Hoehrmann derhoe...@gmx.net wrote: Certainly not common enough to warrant a two-character method on the native string type. Odds are people will use it incorrectly in an attempt to make their code look concise […] Are you saying that changing the name to something that is longer than `at` would solve this problem? If it were `.getOneOrTwoCodepointLongSubstringAtUcs2CodeUnitIndex(...)`, I am sure people would be reluctant to use it, because it's unreasonably long compared to `String.fromCodePoint(str.codePointAt(p))` and harder to understand than the combination of those two primitives. […] not understanding that it'll retrieve a substring of .length 1 or 2, possibly consisting of a lone surrogate, based on a 16-bit index that might fall in the middle of a character; the problematic cases are fairly rare, so it's hard to notice improper use of `.at` in automated testing or in code review. People are using `String.prototype.charAt()` incorrectly too, expecting it to return whole symbols instead of surrogate halves wherever possible. How would _not_ introducing a method that avoids this problem help? Right now people do not have much of a choice other than writing code that does not do the right thing when faced with malformed strings or non-BMP characters; it's unreasonable to call a method like `substr` and then manually smooth it up around the edges, and perhaps scan the interior for lone surrogates, to ensure that at least your code doesn't do the wrong thing. That gives you well-known bad code, which is a good thing to have: better than more complicated code that might have unknown bugs. Allen's loop `for (let p=0; p<str.length; p+=c.length)`, for instance, is just waiting for someone to improve or replace it with code that increments by `1` instead of `.length` because that's simpler.
The methods `fromCodePoint` and `codePointAt` can be used to get ugly constants out of code that tries to do the right thing, and they will offer some insight into how developers might go from UCS-only code to something more proper, but for the moment duplicating all the UCS-based methods strikes me as premature, especially when giving them seductive names. How would a somewhat-surrogate-aware `substring` method work and what would it be called, for instance? If it is omitted, we would be back to square one, someone in need of substring functionality has to jump through overly complicated hoops to make it work correctly and ends up mixing surrogate-pair-aware with -unaware code. -- Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
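Björn's warning about Allen's loop can be made concrete. The sketch below (both function names are hypothetical, not from the thread) extracts code-point substrings using only the two ES6 primitives he mentions, `codePointAt` and `String.fromCodePoint`; the second function is the "simpler" variant he predicts, which increments by `1` instead of by `c.length`.

```javascript
// Split a string into code-point substrings using codePointAt() and
// String.fromCodePoint(). The index must advance by the length of the
// extracted substring (1 or 2 code units), not by 1.
function codePointSubstrings(str) {
  const out = [];
  for (let p = 0; p < str.length; ) {
    const c = String.fromCodePoint(str.codePointAt(p));
    out.push(c);
    p += c.length; // advancing by 1 here would re-read the trailing surrogate
  }
  return out;
}

// Deliberately buggy variant: advances by one code unit per step, so the
// trailing surrogate of a pair is emitted again as a lone surrogate.
function buggyCodePointSubstrings(str) {
  const out = [];
  for (let p = 0; p < str.length; p++) {
    out.push(String.fromCodePoint(str.codePointAt(p)));
  }
  return out;
}

console.log(codePointSubstrings('a\u{1D306}b').length);      // 3
console.log(buggyCodePointSubstrings('a\u{1D306}b').length); // 4
```

Note that `String.fromCodePoint` accepts lone surrogate values, so the buggy version does not throw; it silently produces a malformed extra item, which is why such mistakes are hard to spot in testing.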
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
Allen Wirfs-Brock wrote: The use case that we don't support well is any sort of backwards iteration of the characters of a string. We don't currently have an iterator specified to do it, nor do we have a one-stop way to test whether we are looking at the trailing surrogate of a surrogate pair. What do you mean by one-stop? O(1)? We aren't going to mandate implementations make such tests (or backward iteration) that cheap. Is there yet a real-world (from the field, not a testcase) use case for backward iteration? /be
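For illustration only, here is roughly what a backward code point iterator has to do, assuming the usual UTF-16 ranges (leading surrogates 0xD800 to 0xDBFF, trailing surrogates 0xDC00 to 0xDFFF). The generator name `reverseCodePoints` is hypothetical; no such iterator was specified. Each step inspects at most two code units, so the trailing-surrogate test itself is local to the current position.

```javascript
// Hypothetical helper: yields the code-point substrings of `str` from last
// to first. A trailing surrogate preceded by a leading surrogate is treated
// as one code point; lone surrogates are yielded on their own.
function* reverseCodePoints(str) {
  let i = str.length;
  while (i > 0) {
    const last = str.charCodeAt(i - 1);
    if (last >= 0xDC00 && last <= 0xDFFF && i >= 2) {
      const prev = str.charCodeAt(i - 2);
      if (prev >= 0xD800 && prev <= 0xDBFF) {
        yield str.slice(i - 2, i); // a complete surrogate pair
        i -= 2;
        continue;
      }
    }
    yield str.slice(i - 1, i); // BMP code unit or lone surrogate
    i -= 1;
  }
}

console.log([...reverseCodePoints('a\u{1D306}b')].join('') === 'b\u{1D306}a'); // true
```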
`String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
ES6 fixes `String.fromCharCode` by introducing `String.fromCodePoint`. Similarly, `String.prototype.charCodeAt` is fixed by `String.prototype.codePointAt`. Should there be a method that is like `String.prototype.charAt` except it deals with astral Unicode symbols wherever possible?

`'𝌆'.charAt(0) // U+1D306`
`'\uD834' // the first surrogate half for U+1D306`

`'𝌆'.symbolAt(0) // U+1D306`
`'𝌆' // U+1D306`

Has this been discussed before? If there’s any interest I’d be happy to create a strawman. Mathias http://mathiasbynens.be/
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On Fri, Oct 18, 2013 at 8:46 AM, Mathias Bynens math...@qiwi.be wrote: ES6 fixes `String.fromCharCode` by introducing `String.fromCodePoint`. Similarly, `String.prototype.charCodeAt` is fixed by `String.prototype.codePointAt`. Should there be a method that is like `String.prototype.charAt` except it deals with astral Unicode symbols wherever possible? '𝌆'.charAt(0) // U+1D306 '\uD834' // the first surrogate half for U+1D306 '𝌆'.symbolAt(0) // U+1D306 '𝌆' // U+1D306 I think the idea is good, but the name may be confusing with regard to Symbols (maybe not?) Rick
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On 18 Oct 2013, at 09:21, Rick Waldron waldron.r...@gmail.com wrote: I think the idea is good, but the name may be confusing with regard to Symbols (maybe not?) Yeah, I thought about that, but couldn’t figure out a better name. “Glyph” or “Grapheme” wouldn’t be accurate. Any suggestions? Anyway, if everyone agrees this is a good idea I’ll get started on fleshing out a proposal. We can then use this thread to bikeshed about the name.
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
I also noticed the naming similarity to ES6 `Symbol`s. I've seen people polyfill `String.prototype.getFullChar` before, and similarly things like `String.prototype.fromFullCharCode` for dealing with surrogates. I like `String.prototype.signAt`, but I haven't seen it used before. I'm eager to hear what Allen has to say about this given his work on Unicode in ECMAScript, especially how it fits with this: http://wiki.ecmascript.org/doku.php?id=strawman:support_full_unicode_in_strings&rev=1304034700 I also think that this is important enough to be there.
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
Here’s my proposal. Feedback welcome, as well as suggestions for a better name (if any).

## String.prototype.symbolAt(pos)

NOTE: Returns a String of one or two elements containing the code point at element position `pos` in the String `value` resulting from converting the `this` object to a String. If there is no element at that position, the result is the empty String. The result is a String value, not a String object.

When the `symbolAt` method is called with one argument `pos`, the following steps are taken:

01. Let `O` be `CheckObjectCoercible(this value)`.
02. Let `S` be `ToString(O)`.
03. `ReturnIfAbrupt(S)`.
04. Let `position` be `ToInteger(pos)`.
05. `ReturnIfAbrupt(position)`.
06. Let `size` be the number of elements in `S`.
07. If `position < 0` or `position ≥ size`, return the empty String.
08. Let `first` be the code unit at index `position` in the String `S`.
09. Let `cuFirst` be the code unit value of the element at index `0` in the String `first`.
10. If `cuFirst < 0xD800` or `cuFirst > 0xDBFF` or `position + 1 = size`, then return `first`.
11. Let `cuSecond` be the code unit value of the element at index `position + 1` in the String `S`.
12. If `cuSecond < 0xDC00` or `cuSecond > 0xDFFF`, then return `first`.
13. Let `second` be the code unit at index `position + 1` in the String `S`.
14. Let `cp` be `(cuFirst – 0xD800) × 0x400 + (cuSecond – 0xDC00) + 0x10000`.
15. Return the elements of the UTF-16 Encoding (clause 6) of `cp`.

NOTE: The `symbolAt` function is intentionally generic; it does not require that its `this` value be a String object. Therefore it can be transferred to other kinds of objects for use as a method.
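A sketch of the steps above as plain JavaScript, assuming ES2015 built-ins (`Math.trunc`, `String.fromCodePoint`). It is written as a standalone function that takes the `this` value explicitly rather than being installed on `String.prototype`, and the step mapping is approximate (for example, `Math.trunc(Number(pos)) || 0` stands in for `ToInteger`).

```javascript
// Sketch of the proposed symbolAt(pos) steps. `symbolAt` is the proposal's
// name; here it is a plain function and is NOT added to String.prototype.
function symbolAt(thisValue, pos) {
  // Steps 1-3: CheckObjectCoercible, then ToString (template literals use
  // ToString, so Symbols throw a TypeError as the spec requires).
  if (thisValue === null || thisValue === undefined) {
    throw new TypeError('symbolAt called on null or undefined');
  }
  const S = `${thisValue}`;
  // Steps 4-5: approximate ToInteger (NaN becomes 0, fractions truncate).
  const position = Math.trunc(Number(pos)) || 0;
  // Steps 6-7: out of range gives the empty String.
  const size = S.length;
  if (position < 0 || position >= size) return '';
  // Steps 8-10: not a leading surrogate, or nothing follows it.
  const first = S.charAt(position);
  const cuFirst = S.charCodeAt(position);
  if (cuFirst < 0xD800 || cuFirst > 0xDBFF || position + 1 === size) return first;
  // Steps 11-12: the next unit is not a trailing surrogate.
  const cuSecond = S.charCodeAt(position + 1);
  if (cuSecond < 0xDC00 || cuSecond > 0xDFFF) return first;
  // Steps 13-15: decode the pair into a code point and re-encode it.
  const cp = (cuFirst - 0xD800) * 0x400 + (cuSecond - 0xDC00) + 0x10000;
  return String.fromCodePoint(cp);
}

console.log(symbolAt('\u{1D306}', 0) === '\u{1D306}'); // true: the whole pair
console.log(symbolAt('\u{1D306}', 1) === '\uDF06');    // true: index inside the pair
```

Because step 15 re-encodes the pair it just decoded, the result for a valid pair is simply the two code units at `position` and `position + 1`; the arithmetic matters only in a spec-style description.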
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On Fri, Oct 18, 2013 at 10:47 AM, Mathias Bynens math...@qiwi.be wrote: On 18 Oct 2013, at 09:21, Rick Waldron waldron.r...@gmail.com wrote: I think the idea is good, but the name may be confusing with regard to Symbols (maybe not?) Yeah, I thought about that, but couldn’t figure out a better name. “Glyph” or “Grapheme” wouldn’t be accurate. Any suggestions? Anyway, if everyone agrees this is a good idea I’ll get started on fleshing out a proposal. We can then use this thread to bikeshed about the name. I think it's worthwhile to write up a proposal. And the shed should always be pink ;) Rick
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On Fri, Oct 18, 2013 at 11:15 AM, Mathias Bynens math...@qiwi.be wrote: Here’s my proposal. Feedback welcome, as well as suggestions for a better name (if any). ## String.prototype.symbolAt(pos) Here goes... String.prototype.elementAt? Rick
RE: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
Doesn't Unicode have some name for the visual representation of a code point? Maybe it's symbol?
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On Fri, Oct 18, 2013 at 1:46 PM, Mathias Bynens math...@qiwi.be wrote: Similarly, `String.prototype.charCodeAt` is fixed by `String.prototype.codePointAt`. When you phrase it like that, I see another problem with codePointAt(). You can't just replace existing usage of charCodeAt() with codePointAt() as that would fail for input with paired surrogates. E.g. a simple loop over a string that prints code points would print both the code point and the trail surrogate code point for a surrogate pair. The same goes for this new method. I still think that only offering a better way to iterate strings (as planned) seems like a much safer start into this brave new code point-based world. -- http://annevankesteren.nl/
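Anne's concern can be reproduced with a small sketch (function names are hypothetical). The first loop is the mechanical `charCodeAt` to `codePointAt` substitution: at the index of the trailing surrogate, `codePointAt` returns that surrogate's own value (0xDF06 here), so one astral symbol produces two numbers. The second loop is one possible fix, skipping a code unit after any code point above 0xFFFF.

```javascript
// Naive substitution of codePointAt() for charCodeAt() in an index loop.
function naiveCodePoints(str) {
  const out = [];
  for (let i = 0; i < str.length; i++) {
    out.push(str.codePointAt(i));
  }
  return out;
}

// One possible fix: skip the second code unit of an astral code point.
function codePoints(str) {
  const out = [];
  for (let i = 0; i < str.length; i++) {
    const cp = str.codePointAt(i);
    out.push(cp);
    if (cp > 0xFFFF) i++; // the pair occupied two code units
  }
  return out;
}

const sample = 'a\u{1D306}';
console.log(naiveCodePoints(sample)); // [ 97, 119558, 57094 ]
console.log(codePoints(sample));      // [ 97, 119558 ]
```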
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On 18 Oct 2013, at 10:48, Anne van Kesteren ann...@annevk.nl wrote: On Fri, Oct 18, 2013 at 1:46 PM, Mathias Bynens math...@qiwi.be wrote: Similarly, `String.prototype.charCodeAt` is fixed by `String.prototype.codePointAt`. When you phrase it like that, I see another problem with codePointAt(). You can't just replace existing usage of charCodeAt() with codePointAt() as that would fail for input with paired surrogates. E.g. a simple loop over a string that prints code points would print both the code point and the trail surrogate code point for a surrogate pair. I disagree. In those situations you should just iterate over the string using `for…of`. `.symbolAt()` can be a useful replacement for `.charAt()` in case you only need to get the first symbol in the string. The same goes for `.codePointAt()` vs. `.charCodeAt()`.
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On Fri, Oct 18, 2013 at 11:53 AM, Mathias Bynens math...@qiwi.be wrote: On 18 Oct 2013, at 10:25, Rick Waldron waldron.r...@gmail.com wrote: String.prototype.elementAt? This may be confusing too, since the spec refers to `elements` as code units, not code points. Yes, slight mis-reading of your proposal; thanks for clarifying Rick
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On Fri, Oct 18, 2013 at 4:58 PM, Mathias Bynens math...@qiwi.be wrote: On 18 Oct 2013, at 10:48, Anne van Kesteren ann...@annevk.nl wrote: When you phrase it like that, I see another problem with codePointAt(). You can't just replace existing usage of charCodeAt() with codePointAt() as that would fail for input with paired surrogates. E.g. a simple loop over a string that prints code points would print both the code point and the trail surrogate code point for a surrogate pair. I disagree. In those situations you should just iterate over the string using `for…of`. That seems to iterate over code units as far as I can tell. `for (var x of "𝌆") print(x.charCodeAt(0))` invokes print() twice in Gecko. -- http://annevankesteren.nl/
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
> I disagree. In those situations you should just iterate over the string using `for...of`.
> That seems to iterate over code units as far as I can tell. `for (var x of "𝌆") print(x.charCodeAt(0))` invokes print() twice in Gecko.

SpiderMonkey does not implement the (yet to be) spec'ed String.prototype.@@iterator function; instead it simply aliases String.prototype[@@iterator] to Array.prototype[@@iterator]:

js> String.prototype[@@iterator] === Array.prototype[@@iterator]
true

- André
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On Oct 18, 2013, at 7:21 AM, Rick Waldron wrote: On Fri, Oct 18, 2013 at 8:46 AM, Mathias Bynens math...@qiwi.be wrote: ES6 fixes `String.fromCharCode` by introducing `String.fromCodePoint`. Similarly, `String.prototype.charCodeAt` is fixed by `String.prototype.codePointAt`. Should there be a method that is like `String.prototype.charAt` except it deals with astral Unicode symbols wherever possible? '𝌆'.charAt(0) // U+1D306 '\uD834' // the first surrogate half for U+1D306 '𝌆'.symbolAt(0) // U+1D306 '𝌆' // U+1D306 I think the idea is good, but the name may be confusing with regard to Symbols (maybe not?) Given that we have charAt, charCodeAt and codePointAt, I think the most appropriate name for such a method would be 'at': '𝌆'.at(0) The issue when this sort of method has been discussed in the past has been what to do when you index at a trailing surrogate position: '𝌆'.at(1) Do you still get '𝌆' or do you get the equivalent of String.fromCharCode('𝌆'[1])? Allen
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On Oct 18, 2013, at 9:05 AM, Anne van Kesteren wrote: On Fri, Oct 18, 2013 at 4:58 PM, Mathias Bynens math...@qiwi.be wrote: On 18 Oct 2013, at 10:48, Anne van Kesteren ann...@annevk.nl wrote: When you phrase it like that, I see another problem with codePointAt(). You can't just replace existing usage of charCodeAt() with codePointAt() as that would fail for input with paired surrogates. E.g. a simple loop over a string that prints code points would print both the code point and the trail surrogate code point for a surrogate pair. I disagree. In those situations you should just iterate over the string using `for…of`. That seems to iterate over code units as far as I can tell. `for (var x of "𝌆") print(x.charCodeAt(0))` invokes print() twice in Gecko. No, that's not correct: the @@iterator method of String.prototype is supposed to return an iterator that iterates code points and returns single-code-point strings. The spec for this will be in the next draft that I release. Allen
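Under the behavior Allen describes (which is what ES2015 engines ship), `for-of` over a string yields one single-code-point string per iteration. The short check below assumes a string of three U+1D306 symbols: it is visited three times even though its `length` is 6. It only illustrates the specified behavior, not the 2013 Gecko builds Anne tested.

```javascript
const tripled = '\u{1D306}\u{1D306}\u{1D306}'; // three copies of U+1D306

console.log(tripled.length); // 6: length counts UTF-16 code units

let iterations = 0;
for (const ch of tripled) {
  iterations++;
  // Each ch is a single-code-point string made of two code units.
  if (ch !== '\u{1D306}' || ch.length !== 2) throw new Error('unexpected item');
}
console.log(iterations); // 3
```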
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
+1 for the simplified `at(symbolIndex)`. I would expect '𝌆'.at(1) to fail the same way 'a'.charAt(1) or 'a'.charCodeAt(1) would. I would expect '𝌆'.at(symbolIndex) to behave as `length` does based on unique symbol (unicode extra) so that everyone, except RAM and CPU, will have an easier life with strings. Long story short: there's no symbol at 1; the symbol is at 0 because the size of that unicode string is 1. That said, I am sure the discussion went through this already ^_^
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
the size of that unicode string is 1 ... meaning the **virtual** size for human eyes
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
If this is true, then .at(symbolIndex) should be a no-brainer?

```
var virtualLength = 0;
for (var x of '𝌆') {
  virtualLength++;
}
// equivalent of
for (var i = 0; i < virtualLength; i++) {
  '𝌆'.at(i);
}
```

Am I missing something? On Fri, Oct 18, 2013 at 10:03 AM, Allen Wirfs-Brock al...@wirfs-brock.com wrote: [...] No, that's not correct: the @@iterator method of String.prototype is supposed to return an iterator that iterates code points and returns single-code-point strings. The spec for this will be in the next draft that I release. Allen
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On Oct 18, 2013, at 10:06 AM, Andrea Giammarchi wrote: +1 for the simplified `at(symbolIndex)` I would expect '𝌆'.at(1) to fail same way 'a'.charAt(1) or 'a'.charCodeAt(1) would. They aren't comparable, as the 'a' examples are index-out-of-bounds errors. We only use code unit indices with strings, so '𝌆'[1] is valid (and so presumably should be '𝌆'.at(1)), with 1 having the same meaning in each case. The most consistent way to define String.prototype.at would be: `String.prototype.at = function (pos) { let cp = this.codePointAt(pos); return cp === undefined ? undefined : String.fromCodePoint(cp); };` Allen
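Allen's definition can be exercised without touching `String.prototype`. The sketch below is a standalone function under the hypothetical name `atCodePoint`; this matters today because engines now ship an unrelated `String.prototype.at` (ES2022, code-unit based, accepting negative indices), which Allen's snippet would overwrite.

```javascript
// Allen's proposed semantics as a standalone function (hypothetical name).
// Indices are code unit indices; a trailing-surrogate index yields the lone
// trailing surrogate, and an out-of-range index yields undefined.
function atCodePoint(str, pos) {
  const cp = String(str).codePointAt(pos);
  return cp === undefined ? undefined : String.fromCodePoint(cp);
}

console.log(atCodePoint('\u{1D306}', 0) === '\u{1D306}'); // true: whole pair
console.log(atCodePoint('\u{1D306}', 1) === '\uDF06');    // true: trailing surrogate alone
console.log(atCodePoint('abc', 3));                        // undefined
```

Note the out-of-range result differs from Mathias's `symbolAt` steps, which return the empty String.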
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On Fri, Oct 18, 2013 at 12:03 PM, Allen Wirfs-Brock al...@wirfs-brock.com wrote: `for (var x of "𝌆") print(x.charCodeAt(0))` invokes print() twice in Gecko. No, that's not correct: the @@iterator method of String.prototype is supposed to return an iterator that iterates code points and returns single-code-point strings. Filed: https://bugzilla.mozilla.org/show_bug.cgi?id=928508 -j
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
Fair enough, that was my point about "except for RAM and CPU, life is going to be easier for devs", so my counter-question would be: is there any way to do that in core so that we can “𝌆𝌆𝌆”.split() it, so that we can have an ArrayLike that with [1] gives back the single “𝌆” and not the whole thing? Or does Mathias already have a RegExp able to split like that with reasonable performance? P.S. I am in Chrome and Safari, and I had no idea what kind of “𝌆” we were talking about until I saw it on Twitter :D On Fri, Oct 18, 2013 at 10:34 AM, Allen Wirfs-Brock al...@wirfs-brock.com wrote: On Oct 18, 2013, at 10:18 AM, Andrea Giammarchi wrote: if this is true then .at(symbolIndex) should be a no-brainer? [...] Am I missing something? Yes, we don't want to introduce code point based direct indexing, which always requires scanning from the front of the string. We already made that decision in the context of codePointAt, which only uses code unit indices. Allen
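Allen's objection is about cost: mapping a code-point index to a position in a UTF-16 string requires scanning from the front. A minimal sketch of that scan, using a hypothetical helper name `codeUnitIndexOf` that returns -1 when the code-point index is past the end:

```javascript
// Hypothetical helper: maps a code-point index to a code-unit index by
// scanning from the start of the string, which is O(n) per call.
function codeUnitIndexOf(str, codePointIndex) {
  let unit = 0;
  for (let n = 0; n < codePointIndex; n++) {
    if (unit >= str.length) return -1; // ran past the end
    unit += str.codePointAt(unit) > 0xFFFF ? 2 : 1;
  }
  return unit < str.length ? unit : -1;
}

const threeSymbols = '\u{1D306}\u{1D306}\u{1D306}';
console.log(codeUnitIndexOf(threeSymbols, 1)); // 2
console.log(codeUnitIndexOf(threeSymbols, 3)); // -1
```

Calling such a helper for every `i` in a loop like Andrea's makes the loop quadratic in the string length, which is the cost Allen wants to keep out of the core API.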
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On 18 Oct 2013, at 11:05, Anne van Kesteren ann...@annevk.nl wrote: On Fri, Oct 18, 2013 at 4:58 PM, Mathias Bynens math...@qiwi.be wrote: I disagree. In those situations you should just iterate over the string using `for…of`. That seems to iterate over code units as far as I can tell. `for (var x of "𝌆") print(x.charCodeAt(0))` invokes print() twice in Gecko. Woah, that doesn’t seem very useful. Is that a bug, or the way it’s supposed to work? I thought it was supposed to only iterate over whole code points (i.e. only print once for each code point, not once for each surrogate half).
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On Oct 18, 2013, at 1:12 PM, Andrea Giammarchi wrote: fair enough, that was my point about except for RAM and CPU, life is going to be easier for devs so my counter-question would be: is there any way to do that in core so that we can “𝌆𝌆𝌆”.split() it so that we can have an ArrayLike that with [1] gives back the single “𝌆” and not the whole thing? Array.from('𝌆𝌆𝌆')[1] Allen
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
Please ignore my previous email; it has been answered already. (It was a draft I wrote up this morning before I lost my internet connection.) On 18 Oct 2013, at 11:57, Allen Wirfs-Brock al...@wirfs-brock.com wrote: Given that we have charAt, charCodeAt and codePointAt, I think the most appropriate name for such a method would be 'at': '𝌆'.at(0) Love it! The issue when this sort of method has been discussed in the past has been what to do when you index at a trailing surrogate position: '𝌆'.at(1) Do you still get '𝌆' or do you get the equivalent of String.fromCharCode('𝌆'[1])? In my proposal it would return the equivalent of `String.fromCharCode('𝌆'[1])`. I think that’s the most sane behavior in that case. This also mimics the way `String.prototype.codePointAt` works in such a case. Here’s a prollyfill for `String.prototype.at` based on my earlier proposal: https://github.com/mathiasbynens/String.prototype.at Tests: https://github.com/mathiasbynens/String.prototype.at/blob/master/tests/tests.js
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On 18 Oct 2013, at 15:12, Andrea Giammarchi andrea.giammar...@gmail.com wrote: so my counter-question would be: is there any way to do that in core so that we can “𝌆𝌆𝌆”.split() it so that we can have an ArrayLike that with [1] gives back the single “𝌆” and not the whole thing? This brings us back to the earlier discussion of whether something like `String.prototype.codePoints` should be added: http://esdiscuss.org/topic/how-to-count-the-number-of-symbols-in-a-string I think it would be useful.
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
If I understand Allen's answer, it looks like `Array.from("𝌆𝌆𝌆").length` would do, being 3, making the operation straightforward? Cheers
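Under the iterator semantics Allen described, `Array.from` consumes the string iterator, so the resulting array has one element per code point. A short check, using three copies of U+1D306 as an assumed input; note that the snapshot costs O(n) time and memory, which is the RAM/CPU trade-off Andrea mentions.

```javascript
const astralTriple = '\u{1D306}\u{1D306}\u{1D306}';
const symbols = Array.from(astralTriple); // one element per code point

console.log(astralTriple.length);        // 6
console.log(symbols.length);             // 3
console.log(symbols[1] === '\u{1D306}'); // true

// Spread syntax uses the same string iterator.
console.log([...astralTriple].length);   // 3
```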
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
Given that you can only use the proposed String.prototype.at() properly for indexes 0 if you know the index of a non-BMP character or lead surrogate by some other means, or if you will test the return value for a trailing surrogate, is it really an advantage over using codePointAt / fromCodePoint? The name at is so tempting I'm imagining naive scripts of the form for (i = 0; i s.length; ++i) { r += s.at(i); } which will work fine until they get a non-BMP input at which point they're suddenly duplicating the trailing surrogates. Pushing people towards for-of iteration and even Allen's Array.from( '팆팆팆'))[1] seems safer; users who need more subtle things have have codePointAt / fromCodePoint available and hopefully the knowledge to use them. On Fri, Oct 18, 2013 at 1:30 PM, Mathias Bynens math...@qiwi.be wrote: Please ignore my previous email; it has been answered already. (It was a draft I wrote up this morning before I lost my internet connection.) On 18 Oct 2013, at 11:57, Allen Wirfs-Brock al...@wirfs-brock.com wrote: Given that we have charAt, charCodeAt and codePointAt, I think the most appropiate name for such a method would be 'at': '팆'.at(0) Love it! The issue when this sort of method has been discussed in the past has been what to do when you index at a trailing surrogate possition: '팆'.at(1) do you still get '팆' or do you get the equivalent of String.fromCharCode('팆'[1]) ? In my proposal it would return the equivalent of `String.fromCharCode('팆'[1])`. I think that’s the most sane behavior in that case. This also mimics the way `String.codePointAt` works in such a case. 
Here’s a prollyfill for `String.prototype.at` based on my earlier proposal: https://github.com/mathiasbynens/String.prototype.at Tests: https://github.com/mathiasbynens/String.prototype.at/blob/master/tests/tests.js
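To make the two positions in this exchange concrete, here is a sketch that models the `at` behaviour Mathias describes with a hypothetical `symbolAt(str, index)` function (not a built-in, and not code from the linked prollyfill), then runs Joshua's naive loop over it. Note that the `String.prototype.at` that eventually shipped in ES2022 is a different method: it does relative (possibly negative) indexing and returns a single code unit.

```javascript
// Hypothetical helper modelling the proposed semantics: return the whole
// symbol when `index` points at a lead surrogate followed by a trail
// surrogate, otherwise the single code unit at `index`.
function symbolAt(str, index) {
  const cp = str.codePointAt(index);
  // Out-of-range index: mirror charAt and return the empty string.
  if (cp === undefined) return '';
  // fromCodePoint also accepts lone surrogate values (0xD800-0xDFFF).
  return String.fromCodePoint(cp);
}

console.log(symbolAt('\u{1D306}', 0) === '\u{1D306}'); // true: full symbol
console.log(symbolAt('\u{1D306}', 1) === '\uDF06'); // true: lone trail surrogate

// The naive code-unit loop: at i = 1 it appends the whole symbol, and at
// i = 2 it appends that symbol's trail surrogate a second time.
const s = 'a\u{1D306}b';
let naive = '';
for (let i = 0; i < s.length; ++i) {
  naive += symbolAt(s, i);
}
console.log(naive === 'a\uD834\uDF06\uDF06b'); // true: trail surrogate duplicated

// for-of iterates by code point, so the input is reproduced unchanged.
let viaForOf = '';
for (const ch of s) {
  viaForOf += ch;
}
console.log(viaForOf === s); // true
```

Both sides are visible here: `symbolAt` behaves as specified at every index, yet a loop over `s.length` still corrupts the output.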
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On Oct 18, 2013, at 1:29 PM, Allen Wirfs-Brock wrote: Array.from('𝌆𝌆𝌆')[1] maybe even better: Uint32Array.from('𝌆𝌆𝌆')[1] Allen
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On Oct 18, 2013, at 4:01 PM, Allen Wirfs-Brock wrote: On Oct 18, 2013, at 1:29 PM, Allen Wirfs-Brock wrote: Array.from('𝌆𝌆𝌆')[1] maybe even better: Uint32Array.from('𝌆𝌆𝌆')[1] err...maybe not if you want a string value: String.fromCodePoint(Uint32Array.from('𝌆𝌆𝌆')[1])
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On Oct 18, 2013, at 4:01 PM, Allen Wirfs-Brock wrote: On Oct 18, 2013, at 1:29 PM, Allen Wirfs-Brock wrote: Array.from('𝌆𝌆𝌆')[1] maybe even better: Uint32Array.from('𝌆𝌆𝌆')[1] err...maybe not if you want a string value: String.fromCodePoint(Uint32Array.from('𝌆𝌆𝌆')[1]) That does not seem to be too useful: js> String.fromCodePoint(Uint32Array.from("\u{1d306}\u{1d306}\u{1d306}")[1]) \u According to http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/index.html#String, String.prototype[@@iterator] does not return plain code points, but the String value for the code point. - André
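André's result can be reproduced in a current engine, assuming the standard `%TypedArray%.from(source, mapFn)` signature. Without a mapping function, each one-symbol string produced by the iterator goes through ToNumber, becomes `NaN`, and is stored as `0`; mapping each string through `codePointAt(0)` gives the numbers the `Uint32Array` approach presumably intended:

```javascript
const str = '\u{1D306}\u{1D306}\u{1D306}';

// No mapFn: each yielded string ('\u{1D306}') converts to NaN, stored as 0.
const broken = Uint32Array.from(str);
console.log(broken.length); // 3
console.log(broken[1]); // 0

// With a mapFn, each yielded string is replaced by its code point first.
const codePoints = Uint32Array.from(str, (ch) => ch.codePointAt(0));
console.log(codePoints[1].toString(16)); // 1d306

// Round-tripping through fromCodePoint now gives back the symbol itself.
console.log(String.fromCodePoint(codePoints[1]) === '\u{1D306}'); // true
```

For the original goal of getting a string back, `Array.from(str)[1]` avoids the numeric detour entirely.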
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On 18 Oct 2013, at 17:51, Joshua Bell jsb...@google.com wrote: Given that you can only use the proposed String.prototype.at() properly for indexes > 0 if you know the index of a non-BMP character or lead surrogate by some other means, or if you test the return value for a trailing surrogate, is it really an advantage over using codePointAt / fromCodePoint? The name at is so tempting that I'm imagining naive scripts of the form for (i = 0; i < s.length; ++i) { r += s.at(i); } which will work fine until they get a non-BMP input, at which point they're suddenly duplicating the trailing surrogates. Pushing people towards for-of iteration and even Allen's Array.from('𝌆𝌆𝌆')[1] seems safer; users who need more subtle things have codePointAt / fromCodePoint available and hopefully the knowledge to use them. Just because new features can be used incorrectly doesn’t mean the feature isn’t useful. `for…of` on strings and `String.prototype.at` are two very different things for two very different use cases. It’s a matter of using the right tool for the job, IMHO. In your example (iterating over all code points in a string), `for…of` should be used. `String.prototype.codePointAt` or `String.prototype.at` come in handy in case you only need to get the first code point or symbol in a string, for example.
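A short sketch of the "first code point or symbol" use case, using only the built-in `codePointAt`, `String.fromCodePoint`, and the string iterator; the sample string is illustrative. `charAt(0)` is included for contrast, since it returns only the lead surrogate:

```javascript
const s = '\u{1D306}abc';

// charAt works on code units, so it returns a lone lead surrogate here.
console.log(s.charAt(0) === '\uD834'); // true

// codePointAt(0) reads the full code point, even though it spans two code units.
const firstCodePoint = s.codePointAt(0);
console.log(firstCodePoint.toString(16)); // 1d306

// Turn the code point back into a one-symbol string.
const firstSymbol = String.fromCodePoint(firstCodePoint);
console.log(firstSymbol.length); // 2 (two UTF-16 code units)

// The string iterator gives the same result; destructuring takes only its first value.
const [firstViaIterator] = s;
console.log(firstViaIterator === firstSymbol); // true
```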
Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)
On 19 Oct 2013, at 01:12, Mathias Bynens math...@qiwi.be wrote: `String.prototype.codePointAt` or `String.prototype.at` come in handy in case you only need to get the first code point or symbol in a string, for example. Are they useful for anything else, though? For example, if I wanted to get the second symbol in a string, how would I do that?
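To make the question concrete: with only `codePointAt` and `fromCodePoint`, reaching the second (or n-th) symbol means walking from the start and skipping two code units for every astral code point. The `nthSymbol` helper below is a hypothetical illustration, not a proposed API. It is still a linear scan, like `Array.from(str)[n]`, but it stops at symbol `n` and builds no intermediate array:

```javascript
// Hypothetical helper: return the symbol at symbol position `n` (0-based),
// or undefined if the string has fewer than n + 1 symbols.
function nthSymbol(str, n) {
  let index = 0; // code-unit offset of the current symbol
  for (let count = 0; index < str.length; count++) {
    const cp = str.codePointAt(index);
    if (count === n) return String.fromCodePoint(cp);
    // Astral code points occupy two code units; everything else (including
    // a lone surrogate) occupies one.
    index += cp > 0xffff ? 2 : 1;
  }
  return undefined;
}

const sample = '\u{1D306}a\u{1D306}';
console.log(nthSymbol(sample, 1)); // a
console.log(nthSymbol(sample, 2) === '\u{1D306}'); // true
console.log(nthSymbol(sample, 3)); // undefined
console.log(nthSymbol(sample, 2) === Array.from(sample)[2]); // true
```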