`String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

Note that `Array.from(str)` and `str[Symbol.iterator]` overlap
significantly.  In particular, it's somewhat awkward to iterate over
code points using `String#symbolAt`; it's much easier to use
`substr()` and then use the StringIterator.
  --scott
ps. I see that Domenic has said something similar.

On Thu, Feb 13, 2014 at 11:34 PM, Mathias Bynens math...@qiwi.be wrote:
 Allen mentioned that `String#at` might not make it to ES6 because nobody in 
 TC39 is championing it. I’ve now asked Rick if he would be the champion for 
 this, and he agreed. (Thanks again!)

 Looking over the ‘TC39 progress’ document at 
 https://docs.google.com/a/chromium.org/document/d/1QbEE0BsO4lvl7NFTn5WXWeiEIBfaVUF7Dk0hpPpPDzU,
  it seems most of the work is already taken care of: the use case was 
 discussed in this thread, the proposal has a complete spec text, and there’s 
 an example implementation/polyfill with unit tests. See http://mths.be/at.

 Is there anything else I can do to help get this included as a 
 non-TC39-member?
 ___
 es-discuss mailing list
 es-discuss@mozilla.org
 https://mail.mozilla.org/listinfo/es-discuss

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2014-02-14 Thread Mathias Bynens

On 14 Feb 2014, at 11:11, Domenic Denicola dome...@domenicdenicola.com wrote:

 This was the method that was only useful if you pass `0` to it?

I’ll just avoid the infinite loop here by pointing to earlier posts in this 
thread where this was discussed before: 
http://esdiscuss.org/topic/string-prototype-symbolat-improved-string-prototype-charat#content-34
 and 
http://esdiscuss.org/topic/string-prototype-symbolat-improved-string-prototype-charat#content-40.

This method is just as useful as `String.prototype.codePointAt`. If that method 
is included, so should `String.prototype.at`. If `String.prototype.at` is found 
not to be useful, `String.prototype.codePointAt` should be removed too.

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2014-02-14 Thread Mathias Bynens

On 14 Feb 2014, at 11:14, C. Scott Ananian ecmascr...@cscott.net wrote:

 Note that `Array.from(str)` and `str[Symbol.iterator]` overlap
 significantly.  In particular, it's somewhat awkward to iterate over
 code points using `String#symbolAt`; it's much easier to use
 `substr()` and then use the StringIterator.

`String#at` is not meant for iterating over code points – that’s what the 
`StringIterator` is for.

`String#at` is exactly like `String#codePointAt` except it returns strings 
(containing the symbol) instead of numbers (representing the code point value). 
It can be used to get the symbol at a given code unit position in a string 
(similar to how `String#codePointAt` can be used to get the code point at a 
given code unit position in a string).
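
The relationship described above can be sketched in a few lines. This is only an illustration, not the proposed spec text: `symbolAt` is a hypothetical standalone helper, and returning `''` for an out-of-range position is an assumption made here to mirror `charAt`.

```js
// Hypothetical helper (not the proposal itself): returns the symbol that
// starts at code unit index `pos`, or '' when `pos` is out of range.
function symbolAt(str, pos) {
  const codePoint = str.codePointAt(pos); // undefined when out of range
  return codePoint === undefined ? '' : String.fromCodePoint(codePoint);
}

const sample = 'a\u{1D306}b'; // 'a', U+1D306 (two code units), 'b'
console.log(symbolAt(sample, 1) === '\u{1D306}'); // true: whole astral symbol
console.log(symbolAt(sample, 2) === '\uDF06');    // true: index falls inside the pair
```

The second call shows the caveat raised elsewhere in this thread: the index counts code units, so a position in the middle of a surrogate pair yields a lone trailing surrogate.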

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

Yes, I know what `String#at` is supposed to do.

I was pointing out that `String#at` makes it easy to do the wrong
thing.  If you do `Array.from(str)` then you suddenly have a complete
random-access data structure where you can find out the number of code
points in the String, iterate it in reverse from the end to the start,
slice it, find the midpoint, etc.  `Array.from` looks like an O(n)
operation, and it is -- so it encourages developers to cache the value
and reuse it.

That said, I can see where a lexer might want to use `String#at`,
being careful to do the correct index bump based on `result.length`.
However, the fastest JS lexers don't create String objects, they
operate directly on the code point (see
http://marijnhaverbeke.nl/acorn/#section-58).  So I'm -0, mostly
because the name isn't great.  But I have exactly zero say in the
matter anyway.  So I'll shut up now.
 --scott
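
A minimal sketch of the `Array.from(str)` approach described above, assuming an ES6 environment; the variable names are illustrative only.

```js
const str = 'a\u{1D306}b\u{1D306}c'; // 7 code units, 5 code points

// One O(n) pass; the result can be cached and reused.
const symbols = Array.from(str);

const count = symbols.length;                            // number of code points
const reversed = symbols.slice().reverse().join('');     // end-to-start order
const middle = symbols[Math.floor(symbols.length / 2)];  // midpoint symbol
const firstTwo = symbols.slice(0, 2).join('');           // code-point-aware slice

console.log(str.length, count); // 7 5
console.log(middle);            // b
```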

RE: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2014-02-14 Thread Domenic Denicola

I think Mathias's point, that it is exactly as useful or useless as 
`codePointAt`, is a reasonable one. However,

 This method is just as useful as `String.prototype.codePointAt`. If that 
 method is included, so should `String.prototype.at`. If `String.prototype.at` 
 is found not to be useful, `String.prototype.codePointAt` should be removed 
 too.

This does not follow. The choice is not between adding two useless methods and 
adding zero. There is no reason to exclude the possibility of adding only one 
useless method.

But anyway, as some people seem to think that both methods are in fact 
useful---including Rick, who has agreed to champion---I agree with Scott that 
after having said our piece it's time to exit the thread.


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2014-02-14 Thread Rick Waldron

On Fri, Feb 14, 2014 at 1:34 AM, Mathias Bynens math...@qiwi.be wrote:

 Allen mentioned that `String#at` might not make it to ES6 because nobody
 in TC39 is championing it. I've now asked Rick if he would be the champion
 for this, and he agreed. (Thanks again!)


Published to wiki here:
http://wiki.ecmascript.org/doku.php?id=strawman:string_at

Rick

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2014-02-14 Thread Allen Wirfs-Brock

On Feb 14, 2014, at 1:34 AM, Mathias Bynens wrote:

 Allen mentioned that `String#at` might not make it to ES6 because nobody in 
 TC39 is championing it. I’ve now asked Rick if he would be the champion for 
 this, and he agreed. (Thanks again!)
 
 Looking over the ‘TC39 progress’ document at 
 https://docs.google.com/a/chromium.org/document/d/1QbEE0BsO4lvl7NFTn5WXWeiEIBfaVUF7Dk0hpPpPDzU,
  it seems most of the work is already taken care of: the use case was 
 discussed in this thread, the proposal has a complete spec text, and there’s 
 an example implementation/polyfill with unit tests. See http://mths.be/at.
 
 Is there anything else I can do to help get this included as a 
 non-TC39-member?
 

But just to be even clearer, the new feature gate for ES6 is officially closed.

It's a really high bar to get over that closed gate.  Unless the exclusion of a 
feature was a mistake, the feature fixes a bug, or it is somehow essential to 
supporting something that is already in ES6, I don't think we should be talking 
about adding it to ES6.

I don't think String.prototype.at fits any of those criteria.  We've talked 
about it several times, including in the context of Norbert's original ES6 full 
unicode support proposal, and never achieved consensus on including it.  
Personally, I think it should be there but it's time to start talking about it 
for ES7 not ES6.

Allen 



Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2014-02-14 Thread Rick Waldron

On Fri, Feb 14, 2014 at 10:59 AM, Allen Wirfs-Brock
al...@wirfs-brock.com wrote:

On Feb 14, 2014, at 1:34 AM, Mathias Bynens wrote:

Allen mentioned that `String#at` might not make it to ES6 because nobody
in TC39 is championing it. I've now asked Rick if he would be the champion
for this, and he agreed. (Thanks again!)

Looking over the 'TC39 progress' document at
https://docs.google.com/a/chromium.org/document/d/1QbEE0BsO4lvl7NFTn5WXWeiEIBfaVUF7Dk0hpPpPDzU,
it seems most of the work is already taken care of: the use case was
discussed in this thread, the proposal has a complete spec text, and
there's an example implementation/polyfill with unit tests. See
http://mths.be/at.

Is there anything else I can do to help get this included as a
non-TC39-member?

But just to be even clearer, the new feature gate for ES6 is officially
closed.

It's a really high bar to get over that closed gate. Unless the exclusion
of a feature was a mistake, the feature fixes a bug, or it is somehow
essential to supporting something that is already in ES6, I don't think we
should be talking about adding it to ES6.

I don't think String.prototype.at fits any of those criteria. We've
talked about it several times, including in the context of Norbert's
original ES6 full unicode support proposal, and never achieved consensus on
including it. Personally, I think it should be there but it's time to
start talking about it for ES7 not ES6.

Yes, I absolutely agree, apologies as I realize that was not addressed in
my previous message.

Rick

Allen


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

I'm excited to start working on es7-shim once we get to that point!
(String.prototype.at has a particularly simple shim, thankfully...)
  --scott

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2014-02-14 Thread Rick Waldron

On Fri, Feb 14, 2014 at 12:23 PM, C. Scott Ananian ecmascr...@cscott.net wrote:

 I'm excited to start working on es7-shim once we get to that point!
 (String.prototype.at has a particularly simple shim, thankfully...)


Have you seen: https://github.com/mathiasbynens/String.prototype.at ?


Rick

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

yes, of course.  es6-shim is a large-ish collection of such.

However, it would be much better to use an implementation of
`String#at` which used substr and thus avoided creating and appending
a new string object.
 --scott
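
One way the substring-based idea above might look, as a sketch only: `symbolAtSlice` is a hypothetical name, `slice` is swapped in here for `substr`, and `pos` is assumed to be an integer. Instead of decoding a code point and building a new string from it, the function decides whether the symbol spans one or two code units and takes that substring directly.

```js
// Hypothetical sketch: return the symbol starting at code unit index `pos`
// by slicing 1 or 2 code units out of the original string.
function symbolAtSlice(str, pos) {
  const size = str.length;
  if (!(pos >= 0 && pos < size)) return ''; // out of range (assumed behavior)
  const first = str.charCodeAt(pos);
  // A lead surrogate followed by a trail surrogate forms one astral symbol.
  if (first >= 0xD800 && first <= 0xDBFF && pos + 1 < size) {
    const second = str.charCodeAt(pos + 1);
    if (second >= 0xDC00 && second <= 0xDFFF) {
      return str.slice(pos, pos + 2);
    }
  }
  return str.charAt(pos); // BMP symbol or lone surrogate
}

console.log(symbolAtSlice('a\u{1D306}b', 1) === '\u{1D306}'); // true
```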

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-20 Thread Mathias Bynens

On 19 Oct 2013, at 12:54, Domenic Denicola dome...@domenicdenicola.com wrote:

 My proposed cowpaths:
 
 ```js
 Object.mixin(String.prototype, {
   realCharacterAt(i) {
     let index = 0;
     for (var c of this) {
       if (index++ === i) {
         return c;
       }
     }
   },
   get realLength() {
     let counter = 0;
     for (var c of this) {
       ++counter;
     }
     return counter;
   }
 });
 ```

Good stuff!

To account for [lookalike symbols due to combining marks] [1], just add a call 
to `String.prototype.normalize`:

Object.mixin(String.prototype, {
  get realLength() {
    let counter = 0;
    for (var c of this.normalize('NFC')) {
      ++counter;
    }
    return counter;
  }
});

assert('ma\xF1ana'.realLength == 'man\u0303ana'.realLength);

[1]: http://mathiasbynens.be/notes/javascript-unicode#accounting-for-lookalikes
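
The same normalization idea can be sketched without touching `String.prototype`. `countSymbols` is a hypothetical helper name. Note that NFC only helps where a precomposed form exists, so this is not a general grapheme counter.

```js
// Hypothetical helper: count code points after NFC normalization, so that
// a base letter plus combining mark counts like its precomposed lookalike.
function countSymbols(str) {
  return Array.from(str.normalize('NFC')).length;
}

console.log(countSymbols('ma\xF1ana'));     // 6
console.log(countSymbols('man\u0303ana'));  // 6 (7 code points before NFC)
```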


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-19 Thread Andrea Giammarchi

So it's a for/of with a break when it finds a code point? If that's the
only use case, I'd like to see an example of how convenient it is. I am
just wondering, not saying it's not useful (trying to understand
when/where/why I'd want to use .at()).

Thanks


On Fri, Oct 18, 2013 at 10:12 PM, Mathias Bynens math...@qiwi.be wrote:

 On 18 Oct 2013, at 17:51, Joshua Bell jsb...@google.com wrote:

  Given that you can only use the proposed String.prototype.at() properly
 for indexes > 0 if you know the index of a non-BMP character or lead
 surrogate by some other means, or if you will test the return value for a
 trailing surrogate, is it really an advantage over using codePointAt /
 fromCodePoint?
 
  The name "at" is so tempting I'm imagining naive scripts of the form for
 (i = 0; i < s.length; ++i) { r += s.at(i); } which will work fine until
 they get a non-BMP input at which point they're suddenly duplicating the
 trailing surrogates.
 
  Pushing people towards for-of iteration and even Allen's
 Array.from('𝌆𝌆𝌆')[1] seems safer; users who need more subtle things have
 codePointAt / fromCodePoint available and hopefully the knowledge to use
 them.

 Just because new features can be used incorrectly doesn’t mean the feature
 isn’t useful. `for…of` on strings and `String.prototype.at` are two very
 different things for two very different use cases. It’s a matter of using
 the right tool for the job, IMHO.

 In your example (iterating over all code points in a string), `for…of`
 should be used.

 `String.prototype.codePointAt` or `String.prototype.at` come in handy in
 case you only need to get the first code point or symbol in a string, for
 example.


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-19 Thread Allen Wirfs-Brock


On Oct 18, 2013, at 10:53 PM, Domenic Denicola wrote:

 On 19 Oct 2013, at 01:12, Mathias Bynens math...@qiwi.be wrote:
 `String.prototype.codePointAt` or `String.prototype.at` come in handy in 
 case you only need to get the first code point or symbol in a string, for 
 example.
 
 Are they useful for anything else, though? For example, if I wanted to get 
 the second symbol in a string, how would I do that?

We discussed the utility of 'codePointAt' in the context of Norbert's full 
Unicode support proposal.  At that time we concluded that it was something we 
needed.  I don't see any new evidence  that suggests that we need to reopen 
that decision at this point in the process.

The utility of a hypothetical 'at' method is presumably exactly that of 
'codePointAt'. 

   str.at(p)
would just be a convenience  for expressing
   String.fromCodePoint(str.codePointAt(p))

So the real question is probably, how common is that  use case.

It's relatively easy to do a for loop over the characters of a string 
using 'at'. Something like:

let c = '';
for (let p=0; p<str.length; p+=c.length) {
   c = str.at(p);
   ...
}

although, a for-of would be better in most cases:
   for (let c of str)

The use case that we don't support well is any sort of backwards iteration of 
the characters of a string. We don't currently have an iterator specified to do 
it, nor do we have a one stop way to test whether we are looking at the trailing 
surrogate of a surrogate pair.
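
For illustration, backward iteration can be written in user code along these lines. This is a sketch, not anything specified: `symbolsBackward` is a hypothetical generator that inspects each trailing code unit and, when it is the second half of a surrogate pair, steps back two code units instead of one.

```js
// Hypothetical generator: yields the symbols of `str` from end to start.
function* symbolsBackward(str) {
  let end = str.length;
  while (end > 0) {
    let start = end - 1;
    const unit = str.charCodeAt(start);
    // Trail surrogate preceded by a lead surrogate: take both code units.
    if (unit >= 0xDC00 && unit <= 0xDFFF && end >= 2) {
      const prev = str.charCodeAt(end - 2);
      if (prev >= 0xD800 && prev <= 0xDBFF) start = end - 2;
    }
    yield str.slice(start, end);
    end = start;
  }
}

console.log([...symbolsBackward('a\u{1D306}b')].length); // 3
```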

Allen



Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-19 Thread Allen Wirfs-Brock


On Oct 18, 2013, at 4:22 PM, André Bargull wrote:

 On Oct 18, 2013, at 4:01 PM, Allen Wirfs-Brock wrote:
 
  
  On Oct 18, 2013, at 1:29 PM, Allen Wirfs-Brock wrote:
  
  Array.from('𝌆𝌆𝌆')[1]
  
  maybe even better:
  
  Uint32Array.from('𝌆𝌆𝌆')[1]
 
 err...maybe not if you want a string value:
 
 String.fromCodePoint(Uint32Array.from('𝌆𝌆𝌆')[1])
 
 That does not seem to be too useful:
 
 js> String.fromCodePoint(Uint32Array.from("\u{1d306}\u{1d306}\u{1d306}")[1])
 "\u0000"

right, it would need to be

String.fromCodePoint(Uint32Array.from('𝌆𝌆𝌆', s=>s.codePointAt(0))[1])
 
 
 According to 
 http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/index.html#String,
  String.prototype[@@iterator] does not return plain code points, but the 
 String value for the code point.

yes, that's correct and how I have it spec'ed in rev20
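
A short sketch of the difference discussed above, assuming an ES6 environment with typed arrays: the string iterator yields one-symbol strings, so a numeric typed array only receives code points when a mapping function converts them.

```js
const tetragrams = '\u{1D306}\u{1D306}\u{1D306}';

// The iterator yields strings, one per code point.
const viaIterator = [...tetragrams];

// Without a map function each yielded string is coerced to a number
// (NaN, stored as 0), which is why the unmapped version is not useful.
const withoutMap = Uint32Array.from(tetragrams);

// With a map function the code point values are stored.
const withMap = Uint32Array.from(tetragrams, s => s.codePointAt(0));

console.log(typeof viaIterator[1]);       // string
console.log(withoutMap[1]);               // 0
console.log(withMap[1].toString(16));     // 1d306
```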

Allen

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-19 Thread Bjoern Hoehrmann

* Allen Wirfs-Brock wrote:
The utility of a hypothetical 'at' method is presumably exactly that of 
'codePointAt'. 

   str.at(p)
would just be a convenience  for expressing
   String.fromCodePoint(str.codePointAt(p))

So the real question is probably, how common is that  use case.

Certainly not common enough to warrant a two-character method on the
native string type. Odds are people will use it incorrectly in an
attempt to make their code look concise, not understanding that it'll
retrieve a substring of .length 1 or 2, possibly consisting of a lone
surrogate, based on a 16 bit index that might fall in the middle of a
character; the problematic cases are fairly rare, so it's hard to
notice improper use of `.at` in automated testing or in code review.
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-19 Thread Mathias Bynens

On 19 Oct 2013, at 12:15, Bjoern Hoehrmann derhoe...@gmx.net wrote:

 Certainly not common enough to warrant a two-character method on the
 native string type. Odds are people will use it incorrectly in an
 attempt to make their code look concise […]

Are you saying that changing the name to something that is longer than `at` 
would solve this problem?

 […] not understanding that it'll retrieve a substring of .length 1 or 2,
 possibly consisting of a lone surrogate, based on a 16 bit index that
 might fall in the middle of a character; the problematic cases are
 fairly rare, so it's hard to notice improper use of `.at` in automated
 testing or in code review.

People are using `String.prototype.charAt()` incorrectly too, expecting it to 
return whole symbols instead of surrogate halves wherever possible. How would 
_not_ introducing a method that avoids this problem help?

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-19 Thread Mathias Bynens

On 19 Oct 2013, at 00:53, Domenic Denicola dome...@domenicdenicola.com wrote:

 On 19 Oct 2013, at 01:12, Mathias Bynens math...@qiwi.be wrote:
 `String.prototype.codePointAt` or `String.prototype.at` come in handy in 
 case you only need to get the first code point or symbol in a string, for 
 example.
 
 Are they useful for anything else, though? For example, if I wanted to get 
 the second symbol in a string, how would I do that?

Yeah, that’s the problem with these methods. Additional user code is required 
to handle non-zero `position` arguments, unless you’re sure the `position` is 
actually the start of a code point (and not in the middle of a surrogate pair). 
I guess there are situations where that’s a certainty, for example when you’re 
dealing with a string in which the user selected some text.

This brings us back to the earlier discussion of whether something like 
`String.prototype.codePoints` should be added: 
http://esdiscuss.org/topic/how-to-count-the-number-of-symbols-in-a-string It 
could be a getter or a generator… Or does `for…of` iteration handle this use 
case adequately?

RE: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-19 Thread Domenic Denicola

From: Mathias Bynens [mailto:math...@qiwi.be]


 This brings us back to the earlier discussion of whether something like 
 `String.prototype.codePoints` should be added: 
 http://esdiscuss.org/topic/how-to-count-the-number-of-symbols-in-a-string It 
 could be a getter or a generator… Or does `for…of` iteration handle this use 
 case adequately?

It sounds like you are proposing a second name for 
`String.prototype[Symbol.iterator]`, which does not sound very useful.

A property for the string's real length does seem somewhat useful, as does a 
method that does random-access on real characters. Certainly more useful than 
the proposed symbolAt/at. But I suppose we can pave whatever cowpaths arise.

My proposed cowpaths:

```js
Object.mixin(String.prototype, {
  realCharacterAt(i) {
    let index = 0;
    for (var c of this) {
      if (index++ === i) {
        return c;
      }
    }
  },
  get realLength() {
    let counter = 0;
    for (var c of this) {
      ++counter;
    }
    return counter;
  }
});
```

This would allow you to e.g. find the character in the real middle of a 
string with code like

```js
var middleIndex = Math.floor(theString.realLength / 2);
var middleRealCharacter = theString.realCharacterAt(middleIndex);
```


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-19 Thread Andrea Giammarchi

AFAIK that's also what Allen said he didn't want to implement in core: an
expensive operation on each invocation, due to a stateless loop over
arbitrary indexes.

Although, strings are immutable in JS so I'd implement that logic creating
a snapshot once and use that as if it was an Array ... something like the
following:

```javascript

!function(dict){

  function getOrCreate(str) {
    if (!(str in dict)) {
      dict[str] = {
        i: 0,
        l: 0,
        v: (Array.from || function(){
          // miserable callback
          return str.split('')
        })(str)
        // or the for/of loop
      };
    }
    // times it's used
    dict[str].i++;
    return dict[str].v;
  }

  setInterval(function () {
    var key, value;
    for(key in dict) {
      value = dict[key];
      value.l = value.i - value.l;
      // used only once or never used again
      if (value.l < 2) {
        // free all the RAM
        delete dict[key];
      }
    }
  }, 5000); // 5 seconds should be enough ?
            // incremental works better with
            // slower timeout though
            // 500 might be good too

  Object.defineProperties(
    String.prototype,
    {
      at: {
        configurable: true,
        writable: true,
        value: function at(i) {
          return getOrCreate(this)[i];
        }
      },
      // or any meaningful name
      size: {
        configurable: true,
        get: function () {
          return getOrCreate(this).length;
        }
      }
    }
  );

}(Object.create(null));


// @example
var str = 'abc';
alert([
  str.size, // 3
  str.at(1) // b
]);


```

Regards





Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-19 Thread Andrea Giammarchi

A more readable example, with some typos fixed, is on GitHub:
https://gist.github.com/WebReflection/7059536

license wtfpl v2 http://www.wtfpl.net/txt/copying/

Cheers



Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-19 Thread Bjoern Hoehrmann

* Mathias Bynens wrote:
On 19 Oct 2013, at 12:15, Bjoern Hoehrmann derhoe...@gmx.net wrote:

 Certainly not common enough to warrant a two-character method on the
 native string type. Odds are people will use it incorrectly in an
 attempt to make their code look concise […]

Are you saying that changing the name to something that is longer than 
`at` would solve this problem?

If it was `.getOneOrTwoCodepointLongSubstringAtUcs2CodeUnitIndex(...)`
I am sure people would be reluctant using it because it's unreasonably
long compared to `String.fromCodePoint(str.codePointAt(p))` and harder
to understand than the combination of those two primitives.

 […] not understanding that it'll retrieve a substring of .length 1 or 2,
 possibly consisting of a lone surrogate, based on a 16 bit index that
 might fall in the middle of a character; the problematic cases are
 fairly rare, so it's hard to notice improper use of `.at` in automated
 testing or in code review.

People are using `String.prototype.charAt()` incorrectly too, expecting
it to return whole symbols instead of surrogate halves wherever possible.
How would _not_ introducing a method that avoids this problem help?

Right now people do not have much of a choice other than writing code
that does not do the right thing when faced with malformed strings or
non-BMP characters, it's unreasonable to call a method like `substr`
and then manually smooth it up around the edges and perhaps scan the
interior for lone surrogates to ensure that at least your code doesn't
do the wrong thing. That gives you well-known bad code, which is a
good thing to have, better than more complicated code that might have
unknown bugs. Allen's loop `for (let p=0; p<str.length; p+=c.length)`
for instance is just waiting for someone to improve or replace it with
code that increments by `1` instead of `.length` because that's simpler.

The methods `fromCodePoint` and `codePointAt` can be used to get ugly
constants out of code that tries to do the right thing, and they will
offer some insight into how developers might go from UCS-only code to
something more proper, but for the moment duplicating all the UCS-based
methods strikes me as premature, especially when giving them seductive
names. How would a somewhat-surrogate-aware `substring` method work and
what would it be called, for instance? If it is omitted, we would be
back to square one, someone in need of substring functionality has to
jump through overly complicated hoops to make it work correctly and
ends up mixing surrogate-pair-aware with -unaware code.
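
To make the increment concern above concrete, here is a small sketch comparing the two step sizes. `collectSymbols` is a hypothetical helper, and `String.fromCodePoint(str.codePointAt(p))` stands in for the proposed `at`.

```js
// Hypothetical helper: walk `str` by code unit index, advancing by whatever
// `step` returns for the symbol just read.
function collectSymbols(str, step) {
  const out = [];
  let c = '';
  for (let p = 0; p < str.length; p += step(c)) {
    c = String.fromCodePoint(str.codePointAt(p));
    out.push(c);
  }
  return out;
}

const input = 'a\u{1D306}b';
const correct = collectSymbols(input, c => c.length); // advance by symbol length
const buggy = collectSymbols(input, () => 1);         // the "simplified" version

console.log(correct.length); // 3
console.log(buggy.length);   // 4: the trailing surrogate shows up as an extra entry
```

With BMP-only input both versions give identical results, which is why the mistake is hard to catch in testing.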
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-19 Thread Brendan Eich


Allen Wirfs-Brock wrote:

The use case that we don't support well is any sort of backwards iteration of 
the characters of a string. We don't currently have an iterator specified to do 
it, nor do we have a one stop way to test whether we are looking at the trailing 
surrogate of a surrogate pair.


What do you mean by one stop? O(1)? We aren't going to mandate 
implementations make such tests (or backward iteration) that cheap.


Is there yet a real world (from the field, not a testcase) use-case for 
backward iteration?
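
For reference, a minimal sketch of what backward iteration over symbols involves with code-unit indexing. `isTrailOfPair` and `symbolsBackward` are hypothetical helper names, and the sketch treats lone surrogates as symbols of their own.

```javascript
// True when `index` points at the trail half of a well-formed
// surrogate pair (hypothetical helper).
function isTrailOfPair(str, index) {
  if (index <= 0 || index >= str.length) return false;
  const unit = str.charCodeAt(index);
  const prev = str.charCodeAt(index - 1);
  return unit >= 0xDC00 && unit <= 0xDFFF && prev >= 0xD800 && prev <= 0xDBFF;
}

// Collect the symbols of `str` from last to first.
function symbolsBackward(str) {
  const out = [];
  let i = str.length - 1;
  while (i >= 0) {
    // Step back one extra unit when we landed on the trail half of a pair.
    const start = isTrailOfPair(str, i) ? i - 1 : i;
    out.push(str.slice(start, i + 1));
    i = start - 1;
  }
  return out;
}

const reversedSymbols = symbolsBackward('a\u{1D306}b');
```

Each step inspects at most two code units, so the test itself is cheap; what is missing is only a built-in way to spell it.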


/be

`String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

ES6 fixes `String.fromCharCode` by introducing `String.fromCodePoint`.

Similarly, `String.prototype.charCodeAt` is fixed by 
`String.prototype.codePointAt`.

Should there be a method that is like `String.prototype.charAt` except it deals 
with astral Unicode symbols wherever possible?

 '𝌆'.charAt(0) // U+1D306
'\uD834' // the first surrogate half for U+1D306

 '𝌆'.symbolAt(0) // U+1D306
'𝌆' // U+1D306
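
The `charAt` half of this can be checked with methods that already exist. A small sketch, assuming an ES6 engine; the variable names are illustrative only.

```javascript
const tetragram = '\u{1D306}'; // U+1D306, stored as the pair \uD834 \uDF06

// charAt works on code units, so it returns only the lead surrogate.
const firstUnit = tetragram.charAt(0);

// codePointAt reads the whole pair; fromCodePoint turns it back into a string.
const firstCodePoint = tetragram.codePointAt(0);
const roundTrip = String.fromCodePoint(firstCodePoint);
```

Here `firstUnit` is `'\uD834'`, while `roundTrip` is the complete symbol again.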

Has this been discussed before? If there’s any interest I’d be happy to create 
a strawman.

Mathias  
http://mathiasbynens.be/

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

On Fri, Oct 18, 2013 at 8:46 AM, Mathias Bynens math...@qiwi.be wrote:

 ES6 fixes `String.fromCharCode` by introducing `String.fromCodePoint`.

 Similarly, `String.prototype.charCodeAt` is fixed by
 `String.prototype.codePointAt`.

 Should there be a method that is like `String.prototype.charAt` except it
 deals with astral Unicode symbols wherever possible?

  '𝌆'.charAt(0) // U+1D306
 '\uD834' // the first surrogate half for U+1D306

  '𝌆'.symbolAt(0) // U+1D306
 '𝌆' // U+1D306


I think the idea is good, but the name may be confusing with regard to
Symbols (maybe not?)

Rick

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

On 18 Oct 2013, at 09:21, Rick Waldron waldron.r...@gmail.com wrote:

 I think the idea is good, but the name may be confusing with regard to 
 Symbols (maybe not?)

Yeah, I thought about that, but couldn’t figure out a better name. “Glyph” or 
“Grapheme” wouldn’t be accurate. Any suggestions?

Anyway, if everyone agrees this is a good idea I’ll get started on fleshing out 
a proposal. We can then use this thread to bikeshed about the name.


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-18 Thread Benjamin (Inglor) Gruenbaum

I also noticed the naming similarity to ES6 `Symbol`s.

I've seen people fill `String.prototype.getFullChar` before, and similarly
things like `String.prototype.fromFullCharCode`, for dealing with surrogates.
I like `String.prototype.signAt` but I haven't seen it used before.

I'm eager to hear what Allen has to say about this given his work on
Unicode in ECMAScript, especially how it fits with this:
http://wiki.ecmascript.org/doku.php?id=strawman:support_full_unicode_in_strings&rev=1304034700


I also think that this is important enough to be there.


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

Here’s my proposal. Feedback welcome, as well as suggestions for a better name 
(if any).

## String.prototype.symbolAt(pos)

NOTE: Returns a single-element String containing the code point at element 
position `pos` in the String `value` resulting from converting the `this` 
object to a String. If there is no element at that position, the result is the 
empty String. The result is a String value, not a String object.

When the `symbolAt` method is called with one argument `pos`, the following 
steps are taken:

01. Let `O` be `CheckObjectCoercible(this value)`.
02. Let `S` be `ToString(O)`.
03. `ReturnIfAbrupt(S)`.
04. Let `position` be `ToInteger(pos)`.
05. `ReturnIfAbrupt(position)`.
06. Let `size` be the number of elements in `S`.
07. If `position < 0` or `position ≥ size`, return the empty String.
08. Let `first` be the code unit at index `position` in the String `S`.
09. Let `cuFirst` be the code unit value of the element at index `0` in the 
String `first`.
10. If `cuFirst < 0xD800` or `cuFirst > 0xDBFF` or `position + 1 = size`, then 
return `first`.
11. Let `cuSecond` be the code unit value of the element at index 
`position + 1` in the String `S`.
12. If `cuSecond < 0xDC00` or `cuSecond > 0xDFFF`, then return `first`.
13. Let `second` be the code unit at index `position + 1` in the string `S`.
14. Let `cp` be `(cuFirst – 0xD800) × 0x400 + (cuSecond – 0xDC00) + 0x10000`.
15. Return the elements of the UTF-16 Encoding (clause 6) of `cp`.

NOTE: The `symbolAt` function is intentionally generic; it does not require 
that its `this` value be a String object. Therefore it can be transferred to 
other kinds of objects for use as a method.
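
A rough sketch of the steps above as a standalone function, for experimentation only. The function name `symbolAt` and its two-argument form are assumptions made for illustration; `String(value)` and `Math.trunc` only approximate the spec operations `ToString` and `ToInteger`, and the code point is computed with the standard UTF-16 offset `0x10000`.

```javascript
// Hypothetical standalone version of the proposed algorithm.
function symbolAt(value, pos) {
  // Approximates CheckObjectCoercible.
  if (value === null || value === undefined) {
    throw new TypeError('symbolAt called on null or undefined');
  }
  const S = String(value);
  // Approximates ToInteger: NaN becomes 0, fractions truncate toward zero.
  let position = Number(pos);
  position = Number.isNaN(position) ? 0 : Math.trunc(position);
  const size = S.length;
  if (position < 0 || position >= size) return '';
  const first = S.charAt(position);
  const cuFirst = first.charCodeAt(0);
  // Not a lead surrogate, or nothing follows it: return the single unit.
  if (cuFirst < 0xD800 || cuFirst > 0xDBFF || position + 1 === size) {
    return first;
  }
  const cuSecond = S.charCodeAt(position + 1);
  // Lead surrogate without a trail surrogate: return the single unit.
  if (cuSecond < 0xDC00 || cuSecond > 0xDFFF) return first;
  const cp = (cuFirst - 0xD800) * 0x400 + (cuSecond - 0xDC00) + 0x10000;
  return String.fromCodePoint(cp);
}
```

With this sketch, `symbolAt('\u{1D306}', 0)` returns the whole symbol, while `symbolAt('\u{1D306}', 1)` returns the lone trail surrogate `'\uDF06'`.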


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

On Fri, Oct 18, 2013 at 10:47 AM, Mathias Bynens math...@qiwi.be wrote:

 On 18 Oct 2013, at 09:21, Rick Waldron waldron.r...@gmail.com wrote:

  I think the idea is good, but the name may be confusing with regard to
 Symbols (maybe not?)

 Yeah, I thought about that, but couldn’t figure out a better name. “Glyph”
 or “Grapheme” wouldn’t be accurate. Any suggestions?

 Anyway, if everyone agrees this is a good idea I’ll get started on
 fleshing out a proposal. We can then use this thread to bikeshed about the
 name.


I think it's worthwhile to write up a proposal.

And the shed should always be pink ;)

Rick

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

On Fri, Oct 18, 2013 at 11:15 AM, Mathias Bynens math...@qiwi.be wrote:

 Here’s my proposal. Feedback welcome, as well as suggestions for a better
 name (if any).

 ## String.prototype.symbolAt(pos)


Here goes...

String.prototype.elementAt?

Rick

RE: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-18 Thread Domenic Denicola

Doesn't Unicode have some name for visual representation of a code point? 
Maybe it's symbol?

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-18 Thread Anne van Kesteren

On Fri, Oct 18, 2013 at 1:46 PM, Mathias Bynens math...@qiwi.be wrote:
 Similarly, `String.prototype.charCodeAt` is fixed by 
 `String.prototype.codePointAt`.

When you phrase it like that, I see another problem with
codePointAt(). You can't just replace existing usage of charCodeAt()
with codePointAt() as that would fail for input with paired
surrogates. E.g. a simple loop over a string that prints code points
would print both the code point and the trail surrogate code point for
a surrogate pair.

The same goes for this new method. I still think that only offering a
better way to iterate strings (as planned) seems like a much safer
start into this brave new code point-based world.


-- 
http://annevankesteren.nl/

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

On 18 Oct 2013, at 10:48, Anne van Kesteren ann...@annevk.nl wrote:

 On Fri, Oct 18, 2013 at 1:46 PM, Mathias Bynens math...@qiwi.be wrote:
 Similarly, `String.prototype.charCodeAt` is fixed by 
 `String.prototype.codePointAt`.
 
 When you phrase it like that, I see another problem with
 codePointAt(). You can't just replace existing usage of charCodeAt()
 with codePointAt() as that would fail for input with paired
 surrogates. E.g. a simple loop over a string that prints code points
 would print both the code point and the trail surrogate code point for
 a surrogate pair.

I disagree. In those situations you should just iterate over the string using 
`for…of`.

`.symbolAt()` can be a useful replacement for `.charAt()` in case you only need 
to get the first symbol in the string. The same goes for `.codePointAt()` vs. 
`.charCodeAt()`.

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

On Fri, Oct 18, 2013 at 11:53 AM, Mathias Bynens math...@qiwi.be wrote:

 On 18 Oct 2013, at 10:25, Rick Waldron waldron.r...@gmail.com wrote:

  String.prototype.elementAt?

 This may be confusing too, since the spec refers to `elements` as code
 units, not code points.


Yes, slight mis-reading of your proposal—thanks for clarifying

Rick

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-18 Thread Anne van Kesteren

On Fri, Oct 18, 2013 at 4:58 PM, Mathias Bynens math...@qiwi.be wrote:
 On 18 Oct 2013, at 10:48, Anne van Kesteren ann...@annevk.nl wrote:
 When you phrase it like that, I see another problem with
 codePointAt(). You can't just replace existing usage of charCodeAt()
 with codePointAt() as that would fail for input with paired
 surrogates. E.g. a simple loop over a string that prints code points
 would print both the code point and the trail surrogate code point for
 a surrogate pair.

 I disagree. In those situations you should just iterate over the string using 
 `for…of`.

That seems to iterate over code units as far as I can tell.

  for (var x of )
print(x.charCodeAt(0))

invokes print() twice in Gecko.


-- 
http://annevankesteren.nl/

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-18 Thread André Bargull


  I disagree. In those situations you should just iterate over the string
  using `for...of`.

 That seems to iterate over code units as far as I can tell.

   for (var x of '𝌆')
     print(x.charCodeAt(0))

 invokes print() twice in Gecko.


SpiderMonkey does not implement the (yet to be) spec'ed 
String.prototype.@@iterator function, instead it simply aliases 
String.prototype[@@iterator] to Array.prototype[@@iterator]:


js> String.prototype[@@iterator] === Array.prototype[@@iterator]
true


- André

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)


On Oct 18, 2013, at 7:21 AM, Rick Waldron wrote:

 
 
 
 On Fri, Oct 18, 2013 at 8:46 AM, Mathias Bynens math...@qiwi.be wrote:
 ES6 fixes `String.fromCharCode` by introducing `String.fromCodePoint`.
 
 Similarly, `String.prototype.charCodeAt` is fixed by 
 `String.prototype.codePointAt`.
 
 Should there be a method that is like `String.prototype.charAt` except it 
 deals with astral Unicode symbols wherever possible?
 
  '𝌆'.charAt(0) // U+1D306
 '\uD834' // the first surrogate half for U+1D306
 
  '𝌆'.symbolAt(0) // U+1D306
 '𝌆' // U+1D306
 
 I think the idea is good, but the name may be confusing with regard to 
 Symbols (maybe not?)
 

Given that we have charAt, charCodeAt and codePointAt, I think the most 
appropriate name for such a method would be 'at':
 '𝌆'.at(0)

The issue when this sort of method has been discussed in the past has been what 
to do when you index at a trailing surrogate position:

'𝌆'.at(1)

do you still get '𝌆' or do you get the equivalent of 
String.fromCharCode('𝌆'[1]) ?

Allen


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)


On Oct 18, 2013, at 9:05 AM, Anne van Kesteren wrote:

 On Fri, Oct 18, 2013 at 4:58 PM, Mathias Bynens math...@qiwi.be wrote:
 On 18 Oct 2013, at 10:48, Anne van Kesteren ann...@annevk.nl wrote:
 When you phrase it like that, I see another problem with
 codePointAt(). You can't just replace existing usage of charCodeAt()
 with codePointAt() as that would fail for input with paired
 surrogates. E.g. a simple loop over a string that prints code points
 would print both the code point and the trail surrogate code point for
 a surrogate pair.
 
 I disagree. In those situations you should just iterate over the string 
 using `for…of`.
 
 That seems to iterate over code units as far as I can tell.
 
  for (var x of )
print(x.charCodeAt(0))
 
 invokes print() twice in Gecko.
 

No, that's not correct: the @@iterator method of String.prototype is supposed to 
return an iterator that iterates code points and returns single code point 
strings.

The spec. for this will be in the next draft that I release.
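
A short sketch of the intended behaviour, assuming an engine that implements the string iterator as described here (the variable names are illustrative):

```javascript
const iterSample = 'a\u{1D306}b';
const seen = [];
for (const symbol of iterSample) {
  // Each step yields one whole code point as a string.
  seen.push(symbol);
}
// `seen` holds three strings: 'a', the full tetragram, and 'b'.
// `iterSample.length` is still 4, because length counts code units.
```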

Allen

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

+1 for the simplified `at(symbolIndex)`

I would expect '𝌆'.at(1) to fail the same way 'a'.charAt(1) or
'a'.charCodeAt(1) would.

I would expect '𝌆'.at(symbolIndex) to behave as `length` does based on
unique symbol (unicode extra) so that everyone, except RAM and CPU, will
have life easier with strings.

Long story short: there's no symbol at 1, the symbol is at 0 because the
size of that unicode string is 1

That said, I am sure the discussion went through this already ^_^






Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

the size of that unicode string is 1 ... meaning the **virtual** size for
human eyes



Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

if this is true then .at(symbolIndex) should be a no-brainer?

```
var virtualLength = 0;
for (var x of ) {
  virtualLength++;
}

// equivalent of
for(var i = 0; i < virtualLength; i++) {
  '𝌆'.at(i);
}

```

Am I missing something ?


On Fri, Oct 18, 2013 at 10:03 AM, Allen Wirfs-Brock
al...@wirfs-brock.com wrote:


 On Oct 18, 2013, at 9:05 AM, Anne van Kesteren wrote:

  On Fri, Oct 18, 2013 at 4:58 PM, Mathias Bynens math...@qiwi.be wrote:
  On 18 Oct 2013, at 10:48, Anne van Kesteren ann...@annevk.nl wrote:
  When you phrase it like that, I see another problem with
  codePointAt(). You can't just replace existing usage of charCodeAt()
  with codePointAt() as that would fail for input with paired
  surrogates. E.g. a simple loop over a string that prints code points
  would print both the code point and the trail surrogate code point for
  a surrogate pair.
 
  I disagree. In those situations you should just iterate over the string
 using `for…of`.
 
  That seems to iterate over code units as far as I can tell.
 
   for (var x of )
 print(x.charCodeAt(0))
 
  invokes print() twice in Gecko.
 

 No, that's not correct: the @@iterator method of String.prototype is
 supposed to return an iterator that iterates code points and returns
 single code point strings.

 The spec. for this will be in the next draft that I release.

 Allen

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)


On Oct 18, 2013, at 10:06 AM, Andrea Giammarchi wrote:

 +1 for the simplified `at(symbolIndex)`
 
 I would expect '𝌆'.at(1) to fail same way 'a'.charAt(1) or 'a'.charCodeAt(1) 
 would.

They aren't comparable, as the 'a' examples are index-out-of-bounds errors. We 
only use code unit indices with strings, so '𝌆'[1] is valid (and so presumably 
should be '𝌆'.at(1)), with 1 having the same meaning in each case.

The most consistent way to define String.prototype.at would be:

String.prototype.at = function(pos) {
    let cp = this.codePointAt(pos);
    return cp===undefined ? undefined : String.fromCodePoint(cp)
}
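
The same definition can be tried out without touching `String.prototype`. This is a sketch under that assumption: `atCodeUnitIndex` is a hypothetical free function standing in for the method, with the string passed as the first argument.

```javascript
// Hypothetical free-function form of the definition above.
function atCodeUnitIndex(str, pos) {
  const cp = str.codePointAt(pos);
  return cp === undefined ? undefined : String.fromCodePoint(cp);
}

const pairSample = '\u{1D306}';
const atLead = atCodeUnitIndex(pairSample, 0);    // the whole symbol
const atTrail = atCodeUnitIndex(pairSample, 1);   // '\uDF06', the lone trail surrogate
const pastEnd = atCodeUnitIndex(pairSample, 2);   // undefined
```

Note that this form returns `undefined` past the end of the string, whereas `charAt` and the earlier `symbolAt` proposal text return the empty String.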
   
Allen




Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-18 Thread Jason Orendorff

On Fri, Oct 18, 2013 at 12:03 PM, Allen Wirfs-Brock
al...@wirfs-brock.com wrote:
  for (var x of )
print(x.charCodeAt(0))

 invokes print() twice in Gecko.

 No, that's not correct: the @@iterator method of String.prototype is supposed 
 to return an iterator that iterates code points and returns single code point 
 strings.

Filed: https://bugzilla.mozilla.org/show_bug.cgi?id=928508

-j

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

fair enough, that was my point about

 except for RAM and CPU, life is going to be easier for devs

so my counter-question would be: is there any way to do that in core so
that we can “𝌆𝌆𝌆”.split() it so that we can have an ArrayLike that with
[1] gives back the single “𝌆” and not the whole thing ?

Or does Mathias already have a RegExp able to split like that with
reasonable performance ?

P.S. I am in Chrome and Safari and I had no idea until I've seen that on
twitter what kind of “𝌆” we were talking about :D

On Fri, Oct 18, 2013 at 10:34 AM, Allen Wirfs-Brock
al...@wirfs-brock.com wrote:


 On Oct 18, 2013, at 10:18 AM, Andrea Giammarchi wrote:

 if this is true then .at(symbolIndex) should be a no-brainer?

 ```
 var virtualLength = 0;
 for (var x of ) {
   virtualLength++;
 }

 // equivalent of
 for(var i = 0; i  virtualLength; i++) {
   .at(i);
 }

 ```

 Am I missing something ?


 Yes, we don't want to introduce code point based direct indexing, which
 always requires scanning from the front of the string.  We already made that
 decision in the context of codePointAt, which only uses code unit indices.

 Allen
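
To make the cost argument concrete, here is a sketch of what symbol-based indexing would have to do. `atSymbolIndex` is a hypothetical name for the behaviour Andrea describes, not something proposed in this thread's spec text.

```javascript
// Hypothetical: `symbolIndex` counts whole symbols, not code units.
function atSymbolIndex(str, symbolIndex) {
  let i = 0;
  // There is no way to jump straight to the n-th symbol: every lookup
  // walks the string from the front until the index is reached.
  for (const symbol of str) {
    if (i === symbolIndex) return symbol;
    i += 1;
  }
  return '';
}

const mixed = '\u{1D306}a';
const bySymbol = atSymbolIndex(mixed, 1); // 'a'
const byCodeUnit = mixed[1];              // '\uDF06'
```

Calling such a function inside a loop over all indices would make the whole loop quadratic in the length of the string, which is the scanning cost referred to above.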









Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

On 18 Oct 2013, at 11:05, Anne van Kesteren ann...@annevk.nl wrote:

 On Fri, Oct 18, 2013 at 4:58 PM, Mathias Bynens math...@qiwi.be wrote:
 I disagree. In those situations you should just iterate over the string 
 using `for…of`.
 
 That seems to iterate over code units as far as I can tell.
 
 for (var x of )
  print(x.charCodeAt(0))
 
 invokes print() twice in Gecko.

Woah, that doesn’t seem very useful. Is that a bug, or the way it’s supposed to 
work? I thought it was supposed to only iterate over whole code points (i.e. 
only print once for each code point, not once for each surrogate half).

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)


On Oct 18, 2013, at 1:12 PM, Andrea Giammarchi wrote:

 fair enough, that was my point about 
 
  except for RAM and CPU, life is going to be easier for devs
 
 so my counter-question would be: is there any way to do that in core so that 
 we can “𝌆𝌆𝌆”.split() it so that we can have an ArrayLike that with [1] gives 
 back the single “𝌆” and not the whole thing ?

Array.from('𝌆𝌆𝌆')[1]

Allen

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

Please ignore my previous email; it has been answered already. (It was a draft 
I wrote up this morning before I lost my internet connection.)

On 18 Oct 2013, at 11:57, Allen Wirfs-Brock al...@wirfs-brock.com wrote:

 Given that we have charAt, charCodeAt and codePointAt, I think the most 
 appropriate name for such a method would be 'at':
  '𝌆'.at(0)

Love it!

 The issue when this sort of method has been discussed in the past has been 
 what to do when you index at a trailing surrogate position:
 
 '𝌆'.at(1)
 
 do you still get '𝌆' or do you get the equivalent of 
 String.fromCharCode('𝌆'[1]) ?

In my proposal it would return the equivalent of `String.fromCharCode('𝌆'[1])`. 
I think that’s the most sane behavior in that case. This also mimics the way 
`String.prototype.codePointAt` works in such a case.

Here’s a prollyfill for `String.prototype.at` based on my earlier proposal: 
https://github.com/mathiasbynens/String.prototype.at Tests: 
https://github.com/mathiasbynens/String.prototype.at/blob/master/tests/tests.js

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

On 18 Oct 2013, at 15:12, Andrea Giammarchi andrea.giammar...@gmail.com wrote:

 so my counter-question would be: is there any way to do that in core so that 
 we can “𝌆𝌆𝌆”.split() it so that we can have an ArrayLike that with [1] gives 
 back the single “𝌆” and not the whole thing ?

This brings us back to the earlier discussion of whether something like 
`String.prototype.codePoints` should be added: 
http://esdiscuss.org/topic/how-to-count-the-number-of-symbols-in-a-string I 
think it would be useful


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

If I understand Allen's answer, it looks like `Array.from(“𝌆𝌆𝌆”).length` would
do, being 3, making the operation straightforward?

Cheers



Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-18 Thread Joshua Bell

Given that you can only use the proposed String.prototype.at() properly for
indexes > 0 if you know the index of a non-BMP character or lead surrogate
by some other means, or if you will test the return value for a trailing
surrogate, is it really an advantage over using codePointAt / fromCodePoint?

The name "at" is so tempting I'm imagining naive scripts of the form `for (i
= 0; i < s.length; ++i) { r += s.at(i); }` which will work fine until they
get a non-BMP input, at which point they're suddenly duplicating the
trailing surrogates.
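
A sketch of that failure mode. `proposedAt` is a hypothetical stand-in for the proposed method, written as a free function so that it does not depend on (or clash with) anything built in; `naiveCopy` and `carefulCopy` are illustrative names.

```javascript
// Hypothetical stand-in for the proposed String.prototype.at.
function proposedAt(str, pos) {
  const cp = str.codePointAt(pos);
  return cp === undefined ? '' : String.fromCodePoint(cp);
}

// The tempting loop: index by code unit, append whatever comes back.
function naiveCopy(s) {
  let r = '';
  for (let i = 0; i < s.length; ++i) {
    r += proposedAt(s, i);
  }
  return r;
}

// The for-of form visits each symbol exactly once.
function carefulCopy(s) {
  let r = '';
  for (const symbol of s) {
    r += symbol;
  }
  return r;
}

const bmpCopy = naiveCopy('abc');             // 'abc', unchanged
const astralCopy = naiveCopy('a\u{1D306}');   // gains an extra '\uDF06'
const safeCopy = carefulCopy('a\u{1D306}');   // unchanged
```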

Pushing people towards for-of iteration and even Allen's
`Array.from('𝌆𝌆𝌆')[1]` seems safer; users who need more subtle things have
codePointAt / fromCodePoint available and hopefully the knowledge to use them.



Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)


On Oct 18, 2013, at 1:29 PM, Allen Wirfs-Brock wrote:
 
 Array.from('𝌆𝌆𝌆')[1]

maybe even better:

Uint32Array.from('𝌆𝌆𝌆')[1]

Allen

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)


On Oct 18, 2013, at 4:01 PM, Allen Wirfs-Brock wrote:

 
 On Oct 18, 2013, at 1:29 PM, Allen Wirfs-Brock wrote:
 
 Array.from('𝌆𝌆𝌆')[1]
 
 maybe even better:
 
 Uint32Array.from('𝌆𝌆𝌆')[1]

err...maybe not if you want a string value:

String.fromCodePoint(Uint32Array.from('𝌆𝌆𝌆')[1])

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-18 Thread André Bargull


On Oct 18, 2013, at 4:01 PM, Allen Wirfs-Brock wrote:

  On Oct 18, 2013, at 1:29 PM, Allen Wirfs-Brock wrote:

  Array.from('𝌆𝌆𝌆')[1]

  maybe even better:

  Uint32Array.from('𝌆𝌆𝌆')[1]

 err...maybe not if you want a string value:

 String.fromCodePoint(Uint32Array.from('𝌆𝌆𝌆')[1])


That does not seem to be too useful:

js> String.fromCodePoint(Uint32Array.from("\u{1d306}\u{1d306}\u{1d306}")[1])
"\u0000"


According to 
http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/index.html#String, 
String.prototype[@@iterator] does not return plain code points, but the 
String value for the code point.
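
A sketch of why the typed-array form yields zero, and one possible way to get numeric code points instead, assuming an ES6 engine with `Uint32Array.from` and its optional mapping function. The variable names are illustrative.

```javascript
const triple = '\u{1D306}\u{1D306}\u{1D306}';

// The string iterator yields strings. Converting such a string to a
// number gives NaN, which a Uint32Array stores as 0.
const asNumbers = Uint32Array.from(triple);

// Mapping each symbol to its code point first gives usable values.
const codePoints = Uint32Array.from(triple, (symbol) => symbol.codePointAt(0));
const secondSymbol = String.fromCodePoint(codePoints[1]);
```

With the mapping function, `codePoints[1]` is `0x1D306` and `secondSymbol` is the tetragram again; without it, every element is `0`.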



- André

Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)