Re: why is regexp /\-/u a syntax-error?

2019-09-20 Thread Mathias Bynens
Think of the `u` flag as a strict mode for regular expressions.

`/\a/u` throws because there is no reason to escape `a` as `\a`; if such an
escape sequence is present, it's likely a user error. The same goes for
`/\-/u`: `-` only has special meaning within character classes (which is why
`/[\-]/u` is still allowed), not outside of them.
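A quick way to see the rule in action. This is a sketch: `compiles` is a hypothetical helper, and the results assume an engine with ES2015+ `u`-flag semantics.

```js
// Hypothetical helper: does `pattern` compile with the given flags?
function compiles(pattern, flags) {
  try {
    new RegExp(pattern, flags);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

console.log(compiles('\\-', ''));    // true: legacy identity escape, same as /-/
console.log(compiles('\\-', 'u'));   // false: `-` needs no escaping outside a class
console.log(compiles('-', 'u'));     // true: just write the literal hyphen
console.log(compiles('[\\-]', 'u')); // true: inside a class, `\-` is a valid escape
console.log(compiles('\\a', 'u'));   // false: same reasoning as for `\-`
console.log(compiles('\\$', 'u'));   // true: `$` is a syntax character, so escaping it is meaningful
```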

On Fri, Sep 20, 2019 at 11:22 AM kai zhu  wrote:

> jslint previously warned against unescaped literal "-" in regexp.
>
> however, escaping "-" together with unicode flag "u", causes syntax error
> in chrome/firefox/edge (and jslint has since removed warning):
>
> ```javascript
> let rgx = /\-/u
> VM21:1 Uncaught SyntaxError: Invalid regular expression: /\-/: Invalid
> escape
> at :1:10
> ```
>
> just, curious on reason why above edge-case is a syntax-error?
> ___
> es-discuss mailing list
> es-discuss@mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
>


Re: Proposal: `String.prototype.codePointCount`

2019-08-08 Thread Mathias Bynens
Prior discussion from 7 years ago:
https://esdiscuss.org/topic/how-to-count-the-number-of-symbols-in-a-string

[...string].length does what you want. But it's definitely not always what
you need.
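To make the distinction concrete, a sketch (`countCodePoints` is a hypothetical helper, not a proposed API):

```js
const ideograph = '\u{20BB7}'; // '𠮷' (U+20BB7), outside the Basic Multilingual Plane

console.log(ideograph.length);      // 2: UTF-16 code units (one surrogate pair)
console.log([...ideograph].length); // 1: the string iterator yields code points

// Hypothetical helper: count code points without building an array.
function countCodePoints(str) {
  let count = 0;
  for (const _ of str) count++; // iterates by code point, not by code unit
  return count;
}

console.log(countCodePoints('\u{20BB7}\u{20BB7}')); // 2

// Why it's not always what you need: code points are not user-perceived characters.
const accented = 'e\u0301'; // 'é' spelled as e + U+0301 COMBINING ACUTE ACCENT
console.log(countCodePoints(accented)); // 2, although it renders as one symbol
```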

On Thu, Aug 8, 2019 at 4:37 AM fanerge  wrote:

> I expect to be able to add an attribute to String.prototype that returns
> the number of codePoints of the string to reflect the actual number of
> characters instead of the code unit.
>
>
> Definition of String.prototype.length
>
> This property returns the number of code units in the string. UTF-16,
> the string format used by
> JavaScript, uses a single 16-bit code unit to represent the most common
> characters, but needs to use two code units for less commonly-used
> characters, so it's possible for the value returned by length to not
> match the actual number of characters in the string.
>
> We refer to the String class in Java
>
> The String class in the Java JVM uses UTF-16 encoding.
> String.length(): The method returns the number of `char` values (UTF-16
> code units) in the string;
> String.codePointCount(): The method returns the number of code points in
> the string.
>
>
> *I want the ECMA organization to be able to add a property or method to
> String.prototype that returns the value of the codePoint of the string. For
> example: String.prototype.codePointCount can return the actual number of
> codePoints instead of code unit.*
>
> ```
>
> const str1 = ‘’;
>
> str1.length; // 4
>
> str1.codePointCount; // 4
>
> // ‘1’.codePointAt(0) // 49
>
>
> const str2 = '’;
>
> str2.length; // 8
>
> str2.codePointCount; // 4
>
> // '𠮷'.codePointAt(0); // 134071
>
>
> const str3 = ‘’;
>
> str3.length; // 8
>
> str3.codePointCount; // 4
>
> // '😯'.codePointAt(0); // 128559
>
> ```
>
> *I believe that most developers need such a method and property to get the
> number of codePoints in a string. I sincerely hope that you can accept my
> proposal*,* thanks.*
>
>
>
>


Re: Overload str.replace to take a Map?

2018-05-20 Thread Mathias Bynens
ngify4':
> value = JSON.stringify(value, null, 4);
> break;
> case 'notHtmlSafe':
> notHtmlSafe = true;
> break;
> case 'truncate':
> skip = ii + 1;
> if (value.length > list[skip]) {
> value = value.slice(0, list[skip] - 3).trimRight() +
> '...';
> }
> break;
> // default to String.prototype[arg0]()
> default:
> if (ii === skip) {
> break;
> }
>     value = value[arg0]();
> break;
> }
> });
> value = String(value);
> // default to htmlSafe
> if (!notHtmlSafe) {
> value = value
> .replace((/"/g), '')
> .replace((/&/g), '')
> .replace((/'/g), '')
> .replace((/ .replace((/>/g), '')
> .replace((/(amp;|apos;|gt;|lt;|quot;)/ig), '&$1');
> }
> return value;
> });
> };
>
> console.log(templateRender(process.argv[2], JSON.parse(process.argv[3])));
> ```
>
>
>
> kai zhu
> kaizhu...@gmail.com
>
>
>
> On 20 May 2018, at 6:32 PM, Isiah Meadows <isiahmead...@gmail.com> wrote:
>
> @Mathias
>
> My particular `escapeHTML` example *could* be written like that (and it
> *is* somewhat in the prose). But you're right that in the prose, I did
> bring up the potential for things like `str.replace({cheese: "cake", ham:
> "eggs"})`.
>
> @Kai
>
> Have you ever tried writing an HTML template system on the front end? This
> *will* almost inevitably come up, and most of my use cases for this is on
> the front end itself handling various scenarios.
>
> @Cyril
>
> And every single one of those patterns is going to need to be compiled and
> executed, and compiling and interpreting regular expressions is definitely
> not quick, especially when you can nest Kleene stars. (See:
> https://en.wikipedia.org/wiki/Regular_expression#Implementations_and_running_times)
> That's why I'm against it - we don't need to complicate this proposal with
> that mess.
>
> -
>
> Isiah Meadows
> m...@isiahmeadows.com
> www.isiahmeadows.com
>
> On Sat, May 19, 2018 at 7:04 PM, Mathias Bynens <math...@qiwi.be> wrote:
>
>> Hey Kai, you’re oversimplifying. Your solution works for a single Unicode
>> symbol (corresponding to a single code point) but falls apart as soon as
>> you need to match multiple symbols of possibly varying length, like in the
>> `escapeHtml` example.
>>
>> On Sat, May 19, 2018 at 8:43 AM, kai zhu <kaizhu...@gmail.com> wrote:
>>
>>> again, you backend-engineers are making something more complicated than
>>> needs be, when simple, throwaway glue-code will suffice.  agree with
>>> jordan, this feature is a needless cross-cut of String.prototype.replace.
>>>
>>> ```
>>> /*jslint
>>> node: true
>>> */
>>> 'use strict';
>>> var dict;
>>> dict = {
>>> '$': '^',
>>> '1': '2',
>>> '<': '',
>>> '': '',
>>> '-': '_',
>>> ']': '@'
>>> };
>>> // output: "test_^^[22@ foo>"
>>> console.log('test-$$[11] '.replace((/[\S\s]/gu), function
>>> (character) {
>>> return dict.hasOwnProperty(character)
>>> ? dict[character]
>>> : character;
>>> }));
>>> ```
>>>
>>> kai zhu
>>> kaizhu...@gmail.com
>>>
>>>
>>>
>>> On 19 May 2018, at 4:08 PM, Cyril Auburtin <cyril.aubur...@gmail.com>
>>> wrote:
>>>
>>> You can also have a
>>>
>>> ```js
>>> var replacer = replacements => {
>>>   const re = new RegExp(replacements.map(([k,_,escaped=k]) =>
>>> escaped).join('|'), 'gu');
>>>   const replaceMap = new Map(replacements);
>>>   return s => s.replace(re, w => replaceMap.get(w));
>>> }
>>> var replace = replacer([['$', '^', String.raw`\$`], ['1', '2'], ['<',
>>> ''], ['', ''], ['-', '_'], [']', '@', String.raw`\]`]]);
>>> replace('test-$$[11] ') // "test_^^[22@ foo>"
>>> ```
>>> but it's quickly messy to work with escaping
>>>
>>> On Sat, May 19, 2018 at 08:17, Isiah Meadows <isiahmead...@gmail.com>
>>> wrote:

Re: Overload str.replace to take a Map?

2018-05-19 Thread Mathias Bynens
Hey Kai, you’re oversimplifying. Your solution works for a single Unicode
symbol (corresponding to a single code point) but falls apart as soon as
you need to match multiple symbols of possibly varying length, like in the
`escapeHtml` example.
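For reference, here is a sketch of a replacer that does handle keys of varying length. `replaceFromMap` is a hypothetical name. It sorts keys longest-first so that longer keys win at the same position, and escapes regular-expression syntax characters so keys are matched literally (`-` is deliberately left unescaped, since `\-` outside a class is a SyntaxError under the `u` flag).

```js
// Hypothetical helper: replace every occurrence of each Map key with its value.
// Unlike a per-character lookup, keys may be of any length.
function replaceFromMap(str, map) {
  const alternatives = [...map.keys()]
    .filter(key => key !== '') // an empty key would match everywhere
    // Longest first, so 'ab' wins over 'a' at the same position.
    .sort((a, b) => b.length - a.length)
    // Escape syntax characters so each key is matched literally.
    .map(key => key.replace(/[\\^$.*+?()[\]{}|\/]/g, '\\$&'));
  if (alternatives.length === 0) return str;
  const pattern = new RegExp(alternatives.join('|'), 'gu');
  // A replacer function avoids `$&`-style substitution patterns in the values.
  return str.replace(pattern, match => map.get(match));
}

const menu = new Map([['cheese', 'cake'], ['ham', 'eggs']]);
console.log(replaceFromMap('ham and cheese', menu)); // 'eggs and cake'

const overlapping = new Map([['a', '1'], ['ab', '2'], ['$', '^'], ['-', '_']]);
console.log(replaceFromMap('ab-a$', overlapping)); // '2_1^'
```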

On Sat, May 19, 2018 at 8:43 AM, kai zhu  wrote:

> again, you backend-engineers are making something more complicated than
> needs be, when simple, throwaway glue-code will suffice.  agree with
> jordan, this feature is a needless cross-cut of String.prototype.replace.
>
> ```
> /*jslint
> node: true
> */
> 'use strict';
> var dict;
> dict = {
> '$': '^',
> '1': '2',
> '<': '',
> '': '',
> '-': '_',
> ']': '@'
> };
> // output: "test_^^[22@ foo>"
> console.log('test-$$[11] '.replace((/[\S\s]/gu), function
> (character) {
> return dict.hasOwnProperty(character)
> ? dict[character]
> : character;
> }));
> ```
>
> kai zhu
> kaizhu...@gmail.com
>
>
>
> On 19 May 2018, at 4:08 PM, Cyril Auburtin 
> wrote:
>
> You can also have a
>
> ```js
> var replacer = replacements => {
>   const re = new RegExp(replacements.map(([k,_,escaped=k]) =>
> escaped).join('|'), 'gu');
>   const replaceMap = new Map(replacements);
>   return s => s.replace(re, w => replaceMap.get(w));
> }
> var replace = replacer([['$', '^', String.raw`\$`], ['1', '2'], ['<',
> ''], ['', ''], ['-', '_'], [']', '@', String.raw`\]`]]);
> replace('test-$$[11] ') // "test_^^[22@ foo>"
> ```
> but it's quickly messy to work with escaping
>
> On Sat, May 19, 2018 at 08:17, Isiah Meadows
> wrote:
>
>> Here's what I'd prefer instead: overload `String.prototype.replace` to
>> take non-callable objects, as sugar for this:
>>
>> ```js
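>> // old(str, pattern, replacement) calls the original replace on `str`
>> const old = Function.call.bind(Function.call, String.prototype.replace)
>> String.prototype.replace = function (regexp, object) {
>>   // New form: str.replace({ key: value, ... }) with no second argument
>>   if (object == null && regexp != null && typeof regexp === "object" &&
>>       !(regexp instanceof RegExp)) {
>>     const re = new RegExp(
>>       Object.keys(regexp)
>>         // Escape syntax characters so each key is matched literally
>>         .map(key => old(key, /[\\^$*+?.()|[\]{}]/g, '\\$&'))
>>         .join("|"),
>>       "g" // replace every occurrence, not just the first
>>     )
>>     // Look each match up in the map that was passed as the first argument
>>     return old(this, re, m => regexp[m])
>>   } else {
>>     return old(this, regexp, object)
>>   }
>> }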
>> ```
>>
>> This would cover about 99% of my use for something like this, with
>> less runtime overhead (that of not needing to check for and
>> potentially match multiple regular expressions at runtime) and better
>> static analyzability (you only need to check it's an object literal or
>> constant frozen object, not that its argument is the result of the
>> built-in `Map` call). It's exceptionally difficult to optimize for
>> this unless you know everything's a string, but most cases where I had
>> to pass a callback that wasn't super complex looked a lot like this:
>>
>> ```js
>> // What I use:
>> function escapeHTML(str) {
>> return str.replace(/["'&<>]/g, m => {
>> switch (m) {
>> case '"': return ""
>> case "'": return ""
>> case "&": return ""
>> case "<": return ""
>> case ">": return ""
>> default: throw new TypeError("unreachable")
>> }
>> })
>> }
>>
>> // What it could be
>> function escapeHTML(str) {
>> return str.replace({
>> '"': "",
>> "'": "",
>> "&": "",
>> "<": "",
>> ">": "",
>> })
>> }
>> ```
>>
>> And yes, this enables optimizations engines couldn't easily produce
>> otherwise. In this instance, an engine could find that the object is
>> static with only single-character entries, and it could replace the
>> call to a fast-path one that relies on a cheap lookup table instead
>> (Unicode replacement would be similar, except you'd need an extra
>> layer of indirection with astrals to avoid blowing up memory when
>> generating these tables):
>>
>> ```js
>> // Original
>> function escapeHTML(str) {
>> return str.replace({
>> '"': "",
>> "'": "",
>> "&": "",
>> "<": "",
>> ">": "",
>> })
>> }
>>
>> // Not real JS, but think of it as how an engine might implement this. The
>> // implementation of the runtime function `ReplaceWithLookupTable` is omitted
>> // for brevity, but you could imagine how it could be implemented, given the
>> // pseudo-TS signature:
>> //
>> // ```ts
>> // declare function %ReplaceWithLookupTable(
>> // str: string,
>> // table: string[]
>> // ): string
>> // ```
>> function escapeHTML(str) {
>> static {
>> // A zero-initialized array with 2^16 entries (U+0000-U+FFFF), except
>> // for the object's members. This takes up to about 70K per instance,
>> // but these are *far* more often called than created.
>> const _lookup_escapeHTML = %calloc(65536)
>>
>> _lookup_escapeHTML[34] = ""
>> _lookup_escapeHTML[38] = ""
>> _lookup_escapeHTML[39] = ""
>> 

Re: add reverse() method to strings

2018-03-18 Thread Mathias Bynens
For arrays, indexing is unambiguous: `array[42]` is whatever value you put
there. As a result, it’s clear what it means to “reverse” an array.

This is not the case for strings, where indexing is inherently ambiguous.
Should `string[42]` index by UCS-2/UTF-16 code unit? By Unicode code point?
By grapheme cluster?
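A small illustration of why the choice matters. This is a sketch: `reverseByCodeUnit` and `reverseByCodePoint` are hypothetical helpers. Reversing by code unit corrupts surrogate pairs, and reversing by code point still detaches combining marks from their base characters; handling user-perceived characters correctly requires grapheme cluster segmentation (UAX #29), which neither helper does.

```js
// Reverses UTF-16 code units, as `str.split('').reverse().join('')` does.
const reverseByCodeUnit = str => str.split('').reverse().join('');
// Reverses code points: the string iterator keeps surrogate pairs together.
const reverseByCodePoint = str => [...str].reverse().join('');

const withAstral = 'a\u{1D306}b'; // 'a𝌆b'; U+1D306 is a surrogate pair in UTF-16
console.log(reverseByCodeUnit(withAstral) === 'b\uDF06\uD834a'); // true: pair swapped, now invalid
console.log(reverseByCodePoint(withAstral) === 'b\u{1D306}a');   // true

const manana = 'man\u0303ana'; // 'mañana' with U+0303 COMBINING TILDE
console.log(reverseByCodePoint(manana) === 'ana\u0303nam'); // true: 'anãnam', tilde moved to an 'a'
```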



On Mon, Mar 19, 2018 at 6:28 AM, Felipe Nascimento de Moura <
felipenmo...@gmail.com> wrote:

> I have had to use that one, parsing texts and I remember I had to reverse
> strings that represented tokens...but that was very specific.
>
> What I would like to see in strings would be something like "firstCase"
> for transforming "felipe" into "Felipe" for example.
> I always have to use something like `str[0].toUpperCase() + str.slice(1)`.
>
> The only reason I would defend the "reverse" method in strings is because
> it makes sense.
> I think JavaScript is very intuitive, and, as Arrays do have the "reverse"
> method, that simply makes sense to have it in strings as well.
>
> Cheers.
>
>
> [ ]s
>
> *--*
>
> *Felipe N. Moura*
> Web Developer, Google Developer Expert
> <https://developers.google.com/experts/people/felipe-moura>, Founder of
> BrazilJS <https://braziljs.org/> and Nasc <http://nasc.io/>.
>
> Website:  http://felipenmoura.com / http://nasc.io/
> Twitter:@felipenmoura <http://twitter.com/felipenmoura>
> Facebook: http://fb.com/felipenmoura
> LinkedIn: http://goo.gl/qGmq
> -
> *Changing  the  world*  is the least I expect from  myself!
>
> On Sun, Mar 18, 2018 at 12:00 PM, Mark Davis ☕️ <m...@macchiato.com>
> wrote:
>
>> .reverse would only be reasonable for a subset of characters supported by
>> Unicode. Its primary cited use case is for a particular educational
>> example, when there are probably thousands of similar examples of educational
>> snippets that would be rarely used in a production environment. Given
>> that, it would be far better for those people who really need it to just
>> provide that to their students as a provided function for the sake of that
>> example.
>>
>> Mark
>>
>> On Sun, Mar 18, 2018 at 8:56 AM, Grigory Hatsevich <g.hatsev...@gmail.com>
>> wrote:
>>
>>> "This would remove the challenge and actively worsen their learning
>>> process" -- this is not true. You can see it e.g. by looking at the
>>> specific task I was talking about:
>>>
>>> "Given a string, find the shortest possible string which can be achieved
>>> by adding characters to the end of initial string to make it a palindrome."
>>>
>>> This is my code for this task:
>>>
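>>> function buildPalindrome(s) {
>>>   String.prototype.reverse = function () {
>>>     return this.split('').reverse().join('')
>>>   }
>>>
>>>   function isPalindrome(s) {
>>>     return s === s.reverse()
>>>   }
>>>   // Find the longest palindromic suffix, then mirror the prefix before it.
>>>   for (let i = 0; i < s.length; i++) {
>>>     const first = s.slice(0, i);
>>>     const rest = s.slice(i);
>>>     if (isPalindrome(rest)) {
>>>       return s + first.reverse()
>>>     }
>>>   }
>>>   return s // the empty string is already a palindrome
>>> }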
>>>
>>>
>>> As you see, the essence of this challenge is not in the process of
>>> reversing a string. Having a reverse() method just makes the code more
>>> readable -- comparing to alternative when one would have to write
>>> .split('').reverse().join('') each time instead of just .reverse()
>>>
>>> On Sun, Mar 18, 2018 at 2:38 PM, Frederick Stark <coagm...@gmail.com>
>>> wrote:
>>>
>>>> The point of a coding task for a beginner is to practice their problem
>>>> solving skills to solve the task. This would remove the challenge and
>>>> actively worsen their learning process
>>>>
>>>>
>>>> On Mar 18 2018, at 6:26 pm, Grigory Hatsevich <g.hatsev...@gmail.com>
>>>> wrote:
>>>>
>>>>
>>>> My use case is solving coding tasks about palindromes on codefights.com.
>>>> Not sure if that counts as "real-world", but probably a lot of beginning
>>>> developers encounter such tasks at least once.
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, 18 Mar 2018 06:41:46 +0700, Mathias Bynens <math...@qiwi.be>
>>>> wrote:
>>>>
>>>> So far no one has provided a real-world use case.
>>>>
>>>> On Mar 18, 2018 10:15, 

Re: add reverse() method to strings

2018-03-17 Thread Mathias Bynens
So far no one has provided a real-world use case.

On Mar 18, 2018 10:15, "Mike Samuel"  wrote:

> Previous discussion: https://esdiscuss.org/topic/wiki-updates-for-string-number-and-math-libraries#content-1
>
> """
> String.prototype.reverse(), as proposed, corrupts supplementary
> characters. Clause 6 of Ecma-262 redefines the word "character" as "a
> 16-bit unsigned value used to represent a single 16-bit unit of text", that
> is, a UTF-16 code unit. In contrast, the phrase "Unicode character" is used
> for Unicode code points. For reverse(), this means that the proposed spec
> will reverse the sequence of the two UTF-16 code units representing a
> supplementary character, resulting in corruption. If this function is
> really needed (is it? for what?), it should preserve the order of surrogate
> pairs, as does java.lang.StringBuilder.reverse: download.oracle.com/javase/7/docs/api/java/lang/StringBuilder.html#reverse()
> """
>
> On Sat, Mar 17, 2018 at 1:41 PM, Grigory Hatsevich 
> wrote:
>
>> Hi! I would propose to add reverse() method to strings. Something
>> equivalent to the following:
>>
>> String.prototype.reverse = function(){
>>   return this.split('').reverse().join('')
>> }
>>
>> It seems natural to have such method. Why not?
>>
>
>
>
>
>


Re: JSON.canonicalize()

2018-03-16 Thread Mathias Bynens
On Fri, Mar 16, 2018 at 9:04 PM, Mike Samuel  wrote:

>
> The output of JSON.canonicalize would also not be in the subset of JSON
> that is also a subset of JavaScript's PrimaryExpression.
>
> JSON.canonicalize(JSON.stringify("\u2028\u2029")) === `"\u2028\u2029"`
>

Soon U+2028 and U+2029 will no longer be edge cases. A Stage 3 proposal
(currently shipping in Chrome) makes them valid in ECMAScript string
literals, making JSON a strict subset of ECMAScript:
https://github.com/tc39/proposal-json-superset
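To illustrate, a sketch assuming an engine that implements the proposal (it landed in ES2019): `JSON.stringify` leaves U+2028 unescaped, so before this change the resulting JSON text was not a valid ECMAScript expression. The `Function` constructor is used here only to evaluate the text as ECMAScript source.

```js
const json = JSON.stringify({ text: 'line\u2028separator' });

// JSON.stringify does not escape U+2028 LINE SEPARATOR; it is emitted as-is.
console.log(json.includes('\u2028')); // true

// Valid JSON, so JSON.parse has always accepted it.
console.log(JSON.parse(json).text === 'line\u2028separator'); // true

// With the JSON superset change, the same text is also a valid expression.
// Engines without the change throw a SyntaxError here instead.
const evaluated = new Function(`return (${json});`)();
console.log(evaluated.text === 'line\u2028separator'); // true
```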


Re: Ranges

2016-11-04 Thread Mathias Bynens
On Fri, Nov 4, 2016 at 6:24 PM, Jordan Harband  wrote:
> Here you go:
>
> 1) `function* range(start, end) { for (const i = +start; i < end; ++i) {
> yield i; } }`

For future reference: `++i` throws when `i` is a `const` binding. The
intended example uses `let` instead.
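The corrected generator, as a sketch with the same name and signature as the quoted example, next to the `const` variant for comparison (`brokenRange` is a hypothetical name):

```js
function* range(start, end) {
  // `let`, not `const`: the binding is updated by `++i` on every iteration.
  for (let i = +start; i < end; ++i) {
    yield i;
  }
}

console.log(JSON.stringify([...range(1, 5)])); // [1,2,3,4]

// With `const`, the first `++i` throws a TypeError (assignment to a constant).
function* brokenRange(start, end) {
  for (const i = +start; i < end; ++i) {
    yield i;
  }
}
try {
  [...brokenRange(1, 5)];
} catch (error) {
  console.log(error instanceof TypeError); // true
}
```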


Re: Adding DOTALL modifier to ECMAScript regex standards

2016-08-15 Thread Mathias Bynens

> On 10 Aug 2016, at 16:02, Jake Reynolds  wrote:
> 
> I brought up the topic of adding the DOTALL modifier to the Chrome V8 Engine 
> here and was directed to es-discuss.  I was curious about the practicality 
> and the want for adding a DOTALL modifier to the ECMAScript standards in the 
> future?
> 
> For those who don't know, the DOTALL modifier is a regex modifier that
> allows the '.' symbol to match newlines as well.
> 
> Example regex: /he[.*]?llo/
> Example search string 1: hello
> Example search string 2: he
> llo
> 
> The above regex will match the 1st search string but will not match the 2nd.
> 
> In ECMAScript the only current way to make a match like that work is to use 
> [\d\D] which will match everything including newlines, given below.  
> 
> Current workaround regex: /he[\d\D]?llo/
> 
> The s modifier is the standard in most major languages except Javascript and 
> Ruby.  This will allow newline matching for the . symbol.  The proposed regex 
> is below:
> 
> Proposed new regex: /he[.*]?llo/s
> Example search string: he
> llo

Formal proposal (incl. proposed spec changes) for this feature: 
https://github.com/mathiasbynens/es-regexp-singleline-flag
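For later readers: this proposal eventually shipped in ES2018 as the `s` (`dotAll`) flag. Below is a sketch of the behavior using a corrected form of the example above. Inside a character class such as `[.*]`, both `.` and `*` are literal characters, so `/he[.*]?llo/s` would still not match across the newline.

```js
const input = 'he\nllo';

console.log(/he.?llo/.test(input));      // false: `.` does not match line terminators by default
console.log(/he.?llo/s.test(input));     // true: with `s`, `.` matches '\n' too
console.log(/he.?llo/s.dotAll);          // true: the flag is reflected on the RegExp object
console.log(/he[^]?llo/.test(input));    // true: the `[^]` workaround (any code unit)
console.log(/he[\d\D]?llo/.test(input)); // true: the `[\d\D]` workaround
console.log(/he[.*]?llo/s.test(input));  // false: `[.*]` matches only a literal '.' or '*'
```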


Re: Adding DOTALL modifier to ECMAScript regex standards

2016-08-10 Thread Mathias Bynens
On Wed, Aug 10, 2016 at 4:40 PM, Bob Myers  wrote:
> If it's any consolation there is the more compact hack of `[^]`, which I
> **think** is supposed to work everywhere.

That doesn’t work in IE < 9, but that shouldn’t matter in 2016.


Re: Could we add the missing Regexp features from perl?

2016-06-24 Thread Mathias Bynens

> On 18 Jun 2016, at 00:01, Sebastian Zartner  
> wrote:
> 
> There are already a few regexp features in the pipeline, see 
> https://github.com/goyakin/es-regexp (listed in the Stage 0 proposals at 
> https://github.com/tc39/proposals/blob/master/stage-0-proposals.md).

Another one would be the proposal to add Unicode property escapes of the form
`\p{…}` and `\P{…}` to ECMAScript regular expressions:
https://github.com/mathiasbynens/es-regexp-unicode-property-escapes
This is scheduled to be presented at the next TC39 meeting.
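This proposal also shipped in ES2018. Here is a short sketch of what the syntax looks like; property escapes require the `u` flag.

```js
// \p{…} matches code points with a given Unicode property; \P{…} negates it.
const greek = /\p{Script=Greek}/u;
console.log(greek.test('π')); // true
console.log(greek.test('p')); // false

const upper = /^\p{Lu}+$/u; // General_Category=Uppercase_Letter
console.log(upper.test('ÀÉÎ')); // true
console.log(upper.test('Abc')); // false

// \P{…} is the complement: here, anything that is not a letter.
console.log('a1-b2'.replace(/\P{Letter}/gu, '')); // 'ab'
```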


Re: Observing whether a function is strict

2016-05-26 Thread Mathias Bynens
On Thu, May 26, 2016 at 9:48 AM, Claude Pache  wrote:
> I was wondering whether there is a way to observe whether a given random 
> function is strict (or sloppy, or neither).
> […] Are there other ways? (If not, I find it somewhat unfortunate that only 
> such nonstandard features leak this information.)

I smell a proposal for `Reflect.isStrict` in the making…


Re: Object.getOwnPropertyDescriptors still at stage 0

2016-01-18 Thread Mathias Bynens
On Mon, Jan 18, 2016 at 6:50 PM, Andrea Giammarchi wrote:
> According to this ecma262 stage 0 summary
> https://github.com/tc39/ecma262/blob/master/stage0.md the (quite long time
> ago) discussed `Object.getOwnPropertyDescriptors`
> https://gist.github.com/WebReflection/9353781 hasn't moved a bit from there.
>
> However, there are already use cases
> https://gist.github.com/WebReflection/9353781#gistcomment-1672863 and it's
> already available with Babel.
>
> On top of that, the npm package
> https://www.npmjs.com/package/object.getownpropertydescriptors has some
> downloads, actually surpassing es7-shim repository
> https://github.com/es-shims/es7-shim#shims
>
> What should be done in order to move this little improvement to a stage 1
> situation?

Consider turning your excellent gist into a repository that fulfills
the criteria described here: https://tc39.github.io/process-document/
See https://github.com/tc39/Array.prototype.includes for a good
example.
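For later readers: `Object.getOwnPropertyDescriptors` did eventually ship in ES2017. Its main use case is copying objects without losing accessors, which `Object.assign` cannot do because it reads values through getters. A sketch; `getOwnPropertyDescriptorsShim` is a hypothetical, simplified user-land fallback rather than a spec-complete polyfill.

```js
// Simplified user-land fallback: one descriptor per own key (strings and symbols).
function getOwnPropertyDescriptorsShim(obj) {
  const result = {};
  for (const key of Reflect.ownKeys(obj)) {
    const descriptor = Object.getOwnPropertyDescriptor(obj, key);
    if (descriptor !== undefined) {
      // defineProperty (not `result[key] = …`) so a '__proto__' key is stored as data.
      Object.defineProperty(result, key, {
        value: descriptor, writable: true, enumerable: true, configurable: true,
      });
    }
  }
  return result;
}

const source = {
  first: 'Ada',
  get greeting() { return `Hello, ${this.first}`; },
};

// Object.assign invokes the getter once and copies the resulting value.
const assigned = Object.assign({}, source);
// Copying descriptors keeps the getter live on the clone.
const cloned = Object.defineProperties({}, Object.getOwnPropertyDescriptors(source));

cloned.first = 'Grace';
assigned.first = 'Grace';
console.log(cloned.greeting);   // 'Hello, Grace'
console.log(assigned.greeting); // 'Hello, Ada'
```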


Re: Object.getOwnPropertyDescriptors still at stage 0

2016-01-18 Thread Mathias Bynens
On Mon, Jan 18, 2016 at 8:27 PM, Andrea Giammarchi wrote:
> Do you (or anyone else) know if that should be filed as a PR to tc39/ecma262
> or if it should just be a repository eventually posted in here?

It should be a repository that can eventually move to the tc39
organization if all goes well. Good luck!


Re: RegExp.escape()

2015-06-30 Thread Mathias Bynens
On Mon, Jun 29, 2015 at 9:04 PM, Benjamin Gruenbaum
<benjami...@gmail.com> wrote:
> Why? What advantage would it offer?

See Scott’s previous email:

On Mon, Jun 29, 2015 at 8:42 PM, C. Scott Ananian <ecmascr...@cscott.net> wrote:
> Imagine trying to ensure that any characters over \u007f were
> escaped.  You don't want an iterable over ~64k characters.
>
> In addition, a RegExp would allow you to concisely specify hex digits, but
> only at the start of the string and some of the other oddities we've
> considered.


Re: Parser for ES6?

2015-05-07 Thread Mathias Bynens
On Thu, May 7, 2015 at 5:37 PM, Park, Daejun <dpar...@illinois.edu> wrote:
> Is there any parser for ES6?

https://github.com/shapesecurity/shift-parser-js supports ES6/ES2015
RC2. You can read more about it here:
http://engineering.shapesecurity.com/2015/04/two-phase-parsing.html


Re: Declaration binding instationationing

2015-04-29 Thread Mathias Bynens
On Wed, Apr 29, 2015 at 7:29 AM, Garrett Smith <dhtmlkitc...@gmail.com> wrote:
> There is an English problem here:
>
> Let existingProp be the resulting of calling the [[GetProperty]]
> internal method of go with argument fn.

s/resulting/result/ indeed.

> Can the spec be made easier to read?

FYI, the best way to go about this is to file a spec bug on
https://bugs.ecmascript.org/ under the “editorial issue” component.


Re: Should const be favored over let?

2015-04-17 Thread Mathias Bynens
On Fri, Apr 17, 2015 at 7:53 AM, Glen Huang <curvedm...@gmail.com> wrote:
> I've completely replaced var with let in my es 2015 code, but I noticed
> most variables I introduced never change.

Note that `const` has nothing to do with the value of a variable
changing or not. It can still change:

const foo = {};
foo.bar = 42; // does not throw

`const` indicates the *binding* is constant, i.e. there will be no
reassignments etc.

In my post-ES6 code, I use `const` by default, falling back to `let`
if I explicitly need rebinding. `var` is for legacy code.
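A sketch of the distinction. `Object.freeze` is shown as one way to get the value-level immutability that `const` does not provide; note that it is shallow.

```js
const foo = {};
foo.bar = 42;         // fine: the object itself is mutable
console.log(foo.bar); // 42

try {
  foo = {};           // rebinding is what `const` forbids
} catch (error) {
  console.log(error instanceof TypeError); // true
}

const frozen = Object.freeze({ nested: {} });
try {
  frozen.bar = 42;    // silently ignored in sloppy mode, TypeError in strict mode
} catch (error) {
  // strict-mode code ends up here
}
console.log(frozen.bar);        // undefined
frozen.nested.baz = 1;          // freeze is shallow: nested objects stay mutable
console.log(frozen.nested.baz); // 1
```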


Re: Number.prototype not being an instance breaks the web, too

2015-04-13 Thread Mathias Bynens
CCing Piotr.

On Mon, Apr 13, 2015 at 5:37 PM, Mark S. Miller <erig...@google.com> wrote:
> Hold on. I may have reacted too quickly. If it is only jsfiddle, since this
> is an online service rather than a widely copied library, they could just
> fix it. OTOH, if it really is a mootools issue, then yes, we really do need
> to change the spec. (History: Facebook fixed JSON incompatibility. ES5 fixed
> Object.prototype.toString.call(null) incompat with jQuery.)
>
> Could someone please reply-all to this thread cc'ing Piotr Zalewa and Oskar
> Krawczyk? Thanks.
>
> On Mon, Apr 13, 2015 at 8:26 AM, Mark S. Miller <erig...@google.com> wrote:
>
>> I agree. With Number.prototype joining Array.prototype and
>> Function.prototype on the dark side, we should ask which others should too.
>> When it was only Function.prototype and Array.prototype, principle of least
>> surprise (POLS) had us keep the list as small as possible -- until we had
>> precisely this kind of evidence of incompatibility. From a security pov, the
>> important ones not to revert are those carrying mutable state not locked
>> down by Object.freeze. In ES5 this was only Date.prototype. Of the ES5
>> builtins in ES6, this now includes RegExp.prototype because of
>> RegExp.prototype.compile. (Because of de facto stack magic, this might
>> include Error.prototype as well.) Fortunately, there is still no evidence
>> that we need to corrupt these as well.
>>
>> OTOH, POLS still says that almost everything should not go to the dark
>> side, for consistency with ES6 classes. So the precise line becomes a matter
>> of taste. I propose that the co-corrupted list be
>>
>> Function.prototype
>> Array.prototype
>> Number.prototype
>> Boolean.prototype   // No incompat data. Only POLS
>> String.prototype   // No incompat data. Only POLS
>>
>> since Number, Boolean, and String are the ordinary ES5 wrappers of
>> primitive data values.
>>
>> For builtins that are new with ES6, clearly there's no compat issue. And
>> both security and consistency with ES6 classes argue in general for not
>> corrupting new things. But POLS should put very little weight on the ES5 vs
>> ES6 difference since post ES6 programmers will just see all of this as JS.
>>
>> Given that, I could argue Symbol.prototype either way, since Symbol is
>> kinda another wrapper of a primitive type. But I prefer not. I think we
>> should keep the list to those 5.
>>
>> Allen, process-wise, is this too late for ES6? If there's any way this fix
>> can go in ES6, it should. Otherwise, it should become the first member of
>> ES6 errata.
>>
>> All that said, I do find corrupting only Number.prototype to be plausible.
>> I would not mind if we decided not to spread the corruption even to
>> Boolean.prototype and String.prototype. If we have to do a last minute
>> as-small-as-possible change to the spec, to get it into ES6, this might be
>> best.
>>
>> On Mon, Apr 13, 2015 at 7:47 AM, Andreas Rossberg <rossb...@google.com>
>> wrote:
>>
>>> V8 just rolled a change into Canary yesterday that implements the new ES6
>>> semantics for Number.prototype (and Boolean.prototype) being ordinary
>>> objects. Unfortunately, that seems to break the web. In particular
>>> http://jsfiddle.net/#run fails to load now.
>>>
>>> What I see happening on that page is a TypeError
>>> "Number.prototype.valueOf is not generic" being thrown in this function
>>> (probably part of moo tools):
>>>
>>> Number.prototype.$family = function(){
>>> return isFinite(this) ? 'number' : 'null';
>>> }.hide();
>>>
>>> after being invoked on Number.prototype.
>>>
>>> AFAICS, that leaves only one option: backing out of this spec change.
>>>
>>> (See https://code.google.com/p/chromium/issues/detail?id=476437 for the
>>> bug.)
>>>
>>> /Andreas
>>
>> --
>> Cheers,
>> --MarkM




Re: Unicode normalization problem

2015-04-02 Thread Mathias Bynens
On Thu, Apr 2, 2015 at 1:39 AM, Andrea Giammarchi
<andrea.giammar...@gmail.com> wrote:
> Jordan the purpose of `Array.from` is to iterate over the string, and the
> point of iteration instead of splitting is to have automagically codepoints.
> This, unless I've misunderstood Mathias presentation (might be)
>
> So, here there is a different problem: there are code-points that do not
> represent real visual representation ...

Those are called grapheme clusters or just “graphemes”, as Boris
mentioned. And here’s how to deal with them:
https://mathiasbynens.be/notes/javascript-unicode#other-grapheme-clusters

“Unicode Standard Annex #29 describes [an algorithm for determining
grapheme cluster
boundaries](http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries).
For a _completely_ accurate solution that works for all Unicode
scripts, implement this algorithm in JavaScript, and then count each
grapheme cluster as a single symbol.”

> or maybe, the real problem, is about broken `Array.from` polyfill?

`Array.from` just uses `String.prototype[Symbol.iterator]` internally,
and that is defined to deal with code points, not grapheme clusters.
Either choice would have confused some developers. IIRC, Perl 6 has
built-in capabilities to deal with grapheme clusters, but until ES
does, this use case must be addressed in user-land.
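To make the levels concrete, here is a rough sketch. `countSymbolsApprox` is a hypothetical helper that only treats a base character followed by combining marks (`\p{M}`) as one symbol. It is *not* an implementation of the UAX #29 algorithm: it ignores emoji ZWJ sequences, Hangul jamo, regional-indicator flags and more. (Newer engines expose a UAX #29 implementation as `Intl.Segmenter` with `granularity: 'grapheme'`.)

```js
const mananaCombining = 'man\u0303ana'; // 'mañana' spelled with U+0303 COMBINING TILDE

console.log(mananaCombining.length);             // 7: UTF-16 code units
console.log(Array.from(mananaCombining).length); // 7: code points (no surrogate pairs here)

// Rough approximation only: a base character plus any following combining
// marks counts as one symbol. Not a UAX #29 implementation.
function countSymbolsApprox(str) {
  const clusters = str.match(/\P{M}\p{M}*|\p{M}+/gu);
  return clusters === null ? 0 : clusters.length;
}

console.log(countSymbolsApprox(mananaCombining)); // 6: 'n' + U+0303 counts once
console.log(countSymbolsApprox('\u{1D400}'));     // 1: a single astral symbol
```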


Re: Unicode normalization problem

2015-04-01 Thread Mathias Bynens
On Wed, Apr 1, 2015 at 9:17 PM, Alexander Guinness <monolit...@gmail.com> wrote:
> My reasoning is based on the following example:
>
> ```js
> var text = '𝐀';
>
> text.length; // 2
>
> Array.from(text).length // 1
> ```

What you’re seeing there is not normalization, but rather the string
iterator that automatically accounts for surrogate pairs (treating
them as a single unit).
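A sketch of the difference, with the characters written as escapes so the source itself is unambiguous:

```js
const boldA = '\u{1D400}'; // 𝐀 MATHEMATICAL BOLD CAPITAL A, one code point
console.log(boldA.length);                      // 2: a surrogate pair
console.log(Array.from(boldA).length);          // 1: the iterator yields code points
console.log(boldA.normalize('NFC') === boldA);  // true: NFC leaves it unchanged

const shortI = '\u0419'; // Й, precomposed
const decomposed = shortI.normalize('NFD'); // И + U+0306 COMBINING BREVE
console.log(decomposed.length);                      // 2
console.log(Array.from(decomposed).length);          // 2: iteration does not merge them
console.log(decomposed.normalize('NFC') === shortI); // true: normalization does
```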


Re: Unicode normalization problem

2015-04-01 Thread Mathias Bynens
On Wed, Apr 1, 2015 at 10:30 PM, monolithed <monolit...@gmail.com> wrote:
>> What you’re seeing there is not normalization, but rather the string
>> iterator that automatically accounts for surrogate pairs (treating them as a
>> single unit).
>
> ```js
> var foo = '𝐀';
> var bar = 'Й';
> foo.length; // 2
> Array.from(foo).length // 1
>
> bar.length; // 2
> Array.from(bar).length // 2
> ```
>
> I think this is strange.
> How to safely work with strings?

It depends on your use case. FWIW, I’ve outlined some examples here:
https://mathiasbynens.be/notes/javascript-unicode


Re: Should `use strict` be a valid strict pragma?

2015-02-05 Thread Mathias Bynens

 On 5 Feb 2015, at 11:04, Leon Arnott leonarn...@gmail.com wrote:
 
 Well, that isn't quite the full story - if it were just a case of pragmas 
 having to use something, anything, that could pass ES3 engines, then there 
 wouldn't necessarily be two otherwise-redundant forms of the syntax - `"use strict"` 
 and `'use strict'`. The reason those exist is to save the author 
 remembering which string delimiter to use - it mirrors the string literal 
 syntax exactly.

If that were the case, then e.g. `'\x75\x73\x65\x20\x73\x74\x72\x69\x63\x74'` 
would trigger strict mode. (It doesn’t, and that’s a good thing.)
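The point is that a Use Strict Directive is recognized by its exact source text, not by the string value it evaluates to. A sketch; `new Function` is used only because its bodies are sloppy unless they contain the directive themselves, so the result does not depend on how the surrounding file is loaded.

```js
// Same string value ('use strict'), different source text.
const escaped = new Function(
  "'\\x75\\x73\\x65\\x20\\x73\\x74\\x72\\x69\\x63\\x74'; return this === undefined;"
);
const plain = new Function("'use strict'; return this === undefined;");

// In sloppy mode, an undefined `this` is replaced by the global object.
console.log(escaped()); // false: an ordinary directive, not a Use Strict Directive
console.log(plain());   // true: the real Use Strict Directive
```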


Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Mathias Bynens

 On 28 Jan 2015, at 11:36, Marja Hölttä ma...@chromium.org wrote:
 
 TL;DR: /foo.bar/u.test(“foo\uD83Dbar”) == ?
 
 The ES6 unicode regexp spec is not very clear regarding what should happen if 
 the regexp or the matched string contains lonely surrogates (a lead surrogate 
 without a trail, or a trail without a lead). For example, for the . operator, 
 the relevant parts of the spec speak about characters:
 
 https://people.mozilla.org/~jorendorff/es6-draft.html#sec-atom
 https://people.mozilla.org/~jorendorff/es6-draft.html#sec-runtime-semantics-charactersetmatcher-abstract-operation
 https://people.mozilla.org/~jorendorff/es6-draft.html#sec-runtime-semantics-canonicalize-abstract-operation
 
 E.g.,
 “Let A be the set of all *characters* except LineTerminator.”
 “Let ch be the *character* Input[e].”
 
 But is a lonely surrogate a character? According to the Unicode standard, 
 it’s not. If it's not, what will ch be if the input string contains a lonely 
 surrogate in the relevant position?
 
 Q1: Are lonely surrogates allowed in /u regexps?
 
 E.g., /foo\uD83D/u; (note lonely lead surrogate), should this be allowed? 
 Will it match a lead surrogate inside a surrogate pair?
 
 Suggestion: we shouldn't allow lonely surrogates in /u regexps.
 
 If users actually want to match lonely surrogates (e.g., to check for them or 
 remove them) then they can use non-/u regexps.

You’re proposing to define “characters” in terms of Unicode scalar values in 
the case `/u` is used. I could get behind that — it reinforces the idea that 
`/u` is like a strict mode for regular expressions.

Playing devil’s advocate, the problem is that regular expressions and strings 
go hand in hand, and there is no guarantee that JavaScript strings only consist 
of valid code points. Making `.` not match lone surrogates breaks the developer 
expectation that `(.)` matches every “part” of the string. Having to avoid `/u` 
to prevent this seems like a potentially bad thing.

 The regexp syntax treats a lonely surrogate as a normal unicode escape, and 
 the rules say e.g., “The production RegExpUnicodeEscapeSequence :: u 
 Hex4Digits evaluates as follows: Return the character whose code is the SV of 
 Hex4Digits.” - it's also unclear what this means if no valid character has 
 this code.
 
 Q2: If the string contains a lonely surrogate, what should it match? Should 
 it match .? Should it match [^a] ? (Or is it undefined behavior?)
 
 Test cases:
 /foo.bar/u.test("foo\uD83Dbar") == ?
 /foo.bar/u.test("foo\uDC00bar") == ?
 /foo[^a]bar/u.test("foo\uD83Dbar") == ?
 /foo[^a]bar/u.test("foo\uDC00bar") == ?
 /foo/u.test("bar\uD83Dbarfoo") == ?
 /foo/u.test("bar\uDC00barfoo") == ?
 /foo(.*)bar\1/u.test("foo\uD834bar\uD834\uDC00") == ? // Should the 
 backreference be allowed to match the lead surrogate of a surrogate pair?
 /^(.+)\1$/u.test("\uDC00foobar\uD83D\uDC00foobar\uD83D") == ?? // Should we 
 allow splitting the surrogate pair like this?
 
 Suggestion: a lonely surrogate should not be a character and it should not 
 match . or [^a] etc. However, a lonely surrogate in the input string 
 shouldn't prevent some other part of the string from matching.
 
 If a lonely surrogate is treated as a character, the matching rule for . gets 
 complicated and difficult / slow to implement: . should not match individual 
 surrogates inside a surrogate pair, but if it has to match a lonely 
 surrogate, we'll end up needing lookahead and lookbehind logic to implement 
 that behavior.
 
 For example, the current version of Mathias’s ES6 Unicode regular expression 
 transpiler ( https://mothereff.in/regexpu ) converts /a.b/u into 
 /a(?:[\0-\t\x0B\f\x0E-\u2027\u202A-\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF])b/
  and afaics it’s not yet fully consistent wrt lonely surrogates, so, a 
 consistent implementation is going to be more complex than this.

This is indeed an incomplete solution. The lack of lookbehind support in ES 
makes this hard to transpile correctly. Ideas welcome!
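For reference, a sketch of how some of the test cases above behave in current engines. As far as I know, this matches what the final ES2015 specification settled on: under `/u`, a lone surrogate in the input is treated as a code point of its own (so `.` matches it), while a well-formed surrogate pair is always one code point and is never split.

```js
// A lone lead or trail surrogate is matched by `.` and by [^a] under /u.
console.log(/foo.bar/u.test('foo\uD83Dbar'));       // true
console.log(/foo[^a]bar/u.test('foo\uDC00bar'));    // true
// A complete surrogate pair is one code point with /u, two code units without it.
console.log(/foo.bar/u.test('foo\uD83D\uDC00bar')); // true
console.log(/foo.bar/.test('foo\uD83D\uDC00bar'));  // false

// 16 code units, but only 15 code points under /u because the middle
// \uD83D\uDC00 forms one pair. Without /u it splits into two equal halves.
const s = '\uDC00foobar\uD83D\uDC00foobar\uD83D';
console.log(/^(.+)\1$/.test(s));  // true
console.log(/^(.+)\1$/u.test(s)); // false
```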



Re: escaping - in /u RegExp

2015-01-14 Thread Mathias Bynens

 On 13 Jan 2015, at 22:23, Allen Wirfs-Brock al...@wirfs-brock.com wrote:
 
 Would those of you who consider yourselves RegExp experts take a look at 
 https://bugs.ecmascript.org/show_bug.cgi?id=3519  Is this a bug? If so, what 
 is the fix?
 
 This construction for Identity Escape goes back to Norbert's original 
 proposal 
 http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/index.html
  
 
 Perhaps we need to add a:
   ClassAtom[U] :: [+U]  \-
 
 production or some such to the pattern grammar.

I think it’s a bug — see 
https://codereview.chromium.org/788043005/diff/220001/src/parser.cc#newcode4354 
for the discussion that led to this report.

Your change would allow developers to use an escaped `-` in a character class, 
e.g. `/[a-f\-A-Z]/u`, as is already possible today without the `u` flag, rather 
than having to move it to the beginning (i.e. `/[-a-fA-Z]/u`) or end 
(`/[a-fA-Z-]/u`) of the character class. That is a good thing IMHO.
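The three spellings can be compared directly. This sketch assumes an engine implementing ES2015 or later, where an escaped `-` inside a character class is accepted in `/u` patterns; outside a class, `\-` is still a SyntaxError under `/u`.

```js
const escaped = /[a-f\-A-Z]/u; // escaped hyphen in the middle
const leading = /[-a-fA-Z]/u;  // literal hyphen first
const trailing = /[a-fA-Z-]/u; // literal hyphen last

for (const re of [escaped, leading, trailing]) {
  console.log(re.test('-'), re.test('c'), re.test('Q'), re.test('g')); // true true true false
}

// Outside a character class there is nothing to escape, so /u rejects it.
let threw = false;
try {
  new RegExp('\\-', 'u');
} catch (error) {
  threw = error instanceof SyntaxError;
}
console.log(threw); // true
```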


Re: Sept 23 2014 Meeting Notes

2014-10-03 Thread Mathias Bynens
Thanks for once again putting this together, Rick!

On Fri, Oct 3, 2014 at 3:22 PM, Rick Waldron waldron.r...@gmail.com wrote:
 ## 4.4 Number('0b0101'). NaN or not?
 (Erik Arvidsson)

 EA: Previous discussion:
 https://github.com/rwaldron/tc39-notes/blob/c61f48cea5f2339a1ec65ca89827c8cff170779b/es6/2014-04/apr-9.md#46-updates-to-parseint

 Should `Number` be able to parse the string 0b0 or 0o1

 (Discussion of people (ab)using Number for converting user input and whether
 this should affect things.)

 Yes.

  Conclusion/Resolution

 - Use spec-internal `ToNumber` via userland `Number` called as a function 
 will convert (i.e. `Number('0b101') === 5`).
 - Upholding previous consensus on `parseInt`

I don’t understand this reasoning. Making `parseInt` understand the
new syntax was considered a security hazard, but for `Number` we can
somehow get away with it? Any more information on the disconnect?
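The disconnect is easy to observe in an engine that follows the resolution above: `Number` (via `ToNumber`) accepts the ES6 `0b`/`0o` prefixes, while `parseInt` stops at the first character it cannot parse.

```js
console.log(Number('0b101'), Number('0o17'), Number('0x1F'));       // 5 15 31
// parseInt reads the leading '0', then stops at 'b' or 'o'.
console.log(parseInt('0b101'), parseInt('0o17'), parseInt('0x1F')); // 0 0 31
// An explicit radix does not help, because the prefix itself is not a digit.
console.log(parseInt('0b101', 2));                                  // 0
```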


Re: RegExps that don't modify global state?

2014-09-17 Thread Mathias Bynens
On Tue, Sep 16, 2014 at 8:16 PM, Domenic Denicola
dome...@domenicdenicola.com wrote:
 I also noticed today that the static `RegExp` properties are not specced, 
 which seems at odds with our new mandate to at least Annex B-ify the 
 required-for-web-compat stuff.

As a general note to people looking to spec some Annex B stuff,
https://javascript.spec.whatwg.org/ is a good place to start. Many
such things are listed there, but still lack a proper spec definition.
Case in point: https://javascript.spec.whatwg.org/#regexp.$n
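For readers unfamiliar with the legacy properties in question: they are updated as a side effect of successful matches, which is exactly the global state this thread is about. A sketch; these accessors are non-standard legacy features that current engines (e.g. V8) still ship, so new code should not rely on them.

```js
// A successful match updates legacy static properties on the RegExp constructor.
/(\d+)-(\d+)/.exec('range: 10-20');
console.log(RegExp.$1, RegExp.$2); // 10 20
console.log(RegExp.lastMatch);     // 10-20

// An unrelated match elsewhere silently overwrites them.
/(\w+)/.exec('hello');
console.log(RegExp.$1);            // hello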


Re: use strict VS setTimeout

2014-09-07 Thread Mathias Bynens
On Sun, Sep 7, 2014 at 7:29 PM, Andrea Giammarchi
andrea.giammar...@gmail.com wrote:
 This looks like a potential problem when possible passed methods are not
 bound + it looks inconsistent with *use strict* expectations.

It’s not just `setTimeout` – other DOM timer methods have the same
behavior. The spec is here, FWIW:
http://www.whatwg.org/specs/web-apps/current-work/multipage/webappapis.html#dom-windowtimers-settimeout
Pretty sure this cannot be changed without breaking the Web.


Re: Questions regarding ES6 Unicode regular expressions

2014-08-26 Thread Mathias Bynens
On 26 Aug 2014, at 02:16, Norbert Lindenberg 
ecmascr...@lindenbergsoftware.com wrote:

 […]

Thanks for confirming. Sounds like my “ES6 Unicode regular expressions to ES5” 
transpiler is working correctly, then: https://github.com/mathiasbynens/regexpu 
Demo: http://mothereff.in/regexpu (Bug reports welcome.)

 On Aug 25, 2014, at 1:59 , Mathias Bynens math...@qiwi.be wrote:
 
 Norbert’s original proposal for the `u` flag 
 (http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/#RegExp)
  mentioned the following:
 
 Possibly the definition of the character classes `\d\D\w\W\b\B` is extended 
 to their Unicode extensions, such as all characters in the Unicode category 
 “Number, decimal” for `\d`, as proposed by Steven Levithan. Whether this 
 can be done under the same flag or requires a different one still needs 
 discussion.
 
 Has this been discussed any further? (I couldn’t find any mention of it in 
 the meeting notes repository.) Should I file a bug?
 
 The “needs discussion” part actually came from the March 2012 TC39 meeting:
 https://mail.mozilla.org/pipermail/es-discuss/2012-March/021919.html
 We subsequently had some discussions about how to go about such a discussion, 
 which petered out because no regular expression expert was available to work 
 with.
 
 I suspect this issue needs a proposal rather than a bug.

https://github.com/mathiasbynens/es6-unicode-character-class-escape-sets#readme 
I’m fairly confident in the proposals for `\d` and `\w`, but `\b` needs work.

@Steven Levithan, would you mind lending your expertise on this? This is your 
chance to make `/na\b/u.test('naïve')` return `false` :)
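To make the `naïve` example concrete: with the ASCII-only `\w`, `\b` sees a boundary between `a` and `ï`. The second pattern below is only an approximation of a Unicode-aware word boundary, not the proposal itself; it assumes Unicode property escapes (`\p{…}`, added later in ES2018) and treats letters, marks, numbers and `_` as word characters.

```js
const word = 'na\u00EFve'; // 'naïve' with a precomposed U+00EF

// \b only knows ASCII word characters, so 'ï' ends the "word" after 'na'.
console.log(/na\b/u.test(word)); // true

// Approximation: 'na' not followed by a letter, mark, number or underscore.
const unicodeEnd = /na(?![\p{L}\p{M}\p{N}_])/u;
console.log(unicodeEnd.test(word));     // false
console.log(unicodeEnd.test('na ive')); // true
```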


Re: Questions regarding ES6 Unicode regular expressions

2014-08-26 Thread Mathias Bynens
On 26 Aug 2014, at 19:01, Allen Wirfs-Brock al...@wirfs-brock.com wrote:

 I've thought about this a bit. I was initially inclined to agree with the 
 idea of extending the existing character classes similar to what Mathias' 
 proposes.  But I now think that is probably not a very good idea and that 
 what is currently spec'ed (essentially that the /u flag doesn't change the 
 meaning of \w, \d, etc.) is the better path. […] It seems to me, that we want 
 programmers to start migrating to full Unicode regular expressions without 
 having to do major logic rewrite of their code.  For example, ideally the 
 above expression could simply be replaced by 
 `parseInt(/\s*(\d+)/u.exec(input)[1])` and everything in the application 
 could continue to work unchanged.

I see your point, but I disagree with the notion that we must absolutely 
maintain backwards compatibility in this case. The fact that the new flag is 
opt-in gives us an opportunity to improve behavior without obsessing about 
back-compat, similar to how the strict mode opt-in is used to make all sorts of 
things better. When [evangelizing 
`/u`](https://mathiasbynens.be/notes/es6-unicode-regex), we can educate 
developers and tell them to not blindly/needlessly add `/u` to their existing 
regular expressions.

 Instead, we should leave the definitions of \d, \w and \s unchanged and plan 
 to adopt the already established convention that `\p{Unicode property}` is 
 the notation for matching Unicode categories. See 
 http://www.regular-expressions.info/unicode.html 

We could do both: improve `\d` and `\w` now, and add `\p{property}` and 
`\P{property}` later. Anyhow, I’ve filed 
https://bugs.ecmascript.org/show_bug.cgi?id=3157 for reserving `\p{…}`/`\P{…}`.

 I think digesting all the \p{} possibilities is too much to do for ES6, so I 
 suggest that for ES6 that we simply reserve the \p{characters} and 
 \P{characters} syntax within /u patterns.  A \p proposal can then be 
 developed for ES7.

Sounds good to me.

 I see one remaining issue:
 In ES5 (and ES6): `/[a-z]/i` does not match U+017F (ſ) or U+212A (K) because 
 the ES canonicalization algorithm excludes mapping code points > 127 that 
 toUpperCase to code points < 128.
 However, as currently spec'ed, the ES6 canonicalization algorithm for /u 
 RegExps does not include that 127/128 exclusion.  It maps U+017F to S 
 which matches. 
 This is probably a minor variation, from the ES5 behavior, but we should 
 probably be sure it is a desirable and tolerable change as we presumably 
 could also apply the 127/128 filter to /u canonicalization.

This is a useful feature, and the explicit opt-in makes the small back-compat 
break acceptable IMHO.
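The case-folding difference can be checked directly in an engine implementing the ES2015 semantics, which kept the `/u` behaviour described above:

```js
const longS = '\u017F';  // ſ LATIN SMALL LETTER LONG S
const kelvin = '\u212A'; // K KELVIN SIGN

// Without /u, canonicalization is based on toUpperCase: ſ would become 'S',
// but that non-ASCII to ASCII mapping is excluded, and K uppercases to itself.
console.log(/[a-z]/i.test(longS), /[a-z]/i.test(kelvin));   // false false

// With /u, simple case folding applies: ſ folds to 's' and K to 'k'.
console.log(/[a-z]/iu.test(longS), /[a-z]/iu.test(kelvin)); // true true
```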



Questions regarding ES6 Unicode regular expressions

2014-08-25 Thread Mathias Bynens
Norbert’s original proposal for the `u` flag 
(http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/#RegExp)
 mentioned the following:

 Possibly the definition of the character classes `\d\D\w\W\b\B` is extended 
 to their Unicode extensions, such as all characters in the Unicode category 
 “Number, decimal” for `\d`, as proposed by Steven Levithan. Whether this can 
 be done under the same flag or requires a different one still needs 
 discussion.

Has this been discussed any further? (I couldn’t find any mention of it in the 
meeting notes repository.) Should I file a bug?

Norbert also suggested replacing ‘characters’ with ‘code points’ in sections 
like 
https://people.mozilla.org/~jorendorff/es6-draft.html#sec-characterclassescape 
and 
https://people.mozilla.org/~jorendorff/es6-draft.html#sec-runtime-semantics-charactersetmatcher-abstract-operation
 when the `u` flag is set. It seems the intent was to make e.g. `/\d/u` match 
`/[0-9]/`, and `/\D/u` match all Unicode code points except `[0-9]`. This is 
different from `/\D/` which only matches BMP code points.

It seems like this change has not propagated to the spec draft, though. Is this 
correct, and if so, what’s the reason for that?

The same goes for `/[^a]/u` – should this match all Unicode code points except 
`a` or should it only match BMP code points?
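For comparison, this is how current engines answer these questions: under `/u`, negated classes and `\D` consume whole astral code points, while `\d` stays limited to `[0-9]`. U+1F4A9 is just an arbitrary astral symbol.

```js
const astral = '\u{1F4A9}'; // one code point, two UTF-16 code units

// Without /u, each atom matches a single code unit, so one atom
// cannot cover the whole surrogate pair.
console.log(/^[^a]$/.test(astral), /^\D$/.test(astral));   // false false

// With /u, each atom consumes a whole code point.
console.log(/^[^a]$/u.test(astral), /^\D$/u.test(astral)); // true true

// \d does not match U+0663 ARABIC-INDIC DIGIT THREE, even with /u.
console.log(/\d/u.test('\u0663'));                          // false
```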



Re: Early error on '0' followed by '8' or '9' in numeric literals does not seem to be web-compatible

2014-08-08 Thread Mathias Bynens
Claude Pache proposed the following spec patch: 
https://bugs.ecmascript.org/show_bug.cgi?id=2792#c11


Re: July 29 2014 TC39 Meeting Notes

2014-08-06 Thread Mathias Bynens
On 5 Aug 2014, at 18:30, Rick Waldron waldron.r...@gmail.com wrote:

 - Spread now works on strings `var codeUnits = [..."this is a string"]`

The code example implies it results in an array of strings, one item for each 
UCS-2/UTF-16 code unit. Shouldn’t this be symbols matching whole Unicode code 
points (matching `StringIterator`) instead, i.e. no separate items for each 
surrogate half?
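The difference the question hinges on is splitting by code units versus iterating by code points. A small sketch comparing `split('')` with spread:

```js
const str = 'a\u{1F4A9}b';

// split('') works on UTF-16 code units and tears the surrogate pair apart.
const units = str.split('');
// Spread uses String.prototype[Symbol.iterator], which yields code points.
const points = [...str];

console.log(units.length, points.length); // 4 3
console.log(points[1] === '\u{1F4A9}');   // true
console.log(units[1] === '\uD83D');       // true
```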


Re: Early error on '0' followed by '8' or '9' in numeric literals does not seem to be web-compatible

2014-08-06 Thread Mathias Bynens
On 7 Aug 2014, at 02:46, Bill Frantz fra...@pwpconsult.com wrote:

 On Tue, Aug 5, 2014 at 7:56 AM, Mathias Bynens math...@qiwi.be wrote:
 
 ...
 In section 11.8.3 (Numeric Literals), the definition for
 `DecimalIntegerLiteral` should somehow be tweaked to match that of
 `DecimalDigits`, with the exception that if the first digit is `0` and all
 other digits are octal digits (0-7) it must be treated as a legacy octal
 literal.
 
 So this horrible footgun, changing the value of a constant changes its radix, 
 is only lurking in sloppy mode.

It affects strict mode code too in existing implementations: there you go from 
not throwing on e.g. `0123456789` (which is not an octal literal because of the 
`8` and `9`) to suddenly throwing a syntax error when the value changes to `0` 
followed by only octal digits (as then it is an octal literal). See my previous 
posts in this thread.


Re: Early error on '0' followed by '8' or '9' in numeric literals does not seem to be web-compatible

2014-08-05 Thread Mathias Bynens
On 5 Aug 2014, at 02:41, Allen Wirfs-Brock al...@wirfs-brock.com wrote:

 there is already a bug open on this 
 https://bugs.ecmascript.org/show_bug.cgi?id=2792 

Older bug report: https://bugs.ecmascript.org/show_bug.cgi?id=1553

We previously discussed this up at the April TC39 meeting: 
https://github.com/rwaldron/tc39-notes/blob/master/es6/2014-04/apr-9.md#change-escapesequence-0-lookahead--decimaldigit-to-match-reality


Re: Early error on '0' followed by '8' or '9' in numeric literals does not seem to be web-compatible

2014-08-05 Thread Mathias Bynens

On 5 Aug 2014, at 08:40, Mathias Bynens math...@qiwi.be wrote:

 On 5 Aug 2014, at 02:41, Allen Wirfs-Brock al...@wirfs-brock.com wrote:
 
 there is already a bug open on this 
 https://bugs.ecmascript.org/show_bug.cgi?id=2792 
 
 Older bug report: https://bugs.ecmascript.org/show_bug.cgi?id=1553
 
 We previously discussed this up at the April TC39 meeting: 
 https://github.com/rwaldron/tc39-notes/blob/master/es6/2014-04/apr-9.md#change-escapesequence-0-lookahead--decimaldigit-to-match-reality

Never mind – I was confused. This topic is about numeric literals rather than 
string literals (although the underlying issue is more or less the same). Carry 
on!


Re: Early error on '0' followed by '8' or '9' in numeric literals does not seem to be web-compatible

2014-08-05 Thread Mathias Bynens
On 4 Aug 2014, at 18:55, Jason Orendorff jason.orendo...@gmail.com wrote:

 We're talking about something different here, legacy *decimal* integer
 literals starting with 0 and containing 8 or 9. As far as I know, no
 version of ES has ever permitted this kind of nonsense, but supporting
 it is apparently required for Web compatibility. (One more great
 reason to write all your code under use strict.)

I don’t understand this comment. What does strict mode have to do with this? 
Note that `08` and `09` are not octal literals, since `8` and `9` are not 
`OctalDigit`s.

In non-strict mode, 
https://people.mozilla.org/~jorendorff/es6-draft.html#sec-additional-syntax-numeric-literals
 applies, but even then `08` and `09` should throw (as per the current spec) 
for the same reason.

Strict mode doesn’t make a difference as per the current spec when parsing this 
program:

```js
08
```

It does in Firefox/Spidermonkey, but that seems like a bug. Test this in the 
most recent nightly:

```js
(function() { 'use strict'; return 08; }())
```

This currently throws:

 SyntaxError: octal literals and octal escape sequences are deprecated

…which is a misleading message. It should instead say something like:

 SyntaxError: numbers starting with 0 followed by a digit are octals and can't 
 contain 8



Re: Early error on '0' followed by '8' or '9' in numeric literals does not seem to be web-compatible

2014-08-05 Thread Mathias Bynens

On 5 Aug 2014, at 16:20, Allen Wirfs-Brock al...@wirfs-brock.com wrote:

 We're only talking about Annex B, non-strict.  Right?

All engines are going to implement this anyway, so why make it Annex B only? I 
wouldn’t restrict it to non-strict mode either, as this decision seems to be 
purely based on the Firefox/SpiderMonkey bug that was discussed earlier.

 It would be great is somebody wanted to proposal the actual annex B language 
 that is need to correctly describe the web reality semantics.

In section 11.8.3 (Numeric Literals), the definition for 
`DecimalIntegerLiteral` should somehow be tweaked to match that of 
`DecimalDigits`, with the exception that if the first digit is `0` and all 
other digits are octal digits (0-7) it must be treated as a legacy octal 
literal.

 Regarding, leading 0 constants in strict mode. The long term plan is to 
 eventually make them legal decimal constants. The only reason not to do that 
 now is because it might screw up people who are migrating non-strict web 
 reality code containing octal constants into strict mode.

Firefox is the only browser that throws on `(function() { 'use strict'; return 
08; }())` and the only reason it does that is because of a bug (see my earlier 
email). In general, strict mode does not matter here.



Re: Early error on '0' followed by '8' or '9' in numeric literals does not seem to be web-compatible

2014-08-05 Thread Mathias Bynens
On 5 Aug 2014, at 16:56, Alex Kocharin a...@kocharin.ru wrote:

 What about allowing one-digit numbers with leading zeroes? 07 equals to 7 
 no matter whether it parsed as an octal or as a decimal. Thus, no harm there.

That wouldn’t solve the problem. Consider e.g. `01234567` (i.e. `342391`) vs. 
`01234568` (which must equal `1234568` for compatibility with existing code).
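The two literals from the example can be evaluated side by side in sloppy mode. Indirect `eval` is used here only so that the snippets are evaluated as non-strict global code, regardless of how the surrounding file is loaded.

```js
const sloppyEval = (0, eval); // indirect eval: global, non-strict code

// All digits are octal digits, so this is a legacy octal literal.
console.log(sloppyEval('01234567')); // 342391
// The 8 makes it a decimal literal with a leading zero instead.
console.log(sloppyEval('01234568')); // 1234568
// Single digits have the same value either way, which is why they look harmless.
console.log(sloppyEval('07'), sloppyEval('08')); // 7 8
```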


Re: Early error on '0' followed by '8' or '9' in numeric literals does not seem to be web-compatible

2014-08-05 Thread Mathias Bynens
On 5 Aug 2014, at 17:05, Mark S. Miller erig...@google.com wrote:

 Because of compatibility constraints, JS history can generally proceed only 
 in an additive manner, which means a steady degradation of quality along the 
 simplicity dimension. An opt-in mode switch is the only way to escape that 
 dynamic. Strict mode is the only one we've got, and the only one we're likely 
 to have in the foreseeable future. Strict mode should not accept octal 
 literals. Regarding sloppy mode, it continues to exist only for the sake of 
 legacy compat, so adding more crap to it for better web compat is the right 
 tradeoff -- as long as the crap stays quarantined within sloppy mode.

My point was that the crap under discussion is already available in strict mode 
in existing implementations (except for the one in Firefox/SpiderMonkey). It just 
hasn’t been demonstrated yet whether the Web depends on this functionality in strict 
mode too. (It not working in Firefox is an indication that it may not, sure.)


Re: Early error on '0' followed by '8' or '9' in numeric literals does not seem to be web-compatible

2014-08-05 Thread Mathias Bynens
On 5 Aug 2014, at 17:05, Mark S. Miller erig...@google.com wrote:

 Strict mode should not accept octal literals.

The literals under discussion (e.g. `08` and `09`) are not octal literals.


Re: Early error on '0' followed by '8' or '9' in numeric literals does not seem to be web-compatible

2014-08-05 Thread Mathias Bynens
On 5 Aug 2014, at 17:19, Mark S. Miller erig...@google.com wrote:

 On Tue, Aug 5, 2014 at 8:17 AM, Mathias Bynens math...@qiwi.be wrote:
 
 The literals under discussion (e.g. `08` and `09`) are not octal literals.
 
 Strict mode should reject these even more vehemently! (Allen, can we have an 
 early vehement error?)

Now I’m confused again. That contradicts what Allen said earlier in this thread:

On 5 Aug 2014, at 16:20, Allen Wirfs-Brock al...@wirfs-brock.com wrote:

 Regarding, leading 0 constants in strict mode. The long term plan is to 
 eventually make them legal decimal constants.

I stand by my earlier suggestion:

1. Accept decimal integer literals with leading `0`, even in strict mode.
2. Interpret the value of such literals as octal in case they consist of octal 
digits only. (Note: this is already in Annex B – see 
`LegacyOctalIntegerLiteral`.)

Strict mode would accept `08` as it’s a zero-prefixed decimal literal but not 
`07` since that’s an octal literal.

This matches what all browsers already do (except Firefox), and fulfills the 
long-term plan Allen was talking about.
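For comparison with the suggestion above: as far as I know, the grammar that was eventually standardized makes both forms early errors in strict code, and current engines behave that way. A sketch, again using indirect `eval` so that strictness comes only from the directive inside the evaluated string; `throwsInStrictMode` is a hypothetical helper.

```js
// Hypothetical helper: does `source` throw a SyntaxError when evaluated as strict code?
function throwsInStrictMode(source) {
  try {
    (0, eval)("'use strict'; " + source);
    return false;
  } catch (error) {
    return error instanceof SyntaxError;
  }
}

console.log(throwsInStrictMode('07')); // true: legacy octal literal
console.log(throwsInStrictMode('08')); // true: decimal literal with a leading zero
console.log(throwsInStrictMode('8'));  // false
```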


Re: [Strawman proposal] StrictMath variant of Math

2014-08-01 Thread Mathias Bynens
On 1 Aug 2014, at 09:25, Carl Shapiro carl.shap...@gmail.com wrote:

 Thanks for the suggestion.
 
 As Ray pointed out, the Math package in Java still has its accuracy 
 requirements specified and so it is not analogous to the current status of 
 Math package in ES6.  Also, the StrictMath package and the strictfp class 
 qualifier came about in Java back when the x87 was the predominant FPU.  
 Because of the idiosyncrasies of the x87 one could not compute bit-identical 
 floating point results without additional overhead.  Nevertheless, the 
 accuracy requirements and conformance was still achieved with satisfactory 
 performance.  Much of the history is still available on-line
 
 http://math.nist.gov/javanumerics/reports/jgfnwg-minutes-6-00.html
 http://math.nist.gov/javanumerics/reports/jgfnwg-02.html
 
 It is unclear how much of these strict modes is still relevant given that 
 SSE2 is now the predominant FPU.  The strict modes were always effectively a 
 non-issue on other architectures.
 
 Much of the pressure to relax the accuracy of the special functions seems to 
 be coming from their use in various benchmark suites and the tireless efforts 
 of the compiler engineers to squeeze out additional performance gains.  
 Requiring bounds on the accuracy of the special functions has the additional 
 benefit of putting all the browsers on equal ground so nobody has to have 
 their product suffer the indignity of a benchmark loss because they try to do 
 the right thing in the name of numerical accuracy.

+1

Introducing a new global `Math` variant wouldn’t solve the interoperability 
issues. IMHO, the accuracy of the existing `Math` methods and properties should 
be codified in the spec instead.


Re: 5 June 2014 TC39 Meeting Notes

2014-06-14 Thread Mathias Bynens
On 13 Jun 2014, at 18:15, Domenic Denicola dome...@domenicdenicola.com wrote:

 IMO it would be a good universe where `<module>` had the following things 
 `<script>` has:
 
 - Does not require escaping'  in any contexts.
 - Terminates when seeing `</module` + extra chars. (Possibly we could do this 
 only when it would otherwise be a parsing error, to avoid `/mod + ule` 
 grossness? But that would require some intertwingling of the HTML and ES 
 parsers, which I can imagine implementers disliking.)
 
 But it removes the following things `<script>` has:
 
 - `<!--` escaped data mode and double-escaped mode
 - \r, \r\n, \0 special-casing
 - The two new single-line comment forms (maybe; I know these work in Node 
 though, so maybe just leave them in as part of the ES6 spec).

The majority of those are impossible without introducing different parse trees 
in old browsers (that do not recognize `<module>`) versus in new browsers. 
Different parse trees are a security risk.


Re: Idea for ECMAScript 7: Number.compare(a, b)

2014-06-06 Thread Mathias Bynens
On 6 Jun 2014, at 01:15, Axel Rauschmayer a...@rauschma.de wrote:

 It’d be nice to have a built-in way for comparing numbers, e.g. when sorting 
 arrays.
 
 ```js
 // Compact ECMAScript 6 solution
 // Risk: number overflow
 [1, 5, 3, 12, 2].sort((a, b) => a - b)
 
 // Proposed new function:
 [1, 5, 3, 12, 2].sort(Number.compare)
 ```

That sorts in ascending order. What if you need to sort in descending order? 
Would there need to be a built-in function for that too?
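One possible answer to the descending-order question is that a single comparator suffices, because swapping its arguments reverses the order. `Number.compare` does not exist; `compareNumbers` and `descending` below are hypothetical helpers. This version avoids the `a - b` subtraction, but like it, it defines no consistent order for `NaN`.

```js
// Hypothetical stand-in for the proposed Number.compare: returns -1, 0 or 1.
function compareNumbers(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Derives a descending comparator from any ascending one.
const descending = (compare) => (a, b) => compare(b, a);

console.log([1, 5, 3, 12, 2].sort(compareNumbers));             // [1, 2, 3, 5, 12]
console.log([1, 5, 3, 12, 2].sort(descending(compareNumbers))); // [12, 5, 3, 2, 1]
```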


Re: Array.prototype.last()

2014-05-14 Thread Mathias Bynens
Previous discussion on this topic: 
http://esdiscuss.org/topic/array-prototype-last

We should look at how existing utility libraries handle this behavior and base 
any proposals on that IMHO. Underscore and Lo-Dash have 
[`_.first`](http://lodash.com/docs#first) and 
[`_.last`](http://lodash.com/docs#last), which both take an optional `callback` 
parameter, in which case all the first/last `n` elements for which `callback` 
returns a truthy value are returned. This seems like a sensible thing to add to 
the proposal.
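A sketch of the callback variant as a standalone function rather than an `Array.prototype` method. `lastWhile` is a hypothetical name, and it assumes the "take elements from the end while the callback returns a truthy value" reading of the Lo-Dash behaviour. For the plain case, ES2022 later added `Array.prototype.at`, so `array.at(-1)` returns the last element.

```js
// Hypothetical helper: returns the trailing run of elements for which
// `predicate(element, index, array)` is truthy, scanning from the end.
function lastWhile(array, predicate) {
  let start = array.length;
  while (start > 0 && predicate(array[start - 1], start - 1, array)) {
    start--;
  }
  return array.slice(start);
}

console.log(lastWhile([1, 2, 3, 4], (n) => n > 2)); // [3, 4]
console.log(lastWhile([1, 2, 3, 4], (n) => n > 9)); // []
console.log([1, 2, 3, 4].at(-1));                   // 4
```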


Re: Native base64 utility methods

2014-05-08 Thread Mathias Bynens
On 5 May 2014, at 20:22, Andrea Giammarchi andrea.giammar...@gmail.com wrote:

 @mathias didn't mean to change atob and btoa rather add two extra methods 
 such encode/decode for strings (could land without problems in the 
 String.prototype, IMO) with less silly names whatever definition of silly 
 we have ^_^

Agreed. Moving `TextEncoder`/`TextDecoder` to ES would be nice (but it requires 
`ArrayBuffer` / `Uint8Array`). http://encoding.spec.whatwg.org/#api


Re: ToPropertyDescriptor, [[HasProperty]], [[HasOwnProperty]]

2014-05-08 Thread Mathias Bynens
On Fri, May 9, 2014 at 1:44 AM, John-David Dalton
john.david.dal...@gmail.com wrote:
 Should I create a spec bug for tracking this?

Please do.


Re: Native base64 utility methods

2014-05-05 Thread Mathias Bynens
On 5 May 2014, at 00:00, Andrea Giammarchi andrea.giammar...@gmail.com wrote:

 as generic global utility it would be also nice to make it compatible with 
 all strings.

For backwards compatibility reasons, `atob`/`btoa` should probably continue to 
work in exactly the same way they work now (i.e. as per 
http://whatwg.org/html/webappapis.html#atob). Otherwise, any existing code that 
uses `atob`/`btoa` before UTF-8-decoding or after UTF-8-encoding, including 
your snippet, would suddenly break.

Like you demonstrated, it’s easy enough to encode or decode the input using 
UTF-8 or any other character encoding before passing to `atob`/`btoa`. (E.g. 
http://mothereff.in/base64)
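A minimal sketch of that approach, assuming an environment with global `btoa`/`atob` and `TextEncoder`/`TextDecoder` (browsers, and Node.js 16+). `utf8ToBase64` and `base64ToUtf8` are hypothetical helper names. `btoa` keeps its existing behaviour and only accepts code points up to U+00FF, so the string is first turned into one character per UTF-8 byte.

```js
// Encode: string -> UTF-8 bytes -> "binary string" (one char per byte) -> base64.
function utf8ToBase64(str) {
  const bytes = new TextEncoder().encode(str);
  let binary = '';
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  return btoa(binary);
}

// Decode: base64 -> binary string -> bytes -> UTF-8 decoded string.
function base64ToUtf8(base64) {
  const binary = atob(base64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

console.log(utf8ToBase64('\u2713')); // 4pyT (✓ is E2 9C 93 in UTF-8)
console.log(base64ToUtf8(utf8ToBase64('na\u00EFve \u2713'))); // naïve ✓
```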



Re: Native base64 utility methods

2014-05-05 Thread Mathias Bynens
On 5 May 2014, at 10:48, Claude Pache claude.pa...@gmail.com wrote:

 In my view, if `atob` and `btoa` were to enter in ES, it should be in 
 Appendix B (the deprecated legacy features of web browsers), where it would 
 be in good company with the other utility that does an implicit confusion 
 between binary and ISO-8859-1-encoded strings, namely `escape/unescape`.

How do `atob` and `btoa` do any sort of implicit conversion between binary and 
any other encoding? Their behavior is well-defined, and they’re explicitly 
limited to extended ASCII.

I don’t think this is Annex B material regardless — this is not a legacy 
feature.

 We should be able to define a better designed function (and with a less silly 
 name, while we're at it).

That would kind of defeat the purpose IMHO. We’re stuck with `atob`/`btoa` 
anyway in browsers — adding yet another name for the same thing does not really 
help.


Native base64 utility methods

2014-05-04 Thread Mathias Bynens
To convert from base64 to ASCII and vice versa, browsers have had global `atob` 
and `btoa` functions for a while now. At the moment, these are defined in the 
HTML standard: http://whatwg.org/html/webappapis.html#atob

However, such utility methods are not only useful in browsers. How about adding 
these as global functions to ECMAScript so that they’re natively available in 
all JavaScript engines, not just in browser environments?


Re: RegExp.escape

2014-03-21 Thread Mathias Bynens
On 21 Mar 2014, at 16:38, C. Scott Ananian <ecmascr...@cscott.net> wrote:

> ```js
> function replaceTitle(title, str) {
>   return str.replace(new RegExp(title), ...);
> }
> ```
>
> There ought to be a standard simple way of writing this correctly.

I’ve used something like this in the past:

RegExp.escape = function(text) {
  return text.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, '\\$&');
};

It escapes some characters that do not strictly need escaping to avoid bugs in 
ancient JavaScript engines. A standardized version could be even simpler, and 
would indeed be very welcome IMHO.
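For illustration, here is that snippet applied to the `replaceTitle` case as a standalone function (`escapeRegExp` is a hypothetical name, chosen to avoid patching a built-in). The escaped output is meant for patterns *without* the `u` flag: under `u`, identity escapes such as `\-` or `\#` are syntax errors.

```javascript
// Hypothetical helper mirroring the snippet above: prefix every character
// that could be special in a (non-`u`) regular expression with a backslash.
function escapeRegExp(text) {
  return text.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, '\\$&'); // `$&` = the matched character
}

function replaceTitle(title, str, replacement) {
  return str.replace(new RegExp(escapeRegExp(title), 'g'), replacement);
}

console.log(escapeRegExp('C++ (2014)'));                        // C\+\+\ \(2014\)
console.log(replaceTitle('C++', 'I like C++ and C.', 'Rust')); // I like Rust and C.
```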


Re: Array.prototype.contains

2014-03-05 Thread Mathias Bynens
On 5 Mar 2014, at 17:19, Domenic Denicola <dome...@domenicdenicola.com> wrote:

> Personally I think the more useful model to follow than
> `String.prototype.contains` is `Set.prototype.has`.

But then DOM4 `DOMStringList` would still have its own `contains` _and_ the 
`has` it inherits from `Array.prototype`. That seems confusing, no?


Re: Another switch

2014-02-20 Thread Mathias Bynens
On 20 Feb 2014, at 21:20, Eric Elliott <e...@ericleads.com> wrote:

> Object literals are already a great alternative to switch in JS:
>
> var cases = {
>   val1:  function () {},
>   val2: function () {}
> };
>
> cases[val]();

In that case, you’d need a `hasOwnProperty` check to make sure you’re not 
trying to call `__proto__` or `toString`, etc. See 
https://github.com/rwaldron/idiomatic.js/#misc for a more complete example.
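A minimal sketch of such a guarded lookup (the handler bodies and the `'default'` fallback are made up for illustration). Calling `Object.prototype.hasOwnProperty` via `.call` also keeps working if the table itself ever gets a `hasOwnProperty` key:

```javascript
var cases = {
  val1: function () { return 'one'; },
  val2: function () { return 'two'; }
};

function dispatch(val) {
  // Only own properties count; inherited ones such as `toString` or `__proto__` are ignored.
  if (Object.prototype.hasOwnProperty.call(cases, val)) {
    return cases[val]();
  }
  return 'default';
}

console.log(dispatch('val1'));     // one
console.log(dispatch('toString')); // default
```

Without the check, `cases['toString']()` would return `'[object Object]'` and `cases['__proto__']()` would throw a `TypeError`.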



Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2014-02-15 Thread Mathias Bynens
On 14 Feb 2014, at 19:59, Allen Wirfs-Brock <al...@wirfs-brock.com> wrote:

> It's a really high bar to get over that closed gate.  Unless the exclusion of
> a feature was a mistake […] I don't think we should be talking about adding
> it to ES6.

It does feel like a mistake to me to introduce `String.prototype.codePointAt`, 
but no similar function that returns the symbol instead.


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2014-02-14 Thread Mathias Bynens
Allen mentioned that `String#at` might not make it to ES6 because nobody in 
TC39 is championing it. I’ve now asked Rick if he would be the champion for 
this, and he agreed. (Thanks again!)

Looking over the ‘TC39 progress’ document at 
https://docs.google.com/a/chromium.org/document/d/1QbEE0BsO4lvl7NFTn5WXWeiEIBfaVUF7Dk0hpPpPDzU,
it seems most of the work is already taken care of: the use case was discussed
in this thread, the proposal has a complete spec text, and there’s an example 
implementation/polyfill with unit tests. See http://mths.be/at.

Is there anything else I can do to help get this included as a non-TC39-member?


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2014-02-14 Thread Mathias Bynens
On 14 Feb 2014, at 11:11, Domenic Denicola <dome...@domenicdenicola.com> wrote:

> This was the method that was only useful if you pass `0` to it?

I’ll just avoid the infinite loop here by pointing to earlier posts in this 
thread where this was discussed before: 
http://esdiscuss.org/topic/string-prototype-symbolat-improved-string-prototype-charat#content-34
and
http://esdiscuss.org/topic/string-prototype-symbolat-improved-string-prototype-charat#content-40.

This method is just as useful as `String.prototype.codePointAt`. If that method 
is included, so should `String.prototype.at`. If `String.prototype.at` is found 
not to be useful, `String.prototype.codePointAt` should be removed too.


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2014-02-14 Thread Mathias Bynens
On 14 Feb 2014, at 11:14, C. Scott Ananian <ecmascr...@cscott.net> wrote:

> Note that `Array.from(str)` and `str[Symbol.iterator]` overlap
> significantly.  In particular, it's somewhat awkward to iterate over
> code points using `String#symbolAt`; it's much easier to use
> `substr()` and then use the StringIterator.

`String#at` is not meant for iterating over code points – that’s what the 
`StringIterator` is for.

`String#at` is exactly like `String#codePointAt` except it returns strings 
(containing the symbol) instead of numbers (representing the code point value). 
It can be used to get the symbol at a given code unit position in a string 
(similar to how `String#codePointAt` can be used to get the code point at a 
given code unit position in a string).
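Put differently, for an in-range index the proposed method amounts to `String.fromCodePoint(str.codePointAt(position))`. A sketch as a plain function with a hypothetical name. Note that the `String.prototype.at` that eventually shipped (ES2022) is a different method: it returns a single code unit and accepts negative indices.

```javascript
// Hypothetical helper: the symbol (code point, as a string) at a code unit position.
function symbolAtPosition(str, position) {
  const codePoint = str.codePointAt(position); // undefined when out of range
  return codePoint === undefined ? '' : String.fromCodePoint(codePoint);
}

const tetragram = '\uD834\uDF06'; // U+1D306, one astral symbol
console.log(symbolAtPosition(tetragram, 0) === tetragram); // true
console.log(symbolAtPosition(tetragram, 1) === '\uDF06');  // true: lone trail surrogate
console.log(tetragram.codePointAt(0).toString(16));       // 1d306
```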


Re: comment overflow

2014-02-10 Thread Mathias Bynens
On 10 Feb 2014, at 10:30, Michael Dyck <jmd...@ibiblio.org> wrote:

> On a more meta level, do the process plans for ES7 include any new
> mechanisms for:
> (a) submitting comments on spec drafts, or
> (b) reducing the number of errors in spec drafts to begin with?

If only the spec were maintained in a plain text-based format (like Markdown or
even HTML), it would be easy to host its repository on, say, GitHub, which would
enable commenting on inline diffs (ideal for pointing out small typos, etc.).
That way, it would also be possible to link to specific lines in a specific
revision of the spec. Those things alone would avoid a lot of the overhead
currently involved in filing bugs, IMHO.

This brings us back to the good old let’s-stop-using-a-Word-document discussion.


Re: Ecmascript.org

2014-01-31 Thread Mathias Bynens

> I was wondering who was in charge of the ecmascript.org web site.

$ whois ecmascript.org
[snip]
Registrant Organization:Mozilla Corporation
Registrant Street: 650 Castro St Ste 300
Registrant City:Mountain View
Registrant State/Province:CA
Registrant Postal Code:94041
Registrant Country:US
Registrant Phone:+1.6509030800
Registrant Phone Ext:
Registrant Fax:
Registrant Fax Ext:
Registrant Email:hostmas...@mozilla.com
[snip]

Mozillians can probably tell you more.



Re: `String.prototype.contains(regex)`

2013-12-23 Thread Mathias Bynens
On 18 Dec 2013, at 23:02, Benjamin (Inglor) Gruenbaum <ing...@gmail.com> wrote:

> If anything, I'd expect all of them to throw when passed multiple arguments
> for forward compatibility. It might be useful to check multiple values in
> contains/endsWith/startsWith or constrain it in some way.

The reason `String.prototype.{starts,ends}With` throw when passed a regular 
expression is forward compatibility:

> Note 2. Throwing an exception if the first argument is a RegExp is specified
> in order to allow future editions to define extends that allow such argument
> values.

It seems that `contains` was forgotten about when 
https://bugs.ecmascript.org/show_bug.cgi?id=498#c3 was fixed, so I’ve filed 
https://bugs.ecmascript.org/show_bug.cgi?id=2407 asking to make 
`String.prototype.contains(regex)` throw as well.
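For reference, `contains` was later renamed and shipped as `String.prototype.includes`, and like `startsWith`/`endsWith` it throws a `TypeError` for a RegExp argument. An object can opt out of being treated as a RegExp by setting `Symbol.match` to `false`. A quick check in a current engine (the helper name is hypothetical):

```javascript
// Hypothetical helper: does calling fn throw a TypeError?
function throwsTypeError(fn) {
  try {
    fn();
    return false;
  } catch (error) {
    return error instanceof TypeError;
  }
}

console.log(throwsTypeError(() => 'abc'.startsWith(/a/))); // true
console.log(throwsTypeError(() => 'abc'.includes(/a/)));   // true

// With Symbol.match set to false, the RegExp is stringified like any other value.
const re = /a/;
re[Symbol.match] = false;
console.log('/a/b'.startsWith(re)); // true: compares against String(re), i.e. '/a/'
```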


`String.prototype.contains(regex)`

2013-12-18 Thread Mathias Bynens
Both `String.prototype.startsWith` and `String.prototype.endsWith` throw a 
`TypeError` if the first argument is a RegExp:

> Throwing an exception if the first argument is a RegExp is specified in order
> to allow future editions to define extends that allow such argument values.

However, this is not the case for `String.prototype.contains`, even though it’s 
a very similar method. As per the latest ES6 draft, 
`String.prototype.contains(regex)` behaves like 
`String.prototype.contains(String(regex))`. This seems inconsistent. What’s the 
reason for this inconsistency?


Function.prototype.apply() Function.prototype.call() with `undefined` or `null` as `thisArg`

2013-12-10 Thread Mathias Bynens
From http://ecma-international.org/ecma-262/5.1/#sec-15.3.4.3 and 
http://ecma-international.org/ecma-262/5.1/#sec-15.3.4.4:

> The `thisArg` value is passed without modification as the `this` value. This
> is a change from Edition 3, where a `undefined` or `null` `thisArg` is
> replaced with the global object and `ToObject` is applied to all other values
> and that result is passed as the `this` value.

It seems like modern engines still have the ES3 behavior:

function foo() {
  console.log(this);
  return this;
};
foo.call(undefined) === undefined; // `false`, expected `true`

I’ve tested this in SpiderMonkey/Firefox, Carakan/Presto/Opera, JSC/Safari, and
V8/Chrome. They all show FAIL in this test case:

data:text/html,<script>function foo() { console.log(this); return this; }; document.write(foo.call(undefined) === undefined %3F 'PASS' %3A 'FAIL');</script>

Is this…

1. a wilful violation of the ES5 spec for back-compat reasons, or…
2. an oversight that this never got implemented, or…
3. am I misreading the spec?

If 1 is the case, the ES6 spec should match reality by reverting the change 
introduced in ES5.


Re: Function.prototype.apply() Function.prototype.call() with `undefined` or `null` as `thisArg`

2013-12-10 Thread Mathias Bynens
Turns out this is a bug in the spec: 
https://bugs.ecmascript.org/show_bug.cgi?id=2370
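The behavior observed above comes from function entry rather than from `call`/`apply` themselves: `call` passes `thisArg` through unchanged, but non-strict function code replaces `undefined`/`null` with the global object and boxes primitives (ES5 §10.4.3), whereas strict-mode functions see `thisArg` as-is. A quick demonstration (function names are illustrative):

```javascript
// `new Function` always creates non-strict (sloppy-mode) code, even when called from strict code.
const sloppyThis = new Function('return this;');

function strictThis() {
  'use strict';
  return this;
}

console.log(sloppyThis.call(undefined) === globalThis); // true: replaced by the global object
console.log(strictThis.call(undefined) === undefined);  // true: passed through unchanged
console.log(typeof sloppyThis.call(42));                // object (a Number wrapper)
console.log(strictThis.call(42) === 42);                // true
```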


Re: Using Unicode code point escape sequences in regular expressions without the `u` flag

2013-11-22 Thread Mathias Bynens
One more related question: are these three regular expression literals 
equivalent?

1. `/[💩-💫]/u`: raw astral symbols
2. `/[\u{1F4A9}-\u{1F4AB}]/u`: astral symbols represented using Unicode code 
point escape sequences
3. `/[\uD83D\uDCA9-\uD83D\uDCAB]/u`: astral symbols represented as a surrogate 
pair



Re: Using Unicode code point escape sequences in regular expressions without the `u` flag

2013-11-22 Thread Mathias Bynens
On 22 Nov 2013, at 11:20, Allen Wirfs-Brock <al...@wirfs-brock.com> wrote:

> Did you check the ES6 draft grammar[1]?  The answer to that should be fairly
> obvious there and if it isn't it would be good to know so we can try to make
> it clearer in the spec.
>
> [1]: http://people.mozilla.org/~jorendorff/es6-draft.html#sec-patterns

It’s pretty clear that (1) is equivalent to (2). I guess (3) is equivalent to 
(1) and (2) because of the following:

RegExpUnicodeEscapeSequence[U] ::
[+U] LeadSurrogate \u TrailSurrogate

…but I was looking for confirmation.
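One way to check this empirically in an engine that supports the `u` flag: all three literals accept the same astral symbols and reject a lone surrogate. Without the `u` flag, the surrogate-pair form is read as separate code units, which yields the out-of-order range `\uDCA9-\uD83D`, so the pattern fails to compile.

```javascript
const raw = /[💩-💫]/u;                      // raw astral symbols
const escaped = /[\u{1F4A9}-\u{1F4AB}]/u;    // code point escapes
const pair = /[\uD83D\uDCA9-\uD83D\uDCAB]/u; // surrogate pairs

for (const sample of ['\u{1F4A9}', '\u{1F4AA}', '\u{1F4AB}', '\u{1F4AC}', '\uD83D']) {
  console.log(raw.test(sample), escaped.test(sample), pair.test(sample));
}
// true for U+1F4A9 to U+1F4AB in all three columns; false for U+1F4AC and the lone surrogate.

let threw = false;
try {
  new RegExp('[\\uD83D\\uDCA9-\\uD83D\\uDCAB]'); // same source, no `u` flag
} catch (error) {
  threw = error instanceof SyntaxError; // out-of-order character class range
}
console.log(threw); // true
```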


Using Unicode code point escape sequences in regular expressions without the `u` flag

2013-11-21 Thread Mathias Bynens
If I’m reading the latest draft correctly, `RegExpUnicodeEscapeSequence`s 
aren’t allowed in regular expressions without the `u` flag. Why is that?

AFAICT, the only situations that require looking at code points rather than 
UCS-2/UTF-16 code units in order to support full Unicode are:

* the regex is case-insensitive;
* the regex contains a character class;
* the regex uses `.`;
* the regex uses a quantifier.

I’d suggest allowing `\u{xx}`-style escape sequences everywhere, and simply 
changing the behavior of the resulting regular expression depending on the `u` 
flag. There’s no good reason to disallow e.g. `/\u{20}/` or even `/\u{1F4A9}/`.
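One backwards-compatibility hazard is worth spelling out, though: in engines implementing the web-compatibility regex grammar (Annex B), `\u` not followed by four hex digits is an identity escape for the letter `u`, so without the `u` flag `/\u{20}/` already means "the letter u, twenty times". A quick check in a current engine:

```javascript
const withoutFlag = /\u{20}/; // Annex B: `\u` is a literal "u", `{20}` is a quantifier
const withFlag = /\u{20}/u;   // code point escape for U+0020 SPACE

console.log(withoutFlag.test('u'.repeat(20))); // true
console.log(withoutFlag.test(' '));            // false
console.log(withFlag.test(' '));               // true
```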


Re: [Json] BOMs

2013-11-21 Thread Mathias Bynens
On 21 Nov 2013, at 09:41, Bjoern Hoehrmann <derhoe...@gmx.net> wrote:

> Is there any chance, by the way, to change `JSON.stringify` so it does
> not output strings that cannot be encoded using UTF-8? Specifically,
>
>   JSON.stringify(JSON.parse("\"\\uD800\""))
>
> would need to escape the surrogate instead of emitting it literally.

Previous discussion: 
http://esdiscuss.org/topic/code-points-vs-unicode-scalar-values#content-14
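For what it’s worth, this was eventually addressed by the "well-formed `JSON.stringify`" change in ES2019: lone surrogates are now emitted as `\u` escapes (in lowercase hex), while properly paired surrogates are still output literally. A quick check:

```javascript
const lone = JSON.parse('"\\uD800"'); // a one-code-unit string holding a lone surrogate
console.log(lone.length);             // 1
console.log(JSON.stringify(lone));    // "\ud800" (escaped, so the output is UTF-8-encodable)
console.log(JSON.stringify('\uD83D\uDCA9') === '"\uD83D\uDCA9"'); // true: pairs stay literal
```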



Re: Working with grapheme clusters

2013-10-27 Thread Mathias Bynens
On 26 Oct 2013, at 14:39, Bjoern Hoehrmann <derhoe...@gmx.net> wrote:

> * Norbert Lindenberg wrote:
>> On Oct 25, 2013, at 18:35, Jason Orendorff <jason.orendo...@gmail.com>
>> wrote:
>>
>>> UTF-16 is designed so that you can search based on code units
>>> alone, without computing boundaries. RegExp searches fall in this
>>> category.
>>
>> Not if the RegExp is case insensitive, or uses a character class, or ., or a
>> quantifier - these all require looking at code points rather than UTF-16 code
>> units in order to support the full Unicode character set.
>
> If you have a regular expression over an alphabet like Unicode scalar
> values it is easy to turn it into an equivalent regular expression over
> an alphabet like UTF-16 code units.

FWIW, [Regenerate](http://mths.be/regenerate) is a JavaScript library that can 
be used for this. A few examples from 
http://mathiasbynens.be/notes/javascript-unicode#regex:

> Here, a regular expression is created that matches any Unicode scalar value:
>
>   regenerate()
>     .addRange(0x0, 0x10FFFF)     // all Unicode code points
>     .removeRange(0xD800, 0xDBFF) // minus high surrogates
>     .removeRange(0xDC00, 0xDFFF) // minus low surrogates
>     .toRegExp()
>   // → /[\0-\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]/


Similarly, to polyfill `.` in a Unicode-enabled ES6 regex:

> When the `u` flag is set, `.` is equivalent to the following
> backwards-compatible regular expression pattern:
>
>   regenerate()
>     .addRange(0x0, 0x10FFFF) // all Unicode code points
>     .remove( // minus `LineTerminator`s (http://ecma-international.org/ecma-262/5.1/#sec-7.3):
>       0x000A, // Line Feed LF
>       0x000D, // Carriage Return CR
>       0x2028, // Line Separator LS
>       0x2029  // Paragraph Separator PS
>     )
>     .toString();
>   // → '[\0-\x09\x0B\x0C\x0E-\u2027\u202A-\uD7FF\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF]'
>
>   /foo(?:[\0-\x09\x0B\x0C\x0E-\u2027\u202A-\uD7FF\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF])bar/.test('foo💩bar')
>   // → true
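To sanity-check that equivalence without Regenerate, one can compare the quoted pattern (compiled without the `u` flag) against a native `/^.$/u` on a few edge cases: a BMP letter, line terminators, an astral symbol, and lone surrogates. This is a spot check, not a proof:

```javascript
// Pattern copied from the quoted regenerate output, wrapped in a non-capturing group.
const dotEquivalent =
  '(?:[\\0-\\x09\\x0B\\x0C\\x0E-\\u2027\\u202A-\\uD7FF\\uDC00-\\uFFFF]' +
  '|[\\uD800-\\uDBFF][\\uDC00-\\uDFFF]|[\\uD800-\\uDBFF])';
const polyfilled = new RegExp('^' + dotEquivalent + '$'); // no `u` flag needed
const native = /^.$/u;

const samples = ['a', '\n', '\u2028', '\uD83D\uDCA9', '\uD83D', '\uDCA9', 'ab'];
for (const sample of samples) {
  console.log(JSON.stringify(sample), native.test(sample), polyfilled.test(sample));
}
// Both columns agree for every sample; a plain `/^.$/` would reject '\uD83D\uDCA9'.
```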


Re: Working with grapheme clusters

2013-10-24 Thread Mathias Bynens
On 24 Oct 2013, at 16:02, Claude Pache <claude.pa...@gmail.com> wrote:

> Therefore, I propose the following basic operations to operate on grapheme
> clusters:

Out of curiosity, is there any programming language that operates on grapheme 
clusters (rather than code points) by default? FWIW, code point iteration is 
what I’d expect in any language.

>   text.graphemeAt(0) // get the first grapheme of the text
>
>   // shorten a text to its first hundred graphemes
>   var shortenText = ''
>   let numGraphemes = 0
>   for (let grapheme of text) {
>     numGraphemes += 1
>     if (numGraphemes > 100) {
>       shortenText += '…'
>       break
>     }
>     shortenText += grapheme
>   }

So, you would want to change the string iterator’s behavior too?

> As a side note, I ask whether the `String.prototype.symbolAt`/`String.prototype.at`
> as proposed in a recent thread, and the `String.prototype[@@iterator]` as
> currently specified, are really what people need, or if they would mistakenly
> use them with the intended meaning of `String.prototype.graphemeAt` and
> `String.prototype.graphemes` as discussed in the present message?

I don’t think this would be an issue. The new `String` methods and the iterator 
are well-defined and documented in terms of *code points*.

IMHO combining marks are easy enough to match and special-case in your code if 
that’s what you need. You could use a regular expression to iterate over all 
grapheme clusters in the string:

// Based on the example on
// http://mathiasbynens.be/notes/javascript-unicode#accounting-for-other-combining-marks
var regexGraphemeCluster = /([\0-\u02FF\u0370-\u1DBF\u1E00-\u20CF\u2100-\uD7FF\uDC00-\uFE1F\uFE30-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF])([\u0300-\u036F\u1DC0-\u1DFF\u20D0-\u20FF\uFE20-\uFE2F]*)/g;

var zalgo = 
'Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞';

zalgo.match(regexGraphemeCluster);
[
  Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍,
  A̴̵̜̰͔ͫ͗͢,
  L̠ͨͧͩ͘,
  G̴̻͈͍͔̹̑͗̎̅͛́,
  Ǫ̵̹̻̝̳͂̌̌͘,
  !͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞
]
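The combining-mark ranges above only approximate grapheme clusters; they do not cover, for example, Hangul jamo sequences or emoji ZWJ sequences. In engines that later gained `Intl.Segmenter` (ECMA-402), the quoted "first hundred graphemes" loop can segment by extended grapheme cluster directly. A sketch (the `maxGraphemes` parameter is a hypothetical generalization of the hard-coded 100):

```javascript
// Truncate `text` to at most `maxGraphemes` extended grapheme clusters.
function shortenText(text, maxGraphemes) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  let result = '';
  let count = 0;
  for (const { segment } of segmenter.segment(text)) {
    count += 1;
    if (count > maxGraphemes) {
      return result + '…';
    }
    result += segment;
  }
  return result;
}

console.log(shortenText('man\u0303ana', 3)); // mañ… (n + U+0303 counts as one grapheme)
console.log([...'man\u0303ana'].length);      // 7 code points
```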


Re: Working with grapheme clusters

2013-10-24 Thread Mathias Bynens
On 24 Oct 2013, at 16:22, Anne van Kesteren <ann...@annevk.nl> wrote:

> On Thu, Oct 24, 2013 at 3:02 PM, Claude Pache <claude.pa...@gmail.com> wrote:
>> As a side note, I ask whether the `String.prototype.symbolAt`/`String.prototype.at`
>> as proposed in a recent thread,
>> and the `String.prototype[@@iterator]` as currently specified, are really
>> what people need,
>> or if they would mistakenly use them with the intended meaning of
>> `String.prototype.graphemeAt`
>> and `String.prototype.graphemes` as discussed in the present message?
>>
>> Thoughts?
>
> If we want to make it easier for developers to work with text, we should
> offer them functionality at the grapheme cluster level and not distract
> everyone with code units and code points. Thanks for making a proposal!

I’d welcome grapheme helper methods (even though the ES6 string methods already 
make it easier to deal with grapheme clusters than ever before), but I strongly 
disagree the string iterator should be changed. I think the use case of 
iterating over code points is much more common.

Imagine you’re writing a JavaScript library that escapes a given string as an 
HTML character reference, or as a CSS identifier, or anything else. In those 
cases, you don’t care about grapheme clusters, you care about code points, 
because those are the units you end up escaping individually.
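A minimal sketch of such an escaper, assuming hexadecimal numeric character references as the output format (the function name is hypothetical). Because string iteration yields one code point per step, an astral symbol becomes a single reference rather than two invalid surrogate references:

```javascript
// Hypothetical escaper: every code point becomes one hexadecimal character reference.
function toHexCharRefs(str) {
  let out = '';
  for (const symbol of str) { // string iteration is by code point
    out += '&#x' + symbol.codePointAt(0).toString(16).toUpperCase() + ';';
  }
  return out;
}

console.log(toHexCharRefs('a\uD83D\uDCA9')); // &#x61;&#x1F4A9;
```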



Re: Making the identifier identification strawman less restrictive

2013-10-22 Thread Mathias Bynens
On 14 Oct 2013, at 23:21, Erik Arvidsson <erik.arvids...@gmail.com> wrote:

> I'm concerned about the latest version of this on the wiki. The
> edition parameter requires that we ship 2 tables today. This seems
> like it might change to 3 in ES7 and n in ES(n+4). I think the only
> reasonable requirement is that it matches what the engine actually
> uses. For tools it seems better for them to include this table. I
> don't want all runtimes to have to pay for something that only tools
> need.

This strawman is only useful for tools. If tools need to implement this 
themselves, this basically means the strawman is rejected, right?


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-20 Thread Mathias Bynens
On 19 Oct 2013, at 12:54, Domenic Denicola <dome...@domenicdenicola.com> wrote:

> My proposed cowpaths:
>
> ```js
> Object.mixin(String.prototype, {
>   realCharacterAt(i) {
>     let index = 0;
>     for (var c of this) {
>       if (index++ === i) {
>         return c;
>       }
>     }
>   },
>   get realLength() {
>     let counter = 0;
>     for (var c of this) {
>       ++counter;
>     }
>     return counter;
>   }
> });
> ```

Good stuff!

To account for [lookalike symbols due to combining marks] [1], just add a call 
to `String.prototype.normalize`:

Object.mixin(String.prototype, {
  get realLength() {
    let counter = 0;
    for (var c of this.normalize('NFC')) {
      ++counter;
    }
    return counter;
  }
});

assert('ma\xF1ana'.realLength == 'man\u0303ana'.realLength);

[1]: http://mathiasbynens.be/notes/javascript-unicode#accounting-for-lookalikes
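`Object.mixin` did not make it into the final ES2015 specification, so here is the same idea as a plain function (hypothetical name). NFC only helps where a precomposed character exists: a base letter plus a combining mark with no precomposed form still counts as two code points.

```javascript
// Hypothetical helper: count code points after NFC normalization.
function realLength(str) {
  return [...str.normalize('NFC')].length;
}

console.log(realLength('ma\xF1ana'));    // 6
console.log(realLength('man\u0303ana')); // 6: n + U+0303 composes to U+00F1
console.log(realLength('a\u20D7'));      // 2: no precomposed form exists
console.log(realLength('\uD83D\uDCA9')); // 1
```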



Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-19 Thread Mathias Bynens
On 19 Oct 2013, at 12:15, Bjoern Hoehrmann <derhoe...@gmx.net> wrote:

> Certainly not common enough to warrant a two-character method on the
> native string type. Odds are people will use it incorrectly in an
> attempt to make their code look concise […]

Are you saying that changing the name to something that is longer than `at` 
would solve this problem?

> […] not understanding that it'll retrieve a substring of .length 1 or 2,
> possibly consisting of a lone surrogate, based on a 16 bit index that
> might fall in the middle of a character; the problematic cases are
> fairly rare, so it's hard to notice improper use of `.at` in automated
> testing or in code review.

People are using `String.prototype.charAt()` incorrectly too, expecting it to 
return whole symbols instead of surrogate halves wherever possible. How would 
_not_ introducing a method that avoids this problem help?


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-19 Thread Mathias Bynens
On 19 Oct 2013, at 00:53, Domenic Denicola <dome...@domenicdenicola.com> wrote:

> On 19 Oct 2013, at 01:12, Mathias Bynens <math...@qiwi.be> wrote:
>> `String.prototype.codePointAt` or `String.prototype.at` come in handy in
>> case you only need to get the first code point or symbol in a string, for
>> example.
>
> Are they useful for anything else, though? For example, if I wanted to get
> the second symbol in a string, how would I do that?

Yeah, that’s the problem with these methods. Additional user code is required 
to handle non-zero `position` arguments, unless you’re sure the `position` is 
actually the start of a code point (and not in the middle of a surrogate pair). 
I guess there are situations where that’s a certainty, for example when you’re 
dealing with a string in which the user selected some text.

This brings us back to the earlier discussion of whether something like 
`String.prototype.codePoints` should be added: 
http://esdiscuss.org/topic/how-to-count-the-number-of-symbols-in-a-string It 
could be a getter or a generator… Or does `for…of` iteration handle this use 
case adequately?


`String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-18 Thread Mathias Bynens
ES6 fixes `String.fromCharCode` by introducing `String.fromCodePoint`.

Similarly, `String.prototype.charCodeAt` is fixed by 
`String.prototype.codePointAt`.

Should there be a method that is like `String.prototype.charAt` except it deals 
with astral Unicode symbols wherever possible?

> '𝌆'.charAt(0) // U+1D306
'\uD834' // the first surrogate half for U+1D306

> '𝌆'.symbolAt(0) // U+1D306
'𝌆' // U+1D306

Has this been discussed before? If there’s any interest I’d be happy to create 
a strawman.

Mathias  
http://mathiasbynens.be/


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-18 Thread Mathias Bynens
On 18 Oct 2013, at 09:21, Rick Waldron <waldron.r...@gmail.com> wrote:

> I think the idea is good, but the name may be confusing with regard to
> Symbols (maybe not?)

Yeah, I thought about that, but couldn’t figure out a better name. “Glyph” or 
“Grapheme” wouldn’t be accurate. Any suggestions?

Anyway, if everyone agrees this is a good idea I’ll get started on fleshing out 
a proposal. We can then use this thread to bikeshed about the name.



Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-18 Thread Mathias Bynens
Here’s my proposal. Feedback welcome, as well as suggestions for a better name 
(if any).

## String.prototype.symbolAt(pos)

NOTE: Returns a String of one or two elements containing the code point at
element position `pos` in the String `value` resulting from converting the
`this` object to a String. If there is no element at that position, the result
is the empty String. The result is a String value, not a String object.

When the `symbolAt` method is called with one argument `pos`, the following 
steps are taken:

01. Let `O` be `CheckObjectCoercible(this value)`.
02. Let `S` be `ToString(O)`.
03. `ReturnIfAbrupt(S)`.
04. Let `position` be `ToInteger(pos)`.
05. `ReturnIfAbrupt(position)`.
06. Let `size` be the number of elements in `S`.
07. If `position < 0` or `position ≥ size`, return the empty String.
08. Let `first` be the code unit at index `position` in the String `S`.
09. Let `cuFirst` be the code unit value of the element at index `0` in the
String `first`.
10. If `cuFirst < 0xD800` or `cuFirst > 0xDBFF` or `position + 1 = size`, then
return `first`.
11. Let `cuSecond` be the code unit value of the element at index
`position + 1` in the String `S`.
12. If `cuSecond < 0xDC00` or `cuSecond > 0xDFFF`, then return `first`.
13. Let `second` be the code unit at index `position + 1` in the String `S`.
14. Let `cp` be `(cuFirst – 0xD800) × 0x400 + (cuSecond – 0xDC00) + 0x10000`.
15. Return the elements of the UTF-16 Encoding (clause 6) of `cp`.

NOTE: The `symbolAt` function is intentionally generic; it does not require 
that its `this` value be a String object. Therefore it can be transferred to 
other kinds of objects for use as a method.
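A direct JavaScript transcription of the steps above, as a sketch. The function name is hypothetical, `String(value)` stands in for `ToString`, and `Math.trunc` with a NaN-to-0 fallback approximates `ToInteger`:

```javascript
function symbolAtSpec(value, pos) {
  if (value === null || value === undefined) {                  // 01. CheckObjectCoercible
    throw new TypeError('symbolAt called on null or undefined');
  }
  const S = String(value);                                       // 02.-03. ToString
  const position = Math.trunc(Number(pos)) || 0;                 // 04.-05. ToInteger (NaN -> 0)
  const size = S.length;                                         // 06.
  if (position < 0 || position >= size) return '';               // 07.
  const first = S[position];                                     // 08.
  const cuFirst = S.charCodeAt(position);                        // 09.
  if (cuFirst < 0xD800 || cuFirst > 0xDBFF || position + 1 === size) return first; // 10.
  const cuSecond = S.charCodeAt(position + 1);                   // 11.
  if (cuSecond < 0xDC00 || cuSecond > 0xDFFF) return first;      // 12.
  const cp = (cuFirst - 0xD800) * 0x400 + (cuSecond - 0xDC00) + 0x10000; // 13.-14.
  return String.fromCodePoint(cp);                               // 15. UTF-16 encode
}
```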



Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-18 Thread Mathias Bynens
On 18 Oct 2013, at 10:48, Anne van Kesteren <ann...@annevk.nl> wrote:

> On Fri, Oct 18, 2013 at 1:46 PM, Mathias Bynens <math...@qiwi.be> wrote:
>> Similarly, `String.prototype.charCodeAt` is fixed by
>> `String.prototype.codePointAt`.
>
> When you phrase it like that, I see another problem with
> codePointAt(). You can't just replace existing usage of charCodeAt()
> with codePointAt() as that would fail for input with paired
> surrogates. E.g. a simple loop over a string that prints code points
> would print both the code point and the trail surrogate code point for
> a surrogate pair.

I disagree. In those situations you should just iterate over the string using 
`for…of`.

`.symbolAt()` can be a useful replacement for `.charAt()` in case you only need 
to get the first symbol in the string. The same goes for `.codePointAt()` vs. 
`.charCodeAt()`.
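The two approaches side by side (both helpers are hypothetical): indexing every code unit with `codePointAt` reports the trail surrogate as an extra "code point", while `for…of` iteration (here via `Array.from`) visits each code point exactly once:

```javascript
function codePointsByIndex(str) {
  const result = [];
  for (let i = 0; i < str.length; i++) {
    result.push(str.codePointAt(i)); // also runs at the trail surrogate's index
  }
  return result;
}

function codePointsByIteration(str) {
  return Array.from(str, (symbol) => symbol.codePointAt(0)); // one entry per code point
}

const input = 'a\uD834\uDF06';
console.log(codePointsByIndex(input).map((cp) => cp.toString(16)));     // [ '61', '1d306', 'df06' ]
console.log(codePointsByIteration(input).map((cp) => cp.toString(16))); // [ '61', '1d306' ]
```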


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-18 Thread Mathias Bynens
On 18 Oct 2013, at 11:05, Anne van Kesteren <ann...@annevk.nl> wrote:

> On Fri, Oct 18, 2013 at 4:58 PM, Mathias Bynens <math...@qiwi.be> wrote:
>> I disagree. In those situations you should just iterate over the string
>> using `for…of`.
>
> That seems to iterate over code units as far as I can tell.
>
>   for (var x of '𝌆')
>     print(x.charCodeAt(0))
>
> invokes print() twice in Gecko.

Whoa, that doesn’t seem very useful. Is that a bug, or the way it’s supposed to
work? I thought it was supposed to only iterate over whole code points (i.e. 
only print once for each code point, not once for each surrogate half).


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-18 Thread Mathias Bynens
Please ignore my previous email; it has been answered already. (It was a draft 
I wrote up this morning before I lost my internet connection.)

On 18 Oct 2013, at 11:57, Allen Wirfs-Brock <al...@wirfs-brock.com> wrote:

> Given that we have charAt, charCodeAt and codePointAt,  I think the most
> appropiate name for such a method would be 'at':
>      '𝌆'.at(0)

Love it!

> The issue when this sort of method has been discussed in the past has been
> what to do when you index at a trailing surrogate possition:
>
> '𝌆'.at(1)
>
> do you still get '𝌆' or do you get the equivalent of
> String.fromCharCode('𝌆'[1]) ?

In my proposal it would return the lone trail surrogate, i.e. the equivalent of
`'𝌆'[1]`. I think that’s the most sane behavior in that case. This also mimics
the way `String.prototype.codePointAt` works in such a case.

Here’s a prollyfill for `String.prototype.at` based on my earlier proposal: 
https://github.com/mathiasbynens/String.prototype.at Tests: 
https://github.com/mathiasbynens/String.prototype.at/blob/master/tests/tests.js


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-18 Thread Mathias Bynens
On 18 Oct 2013, at 15:12, Andrea Giammarchi <andrea.giammar...@gmail.com> wrote:

> so my counter-question would be: is there any way to do that in core so that
> we can “”.split() it so that we can have an ArrayLike that with [1] gives
> back the single “” and not the whole thing ?

This brings us back to the earlier discussion of whether something like 
`String.prototype.codePoints` should be added: 
http://esdiscuss.org/topic/how-to-count-the-number-of-symbols-in-a-string I 
think it would be useful.



Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-18 Thread Mathias Bynens
On 18 Oct 2013, at 17:51, Joshua Bell <jsb...@google.com> wrote:

> Given that you can only use the proposed String.prototype.at() properly for
> indexes > 0 if you know the index of a non-BMP character or lead surrogate by
> some other means, or if you will test the return value for a trailing
> surrogate, is it really an advantage over using codePointAt / fromCodePoint?
>
> The name at is so tempting I'm imagining naive scripts of the form for (i =
> 0; i < s.length; ++i) { r += s.at(i); } which will work fine until they get a
> non-BMP input at which point they're suddenly duplicating the trailing
> surrogates.
>
> Pushing people towards for-of iteration and even Allen's Array.from(
> '𝌆𝌆𝌆'))[1] seems safer; users who need more subtle things have have
> codePointAt / fromCodePoint available and hopefully the knowledge to use them.

Just because new features can be used incorrectly doesn’t mean the feature 
isn’t useful. `for…of` on strings and `String.prototype.at` are two very 
different things for two very different use cases. It’s a matter of using the 
right tool for the job, IMHO.

In your example (iterating over all code points in a string), `for…of` should 
be used.
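A small sketch of the duplication problem and of the `for…of` alternative. `symbolAt` is a hypothetical helper that mimics the proposed `at` (return the code point starting at a given code unit index, as a string). It is not the `String.prototype.at` that was eventually standardized in ES2022, which does relative indexing and returns a single code unit.

```javascript
// Hypothetical helper mirroring the proposed `at`: returns the full symbol
// starting at a code unit index (or a lone surrogate half).
function symbolAt(str, index) {
  const cp = str.codePointAt(index);
  return cp === undefined ? undefined : String.fromCodePoint(cp);
}

const s = 'a\u{1D306}b'; // 'a', U+1D306 (a surrogate pair), 'b'

// Naive index loop: at i = 2 it lands on the trailing surrogate
// and appends it a second time.
let naive = '';
for (let i = 0; i < s.length; ++i) {
  naive += symbolAt(s, i);
}

// for…of iterates by code point, so every symbol is visited once.
let viaForOf = '';
for (const symbol of s) {
  viaForOf += symbol;
}

console.log(naive.length); // 5
console.log(viaForOf === s); // true
```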

`String.prototype.codePointAt` or `String.prototype.at` come in handy in case 
you only need to get the first code point or symbol in a string, for example.
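For the "first symbol" case, a sketch of how `codePointAt(0)` differs from `charCodeAt(0)` when the string starts with an astral symbol (U+1D306 is just an example):

```javascript
const str = '\u{1D306} is U+1D306 TETRAGRAM FOR CENTRE';

// codePointAt(0) reads the whole surrogate pair at index 0.
const firstCodePoint = str.codePointAt(0);
console.log(firstCodePoint.toString(16)); // 1d306

// charCodeAt(0) only sees the lead surrogate.
console.log(str.charCodeAt(0).toString(16)); // d834

// Turning the code point back into a string gives the first symbol.
const firstSymbol = String.fromCodePoint(firstCodePoint);
console.log(firstSymbol.length); // 2 (two code units, one symbol)
```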


Re: How is let compatibility resolved?

2013-10-14 Thread Mathias Bynens
On 2 Oct 2013, at 10:45, Petka Antonov petka_anto...@hotmail.com wrote:

 In current version, this works just fine:
 
var let = 6;

Note that `let` was reserved in strict mode (only) in ES5, meaning that even as 
per ES5 that snippet only works in sloppy mode.
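One way to check this is to compile the same snippet as sloppy and as strict code with the `Function` constructor. A sketch, assuming an ES2015+ engine, which keeps `let` usable as a plain identifier in sloppy-mode `var` declarations:

```javascript
// Sloppy-mode code: `let` is an ordinary identifier here.
const sloppy = new Function('var let = 6; return let;');
console.log(sloppy()); // 6

// Strict-mode code: `let` is reserved, so compiling the body fails.
let strictError = null;
try {
  new Function('"use strict"; var let = 6; return let;');
} catch (error) {
  strictError = error;
}
console.log(strictError instanceof SyntaxError); // true
```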

Fwd: Making the identifier identification strawman less restrictive

2013-10-11 Thread Mathias Bynens
Forwarding Marijn’s message since he’s not subscribed to es-discuss.

Begin forwarded message:

 From: Marijn Haverbeke mari...@gmail.com
 Subject: Re: Making the identifier identification strawman less restrictive
 Date: 10 October 2013 11:13:34 CEST
 To: Norbert Lindenberg ecmascr...@lindenbergsoftware.com
 Cc: Mathias Bynens math...@qiwi.be, es-discuss es-discuss@mozilla.org, 
 Anton Kovalyov an...@kovalyov.net, Yusuke SUZUKI utatane@gmail.com, 
 ariya.hida...@gmail.com, Jeremy Ashkenas jashke...@gmail.com, 
 mi...@bazon.net
 
 I have no particular opinion about this. Identifiers with obscure
 characters tend to be so rare that I don't expect to have any trouble
 with this except for constructed conformance tests. Since you'll
 probably be the people who are going to construct such tests, I'll
 leave you to figure out what's sane.


Fwd: Making the identifier identification strawman less restrictive

2013-10-10 Thread Mathias Bynens
Forwarding Anton’s message since he’s not subscribed to es-discuss.

Begin forwarded message:

 From: Anton Kovalyov an...@kovalyov.net
 Subject: Re: Making the identifier identification strawman less restrictive
 Date: 9 October 2013 22:17:50 CEST
 To: Mathias Bynens math...@qiwi.be
 Cc: Norbert Lindenberg ecmascr...@lindenbergsoftware.com, es-discuss 
 es-discuss@mozilla.org, Yusuke SUZUKI utatane@gmail.com, Ariya 
 Hidayat ariya.hida...@gmail.com, Jeremy Ashkenas jashke...@gmail.com, 
 Marijn Haverbeke mari...@gmail.com, mi...@bazon.net
 
 Hi,
 
 If someone who’s running their code in the ES5 environment has a potential of 
 running into problems when using Unicode 6.3, JSHint needs to warn about it. 
 Today it doesn’t mostly because I’m really fuzzy on differences between 
 Unicode versions and I don’t have much time to dig into that so I’m relying 
 on incoming patches.
 
 Hope that helps at all. Let me know if you need more info or if I 
 misunderstood the question.
 
 Anton


Re: Making the identifier identification strawman less restrictive

2013-10-09 Thread Mathias Bynens
CC’ing the creators of the tools we’ve been talking about to get their input. 
Hi guys! Please start reading here: 
http://esdiscuss.org/topic/making-the-identifier-identification-strawman-less-restrictive.

On 9 Oct 2013, at 07:48, Norbert Lindenberg ecmascr...@lindenbergsoftware.com 
wrote:

 - For a code transformation tool, such as CoffeeScript, I agree that you 
 probably don't want to introduce any artificial restrictions, so you want to 
 use the latest Unicode version possible. Step 10 of the proposed algorithm 
 (let unicode be the Unicode version supported by the implementation in 
 ECMAScript identifiers) is intended to cover that case.

But that makes it an implementation-dependent impure function, which is 
unacceptable for code transformation tools like CoffeeScript and parsers like 
Esprima, Acorn, or UglifyJS. They’d support certain identifiers in engine A but 
not in engine B, without any control over it. If this is how 
`String.isIdentifier{Start,Part}` works I think these tools will stick to their 
custom identifier identification methods, which would defeat the purpose of the 
entire strawman. (Ariya, Marijn, Mihai: any thoughts?)

 For these reasons, I’d suggest changing the identifier identification 
 proposal as follows. […]
 
 That would create several problems:
 
 - The Unicode version for ES 5 would be above that for ES 6 (step 9).

I would love to see that changed too as per 
http://javascript.spec.whatwg.org/#unicode-database-version, but that’s an 
issue with the main ES spec. https://bugs.ecmascript.org/show_bug.cgi?id=2071

 - Tools like JSHint, if they want to ensure compatibility with all ES 5 
 implementations, would have to lie and specify ES 3.

They don’t at the moment. @Anton, any thoughts?

 - Step 11 would allow all Unicode code points that are matched by the 
 IdentifierStart production, including supplementary code points, which ES 5 
 does not permit in identifiers. (Note that Unicode 3.0, the version 
 referenced by the ES 3 and ES 5 specs, was the last one that did not define 
 any supplementary characters, so the spec as proposed doesn't have that 
 problem).

Step 11 says “If cp is matched by the IdentifierStart production in edition 
`edition` of the ECMAScript Language Specification using Unicode version 
`unicode`, then return `true`”, so this is not a problem either way. ES5
`IdentifierStart` doesn’t include supplementary code points, like you said, 
because of the way ES5 defines “character”.
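To illustrate the ES5 reading with U+2F800 (a supplementary character in the `Lo` category): split into ES5 “characters” (code units), neither half is a letter, but as a single code point it is. This sketch uses `\p{…}` property escapes, an ES2018 feature that postdates this thread, so the results come from the engine’s own Unicode data.

```javascript
const symbol = '\u{2F800}'; // CJK COMPATIBILITY IDEOGRAPH-2F800, category Lo

// ES5 "characters" are code units: the symbol is two of them...
const units = symbol.split('');
console.log(units.length); // 2

// ...and neither surrogate half is a letter (they are category Cs).
const isLetter = (s) => /^\p{L}$/u.test(s);
console.log(units.map(isLetter)); // [ false, false ]

// Viewed as a single code point, it is a letter.
console.log(isLetter(symbol)); // true
```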

 - Implementations that don't support Unicode 6.3 yet, e.g., because they rely 
 on Unicode information provided by the operating system, would not be able to 
 comply with the spec.

Which implementations do that? The ones I’ve seen all use custom-generated 
Unicode data files. Is this really an issue?


Re: FYI: Ecma-404 approved and published

2013-10-08 Thread Mathias Bynens
On 8 Oct 2013, at 19:59, Allen Wirfs-Brock al...@wirfs-brock.com wrote:
 The Ecma General Assembly has approved by letter ballot Ecma-404: THE JSON 
 Data Interchange Format
 See http://www.ecma-international.org/publications/standards/Ecma-404.htm 

As for Unicode, it explicitly refers to Unicode 6.2.0, even though version 
6.3.0 was released last week.

Re: FYI: Ecma-404 approved and published

2013-10-08 Thread Mathias Bynens
On 8 Oct 2013, at 22:19, Rick Waldron waldron.r...@gmail.com wrote:
 On Tue, Oct 8, 2013 at 4:10 PM, Mathias Bynens math...@qiwi.be wrote:
  As for Unicode, it explicitly refers to Unicode 6.2.0, even though version 
  6.3.0 was released last week.
 
 The document was written in July, which was before last week.

No need to get snarky.

Why not just refer to http://www.unicode.org/versions/latest/, i.e. the latest 
available Unicode version? The version number doesn’t really matter for JSON as 
all it cares about is the concept of “code points”, the range of which is fixed.
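As an illustration of that point: JSON text only needs code points and, for `\u` escapes, their UTF-16 code units. In this sketch U+1D306 is an arbitrary supplementary code point.

```javascript
// An astral symbol written raw and as an escaped surrogate pair in JSON text.
const raw = JSON.parse('"\u{1D306}"');
const escaped = JSON.parse('"\\uD834\\uDF06"');
console.log(raw === escaped); // true

// Both decode to a single code point in the fixed range U+0000..U+10FFFF.
const codePoint = raw.codePointAt(0);
console.log(codePoint.toString(16)); // 1d306
console.log(codePoint <= 0x10FFFF); // true
```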

Sorry for not raising this earlier; I must’ve missed the call for feedback 
in/before July.

Re: FYI: Ecma-404 approved and published

2013-10-08 Thread Mathias Bynens
On 8 Oct 2013, at 23:39, Mark S. Miller erig...@google.com wrote:
 JSON must not change. If it refers to the latest Unicode, whatever that is, 
 then it is potentially subject to disruption by (admittedly unlikely) future 
 changes to Unicode.

By that logic, it should have referred to either Unicode v5.0.0 or v4.1.0, 
because those were the latest available versions back in July 2006 as per 
http://www.unicode.org/history/publicationdates.html.

On 8 Oct 2013, at 23:51, Allen Wirfs-Brock al...@wirfs-brock.com wrote:
 If you look at the actual dependencies, it hardly matters as they are upon 
 things that are very hard to imagine ever changing.
 
 The dependencies are:
 1) The definition of code point 
 http://www.unicode.org/versions/Unicode6.2.0/ch03.pdf#G2212 
 2) the actual code point to abstract character associations for the 
 ASCII characters mentioned in the spec. 
 3) the UTF-16 encoding algorithm used for non-BMP code points
 4)  ?? is there anything else?

Not as far as I can tell.
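Dependency 3, the UTF-16 encoding of non-BMP code points, is small enough to write out. A sketch, where `toUtf16CodeUnits` is a hypothetical name used only for illustration and the result can be checked against `String.fromCodePoint`:

```javascript
// Encode one code point as UTF-16 code units.
function toUtf16CodeUnits(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10FFFF) {
    throw new RangeError('Invalid code point: ' + codePoint);
  }
  if (codePoint <= 0xFFFF) {
    return [codePoint]; // BMP: one code unit, value unchanged
  }
  const offset = codePoint - 0x10000; // 20-bit value
  const high = 0xD800 + (offset >> 10); // top 10 bits -> lead surrogate
  const low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits -> trail surrogate
  return [high, low];
}

console.log(toUtf16CodeUnits(0x1D306).map((u) => u.toString(16))); // [ 'd834', 'df06' ]
```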

On 8 Oct 2013, at 23:51, Allen Wirfs-Brock al...@wirfs-brock.com wrote:
 I suspect the version specificity could be removed in the future.


Yay!

Making the identifier identification strawman less restrictive

2013-10-06 Thread Mathias Bynens
This is about the identifier identification strawman: 
http://wiki.ecmascript.org/doku.php?id=strawman:identifier_identification

For tooling, it’s better to have a false positive than to have a false 
negative. In the case of identifier identification, it’s more useful to flag an 
identifier that is permitted as per the latest Unicode version as valid instead 
of rejecting it, even if it’s perhaps not supported in some engines that use 
data tables based on older Unicode versions.

In general, tools try to be lenient rather than restrictive in the input they 
accept. The strawman’s own list of ECMAScript 5 parsers that handle non-ASCII 
symbols in identifiers backs this up: instead of using Unicode 3.0.0 data, they 
use more recent Unicode versions in an attempt to handle as many technically 
valid identifiers as possible.

* Esprima and Acorn parse identifiers as per Unicode 6.3.0.
* UglifyJS v1 and v2 use Unicode 6.1.0, which, as far as ECMAScript 5.1 
identifiers go, is identical to Unicode 6.3.0.

For these reasons, I’d suggest changing the identifier identification proposal 
as follows. Step 8 currently says:

 If `edition` is `3` or `5`, let `unicode` be `3.0`.

Change that into step 8a:

 If `edition` is `3`, let `unicode` be `3.0`.

Then, add a new step `8b`:

 If `edition` is `5`, let `unicode` be `6.3`.

Mathias  
http://mathiasbynens.be/

P.S. I’ve created an identifier identification prollyfill 
(https://github.com/mathiasbynens/identifier-identification) based on the 
current strawman. I’ll happily modify it if the strawman gets updated in any 
way.


On `String.prototype.codePointAt` and `String.fromCodePoint`

2013-09-24 Thread Mathias Bynens
Patches implementing `String.prototype.codePointAt` and `String.fromCodePoint` 
are available for both SpiderMonkey 
(https://bugzilla.mozilla.org/show_bug.cgi?id=918879) and V8 
(https://code.google.com/p/v8/issues/detail?id=2840).

One spec bug remains to be fixed, though: 
https://bugs.ecmascript.org/show_bug.cgi?id=1153. It seems pretty clear the 
intent is to return `undefined` and not `NaN` (the algorithms in both the 
proposal and the ES6 draft agree on it), but it would be good to have this 
confirmed.
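For reference, engines implementing the final ES2015 spec settled on `undefined`, which keeps an out-of-range position distinguishable from a real code point. A quick check:

```javascript
const str = 'abc';

// Out-of-range positions: charCodeAt yields NaN, codePointAt yields undefined.
console.log(str.charCodeAt(3)); // NaN
console.log(str.codePointAt(3)); // undefined
console.log(str.codePointAt(-1)); // undefined
```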

Is it a good idea for engines to start implementing these methods, or is their 
design still being discussed? The definitions of these methods have been in the 
ES6 draft for a long time (since July 2012) without any changes. Does that 
indicate stability? How sure are we that they will end up in the final ES6 spec?

Mathias  
http://mathiasbynens.be/

Re: On `String.prototype.codePointAt` and `String.fromCodePoint`

2013-09-24 Thread Mathias Bynens
 I think I'm convinced that String.fromCodePoint()'s design is correct,
 especially since the rendering subsystem deals with code points too.

Glad to hear.

 String.prototype.codePointAt() however still feels wrong since you
 always need to iterate from the start to get the correct code *unit*
 offset anyway so why would you use it rather than the code *point*
 iterator that is planned for inclusion?

I think there are valid use cases for both.

For example, `String.prototype.codePointAt()` makes it easy to get only the 
code point at the first position, i.e. `str.codePointAt(0)`. `for…of` iterates 
over all code points in the string by default.

One key difference is that `String.prototype.codePointAt` is polyfillable in 
ES3/ES5, while `for…of` isn’t. This makes it easier to switch to 
`String.prototype.codePointAt` in existing code that is (incorrectly) using 
`String.prototype.charCodeAt` to loop over all code points in a string.
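To illustrate the polyfillability point, here is a minimal ES5-style sketch written as a standalone function, followed by the kind of loop that can replace a `charCodeAt`-based one. `codePointAtPolyfill` and `codePoints` are hypothetical names, and `String.prototype` is deliberately left untouched.

```javascript
// ES5-compatible sketch of codePointAt semantics, as a plain function.
function codePointAtPolyfill(string, position) {
  var size = string.length;
  var index = position ? Number(position) : 0;
  if (index !== index) {
    index = 0; // NaN becomes 0, like ToInteger
  }
  index = index < 0 ? Math.ceil(index) : Math.floor(index);
  if (index < 0 || index >= size) {
    return undefined;
  }
  var first = string.charCodeAt(index);
  // A lead surrogate followed by a trail surrogate forms one code point.
  if (first >= 0xD800 && first <= 0xDBFF && index + 1 < size) {
    var second = string.charCodeAt(index + 1);
    if (second >= 0xDC00 && second <= 0xDFFF) {
      return (first - 0xD800) * 0x400 + (second - 0xDC00) + 0x10000;
    }
  }
  return first;
}

// Collect every code point, skipping the trail half of each surrogate pair.
function codePoints(string) {
  var result = [];
  for (var i = 0; i < string.length; i++) {
    var codePoint = codePointAtPolyfill(string, i);
    result.push(codePoint);
    if (codePoint > 0xFFFF) {
      i++;
    }
  }
  return result;
}
```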

Re: Backwards compatibility and U+2E2F in `Identifier`s

2013-09-18 Thread Mathias Bynens
On 18 Sep 2013, at 21:05, Anne van Kesteren ann...@annevk.nl wrote:

 On Mon, Aug 19, 2013 at 5:25 AM, Mathias Bynens math...@qiwi.be wrote:
 After comparing the output, I noticed that both regular expressions are 
 identical except for the following: ECMAScript 5 allows U+2E2F VERTICAL 
 TILDE in `IdentifierStart` and `IdentifierPart`, but ECMAScript 6 / Unicode 
 TR31 doesn’t.
 
 Per ES6 identifiers start with code points whose category is ID_Start
 which per http://www.unicode.org/reports/tr31/ includes Lm which per
 http://www.unicode.org/Public/UNIDATA/UnicodeData.txt is true for
 U+2E2F. So why exactly is it disallowed?

`ID_Start` includes code points in the `Lm` category indeed, but then later 
explicitly disallows `Pattern_Syntax` and `Pattern_White_Space` code points. As 
it says on the page you linked to:

 In set notation, this is 
 [[:L:][:Nl:]--[:Pattern_Syntax:]--[:Pattern_White_Space:]] plus stability 
 extensions.

U+2E2F has the `Pattern_Syntax` property and is thus not a valid `ID_Start` 
code point.
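Engines that support `\p{…}` property escapes (ES2018 and later) expose these properties directly, so the exclusion can be checked; the results come from the engine’s own Unicode data. The last line uses the ES5 `UnicodeLetter` categories for contrast.

```javascript
const verticalTilde = '\u2E2F'; // U+2E2F VERTICAL TILDE

console.log(/\p{Lm}/u.test(verticalTilde)); // true: a modifier letter
console.log(/\p{Pattern_Syntax}/u.test(verticalTilde)); // true
console.log(/\p{ID_Start}/u.test(verticalTilde)); // false: excluded from ID_Start
console.log(/\p{ID_Continue}/u.test(verticalTilde)); // false

// ES5's UnicodeLetter categories (Lu, Ll, Lt, Lm, Lo, Nl) do include it.
const es5UnicodeLetter = /[\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}]/u;
console.log(es5UnicodeLetter.test(verticalTilde)); // true
```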


Re: Code points vs Unicode scalar values

2013-09-11 Thread Mathias Bynens
On 10 Sep 2013, at 18:30, Allen Wirfs-Brock al...@wirfs-brock.com wrote:

 On Sep 10, 2013, at 12:14 AM, Mathias Bynens wrote:
 
 FWIW, here’s a real-world example of a case where this behavior is 
 annoying/unexpected to developers: http://cirw.in/blog/node-unicode
 
 This suggests to me that the problem is in JSON.stringify's Quote operation.  
 I can see an argument that Quote should convert all unpaired surrogates to 
 \u escapes.  I wonder if changing Quote to do this would break anything…

*If* this turns out to be a non-breaking change, it would make sense to have 
`JSON.stringify` escape any non-ASCII symbols, as well as any non-printable 
ASCII symbols, similar to `jsesc`’s `json` option [1]. This would improve 
portability of the serialized data in case it was saved to a misconfigured 
database, saved to a file with a non-UTF-8 encoding, served to a browser 
without `charset=utf-8` in the `Content-Type` header, et cetera.

[1] http://mths.be/jsesc#json
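A rough sketch of that kind of output. This is not jsesc’s API; `stringifyAscii` is a hypothetical helper that post-processes `JSON.stringify` so every code unit outside printable ASCII becomes a `\uXXXX` escape. (Since ES2019, `JSON.stringify` itself escapes lone surrogates, along the lines of the `Quote` change suggested above.)

```javascript
// Serialize to JSON, then escape every code unit outside printable ASCII.
function stringifyAscii(value) {
  const json = JSON.stringify(value);
  if (json === undefined) {
    return undefined; // e.g. JSON.stringify(undefined)
  }
  // JSON syntax outside string literals is pure ASCII, so any matched code
  // unit sits inside a string, where a \uXXXX escape is always valid.
  return json.replace(/[\u007F-\uFFFF]/g, (unit) =>
    '\\u' + unit.charCodeAt(0).toString(16).padStart(4, '0'));
}

console.log(stringifyAscii({ name: 'caf\u00E9', symbol: '\u{1D306}' }));
// {"name":"caf\u00e9","symbol":"\ud834\udf06"}
```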

Re: Code points vs Unicode scalar values

2013-09-10 Thread Mathias Bynens
FWIW, here’s a real-world example of a case where this behavior is 
annoying/unexpected to developers: http://cirw.in/blog/node-unicode

Re: Question about allowed characters in identifier names

2013-09-05 Thread Mathias Bynens

On 26 Aug 2013, at 04:08, Norbert Lindenberg 
ecmascr...@lindenbergsoftware.com wrote:

 On Aug 24, 2013, at 23:43 , Mathias Bynens math...@qiwi.be wrote:
 
 I would suggest adding something like `String.isIdentifier` which accepts a 
 multi-symbol string or an array of code points to the strawman. Seems useful 
 to be able to do `String.isIdentifier('foobar')`
 
 What would be the use case(s) for that?

Tools like http://mothereff.in/js-escapes.

 Would it accept only an actual identifier or all possible escaped forms of 
 one (i.e., only 𠮷野家 or also \u{20BB7}野\u5BB6)?

Both, since `𠮷野家 === \u{20BB7}野\u5BB6`. That string is also equal to 
`\uD842\uDFB7\u91CE\u5BB6`, although it hasn’t been decided if that should be 
a valid identifier too (since it uses the surrogate code points explicitly): 
https://bugs.ecmascript.org/show_bug.cgi?id=469. That complicates things.
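At the string level there is no difference at all between these spellings, which is why the open question only concerns identifiers. A quick check (the `\u{…}` syntax assumes an ES2015+ engine):

```javascript
// The same three-symbol string, spelled with \u{…} and with a surrogate pair.
const braces = '\u{20BB7}\u91CE\u5BB6';
const surrogates = '\uD842\uDFB7\u91CE\u5BB6';
console.log(braces === surrogates); // true

console.log(braces.length); // 4 code units
console.log(Array.from(braces).length); // 3 code points
```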

Re: Code points vs Unicode scalar values

2013-09-04 Thread Mathias Bynens

On 4 Sep 2013, at 18:34, Brendan Eich bren...@mozilla.com wrote:

 Here, from the latest ES6 draft, is 15.5.2.3 String.fromCodePoint ( 
 ...codePoints):
 
 The String.fromCodePoint function may be called with a variable number of 
 arguments which form the
 rest parameter codePoints. The following steps are taken:
 1. Assert: codePoints is a well-formed rest parameter object.
 2. Let length be the result of Get(codePoints, "length").
 3. Let elements be a new List.
 4. Let nextIndex be 0.
 5. Repeat while nextIndex < length
 a. Let next be the result of Get(codePoints, ToString(nextIndex)).
 b. Let nextCP be ToNumber(next).
 c. ReturnIfAbrupt(nextCP).
 d. If SameValue(nextCP, ToInteger(nextCP)) is false, then throw a RangeError 
 exception.
 e. If nextCP < 0 or nextCP > 0x10FFFF, then throw a RangeError exception.
 f. Append the elements of the UTF-16 Encoding (clause 6) of nextCP to the end 
 of elements.
 g. Let nextIndex be nextIndex + 1.
 6. Return the String value whose elements are, in order, the elements in the 
 List elements. If length is 0, the
 empty string is returned.
 
 
 No exposed surrogates here!

I think what Anne means to say is that `String.fromCodePoint(0xD800)` returns 
`'\uD800'` as per that algorithm, which is a lone surrogate (and not a scalar 
value).
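A sketch of that behavior in an ES2015+ engine, plus a check for the stricter notion. `isScalarValue` is a hypothetical helper: a Unicode scalar value is any code point in U+0000..U+10FFFF that is not a surrogate.

```javascript
// A lone surrogate is a valid code point, so fromCodePoint accepts it...
const lone = String.fromCodePoint(0xD800);
console.log(lone === '\uD800'); // true

// ...but values outside 0..0x10FFFF, or non-integers, throw a RangeError.
for (const value of [-1, 0x110000, 3.5]) {
  try {
    String.fromCodePoint(value);
  } catch (error) {
    console.log(value, error instanceof RangeError); // true for each value
  }
}

// Hypothetical helper: code points minus the surrogate range.
const isScalarValue = (cp) =>
  Number.isInteger(cp) && cp >= 0 && cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
console.log(isScalarValue(0xD800)); // false
```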

Re: Question about allowed characters in identifier names

2013-08-25 Thread Mathias Bynens
On 25 Aug 2013, at 04:17, Norbert Lindenberg 
ecmascr...@lindenbergsoftware.com wrote:

 I don't think that's a technical problem. String.isIdentifier{Start,Part}, as 
 I proposed them, don't deal with actual identifiers in source text; they 
 check individual identifier characters.
 
 The functions are intended to be called by a parser, and it's up to the 
 parser to deal with escaping rules, throwing exceptions or unescaping as 
 specified before passing code points to String.isIdentifier{Start,Part}. 
 Calling the functions with string literals doesn't seem like a useful use 
 case.

Ah, I see. Step 1 of the proposed algorithm in 
http://wiki.ecmascript.org/doku.php?id=strawman:identifier_identification would 
convert the string to a single code point. Thanks for clarifying!

I would suggest adding something like `String.isIdentifier`, which accepts a 
multi-symbol string or an array of code points, to the strawman. It seems 
useful to be able to do `String.isIdentifier('foobar')`.
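A minimal sketch of such a multi-symbol check, built on per-code-point start/part tests. `isIdentifierStart`, `isIdentifierPart` and `isIdentifier` are hypothetical names loosely modeled on the strawman. They use ES2018 `\p{ID_Start}`/`\p{ID_Continue}` escapes plus the `$`, `_`, ZWNJ and ZWJ additions from the ES2015 grammar. Escape sequences are not handled (the caller would unescape first), and the Unicode version is whatever the engine ships, which is the same implementation dependence raised elsewhere in these threads.

```javascript
// Per-code-point checks following the ES2015+ IdentifierName grammar
// (escape sequences are assumed to be resolved by the caller).
function isIdentifierStart(codePoint) {
  const symbol = String.fromCodePoint(codePoint);
  return /^[\p{ID_Start}$_]$/u.test(symbol);
}

function isIdentifierPart(codePoint) {
  const symbol = String.fromCodePoint(codePoint);
  return /^[\p{ID_Continue}$\u200C\u200D]$/u.test(symbol);
}

// Multi-symbol check: iterate by code point, not by code unit.
function isIdentifier(string) {
  const codePoints = Array.from(string, (symbol) => symbol.codePointAt(0));
  if (codePoints.length === 0 || !isIdentifierStart(codePoints[0])) {
    return false;
  }
  return codePoints.slice(1).every((cp) => isIdentifierPart(cp));
}

console.log(isIdentifier('foobar')); // true
console.log(isIdentifier('\u{2F800}')); // true: supplementary Lo letter
console.log(isIdentifier('1abc')); // false
```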

 I do think it's a problem in learning and understanding the language. Having 
 different rules for \uD87E\uDC00 in string literals and identifiers, and 
 therefore also for identifiers embedded in strings passed to eval(), adds yet 
 another of those random inconsistencies that already litter ECMAScript, and 
 ensures a wat moment for everybody who comes across them.

Agreed.


Re: Question about allowed characters in identifier names

2013-08-24 Thread Mathias Bynens
On 27 Feb 2012, at 22:58, Allen Wirfs-Brock al...@wirfs-brock.com wrote:

 On Feb 26, 2012, at 1:55 AM, Mathias Bynens wrote:
 
 For example, U+2F800 CJK COMPATIBILITY IDEOGRAPH-2F800 is a supplementary 
 Unicode character in the [Lo] category, which leads me to believe it should 
 be allowed in identifier names. After all, the spec says:
 
 UnicodeLetter = any character in the Unicode categories “Uppercase letter 
 (Lu)”, “Lowercase letter (Ll)”, “Titlecase letter (Lt)”, “Modifier letter 
 (Lm)”, “Other letter (Lo)”, or “Letter number (Nl)”.
 
 However, since JavaScript uses UCS-2 internally, this symbol is represented 
 by a surrogate pair, i.e. two code units: `\uD87E\uDC00`.
 
 The spec, however, defines “character” as follows: 
 http://es5.github.com/x6.html#x6
 
 Throughout the rest of this document, the phrase “code unit” and the word 
 “character” will be used to refer to a 16-bit unsigned value used to 
 represent a single 16-bit unit of text. The phrase “Unicode character” will 
 be used to refer to the abstract linguistic or typographical unit 
 represented by a single Unicode scalar value (which may be longer than 16 
 bits and thus may be represented by more than one code unit). The phrase 
 “code point” refers to such a Unicode scalar value. “Unicode character” only 
 refers to entities represented by single Unicode scalar values: the 
 components of a combining character sequence are still individual “Unicode 
 characters,” even though a user might think of the whole sequence as a 
 single character.
 
 So, based on this definition of “character” (code unit), U+2F800 should not 
 be allowed in an identifier name after all.
 
 I’m not sure if my interpretation of the spec is correct, though. Could 
 anyone confirm or deny this? Are supplementary (non-BMP) Unicode characters 
 allowed in identifiers or not? For example, is this valid JavaScript or not?
 
 Yes, this interpretation is consistent with my understanding of the 
 requirements as expressed in the ES5 spec.   ES5 logically only works with 
 UCS-2 characters corresponding to the BMP.
 
 Some (probably most) implementations pass UTF-16 encodings of supplemental 
 characters to the JavaScript compiler.  According to the spec, these are 
 processed as two UCS-2 characters neither of which would be a member of any 
 of the above character categories.  Their use in an identifier context should 
 result in a syntax error.  Within a string literal, the two UCS-2 characters 
 would generate two string elements.
 
 This is something that I think can be clarified for the ES6 specification, 
 independent of the on-going discussion of the possibility of 21-bit string 
 elements.  My preference for the future is to simply define the input 
 alphabet of ECMAScript as all Unicode characters independent of actual 
 encoding.

That sounds nice.

 var \ud87e\udc00 would probably still be illegal because each \u defines a 
 separate character, but var \u{2f800} = 42; should be fine, as should the 
 direct non-escaped occurrence of that character.

Wouldn’t this be confusing, though?

global['\u{2F800}'] = 42; // would work (compatible with ES5 behavior)
global['\uD87E\uDC00'] = 42; // would work, too, since `'\uD87E\uDC00' == '\u{2F800}'` (compatible with ES5 behavior)
var \uD87E\uDC00 = 42; // would fail (compatible with ES5 behavior)
var \u{2F800} = 42; // would work (as per your comment; incompatible with ES5 behavior)
var  = 42; // would work (as per your comment; incompatible with ES5 
behavior)

Using astral symbols in identifiers would be backwards incompatible, even if 
the raw (unescaped) symbol is used. There’d be no way to use such an identifier 
in an ES5 environment. Is this a problem?
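For what it’s worth, engines implementing ES2015 and later accept the `\u{…}` and raw forms in identifiers while rejecting the surrogate-pair escape. The sketch below checks that with the `Function` constructor. `compiles` is a hypothetical helper, a plain object stands in for `global`, and the results describe today’s engines rather than ES5 ones.

```javascript
// Returns true if `source` compiles as a function body.
function compiles(source) {
  try {
    new Function(source);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

// Property keys: both escape forms denote the same string.
const object = {};
object['\u{2F800}'] = 42;
console.log(object['\uD87E\uDC00']); // 42

// Identifiers: the \u{…} form and the raw symbol compile,
// the surrogate-pair escape does not.
console.log(compiles('var \\u{2F800} = 42;')); // true
console.log(compiles('var \u{2F800} = 42;')); // true (raw symbol in the source)
console.log(compiles('var \\uD87E\\uDC00 = 42;')); // false
```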