Re: RegExps in array functions
Brendan Eich wrote:

js> ["fOo","bAr"].filter(function(){return /foo/i});
["fOo", "bAr"]

LOL, I returned a truthy value. Fixing:

js> ["fOo","bAr"].filter(function(s){return /foo/i.test(s)});
["fOo"]

Even longer :-/. So yes, callable regexps were useful -- that's why I made 'em callable way back when (Netscape 4 impl, based on Perl 4[!], fed into ES3). But they wound up being outlawed by ES3's text.

/be
___ es-discuss mailing list es-discuss@mozilla.org https://mail.mozilla.org/listinfo/es-discuss
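Because a RegExp is not callable in current JavaScript, the usual workaround is a small adapter that turns a regexp into a predicate. The sketch below uses a hypothetical helper named `matches`. It also drops the `g` and `y` flags, because `RegExp.prototype.test` advances `lastIndex` on such regexps, which makes a shared regexp stateful across array elements.

```javascript
// Hypothetical helper: wrap a RegExp as a predicate for filter/some/every.
function matches(re) {
  // A /g or /y regexp remembers lastIndex between test() calls, so copy it
  // without those flags and every call starts matching at index 0.
  const stateless = new RegExp(re.source, re.flags.replace(/[gy]/g, ""));
  return (s) => stateless.test(s);
}

const words = ["fOo", "bAr", "FOOD"];
console.log(words.filter(matches(/foo/i))); // [ 'fOo', 'FOOD' ]

// Without the adapter, a shared /g regexp skips the second match:
const g = /foo/gi;
console.log(["foo", "foo"].filter((s) => g.test(s))); // [ 'foo' ]
console.log(["foo", "foo"].filter(matches(/foo/gi))); // [ 'foo', 'foo' ]
```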
Re: RegExps in array functions
On Sat, Mar 24, 2012 at 5:54 AM, Brandon Benvie bran...@brandonbenvie.com wrote: I've been struggling for a way to describe this idea, but it's almost like we're lacking what amounts to a valueOf where the expected result is a callable. The regex-as-filter is a good example of that use case. You don't want the base object to be a function or callable necessarily, but you do want to indicate that there's a clear callable representative of the object.

The problem is that there are so many ways you might want to use an object as a function, not just one. In this case, we want it as a predicate. In other cases you might want it to return the match as a string, or as a match. Or you want your generic objects to match against different properties. The use-case here is that you have an object and want it to act as a function in a specific case, and it's driven by the way the object is used, which is unbounded, not the way the object is defined. You can't be expected to predict all the ways an object is going to be used, and create one function representation suiting them all. For this, you need a function per usage, so the predicate function suggested above is exactly the right level of abstraction.

/L
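The "one function per usage" idea can be sketched as a few small adapters, each choosing a different function view of a value at the call site. The names `asPredicate`, `asMatch`, and `byProperty` are hypothetical, chosen only for this illustration.

```javascript
// Each adapter picks one "function view" of a value; the call site decides.
const asPredicate = (re) => (s) => re.test(s); // boolean: does it match?
const asMatch = (re) => (s) => {
  const m = s.match(re);
  return m === null ? null : m[0]; // the matched text, or null
};
const byProperty = (key, value) => (obj) => obj[key] === value; // field match

const names = ["alpha", "bet", "gamma"];
console.log(names.filter(asPredicate(/a$/))); // [ 'alpha', 'gamma' ]
console.log(names.map(asMatch(/[aeiou]+/))); // [ 'a', 'e', 'a' ]

const people = [{ name: "Ann", role: "dev" }, { name: "Bo", role: "ops" }];
console.log(people.filter(byProperty("role", "dev"))); // only Ann's record
```

The same regexp can feed both `asPredicate` and `asMatch`; nothing about the regexp itself has to anticipate either use.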
[Fwd: Setting inheritance chains, |, extends, class and all the stuff]
Somehow, this did not get into the list, so I am resending it... hopefully it won't get duplicated.

Original Message
From: Herby Vojčík he...@mailbox.sk
Subject: Setting inheritance chains, |, extends, class and all the stuff
Date: Wed, 21 Mar 2012 18:58:59 +0100
To: ECMAScript discussion es-discuss@mozilla.org

Hi,

as for setting the [[Prototype]] chains with constructor function, I see there are three scenarios:

CP: aka 'classical inheritance'
  SubCtr.[[Prototype]] is set to SuperObj
  SubCtr.prototype.[[Prototype]] is set to SuperObj.prototype
C: aka 'constructor inheritance'
  SubCtr.[[Prototype]] is set to SuperObj
  SubCtr.prototype.[[Prototype]] is set to Object.prototype
P: aka 'prototype inheritance'
  SubCtr.[[Prototype]] is set to Function.prototype
  SubCtr.prototype.[[Prototype]] is set to SuperObj

I propose these should be covered syntactically this way:

CP: class SubCtr extends SuperObj { ... }
  // and if possible also, to have more basic construct without
  // defining methods, just the constructor function:
  function SubCtr (...) extends SuperObj { ... }
C: SuperObj | function SubCtr (...) { ... }
P: SuperObj | class SubCtr { ... }

Precondition: class's extends keyword is agreed to create CP type inheritance.

Rationale: If extends represents the classish CP inheritance, and looking at all other uses of | you see that its spirit is to set only the [[Prototype]] of the thing described by the RHS, we can use both and have them unambiguously define what kind of [[Prototype]] they set, based only on which construct is used, not on the structure of the LHS. This plays the 'explicit is better than implicit' card: you know instantly when you read/write code what type of chaining you want to accomplish, so readability problems and hidden bugs from using a variable LHS could be prevented.

Comments?

Herby
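Independent of the syntax chosen, the three scenarios can be wired up by hand in ES2015+ with `Object.setPrototypeOf` and `Object.create`. The helper `link` and its mode strings are hypothetical; they only make each scenario's two [[Prototype]] assignments explicit.

```javascript
// Wire SubCtr according to one of the CP / C / P scenarios (hypothetical helper).
function link(mode, SubCtr, SuperObj) {
  switch (mode) {
    case "CP": // classical: constructor and prototype both inherit
      Object.setPrototypeOf(SubCtr, SuperObj);
      SubCtr.prototype = Object.create(SuperObj.prototype);
      break;
    case "C": // constructor inheritance only
      Object.setPrototypeOf(SubCtr, SuperObj);
      SubCtr.prototype = Object.create(Object.prototype);
      break;
    case "P": // prototype inheritance: SuperObj is the instances' prototype
      Object.setPrototypeOf(SubCtr, Function.prototype);
      SubCtr.prototype = Object.create(SuperObj);
      break;
    default:
      throw new TypeError("unknown mode: " + mode);
  }
  // Restore the non-enumerable back-link lost by replacing .prototype.
  Object.defineProperty(SubCtr.prototype, "constructor", {
    value: SubCtr, writable: true, configurable: true,
  });
  return SubCtr;
}

function Base() {}
Base.create = () => new Base();
function Sub() {}
link("CP", Sub, Base);
console.log(typeof Sub.create, new Sub() instanceof Base); // function true
```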
Re: Finding a safety syntax for classes
On Mar 23, 2012, at 1:17 PM, Brendan Eich wrote:

class Point extends Evil {
  constructor(ax, ay) {
    public x, y;
    super();
    this.x = ax;
    this.y = ay;
  }
  ...
}
class Evil {
  constructor() {
    console.log(this.x);
  }
}

Should undefined be logged, or an error thrown? ... I think we can -- it seems obvious to me. ... no, error thrown. I don't agree with that, it's not how writable properties on objects work in JS.

+1

That kind of protection matters if you're trying to define a statically typed class system that can guarantee all properties are initialized before use, e.g., to have non-nullable types. That's not what we're doing here. We are doing classes as a codification of existing practice. So you can refer to properties before they've been initialized, and you get undefined.

Dave
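The classes that eventually shipped in ES2015 behave as Dave describes: a superclass constructor that reads a property the subclass has not assigned yet just sees `undefined`. The sketch adapts Brendan's example, with two assumptions: ES2015 has no `public x, y;` declaration, and `Evil` is declared first because class declarations are not hoisted, so `extends Evil` would fail if `Evil` were not yet initialized.

```javascript
// Evil must come first: class declarations are not hoisted.
class Evil {
  constructor() {
    // Runs inside super(), before Point's constructor assigns x.
    this.seenX = this.x;
  }
}

class Point extends Evil {
  constructor(ax, ay) {
    super(); // `this` may only be used after super() returns
    this.x = ax;
    this.y = ay;
  }
}

const p = new Point(1, 2);
console.log(p.seenX, p.x, p.y); // undefined 1 2
```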
Re: module path resolution
On 23 March 2012 21:59, Irakli Gozalishvili rfo...@gmail.com wrote:
A. ./foo.js, ../foo/bar.js
B. foo.js, foo/bar.js
I'd suggest to resolve A type paths relative to a requirer (either require module url, or document url). And resolve B type paths relative to a `document.baseURI`.

This is almost what CommonJS specifies and is quite close to what we use locally. We have a top-level path that is prepended to type B require statements which is by default calculated from window.location.href. We can, of course, override it, in case we move the page to a location where this calculation does not work -- exactly analogous to a base href setting. The fact that the modules themselves use relative paths to co-dependencies means that it's easy to move modules around, from app to app, server to server, etc. One thing that would be nice that we don't currently have is the ability to load modules relative to the calling web page. This is an oversight in our loader.

Wes

-- 
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102
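The two resolution rules can be sketched with the WHATWG `URL` constructor, which resolves a relative reference against a base. The function `resolveModule` and the example URLs are hypothetical. The requirer URL plays the role of the requiring module's URL, and the base URL stands in for `document.baseURI`.

```javascript
// Hypothetical resolver for the two path styles:
//   type A ("./x", "../x") resolves against the requiring module's URL;
//   type B ("x", "x/y") resolves against a base URL such as document.baseURI.
function resolveModule(path, requirerURL, baseURL) {
  const isTypeA = path.startsWith("./") || path.startsWith("../");
  return new URL(path, isTypeA ? requirerURL : baseURL).href;
}

const requirer = "https://example.com/app/lib/main.js";
const base = "https://example.com/modules/";
console.log(resolveModule("./foo.js", requirer, base));
// https://example.com/app/lib/foo.js
console.log(resolveModule("../foo/bar.js", requirer, base));
// https://example.com/app/foo/bar.js
console.log(resolveModule("foo/bar.js", requirer, base));
// https://example.com/modules/foo/bar.js
```

One detail to watch: `URL` treats the base as a directory only if it ends in "/". With a base of "https://example.com/modules", "foo.js" would resolve to "https://example.com/foo.js".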
Re: Finding a safety syntax for classes
On Mar 21, 2012, at 9:13 AM, Allen Wirfs-Brock wrote: On Mar 20, 2012, at 11:32 PM, David Herman wrote: Well, hang on now. The 'constructor' syntax is not just the constructor property. It still carries special status; the semantics of a class says look for the property called 'constructor' and make that the [[Call]] and [[Construct]] behavior of the class. Actually, the semantics is probably more like: the value bound to the className is the function object defined by the methodDefinition whose propertyName is constructor. Yes, sure. Doesn't change the point: 'constructor' is still a special, distinguished method. Regardless of whether we spell it 'constructor' or 'new' it requires special semantics that says there's one distinguished method form in the body that determines the constructor of the class. There is one distinguished method that is the class object (aka, the constructor function). A class declaration essentially desugars into the definition of a constructor function and an object that is the prototype associated with that constructor. Again, doesn't change the point: whether you spell it 'new' or 'constructor' it's still a distinguished method, and then you can desugar that however you want. The question is how we spell that. This is 99.9% a surface syntax question. You could argue that spelling it 'new' should define a [new] method, or a [new] method and a [constructor] method, or just a [constructor] method. If the latter, it's semantically *identical* to spelling it 'constructor'. But even if we chose one of the other two alternatives, the semantic differences here are minor, and the ergonomics of the syntax matter. You need to drop the [ ]'s (although I'm not sure what you meant by them... I meant that there is a property of the prototype that you can access via either p[new] or p[constructor] or both, depending on which semantics we decide to give. No matter what the surface syntax, any of those semantics is available to us.
I say:
a) spell it new -- ergonomics trumps corner cases; hard cases make bad law
b) desugar it to the constructor function and the p.constructor property only
c) i.e., don't create a p.new property -- no more prototype pollution please
d) an explicit 'constructor' method overrides the implicit creation of the 'constructor' method but does not define the constructor function

Why d)? Remember, the .constructor idiom is a *very weak* idiom that many JS programs don't follow. If a JS program has some reason to use 'constructor' for a different purpose, trust them. Personally I think the answer should be A which implies that we have class-side inheritance. This is a departure from current practice but because classes are functions there is no way in ES<=5.1 to set up class side inheritance other than by mutating __proto__. I always found this the more appealing, but then again, if I'm supposed to be going with the opposite of my instincts (see above), then maybe I should disagree with you. ;) I would guess that your instinctive response comes from thinking about a class as something more than just a composite of objects. We can talk more about this later after I respond to Mark. I think you misread me. My instinctive response agrees with yours, not Mark's. If the value of SOMEEXPRESSION is a constructor function (typeof == "function" and has a prototype property) then the new constructor inherits from SOMEEXPRESSION and the new prototype inherits from SOMEEXPRESSION.prototype. Otherwise, the new constructor inherits from Function.prototype and the new prototype inherits from SOMEEXPRESSION. That is essentially the semantics I've defined for SOMEEXPRESSION | function () {} I'm not happy with that semantics, for either classes or | (I believe others have objected on the list to the special-case semantics for | as well).
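The special-case rule Allen describes, and the "WTFjs" hazard Dave raises, can be made concrete with a hypothetical `deriveFrom` helper. It uses `Object.setPrototypeOf` in place of the proposed operator and applies the same type test: a function with an object-valued `prototype` is treated as a constructor, and anything else as a plain prototype object.

```javascript
// Hypothetical sketch of "inherit differently depending on what the parent is".
function deriveFrom(parent, ctor) {
  const parentIsCtor =
    typeof parent === "function" &&
    parent.prototype !== null && typeof parent.prototype === "object";
  if (parentIsCtor) {
    // Class-side and instance-side inheritance.
    Object.setPrototypeOf(ctor, parent);
    ctor.prototype = Object.create(parent.prototype);
  } else {
    // Plain object: only instances inherit from it.
    ctor.prototype = Object.create(parent);
  }
  ctor.prototype.constructor = ctor;
  return ctor;
}

// The hazard: a function passed where "any object" was expected is
// silently treated as a constructor.
function helper() {}
helper.greet = () => "hi";
const Derived = deriveFrom(helper, function Derived() {});
console.log(typeof new Derived().greet); // undefined: greet lives on helper, not helper.prototype
console.log(Derived.greet()); // hi: inherited on the class side instead
```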
Since functions are objects, you can pass functions into contexts that expect an object and those contexts don't need to care whether the object they have is a function or an object. So this will lead to WTFjs moments where people take an object they got passed in from someone else and create a class with it, and it won't be wired up right because they didn't realize the object was a function. This kind of special-case ad hoc type testing in the semantics has a bad smell. It reminds me of stuff like the Array constructor that special-cases the number argument.

B = do {
  let B = SOMEEXPRESSION | function B(...) { ... };
  B.prototype = SOMEEXPRESSION.prototype | {constructor: B, ...};
  B
}

(or to be maximally explicit)

B = do {
  let B = function B(...) { ... };
  if (typeof SOMEEXPRESSION == "function" && typeof SOMEEXPRESSION.prototype == "object")
    B.__proto__ = SOMEEXPRESSION;
  B.prototype = SOMEEXPRESSION.prototype | {constructor: B, ...};
  B
}

Nit: you'd need to bind the result of
Re: String.prototype.split fixed fields extension
This seems special purpose enough (as you say, for legacy formats) and easy enough to implement, that it probably doesn't warrant being included in the language.

- Russ

On Mar 23, 2012, at 5:36 PM, Roger Andrews roger.andr...@mail104.co.uk wrote:
String.prototype.split is good for cutting records into fields based on a delimiter string or regexp. E.g.
rec.split( ',' ) // split CSV record (no commas in fields)
rec.split( /\s+/ ) // split into whitespace-separated fields
How about extending 'split', or inventing a new method 'splitlen', which splits a record into defined-length fields? This simplifies a long list of 'substring's.
Old data formats invented in the days of punch-cards are still around. For example NASA's two-line element set (http://en.wikipedia.org/wiki/Two-line_element_set) which records the orbital elements of Earth satellites. E.g. here is the TLE for the International Space Station:
ISS (ZARYA)
1 25544U 98067A 08264.51782528 -.2182 0-0 -11606-4 0 2927
2 25544 51.6416 247.4627 0006703 130.5360 325.0288 15.72125391563537
Proposed design:
split( len1, len2, len3, len4, . ) // returns array of fields
where each numeric length argument either (1) captures a field of the given length if positive, or (2) ignores a field of the absolute given length if negative. The special argument '*' could repeat the previous argument to the end of the record.
Examples:
// chop into 5-char fields
rec.split( 5, '*' )
// capture a 1-char and a 5-char field and all chars after index 17
rec.split( -7, 1, 5, -4, Infinity )
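Russ's "easy enough to implement" point can be checked with a user-land sketch of the proposed semantics. The name `splitFixed` is hypothetical. The sketch follows the proposal as described: positive lengths capture, negative lengths skip, `Infinity` captures the rest, and a `"*"` argument repeats the previous length to the end of the record.

```javascript
// Hypothetical user-land version of the proposed fixed-width split.
function splitFixed(rec, ...lens) {
  const fields = [];
  let pos = 0;
  for (let i = 0; i < lens.length && pos < rec.length; i++) {
    const n = lens[i];
    if (n === "*") {
      // Repeat the previous length until the record is exhausted.
      const prev = lens[i - 1];
      if (typeof prev !== "number" || prev === 0) {
        throw new RangeError("'*' needs a non-zero numeric length before it");
      }
      while (pos < rec.length) {
        if (prev > 0) fields.push(rec.slice(pos, pos + prev));
        pos += Math.abs(prev);
      }
      break;
    }
    if (n < 0) {
      pos += -n; // skip a field
    } else {
      fields.push(rec.slice(pos, pos + n)); // slice clamps Infinity to the end
      pos += n;
    }
  }
  return fields;
}

console.log(splitFixed("abcdefghijkl", 5, "*")); // [ 'abcde', 'fghij', 'kl' ]
console.log(splitFixed("0123456789abcdefghij", -7, 1, 5, -4, Infinity)); // [ '7', '89abc', 'hij' ]
```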
Re: Full Unicode based on UTF-16 proposal
On Mar 23, 2012, at 11:45 AM, Roger Andrews wrote: Concerning UTF-16 surrogate pairs, how about a function like: String.isValid( str ) to discover whether surrogates are used correctly in 'str'? Something like Array.isArray(). No need for it to be a class method, since it only operates on strings. We could simply have String.prototype.isValid(). Note that it would work for primitive strings as well, thanks to JS's automatic promotion semantics. Dave
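A check for correctly paired surrogates is a short loop over code units. The name `isWellFormedUTF16` is hypothetical. Later editions of the language did add a built-in `String.prototype.isWellFormed()` (ES2024) with this meaning, but the sketch below does not rely on it.

```javascript
// Returns true if every surrogate code unit in str belongs to a valid pair.
function isWellFormedUTF16(str) {
  for (let i = 0; i < str.length; i++) {
    const c = str.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      // Lead surrogate: must be followed by a trail surrogate.
      const next = str.charCodeAt(i + 1); // NaN past the end
      if (!(next >= 0xdc00 && next <= 0xdfff)) return false;
      i++; // skip the trail half of the pair
    } else if (c >= 0xdc00 && c <= 0xdfff) {
      return false; // trail surrogate without a lead before it
    }
  }
  return true;
}

console.log(isWellFormedUTF16("a\uD834\uDF06b")); // true  (U+1D306 as a pair)
console.log(isWellFormedUTF16("a\uD834b")); // false (lone lead surrogate)
console.log(isWellFormedUTF16("\uDF06\uD834")); // false (halves in the wrong order)
```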
Re: Full Unicode based on UTF-16 proposal
On Mar 23, 2012, at 6:30 AM, Steven Levithan wrote: I've been wondering whether it might be best for the /u flag to do three things at once, making it an all-around support Unicode better flag: +all my internet points Now you're talking!! 1. Switches from code unit to code point mode. /./gu matches any Unicode code point, among other benefits outlined by Norbert. 2. Makes \d\D\w\W\b\B match Unicode decimal digits and word characters. [0-9], [A-Za-z0-9_], and lookaround provide fallbacks if you want to match ASCII characters only while using /u. 3. [New proposal] Makes /i use Unicode casefolding rules. /ΣΤΙΓΜΑΣ/iu.test("στιγμας") == true. This is really exciting. As for whether the switch to code-point-based matching should be universal or require /u (an issue that your proposal leaves open), IMHO it's better to require /u since it avoids the need for transforming \u[\u-\u] to [{\u\u}-{\u\u}] and [\u-\u][\uDC00-\uDFFF] to [{\u\uDC00}-{\u\uDFFF}], and additionally avoids at least three potentially breaking changes (two of which are explicitly mentioned in your proposal): I haven't completely understood this part of the discussion. Looking at /u as a little red switch (LRS), i.e., an opportunity to make judicious breaks with compatibility, could we not allow character classes with unescaped non-BMP code points, e.g.:

js> "팆팇팈팉팊".match(/[팆-퍖]+/u)
["팆팇팈팉팊"]

I'm still getting up to speed on Unicode and JS string semantics, so I'm guessing that I'm missing a reason why that wouldn't work... Presumably the JS source, as a sequence of UTF-16 code units, represents the tetragram code points as surrogate pairs. Can we not recognize surrogate pairs in character classes within a /u regexp and interpret them as code points?

Dave
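What Dave asks for is how the /u flag behaves in ES2015 and later: inside a /u regexp, a surrogate pair in a character class is read as one code point. The tetragram range below (U+1D306 to U+1D356, built with `\u{...}` string escapes so the pattern text contains the raw surrogate pairs) is chosen for illustration.

```javascript
// Five consecutive Tai Xuan Jing tetragrams, U+1D306..U+1D30A.
const s = "\u{1D306}\u{1D307}\u{1D308}\u{1D309}\u{1D30A}";
// Pattern text containing raw astral characters, as a literal would.
const cls = "[\u{1D306}-\u{1D356}]+";

// With /u, each surrogate pair in the class is one code point.
console.log(new RegExp(cls, "u").exec(s)[0] === s); // true

// Without /u the class is read as code units: \uD834, then the range
// \uDF06-\uD834, which is out of order, so the pattern is rejected.
try {
  new RegExp(cls);
} catch (e) {
  console.log(e instanceof SyntaxError); // true
}
```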
Re: Full Unicode based on UTF-16 proposal
Presumably the JS source, as a sequence of UTF-16 code units, represents the tetragram code points as surrogate pairs. Clarification: the JS source *of the regexp literal*. Dave
Re: Using Object Literals as Classes
(function () {}) creates two objects, not one. I'm not sure what you meant here. .. the empty object for the function's .prototype currently seems to elude me in all its forms.. Claus
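The second object is the fresh object stored in the function's `prototype` property, created so the function can be used with `new`. A quick way to see both objects (the arrow-function line is added for contrast, since arrows are not constructors):

```javascript
// A function expression yields a function object *and* a prototype object.
const f = function () {};
console.log(typeof f.prototype); // object
console.log(f.prototype.constructor === f); // true: the back-link
console.log(Object.getOwnPropertyNames(f.prototype)); // [ 'constructor' ]

// Arrow functions are not constructors, so no prototype object is created.
const g = () => {};
console.log(g.prototype); // undefined
```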
Re: Full Unicode based on UTF-16 proposal
On 24 March 2012 17:22, David Herman dher...@mozilla.com wrote: I'm not 100% clear on this point yet, but e.g. the SourceCharacter production in Annex A.1 is described as any Unicode code unit. Ugh, IMHO, that's wrong, and should be any Unicode code point. (let the flames begin?) The underlying transport format should not be a concern for the JS lexer. eval Eval is a red herring: its input is defined as the contents of the given String. So, we come full-circle back to what's in a String? I'm still partial to Brendan's BRS idea, because at least it fixes everything all at once. Wes
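The code unit versus code point distinction Wes draws is visible directly from JavaScript: `length` and indexing count UTF-16 code units, while string iteration (ES2015) and `codePointAt` work in code points. U+1D306 below is an arbitrary example of a character outside the BMP.

```javascript
const s = "a\u{1D306}b"; // 'a', U+1D306 (outside the BMP), 'b'

console.log(s.length); // 4: code units
console.log([...s].length); // 3: code points (iteration)
console.log(s.charCodeAt(1).toString(16)); // d834: only the lead surrogate
console.log(s.codePointAt(1).toString(16)); // 1d306: the whole code point
```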
Re: Let's replace | with :: (was Breaking up the |...)
Andreas Rossberg wrote: (OTOH, I'd prefer finding a way to do guards with single ':'... :-) ) I started there too (ES4, ML which was used [SML] for the ES4 reference implementation, other work such as Links). But TC39 argued about it and the desire to guard property names in object literals led us to favor ::.

let guardedObj = {p1::t1: v1, ~~~ pN::tN: vN};

Using : is ambiguous (using :, a property assignment, assuming optional guards, could have p:t or p:v or p:t:v). I suggested requiring parenthesization:

let guardedObj = {(p1:t1): v1 ~~~ (pN:tN): vN};

but no one liked that! Since :: was only twice as bad as :, and Haskell used it, we agreed to :: for guards.

/be
Re: hexadecimal literals for floating-point
This isn't a huge Ask to use your term, but it's also not a big win. We can talk about it off-agenda at next week's meeting, try to get a feel for odds of adding it. I'll bring it up. Thanks, /be

Roger Andrews wrote: Perhaps I spoke a little too loosely there. The hexadecimal exponential format allows control over the bits of a *finite* Number, not arbitrary bit patterns. Just like the decimal exponential format. The hex format does not allow the programmer to mess with IEEE Infinities, NaNs, signaling NaNs, NaN payloads. (Except that 0x1p1024 overflows to Infinity, just like 1e310.) PS: My two examples should be 0x1p-52 and 0x1p1023 (I forgot the 0x prefix for hexadecimal).

-- Note that some ES implementations use NaN-based tagging to distinguish Number values from internal object pointers. Allowing generation of arbitrary bit pattern FP values would open the door for pointer spoofing. At the very least, an implementation would have to reject any such hex fp literals that it would interpret as a pointer. Allen

On Mar 18, 2012, at 6:27 PM, Roger Andrews wrote: As a C-like language JavaScript has hexadecimal integer literals like 0x123ABC. C99 introduced hexadecimal floating-point literals, see: http://publib.boulder.ibm.com/infocenter/zos/v1r12/topic/com.ibm.zos.r12.cbclx01/lit_fltpt.htm#lit_fltpt__hex_float_constants of the general form: [sign] 0x [hexdigits] [. hexdigits] [p [sign] decdigits] (with the proviso that at least one significant hexdigit must appear). The letter 'p' means "times 2 to the power of"; and the exponent field is signed decimal. Case is ignored. (In C the exponent field *must* appear in order to disambiguate a trailing 'f' = float flag, but this would not be necessary in JavaScript.) Would it be a Big Ask to have hexadecimal floating-point literals in ES6? It extends the existing hexadecimal integer literal syntax slightly to encompass all the finite Numbers.
This gives the programmer the ability to precisely specify every bit of a Number in a well-understood form. And it is a natural match with the binary64 sign-significand-exponent fields. For example: var EPSILON = 1p-52, MAX_POWTWO = 1p1023; instead of var EPSILON = 1 / 0x10, MAX_POWTWO = 1 / (Number.MIN_VALUE * 0x8); JavaScript Numbers can be integer or fractional, decimal literals too, why do hexadecimal literals discriminate against fractions?
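JavaScript still has no hexadecimal floating-point literals, but the same values can be produced at run time. The parser below is a hypothetical sketch of the C99-style form quoted above. It accumulates the significand in a Number, so it is exact only while the significand fits in 53 bits. Exponents below -1074 underflow to 0 in the `2 ** exponent` step, even when the final value would be representable.

```javascript
// Hypothetical parser for C99-style hex floats: [sign] 0x hex [. hex] [p [sign] dec]
function parseHexFloat(text) {
  const m = /^([+-]?)0x([0-9a-f]*)(?:\.([0-9a-f]*))?(?:p([+-]?\d+))?$/i.exec(text);
  if (!m || m[2] + (m[3] || "") === "") {
    throw new SyntaxError("bad hex float: " + text); // need at least one hex digit
  }
  const [, sign, intDigits, fracDigits = "", expDigits = "0"] = m;
  let significand = 0;
  for (const d of intDigits + fracDigits) significand = significand * 16 + parseInt(d, 16);
  // Each fractional hex digit is a division by 16 = 2**4; 'p' scales by 2**exp.
  const exponent = Number(expDigits) - 4 * fracDigits.length;
  const value = significand * 2 ** exponent;
  return sign === "-" ? -value : value;
}

console.log(parseHexFloat("0x1p-52") === Number.EPSILON); // true
console.log(parseHexFloat("0x1p1023") === 2 ** 1023); // true
console.log(parseHexFloat("0x1.8p1")); // 3
console.log(parseHexFloat("0x1p1024")); // Infinity, like 1e310
```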
Re: Finding a safety syntax for classes
What about setting arbitrary expressions as the value for prototype methods? Being able to use higher-order functions to dynamically create functions is very important for functional-style programming. I use it very often to decorate functions, compose functions together, set partially applied functions as methods, etc. It seems to be impossible with the syntax proposed here. I think adding it to the safety syntax is very much needed and should not be overlooked.
Re: Finding a safety syntax for classes
On Mar 24, 2012, at 3:01 PM, Nadav Shesek wrote: What about setting arbitrary expressions as the value for prototype methods? Being able to use higher-order functions to dynamically create functions is very important for functional style programming. I use it very often to decorate functions, compose functions together, set partially applied functions as methods, etc. It seems to be impossible with syntax proposed here - I think adding it to the safety syntax is very much needed and should not be overlooked.

Yep, we already agreed to this -- see the grammar on Allen's maximally minimal proposal: http://wiki.ecmascript.org/doku.php?id=strawman:maximally_minimal_classes#class_declarations_and_expressions

Dave
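Whatever a class body's grammar allows, an ES2015 class's prototype is an ordinary object, so functions built by higher-order helpers can be installed on it after the class definition. The `logged` and `partialMethod` helpers below are hypothetical. They forward `this` so that their results still work as methods.

```javascript
// Hypothetical higher-order helpers that preserve `this` for method use.
const logged = (name, fn) =>
  function (...args) {
    const result = fn.apply(this, args);
    console.log(`${name}(${args.join(", ")}) -> ${result}`);
    return result;
  };
const partialMethod = (fn, ...preset) =>
  function (...rest) {
    return fn.call(this, ...preset, ...rest);
  };

class Counter {
  constructor() { this.count = 0; }
  add(n) { this.count += n; return this.count; }
}

// Any expression that yields a function can become a prototype method.
Counter.prototype.add = logged("add", Counter.prototype.add); // decorated
Counter.prototype.addTen = partialMethod(Counter.prototype.add, 10); // partially applied

const c = new Counter();
c.addTen(); // logs "add(10) -> 10"
```

One difference from methods declared in a class body is that properties assigned this way are enumerable. `Object.defineProperty` can install them as non-enumerable if that matters.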
Re: Full Unicode based on UTF-16 proposal
On Mar 23, 2012, at 6:30 , Steven Levithan wrote: Norbert Lindenberg wrote: I've updated the proposal based on the feedback received so far. Changes are listed in the Updates section. http://norbertlindenberg.com/2012/03/ecmascript-supplementary-characters/ Cool. From the proposal's Updates section: Indicated that u may not be the actual character for the flag for code point mode in regular expressions, as a u flag has already been proposed for Unicode-aware digit and word character matching. I've been wondering whether it might be best for the /u flag to do three things at once, making it an all-around support Unicode better flag:

1. Switches from code unit to code point mode. /./gu matches any Unicode code point, among other benefits outlined by Norbert.

2. Makes \d\D\w\W\b\B match Unicode decimal digits and word characters. [0-9], [A-Za-z0-9_], and lookaround provide fallbacks if you want to match ASCII characters only while using /u.

One concern: I think code point based matching should be the default for regex literals within modules (where we know the code is written for Harmony). Does it make sense to also interpret \d\D\w\W\b\B as full Unicode sets for such literals? In the other direction it's clear that using /u for \d\D\w\W\b\B has to imply code point mode.

3. [New proposal] Makes /i use Unicode casefolding rules. /ΣΤΙΓΜΑΣ/iu.test("στιγμας") == true.

We probably should review the complete Unicode Technical Standard #18, Unicode Regular Expressions, and see how we can upgrade RegExp for better Unicode support. Maybe on a separate thread...

Item number 3 is inspired by but different than Java's lowercase u flag for Unicode casefolding. In Java, flag u itself enables Unicode casefolding and does not need to be paired with flag i (which is equivalent to ES's /i).
As an aside, merging these three things would likely lead to /u seeing widespread use when dealing with anything more than ASCII, at least in environments where you don't have to worry about backcompat. This would help developers avoid stumbling on code unit issues in the small minority of cases where non-BMP characters are used or encountered. If /u's only purpose was to switch to code point mode, most likely it would be used *far* less often, and more developers would continue to get bitten by code-unit-based processing.

Good thinking :-)

As for whether the switch to code-point-based matching should be universal or require /u (an issue that your proposal leaves open), IMHO it's better to require /u since it avoids the need for transforming \u[\u-\u] to [{\u\u}-{\u\u}] and [\u-\u][\uDC00-\uDFFF] to [{\u\uDC00}-{\u\uDFFF}], and additionally avoids at least three potentially breaking changes (two of which are explicitly mentioned in your proposal):
1. [S]ome applications might have processed gunk with regular expressions where neither the 'characters' in the patterns nor the input to be matched are text.
2. s.match(/^.$/)[0].length can now be 2. I'll add, /.{3}/.exec(s)[0].length can now be anywhere between 3 and 6.
3. /./g.exec(s) can now increment the regex's lastIndex by 2.

-- Steven Levithan
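The code-point behaviours listed above as potentially breaking are exactly what /u does in engines since ES2015, where the flag shipped as opt-in. The sketch also shows that the /u design which shipped did not adopt item 2: `\d` stays ASCII-only under /u. ES2018's `\p{...}` property escapes cover that use case instead. U+1D306 and U+0663 (ARABIC-INDIC DIGIT THREE) are arbitrary examples.

```javascript
const s = "\u{1D306}"; // one code point, two UTF-16 code units

// Code-unit mode: "." matches only half of the surrogate pair.
console.log(/^.$/.test(s)); // false
console.log(/^.$/u.test(s)); // true: "." matches the whole code point

// In code-point mode a match can advance lastIndex by 2.
const re = /./gu;
re.exec(s);
console.log(re.lastIndex); // 2

// \d is still [0-9] under /u; \p{Nd} matches Unicode decimal digits.
console.log(/\d/u.test("\u0663")); // false
console.log(/\p{Nd}/u.test("\u0663")); // true
```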
Re: Full Unicode based on UTF-16 proposal
On Mar 23, 2012, at 7:12 , Lasse Reichstein wrote: On Fri, Mar 23, 2012 at 2:30 PM, Steven Levithan steves_l...@hotmail.com wrote: I've been wondering whether it might be best for the /u flag to do three things at once, making it an all-around support Unicode better flag: ... 3. [New proposal] Makes /i use Unicode casefolding rules. Yey, I'm for it :) Especially if it means dropping the rather naïve canonicalize function that can't canonicalize an ASCII character with a non-ASCII character. /ΣΤΙΓΜΑΣ/iu.test("στιγμας") == true. I think a compliant implementation should (read: ought to) already get that example, since "στιγμας".toUpperCase() == "ΣΤΙΓΜΑΣ".toUpperCase() in the browsers I have checked, and the ignore-case canonicalization is based on toUpperCase. Alas, most of the implementations miss it anyway.

According to the ES5 spec, /ΣΤΙΓΜΑΣ/i.test("στιγμας") must be true indeed. Chrome and Node (i.e., V8) and IE get this right; Safari, Firefox, and Opera don't. Note that toUpperCase allows mappings from 1 to multiple code units, while RegExp canonicalization in ES5 doesn't, so /SS/i.test("ß") === false even though "SS".toUpperCase() === "ß".toUpperCase().

Norbert
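Both claims can be checked in a current engine, which follows the canonicalization rule Norbert describes for regexps without /u: each character is canonicalized through `toUpperCase`, but only when the result is a single code unit.

```javascript
// σ and final ς both uppercase to Σ, so the ignore-case match succeeds.
console.log(/ΣΤΙΓΜΑΣ/i.test("στιγμας")); // true

// toUpperCase may map one character to several...
console.log("ß".toUpperCase()); // SS
// ...but RegExp canonicalization cannot, so ß is left as is and does not match SS.
console.log(/SS/i.test("ß")); // false
```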
Re: Full Unicode based on UTF-16 proposal
Thanks for the detailed comments! Replies below. Norbert On Mar 23, 2012, at 9:46 , Phillips, Addison wrote: Comments follow. 1. Definition of string. You say: -- However, ECMAScript does not place any restrictions or requirements on the sequence of code units in a String value, so it may be ill-formed when interpreted as a UTF-16 code unit sequence. -- I know what you mean, but others might not. Perhaps: -- However, ECMAScript does not place any restrictions or requirements on the sequence of code units in a String value, so the sequence of code units may contain code units that are not valid in Unicode or sequences that do not represent Unicode code points (such as unpaired surrogates). -- I can add a note that ill-formed here means containing unpaired surrogates. If I read chapter 3 of the Unicode Standard correctly, there's no other way for UTF-16 to be ill-formed. UTF-16 code units by themselves cannot be invalid - any 16-bit value can occur in a well-formed UTF-16 string. 2. In this section, I would define string after code unit and code point. I would also include a definition of surrogates/surrogate pairs. Makes sense. 3. Under text interpretation you say: -- For compatibility with existing applications, it has to allow surrogate code points (code points between U+D800 and U+DFFF which can never represent characters). -- This would (see above) benefit from having a definition in place. As noted, this is slightly incomplete, since surrogate code units are used to form supplementary characters. The text is about surrogate code points, not about surrogate code units. 4. 0xFFFE and 0xFFFF are non-characters in Unicode. I do think you do the right thing here. It's just a nit that you never note this ;-). 5. Editorial unnecessary ;-): -- This transformation is rather ugly, but I’m afraid it’s the price ECMAScript has to pay for being 12 years late in supporting supplementary characters. -- 6. Under 'details' you suggest a number of renamings.
Are these strictly necessary? The term 'character' could be taken to mean 'code point' instead, with an explanatory note. Unfortunately, the term character is poisoned in ES5 by a redefinition as code unit (chapter 6). For ES6, I'd like the spec to be really clear where it means code units and where it means code points. Maybe we can then reintroduce character in ES7... 7. Skipping down a lot, to section 6 source text, you propose: -- The text is expected to have been normalised to Unicode Normalization Form C (Canonical Decomposition, followed by Canonical Composition), as described in Unicode Standard Annex #15. -- I think this should be removed or modified. This sentence is essentially copied from ES5 (with corrected references), and as I copied it, I made a note to myself that we need to discuss normalization, just not as part of this proposal... Automatic application of NFC is not always desirable, as it can affect presentation or processing. Perhaps: -- Normalization of the text to Unicode Normalization Form C (Canonical Decomposition, followed by Canonical Composition), as described in Unicode Standard Annex #15, is recommended when transcoding from another character encoding. -- 8. In 7.6 Identifier Names and Identifiers you don't actually forbid unpaired surrogates or non-characters in the text (Identifier_Part:: does this by implication). Perhaps state it? Also, ZWJ and ZWNJ are permitted as the last character in an identifier. I can add a note about surrogate code points and non-characters, but, as you say, they are already ruled out because they can't have the required Unicode properties ID_Start or ID_Continue. The use of ZWJ and ZWNJ is unchanged from ES5. UAX 31 has much stricter rules on where they would be allowed, but I'm not sure we have a strong case for changing the rules in ECMAScript. http://www.unicode.org/reports/tr31/tr31-9.html#Layout_and_Format_Control_Characters 9. 
15.5.4.6: you say (a nonnegative integer less than 0x10FFFF), whereas it should say: (a nonnegative integer less than or equal to 0x10FFFF) Will fix. 10. In the section on 'What about UTF-32?', you say: and the code points start at positions 1, 2, 3... Of course this should be ... and the code points start at positions 0, 1, 2. Of course. Thanks for this proposal! Addison -----Original Message----- From: Norbert Lindenberg [mailto:ecmascr...@norbertlindenberg.com] Sent: Thursday, March 22, 2012 10:14 PM To: es-discuss@mozilla.org Subject: Re: Full Unicode based on UTF-16 proposal I've updated the proposal based on the feedback received so far. Changes are listed in the Updates section. http://norbertlindenberg.com/2012/03/ecmascript-supplementary-characters/ Norbert On Mar 16, 2012, at 0:18, Norbert Lindenberg wrote: Based on my prioritization of goals for support for full Unicode in
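Two points above can be made concrete: the reply to point 1 says ill-formed UTF-16 can only mean unpaired surrogates, and point 10 says positions in a code point (UTF-32-style) view start at 0, with 0x10FFFF as the largest value. A minimal sketch that relies only on String.prototype.charCodeAt; the names isWellFormedUTF16 and toCodePoints are hypothetical and not part of the proposal:

```javascript
// Hypothetical helpers illustrating the terms discussed above.
const isHigh = (cu) => cu >= 0xD800 && cu <= 0xDBFF; // leading surrogate
const isLow = (cu) => cu >= 0xDC00 && cu <= 0xDFFF;  // trailing surrogate

// Well-formed UTF-16: every surrogate code unit is part of a pair.
function isWellFormedUTF16(s) {
  for (let i = 0; i < s.length; i++) {
    const cu = s.charCodeAt(i);
    if (isHigh(cu)) {
      if (i + 1 >= s.length || !isLow(s.charCodeAt(i + 1))) return false;
      i++; // skip the trailing half of the pair
    } else if (isLow(cu)) {
      return false; // trailing surrogate with no leading surrogate
    }
  }
  return true;
}

// UTF-32-style view: one entry per code point, indexed from 0.
// Unpaired surrogates are kept as surrogate code points (U+D800..U+DFFF).
function toCodePoints(s) {
  const out = [];
  for (let i = 0; i < s.length; i++) {
    const cu = s.charCodeAt(i);
    if (isHigh(cu) && i + 1 < s.length && isLow(s.charCodeAt(i + 1))) {
      const lo = s.charCodeAt(i + 1);
      // Largest possible result: 0x10000 + 0x3FF * 0x400 + 0x3FF = 0x10FFFF.
      out.push(0x10000 + (cu - 0xD800) * 0x400 + (lo - 0xDC00));
      i++;
    } else {
      out.push(cu);
    }
  }
  return out;
}

const s = "a\uD83D\uDE00b"; // "a", U+1F600 as a surrogate pair, "b"
console.log(s.length);                      // 4 code units
console.log(isWellFormedUTF16(s));          // true
console.log(isWellFormedUTF16("a\uD83D"));  // false: unpaired leading surrogate
console.log(toCodePoints(s).map((cp) => cp.toString(16))); // [ '61', '1f600', '62' ]
```

In the code point view, "b" sits at position 2 even though it is code unit 3. Later editions of ECMAScript added built-ins covering similar ground: String.prototype.codePointAt and code-point string iteration in ES2015, and String.prototype.isWellFormed in ES2024.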
Re: Finding a safety syntax for classes
On Mar 24, 2012, at 7:28 AM, David Herman wrote: On Mar 21, 2012, at 9:13 AM, Allen Wirfs-Brock wrote: On Mar 20, 2012, at 11:32 PM, David Herman wrote: Well, hang on now. The 'constructor' syntax is not just the constructor property. It still carries special status; the semantics of a class says look for the property called 'constructor' and make that the [[Call]] and [[Construct]] behavior of the class. Actually, the semantics is probably more like: the value bound to the className is the function object defined by the methodDefinition whose propertyName is constructor. Yes, sure. Doesn't change the point: 'constructor' is still a special, distinguished method. Yes it is. I think the core issue WRT identifying this distinguished method with an identifier other than constructor concerns how it would relate to an actual method named constructor. The conservative course is to use constructor as the distinguishing identifier. It is a direct reflection of the underlying ES<=5.1 chapters 13 and 15 class model and it doesn't introduce any of these naming issues. I primarily favor sticking with constructor because the safety/maximally-minimal proposal is all about being conservative... ... The question is how we spell that. This is 99.9% a surface syntax question. You could argue that spelling it 'new' should define a [new] method, or a [new] method and a [constructor] method, or just a [constructor] method. If the latter, it's semantically *identical* to spelling it 'constructor'. But even if we chose one of the other two alternatives, the semantic differences here are minor, and the ergonomics of the syntax matter. You need to drop the [ ]'s (although I'm not sure what you meant by them...) I meant that there is a property of the prototype that you can access via either p["new"] or p["constructor"] or both, depending on which semantics we decide to give. No matter what the surface syntax, any of those semantics is available to us.
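For comparison, ES2015 (which postdates this thread) kept the 'constructor' spelling with the semantics Allen describes: the class binding is the function defined by the constructor method, and the prototype's constructor property links back to it. A minimal sketch with made-up names (Point, Animal); the second half shows the pre-class pattern Allen mentions below, where replacing a function's prototype with an object literal silently drops that back-link:

```javascript
class Point {
  constructor(x, y) { // this method's function becomes the binding 'Point'
    this.x = x;
    this.y = y;
  }
  norm() { return Math.hypot(this.x, this.y); }
}

console.log(typeof Point);                          // "function"
console.log(Point.prototype.constructor === Point); // true
console.log(new Point(3, 4).norm());                // 5

// ES5-style constructor whose prototype is replaced by an object literal.
function Animal(name) { this.name = name; }
Animal.prototype = {
  speak: function () { return this.name + " makes a sound"; }
};

const rex = new Animal("Rex");
console.log(rex.constructor === Animal); // false
console.log(rex.constructor === Object); // true: inherited from Object.prototype

// The extra, easily forgotten step that restores the idiom.
Object.defineProperty(Animal.prototype, "constructor", {
  value: Animal, writable: true, configurable: true, enumerable: false
});
console.log(rex.constructor === Animal); // true
```

One difference from (function () {}) remains: calling Point(3, 4) without new throws a TypeError.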
I say: a) spell it new -- ergonomics trumps corner cases; hard cases make bad law deviates from chapters 13 and 15. (class () {}) isn't interchangeable with (function () {}). Shouldn't it be? b) desugar it to the constructor function and the p.constructor property only but presumably means that an instance method named new can't be defined using a class definition. Or, perhaps only if new is explicitly string quoted as a property name. new is not a totally unreasonable method name. Also presumably means that defining an instance method named constructor is disallowed. c) i.e., don't create a p.new property -- no more prototype pollution please I presume you mean it also doesn't define a p.constructor property. This has similar problems to a). It deviates from the legacy ES class model. Also, most instances inherit from Object.prototype so they will still have an inherited constructor property whose value is Object. d) an explicit 'constructor' method overrides the implicit creation of the 'constructor' method but does not define the constructor function Why d)? Remember, the .constructor idiom is a *very weak* idiom that many JS programs don't follow. If a JS program has some reason to use 'constructor' for a different purpose, trust them. I believe the constructor idiom is most commonly not followed today when the prototype property of a function is set to a new object (perhaps defined using an object literal) and correctly dynamically setting the constructor property is an extra step that is easily forgotten (and usually has no ill effects). However, with syntactic class definition support (including inheritance) in the language I am sure that we are going to see much more use of OO idioms in ES programs (if that wasn't the case, why would we add them?). A very common idiom is to query the class of an object. Just look how frequently you see code like p.class in Java or Ruby code or (p class) in Smalltalk code.
The equivalent of this using ES chapter 13/15 objects is p.constructor. Which would you prefer to see in future ES code: p.constructor===q.constructor or p.new===q.new? BTW, many OO experts, including myself, tell people that querying the class of an object in this manner is an undesirable practice. Maybe even an anti-pattern. But in reality it is widely done and the negatives all relate to non-functional issues like code flexibility and reusability. Developers are going to do it, probably a lot. Personally I think the answer should be A which implies that we have class-side inheritance. This is a departure from current practice but because classes are functions there is no way in ES<=5.1 to set up class-side inheritance other than by mutating __proto__. I always found this the more appealing, but then again, if I'm supposed to be going with the opposite of my instincts (see above), then maybe I should disagree
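Class-side inheritance means the subclass constructor's own [[Prototype]] is the superclass constructor, so class-side (static) methods are inherited too. A minimal sketch with made-up names: the first half uses the __proto__ mutation Allen mentions (non-standard in ES5.1, later specified in Annex B of ES2015), and the second half uses ES2015 extends, which sets up both links:

```javascript
// ES5.1-era pattern: link instance prototypes by hand, then mutate __proto__.
function Base() {}
Base.create = function () { return new this(); }; // class-side method

function Derived() { Base.call(this); }
Derived.prototype = Object.create(Base.prototype, {
  constructor: { value: Derived, writable: true, configurable: true }
});
Derived.__proto__ = Base; // class-side link; Object.setPrototypeOf(Derived, Base) in ES2015

console.log(Derived.create() instanceof Derived); // true: 'this' is Derived

// ES2015: 'extends' creates both links.
class Base2 {
  static create() { return new this(); }
}
class Derived2 extends Base2 {}

console.log(Object.getPrototypeOf(Derived2) === Base2);                     // true
console.log(Object.getPrototypeOf(Derived2.prototype) === Base2.prototype); // true
console.log(Derived2.create() instanceof Derived2);                         // true

// Querying the class, as discussed above.
const p = Derived2.create(), q = new Derived2();
console.log(p.constructor === q.constructor); // true
```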
Re: RegExps in array functions
Lasse Reichstein wrote: The problem is that there are so many ways you might want to use an object as a function, not just one. In this case, we want it as a predicate. In other cases you might want it to return the match as a string, or as a match. Or you want your generic objects to match against different properties. The use-case here is that you have an object and want it to act as a function in a specific case, and it's driven by the way the object is used, which is unbounded, not the way the object is defined. You can't be expected to predict all the ways an object is going to be used, and create one function representation suiting them all. For this, you need a function per usage, so the predicate function suggested above is exactly the right level of abstraction. Well argued. I agree in general. In this specific case, just a couple of thoughts (not disagreeing): When I made regexps callable way back when, I chose calling a regexp as short for exec'ing, which creates a match array result. Then I optimized the exec/call built-in to peek into its continuation to see if the result was used only for its boolishness. If so, the built-in would pass a flag parameter to the common internal API for regexp execution to avoid creating the result array, substituting false/true for null/the-array. One way for generic callback-invoking map, forEach, filter, etc. functions to go: call the .call method of the callback, assuming the callback is a function but enabling non-function objects to delegate to a call method that knows how to invoke them. This is shown at Steve Levithan's XRegExp GitHub readme: https://github.com/slevithan/XRegExp
    // XRegExp regexes get call and apply methods
    // To demonstrate, let's first create the function we'll be using...
    function filter(array, fn) {
      var res = [];
      array.forEach(function (el) { if (fn.call(null, el)) res.push(el); });
      return res;
    }
    // Now we can filter arrays using functions and regexes
    filter(['a', 'ba', 'ab', 'b'], XRegExp('^a')); // -> ['a', 'ab']
This may be optimization-hostile at first glance, but VMs optimize call and apply with special caching and inlining. The bad old hardcoded-per-builtin optimization to avoid creating a useless-except-for-its-boolish-value result array is ideally generalized via inlining and type inference to cover many such results-allocating functions that are sometimes or often called just to test that they return non-null. /be
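The delegation idea can be tried without XRegExp. A minimal sketch: filter is the helper from the readme quoted above, while withCall is a hypothetical helper (not XRegExp's API) that gives a single RegExp instance a call method delegating to test(). Because filter only ever invokes fn.call(null, el), a plain function and the decorated regexp are interchangeable:

```javascript
// Generic helper from the quoted readme: invokes the callback via .call.
function filter(array, fn) {
  var res = [];
  array.forEach(function (el) { if (fn.call(null, el)) res.push(el); });
  return res;
}

// Hypothetical: make one RegExp usable as a predicate callback.
// Invoked as re.call(thisArg, s), so 'this' is the regexp and thisArg is ignored.
function withCall(re) {
  re.call = function (thisArg, s) { return this.test(s); };
  return re;
}

var words = ['a', 'ba', 'ab', 'b'];
console.log(filter(words, withCall(/^a/)));                        // [ 'a', 'ab' ]
console.log(filter(words, function (s) { return /a$/.test(s); })); // [ 'a', 'ba' ]
```

Avoid the g or y flag with this trick: test() on such a regexp updates lastIndex, so successive calls are not independent. The second call is the per-usage predicate Lasse argues for, a small function written for exactly one use.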
Re: Finding a safety syntax for classes
David Herman wrote: On Mar 24, 2012, at 3:01 PM, Nadav Shesek wrote: What about setting arbitrary expressions as the value for prototype methods? Being able to use higher-order functions to dynamically create functions is very important for functional style programming. I use it very often to decorate functions, compose functions together, set partially applied functions as methods, etc. It seems to be impossible with syntax proposed here - I think adding it to the safety syntax is very much needed and should not be overlooked. Yep, we already agreed to this -- see the grammar on Allen's maximally minimal proposal: http://wiki.ecmascript.org/doku.php?id=strawman:maximally_minimal_classes#class_declarations_and_expressions I don't see what I read Nadav as asking for: the ability to initialize a prototype method from an arbitrary expression. What am I missing? Class bodies
    ClassElement :
        PrototypePropertyDefinition
        ;   //semicolons are allowed but have no significance
    PrototypePropertyDefinition :
        PropertyName ( FormalParameterList? ) { FunctionBody }           // method
        *PropertyName ( FormalParameterList? ) { FunctionBody }          // generator method
        get PropertyName ( ) { FunctionBody }                            // getter
        set PropertyName ( PropertySetParameterList ) { FunctionBody }   // setter
/be
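The four PrototypePropertyDefinition forms in the grammar above all exist in ES2015 class bodies, along with stray semicolons. A sketch using a made-up Range class, one form per member. Note the absence of any "name = expression" form for a prototype property; the class fields added in ES2022 define instance properties, not prototype properties:

```javascript
class Range {
  constructor(lo, hi) { this.lo = lo; this.hi = hi; }
  ;                                    // allowed, no significance
  size() { return this.hi - this.lo; } // method
  *values() {                          // generator method
    for (let i = this.lo; i < this.hi; i++) yield i;
  }
  get isEmpty() { return this.hi <= this.lo; } // getter
  set start(v) { this.lo = v; }                // setter
}

const r = new Range(2, 5);
console.log(r.size());        // 3
console.log([...r.values()]); // [ 2, 3, 4 ]
console.log(r.isEmpty);       // false
r.start = 5;
console.log(r.isEmpty);       // true
```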
Re: Finding a safety syntax for classes
On Mar 24, 2012, at 7:29 PM, Brendan Eich wrote: David Herman wrote: On Mar 24, 2012, at 3:01 PM, Nadav Shesek wrote: What about setting arbitrary expressions as the value for prototype methods? Being able to use higher-order functions to dynamically create functions is very important for functional style programming. I use it very often to decorate functions, compose functions together, set partially applied functions as methods, etc. It seems to be impossible with syntax proposed here - I think adding it to the safety syntax is very much needed and should not be overlooked. Yep, we already agreed to this -- see the grammar on Allen's maximally minimal proposal: http://wiki.ecmascript.org/doku.php?id=strawman:maximally_minimal_classes#class_declarations_and_expressions I don't see what I read Nadav as asking for: the ability to initialize a prototype method from an arbitrary expression. What am I missing? I suspect Dave misinterpreted Nadav's question. So did I, when I originally read it. The superclass can be set to an arbitrary AssignmentExpression. This permits using higher-order functions to define the [[Prototype]] of the class's prototype object. Potentially this mechanism might be used to essentially have the effect of injecting dynamically generated methods into the class definition. However, they would be inherited methods of the prototype object rather than own methods, although that may not matter. To actually add a computed function as the value of a prototype object property within the class definition is pretty much the same thing as defining an arbitrary-valued prototype data property. Defining non-method prototype properties is one of the features that we have previously been unable to reach consensus on and for that reason was intentionally excluded from the maximal-minimal proposal.
As the proposal says: There is (intentionally) no direct declarative way to define either prototype data properties (other than methods), class properties, or instance properties. Class properties and prototype data properties need to be created outside the declaration. Allen
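Both routes Allen describes can be sketched in ES2015 syntax. The helpers twice, partial, and withMethods and the Counter class are hypothetical, chosen to mirror Nadav's decorate and partially-apply use cases. In the first route, computed methods come from a superclass built by an arbitrary extends expression, so they are inherited rather than own. In the second, they are added to the prototype after the declaration:

```javascript
// Hypothetical higher-order helpers.
function twice(fn) { // decorator: doubles the wrapped method's result
  return function (...args) { return 2 * fn.apply(this, args); };
}
function partial(fn, ...bound) { // partial application that keeps 'this'
  return function (...rest) { return fn.apply(this, bound.concat(rest)); };
}

// Route 1: the extends clause is an arbitrary expression, so a function can
// build a superclass whose prototype carries computed methods.
function withMethods(methods) {
  class Generated {}
  for (const name of Object.keys(methods)) {
    Object.defineProperty(Generated.prototype, name, {
      value: methods[name], writable: true, configurable: true, enumerable: false
    });
  }
  return Generated;
}

class Counter extends withMethods({
  doubled: twice(function () { return this.count; })
}) {
  constructor(count) { super(); this.count = count; }
  add(a, b) { return this.count + a + b; }
}

// Route 2: add a computed method outside the declaration.
Counter.prototype.addTen = partial(Counter.prototype.add, 10);

const c = new Counter(21);
console.log(c.doubled());  // 42
console.log(c.addTen(1));  // 32
console.log(Object.prototype.hasOwnProperty.call(Counter.prototype, "doubled")); // false: inherited
console.log(Object.prototype.hasOwnProperty.call(Counter.prototype, "addTen"));  // true: own
```

Plain assignment makes addTen enumerable, unlike methods declared in the class body. Using Object.defineProperty with enumerable: false, as withMethods does, matches them more closely.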