date:20120324

Re: RegExps in array functions

2012-03-24 Thread Brendan Eich


Brendan Eich wrote:

js> ["fOo","bAr"].filter(function(){return /foo/i});
["fOo", "bAr"]


LOL, I returned a truthy value. Fixing:

js> ["fOo","bAr"].filter(function(s){return /foo/i.test(s)});
["fOo"]

Even longer :-/.

So yes, callable regexps were useful -- that's why I made 'em callable 
way back when (Netscape 4 impl, based on Perl 4[!], fed into ES3). But 
they wound up being outlawed by ES3's text.


/be
___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss

Re: RegExps in array functions

2012-03-24 Thread Lasse Reichstein

On Sat, Mar 24, 2012 at 5:54 AM, Brandon Benvie
bran...@brandonbenvie.com wrote:
 I've been struggling for a way to describe this idea, but it's almost like
 we're lacking what amounts to a valueOf where the expected result is a
 callable. The regex-as-filter is a good example of that use case. You don't
 want the base object to be a function or callable necessarily, but you do
 want to indicate that there's a clear callable representative of the object.

The problem is that there are so many ways you might want to use an
object as a function, not just one. In this case, we want it as a
predicate. In other cases you might want it to return the match as a
string, or as a match. Or you want your generic objects to match
against different properties.
The use-case here is that you have an object and want it to act as a
function in a specific case, and it's driven by the way the object is
used, which is unbounded, not the way the object is defined. You can't
be expected to predict all the ways an object is going to be used, and
create one function representation suiting them all.

For this, you need a function per usage, so the predicate function
suggested above is exactly the right level of abstraction.
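Lasse's "one function per usage" point is easy to see in code. The sketch below uses modern JavaScript (arrow functions postdate this thread); `matches` and `firstMatch` are hypothetical helper names, not a standard API. Each adapts a RegExp for one specific use, and both reset `lastIndex` so that a regexp carrying the `g` or `y` flag behaves as a pure function when `filter` or `map` calls it repeatedly.

```javascript
// Hypothetical adapters: each turns a RegExp into a function for one specific use.

// Use 1: a predicate, e.g. for Array.prototype.filter.
function matches(re) {
  return (s) => {
    re.lastIndex = 0; // a /g or /y regexp would otherwise resume where the last call stopped
    return re.test(s);
  };
}

// Use 2: an extractor returning the first matched substring, or null.
function firstMatch(re) {
  return (s) => {
    re.lastIndex = 0;
    const m = re.exec(s);
    return m === null ? null : m[0];
  };
}

const words = ["fOo", "bAr"];
console.log(words.filter(matches(/foo/i)));  // [ 'fOo' ]
console.log(words.map(firstMatch(/o+/i)));   // [ 'Oo', null ]
```

The same regexp could be adapted in yet other ways (returning capture groups, testing a particular property of an object), which is why no single "callable representative" fits every use.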

/L

[Fwd: Setting inheritance chains, <|, extends, class and all the stuff]

2012-03-24 Thread Herby Vojčík

Somehow, this did not get into the list, so I am resending it... 
hopefully it won't get duplicated.

 Original Message 
From: Herby Vojčík he...@mailbox.sk
Subject: Setting inheritance chains, <|, extends, class and all the stuff
Date: Wed, 21 Mar 2012 18:58:59 +0100
To: ECMAScript discussion es-discuss@mozilla.org

Hi,

as for setting the [[Prototype]] chains with constructor functions, I see
there are three scenarios:

 CP: aka 'classical inheritance'
   SubCtr.[[Prototype]] is set to SuperObj
   SubCtr.prototype.[[Prototype]] is set to SuperObj.prototype

 C: aka 'constructor inheritance'
   SubCtr.[[Prototype]] is set to SuperObj
   SubCtr.prototype.[[Prototype]] is set to Object.prototype

 P: aka 'prototype inheritance'
   SubCtr.[[Prototype]] is set to Function.prototype
   SubCtr.prototype.[[Prototype]] is set to SuperObj
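As a rough illustration of the three wirings, here they are built by hand with `Object.setPrototypeOf` (standardized in ES2015, after this message) instead of any proposed syntax. `SuperObj` follows the message; `SubCP`, `SubC`, `SubP`, `superProto`, and the `create`/`greet` methods are made-up names. For P, a plain object stands in for SuperObj, since that scenario only uses it as a prototype. For comparison, the `class ... extends` that eventually shipped in ES2015 yields the CP wiring.

```javascript
// A constructor to inherit from (used by CP and C).
function SuperObj() {}
SuperObj.create = function () { return new this(); }; // a "class-side" method
SuperObj.prototype.greet = function () { return "hi"; };

// CP: both the constructor chain and the prototype chain are linked.
function SubCP() {}
Object.setPrototypeOf(SubCP, SuperObj);
Object.setPrototypeOf(SubCP.prototype, SuperObj.prototype);

// C: only the constructor chain; SubC.prototype still inherits from Object.prototype.
function SubC() {}
Object.setPrototypeOf(SubC, SuperObj);

// P: only the prototype chain; SubP itself still inherits from Function.prototype.
const superProto = { greet() { return "hello"; } };
function SubP() {}
Object.setPrototypeOf(SubP.prototype, superProto);

// ES2015 `extends` produces the CP wiring.
class SubClass extends SuperObj {}

console.log(SubCP.create().greet());                       // hi
console.log(typeof SubC.create().greet);                   // undefined
console.log(new SubP().greet());                           // hello
console.log(Object.getPrototypeOf(SubClass) === SuperObj); // true
```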

I propose these should be covered syntactically this way:

 CP:
   class SubCtr extends SuperObj { ... }
   // and if possible also, to have more basic construct without
   // defining methods, just the constructor function:
   function SubCtr (...) extends SuperObj { ... }

 C:
   SuperObj <| function SubCtr (...) { ... }

 P:
   SuperObj <| class SubCtr { ... }

Precondition:
  class's extends keyword is agreed to create CP type inheritance

Rationale:
  If extends represents the classish CP inheritance, and looking at all
other uses of <| you see that its spirit is to set only the
[[Prototype]] of the thing described by the RHS, we can use both and have
them unambiguously define what kind of [[Prototype]] they set based only
on which construct is used, not on the structure of the LHS.

  This plays the "explicit is better than implicit" card - you know
instantly when you read/write code what type of chaining you want to
accomplish, so readability improves and hidden bugs from using a
variable LHS can be prevented.

Comments?

Herby

Re: Finding a safety syntax for classes

2012-03-24 Thread David Herman

On Mar 23, 2012, at 1:17 PM, Brendan Eich wrote:

  class Point extends Evil {
    constructor(ax, ay) {
      public x, y;
      super();
      this.x = ax;
      this.y = ay;
    }
    ...
  }

  class Evil {
    constructor() {
      console.log(this.x);
    }
  }
 
 Should "undefined" be logged, or an error thrown? ...
 
 I think we can -- it seems obvious to me.
 
 ... no, error thrown. I don't agree with that, it's not how writable 
 properties on objects work in JS.

+1

That kind of protection matters if you're trying to define a statically typed 
class system that can guarantee all properties are initialized before use, 
e.g., to have non-nullable types. That's not what we're doing here. We are 
doing classes as a codification of existing practice. So you can refer to 
properties before they've been initialized, and you get undefined.
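Dave's position matches how the classes eventually standardized in ES2015 behave: a superclass constructor that reads a property the subclass has not assigned yet just sees `undefined`. A runnable sketch of the quoted example, assuming ES2015 class syntax; the `public x, y;` declaration from the proposal is not ES2015 syntax and is dropped, and `seenX` is an illustrative property name.

```javascript
class Evil {
  constructor() {
    // Runs during super(), before Point's constructor body assigns this.x.
    this.seenX = this.x;
    console.log(this.x); // undefined, not an error
  }
}

class Point extends Evil {
  constructor(ax, ay) {
    super(); // in a derived constructor, super() must run before `this` is used
    this.x = ax;
    this.y = ay;
  }
}

const p = new Point(1, 2);
console.log(p.seenX, p.x, p.y); // undefined 1 2
```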

Dave


Re: module path resolution

2012-03-24 Thread Wes Garland

On 23 March 2012 21:59, Irakli Gozalishvili rfo...@gmail.com wrote:


 A. ./foo.js, ../foo/bar.js
 B. foo.js, foo/bar.js

 I'd suggest resolving type A paths relative to the requirer (either the
 requiring module's URL or the document URL), and resolving type B
 paths relative to `document.baseURI`.


This is almost what CommonJS specifies and is quite close to what we use
locally.  We have a top-level path that is prepended to type B require
statements which is by default calculated from window.location.href.  We
can, of course, override it, in case we move the page to a location where
this calculation does not work -- exactly analogous to a base href
setting.

The fact that the modules themselves use relative paths to co-dependencies
means that it's easy to move modules around, from app to app, server to
server, etc.
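The two rules quoted above (type A relative to the requiring module, type B relative to a base) can be sketched with the WHATWG `URL` constructor, which is a global in current browsers and Node.js. `resolveModulePath`, its parameters, and the example.com URLs are hypothetical, not the actual API of CommonJS or of any loader. The base URL must end in `/` for it to act as a directory.

```javascript
// Hypothetical resolver for the two specifier styles discussed above.
//   Type A ("./foo.js", "../foo/bar.js"): relative to the requiring module.
//   Type B ("foo.js", "foo/bar.js"):      relative to a configurable base URL.
function resolveModulePath(specifier, requirerUrl, baseUrl) {
  const isTypeA = specifier.startsWith("./") || specifier.startsWith("../");
  return new URL(specifier, isTypeA ? requirerUrl : baseUrl).href;
}

const requirer = "https://example.com/lib/util/strings.js";
const base = "https://example.com/app/"; // e.g. derived from the page location

console.log(resolveModulePath("./foo.js", requirer, base));
// https://example.com/lib/util/foo.js
console.log(resolveModulePath("../foo/bar.js", requirer, base));
// https://example.com/lib/foo/bar.js
console.log(resolveModulePath("foo/bar.js", requirer, base));
// https://example.com/app/foo/bar.js
```

Because type A specifiers resolve against the requiring module, a directory of modules can move as a unit; only `base` needs overriding, much like a base href setting.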

One thing that would be nice that we don't currently have is the ability to
load modules relative to the calling web page.  This is an oversight in our
loader.

Wes

-- 
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102

Re: Finding a safety syntax for classes

2012-03-24 Thread David Herman

On Mar 21, 2012, at 9:13 AM, Allen Wirfs-Brock wrote:

 On Mar 20, 2012, at 11:32 PM, David Herman wrote:
 
 Well, hang on now. The 'constructor' syntax is not just the constructor 
 property. It still carries special status; the semantics of a class says 
 look for the property called 'constructor' and make that the [[Call]] and 
 [[Construct]] behavior of the class.
 
 Actually, the semantics is probably more like: the value bound to the
 className is the function object defined by the methodDefinition whose
 propertyName is "constructor".

Yes, sure. Doesn't change the point: 'constructor' is still a special, 
distinguished method.

 Regardless of whether we spell it 'constructor' or 'new' it requires special 
 semantics that says there's one distinguished method form in the body that 
 determines the constructor of the class.
 
 There is one distinguished method that is the class object (aka, the 
 constructor function).  A class declaration essentially desugars into the 
 definition of a constructor function and an object that is the prototype 
 associated with that constructor. 

Again, doesn't change the point: whether you spell it 'new' or 'constructor' 
it's still a distinguished method, and then you can desugar that however you 
want.

 The question is how we spell that. This is 99.9% a surface syntax question. 
 You could argue that spelling it 'new' should define a ["new"] method, or a 
 ["new"] method and a ["constructor"] method, or just a ["constructor"] 
 method. If the latter, it's semantically *identical* to spelling it 
 'constructor'. But even if we chose one of the other two alternatives, the 
 semantic differences here are minor, and the ergonomics of the syntax matter.
 
 You need to drop the [ ]'s  (although I'm not sure what you meant by them...

I meant that there is a property of the prototype that you can access via 
either p["new"] or p["constructor"] or both, depending on which semantics we 
decide to give. No matter what the surface syntax, any of those semantics is 
available to us. I say:

a) spell it "new" -- ergonomics trumps corner cases; hard cases make bad law

b) desugar it to the constructor function and the p.constructor property 
only

c) i.e., don't create a p.new property -- no more prototype pollution please

d) an explicit 'constructor' method overrides the implicit creation of the 
'constructor' method but does not define the constructor function

Why d)? Remember, the .constructor idiom is a *very weak* idiom that many JS 
programs don't follow. If a JS program has some reason to use 'constructor' for 
a different purpose, trust them.

 Personally I think the answer should be A which implies that we have 
 class-side inheritance.   This is a departure from current practice but 
 because classes are functions there is no way in ES<=5.1 to set up class 
 side inheritance other than by mutating __proto__.
 
 I always found this the more appealing, but then again, if I'm supposed to 
 be going with the opposite of my instincts (see above), then maybe I should 
 disagree with you. ;)
 
 I would guess that your instinctive response comes from thinking of a 
 class as something more than just a composite of objects.  We can talk 
 more about this later after I respond to Mark

I think you misread me. My instinctive response agrees with yours, not Mark's.

 If the value of SOMEEXPRESSION is a constructor function (typeof ==
 "function" and has a prototype property) then the new constructor
 inherits from SOMEEXPRESSION and the new prototype inherits from
 SOMEEXPRESSION.prototype.  Otherwise, the new constructor inherits from
 Function.prototype and the new prototype inherits from SOMEEXPRESSION.  That
 is essentially the semantics I've defined for
 SOMEEXPRESSION <| function () {}

I'm not happy with that semantics, for either classes or <| (I believe others 
have objected on the list to the special-case semantics for <| as well). Since 
functions are objects, you can pass functions into contexts that expect an 
object and those contexts don't need to care whether the object they have is a 
function or an object. So this will lead to WTFjs moments where people take an 
object they got passed in from someone else and create a class with it, and it 
won't be wired up right because they didn't realize the object was a function.

This kind of special-case ad hoc type testing in the semantics has a bad smell. 
It reminds me of stuff like the Array constructor that special-cases the number 
argument.
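To make Dave's objection concrete, here is a rough model of the rule Allen describes for `SOMEEXPRESSION <| function () {}`, written as an ordinary function using ES2015 `Object.setPrototypeOf`. The name `protoOf` and the `describe`/`traits` example are hypothetical. The second case shows the surprise Dave predicts: an object that happens to be a function is wired as a superclass, so its methods never reach instances.

```javascript
// Hypothetical model of the special-cased rule quoted above.
function protoOf(parent, ctor) {
  if (typeof parent === "function" && typeof parent.prototype === "object") {
    // "Constructor-like" parent: link both chains.
    Object.setPrototypeOf(ctor, parent);
    Object.setPrototypeOf(ctor.prototype, parent.prototype);
  } else {
    // Any other object: ctor keeps Function.prototype, instances inherit from parent.
    Object.setPrototypeOf(ctor.prototype, parent);
  }
  return ctor;
}

const describe = function () { return "shared behaviour"; };

// A plain object of shared behaviour works as expected...
const PlainBased = protoOf({ describe }, function PlainBased() {});
console.log(new PlainBased().describe()); // shared behaviour

// ...but the same behaviour hung off a function object is treated as a superclass.
const traits = Object.assign(function traits() {}, { describe });
const FnBased = protoOf(traits, function FnBased() {});
console.log(typeof new FnBased().describe); // undefined
console.log(FnBased.describe());            // shared behaviour (class side only)
```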

 B = do {
   let B = SOMEEXPRESSION <| function B(...) { ... };
   B.prototype = SOMEEXPRESSION.prototype <| {constructor: B, ...};
   B
 }

 (or to be maximally explicit)

 B = do {
   let B = function B(...) { ... };
   if (typeof SOMEEXPRESSION == "function" &&
       typeof SOMEEXPRESSION.prototype == "object") B.__proto__ = SOMEEXPRESSION;
   B.prototype = SOMEEXPRESSION.prototype <| {constructor: B, ...};
   B
 }

Nit: you'd need to bind the result of

Re: String.prototype.split fixed fields extension

2012-03-24 Thread Russell Leggett

This seems special purpose enough (as you say, for legacy formats) and easy 
enough to implement, that it probably doesn't warrant being included in the 
language.

- Russ

On Mar 23, 2012, at 5:36 PM, Roger Andrews roger.andr...@mail104.co.uk 
wrote:

 String.prototype.split is good for cutting records into fields based on a
 delimiter string or regexp.  E.g.
   rec.split( ',' )   // split CSV record (no commas in fields)
   rec.split( /\s+/ ) // split into whitespace-separated fields
 
 How about extending 'split', or inventing a new method 'splitlen', which
 splits a record into defined-length fields?  This simplifies a long list of
 'substring's.
 
 
 Old data formats invented in the days of punch-cards are still around.  For
 example NASA's two-line element set
 (http://en.wikipedia.org/wiki/Two-line_element_set)
 which records the orbital elements of Earth satellites.
 E.g. here is the TLE for the International Space Station:
  ISS (ZARYA)
  1 25544U 98067A   08264.51782528 -.00002182  00000-0 -11606-4 0  2927
  2 25544  51.6416 247.4627 0006703 130.5360 325.0288 15.72125391563537
 
 Proposed design:
  split( len1, len2, len3, len4, ... )   // returns array of fields
 where each numeric length argument either
  (1) captures a field of the given length if positive, or
  (2) ignores a field of the absolute given length if negative.
 The special argument "*" could repeat the previous argument to the end of
 the record.
 
 Examples:
  // chop into 5-char fields
  rec.split( 5, "*" )
  // capture a 1-char and a 5-char field and all chars from index 17 onward
  rec.split( -7, 1, 5, -4, Infinity )
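Russ's point that this is easy to do in library code can be checked against the proposed semantics. `splitFixed` below is a hypothetical standalone function, not a real or proposed `String.prototype` method: positive lengths capture, negative lengths skip, `Infinity` takes the rest, and `"*"` repeats the previous length to the end of the record.

```javascript
// Hypothetical helper implementing the fixed-width split described above.
function splitFixed(rec, ...lens) {
  const fields = [];
  let pos = 0;
  for (let i = 0; i < lens.length && pos < rec.length; i++) {
    if (lens[i] === "*") {
      const prev = lens[i - 1];
      if (typeof prev !== "number" || prev === 0 || Number.isNaN(prev)) {
        throw new RangeError('"*" must follow a non-zero length');
      }
      // Repeat the previous length until the record is exhausted.
      while (pos < rec.length) {
        if (prev > 0) fields.push(rec.slice(pos, pos + prev));
        pos += Math.abs(prev);
      }
      break;
    }
    const len = lens[i];
    if (len > 0) fields.push(rec.slice(pos, pos + len)); // capture
    pos += Math.abs(len);                                 // advance past captured or skipped chars
  }
  return fields;
}

console.log(splitFixed("abcdefghijkl", 5, "*"));
// [ 'abcde', 'fghij', 'kl' ]
console.log(splitFixed("0123456789abcdefghijk", -7, 1, 5, -4, Infinity));
// [ '7', '89abc', 'hijk' ]
```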
 
 

Re: Full Unicode based on UTF-16 proposal

2012-03-24 Thread David Herman

On Mar 23, 2012, at 11:45 AM, Roger Andrews wrote:

 Concerning UTF-16 surrogate pairs, how about a function like:
  String.isValid( str )
 to discover whether surrogates are used correctly in 'str'?
 
 Something like Array.isArray().

No need for it to be a class method, since it only operates on strings. We 
could simply have String.prototype.isValid(). Note that it would work for 
primitive strings as well, thanks to JS's automatic promotion semantics.
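A sketch of such a check as a standalone function (`hasValidSurrogates` is a hypothetical name). It returns `false` for any unpaired or out-of-order surrogate, which is the only way a sequence of UTF-16 code units can be ill-formed. For reference, ES2024 later standardized this check as `String.prototype.isWellFormed()`.

```javascript
// Hypothetical check: true if every surrogate code unit in `str` is part of a
// correctly ordered lead/trail pair.
function hasValidSurrogates(str) {
  for (let i = 0; i < str.length; i++) {
    const cu = str.charCodeAt(i);
    if (cu >= 0xD800 && cu <= 0xDBFF) {
      // Lead surrogate: the next code unit must be a trail surrogate.
      const next = str.charCodeAt(i + 1); // NaN when past the end
      if (!(next >= 0xDC00 && next <= 0xDFFF)) return false;
      i++; // skip the trail half of this pair
    } else if (cu >= 0xDC00 && cu <= 0xDFFF) {
      return false; // trail surrogate with no lead before it
    }
  }
  return true;
}

console.log(hasValidSurrogates("abc"));          // true
console.log(hasValidSurrogates("\uD834\uDF06")); // true (U+1D306 as a pair)
console.log(hasValidSurrogates("\uD834"));       // false (lone lead)
console.log(hasValidSurrogates("\uDF06\uD834")); // false (pair in wrong order)
```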

Dave


Re: Full Unicode based on UTF-16 proposal

2012-03-24 Thread David Herman

On Mar 23, 2012, at 6:30 AM, Steven Levithan wrote:

 I've been wondering whether it might be best for the /u flag to do three 
 things at once, making it an all-around "support Unicode better" flag:

+all my internet points

Now you're talking!!

 1. Switches from code unit to code point mode. /./gu matches any Unicode code 
 point, among other benefits outlined by Norbert.
 
 2. Makes \d\D\w\W\b\B match Unicode decimal digits and word characters. 
 [0-9], [A-Za-z0-9_], and lookaround provide fallbacks if you want to match 
 ASCII characters only while using /u.
 
 3. [New proposal] Makes /i use Unicode casefolding rules. 
 /ΣΤΙΓΜΑΣ/iu.test("στιγμας") == true.

This is really exciting.

 As for whether the switch to code-point-based matching should be universal or 
 require /u (an issue that your proposal leaves open), IMHO it's better to 
 require /u since it avoids the need for transforming \u[\u-\u] to 
 [{\u\u}-{\u\u}] and [\u-\u][\uDC00-\uDFFF] to 
 [{\u\uDC00}-{\u\uDFFF}], and additionally avoids at least three 
 potentially breaking changes (two of which are explicitly mentioned in your 
 proposal):

I haven't completely understood this part of the discussion. Looking at /u as a 
little red switch (LRS), i.e., an opportunity to make judicious breaks with 
compatibility, could we not allow character classes with unescaped non-BMP code 
points, e.g.:

js> "𝌆𝌇𝌈𝌉𝌊".match(/[𝌆-𝍖]+/u)
["𝌆𝌇𝌈𝌉𝌊"]

I'm still getting up to speed on Unicode and JS string semantics, so I'm 
guessing that I'm missing a reason why that wouldn't work... Presumably the JS 
source, as a sequence of UTF-16 code units, represents the tetragram code 
points as surrogate pairs. Can we not recognize surrogate pairs in character 
classes within a /u regexp and interpret them as code points?
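For comparison with what eventually shipped: the ES2015 `u` flag does treat a surrogate pair in the pattern, including inside a character class, as a single code point. A sketch using the tetragram range U+1D306 to U+1D356 from Dave's example, written with ES2015 `\u{...}` escapes:

```javascript
// Five tetragram characters, U+1D306..U+1D30A, each stored as a surrogate pair.
const tetragrams = "\u{1D306}\u{1D307}\u{1D308}\u{1D309}\u{1D30A}";
console.log(tetragrams.length);      // 10 (UTF-16 code units)
console.log([...tetragrams].length); // 5 (string iteration is by code point)

// With /u, the class range is read as code points U+1D306..U+1D356.
console.log(/^[\u{1D306}-\u{1D356}]+$/u.test(tetragrams)); // true

// Without /u, the same characters are read as code units:
// [ D834, DF06-D834, DF56 ], and DF06-D834 is an out-of-order range.
try {
  new RegExp("[\u{1D306}-\u{1D356}]");
} catch (e) {
  console.log(e instanceof SyntaxError); // true
}

// Steven's breaking change 2: under /u, "." can match two code units.
console.log("\u{1D306}".match(/^.$/u)[0].length); // 2
console.log(/^.$/.test("\u{1D306}"));             // false
```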

Dave


Re: Full Unicode based on UTF-16 proposal

2012-03-24 Thread David Herman

 Presumably the JS source, as a sequence of UTF-16 code units, represents the 
 tetragram code points as surrogate pairs.

Clarification: the JS source *of the regexp literal*.

Dave


Re: Using Object Literals as Classes

2012-03-24 Thread Claus Reinke


(function () {}) creates two objects, not one.


I'm not sure what you meant here. ..


the empty object for the function's .prototype
currently seems to elude me in all its forms..
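The "two objects" are observable: an ordinary function expression comes with a fresh `.prototype` object whose `constructor` points back at the function. Arrow functions and concise methods, added in ES2015, get no `.prototype` at all. A quick check (variable names are illustrative):

```javascript
const f = function () {};
// The second object: a fresh prototype whose .constructor points back at f.
console.log(typeof f.prototype);                                      // object
console.log(f.prototype.constructor === f);                           // true
console.log(Object.getPrototypeOf(f.prototype) === Object.prototype); // true

// ES2015 arrow functions and concise methods get no prototype object.
const arrow = () => {};
const obj = { method() {} };
console.log(arrow.prototype);      // undefined
console.log(obj.method.prototype); // undefined
```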

Claus


Re: Full Unicode based on UTF-16 proposal

2012-03-24 Thread Wes Garland

On 24 March 2012 17:22, David Herman dher...@mozilla.com wrote:

 I'm not 100% clear on this point yet, but e.g. the SourceCharacter
 production in Annex A.1 is described as any Unicode code unit.


Ugh, IMHO, that's wrong, and should be any Unicode code point.  (let the
flames begin?)

 The underlying transport format should not be a concern for the JS lexer.


 eval


Eval is a red herring: its input is defined as the contents of the given
String.  So, we come full-circle back to what's in a String?.   I'm still
partial to Brendan's BRS idea, because at least it fixes everything all at
once.

Wes


Re: Let's replace | with :: (was Breaking up the |...)

2012-03-24 Thread Brendan Eich


Andreas Rossberg wrote:

(OTOH, I'd prefer finding a way to do guards with single ':'... :-) )


I started there too (ES4, ML which was used [SML] for the ES4 reference 
implementation, other work such as Links). But TC39 argued about it and 
the desire to guard property names in object literals led us to favor ::.


let guardedObj = {p1::t1: v1, ~~~ pN::tN: vN};

Using : is ambiguous (using :, a property assignment, assuming optional 
guards, could have p:t or p:v or p:t:v). I suggested requiring 
parenthesization:


let guardedObj = {(p1:t1): v1, ~~~ (pN:tN): vN};

but no one liked that! Since :: was only twice as bad as :, and Haskell 
used it, we agreed to :: for guards.


/be

Re: hexadecimal literals for floating-point

2012-03-24 Thread Brendan Eich

This isn't a huge Ask to use your term, but it's also not a big win.
We can talk about it off-agenda at next week's meeting, try to get a
feel for odds of adding it. I'll bring it up. Thanks,

/be

Roger Andrews wrote:

Perhaps I spoke a little too loosely there.
The hexadecimal exponential format allows control over the bits of a
*finite* Number, not arbitrary bit patterns. Just like the decimal
exponential format.

The hex format does not allow the programmer to mess with IEEE
Infinities, NaNs, signaling NaNs, NaN payloads. (Except that 0x1p1024
overflows to Infinity, just like 1e310.)

PS: My two examples should be 0x1p-52 and 0x1p1023 (I forgot the 0x
prefix for hexadecimal).

--
Note that some ES implementations use NaN-based tagging to
distinguish Number values from internal object pointers. Allowing
generation of arbitrary bit-pattern FP values would open the door to
pointer spoofing. At the very least, implementations would have to
reject any such hex FP literals that they would interpret as
pointers.

Allen

On Mar 18, 2012, at 6:27 PM, Roger Andrews wrote:

As a C-like language JavaScript has hexadecimal integer literals
like 0x123ABC.

C99 introduced hexadecimal floating-point literals, see:
http://publib.boulder.ibm.com/infocenter/zos/v1r12/topic/com.ibm.zos.r12.cbclx01/lit_fltpt.htm#lit_fltpt__hex_float_constants

of the general form:
[sign] 0x [hexdigits] [. hexdigits] [p [sign] decdigits]
(with the proviso that at least one significant hexdigit must appear).

The letter 'p' means times 2 to the power of; and the exponent
field is signed decimal. Case is ignored.

(In C the exponent field *must* appear in order to disambiguate a
trailing 'f' = float flag, but this would not be necessary in
JavaScript.)

Would it be a Big Ask to have hexadecimal floating-point literals in
ES6?
It extends the existing hexadecimal integer literal syntax slightly
to encompass all the finite Numbers.

This gives the programmer the ability to precisely specify every bit
of a Number in a well-understood form. And it is a natural match
with the binary64 sign-significand-exponent fields.

For example:
var EPSILON = 1p-52,
    MAX_POWTWO = 1p1023;
instead of
var EPSILON = 1 / 0x10000000000000,
    MAX_POWTWO = 1 / (Number.MIN_VALUE * 0x8000000000000);

JavaScript Numbers can be integer or fractional, decimal literals
too, why do hexadecimal literals discriminate against fractions?
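Until such literals exist, the notation can be emulated at run time. The sketch below parses the grammar given above into a Number; `parseHexFloat` is a hypothetical helper and relies on BigInt (ES2020). It assumes modest digit strings, and results in the subnormal range may round differently than a C99 compiler would, so it is an illustration rather than a conforming implementation.

```javascript
// Hypothetical run-time parser for C99-style hex float text such as "0x1.8p1".
// Grammar: [sign] 0x [hexdigits] [. hexdigits] [p [sign] decdigits]
function parseHexFloat(src) {
  const m = /^([+-]?)0x([0-9a-f]*)(?:\.([0-9a-f]*))?(?:p([+-]?\d+))?$/i.exec(src);
  // At least one significant hex digit must appear.
  if (m === null || m[2] + (m[3] ?? "") === "") {
    throw new SyntaxError(`not a hex float literal: ${src}`);
  }
  const sign = m[1] === "-" ? -1 : 1;
  const fracDigits = m[3] ?? "";
  const exp = Number(m[4] ?? "0");
  // Read every hex digit as one integer (exact as a BigInt);
  // Number() then rounds it to the nearest double.
  const mantissa = Number(BigInt("0x" + m[2] + fracDigits));
  if (mantissa === 0) return sign * 0; // avoids 0 * Infinity for huge exponents
  // Each fraction digit moves the binary point 4 places to the left.
  const e = exp - 4 * fracDigits.length;
  // Scale by 2**e in two halves so neither factor overflows or underflows early.
  const half = Math.trunc(e / 2);
  return sign * (mantissa * 2 ** half * 2 ** (e - half));
}

console.log(parseHexFloat("0x1p-52") === Number.EPSILON); // true
console.log(parseHexFloat("0x1p1023") === 2 ** 1023);     // true
console.log(parseHexFloat("0x1.8p1"));                    // 3
console.log(parseHexFloat("0x1p1024"));                   // Infinity
```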


Re: Finding a safety syntax for classes

2012-03-24 Thread Nadav Shesek

What about setting arbitrary expressions as the value for prototype
methods? Being able to use higher-order functions to dynamically create
functions is very important for functional style programming. I use it very
often to decorate functions, compose functions together, set partially
applied functions as methods, etc. It seems to be impossible with syntax
proposed here - I think adding it to the safety syntax is very much
needed and should not be overlooked.

On Sat, Mar 24, 2012 at 4:28 PM, David Herman dher...@mozilla.com wrote:

 [...]

Re: Finding a safety syntax for classes

2012-03-24 Thread David Herman

On Mar 24, 2012, at 3:01 PM, Nadav Shesek wrote:

 What about setting arbitrary expressions as the value for prototype methods? 
 Being able to use higher-order functions to dynamically create functions is 
 very important for functional style programming. I use it very often to 
 decorate functions, compose functions together, set partially applied 
 functions as methods, etc. It seems to be impossible with syntax proposed 
 here - I think adding it to the safety syntax is very much needed and 
 should not be overlooked.

Yep, we already agreed to this -- see the grammar on Allen's maximally 
minimal proposal:


http://wiki.ecmascript.org/doku.php?id=strawman:maximally_minimal_classes#class_declarations_and_expressions

Dave


Re: Full Unicode based on UTF-16 proposal

2012-03-24 Thread Norbert Lindenberg


On Mar 23, 2012, at 6:30 , Steven Levithan wrote:

 Norbert Lindenberg wrote:
 
 I've updated the proposal based on the feedback received so far. Changes
 are listed in the Updates section.
 http://norbertlindenberg.com/2012/03/ecmascript-supplementary-characters/
 
 Cool.
 
 From the proposal's Updates section:
 
 Indicated that "u" may not be the actual character for the flag for code
 point mode in regular expressions, as a "u" flag has already been proposed
 for Unicode-aware digit and word character matching.
 
 I've been wondering whether it might be best for the /u flag to do three 
 things at once, making it an all-around "support Unicode better" flag:
 
 1. Switches from code unit to code point mode. /./gu matches any Unicode code 
 point, among other benefits outlined by Norbert.
 
 2. Makes \d\D\w\W\b\B match Unicode decimal digits and word characters. 
 [0-9], [A-Za-z0-9_], and lookaround provide fallbacks if you want to match 
 ASCII characters only while using /u.

One concern: I think code point based matching should be the default for regex 
literals within modules (where we know the code is written for Harmony). Does 
it make sense to also interpret \d\D\w\W\b\B as full Unicode sets for such 
literals?

In the other direction it's clear that using /u for \d\D\w\W\b\B has to imply 
code point mode.

 3. [New proposal] Makes /i use Unicode casefolding rules. 
 /ΣΤΙΓΜΑΣ/iu.test("στιγμας") == true.

We probably should review the complete Unicode Technical Standard #18, Unicode 
Regular Expressions, and see how we can upgrade RegExp for better Unicode 
support. Maybe on a separate thread...

 Item number 3 is inspired by but different than Java's lowercase u flag for 
 Unicode casefolding. In Java, flag u itself enables Unicode casefolding and 
 does not need to be paired with flag i (which is equivalent to ES's /i).
 
 As an aside, merging these three things would likely lead to /u seeing 
 widespread use when dealing with anything more than ASCII, at least in 
 environments where you don't have to worry about backcompat. This would help 
 developers avoid stumbling on code unit issues in the small minority of cases 
 where non-BMP characters are used or encountered. If /u's only purpose was to 
 switch to code point mode, most likely it would be used *far* less often, and 
 more developers would continue to get bitten by code-unit-based processing.

Good thinking :-)

 As for whether the switch to code-point-based matching should be universal or 
 require /u (an issue that your proposal leaves open), IMHO it's better to 
 require /u since it avoids the need for transforming \u[\u-\u] to 
 [{\u\u}-{\u\u}] and [\u-\u][\uDC00-\uDFFF] to 
 [{\u\uDC00}-{\u\uDFFF}], and additionally avoids at least three 
 potentially breaking changes (two of which are explicitly mentioned in your 
 proposal):
 
 1. [S]ome applications might have processed gunk with regular expressions 
 where neither the 'characters' in the patterns nor the input to be matched 
 are text.
 
 2. s.match(/^.$/)[0].length can now be 2.
 I'll add, /.{3}/.exec(s)[0].length can now be anywhere between 3 and 6.
 
 3. /./g.exec(s) can now increment the regex's lastIndex by 2.
 
 -- Steven Levithan


Re: Full Unicode based on UTF-16 proposal

2012-03-24 Thread Norbert Lindenberg


On Mar 23, 2012, at 7:12 , Lasse Reichstein wrote:

 On Fri, Mar 23, 2012 at 2:30 PM, Steven Levithan
 steves_l...@hotmail.com wrote:
 I've been wondering whether it might be best for the /u flag to do three
 things at once, making it an all-around "support Unicode better" flag:
 
 ...
 
 3. [New proposal] Makes /i use Unicode casefolding rules.
 
 Yey, I'm for it :)
 Especially if it means dropping the rather naïve canonicalize function
 that can't canonicalize an ASCII character with a non-ASCII character.
 
 /ΣΤΙΓΜΑΣ/iu.test("στιγμας") == true.
 
 I think a compliant implementation should (read: ought to) already get
 that example, since "στιγμας".toUpperCase() == "ΣΤΙΓΜΑΣ".toUpperCase()
 in the browsers I have checked, and the ignore-case canonicalization
 is based on toUpperCase. Alas, most of the implementations miss it
 anyway.

According to the ES5 spec, /ΣΤΙΓΜΑΣ/i.test("στιγμας") must be true indeed. 
Chrome and Node (i.e., V8) and IE get this right; Safari, Firefox, and Opera 
don't.

Note that toUpperCase allows mappings from 1 to multiple code units, while 
RegExp canonicalization in ES5 doesn't, so /SS/i.test("ß") === false even 
though "SS".toUpperCase() === "ß".toUpperCase().
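Both of Norbert's examples can be checked in a current engine. Without `u`, regexps still use the ES5-style `toUpperCase`-based canonicalization that rejects one-to-many mappings; with the ES2015 `u` flag, Unicode simple case folding is used instead, which leaves ß unchanged.

```javascript
// Final sigma, sigma and capital sigma all canonicalize to the same character.
console.log(/ΣΤΙΓΜΑΣ/i.test("στιγμας"));  // true
console.log(/ΣΤΙΓΜΑΣ/iu.test("στιγμας")); // true (simple case folding)

// toUpperCase maps "ß" to the two-character "SS"...
console.log("SS".toUpperCase() === "ß".toUpperCase()); // true
// ...but regexp canonicalization only accepts one-to-one mappings.
console.log(/SS/i.test("ß"));  // false
console.log(/SS/iu.test("ß")); // false
```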

Norbert


Re: Full Unicode based on UTF-16 proposal

2012-03-24 Thread Norbert Lindenberg

Thanks for the detailed comments! Replies below.

Norbert


On Mar 23, 2012, at 9:46 , Phillips, Addison wrote:

 Comments follow.
 
 1. Definition of string. You say:
 
 --
 However, ECMAScript does not place any restrictions or requirements on the
 sequence of code units in a String value, so it may be ill-formed when
 interpreted as a UTF-16 code unit sequence.
 --
 
 I know what you mean, but others might not. Perhaps:
 
 --
 However, ECMAScript does not place any restrictions or requirements on the 
 sequence of code units in a String value, so the sequence of code units may 
 contain code units that are not valid in Unicode or sequences that do not 
 represent Unicode code points (such as unpaired surrogates).
 --

I can add a note that ill-formed here means containing unpaired surrogates. If 
I read chapter 3 of the Unicode Standard correctly, there's no other way for 
UTF-16 to be ill-formed. UTF-16 code units by themselves cannot be invalid - 
any 16-bit value can occur in a well-formed UTF-16 string.
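A UTF-16 code unit sequence is therefore ill-formed exactly when some surrogate is unpaired. The helper below is a hypothetical sketch of that check; engines implementing ES2024 expose the same test as String.prototype.isWellFormed().

```javascript
// Returns true if the string contains a high surrogate not followed by a low
// surrogate, or a low surrogate not preceded by a high surrogate. This is the
// only way a sequence of 16-bit code units can be ill-formed UTF-16.
function hasUnpairedSurrogate(s) {
  for (let i = 0; i < s.length; i++) {
    const cu = s.charCodeAt(i);
    if (cu >= 0xD800 && cu <= 0xDBFF) {
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : -1;
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // well-formed pair: skip over the low surrogate
      } else {
        return true; // high surrogate without a following low surrogate
      }
    } else if (cu >= 0xDC00 && cu <= 0xDFFF) {
      return true; // low surrogate without a preceding high surrogate
    }
  }
  return false;
}

console.log(hasUnpairedSurrogate("\uD83D\uDE00")); // false: U+1F600 as a valid pair
console.log(hasUnpairedSurrogate("\uD83D"));       // true: high surrogate on its own
```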

 2. In this section, I would define string after code unit and code point. I 
 would also include a definition of surrogates/surrogate pairs.

Makes sense.

 3. Under text interpretation you say:
 
 --
 For compatibility with existing applications, it
 has to allow surrogate code points (code points between U+D800 and U+DFFF
 which can never represent characters).
 --
 
 This would (see above) benefit from having a definition in place. As noted, 
 this is slightly incomplete, since surrogate code units are used to form 
 supplementary characters.

The text is about surrogate code points, not about surrogate code units.
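The distinction shows up when code units are decoded into code points. In the hypothetical helper below, a valid surrogate pair becomes one supplementary code point, while an unpaired surrogate code unit is passed through as a surrogate code point instead of being rejected, which is the compatibility behaviour described in the proposal text. ES2015 string iteration and codePointAt treat lone surrogates the same way.

```javascript
// Decode a string's UTF-16 code units into code points. A valid surrogate
// pair becomes one supplementary code point; an unpaired surrogate code unit
// is kept as a surrogate code point (U+D800..U+DFFF).
function toCodePoints(s) {
  const out = [];
  for (let i = 0; i < s.length; i++) {
    const cu = s.charCodeAt(i);
    if (cu >= 0xD800 && cu <= 0xDBFF && i + 1 < s.length) {
      const lo = s.charCodeAt(i + 1);
      if (lo >= 0xDC00 && lo <= 0xDFFF) {
        // Combine the pair: 10 bits from each surrogate, offset by 0x10000.
        out.push(0x10000 + ((cu - 0xD800) << 10) + (lo - 0xDC00));
        i++;
        continue;
      }
    }
    out.push(cu); // BMP character or unpaired surrogate
  }
  return out;
}

console.log(toCodePoints("a\uD83D\uDE00b").map((cp) => cp.toString(16))); // [ '61', '1f600', '62' ]
console.log(toCodePoints("\uD800x")); // [ 55296, 120 ]
```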

 4. 0xFFFE and 0xFFFF are non-characters in Unicode. I do think you do the 
 right thing here. It's just a nit that you never note this ;-).
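For reference, the Unicode Standard defines 66 noncharacters: U+FDD0..U+FDEF, plus the last two code points of each of the 17 planes, of which 0xFFFE and 0xFFFF are the BMP pair. A minimal sketch of the test (the function name is hypothetical):

```javascript
// Noncharacters: U+FDD0..U+FDEF, and U+xxFFFE / U+xxFFFF in every plane.
// Masking with 0xFFFE keeps the low 16 bits minus the lowest bit, so both
// U+xxFFFE and U+xxFFFF yield 0xFFFE.
function isNoncharacter(cp) {
  return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) === 0xFFFE;
}

let count = 0;
for (let cp = 0; cp <= 0x10FFFF; cp++) {
  if (isNoncharacter(cp)) count++;
}
console.log(count); // 66
```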
 
 5. Editorial unnecessary ;-):
 
 --
 This transformation is rather ugly, but I’m afraid it’s the price ECMAScript
  has to pay for being 12 years late in supporting supplementary characters.
 --
 
 6. Under 'details' you suggest a number of renamings. Are these strictly 
 necessary? The term 'character' could be taken to mean 'code point' instead, 
 with an explanatory note.

Unfortunately, the term "character" is poisoned in ES5 by a redefinition as 
"code unit" (chapter 6). For ES6, I'd like the spec to be really clear where it 
means code units and where it means code points. Maybe we can then reintroduce 
"character" in ES7...

 7. Skipping down a lot, to section 6 source text, you propose:
 
 --
 The text is expected to have been normalised to Unicode Normalization Form C
 (Canonical Decomposition, followed by Canonical Composition), as described in
 Unicode Standard Annex #15.
 --
 
 I think this should be removed or modified.

This sentence is essentially copied from ES5 (with corrected references), and 
as I copied it, I made a note to myself that we need to discuss normalization, 
just not as part of this proposal...

 Automatic application of NFC is not always desirable, as it can affect 
 presentation or processing. Perhaps:
 
 --
 Normalization of the text to Unicode Normalization Form C (Canonical 
 Decomposition, followed by Canonical Composition), as described in Unicode 
 Standard Annex #15, is recommended when transcoding from another character 
 encoding.
 --
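Addison's concern is easy to illustrate: NFC can change both the code units and the length of a string, and it also replaces some characters outright. A minimal sketch using String.prototype.normalize (added in ES2015); the example strings are arbitrary:

```javascript
// "é" written two ways: precomposed U+00E9, and "e" + U+0301 COMBINING ACUTE ACCENT.
const precomposed = "\u00E9";
const decomposed = "e\u0301";
const nfc = decomposed.normalize("NFC");

console.log(precomposed === decomposed);    // false: different code units
console.log(decomposed.length, nfc.length); // 2 1
console.log(nfc === precomposed);           // true

// NFC also replaces singleton decompositions: U+212B ANGSTROM SIGN becomes
// U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE.
console.log("\u212B".normalize("NFC") === "\u00C5"); // true
```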
 
 8. In 7.6 Identifier Names and Identifiers you don't actually forbid 
 unpaired surrogates or non-characters in the text (Identifier_Part:: does 
 this by implication). Perhaps state it? Also, ZWJ and ZWNJ are permitted as 
 the last character in an identifier.

I can add a note about surrogate code points and non-characters, but, as you 
say, they are already ruled out because they can't have the required Unicode 
properties ID_Start or ID_Continue.

The use of ZWJ and ZWNJ is unchanged from ES5. UAX 31 has much stricter rules 
on where they would be allowed, but I'm not sure we have a strong case for 
changing the rules in ECMAScript.
http://www.unicode.org/reports/tr31/tr31-9.html#Layout_and_Format_Control_Characters
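Both points can be checked against the ES2015 IdentifierName grammar, which builds on ID_Start and ID_Continue and adds "$", "_", ZWNJ and ZWJ. The sketch below assumes an engine with Unicode property escapes (ES2018), ignores \u escape sequences and reserved words, and uses hypothetical names.

```javascript
// ES2015-style IdentifierName, minus \u escape sequences:
//   IdentifierStart: ID_Start, "$" or "_"
//   IdentifierPart:  ID_Continue, "$", "_", ZWNJ (U+200C) or ZWJ (U+200D)
const IDENTIFIER_NAME = /^[\p{ID_Start}$_][\p{ID_Continue}$_\u200C\u200D]*$/u;

function isIdentifierName(s) {
  return IDENTIFIER_NAME.test(s);
}

console.log(isIdentifierName("caf\u00E9")); // true
console.log(isIdentifierName("a\u200C"));   // true: ZWNJ allowed as the last character
console.log(isIdentifierName("a\uD800"));   // false: surrogate code point has no ID_Continue
console.log(isIdentifierName("a\uFFFF"));   // false: noncharacter has no ID_Continue
console.log(isIdentifierName("\u{10400}")); // true: supplementary letter (Deseret)
```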

 9. 15.5.4.6: you say (a nonnegative integer less than 0x10FFFF), whereas 
 it should say: (a nonnegative integer less than or equal to 0x10FFFF)

Will fix.
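The inclusive bound matters because U+10FFFF is itself a valid code point. For comparison, ES2015's String.fromCodePoint accepts 0x10FFFF and throws a RangeError for anything larger; the helper name below is hypothetical.

```javascript
// A valid code point is an integer from 0 to 0x10FFFF inclusive.
function isCodePoint(n) {
  return Number.isInteger(n) && n >= 0 && n <= 0x10FFFF;
}

console.log(isCodePoint(0x10FFFF), isCodePoint(0x110000)); // true false
console.log(String.fromCodePoint(0x10FFFF).length);         // 2 (a surrogate pair)

try {
  String.fromCodePoint(0x110000);
} catch (e) {
  console.log(e instanceof RangeError); // true
}
```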

 10. In the section on "what about utf-32", you say: "and the code points 
 start at positions 1, 2, 3". Of course this should be "... and the code 
 points start at positions 0, 1, 2".

Of course.
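The zero-based code point positions are easy to see next to the corresponding UTF-16 offsets. A small sketch (the helper name is hypothetical) using ES2015 string iteration, which yields one code point at a time:

```javascript
// UTF-16 offsets at which each code point of a string starts.
function codePointOffsets(s) {
  const offsets = [];
  let offset = 0;
  for (const ch of s) {   // string iteration yields one code point per step
    offsets.push(offset);
    offset += ch.length;  // 1 for BMP characters, 2 for surrogate pairs
  }
  return offsets;
}

const s = "a\u{1F600}b";
console.log(codePointOffsets(s));     // [ 0, 1, 3 ]  UTF-16 code unit positions
console.log([...s].map((_, i) => i)); // [ 0, 1, 2 ]  UTF-32 style code point positions
```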

 Thanks for this proposal!
 
 Addison
 
 -----Original Message-----
 From: Norbert Lindenberg [mailto:ecmascr...@norbertlindenberg.com]
 Sent: Thursday, March 22, 2012 10:14 PM
 To: es-discuss@mozilla.org
 Subject: Re: Full Unicode based on UTF-16 proposal
 
 I've updated the proposal based on the feedback received so far. Changes are
 listed in the Updates section.
 http://norbertlindenberg.com/2012/03/ecmascript-supplementary-characters/
 
 Norbert
 
 
 On Mar 16, 2012, at 0:18 , Norbert Lindenberg wrote:
 
 Based on my prioritization of goals for support for full Unicode in

Re: Finding a safety syntax for classes

2012-03-24 Thread Allen Wirfs-Brock


On Mar 24, 2012, at 7:28 AM, David Herman wrote:

 On Mar 21, 2012, at 9:13 AM, Allen Wirfs-Brock wrote:
 
 On Mar 20, 2012, at 11:32 PM, David Herman wrote:
 
 Well, hang on now. The 'constructor' syntax is not just the constructor 
 property. It still carries special status; the semantics of a class says 
 look for the property called 'constructor' and make that the [[Call]] and 
 [[Construct]] behavior of the class.
 
 Actually, the semantics is probably more like:  the value bound to the 
 className is the function object defined by the methodDefinition whose 
 propertyName is "constructor".
 
 Yes, sure. Doesn't change the point: 'constructor' is still a special, 
 distinguished method.

Yes it is. I think the core issue with respect to identifying this distinguished 
method with an identifier other than "constructor" concerns how it would relate 
to an actual method named "constructor".

The conservative course is to use "constructor" as the distinguishing 
identifier. It is a direct reflection of the underlying ES<=5.1 chapter 13 
and 15 class model, and it doesn't introduce any of these naming issues.
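The chapter 13/15 relationship, shown next to how ES2015 eventually specified class declarations (it kept "constructor" as the distinguished name). This is a comparison sketch, not part of the 2012 proposal; the names are arbitrary.

```javascript
// ES<=5.1 model: creating a function also creates its prototype object, whose
// non-enumerable "constructor" property points back at the function.
function PointFn(x, y) { this.x = x; this.y = y; }

// ES2015 class syntax: the method named "constructor" supplies the class
// function itself, and prototype.constructor points back at it the same way.
class PointClass {
  constructor(x, y) { this.x = x; this.y = y; }
}

console.log(PointFn.prototype.constructor === PointFn);       // true
console.log(PointClass.prototype.constructor === PointClass); // true
console.log(typeof PointClass);                               // function
console.log(new PointClass(1, 2).constructor === PointClass); // true
```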

I primarily favor sticking with "constructor" because the 
safety/maximally-minimal proposal is all about being conservative...
 ...
 The question is how we spell that. This is 99.9% a surface syntax question. 
 You could argue that spelling it 'new' should define a [new] method, or a 
 [new] method and a [constructor] method, or just a [constructor] 
 method. If the latter, it's semantically *identical* to spelling it 
 'constructor'. But even if we chose one of the other two alternatives, the 
 semantic differences here are minor, and the ergonomics of the syntax 
 matter.
 
 You need to drop the [ ]'s  (although I'm not sure what you meant by them...
 
 I meant that there is a property of the prototype that you can access via 
 either p["new"] or p["constructor"] or both, depending on which semantics we 
 decide to give. No matter what the surface syntax, any of those semantics is 
 available to us. I say:
 
 a) spell it new -- ergonomics trumps corner cases; hard cases make bad 
 law
deviates from chapters 13 and 15.  

  (class () {})  isn't interchangeable with (function () {}). Shouldn't it be?
 
 b) desugar it to the constructor function and the p.constructor property 
 only

but presumably means that an instance method named "new" can't be defined using a 
class definition. Or, perhaps only if "new" is explicitly string quoted as a 
property name. "new" is not a totally unreasonable method name. It also 
presumably means that defining an instance method named "constructor" is 
disallowed.

 
 c) i.e., don't create a p.new property -- no more prototype pollution 
 please

I presume you mean it also doesn't define a p.constructor property. This has 
similar problems to a). It deviates from the legacy ES class model.

Also, most instances inherit from Object.prototype, so they will still have an 
inherited "constructor" property whose value is Object.

 
 d) an explicit 'constructor' method overrides the implicit creation of 
 the 'constructor' method but does not define the constructor function
 
 Why d)? Remember, the .constructor idiom is a *very weak* idiom that many JS 
 programs don't follow. If a JS program has some reason to use 'constructor' 
 for a different purpose, trust them.

I believe the constructor idiom is most commonly not followed today when the 
prototype property of a function is set to a new object (perhaps defined using 
an object literal), and correctly setting the constructor property dynamically 
is an extra step that is easily forgotten (and usually has no ill effects).
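Both observations, the inherited Object value and the easily forgotten extra step, show up in the common ES5 pattern of assigning an object literal to a function's prototype. A sketch with an arbitrary Counter example:

```javascript
function Counter() { this.count = 0; }

// Replacing the whole prototype with an object literal drops the
// automatically created "constructor" property...
Counter.prototype = {
  increment() { this.count++; return this; }
};
const lostConstructor = new Counter().constructor; // inherited from Object.prototype

// ...so restoring it is an extra step. defineProperty keeps it
// non-enumerable, like the property the engine originally created.
Object.defineProperty(Counter.prototype, "constructor", {
  value: Counter, writable: true, enumerable: false, configurable: true
});
const restoredConstructor = new Counter().constructor;

console.log(lostConstructor === Object);      // true
console.log(restoredConstructor === Counter); // true
console.log(new Counter().increment().count); // 1
```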

However, with syntactic class definition support (including inheritance) in the 
language, I am sure that we are going to see much more use of OO idioms in ES 
programs (if that weren't the case, why would we add them?). A very common idiom 
is to query the class of an object. Just look how frequently you see code 
like p.class in Java or Ruby code, or (p class) in Smalltalk code. The 
equivalent of this using ES chapter 13/15 objects is p.constructor. Which 
would you prefer to see in future ES code: p.constructor===q.constructor or 
p.new===q.new?
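In ES2015 class syntax (which did keep "constructor"), the class-query idiom reads as follows. The class names are arbitrary, and instanceof is shown only for contrast, since it answers a different question.

```javascript
class Shape {}
class Circle extends Shape {}

const p = new Circle();
const q = new Circle();
const r = new Shape();

// Querying the class through "constructor" compares exact classes.
console.log(p.constructor === q.constructor); // true
console.log(p.constructor === r.constructor); // false

// instanceof follows the prototype chain instead, so a subclass instance
// also counts as an instance of its superclass.
console.log(p instanceof Shape);              // true
```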

BTW, many OO experts, including myself, tell people that querying the class of 
an object in this manner is an undesirable practice, maybe even an 
anti-pattern. But in reality it is widely done, and the negatives all relate to 
non-functional issues like code flexibility and reusability. Developers are 
going to do it, probably a lot.


 
 Personally I think the answer should be A, which implies that we have 
 class-side inheritance. This is a departure from current practice, but 
 because classes are functions there is no way in ES<=5.1 to set up 
 class-side inheritance other than by mutating __proto__.
 
 I always found this the more appealing, but then again, if I'm supposed to 
 be going with the opposite of my instincts (see above), then maybe I should 
 disagree
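Class-side inheritance, sketched both ways. In ES<=5.1 a constructor's [[Prototype]] is Function.prototype, so a "static" method on a base constructor is not inherited unless that link is changed; Object.setPrototypeOf (ES2015) stands in here for mutating __proto__. ES2015 classes, as eventually specified, set the link up automatically. The names are arbitrary.

```javascript
// ES<=5.1 style: Derived's [[Prototype]] is Function.prototype, so a
// "static" method on Base is not visible on Derived by default.
function Base() {}
Base.create = function () { return new this(); };

function Derived() { Base.call(this); }
Derived.prototype = Object.create(Base.prototype, {
  constructor: { value: Derived, writable: true, configurable: true }
});
const inheritedBefore = typeof Derived.create; // "undefined"

// Link the constructors themselves; ES5-era code did this by assigning
// Derived.__proto__ = Base, where engines supported it.
Object.setPrototypeOf(Derived, Base);
const inheritedAfter = typeof Derived.create;  // "function"

// ES2015 classes set up the same link automatically.
class A { static create() { return new this(); } }
class B extends A {}

console.log(inheritedBefore, inheritedAfter);     // undefined function
console.log(Derived.create() instanceof Derived); // true
console.log(Object.getPrototypeOf(B) === A);      // true
console.log(B.create() instanceof B);             // true
```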

Re: RegExps in array functions

2012-03-24 Thread Brendan Eich


Lasse Reichstein wrote:

The problem is that there are so many ways you might want to use an
object as a function, not just one. In this case, we want it as a
predicate. In other cases you might want it to return the match as a
string, or as a match. Or you want your generic objects to match
against different properties.
The use-case here is that you have an object and want it to act as a
function in a specific case, and it's driven by the way the object is
used, which is unbounded, not the way the object is defined. You can't
be expected to predict all the ways an object is going to be used, and
create one function representation suiting them all.

For this, you need a function per usage, so the predicate function
suggested above is exactly the right level of abstraction.
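The point that each usage wants its own function can be sketched with one regexp and three small wrappers; the wrapper names and sample data are made up for illustration.

```javascript
const re = /f(o+)/i; // no g flag, so test/exec keep no lastIndex state

// Three different function views of the same regexp, one per usage.
const matches = (s) => re.test(s);                                            // predicate
const matchedText = (s) => { const m = re.exec(s); return m ? m[0] : null; }; // match string
const capturedOs = (s) => { const m = re.exec(s); return m ? m[1].length : 0; }; // group length

const words = ["fOo", "bAr", "FOOO"];
console.log(words.filter(matches));  // [ 'fOo', 'FOOO' ]
console.log(words.map(matchedText)); // [ 'fOo', null, 'FOOO' ]
console.log(words.map(capturedOs));  // [ 2, 0, 3 ]
```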


Well argued. I agree in general.

In this specific case, just a couple of thoughts (not disagreeing):

When I made regexps callable way back when, I chose calling a regexp as 
short for exec'ing, which creates a match array result. Then I optimized 
the exec/call built-in to peek into its continuation to see if the 
result was used only for its boolishness. If so, the built-in would pass 
a flag parameter to the common internal API for regexp execution to 
avoid creating the result array, substituting false/true for null/the-array.


One way for generic callback-invoking map, forEach, filter, etc. 
functions to go: call the .call method of the callback, assuming the 
callback is a function but enabling non-function objects to delegate to 
a call method that knows how to invoke them. This is shown in Steve 
Levithan's XRegExp GitHub readme:


https://github.com/slevithan/XRegExp

// XRegExp regexes get call and apply methods
// To demonstrate, let's first create the function we'll be using...
function filter(array, fn) {
    var res = [];
    array.forEach(function (el) { if (fn.call(null, el)) res.push(el); });
    return res;
}
// Now we can filter arrays using functions and regexes
filter(['a', 'ba', 'ab', 'b'], XRegExp('^a')); // -> ['a', 'ab']


This may be optimization-hostile at first glance, but VMs optimize call 
and apply with special caching and inlining.
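The readme example depends on XRegExp. The sketch below makes the same pattern runnable with a plain RegExp; callableRegExp is a hypothetical stand-in for what XRegExp adds, not its actual implementation.

```javascript
// The generic filter from the readme: it invokes the callback via fn.call,
// so any object with a suitable call method can act as the callback.
function filter(array, fn) {
  const res = [];
  array.forEach(function (el) { if (fn.call(null, el)) res.push(el); });
  return res;
}

// Hypothetical helper: give one RegExp instance a call method that treats
// call(thisArg, str) as a boolean test of str.
function callableRegExp(re) {
  re.call = function (thisArg, str) { return this.test(str); };
  return re;
}

const onlyA = filter(["a", "ba", "ab", "b"], callableRegExp(/^a/));
const longer = filter(["a", "ba", "ab", "b"], function (s) { return s.length > 1; });
console.log(onlyA);  // [ 'a', 'ab' ]
console.log(longer); // [ 'ba', 'ab' ]
```

Passing a regexp directly to the built-in Array.prototype.filter still throws a TypeError, because the built-ins check that the callback is callable rather than looking up a call method.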


The bad old hardcoded-per-builtin optimization to avoid creating a 
useless-except-for-its-boolish-value result array is ideally generalized 
via inlining and type inference to cover many such results-allocating 
functions that are sometimes or often called just to test that they 
return non-null.


 /be


Re: Finding a safety syntax for classes

2012-03-24 Thread Brendan Eich


David Herman wrote:

On Mar 24, 2012, at 3:01 PM, Nadav Shesek wrote:


What about setting arbitrary expressions as the value for prototype methods? Being able 
to use higher-order functions to dynamically create functions is very important for 
functional style programming. I use it very often to decorate functions, compose 
functions together, set partially applied functions as methods, etc. It seems to be 
impossible with syntax proposed here - I think adding it to the safety syntax 
is very much needed and should not be overlooked.


Yep, we already agreed to this -- see the grammar on Allen's maximally 
minimal proposal:

 
http://wiki.ecmascript.org/doku.php?id=strawman:maximally_minimal_classes#class_declarations_and_expressions


I don't see what I read Nadav as asking for: the ability to initialize a 
prototype method from an arbitrary expression. What am I missing?



   Class bodies

ClassElement :
PrototypePropertyDefinition
;   //semicolons are allowed but have no significance

PrototypePropertyDefinition :
PropertyName ( FormalParameterList? ) { FunctionBody } // method
*PropertyName ( FormalParameterList? ) { FunctionBody } // generator method
get PropertyName ( )  { FunctionBody } // getter
set PropertyName ( PropertySetParameterList ) { FunctionBody } // setter
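Each PrototypePropertyDefinition form in the grammar above has a direct counterpart in ES2015 class syntax as finally specified. In this sketch (the class is made up) every member is a method, generator method, getter or setter; none is initialized from an arbitrary expression.

```javascript
class Temperature {
  constructor(celsius) { this.celsius = celsius; }
  describe() { return this.celsius + " C"; }             // method
  *steps(n) { for (let i = 0; i < n; i++) yield i; }     // generator method
  get fahrenheit() { return this.celsius * 9 / 5 + 32; } // getter
  set fahrenheit(f) { this.celsius = (f - 32) * 5 / 9; } // setter
}

const t = new Temperature(100);
console.log(t.describe());    // 100 C
console.log([...t.steps(3)]); // [ 0, 1, 2 ]
console.log(t.fahrenheit);    // 212
t.fahrenheit = 32;
console.log(t.celsius);       // 0
```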



/be

Re: Finding a safety syntax for classes

2012-03-24 Thread Allen Wirfs-Brock


On Mar 24, 2012, at 7:29 PM, Brendan Eich wrote:

 David Herman wrote:
 On Mar 24, 2012, at 3:01 PM, Nadav Shesek wrote:
 
 What about setting arbitrary expressions as the value for prototype 
 methods? Being able to use higher-order functions to dynamically create 
 functions is very important for functional style programming. I use it very 
 often to decorate functions, compose functions together, set partially 
 applied functions as methods, etc. It seems to be impossible with syntax 
 proposed here - I think adding it to the safety syntax is very much 
 needed and should not be overlooked.
 
 Yep, we already agreed to this -- see the grammar on Allen's maximally 
 minimal proposal:
 
 
 http://wiki.ecmascript.org/doku.php?id=strawman:maximally_minimal_classes#class_declarations_and_expressions
 
 I don't see what I read Nadav as asking for: the ability to initialize a 
 prototype method from an arbitrary expression. What am I missing?

I suspect Dave misinterpreted Nadav's question.  So did I, when I originally 
read it.

The superclass can be set to an arbitrary AssignmentExpression. This permits 
using higher-order functions to define the [[Prototype]] of the class' 
prototype object. Potentially this mechanism might be used to essentially have 
the effect of injecting dynamically generated methods into the class 
definition. However, they would be inherited methods of the prototype object 
rather than own methods, although that may not matter.
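The superclass route might look like the following. withMethods is a hypothetical helper; the result matches the caveat above that the generated methods are inherited rather than own properties. In ES2015 as standardized, the extends operand is a LeftHandSideExpression rather than an AssignmentExpression, which a call expression like this satisfies.

```javascript
// Hypothetical higher-order helper: builds a superclass whose prototype
// carries dynamically supplied methods.
function withMethods(methods) {
  function Base() {}
  Object.assign(Base.prototype, methods);
  return Base;
}

// The superclass position accepts an expression, so the helper's result
// can feed the class definition.
class Widget extends withMethods({ double(x) { return 2 * x; } }) {
  triple(x) { return 3 * x; }
}

const w = new Widget();
const own = (name) => Object.prototype.hasOwnProperty.call(Widget.prototype, name);
console.log(w.double(4), w.triple(4));   // 8 12
console.log(own("double"), own("triple")); // false true
```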

To actually add a computed function as the value of a prototype object property 
within the class definition is pretty much the same thing as defining an 
arbitrary-valued prototype data property. Defining non-method prototype 
properties is one of the features that we have previously been unable to reach 
consensus on, and for that reason it was intentionally excluded from the 
maximal-minimal proposal. As the proposal says:

There is (intentionally) no direct declarative way to define either prototype 
data properties (other than methods), class properties, or instance properties. 
Class properties and prototype data properties need to be created outside the 
declaration.
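Under that restriction, a decorated method, a class property, and a prototype data property are all added after the declaration. A sketch, with shouting as a hypothetical higher-order decorator; Object.defineProperty keeps the added method non-enumerable, like methods declared in an ES2015 class body.

```javascript
class Greeter {
  greet(name) { return "Hello, " + name; }
}

// Hypothetical decorator: wraps a method and upper-cases its result.
function shouting(fn) {
  return function (...args) { return fn.apply(this, args).toUpperCase(); };
}

// Computed method, added outside the declaration.
Object.defineProperty(Greeter.prototype, "shout", {
  value: shouting(Greeter.prototype.greet),
  writable: true, enumerable: false, configurable: true
});

// Class property and prototype data property, also defined outside.
Greeter.defaultName = "world";
Greeter.prototype.punctuation = "!";

const g = new Greeter();
console.log(g.shout(Greeter.defaultName) + g.punctuation); // HELLO, WORLD!
```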


Allen



